In statistics, a population is a set of similar items or events which is of interest for some question or experiment. A statistical population can be a group of existing objects (e.g. the set of all stars within the Milky Way galaxy) or a hypothetical and potentially infinite group of objects conceived as a generalization from experience (e.g. the set of all possible hands in a game of poker). A common aim of statistical analysis is to produce information about some chosen population.
In statistical inference, a subset of the population (a statistical sample) is chosen to represent the population in a statistical analysis. The ratio of the size of this statistical sample to the size of the population is called a sampling fraction. It is then possible to estimate the population parameters using the appropriate sample statistics.
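As a minimal sketch of these ideas, the following Python snippet draws a simple random sample from a synthetic finite population, computes the sampling fraction, and uses the sample mean as an estimate of the population mean. The population values, the sizes, and the helper name `sampling_fraction` are illustrative assumptions, not part of any standard library.

```python
import random
import statistics


def sampling_fraction(sample_size, population_size):
    """Ratio of the sample size to the population size."""
    return sample_size / population_size


# Hypothetical finite population: 10,000 synthetic measurements (assumed values).
rng = random.Random(0)
population = [rng.gauss(50.0, 10.0) for _ in range(10_000)]

# Simple random sample of 500 items, drawn without replacement.
sample = rng.sample(population, 500)

fraction = sampling_fraction(len(sample), len(population))

# The population mean is the parameter; in practice it is usually unknown.
population_mean = statistics.fmean(population)
# The sample mean is the statistic used to estimate it.
sample_mean = statistics.fmean(sample)

print(f"sampling fraction: {fraction}")
print(f"population mean:   {population_mean:.2f}")
print(f"sample mean:       {sample_mean:.2f}")
```

Here the population mean can be computed directly only because the population is synthetic; with a real population, the sample mean would typically be the only available estimate.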
A subset of a population that shares one or more additional properties is called a subpopulation. For example, if the population is all Egyptian people, one subpopulation is all Egyptian males; if the population is all pharmacies in the world, one subpopulation is all pharmacies in Egypt. By contrast, a sample is a subset of a population that is not chosen to share any additional property.
Descriptive statistics may yield different results for different subpopulations. For instance, a particular medicine may have different effects on different subpopulations, and these effects may be obscured or overlooked if such subpopulations are not identified and examined in isolation.
Similarly, one can often estimate parameters more accurately by separating out subpopulations: for instance, the distribution of heights among people is better modeled by treating men and women as separate subpopulations.
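The effect of separating subpopulations can be sketched with synthetic data. In the snippet below, the group means and standard deviations are illustrative assumptions rather than real height measurements, and `describe` is a hypothetical helper. The pooled standard deviation comes out larger than the standard deviation within either group, because it also absorbs the difference between the group means.

```python
import random
import statistics

rng = random.Random(1)

# Synthetic heights in cm; the parameters are assumed for illustration only.
records = [("men", rng.gauss(178.0, 7.0)) for _ in range(2000)]
records += [("women", rng.gauss(165.0, 7.0)) for _ in range(2000)]


def describe(values):
    """Return the (mean, sample standard deviation) of a list of numbers."""
    return statistics.fmean(values), statistics.stdev(values)


# Descriptive statistics for the whole population, ignoring group labels.
pooled_mean, pooled_sd = describe([height for _, height in records])

# Descriptive statistics for each subpopulation separately.
by_group = {}
for label, height in records:
    by_group.setdefault(label, []).append(height)
group_stats = {label: describe(values) for label, values in by_group.items()}

print(f"pooled: mean={pooled_mean:.1f}, sd={pooled_sd:.1f}")
for label, (mean, sd) in group_stats.items():
    print(f"{label}: mean={mean:.1f}, sd={sd:.1f}")
```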
Populations consisting of subpopulations can be modeled by mixture models, which combine the distributions within subpopulations into an overall population distribution. Even if each subpopulation is well modeled by a given simple model, the overall population may be poorly fit by that model; a poor fit may therefore be evidence for the existence of subpopulations. For example, given two equal-sized subpopulations, both normally distributed, if they have the same standard deviation but different means, the overall distribution will exhibit low kurtosis relative to a single normal distribution, because the means of the subpopulations fall on the shoulders of the overall distribution. If the means are sufficiently separated (in this equal-sized case, by more than two standard deviations), the overall distribution is bimodal; otherwise, it simply has a wide peak. Further, it will exhibit overdispersion relative to a single normal distribution with the subpopulations' common standard deviation. Alternatively, given two subpopulations with the same mean but different standard deviations, the overall population will exhibit high kurtosis, with a sharper peak and heavier tails (and correspondingly shallower shoulders) than a single normal distribution.
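The two kurtosis claims can be checked with a short calculation. The sketch below computes the excess kurtosis (kurtosis minus 3, so that a single normal distribution scores 0) of an equal-weight mixture of two normal distributions from the standard moment formulas. The function name `mixture_excess_kurtosis` and the parameter values are assumptions chosen for illustration.

```python
def mixture_excess_kurtosis(mu1, sigma1, mu2, sigma2):
    """Excess kurtosis of an equal-weight mixture of two normal distributions."""
    mean = 0.5 * (mu1 + mu2)
    m2 = 0.0  # second central moment (variance) of the mixture
    m4 = 0.0  # fourth central moment of the mixture
    for mu, sigma in ((mu1, sigma1), (mu2, sigma2)):
        d = mu - mean
        # For X ~ N(mu, sigma^2) and d = mu - mean:
        #   E[(X - mean)^2] = sigma^2 + d^2
        #   E[(X - mean)^4] = 3 sigma^4 + 6 sigma^2 d^2 + d^4
        m2 += 0.5 * (sigma**2 + d**2)
        m4 += 0.5 * (3 * sigma**4 + 6 * sigma**2 * d**2 + d**4)
    return m4 / m2**2 - 3.0


# Same standard deviation, different means: negative excess kurtosis.
same_sd = mixture_excess_kurtosis(-1.0, 1.0, 1.0, 1.0)

# Same mean, different standard deviations: positive excess kurtosis.
same_mean = mixture_excess_kurtosis(0.0, 1.0, 0.0, 3.0)

# Two identical components reduce to a single normal distribution.
single = mixture_excess_kurtosis(0.0, 1.0, 0.0, 1.0)

print(same_sd, same_mean, single)
```

In the first case the mixture variance is 2, larger than the within-subpopulation variance of 1, which is the overdispersion described above.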