Enago Academy

Unraveling Research Population and Sample: Understanding their role in statistical inference

Research population and sample serve as the cornerstones of any scientific inquiry. Understanding the relationship between the research population and the sample is crucial for researchers because it underpins the validity, reliability, and generalizability of their findings. In this article, we examine the role of the research population and sample, their differences, and their importance in shaping our understanding of complex phenomena. Ultimately, this empowers researchers to draw informed conclusions and drive meaningful advancements in their respective fields.

What Is Population?

The research population, also known as the target population, refers to the entire group or set of individuals, objects, or events that possess specific characteristics and are of interest to the researcher. It represents the larger population from which a sample is drawn. The research population is defined based on the research objectives and the specific parameters or attributes under investigation. For example, in a study on the effects of a new drug, the research population would encompass all individuals who could potentially benefit from or be affected by the medication.

When Is Data Collection From a Population Preferred?

In certain scenarios where a comprehensive understanding of the entire group is required, it becomes necessary to collect data from a population. Here are a few situations when one prefers to collect data from a population:

1. Small or Accessible Population

When the research population is small or easily accessible, it may be feasible to collect data from the entire population. This is often the case in studies conducted within specific organizations, small communities, or well-defined groups where the population size is manageable.

2. Census or Complete Enumeration

In some cases, such as government surveys or official statistics, a census or complete enumeration of the population is necessary. This approach aims to gather data from every individual or entity within the population. This is typically done to ensure accurate representation and eliminate sampling errors.

3. Unique or Critical Characteristics

If the research focuses on a specific characteristic or trait that is rare and critical to the study, collecting data from the entire population may be necessary. This could be the case in studies related to rare diseases, endangered species, or specific genetic markers.

4. Legal or Regulatory Requirements

Certain legal or regulatory frameworks may require data collection from the entire population. For instance, government agencies might need comprehensive data on income levels, demographic characteristics, or healthcare utilization for policy-making or resource allocation purposes.

5. Precision or Accuracy Requirements

In situations where a high level of precision or accuracy is necessary, researchers may opt for population-level data collection. By doing so, they mitigate the potential for sampling error and obtain more reliable estimates of population parameters.

What Is a Sample?

A sample is a subset of the research population that is carefully selected to represent its characteristics. Researchers study this smaller, manageable group to draw inferences that they can generalize to the larger population. The selection of the sample must be conducted in a manner that ensures it accurately reflects the diversity and pertinent attributes of the research population. By studying a sample, researchers can gather data more efficiently and cost-effectively compared to studying the entire population. The findings from the sample are then extrapolated to make conclusions about the larger research population.

What Is Sampling and Why Is It Important?

Sampling refers to the process of selecting a sample from a larger group or population of interest in order to gather data and make inferences. The goal of sampling is to obtain a sample that is representative of the population, meaning that the sample accurately reflects the key attributes, variations, and proportions present in the population. By studying the sample, researchers can draw conclusions or make predictions about the larger population with a certain level of confidence.
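To illustrate what drawing conclusions "with a certain level of confidence" can look like in practice, here is a minimal Python sketch. It assumes a simulated population of scores and a simple random sample, and it uses a normal-approximation 95% confidence interval for the mean. The population values, the sample size of 200, and the function name `mean_with_ci` are illustrative assumptions, not part of any particular study.

```python
import math
import random
import statistics


def mean_with_ci(sample, z=1.96):
    """Return (mean, lower, upper) for a normal-approximation interval.

    z=1.96 corresponds to roughly 95% confidence.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean, using the sample standard deviation.
    se = statistics.stdev(sample) / math.sqrt(n)
    return mean, mean - z * se, mean + z * se


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated scores.
population = [rng.gauss(70, 10) for _ in range(10_000)]

# Simple random sample of 200 units, drawn without replacement.
sample = rng.sample(population, 200)

mean, lower, upper = mean_with_ci(sample)
print(f"sample mean = {mean:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
print(f"population mean = {statistics.fmean(population):.2f}")
```

The interval describes the uncertainty that comes from observing only 200 of the 10,000 units. It does not correct for a sample that was selected in a biased way.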

Collecting data from a sample, rather than the entire population, offers several advantages and is often necessary due to practical constraints. Here are some reasons to collect data from a sample:

1. Cost and Resource Efficiency

Collecting data from an entire population can be expensive and time-consuming. Sampling allows researchers to gather information from a smaller subset of the population, reducing costs and resource requirements. It is often more practical and feasible to collect data from a sample, especially when the population size is large or geographically dispersed.

2. Time Constraints

Conducting research with a sample allows for quicker data collection and analysis compared to studying the entire population. It saves time by focusing efforts on a smaller group, enabling researchers to obtain results more efficiently. This is particularly beneficial in time-sensitive research projects or situations that necessitate prompt decision-making.

3. Manageable Data Collection

Working with a sample makes data collection more manageable. Researchers can concentrate their efforts on a smaller group, allowing for more detailed and thorough data collection methods. Smaller datasets are also more convenient to store and analyze statistically. This facilitates in-depth insights and a more comprehensive understanding of the research topic.

4. Statistical Inference

Collecting data from a well-selected and representative sample enables valid statistical inference. By using appropriate statistical techniques, researchers can generalize the findings from the sample to the larger population. This allows for meaningful inferences, predictions, and estimation of population parameters, thus providing insights beyond the specific individuals or elements in the sample.

5. Ethical Considerations

In certain cases, collecting data from an entire population may pose ethical challenges, such as invasion of privacy or burdening participants. Sampling helps protect the privacy and well-being of individuals by reducing the burden of data collection. It allows researchers to obtain valuable information while ensuring ethical standards are maintained.

Key Steps Involved in the Sampling Process

Sampling is a valuable tool in research; however, it is important to carefully consider the sampling method, sample size, and potential biases to ensure that the findings accurately represent the larger population and are valid for making conclusions and generalizations. While the specific steps may vary depending on the research context, here is a general outline of the sampling process:

1. Define the Population

Clearly define the target population for your research study. The population should encompass the group of individuals, elements, or units that you want to draw conclusions about.

2. Define the Sampling Frame

Create a sampling frame, which is a list or representation of the individuals or elements in the target population. The sampling frame should be comprehensive and accurately reflect the population you want to study.

3. Determine the Sampling Method

Select an appropriate sampling method based on your research objectives, available resources, and the characteristics of the population. You can sample using either probability-based or non-probability-based techniques. Common sampling methods include random sampling, stratified sampling, cluster sampling, and convenience sampling.
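As a rough illustration of the three probability-based methods named above, the following Python sketch draws a simple random sample, a proportionate stratified sample, and a one-stage cluster sample from the same sampling frame. The frame of 1,000 students, the field names (`school`, `grade`), and the function names are all hypothetical and chosen only for this example.

```python
import random
from collections import defaultdict


def simple_random_sample(frame, n, rng):
    """Every unit has the same chance of selection."""
    return rng.sample(frame, n)


def stratified_sample(frame, key, fraction, rng):
    """Sample the same fraction from each stratum (proportionate allocation)."""
    strata = defaultdict(list)
    for unit in frame:
        strata[key(unit)].append(unit)
    chosen = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        chosen.extend(rng.sample(members, k))
    return chosen


def cluster_sample(frame, key, n_clusters, rng):
    """Pick whole clusters at random and keep every unit inside them."""
    clusters = defaultdict(list)
    for unit in frame:
        clusters[key(unit)].append(unit)
    picked = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in picked for unit in clusters[name]]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: 1,000 students in 10 schools, grades 9 to 12.
frame = [
    {"id": i, "school": f"S{i % 10}", "grade": 9 + i % 4}
    for i in range(1000)
]

srs = simple_random_sample(frame, 100, rng)
strat = stratified_sample(frame, lambda u: u["grade"], 0.1, rng)
clus = cluster_sample(frame, lambda u: u["school"], 2, rng)

print(len(srs), len(strat), len(clus))
```

Convenience sampling is left out because it involves no random selection step: the researcher simply takes whichever units are easiest to reach.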

4. Determine Sample Size

Determine the desired sample size based on statistical considerations, such as the level of precision required, desired confidence level, and expected variability within the population. Larger sample sizes generally reduce sampling error but may be constrained by practical limitations.
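One common way to turn these considerations into a number is the standard formula for estimating a proportion, n0 = z² · p(1 − p) / e², optionally followed by a finite population correction. The Python sketch below assumes that setting. The function name, the default 95% confidence level (z = 1.96), and the conservative choice p = 0.5 are assumptions made for illustration, and other study designs call for different formulas.

```python
import math


def sample_size_for_proportion(margin, z=1.96, p=0.5, population=None):
    """Sample size needed to estimate a proportion within +/- margin.

    margin     -- desired margin of error (e.g. 0.05 for 5 percentage points)
    z          -- z-score for the confidence level (1.96 is about 95%)
    p          -- expected proportion; 0.5 gives the most conservative size
    population -- if given, apply the finite population correction
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


# Large (effectively infinite) population, 5% margin of error.
print(sample_size_for_proportion(0.05))

# Same precision when the whole population is only 2,000 units.
print(sample_size_for_proportion(0.05, population=2000))
```

The second call returns a smaller number than the first, which reflects the point above: the required sample size depends on the precision wanted and on the population's variability, not only on how large the population is.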

5. Collect Data

Once the sample is selected using the appropriate technique, collect the necessary data according to the research design and data collection methods. Ensure that you use a standardized and consistent data collection process that is appropriate for your research objectives.

6. Analyze the Data

Perform the necessary statistical analyses on the collected data to derive meaningful insights. Use appropriate statistical techniques to make inferences, estimate population parameters, test hypotheses, or identify patterns and relationships within the data.

Population vs Sample — Differences and examples

While the population provides a comprehensive overview of the entire group under study, the sample allows researchers to draw inferences and make generalizations about that population. Researchers should employ careful sampling techniques to ensure that the sample is representative and accurately reflects the characteristics and variability of the population.

Research Study: Investigating the prevalence of stress among high school students in a specific city and its impact on academic performance.

Population: All high school students in a particular city

Sampling Frame: The sampling frame would involve obtaining a comprehensive list of all high schools in the specific city. A random selection of schools would be made from this list to ensure representation from different areas and demographics of the city.

Sample: Randomly selected 500 high school students from different schools in the city

The sample represents a subset of the entire population of high school students in the city.

Research Study: Assessing the effectiveness of a new medication in managing symptoms and improving quality of life in patients with a specific medical condition.

Population: Patients diagnosed with a specific medical condition

Sampling Frame: The sampling frame for this study would involve accessing medical records or databases that include information on patients diagnosed with the specific medical condition. Researchers would select a convenience sample of patients who meet the inclusion criteria from the sampling frame.

Sample: Convenience sample of 100 patients from a local clinic who meet the inclusion criteria for the study

The sample consists of patients from the larger population of individuals diagnosed with the medical condition.

Research Study: Investigating community perceptions of safety and satisfaction with local amenities in the neighborhood.

Population: Residents of a specific neighborhood

Sampling Frame: The sampling frame for this study would involve obtaining a list of residential addresses within the specific neighborhood. Various sources such as census data, voter registration records, or community databases offer the means to obtain this information. From the sampling frame, researchers would randomly select a cluster sample of households to ensure representation from different areas within the neighborhood.

Sample: Cluster sample of 50 households randomly selected from different blocks within the neighborhood

The sample represents a subset of the entire population of residents living in the neighborhood.

To summarize, sampling allows for cost-effective data collection, easier statistical analysis, and increased practicality compared to studying the entire population. However, despite these advantages, sampling is subject to various challenges. These challenges include sampling bias, non-response bias, and the potential for sampling errors.

To minimize bias and enhance the validity of research findings, researchers should employ appropriate sampling techniques, clearly define the population, establish a comprehensive sampling frame, and monitor the sampling process for potential biases. Validating findings by comparing them to known population characteristics can also help evaluate the generalizability of the results. Properly understanding and implementing sampling techniques ensures that research findings are accurate, reliable, and representative of the larger population. By carefully considering the choice of population and sample, researchers can draw meaningful conclusions and, consequently, make valuable contributions to their respective fields of study.

Now, it’s your turn! Take a moment to think about a research question that interests you. Consider the population that would be relevant to your inquiry. Who would you include in your sample? How would you go about selecting them? Reflecting on these aspects will help you appreciate the intricacies involved in designing a research study. Let us know about it in the comment section below or reach out to us using #AskEnago and tag @EnagoAcademy on Twitter, Facebook, and Quora.

Research Population

All research questions address issues that are of great relevance to important groups of individuals known as a research population.

A research population is generally a large collection of individuals or objects that is the main focus of a scientific query. It is for the benefit of the population that research is done. However, because populations are often large, researchers usually cannot test every individual in the population: doing so is too expensive and time-consuming. This is the reason why researchers rely on sampling techniques.

A research population is also known as a well-defined collection of individuals or objects known to have similar characteristics. All individuals or objects within a certain population usually have a common, binding characteristic or trait.

Usually, the description of the population and the common binding characteristic of its members are the same. "Government officials" is a well-defined group of individuals which can be considered as a population and all the members of this population are indeed officials of the government.

Relationship of Sample and Population in Research

A sample is simply a subset of the population. The concept of a sample arises from the inability of researchers to test all the individuals in a given population. The sample must be representative of the population from which it was drawn, and it must be large enough to warrant statistical analysis.

The main function of the sample is to allow researchers to conduct the study on individuals from the population so that the results can be used to derive conclusions that apply to the entire population. It is much like a give-and-take process. The population “gives” the sample, and then it “takes” conclusions from the results obtained from the sample.

Two Types of Population in Research

Target Population

Target population refers to the ENTIRE group of individuals or objects to which researchers are interested in generalizing the conclusions. The target population usually has varying characteristics and it is also known as the theoretical population.

Accessible Population

The accessible population is the population in research to which the researchers can apply their conclusions. This population is a subset of the target population and is also known as the study population. It is from the accessible population that researchers draw their samples.

Explorable.com (Nov 15, 2009). Research Population. Retrieved Apr 21, 2024 from Explorable.com: https://explorable.com/research-population

You Are Allowed To Copy The Text

The text in this article is licensed under the Creative Commons-License Attribution 4.0 International (CC BY 4.0).

This means you're free to copy, share and adapt any parts (or all) of the text in the article, as long as you give appropriate credit and provide a link/reference to this page.

That is it. You don't need our permission to copy the article; just include a link/reference back to this page. You can use it freely (with some kind of link), and we're also okay with people reprinting in publications like books, blogs, newsletters, course-material, papers, Wikipedia and presentations (with clear attribution).

Study Population

  • Reference work entry
  • pp 6412–6414
Study population is a subset of the target population from which the sample is actually selected. It is broader than the concept of the sample frame. It may be appropriate to say that the sample frame is an operationalized form of the study population. For example, suppose that a study is going to conduct a survey of high school students on their social well-being. High school students all over the world might be considered the target population. Because of practicalities, researchers decide to recruit only high school students studying in China, who are the study population in this example. If a list of high school students in China exists, that list serves as the sample frame.

Description

Study population is the operational definition of target population (Henry, 1990; Bickman & Rog, 1998). Researchers are seldom in a position to study the entire target population, which is not always readily accessible. Instead, only part of it is recruited: respondents who are both eligible for the study and available. In some cases, a list of elements of the target population simply does not exist for sampling purposes. Even where the list does exist, it is usually somewhat incomplete. For example, not all migrant workers register themselves with the government. Still, part of the population of research interest may be omitted for other practical considerations. For example:

“National polling firms may limit their national samples to the 48 adjacent states, omitting Alaska and Hawaii for practical reasons.” “A researcher wishing to sample psychology professors may limit the study population to those in psychology departments, omitting those in other departments.” (Babbie, 2010:199)

The study population should be defined before an investigation begins. Researchers need to define the characteristics of individuals who will be selected for the study, to avoid ambiguity and confusion. Researchers also need to make these definitions clear to other researchers and the public. There are several reasons for this. First, the academic community and the public must know to what kinds of people the research findings apply; second, knowledge of the study population helps other researchers assess the study’s merit and appropriateness; third, details of the methods and procedures are necessary for other researchers to be able to replicate or expand the study (Friedman, Furberg, & DeMets, 2010).

The required information to answer research questions is obtained from the study population (Kumar, 2011). Similar to the process of narrowing the research problem, a specific and clear framework for the study population is developed, enabling the selection of appropriate respondents. Moreover, “it is only through making your procedures explicit that you can validly describe, explain, verify and test.” (Kumar, 2011:57) Kumar (2011) also points out that in quantitative research both the research problem and the study population need to be narrowed and made as specific as possible. In qualitative research, however, the two should remain loose and flexible to ensure that varied and rich data are obtained.

Defining the study population has received particular attention in medical and clinical research (Friedman et al., 2010; Gerrish & Lacey, 2010; Riegelman, 2005). The characteristics of those being studied are defined by inclusion criteria and exclusion criteria. Inclusion criteria identify the types of individuals who should be included in the study and must be present for an individual to be eligible to take part. Exclusion criteria identify characteristics that make individuals ineligible for the study even if they meet the inclusion criteria. Without exclusion criteria, the situation may be more complicated and the interpretation of the research findings more difficult. Here is an example from Riegelman (2005):

“An investigator wanted to study the effect of a new therapy for breast cancer. He selected all available breast cancer patients and found that the treatment, on average, resulted in no improvement in outcome. Later research revealed that the therapy provided a substantial improvement in outcome for women with stage III breast cancer. The therapy, however, was shown to have no benefit if women with breast cancer had undergone previous radiation therapy.”

If stage III breast cancer were required as an inclusion criterion and previous radiation therapy as an exclusion criterion in this investigation, the results would be very different.
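The way inclusion and exclusion criteria narrow a group can be sketched in a few lines of Python. The example below mirrors the breast cancer scenario above, with stage III disease as the inclusion criterion and previous radiation therapy as the exclusion criterion. The `Patient` record, its field names, and the four patients are hypothetical and serve only to show the filtering logic.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: int
    stage: str             # cancer stage, e.g. "I", "II", "III"
    prior_radiation: bool  # True if the patient had previous radiation therapy


def is_eligible(patient):
    """Eligible = meets the inclusion criterion and no exclusion criterion."""
    meets_inclusion = patient.stage == "III"
    meets_exclusion = patient.prior_radiation
    return meets_inclusion and not meets_exclusion


# Hypothetical records standing in for "all available breast cancer patients".
patients = [
    Patient(1, "III", False),
    Patient(2, "III", True),   # excluded: prior radiation therapy
    Patient(3, "II", False),   # not included: wrong stage
    Patient(4, "I", True),
]

study_population = [p for p in patients if is_eligible(p)]
print([p.patient_id for p in study_population])
```

Note that patient 2 meets the inclusion criterion but is still removed, which is exactly the role an exclusion criterion plays.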

Friedman et al. (2010) proposed a framework to develop individual eligibility criteria. For example, those who have the potential to benefit from the clinical intervention obviously should be eligible for the study; any person for whom the intervention is known to be harmful should not be enrolled into the study. Although the framework is heavily based in medical science, it might be informative and interesting to researchers of other fields.

Inclusion criteria and exclusion criteria narrow the group being studied. As a result, the study population may or may not reflect the target population to whom investigators wish to apply the research findings. It is thus very important to understand the gap between the study population and the target population and to make it explicit and clear to the readers. Moreover, because the sample may not be fully representative of the study population, careful consideration is required when generalizing from the participants actually studied to the study population and then to the target population. For characteristics such as age, sex, or weight, which can be clearly stated and measured, it is relatively easy to specify how the study sample and study population differ from the target population and therefore to decide the appropriateness of generalizing the research findings. For other factors of the study participants, it may be difficult to make appropriate compensatory adjustments in the analysis. For example, are volunteers who agree to participate in the study different from those who do not and in what ways? If there are some specific factors that motivate the participants, how do they affect the representativeness of the study sample?

Study population is an important concept in survey research, clinical trials, and other special designs or experiments. Defining the study population is indispensable to posing and narrowing the research question, obtaining required information to address the research question, verifying and testing hypotheses, and applying the research findings. Usually, there is a gap between the study population and the target population. The inclusion and exclusion criteria developed to obtain the study population should always be made as specific, explicit, and clear as conditions allow. Caution is needed when generalizing the research findings from study participants (the sample selected from the study population) to the study population and then to the target population. More research needs to be done to help us understand how the process of defining and operationalizing the study population may affect coverage error, enrolment of study participants, sampling error, and generalizability of the research findings.

Cross-References

Sample Frame

Sampling Error

Social Well-Being

Survey Research

Babbie, E. R. (2010). The practice of social research. Belmont, CA: Wadsworth Publishing Company.

Bickman, L., & Rog, D. J. (1998). Handbook of applied social research methods. Thousand Oaks, CA: Sage Publications.

Friedman, L. M., Furberg, C. D., & DeMets, D. L. (2010). Fundamentals of clinical trials. New York: Springer.

Gerrish, K., & Lacey, A. (2010). The research process in nursing. West Sussex: Wiley-Blackwell.

Henry, G. T. (1990). Practical sampling. Newbury Park, CA: Sage Publications.

Kumar, R. (2011). Research methodology: A step-by-step guide for beginners. London: Sage Publications Limited.

Riegelman, R. K. (2005). Studying a study and testing a test: How to read the medical evidence. Philadelphia: Lippincott Williams & Wilkins.

Author information

Authors and affiliations.

Sociology Department, National University of Singapore, 11 Arts Link, 117570, Singapore, Singapore

Corresponding author

Correspondence to Shu Hu.

Editor information

Editors and affiliations.

University of Northern British Columbia, Prince George, BC, Canada

Alex C. Michalos

(residence), Brandon, MB, Canada

Copyright information

© 2014 Springer Science+Business Media Dordrecht

About this entry

Cite this entry.

Hu, S. (2014). Study Population. In: Michalos, A.C. (eds) Encyclopedia of Quality of Life and Well-Being Research. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-0753-5_2893

DOI: https://doi.org/10.1007/978-94-007-0753-5_2893

Publisher Name: Springer, Dordrecht

Print ISBN: 978-94-007-0752-8

Online ISBN: 978-94-007-0753-5

eBook Packages: Humanities, Social Sciences and Law

Module 1 - Population Health

Part 1 - asking questions and generating evidence.

Defining Populations

We can begin by defining a population as a collection of individuals who share at least one common or organizing characteristic. While this definition is broad, it retains the flexibility to define populations in several ways depending upon the public health question of interest. How we define populations for study affects analysis, interpretation, and generalizability of results.

When studying population health, it is useful to define study populations based on eligibility criteria, i.e., the characteristics of individuals that make them appropriate for an epidemiologic study. 

There are three main categories that are useful for defining eligibility for a study population:

Public health questions often focus on specific geographic areas of varying size (village, city, county, state, country) over a specific period of time. People living in a specific location may have many common characteristics that might influence health, including climate, environmental exposures, culture, socioeconomic factors, nutrition, etc. Individuals born during the same period of time (birth cohorts) are often found to have a similar course with respect to health outcomes, and different birth cohorts may have dissimilar health outcomes. Since people frequently move from one place to another, geographically defined cohorts can be dynamic, with people moving in or moving out. Obviously, living within a given geographic area is the primary criterion for membership in the population. Given the dynamic nature of these studies, it is sometimes useful to think of the population as being composed not of people but of individual lengths of "person-time" during which each individual met the eligibility criteria. For example, consider a study population focusing on health issues in Woburn, MA from 1970-1980. An individual who moved from Los Angeles to Woburn in 1975 and then moved back to LA two years later would only have contributed 2 person-years of information to the overall study.

If one were interested in studying the health outcomes of newborn infants based on their birth weight, the study population would logically be composed of neonates and would not necessarily focus narrowly on geography or year of birth. Similarly, the study population might be defined by an event such as the attacks on the World Trade Center and the health consequences among responders to that event. These two examples illustrate relatively stationary populations, but populations defined in this way can be dynamic, such as a study of 70-80 year-olds. During a longitudinal study, new subjects would continually become eligible, while others would become ineligible by virtue of exceeding the age limit or by dying.
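The person-time idea can be made concrete with a small Python sketch. It counts only the overlap between an individual's period of residence and the study window, using whole calendar years as in the Woburn example. The function name `person_years` and the year-level granularity are simplifying assumptions, since real studies often track person-time in days or months.

```python
def person_years(entry_year, exit_year, study_start, study_end):
    """Years during which a person was both resident and under study.

    Only the overlap between [entry_year, exit_year] and
    [study_start, study_end] counts toward person-time.
    """
    start = max(entry_year, study_start)
    end = min(exit_year, study_end)
    return max(0, end - start)


# Woburn study window from the text: 1970 to 1980.
# Moved to Woburn in 1975 and left two years later.
print(person_years(1975, 1977, 1970, 1980))

# Hypothetical lifelong resident: contributes the full window.
print(person_years(1950, 1995, 1970, 1980))

# Hypothetical resident who left before the study began: contributes nothing.
print(person_years(1950, 1965, 1970, 1980))
```

Summing these values over everyone who ever met the eligibility criteria gives the total person-time for a dynamic population.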

The study population might also be defined based on the likelihood of achieving a successful study. For example, in 1981 the Physicians' Health Study invited all 261,248 male physicians between 40 and 84 years of age who lived in the United States and who were registered with the American Medical Association to participate in a randomized clinical trial to test the efficacy of low-dose aspirin and beta carotene in the primary prevention of cardiovascular disease and cancer. Almost half responded to the invitation, but there were also a number of other eligibility criteria and 26,062 were told they could not participate because of a prior history of myocardial infarction, stroke, cancer, or other excluding criteria.

The 33,223 who were eligible and willing were enrolled in a "run-in" phase during which all received active aspirin and placebo beta-carotene. After 18 weeks, participants were asked about their health status, side effects, compliance, and willingness to continue in the trial, and over 11,000 decided not to participate.

The remaining 22,071 physicians were then randomized to one of the four treatment arms of the study. Physicians were chosen because they could provide reliable information on questionnaires, and they would be easier to follow, particularly since they were all registered physicians. Restriction to those between 40-84 years old ensured a population at higher risk of having one of the outcomes of interest, and women were excluded because there were so few female physicians in that age group in 1981.

Finally, the run-in phase narrowed the population even further to the subset of physicians who were most likely to be able and willing to comply with the regimen over time. So, there were multiple eligibility criteria that enhanced the likelihood of a study that would successfully answer the questions being addressed.
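The narrowing described above can be checked with simple arithmetic. The sketch below uses only the counts quoted in the text; the variable names are illustrative.

```python
# Counts quoted in the text for the Physicians' Health Study.
invited = 261_248      # male physicians invited in 1981
run_in = 33_223        # eligible and willing, enrolled in the run-in phase
randomized = 22_071    # randomized after the run-in phase

# Participants who did not continue past the run-in phase.
dropped_in_run_in = run_in - randomized

# Shares of the starting groups that reached randomization.
share_of_invited = randomized / invited
share_of_run_in = randomized / run_in

print(dropped_in_run_in)           # 11152, consistent with "over 11,000"
print(round(share_of_invited, 3))  # 0.084
print(round(share_of_run_in, 3))   # 0.664
```

By these figures, roughly 8 percent of the invited physicians were ultimately randomized, which illustrates how strongly successive eligibility criteria can narrow a study population.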

An individual may meet the eligibility criteria to be included in a population at one point in time, but not at another. Populations with individuals moving in and out of eligibility are termed dynamic in contrast to stationary or fixed populations.

A population of homeless people would be considered very dynamic, and it would be difficult to conduct a longitudinal follow-up study of its members. In contrast, workers who dealt with the aftermath of the attacks on the World Trade Center (a population defined by event) would be considered a stationary or fixed population, because they had experienced the defining event and would be considered members of that cohort until they died, even if they moved elsewhere. The distinction between dynamic and stationary populations is not strict, but it is something that should be considered when designing a study. When studying relatively dynamic populations, consideration should be given to basing data collection on the "person-time" contributed by individuals while they were eligible. This will be discussed in greater detail in the module on measuring the frequency of health events.

When studying a population, it would be ideal to have all of the information we wanted from all members of the population. However, this is rarely possible because of the time and resources that would be required to collect the information needed. Because of this we commonly take samples that are representative of the population of interest and study them in a way that enables us to make valid inferences about the population from which they were drawn. In order to obtain accurate answers to the questions being addressed and achieve the research goals it is essential to:

  • Define the population of interest and
  • Define the question to be answered

These requirements go hand-in-hand, because selection of an appropriate study population is dependent upon the question being addressed. Sometimes the study population seems obvious given the research question, but the appropriate study population may be broader than it first appears. For example, we saw previously that a study of the causes of hypertension could be conducted among male civil servants in London by comparing the characteristics of people with hypertension to those without it. However, a more complete understanding might be achieved by broadening the study population to include additional populations. When suspicions of an unusually high frequency of leukemia and other diseases arose in Woburn, MA in the late 1970s, one avenue of study would be to designate Woburn as the population of interest and compare the characteristics of diseased residents to those of non-diseased residents. However, this by itself would omit other important comparisons. For example, how did the frequency of leukemia and other diseases in Woburn compare to that observed in Massachusetts in general? Or to the frequency observed across the United States? And how did environmental conditions in Woburn differ from those in other locations?


Content ©2019. All Rights Reserved. Date last modified: May 16, 2019. Wayne W. LaMorte, MD, PhD, MPH

  • v.90(4); 2012 Dec

Who and What Is a “Population”? Historical Debates, Current Controversies, and Implications for Understanding “Population Health” and Rectifying Health Inequities

The idea of “population” is core to the population sciences but is rarely defined except in statistical terms. Yet who and what defines and makes a population has everything to do with whether population means are meaningful or meaningless, with profound implications for work on population health and health inequities.

In this article, I review the current conventional definitions of, and historical debates over, the meaning(s) of “population,” trace back the contemporary emphasis on populations as statistical rather than substantive entities to Adolphe Quetelet's powerful astronomical metaphor, conceived in the 1830s, of l’homme moyen (the average man), and argue for an alternative definition of populations as relational beings. As informed by the ecosocial theory of disease distribution, I then analyze several case examples to explore the utility of critical population-informed thinking for research, knowledge, and policy involving population health and health inequities.

Four propositions emerge: (1) the meaningfulness of means depends on how meaningfully the populations are defined in relation to the inherent intrinsic and extrinsic dynamic generative relationships by which they are constituted; (2) structured chance drives population distributions of health and entails conceptualizing health and disease, including biomarkers, as embodied phenotype and health inequities as historically contingent; (3) persons included in population health research are study participants, and the casual equation of this term with “study population” should be avoided; and (4) the conventional cleavage of “internal validity” and “generalizability” is misleading, since a meaningful choice of study participants must be in relation to the range of exposures experienced (or not) in the real-world societies, that is, meaningful populations, of which they are a part.

Conclusions

To improve conceptual clarity, causal inference, and action to promote health equity, population sciences need to expand and deepen their theorizing about who and what makes populations and their means.

Population sciences, whether focused on people or the plenitude of other species with which we inhabit this world, rely on a remarkable, almost alchemical, feat that nevertheless now passes as commonplace: creating causal and actionable knowledge via the transmutation of data from unique individuals into population distributions, dynamics, and rates. In the case of public health, a comparison of population data—especially rates and averages of traits—sets the basis for not only elucidating etiology but also identifying and addressing health, health care, and health policy inequities manifested in differential outcomes caused by social injustice ( Davis and Rowland 1983 ; Irwin et al. 2006 ; Krieger 2001 , 2011 ; Svensson 1990 ; Whitehead 1992 ; WHO 2008 , 2011 ).

But who are these “populations,” and why should their means be meaningful? Might some instead be meaningless, the equivalent of fool's gold or, worse, dangerously misleading?

Because “population” is such a fundamental term for so many sciences that analyze population data—for example, epidemiology, demography, sociology, ecology, and population biology and population genetics, not to mention statistics and biostatistics (see, e.g., Desrosières 1998 ; Gaziano 2010 ; Greenhalgh 1996 ; Hey 2011 ; Kunitz 2007 ; Mayr 1988 ; Pearce 1999 ; Porter 1986 ; Ramsden 2002 ; Stigler 1986 ; Weiss and Long 2009 )—presumably it would be reasonable to posit that the meaning of “population” is clear-cut and needs no further discussion.

As I document in this article, the surprise instead is that although the idea of “population” is core to the population sciences, it is rarely defined, especially in sciences dealing with people, except in abstract statistical terms. Granted, the “fuzziness” of concepts sometimes can be useful, especially when their empirical content is still being worked out, as illustrated by the well-documented contested history of the meanings of the “gene” as variously an abstract, functional, or physical entity, extending from before and still continuing well after the mid-twentieth-century discovery of DNA ( Burian and Zallen 2009 ; Falk 2000 ; Keller 2000 ; Morange 2001 ). Nevertheless, such fuzziness can also be a major problem, especially if the lack of clear definition or a conflation of meanings distorts causal analysis and accountability.

In this article, I accordingly call for expanding and deepening what I term “critical population-informed thinking.” Such thinking is needed to reckon with, among other things, claims of “population-based” evidence, principles for comparing results across “populations” (and their “subpopulations”), terminology regarding “study participants” (vs. “study population”), and assessing the validity (and not just the generalizability) of results. Addressing these issues requires clearly differentiating between (1) the dominant view that populations are (statistical) entities composed of component parts defined by innate attributes and (2) the alternative that I describe, in which populations are dynamic beings constituted by intrinsic relationships both among their members and with the other populations that together produce their existence and make meaningful causal inference possible.

To make my case, I review current conventional definitions of, and historical debates over, the meaning(s) of “population” and then offer case examples involving population health and health inequities. Informing my argument is the ecosocial theory of disease distribution and its focus on how people literally biologically embody their societal and ecological context, at multiple levels, across the life course and historical generations ( Krieger 1994 , 2001 , 2011 ), thereby producing population patterns of health, disease, and well-being.

Who and What Is a Population?

Conventional definitions.

Who and what determines who and what counts as a “population”? Table 1 lists conventional definitions culled from several contemporary scholarly reference texts. As quickly becomes apparent, the meaning of this term has expanded over time to embrace a variety of concepts. Tracing its etymology to the word's Latin roots, the Oxford English Dictionary ( OED 2010 ), for example, notes that “population” originally referred to the people living in (i.e., populating) a particular place, and this remains its primary meaning. Even so, as the OED 's definitions also make clear, “population” has come to acquire a technical meaning. In statistics, it refers to “a (real or hypothetical) totality of objects or individuals under consideration, of which the statistical attributes may be estimated by the study of a sample or samples drawn from it.” In genetics (or, really, biology more broadly), the OED defines “population” as “a group of animals, plants, or humans, within which breeding occurs.” Likewise, atoms, subatomic particles, stars, and other “celestial objects” are stated as sharing certain properties allowing them to be classed together in “populations” (even though the study of inanimate objects typically falls outside the purview of the “population sciences”).

Table 1. Definitions of “Population” from Scholarly Reference Texts

Mirroring the OED 's definitions are those provided in diverse “population sciences” dictionaries and encyclopedias. Four such texts, whose definitions are echoed in key works in population health ( Evans, Barer, and Marmor 1994 ; Rose 1992 , 2008 ; Rothman, Greenland, and Lash 2008 ; Young 2005 ), are worth noting: A Dictionary of Epidemiology ( Porta 2008 ), A Dictionary of Sociology ( Scott and Marshall 2005 ), and the two entries from the International Encyclopedia of the Social & Behavioral Sciences that offer a definition of “population,” one focused on “human evolutionary genetics” ( Mountain 2001 ) and the other on “generalization: conceptions in the social sciences” ( Cook 2001 ). A fifth resource, the Encyclopedia of Life Sciences , interestingly does not include any articles specifically on defining “population.” However, of the 396 entries located with the search term “population” and sorted by “relevance,” the first 25 focus on populations principally in relation to genetics, reproduction, and natural selection ( Clarke et al. 2000 –2011).

Among these four texts, all germane to population sciences that study people, the first two briefly define “population” in relation to inhabitants of an area but notably remain mum on the myriad populations appearing in the public health literature not linked to geographic locale (e.g., the “elderly population,” the “white population,” or the “lesbian/gay/bisexual/transgender population”). Most of their text is instead devoted to the idea of “population” in relation to statistical sampling ( Porta 2008 ; Scott and Marshall 2005 ). By contrast, the third text invokes biology (with no mention of statistics) and defines a “population” to be a “mating pool” ( Mountain 2001 , 6985), albeit observing that “groups of humans rarely, if ever, meet this definition,” so that “in practice … human evolutionary geneticists delineate populations along linguistic, geographic, socio-political, and/or cultural boundaries. A population might include, for example, all speakers of a particular Bantu language, all inhabitants of a river valley in Italy, or all members of a caste group in India.”

The fourth text avers that in the social sciences, “population” has two meanings: as a theory-dependent hypothetical “construct” (whose basis is not defined) and as an empirically defined “universe” (used as a sampling frame) ( Cook 2001 ). A telling example illustrates that for people, geographical location, nationality, and ancestry need not neatly match, as in the case of an illegal immigrant or a legal citizen of one country legally residing in a different country ( table 1 ). Consequently, apart from specifying that entities comprising a population individually possess some attribute qualifying them to be a member of that population, none of the conventional definitions offers systematic criteria by which to decide, in theoretical or practical terms, who and what is a population, let alone whether and, if so, why their mean value or rate (or any statistical parameter) might have any substantive meaning.

Meet the “Average Man”: Quetelet's 1830s Astronomical Metaphor Amalgamating “Population” and “Statistics”

The overarching emphasis on “populations” as technical statistical entities and the limited discussion as to what defines them, especially for the human populations, is at once remarkable and unsurprising. It is remarkable because “population” stands at the core, conceptually and empirically, of any and all population sciences. It is unsurprising, given the history and politics of how, in the case of people, “population” and “sample” first were joined ( Krieger 2011 ).

In brief, and as recounted by numerous historians of statistics ( Daston 1987 ; Desrosières 1998 ; Hacking 1975 , 1990 ; Porter 1981 , 1986 , 1995 , 2002 , 2003 ; Stigler 1986 , 2002 ; Yeo 2003 ), during the early 1800s the application of quantitative methods and laws of probability to the study of people in Europe took off, a feat that required reckoning with such profound issues as free will, God's will, and human fate. To express the mind shift involved, a particularly powerful metaphor took root: that of “l’homme moyen” (the average man), which, in the convention of the day, included women ( figure 1 ). First used in 1831 in an address given by Adolphe Quetelet (1796–1874), the Belgian astronomer-turned-statistician-turned-sociologist-turned-nosologist ( Hankins 1968 ; Stigler 2002 ), the metaphor gained prominence following the publication in 1835 of Quetelet's enormously influential opus, Sur l’homme et le développement de ses facultés, ou essai de physique sociale ( Quetelet 1835 ). Melding the ideas of essential types, external influences, and random errors, the image of the “average man” solidified a view of populations, particularly human populations, as innately defined by their intrinsic qualities. Revealing these innate qualities, according to Quetelet, were a population's on-average traits, whether pertaining to height and weight, birth and death rates, intellectual faculties, moral properties, or even propensity to commit crime ( Quetelet 1835 , 1844 ).

Figure 1. What is the meaning of means and errors? Adolphe Quetelet (1796–1874) and the astronomical metaphor animating his 1830s “l’homme moyen” (“the average man”).

Source: Illustration of normal curve from Quetelet 1844 .

The metaphor animating Quetelet's “average man” was inspired by his background in astronomy and meteorology. Shifting his gaze from the heavens to the earth, Quetelet arrived at his idea of “the average man” by inverting the standard approach his colleagues used to fix the location of stars, in which the results of observations from multiple observatories (each with some degree of error) were combined to determine a star's most likely celestial coordinates ( Porter 1981 ; Stigler 1986 , 2002 ). Reasoning by analogy, Quetelet ingeniously, if erroneously, argued that the distribution of a population's characteristics served as a guide to its true (inherent) value ( Quetelet 1835 , 1844 ). From this standpoint, the observed “deviations” or “errors” arose from the imperfect variations of individuals, each counting as an “observation-with-error” akin to the data produced by each observatory. The impact of these “errors” was effectively washed out by the law of large numbers. Attesting to the power of metaphor in science and more generally ( Krieger 1994 , 2011 ; Martin and Harré 1982 ; Ziman 2000 ), Quetelet's astronomical “average man” simultaneously enabled a new way to see and study population variation even as it erased a crucial distinction. For a star, the location of the mean referred to the location of a singular real object, whereas for a population, the location of its population mean depended on how the population was defined.
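The astronomical half of the metaphor, in which repeated noisy observations of one real object are averaged, can be illustrated with a small simulation. This is a sketch under stated assumptions: the "true" coordinate, the Gaussian error model, the noise level, and the function name `average_of_noisy_readings` are all hypothetical choices made for illustration.

```python
import random
import statistics

def average_of_noisy_readings(true_value, noise_sd, n, seed=0):
    """Average n simulated readings of one fixed quantity, each with random error."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    readings = [rng.gauss(true_value, noise_sd) for _ in range(n)]
    return statistics.fmean(readings)

true_position = 10.0  # hypothetical coordinate of a single star

few = average_of_noisy_readings(true_position, noise_sd=1.0, n=10)
many = average_of_noisy_readings(true_position, noise_sd=1.0, n=100_000)

# With many readings, the average settles very close to the true value.
print(abs(few - true_position), abs(many - true_position))
```

The simulation works only because `true_position` exists independently of the readings. For a population of people there is no such single underlying object, which is the distinction the metaphor erased.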

To Quetelet, this new conception of population meant that population means, based on sufficiently large samples, could be meaningfully compared to determine if the populations’ essential characteristics truly differed. The contingent causal inference was that if the specified populations differed in their means, this would mean that they either differed in their essence (if subject to the same external forces) or else were subject to different external forces (assuming the same internal essence). Reflecting, however, the growing pressure for nascent social scientists to be seen as “objective,” Quetelet's discussion of external forces steered clear of politics. Concretely, this translated to not challenging mainstream religious or economic beliefs, including the increasingly widespread individualistic philosophies then linked to the rapid ascendance of the liberal free-market economy ( Desrosières 1998 ; Hacking 1990 ; Heilbron, Magnusson, and Wittrock 1998 ; Porter 1981 , 1986 , 1995 , 2003 ; Ross 2003 ). For example, although Quetelet conceded that “the laws and principles of religion and morality” could act as “influencing causes” ( Quetelet 1844 , xvii), in his analyses he treated education, occupation, and the propensity to commit crime as individual attributes no different from height and weight. The net result was that a population's essence—crucial to its success or failure—was conceptualized as an intrinsic property of the individuals who comprised the population; the corollary was that population means and rates were a result and an expression of innate individual characteristics.

Or so the argument went. At the time, others were not convinced and contended that Quetelet's means were simply arbitrary arithmetic contrivances resulting from declaring certain groups to be populations ( Cole 2000 ; Desrosières 1998 ; Porter 1981 ; Stigler 1986 , 2002 ). As Quetelet himself acknowledged, the national averages and rates defining a country's “average man” coexisted with substantial regional and local variation. Hence, data for one region of France would yield one mean, and for another region it would be something else. If the two were combined, a third mean would result—and who was to say which, if any, of these means was meaningful, let alone reflective of an intrinsic essence (or, for that matter, external influences)?
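The critics' point about combining regions is a matter of arithmetic: a pooled mean is a size-weighted average of the group means, so it shifts whenever the boundaries or sizes of the groups change. The figures below are hypothetical and chosen only to make the calculation easy to follow.

```python
def pooled_mean(groups):
    """Size-weighted mean of (mean, size) pairs."""
    total_size = sum(size for _, size in groups)
    return sum(mean * size for mean, size in groups) / total_size

# Hypothetical mean heights (cm) and group sizes for two regions.
region_a = (165.0, 3_000)
region_b = (170.0, 1_000)

combined = pooled_mean([region_a, region_b])
print(combined)  # 166.25, a third mean that matches neither region

# Redrawing the groups so that they are equal in size gives yet another mean.
rebalanced = pooled_mean([(165.0, 1_000), (170.0, 1_000)])
print(rebalanced)  # 167.5
```

Nothing in the calculation indicates which of these means, if any, corresponds to a meaningful population; that judgment has to come from how the groups are defined.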

Quetelet's tautological answer was to differentiate between what he termed “true means” versus mere “arithmetical averages” ( Porter 1981 ; Quetelet 1844 ). The former could be derived only from “true” populations, whose distribution by definition expressed the “law of errors” (e.g., the normal curve). In such cases, Quetelet argued, the mean reflected the population's true essence. By contrast, any disparate lot of objects measured by a common metric could yield a simple “average” (e.g., average height of books or of buildings), but the meaningless nature of this parameter, that is, its inability to be informative about any innate “essence,” would be revealed by the lack of a normal distribution.

And so the argument continued until the terms were changed in a radically different way by Darwin's theory of evolution, presented in Origin of Species , published in 1859 (Darwin [1859] [ 2004 ]). The central conceptual shift was from “errors” to “variation” ( Eldredge 2005 ; Hey 2011 ; Hodge 2009 ; Mayr 1988 ). This variation, thought to reflect inheritable characteristics passed on from parent to progeny, was in effect a consequence of who survived to reproduce, courtesy of “natural selection.” No longer were species, that is, the evolving biological populations to which these individuals belonged, either arbitrary or constant. Instead, they were produced by reproducing organisms and their broader ecosystem. Far from being either Platonic “ideal types” ( Hey 2011 ; Hodge 2009 ; Mayr 1988 ; Weiss and Long 2009 ), per Quetelet's notion of fixed essence plus error, or artificially assembled aggregates capable of yielding only what Quetelet would term meaningless mere “averages,” “populations” were newly morphed into temporally dynamic and mutable entities arising by biological descent. From this standpoint, variation was vital, and variants that were rare at one point in time could become the new norm at another.

Nevertheless, even though the essence of biological populations was now impermanent, what substantively defined “populations” remained framed as fundamentally endogenous. In the case of biological organisms, this essence resided in whatever material substances were transmitted by biological reproduction. Left intact was an understanding of population, population traits, and their variability as innately defined, with this variation rendered visible through a statistical analysis of appropriate population samples. The enduring result was to (1) collapse the distinctions between populations as substantive beings versus statistical objects and (2) imply that population characteristics reflect and are determined by the intrinsic essence of their component parts. Current conventional definitions of “population” say as much and no more ( table 1 ).

Conceptual Criteria for Defining Meaningful Populations for Public Health

Framing and Contesting “Population” through an Epidemiologic Lens . In the 150 years since these initial features of populations were propounded, they have become deeply entrenched, although not entirely uncontested. Figure 2 is a schematic encapsulation of mid-nineteenth to early twentieth-century notions of populations, with the entries emphasizing population statistics and population genetics because of their enduring influence, even now, on conceptions of populations in epidemiology and other population sciences. During this period, myriad disciplines in the life, social, and physical sciences embraced a statistical understanding of “population” ( Desrosières 1998 ; Hey 2011 ; Porter 1981 , 1986 , 2002 , 2003 ; Ross 2003 ; Schank and Twardy 2009 ; Yeo 2003 ). Eugenic thinking likewise became ascendant, espoused by leading scientists and statisticians, especially the newly named “biometricians,” who held that individuals and populations were determined and defined by their heredity, with the role of the “environment” being negligible or nil ( Carlson 2001 ; Davenport 1911 ; Galton 1904 ; Kevels 1985 ; Mackenzie 1982 ; Porter 2003 ; Tabery 2008 ).

Figure 2. A schematic cross-disciplinary genealogy of mid-nineteenth to early twentieth-century “population” thinking and current impact.

Sources : Carver 2003 ; Crow 1990 , 1994 ; Dale and Katz 2011 ; Darwin 1859 ; Daston 1987 ; Desrosières 1998 ; Eldredge 2005 ; Galton 1889 , 1904 ; Hacking 1975 , 1990 ; Hey 2011 ; Hodge 2009 ; Hogben 1933 ; Keller 2010 ; Mackenzie 1982 ; Marx 1845 ; Mayr 1988 ; Porter 1981 , 1986 , 2002 , 2003 ; Quetelet 1835 , 1844 ; Sarkar 1996 ; Schank and Twardy 2009 ; Stigler 1986 , 1997 ; Tabery 2008 ; Yeo 2003 .

It was also during the early twentieth century that the nascent academic discipline of epidemiology advanced its claims about being a population science, as part of distinguishing both the knowledge it generated and its methods from those used in the clinical and basic sciences ( Krieger 2000 , 2011 ; Lilienfeld 1980 ; Rosen [1958] [ 1993 ]; Susser and Stein 2009 ; Winslow et al. 1952 ). In 1927 and in 1935, for example, the first professors of epidemiology in the United States and the United Kingdom—Wade Hampton Frost (1880–1938) at the Johns Hopkins School of Hygiene and Public Health in 1921 ( Daniel 2004 ; Fee 1987 ), and Major Greenwood (1880–1949) at the London School of Hygiene and Tropical Medicine in 1928 ( Butler 1949 ; Hogben 1950 )—urged that epidemiology clearly define itself as the science of the “mass phenomena” of disease, Frost in his landmark essay “Epidemiology” (Frost [1927] [ 1941 ], 439) and Greenwood in his discipline-defining book Epidemics and Crowd Diseases: An Introduction to the Study of Epidemiology ( Greenwood 1935 , 125). Neither Frost nor Greenwood, however, articulated what constituted a “population,” other than the large numbers required to make a “mass.”

Also during the 1920s and 1930s, two small strands of epidemiologic work—each addressing different aspects of the inherent dual engagement of epidemiology with biological and societal phenomena ( Krieger 1994 , 2001 , 2011 )—began to challenge empirically and conceptually the dominant view of population characteristics as arising solely from individuals' intrinsic properties. The first thread was metaphorically inspired by chemistry's law of “mass action,” referring to the likelihood that two chemicals meeting and interacting in, say, a beaker, would equal the product of their spatial densities ( Heesterbeek 2005 ; Mendelsohn 1998 ). Applied to epidemiology, the law of “mass action” spurred novel efforts to model infectious disease dynamics arising from interactions between what were termed the “host” and the “microbial” populations, taking into account changes in the host's characteristics (e.g., from susceptible to either immune or dead) and also the population size, density, and migration patterns (Frost [ 1928 ] 1976; Heesterbeek 2005 ; Hogben 1950 ; Kermack and McKendrick 1927 ; Mendelsohn 1998 ).
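A minimal sketch of how a mass-action term drives such a model is given below. It is a generic susceptible-infected-recovered simulation in the spirit of this literature, not a reproduction of any cited author's equations; the parameter values, the simple fixed-step update, and the function name `simulate_sir` are assumptions made for illustration.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days, dt=0.1):
    """Step a susceptible/infected/recovered model forward in time.

    s, i, r are fractions of the host population. New infections follow
    mass action: they are proportional to the product s * i.
    """
    s, i, r = float(s0), float(i0), float(r0)
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i * dt  # mass-action contact term
        new_recoveries = gamma * i * dt     # infected hosts becoming immune
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((k * dt, s, i, r))
    return history

# Hypothetical parameters: 1% initially infected, contact rate 0.5 per day,
# recovery rate 0.1 per day.
history = simulate_sir(s0=0.99, i0=0.01, r0=0.0, beta=0.5, gamma=0.1, days=100)

peak_infected = max(i for _, _, i, _ in history)
_, final_s, final_i, final_r = history[-1]
print(round(peak_infected, 3), round(final_s, 3), round(final_r, 3))
```

The course of the epidemic here depends on the changing composition of the host population, that is, on how many hosts are susceptible and how many infected at each moment, rather than on any fixed trait of an individual host.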

The second thread was articulated in debates concerning eugenics and also in response to the social crises and economic depression precipitated by the 1929 stock market crash. Its focus concerned how societal conditions could drive disease rates, not only by changing individuals’ economic position, but also through competing interests. Explicitly stating this latter point was the 1933 monograph Health and Environment ( Sydenstricker 1933 ), prepared for the U.S. President's Research Committee on Social Trends by Edgar Sydenstricker (1881–1936), a leading health researcher and the first statistician to serve in the U.S. Public Health Service ( Krieger 2011 ; Krieger and Fee 1996 ; Wiehl 1974 ). In this landmark text, which explicitly delineated diverse aspects of what he termed the “social environment” alongside the physical environment, Sydenstricker argued (1933, 16, italics in original):

Economic factors in the conservation or waste of health, for example, are not merely the rate of wages; the hours of labor; the hazard of accident, of poisonous substances, or of deleterious dusts; they include also the attitude consciously taken with respect to the question of the relative importance of large capitalistic profits versus maintenance of the workers’ welfare.

In other words, social relations, not just individual traits, shape population distributions of health.

Influenced by and building on both Greenwood's and Sydenstricker's work, in 1957 Jeremy Morris (1910–2009) published his highly influential and pathbreaking book Uses of Epidemiology ( Morris 1957 ), which remains a classic to this day ( Davey Smith and Morris 2004 ; Krieger 2007a ; Smith 2001 ). Going beyond Frost and Greenwood, Morris emphasized that “the unit of study in epidemiology is the population or group , not the individual ” ( Morris 1957 , 3, italics in original) and also went further by newly defining epidemiology in relational terms, as “ the study of health and disease of populations and of groups in relation to their environment and ways of living ” ( Morris 1957 , 16, italics in original). As a step toward defining “population,” Morris noted that “the ‘population’ may be of a whole country or any particular and defined sector of it” ( Morris 1957 , 3), as delimited by people's “environment, their living conditions, and special ways of life” ( Morris 1957 , 61). He also, however, recognized that better theorizing about populations was needed and hence called for a greater “understanding of the properties of individuals which they have in virtue of their group membership” ( Morris 1957 , 120, italics in original). But this appeal went largely unheeded, as it directly contradicted the era's prevailing framework of methodological individualism ( Issac 2007 ; Krieger 2011 ; Ross 2003 ).

Morris's insights notwithstanding, the dominant view has remained what is presented in table 1 . Even the recent influential work of Geoffrey Rose (1926–1993), crucial to reframing individual risk in population terms, theorized populations primarily in relation to their distributional, not substantive, properties ( Rose 1985 , 1992 , 2008 ). Rose's illuminating analyses thus emphasized that (1) within a population, most cases arise from the proportionately greater number of persons at relatively low risk, as opposed to the much smaller number of persons at high risk; (2) determinants of risk within populations may not be the same as determinants of risk between populations; and (3) population norms shape where both the tails and the mean of a distribution occur. Rose thus cogently clarified that to change populations is to change individuals, and vice versa, implying that the two are mutually constitutive, but he left unspecified who and what makes meaningful populations and when they can be meaningfully compared.
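
Rose's first point can be made concrete with simple arithmetic. The sketch below uses purely hypothetical stratum sizes and risks (they are assumptions chosen for illustration, not figures from Rose) to show how a large group at low risk can contribute more cases than a small group at high risk.

```python
# Hypothetical population split into two risk strata:
# name -> (number of people, probability of becoming a case).
# These numbers are illustrative assumptions, not data from Rose.
strata = {
    "low risk": (90_000, 0.01),
    "high risk": (10_000, 0.05),
}

# Expected number of cases contributed by each stratum.
cases = {name: size * risk for name, (size, risk) in strata.items()}

total_cases = sum(cases.values())
share_from_low_risk = cases["low risk"] / total_cases

for name, count in cases.items():
    print(f"{name}: about {count:.0f} expected cases")
print(f"share of all cases arising in the low-risk stratum: {share_from_low_risk:.0%}")
```

Under these assumed numbers, the high-risk stratum has five times the individual risk, yet most cases still arise among the far more numerous people at low risk.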

Current Challenges to Conventional Views of “Population.” A new wave of work contesting the still-reigning idea of “the average man” can be found in recent theoretical and empirical research in the social and biological sciences that attempts to analyze population phenomena in relation to dynamic causal processes that encompass multiple levels and scales, from macro to micro (Biersack and Greenberg 2006; Eldredge 1999; Eldredge and Grene 1992; Gilbert and Epel 2009; Grene and Depew 2004; Harraway 2008; Illari, Russo, and Williamson 2011; Krieger 2011; Lewontin 2000; Turner 2005). Also germane is research on system properties in the physical and information sciences (Kuhlmann 2011; Mitchell 2009; Strevens 2003).

Applicable to the question of who and what makes a population, one major focus of this alternative thinking is on processes that generate, maintain, transform, and lead to the demise of complex entities. This perspective builds on and extends a long history of critiques of reductionism ( Grene and Depew 2004 ; Harré 2001 ; Illari, Russo, and Williamson 2011 ; Lewontin 2000 ; Turner 2005 ; Ziman 2000 ), which together aver that properties of a complex “whole” cannot be reduced to, and explained solely by, the properties of its component “parts.” The basic two-part argument is that (a) new (emergent) properties can arise out of the interaction of the “parts” and (b) properties of the “whole” can transform the properties of their parts. Thus, to use one well-known example, a brain can think in ways that a neuron cannot. Taking this further in regard to the generative causal processes at play, what a brain thinks can affect neuron connections within the brain, and it also is affected by the ecological context and experiences of the organism, of which the brain is a part ( Fox, Levitt, and Nelson 2010 ; Gibson 1986 ; Harré 2001 ; Stanley, Phelps, and Banaji 2008 ). The larger claim is that the causal processes that give rise to complex entities can both structure and transform the characteristics of both the whole and its parts.

What might it look like for public health to bring this alternative perspective to the question of defining, substantively, who and what makes a population? Let me start with a conceptual answer, followed by some concrete public health propositions and examples.

Populations as Relational Beings: An Alternative Causal Conceptualization

In brief, I argue that a working definition of “populations” for public health (or any field concerned with living organisms) would, in line with Sydenstricker (1933) and Morris (1957) and the other contemporary theorists just cited, stipulate that populations are first and foremost relational beings, not “things.” They are active agents, not simply statistical aggregates characterized by distributions.

Specifically, as tables 2 and 3 show, the substantive populations that populate our planet

Conceptual Criteria for Defining Meaningful Populations for Population Sciences, Guided by the Ecosocial Theory of Disease Distribution

Source: Krieger 1994 , 2001 , and 2011 , 214–15.

Defining Features of Populations of Living Beings, Including Humans, Relevant to Public Health and Population Sciences

  • Are animate, self-replicating, and bounded complex entities, generated by systemic causal processes.
  • Arise from and are constituted by relationships of varying strengths, both externally (with and as bounded by other populations) and internally (among their component beings).
  • Are inherently constituted by, and simultaneously influence the characteristics of, the varied individuals who comprise their members and their population-defined and -defining relationships.

It is these relationships and their underlying causal processes (both deterministic and probabilistic), not simply random samples derived from large numbers, that make it possible to make meaningful substantive and statistical inferences about population characteristics, as well as meaningful causal inferences about observed associations.

Accordingly, as summarized by Richard A. Richards, a philosopher of biology (who was writing about species, one type of population), populations have “well-defined beginnings and endings, and cohesion and causal integration” ( Richards 2001 ). They likewise necessarily exhibit historically contingent distributions in time and space, by virtue of the dynamic interactions intrinsically occurring between (and within) their unique individuals and with other equally dynamic codefining populations and also their changing abiotic environs. Underscoring this point, even a population of organisms cloned from a single source organism will exhibit variation and distributions as illustrated by the phenomenon of developmental “noise,” an idea presaged by early twentieth-century observations of chance differences in coat color among litter mates of pure-bred populations raised in identical circumstances ( Davey Smith 2011 ; Lewontin 2000 ; Wright 1920 ).

As for the inherent relationships characterizing populations, both internally and externally, I suggest that four key types stand out, as informed by the ecosocial theory of disease distribution (Krieger 1994, 2001, 2011); the collaborative writing of Niles Eldredge, an evolutionary biologist, and Marjorie Grene, a philosopher of biology (Eldredge and Grene 1992); as well as works from political sociology, political ecology, and political geography (Biersack and Greenberg 2006; Harvey 1996; Nash and Scott 2001). As tables 2 and 3 summarize, these four kinds of relationships are (1) genealogical, that is, relationships by biological descent; (2) internal and economical, in the original sense of the term, referring to relationships essential to the daily activities of whatever is involved in maintaining life (in ancient Greece, oikos, the root of the “eco” in both “ecology” and “economics,” referred to a “household,” conceptualized in relation to the activities and interactions required for its existence [OED 2010]); (3) external and ecological, referring to relationships between populations and with the environs they coinhabit; and (4) in the case of people (and likely other species as well), teleological, that is, by design, with some conscious purpose in mind (e.g., citizenship criteria). Spanning from mutually beneficial (e.g., symbiotic) to exploitative (benefiting one population at the expense of the other), these relationships together causally shape the characteristics of populations and their members.

What are some concrete examples of animate populations that exemplify these points? Table 3 provides four examples. Two pertain to human populations: the “U.S. population” ( Foner 1997 ; Zinn 2003 ) and “social classes” ( Giddens and Held 1982 ; Wright 2005 ). The third considers microbial populations within humans ( Dominguez-Bello and Blaser 2011 ; Pflughoeft and Versalovic 2012 ; Walter and Ley 2011 ), and the fourth concerns a plant population, a species of tree, the poplar, whose genus name ( Populus ) derives from the same Latin root as “population” ( Braatne, Rood, and Heillman 1996 ; Fergus 2005 ; Frost et al. 2007 ; Jansson and Douglas 2007 ). Together, these examples clarify what binds—as well as distinguishes—each of these dynamic populations and their component individuals. They likewise underscore that contrary to common usage, “population” and “individual” are not antonyms. Instead, they hark back to the original meaning of “individual”—that is, “individuum,” or what is indivisible, referring to the smallest unit that retained the properties of the whole to which it intrinsically belonged ( OED 2010; Williams 1985 ). Thus, although it is analytically possible to distinguish between “populations” and “individuals,” in reality these phenomena occur and are lived simultaneously. A person is not an individual on one day and a member of a population on another. Rather, we are both, simultaneously. This joint fact is fundamental and is essential to keep in mind if analysis of either individual or population phenomena is to be valid.

The importance of considering the intrinsic relationships—both internal and external—that are the integuments of living populations, themselves active agents and composed of active agents, is further illuminated through contrast to the classic case of a hypothetical population: the proverbial jar of variously colored marbles, used in many classes to illustrate the principles of probability and sampling. Apart from having been manufactured to be of a specific size, density, and color, there are no intrinsic relationships between the marbles as such. Spill such a jar, and see what happens.

As this thought experiment makes clear, the marbles will not reconstitute themselves into any meaningful relationships in space or time. They will just roll to wherever they do, and that will be the end of it, unless someone with both energy and a plan scoops them up and puts them back in the jar. Nor will a sealed jar of marbles change its color composition (i.e., the proportion of marbles of a certain color), or an individual marble change its color, unless someone opens the jar and replaces, adds, or removes some marbles or treats them with a color-changing agent. Hence, a purely statistical understanding of “populations,” however necessary for sharpening ideas about causal inference, study design, and empirical estimation, is by itself insufficient for defining and analyzing real-life populations, including “population health.”

That said, marbles do have their uses. In particular, they can help us visualize how causal determinants can structure population distributions of the risks of random individuals via what I term “structured chances.”

Populations and Structured Chances

One long-standing conundrum in population sciences is their ability to identify and use data on population regularities to elucidate causal pathways, even though they cannot predict which individuals in the population will experience the outcome in question (Daston 1987; Desrosières 1998; Hacking 1990; Illari, Russo, and Williamson 2011; Porter 1981, 2002, 2003; Quetelet 1835; Stigler 1986; Strevens 2003). This incommensurability of population and individual data has been a persistent source of tension between epidemiology and medicine (Frost [1927] 1941; Greenwood 1935; Morris 1957; Rose 1992, 2008). Epidemiologic research, for example, routinely uses aggregated data obtained from individuals to gain insight into both disease etiology and why population rates vary, and does so with the understanding that such research cannot predict which individual will get the disease in question (Coggon and Martyn 2005). By contrast, medical research remains bent on using just these sorts of data to predict an individual's risk, as exemplified in its increasingly molecularized quest for “personalized medicine” (Davey Smith 2011).

Where marbles enter the picture is that they can, through the use of a physical model, demonstrate the importance of how population distributions are simultaneously shaped by both structure (arising from causal processes) and randomness (including truly stochastic events, not just “randomness” as a stand-in for “ignorance” of myriad deterministic events too complex to model). As Stigler has recounted (1997), perhaps the first person to propose using physical models to understand probability was Sir Francis Galton (1822–1911), a highly influential British scientist and eugenicist ( figure 2 ), who himself coined the term “eugenics” and who held that heredity fundamentally trumped “environment” for traits influencing the capacity to thrive, whether physical, like health status, or mental, like “intelligence” ( Carlson 2001 ; Cowan 2004 ; Galton 1889 , 1904 ; Keller 2010 ; Kevels 1985 ; Stigler 1997 ). In his 1889 opus Natural Inheritance ( Galton 1889 ), Galton sketched ( figure 3 ) “an apparatus … that mimics in a very pretty way the conditions on which Deviation depends” ( Galton 1889 , 63), whereby gun shots (i.e., marble equivalents) would be poured through a funnel down a board whose surface was studded with carefully placed pins, off which each pellet would ricochet, to be collected in evenly spaced bins at the bottom.


Producing population distributions: structured chances as represented by physical models.

Sources: Galton's Quincunx, Galton 1889 , 63; physical models, Limpert, Stahel, and Abbt 2001 (reproduced with permission).

Galton termed his apparatus, which he apparently never built ( Stigler 1997 ), the “Quincunx” because the pattern of the pins used to deflect the shot was like a tree-planting arrangement of that name, which at the time was popular among the English aristocracy ( Stigler 1997 ). The essential point was that although each presumably identical ball had the same starting point, depending on the chance interplay of which pins it hit during its descent at which angle, it would end up in one or another bin. The accumulation of balls in any bin in turn would reflect the number of possible pathways (i.e., likelihood) leading to its ending up in that bin. Galton designed the pin pattern to yield a normal distribution. He concluded that his device revealed ( Galton 1889 , 66)

a wonderful form of cosmic order expressed by the “Law of Frequency of Error.” The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement amidst the wildest confusion. The huger the mob, and the greater the apparent anarchy the more perfect is its sway … each element, as it is sorted into place, finds, as it were, a pre-ordained niche, accurately adapted to fit it.

In other words, in accord with Quetelet's view of “l’homme moyen,” Galton saw the order produced as the property of each “element,” in this case, the gun shot.

However, a little more than a century later, some physicists not only built Galton's “Quincunx,” as others have done (Stigler 1997), but went one step further (Limpert, Stahel, and Abbt 2001): they built two, one designed to generate the normal distribution and the other to generate the log-normal distribution (a type of distribution skewed on the normal scale, but for which the natural logarithm of the values displays a normal distribution) (figure 3). As their devices clearly show, what structures the distribution is not the innate qualities of the “elements” themselves but the features of both the funnel and the pins, in both their shape and placement. Together, these structural features determine which pellets can (or cannot) pass through the pins and, for those that do, their possible pathways.

The lesson is clear: altering the structure can change outcome probabilities, even for identical objects, thereby creating different population distributions. For the population sciences, this insight permits understanding how there can simultaneously be both chance variation within populations (individual risk) and patterned differences between population distributions (rates). Such an understanding of “structured chances” rejects explanations of population difference premised solely on determinism or chance and also brings Quetelet's astronomical “l’homme moyen” and its celestial certainties of fixed stars back down to earth, grounding the study of populations instead in real-life, historically contingent causal processes, including those structured by human agency.
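
The point about structure can be sketched in a small simulation. The code below is a simplified stand-in for the two physical devices, not a reconstruction of them: it assumes that identical pellets pass through the same number of rows of pins with the same 50/50 chance at each pin, and that the only difference between the two boards is what a deflection does. On the additive board a deflection shifts the pellet by a fixed step; on the multiplicative board it multiplies or divides the pellet's position by a fixed factor. The function names, the number of rows, the step, and the factor are all illustrative assumptions.

```python
import math
import random
import statistics

def additive_board(rows, step, rng):
    """Final position when each pin shifts the pellet left or right by a fixed step."""
    position = 0.0
    for _ in range(rows):
        position += step if rng.random() < 0.5 else -step
    return position

def multiplicative_board(rows, factor, rng):
    """Final position when each pin multiplies or divides the position by a fixed factor."""
    position = 1.0
    for _ in range(rows):
        position = position * factor if rng.random() < 0.5 else position / factor
    return position

def skewness(values):
    """Standardized third moment: near 0 for symmetric data, positive for a long right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

rng = random.Random(0)          # fixed seed so the run is repeatable
n_pellets, rows = 20_000, 12    # illustrative sizes

# Same pellets, same chances at every pin; only the board's structure differs.
normal_like = [additive_board(rows, 1.0, rng) for _ in range(n_pellets)]
lognormal_like = [multiplicative_board(rows, 1.2, rng) for _ in range(n_pellets)]
log_of_lognormal = [math.log(v) for v in lognormal_like]

print("additive board skewness:           ", round(skewness(normal_like), 2))
print("multiplicative board skewness:     ", round(skewness(lognormal_like), 2))
print("multiplicative board, log scale:   ", round(skewness(log_of_lognormal), 2))
```

In this sketch the additive board yields a roughly symmetric distribution, whereas the multiplicative board yields a right-skewed one that becomes roughly symmetric again on the log scale, even though the pellets and the chance at each pin are identical in both cases.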

Rethinking the Meaning and Making of Means: The Utility of Critical Population-Informed Thinking

How might a more critical understanding of the substantive nature of real-life populations benefit research on, knowledge about, and policies regarding population health and health inequities? Drawing on table 2 's conceptual criteria for defining who and what makes populations, table 4 offers four sets of critical public health propositions about “populations” and “study populations,” whose salience I assess using examples of breast cancer, a disease increasingly recognized as a major cause of morbidity and mortality in both the global South and the global North ( Althuis et al. 2005 ; Bray, McCarron, and Parkin 2004 ; Parkin and Fernández 2006 ) and one readily revealing that the problem of meaningful means is as vexing for “the average woman” as for “the average man.”

Four Propositions to Improve Population Health Research, Premised on Critical Population-Informed Thinking

Propositions 1 and 2: Critically Parsing Population Rates and Their Comparisons

Consider, first, three illustrative cases pertaining to analyses of population rates of breast cancer:

  • A recent high-profile analysis of the global burden of breast cancer ( Briggs 2011 ; Forouzanafar et al. 2011 ; IHME 2011; Jaslow 2011 ), which estimated and compared rates across countries, accompanied by interpretative text, with the article stating, for example, that Colombia and Venezuela “… have very different trends, despite sharing many of the same lifestyle and demographic factors,” followed by the inference that the “explanation of these divergent trends may lie in the interaction between genes and individual risk factors.” (IHME 2011, 24)
  • Typical reviews of the global epidemiology of breast cancer, which contain such statements as “Population-based statistics show that globally, when compared to whites, women of African ancestry (AA) tend to have more aggressive breast cancers that present more frequently as estrogen receptor negative (ERneg) tumors” ( Dunn et al. 2010 , 281); and “early onset ER negative tumors also develop more frequently in Asian Indian and Pakistani women and in women from other parts of Asia, although not as prevalent as it is in West Africa.” ( Wallace, Martin, and Ambs 2011 , 1113)
  • The headline-making news that the U.S. breast cancer incidence rate in 2003 unexpectedly dropped by 10 percent, a huge decrease ( Kolata 2006 , 2007 ; Ravdin et al. 2006 , 2007 ).

What these three commonplace examples have in common is an uncritical approach to presenting and interpreting population data, premised on the dominant assumption that population rates are statistical phenomena driven by innate individual characteristics. Cautioning against accepting these claims at face value are propositions 1 and 2, with their emphases, respectively, on (1) critically appraising who constitutes the populations whose means are at issue and (2) critically considering the dynamic relationships that give rise to population patterns of health, including health inequities.

From the standpoint of proposition 1, the first relevant fact is that as a consequence of global disparities in resources (Klassen and Smith 2011) arising from complex histories of colonialism and underdevelopment (Birn, Pillay, and Holtz 2009), only 16 percent of the world's population is covered by cancer registries, with coverage of less than 10 percent within the world's most populous regions (Africa, Asia [other than Japan], Latin America, and the Caribbean), versus 99 percent in North America (Parkin and Fernández 2006). Put in national terms, among the 184 countries for which the International Agency for Research on Cancer (IARC) reports estimated rates, only 33 percent, almost all located in the global North, have reliable national incidence data (GLOBOCAN 2012). These data limitations are candidly acknowledged both by IARC (GLOBOCAN 2012) and in the scientific literature, including that on breast cancer (Althuis et al. 2005; Bray, McCarron, and Parkin 2004; Ferlay et al. 2012; Krieger, Bassett, and Gomez 2012; Parkin and Fernández 2006). To generate estimates of incidence in countries lacking national cancer registry data, the IARC transparently employs several modeling approaches, based on, for example, a country's national mortality data combined with city-specific or regional cancer registry data (if they do exist, albeit typically not including the rural poor) or, when no credible national data are available, estimating rates based on data from neighboring countries (GLOBOCAN 2012).

A critical analysis of the population claims asserted in examples 1 and 2 starts by questioning whether the means at issue can bear the weight of meaningful comparisons and inference. Thus, relevant to example 1, Colombia has only one city-based cancer registry (in Cali), and Venezuela has no cancer registries at all ( GLOBOCAN 2012 ). Moreover, the rates compared ( Forouzanafar et al. 2011 ; IHME 2011 ) were generated by nontransparent modeling methods ( Krieger, Bassett, and Gomez 2012 ) that have empirically been shown not to estimate accurately the actually observed rates in the “gold-standard” Nordic countries, known for their excellent cancer registration data ( Ferlay et al. 2012 ). Second, relevant to the countries and geographic regions listed in example 2, the cancer incidence rates estimated by IARC are based (a) for Pakistan, solely on the weighted average for observed rates in south Karachi, (b) for India, on a complex estimation scheme for urban and rural rates in different Indian states and data from cancer registries in several cities, and (c) for western Africa, on the weighted average of data for sixteen countries, of which ten have incidence rates estimated based on those of neighboring countries, another five rely on data extrapolated from cancer registry data from one city (or else city-based cancer registries in neighboring countries), and only one of which has a national cancer registry ( GLOBOCAN 2012 ). Critical thinking about who and what makes a population thus prompts questions about whether the data presented in examples 1 and 2 can provide insight into either alleged individual innate characteristics or into what the true on-average rate would be if everyone were counted (let alone what the variability in rates might be across social groups and regions). There is nothing mundane about a mean.

Proposition 2 in turn calls attention to structured chance in relation to the dynamic intrinsic and extrinsic relationships constituting national populations, with table 2 illustrating what types of relationships are at play using the example of the United States. It thus spurs critical queries as to whether observed national and racial/ethnic differences (if real, and not an artifact of inaccurate data) arise from innate (i.e., genetic) differences between “populations,” as posed by examples 1 and 2. Two lines of evidence alternatively suggest these population differences could instead be embodied inequalities ( Krieger 1994 , 2000 , 2005 , 2011 ; Krieger and Davey Smith 2004 ) that arise from structured chances. The first line pertains to well-documented links among national, racial/ethnic, and socioeconomic inequalities in breast cancer incidence, survival, and mortality ( Klassen and Smith 2011 ; Krieger 2002 ; Vona-Davis and Rose 2009 ). The second line stems from research that evaluates claims of intrinsic biological difference by examining their dynamics, as illustrated by the first investigation to test statistically for temporal trends in the white/black odds ratio for ER positive breast cancer between 1992 and 2005, which revealed that in the United States, the age-adjusted odds ratio rose between 1992 and 2002 and then leveled off (and actually fell among women aged fifty to sixty-nine) ( Krieger, Chen, and Waterman 2011 ).

Relevant to example 3, these findings of dynamic, not fixed, black/white risk differences for breast cancer ER status likely reflect the socially patterned abrupt decline in hormone therapy use following the July 2002 release of results from the U.S. Women's Health Initiative (WHI) ( Rossouw et al. 2002 ). This was the first large randomized clinical trial of hormone therapy, despite its having been widely prescribed since the mid-1960s ( Krieger 2008 ). The WHI found that contrary to what was expected, hormone therapy did not decrease (and may have raised) the risk of cardiovascular disease, and at the same time, the WHI confirmed prior evidence that long-term use of hormone therapy increased the risk of breast cancer (especially ER+). Thus, before the initiative, hormone therapy use in the United States was highest among white women with health insurance who could afford, and were healthy enough, to take the medication without any contraindications ( Brett and Madans 1997 ; Friedman-Koss et al. 2002 ). Population-informed thinking would thus predict that any drops in breast cancer incidence would occur chiefly among those sectors of women most likely to have used hormone therapy. Subsequent global research has borne out these predictions ( Zbuk and Anand 2012 ), including the sole U.S. study that systematically explored socioeconomic differentials both within and across racial/ethnic groups, which found that the observed breast cancer decline was restricted to white non-Hispanic women with ER+ tumors residing in more affluent counties ( Krieger, Chen, and Waterman 2010 ). These results counter the widely disseminated and falsely reassuring impression that breast cancer risk was declining for everyone ( Kolata 2006 , 2007 ). They accordingly provide better guidance to public health agencies, clinical providers, and breast cancer advocacy groups regarding trends in breast cancer occurrence among the real-life populations they serve.

Together, these examples illuminate why proposition 2's corollary 2.2 proposes conceptualizing the jointly lived experience of population rates and individual manifestations of health, disease, and well-being as what I would term “embodied phenotype.” Inherently dynamic and relational, this proposed construct meaningfully links the macro and micro, and populations and individuals, through the play of structured chance. It also is consonant with new insights emerging from the fast-growing field of ecological evolutionary developmental biology (“eco-evo-devo”) into the profound and dynamic links among environmental exposures, gene expression, development, speciation, and the flexibility of organisms’ phenotypes across the life span ( Gilbert and Epel 2009 ; Piermsa and van Gils 2011 ; West-Eberhard 2003 ). Only just beginning to be integrated into epidemiologic theorizing and research ( Bateson and Gluckman 2012 ; Davey Smith 2011 , 2012 ; Gilbert and Epel 2009 ; Kuzawa 2012 ; Relton and Davey Smith 2012 ), eco-evo-devo's historical and relational approach to biological expression affirms the need for critical population-informed thinking.

Propositions 3 and 4: Study Participants, Study Populations, and Causal Inference

Finally, a population-informed approach helps clarify, in accordance with propositions 3 and 4, why improving our understanding of “study populations,” and thus study participants, matters for causal inference. Consider, for example, the 1926 pathbreaking epidemiologic study of breast cancer conducted by the British physician and epidemiologist Janet Elizabeth Lane-Claypon (1877–1967) (Lane-Claypon 1926), the first study to identify systematically what were then called “antecedents” of breast cancer (today termed “risk factors”) and now also widely acknowledged to be the first epidemiologic case-control study, as well as the first epidemiologic study to publish its questionnaire (Press and Pharoah 2010; Winkelstein 2004). Lane-Claypon's study was quickly replicated in the United States in 1931 by Wainwright (Wainwright 1931), and the two studies have recently been reanalyzed using current statistical methods. The results show that their estimates of risk associated with major reproductive risk factors (e.g., early age at first birth, parity, lactation, and early age at menopause) are consistent with the current evidence (Press and Pharoah 2010).

Not addressed in the reanalysis, however, are the two studies’ different results for occupational class, defined in relation to the women's employment before marriage. When these occupational data are recoded into the meaningful categories of professional, working-class nonmanual, and working-class manual (Krieger, Williams, and Moss 1997; Rose and Pevalin 2003), the data quickly reveal why the studies had discrepant results. Thus, Lane-Claypon concluded there was no “appreciable difference” in breast cancer risk by social class (Lane-Claypon 1926, 12) (χ² = 1.833; p = 0.4), whereas in the U.S. study risk was lower among the working-class manual women (χ² = 9.305; p = 0.01). Why? In brief, a far higher proportion of the British women were working-class manual (78.7% cases, 84.2% controls vs. the U.S. women: 48.8% cases, 62.5% controls), and a far lower proportion were professionals (6.5% cases, 4.2% controls, vs. the U.S. women: 23.8% cases, 20.7% controls). Just as Rose famously observed that if everyone smoked, smoking would not be identified as a cause of lung cancer (Rose 1985, 1992), when most study participants are from only one social class, socioeconomic inequalities in health cannot and will not be detected (Krieger 2007b). The net result is erroneous causal inferences about the relevance of social class to structuring the risk of disease, thereby distorting the evidence base informing efforts to address health inequities.
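
The reported p-values can be checked against the reported chi-square statistics. The sketch below assumes that each test compares cases and controls across the three occupational categories and therefore has 2 degrees of freedom; the text does not state the degrees of freedom, so this is an assumption. For 2 degrees of freedom, the upper-tail probability of a chi-square statistic x has the closed form exp(−x/2), so no statistics library is needed. The function name is hypothetical.

```python
import math

def chi2_pvalue_2df(statistic):
    """Upper-tail p-value of a chi-square statistic, assuming 2 degrees of freedom."""
    return math.exp(-statistic / 2)

p_british = chi2_pvalue_2df(1.833)  # statistic reported for the Lane-Claypon (1926) data
p_us = chi2_pvalue_2df(9.305)       # statistic reported for the Wainwright (1931) data

print(round(p_british, 2), round(p_us, 2))
```

Under the 2-degrees-of-freedom assumption, both values round to the p-values given in the text (0.4 and 0.01).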

Critical population-informed thinking therefore would question the dominant conventional cleavage, in both the population health and the social sciences, between “internal validity” and “generalizability” (or “external validity”) and the related endemic language of “study population” (routinely and casually equated with study participants) and “general population” (Broadbent 2011; Cartwright 2011; Cook 2001; Kincaid 2011; Kukuall and Ganguli 2012; Porta 2008; Rothman, Greenland, and Lash 2008). One critical determinant of a study's ability to provide valid tests of exposure-outcome hypotheses is the range of exposure encompassed (Chen and Rossi 1987; Schlesselman and Stadel 1987); another is the extent to which participants’ selection into a study is associated with important unmeasured determinants of the outcome (Pizzi et al. 2011). Given the social structuring of the vast majority of exposures, as evidenced by the virtually ubiquitous and dynamic societal patternings of disease (Birn, Pillay, and Holtz 2009; Davey Smith 2003; Krieger 1994, 2011; WHO 2008), meaningful research requires that the range of exposures experienced (or not) by study participants capture the etiologically relevant range experienced in the real-world societies, that is, meaningful populations, of which they are a part. The point is not that ideal study participants should be a random sample of some “general population”; instead, it is that their location in the intrinsic and extrinsic relationships creating their population membership cannot be ignored.

Highlighting the need for critical population-informed thinking is advice provided in the widely used and highly influential textbook Modern Epidemiology ( Rothman, Greenland, and Lash 2008 ). Although the text correctly states that “the pursuit of representativeness can defeat the goal of validly identifying causal relations,” it further asserts that “one would want to select study groups for homogeneity with respect to important confounders, for highly cooperative behavior, and for availability of accurate information, rather than attempt to be representative of a natural population” (p. 146). “Classic examples” of the populations fulfilling these criteria are stated to be “the British Physicians’ Study of smoking and health and the Nurses’ Health Study, neither of which were remotely representative of the general population with respect to sociodemographic factors” ( Rothman, Greenland, and Lash 2008 , 146–47).

Of course, studies need accurate data, but the advice here raises more questions than it answers. First, just who and what is a “natural population,” and, relatedly, who is that “general population”? Second, might there be drawbacks to, not just benefits from, preferentially studying predominantly white health professionals and others with the resources to be “highly cooperative” and possess “accurate information”? Stated another way, what might be the adverse consequences for scientific knowledge and policymaking of discounting people that mainstream research already routinely and problematically calls “hard-to-reach” populations ( Crosby et al. 2010 ; Shaghaghi, Bhopal, and Sheikh 2011 )? These populations include the disempowered and dispossessed, whose adverse social and physical circumstances mean that their range of exposures almost invariably differs, in both level and type, from that encountered by the effectively “easy-to-reach.” Might it not also be critical for researchers to develop more inclusive approaches that could yield accurate etiologic and policy-relevant data on the distributions and determinants of disease among those who bear the brunt of health inequities ( Smylie et al. 2012 )? This is a scientific task that necessarily requires contrasts in both exposures and outcomes between the social groups defined by the inequitable societal relationships at issue, whether involving social class, racism, gender, or other forms of social inequality ( Krieger 2007b ).

Reflecting on how who is studied determines what can be learned, the eminent British biologist Lancelot Hogben (1895–1975) ( figure 2 ; Bud 2004 ; Werskey 1988 ), in his lucid and prescient 1933 book titled Nature and Nurture ( Hogben 1933 , 106), cogently observed:

Differences to which members of the same family or different families living at one and the same social level are exposed may be very much less than differences to which individuals belonging to families taken from different social levels are exposed. Experiment shows that ultra-violet light has a considerable influence on growth in mammals. In Great Britain, some families live continuously in the sooty atmosphere of an industrial area. Others spend their winters on the Riviera.

In other words, critical population-informed thinking is vital to good science.

Conclusion: Meaningful Means, Embodied Phenotypes, and the Structural Determinants of Populations and the People's Health

In conclusion, to improve causal inference, and the policies and action based on this knowledge, the population sciences need to expand and deepen theorizing about who and what makes populations and their means. At a time when the topic of causality in the sciences remains hotly debated by philosophers and researchers alike, all parties nevertheless agree that “the question of how probabilistic accounts of causality can mesh with mechanistic accounts of causality desperately needs answering” ( Illari, Russo, and Williamson 2011 , 20). As my article makes clear, the idea and reality of “population” reside at the nexus of this question. Clarifying the substantive defining features of populations, including who and what structures the dynamic and emergent distributions of their characteristics and components, is thus crucial to both analyzing and altering causal processes. For public health, this means sharpening our thinking about how structured chances, shaped by the political and economic relationships constituting the societal determinants of health ( Birn, Pillay, and Holtz 2009 ; Irwin et al. 2006 ; Krieger 1994 , 2011 ), generate the embodied phenotypes that are the people's health.

As should be evident, the challenges to developing critical population-informed thinking are not purely conceptual; they are also political, because these ideas necessarily engage with issues involving not only the distribution of people but also the distribution of power and property and the societal relationships that bind individuals and populations, for good and for bad ( Krieger 2011 ). Nearly two hundred years after Quetelet introduced his “l’homme moyen,” the countervailing call for routinely measuring and tracking population health inequities, and not just on-average population rates of health, is only now gaining traction globally (WHO 2008, 2011). This coincides with the aforementioned and ever-accelerating genomic quest for “personalized medicine” ( Davey Smith 2011 ), as well as the continued economic, social, political, and public health reverberations of the 2008 global economic crash ( Benatar, Gill, and Bakker 2011 ; Stiglitz 2010 ). In such a context, clarity regarding who and what populations are, and the making and meaning of their means, is vital to population sciences, population health, and the promotion of health equity.

Acknowledgments

No funding supported this work.

References

  • Althuis MD, Dozier JM, Anderson WF, Devesa SS, Brinton LA. Global Trends in Breast Cancer Incidence and Mortality 1973–1997. International Journal of Epidemiology. 2005; 34 :405–12. [ PubMed ] [ Google Scholar ]
  • Bateson P, Gluckman P. Plasticity and Robustness in Development and Evolution. International Journal of Epidemiology. 2012; 41 :219–23. [ PubMed ] [ Google Scholar ]
  • Benatar SR, Gill S, Bakker I. Global Health and the Global Economic Crisis. American Journal of Public Health. 2011; 101 :646–53. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Biersack A, Greenberg JB. Reimagining Political Ecology. Durham, NC: Duke University Press; 2006. [ Google Scholar ]
  • Birn AE, Pillay Y, Holtz TM. Textbook of International Health: Global Health in a Dynamic World. 3rd ed. New York: Oxford University Press; 2009. [ Google Scholar ]
  • Braatne JH, Rood SB, Heilman PE. Life History, Ecology, and Conservation of Riparian Cottonwoods in North America. In: Stettler RF, Bradshaw HD Jr, Heilman PE, Hinckley TM, editors. Biology of Populus and Its Implications for Management and Conservation. Ottawa: National Research Council of Canada, NRC Research Press; 1996. pp. 57–85. [ Google Scholar ]
  • Bray F, McCarron P, Parkin DM. The Changing Global Patterns of Female Breast Cancer Incidence and Mortality. Breast Cancer Research. 2004; 6 :229–39. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Brett KM, Madans JH. Difference in Use of Postmenopausal Hormone Replacement Therapy by Black and White Women. Menopause. 1997; 4 :66–70. [ Google Scholar ]
  • Briggs H. Women's Cancers Reach Two Million. 2011. BBC News Health, September 14. Available at http://www.bbc.co.uk/news/health-14917284 (accessed June 17, 2012)
  • Broadbent A. Inferring Causation in Epidemiology: Mechanisms, Black Boxes, and Contrasts. In: Illari PM, Russo F, Williamson J, editors. Causality in the Sciences. Oxford: Oxford University Press; 2011. pp. 45–69. [ Google Scholar ]
  • Bud R. Oxford Dictionary of National Biography. Oxford: Oxford University Press; 2004. Hogben, Lancelot Thomas (1895–1975) Available at http://www.oxforddnb.com.ezp-prod1.hul.harvard.edu/view/article/31244?docPos=1 (accessed June 17, 2012) [ Google Scholar ]
  • Burian RM, Zallen DT. Genes. In: Bowler PJ, Pickstone JV, editors. The Modern Biological and Earth Sciences. Cambridge: Cambridge University Press; 2009. Cambridge Histories Online. DOI: 10.1017/CHOL9780521572019.024 . [ Google Scholar ]
  • Butler AHB. Obituary: Major Greenwood. Journal of the Royal Statistical Society: Series A (General) 1949; 112 :487–89. [ Google Scholar ]
  • Carlson EA. The Unfit: A History of a Bad Idea. Cold Spring Harbor, NY: Cold Spring Harbor Press; 2001. [ Google Scholar ]
  • Cartwright N. Predicting “It Will Work for Us”: (Way) beyond Statistics. In: Illari PM, Russo F, Williamson J, editors. Causality in the Sciences. Oxford: Oxford University Press; 2011. pp. 750–68. [ Google Scholar ]
  • Carver T. Marx and Marxism. In: Porter TM, Ross D, editors. The Modern Social Sciences. Cambridge: Cambridge University Press; 2003. Cambridge Histories Online. DOI: 10.1017/CHOL9780521594424.013 . [ Google Scholar ]
  • Chen H-T, Rossi PH. The Theory-Driven Approach to Validity. Evaluation and Program Planning. 1987; 10 :95–103. [ Google Scholar ]
  • Clarke A, Agrò AF, Zheng Y, Tickle C, Jansson R, Kehrer-Sawatzki H, Cooper DN, Delves P, Battista J, Melino G, Perkel DJ, Hetherington AM, Bynum WF, Valpuesta JM, Harper D. Encyclopedia of Life Sciences. Chichester: Wiley; 2000–2011. Available at http://www.els.net/WileyCDA/ (accessed September 6, 2011) [ Google Scholar ]
  • Coggon DIW, Martyn CN. Time and Chance: The Stochastic Nature of Disease Causation. The Lancet. 2005; 365 :1434–37. [ PubMed ] [ Google Scholar ]
  • Cole J. The Power of Large Numbers: Populations, Politics, and Gender in Nineteenth-Century France. Ithaca, NY: Cornell University Press; 2000. [ Google Scholar ]
  • Cook TD. Generalization: Conceptions in the Social Sciences. In: Smelser NJ, Baltes PB, editors. International Encyclopedia of the Social & Behavioral Sciences. Oxford: Pergamon; 2001. pp. 6037–43. DOI: 10.1016/B0-08-043076-7/00698-7 . [ Google Scholar ]
  • Cowan RS. Oxford Dictionary of National Biography. Oxford: Oxford University Press; 2004. Galton, Sir Francis (1822–1911) Available at http://www.oxforddnb.com.ezp-prod1.hul.harvard.edu/view/article/33315 (accessed June 17, 2012) [ Google Scholar ]
  • Crosby RA, Salazar LF, DiClemente RJ, Lang DL. Balancing Rigor against the Inherent Limitations of Investigating Hard-to-Reach Populations. Health Education Research. 2010; 25 :1–5. [ PubMed ] [ Google Scholar ]
  • Crow JF. R.A. Fisher: A Centennial View. Genetics. 1990; 124 :204–11. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Crow JF. Sewall Wright (1889–1988): A Biographical Memoir. Washington, DC: National Academy of Sciences; 1994. [ Google Scholar ]
  • Daintith J, Martin E, editors. A Dictionary of Science. 5th ed. Oxford: Oxford University Press; 2005. [ Google Scholar ]
  • Dale AI, Katz S. Arthur L. Bowley: A Pioneer in Modern Statistics and Economics. London: World Scientific Publishing; 2011. [ Google Scholar ]
  • Daniel TM. Wade Hampton Frost: Pioneer Epidemiologist 1880–1938. Rochester, NY: University of Rochester Press; 2004. [ Google Scholar ]
  • Darwin C. Origin of Species. Edison, NJ: Castle Books; 2004. (1859) [ Google Scholar ]
  • Daston LJ. Rational Individuals versus Laws of Society: From Probability to Statistics. In: Krüger L, Daston LJ, Heidelberger M, editors. The Probabilistic Revolution. Vol. 1 , Ideas in History. Cambridge, MA: MIT Press; 1987. pp. 295–304. [ Google Scholar ]
  • Davenport CB. Heredity in Relation to Eugenics. New York: Henry Holt; 1911. [ Google Scholar ]
  • Davey Smith G. Health Inequalities: Lifecourse Approaches. Bristol: Policy Press; 2003. [ Google Scholar ]
  • Davey Smith G. Epidemiology, Epigenetics and the “Gloomy Prospect”: Embracing Randomness in Population Health Research and Practice. International Journal of Epidemiology. 2011; 40 :537–62. [ PubMed ] [ Google Scholar ]
  • Davey Smith G. Epigenesis for Epidemiologists: Does Evo-Devo Have Implications for Population Health Research and Practice? International Journal of Epidemiology. 2012; 41 :236–47. [ PubMed ] [ Google Scholar ]
  • Davey Smith G, Morris J. A Conversation with Jerry Morris. Epidemiology. 2004; 15 :770–73. [ PubMed ] [ Google Scholar ]
  • Davis K, Rowland D. Uninsured and Underserved: Inequities in Health Care in the United States. The Milbank Quarterly. 1983; 61 :149–76. [ PubMed ] [ Google Scholar ]
  • Desrosières A. The Politics of Large Numbers: A History of Statistical Reasoning. Cambridge, MA: Harvard University Press; 1998. Trans. Camille Naish. [ Google Scholar ]
  • Dominguez-Bello MG, Blaser MJ. The Human Microbiota as a Marker for Migrations of Individuals and Populations. Annual Review of Anthropology. 2011; 40 :451–74. [ Google Scholar ]
  • Dunn BK, Agurs-Collins T, Browne D, Lubet R, Johnson KA. Health Disparities in Breast Cancer: Biology Meets Socioeconomic Status. Breast Cancer Research and Treatment. 2010; 121 :281–92. [ PubMed ] [ Google Scholar ]
  • Eldredge N. The Pattern of Evolution. New York: Freeman; 1999. [ Google Scholar ]
  • Eldredge N. Darwin: Discovering the Tree of Life. New York: Norton; 2005. [ Google Scholar ]
  • Eldredge N, Grene M. Interactions: The Biological Context of Social Systems. New York: Columbia University Press; 1992. [ Google Scholar ]
  • Evans RG, Barer ML, Marmor TR. Why Are Some People Healthy and Others Not? The Determinants of Health of Populations. New York: De Gruyter; 1994. [ Google Scholar ]
  • Falk R. The Gene—A Concept in Tension: A Critical Overview. In: Beurton PJ, Falk R, Rheinberger H-J, editors. The Concept of the Gene in Development and Evolution: Historical and Epistemological Perspectives. Cambridge: Cambridge University Press; 2000. pp. 317–49. [ Google Scholar ]
  • Fee E. Disease and Discovery: A History of the Johns Hopkins School of Hygiene and Public Health, 1916–1939. Baltimore: Johns Hopkins University Press; 1987. [ Google Scholar ]
  • Fergus C. Trees of New England: A Natural History. Guilford, CT: FalconGuide; 2005. [ Google Scholar ]
  • Ferlay J, Forman D, Mathers CD, Bray F. Re: “Breast and Cervical Cancer in 187 Countries between 1980 and 2010” The Lancet. 2012; 379 :1390–91. [ PubMed ] [ Google Scholar ]
  • Foner E, editor. The New American History. Rev. and expanded ed. Philadelphia: Temple University Press; 1997. [ Google Scholar ]
  • Forouzanfar MH, Foreman KJ, Delossantos AM, Lozano R, Lopez AD, Murray CJ, Naghavi M. Breast and Cervical Cancer in 187 Countries between 1980 and 2010: A Systematic Analysis. The Lancet. 2011; 378 :1461–84. [ PubMed ] [ Google Scholar ]
  • Fox SE, Levitt P, Nelson CA III. How the Timing and Quality of Early Experiences Influence the Development of Brain Architecture. Child Development. 2010; 81 :28–40. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Friedman-Koss D, Crespo CJ, Bellantoni MF, Andersen RE. The Relationship of Race/Ethnicity and Social Class to Hormone Replacement Therapy: Results from the Third National Health and Nutrition Examination Survey 1988–1994. Menopause. 2002; 9 :264–72. [ PubMed ] [ Google Scholar ]
  • Frost C, Appel H, Carlson J, De Moraes CM, Mescher M, Schultz JC. Within-Plant Signaling by Volatiles Overcomes Vascular Constraints on Systemic Signaling and Primes Responses against Herbivores. Ecology Letters. 2007; 10 :490–98. [ PubMed ] [ Google Scholar ]
  • Frost WH. Epidemiology. In: Maxcy KF, editor. Papers of Wade Hampton Frost, M.D. New York: Commonwealth Fund; 1941. pp. 439–52. (1927) [ Google Scholar ]
  • Frost WH. Some Conceptions of Epidemics in General. American Journal of Epidemiology. 1976; 103 :141–51. (1928) [ PubMed ] [ Google Scholar ]
  • Galton F. Natural Inheritance. London: Macmillan; 1889. [ Google Scholar ]
  • Galton F. Eugenics: Its Definition, Scope, and Aims. Nature. 1904; 70 :82. [ Google Scholar ]
  • Gaziano JM. The Evolution of Population Science: Advent of the Mega Cohort. JAMA. 2010; 304 :2288–89. [ PubMed ] [ Google Scholar ]
  • Gibson JJ. The Ecological Approach to Visual Perception. Hillsdale, NJ: Erlbaum; 1986. [ Google Scholar ]
  • Giddens A, Held D, editors. Classes, Power, and Conflict: Classical and Contemporary Debates. Berkeley: University of California Press; 1982. [ Google Scholar ]
  • Gilbert SF, Epel D. Ecological Developmental Biology: Integrating Epigenetics, Medicine, and Evolution. Sunderland, MA: Sinauer Associates; 2009. [ Google Scholar ]
  • GLOBOCAN. 2012. Data Sources and Methods. International Agency for Research on Cancer, World Health Organization. Available at http://globocan.iarc.fr/ (accessed June 17, 2012)
  • Greenhalgh S. The Social Construction of Population Science: An Intellectual, Institutional, and Political History of Twentieth-Century Demography. Comparative Studies in Society and History. 1996; 38 :26–66. [ Google Scholar ]
  • Greenwood M. Epidemics and Crowd Diseases: An Introduction to the Study of Epidemiology. London: Williams & Norgate; 1935. [ Google Scholar ]
  • Grene M, Depew D. The Philosophy of Biology. Cambridge: Cambridge University Press; 2004. [ Google Scholar ]
  • Hacking I. The Emergence of Probability. Cambridge: Cambridge University Press; 1975. [ Google Scholar ]
  • Hacking I. The Taming of Chance. Cambridge: Cambridge University Press; 1990. [ Google Scholar ]
  • Hankins FH. Adolphe Quetelet as Statistician. New York: Arno Press; 1968. [ Google Scholar ]
  • Haraway DJ. When Species Meet. Minneapolis: University of Minnesota Press; 2008. [ Google Scholar ]
  • Harré R. Individual/Society: History of the Concept. In: Smelser NJ, Baltes PB, editors. International Encyclopedia of the Social & Behavioral Sciences. Oxford: Pergamon; 2001. pp. 7306–10. DOI: 10.1016/B0-08-043076-7/00125-X . [ Google Scholar ]
  • Harvey D. Justice, Nature, and the Geography of Difference. Cambridge, MA: Blackwell; 1996. [ Google Scholar ]
  • Heesterbeek H. The Law of Mass-Action in Epidemiology: A Historical Perspective. In: Cuddington K, Beisner BE, editors. Ecological Paradigms Lost: Routes of Theory Change. Burlington, MA: Elsevier Academic Press; 2005. pp. 81–106. [ Google Scholar ]
  • Heilbron J, Magnusson L, Wittrock B, editors. The Rise of the Social Sciences and the Formation of Modernity: Conceptual Change in Context, 1750–1850. Dordrecht: Kluwer Academic Publishers; 1998. [ Google Scholar ]
  • Hey J. Regarding the Confusion between the Population Concept and Mayr's “Population Thinking.” Quarterly Review of Biology. 2011; 86 :253–64. [ PubMed ] [ Google Scholar ]
  • Hodge J. Evolution. In: Bowler PJ, Pickstone JV, editors. The Modern Biological and Earth Sciences. Cambridge: Cambridge University Press; 2009. Cambridge Histories Online. DOI: 10.1017/CHOL9780521572019.015 . [ Google Scholar ]
  • Hogben L. Nature and Nurture. London: Williams & Norgate; 1933. [ Google Scholar ]
  • Hogben L. Major Greenwood: 1880–1949. Obituary Notices of Fellows of the Royal Society. 1950; 7 :138–54. [ Google Scholar ]
  • IHME (Institute for Health Metrics and Evaluation). The Challenge Ahead: Progress and Setbacks in Breast and Cervical Cancer. Seattle: 2011. [ Google Scholar ]
  • Illari PM, Russo F, Williamson J. Why Look at Causality in the Sciences? A Manifesto. In: Illari PM, Russo F, Williamson J, editors. Causality in the Sciences. Oxford: Oxford University Press; 2011. pp. 3–22. [ Google Scholar ]
  • Irwin A, Valentine N, Brown C, Loewenson R, Solar O, Brown H, Koller T, Vega J. The Commission on the Social Determinants of Health: Tackling the Social Roots of Health Inequities. PLoS Medicine. 2006; 3 (6):e106. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Isaac J. The Human Sciences in Cold War America. Historical Journal. 2007; 50 :725–46. [ Google Scholar ]
  • Jansson S, Douglas CJ. Populus: A Model System for Plant Biology. Annual Review of Plant Biology. 2007; 58 :435–58. [ PubMed ] [ Google Scholar ]
  • Jaslow R. CBS News. 2011. Breast, Cervical Cancer Rates Rising around World: Why? September 15, 2011. Available at http://www.cbsnews.com/8301-504763_162-20106719-10391704.html (accessed June 17, 2012) [ Google Scholar ]
  • Keller EF. The Century of the Gene. Cambridge, MA: Harvard University Press; 2000. [ Google Scholar ]
  • Keller EF. The Mirage of a Space between Nature and Nurture. Durham, NC: Duke University Press; 2010. [ Google Scholar ]
  • Kermack WO, McKendrick AG. Contributions to the Mathematical Theory of Epidemics, Part I. Proceedings of the Royal Society Series A. 1927; 115 :700–721. [ Google Scholar ]
  • Kevles D. In the Name of Eugenics: Genetics and the Uses of Human Heredity. New York: Knopf; 1985. [ Google Scholar ]
  • Kincaid H. Causal Modeling, Mechanisms, and Probability in Epidemiology. In: Illari PM, Russo F, Williamson J, editors. Causality in the Sciences. Oxford: Oxford University Press; 2011. pp. 70–90. [ Google Scholar ]
  • Klassen AC, Smith KC. The Enduring and Evolving Relationship between Social Class and Breast Cancer Burden: A Review of the Literature. Cancer Epidemiology. 2011; 35 :217–34. [ PubMed ] [ Google Scholar ]
  • Kolata G. New York Times. 2006. Reversing Trend, Big Drop Is Seen in Breast Cancer. December 15. Available at http://www.nytimes.com/2006/12/15/health/15breast.html?pagewanted=all (accessed June 17, 2012) [ Google Scholar ]
  • Kolata G. New York Times. 2007. Sharp Drop in Rates of Breast Cancer Holds. April 19. Available at http://query.nytimes.com/gst/fullpage.html?res=9a03e6d91e3ff93aa25757c0a9619c8b63 (accessed June 17, 2012) [ Google Scholar ]
  • Krieger N. Epidemiology and the Web of Causation: Has Anyone Seen the Spider? Social Science & Medicine. 1994; 39 :887–903. [ PubMed ] [ Google Scholar ]
  • Krieger N. Epidemiology and Social Sciences: Towards a Critical Reengagement in the 21st Century. Epidemiologic Reviews. 2000; 11 :155–63. [ PubMed ] [ Google Scholar ]
  • Krieger N. Theories for Social Epidemiology in the 21st Century: An Ecosocial Perspective. International Journal of Epidemiology. 2001; 30 :668–77. [ PubMed ] [ Google Scholar ]
  • Krieger N. Breast Cancer: A Disease of Affluence, Poverty, or Both?—The Case of African American Women. American Journal of Public Health. 2002; 92 :611–13. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Krieger N. Embodiment: A Conceptual Glossary for Epidemiology. Journal of Epidemiology & Community Health. 2005; 59 :350–55. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Krieger N. Ways of Asking and Ways of Living: Reflections on the 50th Anniversary of Morris’ Ever-Useful Uses of Epidemiology. International Journal of Epidemiology. 2007a; 36 :1173–80. [ PubMed ] [ Google Scholar ]
  • Krieger N. Why Epidemiologists Cannot Afford to Ignore Poverty. Epidemiology. 2007b; 18 :658–63. [ PubMed ] [ Google Scholar ]
  • Krieger N. Hormone Therapy and the Rise and Perhaps Fall of US Breast Cancer Incidence Rates: Critical Reflections. International Journal of Epidemiology. 2008; 37 :627–37. [ PubMed ] [ Google Scholar ]
  • Krieger N. Epidemiology and the People's Health: Theory and Context. New York: Oxford University Press; 2011. [ Google Scholar ]
  • Krieger N, Bassett M, Gomez S. Re: “Breast and Cervical Cancer in 187 Countries between 1980 and 2010.” The Lancet. 2012; 379 :1391–92. [ PubMed ] [ Google Scholar ]
  • Krieger N, Chen JT, Waterman PD. Decline in US Breast Cancer Rates after the Women's Health Initiative: Socioeconomic and Racial/Ethnic Differentials. American Journal of Public Health. 2010; 100 :S132–S139. erratum, 972. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Krieger N, Chen JT, Waterman PD. Temporal Trends in the Black/White Breast Cancer Case Ratio for Estrogen Receptor Status: Disparities Are Historically Contingent, Not Innate. Cancer Causes and Control. 2011; 22 :511–14. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Krieger N, Davey Smith G. Bodies Count & Body Counts: Social Epidemiology & Embodying Inequality. Epidemiologic Reviews. 2004; 26 :92–103. [ PubMed ] [ Google Scholar ]
  • Krieger N, Fee E. Measuring Social Inequalities in Health in the United States: An Historical Review, 1900–1950. International Journal of Health Services. 1996; 26 :391–418. [ PubMed ] [ Google Scholar ]
  • Krieger N, Williams D, Moss N. Measuring Social Class in US Public Health Research: Concepts, Methodologies and Guidelines. Annual Review of Public Health. 1997; 18 :341–78. [ PubMed ] [ Google Scholar ]
  • Kuhlmann M. Mechanisms in Dynamically Complex Systems. In: Illari PM, Russo F, Williamson J, editors. Causality in the Sciences. Oxford: Oxford University Press; 2011. pp. 880–906. [ Google Scholar ]
  • Kukull WA, Ganguli M. Generalizability: The Trees, the Forest, and the Low-Hanging Fruit. Neurology. 2012; 78 :1886–91. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Kunitz SJ. The Health of Populations: General Theories and Particular Realities. New York: Oxford University Press; 2007. [ Google Scholar ]
  • Kuzawa C. Why Evolution Needs Development, and Medicine Needs Evolution. International Journal of Epidemiology. 2012; 41 :223–29. [ PubMed ] [ Google Scholar ]
  • Lane-Claypon JE. A Further Report on Cancer of the Breast with Special Reference to Its Associated Antecedent Conditions. Reports on Public Health and Medical Subjects no. 32. London: HMSO; 1926. [ Google Scholar ]
  • Lewontin R. The Triple Helix: Gene, Organism, and Environment. Cambridge, MA: Harvard University Press; 2000. [ Google Scholar ]
  • Lilienfeld AM, editor. Times, Places, and Persons: Aspects of the History of Epidemiology. Baltimore: Johns Hopkins University Press; 1980. [ Google Scholar ]
  • Limpert E, Stahel WA, Abbt M. Log-Normal Distributions across the Sciences: Keys and Clues. BioScience. 2001; 51 :341–52. [ Google Scholar ]
  • Mackenzie D. Statistics in Britain, 1865–1930: The Social Construction of Scientific Knowledge. Edinburgh: Edinburgh University Press; 1982. [ Google Scholar ]
  • Martin J, Harré R. Metaphor in Science. In: Miall DS, editor. Metaphor: Problems and Perspectives. Sussex, NJ: Harvester Press; 1982. pp. 89–105. [ Google Scholar ]
  • Marx K. Theses on Feuerbach. 1845. First published, in an edited version, as an appendix to Engels F. Ludwig Feuerbach und der Ausgang der klassischen deutschen Philosophie. Mit Anhang: Karl Marx über Feuerbach vom Jahre 1845. Stuttgart: J.H.W. Dietz; 1888. Available at http://www.marxists.org/archive/marx/works/1845/theses/index.htm (2002 trans. by Cyril Smith) (accessed June 17, 2012) [ Google Scholar ]
  • Mayr E. Towards a New Philosophy of Biology: Observations of an Evolutionist. Cambridge, MA: Harvard University Press; 1988. [ Google Scholar ]
  • Mendelsohn JA. From Eradication to Equilibrium: How Epidemics Became Complex after World War I. In: Lawrence C, Weisz G, editors. Greater Than the Parts: Holism in Biomedicine, 1920–1950. New York: Oxford University Press; 1998. pp. 303–31. [ Google Scholar ]
  • Mitchell M. Complexity: A Guided Tour. Oxford: Oxford University Press; 2009. [ Google Scholar ]
  • Morange M. The Misunderstood Gene. Cambridge, MA: Harvard University Press; 2001. [ Google Scholar ]
  • Morris JN. Uses of Epidemiology. Edinburgh: E. & S. Livingstone; 1957. [ Google Scholar ]
  • Mountain JL. Human Evolutionary Genetics. In: Smelser NJ, Baltes PB, editors. International Encyclopedia of the Social & Behavioral Sciences. Oxford: Pergamon; 2001. pp. 6984–91. DOI: 10.1016/B0-08-043076-7/03088-6 . [ Google Scholar ]
  • Nash K, Scott A, editors. The Blackwell Companion to Political Sociology. Malden, MA: Blackwell; 2001. [ Google Scholar ]
  • OED (Oxford English Dictionary) online. 2010. Draft revision June. Available at http://dictionary.oed.com.ezp-prod1.hul.harvard.edu/ (accessed June 17, 2012)
  • Parkin DM, Fernández LMG. Use of Statistics to Assess the Global Burden of Breast Cancer. Breast Journal. 2006; 12 (suppl. 1):S70–S80. [ PubMed ] [ Google Scholar ]
  • Pearce N. Epidemiology as a Population Science. International Journal of Epidemiology. 1999; 28 :S1015–S18. [ PubMed ] [ Google Scholar ]
  • Pflughoeft KJ, Versalovic J. Human Microbiome in Health and Disease. Annual Review of Pathology: Mechanisms of Disease. 2012; 7 :99–122. [ PubMed ] [ Google Scholar ]
  • Piersma T, van Gils JA. The Flexible Phenotype: A Body-Centered Integration of Ecology, Physiology, and Behavior. New York: Oxford University Press; 2011. [ Google Scholar ]
  • Pizzi C, De Stavola B, Merletti F, Bellocco R, dos Santos Silva I, Pearce N, Richiardi L. Sample Selection and Validity of Exposure-Disease Association Estimates in Cohort Studies. Journal of Epidemiology & Community Health. 2011; 65 :407–11. [ PubMed ] [ Google Scholar ]
  • Porta M, editor. A Dictionary of Epidemiology. 5th ed. Oxford: Oxford University Press; 2008. [ Google Scholar ]
  • Porter TM. A Statistical Survey of Gases: Maxwell's Social Physics. Historical Studies in the Physical Sciences. 1981; 12 :77–116. [ Google Scholar ]
  • Porter TM. The Rise of Statistical Thinking, 1820–1900. Princeton, NJ: Princeton University Press; 1986. [ Google Scholar ]
  • Porter TM. Trust in Numbers: The Pursuit of Objectivity in Science and Public Life. Princeton, NJ: Princeton University Press; 1995. [ Google Scholar ]
  • Porter TM. Statistics and Physical Theories. In: Nye MJ, editor. The Modern Physical and Mathematical Sciences. Cambridge: Cambridge University Press; 2002. Cambridge Histories Online. DOI: 10.1017/CHOL9780521571999.027 . [ Google Scholar ]
  • Porter TM. Statistics and Statistical Methods. In: Porter TM, Ross D, editors. The Modern Social Sciences. Cambridge: Cambridge University Press; 2003. Cambridge Histories Online. DOI: 10.1017/CHOL9780521594424.015 . [ Google Scholar ]
  • Press DJ, Pharoah P. Risk Factors for Breast Cancer: A Reanalysis of Two Case-Control Studies from 1926 and 1931. Epidemiology. 2010; 21 :566–72. [ PubMed ] [ Google Scholar ]
  • Quetelet A. Sur l’homme et le développement de ses facultés, ou essai de physique sociale. Paris: 1835. For a translation, see Quetelet A. A Treatise on Man and the Development of His Faculties. Knox R, translator. Reprint, New York: Burt Franklin; 1968. (1842) [ Google Scholar ]
  • Quetelet A. Recherches statistiques. Brussels: M. Hayez (Imprimeur de la Commission centrale de statistique); 1844. [ Google Scholar ]
  • Ramsden E. Carving Up Population Science: Eugenics, Demography and the Controversy over the “Biological Law” of Population Growth. Social Studies of Science. 2002; 32 :857–99. [ Google Scholar ]
  • Ravdin PM, Cronin KA, Howlader N, Berg CD, Chlebowski RT, Feuer EJ, Edwards BK, Berry DA. The Decrease in Breast-Cancer Incidence in 2003 in the United States. New England Journal of Medicine. 2007; 356 :1670–74. [ PubMed ] [ Google Scholar ]
  • Ravdin PM, Cronin KA, Howlader N, Chlebowski RT, Berry DA. A Sharp Decrease in Breast Cancer Incidence in the United States in 2003. Breast Cancer Research and Treatment. 2006; 100 (suppl) S2 (abstract) [ Google Scholar ]
  • Relton CL, Davey Smith G. Is Epidemiology Ready for Epigenetics? International Journal of Epidemiology. 2012; 41 :5–9. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Richards RA. Encyclopedia of Life Sciences. New York: Wiley; 2001. Species Problem—A Philosophical Analysis. (online 2007). DOI: 10.1002/9780470015902.a0003456 . [ Google Scholar ]
  • Rose D, Pevalin DJ, editors. A Researcher's Guide to the National Statistics Socio-economic Classification. London: Sage; 2003. [ PubMed ] [ Google Scholar ]
  • Rose GA. Sick Individuals and Sick Populations. International Journal of Epidemiology. 1985; 14 :32–38. [ PubMed ] [ Google Scholar ]
  • Rose GA. The Strategy of Preventive Medicine. Oxford: Oxford University Press; 1992. [ Google Scholar ]
  • Rose GA. Rose's Strategy of Preventive Medicine: The Complete Original Text, with a Commentary by Kay-Tee Khaw and Michael Marmot. Oxford: Oxford University Press; 2008. [ Google Scholar ]
  • Rosen G. A History of Public Health. Baltimore: Johns Hopkins University Press; 1993. (1958) Expanded ed. Introduction by E. Fee; biographical essay and new bibliography by E.T. Morman. [ Google Scholar ]
  • Ross D. Changing Contours of the Social Science Disciplines. In: Porter TM, Ross D, editors. The Modern Social Sciences. Cambridge: Cambridge University Press; 2003. pp. 275–305. [ Google Scholar ]
  • Rossouw JE, Anderson GL, Prentice RL, LaCroix AZ, Kooperberg C, Stefanick ML, Jackson RD, Beresford SA, Howard BV, Johnson KC, Kotchen JM, Ockene J, Writing Group for the Women's Health Initiative Investigators Risk and Benefits of Estrogen plus Progestin in Healthy Postmenopausal Women: Principal Results from the Women's Health Initiative Randomized Controlled Trial. JAMA. 2002; 288 :321–33. [ PubMed ] [ Google Scholar ]
  • Rothman KJ, Greenland S, Lash TL. Modern Epidemiology. 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2008. [ Google Scholar ]
  • Sarkar S. Lancelot Hogben, 1895–1975. Genetics. 1996; 142 :655–60. [ Google Scholar ]
  • Schank JC, Twardy C. Mathematical Models. In: Bowler PJ, Pickstone JV, editors. The Modern Biological and Earth Sciences. Cambridge: Cambridge University Press; 2009. Cambridge Histories Online. DOI: 10.1017/CHOL9780521572019.023 . [ Google Scholar ]
  • Schlesselman JJ, Stadel BV. Exposure Opportunity in Epidemiologic Studies. American Journal of Epidemiology. 1987; 125 :174–78. [ PubMed ] [ Google Scholar ]
  • Scott J, Marshall G, editors. A Dictionary of Sociology. 3rd ed. Oxford: Oxford University Press; 2005. [ Google Scholar ]
  • Shaghaghi A, Bhopal RJ, Sheik A. Approaches to Recruiting “Hard-to-Reach” Populations in Research: Review of the Literature. Health Promotion Perspectives. 2011; 1 (2):1–9. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Smith GD. The Uses of “Uses of Epidemiology.” International Journal of Epidemiology. 2001; 30 :1146–55. [ PubMed ] [ Google Scholar ]
  • Smylie J, Lofters A, Firestone M, O’Campo P. Population-Based Data and Community Empowerment. In: O’Campo P, Dunn JR, editors. Rethinking Social Epidemiology: Towards a Science of Change. Dordrecht: Springer Science+Business Media B.V; 2012. pp. 68–92. [ Google Scholar ]
  • Stanley D, Phelps AE, Banaji MR. The Neural Basis of Implicit Attitudes. Current Directions in Psychological Science. 2008; 17 :165–70. [ Google Scholar ]
  • Steinman E. Sovereigns and Citizens? The Contested Status of American Indian Tribal Nations and Their Members. Citizenship Studies. 2011; 15 :57–74. [ Google Scholar ]
  • Stigler SM. The History of Statistics: The Measurement of Uncertainty before 1900. Cambridge, MA: Belknap Press /Harvard University Press; 1986. [ Google Scholar ]
  • Stigler SM. Regression towards the Mean, Historically Considered. Statistical Methods in Medical Research. 1997; 6 :103–14. [ PubMed ] [ Google Scholar ]
  • Stigler SM. The Average Man Is 168 Years Old. In: Stigler SM, editor. Statistics on the Table: The History of Statistical Concepts and Methods. Cambridge, MA: Harvard University Press; 2002. pp. 51–65. [ Google Scholar ]
  • Stiglitz J. Freefall: America, Free Markets, and the Sinking World Economy. New York: Norton; 2010. [ Google Scholar ]
  • Strevens M. Bigger Than Chaos: Understanding Complexity through Probability. Cambridge, MA: Harvard University Press; 2003. [ Google Scholar ]
  • Susser M, Stein Z. Eras in Epidemiology: The Evolution of Ideas. New York: Oxford University Press; 2009. [ Google Scholar ]
  • Svensson P-G. Special Issue: Health Inequities in Europe. Social Science & Medicine. 1990; 31 :225–27. [ PubMed ] [ Google Scholar ]
  • Sydenstricker E. Health and Environment. New York: McGraw-Hill; 1933. [ Google Scholar ]
  • Tabery J. R.A. Fisher, Lancelot Hogben, and the Origin(s) of Genotype-Environment Interaction. Journal of the History of Biology. 2008; 41 :717–61. [ PubMed ] [ Google Scholar ]
  • Turner JH. A New Approach for Theoretically Integrating Micro and Macro Analyses. In: Calhoun C, Rojek C, Turner B, editors. The Sage Handbook of Sociology. Thousand Oaks, CA: Sage; 2005. pp. 405–22. [ Google Scholar ]
  • U.S. Citizenship and Immigration Services. 2012. Citizenship. Available at http://www.uscis.gov/portal/site/uscis/ (accessed June 17, 2012)
  • Vona-Davis L, Rose DP. The Influence of Socioeconomic Disparities on Breast Cancer Tumor Biology and Prognosis: A Review. Journal of Women's Health. 2009; 18 :883–93. [ PubMed ] [ Google Scholar ]
  • Wainwright JM. A Comparison of Conditions Associated with Breast Cancer in Great Britain and America. American Journal of Cancer. 1931; 15 :2610–45. [ Google Scholar ]
  • Wallace TA, Martin DN, Ambs S. Interactions among Genes, Tumor Biology and the Environment in Cancer Health Disparities: Examining the Evidence on a National and Global Scale. Carcinogenesis. 2011; 32 :1107–21. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Walter J, Ley R. The Human Gut Microbiome: Ecology and Recent Evolutionary Changes. Annual Review of Microbiology. 2011; 65 :411–29. [ PubMed ] [ Google Scholar ]
  • Weiss KM, Long JC. Non-Darwinian Estimation: My Ancestors, My Genes’ Ancestors. Genome Research. 2009; 19 :703–10. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Werskey G. In: The Visible College: A Collective Biography of British Scientists and Socialists of the 1930s. Young RM, editor. London: Free Association Books; 1988. Foreword by. [ Google Scholar ]
  • West-Eberhard MT. Developmental Plasticity and Evolution. New York: Oxford University Press; 2003. [ Google Scholar ]
  • Whitehead M. The Concepts and Principles of Equity and Health. International Journal of Health Services. 1992; 22 :429–45. [ PubMed ] [ Google Scholar ]
  • WHO (World Health Organization) Closing the Gap in a Generation: Health Equity through Action on the Social Determinants of Health. 2008. Commission on the Social Determinants of Health—Final Report. Geneva. Available at http://www.who.int/social_determinants/thecommission/finalreport/en/index.html (accessed June 17, 2012) [ PubMed ] [ Google Scholar ]
  • WHO (World Health Organization) 2011. Rio Political Declaration on Social Determinants of Health. Rio de Janeiro, October 21. Available at http://www.who.int/sdhconference/declaration/en/index.html (accessed June 17, 2012)
  • Wiehl DG. Edgar Sydenstricker: A Memoir. In: Kasius RV, editor. The Challenge of the Facts: Selected Public Health Papers of Edgar Sydenstricker. New York: Prodist, for the Milbank Memorial Fund; 1974. pp. 1–17. [ Google Scholar ]
  • Williams R. Keywords: A Vocabulary of Culture and Society. Rev. ed. New York: Oxford University Press; 1985. [ Google Scholar ]
  • Wimmer A, Schiller NG. Methodological Nationalism and Beyond: Nation-State, Migration, and the Social Sciences. Global Networks. 2002; 4 :301–34. [ Google Scholar ]
  • Winkelstein W., Jr . Oxford Dictionary of National Biography. Oxford: Oxford University Press; 2004. Claypon, Janet Elizabeth Lane- [married name Janet Elizabeth Forber, Lady Forber] (1877–1967) Available at http://www.oxforddnb.com.ezp-prod1.hul.harvard.edu/view/article/61714 (accessed June 17, 2012) [ Google Scholar ]
  • Winslow C-EA, Smillie WG, Doull JA, Gordon JE. In: The History of American Epidemiology. Top FH, editor. Mosby; 1952. Sponsored by the Epidemiology Section, American Public Health Association. St. Louis. [ Google Scholar ]
  • Wright EO, editor. Approaches to Class Analysis. Cambridge: Cambridge University Press; 2005. [ Google Scholar ]
  • Wright S. The Relative Importance of Heredity and Environment in Determining the Pie-Bald Pattern of Guinea-Pigs. 1920; 6 :320–32. Proceedings of the National Academy of Sciences . [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Yeo EJ. Social Surveys in the Eighteenth and Nineteenth Centuries. In: Porter TM, Ross D, editors. The Modern Social Sciences. Cambridge: Cambridge University Press; 2003. Cambridge Histories Online. DOI: 10.1017/CHOL9780521594424.007 . [ Google Scholar ]
  • Young TK. Population Health: Concepts and Methods. 2nd ed. New York: Oxford University Press; 2005. [ Google Scholar ]
  • Zbuk K, Anand SS. Declining Incidence of Breast Cancer after Decreased Use of Hormone-Replacement Therapy: Magnitude and Time Lags in Different Countries. Journal of Epidemiology & Community Health. 2012; 66 :1–7. [ PubMed ] [ Google Scholar ]
  • Ziman J. Real Science: What It Is and What It Means. Cambridge: Cambridge University Press; 2000. [ Google Scholar ]
  • Zinn H. A People's History of the United States: 1492–Present. New York: HarperCollins; 2003. [ Google Scholar ]

3. Populations and samples

Populations, unbiasedness and precision, randomisation, variation between samples, standard error of the mean.

A population is defined as a group of individuals of the same species living and interbreeding within a given area. Members of a population often rely on the same resources, are subject to similar environmental constraints, and depend on the availability of other members to persist over time. Scientists study a population by examining how individuals in that population interact with each other and how the population as a whole interacts with its environment. As a tool for objectively studying populations, population ecologists rely on a series of statistical measures, known as demographic parameters, to describe that population (Lebreton et al. 1992). The field of science interested in collecting and analyzing these numbers is termed population demographics, also known as demography.

Broadly defined, demography is the study of the characteristics of populations. It provides a mathematical description of how those characteristics change over time. Demographics can include any statistical factors that influence population growth or decline, but several parameters are particularly important: population size, density, age structure, fecundity (birth rates), mortality (death rates), and sex ratio (Dodge 2006). We introduce each of these in turn.

Population Size

Populations display distinctive behaviors based on their size. Small populations face a greater risk of extinction (Caughley 1994). Individuals in these populations can have a hard time finding quality mates, so fewer individuals mate and those that do risk inbreeding (Hamilton 1967). Additionally, individuals in small populations are more susceptible to random deaths. Events like fires, floods, and disease have a greater chance of killing all individuals in the population.

Large populations experience their own problems. As they approach the maximum sustainable population size, known as carrying capacity, large populations show characteristic behavior. Populations nearing their carrying capacity experience greater competition for resources, shifts in predator-prey relationships, and lowered fecundity. If the population grows too large, it may begin to exceed the carrying capacity of the environment and degrade available habitat (Figure 1).

Figure 1: Swarms of locusts exceed carrying capacity with huge population sizes. These short-lived spikes in population size produce swarms capable of destroying farms as they move across the agricultural landscapes, eating everything in their path. Photo courtesy of Compton Tucker/NASA GSFC.
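The article does not give a formula for carrying capacity, but one common textbook way to formalize the idea is the discrete logistic growth model. The sketch below assumes that model; the growth rate `r`, carrying capacity `k`, and starting size are illustrative values, not data from the text.

```python
def logistic_growth(n0, r, k, steps):
    """Return population sizes under discrete logistic growth.

    n0: initial size, r: per-step growth rate, k: carrying capacity.
    All values passed below are illustrative assumptions.
    """
    sizes = [n0]
    for _ in range(steps):
        n = sizes[-1]
        # Growth slows as n approaches k and turns negative once n exceeds k.
        sizes.append(n + r * n * (1 - n / k))
    return sizes


trajectory = logistic_growth(n0=10, r=0.5, k=1000, steps=40)
print(round(trajectory[-1]))
```

With these settings the population rises quickly while it is small and levels off near `k`; starting above `k` instead makes the next step's size fall, which mirrors the decline described for populations that overshoot their carrying capacity.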

Population Density

A more complete description of a population's size includes the population density: the size of a population in relation to the amount of space that it occupies. Density is usually expressed as the number of individuals per unit area or volume (Lebreton et al. 1992). For example: the number of crows per square kilometer or the number of plankton per liter (Andren 1992, Sterner 1986). Like all population properties, density is a dynamic characteristic that changes over time as individuals are added to or removed from the population. Closely related species of Gannet birds will maintain very different densities (Figure 2). Birth and immigration (the influx of new individuals from other areas) can increase a population's density, while death and emigration (the movement of individuals out of a population to other areas) can decrease its density (Lebreton et al. 1992).

Figure 2: Gannets can persist at very high densities. They have developed exaggerated territorial behavior as an adaptation to sustain these densely packed colonies. Photo courtesy of Follash via Wikimedia Commons.

Similar to population size, population density displays distinctive characteristics at both high and low values. Density-dependent factors, including competition, predation, migration and disease, intensify within populations as density increases. In contrast, density-independent factors, such as weather, fire regimes, and flooding, impact populations regardless of their specific densities (Lebreton et al. 1992).
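The density bookkeeping described in this section can be sketched in a few lines. The function names and all counts below are hypothetical, chosen only to show how births, immigration, deaths, and emigration move density up or down within a fixed area.

```python
def population_density(individuals, area):
    """Individuals per unit area (or volume)."""
    if area <= 0:
        raise ValueError("area must be positive")
    return individuals / area


def update_size(size, births, immigrants, deaths, emigrants):
    """Births and immigration add individuals; deaths and emigration remove them."""
    return size + births + immigrants - deaths - emigrants


# Hypothetical counts for a 25 km^2 study area.
size = 200
area_km2 = 25.0
before = population_density(size, area_km2)
size = update_size(size, births=30, immigrants=5, deaths=20, emigrants=10)
after = population_density(size, area_km2)
print(before, after)
```

Because the area is held constant, any net gain in individuals shows up directly as a higher density.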

Age Structure

Age structure can be represented graphically with a population pyramid (Figure 3). Although a population's age structure is not always pyramidal in shape, most populations have younger cohorts that are larger than older cohorts. For example, Sherman and Morton's studies of the Tioga Pass Belding's ground squirrels revealed birth cohorts larger than 300 individuals and fewer than 10 individuals in cohorts over the age of six (Sherman & Morton 1984).

While maximum fecundity is a constant for populations, realized fecundity varies over time based on the size, density, and age structure of the population. External conditions, such as food and habitat availability, can also influence fecundity. Density-dependent regulation provides a negative feedback if the population grows too large, by reducing birth rates and halting population growth through a host of mechanisms (Lebreton et al. 1992). In white-footed mice, for example, populations regulate their reproductive rate via a stress hormone. As population densities increase, so do aggressive interactions between individuals (even when food and shelter are unlimited). High population densities lead to frequent aggressive encounters, triggering a stress syndrome in which hormonal changes delay sexual maturation, cause reproductive organs to shrink, and depress the immune system (Krohne 1984).

To visualize mortality and fecundity within a population, ecologists create life tables to display age-specific statistical summaries of a population's survival patterns. First developed by Roman actuaries, life tables were used to estimate how long individuals of a particular age category were expected to live in order to value life insurance products (Trenerry 1926). Raymond Pearl (1928) first introduced the life table to biology when he applied it to laboratory studies of the fruit fly, Drosophila. Life tables are particularly useful for species with discrete developmental stages and mortality rates that vary widely from one stage to the next (Figure 5).

Figure 5: The Green Stink Bug (Nezara viridula) assumes different body forms through metamorphosis between different discrete life stages. Photo courtesy of Jovo26 via Wikimedia Commons.
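As a rough illustration of what a life table contains, the sketch below builds a simple cohort life table from counts of survivors at the start of each age class. The column names (`n_x`, `l_x`, `q_x`) follow common ecological convention rather than anything stated in the article, the cohort counts are invented, and the code assumes that no individual survives beyond the last age class listed.

```python
def life_table(survivors):
    """Build a cohort life table from counts alive at the start of each age class.

    Assumes every individual in the final age class dies before the next one.
    """
    n0 = survivors[0]
    rows = []
    for age, n_x in enumerate(survivors):
        n_next = survivors[age + 1] if age + 1 < len(survivors) else 0
        d_x = n_x - n_next  # deaths during this age class
        rows.append({
            "age": age,
            "n_x": n_x,
            "l_x": n_x / n0,                   # proportion surviving to age x
            "q_x": d_x / n_x if n_x else 0.0,  # age-specific mortality rate
        })
    return rows


# Hypothetical cohort: 1000 eggs with heavy mortality in the early stages.
rows = life_table([1000, 400, 200, 150, 50])
for row in rows:
    print(row)
```

Reading down the `q_x` column shows how mortality differs from one stage to the next, which is the pattern life tables are meant to expose.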

Interestingly, sex ratio is not always random but can be manipulated at birth by environmental or physiological mechanisms. All crocodilians and many other reptiles use a strategy called environmental sex determination, in which incubation temperature determines the sex of each individual (Delmas et al. 2008). In many turtles, for example, low temperatures produce males and high temperatures produce females. In times of limited resources or high population densities, females can manipulate the sex ratios of their clutch by spending more or less time incubating their eggs (Girondot et al. 2004).

age-specific : The age of the individual is important for statistical purposes.

clutch size : The number of offspring one female produces in one reproductive cycle.

cohort : Group of all individuals sharing a statistical factor (such as age or developmental stage)

density-dependent factors : Depending on the local density of the population

density-independent factors : Not linked to the local density of the population

discrete developmental stages : Non-overlapping and structurally distinct growth stages. E.g. tadpoles are one discrete developmental stage and adult frogs are another.

ecosystem : A natural system including the interaction of all living and non-living elements.

extinction : No longer existing.

extrapolating : Estimating an unknown value by assuming that a known value can translate (without distortion) to the scale of the unknown value.

growth rate : The rate of change of population size over time.

inbreeding : Breeding of closely related individuals, often with negative genetic consequences.

incubated : Provided with a heat source during embryonic development.

life tables : Specific format of statistical summary of demographic parameters.

migration : Populations moving from one geographic location to another.

objectively : To study without bias and by measurable and repeatable metrics.

offspring : The individual produced from the reproduction of its parents.

parameter : A value in an equation that does not vary. These values can change between different equations of similar form.

predator-prey relationships : How populations of predators are interacting with populations of prey.

predation : The act of killing another living organism for food.

physiological : The parts and functions of living organisms.

reproductive organs : Specialized collection of cells used to exchange gametes between sexually reproducing organisms.

rates : A mathematical term for the number of things or events happening in a given amount of time.

rearing : To invest energy in the growth and development of offspring after they are born.

subjectively designated geographic range : A parcel of land, the size of which is chosen without using standardized criteria. Picked at the discretion of the researcher.

sustainable : A system able to maintain itself indefinitely without supplement.

sexual maturation : An individual reaching a stage of development where it is able to sexually reproduce.

stress hormone : Chemical compounds synthesized in the body to chemically communicate a stress reaction to various systems within that organism.

statistic : A number acting as a description for more numbers.

References and Recommended Reading

Andren, H. Corvid density and nest predation in relation to forest fragmentation: A landscape perspective. Ecology 73, 794-804 (1992).

Bull, J. Evolution of environmental sex determination from genotypic sex determination. Heredity 47, 173-184 (1981).

Caughley, G. Directions in conservation biology. Journal of Animal Ecology 63, 215-244 (1994).

Delmas, V., Pieau, C. & Girondot, M. A mechanistic model of temperature-dependent sex determination in a chelonian, the European pond turtle. Functional Ecology 22, 84-93 (2008).

Dodge, Y. The Oxford Dictionary of Statistical Terms. Oxford, UK: Oxford University Press, 2006.

Benrey, B & Denno, R. F. The slow-growth-high-mortality hypothesis: A test using the cabbage butterfly. Ecology 78, 987-999 (1997).

Girondot, M. et al . "Implications of temperature-dependent sex determination for population dynamics," Temperature-Dependent Sex Determination in Vertebrates , 148-155, eds. N. Valenzuela & V. Lance. Smithsonian Books, 2004.

Hamilton, W. D. Extraordinary sex ratios. Science 156, 477-488 (1967).

Harcombe, P. A. Tree life tables. BioScience 37, 557-568 (1987).

Hutchinson, G. E. Population studies: Animal ecology and demography. Bulletin of Mathematical Biology 53, 193-213 (1991).

Jiguet, F. et al . Bird population trends are linearly affected by climate change along species thermal ranges. Proceedings of the Royal Society B: Biological Sciences 277, 3601-3608 (2010).

Krohne, D. T., Dubbs, B. A. & Baccus, R. An analysis of dispersal in an unmanipulated population of Peromyscus leucopus. American Midland Naturalist 112, 146-156 (1984).

Lebreton, J-D. et al . Modeling survival and testing biological hypotheses using marked animals: A unified approach with case studies. Ecological Monographs 62, 67-118 (1992).

Martin, T. E. Avian life history evolution in relation to nest site, nest predation, and food. Ecological Monographs 65, 101-127 (1995).

Pearl, R. The Rate of Living, Being an Account of Some Experimental Studies on the Biology of Life Duration . New York, NY: Alfred A. Knopf, 1928.

Stearns, S. C. Life history tactics: A review of the ideas. The Quarterly Review of Biology 51, 3-47 (1976).

Sterner, R. Herbivores' direct and indirect effects on algal populations. Science 231, 605-607 (1986).

Trenerry, C. F. The Origin and Early History of Insurance, Including the Contract of Bottomry. London, UK: P. S. King & Son, 1926.

© 2014 Nature Education

Population vs Sample | Definitions, Differences & Examples

Published on 3 May 2022 by Pritha Bhandari. Revised on 5 December 2022.

Population vs sample

A population is the entire group that you want to draw conclusions about.

A sample is the specific group that you will collect data from. The size of the sample is always less than the total size of the population.

In research, a population doesn’t always refer to people. It can mean a group containing elements of anything you want to study, such as objects, events, organisations, countries, species, or organisms.

Table of contents

  • Collecting data from a population
  • Collecting data from a sample
  • Population parameter vs sample statistic
  • Practice questions: populations vs samples
  • Frequently asked questions about samples and populations

Populations are used when your research question requires, or when you have access to, data from every member of the population.

Usually, it is only straightforward to collect data from a whole population when it is small, accessible and cooperative.

For larger and more dispersed populations, it is often difficult or impossible to collect data from every individual. For example, every 10 years, the federal US government aims to count every person living in the country using the US Census. This data is used to distribute funding across the nation.

However, historically, marginalised and low-income groups have been difficult to contact, locate, and encourage participation from. Because of non-responses, the population count is incomplete and biased towards some groups, which results in disproportionate funding across the country.

In cases like this, sampling can be used to make more precise inferences about the population.

When your population is large in size, geographically dispersed, or difficult to contact, it’s necessary to use a sample. With statistical analysis, you can use sample data to make estimates or test hypotheses about population data.

Ideally, a sample should be randomly selected and representative of the population. Using probability sampling methods (such as simple random sampling or stratified sampling) reduces the risk of sampling bias and enhances both internal and external validity.
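The two probability sampling methods named above can be sketched with Python's standard `random` module. The sampling frame, the stratum labels, and the 10% sampling fraction are hypothetical, and the stratified version assumes proportional allocation (the same fraction drawn from every stratum), which is only one of several possible designs.

```python
import random


def simple_random_sample(population, n, seed=0):
    """Every unit has the same chance of selection; drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, fraction, seed=0):
    """Draw the same fraction at random from each stratum (proportional allocation)."""
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical sampling frame: 800 undergraduates and 200 postgraduates.
frame = [("ug", i) for i in range(800)] + [("pg", i) for i in range(200)]
srs = simple_random_sample(frame, 100)
strat = stratified_sample(frame, stratum_of=lambda u: u[0], fraction=0.1)
print(sum(u[0] == "pg" for u in srs), sum(u[0] == "pg" for u in strat))
```

The simple random sample contains postgraduates in roughly, but not exactly, their population share, while the stratified sample fixes each group's share by design.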

For practical reasons, researchers often use non-probability sampling methods . Non-probability samples are chosen for specific criteria; they may be more convenient or cheaper to access. Because of non-random selection methods, any statistical inferences about the broader population will be weaker than with a probability sample.

Reasons for sampling

  • Necessity : Sometimes it’s simply not possible to study the whole population due to its size or inaccessibility.
  • Practicality : It’s easier and more efficient to collect data from a sample.
  • Cost-effectiveness : There are fewer participant, laboratory, equipment, and researcher costs involved.
  • Manageability : Storing and running statistical analyses on smaller datasets is easier and more reliable.

When you collect data from a population or a sample, there are various measurements and numbers you can calculate from the data. A parameter is a measure that describes the whole population. A statistic is a measure that describes the sample.

You can use estimation or hypothesis testing to estimate how likely it is that a sample statistic differs from the population parameter.

Sampling error

A sampling error is the difference between a population parameter and a sample statistic. For example, in a study of political attitudes among undergraduate students in the Netherlands, the sampling error is the difference between the mean political attitude rating of your sample and the true mean political attitude rating of all undergraduate students in the Netherlands.

Sampling errors happen even when you use a randomly selected sample. This is because random samples are not identical to the population in terms of numerical measures like means and standard deviations.

Because the aim of scientific research is to generalise findings from the sample to the population, you want the sampling error to be low. You can reduce sampling error by increasing the sample size.
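A small simulation can make the link between sample size and sampling error concrete. The population below is simulated (10,000 ratings drawn from a normal distribution with an assumed mean of 50 and standard deviation of 10), so the numbers are purely illustrative; the point is the trend across sample sizes.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 10,000 simulated attitude ratings.
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = statistics.fmean(population)  # population parameter


def mean_abs_sampling_error(n, repeats=200):
    """Average absolute gap between the sample mean and the population mean."""
    errors = []
    for _ in range(repeats):
        sample = rng.sample(population, n)  # simple random sample of size n
        errors.append(abs(statistics.fmean(sample) - mu))
    return statistics.fmean(errors)


for n in (10, 100, 1000):
    print(n, round(mean_abs_sampling_error(n), 2))
```

Each tenfold increase in sample size shrinks the typical error, though by less than a factor of ten, which is consistent with the standard error of the mean falling in proportion to the square root of the sample size.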

Samples are used to make inferences about populations. Samples are easier to collect data from because they are practical, cost-effective, convenient, and manageable.

Populations are used when a research question requires data from every member of the population. This is usually only feasible when the population is small and easily accessible.

A statistic refers to measures about the sample, while a parameter refers to measures about the population.

A sampling error is the difference between a population parameter and a sample statistic.

Cite this Scribbr article

Bhandari, P. (2022, December 05). Population vs Sample | Definitions, Differences & Examples. Scribbr. Retrieved 22 April 2024, from https://www.scribbr.co.uk/research-methods/population-versus-sample/

Population Research: Definition, Examples

What is Population Research?

Population research is scientific inquiry into population dynamics: a population’s size, structure, growth, distribution, and dispersal.

In contrast, family planning research inquires about contraception, side effects, follow-up, etc.

Reproductive health encompasses maternal, adolescent, antenatal, postnatal, and delivery care. Child health includes, among others, nutrition, growth monitoring, breastfeeding, immunization, diarrhoeal diseases, etc.

Examples of Population Research

Example #1: (Population dynamics)

Kabir et al. (1997), utilizing the data from various sources, examined the prospect of stabilization of the Bangladesh population under alternative demographic scenarios.

They further discussed opportunities for achieving zero population growth, that is, a net reproduction rate (NRR) of 1, by the end of 2010. They concluded that if the fertility target could be achieved as envisaged, the Bangladesh population would be more or less stable by 2050.

Example #2: (Population/Demography)

While substantial fertility decline has started to take place in other countries of the South Asian region, Bangladesh has shown only a slight decrease in the prevailing high fertility rates.

Several demographers, economists, and sociologists have emphasized the role of demand for children as an essential source of change in the reproductive behavior of individuals (Bulatao, 1981; Pullum, 1983; Bulatao and Lee, 1983; Pritchett, 1994).

In Bangladesh, many social scientists believe that the demand for children is still high, which keeps fertility levels high; couples prefer to have more children.

It is thus imperative to have an insight into the fertility preferences maintained by the people that are considered to have an essential bearing on fertility outcomes and contraceptive use behavior.

Example #3: (Family Planning)

Duston and Miller (1995) initiated research to ascertain how to improve community-based family planning services and the potential for increasing contraceptive prevalence in Bangladesh.

The study’s specific objectives were to investigate the degree to which improved service delivery in Bangladesh can increase contraceptive use given the present status of demand and programmatic factors most associated with increased prevalence and make these projects viable and more widely known.

The National Institute of Population Research and Training (NIPORT), under the Ministry of Health and Family Welfare, Government of Bangladesh, conducts studies encompassing health, nutrition, family planning, and reproductive health.

In a recent search for priority research, NIPORT identified a few areas and prepared a list to execute the studies in 2017-18.

How does population research differ from family planning research?

While population research focuses on understanding population dynamics, family planning research specifically inquires about topics like contraception, side effects, follow-up, and other related aspects.

Population Definition in Statistics and How to Measure It

In statistics, a population is the pool from which a sample is drawn for a study. Thus, any selection grouped by a common feature can be considered a population. A sample is a subset of the population selected for study.

Key Takeaways

  • In statistics, a population is the entire group on which data is being gathered and analyzed.
  • It is generally difficult in terms of cost and time to gather the data needed on an entire population, so samples are often used to make inferences about a population.
  • A sample of a population must be randomly selected for the results of the study to accurately reflect the whole.

Statisticians, scientists, and analysts prefer to know the characteristics of every entity in a population to draw the most precise conclusions possible. However, this is impossible or impractical most of the time since population sets tend to be quite large. A sample of a population must usually be taken since the characteristics of every individual in a population cannot be measured due to constraints of time, resources, and accessibility.

It's important to note that when referring to an individual in statistics, the term does not always mean a person. Statistically, an individual is a single entity in the group being studied.

For example, there is no real way to gather data on all of the great white sharks in the ocean (a population) because finding and tagging each one isn't feasible. So, marine biologists tag the great whites they can (a sample) and begin collecting information on them to make inferences about the entire population of great whites. This is treated as a random sampling approach because the initial encounters with the great whites that get tagged are unplanned, although sharks that are easier to find are more likely to end up in the sample.

A valid statistic may be drawn from either a sample or a study of an entire population. The objective of a random sample is to avoid bias in the results. A sample is random if every member of the whole population has an equal chance to be selected to participate.
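The equal-chance definition above can be sketched in Python. This is a minimal illustration, assuming the whole population can be listed (here a hypothetical list of ID numbers); the function name `draw_simple_random_sample` and the seed value are invented for the example.

```python
import random


def draw_simple_random_sample(population, sample_size, seed=None):
    """Pick sample_size members without replacement, each equally likely."""
    rng = random.Random(seed)  # a seeded generator makes the draw repeatable
    return rng.sample(list(population), sample_size)


# Hypothetical population: ID numbers 1 through 1000.
population_ids = range(1, 1001)
sample_ids = draw_simple_random_sample(population_ids, 50, seed=42)

# 50 members were drawn and none appears twice.
print(len(sample_ids), len(set(sample_ids)))
```

In practice the hard part is usually obtaining the full list of the population, not the draw itself.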

How difficult a population is to measure depends on what you are analyzing and what you are trying to accomplish. Data must be collected through surveys, measurements, observation, or other methods.

Therefore, data on a large population is generally not gathered in full because of the costs, time, and resources required. For instance, when an advertisement claims, "62% of doctors recommend XYZ for their patients," it is unlikely that every U.S. doctor with patients who could use XYZ was contacted. More likely, several hundred or several thousand surveys were sent out, and 62% of the doctors who responded said they would recommend XYZ. Those respondents are a sample of the population.

While a parameter is a characteristic of a population, a statistic is a characteristic of a sample, and samples can only result in inferences about a population characteristic. Inferential statistics enables you to make an educated guess about a population parameter based on a statistic computed from a sample randomly drawn from that population.
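The distinction between a parameter and a statistic can be illustrated with a short Python sketch. The population here is synthetic (made-up test scores generated for the example), and the sample size of 200 is an arbitrary assumption.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: 10,000 synthetic test scores (made-up data).
population_scores = [rng.gauss(70, 10) for _ in range(10_000)]

# Parameter: computed from every member of the population.
population_mean = statistics.fmean(population_scores)

# Statistic: computed from a random sample and used to estimate the parameter.
sample_scores = rng.sample(population_scores, 200)
sample_mean = statistics.fmean(sample_scores)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

The two numbers will typically be close but not identical, which is why a sample supports an inference about the parameter rather than an exact value.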

Statistics such as averages (means) and standard deviations, when taken from populations, are referred to as population parameters. Many, such as a population's mean and standard deviation, are represented by Greek letters like µ (mu) and σ (sigma). Much of the time, these statistics are inferential in nature because samples are used rather than populations.
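As a small illustration, Python's standard `statistics` module provides separate functions for the two cases: `pstdev` treats the data as a whole population (σ), while `stdev` treats it as a sample. The values below are made up for the example.

```python
import statistics

# Made-up values, used only to show the two calculations.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mu = statistics.fmean(values)      # mean of the values
sigma = statistics.pstdev(values)  # population SD: divides by N
s = statistics.stdev(values)       # sample SD: divides by n - 1

print(mu, sigma, round(s, 3))
```

The sample version comes out slightly larger because dividing by n - 1 compensates for a sample tending to understate the population's spread.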

If you have all the data for the population being studied, you do not need statistical inference, because you can calculate the population's parameters directly instead of estimating them from a sample.

Market and investment analysts use statistics to analyze investment data and make inferences about the market, a specific investment, or an index. In some cases, financial analysts can evaluate an entire population because price data has been recorded for decades. For example, the price of every publicly traded stock could be analyzed for a total market evaluation because the prices are recorded—this is a population, in terms of investment analysis. Another population might be the stock prices of all tech companies since 2010.

An analyst can calculate parameters with all of this data; however, the parameters used by analysts are only occasionally used in the same way statisticians and scientists use them.

Some of these terms share a name but mean different things to investment analysts than to statisticians and scientists:

Investment analysts

Alpha: The excess returns of an asset compared to a benchmark

Standard Deviation: Average amount of variability in prices, used to measure volatility and risk

Moving Average: Used to smooth out short-term price fluctuations to indicate trends

Beta: Measures the volatility of an investment or portfolio relative to the market as a whole

Statisticians and scientists

Alpha: The probability of making a Type I error, or rejecting the null hypothesis when it is true

Standard Deviation: Average amount of variability in data

Moving Average: Smooths out short-term fluctuations in data values

Beta: The probability of making a Type II error, or incorrectly failing to reject the null hypothesis
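Of the terms above, the moving average is the easiest to show in code. The sketch below assumes a simple (unweighted) moving average, which is only one common variant; the source does not say which kind is meant, and the function name and price values are invented for illustration.

```python
def simple_moving_average(values, window):
    """Return the mean of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[end - window:end]) / window
        for end in range(window, len(values) + 1)
    ]


# Made-up daily closing prices.
prices = [10, 12, 11, 15, 14]
smoothed = simple_moving_average(prices, window=3)
print(smoothed)
```

Each output value averages three neighboring prices, so a single unusual day moves the smoothed series less than it moves the raw one.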

What Is the Population Mean?

A population mean is the average of whatever value you're measuring in a given population.

What Are 2 Examples of Population?

One example of a population might be all green-eyed children in the U.S. under age 12. Another could be all the great white sharks in the ocean.

What Is the Best Example of a Population?

Imagine you're a teacher trying to see how well your fifth-grade math class did on a standardized test compared to all fifth-graders in the U.S. The population would be all fifth-grade math scores in the country.

In statistics, a population is the pool being studied from which data is extracted. Populations can be difficult to gather data on, especially if the studied topic is expansive and widely dispersed. Studying humans is an excellent example: there is no practical way to gather data on every brown-eyed person in the world (a statistical population), so sampling is the only realistic way to infer anything about that population.

In investment analysis, populations are generally specific types of assets being analyzed. These data sets are generally small (in statistical terms) and easy to acquire because they have been recorded, unlike data on living organisms, which is much more difficult to obtain.




COMMENTS

  1. What Is the Big Deal About Populations in Research?

    A population is a complete set of people with specified characteristics, while a sample is a subset of the population. In general, most people think of the defining characteristic of a population in terms of geographic location. However, in research, other characteristics will define a population.

  2. Population vs. Sample

    1. Define the Population. Clearly define the target population for your research study. The population should encompass the group of individuals, elements, or units that you want to draw conclusions about. 2. Define the Sampling Frame. Create a sampling frame, which is a list or representation of the individuals or elements in the target ...

  3. Population vs. Sample

    A population is the entire group that you want to draw conclusions about. A sample is the specific group that you will collect data from. The size of the sample is always less than the total size of the population. In research, a population doesn't always refer to people. It can mean a group containing elements of anything you want to study, such as objects, events, organizations, countries ...

  4. Statistics without tears: Populations and samples

    A population is a complete set of people with a specialized set of characteristics, and a sample is a subset of the population. The usual criteria we use in defining population are geographic, for example, "the population of Uttar Pradesh". In medical research, the criteria for population may be clinical, demographic and time related.

  5. Research Population

    Research Population. All research questions address issues that are of great relevance to important groups of individuals known as a research population. A research population is generally a large collection of individuals or objects that is the main focus of a scientific query. It is for the benefit of the population that researches are done.

  6. What Is the Big Deal About Populations in Research?

    In research, there are 2 kinds of populations: the target population and the accessible population. The accessible population is exactly what it sounds like, the subset of the target population that we can easily get our hands on to conduct our research. While our target population may be Caucasian females with a GFR of 20 or less who are ...

  7. Defining and Identifying Members of a Research Study Population: CTSA

    The defined population then will become the basis for applying the research results to other relevant populations. Clearly defining a study population early in the research process also helps assure the overall validity of the study results. Many research reports fail to define or describe a study population adequately.

  8. Study Population

    Definition. Study population is a subset of the target population from which the sample is actually selected. It is broader than the concept sample frame. It may be appropriate to say that sample frame is an operationalized form of study population. For example, suppose that a study is going to conduct a survey of high school students on their ...

  9. Defining Populations

    Define the population of interest, and define the question to be answered. These requirements go hand-in-hand, because selection of an appropriate study population is dependent upon the question being addressed. Sometimes the study population seems obvious given the research question, but the study populations may be broader than that which at ...

  10. Who and What Is a Population?

    Methods. In this article, I review the current conventional definitions of, and historical debates over, the meaning(s) of "population," trace back the contemporary emphasis on populations as statistical rather than substantive entities to Adolphe Quetelet's powerful astronomical metaphor, conceived in the 1830s, of l'homme moyen (the average man), and argue for an alternative definition ...

  11. 3. Populations and samples

    Populations. In statistics the term "population" has a slightly different meaning from the one given to it in ordinary speech. It need not refer only to people or to animate creatures - the population of Britain, for instance, or the dog population of London. Statisticians also speak of a population.

  12. Understanding Population in Scientific Research: A Comprehensive

    The first step in addressing the population in research is to clearly define the target population. This involves specifying the characteristics of the larger group to which the study's findings will be generalized. The target population should be explicitly defined in terms of relevant factors such as demographic characteristics, geographic ...

  13. Defining the study population: who and why?

    After defining the research question, a study must identify the study population to assess. Study populations can include a whole target population (i.e., census); however, most studies include sampling, in which the sample represents a subset of the target population. ... cohort studies may define a population by an exposure/intervention ...

  14. Introduction to Population Demographics

    A population is defined as a group of individuals of the same species living and interbreeding within a given area. Members of a population often rely on the same resources, are subject ...

  15. Population vs Sample

    A population is the entire group that you want to draw conclusions about. A sample is the specific group that you will collect data from. The size of the sample is always less than the total size of the population. In research, a population doesn't always refer to people. It can mean a group containing elements of anything you want to study ...

  16. PDF Describing Populations and Samples in Doctoral Student Research

    The sampling frame intersects the target population. The sample and sampling frame described extend outside of the target population and population of interest, as occasionally the sampling frame may include individuals not qualified for the study. Figure 1. The relationship between populations within research.

  17. Research Fundamentals: Study Design, Population, and Sample Size

    design, population of interest, study setting, recruitment, and sampling. Study Design. The study design is the use of evidence-based procedures, protocols, and guidelines that provide the ...

  18. Population Research: Definition, Examples

    Examples of Population Research. Example #1: (Population dynamics) Kabir et al. (1997), utilizing the data from various sources, examined the prospect of stabilization of the Bangladesh population under alternative demographic scenarios. They further discussed opportunities for achieving zero population growth vis a vis NRR=1 by the end of 2010.

  19. Samples & Populations in Research

    Tell your students that you will read a scenario and they must decide on whether the research scenario relates to a population or a sample. If it is a sample, they must identify the type of sample ...

  20. Differentiating Between Population and Target Population in Research

    Learn how to define and differentiate the population and the target population in research studies with this informative article. Download the PDF for free on ResearchGate.

  21. Population Definition in Statistics and How to Measure It

    Population is the entire pool from which a statistical sample is drawn. In statistics, population may refer to people, objects, events, hospital visits, measurements, etc. A population can ...

  22. Defining the Population and Target Population in Clinical Studies

    This review emphasizes the importance of clarifying terminology and adopting a systematic approach to population definition, considering demographic characteristics, clinical parameters, and ...

  23. Who is Hispanic?

    The term is not well known among the population it is meant to describe. In a 2019 Center survey, only 23% of U.S. adults who self-identified as Hispanic or Latino had heard of the term, and just 3% said they use it to describe themselves. However, awareness and use of the term varied across subgroups of Hispanics.

  24. (PDF) CONCEPT OF POPULATION AND SAMPLE

    The population refers to an entire set of units that exhibit a variable characteristic under investigation and for which research findings can be generalized (Shukla, 2020). Meanwhile, a sample is ...