Epidemiology is the study of the distribution and determinants of disease or other health-related states or events in specified populations and the application of this study to the control of health problems. A key component of this definition is that epidemiology focuses on populations, an emphasis that distinguishes epidemiology from clinical case studies, which focus on individual subjects.
Health events can be characterized by their distribution (descriptive epidemiology) and by factors that influence their occurrence (analytic epidemiology). In both descriptive and analytic epidemiology, health-related questions are addressed using quantitative methods to identify patterns or associations from which inferences can be drawn and interventions developed, applied, and assessed.
The goals of descriptive epidemiology are to define the frequency of health-related events and determine their distribution by person, place, and time. The foundation of descriptive epidemiology is surveillance, or case detection. Retrospective surveillance identifies health events from existing data, such as clinical or laboratory records, hospital discharge data, and death certificates. Prospective surveillance identifies and collects information about cases as they occur, for example, through ongoing laboratory-based reporting.
With passive surveillance, case reports are supplied voluntarily by clinicians, laboratories, health departments, or other sources. The completeness and accuracy of passive reporting are affected by whether reporting is legally mandated, the ease of establishing a definitive diagnosis for the disease under surveillance, illness severity, interest in and awareness of the medical condition among the public and the medical community, and by whether a report will elicit a public health response. Because more severe illness is more likely to be diagnosed and reported, the severity and clinical spectrum of passively reported cases often differ from those of all cases of an illness. Weekly and annual state counts of passively collected reports of nationally notifiable diseases are available on the National Notifiable Diseases Surveillance System (NNDSS) Data and Statistics web page (https://wwwn.cdc.gov/nndss/data-and-statistics.html).
In active surveillance, an effort is made to ascertain all cases of a condition occurring in a defined population. Active case finding can be prospective (through routine contacts with reporting sources), retrospective (through record audit), or both. Population-based active surveillance, in which all cases in a defined geographic area are identified and reported, provides the most complete and unbiased ascertainment of disease and is optimal for describing the rate of a disease and its clinical spectrum. By contrast, active surveillance conducted at only one or several participating facilities, often referred to as sentinel surveillance, can yield biased information on disease frequency or spectrum based on the representativeness of the patient population and the size of the sample obtained. The range and severity of clinical symptoms also influence whether active surveillance is able to ascertain all cases of a disease; individuals with mild disease may not present to a physician for diagnosis.
Establishing a standard case definition is a necessary first step for surveillance and description of the epidemiology of a disease or health event. Formulation of a case definition is particularly important when laboratory diagnostic testing results are not definitive. More restrictive case definitions have greater specificity and minimize misclassification of persons without the condition of interest as cases; however, they can exclude true cases and may be most useful when investigating a newly recognized condition, in which the ability to determine etiology, pathogenesis, or risk factors is decreased by inclusion of noncases in the study population. A more inclusive definition can be important in an outbreak setting to increase sensitivity to detect potential cases for further investigation or to inform application of preventive interventions (e.g., reactive vaccination campaigns). Multiple research or public health objectives can be addressed by developing a tiered case definition that incorporates varying degrees of diagnostic certainty for confirmed, probable, and suspected cases. The Council of State and Territorial Epidemiologists (CSTE) provides uniform surveillance case definitions for nationally notifiable infectious and noninfectious conditions ( https://wwwn.cdc.gov/nndss/case-definitions.html ).
Sensitivity, specificity, and predictive values can be used to quantify the performance of a case definition or the results of a diagnostic test or algorithm (Table 1.1). Sensitivity and specificity are intrinsic measures of a case definition or diagnostic test, whereas predictive values vary with the prevalence of a condition within a population. Even with a highly specific diagnostic test, if a disease is uncommon among the people tested, a large proportion of positive test results will be false positives, and the positive predictive value will be low (Table 1.2). If the test is applied more selectively, such that the proportion of people tested who truly have disease is greater, the test’s positive predictive value will improve. Thus, predictive values depend both on test sensitivity and specificity and on the disease prevalence in the population in which the test is applied, also called the pre-test probability.
Category | Measure | Formula |
---|---|---|
Measures of test accuracy | Sensitivity: Proportion of true positive (diseased) with a positive test result | A/(A + C) |
Measures of test accuracy | Specificity: Proportion of true negative (nondiseased) with a negative test result | D/(B + D) |
Measures of test accuracy | Positive predictive value (PPV): Proportion of positive test results that are true positives | A/(A + B) |
Measures of test accuracy | Negative predictive value (NPV): Proportion of negative test results that are true negatives | D/(C + D) |
Measures of data dispersion and precision | Variance: Statistic describing variability among individual members of a population | |
Measures of data dispersion and precision | Standard deviation (SD): A second, more commonly used statistic describing variability among individual members of a population | |
Measures of data dispersion and precision | Standard error (SE): Statistic describing the variability of sample-based point estimates (P) around the true population value being estimated | |
Measures of data dispersion and precision | Confidence interval: A range of values that is believed to contain the true value within a defined level of certainty (usually 95%) | — |
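The accuracy formulas listed above can be sketched in a few lines of Python. The source does not define the cells A through D; the layout used here (A = true positives, B = false positives, C = false negatives, D = true negatives) is an assumption inferred from the formulas themselves. The function name and the counts are hypothetical and serve only as an illustration.

```python
def accuracy_measures(a, b, c, d):
    """Return sensitivity, specificity, PPV, and NPV from 2x2 counts.

    Assumed cell layout (inferred from the formulas, not stated in the text):
      a = test positive, disease present (true positives)
      b = test positive, disease absent  (false positives)
      c = test negative, disease present (false negatives)
      d = test negative, disease absent  (true negatives)
    """
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),
        "npv": d / (c + d),
    }


# Hypothetical counts, for illustration only.
result = accuracy_measures(a=90, b=10, c=10, d=90)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Keeping the four counts as separate arguments mirrors the formulas directly, which makes it easy to check each measure against its definition.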
Proportion With Condition | Positive Predictive Value | Negative Predictive Value |
---|---|---|
1% | 8% | >99% |
10% | 50% | 99% |
20% | 69% | 97% |
50% | 90% | 90% |
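The predictive values tabulated above are consistent with a test that is 90% sensitive and 90% specific. That pairing is an inference from the numbers, not something the source states. Under that assumption, the following sketch recomputes the predictive values from the proportion with the condition; the function name is hypothetical.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumed test characteristics: 90% sensitivity and 90% specificity.
for prevalence in (0.01, 0.10, 0.20, 0.50):
    ppv, npv = predictive_values(prevalence, 0.90, 0.90)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}, NPV {npv:.1%}")
```

Because sensitivity and specificity are held fixed, the only thing changing across rows is prevalence, which isolates the effect of pre-test probability on the predictive values.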
Often, the sensitivity and specificity of a test are inversely related. Selecting the optimal balance of sensitivity and specificity depends on the purpose for which the test is used. Generally, a screening test should be highly sensitive, whereas a follow-up confirmatory test should be highly specific.
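One way to see this trade-off is to vary the cutoff of a quantitative test. The sketch below assumes a test in which values at or above the cutoff count as positive; the measurements are hypothetical numbers invented for illustration, not data from any study.

```python
# Hypothetical test values, for illustration only.
diseased = [4.1, 5.0, 5.6, 6.2, 7.3, 8.0]
nondiseased = [2.0, 2.8, 3.5, 4.4, 5.2, 6.0]


def sens_spec(cutoff):
    """Return (sensitivity, specificity) when values >= cutoff are positive."""
    sensitivity = sum(v >= cutoff for v in diseased) / len(diseased)
    specificity = sum(v < cutoff for v in nondiseased) / len(nondiseased)
    return sensitivity, specificity


for cutoff in (3.0, 4.0, 5.0, 6.0, 7.0):
    sensitivity, specificity = sens_spec(cutoff)
    print(f"cutoff {cutoff}: sensitivity {sensitivity:.2f}, "
          f"specificity {specificity:.2f}")
```

In this toy data set, a low cutoff behaves like a screening test (few missed cases, many false positives), whereas a high cutoff behaves like a confirmatory test.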
Characterizing disease frequency is one of the most important aspects of descriptive epidemiology. Frequency measures typically include a count of new or existing cases of disease as the numerator and a quantification of the population at risk as the denominator. Cumulative incidence is expressed as a proportion and describes the number of new cases of an illness occurring in a fixed at-risk population over a specified period of time. The incidence density or incidence rate is the rate of new cases of disease in a dynamic at-risk population; the denominator typically is expressed as the population-time at-risk (e.g., person-time).
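The distinction between the two incidence measures comes down to the denominator. A minimal sketch, using hypothetical function names and invented counts:

```python
def cumulative_incidence(new_cases, population_at_risk_at_start):
    """Proportion of a fixed at-risk population that develops disease."""
    return new_cases / population_at_risk_at_start


def incidence_rate(new_cases, person_time_at_risk):
    """New cases per unit of person-time in a dynamic population."""
    return new_cases / person_time_at_risk


# Hypothetical example: 25 new cases during 1 year of follow-up.
# Fixed cohort of 1000 people at risk at the start of the year:
risk = cumulative_incidence(25, 1000)
# Dynamic population contributing 950 person-years at risk:
rate = incidence_rate(25, 950.0)

print(f"cumulative incidence: {risk:.1%}")
print(f"incidence rate: {rate * 1000:.1f} per 1000 person-years")
```

The cumulative incidence is a unitless proportion, whereas the incidence rate carries units of inverse time, so the two should not be compared directly.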
Because the occurrence of many infections varies with season, extrapolating annual incidence from cases detected during a short observation period can be inaccurate. In describing the risk of acquiring illness during a disease outbreak, the attack rate, defined as the proportion of a specified at-risk population that develops new disease during a specified time period, is a useful measure. Finally, the case-fatality rate, or proportion of cases of a disease that result in death, is used to quantify the mortality resulting from a disease in a particular population and time period.
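Both outbreak measures are simple proportions. The sketch below assumes the attack rate is computed as new cases divided by the number of people at risk during the outbreak period; the function names and the outbreak counts are hypothetical.

```python
def attack_rate(new_cases, population_at_risk):
    """Proportion of the at-risk population that became ill in the period."""
    return new_cases / population_at_risk


def case_fatality(deaths, cases):
    """Proportion of cases that resulted in death."""
    return deaths / cases


# Hypothetical outbreak: 50 of 125 attendees became ill, and 2 cases died.
ar = attack_rate(50, 125)
cfr = case_fatality(2, 50)
print(f"attack rate: {ar:.0%}")
print(f"case-fatality: {cfr:.0%}")
```

Note that the two measures use different denominators: everyone at risk for the attack rate, but only cases for the case-fatality rate.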
Prevalence refers to the proportion of the population having a condition at a specific point in time. As such, it is a better measure of disease burden for chronic conditions than is incidence or attack rate, which identify only new (incident) cases. Prevalent cases of disease can be ascertained in a cross-sectional survey, whereas determining incidence requires longitudinal surveillance. When disease prevalence (P) is low and incidence (I) and duration (D) are stable, prevalence is a function of disease incidence multiplied by its average duration (P = I × D).
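The relationship P = I × D can be checked with a short calculation. The values below are hypothetical, and the sketch assumes the conditions named in the text (low prevalence, stable incidence and duration), with incidence and duration expressed in matching time units.

```python
def steady_state_prevalence(incidence_rate, mean_duration):
    """Approximate prevalence as incidence times average duration (P = I x D).

    incidence_rate: new cases per person per unit time (e.g., per person-year)
    mean_duration:  average disease duration in the same time unit (e.g., years)
    """
    return incidence_rate * mean_duration


# Hypothetical condition: 2 new cases per 1000 person-years, lasting 5 years
# on average.
prevalence = steady_state_prevalence(incidence_rate=0.002, mean_duration=5.0)
print(f"approximate prevalence: {prevalence:.1%}")
```

The same incidence with a duration of only a few weeks would give a far lower prevalence, which is why prevalence tracks disease burden better for chronic conditions.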
Characterizing disease by person, place, and time is often useful. Demographic variables, including age, sex, socioeconomic status, and race or ethnicity, often are associated with the risk of disease. Describing a disease by place can help define risk groups, for example, when an illness is caused by an environmental exposure or is vector borne, or during an outbreak with a point source exposure. Time also is a useful descriptor of disease occurrence. Evaluating long-term (secular) trends provides information that can be used to identify emerging health problems or to assess the impact of prevention programs. The timing of illness in outbreaks can be displayed in an epidemic curve ( Fig. 1.1 ) and can be useful in defining the mode of transmission or incubation period, or for assessing the effectiveness of control measures.
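An epidemic curve is a histogram of cases by time of onset. The following is a minimal text-based sketch that counts cases per day; the onset dates are hypothetical and the function name is an illustrative choice.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical illness onset dates, for illustration only.
onset_dates = [
    date(2024, 6, 3),
    date(2024, 6, 4), date(2024, 6, 4),
    date(2024, 6, 5), date(2024, 6, 5), date(2024, 6, 5),
    date(2024, 6, 6),
    date(2024, 6, 8),
]


def epidemic_curve(onsets):
    """Return (day, case count) pairs for every day from first to last onset."""
    counts = Counter(onsets)
    day, last = min(onsets), max(onsets)
    curve = []
    while day <= last:
        curve.append((day, counts.get(day, 0)))  # keep days with zero cases
        day += timedelta(days=1)
    return curve


curve = epidemic_curve(onset_dates)
for day, n in curve:
    print(f"{day.isoformat()} {'#' * n}")
```

Days with no cases are kept in the output because gaps in the curve can be as informative as peaks when judging the mode of transmission.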
The goal of analytic epidemiologic studies is to assess for and quantify the association between an exposure and a health outcome. This goal can be addressed in experimental or observational studies. In experimental studies, hypotheses are tested by systematically allocating an exposure of interest to subjects in separate groups to achieve the desired comparison. Such studies include randomized, controlled, double-blind treatment trials as well as laboratory experiments. By carefully controlling study variables, investigators can restrict differences among groups and thereby increase the likelihood that the observed differences are a consequence of the specific factor being studied. Because experiments are prospective, the temporal sequence of exposure and outcome can be established, making it possible to define cause and effect.
By contrast, observational studies test hypotheses using observational methods to assess exposures and outcomes among individual subjects in populations and to identify statistical associations from which inferences regarding causation are drawn. Although observational studies cannot be controlled to the same degree as experiments, they are practical in circumstances in which exposures or behaviors cannot be assigned. Moreover, the results often are more generalizable to a real population having a wide range of attributes. The 3 basic types of observational studies are cohort studies, cross-sectional studies, and case-control studies ( Table 1.3 ). Hybrid study designs, incorporating components of these 3 types, also have been developed. In planning observational studies, care must be taken in the selection of participants to minimize the possibility of bias. Selection bias results when study subjects have differing probabilities of being selected and the probability of selection is related to the risk factors or outcomes under evaluation.
Type of Study | Design and Characteristics | Advantages | Disadvantages |
---|---|---|---|
Cohort | Prospective or retrospective; select study group; observe for exposures and disease; outcome measures used: relative risk (RR) or hazard ratio (HR) of disease given exposure | Ideal for outbreak investigations in defined populations; prospective design ensures that exposure preceded disease; selection of study group is unbiased by knowledge of disease status; RR and HR accurately describe risk given an exposure | Unsuited for rare diseases or those with long latency; expensive; can require long follow-up periods; difficult to investigate multiple exposures |
Cross-sectional | Nondirectional; select study group; determine exposure and disease status; outcome measures used: prevalence ratio for disease given exposure | Rapid, easy to perform, and inexpensive; ideal to determine knowledge, attitudes, and behaviors | Timing of exposure and disease can be difficult to determine; biases can affect recall of past exposures |
Case-control | Retrospective; identify cases with disease; identify controls without disease; determine exposures in cases and controls; outcome measures used: odds ratio (OR) for an exposure given disease | Rapid, easy to perform, and inexpensive; ideal for studying rare diseases, those with long latency, new diseases | Timing of exposure and disease can be difficult to determine; biases can occur in selecting cases and controls and determining exposures; OR only provides an estimate of the RR if disease is rare |
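The outcome measures named in the table can be computed from a 2 × 2 table of exposure by disease. The sketch below uses hypothetical function names, argument names, and counts; it also illustrates why the OR approximates the RR only when disease is rare. The prevalence ratio from a cross-sectional study uses the same arithmetic as the RR, applied to prevalent rather than incident cases.

```python
def relative_risk(exposed_cases, exposed_noncases,
                  unexposed_cases, unexposed_noncases):
    """Risk of disease in the exposed divided by risk in the unexposed."""
    risk_exposed = exposed_cases / (exposed_cases + exposed_noncases)
    risk_unexposed = unexposed_cases / (unexposed_cases + unexposed_noncases)
    return risk_exposed / risk_unexposed


def odds_ratio(exposed_cases, exposed_noncases,
               unexposed_cases, unexposed_noncases):
    """Cross-product ratio of the 2x2 table."""
    return (exposed_cases * unexposed_noncases) / (
        exposed_noncases * unexposed_cases)


# Hypothetical rare disease: 20/10,000 exposed and 10/10,000 unexposed ill.
rare = (20, 9980, 10, 9990)
# Hypothetical common disease: 400/1000 exposed and 200/1000 unexposed ill.
common = (400, 600, 200, 800)

for label, table in (("rare", rare), ("common", common)):
    print(f"{label}: RR {relative_risk(*table):.2f}, "
          f"OR {odds_ratio(*table):.2f}")
```

In both hypothetical tables the RR is 2, but the OR stays close to 2 only in the rare-disease table and overstates the RR when disease is common.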
In contrast to experimental or observational studies that analyze information about individual subjects, ecologic studies draw inferences from data on a population level. Causal inferences from ecologic studies must be made with caution because relationships observed on a population level do not necessarily apply on the individual level (a problem known as the ecologic fallacy). Because of these drawbacks, ecologic studies are suited best for generating hypotheses that can be tested using other study methods.