Key Points

  • Epidemiology is the study of the distribution and determinants of disease frequency in humans to inform the natural history, service needs, and etiology of illness.

  • The frequency of disease can be expressed using different measures, including cumulative incidence, incidence density, point prevalence, and lifetime prevalence.

  • Epidemiological studies frequently rely on assessment instruments to evaluate psychiatric disorders. It is important to first establish the reliability (or consistency) and validity (or truthfulness) of these assessment instruments.

  • Based on the recent National Comorbidity Survey (NCS) in the US, the most common psychiatric disorders were major depression and alcohol dependence, followed by social and simple phobias. Approximately one in four respondents met criteria for a substance use disorder, one in four for an anxiety disorder, and one in five for an affective disorder in their lifetime.

  • Epidemiological studies in the US and in European countries showed that, in general, individuals with a psychiatric disorder underutilize mental health services. Among those who sought treatment, there was significant delay in seeking help.

Overview

Epidemiology is the study of the distribution and determinants of disease frequency in man. Epidemiological studies typically examine large groups of individuals, and by providing data on the distribution and frequency of diseases, they help describe the natural history of illness, assess service needs in the community or in special institutions, and shed light on the etiology of illness.

Epidemiology is based on two fundamental assumptions: first, that human disease does not occur at random, and second, that human disease has causal and preventive factors that can be identified through systematic investigation of different populations in different places or at different times. By measuring disease frequency, and by examining who gets a disease within a population, as well as where and when the disease occurs, it is possible to formulate hypotheses concerning possible causal and preventive factors.

Epidemiological Measures of Disease Frequency

The frequency of disease or some other outcome within a population group is described using different concepts: the rate at which new cases are observed, or the proportion of a given population that exhibits the outcome of interest.

Incidence refers to the number of new events that develop in a population over a specified period of time (t0 to t1). If this incidence rate is described as the number of events (outcomes) in proportion to the population at risk for the event, it is called the cumulative incidence (CI), and is calculated by the following equation:


CI = (Number of new cases from t0 to t1) / (Population at risk at t0)

The denominator equals the total number of persons at risk for the event at the start of the time period (t0), without adjustment for any subsequent reduction in the cohort size for any reason, for example, loss to follow-up, death, or reclassification to “case” status. Therefore, CI is best used to describe stable populations where there is little reduction in cohort size during the time period of interest. An example would be a study of the incidence of major depressive disorder (MDD) in a residential program. If, at the beginning of the study, 8 of the 100 residents have MDD, and of the 92 remaining residents, 8 develop MDD over the next 12 months, the CI for MDD would be (8/92 × 100) = 8.7% for this period (i.e., 1 year). Note that the denominator does not include those in the population with the condition at t0, since they are not at risk for newly experiencing the outcome.
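The residential-program example can be worked through in a short sketch (Python is used here purely for illustration):

```python
def cumulative_incidence(new_cases, population_at_risk):
    """Cumulative incidence: new cases during the period divided by
    the number of persons at risk at the start of the period (t0)."""
    return new_cases / population_at_risk

# Example from the text: 100 residents, 8 already have MDD at t0
# (excluded from the denominator); 8 of the remaining 92 develop
# MDD over the next 12 months.
ci = cumulative_incidence(new_cases=8, population_at_risk=100 - 8)
print(f"1-year cumulative incidence: {ci:.1%}")  # 8.7%
```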

When patients are followed for varying lengths of time (e.g., due to loss to follow-up, death, or reclassification to “case” status) and the denominator value representing the population-at-risk changes significantly, incidence density provides a more precise measure of the rate at which new events occur. Incidence density (ID) is defined as the number of events occurring per unit population per unit time:


Incidence density = (Number of new cases from t0 to t1) / (Total person-time of observation)

The denominator is the population that is actively at risk for the event, and is adjusted as people no longer belong in that pool. In a study of psychosis, for instance, if a person develops hallucinations and delusions, he or she becomes “a case” and no longer contributes to the denominator. Similarly, a person lost to follow-up contributes to the denominator only so long as he or she is being tracked by the study. To illustrate, suppose that in a 1-year, 100-person study of human immunodeficiency virus (HIV) infection, 6 people are lost to follow-up at the end of 6 months, 4 develop HIV infection at the end of the third month, and the remaining 90 are followed for the full year. The person-years of observation would be calculated as follows: (90 × 1 year) + (6 × 0.5 year) + (4 × 0.25 year) = 94 person-years, and incidence density = (4 cases)/(94 person-years) = 4.26 cases/100 person-years of observation.
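The same person-time bookkeeping can be sketched as follows, using the HIV example from the text:

```python
def incidence_density(new_cases, person_time):
    """Incidence density: new cases per unit of person-time at risk."""
    return new_cases / person_time

# 90 subjects followed the full year, 6 lost to follow-up at 6 months,
# 4 who became cases at 3 months (and stopped contributing person-time).
person_years = (90 * 1.0) + (6 * 0.5) + (4 * 0.25)   # = 94 person-years
id_rate = incidence_density(new_cases=4, person_time=person_years)
print(f"{id_rate * 100:.2f} cases per 100 person-years")  # 4.26
```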

Prevalence is the proportion of individuals who have a particular disease or outcome at a point or period in time. In most psychiatric studies, “prevalence” refers to the proportion of the population that has the outcome at a particular point in time, and is called the point prevalence:


Point prevalence = (Number of existing cases at t0) / (Population at t0)

In stable populations, prevalence (P) can be related to incidence density (ID) by the equation P = ID × D, where D is the average duration of the disease before termination (by death or remission, for example). At times, the numerator is expanded to include the number of all cases, existing and new, in a specified time period; this is known as a period prevalence. When the period of interest is a lifetime, it is a type of period prevalence called lifetime prevalence, which is the proportion of people who have ever had the specified disease or attribute in their lifetime.
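The steady-state relation P = ID × D can be illustrated numerically; the figures below are hypothetical and chosen only to make the arithmetic transparent:

```python
# Hypothetical illustration of the steady-state relation P = ID x D:
# if new cases arise at 2 per 100 person-years and the average illness
# duration is 5 years, the expected point prevalence is about 10%.
incidence_density_per_year = 0.02  # assumed: 2 new cases per 100 person-years
mean_duration_years = 5.0          # assumed average duration before termination
prevalence = incidence_density_per_year * mean_duration_years
print(f"Expected point prevalence: {prevalence:.0%}")  # 10%
```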

Lifetime prevalence is often used to convey the overall risk of ever developing an illness, particularly for psychiatric disorders that have episodic courses or require a certain duration of symptoms to qualify for a diagnosis (e.g., depression, anxiety, or post-traumatic stress disorder). In practice, however, an accurate lifetime prevalence rate is difficult to determine, since it often relies on subject recall and on sampling populations of different ages (not necessarily at the end of their respective “lifetimes”). It is also an overall rate that does not account for changes in incidence rates over time, nor for possible differences in mortality rates between those with and without the condition.

Criteria for Assessment Instruments

There are a number of concepts that are helpful in the evaluation of assessment instruments. These involve the consistency of the results that the instrument provides, and its fidelity to the concept being measured.

Reliability is the degree to which an assessment instrument produces consistent or reproducible results when used by different examiners at different times. Lack of reliability may be the result of divergence between observers, imprecision in the measurement tool, or instability in the attribute being measured. Inter-rater reliability ( Figure 61-1 ) is the extent to which different examiners obtain equivalent results in the same subject when using the same instrument; test-retest reliability is the extent to which the same instrument obtains equivalent results in the same subject on different occasions.

Figure 61-1, Inter-rater reliability.

Reliability is not sufficient for a measurement instrument—it could, for example, consistently and reliably give results that are neither meaningful nor accurate. However, it is a necessary attribute, since inconsistency would impair the accuracy of any tool. The demonstration of the reliability of an assessment tool is thus required before its use in epidemiological studies. The use of explicit diagnostic criteria, trained examiners to interpret data uniformly, and a structured assessment that obtains the same types of information from all subjects can enhance the reliability of assessment instruments.

There are several commonly used measures of the degree of consistency between sets of data, which in psychiatry are often used to quantify the degree of agreement between raters. The kappa statistic (κ) is used for categorical or binary data, and the intra-class correlation coefficient (ICC, usually represented as r) for continuous data. Both measures have the same range of values (−1 to +1), from perfect negative correlation (−1), to no correlation (0), to perfect positive correlation (+1). For acceptable reliability, a kappa statistic of 0.7 or greater is generally required; for the ICC, a value of 0.8 or greater is generally required.

Calculation of the kappa statistic (κ) requires only arithmetic computation, and accounts for the degree of consistency between raters with an adjustment for the probability of agreement due to chance. When the frequency of the disorder is very low, however, the kappa statistic can be low despite a high degree of consistency between raters; it is therefore not an appropriate measure of reliability for infrequent disorders.


κ = (Po − Pc) / (1 − Pc)

where Po is the observed agreement and Pc is the agreement expected by chance. With a and d representing the counts on which the two raters agree (both positive and both negative, respectively), b and c the counts on which they disagree, and n the total number of subjects, Po = (a + d)/n and Pc = [(a + c)(a + b) + (b + d)(c + d)]/n². Calculation of the ICC is more involved and is beyond the scope of this text.
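The kappa computation can be sketched directly from a 2×2 agreement table; the counts below are hypothetical:

```python
def kappa(a, b, c, d):
    """Cohen's kappa from a 2x2 agreement table, where a = both raters
    positive, d = both raters negative, and b, c = disagreements."""
    n = a + b + c + d
    p_o = (a + d) / n                                       # observed agreement
    p_c = ((a + c) * (a + b) + (b + d) * (c + d)) / n ** 2  # chance agreement
    return (p_o - p_c) / (1 - p_c)

# Hypothetical data: two raters evaluate 100 subjects; 40 rated "case"
# by both, 45 rated "non-case" by both, 15 disagreements.
print(round(kappa(a=40, b=7, c=8, d=45), 2))  # 0.7, at the usual threshold
```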

Validity is a term that expresses the degree to which a measurement instrument actually measures what it purports to measure. When translating a theoretical concept into an operational instrument that purports to assess or measure it, several aspects of validity need to be accounted for.

For any abstract concept, there are an infinite number of criteria that one might use to assess it. For example, if one wants to develop a questionnaire to diagnose bipolar disorder, one should ask about mood, thought process, and energy level, but probably not whether the subject owns a bicycle. Content validity is the extent to which the instrument adequately incorporates the domain of items that would accurately measure the concept of interest.

Criterion validity is the extent to which the measurement can predict or agree with criteria external to the construct being measured. Two types of criterion validity are generally distinguished: predictive validity and concurrent validity. Predictive validity is the extent to which the instrument's measurements can predict an external criterion. For instance, if we devise an instrument to measure math ability, we might postulate that math ability should be correlated with better grades in college math courses. A high correlation between the measure's assessment of math ability and college math course grades would indicate that the instrument can correctly predict as it theoretically should, and has predictive validity. Concurrent validity refers to the extent to which the measurement correlates with another criterion at the same point in time. For example, if we devise a measure relying on visual inspection of a wound to determine infection, we can correlate it with a bacteriological examination of a specimen taken at the same time. A high correlation would indicate concurrent validity, and suggest that our new measure gives valid results for determining infection.

Construct validity refers to the extent to which the measure assesses the underlying theoretical construct that it intends to measure. This concept is the most complex, and both content and criterion validity point to it. An example of a measure lacking construct validity would be a test for assessing algebra skills using word problems that inadvertently assesses reading skills rather than factual knowledge of algebra. Construct validity also refers to the extent that the construct exists as theorized and can be quantified by the instrument. In psychiatry, this is especially difficult since there are no “gold standard” laboratory (e.g., chemical, anatomical, physiological) tests, and the criteria, if not the existence, of many diagnoses are disputed. To establish the validity for any diagnosis, certain requirements have been proposed; these include an adequate clinical description of the disorder that distinguishes it from other similar disorders, and the ability to correlate the diagnosis with external criteria, such as laboratory tests, familial transmission patterns, and consistent outcomes, including response to treatment.

Because there are no “gold standard” diagnostic tests in psychiatry, efforts to validate diagnoses have focused on efforts such as increasing the reliability of diagnostic instruments—by defining explicit and observationally based diagnostic criteria (DSM-III and subsequent versions), or employing structured interviews, such as the Diagnostic Interview Schedule (DIS)—and conducting genetic and outcome studies for diagnostic categories. The selection of a “gold standard” criterion instrument in psychiatry, however, remains problematic.

Assessment of New Instruments

If we assume that a reliable criterion instrument that provides valid results exists, the assessment of a new measurement instrument would involve comparing the results of the new instrument to those of the criterion instrument. The criterion instrument's results are considered “true,” and a judgment of the validity of the new instrument's results is based on how well they match the criterion instrument's (Figure 61-2).

Figure 61-2, Validity of a new instrument.

Sensitivity is the proportion of true cases, as identified by the criterion instrument, who are identified as cases by the new instrument (also known as the true positive rate ).

Specificity is the proportion of non-cases, as identified by the criterion instrument, who are identified as non-cases by the new instrument (also known as the true negative rate ).

For any given instrument, there are tradeoffs between sensitivity and specificity, depending on where the threshold limits are set to distinguish “case” from “non-case.” For example, in the Hamilton Depression Scale (HAM-D), the cutoff value for the diagnosis of MDD (often set at 15) determines whether an individual is identified as a “case” or “non-case.” If the value were instead set at 5, which most clinicians would consider “normal” or not depressed, the HAM-D would be an unusually sensitive instrument (e.g., using a structured clinical interview as the criterion instrument), since almost anyone evaluated with even a modicum of depressive thinking would be considered a “case,” as would anybody typically considered to have major depression. However, the test would not be especially specific, since it would be poor at correctly identifying those without depression as non-cases. Conversely, if the cutoff value were set at 25, sensitivity would be low but specificity high.

In practice, the threshold values in any given evaluation instrument, whether creatine kinase (CK) levels for determining myocardial infarction, the number of colonies on a Petri dish to determine infection, or criteria to determine attention-deficit/hyperactivity disorder (ADHD) (e.g., 6 of 9 from group one, 6 of 9 from group two), are chosen to balance the need for both sensitivity and specificity. To improve both these measures without a tradeoff, either the instrument itself or its administration must be improved, or efforts made to ensure maximum stability of the attribute being measured (e.g., administering the instruments concurrently, or in similar circumstances, such as at the same time of day or in a similar clinical setting).

Two other useful measures are the positive predictive value (PPV), the proportion of those with a positive test who are true cases as determined by the criterion instrument, and the negative predictive value (NPV), the proportion of those with a negative test who are true non-cases as determined by the criterion instrument.
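All four measures follow from the same 2×2 table comparing the new instrument against the criterion instrument; the sketch below uses hypothetical counts:

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from a 2x2 table, where
    tp/fn = criterion-instrument cases scored positive/negative by the
    new instrument, and fp/tn = non-cases scored positive/negative."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # P(true case | positive test)
        "npv": tn / (tn + fn),          # P(true non-case | negative test)
    }

# Hypothetical screen of 200 subjects: 50 true cases, 150 true non-cases.
m = diagnostic_measures(tp=45, fp=15, fn=5, tn=135)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

Note that, unlike sensitivity and specificity, the predictive values depend on the prevalence of the condition in the tested population.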

Study Designs

There are six basic study types, presented here in ascending order of their ability to support causal inference.

Descriptive Studies

The weakest of all study designs, these studies simply describe the health status of a population or of a number of subjects. Case series are an example of descriptive studies, and are simply descriptions of cases, without a comparison group. They can be useful for monitoring unusual patients and for generating hypotheses for future study. One example is Teicher and colleagues' 1990 case series of six patients who developed suicidal ideation on fluoxetine, which informed future studies that ultimately led to black box warnings for all antidepressants. Case series can also be misleading, though, as in the early 1980s when physicians began describing male homosexuals with depressed immune systems. The use of amyl nitrite-based sexual stimulants was a suspected cause, and studies on the effects of amyl nitrites on the immune system were under way when HIV was discovered.

Ecological Studies

In these types of studies, groups of individuals are studied as a whole, and the overall occurrence of disease (outcome) is correlated with the aggregate level of exposure to a risk factor. The groups being studied can be differentiated by geographical region or by other criteria, such as school, workplace, or clinic. Data are usually collected at different times for different reasons, and do not include data on individuals. Ecological studies are helpful for generating hypotheses, and are generally inexpensive and not time-consuming (since they are often based on data that are routinely published, such as death and disease rates, per capita income, religious affiliation, or food consumption). However, these studies are limited in showing causality because of the lack of individual data, the temporal ambiguity of the data (it is not known if a given risk factor precedes the outcome, for example), and problems with using data in an aggregate form to generalize about individuals. One such study that generated a useful hypothesis was an intercountry comparison of prevalence rates for coronary artery disease (CAD); it showed that CAD was highest in those countries with the highest mean serum cholesterol values. This eventually led to more rigorous studies that confirmed a causal link between cholesterol levels and CAD.

Cross-sectional Studies

Cross-sectional studies examine individuals and determine their case status and risk factor exposures at the same time. Outcome rates in those with an exposure can then be compared with rates in those without. Data are collected by surveys, laboratory tests, physical measurements, or other procedures, and there is no follow-up or other longitudinal component. Cross-sectional studies are also called prevalence studies (more precisely, they are point prevalence studies), and, as with ecological studies, are relatively inexpensive and useful for informing future research. They also aid in public health planning (e.g., determining the number of hospital beds needed) and in generating more specific hypotheses about disease etiology by looking at specific risk factors.

As with the previously discussed study types, linking outcome and exposure is problematic. Although the data are collected for individuals, a person's exposure status at the time of the study may differ from his or her exposure status when the disease actually began. To illustrate, if smokers tend to quit smoking and start exercising once diagnosed with lung cancer, a cross-sectional study looking at these factors would systematically underestimate the link between smoking and lung cancer, and suggest a link between exercise and lung cancer. Another problem with cross-sectional studies is that point prevalence rates are affected both by the rate at which the outcome develops and by the chronicity of the outcome. For instance, if a given disease has a longer time course in men than in women but identical incidence rates, the point prevalence rate in men would be higher than in women.

Case-control Studies

In case-control studies, subjects are selected based on whether they have the outcome (case) or not (control), and their exposures are then determined by looking backward in time. For this reason, they are also called retrospective studies, since they rely on historical records or recall. This type of study design is appropriate for rare diseases or for those with long latencies, and it can also be used to study multiple possible risk factors. Problems with case-control studies include recall bias (which occurs if cases and controls recall past exposures differently) and difficulty in selecting controls. Ideally, one wants controls who are exactly matched to the cases in all other exposures except for the risk factor in question (Figure 61-3). Thus, controls should be matched for a variety of factors: for example, gender, socioeconomic status (SES), smoking status (unless that is what is being studied), and alcohol use. For case-control studies, an odds ratio is used to measure whether the outcome is more likely in those with the exposure than in those without it; when the disease is rare, the odds ratio approximates the relative risk.

Figure 61-3, Association between risk factors and outcome in case-control and cohort studies.
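The odds ratio can be computed from a case-control 2×2 table; the counts below are hypothetical:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a case-control 2x2 table, where a = exposed cases,
    b = exposed controls, c = unexposed cases, d = unexposed controls.
    Equals the cross-product ratio (a*d)/(b*c)."""
    return (a * d) / (b * c)

# Hypothetical data: 30 of 100 cases were exposed vs 10 of 100 controls.
print(round(odds_ratio(a=30, b=10, c=70, d=90), 2))  # 3.86
```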

Cohort Studies

In a cohort study, a group of healthy individuals is identified and followed over time to see who develops the outcomes (diseases) of interest and who does not. Exposures to risk factors are assessed over time, so the sequence between exposure and outcome can be determined, as well as the relationship between different exposures and outcomes. Because neither the subjects nor the researchers know who will develop which outcomes, bias in measuring exposures is avoided. Disadvantages include high cost (both in terms of manpower and time), potential for loss to follow-up, and inefficiency in studying rare diseases. In cohort studies, the association between outcome and exposure is expressed as relative risk, the ratio between the incidence rates in those with the risk factor and those without.


Relative risk = (Incidence rate in the group with the risk factor) / (Incidence rate in the group without the risk factor)

RR = [a/(a + b)] ÷ [c/(c + d)]

where a and b are the numbers of exposed individuals who do and do not develop the outcome, and c and d are the corresponding numbers of unexposed individuals.

Examples of cohort studies include the Framingham Heart Study (which has followed generations of residents of Framingham, Massachusetts) and the Nurses' Health Study (which has followed a national sample of nurses with annual questionnaires). One classic study in Britain followed 35,445 British physicians and found that among smokers the incidence of lung cancer was 1.30 per 1,000, but only 0.07 per 1,000 among non-smokers. The relative risk was therefore 1.30/0.07 = 18.6, indicating that smokers had more than 18 times the risk of developing lung cancer compared with non-smokers.
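The relative risk calculation can be sketched as follows; the table counts are hypothetical values constructed to match the per-1,000 rates reported in the text:

```python
def relative_risk(a, b, c, d):
    """Relative risk from a cohort 2x2 table, where a/b are exposed
    individuals with/without the outcome and c/d are the unexposed
    counterparts: RR = [a/(a+b)] / [c/(c+d)]."""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical counts matching rates of 1.30 per 1,000 smokers and
# 0.07 per 1,000 non-smokers: 13 cases among 10,000 smokers and
# 7 cases among 100,000 non-smokers.
rr = relative_risk(a=13, b=9_987, c=7, d=99_993)
print(f"RR = {rr:.1f}")  # 18.6
```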

Randomized Controlled Trials

Randomized controlled trials (RCTs) are a type of cohort study in which the exposure is controlled by the researchers. As in standard cohort studies, a population that has yet to develop the outcome(s) of interest is defined; then, unlike in standard cohort studies, the subjects are randomly assigned to different exposures. In these trials, the exposure is usually a treatment, such as a medication, or an intervention, such as counseling or a behavioral program. Those randomized to non-exposure may receive a placebo, or treatment as usual for the community. Multiple outcomes can be studied, from cessation of psychosis and reduction of depressive or anxious symptoms to side effects or other adverse outcomes. RCTs, also known as experimental studies, are the gold standard of epidemiological research.

RCTs retain all the advantages of standard cohort studies, but because the exposure is randomized, the causal link between the exposure and the outcome is much stronger. Disadvantages are the same as for other cohort studies, but also include ethical issues around assigning subjects to treatment or non-treatment, as well as issues around adequate blinding and creation of appropriate placebo controls. Many times in psychiatric research, blinding is impossible, as with studies of psychotherapy, and defining an adequate placebo or “non-exposure” is difficult.

Development of Assessment Tools

Case Definition

In 1972, Cooper and colleagues published a US/UK study that showed high variability in the diagnosis of psychotic disorders. It highlighted the need for explicit operational criteria for case identification. The development of such diagnostic criteria with the publication of the Diagnostic and Statistical Manual of Mental Disorders, ed 3 (DSM-III) in 1980 represented a notable step toward increasing the reliability and validity of psychiatric diagnoses.

Standardized Instruments for Case Assessment

The clinical interview is generally used to diagnose psychiatric illness. However, differences in personal styles and theoretical frameworks, among other factors, can affect the process and conclusions of a psychiatric interview. To increase inter-rater reliability, a number of standardized interview instruments have been developed. The first was the Present State Examination (PSE), initially used in the International Pilot Study of Schizophrenia sponsored by the World Health Organization (WHO). The PSE was designed for use by psychiatrists or experienced clinicians, however, so its use in larger epidemiological studies was impractical. In 1978, epidemiologists at the National Institute of Mental Health (NIMH) began developing a comprehensive diagnostic instrument for large-scale epidemiological studies that could be administered by either lay people or clinicians. The result was the Diagnostic Interview Schedule (DIS), which used the then newly published DSM-III (1980), and elements of other research instruments, including the PSE, the Renard Diagnostic Interview (RDI), the St. Louis criteria, and the Schedule for Affective Disorders and Schizophrenia (SADS). The DIS has been used extensively in the US and many other countries for surveys of psychiatric illness. Over time, the DIS has undergone revisions, first to incorporate DSM-III-R and then DSM-IV diagnoses. The WHO and the NIMH have also jointly developed the Composite International Diagnostic Interview (CIDI), which is structurally similar to the DIS and provides both ICD-10 and DSM-IV diagnoses. With the release of DSM-5, diagnostic instruments will need to be adapted further.

Contemporary Studies in Psychiatric Epidemiology
