"All those who drink of this remedy recover in a short time, except those whom it does not help, who die. Therefore, it is obvious that it fails only in incurable cases."
Galen (circa AD 100)
Many of the improvements in medical care for women in the past 30 years, and the tools to tackle emerging challenges, have resulted from carefully designed studies of interventions aimed at improving health. Before that time, physicians relied on anecdote and personal experience to guide patient care. Evidence-based medicine is a style of practice best described as “integrating individual clinical expertise with the best available external clinical evidence from systematic research.” Therefore, contrary to what many believe, evidence-based medicine combines understanding of available data with individual expertise. It is this blend of evidence and clinical intuition that makes evidence-based medicine so attractive and essential to the practice of modern medicine.
Why is evidence-based medicine important in maternal-fetal medicine? First, the practice of evidence-based medicine allows us to provide the best care to our patients. Obstetrics offers ready examples in which an incomplete or improper assessment of evidence led to problems with care. The classic example is the emergence of electronic fetal heart rate monitoring (see Chapter 33). This device, novel when it was introduced, generated new information that was widely expected to lead to improved perinatal outcomes. However, electronic fetal monitoring was widely implemented before evidence of benefit existed, and it became firmly rooted in obstetrics in the United States and many other countries. As has been well documented, it is uncertain whether continuous electronic fetal monitoring confers any benefit beyond that of intermittent auscultation in low-risk patients, and it has been a major contributing factor to the increase in the rate of cesarean delivery.
Second, clinical research is growing exponentially, as evidenced by the number of medical journals, research publications, and scientific societies. Some have estimated that 1000 articles are added to MEDLINE per day. In addition, clinical research has become more important, receiving increased funding and focus from the National Institutes of Health and other funding agencies. Because there is so much information, and because both physicians and patients can access it so rapidly, it is essential for practicing clinicians to be able to assess medical literature to determine a best course of action for an individual patient.
Third, evidence-based medicine provides the best tools for tackling emerging challenges such as the COVID-19 pandemic. Elegant basic and translational research, cohort studies, and clinical trials have informed our understanding of the impact of COVID-19 on pregnancy outcomes, as well as the effectiveness and safety of different therapeutics and vaccines.
Finally, in perinatal medicine, we are still faced with many important unanswered questions, such as the following:
Should universal cervical length screening be offered? If so, should women with a short cervix be treated with vaginal progesterone, cervical cerclage, or a pessary?
Can preeclampsia be detected, prevented, or reduced through prenatal screening and treatment programs?
Should we screen women for inherited thrombophilias if they have a history of poor pregnancy outcome? If so, how should we treat women who test positive?
How do we, as physicians and researchers, reach sound decisions in such situations? We start by learning to assess the quality of available medical evidence. In this chapter, we review the principles that serve as a basis for learning to interpret clinical research, including clinical research study designs, measures of effect, sources of error in clinical research (systematic and random), and screening and diagnosis. This chapter provides the reader with tools that will advance the journey toward becoming a practitioner of evidence-based medicine.
Several study designs are reported in the medical literature. Fig. 15.1 illustrates how the design of a study is determined.
Case reports and case series are descriptions of either a single case or a number of cases and thus are termed descriptive studies . Often they focus on an unusual disease, an unusual presentation of a disease, or an unusual treatment for a disease. In case reports and case series, there is no control group. Therefore, drawing any inference on causality is impossible. Such studies are useful mainly for hypothesis generation rather than hypothesis testing. However, case reports and case series can be very valuable in the scientific process, as many important observations were initially made by a single case or series of cases. For example, in late December 2019, several local health facilities reported clusters of patients with pneumonia of unknown cause that were epidemiologically linked to a wet animal wholesale market in Wuhan, Hubei Province, China. Subsequent epidemiologic and etiologic investigation led to detection of a novel coronavirus that is responsible for the COVID-19 pandemic.
Analytic studies, in contrast to descriptive studies, involve two or more comparison groups. This study design permits inferences to be drawn by quantifying the relationship between factors. Analytic studies may be observational or interventional, depending on whether the investigator assigns the exposure.
The two main types of observational studies are case-control studies and cohort studies. These study designs attempt to assess the relationship between an exposure and an outcome ( Table 15.1 ).
Table 15.1 Comparison of Case-Control and Cohort Studies

| Case-Control Studies | Cohort Studies |
| --- | --- |
| Good for rare disease | Good for common disease |
| Study multiple exposures | Study multiple outcomes |
| Done quickly | Long follow-up |
| Inexpensive | Expensive (prospective) |
| No incidence data | Can directly calculate incidence |
| Prone to bias | Less prone to bias |
In case-control studies, subjects are identified on the basis of disease rather than exposure. Groups of subjects with and without disease are identified, and exposures of interest are retrospectively sought. Comparisons of the distribution of exposures are then made between cases and controls. Case-control studies are useful for the study of rare conditions. Advantages of case-control studies include efficient use of time, low cost, and the ability to assess the impact of multiple exposures. However, case-control studies cannot be used to calculate an incidence of disease for a particular exposure, and they carry substantial potential for confounding and bias.
A nested case-control study is a modification of the case-control design. In this design, cases and controls are drawn from a defined cohort of subjects. Because all subjects in the cohort are disease-free at entry into the study, subjects who go on to develop the outcome of interest become the cases and a random sample of the remaining subjects who do not develop the outcome become the controls. To reduce confounding, controls are often matched to cases based on the presence or absence of one or more variables. This unique study design reduces potential selection bias of controls coming from a population that is different from that of the cases. This design is also useful when measurements of interest are costly or time-consuming. Rather than performing the measurement on all patients in the cohort, archived samples are analyzed only for subjects selected as cases and controls.
Cohort studies identify subjects based on exposure and assess the relationship between the exposure and the clinical outcome of interest. Cohort studies can be either retrospective or prospective. In a retrospective cohort study, the exposed population is identified after the event of interest has occurred. In a prospective cohort study, exposed and unexposed subjects are followed over time to see whether the outcome of interest occurs. Cohort studies are useful in the study of rare exposures. The advantages of cohort studies are that the incidence of disease in exposed and unexposed individuals can be assessed and that there is less potential for bias (especially if prospective). The main disadvantage of prospective cohort studies is that they can be time-consuming, sometimes requiring years to complete, and are therefore often expensive.
A clinical example may help to contrast these study designs. The relationship between anticonvulsant use in pregnancy and the occurrence of neural tube defects could be assessed with either a case-control or a cohort study. In a case-control study, one would identify a group of cases of fetuses or neonates with neural tube defects and a group of controls (i.e., fetuses or neonates without a neural tube defect). The maternal health record could be reviewed to determine whether exposure to anticonvulsants has occurred. To study this question with a cohort study, one would first identify a population of women taking anticonvulsants in pregnancy and a group not taking anticonvulsants and then follow both groups through pregnancy and delivery to determine the frequency of neural tube defects in each group.
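To make the contrast concrete, here is a minimal sketch using hypothetical (invented) 2×2 counts for the anticonvulsant and neural tube defect example. In the case-control portion only an odds ratio can be computed, because the investigator fixes the ratio of cases to controls; in the cohort portion, incidences in the exposed and unexposed groups, and therefore a relative risk, can be calculated directly.

```python
# Hypothetical counts for illustration only -- not real study data.

# Case-control design: subjects selected by outcome (neural tube defect yes/no),
# then exposure (anticonvulsant use) is ascertained retrospectively.
cases_exposed, cases_unexposed = 12, 88        # 100 cases
controls_exposed, controls_unexposed = 4, 196  # 200 controls

# Only an odds ratio is valid here; incidence cannot be calculated because
# the case:control ratio was fixed by the investigator.
odds_ratio = (cases_exposed / cases_unexposed) / (controls_exposed / controls_unexposed)
print(f"Case-control odds ratio: {odds_ratio:.2f}")

# Cohort design: subjects selected by exposure, then followed for the outcome.
exposed_with_ntd, exposed_total = 6, 1000      # women taking anticonvulsants
unexposed_with_ntd, unexposed_total = 2, 2000  # women not taking anticonvulsants

risk_exposed = exposed_with_ntd / exposed_total
risk_unexposed = unexposed_with_ntd / unexposed_total
relative_risk = risk_exposed / risk_unexposed
print(f"Incidence (exposed):   {risk_exposed:.4f}")
print(f"Incidence (unexposed): {risk_unexposed:.4f}")
print(f"Cohort relative risk:  {relative_risk:.2f}")
```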
Whereas cohort studies can be either prospective or retrospective, case-control studies are almost always retrospective. The advantage of a prospective cohort study is that the type and amount of data collected can be tailored to optimally answer the research question. In a retrospective cohort study, one almost always relies on inpatient or outpatient records for data collection, so the study is limited by the type and quality of data included in these sources. For example, suppose an investigator is interested in the relationship between maternal cocaine use and fetal growth restriction. In a prospective cohort study, one would have the opportunity for a very accurate assessment of this exposure, perhaps by obtaining a hair sample. A retrospective study would have to rely on what was recorded in the medical record, which is most likely based on patient self-report. This can lead to information bias.
There is a common misconception that an analysis performed using data collected prospectively and contained in a database is equivalent to a prospective cohort study. In fact, unless the research question was defined a priori (i.e., before the start of data collection), this is best termed a retrospective secondary analysis of prospectively collected data . In many cases, such analyses are similar to retrospective cohort studies because important clinical information may not have been collected as completely or systematically as it could have been had the research question been specified in advance (see Types of Data for Clinical Research, later).
In contrast to observational studies, where the investigator has no control over the exposure, interventional studies involve assignment of the exposure by the investigator. The ability to assign exposure provides a level of investigator control over interventional studies that cannot be achieved in observational studies. However, interventional studies may not be feasible for all research questions. For example, it may not be ethical for an investigator to expose subjects to a factor likely to cause a deleterious outcome. Similarly, outcomes with a long lag time (often associated with high costs) may make the interventional design unsuitable for a particular question. Interventional studies involving human subjects are termed clinical trials . Depending on whether subjects are randomly or nonrandomly assigned to the comparison groups, clinical trials may be randomized or nonrandomized.
The randomized clinical trial is the gold standard of clinical research design. In this type of clinical trial, eligible consenting participants are randomly allocated to receive different therapies. Differences in clinical outcomes are then compared based on treatment assignment. Clinical trials are powerful because the likelihood that confounding and bias will influence the results is minimized. Randomization is the hallmark of randomized controlled trials. It is the method of assigning subjects to groups in such a way that characteristics of the subjects do not affect the group to which they are assigned. To achieve this, the investigator allows chance to decide the group to which each subject is assigned, typically by means of computer-generated random sequences. Randomization ensures that differences in outcomes between comparison groups are attributable to the intervention and not to known or unknown confounding characteristics. Although randomization does not guarantee that the groups will be identical in all baseline characteristics, it ensures that any differences between them are the result of chance alone. Randomization also facilitates concealment of the intervention from subjects and investigators to further reduce bias. Finally, randomization leads to groups that are random samples of the study population, permitting the use of standard statistical tests that are based on probability theory.
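For illustration, here is a minimal sketch of computer-generated random allocation; the two arm labels, the block size of four, and the fixed seed are assumptions chosen for the example, not features of any particular trial.

```python
import random

rng = random.Random(2024)  # fixed seed so the allocation list is reproducible

# Simple randomization: each subject assigned by an independent "coin flip".
simple_allocation = [rng.choice(["treatment", "control"]) for _ in range(12)]

# Permuted-block randomization: within every block of 4, exactly 2 subjects
# receive each arm, keeping group sizes balanced throughout enrollment.
def permuted_blocks(n_subjects, block_size=4):
    allocation = []
    while len(allocation) < n_subjects:
        block = ["treatment", "control"] * (block_size // 2)
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_subjects]

print(simple_allocation)
print(permuted_blocks(12))
```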
Clinical trials can be logistically difficult and expensive, and depending on the study they can take years to complete. There are also concerns about whether the results of clinical trials can be generalized; that is, applied to clinical practice with the expectation that the same results will occur. Specifically, people who consent to be part of a trial may differ from people who do not consent in that they may be more likely to comply with an intervention or have a generally healthier lifestyle than individuals who decline to enter the study. In addition, well-performed clinical trials often have strict inclusion and exclusion criteria, with strict follow-up procedures. In real-life clinical situations, such rigor in follow-up rarely occurs. Pragmatic clinical trials have been advocated to increase generalizability of the results of clinical trials. These trials are designed to inform clinical or policy decisions by providing evidence of the effectiveness of an intervention in real-world clinical practice. Features include the recruitment of investigators and participants, the intervention and its delivery, follow-up, and the determination and analysis of outcomes in a manner consistent with everyday clinical practice, rather than the tightly controlled protocols of explanatory trials aimed at assessing efficacy of an intervention. Standards for reporting prospective randomized trials and other study designs have been developed to ensure complete and standardized reporting of studies ( Table 15.2 ).
a Schulz KF, Altman DG, Moher D, for the CONSORT Group. CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials. Ann Intern Med. 2010;152:726–732.
b von Elm E, Altman DG, Egger M, Pocock SJ, Gotzsche PC, Vandenbroucke JP. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: guidelines for reporting observational studies. Ann Intern Med. 2007;147:573–577.
c Moher D, Liberati A, Tetzlaff J, Altman DG, The PRISMA Group. Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. Ann Intern Med. 2009;151:264–269.
d Stroup DF, Berlin JA, Morton SC, et al. Meta-analysis of observational studies in epidemiology: a proposal for reporting. JAMA. 2000;283:2008–2012.
e Bossuyt PM, Reitsma JB, Bruns DE, et al., for the STARD Group. STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. BMJ. 2015;351:h5527.
f Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med. 2015;162:55–63.
g Husereau D, Drummond M, Petrou S, et al. Consolidated Health Economic Evaluation Reporting Standards (CHEERS) statement. BMJ. 2013;346:f1049.
Despite these concerns, clinical trials provide the best evidence to guide practice. An excellent example of a clinical trial that guided practice was the screening and treatment study of bacterial vaginosis (BV) in pregnancy performed by the Maternal-Fetal Medicine Units (MFMU) Network. A variety of studies from around the world suggested that both symptomatic and asymptomatic BV were associated with spontaneous preterm birth. In addition, secondary analyses of data from clinical trials in high-risk women suggested that screening for and treating BV in pregnancy might reduce the occurrence of spontaneous preterm delivery. Many assumed that such a screen-and-treat strategy might reduce the incidence of preterm birth if applied to all pregnant women. To answer this question, the MFMU Network performed a placebo-controlled clinical trial comparing placebo with metronidazole treatment for pregnant women who screened positive for BV. This study demonstrated that treating pregnant women with asymptomatic BV did not affect the occurrence of preterm birth.
Another benefit of randomized clinical trials is that subgroup analyses from such data can be used to generate hypotheses for future research. One example is a landmark study by Hauth and colleagues. The primary analysis of their randomized clinical trial of metronidazole and erythromycin in women with a prior preterm birth or other historical risk factors demonstrated a reduction in preterm birth; a secondary analysis found that the benefit was limited to women with BV. This secondary analysis (and a similar secondary analysis of another randomized clinical trial) should have prompted a new randomized trial of antibiotic treatment into which women would be enrolled if they had both BV and a historical risk for preterm birth. However, subgroup analyses in clinical trials should be interpreted with caution. On one hand, multiple subgroup analyses may produce a significant difference in one or more subgroups by chance alone. On the other hand, there is often limited statistical power for detecting differences in specific subgroups, because trials are usually powered for the main effect and not for effects within subgroups. To overcome these challenges, good practice recommends performing subgroup analyses only in select prespecified groups with a biologically plausible rationale for anticipated differences. Moreover, rather than assessing whether the effect of the intervention is statistically significant within a given subgroup, tests of interaction should be used to assess whether the effect size differs significantly between subgroups, as sketched below.
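As a minimal sketch of this distinction, the code below uses invented 2×2 counts for two subgroups and compares the log relative risks between the subgroups with a z-test for interaction, rather than testing significance within each subgroup separately.

```python
from math import log, sqrt, erf

def log_rr_and_se(events_tx, n_tx, events_ctrl, n_ctrl):
    """Log relative risk and its standard error from a 2x2 table."""
    rr = (events_tx / n_tx) / (events_ctrl / n_ctrl)
    se = sqrt(1/events_tx - 1/n_tx + 1/events_ctrl - 1/n_ctrl)
    return log(rr), se

# Hypothetical subgroup data (illustration only): events/total in treatment and control arms.
log_rr_a, se_a = log_rr_and_se(events_tx=15, n_tx=200, events_ctrl=30, n_ctrl=200)  # subgroup with BV
log_rr_b, se_b = log_rr_and_se(events_tx=28, n_tx=200, events_ctrl=30, n_ctrl=200)  # subgroup without BV

# Interaction test: is the treatment effect (on the log scale) different between subgroups?
z = (log_rr_a - log_rr_b) / sqrt(se_a**2 + se_b**2)
p_interaction = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided normal p-value

print(f"z for interaction = {z:.2f}, p = {p_interaction:.3f}")
```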
Two other study designs warrant mention: (1) systematic review and meta-analysis and (2) decision analysis. Both are valuable tools for the evidence-based medicine practitioner.
Systematic review and meta-analysis are two related but different terms, and they are often confused. A systematic review is a scientific investigation that focuses on a specific question and uses explicit, planned methods to identify, select, assess, and summarize the findings of similar but separate studies. It may or may not include a quantitative synthesis of the results from the separate studies. A meta-analysis is the process of using statistical methods to quantitatively combine the results of similar studies identified in a systematic review, allowing inferences to be made from the sample of studies. Thus a meta-analysis includes a systematic review, but a systematic review does not necessarily include a meta-analysis. In a meta-analysis, the results of a number of randomized clinical trials or observational studies may be statistically combined to obtain a summary estimate for the effect of a given treatment. Systematic reviews and meta-analyses should be differentiated from other, less data-driven narrative review articles in which authors present their own interpretation of data. The strength of a meta-analysis comes from combining results across multiple studies, thereby increasing the power to detect differences. This is an especially important methodology in obstetrics, where there are relatively few large randomized clinical trials to guide treatment.
Numerous meta-analyses have been performed for topics in obstetrics, and many appear in the Cochrane Database of Systematic Reviews. Two such analyses (Figs. 15.2 and 15.3) are taken from the Cochrane Library meta-analysis of the effect on neonatal outcome of antibiotics given antenatally to women with preterm premature rupture of membranes. Fig. 15.2 shows a comparison of neonatal infectious complications between women who received antibiotics and women who did not, with data pooled across all available studies. Each of the 11 randomized trials that met inclusion criteria for this analysis is listed, with the number of subjects and the frequency of the outcome in the treatment and control groups noted. The relative risk and 95% confidence interval (see Assessing Random Error, later) for each study, weighted for sample size, are shown. The total number of subjects with the outcome of interest is summed, and the combined relative risk and 95% confidence interval are calculated. In this example, a number of small trials show a nonsignificant trend in favor of antibiotic treatment. The pooled (i.e., statistically combined) relative risk was 0.67, with a 95% confidence interval of 0.52 to 0.85. The point estimate (i.e., the relative risk) suggests that the “best guess” is that antibiotics reduce the risk for neonatal infection by 33%. The confidence interval suggests that the data are consistent with as much as a 48% reduction in risk (1−0.52) or as little as a 15% reduction in risk (1−0.85). Even the upper bound of the confidence interval suggests a protective effect of antibiotics on neonatal infection.
Compare this summary graph with the graph for the effect of antibiotics on perinatal death in women with preterm premature rupture of membranes (see Fig. 15.3). The pooled analysis yields a point estimate (relative risk) of 0.89, with a 95% confidence interval of 0.67 to 1.18. The point estimate suggests that antibiotics reduce the occurrence of perinatal death by 11%. The confidence interval, however, indicates that the data are consistent with anything from a 33% reduction to an 18% increase in perinatal death with antibiotics. Because the confidence interval crosses a relative risk of 1.0, the data are consistent with “no difference” between the groups.
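To illustrate the mechanics of pooling, here is a minimal sketch of an inverse-variance, fixed-effect meta-analysis on the log relative risk scale; the per-trial counts are invented for the example and are not the Cochrane data shown in Figs. 15.2 and 15.3.

```python
from math import log, sqrt, exp

# Hypothetical trials: (events_tx, n_tx, events_ctrl, n_ctrl) -- illustration only.
trials = [
    (10, 120, 18, 118),
    (7, 80, 11, 82),
    (22, 240, 30, 236),
    (5, 60, 9, 61),
]

weights, weighted_log_rrs = [], []
for events_tx, n_tx, events_ctrl, n_ctrl in trials:
    log_rr = log((events_tx / n_tx) / (events_ctrl / n_ctrl))
    var = 1/events_tx - 1/n_tx + 1/events_ctrl - 1/n_ctrl  # variance of the log RR
    w = 1 / var                                            # inverse-variance weight
    weights.append(w)
    weighted_log_rrs.append(w * log_rr)

pooled_log_rr = sum(weighted_log_rrs) / sum(weights)
pooled_se = sqrt(1 / sum(weights))

pooled_rr = exp(pooled_log_rr)
ci_low = exp(pooled_log_rr - 1.96 * pooled_se)
ci_high = exp(pooled_log_rr + 1.96 * pooled_se)
print(f"Pooled RR = {pooled_rr:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```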
A notable limitation of meta-analysis is that clinical trials on the same general topic seldom enroll identical populations or employ identical treatments, resulting in heterogeneity. Meta-analysis can therefore at times seem like mixing apples and oranges. Although there are statistical tools for taking heterogeneity into account (i.e., random-effects models), it is incumbent on the reader to judge whether combining the studies is sensible. Guidelines for the publication of quality meta-analyses have been promulgated in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement, proposed by a consortium of journal editors; like the Consolidated Standards of Reporting Trials (CONSORT) statement, it is subscribed to by the American Journal of Obstetrics and Gynecology, Obstetrics and Gynecology, and general medical journals such as Lancet, New England Journal of Medicine, and Journal of the American Medical Association (see Table 15.2).
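To show how heterogeneity is quantified and how a random-effects model addresses it, here is a minimal sketch computing Cochran's Q, I², and the DerSimonian-Laird between-trial variance (tau²) from invented trial-level log relative risks and variances.

```python
from math import sqrt, exp

# Hypothetical trial-level log relative risks and their variances (illustration only).
log_rrs = [-0.45, -0.10, -0.60, 0.05, -0.30]
variances = [0.04, 0.09, 0.06, 0.10, 0.05]

w = [1 / v for v in variances]  # fixed-effect (inverse-variance) weights
fixed_pooled = sum(wi * y for wi, y in zip(w, log_rrs)) / sum(w)

# Cochran's Q and I^2: how much do the trials disagree beyond chance?
Q = sum(wi * (y - fixed_pooled) ** 2 for wi, y in zip(w, log_rrs))
df = len(log_rrs) - 1
I2 = max(0.0, (Q - df) / Q) * 100 if Q > 0 else 0.0

# DerSimonian-Laird between-trial variance (tau^2), then random-effects weights.
C = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (Q - df) / C)
w_re = [1 / (v + tau2) for v in variances]
random_pooled = sum(wi * y for wi, y in zip(w_re, log_rrs)) / sum(w_re)
se_re = sqrt(1 / sum(w_re))

print(f"Q = {Q:.2f} on {df} df, I^2 = {I2:.0f}%")
print(f"Random-effects pooled RR = {exp(random_pooled):.2f} "
      f"(95% CI {exp(random_pooled - 1.96*se_re):.2f} to {exp(random_pooled + 1.96*se_re):.2f})")
```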
Two other issues are pertinent to the subject of meta-analyses. First, performing a meta-analysis requires significant methodological skill, so not all meta-analyses are of the same quality. The Cochrane Library, for example, includes meta-analyses of very high quality by experts on a number of obstetric topics. Second, there is debate about the role of meta-analyses when large clinical trials are available. This issue was raised in a meta-analysis of antiplatelet agents for the secondary prevention of preeclampsia. The authors suggested that antiplatelet agents may reduce the risk for preeclampsia and the risk for birth before 34 weeks’ gestation. However, of the five studies in this meta-analysis that enrolled more than 1000 women in each treatment arm, four did not show a reduction in the risk for preeclampsia with antiplatelet therapy. How do we reconcile the role of large clinical trials with the role of meta-analyses in guiding our practice? On one hand, although opinions vary, we believe that a single, well-performed randomized clinical trial in a generalizable population may provide stronger evidence than a meta-analysis (in which heterogeneous studies must be combined). On the other hand, meta-analyses that include large studies may provide insight into the efficacy of treatment in subgroups of subjects. For example, a more recent meta-analysis of antiplatelet agents for the prevention of preeclampsia, stratified by timing of initiation of intervention, demonstrated a greater than 50% risk reduction when aspirin was initiated before 16 weeks’ gestation and no significant effect when aspirin was initiated at 16 weeks or later.
Decision analysis is a methodology in which the component parts of a complex decision are identified and analyzed in a theoretical model. Decision models often use available data to compare different therapeutic strategies for a clinical dilemma. The ultimate goal of any decision analysis is to reach a clinical decision. Decision models are often the foundation for formal economic analyses, such as cost-effectiveness analysis. Decision and economic analyses are common in the obstetric literature. Such analyses have been published on screening for group B streptococci, indomethacin use for preterm labor, tocolysis at advanced gestational ages, thromboprophylaxis at cesarean delivery, and universal cervical length screening to prevent preterm birth. Interested readers should consider reading review articles on this subject.
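At its core, a decision analysis "folds back" a decision tree by computing the expected cost and expected utility of each strategy. The sketch below uses invented probabilities, costs, and utilities for two hypothetical strategies; it illustrates the mechanics only and does not model any of the clinical questions cited above.

```python
# Each strategy is a list of branches: (probability, cost, utility).
# All numbers are invented for illustration.
strategies = {
    "screen and treat": [
        (0.02, 5000, 0.90),   # outcome A
        (0.98, 800, 0.99),    # outcome B
    ],
    "no screening": [
        (0.05, 7000, 0.85),
        (0.95, 200, 0.99),
    ],
}

def fold_back(branches):
    """Expected cost and expected utility of one strategy."""
    exp_cost = sum(p * cost for p, cost, _ in branches)
    exp_utility = sum(p * utility for p, _, utility in branches)
    return exp_cost, exp_utility

results = {name: fold_back(branches) for name, branches in strategies.items()}
for name, (cost, utility) in results.items():
    print(f"{name}: expected cost = ${cost:,.0f}, expected utility = {utility:.3f}")

# Incremental cost-effectiveness ratio (ICER): extra cost per unit of utility gained.
(c1, u1), (c2, u2) = results["screen and treat"], results["no screening"]
if u1 != u2:
    print(f"ICER (screen vs. no screen) = ${(c1 - c2) / (u1 - u2):,.0f} per unit utility gained")
```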
Data for clinical research may be primary or secondary. Primary data are data collected specifically for the purpose of answering a given research question. For example, data collected during a clinical trial to test the stated trial hypothesis are primary data. Such data are tailored to the specific question, and important variables are systematically collected.
Secondary data are data collected for another purpose and then used for clinical research. Most such data are derived from institutional or administrative databases. Analysis of data collected as part of a different research question is a secondary data analysis. There are advantages and disadvantages to secondary data. Because such data are already available, the expense and time needed to collect them are circumvented. Furthermore, national databases tend to be population-based or representative samples, increasing the generalizability of research findings. The sample sizes are often large, facilitating the evaluation of even rare outcomes. For example, the national birth certificate data sets include more than 99% of births in the United States, with sample sizes of nearly 4 million. They can be linked to other data sets, such as infant death data, for further analyses. Finally, because they have been collected for years, they are an excellent resource for trend analysis.
Despite these advantages, limitations of existing data must be considered when assessing such studies. First, because they are collected primarily for other purposes (e.g., public health surveillance or billing) and not for clinical research, the specifics of the data collected and the method by which they are collected may be suboptimal for research purposes. Second, there are validity and accuracy concerns as well as issues of misclassification and missing data. In particular, when missing data are related to whether the outcome of interest is present or absent (i.e., not missing at random), use of such data can produce biased results. Furthermore, although the large sample sizes of secondary data are often an advantage, they may also result in statistically significant differences that are of limited clinical value.
Because of the significant limitations of secondary data, researchers should know the data source well, including how the information was collected, what the accuracy of the data is, and what proportion of data is missing. For administrative database studies to be considered for publication, some journals require authors to describe in the cover letter and methods section of the manuscript how the accuracy of the database was validated. Research using secondary data should capitalize on the strengths of the particular data set and avoid analyses that are dependent on the weakest aspect of the data. The main goal of studies based on secondary data should usually be hypothesis generation and not definitive hypothesis testing.