Learning outcomes

After carefully reading this chapter, the physician assistant student will be able to:

  • 1. Provide an overview of the history of evidence-based medicine.

  • 2. Describe the steps to evidence-based practice.

  • 3. Write a foreground (“PICO”) question for a given clinical vignette.

  • 4. Differentiate experimental from observational study designs.

  • 5. Describe the key design elements of randomized controlled trials, cohort, case control, and cross-sectional studies, as well as systematic reviews and meta-analyses.

  • 6. Critically appraise a study of each of the designs for various threats to internal and external validity.

  • 7. Synthesize evidence and develop a clinical management plan in response to a clinical problem.

  • 8. Evaluate his/her own evidence-based medicine process as a part of ongoing skill development and lifelong learning.

Introduction

At this point in history, when so much information is available with the click of a mouse or with a sweep of a finger, it is important for medical providers to continue to strengthen their capacity for incorporating evidence into their clinical decision making. Although providers have easier access to current information than ever before, the sheer volume of health-related data can quickly become overwhelming for providers, who must efficiently care for patients. For this reason, it is important for busy clinicians to have a grasp of the process and principles of evidence-based practice, from asking the question to finding the evidence and evaluating its quality and finally to incorporating the evidence into clinical decision making.

History of evidence-based medicine

The challenge of implementing the best-quality evidence into medical decision making predates the modern medical era. The well-known “scurvy” experiment dates back to the British Navy of the 1740s. A naval surgeon, James Lind, conducted an experiment in search of a cause and a treatment for sick sailors. Although he had a small sample, he used important experimental principles, such as the establishment of control groups, a clear endpoint, and the inclusion of similar cases, in an attempt to control for potential confounding variables. In his experiment, Lind clearly demonstrated the importance of citrus in the diet, but it took 7 years for his findings to be published and 40 years before the British Navy included citrus on every voyage. This delay in the implementation of best evidence into clinical practice has been a recurring theme historically.

Another example of early experimental evidence ultimately informing medical practice is the examination of maternal mortality rates by Semmelweis in the mid-1800s. Through a comparison of deliveries performed by physicians and those by nurse midwives, Semmelweis noted that mortality rates from postpartum infection were much higher for pregnant women attended by physicians. He ultimately attributed the increase to the fact that doctors routinely performed postmortem examinations early in the morning before attending their obstetric patients. The introduction of good handwashing practices significantly lowered the mortality rates for mothers whose babies were delivered by physicians. Mortality rates increased again, however, when the new practice of consistent handwashing slackened. The historical challenges of implementing practices based on best evidence and sustaining those practices mirror challenges encountered today.

Historically, the collection of high-quality evidence was limited by bias and lack of blinding, as individual physicians made observations about interventions and outcomes in their own patients. The earliest reported randomized controlled trials (RCTs) occurred only in the mid-to-late 1940s and included a streptomycin trial and a whooping cough vaccine trial. The whooping cough trial included elements of placebo control and informed consent, strengthening the rigor of the trial and the validity of the evidence.

It has been a challenge, however, to summarize and communicate research-based evidence to make it usable by practicing clinicians. In 1967, David Sackett, MD, started the first Department of Clinical Epidemiology at McMaster University in Ontario, Canada. Before Sackett’s work, epidemiology and biostatistics and their implications in public health were not readily digestible for practicing clinicians. Sackett was among the first to develop practical tools for physicians to apply research evidence to the care of individual patients. Dr. Sackett continued at McMaster University until 1994, when he became the foundation director of the Center for Evidence-Based Medicine at Oxford University. After his retirement from Oxford, he returned to Canada and continued to teach clinical epidemiology to students until his death in May 2015.

Another important figure in the history of evidence-based medicine (EBM) is Dr. Gordon Guyatt. During his tenure as the director of the internal medicine residency program at McMaster University, Dr. Guyatt coined the term “evidence-based medicine.”

Throughout the 1990s, an ongoing series of articles titled “Users’ Guides to the Medical Literature” was published in the Journal of the American Medical Association (JAMA). These ultimately led to the development of a textbook summarizing the principles of evidence-based clinical practice. In his work, Dr. Guyatt presented a methodical, easy-to-remember approach to the practice of EBM that many clinicians use today.

Guyatt and others credit three additional researcher-clinicians from an earlier generation who influenced their work in EBM. Dr. Tom Chalmers recognized the value of rigorous study design and randomized trials as early as 1955 in his paper on bed rest and diet for hepatitis. That paper heavily influenced Guyatt’s understanding of what he later called clinical epidemiology. Dr. Alvan Feinstein from Yale was both a clinician and a researcher who was a key player in the development of an approach to studying the ways medicine is practiced on a daily basis. The third individual was Dr. Archie Cochrane. His work as a clinician, an epidemiologist, and a medical school faculty member inspired the later development of the Cochrane Collaboration, which has become a recognized leader in the development of EBM and EBM resources.

Evidence-based medicine process

Effective EBM incorporates five primary tasks. These are (1) asking a clinical question, (2) searching for evidence that addresses the question, (3) assessing the quality of the evidence, (4) incorporating the evidence into a clinical decision, and (5) evaluating the process.

Task 1: Asking a clinical question

Typically, clinical questions are categorized as background questions or foreground questions. Background questions are very general questions most often asked by new learners or by practitioners encountering an unfamiliar diagnosis or clinical presentation. Background questions commonly begin with who, what, when, where, how, or why. Examples of background questions may include, “Where is the incidence of Lyme disease highest?” or “What are the risk factors for osteoporosis?” The answers to these questions provide background information on a particular topic.

Foreground questions are very specific questions designed to provide guidance for the clinical care of a particular patient or group of patients. A foreground question about Lyme disease, for example, may compare two antibiotic dosing regimens for speed of recovery. A useful acronym for developing foreground questions is “PICO.” PICO stands for:

  • P: Population or patient—How would the patient or population be described?

  • I: Intervention—Which intervention is being considered?

  • C: Comparison—What are the alternative approaches?

  • O: Outcome—What is the clinician hoping to measure, achieve, or affect?

PICO questions can be developed to address a variety of clinical question types, including diagnosis, etiology or harm, prognosis, and treatment. For example, consider a commonly diagnosed disorder such as diabetes mellitus. A physician assistant (PA) may have many questions about diabetes. See Table 41.1 for sample PICO questions of each clinical type regarding diabetes. Properly structuring the question at the outset is the key step to obtaining a meaningful evidence-based answer.

Table 41.1
Example PICO Questions of Each Clinical Type Regarding Diabetes
Type of Question | PICO Question
Diagnosis | In patients with type 2 diabetes, is a 24-hour urine collection for creatinine clearance more sensitive than a serum creatinine for detecting early-onset kidney disease?
Etiology or harm | In middle-aged adults, is family history of diabetes a greater risk factor than obesity for the development of type 2 diabetes?
Treatment | In patients with new-onset type 2 diabetes, are saxagliptin and metformin more effective than glipizide and metformin at decreasing the risk of renal failure?
Prognosis | In patients with type 1 diabetes, is a hemoglobin A1c goal of 6.0% more effective than hemoglobin A1c of 7.0% at increasing survival?
PICO, Population or patient, intervention, comparison, outcome.
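
The four PICO parts can also be written down as a small structured record, which makes it easy to check that no part has been left out before a search begins. The following is a minimal sketch in Python; the class name, field names, and sentence template are illustrative assumptions rather than a standard tool, and the example values are taken from the prognosis row of Table 41.1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PicoQuestion:
    """One foreground question split into its four PICO parts (hypothetical helper)."""

    population: str
    intervention: str
    comparison: str
    outcome: str

    def as_sentence(self) -> str:
        # Illustrative template only; real questions are usually phrased
        # by hand so that they read naturally.
        return (
            f"In {self.population}, is {self.intervention} more effective "
            f"than {self.comparison} at {self.outcome}?"
        )


# Values copied from the prognosis row of Table 41.1.
question = PicoQuestion(
    population="patients with type 1 diabetes",
    intervention="a hemoglobin A1c goal of 6.0%",
    comparison="hemoglobin A1c of 7.0%",
    outcome="increasing survival",
)
print(question.as_sentence())
```

Because every field is required, an incomplete question (for example, one with no comparison) fails as soon as the record is created rather than later, during the search.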

Task 2: Searching for evidence

The search for evidence begins with identifying the type of evidence of interest. Evidence can be broadly divided into two categories, filtered and unfiltered. Filtered evidence is that which has already been gathered and synthesized by experts into a format that is readily usable by clinicians. Clinical guidelines developed by professional bodies are an example of filtered evidence. Other examples include critically appraised topics (CATs), evidence-based summaries, structured abstracts, and systematic reviews. For a list of evidence-based filtered resources, see Table 41.2.

Table 41.2
Examples of Filtered and Unfiltered Sources of Evidence
Filtered (Secondary) Evidence

  • Clinical guidelines: American Cancer Society Guideline Summaries, U.S. Preventive Services Task Force Recommendations for Primary Care Practice

  • CATs: BestBETs

  • Evidence-based summaries: UpToDate, Clinical Evidence, Bandolier

  • Structured Abstracts: EBM Online, ACP Journal Club

  • Systematic reviews: Cochrane Library

  • Databases: Trip Database, Essential Evidence Plus

Unfiltered (Primary) Evidence

  • PubMed

  • EBSCO

  • Ovid

ACP, American College of Physicians; CATs, critically appraised topics; EBM, evidence-based medicine.

Unfiltered or primary evidence includes original research articles published in peer-reviewed journals. There are a variety of databases through which a search for primary literature may be conducted. Table 41.2 lists examples of databases of primary literature. Individual practitioners will need to determine which databases are available through their employing institutions. The focus of the rest of this chapter will be on accessing and assessing primary literature.

A systematic approach to searching medical databases is critical to uncovering the evidence. Table 41.3 provides a format for tracking progress through a systematic literature search. The search begins with identifying an available database and then choosing search terms. Start by entering the key words of the PICO question. For example, in the prognosis question, “For patients with stage IV colon cancer, is chemotherapy plus radiation more effective than chemotherapy alone at prolonging survival?” a search of the PubMed database may begin with the search terms “stage IV colon cancer” and “survival.” Subsequent searches will include these first two terms and add “chemotherapy” and “radiation.” For the opening search, record the number of articles identified in the search table as demonstrated in Table 41.3. If after the second search the number of articles identified remains unwieldy, limiters may be added to the search. Limiters may include acceptable dates of publication, desired publication language, human participants, or the study design. Table 41.3 contains an example of the recording of a step-by-step search for the colon cancer prognosis question.

Table 41.3
Search Table Example
Database | Search Terms | Limiters | Articles
PubMed | Stage IV colon cancer AND survival | None | 791
PubMed | Stage IV colon cancer AND survival | 2014–2019; English language | 287
PubMed | Stage IV colon cancer AND survival AND chemotherapy | 2014–2019; English language | 96
PubMed | Stage IV colon cancer AND survival AND chemotherapy AND radiation | 2014–2019; English language | 10

Continue to narrow the search, step by step, until a manageable number of relevant articles is obtained. At that point, the titles and abstracts can be reviewed, allowing the practitioner to eliminate articles that are clearly irrelevant to the clinical question. Full-text articles are then collected for review and appraisal. If the article is not available in full text, consult a medical librarian for interlibrary loan options. In this way, it will not be necessary to limit a search to “full-text” articles only and potentially miss some important evidence. Additional primary evidence may also be uncovered through a hand search of reference lists at the end of some of your key articles. More detailed tutorials on searching databases are available on the website for individual databases or through consultation with a medical librarian.
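
The step-by-step narrowing recorded in Table 41.3 can be sketched in a few lines of Python. This is only an illustration of how the Boolean search strings are assembled one term at a time. The function names are hypothetical, nothing here contacts PubMed or any other database, and article counts would still have to be read from the database and recorded by hand.

```python
def build_query(terms):
    """Join search terms with the Boolean AND operator."""
    return " AND ".join(terms)


def narrowing_steps(terms, start=2):
    """Yield successively narrower queries, adding one term at a time.

    `start` is the number of terms in the opening search.
    """
    for count in range(start, len(terms) + 1):
        yield build_query(terms[:count])


# Key words taken from the colon cancer question in the text.
pico_terms = ["Stage IV colon cancer", "survival", "chemotherapy", "radiation"]

for query in narrowing_steps(pico_terms):
    print(query)
```

Each printed line corresponds to one distinct "Search Terms" entry in Table 41.3. Limiters such as publication dates and language are applied in the database interface and are not modeled in this sketch.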

Evidence essentials

Research study design.

After the primary literature has been searched and sources of evidence identified, it is important to assess each article for usefulness and validity. Ultimately, the evidence-based practitioner aims to uncover the most valid evidence available to inform clinical decision making. Study design is an important feature of research studies that affects validity. Generally, research study designs that address the types of clinical questions in Table 41.1 can be divided into two categories, experimental and observational. Experimental studies are those in which the investigator assigns (preferably randomly) study participants to their respective groups. The RCT is a classic example of an experimental design. RCTs are frequently used to assess the efficacy of new treatments or interventions. In an RCT, study participants are randomly assigned to either the new treatment or one or more comparison groups and then followed over time for the development of the outcome of interest. The risk of occurrence of the outcome is compared in the two groups to determine which treatment is more effective.

Observational study designs are those in which the investigator observes existing groups of patients. The three most common observational designs are cohort, case-control, and cross-sectional. In a cohort study, a group of people with a common characteristic (cohort) is assembled, and the participants are divided into two or more groups based on their level of exposure to the independent variable of interest. These groups are then followed over time to see who develops the outcome of interest. Cohort studies can be used to address any of the question types in Table 41.1. The independent variable could represent a therapeutic option, in which case one of the study groups would have undergone the therapy of interest, and the other would have experienced an alternative treatment or perhaps none at all. Alternatively, in an etiology question, the groups within the cohort are categorized as exposed or unexposed to some risk factor. In a manner similar to the RCT, the participants in a cohort study are then followed forward in time to determine the risk of development of the outcome of interest. The outcome may be cure or improvement of symptoms in a treatment study, development of disease in an etiology or harm study, or survival or mortality in a prognosis study.
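
As a rough illustration of how risk is compared between the groups in an RCT or a cohort study, the sketch below computes the risk of the outcome in each group and their ratio (often called the risk ratio or relative risk, a term this section does not itself introduce). The counts are invented for illustration and the function names are hypothetical.

```python
def risk(events, total):
    """Proportion of a group that developed the outcome during follow-up."""
    if total <= 0:
        raise ValueError("group size must be positive")
    return events / total


def risk_ratio(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return risk(exposed_events, exposed_total) / risk(unexposed_events, unexposed_total)


# Hypothetical cohort: 200 exposed and 200 unexposed participants.
exposed_risk = risk(50, 200)       # 50 of 200 exposed developed the outcome
unexposed_risk = risk(25, 200)     # 25 of 200 unexposed developed the outcome
ratio = risk_ratio(50, 200, 25, 200)

print(f"Risk in exposed:   {exposed_risk:.3f}")
print(f"Risk in unexposed: {unexposed_risk:.3f}")
print(f"Risk ratio:        {ratio:.1f}")
```

In this made-up example the exposed group has twice the risk of the unexposed group. A ratio of 1 would indicate no difference in risk between the groups.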

The case-control study design is quite different from the cohort in that the groups of participants are defined by disease state (the outcome) rather than by exposure. Case-control studies are particularly useful in the examination of rare diseases for possible risk factors. For this type of question, a group of people with a disease (cases) are identified and then matched to a group of control patients. Ideally, the control participants will be like the cases in every respect except that they do not have the disease of interest. Then the cases and control participants are queried for their level of exposure to a possible risk factor. For example, to explore the possible association of maternal exposure to secondhand smoke with the development of congenital anomalies, investigators would assemble a group of women who have birthed babies with congenital anomalies and a group of women who have delivered healthy babies and query both groups of mothers about their exposure to secondhand smoke during their pregnancies.

Cross-sectional studies are the third common type of observational design. A cross-sectional study is sometimes referred to as a “snapshot” or “slice in time” study because the exposure and outcome variables are measured at the same point in time for the study participants. Cross-sectional studies can be used to assess disease prevalence but not incidence. The cross-sectional study can be conducted more quickly and less expensively than other study types, but it is often difficult to ascertain the temporal relation between the exposure and outcome because they are measured at the same time. Causality can never be established by a cross-sectional study.

Two additional study designs that are important for evidence-based practitioners to understand are the systematic review article and meta-analysis. These both represent filtered evidence in that the authors have searched out the original research and synthesized the information to address a clinical question. In a systematic review article, the investigators perform a systematic search for all of the primary literature on a topic, locate these articles, critically review the articles, and develop a response to their clinical question based on the evidence. In a meta-analysis, this process is taken one step further. The investigators not only seek out primary research but also seek to gather the original data from the investigators and determine whether it is legitimate to pool those data, repeat the statistical analysis, and come to a new conclusion based on the larger sample size. The strengths and limitations to these approaches are addressed later in this chapter. Perhaps their greatest strengths, however, are the increased sample size and broader perspective on a clinical question.

Evidence pyramid.

Proponents of EBM have developed an evidence pyramid to help users understand the relative rigor of the various study designs. Of the epidemiologic study designs discussed, the systematic review article and meta-analysis provide the greatest rigor in terms of evidence because of their increased sample size and more representative populations. Among the individual study designs, the RCT is the most rigorous followed by the cohort study, the case-control study, and the cross-sectional study (Fig. 41.1).

Fig. 41.1, Evidence pyramid.

Important concepts in outcome measurement

Although evidence-based practitioners need not be trained statisticians to be effective in critical appraisal of the literature, a working knowledge of basic statistical principles empowers them to evaluate the evidence with greater confidence. A few important statistical concepts concern types of data, types of variables, and level of measurement. Generally, data can be characterized as qualitative or quantitative. Whereas qualitative data are often represented by words, quantitative data involve numerical expressions. Variables are defined as either independent or dependent. Independent variables are set by the researcher. These often include an intervention in an RCT or an exposure in an observational study. The dependent variable represents the outcome of interest. In the sample PICO questions in Table 41.1, the dependent variables included early-onset kidney disease, type 2 diabetes, renal failure, and survival.

In addition to being characterized as independent or dependent, variables represent different levels of measurement. A nominal level of measurement involves only one property, which is classification. When variables are measured at the nominal level, their values are classified into categories. Examples include eye color, vital status (alive or dead), and so on. At the ordinal level of measurement, the additional property of order is present. Variables measured at the ordinal level are classified into categories that have an inherent order. For example, cancer is recorded as stages I to IV. Interval and ratio levels of measurement are marked by equal intervals between values; the ratio level additionally has a true zero. Variables measured at an interval or ratio level are often further divided into continuous or discrete. Continuous variables represent “amounts,” and discrete variables represent “counts.” Continuous variables are measured with units; discrete variables have no units. Examples of continuous data include height, weight, and systolic blood pressure. Examples of discrete data include number of pregnancies, number of hospitalizations, and number of surgeries. The level of measurement for the variables included in a research study dictates the type of statistical analysis that is indicated. Consumers of the medical literature are better positioned to confidently appraise research articles when they have a basic understanding of the connection between levels of measurement and statistical analysis. For example, in a study of a new weight loss drug, the outcome of interest may be average weight loss in the group that took the new drug compared with the group that took the standard of care treatment. The dependent variable, weight loss, is a continuous variable, and the outcome would be expressed as the mean number of pounds. The independent variable is the type of treatment, a nominal variable that splits the participants into two groups. The appropriate statistical test is an independent samples t-test, which evaluates the difference between means in two groups. A more detailed explanation of the appropriate statistical test for various levels of measurement may be found elsewhere.
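
To make the weight loss example concrete, the sketch below computes the t statistic for two independent groups using only the Python standard library. It assumes the pooled-variance (Student) form of the test, which presumes similar variability in the two groups. The weight loss values are invented for illustration, and the function name is hypothetical. A statistics package would normally be used to convert the t statistic and degrees of freedom into a p value.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for two independent samples.

    Uses the pooled-variance (Student) form, which assumes the two groups
    have similar variances.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    # Standard error of the difference between the two group means.
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_error, df


# Hypothetical pounds lost by each participant.
new_drug = [12.0, 9.0, 15.0, 11.0, 13.0]
standard_care = [7.0, 5.0, 9.0, 6.0, 8.0]

t_stat, df = independent_t(new_drug, standard_care)
print(f"Mean difference: {mean(new_drug) - mean(standard_care):.1f} lb")
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

The dependent variable (pounds lost) is continuous and the independent variable (treatment group) is nominal with two categories, which is the combination the text pairs with the independent samples t-test.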

Evidence: Translating the Greek

Most readers of the medical literature are generally familiar with the concepts of p values and confidence intervals (CIs), but type I (α) and type II (β) errors are also an important underpinning to the interpretation of study results and are often less well understood. Clinical trials and observational studies are usually founded on some type of research hypothesis. The research hypothesis may be that New Drug A is more effective than Standard of Care B at preventing a particular outcome. In conducting the study, however, a more specific hypothesis is required. As a result, the researcher tests a null hypothesis (H₀). In this case the null hypothesis is that there is no difference between New Drug A and Standard of Care B. Before beginning the study, the investigators must decide how great a chance they are willing to take of making a type I error. A type I error occurs when the researchers reject a true null hypothesis. By comparison, a type II error occurs when the researchers accept (fail to reject) a false null hypothesis. The degree of risk they are willing to take of making a type I error is generally referred to as “level of significance” or “α.” By convention, α is set at 0.05. This means that the investigators are willing to accept a 5% probability of rejecting the null hypothesis when it is in fact true, that is, of declaring a difference that arose by chance alone. Figure 41.2 depicts the possible outcomes of a research study.

Fig. 41.2, Four possible outcomes for a hypothesis test.
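
The four possible outcomes can also be expressed as a small decision table in code. The sketch below is purely illustrative (the function name and labels are assumptions). In a real study the truth of the null hypothesis is unknown, so this classification is a way of reasoning about errors rather than something that can be computed from study data.

```python
def classify_decision(null_is_true: bool, null_rejected: bool) -> str:
    """Label one cell of the two-by-two table of hypothesis test outcomes."""
    if null_is_true and null_rejected:
        return "type I error (alpha)"       # rejected a true null hypothesis
    if not null_is_true and not null_rejected:
        return "type II error (beta)"       # failed to reject a false null hypothesis
    return "correct decision"


for null_is_true in (True, False):
    for null_rejected in (True, False):
        truth = "true" if null_is_true else "false"
        decision = "rejected" if null_rejected else "not rejected"
        label = classify_decision(null_is_true, null_rejected)
        print(f"Null hypothesis {truth}, {decision}: {label}")
```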

After data collection, the statistical analysis is completed. In the hypothetical study of New Drug A and Standard of Care B, if the outcome of interest is captured by a continuous variable, the t-test is the appropriate statistical test. Along with the t statistic, the analysis will generate a p value. If the value of p is less than the α level that was established a priori, the null hypothesis (that there is no difference in efficacy between New Drug A and Standard of Care B) is rejected, and the difference in results between New Drug A and Standard of Care B is deemed statistically significant. In this case the p value represents the probability of detecting a difference in outcome at least as large as the one observed between New Drug A and Standard of Care B if the null hypothesis were true, that is, if chance alone were at work.

As demonstrated in this example, p values are useful for determining whether an effect is present. CIs, however, include an additional level of information. When thinking about CIs and hypothesis testing, it is important to remember that a research study is performed on a sample of the population about which the research question was asked. The entire population is rarely accessible, and the cost and time required to study the entire population are prohibitive. For this reason, a sample is drawn from the population. Data are then collected and analyzed from the sample with the intent of generalizing the results to the entire population from which the sample was drawn. The outcome measure from the sample is referred to as a point estimate of its value for the population. A practical definition for the 95% CI is that it represents a range of values in which the researcher is 95% confident that the true value for the population occurs. That means that the CI provides a sense of the size of the effect and the precision of the estimate.

In addition, a conclusion can be drawn about the statistical significance of the results by considering the CI. If the null value of the point estimate is within the range of values indicated in the CI, the result is not statistically significant. For example, if New Drug A and Standard of Care B represent blood pressure–lowering agents and the results of the study are expressed as a difference in blood pressure lowering between the two drugs, then the null value (the value that indicates there is no difference between the two therapies) is equal to zero. If the study found that New Drug A lowered blood pressure by an average of 8 mm Hg and Standard of Care B lowered blood pressure by an average of 2 mm Hg, then the point estimate for the mean difference in blood pressure lowering between New Drug A and Standard of Care B is 6 mm Hg. If the 95% CI for the point estimate is 3 to 9 mm Hg, then the result is statistically significant. If the 95% CI is -2 to 14 mm Hg, however, then the result is not statistically significant because the CI contains the null value of zero. Consider the meaning of the CI: the researcher is 95% certain that the true value for the mean difference in blood pressure lowering between New Drug A and Standard of Care B lies between -2 mm Hg and 14 mm Hg. That means that the true difference may be zero, or in other words, that there may be no difference between the blood pressure lowering abilities of New Drug A and Standard of Care B. Readers of the medical literature will find that original research articles may include p values, CIs, or both. Although the CI provides additional information regarding the precision of the point estimate, it is important to note that with regard to hypothesis testing the conclusions provided by a 95% CI and by a p value judged against an α of 0.05 will agree when both come from the same analysis.
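
The rule for reading significance from a CI can be sketched as a short check. The function name is hypothetical, and the numbers are those of the blood pressure example above.

```python
def is_significant(ci_lower, ci_upper, null_value=0.0):
    """Return True when the confidence interval excludes the null value."""
    return not (ci_lower <= null_value <= ci_upper)


# Point estimate: mean difference in blood pressure lowering (mm Hg).
point_estimate = 8 - 2

print(f"Point estimate: {point_estimate} mm Hg")
print("95% CI 3 to 9:  ", is_significant(3, 9))     # interval excludes zero
print("95% CI -2 to 14:", is_significant(-2, 14))   # interval contains zero
```

The null value defaults to zero because the example expresses the result as a difference between two means. For results expressed as a ratio, the null value would instead be 1.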
