Outcomes research, or clinical epidemiology, is the study of treatment effectiveness or the success of treatment in the nonrandomized, real-world setting. It allows researchers to gain knowledge from observational data.
Bias and confounding can affect researchers’ interpretation of study data. Accurate assessments of baseline disease status, treatment given, and outcomes of treatment are critical to sound outcomes research.
Many types of studies are available to evaluate treatment effectiveness, including randomized trials, observational studies, case-control studies, case series, and expert opinion. The concept of evidence-based medicine uses the level of evidence presented in these studies to grade diagnostic and treatment recommendations. Meta-analyses can summarize findings across multiple studies and provide important insights into the body of literature.
Outcomes in clinical epidemiology can be difficult to quantify, and thus instruments measuring these outcomes must meet criteria of the Classical Test Theory (reliability, validity, responsiveness, and burden) or the Item Response Theory to be considered psychometrically valid.
Many outcomes instruments have been created to assess health-related quality of life. These scales are generic or disease specific, including assessment of head and neck cancer, otologic disease, rhinologic disease, pediatric disease, voice disorders, sleep disorders, and facial plastic surgery outcomes.
The time when physicians chose treatment based solely on their personal opinions of what was best is past. This era, although chronologically recent, is now conceptually distant. In a health care environment altered by abundant information on the internet and continual oversight by managed care organizations, patients and insurers are now active participants in selecting treatment. Expert opinions are replaced by objective evidence representing multiple stakeholders, and the physician's sense of what is best is being supplemented by patients’ perspectives on outcomes after treatment.
Outcomes research (clinical epidemiology) is the scientific study of treatment effectiveness. The word “effectiveness” is critical because it pertains to the success of treatment in populations found in actual practice in the real world, as opposed to treatment success in the controlled populations of randomized clinical trials in academic settings (“efficacy”). Success of treatment can be measured using survival, costs, and physiologic measures, as well as health-related quality of life (HRQOL).
To gain scientific insight into these types of outcomes in the observational (nonrandomized) setting, outcomes researchers and care providers relying on evidence-based medicine (EBM) need to be fluent in methodologic techniques that are borrowed from a variety of disciplines, including epidemiology, biostatistics, economics, management science, and psychometrics. A full description of the techniques in clinical epidemiology is beyond the scope of this chapter. The goal of this chapter is to provide a primer on the basic concepts in effectiveness research and to provide a sense of the breadth and capacity of outcomes research and clinical epidemiology.
In 1900, Dr. Ernest Codman proposed to study what he termed the “end-results” of therapy at the Massachusetts General Hospital. He asked his fellow surgeons to report the success and failure of each operation and developed a classification scheme by which failures could be further detailed. Over the next two decades, his attempts to introduce systematic study of surgical end-results were scorned by the medical establishment, and his prescient efforts to study surgical outcomes gradually faded.
Over the next 50 years, the medical community accepted the randomized clinical trial (RCT) as the dominant method for evaluating treatment. By the 1960s, the authority of the RCT was rarely questioned. However, a landmark 1973 publication by Wennberg and Gittelsohn spurred a reevaluation of the value of observational (nonrandomized) data. These authors documented significant geographic variation in rates of surgery. Tonsillectomy rates in 13 Vermont regions varied from 13 to 151 per 10,000 persons, even though there was no variation in the prevalence of tonsillitis. Even in cities with similar demographics and similar access to health care (Boston and New Haven), rates of surgical procedures varied tenfold. These findings raised the question of whether the higher rates of surgery represented better care or unnecessary surgery.
Researchers at the Rand Corporation sought to evaluate the appropriateness of surgical procedures. Supplementing relatively sparse data in the literature about treatment effectiveness with expert opinion conferences, these investigators argued that rates of inappropriate surgery were high. However, utilization rates did not correlate with rates of inappropriateness and therefore did not explain all of the variation in surgical rates. To some, this suggested that the practice of medicine was anecdotal and inadequately scientific. In 1988, a seminal editorial by physicians from the Health Care Financing Administration argued that a fundamental change toward the study of treatment effectiveness was necessary. These events subsequently led Congress to establish the Agency for Health Care Policy and Research in 1989 (since renamed the Agency for Healthcare Research and Quality [AHRQ]), which was charged with “systematically studying the relationships between health care and its outcomes.”
In the past decade, outcomes research and the AHRQ have become integral to understanding treatment effectiveness and establishing health policy. Randomized trials cannot be used to answer all clinical questions, and outcomes research techniques can be used to gain considerable insights from observational data (including data from large administrative databases). With current attention on EBM and quality of care, a basic familiarity with outcomes research is more important than ever.
The fundamentals of clinical epidemiology can be understood by thinking about an episode of treatment: a patient presents at baseline with an index condition, receives treatment for that condition, and then experiences a response to treatment. Assessment of baseline state, treatment, and outcomes are all subject to forces that may influence how effective that treatment appears to be. We will begin with a brief review of bias and confounding.
Bias occurs when “compared components are not sufficiently similar.” The compared components may involve any aspect of the study. Selection bias exists when there are systematic differences between the people in the comparison groups. For example, selection bias may occur if, in a comparison of surgical resection with chemoradiation, oncologists avoid giving chemoradiation to patients with kidney or liver failure. The comparison is then biased because the surgical cohort will, on average, accrue more of these ill patients, which may influence survival or complication rates. Selection bias can be addressed by randomization, the random assignment of participants to the treatment groups.

Information bias exists when there are systematic differences in how exposures or outcomes are measured. It includes observer bias, in which data are not collected the same way across comparison groups, and recall bias, in which inaccurate retrospective assessment distorts findings. Observer bias can be reduced by blinded data collection, in which measurements are made without knowledge of group assignment: in single blinding, participants do not know which group they are in; in double blinding, the study staff who collect and interpret data are also unaware of group assignments until the blind is broken at the end of the study. Recall bias can be reduced by prospective data collection, in which measurements are made as participants move forward through time rather than by asking them to recall the past.
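A short simulation can sketch why randomization helps. All patient counts, probabilities, and arm names below are invented for illustration: a biased allocation steers organ-failure patients toward surgery, so the arms start out unequal, whereas random assignment balances them.

```python
import random

random.seed(1)

# Hypothetical cohort: 30% of patients have kidney or liver failure.
# All probabilities here are illustrative assumptions, not real data.
patients = [{"organ_failure": random.random() < 0.3} for _ in range(10_000)]

def assign_biased(p):
    # Selection bias: oncologists avoid chemoradiation in organ failure,
    # so sicker patients accumulate in the surgical arm.
    if p["organ_failure"]:
        return "surgery" if random.random() < 0.8 else "chemoradiation"
    return "surgery" if random.random() < 0.4 else "chemoradiation"

def assign_randomized(p):
    # Randomization: group assignment ignores patient characteristics.
    return random.choice(["surgery", "chemoradiation"])

def organ_failure_rate_by_arm(assign):
    arms = {"surgery": [], "chemoradiation": []}
    for p in patients:
        arms[assign(p)].append(p["organ_failure"])
    return {arm: sum(flags) / len(flags) for arm, flags in arms.items()}

print(organ_failure_rate_by_arm(assign_biased))      # surgical arm is sicker
print(organ_failure_rate_by_arm(assign_randomized))  # arms roughly balanced
```

Under the biased allocation the surgical arm carries a much higher share of organ-failure patients than the chemoradiation arm; under randomization both arms sit near the population rate, so any outcome difference can more plausibly be attributed to the treatment itself.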
Similar to bias, confounding also has the potential to distort the results. However, confounding refers to specific variables. Confounding occurs when a variable thought to cause an outcome is actually not responsible, because of the unseen effects of another variable. Consider the hypothetical (and obviously faulty) case where an investigator postulates that nicotine-stained teeth cause laryngeal cancer. Despite a strong statistical association, this relationship is not causal, because another variable—cigarette smoking—is responsible. Cigarette smoking is confounding because it is associated with both the outcome (laryngeal cancer) and the supposed baseline state (stained teeth).
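A toy simulation can make this concrete. Assuming purely hypothetical rates in which smoking drives both stained teeth and laryngeal cancer, the crude comparison shows a strong association between stained teeth and cancer that vanishes once the data are stratified by smoking status:

```python
import random

random.seed(0)

# Purely hypothetical rates: smoking causes both stained teeth and
# laryngeal cancer; stained teeth have no causal effect of their own.
population = []
for _ in range(100_000):
    smoker = random.random() < 0.3
    stained = random.random() < (0.7 if smoker else 0.05)
    cancer = random.random() < (0.02 if smoker else 0.001)
    population.append((smoker, stained, cancer))

def cancer_rate(people):
    return sum(c for _, _, c in people) / len(people)

# Crude comparison (confounded): stained vs. unstained teeth.
stained_group = [p for p in population if p[1]]
unstained_group = [p for p in population if not p[1]]
print(f"crude: {cancer_rate(stained_group):.4f} "
      f"vs {cancer_rate(unstained_group):.4f}")

# Stratified by the confounder: within smokers, and within nonsmokers,
# cancer rates are essentially identical regardless of stained teeth.
for is_smoker in (True, False):
    stratum = [p for p in population if p[0] == is_smoker]
    s = cancer_rate([p for p in stratum if p[1]])
    u = cancer_rate([p for p in stratum if not p[1]])
    print(f"smoker={is_smoker}: {s:.4f} vs {u:.4f}")
```

Stratification (or regression adjustment) on the confounder is the standard remedy: once smoking status is held fixed, the spurious stained-teeth association disappears.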
Most physicians are aware of the confounding influences of age, gender, ethnicity, and race. However, accurate baseline assessment also means that investigators should carefully define the disease under study, account for disease severity, and consider other important variables such as comorbidity.
It would seem obvious that the first step is to establish diagnostic criteria for the disease under study, yet this step is often incomplete. Inclusion criteria should specify all relevant portions of the history, the physical examination, and laboratory and radiographic data. For example, the definition of chronic sinusitis may vary by pattern of disease (e.g., persistent vs. recurrent acute infections), duration of symptoms (3 months vs. 6 months), and diagnostic criteria for sinusitis (clinical examination vs. ultrasound vs. CT vs. sinus taps and cultures). All of these aspects must be delineated to place studies into proper context.
In addition, advances in diagnostic technology may introduce a bias called stage migration. In cancer treatment, stage migration occurs when a more sensitive technology (such as CT in the past, or PET today) “migrates” patients with previously undetectable metastatic disease out of an early stage (improving the survival of that group) and into a more advanced stage (improving that group's survival as well). The net effect is an improvement in stage-specific survival with no change in overall survival.
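The arithmetic of stage migration can be shown with a toy cohort (all counts are invented for illustration): reclassifying occult-metastasis patients raises survival in both stages while overall survival stays fixed.

```python
# Toy cohort of (stage, survived) pairs; all counts are invented.
# "Occult" patients have metastases visible only on the newer scan.
early_no_occult = [("early", True)] * 70 + [("early", False)] * 10   # 70/80 survive
early_occult    = [("early", True)] * 10 + [("early", False)] * 10   # 10/20 survive
advanced        = [("advanced", True)] * 30 + [("advanced", False)] * 70  # 30/100

def survival(patients):
    return sum(survived for _, survived in patients) / len(patients)

# Before the newer scan: occult-metastasis patients count as early stage.
early_before = early_no_occult + early_occult
advanced_before = advanced
print(survival(early_before), survival(advanced_before))  # 0.8 0.3

# After the newer scan: occult patients migrate to the advanced stage.
early_after = early_no_occult
advanced_after = advanced + early_occult
print(survival(early_after), round(survival(advanced_after), 3))  # 0.875 0.333

# Both stage-specific survivals improved, yet overall survival is unchanged.
print(survival(early_before + advanced_before) ==
      survival(early_after + advanced_after))  # True
```

Early-stage survival rises from 80% to 87.5% and advanced-stage survival from 30% to 33.3%, yet the same 110 of 200 patients survive overall: the apparent improvement is an artifact of reclassification, not of better treatment.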
The severity of disease strongly influences response to treatment. This reality is second nature for oncologists, who use TNM stage to select treatment and interpret survival outcomes. It is intuitively clear that the more severe the disease, the more difficult it will be (on average) to restore function. Interestingly, however, criteria for staging do evolve over time, and therefore it is critical to understand not just stages of severity but also how the stages are defined.
Integration of the concept of disease severity into the study and practice of common otolaryngologic diseases such as sinusitis and hearing loss is also developing. Recent progress has been made in sinusitis. Kennedy identified prognostic factors for successful outcomes in patients with sinusitis and encouraged the development of staging systems. Several staging systems have been proposed, with most relying primarily on radiographic appearance; clinical measures of disease severity (symptoms, findings) are not typically included. Although the Lund-Mackay staging system is reproducible, radiographic staging systems have often correlated poorly with clinical disease. As such, the Zinreich method was created as a modification of the Lund-Mackay system, adding assessment of osteomeatal obstruction. Alternatively, the Harvard staging system has been reproducible and may predict response to treatment. Scoring systems have also been created for specific disorders such as acute fungal rhinosinusitis, as have clinical scoring systems based on endoscopic evaluation. The development and validation of reliable staging systems for other common disorders, and the integration of these systems into patient care, are pressing challenges in otolaryngology.
Comorbidity refers to the presence of concomitant disease unrelated to the “index disease” (the disease under consideration), which may affect the diagnosis, treatment, and prognosis for the patient. Documentation of comorbidity is important because the failure to identify comorbid conditions such as liver failure may result in inaccurately attributing poor outcomes to the index disease or treatment being studied. This baseline variable is most commonly considered in oncology because most models of comorbidity have been developed to predict survival. The Adult Comorbidity Evaluation 27 (ACE-27) is a validated instrument for evaluating comorbidity in cancer patients and when used has shown the prognostic significance of comorbidity in a cancer population. Given its impact on costs, utilization, and QOL, comorbidity should be incorporated in studies of nononcologic diseases as well.
Reliance on case series to report the results of surgical treatment is time honored. Although case series can be informative, they are inadequate for establishing cause-and-effect relationships. A recent evaluation of endoscopic sinus surgery reports revealed that only 4 of 35 studies used a control group. Without a control group, the investigator cannot establish that the observed effects of treatment were directly related to the treatment itself.
It is also crucial to recognize that the scientific rigor of a study varies with the suitability of its control group: the fairer the comparison, the more rigorous the results. Therefore a randomized cohort study, in which subjects are randomly allocated to different treatments, is more likely to be free of biased comparisons than an observational cohort study, in which treatment decisions are made by an individual, a group of individuals, or a health care system. Within observational cohorts there are also different levels of rigor. In a recent evaluation of critical pathways in head and neck cancer, a “positive” finding in comparison with a historical control group (a comparison group assembled in the past) was not significant when compared with a concurrent control group.