Critical Assessment of Surgical Outcomes and Health Services Research


The practice of surgery has undergone a dramatic evolution over the last century with the availability of new scientific evidence supporting the use of different surgical techniques and management strategies. Central to this evolution is the process of critically appraising the surgical evidence base at every opportunity and deciding what can and should be applied to routine clinical practice.

Critical appraisal is defined as the process of carefully and systematically examining research evidence to judge its validity, its value, and its relevance to a particular context. Decisions related to surgical care delivery should be made following a careful assessment of individual patient preferences, clinical experience, and critical appraisal of the evidence in the medical literature. However, given the breadth and complexity of information, surgeons often are challenged to critically evaluate the literature and extrapolate the findings from health services and outcomes studies.

This chapter provides a framework to critically appraise surgical outcomes and health services research. This includes methods to assess the strength of evidence for surgical practices, establish the validity of scientific studies in surgery, and apply evidence-based medicine practices to improve the quality of surgical care. The intent is to provide the reader with conceptual and analytic tools that a modern, evidence-based surgeon needs to navigate the surgical outcomes literature and implement practices that are based on sound science.

Evidence-Based Approach to Surgery

Surgeons must be able to understand the evidence-based medicine process in order to identify, access, apply, and integrate new knowledge into their clinical practice and provide high-quality care for their patients. In practice, evidence-based medicine involves the following three fundamental principles. First, the ability to make optimal clinical decisions requires awareness of the best available research evidence. Second, there must be standards by which to judge whether evidence can be trusted. Third, the quality of evidence for benefit must be weighed against the risks, burdens, and costs associated with alternative management strategies, while simultaneously considering individual patients’ predicaments, values, and preferences. This evidence-based approach to clinical practice can also be remembered by the five A’s—ask, acquire, appraise, apply, and assess ( Fig. 8.1 ). These five steps provide a model for practicing evidence-based medicine that all clinicians are encouraged to use when questions arise during the routine care of patients.

Fig. 8.1, The evidence-based approach to clinical practice.

In everyday surgical practice, the evidence-based medicine process often starts when a surgeon learns about a new procedure or technique that could be used to treat a condition managed by their surgical specialty. The next step is to determine whether any research studies provide evidence that support the efficacy and effectiveness of that new procedure. For a study to provide a high level of evidence, it needs to address a significant clinical problem and provide novel results, extend what was previously known, or describe an innovative procedure that represents an improvement over existing technique. The study should include patients who are likely to be offered the intervention in everyday surgical practice. Moreover, it should focus on surgical treatments or strategies that can be replicated in a real-world clinical setting.

Evaluating Research Study Questions

There are several strategies that surgeons can use to review the literature, evaluate research study questions, and determine whether scientific evidence should be applied to their clinical practice. One strategy for evaluating the implications of a research question is outlined by the mnemonic FINER—feasible, interesting, novel, ethical, and relevant. FINER provides a framework that prompts the reader to ask several important questions when considering a research study. Is the study feasible and adequately powered to answer the specific research question? Does the study address a topic that is interesting to the surgical community? Is the research novel or innovative, and does the study meet all ethical standards of research conduct? Finally, do the results from the study change surgical practice or policy, and do they merit further scientific investigation?

Another widely accepted approach to developing and assessing a research question is the PICOT framework—population, intervention, comparison, outcome, and time ( Table 8.1 ). PICOT is a structured way to summarize the main components of a research question and can also be used to appraise a study or interpret its results. For example, a surgeon evaluating a surgical outcomes study would be prompted to ask: (P) Is the study population or clinical problem comparable with surgical patients or issues that I deal with in routine clinical practice; (I) Is the intervention truly novel, and does it represent an improvement over current standards of care; (C) Are study comparators valid and representative of usual care; (O) Are outcome measures valid and reliable; and (T) Does the duration of treatment and follow-up correspond to the duration important to patients? Using the PICOT framework to ask these questions provides a systematic process for determining whether studies are valid and relevant to clinical practice.

Table 8.1
PICOT framework for evaluating research questions.
Acronym (Element): Description
P (Population or problem): Sample of subjects or problems that will be addressed.
I (Intervention): What is being tested in the study? May apply to therapy, prevention, diagnosis, or exposure groups.
C (Comparator): What is the main intervention being compared to?
O (Outcome): The main results that are being examined and assessed.
T (Time frame): Time for data collection and follow-up assessment.
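The PICOT elements in Table 8.1 can be captured in a small structured sketch. This is purely illustrative; the class, field names, and the example question below are invented for demonstration and are not part of any standard library or of the chapter itself.

```python
from dataclasses import dataclass

@dataclass
class PICOTQuestion:
    """Hypothetical container for the five PICOT elements of a research question."""
    population: str    # P: sample of subjects or problem addressed
    intervention: str  # I: what is being tested in the study
    comparator: str    # C: what the intervention is compared to
    outcome: str       # O: the main result being examined
    time_frame: str    # T: duration of data collection and follow-up

    def summary(self) -> str:
        # Render the elements as a single answerable research question
        return (f"In {self.population}, does {self.intervention} "
                f"versus {self.comparator} change {self.outcome} "
                f"over {self.time_frame}?")

# Invented example question for illustration only
q = PICOTQuestion(
    population="adults undergoing elective colectomy",
    intervention="laparoscopic resection",
    comparator="open resection",
    outcome="the 30-day surgical-site infection rate",
    time_frame="30 days after surgery",
)
print(q.summary())
```

Structuring a question this way makes it easy to check that all five elements are explicitly stated before appraising a study against them.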

Study Design

There are many study designs used to undertake surgical research, including both experimental and observational designs ( Fig. 8.2 ). The specific study design used to generate evidence is often influenced by practical considerations, such as resource availability and feasibility. Nonetheless, the study design selected to evaluate surgical interventions will largely determine our levels of confidence about cause and effect.

Fig. 8.2, Types of clinical study designs.

Several criteria have been proposed to establish epidemiological evidence of causal relationships, including the set of nine criteria originally published by Sir Bradford Hill in 1965. Distilled down to the most simplistic terms, causal relationships exist when (1) the cause precedes the effect, (2) the cause is related to the effect, and (3) we can find no other plausible explanation for the effect other than the proposed cause. Experimental studies that are prospective and randomized usually provide the greatest assurance that these criteria are met and are associated with the highest levels of evidence. But even prospective randomized studies can lead to erroneous conclusions if they are not executed or analyzed properly. Evaluating the quality of clinical evidence requires a close look at the strengths and limitations of each type of study design ( Table 8.2 ).

Table 8.2
Strengths and weaknesses of common study designs used in clinical research.
Randomized controlled trial: allocation of subjects to experimental or control group by chance
Strengths:
  • Ability to establish causal effects between exposure and outcomes
  • Can study more than one intervention
Weaknesses:
  • Expensive
  • Can take a long time
  • Not suitable for rare events
  • Often low generalizability due to selection criteria

Cohort study: a cohort of subjects is compared based on differences in exposure
Strengths:
  • High generalizability
  • Can study rare exposures and multiple outcomes
  • Some potential to establish causal effects
Weaknesses:
  • Can take a long time (prospective)
  • Selection bias
  • Can be expensive

Case-control study: cases are compared with controls with respect to exposure
Strengths:
  • Can study rare outcomes and multiple exposures
  • Relatively inexpensive
  • Hypothesis generating
Weaknesses:
  • Selection bias
  • Recall bias
  • Limited potential to establish causal effects
  • Can only study one outcome

Cross-sectional study: exposure and outcomes measured at the same point in time
Strengths:
  • Useful for describing disease prevalence
  • Fast and inexpensive
  • Hypothesis generating
Weaknesses:
  • Sampling bias
  • Survival bias
  • Very limited potential to establish causal effects

Case series and report: detailed description of one or more subjects (i.e., cases) without a control group
Strengths:
  • Very detailed
  • Inexpensive
  • Hypothesis generating
Weaknesses:
  • Selection bias
  • Not generalizable
  • No ability to establish causal effects

Randomized Controlled Trials

When appropriately designed and conducted, randomized controlled trials (RCTs) are considered the gold standard for evaluating most types of health care interventions. In this study design, patients are randomly allocated to either a treatment or a control group before receiving the intervention, with an element of chance determining the assignment, similar to tossing a coin. After randomization, patients are followed in exactly the same way; the only difference is the treatment group to which they were allocated.
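The coin-toss allocation described above can be sketched in a few lines. This is a minimal illustration of one common scheme, permuted-block randomization, which keeps the two arms balanced as enrollment proceeds; the function name and parameters are invented for illustration, and real trials use validated randomization systems.

```python
import random

def block_randomize(n_patients, block_size=4, seed=None):
    """Assign patients to 'treatment' or 'control' in balanced permuted blocks.

    Within each block of block_size patients, exactly half are assigned to
    each arm, so the group sizes never drift far apart during enrollment.
    """
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_patients:
        # Build one balanced block and shuffle its order by chance
        block = ["treatment"] * (block_size // 2) + ["control"] * (block_size // 2)
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n_patients]

arms = block_randomize(20, seed=42)
print(arms.count("treatment"), arms.count("control"))  # 10 10
```

Because every block is internally balanced, an interim analysis at any block boundary compares groups of equal size, which simple coin tossing cannot guarantee.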

The RCT study design has the greatest potential for determining causation between exposure and outcome given that randomization balances both known and unknown prognostic factors (i.e., measured and unmeasured confounders) between comparison groups during the assignment of treatment. As such, the effect of the treatment on outcomes relative to the control group can be determined while other variables are kept constant and selection bias is minimized. Furthermore, randomization can facilitate blinding (also known as masking) of the identity of treatments from investigators, participants, and assessors.

RCTs can be used to compare nearly all types of interventions used in surgical patients. This includes surgical procedures, medications used before or after surgery, screening programs, or diagnostic modalities, to name a few. But while RCTs help us determine the presence and strength of a causal relationship between different surgical interventions and a given outcome, they also have several weaknesses. First, they are very expensive to complete and it can take many years to obtain results when large sample sizes are needed. Second, the results may not be generalizable to patients in everyday practice due to narrow enrollment criteria. Third, many surgical interventions are simply not amenable to randomization. Finally, RCTs can still yield biased results if the study is not well designed or executed properly and lacks methodological rigor.

To assess whether RCT findings are valid, surgeons need complete and transparent information on the methodology and results. This need for adequate RCT reporting fueled the development of the original Consolidated Standards of Reporting Trials (CONSORT) statement in 1996 and its subsequent revision in 2010. The most recent CONSORT statement includes a 25-item checklist and flow diagram, which provides guidance for reporting all types of RCTs but focuses on the most common design, the two-group parallel trial. This checklist provides an easy way for surgeons to evaluate the quality of evidence derived from these types of studies.

Cross-Sectional Study

A cross-sectional study involves looking at research subjects who differ on one key characteristic at a specific point in time. Exposure and outcome data are collected at the same time point among subjects who are similar in most other characteristics but different in a key factor of interest. For example, this could involve patients undergoing a specific type of surgery who come from different age groups, income levels, or geographic locations.

Cross-sectional studies are commonly used to describe disease prevalence or characteristics that exist in a community or hospital setting. This study design is inexpensive and is often used to make inferences about possible relationships or to gather preliminary data to support further research and experimentation (i.e., hypothesis generating). However, these studies are limited by survival bias and cannot analyze associations over a longitudinal period of time or be used to determine cause-and-effect relationships between exposure and outcome variables. Furthermore, there may be a sampling bias if the timing of the cross-sectional study leads to a sample that is not representative of patients in the general population.
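The prevalence estimate that a cross-sectional study produces is simply the proportion of the sampled population with the condition at the survey time point. A minimal sketch, with invented numbers for illustration:

```python
def point_prevalence(cases, population):
    """Proportion of a population with the condition at one point in time."""
    return cases / population

# Invented example: 45 of 900 surveyed surgical inpatients have a
# surgical-site infection on the day of the survey
print(f"{point_prevalence(45, 900):.1%}")  # 5.0%
```

Note that this single-time-point proportion says nothing about incidence or temporal ordering, which is why cause-and-effect conclusions cannot be drawn from it.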

Case-Control Study

The case-control design uses a different sampling strategy in which the investigators identify a group of individuals who exhibit a specific outcome (i.e., the cases) and then compare this group to a set of individuals who do not exhibit the outcome of interest (i.e., the controls). The cases and controls are then compared with respect to the frequency of one or more past exposures. If the cases have substantially higher odds of exposure to a particular factor compared to the control subjects, it suggests an association but does not necessarily provide evidence of causal inference.
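The comparison of exposure odds described above is conventionally computed from a 2x2 table. The sketch below shows the standard cross-product calculation; the counts are invented for illustration.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from a 2x2 case-control table.

    With a = exposed cases, c = unexposed cases,
    b = exposed controls, d = unexposed controls:
    OR = (a/c) / (b/d) = (a*d) / (b*c)
    """
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)

# Invented example: 40 of 100 cases and 20 of 100 controls were exposed
result = odds_ratio(40, 60, 20, 80)
print(round(result, 2))  # 2.67
```

An odds ratio above 1 suggests the exposure is associated with case status, but, as the text notes, association alone does not establish causation.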

There are numerous strengths of case-control studies. This study design allows investigators to examine risk factors associated with outcomes that are uncommon, to examine the association of multiple risk factors with outcomes simultaneously, and to examine outcomes that occur a long period of time after the exposure occurs. They are also relatively inexpensive to complete and can help generate hypotheses. On balance, however, case-control studies are limited by their retrospective design, which allows the potential for recall and selection bias. Moreover, identifying valid control groups to compare to cases can be difficult, estimating the frequency of exposure in the population at large is often inaccurate, and the design allows evaluation of only one outcome at a time.

Cohort Study

A cohort study is an observational study design in which groups of subjects are identified based on their exposure to a particular risk factor and then compared to a group that has not been exposed to that same factor. This study design can be conducted from either a forward-looking (i.e., prospective) or backward-looking (i.e., retrospective) viewpoint. Prospective cohort studies are planned in advance and carried out over a period of time to assess outcome incidence among exposure groups. In comparison, retrospective cohort studies look at data that already exist and attempt to identify risk factors for outcomes that have already occurred. Retrospective analyses of existing surgical databases, in particular, constitute one of the most common applications of this study design. For both types of cohort studies, a higher incidence of outcomes in the exposed group suggests an association between that factor and the outcome. However, because prospective studies collect information about exposures and outcomes purposefully and systematically, they provide stronger evidence for causation.
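The incidence comparison at the heart of a cohort study is typically expressed as a risk ratio (relative risk). A minimal sketch, again with invented counts:

```python
def risk_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Relative risk from a cohort: incidence in the exposed group
    divided by incidence in the unexposed group."""
    incidence_exposed = events_exposed / n_exposed
    incidence_unexposed = events_unexposed / n_unexposed
    return incidence_exposed / incidence_unexposed

# Invented cohort: 30 of 200 exposed vs. 10 of 200 unexposed
# subjects develop the outcome during follow-up
rr = risk_ratio(30, 200, 10, 200)
print(round(rr, 2))  # 3.0
```

Unlike the odds ratio of a case-control study, the risk ratio can be computed directly here because cohort studies observe the actual incidence of the outcome in each exposure group.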

There are several strengths and limitations of cohort studies. The advantages of cohort studies are that they are easier and less expensive to conduct than RCTs, the incidence (or rate) of exposure and outcomes can be estimated, and subjects in cohorts can be matched to limit the influence of confounding variables. In addition, enrollment criteria and outcome measures can be standardized, and—unlike case-control studies—multiple simultaneous outcome assessment is possible. However, cohort studies also have several disadvantages. Because there is no randomization, one cannot account for unmeasured imbalances in patient characteristics. Blinding or masking is difficult (or impossible retrospectively), and outcomes of interest can take a long time to occur, requiring many years of information about exposures. For retrospective studies in particular, treatment selection bias and confounding variables may lead to unmeasured differences in exposure groups over time that cannot be controlled for with statistical analysis. Moreover, interpretations can be limited because of missing data that researchers cannot go back in time to collect.

In order to assess the evidence quality derived from different types of cohort studies, several simplified checklist methods have been developed. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement is one of these checklists that provides an organized system for assessing observational study methodology and results that is similar to the CONSORT system used for RCTs. STROBE provides guidance on how to report critical components of research studies from all types of observational study designs, including cohort, case-control, and cross-sectional study designs.

As part of a series of articles on research methods published in a special issue of JAMA Surgery in 2018, another checklist was proposed for evaluating the quality of evidence derived from databases commonly used in surgical outcomes research. This checklist consists of 10 items for review that can be used when evaluating the findings of retrospective cohort studies, or considering questions that might be reasonably addressed using these databases. In the same issue of JAMA Surgery, in-depth information was provided for 13 of the most popular surgical databases, including the National Inpatient Sample; Surveillance, Epidemiology, and End Results (SEER) Program; Medicare Claims; National Surgical Quality Improvement Program (NSQIP); Society of Vascular Surgery Vascular Quality Initiative (SVS-VQI); and the Society of Thoracic Surgeons (STS) National Database.

Case Series and Reports

Historically, the surgical literature has relied upon case series and case reports to describe surgical interventions and patient outcomes. Until the past couple of decades, these were the most highly cited type of study design in the surgical literature. Observations are made on a series of individuals, usually all receiving the same surgical intervention, but with no control group. The sampling of a case series is based on either exposure or outcomes, but not both. For example, a series could include all patients who underwent surgery at a single institution over a decade, or a case report might describe patients who experienced some rare adverse event after surgery.

The strength of case series and reports lies in their ability to provide in-depth narrative information on a given topic when other study designs cannot be carried out. This in turn may help to generate new hypotheses. However, the main limitation of this study design is the inability to generalize findings or to establish a cause-and-effect relationship between exposures and outcomes. Case series and reports cannot be comparative, and neither absolute risk nor relative effect measures for an outcome can be calculated. As such, studies with these designs are nearly always considered low-quality evidence.
