Scientific evidence and data are widely regarded as the basis for modern medicine and surgery. However, the concept of conducting experiments to decide whether new treatments are effective and safe is a relatively recent one. The first published clinical trial to use systematic methods was conducted in 1747 by James Lind, a Scottish naval surgeon. In this study, sailors with scurvy were divided into six groups, each receiving a different treatment. Lind identified that consumption of lemons and oranges was effective against scurvy, addressing a major problem at the time. However, for the next 100 years progress was slow, and evidence-based medicine only started to gain traction in the 20th century. Many methods used routinely in research studies today were devised as recently as the 1980s.
Historically, research in surgery consisted mainly of case series and poorly controlled studies by surgeons describing how they performed a particular procedure. Similarly, much surgical evidence was derived from nonrandomised studies. The perceived poor quality of the surgical literature has drawn criticism, with an editor of the Lancet referring to it as a ‘comic opera’. Thankfully, things have improved greatly in recent years.
The ability to interpret evidence and apply this to patient care is a key skill, particularly given the vast quantity of data being generated every day. Surgical research has unique challenges compared with other areas of medicine. For example, there are many surgical devices on the market designed to perform the same task; when do these need testing within a formal trial, and when do they not? Similarly, when two surgeons perform the same procedure using different techniques, how do we compare or standardise techniques when the procedure or overall result is the same?
Importantly, research is becoming ever more accessible to patients. Understanding the current evidence and effectively communicating its benefits and harms is crucial in helping patients decide what treatment they should choose. In this chapter we will discuss the following:
How to formulate a clear research question
How to measure and interpret an outcome
Types of study and levels of evidence
Sources of bias and how these may affect study results
How research and quality improvement are part of good surgical practice
This will be supplemented with examples of important surgical research studies.
An important concept in research is clinical equipoise. This describes the assumption that one particular intervention is not ‘better’ than another. In other words, doctors might be uncertain or even disagree as to which might be the best intervention. For example, equipoise does not exist in the comparison of antibiotics to saline for the treatment of severe pneumonia: it would be considered unethical to give saline instead. In contrast, if we wanted to compare two different antibiotics thought to be equivalent for the treatment of severe pneumonia, equipoise would exist, given the uncertainty around which is best.
The first step in the design of a clinical research study is to formulate a study hypothesis or question where equipoise exists. Understanding the constituent parts of a clinical hypothesis is key to evaluating the relevance and quality of surgical research studies.
A simple, structured approach can be used to formulate clinical questions. This approach considers several important aspects of a clinical research study:
P—Population (those patients with the target condition)
I—Intervention or exposure (the intervention or exposure being studied)
C—Comparison (the control group that the intervention is being compared with)
O—Outcome (the outcome of interest and how it is measured)
S—Study design (how the comparison is being made)
This is known as the PICOS approach. We can consider many clinical questions in this way ( Box 3.1 ).
We can study the use of antibiotics for adults with acute appendicitis. In this situation, antibiotics are given to reduce the complications of appendicitis and to avoid the need for surgical removal of the appendix.
Using a PICOS approach, we could suggest a randomised clinical trial to answer this question:
P: The population is adults with acute appendicitis.
I: The intervention is the use of antibiotics.
C: The comparison group receives usual clinical care (surgical removal of the appendix).
O: The primary outcome is the complication rate.
S: The study design is a randomised controlled trial.
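The structure of a PICOS question can also be captured as a simple data record. The sketch below, in Python, is purely illustrative: the `PICOSQuestion` class and its field names are not from any standard library, and the example values restate the Box 3.1 question.

```python
from dataclasses import dataclass

@dataclass
class PICOSQuestion:
    """One clinical research question broken into its PICOS elements."""
    population: str    # P: patients with the target condition
    intervention: str  # I: intervention or exposure being studied
    comparison: str    # C: control group the intervention is compared with
    outcome: str       # O: outcome of interest and how it is measured
    study_design: str  # S: how the comparison is being made

appendicitis_question = PICOSQuestion(
    population="Adults with acute appendicitis (CT-confirmed)",
    intervention="Antibiotic therapy",
    comparison="Usual care (surgical removal of the appendix)",
    outcome="Complication rate",
    study_design="Randomised controlled trial",
)

print(appendicitis_question)
```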
A study population is a group of participants selected from a general population based on specific characteristics. Having a clearly defined study population is key to ensuring research is clinically relevant and can answer a specific question. If a target condition is used as the basis for selecting a population, validated diagnostic criteria should be applied to identify such a population. For example, a population might be ‘adults with appendicitis’, where the diagnostic criterion is appendicitis diagnosed by a computed tomography (CT) scan.
The basis for the inclusion of particular patients in a study should be systematic to avoid bias. The best way of ensuring a sample is both representative and free from bias is to approach every eligible patient in a consecutive manner. This is often referred to as a consecutive sample.
Specific study populations may have special considerations that must be addressed in the study design. An example of this may be older age, where visual or hearing impairment may present difficulties with particular data collection methods, such as telephone interviews.
The intervention is the main variable changed within the treatment group. In observational research, no direct experimental intervention occurs; thus, the variable is termed the exposure .
When considering an intervention for surgical research, particular care must be given to standardisation between patients and clinicians. Delivering interventions in a standardised manner is essential to ensure patients are comparable. Variation in how a treatment is delivered should be considered during the design process. In surgery, this is particularly important because new surgical procedures have an associated learning curve. In studies of new surgical techniques or approaches, care should be taken to ensure both groups receive a similar level of treatment. Standardising the surgical intervention may include an assessment within the research study to determine whether an acceptable level of competence has been achieved. Methods to do this include completing practical training sessions, performing a minimum number of cases with a new technology and video recording operations. A good study protocol will help address this.
The acceptability and practicality of the intervention must also be given due thought. If an intervention is not acceptable to patients, it will be very difficult to convince them (and a research ethics committee) to participate. One means of ensuring interventions are acceptable is to involve patient representatives when designing research studies. This is known as patient or community engagement in research. These patient representatives are a key part of the study design team and provide feedback to investigators to ensure studies are conducted in a feasible and acceptable manner.
The comparison is the control group that the intervention or exposure of interest is being compared against. This group should be sufficiently similar to the intervention group so as to ensure valid conclusions may be drawn as to the true effects of the intervention. In many studies, the comparison group receives the gold standard of clinical care, allowing a direct comparison of a new treatment with the current best therapy.
In controlled trials, the comparison group often receives a placebo (i.e., a treatment that should have no active biological effect) in order to reduce bias and maintain blinding. In surgery, it is sometimes difficult to deliver a placebo in practice. Techniques such as using large dressing pads to cover wounds (so that patients cannot tell whether they had laparoscopic or open surgery) or even sham operations have been devised to address this. However, the risks of subjecting patients to general anaesthesia, and how far sham surgery should go (i.e., whether incisions are made), should be carefully weighed against the benefits of including a placebo.
If the control group differs from the intervention group, bias may be introduced, leading to invalid conclusions. This is of concern in observational research, such as case-control studies, where the comparison group is directly selected by investigators.
The study outcome is the variable by which the effect of the intervention or exposure of interest is measured. For an outcome to be useful, several properties of the chosen measure should be considered:
Incidence: How commonly the outcome occurs.
The more common the outcome of interest, the smaller the sample size required to demonstrate a difference between treatment groups (a worked sample-size sketch follows this list).
Directness: Is the study outcome actually measuring what it intends to or what is important?
The outcome of interest should directly measure what the intervention is ultimately intended to achieve. For example, warming devices are used during surgery to keep the patient warm, thereby reducing postoperative complications. If a study of such devices measured only the temperature of the air delivered by the device, rather than the patient's temperature or whether the patient developed complications, it would not be directly measuring the outcome it set out to investigate. In this example, a study of the efficacy of intraoperative warming devices should use the postoperative complication rate as its primary outcome, alongside whether the device actually kept the patient warm, rather than the temperature of the air entering the device.
Definition: Measured using the same criteria.
All outcomes should be measured using clearly defined criteria. Preferably, these criteria would be used widely across the field so that all studies are measuring the same outcome in the same way. One example of such criteria would be the Response Evaluation Criteria in Solid Tumours (RECIST) criteria for the assessment of tumour progression.
Relevance: Outcomes should be relevant to the research question and to patient care.
What may be relevant to clinicians may not also be relevant to patients. To this end, patient representatives should be consulted when selecting study outcomes. Often, functional and quality-of-life outcomes are of primary concern to patients.
Timing: Outcomes must be measured at appropriate time points.
These should reflect the time frame in which the outcome of interest would be expected to occur. For example, measuring an outcome at 72 hours after a procedure may be relevant for postoperative bleeding but not for surgical-site infection.
Reliability: The chosen outcome measure should be able to reliably detect an event and should be standardised for all patients.
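The effect of outcome incidence on sample size can be illustrated with the standard approximation for comparing two proportions. This is a minimal sketch in Python; the default z values correspond to a two-sided alpha of 0.05 and 80% power, and the event rates are invented for illustration.

```python
from math import ceil

def n_per_group(p1: float, p2: float, z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Approximate sample size per group to detect a difference between two
    proportions (defaults: two-sided alpha = 0.05, power = 0.80)."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# A common outcome (30% vs 20% complication rate) needs roughly 290 patients
# per group; a rare outcome (3% vs 2%) needs several thousand per group.
print(n_per_group(0.30, 0.20))
print(n_per_group(0.03, 0.02))
```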
Patient-reported outcome measures (PROMs) are outcomes that are reported directly by patients. This contrasts with outcomes where an investigator or clinician reports or decides the outcome. Examples of PROMs include validated questionnaires measuring general health-related quality of life (HR-QoL), such as the EQ-5D.
Cost-effectiveness is sometimes used as an outcome measure. It is often used by clinical governance bodies to decide whether a treatment should be recommended for use and whether it represents value for money.
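One common way of expressing cost-effectiveness is the incremental cost-effectiveness ratio (ICER): the extra cost per additional unit of benefit, often a quality-adjusted life year (QALY). The sketch below uses entirely hypothetical costs and QALY values to show the arithmetic.

```python
def icer(cost_new: float, cost_old: float, qaly_new: float, qaly_old: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra QALY gained."""
    return (cost_new - cost_old) / (qaly_new - qaly_old)

# Hypothetical figures: the new treatment costs 4000 more per patient and adds
# 0.25 QALYs, giving an ICER of 16,000 per QALY gained. Governance bodies
# compare this figure against a willingness-to-pay threshold.
print(icer(cost_new=10_000, cost_old=6_000, qaly_new=1.75, qaly_old=1.50))
```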
There are many study designs available to answer clinical questions. Study design methodology is constantly evolving; however, all types of study can be broadly divided and subdivided into the following categories:
Primary research (research done on individual patients)
Randomised trial
Prospective cohort study
Retrospective cohort study
Cross-sectional study
Case-control study
Case series
Case report
Secondary research (research that considers multiple sources of primary research)
Systematic review
Systematic review with meta-analysis
Fig. 3.1 provides a useful means of identifying the type of study when the article may not be explicit as to its design.
A commonly used way of classifying evidence is according to the Oxford Centre for Evidence-Based Medicine (CEBM), which divides evidence according to the risk of bias ( Fig. 3.2 ). These are commonly referred to as the levels of evidence.
A further way of classifying these studies is using the ‘pyramid of evidence’ ( Fig. 3.3 ). This pyramid takes into account the inherent properties of each study design and classifies each study according to how likely it is to provide a reliable answer to the study question, as close to the ‘true value’ as possible.
This pyramid of evidence is accompanied by several caveats. First of all, not all questions can be answered by a randomised controlled trial (RCT), either because of lack of equipoise (where it would be unethical to conduct a trial) or for logistical reasons (where a trial would be too expensive or simply unfeasible). Secondly, a poorly conducted study may give an inaccurate or unreliable answer compared with a well-conducted study of a different design. For example, a small, poorly conducted randomised trial may not provide a better answer to a question than a large, well-conducted prospective cohort study.
At the top of the pyramid are high-quality systematic reviews and meta-analyses, which bring together multiple sources of evidence for a given clinical question. For high-quality reviews of interventions, these sources should be well-conducted RCTs in well-defined and comparable populations. Studies of diagnostic accuracy (i.e., how well a test performs) should also be drawn from well-conducted RCTs or prospective cohort studies. Systematic searching methods are used to attempt to find every possible study that may contain the answer to the clinical question of interest. These methods often include searching multiple databases, searching the references of key articles in the field and contacting experts to ask if they have suggestions for potential studies to include. Once all potentially eligible studies have been screened, each study is critically appraised individually (see Critical Appraisal). The resulting systematic review should then present an overarching and balanced view of the evidence for a given clinical question.
Meta-analysis describes the use of statistical methods to combine the numeric results of similar studies in order to derive an estimate of how well the intervention of interest works (i.e., the treatment effect). This is often combined with a systematic review, where the results of studies are combined after being screened for inclusion. Combining studies in this manner is referred to as pooling .
An important part of meta-analysis is an assessment of the similarity between different studies of the same clinical question. Variation between studies is termed heterogeneity and takes two forms, clinical and statistical heterogeneity. Clinical heterogeneity concerns how clinically similar a population in one study is to another. It is unlikely to make sense to combine the results of two trials examining the same treatment in two distinct populations, for instance, adults and children. Statistical heterogeneity refers to the differences in the actual results of included studies and whether these are likely to be a result of chance (sampling error) or true differences in outcome.
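As an illustration of pooling and of statistical heterogeneity, the sketch below performs a fixed-effect, inverse-variance meta-analysis of odds ratios and computes Cochran's Q and the I² statistic. The study counts are invented, and real meta-analyses would normally use dedicated software and consider random-effects models when heterogeneity is present.

```python
from math import log, exp, sqrt

# Each study summarised as (events_treatment, n_treatment, events_control, n_control);
# the figures are invented purely to illustrate the arithmetic.
studies = [(12, 100, 20, 100), (8, 80, 15, 80), (30, 250, 45, 250)]

log_ors, weights = [], []
for a, n1, c, n2 in studies:
    b, d = n1 - a, n2 - c
    log_ors.append(log((a * d) / (b * c)))        # log odds ratio for one study
    weights.append(1 / (1/a + 1/b + 1/c + 1/d))   # inverse-variance weight

pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
se = sqrt(1 / sum(weights))

# Cochran's Q and I-squared quantify statistical heterogeneity between studies.
q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
i_squared = 100 * max(0.0, (q - (len(studies) - 1)) / q) if q > 0 else 0.0

print(f"Pooled OR {exp(pooled):.2f} "
      f"(95% CI {exp(pooled - 1.96 * se):.2f} to {exp(pooled + 1.96 * se):.2f}); "
      f"I-squared {i_squared:.0f}%")
```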
Clinical trials are research studies that allocate groups of patients to one or more tests or treatments. They are used to test hypotheses and allow researchers to construct studies where new treatments are tested while controlling for potential sources of bias.
RCTs are primary research studies that test the effect of an intervention using randomisation to create equally balanced treatment groups. After participants are recruited, they are allocated to treatment groups at random, so that the choice of treatment is not determined by the patient, clinician or any other person. Removing this choice of treatment removes the risk of selection bias, which is often the largest source of bias in medical research. Selection bias arises when the treatment a patient receives is chosen by doctors or patients on the basis of patient characteristics, so that the treatment groups differ systematically.
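In practice, the random allocation sequence is usually generated by a computer before the trial begins. The sketch below shows one common approach, permuted-block randomisation, which keeps the two groups balanced in size as patients are recruited; it is an illustrative simplification rather than a description of any particular trial system.

```python
import random

def block_randomise(n_patients: int, block_size: int = 4, seed: int = 2024) -> list[str]:
    """Permuted-block randomisation: within each block of `block_size`
    allocations there are equal numbers of A and B, in a random order."""
    rng = random.Random(seed)          # fixed seed so the sequence is reproducible
    allocations: list[str] = []
    while len(allocations) < n_patients:
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)
        allocations.extend(block)
    return allocations[:n_patients]

print(block_randomise(10))  # e.g. ['B', 'A', 'A', 'B', ...]
```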
The controlled part of the name refers to two integral components of high-quality trials. First, there is a comparable control group with which the intervention of interest is compared. Second, the trial is conducted in strict accordance with a predefined study protocol.
Study protocols are important and are essentially a detailed instruction manual outlining how a piece of research will be conducted. An important advantage of a published study protocol is that it ensures investigators stick to a preplanned analysis of their data, and in the case of multicentre research, it promotes standardisation across centres. Study protocols are not unique to RCTs. It is highly recommended that all clinical research (including systematic reviews and cohort studies) should be conducted according to a predefined protocol. Many journals now stipulate this as a requirement for even considering a study for publication.
To take a new drug or procedure from concept to clinical practice, a number of steps must be undertaken. There are five phases of research trials ( Fig. 3.4 ), as described next.
Phase 0 trials are ‘first-in-man’ studies. Here, very small quantities of promising new compounds are given to healthy humans in a highly controlled environment. These studies aim to approximate the pharmacokinetic and pharmacodynamic properties of a drug (e.g., half-life, distribution, excretion, absorption, metabolism and potential toxicities).
Phase I trials aim to formally assess the safety of a treatment. Here, a range of doses within the postulated therapeutic range is tested (this is called dose ranging ) in a larger sample, and the side effects of these varying doses are assessed. Usually, this sample is of healthy volunteers, but it can be those with the target condition. From these studies, the dose with the most acceptable side-effect profile is selected.
In phase II trials, a new treatment is administered to patients with the target condition to determine safety and efficacy. Because these trials are designed to elucidate whether a new treatment works or not, they are usually tightly controlled and may compare the new treatment to a placebo.
Phase III is the final stage of testing prior to a new treatment being released on the open market. Phase III trials are the largest trials, where a new treatment is given to patients with the target condition. Large sample sizes are used to further identify rare side effects and to accurately estimate the treatment effect of the new intervention. Phase III trials are often pragmatic rather than explanatory and seek to confirm that the new treatment will work in the real world.
Once the new treatment has passed national regulatory approvals, it can be sold to healthcare providers. After approval, all drugs undergo monitoring for rarer side effects and other harms that may have been missed in the initial studies. This process is called postmarketing surveillance. One well-known approach to postmarketing surveillance in the UK is the ‘Yellow-Card Scheme’, where clinicians can report adverse events that are suspected to have been caused by a specific drug.
In addition to the phases of clinical trials that apply to all medical therapies, innovation in surgery carries its own challenges. A drug trial usually does not involve the same complexities around differing techniques of administration or how clinicians learn to use the intervention; surgical innovations face a number of additional complexities. For example, an implantable prosthetic mesh is sometimes used to reinforce abdominal wall weaknesses or to support tissue in fixed positions. However, a study comparing a new mesh reinforcement with suture or no reinforcement would not be of great use before the surgical technique, the safety of the procedure and the safety of the mesh have been established. The IDEAL (Idea, Development, Exploration, Assessment, Long-term Follow-up) framework helps us to develop surgical innovations and to know what needs to be done next to evaluate a new innovation thoroughly.
The IDEAL framework has five stages:
The idea. At this stage, the innovation is new and is a ‘proof of concept’ rather than a full product. The emphasis here is ‘Is this innovation possible?’ The evidence produced at this stage consists of case reports in animal models or in very small numbers of humans. This might be done by only one or two surgeons.
Development of the idea. Here, the innovation and the techniques required to deliver it are refined. The most important consideration at this stage is careful assessment of the innovation's safety. Prospective, nonrandomised clinical trials are often used, with small numbers of patients. This might be done by only a small group of surgeons.
Exploration of the developed innovation. This is the first stage at which the safety of the innovation can be compared with that of other treatments. RCTs are the ideal design to compare the innovation with treatments or procedures for the same disease. The innovation is opened up to a greater number of surgeons.
Assessment. Through completion of the earlier stages, the techniques needed for the innovation are now well established. The first test of efficacy can be made at this stage, with comparison against other treatments already used in clinical practice. Safety continues to be assessed. Again, for treatment comparisons, RCTs are the gold standard for head-to-head comparison with other interventions.
Long-term studies. Short-term and common outcomes can be identified in studies of a few hundred patients, but longer-term studies of many more patients are needed to identify the lasting effects of a new innovation and rare events that are only detectable with large numbers of patients. Studies at this stage often take the form of registries: the innovation has reached clinical practice, but every patient who receives it is recorded in a database.
There are several examples where these stages have been omitted and, as a result, harm has occurred. Notable examples include the use of prosthetic mesh leading to chronic pain, particularly when used for the treatment of pelvic organ prolapse, and the use of unevaluated silicone breast implants by the French company PIP, which led to high rupture rates and harm to patients. Measuring safety and long-term outcomes is essential throughout the innovation process. Even when a very similar innovation already exists (e.g., a breast implant), if there are changes to the innovation or to the way it is manufactured, the IDEAL framework should still be used to demonstrate that it is safe.
RCTs can be further grouped into two categories: explanatory trials and pragmatic trials.
An explanatory trial is designed to explain precisely how an intervention may work; in other words, it is designed to elicit mechanisms. In order to do this, an explanatory trial will recruit a highly homogeneous group of patients in whom to test a hypothesis. The use of a placebo is common because investigators will attempt to control for as many factors as possible.
A pragmatic trial is designed to encompass as much variation in clinical practice as possible. The population of these studies is often highly heterogeneous to make the trial generalisable to the true population the intervention will be used on in clinical practice. Often, instead of placebos, pragmatic trials compare interventions to the current standard of care. Pragmatic trials are usually large in size and conducted across multiple centres. These trials are intended to give an idea as to whether an intervention works in ‘real-life’ healthcare systems rather than a highly controlled environment.
Research studies should ideally capture measures of both effectiveness and safety. A new treatment that is very effective but not safe may do more harm than good. Thalidomide is an example of why safety monitoring is crucial. Thalidomide was found to be effective in relieving morning sickness and became popular. However, poor testing and disregard of emerging reports of teratogenicity led to thousands of children being born with malformations.
Well-conducted clinical trials can potentially identify adverse effects early, before wide uptake of a treatment. However, if an adverse effect is rare, it may only be identified after large numbers of people have taken the treatment.
Data monitoring committees are teams of experts who are often independent of the clinical trial investigation team. The data monitoring committee looks periodically at data from the clinical trial as it progresses. If a treatment within a trial looks as if it could be causing harm, the committee can recommend stopping the trial to avoid further harm to future participants who might be enrolled. Similarly, if a treatment is shown to be much more effective than first thought, the trial can be stopped because it would be unethical to give future participants a less effective control treatment.
LEOPARD-2
This randomised trial compared laparoscopic versus open surgery for the removal of pancreatic cancer (pancreatoduodenectomy). In other types of surgery, laparoscopic surgery has been shown to reduce complications and shorten the time taken to recover from an operation.
Monitoring of the LEOPARD-2 trial while it was ongoing found that 15% of patients who received laparoscopic surgery died within 90 days of surgery, compared with none in the open surgery group. This prompted discontinuation of the trial.
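The kind of interim check a data monitoring committee performs can be sketched as follows. This is a deliberate simplification: it compares event proportions with a simple z-test and applies a Haybittle-Peto-style rule of recommending early stopping only if an interim p-value falls below 0.001. Real committees use prespecified statistical stopping boundaries alongside clinical judgement, and the event counts here are hypothetical.

```python
from math import sqrt, erf

def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

def interim_review(events_a: int, n_a: int, events_b: int, n_b: int,
                   boundary_p: float = 0.001) -> str:
    """Compare event proportions at an interim look and recommend stopping
    only if the difference crosses a strict boundary (p < 0.001 here)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return "recommend stopping" if two_sided_p(z) < boundary_p else "continue recruitment"

# Hypothetical interim data showing a large excess of deaths in one arm.
print(interim_review(events_a=18, n_a=100, events_b=2, n_b=100))
```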
In surgical trials, patients are typically allocated to a single treatment that does not change. This type of study is called a parallel-group study ( Fig. 3.5A ). The treatment group (or groups) is compared with a control group in order to calculate an effect size for the given treatment.
A second type of allocation method involves patients receiving one treatment and then changing to a different treatment or a control treatment. This is called a crossover design. The idea of this type of study is that each patient acts as their own control (see Fig. 3.5B). When the patient crosses over to the other treatment, any change in the clinical outcome can then be attributed to the effects of starting the new treatment or stopping the old one. The crossover process can be repeated several times, until the investigators are certain of the effects of the new treatment. Crossover designs can only be used for outcomes that can be measured repeatedly, usually on a scale, rather than definitive categorical outcomes such as mortality. In surgery, this type of design is rare because a surgical intervention is typically a one-time treatment.
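Because every patient in a crossover trial provides a measurement under each treatment, the analysis is based on within-patient differences rather than a comparison of two independent groups. The sketch below uses invented pain scores to show this paired approach; it omits real-world complications such as period effects and washout.

```python
from math import sqrt
from statistics import mean, stdev

# Invented pain scores (0-10) for the same six patients under each treatment.
on_treatment_a = [6.0, 5.5, 7.0, 4.5, 6.5, 5.0]
on_treatment_b = [4.5, 5.0, 5.5, 4.0, 5.0, 4.5]

# Each patient acts as their own control: analyse the within-patient differences.
differences = [a - b for a, b in zip(on_treatment_a, on_treatment_b)]
mean_diff = mean(differences)
se_diff = stdev(differences) / sqrt(len(differences))

print(f"Mean within-patient difference {mean_diff:.2f} (SE {se_diff:.2f})")
```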
Allocation concealment is often confused with blinding. It describes the principle that those determining which treatment a patient receives should not be able to influence the selection process, either consciously or subconsciously. The classical way of addressing this is to use sealed, opaque envelopes so that the treatment allocation cannot be seen before the patient is entered into the trial.