The critical evaluation of clinical studies is essential for the practice of evidence-based medicine. This chapter will provide readers with sufficient knowledge of study design and data analysis to understand and to begin to assess the quality of clinical investigations.
The aim of research is to discover the truth. In clinical trials, the truth is usually the true effectiveness (benefit under real-world conditions) or efficacy (benefit under ideal conditions) of a treatment. In searching for the truth in a clinical research study, the usual aim is to discover true causal relationships, for example, to determine whether giving a new treatment causes an improvement in the disease state or causes a particular side effect.
To know if an intervention is truly causal, we need evidence stronger than just association or correlation. A new treatment may be associated with a good disease response, but it may not be causal. This may occur for two reasons. First, association in the absence of causation may occur because of a fluke (random or chance association). Second, it may occur because something else is the true cause of the disease response, and that other factor is also associated with getting the treatment (i.e., confounding bias; see Bias section). For example, if sicker patients are preferentially given a new treatment, whereas less sick patients continue to receive the standard therapy, the new treatment may not be associated with an improved disease state. The reason patients received either the new treatment (in this case, a worse disease state) or the standard treatment (a better disease state) may have more to do with the outcome than the treatment itself. Sicker patients often end up sicker even if the new treatment truly is efficacious. (This situation is an example of confounding by indication; see Bias section.)
How can an investigator determine when a treatment is truly causal and not just associated? Most modern theories of causation appeal to the notion of the counterfactual. A treatment is the cause of a disease response if, had the treatment not been given, the response would not have occurred. Counterfactual reasoning is when we ask questions such as “What would have happened if?” Of course, the answer is unknown because all we can know is what did happen.
In order to determine, as best we can, the true causal relationships between treatment and disease response, or treatment and side effects, we try to mimic the counterfactual. Not every type of research is convincing in terms of its ability to mimic the counterfactual. The idea of levels of evidence relates to the fact that some experiences are rather poor proofs of causation, whereas others are quite rigorous and convincing.
For example, we can appreciate how a case report of a girl with dermatomyositis whose calcinosis improved after being given diltiazem is not a very strong proof of true causation; what would have happened had she not been treated with diltiazem? Calcinosis often regresses over time. Was the improvement just due to time? Was the diltiazem just given at the right time to mimic a causal effect? On the other hand, take the example of a patient who is treated with a new small molecule for a universally fatal genetic autoinflammatory disease, and the patient lives. We might think we know enough about what would have happened in the counterfactual situation (100% fatal) to believe that the treatment is truly causal (i.e., efficacious).
The randomized parallel group clinical trial is a rigorous test of the counterfactual because (1) random assignment to treatment groups prevents a systematic association of the treatment assignment with some other true causation of disease improvement (confounder), (2) blinded outcome assessment prevents investigators from seeing an association (consciously or unconsciously) where one hasn’t occurred, and (3) rigorous sample size calculation minimizes the chances of a fluke association.
Evidence-based medicine has been formulated as “a systematic approach to clinical problem solving which allows the integration of the best available research evidence with clinical expertise and patient values.” The levels of evidence, that is, the ability of research studies to mimic the counterfactual and provide a truthful accounting of causation, are used as a way of determining the best available evidence (Fig. 6.1).
There are many different ways to define and categorize clinical research. One approach is to consider the entire spectrum of medical research, from basic science discoveries in the laboratory to interventions in large health care delivery systems. Different types of research can then be classified as contributing knowledge along this translational pathway from initial discovery to real-world benefits. One proposed system includes these five categories: (1) basic biomedical research, (2) translation to humans (e.g., phase 1 clinical trials), (3) translation to patients (e.g., phase 3 clinical trials), (4) translation to practice (e.g., comparative effectiveness trials and implementation research), and (5) translation to communities (e.g., public policy studies).
Clinical investigations are sometimes categorized into different disciplines, but these may have imprecise and overlapping definitions. Some notable examples include the following:

Health services research studies how social factors, financing systems, organizational structures and processes, health technologies, and personal behaviors affect access to health care, the quality and cost of health care, and, ultimately, individual health and well-being.

Pharmacoepidemiology studies the use and effects of drugs in large groups of people. This may include pharmacovigilance studies, in which clinical data are prospectively collected to study safety, as well as retrospective studies of data collected for other purposes, such as administrative claims billing data.

Implementation science seeks to translate clinical research findings into clinical practice effectively and thereby improve clinical outcomes.
In clinical investigations there has been increasing emphasis on patient-centered research. This results from recognizing the importance of including patients and families; in this conception patients and families are both participants in the assessment of outcomes (i.e., patient-reported outcomes) and major stakeholders in all aspects of clinical investigations. These aspects include developing the study question, designing and conducting the study, analyzing and interpreting the results, and disseminating the findings.
The fundamental differences between clinical investigations and basic science investigations are evident throughout study design, analysis, and interpretation. Basic science investigations are often conducted using remarkably homogenous “subjects,” such as identical genetically inbred strains of mice or cell cultures derived from a single cell. Clinical investigations are conducted using human subjects who are inherently heterogeneous in countless ways including their genetics, environment, and pathophysiology of disease. Even the most carefully conducted large randomized clinical trial will not generate results that are as precise as a well-conducted basic science experiment. This does not lessen the significance of clinical investigations, but it does require a different methodology and greater attention to certain details in order to correctly interpret results.
In clinical investigations, an exposure is any factor whose effects investigators wish to study. This broad term includes factors that would be commonly thought of as exposures (e.g., environmental toxins, infectious agents, medications) and ones that may not fit the layperson’s concept of exposures (e.g., genetic polymorphisms, body mass index). Depending on the exposure, a patient’s exposure status may change over time, and different patients may have varying levels of the same exposure.
Conversely, an outcome is an event whose potential causes the investigators wish to study. Outcomes may be desirable (e.g., cure of disease) or undesirable (e.g., death).
Clinical investigations can be divided into observational or experimental studies. In observational studies, also known as noninterventional studies, there is no manipulation by the investigators of any factor that is to be assessed in the study. Observational studies may be either retrospective or prospective. In experimental studies, also known as interventional studies, investigators intentionally manipulate one or more factors assessed in the study, typically including the main exposure of interest. By definition, all experimental studies are prospective studies. The assignment of exposure is often performed with randomization (e.g., a randomized clinical trial), but experimental studies need not include randomization.
Clinical investigations can also be divided broadly into controlled or uncontrolled studies. In uncontrolled studies, investigators only report results associated with the exposure(s) of interest without reporting corresponding results from a comparator group of patients without the exposure. In controlled studies, investigators report results from a comparator group of unexposed patients that, ideally, closely resembles the patients with the exposure of interest.
Most clinical studies assess and report the association between exposures and outcomes of interest. Especially in the case of observational studies, care should be taken not to confuse association with causation. For example, an investigator could observe a busy pedestrian area for several days and report a very strong association between the number of people carrying umbrellas in the morning (exposure) and the likelihood of rain that afternoon (outcome). It would be unwise to assume that the people carrying umbrellas had caused the subsequent rainfall.
In reality, it can be quite challenging to prove causation. More than 50 years ago, Sir Austin Bradford Hill provided considerations for assessing causation (see Box 6.1). Although satisfying these criteria is neither necessary nor sufficient to establish causation, many of the criteria remain highly relevant today.
Strength of the association: How strong is the association between the factor and the outcome?
Consistency of the association: Does the association between factor and outcome persist from one study to the next, even when study designs and patient samples vary substantially?
Specificity of the association: Is the association limited to specific factors and types of disease, with little association between the factors and other diseases? As the study of causation has advanced, specificity has come to be considered less important than it once was.
Temporal correctness: Did the exposure to the factor occur before the outcome?
Biological gradient: Is there a dose–response relationship between the factor and the outcome?
Biological plausibility: Does the association make sense with what is currently understood about the disease and its pathogenesis?
Coherence: Is the association consistent with laboratory science investigations of the disease?
Experiment: Does the association hold up under experimental conditions?
Analogy: Are there similar factors that are accepted to be the cause of similar diseases?
The first step in conducting a clinical study is to formalize the research question. A research question is a clear, focused, concise query around which an investigation is centered. A clinical investigation tests a hypothesis; the hypothesis being studied is either true or not, and the research aims to discover which is the case. For clinical questions, the research question should address four elements: (1) the population under study, (2) the exposure of interest, (3) the outcome being considered, and (4) an explicit statement about the comparisons being used to test the hypothesis. A carefully considered and well-formulated research question will often help inform many decisions that must be made during the design of a study.
The primary objective or specific aim of a research study is an active statement about how the study will answer the research question and often includes more details about the chosen study design and methodology. Hypotheses are what drive the research question and specific aims. They are declarative statements about the predicted relationship between exposures and outcomes.
The choice of study design is determined by the nature of the study question and the resources available to answer the question.
Descriptive studies (also called nonanalytic studies) are the simplest form of clinical investigations and do not seek to quantify the relationship between exposures and outcomes. Single reports of clinical events (case reports) or collections of such events (case series) are examples of descriptive studies, as are reports of the characteristics (e.g., the frequency of use of certain medications) of a population of patients.
Cross-sectional designs, in which each patient’s characteristics are assessed at a single point in time, are often used for descriptive studies. Cross-sectional designs can be used for analytic studies too, but because the exposures and outcomes are measured at the same point in time, their temporal relationship cannot be assured (i.e., the outcome may have occurred prior to the exposure).
A common study design used to assess an association between exposure and outcome is a case-control study. By design, case-control studies are always observational and retrospective. In this study design, the first step is to identify persons who have experienced the outcome of interest (e.g., developed a disease) and persons who have not experienced the outcome of interest. The next step is to identify the exposure status at some point prior to the outcome ascertainment and compare the frequencies of exposures between the persons with (“cases”) and without (“controls”) the outcome of interest. The result of a case-control study is reported as an odds ratio (OR) (see Table 6.1). Additional information about ORs can be found in the Communicating Risk section of this chapter.
| Risk Factor | Disease Present | Disease Absent |
| --- | --- | --- |
| Positive | A | B |
| Negative | C | D |

The 2 × 2 table may be used to calculate associations between the risk factor and the disease.
| Term | Calculation | Meaning |
| --- | --- | --- |
| Incidence | (A + C) / (A + B + C + D) | Number of new cases among those at risk |
| Attributable risk | [A / (A + B)] − [C / (C + D)] | Incidence among those with the risk factor minus incidence among those without the risk factor (sometimes expressed as a percentage of the incidence rate among those with the risk factor) |
| Relative risk | [A / (A + B)] ÷ [C / (C + D)] | Incidence among those exposed divided by incidence among those not exposed |
| Odds ratio | (A × D) / (B × C) | Approximation to the relative risk used in case-control studies and logistic regression |
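To make the calculations in Table 6.1 concrete, here is a minimal Python sketch; the cell counts are hypothetical numbers chosen only for illustration:

```python
# Hypothetical 2 x 2 table: a = risk factor positive with disease,
# b = positive without disease, c = negative with disease,
# d = negative without disease.
a, b, c, d = 30, 70, 10, 90

incidence = (a + c) / (a + b + c + d)  # new cases among all at risk
risk_exposed = a / (a + b)             # incidence among those with the risk factor
risk_unexposed = c / (c + d)           # incidence among those without it
attributable_risk = risk_exposed - risk_unexposed
relative_risk = risk_exposed / risk_unexposed
odds_ratio = (a * d) / (b * c)         # approximates the relative risk when disease is rare

print(f"Incidence:         {incidence:.2f}")          # 0.20
print(f"Attributable risk: {attributable_risk:.2f}")  # 0.20
print(f"Relative risk:     {relative_risk:.2f}")      # 3.00
print(f"Odds ratio:        {odds_ratio:.2f}")         # 3.86
```

Note how, in this hypothetical table, the odds ratio (3.86) overstates the relative risk (3.00) because the outcome is not rare among the exposed; the two measures converge as the disease becomes rarer.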
The choice of an appropriate control group is crucial for the conduct of a case-control study. Controls are chosen with the intent to adjust for factors other than the exposure of interest that may influence the outcome. There is no gold standard for selecting control subjects. One basic principle is that the control patients should be representative of the underlying population from which the cases were derived (i.e., if the controls had developed disease, then they would have been identified as cases for the study).
Advantages of case-control studies include efficiency, low cost, quick results, and low risk to study subjects. With the advent of modern computers and the ability to perform analyses adjusted for multiple factors, the advantages of case-control studies have diminished in importance. They are still advantageous when studying rare outcomes because the persons with the outcome or disease of interest are identified from the beginning of the study (in contrast to a cohort study design).
There are several disadvantages of case-control studies, too. The temporal relationship between exposure and outcome may be obscured. Historical information about past exposures may be incomplete, inaccurate, or not verifiable. For example, persons who develop a disease may remember prior exposures differently than those who do not develop a disease because they suspect a possible causal association in their own minds (i.e., recall bias).
In cohort studies, a population is defined from which the sample is drawn. Exposure to some factor is established, and subjects are categorized as having been either “exposed” or “not exposed” to a factor thought to contribute risk of some outcome. Each of the two cohorts is assessed for the subsequent development of the outcome. Cohort studies may be either retrospective or prospective. In retrospective cohort studies, the outcome data were collected before the study was designed and typically were not collected specifically to assess the study outcome. Prospective cohort studies most closely resemble an experiment because the study is designed before the data collection, and the study subjects are monitored prospectively to observe whether the outcome develops. Relative risk is the statistic most commonly used in describing results from this study design.
The advantages of a prospective observational cohort study are that a clear temporal relationship between exposure and disease is established, and the study may yield information about the length of induction (incubation) of the disease or outcome. The design facilitates the study of rare exposures and allows direct calculation of disease incidence rates and thus relative rates or risks. Prospective cohorts or registries also are useful in pediatric rheumatology when the aim is to identify risk factors for development of certain complications or outcomes in a group of children who, typically, have the same disease but vary in predictor or risk variables.
The disadvantages of cohort studies include the potential for loss to follow-up or alteration of behavior because of the long follow-up time that may be necessary. Identifying and correctly measuring the exposure can present challenges. Cohort studies are not particularly efficient for the study of rare outcomes. Detailed studies of the mechanisms of the disease typically are impossible in cohort studies.
A patient registry has been defined as the organized collection of uniform observational data to evaluate specific outcomes for a defined population of persons. Registries may be developed to examine the natural history of disease, to analyze the effectiveness and safety of treatments, to measure quality of care, and for other purposes as well. The primary data in registries may be generated by medical encounters (e.g., physician assessments and the results of investigative studies), and these data may be additionally linked to secondary data sources that are collected for other purposes (e.g., outpatient pharmacy billing data). Advantages of prospective observational patient registries include the study of “real-world” patients and conditions—with resultant excellent generalizability—and the ability to examine clinical questions for which a randomized clinical trial is impractical or unethical. The main disadvantage of registries is common to all observational studies, namely, the potential for bias, especially confounding by indication. These concepts are presented in the Bias section of this chapter.
Data collected for nonresearch purposes can be a rich resource for observational studies. One example is administrative claims data. These data sources generally provide detailed patient-level information about physician diagnoses, hospitalizations and other resource utilization, and outpatient medication prescription fills. Advantages of claims data include large available sample sizes and reasonably accurate and complete records, irrespective of where medical services were obtained. The primary disadvantage is the lack of clinical data, such as physician assessments and results of investigative studies.
Electronic health record (EHR) data are increasingly used for clinical investigations. The primary advantage of EHR data is rich clinical information, but, to date, key data are often unstructured, making them difficult to access and manipulate, and data are limited to medical encounters that occur within a particular hospital system.
Several publications provide guidance for the conduct of clinical trials. The Code of Federal Regulations of the U.S. Food and Drug Administration (FDA), in particular Title 21, is the most relevant to clinical researchers in the United States (http://www.ecfr.gov). Regulatory activities for clinical research are described in the Good Clinical Practice (GCP) guidance developed by the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH) (http://www.ich.org). The ICH GCPs represent an international quality standard that regulatory agencies around the world can transpose into regulations for conducting clinical research. The GCPs include guidelines for human rights protection, how clinical trials should be conducted, and the responsibilities and roles of clinical investigators and sponsors.
Clinical drug development programs are often described as consisting of four temporal phases, numbered I through IV by the pharmaceutical industry and by regulatory agencies (Fig. 6.2).
Phase 0 trial is a term sometimes used to refer to the preclinical or theoretical phase of agent development during which the uses of the drug based on animal models and cellular assays are explored.
Phase I studies are human pharmacology trials with a focus on pharmacokinetics and pharmacodynamics. Both are important in determining a drug’s effect. Pharmacokinetics can be defined as the study of the time course of drug absorption, distribution, metabolism, and excretion. Pharmacodynamics is the study of the physiological effects of drugs on the body and the mechanisms of drug action. These initial dose-finding studies provide an early estimation of drug safety and tolerability and are often conducted in healthy volunteers rather than persons with the condition to be treated.
Phase II studies are the earliest attempt to establish efficacy in the intended patient population. The hypotheses may be less well defined than in later studies. These studies may use a variety of designs, including comparisons with baseline status or concurrent controls. In these studies, the eligibility criteria are typically very narrow, leading to a homogeneous population that is carefully monitored for safety. Once it is determined that a drug does have activity, further studies may establish its safety and efficacy in a broader population. In phase II, another aim is to determine, more exactly, the doses and regimens for later studies. An additional important goal is to determine potential study end points, therapeutic regimens (including the use of concurrent medications), and subsets of the disease population (mild vs. severe).
Phase III studies have as their primary objective the confirmation of a therapeutic benefit. This type of trial always has a predefined hypothesis that is tested. These studies also estimate (with substantial precision) the size of the treatment effect attributable to the drug. Typical phase III studies are blinded and randomized. Also incorporated in phase III development are further exploration of the dose–response relationship, study of the drug in a wider population and in different stages of the disease, and the effects of adding other drugs to the agent being investigated. These studies continue to add information to the accumulating safety database.
Phase IV postmarketing surveillance and pharmacovigilance studies accumulate longer-term safety data from large numbers of more diverse subjects followed for extended periods, even after the drug has been discontinued in the patient. These types of studies begin after the drug reaches the market and extend the prior demonstration of the drug’s safety, efficacy, and dosing. The FDA has published a guidance document for designers of surveillance and vigilance studies.
The heterogeneity of the patient population that would be allowed to enroll in the trial is influenced by the phase of development. Early exploratory studies are often concerned with whether a drug has any effect whatsoever. In these trials, one may use a very narrow subgroup of the total patient population for which the agent may eventually be labeled. Later phase confirmatory trials typically relax the eligibility criteria to allow for a broader, more heterogeneous sample of the target population. If the criteria for enrollment are too broad, interpretation of treatment effects becomes difficult.
Regulatory agencies have helped improve the study of drugs in children. In addition, they have sponsored laws that provide incentives for pharmaceutical companies to develop studies of their new drugs in children. In the United States, the Pediatric Research Equity Act (PREA) (enacted in 2003; formerly referred to as the Pediatric Rule) requires manufacturers to assess the safety and effectiveness of a drug or biological product in children if the disease for which the drug was developed in adults also occurs in children. The Best Pharmaceuticals for Children Act (BPCA) (enacted in 2002) provides manufacturers with pediatric exclusivity incentives, provides a process for “off-patent” drug development, and requires that pediatric study results be incorporated into labeling. Similar legislation took effect in the European Union in 2007. To obtain the right to market their medications for use in adults within the European Union, companies are now required to study medicines in pediatric subjects and develop age-appropriate formulations. The Paediatric Committee (PDCO), based at the European Medicines Agency (EMA), is responsible for reaching agreement with companies on the Paediatric Investigation Plan (PIP) (http://www.ema.europa.eu/en/committees/paediatric-committee-pdco). The PIP contains a full proposal of all the studies, and their timing, necessary to support the pediatric use of an individual product.
Phase I and IV studies are usually open label, meaning that everyone involved with the study, including the patient and the physician, knows what the patient is receiving. As one would expect, the possibility of bias in the interpretation of safety and efficacy information is much greater with open studies than with blinded studies. Note that phase II and phase III studies may have an open-label extension phase, during which patients who took part in the comparative phase openly receive the investigational drug for an extended period.
Beginning either in late phase I or early phase II, blinded, controlled (comparative) studies are performed. Blinding refers to the masking of treatment assignment. The purpose is to prevent identification of the treatment until any opportunity for bias has passed. These biases include (but are not limited to) decisions about whether to enroll a patient, allocation of patients to different comparator arms of the study, clinical assessment of end points (outcomes), and approaches to data analysis and interpretation. Designs in which the assessor and the patient are blinded are called double-blind designs. Designs in which only the patient or only the assessor is blinded to the treatment are called single-blind designs. Studies should attempt to maintain blinding until the final patient has completed the study.
Certain studies may present challenges to the maintenance of blinding because blinding is either unethical or impractical. In this situation, a blinded assessor may be used to evaluate the patient’s condition. Another situation in which blinding of the patient is difficult is when the administration regimens of the two drugs being compared differ. In this case the double-dummy design can be a useful way to maintain the blind by administering both one active drug and one dummy drug (placebo) to each patient, thus replicating both administration regimens.
The purpose of randomization is to remove any bias in the allocation of treatment groups. The simplest form of randomization is unrestricted randomization. Patients are assigned to one of two or more treatment groups by a random, sequential list of treatments. Blocked randomization is commonly used to ensure that roughly equal numbers of patients are placed in each treatment group. Blocks may also be balanced by some prognostic factor to ensure equal distribution of the factor among the treatment groups.
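A minimal Python sketch contrasting the two schemes (two treatment arms and a block size of 4 are assumptions made only for illustration):

```python
import random

random.seed(42)  # fixed seed so the allocation lists are reproducible

def unrestricted(n):
    """Unrestricted randomization: each patient is assigned A or B
    independently, so group sizes can drift apart by chance."""
    return [random.choice("AB") for _ in range(n)]

def blocked(n, block_size=4):
    """Blocked randomization: each block contains equal numbers of A and B
    in random order, keeping group sizes nearly equal throughout."""
    assignments = []
    while len(assignments) < n:
        block = list("AB" * (block_size // 2))
        random.shuffle(block)
        assignments.extend(block)
    return assignments[:n]

print(unrestricted(12))  # counts of A and B may be unbalanced
print(blocked(12))       # guaranteed six A's and six B's
```

Balancing blocks within strata defined by a prognostic factor extends the same idea, ensuring the factor is evenly distributed among the treatment groups.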
To the extent possible, clinical trials should assess a clinically meaningful outcome.
Surrogate end points are outcomes that are intended to relate to a clinically important end point but do not in themselves measure a clinical benefit. For example, the measurement of blood pressure reduction is sometimes used as an indicator of reduction in cardiovascular events. Because of their relatively short duration, clinical trials often make use of surrogate end points, but surrogate end points may be used in other study designs as well. There is always the risk that the surrogate end point may not accurately predict the clinically important outcome it is meant to represent.
Composite outcome measures integrate or combine multiple relevant variables into a single variable, often using a predefined algorithm. An example is the American College of Rheumatology (ACR) Pediatric 30 for use in trials of children with juvenile idiopathic arthritis (JIA).
If patients remain in the same group to which they are initially assigned, the study is known as a parallel group design. In crossover designs, patients switch from one treatment to the next, often in a randomized manner, and each patient acts as his or her own control for purposes of analysis. Factorial designs allow for study of the interaction of two treatments that are likely to be used in combination. The simplest factorial design is a 2 × 2 design in which patients are assigned to receive drug A only, drug B only, both drug A and drug B, or neither drug A nor drug B.
A design used to study, among others, the treatment of polyarticular JIA is the blinded withdrawal design. In this approach, all patients receive active medication long enough to establish whether patients respond (according to a standard definition). Patients who are not classified as “responders” after the prescribed time period are discontinued from the study and considered therapeutic failures. Patients who do respond are then randomly assigned either (1) to be withdrawn blindly from active medication and given placebo or (2) to continue to receive active medication but in a blind manner. The primary outcome after randomization can be time to flare or percentage who flare (according to a standard definition). Although the blinded withdrawal design has the definite advantage of diminishing the use of placebo in children, this study design has several important limitations, including not generating relevant estimates of efficacy (newly started medications are not abruptly discontinued in clinical practice) and not providing robust information about medication safety because all study subjects receive the investigational agent.
For trials using an adaptive design, patients are randomized to multiple different active treatments. Patient outcomes are carefully monitored and analyzed during the conduct of the trial, and the randomization scheme is adjusted during the trial to increase the proportion of subjects who receive the treatment that appears to be most favorable. This reduces the number of patients randomized to the less effective treatment, while preserving the statistical power of the study to draw conclusions about the most effective therapy.
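As one concrete illustration of how the randomization scheme might be adjusted, the sketch below uses Thompson sampling with simulated binary outcomes; the true response rates and the choice of update rule are assumptions for illustration, not the method of any specific trial:

```python
import random

random.seed(1)

# Hypothetical true response rates, unknown to the investigators.
TRUE_RATE = {"A": 0.30, "B": 0.55}

# Start each arm with a noninformative Beta(1, 1) prior on its response rate.
successes = {"A": 1, "B": 1}
failures = {"A": 1, "B": 1}
allocated = {"A": 0, "B": 0}

for patient in range(200):
    # Draw one plausible response rate per arm from its current posterior
    # and assign the patient to the arm with the higher draw.
    draws = {arm: random.betavariate(successes[arm], failures[arm])
             for arm in ("A", "B")}
    arm = max(draws, key=draws.get)
    allocated[arm] += 1
    # Observe a simulated outcome and update that arm's posterior.
    if random.random() < TRUE_RATE[arm]:
        successes[arm] += 1
    else:
        failures[arm] += 1

print(allocated)  # allocation drifts toward the better-performing arm B
```

As outcomes accumulate, the allocation probabilities shift so that most later patients receive the apparently superior treatment, which is the intent of the adaptive design.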
Another approach uses an end point–driven protocol. The outcomes are tallied during the conduct of the trial and when sufficient confidence that the treatment is effective (or not effective) is attained, subject enrollment can be halted and the placebo phase of the study can be terminated. The study of canakinumab in systemic JIA (sJIA) used an end point–driven design in which the blinded study period was discontinued after a predetermined number of disease flares had occurred.
The N-of-1 approach repeatedly, and randomly, crosses over individual patients from one therapy to the next. For example, the randomization scheme may be A, B, B, A, A, B (see the sketch below). A current pediatric rheumatology example is the N-of-1 study of rilonacept in patients with familial Mediterranean fever. Data from numerous N-of-1 trials in individuals may be combined to increase the sample size, but this is fraught with difficulties and sources of potential bias. Careful consideration of the carryover effect of the treatment and of natural fluctuations of the disease state unrelated to therapy is needed when planning an N-of-1 trial. The N-of-1 method is most useful in situations where the drug under study has a relatively rapid onset and offset of effect (i.e., has a limited drug carryover effect), and the disease is relatively chronic and stable. Such trials may be poised to emerge as an important part of the methodological armamentarium for comparative effectiveness research and patient-centered outcomes research. By permitting direct estimation of individual treatment effects, N-of-1 trials can facilitate finely graded individualized care, enhance therapeutic precision, improve patient outcomes, and reduce costs.
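A schedule like the one above can be generated as a series of randomized treatment pairs; the following Python sketch assumes pairwise blocks, one common way N-of-1 schedules are constructed:

```python
import random

random.seed(7)

def n_of_1_schedule(n_pairs=3):
    """Build an N-of-1 crossover schedule for a single patient: each pair of
    periods contains one A and one B in random order, e.g., A, B, B, A, A, B."""
    schedule = []
    for _ in range(n_pairs):
        pair = ["A", "B"]
        random.shuffle(pair)   # randomize the order within each pair
        schedule.extend(pair)
    return schedule

print(n_of_1_schedule())  # one possible six-period schedule
```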
A trial design that is generating widespread interest is the pragmatic trial. These studies use randomization of treatment assignment to control for group differences but differ from traditional (explanatory) trials in several ways. Often, all of the treatment arms in a pragmatic trial are forms of accepted active therapy. Depending on the interventions and the study questions, the randomization may take place at the individual patient level or at the physician or clinical site level, known as cluster randomization. After randomization, the study procedures are generally minimal and are consistent with typical clinical practice. In some cases, the ascertainment of outcome may not require direct interaction with the patients (e.g., death). Most importantly, pragmatic trials are intended to closely represent real-world outcomes and attempt to answer questions about the relative effectiveness of existing accepted treatment approaches.
An additional trial design of interest is the Bayesian trial. Most clinical trials in the published literature use a hypothesis testing paradigm in order to establish whether a treatment is successful. They test the null hypothesis (which usually states that the treatment is not effective/efficacious) against some alternative hypothesis (which usually states the treatment is effective to at least some minimal standard; this minimal standard is often called the minimal clinically important difference [MCID]).
How do investigators determine which hypothesis is correct? Over the history of published clinical trials, the most often used method is the so-called frequentist method. The frequentist approach is to generate P values, or confidence intervals (CIs), to determine whether a hypothesis is supported.
What the frequentist approach actually does is determine the likelihood that the results observed in the particular trial in question (or results more extreme) would have been seen if the null hypothesis were actually true and the trial were repeated a countless number of times. This likelihood is often called a P value. If the P value is very small, the likelihood that this particular trial’s results would have occurred under the null hypothesis given countless repetitions of the trial (which is a bit difficult to grasp because the trial is often done only once) is very low. A very low likelihood of occurring is taken as evidence that perhaps the null hypothesis is not true, and the alternative hypothesis is true.
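The notion of repeating the trial countless times can be made concrete by simulation. The Python sketch below uses hypothetical counts (18 of 30 responders on treatment vs. 10 of 30 on placebo) and estimates a two-sided P value by rerunning the trial under the null hypothesis many times:

```python
import random

random.seed(0)

# Hypothetical observed trial: 18/30 responders on treatment, 10/30 on placebo.
n_t, x_t = 30, 18
n_p, x_p = 30, 10
observed_diff = x_t / n_t - x_p / n_p

# Under the null hypothesis both arms share a single response rate;
# estimate it by pooling all patients.
pooled = (x_t + x_p) / (n_t + n_p)

def responders(n, rate):
    """Simulate one arm: count of responders among n patients."""
    return sum(random.random() < rate for _ in range(n))

n_reps = 20_000
extreme = 0
for _ in range(n_reps):
    diff = responders(n_t, pooled) / n_t - responders(n_p, pooled) / n_p
    if abs(diff) >= abs(observed_diff):  # as extreme or more, two-sided
        extreme += 1

print(f"Simulated P value: {extreme / n_reps:.3f}")  # roughly 0.04 here
```

The printed fraction is exactly the frequentist quantity described above: the proportion of hypothetical repetitions of the trial, under the null hypothesis, that would produce a difference at least as extreme as the one observed.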
The frequentist method, therefore, examines the likelihood of obtaining data under a given hypothesis (like the null hypothesis), which is, perhaps, not really what we want to know as health care practitioners.
An older method was proposed by the Rev. Thomas Bayes; his posthumous thesis of inverse probability was read before the Royal Society in 1763. Bayes proposed a method in which our beliefs before an experiment (called prior beliefs and expressed as a probability distribution) are multiplied by the likelihood of seeing the data obtained from the experiment to yield a new set of beliefs, now informed by the results of the experiment (called posterior beliefs and also expressed as a probability distribution). For example, if we are studying the efficacy of a new small molecule in treating polyarticular JIA, we likely already have enough information to have some beliefs about its potential effect before we even start the clinical trial (e.g., we would likely have animal model data, studies in adults with rheumatoid arthritis, and perhaps pilot studies in children); this information can be quantified as our prior beliefs. When we do our clinical trial, we get new data. The clinical trial on its own (see the frequentist method definition) can only be used to describe the likelihood of our observed data under different hypotheses (e.g., the null hypothesis). However, we can combine the prior beliefs and the new data to refine and revise our beliefs. Our revised beliefs are the posterior beliefs.
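A minimal worked example of this prior-times-likelihood update, using assumed numbers: when prior beliefs about a response rate are expressed as a Beta distribution and the trial data are binomial, the posterior is again a Beta distribution whose parameters simply add the observed successes and failures:

```python
# Assumed prior beliefs about the response rate, encoded as Beta(4, 6):
# mean 0.40 with substantial uncertainty (as if from 10 "pseudo-patients").
prior_alpha, prior_beta = 4, 6

# Hypothetical new trial data: 14 responders among 20 patients.
responders, n = 14, 20

# Conjugate Bayesian update: posterior parameters = prior + observed data.
post_alpha = prior_alpha + responders
post_beta = prior_beta + (n - responders)

prior_mean = prior_alpha / (prior_alpha + prior_beta)
post_mean = post_alpha / (post_alpha + post_beta)

print(f"Prior mean response rate:     {prior_mean:.2f}")  # 0.40
print(f"Posterior mean response rate: {post_mean:.2f}")   # 0.60
```

The posterior Beta(18, 12) distribution represents the revised beliefs; the probability of any hypothesis of interest (e.g., that the response rate exceeds 50%) can be read directly from it.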
Unlike the frequentist method, the Bayesian method, therefore, provides the probability of a given hypothesis (e.g., the null hypothesis, or any number of alternative hypotheses) given the data from the trial. This is, usually, exactly what we want to know as health care practitioners.
There are a number of advantages to the Bayesian method. In a Bayesian trial, our interpretation of how new findings alter our previous beliefs is made explicit (as opposed to our implicit valuation of findings from a frequentist trial). In a Bayesian trial, we can take multiple looks at our data and continue accruing subjects until we have a strong sense of the evidence for or against a new treatment; in a frequentist trial, multiple looks at the data require complicated calculations to make sure that we don’t exaggerate the potential for false-positive conclusions. Finally, in a Bayesian trial, we gain some new information, and adjust our beliefs, no matter how small the trial is. This is quite different from a frequentist trial, in which a small sample size often results in low power and may lead to a false-negative result; the only conclusion that can be drawn from a frequentist trial that yields a P value above some arbitrary threshold (usually 0.05) is that there are insufficient data to reject the null hypothesis. Nothing more can be said.
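The dependence of power on sample size can itself be illustrated by simulation. This sketch assumes true response rates of 60% on treatment and 35% on placebo (hypothetical values) and uses a normal-approximation two-proportion test:

```python
import math
import random

random.seed(3)

def two_sided_p(x1, n1, x2, n2):
    """Two-sided P value from a normal-approximation two-proportion z-test."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = abs(x1 / n1 - x2 / n2) / se
    return math.erfc(z / math.sqrt(2))  # two-sided normal tail probability

def power(n_per_arm, p_treat=0.60, p_placebo=0.35, alpha=0.05, reps=5000):
    """Fraction of simulated trials, under the assumed true response rates,
    that reach P < alpha: an estimate of the design's statistical power."""
    hits = 0
    for _ in range(reps):
        x_t = sum(random.random() < p_treat for _ in range(n_per_arm))
        x_p = sum(random.random() < p_placebo for _ in range(n_per_arm))
        if two_sided_p(x_t, n_per_arm, x_p, n_per_arm) < alpha:
            hits += 1
    return hits / reps

for n in (15, 30, 60):
    print(f"n = {n:3d} per arm: power ~ {power(n):.2f}")
```

Under these assumptions, the smallest trials detect the true difference only a minority of the time, which is why a nonsignificant result from a small frequentist trial cannot be read as evidence that the treatment is ineffective.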