Modern cardiovascular medicine prides itself on being evidence based. Virtually all the therapeutic advances that have informed the treatment of patients with cardiovascular disease have resulted from the findings of randomized clinical trials. Randomized trials are generally considered to provide the highest level of evidence, and this principle is reflected in the approach that all major guidelines use to support the strength of therapeutic recommendations. While clinical trials are used for evidence generation in virtually all disciplines, there are few fields in which clinical trials have been so impactful as in cardiovascular medicine.
This chapter reviews the basic principles of clinical trials in cardiovascular medicine, describes the approach to designing and executing them, and introduces the interpretation of clinical trial results to inform clinical practice.
Observation has always been the key to the generation of medical evidence. For centuries, astute physicians have observed patients’ responses to various remedies and occasionally made insightful inferences about the benefit of new treatments. Observational studies (see also Chapter 5) can assess the natural history of disease, demonstrate relationships between risk factors and outcomes, and generate hypotheses for more definitive experimental testing. Yet observational studies are almost always biased and limited when it comes to assessing the merits of new therapies. The single greatest inherent problem with attempting to infer the effects of therapies from observational studies is termed “confounding by indication” and refers to biases, known or unknown, that influence which therapies are used for which patients and which conditions. These biases can be overcome to some extent by taking account of, or adjusting for, all the other factors that might have influenced the decision to use that medication and the outcomes in those patients. Although several novel statistical methods have been developed to attenuate indication bias in observational studies, adjustment is rarely able to overcome all the potential biases because not all such factors can be known or accounted for. Indeed, the presumed benefits of many therapies that were initially adopted on the basis of observational data, such as hormone replacement therapy in postmenopausal women to reduce cardiovascular risk, have been refuted by subsequent randomized trials.
In contrast to observational studies, randomized clinical trials are prospective human experiments in which an intervention (which could be a pharmacologic or device therapy or an interventional strategy) is compared with a control and in which randomization is used to eliminate the potential biases related to administration of a therapy (Fig. 4.1). In a large enough study, randomization ensures that patients in the experimental group and the control group are, on average, similar in every respect except the randomly allocated therapy. While single-arm studies are sometimes referred to as trials, this chapter limits its discussion to multiple-arm studies in which treatment allocation is randomized.
Developmental programs for drugs and devices are categorized in phases (Table 4.1). Phase I studies assess safety and tolerability in the first human experience of a novel therapy, typically in healthy volunteers. These studies can be open label and even single arm, and they collect information that can help identify a maximally tolerated dose (dose escalation studies).
Phase | Features | Purpose |
---|---|---|
I | First administration of new treatment | Safety and biologic plausibility |
II | Early trial in patients with the disease to be studied | Efficacy—dose finding, adverse events, pathophysiologic insights |
III | “Pivotal” trial large enough to test safety and efficacy | Designed to allow for regulatory approval |
IV | Mechanistic, additional safety | Elucidate mechanisms, assess safety in novel populations, postmarketing surveillance |
Phase II studies are designed to confirm the biologic activity of the experimental therapy in patients with the disease of interest and, in some cases, to determine the likely optimal dose for both efficacy and tolerability. The results of these studies are typically used to determine whether to proceed to a pivotal, or phase III, trial, which is used for regulatory assessment. Safety and tolerability are also assessed along with other secondary and exploratory measures of efficacy that might inform further development. Phase II trials often use surrogate endpoints rather than clinical endpoints (see later).
Phase III, or pivotal, studies are designed to provide enough information on efficacy and safety for regulatory evaluation and, it is hoped, approval. Pivotal trials require assessment of “approvable” endpoints, that is, endpoints that have been previously agreed upon by regulatory authorities such as the U.S. Food and Drug Administration (FDA) and the European Medicines Agency (EMA). Approval of a new antihypertensive agent may require only demonstration of blood pressure lowering, and approval of a cholesterol-lowering medication may require only demonstration of serum cholesterol lowering. In contrast, other indications, such as treatment of heart failure, may require demonstration of benefit for clinical outcomes, such as reducing death, hospitalizations for heart failure, or myocardial infarction. Phase III trials are sometimes performed for the primary purpose of determining the safety of a therapy. For example, concern regarding the cardiovascular safety of previously approved diabetes therapies prompted the FDA in 2008 to issue guidance requiring all diabetes registration programs to assess cardiovascular safety by assessing and adjudicating adverse cardiovascular events such as cardiovascular mortality, myocardial infarction, and stroke. What is needed for registration is usually negotiated with regulatory authorities prior to initiation of a clinical trial, and guidance from health authorities on this point has evolved over time. Recently, the FDA has indicated willingness to consider functional endpoints, such as 6-minute walk distance or patient-reported outcomes (PROs), for initial approval of a heart failure therapy.
Phase IV trials, sometimes referred to as post-marketing trials, are designed to add mechanistic or other support for an indication, to extend a previous indication to a new population, or to meet a regulatory requirement such as providing additional safety information, perhaps in a specific patient population. The EVALUATE trial, for example, was a phase IV trial examining the effect of sacubitril/valsartan compared with enalapril on aortic stiffness and ventricular remodeling to provide mechanistic support for the findings in PARADIGM-HF, a positive phase III outcomes trial.
Randomized clinical trials and clinical trial reports should include all the following components: rationale, inclusion and exclusion criteria, study design, study execution, study endpoints, and analytic approach. These components are typically codified in the study protocol, which serves as the principal documentation of the study background, objectives, design, organization, and execution, and which includes a preliminary outline of the analysis plan.
Because all clinical trials are human experiments, they need to be justifiable to investigators, institutional review boards or ethics committees, and participants; a well-thought-out, clinically relevant, scientifically and ethically valid rationale is the essential first step in a clinical trial design. In short, the question should be one for which the answer is not known and for which the result would either directly inform clinical care or would provide crucial information that would inform the continued development of a particular therapy. The scientific rationale for conducting a trial can be in the form of basic research that supports a particular pathway or mechanism that may be affected by the therapy, preclinical data involving animal experiments in which a therapy was tested in a manner similar to a human trial, or early clinical or “pilot” studies that may provide some evidence that a therapy might be efficacious. Because virtually all interventional therapies can be associated with risk, the use of a particular therapy must at least have the potential to be beneficial in a particular disease state, although early-phase trials do not necessarily need to demonstrate that benefit to be successful.
Clinical trial designs vary, and each has distinct advantages and disadvantages. The most commonly used is a parallel group design (Fig. 4.2A) in which patients are randomized to two or more groups and endpoints are compared between groups. These trials can be placebo controlled or active controlled and can have multiple arms (e.g., a placebo, active comparator, and study drug, or multiple doses of a study drug). In this design, patients are randomized to receive one of these therapies for the duration of the trial. This type of design can be used for either clinical outcomes trials or phase II trials in which the primary endpoint is a surrogate (e.g., cholesterol or a natriuretic peptide).
In contrast, in crossover trials (Fig. 4.2B), patients receive one therapy for a period of time and then are “crossed over” to receive placebo or another therapy. In this design, individual patients act as their own control, and these designs are typically used for phase II studies in which the endpoint is a measured surrogate such as a biomarker. The advantage of crossover trials is that fewer patients are needed because within-patient comparison reduces variability. The disadvantage is that effects of a therapy from the first period can carry over and contaminate the second period. This issue is typically mitigated with a washout period, a time between therapies during which the effect of the first therapy would be expected to wear off. Crossover designs are not suitable for long-acting therapies or for outcomes trials (where a clinical outcome, such as a death or hospitalization, might influence whether the patient would enter the second period).
Factorial design (Fig. 4.2C) trials are essentially parallel group studies in which there are two randomizations within the same patient population, so that an individual patient is randomized to treatment A versus B and also to C versus D, leaving four distinct treatment groups (A+C, A+D, B+C, B+D). In a factorial design trial, each randomization is essentially treated as its own trial. Factorial trials work best when the therapies are distinct enough that there will be no “interaction” between them. Assuming there is no or minimal interaction between therapies, factorial designs can be executed with only a modest increase in the sample size required for a single intervention. If interaction between the two therapies is suspected, sample sizes need to be increased to allow for formal interaction testing. Examples of factorial design trials include the ISIS-2 trial, which randomized patients to streptokinase or placebo and additionally randomized the same patients to aspirin or placebo, and the DREAM trial, which compared the effects of ramipril versus placebo and of rosiglitazone versus placebo on the incidence of diabetes.
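The allocation logic of a two-by-two factorial design can be sketched in a few lines. This is an illustrative sketch only: the function name `factorial_allocate`, the sample size of 1000, and the seed are assumptions, and simple (unblocked) randomization is used for brevity.

```python
import random
from collections import Counter


def factorial_allocate(n, seed=None):
    """Randomize each of n participants twice, independently:
    once to A versus B and once to C versus D."""
    rng = random.Random(seed)
    return [(rng.choice("AB"), rng.choice("CD")) for _ in range(n)]


# Hypothetical trial of 1000 participants.
allocation = factorial_allocate(1000, seed=7)

# Each participant falls into one of four cells: A+C, A+D, B+C, B+D.
cells = Counter(allocation)

# Every participant contributes to both comparisons, which is why a
# second question can be asked with little increase in sample size.
first_factor = Counter(a for a, _ in allocation)    # A versus B
second_factor = Counter(c for _, c in allocation)   # C versus D
```

Because the A-versus-B comparison pools across C and D (and vice versa), each comparison uses the full sample, provided the two therapies do not interact.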
Superiority trials (Fig. 4.3) test whether therapy A is superior to therapy B, which can be either an active comparator or placebo. Superiority trials aim to reject the null hypothesis that there is no difference between the therapies (see statistical considerations later). In contrast, noninferiority trials are designed to determine whether one therapy is noninferior to (loosely translated, not worse than) another therapy. In a noninferiority trial, rejecting the null hypothesis requires showing that therapy A is not worse than therapy B by more than a prespecified noninferiority margin; in practice, the upper bound of the 95% confidence interval for the treatment effect must fall within that margin. Noninferiority trials are typically used when it is necessary to show only that a novel therapy is “as good as” an established therapy, which may be clinically important if the novel therapy has a better side effect profile, is less expensive, or is easier to administer. Trials can be designed to test for both noninferiority and superiority, and a particular therapy can be noninferior even if not superior (see Fig. 4.3). The VALIANT trial compared the angiotensin receptor blocker (ARB) valsartan to the angiotensin-converting enzyme (ACE) inhibitor captopril (and the combination of the two) in post–myocardial infarction (MI) patients; while valsartan was not superior to the ACE inhibitor, it was noninferior, leading to an indication in post-MI patients.
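The decision rule for superiority and noninferiority can be written as a comparison of a confidence interval against two thresholds. The sketch below assumes the treatment effect is a hazard ratio (new therapy versus comparator, values below 1 favoring the new therapy) estimated on the log scale with a known standard error; the function name, the margin of 1.15, and all input numbers are hypothetical and not taken from any trial.

```python
import math


def classify_trial(log_hr, se, ni_margin, z=1.96):
    """Return (lower, upper, verdict) for a two-sided 95% confidence
    interval around a hazard ratio, judged against a noninferiority
    margin expressed as a hazard ratio greater than 1."""
    if ni_margin <= 1.0:
        raise ValueError("ni_margin must be greater than 1 on the hazard ratio scale")
    lower = math.exp(log_hr - z * se)
    upper = math.exp(log_hr + z * se)
    if upper < 1.0:
        verdict = "superior"                    # whole interval favors new therapy
    elif upper < ni_margin:
        verdict = "noninferior, not superior"   # interval crosses 1 but stays inside margin
    else:
        verdict = "noninferiority not shown"    # interval reaches or crosses the margin
    return lower, upper, verdict


# Hypothetical results with a hypothetical margin of 1.15.
print(classify_trial(math.log(0.80), 0.05, 1.15)[2])
print(classify_trial(math.log(1.00), 0.05, 1.15)[2])
print(classify_trial(math.log(1.10), 0.05, 1.15)[2])
```

The middle case shows how a therapy with a point estimate of no difference can still meet noninferiority when the interval is narrow enough to stay inside the margin.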
Randomization in a clinical trial can be as simple as a coin toss (or the electronic equivalent) or considerably more complex, and there are a variety of approaches for randomly allocating treatment in clinical trials. Although randomization should lead to balanced groups in large trials, in smaller trials randomizing by a simple coin-toss (or random number) method can lead to imbalances at any time during the trial. For example, in a 100-patient trial, there would be a roughly 5% risk of 60% or more of participants being allocated to one therapy. The commonly used blocked, or permuted block, randomization scheme mitigates this risk by assigning an equal number of participants to each randomized group within each block of x participants, so that the imbalance at any given time can never exceed a fraction of the block size (half the block when there are two equally sized groups).
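Both the imbalance risk of simple randomization and the effect of blocking can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the block size of 4 are illustrative assumptions.

```python
import math
import random


def prob_imbalance(n, threshold):
    """Probability that simple 1:1 randomization of n participants puts
    at least `threshold` of them in the same arm (either arm)."""
    one_arm = sum(math.comb(n, k) for k in range(threshold, n + 1)) / 2 ** n
    # Doubling the one-sided tail is valid when threshold > n / 2,
    # because the two tails then cannot overlap.
    return 2 * one_arm


def permuted_block_sequence(n, block_size=4, arms=("A", "B"), seed=None):
    """Allocation list built from shuffled blocks, each containing every
    arm an equal number of times."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n]


def max_running_imbalance(sequence, arm="A"):
    """Largest absolute difference between the two arm counts at any
    point during enrollment (two-arm case)."""
    diff, worst = 0, 0
    for assigned in sequence:
        diff += 1 if assigned == arm else -1
        worst = max(worst, abs(diff))
    return worst


risk = prob_imbalance(100, 60)
blocked = permuted_block_sequence(100, block_size=4, seed=1)
```

Here `prob_imbalance(100, 60)` comes to roughly 0.057, consistent with the approximately 5% figure quoted above, while the blocked sequence with two arms and blocks of 4 can never be more than 2 participants out of balance at any point.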
While randomization in theory should lead to balanced groups in which characteristics known to be important in the disease being studied are balanced, in reality baseline imbalances are common in even relatively large trials and can influence results. For variables for which balance is especially desired, there are a variety of methods to stratify either at the randomization stage or the analysis stage. In stratified randomization, a participant is placed into a stratum (e.g., men or women) and then randomized within that stratum (ensuring balanced randomization within the stratum). This approach is more complicated than stratifying at the analysis stage, in which patients are compared within each stratum, an approach that is equally effective when trials are large.
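Stratified randomization amounts to keeping a separate allocation list for each stratum. The sketch below combines it with permuted blocks, which is one common way to do it but is an assumption here; the class name, the stratum labels, and the block size are hypothetical.

```python
import random
from collections import defaultdict


class StratifiedRandomizer:
    """Keeps an independent permuted-block allocation list per stratum."""

    def __init__(self, block_size=4, arms=("A", "B"), seed=None):
        if block_size % len(arms) != 0:
            raise ValueError("block_size must be a multiple of the number of arms")
        self.block_size = block_size
        self.arms = arms
        self.rng = random.Random(seed)
        # stratum label -> assignments still unused in the current block
        self.pending = defaultdict(list)

    def assign(self, stratum):
        """Return the next treatment assignment for a participant in `stratum`."""
        if not self.pending[stratum]:
            block = list(self.arms) * (self.block_size // len(self.arms))
            self.rng.shuffle(block)
            self.pending[stratum] = block
        return self.pending[stratum].pop()


randomizer = StratifiedRandomizer(block_size=4, seed=0)
women = [randomizer.assign("women") for _ in range(8)]
men = [randomizer.assign("men") for _ in range(4)]
```

After each completed block, the two arms are exactly balanced within every stratum, regardless of how many men or women happen to enroll.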
Cluster randomization is a design in which groups of individuals, rather than actual individuals, are randomized. For example, trials testing specific strategies might randomly allocate clinics to one approach or another, as was done in the HOOPS trial, which randomly allocated 174 practices to pharmacist intervention or usual care to optimize use of guideline-directed therapy in patients with left ventricular dysfunction. This avoids the risk of investigators applying the new strategy under investigation to “control” patients in the same clinic or practice because all patients in each practice or clinic receive one or other strategy. Cluster randomized trials require a slightly larger sample size than noncluster randomized trials.
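How much larger the sample must be depends on the cluster size and on how similar participants within a cluster are to one another. A commonly used approximation, which is not stated in the text above and is introduced here as an assumption, is the design effect 1 + (m − 1) × ICC, where m is the average cluster size and ICC is the intracluster correlation coefficient. All numbers below are hypothetical.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation from randomizing clusters of equal size
    rather than individuals."""
    return 1 + (cluster_size - 1) * icc


def clustered_sample_size(n_individual, cluster_size, icc):
    """Participants needed under cluster randomization, given the number
    needed under individual randomization."""
    inflated = n_individual * design_effect(cluster_size, icc)
    # Round first to avoid floating-point noise pushing ceil up by one.
    return math.ceil(round(inflated, 9))


# Hypothetical: 400 participants needed if randomized individually,
# clusters of 21 patients, intracluster correlation of 0.05.
needed = clustered_sample_size(400, 21, 0.05)
```

In this hypothetical case the design effect is 2, so the requirement doubles to 800 participants; with small clusters or a very low ICC the inflation is correspondingly smaller.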
Randomized trials can be blinded or unblinded, and blinded trials can be single-blind, double-blind, or triple-blind. Blinding is designed primarily to avoid the bias that can arise when participants or investigators know which therapy a patient is receiving. In an open-label, or unblinded, trial, the participants and investigators know which therapy is given. The bias associated with this design can be mitigated using a blinded endpoint approach, often called a prospective randomized open blinded endpoint (PROBE) design, in which the assessment is performed by individuals who are not aware of treatment assignment. An example of a PROBE design trial is one assessing cholesterol lowering in which the laboratory making the cholesterol measurements is not aware of the treatment assignment and therefore cannot be biased by this knowledge. It would, in contrast, be nearly impossible to eliminate bias for a patient-reported outcome in an unblinded clinical trial. Unblinded trials are less expensive and simpler to execute than blinded trials and have become an important approach to pragmatic trial design (see later).
A single-blind study is one in which the investigator, but not the participant, is aware of the study assignment, and it is also simpler to execute than fully blinded trials. If the investigator is involved, however, in collection of data and decisions about the care of the patient that might be influenced by his or her knowledge of the treatment assignment, the integrity of the trial could be compromised.
A double-blind study is one in which neither the participants nor the investigators are aware to which therapy a participant is assigned. Double-blind studies are considered the “gold standard” of clinical trial designs. Nevertheless, blinding can be difficult in practice, especially when investigators or patients may get “clues” about which therapy they have been assigned to (e.g., the taste of an experimental compound has unblinded participants to their therapy, and specific laboratory abnormalities, such as elevation in serum potassium, have the potential to unblind investigators). A triple-blind (i.e., triple-masking) study is a randomized experiment in which the treatment or intervention is unknown to (1) the research participant, (2) the individual(s) who administer the treatment or intervention, and (3) the individual(s) who assess the outcomes.
Blinding is typically accomplished by matching an experimental therapy to placebo. There are a variety of approaches used to ensure that experimental therapies are matched to placebo, including using dyes to ensure similarity in appearance of medication, overencapsulation, or various ingredients to mask taste. In device trials, there are several ways to accomplish blinding, although this is often impossible. While sham procedures can be performed, they are often impractical. Devices can be implanted but not turned on or can be programmed differently. It can even be challenging to blind the endpoints, because various diagnostic procedures (x-rays, ECGs) can unblind clinicians, investigators, and even endpoint adjudicators.
Properly defining the patient population is key to a successful clinical trial. Inclusion and exclusion criteria need to be tailored to ensure that the patients enrolled in the trial have the disease being studied and are likely to benefit from the therapy being tested if that therapy has actual benefit. For example, in a lipid-lowering therapy trial, for which the primary endpoint was degree of cholesterol lowering, patients would be required to have elevated levels of cholesterol at baseline. In a study of a similar therapy in which the primary endpoint was reduction in major adverse cardiovascular events (MACE, see later), patients enrolled need to be at risk for those events (e.g., assessment of MACEs in a primary prevention population of young adults might be impractical because of the very low event rate in that population). Often, enrichment criteria are used to ensure patients have sufficient risk—for example, in a heart failure outcomes trial, it is common to include a requirement for elevation in natriuretic peptides to ensure that patients have a high enough event rate. Exclusion criteria are based on ensuring patient safety; typical exclusions might include patients who are pregnant or may become pregnant during the course of the trial (pregnancy tests are often mandatory) if a therapy may be harmful to pregnancy or the fetus. Other exclusions might be specific to the therapy being tested. For example, a specific upper limit of serum potassium might be set when testing a drug that elevates serum potassium, such as a mineralocorticoid receptor antagonist (MRA) or renin-angiotensin system (RAS) inhibitor; alternatively, a lower blood pressure limit is typically used in heart failure trials testing drugs that tend to lower blood pressure, but this threshold may be much lower when testing an inotropic agent. 
Of note, specific inclusion and exclusion criteria limit the generalizability of a trial's findings, a common criticism of clinical trials, and often result in labels, guidelines, or payment decisions that reflect the specifics of those criteria.
Clinical trials are generally designed to evaluate both efficacy and safety. The metrics by which efficacy is assessed depend on the disease being studied, the mechanism of action of the therapy, and where the trial fits in the development lifecycle of the therapy. Measures of efficacy in cardiovascular medicine are numerous and include biomarkers that can be measured in the blood, such as cholesterol levels or natriuretic peptides; physical examination measures, such as blood pressure or heart rate; and clinical outcomes, such as hospitalization (all cause or cause specific) or death (all cause or cause specific). Evaluation of efficacy requires a predetermined analysis plan with prespecified statistical approaches to determining whether a therapeutic benefit is met (see later). Measured endpoints, such as blood pressure, can be compared directly between treatment groups and may be measured at several time points during the course of a trial, although typically at baseline and at least once during follow-up. Usual analyses would include between-group comparisons adjusted for baseline level, although there are a variety of statistical techniques to handle multiple measures. These types of evaluation are particularly sensitive to subject drop-out leading to missing data, which can occur for a number of reasons, including subject death. For example, assessing the effect of a drug on ejection fraction over time can be problematic in a heart failure trial with a high death rate, because many patients will have missing data. The problem is greatest when drop-out is differential, such that patients in one arm drop out at a greater rate, as might happen if the therapy resulted in fewer deaths and the patients remaining alive in the placebo group were those whose ejection fraction was least likely to worsen. Such a scenario might lead to underestimation of a true treatment effect.
Surrogate endpoints are measured endpoints that are thought to be directionally related to clinical outcomes and are often used in phase II trials. Good surrogate endpoints can usually be measured earlier than clinical outcomes, are indicative of disease progression, and are directionally related to the clinical outcome (changes in the surrogate endpoint correlate with clinical outcomes). Natriuretic peptides, for example, are often used as a surrogate in heart failure trials, and reduction in natriuretic peptides has been shown to correlate with improvement in clinical outcomes. Although implanted devices have long recorded data that could be used as endpoints in trials (e.g., arrhythmia endpoints), novel endpoints from data acquired from wearable devices or smart phones are being used with greater frequency.
Clinical outcomes, such as death or hospitalization for heart failure, are typically counted and expressed as a proportion (i.e., percentage of patients dying over the course of the trial in each arm) or a rate (i.e., number of deaths per 100 patient-years). While clinical outcomes can be expressed as the proportion of patients who have an event at a certain time point (e.g., 30 days post randomization), this approach is best reserved for studies with relatively short-term outcomes. For longer outcomes trials, the time to event is usually incorporated by comparing the time from randomization to the event between treatment groups, thus accounting for the difference between a patient who died on the 30th day of a trial and a patient who died on the 300th day of a trial.
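The difference between a proportion and a rate is easiest to see with a small worked example. The data below are hypothetical: five patients in one arm, each recorded as years of follow-up (ending at death or at the end of observation) and whether death occurred. The function name is illustrative.

```python
def summarize_events(follow_up):
    """follow_up: list of (years_observed, had_event) pairs, one per patient.
    Returns (proportion with an event, events per 100 patient-years)."""
    events = sum(1 for _, had_event in follow_up if had_event)
    patient_years = sum(years for years, _ in follow_up)
    proportion = events / len(follow_up)
    rate_per_100_py = 100 * events / patient_years
    return proportion, rate_per_100_py


# Hypothetical arm: deaths at about day 30 (0.08 years) and day 300
# (0.82 years); three patients alive at the end of their follow-up.
arm = [(0.08, True), (0.82, True), (2.0, False), (2.0, False), (1.1, False)]
proportion, rate = summarize_events(arm)
```

Here the proportion is 2 of 5 patients (40%), whereas the rate is about 33 deaths per 100 patient-years because the denominator is the 6 patient-years actually observed. The two deaths count equally in the proportion but contribute different amounts of follow-up time to the rate, which is the information a time-to-event analysis exploits more fully.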
Clinical outcomes can be grouped into composites in which an “event” is said to occur if any of the several components of the composite occur, and the time to that event is based on the first occurrence of one of the component events. The designation MACE, or three-point MACE, is typically used to describe a composite of cardiovascular death, MI, or stroke. Similarly, a typical composite in heart failure trials is the combination of cardiovascular death or heart failure hospitalization (or, more recently, cardiovascular death, heart failure hospitalization, or urgent heart failure visit). Including a fatal and a nonfatal component in a composite addresses the issue of competing risk. Patients who die in a trial are clearly not at risk for a subsequent nonfatal event; thus, assessing only nonfatal events in trials where fatal events are likely can artificially deflate the risk of the nonfatal event in the group with a higher mortality rate, because this will likely deplete higher risk individuals. The number of composite events will not simply be the sum of all the component events for a given patient because only the first event that occurs is counted. For example, a composite of cardiovascular death or heart failure hospitalization would not count a death if it occurred after a heart failure hospitalization. Similarly, a second heart failure hospitalization would not be counted in that composite either (see later for alternative approaches that incorporate multiple events).
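The first-event rule for a composite endpoint can be illustrated with a single hypothetical patient. The function name, event labels, and event days below are assumptions made for illustration.

```python
def first_composite_event(events, components):
    """events: list of (day, event_type) pairs for one patient.
    Returns the earliest event whose type is part of the composite,
    or None if the patient had no qualifying event."""
    qualifying = [(day, kind) for day, kind in events if kind in components]
    return min(qualifying, key=lambda event: event[0], default=None)


HF_COMPOSITE = {"cv_death", "hf_hospitalization"}

# Hypothetical patient: two heart failure hospitalizations, then
# cardiovascular death.
patient = [
    (120, "hf_hospitalization"),
    (200, "hf_hospitalization"),
    (310, "cv_death"),
]
result = first_composite_event(patient, HF_COMPOSITE)
```

This patient had three component events but contributes one composite event, the hospitalization on day 120; the second hospitalization and the later death do not enter a time-to-first-event analysis of the composite.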
Patient-reported outcomes have become particularly important in cardiovascular trials because they provide meaningful insight into how specific therapies affect how patients feel and their quality of life. PROs are assessed through instruments (or questionnaires) that have been previously validated, although the type and extent of validation can vary. Examples of instruments typically used in cardiovascular medicine include the Kansas City Cardiomyopathy Questionnaire (KCCQ) and the Minnesota Living with Heart Failure (MLHF) instrument, both commonly used in heart failure studies, and the European Quality of Life Group-5D (EQ-5D) instrument, typically used for health economic assessment. Recently, the FDA has indicated that PROs may be considered approvable endpoints for certain conditions.