Methods of Therapeutic Trials


SUMMARY

Clinical trials for acute and chronic pain can achieve high levels of precision if they adhere to some simple rules. This chapter discusses the various ways in which pain may be studied, how trials may be designed, and how the results are analyzed.

The magnitude of pain intensity or pain relief is generally measured with a numerical rating scale or visual analog scale. Studies indicate that a 30% reduction in pain intensity on a visual analog scale corresponds to a clinically significant reduction in pain. The area under the time–analgesic effect curve for pain relief (total pain relief) is a useful measure to describe the magnitude of a therapeutic effect. A number of statistical methods can be used to examine the results of clinical trials, including P values, odds ratios, and relative risk. However, they can be difficult for the non-specialist to interpret, and the number needed to treat (NNT) provides a clinically useful and intuitive measure of the therapeutic effect size. The NNT is an estimate of the number of patients who would need to be given a treatment for one of them to achieve a desired outcome. The NNT may be used to judge the relative efficacy of treatments. Relative efficacy is consistent whether the comparison is made at 30% pain relief or 50%. The number needed to harm (NNH) can also be calculated in the same way to report the likelihood of side effects. Both the NNT and NNH should specify the patient group, the intervention, the outcome, and the duration of treatment.

Measuring pain alone is not sufficient because function, distress, and adverse effects are important as well. To achieve an accurate picture of the clinical effectiveness of an analgesic intervention or a fair representation of the harm that may be caused, we need to study much larger numbers of patients than we have done in the past. Future trial designs may need to change to take this into account.

Outcomes used in trial reporting are changing. With both acute and chronic pain the most favored outcome is one approximating 50% pain relief. With chronic pain, in particular, this degree of pain relief brings a significant reduction in associated symptoms of fatigue, depression, and poor sleep and comes with substantial improvement in health-related quality of life.

Clinical trials are used to show that our analgesic interventions (be they drugs, injections, operations, psychological or physical maneuvers, or even prayer) are effective and safe. Clinical trials need to produce credible results. To make the results credible, it is vital that trial design, conduct, and analysis minimize bias and maximize validity and that the trials be large enough to avoid the random play of chance. Both the Consolidated Standards of Reporting Trials (CONSORT) guidelines for clinical trial reporting ( ) and the Initiative on Methods, Measurements, and Pain Assessment in Clinical Trials (IMMPACT) statement on chronic pain ( ) summarize many of the factors that underpin that credibility.

The efficacy of analgesic interventions is judged by the change that they bring about in the patient’s report of pain. A brief description of methods of pain measurement is followed by discussions of trial design and pain models.

Pain Measurement for Trials

Pain is a personal experience, which makes it difficult to define and measure. It includes both sensory input and modulation by physiological, psychological, and environmental factors. Not surprisingly, there are no objective measures; there is no way to measure pain directly by sampling blood or urine or by performing neurophysiological tests. Measurement of pain must therefore rely on recording the patient’s own report. The assumption is often made that because the measurement is subjective, it must be of little value. The reality is that if the measurements are done properly, remarkably sensitive and consistent results can be obtained from self-reports. In some contexts, however, it is not possible to measure pain at all, or reports are likely to be unreliable. These contexts include work with patients with impaired consciousness, young children, those with psychiatric pathology or severe anxiety, and patients unwilling to cooperate or unable to understand the measurements. Such problems are deliberately avoided in trials. Most analgesic studies include measurements of pain intensity and/or pain relief, and among the most common tools used are categorical, visual analog, and global scales.

There is a necessary and difficult distinction between measurement in trials and measurement in the clinic. The methods used in trials may work in the clinic, but the problems, which are deliberately minimized in trials, will be present in the clinic. The rigor of the trial will be absent in the clinic. Analyses based on retrospective record review and database abstraction must take this into account.

Pain Scales

Categorical scales ( Fig. 29-1 ) use words to describe the magnitude of the pain. For pain intensity and pain relief, the patient picks the most appropriate word from a number of categories (e.g., none, mild, moderate, and severe; none, slight, moderate, good or lots, and complete). For analysis, numbers are given to the verbal categories (e.g., none, 0; mild, 1; moderate, 2; and severe, 3). The small number of descriptors may force the scorer to choose a particular category when none describes the pain satisfactorily. The main advantages of categorical scales are that they are quick and simple.

Figure 29-1, Categorical and visual analog scales.

Visual analog scales (VASs) (Fig. 29-1) are lines with the left end labeled “no relief of pain” and the right end labeled “complete relief of pain,” and they seem to overcome this limitation. Patients mark the line at the point that corresponds to their pain. Scores are obtained by measuring the distance between the no-relief end and the patient’s mark, usually in millimeters. The main advantages of VASs are that they are simple and quick to score, avoid imprecise descriptive terms, and provide many points from which to choose. Their disadvantage is that more concentration and coordination are needed, which can be difficult postoperatively or in patients with neurological disorders. The results are usually reported as continuous data: mean or median pain relief or intensity. Ideally, studies should also present results as discrete data, such as the number of participants who report a certain level of pain intensity or relief at any given assessment point.

Numerical rating scales (NRSs) and global subjective efficacy ratings are also used. The NRS, also called a Likert or ordinal scale, is analogous to the VAS in that it is generally 100 mm long and has the same anchor points, but the answers are constrained to 7 or 11 possible responses. The NRS has proven validity and sensitivity and has been used widely in pain studies.

Global rating scales are designed to measure overall treatment performance. Patients are asked questions such as “How effective do you think the treatment was?” and answer by using a labeled numerical or a categorical scale. Although these judgments probably include adverse effects, they can be the most sensitive discriminant between treatments. Global scale results can correlate well with results from the other scales ( ) and are easier to administer. One of the oldest scales was the binary question “Is your pain half gone?” Its advantage is that it has a clearer clinical meaning than a 10-mm shift on a VAS. The disadvantage, at least for advocates of small trials with intensive measurement, is that all the potential intermediate information (where in the range 1% to 49%, or how far above 50%, the relief fell) is discarded.

Analgesic requirements (including patient-controlled analgesia), special pediatric scales, and questionnaires (such as the McGill Pain Questionnaire and the pain subscale of the Brief Pain Inventory) are also used. Patient-controlled analgesia in particular is a fraught pain outcome. Individual variation is huge and the distribution is often skewed ( ) such that a large trial group size is necessary to show any difference. If medication consumption based on patient-controlled analgesia is used with a self-report pain scale, any difference between trial groups in patient-controlled analgesia is valid only at similar pain scale values ( ). Special caution is necessary because the results from one or two patients with very high analgesic consumption can easily skew the data. It may be preferable to dichotomize data, with low analgesic requirement being preferred, because this is strongly correlated with good patient-centered outcome ( ).

Pain relief scales are perceived as more convenient than pain intensity scales, probably because patients have the same baseline relief (none) but could start with different baseline intensity. A patient with severe initial pain intensity has more scope to show improvement than one who starts with mild pain. Relief scale results are thus easier to compare across patients. A theoretical drawback of relief scales is that the patient has to remember what the pain was like to begin with. Judgment by the patient rather than by the caregiver is the ideal. Caregivers overestimate the extent of pain relief in comparison to the patient’s version ( ). The evidence we have is that the choice of pain measurement scale—intensity or relief, categorical or VAS—is not crucial for assessing efficacy ( ).

How Much Change Is Worthwhile?

The concept of a minimal clinically important difference is attractive. The problem is that even in the (relatively) simple situation of acute pain, defining and quantifying what it is may be fraught with difficulty ( ). With acute pain, at least 50% of maximum pain relief has become the accepted clinically useful outcome ( ). It has the advantage of producing stable estimates of efficacy while differentiating between analgesics of different efficacy.

With chronic pain, a 30% reduction in pain intensity is now regarded as a moderately important benefit, and a reduction of greater than 50% is a significantly important benefit ( ). Reductions in pain intensity of between 30% and 70% have been shown to produce major benefits in terms of sleep, fatigue, depression, function, work, and/or quality of life in conditions including fibromyalgia (Moore 2010d), painful diabetic neuropathy ( ), and hand osteoarthritis ( ). Another outcome that is likely to find favor is that the pain should be reduced to below about 30 mm on a 100-mm scale or to no worse than mild pain, and it is likely that even more stringent outcomes will become important.

Restricting to Moderate and Severe Initial Pain Intensity

To optimize trial sensitivity, a rule developed in which only patients with moderate or severe pain intensity at baseline would be studied. Those with mild or no pain would not. For those using VASs, we know from individual patient data that if a patient records a baseline VAS pain intensity score in excess of 30 mm, at least moderate pain on a 4-point categorical scale would have been recorded by the patient ( ).

The requirement that only patients with moderate or severe baseline pain intensity should be studied presents particular problems for pre-emptive techniques and local anesthetic blocks. With pre-emptive techniques, there is no pain when the intervention is made. It is the absence of subsequent pain that is the desired outcome. The sensitivity of the subsequent measurements, such as time to further analgesic requirement, is then of supreme importance. The same applies to local anesthetic blocks given during surgery because we cannot be sure that the patient would have had any pain if the block had not been performed. It is known that a proportion of patients (6% after minor orthopedic operations; ) have little or no analgesic requirement after surgery. An example of the problem is intra-articular morphine. Many studies claimed efficacy when patients would have had no pain without intra-articular morphine ( ).

Longer Studies

Most investigators of both chronic and acute pain (after hospital discharge) use patient diaries supplemented by telephone calls. Little empirical information is available to help choose between particular scales and methods of presentation, just examples of particular trials that proved to be sensitive. Over the years, our diaries have become simpler, and an example is shown in Figure 29-2. For chronic long-term use, patients are asked to complete the diary just before bed and note their current pain intensity and their typical pain intensity for the day. In such longer-term studies it is often the weekly average of the daily scale measurement that is used for analysis.

Figure 29-2, The Oxford Pain Chart.
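The weekly averaging of daily diary scores mentioned above can be sketched in a few lines. This is a minimal illustration, not the method of any particular trial: the function name `weekly_means`, the 0 to 10 scores, and the choice to average a partial final week over whatever days are available are all assumptions made for the example.

```python
from statistics import mean


def weekly_means(daily_scores, days_per_week=7):
    """Average daily diary scores week by week.

    daily_scores: one pain intensity score per day, in date order.
    A partial final week is averaged over the days available
    (an assumption; a real protocol would specify how to handle it).
    """
    return [
        mean(daily_scores[start:start + days_per_week])
        for start in range(0, len(daily_scores), days_per_week)
    ]


# Hypothetical two-week diary on a 0-10 scale (invented numbers).
diary = [6, 6, 5, 7, 6, 5, 7, 4, 4, 5, 3, 4, 4, 4]
print(weekly_means(diary))
```

The weekly means, rather than the individual daily scores, would then be carried into the analysis.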

Analysis of Pain Scale Results: Summary Measures

In the research context, pain is usually assessed before the intervention is made and then on multiple occasions. The area under the time–analgesic effect curve for the intensity (sum of pain intensity differences [SPID]) or relief (total pain relief [TOTPAR]) measures is then derived.


$$\mathrm{SPID} = \sum_{t=0}^{6} \mathrm{PID}_t \qquad \mathrm{TOTPAR} = \sum_{t=0}^{6} \mathrm{PR}_t$$

where at the $t$th assessment point ($t = 0, 1, 2, \ldots, n$, with $n = 6$ in the sums above), $P_t$ and $\mathrm{PR}_t$ are pain intensity and pain relief measured at that point, respectively; $P_0$ is pain intensity at $t = 0$; and $\mathrm{PID}_t$ is the pain intensity difference calculated as $(P_0 - P_t)$ (Fig. 29-3). Traditionally, studies have been conducted over a period of 4–6 hours, and hence 4- or 6-hour SPID or TOTPAR has been the standard analysis. Longer studies now often use 8- or 12-hour TOTPAR.
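As a rough illustration of these two summary measures, the sketch below sums pain intensity differences (baseline minus current intensity) and pain relief scores for one patient. It assumes equally spaced hourly assessments so that every score carries the same weight; trials with unequal intervals usually weight each score by the time elapsed, which is not shown here. The function names and the patient data are invented for the example.

```python
def spid(pain_intensity):
    """Sum of pain intensity differences.

    pain_intensity: scores at t = 0, 1, ..., n, with the baseline first.
    Each difference is baseline minus the score at that time point.
    """
    baseline = pain_intensity[0]
    return sum(baseline - score for score in pain_intensity)


def totpar(pain_relief):
    """Total pain relief: the sum of relief scores at t = 0, 1, ..., n."""
    return sum(pain_relief)


# Hypothetical patient, hourly assessments from t = 0 to t = 6.
intensity = [3, 2, 1, 1, 1, 2, 2]  # 4-point categorical scale (0-3)
relief = [0, 2, 3, 3, 3, 2, 1]     # 5-point relief scale (0-4)

print(spid(intensity))   # 9
print(totpar(relief))    # 14
```

Including the baseline point adds nothing to either sum (the difference and the relief are both zero at t = 0), so the result is the same whether or not t = 0 is counted.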

Figure 29-3, Calculating the percentage of the maximum possible pain relief score.

SPID and TOTPAR are now less important than the dichotomous outcome of at least 50% maximum pain relief (maxTOTPAR) determined for individual patients. It is calculated by using the pain relief scale: with a best pain relief score of 4, the maximum TOTPAR over a 6-hour period would be 24, and those with individual TOTPAR scores above 12 would have more than 50% maximum pain relief.
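The dichotomous outcome can be sketched as follows, using the figures in the text: a best relief score of 4 over 6 hours gives a maximum TOTPAR of 24. The function names, the assumption of one relief score per hour after dosing, and the example patients are illustrative only.

```python
def percent_max_totpar(hourly_relief, max_relief=4, hours=6):
    """Express a patient's TOTPAR as a percentage of the maximum possible.

    hourly_relief: relief scores at each post-dose hour (assumed hourly).
    The maximum possible TOTPAR is max_relief * hours (4 * 6 = 24 here).
    """
    return 100.0 * sum(hourly_relief) / (max_relief * hours)


def at_least_half_max_relief(hourly_relief, max_relief=4, hours=6):
    """True if the patient reached at least 50% of maximum TOTPAR."""
    return percent_max_totpar(hourly_relief, max_relief, hours) >= 50.0


# Hypothetical patients (invented scores on a 0-4 relief scale).
print(at_least_half_max_relief([2, 3, 3, 3, 2, 1]))  # TOTPAR 14 of 24 -> True
print(at_least_half_max_relief([1, 1, 2, 2, 1, 1]))  # TOTPAR 8 of 24 -> False
```

Counting the patients for whom this returns true in each trial arm gives the proportions that feed directly into an NNT calculation.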

These summary measures reflect the cumulative response to the intervention. Their disadvantage is that they do not provide information about the onset and peak of the analgesic effect. If onset or peak is important, it is necessary to investigate time to maximum pain relief (or reduction in pain intensity) or time for pain to return to baseline.

It is increasingly becoming apparent that to maximize the yield from trials (which take huge time, effort, and money), a responder analysis should be part of the results. The responder analysis tells us what proportion of patients achieved the responder criterion, with at least 30% and at least 50% pain relief being the responses of interest, although being in a low pain state is also potentially important. For chronic pain, as for acute pain, responses are not Gaussian. Figure 29-4 shows the distribution of reduction in pain intensity over a 12-week period in patients with osteoarthritis ( ).

Figure 29-4, Reduction in pain intensity from baseline over a 12-week period for placebo and 30 mg etoricoxib in randomized trials of osteoarthritis.
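A responder analysis of the kind described above reduces to counting patients who meet a threshold. The sketch below is a minimal version under stated assumptions: each patient has one baseline and one end-of-trial pain intensity score, baseline is above zero (as it would be when entry is restricted to moderate or severe pain), and the five patients are invented. Handling of withdrawals and imputation is deliberately left out.

```python
def percent_reduction(baseline, final):
    """Percentage reduction in pain intensity from baseline."""
    return 100.0 * (baseline - final) / baseline


def responder_proportion(patients, threshold):
    """Proportion of patients with at least `threshold` percent reduction.

    patients: list of (baseline, final) pain intensity pairs.
    """
    responders = sum(
        1 for baseline, final in patients
        if percent_reduction(baseline, final) >= threshold
    )
    return responders / len(patients)


# Hypothetical 100-mm VAS scores: (baseline, end of trial).
patients = [(60, 20), (70, 45), (50, 50), (80, 30), (65, 60)]

print(responder_proportion(patients, 30))  # 0.6
print(responder_proportion(patients, 50))  # 0.4
```

Reporting both thresholds side by side shows how the proportion of responders changes as the criterion becomes more demanding, which a mean change score would hide when the distribution is not Gaussian.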

Outcomes Other Than Pain

Outcomes other than pain are important, not least because improved function at the same level of pain may be missed by an investigator who studies only pain. Mobility, satisfaction, and length of stay are important in the acute context; mobility or disability (physical function), emotional functioning, and satisfaction are important in the chronic context. With chronic pain, an analgesic intervention that improves pain by as little as 10% may be very important to the patient because this small shift in pain allows an important shift in function. Reductions in pain intensity greater than 10% are needed to reliably improve quality-of-life indicators ( ).

For function (disability), researchers often have the choice of using off-the-shelf validated scales developed in other clinical contexts, such as the Western Ontario and McMaster Universities Osteoarthritis Index, or developing their own scale. We have found that the small shifts in function that matter to patients with chronic pain are picked up poorly (if at all) by scales developed for advanced cancer. A fruitful approach may be to determine which outcomes matter to patients, for instance, by using patient focus groups. Given adequate consensus, the output may then be used to fashion a function outcome scale for the trial, with the minimal clinically important difference being predetermined. This will take time to develop and validate.

Output from Trials

A number of statistical methods can be used to examine the results of clinical trials, including P values, odds ratios, relative risk, reduction or increase in relative risk, and so on. All may have their place, but their output is difficult for the non-specialist to interpret. To overcome this, we use the number needed to treat (NNT; ). The NNT, as the name implies, is an estimate of the number of patients who would need to be given a treatment for one of them to achieve a desired outcome. The NNT should specify the patient group, the intervention, and the outcome. Using postoperative pain as an example, the NNT describes the number of patients who have to be treated with an analgesic intervention for one of them to have at least 50% pain relief over a period of 4–6 hours and who would not have had pain relief of that magnitude with placebo. This does not mean that a lesser degree of pain relief will not occur.

For an analgesic trial, the NNT is calculated very simply as


$$\mathrm{NNT} = \frac{1}{\left(\text{Proportion of patients with at least 50\% pain relief with analgesic}\right) - \left(\text{Proportion of patients with at least 50\% pain relief with placebo}\right)}$$

Taking a hypothetical example from a randomized trial,

  • 50 patients were given placebo, and 10 of them had more than 50% pain relief over a 6-hour period, and

  • 50 patients were given ibuprofen, and 27 of them had more than 50% pain relief over a 6-hour period.

The NNT is therefore calculated as


$$\mathrm{NNT} = \frac{1}{(27/50) - (10/50)} = \frac{1}{0.54 - 0.20} = \frac{1}{0.34} = 2.9$$

The best NNT would, of course, be 1, when every patient with treatment benefited but no patient in the control group did. Generally, NNTs between 2 and 5 are indicative of effective analgesic treatment. For acute pain, combination analgesics and high doses of coxibs achieve NNTs of around 1.5. For adverse effects, we can calculate a number needed to harm (NNH) in exactly the same way as an NNT. For an NNH, large numbers are obviously better than small numbers.
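The NNT and NNH arithmetic can be sketched as a pair of small functions. The NNT call reproduces the hypothetical ibuprofen example from the text (27 of 50 versus 10 of 50); the adverse event counts used for the NNH are invented purely to show the call, and the function names are illustrative. Confidence intervals, which a real report would include, are not computed here.

```python
def nnt(events_treated, n_treated, events_control, n_control):
    """Number needed to treat: 1 / (proportion treated - proportion control).

    Raises ValueError when the treated group does no better than control,
    because the reciprocal is then undefined or negative.
    """
    difference = events_treated / n_treated - events_control / n_control
    if difference <= 0:
        raise ValueError("no benefit of treatment over control")
    return 1.0 / difference


def nnh(harms_treated, n_treated, harms_control, n_control):
    """Number needed to harm: same arithmetic, applied to adverse events."""
    return nnt(harms_treated, n_treated, harms_control, n_control)


# Worked example from the text: at least 50% pain relief over 6 hours.
print(round(nnt(27, 50, 10, 50), 1))  # 2.9

# Invented adverse event counts, for illustration only.
print(round(nnh(6, 50, 2, 50), 1))    # 12.5
```

A low NNT paired with a high NNH is the pattern to look for: few patients need to be treated for one to benefit, and many for one to be harmed.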

Study Design and Validity

Pain measurement is one of the oldest and most studied of the subjective measures, and pain scales have been used for more than 40 years. Even in the early days of pain measurement it was understood that the design of studies contributed directly to the validity of the result obtained. Trial designs that lack validity produce information that is at best difficult to use and much of the time will be useless; therefore, the trial will be unethical. Studies designed to show that one treatment is better than another need controls because we need to show that the study had adequate internal sensitivity to detect a difference between treatments. The controls can be active (for instance, two different doses of a standard treatment) or negative (placebo). Many different aspects of trial design contribute to study validity, and new biases are being found based on individual patient data analyses involving large numbers of patients. For example, chronic pain trials may be subject to bias because the trials use easily achieved outcomes, they are short rather than long, or the imputation methods include pain relief scores from persons who have withdrawn from the trial ( ).

Randomized Controlled Trials

A randomized controlled trial (RCT) is the most reliable way to estimate the effect of an intervention. The principle of randomization is simple. Patients in a randomized trial have the same probability of receiving any of the interventions being compared. Randomization minimizes selection bias because it prevents investigators from influencing who undergoes which intervention. Randomization also helps ensure that other factors, such as age or sex distribution, are equivalent for the different treatment groups. Inadequate randomization or inadequate concealment of randomization leads to exaggeration of therapeutic effect ( ). In broad terms, methods of randomization that do not give each patient the same probability of receiving any of the interventions being compared—such as allocation by date of birth, day of the week, or hospital number—are bad, whereas tossing a coin, tables of random numbers, or a computer variant, which do give the same probability to each patient, are good.

An example of the impact of randomization on the conclusions that one draws is the use of transcutaneous electrical nerve stimulation (TENS) for postoperative pain. In a systematic review, 17 reports on 786 patients could be regarded unequivocally as RCTs of acute postoperative pain. Fifteen of these 17 RCTs demonstrated no benefit of TENS over placebo. Nineteen reports had pain outcomes but were not RCTs; in 17 of these 19, TENS was considered by the authors to have had a positive analgesic effect ( ).

Stratification—deliberately making sure that patients with factors known or suspected to influence the outcome are equally distributed (randomized) into the trial groups—may be incorporated (see for discussion). The randomization may be organized in blocks, which is helpful when multiple institutions are involved in a study or when there are multiple observers. Each institution or observer works through a particular block or blocks.
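One way to picture blocked randomization within strata is the sketch below. It is an illustration under assumptions, not a description of any trial's procedure: two arms labeled "active" and "placebo", blocks of four with two patients per arm, one independent list per stratum (for example, per institution), and a seeded generator so the list can be reproduced. Allocation concealment, which matters as much as the list itself, is outside the scope of the code.

```python
import random


def block_schedule(n_blocks, rng, treatments=("active", "placebo"), per_arm=2):
    """Build an allocation list from randomly permuted blocks.

    Every block contains each treatment exactly `per_arm` times, so the
    arms stay balanced after each completed block.
    """
    schedule = []
    for _ in range(n_blocks):
        block = list(treatments) * per_arm
        rng.shuffle(block)  # random order within the block
        schedule.extend(block)
    return schedule


def stratified_schedules(strata, n_blocks, seed):
    """One blocked allocation list per stratum (e.g., per institution)."""
    rng = random.Random(seed)
    return {stratum: block_schedule(n_blocks, rng) for stratum in strata}


# Hypothetical two-site trial with three blocks of four per site.
schedules = stratified_schedules(["site A", "site B"], n_blocks=3, seed=2024)
for site, allocations in schedules.items():
    print(site, allocations)
```

Because each site works through its own blocks, the treatment groups remain balanced within every site even if recruitment is uneven between sites.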

Double Blinding

Double blinding means that neither the investigating team nor the patient knows which of the interventions being tested the patient is actually receiving. Double blinding is relatively easy to organize for drug trials. With non-drug interventions it may be difficult or impossible. Although people have struggled to blind TENS or acupuncture, it is hard to see how twice-a-day versus once-a-day physiotherapy can be blinded. Does it matter? We know that studies that are not blinded overestimate treatment effects by an average of 17% ( ). With a subjective outcome such as pain, the ideal is clearly that the study should be both randomized and double blind. If the intervention cannot be blinded, the study should be randomized and open. The size of the study will in all likelihood have to be larger for the open design than for a double-blind trial. Precisely how much bigger it will need to be depends on the intervention and on the outcome.

An example of a trial in which the principles of randomization and double blinding were breached and an incorrect conclusion reached is the investigation of epidural analgesia as a pre-emptive measure to reduce the incidence of phantom pain ( ). A decade passed before a randomized and double-blind trial showed that the original conclusions were incorrect ( ).
