Appropriate Use Criteria in Spine Surgery


Summary of Key Points

  • Appropriate use criteria define the patients for whom a given medical or surgical procedure is appropriate, that is, those in whom the benefits exceed the risks by a margin wide enough to make the procedure worth doing.

  • The RAND Corporation/University of California Los Angeles Appropriateness Method uses extensive literature review and an expert panel to classify indications as appropriate, equivocal, or inappropriate.

  • Appropriateness criteria have been developed for procedures such as cervical fusion and lumbar laminectomy, as well as surgery for degenerative lumbar scoliosis, degenerative lumbar spondylolisthesis, vertebral fragility fractures, and persistent pain following spinal surgery.

  • There is still a need to develop further appropriateness criteria in spinal surgery to improve evidence-based clinical decision-making.

  • Surgeon utilization of appropriate use criteria may have future implications in terms of physician reimbursements from Medicare.

Efficient and fair healthcare systems depend on the delivery of appropriate care. Appropriate care means that the health benefits exceed the health risks by a sufficiently wide margin that the procedure is worth doing, exclusive of cost. Well-documented variation in the rates of surgical procedures in the United States is not fully explained by disease incidence or patient preferences, indicating that surgical procedures are underused in certain regions and inappropriately overused in others. Underuse occurs when a patient with a necessary indication does not receive the procedure, whereas overuse occurs when a patient undergoes the procedure for an inappropriate indication. To increase appropriate care and reduce inappropriate care, there must be a clear way to define appropriate care.

Clinical practice guidelines (CPGs) have been developed to serve as tools for healthcare providers in clinical decision-making. The definition of CPGs has evolved over time. The Institute of Medicine previously defined CPGs as “systematically developed statements to assist practitioner and patient decisions about appropriate healthcare for specific clinical circumstances.” The definition has since been updated to “statements that include the recommendations intended to optimize patient care that are informed by a systematic review of evidence and an assessment of the benefits and harms of alternative care options.” Appropriate use criteria (AUC) are clinical decision-making tools distinct from CPGs. They build upon the evidence-based recommendations in CPGs and attempt to cover any gaps in the guidelines. In areas where data are sparse or the quality of evidence is lacking, AUC draw on the experience of practitioners in the field to inform practice. AUC are much more detailed and specific than CPGs and are meant to be applicable to nearly every patient in every clinical scenario imaginable. Table 179.1 highlights the differences between CPGs and AUC.

Table 179.1
Comparison of Clinical Practice Guidelines and Appropriate Use Criteria
Clinical Practice Guidelines | Appropriate Use Criteria
Recommendations intended to optimize patient care | Specify when it is appropriate to use a procedure
Informed by a systematic review of evidence and an assessment of the benefits and harms of alternative care options | Based on systematic review of evidence and uses consensus of expert opinion where quality evidence is developing or lacking
Reflect best practices based on available evidence only | Indicate what is reasonable to do in many specific clinical scenarios
Advisory based on strength of evidence | Clearly assign scenarios as appropriate or inappropriate indications

RAND Corporation/University of California Los Angeles Appropriateness Method

The appropriateness method was developed to determine for which patients certain medical and surgical procedures are appropriate. A widely used and reliable method to produce AUC is the RAND Corporation/University of California Los Angeles (UCLA) Appropriateness Method (RAM). The RAM was developed to systematically assess variation in the use of surgeries by clearly defining which patients should and should not undergo surgical intervention.

Investigators at RAND and UCLA attempted to determine appropriate use for a procedure from the medical literature in order to test their hypothesis at the time that high rates of use of a procedure in a geographic region likely represented “inappropriate” use of the procedure in that region. Upon review of the literature for six different procedures, they realized that medical literature simply was not enough to make determinations regarding the appropriateness of a procedure. There were unanswered questions that required input from those with clinical experience in treating the condition. Therefore, multiple clinical disciplines could help make judgments about appropriateness where the medical literature was lacking. The investigators also felt that appropriateness criteria should be able to be applied to almost every patient in every possible situation where a clinician would be considering a certain procedure. Furthermore, the criteria had to be direct, describing the clinical scenarios in enough detail so each specific scenario could be labeled as appropriate or not. “Weasel words” could not be used to keep the determination of appropriateness vague or ambiguous for any situation.

The RAM involves review of literature in conjunction with the clinical judgment of a multidisciplinary panel. Fig. 179.1 shows a flowchart for the RAM. The original description of the method used a nine-member panel; however, panels can now be composed of six to 15 members. Panelists are often carefully selected through a process that seeks to bring the top experts in a field together to make determinations of appropriateness for a procedure or procedures for a certain condition. A set of specific definitions for all relevant but potentially ambiguous terms is provided to the panelists so that all panelists are making decisions from the same frame of reference. Clear definitions for every relevant term also help make application of the appropriateness criteria reproducible for real-life cases.

Fig. 179.1, Flowchart for use of the RAND Corporation/University of California Los Angeles Appropriateness Method in the development of appropriate use criteria.

Panelists are given an extensive literature review that discusses the risks and benefits of a procedure and are then asked to rate the appropriateness of performing the procedure for specific clinical scenarios, using both their clinical judgment and the best available literature. Note that panelists are asked to consider the average patient presenting to the average physician who performs the procedure at the average hospital without considering cost implications when making their determinations of appropriateness.

To be comprehensive, many different scenarios, often hundreds, must be rated. The appropriateness ratings are done on a nine-point scale, with 1 being the lowest (highly inappropriate) and 9 being the highest (highly appropriate). A rating of 5 is given when the risks and benefits are comparable. The panel rates each indication in two rounds, with the second round following an in-person discussion. In the first round (often performed at home), each panelist rates the appropriateness of each clinical scenario. The panelists are then shown all of the group ratings so that each panelist may compare them with his or her own ratings (the Delphi group process method). A moderator leads the in-person discussion and goes through each scenario one by one. It is important that the moderator be someone who is comfortable with the subject and very familiar with the literature review. Often, the moderator is a physician who has assisted in performing the literature review. It may be wise for the moderator not to be a physician who performs the procedure(s) being rated, to avoid introducing his or her bias into the discussion.
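As a rough illustration of why the number of scenarios grows into the hundreds, the sketch below enumerates every combination of a few clinical variables. The variable names and levels are hypothetical and chosen only for illustration; an actual panel defines its own variables and precise term definitions.

```python
from itertools import product

# Hypothetical clinical variables and levels (illustration only, not taken
# from any published appropriate use criteria).
VARIABLES = {
    "symptom_severity": ["mild", "moderate", "severe"],
    "stenosis": ["none", "moderate", "severe"],
    "sagittal_imbalance": ["absent", "present"],
    "deformity_progression": ["stable", "progressive"],
}


def enumerate_scenarios(variables):
    """Return one dict per combination of variable levels."""
    names = list(variables)
    return [dict(zip(names, levels)) for levels in product(*variables.values())]


scenarios = enumerate_scenarios(VARIABLES)
print(len(scenarios))  # 3 * 3 * 2 * 2 = 36 scenarios
```

Each additional variable multiplies the scenario count by its number of levels, which is how a modest list of clinical factors can yield several hundred scenarios to rate.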

Discussion focuses on scenarios for which there was a wide range of ratings in the first round. The panel can choose to alter definitions of the terms at this time to suit their clinical judgment if there is disagreement or lack of clarity for all panel members. This is also the time to highlight new studies about which not all panel members may be knowledgeable. There can simply be disagreement among panelists based on their own critical assessment of the literature and their own clinical experience. The moderator does not try to force agreement as panelists prepare for the second round. The second-round ratings are then used for analysis with the median panel rating used to classify the indications.
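One way a moderator might prioritize scenarios for discussion is to rank them by the spread of first-round ratings. The sketch below is only an assumption about how this could be done: the source does not define "a wide range," so the threshold `min_spread`, the function names, and the ratings are all hypothetical.

```python
def rating_spread(ratings):
    """Difference between the highest and lowest rating for one scenario."""
    return max(ratings) - min(ratings)


def scenarios_for_discussion(first_round, min_spread=4):
    """Return scenario IDs whose first-round spread is at least min_spread,
    ordered from widest to narrowest spread.

    min_spread is an illustrative threshold, not part of the method itself.
    """
    flagged = [
        (scenario_id, rating_spread(ratings))
        for scenario_id, ratings in first_round.items()
        if rating_spread(ratings) >= min_spread
    ]
    flagged.sort(key=lambda item: item[1], reverse=True)
    return [scenario_id for scenario_id, _ in flagged]


# Hypothetical first-round ratings from a nine-member panel (scale 1 to 9).
first_round = {
    "A": [7, 8, 8, 9, 7, 8, 9, 8, 7],  # spread 2: broad agreement
    "B": [2, 5, 8, 3, 7, 9, 4, 6, 1],  # spread 8: wide range
    "C": [4, 5, 6, 3, 7, 5, 4, 6, 5],  # spread 4: moderate range
}

print(scenarios_for_discussion(first_round))  # ['B', 'C']
```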

Appropriate indications (rating of 7–9) are those for which the expected benefits of the procedure outweigh the expected harms. Equivocal indications (4–6) are those for which expected benefits and harms are nearly equal or about which panelists disagree. Inappropriate indications (1–3) are those for which the expected harms outweigh the benefits. Appropriate indications may be further classified as necessary if it would be improper care not to offer the procedure to the patient, there is a reasonable chance the procedure will benefit the patient, and the magnitude of the benefit is not small. This may be done in a third round of ratings. Rarely, in cases of disagreement, indications are considered uncertain. The most commonly used definition of disagreement in a nine-member panel is when three panelists rated a scenario in the lowest tertile, three in the middle tertile, and three in the upper tertile.
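The classification rule can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name is hypothetical, disagreement is passed in as a flag because its definition depends on panel size, and treating a half-point median (possible with an even-sized panel, for example 6.5) as equivocal is an assumption.

```python
from statistics import median


def classify_indication(ratings, disagreement=False):
    """Classify one scenario from second-round ratings on the 1 to 9 scale.

    Assumptions (not specified in the text): the caller decides whether the
    panel disagreed, and medians that fall between bands (such as 3.5 or 6.5)
    are treated as equivocal.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 9 for r in ratings):
        raise ValueError("ratings must lie on the 1 to 9 scale")
    if disagreement:
        return "equivocal"
    panel_median = median(ratings)
    if panel_median >= 7:
        return "appropriate"
    if panel_median <= 3:
        return "inappropriate"
    return "equivocal"


# Hypothetical ratings from a nine-member panel.
print(classify_indication([7, 8, 8, 9, 7, 8, 9, 8, 7]))  # appropriate (median 8)
print(classify_indication([1, 2, 2, 3, 1, 2, 3, 2, 1]))  # inappropriate (median 2)
print(classify_indication([4, 5, 6, 3, 7, 5, 4, 6, 5]))  # equivocal (median 5)
```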

The RAM has been studied extensively to test its reliability and validity. The results of the RAM are sensitive to panel composition, with physicians who perform the procedure more enthusiastic about its appropriateness than nonperformers. Kahan et al. found that performers of a procedure tended to rate procedures higher on the appropriateness scale than physicians in other specialties or primary care providers. Multispecialty panels provided more variation in appropriateness ratings, which led to fewer indications rated as appropriate. However, independent panels with the same composition of panelist specialties generate reproducible results (kappa 0.5–0.7). Test-retest reliability of the same panelists has had a correlation coefficient greater than 0.9. The sensitivity and specificity of the RAM for identifying inappropriate overuse of coronary revascularization have been estimated at 68% and 99%, respectively, whereas its sensitivity and specificity for identifying underuse of the procedure have been estimated at 94% and 97%, respectively.
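To make the kappa figure concrete, the sketch below computes agreement between two panels' classifications of the same scenarios. It assumes Cohen's kappa, because the text does not say which kappa statistic the cited studies used, and the panel labels are hypothetical data invented for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters (here, two panels) labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed proportion of scenarios on which the panels agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each panel's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical classifications of ten scenarios by two independent panels:
# A = appropriate, E = equivocal, I = inappropriate.
panel_1 = ["A", "A", "A", "E", "E", "E", "I", "I", "I", "I"]
panel_2 = ["A", "A", "E", "E", "E", "I", "I", "I", "I", "A"]

print(round(cohens_kappa(panel_1, panel_2), 2))  # 0.55
```

In this made-up example the panels agree on 7 of 10 scenarios, chance agreement is 0.34, and kappa is (0.70 - 0.34) / (1 - 0.34), or about 0.55.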

Existing Appropriate Use Criteria in Spine Surgery

There are several existing AUC that were created using the RAM and pertain to spine surgery. AUC have been published for cervical fusion and lumbar laminectomy procedures. There are also AUC published for the following conditions: degenerative lumbar scoliosis, degenerative lumbar spondylolisthesis, vertebral fragility fractures, and persistent pain after spine surgery. These existing AUC are summarized in Table 179.2. The international consensuses reached using a modified Delphi survey performed by the AOSpine Knowledge Deformity Forum to identify appropriate management for both adolescent idiopathic scoliosis and adult spinal deformity are not included.

Table 179.2
List of Currently Published Appropriate Use Criteria in Spine Surgery
Condition/Procedure | Author | Year Published | Summary

Cervical fusion | North American Spine Society | 2013
  • 14-member panel rated over 250 scenarios
  • Fusion more appropriate for degenerative conditions with radiculopathy than axial pain alone
  • Both short and long fusions considered appropriate for myelopathy and/or radiculopathy
  • Single-level fusion may be considered appropriate for conditions without stenosis or causing axial pain only
  • Anterior fusion appropriate regardless of sagittal alignment
  • Posterior fusion more often appropriate with kyphosis
  • Combined anterior and posterior surgery appropriate for longer fusion constructs
  • Revision fusion appropriate for symptomatic pseudoarthrosis
  • Comorbidities, including smoking and psychosocial problems, affected the appropriateness of cervical fusion

Lumbar laminectomy | Porchet et al. | 1995
  • Nine-member panel rated 1000 scenarios
  • Major groups of indications were sciatica, back pain only, spinal stenosis, spondylolisthesis, miscellaneous, and repeat laminectomy
  • Laminectomy considered appropriate in 11% of cases, equivocal in 25%, and inappropriate in 63%

Degenerative lumbar scoliosis | Chen et al. | 2016
  • 11-member panel rated 260 scenarios
  • Surgery generally appropriate for moderate to severe symptoms and large/progressive deformity, moderate spinal/foraminal stenosis, sagittal plane imbalance
  • More extensive fusion and deformity correction procedures preferred in large/progressive deformities, imbalance, and severe multilevel stenosis
  • Surgery generally inappropriate for mild symptoms and smaller, stable deformities

Degenerative lumbar scoliosis | Daubs et al. | 2018
  • 11-member international panel rated 260 scenarios
  • When sagittal imbalance is present, surgery is more likely appropriate/necessary
  • With imbalance and moderate to severe symptoms, deformity correction procedure appropriate/necessary
  • Procedures not correcting imbalance usually inappropriate

Degenerative lumbar spondylolisthesis | Mannion et al. | 2014
  • 14-member international panel rated 744 scenarios
  • Surgery rated appropriate in 27% of cases, uncertain in 41%, inappropriate in 31%
  • The variables most likely to be components of scenarios rated appropriate for surgery were severe disability, severe neurological deficit, no yellow flags (psychological distress, inappropriate beliefs about back pain, unhelpful coping strategies)

Vertebral fragility fractures | Hirsch et al. | 2018
  • 12-member panel rated 576 scenarios
  • Vertebral augmentation appropriate in patients with positive findings on advanced imaging, worsening symptoms, and at least two unfavorable factors (progression of height loss, severe impact on functioning, >25% height reduction, kyphotic deformity)

Persistent pain postoperatively | Tronnier et al. | 2009
  • 18-member international panel rated 210 scenarios
  • Four treatment options: conservative, minimally invasive, neurostimulation, reoperation
  • Only one treatment appropriate in 48% of the scenarios
  • For patients without clear anatomic abnormalities and for those with new pain, conservative treatment appropriate
  • For patients with neuropathic leg pain in the absence of surgical indication, neurostimulation appropriate
  • For patients with recurrent disc, spinal/foraminal stenosis, or spinal instability, reoperation appropriate
