The last century has brought remarkable advances in the field of orthopaedic traumatology. The creation and dissemination of this new knowledge could not have been possible without the ability of surgeon scientists to measure and communicate their patient outcomes. Just as the delivery of skeletal trauma care has evolved through the decades, so have the techniques used to quantitate and report these advances.
The concept of outcome measurement is not new to orthopaedic surgery. In 1914, the term “end result” was coined by Dr. Ernest Codman, an orthopaedist who is most recognized for his contributions to shoulder rehabilitation. In addition to his work on the shoulder, Codman held the strong belief that all patients should be reevaluated at the end of 1 year so that physicians could learn from each patient's “end result” of treatment. Unfortunately, the medical community was not receptive to these initiatives, and the concept lay dormant for decades.
As advances in skeletal trauma care burgeoned during the latter half of the twentieth century, pioneers recognized the need to improve the ways in which injuries were described and results reported. Contributing to the dramatic improvement in skeletal trauma care was the formation of the Swiss fracture study group Arbeitsgemeinschaft für Osteosynthesefragen (AO). Since its inception in 1958, one of the central tenets of this group has been the rigorous documentation of cases. As the principles espoused by this group gained acceptance internationally, a need soon arose to report the outcomes of this novel approach to fracture care. Unfortunately, the standardized fracture classification scheme of the AO was not initially accompanied by standardization or consistency in outcome measures. Authors sharing their experience in the medical literature were forced to devise their own outcome measures. A prototypical example is Anderson's classic 1975 article describing his group's success with compression plating of both-bone forearm fractures. In this article, the authors “arbitrarily” grouped patients into those with union, delayed union, and nonunion. Additionally, they characterized functional outcome as excellent, good, unsatisfactory, or failed without precise definitions of these categories. This article, like the vast majority of works from that era, was retrospective in design and lacked consistency in the manner in which results were reported. Standard data from that era focused on parameters such as range of motion, stability, alignment, radiographic union, lost fixation, and infection. These outcomes were used because they were the ones important to the orthopaedist and available by retrospective chart review. Clinical outcomes were simply grouped into large categories arbitrarily defined by the authors.
While this generation of outcome measurement played an important role in the advancement of skeletal trauma care, it had substantial inherent shortcomings. Comparing rates of “excellent” results between studies was fraught with difficulty because of the lack of consistent standards. Additionally, objective clinical outcomes are known to have significant intra- and interexaminer inconsistencies, which brings the validity of these results further into question. As orthopaedic clinical research design improved to include multicenter trials and prospective randomized design, a distinct need was recognized for improvement in outcome measures.
With these shortcomings acknowledged, the orthopaedic community zealously addressed the issue by introducing dozens of outcome measures directed at specific anatomic sites. These regional scores typically combined physician-assessed parameters, functional abilities, and the patient's perception of pain. Some of the more common instruments that evolved included the Michigan Hand Outcomes Questionnaire, American Shoulder and Elbow Surgeons Elbow Questionnaire, Constant-Murley Shoulder Outcome Score, American College of Foot and Ankle Surgeons scoring scales, Iowa Knee Score, and the Harris Hip Score. These measures typically took between 10 and 30 minutes to complete and ranged from those completed exclusively by the patient to those that required an adjunct examiner. With this proliferation, the clinical research landscape rapidly transitioned from an environment devoid of published outcome measures to one in which there were soon too many to choose from. With this ever-increasing number of outcome instruments, a new body of literature emerged directed at choosing the best instrument for the various anatomic regions. Martin, Irrgang, and coworkers studied the performance of 14 patient-completed foot and ankle scores. They concluded that each of the 14 had significant advantages and disadvantages but that no consistently dominant choice was available. Others examining the various measures for the hand, elbow, and hip have drawn similar conclusions. In pursuit of a greater degree of consistency and comparability between studies, scores have also been developed that reflect the upper and lower extremities as a whole. The Disabilities of the Arm, Shoulder, and Hand (DASH) questionnaire is an example of such an instrument that applies to the upper extremity. This instrument is a 30-question, self-administered evaluation of physical activity, pain, symptom severity, and impact of upper extremity disease on everyday activities. 
Answers are weighted and compiled to produce a single score for intra- and interpatient comparison. Because it has been well studied in a wide range of upper extremity applications, translated into numerous languages, and recently been made available in a shortened form, it has become one of the more popular regionally specific outcome measures in orthopaedic trauma.
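The way 30 answers are compiled into a single DASH score can be illustrated with a short sketch. It assumes the commonly published scoring rule, which is not spelled out in this chapter: each item is answered on a 1 to 5 scale, the answered items are averaged with equal weight, and the result is rescaled to 0 (no disability) through 100 (most severe disability), with no score reported if more than three items are unanswered. The function name `dash_score` and the `max_missing` parameter are hypothetical and used only for illustration.

```python
from typing import Optional, Sequence


def dash_score(responses: Sequence[Optional[int]],
               max_missing: int = 3) -> Optional[float]:
    """Return a 0-100 DASH-style score, or None if too many items are blank.

    Assumptions (not taken from the chapter text): 30 items, each answered
    1-5 or left as None, equal weighting, and at most 3 missing answers.
    """
    if len(responses) != 30:
        raise ValueError("expected 30 item slots")
    answered = [r for r in responses if r is not None]
    if any(r < 1 or r > 5 for r in answered):
        raise ValueError("each answered item must be between 1 and 5")
    if len(responses) - len(answered) > max_missing:
        return None  # too many unanswered items to report a score
    mean_response = sum(answered) / len(answered)
    # Shift the 1-5 mean to start at 0, then stretch it to a 0-100 range.
    return (mean_response - 1) * 25


if __name__ == "__main__":
    print(dash_score([3] * 28 + [None, None]))
```

Because the result is a single number on a fixed range, scores can be compared between visits for one patient or between patients, which is the intra- and interpatient comparison described above.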
Beyond focused regional outcome measures, it is becoming increasingly clear that these joint-specific measures do not always provide insight into outcomes that are of importance to patients. A significant amount of literature supports the observation that good clinical outcomes as determined by physicians do not necessarily indicate good functional outcomes from the patient's perspective. This realization has led to an exponential growth in the use of “patient-reported outcomes” (PROs) of general health in important prospective clinical trials. The driving philosophy of patient-derived outcome measures is that they reflect health and well-being in domains of life that extend well beyond those captured in a regional outcome measure. While data may be collected through interviews, an interview qualifies as a PRO only if the interviewer is recording the patient's views, not if the interviewer uses patient responses to make a professional assessment or judgment of the impact of the patient's condition. Because, by definition, general health outcome measures apply to a wide variety of medical and social conditions, their associated population norms, domain specificity, and reproducibility have been extensively studied. Some of the most useful measurement tools used in orthopaedic outcome measurement include the Medical Outcomes Study Short Form 36-item health survey (SF-36), the Quality of Well-Being (QWB) scale, the Sickness Impact Profile (SIP), and the EuroQol Group's five-dimension (EQ-5D) questionnaire. The use of generic instruments to obtain physical, mental, and functional outcome data has become so commonplace that some granting institutions require the incorporation of a generic questionnaire into the design of clinical projects. Each of these instruments assesses domains of human activity, including physical, psychological, social, and role functioning. 
The ultimate effect is a global evaluation of the patient as a whole being rather than a disease, an injury, or an organ system.
The SF-36 was developed by Ware and colleagues and the Rand Corporation as a part of the Medical Outcomes Study. The SF-36 is the most widely applied general health status instrument and has certain features that make it particularly appealing for studying musculoskeletal injury. The SF-36 consists of 36 scaled-response questions (0 = poor, 100 = best) concerning eight different functional subscales: bodily pain, role function–physical, role function–emotional, social function, physical function, energy/fatigue, mental health, and general health perceptions. Each scale is scored separately. These subscales can be combined into the Physical Component Score and the Mental Component Score. The SF-36 has been published with normative values for the U.S. population, which vary with age, gender, and comorbidities.
The SIP, a 136-question endorsable statement (yes/no) questionnaire, requires trained interviewers for administration and takes 25 to 35 minutes to complete. The SIP inquires about 12 different domains, which are first scored independently, then combined into physical and psychosocial subscales, as well as one aggregate score. The scale is 0 to 100 points—the higher the score, the worse the disability. Patients with scores in excess of the mid-30s have significantly diminished quality of life. The SIP has been used in patients with multiple health conditions and allows for comparisons of impact of disease on health. The SIP has been used in musculoskeletal trauma with good success. Lesser degrees of musculoskeletal dysfunction are not identified, and therefore, the SIP also suffers from the ceiling effect. Because of the difficulty and length of its administration, the SIP may be most useful for well-funded outcome studies or controlled trials.
The EQ-5D is so named because it assesses five dimensions of health status—mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. The first three dimensions reflect physical functioning. For pain level estimation, patients are asked to choose between three responses for level of pain and discomfort (none, moderate, or extreme). However, if chronic pain is well controlled, the patient's best response may be most appropriately described as “mild,” which is not an option. The answers within each dimension are then weighted, based on preferences from 0 to 1 that correlate to worst versus best health, and then used to calculate a final score.
Although the SF-36, SIP, and EQ-5D are some of the most widely used general health outcome measures, a common concern over their use in measuring outcomes of skeletal injury is the ceiling effect. This form of measurement failure occurs when a patient's level of function scores at the top end (ceiling) of a given tool. This suggests that the range of function tested by the particular tool is not high enough for the condition under consideration. In response to this concern, the Musculoskeletal Functional Assessment (MFA) and the Short Musculoskeletal Functional Assessment (SMFA) questionnaires were developed. While continuing to focus on the patient's perceived general health outcome, the content of these tools was directed at function derived from use of the extremities. The MFA is a generic, 101-item instrument that assesses function in 10 domains, with emphasis on musculoskeletal function: self-care, emotional status, recreation, household work, employment, sleep and rest, relationships, thinking, activities using arms and legs, and activities using hands. The MFA requires approximately 15 to 20 minutes to complete and can be either self- or interviewer-administered.
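A ceiling effect can be screened for in a data set by counting how many respondents sit at the best attainable score of an instrument. The sketch below is illustrative only: the function names are hypothetical, and the 15% cutoff is an assumed rule of thumb rather than a value given in this chapter. The `best_score` argument is explicit because the best score is the top of some scales (100 on an SF-36 subscale) and the bottom of others (0 on the SIP).

```python
from typing import Sequence


def ceiling_fraction(scores: Sequence[float], best_score: float) -> float:
    """Fraction of respondents whose score equals the best attainable score."""
    if not scores:
        raise ValueError("at least one score is required")
    at_ceiling = sum(1 for s in scores if s == best_score)
    return at_ceiling / len(scores)


def has_ceiling_effect(scores: Sequence[float], best_score: float,
                       threshold: float = 0.15) -> bool:
    """Flag a possible ceiling effect.

    The default threshold of 15% is an assumption for illustration,
    not a figure taken from the chapter.
    """
    return ceiling_fraction(scores, best_score) > threshold


if __name__ == "__main__":
    sf36_physical_function = [100, 100, 80, 60]
    print(ceiling_fraction(sf36_physical_function, best_score=100))
    print(has_ceiling_effect(sf36_physical_function, best_score=100))
```

When a large share of an injured cohort already scores at the best value, the instrument cannot register further recovery, which is the limitation that motivated the MFA and SMFA.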
The drawback of the MFA is that its detail becomes time-consuming for patients and staff. To address this concern, the SMFA was developed. Investigators selected questions from the longer MFA based on universality, applicability, uniqueness, reliability, and validity. The SMFA is a 46-question, self-administered instrument that can be completed in approximately 10 to 15 minutes. This instrument is divided into two parts. The first part has four categories: daily activities, emotional status, arm/hand function, and mobility, with an accompanying five-point scale for patients to estimate their function; part 1 questions are then totaled to create the “dysfunction index.” The second part contains 12 questions that assess the degree to which patients are bothered in recreation and leisure, sleep and rest, work, and family, also with an accompanying five-point scale; responses from the second part are combined to create the “bother index.” Table 79.1 provides an overview of some of the more common outcome measures used in the reporting of results in orthopaedic trauma.
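The two-part structure of the SMFA can be sketched in code. The text above states only that responses are totaled or combined into each index. The sketch additionally assumes that each item is answered from 1 to 5 and that each total is rescaled to a 0 to 100 range (higher meaning worse function or more bother), a convention commonly used when SMFA results are reported. The count of 34 part 1 items is inferred from the 46 total questions minus the 12 in part 2. The names `rescale_to_100` and `smfa_indices` are hypothetical.

```python
from typing import Sequence, Tuple


def rescale_to_100(responses: Sequence[int], low: int = 1, high: int = 5) -> float:
    """Sum the item responses and rescale the total to a 0-100 range."""
    if not responses:
        raise ValueError("at least one response is required")
    if any(r < low or r > high for r in responses):
        raise ValueError("response outside the allowed scale")
    raw_total = sum(responses)
    lowest_total = len(responses) * low          # every item at its minimum
    total_range = len(responses) * (high - low)  # spread between min and max totals
    return (raw_total - lowest_total) / total_range * 100


def smfa_indices(part1: Sequence[int], part2: Sequence[int]) -> Tuple[float, float]:
    """Return (dysfunction index, bother index) under the assumptions above."""
    if len(part1) != 34 or len(part2) != 12:
        raise ValueError("expected 34 part 1 items and 12 part 2 items")
    return rescale_to_100(part1), rescale_to_100(part2)


if __name__ == "__main__":
    dysfunction, bother = smfa_indices([3] * 34, [2] * 12)
    print(dysfunction, bother)
```

Keeping the two indices separate preserves the distinction the instrument draws between what patients can do and how much their limitations bother them.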
Term | Definition |
---|---|
Performance | What is done and how well it is done to provide healthcare (JCAHO 2002) |
Performance Measurement * | The use of both outcomes and process measures to understand organizational performance and effect positive change to improve care (Nadzam and Nelson 1997) |
Performance Indicator † | Markers or signs of things you want to measure but which may not be directly, fully or easily measured (Alberta Government 1998) |
Performance Measure | A quantitative tool, such as rate, ratio or percentage, that provides an indication of an organization's performance in relation to a specified process or outcome (JCAHO 2002) |
Process Measure | A measure focusing on a process that leads to a certain outcome, meaning that a scientific basis exists for believing that the process, when executed well, will increase the probability of achieving a desired outcome (JCAHO 2002) |
Outcome Measure | Not simply a measure of health, well-being or any other state; rather, it is a change in status confidently attributable to antecedent care (intervention) (Donabedian 1968) |
* Like this one, many performance measurement (PM) definitions included the use of measurement results for organizational improvement that implies performance management—and resulted in these two terms being used interchangeably in the literature.
† Despite these distinctions, the terms performance measure and performance indicator were usually used interchangeably in most general discussions about PM because either or both are used in PM.