Rating Systems and Outcomes of Total Hip Arthroplasty


Key Points

  • The outcome of total hip arthroplasty (THA) is generally excellent, resulting in long-lasting improvement in pain and function for the vast majority of patients.

  • The profound improvement in pain and function after THA produces a large standardized effect size. This creates a paradox in outcomes assessment: against the background of such marked overall improvement, patients find it difficult to perceive the subtle differences in outcome associated with new THA technology.

  • The gold standard for assessment of outcomes after hip arthroplasty is prosthesis survivorship. Survivorship is limited by the fact that revision status is a relatively blunt metric and generally is nonrepresentative of function, degree of pain relief, and overall patient satisfaction after hip arthroplasty.

  • Higher precision metrics and scoring systems are necessary to help introduce the next phase of surgical innovation.

  • Survivorship for THA is generally improving over time.

  • All rating systems are a construct for the “true” outcome. As such, all rating systems are subject to variable bias.

The Oxford English Dictionary defines outcome as the result or effect of treatment. Rating systems are tools or metrics used to assess an outcome. Numerous types of rating systems have been proposed for assessment of the outcome of total hip arthroplasty (THA). These include objective metrics, usually completed by the surgeon or allied health care professional, and subjective metrics, usually completed by the patient. Early in the development of THA, outcome assessment concentrated on objective measures that recorded the success of the procedure in terms of survivorship or reduction in complications. This was a necessary first step in the introduction, evolution, and refinement of THA as a viable and highly successful procedure. However, because the World Health Organization (WHO) has defined health as “…not only the absence of infirmity and disease but also a state of physical, mental and social well-being,” subjective outcomes have become more prevalent to augment objective outcome metrics. Subjective outcomes refer more specifically to the patient's impression of the overall outcome of the surgical intervention, considering their personal experiences and expectations with the procedure. The instruments used to capture subjective outcomes are commonly referred to as patient-reported outcome measures (PROMs).

Hip arthroplasty has been shown to have a profound impact on PROMs when preoperative and postoperative statuses are compared. Such profound results make preoperative and postoperative comparisons of different prosthetic designs and surgical techniques difficult to interpret and potentially irrelevant, because the presumably subtle differences in PROMs between designs are lost in the large signal of the surgical intervention itself. Paradoxically, the change in PROMs after hip arthroplasty is so large in some cases that it obscures the subtler signal of interest. Additionally, as hip arthroplasty innovation has become more incremental, it has become inherently difficult to distinguish between subtle changes in treatments (Fig. 60.1). Given the difficulty in distinguishing between new surgical techniques and prostheses in the early postoperative period using PROMs, survivorship is often relied on to compare outcomes. Unfortunately, differences in revision rates may become apparent only late, after a new technique or prosthesis has already penetrated significantly into the surgical community. The net effect has been a proliferation of new hip prostheses in some countries with no significant improvement in outcomes, most specifically survivorship. Given the above, a surgeon should be mindful of the pitfalls and limitations associated with both objective metrics and subjective metrics (PROMs).

Fig. 60.1
Asymptote of surgical innovation for total hip arthroplasty. Phase 1 innovation involved radical changes in technology from innovation to innovation (largely historical). Phase 2 represents modern, substantially more subtle, innovative changes.
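
The way a large overall treatment signal can mask a subtle difference between designs can be sketched numerically. The example below is only an illustration: the scores are invented on a hypothetical 0 to 100 PROM scale, the function names are our own, and the two formulas used (mean change divided by the baseline standard deviation, and Cohen's d with a pooled standard deviation) are common conventions rather than methods prescribed in this chapter.

```python
from statistics import mean, stdev


def standardized_effect_size(baseline, follow_up):
    """Mean change divided by the standard deviation of the baseline scores."""
    return (mean(follow_up) - mean(baseline)) / stdev(baseline)


def cohens_d(group_a, group_b):
    """Difference in group means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_b) - mean(group_a)) / pooled_var ** 0.5


# Hypothetical scores (higher is better); invented for illustration only.
preop = [38, 42, 35, 47, 40, 44, 36, 41]
postop_design_a = [86, 90, 84, 92, 88, 91, 85, 89]
postop_design_b = [87, 91, 84, 93, 89, 91, 86, 90]

# Effect of surgery itself: preoperative versus postoperative scores.
surgery_effect = standardized_effect_size(preop, postop_design_a)

# Effect of the design change: design A versus design B after surgery.
design_effect = cohens_d(postop_design_a, postop_design_b)

print(f"surgery effect size: {surgery_effect:.2f}")
print(f"design effect size:  {design_effect:.2f}")
```

With these invented numbers the effect of surgery is more than an order of magnitude larger than the effect of the design change, which is the situation in which a between-design difference is hard to detect with PROMs alone.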

The types of general objective outcome metrics to be applied to THA are relatively standard and noncontentious. They broadly include surgeon-completed questionnaires, perioperative complications, and survivorship. PROMs are more contentious with respect to standardization. Consequently, numerous PROMs have been introduced in the literature, and new metrics continue to appear. Researchers are therefore forced to choose a metric based on its published psychometric properties or on precedent and extraneous political factors. This practice has led to significant variation in the reporting of outcomes postarthroplasty. Although general trends in outcomes can be contrasted with various outcome metrics, subtler differences in outcomes are lost in the psychometric variability between outcome tools. Efforts have been made by the International Society of Arthroplasty Registries to catalog the various PROMs used by international hip arthroplasty registries and to provide some guidance for further standardization and comparison between metrics.

Background

Objective Outcome Metrics—Surgeon Derived

One of the earliest assessments of hip mobility was devised in 1931 by Fergusson and Howarth to assess patients with slipped upper femoral epiphysis. In this purely objective assessment, points were allocated for hip flexion and abduction along with adduction and hyperextension. In 1954, Merle d'Aubigné and Postel further developed the evaluation tool into a hip scoring system by including a subjective component that scored pain.

The hip scoring system of Merle d'Aubigné and Postel was modified by Charnley in 1972 and has since become one of the most widely used hip assessment tools by orthopedic surgeons. The Charnley modification of the Merle d'Aubigné and Postel hip score assesses hip movements, pain, and walking. It is important to note that these categories require input from both the patient and surgeon (i.e., subjective and objective information, respectively), and the scores from each section are not combined for a total score.

The Harris Hip Score, developed in 1969, is one of the most widely used scoring systems to report outcomes following THA. It is a clinician-completed scoring system used to evaluate pain, function, range of motion, and absence of deformity. The function of a patient is judged by walking habits and the ability to do specified activities.

Surgeon-derived outcome measures can differ significantly from PROMs after THA. Discrepancies between patient and surgeon perspectives are particularly large when patients are dissatisfied with replacement surgery. Thus, appraising the success of a THA using objective information exclusively will account for only a portion of the complete picture; therefore, subjective metrics are an important component of a hip scoring system.

Objective Outcome Metrics—Survivorship

The gold standard for assessment of outcomes after THA is prosthesis survivorship. Survival analysis has been a powerful tool in the long-term assessment of replacement arthroplasty and allows comparison among types or series of joint replacements. Survivorship analysis was first used in orthopedics by Dobbs in 1980 and remains essential today.

Survival analysis provides critical information on the long-term performance of prostheses. For this reason, many countries have developed joint replacement registries. These national databases monitor the survival of implants based on many variables, such as material, fixation technique, and size.

Although survival analysis is an essential tool for measuring THA outcomes, it is a crude metric. Survivorship is based on an endpoint (e.g., revision, death) and often fails to account for the complexity of the variables involved. An implant can be revised (or not revised) for a variety of reasons; therefore, large sample sizes can be required to allow conclusions about a particular procedure. Hence, survival analysis has no predictive power, and its applications are limited to post hoc and trend analyses.

Precision Objective Metrics

Over the past decade, high-precision metrics—such as radiostereometric analysis (RSA) and gait analysis—have become increasingly available. Currently, these techniques are used mainly as research tools, but as they evolve and less expensive surrogates are validated, their use will become more pervasive and essential to assessment of arthroplasty outcomes.

Subjective Metrics—PROMs

PROMs became increasingly prevalent after the WHO altered its definition of health. Objective measures are unable to capture the mental and social well-being of patients. Unfortunately, subjectivity can make it difficult to make comparisons between different scoring systems; even the same scoring system reproduced in different languages can produce inconsistent results.

Reporting of total scores can have the effect of blurring results. Individual section scores may not be proportional and cannot be meaningfully added together. Proportionality is particularly difficult to achieve when objective (e.g., radiologic measurements) and subjective (e.g., patient pain assessments) composite scores are combined.

There are four broad categories of PROMs as applied to THA: general health, disease specific, joint specific, and patient specific. Each of these is discussed in the “Basic Science” section of this chapter, and all are listed in Box 60.1.

Box 60.1
List of Questionnaire Types

  • General health outcomes

  • Disease-specific outcomes

  • Joint-specific outcomes

  • Patient-specific outcomes

  • Single-item questionnaires

Chaotic Innovation

Unfortunately, the introduction of new technology usually does not follow a stepwise algorithm and could be considered chaotic. The initial step, preclinical testing, is robust in North America. However, instead of being incorporated into prospective randomized studies before general release, new technologies are often made immediately available to a wide surgical community. Little emphasis is placed on formal study of clinical outcomes. Only a few specialized academic centers conduct prospective randomized studies, which introduces a reporting bias into the literature. Finally, most published studies on new technology are retrospective in nature and are often published after the technology has already changed. Fig. 60.2 illustrates chaotic innovation.

Fig. 60.2, Chaotic innovation: the introduction of new technology usually does not follow a stepwise algorithm. RCTs, Randomized controlled trials.

Basic Science

Objective Outcome Metrics—Surgeon Derived

With the advent of prosthetic components that demonstrate predictably good results, it became evident that more formalized outcome metrics were necessary. The initial response was that surgeons assessed the results of their interventions. Purely surgeon-derived outcome assessments were quickly shown to be inadequate without subjective data. In 1972, Sir John Charnley modified the Merle d'Aubigné and Postel hip score to assess the outcome of his prosthesis; this system has become one of the most widely used hip scoring systems. This score assesses hip movement, pain, and walking. It is important to note that these categories require input from both the patient and surgeon.

Objective Outcome Metrics—Survivorship

The Kaplan-Meier method is most commonly used to estimate prosthesis survival and to construct survival plots. It provides results that are independent of time intervals, in that survival is estimated at every failure time. Statistically significant differences can be assessed by using the log rank test. However, the log rank test does not allow adjustment for confounding factors. Relative risks for revision can be assessed and adjustments made for differences between compared groups (e.g., age, gender, diagnosis, and other confounding factors) by using the Cox multiple regression model.
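
The product-limit calculation behind the Kaplan-Meier method can be sketched in a few lines. This is a minimal illustration rather than a validated statistical routine: the function name `kaplan_meier` and the ten-hip cohort are hypothetical, and hips that leave follow-up unrevised (for example, through death or end of study) are treated as censored.

```python
def kaplan_meier(durations, revised):
    """Return [(time, survival)] with one step per distinct revision time.

    durations: follow-up in years for each hip
    revised:   True if the hip was revised at that time, False if censored
    """
    events = sorted(zip(durations, revised))
    at_risk = len(events)
    survival = 1.0
    curve = []
    i = 0
    while i < len(events):
        t = events[i][0]
        failures = 0
        leaving = 0
        # Group every hip whose follow-up ends at time t.
        while i < len(events) and events[i][0] == t:
            failures += events[i][1]
            leaving += 1
            i += 1
        if failures:
            # Survival is re-estimated at every failure time.
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
        # Revised and censored hips both leave the risk set after time t.
        at_risk -= leaving
    return curve


# Hypothetical cohort of 10 hips; values invented for illustration only.
durations = [2, 3, 3, 5, 6, 8, 8, 10, 12, 12]
revised = [True, False, True, False, True, False, False, True, False, False]

curve = kaplan_meier(durations, revised)
for t, s in curve:
    print(f"year {t:>2}: estimated survival {s:.3f}")
```

Because the estimate changes only at observed revision times, the result does not depend on how follow-up is divided into intervals. Comparing groups with the log rank test or adjusting for confounders with a Cox model would normally be done with an established statistics package rather than hand-written code.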

A 95% confidence interval should be given when survival results are presented. These can be presented in tables or on curves (Fig. 60.3). Murray and Tsiatis recommended the inclusion of a “worst-case” curve, in which all patients lost to follow-up are considered failures, to provide a statistically accurate statement of survival. In addition, Lettin and colleagues suggested that at least 40 surviving subjects are required to produce reliable results.

Fig. 60.3, An example of a survival curve with a 95% confidence interval. The survival curve is represented by the solid line for the sample of subjects. Confidence intervals, represented by the dotted line, become wider as the sample size decreases over time.
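
The confidence interval and the “worst-case” curve described above can be sketched as follows. Several choices here are assumptions rather than anything specified in the text: the interval uses Greenwood's formula with a simple truncation to the range 0 to 1, patients lost to follow-up are counted as failures at the time they were last seen, and the cohort, status labels, and function name `km_with_ci` are hypothetical.

```python
import math


def km_with_ci(durations, failed, z=1.96):
    """Kaplan-Meier estimate with Greenwood confidence limits.

    Returns [(time, survival, lower, upper)] at each failure time.
    """
    data = sorted(zip(durations, failed))
    at_risk = len(data)
    survival, greenwood_sum = 1.0, 0.0
    rows = []
    i = 0
    while i < len(data):
        t, d, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            leaving += 1
            i += 1
        if d:
            survival *= 1 - d / at_risk
            if at_risk > d:  # the Greenwood term is undefined if nobody survives
                greenwood_sum += d / (at_risk * (at_risk - d))
            se = survival * math.sqrt(greenwood_sum)
            lower = max(0.0, survival - z * se)
            upper = min(1.0, survival + z * se)
            rows.append((t, survival, lower, upper))
        at_risk -= leaving
    return rows


# Hypothetical cohort: (years of follow-up, status); invented for illustration.
# "in_situ" = unrevised at last review, "lost" = lost to follow-up.
cohort = [
    (2, "revised"), (3, "lost"), (4, "in_situ"), (5, "revised"), (6, "lost"),
    (7, "in_situ"), (8, "in_situ"), (9, "revised"), (10, "in_situ"), (10, "in_situ"),
]
times = [t for t, _ in cohort]

# Standard curve: only revisions are failures; lost patients are censored.
standard = km_with_ci(times, [s == "revised" for _, s in cohort])

# Worst-case curve: patients lost to follow-up are also counted as failures.
worst_case = km_with_ci(times, [s in ("revised", "lost") for _, s in cohort])

for label, rows in (("standard", standard), ("worst case", worst_case)):
    t, s, lo, hi = rows[-1]
    print(f"{label}: survival at year {t} = {s:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

In this small invented cohort the interval widens as the number of hips at risk falls, and the worst-case curve lies below the standard curve; the gap between the two indicates how sensitive the reported survival is to incomplete follow-up.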

Revision is a definite and easily reproducible endpoint, but it can be influenced by extraneous factors, such as a patient's fitness for surgery and severity of pain. Other endpoints, such as the presence of severe pain, low functional scores, and radiographic failure, should also be considered.

Arthroplasty registries use prosthesis survival as the primary outcome. Survival analysis is a definitive metric that facilitates comparison of outcomes between nations. National arthroplasty registries have organized to form the International Society of Arthroplasty Registries, with the potential to compare and contrast survivorship outcomes. However, such comparisons are limited by variations in demographics, including age at time of operation, diagnostic groupings, body mass index, gender, and activity levels. Research efforts are directed at defining the demographics of each nation/center in detail so that the denominator of the comparative data can be determined. Without this level of research, comparison of survivorship outcomes between nations/centers is prone to misinterpretation. Furthermore, the specific method of defining survivorship should be standardized. For example, Cox regression is a particularly useful method because it accounts for other factors, such as age and gender, which are known to influence outcomes. If such factors are not considered in outcome analyses, reported differences in survival curves between various prostheses are difficult to interpret, particularly on an international basis.

Arthroplasty registries function best as a surveillance tool for implant failure. As such, favorable and unfavorable trends in the outcomes of certain prostheses can be easily determined and disseminated back to the orthopedic community in a quality improvement feedback cycle. However, because arthroplasty registries are surveillance tools, there is an inherent lag in the reporting of outcomes; this creates a potential for suboptimal implants or techniques to penetrate and become part of the clinical norm before detection by the registry. A more accurate and predictive form of survivorship analysis would have the advantage of limiting new technology and techniques to fewer patients than is necessary to see trends with arthroplasty registries.

The gold standard for assessment of outcomes after hip arthroplasty is prosthesis survivorship. However, modern advances in prosthetic design and technique are such that the threshold for joint arthroplasty has moved from salvage operations performed in extreme cases to an intervention designed to improve the quality of life in patients who otherwise might cope without surgery. Hence, judging the success of the surgery may relate more to subtler improvements in quality of life, including relief of pain and improvement in function, most often measured by PROMs. Furthermore, technological innovation has improved the design of prostheses, ensuring survival in situ, barring infection, for at least a decade with relative certainty. Consequently, the homogeneity of current prostheses (with respect to stable and lasting designs) has produced an emerging emphasis on quantifying subtler outcomes after arthroplasty.

Limitation of Survivorship Analysis

Arthroplasty registries rely on revision status as the sole endpoint for defining outcomes after arthroplasty. Revision status is a useful measure because it is relatively easy to define and the incidence of revision is definite. Although definitive, revision status is a relatively blunt metric and is generally nonrepresentative of function, degree of pain relief, and overall patient satisfaction after THA. Furthermore, different surgeons have different thresholds for performing revisions, and not all patients who require revision surgery undergo the procedure because of coexisting medical problems, personal wishes, and so forth. Revision status yields data only on the small minority of operations that fail. The same set of arguments generally holds true for the outcome of continuous migration, as defined by RSA, which essentially acts as a surrogate for revision status. Although some evidence suggests that subjective outcomes may be correlated with RSA-defined migration patterns, this phenomenon is not widely reported in the literature.

Subjective Outcomes

Protagoras mused that “man is the measure of all things.” The implication of this statement speaks to the conceptualization that the distinction between mind and body is blurred or, indeed, that there is no distinction at all. Although the Western philosophical distinction between mind and body has its origins with the ancient Greeks, it was the works of René Descartes that formalized the modern distinction between mind and body. According to Descartes, the rational soul is an entity distinct from the body that may or may not be aware of signals passing through the body via interfibrillar spaces. The interfibrillar spaces (i.e., the sensory nervous system) were “extended” into the physical world, while the rational soul (i.e., consciousness) was not. This distinction between mind and body has persisted into modern Western medical thought.

In 1947, the WHO defined health as follows: “Health is not only the absence of infirmity and disease but also a state of physical, mental and social well-being.” This definition reintroduced the concept that mind and body are, in fact, one and the “well-being” of the mind and body combined represents health. Subsequently, the measurement of health moved from simply defining the success of a procedure by determining its effect on infirmity and disease to the more ambitious approach of defining what effect the intervention had on physical, mental, and social well-being. By this definition, it was no longer adequate to define the outcome of a hip arthroplasty, for example, simply by stating what the range of motion was or what the impact was on mobility. Instead, more comprehensive metrics were needed.

The definition of health put forth by the WHO was perhaps the impetus for the modern movement to measure physical, mental, and social well-being. The first attempts at quantifying general health involved single-item global ratings that were designed to augment organ-specific or more physiologic outcomes. Over time, numerous PROMs were developed that asked more questions around various aspects of health, such that separate scores for each of these health domains were generated. Domains that attempted to account for physical, mental, and social well-being included emotional reaction, sleep, social isolation, body pain, and social functioning, for example. Advanced study and refinement of these tools continue today. The introduction and evolution of generic (or general) health measurements have been well documented. Measurements of this sort are often referred to as “subjective,” as they are difficult to quantify. Still, some form of logical metric was imperative for further research. This dilemma was eloquently alluded to by Lord Kelvin when he said, “I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind.” The WHO continues to be interested in this area of outcomes research. At a workshop in January of 2000 under the umbrella of the Bone and Joint Decade 2000–2010, the need to standardize outcome metrics for musculoskeletal research was discussed. The International Society of Arthroplasty Registries has more recently made efforts to standardize PROMs associated with THA registries. Although the WHO definition of health may be largely responsible for the emergence of general health outcome questionnaires, the first aspect of the definition, that is, “the absence of infirmity or disease,” has not been lost on researchers. A similar evolution has come about in PROMs focused on the organ (or site) or physiologic process (disease).

Subjective Outcomes—Patient-Reported Outcome Measures

Psychometric Considerations: What Makes a Good Patient-Reported Outcome Measure?

Definition of Psychometrics

Psychometrics can be defined as “the scientific measurement of mental capacities and processes and of personality.” In other words, psychometrics is the process that allows researchers to apply scientific methods to the measurement of subjective outcomes. In practical terms, the published psychometric properties of a questionnaire pertain mostly to validation of the questionnaire, or to defining how well the questionnaire measures what it is supposed to measure, in a global sense. The validation process usually involves three specific aspects of questionnaire testing: validity, reliability, and responsiveness.

Validity

Validity refers more specifically (as opposed to validation) to how well the questionnaire measures the question of interest. Validity can take many forms, and numerous synonyms have been used in conjunction with it. These include criterion, construct, convergent, divergent, and content validity. Before one can comment on the validity of a questionnaire, the results of the questionnaire must be compared with something.
