Introduction

Critical Points
Introduction

  • Many knee-specific scales and rating systems have been published in the last decade with acceptable psychometric properties for a variety of diagnoses.

  • Scales rate specific activities, such as sports or daily functions.

  • Rating systems provide a comprehensive analysis of the knee condition and its impact on activity and function after the treatment protocol under study.

  • Only three knee rating systems are currently available that have established reliability, validity, and responsiveness: Cincinnati Knee Rating System, International Knee Documentation Committee Knee Evaluation system, and Knee Society Scoring System.

  • The Cincinnati Knee Rating System is one of the most commonly used instruments in the orthopaedic literature to measure the results of ACL reconstruction, and has been considered a gold standard in the development and content and criterion validity analyses of other knee rating scales.

The assessment of outcome following the treatment of knee injuries and disorders has received tremendous attention in the orthopaedic literature. In regard to rating the outcome of anterior cruciate ligament (ACL) reconstructions, early scoring scales and systems were introduced into the medical community without undergoing an assessment of the psychometric properties of reliability, validity, and responsiveness. There was a lack of consensus regarding which variables to include in the measurement of patient outcome. As a result, studies conducted in the 1990s comparing the results of ACL reconstruction using different rating systems showed distinct differences in results and conclusions.

However, many knee-specific scales and rating systems have been published in the past decade with acceptable psychometric properties for a variety of diagnoses ( Table 41-1 ). It is important to understand the difference between scales that rate specific activities and rating systems that are more encompassing in terms of assessing the entire knee condition. For instance, scales have been designed to quantitate athletic activity levels, problems with activities of daily living (ADLs), or both, all from the patient's perspective. We agree with Zarins that although the subjective assessment of symptoms and functional limitations is important, the final outcome of a specific treatment must also take into account objective measures that are appropriate for the diagnosis or injury under study. The determination of patient outcome according to subjective questionnaire-based data only does not provide a complete understanding of the ability of the treatment protocol to restore normal knee function. For instance, knee rating scales exist in which a patient may be rated as “excellent” even though an ACL reconstruction failed to restore normal or nearly normal knee stability, which is a major goal of the operation. It is well known for this injury that in the short term, patients may do well without a functional ACL, but over time, the knee joint deteriorates, and problems increase to eventually affect daily activities.

TABLE 41-1
Psychometric Properties Measured in Knee-Specific Outcome Instruments
| Targeted Diagnosis | Study | Outcome Instruments | Reliability Measured? | Validity Measured? | Responsiveness Measured? |
|---|---|---|---|---|---|
| ACL deficiency, ACL reconstruction | Lysholm (1982) | Lysholm | No | No | No |
| | Tegner & Lysholm (1985) | Lysholm, Tegner | No | No | No |
| | Hanley & Warren (1987) | HSS | No | No | No |
| | Straub & Hunter (1988) | Sports Performance Index | No | No | No |
| | Seto (1988) | Sports Participation Survey | No | No | No |
| | Mohtadi (1998) | ACL-QOL | Yes | Yes | No |
| | Irrgang (1998) | IKDC Knee Evaluation Form | No | Yes | No |
| | Roos (1998) | KOOS | Yes | Yes | Yes |
| | Barber-Westin (1999) | Cincinnati Knee Rating | Yes | Yes | Yes |
| | Briggs (2006) | Lysholm & Tegner | Yes | Yes | Yes |
| | Briggs (2009) | Lysholm & Tegner | Yes | Yes | Yes |
| | Salavati (2011) | KOOS | Yes | Yes | No |
| “Nonoperative” | Leigh Brown (1999) | Edinburgh Knee Function | Yes | Yes | No |
| Chondral lesion | Kocher (2004) | Lysholm | Yes | Yes | Yes |
| | Bekkers (2009) | KOOS | Yes | Yes | No |
| Osteoarthritis | Bellamy (1988) | WOMAC | Yes | Yes | Yes |
| | Rejeski (1995) | Knee Pain | Yes | Yes | No |
| | Williams (2012) | WOMAC, Pittsburgh ADL, LEFS | Yes | No | Yes |
| Knee arthroplasty | Amstutz (1984) | UCLA | No | No | No |
| | Insall (1989) | Knee Society | No | No | No |
| | Finch (1995) | LEAP | Yes | Yes | Yes |
| | Zahiri (1998) | UCLA | No | Yes | No |
| | Dawson (1998) | Oxford | Yes | Yes | Yes |
| | Liow (2000) | Knee Society | Yes | No | No |
| | Lingard (2001) | Knee Society, WOMAC | No | Yes | Yes |
| | Roos (2003) | KOOS | Yes | Yes | Yes |
| | Whitehouse (2003) | Reduced WOMAC | Yes | Yes | Yes |
| | Naal (2009) | UCLA, Marx Activity Rating, Tegner | Yes | Yes | No |
| | Talbot (2010) | High-Activity Arthroplasty Score | No | Yes | No |
| | Scuderi (2012), Noble (2012) | New Knee Society Scoring System | Yes | Yes | Yes |
| Revision knee arthroplasty | Saleh (2005) | Lower Extremity Activity Scale | Yes | Yes | Yes |
| Meniscus procedure | Briggs (2006) | Lysholm & Tegner | Yes | Yes | Yes |
| | Crawford (2007) | IKDC Knee Evaluation Form | Yes | Yes | Yes |
| Patellofemoral pain | Chesworth (1989) | Functional Index | Yes | Yes | No |
| | Kujala (1993) | Kujala | No | No | No |
| | MacIntyre (1995) | Functional Index | Yes | Yes | No |
| | Harrison (1995) | Functional Index | Yes | Yes | Yes |
| | Piva (2009) | Pittsburgh ADL | No | No | Yes |
| | Lee (2013) | Samsung Medical Center | Yes | Yes | Yes |
| Articular cartilage procedure | Ebert (2013) | KOOS, Lysholm, Tegner | No | No | Yes |
| | Greco (2010) | IKDC Subjective, WOMAC, Cincinnati | Yes | No | Yes |
| Patellar dislocation | Paxton (2003) | IKDC, Kujala, Fulkerson, Lysholm, Tegner | Yes | Yes | No |
| Variety of knee problems combined | Irrgang (1998) | Pittsburgh ADL, Lysholm | Yes | Yes | Yes |
| | Binkley (1999) | LEFS | Yes | Yes | No |
| | Marx (2001) | Marx Activity Rating | Yes | Yes | No |
| | Marx (2001) | Cincinnati, Lysholm, ADL, AAOS Sports/Knee | Yes for all | Yes for all | Yes for all |
| | Irrgang (2001) | IKDC Subjective | Yes | Yes | No |
| | Johanson (2004) | AAOS Lower Limb Core Scale | Yes | Yes | Yes |
| | Irrgang (2006) | IKDC Subjective | NA | NA | Yes |

ACL, Anterior cruciate ligament; ACL-QOL, Anterior Cruciate Ligament-Quality of Life; ADL, activities of daily living; HSS, Hospital for Special Surgery; AAOS, American Academy of Orthopaedic Surgeons; IKDC, International Knee Documentation Committee; KOOS, Knee Injury and Osteoarthritis Outcome Score; LEAP, Lower Extremity Activity Profile; LEFS, Lower Extremity Functional Scale; UCLA, University of California at Los Angeles; WOMAC, Western Ontario and McMaster Universities Osteoarthritis Index.

In contrast to scales, rating systems provide a comprehensive analysis of the knee condition and its impact on activity and function. Clinical investigators have suggested that such systems should measure a variety of symptoms, sports and daily activity functions, patient satisfaction, and objective physical findings. Only three knee rating systems are currently available that have established reliability, validity, and responsiveness: the Cincinnati Knee Rating System (CKRS), the International Knee Documentation Committee (IKDC) Knee Evaluation system, and the new Knee Society Scoring System. The CKRS and IKDC Knee Evaluation measure pain, swelling, giving way, functions of sports and daily activities, sports activity levels, patient perception of the knee condition, range of knee motion, joint effusion, tibiofemoral and patellofemoral crepitus, knee ligament subluxations, compartment narrowing on radiographs, and lower limb symmetry during single-leg hop tests. The CKRS has been validated for a variety of knee problems and the IKDC has been validated for ACL reconstruction, patellar dislocation, and meniscus procedures. The IKDC system is discussed in detail in Chapter 42 . The new Knee Society Scoring system is discussed in detail in Chapter 45 .

General health outcome instruments are also available that determine general health and other indicators such as the psychological and social aspects of the patient's life (Table 41-2). The most widely used of these types of scales for knee research appears to be the Short Form 36 (SF-36) Health Survey, although a recent study indicated the Short Form 12 Health Survey may also be appropriate for ACL reconstruction studies. Generic health questionnaires have limited usefulness in studies composed of patients with a specific diagnosis and therefore should be used in addition to disease-specific rating systems. Finally, global outcome instruments that are extremity specific (upper or lower) or disease specific (e.g., osteoarthritis) are widely used to determine the outcome of many types of knee injuries and disorders. The most commonly used assessments are discussed in Chapter 43.

TABLE 41-2
Psychometric Properties Measured in General Health Outcome Instruments Used in Knee Studies
| Targeted Diagnosis | Study | Outcome Instrument | Reliability Measured? | Validity Measured? | Responsiveness Measured? |
|---|---|---|---|---|---|
| ACL deficiency, ACL reconstruction | Salavati (2011) | SF-36 | Yes | Yes | No |
| Knee arthroplasty | Lingard (2001) | SF-36 | No | No | Yes |
| Articular cartilage procedure | Greco (2010) | SF-36 | Yes | No | Yes |
| | Ebert (2013) | SF-36 | No | No | Yes |
| Patellar dislocation | Paxton (2003) | SF-36, MFA | Yes | Yes | No |
| Various knee problems | Martin (1997) | SF-36, MFA | Yes | Yes | Yes |

ACL, Anterior cruciate ligament; SF-36, Short Form 36; MFA, Musculoskeletal Function Assessment.

The CKRS was first published in concert with the largest ACL natural history study conducted during that time period. In the early 1980s, the dilemma of the appropriate treatment for complete ACL ruptures stemmed in part from limited knowledge of the functional limitations caused by the injury and the lack of a rigorous rating system that graded symptoms and limitations according to the specific type of activity during which they occurred. Over the next decade, additional scales and modifications were developed for the CKRS to provide a comprehensive assessment of the knee condition. An overall rating scheme was devised to provide a final rating, which is available in either a numeric or categorical manner, as is discussed later. The major components of the CKRS are shown in Table 41-3. This system is one of the most commonly used instruments in the orthopaedic literature to measure the results of ACL reconstruction and has been considered a gold standard in the development and content and criterion validity analyses of other knee rating scales. The CKRS was initially designed and validated in athletically active populations; however, it is also useful for patients who have undergone other operative procedures such as articular cartilage restorative procedures, meniscus repairs or transplants, osteotomies, or patellofemoral procedures. This chapter describes the rationale and methodology for the major components of the CKRS.

TABLE 41-3
Components of the Cincinnati Knee Rating System
  • I: Subjective Assessment

    • Symptom Rating Scale

    • Patient Perception of the Overall Knee Condition Scale

    • Sports Activity Scale

    • Rating of Individual Sports and Daily Activity Functions

    • Occupational Rating Scale

  • II: Patient History

  • III: Knee Examination

  • IV: Objective Testing

  • V: Operative Procedures and Articular Cartilage Rating

  • VI: Postoperative Complications

  • VII: Overall Rating

Measurement of Psychometric Properties of Outcome Instruments

Critical Points
Review of Analyses Used to Measure Psychometric Properties of Outcome Instruments

  • Reliability is the extent to which scores on an instrument are reproducible and is measured either between subjects (test-retest) or among observers (interobserver).

  • Internal consistency represents the concept that the consistency with which a patient answers from one question to the next can be used to provide an estimate of reliability for the total test score.

  • Construct validity is the extent to which a measure corresponds to expected theoretical concepts or hypotheses regarding the diagnosis.

  • Item-discriminant (or -divergent) validity is present when variables which are hypothesized to be dissimilar are proven so.

  • Criterion (or concurrent) validity is assessed by correlating scores on the instrument under study with other criteria known or believed to adequately measure the function or symptom.

  • Responsiveness is the ability of an instrument to detect clinically important change.

Outcome instruments and scales must have strong psychometric properties to be useful in evaluating the results of treatment. These properties (reliability, validity, and responsiveness) are determined by a variety of methods.

Reliability is the extent to which scores on an instrument are reproducible and is measured either between subjects (test-retest) or between observers (interobserver). Patients complete questionnaires at separate time periods; a minimum 1-week interval was recommended by Deyo and coworkers to elapse between questionnaire administration. Reliability is measured with product-moment correlations and intraclass correlation coefficients (ICCs). ICC is the most commonly used statistic in modern studies and is calculated by


$$\mathrm{ICC} = \frac{A^{2} + B^{2} - C^{2}}{A^{2} + B^{2} + D^{2} - C^{2}/n}$$

where A is the standard deviation (SD) from all values in trial number 1, B is the SD from all values in trial number 2, C is the SD of the difference in all values between trials number 1 and number 2, D is the mean of the difference between trials number 1 and number 2, and n is the total number of patients.

Correlations among test-retest data should be greater than 0.70, which is considered the standard for adequate reliability for questionnaires. The use of ICC rather than the more common Pearson correlation coefficient was suggested by Deyo and coworkers and Lin to provide a more sensitive assessment of variability within data. The problem that can occur with the Pearson correlation coefficient is that duplicate measurements may be systematically different yet correlate highly and, as a result, be falsely interpreted. For example, if every patient scored exactly 5 points lower on a scale on the second administration, the test-retest correlation would be a perfect 1.0, despite the fact that every patient had a lower score. The ICC handles this problem because it not only assesses the strength of the correlation but also determines whether or not the slope and intercept in the regression line of test-retest data vary from those expected with duplicate results. In our example, the ICC would correspondingly be reduced to demonstrate the systematic difference among the test-retest data.
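The contrast between the two statistics can be sketched in a few lines of Python. This is a minimal illustration, assuming the two-trial formula reads ICC = (A² + B² − C²)/(A² + B² + D² − C²/n) with A, B, C, D, and n as defined above; the scores are hypothetical and the function names are illustrative only.

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def icc_two_trials(trial1, trial2):
    """Two-trial ICC using the symbols from the text (assumed reading of the formula)."""
    n = len(trial1)
    diffs = [b - a for a, b in zip(trial1, trial2)]
    A = stdev(trial1)   # SD of all values in trial 1
    B = stdev(trial2)   # SD of all values in trial 2
    C = stdev(diffs)    # SD of the between-trial differences
    D = mean(diffs)     # mean of the between-trial differences
    return (A**2 + B**2 - C**2) / (A**2 + B**2 + D**2 - C**2 / n)

# Hypothetical scores: every patient scores exactly 5 points lower on retest.
trial1 = [60, 70, 80, 90, 100]
trial2 = [s - 5 for s in trial1]

r = pearson_r(trial1, trial2)
icc = icc_two_trials(trial1, trial2)
print(f"Pearson r = {r:.3f}, ICC = {icc:.3f}")
```

With these numbers the Pearson coefficient stays at 1.0 while the ICC falls below it, because the squared mean difference D² enters the denominator. How far the ICC drops depends on how large the systematic shift is relative to the spread of scores.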

Internal consistency is determined by a coefficient α greater than 0.60. The underlying concept of this measure is that the consistency with which a patient answers from one question to the next can be used to provide an estimate of reliability for the total test score. A high coefficient α indicates that the items in a questionnaire are consistently measured or are homogeneous with regard to the measurement of the underlying diagnosis or attribute.
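As a rough sketch of how coefficient α might be computed, the Python below uses the standard Cronbach formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The text does not state the formula, so its use here is an assumption, and the responses are hypothetical.

```python
from statistics import variance

def cronbach_alpha(responses):
    """responses: one list per patient, holding that patient's score on each item."""
    k = len(responses[0])                       # number of items
    items = list(zip(*responses))               # regroup scores by item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical three-item questionnaire answered by five patients.
responses = [
    [2, 2, 4],
    [4, 4, 4],
    [6, 6, 8],
    [8, 8, 8],
    [10, 10, 10],
]
alpha = cronbach_alpha(responses)
meets_threshold = alpha > 0.60   # threshold quoted in the text
print(f"alpha = {alpha:.3f}, adequate = {meets_threshold}")
```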

Several measures have been described to determine the validity of an instrument, including content, construct, item discriminant, convergent, and criterion. In general, validity is the psychometric criterion in which an instrument is tested to determine its ability to actually measure what it claims to measure. Content validity is the extent to which a question or instrument represents the area of interest and has been described in various manners. Face validity, one example of content validity, is determined by consulting both patients and experienced medical professionals regarding the development of a scale's questions and their relevance to the diagnosis under study. For instance, for ACL reconstruction investigations, a questionnaire with good face validity would be believed by patients, surgeons, and therapists to measure the common problems caused by this injury, such as pain and instability. This represents a subjective analysis that is not statistically analyzed. Another method to determine content validity is to calculate floor (worst result) and ceiling (best result) effects. Scales in which the majority of patients score either the highest level or the lowest level do not allow for an assessment of deterioration or improvement over time. Floor and ceiling effects are present when greater than 30% of the population marks either the best possible or worst possible scores on a scale.
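The floor and ceiling check lends itself to a short calculation. The sketch below applies the 30% criterion from the text to a hypothetical set of scores on a 0-to-10 scale; the function name and return structure are illustrative assumptions.

```python
def floor_ceiling_effects(scores, worst, best, threshold=0.30):
    """Flag a floor or ceiling effect when more than `threshold` of patients
    mark the worst or best possible score, respectively."""
    n = len(scores)
    floor_fraction = sum(1 for s in scores if s == worst) / n
    ceiling_fraction = sum(1 for s in scores if s == best) / n
    return {
        "floor_fraction": floor_fraction,
        "ceiling_fraction": ceiling_fraction,
        "floor_effect": floor_fraction > threshold,
        "ceiling_effect": ceiling_fraction > threshold,
    }

# Hypothetical follow-up scores: half of the patients mark the best possible score.
scores = [10, 10, 10, 10, 8, 6, 4, 2, 10, 8]
result = floor_ceiling_effects(scores, worst=0, best=10)
print(result)
```

In this example the ceiling effect is flagged, so the scale could not register further improvement in half of the group.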

Construct validity is the extent to which a measure corresponds to expected theoretical concepts or hypotheses regarding the diagnosis. A valid instrument will accurately differentiate patients whose outcome is expected to vary with regard to certain characteristics known to the disease process. Researchers develop hypotheses based on prior research and clinical experience in which the questionnaire scores are expected to be significantly different among selected patient groups. The hypotheses are confirmed using the F test and t-test at the level of P < .01. In addition, construct validity is determined by calculating Pearson product-moment correlation coefficients between scale items and either previously validated instruments or physician and patient assessments. A moderately strong coefficient is indicated by R > 0.60.

Item-discriminant (or item-divergent ) validity is present when variables which are hypothesized to be dissimilar (such as patient age and anteroposterior [AP] knee displacements) are indeed found to be statistically unrelated. Pearson correlations are performed to detect statistical dissimilarities, proven at R = 0.28 or less. In the opposite manner, convergent validity is present when variables which are believed to be similar within the questionnaire are indeed found to be statistically similar.

Criterion (or concurrent ) validity is assessed by correlating scores on the instrument under study with other criteria known or believed to adequately measure the function or symptom. This determines how a new instrument compares with an accepted gold standard instrument. The Pearson product-moment correlation coefficient is used to determine this property, with moderately strong findings indicated by R >0.60.
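The correlation thresholds quoted for criterion and item-discriminant validity can be illustrated as follows. All data are hypothetical, and applying the thresholds to the absolute value of R is an assumption made here so that negative correlations are handled sensibly.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def supports_criterion_validity(r):
    return abs(r) > 0.60      # moderately strong agreement with the gold standard

def supports_discriminant_validity(r):
    return abs(r) <= 0.28     # variables statistically unrelated, as hypothesized

# Hypothetical scores on a new scale and on an accepted gold standard instrument.
new_scale = [55, 62, 70, 78, 85, 91]
gold_standard = [50, 65, 68, 80, 83, 95]
r_criterion = pearson_r(new_scale, gold_standard)

# Hypothetical patient ages and AP knee displacements (mm), expected to be unrelated.
age = [18, 25, 31, 40, 22, 36]
ap_displacement = [4.0, 2.0, 5.0, 4.0, 5.0, 3.0]
r_discriminant = pearson_r(age, ap_displacement)

print(supports_criterion_validity(r_criterion),
      supports_discriminant_validity(r_discriminant))
```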

Responsiveness , or the ability of an instrument to detect clinically important change, is determined by calculating standardized response means (SRM) and effect sizes (ES) of the selected instrument categories. The magnitude of the SRM (mean change in score from preoperative to follow up/SD of change in score) and the ES (mean change in score from preoperative to follow up/SD of preoperative score) are interpreted using the Cohen standard of greater than 0.20 for small effects, greater than 0.50 for moderate effects, and greater than 0.80 for large effects. This analysis provides a more precise indication of the change in results over time from those obtained by the standard Student t-test. An instrument's sensitivity simply denotes its ability to measure any change, which by definition does not necessarily indicate one that is clinically meaningful.
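A minimal sketch of the two responsiveness statistics, using the definitions given above (SRM = mean change / SD of change; ES = mean change / SD of preoperative score). The scores are hypothetical, and the "negligible" label for values of 0.20 or less is an assumption not stated in the text.

```python
from statistics import mean, stdev

def responsiveness(pre, post):
    """Return (SRM, ES) for paired preoperative and follow-up scores."""
    changes = [b - a for a, b in zip(pre, post)]
    srm = mean(changes) / stdev(changes)   # standardized response mean
    es = mean(changes) / stdev(pre)        # effect size
    return srm, es

def cohen_label(value):
    """Interpret magnitude with the Cohen thresholds quoted in the text."""
    magnitude = abs(value)
    if magnitude > 0.80:
        return "large"
    if magnitude > 0.50:
        return "moderate"
    if magnitude > 0.20:
        return "small"
    return "negligible"   # assumed label for values of 0.20 or less

# Hypothetical preoperative and follow-up scores for five patients.
pre = [40, 50, 60, 70, 80]
post = [60, 65, 85, 80, 95]
srm, es = responsiveness(pre, post)
print(f"SRM = {srm:.2f} ({cohen_label(srm)}), ES = {es:.2f} ({cohen_label(es)})")
```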

Components of the CKRS

Critical Points
Rating of Symptoms and Patient Satisfaction

Rating of Symptoms

  • Pain, swelling, partial giving way and full giving way are major symptoms assessed in knee injuries and disorders.

  • Symptoms are rated according to the highest activity level possible without incurring the symptoms.

  • Six-level scale

Rating of Patient Perception of the Knee Condition

  • Patients rate the overall condition of the knee by circling a number on a scale from 1 to 10.

  • Descriptors are provided to assist the patient in understanding the meaning of the numerical scale.

  • Most subjective of all factors, not correlated with other outcome measures

Rating of Symptoms

Pain, swelling, partial giving way, and full giving way are the major knee symptoms assessed in ACL investigations. Pain and swelling are also symptoms that may occur in all other types of knee injuries and disorders, and instability is a problem with other knee ligament ruptures and certain patellofemoral disorders. Authors have proposed a variety of methods for measuring symptoms, from a binary system (“yes” or “no” ), to visual analogue scales, to a severity rating (such as mild, moderate, severe), which can be done either alone or in combination with activities (such as “slight after strenuous sports”).

In 1983, Noyes and associates proposed that knee symptoms should be rated according to the activity during which they occurred: strenuous sports, recreational sports, or walking. This rationale provided an understanding of the impact of a chronic ACL-deficient knee because 30% of 103 patients reported pain with walking alone, 47% with recreational sports, and 69% with strenuous sports in the authors' natural history study.

The assessment of symptoms was later refined and the scale increased to a six-level gradient shown in Figure 41-1. Points are awarded for the highest activity level in which the patient is able to participate without incurring the symptom (Appendix A). If the symptom is present with ADL, it is rated as either moderate (frequent, limiting) or severe (constant, not relieved). Definitions are provided for terms that might otherwise be ambiguous to patients, such as “moderate” sports (running, turning, twisting) and “strenuous” sports (jumping, hard pivoting).

FIG 41-1, Symptom Rating Scale.

When reporting symptoms before and after surgery, a distribution of the percent of patients in each of the six levels should be shown (along with a mean and SD) for both time periods. An example is shown in Figure 41-2 for a group of patients who received a meniscus transplant. The data were also expressed in the body of the text as “the mean preoperative Cincinnati Knee Rating Scale pain score of 2.5 points (range, 0-6 points) improved to a mean of 5.8 points (range, 0-10 points) at follow up ( P <.0001). Before the meniscus allograft, thirty patients (79%) had moderate to severe pain with daily activities but at follow up, only four patients (11%) had pain with daily activities.”

FIG 41-2, The pain scale shows the highest level of activity possible without the patient experiencing knee pain. This example was taken from a clinical study on meniscus transplantation. The difference between the preoperative and follow-up visit was statistically significant ( P <.0001). Mod-sev, Moderate to severe.
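The reporting format described above (percent of patients at each of the six levels, plus mean and SD) can be sketched as follows. The scores are hypothetical, and treating levels 0 and 2 as "symptom present with daily activities" is an assumption inferred from the description of the scale, since Figure 41-1 itself is not reproduced here.

```python
from collections import Counter
from statistics import mean, stdev

LEVELS = (0, 2, 4, 6, 8, 10)   # six-level symptom rating scale

def summarize_symptom_scores(scores):
    """Return (percent of patients at each level, mean, SD)."""
    if any(s not in LEVELS for s in scores):
        raise ValueError("scores must be one of the six scale levels")
    counts = Counter(scores)
    n = len(scores)
    distribution = {level: 100.0 * counts.get(level, 0) / n for level in LEVELS}
    return distribution, mean(scores), stdev(scores)

def percent_symptomatic_with_daily_activities(scores):
    # Assumption: levels 0 (severe) and 2 (moderate) denote symptoms with ADL.
    return 100.0 * sum(1 for s in scores if s <= 2) / len(scores)

# Hypothetical preoperative pain scores for ten patients.
preoperative = [0, 2, 2, 2, 4, 4, 6, 2, 0, 2]
distribution, avg, sd = summarize_symptom_scores(preoperative)
adl_percent = percent_symptomatic_with_daily_activities(preoperative)
print(distribution, f"mean = {avg:.1f}", f"SD = {sd:.1f}", f"ADL pain = {adl_percent:.0f}%")
```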

One problem may occur in the rating of symptoms when patients have not attempted to return to strenuous sports activities. In these situations, a potential bias may occur if the patient or clinician attempts to project the correct symptom level based on a hypothetical answer. For instance, if a patient returned to bicycling or swimming and had no pain with those activities, then the pain score awarded would be a level 6 (see Fig. 41-1). However, if the patient is asked whether she or he believes pain would occur with level 8 activities (running, twisting, turning) and she or he responds that it probably would not occur, a bias would occur if this score was assigned without further verification that this was indeed true. This is often the case with the symptom of giving way, because patients frequently return asymptomatically to level 6 or 8 activities postoperatively but state that they participated a few times at level 10 activities (jumping, hard pivoting) without problems. Points are awarded only when there is a reasonable basis for the assessment, not on the patient's speculation about the level that may be possible.

This potential bias is a particular problem with populations of chronic knee injuries with compounding problems of advanced articular cartilage deterioration, multiple ligament reconstructive procedures, or varus osseous malalignment, which is corrected with a high tibial osteotomy (HTO) in addition to ACL reconstruction. This problem has two solutions. First, the clinician may ask the patient to test the knee at higher levels of activities if both patient and physician believe this is a reasonable request. Then, the patient can be contacted later for a symptom rating after he or she has participated in strenuous activities several times.

Second, the clinician may use the modified symptom rating scale ( Fig. 41-3 ) that was first introduced in an investigation of patients with varus osseous malalignment and ACL deficiency who were treated with multiple operative procedures. This modified scale consists of a four-level gradient. The levels of 0, 2, and 4 are the same as the original scale shown in Figure 41-1 . The fourth and highest level (level 6) indicates that some type of sports participation is possible without the symptom. This modified scale is intended for studies in which the majority of patients do not return to moderate or strenuous athletics indicated in levels 8 and 10. The reliability of this modified scale was previously shown to be adequate for patients and normal subjects. However, clinicians should be aware that the modified scales may have reduced sensitivity, especially if the results of a study that used the modified scale are compared with another study that used the original scale. This pertains only to the individual symptom results. The effect of the modified scale on the overall rating score, described later in this chapter, is small and only has a negligible impact when comparing the data of the overall scores among different populations.

FIG 41-3, Modified Symptom Rating Scale.
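One way to picture the relationship between the two scales is as a collapse of the top three levels of the original scale into a single level. The mapping below is an assumption inferred from the description above (levels 0, 2, and 4 unchanged; level 6 meaning that some type of sports participation is possible without the symptom), not a published conversion rule, and the function name is illustrative.

```python
ORIGINAL_LEVELS = (0, 2, 4, 6, 8, 10)
MODIFIED_LEVELS = (0, 2, 4, 6)

def to_modified_scale(original_score):
    """Collapse an original six-level symptom score onto the four-level modified scale.
    Assumption: original levels 6, 8, and 10 all indicate sports participation
    without the symptom, so each maps to modified level 6."""
    if original_score not in ORIGINAL_LEVELS:
        raise ValueError("score must be one of 0, 2, 4, 6, 8, 10")
    return min(original_score, 6)

converted = [to_modified_scale(s) for s in ORIGINAL_LEVELS]
print(converted)
```

The collapse shows why the modified scale loses sensitivity: a patient who improves from level 6 to level 10 on the original scale registers no change on the modified one.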

Even though there is always the potential for a bias to exist regarding the scoring of subjective symptoms, the CKRS format allows for an accurate assessment of the activity levels patients return to on a routine basis. The scale was designed to not award points if a patient participates at a high activity level but has symptoms; such a patient fulfills the criteria of the “knee abuser.”

Rating of Patient Perception of the Knee Condition

Modern knee rating systems incorporate some form of patient satisfaction, or rating of the patient's perception of the knee condition, into the assessment of clinical outcome. In the CKRS, patients are asked to rate the overall condition of the knee by circling a number on a scale from 1 to 10 ( Fig. 41-4 ). Four descriptors are provided to assist the patient in understanding the meaning of the numerical scale. Under the number 2 is the term “poor,” defined as “I have significant limitations that affect activities of daily living,” whereas under the number 10 are the terms “normal/excellent,” defined as “I am able to do whatever I wish (any sport) with no problems” (see Appendix A ). For data reporting purposes, a distribution of responses is shown in a five-level gradient. Responses under numbers 1 and 2 are termed “poor,” those under numbers 3 and 4, “fair”; those under numbers 5 and 6, “good”; those under numbers 7 and 8, “very good”; and those under numbers 9 and 10, “normal.”

FIG 41-4, Patient Perception of the Knee Condition.
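The five-level reporting gradient described above pairs adjacent numbers on the 1-to-10 scale, which can be expressed compactly. The function name is illustrative and the responses are hypothetical.

```python
from collections import Counter

CATEGORIES = ("poor", "fair", "good", "very good", "normal")

def perception_category(score):
    """Map a 1-10 patient perception score to its reporting category:
    1-2 poor, 3-4 fair, 5-6 good, 7-8 very good, 9-10 normal."""
    if not isinstance(score, int) or not 1 <= score <= 10:
        raise ValueError("score must be an integer from 1 to 10")
    return CATEGORIES[(score - 1) // 2]

# Hypothetical follow-up responses from eight patients.
responses = [7, 9, 4, 6, 8, 10, 2, 7]
distribution = Counter(perception_category(s) for s in responses)
print(dict(distribution))
```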

An example is shown in Figure 41-5 for a group of patients who received a meniscus transplant. The data were also expressed in the body of the text as “the mean preoperative patient perception score of 3.2 points (range, 1-6 points) improved to a mean of 6.2 points (range, 1-9 points) at follow up ( P = .0001). Two patients rated the knee condition as the same, and two as worse.”

FIG 41-5, Distribution of patient perception of the overall knee condition from a clinical study on meniscus transplantation. The difference between preoperative and follow up was statistically significant ( P < .0001).

Clinicians and researchers should realize that inconsistencies may arise between the responses to the patient perception scale and those in other scales of the CKRS. An example is an 18-year-old patient who successfully returned to competitive soccer without problems or symptoms but rated the overall knee condition as a 7 because he felt “slower than before the injury.” Conversely, a 45-year-old patient with a triple varus knee who required an HTO, ACL reconstruction, and posterolateral reconstruction and was only able to return to low-impact activities also rated her knee condition as a 7. She indicated that she was exceptionally pleased that her constant pain with daily activities had resolved and that she was able to swim and bicycle without problems. Because this portion of the CKRS is perhaps the most subjective of all of the assessment factors, it is not correlated with other outcome measures. This underscores the inconsistencies in reporting of patient outcome when only a patient perception rating is used to determine the results of an operation.

Rating of Sports and Daily Activities

Critical Points
Rating of Sports and Daily Function and Activities

Accurate assessment of sports participation must span many levels of athletics, intensity, and frequency of participation; be able to sort populations according to changes in athletic levels or lifestyle; and identify patients who experience symptoms during athletic activities.
