The Reliability and Diagnostic Utility of the Orthopaedic Clinical Examination


The health sciences and medical professions continue to focus on evidence-based practice, defined as the integration of the best available research evidence and clinical expertise with the patient’s values. Evidence should be incorporated into all aspects of physical therapy patient and client management, including the examination, evaluation, diagnosis, prognosis, and intervention. Perhaps the most crucial component is a careful, succinct clinical examination that can lead to an accurate diagnosis, the selection of appropriate interventions, and the determination of a prognosis. Thus, it is of utmost importance to incorporate evidence of how well clinical tests and measures can distinguish between patients who present with specific musculoskeletal disorders and patients who do not.

The diagnostic process entails obtaining a patient history, developing a working hypothesis, and selecting specific tests and measures to confirm or refute the formulated hypothesis. The clinician must determine the pretest (before the evaluation) probability that the patient has a particular disorder. Based on this information, the clinician selects appropriate tests and measures that will help determine the posttest (after the evaluation) probability of the patient having the disorder, until a degree of certainty has been reached such that patient management can begin (the treatment threshold). The purpose of clinical tests is not to obtain diagnostic certainty but rather to reduce the level of uncertainty until the treatment threshold is reached. The concepts of pretest and posttest probability and treatment threshold are elaborated later in this chapter.

As the number of reported clinical tests and measures continues to grow, it is essential to thoroughly evaluate a test’s diagnostic properties before incorporating the test into clinical practice. Integrating the best evidence available for the diagnostic utility of each clinical test is essential in determining an accurate diagnosis and implementing effective, efficient treatment. It seems only sensible for clinicians and students to be aware of the diagnostic properties of tests and measures and to know which have clinical utility. This text assists clinicians and students in selecting tests and measures to ensure the appropriate classification of patients and to allow for quick implementation of effective management strategies.

The assessment of diagnostic tests involves examining several properties, including reliability and diagnostic accuracy. A test is considered reliable if it produces precise and reproducible information. A test is considered to have diagnostic accuracy if it can discriminate between patients who have a specific disorder and patients who do not have it. Scientific evaluation of the clinical utility of physical therapy tests and measures involves comparing the examination results with reference standards such as radiographic studies (which represent the closest measure of the truth). Using statistical methods from the field of epidemiology, the diagnostic accuracy of the test, that is, its ability to determine which patients have a disorder and which do not, is then calculated. This chapter focuses on the characteristics that define the reliability and diagnostic accuracy of specific tests and measures. The chapter concludes with a discussion of the quality assessment of studies investigating reliability and diagnostic utility.

Reliability

For a clinical test to provide information that can be used to guide clinical decision making, it must have acceptable reliability. Reliability is the degree of consistency with which an instrument or rater measures a particular attribute. When we investigate the reliability of a measurement, we are determining the proportion of that measurement that is a true representation and the proportion that is the result of measurement error.

When discussing the clinical examination process, it is important to consider two forms of reliability: intraexaminer and interexaminer reliability. Intraexaminer reliability is the ability of a single rater to obtain identical measurements during separate performances of the same test. Interexaminer reliability is a measure of the ability of two or more raters to obtain identical results with the same test.

The kappa coefficient (κ) is a measure of the proportion of potential agreement after chance is removed; it is the reliability coefficient most often used for categorical data (positive or negative). The correlation coefficient commonly used to determine the reliability of data that are continuous in nature (e.g., range-of-motion data) is the intraclass correlation coefficient (ICC). Although interpretations of reliability vary, coefficients are often evaluated by the criteria described by Shrout, with values of 0.10 or less indicating no reliability, values between 0.11 and 0.40 indicating slight reliability, values between 0.41 and 0.60 indicating fair reliability, values between 0.61 and 0.80 indicating moderate reliability, and values of 0.81 or greater indicating substantial reliability. “Acceptable reliability” must be decided by the clinician using the specific test or measure and should be based on the variable being tested, the reason a particular test is important, and the patient on whom the test will be used. For example, a 5% measurement error may be very acceptable when measuring joint range of motion but is not nearly as acceptable when measuring pediatric core body temperature.
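As an illustration of chance-corrected agreement, the following minimal Python sketch computes κ for two examiners who each rate the same patients as positive or negative. It assumes the standard Cohen formulation, κ = (observed agreement − chance agreement) / (1 − chance agreement), which the text does not spell out. The function names, the argument names, and the example counts are hypothetical, and the cutoffs in `shrout_label` simply transcribe the Shrout criteria quoted above.

```python
def cohen_kappa(both_pos, a_pos_b_neg, a_neg_b_pos, both_neg):
    """Cohen's kappa for two raters making positive/negative judgments.

    Arguments are counts of patients (hypothetical names):
      both_pos     both examiners rated the test positive
      a_pos_b_neg  examiner A positive, examiner B negative
      a_neg_b_pos  examiner A negative, examiner B positive
      both_neg     both examiners rated the test negative
    """
    n = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
    if n == 0:
        raise ValueError("at least one rated patient is required")

    # Proportion of patients on whom the two examiners actually agreed.
    observed = (both_pos + both_neg) / n

    # Agreement expected by chance, from each examiner's positive rate.
    a_pos_rate = (both_pos + a_pos_b_neg) / n
    b_pos_rate = (both_pos + a_neg_b_pos) / n
    expected = a_pos_rate * b_pos_rate + (1 - a_pos_rate) * (1 - b_pos_rate)

    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def shrout_label(coefficient):
    """Map a reliability coefficient to the Shrout categories quoted above."""
    if coefficient <= 0.10:
        return "none"
    if coefficient <= 0.40:
        return "slight"
    if coefficient <= 0.60:
        return "fair"
    if coefficient <= 0.80:
        return "moderate"
    return "substantial"


# Hypothetical data: 50 patients examined independently by two raters.
kappa = cohen_kappa(both_pos=20, a_pos_b_neg=5, a_neg_b_pos=10, both_neg=15)
print(round(kappa, 2), shrout_label(kappa))
```

In this made-up example the examiners agree on 70% of patients, yet half of that agreement would be expected by chance alone, so κ comes out at roughly 0.40. This is why raw percentage agreement tends to overstate reliability.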

Diagnostic Accuracy

Clinical tests and measures can never absolutely confirm or exclude the presence of a specific disease. However, clinical tests can be used to alter the clinician’s estimate of the probability that a patient has a specific musculoskeletal disorder. The accuracy of a test is determined by the measure of agreement between the clinical test and a reference standard. A reference standard is the criterion considered the closest representation of the truth of a disorder being present. The results obtained with the reference standard are compared with the results obtained with the test under investigation to determine the percentage of people correctly diagnosed, that is, the diagnostic accuracy. Because the diagnostic utility statistics are completely dependent on both the reference standard used and the population studied, we have specifically listed these within this text to provide information to consider when selecting the tests and measures reported. Diagnostic accuracy is often expressed in terms of positive and negative predictive values (PPVs and NPVs), sensitivity and specificity, and likelihood ratios (LRs).

2×2 Contingency Table

To determine the clinical utility of a test or measure, the results of the reference standard are compared with the results of the test under investigation in a 2×2 contingency table. The table allows for the calculation of the values associated with diagnostic accuracy, which assist in determining the utility of the clinical test under investigation (Table 1-1).

Table 1-1
2×2 Contingency Table Used to Compare the Results of the Reference Standard with Those of the Test under Investigation

                          Reference Standard Positive     Reference Standard Negative
Clinical Test Positive    True-positive results (a)       False-positive results (b)
Clinical Test Negative    False-negative results (c)      True-negative results (d)

The 2×2 contingency table is divided into four cells (a, b, c, d) for the determination of the test’s ability to correctly identify true positives (cell a) and rule out true negatives (cell d). Cell b represents the false-positive findings wherein the diagnostic test was found to be positive yet the reference standard obtained a negative result. Cell c represents the false-negative findings wherein the diagnostic test was found to be negative yet the reference standard obtained a positive result.
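The assignment of each patient to one of the four cells can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `tally_2x2`, its parameters, and the six-patient data set are hypothetical, and each patient is assumed to have one positive/negative result from the clinical test and one from the reference standard.

```python
def tally_2x2(test_positive, reference_positive):
    """Count cells a, b, c, d of the 2x2 contingency table.

    test_positive[i] and reference_positive[i] are booleans giving the
    clinical test result and the reference standard result for patient i.
    """
    if len(test_positive) != len(reference_positive):
        raise ValueError("each patient needs both a test and a reference result")

    a = b = c = d = 0
    for test, reference in zip(test_positive, reference_positive):
        if test and reference:
            a += 1  # true positive: both positive
        elif test and not reference:
            b += 1  # false positive: test positive, reference negative
        elif not test and reference:
            c += 1  # false negative: test negative, reference positive
        else:
            d += 1  # true negative: both negative
    return a, b, c, d


# Hypothetical results for six patients.
clinical_test = [True, True, False, False, True, False]
reference_std = [True, False, True, False, True, False]
print(tally_2x2(clinical_test, reference_std))  # (2, 1, 1, 2)
```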

Once a study investigating the diagnostic utility of a clinical test has been completed and the comparison with the reference standard has been performed in the 2×2 contingency table, the clinical utility can be determined in terms of overall accuracy, PPVs and NPVs, sensitivity and specificity, and LRs. These statistics help determine whether a diagnostic test is useful for ruling in or ruling out a disorder.

Overall Accuracy

The overall accuracy of a diagnostic test is determined by dividing the correct responses (true positives and true negatives) by the total number of patients. Using the 2×2 contingency table, the overall accuracy is determined by the following equation:


\[
\text{Overall accuracy} = 100\% \times \frac{a + d}{a + b + c + d}
\]

A perfect test would exhibit an overall accuracy of 100%. This is most likely unobtainable because no clinical test is perfect, and each will always exhibit at least a small degree of uncertainty. The overall accuracy of a diagnostic test should not be used to determine its clinical utility, because it can be misleading: the accuracy of a test can be significantly influenced by the prevalence of a disease, that is, the total instances of the disease in the population at a given time.
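The influence of prevalence can be seen in a small worked example. The Python sketch below applies the overall accuracy equation to a deliberately useless hypothetical test that reports every patient as negative. The function name and all cell counts are invented for illustration and do not come from any study.

```python
def overall_accuracy(a, b, c, d):
    """Overall accuracy (%) from the cells of the 2x2 contingency table."""
    total = a + b + c + d
    if total == 0:
        raise ValueError("the contingency table is empty")
    return 100.0 * (a + d) / total


# A hypothetical test that calls every patient negative, so a = b = 0.

# Low prevalence: 5 of 100 patients have the disorder.
low_prevalence = overall_accuracy(a=0, b=0, c=5, d=95)

# High prevalence: 50 of 100 patients have the disorder.
high_prevalence = overall_accuracy(a=0, b=0, c=50, d=50)

print(low_prevalence, high_prevalence)  # 95.0 50.0
```

The test identifies no patient with the disorder in either scenario, yet its overall accuracy is 95% when the disorder is rare and 50% when it is common. The difference comes entirely from the makeup of the population rather than from the test itself.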
