Endocrinology Laboratory Testing


Introduction

What makes one an effective user of the clinical laboratory?

Such a user:

  • Identifies at least one sympathetic ally in the clinical laboratory to whom one can reach out for advice about laboratory testing. None of us can be expert in all the areas we need to know for the best possible care of patients, so including a laboratory expert as an ad hoc member of the care team should be a best practice for all clinicians.

  • Is at least somewhat conversant with the language of the clinical laboratory to make it easier to communicate with laboratory experts, as well as to understand key articles published in the clinical laboratory literature.

  • Understands that diagnostic testing, like so much of medicine, is “playing the odds,” and has enough awareness of basic probability theory and statistics to know: (1) when and when not to order laboratory tests and (2) how confident to be about a positive or negative result.

  • Has insight into how laboratory tests are validated (for quality purposes).

  • Knows enough about laboratory methodology to decide which test is most appropriate for the patient.

  • Is aware of test limitations and possible sources of error (many of which can occur even before the sample is actually tested).

  • Accepts the reality that different versions/kits/brands of the same test can give markedly different results, and adjusts diagnostic strategy accordingly.

  • Recognizes the limitations and statistical fuzziness of all reference intervals, particularly those in pediatrics, and interprets results carefully with this inexactitude in mind.

Collaboration among clinicians and laboratorians

Even with a laboratory test as seemingly simple as thyroid-stimulating hormone (TSH), a collaborative relationship between an endocrinologist and a clinical laboratory expert can be mutually beneficial. An experienced clinician may be well aware of the significant circadian rhythm of this hormone but may not know how much the magnitude of daily variability differs among individuals, or how much TSH results can vary among assays. The laboratorian, in turn, may be intimately familiar with how often interfering antibodies lead to falsely elevated results but may not realize how common it is to see teenaged patients with congenital hypothyroidism who have both high TSH and high free thyroxine (T4) because of last-minute double-dosing after months of chronic noncompliance. A clinical laboratory expert can explain how heterophilic antibody interference differs from TSH autoantibody interference (we will cover this later), and which diagnostic maneuvers make the most sense given the clinical scenario presented by the clinician. Arranging regular clinician-laboratorian communication takes time, but without a doubt patients benefit tremendously from what these experts can teach each other.

Learning to speak (some of) the language of the laboratory

The language used in the clinical laboratory can be a barrier to learning. “Labspeak” is not quite a foreign language but represents a dialect with many terms unfamiliar to clinicians. Imagine the following conversation:

Clinician: “The results for test X on this patient seem really different between laboratory Y and your laboratory.” Laboratorian: “I agree with you that these two results show more discordance than I’d expect from normal analytical and biological variation because the total allowable error is only 15% even though we’re down near the LOQ. The assay QC looks fine, so we should check for potential preanalytical issues and consider possible heterophilic antibody interference as well. And let’s check the platforms used, because the assays for this analyte are definitely not standardized or even harmonized to date.”

Although those working in the clinical laboratory should be savvy enough to avoid speaking like this to a clinician (especially using acronyms like LOQ), the exchange above is perfectly plausible between two laboratorians, and might in fact be the most concise way to convey the key points of the investigation to follow. Knowing even a few of the commonly used laboratory terms shown in Table 4.1 can help bridge the communication gap and will certainly help a clinician better understand key articles in useful journals such as Clinical Chemistry.

Table 4.1
Selected Laboratory Terminology
Term and/or Concept | Associated Acronyms | Pertinent Section
Aliquot | | 2
Analyte | | 2
Calibrator/calibration | | 7
Carryover | | 4
Chromatography | LC, HPLC, UPLC | 5
Competitive immunoassay | RIA, EIA, CIA | 5
Extraction | | 5
Harmonization | | 7
Heterophilic antibody | HAMA, HAAA | 6
Immunometric assay | IRMA, ICMA, IFMA, ELISA | 5
Interferences | | 6
Limit of quantitation (vs. limit of detection) | LOQ (vs. LOD) | 4
Linearity | | 4
Mass spectrometry | MS, GC-MS, LC-MS/MS | 5
Matrix | | 4
Method comparison | | 4
Platform | | 2
Positive (vs. negative) predictive value | PPV (vs. NPV) | 3
Preanalytical | | 6
Precision | | 4
Receiver-operating characteristic curve | ROC curve | 3
Recovery | | 4
Reportable range | | 4
Sensitivity & specificity (analytical) | | 4
Sensitivity & specificity (clinical) | | 4
Stability | | 3
Standardization | | 4
Validation (analytical) | | 7
CIA, competitive immunoassay; EIA, enzyme immunoassay; ELISA, enzyme-linked immunosorbent assay; GC-MS, gas chromatography-mass spectrometry; HAAA, human antianimal antibody; HAMA, human antimouse antibody; HPLC, high-performance liquid chromatography; ICMA, immunochemiluminometric assay; IFMA, immunofluorometric assay; IRMA, immunoradiometric assay; LC, liquid chromatography; LC-MS/MS, liquid chromatography-tandem mass spectrometry; MS, mass spectrometry; RIA, radioimmunoassay; UPLC, ultraperformance liquid chromatography.

Many of these terms will be defined in sidebars in the appropriate section, but there are a few general ones that are worth mentioning right away:

  • “Analyte” is a very common word in laboratory medicine, simple in concept, yet unfamiliar to most clinical ears. It is a generic term for “the thing being measured/analyzed.” Feeling comfortable with this word will make it much easier to talk with the laboratory and scour the pertinent laboratory literature.

  • “Aliquots” are smaller portions of a sample, prepared from the original, or “mother,” tube. An aliquot can be sent to another laboratory for corroboration, or a “fresh aliquot” can be used to repeat the test if the original may have gone through too many freeze-thaw cycles or been contaminated.

  • “Platform” is a general, albeit somewhat ambiguous, term most often used to describe the manufacturer and model of automated testing instruments, for example, the Beckman Access versus the Roche Elecsys. Why does this matter to the clinician? Because platforms differ in their performance characteristics and vulnerability to interferences. We will see later that comparing results for the same sample on two different platforms is sometimes the fastest way to investigate certain types of interferences. A clinician faced with an unexpected result should know to ask the laboratory, when appropriate, whether the result can be corroborated “on a different platform.”

Laboratory statistics: the basics of evidence-based diagnosis

Biostatistics and epidemiology are often taught during the early preclinical years of training, when students are hungry to gain clinical experience and sometimes dismissive of what seems like more didactic study. Yet both experienced clinicians and those involved in clinical research quickly realize how important it is to have at least a basic awareness of medical statistics to avoid making significant errors in diagnostic or treatment decisions.

If you call your laboratory asking about the sensitivity and specificity of a test, you will be asked whether you mean “clinical sensitivity and specificity” (covered here) or the completely different “analytical sensitivity and specificity” (discussed in the following methodology/validation section). The clinical laboratory will certainly have data on the latter but likely only limited data on the former, because establishing clinical sensitivity and specificity typically requires substantial clinical studies beyond the reach of the clinical laboratory.

  • “Clinical sensitivity” is how often the test will be positive in a patient who has the disease being tested for. An excellent mnemonic (useful for examinations) is to think of the abbreviation for “positive in disease” as “PID” and consider how important it is to be “sensitive” when you have a patient with clinical PID (pelvic inflammatory disease).

  • “Clinical specificity” is how often the test will be negative in a patient who is “healthy” (at least, who does not have the disease being tested for). The mnemonic in this case is “negative in health,” or “NIH”—consider how important it is to be very “specific” when writing an NIH grant proposal.

Perhaps more relevant to clinical practice are the concepts of “positive predictive value” (PPV) and “negative predictive value” (NPV):

  • PPV is the probability of disease in a patient with a positive test result.

  • NPV is the probability of “health” (nondisease) in a patient with a negative test result.
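These four definitions become concrete once results are laid out in a 2x2 table of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). The sketch below is a minimal Python illustration under that assumed layout; the class name TwoByTwo and the counts are hypothetical, chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from a 2x2 table comparing a test result with true disease status."""
    tp: int  # disease present, test positive
    fp: int  # disease absent, test positive
    fn: int  # disease present, test negative
    tn: int  # disease absent, test negative

    def sensitivity(self) -> float:
        # "Positive in disease": fraction of diseased patients who test positive
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # "Negative in health": fraction of nondiseased patients who test negative
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Probability of disease given a positive result
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Probability of nondisease given a negative result
        return self.tn / (self.tn + self.fn)


# Hypothetical counts for 1,000 patients, invented for illustration
table = TwoByTwo(tp=90, fp=40, fn=10, tn=860)
print(f"sensitivity {table.sensitivity():.2f}, specificity {table.specificity():.3f}")
print(f"PPV {table.ppv():.3f}, NPV {table.npv():.3f}")
```

Note that sensitivity and specificity are computed down the columns (within diseased or nondiseased patients), whereas PPV and NPV are computed across the rows (within positive or negative results), which is why only the latter depend on how many diseased patients are in the table.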

Both NPV and PPV are affected by the underlying prevalence of disease, or more precisely, by the probability that the patient in question has the disease (“pretest probability”). Fig. 4.1 summarizes the definitions of these terms, whereas Fig. 4.2 demonstrates the dramatic effect of increasing pretest probability on the PPV of a diagnostic test, which should be a testament to the importance of a good history and physical examination before deciding on laboratory test ordering. The best way to improve diagnostic test performance is to be as certain as possible about the diagnosis even before ordering the test!

Fig. 4.1, Definitions of basic diagnostic test statistics.

Fig. 4.2, Effect of disease prevalence/pretest probability on positive (PPV) and negative (NPV) predictive values, using a test with 99% clinical sensitivity and 99% clinical specificity. Numbers in boxes represent the distribution of 100,000 patients among true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN).
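The effect shown in Fig. 4.2 follows from simple arithmetic. The minimal Python sketch below distributes a hypothetical cohort of 100,000 patients among TP, FP, FN, and TN for a test with 99% clinical sensitivity and 99% clinical specificity; the function name and the pretest probabilities looped over are illustrative choices, not values read from the figure.

```python
def predictive_values(prevalence: float, sensitivity: float, specificity: float,
                      n_patients: int = 100_000) -> dict:
    """Split a hypothetical cohort into TP, FP, FN, TN and return PPV and NPV."""
    diseased = n_patients * prevalence
    healthy = n_patients - diseased
    tp = diseased * sensitivity   # diseased patients correctly flagged
    fn = diseased - tp            # diseased patients missed
    tn = healthy * specificity    # healthy patients correctly cleared
    fp = healthy - tn             # healthy patients wrongly flagged
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn,
            "PPV": tp / (tp + fp), "NPV": tn / (tn + fn)}


# Test characteristics as in Fig. 4.2; the pretest probabilities are illustrative
for prevalence in (0.001, 0.01, 0.10, 0.50):
    r = predictive_values(prevalence, 0.99, 0.99)
    print(f"pretest {prevalence:6.1%}: TP={r['TP']:.0f} FP={r['FP']:.0f} "
          f"PPV={r['PPV']:.1%} NPV={r['NPV']:.3%}")
```

At a 1% pretest probability, the 990 true positives are matched by 990 false positives, so the PPV is only 50% despite the excellent sensitivity and specificity; raising the pretest probability is what drives the PPV up.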

In clinical practice, most endocrine laboratory tests give continuous rather than “yes/no” results and are therefore rarely used as strict positive/negative tests. TSH values of 6.0 mU/L and 60 mU/L might both be considered “positive,” but neither value would be a good diagnostic cutoff for primary hypothyroidism. A cutoff of 6.0 mU/L would ensure that virtually all patients with primary hypothyroidism are detected (maximum sensitivity), but at the expense of many false positives (very poor specificity) and unnecessary referrals to the endocrine clinic. Conversely, a cutoff of 60 mU/L would minimize the number of false positives (excellent specificity), but at the expense of missing many true cases of primary hypothyroidism (clinically unacceptably low sensitivity). The choice of a cutoff somewhere between these two extremes should be determined by the clinical scenario (e.g., perhaps lower in a 14-month-old infant than in an obese but otherwise healthy 13-year-old) rather than by an arbitrary universal threshold.
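The cutoff trade-off can be illustrated with a short Python sketch. The TSH values below are invented for illustration only (they are not patient data), a result at or above the cutoff is assumed to count as positive, and the function name is hypothetical.

```python
def sens_spec_at_cutoff(diseased, nondiseased, cutoff):
    """Clinical sensitivity and specificity when results >= cutoff are called positive."""
    tp = sum(v >= cutoff for v in diseased)      # diseased patients flagged
    tn = sum(v < cutoff for v in nondiseased)    # nondiseased patients cleared
    return tp / len(diseased), tn / len(nondiseased)


# Hypothetical TSH values (mU/L), invented purely to illustrate the trade-off
hypothyroid = [7.5, 12, 18, 25, 40, 65, 90, 150]
not_hypothyroid = [1.2, 2.0, 3.1, 4.5, 5.8, 6.5, 7.2, 9.0, 11]

for cutoff in (6.0, 10.0, 60.0):
    sens, spec = sens_spec_at_cutoff(hypothyroid, not_hypothyroid, cutoff)
    print(f"cutoff {cutoff:>5} mU/L: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

With these made-up values, the low cutoff catches every hypothyroid patient but misclassifies several others, while the high cutoff misclassifies no one without disease but misses most true cases.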

For tests that generate continuous results, the overall diagnostic effectiveness of a test can also be evaluated by plotting the true positive rate against the false positive rate, producing a receiver-operating characteristic (ROC) curve.

  • An ROC curve plots, for a range of cutoff values, the true positive rate (clinical sensitivity) of a test on the y-axis against the false positive rate (1 – clinical specificity) on the x-axis (Fig. 4.3). The area under the curve (AUC) estimates the ability of the test to distinguish disease from nondisease: an AUC of 0.50 indicates a test without diagnostic value, an AUC of 0.70 is considered a fair diagnostic test, and an AUC of 0.90 or more is generally considered excellent.

    Fig. 4.3, Receiver-operating characteristic curves for two theoretical diagnostic tests.
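The ROC construction can be sketched in a few lines of Python. This is a minimal illustration, assuming that higher values indicate disease and that a result at or above a cutoff counts as positive; the function names and marker values are hypothetical, and the area is computed with the trapezoidal rule.

```python
def roc_points(diseased, nondiseased):
    """(false positive rate, true positive rate) pairs, from strictest to most lenient cutoff."""
    cutoffs = sorted(set(diseased) | set(nondiseased), reverse=True)
    points = [(0.0, 0.0)]  # cutoff above every value: nothing called positive
    for c in cutoffs:
        tpr = sum(v >= c for v in diseased) / len(diseased)         # sensitivity
        fpr = sum(v >= c for v in nondiseased) / len(nondiseased)   # 1 - specificity
        points.append((fpr, tpr))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical marker values, invented for illustration
marker_diseased = [4.1, 5.0, 5.5, 6.2, 7.0, 8.3]
marker_nondiseased = [2.0, 3.1, 3.9, 4.4, 5.2, 6.0]
auc = auc_trapezoid(roc_points(marker_diseased, marker_nondiseased))
print(f"AUC = {auc:.2f}")
```

For these invented values the AUC is 30/36 (about 0.83), between the fair and excellent benchmarks above. When there are no tied values, this trapezoidal AUC equals the probability that a randomly chosen diseased patient has a higher value than a randomly chosen nondiseased one.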

Two important caveats for evidence-based diagnosis:

  • (1) Calculation of clinical sensitivity and specificity for a diagnostic test depends upon clear definition of who does have the disease in question and who does not. If there is no gold standard diagnostic test for comparison, or if the definition of the disease evolves and becomes less clear-cut over time (e.g., early definitions of severe/anatomically proven growth hormone deficiency [GHD] versus later, less well-defined cases of GHD), the resulting estimates of sensitivity and specificity may be approximate at best.

  • (2) Estimates of sensitivity and specificity, and even more so of the proportions of true and false positive or negative results, will vary depending upon the nature of the population being studied. For example, a very good diagnostic test applied across the entire population of the United States will yield a far higher proportion of false positive results (as is seen with screening tests) than the same test applied to a carefully selected patient group that has been deemed likely to have the disease in question based on history and physical examination. This again emphasizes the need to increase the pretest probability of disease as much as possible before ordering any diagnostic testing.

Analytical validation

Clinical Scenario 1

An investigator inadvertently runs two tubes containing nothing but water in a classic peptide radioimmunoassay (RIA) and is nonplussed when the results show a significant level of the peptide in these tubes.

Clinical Scenario 2

A laboratory declines to run a tumor marker immunoassay ordered on viscous cyst fluid because of a lack of analytic validation data but relents when the physician insists that the laboratory run the sample with a disclaimer “nonvalidated sample type; interpret with caution.” Despite the disclaimer, the laboratory and the clinician are later both sued successfully for inappropriate diagnosis and unnecessary treatments based on what turns out to be a falsely positive result.

Both of these brief clinical vignettes illustrate why regulatory agencies and anyone concerned with quality laboratory testing place such emphasis on analytic method validation. A peptide RIA may give accurate results in serum but totally inaccurate results in a protein-free fluid; perhaps the tracer (see Methodology: Immunoassays) in scenario 1 stuck to the sides of the plastic tube, leading to decreased tracer binding and an apparent detectable level of peptide where none was actually present. In a competitive immunoassay, analyte in the sample displaces tracer from the antibody, so any loss of bound tracer, whatever its cause, is read as analyte. Scenario 2 illustrates misdiagnosis from an unvalidated sample type: the viscosity of the fluid might have affected the interaction of the assay components, with substantial impact on the patient and legal consequences for all involved.
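The mechanism proposed for scenario 1 can be illustrated numerically. The sketch below assumes that the competitive assay's calibration curve follows a four-parameter logistic (4PL) model of bound tracer fraction versus concentration; the 4PL form is a common curve-fitting choice, but the parameter values (a, d, c, b), units, and function names are hypothetical and not taken from any real RIA.

```python
def bound_fraction(conc, a=1.0, d=0.05, c=50.0, b=1.0):
    """Hypothetical 4PL calibration curve for a competitive assay.

    Bound tracer fraction falls from `a` (no analyte) toward `d` as analyte rises;
    `c` is the concentration giving half-maximal displacement, `b` the slope.
    """
    return d + (a - d) / (1 + (conc / c) ** b)


def apparent_concentration(bound, a=1.0, d=0.05, c=50.0, b=1.0):
    """Read a concentration off the same 4PL curve (its algebraic inverse)."""
    if bound >= a:
        return 0.0           # as much binding as the zero calibrator: nothing detected
    if bound <= d:
        return float("inf")  # below the curve's floor: above the measurable range
    return c * ((a - d) / (bound - d) - 1) ** (1 / b)


# Water should bind tracer like the zero calibrator ...
print(apparent_concentration(bound_fraction(0.0)))
# ... but if about 20% of the tracer sticks to the tube wall, the loss reads as analyte
print(f"{apparent_concentration(0.80):.1f} (hypothetical units)")
```

The assay cannot tell tracer displaced by peptide from tracer lost for any other reason, which is exactly why a new sample type must be validated before results are reported.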

Analytical validation is meant to ensure that an assay method is accurate for its intended use. Components of an analytical validation include the following: (1) linearity/reportable range; (2) precision; (3) analytical sensitivity; (4) analytic specificity, interferences, and recovery; (5) accuracy/method comparison; (6) sample types and matrix effects; (7) stability; and (8) carryover. Determining reference intervals is an important part of many analytic validations and crosses over to clinical validation. Note that even though not all of these components are always required from a regulatory point of view, all represent good-quality laboratory practice.

Linearity/Reportable Range

Also referred to as the analytic measurement range (AMR), this is the range of concentrations over which the assay is known to be reliable. Standards of known concentration (calibrators) are assayed and plotted against the signal generated in the assay. For the hypothetical study shown in Fig. 4.4, the upper limit of the AMR would likely be at the concentration represented by calibrator D, because the higher concentration represented by calibrator E does not produce a proportionate increase in signal. However, a sample above the upper limit can sometimes be diluted so that the measurement falls within the AMR, allowing concentrations above the upper limit of the AMR to be determined. The calibrator choice may alter the absolute value reported, particularly with peptides and proteins, where the standard may represent only one of a mixture of differentially modified (e.g., glycosylated or cleaved) forms present in the circulation.

Fig. 4.4, Linearity study used to determine the analytic measurement range (AMR). The concentrations represented by points A and D represent the likely minimum and maximum concentrations of the AMR. Higher concentrations, such as that represented by point E, may still be measured if the sample can be diluted to bring the concentration to within the AMR, provided that previous studies have proven that the response remains linear when a dilution is performed.
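The reporting logic implied by the AMR and by dilution can be sketched as follows. This is a minimal Python illustration, not a laboratory information system rule set: the function name is hypothetical, `amr_low` and `amr_high` stand in for the concentrations of calibrators A and D with invented values of 1 and 500 (arbitrary units), and `dilution_validated` represents the prior linearity-on-dilution studies mentioned in the caption.

```python
def report_result(measured, dilution_factor=1.0, amr_low=1.0, amr_high=500.0,
                  dilution_validated=True):
    """Turn a value read from the calibration curve into a reportable result.

    `measured` is the concentration in the (possibly diluted) tube; AMR limits
    and all values here are hypothetical.
    """
    if dilution_factor != 1.0 and not dilution_validated:
        return "not reportable: dilution linearity not validated"
    if measured < amr_low:
        return f"< {amr_low * dilution_factor:g}"
    if measured > amr_high:
        return f"> {amr_high * dilution_factor:g}; dilute and repeat"
    # Within the AMR: correct for any dilution before reporting
    return f"{measured * dilution_factor:g}"


print(report_result(820))                                  # undiluted, above the AMR
print(report_result(164, dilution_factor=5))               # 1:5 dilution brings it into range
print(report_result(164, dilution_factor=5, dilution_validated=False))
```

The ordering of the checks reflects the caption's proviso: a diluted result is only meaningful if dilution has already been shown to preserve linearity, regardless of where the diluted reading falls.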
