Establishment and use of reference intervals


Abstract

Background

One of the most important elements of a laboratory test result is the reference interval, a set of values against which physicians compare their patients’ test results, facilitating interpretation. It is extremely important, therefore, that the laboratory community devotes sufficient resources to ensure the reference limits they provide are well-founded. Most frequently, these reference limits represent values for healthy, adult patients, but other sets of values can be provided (such as values for pregnancy or for children). Sometimes, clinical decision limits are provided in place of conventional reference limits (such as for treating patients with diabetes or for diagnosing acute coronary syndromes).

Content

In this chapter, we describe the techniques for properly establishing reference intervals, including selection of appropriate reference individuals, implementation of preanalytical standardization, considerations for eliminating outliers, partitioning the reference group, and performance of statistical methods to calculate reference limits and their confidence intervals. In addition, since formal establishment of reference intervals may be beyond the capacity of many laboratories, we discuss alternative sources for reference limits (including manufacturers’ package inserts, peer-reviewed literature, multicenter trials, historical laboratory data), along with techniques to verify the transferability of these data and common reference intervals. Consideration will be given to issues related to enhancing the display of patient test results with the appropriate reference limits. Even though most of the chapter is devoted to single tests (univariate) and population-based reference limits, we will also briefly describe the concept of subject-based and multivariate reference intervals. Lastly, we discuss techniques for ongoing verification of reference limits.

Concept of reference limits

Interpretation by comparison

Laboratory test results play a vital role in clinical medicine. Physicians use these results when screening for diseases in apparently healthy people, for confirming, excluding, or changing the probability of the diagnosis of specific diseases in patients with certain symptoms and signs, and for monitoring changes in a patient over time. To achieve these goals, interpretation is made by comparison with population reference limits, clinical decision limits, or previous results from the same patient. Population reference limits are generally derived from subjects without diseases, whereas clinical decision limits are generally based on clinical categories or outcomes of patients who can be separated on the basis of laboratory results. To facilitate these comparisons, it is critical that laboratories provide not only the patient’s test result but also appropriate reference limits with which the patient’s results can be compared. Ideally, for comparison with population limits, such reference limits should be available not only from healthy individuals but also from patients with relevant diseases, to assess whether a result is within the expected range for a clinical condition under consideration. Usually only health-associated reference limits are available in pathology reports, and expected values in disease are often estimated by doctors on the basis of training and experience. Reference limits have been described as the most common decision support tool in laboratory medicine, and their inclusion on a pathology report is endorsed by the international clinical laboratory standard ISO 15189 and required by the College of American Pathologists (CAP). A detailed history and commentary on the development of reference intervals is available.

Normal values/normal ranges: Obsolete terms

Historically, the term normal values was used to describe the laboratory data provided for purposes of comparison, and normal ranges as the expression of these on pathology reports. However, use of these terms often leads to confusion because the word “normal” has several different connotations. For example, three medically important but very different meanings of “normal” are:

  • 1.

    Statistical sense: Values can be described as “normal” if their observed distribution seems to follow closely the theoretical normal distribution of statistics, the Gaussian probability distribution. This use of “normal” has sometimes misled people to believe that the distribution of biological data is always symmetric and bell shaped, like the Gaussian distribution. However, on closer examination, this usually is not correct. To exorcize the “ghost of Gauss,” Elveback and colleagues recommend not using the term normal limits. For a similar reason, the term normal distribution should be avoided and replaced by the term Gaussian distribution.

  • 2.

    Epidemiologic sense: Another meaning of “normal” is illustrated by the following statement: It is “normal” to find that the activity of gamma-glutamyltransferase (GGT) in serum is between 7 and 47 IU/L, whereas it is considered “abnormal” to have a serum GGT value outside these limits. Here a more exact statement would read as follows: Approximately 95% of the values obtained, when the activity of GGT in sera collected from individuals considered to be healthy is measured, are included in the interval 7 to 47 IU/L. The obsolete concept of normal values in part carried this meaning. Alternative terms for “normal” in this sense are common, frequent, habitual, usual, and typical.

  • 3.

    Clinical sense: The term “normal” also is often used to indicate that values show the absence of certain diseases or the absence of risks for the development of diseases. In this sense, a normal value is considered a sign of health. Better descriptive terms for such values are healthy, nonpathologic, and harmless. As a corollary, when results are discussed with patients, it may be unhelpful to describe results outside reference limits as “abnormal” because this may be taken to indicate the presence of disease or ill health and therefore create unnecessary anxiety or concern.

Because of confusion resulting from the different meanings of normal, the terms normal values and normal ranges are obsolete and should not be used.

To prevent the ambiguities inherent in the term normal values, the concept of reference values, from which the terms reference intervals and reference limits are derived, was introduced and implemented in the 1980s. The term reference is appropriate because these values provide something to refer to when interpreting a result. This was an important event in establishing a scientific basis for clinical interpretation of laboratory data. The term reference range is sometimes used in place of the term reference interval recommended by the International Federation of Clinical Chemistry and Laboratory Medicine (IFCC). This use is incorrect because the statistical term range denotes the difference (a single value!) between maximum and minimum values in a distribution.

Terminology

The IFCC recommends use of the term reference individuals and related terms such as reference value, reference limit, reference interval, and observed values. The definitions and the presentation in the following sections of this chapter are in accordance with IFCC recommendations, which have been adopted by the Clinical and Laboratory Standards Institute (CLSI).

Reference individual: An individual selected for comparison using defined criteria.

As mentioned previously, for the interpretation of values obtained from an individual under clinical investigation, appropriate comparison values are needed. To provide such values, suitable individuals must be selected. The characteristics of the individuals in each group chosen for comparison should be clearly defined. Their age and sex must be specified, as must whether they should be healthy or have a certain disease. The definition of a reference individual also covers cases in which the individual under clinical investigation is his or her own reference, as discussed in a later section on subject-based reference values.

Reference population: The entire set of reference individuals.

Reference value: A value obtained by observation or measurement of a particular type of quantity on a reference individual.

If, for example, the activity of GGT is measured in sera collected from a group of reference individuals selected for comparison according to a sufficiently exact set of criteria, the GGT results are considered reference values.

Reference distribution: The distribution of the reference values.

Reference limits: The upper and lower bounds of the specified fraction of the reference distribution, typically the central 95% of the distribution.

Reference interval: The spread of values defined by the upper and lower reference limits.

Observed value: A value of a particular type of quantity obtained by observation or measurement and produced to make a medical decision. Observed values can be compared with reference values, reference distributions, reference limits, or reference intervals.

Or, rephrased: An observed value is the result obtained by analysis of a specimen collected from an individual under clinical investigation. The equivalent term used in the International Vocabulary of Metrology (VIM) is measurement result.

The IFCC also defines other terms related to the concept of reference values: reference sample group, reference distribution, reference limit, and reference interval. Some of these terms are introduced in later sections of this chapter.
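The definitions of reference limits, reference interval, and observed value given above can be illustrated with a minimal sketch. It assumes a simple rank-based percentile estimate for the central 95% of the reference distribution; the function names and the synthetic reference values are hypothetical and serve only to show the relationship between the terms, not to prescribe a statistical procedure.

```python
import statistics


def central_95_reference_limits(reference_values):
    """Return (lower, upper) limits bounding the central 95% of the values.

    statistics.quantiles with n=40 yields cut points at 2.5%, 5%, ..., 97.5%;
    the first and last cut points are the 2.5th and 97.5th percentiles.
    """
    cut_points = statistics.quantiles(reference_values, n=40, method="exclusive")
    return cut_points[0], cut_points[-1]


def compare_with_reference_interval(observed_value, lower, upper):
    """Describe where an observed value lies relative to the reference interval."""
    if observed_value < lower:
        return "below lower reference limit"
    if observed_value > upper:
        return "above upper reference limit"
    return "within reference interval"


# Hypothetical reference values: 119 synthetic results (1, 2, ..., 119).
reference_values = list(range(1, 120))
lower_limit, upper_limit = central_95_reference_limits(reference_values)
print("Reference interval:", lower_limit, "to", upper_limit)
print(compare_with_reference_interval(50, lower_limit, upper_limit))
```

Note that the reference interval is the pair of limits, not their difference, which is why the statistical term range is not an appropriate synonym.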

Clinical decision limits

The terms reference limits and clinical decision limits should not be confused. Reference limits are descriptive of the distribution of results in the selected subset of reference individuals; they tell us something about the expected variation of values in the reference population. Comparison of new values with these limits conveys information about similarity to the given reference values. In contrast, clinical decision limits provide separation based on clinical categories or outcomes. The latter limits may be based on analysis of reference values from several groups of individuals (healthy persons and patients with relevant diseases) and are used for the purpose of differential diagnosis. Alternatively, such values are established on the basis of outcome studies and are used as clinical guidelines for treatment. Examples of current decision limits include recommended therapeutic drug concentrations (see Chapter 42), the National Cholesterol Education Program guidelines related to cholesterol, the American Diabetes Association recommendations for diagnosis of diabetes with HbA1c or plasma glucose, and the American Academy of Pediatrics guidelines on neonatal bilirubin. A key factor with clinical decision limits is that each assumes that measurements of the involved analytes are accurate, with metrological traceability similar to that of the method used in the clinical studies on which the clinical decision points were established (see Chapter 7).

In this context, it is critical to point out another difference between reference limits and clinical decision limits. For most analytes, a laboratory should establish (or verify) its own reference limits. The processes to do this are described later in this chapter. But for analytes interpreted using clinical decision limits such as national or international laboratory guidelines, efforts that once would have been dedicated to establishing or verifying reference intervals should be redirected toward establishing accuracy (trueness). In the 2010 CLSI guidelines, this point is given much-deserved emphasis. It does little good to establish one’s own reference limits if physicians will (and should) use national guidelines or if the laboratory gives results that are biased compared with the results used to determine the clinical decision points. Methods to establish the accuracy of one’s method are discussed in Chapter 7. It is also important for laboratories to communicate to clinicians the nature of reference limits provided with results, specifying whether these are population reference intervals or clinical decision limits, as well as any additional information required for appropriate use. In particular, information on populations with and without specified diseases allows for determination of important characteristics of diagnostic tests, including their sensitivity, specificity, predictive values, and likelihood ratios, all of which are discussed in detail in Chapter 2.

Types of reference limits

In practice it is often necessary or convenient to give a short description associated with the term reference limits, such as health-associated reference limits (close to what was understood by the obsolete term normal values). With conditions such as obesity, which are prevalent in many populations and associated with poorer health outcomes, health-associated reference limits become more difficult both to define (this is discussed in subsequent text with exclusions from the reference population) and to communicate to the end user. Other examples of such qualifying words could be hospital inpatient, pregnancy, and patients with well-controlled diabetes. These short descriptions prevent the common misunderstanding that reference values are associated only with health.

Subject-based and population-based reference values

Subject-based reference values are previous values from the same individual, obtained when he or she was in a known state of health. Population-based reference limits are those obtained from a group of well-defined reference individuals and are usually the types of values referred to when the term reference limits is used with no qualifying words. This chapter deals primarily with population-based values. It should be noted, however, that for some tests, intraindividual variation may be small relative to interindividual differences. The relationship of within- to between-individual variation is known as the index of individuality (see Chapter 8), and in cases in which this is low (e.g., creatinine, immunoglobulins), the use of population-based reference intervals may distract from clinically significant intraindividual changes, as noted later in this chapter. In this setting the concept of “reference change value” (RCV) can be seen as analogous to a population reference limit, as the RCV is defined using data from reference subjects from a reference population, and a statistical analysis is used to determine significant changes.
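As a hedged illustration of the two quantities just mentioned, the sketch below computes an index of individuality as the ratio of within- to between-individual variation, and an RCV using the commonly cited form RCV = √2 × z × √(CVa² + CVi²). The coefficients of variation are hypothetical numbers chosen for the example, the analytical component is ignored in the index for simplicity, and the function names are assumptions introduced here.

```python
import math


def index_of_individuality(cv_within, cv_between):
    """Ratio of within-individual to between-individual variation (both in %)."""
    return cv_within / cv_between


def reference_change_value(cv_analytical, cv_within, z=1.96):
    """Percentage change needed for significance at the chosen z (two-sided 95%)."""
    return math.sqrt(2) * z * math.sqrt(cv_analytical ** 2 + cv_within ** 2)


def is_significant_change(previous, current, rcv_percent):
    """True if the relative change between two results exceeds the RCV."""
    percent_change = abs(current - previous) / previous * 100
    return percent_change > rcv_percent


# Hypothetical values for illustration only (not taken from the chapter).
cv_a, cv_i, cv_g = 3.0, 5.0, 15.0

print("Index of individuality:", round(index_of_individuality(cv_i, cv_g), 2))
rcv = reference_change_value(cv_a, cv_i)
print("RCV (%):", round(rcv, 1))
print("80 -> 95 significant?", is_significant_change(80, 95, rcv))
print("80 -> 90 significant?", is_significant_change(80, 90, rcv))
```

With a low index of individuality, as in this example, two results can both lie within a population-based reference interval and still differ by more than the RCV.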

It is also important to note that this chapter focuses on population-based univariate reference limits and quantities derived from them. For example, if separate reference limits for calcium and parathyroid hormone (PTH) in plasma are used, two sets of univariate reference limits are produced. The term multivariate reference limits denotes that results of two or more analytes obtained from the same set of reference individuals are treated in combination. Plasma calcium and PTH values may be used, for example, to define a bivariate reference region, which would reflect the fact that, as calcium concentrations decrease, even within healthy reference limits, PTH levels rise. Thus a PTH level that is within health-associated univariate reference limits might not be within the health-associated bivariate reference limits. This subject is addressed briefly in a later section.
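The calcium and PTH example can be sketched numerically. The code below is only an illustration under strong assumptions: both analytes are treated as jointly Gaussian, the means, standard deviations, and correlation are hypothetical, and the 95% bivariate region is taken as the ellipse where the squared Mahalanobis distance does not exceed the 95th percentile of a chi-square distribution with 2 degrees of freedom (which equals -2 ln 0.05).

```python
import math

# Hypothetical reference distribution parameters (illustration only).
CALCIUM_MEAN, CALCIUM_SD = 2.35, 0.075   # mmol/L
PTH_MEAN, PTH_SD = 4.0, 1.25             # pmol/L
CORRELATION = -0.6                       # PTH rises as calcium falls

# 95th percentile of chi-square with 2 degrees of freedom (closed form).
CHI2_2DF_95 = -2 * math.log(0.05)


def within_univariate_limits(value, mean, sd, z=1.96):
    """True if the value lies within mean +/- z*sd."""
    return abs(value - mean) <= z * sd


def squared_mahalanobis_distance(calcium, pth):
    """Squared distance from the bivariate mean, allowing for correlation."""
    zc = (calcium - CALCIUM_MEAN) / CALCIUM_SD
    zp = (pth - PTH_MEAN) / PTH_SD
    r = CORRELATION
    return (zc ** 2 - 2 * r * zc * zp + zp ** 2) / (1 - r ** 2)


def within_bivariate_region(calcium, pth):
    """True if the pair lies inside the 95% bivariate reference ellipse."""
    return squared_mahalanobis_distance(calcium, pth) <= CHI2_2DF_95


# Low-normal calcium paired with a low-normal PTH versus a high-normal PTH.
for calcium, pth in [(2.22, 2.0), (2.22, 6.0)]:
    print(
        calcium, pth,
        "univariate calcium:", within_univariate_limits(calcium, CALCIUM_MEAN, CALCIUM_SD),
        "univariate PTH:", within_univariate_limits(pth, PTH_MEAN, PTH_SD),
        "bivariate:", within_bivariate_region(calcium, pth),
    )
```

In this sketch, a low-normal PTH paired with a low-normal calcium passes both univariate checks but falls outside the bivariate region, which is the situation described in the text.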

Requirements for valid use of a reference interval

Certain conditions apply for a valid comparison between a patient’s laboratory results and reference values:

  • 1.

    The reference individuals for each test should be clearly defined.

  • 2.

    The patient examined should sufficiently resemble the reference individuals in all respects other than those under investigation.

  • 3.

    The conditions under which the reference specimens were obtained and processed for analysis should be known and these conditions should be the same as for the patient specimen.

  • 4.

    The measurand under examination in the patient and the reference individuals should be the same.

  • 5.

    All laboratory results should be produced using adequately standardized methods under sufficient analytical quality control (see Chapters 6 and 7). The standardization should be sufficient that any bias or difference in precision or analytical specificity between the analytical system used for the patient sample and that used for the reference samples does not affect the interpretation.

    To these general requirements one may add others that become necessary when more detailed and sophisticated approaches to decision making are applied.

  • 6.

    Stages in the pathogenesis of diseases that are the objectives for diagnosis should be demarcated beyond the separation between presence and absence of the disease. For example, although some overlap occurs, the clinical grades of congestive heart failure (CHF) are distinguished by progressive increases in levels of N-terminal pro-brain natriuretic peptide (NTproBNP).

  • 7.

    Clinical diagnostic sensitivity and specificity, prevalence, and clinical costs of misclassification should be known for all laboratory tests used. For example, in some instances, one might want to know whether a given NTproBNP value is “healthy,” in which case one would want to use reference limits for age- and sex-matched individuals with no evidence of CHF. In contrast, when faced with a patient complaining of shortness of breath in the emergency room, one might want instead to know, not so much whether any degree of CHF is present, but whether the patient’s CHF is sufficiently advanced to be the cause of the shortness of breath.

Selection of reference individuals

A set of selection criteria determines which individuals should be included in the group of reference individuals. Such selection criteria include statements describing the source population and specifications of criteria for health or for the disease of interest.

Often, separate reference values for each sex and for different age groups, as well as other criteria, are necessary. The overall group of reference individuals therefore may have to be divided into more homogeneous subgroups. For this purpose, specific rules for the division, called stratification or partitioning criteria, are needed.

It is important to distinguish between selection and partitioning criteria. First, selection criteria are applied to obtain a group of reference individuals. Thereafter, this group is divided into subgroups using partitioning criteria. Whether a specific criterion (e.g., sex) is a selection or a partitioning criterion depends on the purpose of the actual project. For example, sex is a selection criterion if reference values only from female subjects are necessary. Sex can also serve as a selection criterion when the data will later be partitioned by sex, to ensure that sufficient numbers of each sex are collected.
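The two-step order described above (selection first, partitioning second) can be sketched as follows. The candidate records, the particular selection rule (apparently healthy adults aged 18 to 65), and the choice of sex as the partitioning criterion are all hypothetical and chosen only to make the distinction concrete.

```python
from collections import defaultdict

# Hypothetical candidate individuals from a parent population.
candidates = [
    {"id": 1, "sex": "F", "age": 34, "healthy": True},
    {"id": 2, "sex": "M", "age": 45, "healthy": True},
    {"id": 3, "sex": "F", "age": 29, "healthy": False},
    {"id": 4, "sex": "F", "age": 61, "healthy": True},
    {"id": 5, "sex": "M", "age": 72, "healthy": True},
]


def meets_selection_criteria(person):
    """Selection criterion (assumed): apparently healthy adults aged 18 to 65."""
    return person["healthy"] and 18 <= person["age"] <= 65


def partition(reference_individuals, key="sex"):
    """Divide already-selected reference individuals into subgroups."""
    subgroups = defaultdict(list)
    for person in reference_individuals:
        subgroups[person[key]].append(person["id"])
    return dict(subgroups)


# Step 1: apply selection criteria to obtain the reference individuals.
reference_individuals = [p for p in candidates if meets_selection_criteria(p)]
# Step 2: apply the partitioning criterion to the selected group.
subgroups = partition(reference_individuals)
print(subgroups)
```

If only female reference values were needed, sex would instead move into `meets_selection_criteria`, which is the sense in which the same criterion can play either role.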

Concept of health in relation to reference values

There is an obvious requirement for health-associated reference values for quantities measured in the clinical laboratory. But the concept of health is problematic; as Grasbeck stated, “Health is characterized by a minimum of subjective feelings and objective signs of disease, assessed in relation to the social situation of the subject and the purpose of the medical activity, and it is in the absolute sense an unattainable ideal state.” Much confusion may arise if the selection criteria for health are not clearly stated for a specific project.

When reference values are produced, the following questions are asked: (1) Why are these values needed? (2) How are they going to be used? (3) To what extent does the intended purpose of the project determine how health is identified? For example, when setting reference limits for cardiac-specific troponins, a “cardio-healthy” population is required that is in other ways similar to the patients who are likely to present with possible acute coronary syndrome (i.e., they should be of similar age and gender, and they may have hypertension or hyperlipidemia).

Strategies for selection of reference individuals

Several methods have been suggested for the selection of reference individuals. Table 9.1 shows a variety of concepts that may be used to describe a sampling scheme. The concepts can be considered as pairs, each of which is mutually exclusive. For example, the sampling may be direct or indirect, and direct sampling may be a priori or a posteriori.

TABLE 9.1
Strategies for Selection of Reference Individuals

Direct Versus Indirect
    Direct: Individuals are selected from a parent population using defined criteria
    Indirect: Individuals are not considered, but certain statistical methods are applied to analytical values in a laboratory database

A Priori Versus A Posteriori
    A Priori: Individuals are selected for specimen collection and analysis if they fulfill defined inclusion criteria
    A Posteriori: Use of an already existing database containing both relevant clinical information and analytical results. Values of individuals meeting defined inclusion criteria are selected.

Random Versus Nonrandom
    Random: Process of selection giving each item (individual or test result) in the parent population an equal chance of being chosen
    Nonrandom: Process of selection that does not ensure that each item in the parent population has an equal chance of being chosen

The merits and disadvantages of these strategies are described in the following sections. It is not possible to recommend one sampling scheme that is superior in all respects and applicable to all situations. One must choose the optimal approach for a given project and state clearly what has been done.

Direct or indirect sampling?

Selection of reference individuals by direct sampling involves collection of specimens from selected members of the reference population for the purpose of establishing reference limits. Indirect sampling involves deriving reference limits from the results of samples collected for other purposes. Direct selection of reference individuals (see Table 9.1) concurs with the concept of reference values as recommended by the IFCC, and it is the basis for the presentation in this chapter. Its major disadvantages are the problems and costs of obtaining a representative group of reference individuals.

These practical problems have led to the search for simpler and less expensive approaches such as indirect methods. Historically the indirect approach has been taken using results in a routine pathology database, often from laboratories serving a largely inpatient population. While these may be the only data available for some laboratories, the indirect approach may be applied to other data sources, such as samples collected for research, epidemiology, or “wellness testing,” where the expected prevalence of disease may be low. A key starting point with any indirect method is an understanding of the population from which the samples have been drawn, even if specific criteria have not been applied at the time of collection.

The indirect approach is based on the observation that many analysis results produced in the clinical laboratory seem to be “normal,” or at least unaffected by the reason for the sample collection. Two main concepts have been used to extract information about reference distributions from this type of data. The first is the use of statistical methods that allow identification of a distribution within the database, which is then taken to represent the reference population. Note that for this approach no attempt is made to classify individual results as representing the reference population. The alternative method is to use additional clinical information to classify individual results and exclude those that are more likely to be from individuals with relevant disease or other factors that may affect the results. Typically both methods are applied in the development of reference intervals using the indirect approach.

An example of the results from a pathology database is shown in Fig. 9.1. As seen, the values of serum sodium concentrations from outpatients have a distribution with a preponderant central peak and a shape similar to a Gaussian distribution. The underlying assumption of the indirect method is that this peak is composed mainly of normal values or, more precisely, is derived from patients without the condition of interest or diseases that may affect the analyte under consideration. Advocates of the method therefore claim that it is possible to estimate a reference interval if the distribution of unaffected values is extracted from this distribution. Fig. 9.1 also shows serum sodium results from hospital inpatients, which are lower on average and include an increased proportion of low results (the data set is left skewed). This may be because a significant proportion of the samples are derived from patients with a condition affecting the results; in the case of serum sodium, examples include diuretic use, dehydration, and other fluid imbalances. It may also be due to systematic preanalytical differences, such as recumbency in inpatients compared with ambulatory outpatients. This shows the importance of selecting an appropriate data set for indirect analysis. Several mathematical methods have been used to extract a distribution for the derivation of reference limits from routine laboratory data.

FIGURE 9.1 Distribution of serum sodium concentrations obtained in a routine laboratory over 1 year. The top histogram (A) shows 31,183 sequential results from general practice sites and the lower histogram (B) shows 38,751 from hospital wards. The dark shaded areas and attached percentages show the fractions of the two populations outside the reference interval derived by direct methods in the same population (represented by the dashed vertical lines, 136 to 145 mmol/L).
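To make the idea of extracting a central distribution from mixed data concrete, the sketch below compares plain percentiles of a whole database with limits estimated from the median and the median absolute deviation (MAD). This is not one of the published indirect methods and is offered only as a simplified stand-in: it assumes the central peak is Gaussian, and the simulated database (90% unaffected results, 10% lower results from affected patients) is entirely hypothetical.

```python
import random
import statistics


def naive_percentile_limits(values):
    """2.5th and 97.5th percentiles of all results, affected or not."""
    cut_points = statistics.quantiles(values, n=40, method="exclusive")
    return cut_points[0], cut_points[-1]


def robust_central_limits(values, z=1.96):
    """Limits from the median and scaled MAD, assuming a Gaussian central peak."""
    centre = statistics.median(values)
    mad = statistics.median([abs(v - centre) for v in values])
    sd_estimate = 1.4826 * mad  # scales the MAD to a Gaussian standard deviation
    return centre - z * sd_estimate, centre + z * sd_estimate


# Simulated serum sodium database (mmol/L); all parameters are hypothetical.
rng = random.Random(0)
unaffected = [rng.gauss(140, 2.3) for _ in range(9000)]
affected = [rng.gauss(130, 4.0) for _ in range(1000)]
database = unaffected + affected

naive_lower, naive_upper = naive_percentile_limits(database)
robust_lower, robust_upper = robust_central_limits(database)
print("Naive limits:  %.1f to %.1f" % (naive_lower, naive_upper))
print("Robust limits: %.1f to %.1f" % (robust_lower, robust_upper))
```

Even in this simple simulation, the two approaches disagree on the lower limit, and neither exactly recovers the limits of the unaffected component, which illustrates how strongly an indirect estimate can depend on the method chosen and on the proportion of affected results in the database.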

In short, the indirect method has at least two potential major deficiencies:

  • 1.

    Estimates of the reference limits can depend heavily on the particular mathematical method used and on its underlying assumptions.

  • 2.

    Estimates of the reference limits can be affected by the prevalence, nature, and severity of disease included in the laboratory database. This may be a particular problem with databases containing only hospital inpatients. The use of ambulatory outpatients and general practice patients can reduce this variability considerably.

However, if appropriate exclusion criteria are applied, data derived by indirect sampling from pathology databases may be used for the establishment of reference values in a way that is fully concordant with IFCC recommendations. The requirement for this approach is that laboratory results should be combined with other information (i.e., to combine an a posteriori strategy with the indirect method). Laboratory results are to be used as reference values only if stated clinical criteria are fulfilled. The types of data that can be used include demographic information such as age, sex, and source (e.g., inpatient, outpatient, specific clinics); patient sampling-related information (e.g., by excluding multiple results from the same patient or limiting samples to those where only a single request for that test has been made in a specified time); information from other pathology results (e.g., using HbA1c or fasting glucose results to reduce the likelihood of overweight- or obesity-related effects); or other clinical information available by linking with clinical databases. In practice the factors applied are analyte-specific and depend on what is available, and detailed understanding of the pathophysiology of the analyte being examined is required. For example, results from inpatients should be excluded for analytes affected by recumbency (e.g., serum albumin or sodium) or intercurrent disease (e.g., C-reactive protein [CRP], other acute phase reactants). For tests used for both diagnosis and monitoring (e.g., serum creatinine, tumor markers), restricting analysis to patients with a single result may be preferred. A single result can also be seen as an acceptance by the treating doctor that further investigation was not warranted based on the result.
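A minimal sketch of combining an a posteriori strategy with the indirect method is shown below. It applies two of the exclusion criteria mentioned above (restricting by source and keeping only patients with a single request for the test) to a small, hypothetical extract of a pathology database; the field names and records are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical database extract: one record per serum sodium result (mmol/L).
results = [
    {"patient_id": "A", "source": "general practice", "sodium": 140},
    {"patient_id": "B", "source": "hospital ward", "sodium": 131},
    {"patient_id": "C", "source": "general practice", "sodium": 138},
    {"patient_id": "C", "source": "general practice", "sodium": 134},
    {"patient_id": "D", "source": "outpatient clinic", "sodium": 142},
]


def select_candidate_reference_results(records, allowed_sources):
    """Keep results from allowed sources for patients tested only once."""
    requests_per_patient = Counter(r["patient_id"] for r in records)
    return [
        r
        for r in records
        if r["source"] in allowed_sources
        and requests_per_patient[r["patient_id"]] == 1
    ]


selected = select_candidate_reference_results(
    results, allowed_sources={"general practice", "outpatient clinic"}
)
print([r["sodium"] for r in selected])
```

In a real study the criteria would be analyte-specific, and further filters (age, sex, related test results) would be applied in the same way before any statistical analysis.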

Reference values produced by indirect sampling techniques have a number of significant potential advantages over those based on direct sampling. With any indirect method, the preanalytical and analytical factors are exactly the same for the patient sample and the reference setting process, and also the reference population matches that of the patient. This can provide a more appropriate comparison group, as the role of clinical decision-making is to separate patients with the same clinical presentation on the basis of disease, rather than separating sick from healthy. For example, the need, in patients with chest pain, is to distinguish those having a myocardial infarction from those who are not.

The indirect approach can also be used in settings where collection of samples for reference interval studies may be particularly problematic, such as extremes of age or during pregnancy. Additionally, the number of samples available for indirect techniques can be vastly greater than for direct techniques, reaching many tens or even hundreds of thousands, and the costs are a fraction of those of direct studies. If direct studies are available for comparison, indirect studies will enable an assessment of whether there are differences in the local population, specimen collection techniques, or analytical methods. It is, however, important to note that the indirect approach is continuing to evolve and that, if poorly performed, an indirect study can give misleading results.

A priori or a posteriori sampling?

When carefully performed, both a priori (before) and a posteriori (after) sampling (see Table 9.1) may result in reliable reference values. The use of the a priori approach is limited to direct reference interval studies, but as discussed above, the a posteriori approach can be applied to direct and indirect studies. The choice is often a question of practicality. Both require the same set of successive steps, but the order of some of these operations differs depending on the mode of selection: a priori or a posteriori.

The first step in the process of producing reference values for a laboratory test should always be the collection of quantitative information about sources of biological, preanalytical, and analytical variation for the analyte studied. In this setting, biological variation includes expected variation with time of day, with meals, with seasons, and with life stages. A search through relevant literature may yield the required information. If relevant information cannot be found in the literature, pilot studies may be necessary before the selection of reference individuals is planned in detail. Serum sodium is an example of a biological analyte that is affected by only a few sources of biological variation. However, the list of factors may be rather long for other analytes, such as serum enzymes, proteins, and hormones.

It is important to distinguish between controllable and noncontrollable sources of biological variation. Some factors, such as fasting status and time of day, may be controlled by standardization of the procedure for preparation of reference individuals and specimen collection (see a later section of this chapter). Other factors, such as age and gender, may be relevant partitioning criteria. Remaining sources of variation should be considered when criteria for the selection of reference individuals are defined.

The a priori strategy is best suited for smaller studies and for analytes for which there are very specific confounding factors or for which the analytical process is very difficult or expensive. One such example is male sex hormone–related reference intervals. Potential reference individuals from the parent population should be interviewed and examined clinically and by selected laboratory methods to decide whether they fulfill the defined inclusion criteria. If they do, specimens for analysis are collected by a standardized procedure (including the necessary preparation of individuals before the collection).

The a posteriori method is based on the availability of a large collection of data on medically examined individuals and measured quantities. Studies thoroughly planned by centers for health screening or preventive medicine may provide such data. It is important that data be collected by a strictly standardized and comprehensive protocol concerning (1) sampling from the parent population, (2) registration of demographic and clinical data on participating individuals, (3) preparation for and execution of specimen collection, and (4) handling and analysis of the specimens. If these requirements are met, values may be selected after application of the defined inclusion criteria to individuals found in the database. The selection of individuals from large pathology databases (see earlier discussion) is another example of the application of an a posteriori method. In this case, however, the quality of data may be lower than that in well-planned population studies.
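
The a posteriori selection step described above can be illustrated with a minimal Python sketch. The record fields (`age`, `recent_illness`, `on_prescription_drugs`, `value`) and the inclusion criteria are hypothetical and chosen only for illustration; a real study would apply the criteria defined for the analyte in question.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One medically examined individual in a hypothetical database."""
    subject_id: str
    age: int
    recent_illness: bool
    on_prescription_drugs: bool
    value: float  # measured quantity for the analyte under study


def meets_inclusion_criteria(r: Record) -> bool:
    # Hypothetical criteria: adults with no recent illness and no prescription drugs.
    return r.age >= 18 and not r.recent_illness and not r.on_prescription_drugs


def select_a_posteriori(records):
    """Apply the inclusion criteria after the data have already been collected."""
    return [r for r in records if meets_inclusion_criteria(r)]


# Illustrative records only; not real study data.
database = [
    Record("A", 34, False, False, 140.0),
    Record("B", 52, True, False, 138.0),
    Record("C", 16, False, False, 141.0),
    Record("D", 47, False, True, 139.0),
    Record("E", 29, False, False, 142.0),
]
reference_individuals = select_a_posteriori(database)
print([r.subject_id for r in reference_individuals])
```

Because the criteria are applied to stored records, the same database can be filtered again with a different (for example, relaxed) set of criteria for another analyte.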

A study performed in Kristianstad, Sweden, highlights a practical problem often met when reference individuals are selected: the number of subjects fulfilling the inclusion criteria may be too small. In this study, only 17% of participants were accepted into the study, according to the criteria used, leaving an insufficiently sized reference sample group and a risk of selection bias. The frequency of exclusion was higher among women and in older age groups, exacerbating the issues in these groups.

This problem has two possible solutions:

  • 1. The exclusion criteria may be relaxed. As already discussed, the set of relevant sources of biological variation differs among different analytes. One may define a minimum set of exclusion criteria for a given laboratory test. In the Kristianstad study, the complete group of individuals could probably be used for establishment of reference values for serum sodium, and most of the individuals would be acceptable for the determination of reference values for several other analytes.

  • 2. Another design of the sampling procedure could reduce the practical problems and costs of obtaining a sufficiently large group of reference individuals. The Kristianstad study showed that 75% of excluded subjects could have been identified using only a simple questionnaire. In the upper age group, this percentage was even higher. Therefore, preliminary screening of a large number of individuals from the parent population, using a carefully designed questionnaire (i.e., one covering the current and previous medical history of the individual), would result in a much smaller sample of individuals for examination clinically and by laboratory methods. If 3000 individuals had been prescreened in Kristianstad, and if only the individuals remaining in the reduced sample were subjected to a closer examination, a group of 240 reference individuals would have been obtained.

The two modifications of the protocol may also be combined.

Random or nonrandom sampling?

Ideally, the group of reference individuals should be a random sample of all individuals fulfilling the inclusion criteria defined in the parent population. Statistical estimation of distribution parameters (and their confidence intervals) and statistical hypothesis testing require this assumption.

For several reasons, most collections of reference values are, in fact, obtained by a nonrandom process. This means that all possible reference individuals in the entire population under study do not have an equal chance of being chosen for inclusion in the usually much smaller sample of individuals studied. A strictly random sampling scheme in most cases is impossible for practical reasons. It would imply the examination of and application of inclusion criteria to the entire population (thousands or millions of persons), and then the random selection of a subset of individuals from among those accepted. This approach has been used in selecting individuals at random to provide a cohort that is representative of the full population by several national organizations, such as the National Health and Nutrition Examination Survey (NHANES) in the United States, the Canadian Health Measures Survey in Canada, and the Australian Bureau of Statistics.
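
A strictly random scheme of the kind just described can be sketched as follows. This is a minimal illustration, not a survey design: the function name, the toy population, the eligibility rule, and the sample size of 120 are all assumptions made for the example.

```python
import random


def strictly_random_reference_sample(population, is_eligible, n, seed=None):
    """Examine everyone, keep those meeting the criteria, then draw n at random."""
    eligible = [p for p in population if is_eligible(p)]
    if n > len(eligible):
        raise ValueError("fewer eligible individuals than the requested sample size")
    rng = random.Random(seed)
    # random.Random.sample draws without replacement, so every eligible
    # individual has the same chance of being selected.
    return rng.sample(eligible, n)


# Hypothetical population of 10,000 ID numbers; "eligible" is arbitrarily
# defined as an even ID, standing in for the real inclusion criteria.
population = list(range(10_000))
sample = strictly_random_reference_sample(
    population, lambda p: p % 2 == 0, n=120, seed=1
)
print(len(sample))
```

The sketch makes the practical obstacle visible: `is_eligible` has to be evaluated for every member of the population before a single individual is drawn.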

Usually the situation is less satisfactory. The sampling process is highly affected by convenience and cost. For example, samples of reference individuals are commonly obtained by selecting (1) from blood donors, (2) from persons working in a nearby factory, (3) from hospital staff, or (4) from hospital databases, none of which represent a random sampling of possible reference individuals in the general population.

The conclusions are obvious: (1) the best reference sample obtainable should be used with a balance between practical considerations and consideration of possible biases that may be introduced by the selection process, and (2) the data should be used and interpreted with due caution, with awareness of the possible bias introduced by the nonrandomness of the sample selection process. For example, lower iron stores may be expected in a sample of regular blood donors, and higher vitamin D concentrations may be expected in a sample drawn from outdoor workers. An additional effect of nonrandomness is an increased chance that different reference studies will produce different results even when the defined reference population is intended to be the same.

Selection criteria and evaluation of subjects

The selection of reference individuals consists essentially of applying defined criteria to a group of examined candidate persons. The required characteristics of the reference values determine which criteria should be used in the selection process. Table 9.2 lists some important criteria to consider when production of health-associated reference values is the aim.

TABLE 9.2
Examples of Exclusion and Partitioning Criteria a

Exclusion                      Partitioning
Age                            Age
Alcohol intake                 Blood group
Blood donation (recent)        Circadian variation
Drug abuse                     Ethnicity
Exercise intensity (recent)    Exercise intensity (recent)
Fasting vs. nonfasting         Fasting vs. nonfasting
Sex                            Sex
Hospitalization (recent)       Menstrual cycle (by stage)
Hypertension
Illness (recent)
Lactation
Obesity                        Obesity
Occupation                     Posture (when sampled)
Oral contraceptives
Pregnancy                      Pregnancy (by stage)
Prescription drugs             Prescription drugs
Recent transfusion

a As indicated by the shaded boxes, some criteria may be considered as either exclusion criteria or partitioning criteria.

In practice, consideration of which diseases and risk factors to exclude is difficult (see the discussion on the concept of health earlier in this chapter). The answer lies in part in the intended purpose of establishing reference values; the project must be goal oriented.

Once a factor has been selected as an exclusion factor, a relevant and practical definition is required. For example, obesity is a common condition that is associated with a number of diseases; however, the definition of obesity is problematic. A definition might be based on a known or assumed contribution to the risk of developing a specified disease. However, scientific data of this type are seldom available for the studied population. Another possibility for defining obesity is to use upper limits based on weight measurements in different age, gender, and height groups of the general population (e.g., more than 20% above the national age-, sex-, and height-specific mean weight). A common approach is to use definitions based on the body mass index (BMI), although limiting subjects to the healthy range will exclude over 50% of some populations. Tables of optimum or ideal weights have been published by life insurance companies; they may be more appropriate for delineation of obesity. Similar problems relate to the definition of hypertension. And what if a potential reference individual is no longer obese as a result of bariatric surgery or is currently normotensive on drug therapy?
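
The two weight-based definitions mentioned above can be written down as a short Python sketch. The 20% margin comes from the example in the text; the BMI cut-offs of 18.5 and 25.0 kg/m² are the commonly cited adult values and are an assumption here, as are the function names and the example figures.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI = weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def exceeds_weight_limit(weight_kg, group_mean_weight_kg, margin=0.20):
    """True if weight is more than `margin` above the group-specific mean weight."""
    return weight_kg > group_mean_weight_kg * (1 + margin)


def outside_healthy_bmi(weight_kg, height_m, lower=18.5, upper=25.0):
    """True if BMI falls outside [lower, upper); cut-offs are assumed, not from the text."""
    bmi = body_mass_index(weight_kg, height_m)
    return bmi < lower or bmi >= upper


# Hypothetical candidate: 95 kg, 1.75 m, group-specific mean weight 75 kg.
print(round(body_mass_index(95, 1.75), 1))
print(exceeds_weight_limit(95, 75))
print(outside_healthy_bmi(95, 1.75))
```

Either rule yields a yes/no exclusion decision, but neither answers the questions raised above about individuals whose weight or blood pressure has been normalized by treatment.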

In addition, is it permissible to use exclusion criteria based on laboratory measurements? It has been argued that a circular process might happen when laboratory tests are used to assess the health of subjects who are subsequently used as healthy control subjects for laboratory tests. But actually there is no difference, in this context, between measuring height, weight, and blood pressure and performing selected laboratory tests, provided that these laboratory tests are neither those for which reference values are produced nor tests that are significantly correlated with them.

The removal of reference results based on other laboratory results has been used in a process termed latent abnormal values exclusion (LAVE). In a multinational study it was shown that this process, using a standard group of exclusion tests and criteria, affected some analytes but had little effect on others. As stated above, care should be taken that tests with correlated results are not used for this purpose. It is particularly difficult to define selection criteria when establishing reference values for older patients. In older age groups, it is “normal” (i.e., common) to have minor or major diseases and to take therapeutic drugs. One solution is to collect values at one time and to use the values of survivors after a defined number of years.
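
The idea of excluding reference results on the basis of other laboratory results can be sketched in Python as below. This is a simplified single pass with fixed limits, not the published LAVE procedure; the function name, the test names, the limits, and the `max_abnormal` parameter are all hypothetical.

```python
def lave_single_pass(subjects, target, exclusion_limits,
                     correlated=frozenset(), max_abnormal=0):
    """Return target-test values from subjects who pass the exclusion tests.

    subjects: list of dicts mapping test name -> result
    exclusion_limits: dict mapping test name -> (lower, upper)
    correlated: tests correlated with the target; never used for exclusion
    max_abnormal: number of abnormal exclusion results tolerated per subject
    """
    # Drop the target itself and any correlated test from the exclusion panel.
    usable = {t: lim for t, lim in exclusion_limits.items()
              if t != target and t not in correlated}
    kept = []
    for s in subjects:
        abnormal = sum(
            1 for t, (lo, hi) in usable.items()
            if t in s and not (lo <= s[t] <= hi)
        )
        if abnormal <= max_abnormal:
            kept.append(s[target])
    return kept


# Placeholder results and limits, for illustration only.
subjects = [
    {"target": 5.0, "test_x": 10, "test_y": 50},
    {"target": 9.0, "test_x": 30, "test_y": 50},   # test_x out of range
    {"target": 5.5, "test_x": 12, "test_y": 200},  # test_y out of range
]
limits = {"test_x": (5, 20), "test_y": (10, 100)}

print(lave_single_pass(subjects, "target", limits))
print(lave_single_pass(subjects, "target", limits, correlated={"test_y"}))
```

Declaring `test_y` as correlated with the target removes it from the exclusion panel, so the third subject is retained in the second call; this mirrors the caution above about not using correlated tests for exclusion.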

Usually the clinical evaluation of candidate individuals is based on (1) a detailed interview or questionnaire (i.e., the complete history recalled and recounted by a patient), (2) a physical examination, and (3) supplementary investigations. Questionnaires and examination forms tailored to the requirements of the actual project facilitate the evaluation and document the decisions made.

Partitioning of the reference group

It may also be necessary to define partitioning criteria for the subclassification of the set of selected reference individuals into more homogeneous groups (see Table 9.2). (The question of determining when stratification of the reference sample group is necessary and justified is discussed in later sections.) In practice, the number of partitioning criteria should usually be kept as small as possible to ensure sufficient sample sizes to derive valid estimates.

Age and sex are the most frequently used criteria for subgrouping, because several analytes vary notably among different age and sex groups. Age may be categorized by equal intervals (e.g., by decades) or by intervals that are narrower in the periods of life when greater variation is observed. In some cases, more appropriate intervals can be obtained from qualitative age groups, such as (1) postnatal, (2) infancy, (3) childhood, (4) prepubertal, (5) pubertal, (6) adult, (7) premenopausal, (8) menopausal, and (9) geriatric. Further subdivision may also be needed based on Tanner stage of puberty or based on phase of the menstrual cycle. Height and weight also have been used as criteria for categorizing children. The use of age and sex for partitioning has the advantage that reference limits derived from subpopulations on these criteria can be easily applied on pathology reports where these factors are usually known about the patient. In contrast, the application of limits based on other criteria requires knowledge not usually available to the laboratory.
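
Partitioning by sex and by decade of age, together with a check on subgroup size, can be sketched in Python as follows. The data, the function names, and the minimum group size passed to `undersized_groups` are illustrative assumptions; the appropriate minimum depends on the statistical method used.

```python
from collections import defaultdict


def partition_by_sex_and_decade(individuals):
    """Group values by (sex, decade); each individual is (sex, age_years, value)."""
    groups = defaultdict(list)
    for sex, age, value in individuals:
        decade = (age // 10) * 10  # e.g. ages 30-39 -> 30
        groups[(sex, decade)].append(value)
    return dict(groups)


def undersized_groups(groups, min_n):
    """List the partitions holding fewer than min_n reference values."""
    return sorted(key for key, values in groups.items() if len(values) < min_n)


# Toy data for illustration only.
individuals = [
    ("F", 23, 1.0), ("F", 27, 1.1),
    ("M", 31, 1.3), ("M", 38, 1.4),
    ("F", 35, 1.2),
]
groups = partition_by_sex_and_decade(individuals)
print(undersized_groups(groups, min_n=2))
```

Each added partitioning criterion multiplies the number of groups, which is why subgroup sizes shrink quickly and the number of criteria should be kept small.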

Specimen collection

Several preanalytical factors can influence the values of measured biological quantities, such as the concentrations of components in a blood sample or the amount excreted in feces, urine, or sweat. This topic is covered elsewhere (see Chapters 4 and 5). In this discussion, only aspects of special relevance to the generation of reliable reference values are highlighted.

Standardization of the (1) preparation of individuals before specimen collection, (2) procedure of specimen collection itself, and (3) handling of the specimen before analysis may eliminate or minimize bias or variation from these factors. This reduces the “noise” that might otherwise conceal important biological “signals” of disease, risk, or treatment effect.

Preanalytical standardization

Preanalytical procedures used before routine analysis of patient specimens and when reference values are established should be as similar as possible. In general, it is much easier to standardize routines for studies of reference values than those used in the daily clinical setting, especially when specimens are collected in emergency or other unplanned situations. Thus two general approaches have been suggested:

  • 1. Only such factors that may be relatively easily controlled in the clinical setting should be part of the standardization when reference values are produced.

  • 2. The rules for preanalytical standardization when reference values are produced should also be used for the clinical situation. Such rules include food and beverage restrictions, exercise restrictions, time sitting (or lying down) prior to phlebotomy, and tourniquet time. It has been shown that it is possible to apply these rules rather closely in the clinical setting for both hospitalized and ambulatory patients. The same philosophy forms the basis for recommendations concerning sample preparation preceding analysis.

However, either philosophy is concordant with the concept of reference values, provided that the conditions under which reference values are produced are clearly stated.

Analyte-specific considerations

The types and magnitudes of preanalytical sources of variation clearly are not equal for different analytes (see Chapter 5). In fact, some believe that only those factors that cause unwanted variation in the biological quantities for which reference values are being generated should be considered. For example, body posture during specimen collection is highly relevant for the establishment of reference values for analytes that do not diffuse across blood vessel walls, such as albumin in serum or red cell count in blood, but posture is irrelevant for establishment of serum sodium values.

Alternatively, several constituents are analyzed routinely in the same clinical specimen, and therefore it would be impractical to devise special procedures for every single type of quantity. Consequently, three standardized procedures for blood specimen collection by venipuncture have been recommended: (1) collection in the morning from hospitalized patients, (2) collection in the morning from ambulatory patients, and (3) collection in the afternoon from ambulatory patients. Such schemes have to be modified depending on local conditions and necessities and on the intended use of the reference values produced. Published checklists may be helpful in the design of a scheme.

A special problem is caused by drugs taken by individuals before specimen collection, and it may be necessary to distinguish between indispensable and dispensable medications. If possible, dispensable medication should be avoided for at least 48 hours. The use of indispensable drugs, such as contraceptive pills or essential medication, may be a criterion for exclusion or partitioning if these affect the analyte of interest.

In emergency or other unplanned clinical situations, even a partial application of the standardized procedure for collection has been shown to be of great value. When collections have been made under conditions other than those specified for a specific analyte, interpretation of results against reference limits requires awareness of the type and magnitude of variation that may be expected under those circumstances. For example, a serum cortisol collected in the evening cannot usually be compared with reference limits established for morning collections, the exception being that a high result is still of great clinical relevance because the upper limit for evening values is typically much lower than the upper limit for morning values.

Necessity for additional information

The clinical situation is often different from a controlled research situation; for example, specimens have to be taken (1) during operations, (2) in emergency situations, and (3) when patients are unwilling or unable to follow instructions. Therefore the clinician may need additional information for interpretation of a patient’s values in relation to reference values obtained under fairly standardized conditions.

An empirical approach is to produce other sets of reference values, such as postprandial values, postexercise values, or postpartum values. Such a method, however, is very expensive and does not cover all situations that could possibly arise. This approach is also limited by the variability in these events (i.e., for postprandial samples, the size of the meal, the types of food consumed, and the number of hours since the meal).

Another, more general solution to the problem is called the predictive approach. Starting from a set of ordinary reference values and using quantitative information on the effects of various factors (e.g., intake of food, alcohol, and drugs; exercise; stress; posture; or time of day), expected reference values that fit the actual clinical setting could be estimated. An interesting example is provided by thyroid-stimulating hormone (TSH), where the effect of diurnal variation needs to be considered.

More studies of such effects are needed, especially for the combined effect of two or more sources of variation. For example, is the combined effect of alcohol and contraceptive drugs on GGT activity in serum less than, equal to, or greater than the sum of their individual effects?
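
The predictive approach can be sketched numerically, under strong assumptions. The Python fragment below treats each factor as a fractional shift of both reference limits and simply sums the shifts; that additive model, the function name, the factor names, and all numbers are hypothetical placeholders, and whether real effects combine additively is exactly the open question raised above.

```python
def predicted_limits(baseline, relative_effects):
    """Shift baseline (lower, upper) limits by the summed relative effects.

    relative_effects: dict of factor -> fractional change (0.10 means +10%).
    Summing the effects assumes additivity, which is an unverified assumption.
    """
    lower, upper = baseline
    factor = 1.0 + sum(relative_effects.values())
    return lower * factor, upper * factor


# Placeholder numbers only; not measured effect sizes.
baseline = (10.0, 50.0)
effects = {"factor_a": 0.10, "factor_b": 0.20}
print(predicted_limits(baseline, effects))
```

If the combined effect were found to be smaller or larger than the sum, the line computing `factor` would need an interaction term estimated from data on individuals exposed to both factors.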
