Ultrasound and Biochemical Screening for Fetal Aneuploidy


Key Points

  • First trimester screening for Down syndrome (DS) with a combination of ultrasound nuchal translucency and two maternal serum markers (combined test) has a much better performance than second trimester screening with four serum markers (quad test).

  • Combined test performance can be improved by the addition of serum markers such as placental growth factor (PlGF) and ultrasound markers such as nasal bone determination.

  • Quad test can be improved by the addition of ultrasound markers such as those based on the facial profile.

  • A late second trimester anatomical scan can be used ad hoc to reinterpret a DS screening result provided ‘soft’ marker–specific likelihood ratios are applied to a posttest risk; however, a policy of routine screening with this scan would have poor performance.

  • Sequential protocols using markers in both trimesters have the best performance; stepwise sequential and contingent tests perform as well as or better than the integrated test.

  • A large proportion of Edwards syndrome (ES) cases are detected incidentally in pregnancies in which the combined test is positive for DS; these proportions are much lower for the quad test, and a separate ES cutoff is needed (similarly for Patau syndrome).

  • Combined test can be extended to screen for adverse pregnancy outcome such as preeclampsia, particularly when PlGF is used.

  • Cell-free DNA screening has much better performance than conventional screening; however, currently the cost is too high for a policy of primary screening in public health settings. Instead, secondary and contingent protocols are recommended.

Historical Perspective

Fetal aneuploidy is a common cause of congenital abnormality; it is associated with intrauterine fatality and, among those surviving to term, moderate to severe intellectual impairment, morbidity and increased mortality. In addition to the learning and health implications, the birth of an affected infant can have a negative long-term impact on the parents and their families.

Prenatal diagnosis of aneuploidy requires testing of fetal material obtained by invasive procedures such as chorionic villus sampling (CVS) and amniocentesis, which have risks and are expensive. Depending on the health care system in a given locality, women are offered these procedures unselectively, as in the United States; on the basis of prior risk factors such as advanced maternal age (AMA) or family history; or after routine screening. After invasive prenatal diagnosis, the option of termination of affected pregnancies is taken up by most of those with severe forms of aneuploidy, but some are prepared to continue the pregnancy. The earlier in pregnancy that prenatal diagnosis can be achieved, the more time there is to counsel parents regarding the potential impact of the diagnosis. If the decision is made not to terminate the pregnancy, the additional ‘lead time’ can be used to prepare the parents for the birth of the child.

The modern era of antenatal screening began in the mid-1970s with the discovery that maternal serum α-fetoprotein (AFP) levels are increased, on average, in pregnancies affected by fetal neural tube defects (NTDs). At that time, the analyte had not been standardised, leading to large between-laboratory differences, and the fact that levels increase rapidly with gestation, doubling about every 5 weeks, was not taken into account. To overcome this, results are currently expressed as multiples of the normal median (MoMs) for unaffected pregnancies of the same gestation for each laboratory.
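As a rough illustration of the MoM idea, the sketch below divides a measured level by a gestation-specific median. The median function and its anchor value (`median_at_16wk`) are hypothetical and chosen only for illustration; in practice each laboratory derives its own medians from its own data. The only figure taken from the text is the approximate 5-week doubling time.

```python
def expected_median(gest_weeks, median_at_16wk=35.0, doubling_weeks=5.0):
    """Hypothetical median AFP level in unaffected pregnancies.

    Assumes exponential growth with a doubling time of about 5 weeks,
    anchored at an illustrative (not real) value at 16 weeks.
    """
    return median_at_16wk * 2 ** ((gest_weeks - 16.0) / doubling_weeks)


def to_mom(level, gest_weeks, median_fn=expected_median):
    """Express a level as a multiple of the gestation-specific normal median."""
    return level / median_fn(gest_weeks)
```

Under these assumptions the same raw level corresponds to 1.0 MoM at 16 weeks but only 0.5 MoM at 21 weeks, which is why uncorrected levels cannot be compared across gestations.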

α-Fetoprotein screening for NTDs became routine, and as a consequence, it was incidentally noted, about a decade later, that levels were relatively low in some cases of aneuploidy. Because fetal loss is also associated with low AFP, this observation might have been accounted for by nonviable types of aneuploidy. However, it was soon established that levels were reduced on average in Down syndrome (DS, trisomy 21) cases which are viable. Initially, women were selected for invasive prenatal diagnosis on the basis of a low AFP, but it was shown to be more efficient to combine information on maternal age, family history and MoM into a patient-specific DS risk.

Subsequently, additional ‘markers’ of aneuploidy were added, both maternal serum analytes and ultrasound markers. Using multimarker profiles to calculate risk resulted in better screening performance measured in terms of detection rate (DR, proportion of affected pregnancies referred for prenatal diagnosis) and false-positive rate (FPR, proportion of unaffected pregnancies referred). The risk can be considered as a composite marker, and the result is classified as ‘positive’ and ‘negative’ by comparison with a fixed risk cutoff. First trimester multimarker protocols yielded a higher DR for a fixed FPR compared with the second trimester. In addition to this improvement, the introduction of first trimester screening methods enabled earlier detection, earlier reassurance and safer termination of pregnancy when requested. Sequential protocols testing in both trimesters had even better screening performance, although they sacrificed early detection.
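The definitions of DR and FPR against a fixed risk cutoff can be sketched as follows. The function name and the tiny data set in the usage note are made up for illustration and do not come from any study.

```python
def screening_performance(risks, affected, cutoff):
    """Return (detection rate, false-positive rate) for a fixed risk cutoff.

    risks    -- per-pregnancy risk as a probability (e.g. 1 in 250 -> 1 / 250)
    affected -- parallel list of booleans (True = affected pregnancy)
    cutoff   -- a result is 'positive' when risk >= cutoff
    Assumes at least one affected and one unaffected pregnancy.
    """
    pos_aff = sum(1 for r, a in zip(risks, affected) if a and r >= cutoff)
    pos_unaff = sum(1 for r, a in zip(risks, affected) if not a and r >= cutoff)
    n_aff = sum(1 for a in affected if a)
    n_unaff = len(affected) - n_aff
    return pos_aff / n_aff, pos_unaff / n_unaff
```

For example, with invented risks `[1/50, 1/400, 1/100, 1/1000, 1/5000, 1/200]`, the first two pregnancies affected, and a cutoff of 1 in 250, one of two affected and two of four unaffected pregnancies are screen positive.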

For some time, the main focus of aneuploidy screening was the detection of DS, but gradually centres included separate risks for Edwards syndrome (ES, trisomy 18) and Patau syndrome (PS, trisomy 13). For some screening protocols, even when only a DS risk was given, there was a high ‘incidental’ detection of ES, PS and other forms of aneuploidy because these pregnancies are also associated with advanced maternal age and have marker profiles similar to those of DS.

Recently, a completely different screening modality has become available: noninvasive prenatal testing based on the determination of cell-free DNA (cfDNA) in maternal plasma. This substantially increases DR and vastly reduces FPR. However, because cfDNA still has false-positive and false-negative results, it is not a substitute for invasive testing. Nevertheless, professional bodies such as the American College of Obstetricians and Gynecologists (ACOG) and the International Society for Prenatal Diagnosis (ISPD) recommend that cfDNA be used as a secondary screening test after a positive result by conventional screening protocols. An abnormal cfDNA finding requires confirmation by the gold standard of invasive testing.

Despite the superior performance, routine cfDNA screening is unlikely to replace established protocols in the short term. An important rate limiting step is cost; current prices preclude routine testing in most localities. A compromise approach is ‘contingent cfDNA’ screening whereby 10% to 30% of women with the highest risks based on established screening tests are selected for cfDNA. This approach also allows the use of biochemical and ultrasound markers to screen for adverse outcomes of pregnancy and congenital abnormalities other than aneuploidy.

Aneuploidy

Fetal aneuploidy results in a wide spectrum of phenotypes, with viability and clinical outcome varying according to the genotype. Severity ranges from lethal (e.g. triploidy) to the relatively benign Turner syndrome (TS, monosomy X) and other sex chromosome abnormalities (SCAs). Most severely affected embryos abort spontaneously early in the first trimester, sometimes even before there are clinical signs of pregnancy. Among those who survive the first trimester, there remains a high rate of intrauterine mortality and an increased risk of infant death. By far, DS is the most frequent aneuploidy sufficiently viable to survive to term in relatively large numbers and amenable to screening, with a birth prevalence (in the absence of prenatal diagnosis and therapeutic abortion) of 1 to 2 per 1000. ES and PS have respectively about one 10th and one 20th the birth prevalence of DS.

Maternal Age Distribution

In a particular locality, the birth prevalence of DS, ES and PS will vary according to the maternal age distribution of the local population. DS prevalence can be estimated by multiplying the proportion of pregnancies at each single year of age by the maternal age-specific prevalence based on published regression curves. The best available curves are derived by a meta-analysis of published birth prevalence rates for individual years of age determined before prenatal diagnosis became clinically established. Four meta-analyses have been published based on 11 different maternal age-specific birth prevalence series. In one of these meta-analyses, all eight series published at that time were included with a total of 4500 DS births and more than 5 million unaffected births. For each year of age, data were pooled by taking the average birth prevalence rate across the series weighted by the number of births. The best-fitting curve was ‘additive-exponential’ in which there are two components: a background prevalence independent of age and an exponential increase with age.

Standard Age Distribution

The relative advantages of competing screening protocols can be demonstrated either directly or by statistical modelling. For a direct comparison, a very large study will be required in which there are a substantial number of affected pregnancies tested by more than one protocol and there is no intervention. The reason for nonintervention is to avoid ‘viability’ bias because of the high rate of intrauterine fatality. This bias arises because a proportion of affected pregnancies terminated after invasive prenatal diagnosis carried out because of a positive result from one of the protocols would have ended in fetal loss. Because the equivalent proportion among those with negative results will not be identified, the observed DR will necessarily be inflated.
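The size of the viability bias can be illustrated with a small calculation. This is a sketch under simplifying assumptions that are not stated in the text: every screen-positive affected pregnancy is ascertained at prenatal diagnosis, screen-negative affected pregnancies are ascertained only if they survive to term, and the fetal loss rate is the same in both groups.

```python
def observed_detection_rate(true_dr, fetal_loss_rate):
    """Detection rate as it would appear in an intervention study."""
    # Screen-positive cases: all are counted, including those that
    # would later have miscarried.
    detected = true_dr
    # Screen-negative cases: only those surviving to term are counted.
    missed_and_seen = (1.0 - true_dr) * (1.0 - fetal_loss_rate)
    return detected / (detected + missed_and_seen)
```

With a true DR of 80% and a 25% fetal loss rate, the observed DR under these assumptions would be about 84%.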

Modelling yields more robust estimates and a more realistic comparison among protocols. The model components include parameters of the marker distributions and the maternal age distribution. The latter could be an observed distribution in some locality, but for protocol comparison purposes, it can also be modelled. Many such modelling exercises, including this chapter, use a Gaussian age distribution with a mean of 27 years and a standard deviation (SD) of 5.5 years, known as the ‘standard’ distribution.
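A minimal sketch of how an overall prevalence might be computed from a Gaussian age distribution and an additive-exponential risk curve is given below. The age range (15 to 49 years), the renormalisation step and all function names are assumptions made here; the curve coefficients are supplied by the caller and none are taken from the published meta-analyses.

```python
import math
from statistics import NormalDist


def age_weights(mean=27.0, sd=5.5, ages=range(15, 50)):
    """Proportion of pregnancies at each single year of age.

    Gaussian mass within +/- 0.5 year of each age, renormalised so the
    proportions over the chosen age range sum to 1.
    """
    dist = NormalDist(mean, sd)
    raw = {age: dist.cdf(age + 0.5) - dist.cdf(age - 0.5) for age in ages}
    total = sum(raw.values())
    return {age: w / total for age, w in raw.items()}


def additive_exponential(age, a, b, c):
    """Background prevalence `a` plus an exponential increase with age."""
    return a + math.exp(b + c * age)


def expected_prevalence(risk_fn, weights):
    """Sum of (proportion at each age) x (age-specific prevalence)."""
    return sum(w * risk_fn(age) for age, w in weights.items())
```

Because the curve rises steeply with age, the population prevalence exceeds the risk at the mean age; a locality with an older age distribution would show a higher figure again.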

Prevalence According to Gestational Age

Studies of prenatal diagnosis can be used to estimate DS prevalence in the late first trimester, when CVS is generally performed, and in the mid-second trimester, when amniocentesis is done. Intrauterine loss rates of DS from the time of prenatal diagnosis until term are calculated either by comparing the number of cases observed at prenatal diagnosis with that expected from birth prevalence, given the maternal age distribution, or by follow-up of individuals declining termination of a DS pregnancy using direct or actuarial survival analysis. Published prevalence studies include a total of 341 DS cases diagnosed by CVS and 1159 following amniocentesis. There are three published follow-up series including 110 DS cases after amniocentesis and a series of 126 DS cases from the UK National Down Syndrome Cytogenetic Register (NDSCR), a very complete national database, which were analysed according to the gestational age at prenatal diagnosis. However, the Register study was biased because it included miscarriages that occurred in women who did intend to have pregnancy termination, thus inflating the rates. An actuarial survival analysis of the Register data was carried out which overcame this bias and was more data efficient because all cases contributed to the estimate, not just those in which pregnancy termination was refused. Actual and potential heterogeneity between the various studies precludes a grand meta-analysis to estimate the fetal loss rates, but an informal synthesis concluded that approximately one-half of DS pregnancies are lost after first trimester CVS and one quarter after midtrimester amniocentesis.

Formulae have been published from a large series of more than 57,000 women having invasive prenatal diagnosis based only on advanced maternal age. These calculations assumed that fetal loss rates did not vary with maternal age; however, the studies used to calculate the overall rates were largely based on women older than 35 years of age, so this could not be readily analysed. Because fetal loss rate in general increases markedly with maternal age, it is likely that this occurs in DS pregnancies as well, as confirmed in a NDSCR actuarial survival analysis based on 5116 registered DS pregnancies, of which 271 ended in a live birth and 149 in fetal loss; the remainder were terminated. The overall estimated fetal loss rates from the time of CVS and amniocentesis were similar to previous reports, but these rates increased steadily with maternal age: from 23% and 19% at age 25 years to 44% and 33% at age 45 years. One caveat, though, was that the observed maternal age effect was confounded by differences in marker levels. A large proportion of the prenatally diagnosed cases were detected because of a positive result after routine antenatal screening. The marker distribution in screen positives varied according to maternal age: marker profiles in young women tended to be extreme, but some older women, even those with moderate profiles, had a screen-positive result because of the contribution of their advanced age to the risk.

Screening and Prenatal Diagnosis

There is a fundamental difference between screening and diagnostic tests despite the use of the same terms to describe their respective results: ‘true positive’, ‘false positive’, ‘true negative’ and ‘false negative’. The aim of ultrasound and biochemical screening is limited to the identification from among apparently healthy pregnancies of those that are at high enough risk for a chromosomal abnormality to warrant the use of an invasive diagnostic test. Thus screening for aneuploidy does not aim to make a diagnosis but to ration the use of diagnostic procedures and tests that, without prior selection, would be more hazardous or expensive.

Evaluating the Efficacy of Screening Markers

The potential utility in screening of a given marker depends on the extent of separation between the marker distributions in affected and normal populations. This can be expressed as the absolute difference between the distribution means divided by the average SD for the two distributions, a form of Mahalanobis distance. For continuous variables, the choice of a cutoff level that determines whether a value is positive or negative is arbitrary because there is no intrinsic division between the distributions. The choice is influenced by the perceived relative importance of three factors: DR, FPR and the positive predictive value (PPV), that is, the chance of being affected given that the screening result is positive. The prior risk in those screened influences the PPV. (See Chapter 16 for further discussion of PPV.)
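The two quantities in this paragraph can be written out as a short sketch. The function names are assumptions; the separation measure follows the verbal definition above (difference in means over the average SD), and the PPV is the standard Bayes calculation from DR, FPR and prior risk.

```python
def separation(mean_affected, mean_unaffected, sd_affected, sd_unaffected):
    """Absolute difference in means divided by the average of the two SDs."""
    average_sd = (sd_affected + sd_unaffected) / 2.0
    return abs(mean_affected - mean_unaffected) / average_sd


def positive_predictive_value(dr, fpr, prior_risk):
    """Chance of being affected given a positive screening result."""
    true_positives = dr * prior_risk
    false_positives = fpr * (1.0 - prior_risk)
    return true_positives / (true_positives + false_positives)
```

For instance, with an illustrative DR of 85%, an FPR of 5% and a prior risk of 1 in 500, the PPV is roughly 1 in 30; halving the prior risk roughly halves the PPV, which is how the prior risk in those screened influences it.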

All serum markers used in aneuploidy screening are continuous variables whose distribution of values is higher or lower on average in affected pregnancies. Typically, these markers have considerable overlap in the distribution of results between affected and unaffected individuals. In contrast, the distribution of values for variables used in diagnosis has essentially no overlap. Most of the principal ultrasound markers are also continuous variables with overlapping distributions. There are also some dichotomous ultrasound markers, which present difficulties of quality assessment.

Principal Down Syndrome Markers

Of the more than 50 maternal blood, urine and ultrasound markers of DS, seven are widely used in routine multimarker screening. These are maternal serum AFP, human chorionic gonadotrophin (hCG), the free-β subunit of hCG, unconjugated estriol (uE3), inhibin A and pregnancy-associated plasma protein (PAPP)-A and ultrasound nuchal translucency (NT). Of these markers, PAPP-A and NT are only used in the first trimester; the remainder can be used in the first or second trimester, but generally AFP, uE3 and inhibin A are used in the second.

Table 18.1 shows the average MoM in DS pregnancies and the Mahalanobis distance for each of the principal markers. For comparison AFP as a marker of NTDs has a distance exceeding 3, and maternal age as a ‘marker’ of DS would have a distance of about 1.

TABLE 18.1
Down Syndrome: Average Multiples of the Normal Median (MoM) and Mahalanobis Distance for Each Marker, According to Gestation (a)

Marker        Gestation (wk)   Average MoM   Mahalanobis Distance (b)
NT            11               2.30          2.02
NT            12               2.10          1.87
NT            13               1.92          1.65
PAPP-A        10               0.40          1.31
PAPP-A        11               0.45          1.14
PAPP-A        12               0.53          0.90
PAPP-A        13               0.65          0.61
Free β-hCG    10               1.66          0.76
Free β-hCG    11               1.86          0.94
Free β-hCG    12               2.01          1.05
Free β-hCG    13               2.09          1.11
Free β-hCG    14–18            2.30          1.33
hCG           10               1.03          0.05
hCG           11               1.18          0.32
hCG           12               1.41          0.68
hCG           13               1.77          1.14
hCG           14–18            2.02          1.15
AFP           14–18            0.73          0.79
uE3           14–18            0.73          0.83
Inhibin A     14–18            1.85          1.12

AFP, α-Fetoprotein; DS, Down syndrome; hCG, human chorionic gonadotrophin; PAPP-A, pregnancy-associated plasma protein A; uE3, unconjugated estriol.

(a) Based on meta-analyses.

(b) Log (average MoM)/((SD in DS + SD in unaffected pregnancies)/2), expressed as a positive number, where SD is the standard deviation of log (MoM).

Nuchal translucency is by far the single best individual marker. Among the serum markers, PAPP-A is the most discriminatory, but the Mahalanobis distance declines rapidly with increasing gestation. Free β-hCG is more discriminatory at 14 to 18 weeks than at 10 to 13 weeks, although there is a gradual change in Mahalanobis distance between 10 and 18 weeks. At 14 to 18 weeks’ gestation, hCG is less discriminatory than free β-hCG, and before 13 weeks, it is a poor marker. At 14 to 18 weeks, inhibin A is of comparable discriminatory power to hCG. AFP and uE3 are not very discriminatory markers.

Multimarker Testing Strategies

A large number of marker combinations have been evaluated. Many of them are in use today and are recommended by ISPD. Among the optimal strategies, those yielding the highest DR for a given FPR, there are just six in widespread use:

Combined test

This first-trimester strategy is a combination of two serum markers, PAPP-A and either hCG or free β-hCG, together with ultrasound measurement of NT. The blood sample can be taken from 10 to 13 weeks, although some laboratories accept an earlier sample. However, the NT has a narrower acceptable range of 11 to 13 weeks or an ultrasound crown–rump length (CRL) of 45 to 85 mm.

Quadruple test

This is a second trimester serum-only strategy combining AFP, hCG or free β-hCG, uE3 and inhibin A. To use the AFP level for both NTD and aneuploidy screening, the test has to be carried out at 16 to 18 or at least 15 to 19 weeks’ gestation, the window for optimal NTD detection.

Integrated test

This strategy combines markers in both trimesters. PAPP-A and NT are determined in the first trimester, but hCG or free β-hCG measurement is delayed until the second trimester when it is measured together with AFP, uE3 and inhibin A. The protocol requires nondisclosure of risk based on the PAPP-A and NT levels. Some regard the nondisclosure to be unethical or at least impractical because of the difficulty for the health professional not to act on first trimester findings which would of themselves be abnormal, particularly the NT. The increase in DR offered by this approach is offset by the loss of the early diagnosis and reassurance that a first trimester test offers.

Serum integrated test

This is a serum-only version of the integrated test.

Stepwise sequential test

The first stage uses the combined test markers, and women with very high risks (much higher than the cutoff for a combined test per se) are immediately offered invasive testing. Those with risks below this cutoff are offered the quad test markers with their final risk based on all markers.

Contingent testing

Contingent testing is performed as with the stepwise sequential test except that second trimester marker testing is contingent on the first trimester results. In this approach, two first trimester cutoffs are used. Very-high-risk patients are referred for diagnostic testing, and low-risk patients have only first trimester screening. Patients with intermediate first trimester risks go on to second trimester serum screening. Only the 10% to 20% of women with these borderline risks are offered the second trimester stage.
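The two-cutoff logic of contingent testing can be sketched as a simple triage function. The cutoff values used as defaults (1 in 50 and 1 in 1500) and the wording of the returned labels are hypothetical; actual programmes choose cutoffs to give the desired proportion of women in the intermediate group.

```python
def contingent_triage(first_trimester_risk, high_cutoff=1 / 50, low_cutoff=1 / 1500):
    """Decide the next step from the first trimester (combined test) risk.

    Both cutoffs are illustrative assumptions, not recommended values.
    """
    if first_trimester_risk >= high_cutoff:
        return "offer diagnostic testing"
    if first_trimester_risk < low_cutoff:
        return "no further screening"
    # Intermediate (borderline) risk: proceed to the second trimester
    # serum markers, with the final risk based on all markers.
    return "offer second trimester markers"
```

A stepwise sequential protocol would differ only in that every woman below the high cutoff proceeds to the second stage, that is, there is no low cutoff.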

Risk Screening

Prior Risk

The prior, or pretest, DS risk can be estimated from the maternal age and family history. It can be expressed either as the chance of having a term pregnancy with the disorder or as the chance of the fetus being affected at the time of testing. Insofar as the aim of screening is to reduce the prevalence at birth, the former is more appropriate. However, because screening is also about providing women with information on which to base an informed choice about prenatal diagnosis, it can be argued that the latter is more relevant. If term risks are used, they can be estimated from the age-specific birth prevalence with an additive component for family history. If first or second trimester risks are used, they can be estimated by applying the intrauterine loss rates to the birth prevalence.
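One way to apply an intrauterine loss rate to a term risk is sketched below. It works on the odds scale and assumes, as a simplification made here, that loss of unaffected pregnancies between testing and term is negligible; the function name is hypothetical.

```python
def risk_at_testing(term_risk, fetal_loss_rate):
    """Convert a term risk into the risk at the time of testing.

    fetal_loss_rate -- proportion of affected pregnancies lost between
                       the time of testing and term.
    Loss of unaffected pregnancies is ignored (simplifying assumption).
    """
    term_odds = term_risk / (1.0 - term_risk)
    # More affected fetuses are present at testing than survive to term.
    testing_odds = term_odds / (1.0 - fetal_loss_rate)
    return testing_odds / (1.0 + testing_odds)
```

With a term risk of 1 in 1000 and a loss rate of about one-half from the time of CVS, the risk at testing comes out at about 1 in 500.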

Likelihood Ratio

Statistically, the optimal way of interpreting the multimarker profile is to estimate the DS risk from the individual marker levels. This is done by modifying the prior risk by calculating a patient specific likelihood ratio (LR) derived from the patient’s marker profile and a model of marker distributions. Because all the principal markers follow an approximately log Gaussian distribution over most of the MoM range, a multivariate log-Gaussian model is used. The model parameters are the log-transformed means, SD and correlation coefficients in affected and unaffected pregnancies. There will be MoM values beyond which there is substantial deviation from the model. Values beyond these ‘truncation limits’ are assumed to be at the nearest limit for risk calculation purposes. Parameters are best derived by meta-analysis, excluding the viability bias that occurs in prospective intervention studies or at least adjusting for bias.
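The two mechanical steps described here, truncating an extreme MoM and modifying the prior risk by the LR, can be sketched as follows. The function names are assumptions, and any truncation limits passed in would be marker specific; none are given in the text.

```python
def truncate(mom, lower_limit, upper_limit):
    """Replace a MoM beyond the truncation limits by the nearest limit."""
    return min(max(mom, lower_limit), upper_limit)


def posterior_risk(prior_risk, likelihood_ratio):
    """Modify a prior risk by a patient-specific likelihood ratio."""
    prior_odds = prior_risk / (1.0 - prior_risk)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)
```

For example, a prior risk of 1 in 800 combined with an LR of 4 gives a posterior risk of about 1 in 200.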

The LR for a single marker is calculated as the ratio of the heights of the two overlapping ‘bell-shaped’ distributions at the specific level. For two markers, the overlapping bivariate distributions can be represented as ‘football shaped’ mountains, and the heights are determined at the longitude and latitude of the two specific levels. When more than two markers are included, the multivariate distributions are difficult to visualise, but the principle is the same whereby the ratio of ‘heights’ is determined. This form of risk calculation assumes that the marker levels and maternal age are independent determinants of risk and that the marker levels are unrelated to the probability of intrauterine survival. Although there is evidence that extreme values of biochemical and ultrasound markers can be associated with increased fetal demise, values within the truncation limits will not be affected.
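The ‘ratio of heights’ can be made concrete with a short sketch for one and two markers, working in log MoM. The parameter layout (tuples of means, SDs and a correlation coefficient) is an assumption made for this example, and the values in the usage note are invented rather than taken from any meta-analysis.

```python
import math


def gaussian_height(x, mean, sd):
    """Height of a univariate Gaussian density at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bivariate_height(x, y, mean_x, mean_y, sd_x, sd_y, rho):
    """Height of a bivariate Gaussian density at (x, y)."""
    zx = (x - mean_x) / sd_x
    zy = (y - mean_y) / sd_y
    q = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * sd_x * sd_y * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * q) / norm


def lr_one_marker(log_mom, affected, unaffected):
    """affected / unaffected are (mean, sd) tuples for log MoM."""
    return gaussian_height(log_mom, *affected) / gaussian_height(log_mom, *unaffected)


def lr_two_markers(log_x, log_y, affected, unaffected):
    """affected / unaffected are (mean_x, mean_y, sd_x, sd_y, rho) tuples."""
    return (bivariate_height(log_x, log_y, *affected)
            / bivariate_height(log_x, log_y, *unaffected))
```

With invented parameters `(0.3, 0.2)` for affected and `(0.0, 0.2)` for unaffected pregnancies, a log MoM of 0.15 lies where the two curves cross and the LR is 1; higher values give LRs above 1. When the two markers are uncorrelated in both groups, the two-marker LR is simply the product of the single-marker LRs.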

Covariables

More precise LRs can be estimated when the MoMs and the distribution parameters take account of covariables such as maternal weight, smoking status and ethnicity. The serum markers, when expressed in MoMs, are negatively correlated with maternal weight. This is usually explained in terms of dilution: a fixed mass of chemical produced in the fetoplacental unit is diluted by a variable volume in the maternal unit. However, this cannot be the only factor involved because the extent of correlation differs between the markers (e.g. the correlation is almost twice as great for PAPP-A as for AFP or hCG; inhibin has a weaker correlation than these two; and for uE3, there is hardly any association at all, particularly in the first trimester). It is standard practice to adjust all serum marker levels for the individual’s weight, dividing the observed MoM by the expected value for the weight derived by regression. A regression formula using 1/weight is more accurate for very large and very small women than simple linear regression and should be derived from a local population.

On average, smokers have reduced levels of PAPP-A, free β-hCG and second trimester hCG but an increased inhibin level. In women of Afro-Caribbean origin or African Americans, on average, AFP, intact hCG and second trimester free β-hCG levels are increased, and inhibin A is decreased. In women of South Asian origin, uE3 and total hCG levels appear to be somewhat increased. As with maternal weight, adjustment can be carried out by dividing the observed MoM by the average value in the local population according to smoking status and ethnicity. The correction factors used for different ethnic groups appear to differ according to gestational age.
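The covariable adjustments described in this section amount to dividing the observed MoM by an expected value. A minimal sketch follows; the reciprocal-weight regression form comes from the text, but the coefficients and the group factor are placeholders that would have to be estimated from a local population, and the function names are assumptions.

```python
def expected_mom_for_weight(weight_kg, intercept, slope):
    """Reciprocal-weight regression: expected MoM = intercept + slope / weight."""
    return intercept + slope / weight_kg


def adjust_mom(mom, weight_kg, intercept, slope, group_factor=1.0):
    """Weight-adjust a MoM, then divide by a smoking or ethnicity factor.

    group_factor -- average MoM in the woman's smoking/ethnic group
                    relative to the reference group (1.0 = no adjustment).
    """
    weight_adjusted = mom / expected_mom_for_weight(weight_kg, intercept, slope)
    return weight_adjusted / group_factor
```

With placeholder coefficients `intercept=0.4` and `slope=39.0`, a woman of 65 kg has an expected MoM of 1.0 and her result is unchanged, whereas for a heavier woman the expected value falls below 1.0 and her observed MoM is adjusted upward.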

Gestational Dating

Accurate determination of gestational age is key both to the timing of the screening test and to MoM calculation. In general obstetrics, the gestational age based on the time since the last menstrual period (LMP) should only be modified by ultrasound findings if there is a large discrepancy. In early pregnancy, a difference between LMP- and CRL-based estimates greater than 3 days is regarded as large and should lead to a change in gestational age based on the CRL. As pregnancy continues, fetal measurements are somewhat less precise and a difference of more than 7 days is required. In practice, screening performance will be improved if in every pregnancy in which ultrasound dating is available, it is used for MoM calculation instead of the LMP.
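The two dating rules in this paragraph can be contrasted in a short sketch. Gestational ages are taken in days, and the function names and the `early_pregnancy` flag are assumptions introduced for this example.

```python
def dating_for_general_obstetrics(lmp_days, ultrasound_days, early_pregnancy=True):
    """Keep the LMP estimate unless the discrepancy is large.

    'Large' means more than 3 days in early pregnancy (CRL dating)
    and more than 7 days later on.
    """
    threshold_days = 3 if early_pregnancy else 7
    if abs(lmp_days - ultrasound_days) > threshold_days:
        return ultrasound_days
    return lmp_days


def dating_for_mom_calculation(lmp_days, ultrasound_days=None):
    """For screening, use the ultrasound estimate whenever one is available."""
    return ultrasound_days if ultrasound_days is not None else lmp_days
```

For an LMP estimate of 84 days and a CRL estimate of 86 days, the first rule keeps 84 days while the second uses 86 days for the MoM calculation.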

Evaluating the Performance of Multimarker Screening

Two widely used methods of estimating DR and FPR are numerical integration and Monte-Carlo simulation. Numerical integration is based on the same model of the log Gaussian distributions of each marker in DS and unaffected pregnancies used for risk calculation together with a maternal age distribution. The theoretical range of the markers (from minus 3 to plus 3 SDs) across both outcomes is divided into a number of equal sections, thus forming a ‘grid’ in multidimensional space. The Gaussian distributions are then used to calculate for each section (square for two markers, cube for three and so on) the proportion of DS and unaffected pregnancies in the section and the LR. It is then a matter of applying these values to the maternal age distribution. At each maternal age, the number of DS and unaffected pregnancies is estimated from the age-specific risk curve. The distributions of DS risks are then calculated from the grid values. Monte-Carlo simulation also uses the Gaussian distributions, but instead of rigid summation over a fixed grid, it uses a random sample of points in multidimensional space to simulate the outcome of a population being screened.
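Both methods can be sketched for the simplest possible case: a single log-Gaussian marker and a single prior risk, rather than several markers and a full maternal age distribution. That simplification, the function names and the parameter values in the usage note are assumptions made for illustration.

```python
import random
from statistics import NormalDist


def _is_positive(x, prior_risk, affected, unaffected, cutoff):
    """Screen positive if the posterior risk at marker value x reaches the cutoff."""
    prior_odds = prior_risk / (1.0 - prior_risk)
    odds = prior_odds * affected.pdf(x) / unaffected.pdf(x)
    return odds / (1.0 + odds) >= cutoff


def integrate_screen(prior_risk, affected, unaffected, cutoff, steps=2000):
    """Numerical integration over a grid spanning +/- 3 SDs of both outcomes."""
    lo = min(d.mean - 3 * d.stdev for d in (affected, unaffected))
    hi = max(d.mean + 3 * d.stdev for d in (affected, unaffected))
    width = (hi - lo) / steps
    dr = fpr = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width  # midpoint of this grid section
        if _is_positive(x, prior_risk, affected, unaffected, cutoff):
            dr += affected.pdf(x) * width
            fpr += unaffected.pdf(x) * width
    return dr, fpr


def simulate_screen(n, prior_risk, affected, unaffected, cutoff, seed=0):
    """Monte-Carlo estimate from n simulated pregnancies of each outcome."""
    rng = random.Random(seed)
    hits_aff = sum(
        _is_positive(rng.gauss(affected.mean, affected.stdev),
                     prior_risk, affected, unaffected, cutoff)
        for _ in range(n))
    hits_unaff = sum(
        _is_positive(rng.gauss(unaffected.mean, unaffected.stdev),
                     prior_risk, affected, unaffected, cutoff)
        for _ in range(n))
    return hits_aff / n, hits_unaff / n
```

Called with, say, `NormalDist(0.3, 0.2)` for affected and `NormalDist(0.0, 0.2)` for unaffected log MoM, a prior risk of 1 in 500 and a cutoff of 1 in 250, the two functions should agree to within sampling error. Extending the sketch to a population would mean repeating the calculation at each maternal age and weighting by the age distribution.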

When assessing the relative benefits of different policies, it is best to either fix the FPR (e.g. 1% or 5%) and compare the DRs or fix the DR (e.g. 75% or 85%) and compare the FPRs. However, when changing policy, it would be confusing to alter the cutoff risk to maintain the DR or FPR. In practice, it is common to retain the cutoff (e.g. 1 in 250 at term or 1 in 270 at midtrimester) and allow both DR and FPR to vary. In this chapter, performance is presented using all three methods, and model predictions are based on Gaussian distributions with parameters derived by meta-analysis and use the standard maternal age distribution.
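Fixing one rate and comparing the other can be sketched for a single marker that is raised in affected pregnancies. For simplicity the cutoff here is placed on the marker level itself rather than on a risk, which is an assumption of this example; the function names and parameter values are likewise illustrative.

```python
from statistics import NormalDist


def dr_at_fixed_fpr(fpr, affected, unaffected):
    """DR when the marker cutoff is set to give the required FPR."""
    cutoff = unaffected.inv_cdf(1.0 - fpr)
    return 1.0 - affected.cdf(cutoff)


def fpr_at_fixed_dr(dr, affected, unaffected):
    """FPR when the marker cutoff is set to give the required DR."""
    cutoff = affected.inv_cdf(1.0 - dr)
    return 1.0 - unaffected.cdf(cutoff)
```

Comparing two hypothetical markers at a 5% FPR, for example `NormalDist(0.3, 0.2)` versus `NormalDist(0.5, 0.2)` for affected pregnancies against `NormalDist(0.0, 0.2)` for unaffected pregnancies, shows directly how a greater separation translates into a higher DR.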

First Trimester Screening
