Challenges in Understanding and Quantifying Overdiagnosis and Overtreatment

Plain Language Summary

Overdiagnosed breast cancer is breast cancer detected by screening that would not have caused any health problems had it been left undetected and untreated. Without screening, it would not ever have been detected. Overdiagnosed breast cancers are “real” cancers, in that they meet current professional standards for cancer diagnosis (ie, they are not false-positives), but detecting and treating these cancers does not improve health outcomes.

Over recent years, strong evidence that breast cancer overdiagnosis exists has been obtained from a range of sources, including three randomized controlled trials (RCTs). Furthermore, the incidence of breast cancer (particularly ductal carcinoma in situ [DCIS] and early invasive breast cancer) increased strongly after screening mammography was introduced, while rates of advanced breast cancer declined only modestly or not at all. The excess incidence persists even after adjusting for other changes that increase breast cancer incidence (such as use of hormone therapy, changing reproductive patterns, and increasing obesity). This pattern has been observed internationally and is consistent with overdiagnosis.

Over the last 15 years, many ways to express overdiagnosis as a percentage have emerged; no single approach is “right.” For providing information to women, the best way is to express overdiagnosis as the percentage of breast cancers diagnosed in a screened population of women during the years in which they participate in screening. Percentages which include (in the denominator) all breast cancers detected during women’s lifetimes (ie, over long follow-up) may mislead because they dilute the estimate of overdiagnosis by including many cancers which have nothing to do with screening. Furthermore, including different time periods makes comparison between studies difficult.

We estimate that 15–30% of breast cancers that are detected in women who regularly participate in contemporary screening programs are overdiagnosed, based on results from randomized trials and well conducted observational studies. Lower estimates have been reported from studies with higher risks of bias.

Women who are overdiagnosed with breast cancer receive treatment similar to that given to other women diagnosed with breast cancer, leading to overtreatment. Overtreatment resulting from overdiagnosis therefore affects 15–30% of women undergoing breast cancer treatment in a screened population. We cannot currently say with certainty whether an individual woman has been overdiagnosed, so this presents women and their doctors with difficult dilemmas about treatment choices. Overtreatment causes patient harm from side effects of treatment, and increases the costs and burdens of breast cancer for individual women and for the health care system.

Uncertainty remains about how much overdiagnosis has occurred since screening mammography was introduced, and this may never be known precisely. It is important to monitor the effects of new screening technologies (such as digital breast tomosynthesis) as these may result in incremental overdiagnosis compared to current practice. Future research should include randomized trials to assess new technologies within screening programs and studies to improve the evidence base about the harms and costs of overtreatment.

Introduction

Breast cancer is the most common cancer in women worldwide. Its incidence increased in the 20th century, first in developed, and later in developing countries. Changes in lifestyle-related risk factors contributed to rising incidence. These included changes in reproductive factors and choices, body mass index, use of hormone therapy and consumption of alcohol. It is increasingly clear, however, that overdiagnosis—the detection by screening of breast cancers that would not cause harm and would not ever be found without screening—has also been a contributory factor.

During the last 30 years, the mortality from breast cancer has been falling due to advances in breast cancer treatment and screening. Whilst this mortality benefit is important, to fully understand the value of breast cancer screening, readers must also consider its harms. In this chapter, we focus on, arguably, the most important harm of breast cancer screening—overdiagnosis. We discuss the challenge of understanding and quantifying breast cancer overdiagnosis and provide an up-to-date summary of the best quality research evidence.

Defining Breast Cancer Overdiagnosis

● A patient perspective
● A statistical perspective
● A social perspective

The first challenge in understanding and quantifying breast cancer overdiagnosis is to define it, as, without robust definition, valid quantification cannot proceed. Several approaches to this deceptively simple task have been used to date, and we outline these in this section.

A Patient Perspective

The first, and widely used, definition is: an overdiagnosed cancer is a cancer detected by screening that would not have presented clinically during that person’s lifetime in the absence of screening. These preclinical cancers would never have caused symptoms or become life threatening.

An important concept implicit in this definition is the heterogeneity of breast cancer types and behaviors. Some breast cancers progress rapidly whereas others progress slowly, become static, or may even regress. This heterogeneity of cancer progression (from fast through slow to nonprogressive and regressive) is illustrated in Fig. 6.1. It leads to the possibility of overdiagnosis through at least two major mechanisms.

Figure 6.1, Heterogeneity in cancer progression.

One mechanism operates through the detection by screening of a preclinical cancer which, without screening, would eventually have progressed to cause symptoms, but the woman dies from another cause before that happens. These cancers are labeled fast and slow in Fig. 6.1, and at some point would cause clinically apparent symptoms and disease. However, if women die from a cause other than breast cancer before the cancer would have caused symptoms, these cancers become overdiagnosed cancers, according to the definition given earlier. Readers can think of this as overdiagnosis due to screen-detection of progressive preclinical cancer. This cause of overdiagnosis is an inevitable consequence of screening. The risk of overdiagnosis due to competing mortality increases steadily with age, as the risk of death from other causes (such as heart disease) also increases.

The other important mechanism is through the detection of preclinical cancer which is nonprogressing (indolent) or even regressing. These cancers would never cause symptoms and disease no matter how long women had lived, yet may be readily detected by screening. Readers can think of this as overdiagnosis due to screen-detection of nonprogressive preclinical cancer.

The distinction between these two mechanisms of overdiagnosis is necessary to appreciate that some overdiagnosis, due to competing mortality, is inevitable in breast cancer screening. Unfortunately screening also has a tendency towards overdiagnosis due to the detection of preclinical cancer that is very slow or nonprogressive; this is the focus of most of this chapter. The distinction is also needed to understand the next definition.

A Statistical Perspective

Researchers who develop statistical models to estimate overdiagnosis in cancer screening may use a more technical definition: overdiagnosis can be thought of as cases whose lead time exceeds their remaining years of life.

Lead time is the amount of time by which the diagnosis of cancer is advanced by screening. In other words, it is the period of time from screen-detection of a cancer to the clinical presentation of cancer (with symptoms) if screening had not occurred. As such, it is largely an unobservable phenomenon, and therefore has to be estimated.

Lead time is an important concept in this definition, and the definition is useful for modeling studies. However, Baker points out that the concept of lead time is not meaningful in relation to nonprogressive preclinical cancer because, without screening, these cancers would never become symptomatic or diagnosed no matter how much lifetime remains.
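The statistical definition can be written as a simple rule. The Python sketch below is illustrative only: the function name `is_overdiagnosed` and the use of `None` to stand for a nonprogressive cancer (for which, following Baker's point, lead time is undefined) are assumptions made here, not part of any published model. In practice neither lead time nor remaining lifetime can be observed for an individual woman, so both would have to be estimated.

```python
from typing import Optional


def is_overdiagnosed(lead_time_years: Optional[float],
                     remaining_life_years: float) -> bool:
    """Apply the lead-time definition of overdiagnosis to one hypothetical case.

    lead_time_years: years by which screening advanced the diagnosis, or
        None for a nonprogressive cancer that would never have presented
        clinically (an assumption of this sketch).
    remaining_life_years: years the woman lives after screen-detection.
    """
    if lead_time_years is None:
        # Nonprogressive cancer: never symptomatic, however long she lives.
        return True
    # Progressive cancer: overdiagnosed only if she dies of another cause
    # before the cancer would have caused symptoms.
    return lead_time_years > remaining_life_years


# Hypothetical cases: (lead time, remaining years of life)
cases = [(3.0, 10.0), (8.0, 5.0), (None, 30.0)]
print([is_overdiagnosed(lead, life) for lead, life in cases])
```

The first case is an early diagnosis of a cancer that would have presented within the woman's lifetime, the second is overdiagnosis through competing mortality, and the third is overdiagnosis of nonprogressive disease.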

A Social Perspective

Carter et al. define overdiagnosis as occurring when

● Some patients are given a diagnosis of a condition
● The diagnosis meets current professional standards (it is not a false-positive)
● But the diagnosis carries an unfavorable benefit:harm ratio.

This definition makes the important point that overdiagnosis is different from a false-positive screening test result. An overdiagnosed breast cancer is a real breast cancer, meeting current professional, pathological criteria for cancer. But it does not behave and progress in ways that we have come to expect cancers to behave. This definition makes explicit that, because of this, overdiagnosis may lead patients to undergo medical and surgical interventions that are, at best, very unlikely to be beneficial but involve significant risk of psychological and physical harm (overtreatment). Furthermore, Carter et al. argue that overdiagnosis has inevitable social and ethical dimensions. For example, social and ethical dimensions are involved when making a (value) judgment of no net benefit (unfavorable benefit:harm ratio), and in considering the best allocation of health care resources within societies.

These varying definitions begin to illustrate the complexity of quantifying overdiagnosis. Discussion of alternative definitions has unearthed important differences between perspectives, and uncovered that estimates of the frequency of overdiagnosis may hinge on researchers’ choice of definition.

No matter which of these definitions is chosen, an overdiagnosed case is an intangible concept, which can only be perceived and quantified at a population level (a “view from space” if you will). Currently, it is impossible to say with certainty whether any individual has been overdiagnosed or not. This reality lies at the heart of the conundrum of overdiagnosis and overtreatment, and presents policy makers, practitioners and patients with difficult dilemmas once screen-detected cancer is found, because at that point it is not possible to know whether the cancer detected is destined to be progressive or not.

How Do We Know Breast Cancer Overdiagnosis Exists?

Nonprogressive, harmless breast cancer is a counterintuitive idea. Readers may wonder how we can be sure that overdiagnosis exists, if we cannot identify individual women who are overdiagnosed. To answer this question we need to consider the research evidence which has slowly accumulated since screening mammography was introduced in the 1960s. Taken as a whole, it provides compelling evidence that breast cancer overdiagnosis exists, and it has motivated investigators to conduct studies to try to quantify it.

Theoretical Grounds for Expecting Overdiagnosis

To understand the methods used to quantify overdiagnosis, an understanding of the effects of screening on breast cancer incidence is needed. An increase in incidence is expected with screening, especially after the first (prevalent) screening round. This “prevalence peak” occurs because cancers which were destined to present in subsequent years are detected earlier and so the age-specific incidence is augmented. This pattern of increased incidence continues, to a lesser extent, following every subsequent screening round. In an ideal screening program without overdiagnosis, this excess of cancers during screening will be balanced by a deficit of cancers after screening ceases, because all the cancers that would have developed clinically by then will have already been diagnosed (see Fig. 6.2). This is necessary and desirable; without it the benefit of early detection could not be realized. However, when screening occurs with overdiagnosis, there is a residual excess incidence in the screened population, even many years after screening ends (Fig. 6.3).

Figure 6.2, Illustration of the anticipated effect on incidence of the introduction of screening, hypothetical data.

Figure 6.3, Illustration of the anticipated effect on incidence of the introduction of screening with overdiagnosis, hypothetical data.
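The difference between the two scenarios can be made concrete with a small numerical sketch in Python. Every number below is invented for illustration (it is not trial data), and the helper name `cumulative_excess` is an assumption of this sketch. Screening is assumed to run in years 1 to 10 of a 20-year window.

```python
from itertools import accumulate


def cumulative_excess(screened, control):
    """Cumulative excess cancers (screened minus control) at the end of each year."""
    return [s - c for s, c in zip(accumulate(screened), accumulate(control))]


# Hypothetical annual cancer counts over 20 years; screening runs in years 1-10.
control = [100] * 20

# Ideal program: prevalence peak, smaller excess at later rounds, then a
# compensatory drop after screening ends that cancels the earlier excess.
no_overdiagnosis = [140] + [105] * 9 + [83] * 5 + [100] * 5

# Same compensatory drop, but some screen-detected cancers would never have
# presented clinically, so part of the excess is never paid back.
with_overdiagnosis = [150] + [110] * 9 + [83] * 5 + [100] * 5

excess_ideal = cumulative_excess(no_overdiagnosis, control)
excess_overdx = cumulative_excess(with_overdiagnosis, control)

print("Excess at end of screening (year 10):", excess_ideal[9], excess_overdx[9])
print("Excess at end of follow-up (year 20):", excess_ideal[-1], excess_overdx[-1])
```

In this invented example both programs show an excess while screening is under way, but only the scenario with overdiagnosis retains a residual excess once the compensatory drop has run its course.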

Screening theory predicts overdiagnosis of nonprogressive or very slowly progressing breast cancer because screening will always have a higher probability of detecting slowly progressing cancer or nonprogressing cancer than of detecting rapidly progressing cancer, a phenomenon described as length bias, or prognostic selection (Fig. 6.4).

Figure 6.4, Length (overdiagnosis) bias: A screening episode (represented by the dashed line) will pick up more preclinical cancers that are progressing slowly, and so remain detectable but asymptomatic, for longer. Preclinical cancers that are more rapidly progressing may present with symptoms before the next screening round, as interval cancers. This is why screening can always be expected to detect proportionally more slowly progressing, good prognosis cancers than rapidly progressing cancers, a phenomenon known as length bias (also called prognostic selection bias or overdiagnosis bias). In this hypothetical example, screening detects three slowly progressing cancers (two of which are overdiagnosed) and two fast cancers, and misses one interval (fastest) cancer. Note that very slow growing cancers remain asymptomatic for so long that they may still be “available” to be detected at the next round, even if missed at the initial screening round.
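Length bias can also be demonstrated with a short simulation. The Python sketch below is a deliberately simplified toy model, not a published method: the two cancer types, their detectable preclinical durations (6 years and 1 year), the 20-year window, the single screening episode at year 10, and the function name `simulate_length_bias` are all assumptions chosen for illustration.

```python
import random


def simulate_length_bias(n=20000, screen_time=10.0, seed=0):
    """Count slow and fast cancers overall and among those found at one screen."""
    rng = random.Random(seed)
    # Hypothetical duration (years) of the detectable, asymptomatic phase.
    detectable_years = {"slow": 6.0, "fast": 1.0}
    total = {"slow": 0, "fast": 0}
    detected = {"slow": 0, "fast": 0}
    for _ in range(n):
        kind = "slow" if rng.random() < 0.5 else "fast"
        onset = rng.uniform(0.0, 20.0)  # start of the detectable phase
        total[kind] += 1
        # The screen finds a cancer only if it is detectable, and not yet
        # symptomatic, at the moment of screening.
        if onset <= screen_time < onset + detectable_years[kind]:
            detected[kind] += 1
    return total, detected


total, detected = simulate_length_bias()
share_all = total["slow"] / (total["slow"] + total["fast"])
share_screen = detected["slow"] / (detected["slow"] + detected["fast"])
print(f"Slow cancers as a share of all cancers:     {share_all:.2f}")
print(f"Slow cancers as a share of screen-detected: {share_screen:.2f}")
```

Although slow and fast cancers are equally common in this toy population, slow cancers make up most of the screen-detected cases simply because they spend longer in the detectable phase.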

The possibility that screening mammography might detect nonprogressing breast cancer was recognized early, for example by Feinleib and Zelen in 1969. But their description of the tendency of screening to disproportionately detect low risk cancer was largely ignored. Forrest, in his influential report on breast cancer screening for the UK health ministers in 1986, also mentioned the potential for overdiagnosis in breast cancer screening.

Early Empiric Evidence of Overdiagnosis

Large increases in incidence of early breast cancer have been widely observed, with first observations dating back to the 1970s, coinciding with the introduction of screening mammography in the US. Later, the incidence of breast cancer among women 50–69 years rose between 2 and 10% per year, corresponding with the implementation of screening mammography in countries in Europe, the United Kingdom, North America, and Australia. Studies in Sweden, Norway, Denmark, the United States, Canada, and Australia have shown marked increases in breast cancer incidence with the introduction of screening, even after breast cancer risk factors (reproductive factors, BMI, use of hormone therapy, and alcohol consumption) have been taken into account. These rises in incidence led Fox, and many others subsequently, to consider whether screening mammography itself could be driving up the incidence of breast cancer.

Arguably, more important than simply observing increasing incidence are changes in incidence patterns of early and advanced breast cancer. In an optimal screening scenario, as described by Esserman, rates of advanced cancer should drop in response to increased detection (and treatment) of early stage disease. In turn, this drop in rates of advanced cancer should be a precursor to declining cancer mortality, and both are markers of an effective cancer screening intervention which improves health outcomes. Esserman, however, also described worst-case and intermediate-case scenarios. In a worst-case scenario, the incidence of early cancer rises with the introduction of screening but with no subsequent reduction in advanced stage cancer. Such a pattern, if observed, would strongly suggest overdiagnosis. For a visual representation of Esserman’s optimal, intermediate and worst-case screening scenarios, see Reference 23.

Patterns consistent with an intermediate case scenario (as described by Esserman) in relation to breast cancer have been observed in recent decades. Rates of advanced cancer have declined modestly, or not at all in regions where screening mammography has been implemented, while rates of early breast cancer have substantially increased. More recently, in 1998, the upper age limit for screening mammography in the Netherlands was extended from 69 to 75 years; after this change the incidence of early breast cancer strongly increased between 1998 and 2011 with only a small decrease in advanced breast cancer rates.

In particular, large increases in DCIS incidence have been observed, coincident with the introduction of screening mammography. For example, Kerlikowske documented a 500% increase in incidence of DCIS among women over 50 years of age between 1983 and 2003. DCIS now accounts for 20–25% of screen-detected breast cancer. Notably, despite more than 20 years of detection and treatment of DCIS, incidence rates of invasive breast cancer have not declined. This suggests that DCIS may not be an inevitable precursor lesion of most invasive cancers as previously thought. Instead, it makes more sense to think of DCIS as a marker of risk for future invasive breast cancer.

With hindsight, we can now characterize the changes which have occurred in recent decades as screening mammography has been implemented. These changes are outlined in Table 6.1 and suggest screening is detecting low risk or nonprogressing breast cancer, some of which would never have become life-threatening.

Table 6.1

Characteristics of Screening Mammography Implementation Which May Signal Overdiagnosis

Characteristic		Notes and Explanation
Wide uptake of a cancer screening test with potential for overdiagnosis	✓	Screening mammography has expanded steadily in western countries over the last three decades. Currently 72% of US women aged 50–74 years report being screened for breast cancer in the last 2 years. Participation in regular (3 yearly) screening mammography is about 75% among women 50–70 years in the United Kingdom.
Increased cancer incidence following introduction of screening	✓	Evidence in 35 countries of increased incidence in the years after the introduction of screening mammography.
Increased early stage cancer incidence without commensurate decrease in advanced cancer rates, and/or mortality	✓	In an ideal screening scenario (without overdiagnosis) increased detection of early stage cancer should reduce advanced cancer rates and mortality. In intermediate and worst-case scenarios (adversely affected by overdiagnosis), early stage cancer incidence increases strongly, while advanced cancer rates and/or mortality decline only modestly or not at all. Large increases in early breast cancer (DCIS and early invasive breast cancers) have been consistently observed, with only modest declines in advanced stage cancer rates. Breast cancer mortality rates are declining; however, this is likely due to improvements in treatment, particularly the introduction of adjuvant therapy, as well as to screening.
Improved case-fatality rate and/or stable mortality rate	✓	Five-year survival figures steadily improved with uptake of screening mammography, and are now >90% for early breast cancer.
Evidence of increased treatment and treatment related harms	✓	Significant increases in the provision of breast cancer treatment (in absolute terms) have been observed in response to the increasing incidence. Breast surgery is provided to almost all women with screen-detected cancer, and increased rates of mastectomy have been observed in association with screening mammography and in response to increasing detection of DCIS. As a consequence of increased treatment provision, the absolute number of women experiencing short and long-term adverse effects of breast cancer treatment has also increased.

How Can Breast Cancer Overdiagnosis Be Quantified? (Methods)

The current accepted view is that breast cancer overdiagnosis exists, but how frequently it occurs remains unclear. To quantify breast cancer overdiagnosis, there are several preliminary considerations. The first is: how should we numerically express overdiagnosis so that an estimate of its frequency is comprehensible, useful, and consistently expressed between studies? Other essential considerations are: given we cannot (yet) identify overdiagnosed individuals, what types of studies can we use to quantify it? And what kind of biases may affect them? We consider these issues now, before moving to consider the results of the studies to date in the next section.

How Is Overdiagnosis Expressed?

There are several ways of expressing overdiagnosis in numeric terms, as described by Marmot, Etzioni, de Gelder, and Welch. Understanding the differences in these expressions is necessary to be able to accurately interpret and apply the results of studies that quantify overdiagnosis. Variations chiefly relate to choice of denominator.

Different expressions of overdiagnosis can be illustrated by reference to the data shown in Table 6.2, from the Malmö trial, a large randomized trial of screening mammography conducted in Sweden from 1976. In this trial, women aged 45–69 at randomization were invited to screening every 18–24 months between 1976 and 1990. After the active intervention phase (screening period), screening was still offered to the younger women (aged 45–54 years at randomization), but not to the older women (aged 55–69 years at randomization). Follow-up of all women continued up to December 2001. As most of the screening mammography RCTs invited the control group to screen after the intervention period, this unusual trial design provides a rare opportunity to observe incidence changes during and after screening within the context of a randomized trial. At the end of the screening period of the trial, there were more cancers in the invited group (438 vs 324), an excess of 114 cancers (see Table 6.2) among the population of older women. This is expected because screening detects cancers earlier than they would have been diagnosed in the absence of screening due to lead time (see “How Do We Know Breast Cancer Overdiagnosis Exists?” section). Once screening ends, the rate of detection of cancers in the previously screened group should slow compared to the control group, compensating for the earlier increased incidence. This is called the “compensatory drop” in the invited group (or sometimes “catch-up” by the control group). This “compensatory drop” in the invited group can be observed in the Malmö I trial data shown in Table 6.2: 5 years after the end of screening, the excess incidence in the invited group compared to the control group has been reduced from 114 extra cancers to 82 extra cancers.

Table 6.2

Data From Malmö I Trial for Women Aged 55–69 Years (ie, Born 1908–22): 10 Years of Screening and an Average of 15 Years Follow-Up

Data from Ref. .

	Cumulative Number of Cancers in the Invited to Screening Mammography Arm ( N = 20,695)	Cumulative Number of Cancers in the Control Arm ( N = 20,783)	Difference
Years 1–10 (screening years: screening every 18–24 months)	438 (282 screen detected only, ie, excluding interval cancers)	324	114
Years 1–15 (ie, screening years plus 5 years follow-up past the end of screening)	780	698	82

This excess of 82 cancers therefore represents the overdiagnosed cancers, once lead time (compensatory drop) is allowed for. Any method that attempts to measure the percentage of overdiagnosis attributable to screening mammography should use the excess cancers allowing for lead time in the numerator of the percentage calculation.

The choice of denominator depends on the desired perspective. For example, the overdiagnosed cancers (allowing for lead time), N = 82, can be expressed as a percentage of all cancers diagnosed during the screening period. This expression is the method preferred and recommended by the Independent UK Panel on Breast Cancer Screening as the clearest and most intuitive way of providing information to women about the risk of overdiagnosis from screening mammography (Method C from the Independent UK Panel on Breast Cancer Screening). In this example its numeric value is 82/438 or 18.7%, which is interpreted as meaning that 18.7% of all cancers diagnosed during the period of screening were overdiagnosed. Alternatively, the same excess of 82 cancers can be expressed as 82/282, that is, 29% of screen-detected cancers (Method D from the Independent UK Panel on Breast Cancer Screening).
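The arithmetic behind these two expressions can be checked with a few lines of Python. This is a minimal sketch: the counts are those quoted above from the Malmö I data, while the function and variable names are hypothetical and chosen here only for readability.

```python
def overdiagnosis_percent(excess_cancers, denominator):
    """Express excess (overdiagnosed) cancers as a percentage of a chosen denominator."""
    return 100.0 * excess_cancers / denominator


# Counts quoted in the text for women aged 55-69 in the Malmö I trial.
excess_after_follow_up = 82         # excess remaining 5 years after screening ended
all_cancers_during_screening = 438  # invited arm, screening years (Method C denominator)
screen_detected_cancers = 282       # invited arm, screen detected only (Method D denominator)

method_c = overdiagnosis_percent(excess_after_follow_up, all_cancers_during_screening)
method_d = overdiagnosis_percent(excess_after_follow_up, screen_detected_cancers)

print(f"Method C: {method_c:.1f}% of cancers diagnosed during the screening period")
print(f"Method D: {method_d:.1f}% of screen-detected cancers")
```

The numerator is identical in both calls; only the denominator changes, which is why the same 82 excess cancers yield two quite different percentages.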

Other expressions of the percentage of overdiagnosis are also possible, for other purposes. For example, if you prefer to express the overdiagnosed cancers as a percentage of all the cancers diagnosed during women’s lifetimes, then all cancers to the age of 84 (or some other long period of follow-up) may be the best denominator. As an example of this expression, the data from the Canadian National Breast Screening Study can be used as this trial continued to collect follow-up information for many years. The Canadian National Breast Screening Study provided 5 years of annual screening, and follow-up continued for 20 years after the end of the screening period without either group being invited to screening. With long-term follow-up after the end of screening (ie, allowing for lead time), there were 117 excess cancers in the invited group. These 117 extra are 3.6% of the 3250 cancers ever diagnosed in the invited group over 25 years (see Table 6.3). Thus, this expression, which gives overdiagnosed cancers as a percentage of all cancers detected during screening plus all subsequently diagnosed cancers no matter how long after screening they were found (Method B from the Independent UK Panel), is much smaller than the overdiagnosed cancers as a percentage of all cancers found in the invited group during the screening period (117/666 = 17.6%). These calculations demonstrate that including all cancers diagnosed over long-term follow-up in the denominator greatly diminishes the apparent frequency of overdiagnosis. Some argue this is inappropriate because it dilutes the estimate of overdiagnosis by including in the denominator many cancers which have nothing at all to do with screening. Furthermore, any estimate will vary depending on exactly how many years of follow-up after screening are included in the denominator, making it difficult to compare estimates from different studies.

Table 6.3

Data From Canadian National Breast Screening Study, Women Aged 40–59 Years; 5 Years of Screening and an Average of 25 Years of Follow-Up

Data from Ref. .

	Cumulative Number of Cancers in the Screening Mammography Arm ( N = 44,925)	Cumulative Number of Cancers in the Control Arm ( N = 44,910)	Difference
Years 1–5 (screening years, 5 annual rounds)	666 (484 screen detected only, ie, excluding interval cancers)	524	142
Years 1–10 (ie, screening years plus 5 years follow-up past the end of screening)	1180	1080	100
Years 1–25 (ie, screening years plus 20 years follow-up past the end of screening)	3250	3133	117
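The dilution effect described in the text can be reproduced from the invited-arm counts in Table 6.3. The Python sketch below holds the numerator fixed at the long-term excess of 117 cancers and varies only the denominator. The 5-year and 25-year rows correspond to the two percentages quoted in the text; the 10-year row is included purely as an illustration of how the result depends on follow-up length and is not an estimate reported by the source. Variable names are hypothetical.

```python
# Cumulative cancers in the invited arm, taken from Table 6.3.
invited_arm_cancers = {
    "Years 1-5 (screening period)": 666,
    "Years 1-10": 1180,
    "Years 1-25": 3250,
}
control_arm_cancers_25_years = 3133

# Long-term excess, allowing for lead time.
excess = invited_arm_cancers["Years 1-25"] - control_arm_cancers_25_years

# Same numerator, progressively larger denominators.
percentages = {
    period: 100.0 * excess / cancers
    for period, cancers in invited_arm_cancers.items()
}

for period, pct in percentages.items():
    print(f"{period}: {excess}/{invited_arm_cancers[period]} = {pct:.1f}%")
```

The longer the follow-up included in the denominator, the smaller the percentage becomes, even though the number of excess cancers has not changed.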

Expressions other than percentage risk of overdiagnosis are possible too (for example, a relative risk of overdiagnosis), but are less readily interpretable and not recommended by the Independent UK Panel. These will not be discussed further, but it may be useful for readers to bear in mind that there is no single “right” expression, as some may be useful for different purposes. Readers of studies that seek to quantify overdiagnosis are advised to check carefully how estimates of overdiagnosis are expressed in any particular study.
