The benefits of mammography screening have been studied in nine randomized controlled trials comparing women screened with those not screened. Randomized controlled trials are regarded as the strongest type of study to determine whether screening is effective because they provide the most reliable results about the differences in outcomes between screening and nonscreening groups. It is important to appreciate the differences between the trials and their limitations in order to understand the results.
The trials began between 1965 and 1991, and involved over 600,000 women in the United States, Canada, United Kingdom, and Sweden. The trials varied in their recruitment of women, the numbers of women enrolled, and methods of assigning screening. Some trials enrolled only women in their 40s or 50s, while others included various broader age groups ranging from 39 to 74. In some trials, women were screened every 12 months, while others screened every 33 months. The lengths of screening varied from 4 to 10 years, and follow-up ranged from 11 to 25 years. The trials also differed in the ways they counted breast cancer cases and analyzed their results.
Several limitations of the trials have been recognized. An important concern has been whether the screening and control groups are truly similar at the start of the trials, as well as throughout their duration. Also, the relevance of the screening trials to current populations and practice is likely to have diminished over time. All of the trials were conducted when imaging technologies and breast cancer therapies differed markedly from those in use today. Whether results from trials using these outdated practices would be the same if they were conducted today is questionable. In addition, as with all research studies, the effects of screening for women in trials may be different than for women in the general population. Research has found that women who enrolled in some of the breast cancer screening trials were more educated and had higher risks of breast cancer compared with the general population.
When the results of the trials were combined using statistical methods, summary estimates indicated that death from breast cancer was lower for women undergoing screening compared to those who were not screened, but results varied by age. Results for women in their 40s and 50s were of borderline statistical significance and varied by how cases were counted in the trials. Trials did not enroll enough women age 70 and older to provide reliable results for older women. Assuming that differences actually exist, the estimates can be expressed as the number of breast cancer deaths prevented for 10,000 women screened for 10 years as: 3 deaths for women age 40–49; 5–8 deaths for age 50–59; and 12–21 deaths for age 60–69. The trials also reported that age-specific deaths from all causes, not just death from breast cancer, were not reduced with screening. Cases of advanced breast cancer were reduced for women age 50 and older, but not younger women.
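The conversion from a relative effect to a number of deaths prevented per 10,000 women can be illustrated with a short calculation. The sketch below is only an illustration: the baseline of 30 breast cancer deaths per 10,000 unscreened women over 10 years and the relative risk of 0.75 are hypothetical values chosen for easy arithmetic, not figures from the trials, and the function name is invented for this example.

```python
def deaths_prevented_per_10000(baseline_deaths_per_10000: float,
                               relative_risk: float) -> float:
    """Absolute reduction in deaths per 10,000 women.

    baseline_deaths_per_10000: deaths expected without screening (hypothetical input).
    relative_risk: deaths with screening divided by deaths without screening.
    """
    if baseline_deaths_per_10000 < 0 or relative_risk < 0:
        raise ValueError("inputs must be non-negative")
    # Deaths with screening = baseline * RR, so deaths prevented = baseline * (1 - RR).
    return baseline_deaths_per_10000 * (1.0 - relative_risk)


# Hypothetical inputs, for illustration only.
prevented = deaths_prevented_per_10000(30.0, 0.75)
print(prevented)
```

The same relative risk prevents more deaths when the baseline risk is higher, which is one reason absolute estimates can rise with age even when relative effects are similar.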
Although the screening trials provide the strongest type of research to determine whether mammography screening is beneficial, the trials are outdated and may not represent current practices. Results indicate only small reductions in breast cancer deaths among women who were screened, but vary by age, with the largest effects (the most benefit) among women age 60–69.
The benefits of mammography screening have been studied in nine main randomized controlled trials (RCTs) beginning between 1965 and 1991, and involving over 600,000 women in the United States, Canada, United Kingdom, and Sweden. Trials provide estimates of the effectiveness of routine mammography screening by comparing breast cancer-specific mortality between women randomized to screening versus no screening. Additional outcomes related to benefit include reduced all-cause mortality and incidence of advanced breast cancer.
Age is an important factor in the design, analysis, and interpretation of the screening trials because the risk for breast cancer, types of breast cancer, and performance of screening technologies differ by age. Accordingly, guideline development groups depend on age-specific evidence to provide screening recommendations. To address these issues, this chapter presents results of the trials according to the age of participants when available.
RCTs are regarded as the strongest study design to determine effectiveness, because a well-designed RCT is less susceptible to bias than other study designs. Bias refers to any effect of the design or conduct of a study that systematically favors one comparison group over others. Studies with greater risks of bias are more likely to yield incorrect results. RCTs provide direct comparisons between screened and unscreened groups, while studies of women who are screened but not compared with unscreened women are unable to account for other factors that may explain outcomes.
Randomization is an important aspect of an RCT’s design and refers to how participants are allocated to screening and control groups. A truly randomized method of allocation means that the assignment of screening or control groups is not predictable by the participant, researchers, or others. Randomization largely eliminates the problem of selection bias and associated confounding. With successful randomization and a large enough sample size, important confounders are equally distributed among comparison groups, including confounders that are unrecognized or not measured. Ideally, in this way, the comparison groups are similar except for whether they are screened or not, and differences in outcomes can be correctly attributed to screening.
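The balancing effect of randomization on an unmeasured confounder can be demonstrated with a small simulation. This is a minimal sketch under stated assumptions: the cohort size, the 15% prevalence of the hypothetical trait, and the function name are all invented for illustration and do not describe any of the trials.

```python
import random


def simulate_randomization(n_participants: int,
                           confounder_prevalence: float,
                           seed: int) -> dict:
    """Randomly allocate participants and report how common an
    unmeasured trait is in each arm (all inputs are hypothetical)."""
    rng = random.Random(seed)
    arms = {"screening": [], "control": []}
    for _ in range(n_participants):
        # The trait exists but is never used when assigning the arm.
        has_trait = rng.random() < confounder_prevalence
        # Allocation is an independent coin flip, so it cannot be predicted
        # from any participant characteristic.
        arm = "screening" if rng.random() < 0.5 else "control"
        arms[arm].append(has_trait)
    # Proportion of participants with the trait in each arm.
    return {arm: sum(flags) / len(flags) for arm, flags in arms.items()}


prevalence_by_arm = simulate_randomization(50_000, 0.15, seed=1)
print(prevalence_by_arm)
```

With a large sample the two proportions are close to each other even though the trait was never measured or used in allocation; with a small sample, chance imbalances become more likely.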
In general, the strength of evidence is determined by judging how well individual studies were designed and executed to reduce bias, as well as their place on a study design-based evidence hierarchy (Fig. 2.1). Results of appropriately performed RCTs are generally more reliable than results of nonrandomized or observational studies, and they are placed at the highest level of the evidence hierarchy. However, the evidence hierarchy assumes that RCTs are well conducted. A poor-quality RCT could yield results that are as misleading as, or more misleading than, those from studies lower in the evidence hierarchy. That is why it is important to critically evaluate and understand the quality of each trial and its contribution to the overall strength of evidence in order to determine the effectiveness of breast cancer screening.
The nine main RCTs of breast cancer screening include the Health Insurance Plan of Greater New York (HIP) trial, Edinburgh trial, Canadian National Breast Screening Study 1 (CNBSS-1), Canadian National Breast Screening Study 2 (CNBSS-2), United Kingdom Age trial, Stockholm trial, Malmö Mammographic Screening Trial (referred to separately as MMST I and MMST II), Gothenburg trial, and Swedish Two-County Study (referred to separately as Östergötland and Kopparberg). For some trials, results are sometimes combined (eg, Canadian) or provided separately (eg, CNBSS-1, CNBSS-2) leading to different ways of counting the number of trials. The descriptions of trials and metaanalyses in this chapter are based on a systematic review of the most recent trial results. In these analyses, the number of trials was counted as the number of discrete data sources contributing to each summary estimate.
All trials were designed as RCTs, although they varied in their recruitment of participants and controls, sizes, methods of randomization, and screening protocols (Tables 2.1 and 2.2). Some trials enrolled only women in their 40s (CNBSS-1 and Age) or 50s (CNBSS-2), while others included various broader age ranges. The two Canadian trials recruited volunteers, the Age trial identified women from general practice lists, and the HIP trial recruited women enrolled in a health insurance plan. The other trials enrolled participants based on their residence in communities.
Trial | Age, y | Year Trial Began | Screening, n; Control, n a | Population | Comparison Groups | Method of Randomization |
---|---|---|---|---|---|---|
Health Insurance Plan of New York (HIP) | 40–64 | 1965 | 30,239; 30,765 | New York health plan members | Mammography + clinical breast examination vs usual care | Individual based on stratification by age and family size and drawn from a list |
Canadian National Breast Screening Studies (CNBSS-1 & CNBSS-2) | CNBSS-1: 40–49; CNBSS-2: 50–59 | 1980 | CNBSS-1: 25,214; 25,216; CNBSS-2: 19,711; 19,694 | Self-selected from 15 centers in Canada | Mammography + clinical breast examination vs usual care b | Block stratified by center and 5-year age group |
Edinburgh | 45–64 | 1978 | 28,628; 26,015 | All women from 87 general practices in Edinburgh, Scotland | Mammography + clinical breast examination vs usual care | Cluster based on general practitioner practices |
Malmö Mammographic Screening Trial (MMST I & II) | 43–69 | 1976–78 | MMST I: 21,088; 21,195; MMST II: 9581; 8212 | All women born between 1908 and 1945 living in Malmö, Sweden | Mammography vs usual care | Individual within birth year |
Stockholm | 40–64 | 1981 | 40,318; 19,943 | Residents of Stockholm, Sweden | Mammography vs usual care | Individual by day of month |
Swedish Two-County | 40–70 | 1977 | 77,080; 55,985 | Women from Östergötland and Kopparberg counties in Sweden | Mammography vs usual care | Cluster based on demographically homogeneous geographic units |
Gothenburg | 39–59 | 1982 | 21,650; 29,961 | All women born between 1923 and 1944 living in Gothenburg, Sweden | Mammography vs usual care | Cluster based on day of birth for women born 1923–35 (18%); individual for women born 1936–44 (82%) |
Age | 39–41 | 1991 | 53,884; 106,956 | Women from 23 National Health Service breast screening units in England, Scotland, and Wales | Mammography vs usual care | Individual stratified by general practitioner group c |
a Numbers of participants in screening and control groups vary by publication.
b All women were prescreened with clinical breast examinations and instructed in breast self-examination. For women 50–59, usual care involved annual clinical breast examinations.
c Used random number generation between 1991 and 1992, then Health Authority computer system.
Trial | Screening Interval, months | Rounds, n | Views, n | Adherence, % | Screening Duration, y | Controls Screened | Longest Follow-Up, y |
---|---|---|---|---|---|---|---|
Health Insurance Plan of New York (HIP) | 12 | 4 | 2 | 46 | 4 | After trial completed | 18 |
Canadian National Breast Screening Studies (CNBSS-1 & CNBSS-2) | 12 | 4–5 | 2 | 85 | 4.5 | At age ≥50 after trials completed | 25 |
Edinburgh | 24 | 2–4 varied by cohort | 1–2 | 61 | 2–8 varied by cohort | After 6–10 years, varied by cohort | 10–14 varied by cohort |
Malmö Mammographic Screening Trial (MMST I & II) | 18–24 | 9 | 1–2 | 70 | 10+ | After 14 years | 11–13; 15.5 |
Stockholm | 24–28 | 2 | 1 | 81 | 4.8 | After 5 years | 11.4 |
Swedish Two-County | 24–33 | 3 | 1 | 84 | 7 | After 7 years | 20; 15.5 |
Gothenburg | 18 | 5 | 1–2 | 75 | 9 | After 5 years | 12 |
Age | 12 | 4–6, varied by center | 2 | 57 | 9 | At ages 50–52 | 17 |
The Gothenburg trial used individual (82%) and cluster (18%) randomization methods, and the Swedish Two-County and Edinburgh trials used cluster randomization by community. The other five trials randomized participants at the individual level with some stratifying randomization by age (HIP, CNBSS-1, CNBSS-2, Malmö), clinical center (CNBSS-1, CNBSS-2, Age), or other factors (HIP, Stockholm, Swedish Two-County). Seven trials established control groups that received usual care. At the time of the trials, screening mammography was not usually provided as usual care, or was offered only at specific age thresholds. Participants in the Canadian trials received baseline clinical breast examinations and instructions in breast self-examination before randomization to either screening or control groups.
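Stratified block randomization at the individual level can be sketched in a few lines. The example below is a generic permuted-block scheme within strata defined by center and age group; it is an assumption-laden illustration, not the documented procedure of any trial, and the block size of 4, the participant tuples, and the function name are hypothetical.

```python
import random
from collections import defaultdict


def stratified_block_randomize(participants, block_size=4, seed=0):
    """Assign arms using permuted blocks within each (center, age_group) stratum.

    participants: iterable of (participant_id, center, age_group) tuples.
    Returns a dict mapping participant_id to "screening" or "control".
    """
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for 1:1 allocation")
    rng = random.Random(seed)
    remaining = defaultdict(list)  # stratum -> unused slots in the current block
    assignments = {}
    for participant_id, center, age_group in participants:
        stratum = (center, age_group)
        if not remaining[stratum]:
            # Start a new block with equal numbers of each arm, in random order.
            block = ["screening", "control"] * (block_size // 2)
            rng.shuffle(block)
            remaining[stratum] = block
        assignments[participant_id] = remaining[stratum].pop()
    return assignments


# Hypothetical participants: eight women in one stratum, four in another.
cohort = [(i, "center_A", "40-44") for i in range(8)]
cohort += [(i, "center_B", "45-49") for i in range(8, 12)]
arms = stratified_block_randomize(cohort)
print(arms)
```

Because every completed block contains equal numbers of each arm, group sizes stay balanced within each stratum. Cluster randomization differs in that the unit assigned is a whole practice or geographic area rather than an individual woman.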
None of the trials used digital or three-dimensional digital mammography (digital breast tomosynthesis) for screening. The HIP trial used direct-exposure film mammography and the other trials used screen-film mammography. The Age trial and four Swedish trials provided mammography alone, and the other trials provided mammography combined with clinical breast examination. Protocols varied, with screening intervals ranging from 12 to 33 months; rounds from 2 to 9; and durations from 4 to over 10 years. Follow-up ranged from 11 to 25 years.
Breast cancer mortality was the main outcome measure in all of the screening trials. Most trials also reported all-cause mortality and breast cancer incidence. Ductal carcinoma in situ (DCIS) was included in breast cancer case reporting for most trials (Gothenburg, Stockholm, Malmö, Swedish Two-County, Age, Edinburgh, HIP), while the Canadian trials included only invasive breast cancer in their latest update. All of the trials provided information on the stage, size, or lymph node involvement of cases. However, these outcomes were reported differently across the trials using various descriptions and levels of severity. All trials compared differences between screening and control groups using intention-to-screen analysis.
Two methods of accrual of breast cancer cases and deaths were used in determining outcomes in the trials (Fig. 2.2). These methods affect the analysis of outcomes and are important to recognize when interpreting trial results. The long case accrual method counts all of the breast cancer cases contributing to breast cancer deaths diagnosed during the screening intervention period plus the follow-up period. This method has been referred to as the “follow-up” method of analysis by some investigators. While this method includes the most cases, it has the potential to dilute a true benefit because participants from the control group may have been screened after the study intervention period ended.
The short case accrual method includes only deaths occurring among cases of breast cancer diagnosed during the screening intervention period, and in some trials, within an additional defined case accrual period. This has been referred to as the “evaluation method” of analysis by some investigators. This method always involves the evaluation of fewer breast cancer cases for mortality outcomes because the duration of case accrual is shorter than for the long accrual method.
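The difference between the two accrual methods comes down to which diagnoses are eligible to contribute deaths. The sketch below illustrates that rule with four invented cases; the class, the function, the 5-year screening period, and the 15-year follow-up are hypothetical and do not come from any trial.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    diagnosis_year: float                      # years since randomization
    breast_cancer_death_year: Optional[float]  # None if no breast cancer death


def count_breast_cancer_deaths(cases, accrual_end, follow_up_end):
    """Count breast cancer deaths by follow_up_end among cases
    diagnosed no later than accrual_end."""
    return sum(
        1
        for case in cases
        if case.diagnosis_year <= accrual_end
        and case.breast_cancer_death_year is not None
        and case.breast_cancer_death_year <= follow_up_end
    )


# Hypothetical cases, for illustration only.
cases = [
    Case(2.0, 6.0),    # diagnosed during screening, died during follow-up
    Case(3.5, None),   # diagnosed during screening, no breast cancer death
    Case(9.0, 12.0),   # diagnosed after screening ended, died during follow-up
    Case(11.0, 16.0),  # diagnosed during follow-up, died after follow-up ended
]
screening_end, follow_up_end = 5.0, 15.0

# Short accrual: only cases diagnosed during the screening period count.
short_count = count_breast_cancer_deaths(cases, screening_end, follow_up_end)
# Long accrual: cases diagnosed at any time through follow-up count.
long_count = count_breast_cancer_deaths(cases, follow_up_end, follow_up_end)
print(short_count, long_count)
```

In this example the third case is counted only under long accrual, which shows why the short method always evaluates the same number of cases or fewer. A trial that adds a defined case accrual period after screening would set the accrual cutoff somewhere between the two values used here.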
Methodological limitations of the trials have been widely described. A major concern has been comparability of screening and control groups at the time of randomization, as well as maintenance of comparability during the trial and follow-up periods. For example, differences between baseline characteristics of participants in the Edinburgh trial have been considered important enough to invalidate its randomization. For this reason, the results of the Edinburgh trial are usually not included in quantitative metaanalysis of the screening trials. However, few baseline characteristics were collected in most trials, and data are lacking to determine whether differences in important variables, such as family history of breast cancer, may actually exist.
Also, the clinical relevance (applicability) of the screening trials to current populations and practice is likely to have diminished over time. All of the trials were conducted when imaging technologies and breast cancer therapies differed markedly from those in use today. Whether the results of screening with film mammography, particularly with only one view as used in some trials, are similar to screening with two views using current technology is questionable. Also, advances in breast cancer treatment, particularly more widespread use of adjuvant therapies and new types of therapies, have improved mortality even for women diagnosed at later stages. As a result, the use of more effective treatments may have diminished the impact of screening.
As with all clinical trials, the effects of screening in trial participants may not represent effects in general populations. Women who enroll in trials and attend screening interventions frequently differ from those who do not, underscoring the importance of intention-to-screen analysis to evaluate outcomes. Adherence rates varied widely across the trials, from 46% in the HIP trial to 85% in the Canadian trials. These differences provide additional insights when interpreting the results of the trials and considering their applicability to clinical populations.
Two trials that enrolled participants based on their membership in an insurance plan (HIP) or residence in a specific city (Stockholm) evaluated differences between women randomized to the intervention group who attended the screenings compared with those who did not attend. In these trials, attendees had higher risks of breast cancer and lower risks of all-cause mortality than nonattendees. In the Canadian trials, volunteers were recruited from several communities. Compared with the general population, trial participants were more educated, had fewer pregnancies, and had overall higher risks for breast cancer. These findings suggest that women who were more knowledgeable about breast cancer, recognized their individual risks, and were interested in improving their health were more likely to participate in mammography screening. These are important differences that could introduce bias and influence results.
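Why intention-to-screen analysis matters when attendees differ from nonattendees can be shown with a toy comparison. The sketch below uses entirely hypothetical counts, in which nonattendees in the screening arm have higher mortality than attendees; the class and function names are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    assigned: str   # "screening" or "control" (arm assigned at randomization)
    attended: bool  # whether screening was actually attended
    died: bool      # outcome of interest


def mortality(group):
    group = list(group)
    return sum(p.died for p in group) / len(group)


def intention_to_screen_rr(cohort):
    """Compare arms as randomized, regardless of attendance."""
    screening = [p for p in cohort if p.assigned == "screening"]
    control = [p for p in cohort if p.assigned == "control"]
    return mortality(screening) / mortality(control)


def attendee_only_rr(cohort):
    """Compare attendees with controls; this breaks the randomization."""
    attendees = [p for p in cohort if p.assigned == "screening" and p.attended]
    control = [p for p in cohort if p.assigned == "control"]
    return mortality(attendees) / mortality(control)


def make_group(assigned, attended, n, deaths):
    return [Participant(assigned, attended, i < deaths) for i in range(n)]


# Hypothetical cohort: 70% attendance in the screening arm.
cohort = (
    make_group("screening", True, 700, 7)     # attendees: 1.0% mortality
    + make_group("screening", False, 300, 6)  # nonattendees: 2.0% mortality
    + make_group("control", False, 1000, 14)  # controls: 1.4% mortality
)
print(round(intention_to_screen_rr(cohort), 3))
print(round(attendee_only_rr(cohort), 3))
```

In this invented cohort the attendee-only comparison suggests a much larger benefit than the comparison of groups as randomized, because healthier women selected themselves into attendance. Only the intention-to-screen comparison preserves the balance that randomization created.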
The HIP trial began in 1963, and was the first RCT of mammography screening. Nonpregnant women age 40–64 without prior breast cancer with 1 year of membership in the insurance plan were eligible. Age and family size-stratified pairs of women were drawn from a list and individually randomized to either intervention or usual care control groups.
The intervention consisted of two-view film mammography, a clinical breast examination performed by a physician (usually a surgeon), and an interview to obtain relevant demographic information and health history. Adherence to the initial exam was 67%, and only adherent women were offered subsequent screening with three annual exams. Independence between screening mammography, clinical breast examination, and interview results was strictly maintained. Few women in the control group had mammography screening during the trial. Cases of breast cancer were ascertained using insurance plan records, claims files, death records, cancer registry, and the National Death Index. All causes of death were determined by blind record review.
The two randomized groups were similar in terms of age, religion, marital status, prior pregnancies, and all-cause mortality. Although prior breast cancer was a prespecified exclusion criterion for entry, prior mastectomy status was more completely ascertained for the screening than control group, resulting in differences in the groups’ sizes (30,239 vs 30,756) and potential noncomparability. However, when breast cancer deaths were evaluated, only deaths occurring among breast cancer cases diagnosed within the study period or in a defined period of observation were considered, depending on the analysis.
Information important for critical appraisal of the trial varied across more than 40 publications. Limitations of the HIP trial include inadequate description of blinding, low adherence, inconsistent reporting of the numbers of participants in publications, and inadequate reporting of outcomes in screening and control groups.
The CNBSS-1 evaluated the effectiveness of mammography, annual clinical breast examination by physicians or nurses, and breast self-examination instruction compared with usual care in reducing breast cancer mortality among women age 40–49. The trial was conducted in 15 centers in Canada from 1980 to 1985. Volunteers without prior breast cancer or mammography in the prior 12 months were recruited using mass media advertising. After a clinical breast examination and instruction on breast self-examination, 25,214 women were randomized to receive two-view mammography plus clinical breast examination for 4–5 annual screens, and 25,216 women were randomized to usual care. Usual care was based on general care in the community that did not include routine mammography screening, and breast self-examination was reinforced for those returning to the clinical centers. Abnormalities detected during the study were evaluated through referral to the CNBSS review clinic, where the study surgeon reviewed the case and mammogram with the study radiologist and made recommendations to the participant’s physician for evaluation and follow-up.
The CNBSS-2 was designed to determine the incremental benefit of adding mammography to annual clinical breast examination by physicians or nurses and breast self-examination instruction in reducing breast cancer mortality among women age 50–59. Study participants were recruited through mass media advertising and randomized after clinical breast examinations to either annual clinical breast examination alone (19,694 participants) or annual clinical breast examination plus two-view mammography (19,711 participants) for 5 annual screens. In addition, breast self-examination instruction was provided at the first visit and reinforced at each screening visit for participants in both screening and control groups.
The Canadian trials differed from the other screening trials because they obtained medical histories and clinical breast examinations on all participants before randomization, and the trials enrolled volunteers with different breast cancer risk profiles and health habits than the general population, as detailed above in Limitations. Adherence for both trials was 85%. At the initial screen, more women age 40–49 had breast cancer with 4 or more positive nodes in the screening compared with the control group (17 vs 5), suggesting that groups may not have been entirely comparable. However, this could also be the result of mammography detection of advanced as well as early breast cancer and was observed in other trials as well.