The Role of Microsimulation Modeling in Evaluating the Outcomes and Effect of Screening


Plain Language Summary

Microsimulation is a technique in which complex processes are modeled on computers programmed to combine information about the separate components of a process into an overall picture. For breast cancer screening, this means combining information about the epidemiology of breast cancer, the use and accuracy of screening to detect tumors earlier, and the response of breast cancer to treatment at different times, to draw conclusions about the effect of actual or hypothetical screening programs on the frequency of occurrence of, and death from, breast cancer. In this chapter, we review the use of microsimulation to inform breast cancer screening practices and policy.

This chapter begins with a description of some existing microsimulation models and how they are developed and tested for validity. We then review the use of simulation models to provide deeper understanding of the results of clinical trials of mammography and observational data, highlighting, for example, our enhanced understanding of the relative contributions of mammography and adjuvant hormonal treatment and chemotherapy to the ongoing decline in breast cancer mortality.

A particularly important application of microsimulation is projecting the results of different strategies for using mammography. Numerous conflicting guidelines have been proposed regarding starting age, stopping age, and the interval between mammograms. None of these guidelines can be directly supported by evidence, because clinical trials have not used precisely those protocols and have not provided lifelong follow-up. Microsimulations fill this gap by implementing the protocols under consideration digitally and applying them to simulated women whose “breast cancer histories” mimic those of real women. In addition to providing insight into screening schedules not studied in clinical trials, microsimulations can shed light on the potential of established or emerging imaging techniques to reduce breast cancer mortality when used in addition to, or instead of, mammography.

Microsimulations also underlie cost-effectiveness analyses of mammographic screening programs, and several examples are discussed.

This chapter continues with a discussion of the strengths and weaknesses of microsimulation as a source of understanding, and the special advantages associated with the collaborative use of multiple models. We conclude with an outline of possible future directions.

Introduction

This chapter focuses on the role of microsimulation modeling in evaluating the effects of breast cancer screening. We start with a brief history of microsimulation modeling, explain briefly what microsimulation is, and give some examples of existing microsimulation models of breast cancer screening. This introduction is followed by a description of how microsimulation models are developed, covering both model development and model validation. Next, we describe the role of modeling in evaluating the effects of breast cancer screening, highlighting examples of its role in interpreting trials and observational studies, monitoring ongoing screening programs, and extrapolating the effects of screening to different settings, technologies, and screening schedules. Finally, we describe some of the strengths and weaknesses of microsimulation modeling and note possible future directions.

In this chapter, we mainly discuss studies using models that have been developed specifically for breast cancer and that use microsimulation to evaluate the effects of breast cancer screening. Depending on the problem or research question addressed, these studies adopt a variety of perspectives (often societal) and a lengthy time horizon (often lifetime), report a variety of outcomes (eg, mortality reduction, cost-effectiveness), and compare numerous breast cancer screening scenarios (varying in invited ages, screening intervals, and screening modalities) with no screening or current practice.

A Brief History of Microsimulation Modeling

Guy Orcutt is generally acknowledged as having laid the foundation for the field of microsimulation in 1957, by suggesting that microsimulation models consisting of “various sorts of interacting units which receive inputs and generate outputs” could be used to investigate “what would happen given specified external conditions.” However, it took some time before microsimulation modeling was widely used, mainly because of limitations in computing power and the lack of suitable data. With the rise of computing technology and data collection, these limitations were gradually overcome, and the use of microsimulation increased. Since then, microsimulation models have been used extensively for evaluating social and economic policies, such as income tax, social security, health benefits, or pension policies. Microsimulation of industrial processes, to explore ways of improving them, is also common. Microsimulation also underlies some newly popular methodologies for data analysis, specifically multiple imputation for dealing with missing data, and Metropolis-Hastings or Gibbs sampling for Bayesian data analysis.

The use of models to answer health policy questions began to evolve in the 1980s with application in a wide range of health care topics, such as hospital scheduling and organization, communicable disease, screening, costs of illness, and economic evaluation. For breast cancer screening, specifically, models were developed to extrapolate the findings of randomized trials. For example, when long-term outcomes from breast cancer screening trials were not yet available, models were used to predict the lifetime benefits, risks, and costs. Since the 1980s, the breadth of application, diversity, and complexity of model approaches have rapidly increased.

What Is Microsimulation Modeling?

According to Wikipedia, “microsimulation (from microanalytic simulation) is a category of computerized analytical tools that perform highly detailed analysis of activities such as highway traffic flowing through an intersection, financial transactions, or pathogens spreading disease through a population. Microsimulation is often used to evaluate the effects of proposed interventions before they are implemented in the real world.”

A distinguishing feature of microsimulation models is that individuals (or individual components) are simulated. Typically, a life history is simulated for each individual using an algorithm that describes the events that make up that life history, with random draws deciding if and when specific events occur. For health applications, these events typically include birth and death from other causes, as well as events of the target disease, such as diagnosis and death from the disease. Because individuals are simulated, microsimulation can produce individual-level artificial data that can be analyzed using standard statistical techniques, while population-level outcomes are obtained by aggregating individual life histories. The impact of changes in influential factors on key outcomes, for subgroups as well as aggregates of the population, can thus be assessed by testing “what if” scenarios. A disadvantage of microsimulation modeling is that it tends to be computationally intensive: to obtain stable estimates of stochastic outcomes, a large number of individuals must be simulated, particularly if small effects need to be estimated with precision.
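To make this concrete, the following is a minimal Python sketch of such a life-history simulation. It is purely illustrative: the hazard rates, the exponential waiting times, and the simple onset-diagnosis-survival disease process are invented placeholders rather than parameters of any published breast cancer model.

```python
import random

# A minimal life-history microsimulation (illustrative only). All rates and
# distributions are hypothetical placeholders, not parameters from any
# published breast cancer model.

random.seed(42)

OTHER_CAUSE_RATE = 0.012   # hypothetical annual hazard of death from other causes
ONSET_RATE = 0.002         # hypothetical annual hazard of preclinical disease onset
CANCER_DEATH_RATE = 0.08   # hypothetical annual hazard of death after diagnosis
MEAN_SOJOURN_TIME = 3.0    # hypothetical mean preclinical (sojourn) period, years

def simulate_life_history():
    """Simulate one woman's life history as a sequence of random event times."""
    death_other = random.expovariate(OTHER_CAUSE_RATE)   # age at other-cause death
    onset = random.expovariate(ONSET_RATE)               # age at preclinical onset
    history = {"cancer_death": False}
    if onset < death_other:                              # disease arises in lifetime
        diagnosis = onset + random.expovariate(1 / MEAN_SOJOURN_TIME)
        if diagnosis < death_other:                      # diagnosed before other-cause death
            survival = random.expovariate(CANCER_DEATH_RATE)
            if diagnosis + survival < death_other:       # cancer death comes first
                history["cancer_death"] = True
                history["age_death"] = diagnosis + survival
                return history
    history["age_death"] = death_other
    return history

# Population-level outcomes are obtained by aggregating individual histories.
cohort = [simulate_life_history() for _ in range(100_000)]
deaths = sum(h["cancer_death"] for h in cohort)
print(f"Simulated breast cancer deaths per 100,000 women: {deaths}")
```

Aggregating the 100,000 simulated histories into a single count illustrates both how population-level outcomes are obtained from individual-level data and why many individuals must be simulated to stabilize stochastic estimates.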

Description of Some Available Models

To evaluate the effects of breast cancer screening, several microsimulation models have been developed; we describe some of them below. A comprehensive review of all such models would be impractical, so we have selected those we feel have been most influential.

In 1980, Eddy articulated a mathematical theory of screening that closely resembles the approach used by most modern simulation models. He developed a Markov process-based simulation model, CAN*TROL, which he applied to the evaluation of cancer control policies for several cancer sites. He used his model to advise numerous countries about cancer screening programs. Although the underlying mathematical theory of screening is similar to that used by the other models reviewed in this chapter, the CAN*TROL model itself is not a microsimulation model.

One of the first microsimulation models, the “MIcrosimulation SCreening ANalysis” (MISCAN) model, was developed to examine the impact of cancer screening on morbidity and mortality. The MISCAN model was developed to evaluate mass screening for breast cancer, as well as for other types of cancer, such as cervical, prostate, and colorectal cancer. It is a microsimulation model with continuous time and discrete events that simulates a dynamic population. The simulated life histories include date of birth and age at death from causes other than the target disease; a preclinical, screen-detectable disease process, in which breast cancer progresses through states; and survival. The natural course of the disease may be influenced by screening when a preclinical lesion becomes screen-detected. Screen detection can result in the detection of smaller tumors, which may entail a survival benefit.

Within the Cancer Intervention and Surveillance Modeling Network (CISNET), many models have been developed for different cancer sites. For breast cancer, seven models were developed and extensively discussed and compared. Six of the seven models were classified as microsimulation models, and six include a natural history component. The natural history of breast cancer is modeled with a time from birth to the start of the preclinical screen-detectable period (the duration of this period being known as the sojourn time), clinical diagnosis, and a period of survival that ends with either breast cancer death or death from other causes. The time of diagnosis can change if screening takes place during the preclinical screen-detectable period, which, in turn, may alter the time and cause of death. In most models, therapy influences survival from the time of diagnosis.
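As a rough illustration of this shared natural-history structure, the Python sketch below shows how a screening exam that falls within the sojourn time can advance the time of diagnosis (the lead time). The biennial schedule, the 80% per-exam sensitivity, and the example onset and sojourn values are hypothetical rather than taken from any CISNET model.

```python
import random

# Illustrative sketch of screen detection shifting the time of diagnosis
# within the sojourn time. Sensitivity and schedule are hypothetical.

SENSITIVITY = 0.8                  # hypothetical per-exam detection probability
SCREENING_AGES = range(50, 75, 2)  # hypothetical biennial screening schedule

def age_at_diagnosis(onset_age, sojourn_time, screened):
    """Return (diagnosis_age, mode): an exam during the preclinical
    screen-detectable period may advance diagnosis, creating lead time."""
    clinical_age = onset_age + sojourn_time
    if screened:
        for exam_age in SCREENING_AGES:
            if onset_age <= exam_age < clinical_age and random.random() < SENSITIVITY:
                return exam_age, "screen-detected"
    return clinical_age, "clinically diagnosed"

random.seed(1)
onset, sojourn = 56.3, 4.0  # example: preclinical onset at 56.3, 4-year sojourn
for screened in (False, True):
    age, mode = age_at_diagnosis(onset, sojourn, screened)
    print(f"screened={screened}: {mode} at age {age:.1f}")
```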

Despite the similarities, the models also differ in important ways, for example in their assumptions about sojourn time, the mechanism of screen detection, tumor characteristics at diagnosis, survival following clinical detection and survival following screen detection. Most CISNET models include ductal carcinoma in situ (DCIS), but one model includes only invasive breast cancer. Also, assumptions on nonprogression vary: several models specifically assume that some percentage of DCIS are nonprogressive; one model also assumes that some cases of small invasive cancer are nonprogressive. Some groups model breast cancer progression in discrete stages, whereas other models explicitly model continuous tumor growth. The models also differ as to whether treatment affects the hazard of death from breast cancer, results in a cure for some fraction of cases, or both.

Besides these, many other models have been developed over the years, often to estimate the effects of breast cancer screening in specific countries, and often based on previously developed models. For example, models have been developed to determine the potential costs and effects of a breast cancer screening program for New Zealand as well as for Slovenia. For both, it was explicitly stated that the model used was similar to MISCAN. In Spain, a model based on the Lee-Zelen approach (one of the CISNET models) was used to estimate the amount of overdiagnosis from breast cancer screening, as well as to estimate the cost-effectiveness of risk-based screening approaches and the cost implications of switching from film to digital mammography. A similar approach was used to estimate the cost-effectiveness of breast cancer screening in Korea. In addition, specific models have been developed to estimate overdiagnosis in France, Denmark, and the United Kingdom.

How Modeling Is Done: The Process of Microsimulation Modeling

Model Development: Conceptualization and Parameter Estimation

Most models are developed to answer specific questions and the development of a model is ideally guided by an understanding and conceptualization of the problem that is being represented. The conceptualization of the problem involves addressing factors such as the disease or condition, patient population, analytic perspective, alternative interventions, health and other outcomes, and the time horizon.

With regard to the disease or condition, typically a single disease (eg, breast cancer) or a set of closely related diseases is of interest, but sometimes it is necessary to consider multiple diseases, for example if one is interested in estimating the impact on life expectancy of changes in a factor, such as smoking, that influences the risk of multiple diseases. The modeled patient population should reflect the population of interest in terms of features relevant to the decision (eg, geography and patient characteristics, including comorbid conditions, disease prevalence, and stage). Analytic perspectives commonly considered are those of the patient, the health plan or insurer, and society.

With regard to diagnostic or therapeutic actions and interventions, it is crucial to model all practical interventions and their variations, because the choice of comparators has a major impact on estimated effectiveness and efficiency, and the results are only meaningful in relation to the set of interventions considered. In addition, the specific implementations of interventions might differ across countries, and often across settings or subpopulations within countries. Thus, despite the same label (eg, “breast cancer screening”), effects may differ depending on the practice patterns in the target population area. Sources of variation include the technology used (plain film vs digital), criteria for recall and biopsy recommendations, the skill level, workload, and operating point of the radiologist, and differences in the underlying epidemiology of breast cancer itself. Moreover, the benefits of screening also vary with the range of available treatments and their accessibility to the population under study. It is therefore advisable to specify the components of the intervention in detail so that users can determine how well the analysis reflects their situations.

Health outcomes of the model should be directly relevant to the question posed and might include the number of cases of disease, the number of deaths (or deaths averted by an intervention), life years gained, quality-adjusted life years (QALYs), and costs. The time horizon of the model should be long enough to capture relevant differences in outcomes across strategies; a lifetime horizon is often required.

In order to make the appropriate choices on all these factors the modeling team should consult widely with clinical, epidemiologic, and health services experts and stakeholders to ensure that the model represents disease processes appropriately and adequately addresses the decision problem.

Once the problem has been clearly conceptualized, conceptualizing the model is the next step. To convert the problem conceptualization into an appropriate model structure, ensuring it reflects current disease knowledge and the process modeled, several methods might be used (expert consultations, influence diagrams, concept mapping). Several problem characteristics should be considered when deciding which modeling method is most appropriate: Will the model represent individuals or groups? Are there interactions among individuals? What time horizon is appropriate? Should time be represented as continuous or discrete? Do events occur more than once? Are resource constraints to be considered?

Another dimension is how complex a model should be. The appropriate level of model complexity depends on the disease process, the scientific questions to be addressed by the model, and the data available to inform model parameters. There is a tension between the simplicity of a model and the complexity of the disease process. Simpler models include fewer parameters, making it more likely that model parameters can be estimated using observed data. Simpler models are easier to describe, may entail fewer assumptions, and are less likely to contain hidden errors, making them more transparent. However, more complex models can be used to address a wider range of scientific questions. One approach is to begin with relatively simple models, extending these as needed to address specific research questions.

Once the model conceptualization has been finalized, parameter estimation can begin. All models have parameters that must be estimated. In doing so, when possible, analysts should conform to evidence-based medicine principles; for example, seek to consider all evidence, rather than selectively picking values that best accord with the modeler’s preferences; use best-practice methods to avoid potential biases, as when estimating treatment effectiveness from observational sources; and employ formal evidence synthesis techniques.

Some parameters might be estimated from observed data using analytical techniques whose results are used directly as model inputs. For example, estimates of treatment efficacy might be obtained by combining the results of several treatment trials, with the resulting hazard rate reduction used directly in the model.
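As a concrete illustration, the Python sketch below pools hypothetical hazard ratios from several trials by fixed-effect inverse-variance weighting of the log hazard ratios; the hazard ratios and standard errors are invented for illustration and do not correspond to any actual trials.

```python
import math

# Minimal sketch: combine hypothetical hazard ratios from several trials by
# fixed-effect inverse-variance weighting of the log hazard ratios.
# The HRs and standard errors below are invented for illustration.
trials = [
    (0.78, 0.10),  # (hazard ratio, standard error of the log HR)
    (0.85, 0.08),
    (0.72, 0.12),
]

weights = [1 / se**2 for _, se in trials]
pooled_log_hr = sum(w * math.log(hr) for (hr, _), w in zip(trials, weights)) / sum(weights)
pooled_hr = math.exp(pooled_log_hr)

# The pooled hazard rate reduction can then be used directly as a model input.
print(f"Pooled HR: {pooled_hr:.2f} (hazard reduction {1 - pooled_hr:.0%})")
```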

Often, other parameters have to be estimated using the model itself. Typically, unknown model parameters are estimated by optimizing the fit between simulated and observed data, using, for example, least-squares, minimum chi-square, or maximum-likelihood methods. This process is usually referred to as calibration. It is typically used for model parameters that quantify unobservable aspects of the process being modeled. For example, a model might include the mammographic detection probability of a preclinical tumor. Such a parameter might be estimated by optimizing the fit between simulation output and observed incidence rates of screen-detected and interval (ie, between scheduled mammograms) cancers in a clinical trial. Fitting to data is well defined when a model is built for a particular intervention and there are one or more randomized trials in which the effects of the intervention are reported in terms of end results and intermediate outcomes, which can be compared with model predictions. When observed data come from multiple studies and disease registries, expert judgment is required to assess data quality and prioritize data according to their importance in fitting.
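The sketch below illustrates the calibration idea under strong simplifying assumptions: a per-screen detection sensitivity, which is not directly observable, is estimated by a least-squares grid search that matches a toy two-round detection model to hypothetical “observed” proportions of screen-detected and interval cancers. Real calibrations fit far richer models to trial or registry data.

```python
import random

# Illustrative calibration sketch: estimate an unobservable parameter
# (per-screen detection sensitivity) by minimizing the squared difference
# between simulated and observed outcomes. The "observed" proportions and
# the toy disease model are invented for illustration.

OBSERVED = {"screen_detected": 0.60, "interval": 0.40}  # hypothetical data

def simulate(sensitivity, n=50_000, seed=0):
    """Toy model: each preclinical cancer faces two screening rounds during
    its sojourn time; cancers missed at both rounds surface as interval cancers."""
    rng = random.Random(seed)
    screen_detected = 0
    for _ in range(n):
        if rng.random() < sensitivity or rng.random() < sensitivity:
            screen_detected += 1
    p = screen_detected / n
    return {"screen_detected": p, "interval": 1 - p}

def loss(sensitivity):
    """Sum of squared differences between simulated and observed proportions."""
    sim = simulate(sensitivity)
    return sum((sim[k] - OBSERVED[k]) ** 2 for k in OBSERVED)

# Simple grid search over candidate sensitivities (a least-squares fit).
best = min((s / 100 for s in range(1, 100)), key=loss)
print(f"Calibrated sensitivity: {best:.2f}")
```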

Validation

Evaluation and optimization of cancer screening programs or scenarios using modeling is only useful if the estimated harms and benefits of screening are reasonably accurate. Testing the accuracy of models, that is, model validation, is, therefore, a crucial step in any modeling study.

Previously, five main types of validation have been distinguished: face validity, verification (or internal validity), cross validity, external validity, and predictive validity. Face validity is the extent to which a model and its assumptions and applications correspond to current science and evidence, as judged by people who have expertise in the problem. In other words, face validity exists when a plain explanation of the model structure and assumptions appears reasonable to a critical audience.

Verification (or internal validity) addresses whether the model’s parts behave as intended and the model has been implemented correctly. This can, for example, be tested by assessing whether the model responds as expected in simplistic boundary conditions such as zero incidence, no other-cause mortality, no benefit from screening or treatment, and other (typically contrived) sets of input parameters where expected model results can be determined beforehand.
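A minimal sketch of this kind of check, using a deliberately trivial toy model, might look as follows; the model, parameter values, and tolerance are hypothetical, and the point is only that each boundary condition has an expected result that can be asserted automatically.

```python
import random

# Illustrative verification sketch: boundary-condition checks in which the
# expected result is known beforehand. The toy model and parameter names
# are hypothetical, not from any published model.

def simulate_deaths(n, incidence, case_fatality, seed=0):
    """Toy cohort model: each person may develop disease and then die of it."""
    rng = random.Random(seed)
    deaths = 0
    for _ in range(n):
        if rng.random() < incidence and rng.random() < case_fatality:
            deaths += 1
    return deaths

# Zero incidence must yield zero disease deaths.
assert simulate_deaths(10_000, incidence=0.0, case_fatality=0.5) == 0

# Zero case fatality must also yield zero disease deaths.
assert simulate_deaths(10_000, incidence=0.1, case_fatality=0.0) == 0

# With both nonzero, deaths should approximate n * incidence * case_fatality.
deaths = simulate_deaths(100_000, incidence=0.1, case_fatality=0.5)
assert abs(deaths - 100_000 * 0.1 * 0.5) < 300
print("All boundary-condition checks passed.")
```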

Cross validation involves comparing a model with others and determining the extent to which they calculate similar results. If similar results are calculated by models using different methods, confidence in the results increases, especially if the models have been developed independently.

In external validation, a model is used to simulate a real scenario, such as a clinical trial, and the predicted outcomes are compared with the real-world ones. Ideally, external validation tests the model’s ability to reproduce observed results in a validation data set that was not used as a source of parameter estimates or calibration while building the model. Given, however, that breast cancer screening models typically include parameters that vary from setting to setting, or among subpopulation groups, some of them not directly observable, it is usually difficult to completely avoid some degree of reliance on the validation data. Completely independent external validation requires that the variable aspects of screening, natural history of disease, and treatment noted in the previous section are the same as those which prevailed for the data sources from which the model was built. For this reason, external model validation of breast cancer screening models is typically imperfect. In addition, even a fully independent external validation in a single data set, or a small number of them, should not, by itself, be taken as a definitive endorsement of the model: the number of “degrees of freedom” in any model complex enough to approach a realistic data generating mechanism will always be too large for one or a handful of fits to data to confirm.

Predictive validity involves using a model to forecast events and, after some time, comparing the forecasted outcomes to the actual ones. On the one hand, predictive validation is the most desirable type of validation, as it corresponds best to the purpose of most models, which is predicting what will happen. On the other hand, the observations are by definition in the future, often making it difficult to test a model’s predictive validity in a sufficiently timely manner to make practical use of its results. Predictive validity can sometimes be estimated by observing the model’s performance in simulating the results of somewhat similar events in existing data sets. For example, if the question of interest concerns the outcomes associated with different intensities of screening, it may be relevant to check how well the model can reproduce the changes in incidence and mortality that occurred during a time when a similar population introduced screening and its use disseminated.

Role of Modeling

Well-conducted randomized controlled trials (RCTs) provide the strongest possible evidence regarding the effectiveness of interventions, and high-quality observational studies are the second tier. But the available evidence at any time typically leaves some important questions unanswered. Moreover, some questions cannot, in practice, be approached with these methods. RCTs may include inadequate numbers of women in age groups of interest to policy makers. The populations studied may not be sufficiently similar to those of interest to the decision maker. RCTs typically study only one or a small number of protocols for screening, whereas policy makers may wish to choose among many options. The duration of follow-up is limited, so projection of long-term outcomes requires other methods. If carried out in settings where mammography is available outside of the study, the control groups may be “contaminated,” which may result in underestimation of screening effects compared to what would happen with the de novo introduction of screening. The results of an RCT may be of limited relevance if new mammographic technologies have been introduced since the trial concluded, or if the spectrum of available treatments has changed. Findings regarding subpopulations of special interest for policy may be nonexistent or underpowered.

By synthesizing the results of relevant RCTs and observational evidence, and supporting the projection of outcomes of hypothetical screening programs, models can overcome some of these limitations of RCTs and provide policy makers with information that covers a wider range of possibilities and may be more closely tailored to the prevailing circumstances in which the decision maker finds him/herself.

In the following sections, we provide an overview of the different roles microsimulation models have played in evaluating the effects of breast cancer screening. Each role is explained and highlighted with selected examples.

Interpretation

The interpretation of observational studies, and sometimes that of clinical trials, is made difficult by the fact that in the real world, influential factors other than the object of study are often changing too. For example, during the same period that mammography screening became widely available and widely used in the United States, new forms of adjuvant therapy for breast cancer were also introduced, and the use of adjuvant chemotherapy became more common. When systematic screening programs are introduced, changes in incidence and mortality may also be influenced by concurrent changes in risk factor prevalence or treatment patterns. Differences in breast cancer outcomes between different populations can seldom be ascribed to a single cause: typically the prevalences of different risk factors, utilization and implementation of screening, and treatment patterns will all differ. Ordinary statistical analyses, such as regression equations, have at best limited ability to accommodate such confounding variables. Simulation models, however, because they can represent complicated and multifaceted data generating processes, can disentangle the concurrent effects and support attribution of portions of the observed effects to each cause by providing projections of the expected outcomes under both real and counterfactual combinations of these factors.

Trials

Several RCTs have been performed to evaluate the effects of breast cancer screening, reporting significant mortality reductions for women aged 50–69 years. For women younger than 50 years, breast cancer mortality reductions of 10–13% were reported. However, some women who entered these trials before age 50 were also screened at age 50 or older, and part of the observed mortality reduction in these women is likely the result of cancers being detected earlier in those later rounds. How large this proportion is cannot be observed directly. Therefore, a modeling study estimated what percentage of the observed mortality reduction for women aged 40–49 years at entry into the five Swedish screening trials might be attributable to screening these women at 50 years of age or older. By simulating all Swedish trials, taking into account important trial characteristics such as age distribution, attendance, and screening interval, it was estimated that around 70% of the 10% reduction in breast cancer mortality observed for women in their forties might be attributable to screening these women after they reach age 50.

Another example of dissecting the results of a clinical trial is the application by Van Ravesteyn et al. of the MISCAN model to simulate the UK Breast Screening Frequency Trial, which directly compared annual and triennial screening intervals. The investigators found that the difference in breast cancer mortality between the two arms was not statistically significant (RR 0.93, 95% CI 0.63–1.37), a surprising result. The trial simulation replicated the findings of the actual trial, although it produced a slightly more favorable result. The original trial followed women for 36 months; the microsimulation was used to project lifetime mortality, with no material effect on the predicted relative risk of death. However, the projected relative risk became more favorable to annual screening if full attendance of all invitees and a higher mammogram sensitivity were simulated. The investigators concluded that, given the actual screening conditions that prevailed during the trial, the trialists’ original expectation of a 25% breast cancer mortality reduction was overly optimistic, and consequently the study was substantially underpowered to detect the actually achievable reduction. They noted that measures to increase the sensitivity of mammography and obtain better adherence to the assigned screening schedules would have enhanced the power of the trial.
