Molecular-Based Testing in Breast Disease for Therapeutic Decisions


The advent of high-throughput omics technologies in the mid-1990s heralded a new paradigm of personalized medicine. Researchers could now study dynamic biological systems, including cancers, in novel and comprehensive ways. Collaborations arose among basic scientists, clinical researchers, bioinformaticians, and biostatisticians, resulting in the development of a multitude of omics-based tests for use in oncology. The speed of technological advancement and the ability to generate vast quantities of data initially outpaced the development of standards for appropriate study design and validation. Indeed, some of the earliest molecular-based tests, including tests for breast cancer, were incorporated into clinical trials prematurely, at a very early stage of their development. Ultimately, concerns raised by statisticians about the validity of the tests and potential harm to patients led to the termination of the trials and highlighted the need for a rigorous methodological framework for the discovery, validation, and, ultimately, translation of omics-based tests into clinical practice.

Omics is a term encompassing multiple molecular disciplines that measure some characteristic of a large family of cellular molecules such as DNAs, RNAs, proteins, lipids, and metabolites. Genomics, for example, refers to the study of genes and their function. Omics-based tests include both an assay that measures the molecules of interest and a computational model that translates the assay measurements into a clinically actionable result. In general, omics research generates complex, high-dimensional data by measuring many more variables (often orders of magnitude more) than there are samples. This results in a high risk that computational models will overfit the data. Overfitting occurs when a statistical model describes random error or noise rather than a true underlying relationship. The development of omics-based tests for clinical use, therefore, requires a carefully designed and strictly executed series of validation studies using independent sample sets.
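The risk of overfitting when variables far outnumber samples can be illustrated with a small simulation. The following is a minimal sketch, not a method taken from the studies discussed in this chapter: it assumes purely random "expression" values and random outcome labels (so no true relationship exists) and fits a simple perceptron, chosen here only because it is easy to write without external libraries. All function names and sample sizes are hypothetical.

```python
import random

def make_noise_data(n_samples, n_features, rng):
    """Random features and random labels: by construction there is no signal."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [rng.choice([-1, 1]) for _ in range(n_samples)]
    return X, y

def train_perceptron(X, y, max_epochs=200):
    """Classic perceptron updates; stops once every training sample is fit."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score <= 0:  # misclassified: nudge the weights toward yi
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                mistakes += 1
        if mistakes == 0:
            break
    return w, b

def accuracy(w, b, X, y):
    """Fraction of samples whose predicted sign matches the label."""
    correct = 0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        correct += (1 if score > 0 else -1) == yi
    return correct / len(y)

rng = random.Random(0)
X_train, y_train = make_noise_data(20, 500, rng)   # 20 samples, 500 variables
X_test, y_test = make_noise_data(200, 500, rng)    # independent samples
w, b = train_perceptron(X_train, y_train)
train_acc = accuracy(w, b, X_train, y_train)
test_acc = accuracy(w, b, X_test, y_test)
print(f"training accuracy: {train_acc:.2f}, independent-set accuracy: {test_acc:.2f}")
```

Under these assumptions the model reproduces the training labels perfectly while performing at roughly chance level on independent samples, which is why performance must be confirmed on samples not used in model development.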

In response to concerns raised by the premature incorporation of gene expression–based tests into clinical trials, the Institute of Medicine (IOM) convened a special Committee on the Review of Omics-Based Tests for Predicting Patient Outcomes in Clinical Trials, charged with identifying appropriate evaluation criteria for omics-based tests and their readiness for inclusion in a clinical trial design. The resulting guidelines are summarized in Table 10.1. Within the discovery phase, a computational model is first developed on a training set of samples, and then the fully specified, locked-down computational model is evaluated on an independent test set of samples. Successful omics-based tests are then transferred from the research laboratory to a clinical laboratory for the validation phase. Here the clinical testing method is developed and optimized, followed by analytic validation (establishing the analytic performance characteristics of the test) and clinical validation (confirming that the test correlates with the clinical outcome of interest in an independent sample set). The last stage involves evaluating the clinical utility of the omics-based test by conducting either a randomized clinical trial or at least two prospective-retrospective studies using archived samples from previous randomized clinical trials.

Table 10.1
Omics-Based Tests for Predicting Patient Outcomes
(Adapted from Committee on the Review of Omics-Based Tests for Predicting Patient Outcomes in Clinical Trials; Board on Health Care Services; Board on Health Sciences Policy; Institute of Medicine. In: Micheel CM, Nass SJ, Omenn GS, eds. Evolution of Translational Omics: Lessons Learned and the Path Forward. National Academies Press; 2012. Available from https://www.ncbi.nlm.nih.gov/books/NBK202160 .)
STEPS FOR MOLECULAR TEST DEVELOPMENT AND EVALUATION. DISCOVERY PHASE (RESEARCH LABORATORY)
  • Molecular test is developed on a training set (specimen samples with annotated outcome data that are relevant to intended use). Performance is checked using internal validation methods such as bootstrapping, cross-validation, or other resampling methods.

  • All computational procedures are locked down.

  • Test is confirmed with an independent set of samples, not used in the generation of the computational model. Ideally, a completely independent sample set is used; suboptimally, a split-sample approach may be used that splits data into a training set for model development and a test set for evaluation. All testing should be blinded to outcome data.

  • The data, metadata, computer code, and fully specified computational procedures used for the development of the molecular test are released to the scientific community (public databases, publication, or patent application) to enable independent verification of the findings.

  • The result is a fully specified candidate molecular test with locked-down computational procedures and defined intended clinical use.

TEST VALIDATION PHASE (CLIA-CERTIFIED CLINICAL LABORATORY)
  • Translation of an omics-based test from the laboratory into a clinical test requires analytic and clinical validation performed in a CLIA-certified laboratory. The test may be offered either as an FDA-approved/cleared in vitro diagnostic test or as a laboratory-developed test within a CLIA-certified laboratory under the purview of CLIA regulation; both avenues require analytic and clinical validation.a

  • Analytic validation: The clinical test method and platform are defined and optimized by the CLIA-certified laboratory. Performance characteristics of the test (e.g., accuracy, precision, analytic specificity, detection limit, quantitation limit, linearity, range, and robustness) are defined using specimens representing the intended clinical use and range of expected test results.

  • Clinical validation: Requires an independent specimen set from a relevant patient population representing those intended for use of the test. Clinical validation confirms the test results are linked to clinical state or outcome (prognostic tests) or response to therapy (predictive test) and defines clinical performance characteristics (e.g., clinical sensitivity, clinical specificity, likelihood ratios, hazard ratios, and receiver operating characteristic curves).

  • Validation results and data are published for peer review.

  • Performance monitoring (ongoing validation): The validated test is integrated into the workflow and quality management system of the laboratory. Written SOPs outline the technical aspects of performing the test as well as the procedures for quality control and quality assurance. Quality indicators such as turnaround time, test failure rates, and trends in test results are tracked.

  • The final result is a fully developed (specified, locked-down, and validated) molecular clinical test.
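Several of the clinical performance characteristics named under clinical validation follow directly from a 2×2 table of test result versus observed outcome. The sketch below shows the arithmetic; the function name and the example counts are hypothetical and are not taken from any study cited in this chapter.

```python
def clinical_performance(tp, fp, fn, tn):
    """Summarize a 2x2 table of test result versus clinical outcome.

    tp: outcome present, test positive    fp: outcome absent, test positive
    fn: outcome present, test negative    tn: outcome absent, test negative
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    false_positive_rate = fp / (fp + tn)
    false_negative_rate = fn / (tp + fn)
    # Likelihood ratios: how much a result shifts the odds of the outcome.
    lr_positive = (sensitivity / false_positive_rate
                   if false_positive_rate else float("inf"))
    lr_negative = (false_negative_rate / specificity
                   if specificity else float("inf"))
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
    }

# Hypothetical counts for illustration only.
metrics = clinical_performance(tp=45, fp=30, fn=5, tn=120)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, sensitivity is 45/50 and specificity is 120/150, giving a positive likelihood ratio of about 4.5 and a negative likelihood ratio of about 0.125.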

EVALUATION OF CLINICAL UTILITY AND USE
  • Level I evidence for clinical utility of a molecular test can be derived through two avenues:

    • (i)

      Prospective clinical trial designed to address the utility of the omics-based test (“gold standard”; Level of Evidence IA); the test result may either direct patient management (requires IDE from the FDA) or not, depending on the trial design.

    • (ii)

      Prospective-retrospective studies using archived clinical specimens from previously conducted clinical trials that address the intended use of the omics-based test (two such studies are required for Level of Evidence IB). b Evidence of clinical utility is not required by the FDA for its evaluation and approval/clearance of a clinical test; in some cases clinical utility may not be demonstrated until after a test is introduced in the marketplace, and may sometimes take several years in the case of a prospective clinical trial.

  • The final product is a clinically useful, fully developed (specified, locked-down, and validated) molecular test implemented into clinical practice.

CLIA, Clinical Laboratory Improvement Amendments; FDA, US Food and Drug Administration; IDE, Investigational Device Exemption; SOPs, standard operating procedures.

Gene Expression Profiling Using Microarray Technology and Real-Time Reverse Transcription Polymerase Chain Reaction

Since the completion of the Human Genome Project and the introduction of microscale technologies for molecular analysis of gene expression, accelerated developments in the field of high-throughput sequencing have significantly advanced oncological research, with a particularly large impact on clinical application. Within breast cancer, several gene expression signatures have been developed using microarray and real-time quantitative reverse transcription polymerase chain reaction (qRT-PCR) technologies, a number of which are discussed later in this chapter.

Gene Expression Microarrays

DNA microarrays for measuring gene expression levels create a global snapshot of a tissue or cell type’s relative gene expression (transcriptome) at a particular point in time (at which the tissue was harvested). Microarray technology identifies differences in gene expression profiles between normal and abnormal tissues and, through comparison of expression profiles of various cancers, permits identification of differences in expression profiles, such as those that correlate with different clinical outcomes or response to a specific therapy.

There are a number of commercially available microarrays, which can be broadly classified using at least three criteria: (1) length/type of probes (long probe complementary DNA [cDNA] arrays versus short probe oligonucleotide arrays); (2) manufacturing method (spotted arrays containing deposited spots of previously synthesized probes versus in situ synthesized arrays where oligonucleotide probes are built directly on the chip); and (3) number of samples simultaneously profiled on the array (single-channel versus multichannel arrays). The steps involved in a microarray experiment are summarized in Fig. 10.1.

Fig. 10.1, Two-channel versus single-channel microarray experiment. ( A ) Two-channel (two-color) experiments require two cell populations, the sample (tumor) and a reference (normal tissue), from which RNA is extracted and reverse transcribed into complementary DNA (cDNA) incorporating Cy5 (red)– or Cy3 (green)–labeled nucleotides. (Alternatively, the cDNA is used to generate fluorescent-labeled complementary RNA [cRNA].) By convention, the test sample is labeled with red and the reference is labeled with green. Equal quantities of the two are mixed together and hybridized onto the microarray, which is then scanned to measure fluorescence of the two fluorophores after excitation with a laser beam of defined wavelength. Relative intensities of fluorescence are used in a ratio-based analysis to identify upregulated and downregulated genes. If a particular gene is overexpressed in the sample compared with the reference, that particular spot will appear more red; if underexpressed, the spot will appear more green. ( B ) In single-channel (one-color) experiments, only one sample is assayed. The schematic describes a biotin–streptavidin–R-Phycoerythrin system used in GeneChip microarrays (Affymetrix, Santa Clara, Calif.); however, direct fluorescent labeling of the target (e.g., with Cy3) analogous to the two-color method can also be used. In this assay, total RNA is extracted and cDNA is prepared, which is then used in an in vitro transcription reaction to generate biotinylated cRNA (alternatively, certain platforms use labeled cDNA). The cRNA is fragmented and hybridized to the microarray, washed, and stained with the fluorochrome R-Phycoerythrin-conjugated streptavidin that binds to biotin. The chip is then scanned with a confocal laser and the intensity and distribution pattern of signal in the array are recorded.
Regardless of the method or platform of the microarray experiment, statistical and bioinformatics analysis must be used first to normalize the measurements and then to analyze the data. Expression results are generally displayed as an expression matrix (essentially a large table), with columns representing samples, rows representing genes, and each position in the table describing the measurement for a particular gene in a particular sample. Graphically, the expression matrix is often represented as a heat map (bottom image). In this instance, red indicates overexpression of a gene, green indicates underexpression of a gene, and black indicates no difference in expression (gray indicates missing data). Patterns of expression among the samples can be clustered by various software packages according to row or column, and the relationships among the clustered samples are displayed using dendrograms (seen at the top of the heat map). mRNA, Messenger RNA; PE, phycoerythrin.

The sheer amount of data generated from the array requires the use of specific bioinformatics software tools. Normalization of fluorescence signals is performed to account for variations in labeling, hybridization, and scanning methods, and statistical tools are used to determine which changes are considered significant. The different methods of normalization and statistical analysis can result in significant differences in expression results from different laboratories using the same samples, and comparing and contrasting microarray results from different laboratories requires knowledge of the specific methods used. To facilitate this process, the Minimum Information About a Microarray Experiment (MIAME) checklist was developed; many journals require compliance with it, along with deposition of all experimental data in a public repository (e.g., Gene Expression Omnibus [GEO] or ArrayExpress), as a condition of publication. Gene expression results obtained from microarray analysis are ideally confirmed using another method of expression profiling, with qRT-PCR being the most common method.
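As a concrete illustration of ratio-based analysis and normalization in a two-color experiment, the sketch below converts Cy5 and Cy3 intensities to log2 ratios and then applies global median centering. Median centering is only one simple normalization choice, assumed here for illustration; intensity-dependent methods are also widely used, and the text does not specify which method any given study applied. The intensities are invented.

```python
import math
from statistics import median

def log2_ratios(cy5, cy3):
    """Per-spot log2(sample / reference) from red and green intensities."""
    return [math.log2(red / green) for red, green in zip(cy5, cy3)]

def median_center(values):
    """Shift all log ratios so their median is zero (global normalization)."""
    center = median(values)
    return [v - center for v in values]

# Hypothetical intensities for five spots; the red channel carries a
# uniform twofold labeling bias that normalization should remove.
cy3 = [100.0, 200.0, 400.0, 100.0, 50.0]   # reference (green)
cy5 = [200.0, 800.0, 400.0, 200.0, 100.0]  # sample (red)

raw = log2_ratios(cy5, cy3)
normalized = median_center(raw)
print("raw log2 ratios:       ", raw)
print("normalized log2 ratios:", normalized)
```

After centering, positive values correspond to spots that would appear more red (higher expression in the sample) and negative values to spots that would appear more green.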

Quantitative Reverse Transcription Polymerase Chain Reaction

Polymerase chain reaction (PCR) is the workhorse technique of all molecular laboratories. qRT-PCR is an extension of standard PCR that permits quantification of gene expression levels within a sample. The substrate is messenger RNA (mRNA), which is first reverse transcribed to cDNA and then amplified by standard PCR (Fig. 10.2A). An instrument monitors the presence of PCR products in real time while software performs quantitative analysis. Detection methods can be either nonspecific, such as fluorescent dyes (e.g., SYBR Green), or specific, such as DNA probes (e.g., TaqMan assay). The mechanisms of action of these common detection systems are shown in Fig. 10.2B–C. Quantification may be either absolute (using a calibration curve to relate the PCR signal to the amount of starting mRNA) or relative (measuring the relative change in mRNA expression level of the target gene versus a housekeeping or reference gene).
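Relative quantification is often computed with the comparative threshold cycle (2^-ΔΔCt) calculation. The text above does not name a specific formula, so the sketch below is an assumption about one common approach; it also assumes that the target and reference genes amplify with approximately 100% efficiency (a doubling per cycle). The Ct values are invented for illustration.

```python
def relative_expression(ct_target_sample, ct_reference_sample,
                        ct_target_calibrator, ct_reference_calibrator):
    """Fold change of a target gene in a sample versus a calibrator.

    Each Ct is the cycle at which fluorescence crosses the detection
    threshold; a lower Ct means more starting template.
    """
    # Normalize the target gene to the reference gene within each specimen.
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    # Compare the normalized values between specimens.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)

# Hypothetical Ct values: tumor sample versus a normal-tissue calibrator.
fold_change = relative_expression(
    ct_target_sample=24.0, ct_reference_sample=18.0,
    ct_target_calibrator=27.0, ct_reference_calibrator=19.0,
)
print(f"target gene expression relative to calibrator: {fold_change:.1f}-fold")
```

Here the normalized Ct is 6 cycles in the sample and 8 cycles in the calibrator, a difference of 2 cycles, which corresponds to fourfold higher expression in the sample under the stated efficiency assumption.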

Fig. 10.2, Real-time quantitative reverse transcription polymerase chain reaction (qRT-PCR). ( A ) In RT-PCR the extracted RNA template is first converted into complementary DNA (cDNA) using reverse transcriptase. The cDNA is then used as a template for exponential amplification using PCR with two specific primers (generally within 100 base pairs of each other in the case of formalin-fixed tissue). Accumulating PCR product is detected with a variety of methods. ( B ) The SYBR Green assay is a nonspecific detection method exploiting the property of the SYBR Green I dye to become highly fluorescent when bound to double-stranded DNA. Each exponential cycle of PCR amplification generates double-stranded PCR product that results in a proportional increase of fluorescent signal by bound SYBR Green dye. ( C ) The TaqMan assay, by contrast, is a specific fluorescently labeled probe that anneals to a portion of the target being amplified. The probe has both a fluorescent reporter attached to the 5′ end and a quencher on the 3′ end, which quenches the fluorescent signal. The primers and probe anneal to the specific target gene. Owing to the 5′-exonuclease activity of the Taq polymerase, the bound fluorescent-labeled probe is degraded while a new PCR product is being synthesized. This process releases the fluorescent tag from the quencher and generates a fluorescence signal that is directly proportional to the amount of PCR product in the tube. ( D ) Regardless of the method of detection used, the fluorescent signal is detected and quantified over the course of the PCR assay, and gene expression levels are calculated. ds , Double-stranded; mRNA , messenger RNA.

Although gene expression microarray platforms are better suited to fresh tissue, PCR is an optimal technique for use on formalin-fixed paraffin-embedded tissue (FFPET), the currency of contemporary pathology laboratories.

Experimental Designs of DNA Microarray Experiments

There are three commonly used study designs for microarray experiments: class discovery , class comparison , and class prediction ( Fig. 10.3 ).

Fig. 10.3, Microarray study designs. ( A ) Class discovery is a hypothesis-independent exploratory approach that uses gene expression microarray data to investigate the existence of distinct subgroups in an otherwise unselected series of samples. Identified subgroups can then be further assessed for their clinical significance, such as correlation with outcome. ( B ) Class comparison is a hypothesis-driven study, comparing the microarray-derived profiles of two or more predefined groups using supervised methods to identify the genetic differences between these groups. ( C ) Class prediction is an extension of class comparison analysis, where, after identifying differentially expressed genes between the two predefined groups, a multigene predictor is built to accurately predict the class membership of a new sample.

Class discovery is a hypothesis-independent exploratory analysis in which gene expression profiles of a series of unselected tumor samples are analyzed in an unsupervised manner to determine whether genetically distinct molecular subgroups emerge based on the patterns of gene expression. A commonly used data analysis method is hierarchical clustering, which groups the samples based on the similarity in their pattern of gene expression. The relationship between the samples can be graphically represented in a dendrogram (e.g., Figs. 10.3A and 10.4 ), in which the pattern and length of the branches reflect the relatedness of the samples, with shorter branches indicating more closely related gene expression profiles. Whether or not these groupings have clinical significance is determined subsequently.
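A minimal sketch of agglomerative hierarchical clustering is given below. It assumes a distance of one minus the Pearson correlation between expression profiles and average linkage between clusters; both are common choices but are assumptions here, not details stated for any particular study. The four toy profiles and their names are invented.

```python
from statistics import mean

def pearson(a, b):
    """Pearson correlation between two expression profiles."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def distance(a, b):
    """Profiles with similar expression patterns have a distance near 0."""
    return 1.0 - pearson(a, b)

def hierarchical_clustering(profiles):
    """Average-linkage clustering; returns merges as (left, right, height)."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = mean(distance(profiles[a], profiles[b])
                         for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical expression values for five genes in four tumors.
profiles = {
    "T1": [5, 4, 1, 1, 2],
    "T2": [6, 5, 1, 2, 2],
    "T3": [1, 1, 5, 6, 4],
    "T4": [2, 1, 6, 5, 5],
}
merges = hierarchical_clustering(profiles)
for left, right, height in merges:
    print(f"merge {left} + {right} at height {height:.3f}")
```

The merge heights correspond to branch lengths in a dendrogram: the two pairs with similar profiles join at small heights, and the final merge of the two dissimilar groups occurs at a much larger height.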

Fig. 10.4, Cluster dendrogram depicting the gene expression patterns of 85 experimental samples representing 78 carcinomas, 3 fibroadenomas, and 4 normal tissues, analyzed by hierarchical clustering. The closer samples are together, the more similar are their expression profiles. The tumor specimens were divided into six subtypes based on differences in gene expression. The six subtypes of tumors from left to right are basal-like (orange) , ERBB2+ (red) , normal breast-like (green) , luminal subtype C (light blue) , luminal subtype B (yellow) , and luminal subtype A (dark blue) . Of note, the luminal C subtype has not been consistently identified in other microarray studies, particularly using scaled-down numbers of intrinsic genes, and is not considered one of the major intrinsic molecular subtypes. ER , estrogen receptor.

In contrast, class comparison studies are hypothesis driven and start with two or more predefined groups based on clinically meaningful endpoints, such as patients who develop early metastatic disease versus those who do not, or patients who respond to a particular therapy versus those who progress on treatment. The microarray-derived gene expression profiles of the two groups are compared using supervised analysis methods to determine whether there is a genetic basis for the differences in clinical outcome and, if present, to identify which genes or functional gene groups appear to be involved.

Class prediction is similar to class comparison as a hypothesis-driven, supervised analysis; the objective of class prediction, however, is to use the identified gene expression differences between the classes of interest to develop a multigene algorithm (a predictor or gene signature) that can be applied to expression profiles of samples whose class is unknown to predict the class membership of the new sample (e.g., a particular breast cancer subtype or clinical outcome). Because the genes of interest are already identified, class prediction studies often start with a much more limited set of candidate genes. Whereas class discovery and class comparison are both examples of a top-down method of conducting a microarray experiment, class prediction is considered a bottom-up method for microarray study design.
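One simple way to build such a predictor is a nearest-centroid rule: average the training profiles of each class and assign a new sample to the class whose average profile it correlates with most strongly. The sketch below assumes this rule for illustration only; the class labels, gene values, and function names are hypothetical, and commercial signatures use their own locked-down algorithms.

```python
from statistics import mean

def pearson(a, b):
    """Pearson correlation between two expression profiles."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def fit_centroids(profiles, labels):
    """Average expression profile of each class in the training set."""
    by_class = {}
    for profile, label in zip(profiles, labels):
        by_class.setdefault(label, []).append(profile)
    return {label: [mean(column) for column in zip(*rows)]
            for label, rows in by_class.items()}

def predict(centroids, profile):
    """Assign the class whose centroid correlates best with the profile."""
    return max(centroids, key=lambda label: pearson(centroids[label], profile))

# Hypothetical training set: five signature genes, two predefined classes.
training_profiles = [
    [5, 4, 1, 1, 2],
    [6, 5, 1, 2, 2],
    [1, 1, 5, 6, 4],
    [2, 1, 6, 5, 5],
]
training_labels = ["class A", "class A", "class B", "class B"]

centroids = fit_centroids(training_profiles, training_labels)
prediction_1 = predict(centroids, [5, 5, 2, 1, 1])  # new sample, class unknown
prediction_2 = predict(centroids, [1, 2, 6, 6, 5])
print(prediction_1, prediction_2)
```

Once the centroids are computed they can be frozen, which mirrors the requirement that a predictor be fully specified and locked down before it is evaluated on new samples.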

Initial Uses of Gene Microarrays in Breast Cancer

One of the seminal experiments in applying gene expression profiling to breast cancer, and perhaps the most prominent example of a class discovery microarray study, is the work reported by Perou and colleagues, in which they performed unsupervised hierarchical clustering of the gene expression profiles of 65 breast tissue samples. Using cDNA microarrays representing 8,102 human genes, the authors analyzed the gene expression profiles of malignant and benign breast lesions from a cohort of 43 patients (36 invasive ductal carcinomas, 2 invasive lobular carcinomas, 1 ductal carcinoma in situ, 1 fibroadenoma, and 3 normal breast samples, in addition to a number of biological replicates from the same patient). From the data, they defined a set of intrinsic genes comprising genes that showed significantly greater variation between tumors from different patients than between paired tumor samples from the same patient. Hierarchical cluster analysis using this set of intrinsic genes identified four major groupings, or molecular subtypes, of breast cancer: luminal-like, human epidermal growth factor receptor 2 (HER2) positive, basal-like, and normal breast-like. A subsequent study from the same group, using a larger number of tumors and correlation with outcome data, confirmed the presence of the molecular subtypes of breast cancer and, in addition to the original subtypes, showed that the luminal-like tumors could be divided into at least two subgroups: luminal A and luminal B (Fig. 10.4). The authors also demonstrated that the molecular subtypes were associated with very different clinical outcomes. Ensuing studies using additional microarray data sets confirmed the presence and clinical relevance of breast cancer–intrinsic subtypes (Fig. 10.5). Additional details are provided in Chapter 20.

Fig. 10.5, Kaplan-Meier analyses of disease outcome stratified by intrinsic subtype gene expression classification in two separate patient cohorts. ( A ) Time to development of distant metastasis in the 97 cases from van’t Veer and coworkers. ( B ) Overall survival for 72 patients with locally advanced breast cancer in the Norway cohort. Luminal C was grouped together with luminal B, and the normal-like tumor subgroup was omitted in both analyses.

Prognostic and Predictive Gene Expression Signatures

Since the initial description of intrinsic molecular subtypes in breast cancer, it is now firmly established that breast cancer does not represent a single disease process, and luminal A, luminal B, HER2-enriched, and basal-like have become integrated into the clinical realm. Based on their unique complement of genetic derangements, these molecular subtypes exhibit vastly differing biological behaviors. Both HER2-enriched and basal-like tumors are rapidly proliferative and highly aggressive, yet most responsive to chemotherapy (and anti-HER2–targeted therapy in the case of HER2-positive [HER2+] disease). Relapses occur early, with the majority occurring within 5 years of diagnosis. Luminal tumors, conversely, present a much broader range of behaviors, with favorable but chemoresistant and notoriously late-recurring luminal A tumors on the one hand, and aggressive, variably chemoresponsive luminal B tumors on the other.

In clinical practice, the estrogen receptor (ER)–positive (ER+)/HER2-negative (HER2–) cohort of patients (representing luminal tumors) is the most commonly encountered and often represents the most challenging treatment scenario. Many women within this category are, in fact, overtreated and subjected to the morbidity of cytotoxic chemotherapy for negligible benefit. Distinguishing tumors with more aggressive biology, which stand to benefit from the addition of chemotherapy, from those that are adequately treated by endocrine therapy alone has been the clinical impetus for the development of several gene expression assays, and it is the indication for which these assays currently play a role in clinical decision-making. Despite the multitude of reported prognostic signatures, only a minority of these assays have entered clinical practice (Table 10.2).

Table 10.2
Comparison of select commercially available molecular tests for prognostication in breast cancer.
MammaPrint
  • Company: Agendia (Amsterdam, the Netherlands)
  • Specimen type: Fresh or FFPE
  • Training sample: Microarray data from 78 patients (<55 years, node negative, tumor <5 cm, ER+/–, HER2+/–, most without systemic therapy)
  • Number of genes: 70
  • Output: High risk or low risk for 10-year distant recurrence
  • Indicated patient population: FDA cleared for women of all ages diagnosed with stage I or II invasive breast cancer, tumor size ≤5.0 cm, ER+/–, HER2+/–, lymph node negative
  • FDA clearance: Yes
  • Guidelines incorporating test: St. Gallen, ESMO, AGO, NCCN
  • Prospective randomized trial: MINDACT

Oncotype DX
  • Company: Genomic Health (Redwood City, Calif.)
  • Specimen type: FFPE
  • Training sample: RT-PCR data from 447 FFPE samples from 3 clinical trials (most heavily weighted ER+, HER2+/–, node negative, tam treated; also used node positive, ER+/–, tam treated or chemo treated)
  • Number of genes: 16 prognostic genes + 5 reference genes
  • Output: Recurrence score: low, intermediate, or high risk for 10-year distant recurrence
  • Indicated patient population: Newly diagnosed breast cancer, stage I–IIIa, ER+, HER2–, node negative or 1–3 positive nodes
  • FDA clearance: No
  • Guidelines incorporating test: St. Gallen, ESMO, AGO, NCCN, ASCO
  • Prospective randomized trial: TAILORx, RxPONDER

Prosigna (PAM50)
  • Company: Veracyte (South San Francisco, Calif.)
  • Specimen type: FFPE or fresh
  • Training sample (for original PAM50 algorithm): Subtype: microarray data from 189 tumor samples representing all subtypes and nodal status from heterogeneously treated women. ROR: microarray data from 141 patients (≤52 years, node negative, tumors <5 cm, ER+/–, HER2+/–, without systemic therapy)
  • Number of genes: 50 tumor-related genes + 8 reference genes
  • Output: Risk of recurrence score: low, intermediate, or high risk for 10-year distant recurrence
  • Indicated patient population: FDA cleared in United States for use in postmenopausal women with hormone receptor–positive, node-negative (stage I or II), or node-positive (stage II) breast cancer to be treated with adjuvant endocrine therapy. Outside the United States, node-negative or node-positive early stage (stages I, II, and IIIA) breast cancer
  • FDA clearance: Yes
  • Guidelines incorporating test: St. Gallen, ESMO, AGO, NCCN
  • Prospective randomized trial: Embedded correlative science substudy in RxPONDER trial

MapQuant Dx Genomic grade index
  • Company: QIAGEN (Marseille, France)
  • Specimen type: Fresh and FFPE version
  • Training sample: Microarray data from 64 ER+ tumor samples (33 histological grade 1 tumors and 31 histological grade 3 tumors)
  • Number of genes: 97
  • Output: High genomic grade vs. low genomic grade; plus equivocal category for clinical test
  • Indicated patient population: Patients with histological grade 2 breast cancer
  • FDA clearance: No
  • Guidelines incorporating test / Prospective randomized trial: None

EndoPredict
  • Company: Sividon Diagnostics GmbH (Köln, Germany)
  • Specimen type: FFPE
  • Training sample: Microarray data and FFPET from 964 tumors from 6 different patient cohorts (ER+, HER2–, tam treated)
  • Number of genes: 8 cancer genes + 3 control genes
  • Output: EP and EPclin low risk and high risk for 10-year distant recurrence
  • Indicated patient population: Newly diagnosed breast cancer, ER+, HER2–, node negative or 1–3 nodes positive
  • FDA clearance: No
  • Guidelines incorporating test: St. Gallen, ESMO, AGO, ASCO
  • Prospective randomized trial: None

Breast Cancer Index
  • Company: Biotheranostics (San Diego, Calif.)
  • Specimen type: FFPE
  • Training sample: H/I: microarray data from 60 early breast cancers (ER+, HER2+/–, tam treated). MGI: microarray data from 251 heterogeneously treated breast cancer patients. BCI: RT-PCR of 93 FFPE samples from tam-treated, node-negative women
  • Number of genes: 7 (2-gene H/I ratio + 5-gene molecular grade index)
  • Output: BCI prognostic score: low risk or high risk for 10-year distant recurrence. BCI predictive score (H/I ratio): high or low likelihood of benefit from extended endocrine therapy
  • Indicated patient population: Patients with newly diagnosed ER+ node-negative breast cancer, and patients who are recurrence free after an initial 5 years of adjuvant endocrine therapy (to assess benefit of extended hormonal therapy)
  • FDA clearance: No
  • Guidelines incorporating test: St. Gallen, NCCN, ESMO, ASCO
  • Prospective randomized trial: None

AGO, German Gynecological Oncology Group; ASCO, American Society of Clinical Oncology; BCI, Breast Cancer Index; ER, estrogen receptor; ESMO, European Society for Medical Oncology; FDA, US Food and Drug Administration; FFPE, formalin-fixed, paraffin-embedded; FFPET, formalin-fixed, paraffin-embedded tissue; HER2, human epidermal growth factor receptor 2; H/I, HOXB13:IL17BR; MGI, molecular grade index; MINDACT, Microarray In Node-Negative and 1 to 3 Positive Lymph Node Disease May Avoid Chemotherapy; NCCN, National Comprehensive Cancer Network; ROR, risk of recurrence; RT-PCR, reverse transcription polymerase chain reaction; tam, tamoxifen.

Commercially Available Signatures

MammaPrint 70-gene signature

Discovery Phase

The first successful breast cancer prognostic signature developed using a top-down approach was the 70-gene signature by van’t Veer et al. The signature was trained using archived, fresh frozen breast cancer specimens from a cohort of 78 predominantly systemically untreated patients, all younger than 55 years with tumors less than 5 cm, negative nodal status, and a mix of positive and negative ER and HER2 status ( Fig. 10.6A ). The samples were divided into two groups: those that developed metastatic disease within 5 years and those that remained metastasis-free for at least 5 years. Supervised analysis of approximately 25,000 genes (Agilent oligonucleotide Hu25K microarray, Agilent Technologies, Santa Clara, Calif.) identified approximately 5,000 genes that were significantly regulated, of which 231 genes were significantly correlated with outcome between the two groups. These 231 genes were rank-ordered based on the magnitude of their correlation coefficient. The prognostic classifier was optimized by sequentially adding in five genes from the top of the list, followed by evaluating the ability of the classifier to accurately classify using the “leave-one-out” method of cross-validation. The best prognostic accuracy was achieved with 70 genes, which form the basis of the 70-gene signature, subsequently commercialized as the MammaPrint assay (Agendia, Amsterdam, the Netherlands), that stratifies patients as being high risk or low risk for early metastatic recurrence. Because of the potential for withholding adjuvant chemotherapy from the good prognosis group, the authors adjusted their optimal accuracy threshold (most accurate cutoff point for classifying tumors to the correct outcome group) to an optimized sensitivity threshold that was set so that no more than 10% of poor prognosis tumors would be misclassified to the good prognosis group ( Fig. 10.6B ). 
The 70-gene signature was then tested in a cohort of 19 different patients and was shown to correctly classify 17 out of 19 tumors, thus validating the robustness of the 70-gene prognostic classifier.
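The difference between a cutoff chosen for accuracy and one chosen for sensitivity can be made concrete with a short sketch. It assumes each tumor has a single score (for example, its correlation with the average good-prognosis profile) and that tumors scoring above the cutoff are called good prognosis; the cutoff is then placed so that no more than 10% of poor-prognosis tumors are called good. The scores and function names are invented, and this is not the authors' actual code.

```python
import math

def sensitivity_optimized_threshold(scores, is_poor, max_missed_fraction=0.10):
    """Cutoff such that at most the given fraction of poor-prognosis
    tumors score above it (and would be misclassified as good prognosis).

    Assumes at least one poor-prognosis tumor is present.
    """
    poor_scores = sorted((s for s, poor in zip(scores, is_poor) if poor),
                         reverse=True)
    allowed_misses = math.floor(max_missed_fraction * len(poor_scores))
    # Only the `allowed_misses` highest-scoring poor tumors exceed this value.
    return poor_scores[allowed_misses]

def classify(score, threshold):
    """Tumors scoring strictly above the cutoff are called good prognosis."""
    return "good" if score > threshold else "poor"

# Hypothetical scores: 10 poor-prognosis and 10 good-prognosis tumors.
poor = [0.55, 0.30, 0.20, 0.10, 0.05, -0.10, -0.20, -0.30, -0.40, -0.50]
good = [0.90, 0.80, 0.70, 0.60, 0.50, 0.45, 0.40, 0.35, 0.25, 0.15]
scores = poor + good
is_poor = [True] * len(poor) + [False] * len(good)

threshold = sensitivity_optimized_threshold(scores, is_poor)
missed_poor = sum(classify(s, threshold) == "good" for s in poor)
good_called_good = sum(classify(s, threshold) == "good" for s in good)
print(f"threshold: {threshold}, poor called good: {missed_poor}/10, "
      f"good called good: {good_called_good}/10")
```

In this toy example the sensitivity-oriented cutoff misclassifies one poor-prognosis tumor as good, at the cost of assigning two good-prognosis tumors to the poor-prognosis group; this is the trade-off accepted when the priority is not to withhold chemotherapy from patients who need it.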

Fig. 10.6, Supervised class comparison and prediction microarray study used to develop the 70-gene signature. ( A ) Classes were initially defined based on outcome in a training set of 78 sporadic breast tumors, stratified into a poor prognosis group with distant metastases within 5 years and a good prognosis group that remained metastasis free for at least 5 years. Supervised analysis identified genes correlated with outcome, from which 70 were chosen as the optimal classifier. ( B ) The expression data matrix of the 70 prognostic marker genes across the 78 breast cancers. Each row represents a tumor and each column a gene. Genes are ordered according to their correlation coefficient with the two prognostic groups. Tumors are ordered by their correlation to the average profile of the good prognosis group (middle right panel) . The solid line represents the prognostic classifier with optimal accuracy and the dashed line with optimized sensitivity. Above the dashed line are patients with a good prognosis signature, and below the dashed line are those with a poor prognosis signature. The metastasis status for each patient is shown (far right panel) : white indicates patients with metastases within 5 years, and blue indicates those patients disease free for at least 5 years. LND , lymph node dissection.

There are some points to consider regarding the development of the 70-gene signature. As stressed throughout this chapter, because omics-based data sets are composed of an extremely large number of molecular measurements relative to a small number of samples, overfitting of the data is a major concern. In the case of the 70-gene signature, the subset of 231 genes with the highest correlation with clinical outcome was selected from the original 25,000 genes using all samples, and only then was cross-validation performed to select the final 70 genes. This type of incomplete cross-validation can lead to significant overfitting, resulting in an omics-based test that performs well in cross-validation but results in much less discriminatory power on subsequent patient samples. The small sample size of both the training and the test sets is also worth mentioning, and certainly increased the risk of overfitting.
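
The effect of incomplete cross-validation can be illustrated with a small simulation. The sketch below is not a reconstruction of the 70-gene analysis; it makes several assumptions for illustration: the "expression" values are pure random noise with no relationship to outcome, the classifier is a simple nearest-centroid rule, and the sizes (30 samples, 2,000 genes, 20 selected genes) are arbitrary. Because there is no real signal, an honest estimate of accuracy should be near chance. Selecting genes on all samples before cross-validation would be expected to produce a much higher, spurious accuracy than repeating the gene selection inside every fold.

```python
import random


def correlation(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


def top_k_genes(rows, labels, k):
    """Rank genes by absolute correlation with outcome and keep the top k."""
    ranked = []
    for g in range(len(rows[0])):
        column = [row[g] for row in rows]
        ranked.append((abs(correlation(column, labels)), g))
    ranked.sort(reverse=True)
    return [g for _, g in ranked[:k]]


def predict(train_rows, train_labels, test_row, genes):
    """Nearest-centroid call using only the selected genes."""
    dist = {}
    for cls in (0, 1):
        members = [r for r, lab in zip(train_rows, train_labels) if lab == cls]
        centroid = [sum(r[g] for r in members) / len(members) for g in genes]
        dist[cls] = sum((test_row[g] - c) ** 2 for g, c in zip(genes, centroid))
    return 0 if dist[0] <= dist[1] else 1


def loocv_accuracy(rows, labels, k, select_inside_fold):
    """Leave-one-out accuracy, selecting genes either once or within each fold."""
    fixed = None if select_inside_fold else top_k_genes(rows, labels, k)
    hits = 0
    for i in range(len(rows)):
        tr_rows = rows[:i] + rows[i + 1:]
        tr_labels = labels[:i] + labels[i + 1:]
        genes = top_k_genes(tr_rows, tr_labels, k) if select_inside_fold else fixed
        hits += predict(tr_rows, tr_labels, rows[i], genes) == labels[i]
    return hits / len(rows)


random.seed(0)
N_SAMPLES, N_GENES, K = 30, 2000, 20  # arbitrary illustrative sizes
rows = [[random.gauss(0, 1) for _ in range(N_GENES)] for _ in range(N_SAMPLES)]
labels = [i % 2 for i in range(N_SAMPLES)]  # outcome unrelated to the data

incomplete = loocv_accuracy(rows, labels, K, select_inside_fold=False)
complete = loocv_accuracy(rows, labels, K, select_inside_fold=True)
print(f"gene selection on all samples, then LOOCV: {incomplete:.2f}")
print(f"gene selection repeated inside each fold:  {complete:.2f}")
```

The design point is that every step that uses the outcome labels, including the initial gene filter, belongs inside the cross-validation loop; otherwise the held-out sample has already influenced the classifier that is being tested on it.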

Test Validation Phase

Analytic validation of the MammaPrint assay confirms the reproducibility and precision of the test, with a reported maximum variation of 5% in multiple samplings of the same tissue. Clinical validity of the 70-gene signature was tested on a retrospective cohort of primary breast cancer samples from 295 patients (Netherlands Cancer Institute [NKI] data set), including 151 node-negative samples, 61 of which were used in the training of the signature. The study confirmed that the 70-gene signature was an independent predictor of outcome when included in a multivariate survival model together with clinicopathological parameters and therapy ( Fig. 10.7 ), and that it could stratify the prognostic subgroups defined by the St. Gallen and National Institutes of Health (NIH) criteria.

Fig. 10.7, Validation of the 70-gene signature in a cohort of 295 breast cancer patients. Kaplan-Meier analysis showing distant metastasis–free survival ( A ) and overall survival ( B ) among all 295 patients stratified by signature-defined risk group.

The validation sample, however, was not entirely independent, as the study included 61 patients used in the training of the predictor, thereby overestimating the discriminatory power of the test. This was well demonstrated in the subsequent validation study, which used completely independent samples from node-negative, systemically untreated patients from the TRANSBIG consortium of European cancer centers. The authors also included the 151 node-negative patients from the aforementioned NKI mixed training/validation study as a comparison. The independent samples showed an adjusted hazard ratio for time to distant metastasis in the MammaPrint high-risk group of 2.1 as compared with a hazard ratio of 6.1 in the mixed training/validation cohort; the hazard ratio for overall survival was 2.6 in the independent samples versus 17.5 in the mixed training/validation cohort ( Fig. 10.8 ). This highlights the inflation of discriminatory power seen when training samples are included in the validation set and points to overfitting of the signature. Length of follow-up likely also contributed to the discrepant hazard ratios between the two studies, as the 70-gene signature was shown to be highly time dependent, with better discriminatory power for shorter follow-up times; the median length of follow-up for the mixed training/validation NKI cohort was 6.7 years versus 13.6 years in the independent validation study. The independent TRANSBIG samples did validate MammaPrint to be prognostic, however, and to better risk stratify patients than Adjuvant! Online. (Adjuvant! Online was a popular online decision-making tool commonly used by oncologists to quantify the risk and benefit of adjuvant systemic therapy in a particular breast cancer patient based on clinicopathological factors. It was developed in 2001 and was shut down for updates in 2015. There is currently no information about a relaunch of the website.) 
The MammaPrint assay has further been validated to be prognostic in several additional retrospective cohort studies. Notably, the requirement for fresh tissue precluded development or validation using homogeneously treated clinical trial samples, which are preferable to heterogeneously treated convenience samples, and also prevented the types of prospective-retrospective studies using archived clinical trial material that are a proposed alternative route to generating level 1B evidence and have been used by the newer generation of gene expression signatures.

Fig. 10.8, Forest plots comparing hazard ratios (HRs) and 95% confidence intervals (CIs) for MammaPrint high-risk versus low-risk groups from a series of independent validation sets (first 5 rows; all combined in row 6, indicated by blue arrow) versus a validation set that included samples used in the training of the signature (row 7, indicated by green arrow). Time to distant metastases is shown in ( A ) and overall survival in ( B ). Note the dramatically inflated discriminatory power seen by including training samples in the validation step of a test.

Evaluation of Clinical Utility and Use

MammaPrint was initially evaluated in a prospective observational study: the microarRAy prognoSTics in breast cancER (RASTER) study, which was conducted in 16 community hospitals in the Netherlands to assess the feasibility of implementing MammaPrint in a community-based setting and to study the impact of the test result on clinical decision-making with respect to use of adjuvant systemic therapy. The 5-year distant recurrence–free interval (DRFI) in patients with MammaPrint low-risk and Adjuvant! Online high-risk classification ( n = 124) was 98.4%; 76% of these patients had not received adjuvant chemotherapy. These findings demonstrated that MammaPrint adds prognostic information beyond standard clinicopathological factors and laid the foundation for the international, prospective, randomized phase III MINDACT clinical trial (Microarray In Node-negative and 1 to 3 positive lymph node Disease may Avoid ChemoTherapy). The trial recruited 6,693 patients who were evaluated by both Adjuvant! Online and the 70-gene signature. Patients characterized as low risk in both assessments did not receive chemotherapy, whereas for patients characterized as high risk in both assessments, chemotherapy was advised. Patients with discordant results were randomized to use either the Adjuvant! Online or the 70-gene signature risk classification for treatment decision-making ( Fig. 10.9 ). Five-year outcome data have shown that patients who were classified as low risk by both assessments ( n = 2,745) had 5-year distant metastasis–free survival (DMFS) of 97.6% without use of chemotherapy, compared with 90.6% for patients who were classified as high risk by both assessments ( n = 1,806) and received chemotherapy. Within the discordant group ( n = 2,142), 592 patients were low risk by Adjuvant! Online but high risk by MammaPrint, and 1,550 patients were high risk by Adjuvant! Online but low risk by MammaPrint. 
The primary statistical analysis for the trial was based on the subset of this latter cohort (who were high risk based on clinicopathological factors but low risk by molecular profile) who did not receive chemotherapy ( n = 644). The 5-year DMFS in this group was 94.7% (95% confidence interval [CI] = 92.5%–96.2%), which met the trial definition of a successful result (prespecified as a 5-year DMFS greater than 92%). Unfortunately, MINDACT was not powered to address the question of whether chemotherapy benefited the patients in the discordant groups. In the intent-to-treat analysis, for the Adjuvant! Online high/MammaPrint low group, 5-year DMFS was 1.5 percentage points higher with chemotherapy (95.9%) than without (94.4%) (hazard ratio [HR] = 0.78, P = .27). For the Adjuvant! Online low/MammaPrint high group, DMFS rates with and without chemotherapy were 95.8% and 95.0%, respectively (HR = 1.17, P = .66). Disease-free and overall survival rates showed similar nonsignificant trends. Overall, however, survival was excellent within the discordant groups, regardless of whether chemotherapy was administered, and any benefit from chemotherapy, if real, was modest at best. Therefore, in the context of an Adjuvant! Online low-risk result, there is no proven added benefit from a MammaPrint assay. Within the Adjuvant! Online high-risk group, 46% of patients in the trial had a MammaPrint low-risk result, which could translate into a substantial reduction in chemotherapy prescriptions. Ultimately, the trade-off between a possible small benefit from chemotherapy and the toxicity of chemotherapy remains in the hands of the individual patient and clinician.
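
The allocation logic and the group sizes quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only a bookkeeping aid: the function name and the strategy labels are hypothetical, and the counts are copied from the text rather than taken from the trial data set.

```python
def mindact_strategy(clinical_risk, genomic_risk):
    """Treatment strategy for a (clinical, genomic) risk pair, as described in the text."""
    if clinical_risk == genomic_risk == "low":
        return "no chemotherapy"
    if clinical_risk == genomic_risk == "high":
        return "chemotherapy advised"
    return "randomize: decide by clinical or by genomic risk"


# (clinical risk by Adjuvant! Online, genomic risk by MammaPrint) -> patients
groups = {
    ("low", "low"): 2745,
    ("high", "high"): 1806,
    ("low", "high"): 592,
    ("high", "low"): 1550,
}

total = sum(groups.values())
discordant = sum(n for (c, g), n in groups.items() if c != g)
clinical_high = groups[("high", "high")] + groups[("high", "low")]
genomic_low_share = groups[("high", "low")] / clinical_high

print(total)                           # 6693 patients recruited
print(discordant)                      # 2142 patients with discordant results
print(round(100 * genomic_low_share))  # 46 (% of clinical high risk that is genomic low risk)
print(mindact_strategy("high", "low"))
```

The 46% figure is therefore the share of clinically high-risk patients (1,550 of 3,356) whose genomic result was low risk.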

Fig. 10.9, Design schema of MINDACT (Microarray in Node-Negative and 1 to 3 Positive Lymph Node Disease May Avoid Chemotherapy), a clinical utility study comparing the MammaPrint gene signature with the Adjuvant! Online clinicopathological risk stratification tool. Patients with discordant results were randomized to use either the Adjuvant! Online or MammaPrint risk classification for decision-making with regard to chemotherapy use.

Another point to consider is that the 70-gene signature is known to classify nearly all ER-negative (ER–) patients as high risk (96%–100% of patients in prior studies and 96% of the ER/progesterone receptor-negative [PgR–] tumors in the MINDACT trial). However, so does Adjuvant! Online. Consequently, most (96%) of the patients randomized in the MINDACT trial had ER+ tumors because very few discrepancies in classification would be expected for the ER– group. In fact, most (81%) of the patients enrolled in the trial had ER+, HER2– tumors due to inherent enrollment bias (i.e., both the patient and oncologist had to be comfortable with the possibility of withholding chemotherapy in the event of a low-risk result). It may have been beneficial, therefore, to have focused the development and validation of the gene signature on ER+ patients who were receiving endocrine therapy (as was the case for Oncotype DX and EndoPredict [discussed later]). In the MINDACT trial, approximately 5% of tumors classified as low risk by MammaPrint were HER2+. It should also be noted that in trastuzumab-naïve patients, 2% to 22% of HER2+ breast cancers have been shown to have a good prognosis 70-gene signature; withholding chemotherapy and anti-HER2 agents in this group remains controversial, although it may be considered an appropriate option in patients older than 70 years of age.

Preplanned analysis of the MINDACT trial with an updated 8.7 years of follow-up has further reaffirmed the utility of MammaPrint within the group of patients ( n = 644) with high clinical risk/low genomic risk showing that therapeutic de-escalation with omission of chemotherapy yielded excellent 5-year DMFS of 95.1% (95% CI = 93.1%–96.6%) irrespective of the nodal status. The underpowered exploratory analysis by age showed that this gain was seen in patients older than 50 years of age. In patients younger than 50 years, a 5% benefit from the addition of chemotherapy was attributed to chemotherapy-induced suppression of the ovarian function. Further studies have shown that the use of MammaPrint impacts the recommendations for therapeutic escalation or de-escalation and positively influences the physician’s confidence for making such treatment decisions.

Current Status

Following the results of the MINDACT trial, a level of evidence (LoE) and grade of recommendation (GoR) of IA have been achieved for this prognostic signature. MammaPrint has been cleared by the US Food and Drug Administration (FDA) and carries the Conformité Européenne (CE) mark. It is performed in the company’s two central laboratories: one in the United States and one in the Netherlands. Originally developed for fresh tissue, MammaPrint received FDA 510(k) clearance in 2007 as an in vitro diagnostic multivariate index assay (IVDMIA). More recently, the assay has been adapted to formalin-fixed, paraffin-embedded (FFPE) tissue, and the FFPE version has also received FDA 510(k) clearance. FDA-cleared indications include use as a prognostic test for women younger than 61 years of age with lymph node–negative, stage I or II invasive breast carcinoma, with tumor size 5.0 cm or smaller and any ER and HER2 status. The use of MammaPrint is recommended in the recent guidelines from the American Society of Clinical Oncology (ASCO), National Comprehensive Cancer Network (NCCN), and European Society for Medical Oncology (ESMO) for consideration in hormone receptor–positive patients with high clinical risk (irrespective of the nodal status) for identifying good prognosis tumors where the benefit of chemotherapy is limited.

Further Developments

Recognizing the clinically relevant prognostic information contained within the biology of intrinsic molecular subtypes, Agendia has expanded its breast cancer assays to include molecular subtyping (BluePrint) as well as providing quantitative ER, PgR, and HER2 mRNA expression levels by microarray (TargetPrint). The BluePrint/MammaPrint assay allows for functional molecular subtype classification, which has shown better correlation with treatment response to neoadjuvant therapy than subtype based on standard ER, PgR, and HER2 assessment alone. Most recently, the company has developed kits to allow the MammaPrint and BluePrint tests to be performed on-site at reference laboratories that successfully complete the Agendia Partner Reference Lab certification process. Both BluePrint and TargetPrint are laboratory-developed tests, and neither is part of the FDA clearance for MammaPrint. The American Society of Clinical Oncology and College of American Pathologists (ASCO/CAP) 2013 guidelines reiterate that microarray and gene expression platforms are currently unsuitable for clinical HER2 testing; similarly, hormone receptor testing via gene expression has yet to be clinically validated for directing treatment decisions.

Oncotype DX 21-gene signature

Discovery Phase

Oncotype DX (Genomic Health, Redwood City, Calif.) differs from its predecessors in several important respects. It is a qRT-PCR test that is performed on FFPE tumor specimens, which opened the door to using an extremely valuable asset: archival paraffin tumor blocks from previous clinical trials. In addition, the Oncotype DX signature was developed using a purely bottom-up approach. Initially, 250 candidate genes were selected from the published literature, genomic databases, and previous DNA microarray studies, including intrinsic subtypes and the 70-gene signature. Corresponding primer sets were created and qRT-PCR was used to generate quantitative expression levels of the genes from 447 FFPE samples from three separate clinical studies, including 233 samples from the tamoxifen-only arm of the National Surgical Adjuvant Breast and Bowel Project B-20 (NSABP B-20) trial. These latter samples, corresponding to ER+, node-negative tumors from tamoxifen-treated women, represented the most relevant patient population and were most heavily weighted in anticipation of validating on the similar NSABP B-14 trial. Genes were selected based on their correlation with recurrence across the studies as well as the consistency of the primer pair performance. The resulting algorithm is a 21-gene signature (16 prognostic genes plus 5 reference genes), which generates a recurrence score (RS) ranging from 0 to 100 that classifies patients as being at low, intermediate, or high risk of recurrence.

Validation Phase

The 21-gene signature was clinically validated in a completely independent sample set: patients from the tamoxifen arm of the NSABP B-14 trial (a tamoxifen versus placebo trial for ER+ breast cancers). The study was conducted as a rigorous prospective-retrospective study, with a locked-down computational model and predefined statistical analysis plan including prespecified outcome endpoints and cutoff points for RS. The study confirmed the ability of the signature to distinguish prognostically distinct groups based on risk group assignment. In this study, the rates of distant recurrence were 6.8%, 14.3%, and 30.5% for the low- (RS <18), intermediate- (RS ≥18 and <31), and high-risk groups (RS ≥31), respectively.
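
The prespecified cutoff points translate directly into a three-way classification. The following is a minimal sketch of that mapping only; it does not compute the RS itself, the function name is hypothetical, and the recurrence rates are simply the NSABP B-14 figures quoted above.

```python
def rs_risk_group(rs):
    """Map a recurrence score (0-100) to the risk group used in NSABP B-14."""
    if not 0 <= rs <= 100:
        raise ValueError("RS must lie between 0 and 100")
    if rs < 18:
        return "low"
    if rs < 31:
        return "intermediate"
    return "high"


# Distant recurrence rates (%) reported for each group in the validation study.
DISTANT_RECURRENCE_RATE = {"low": 6.8, "intermediate": 14.3, "high": 30.5}

for rs in (10, 18, 30, 31):
    group = rs_risk_group(rs)
    print(rs, group, DISTANT_RECURRENCE_RATE[group])
```

Locking these boundaries before the validation samples were analyzed is what allowed the study to be treated as a prospective-retrospective validation rather than a further round of model fitting.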

The second Oncotype DX validation study, designed to assess whether the signature predicts for benefit from chemotherapy, was performed using specimens from the NSABP B-20 trial (randomizing women with ER+ breast cancer to tamoxifen only versus tamoxifen plus chemotherapy). The results demonstrated that women assigned a low-risk RS showed no statistically significant difference in distant relapse–free survival (DRFS) with the addition of chemotherapy; whereas within the high-risk group, women treated with chemotherapy had significantly improved DRFS compared with the tamoxifen-only arm ( Fig. 10.10 ). Once again, however, this is an example of a pervasive methodological flaw in earlier validation studies: mixing of training and validation sets. Samples from the NSABP B-20 tamoxifen-only arm were used (and most heavily weighted) in the training of the algorithm. Although it is no surprise that highly proliferative tumors would respond most to chemotherapy (RS most heavily weighs proliferation-associated genes), including training samples in the validation set overinflates the benefit seen by the addition of chemotherapy in this group, as well as the predictive capacity of the Oncotype DX risk stratification. Furthermore, the study included HER2+ tumors, which are not part of the current clinical indication. (At the time, the HER2 story was still unfolding, and trastuzumab had not yet been approved outside the metastatic setting.) In the clinical setting, where HER2+ tumors are excluded from Oncotype DX testing, the proportion of cases falling into the high-risk group is significantly lower than reported within the original NSABP B-14 and B-20 trial cohorts (both of which included HER2+ tumors). One study reported only 10% of clinical tumor samples being classified as high risk (versus 25% in the validation studies) with a proportional rise in the percent of cases classified as intermediate RS (40% of clinical samples versus 20% in the validation studies). 
An exploratory analysis of the NSABP B-20 study has recently addressed this controversial issue by eliminating HER2+ tumors (defined by a cutoff of ≥11.5 as determined by reverse-transcriptase polymerase chain reaction [RT-PCR]). With the reduced number of events after exclusion of HER2+ tumors, the benefit from the addition of chemotherapy to tamoxifen versus tamoxifen alone did not reach statistical significance in the overall NSABP B-20 cohort (DRFS: 93% vs. 90%; P = 0.06); however, a significant benefit from adding chemotherapy to tamoxifen was maintained in tumors with an RS ≥31 (HR = 0.18; 95% CI: 0.07–0.47; P <0.001).

Fig. 10.10, Oncotype DX as a predictor of chemotherapy benefit within the National Surgical Adjuvant Breast and Bowel Project B-20 (NSABP B-20) trial. Patients with low recurrence score (RS) tumors derived minimal benefit from chemotherapy, whereas patients with high-risk tumors showed a 28% absolute benefit. Shown is the Kaplan-Meier estimate for freedom from distant recurrence in the high-risk group, improved from 60% to 88% by adding chemotherapy to tamoxifen. Although seeming impressive, criticism stems from the inclusion of specimens used in the training of the algorithm as well as HER2+ tumors, both of which contribute to overinflation of apparent chemotherapy benefit from high-risk RS classification. DRFS , Distant relapse–free survival; Tam , tamoxifen.

Despite these initial methodological shortcomings, Oncotype DX has been successfully clinically validated as a prognostic assay in numerous studies. Analytic validity of the assay has also been demonstrated. Currently, Oncotype DX is the most widely used of the prognostic breast cancer molecular tests in clinical practice. Typical indications for Oncotype DX are in a patient with a node-negative, ER+, HER2– tumor where the benefit of adjuvant chemotherapy is in question. The role of Oncotype DX in node-positive patients continues to be defined. One of the initial studies was a prospective-retrospective designed study using archived tumor material from a randomized clinical trial (the Southwest Oncology Group [SWOG] S8814 trial) in postmenopausal, axillary lymph node–positive, ER+ breast cancer. The results showed that patients with a high RS benefited from the addition of chemotherapy to tamoxifen, whereas patients with a low RS did not show significant benefit. As the HER2+ tumors were not excluded from the analysis, the performance of the assay in the relevant ER+, HER2– population is unclear. A subsequent combined analysis of five studies including more than 9,000 node-positive patients (which also included the SWOG S8814 trial) treated with endocrine therapy alone showed that a low RS reliably categorized patients harboring limited nodal metastasis with favorable prognosis, though the long-term follow-up is awaited.

Evaluation of Clinical Utility and Use

The benefit of adding chemotherapy to endocrine therapy within the intermediate RS group has been evaluated in two prospective randomized trials: TAILORx and RxPONDER. In the TAILORx trial ( Fig. 10.11A ), patients with node-negative, hormone receptor–positive, HER2– breast cancer receive the Oncotype DX assay. Women with an RS less than 11 receive hormonal therapy, women with an RS greater than 25 receive chemotherapy in addition to hormonal therapy, and women in the middle range (11–25) are randomized to chemotherapy plus hormonal therapy versus hormonal therapy alone. These cutoffs differ from the current risk group cutoff points, where the clinical intermediate score is 18 to 30. The study recruited 10,253 eligible women. Initial 5-year outcome results for the low-risk (RS of 0–10) group, which comprised 15.9% of the eligible patient population, showed that after endocrine therapy alone, the 5-year distant recurrence–free rate was 99.3%, freedom from any recurrence was 98.8%, and overall survival was 98%. Although widely anticipated, this is an important result. These data confirm that these very low-risk women are safely treated by endocrine therapy alone and that there is a negligible role for the addition of chemotherapy. More important, the results of the 67.3% of patients falling into the midrange RS of 11 to 25 who were randomized to the addition of chemotherapy versus endocrine therapy alone were recently reported at a 9-year follow-up showing that endocrine therapy was noninferior to chemoendocrine therapy for invasive disease–free survival (83.3% vs. 84.3%), distant recurrence–free survival (94.5% vs. 95%), and freedom from any recurrence (92.2% vs. 92.9%). Within the high RS range (26–100), where all women were recommended chemotherapy in addition to endocrine therapy, the estimated 5-year freedom from distant recurrence was 93%, which is better than expected with endocrine therapy alone.
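
Because TAILORx used different cutoff points from the conventional clinical risk groups, the same RS can fall into different bands under the two schemes. The sketch below illustrates this under stated assumptions: the RS is treated as an integer from 0 to 100, and the function name, band labels, and action strings are hypothetical shorthand for the trial design described above.

```python
def band(rs, first_mid, first_high):
    """Place an integer RS (0-100) in the low, mid, or high band."""
    if not 0 <= rs <= 100:
        raise ValueError("RS must lie between 0 and 100")
    if rs < first_mid:
        return "low"
    if rs < first_high:
        return "mid"
    return "high"


TAILORX = (11, 26)   # low 0-10, midrange 11-25, high 26-100
CLINICAL = (18, 31)  # low <18, intermediate 18-30, high >=31

TAILORX_ACTION = {
    "low": "endocrine therapy alone",
    "mid": "randomized: endocrine therapy with or without chemotherapy",
    "high": "chemotherapy plus endocrine therapy",
}

for rs in (5, 12, 20, 27, 40):
    clinical = band(rs, *CLINICAL)
    trial = band(rs, *TAILORX)
    print(f"RS {rs}: clinical {clinical}, TAILORx {trial} -> {TAILORX_ACTION[trial]}")
```

An RS of 12, for example, is low risk by the conventional cutoffs but fell into the randomized midrange in TAILORx, whereas an RS of 27 is conventionally intermediate but was assigned chemotherapy in the trial.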

Fig. 10.11, Study design schemas for the Oncotype DX phase III trials, TAILORx ( A ) and RxPONDER ( B ). ER , Estrogen receptor; LVI+ , lymphovascular space invasion positive; PgR , progesterone receptor; RS , recurrence score.

RxPONDER ( Fig. 10.11B ) is an ongoing prospective randomized clinical trial including more than 5,000 eligible women, designed to investigate chemotherapy benefit in hormone receptor–positive/HER2– breast cancer with 1 to 3 positive nodes subjected to Oncotype DX testing. Those with an RS of 0 to 25 are randomized to endocrine therapy alone versus chemotherapy plus endocrine therapy. The results of the interim analysis demonstrate that adjuvant therapy can be safely de-escalated to endocrine therapy alone in postmenopausal women with limited node-positive disease, as no benefit is derived from the addition of chemotherapy. In contrast, among premenopausal women, the addition of chemotherapy reduced the hazard of an invasive event by 46% regardless of the clinicopathological variables, and yielded superior 5-year overall survival compared with women treated with endocrine therapy alone. These findings could possibly be attributed to chemotherapy-induced ovarian suppression, as also previously reported in the SOFT and TEXT trials; however, because information on chemotherapy-induced amenorrhea was not collected, this remains speculative. Long-term results with 15-year follow-up data will be reported at a later date.

Note that, unlike the MINDACT trial, neither of these studies compares outcome based on use of Oncotype DX for clinical decision-making compared with treatment decisions based on traditional clinicopathological variables alone, widely regarded as the definition of a clinical utility study.
