Genomic Approaches to Hematology


Introduction

The publication of the sequence of the human genome in 2001 heralded a new era in biomedical research and delivered a novel perspective on the biologic basis of the leukemias and lymphomas. A major tenet of these new approaches was their emphasis on the generation of large unbiased datasets as a means of discovery. The rapid application of this methodology combined with ready access to tissue for analysis pushed hematology into an era of “precision medicine” based on the use of molecular diagnostics and targeted interventions. The speed with which this approach was taken up was enhanced by the reduction in sequencing costs, which made testing generally available. In the background, large-scale efforts led to the establishment of repositories of genomic data (e.g., The Cancer Genome Atlas), which allowed the development of effective diagnostic, prognostic, and predictive biomarkers. The extension of this information by integrating molecular markers into the World Health Organization (WHO) classification enhanced disease definitions, making their behavior easier to understand and predict. Risk-stratified therapy based on the integration of molecular markers is now a reality for many hematologic cancers, and the development of predictive markers is a particular aim based on targeting specific genomic variants (e.g., BRAF inhibitors targeting BRAF V600E mutations). This chapter describes the approaches and progress that have been made for the integration of genomic approaches into the clinic to improve the management of hematologic diseases.

General Principles of Genomic Testing

Genomic Analysis

Analysis of the genome (Table 3.1) aims to identify, quantify, or compare genomic features such as DNA sequence, structural variation (SV), gene expression, or regulatory and functional annotation at a genomic scale. Methods for genomic analysis typically require high-throughput sequencing and computational analysis.

Table 3.1
Definition of the Different Genomes
  • Nuclear genome: the DNA found within the nucleus of a cell, which accounts for the majority of the DNA in eukaryotic cells. In humans, it comprises approximately 3,200,000,000 nucleotides, divided into 24 linear molecules, each contained in a different chromosome. These 24 chromosomes consist of 22 autosomes and the two sex chromosomes. The vast majority of cells, the somatic cells, are diploid (in contrast to gametes, which are haploid). The spontaneous mutation rate of nuclear DNA is low (0.3%).

  • Mitochondrial genome: a closed circular DNA molecule of approximately 16,500 nucleotides, present within the mitochondria. It contains 37 genes, all of which are crucial for normal mitochondrial function. Thirteen of these genes encode enzymes involved in oxidative phosphorylation; the remaining genes code for transfer RNA (tRNA) and ribosomal RNA (rRNA). The mitochondrial genome is haploid and is inherited through the mother. It has a higher mutation rate than nuclear DNA, which is offset by the multiple copies (100–10,000) present in each cell.

  • Epigenome: the epigenome comprises specific covalent modifications of the chromatin that ensure the somatic inheritance of differentiated cell states. It acts not only during the differentiation of somatic cells but also in response to environmental cues and stresses. The passing on of these modifications to descendants constitutes epigenetic inheritance. The structure and function of the epigenome are controlled by covalent marks applied to components of a nucleosome (DNA and histones) by enzymes (“writers”). These marks instruct the proteins that recognize them (“readers”) to identify and remodel particular regions of the genome in order to modulate expression. The plasticity of the epigenome comes from the “erasers,” enzymes capable of removing active and repressive marks. Epigenetic changes, which are disturbed in cancers, modulate both the structure and the function of the chromatin.

  • Microbiome: the genetic material of all bacteria, fungi, protozoa, and viruses that live on and within the human body. Its role in disease susceptibility is largely unknown. Recent technologic advances in DNA sequencing and the development of metagenomics have made it feasible to analyze the entire human microbiome and to gain insight into its composition.

DNA-based whole genome high-throughput sequencing approaches for the detection of genetic variants have been used to identify differences between individuals or pathologic conditions. Typically, this approach aims at identifying single-nucleotide variants (SNVs), small insertions and deletions (indels), and SVs. SVs are diverse, ranging from approximately 50 base pairs (bp) to several megabases in size, and affect more of the genome than any other class of sequence variant. They comprise a number of subclasses of unbalanced copy number abnormalities (CNAs), which include deletions, duplications, and insertions, as well as balanced rearrangements, such as inversions and interchromosomal and intrachromosomal translocations. In addition, SVs include mobile element insertions, multiallelic copy number variants of highly variable copy number, segmental duplications, and complex rearrangements that consist of multiple combinations of these events.
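The size-based distinction among SNVs, indels, and SVs can be sketched in a few lines of Python. This is only an illustration: the function name, the reference/alternate allele representation, and the use of a hard 50-bp cutoff are assumptions made for the example, and balanced rearrangements (which change no sequence length) would need breakpoint information that this sketch does not model.

```python
def classify_variant(ref: str, alt: str, sv_min_bp: int = 50) -> str:
    """Classify a variant from its reference and alternate alleles.

    sv_min_bp is the approximate lower size bound for structural variants
    (about 50 bp in the text); treating it as a hard cutoff is an
    illustrative simplification.
    """
    size_change = abs(len(alt) - len(ref))
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNV"      # single-nucleotide variant
    if size_change >= sv_min_bp:
        return "SV"       # large unbalanced insertion or deletion
    if size_change > 0:
        return "indel"    # small insertion or deletion
    return "other"        # e.g., multinucleotide substitution or no change


if __name__ == "__main__":
    print(classify_variant("A", "G"))             # single-base substitution
    print(classify_variant("A", "ATG"))           # 2-bp insertion
    print(classify_variant("A", "A" + "T" * 60))  # 60-bp insertion
```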

Genome-wide analysis of gene expression, also referred to as transcriptomics, is the study of transcription at the genomic scale. These analyses use RNA as the starting material and rely on data from microarrays or high-throughput sequencing. The results can be used to determine the range of genes expressed and their isoforms within a particular cell or tissue type, for a disease, or associated with a clinical phenotype such as risk status.

The epigenome refers to the changes made to the DNA that govern its function and include methylation and acetylation of DNA, histones, nonhistone chromatin proteins, and nuclear RNA. The tools for studying epigenetic phenomena focus on the global analysis of the epigenetic status of cells and tissues. With improvements in epigenomic profiling, new opportunities are available to understand normal epigenomes and their perturbations in cancer.

The Importance of Sample Quality

The acquisition of the appropriate samples for a genomic analysis is one of the most crucial steps for the generation of an accurate result. This is particularly true for gene expression analysis based on samples of RNA. Gene expression is a dynamic process that can be affected by cellular manipulation, RNA abundance and stability, isolation methodology, and the time between when the sample was obtained and when the RNA was isolated. The highest-quality RNA is obtained if, as soon as possible after harvesting a sample, cells are dissolved in a solution that inactivates RNase enzymes. It is also possible to measure gene expression from stored tissue such as formalin-fixed, paraffin-embedded (FFPE) tissues, but the variability in quality of these data makes routine interpretation difficult. The cellular makeup of samples for RNA analysis (e.g., tumor cells, normal cells, stromal cells, and immune cells) also matters when tumor-specific expression patterns are of interest; in that case, methods for cell separation are required. This requirement may be less of an issue in tissues such as bone marrow samples taken from patients with acute myeloid leukemia (AML), where the number of blast cells is high; however, if the percentage of blast cells is low, a selection approach is required. Methods used for this purpose include flow cytometry, immunomagnetic bead sorting, and laser-capture microdissection. A good example of where cell selection is important is in multiple myeloma, where CD138 selection is crucial for gene expression analysis and is mandated in guidelines for interphase fluorescence in situ hybridization (iFISH) analysis.

At the DNA level, the admixture of nonmalignant cells within a tumor may obscure the presence of mutations in the tumor cells, especially if the tumor cells constitute a minority population. To be sure of detecting mutations in such a tumor cell population, a greater depth of sequencing is required. It is therefore critical to have a rough estimate of the purity of the sample so that the appropriate genomic approach can be used. Furthermore, the clonal fraction of tumor cells carrying a specific mutation is important if it is considered to be “actionable.” In this respect, knowing the subclonal percentage is important not only for detecting the mutation but also to be sure that a therapeutic benefit could be expected.
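The relationship between sample purity, clonal fraction, and sequencing depth can be made concrete with a simple binomial model. The sketch below is illustrative only and rests on several assumptions that are not part of the text: the mutation is heterozygous in a diploid, copy-neutral region; reads sample the two alleles independently; sequencing error is ignored; and a variant is counted as "detected" when at least `min_alt_reads` reads carry it (a hypothetical threshold).

```python
from math import comb


def expected_vaf(purity: float, clonal_fraction: float = 1.0) -> float:
    """Expected variant allele fraction for a heterozygous mutation in a
    diploid region (assumption): half of the alleles in mutated cells."""
    return purity * clonal_fraction / 2.0


def detection_probability(depth: int, vaf: float, min_alt_reads: int = 5) -> float:
    """Probability of observing at least min_alt_reads variant reads at a
    locus covered by `depth` reads, under a binomial sampling model."""
    p_too_few = sum(
        comb(depth, i) * vaf**i * (1.0 - vaf) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - p_too_few


if __name__ == "__main__":
    vaf = expected_vaf(purity=0.2)  # 20% tumor cells, fully clonal mutation
    for depth in (30, 100, 500):
        print(depth, round(detection_probability(depth, vaf), 3))
```

Under these assumptions, a sample containing 20% tumor cells has an expected variant allele fraction of about 10%, and the chance of seeing five or more supporting reads rises steeply as depth increases, which is the reason low-purity or subclonal samples call for deeper sequencing.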

Analytical Considerations

It is important to distinguish approaches used for discovery from those required for a routine diagnosis. For discovery, unsupervised learning approaches are used in which samples are grouped on the basis of data obtained without regard to any prior knowledge of either the samples or the disease. Unsupervised learning methods that have been used include hierarchical clustering, principal component analysis, nonnegative matrix factorization, k-means clustering, and t-distributed stochastic neighbor embedding. Great care must be taken in the interpretation of clustering results because clusters may be caused by irrelevant factors such as sample processing, termed batch effect. Adjustments and normalization processes may be required to account for batch effects to minimize the effect of sample processing and to ensure comparability. Unsupervised learning approaches have been used to cluster leukemia, lymphoma, or myeloma, based on their gene expression profiles, with the goal of uncovering the most robust classification schemes. In myeloma, this approach led to the description of the Translocation Cyclin D classification, which described the major molecular subtypes of disease.
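As a rough illustration of what a batch adjustment does, the sketch below subtracts each batch's per-gene mean from the samples in that batch before any clustering is attempted. This per-batch mean centering is a deliberately simple stand-in chosen for the example, not the method used in the studies described here; the function name and data layout are hypothetical, and the approach assumes that batches are not confounded with the biology of interest (otherwise it would remove real signal).

```python
from collections import defaultdict


def center_by_batch(expression, batches):
    """Subtract the per-gene batch mean from every sample.

    expression: list of samples, each a list of expression values (one per gene)
    batches:    list of batch labels, one per sample
    Returns a new list of samples with batch means removed.
    """
    n_genes = len(expression[0])
    sums = defaultdict(lambda: [0.0] * n_genes)
    counts = defaultdict(int)

    # Accumulate per-batch, per-gene totals.
    for sample, batch in zip(expression, batches):
        counts[batch] += 1
        for gene_index, value in enumerate(sample):
            sums[batch][gene_index] += value

    means = {b: [s / counts[b] for s in sums[b]] for b in sums}

    # Center each sample on the mean of its own batch.
    return [
        [value - means[batch][g] for g, value in enumerate(sample)]
        for sample, batch in zip(expression, batches)
    ]


if __name__ == "__main__":
    # Batch "b" carries a constant offset of +10 relative to batch "a".
    data = [[1.0, 2.0], [3.0, 4.0], [11.0, 12.0], [13.0, 14.0]]
    print(center_by_batch(data, ["a", "a", "b", "b"]))
```

In the toy data, the two batches differ only by a constant offset, so the centered profiles become identical across batches and a clustering step would no longer separate samples by processing run.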

In contrast to unsupervised approaches, supervised learning approaches are best suited for comparing data between classes of samples that can be distinguished by a known property, such as biologic subtype or clinical outcome. For example, to determine the gene expression differences between different leukemia subtypes with distinct genetic abnormalities, one would use a supervised approach.
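A supervised comparison of this kind can be sketched by scoring each gene for the difference between two known classes. The example below uses a Welch t-statistic as one reasonable choice of score; the choice of statistic, the function names, and the gene and class labels are assumptions made for illustration. A real analysis would also need multiple-testing correction and an independent validation set.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch t-statistic for two groups (each needs at least two values)."""
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    difference = mean(group_a) - mean(group_b)
    if standard_error == 0.0:
        # Degenerate case: no within-group variation in either class.
        return 0.0 if difference == 0.0 else float("inf") * difference
    return difference / standard_error


def rank_genes(expression_by_gene, labels, class_a, class_b):
    """Rank genes by absolute t-statistic between two labeled classes."""
    scores = {}
    for gene, values in expression_by_gene.items():
        a = [v for v, label in zip(values, labels) if label == class_a]
        b = [v for v, label in zip(values, labels) if label == class_b]
        scores[gene] = welch_t(a, b)
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)


if __name__ == "__main__":
    # Hypothetical expression values for six samples from two subtypes.
    expression = {
        "GENE_X": [1.0, 1.2, 0.9, 5.0, 5.3, 4.8],
        "GENE_Y": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
    }
    labels = ["subtype_1"] * 3 + ["subtype_2"] * 3
    print(rank_genes(expression, labels, "subtype_1", "subtype_2"))
```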

For routine diagnostic testing the optimum analysis approach is to have a clear idea of the information required and to report the data in a yes/no binary fashion to ensure clinical interpretability.

Next-Generation Sequencing Technology

Next-generation sequencing (NGS) has transformed the field of molecular diagnostics, leading to its routine uptake in clinical care. A single lane on a modern sequencer generates a huge amount of data, which allows many samples to be multiplexed and tested together.

RNA Profiling

In the late 1990s, profiling was done using an array format in which sequence-specific probes for mRNAs were immobilized onto a solid surface. These probes were hybridized to RNA from a sample of interest that had been labeled with a fluorescent tag, and the resulting signal on the array was captured by a laser-scanning device. More recently, however, sequencing-based approaches (RNA sequencing [RNA-seq]) have come to dominate because they allow for the profiling of previously unknown genes, alternative splice forms of known mRNAs, and gene fusions.

Expression profiling of FFPE tissues deserves consideration because formalin fixation causes the degradation of mRNAs into fragments of only approximately 80 nucleotides in length. Array-based profiling approaches do not work well on such material, particularly those that involve labeling of the mRNAs by priming of the 3′ polyadenylation tail. However, approaches have been developed that allow even degraded mRNAs from FFPE-derived tissues to be profiled. Although it is likely that any method applied to FFPE samples will yield “noisier” data than frozen samples, the ability to analyze archived material, particularly those samples with long-term clinical outcome data, is sometimes invaluable. An example of this type of utility is the Lymphoma/Leukemia Molecular Profiling Project (LLMPP) Lymph2Cx assay, which is a parsimonious digital gene expression–based test (NanoString) for cell of origin (COO) assignment in FFPE tissue routinely produced from standard diagnostic processes. NanoString’s nCounter technology is a variation on the DNA microarray and uses molecular barcodes and microscopic imaging to detect and count up to several hundred unique transcripts in one hybridization reaction.

Noncoding RNA

Two major classes of noncoding RNAs are short RNAs, known as microRNAs (miRNAs), and large intergenic noncoding RNAs (lincRNAs). miRNAs are small (approximately 22-nucleotide) RNAs that do not encode proteins but bind to mRNA transcripts to regulate translation and mRNA stability. Several hundred miRNAs are thought to exist in the human genome. In mammalian cells, a role for miRNAs has been recognized in the regulation of cellular differentiation through regulation of translation of key proteins. Not only are many miRNAs differentially expressed across hematopoietic lineages, but several miRNAs have also been demonstrated to play key functional roles in hematopoietic lineage specification and differentiation. The expression and/or function of several miRNAs is altered by chromosomal translocations, deletions, or mutations in leukemia and other hematopoietic diseases. In addition, members of the protein complex that processes miRNAs from longer RNA forms into their mature form, including the protein DICER, have been implicated in malignancy. Long noncoding RNAs (lncRNAs) are approximately 1000 nucleotides in length and number approximately 5000 in the human genome. Recent evidence suggests that they may play important roles in establishing and maintaining cell fate and may play key roles in regulation of the epigenome. Interestingly, lncRNAs appear to have exquisite tissue-specific patterns of expression, suggesting that they may have diagnostic potential.

Mutation Detection Strategies

NGS can yield near-perfect fidelity for the detection of a mutation at a specific site, even though the error rate for any given sequencing read can be as high as 1%. This apparent paradox reflects the fact that most sequencing errors are idiosyncratic: when the same region is sequenced multiple times and a consensus is derived, such errors are lost. Thus, for normal diploid genomes, sequencing is typically done to at least 30-fold coverage, meaning 30 reads for any given locus (referred to as 30× coverage). The effective coverage is influenced by copy number changes (e.g., aneuploidy or regions of gene deletion or amplification) and by admixture of normal cells within the tumor sample. To compensate for these copy number variations and normal cell contamination, typical cancer sequencing projects aim for a depth of coverage of at least 100×.
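The way a consensus suppresses idiosyncratic read errors can be illustrated with a small sketch. It makes two simplifying assumptions that are not in the text: errors occur independently in each read, and an erroneous read is equally likely to show any of the three incorrect bases. Systematic, context-dependent errors violate these assumptions and are not removed by resequencing alone. The function names are hypothetical.

```python
from collections import Counter
from math import comb


def consensus_base(bases):
    """Return the most frequent base at a locus and the fraction of reads
    supporting it."""
    counts = Counter(bases)
    base, n_supporting = counts.most_common(1)[0]
    return base, n_supporting / len(bases)


def prob_recurrent_error(depth, min_reads, per_read_error=0.01):
    """Probability that at least min_reads of `depth` reads show the same
    specific wrong base, assuming independent errors split evenly across
    the three incorrect bases (illustrative assumption)."""
    p = per_read_error / 3.0
    return sum(
        comb(depth, i) * p**i * (1.0 - p) ** (depth - i)
        for i in range(min_reads, depth + 1)
    )


if __name__ == "__main__":
    # Ten reads at one locus; a single read carries an erroneous G.
    print(consensus_base(list("AAAAAAAAAG")))
    # Chance that the same wrong base recurs in 1, 3, or 5 of 30 reads.
    for k in (1, 3, 5):
        print(k, prob_recurrent_error(30, k))
```

Under these assumptions, a single discordant read at 30× coverage is not unusual, but the same wrong base recurring in many reads is very unlikely, which is why a consensus across reads can be far more accurate than any individual read.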

The US Food and Drug Administration (FDA) has issued guidelines for test design, performance characteristics, run quality metrics, performance evaluation, variant annotation, and filtering. The guidance identifies six key aspects of test design: the indications for use statement, user needs for the tests, specimen type, the region of the genome being interrogated, performance needs, and components and methods. The guidance further identifies four key aspects of test performance: accuracy, precision, limit of detection, and analytical specificity. They also specify six test run quality metrics: coverage, specimen quality, DNA quality and processing, sequence generation base calling, mapping or assembly metrics, and variant calling metrics. Several minimum standards for test performance and quality metrics are suggested, including the following:

  • a point estimate of 99.9% accuracy (e.g., positive percent agreement, negative percent agreement, and technical positive predictive value) with a lower bound of the 95% confidence interval of 99.0% for all variant types reported;

  • reproducibility and repeatability of at least 95% for the lower bound of the 95% confidence interval;

  • a minimum coverage (i.e., depth and completeness threshold) of 20× for targeted panels and 30× average coverage depth at 100% of bases targeted in the panel or 97% of bases for whole exome sequencing;

  • base calling with a base quality score of at least 30.
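Two of these thresholds lend themselves to a short worked example. A Phred base quality score Q corresponds to an error probability of 10^(−Q/10), so Q30 means roughly one error in 1000 base calls. The accuracy criterion combines a point estimate with a confidence-interval lower bound, so the size of the validation set matters as well as the observed concordance. The sketch below uses the Wilson score interval to compute the lower bound; the guidance as summarized here does not specify an interval method, so that choice, along with the function names, is an assumption made for illustration.

```python
from math import sqrt


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def wilson_lower_bound(successes: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the two-sided 95% Wilson score interval (z = 1.96)."""
    p = successes / total
    denominator = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denominator


def meets_accuracy_standard(concordant, total, point_min=0.999, lower_min=0.990):
    """Check a 99.9% point estimate and a 99.0% lower confidence bound."""
    point_estimate = concordant / total
    return (
        point_estimate >= point_min
        and wilson_lower_bound(concordant, total) >= lower_min
    )


if __name__ == "__main__":
    print(phred_to_error_probability(30))
    # Perfect concordance on 100 variants versus 1000 variants.
    print(meets_accuracy_standard(100, 100))
    print(meets_accuracy_standard(1000, 1000))
```

With this interval, perfect concordance on 100 variants does not reach a 99.0% lower bound, whereas perfect concordance on 1000 variants does.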
