The understanding of the biologic basis of hematopoietic diseases in general and neoplasia in particular has been significantly enhanced by the application of molecular techniques to the study of these diseases. The use of Southern blotting hybridization analysis initiated the integration of molecular biologic techniques into hematopathology and has substantially contributed to understanding the clonality status of lymphoproliferative disorders. The availability of molecular probes to the antigen receptor loci facilitated identification and molecular cloning of partner genes involved in chromosomal translocations underlying the pathogenesis of several lymphoid neoplasms. The advent of the polymerase chain reaction (PCR) had a dramatic impact on the ability to interrogate primary tissue samples for molecular aberrations, including refinement of the understanding of clonality in lymphoproliferative disorders. The versatility of PCR resulted in adaptations including rapid amplification of cDNA ends from both 5′ and 3′ ends for the identification of genes involved in chimeric fusions driving hematopoietic neoplasia. The implementation of PCR–cycle sequencing profoundly enhanced the ability to identify somatic point mutations in a variety of neoplasms and tracking of clonal genetic aberrations in tumor progression. The recent advent of massively parallel next-generation sequencing has offered the most detailed view of genetic aberrations in cancers. In this regard, an acute myeloid leukemia (AML) genome was the first cancer genome to be sequenced. This development has led to the refinement of our understanding of the genetic basis of AML and recognition of prognostically relevant subgroups. Ever since this study, several other genomes of hematopoietic neoplasms have been sequenced. Clearly, this technology will continue to reveal further insights and is increasingly being implemented in routine clinical diagnostics.
The successful implementation of molecular techniques depends on the reliable and robust extraction of nucleic acids. The protocols implemented depend on the specimen type and quantity as well as on the quality and amount of nucleic acid required for the assay. PCR now permits the analysis of a wide variety of specimens, including fresh whole blood, plasma, serum, fine-needle aspirates, tissue biopsies, cultured cells, cerebrospinal fluid, and fixed paraffin-embedded tissues. Microdissected cells or ethanol-fixed cells scraped from cytologic slide preparations may have DNA extracted that is readily interrogated by molecular techniques. Nucleic acid extraction protocols can be performed as either solution- or solid-phase extraction-based approaches with effective isolation of DNA or RNA for downstream applications.
B and T cells exhibit the unique characteristic of undergoing somatic DNA rearrangements of their antigen receptor gene to produce functional immunoglobulin and T-cell receptor (TCR) molecules, respectively. The rearrangements provide a critical mechanism for generation of a significant component of the diversity of the immunoglobulin and TCR genes involved in the specificity of the immune response (reviewed in reference ).
The immunoglobulin genes encode immunoglobulins that are produced exclusively by B cells. Immunoglobulin molecules are heterodimeric proteins consisting of two identical heavy chains linked with two identical light chains, either kappa (κ) or lambda (λ). The immunoglobulin genes are located at different chromosomal loci: the immunoglobulin heavy chain (IGH) gene at 14q32, the kappa light chain (IGK) gene at 2p12, and the lambda light chain (IGL) gene at 22q11.
In the germline configuration, the antigen receptor genes are composed of non-contiguous segments of DNA grouped into variable (V), diversity (D), joining (J), and constant (C) regions. All three immunoglobulin genes contain V, J, and C regions, but only IGH genes contain D regions ( Fig. 6-1, A ). The IGH region contains about 45 functional V region segments and about 23 DH and 6 JH segments. The human immunoglobulin constant region contains 11 C region segments that define nine functional immunoglobulin classes and subclasses (IgM, IgD, IgG1, IgG2, IgG3, IgG4, IgA1, IgA2, and IgE). Early in B-cell development in the bone marrow, genetic recombination events occurring at the DNA level result in the initial joining of a single D segment with a J segment, followed by rearrangement of the partially rearranged D-J region to a V segment ( Fig. 6-1, B ). These events are mediated by the recombination activating gene (RAG1/RAG2) complex. The fused V-D-J region is transcribed and joined to the Cµ (IgM) constant region segment at the RNA level. Successful rearrangement of one of the IGH loci is followed by light chain gene rearrangement, which entails direct joining of the V to J region segments because the light chain genes lack D segments. IGK rearrangement typically precedes IGL rearrangement. Further combinatorial diversity is generated by addition of non-germline palindromic (P) nucleotides through non-homologous end joining and incorporation of non-templated (N) nucleotides by the enzymatic activity of DNA terminal deoxynucleotidyl transferase (TdT).
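The contribution of segment choice alone to IGH diversity can be estimated with simple arithmetic. The sketch below uses only the approximate segment counts quoted above and deliberately ignores junctional (P and N nucleotide) diversity and light chain pairing, so it is a lower bound for illustration rather than a measured repertoire size. The function name is hypothetical.

```python
# Approximate functional segment counts for the IGH locus, as quoted in the text.
IGH_V_SEGMENTS = 45
IGH_D_SEGMENTS = 23
IGH_J_SEGMENTS = 6


def combinatorial_joins(v: int, j: int, d: int = 1) -> int:
    """Number of distinct V-(D)-J segment combinations.

    Junctional diversity (P and N nucleotides) is ignored; loci without
    D segments (the light chains) use the default d=1.
    """
    return v * d * j


igh_combinations = combinatorial_joins(IGH_V_SEGMENTS, IGH_J_SEGMENTS, IGH_D_SEGMENTS)
print(igh_combinations)  # 45 * 23 * 6 = 6210
```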
In the germinal centers of peripheral lymphoid tissues such as lymph nodes, the IGH genes are subjected to somatic hypermutation (SHM), followed by class switch recombination, and the IGH V-D-J segment is fused to a G, A, or E constant region segment, leading to expression of the gamma (γ), alpha (α), or epsilon (ε) heavy chains, respectively, in the B cells. Both of these processes are mediated by activation-induced cytidine deaminase (reviewed in reference ). SHM refines the specificity of the antibodies, with the highest level occurring in the third complementarity-determining region (CDR3). Thus, primers for IGH gene rearrangements by PCR are designed to anneal to the framework regions where the hypermutation rate is lowest.
The TCR is a heterodimeric protein comprising either an alpha (α) and beta (β) chain or a gamma (γ) and delta (δ) chain. A mature T cell expresses either an αβ or a γδ TCR heterodimer, but not both. The TCR alpha (TRA) and delta (TRD) genes are located on chromosome 14q11.2. Indeed, the entire TRD gene is located within the TRA locus. The TRB gene encodes the TCRβ protein and is located at chromosome 7q34, and the TRG gene encodes TCRγ and is located at 7p14.
Precursor T cells migrate from the bone marrow to the thymus to undergo maturation into competent peripheral (post-thymic) T cells. The hierarchy of the TCR gene rearrangements occurring during T-cell development is such that the TRD genes are the first to undergo rearrangement, followed by the TRG genes. As a result of these rearrangements, a small proportion of T cells express γδ TCRs. The TRB genes are the next to undergo rearrangement, which is detectable at the CD4+CD8+ cortical thymocyte stage. The TRA genes are the last to rearrange, and this leads to deletion of the TRD locus, which is located within the TRA locus. Successful TRB and TRA rearrangements lead to expression of the TCRαβ protein. T-cell maturation proceeds with further thymic selection and egress of the mature T cells from the thymus to the periphery. This sequence of rearrangements has practical implications for T-cell clonality assays by PCR: most T-cell lymphomas express TCRαβ, and most of these will also have undergone rearrangements of the TRG genes.
The TCR gene rearrangement process is similar to that which occurs at the immunoglobulin loci ( Fig. 6-1, C ). Notably, however, SHM does not occur in the TCR genes and thus does not contribute to diversity of the T-cell repertoire, and class switch recombination does not occur either.
The expression of immunoglobulin light chain molecules in mature B cells provides an avenue for convenient immunophenotypic assessment of clonality status in mature B-cell populations. Thus, whereas clonality may be readily determined in mature B cells by immunophenotypic methods, determination of clonality by immunophenotyping of T cells can be technically challenging. Clonality assays are frequently used in clinical contexts to establish the monoclonal status of suspicious lymphoid proliferations. Notwithstanding their utility in this setting, it is important to recognize that monoclonality is not equivalent to malignancy, and all laboratory results should be interpreted within relevant clinicopathologic contexts. Conversely, the absence of clonality does not exclude the presence of (lymphoid) malignancy.
PCR (discussed in detail later) is the main approach for the detection of clonal lymphoid proliferations. The assay design uses consensus V region and J region primers in PCR amplification followed by electrophoresis. Because each lymphocyte harbors a unique immunoglobulin or TCR gene rearrangement, clonality analysis of polyclonal populations yields multiple products distributed over a size range within the amplicon detection limit of the PCR assay. This is visualized as a smear in agarose gels, a ladder pattern on polyacrylamide gels ( Fig. 6-1, D ), or a multipeak pattern in capillary gels that are capable of single-base resolution. Protocols for the assessment of clonality evaluating the IGH, IGK, and IGL loci have matured to routine use and standardization in clinical laboratories. Similarly, standardized T-cell clonality assays assessing the TRG, TRB, and TRD have also been successfully implemented.
Capillary electrophoresis (CE), which separates products on the basis of nucleotide number, has emerged as a reliable platform for the analysis of products from clonality PCR assays. Other options include denaturing gradient gel electrophoresis and heteroduplex analysis on polyacrylamide gels that discriminate PCR products on the basis of denaturation parameters, such as melting temperature, reflecting their nucleotide composition. Polyclonal B- or T-cell populations yield a pseudo-gaussian distribution of peaks when analyzed by CE ( Fig. 6-1, E ). Each of the discrete peaks represents many antigen receptors that yield amplicons of identical size.
Monoclonal populations are identified with CE when one or two dominant peaks substantially above that of the next highest background peak are observed. The sensitivity of clonality assays by PCR and CE reliably permits detection of neoplastic populations at a 5% level in the background of polyclonal lymphocytes.
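The peak-height criterion described above can be sketched in a few lines. The text does not give a numerical cutoff, so the ratio threshold `min_ratio` and the function name below are purely illustrative assumptions; laboratories define and validate their own interpretive criteria.

```python
def dominant_peaks(heights, min_ratio=3.0):
    """Count dominant peaks (0, 1, or 2) in a capillary electrophoresis trace.

    A group of one or two top peaks is called dominant when its smallest member
    exceeds the next highest peak by at least `min_ratio`.
    NOTE: min_ratio=3.0 is a hypothetical value chosen only for illustration.
    """
    ranked = sorted(heights, reverse=True)
    for n_dominant in (1, 2):
        if len(ranked) > n_dominant and ranked[n_dominant] > 0:
            if ranked[n_dominant - 1] / ranked[n_dominant] >= min_ratio:
                return n_dominant
    return 0


polyclonal = [100, 180, 260, 300, 250, 170, 90]  # pseudo-gaussian peak heights
monoclonal = [120, 150, 2400, 160, 110]          # one peak far above background
biallelic = [100, 1800, 130, 1500, 90]           # two peaks far above background

print(dominant_peaks(polyclonal), dominant_peaks(monoclonal), dominant_peaks(biallelic))
```

A result of 1 or 2 would prompt interpretation as a possible monoclonal population; as noted earlier, such a result still has to be read in its clinicopathologic context.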
The PCR is primer-directed in vitro amplification of nucleic acid using a thermostable polymerase and thermal cycling to generate exponential copies from a DNA template. In most protocols, the PCR requires template DNA, thermostable DNA polymerase, oligonucleotide primers that are designed to be complementary to the target sequence, deoxynucleoside triphosphates (dNTPs) for each base (dATP, dTTP, dGTP, dCTP), and Mg2+. For amplification of RNA, a cDNA synthesis step precedes the PCR amplification. The amplification reaction entails multiple cycles of denaturation, primer annealing, and extension. For conventional PCR, postamplification analysis involves electrophoresis of the products generated from the amplification reaction. Authentication of the product generated is based on visualization of bands of expected size for the amplicon. An alternative and popular quantitative format is real-time PCR, wherein homogeneous assays are performed with amplification product synthesis and analysis simultaneously occurring in a closed-tube format. A perfectly efficient PCR assay (i.e., doubling of DNA copy number every cycle) in one 30-cycle reaction yields approximately 10^9 (or more accurately 2^30 = 1,073,741,824) copies of product. This amplification capacity renders PCR extremely sensitive and well suited for the molecular diagnosis and monitoring of hematopoietic malignant neoplasms that carry characteristic genetic aberrations, such as translocations. Accordingly, PCR is more sensitive than conventional cytogenetics or fluorescence in situ hybridization (FISH) analysis and can detect 1 neoplastic cell in a background of 1000 normal cells when interrogating antigen receptor gene rearrangements or 1 copy of mutant DNA in a background of approximately 10^5 wild-type DNA sequences.
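The amplification arithmetic quoted above can be checked with a short calculation. The sketch below assumes the usual idealized model in which each cycle multiplies the copy number by (1 + efficiency); the function name and the efficiency parameter are illustrative and not taken from the text.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Expected copy number after a given number of cycles.

    efficiency is the fraction of templates duplicated per cycle
    (1.0 = perfect doubling); plateau effects are ignored.
    """
    return initial_copies * (1.0 + efficiency) ** cycles


# One template molecule, 30 cycles, perfect doubling.
print(int(pcr_copies(1, 30)))  # 2**30 = 1073741824

# The same reaction at a hypothetical 90% per-cycle efficiency yields fewer copies.
print(pcr_copies(1, 30, efficiency=0.9) < pcr_copies(1, 30))  # True
```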
Real-time PCR is used to describe a technique wherein in vitro nucleic acid synthesis is monitored during amplification rather than at the end point. The technique incorporates fluorescent reporters into the amplification reaction and is monitored by use of thermal cyclers integrated with devices configured to monitor fluorescence. Fluorescence monitoring during the amplification reaction permits identification and quantification of the PCR product. Because amplification and detection occur simultaneously in the same tube, real-time PCR is advantageous in that the process is rapid and less subject to the risk of contamination arising from liberation of amplicons from opening tubes before electrophoresis. Further, the ability to perform accurate relative and absolute quantification has favored the use of this approach in many applications in the clinical laboratory.
The fluorescent reporters used in real-time PCR assays are broadly classified into two major categories: non-specific nucleic acid binding dye-based and specific probe-based methods. The profile of real-time PCR resembles a sigmoid (logistic) curve, wherein there is an initial lag phase followed by a log-linear or exponential phase and finally a plateau phase. Efficient PCR is associated with doubling of the number of copies of the target in each cycle, and this is reflected in the log-linear phase. At a critical fractional cycle number known as the cycle threshold (CT), the exponential increase in product abundance becomes apparent as a geometric increase in fluorescence above background. Accordingly, the CT is defined as the number of cycles required for the fluorescent signal to exceed the background signal. CT values vary inversely with the starting quantity of the target nucleic acid in the sample; that is, the CT value is lower when the initial template DNA is more abundant, and lower levels of input template yield higher CT values.
Three basic fluorescence chemistries are used in real-time monitoring of amplification reactions (reviewed in reference ): double-stranded DNA binding dyes, fluorescently labeled primers, and target-specific probe-based detection.
Near-stoichiometric binding of double-stranded DNA binding dyes to double-stranded DNA renders these reporters convenient for incorporation into real-time PCR assays. Whereas ethidium bromide was the first dye used in fluorescent monitoring of amplification reactions, it has been widely supplanted by SYBR Green I, a double-stranded DNA binding dye with fluorescence characteristics resembling those of fluorescein that increases in fluorescence on binding to double-stranded DNA. SYBR Green I is favored over ethidium bromide because of its high signal-to-noise properties conferred by preferential binding to double-stranded DNA and very low levels of fluorescence of the unbound dye. Double-stranded DNA binding dye-based real-time PCR assays are simple to design because all that is required is incorporation of the dye into the reaction mix. However, because the dye binds any double-stranded product, the specificity of the assay is limited to that afforded by the primers.
Oligonucleotide primers labeled with fluorophores at the 5′ end may be used in real-time PCR assays (reviewed in reference ). In the simplest configuration, a primer can be labeled with one fluorophore at the 5′ end, and amplification results in increased synthesis of labeled template accompanied by changes in fluorescence that occur with hybridization. In another design, a primer can be labeled both with a fluorophore on the 5′ of a hairpin and a fluorescence quencher toward the 3′ end. During PCR, the primer undergoes conformational changes that result in separation of the fluorophore from the quencher, leading to increase in fluorescence during each round of extension in the amplification reaction. Use of different-colored fluorescently labeled primers offers the ability to perform multiplex assays because the different products may be monitored in different fluorescence channels.
Target-specific probes that are complementary to a sequence within the amplicon may be incorporated into PCR. The use of target-specific probes provides an additional level of specificity for detection of the authentic product. In general, three specific probe chemistries may be used in target-specific probe-based amplification reactions: hybridization probes, hydrolysis probes, and dual-mechanism probes. The target-specific probe-based mechanisms depend on fluorescence resonance energy transfer (FRET) occurring between donor and acceptor fluorophores, and fluorescence emissions from the reporter probe may be monitored as an index of amplicon synthesis during PCR.
In the hybridization probe design, two oligonucleotide probes are included in the amplification reaction. Both probes are complementary to an internal sequence within the target and hybridize to the template at adjacent positions. The 5′ probe has a donor fluorophore on its 3′ end, and the second probe carries the acceptor (reporter) fluorophore on its 5′ end (the inter-fluorophore distance is optimally ≤1 nucleotide). Excitation of the donor fluorophore with light leads to energy transfer by FRET to the acceptor fluorophore. The transferred energy results in release of light at a longer wavelength that is then detected. This approach provides high specificity of identification of the target amplicon because fluorescence is a FRET-based event requiring hybridization of both probes to the template. Hence, low background levels are observed, ensuring high signal discrimination from background noise. The hybridization probe-based formats also offer the opportunity for further verification of the identity of the product by probe melting curve analysis (see later). Despite this advantage and the exquisite specificity associated with this design, the requirement for a total of four oligonucleotides (two primers and two probes) in the amplification reaction results in a higher level of complexity in hybridization probe-based assays.
Target-specific probe-based systems may also be designed with fluorescently labeled probes configured with a donor fluorophore conjugated to the 5′ end of the probe and a quencher at the 3′ end. Because of the 5′ → 3′ exonuclease function of Taq polymerase, the probe is hydrolyzed and the donor fluorophore is separated from the influence of the quencher, leading to fluorescence. Because the target-specific probes are hydrolyzed, probe melting analysis for verification of the identity of the amplicon is not reliably performed with this probe design. However, minor-groove binders functioning as hybrid stabilizing agents can be incorporated with the probe to improve the robustness of this system. Overall, the simplicity (only three oligonucleotides in the reaction for detection of one target) and specificity provided by this design favor its use in routine clinical settings.
Several probe designs incorporate both hybridization and hydrolysis mechanisms. These include the hairpin probe-based system that incorporates a design wherein the loop portion of the hairpin is complementary to a specific target sequence and the stem sequences are a shorter segment on either end of the probe with base complementarity to one another. The 5′ end of the hairpin is labeled with a donor fluorophore and the 3′ end with a quencher. Hybridization separates the donor from the quencher and results in fluorescence. This approach is highly specific because fluorescence is based on a hybridization event to the authentic target.
Continuous fluorescence monitoring of amplification reactions yields a profile that most resembles a sigmoid (logistic) curve, with slight variation depending on the fluorescence chemistry used. Double-stranded DNA binding dyes provide an opportunity for additional verification of amplification of the desired product by product melting curve analysis. After amplification is completed, a melting protocol can be initiated and fluorescence melting curve analysis performed. Melting curve analysis determines the melting temperature (Tm) of the PCR product, which is visualized as a precipitous drop in fluorescence during progressive heating of the PCR product. A mathematical conversion of the fluorescence/temperature curve to a plot of −dF/dT versus temperature displays the Tm as a peak. The Tm is defined as the temperature at which half of the polynucleotide duplex is dissociated into single-stranded molecules and is mainly dependent on the GC content and length of the amplicon. The Tm is often distinctive for each amplicon.
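The conversion of a melting curve into a −dF/dT peak can be illustrated with a finite-difference sketch. The fluorescence values below are synthetic, generated from an assumed sigmoid with a hypothetical Tm of 85.25 °C; real instruments typically apply smoothing that is not shown here, and the function name is illustrative.

```python
import math


def melting_peak(temps, fluorescence):
    """Estimate Tm as the temperature at which -dF/dT is largest.

    The derivative is approximated by finite differences between adjacent
    readings, and the midpoint of the steepest interval is returned.
    """
    best_tm, best_slope = None, float("-inf")
    for i in range(len(temps) - 1):
        d_temp = temps[i + 1] - temps[i]
        neg_slope = -(fluorescence[i + 1] - fluorescence[i]) / d_temp
        if neg_slope > best_slope:
            best_slope = neg_slope
            best_tm = (temps[i] + temps[i + 1]) / 2.0
    return best_tm


# Synthetic melt from 70 to 95 degrees C in 0.5-degree steps (assumed data).
temps = [70.0 + 0.5 * i for i in range(51)]
signal = [1.0 / (1.0 + math.exp((t - 85.25) / 0.8)) for t in temps]

estimated_tm = melting_peak(temps, signal)
print(estimated_tm)
```

The estimate can only be as fine as the temperature step, which is one reason melting protocols use slow, finely sampled temperature ramps.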
Real-time PCR provides an analytically precise and technically robust approach for quantification of nucleic acid species in a sample. The quantitative applications of real-time PCR take advantage of its large dynamic range of more than five orders of magnitude. Quantification by real-time PCR is most often achieved by determination of the CT. The CT represents a fractional cycle number obtained by interpolation of the amplification profile of the PCR. The CT may be calculated by a variety of approaches, including the threshold analysis method, in which a baseline level of fluorescence is selected (typically from the early amplification cycles) and adjusted by arithmetic or proportional adjustment methods to represent a normalized baseline. This approach suffers the drawback of yielding less reliable results if sample fluorescence levels are low, as might occur in samples with low copy numbers of the intended target. An alternative approach that does not require such normalization is the second derivative maximum method, in which calculation of the fractional cycle number takes the shape of the amplification curve into consideration. This is advantageous in that there is no requirement for baseline corrections or normalization of fluorescence values. Regardless of the method used, well-optimized amplification reactions double template copy numbers with each cycle, and the CT is inversely related to the logarithm of the initial template concentration ( Fig. 6-2 ). Thus, a 10-fold (1 log) increase in copy number between samples is reflected in a decrease in CT of approximately 3.3 cycles (2^3.3 ≈ 10). Quantitative real-time PCR assays are routinely used for the quantification of fusion transcripts such as BCR-ABL1 and PML-RARA in clinical diagnostics.
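The inverse log-linear relationship between CT and starting quantity is what makes standard-curve quantification possible. The following is a minimal sketch, assuming a dilution series of standards and an ordinary least-squares fit; the function names and the synthetic standards are illustrative assumptions, not a description of any particular instrument's software.

```python
import math


def fit_standard_curve(log10_quantities, ct_values):
    """Least-squares fit of CT = slope * log10(quantity) + intercept."""
    n = len(ct_values)
    mean_x = sum(log10_quantities) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_quantities)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_quantities, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope):
    """Per-cycle efficiency implied by the slope (1.0 = perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


def quantity_from_ct(ct, slope, intercept):
    """Interpolate the starting quantity of an unknown from its CT."""
    return 10 ** ((ct - intercept) / slope)


# Synthetic standards for a perfectly efficient assay (assumed intercept of 40).
ideal_slope = -1.0 / math.log10(2)  # about -3.32 cycles per 10-fold change
standards_log10 = [1, 2, 3, 4, 5]
standards_ct = [40 + ideal_slope * x for x in standards_log10]

slope, intercept = fit_standard_curve(standards_log10, standards_ct)
print(round(slope, 2), round(amplification_efficiency(slope), 2))
print(round(quantity_from_ct(standards_ct[2], slope, intercept)))
```

A fitted slope noticeably shallower or steeper than about −3.3 suggests that the reaction is not doubling each cycle, in which case quantities interpolated from the curve should be treated with caution.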
Sanger sequencing is an in vitro method of DNA sequencing that uses non-extendable dideoxynucleotide incorporation by a DNA polymerase. The classical dideoxy chain termination method includes a DNA fragment of interest, a DNA primer, a DNA polymerase, and deoxynucleoside triphosphates (dATP, dGTP, dCTP, and dTTP). One of four of the dideoxynucleoside triphosphates (ddATP, ddGTP, ddCTP, or ddTTP) is added to each reaction; the other three nucleotides are the standard unmodified deoxynucleoside triphosphates. PCR cycle sequencing entailing repeated denaturation, annealing, chain extension, and termination steps is used to generate amplicon fragments of different lengths by incorporation of one of the four dideoxynucleotide base analogues ( Fig. 6-3 ). The pentose ring in the dideoxynucleotide analogues lacks the 3′ hydroxyl and the 2′ hydroxyl groups. Given that DNA chain extension requires the 3′ hydroxyl group, incorporation of such a base “terminates” further chain elongation. The fragments generated are fluorescently labeled either by fluorescently labeled primers or by fluorescently labeled dideoxynucleotide terminators. In modern sequencers, the products of cycle sequencing are resolved with denaturing polyacrylamide gels or more frequently CE. Detection is achieved by interrogation of fluorescence signals as the DNA fragments traverse the gel past a detector. When fluorescently labeled primers are used to label the amplified fragments, four tubes are required for separate termination reactions. In assay configurations wherein one color is used, each dideoxy termination reaction mixture is subjected to electrophoresis in a separate lane or capillary. Alternatively, if four fluorophores are used, the termination reactions may be combined in one tube during electrophoresis and resolved with only one capillary. 
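The logic of reading a sequence from chain-terminated fragments can be sketched as follows. The toy model assumes that every position of the synthesized strand is represented by a fragment ending in the corresponding dideoxynucleotide; the sequence and function names are hypothetical.

```python
def termination_fragments(synthesized_strand, dd_base):
    """Lengths of fragments terminated by incorporation of the given ddNTP."""
    return [i + 1 for i, base in enumerate(synthesized_strand) if base == dd_base]


def read_sequence(fragments_by_base):
    """Rebuild the sequence by ordering all terminated fragments by length,
    mimicking size separation by electrophoresis."""
    calls = sorted(
        (length, base)
        for base, lengths in fragments_by_base.items()
        for length in lengths
    )
    return "".join(base for _, base in calls)


synthesized = "GATTACA"  # hypothetical strand extended from the primer
fragments = {base: termination_fragments(synthesized, base) for base in "ACGT"}

print(fragments["A"])            # [2, 5, 7]
print(read_sequence(fragments))  # GATTACA
```

The four entries in `fragments` correspond either to four separate termination reactions or to four fluorophores resolved in a single capillary.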
Conventional Sanger sequencing permits routine analysis of DNA fragments of up to 800 to 1000 bases in multiwell plate assays containing 96 or 384 samples in a 2-hour analytical run. Sanger sequencing is capable of reliable detection of mutant alleles constituting 20% of the allele burden in somatic conditions (malignancy) with heterozygous mutations ( Fig. 6-3 ).
Pyrosequencing is a method that determines the sequence of short nucleic acid segments without the need for electrophoresis. Pyrosequencing is based on “sequencing by synthesis” and differs from Sanger sequencing in its detection of pyrophosphate release that accompanies nucleotide incorporation rather than chain termination effected by dideoxynucleotides. The procedure entails hybridization of a sequencing primer to a single-stranded template. The sequencing by synthesis reaction involves the enzymatic synthesis of the complementary strand to the single-stranded DNA template. The reaction includes four enzymes (a DNA polymerase, ATP sulfurylase, luciferase, and apyrase) and two substrates (adenosine 5′-phosphosulfate and luciferin). The template DNA is immobilized, and solutions of the four nucleotides are sequentially added and removed. The deoxynucleoside triphosphates dATPαS, dCTP, dGTP, and dTTP are added individually, one at a time, into the reaction. The substitution of dATP with dATPαS results in low background because dATPαS, although incorporated by the polymerase, is not a substrate of the luciferase enzyme. When a base is complementary to the corresponding position on the template, it is incorporated by the DNA polymerase, and this reaction is accompanied by the release of pyrophosphate (PPi). In this manner, the quantity of PPi produced is equimolar to the quantity of the incorporated nucleotide. The release of PPi is monitored by conversion of PPi and adenosine 5′-phosphosulfate into ATP by the ATP sulfurylase, and ATP drives the luciferase-mediated conversion of luciferin into oxyluciferin, which generates visible light. The amount of light generated by this reaction is proportional to the number of nucleotides incorporated. The apyrase enzyme continuously degrades ATP and the unincorporated dNTPs. This quenches the light from the previous reaction in preparation for the next round of dNTP incorporation.
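The relationship between nucleotide dispensation and light signal can be illustrated with a toy simulation. The sketch assumes an idealized reaction in which light is exactly proportional to the number of bases incorporated per dispensation (so a homopolymer run gives a proportionally taller signal) and a fixed cyclic dispensation order; the function names, the order, and the example sequence are assumptions made for illustration.

```python
def simulate_pyrogram(synthesized_strand, dispensation="ACGT"):
    """Return (nucleotide, signal) pairs for an idealized pyrosequencing run.

    The signal equals the number of bases incorporated when that nucleotide
    is dispensed; "A" stands for dATP-alpha-S in the actual chemistry.
    """
    if not set(synthesized_strand) <= set(dispensation):
        raise ValueError("strand contains a base that is never dispensed")
    signals = []
    pos = 0
    while pos < len(synthesized_strand):
        for nucleotide in dispensation:
            run = 0
            while pos < len(synthesized_strand) and synthesized_strand[pos] == nucleotide:
                run += 1
                pos += 1
            signals.append((nucleotide, run))
    return signals


def decode_pyrogram(signals):
    """Reconstruct the synthesized strand from the dispensation signals."""
    return "".join(nucleotide * run for nucleotide, run in signals)


signals = simulate_pyrogram("GGATC")  # hypothetical strand being synthesized
print(signals[:4])               # [('A', 0), ('C', 0), ('G', 2), ('T', 0)]
print(decode_pyrogram(signals))  # GGATC
```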
This approach is well adapted for automation and yields maximum utility for resequencing studies or analyses involving large-scale sequencing of short DNA fragments. A limitation of the method is that the read lengths routinely attainable with this technology are shorter (300 to 500 bases) than those of Sanger sequencing (800 to 1000 bases).
Next-generation sequencing (NGS) is arguably the most disruptive of technologic advances in molecular biology in the last few decades and is poised to dramatically transform the landscape of molecular diagnostics testing. A typical NGS workflow for the clinical diagnostics laboratory is represented in Figure 6-4 . In the research setting, NGS is being increasingly used for de novo genome sequencing, DNA resequencing, transcriptome and exome sequencing, and epigenomics studies that continue to reveal novel insights in constitutional genetics and the genetic basis of disease (reviewed in reference ). The terms second generation and third generation refer to massively parallel sequencing technologies and span the gamut from clonal amplification of DNA templates on solid matrixes in second-generation technologies to single-molecule, PCR-free, and cycle-free chemistries in third-generation platforms. The third-generation platforms are still maturing and are not discussed in further detail. The second-generation sequencing technologies are distinct from Sanger platforms, which are based on “first-generation” dideoxy terminator chemistry, in that a library of fragments is constructed from the DNA to be sequenced. Second-generation platforms entail emulsion PCR or bridge synthesis-mediated clonal amplification of DNA templates. All NGS protocols entail a library preparation step, sequencing, and bioinformatics analysis.
An important first step in library preparation involves DNA fragmentation by sonication, nebulization, or shearing, followed by DNA repair and end polishing. Synthetic DNA adapters are then covalently ligated to each fragment by a DNA ligase enzyme. The adapters are platform-specific universal sequences that are used for amplification of the library fragments. Newer technologies (Nextera) use in vitro transposition to generate libraries that are ready for sequencing. Amplification occurs on a solid surface, such as beads or flat microfluidic channels that contain adapter sequences complementary to those ligated to the library synthesized from the sample DNA. Because the entire spectrum of sequences in the library is now accessible for amplification by virtue of the universal priming sequences in the adapters, it is possible to amplify all of the library content in a “massively parallel” manner. Because amplification of each fragment occurs in situ on a single locus on the solid surface, the signal for each locus is distinct and can be “read” in a digital fashion. In contrast to Sanger sequencing, in which the amplification process is distinct from the electrophoretic detection analysis process, NGS instruments perform sequencing and analysis simultaneously. Massively parallel sequencing entails a sequential series of stepwise reactions that include nucleotide addition, detection, and identification of the incorporated nucleotides assembled on each fragment and a washing step that removes excess reagents, fluorescently labeled tags, or blocking moieties. Even though several million to billions of reaction foci are sequenced per run, the amplified signal is exponentially higher than that of possible background signals and enhances the ability for specific detection of the sequence at a specific site.
Whole genome sequencing (WGS) provides a comprehensive annotation of the genome of an individual or sample (reviewed in reference ). WGS provides a detailed map of the structural variations occurring in a genome, including complex and large structural aberrations such as translocations and rearrangements, copy number variations including whole chromosomal additions and losses, small insertions and deletions, and single nucleotide variations (e.g., point mutations), all within a single assay. A major consideration in the implementation of WGS is the expense associated with size, complexity associated with the analysis, and management of data generated.
Transcriptome (RNA) sequencing (RNA-Seq) is a large-scale and comprehensive analytical interrogation of the transcriptome (reviewed in reference ). RNA-Seq entails isolation of RNA (total or a subspecies such as poly(A)+ RNA), from which a library of cDNA fragments is generated. Adapters are ligated to one or both ends of the cDNA fragments, and each molecule is then sequenced, with or without amplification, in a massively parallel fashion. Short polynucleotide sequences varying in length from 30 to 400 base pairs are obtained from one end in single-end sequencing or from both ends in paired-end sequencing. Advantageously, RNA-Seq can use much lower levels of sample RNA. The reads obtained from sequencing may be aligned to a reference genome or transcripts or assembled de novo to generate a genome-level transcription map that includes the transcriptional architecture and expression levels of each gene. This application of NGS platforms has led to significant advances in the characterization and quantification of transcriptomes. Unlike gene expression arrays, RNA-Seq does not suffer from the limitation of detecting only known transcripts. RNA-Seq can also offer information on small RNAs, such as microRNAs, PIWI-interacting RNAs, and short interfering RNAs. However, larger RNA molecules need to be fragmented, and each approach carries its own intrinsic bias. Notwithstanding this issue, RNA-Seq provides the ability to simultaneously measure the expression of thousands of genes, thereby permitting the investigation of biologically relevant transcriptional programs and pathways. NGS-based transcriptomic studies also provide superior dynamic range of detection compared with that offered by microarray-based platforms.
Further, the extensive and comprehensive nature of transcriptome sequencing also permits improved understanding in transcription start site mapping, small RNA detection, characterization of alternative splicing events, and gene-fusion identification. RNA-Seq–based gene-fusion identification has been pivotal in the identification of novel gene fusions that are oncogenic drivers in many forms of human cancer. Because hybrid genes are generated from two or more physically separate genes in the genome, mapping back to the genes of origin yields gaps that can be resolved by the use of appropriate algorithms to reveal the gene-fusion events that could account for the hybrid transcript.
Accordingly, several algorithms have been developed to facilitate the identification of chimeric fusions from RNA-Seq data. These include TopHat-Fusion, ChimeraScan, and deFuse, among others. Whereas WGS can also identify gene fusions, these may be difficult to recognize because only a proportion of these will lead to generation of expressed fusion mRNA sequences. By comparison, RNA-Seq directly identifies expressed fusion genes, with much higher depth and coverage than with genome sequencing.
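The principle of detecting a fusion from discordantly mapped reads can be illustrated with a minimal Python sketch. It is not the method used by TopHat-Fusion, ChimeraScan, or deFuse; it only assumes that each paired-end read has already been assigned a gene name per mate (the function name, the input format, and the `min_support` threshold are all hypothetical) and counts pairs whose two mates fall in different genes.

```python
from collections import Counter

def candidate_fusions(read_pairs, min_support=3):
    """Count read pairs whose mates map to two different genes.

    read_pairs: iterable of (gene_of_mate1, gene_of_mate2); None means unmapped.
    min_support: hypothetical minimum number of supporting pairs to report.
    """
    support = Counter()
    for gene1, gene2 in read_pairs:
        # Skip unmapped mates and pairs that map within a single gene.
        if gene1 is None or gene2 is None or gene1 == gene2:
            continue
        # Sort the names so (A, B) and (B, A) support the same candidate.
        support[tuple(sorted((gene1, gene2)))] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}

# Toy input: five pairs span BCR and ABL1, one pair is likely noise.
pairs = ([("BCR", "ABL1")] * 4 + [("ABL1", "BCR")]
         + [("TP53", "TP53")] * 10 + [("MYC", "IGH")])
fusions = candidate_fusions(pairs)
```

Real fusion callers additionally use split reads that cross the junction, filter homologous genes and read-through transcripts, and report the breakpoint position, none of which is attempted here.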
RNA-Seq has been applied for detection of mutations, but this is subject to variations in the abundance of transcripts that may be accessible in the mutated gene compared with the entire transcriptome. Given the versatility and capacity of the technology, it can be envisaged that targeted multiplex panels focusing on multiple recurrently translocated genes that participate as fusion partners in several neoplastic conditions can be configured for clinical diagnostic settings.
Whole exome sequencing involves the massively parallel sequencing of protein-coding sequences in a genome. This method has dramatically facilitated the investigation of genetic alterations that lead to mutations in coding sequences that are associated with diseases. The human exome comprises approximately 180,000 exons and contains about 30 million base pairs, thus constituting approximately 1% of the human genome (~3 billion base pairs). Both mendelian and somatic genetic abnormalities underlying human diseases are readily identifiable by exome sequencing. Whole exome sequencing entails use of one of many capture platforms to enrich for protein-coding sequences in genomic DNA. In general, the platforms fall into one of three categories: DNA chip–based capture, DNA probe–based solution hybridization, and RNA probe–based solution hybridization. Although certain sequences (e.g., GC-rich sequences) are difficult to capture and frequently underrepresented in captured sequences, all platforms provide between 74% and 95% capture of genes within the human genome. Captured sequences are subjected to massively parallel sequencing and aligned to genomic reference sequences, and variant sequences are then annotated.
The uniformity of sequence depth over the targeted regions is relatively high, providing 90% to 95% coverage at 30× to 60× depth per nucleotide. These impressive performance characteristics notwithstanding, exome sequencing has some disadvantages, including the inability to identify alterations in 99% of the genome, including non-coding variants and other structural aberrations that would be identified by WGS.
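A statement such as "90% to 95% coverage at 30× depth" refers to the fraction of targeted bases that reach at least that read depth. A minimal Python sketch of the calculation follows; it assumes a per-base depth list has already been extracted from the alignment, and the function name and toy values are illustrative only.

```python
def breadth_of_coverage(depths, min_depth=30):
    """Fraction of targeted bases covered by at least min_depth reads."""
    if not depths:
        return 0.0
    return sum(d >= min_depth for d in depths) / len(depths)

# Toy target of 100 bases: 92 bases at 45x and 8 poorly captured bases at 10x.
toy_depths = [45] * 92 + [10] * 8
fraction_at_30x = breadth_of_coverage(toy_depths, min_depth=30)
```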
It is estimated that 85% of mutations associated with diseases occur in the coding and functional regions of the genome. Clinical exome sequencing is increasingly being implemented in the identification of variants in complex disorders wherein the disease manifestations may be reflective of a large number of genes in a pathway or in genetic syndromes and in the sequencing of cancers for qualifying patients for eligibility for targeted therapy.
Targeted sequencing with a next-generation platform may also be performed and entails either multiplex amplification strategies or capture-based approaches followed by sequencing. This approach is gaining a lot of traction in clinical laboratories because of its efficiency and low cost per base.
The role of epigenetic changes is increasingly being recognized in the pathogenesis of cancers. NGS-based methods may be used for assessing DNA methylation status, mapping transcription factor occupancy, and evaluating histone modifications. Genome-wide interrogation of DNA methylation can be performed by integrating bisulfite sequencing with NGS. Less expensive but informative alternative strategies, such as reduced representation bisulfite sequencing and targeted enrichment followed by bisulfite treatment, can also be used. Affinity enrichment-based methods with methylcytosine-specific antibodies (MeDIP-Seq) and recombinant methyl binding domains of proteins enable identification of genome regions that are modified by methylation.
Bioinformatic analysis remains a challenge and bottleneck in the interpretation of NGS data. Many of the analytical programs still require command line computer languages and can be difficult for bioinformatics non-experts to use. Nevertheless, several programs now exist to simplify NGS data analysis by provision of easy-to-use graphical interfaces.
In general, the primary data outputs from each platform typically consist of text files containing sequence reads and the quality scores for each base. Base-calling algorithms are implemented to reduce systematic errors. In general, the different sequencing platforms use a parameter such as a Phred-like score, which is related logarithmically to the probabilities of the base-calling errors. Because of a tendency for deterioration of base quality at the ends of reads, trimming protocols are implemented to improve the quality of the data. Whereas each platform typically provides an overall quality assessment, additional tools such as FastQC may be complementarily implemented.
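The logarithmic relationship is conventionally written Q = -10 log10(P), where P is the probability that the base call is wrong, so Q20 corresponds to a 1% error probability and Q30 to 0.1%. The following Python sketch shows the conversion and a simple end-trimming step. It assumes the common Phred+33 ASCII encoding of quality strings; the trimming rule and the `min_q` threshold are illustrative and do not reproduce any particular platform's protocol.

```python
import math

def phred_to_error_prob(q):
    """Convert a Phred quality score Q to a base-calling error probability."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Convert an error probability P to a Phred quality score."""
    return -10 * math.log10(p)

def decode_qualities(quality_string, offset=33):
    """Decode an ASCII quality string (Phred+33 encoding is assumed)."""
    return [ord(ch) - offset for ch in quality_string]

def trim_low_quality_ends(sequence, qualities, min_q=20):
    """Remove bases from both ends of a read while their quality is below min_q."""
    start, end = 0, len(sequence)
    while start < end and qualities[start] < min_q:
        start += 1
    while end > start and qualities[end - 1] < min_q:
        end -= 1
    return sequence[start:end], qualities[start:end]

# Toy read whose first and last bases are of poor quality.
trimmed_seq, trimmed_quals = trim_low_quality_ends("ACGTA", [2, 30, 35, 30, 5])
```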
Accurate alignment of sequence reads to reference requires implementation of algorithms such as BWA (Burrows-Wheeler Alignment), MAQ (Mapping and Assembly with Quality), Bowtie, and Novoalign. Detection of variants, in particular insertions-deletions (indels), is optimally achieved with gapped aligners, such as BWA and Novoalign. Non-gapped aligners, such as MAQ and Bowtie, do not open gaps in the alignment and are therefore poorly suited for the detection of indels. On completion of sequence alignment, SAM (Sequence Alignment/Map) files or their binary equivalent, BAM files, are generated and imported into genome browsers such as IGV (Integrative Genomics Viewer), in which they can be visualized.
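In a SAM record, gaps opened by the aligner are recorded in the CIGAR field, where `I` marks an insertion and `D` a deletion relative to the reference. The Python sketch below parses the mandatory tab-separated fields of one alignment line and lists the indel operations. The read shown is a made-up example, and the helper names are illustrative; production code would normally use an established library rather than hand parsing.

```python
import re

# One CIGAR operation is a length followed by a single operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_sam_line(line):
    """Extract a few mandatory fields from one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    return {
        "qname": fields[0],      # read name
        "flag": int(fields[1]),  # bitwise flag
        "rname": fields[2],      # reference sequence name
        "pos": int(fields[3]),   # 1-based leftmost mapping position
        "mapq": int(fields[4]),  # mapping quality
        "cigar": fields[5],      # alignment description
        "seq": fields[9],        # read sequence
    }

def indels_from_cigar(cigar):
    """Return (operation, length) for each insertion or deletion in a CIGAR string."""
    return [(op, int(length)) for length, op in CIGAR_OP.findall(cigar) if op in "ID"]

# Hypothetical read aligned with a one-base insertion.
toy_line = "read1\t0\tchr1\t100\t60\t2M1I1M\t*\t0\t0\tACGT\tIIII"
record = parse_sam_line(toy_line)
indels = indels_from_cigar(record["cigar"])
```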
The next step is to identify sequence variations from reference with specialized algorithms. These algorithms generally use Bayesian rules that compute the probability of a variant occurring at a specific position while taking into account the known polymorphism rate and sequencing errors. Variant detection is followed by annotation with gene and transcript identifiers and prediction of the functional consequences of the variants (e.g., non-synonymous changes such as missense, stop, or frameshift mutations). Once functional annotation is complete, genotypic-phenotypic associations of the individual variants can be determined by querying the published literature or consulting websites that contain information about mutations and disease association (e.g., OMIM for mendelian diseases and COSMIC for cancer).
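The Bayesian reasoning can be sketched in Python for a single position. This is a deliberately simplified model, not the algorithm of any specific variant caller: it assumes a diploid germline site, a binomial likelihood for the number of reads supporting the alternate allele, a uniform per-base `error_rate`, and a `variant_prior` standing in for the known polymorphism rate. All parameter values and the way the prior is split between genotypes are assumptions made for illustration.

```python
from math import comb

def genotype_posteriors(alt_reads, total_reads, error_rate=0.01, variant_prior=0.001):
    """Posterior probability of each genotype given the alt-supporting read count."""
    # Expected fraction of alt-supporting reads under each genotype.
    alt_fraction = {
        "hom_ref": error_rate,        # alt reads arise only from errors
        "het": 0.5,
        "hom_alt": 1.0 - error_rate,  # ref reads arise only from errors
    }
    # Prior probabilities; the 2:1 split of the variant prior is an assumption.
    prior = {
        "hom_ref": 1.0 - variant_prior,
        "het": variant_prior * 2 / 3,
        "hom_alt": variant_prior / 3,
    }
    unnormalized = {}
    for genotype, f in alt_fraction.items():
        likelihood = (comb(total_reads, alt_reads)
                      * f ** alt_reads
                      * (1 - f) ** (total_reads - alt_reads))
        unnormalized[genotype] = prior[genotype] * likelihood
    total = sum(unnormalized.values())
    return {genotype: value / total for genotype, value in unnormalized.items()}

def call_genotype(alt_reads, total_reads):
    """Return the genotype with the highest posterior probability."""
    posteriors = genotype_posteriors(alt_reads, total_reads)
    return max(posteriors, key=posteriors.get)
```

Under these assumptions a single alternate read among 30 is attributed to sequencing error, whereas 15 of 30 supports a heterozygous call. Somatic variant calling in tumors needs a different model because variant allele fractions are not restricted to 0, 0.5, and 1.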
Mass spectrometry is an analytical technique that identifies the chemical composition of substances by measuring the mass-to-charge (m/z) ratios of ionized (gas-phase) molecules derived from the substance. The principal instrument for this analysis is the mass spectrometer, which in its simplest configuration is composed of an ionization platform, a mass analyzer, and a detector. Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry has gained increasing use in modern molecular diagnostics laboratories for the detection of single nucleotide polymorphisms and somatic variants in cancers (reviewed in reference ). Mass spectrometry depends on analysis of the m/z ratio of the analyte and consequently does not require any labeling. In general, the procedure entails isolation of genomic DNA followed by amplification of a fragment of interest by PCR. Heat-labile alkaline phosphatase is added to the reaction to remove phosphates from any residual nucleotides, thereby preventing interference with primer extension. Alkaline phosphatase is heat inactivated, followed by a hybridization step involving addition of an extension primer that binds directly or adjacent to the constitutional or somatic sequence variant of interest. Incorporation of unlabeled deoxynucleotides or dideoxynucleotides results in extension through the polymorphic site on the template sequence and termination with incorporation of the appropriate dideoxynucleotide. The resulting product is spotted onto an array containing a matrix that is often derivatized from sinapinic acid. This matrix has the ability to absorb laser radiation and transfer protons (H+) to the sample of interest. The ions liberated by the laser are analyzed within a time-of-flight mass spectrometer. The m/z shift accurately identifies the variant nucleotide, and thus the sequence or genotype is determined.
Currently available automated systems greatly facilitate implementation of platforms based on this principle, such as the SEQUENOM MassARRAY, and permit the large-scale detection of nucleotide sequence variants and mutations in a variety of clinical laboratory settings.
The advent of high-throughput NGS platforms holds the promise of revolutionizing many areas of molecular pathology. These technologies allow the determination of individual immunoglobulin and TCR sequences from massive numbers of lymphocytes, and this has been used to analyze the normal immunoglobulin and TCR repertoire. Clonal lymphoid populations can be detected by identifying overrepresented immunoglobulin or TCR sequences, and given the large number of sequences that are assessed, the technologies may be applicable to detection of minimal residual disease (MRD). MRD is a major prognostic indicator in a growing number of hematopoietic neoplasms. PCR-based MRD analysis of antigen receptor genes requires the generation of patient-specific primers, rendering analysis burdensome for most molecular laboratories. By contrast, assays to detect translocations that generate chimeric transcripts (such as BCR-ABL1) are simpler to develop because generic primers suffice. Multiparameter flow cytometry offers high sensitivity and specificity but cannot be applied in every case and is somewhat less sensitive than PCR-based methods. High-throughput sequencing is appealing for this application as it may have similar sensitivity to PCR. MRD assessment by high-throughput NGS does not require the generation of patient-specific primers and could be applicable in a higher percentage of cases than flow cytometry. Numerous challenges remain before these technologies replace standard methods of clonality detection, including the definition of what constitutes a clonal population, the importance of subclones, and the significant informatics demands placed on many institutions.
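The idea of flagging overrepresented receptor sequences can be illustrated with a short Python sketch. Because, as noted above, there is no settled definition of what constitutes a clonal population, the `min_fraction` cutoff used here is purely hypothetical, as are the function name and the toy sequences; the input is assumed to be one rearranged immunoglobulin or TCR sequence per read.

```python
from collections import Counter

def dominant_clonotypes(sequences, min_fraction=0.05):
    """Return (sequence, fraction) for sequences at or above min_fraction of all reads.

    min_fraction is a hypothetical cutoff chosen only for illustration.
    """
    counts = Counter(sequences)
    total = sum(counts.values())
    return [
        (seq, n / total)
        for seq, n in counts.most_common()
        if n / total >= min_fraction
    ]

# Toy repertoire of 100 reads: one sequence dominates a polyclonal background.
toy_reads = ["CASSLGQ"] * 60 + ["CASSPTG"] * 3 + [f"BACKGROUND{i}" for i in range(37)]
clones = dominant_clonotypes(toy_reads)
```

For MRD monitoring, the same counting would be applied to follow-up samples to ask whether the sequence identified at diagnosis is still present and at what fraction.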
Molecular technologies have a variety of roles in the evaluation of lymphoid neoplasms, ranging from facilitating the diagnosis of specific entities and determining prognosis to defining targets of and responses to therapy. Lymphoid neoplasms can be broadly classified on the basis of their maturity (peripheral or mature versus precursor) and lineage (B cell, T cell, and NK cell).
Chronic lymphocytic leukemia/small lymphocytic lymphoma (CLL/SLL) is a mature leukemia/lymphoma that typically has a leukemic (blood and bone marrow involvement) presentation. The clinical course is heterogeneous; some patients have stable disease for years, whereas others have a more progressive course. In contrast to most other indolent B-cell neoplasms, chromosomal translocations are not characteristic of CLL. Rather, CLL is characterized by numerical abnormalities that are better detected by FISH than by metaphase analysis, such as deletions of 13q14 (55%), 11q22-q23 (18%), and 17p13 (7%) and trisomy 12 (16%), which facilitate risk stratification. Deletions of 13q14 are the most commonly identified abnormality and, when seen in isolation and in less than 60% of nuclei, indicate a good prognosis. CLL with trisomy 12 has a more intermediate prognosis, whereas deletions of 11q22-q23 and 17p13 (targeting ATM and TP53, respectively) indicate more aggressive behavior. Because identification of patients with 11q and 17p deletions in particular may be important therapeutically, FISH analysis of CLL is recommended before the initiation of therapy. Array comparative genomic hybridization can also be used to demonstrate recurrent numerical chromosomal abnormalities, including gains of 2p25.3 in approximately 30% and gains of 20q13.12 in 20% of CLL. Gains of 2p25.3 are associated with unmutated IGHV regions and amplification of ACP1 and MYCN.
Sequence analysis of the IGHV region is prognostic in CLL. Patients with unmutated IGHV (defined as ≥98% germline sequence homology) have a poorer outcome than those with evidence of hypermutation.
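The classification rule can be expressed as a percent-identity calculation against the closest germline IGHV sequence. The Python sketch below assumes the patient and germline sequences have already been aligned to equal length, with `-` marking gap positions that are skipped; the function names are illustrative, and only the 98% cutoff comes from the text.

```python
def percent_identity(patient_seq, germline_seq):
    """Percent of compared positions at which two pre-aligned sequences match."""
    if len(patient_seq) != len(germline_seq):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(patient_seq, germline_seq):
        if a == "-" or b == "-":
            continue  # gap positions are skipped (a simplifying assumption)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared

def ighv_status(patient_seq, germline_seq, cutoff=98.0):
    """Classify IGHV as 'unmutated' at or above the cutoff, otherwise 'mutated'."""
    identity = percent_identity(patient_seq, germline_seq)
    return "unmutated" if identity >= cutoff else "mutated"

# Toy 100-nucleotide germline segment and two hypothetical patient sequences.
germline = "ACGT" * 25
one_mismatch = "T" + germline[1:]        # 99% identity
three_mismatches = "TTT" + germline[3:]  # 97% identity
```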
Recurrent mutations of NOTCH1, SF3B1, MYD88, BIRC3 (API2), and TP53 are seen at low prevalence (~5% to 15%) in CLL. TP53 mutations are associated with fludarabine resistance and high-risk disease. Mutations of SF3B1 are more commonly observed in cases with 11q22-q23 deletions and may also be associated with fludarabine resistance, faster disease progression, and poor overall survival. BIRC3 and ATM mutations also serve as markers of poor outcome. In the near future, risk stratification may be further informed by the addition of mutational analysis for these genes that are recurrently mutated in CLL.
Hairy cell leukemia (HCL) is a low-grade mature B-cell neoplasm that harbors the BRAF V600E mutation in nearly 100% of cases. Like CLL, HCL is not associated with recurrent chromosomal translocations. HCL can usually be easily diagnosed through morphologic and immunophenotypic analysis; however, the presence of BRAF V600E essentially excludes most other lymphomas that may mimic HCL, such as HCL variant (HCLv), and documentation of its presence may be helpful in patients with disease refractory to standard treatments because BRAF inhibitors elicit responses in HCL. The fact that the mutation occurs in a single codon makes it highly amenable to testing in a molecular diagnostic laboratory. Immunohistochemistry using an antibody that recognizes the BRAF V600E mutant-specific protein can be used to identify HCL at the single-cell level in bone marrow core biopsies and clot sections. HCLv and those HCL cases that use IGHV4-34 frequently display activating mutations in MAP2K1, highlighting the role of activated RAS/RAF/MAPK signaling in these mature B-cell neoplasms. Notably, exposure to inhibitors of BRAF or MEK led to loss of the hairy cell immunophenotype (CD25, tartrate-resistant acid phosphatase, cyclin D1) as well as membrane projections. These results indicate the importance of knowing previous treatment regimens when monitoring residual disease in patients with HCL or HCLv.