Cancer genomes are characterized by the presence of a variety of alterations including base substitutions, copy-number alterations (amplifications or deletions), and structural rearrangements (translocations or chromosomal rearrangements).
Among the early methods of DNA sequencing (now known as first-generation methods), the most successful has been the Sanger sequencing or chain termination reaction method. Despite its effectiveness, accuracy, and the substantial improvements since its original description, first-generation sequencing has been limited by high cost, labor intensity, and low throughput (amount of data generated per unit of time).
Next-generation sequencing (NGS) is a broad term describing different technologies characterized by high-throughput, lower cost, and faster sequencing time compared with first-generation methods. NGS enhances the ability to comprehensively identify all alterations in the cancer genome, including mutations, copy-number alterations, and changes in gene expression, in a reasonable time frame.
NGS studies in patients with lung cancer have allowed comprehensive characterization of the molecular alterations in lung adenocarcinomas, squamous cell carcinomas, and small cell carcinomas. They have also facilitated analysis of the clonal architecture of lung cancer samples and its clinical implications.
It is possible today, with newer technologies, to utilize circulating tumor DNA isolated from peripheral blood or other body fluids of patients for genetic testing. Such testing is less invasive and is becoming increasingly popular in the clinical setting.
The advent of targeted therapies has brought about a paradigm shift in the management of lung cancer. The majority of these drugs, however, only benefit a small subset of patients whose tumors are driven by specific aberrations in cell signaling pathways.
Cancer cells demonstrate several types of genomic alterations including base substitutions, copy-number alterations (amplifications or deletions), and structural rearrangements (translocations or chromosomal rearrangements). Point mutations or single base substitutions (also known as single nucleotide variants [SNVs]) represent one of the most common types of DNA alteration. SNVs in protein-coding genes may result in a variety of effects in the resulting proteins. Synonymous mutations alter the DNA sequence of protein-coding genes in a way that the modified sequence at the mutated location still codes for the same amino acid. These mutations are therefore viewed as being “silent,” although recent data suggest that some of these mutations could have important functional consequences. By contrast, missense and nonsense mutations are associated with the substitution of one amino acid for another or premature termination of protein synthesis, respectively.
Mutations that arise from the insertion or deletion of one or more nucleotides are referred to as “Indels” (short for insertions and deletions). These mutations can result in frameshift mutations that alter the reading frame of a protein-coding gene. The reading frame of a coding sequence refers to groups of three bases (or codons) in the sequence of a gene, each of which codes for a specific amino acid. When the number of nucleotides inserted or deleted from a coding sequence is not a multiple of three, the reading frame of the coding sequence downstream of the mutation is shifted, resulting in missense or nonsense alterations and the production of an abnormal or nonfunctional protein.
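The distinction between synonymous, missense, nonsense, and frameshift changes can be made concrete with a short sketch. The example below is illustrative only: the function names and the toy coding sequence are assumptions made for this example, and the codon table is the standard genetic code written in the conventional T, C, A, G ordering.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence codon by codon, stopping at the first stop."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def classify_snv(cds, pos, alt):
    """Classify a single-base substitution at 0-based position `pos`.

    Simplification: a change that removes an existing stop codon is
    reported as "missense" here.
    """
    codon_start = pos - pos % 3
    offset = pos % 3
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    return "missense"


def classify_indel(length_change):
    """An indel preserves the reading frame only if its size is a multiple of 3."""
    return "in-frame" if length_change % 3 == 0 else "frameshift"


# Toy coding sequence (hypothetical): ATG GAA GGA TGC TAA -> M E G C stop
cds = "ATGGAAGGATGCTAA"
print(classify_snv(cds, 5, "G"))    # GAA -> GAG, both Glu
print(classify_snv(cds, 4, "T"))    # GAA -> GTA, Glu -> Val
print(classify_snv(cds, 11, "A"))   # TGC -> TGA, Cys -> stop
print(classify_indel(-1))           # single-base deletion

# Deleting one base shifts every downstream codon.
shifted = cds[:3] + cds[4:]
print(translate(cds), translate(shifted))
```

In the last line, the single-base deletion changes every codon after the start codon, which is the downstream effect of a frameshift described above.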
The processing of precursor messenger RNA (mRNA) into mature form occurs through removal of introns and joining of exons in a process termed “splicing.” This process is regulated in cells through proteins that constitute a cell’s splicing machinery. These proteins distinguish introns from exons based on characteristic base sequences within the intron, within the exon, and at intron–exon junctions. Splicing mutations alter these specific sites and deregulate splicing, leading to the abnormal inclusion or exclusion of introns or exons from the final mRNA. This can result in the production of aberrant and nonfunctional proteins.
Copy-number alterations are changes in gene number from the two copies present in the normal diploid genome. Rearrangements occur when DNA from one segment is broken and rejoined to a DNA segment from elsewhere in the genome. Rearrangements occurring within the same chromosome or involving regions on different chromosomes are referred to as intrachromosomal or interchromosomal translocations, respectively.
Somatic mutations in cancer cells are identified by comparing the DNA sequence of cancer cells with that of noncancerous “normal” cells acquired from the same individual. Although these somatic mutations occur randomly throughout the genome of a cancer cell, a subset of somatic mutations occurs in a key set of genes that confer growth advantage to the cells harboring them. These “driver” mutations are positively selected during cancer evolution and implicated in oncogenesis. One of the important objectives of cancer genomic studies is to distinguish these driver mutations from bystander “passenger” mutations that do not confer a survival advantage, in an unbiased fashion. This process entails the use of complex statistical algorithms. Apart from offering an insight into the biology underlying malignant transformation, such analyses also facilitate the identification of novel therapeutic targets.
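The tumor-versus-normal comparison described above can be sketched, in its simplest form, as a set difference. This is a deliberately naive illustration: real somatic callers rely on statistical models of sequencing error, coverage, and tumor purity. The function name, the variant tuple layout, and the coordinates are all hypothetical.

```python
def somatic_variants(tumor_calls, normal_calls):
    """Return variants found in the tumor but absent from the matched normal.

    Each variant is a (chromosome, position, ref_base, alt_base) tuple.
    """
    return sorted(set(tumor_calls) - set(normal_calls))


# Made-up calls for illustration only.
normal = [("chr1", 100, "A", "G"), ("chr2", 250, "C", "T")]
tumor = [("chr1", 100, "A", "G"), ("chr2", 250, "C", "T"), ("chr7", 500, "T", "G")]

print(somatic_variants(tumor, normal))
```

Variants shared by both samples are treated as inherited (germline) and dropped. Only the variant private to the tumor remains as a candidate somatic mutation, which would then still need to be assessed as a driver or a passenger.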
Among the early methods of DNA sequencing (now known as first-generation methods), the most successful has been the Sanger sequencing or chain termination reaction method. When a dideoxynucleotide triphosphate (ddNTP) is incorporated into a growing DNA strand instead of a deoxynucleotide triphosphate (dNTP), its lack of a 3′-hydroxyl group, which is required for the formation of a phosphodiester bond between two nucleotides, prevents DNA polymerase I from elongating the strand any further. This chain termination forms the basis of Sanger sequencing. The first step in Sanger sequencing is the preparation of identical single-stranded DNA molecules with a short oligonucleotide annealed to each molecule. This short oligonucleotide primes the synthesis of DNA that is complementary to the single-stranded DNA (template) molecules. The DNA template and the primer are incubated with DNA polymerase in four separate reactions, each containing a mixture of the four dNTPs and a small amount of one of the four ddNTPs labeled with radioactive 32P. Although DNA polymerase does not discriminate between dNTPs and ddNTPs, the considerably larger amount of dNTPs compared with ddNTPs allows the incorporation of several hundred nucleotides before a ddNTP is randomly incorporated into the nascent DNA. Because each reaction is performed with one type of ddNTP, the result is a group of nascent DNA molecules of different lengths, each ending in that ddNTP. Each of the four reaction mixtures is loaded into one of four parallel wells of a polyacrylamide slab gel, and the molecules are separated according to their molecular mass so that the DNA sequence can be deduced from the bands visualized by autoradiography. Because of its relative ease and reliability compared with the other technologies, the Sanger method became the method of choice for DNA sequencing.
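The logic of reading a sequence from four chain-termination reactions can be illustrated with an idealized sketch. It assumes that every possible termination product is present and perfectly resolved, and it ignores the primer length. The function names and the example sequence are hypothetical.

```python
def chain_termination_fragments(synthesized, dd_base):
    """Lengths of all fragments that end in the given ddNTP.

    In the reaction containing ddATP, for example, synthesis can stop at
    any position where the newly synthesized strand carries an A.
    """
    return [i + 1 for i, base in enumerate(synthesized) if base == dd_base]


def read_gel(lanes):
    """Read the sequence from the shortest to the longest band across all lanes."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


synthesized = "GATTACA"  # strand complementary to the template (toy example)
lanes = {base: chain_termination_fragments(synthesized, base) for base in "ACGT"}

print(lanes)
print(read_gel(lanes))
```

Each lane holds only the fragment lengths that end in its ddNTP, so ordering all bands by size across the four lanes recovers the synthesized sequence one base at a time.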
Advances in fluorescent technology allowed the tagging of either the primer or the terminating ddNTP with a specific fluorescent dye and the development of automated sequencing. Four-color fluorescent dyes eventually replaced the radioactive labels and allowed the separation of molecules by capillary electrophoresis, which in turn replaced the slab gel method. Because each ddNTP carries a distinct dye, all four termination reactions can be performed in a single tube and resolved in a single capillary.
Despite its effectiveness, high accuracy, and substantial improvements since its original description, first-generation sequencing has been limited by high cost, labor intensity, and long run times resulting from low throughput (defined as the amount of data generated per unit of time). Using modern techniques, the automated chain-termination method can run up to 96 sequencing reactions simultaneously. With each reaction capable of generating approximately 500 bases of sequence, the 96 reactions may produce, at most, approximately 48 kilobases (kb) every 2 hours. Although this technology was very useful for sequencing lower organisms, it is not particularly suitable for sequencing the human genome, which is approximately 3 billion base pairs (bp) long.
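The throughput figures above lead to a simple back-of-the-envelope estimate. The sketch below uses only the numbers given in the text and assumes a single instrument running continuously, with every base of the genome read exactly once (no redundancy and no failed runs), so it is a lower bound and not a realistic project timeline.

```python
import math

READ_LENGTH_BASES = 500          # approximate bases per reaction
REACTIONS_PER_RUN = 96           # simultaneous reactions
RUN_HOURS = 2                    # duration of one run
GENOME_BASES = 3_000_000_000     # approximate size of the human genome

bases_per_run = READ_LENGTH_BASES * REACTIONS_PER_RUN
runs_needed = math.ceil(GENOME_BASES / bases_per_run)
hours = runs_needed * RUN_HOURS
years = hours / (24 * 365)

print(f"{bases_per_run:,} bases per run")
print(f"{runs_needed:,} runs, {hours:,} hours, about {years:.1f} years")
```

Even under these idealized assumptions, a single pass over the genome takes 62,500 runs, or roughly 14 years of uninterrupted operation.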
Next-generation sequencing (NGS) is a broad term describing different technologies characterized by high-throughput, lower cost, and faster sequencing time compared with first-generation methods. Although the Sanger sequencing method allowed the study of one modality of cancer genomic alterations at a time, NGS enhances the ability to comprehensively identify all alterations, including mutations, copy-number alterations, and changes in gene expression, in a reasonable time frame. NGS is also referred to as massively parallel sequencing, because it allows for a substantial increase in the number of sequence reads simultaneously generated, facilitating higher throughput and leading to considerable cost reduction. Initially, the increased output was achieved with substantial sacrifices in length and accuracy of the individual reads compared with the Sanger sequencing method. Nevertheless, to overcome the higher error rates, NGS platforms use a high level of redundancy or sequence coverage to increase the confidence in base calling. Sequence coverage or depth is the number of times a nucleotide mapped to a genome position is read during the sequencing process, due to overlap of the reads generated during sequencing. Physical coverage is the number of fragments that span a specific location in the genome. A common method to characterize the quality of sequencing reads is the combination of PHRED and PHRAP quality scores, which are algorithms used to evaluate the accuracy of base calling in the raw and assembled sequence, respectively. Both scores correspond to an error probability of $10^{-x/10}$, where $x$ is the quality score. Therefore, PHRED or PHRAP quality scores of 20 and 30 correspond to an accuracy of 99% and 99.9%, respectively.
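The relationship between quality score and accuracy, and the notion of sequence depth, can both be sketched in a few lines. The function names and the toy read positions are assumptions made for illustration. Reads are represented simply as (start, length) pairs on a short hypothetical reference.

```python
def phred_to_error_probability(q):
    """Error probability implied by a quality score q: 10^(-q/10)."""
    return 10 ** (-q / 10)


def phred_to_accuracy(q):
    """Base-call accuracy implied by a quality score q."""
    return 1 - phred_to_error_probability(q)


def per_base_depth(reads, genome_length):
    """Count how many reads cover each position of the reference.

    `reads` is a list of (start, length) pairs with 0-based starts.
    """
    depth = [0] * genome_length
    for start, length in reads:
        for pos in range(start, min(start + length, genome_length)):
            depth[pos] += 1
    return depth


for q in (10, 20, 30, 40):
    print(q, f"{phred_to_accuracy(q):.4%}")

# Three overlapping toy reads on an 8-base reference.
reads = [(0, 4), (2, 4), (3, 3)]
print(per_base_depth(reads, 8))
```

Positions covered by several overlapping reads have higher depth, which is what allows a consensus base call to be made with more confidence than any single read provides.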
The most common platforms used for NGS are the Roche 454 (Basel, Switzerland), Illumina (San Diego, CA, USA), and SOLiD (Sunnyvale, CA, USA). The Roche 454 was the first NGS platform available as a commercial product and uses pyrosequencing, an alternative method of DNA sequencing based on measuring inorganic pyrophosphate (PPi) generated during DNA synthesis. In this method, the DNA fragment of interest is hybridized to a sequencing primer and incubated with DNA polymerase, adenosine triphosphate (ATP) sulfurylase, firefly luciferase, and a nucleotide-degrading enzyme. Deoxynucleotides are added one type at a time in repeated cycles and incorporated into the growing DNA strand at complementary sites of the template strand. During this process, PPi is released in equal molarity to the incorporated deoxynucleotide. ATP sulfurylase catalyzes the conversion of PPi and adenosine phosphosulfate into ATP and sulfate. ATP provides the energy for the oxidation of luciferin into oxyluciferin by luciferase, generating light that can be detected by a photodiode or charge-coupled device camera. The unincorporated deoxynucleotides are degraded between the cycles by the nucleotide-degrading enzyme, most commonly apyrase. The overall reaction from polymerization to light detection takes approximately 3 to 4 seconds at room temperature. The Illumina platform uses a sequencing-by-synthesis (SBS) approach in which all four nucleotides, each a reversible terminator carrying a base-specific fluorescent label, are added simultaneously to the flow channels together with DNA polymerase. Each base incorporation step is followed by fluorescent imaging and chemical removal of the terminator. The unique feature of the SOLiD platform is the use of sequencing by ligation, which uses DNA ligase instead of DNA polymerase. The Illumina platform is currently the most widely used platform for NGS.
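Because PPi is released in proportion to the number of nucleotides incorporated, the light signal in pyrosequencing reports how many identical bases were added in each nucleotide flow. The idealized sketch below illustrates that readout. It assumes noise-free signals and one nucleotide type dispensed per flow, and the flow order, function names, and example sequence are assumptions chosen for illustration.

```python
def pyrogram(new_strand, flow_order="TACG", max_flows=100):
    """Return (nucleotide, signal) for each flow while `new_strand` is synthesized.

    The signal is the number of bases incorporated in that flow, so a run of
    identical bases gives a proportionally stronger signal and a
    non-complementary nucleotide gives a signal of zero.
    """
    signals = []
    pos = 0
    flow = 0
    while pos < len(new_strand) and flow < max_flows:
        base = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(new_strand) and new_strand[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals


def decode(signals):
    """Reconstruct the synthesized sequence from the flow signals."""
    return "".join(base * run for base, run in signals)


signals = pyrogram("TTAGC")
print(signals)
print(decode(signals))
```

The first flow yields a double-height signal for the two consecutive T bases, and flows with no incorporation yield zero, which is why the unincorporated nucleotides must be degraded before the next flow begins.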
Whole-genome sequencing (WGS) is the analysis of the entire genomic DNA sequence of a cell at a single time, providing the most comprehensive characterization of the genome. WGS became available after the publication of the Human Genome Project, which generated the reference for human genome sequences. With the use of matched noncancerous genomes, which are usually obtained from skin biopsies in patients with hematologic malignancies and peripheral blood mononuclear cells or adjacent normal tissue in solid tumors for comparison, WGS allows the detection of the full range of genomic alterations as well as noncoding somatic mutations in cancer cells.
The first whole cancer genome sequence was reported in 2008 in a patient with cytogenetically normal acute myeloid leukemia. Using the patient’s skin as the matched normal counterpart, the authors described 10 genes with acquired mutations, including two previously known and eight new mutations. Shortly after that, the initial studies on WGS in lung cancer and other solid tumors were reported. Several tumor samples obtained from patients with various malignancies have been sequenced to date by independent groups and large-scale consortia such as The Cancer Genome Atlas (TCGA).