Molecular Anatomic Pathology: Principles, Techniques, and Application to Immunohistologic Diagnosis

General Principles of Molecular Biology

Immunohistochemistry (IHC) is a common technique used for the detection of protein expression in various tissue samples. In modern pathology practice, this methodology is expanded and complemented by molecular techniques that test for changes in nucleic acids (that is, DNA and RNA) to assist the immunohistologic diagnosis.

Many of the chapters in this book refer to theranostic and genomic principles that can be investigated with immunohistology and used directly for patient care. The underpinning of these immunohistologic tests requires an understanding of the molecular abnormalities of these disease states and how molecular methods apply to their study. In addition, the molecular methods discussed here may be valuable in diagnosis when immunohistologic results are nonspecific.

Deoxyribonucleic Acid

Genetic information in human cells is encoded in deoxyribonucleic acid (DNA), which is primarily located in the nucleus of each cell. DNA is a double-stranded molecule that consists of two complementary strands of linearly arranged nucleotides, each composed of a phosphorylated sugar and one of four nitrogen-containing bases: adenine (A), guanine (G), thymine (T), or cytosine (C). The order of these four bases encodes genetic information. The two strands of DNA run in opposite directions and are held together by hydrogen bonds between specific bases, that is, between adenine and thymine (A:T pairing, 2 hydrogen bonds) and between guanine and cytosine (G:C pairing, 3 hydrogen bonds), forming a double-stranded helix. As a result, the nucleotide sequence of one DNA strand is complementary to the nucleotide sequence of the other DNA strand.
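The pairing rules above can be illustrated with a minimal Python sketch. The function names and the example sequence are hypothetical and chosen only for illustration; the sketch assumes an unambiguous sequence containing only A, C, G, and T.

```python
# Watson-Crick pairing rules from the text: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Hydrogen bonds contributed by the base pair formed by each base.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written 5'->3'.

    The two strands run in opposite directions, so the complement
    is read in reverse order.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def hydrogen_bonds(strand: str) -> int:
    """Count the hydrogen bonds holding the duplex together."""
    return sum(H_BONDS[base] for base in strand.upper())


example = "ATGCGT"  # hypothetical sequence, for illustration only
print(reverse_complement(example))  # ACGCAT
print(hydrogen_bonds(example))      # 15
```

Because G:C pairs carry three hydrogen bonds and A:T pairs carry two, a GC-rich duplex of the same length is held together by more hydrogen bonds than an AT-rich one.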

The human genome contains approximately 3 billion base pairs (bp) of DNA. The DNA molecule is efficiently packed into chromatin by histones and other accessory proteins and further condensed into chromosomes. Each normal somatic cell contains 22 pairs of autosomes and two sex chromosomes, in the XX or XY configuration. Less than 5% of the genomic DNA actually encodes protein and other functional products, such as transfer RNA (tRNA), ribosomal RNA (rRNA), micro-RNA (miRNA), small nuclear RNAs (snRNAs), and long noncoding RNA (lncRNA). Most of the genomic DNA (>95%) consists of noncoding sequence, typically introns, untranslated sequences, and repetitive sequences such as minisatellites, microsatellites, short interspersed elements, and long interspersed elements. Some of these noncoding sequences have important functions, such as regulation and modulation of transcription and mRNA splicing. Microsatellites, also known as short tandem repeats (STRs), have a repeat unit of 1 to 13 bp, whereas minisatellites, also known as variable number tandem repeats (VNTRs), are tandemly repeated DNA sequences with a repeat unit of 14 to 500 bp. Highly repetitive sequences that contain thousands of repeated units are also found at the telomeric ends of the chromosomes and near the centromere; these play a role in establishing and maintaining chromosome structure and stability. Many of these regions are difficult to sequence and study for genetic alterations because of the repetitive nature of the sequence.

For the genetic information to be decoded, the DNA is transcribed by a DNA-dependent RNA polymerase into messenger RNA (mRNA), which, after splicing and additional modifications, is transported into the cytoplasm, where it is “read” by the ribosomal machinery to translate the genetic code into a sequence of amino acids that is assembled into a protein molecule (Fig. 24.1). A gene is an annotated segment of the genomic DNA that encodes a protein or another functional product. Each gene is typically represented as two copies in a cell: one on a maternal and another on a paternal chromosome. Current estimates suggest that about 25,000 distinct genes are present in the human genome. Each gene typically consists of one or more exons, which contain the sequences necessary for encoding the protein as well as sequences that regulate translation of the protein (untranslated regions; UTRs) and modulate splicing. In contrast, the large stretches of intervening sequences between the exons, referred to as introns, are noncoding sequences that play a role in the regulation of important functions, such as splicing (see Fig. 24.1). Translation initiation (start) and termination (stop) codons flank the open reading frame (ORF) of the mRNA. The ORF is the mRNA sequence that is translated into the protein molecule. Gene transcription and silencing are facilitated by promoters and enhancers, which are DNA regions typically located nearby and “upstream” from the gene they regulate; they may also be located at a great distance, deep in the intergenic noncoding region.

Fig. 24.1, Gene structure on the DNA level and the process of transcription and translation. Genes are segments of DNA that contain protein-coding regions (exons), noncoding regions (introns), and regulatory regions that include promoter and enhancer sequences. In the mature RNA (mRNA), only protein-coding parts (exons) are preserved, and they contain the genetic information needed for the protein. UTR, Untranslated region.

Ribonucleic Acid

Ribonucleic acid (RNA) is a single-stranded molecule that consists of a chain of nucleotides on a sugar-phosphate backbone. Unlike in DNA, the sugar in RNA is ribose rather than deoxyribose, and thymine is replaced by uracil. RNA is more susceptible to chemical and enzymatic hydrolysis and is less stable than DNA.

Several types of RNA molecules exist, and each is different in its structure, function, and location. The most abundant types of RNA are rRNA and tRNA, which comprise up to 90% of total cellular RNA. They are predominantly located in the cytoplasm and have important functions in protein synthesis: rRNA, in a complex with specific proteins, forms ribosomes on which proteins are synthesized, and tRNA carries amino acids and adds them to the growing polypeptide chain during protein synthesis. mRNA comprises 1% to 5% of total RNA; each mRNA molecule is a copy of a specific gene and functions to transfer genetic information from the nucleus to the cytoplasm, where it serves as a “blueprint” for protein synthesis. The gene sequence is first transcribed into the primary mRNA transcript by DNA-dependent RNA polymerase. This transcript is a complementary copy of the gene and includes all exons and introns. Next, introns are spliced out of the primary mRNA transcript as it is processed into a mature mRNA (see Fig. 24.1). Other types of RNA include heterogeneous nuclear RNA (hnRNA) and snRNA. Several classes of short RNAs have also been discovered, one of which is miRNAs, which are short (19 to 22 nt) single-stranded molecules that function as negative regulators of gene expression. Lastly, an important, more recently discovered class of RNA molecule is the long noncoding RNA (lncRNA). These molecules are typically greater than 200 nt in length and play a key role in chromatin remodeling, growth, gene expression, and cellular differentiation. They are transcribed from different regions of the noncoding genomic DNA, such as the intergenic, intronic, and promoter elements, or from antisense strands of the coding regions of the genomic DNA.

Protein

The abundance of a protein within each cell depends on the expression level of the gene (i.e., how many mRNA copies are transcribed from DNA) and the stability of the protein. Proteins are synthesized on the ribosomal complex in the cell cytoplasm, which reads the mRNA carrying the genetic information and facilitates the assembly of the polypeptide chain by reading a three-letter genetic code on the mRNA and pairing it with a complementary tRNA linked to an amino acid. The three-base or triplet code, called the codon, defines a specific amino acid that is incorporated into the growing polypeptide chain. After synthesis, the protein undergoes posttranslational modification, such as chain cleavage, chain joining, and addition of nonprotein groups, and folds into a complex, three-dimensional (tertiary and quaternary) structure.
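The flow from DNA to mRNA to protein can be sketched in a few lines of Python. This is a simplified illustration, not a model of the cellular machinery: it assumes an intronless coding-strand sequence, uses the standard genetic code, and starts translation at the first AUG it finds. The function names and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand.

    The mRNA is complementary to the template strand, so it has the same
    sequence as the coding strand with uracil in place of thymine.
    """
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate the open reading frame that begins at the first AUG."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the ORF
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "GGATGGCCAAATGACC"  # hypothetical sequence, for illustration only
mrna = transcribe(dna)
print(mrna)             # GGAUGGCCAAAUGACC
print(translate(mrna))  # MAK
```

In the example, the bases before AUG and after the UGA stop codon are not translated, which mirrors the untranslated regions that flank the ORF.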

Genetic Polymorphism and Mutations (Sequence Variants)

Variations in DNA sequence are common among individuals. Genetic polymorphism is an alteration in the genomic DNA sequence that is found in the general population (apparently healthy individuals) at a 1% or higher frequency (minor allele frequency [MAF]). Polymorphism may be a single nucleotide variant (SNV), also known as a single-nucleotide polymorphism (SNP), an insertion or deletion variant (Indel), or variation in the number of repetitive DNA sequences (such as minisatellites or microsatellites) called length polymorphism. Genetic polymorphism does not usually directly cause a disease but may serve as a risk factor for some diseases.

A mutation, more appropriately referred to as a pathogenic or clinically significant sequence variant, is an alteration of the DNA sequence that is likely to cause disease or is implicated in oncogenesis; such variants are rare or absent in the genomes of populations of apparently healthy individuals (very low or absent MAF). Pathogenic variants can be either germline (present in all cells of the body) or somatic (found in tumor cells only). Somatic mutations may provide a selective advantage for cell growth and may initiate cancer development, but they are not inherited. In contrast, germline pathogenic variants are passed on to the next generation following a pattern of inheritance.

Pathogenic variants located in a coding sequence, in the regulatory elements, or at the intron-exon boundaries (splice junctions) of a gene may affect transcription and/or translation and result in alteration of the protein structure and function. The sequencing of cancer genomes has revealed that most mutations occur in genes in which the encoded protein affects signaling pathways that control important cell functions. Genetic alterations in cancer involve genes that are implicated in oncogenesis, such as oncogenes, tumor suppressor genes, and genes involved in chromatin remodeling, oxidative pathways, and immune regulation.

Not all somatic genetic alterations have a clear biological effect. Pathogenic variants that increase cell growth and survival and are positively selected for during tumor development are called driver mutations; conversely, genetic alterations that do not confer a selective growth advantage to the cell and do not have functional consequences are known as passenger mutations. Passenger mutations may be coincidentally present in a cell that acquires a driver mutation and are carried along during clonal expansion, or they may occur during clonal expansion of a tumor or at the metastatic site. It is generally believed that only a small fraction of the variants in a given tumor are driver mutations; it has been estimated that a typical human tumor carries approximately 80 mutations that change the amino acid sequences of proteins, of which fewer than 15 are driver mutations.

Pathogenic variants can be classified according to size and structure into small-size sequence alterations (single to few base pairs) and large-size genetic alterations (structural rearrangements spanning from many base pairs to parts of or entire chromosomes). Small-size sequence alterations include single-nucleotide variants (SNVs) as well as small deletions and insertions (Indels). Based on the predicted impact on the translated protein, SNVs can be further classified as missense, which leads to an amino acid change with or without a resulting abnormal protein; silent (synonymous), which does not change the amino acid sequence; and nonsense (stop gain), in which substitution of a single nucleotide introduces a premature stop codon and results in a truncated protein. Indel variants can involve deletion or insertion of a number of nucleotides divisible by 3, which leads to a net loss or gain of amino acids but leaves the ORF intact (in-frame Indels), or deletion or insertion of a number of nucleotides not divisible by 3, which shifts the ORF of the gene (frameshift Indels); a frameshift affects multiple amino acids and typically introduces a premature stop codon and a resulting truncated protein.

Large-size genetic alterations can be caused by (1) numeric chromosomal change (that is, loss or duplication of an entire chromosome); (2) chromosomal rearrangements, such as translocations or inversions, that result in an exchange of chromosomal segments between two nonhomologous chromosomes or within the same chromosome and typically lead to activation of specific genes located at the fusion point; (3) amplification, when a particular chromosomal region is repeated multiple times on the same chromosome or different chromosomes, resulting in an increased copy number of the genes located within this region; and (4) chromosomal deletion, when deletion of a discrete chromosomal region leads to loss of one or more genes mapping to that region of the genome. The functional consequences of each pathogenic variant type are different. In general, pathogenic variants result in either activation of a gene (typically an oncogene, such as KRAS or RET) or loss of function of a tumor suppressor gene (TP53, PTEN, CDKN1A).
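The classification of small-size alterations described above (missense, silent, and nonsense SNVs; in-frame and frameshift Indels) can be sketched as follows. This is a minimal illustration, not a variant annotation tool: it assumes the reference codon is a sense codon read in the correct frame, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code on the DNA coding strand (T instead of U).
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snv(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-nucleotide change within one codon."""
    differences = sum(r != a for r, a in zip(ref_codon, alt_codon))
    if len(ref_codon) != 3 or len(alt_codon) != 3 or differences != 1:
        raise ValueError("codons must differ at exactly one position")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if alt_aa == ref_aa:
        return "silent"    # same amino acid
    if alt_aa == "*":
        return "nonsense"  # premature stop codon
    return "missense"      # different amino acid


def classify_indel(net_length_change: int) -> str:
    """Classify an Indel by the net number of inserted or deleted bases."""
    if net_length_change == 0:
        raise ValueError("an Indel must change the sequence length")
    return "in-frame" if net_length_change % 3 == 0 else "frameshift"


print(classify_snv("GGT", "GAT"))  # missense (Gly -> Asp)
print(classify_snv("GGT", "GGC"))  # silent (Gly -> Gly)
print(classify_snv("CGA", "TGA"))  # nonsense (Arg -> stop)
print(classify_indel(-3))          # in-frame
print(classify_indel(1))           # frameshift
```

The predicted class is only a first approximation of functional impact; for example, a silent change near a splice junction may still affect splicing.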

A current list of somatic mutations in cancer can be viewed at the Catalogue of Somatic Mutations in Cancer (COSMIC) database, which documents somatic cancer mutations reported in the literature and identified during the Cancer Genome Project (https://cancer.sanger.ac.uk/cosmic).

Specimen Requirements for Molecular Testing

Molecular testing in surgical pathology can be performed on a variety of clinical samples, including fresh- or snap-frozen tissue, formalin-fixed paraffin-embedded (FFPE) tissue, cytology specimens (fresh and fixed fine needle aspiration [FNA] samples), blood, bone marrow, and swabs. Specimen requirements depend on the type of disease and on the molecular techniques used for the analysis. Peripheral blood lymphocytes or cells from buccal swabs are typically used for detection of germline mutations responsible for a given inherited disease, such as RET mutations in familial medullary thyroid carcinoma. Blood and bone marrow biopsy materials are frequently used for characterization of myeloid neoplasms by detection of DNA mutations (NPM1, FLT3-ITD, MPL) and chromosomal rearrangements (BCR/ABL1 in chronic myeloid leukemia). Tumor tissue samples are used to characterize somatic mutations such as KRAS point mutation in colorectal cancer, SS18/SSX1 rearrangement in synovial sarcomas, or EGFR and MET mutations in lung adenocarcinomas.

Fresh- or snap-frozen tissue is the best sample for testing, because freezing minimizes degradation and provides excellent quality of DNA, RNA, and protein. Such specimens can be successfully used for any type of molecular analysis, including detection of somatic mutations, chromosomal rearrangements, gene-expression arrays, and miRNA profiling. FFPE tissue samples or fixed cytology specimens do not provide such highly preserved nucleic acids; however, these specimens can be successfully used for routine clinical molecular testing, including DNA and RNA tests. Most commonly, 10% neutral-buffered formalin (NBF) is used for tissue fixation. However, it leads to fragmentation and deamination of DNA; therefore, molecular assays for FFPE tissue samples need to be optimized to amplify shorter DNA fragments (250 to 300 bp in length). Prolonged (>24 to 48 hours) fixation in 10% NBF adversely affects the quality of nucleic acids and should therefore be avoided. Tissue specimens that were processed with bone decalcifying solution, regardless of the extent (mild or strong), cannot be used for molecular analysis because of extensive DNA fragmentation. Similarly, it is not recommended to perform molecular testing on specimens exposed to fixatives that contain heavy metals (e.g., Zenker, B5, acetic acid-zinc-formalin) because of inhibition of DNA polymerases and other enzymes that are essential for molecular assays.

RNA molecules are less stable than DNA and are easily degraded by a variety of ribonuclease enzymes present in abundance in the cell and environment. Therefore, only freshly collected or frozen samples are universally acceptable for RNA-based testing. RNA isolated from FFPE tissue is of poor quality and can be used for some but not all applications, particularly in a setting of clinical diagnostic testing.

The amount of tissue required for molecular testing depends on the sensitivity of the technique and on the purity of the tumor sample. When selecting a sample for molecular testing, a pathologist must review a representative hematoxylin and eosin (H&E) slide of the tissue to identify a target and determine the purity of the tumor; that is, the proportion of tumor cells relative to benign stromal and inflammatory cells in the area selected for testing must be evaluated. Manual or laser-capture microdissection can be performed with unstained tissue sections under the guidance of an H&E slide to enrich the tumor cell population. The minimum percentage of tumor cells required for molecular testing depends on the methodology being used for analysis. In general, a minimum tumor cellularity of 50% and at least 300 to 500 tumor cells are required for Sanger sequencing. However, with the increasing use of highly sensitive molecular technologies, such as next generation sequencing (NGS) and digital polymerase chain reaction (PCR), a tumor with as low as 20% neoplastic cellularity can be used for molecular testing. It is important to understand the tradeoff: with decreasing neoplastic cellularity, the ability to reliably detect subclonal mutations and other genetic alterations, such as copy number alterations (CNA) and loss of heterozygosity (LOH), is lost.
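The link between tumor cellularity and detectability can be made concrete with a short calculation. The sketch below is a simplification under stated assumptions: the mutation is heterozygous, both tumor and normal cells are diploid at the locus, and there is no copy number change unless the parameters say otherwise. The function name, parameter names, and the `clonal_fraction` parameter are hypothetical and introduced only for illustration.

```python
def expected_vaf(
    tumor_fraction: float,
    clonal_fraction: float = 1.0,
    mutant_copies: int = 1,
    tumor_copies: int = 2,
    normal_copies: int = 2,
) -> float:
    """Expected variant allele fraction (VAF) in a mixed tumor/normal sample.

    tumor_fraction  -- proportion of cells in the sample that are tumor cells
    clonal_fraction -- proportion of tumor cells that carry the mutation
    mutant_copies   -- mutated alleles per mutation-carrying tumor cell
    tumor_copies    -- total alleles at the locus per tumor cell
    normal_copies   -- total alleles at the locus per benign cell
    """
    mutant_alleles = tumor_fraction * clonal_fraction * mutant_copies
    total_alleles = (
        tumor_fraction * tumor_copies + (1 - tumor_fraction) * normal_copies
    )
    return mutant_alleles / total_alleles


# Clonal heterozygous mutation at the cellularities mentioned in the text.
print(f"{expected_vaf(0.50):.1%}")  # 25.0%
print(f"{expected_vaf(0.20):.1%}")  # 10.0%

# Same 20% cellularity, but only half of the tumor cells carry the mutation.
print(f"{expected_vaf(0.20, clonal_fraction=0.5):.1%}")  # 5.0%
```

Under these assumptions, a clonal heterozygous mutation is present in only half as many alleles as the tumor cellularity would suggest, and a subclonal mutation in even fewer, which is why low-cellularity samples call for more sensitive methods.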

For molecular testing of hematologic specimens, blood and bone marrow should be collected in the presence of the anticoagulants ethylenediaminetetraacetic acid (EDTA) or acid-citrate-dextrose (ACD) but not heparin, because even a small residual concentration of heparin inhibits PCR amplification. Conventional cytogenetic analysis requires fresh tissue. Fluorescence in situ hybridization (FISH) can be performed on a variety of specimens including frozen tissue sections, touch preparations, paraffin-embedded tissue sections, and cytology slides.

Technologies for Molecular Analysis

Polymerase Chain Reaction

PCR is the amplification technique most frequently used in molecular laboratories. The introduction of PCR has dramatically increased the speed and accuracy of DNA and RNA analysis. The technique is based on exponential and bidirectional amplification of DNA sequences with a set of oligonucleotide primers.

Every PCR run must include the DNA template, two primers complementary to the target sequence, four deoxynucleotide triphosphates (dATP, dCTP, dGTP, and dTTP), DNA polymerase, and magnesium chloride (MgCl₂) mixed in the reaction buffer. Three steps occur in the PCR cycle (Fig. 24.2). First, the reaction mixture is heated to a high temperature (95°C), which leads to DNA denaturation, or separation of the double-stranded DNA into two single strands. The second step involves annealing of primers, in which the reaction is cooled to 55°C to 65°C in order to allow primers to attach to their complementary sequences. The third step is DNA extension, in which the reaction is heated to 72°C to allow the enzyme DNA polymerase to build a new DNA strand by adding specific nucleotides to the attached primers. These three steps are repeated 35 to 40 times, and during each cycle, the newly synthesized DNA strands serve as a template for further DNA synthesis. This approach results in an exponential increase in the amount of a targeted DNA sequence and production of 10⁷ to 10¹¹ copies from a single DNA molecule.
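The exponential growth described above can be sketched with a simple model in which each cycle multiplies the number of target copies by (1 + efficiency). The per-cycle efficiency parameter is an assumption introduced here for illustration (1.0 means perfect doubling); real reactions fall short of perfect doubling and plateau in late cycles, which this sketch ignores. The names `PCR_CYCLE` and `pcr_copies` are hypothetical.

```python
# The three steps of one PCR cycle, with temperatures (in °C) from the text.
# Annealing is given as a (low, high) range because the text gives a range.
PCR_CYCLE = [
    ("denaturation", (95, 95)),
    ("annealing", (55, 65)),
    ("extension", (72, 72)),
]


def pcr_copies(
    initial_copies: float, cycles: int, efficiency: float = 1.0
) -> float:
    """Copies of the target after a number of cycles.

    efficiency is the fraction of templates copied in each cycle
    (assumed constant across cycles): 1.0 doubles the target every cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


for step, (low, high) in PCR_CYCLE:
    label = f"{low}" if low == high else f"{low} to {high}"
    print(f"{step}: {label} °C")

# Perfect doubling from a single molecule over 35 cycles: 2**35 copies.
print(f"{pcr_copies(1, 35):.2e}")  # 3.44e+10

# A less efficient reaction (50% per cycle) over 40 cycles.
print(f"{pcr_copies(1, 40, efficiency=0.5):.2e}")
```

Under this model, the two examples bracket a range comparable to the 10⁷ to 10¹¹ copies quoted in the text, which illustrates how strongly the final yield depends on per-cycle efficiency.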

Fig. 24.2, Schematic representation of the polymerase chain reaction (PCR). A three-step cycle (denaturation, annealing, and extension) is repeated 35 to 40 times in order to generate more than 10⁷ copies of the targeted DNA fragment.

The efficiency of PCR amplification depends on many factors, which include the quality of the isolated DNA template, the size of the PCR product, primer design, and the reaction conditions. High-quality DNA allows amplification of long products (as long as 3 to 5 kb). However, when dealing with DNA of suboptimal quality, such as DNA isolated from fixed tissue or cytology preparations, reliable amplification can be achieved only for relatively short DNA sequences (400 to 500 bp or shorter).

Once the PCR procedure is complete, the products of amplification should be visualized for analysis and interpretation. A simple way to achieve this is to use agarose gel electrophoresis and ethidium bromide staining. However, this method cannot separate amplification products that differ in size by only a few nucleotides, and finer separation can be achieved with polyacrylamide gel or capillary gel electrophoresis ( Fig. 24.3 ). PCR amplification followed by gel electrophoresis is frequently used for detection of small deletions or insertions, microsatellite instability (MSI), and LOH. For detection of point mutations, the PCR products should be interrogated by other molecular techniques.

Fig. 24.3, Post-polymerase chain reaction (PCR) detection of amplification products. (A) Agarose gel electrophoresis shows two PCR products obtained with DNA from two tumor samples, T1 and T2, which are of similar size. (B) Capillary gel electrophoresis shows two PCR products of different sizes. L, DNA ladder (size marker); NC, negative control.
