Regulation of Gene Expression in Hematology


Introduction

The function of a cell is not only determined by the sum of the specific RNAs and proteins expressed but also by their metabolism, modification, and localization. To understand how a cell behaves, one must understand how the expression of genes, translation of transcripts, and processing of proteins are regulated.

Through concerted regulation of these processes, hematopoietic stem cells (HSCs) maintain a balance between quiescence and differentiation to mature blood cell types; erythroid progenitors produce vast quantities of hemoglobin; myeloid cells generate granules for immune responses; lymphocytes control immunoglobulin levels; and platelets regulate levels of thrombotic receptors.

Aberrant gene expression and RNA metabolism can result in hematologic disorders such as lymphomas, leukemias, and myelodysplastic and myeloproliferative syndromes. Furthermore, mutations in elements of the ribosomal machinery result in bone marrow failure syndromes. Understanding the process behind RNA and protein synthesis, trafficking, and degradation is crucial for the diagnosis and treatment of hematologic disorders.

This chapter will present the foundation necessary to understand the process of gene expression through RNA synthesis and processing, including transcription, splicing, modification, nuclear export, localization, stability and translation as well as posttranslational protein modification, targeting, and localization.

The first step of gene expression is transcription, where RNA polymerases decode the DNA using specific start and stop signals to synthesize RNA (Fig. 4.1). In the subsequent step, splicing removes introns, portions of the RNA that do not code for protein. The RNA is “capped” at the 5′ end and supplied with a poly-A tail at the 3′ end; RNA is also modified cotranscriptionally, providing an additional layer for regulation of its stability and localization. RNA modifications define the epitranscriptome, in analogy to DNA and histone modifications (called the epigenome). Next, the spliced RNA is targeted for export out of the nucleus and into the cytoplasm, where ribosomes translate the RNA into protein products (see Fig. 4.1). Protein synthesis occurs in the cytoplasm and generates a great variety of products endowed with a wide spectrum of functions. The complete set of proteins produced by a cell is called the proteome and is responsible for the remarkable diversity in cell specialization that is typical of metazoan organisms. To be functional, proteins need to be properly folded, assembled, often modified, and transported to their specific destination. The cell’s interior harbors several membrane-bound organelles, such as the mitochondria, peroxisomes, nucleus, and endoplasmic reticulum (ER), to which the proteins may be targeted. In addition, membraneless organelles have been identified both in the nucleus and in the cytoplasm, including nucleoli, Cajal bodies, P-bodies, and stress granules. These organelles exist as liquid droplets within the cell and arise from the condensation of cellular material in a process termed liquid-liquid phase separation (LLPS).
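The flow from DNA to protein outlined above can be illustrated with a deliberately simplified sketch. The sequence, the intron coordinates, and the four-entry codon table are hypothetical and chosen only for illustration; capping, polyadenylation, export, and every regulatory layer discussed in this chapter are omitted.

```python
# Toy model of DNA -> pre-mRNA -> mature mRNA -> protein.
# All sequences and coordinates below are hypothetical.

# Deliberately partial codon table: only the codons used in this example.
CODON_TABLE = {"AUG": "M", "GCU": "A", "AAA": "K", "UAA": "*"}


def transcribe(coding_strand: str) -> str:
    """Return the RNA copy of the coding (sense) DNA strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as half-open (start, end) index pairs."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])
        position = end
    exons.append(pre_mrna[position:])
    return "".join(exons)


def translate(mrna: str) -> str:
    """Read codons from position 0 until a stop codon ('*') is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Exon 1 (6 nt) + intron (12 nt, begins GT and ends AG) + exon 2 (6 nt).
dna = "ATGGCT" + "GTAAGTTTTCAG" + "AAATAA"
pre_mrna = transcribe(dna)
mature_mrna = splice(pre_mrna, [(6, 18)])
protein = translate(mature_mrna)
```

In this toy case the mature transcript is AUGGCUAAAUAA and the translated peptide is Met-Ala-Lys.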

Figure 4.1, OVERVIEW OF GENE EXPRESSION FROM DNA TO PROTEIN VIA RNA.

This chapter briefly describes gene expression from start to end, exploring both classic and emergent regulatory mechanisms and connecting them with hematologic disorders.

Regulation of Transcription

Each cell in the human body contains approximately 60,000 genes (approximately 20,000 protein coding, 18,000 long noncoding, 7500 small noncoding and 14,500 pseudogenes). Only 1% to 2% of human DNA actually serves to code for proteins; the remaining part is prevalently involved in regulating DNA replication and gene expression across cell types and developmental stages. Together these regulatory sequences determine in which cell, at what time, and in what amount the gene is converted into the corresponding protein.

RNA Polymerase Binding and Regulation by Transcription Factors

RNA polymerase synthesizes RNA from a DNA template. For transcription to begin, RNA polymerase must attach to a specific DNA region at the beginning of a gene, known as the promoter. Transcription factors control access of RNA polymerases to promoter regions and frequently recruit them there. Promoters can additionally function together with other, more distant regulatory DNA regions, such as enhancers or silencers, to further control the level of transcription of a given gene. Insulator regions in the genome shield genes from the regulatory influence of neighboring genes. Multiple enhancer sites may tune the transcription of one gene, and each enhancer may be bound by more than one transcription factor, increasing the complexity of transcriptional regulation. Enhancers are often the major determinant of transcription of developmental genes in the differing lineages and stages of hematopoiesis. Genes can have more than one transcription start site, giving rise to RNA molecules starting with distinct sequences.

RNA is heterogeneous, and stretches of genomic DNA may encode more than one RNA or more than one type of RNA. Most eukaryotic genes, especially those encoding messenger RNAs (mRNAs), have a basic structure of alternating coding exons and noncoding introns, which are subsequently dealt with in the splicing process.

While most RNAs in the cell are encoded by chromosomes in the nucleus, several mitochondrial proteins are encoded by the mitochondrial genome, often referred to as mtDNA. Transcription of the different classes of RNAs in eukaryotes is carried out by three different RNA polymerase enzymes. RNA polymerase I synthesizes the ribosomal RNAs (rRNAs), except for the 5S species. RNA polymerase II synthesizes the mRNAs and some small nuclear RNAs (snRNAs) involved in RNA splicing. RNA polymerase III synthesizes 5S rRNA and transfer RNAs (tRNAs). Transcription levels are finely tuned by the binding strength of the RNA polymerase to the promoter region at the beginning of a given gene, the interaction between activating and inhibiting transcription factors that bind to the given promoter, and transcriptional regulatory domains such as the enhancers or silencers mentioned previously.

Gene-specific transcription factors are sequence-specific DNA binding proteins that can be modified by cell signals. Numerous genetic diseases are associated with mutations in a gene’s coding region, promoter, or enhancers. In β-thalassemia, mutations can occur in the promoter region, the enhancer region, or the coding region of the gene. Mutations can involve single nucleotide substitutions, small deletions, or insertions and can heavily affect transcription, RNA splicing or stability, translation, and ultimately protein availability or functionality. Regulation of transcription is fundamental during T-lymphocyte differentiation, which requires binding of multiple activating transcription factors, such as lymphocyte enhancer factor (LEF)-1, GATA binding protein 3 (GATA)-3, and ETS proto-oncogene (ETS)-1, to the T-cell receptor alpha (TCRA) gene enhancer.

Mutations in promoter sequences that result in decreased transcription factor binding, and therefore less RNA polymerase binding, ultimately lead to decreased gene expression. One of the best examples of a mutation in a transcription factor binding site associated with a human disease is in the factor IX gene. The transcription factor hepatocyte nuclear factor 4 alpha (HNF4α) is required to bind to the factor IX promoter before this gene can be transcribed. Patients with a mutation in the HNF4α binding site can develop hemophilia B, an X-linked recessive bleeding disorder primarily affecting males ( Fig. 4.2 ).

Figure 4.2, ROLE OF TRANSCRIPTION FACTORS IN THE REGULATION OF EUKARYOTIC GENE EXPRESSION.

Many transcription factors, such as signal transducer and activator of transcription (STAT) proteins, require phosphorylation to bind DNA. Since transcription factors can be targeted by kinases and phosphatases, phosphorylation can effectively integrate information carried by multiple signal transduction pathways, thus providing versatility and flexibility in gene regulation. For example, the Janus kinase (JAK)-STAT pathway is widely used by members of the cytokine receptor superfamily, including those for granulocyte colony-stimulating factor (G-CSF), erythropoietin, thrombopoietin, interferons, and interleukins. Normally, ligand-bound growth factor receptors lead to JAK2 phosphorylation, which then activates STAT, also by phosphorylation. Activated STAT then dimerizes, translocates to the hematopoietic cell nucleus, binds DNA, and promotes transcription of genes for hematopoiesis. Alteration of JAK2, such as a V617F mutation, results in a constitutively active kinase capable of driving STAT activation. This leads to constitutive transcription of STAT target genes and results in myeloproliferative disorders such as polycythemia vera.

Regulation of Transcription by Chromatin

The ability of transcription factors and RNA polymerases to access specific promoters and transcribe genes is also regulated by the packaging of DNA by proteins and RNA, which together form the chromatin. Chromatin can package DNA tightly (heterochromatin) or loosely (euchromatin). In euchromatin, RNA polymerases can freely bind to DNA and genes are actively transcribed. In heterochromatin, DNA is tightly packaged and protected from the transcription machinery, sequestering genes away from transcription. The basic unit of chromatin is the nucleosome, which contains eight histone proteins packaging 146 base pairs of DNA. Histones can be extensively modified to regulate the accessibility of the DNA to the transcriptional apparatus (see Chapter 3). Histones can be chemically modified by acetylation, methylation, phosphorylation, or ubiquitination. In general, acetylation opens the nucleosome to increase transcription, whereas phosphorylation marks damaged DNA. Histone methylation can either open chromatin to increase transcription or close it to repress transcription, depending on where the histone is methylated. Transcription factors can themselves recruit histone-modifying enzymes that further regulate transcription. In hematopoiesis, transcription factors, including GATA-1, EKLF, NF-E2, and PU.1, recruit histone acetyltransferases (HATs) and histone deacetylases (HDACs) to promoters of their respective target genes, leading to addition or removal of acetyl groups from histones, which in turn alters chromatin structure and accessibility for transcription. GATA-1, a transcription factor essential to erythroid maturation and survival, directly recruits HAT complexes to the β-globin locus to stimulate transcription activation.

Chromatin remodeling is mediated by a family of proteins with switch/sucrose nonfermentable (SWI/SNF) domains. These proteins use adenosine triphosphate (ATP) hydrolysis to shift the nucleosome core along the length of the DNA, a process also known as nucleosome sliding . By sliding nucleosomes away from a gene sequence, SWI/SNF complexes can activate gene transcription. SWI/SNF proteins also contain helicase enzyme activity, which unwinds the DNA by breaking hydrogen bonds between the complementary nucleotides on opposite strands. By unwinding the DNA into two single strands, the DNA can then be read by RNA polymerases in the direction 3′ to 5′, allowing RNA polymerase to produce an antiparallel RNA strand. The SWI/SNF complex has been shown to be active in the DNA damage response and is also responsible for tumor suppression. These processes are described in further detail in Chapter 2 .

Regulation of Transcription by DNA Modification

DNA can also itself be chemically modified to amplify or suppress transcription. CpG sites within gene promoter regions can be chemically modified by methylation enzymes called DNA methyltransferases (DNMTs), which decrease DNA binding of RNA polymerase and associated transcription factors. Hypermethylation has been observed in bone marrow cells of patients with myelodysplastic syndromes (MDSs), and the degree of DNA hypermethylation correlates with disease stage. In MDSs, the promoters of genes that are important for myeloid differentiation are hypermethylated, repressing their transcription and inhibiting proper maturation of the myeloid lineages. Hypomethylating agents such as azacitidine and decitabine can induce remission and may prolong survival in some MDS patients.
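As a minimal sketch of the idea of promoter CpG methylation, the snippet below locates CpG dinucleotides in a short sequence and reports the fraction that are methylated. The promoter sequence and the set of methylated positions are hypothetical; real methylation data would come from an assay such as bisulfite sequencing, which is not modeled here.

```python
def cpg_positions(sequence: str) -> list[int]:
    """Return the 0-based index of the C in every CpG dinucleotide."""
    sequence = sequence.upper()
    return [i for i in range(len(sequence) - 1) if sequence[i:i + 2] == "CG"]


def methylated_fraction(sequence: str, methylated: set[int]) -> float:
    """Fraction of CpG sites whose cytosine index is in `methylated`."""
    sites = cpg_positions(sequence)
    if not sites:
        return 0.0
    return sum(1 for site in sites if site in methylated) / len(sites)


# Hypothetical promoter fragment and hypothetical methylation calls.
promoter = "TTCGACGGCGTACGAT"
sites = cpg_positions(promoter)
fraction = methylated_fraction(promoter, methylated={2, 5, 8})
```

Here three of the four CpG sites are marked as methylated, giving a fraction of 0.75; a hypermethylated promoter of the kind described in MDS would sit near the upper end of this scale.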

The regulation of gene expression by modification of chromatin conformation or DNA itself is termed epigenetic because it tunes cell function without altering the nucleotide sequence of the DNA. Regulation and function of the epigenome and their role in hematopoiesis and diseases thereof are described in Chapter 3 and reviewed in Cullen et al. Examples of hematologic malignancies driven by disordered epigenetic regulation include MDSs and acute myeloid leukemia (AML), with mutations in the DNMT3A gene observed in approximately 5% of MDSs and approximately 20% of AML cases. DNMT3A mutations confer a worse prognosis in AML. The ten-eleven-translocation methylcytosine dioxygenase member, TET2, catalyzes hydroxymethylation of cytosines in DNA and results in demethylation of DNA; TET2 is mutated in AML, MDSs, chronic myelomonocytic leukemia (CMML), and other myeloproliferative neoplasms (MPNs), and all mutations represent loss-of-function mutations. Both DNMT3A and TET2 mutations, together with mutations in other chromatin modifiers such as ASXL1, can widely alter gene expression and are often present in clonal hematopoiesis, preceding onset of frank malignancy.

RNA Proofreading

Before a final mRNA product is made that can be translated, several proofreading and regulatory steps must take place. The RNA polymerase may fail to clear the promoter and slip off, producing truncated transcripts. Once the nascent transcript reaches approximately 23 nucleotides, the RNA polymerase no longer slips off, and full transcript elongation can occur. RNA polymerase then continues to traverse the template DNA strand, consuming ribonucleoside triphosphates while pairing complementary bases and forming the phosphodiester-ribose backbone. Many RNA transcripts may be rapidly produced from a single copy of a gene, as multiple RNA polymerases can transcribe a gene simultaneously, spaced out from one another. An important proofreading mechanism during elongation allows the substitution of incorrectly incorporated bases or editing of bases for other purposes, usually by permitting short pauses during which the appropriate RNA editing factors can bind. RNA editing mechanisms in mRNAs include nucleoside modifications of cytidine to uridine (C-to-U) and adenosine to inosine (A-to-I) by deamination, as well as nucleotide insertions and additions without a DNA template by protein complexes called editosomes. A-to-I modifications make up nearly 90% of all editing events in RNA. The deamination of adenosine is catalyzed by the double-stranded RNA (dsRNA)-specific adenosine deaminase (ADAR). The deamination of adenosine to inosine disrupts and destabilizes dsRNA base pairing with multiple possible outcomes, such as reduced formation of small interfering RNAs (siRNAs) but also labeling of RNA as self and prevention of activation of an innate immune response. Studies in hematopoiesis and leukemia have elucidated the critical roles of A-to-I editing.
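A-to-I editing can be sketched as a simple string operation. The transcript and the edited position below are hypothetical. The second function relies on a fact not stated above: inosine base-pairs like guanosine, so the translation machinery reads an edited site as G. Site selection by ADAR, which depends on dsRNA structure, is not modeled.

```python
def edit_a_to_i(rna: str, positions: list[int]) -> str:
    """Deaminate adenosine to inosine ('I') at the given 0-based positions."""
    bases = list(rna.upper())
    for position in positions:
        if bases[position] != "A":
            raise ValueError(f"position {position} is not an adenosine")
        bases[position] = "I"
    return "".join(bases)


def as_read_by_ribosome(rna: str) -> str:
    """Inosine pairs like guanosine, so it is decoded as G."""
    return rna.replace("I", "G")


# Hypothetical transcript: AUG CAG UAA (Met-Gln-stop).
original = "AUGCAGUAA"
edited = edit_a_to_i(original, [4])   # CAG -> CIG
decoded = as_read_by_ribosome(edited)  # CIG is read as CGG
```

In this example a single edit recodes a glutamine codon (CAG) so that it is read as an arginine codon (CGG), showing how editing can change the protein without any change in the DNA.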

Another important repair mechanism is transcription-coupled nucleotide excision repair, where RNA polymerase stops transcribing when it comes to a bulky lesion in one of the nucleotides in the gene. A large protein complex excises the DNA segment containing the bulky lesion, and a new DNA segment is synthesized to replace it, using the opposite strand as a template. The RNA polymerase then resumes transcribing the gene. However, in general, RNA proofreading mechanisms are not as effective as in DNA replication, and transcription fidelity is generally lower.

Regulation of RNA Processing: Capping, Splicing, and Polyadenylation

After a eukaryotic gene is transcribed, the primary transcript is modified to protect it from degradation and target it for export into the cytoplasm and eventually translation to protein (see Fig. 4.1). These modifications generate the mature transcript and include capping, splicing, and polyadenylation. Capping occurs shortly after the start of transcription, when a modified guanine nucleotide is added to the 5′ end of the mRNA. This terminal 7-methylguanosine residue is necessary for proper attachment to the 40S ribosome subunit during translation initiation. It also protects the RNA from endogenous ribonucleases that degrade uncapped RNA, which is often viral in origin. RNA polymerases do not terminate transcription in an orderly manner. They tend to be processive, yet the cell cannot tolerate a population of mRNAs that are enormous in size. Therefore mRNAs have a signal, the sequence AAUAAA, that defines the end of the transcript. In general, ribonucleases cut mRNAs shortly after that signal, and a chain of several hundred adenosine residues, the poly(A) tail, is added to the free 3′ end of the transcript. RNA cleavage and synthesis of the poly(A) tail require binding of specific proteins, including cleavage/polyadenylation specificity factor (CPSF), cleavage stimulation factor (CstF), polyadenylate polymerase (PAP), polyadenylate binding protein 2 (PAB2), cleavage factor I (CFI), and CFII, that function to catalyze cleavage and protect the mRNA from exoribonucleases. The poly(A) tail increases RNA stability and assists in RNA export to the cytoplasm and translation. Mutations in the poly(A) signal can result in hematologic disease. For example, there are thrombophilic patients with a mutation in the polyadenylation signal in the prothrombin gene that increases the stability of this mRNA, resulting in higher prothrombin protein levels and increased thrombosis.
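The cleavage and polyadenylation step can be sketched as follows, using the canonical hexamer AAUAAA as the signal. The `offset` and `tail_length` parameters and the example transcript are hypothetical placeholders: the text above says only that cleavage occurs shortly after the signal and that the tail is several hundred residues long, and the protein factors involved are not modeled.

```python
from typing import Optional


def cleave_and_polyadenylate(
    transcript: str,
    signal: str = "AAUAAA",
    offset: int = 15,        # assumed distance from signal to cleavage site
    tail_length: int = 200,  # assumed tail length, "several hundred" in vivo
) -> Optional[str]:
    """Cut the transcript `offset` nt past the first signal and add poly(A).

    Returns None when the transcript carries no polyadenylation signal.
    """
    index = transcript.find(signal)
    if index == -1:
        return None
    cleavage_site = min(index + len(signal) + offset, len(transcript))
    return transcript[:cleavage_site] + "A" * tail_length


# Hypothetical 3' region: 4 nt, the signal, then 30 nt of downstream sequence.
three_prime_region = "GGCC" + "AAUAAA" + "C" * 30
processed = cleave_and_polyadenylate(three_prime_region, tail_length=5)
unprocessed = cleave_and_polyadenylate("GGCCGGCC")
```

A transcript lacking the signal is returned as None, a crude stand-in for the failure of normal 3′ end formation when the signal is lost.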

Before the mRNA is exported from the nucleus to be translated into protein, introns must be removed and the exons reconnected (see Fig. 4.1). In complex multicellular organisms such as vertebrates, introns are approximately 10-fold longer than the exons. The sequence and length of introns have varied rapidly over evolutionary time. The process of removing introns, termed splicing, requires a series of reactions mediated by the spliceosome, a dynamic complex assembled anew on each substrate from five snRNAs and approximately 100 proteins, which together form small nuclear ribonucleoproteins (snRNPs). Recent critical advances in cryoelectron microscopy have shed light on the dynamics of assembly and disassembly of the spliceosome and the splicing process. Canonical splicing uses the major spliceosome and accounts for more than 99% of splicing. The major spliceosome is composed of the nuclear active snRNPs U1, U2, U4, U5, and U6 along with multiple specific accessory proteins, such as U2AF and SF1. The spliceosome complex recognizes the dinucleotide GU at the 5′ end of an intron and an AG at the 3′ end (Fig. 4.3). Splicing can also occur cotranscriptionally and is regulated by chromatin factors that regulate transcription. As transcription proceeds, an RNA lariat structure forms as an intermediate, connecting the intron ends and providing for both excision of the intron and proper alignment of the ends of the two bordering exons to allow precise ligation. When the intronic flanking sequences do not follow the GU-AG rule, noncanonical splicing removes these rarer introns with different splice site sequences using the minor spliceosome. The same U5 snRNP is found in the minor spliceosome, in addition to the unique yet functionally similar U11, U12, U4atac, and U6atac. Furthermore, there are splicing mechanisms, including tRNA splicing and self-splicing, that function without use of the spliceosomal machinery. tRNA intron excision and exon ligation are catalyzed by protein-only complexes. tRNA introns typically interrupt the anticodon loop and must be removed so that the mature tRNA can properly function in protein translation.
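The GU-AG rule for spliceosomal introns described above lends itself to a one-line check. The function below is a minimal sketch that inspects only the terminal dinucleotides; the example introns are hypothetical, and the branchpoint, polypyrimidine tract, and other sequence features that real splice site recognition depends on are ignored.

```python
def follows_gu_ag_rule(intron: str) -> bool:
    """True if the intron starts with GU and ends with AG.

    Accepts DNA or RNA notation (T is treated as U).
    """
    intron = intron.upper().replace("T", "U")
    return len(intron) >= 4 and intron.startswith("GU") and intron.endswith("AG")


# Hypothetical intron sequences.
canonical = follows_gu_ag_rule("GUAAGUUUUCAG")
noncanonical = follows_gu_ag_rule("AUAUCCUUUUAC")
dna_notation = follows_gu_ag_rule("gtaagttttcag")
```

An intron that fails this check is, in the terms used above, a candidate for noncanonical splicing; a mutation that converts a GU or AG into another dinucleotide would likewise cause the check to fail.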

Figure 4.3, DESCRIPTION OF THE CANONICAL SPLICING JUNCTION, HIGHLIGHTING SPLICING FACTORS RECURRENTLY MUTATED IN HEMATOLOGIC MALIGNANCIES.

Splicing is central to the output of a diverse transcriptome and then proteome. Alternative splicing (AS) can enhance the versatility and diversity of a single gene. By alternatively excising different introns along with the intervening exons, a wide range of unique transcripts and proteins of differing sizes can be generated. These alternatives, termed isoforms , come from one gene that generates a variety of mRNAs with varying exon composition. AS is widespread in complex multicellular organisms: the average human gene has seven different isoforms, and the number of known isoforms is rapidly increasing thanks to technologies such as long-read next-generation sequencing. AS is common and essential for the proper function of almost all hematopoietic cells. For example, B cells can produce both immunoglobulin M (IgM) and IgD at the same developmental stage using AS. Erythrocytes use AS to produce differing isoforms of cytoskeletal proteins. However, AS does not always give beneficial results.
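To illustrate how alternative splicing multiplies the output of one gene, the sketch below enumerates the isoforms obtainable by including or skipping each internal exon. It assumes, purely for illustration, that the first and last exons are always retained and that every internal exon is an independent cassette exon; the exon names are hypothetical, and other AS modes (alternative splice sites, intron retention, mutually exclusive exons) are not covered.

```python
from itertools import combinations


def cassette_isoforms(exons: list[str]) -> list[list[str]]:
    """List every isoform formed by keeping or skipping internal exons.

    The first and last exons are treated as constitutive; exon order is kept.
    """
    first, *internal, last = exons
    isoforms = []
    for kept_count in range(len(internal), -1, -1):
        for kept in combinations(internal, kept_count):
            isoforms.append([first, *kept, last])
    return isoforms


# Hypothetical four-exon gene with two cassette exons (E2 and E3).
isoforms = cassette_isoforms(["E1", "E2", "E3", "E4"])
```

With n internal cassette exons this simple model yields 2^n isoforms, so the two-cassette gene here gives four transcripts, from the full-length E1-E2-E3-E4 to the shortest E1-E4.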

One of the best examples of inappropriate splicing leading to hematologic disease is β-thalassemia, where several mutations that occur in the GU-AG splicing signals result in aberrant β-globin mRNAs. Abnormal splicing can also lead to AML, MDSs, and other hematologic disorders. Translocated in liposarcoma (TLS) is a protein that recruits splicing complexes to mRNAs and is involved in the TLS-ETS transcription factor ERG fusion oncogene in t(16;21) in AML. This fusion of TLS with the transcription factor ERG alters the splicing profile of immature myeloid cells, blocking the expression of genes required for proper differentiation. Trans-splicing is a form of splicing that joins two exons that are not within the same mRNA transcript. Some trans-splicing events occur when the intron splice sites are not filled by spliceosomes. Trans-splicing can lead to mRNAs displaying exon repetitions or chimeric fusion RNAs, which can mimic the presence of a chromosomal translocation in normal cells. For example, specific chimeric fusion mRNAs seen in acute leukemias, such as MLL-AF4, BCR-ABL, TEL-AML1, AML1-ETO, PML-RAR, NPM-ALK, and ATIC-ALK, have been found in blood cells of healthy individuals with normal chromosome karyotype. Interestingly, these individuals generally do not develop leukemia. In addition, in patients with chronic myelogenous leukemia (CML) resistance to tyrosine kinase inhibitor therapy has been linked to AS of the BCR-ABL transcript.

Splicing mutations can occur in cis, within the RNA itself, as is the case for the aforementioned β-thalassemia, or in trans, such as mutations in splicing factors or members of the spliceosome. Recurring mutations in several factors of the spliceosome result in MDSs and other hematologic malignancies (see Fig. 4.3). Mutations in the splicing factor 3b, subunit 1 (SF3B1) have been observed in 68% to 75% and 81% of refractory anemia with ring sideroblasts (RARS) and RARS with thrombocytosis (T) patients, respectively, and result in alternative choice of the branchpoint site within the intron. Mutations in the U2 small nuclear RNA auxiliary factor 1 (U2AF1) and the serine/arginine-rich splicing factor 2 (SRSF2) result in sequence-specific splicing aberrations. U2AF1, a subunit of the U2AF heterodimer that also contains the polypyrimidine tract binding subunit U2AF2, carries distinct point mutations in its two zinc fingers that contact the AG dinucleotide in the 3′ splice site. Mutations of U2AF1 S34 or Q157 residues create de novo 3′ splice site contacts that alter RNA splicing and result in preferential exon inclusion or exclusion, depending on the −3 or +1 nucleotide sequence at the 3′ splice site, respectively. Mutations in U2AF1 are associated with a number of myeloid malignancies and occur in 8.7% to 11.6% of de novo cases of MDS. U2AF1-mutant MDS/AML cells exhibit an enhanced stress granule response, pointing to a novel role for biomolecular condensates in adaptive oncogenic strategies. SRSF2 is a member of the serine/arginine-rich pre-mRNA splicing factors. SRSF2 recognizes so-called exonic splicing enhancer (ESE) sequences within exons (see Fig. 4.3), with a 5′-SSNG-3′ consensus motif where S = C/G and N = any nucleotide. Mutations of P95 in the RNA-binding domain of SRSF2 alter RNA binding and splicing, reflected in higher affinity for 5′-CCNG-3′ than 5′-GGNG-3′ containing exons and resulting in preferential inclusion of alternative exons containing CCNG-rich motifs and exclusion of exons containing GGNG-rich motifs. Mutations in the SRSF2 gene are associated with MDS and related diseases, particularly CMML, with SRSF2 mutations reported in up to 47% of patients. Recent studies have identified numerous critical targets that are aberrantly spliced, such as EZH2, a member of the PRC2 complex involved in epigenetic regulation. In addition, it has been found that splicing factor mutations result in enhanced formation of DNA:RNA hybrids, so-called R-loops, which induce DNA damage that promises to be exploitable in the treatment of these diseases.
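The motif preference attributed to mutant SRSF2 can be made concrete by counting CCNG and GGNG occurrences in an exon. This is a rough sketch only: N is treated as any of A, C, G, or U (an assumption of this example), the exon sequence is hypothetical, and a simple motif count is not a model of binding affinity or of splicing outcome.

```python
import re

# N is assumed to be any ribonucleotide.
CCNG = "CC[ACGU]G"
GGNG = "GG[ACGU]G"


def count_motif(sequence: str, pattern: str) -> int:
    """Count occurrences of `pattern`, including overlapping ones."""
    sequence = sequence.upper().replace("T", "U")
    return len(re.findall(f"(?=({pattern}))", sequence))


# Hypothetical exon fragment.
exon = "AUCCAGCCUGACGGUGA"
ccng_count = count_motif(exon, CCNG)
ggng_count = count_motif(exon, GGNG)
```

In this toy exon CCNG motifs outnumber GGNG motifs two to one, the kind of CCNG-rich composition that, according to the text above, would favor inclusion in cells carrying an SRSF2 P95 mutation.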

Nuclear Export of RNA

The nuclear envelope (NE) serves as a major regulator of gene expression, by controlling the movement of mature RNA from the nucleus to the cytoplasm for translation. The NE is made up of a double membrane. The outer nuclear membrane is continuous with the ER and has a composition distinct from that of the inner membrane. Nuclear pore complexes (NPCs) inserted within the NE regulate the transport of molecules in and out of the nucleus. Ions, small metabolites, and proteins smaller than 40 kDa passively diffuse across NPC channels. However, larger proteins and mRNAs are transported through NPCs via energy-dependent (guanosine triphosphate [GTP]) and signal-mediated processes that require chaperoning transport proteins. Approximately 3000 NPCs perforate the NE in animal cells. NPCs are approximately 120 nm in external diameter and composed of three major parts: (1) a central core containing a 10-nm channel, (2) a nuclear basket that can dilate in response to large cargoes, and (3) flexible fibrils that extend from the central core into the cytoplasm. NPCs contain approximately 50 different proteins (nucleoporins), arranged in a complex cylindrical structure with an octagonal symmetry. Nucleoporins constitute the scaffold of the NPC and are arranged in rings. In the inner ring, nucleoporins containing repeats of two hydrophobic amino acids, phenylalanine and glycine (FG-repeats), are essential for the movement of the cargo-carrier complexes and for creating a selectivity barrier against the diffusion of nonnuclear proteins. The FG-nucleoporin filaments protrude toward the inner core of the NPC and the weak hydrophobic interactions between the FG-repeats and the cargo-carrier complexes mediate the passage of molecules.

One nucleoporin, Nup98, is involved in numerous translocations and resulting fusion proteins that cause MDS and leukemia. The N-terminal (Nt) domain that contains the FG-repeats is fused to more than 28 partners in MDS and AML and has shown leukemogenicity in cell lines and mouse models. Interestingly, it is likely Nup98’s transcriptional activation function rather than its role in nuclear pore formation that is critical in the function of these fusion proteins.

Naked RNA cannot be exported through NPC channels. Rather, RNA export from the nucleus requires that newly synthesized RNAs undergo the previously described processing steps: 5′ capping, splicing, and 3′ polyadenylation. RNA-binding proteins, such as NXF1 that mediates the export of most mRNAs or XPO1, are required to fold and shuttle the modified RNA through NPCs. The eukaryotic translation initiation factor 4E (eIF4E) enhances nuclear export of a subset of RNA transcripts and is critical for proper granulocyte differentiation. Overexpression of eIF4E impedes myeloid maturation and can result in AML. The cellular RNA export machinery is co-opted by gammaretroviruses for the export of viral transcripts, such as binding of murine leukemia virus transcripts by the host cell’s NXF1.

RNA Heterogeneity

The transcriptome of a cell is represented by a myriad of different RNA molecules with and without protein-coding capacities. Before recognition of the versatility of RNA, DNA was considered the sole conveyor of information while RNA was thought to solely function as an intermediate in protein synthesis (mRNA) or as effector molecules (tRNA, rRNA, snRNA). In the past couple of decades, RNA research has shed light on the pliability of RNA and its many regulatory functions in gene expression ( Fig. 4.4 ). Several classes of noncoding RNAs (ncRNAs) such as micro RNAs (miRNAs), piwi-interacting RNAs (piRNAs), and long noncoding RNAs (lncRNAs) have been identified and associated with regulatory functions and diseases. According to current genome annotations, the number of human ncRNAs is much higher than the number of protein-coding genes. The RNA landscape is further enriched by the presence of retrotransposons, a broad class of transposable elements, or “jumping genes,” that duplicate through RNA intermediates that are reverse transcribed and inserted at new genomic locations. Due to their “copy and paste” mechanism, retrotransposons amplify in number quickly, composing 40% of the entire human genome.

Figure 4.4, HIERARCHY OF RNA CLASSES DEFINED BY FUNCTION, BIOGENESIS, AND SIZE.

LncRNAs are defined by their length, greater than 200 nucleotides (nt). They have gene regulatory roles in the nucleus or function in the cytoplasm via epigenetic, transcriptional, posttranscriptional, translational, and protein location effects. With almost 18,000 genes in the human genome, lncRNAs are apparently the most numerous and functionally diverse class of ncRNAs. Depending on their genomic location, they can be further divided into antisense (asRNAs), overlapping protein-coding genes, or intergenic (lincRNAs) (see Fig. 4.4). Several lncRNAs undergo maturation processes such as capping, splicing, and polyadenylation. Despite their designation as ncRNAs, a considerable number of these transcripts contain short open reading frames (sORFs) and bind to ribosomes, suggesting that the coding potential of lncRNAs has been vastly underestimated. Although the function of most lncRNAs is still unknown, multiple lincRNAs such as MALAT1, NEAT1, or XIST have already been implicated in human diseases such as cancer. An inherited form of α-thalassemia is caused by the translocation of an antisense lncRNA near the α-globin gene, resulting in the epigenetic silencing of the HBA2 gene and causing the disease.

Short ncRNAs, with a length less than 200 nucleotides, account for approximately 8500 genes in the human genome. They include well-known RNA classes such as tRNAs and rRNAs, essential for translation and described below in the section Protein Synthesis. snRNAs are associated with specific proteins in snRNPs and part of the spliceosomal complexes described in the section Regulation of RNA Processing: Capping, Splicing and Polyadenylation. Small nucleolar RNAs (snoRNAs) are a class of small RNA molecules that primarily guide chemical modifications of other RNAs, mainly rRNAs, tRNAs, and snRNAs.

More recently discovered classes of short RNAs include miRNAs, siRNAs, and piRNAs. The roles of miRNAs and siRNAs are described in detail later (section RNA Interference). piRNAs are approximately 26 to 31 nt long and function in suppression of transposable elements and maintenance of germline integrity.

Among emerging classes of RNAs, circular RNAs (circRNAs) are primarily produced via backsplicing, which joins the 3′ end to the 5′ end of exons within the same transcript of a coding gene, giving these RNAs their circular shape. Unlike linear RNAs, circRNAs form a covalently closed continuous loop and are highly conserved, stable, and tissue specific. CircRNAs can give rise to alternative protein isoforms and function as decoys for miRNAs and RNA-binding proteins, thereby regulating RNA stability and translation.

Enhancer RNAs (eRNAs) are generally relatively short ncRNAs transcribed from the genomic DNA at enhancer regions and represent a diverse class of molecules. They were originally described as nonpolyadenylated, bidirectionally transcribed RNA transcripts (<2000 nt) that originate from active enhancers. Subsequently some eRNAs were shown to be longer (>4000 nt), polyadenylated, and unidirectionally transcribed from higher-activity enhancers. eRNAs can regulate transcription in cis and in trans. Studies to identify their specific functions and mechanisms of action are ongoing.

ncRNAs, which were once mostly considered “junk” because they diverged from the “central dogma,” are proving to be an increasingly important part of our genome, an intricate layer of signals that control gene expression in physiology and disease.

Regulation of RNA Metabolism: Stability and Localization

In mammalian cells, RNA lifetimes range from several minutes to days and represent a tightly regulated balance of transcription (detailed earlier) and degradation. The limited lifetime of mRNAs enables a cell to alter protein synthesis in response to its changing needs. The stability of mRNA is regulated by several mechanisms, including sequences within the untranslated regions (UTRs) of mRNA, nonsense-mediated decay (NMD), and RNA modifications. Within the mature mRNA, the coding sequence (CDS) contains the sequence that is translated into protein, from the start codon to the stop codon. UTRs before the start codon (5′ UTR) and after the stop codon (3′ UTR) are not translated but govern mRNA half-life, localization, and translational efficiency. In human mRNAs, the two UTRs together are on average about as long as the CDS (5′ UTR: 236 nt, CDS: 1121 nt, 3′ UTR: 1047 nt). Both proteins and small RNA species can bind to either the 5′ or 3′ UTR to regulate translation or influence survival of the transcript. NMD is a surveillance pathway present in all eukaryotes whose main function is to reduce errors in gene expression by eliminating mRNA transcripts that contain premature termination codons (PTCs). In addition to quality control, NMD can also serve to regulate expression in conjunction with AS. More recently, RNA modifications have emerged as a powerful mechanism of regulation of RNA stability and possibly translation. miRNAs, small RNA molecules that bind to complementary sequences on target mRNA transcripts, provide additional regulation of mRNA stability and translation.
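The balance between transcription and degradation described above is often approximated with a first-order kinetic model. The Python sketch below is illustrative only: the model, the function names, and the example rates and half-lives are assumptions, not values taken from this chapter.

```python
import math

def decay_constant(half_life_min: float) -> float:
    """First-order decay constant (per minute) for a given mRNA half-life."""
    return math.log(2) / half_life_min

def steady_state_level(synthesis_rate: float, half_life_min: float) -> float:
    """Transcript copies at steady state, when synthesis balances decay.

    Assumes dN/dt = synthesis_rate - k * N, so N_ss = synthesis_rate / k.
    """
    return synthesis_rate / decay_constant(half_life_min)

def fraction_remaining(minutes: float, half_life_min: float) -> float:
    """Fraction of transcripts left some time after transcription stops."""
    return math.exp(-decay_constant(half_life_min) * minutes)

# Two hypothetical transcripts made at the same rate (2 copies per minute)
# that differ only in half-life (15 minutes versus 1 day).
unstable = steady_state_level(2.0, half_life_min=15.0)
stable = steady_state_level(2.0, half_life_min=24 * 60.0)
```

Under these assumptions, steady-state abundance scales linearly with half-life, so a cell can change the level of a transcript without changing its transcription rate.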

Regulation of RNA Stability by 5′ and 3′ UTR Sequences and Structures

UTR sequence regulation of mRNA survival is essential for proper hematopoietic differentiation. The best example of this is globin synthesis, where the mRNA is very stable because of its UTR sequences. This long half-life meets the needs of reticulocytes to synthesize globin for up to 2 days after terminally mature erythroblasts lose the nucleus and the ability to make new mRNA.

Some of the elements contained in UTRs form a characteristic secondary structure that alters the survival of the mRNA transcript, exemplified by the prothrombin 3′ UTR. This mRNA is constitutively polyadenylated at seven or more alternative positions, and the 3′ UTR folds into at least two distinct stem-loop conformations. These alternate structures expose a consensus binding site for trans-acting factors with translational regulatory properties, such as heterogeneous nuclear ribonucleoprotein I (hnRNP-I), polypyrimidine tract-binding protein-1 (PTB-1), and nucleolin. Another type of 3′ UTR regulatory sequence involves selenocysteine insertion sequence (SECIS) elements. These are stem-loop RNA structures that serve as protein-binding sites on UTR segments and direct the ribosome to translate the codon UGA as selenocysteine rather than as a stop codon. An example of this regulation can be found in selenoprotein P in plasma. Bacteria contain another class of these mRNA elements, the riboswitches, which directly bind small-molecule effectors, typically metabolites of the pathways their mRNAs encode, and thereby regulate their own activity in response to the concentrations of those molecules. Bacterial riboswitches are relevant to hematopoiesis because the mRNAs for several enzymes in the cobalamin pathway in intestinal bacteria have riboswitches; these bind adenosylcobalamin, which in turn regulates the survival and translation of these mRNAs and fine-tunes cobalamin synthesis. Riboswitches are promising new drug targets, for now against bacteria, exemplifying the opportunities in RNA therapeutics.
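The SECIS-dependent recoding of UGA described above can be illustrated with a toy translation routine. This is a minimal sketch under stated assumptions: the codon table is deliberately restricted to the codons in the example, the `has_secis` flag and the example sequence are hypothetical, and every UGA is treated as recoded when the flag is set, which simplifies the real mechanism.

```python
# Deliberately tiny codon table: only the codons used in the example below.
CODON_TABLE = {
    "AUG": "M",  # methionine (start)
    "GCU": "A",  # alanine
    "GGC": "G",  # glycine
    "UAA": "*",  # stop
    "UAG": "*",  # stop
    "UGA": "*",  # stop unless recoded
}

def translate(mrna: str, has_secis: bool = False) -> str:
    """Translate an mRNA coding sequence codon by codon.

    If has_secis is True (hypothetical flag standing in for a SECIS
    element in the 3' UTR), UGA is read as selenocysteine ("U")
    instead of terminating translation.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon == "UGA" and has_secis:
            protein.append("U")  # one-letter code for selenocysteine
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break  # stop codon: release the peptide
        protein.append(residue)
    return "".join(protein)

example = "AUGGCUUGAGGCUAA"  # AUG GCU UGA GGC UAA (made-up sequence)
without_secis = translate(example)                 # stops at UGA
with_secis = translate(example, has_secis=True)    # reads through UGA
```

The same coding sequence yields a truncated or a full-length product depending only on information outside the CDS, which is the point of the SECIS example.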

Another class of UTR functional sequences affecting the stability of the mRNA is the AU-rich element (ARE). AREs are stretches of mRNA consisting mostly of adenine and uracil nucleotides. These sequences destabilize their transcripts through the action of riboendonucleases that stimulate poly(A) tail removal. Loss of the poly(A) tail is thought to promote mRNA degradation by facilitating attack by both the exosome complex and the decapping complex. Rapid mRNA degradation via AREs is a critical mechanism for preventing the overproduction of potent cytokines such as tumor necrosis factor (TNF) and granulocyte-macrophage colony-stimulating factor (GM-CSF). AREs also regulate the expression of proto-oncogenic transcription factors such as c-Jun and c-Fos. The AU elements in the mRNA of these genes mediate destruction of their transcripts in quiescent cells, preventing the inappropriate cell proliferation that would occur if Fos/Jun remained active.
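Because AREs are defined above by base composition, a simple sliding-window scan can flag candidate AU-rich stretches in a 3′ UTR. The sketch below is illustrative: the window size, the threshold, and the example sequence are arbitrary assumptions, not established ARE criteria.

```python
def au_fraction(seq: str) -> float:
    """Fraction of bases in seq that are A or U (T is treated as U)."""
    seq = seq.upper().replace("T", "U")
    if not seq:
        return 0.0
    return sum(base in "AU" for base in seq) / len(seq)

def au_rich_windows(utr: str, window: int = 13, threshold: float = 0.9) -> list:
    """Start positions (0-based) of windows whose A+U fraction meets threshold.

    The window and threshold defaults are arbitrary illustrative choices.
    """
    utr = utr.upper().replace("T", "U")
    return [
        start
        for start in range(len(utr) - window + 1)
        if au_fraction(utr[start:start + window]) >= threshold
    ]

# Made-up 3' UTR fragment: GC-rich flanks around an AU-rich core.
example_utr = "GCGCGCGCCG" + "AUUUAUUUAUUUAUUUA" + "GCGCGCCG"
hits = au_rich_windows(example_utr)
```

A scan like this only nominates candidates by composition; whether a stretch actually destabilizes its transcript depends on the factors that bind it.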

Besides transcript stability, the efficiency of translation can be regulated by cellular factors that bind mRNA in a sequence-specific manner. Iron metabolism is an excellent example of how cells coordinate uptake and sequestration of an essential metabolite in response to availability. Transferrin is a plasma protein that carries iron. Transferrin receptors (TfRs) are expressed on cells requiring iron for maturation, such as erythroid progenitor cells. They mediate internalization of iron-loaded transferrin into the cytoplasm through receptor-mediated endocytosis. When a cell becomes iron deficient, iron-responsive element–binding proteins (IRE-BPs) can bind to iron-responsive elements (IREs) in the UTR of TfR mRNA (Fig. 4.5). UTR binding stabilizes the TfR mRNA transcript and increases protein expression. However, when a cell has sufficient iron, iron binds to more and more IRE-BPs, which change shape and release the TfR mRNA. The TfR mRNA becomes unstable and is rapidly degraded (see Fig. 4.5). In that situation, TfR expression is low, and the fewer receptors import less iron.

Figure 4.5, CONTROL OF TRANSFERRIN RECEPTOR EXPRESSION BY RNA STABILITY.
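The feedback logic summarized in Fig. 4.5 can be written as a small decision function. This is a qualitative sketch only: the `TfRState` class, the function name, and the "high"/"low" labels are hypothetical, and the binary iron status stands in for what is a graded response in cells.

```python
from dataclasses import dataclass

@dataclass
class TfRState:
    ire_bp_bound: bool         # is an IRE-BP bound to the IRE in the TfR mRNA?
    mrna_stable: bool          # is the TfR transcript protected from decay?
    receptor_expression: str   # qualitative label: "high" or "low"

def tfr_regulation(iron_sufficient: bool) -> TfRState:
    """Qualitative model of TfR mRNA control by cellular iron status."""
    # Iron-loaded IRE-BPs change shape and release the TfR mRNA,
    # so binding occurs only when iron is scarce.
    ire_bp_bound = not iron_sufficient
    # Bound IRE-BPs stabilize the transcript; unbound mRNA is degraded.
    mrna_stable = ire_bp_bound
    expression = "high" if mrna_stable else "low"
    return TfRState(ire_bp_bound, mrna_stable, expression)

iron_deficient_cell = tfr_regulation(iron_sufficient=False)
iron_replete_cell = tfr_regulation(iron_sufficient=True)
```

Written this way, the loop is visibly a negative feedback: low iron raises receptor expression, which increases iron import.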
