Epigenetics can be defined as the inheritance of variation above and beyond changes in the DNA sequence. In other words, epigenetics is the study of how cells that share exactly the same DNA blueprint can look and function as differently as white blood cells, hepatocytes, and neurons. Whereas the genome contains all of the information needed to direct the development of an organism, the epigenome dynamically filters and organizes that information into highly coordinated programs of gene expression.
Within the nucleus, DNA interacts with histone and non-histone proteins to form chromatin, which can be broadly classified as highly compacted and transcriptionally silent (heterochromatin) or loosely compacted and transcriptionally active (euchromatin). Heterochromatin comprises two distinct classes of DNA: (1) noncoding, often repetitive, “structural” DNA of centromeres and telomeres (constitutive heterochromatin), and (2) gene-encoding and gene-regulatory “functional” DNA that is selectively rendered inactive in different cell types (facultative heterochromatin). Because euchromatin is loosely compacted, the information in its DNA is readily accessible to the protein and RNA machinery that regulates gene expression. The study of epigenetics and chromatin therefore aims to describe and understand the chromatin dynamics that orchestrate the four-dimensional symphony of molecular and cellular biology from the (seemingly) one-dimensional score that is the genome.
The information contained within chromatin can be broadly divided into two main categories: (1) the structural genes themselves, which are transcribed and either translated into proteins or act as functional RNAs, and (2) gene-regulatory regions, which control the timing and amount of transcription (Fig. 2.1A). The information in transcribed and translated regions can be interpreted using the “genetic code,” whereby the DNA sequence of a gene specifies, through a messenger RNA intermediate, the amino acid sequence of the resulting protein. There is no comparable universal code for deciphering the function of RNAs that are not translated into proteins, but some, such as ribosomal RNA and transfer RNA, have well-understood functions. Several other classes of non-protein-coding RNA genes with known functions also exist, including small nuclear RNA (snRNA), involved in RNA splicing; Piwi-interacting RNA (piRNA), involved in silencing transposable elements; small nucleolar RNA (snoRNA), involved in directing the chemical modification of other RNAs; and microRNA (miRNA), involved in translational silencing. A growing number of long noncoding RNAs (lncRNAs) have been identified, with a variety of proposed functions. Interestingly, lncRNA genes appear to be regulated in much the same way as protein-coding genes. Protein-coding regions make up only about 1% to 2% of the genome. In contrast, the information contained in gene-regulatory regions constitutes the “epigenetic code,” which has yet to be fully deciphered; it depends on the accessibility of those regions to dynamic protein-DNA interactions, the identity of the interacting proteins, and the identity of the gene(s) whose expression is being modulated.
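As a minimal illustration of how the genetic code is read, the Python sketch below scans an mRNA for its first AUG start codon and translates it codon by codon until an in-frame stop codon. The codon table is deliberately partial, and the names `CODON_TABLE` and `translate` are hypothetical; real translation initiation also depends on the sequence context around the start codon, which this sketch ignores.

```python
# Partial standard genetic code (RNA codons -> one-letter amino acid codes).
# Only a handful of codons are included for illustration; "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M",              # methionine, also the usual start codon
    "UUU": "F", "UUC": "F",  # phenylalanine
    "GGA": "G", "GGC": "G",  # glycine
    "AAA": "K", "AAG": "K",  # lysine
    "UGG": "W",              # tryptophan
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def translate(mrna: str) -> str:
    """Translate an mRNA from its first AUG to the first in-frame stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    protein = []
    # Step through complete codons only (any trailing 1-2 bases are ignored).
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        aa = CODON_TABLE.get(codon)
        if aa is None:
            raise KeyError(f"codon {codon} is not in this partial table")
        if aa == "*":
            break  # a stop codon ends translation
        protein.append(aa)
    return "".join(protein)


print(translate("GCAUGUUUGGAAAGUGGUAAGC"))  # MFGKW
```

Extending `CODON_TABLE` to all 64 codons would make the function general; the reading-frame loop itself would not need to change.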
The most dramatic example of chromatin compaction is the condensation that occurs during mitosis, which makes individual chromosomes visible by light microscopy and allows the replicated chromosomes to be segregated equally between daughter cells. A condensed chromosome is folded many times upon itself and is heavily protein-bound, affording little or no access to genomic information and remaining transcriptionally silent (see Fig. 2.1B). Contrast this with the “decondensed” chromatin state required for DNA replication during the synthesis phase of the cell cycle. DNA replication requires unfolding of chromatin, disruption of its protein-DNA interactions, and “unzipping” of the double helix so that every base in the genome can be copied. When not dividing, cells maintain their chromatin in intermediate states of compaction. Actively transcribed genes and their associated regulatory regions are “open” and “accessible,” in that the underlying protein-DNA interactions are readily modified and disrupted to accommodate binding of transcription factors, cofactors, RNA polymerases, and the rest of the functional machinery of gene expression.
It is important to remember some key differences between genomic and epigenomic research. Whereas the genome is essentially invariant across the cells of an organism (with the important exception of T and B cells, which rearrange and mutate their antigen receptor genes), the epigenome of each cell within that organism is unique. Moreover, epigenomes remain fluid throughout a cell’s life span, integrating intrinsic cellular “identity” with contextual signals to specify a program of gene expression. Finally, the mechanics of DNA replication and cell division necessarily disrupt the protein-DNA interactions that make up the epigenome. How cells re-establish their epigenetic identity after cell division is not well understood.
Regulatory, noncoding DNA regions can serve a variety of functions, illustrated in Fig. 2.1A, and are variously classified as promoters, enhancers/silencers, super-enhancers, and insulators. Promoters are typically located within 1 to 2 kb of a gene’s transcriptional start site (TSS). At a minimum, RNA polymerase II-dependent promoters contain binding sites for the general transcription factors TBP and TFIIB, which form the core of the transcriptional complex. Within the promoter, transcription factor binding sites (TFBSs) modulate gene expression by recruiting histone-modifying enzymes and transcriptional coactivators or corepressors.
An enhancer/silencer is a short (50 to 1500 bp) region of DNA that can be bound by transcription factors to increase/decrease the likelihood that a particular gene will be transcribed. Enhancers/silencers usually act in cis (on the same chromosome) and, rarely, in trans (on a different chromosome); they can be located up to 1 Mb away from the gene and can lie upstream or downstream of the TSS. Promoters physically interact with their associated enhancers or silencers through three-dimensional chromatin “looping,” facilitated by the Mediator and cohesin protein complexes (see Fig. 2.1D). A gene may be regulated by several enhancers/silencers, and each enhancer/silencer may modulate expression of one or more genes. A super-enhancer is a cluster of physically and functionally associated enhancers that regulates genes critical for cell identity. Super-enhancers are marked by high levels of enhancer-associated histone modifications and bind high levels of cell type-specific, lineage-defining transcription factors (known as “master” transcription factors).
By blocking physical interactions between enhancers and promoters, insulators help to restrict the set of genes that an enhancer can modulate. Insulators are bound by cohesin and CTCF proteins and form boundaries between silenced and active genes. Clusters of insulators separate heterochromatin from euchromatin, and the segments of active chromatin bounded by these clusters are known as topological domains: genomic regions within which regulation occurs.
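The way insulators restrict enhancer-gene pairings can be sketched as a lookup along a single chromosome: sorted insulator coordinates split it into domains, and an enhancer is allowed to target a gene only if the gene's TSS lies in the same domain and within the 1 Mb range mentioned above. This is a simplified model under stated assumptions: boundaries are treated as absolute, trans interactions are ignored, and the function names, gene names, and coordinates are hypothetical toy values.

```python
from bisect import bisect_right

MAX_DISTANCE = 1_000_000  # enhancers can act up to ~1 Mb away (upper bound from the text)


def domain_index(position: int, insulators: list[int]) -> int:
    """Return the index of the insulator-bounded domain containing `position`.

    `insulators` must be sorted; domain k lies between insulators[k-1] and insulators[k].
    """
    return bisect_right(insulators, position)


def candidate_targets(enhancer: int, genes: dict[str, int], insulators: list[int]) -> list[str]:
    """List genes (keyed by TSS coordinate) an enhancer could reach without crossing an insulator."""
    home = domain_index(enhancer, insulators)
    return sorted(
        name
        for name, tss in genes.items()
        if domain_index(tss, insulators) == home and abs(tss - enhancer) <= MAX_DISTANCE
    )


# Hypothetical toy chromosome: two insulators split it into three domains.
insulators = [200_000, 900_000]
genes = {"GENE_A": 150_000, "GENE_B": 300_000, "GENE_C": 850_000, "GENE_D": 1_000_000}
print(candidate_targets(400_000, genes, insulators))  # ['GENE_B', 'GENE_C']
```

Treating each boundary as a hard wall keeps the logic to a binary search per coordinate; modeling leaky or partial insulation would require a weighting scheme, which would be a further assumption.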
In the mammalian genome, cytosine methylation by DNA methyltransferases (DNMTs) occurs at 60% to 90% of CpG dinucleotides. Methylated DNA is bound by methyl-CpG-binding domain proteins (MBDs), which recruit histone-modifying enzymes and chromatin-remodeling proteins, resulting in highly condensed heterochromatin. Methylation of promoter regions thereby represses transcription. Patterns of DNA methylation are copied during DNA synthesis and cell division and can be used to distinguish cell types and stages of differentiation.
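To make the CpG percentage concrete, the Python sketch below locates every CpG dinucleotide in a sequence and computes the fraction whose cytosine appears in a set of methylated positions. The methylated set is a hypothetical input (in practice it would come from a methylation assay), only one DNA strand is considered, and the function names are illustrative.

```python
def cpg_sites(seq: str) -> list[int]:
    """Return 0-based positions of the C in every CpG dinucleotide (one strand only)."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "CG"]


def fraction_methylated(seq: str, methylated_c: set[int]) -> float:
    """Fraction of CpG sites whose cytosine position is in `methylated_c`."""
    sites = cpg_sites(seq)
    if not sites:
        return 0.0  # no CpG sites, so there is nothing to measure
    return sum(pos in methylated_c for pos in sites) / len(sites)


seq = "ACGTTCGACGGCGA"
print(cpg_sites(seq))                        # [1, 5, 8, 11]
print(fraction_methylated(seq, {1, 5, 11}))  # 0.75
```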
The genome-wide pattern of DNA methylation, known as the methylome, has been characterized for a wide variety of tissues. Approximately 75% of the methylome is consistent across all cell types; the remaining 25% is differentially hypo- or hypermethylated in a cell type-specific manner. Cell type-specific hypomethylated regions are enriched for TFBSs and for nucleosomes carrying modifications associated with active regions, whereas cell type-specific hypermethylation is associated with the silencing of transcription factors during differentiation. Aberrant DNA methylation is an extremely common feature of cancer, in which hypermethylation of tumor-suppressor genes and hypomethylation of oncogenes may play important roles in oncogenesis and tumor progression.
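The split between a shared methylome and cell type-specific regions can be illustrated with a toy classifier. The Python sketch below assumes simplified binary (methylated or unmethylated) calls per region and cell type; the function name, region names, cell types, and calls are invented for illustration and are not real methylome data.

```python
def classify_regions(calls: dict[str, dict[str, bool]]) -> dict[str, str]:
    """Label each region as shared or cell type-specific.

    `calls` maps region -> {cell type: True if methylated}. A region is "shared"
    when every cell type has the same state; otherwise it is reported as
    hypomethylated in the cell types where it is unmethylated.
    """
    labels = {}
    for region, by_cell in calls.items():
        if len(set(by_cell.values())) == 1:
            labels[region] = "shared"
        else:
            hypo = sorted(cell for cell, methylated in by_cell.items() if not methylated)
            labels[region] = "hypomethylated in " + ", ".join(hypo)
    return labels


# Toy, invented calls for three cell types.
calls = {
    "region1": {"T cell": True, "hepatocyte": True, "neuron": True},
    "region2": {"T cell": False, "hepatocyte": True, "neuron": True},
    "region3": {"T cell": False, "hepatocyte": False, "neuron": False},
    "region4": {"T cell": True, "hepatocyte": True, "neuron": False},
}
labels = classify_regions(calls)
shared = sum(label == "shared" for label in labels.values()) / len(labels)
print(labels["region2"])  # hypomethylated in T cell
print(shared)             # 0.5
```

A region unmethylated in one cell type is equivalently hypermethylated in the others; the sketch simply reports it from the hypomethylation side. With continuous methylation levels rather than binary calls, the "same state" test would need a tolerance threshold, which would be an additional assumption.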
Histones H2A, H2B, H3, and H4 are known as the core histones, while histones H1 and H5 are known as the linker histones. The core histones pair as H2A-H2B and H3-H4 dimers, and four such dimers (two of each type) come together to form one octameric nucleosome core. The nucleosome is the smallest unit of chromatin structure, consisting of 147 base pairs of DNA double helix wrapped around the core histone octamer (see Fig. 2.1C). Linker histones, primarily H1, bind the nucleosome at the entry and exit sites of the DNA and allow the formation of higher-order structure. Histone N-terminal domains are rich in lysine and arginine residues that are subject to a variety of post-translational modifications (see below).
In addition to these major histones, dozens of minor histone variants have been identified, and they are highly conserved through evolution. Some minor variants have very specific roles in chromatin regulation. For example, the histone H3-like variant CENPA is associated with centromeres, H2A.Z with the promoters and enhancers of actively transcribed genes, and H3.3 with the bodies of actively transcribed genes. Phosphorylated H2A.X is found in regions around double-stranded DNA breaks, where it recruits DNA-repair machinery.