Principles of Epigenetics

Introduction to Epigenetics

The publication of the majority of the human genome sequence in 2001 was the precursor to many important discoveries. However, the human genome sequence has not provided researchers with the codex to fully understand the genome’s functionality or to predict its response to environmental cues (such as nutritional challenges). One reason is that the human genome is more complicated than was originally postulated. Counterintuitively, this complexity partially arises from the finding that the human genome has only approximately one third of the predicted number of genes. Fewer genes means that the genes that are present are more complex, each producing multiple different messenger RNAs. As a result, the regulatory processes that control the expression of these genes are complex. They involve multiple layers of regulation, and much of this regulation remains to be discovered and described.

Traditionally, it had been assumed that inherited genes control gene expression and, ultimately, phenotype. In the early 1940s Waddington introduced the concept of epigenetics (“on top of” or “in addition to” genes) to describe the way in which genes interact with their surroundings to produce a phenotype during the differentiation of cells over the course of development (without a change in gene sequence). Thus, environmental cues can lead to up- or down-regulation of gene activity. This definition leaves out the concept of inheritance. It emphasizes instead the effect on the final cell type and how small nongenetic changes in development can lead to measurable differences in adult phenotype. Epigenetics has since been redefined. Riggs described it as “the study of mitotically and/or meiotically heritable changes in gene function that cannot be explained by changes in DNA sequence.” More recently, Cavalli and Heard defined it as “the study of molecules and mechanisms that can perpetuate alternative gene activity states in the context of the same DNA sequence.” Under these definitions, epigenetics encompasses any mechanism that produces permanent (or at least semi-permanent) changes in gene expression or cellular phenotype. This includes transgenerational inheritance and the persistence of gene activity or chromatin states through extended periods of time. Throughout this chapter we will discuss epigenetics in the context of this more modern definition. However, the term has many different definitions, and “mitotic stability” and “epigenetic memory” remain points of controversy.

From Genetics to Epigenetics

Double-stranded DNA is an efficient and reliable mechanism for passing information from one generation to the next. It is stable, and a number of repair systems have evolved to maintain it. Thus, genetic changes tend to occur slowly, taking many generations for a single mutation to become dominant in a population. By contrast, epigenetic changes can occur over a much shorter timeframe, providing a mechanism for rapid responses to environmental change. Consistent with this, studies have shown that de novo epigenetic “mutation” is one to two orders of magnitude more frequent than de novo somatic DNA mutation. This difference in “mutation rates” reflects the lower fidelity with which epigenetic features, compared with genetic features, are maintained through the cell cycle. For example, the genetic code is copied (replicated) with an error rate of less than 1 base in 10⁷ to 10⁸ bases copied. By contrast, epigenetic mechanisms such as methylation have an error rate that has been estimated to be between 1% and 4%.
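The per-site figures above can be compared directly with a little arithmetic. The sketch below is illustrative only. It assumes that copying errors occur independently at each site and in each division, and that a lost mark is never restored, which is unlike real maintenance machinery. The function names are hypothetical.

```python
# Illustrative comparison of the per-site copying error rates quoted above.
# Assumptions (not from the source): errors are independent per site and per
# division, and a lost methylation mark is never restored.

GENETIC_ERROR_RATES = (1e-8, 1e-7)     # errors per base per replication
EPIGENETIC_ERROR_RATES = (0.01, 0.04)  # errors per methylated site per division


def fold_difference(epi_rate: float, genetic_rate: float) -> float:
    """How many times more error-prone epigenetic copying is, per site."""
    return epi_rate / genetic_rate


def mark_survival(error_rate: float, divisions: int) -> float:
    """Probability that one mark is copied correctly through n divisions."""
    return (1.0 - error_rate) ** divisions


if __name__ == "__main__":
    low = fold_difference(EPIGENETIC_ERROR_RATES[0], GENETIC_ERROR_RATES[1])
    high = fold_difference(EPIGENETIC_ERROR_RATES[1], GENETIC_ERROR_RATES[0])
    print(f"Per site, epigenetic copying is {low:.0e} to {high:.0e} times more error-prone")
    for rate in EPIGENETIC_ERROR_RATES:
        p = mark_survival(rate, 50)
        print(f"{rate:.0%} error rate: P(mark intact after 50 divisions) = {p:.2f}")
```

Under these assumptions, a mark copied with a 1% error rate survives 50 divisions with a probability of about 0.60. At a 4% error rate, the probability falls to about 0.13. This illustrates why epigenetic marks need active maintenance. The per-site ratios (10⁵ to 4 × 10⁶) are not the same quantity as the de novo epimutation frequencies compared in the text, so the two should not be expected to agree.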

Developmental plasticity is a genotype’s or individual’s ability to respond to changes in environmental conditions through changes in its phenotypes. All developmental plasticity is, by definition, epigenetic in origin, as the genotype of the responding individual remains unaltered in the process. The plasticity of the epigenome is important for its contribution to the dynamic coordination of the genome’s responses to environmental signals. However, changing to suit the present environment can result in a suboptimal phenotype for tomorrow’s environment (the mismatch hypothesis). In developmental terms, the epigenome can change to enhance fitness in response to an environmental cue (e.g., reduced placental nutrient supply) during a small window in early development. Subsequent changes to the environmental conditions (e.g., overabundance of high-energy food) mean that the epigenetic changes, which have been stably maintained through the remainder of development, may become detrimental over the course of the individual’s lifespan by increasing the risk for metabolic and cardiovascular diseases.

How Genes Learn from Experience

Twin studies exemplify the epigenetic changes that occur during a lifetime of interactions between the environment and the genome (reviewed in Bell and Spector). In simple terms, genetically identical monozygotic twins are epigenetically indistinguishable when they are born. However, as they age, the twins begin to display differences in overall phenotype due to their cumulative individual exposure to environmental signals. As previously mentioned, each individual modifies their phenotype through epigenetic changes to better suit the environment they have experienced. Collectively, these changes alter each twin’s risk of obesity and of a number of non-communicable diseases, such as type 2 diabetes mellitus (Box 2.1).

Box 2.1
Nature Versus Nurture, Genes Versus the Environment

There has been a longstanding debate as to whether health is determined by “nature” or “nurture.” It is clear that phenotypic traits exist on a continuum, where some are predominantly controlled by genetics (e.g., height) and others by environmental factors (e.g., obesity). However, the influence of genes and environment in the development of phenotypic traits is not mutually exclusive, but rather is a result of their constant interaction. For example, twin studies suggest that genetic factors have a substantial effect on variations in body weight, particularly in children and adolescents. Nonetheless, the fact that obesity is rapidly increasing worldwide shows that environment also plays a significant role in the likelihood of becoming obese. Thus in most cases the resulting manifestation of non-communicable disease is a combination of nature and nurture. Importantly, this interaction between the environment and genetic inheritance is mediated through epigenetics.

The Structure of the (Epi)Genome

Eukaryotes use multiple systems to initiate and regulate changes in gene expression. In total, genes are regulated by hundreds of functional DNA elements, which control when and how much protein is produced. This regulatory control occurs through mechanisms that use epigenetic signals to affect nuclear processes (e.g., transcription and mRNA processing) and cytoplasmic processes (e.g., translation) (Fig. 2.1). These mechanisms include DNA methylation (with or without ubiquitination), histone modifications (i.e., acetylation, phosphorylation, sumoylation, methylation), chromatin folding, non-coding RNA (ncRNA and miRNA), and prions. Epigenetic effects on transcription are well documented and will therefore be the main focus of the remainder of this chapter. A summary of how these various epigenetic processes are analyzed is shown in Table 2.1.

Fig. 2.1, Epigenetic machinery. The following analogy can be used to illustrate this point. Security guards can use keys to lock and unlock doors according to instructions they receive from another source. (1 and 3) At the beginning and end of each working day, the guards go through their routine of unlocking or locking doors. (2) By locking and unlocking doors in a factory, the guards do not change the structure of the factory. Rather, this system is akin to epigenetic modifications that limit the workers’ (i.e., the transcription factors, DNA-binding proteins, and RNA polymerases) access to the equipment and information within the factory. If there is an error in the unlocking routine, for example, part of the factory would remain off-limits to the workers for one 12-hour cycle. Thus, if the factory is a car assembly line and the section where the wheels are stored (i.e., the gene) remains locked, then no workers are able to access this area, and the final product (i.e., the phenotype) is cars without wheels. (3) However, when the correct set of keys has opened the correct factory doors, the cars and wheels will both be accessible, and the cars will be made.

Table 2.1
Measuring Epigenetic Profiles.
| Epigenetic Process | Function | Single-Locus Platform | Cost | Time | Global (Whole-Genome) Platform | Cost | Time |
|---|---|---|---|---|---|---|---|
| DNA methylation | Represses gene activity | Bisulfite conversion followed by various targeted sequencing options | Low | Low | Bisulfite conversion followed by various whole-genome sequencing options | High | High |
| DNMT1a, DNMT1b | Methylation maintenance (across cell divisions) | | | | | | |
| DNMT3a, DNMT3b | De novo DNA methylation | | | | | | |
| Histone modifications | | Chromatin immunoprecipitation followed by qPCR | Low | Low | Chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) or hybridization to a microarray (ChIP-chip) | High | High |
| Post-transcriptional regulators (miRNA, ncRNA) | Repress gene activity | qRT-PCR, targeted sequencing, or microarray | Low | Low | Microarray or next-generation sequencing (RNA-seq) | High | High |
| Chromatin structure and function (3D genome) | 10,000× compaction, DNA activity regulation | Chromosome conformation capture (3C, 4C, or GCC) and FISH | Medium | Medium | Global chromosome conformation capture (5C or Hi-C), FISH, and ChIA-PET | Very high | High |
Various techniques are used to characterize epigenetic modifications. Antibody-based precipitation is central to many of the techniques used to study epigenetic modifications at the local and global scale (e.g., chromatin immunoprecipitation, ChIP-chip, ChIP-seq, ChIA-PET, and MeDIP). It is used to isolate pieces of DNA that are methylated or unmethylated, or that are associated with modified histones. Cytosine methylation at CpG sites is also studied using bisulfite conversion. This treatment changes unmethylated cytosine residues to uracil, while 5-methylcytosine (5me-C) is protected and remains unchanged. Finally, several methods are used to determine chromatin organization and the effects of epigenetic modifications on it. Chromatin organization here means which DNA sequences are near, or in contact with, each other within the nucleus. These methods include FISH, differential centrifugation, and chromosome conformation capture-based technologies (e.g., 3C, 4C, GCC, or 5C).
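The logic of bisulfite-based methylation calling can be sketched in a few lines. This is a simplified illustration. It assumes complete conversion, no sequencing errors, and a read from the converted strand in which uracil has been read as thymine (T) after PCR. The sequence and function names are hypothetical.

```python
# Simplified in-silico bisulfite conversion and methylation calling.
# Assumptions: complete conversion, no sequencing errors, single strand only;
# uracil produced by conversion is read as T after PCR amplification.


def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """Unmethylated C becomes T; C at positions in `methylated` (5mC) stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def cpg_positions(seq: str) -> list[int]:
    """0-based positions of the C in each CpG dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def call_methylation(reference: str, read: str) -> dict[int, bool]:
    """At each reference CpG: C in the read = methylated, T = unmethylated."""
    calls = {}
    for i in cpg_positions(reference):
        if read[i] == "C":
            calls[i] = True
        elif read[i] == "T":
            calls[i] = False
    return calls


if __name__ == "__main__":
    reference = "ACGTTCGACCGA"  # hypothetical; CpGs start at positions 1, 5, 9
    read = bisulfite_convert(reference, methylated={1, 9})
    print(read)                               # ACGTTTGATCGA
    print(call_methylation(reference, read))  # {1: True, 5: False, 9: True}
```

The non-CpG cytosine at position 8 is also converted. This is why the CpG context must be taken from the reference sequence rather than from the read.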

It should be noted that epigenetic effects such as DNA methylation do not turn a gene on or off permanently. Rather, most epigenetic mechanisms lead to semi-permanent changes. As such, epigenetic modifications need to be continually maintained. This requires recruiting the enzymes and proteins needed to accurately replenish the epigenetic marks. Even then, epigenetic modifications only contribute to the maintenance of the appropriate transcriptional state. Other factors (e.g., DNA-binding proteins and RNA polymerases) are ultimately responsible for reading and transcribing the gene.

DNA Methylation

DNA methylation is a fundamental and evolutionarily conserved epigenetic modification involved in gene regulation and other biologic processes (e.g., see He and colleagues). In mammals, DNA methylation occurs predominantly at sites where a cytosine nucleotide is followed by a guanine nucleotide (CpG; Fig. 2.2). In most mammalian species, 90% to 98% of CpG sites are methylated, and the methylation status and density of CpG sites are associated with gene regulation. Therefore, measuring the methylation status of particular genes within a cell type can tell researchers which RNA species are likely to be transcribed. There are, however, exceptions to this rule (as discussed later).
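Genome-wide percentages such as these are typically summarized from per-site measurements. The sketch below uses hypothetical per-CpG read counts (for example, from bisulfite sequencing). It applies a minimum coverage filter and a 50% cutoff for calling a site methylated. These thresholds are illustrative choices, not values from the text.

```python
# Summarizing per-CpG methylation calls into a global percentage.
# Assumptions: counts are (methylated reads, total reads) per CpG site;
# min_coverage and threshold are illustrative, not standard values.


def methylation_level(methylated_reads: int, total_reads: int) -> float:
    """Fraction of reads at one CpG that show a methylated cytosine."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return methylated_reads / total_reads


def fraction_sites_methylated(counts, threshold=0.5, min_coverage=5):
    """Share of adequately covered CpG sites whose level meets the threshold."""
    levels = [methylation_level(m, t) for m, t in counts if t >= min_coverage]
    if not levels:
        raise ValueError("no sites pass the coverage filter")
    return sum(level >= threshold for level in levels) / len(levels)


if __name__ == "__main__":
    counts = [(18, 20), (9, 10), (1, 12), (30, 30), (2, 3)]  # hypothetical
    print(fraction_sites_methylated(counts))  # 0.75 (the 3-read site is skipped)
```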

Fig. 2.2, Fundamentals of epigenetics.

Gene activation is typically associated with tracts of largely unmethylated CpG, known as CpG islands. The majority (60%) of these CpG islands occur in or near gene promoters. Methylation (a mark of down-regulation) within or up to ∼2 kb from these CpG islands contributes to the control of gene expression. DNA methylation status is mostly controlled by the DNA methyltransferase (DNMT) family of enzymes. Briefly, DNMT1 controls maintenance of methylation (transmission from mother to daughter cells). DNMT3a and DNMT3b are responsible for de novo methylation (establishment of methylation without a template, or changes in methylation state). DNMT3L is largely involved in the methylation of maternally imprinted genes (see later) during oogenesis.
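CpG islands are usually identified computationally. Sequence is scanned for regions that are both GC-rich and enriched in CpG relative to chance. The sketch below assumes commonly cited thresholds: windows of at least 200 bp, GC content above 50%, and an observed/expected CpG ratio above 0.6. Published pipelines differ in their exact criteria, and the function names are hypothetical.

```python
# Sliding-window CpG island scan (simplified sketch).
# Assumed thresholds: window >= 200 bp, GC fraction > 0.5, observed/expected
# CpG > 0.6, where observed/expected = CpG count * length / (C count * G count).


def cpg_stats(window: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for one window."""
    w = window.upper()
    n = len(w)
    c, g = w.count("C"), w.count("G")
    cpg = w.count("CG")  # "CG" cannot overlap itself, so this counts every CpG
    gc_fraction = (c + g) / n
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def find_cpg_islands(seq: str, window: int = 200, step: int = 1,
                     min_gc: float = 0.5,
                     min_obs_exp: float = 0.6) -> list[tuple[int, int]]:
    """Merge passing windows into half-open (start, end) intervals."""
    seq = seq.upper()
    islands: list[tuple[int, int]] = []
    for start in range(0, len(seq) - window + 1, step):
        gc, obs_exp = cpg_stats(seq[start:start + window])
        if gc > min_gc and obs_exp > min_obs_exp:
            end = start + window
            if islands and start <= islands[-1][1]:
                islands[-1] = (islands[-1][0], end)  # overlaps previous: extend
            else:
                islands.append((start, end))
    return islands


if __name__ == "__main__":
    # Hypothetical test sequence: AT-rich flanks around a 300 bp CG-repeat core.
    seq = "AT" * 150 + "CG" * 150 + "AT" * 150
    print(find_cpg_islands(seq))  # [(201, 699)]
```

The reported interval extends beyond the 300 bp core (positions 300 to 600). This happens because windows that only partly overlap the core still pass both thresholds. Practical tools refine island boundaries further.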

Placental growth and development are regulated in part by epigenetic mechanisms such as DNA methylation. During gestation, embryonic development is associated with the establishment of distinct DNA methylation differences between the trophectoderm and the inner cell mass. The trophectoderm (ultimately the placenta) becomes significantly less methylated than the inner cell mass (ultimately the embryo and somatic tissues). Overall, full-term placental lysates have 14% to 25% less global DNA methylation than somatic tissues.

For a more exhaustive review on epigenetic marks in development, see the review by Ficz and colleagues.

Histone Modifications

The most basic unit of chromatin structure is the nucleosome, which consists of approximately 147 base pairs of DNA wrapped 1.67 times around a barrel-shaped histone octamer containing two copies of the core histones H2A, H2B, H3, and H4 (see Fig. 2.2 ). Nucleosomes are separated by exposed linker DNA that is typically 20 to 50 base pairs in length. Only about 75% to 90% of DNA in eukaryotes is bound within a nucleosome at any time in the cell cycle.
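A quick calculation shows that the figures in this paragraph are internally consistent. The sketch below assumes a perfectly regular nucleosome array, in which every 147 bp core is followed by a linker of fixed length. In reality, linker lengths vary along the genome.

```python
# Nucleosome arithmetic for a regular array (simplifying assumption).

CORE_DNA_BP = 147  # DNA wrapped around one histone octamer, from the text


def repeat_length(linker_bp: int) -> int:
    """Nucleosome repeat length: core DNA plus one linker."""
    return CORE_DNA_BP + linker_bp


def fraction_in_core(linker_bp: int) -> float:
    """Fraction of DNA wrapped in nucleosome cores."""
    return CORE_DNA_BP / repeat_length(linker_bp)


def nucleosomes_per(length_bp: int, linker_bp: int) -> int:
    """Whole nucleosome repeats that fit in a stretch of DNA."""
    return length_bp // repeat_length(linker_bp)


if __name__ == "__main__":
    for linker in (20, 50):
        print(linker, repeat_length(linker), f"{fraction_in_core(linker):.1%}")
    # 20 167 88.0%
    # 50 197 74.6%
```

With linkers of 20 to 50 bp, roughly 75% to 88% of the DNA lies within nucleosome cores. This agrees with the 75% to 90% range given in the text.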

Nucleosomes are the targets of a wide range of post-translational modifications (e.g., acetylation, phosphorylation, sumoylation, and methylation) that combine to form an epigenetic (histone) code. Each core histone has an N-terminal amino acid “tail” (particularly long in H3 and H4). Post-translational modifications on these tails can affect gene expression. Enzymes deposit (“write”) or remove (“erase”) these phosphorylation, acetylation, and methylation marks. Proteins that bind to modified histones (“readers”) are part of larger, multisubunit protein complexes that exert downstream functions. These complexes often recognize combinations of different histone marks simultaneously.

Post-translational modifications of the histone tails, or of the central histone structure itself, can act in two ways. (1) They can directly affect the compaction and assembly of chromatin by regulating the interaction between the DNA and each histone within the nucleosome, or between nucleosomes themselves. (2) They can serve as binding sites that recruit other proteins, which in turn contribute to the regulation of transcription and other nuclear functions. These modifications of histone residues (“marks”) have been correlated with various important genetic elements in the regulation of gene expression. Promoters of transcriptionally active genes are enriched for trimethylation of histone H3 lysine 4 (H3K4me3) and for lysine acetylation (H3K9ac, H3K27ac). An enhancer is a genomic switch that, when activated (“turned on”), increases the likelihood of transcription of a particular gene. These enhancer elements are defined by having both H3K27ac and H3K4me1. Repressed genes have a higher density of nucleosomes (i.e., heterochromatin) and are usually marked by H3K9me, H3K27me3, and H4K20me3.
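The mark combinations described above can be expressed as a small rule set. This is roughly how a “reader” or an annotation pipeline might interpret them. This is a deliberately simplified sketch. The rules use only the marks named in this section and treat each mark as simply present or absent. The labels and function name are hypothetical. Real chromatin-state annotation uses many more marks and quantitative signals.

```python
# Rule-based labelling of a genomic region from its histone marks.
# Simplification: marks are present/absent; only marks named in the text.

ACTIVE_MARKS = {"H3K4me3", "H3K9ac", "H3K27ac"}
ENHANCER_MARKS = {"H3K27ac", "H3K4me1"}
REPRESSIVE_MARKS = {"H3K9me", "H3K27me3", "H4K20me3"}


def annotate_region(marks: set[str]) -> str:
    """Return a coarse chromatin-state label for a set of histone marks."""
    if marks & REPRESSIVE_MARKS and not marks & ACTIVE_MARKS:
        return "repressed"
    if "H3K4me3" in marks and marks & {"H3K9ac", "H3K27ac"}:
        return "active promoter"
    if ENHANCER_MARKS <= marks:
        return "active enhancer"
    return "unclassified"  # e.g., mixed or incomplete combinations


if __name__ == "__main__":
    examples = [
        {"H3K4me3", "H3K27ac"},
        {"H3K27ac", "H3K4me1"},
        {"H3K27me3", "H4K20me3"},
        {"H3K4me1"},
    ]
    for marks in examples:
        print(sorted(marks), "->", annotate_region(marks))
```

The order of the rules is a design choice. A region carrying both repressive and active marks falls through to “unclassified” rather than being forced into either category.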

Polycomb complexes remodel chromatin to maintain developmentally or environmentally programmed expression states. DNA-binding proteins (or noncoding RNAs) recruit polycomb-group proteins to specific regions of the genome for epigenetic silencing of genes. In Polycomb Repressive Complex 2, the histone methyltransferase EZH2 catalyzes di- and trimethylation of lysine 27 on histone H3 (H3K27me2/3). Polycomb complexes modulate chromatin structure during embryonic development. The processes they affect include X-chromosome inactivation and Hox gene silencing.

According to the modern definition of epigenetics, epigenetic changes must be mitotically stable. This leads to considerable controversy over which underlying changes must be present. It also raises the question of how expression levels of consistently activated genes are maintained after the original activation signal has passed. Some histone modifications (e.g., H3K36 methylation) have not been shown to be mitotically stable across several cell generations. By contrast, methylation marks on H3K4, H3K9, and H3K27 have been shown to be mitotically transmissible. Also, some epigenetic changes are only transient, such as the phosphorylation of a histone H2A variant (H2AX) at DNA double-strand breaks. In many respects this would qualify as an epigenetic mark, but it disappears once the break is repaired. Such marks can never be classified as stably inherited and so do not meet the modern definition of epigenetic. Therefore, although they are generally called epigenetic, not all methylation and histone modifications are epigenetic under the modern definition.

Chromatin Folding and 3D Structure

A non-linear view of DNA is important for understanding how enhancer-promoter interactions are established and maintained in space and time. Nucleosomes are the most basic structural scaffold for DNA. When packaged with other proteins and RNA components, they form compacted chromatin structures. The level of chromatin compaction is not fixed but varies as the cell moves through the cell cycle. Ultimately, this produces the structures we recognize as chromosomes, in which the DNA has been compacted up to 10,000-fold (see Fig. 2.2).
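To give a sense of scale for a 10,000-fold compaction, the sketch below converts a chromosome's length in base pairs into physical length. It assumes B-form DNA with a rise of about 0.34 nm per base pair and uses a hypothetical 100 Mb chromosome. It also assumes uniform compaction, which ignores the variation along real chromosomes.

```python
# Physical length of DNA before and after compaction (illustrative).

RISE_NM_PER_BP = 0.34  # approximate length of one base pair in B-form DNA


def contour_length_um(length_bp: int) -> float:
    """Fully extended length of a DNA molecule, in micrometres."""
    return length_bp * RISE_NM_PER_BP / 1000.0


def compacted_length_um(length_bp: int, fold: float) -> float:
    """Length after uniform compaction by the given fold factor."""
    return contour_length_um(length_bp) / fold


if __name__ == "__main__":
    bp = 100_000_000  # hypothetical 100 Mb chromosome
    print(f"extended: {contour_length_um(bp) / 1000:.0f} mm")                  # 34 mm
    print(f"10,000-fold compacted: {compacted_length_um(bp, 10_000):.1f} um")  # 3.4 um
```

A few centimetres of DNA thus fold into a structure only a few micrometres long.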

The dynamic process of changing the compaction level of the DNA within a nucleus is an important component of gene regulation. At a gross level, chromatin compaction is thought to contribute to the two dominant types of chromatin within eukaryotic cells: (1) heterochromatin, the tightly compacted form of chromatin that is largely transcriptionally silent; and (2) euchromatin, the less condensed, more transcriptionally active form of chromatin. However, closer inspection using new molecular techniques reveals that DNA packaging also creates local chromatin structures. These structures contribute to the establishment of cell-type identity and lineage specificity. Briefly, each chromosome folds into a structure that promotes physical connections between regulatory elements that would otherwise be separated by long distances in the DNA sequence.

The organized three-dimensional (3D) chromatin structure within the nucleus is substantially reorganized in disease (Fig. 2.3). This structure is the functional framework that links regulatory elements to distant genes. Its restructuring aberrantly exposes gene promoters to inappropriate regulatory elements. This can enhance the expression of pro-disease genes (e.g., oncogenes) or silence protective genes (e.g., tumor suppressors). In development, Rubinstein-Taybi syndrome and brachydactyly–mental retardation syndrome are both linked to defects in the management of the local chromatin state. In Rubinstein-Taybi syndrome, defects in genes that encode histone acetyltransferases (i.e., CREBBP and EP300) lead to a deficiency of histone acetylation. This is thought to result in the loss of open chromatin states in critical cell types, ultimately causing short stature, broad thumbs, and learning difficulties. In brachydactyly–mental retardation syndrome, the opposite problem occurs: histone deacetylase 4 (HDAC4) is mutated. Because HDAC4 is an eraser of histone acetylation, mutation of this gene leads to an overabundance of open chromatin states in certain cell types, ultimately leading to skeletal and intellectual abnormalities.

Fig. 2.3, Spatial associations, differential methylation, and imprinting. (A) DNA variation or epigenetic marks can alter the spatial conformation of DNA so that elements normally regulated together are held far apart. (B) Chromosome 11 (11p15) contains a developmentally critical imprinting control region (the IGF2 locus). In normal cells, the imprinted IGF2 locus is regulated differently in the maternally and paternally derived chromatin. Hypermethylation of the H19 promoter and loss of imprinting of IGF2 are detectable in 2% to 7% of patients with Beckwith-Wiedemann syndrome (BWS), resulting in over-expression of IGF2. Hypomethylation of the same promoter in patients with Russell-Silver syndrome (RSS) prevents IGF2 promoter interactions, resulting in under-expression of IGF2.

Alterations to long-range chromatin interactions between genes and their regulatory elements also contribute to human disease. For instance, limb formation in mammals relies heavily on the spatial co-localization of locus control regions (LCRs) and gene promoters. LCRs are genomic loci located some distance away on the same chromosome (or even on another chromosome) that can mediate the activation or repression of one or more promoters. In other words, the LCR contains enhancer or repressor elements (for more on enhancers, see the histone modifications section). LCRs can interact with a specific target gene or with many genes, which allows the coordinated regulation of functionally related genes. Mammalian limb development is controlled by a cluster of genes termed the homeobox D genes. These genes are partially regulated by an LCR within a 600 kb gene desert (a region of the genome devoid of protein-coding genes) on chromosome 2. When translocations interrupt the physical interactions between this gene desert and the homeobox D genes, patients develop limb and finger malformations, including brachydactyly and syndactyly. Likewise, preaxial polydactyly can result from altered long-range interactions between the sonic hedgehog gene (SHH) and a regulatory element located 1 Mb away. This element lies within an intron of the LMBR1 gene and carries a single nucleotide polymorphism (SNP). LMBR1 itself is incidental to the phenotype. The intronic element acts as an enhancer of SHH expression, and misexpression of SHH is the cause of the altered limb formation.

Non-Coding RNAs

Non-coding RNA (ncRNA) includes a broad array of RNA species. These include microRNA (miRNA), small temporal RNA (stRNA), short interfering RNA (siRNA), short hairpin RNA (shRNA), small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), and long non-coding RNA (lncRNA). All of these ncRNAs are important regulators or effectors of gene expression, and many have been implicated in the regulation of gene and chromatin structure. However, this review covers only siRNA, miRNA, and lncRNA in further detail.

Both siRNA and miRNA are RNA sequences 20 to 25 nucleotides long. They are loaded as single-stranded molecules into RNA-induced silencing complexes (RISCs) in the cytoplasm. As their name suggests, these complexes inhibit protein synthesis by silencing the target mRNA(s). The interactions of siRNA and miRNA with mRNA are sequence specific, and a single type of siRNA or miRNA can sometimes bind hundreds of different mRNA species. Although similar in mechanism, siRNA and miRNA differ in how their binding to mRNA causes silencing. siRNA acts primarily through the RNA interference pathway. It has perfect base pair complementarity with the targeted mRNA, which results in cleavage of the mRNA. By contrast, miRNA typically has incomplete base pair complementarity with the target mRNA. As a result, it usually represses translation or destabilizes the mRNA rather than cleaving it directly. miRNA can silence the translation of its target mRNAs in three ways:
(1) cleaving the mRNA strand into two pieces (mainly where complementarity is near-perfect);
(2) destabilizing the mRNA by shortening its poly(A) tail; or
(3) altering mRNA-ribosome interactions during translation.
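The difference between perfect (siRNA-like) and partial (miRNA-like) complementarity can be illustrated by scanning an mRNA for sites that pair with a small guide RNA. This sketch rests on assumptions not stated in the text. It uses the common convention that a miRNA's “seed” is nucleotides 2 to 8 from its 5′ end. It considers only Watson-Crick pairs (no G:U wobble) and ignores site context and RNA structure. The sequences and function names are hypothetical.

```python
# Classifying guide-RNA target sites by complementarity (simplified sketch).
# Assumptions: Watson-Crick pairs only (no G:U wobble); seed = guide nt 2-8.

PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Sequence (5'->3') that base-pairs with the given RNA along its length."""
    return "".join(PAIR[base] for base in reversed(rna.upper()))


def classify_target(guide: str, mrna: str) -> str:
    """'perfect' if the whole guide pairs with the mRNA, 'seed' if only the
    seed region (guide nucleotides 2-8) finds a complementary site, else 'none'."""
    mrna = mrna.upper()
    if reverse_complement(guide) in mrna:
        return "perfect"  # siRNA-like: typically leads to cleavage
    if reverse_complement(guide[1:8]) in mrna:
        return "seed"     # miRNA-like: typically repression/destabilization
    return "none"


if __name__ == "__main__":
    guide = "UCAGGAUCCGAUUACGGAUCAA"  # hypothetical 22-nt guide strand
    print(classify_target(guide, "AAAA" + reverse_complement(guide) + "AAAA"))  # perfect
    print(classify_target(guide, "AAAAGAUCCUGAAAA"))                           # seed
    print(classify_target(guide, "AAAAAAAAAAAAAAA"))                           # none
```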

miRNAs have been identified in many different organisms and play very important roles in the timing of development, particularly in locking down differentiation states. For example, miRNAs have important roles in tooth development, controlling the size, shape, and number of teeth. miRNAs have also been linked with many disease states, including inflammatory diseases, cancer, Alzheimer disease, cardiovascular disease, type 2 diabetes mellitus, and rheumatoid arthritis.

Long ncRNA (lncRNA) is a “catch-all” term for any non-coding RNA species longer than 200 nucleotides. The functions of lncRNA differ from those of the short RNA species. They include regulation of chromatin states and folding, epigenetic regulation, X chromosome inactivation, imprinting, establishment of lineage specificity, and formation of the anterior-posterior pattern during development. Given the range and number of these functions, it is not surprising that lncRNAs have been implicated in a number of developmental processes and diseases, including cancer.
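Because lncRNA is defined largely by length and lack of coding potential, a first-pass sorting of transcripts can be sketched as below. The size cutoffs come from this section: 20 to 25 nucleotides for siRNA/miRNA and more than 200 nucleotides for lncRNA. The record type and labels are hypothetical. The sketch also takes coding status as a given flag, whereas real annotation assesses coding potential.

```python
# First-pass sorting of transcripts by coding status and length (sketch).

from dataclasses import dataclass


@dataclass
class Transcript:
    name: str
    length_nt: int
    protein_coding: bool  # assumed to be known in advance


def rna_class(t: Transcript) -> str:
    """Coarse label using the size ranges given in the text."""
    if t.protein_coding:
        return "mRNA"
    if 20 <= t.length_nt <= 25:
        return "small RNA (siRNA/miRNA size)"
    if t.length_nt > 200:
        return "lncRNA"
    return "other ncRNA"


if __name__ == "__main__":
    for t in [Transcript("tx1", 22, False), Transcript("tx2", 2400, False),
              Transcript("tx3", 90, False), Transcript("tx4", 1800, True)]:
        print(t.name, rna_class(t))
```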
