Basic Genetic Principles

Introduction

The human genome refers to the complete set of human DNA (with the suffix -ome arising from the Greek for “all” or “complete”). A copy of our genome comprises approximately 3 billion base pairs (bp) and about 20,000 protein-coding genes. The Human Genome Project was a significant contribution toward understanding the organization, structure, and sequence of the human genome. With these developments, genomic medicine has emerged as a new discipline that analyzes the genome and genetic information as a part of clinical care.

In-depth knowledge of the genome and of the types and consequences of genomic variation is important for all medical professionals, especially neonatologists. Recognizing the most common chromosomal and monogenic disorders and genetic concepts such as inheritance, genomic imprinting, uniparental disomy (UPD), and X chromosome inactivation can help clinicians understand the origins of genetic conditions and the risk of recurrence for patients and their families. Knowing the types of available clinical genomic tests, along with their utility and limitations, is critical for appropriate clinical use. In this chapter, we will also review prenatal diagnosis, clinical physical examination of the dysmorphic child, the future of newborn sequencing, and therapeutic approaches for monogenic diseases.

Genomic Organization

DNA and RNA

Deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) are long polymers of nucleotides. Each nucleotide has three elements: (1) a nitrogenous base, (2) a sugar molecule, and (3) a phosphate molecule. The nitrogenous bases fall into two types: purines and pyrimidines. The purines include adenine and guanine; the pyrimidines include cytosine, thymine, and uracil. The primary difference between RNA and DNA is in base composition: RNA contains uracil, whereas DNA contains thymine. The other difference between RNA and DNA is in their sugar-phosphate backbones: RNA contains ribose, and DNA contains 2-deoxyribose. Deoxyribose confers resistance to hydrolysis, which gives DNA chemical stability and supports the fidelity of the information as cells divide. DNA is packaged in chromosomes in the nucleus, whereas messenger RNA carries the genetic message from the nucleus to the cytoplasm, where it is translated into proteins.

The double-helix structure of DNA was elucidated in 1953. Hydrogen bonds zip up the complementary strands of DNA, in which adenine (A) pairs with thymine (T), and cytosine (C) pairs with guanine (G). Because the two strands of DNA have complementary sequences, there is redundancy in the information content, increasing the fidelity of the code. Replication of the double-stranded DNA molecule requires separation of the two strands followed by the synthesis of two new complementary strands. In contrast to DNA, RNA molecules are typically single-stranded and short-lived.
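The pairing rules above can be sketched in a few lines of Python. This is a minimal illustration rather than a bioinformatics tool: the sequence is hypothetical, the function name is our own, and the sketch assumes the standard convention that the two strands run antiparallel, so the partner strand is written in reverse order.

```python
# Watson-Crick pairing rules from the text: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand: str) -> str:
    """Return the partner strand, written 5'->3'.

    Assumption: the strands are antiparallel, so the complement is
    read in the reverse direction of the input.
    """
    return "".join(PAIRS[base] for base in reversed(strand.upper()))


original = "ATGCGTTA"  # hypothetical sequence, for illustration only
partner = complementary_strand(original)
print(partner)  # TAACGCAT

# Either strand can be rebuilt from the other: this is the redundancy
# in information content that the text describes.
print(complementary_strand(partner) == original)  # True
```

Because each strand fully determines the other, a cell can use the intact strand as the template for repairing or replicating its partner.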

Structure of Chromosomes

Humans usually have 46 chromosomes in 23 pairs: 44 are autosomes, and 2 are the sex chromosomes (X and Y) involved in sex determination. The homologous pairs of autosomes are numbered from 1 to 22 in order of decreasing size, with one member of each pair inherited from each parent.

Every chromosome consists of a long, single, and continuous DNA molecule located in the nucleus. DNA is packaged as chromatin, complexed with histone proteins that condense the DNA so that it fits into the nucleus. The five major types of histone proteins play a critical role in packaging of the chromatin. Two copies of each of the four core histones H2A, H2B, H3, and H4 constitute an octamer, around which a segment of DNA winds. A fifth histone, H1, binds to DNA at the tip of each nucleosome. Approximately 140 bp of DNA are linked with each histone core, making just under two turns around the octamer. After a short (20- to 60-bp) “spacer” segment of DNA, the pattern repeats, giving chromatin the look of beads on a string. Each complex of DNA with its core histones is called a nucleosome, which is the basic structural unit of chromatin.
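The numbers in this paragraph allow a back-of-the-envelope estimate of how many nucleosomes package one copy of the genome. The sketch below is only an order-of-magnitude illustration: it assumes that the whole 3-billion-bp genome is packaged uniformly with the repeat lengths quoted above, which is a simplification, and the variable names are our own.

```python
GENOME_BP = 3_000_000_000    # approximate size of one copy of the genome
CORE_BP = 140                # DNA wound around each histone octamer
SPACER_BP_RANGE = (20, 60)   # length of the "spacer" segment between cores

# One repeat unit is a core plus its spacer, so it spans 160 to 200 bp.
shortest_repeat = CORE_BP + SPACER_BP_RANGE[0]
longest_repeat = CORE_BP + SPACER_BP_RANGE[1]

# Longer repeats mean fewer nucleosomes along the same length of DNA.
fewest = GENOME_BP // longest_repeat
most = GENOME_BP // shortest_repeat

print(f"{fewest:,} to {most:,} nucleosomes per genome copy")
# 15,000,000 to 18,750,000 nucleosomes per genome copy
```

Under these assumptions, a single copy of the genome would carry on the order of 15 to 19 million nucleosomes.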

Between cell divisions, the chromatin is unwound where genes are being expressed. With cell division, the chromosomes condense and become visible as the structures we observe in a karyotype. Noncoding RNA molecules play an essential role in gene regulation. For example, Xist, a noncoding RNA molecule, is a central regulator of X chromosome inactivation. It coats the inactivated X chromosome, which is structurally condensed, with most (but not all) genes being transcriptionally inactive.

Mitochondrial Genome

Mitochondria are organelles within cells that transform the energy from food into a form that cells can use. Each cell contains thousands of mitochondria, each containing several copies of a small circular mitochondrial chromosome. The mitochondrial DNA molecule is 16 kb in length and encodes 37 genes, all of which are essential for normal mitochondrial function; these include the genes for the ribosomal and transfer RNA molecules that function within the mitochondria. The mitochondrial genome is inherited solely from the mother.

Structure of Genes

Genes in humans are composed of protein-coding sequences called exons and the intervening (noncoding) DNA sequences called introns (Fig. 1.1). Introns are initially transcribed into RNA in the nucleus and are then spliced out to make the mature mRNA. Therefore, the information from the intronic sequences is not typically represented in the final protein product. Exonic sequences determine the amino acid sequence of the protein. Most genes contain at least one and usually numerous introns. For most genes, the introns make up a far greater proportion of the gene’s total length than the exons. Genes are also flanked by additional sequences that are transcribed but untranslated, known as the 5′ and 3′ untranslated regions, which play a role in RNA stability and gene expression.

Fig. 1.1, Gene structure (top) and the flow of genetic information from DNA to protein. Tan boxes indicate the regions of exons that do not encode amino acid sequences; gray boxes indicate posttranscriptional modifications. AUG is a codon that specifies the amino acid methionine and is also used to specify the first amino acid of a protein.

The human genome contains noncoding DNA sequences that act as regulatory elements. These sequences include promoters, enhancers, silencers, and locus control regions, and they coordinate the regulation of genes in space and time. They contain binding sites for proteins called transcription factors, which either increase or repress transcription. Promoter regions are responsible for the initiation of transcription and are typically found 5′ (upstream) of a gene. Enhancers and silencers can be located either 5′ or 3′ of a gene, within the introns, or sometimes farther away in neighboring genes. Besides genes that are transcribed and made into proteins, there are genes whose functional product is RNA, known as non-coding RNAs (ncRNAs). Some ncRNAs can be quite long (long ncRNAs or lncRNAs) and play roles in gene regulation. There are also small non-coding RNAs, known as short interfering RNAs (siRNAs) and microRNAs (miRNAs), that control gene expression. MicroRNAs are approximately 22 nucleotides in length and regulate mRNA expression post-transcriptionally, usually by decreasing expression. siRNAs are complementary to specific mRNAs and direct the degradation of the mRNA to decrease the expression of the target gene.

More than half of the human genome consists of various types of repeat sequences that are either clustered together or dispersed throughout the genome. These sequences can be short, consisting of only a few nucleotides, or as long as 5000 to 6000 nucleotides. The two best-studied dispersed repetitive elements are the Alu family and the long interspersed nuclear element (LINE) family. The Alu family makes up at least 10% of human DNA, and the LINE family accounts for nearly 20% of the genome. Segmental duplications are another class of repeats; they are highly conserved and make up 5% of the human genome. When these duplicated sequences include genes, structural rearrangements involving them can cause genetic diseases.

Cell Division

Transmission of the genetic information from one generation to the next relies on accurate replication of DNA during reproduction. Mitosis is used during somatic cell division to support development and cellular differentiation. In mitotic division, the usual complement of 46 chromosomes is maintained through a process of DNA replication and subsequent separation of the chromosomes. In contrast, meiosis occurs only in cells that become gametes, each of which has only 23 chromosomes (haploid genome). Errors in cell division in either somatic or germline cells can cause abnormalities of chromosome number or structure that can be clinically significant.

Mosaicism

Mosaicism is the presence of at least two genetically different cell populations derived from the same zygote. Mitotic nondisjunction, trisomy rescue, or occurrence of a somatic new mutation can lead to the development of genetically different cell lines within the body. Mosaicism can affect any cells or tissue within a developing embryo and can arise at any point from conception to adulthood. If the mosaic cells are found only in the placenta and are absent in the embryo, this is known as confined placental mosaicism (CPM) (Fig. 1.2A). CPM may be detectable on a chorionic villus sample and may be associated with intrauterine growth restriction but not with congenital anomalies or neurodevelopmental disorders if the genetic anomaly is not present in the fetus. Somatic mosaicism is the presence of two or more cell lineages in tissues that may have a clinically observable phenotype in the part of the body with the genetic aberration (see Fig. 1.2B). In gonadal mosaicism, the mosaic cells are restricted to the gametes and do not have a clinically observable phenotype but can be passed on to the next generation.

Meiosis

Meiosis is the process by which gametes are formed. In contrast with mitosis, in which a single cell division produces an exact duplicate of the genetic material, meiosis involves two cell divisions. It starts with a diploid parental cell and proceeds through random reassortment and reduction of the genetic material, so that each of the four daughter cells has the haploid DNA content (i.e., 23 chromosomes). In this way, meiosis yields four haploid gametes (sperm or eggs). In female meiosis, the second meiotic division is completed only after fertilization, and advancing maternal age is associated with nondisjunction of the chromosomes. A polar body, containing a complete set of chromosomes, is extruded, leaving the egg with a single remaining haploid set of chromosomes. The second polar body is useful for preimplantation genetic diagnosis.

Recombination

During the prophase of the first meiotic division, homologous pairs of chromosomes are held together by the synaptonemal complex, which extends along the entire length of the paired chromosomes. Recombination between chromatids of the homologous chromosomes occurs at this stage, resulting in the exchange of DNA between the original parental chromosomes (Fig. 1.3). In males, the X and Y chromosomes are physically associated only at the tips of their short arms during meiotic prophase. This short region is called the pseudoautosomal region because recombination between the X and Y chromosomes occurs there (and thus it behaves as an autosome in terms of Mendelian inheritance). Recombination involves the exchange of genetic material between the two homologs during meiosis I, and it is also critical for proper chromosome segregation during meiosis. Failure to recombine correctly can lead to nondisjunction of the chromosomes in meiosis I and is a frequent cause of aneuploidy (incorrect chromosome number) leading to pregnancy loss and congenital anomalies.

How Genes Function

Flow of Genetic Information

Transcription

The first step in gene expression is the production of an RNA molecule from the DNA template. The RNA acts as a molecular messenger, carrying the genetic information out of the nucleus to the cytoplasm. The synthesis of mRNA is called transcription because the genetic information in DNA is transcribed into RNA. During transcription, the two DNA strands separate, and one functions as a template for the synthesis of single-stranded RNA molecules by RNA polymerases. The initial RNA transcripts are quite long because they include both introns and exons from the gene. The intronic sequences are cut out, and the remaining exons are spliced together. To form the mature mRNAs that leave the nucleus, a methylated guanine nucleotide called a cap is added to the 5′ end, and a string of 200 to 250 adenine bases (the poly(A) tail) is added to the 3′ end. The cap is necessary for ribosomal binding to initiate protein synthesis, and the poly(A) tail at the 3′ end increases the stability of the mRNA.
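The template-copying step and the 3′ tail can be illustrated with a short Python sketch. It is deliberately simplified: the template sequence and function names are hypothetical, the template is assumed to be written 3′ to 5′ so that the RNA comes out 5′ to 3′, and splicing and the 5′ cap are not modeled.

```python
# Pairing used when RNA is built on a DNA template:
# RNA carries uracil (U) where a DNA strand would carry thymine (T).
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (5'->3') complementary to a template written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


def add_poly_a_tail(rna: str, tail_length: int = 200) -> str:
    """Append a poly(A) tail to the 3' end.

    The 5' cap is a modified nucleotide and is not represented here.
    """
    return rna + "A" * tail_length


template = "TACGGCATT"  # hypothetical template strand, written 3'->5'
transcript = transcribe(template)
mature = add_poly_a_tail(transcript)

print(transcript)   # AUGCCGUAA
print(len(mature))  # 209 (9 transcribed bases plus a 200-base tail)
```

The default tail length of 200 sits at the lower end of the 200 to 250 range given in the text; any value in that range could be passed instead.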

Transcriptional control is central for the development and proper functioning of every organism. Transcriptional regulation is accomplished by modifying the DNA (e.g., cytosine methylation) or by protein binding to specific DNA sequences to activate or repress transcription of a gene. There are many sequence-specific transcription factors whose activity differs by cell and tissue type and by time in development. Several regulatory sequences have been identified in promoters that are important for transcriptional initiation by RNA polymerase II, including the TATA box, so-called because it consists of a run of T and A base pairs. The TATA box is located approximately 30 bases upstream of the transcription start site and functions as the binding site for a large, multisubunit complex of transcription factors (including RNA polymerase). A second conserved region, the so-called CAT box, is a few dozen base pairs farther upstream. Specific sequence elements that form promoters and enhancers are required for binding the ∼1400 sequence-specific proteins that bind to DNA and regulate transcription. Mutations in these regulatory sequence elements can significantly alter transcription and can lead to genetic disorders.

The boundaries between introns and exons are marked by a 5′ donor GT dinucleotide and a 3′ acceptor AG dinucleotide at the two ends of each intron. Besides the canonical splice sequences, there are also splicing regulatory elements such as exonic splicing enhancers (ESEs) and exonic splicing silencers (ESSs). ESEs and ESSs are sequences of six to eight nucleotides that serve as docking sites for splicing activator or splicing repressor proteins, thereby influencing the recruitment and activity of the splicing machinery. Most human genes undergo alternative splicing, in which individual exons are present in only some isoforms, and hence encode more than one protein. Alternative polyadenylation creates further diversity. Some genes have more than one promoter, and these alternative promoters may result in tissue-specific isoforms.
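The canonical GT...AG rule lends itself to a small worked example. The Python sketch below removes introns from a hypothetical gene sequence and rejects any intron that lacks the donor and acceptor dinucleotides. The sequence, the coordinates, and the function name are illustrative assumptions, and the sketch ignores ESEs, ESSs, and every other signal that real splice-site recognition depends on.

```python
from typing import List, Tuple


def splice(gene: str, introns: List[Tuple[int, int]]) -> str:
    """Join the exons of `gene` after removing the listed introns.

    Each intron is a (start, end) pair of 0-based positions, with `end`
    exclusive. Every intron must begin with the donor GT and end with
    the acceptor AG, as described in the text.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        intron = gene[start:end]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            raise ValueError(
                f"intron {start}-{end} lacks canonical GT...AG boundaries"
            )
        exons.append(gene[position:start])  # exon preceding this intron
        position = end
    exons.append(gene[position:])  # final exon
    return "".join(exons)


# Hypothetical two-exon gene: exon 1, one intron, exon 2.
exon1, intron1, exon2 = "ATGGCC", "GTAAGTCTTTCAG", "AAATGA"
gene = exon1 + intron1 + exon2

print(splice(gene, [(6, 19)]))  # ATGGCCAAATGA
```

A variant that changes the donor GT or acceptor AG would make the check fail in this sketch, loosely mirroring how mutations at canonical splice sites can disrupt normal splicing.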

Translation

The production of protein from an mRNA template is called translation because the genetic information that is encoded in DNA is translated into a sequence of amino acids in the protein. The genetic information is stored in the genetic code. Each set of three adjacent nucleotides is a unit of information called a codon and specifies an amino acid or the start or stop of translation. The linear order of codons in the DNA sequence specifies the sequence of amino acids in a protein. Because each of the three sites in a codon can be one of four possible nucleotides, a total of 4³, or 64, different codons are possible. Three of these 64 possible codons, UAA, UAG, and UGA, are called termination codons. The remaining 61 codons each specify one of the 20 amino acids, so most amino acids are encoded by more than one codon; this property is called degeneracy. A consequence of degeneracy is that some DNA variants do not result in a change in the amino acid sequence (synonymous variants).
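The codon arithmetic above can be checked directly. The Python sketch below builds the standard genetic code from a compact 64-letter string of one-letter amino acid codes (with * marking a termination codon), counts the codons, and translates a short hypothetical mRNA. The one-letter codes and the table layout are a common convention rather than something taken from this chapter, and the translate function is a simplification that starts at the first AUG it finds.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, listed in UCAG order for each codon position.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # codons beginning with U
    "LLLLPPPPHHQQRRRR"   # codons beginning with C
    "IIIMTTTTNNKKSSRR"   # codons beginning with A
    "VVVVAAAADDEEGGGG"   # codons beginning with G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]

print(len(CODON_TABLE))   # 64, i.e. 4 ** 3
print(stop_codons)        # ['UAA', 'UAG', 'UGA']
print(len(sense_codons))  # 61


def translate(mrna: str) -> str:
    """Translate from the first AUG until a termination codon is reached."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGCCGUAA"))  # MP (methionine, proline)
# Degeneracy: CCG and CCA both encode proline, so a G-to-A change at the
# third position of this codon would be a synonymous variant.
print(translate("AUGCCAUAA"))  # MP
```

Sixty-one codons for 20 amino acids means that, on average, each amino acid has about three codons, although the actual number ranges from one (methionine, tryptophan) to six (leucine, serine, arginine).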

Epigenetics

In addition to the classic transcription factors that bind to specific sequence elements in genes, gene expression is controlled by enzymes that modify DNA-bound proteins and DNA itself. The principal mechanism by which DNA is modified is by methylation of cytosine residues adjacent to guanosine. Methylation of these CpG dinucleotides by DNA methylases leads to transcriptional inactivation, while demethylation by demethylases alters the conformation of chromatin, leading to transcriptional activation. Histone proteins are extensively modified by many enzymes, including acetylases, kinases, and methylases. The pattern of histone modification, particularly on lysine residues, controls whether a particular region of chromatin will be transcriptionally active or inactive.

Modifications of chromatin proteins and DNA can be inherited through multiple cell divisions. Such alterations, which do not change the DNA sequence itself, are called epigenetic. Genetic diseases that affect this process exemplify the importance of epigenetics. For example, mutations in MeCP2, a protein that binds to methylated DNA to repress the expression of associated genes, cause Rett syndrome, an X-linked neurodegenerative disease. Rubinstein-Taybi syndrome is caused by mutations in the CBP gene, encoding CREB-binding protein, which acts to acetylate the histone proteins that are major components of chromatin.

Genetic Variation

A locus is a particular position on a chromosome for a specific gene and related DNA elements. Alleles are alternative versions of the DNA sequence at a locus. Generally, one of the alternative alleles is found in more than half of the population and is called the major allele. The other versions of that gene are referred to as variants or minor alleles. Allele frequencies vary significantly in different populations. If an allele frequency is greater than 1%, it is said to be a polymorphism (multiple forms).
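The 1% threshold is easy to apply once an allele frequency has been computed from genotype counts. The Python sketch below does this for a locus with two alleles. The counts are invented for illustration, the function names are our own, and the calculation assumes a diploid autosomal locus in which every individual carries two alleles.

```python
def minor_allele_frequency(hom_major: int, het: int, hom_minor: int) -> float:
    """Frequency of the minor allele at a diploid locus with two alleles.

    Each individual carries two alleles: heterozygotes contribute one
    copy of the minor allele, and minor-allele homozygotes contribute two.
    """
    total_alleles = 2 * (hom_major + het + hom_minor)
    return (het + 2 * hom_minor) / total_alleles


def is_polymorphism(frequency: float, threshold: float = 0.01) -> bool:
    """Apply the definition in the text: allele frequency greater than 1%."""
    return frequency > threshold


# Hypothetical cohort of 1000 individuals genotyped at one locus.
common = minor_allele_frequency(hom_major=950, het=48, hom_minor=2)
rare = minor_allele_frequency(hom_major=995, het=5, hom_minor=0)

print(f"{common:.4f}", is_polymorphism(common))  # 0.0260 True
print(f"{rare:.4f}", is_polymorphism(rare))      # 0.0025 False
```

Because allele frequencies differ between populations, the same variant could fall on either side of the threshold depending on the cohort in which it is counted.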

The term mutation generally signifies a DNA sequence change that is deleterious and associated with a human disease. Mutations can be germline and inherited from one or both parents, or somatic and acquired over the life of an individual. Mutations also vary in the size of the altered DNA sequence, ranging from a single nucleotide to the rearrangement of an entire chromosome. By convention, genetic variants are described by comparison with a reference genome. This reference genome is updated as we understand the human genome better.
