Anatomy and Physiology of the Gene


Normal blood cells have limited life spans; they must be replenished in precise numbers by a continuously renewing population of progenitor cells. Homeostasis of the blood requires that proliferation of these cells be efficient yet strictly constrained. Many distinctive types of mature blood cells must arise from these progenitors by a controlled process of commitment to, and execution of, complex programs of differentiation. Thus developing red blood cells must produce large quantities of hemoglobin but not the myeloperoxidase characteristic of granulocytes, the immunoglobulins characteristic of lymphocytes, or the fibrinogen receptors characteristic of platelets. Similarly, the maintenance of normal amounts of procoagulant and anticoagulant proteins in the circulation requires an exquisitely regulated production, destruction, and interaction of the components. Understanding the basic biologic principles underlying cell growth, differentiation, death, and the homeostasis of critical proteins requires a thorough knowledge of the structure and regulated expression of genes because the gene is now known to be the fundamental unit by which biologic information is stored, transmitted, and expressed in this regulated fashion.

Genes were originally characterized as mathematical units of inheritance. They are now known to consist of molecules of deoxyribonucleic acid (DNA). By virtue of their ability to store information in the form of nucleotide sequences, to transmit it by means of semiconservative replication to daughter cells during mitosis and meiosis, and to express it by directing the incorporation of amino acids into proteins, DNA molecules are the chemical transducers of genetic information flow. Efforts to understand the biochemical means by which this transduction is accomplished have given rise to the disciplines of molecular biology and molecular genetics.

The Genetic View of the Biosphere: The Central Dogma of Molecular Biology

The fundamental premise of the molecular biologist is that the magnificent diversity encountered in nature is ultimately governed by genes. The capacity of genes to exert this control is in turn determined by relatively simple stereochemical rules, first appreciated by Watson and Crick in the 1950s. These rules govern the types of interactions that can occur between two molecules of DNA or ribonucleic acid (RNA).

DNA and RNA are linear unbranched polymers consisting of four types of nucleotide subunits. Each nucleotide is distinguished from the others by a unique purine or pyrimidine “base” projecting from the chain. Proteins are linear unbranched polymers consisting of 21 types of amino acid subunits. Each amino acid is distinguished from the others by the chemical nature of its side chain, the moiety not involved in forming the peptide bond links of the chain. The properties of cells, tissues, and organisms depend largely on the aggregate structures, properties, and biochemical activities of their proteins, and the interactions occurring among them. The central dogma of molecular biology states that genes control these properties by encoding the structures of proteins, controlling the timing and amount of their production, and coordinating their synthesis with that of other proteins. The information needed to achieve these ends is transmitted (expressed) from DNA and translated into proteins by a class of nucleic acid molecules called RNA. Genetic information thus flows in the direction DNA → RNA → protein. This central dogma provides, in principle, a universal approach for investigating the biologic properties and behavior of any given cell, tissue, or organism by study of the controlling genes. Methods permitting direct manipulation of DNA and RNA sequences should then be universally applicable to the study of all living entities. Indeed, the power of the methodologies of molecular genetics lies in the universality of their utility.

One exception to the central dogma of molecular biology that is especially relevant to hematologists is the storage of genetic information in RNA molecules in certain viruses, notably the retroviruses associated with T-cell leukemia and lymphoma, and the human immunodeficiency virus. When retroviruses enter the cell, the RNA genome (the term “genome” refers to the totality of DNA or RNA sequences encoding the genetic information of a cell, tissue, or organism) is copied into a DNA replica (cDNA). This is accomplished with RNA-dependent DNA polymerases, enzymes also called reverse transcriptases. This DNA representation of the viral genome is then expressed according to the pathway specified by the central dogma. Retroviruses thus represent a variation on the theme rather than a true exception to or violation of the dogma. There are also some RNA viruses (coronaviruses being the most universally known example) that carry an RNA-dependent RNA polymerase capable of replicating many copies of their own RNA genomes. These messenger RNAs (mRNAs) then encode proteins essential to the viral life cycle.

The Anatomy and Physiology of the Gene

DNA and RNA Structure

DNA molecules are extremely long, unbranched polymers of nucleotide subunits. Each nucleotide contains a sugar moiety called deoxyribose, a phosphate group attached to the 5′ carbon position, and a purine or pyrimidine base attached to the 1′ position (Fig. 1.1). The linkages in the chain are formed by phosphodiester bonds between the 5′ position of each sugar residue and the 3′ position of the adjacent residue in the chain (see Fig. 1.1). The sugar-phosphate links form the backbone of the polymer, from which the purine or pyrimidine bases project perpendicularly.

Figure 1.1, STRUCTURE, BASE PAIRING, POLARITY, AND TEMPLATE PROPERTIES OF DNA.

The haploid human genome consists of 23 long, double-stranded DNA molecules tightly complexed with histones and other nuclear proteins to form compact linear structures called chromosomes. The genome contains approximately 3 billion nucleotides; the individual chromosomes range from 50 to 200 million bases in length. By convention they are numbered from the longest (chromosome 1) to the shortest (chromosome 22), with the sex chromosomes getting the special designation X and Y. Females inherit the XX genotype and males, XY. The individual genes are aligned along each chromosome. The human genome contains about 20,000 to 30,000 genes. Blood cells, like most somatic cells, are diploid. That is, each chromosome is present in two copies, so there are 46 chromosomes consisting of approximately 6 billion base pairs (bp) of DNA.

The four nucleotide bases in DNA are two purines (adenine and guanine) and two pyrimidines (thymine and cytosine). The basic chemical configuration of the other nucleic acid found in cells, RNA, is quite similar, except that the sugar is ribose (having a hydroxyl group attached to the 2′ carbon rather than the hydrogen found in deoxyribose) and the pyrimidine base uracil is used in place of thymine. The bases are commonly referred to by a shorthand notation: the letters A, C, G, T, and U are used to refer to adenine, cytosine, guanine, thymine, and uracil, respectively.

The ends of DNA and RNA strands are chemically distinct because of the 3′ → 5′ phosphodiester bond linkage that ties adjacent bases together (see Fig. 1.1). One end of the strand (the 3′ end) has an unlinked (free at the 3′ carbon) sugar position, and the other (the 5′ end) has a free 5′ position. There is thus a directionality (polarity) to the sequence of bases in a DNA strand: the same sequence of bases read in a 3′ → 5′ direction carries a different meaning than if read in a 5′ → 3′ direction. Cellular enzymes can thus distinguish one end of a nucleic acid from the other and one strand from its paired mate; most enzymes that “read” the DNA sequence tend to do so only in one direction (3′ → 5′ or 5′ → 3′ but not both). For instance, most nucleic acid–synthesizing enzymes read the template strand in the 3′ → 5′ direction, thus adding new bases to the growing strand in a 5′ → 3′ direction.

Storage of Genetic Information in the Nucleotide Sequences of DNA

The ability of DNA molecules to store information resides in the sequence of nucleotide bases arrayed along the polymer chain. Under the physiologic conditions in living cells, DNA is thermodynamically most stable when two strands coil around each other to form a double-stranded helix. The strands are aligned in an “antiparallel” direction, having opposite 3′ → 5′ polarities (see Fig. 1.1). The DNA strands are held together by hydrogen bonds between the bases on one strand and the bases on the opposite (complementary) strand. The stereochemistry of these interactions allows bonds to form between the two strands only when adenine on one strand pairs with thymine at the same position of the opposite strand, or guanine with cytosine. These are the “Watson-Crick” rules of base pairing. Two strands joined together in compliance with these rules are said to have “complementary” base sequences. Similar rules apply to the formation of DNA-RNA or RNA-RNA double-stranded hybrids, except that A-U base pairs replace A-T pairs.

These thermodynamic rules imply that the sequence of bases along one DNA strand immediately dictates the sequence of bases that must be present along the complementary strand in the double helix. For example, whenever an A occurs along one strand, a T must be present at that exact position on the opposite strand; a G must always be paired with a C, a T with an A, and a C with a G.

Single-stranded nucleic acids can also fold back on themselves if two complementary sequences exist at different points along the molecule, thus forming “hairpin loops.” Hairpin loop structures create secondary structures that affect the accessibility of sequences and the interaction of the molecule with proteins or other nucleic acids.

Transmission of Genetic Information to the Next Generation

Enzymes that replicate (polymerize) DNA and RNA molecules obey the base-pairing rules. By using an existing strand of DNA or RNA as the template, a new (daughter) strand is copied (transcribed) by reading processively along the base sequence of the template strand, adding to the growing strand at each position only that base that is complementary to the corresponding base in the template according to the Watson-Crick rules. Thus a DNA strand having the base sequence 5′-GGCTATG-3′ could be copied by DNA polymerase only into a daughter strand having the sequence 3′-CCGATAC-5′. Note that the sequence of the template strand provides all the information needed to predict the nucleotide sequence of the complementary daughter strand. Genetic information is thus stored in the form of base-paired nucleotide sequences.
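The copying rule just described can be sketched in a few lines of Python. This is only an illustration of the Watson-Crick rules applied to the example sequence from the text; the function names `daughter_strand` and `reverse_complement` are hypothetical labels chosen for this sketch, not established terminology from the chapter.

```python
# Watson-Crick pairing rules for DNA, as stated in the text.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def daughter_strand(template_5_to_3: str) -> str:
    """Return the complementary strand written 3'->5', so that each base
    lines up with its partner in the template."""
    return "".join(DNA_PAIRS[base] for base in template_5_to_3)

def reverse_complement(template_5_to_3: str) -> str:
    """Return the same daughter strand rewritten in the 5'->3' direction."""
    return daughter_strand(template_5_to_3)[::-1]

template = "GGCTATG"                  # 5'-GGCTATG-3', the example in the text
print(daughter_strand(template))      # CCGATAC (read 3'->5')
print(reverse_complement(template))   # CATAGCC (read 5'->3')
```

Because the strands are antiparallel, the same daughter strand can be written either way; the direction labels matter as much as the letters.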

If a double-stranded DNA molecule is separated into its two component strands and each strand is then used as a template to synthesize a new daughter strand, the product will be two double-stranded daughter DNA molecules, each identical to the original parent molecule. This semiconservative replication process is exactly what occurs during mitosis and meiosis as cell division proceeds (Fig. 1.2). The rules of Watson-Crick base pairing thus provide for the faithful transmission of exact copies of the cellular genome to subsequent generations.
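A toy model may help make the semiconservative scheme concrete. In the sketch below, a duplex is represented (as an assumption made for illustration only) by a pair of strings: the upper strand written 5′ → 3′ and the lower strand written 3′ → 5′ beneath it. The names `complement` and `replicate` are hypothetical.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Base-by-base Watson-Crick partner of a DNA strand."""
    return "".join(PAIRS[base] for base in strand)

def replicate(duplex):
    """Separate the two strands and use each as a template for a new partner."""
    top, bottom = duplex                         # top 5'->3', bottom 3'->5'
    daughter_1 = (top, complement(top))          # old top strand + new bottom strand
    daughter_2 = (complement(bottom), bottom)    # new top strand + old bottom strand
    return daughter_1, daughter_2

parent = ("GGCTATG", "CCGATAC")
d1, d2 = replicate(parent)
print(d1 == parent, d2 == parent)   # True True
```

Each daughter duplex keeps one parental strand and gains one newly synthesized strand, yet both carry the same sequence as the parent.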

Figure 1.2, SEMICONSERVATIVE REPLICATION OF DNA.

The Expression of Genetic Information Via Translation Into Proteins Using the Genetic Code

The information stored in the DNA base sequence of genes achieves its impact on the structure, function, and behavior of organisms by governing the structures, timing, and amounts of proteins and certain RNAs synthesized in the cells. The primary structure (i.e., the amino acid sequence) of each protein determines its three-dimensional conformation and therefore its properties (e.g., shape, enzymatic activity, ability to interact with other molecules, localization, and stability). In the aggregate, these proteins control cell structure and metabolism. The process by which DNA achieves its control of cells through protein synthesis is called gene expression.

An outline of the basic pathway of gene expression in eukaryotic cells is shown in Fig. 1.3. The DNA base sequence of the “minus,” “anticoding” strand is first copied into an RNA molecule with a complementary base sequence, called premessenger RNA (pre-mRNA), by mRNA polymerase. Pre-mRNA thus has a base sequence identical to the DNA “plus” or “coding” strand, except that U replaces T. Genes in eukaryotic species consist of tandem arrays of sequences encoding mature mRNA (exons) alternating with sequences (introns) present in the initial mRNA transcript (pre-mRNA) but absent from the mature mRNA. The entire gene is transcribed into the larger precursor, which is then further processed (spliced) in the nucleus. The introns are excised to yield the mature mRNA molecule, which is then further processed, as discussed later, and exported to the cytoplasm to be decoded (translated) into the amino acid sequence of the protein by association with a biochemically complex group of ribonucleoprotein structures called ribosomes. Ribosomes contain two subunits, designated 60S and 40S. The 60S subunit contains a single, large (28S) ribosomal RNA (rRNA) molecule complexed with multiple proteins; the RNA component of the 40S subunit is a smaller (18S) rRNA.
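The first two steps of this pathway, transcription of the template strand and removal of introns, can be sketched as follows. The sequence and the exon coordinates are invented for illustration, and `transcribe` and `splice` are hypothetical helper names; real splicing depends on sequence signals and machinery that this toy model ignores.

```python
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}   # template base -> RNA base

def transcribe(template_3_to_5: str) -> str:
    """Copy the template ("minus") strand, written 3'->5', into pre-mRNA (5'->3')."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)

def splice(pre_mrna: str, exons) -> str:
    """Join the exon segments, given as (start, end) slices; the rest is intron."""
    return "".join(pre_mrna[start:end] for start, end in exons)

coding = "ATGGCTGTAAGTCAGTGGTAA"                      # toy "plus" strand, 5'->3'
template = "".join(DNA_PAIRS[b] for b in coding)      # "minus" strand, 3'->5'

pre_mrna = transcribe(template)
print(pre_mrna == coding.replace("T", "U"))           # True

# Assumed layout for this toy gene: exon 1 = bases 0-5, intron = 6-14, exon 2 = 15-20.
mature_mrna = splice(pre_mrna, [(0, 6), (15, 21)])
print(mature_mrna)                                    # AUGGCUUGGUAA
```

The check on `pre_mrna` restates the point made in the text: the transcript matches the coding strand, with U in place of T.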

Figure 1.3, SYNTHESIS OF mRNA AND PROTEIN—THE PATHWAY OF GENE EXPRESSION.

Ribosomes read an mRNA sequence in a ticker tape fashion three bases at a time, inserting the appropriate amino acid encoded by each three-base code word or codon into the appropriate position of the growing protein chain. This process is called mRNA translation. The glossary used by cells to know which amino acids are encoded by each mRNA codon is called the genetic code (Table 1.1). Each amino acid is encoded by a sequence of three successive bases. Because there are four code letters (A, C, G, and U) and because sequences read in the 5′ → 3′ direction have a different biologic meaning than sequences read in the 3′ → 5′ direction, there are 4³, or 64, possible codons consisting of three bases.
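The count of 64 codons, and the "ticker tape" reading of a message three bases at a time, can be checked with a short sketch. The helper name `read_codons` and the sample message are assumptions made for illustration.

```python
from itertools import product

BASES = "ACGU"

# Every ordered triplet of the four code letters: 4 * 4 * 4 possibilities.
all_codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
print(len(all_codons))    # 64

def read_codons(mrna: str):
    """Split an mRNA (written 5'->3') into successive three-base codons,
    ignoring any incomplete group of bases at the 3' end."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]

print(read_codons("AUGGCUUGGUAA"))   # ['AUG', 'GCU', 'UGG', 'UAA']
```

Order matters in the count: GCU and UCG are different codons because the message is read in one direction only.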

Table 1.1
The Genetic Codeᵃ: Messenger RNA Codons for the Amino Acids

Alanine: 5′-GCU-3′, GCC, GCA, GCG
Arginine: CGU, CGC, CGA, CGG, AGA, AGG
Asparagine: AAU, AAC
Aspartic Acid: GAU, GAC
Cysteine: UGU, UGC
Glutamic Acid: GAA, GAG
Glutamine: CAA, CAG
Glycine: GGU, GGC, GGA, GGG
Histidine: CAU, CAC
Isoleucine: AUU, AUC, AUA
Leucine: UUA, UUG, CUU, CUC, CUA, CUG
Lysine: AAA, AAG
Methionine: AUGᶜ
Phenylalanine: UUU, UUC
Prolineᵇ: CCU, CCC, CCA, CCG
Serine: UCU, UCC, UCA, UCG, AGU, AGC
Threonine: ACU, ACC, ACA, ACG
Tryptophan: UGG
Tyrosine: UAU, UAC
Valine: GUU, GUC, GUA, GUG
Chain Terminationᵈ: UAA, UAG, UGA

A, Adenine; C, cytosine; G, guanine; T, thymine; U, uracil.

a Note that most of the degeneracy in the code is in the third base position (e.g., lysine, AA [A or G]; asparagine, AA [C or U]; valine, GUN [where N is any base]).

b Hydroxyproline, the 21st amino acid, is generated by posttranslational modification of proline. It is almost exclusively confined to collagen subunits.

c AUG is also used as the chain-initiation codon when surrounded by the Kozak consensus sequence.

d The codons that signal the end of translation, also called nonsense or termination codons, are described by their nicknames amber (UAG), ochre (UAA), and opal (UGA).

There are 21 naturally occurring amino acids found in proteins. Thus more codons are available than amino acids to be encoded. As noted in Table 1.1, a consequence of this redundancy is that some amino acids are encoded by more than one codon. For example, six distinct codons can specify incorporation of arginine into a growing amino acid chain, four codons can specify valine, two can specify glutamic acid, and only one each can specify methionine or tryptophan. However, in no case does a single codon encode more than one amino acid. Codons thus predict unambiguously the amino acid sequence they encode. In contrast, one cannot easily read backward from the amino acid sequence to decipher the exact encoding DNA sequence. These facts are summarized by saying that the code is degenerate but not ambiguous.
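The phrase "degenerate but not ambiguous" can be illustrated by building the standard codon table and counting the codons assigned to each amino acid. The sketch below uses one-letter amino acid symbols with `*` for chain termination; the compact string encoding of the table is a common programming convenience, not something taken from the chapter, and it covers only the amino acids that are directly encoded (hydroxyproline arises by later modification of proline).

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks termination.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Forward direction: each codon maps to exactly one entry (not ambiguous).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Reverse direction: one amino acid may map back to several codons (degenerate).
codons_for = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    codons_for[aa].append(codon)

degeneracy = Counter(CODON_TABLE.values())
for symbol, name in [("R", "arginine"), ("V", "valine"), ("E", "glutamic acid"),
                     ("M", "methionine"), ("W", "tryptophan")]:
    print(name, degeneracy[symbol], sorted(codons_for[symbol]))
```

A dictionary lookup from codon to amino acid always returns a single answer, whereas the reverse lookup for arginine returns six candidates, which is why a protein sequence does not reveal the exact nucleotide sequence that encoded it.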

Some specialized codons serve as punctuation points during translation. The methionine codon (AUG), when surrounded by a consensus nucleotide sequence motif (the Kozak box) near the beginning (5′ end) of the mRNA, serves as the initiator codon signaling the first amino acid to be incorporated. All proteins initially begin with a methionine residue, but this is often removed later in the translational process. Three codons, UAG, UAA, and UGA, serve as translation terminators, signaling the end of translation.
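The role of the initiator and terminator codons as punctuation can be sketched with a simplified translation routine. As a deliberate simplification, the sketch starts at the first AUG it finds and does not check for the surrounding Kozak consensus; the function name `translate` and the sample message are hypothetical.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(mrna: str) -> str:
    """Translate an mRNA (5'->3') from the first AUG to the first stop codon.

    Simplification: the Kozak context around the AUG is not examined.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""                      # no initiator codon, no protein
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":          # UAA, UAG, or UGA ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("GCCACCAUGGCUUGGUAAGG"))   # MAW
```

Bases lying before the AUG or after the terminator are present in the message but do not appear in the protein.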

The adaptor molecules mediating individual decoding events during mRNA translation are small (roughly 75 to 90 bases long) RNA molecules called transfer RNAs (tRNAs). When bound into a ribosome, each tRNA exposes a three-base segment within its sequence called the anticodon. These three bases attempt to pair with the three-base codon exposed on the mRNA. If the anticodon is complementary in sequence to the codon, a stable interaction among the mRNA, the ribosome, and the tRNA molecule results. Each tRNA also contains a separate region that is adapted for covalent binding to an amino acid. The enzymes that catalyze the binding of each amino acid are constrained in such a way that each tRNA species can bind only to a single amino acid. For example, tRNA molecules containing the anticodon 3′-AAA-5′, which is complementary to a 5′-UUU-3′ (phenylalanine) codon in mRNA, can be bound to or charged with only phenylalanine; tRNA containing the anticodon 3′-UAG-5′ can be charged with only isoleucine, and so forth.
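The codon-anticodon relationship in these examples can be verified with a brief sketch. It assumes strict Watson-Crick pairing at all three positions, and the function names are hypothetical.

```python
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon_for(codon_5_to_3: str) -> str:
    """Return the anticodon written 3'->5', aligned base for base with the codon."""
    return "".join(RNA_PAIRS[base] for base in codon_5_to_3)

def pairs_with(codon_5_to_3: str, anticodon_3_to_5: str) -> bool:
    """True when every codon base faces its Watson-Crick partner."""
    return anticodon_for(codon_5_to_3) == anticodon_3_to_5

print(anticodon_for("UUU"))        # AAA, the phenylalanine example in the text
print(anticodon_for("AUC"))        # UAG, an isoleucine codon and its anticodon
print(pairs_with("UUU", "UAG"))    # False: this tRNA would not be retained
```

Note that the anticodon 3′-UAG-5′ is written in the opposite direction from the termination codon 5′-UAG-3′; the letters coincide, but the polarity, and therefore the meaning, differs.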

tRNAs and their amino acid–charged forms (aminoacyl-tRNAs) transduce nucleic acid information into the amino acid sequence of a protein, which in turn determines its physiologic properties. Ribosomes provide the structural matrix on which tRNA anticodons and mRNA codons become properly exposed and aligned in an orderly, linear, and sequential fashion. As each new codon is exposed, the appropriate charged tRNA species is bound. A peptide bond is then formed between the amino acid carried by this tRNA and the C-terminal residue on the existing nascent protein chain. The growing chain is transferred to the new tRNA in the process, so that it is held in place as the next tRNA is brought in. This cycle is repeated until completion of translation. The completed polypeptide is then transferred to other organelles for further processing (e.g., to the endoplasmic reticulum and the Golgi apparatus) or released into the cytosol for association with other subunits to form complex multimeric proteins (e.g., hemoglobin) and so forth, as discussed in Chapters 4 and 6.

Regulation of Gene Expression

Virtually all cells of an organism receive a complete copy of the DNA genome inherited at the time of conception. The diversity of distinct cell types and tissues found in any complex organism is possible only because different portions of the genome are selectively expressed or repressed in each cell type. Each cell must “know” which genes to express, how actively to express them, and when to express them. This biologic necessity has come to be known as gene regulation or regulated gene expression. Understanding gene regulation provides insight into how pluripotent stem cells determine that they will express the proper sets of genes in daughter progenitor cells that differentiate along each lineage. Major hematologic disorders (e.g., the leukemias and lymphomas), immunodeficiency states, and myeloproliferative syndromes result from derangements in the system of gene regulation. An understanding of the ways that genes are selected for expression thus remains one of the major frontiers of biology and medicine. Chapters 2, 4, and 6 offer a more thorough coverage of these topics. The following sections provide brief introductions.

Chromatin and the Epigenetic Regulation of Gene Expression

Only a small fraction of the 6 billion base pairs of DNA present in a diploid human cell codes for proteins or for the ribosomal, transfer, and spliceosome RNAs, even including the nearby DNA sequences (promoters, repressors, enhancers, silencers, and insulator sequences) that are needed to support regulated protein synthesis. As discussed later and in Chapter 4, many additional species of RNA molecules exhibiting important regulatory effects on gene expression have been and still are being discovered. Yet, less than 10% of the genome accounts for all DNA sequences having a known function in gene expression. The remainder is called “DNA dark matter.” It is being intensively investigated, but its purpose and impact on homeostasis remain unknown. A major challenge for a cell, then, is how to find the genes and how to identify and activate only those genes whose expression it needs for its vital functions. The field of study that has arisen to address these questions is called epigenetics. This section provides only a brief introduction to epigenetics; Chapter 2 offers a thorough review and documents the increasing importance of epigenetics to hematology.

Most of the DNA in living cells is inactivated by formation of a nucleoprotein complex called chromatin. The histone and nonhistone proteins in chromatin effectively sequester genes from enzymes needed for expression. The most tightly compacted chromatin regions are called heterochromatin. Euchromatin, less tightly packed, contains actively transcribed genes. Activation of a gene for expression (i.e., transcription) requires that it become less compacted and more accessible to the transcription apparatus. These processes involve both cis-acting and trans-acting factors. Cis-acting elements are regulatory DNA sequences within or flanking the genes. They are recognized by trans-acting factors, which are nuclear DNA–binding proteins needed for transcriptional regulation.

DNA sequence regions flanking genes are called cis-acting because they influence expression of nearby genes only on the same chromosome. These sequences do not usually encode mRNA or protein molecules. They alter the conformation of the gene within chromatin by twisting or kinking the surrounding DNA in ways that facilitate or inhibit access to the factors that modulate transcription. When exogenous nucleases (DNases) are added experimentally in small amounts to nuclei, these exposed regions are especially sensitive to their DNA-cutting action. Thus DNase-hypersensitive sites in chromatin have come to be useful as markers for regions in or near genes that are accessible for transcription (Chapter 2).

DNA methylation is an epigenetic structural feature that also marks differences between actively transcribed and inactive genes. Most eukaryotic DNA is heavily methylated; that is, the DNA is modified by the addition of a methyl group to the 5 position of the cytosine pyrimidine ring (5-methyl-C). In general, heavily methylated genes are inactive; active genes are relatively hypomethylated, especially in the 5′ and 3′ flanking regions containing the promoter and other regulatory elements (see “Enhancers, Promoters, and Silencers”). These flanking regions frequently include DNA sequences with a high content of Cs and Gs (CpG islands). Hypomethylated CpG islands serve as markers of actively transcribed genes. For example, a search for undermethylated CpG islands on chromosome 7 facilitated the search for the gene for cystic fibrosis.

DNA methylation is facilitated by DNA methyltransferases (DMTs). DNA replication incorporates unmethylated nucleotides into each nascent strand, thus leading to demethylated DNA. For cytosines to become methylated, the methyltransferases must act after each round of replication. After an initial wave of demethylation early in embryonic development, regulatory elements are methylated during various stages of development and differentiation (Chapter 2). Aberrant DNA methylation also occurs as an early step during tumorigenesis, leading to silencing of tumor suppressor genes and of genes related to differentiation. This finding has led to induction of DNA demethylation as a target in cancer therapy. Indeed, 5-azacytidine, a cytidine analog that inhibits DMT, and the related compound decitabine, are approved by the US Food and Drug Administration (FDA) for use in myelodysplastic syndromes, and their use in cases of other malignancies is being investigated.

The mechanisms by which particular regions of DNA are targeted for methylation are under intense investigation. It is becoming increasingly apparent that this modification begets further alterations in chromatin proteins that in turn influence gene expression.

The “opening” of chromatin is necessary but not sufficient for genes to be expressed. The sequences within the now-accessible regions of DNA that are intended for transcription, and no others, must be identified and configured for binding by the intranuclear factors and mRNA polymerase that will execute the transcription program. This is accomplished by the presence of sequences embedded near or within the gene that are recognized by specific proteins that activate or inactivate transcription depending on which stimulatory or inhibitory proteins the sequences attract. These are discussed in the next section.

The major protein components of chromatin are histones, which are a small, highly basic protein family that binds tightly to the acidic residues in DNA. Histones can be acetylated, reducing their affinity for DNA, or methylated, which stabilizes their binding. Histone acetylation, phosphorylation, and methylation of the N-terminal tail are the focus of intense study for their potential roles in opening or closing access to regions of DNA for expression. For example, acetylation of histone lysine residues (catalyzed by histone acetyltransferases) is associated with transcriptional activation. Conversely, histone deacetylation (catalyzed by histone deacetylase) leads to gene silencing. Histone deacetylases are recruited to areas of DNA methylation by DMT and by methyl–DNA-binding proteins, thus linking DNA methylation to histone deacetylation. Drugs inhibiting these enzymes have been demonstrated to be active anticancer agents and continue to be the focus of ongoing studies. The regulation of histone acetylation and deacetylation appears to be linked to gene expression, but the roles of histone phosphorylation and methylation are less well understood. Current research suggests that in addition to gene regulation, histone modifications contribute to the “epigenetic code” and are thus a means by which information regarding chromatin structure is passed to daughter cells after DNA replication occurs.

Regulatory Sequence Motifs in or Near Genes: Enhancers, Promoters, and Silencers

Several types of cis-active DNA sequence elements have been defined according to the presumed consequences of their interaction with nuclear proteins (see Fig. 1.5). Promoters are found just upstream (to the 5′ side) of the start of mRNA transcription (the CAP). mRNA polymerases appear to bind first to the promoter region and thereby gain access to the structural gene sequences downstream. Promoters thus serve a dual function of being binding sites for mRNA polymerase and marking for the polymerase the downstream point at which transcription should start.

Enhancers are more complicated DNA sequence elements. Enhancers can lie on either side of a gene or even within the gene. Enhancers are bound by enhancer binding proteins, thereby stimulating expression of genes nearby. The domain of influence of enhancers (i.e., the number of genes to either side whose expression is stimulated) varies. Some enhancers influence only the adjacent gene; others seem to mark the boundaries of large multigene clusters (gene domains) whose coordinated expression is appropriate to a particular tissue type or a particular time. For example, the very high levels of globin gene expression in erythroid cells depend on the function of an enhancer that seems to activate the entire gene cluster and is thus called a locus-activating region (see Fig. 1.5). The nuclear factors interacting with enhancers are probably induced into synthesis or activation as part of the process of differentiation. Chromosomal rearrangements that place a gene that is usually tightly regulated under the control of a highly active enhancer can lead to overexpression of that gene. This commonly occurs in Burkitt lymphoma, for example, in which the MYC proto-oncogene is juxtaposed to, and dysregulated by, an immunoglobulin enhancer.

Silencer sequences serve a function that is the obverse of enhancers. When bound by the appropriate nuclear proteins, silencer sequences cause repression of gene expression. Some evidence indicates that the same sequence elements can act as enhancers or silencers under different conditions, presumably by being bound by different sets of proteins having opposite effects on transcription. Insulators are sequence domains that mark the “boundaries” of multigene clusters, thereby preventing activation of one set of genes from “leaking” into nearby genes. The concerted actions of enhancers, silencers, and insulators delineate the specific DNA sequences to be transcribed or prevented from transcription within an opened region of chromatin.

One way that activation of transcription of a genomic DNA segment is accomplished is by a “looping out” phenomenon whereby some DNA binding proteins first bind to each end of a potentially expressed segment of open chromatin; those proteins then bind to one another, pulling the ends together and forming a looped-out segment of chromatin. Additional factors then bind to enhancers, silencers, and promoters, thereby demarcating those parts meant for transcription or silencing. Loops, in other words, may be a secondary structure that identifies areas primed for transcription (see Fig. 2.1).

Transcription Factors

Transcription factors are nuclear proteins that exhibit gene-specific DNA binding. Considerable information is now available about these nuclear proteins and their biochemical properties, but their physiologic behavior remains incompletely understood. Common structural features have become apparent. Most transcription factors have DNA-binding domains sharing homologous structural motifs (cysteine-rich regions called zinc fingers, leucine-rich regions called leucine zippers, and so on), but other regions appear to be unique. Some factors recognize specific DNA sequence motifs within promoters, enhancers, silencers, or insulators and bind directly to them, whereas others bind to these factors, forming complexes that promote or inhibit transcription. Many factors implicated in the regulation of growth, differentiation, and development (e.g., homeobox genes, proto-oncogenes, antioncogenes) appear to be DNA-binding proteins and may be involved in the steps needed for activation of a gene within chromatin. These factors are discussed in more detail in several other chapters (see Chapters 2, 4, and 6); many, such as c-myc and c-myb, are involved in the pathogenesis of blood dyscrasias when mutated.
