Protein Architecture: Relationship of Form and Function


Previous chapters outline the central dogma of molecular biology: the storage of genetic information in DNA, its regulated transcription into messenger RNA, and its eventual translation into proteins. In this chapter, we briefly outline the chemical structure of proteins and their posttranslational modifications (PTMs). We explain how the properties of the 20 amino acids of which proteins are composed allow these polymers to fold into compact, functional domains and how particular domains and motifs have been assembled, modified, and reused in the course of evolution. Finally, we describe a sampling of proteins and domains of relevance to the hematologist and explore briefly how point mutations, chromosomal translocations, and other genetic alterations may modify protein structure and function to cause disease.

Amino Acids and the Peptide Bond

Proteins are linear polymers of the 20 naturally occurring amino acids, linked together by the peptide bond. All of the amino acids share a common core or backbone structure and differ only in the “sidechain” emanating from the central “α-carbon” (Cα) of this core. The common backbone elements include an amino group, the central Cα, and a carboxylic acid group. Peptide bonds are formed by reaction of the carboxylic acid of one amino acid with the amino group of the next amino acid in the chain. This reaction is catalyzed by the ribosome, with the order of amino acids templated by messenger RNA. Coupling of multiple amino acids together via the peptide bond produces the repeating mainchain structure of the polypeptide chain, composed of the amide (NH) nitrogen, Cα, and carbonyl carbon (CO), followed by the amide nitrogen of the next amino acid in the chain (Fig. 7.1A). The resonant, partial double bond character of the peptide bond prevents rotation about this bond; thus the five mainchain carbon, nitrogen, and oxygen atoms of each peptide unit lie in a plane. The conformational flexibility of the polypeptide chain is conferred by rotation about the bonds on either side of the Cα atom; the corresponding torsion angles are referred to as the phi and psi angles. The angle of rotation about the N–Cα bond is the phi angle (φ), and that about the Cα–CO bond is the psi angle (ψ).
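The torsion angles described above can be computed directly from atomic coordinates. The following is a minimal Python sketch and is not taken from the chapter: the function name `torsion_deg` and the toy coordinates are illustrative assumptions. By the usual convention, φ is measured over the atoms C(i−1), N, Cα, C and ψ over the atoms N, Cα, C, N(i+1); the same calculation applied across the peptide bond itself gives a value near 180° for the trans conformation.

```python
import math

def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion_deg(p0, p1, p2, p3):
    """Signed torsion angle, in degrees, about the p1-p2 bond.

    For phi pass (C of previous residue, N, CA, C);
    for psi pass (N, CA, C, N of next residue).
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane of p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane of p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Toy coordinates (hypothetical, not from a real structure):
# the central bond runs along the z axis.
cis = torsion_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))     # eclipsed
trans = torsion_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))  # opposite
```

Here `cis` evaluates to 0° and `trans` to 180°, the two planar arrangements available to the peptide bond; φ and ψ, by contrast, can take a range of values, restricted mainly by steric clashes.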

Figure 7.1, (A) Diagram showing a polypeptide chain in which the mainchain atoms are represented as peptide units, linked through the α-carbon (Cα) atoms. Each peptide unit is a planar, rigid group (shaded pink) and has two degrees of freedom; it can rotate around the Cα–CO bond and the N–Cα bond. The peptide bonds are depicted in the trans conformation, in which adjacent Cα carbons and their side chains (highlighted blue) lie on opposite sides of the peptide bond. This is the preferred conformation for most amino acids because it minimizes steric hindrance. (B) The α-helix. The hydrogen bonds between residue n and residue n + 4, which stabilize the helix, are shown as dashed lines. (C) Schematic drawing of a mixed β-sheet. The three β-strands on the left are antiparallel to one another, while the two rightmost β-strands are parallel. The hydrogen bonds that stabilize the sheet are shaded.

The primary structure or primary sequence of a protein refers to the order in which various residues of the 20 amino acids are assembled into the polypeptide chain, and this sequence is critically important for determining the three-dimensional fold and thus function of the protein. The diverse chemical structure and physicochemical properties of the 20 amino acid sidechains guide the three-dimensional fold of proteins and also provide for the enormous repertoire of protein function, from catalysis of myriad chemical reactions to immune recognition to establishment of muscle and skeletal structure.

The amino acids can be divided into general classes based on the physicochemical properties of their side chains and in particular their propensity to interact with water. Hydrophobic amino acids have aliphatic or aromatic side chains and include alanine, valine, leucine, isoleucine, proline, methionine, and phenylalanine. The hydrophobic amino acids predominate in the interior of proteins, where they are sequestered from water. They tend to pack against each other via van der Waals interactions, which contribute to the overall stability of folded protein domains. By contrast, hydrophilic or polar amino acids (including serine, threonine, tyrosine, asparagine, glutamine, cysteine, and tryptophan) are often exposed on the surface of proteins, where they can form hydrogen bonds with each other, with the protein mainchain, and with water or ligand molecules. Hydrogen bonding refers to the attractive interaction of a proton covalently bonded to one electronegative atom (usually a nitrogen or oxygen in proteins) with another electronegative atom. Hydrogen bonds are an important contributor to the stability of proteins and to the specificity of protein-protein and protein-ligand interactions. Charged amino acids are also polar and are important participants in hydrogen bonding. Hydrogen bonds between negatively charged (acidic) and positively charged (basic) amino acids are termed salt bridges and likewise contribute to protein stability and protein-protein interactions. The acidic amino acids are aspartate and glutamate, and the basic amino acids are lysine, arginine, and histidine. Histidine merits special mention because it is the only amino acid whose side chain can be protonated or unprotonated, and therefore charged or uncharged, around physiologic ranges of pH. For this reason, histidine is part of many enzyme active sites. For example, in the serine proteases of the coagulation cascade, an active site histidine acts as a general base, accepting and then releasing a proton in sequential steps of the enzymatic reaction. It is also important to note that some of the polar amino acids are amphipathic (i.e., they have both polar and hydrophobic character). This dual nature of threonine, lysine, tyrosine, arginine, and tryptophan makes them well suited for participating in protein-protein interactions, where they may be alternately exposed to solvent or buried upon formation of a complex.
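The grouping of side chains described above can be written down as a small lookup table. The Python sketch below is illustrative only: the names `SIDE_CHAIN_CLASS`, `classify`, and `class_composition` are assumptions, and the class assignments simply follow the lists given in this section, using the standard one-letter amino acid codes. Glycine is not placed in any of these classes in the text, so it is reported here as "unassigned".

```python
from collections import Counter

# Side-chain classes as grouped in the text (standard one-letter codes).
_CLASS_MEMBERS = {
    "hydrophobic": "AVLIPMF",  # Ala, Val, Leu, Ile, Pro, Met, Phe
    "polar": "STYNQCW",        # Ser, Thr, Tyr, Asn, Gln, Cys, Trp
    "acidic": "DE",            # Asp, Glu
    "basic": "KRH",            # Lys, Arg, His
}

# Invert the table so each residue code maps to its class.
SIDE_CHAIN_CLASS = {
    code: cls for cls, codes in _CLASS_MEMBERS.items() for code in codes
}

def classify(residue):
    """Return the side-chain class of a one-letter residue code."""
    return SIDE_CHAIN_CLASS.get(residue.upper(), "unassigned")

def class_composition(sequence):
    """Count how many residues of each class occur in a sequence."""
    return Counter(classify(residue) for residue in sequence)

# Hypothetical five-residue peptide, used only for illustration.
example = class_composition("MKVDG")
```

A tally of this kind gives only a rough impression of a sequence. As noted above, several residues are amphipathic, so a single label per residue is a simplification.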

The amino acid cysteine is unique in that its side chain contains a relatively reactive thiol group. The structure of cell-surface and extracellular proteins is often stabilized by disulfide bonds, covalent bonds formed between the thiol groups of spatially juxtaposed cysteine residues. In general, disulfide bonds are not found in intracellular proteins, where the reducing environment disfavors their formation. Disulfide bonds can form between cysteines within the same polypeptide chain, stabilizing the fold of the polypeptide backbone, or they may covalently join two different polypeptide chains, for example, the heavy and light chains of an immunoglobulin (Ig). In addition to their role in disulfide bond formation, cysteine residues often contribute to protein stability via their participation in metal ion coordination, in particular zinc, which is often bound by conserved sets of cysteine and histidine residues in small protein domains.

Protein Secondary Structure

The alternating pattern of hydrogen bond–donating amide groups and hydrogen bond–accepting carbonyl groups gives rise to repeating elements of protein structure that are stabilized by hydrogen bonds between these mainchain groups. These secondary structure elements include α-helices and β-sheets. In an α-helix, the mainchain adopts a right-handed helical conformation in which the carbonyl oxygen of the ith residue in the polypeptide chain accepts a hydrogen bond from the amide nitrogen of the (i + 4)th residue (see Fig. 7.1B). The pattern may repeat for only a few residues, forming a single turn of α-helix, or for more than 100 residues, forming dozens of turns of helix. There are 3.6 residues per turn of helix, and the helix rises 1.5 Å per residue, giving a pitch of 5.4 Å per turn. The sidechains of residues in an α-helix project outward, away from the central axis of the helix. Often a polar sidechain will “cap” the end of a helix by forming a hydrogen bond with the otherwise unpartnered amide or carbonyl group at the N- or C-terminal end of the helix.
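The helix dimensions quoted above lend themselves to simple arithmetic. The short Python sketch below uses only the two figures given in the text (3.6 residues per turn and a rise of 1.5 Å per residue); the constant and function names are illustrative assumptions, and the values describe an ideal, straight α-helix.

```python
RESIDUES_PER_TURN = 3.6
RISE_PER_RESIDUE_ANGSTROM = 1.5

# Derived quantities for an ideal helix.
PITCH_ANGSTROM = RESIDUES_PER_TURN * RISE_PER_RESIDUE_ANGSTROM  # about 5.4
ROTATION_PER_RESIDUE_DEG = 360.0 / RESIDUES_PER_TURN            # about 100

def helix_turns(n_residues):
    """Number of helical turns formed by n_residues."""
    return n_residues / RESIDUES_PER_TURN

def helix_length_angstrom(n_residues):
    """Length along the helix axis, in angstroms, for n_residues."""
    return n_residues * RISE_PER_RESIDUE_ANGSTROM

# A 100-residue helix: roughly 27.8 turns and 150 angstroms long.
turns_100 = helix_turns(100)
length_100 = helix_length_angstrom(100)
```

Each residue therefore advances the chain by about 100° around the helix axis, which is why residues i and i + 4 sit close enough along one face to form the stabilizing hydrogen bond.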

In β-sheet secondary structure, the protein backbone adopts an extended conformation and two or more strands are arranged side by side, with hydrogen bonds between the strands. The strands can run in the same direction (parallel β-sheet) or antiparallel to one another, and both parallel and antiparallel strands are often found together in mixed sheets (Fig. 7.1C). In β-sheets, the sidechains of a given strand extend alternately above and below the plane defined by the hydrogen-bonded mainchains. Other common types of secondary structure include a variant of the helix with an i + 3 hydrogen bonding pattern (the 3₁₀ helix) and specific types of β-turns, short segments connecting other elements of secondary structure that are stabilized by β-sheet–like hydrogen bonds. Although any of the amino acids can be found within α-helices or β-sheets, the special characteristics of proline and glycine merit mention. The cyclic structure of proline means that it lacks an amide proton; thus it introduces an irregularity in hydrogen bonding. For this reason, it is infrequently found in α-helices, but if present, it will introduce a “kink” stemming from its constrained structure. Glycine lacks a sidechain (it has only a second hydrogen atom on its Cα) and therefore has less steric restriction and can adopt a wider range of backbone phi and psi angles. This added flexibility means that it tends to disfavor regular secondary structure.

Figure 7.2, SEVERAL COMMON DOMAIN STRUCTURES.

Because proteins are large and complicated structures, they are typically illustrated with “ribbon” diagrams that trace the path of the polypeptide backbone. In such representations, helices are drawn as helical coils or cylinders, and β-strands as elongated rectangles with an arrow as a guide to the direction of the protein chain from its amino- to carboxy-terminal end.

Posttranslational Modifications

The covalent structure of proteins is commonly modified in structurally and functionally important ways beyond the linear coupling of amino acids via the peptide bond. Regulated proteolysis can be considered a PTM and can serve an important regulatory role, as in the cleavage of prothrombin in the blood clotting cascade.

A number of functional groups are appended to proteins to regulate their function, localization, protein interactions, and degradation. Examples of these PTMs include phosphorylation, glycosylation, ubiquitination, methylation, acetylation, and lipidation. PTMs occur at distinct amino acid side chains or peptide linkages, are most often mediated by enzymatic activity, and can occur at any step in the “life cycle” of a protein. As discussed later, a number of protein domains have evolved to recognize and bind specifically to proteins labeled by a particular PTM.

Protein phosphorylation, most commonly on serine, threonine, or tyrosine residues, is one of the most important and well-studied PTMs. Phosphorylation is mediated by protein kinases and can activate or deactivate many enzymes through conformational changes; as such, it plays a critical role in the regulation of many cellular processes, including cell cycle, growth, apoptosis, and signal transduction pathways.

Protein glycosylation encompasses a diverse selection of sugar-moiety additions to proteins that ranges from simple monosaccharide modifications to highly complex branched polysaccharides. Glycosylation has significant effects on protein folding, conformation, distribution, stability, and activity. Carbohydrates in the form of asparagine-linked (N-linked) or serine/threonine-linked (O-linked) oligosaccharides are major structural components of many cell surface and secreted proteins and also many viral proteins.

Protein methylation on arginine or lysine residues is carried out by methyltransferases with S-adenosyl methionine (SAM) as the primary methyl group donor. Methylation is an important mechanism of epigenetic regulation because histone methylation and demethylation influence the availability of DNA for transcription.

N-acetylation, the transfer of an acetyl group to the amine nitrogen at the N-terminus of the polypeptide chain, occurs in a majority of eukaryotic proteins. Lysine acetylation and deacetylation is an important regulatory mechanism in a number of proteins. It is best characterized in histones, where histone acetyl transferases (HATs) and histone deacetylases (HDACs) regulate gene expression via modification of histone tails. Many cytoplasmic proteins are also acetylated, and therefore acetylation seems to play a broader role in cell biology than transcriptional regulation alone.

Lipidation is a modification that targets proteins to membranes in organelles, vesicles, and the plasma membrane. Examples of lipidation include myristoylation, palmitoylation, and prenylation. Each type of modification gives proteins distinct membrane affinities, although all types of lipidation increase the hydrophobicity of a protein and thus its affinity for membranes. In N-myristoylation, the myristoyl group (a 14-carbon saturated fatty acid) is transferred to an N-terminal glycine by N-myristoyltransferase. The myristoyl group does not always permanently anchor the protein in the membrane; in a number of proteins the N-terminal myristoyl group has been observed to pack into the protein core. N-myristoylation can therefore act as a conformational localization switch in which protein conformational changes influence the availability of the handle for membrane attachment.

Finally, ubiquitination (or equivalently, ubiquitylation) involves modification of a target protein with the protein ubiquitin, which is typically attached to a lysine residue via an isopeptide bond with the C-terminus of ubiquitin. Modification with ubiquitin, either singly (monoubiquitination) or in chains (polyubiquitination), is used to regulate diverse processes from protein degradation to gene expression to the cell cycle.
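The residue preferences of the PTMs described in this section can be summarized as a lookup table. The Python sketch below is a deliberately naive illustration: the names `PTM_TARGET_RESIDUES`, `candidate_sites`, and `has_n_terminal_glycine` are assumptions, and the function reports only which positions carry a chemically eligible residue. Whether a given residue is actually modified depends on sequence context, accessibility, and the enzymes present, none of which is modeled here.

```python
# Residues named in the text as targets of each modification
# (standard one-letter codes).
PTM_TARGET_RESIDUES = {
    "phosphorylation": "STY",       # serine, threonine, tyrosine
    "N-linked glycosylation": "N",  # asparagine
    "O-linked glycosylation": "ST", # serine, threonine
    "methylation": "RK",            # arginine, lysine
    "lysine acetylation": "K",
    "ubiquitination": "K",
}

def candidate_sites(sequence, ptm):
    """Return 1-based positions whose residue type could carry the PTM.

    This checks residue identity only; it is not a site predictor.
    """
    targets = PTM_TARGET_RESIDUES[ptm]
    return [
        position
        for position, residue in enumerate(sequence.upper(), start=1)
        if residue in targets
    ]

def has_n_terminal_glycine(sequence):
    """True if the chain starts with glycine, as N-myristoylation requires."""
    return sequence[:1].upper() == "G"

# Hypothetical peptide used only for illustration.
peptide = "GSKYN"
phospho_candidates = candidate_sites(peptide, "phosphorylation")
```

For the hypothetical peptide above, the serine at position 2 and the tyrosine at position 4 are the only residues of a type that can be phosphorylated, and the lysine at position 3 is the only candidate for ubiquitination or lysine acetylation.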
