The major histocompatibility complex (MHC) genomic region contains many genes with immune-related functions, including the human leukocyte antigen (HLA) genes.
Identification of genes within the MHC has increased the possibility of defining the genetic basis of immunity.
The clustering of immune-related genes in the MHC region may not be coincidental and may be the result of evolutionary forces joining genes with similar functions.
The MHC is associated with more diseases than any other region in the human genome.
Presence of fixed stretches of MHC deoxyribonucleic acid (DNA) (haplotypes) complicates the study of susceptibility genes but also provides the means for their identification.
Methods are available for detecting associations or linkage between disease and polymorphisms found within the MHC.
Although MHC class I– and class II–encoded proteins are distinguished by structural and functional similarities shared by the members of each class, class III molecules can be defined only as non-I and non-II because their genes and their products have no common features and are not recognized by T cells.
Despite convincing genetic association studies showing the importance of the class III region in health and disease, neither the genes responsible for disease nor the underlying mechanism of pathology is known for many of the associated conditions.
Almost half of the genes within the MHC class III region play roles in the innate immune system, including members of the complement cascade (C4, C2, CFB) and of the tumor necrosis factor (TNF, LTA, LTB) and lymphocyte antigen-6 (LY6) families.
Deficiency of the second component of the complement system (C2) is the most common complement protein deficiency state in European populations but is rare in others.
Complete C4 deficiency is extremely rare. The level of C4 in the serum of patients with C4 null alleles is extremely variable and cannot be used reliably to detect heterozygotes for complete deficiency because there is no correlation between the level of C4 and the number of expressed C4 genes.
Extended haplotypes are highly characteristic of an ethnic subgroup and have lower frequencies or do not occur at all in other ethnic groups.
It is critical to know the ethnic distribution of a disease-associated MHC allele to evaluate whether the allele is increased in patients compared with ethnically matched control populations or is in fact merely a marker for the ethnic distribution of the disease.
A major problem with MHC gene associations is their incomplete penetrance, making ordinary formal segregation and linkage studies difficult to carry out.
Understanding the role of the major histocompatibility complex (MHC) in immune responses and in the pathogenesis of disease requires defining polymorphisms in the class I and class II regions, as well as genes in the central MHC class III region. The strong interest in the MHC genomic sequence stems from its established role in regulating inflammation and the innate and adaptive immune responses. Several MHC genes play important roles in the cellular discrimination of self from nonself, so a working knowledge of the MHC system is essential in transplantation medicine and in understanding susceptibility to autoimmunity. Variants of class I and class II genes were described in detail in Chapter 50. In this chapter, we discuss the non–human leukocyte antigen (HLA) region genes, disease associations, and methods for detecting the association of diseases with genetic markers.
Completion of the MHC deoxyribonucleic acid (DNA) sequence preceded completion of the total human genome sequence by almost 4 years ( ). The push to complete the MHC sequence grew out of the need to understand the biology and genetics of the mouse MHC (H-2) and HLA regions, which had initially been characterized through experiments in inbred mice and serologic typing ( ). Although the MHC was discovered more than 50 years ago, its nature has been resolved only in the last 2 decades, with the advent of DNA cloning and discovery of the structure of class I and class II molecules ( ; ).
The MHC, located on the short arm of chromosome 6, spans approximately 4 megabases (Mb) and contains more than 250 identified loci, making it by far the most gene-dense region of the human genome sequenced to date ( ). It encodes the most polymorphic human proteins, the HLA class I and class II molecules, with over 24,000 allelic variants identified so far ( ). More than 40% of the expressed loci in the MHC have immune-related functions. This clustering of immune-related genes in the MHC region may not be coincidental and may be the result of evolutionary forces joining genes with similar functions.
Traditionally, the MHC has been divided into three regions, from telomere to centromere: HLA class I, non-HLA (MHC class III), and HLA class II. HLA class I and class II gene functions were reviewed in detail in Chapter 50. We will concentrate on the MHC class III region and will later return to HLA class I and class II genes to discuss disease associations.
The human MHC class III, or central MHC, region, located between the HLA class I and class II regions, is the most gene-dense region of the human genome. Specifically, the MHC class III region contains 61 protein-coding genes in ∼900 kb of sequence, with an average gene size of ∼8.5 kb ( ), whereas the human genome as a whole contains, on average, fewer than 11 genes/Mb and has an average gene size of 27 kb ( ). Although HLA class I– and class II–encoded proteins are distinguished by structural and functional similarities shared by the members of each class, MHC class III molecules can be defined only as non-I and non-II because their genes and their products have no common features and are not recognized by T cells. Figure 51.1 shows a gene map of the MHC class III region.
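The density comparison above can be checked with simple arithmetic. The sketch below uses only the figures quoted in the text (61 genes in about 900 kb, and fewer than 11 genes/Mb genome-wide); because the genome-wide figure is an upper bound, the resulting fold difference is a lower bound.

```python
# Gene density of the MHC class III region versus the genome-wide average,
# using only the figures quoted in the text.
class_iii_genes = 61
class_iii_span_mb = 0.9          # ~900 kb expressed in megabases
genome_avg_genes_per_mb = 11     # quoted as an upper bound ("fewer than 11 genes/Mb")

density = class_iii_genes / class_iii_span_mb
fold = density / genome_avg_genes_per_mb  # lower bound, since 11 is an upper bound

print(f"MHC class III: {density:.1f} genes/Mb")
print(f"At least {fold:.1f}-fold denser than the genome-wide average")
```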
Despite convincing genetic association studies showing the importance of the MHC class III region in health and disease, neither the genes responsible for disease nor the underlying mechanism of pathology is known for many of the associated conditions. One reason is that the functions of nearly half the genes in the MHC class III region are not completely known. A high-throughput yeast two-hybrid system was used to investigate the functions of several intracellular proteins encoded within the MHC class III region ( ). This study revealed that approximately one third of the analyzed proteins may have a role in mRNA processing, which again suggests clustering of functionally related genes within this region of the human genome. Specifically, three MHC class III proteins (STK19, PBX2, and NELFE) interact with proteins implicated in transcriptional regulation. Four proteins (LSM2, DDX39B, DXO, and SKIV2L) are orthologs of yeast proteins involved in mRNA processing, and another five (PRRC2A, STK19, CLIC1, PBX2, and ABHD16A) interact with proteins previously implicated in RNA processing. Proteins involved in RNA processing are usually categorized as “housekeeping” proteins, but they are often critical for innate immunity against viral RNA. Metabolism of intracellular nucleic acids is also critical for limiting autoimmunity against DNA and RNA. Patients deficient in SKIV2L (superkiller viralicidic activity 2-like) have an elevated interferon response ( ) and present with trichohepatoenteric syndrome, which includes an inflammatory bowel disease–like intractable diarrhea ( ).
The region closer to the HLA class II genes contains several genes of known function, including TNXB ( ), which encodes tenascin X, a large extracellular matrix protein expressed in connective tissues; NEU1, which encodes neuraminidase 1, a lysosomal sialidase; and SLC44A4, which encodes a choline transporter. Mutations of TNXB are a cause of Ehlers-Danlos syndrome ( ), primary myopathy ( ), and vesicoureteral reflux ( ; ). Mutations of NEU1 are responsible for congenital sialidosis ( ; ). Mutations in SLC44A4 cause autosomal dominant nonsyndromic deafness ( ).
The region toward the class I genes also contains several genes of known function, including VARS, which encodes the cytoplasmic valyl-tRNA synthetase; MSH5, a homolog of the Escherichia coli MutS DNA mismatch repair protein; and CSNK2B, which encodes the β subunit of casein kinase 2. Mutations in VARS lead to neurodevelopmental encephalopathy and microcephaly ( ), mutations in MSH5 lead to primary ovarian insufficiency ( ), and pathogenic variants in CSNK2B cause intellectual disability and global developmental delay with or without epilepsy ( ).
Almost one half of the genes within the MHC class III region play roles in the innate immune system, including members of the complement cascade (C4, C2, CFB) and the tumor necrosis factor family (TNF, LTα, LTβ), which are discussed in greater detail later in this chapter. Other immune-related genes in the MHC class III region include NCR3 (natural cytotoxicity triggering receptor 3), a member of the immunoglobulin superfamily, and the less characterized LY6G genes ( ). NCR3 encodes NKp30, an activating receptor that triggers natural killer (NK) cells ( ). NK cells play a variety of roles in the immune system, including defense against infectious organisms and tumors, and have also been implicated in autoimmunity. NKp30 is involved in the killing of cells infected with Plasmodium falciparum ( ) or cytomegalovirus ( ), of fungi ( ), and of hematologic and solid tumor cells ( ), and it contributes to the pathogenesis of primary Sjögren syndrome ( ). Variants in the NCR3 gene contribute to genetic risk of malaria attacks ( ). Increased expression of NCR3 and the adjacent LST1 (leukocyte-specific transcript 1) gene has been found in the blood of patients with rheumatoid arthritis ( ), and haplotype-specific variants in these genes alter susceptibility to arthritis in animal models ( ). LST1 encodes a palmitoylated transmembrane signaling adaptor and membrane scaffolding protein in myeloid cells ( ). Both genes are also significantly upregulated in response to lipopolysaccharide, interferon-γ, and bacterial infection, suggesting a role for their products in autoimmune-related inflammation and in dendritic cell/NK cell–associated functions. The Ly6 (lymphocyte antigen 6 complex) proteins were originally described using antisera to lymphocytes and were used as markers of lymphocyte subset differentiation; they are now known to be encoded by a family of 35 genes in humans, 5 of which are located in the MHC class III region.
The functions of many Ly6 proteins are still not well understood, but known roles range from neutrophil migration, cell-cell interaction, complement regulation, and macrophage activation to modulation of viral infections ( ).
The NFKBIL1 (NF-κB inhibitor–like 1) and BTNL2 (butyrophilin-like 2) genes are also located in the MHC class III region ( ; ). Much interest has focused on the NFKBIL1 ( ) and BTNL2 ( ) proteins since single-nucleotide variants in these genes were linked to rheumatoid arthritis ( ) and sarcoidosis ( ), respectively. BTNL2 may also contribute to the risk of type 1 diabetes ( ). AIF1 (allograft inflammatory factor 1) encodes a calcium-binding protein that is highly expressed in monocytes and macrophages and is involved in phagocytosis in these cells and in antigen uptake by intestinal M cells; it has also been implicated in rheumatoid arthritis and other inflammatory processes ( ; ). Finally, the genes for the microsomal enzyme steroid P450 21-hydroxylase and three members of the major heat-shock protein 70 family are also contained in the MHC class III region and are discussed later.
Tumor necrosis factor (TNF) and lymphotoxin α (LTα) are potent immunomodulatory cytokines produced in response to inflammatory stimuli. TNF and LTα are encoded by separate genes (TNF and LTA) and share approximately 34% amino acid identity ( ). Both can be displayed on the cell surface or released from cells, but by different mechanisms. TNF is synthesized as a type II transmembrane protein and is released in soluble form by proteolytic cleavage. LTα lacks a transmembrane region; it is secreted as a homotrimer or retained on the cell surface through association with the transmembrane protein lymphotoxin β (LTβ) ( ). LTβ, encoded by LTB, has 21% and 24% amino acid identity with TNF and LTα, respectively.
To date, several TNF promoter polymorphisms have been identified and correlated with susceptibility to various autoimmune and infectious diseases. This work remains controversial both at the genetic level and, more importantly, with regard to functional relevance. As demonstrated for some of the TNF promoter polymorphisms, these gene variations may be ethnic-specific markers for other linked and unidentified factors that may affect the host immune response in the development of autoimmune diseases and susceptibility or resistance to infectious diseases in a given population ( ). Polymorphisms in the TNF promoter region have been correlated in large-scale population studies with cerebral malaria ( ), rheumatoid arthritis ( ), coronary heart disease ( ), type 2 diabetes ( ), Alzheimer disease ( ), ankylosing spondylitis ( ), mortality from septic shock ( ), mucocutaneous leishmaniasis ( ), periodontitis ( ), immune thrombocytopenia ( ; ), inflammatory bowel disease ( ), and various hematologic and nonhematologic malignancies ( ; ; ). Similarly, large-scale analysis has linked both promoter and coding polymorphisms within the LTA gene to myocardial infarction ( ), stroke ( ), sepsis ( ), leprosy ( ), autoimmune diseases ( ), and hematologic and nonhematologic malignancies ( ; ).
Anti-TNF therapy has been a breakthrough in the management of rheumatoid arthritis, psoriasis, psoriatic arthritis, juvenile arthritis, ankylosing spondylitis, and inflammatory bowel disease (Crohn disease and ulcerative colitis). Genetic variants that alter TNF expression levels could also alter the response to anti-TNF therapies. As with disease association studies, the effect of TNF promoter variants on therapy outcomes, and their use in patient management, remain unsettled ( ; ; ; ; ; ).
Stress proteins, or heat-shock proteins, are expressed by cells in response to a variety of stress stimuli. This response has been observed in all species examined to date. The amino acid sequence of the 70-kDa family of stress proteins is highly conserved from primitive eukaryotes to humans. Several studies have identified loci for heat-shock protein 70 (HSP70) on chromosomes 6, 14, and 21 ( ; ). Three genes encoding members of the HSP70 family (HSPA1L, HSPA1A, and HSPA1B, formerly known as HSP70-Hom, HSP70-1, and HSP70-2, respectively) are located telomeric to the C2 locus.
Sequence analysis of HSPA1A and HSPA1B genes has shown that they are intronless genes that encode an identical protein product of 641 amino acids. HSPA1L is also an intronless gene that encodes a protein of 641 amino acids and has 90% sequence identity with HSPA1A ( ). Because of the high degree of sequence similarity between coding regions of the various HSP70 genes, DNA probes corresponding to coding regions tend to cross-hybridize. However, sufficient sequence differences have been observed between the 5′ and 3′ untranslated regions to design oligonucleotide primers and probes to allow specific amplification and hybridization of the three genes ( ).
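The strategy of exploiting untranslated-region divergence can be illustrated with a sliding-window mismatch count over two pre-aligned paralog sequences: windows with many mismatches are candidate sites for gene-specific primers or probes. This is a simplified sketch, not a primer-design tool. The function name and the sequences are hypothetical (toy strings, not real HSPA1 sequences), and a real design would also weigh melting temperature and the position of mismatches near the primer 3′ end.

```python
def window_mismatches(seq_a: str, seq_b: str, window: int) -> list[int]:
    """Count mismatches in each window of two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = [a != b for a, b in zip(seq_a, seq_b)]
    return [sum(diffs[i:i + window]) for i in range(len(diffs) - window + 1)]


# Hypothetical toy alignment: an identical "coding" stretch followed by a
# divergent "untranslated" stretch, mimicking the situation described above.
coding_a = "ATGGCCAAAGCCGCGGCG"
coding_b = "ATGGCCAAAGCCGCGGCG"
utr_a = "GATTACAGGT"
utr_b = "CTTAGCAGCA"

counts = window_mismatches(coding_a + utr_a, coding_b + utr_b, window=5)
# Index of the first window with the highest mismatch count.
best = max(range(len(counts)), key=counts.__getitem__)
print(counts)
print(f"Most divergent window starts at position {best}")
```

Windows inside the identical coding stretch score zero, which is the sequence-level counterpart of the cross-hybridization problem noted above.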
HSP70 plays a critical role in assisting protein folding, disaggregation, and degradation and therefore influences all aspects of cell biology. Polymorphisms in the HSP70 gene cluster have been associated with the magnitude of the heat-shock response ( ), schizophrenia ( ; ), leukemia ( ), age-related cataract ( ), and longevity ( ), among other traits. Other HSP70 polymorphisms have been correlated with cytokine response in trauma ( ), Parkinson disease ( ), uveitis in patients with sarcoidosis ( ), and risk for clozapine-induced agranulocytosis in Ashkenazi Jews ( ).
The complement C2, C4, and factor B proteins encoded within the MHC class III region show inherited structural variants that can be studied by techniques that detect differences in net surface charge caused by amino acid substitutions. Two methods are used to separate these proteins: (1) high-voltage agarose gel electrophoresis, which detects differences in mobility due to charge differences between proteins at a given pH, and (2) isoelectric focusing in thin-layer polyacrylamide gels, which reveals differences in isoelectric points. The separated proteins can be visualized by immunofixation, which relies on the insolubility of antigen-antibody complexes, or by detection of functional hemolytic activity in overlay gels, in which antibody-sensitized sheep erythrocytes are combined with complement-deficient serum ( ).
At the protein level, C2 shows limited polymorphism by isoelectric focusing. The C2 alleles are C (common); a less common B (basic) allele with three rare basic variants; and four rare A (acidic) variants ( ). The polymorphic site for the C2∗B allele is carried by the C2a complement fragment. C2 is genetically polymorphic in a wide variety of ethnic groups, although C2∗C accounts for more than 90% of C2 genes in most populations ( ; ). In Caucasian and Asian populations, C2∗B accounts for 5% to 10% of C2 genes. A nonexpressed allele (C2∗Q0) is found in 2% of the Caucasian European population ( ). For C2 typing, proteins are separated by isoelectric focusing in polyacrylamide gels and visualized by C2-induced hemolysis in overlay gels. Diluted normal human serum can replace C2-deficient serum as a reagent, because C2 is the limiting factor in the classical complement pathway.
Two distinct but closely related loci (C4A and C4B), presumably the result of tandem gene duplication, encode two forms of C4. C4 is extraordinarily complex at both the genetic and the protein level. The number of C4 genes in humans varies from two to four and, in exceptional cases, six ( ; ). The C4 gene duplication unit includes its neighboring genes STK19 at the 5′ end and CYP21 and TNXB at the 3′ end ( ). Misalignment (unequal crossing-over) between heterozygous mono-, bi-, or trimodular chromosomes during meiosis can cause further rearrangements that, in addition to changing the number of C4 genes, can delete CYP21 or TNXB and thereby cause congenital adrenal hyperplasia or Ehlers-Danlos syndrome, respectively. Three quarters of C4 genes carry endogenous retrovirus sequences that might confer intrinsic protection against exogenous retroviral infection ( ). Recent data suggest a negative correlation between the presence of endogenous retrovirus sequences and C4 protein levels, as well as a direct relationship between the number of C4 genes and C4 protein levels ( ).
At the protein level, C4A and C4B differ by only four amino acid residues in the α chain (PCPVLD in C4A and LSPVIH in C4B, encoded by exon 26), but these changes alter ligand specificity: C4A preferentially binds to amino groups on immune complexes, whereas C4B preferentially binds to hydroxyl or carbohydrate groups (Yu et al., 1988). C4B is several times more hemolytically active than C4A, but C4A is more active than C4B in inhibiting the formation of immune precipitates and in promoting their dissolution. C4A proteins carry the Rodgers blood group antigenic determinant, whereas C4B carries the Chido determinant. Extensive genetic polymorphism is detectable in C4A and C4B proteins among populations; more than 35 variants have been observed by agarose gel electrophoresis and DNA-based typing methods ( ; ; ; ). Among the C4A and C4B allotypes, the most common alleles are C4A∗3 and C4B∗1. C4A∗4, C4A∗2, C4B∗2, and C4B∗5 show a worldwide distribution. C4A∗6 is also observed in many populations, with the exception of some Mongoloid groups. C4B∗3 is identified mainly in African and Caucasian groups.
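Because the allotyping methods described above separate proteins by charge, it can help to see how the isotypic residues change charge. The sketch below estimates the side-chain charge of the C4A (PCPVLD) and C4B (LSPVIH) segments with the Henderson-Hasselbalch equation. Several assumptions apply: the pKa values are approximate free-amino-acid estimates, pH 8.6 is an assumed electrophoresis buffer pH, cysteine and the peptide termini are ignored, and a short segment is not the whole protein. The point is only that C4A carries an aspartate where C4B carries a histidine that is largely uncharged at this pH, in keeping with C4A being the more acidic isotype.

```python
# Approximate side-chain pKa values (free-amino-acid estimates; an assumption,
# since pKa values shift inside folded proteins). Cysteine is deliberately omitted.
SIDE_CHAIN_PKA = {"D": 3.9, "E": 4.1, "H": 6.0, "K": 10.5, "R": 12.5}
ACIDIC = {"D", "E"}


def side_chain_charge(peptide: str, ph: float) -> float:
    """Net side-chain charge via Henderson-Hasselbalch, ignoring termini and Cys/Tyr."""
    charge = 0.0
    for residue in peptide:
        pka = SIDE_CHAIN_PKA.get(residue)
        if pka is None:
            continue  # residue treated as non-ionizable in this sketch
        if residue in ACIDIC:
            # Fraction deprotonated (negatively charged) at this pH.
            charge -= 1.0 / (1.0 + 10 ** (pka - ph))
        else:
            # Fraction protonated (positively charged) at this pH.
            charge += 1.0 / (1.0 + 10 ** (ph - pka))
    return charge


ph = 8.6  # assumed buffer pH for agarose electrophoresis
for name, segment in [("C4A", "PCPVLD"), ("C4B", "LSPVIH")]:
    print(f"{name} {segment}: {side_chain_charge(segment, ph):+.3f}")
```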
C4 typing requires a combination of several techniques ( ): (1) Immunofixation electrophoresis after treatment with neuraminidase. The patterns produced show three bands for each variant, with some overlap between certain variants. Additional treatment of the sample with carboxypeptidase reduces each variant to a single band. (2) Detection of C4A versus C4B by functional hemolytic assay. This method resolves overlapping C4A and C4B patterns because C4B variants have 5 to 10 times the hemolytic activity of C4A variants. (3) Rodgers (C4A) or Chido (C4B) serologic reactivity. The serum is incubated with human anti-Rodgers or human anti-Chido antibodies to test for inhibition of agglutination of appropriate positive erythrocytes. Alternatively, C4 variants can be typed for Chido or Rodgers reactivity by immunoblotting. C4 null alleles can be detected in two ways. On electrophoresis, a C4 null allele (C4A∗Q0 or C4B∗Q0) appears as an absence of bands in homozygotes, but in heterozygotes its detection requires quantification by visual inspection, crossed immunoelectrophoresis, or densitometric scanning of immunofixation patterns. An alternative method is to determine the presence and ratios of the C4A and C4B α chains after sodium dodecyl sulfate polyacrylamide gel electrophoresis of immunoprecipitates. More recently, DNA-based typing methods, including sequence-specific primer (SSP) amplification, restriction fragment length polymorphism analysis, real-time polymerase chain reaction (PCR), and direct DNA sequencing, have been used to complement inconclusive C4A and C4B protein allotyping ( ; ).
Factor B is encoded by the CFB gene. Human factor B is highly polymorphic, with more than 20 variants identified to date, including several dysfunctional proteins, variants present in low concentration, and null alleles. Factor B typing is normally performed by combining agarose electrophoresis and/or isoelectric focusing at the protein level with PCR analysis at the DNA level ( ). These methods identify two very common alleles, CFB∗F and CFB∗S; two less common alleles, CFB∗F1 and CFB∗S1; and a host of rare alleles. Variants are named according to their relative electrophoretic mobility: CFB F to CFB F1 for fast variants, and CFB S to CFB S1 for slow variants. In European Caucasian populations, CFB∗F has a frequency of 0.2, CFB∗S of 0.77, CFB∗F1 of 0.01, and CFB∗S1 of 0.01; rare alleles account for only 0.002. CFB∗S is most common in Caucasian, Mongoloid, and Australoid populations, and CFB∗F is most common in Negroid populations. CFB∗F1 is observed in African groups, as well as in some Caucasian populations.
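One way to summarize how informative CFB typing is in a population is the expected heterozygosity, 1 − Σp², which under Hardy-Weinberg equilibrium gives the expected proportion of heterozygous individuals. The sketch below applies it to the European Caucasian frequencies quoted above. The listed frequencies sum to 0.992 rather than 1, and rare alleles are pooled as a single class, so the result is an approximation; the function name is illustrative.

```python
# CFB allele frequencies in European Caucasian populations, as quoted in the text.
cfb_freqs = {"S": 0.77, "F": 0.20, "F1": 0.01, "S1": 0.01, "rare (pooled)": 0.002}


def expected_heterozygosity(freqs: dict[str, float]) -> float:
    """Nei's gene diversity, 1 - sum(p_i^2), over the listed alleles."""
    return 1.0 - sum(p * p for p in freqs.values())


total = sum(cfb_freqs.values())
h_exp = expected_heterozygosity(cfb_freqs)
print(f"Listed frequencies sum to {total:.3f}")
print(f"Expected heterozygosity under HWE: {h_exp:.3f}")
```

The value is dominated by the two common alleles: with only CFB∗S and CFB∗F in the population at these frequencies, roughly a third of individuals would still be expected to be heterozygous.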
Deficiency of the second component of the complement system, C2, has been reported in about 1 in 10,000 Caucasians and is the most common complement protein deficiency state in that population ( ; ). People with complete C2 deficiency are homozygous for null C2 alleles (C2∗Q0, for quantity zero). C2∗Q0 results from a 28-bp gene deletion that generates a frameshift and a stop codon 14 bp distal to the end of exon 6 ( ). Rarely, C2 deficiency results from missense mutations that impair C2 secretion; this form has been termed C2 deficiency type II ( ). Although C2 deficiency can lead to disease, most patients (60%) remain asymptomatic. Up to 25% of homozygotes for C2 deficiency have increased susceptibility to bacterial infection due to immunoglobulin deficiency ( ; ). Between 20% and 40% of reported patients with C2 deficiency have a systemic lupus–like disease ( ; ; ). A polymorphism in the C2 gene has been correlated with reduced risk for age-related macular degeneration ( ; ).
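The quoted prevalence can be related to allele and carrier frequencies with the Hardy-Weinberg equation. This sketch assumes Hardy-Weinberg equilibrium and treats all complete deficiency as homozygosity for C2∗Q0 (ignoring the rarer type II missense forms). With a homozygote frequency of 1 in 10,000, the null-allele frequency is about 0.01 and the expected carrier frequency is about 2%, which agrees with the 2% figure given earlier for C2∗Q0 in Caucasian Europeans if that figure is read as a carrier frequency.

```python
import math

# Prevalence of complete C2 deficiency (assumed here to be homozygous C2*Q0).
homozygote_freq = 1 / 10_000

q = math.sqrt(homozygote_freq)  # null-allele frequency, assuming HWE (q^2 = homozygotes)
p = 1 - q                       # frequency of all expressed C2 alleles combined
carrier_freq = 2 * p * q        # expected heterozygote (carrier) frequency

print(f"C2*Q0 allele frequency: {q:.3f}")
print(f"Expected carrier frequency: {carrier_freq:.2%}")
```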
C4 null alleles are common in the general population. C4A∗Q0 and C4B∗Q0 result from gene deletions, premature stop codons, and other mutations that abolish expression ( ; ; ). Thirty-five percent of individuals of all races do not express one C4A or C4B gene (i.e., carry C4A∗Q0 or C4B∗Q0), 8% to 10% carry two null alleles, and less than 1% do not express three alleles. Complete C4 deficiency (no expressed C4A or C4B gene on either chromosome) is extremely rare ( ). The C4A∗Q0 allele, particularly in homozygous individuals or those with complete C4 deficiency, has been associated with systemic lupus erythematosus (SLE) ( ; ). This susceptibility is probably related to defective handling of immune complexes ( ; ). Children homozygous for C4B∗Q0 have a 3.5-fold greater incidence of bacterial meningitis ( ). Homozygous C4B deficiency has been described in families with non-SLE glomerulonephritis ( ; ; ), and the C4B∗Q0 allele is thought to increase the risk of Henoch–Schönlein purpura ( ). An increased frequency of the C4B∗Q0 allele has been found in patients with severe coronary artery disease undergoing bypass surgery compared with healthy controls ( ).
The apparent absence of homozygous factor B deficiency in humans led to the notion that the condition might be lethal during embryonic development. Based on inheritance patterns and serum levels, some heterozygotes for CFB∗Q0 have been identified by their reduced protein levels ( ), and compound heterozygous loss-of-function mutations in CFB were recently identified in a patient with recurrent pneumococcal and meningococcal infections ( ). The CFB F variant has been associated with a higher protein concentration but lower hemolytic activity than CFB S ( ). Mutations in the CFB gene have been associated with age-related macular degeneration ( ; ) and atypical hemolytic-uremic syndrome ( ).
The four genes of the complement region occupy approximately 120 kb of genomic DNA ( ). The C2 and CFB genes are located in very close proximity, separated by less than 2 kb, but CFB and C4A are separated by about 30 kb. C4A and C4B are about 10 kb apart. Alleles of the complement genes occur as a unit at the population level and show striking linkage disequilibrium in haplotypes determined by family studies. That is to say, they occur together as sets on the same chromosome more frequently than expected from the frequencies of their individual alleles and with no well-documented recombinations. For these reasons, haplotypes of the four complement loci have been called complotypes, as an abbreviation for “complement haplotypes” ( ). Although their order from telomere to centromere is C2, CFB, C4A, C4B (see Fig. 51.1), the positions of CFB and C2 are transposed for clarity in using variant alleles to designate complotypes. Thus CFB∗S, C2∗C, C4A∗Q0, C4B∗1 is a complotype that in abbreviated form is SC01. More than a dozen complotypes in Caucasians have frequencies of about 0.01 or higher (Table 51.1). In most populations, the SC31 complotype is most common. In people of African descent, FC31 is common, as is SC42 in East and Southeast Asians.
Complotype a | Frequency |
---|---|
SC31 | 0.430 |
SC01 | 0.127 |
FC31 | 0.096 |
SC30 | 0.053 |
SC42 | 0.040 |
SC61 | 0.034 |
FC30 | 0.031 |
FC01 | 0.029 |
SC02 | 0.029 |
SC21 | 0.022 |
SB42 | 0.019 |
SC33 | 0.014 |
SC2(1,2) b | 0.013 |
SC32 | 0.011 |
a Complotypes are given as abbreviated letters and numbers, with the four alleles in a conventional order that differs from their genomic order: CFB, C2, C4A, and C4B.
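The abbreviation rule can be expressed as a small function: alleles are concatenated in CFB, C2, C4A, C4B order, and null alleles are written as 0. The function names are hypothetical, and the handling of duplicated C4 genes (written in parentheses, as in SC2(1,2)) is inferred from the table entries rather than taken from a formal nomenclature specification.

```python
from typing import Sequence, Union

# An allele is a single name ("S", "C", "3", "Q0") or, for a haplotype with
# duplicated C4 genes, a sequence of names such as ("1", "2").
Allele = Union[str, Sequence[str]]

NULL_NAMES = {"Q0", "QO"}  # "quantity zero"; both spellings appear in this chapter


def _abbreviate(allele: Allele) -> str:
    names = [allele] if isinstance(allele, str) else list(allele)
    codes = ["0" if name.upper() in NULL_NAMES else name for name in names]
    # Duplicated genes are grouped in parentheses, e.g. "(1,2)".
    return codes[0] if len(codes) == 1 else "(" + ",".join(codes) + ")"


def complotype(cfb: Allele, c2: Allele, c4a: Allele, c4b: Allele) -> str:
    """Abbreviate a complotype in the conventional CFB, C2, C4A, C4B order."""
    return "".join(_abbreviate(a) for a in (cfb, c2, c4a, c4b))


print(complotype("S", "C", "Q0", "1"))        # CFB*S, C2*C, C4A*Q0, C4B*1
print(complotype("S", "C", "2", ("1", "2")))  # duplicated C4B genes
print(complotype("F1", "C", "3", "Q0"))       # CFB*F1 with a C4B null allele
```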
Based on the study of the distribution of complotypes in relation to HLA-B and HLA-DR specificities on normal Caucasian haplotypes determined in family studies, striking linkage disequilibrium involving the whole region became evident. One could easily recognize HLA-B, complotype, DR allele sets that showed statistically significant three-point linkage disequilibrium ( Fig. 51.2 ) and that define what are regarded as extended haplotypes ( ). A shorthand nomenclature for extended haplotypes has been designed by enclosing HLA-B, complotype, and HLA-DR variants in brackets. Thus the most common extended haplotype in Caucasians is [HLA-B8, SC01, DR3] . Extended haplotypes are highly characteristic of an ethnic subgroup and have lower frequencies or do not occur at all in other ethnic groups ( ; ).
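For two loci, the excess co-occurrence that defines linkage disequilibrium is usually measured as D = p(AB) − p(A)p(B), where p(AB) is the frequency of the haplotype carrying both alleles; D′ rescales D by its maximum possible value given the allele frequencies. The sketch below covers only this pairwise case (the three-point disequilibrium used to define extended haplotypes generalizes it), and the frequencies in the example are hypothetical.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float) -> tuple[float, float]:
    """Return (D, D') for two biallelic loci from haplotype and allele frequencies."""
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D (Lewontin's normalization).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime


# Hypothetical frequencies, for illustration only: allele A at one locus (0.15),
# allele B at another (0.20), and the A-B haplotype observed at 0.10.
d, d_prime = linkage_disequilibrium(p_ab=0.10, p_a=0.15, p_b=0.20)
print(f"D = {d:.3f}, D' = {d_prime:.3f}")
```

Here the haplotype is seen more than three times as often as the 0.03 expected from the allele frequencies alone, which is the kind of excess the family studies above revealed.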
Initial studies defined extended haplotypes as the genomic interval between HLA-B and DR. It then became evident that other uncharacterized MHC alleles in the interval were likely to be included. Although HLA-A alleles show limited variation within extended haplotypes, only half of such haplotypes show unique and significant HLA-A allele associations ( ). The HLA-Cw locus was the last to be characterized. Based on genetic distance and previously incomplete typing data for HLA-B/Cw pairs, it was evident that conserved haplotypes would also include the HLA-Cw genetic region. Thus, different HLA-Cw alleles have recently been associated with different extended haplotypes (Table 51.2) ( ). Furthermore, several TNF and HSP allele systems and microsatellites have been studied in relation to extended haplotypes (Table 51.3). More recently, models have been created to describe the variable sizes of stretches of conserved DNA in the MHC, using the known frequencies of four different types of small blocks (<0.2 Mb) of relatively conserved DNA sequence: HLA-Cw/B; TNF; complotype; and HLA-DR/DQ ( ). Using HLA allele identification and TNF microsatellites, Yunis and colleagues have shown that some extended haplotypes extend to the HLA-A and HLA-DPB1 loci, forming fixed genetic units of at least 3.2 Mb of DNA. Intermediate fragments of extended haplotypes also exist, which are nevertheless larger than any of the four small blocks. This complexity of genetic fixity at various levels should be taken into account in studies of genetic disease association, immune response control, and human diversity.
Extended Haplotype | HLA-Cw Allele | Ethnicity |
---|---|---|
[HLA-B8, SC01, DR3, DQ2] | 0701 | Northern Europe |
[HLA-B7, SC31, DR2, DQ6] | 0702 | Northern Europe |
[HLA-B44, FC31, DR7, DQ2] | 1601 | Europe |
[HLA-B44, SC30, DR4, DQ7/8] | 0501 | Europe |
[HLA-B57, SC61, DR7, DQ2/3] | 0602 | Northern Europe |
[HLA-B14, SC22, DR1, DQ1] | 0802 | Northern Europe |
[HLA-B35, SC31, DR5, DQ3] | 0401 | Southern Europe |
[HLA-B38, SC21, DR4, DQ8] | 1203 | Ashkenazi Jews |
[HLA-B15, SC33, DR4, DQ8] | 0304 | Northern Europe |
[HLA-B18, F1C30, DR3, DQ2] | 0501 | Basques, Sardinia, Spain |
[HLA-B18, S042, DR2] b | 1203 | Northern Europe |
[HLA-B42, FC(1,90)0, DR3] c | 1701 | African |
a Data from normal Caucasoid population chromosomes from Boston.
b Data from patients with C2 deficiency. (Clavijo O, Delgado J, Awdeh ZL, et al: HLA-Cw alleles associated with HLA extended haplotypes and C2 deficiency, Tissue Antigens 52:282–285, 1998.)
c Data from chromosomes of normal black population living in Boston. (Clavijo OP, Delgado JC, Yu N, et al: HLA-Cw∗1701 is associated with two sub-Saharan African-derived HLA haplotypes: HLA-B∗4201, DRB1∗03 and HLA-B∗4202 without DRB1∗03, Tissue Antigens 54:303–306, 1999.)
HLA-DR | C4B | C4A | CFB | C2 | HSP 1B | HSP 1A | TNF e | TNF d | TNF −308 | TNF −857 | TNF −863 | TNF a | TNF b | HLA-B | HLA-Cw |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3 | 1 | 0 | S | C | 8.5 | C | 3 | 1 | 2 | 1 | 1 | 2 | 3 | 8 | 0701 |
2 | 1 | 3 | S | C | 9 | A | 3 | 3 | 1 | 1 | 1 | 11 | 4 | 7 | 0702 |
7 | 1 | 3 | F | C | 9 | A | 3 | 3 | 1 | 1 | 1 | 7, 8 | 4 | 44 | 1601 |
4 | 0 | 3 | S | C | 9 | A | 3 | 3 | 1 | 1 | 1 | 6, 7 | 5 | 44 | 0501 |
7 | 1 | 6 | S | C | 9 | A | 3 | 4 | 1 | 1 | 1 | 2 | 5 | 57 | 0602 |
5 | 1 | 3 | S | C | 9 | A | 3 | 3 | 1 | 1 | 1 | 5 | 5 | 35 | 0401 |
1 | 1, 2 | 2 | S | C | 9 | A | 1 | 4 | 1 | 1 | 2 | 2 | 1 | 14 | 0802 |
4 | 1 | 2 | S | C | 9 | A | 3 | 3 | 1 | 2 | 1 | 10 | 4 | 38 | 1203 |
4 | 3 | 3 | S | C | 9 | A | 1 | 4 | 1 | 1 | 2 | 2 | 1 | 62 | 0304 |
2 | 2 | 4 | S | 0 | 9 | A | 3 | 3 | 1 | 2 | 1 | 10 | 4 | 18 | 1203 |
3 | 0 | 3 | F1 | C | 8.5 | C | 3 | 4 | 1 | 1 | 1 | 1 | 5 | 18 | 0501 |
a The numbers or letters under each column refer to a particular allele variant of the complement, heat-shock protein (HSP), tumor necrosis factor (TNF) microsatellite, or TNF promoter polymorphisms that are associated with the human leukocyte antigen (HLA) specificities as noted. See text for explanation. For TNF −308, 1 and 2 represent the −308/G and −308/A variants, respectively. The TNF −857/C allele is noted as 1 and the −857/T allele as 2. The TNF −863/C allele is noted as 1 and the −863/A allele as 2.
Extended haplotypes, which account for at least 30% of normal Caucasian haplotypes, have a relatively fixed gross structure and DNA sequence and carry very similar (if not identical) alleles even when found in apparently unrelated individuals. Their frequency is more likely to be underestimated than overestimated because haplotypes must be determined from family studies involving thousands of unrelated index cases. The original MHC reference sequence ( ) has served as the starting point for numerous studies of common and rarer extended haplotypes ( ; ; ; ). Eight haplotypes have since been completely sequenced, and one of them, [HLA-A3, B7, DR15], was designated the new MHC reference sequence and incorporated into the reference human genome assembly ( ). The advent of next-generation (massively parallel) sequencing technologies made it possible to sequence individual genomes. However, the technologies currently in clinical use struggle with complex portions of the genome such as the MHC. These challenges are now being overcome: novel technologies can sequence the entire MHC, and some can phase the entire diploid MHC rather than only the individual HLA genes ( ; ; ; ). These studies should facilitate the identification of precise disease loci and may help us better understand events such as recombination and polymorphism.
The MHC is associated with more diseases than any other region in the human genome. Identifying the key genetic association within the MHC has proven difficult for at least four reasons: (1) the MHC has the highest gene density in the genome; (2) these many genes are often in strong linkage disequilibrium, forming complex haplotype structures; (3) extended haplotypes contain multiple equally plausible candidate loci; and (4) the associated diseases often show non-Mendelian inheritance.
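Linkage disequilibrium, reason (2) above, can be quantified. A minimal sketch, assuming two biallelic loci with known haplotype and allele frequencies, computes the standard measures D (observed minus expected haplotype frequency), Lewontin's normalized |D'|, and r². The frequencies in the example are hypothetical and are not taken from any study.

```python
import math


def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, |D'|, r^2) for allele A at one locus and allele B at another.

    p_ab: frequency of haplotypes carrying both A and B
    p_a, p_b: population frequencies of A and B (strictly between 0 and 1)
    """
    # D: excess of the A-B haplotype over what independent assortment predicts.
    d = p_ab - p_a * p_b
    # D_max: the largest |D| possible given the two allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max  # 1.0 means the alleles are maximally associated
    # r^2: squared correlation between the two loci across haplotypes.
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical frequencies, chosen only to illustrate strong positive LD.
d, d_prime, r2 = linkage_disequilibrium(p_ab=0.08, p_a=0.10, p_b=0.12)
print(f"D = {d:.3f}, |D'| = {d_prime:.3f}, r^2 = {r2:.3f}")
```

|D'| is insensitive to allele frequency, whereas r² also reflects how well one allele predicts the other; on long conserved haplotypes both can stay high across many genes, which is what makes the causal locus hard to pick out.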
We have reviewed the concepts of complotypes and extended haplotypes because they are needed to interpret the rather extensive literature on MHC disease associations. Most MHC disease associations reported to date lie within extended haplotypes. This simple fact probably explains why so many MHC-allele–associated diseases have been reported: the marker is not a single base pair but more than 3 million base pairs of conserved DNA. For example, complement alleles such as CFB∗F1 and C4B∗3 have been linked to type 1 diabetes because the extended haplotypes that carry them, [HLA-B18, F1C30, DR3] and [HLA-B62, SC33, DR4], are more common in these patients than in the general population. This association means that conserved haplotypes of genomic DNA 3 million bases or more in length carry susceptibility alleles for type 1 diabetes. Such extended haplotypes often contain multiple candidate loci, as best illustrated by psoriasis, a chronic inflammatory skin disorder. Psoriasis was associated with HLA-Cw6 over 30 years ago ( ), and many studies have since attempted to identify the causal gene or genes. Although HLA-C itself is the strongest candidate gene, multiple additional HLA class I, class II, and MICA/MICB variants appear to be independently associated with psoriasis ( ; ). Moreover, at least four genes located next to HLA-C (PSORS1C1-3 and CDSN) are expressed in skin, providing additional candidate genes for this complex disease ( ; ). It is also critical to know the ethnic distribution of an associated MHC allele, to determine whether the allele is truly increased in patients compared with ethnically matched control populations or merely reflects the ethnic distribution of the disease ( ).
Furthermore, genes responsible for disease are more difficult to map in Caucasian Americans than in African Americans or Asian Americans because genetic diversity is lower in Caucasians, so long conserved haplotypes spanning many candidate loci are more common and harder to break apart.
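The comparison of patients with ethnically matched controls described above is usually summarized as an odds ratio from a 2 × 2 table of marker carriers and noncarriers. The sketch below uses Woolf's log-based confidence interval and a Haldane 0.5 correction for empty cells; these are common choices, not necessarily the methods used in the studies cited here, and the counts are hypothetical.

```python
import math


def odds_ratio(case_pos: int, case_neg: int, ctrl_pos: int, ctrl_neg: int, z: float = 1.96):
    """Odds ratio and Woolf confidence interval for a 2 x 2 case-control table.

    case_pos / case_neg: patients who do / do not carry the MHC marker allele
    ctrl_pos / ctrl_neg: ethnically matched controls who do / do not carry it
    z: normal quantile for the interval (1.96 gives roughly 95%)
    """
    cells = [case_pos, case_neg, ctrl_pos, ctrl_neg]
    if 0 in cells:
        # Haldane correction: add 0.5 to every cell so the ratio stays finite.
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Hypothetical counts: 60 of 100 patients and 20 of 100 matched controls carry the allele.
ratio, lower, upper = odds_ratio(60, 40, 20, 80)
print(f"OR = {ratio:.1f} (95% CI {lower:.2f}-{upper:.2f})")
```

If the controls are drawn from a population in which the allele is simply more or less common, the same function will report an inflated or deflated ratio that reflects ethnicity rather than disease, which is the pitfall the text warns about.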
Most diseases associated with the MHC are autoimmune and do not show clear-cut Mendelian inheritance. Many problems confound attempts to understand how these diseases are inherited, and analysis of MHC markers in patients and their families has clarified the picture only marginally. A major problem with MHC gene associations is their incomplete penetrance, which makes conventional segregation and linkage studies very difficult (if not impossible) to carry out. This is seen most clearly in monozygotic twins, who presumably have identical genes: if one twin has one of these diseases (e.g., type 1 diabetes), the other twin does not necessarily have it. For type 1 diabetes, the concordance rate appears to be no higher than 50% ( ; ), which suggests that penetrance is incomplete even in a completely susceptible host. Although there is excellent reason to consider MHC genes as determinants of type 1 diabetes susceptibility, only 15% of MHC-identical siblings of diabetic patients develop insulin-dependent diabetes. Because monozygotic twins share all of their genes whereas MHC-identical siblings share only the MHC, the difference between 50% and 15% provides evidence for genes at a second, non–MHC-linked locus (or loci); the incomplete concordance of the twins themselves points to other influences, such as environmental factors, in determining susceptibility.
Families are frequently used to determine modes of inheritance, but incomplete penetrance makes it difficult to assign a specific mode and lowers the likelihood of finding families with more than one affected member in one or more generations. Another complicating factor is the inability to determine whether a group of patients is genetically homogeneous. About 5% to 6% of random families with a type 1 diabetic proband will have a second affected child. Of these sibling pairs, approximately 60% will be MHC identical, 35% will be haploidentical, and a few percent will share no MHC haplotype, compared with the 25%, 50%, and 25% expected from random Mendelian segregation. This excess of MHC-identical pairs suggests recessive inheritance of an MHC-linked susceptibility gene for the disease. At the very least, however, family studies provide highly useful haplotype data and usually allow homozygosity for an HLA marker to be assigned in probands. They have also established that the MHC association in most diseases of interest is based on linkage between a susceptibility gene and the MHC.
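The sibling-pair figures can be checked against the 25% : 50% : 25% split (two, one, or no shared parental MHC haplotypes) expected from random Mendelian segregation. In the sketch below, the sample of 100 affected sib pairs is hypothetical, and the 5 pairs sharing no haplotype are an assumed stand-in for "a few percent"; the chi-square statistic is a goodness-of-fit test with 2 degrees of freedom (critical value about 5.99 at α = 0.05).

```python
# Expected fraction of sib pairs sharing 2, 1, or 0 parental MHC haplotypes
# identical by descent under random Mendelian segregation.
EXPECTED_FRACTIONS = {2: 0.25, 1: 0.50, 0: 0.25}

# Hypothetical sample of 100 affected sib pairs using the percentages in the text;
# the 5 pairs sharing no haplotype stand in for "a few percent" (an assumption).
observed_counts = {2: 60, 1: 35, 0: 5}


def mean_haplotypes_shared(counts: dict[int, int]) -> float:
    """Average number of MHC haplotypes shared per sib pair (1.0 expected by chance)."""
    total = sum(counts.values())
    return sum(shared * n for shared, n in counts.items()) / total


def chi_square_vs_mendelian(counts: dict[int, int]) -> float:
    """Goodness-of-fit statistic against the 25:50:25 expectation (2 degrees of freedom)."""
    total = sum(counts.values())
    stat = 0.0
    for shared, fraction in EXPECTED_FRACTIONS.items():
        expected = fraction * total
        stat += (counts[shared] - expected) ** 2 / expected
    return stat


print(f"mean haplotypes shared: {mean_haplotypes_shared(observed_counts):.2f}")
print(f"chi-square vs 25:50:25: {chi_square_vs_mendelian(observed_counts):.1f}")
```

The statistic only shows that affected siblings share MHC haplotypes far more often than chance predicts; on its own it does not distinguish recessive from other modes of inheritance.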
Yet another unknown is the number of different susceptibility alleles for a disease in any specific population. In this regard, extended haplotypes can be helpful because, if increased in patients, they probably represent a single susceptibility allele that could lie anywhere in the region of fixity. However, some patients carry only portions of these haplotypes, and these fragments provide little information about the location of the specific allele involved. Furthermore, it appears likely that many MHC-associated diseases are polygenic. We will therefore first discuss diseases associated with mutations of a single MHC gene and then turn to polygenic diseases.