Applications of Genetics to Cardiovascular Medicine

Additional content is available online at Elsevier eBooks for Practicing Clinicians

Naturally occurring human genetic variation has served for decades to elucidate the root causes of disease, including cardiovascular disease. Exponential technologic advances in computation, data science, and assay development have recently enabled population-based analyses, broad clinical profiling, and direct-to-consumer genetic testing in millions of people. Because germline genetic variation is established at conception and persists throughout life, genetics offers a robust tool for causal inference, yielding broader preventive and therapeutic insights.

This chapter reviews key principles in genetics, gene discovery approaches, and diverse applications of genetic association study findings toward clinical translation (Table 7.1). The molecular structure of deoxyribonucleic acid (DNA) was described approximately 70 years ago, and the Human Genome Project completed the first draft of the human genome sequence approximately 20 years ago at an estimated cost of US$2.7 billion. Over a remarkably short period, human genetic data have become increasingly pervasive and their connection to disease increasingly understood, rapidly expanding their relevance to the practice of cardiovascular medicine. To highlight the diverse and emerging applications of genetics to cardiovascular medicine, we focus primarily on coronary artery disease (CAD), the leading cause of death worldwide.

TABLE 7.1

Translating Genetics to Cardiovascular Medicine

Bench	Bedside
Identify causal factors that influence disease	Biomarkers titratable to disease risk
Test epidemiologic associations for causal inference	Biomarkers titratable to disease risk
Penetrance estimation	Disease risk prediction
Therapeutic target prioritization	Novel therapeutic targets
Therapeutic response prediction	Maximization of therapeutic benefit
Discover and characterize the range of phenotypic consequences of therapeutic traits	Minimization of therapeutic side effects
Diverse targeting strategies	Novel medicines

Key Principles of Human Genetics

Central Dogma

Genes are encoded in DNA, a polymeric molecule with two intertwining strands of a deoxyribose-phosphate backbone surrounding a ladder of paired purine and pyrimidine bases in a double helical configuration. The purine bases are adenine (A) and guanine (G), and the pyrimidine bases are thymine (T) and cytosine (C). Purines and pyrimidines pair complementarily through hydrogen bonds across opposing strands: A-T, T-A, C-G, and G-C.

The linear DNA sequence represents its primary structure, and the base-paired double helix represents its secondary structure. Geometric and steric constraints that alter the orientation and shape of the helix give rise to its tertiary structure. Lastly, denser packing of DNA around protein anchors, known as histones, into chromatin provides the quaternary structure. Further chromatin condensation and packing yields the 22 pairs of autosomal chromosomes and one pair of sex chromosomes.

The “central dogma” of molecular biology refers to the flow of information from DNA to ribonucleic acid (RNA) to proteins. Traditionally, a gene is a DNA sequence that encodes a functional protein, and roughly 20,000 genes encoding distinct proteins have been described. Transcription copies the information in the DNA sequence into a single-stranded coding RNA, known as messenger RNA (mRNA). This polymer is structurally similar to DNA but uses uracil (U) in place of thymine (T). Of the 6.4 billion base pairs in the human genome, just over 1% lie in exons, the DNA regions that directly encode mRNA. Subsequently, translation converts the information in an mRNA into a sequence of amino acids that make up a protein, which can serve in a variety of roles (e.g., structural elements, enzymes, hormones, gene expression regulation). Variation in DNA sequence, or genotype, may influence protein function or abundance directly, through alteration of the amino acid sequence when occurring within exons, or indirectly when occurring in noncoding regions, including effects on splicing or mRNA transcript abundance. Such effects on a protein may lead to variation in an observable characteristic, or phenotype.

Epigenetics refers to phenotypic changes caused by factors beyond the DNA base pair sequence that influence the process of transcription. The most common such modification is methylation of cytosine bases, typically those in CpG dinucleotides, which generally results in reduced transcription or “silencing” of a gene. Post-translational modification of histone proteins, such as acetylation of lysine residues, can influence the accessibility of DNA sequence to the transcriptional machinery. Additionally, expressed RNA molecules that do not code for proteins, termed noncoding RNAs (ncRNAs), can yield phenotypic changes. For example, long ncRNAs can regulate transcription through several mechanisms, including interactions with the cell’s transcriptional machinery and with histone-modifying enzymes; this mechanism underlies X chromosome inactivation in mammals. MicroRNAs, another form of ncRNA, bind to complementary sequences in mRNA molecules and either suppress translation or promote degradation of the mRNA.

Heritability

Many cardiovascular diseases, including CAD, aggregate within families. When disease occurs early, shared genetic factors may play a strong role. For example, a family history of premature CAD in a parent confers a nearly twofold risk for CAD.

Heritability refers to the fraction of interindividual variability in risk for disease attributable to additive genetic variation. Heritability is a population-based construct without clear meaning for individuals. Among individuals, 99.9% of the 6.4 billion base pairs are the same; genetic analyses leverage the 0.1% that differ to understand trait or disease variation. CAD is estimated to be 40% to 60% heritable, based on family-based methods or statistical genetics approaches. For common traits studied to date, heritability typically falls in the 20% to 80% range. Traits with higher heritability are more suitable for gene discovery studies and genetic risk prediction. Remaining contributors to variability in disease risk include environmental influences, nonadditive genetic influences (epistasis), gene-environment interactions, errors in estimating relatedness or disease status, and random chance.
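Because heritability is a ratio of variances within a population, it can be illustrated with a small simulation. The sketch below assumes a purely additive toy model in which each person's phenotype is a genetic score plus independent environmental noise; the function name, variance components, and sample size are illustrative choices, not CAD estimates.

```python
import random


def _population_variance(values: list[float]) -> float:
    """Population variance (divides by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def simulate_heritability(n: int, var_genetic: float, var_environment: float, seed: int = 1) -> float:
    """Estimate h^2 = Var(G) / Var(P) under a toy additive model P = G + E."""
    rng = random.Random(seed)
    genetic = [rng.gauss(0.0, var_genetic ** 0.5) for _ in range(n)]
    environment = [rng.gauss(0.0, var_environment ** 0.5) for _ in range(n)]
    phenotype = [g + e for g, e in zip(genetic, environment)]
    # Heritability is a population-level ratio of variances, not a per-person quantity.
    return _population_variance(genetic) / _population_variance(phenotype)


# With equal genetic and environmental variance, h^2 should land close to 0.5.
print(round(simulate_heritability(100_000, var_genetic=1.0, var_environment=1.0), 2))
```

The same genetic variance yields a lower heritability in a population with more environmental variability, which is one reason heritability estimates do not transfer directly between populations or to individuals.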

Genetic Architecture

The “genetic architecture” of a disease refers to the number and magnitude of genetic risk factors that exist in each patient and in the population, as well as their frequencies and interactions. For a given individual, diseases can result from genetic variation at a single gene (monogenic), a few genes (oligogenic), or many genes (polygenic). When a defect in a single gene is sufficient to confer a large risk for disease, the condition is termed a mendelian disorder because it follows classical modes of inheritance.

Typical mendelian modes of inheritance include autosomal dominant, autosomal recessive, and X-linked. In autosomal dominant disorders, a single defective copy of a gene (most genes have two copies, one inherited from the mother and one from the father) suffices to cause the phenotype. Autosomal recessive disorders require both copies to be defective to produce the phenotype. Familial hypercholesterolemia (FH), characterized by severely elevated blood cholesterol values and markedly increased risk for premature CAD, typically results from a single pathogenic variant in LDLR (encoding the low-density lipoprotein [LDL] receptor), PCSK9, or APOB. However, if both gene copies are disrupted, a more severe phenotype occurs, and thus the inheritance pattern is termed incomplete dominance. In X-linked disorders, the defective gene resides on the X chromosome. Because men have only one X chromosome and women have two, men who carry the defective copy are affected with the disorder, whereas women tend to be unaffected carriers, with some exceptions. Fabry disease, a lysosomal storage disease sometimes manifesting as cardiomyopathy due to disruptive mutations in GLA on the X chromosome, is typically more severe in hemizygous men (who have one X chromosome, and thus one GLA copy) than in heterozygous women (who have two GLA copies). Thus, Fabry disease is not classically X-linked recessive and is generally simply termed X-linked.
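The expected genotype distribution among offspring follows from each parent transmitting either of its two alleles with equal probability. The sketch below enumerates a Punnett square for one autosomal gene; the function name and the allele labels ("D" for a hypothetical pathogenic FH allele, "d" for the reference allele) are illustrative placeholders.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Probability of each offspring genotype for one autosomal gene.

    Each parent genotype is a two-character string such as "Dd"; each parent
    transmits either allele with probability 1/2 (Punnett square enumeration).
    """
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Two heterozygous FH parents. Under incomplete dominance, "dd" is unaffected,
# "Dd" is heterozygous FH, and "DD" is the more severe homozygous form.
risk = offspring_genotypes("Dd", "Dd")
print(risk["Dd"], risk["DD"], risk["dd"])  # 1/2 1/4 1/4
```

The same enumeration describes dominant and recessive patterns; only the mapping from genotype to phenotype changes.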

The mendelian framework implies that the presence of a pathogenic monogenic variant is deterministic for disease. However, genetic profiling in large datasets enables unbiased estimates of penetrance (the likelihood that a person with a pathogenic variant has the disease) and expressivity (the variation in disease severity among carriers).
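At its simplest, penetrance is the fraction of variant carriers who have the disease. The sketch below computes that proportion with a Wilson score 95% confidence interval; the function name and the carrier counts in the usage line are invented for illustration, and real analyses must also account for age at assessment, ascertainment, and phenotype definition.

```python
from math import sqrt


def penetrance_with_ci(affected_carriers: int, total_carriers: int, z: float = 1.96) -> tuple[float, float, float]:
    """Point estimate and Wilson score interval for penetrance among carriers."""
    if total_carriers <= 0:
        raise ValueError("need at least one carrier")
    n = total_carriers
    p = affected_carriers / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return p, max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical example: 30 of 200 carriers identified in a biobank have the phenotype.
estimate, lower, upper = penetrance_with_ci(30, 200)
print(f"penetrance {estimate:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```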

Genetic Variation

Genetic architecture and phenotype largely dictate the diagnostic yield of genetic testing strategies (Fig. 7.1). Humans share the vast majority of their DNA sequence, but variation in both coding and noncoding DNA sequences contributes to distinguishing characteristics between individuals. Because of natural selection over many generations, common genetic variation tends to have modest phenotypic effects, whereas rarer genetic variation, having arisen relatively recently in human history, can have larger phenotypic effects. Common genetic variation influencing phenotypes tends to occur within noncoding regulatory elements. Coding sequence is less tolerant of genetic variation, and single base pair changes there may lead to substantial phenotypic changes.

FIGURE 7.1, Relationship between allele frequency and effect magnitude of associated variants. Genome-wide association studies, typically conducted with genome-wide genotyping arrays, identify common alleles with modest effects. Array coverage and imputation better enable the detection of lower frequency variants with intermediate effects. Rare alleles with larger effects are only detectable through genetic sequencing. Whole exome sequencing will detect the full allelic spectrum in coding regions, and whole genome sequencing will detect the full allelic spectrum across the genome.

Current clinical cardiovascular genetics practice largely focuses on the detection of coding variants predisposing to large phenotypic changes (Fig. 7.2). DNA variation within coding sequence does not necessarily alter a protein’s amino acid sequence. Degeneracy, or redundancy, in the genetic code refers to the observation that multiple codons (groups of three bases, the units of the triplet code) may encode the same amino acid. For example, a change from the codon GCA to GCG still encodes alanine; such coding DNA sequence variants without impact on amino acid sequence are termed synonymous variants and tend not to have phenotypic consequences. Other coding variants can alter a protein in a variety of ways: substitution of a single amino acid with another (missense), premature introduction of a stop codon (nonsense), scrambling of the amino acid sequence downstream of the variant site (frameshift), or insertion or deletion of amino acids. These nonsynonymous variants may have phenotypic effects ranging from negligible to profound. Nonsense and frameshift variants tend to yield greater phenotypic effects than missense variants. Also, sequence variants at canonical splice sites (the first two intronic bases after the end of each exon and the last two intronic bases before the beginning of each exon) can lead to a severely disrupted protein missing a domain encoded by an entire exon. Predicted loss-of-function, or protein-truncating, variants refer to nonsense, frameshift, or splice site variants; of note, such variants that occur near the downstream (3′) end of the coding sequence may not have a significant phenotypic effect. In silico prediction algorithms, largely weighted by assessments of evolutionary conservation of DNA sequence across gene families and across species, may help to prioritize missense variants more likely to have larger phenotypic effects.
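Whether a single-codon substitution is synonymous, missense, or nonsense can be determined mechanically from the standard genetic code. The sketch below builds the standard codon table in DNA (coding-strand) notation and classifies a codon change, plus an insertion or deletion by its length; the function names are illustrative, and the sketch deliberately ignores splice sites, nonsense-mediated decay, and in silico pathogenicity scores.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, ..., GGG ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"
    return "missense"


def classify_indel(length_change: int) -> str:
    """Insertions or deletions whose length is not a multiple of 3 shift the reading frame."""
    return "frameshift" if length_change % 3 != 0 else "in-frame indel"


print(classify_substitution("GCA", "GCG"))  # synonymous (both encode alanine)
print(classify_substitution("TGG", "TAG"))  # nonsense (tryptophan -> stop)
print(classify_indel(-1))                   # frameshift
```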

FIGURE 7.2, Protein-altering variant ontology. Key genetic variants expected to have direct impact on amino acid sequence, and therefore overall protein function, and their relationships are depicted.

Noncoding variants, although they do not directly affect the amino acid sequences of proteins, can cause phenotypic changes in other ways. A noncoding variant within regulatory elements, such as promoters or transcriptional enhancers, may result in a decreased amount of the protein product. Noncoding variants can affect the processing of RNA in other ways; for example, a noncoding variant that falls in the midst of a microRNA sequence might impair or enhance the microRNA’s ability to interact with specific mRNAs. Large-scale research efforts are cross-referencing human genetic variation with diverse regulatory and intermediate effector molecule changes across tissues to help identify mechanistic links between noncoding DNA variation and phenotypes.

Although most genetic variation consists of single base pair changes, larger DNA sequence changes may also have phenotypic effects. Viable aneuploidies (e.g., Down syndrome caused by trisomy 21) or other chromosomal abnormalities can have varied, substantial effects. Copy number variants (CNVs) involve a variable number of copies of a long DNA segment (>1000 base pairs), whereas variable number tandem repeats involve repetition of shorter nucleotide motifs. CNVs have been linked to congenital heart diseases as well as to variation in atherosclerotic cardiovascular disease biomarkers, such as lipoprotein(a) [Lp(a)].

Characterizing Human Genetic Variation

In most cases, a person has two copies of each DNA sequence because chromosomes are paired, and the two copies are known as alleles. Exceptions include DNA sequences on the X and Y chromosomes in men, whose two sex chromosomes differ substantially, and mitochondrial DNA, which is exclusively maternally inherited. For a DNA variant, the genotype is the identity of the two alleles at the site of the variant. The two alleles may be identical (homozygous) or different (heterozygous).

A series of genetic variants that occur together is termed a haplotype. After the completion of the Human Genome Project, the International HapMap Consortium performed dense sequencing of large genomic segments in hundreds of individuals and identified regions of the genome (loci) where single base pair changes, or single nucleotide polymorphisms (SNPs), commonly occur across individuals. Nearby common variants are often inherited together and exist in a state called linkage disequilibrium (LD) (Table 7.2). Because the alleles of a haplotype lie close together on a single chromosome and are rarely separated by recombination, they tend to be passed together from parents to offspring.
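LD between two biallelic SNPs is commonly summarized by the coefficient D (the excess of a haplotype over its frequency expected under independence), its normalized form D′, and the squared correlation r². The sketch below computes these from the four haplotype frequencies using the standard definitions; the function name and allele labels (A/a at the first SNP, B/b at the second) are illustrative.

```python
def ld_statistics(p_AB: float, p_Ab: float, p_aB: float, p_ab: float) -> dict[str, float]:
    """D, |D'|, and r^2 for two biallelic SNPs from haplotype frequencies."""
    if abs(p_AB + p_Ab + p_aB + p_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # frequency of allele A at SNP 1
    p_B = p_AB + p_aB  # frequency of allele B at SNP 2
    d = p_AB - p_A * p_B  # deviation from independent assortment
    # D' rescales D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return {"D": d, "D_prime": d_prime, "r2": r2}


print(ld_statistics(0.5, 0.0, 0.0, 0.5))  # perfect LD: only AB and ab haplotypes exist
print(ld_statistics(0.4, 0.1, 0.1, 0.4))  # strong but incomplete LD
```

A high r² is what allows a genotyped "tagging" SNP to stand in for a nearby untyped variant.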

Genotyping technologies directly ascertain the genotype at prespecified variant sites. A common approach to interrogate the presence of a single variant is the polymerase chain reaction-based TaqMan assay; probes are designed to specific SNP alleles, each with a different 5′ fluorophore color that is detected during amplification. More commonly, prespecified variants are assayed in multiplex on array “chips” with the capacity to assess up to 2 million variants at once. Arrays are designed based on LD patterns detected in reference sequencing studies to ensure adequate coverage of haplotypes via “tagging” SNPs across the genome. This technology is used in conventional genome-wide association studies and in most direct-to-consumer genetic testing services. Imputation, or statistical inference of genotypes not directly assayed, using data from reference sequencing studies, can infer several million additional genotypes. The imputed allele dosage (0 to 2 on a continuous scale) for each variant with frequency greater than 0.5% in the population is assigned probabilistically based on the combination of genotypes directly assayed on the array.

Sequencing technologies directly identify the order of base pairs in DNA (Fig. 7.3). Sanger sequencing, first described in the 1970s and still in routine use, uses DNA polymerase to synthesize new DNA chains from the DNA under study as a copy template, with trace amounts of fluorescently labeled chain-terminating nucleotides (four different colors for the four bases); the resulting fragments of differing lengths identify the base at each position by the color of their terminal nucleotide. Shotgun sequencing, in which random fragments of DNA are sequenced and then assembled via overlaps between the fragments, was used for the Human Genome Project. Massively parallel “next-generation sequencing” (NGS) was developed from the late 1990s through the early 2000s. In NGS, fixed DNA libraries provide templates for “sequencing-by-synthesis” in multiplex fashion. NGS can enumerate base pair changes across all 6.4 billion base pairs of the human genome (whole genome sequencing) or exclusively the protein-coding regions (whole exome sequencing). Both whole exome and whole genome sequencing are increasingly applied in population-based research analyses as well as in clinical applications. To minimize biases introduced by the templates used for NGS (such as copying errors and sequence-dependent amplification biases), novel approaches such as real-time, single-molecule sequencing platforms for long-read de novo sequencing are being explored but have not yet been applied at a scale similar to that of NGS.
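The overlap-based assembly step of shotgun sequencing can be illustrated with a toy greedy algorithm that repeatedly merges the pair of reads sharing the longest suffix-prefix overlap. This is a deliberately simplified sketch with hypothetical function names: it assumes error-free reads from a single strand and no repeated sequence, whereas production assemblers rely on far more elaborate graph-based methods and error models.

```python
def suffix_prefix_overlap(a: str, b: str, min_overlap: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if below min_overlap)."""
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Toy greedy assembler returning one or more contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # Reads wholly contained in another read add no new sequence.
    contigs = [r for r in contigs if not any(r != other and r in other for other in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = suffix_prefix_overlap(a, b, min_overlap)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no sufficient overlaps remain (e.g., gaps in coverage)
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three overlapping fragments of a hypothetical 13-base sequence.
print(greedy_assemble(["TACCGAT", "ATGGCGT", "GCGTACC"]))  # ['ATGGCGTACCGAT']
```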

FIGURE 7.3, Schematic of DNA sequencing technologies. Second generation sequencing is also referred to as next generation sequencing.

TABLE 7.2

Factors Influencing Linkage Disequilibrium

Factors	Mechanisms
Variable recombination rates	LD extent is inversely related to the recombination rate, and certain regions of the genome have higher rates of recombination than others.
Variable mutation rates	Some regions, such as CpG dinucleotides, may have high mutation rates and show little LD.
Gene conversion	During meiosis, homologous recombination between heterozygous sites may result in correction of mismatched alleles, effectively copying DNA sequence from one chromosome to its homolog.
Natural selection	Haplotypes containing favorable alleles may be quickly swept to high frequency.
Population structure	Population subdivisions promote LD patterns in humans.
Admixture	Gene flow between previously separated populations can establish new LD between nearby markers in subsequent generations.
Genetic drift	Random sampling of gametes in each generation can lead to allele frequency changes, which are more pronounced in smaller populations.

DNA, Deoxyribonucleic acid; LD, linkage disequilibrium.
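The inverse relationship between recombination and LD in the first row of Table 7.2 can be made concrete with the classic idealized expectation for a large, randomly mating population without selection, mutation, or drift: each generation, D shrinks by a factor of (1 − c), where c is the recombination fraction between the two sites. The sketch below simply applies that textbook formula; the function name, starting D, and recombination fractions are illustrative.

```python
def ld_after_generations(d0: float, recombination_fraction: float, generations: int) -> float:
    """Expected D after t generations: D_t = D_0 * (1 - c) ** t (idealized population)."""
    if not 0.0 <= recombination_fraction <= 0.5:
        raise ValueError("recombination fraction must lie between 0 and 0.5")
    return d0 * (1.0 - recombination_fraction) ** generations


# Tightly linked sites (small c) retain most of their LD over 100 generations,
# whereas loosely linked sites lose nearly all of it.
for c in (0.001, 0.01, 0.1):
    print(c, round(ld_after_generations(0.25, c, 100), 4))
```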

Gene Discovery

Family-Based Studies

Conditions that occur prematurely and aggregate in families suggest an important contribution from genetic variation. When classic mendelian inheritance patterns are observed for a suspected mendelian condition, genetic analyses to confirm the presence of a monogenic factor will have greater diagnostic yield than when such inheritance patterns are absent. For adult-onset conditions with strong genetic and nongenetic determinants, general familial enrichment may also result from polygenic or environmental factors. Phenocopy refers to a phenotype consistent with genetic predisposition but, in a given individual, largely caused by environmental conditions.

For novel syndromes or phenotypes without a known genetic basis, or when conventional genetic testing is nondiagnostic, family-based analyses may serve to discover novel implicated genes. Recruiting multiple family members both with and without the phenotype allows elimination of genotypes inconsistent with mendelian segregation. Both phenocopy and reduced penetrance may cause deviations from expected inheritance patterns, and thus large extended pedigrees are particularly valuable for such analyses.

Previously, linkage studies were used to prioritize genomic regions that tended to cosegregate with the presence of a phenotype rather than the absence of the phenotype. Classic approaches, prior to widespread use of NGS, involved genotyping hundreds of genetic markers across the genome. Cosegregation of a marker with disease in pedigrees suggested that the causal disease mutation lay within several megabases of the marker, a region that often encompasses numerous candidate genes. Positional cloning would further narrow down the region by genotyping more markers, with subsequent sequencing used to identify the causal gene.
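Linkage evidence is conventionally summarized as a logarithm of the odds (LOD) score, which compares the likelihood of the observed meioses if marker and disease locus are linked at recombination fraction θ with the likelihood under free recombination (θ = 0.5); a LOD score of 3 or more (odds of 1000:1 favoring linkage) is the traditional threshold. The sketch below covers only the simplest case of phase-known, fully informative meioses, with a hypothetical function name and invented counts; real pedigree analyses require dedicated likelihood software.

```python
from math import log10


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """LOD(theta) for phase-known, fully informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    likelihood_linked = theta ** recombinants * (1 - theta) ** nonrecombinants
    likelihood_unlinked = 0.5 ** (recombinants + nonrecombinants)
    if likelihood_linked == 0.0:
        return float("-inf")  # e.g., theta = 0 although a recombinant was observed
    return log10(likelihood_linked / likelihood_unlinked)


# Hypothetical pedigree: 1 recombinant among 12 informative meioses; scan a grid of theta values.
best_theta = max((t / 100 for t in range(0, 51)), key=lambda t: lod_score(1, 11, t))
print(best_theta, round(lod_score(1, 11, best_theta), 2))
```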

NGS, particularly whole exome sequencing, is now often used upfront for broad gene sequencing in family-based analyses. Variants annotated to disrupt protein function are prioritized if they are consistently observed among affected family members but absent among unaffected family members. Large, publicly available, multi-ethnic reference databases of allele frequencies derived from whole exome and whole genome sequencing now allow verification that a suspected disease-causing variant is absent among unrelated healthy individuals. Once the rare genetic variant thought most likely to be the causal mutation is selected, it can be confirmed by sequencing the gene in unrelated individuals who have the same disorder. If some of these individuals have variants in the same gene (either the same or, more likely, different variants), this strongly argues that the gene is responsible for the disease.
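The variant-filtering logic described above (keep candidates shared by all affected relatives, absent from unaffected relatives, and absent or extremely rare in reference populations) can be sketched with set operations. The function name, data structures, pedigree identifiers, variant labels, and the 0.01% population frequency cutoff are all hypothetical; because of reduced penetrance and phenocopies, strict all-or-none rules like these can wrongly discard the causal variant.

```python
def cosegregating_candidates(
    carriers: dict[str, set[str]],
    affected: set[str],
    unaffected: set[str],
    population_af: dict[str, float],
    max_af: float = 1e-4,
) -> list[str]:
    """Variants carried by every affected and no unaffected relative, and rare in the population.

    carriers maps a variant identifier to the set of relatives who carry it;
    variants missing from population_af are treated as unobserved (frequency 0).
    """
    candidates = []
    for variant, who in carriers.items():
        if affected <= who and not (unaffected & who):
            if population_af.get(variant, 0.0) <= max_af:
                candidates.append(variant)
    return sorted(candidates)


affected = {"II-1", "II-2", "III-1"}
unaffected = {"II-3", "III-2"}
carriers = {
    "variant_A": {"II-1", "II-2", "III-1"},  # shared by all affected relatives only
    "variant_B": {"II-1", "III-1", "II-3"},  # missing in II-2, present in unaffected II-3
    "variant_C": {"II-1", "II-2", "III-1"},  # cosegregates but is common in the population
}
population_af = {"variant_C": 0.02}
print(cosegregating_candidates(carriers, affected, unaffected, population_af))  # ['variant_A']
```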

Hypercholesterolemia and Coronary Artery Disease (see also Chapter 27)

FH afflicts approximately 1 in 300 individuals, manifesting as severely elevated blood cholesterol levels and increased risk for early-onset myocardial infarction (Fig. 7.4). Work in the 1970s and 1980s demonstrated that most cases of FH result from mutations in the LDLR gene, and subsequent studies implicated mutations in the gene for apolipoprotein B (APOB) at domains that interact with the LDL receptor.

FIGURE 7.4, Mechanisms of LDLR dysfunction leading to familial hypercholesterolemia. Numbers refer to classes of LDLR variants: (1) synthesis of receptor or precursor protein is absent, (2) formation of receptor protein is absent [2a] or impaired [2b], (3) receptor protein is synthesized normally but low-density lipoprotein binding is abnormal, (4) receptors fail to cluster in coated pits, so internalization of the receptor complex does not take place, (5) receptors are not recycled and are rapidly degraded, and (6) receptors fail to be targeted to the basolateral membrane. ApoB, apolipoprotein B; LDLR, low-density lipoprotein receptor; LDLRAP1, low-density lipoprotein receptor adaptor protein 1; PCSK9, proprotein convertase subtilisin/kexin type 9.

In the early 2000s, various studies identified families with apparently incompletely dominant FH but without LDLR or APOB variants. Linkage analyses and subsequent positional cloning identified PCSK9 as the causal gene. Sequencing studies and subsequent functional work identified two different rare gain-of-function PCSK9 variants in different families. PCSK9 increases blood cholesterol by binding to the LDL receptor and reducing the availability of the LDL receptor at the cell surface for cholesterol clearance from blood.

Also in the early 2000s, linkage and cloning analyses of families with autosomal recessive FH prioritized a large region on chromosome 1. Ultimately, homozygous mutations in LDLRAP1 (previously known as ARH, autosomal recessive hypercholesterolemia) were implicated in several families of Sardinian origin. LDLRAP1 encodes LDL receptor adaptor protein 1, which is required for endocytosis of the LDL receptor.

Metabolic Syndrome and Coronary Artery Disease

In 2007, linkage analysis of an extended family of Iranian ancestry with premature CAD and features of the metabolic syndrome identified a causal missense variant in LRP6. In vitro analyses indicated that the LRP6 missense variant disrupts Wnt signaling. More recently, the same investigators used linkage analyses in three large families of Iranian ancestry with cosegregation of premature CAD and the metabolic syndrome to prioritize a region on chromosome 19. Whole exome sequencing and focused analysis within the prioritized region identified a perfectly cosegregating missense variant in DYRK1B in all three families. Screening of morbidly obese individuals of European descent with CAD and multiple metabolic phenotypes identified a family with cosegregation of a different missense variant in DYRK1B. Functional analysis indicated that the variants were gain-of-function, promoting the expression of the gene encoding glucose-6-phosphatase.

Case-Control and Population-Based Studies

The technologic advances described earlier in this chapter allow unbiased assessments of the effects of genome-wide genetic variation on cardiovascular traits in large cohorts. Family-based analyses continue to be an efficient study design for families with apparently mendelian conditions with nondiagnostic genetic panel testing. However, heritability assessments for common conditions, such as CAD, indicate that naturally occurring common genetic variation may contribute to CAD risk broadly and not just in such exceptional families.

The design of studies using large cohorts focuses on maximizing power (likelihood of detecting true associations) to test hypotheses while minimizing the risk of detecting false associations. Power for genetic association analyses is determined by: (1) exposure (allele) frequency, (2) total sample size, particularly case count, (3) true effect of the exposure, and (4) threshold for statistical significance. Because there are approximately 1 million independent sites of common genetic variation in the human genome, a Bonferroni-corrected alpha threshold of 5 × 10⁻⁸ (0.05 divided by 1 million) for statistical significance is typically applied to genome-wide studies. Despite stringent thresholds for statistical significance used to mitigate false-positives in a single discovery cohort, putative novel associations should undergo independent replication in a validation cohort. Both population stratification (systematic allele frequency differences between subpopulations) and cryptic relatedness (greater degree of relatedness among individuals in a cohort than is assumed) may lead to spurious associations. The use of genome-wide genotyping data to adjust for ancestry and genetic relatedness may mitigate such confounding.
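How allele frequency, case count, effect size, and the significance threshold jointly determine power can be sketched numerically. The example below computes the Bonferroni threshold and an approximate power for a single-variant allelic case-control test, using a normal approximation to the log odds ratio (a Wald test) and Python's standard-library `statistics.NormalDist`. The function names, allele frequency, odds ratio, and sample sizes are illustrative assumptions, and the approximation ignores covariates, imputation uncertainty, and LD.

```python
from math import log, sqrt
from statistics import NormalDist


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test threshold controlling the family-wise error rate at alpha."""
    return alpha / n_tests


def allelic_test_power(control_af: float, odds_ratio: float, n_cases: int, n_controls: int, alpha: float) -> float:
    """Approximate power of a two-sided allelic test (Wald test on the log odds ratio)."""
    control_odds = control_af / (1 - control_af)
    case_odds = odds_ratio * control_odds
    case_af = case_odds / (1 + case_odds)
    # Each person contributes two alleles to the allele-count 2 x 2 table.
    se = sqrt(1 / (2 * n_cases * case_af * (1 - case_af)) + 1 / (2 * n_controls * control_af * (1 - control_af)))
    z_effect = abs(log(odds_ratio)) / se
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.cdf(z_effect - z_crit) + std_normal.cdf(-z_effect - z_crit)


gw_alpha = bonferroni_alpha(0.05, 1_000_000)  # 0.05 / 1,000,000
# A common allele (frequency 30%) with a modest effect (odds ratio 1.10).
for n in (5_000, 20_000, 60_000):
    print(n, round(allelic_test_power(0.30, 1.10, n, n, gw_alpha), 3))
```

Running the loop shows why modest common-variant effects require tens of thousands of cases to reach genome-wide significance.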

Two broad analytic approaches are used—the common variant association study (CVAS) and the rare variant association study (RVAS). CVAS is also termed genome-wide association study (GWAS). In a GWAS, genetic variants are sufficiently prevalent to estimate the relative difference between cases and controls or incremental change in a continuous outcome. In contrast, RVAS aims to test the collective contribution of individually rare variants to a phenotype, requiring the aggregation of rare variants into a statistical exposure unit for effect estimation.

GWASs use arrays comprising prespecified genetic variants, typically up to 2 million. Reference datasets may be used to impute 10 to 30 million additional variants depending on ancestry and reference panel. Conventional statistical models use multivariable regression frameworks to compare each variant’s allele frequency between cases and controls or to estimate its graded effect on a continuous outcome. Case-control cross-sectional designs carry a lower risk of confounding in GWASs than in observational epidemiologic studies because putative confounders are unlikely to influence the random allocation of alleles at conception. Because case count strongly influences statistical power, case-control designs are frequently used in GWASs. As broadly phenotyped mega-biobanks become increasingly available, new computationally efficient mixed model approaches to analyze unbalanced case-control phenotypes are often used. Conventional methods ignore putative genetic interactions between loci, or epistasis; to address this omission, emerging methods aim to incorporate multidimensional genetic architecture into genetic discovery. A novel discovery from a GWAS, typically a common noncoding SNP, represents just the first step in characterizing the biologic and clinical relevance of the genomic locus marked by the SNP, because the locus will often contain numerous candidate genes, any one of which could be causal. Follow-up efforts include comprehensive in silico and functional dissection to prioritize causal variants and genes toward understanding how the SNP genotype leads to the phenotype.
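At its core, each per-variant comparison contrasts allele counts between cases and controls. The sketch below computes an odds ratio, its 95% confidence interval, and a two-sided Wald P value from a 2 × 2 table of allele counts; it is a simplified stand-in for the covariate-adjusted regression and mixed models described above, performs no adjustment for ancestry or relatedness, and uses a hypothetical function name and invented counts.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def allelic_association(case_alt: int, case_ref: int, control_alt: int, control_ref: int) -> dict:
    """Odds ratio, 95% CI, and two-sided Wald P value from allele counts."""
    if min(case_alt, case_ref, control_alt, control_ref) == 0:
        raise ValueError("zero cell: use an exact test or a continuity correction")
    log_or = log((case_alt * control_ref) / (case_ref * control_alt))
    se = sqrt(1 / case_alt + 1 / case_ref + 1 / control_alt + 1 / control_ref)
    z = log_or / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    return {
        "odds_ratio": exp(log_or),
        "ci95": (exp(log_or - 1.96 * se), exp(log_or + 1.96 * se)),
        "p": p_value,
    }


# Invented counts: 10,000 cases and 10,000 controls (20,000 alleles each).
result = allelic_association(case_alt=6_400, case_ref=13_600, control_alt=6_000, control_ref=14_000)
print(result)
```

With these invented counts the association is nominally significant but falls well short of the genome-wide threshold, illustrating why a single discovery cohort of this size may miss modest effects.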

RVASs, which typically interrogate rare disruptive protein-coding variants, allow for more robust prioritization of causal genes, because any identified variants nominate the genes in which they reside. Given the infrequency of each individual variant, and the corresponding lack of statistical power, variants in the same gene are collapsed into a single statistical unit for association. Because approximately 20,000 protein-coding genes have been described in the human genome, the Bonferroni-corrected alpha threshold for exome-wide significance is 2.5 × 10⁻⁶ (0.05 divided by 20,000). Because disruptive variants within the same gene may have bidirectional functional effects (loss-of-function versus gain-of-function variants), as is the case with PCSK9 and APOB, specialized methods accounting for this phenomenon, such as the sequence kernel association test, are preferred.
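One simple way to collapse rare variants into a gene-level unit is a carrier-based burden test: each person is labeled a carrier if they harbor any qualifying variant in the gene, and carrier counts are compared between cases and controls with Fisher's exact test. The sketch below, with hypothetical function names and toy data, shows that strategy only; which variants "qualify" (for example, rare predicted loss-of-function variants) is an input assumption. Because a burden test assumes all qualifying variants act in the same direction, it loses power in the bidirectional setting described above, where variance-component methods such as the sequence kernel association test are preferred (not implemented here).

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def table_prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell with all margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    observed = table_prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum over all tables no more probable than the observed one (small tolerance for rounding).
    p = sum(table_prob(x) for x in range(lo, hi + 1) if table_prob(x) <= observed * (1 + 1e-7))
    return min(1.0, p)


def gene_burden_test(genotypes: list[list[int]], is_case: list[bool]) -> dict:
    """Carrier-based burden test for one gene.

    genotypes[i] lists person i's allele counts (0, 1, or 2) at each qualifying
    rare variant in the gene; is_case[i] is True for cases.
    """
    carrier = [any(count > 0 for count in person) for person in genotypes]
    case_carriers = sum(1 for c, case in zip(carrier, is_case) if c and case)
    control_carriers = sum(1 for c, case in zip(carrier, is_case) if c and not case)
    n_cases = sum(is_case)
    n_controls = len(is_case) - n_cases
    p = fisher_exact_two_sided(case_carriers, n_cases - case_carriers,
                               control_carriers, n_controls - control_carriers)
    return {"case_carriers": case_carriers, "cases": n_cases,
            "control_carriers": control_carriers, "controls": n_controls, "p": p}


# Toy data: two qualifying variants; all 5 cases carry at least one, no controls do.
genotypes = [[1, 0], [0, 1], [1, 0], [0, 2], [1, 1]] + [[0, 0]] * 5
is_case = [True] * 5 + [False] * 5
print(gene_burden_test(genotypes, is_case))
```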

Genome-Wide Association Studies for Lipids

Starting in 2007, GWASs have been performed on cohorts of individuals of European descent to identify SNPs associated with blood low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), triglycerides, or total cholesterol. With each successive year, an increasing number of variants and loci are newly discovered due to: (1) increased sample sizes, (2) improved coverage from successive genotyping arrays, (3) incorporation of diverse ethnicities, and (4) improvements in genotype imputation. These advances also permit the characterization and association of uncommon alleles with larger effect sizes (Fig. 7.5). To date, over 350 distinct regions of the genome have been identified to be significantly associated with blood lipids.

FIGURE 7.5, Identified lipid associations through genome-wide scans for plasma lipids. A, Compared with earlier studies, newer studies with denser arrays (including arrays enriched for coding variation), improved imputation, and larger sample sizes enable detection of variants across the allelic spectrum with more modest effects as well as lower frequency and rare variants with larger effects. B, Due to denser arrays, improved imputation, and larger sample sizes, genetic association studies for lipids continue to identify novel genomic loci associated with lipids.

Imputation and the analysis of diverse ethnicities have enabled the detection of so-called Goldilocks alleles. Such variants represent large-effect disruptive mutations with sufficiently high allele frequencies to have statistical power in population-based studies. The analysis of founder or bottlenecked populations is well suited to identifying large-effect uncommon alleles. For example, a study of nearly 120,000 adults living in Iceland used array-derived genotypes imputed to 25.3 million variants from reference genomes from Iceland. A novel rare (allele frequency 0.4%) Northern European–specific 12-base pair deletion in the fourth intron of ASGR1 (a receptor on hepatocytes for a class of glycoproteins) was found to be associated with both reduced non–HDL-C and reduced risk for CAD. Polymerase chain reaction (PCR)-based and direct sequence analyses indicated that the intronic variant disrupted ASGR1 mRNA splicing, leading to a truncated ASGR1 protein.

In addition to imputation, genotyping arrays enriched for exonic variant coverage (“exome chips”) also identify large-effect uncommon disruptive variants. Such an approach was recently applied to lipids across diverse ethnicities, yielding several novel associations. A new observation was the association of A1CF p.Gly398Ser with increased triglyceride and total cholesterol concentrations, as well as a nominal association with increased risk for CAD. Consistent with this observation, knock-in mice carrying the equivalent of the A1CF p.Gly398Ser mutation had increased triglycerides. A1CF is an RNA-binding protein that alters the splicing of mRNAs encoding enzymes involved in carbohydrate metabolism.

Genome-Wide Association Studies for Coronary Artery Disease

The first GWASs for CAD were reported in 2007, all identifying a 58-kilobase interval on chromosome 9p21 not previously recognized to be relevant to CAD and not containing any protein-coding genes (a so-called gene desert). Despite intensive efforts since the discovery of this 9p21 locus, the mechanisms by which variants in the locus influence CAD risk remain unclear, highlighting how functional interrogation of disease-associated variants in genomic regions without robust pathophysiologic hypotheses remains a formidable challenge. The list of loci associated with CAD continues to expand, with 163 loci identified to date (Fig. 7.6). Based on observed pleiotropy and prior biologic hypotheses, many loci may contribute to CAD risk through various established risk factors, and many other loci, including the 9p21 locus, may act through currently undiscovered pathways.

FIGURE 7.6, Genes mapped to known coronary artery disease loci from genome-wide association studies binned by atherosclerosis-related pathophysiologic pathways based on observed pleiotropy.

Analysis of low-frequency disruptive alleles for CAD using exome chips has also implicated new genes. SVEP1 p.D2702G (allele frequency 3.6%) was recently found to be associated with increased risk for CAD. SVEP1 encodes a large cell-adhesion molecule (sushi, von Willebrand factor type A, EGF, and pentraxin domain-containing protein 1). Interrogation of SVEP1 p.D2702G against established CAD risk factors showed that it was also associated with increased blood pressure and increased risk for type 2 diabetes mellitus. The CAD association appears outsized compared with the effects on blood pressure and diabetes mellitus, suggesting potentially novel pathways that may contribute to CAD risk.

Evidence of association across an “allelic series” (multiple alleles with diverse frequencies [common and rare] and mechanisms [noncoding and coding] linked to the same gene) increases confidence in causal gene inference. Prior evidence strongly implicated the nitric oxide–cyclic GMP pathway in CAD risk, and CAD GWASs have detected several SNPs tagging key genes in the pathway, such as NOS3, GUCY1A1 (formerly GUCY1A3, a guanylate cyclase subunit), PDE5A, PDE3A, and MRVI1. Luciferase assays for a CAD-associated noncoding variant near GUCY1A1 show that it modulates GUCY1A1 promoter activity. Prior work linked loss-of-function mutations in GUCY1A1 to increased risk for premature CAD in an extended family. Consistent with these findings, both common noncoding and rare coding disruptive alleles in GUCY1A1 influence both blood pressure and CAD risk in population-based analyses.
