The Human Genome and Neonatal Care

Key Points

Twenty percent of infant deaths in the United States and a larger portion of infant deaths in the NICU have been attributed to chromosomal and congenital anomalies, with the prevalence increasing with the expanded use of genetic diagnostic tools.
Thousands of individuals have had their entire genomes or exomes sequenced and shared, along with corresponding phenotype information. This comprehensive, linked information enhances our ability to identify genetic links with previously unexplained structural and functional disorders among newborns.
Linking genetic variations to disease is a key step to advancing understanding of pathophysiology and development, which will lead to better treatment and management strategies.
Studies evaluating use of rapid next-generation gene sequencing in diagnosis of infants with diseases of unknown etiology have begun in NICUs, and early results reveal increasingly higher rates of diagnosis with time to result now in days to weeks rather than months to years.
Ethical issues related to incidental findings arising from sequencing must be addressed as these approaches are applied in the NICU.
Identifying genes and variants associated with the more common complex disorders of prematurity (ROP, IVH, NEC, BPD) remains a challenge.
Application of genomics and genetics to clinical neonatology is evolving rapidly as prices drop and as sequencing, variant identification, and analysis of variants against a growing number of gene-disease databases become faster.

Within 50 years of the discovery of the double helix structure of DNA, the Human Genome Project achieved the major milestone of identifying an almost complete sequence of the approximately 3 billion nucleotides in the human haploid genome. Advances in sequencing technology have enabled the reporting of over 100,000 complete genomes and the characterization of millions of genetic variants in millions of individuals since the publication of these single, consensus genomes. Linking genetic variations to disease is a key step to advancing understanding of pathophysiology and development, which will lead to better treatment and management strategies. Clinicians in neonatal intensive care units (NICUs) are partnering with experts in genetics, gaining expertise, and utilizing tools such as variant panels and next-generation sequencing to diagnose causes of birth defects and other systemic disorders that become apparent in infancy. This chapter will review concepts about the human genome relevant to neonatal care providers, including (1) the approach and evolving genome-focused tools that clinicians are using to diagnose neonates with genetic disorders; and (2) a summary of work seeking to identify associations between relatively common genetic variations and complex disorders associated with preterm birth. To facilitate reading, Table 25.1 lists and defines common terms applicable to genetics and genomics in the NICU.

Table 25.1

Glossary of Terms Used in Genetics and Genomics

Allele: variant form of a gene at a certain locus

Autosomal: gene is located on one of the numbered (nonsex) chromosomes

CNV: copy number variant; indels: insertions/deletions

Curation: a term used to describe the analysis of the existing data and evidence about a specific gene or gene variant and its relation to a specific phenotype

Exome: the sum of all exons (coding sequence, splice sites, 5′ and 3′ UTRs, miRNAs) in the genome (~50 Mb, or 2% of the genome)

FISH: fluorescence in situ hybridization, which determines the presence or absence of discrete segments of DNA

Genome-wide association: in studies testing thousands or millions of variants for association with a phenotype, the statistical standard for the likelihood that a variant contributes to risk of the condition in question. In genome-wide association studies (GWAS), this threshold is often set at p < 10⁻⁸

Heritability: a quantitative measure of the extent to which genetic factors account for phenotypic variance

Human Genome: 3 billion nucleotides, with approximately 25,000 genes (+ 2× more highly conserved regions)

Linkage: cotraveling of alleles that are usually near one another; when you find one, you usually find the other. Multiple alleles traveling together form a haplotype

Locus: position on a chromosome

NGS: next-generation sequencing, which includes whole exome sequencing (WES), and whole genome sequencing (WGS)

SNP: single nucleotide polymorphism, a variation in gene sequence that is the most common type of genetic variation among people. On average, each person has 4–5 million SNPs in their genome; about one occurs every 1000 nucleotides

VUS: variants of unknown significance
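The genome-wide significance threshold listed in the glossary can be motivated as a multiple-testing correction. The sketch below is illustrative only: it assumes a Bonferroni correction, a family-wise error rate of 0.05, and a hypothetical count of 5 million independent tests. None of these figures comes from this chapter; they are chosen so that the arithmetic lands on the p < 10⁻⁸ threshold quoted above.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def is_genome_wide_significant(p_value: float, threshold: float) -> bool:
    """True when a variant's p-value falls below the corrected cutoff."""
    return p_value < threshold


# Assumed values for illustration; not taken from the chapter.
ALPHA = 0.05          # family-wise error rate
N_TESTS = 5_000_000   # hypothetical number of independent variants tested

threshold = bonferroni_threshold(ALPHA, N_TESTS)
print(f"per-test threshold: {threshold:.0e}")

# A nominally "significant" p-value of 1e-5 does not survive the correction.
print(is_genome_wide_significant(1e-5, threshold))
print(is_genome_wide_significant(3e-9, threshold))
```

The point of the sketch is that a p-value that would be convincing for a single test is weak evidence once millions of variants are tested at the same time.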

The Human Genome Project and the Big Picture of Genomic Medicine

The Human Genome Project was an international effort that included a focus on identifying the chromosomal location of normal and disease-causing genetic variants. The international effort continues with some countries’ population-based studies applying whole genome or exome sequencing to tens of thousands of individuals, mostly in relatively high-resource countries with overrepresentation of communities with European ancestry. The Global Genomic Medicine Collaborative (G2MC) was formed in 2015 to facilitate implementation of genomic medicine worldwide to improve individual and population health, with the imperative for inclusion of geographic and population diversity. More global efforts in accumulating cohorts with extensive genomic information have begun, and the G2MC has prioritized the need for provider education programs, as this is likely to be a rate-limiting step in appropriate allocation and utilization of genomic testing.

By participating in collaborative efforts, clinicians and researchers have published studies of comprehensive views of genetic variation for thousands of individuals with type 2 diabetes, breast cancer, or traits such as height or birth weight. These collaborative international efforts have created the framework for one of the greatest successes of the Human Genome Project—that is, information generated relevant to the human DNA sequence and its variation, held in public trust with open access to the scientific community through dozens of accessible databases and analytic platforms.

One major hub for clinicians that contains both clinical and scientific descriptors of genetic findings on a disease basis is located on the website Online Mendelian Inheritance in Man ( www.ncbi.nlm.nih.gov/omim/ )—a comprehensive catalogue, updated daily, of more than 16,000 described genes, including the ever-growing total of over 7000 genes with phenotyping associations with a known molecular basis, and over 4500 genes with phenotype-causing mutation. The online version, a compendium of human genes and genetic phenotypes, also provides links to online resources that provide information about genetic variants, their associations with disease, and multiple other aspects of these genes such as their commonality across species and the differences in variant prevalence between populations of different ancestry, proteins and variations in protein structures, and which genes are in which biological pathways ( http://omim.org/help/external ; www.ncbi.nlm.nih.gov/omim/ ).

The Clinical Sequencing Exploratory Research (CSER) Consortium, funded by the National Human Genome Research Institute (NHGRI) and the National Cancer Institute (NCI), is exploring analytic and clinical validity and utility, as well as the ethical, legal, and social implications of sequencing via multidisciplinary approaches. OMIM also includes a link to ClinVar, a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence ( https://www.ncbi.nlm.nih.gov/clinvar/intro/ ). These and some of the other online tools that provide information about genetic variations and links to phenotypes are listed in Table 25.2 .

Table 25.2

Online Genomics Resources

ACMG Recommendations for Reporting of Incidental Findings in Clinical Exome and Genome Sequencing: annually updated minimum list of genes that should be evaluated in individuals undergoing clinical ES/GS, based on the medical actionability of the associated condition: https://www.ncbi.nlm.nih.gov/clinvar/docs/acmg/

ClinGen: a National Institutes of Health (NIH)-funded resource dedicated to building a central resource that defines the clinical relevance of genes and variants for use in precision medicine and research: https://clinicalgenome.org/

ClinVar: freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence: https://www.ncbi.nlm.nih.gov/clinvar/

Database of Genotype and Phenotype (dbGaP): an archive of data from genome-wide association studies on a variety of diseases and conditions, accessible through NCBI: https://www.ncbi.nlm.nih.gov/gap/

DECIPHER: a database of reported copy number variants and linked phenotypes: https://decipher.sanger.ac.uk/

Ensembl: a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation, and transcriptional regulation. Ensembl annotates genes, computes multiple alignments, predicts regulatory function, and collects disease data: https://www.ensembl.org/index.html

GeneMatcher: web site that enables connections between clinicians who have a patient with a candidate or ultra-rare gene and researchers who have an interest in that gene: https://genematcher.org/

GeneReviews: a clinical resource for many genetic conditions that provides clinically actionable information including diagnosis, inheritance, and management as well as a differential diagnosis of related conditions: https://www.ncbi.nlm.nih.gov/books/NBK1116/

GenomeConnect: an online registry designed by the Clinical Genome Resource (ClinGen) for people who are interested in sharing de-identified genetic and health information to improve understanding of genetics and health: https://www.genomeconnect.org/

Human Phenotype Ontology (HPO): a standardized set of phenotypic terms; a widely used resource for capturing human disease phenotypes for computational analysis to support differential diagnostics: https://hpo.jax.org/app/116

Online Mendelian Inheritance in Man (OMIM): a searchable database of clinical features, phenotypes, and genes: https://omim.org/

Unique: a website with patient-/family-facing resources regarding chromosome and gene disorders: https://www.rarechromo.org/

University of California Santa Cruz Genome Browser: a website created initially to ensure public access to the initial human genome assembly that has since evolved to include a broad collection of vertebrate and model organism assemblies and annotations, along with a large suite of tools for viewing, analyzing, and downloading data: https://genome.ucsc.edu/

VVP (VAAST Variant Prioritizer): a tool that rapidly prioritizes genetic variants: https://github.com/Yandell-Lab/VVP-pub

The Genome and Genomics

Genetics focuses on the study of single genes and their effects. Genomics is defined as the comprehensive study of the functions and interactions of all the genes in the genome. The Human Genome Project provided key information about the genome. It quantified the total number of base pairs (approximately 3 billion), defined genes, and provided more information about how genomic DNA might be classified. The genomic DNA of eukaryotic organisms includes exons (regions of DNA that are “expressed,” or translated into protein) and introns (intervening regions not translated into protein). Despite their apparent importance, exons account for less than 2% of the DNA in the entire genome. The regions outside of the exons are increasingly recognized to play critical roles in how and when the genes are expressed. The definition of what a “gene” is changes regularly as we learn more about distant regulatory elements, DNA modifications, chromatin structure, gene/gene interactions, exon skipping, post translational modification, and a host of other factors that determine what a “gene” does.

Understanding how conserved “noncoding” sequences influence human health and disease continues to be an important area of study. Even before the sequencing of the complete human genome, we knew that there were regulatory regions of the genome close to exons, sometimes called promoter regions. These are sites where transcription factors can bind to “turn on” or “turn off” a gene’s expression. The regulatory region is usually a different set of base pairs than the actual transcription start site where transcription of mRNA is initiated. We also know that certain nucleotide triplets in mRNA act as “stop codons,” marking the point in the sequence where translation stops ( Fig. 25.1 ).

Fig. 25.1, Gene Control: Regulatory Regions.

https://www.ncbi.nlm.nih.gov/Class/MLACourse/Modules/MolBioReview/alternative_splicing.html

Beyond its functional aspects, the origins of the human genome are also intriguing. Almost half of the genome is derived from foreign DNA. These sequences entered the genome via germ cells as transposable element DNA, first described in maize by Barbara McClintock. Interestingly, we have learned that approximately 8% of the intronic base sequences are the products of human endogenous retroviruses (HERVs). These segments are DNA copies of viral RNA genomes that were inserted into the human genome over millennia. Evidence is accumulating to suggest a potential functional role of these HERVs in numerous pathologies including neurodegenerative diseases, autoimmune disorders, and multiple cancers.

As our understanding of the genome has expanded, we know that many more RNAs and proteins can be made from the DNA in the genome. While approximately 20,000 protein-coding genes are known, the principle of alternative splicing (i.e., mRNA comprised of different exons from the same gene coding different proteins) increases the estimate of the number of proteins from the 20,000 one would imagine to well over 100,000. This is particularly important in the developing neuronal system, in which over 90% of multiexon genes are alternatively spliced throughout development ( Fig. 25.2 ).

https://www.ncbi.nlm.nih.gov/Class/MLACourse/Modules/MolBioReview/alternative_splicing.html
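The way alternative splicing multiplies the number of possible transcripts can be illustrated with a deliberately simplified model. The sketch below assumes a hypothetical gene whose first and last exons are always included and whose internal exons are "cassette" exons that may each be kept or skipped independently. The exon names and the function `cassette_isoforms` are illustrative inventions. Real splicing also involves alternative splice sites, mutually exclusive exons, and intron retention, and not every combination is observed in vivo.

```python
from itertools import combinations


def cassette_isoforms(first_exon, cassette_exons, last_exon):
    """Enumerate isoforms in which each cassette exon is either kept or skipped.

    Exon order along the gene is preserved; only inclusion varies.
    """
    isoforms = []
    for n_kept in range(len(cassette_exons) + 1):
        # combinations() yields subsets in the original left-to-right order.
        for kept in combinations(cassette_exons, n_kept):
            isoforms.append((first_exon, *kept, last_exon))
    return isoforms


# Hypothetical five-exon gene with three optional internal exons.
isoforms = cassette_isoforms("E1", ["E2", "E3", "E4"], "E5")

for isoform in isoforms:
    print("-".join(isoform))

print(len(isoforms))  # 2**3 = 8 possible mRNAs from a single gene
```

Under this simplified model, a gene with k independent cassette exons can yield up to 2^k distinct mRNAs, which gives a sense of how roughly 20,000 genes could encode well over 100,000 proteins.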

Mitochondrial Deoxyribonucleic Acid

Mitochondria are the energy-processing organelles within each cell. Each mitochondrion has its own genome, distinct from the nuclear genome and thought to arise from incorporation of bacterial DNA. The mitochondrial genome is approximately 16,500 base pairs in length and encodes 37 genes. A wide range of disorders have been associated with variation in mitochondrial sequence. In addition, because mitochondria reside in the cytoplasm and sperm mitochondria are generally not passed on to the embryo, they have a unique pattern of maternal-only inheritance, in which mothers pass their mitochondria on to all their offspring, with the daughters then passing their mothers’ mitochondrial DNA on to subsequent generations. The knowledge that variants in mitochondrial DNA can cause disease, plus the progress made in techniques for molecular and cellular manipulation, has led to embryonic mitochondrial transplantation, where “healthy” mitochondria are transplanted into the cytoplasm of an early-stage embryo identified to have a lethal mitochondrial variant.

Variations in the Human Genome

Genetic variations are differences in the DNA sequence and structure among individuals. Variant types include single nucleotide polymorphisms (SNPs), copy number variants (CNVs), insertions and deletions (indels), polymorphic repeats, and microsatellite variants. Additionally, there are differences in the amount of chromosomal material, termed aneuploidies.

Single Nucleotide Polymorphisms

Single nucleotide polymorphisms (SNPs) were the first commonly characterized contributors to human genetic variation. SNPs are specific nucleotide sites in the human genome where it is possible to have two (or even three or four) different nucleotides at a specific position on a chromosome. For example, there might be either a T or a G at a specific point in an individual’s genomic sequence; in a given population, 40% of individuals might have a T at that locus and 60% a G. These variant sites are common, with up to 1% of the 3 billion base pairs of human DNA sequence varying between any two individuals, resulting in millions of SNPs across the genome. Most variation is found across all human populations, although some variants appear to be highly population or ancestry specific. Chip-based DNA sequence detection allows assays of greater than 1 million SNPs simultaneously on one individual at a cost approximating $100. Since 1999, over 500 million SNPs have been catalogued in the online public-domain resource, the Single Nucleotide Polymorphism database (dbSNP: http://www.ncbi.nlm.nih.gov/snp ). The vast majority of SNPs do not appear to be disease-causing, although they may lie adjacent to DNA changes that do contribute to disease predisposition and can be detected using the phenomenon of linkage disequilibrium, in which the genotype of one SNP correlates highly with the genotype of nearby SNPs.
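Allele frequency and linkage disequilibrium can be made concrete with a toy calculation. The sketch below assumes a small, invented set of phased haplotypes covering two nearby SNPs and uses r² (the squared correlation between alleles at the two sites), one common measure of linkage disequilibrium. The data, function names, and choice of r² are assumptions for illustration and do not come from this chapter.

```python
def allele_frequency(haplotypes, site, allele):
    """Fraction of haplotypes carrying the given allele at the given site."""
    return sum(h[site] == allele for h in haplotypes) / len(haplotypes)


def ld_r_squared(haplotypes, site_a, allele_a, site_b, allele_b):
    """Squared correlation (r^2) between two alleles across phased haplotypes."""
    p_a = allele_frequency(haplotypes, site_a, allele_a)
    p_b = allele_frequency(haplotypes, site_b, allele_b)
    p_ab = sum(
        h[site_a] == allele_a and h[site_b] == allele_b for h in haplotypes
    ) / len(haplotypes)
    d = p_ab - p_a * p_b  # deviation from independence between the two sites
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both sites must be polymorphic in the sample")
    return d * d / denominator


# Invented sample of 10 phased haplotypes; each string is (SNP 1 allele, SNP 2 allele).
# SNP 1 mirrors the text's example: 40% T and 60% G.
haplotypes = ["TA"] * 4 + ["GC"] * 5 + ["GA"] * 1

freq_t = allele_frequency(haplotypes, 0, "T")
r2 = ld_r_squared(haplotypes, 0, "T", 1, "A")

print(f"frequency of T at SNP 1: {freq_t:.2f}")
print(f"r^2 between SNP 1 (T) and SNP 2 (A): {r2:.2f}")
```

In this invented sample, every haplotype carrying T at the first SNP also carries A at the second, so genotyping one SNP is informative about the other. This is the property that lets a genotyped SNP act as a marker for a nearby variant that was not assayed directly.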
