Deep Phenotyping: Adding New Tools to Understand Disease


Defining Phenotype

The terms genotype and phenotype were introduced by the Danish plant physiologist and geneticist Wilhelm Johannsen in 1909. The genotype refers to part or all of an individual's genetic material, whereas the phenotype comprises any observable characteristic or trait resulting from the genotype in the context of environmental and stochastic variation. Any molecular (including epigenetic marks), biochemical, physiologic, or morphologic property at any time point (e.g., developmental or adult) or at any scale (subcellular to organismal), as well as any of the individual's behaviors or products of behavior, is a phenotypic characteristic. Because there are innumerable characteristics that could be used to describe an individual, the collection of phenotypes that describes the individual organism defines that individual's phenome. As a consequence of mendelian genetic principles, there was previously a notion of a linear relationship between a genotype and a phenotype, through which lens the relationship could be viewed as narrow and rather straightforward. However, it is now well appreciated that biology is considerably more complicated, and even in a single family with a mendelian disorder the relationship between genotype and phenotype is highly complex.

At the other end of the spectrum, the relationships between genome and phenome can only be described using the tools of systems biology. The relationship between genotype and phenotype can be conceptualized as a genotype-phenotype (GP) map, assigning phenotypes to each possible genotype. The GP map concept applies at any time point in a living system and describes the outcome of very complex dynamics that include environmental effects (Fig. 54.1). The relevance of these concepts to clinical genetics and clinical medicine flows from the potential for genotype to predict phenotypes that have not yet been observed, or for a series of phenotypes to predict the underlying genotype and thus anticipate the mechanism underlying the disorder. Ultimately, the most important inference from these core concepts is that many of the phenotypes downstream of a specific genotype may be latent at any given time as a result of the absence of a conditioning variable or the lack of an appropriate detection tool. The expansion of the traditional clinical toolkit to include continuous, dynamic, or perturbed phenotypes, together with the addition of completely novel phenotypes, can be described as deep phenotyping.

Fig. 54.1, Genotype to phenotype map.
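
To make the GP map concept concrete, the minimal sketch below encodes a toy map in which the observed phenotype depends on both genotype and a conditioning environmental variable; the genotype labels, conditions, and phenotype entries are illustrative placeholders rather than a clinical model.

```python
# Toy genotype-phenotype (GP) map: the observed phenotype depends on both
# the genotype and a conditioning variable (here, exposure to a sodium
# channel blocker). All entries are illustrative, not clinical data.

GP_MAP = {
    # (genotype, condition) -> set of observable phenotypes
    ("SCN5A_variant", "baseline"):       {"normal ECG"},
    ("SCN5A_variant", "sodium_blocker"): {"type 1 Brugada pattern"},  # latent until perturbed
    ("wild_type",     "baseline"):       {"normal ECG"},
    ("wild_type",     "sodium_blocker"): {"normal ECG"},
}

def observed_phenotypes(genotype: str, condition: str) -> set:
    """Return the phenotypes expected for a genotype under a given condition."""
    return GP_MAP.get((genotype, condition), {"not characterized"})

# The same genotype yields different phenotypes under different conditions,
# which is why some phenotypes remain latent without perturbation.
print(observed_phenotypes("SCN5A_variant", "baseline"))        # {'normal ECG'}
print(observed_phenotypes("SCN5A_variant", "sodium_blocker"))  # {'type 1 Brugada pattern'}
```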

Genomics

Beginning in the 1990s, the Human Genome Project ushered in a leap in technologies that allow scientists to define the fundamental sequences of the genes of entire genomes. Numerous model organisms have been studied, not just the human genome. The ability to derive, quickly and relatively inexpensively, the entire sequence of an organism's genome provides unprecedented opportunities in the biologic and ecologic sciences, including the opportunity to understand how environmental factors influence biology at the molecular level. So-called first-generation sequencing technologies, originally described by Sanger and Coulson, have served as the primary technology for DNA sequencing for the last several decades, with an estimated cost of $3 billion to sequence the human genome. Large-scale sequencing projects based on several next-generation sequencing technologies can now be conducted faster and less expensively than was possible with previous generations of technologies. Third-generation sequencing promises to provide full genome sequencing of individuals (humans or other organisms) for around $1000 per genome. Genomics is just one of numerous new fields of biologic investigation defined by their scale and comprehensiveness, including transcriptomics, proteomics, metabolomics, lipidomics, and others. Notably, the genome itself is the only rationally bounded data set in biology and, as a result, is a framework around which much of the rest of biology, whether functional genomics or single-cell electrophysiology, can be understood, in particular as environmental and other perturbations are incorporated into the characterization of biologic responses.

Genotype-Phenotype Relationships

Over the last 20 years, the amount of detail drawn onto the GP map has dramatically increased as functional genomics technologies have been deployed closer and closer to clinical care. Various permutations and combinations of genetic data along with metabolomic, lipidomic, or other data sets can be accessed in individual specimens. However, these pins in the map lead back to a relatively stagnant number of described clinical phenotypes. The extent of the human disease catalog is such that disparate genomic results typically map onto a very limited number of clinical entities. Fully understanding the relationship between genotypes and phenotypes will require much more granular clinical observations, a need that drives much deeper clinical phenotyping.

The average individual differs from a randomly chosen other human at somewhere in the range of 10⁶ to 10⁷ variants, with much greater complexity evident in most other "omics" data sets. However, we have a phenotypic language of only a few thousand distinct phenotypes. The OMIM Gene Map Statistics (updated July 3, 2020) suggest that the total number of phenotypes for which the molecular basis is known is 6685, whereas the Human Phenotype Ontology suggests around 12,000 or more. These crude estimates give a sense of the extent of the information content missing from traditional clinical assessments (Fig. 54.2).

Fig. 54.2, Numerical mismatch between the volume of panomic data and the number of described phenotypes.
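
A back-of-the-envelope calculation using the figures quoted above makes the scale of this mismatch explicit; the sketch below simply compares the quoted variant counts with the sizes of the OMIM and Human Phenotype Ontology vocabularies.

```python
# Order-of-magnitude comparison of per-individual genomic variation with the
# available phenotype vocabulary, using the figures quoted in the text.

variant_counts = (1e6, 1e7)   # variant differences between two random individuals
vocabularies = {
    "OMIM phenotypes with known molecular basis": 6_685,
    "Human Phenotype Ontology terms (approx.)":   12_000,
}

for n_variants in variant_counts:
    for name, n_terms in vocabularies.items():
        ratio = n_variants / n_terms
        print(f"{n_variants:.0e} variants vs {n_terms:,} {name}: "
              f"~{ratio:,.0f} variants per phenotype term")
```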

Phenotypes and Phenomics

In the last few years, the advent of large data sets from electronic health records (EHRs), the emerging use of wearable and other scalable personal technologies, the design of new continuous monitoring sensors, the potential for clinical use of functional genomics, and the availability of data science capabilities to analyze the resultant massively parallel data streams have together given rise to the field of phenomics. Phenomics is the use of large-scale and unbiased approaches to study biologic systems. Phenomics can be applied across the full range of biologic sciences, from studies of monocultures in controlled laboratory environments, through agricultural field conditions, to populations of organisms under rapidly changing conditions. Thus phenomics has broad importance in applied and basic biology and is equally relevant to goals as disparate as yield improvement in food and energy crops, disease processes, therapeutics, and the understanding of complex networks that control fundamental life processes. The information content necessary to match genomic complexity is likely to emerge from nontraditional sources, if only because it will be difficult to increase information content by the requisite orders of magnitude without incorporating data already collected for other purposes. Another critical feature of emerging phenomic data is its longitudinal collection and enrichment through ongoing cycles of perturbation. This concept of relatively low-cost, immersive capture of data that enables the parsing of relatively sparse phenotypic clusters (such as clinical syndromes) is inherent to phenomics and can be imagined as an efficient stratification tool for disease, risk of specific outcomes, or other relevant end points.
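
As a minimal illustration of how longitudinal, low-cost data streams might be summarized and then used for stratification, the sketch below reduces synthetic wearable-style time series to a few summary features and clusters them; the features, cluster count, and simulated data are arbitrary assumptions for illustration, not a validated pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic longitudinal streams: daily resting heart rate for 100 subjects
# over 90 days (purely simulated, for illustration only).
streams = rng.normal(loc=70, scale=8, size=(100, 90))

# Reduce each stream to a few summary features: level, variability, and trend.
days = np.arange(90)
features = np.column_stack([
    streams.mean(axis=1),                    # average level
    streams.std(axis=1),                     # day-to-day variability
    np.polyfit(days, streams.T, deg=1)[0],   # per-subject slope over time
])

# Unsupervised stratification into tentative phenotypic clusters.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
print(np.bincount(labels))   # subjects per cluster
```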

Perhaps no subfield of medicine is better placed than cardiac electrophysiology to participate in deep phenotyping, given the precision of analysis possible with even the surface electrocardiogram (ECG) and decades of experience with telemetry, device programming, and domain-specific analysis. The widespread use of perturbations (e.g., ajmaline) to elicit latent phenotypes and systems approaches to analysis also form the basis for much of phenome science.
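
As one concrete example of a continuous, quantitative phenotype that the surface ECG already provides, the sketch below computes the heart-rate-corrected QT interval using the standard Bazett formula (QTc = QT/√RR, with RR in seconds); the beat-level measurements are invented for illustration.

```python
import math

def qtc_bazett(qt_ms: float, rr_ms: float) -> float:
    """Heart-rate-corrected QT interval (Bazett): QTc = QT / sqrt(RR in seconds), returned in ms."""
    return qt_ms / math.sqrt(rr_ms / 1000.0)

# Hypothetical beat-by-beat (QT, RR) measurements in milliseconds, such as
# might be extracted from a continuous ECG or wearable stream.
beats = [(400, 850), (410, 900), (395, 800)]
qtc_values = [qtc_bazett(qt, rr) for qt, rr in beats]

# Treating the distribution of QTc, rather than a single reading, as the phenotype.
print(f"mean QTc = {sum(qtc_values) / len(qtc_values):.0f} ms")
```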

Precision Medicine: Getting to Mechanism at the Level of the Individual Patient

The central concept underlying precision medicine is a mechanistic understanding of each disease and its response to therapy sufficient to direct a specific intervention, all ideally at the level of the individual patient. Executing on this vision requires parsing incompletely defined disease entities into discrete mechanistic subsets and developing interventions that precisely address each of these etiologically distinct entities. Fig. 54.3 schematizes the current and future methods of using phenomic resolution. In the current environment, where diagnoses are essentially organ based, phenomenologic, and only rarely mechanistic, drug development has become so expensive that it is now impractical to imagine the cost-effective creation of new interventions for many prevalent chronic conditions. The vision of precision medicine also argues for a much more seamless integration of research and development with clinical care, in which shared taxonomies will enable every clinical interaction to inform our practical understanding of disease mechanisms and drug responses. Ideally, this would be executed in ways that drive real-time and real-world discovery, innovation, translation, and implementation.

Fig. 54.3, Current low-throughput methods result in biomedical knowledge gaps, with clinical decisions based on incomplete data, and treatments tailored through crude or empirical criteria.

Only in oncology, where at least some of the biology is accessible through surgical excision of the diseased tissue or liquid biopsy, has "co-clinical" modeling proven feasible. In most common germline disorders, although genetics often reveals causal variation, substantial barriers remain to efficient disease modeling at the level of an individual patient or family. Aggregation of similar disorders under single diagnostic labels for clinical trials or other purposes has directly limited etiologic and mechanistic understanding by reducing the resolution of downstream studies. Existing clinical phenotypes are typically anatomic or physiologic and result in a substantial mismatch between what it is possible to collect routinely in humans and what we know from animal models. This lack of one-to-one mapping of high-resolution phenotypes between disease and animal models causes a failure of translation and is another form of "phenotype gap."

Deep Phenotyping

There are a number of possible approaches to creating a deep phenotyping infrastructure, but in general the most important theme, directly inferred from the results in genomics, is the exclusion of as much a priori bias as possible. Enabling immersive disease surveillance, disease classification, and ongoing monitoring of risk, therapeutic response, or other outcomes requires transformation of the resolution and capabilities of clinical data collection. A contrast between deep phenotyping and broad phenotyping is often drawn, yet the distinction is artificial: we may need deep phenotyping to understand mechanism in a specific disorder, but broad phenotyping is necessary to capture relevant information across heterogeneous diseases and to match the full potential of genomics. Ideal phenotypes will be standardized, continuous, and linear over several log orders; will have cellular or molecular representation; and will explicitly balance depth and breadth. Unless there is an effort to move our assessment of disease more proximate in the pathophysiologic chain, it will be difficult to achieve the resolution necessary to elucidate discrete disease mechanisms in a truly predictive manner.
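
One simple way to render a phenotype that spans several log orders "standardized, continuous, and linear" is to log-transform it and z-score it against a reference distribution. The sketch below is a generic illustration under that assumption; the biomarker values and reference cohort are synthetic.

```python
import numpy as np

def standardize_log_phenotype(values, reference):
    """Log-transform a positive-valued phenotype and z-score it against a reference cohort."""
    log_vals, log_ref = np.log10(values), np.log10(reference)
    return (log_vals - log_ref.mean()) / log_ref.std()

# Synthetic biomarker spanning several orders of magnitude.
reference_cohort = np.random.default_rng(1).lognormal(mean=2.0, sigma=1.0, size=500)
patients = np.array([5.0, 50.0, 500.0, 5000.0])

print(standardize_log_phenotype(patients, reference_cohort))
```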

Defining at the time of diagnosis, with high probability, the likelihood of future episodic problems such as sudden death will require physicians to capture, understand, and manage much more information than is currently available. Advances in the scale of data capture and analysis will also change the role of physicians in measuring and managing risk, a task they currently perform on an episodic basis usually defined by "clinical events" but that will ultimately become a continuous responsibility.

Improving What We Measure in Complex Diseases

Inherent in the practice of medicine is the collection of data to characterize a patient's condition. Symptoms, physical examination, vital signs, laboratory testing, and biomedical imaging are combined to arrive at a diagnosis, with subsequent refinement using additional diagnostic testing ranging from simple biomarkers to advanced imaging modalities. The resultant disease categories into which patients are placed can subsequently yield a quite specific understanding of their longer-term trajectory. For example, patients with an initial episode of atrial fibrillation (AF) will regularly undergo electrocardiographic monitoring and cardiac imaging as we make the logical connection between electrophysiologic function and cardiac structure. Based on epidemiologic data, we may also incorporate biochemical evaluation of the thyroid axis. However, at any given time, on the basis of the prevailing conceptual framework of disease, other investigations may seem unrelated or out of place. Only in recent decades has it seemed rational to perform overnight polysomnography in patients with AF; proposing such testing 30 or 40 years ago would have been counterintuitive to our understanding. Likewise, auditory evaluation may seem completely unconnected to the evaluation of AF, yet emerging data suggest that auditory electrophysiology might offer insights into both the diagnostics and therapeutics of AF. Specialty medicine inherently limits the evaluation of global phenotypes. Rigorous efforts to comprehensively evaluate patients along multimodal and orthogonal lines of investigation may be required to truly understand downstream phenotypes. In the short term, individual physicians will adapt to new data types, but in the longer term substantial changes in provider roles will emerge as data science and analytics assume a greater role in the management of health and disease. Already, device monitoring is beginning to include adjacent conditions, as cardiac platforms such as pacemakers and implantable cardioverter defibrillators (ICDs) serve as templates for implantable systems in many fields.
