Functional genomics is a field of molecular biology that describes gene function and interactions and the biologic-based technologies that permit molecular analysis and manipulation of complex neural systems. These technologies range from disease mutation discovery to methods for in vivo correction of said mutations. With these approaches, it is possible to assess neuronal biology from individual genes to cellular physiology.
Genomic DNA must have an open-chromatin structure to elicit gene transcription. Chromatin structure, once thought to be static, is instead dynamic and changes with age, disease, and pharmacologic challenge. Chromatin structural dynamics are being assessed for therapeutic targeting.
RNA is transcribed from open chromatin, giving rise to RNAs that are structural, regulatory, and translated into protein. Changes in the transcriptomics landscape of cells can give rise to a brain tumor oncogenic phenotype. Analysis of the tumor transcriptomics state has progressed from analysis of ensembles of cells to assessing the transcriptome of single individual tumor cells. Such high-resolution transcriptomics analysis has highlighted the variation in transcriptome profiles between seemingly identical cells, suggesting that cell-to-cell transcriptome plasticity may be an important component of CNS function.
As mutations are identified that give rise to disease-associated neuronal gene function, it would be therapeutically desirable to correct such mutations. Although gene therapy such as gene addition has been used successfully in some diseases, overexpression and other artifacts can complicate the therapeutic efficacy of such approaches. The advent of CRISPR-Cas9 and its variations has made it possible for somatic cells to be genetically modified at specific mutations to produce a “normal” gene in vivo. There have been many advances in understanding and manipulating the CRISPR-Cas9 system so it can be used to mechanistically understand biologic processes and correct abnormal gene function. Advances in using this technology in the CNS have benefited from close collaborations between neuroscientists, genomicists, and clinicians.
Identification of gene defects is relatively straightforward, but gaining mechanistic insight into how these mutations result in disease physiology requires manipulation of the system, including manipulation of individual, circuit-associated cells. Pharmacologic manipulation of cells has been used for decades to gain insight into mechanism, but now techniques such as chemogenetics (using artificial receptors and ligands to manipulate specific cells) and optogenetics (using light to control individual cells) are providing high-resolution means to modulate selected in vivo cells to better understand the cells' underlying physiology. An important component of such functional genomics analysis is the efficiency and specificity of transgene delivery to cells of interest. New viral vectors and other approaches are making this process more efficient and providing a means to potentially use these approaches as therapeutic modalities.
The histologic and physiologic observations of Ramon y Cajal on the structure and connections of neurons have shaped one of the most enduring lines of scientific inquiry in neurobiology: understanding the principal mechanisms by which the fate or physiologic state of central neurons is specified during development at the correct time and place to facilitate the quadrillion synaptic connections that are established, maintained, and remodeled. These arrays of neural networks establish and codify our perceptions and other cognitive functions. To do so, synapses bring together in apposition specialized morphologic structures of the presynaptic (axonal) and postsynaptic (dendritic) subcellular neuronal domains that are often ensheathed by the end feet of astrocytes. Our understanding of neuroscience in these molecular terms has been developed over the past several decades with one of the remaining fundamental challenges: understanding how the precision and strength of these connections relies on the pinpoint placement of gene products in each of these cellular compartments.
At his acceptance speech for the Nobel Prize in Medicine in support of the neuronal doctrine of His, Forel, and Nansen in 1906, Cajal emphasized:
“However, it must be said that some of the physiologic inferences drawn from observations made by the elective methods of these last twenty-five years have been contended, and naturally cannot be considered as unimpeachable dogmas. Present-day science, in spite of its well-founded conclusions, has not the right to foretell the future. Our assertion can go no further than the revelations of contemporary methods. Perhaps, with time, technique will discover some coloration process capable of revealing new and more intimate connections between neurons thought to be in contact.”
Indeed, over the past decade, scientific innovation has produced a rapidly evolving set of molecular techniques for understanding the dynamics of gene expression. The “omics” breakthrough has been critical not only to understanding the mechanistic underpinnings of normal development, but also to defining the role that a gene or sets of genes play in neurological functions and diseases. In past editions, we highlighted the core principles of molecular and cellular biology that steered the development of past methods. In this edition, we chose to highlight a set of experimental approaches from the contemporary molecular neuroscientist’s toolbox that have emerged over the past decade and have provided the ability to quantitatively examine and manipulate the functioning of neuronal systems.
Continuing efforts toward miniaturization and scalability epitomize the new “omics” technologies that are transforming nervous system studies by allowing data-rich and detailed characterizations of the molecular mechanisms underlying cell physiology at ever-smaller scales, even to individual synapses. At its core, functional genomics aspires to integrate data from the study of different molecular strata—the genome, transcriptome, proteome, metabolome, and their regulatory mechanisms—into a systems-level understanding of cell biology. The ostensible goal is to obtain a richly detailed, global understanding of the nervous system’s emergent properties through the interactions among its constituent elements. In doing so, it promises to expand our insight into the fundamental functioning of the CNS and root problems of complex diseases that will transform the current predictive power of our diagnostic and therapeutic regimens.
Concurrent with the rise of functional genomics has been the development of massively parallel, multiplexed gene expression analysis and direct sequencing technology. Exploiting the now well-worn concept of molecular hybridization between Watson-Crick nucleotide base pairs, commercially available microarrays and the more clinically utilized sequencing panels used in routine diagnostic platforms with reporter-based readout mechanisms are direct outgrowths of methodologies first conceptualized by Sol Spiegelman and Edwin Southern. Genome-wide association studies (GWAS) using single-nucleotide polymorphism arrays identify genetic variants across different individuals to ascertain if any variant is associated with any trait. Since the landmark GWAS for age-related macular degeneration, more than 3700 GWAS have been published identifying greater than 52,000 unique single-nucleotide variant (SNV)-trait associations at a genome-wide significance (P < 5 × 10⁻⁸). The significance of an unexpectedly large number of these gene variants has been confirmed in follow-up studies. In the CNS, the overall strength of these studies, most performed with microarrays, has been the ability to successfully identify risk loci for multifactorial neuropsychiatric diseases including schizophrenia, major depressive disorder, and anorexia nervosa; degenerative diseases such as Alzheimer disease; sleep disorders such as insomnia; and complex cognitive traits such as educational attainment. In some GWAS, genes within the risk loci that had not been previously suspected to contribute to the underlying pathophysiology of the disease have led to the discovery of novel mechanisms of action that phenotypically converge in pathogenesis.
For example, genome-wide analysis identified risk susceptibility polymorphisms in a previously unknown autophagy-related pathway in Crohn disease that is distinct from SNVs at various risk loci associated with immunity, host defense against microbes, and gut homeostasis. Altshuler et al. argue that this type of discovery, providing novel insights about mechanisms of disease, is the primary value of gene mapping as opposed to risk prediction because of its capacity to generate testable strategies for prevention, diagnosis, and therapy. It is also now understood, as a result of GWAS profiling of heterogeneity across different ethnicities, that these types of risk loci can vary in frequency and penetrance for the same complex disorder. In type 2 diabetes mellitus (T2DM), separate genetic polymorphisms (i.e., TCF7L2 in Europeans, TBC1D4 in Greenlanders, KCNQ1 in East Asians, and SLC16A11 in Mexicans) are thought to be a reflection of ethnic differences in risk allele frequency and appear to have the greatest effect size on T2DM susceptibility.
The cumulative impact of genome-wide association analysis has also led to some reconsideration of how precisely genetic variations contribute to phenotype expression. Historically, oligogenic disorders, or traits, are thought to arise from the sum effect of low-penetrant common variants. This hypothesis, more colloquially referred to as the common disease, common variant hypothesis, is contrasted with Mendelian diseases that stem from highly penetrant, rare, single-gene variants. There is gathering evidence that a strict distinction between these two types of genetic architectures is too simplistic. Contrarian cases to the common disease, common variant hypothesis have been observed in traits affecting plasma cholesterol levels or the proliferation and differentiation of hematopoietic progenitor cells, in which both common and rare low-frequency variants exert notable effect sizes. When combined with observations from Mendelian disorders showing how common variants can modify disease severity, genetic architectures appear to present on a spectrum between two poles. As a result, it is common to use the rare, single-gene disruptive mutations of the Mendelian form of a disease as a starting point to gain greater insight into new molecular pathogenesis pathways of more common diseases that share some trait component.
Despite these advances, there is still considerable debate surrounding GWAS. Concerns persist that GWAS SNV loci explain only a small minority of the heritability of complex traits, may represent spurious associations caused solely by sampling differences arising from cryptic population stratification, do not precisely identify causal variants and genes, and uncover too many risk loci to result in testable models. Particular criticism has centered on the claim that SNVs incompletely explain the estimated heritability of most complex disorders or traits. For example, a classically quantitative trait such as human height has an estimated heritability of 80%, but the loci identified in early GWAS confer relatively small individual contributions, explaining only about 5% of the phenotypic variance. Although several reasons for the missing heritability have been posited, one is that conventional GWAS microarray approaches generally use a Bonferroni correction to maintain the genome-wide false-positive rate of 5%, assuming 1 million independent tests for common genetic variation. This significance threshold underpowers GWAS and limits its ability to depict a more comprehensive representation of heritability explained by SNVs, in part by excluding risk loci of modest effects that cannot exceed such stringent significance criteria. However, recently implemented imputation methods that can accurately estimate genotypes, or genotype probabilities, at markers that have not been directly examined by GWAS find negligible missing heritability for complex traits such as height or body mass index.
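The genome-wide significance threshold cited above follows directly from the Bonferroni arithmetic: a 5% family-wise false-positive rate divided across 1 million assumed independent tests gives a per-variant cutoff of 5 × 10⁻⁸. The short Python sketch below illustrates only that calculation; the function names and the example P values are hypothetical and are not taken from any particular GWAS software.

```python
def bonferroni_threshold(family_wise_alpha: float, n_tests: int) -> float:
    """Per-test P-value cutoff that holds the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return family_wise_alpha / n_tests


def genome_wide_significant(p_values, family_wise_alpha=0.05, n_tests=1_000_000):
    """Flag each P value that falls below the Bonferroni-corrected cutoff."""
    threshold = bonferroni_threshold(family_wise_alpha, n_tests)
    return [p < threshold for p in p_values]


# 5% genome-wide false-positive rate, 1 million independent tests (as in the text)
threshold = bonferroni_threshold(0.05, 1_000_000)
print(f"{threshold:.1e}")  # 5.0e-08

# Hypothetical P values: a locus of modest effect (1e-6) fails the cutoff
print(genome_wide_significant([1e-9, 1e-6]))  # [True, False]
```

The second example P value shows the point made in the text: an association at P = 10⁻⁶ would be striking in a single test but is excluded by the genome-wide criterion.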
Structural variation (SV) in the genome encompasses a spectrum of quantitative chromosomal rearrangements ranging from microscopically visible chromosome anomalies to submicroscopic deletions. The totality of indels (insertions/deletions), duplications, inversions, and translocations is thought to account for an order of magnitude more polymorphic base pairs than single-nucleotide differences. Unbalanced rearrangements, or copy number variations (CNVs), which alter the diploid status of human DNA by deletion or duplication of segments of the genome, are ubiquitous, represent the largest fractional component of SVs, and induce a spectrum of phenotypic effects from adaptive traits to embryonic lethality. When SVs are found in the germline, they are an abundant source of congenital disease, regardless of whether the rearrangement affects whole chromosomes (e.g., Patau or Down syndromes) or produces microdeletion and microduplication syndromes (e.g., Charcot-Marie-Tooth syndrome type 1A), but when found in somatic cells, SV enrichment is a recurring point of reference across a variety of tumor types. The pathogenicity of SVs that disrupt or duplicate the coding sequence is currently interpreted through the lens of gene dosage effects. Moreover, a series of case studies has highlighted the pathogenic effects of SVs localized to noncoding regions of the genome through so-called position effects on gene cis-regulatory elements such as promoters, enhancers, and insulators.
Clinical evaluation has increasingly relied on genome-wide cytogenetic tests such as array comparative genomic hybridization (CGH) platforms for molecular karyotyping and CNV discovery, particularly in assessments of children with learning disabilities and rare pediatric syndromes. Array CGH platforms have become experimental workhorses by combining fluorescence techniques with genomic microarray technology to allow the comparison of genomic differences in two differentially labeled samples, a test genome (the patient) and a reference genome (the control), for which the normalized signal ratio between them is a proxy for copy number. Although pathogenic CNVs are detectable at any position within the genome, these cytogenetic platforms cannot detect balanced rearrangements, are rarely effective in the context of genetic mosaicism, and are limited in dynamic range of sensitivity and therefore resolution by technical features inherent to microarray hybridization kinetics. Underscoring these concerns, a recent meta-analysis of array CGH studies evaluating the diagnostic and false-positive yields of these platforms suggests that only ∼10% of patients with rare pediatric diseases can be diagnosed using these approaches. In contrast with the relatively low resolution of array CGH techniques, massively parallel short-read sequencing technology via whole-genome or whole-exome sequencing combines the benefits of visualizing high-resolution, individual sequence variation with genome-wide cytogenetic tests. For diseases with a heterogeneous ontogeny, such as infantile epileptic encephalopathy or Bardet-Biedl syndrome, in which SVs spread across any one of several dozen genes may be causal, next-generation sequencing enables a fast and simultaneous karyotyping alternative.
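The idea that a normalized test-to-reference signal ratio serves as a proxy for copy number can be made concrete with a log2 ratio. In a diploid region, a single-copy gain ideally yields log2(3/2) ≈ 0.58 and a single-copy loss yields log2(1/2) = −1. The Python sketch below is a minimal illustration under stated assumptions: the probe names, intensities, and the ±0.3 calling cutoffs are hypothetical choices for the example, not values from any array CGH platform.

```python
import math


def log2_ratio(test_signal: float, reference_signal: float) -> float:
    """Log2 of the normalized test/reference intensity ratio for one probe."""
    return math.log2(test_signal / reference_signal)


def call_copy_state(log2r: float, gain_cutoff: float = 0.3,
                    loss_cutoff: float = -0.3) -> str:
    """Classify a probe; the cutoffs are illustrative assumptions only."""
    if log2r >= gain_cutoff:
        return "gain"
    if log2r <= loss_cutoff:
        return "loss"
    return "neutral"


# Hypothetical normalized intensities: (test genome, reference genome)
probes = {
    "probe_A": (1500.0, 1000.0),  # ratio 1.5, consistent with 3 copies vs 2
    "probe_B": (500.0, 1000.0),   # ratio 0.5, consistent with 1 copy vs 2
    "probe_C": (1020.0, 1000.0),  # ratio near 1, no copy number change
}

calls = {name: call_copy_state(log2_ratio(test, ref))
         for name, (test, ref) in probes.items()}
print(calls)
```

Because a balanced rearrangement leaves the ratio at 1 (log2 ratio of 0), this readout cannot register it, which is the limitation noted above.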
Moreover, next-generation sequencing techniques are particularly suited for analysis of small indels, as observed in the aforementioned infantile epileptic encephalopathy, because of their ability to detect these SVs with high sensitivity. Although the short-read technology has the potential for highly comprehensive maps of SV rearrangements, it lacks the ability to detect breakpoints in repetitive regions, where many breakpoints occur. As a screen for validation of disease states, optimized gene panels sequenced at high depth have become prevalent in the clinic. The range of uses for sequencing panels varies from screening for fetal aneuploidies in utero through DNA-based assays of maternal blood to distinguishing disease states in high-risk breast cancer patients with BRCA1/2 mutations for treatment and prevention decisions. They are also useful beyond diagnostics: once disease has occurred, they can be used to identify clinically actionable mutational profiles, as observed in malignancies, that then encourage subject enrollment in clinical trials.
Although next-generation sequencing with whole-exome or whole-genome sequencing is amenable to detecting CNVs, widespread application to patient diagnostics has been met with challenges in computational approaches and the relatively high cost of the deep sequencing that is required. There are four main bioinformatics methods for detecting CNVs within next-generation sequencing datasets: read-pair, split-read, read-depth, and assembly-based methods. Although each method has its own set of advantages and limitations, none of the four methods alone is sufficiently comprehensive. For example, although read-depth methods are best suited for detecting absolute copy number, they are often poor at detecting small CNVs, such as exon deletions or duplications, which underlie such conditions as Duchenne muscular dystrophy and spinal muscular atrophy type 1. Bioinformatic tools using read-pair methods are capable of detecting medium-sized insertions and deletions from mapped data, but detection of small insertion and deletion events is relatively insensitive because of difficulties separating small perturbations in read-pair distance from background noise. Split-read pipelines can detect breakpoints with base pair precision and perform well with detection of small deletions and insertions, but remain highly dependent on read length and have a low reliability for detecting variation in repeating regions. The exclusive advantage of assembly-based tools is that they do not require a reference genome and thus allow for discovery of de novo CNV mutations that have established roles in the etiology of many disorders including intellectual disabilities, autism spectrum disorders, and sporadic schizophrenia. However, assembly-based methods require extensive computation and perform poorly on repeat and duplication regions.
Combined approaches take advantage of the unique attributes of each of these methods and use stepwise approaches to combine datasets from two or more sets of analysis to generate more comprehensive karyotyping of CNV organization and distribution. For example, SVDetect combines the breakpoint predictions of the read-pair approach and the log-ratio from the read-depth analysis between matched case and control samples to infer the probability of gain and loss events.
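The read-pair principle described above, and its weakness for small events, can be sketched in a few lines. The Python example below is a simplified illustration, not the algorithm of SVDetect or any other named tool: it assumes a z-score rule with a hypothetical cutoff of 3 standard deviations, and all insert sizes are invented for the example. A mapped distance much larger than the library's expected insert size suggests a deletion in the sample relative to the reference, and a much smaller one suggests an insertion.

```python
from statistics import mean, stdev


def flag_discordant_pairs(insert_sizes, reference_sizes, z_cutoff=3.0):
    """Flag read pairs whose mapped distance deviates from the library norm.

    reference_sizes: mapped distances from pairs presumed concordant, used to
    estimate the library's insert-size mean and standard deviation.
    z_cutoff: illustrative threshold (an assumption, not a published default).
    """
    mu = mean(reference_sizes)
    sigma = stdev(reference_sizes)
    flagged = []
    for index, size in enumerate(insert_sizes):
        z = (size - mu) / sigma
        if z >= z_cutoff:
            # Pair maps farther apart than expected: sequence missing in sample
            flagged.append((index, size, "possible deletion"))
        elif z <= -z_cutoff:
            # Pair maps closer than expected: extra sequence in sample
            flagged.append((index, size, "possible insertion"))
    return flagged


# Hypothetical library with a mean insert size of 300 bp
library = [290, 300, 310, 295, 305, 300, 298, 302]
observed = [301, 650, 312, 120]

flagged = flag_discordant_pairs(observed, library)
print(flagged)
# [(1, 650, 'possible deletion'), (3, 120, 'possible insertion')]
```

The pair mapped at 312 bp is not flagged even if it spans a true 12-bp deletion, because that shift lies within the normal spread of the library; this is the background-noise problem that limits read-pair methods for small events.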
After mitosis, chromosomes partially decondense to occupy largely discrete pockets of territory in the cell nucleus and cluster at nonrandom radial positions, with small chromosomes positioned more interiorly and larger chromosomes positioned more proximally to the nuclear periphery. These chromosomal territories are separated by an interchromatin domain, which is rich in soluble nuclear machinery, where there is a high frequency of chromatin mobility and interchromatin interactions. Within chromosome territories, widespread spatial differences have been described for gene-rich and gene-poor regions, with the positioning of individual genes juxtaposed to the nuclear periphery and pericentromeric heterochromatin correlating with gene activity. Most tellingly, genes located far apart on a chromosome (i.e., ∼40 Mb) colocalize at high frequency when transcriptionally active. These interactions that occur as a result of chromatin folding underscore the notion that the topology of genomic enhancer regions has a significant influence on the control of transcriptional activity of regulated genes.
Using techniques that preserve the three-dimensional nuclear structure and specific higher-order chromatin interactions, highly scalable techniques have been developed to observe the chromatin landscape at an unprecedented resolution with high-throughput analysis. First described by Dekker et al. in yeast, chromosome conformation capture (3C) and derivations of it use a biochemical approach to determine the frequency of chromatin folding events (Fig. 61.1). 3C relies primarily on cross-linking the DNA, most often with formaldehyde, fragmenting the DNA through restriction digest or sonication, and finally analyzing the three-dimensional relationship through a ligation step that strongly favors ligations between cross-linked DNA fragments under diluted conditions. When done in this manner, DNA fragments that are distant in linear distance but spatially colocalize within the three-dimensional landscape can be ligated to each other, followed by reversal of the cross-linking, which allows for semiquantitative or quantitative polymerase chain reaction (qPCR) of selected ligation junctions to measure the number of ligation events between non-neighboring sites. Previous studies have noted in detail the need for an optimal fixation protocol, appropriate restriction enzyme selections, and rigorous 3C template purification. When adapted for use in mammals, 3C techniques have been invaluable tools for corroborating the existence of long-range, pairwise chromatin loops, typically spanning up to several hundred kilobases, and the dynamic interplay among transcriptional regulatory sequences (e.g., promoters, enhancers, silencers, and insulators) and distant target genes.
Although 3C techniques resolve these spatial interactions at high resolution, they suffer from low throughput with limited coverage of the genome because they can only identify contact between a single pair of loci at any one time and are unsuitable for analysis of long-range interactions exceeding ∼600 kb because of the capture probability of the assay.
The lineal successor to 3C techniques, chromosome conformation capture carbon copy (5C) technology, scales the chromatin interaction analysis to determine a more comprehensive higher-order, multi-loci chromosome structure in a defined genomic region. To do so in a high-throughput manner, 5C uses a set of highly multiplexed oligonucleotides to facilitate a ligation-mediated amplification step to enrich selected ligation junctions within the genomic region of interest, thereby generating a carbon copy of a subset of the initial 3C library. Just as done in 3C techniques, 5C identifies physical pairwise interactions, but with lower bias in polymerase chain reaction (PCR) amplification and on a considerably larger scale, with the number of unique ligation products being detected simultaneously reaching upward of 10⁸ with modifications of the ligation-mediated amplification protocol. Layered on top of these individual topologic associations, 5C analysis constructs a matrix of contact frequencies for each of these interactions across the region of interest that can be reflected in three-dimensional simulations of the regional chromatin structure during various states of activity. However, the upper limits of 5C oligonucleotide multiplexing continue to limit the size of the defined region that can be analyzed.
Unlike 3C and 5C techniques, which require some a priori knowledge of sequence information for both interacting chromosomal regions, the development of circular chromosome conformation capture (4C) techniques facilitates unbiased, genome-wide screens of spatial interactions originating from the viewpoint of a single genomic locus. Starting with 3C ligation products, 4C technology utilizes a second round of restriction enzyme digestion and ligation to create self-circularizing DNA products based on the proximity ligation principle in which circular DNA molecules are generated under high concentrations of ligase and weeks-long prolonged incubations. Then, inverse PCR of the circularized 4C library facilitates identification of all genome-wide sequences contacting this chromosomal site. Of note, 4C studies have further illuminated the nexus between the relative stability of chromosome topology and transcription and the interactome map shared by mitochondrial and nuclear DNA in mouse astrocytes.
In contrast with 4C techniques that use a defined point in the genomic landscape to outline comprehensive maps of interactivity, chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) maps the de novo chromatin interaction viewpoint of a known DNA or chromatin-interacting protein with high resolution (i.e., ∼100 bp to 1 kb). This technique uses a chromatin immunoprecipitation (ChIP)-based enrichment method to analyze the ligation junctions formed between DNA sites that are pulled down with an antibody against a protein of interest (e.g., sites bound by specific transcription factors) and extracted for short-read (i.e., 2 × 20 bp) sequencing with high-throughput methods. The libraries generated by a typical ChIA-PET experiment require ∼10⁹ cells of input but are of a relatively low complexity compared with other 3C-based techniques, resulting in interactome maps for the ligand-activated transcription factor estrogen receptor α and the insulator CCCTC-binding protein generated from relatively shallow sequencing depths. The development of a long-read (i.e., 2 × 250 bp) version of ChIA-PET improves the fragment-mapping efficiency and is capable of inferring haplotype-specific chromatin interactions if phased SNP information is available. More recently, a method that leverages principles of ChIP, Hi-C, and transposase-mediated on-bead library construction (HiChIP) used cohesin as a case study to demonstrate that highly comprehensive interactome maps can be generated with an order of magnitude more informative reads while requiring 100-fold less starting material.
Although 3C (two defined regions), 5C (multiplexed amplification), 4C (inverse PCR to point of interest), and ChIA-PET (all contacts to protein of interest) reflect targeted approaches to spatial organization of genomes, Hi-C represents a more recently developed unbiased approach that utilizes biotinylated adapter ligation, which allows for the pull-down of the sonicated fragments to enhance the enrichment of interacting fragments across the entire genome. Following next-generation sequencing, sequences are mapped with various bioinformatic tools such as iterative mapping to create a contact heat map showing the genome-wide spatial associations. Such studies require deep sequencing and have been performed to further characterize the complexity of functional and structural differences of open and closed chromatin compartments, smaller partitions within these compartments, dubbed topologically associated domains (TADs), and the biophysical principles governing chromatin packing. Probing all of these aspects of three-dimensional chromatin architecture using Hi-C and ultra-deep sequencing to examine neural differentiation and cortical development in mice, the highest-resolution Hi-C interactome maps generated to date offer a richly detailed blueprint relating spatial genome architecture to gene expression and cell fate. Modifications of the conventional Hi-C protocol have also been adapted for use in single cells to demonstrate how discrete TAD contact maps are conserved but interdomain and transchromosomal interactivity are highly variable. In addition, translational research exploring the role that chromatin organization plays in patient-specific transcriptional programs in glioblastoma found marked intertumoral heterogeneity in chromosomal conformations as well as enhancer regions related to stemness and cancer-associated genes that uncovered new potential therapeutic targets, namely CD276, to curb the self-renewal properties of glioblastoma stem cells.
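The contact heat map mentioned above is, at its simplest, a symmetric matrix in which each cell counts the ligation junctions observed between two genomic bins. The Python sketch below shows only that binning step for a single region; it assumes read pairs have already been mapped to 0-based coordinates, omits the normalization and bias correction that real Hi-C pipelines apply, and uses invented positions and a hypothetical function name.

```python
def build_contact_matrix(read_pairs, region_length, bin_size):
    """Count mapped read-pair contacts between fixed-size genomic bins.

    read_pairs: iterable of (position_a, position_b), 0-based coordinates
    within one region, each smaller than region_length.
    Returns a symmetric list-of-lists matrix of raw contact counts.
    """
    n_bins = (region_length + bin_size - 1) // bin_size  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in read_pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror the count so the matrix stays symmetric
    return matrix


# Hypothetical ligation junctions in a 100-kb region, binned at 25 kb
pairs = [(5_000, 95_000), (12_000, 18_000), (7_000, 91_000), (45_000, 52_000)]
contacts = build_contact_matrix(pairs, region_length=100_000, bin_size=25_000)

for row in contacts:
    print(row)
# [1, 0, 0, 2]
# [0, 0, 1, 0]
# [0, 1, 0, 0]
# [2, 0, 0, 0]
```

The two counts linking the first and last bins represent loci about 90 kb apart in linear sequence that were nonetheless ligated together. As the text goes on to note, such a count reports contact frequency only, not whether the contact is functional.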
One additional consideration that should be noted with all 3C-based techniques (e.g., 3C, 5C, 4C, Hi-C, ChIA-PET) is that they provide information about the frequency but not the functionality of the pairwise interactions. Thus, additional experiments, often using transgenics or conditionally expressed models of gene expression, are required to address whether any particular element of the interactome is functionally meaningful as these physical contacts may well be nonfunctional and merely a consequence of general folding patterns of chromosomes.
Alternative techniques have also been developed to assess chromatin accessibility. The assay for transposase-accessible chromatin using sequencing (ATAC-seq) harnesses the use of the hyperactive transposase and transposon, Tn5, to generate high-throughput epigenetic profiles. Using this method, Tn5 is loaded in vitro with adapter sequencing primers, and on exposure to chromatin, Tn5 inserts into double-stranded GC-rich genomic DNA regions that are open because the nucleosomes have decondensed. After insertion, the Tn5 and neighboring DNA are cut out of the chromosomal DNA, and the material is PCR amplified with the Tn5 adaptors and made into a sequencing library. The data-rich sequencing information can be curated to identify rates of transcription factor occupancy, nucleosome disruption within active regulatory sites, and genome-wide chromatin accessibility. Variants of ATAC-seq adapted for single cells have provided a first glance at the individual cell-to-cell heterogeneity observed at each of these regulatory levels of the epigenome. ATAC-seq has generally supplanted formaldehyde-assisted isolation of regulatory elements (FAIRE)-seq, which isolates nucleosome-depleted DNA from chromatin to infer active regulatory elements, as well as DNase-seq, which exploits differential hypersensitivity to DNase I cleavage, revealing transcriptional start sites and active promoters in active chromatin states.
Gene expression profiling continues to be the most widely used functional genomics technology due in equal parts to its early development, the ease with which it can be performed, decreasing costs, and high information content. High-throughput and high-sensitivity sequencing technologies, also called deep sequencing, are now the primary workhorse for determining the identity and abundance of mRNA and other classes of RNAs including microRNAs (miRNA) and piwi-interacting RNA (piRNA). In addition, there are innumerable variations of the technology that have a specialized emphasis, for example the systematic discovery of the structure of RNA (e.g., Frag-seq and icSHAPE).
Neurons and astrocytes constitute co-equal populations intermingled with other cell types including oligodendrocytic and microglial support cells within the CNS. The intrinsic properties of differentiated neurons, including morphology, types of neurotransmitter release, projection targets, and basic input/output characteristics, exist along a wide spectrum of neuronal phenotypes, even within the same neuroanatomic region. Underlying these cellular properties are precise subcellular spatiotemporal patterns of gene products expressed with pinpoint accuracy. Our understanding of neuroscience in these refined and specific cellular and molecular terms has increasingly redirected the focus of transcriptomic landscape analysis using RNA-sequencing (RNA-seq) from ensembles of cells within brain substructures to investigating individual cell-to-cell heterogeneity of neuronal or non-neuronal cells. In particular, current single-cell RNA-seq (scRNA-seq) techniques attempt to identify molecularly distinct subpopulations of cell types by documenting the endogenous variance, or differential gene expression, in the transcriptome while accounting for the technical noise inherent to these datasets. These data are used to infer what role transcriptomic-driven differences in cell state dynamics have in the interplay among cellular phenotypes, their broad-ranging interconnectivity, and their microenvironment. The importance of single-cell analysis in providing an understanding of the building blocks of the nervous system was recognized and encouraged by the US BRAIN Initiative report, BRAIN 2025 (https://braininitiative.nih.gov/strategic-planning/brain-priority-areas).
The breadth of scRNA-seq techniques has progressively extended the initial series of proof-of-concept studies that examined the transcriptomic profiles of single hippocampal neurons or neuronal dendrites. These techniques integrate deep-sequencing capabilities with cell-specific barcodes that enable greater multiplexing and with unique molecular identifiers that mitigate the amplification bias common to all scRNA-seq protocols. Marrying these technical improvements with commercial or custom-made microfluidics devices generates reverse-emulsion droplets that act as nanoliter reaction chambers for reverse transcription of cellular mRNA from individual cells. As a result, scRNA-seq protocols are readily scalable to many thousands of cells. However, concerns about inefficiencies in encapsulation, differences in capture efficiency resulting from incomplete lysis, and the need for specialized equipment in droplet-based microfluidics techniques have led to the development of Seq-Well, which uses arrays of subnanoliter wells for scRNA-seq analysis.
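The role of cell barcodes and unique molecular identifiers (UMIs) described above can be illustrated with a minimal Python sketch. It assumes that reads have already been parsed into (cell barcode, UMI, gene) tuples; the function name, the barcode sequences, and the read list are hypothetical and chosen only for illustration. Reads that share all three values are treated as amplification copies of a single original mRNA molecule and are counted once. Production pipelines also correct sequencing errors in barcodes and UMIs, which this sketch omits.

```python
from collections import defaultdict


def count_molecules(reads):
    """Collapse reads into molecule counts per (cell barcode, gene).

    `reads` is an iterable of (cell_barcode, umi, gene) tuples.
    Reads with the same cell barcode, gene, and UMI are assumed to be
    amplification duplicates and contribute a single count.
    """
    umis_seen = defaultdict(set)
    for cell_barcode, umi, gene in reads:
        umis_seen[(cell_barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in umis_seen.items()}


# Hypothetical toy reads: three copies of one Gad1 molecule in cell AAAC,
# a second Gad1 molecule in the same cell, and two molecules in cell TTTG.
reads = [
    ("AAAC", "GGT", "Gad1"),
    ("AAAC", "GGT", "Gad1"),
    ("AAAC", "GGT", "Gad1"),
    ("AAAC", "TCA", "Gad1"),
    ("TTTG", "GGT", "Gad1"),
    ("TTTG", "CCA", "Slc17a7"),
]

counts = count_molecules(reads)
for (cell_barcode, gene), n in sorted(counts.items()):
    print(cell_barcode, gene, n)
```

In this toy input, the four Gad1 reads from cell AAAC collapse to two molecules, so a transcript that happened to amplify well is not mistaken for a highly expressed one.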
As a general rule, the number of cells required to characterize the single-cell transcriptome increases with the complexity of the sample population. Single-cell data have yielded many unexpected results, including the finding that morphologically similar cortical neurons cluster into distinct cell types with specific patterns of biomarker expression. One method for analyzing large numbers of single cells employs droplet-based scRNA-seq methodologies, in which each cell is encapsulated in its own aqueous droplet where the initial molecular biologic reactions occur. Using a droplet-based technique to analyze single cells in the mouse somatosensory cortex and hippocampus, at least 47 transcriptionally distinct subclasses of cells were identified, including 7 subclasses of largely layer-specific S1 pyramidal neurons, 2 types of CA1 glutamatergic neurons, 16 types of interneurons in which similar types were found in dissimilar regions of the brain, 2 types of astrocytes, and 6 subtypes of oligodendrocytes. In a translational application, when scRNA-seq was applied to 430 freshly isolated, FACS-sorted cells from five primary glioblastomas, analysis of cell-to-cell variability indicated that individual cells from the same tumor were more correlated with each other than were cells from different tumors. Intratumoral subtype heterogeneity of transcriptome profiles correlated most strongly with the expression of a primitive subpopulation of stemlike cells, suggesting that the clinical outcome is influenced by the degree of heterogeneity.
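The comparison of cell-to-cell correlation within and between tumors can be sketched as follows. This is a minimal illustration, assuming each cell is represented by a tumor label and a short expression vector; the function names and all expression values are invented for the example and are not taken from the glioblastoma study. Pearson correlation is used here as one reasonable similarity measure, not necessarily the one used in that analysis.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def mean_correlations(cells):
    """Return (mean within-tumor, mean between-tumor) pairwise correlation.

    `cells` is a list of (tumor_id, expression_vector) pairs.
    """
    within, between = [], []
    for (tumor_a, expr_a), (tumor_b, expr_b) in combinations(cells, 2):
        target = within if tumor_a == tumor_b else between
        target.append(pearson(expr_a, expr_b))
    return sum(within) / len(within), sum(between) / len(between)


# Hypothetical toy data: two cells from each of two tumors, four genes each.
cells = [
    ("T1", [5.0, 1.0, 0.0, 3.0]),
    ("T1", [6.0, 1.5, 0.5, 2.5]),
    ("T2", [0.5, 4.0, 5.0, 1.0]),
    ("T2", [1.0, 3.5, 6.0, 0.5]),
]

within, between = mean_correlations(cells)
print(f"within-tumor: {within:.2f}, between-tumor: {between:.2f}")
```

A within-tumor mean that exceeds the between-tumor mean corresponds to the pattern described above, in which cells from the same tumor resemble each other more than they resemble cells from other tumors.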
Most single-cell isolation procedures inevitably compromise positional information because they rely on mechanical or enzymatic techniques to disaggregate individual cells. The in vivo position of a neuron and the pattern of its neuronal connections are specified by an intrinsic set of transcription factors (and other genes) that regulate cell migration, axon guidance, dendritic branching, and synaptogenesis. Understanding these transcription factor cascades in the soma and its neural projections, as defined by the transcriptome of a cell in its natural microenvironment, enhances our insight into the molecular basis of neuronal development and neurological disease. High-throughput spatial transcriptomics was developed to capture gene expression dynamics in cells and subregions of cells while the endogenous cellular interactions remain intact.
At present, spatial transcriptomics methods are divided into two general classes: single-molecule fluorescence in situ hybridization (smFISH)-based methods and array-based methods. The smFISH-based methods, such as sequential fluorescence in situ hybridization (seqFISH), seqFISH+, and MERFISH, exploit the spatial resolution of the technique to simultaneously examine the localization of hundreds to thousands of gene products. For example, seqFISH targets each gene with 24 probes labeled with fluorophores that can be spectrally resolved. By serially reprobing the same slide with probe sets that carry known switches in fluorophore combinations, several hundred genes can be imaged. A highly multiplexed extension of seqFISH, called seqFISH+, is able to perform transcriptome-level profiling of an order of magnitude more genes. When seqFISH+ data were analyzed with two clustering algorithms, the subcellular localization patterns of mRNAs in a specific layer of the cortex correlated strongly with specific clusters of gene expression from an scRNA-seq dataset.
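The serial reprobing strategy can be sketched in Python under a simplifying assumption: each gene is assigned one fluorophore per hybridization round, so F spectrally distinct fluorophores and N rounds allow at most F^N distinct color sequences. The function names, fluorophore labels, and gene list below are hypothetical, and the sketch omits the error-correction rounds and image processing that real experiments require.

```python
from itertools import product


def barcode_capacity(n_fluorophores, n_rounds):
    """Maximum number of genes distinguishable by color sequence alone."""
    return n_fluorophores ** n_rounds


def build_codebook(genes, fluorophores, n_rounds):
    """Assign each gene a unique sequence of fluorophores, one per round."""
    if len(genes) > barcode_capacity(len(fluorophores), n_rounds):
        raise ValueError("not enough fluorophore/round combinations")
    codes = product(fluorophores, repeat=n_rounds)
    return {code: gene for code, gene in zip(codes, genes)}


def decode(spot_colors, codebook):
    """Map the colors observed at one spot across rounds to a gene, if any."""
    return codebook.get(tuple(spot_colors))


# Hypothetical example: four fluorophores, two rounds, three genes.
fluorophores = ["cy3", "cy5", "a488", "a594"]
codebook = build_codebook(["Gad1", "Slc17a7", "Gfap"], fluorophores, n_rounds=2)

print(barcode_capacity(4, 4))              # combinations for 4 colors, 4 rounds
print(decode(["cy3", "cy5"], codebook))    # gene assigned to this sequence
print(decode(["a594", "a594"], codebook))  # unassigned sequence gives None
```

Because capacity grows exponentially with the number of rounds, a handful of fluorophores is enough to distinguish several hundred genes on the same slide.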
In situ transcription (IST) is a technique that uses gene-specific or oligo(dT) primers to reverse-transcribe endogenous mRNAs in situ in fixed tissue sections adsorbed to a glass slide for rapid and highly effective spatial detection of mRNA. It is also a component of the fluorescent in situ sequencing (FISSEQ) methodology. In array-based techniques, the IST method is inverted: barcoded oligo(dT) primers for capturing mRNA are first arrayed on a glass slide. For example, Ståhl et al. initially explored whether cDNA could be reverse-transcribed from the captured mRNA of olfactory bulb tissue placed on arrayed oligo(dT) primers. The tissue was fixed, stained with hematoxylin and eosin (H&E), and imaged. It was then permeabilized, and a reverse-transcription reagent mix containing a fluorescently labeled nucleotide was added to the top of the tissue. Following digestion of the tissue, the fluorescently labeled cDNA pattern was compared with the H&E staining to show that the labeled cDNA diffused minimally from the site of individual cells. Spatial transcriptomics-derived sequencing libraries of olfactory bulb tissue compared favorably with both laser-capture microdissection sequencing libraries and Allen Brain Atlas in situ hybridization data. More importantly, analysis of differentially expressed genes in homologous regions showed very similar profiles, and genes with previously known restricted patterns of expression in both the glomerular layer and the granule cell layer were observed.
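The way an array-based method recovers position can be sketched as a lookup from barcode to array coordinate. This minimal Python example assumes that each sequenced read has already been reduced to a (spatial barcode, gene) pair and that the barcode printed at each array position is known; the function name, barcodes, coordinates, and gene names are invented for illustration and do not reproduce the published protocol.

```python
from collections import Counter


def spatial_counts(reads, barcode_to_xy):
    """Count reads per (array position, gene).

    `reads` is an iterable of (spatial_barcode, gene) pairs.
    `barcode_to_xy` maps each arrayed barcode to its (x, y) position.
    Reads whose barcode is not on the array are discarded.
    """
    counts = Counter()
    for barcode, gene in reads:
        position = barcode_to_xy.get(barcode)
        if position is None:
            continue  # barcode does not match any arrayed primer
        counts[(position, gene)] += 1
    return counts


# Hypothetical array with two barcoded positions.
barcode_to_xy = {"ACGT": (0, 0), "TGCA": (0, 1)}

# Hypothetical reads; the last one carries an unrecognized barcode.
reads = [
    ("ACGT", "Omp"),
    ("ACGT", "Omp"),
    ("TGCA", "Penk"),
    ("NNNN", "Omp"),
]

counts = spatial_counts(reads, barcode_to_xy)
for (position, gene), n in sorted(counts.items()):
    print(position, gene, n)
```

The resulting position-by-gene counts are what can then be overlaid on the H&E image of the same section.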