Abstract

Background

Clinical proteomics has traditionally referred to experiments that attempt to discover novel biomarkers for disease diagnosis, prognosis, or therapeutic management by using tools that measure the abundance of hundreds or thousands of proteins in a single sample. These discovery experiments began with protein electrophoresis, particularly two-dimensional (2D) gel electrophoresis, and have evolved into workflows that rely heavily on mass spectrometry (MS). Building on the workflows developed for discovery proteomics, clinical laboratories have created quantitative assays for proteins in human samples that solve many of the issues associated with the measurement of proteins by immunoassay. This technology is changing clinical research and is poised to significantly transform protein measurements used in patient care.

Content

This chapter begins with the history of clinical proteomics, with a special emphasis on 2D gel electrophoresis of serum and plasma proteins. It then describes discovery techniques that use MS, including data-dependent acquisition and data-independent acquisition. It finishes with a discussion of targeted quantitative proteomic methods, both bottom-up (proteolysis-assisted) and top-down (intact) as replacement methodologies for immunoassays and Western blotting. Special attention is paid to peptide selection, denaturation and digestion, peptide and protein enrichment, internal standards, and calibration.

Historical perspective

The word proteome is a combination of the words protein and genome, first coined by Marc Wilkins in 1994. Wilkins used the term to describe the entire complement of proteins expressed by a genome, cell, tissue, or organism, and proteomics refers to the comprehensive identification and quantitative measurement of these proteins. Today, the term encompasses separation science, protein microchemistry, bioinformatics, and mass spectrometry (MS) as the fundamental techniques used in the large-scale study of protein identity, abundance, structure, and function.

Analysis of proteomes, and the human proteome in particular, was a logical extension of the completion of the human genome, which revealed the genetic blueprint from which the proteome is constructed. The promise of proteomics for clinical research was an outgrowth of the understanding that many clinically relevant markers are present in blood and that disease-related aberrations in protein abundance might be quantified by comparative analysis of proteomes.

Early proteomics

The earliest investigations of the human serum proteome preceded the completion of the human genome by 24 years and used two-dimensional gel electrophoresis (2D gel) to provide a high-resolution snapshot of serum proteins (Fig. 24.1). In this technique, a complex pool of proteins is first resolved by isoelectric point in the first dimension and by molecular mass in the second dimension. This spreads the proteins in the sample into an array in which each spot is theoretically associated with one protein. After resolution, staining is used to visualize the proteins. Application of this technique to serum illuminated its complexity, revealing over 300 spots, which were suggested to arise from 75 to 100 unique proteins. Some of these proteins were identified by comparison with the migration of purified protein standards, comparison with immunoprecipitated proteins, or immunoblotting.

FIGURE 24.1, Two-dimensional gel electrophoresis. Before the advent of mass spectrometric methods to probe the proteome, proteins were resolved and quantified using two-dimensional polyacrylamide gel electrophoresis. Proteins were first separated based on isoelectric point (horizontally) and then based on size (vertically). Hundreds of spots were visible.

Although powerful, 2D gels did not gain widespread use for comparative studies until the early 1990s. The main limitation was a lack of tools for the rapid identification of the protein(s) contained in a single spot, which could link the location, intensity, and identity of a gel spot to biology. Before the advent of MS-based tools, identification of a single protein took weeks of dedicated labor starting with an effort to isolate sufficient protein for subsequent analysis. Once isolated, the extracted protein was proteolyzed with trypsin and two or more peptides were isolated using preparative HPLC separation. These peptides were then analyzed with Edman degradation to obtain short N-terminal amino acid sequences. The sequence data, along with approximate molecular weight, isoelectric point, and any other available data, were used to search available databases and infer protein identity.

Everything changed during the early 1990s, when an array of technologies transformed protein identification from an arduous task to a scalable, simple procedure that could be completed in a day or two. First, improvements in MS sensitivity, resolution, and computer control, along with the development of gentle ionization methods (matrix-assisted laser desorption ionization [MALDI] and electrospray ionization [ESI]; see Chapter 20 for a discussion of ionization techniques), enabled laboratories to readily collect easily interpretable mass spectra. Second, protein microchemistry techniques allowed sufficient sample to be extracted from single gel spots to yield interpretable mass spectra. Finally, high-quality genomic databases combined with novel statistical algorithms allowed expert systems to convert mass spectrometric data to protein identifications with unprecedented speed and accuracy. Together, these technologies merged into a powerful analytical platform that enabled protein identification and characterization in a manner that matched the throughput and number of protein spots that could be resolved on a 2D gel.

Progress continued into the mid-2000s, with advances in 2D gels and associated technologies that allowed the reproducible resolution of hundreds to thousands of protein spots from complex samples. Efforts to apply the techniques in clinical research quickly expanded to diverse sample types, including solid tissue, blood, tears, and urine. The general approach was to obtain samples representing healthy and disease states, perform preanalytical fractionation, separate the individual protein fractions by 2D electrophoresis, and stain the gels (see Fig. 24.1). For additional discussion of electrophoretic techniques, refer to Chapter 18. Using imaging techniques, comparative analysis of gels identified co-migrating spots with differential intensities or changes in spot position. A number of public 2D gel databases were published on the internet, some of which are still available (e.g., http://world-2dpage.expasy.org/list/). Many 2D gel-based studies reported statistically significant differences in protein abundance associated with disease, but these discoveries were rarely validated into biomarkers with sufficient promise for clinical studies. Frequently, candidate biomarker proteins were acute-phase proteins already known to be associated with many diseases. Alternatively, high-abundance proteins with no clear biological significance to the disease were identified, suggesting experimental artifact. The constraints on proteome depth and breadth and the inability to perform sufficient technical and experimental replicates to overcome variability left many gel-based experiments significantly underpowered to detect biomarkers successfully.

As an analytical tool for proteome analysis, 2D gels suffer from variability, significant demands on labor, and limited dynamic range. Because biological samples have a tremendous range of protein abundances (7 to 12 orders of magnitude), the 1 to 2 orders of magnitude provided by 2D gels made deep analysis of the proteome essentially impossible. Unfractionated samples of tissue or blood often have 10 to 20 proteins that account for 80% or more of the total protein content of the sample. As a result, the observable proteome on a 2D gel is severely limited. Loading greater amounts of sample onto 2D gels causes distortions during separation, dramatically reduces reproducibility, and can contaminate large portions of the gel, leading to frequent identification of abundant proteins, even in unexpected regions of the gel.

To improve the depth of analysis, creative fractionation and depletion strategies were developed to work around the protein abundance problem. These approaches focused on separating protein pools into subfractions based on chemical, structural, or biophysical properties. Depletion of highly abundant proteins using immunoaffinity and semi-specific chemical affinity approaches also became commonplace. These approaches were effective but added cost, complexity, and variability to an already challenging technique. Although this discussion is historical, the challenges of proteome dynamic range are still a consideration in modern proteomics workflows.

It was soon recognized that complex mixtures of proteins could be directly ionized and analyzed with MALDI MS to give complex spectra comprising the masses and crude abundances of many proteins. Although MALDI spectra were much more limited in resolution than 2D gels (a consequence of reducing separation to mass alone), it was appealing to suggest that they could substitute for 2D gels and resolve at least some of the long-standing technical issues that compromised gel-based analysis: labor intensiveness and poor throughput. By applying the MALDI technique to biomarker discovery, investigators could generate many more experimental observations with lower labor requirements and ostensibly better reproducibility.

Biomarker discovery in the post–two-dimensional gel era

With the increased throughput and improved precision that appeared possible with MALDI analysis of complex mixtures, manufacturers developed integrated systems of reagents and instruments for performing discovery experiments. Although mass spectrometers are subject to constraints similar to those of 2D gels regarding the observable dynamic range of protein abundances in a single spectrum, the ability to automate upstream sample preparation or carry out microscale fractionation directly on MALDI targets, along with the commercial availability of “chips,” made the technical aspects of biomarker discovery much less daunting. Although direct identification of proteins in this workflow was generally not possible, some argued that the identity of proteins giving rise to discriminatory peaks in a spectrum was not important; instead, the spectral pattern or fingerprint was the biomarker that linked phenotype to a snapshot of relative protein abundance. Critics pointed out that in a spectrum of a complex matrix such as serum or plasma, any peak was unlikely to represent a single protein and instead would be composed of dozens, if not hundreds, of proteins, making it impossible to link the biology and biochemistry of a disease process to the change in abundance of any specific peak.

The dawn of high-throughput proteome analysis led to a shift in thinking about biomarker discovery from hypothesis-driven to hypothesis-generating. In the former, biochemical pathways associated with disease are rigorously interrogated to link observations and biology. In the latter, an unbiased comparison of samples without consideration of biology is used to identify putative differences in protein abundance between sample types, to be associated with biological pathways in future studies. At around the same time, the field also adopted the hopeful concept of multiprotein biomarkers: changes in the abundance of multiple proteins, none of which might be individually informative, could be combined to provide important clinical information.

Given the apparent benefits, and largely ignoring probable compromises, unbiased discovery approaches were applied to many diseases that had defied biochemical diagnosis or those that would benefit from early detection. Initial studies on breast and prostate cancer were promising, and results from one particular study electrified both the clinical and proteomics research communities. The results from that study suggested that ovarian cancer, even at the earliest stages, could be unambiguously detected from a proteomic analysis of a serum sample. Efforts to validate this study led to an acrimonious public debate, and eventually it was accepted that the apparent clinical validity of the test was artifactual, arising from poor experimental design and fundamental misunderstandings of MS data and processing. Biomarker discovery studies using these MALDI-based fingerprinting methods continued for several years after this upheaval, but the approach was largely abandoned by 2007.

A pivotal moment in the early days of proteomics was the development of integrated liquid chromatography–tandem mass spectrometry (LC-MS/MS) systems with automated data acquisition and automated data processing. Pioneering work by Washburn and Yates provided a way to perform proteome analysis that truly overcame many of the shortcomings of 2D gels. The strategy started with a complex protein mixture that was digested with trypsin. The peptides were then fractionated by multidimensional chromatography, with each fraction being directed sequentially to the mass spectrometer. Although it is counterintuitive to make a complex mixture even more complex via proteolysis, the technique, known as multidimensional protein identification technology (MudPIT), was highly automated, scaled easily, and was used to generate catalogs of thousands of proteins at a fraction of the labor required for a similarly deep 2D gel analysis. Although not originally developed to perform quantitative analysis, subsequent enhancements of this technique provided the innovations that are at the core of most proteomics discovery experiments carried out today.

Biomarker pipeline

The goal of proteomic biomarker discovery is to identify and then demonstrate clinical validity of a protein or combination of proteins that provide useful diagnostic or prognostic information regarding disease. At each step of this process, experimental tools and performance expectations change. At the earliest stages of discovery, dozens of samples are interrogated at the level of thousands of proteins (Fig. 24.2). Discovery workflows from preanalytical sample preparation to complete data analysis are very time-consuming, requiring hours to days of data collection for each individual specimen that is being processed. It is not uncommon for discovery experiments to require continuous data acquisition for months, constraining the number of experimental samples analyzed. After the raw data are collected, they are reduced to give a list of each peak by mass, retention time, intensity, and identity (if available). These lists are then mathematically manipulated to align common features between multiple specimens and to normalize the overall intensity between each acquired data set. Finally, the feature lists are compared to generate a list of peaks that are reproducibly and significantly different between the two clinical states. Frequently, discovery experiments yield 10 to 100 candidate proteins suitable for further development. The investigator may choose to use other data to help refine this list by focusing on known or postulated biological relevance.

FIGURE 24.2, Proteomics biomarker discovery pipeline. To discover potential novel protein biomarkers, samples can be prepared for liquid chromatography–mass spectrometry (LC-MS) and analyzed as intact proteins. Replicates of pooled samples from two different disease states or unpooled samples from individual patients from two disease states may be compared (e.g., healthy versus control). Signals from the mass spectrometer are integrated and compared in silico to identify protein features that are significantly different among the disease states, which can be subsequently identified in later analyses. This type of analysis of intact proteins is limited to small proteins. Alternatively, samples can be prepared by using proteolysis before being analyzed as peptides by LC-tandem MS (LC-MS/MS). Using this approach, proteins are identified and quantified to find proteins that are different between pathophysiologic states.

In the next stage of the pipeline, specific targeted assays for selected candidate proteins are developed with methods that achieve higher throughput and precision, with the goal of verifying the potential validity of the candidate marker proteins in a series of pilot studies that will incorporate samples from hundreds of patients. At the same time, it is expected that refinement will reduce the number of candidate proteins to those with the highest relevance to the disease state, leaving perhaps 5 to 20 proteins from the original set of candidates.

In the final phase of biomarker development, the most neglected phase to date, an assay for one or a panel of markers is developed and rigorously validated to establish analytical performance which is appropriately aligned with the desired clinical validity. The validated assay is then used to analyze hundreds to thousands of samples from prospective and/or retrospective clinical studies to establish the clinical validity (or lack thereof) of the biomarker or biomarker panel. On completion of this work, a successful biomarker or biomarker panel assay will be described in a standard operating procedure (SOP) that includes preanalytical requirements, a description of the intended use, performance metrics, quality control procedures, and the expected clinical performance of the assay as described by metrics for sensitivity, specificity, positive predictive value, and negative predictive value. In some cases, this work could include studies designed to achieve regulatory approval from the United States Food and Drug Administration or similar governmental body.

Discovery experiments

Similar to the first comparative studies with 2D gels, the aim of present-day discovery proteomics experiments is to identify and quantify all of the proteins in complex mixtures to find candidate biomarker proteins that differ in relative abundance between two biological states (e.g., diseased and normal). The discovery pipeline begins with the selection of useful samples. Ideally, clinically relevant samples are selected that have been collected, prepared, and stored in an identical fashion, along with a dossier of information regarding demographic and clinical history. Preanalytical steps are generally applied to fractionate or otherwise modify the sample, improving the depth of analysis and providing the greatest opportunity to measure proteins that may differ by disease state. The samples are then digested from whole proteins into a complex pool of peptides destined for LC-MS or LC-MS/MS experiments, which provide data sets that can be compared to identify significant differences between samples.

Requirement for separations

As mentioned earlier, given the enormous complexity of biological specimens and the additional complexity derived from proteolysis of proteins into peptides, separation technologies must be used to allow a mass spectrometer to specifically detect and identify individual peptides from within a complex mixture. This is due to the limited duty cycle of mass spectrometers and the need to acquire one tandem mass spectrum at a time (when data-dependent acquisition is used).

High-performance liquid chromatography (HPLC) using reverse-phase chemistry has been the chief tool for performing peptide and protein separations with a direct interface to mass spectrometers (LC-MS). For additional discussion of HPLC and chromatography in general, refer to Chapter 19. The compatibility of reverse-phase solvent systems with LC-MS, the commercial availability of a wide range of column dimensions and packing materials, and the ease with which high-resolution separations of peptides and proteins are achieved with modest method development are key advantages compared to alternative separation technologies. Chromatography can be performed under a wide range of flow rates. Low-flow chromatography at 0.5 μL/min or less is typically employed for maximum sensitivity but suffers from low throughput and lack of robustness. At flow rates from 0.2 to 1 mL/min, maximal throughput is achieved with lower sensitivity. Alternative separation techniques such as capillary electrophoresis have successfully been interfaced with mass spectrometers and offer very-high-resolution separations, but challenges in maintaining robust performance and complexities in developing routine separation methods have prevented wider adoption.

Data acquisition strategies

MS experiments in biomarker discovery studies are typically conducted using one of three data collection strategies: (1) full-scan-only LC-MS, (2) data-dependent acquisition LC-MS/MS (DDA), or (3) data-independent acquisition LC-MS/MS (DIA). Each workflow has different strengths and weaknesses, and laboratories select workflows based on experience and specific assay requirements.

In a full-scan acquisition experiment, a high-resolution mass spectrometer continually collects spectra over a wide mass-to-charge (m/z) range, capturing signals from all ions presented to the mass spectrometer as the chromatogram is developed. No tandem mass spectra, which are needed for identification of the peptides eluting from the chromatographic column, are acquired. Instead, contour maps of mass spectrometric data are collected across three dimensions: time, intensity, and m/z. If a peak appears to be different between two clinically relevant groups in this type of discovery experiment, a second round of analysis is required to identify the peptide of interest.

Data-dependent acquisition (DDA) is an algorithm-guided acquisition strategy in which MS and MS/MS data collection is automated and dependent on the presentation of sample to the mass spectrometer. First, a wide m/z range spectrum (survey scan) is collected. The survey scan is rapidly processed to generate a list of candidate peaks, which are ranked in order of intensity. This list is then used to assemble an acquisition queue for sequential tandem mass spectrometric experiments (Figs. 24.3 and 24.4A). The instrument then iteratively performs precursor selection and fragmentation of each precursor ion listed in the queue. To prevent the instrument from repeatedly analyzing the same set of high-abundance precursors that appear in sequential survey scans, precursor ions that have undergone fragmentation are excluded from re-analysis for a fixed duration, preventing further analysis until the exclusion period expires. Using this experimental strategy, the final data set contains the m/z, intensity, and retention time of each individual precursor ion, along with the associated product ion spectra.

FIGURE 24.3, Data-dependent acquisition. Peptide identification in discovery proteomics experiments often uses software to drive the mass spectrometer in the selection of the precursor peptides to be fragmented. As peptides elute from the chromatographic column, the mass spectrometer first performs a survey scan to assess peptide precursor m/z and abundance. In this theoretical example, a survey scan is performed at 105 minutes and the most abundant peaks are selected for subsequent tandem mass spectrometry (MS/MS) analysis, including the specific peptide that is fragmented in the third panel. The resulting spectrum is compared against a theoretical database of proteins to identify the peptide from the spectrum using various statistical approaches. VIFDALR is the example peptide identified (using single-letter amino acid codes).
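As a rough illustration of the acquisition logic described above, the following Python sketch ranks survey-scan peaks by intensity, queues the most intense precursors for MS/MS, and applies dynamic exclusion. The data structures, thresholds, and time units are hypothetical simplifications, not an actual instrument control interface.

```python
# Hypothetical sketch of data-dependent precursor selection with dynamic exclusion.
from dataclasses import dataclass

@dataclass
class Precursor:
    mz: float
    intensity: float

def select_precursors(survey_scan, exclusion, now, top_n=10,
                      exclusion_window=30.0, mz_tol=0.01):
    """Queue the top N most intense precursors for MS/MS, skipping any m/z
    fragmented within the last `exclusion_window` seconds."""
    # Drop precursors whose exclusion period has expired.
    exclusion = {mz: t for mz, t in exclusion.items() if now - t < exclusion_window}
    queue = []
    for p in sorted(survey_scan, key=lambda p: p.intensity, reverse=True):
        if any(abs(p.mz - mz) < mz_tol for mz in exclusion):
            continue  # recently fragmented; dynamically excluded
        queue.append(p)
        exclusion[p.mz] = now  # start this precursor's exclusion period
        if len(queue) == top_n:
            break
    return queue, exclusion

# Example: one survey scan acquired at retention time 105.0 s
scan = [Precursor(523.28, 1.2e6), Precursor(650.84, 8.0e5), Precursor(741.40, 5.5e5)]
queue, exclusion = select_precursors(scan, exclusion={}, now=105.0, top_n=2)
print([p.mz for p in queue])  # [523.28, 650.84]
```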

Finally, data-independent acquisition (DIA) strategies use a full-scan precursor survey, followed by the sequential isolation/fragmentation (MS/MS) of all ions collected from small windows (e.g., 5 to 20 Da) across the precursor scan until the entire m/z range of the initial full scan is covered (see Fig. 24.4B). The survey scan and MS/MS scans are repeated continuously throughout the chromatographic run. In contrast to DDA, in which each product ion spectrum derives from a single selected precursor, DIA product ion spectra are composite spectra that include fragment ions from all precursors isolated in the small window. To interpret the data, extensive processing deconvolutes precursor and fragment ion data to yield time, m/z, and intensity for each precursor peptide, as well as fragmentation data inferred from the composite spectra.

FIGURE 24.4, Comparison of data-dependent acquisition (DDA) and data-independent acquisition (DIA). During DDA and DIA discovery experiments, the mass spectrometer begins with a high-resolution, high-mass-accuracy survey scan of the peptides eluting off the chromatographic column. A, In DDA, the survey scan is used to build a list of the precursors (using a 0.7 to 2.0 Da window) that will be fragmented/analyzed by tandem mass spectrometry (MS/MS) in subsequent steps in the mass spectrometer (typically 2 to 8 fragments are targeted from each precursor survey scan). This cycle of precursor/survey scan and MS/MS steps (typically ≥7 Hz) is repeated throughout the chromatographic run. B, In contrast, after the precursor scan in DIA, every part of the m/z range (typically 400 to 1000 Da wide) is then sampled by stepping through the m/z range using windows of 10 to 20 Da, collecting all of the precursors in each window and fragmenting/analyzing them by MS/MS. This cycle of precursor/survey scan and MS/MS steps (typically ≤2 Hz) is repeated throughout the chromatographic run. Methods to improve the specificity of the DIA approach include overlapping windows and randomization of the MS/MS windows (not shown). For both methods, there is significant post–data acquisition analysis using software to determine the identity and abundance of peptides in the sample (not shown).
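The isolation window scheme that a DIA method cycles through can be expressed compactly. The sketch below generates fixed-width windows across a 400 to 1000 m/z precursor range using 20 Da windows, values consistent with the typical ranges mentioned above; the function name and the optional overlap parameter are illustrative assumptions.

```python
# Minimal sketch of a fixed-width DIA isolation window scheme (illustrative values).
def dia_windows(mz_start=400.0, mz_end=1000.0, width=20.0, overlap=0.0):
    """Return (low, high) isolation windows stepped across the precursor m/z range."""
    windows = []
    low = mz_start
    while low < mz_end:
        high = min(low + width, mz_end)
        windows.append((low, high))
        low = high - overlap  # a nonzero overlap yields overlapping windows
    return windows

cycle = dia_windows()
print(len(cycle), cycle[:3])  # 30 windows: (400, 420), (420, 440), (440, 460)
```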

For both DDA and DIA experiments, peptide identification relies on the ability of the mass spectrometer to create fragments in the gas phase to generate product ion spectra, also called MS/MS spectra. Fragmentation occurs when energy is imparted to a peptide by accelerating it into collisions with an inert gas. With the energy imparted by these collisions, a peptide dissociates into two fragments in a thermodynamically probabilistic fashion, most commonly cleaving at specific points along the amide backbone to yield b-ions and y-ions, which respectively include the amino- and carboxyl-terminus of the peptide (Fig. 24.5). The mass differences between peaks in the spectrum, which are associated with fragmentation at each amino acid in the peptide, may be used to construct the amino acid sequence of the peptide. The resulting product ion spectra can also be used as fingerprint spectra that are matched to database entries by peak matching rather than through interpretation of amino acid sequence and subsequent database searching. Software algorithms are used to accomplish the process of matching mass spectra to peptide sequences in a database. Databases are generated by in silico digestion of all theoretical proteins, typically derived from genomic data. In some cases, theoretical databases are supplemented with empirical MS data from a variety of sources, including public repositories representing hundreds of millions of processed mass spectra. Protein and peptide identification software such as Sequest, MASCOT, X! Tandem, Andromeda, and OMSSA have been rigorously evaluated and are widely used by the MS community.

FIGURE 24.5, Fragmentation of peptides in the gas phase. Peptides are fragmented after being excited in the gas phase within the mass spectrometer. The most common ion fragments analyzed in triple-quadrupole mass spectrometers for targeted assays are b-ions and y-ions, which include the amino- and carboxyl-terminus, respectively. Other ions are formed and may be more predominant in other types of mass analyzers (e.g., a-, c-, x-, and z-ions). The characteristic fragmentation patterns of peptides make it possible to search databases for peptide identification from the fingerprint spectra.
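To make the relationship between a peptide sequence and its b- and y-ion series concrete, the sketch below computes singly charged fragment m/z values for the example peptide VIFDALR from Fig. 24.3 using monoisotopic residue masses. The helper function and rounding are illustrative only; real search engines also consider other ion types, charge states, and modifications.

```python
# Illustrative calculation of singly charged b- and y-ion m/z values for VIFDALR.
RESIDUE = {"A": 71.03711, "D": 115.02694, "F": 147.06841, "I": 113.08406,
           "L": 113.08406, "R": 156.10111, "V": 99.06841}  # monoisotopic residue masses
PROTON, WATER = 1.00728, 18.01056

def fragment_ions(peptide):
    """Return singly charged b-ion and y-ion m/z lists (b1..b(n-1), y1..y(n-1))."""
    masses = [RESIDUE[aa] for aa in peptide]
    b_ions, running = [], 0.0
    for m in masses[:-1]:                    # b-ions retain the amino-terminus
        running += m
        b_ions.append(round(running + PROTON, 4))
    y_ions, running = [], 0.0
    for m in reversed(masses[1:]):           # y-ions retain the carboxyl-terminus
        running += m
        y_ions.append(round(running + WATER + PROTON, 4))
    return b_ions, y_ions

b, y = fragment_ions("VIFDALR")
print("b:", b)
print("y:", y)
```

Reading the mass differences between consecutive y-ions (or b-ions) recovers the residue masses, which is how a sequence can be inferred from a spectrum.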

In each of these experimental workflows, isotope labels and chemical tags have been used to allow the analysis of many samples simultaneously (multiplexing) and/or to provide a reference against which both experimental sample types (e.g., disease and healthy) can be compared to improve precision. For example, two samples, one from a diseased patient and one from a healthy patient, are chemically labeled during the preanalytical steps (i.e., after proteolysis of the samples). One chemical label contains isotopes at natural abundance and the other is enriched in heavy isotopes. If the two samples are mixed after labeling, they will contain many identical peptides, but each peptide from the two samples will have a different mass due to the isotopes incorporated by the chemical label. When this mixture of case and control is analyzed by LC-MS, pairs of peaks appear in the spectra, each representing the same peptide but at slightly different masses, which are easily resolved by the mass spectrometer. These strategies help increase the number of technical and experimental replicates that can be achieved on any given instrument and can also facilitate the comparison of data acquired over weeks or months, where instrument drift can make unbiased comparison of data sets difficult.
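The sketch below illustrates, under assumed values, why labeled case and control samples appear as resolvable peak pairs: the heavy label shifts the peptide's m/z by the label mass difference divided by the charge state. The peptide mass, label mass difference, and charge used here are arbitrary examples, not a specific commercial labeling chemistry.

```python
# Hypothetical light/heavy peak pair for one labeled peptide (illustrative values).
PROTON = 1.00728

def heavy_light_mz_pair(light_neutral_mass, label_delta_mass, charge):
    """Return (light m/z, heavy m/z) for a peptide carrying a single label."""
    light_mz = (light_neutral_mass + charge * PROTON) / charge
    heavy_mz = (light_neutral_mass + label_delta_mass + charge * PROTON) / charge
    return light_mz, heavy_mz

# A doubly charged peptide with an assumed +4.0251 Da heavy-label mass difference:
light, heavy = heavy_light_mz_pair(1500.700, 4.0251, charge=2)
print(round(light, 4), round(heavy, 4), round(heavy - light, 4))
# The pair is separated by label_delta_mass / charge (here ~2.01 m/z units).
```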

Processing of discovery proteomics data

Postprocessing of these data sets is required before final analysis. First, the precursor m/z, intensity, and retention time values making up each data set are arrayed in three dimensions and overlaid, seeking maximal alignment of shared features across all data sets. Software deployed for this task can be used to correct for known experimental artifacts, such as drift in retention time. Subsequently, the data are normalized to correct for intensity differences that arise from day-to-day changes in sample preparation efficiency and mass spectrometer performance. Finally, if DDA data are being processed, each precursor and associated product ion scan are used for database searching to identify peptides, which can be assembled to generate protein identifications. Alternatively, a DIA data set is processed by comparison to a reference DIA data set or a spectral library, which provides the ability to deconvolute precursor and product ion scans that are composites arising from multiple peptides.

With appropriate data processing, the relative abundances and identities of thousands of proteins can be obtained in a single discovery experiment. Once these steps are completed, an array of biostatistics tools is used to find proteins with significant abundance changes between case and control samples. In addition to commercial software packages that attempt to provide comprehensive solutions to these diverse workflows, many academic and industrial groups build data processing pipelines from a combination of custom-built, open-source, and commercial software packages.
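A minimal sketch of this kind of post-processing, assuming a simple label-free design: feature intensities are median-normalized per run and each feature is tested for a case-versus-control difference with Welch's t-test. The synthetic data, function names, and the omission of retention-time alignment and multiple-testing correction are simplifications for illustration, not a description of any specific software pipeline.

```python
# Illustrative label-free post-processing: per-run normalization and a per-feature test.
import numpy as np
from scipy import stats

def median_normalize(matrix):
    """Scale each run (column) so its median intensity matches the global median."""
    run_medians = np.median(matrix, axis=0)
    return matrix * (np.median(run_medians) / run_medians)

def differential_features(case, control, alpha=0.05):
    """Return row indices of features that differ between groups (Welch's t-test).
    A real pipeline would also correct for multiple testing (e.g., FDR)."""
    _, p = stats.ttest_ind(case, control, axis=1, equal_var=False)
    return np.where(p < alpha)[0]

rng = np.random.default_rng(0)
intensities = median_normalize(rng.lognormal(10, 1, size=(500, 12)))  # 500 features, 12 runs
hits = differential_features(intensities[:, :6], intensities[:, 6:])  # 6 cases vs 6 controls
print(len(hits), "candidate features")
```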

Variations and details

The experimental strategy that is ultimately deployed is defined by the availability of specific instrumentation. There are subtle differences in each of the workflows that reflect tradeoffs in data depth versus breadth. For example, the greatest precision is generally observed in full-scan-only experiments, which yield precursor ion m/z, intensity, and retention time with no protein identifications unless separate LC-MS/MS analyses are carried out. The two drawbacks to this approach are (1) reduced specificity for low-abundance peaks, because multiple peptides eluting with overlapping m/z isotope envelopes make identification of the specific peptide of interest challenging, and (2) the requirement for a separate experiment that includes MS/MS analyses to identify the peptides associated with peaks that differ between two samples or two groups of samples. Subsequent experiments intended to yield identifications can be complicated by retention time differences and interference from overlapping peptides.

DDA provides peptide and protein identification within a single experiment, but measurement imprecision is higher, providing less power to discriminate abundance differences. DDA is more variable because the automated process of selecting peaks for MS/MS analysis is stochastic. More specifically, if more abundant peptides elute at the same time, the peptide of interest may not be selected for MS/MS analysis. Likewise, time spent acquiring MS/MS spectra precludes the acquisition of the full-scan spectra from which quantitative information is derived.

DIA has been proposed as a workflow that can partially resolve these trade-offs between DDA and full-scan acquisition strategies by providing quantitative information from both precursors and product ions. Compromises associated with this approach include the need for a comprehensive library of precursor and product ion spectra used for deconvolution and concerns about the specificity of quantitative data due to overlapping product ion peaks.

Although improvements in instrumentation and software have enabled greater depth of proteome analysis with fewer compromises, limits remain and the ability to exhaustively interrogate a proteome in a single LC-MS run is not yet achievable.

An alternative to these untargeted, pan-proteomic approaches is to use targeted mass spectrometric experiments to discover protein biomarkers. Built from a list of predefined proteins of interest derived from previous proteomics experiments or biological insight, a tandem mass spectrometric selected reaction monitoring (SRM) method is developed using a triple-quadrupole or quadrupole–high-mass-accuracy analyzer hybrid instrument. In this experiment, specific, predetermined pairs of precursor and fragment ion masses (transitions) are monitored to detect surrogate peptides for a protein and quantify them as chromatographic peak areas. The resulting peak areas between experimental observations are then normalized to provide the relative abundance of the peptides of interest (i.e., representing a subset of the total proteome). This type of discovery experiment is distinguished from fully quantitative experiments in that it lacks internal standards (IS) to control for sample preparation and mass spectrometer performance, and the system is uncalibrated, so conversion of peak areas to concentrations is not possible. Finally, these assays are typically set up without the quality control materials that are the cornerstone of longitudinal monitoring of assay quality in the clinical laboratory. Unlike DDA and DIA experiments, the decision to specifically target individual peptides places a limit on the total number of observations that can be achieved in a run due to duty cycle constraints. Although the precision of this approach is often good for research purposes (CV <25%), the breadth of protein coverage is limited to a few hundred peptides in a single run.
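A label-free targeted comparison of this kind can be represented as a transition list plus a simple normalization step, as in the sketch below. The surrogate peptides, transition m/z values, and peak areas are hypothetical illustrations (VIFDALR reuses the example from Fig. 24.3), not transitions from a validated assay.

```python
# Hypothetical SRM transition list and relative (uncalibrated) quantification.
transitions = {
    # surrogate peptide: [(precursor m/z, product m/z, integrated peak area), ...]
    "VIFDALR":    [(417.2, 621.3, 6.3e4), (417.2, 474.3, 4.0e4)],
    "LVNEVTEFAK": [(575.3, 937.5, 1.8e5), (575.3, 595.3, 1.1e5)],
}

def relative_abundance(run):
    """Sum transition areas per peptide and express each as a fraction of total signal."""
    totals = {pep: sum(area for _, _, area in trans) for pep, trans in run.items()}
    grand_total = sum(totals.values())
    return {pep: round(total / grand_total, 3) for pep, total in totals.items()}

print(relative_abundance(transitions))  # {'VIFDALR': 0.262, 'LVNEVTEFAK': 0.738}
```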

POINTS TO REMEMBER

Discovery proteomics

  • It is generally used to identify new biomarkers.

  • Methods are less specific and precise compared to targeted quantitative methods.

  • 2D gel electrophoresis was one of the first techniques used to examine proteomes.

  • MALDI–time-of-flight (TOF) MS and LC-MS/MS replaced 2D gels.

  • Currently, most discovery experiments are performed using untargeted methods.

Targeted quantitative experiment

After discovery, targeted quantitative mass spectrometric approaches are used in the development of specific, precise assays. These are designed for improved specificity and precision, providing a mechanism to verify or validate initial proteomic discoveries by running hundreds of patient samples with improved quantitative performance compared to measurements made during discovery. Often, improvements in throughput are also made at this stage to facilitate much larger studies. During this stage, many candidates will fail to reproduce because of marginal statistical significance in the initial discovery experiments. In other cases, proteins will be removed as candidates because of poor performance characteristics (e.g., problems with stability, degradation, or modification) that make them unsuitable for further assay development.

Targeted assays are typically performed using triple-quadrupole instruments and a well-developed technique known as stable isotope dilution (readers are referred to Chapter 23 for an in-depth discussion of the principles of isotope dilution assays). In this experiment, calibration materials with known concentrations of the protein analyte are processed in parallel with patient samples, and each sample and calibrator is spiked with an IS at a constant concentration. The IS is most often chemically and structurally identical to the analyte of interest, except that stable (nonradioactive) heavy isotopes are incorporated. This modification changes only the mass of the peptide and not its physicochemical characteristics. Samples and calibrators are analyzed by MS to determine the intensity of signals specific for the analyte of interest and the IS, which are used to determine the peak area ratio (analyte peak area ÷ IS peak area). Using the calibration materials, a response curve is generated by plotting the peak area ratio versus concentration, which can be fit with a line by standard regression techniques. From this response curve, the concentration of the analyte of interest in an unknown sample can be determined from its peak area ratio.
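The calibration arithmetic described above can be sketched in a few lines. The calibrator concentrations, peak areas, and the use of simple unweighted linear regression here are illustrative assumptions; in practice, weighted regression (e.g., 1/x) and acceptance criteria for calibrator back-calculation are commonly applied during method validation.

```python
# Illustrative stable isotope dilution calibration and back-calculation.
import numpy as np

# Hypothetical calibrator concentrations and measured analyte/IS peak area ratios
conc  = np.array([0.5, 1.0, 2.5, 5.0, 10.0])    # e.g., mg/L
ratio = np.array([0.12, 0.24, 0.61, 1.19, 2.42])

slope, intercept = np.polyfit(conc, ratio, 1)   # unweighted linear response curve

def quantify(analyte_area, internal_standard_area):
    """Convert an unknown sample's peak area ratio to a concentration."""
    peak_area_ratio = analyte_area / internal_standard_area
    return (peak_area_ratio - intercept) / slope

# An unknown with analyte peak area 5.4e5 and IS peak area 4.5e5 (ratio = 1.2)
print(round(quantify(5.4e5, 4.5e5), 2), "mg/L")
```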

The selection of MS to perform validation and clinical studies should be evaluated against the possibility of using commercially available immunoassays. Validated immunoassays with good sensitivity and specificity could alleviate the need for the development of MS-based assays. Frequently though, discovery experiments yield proteins for which no immunoassay is commercially available, and in these cases the path through assay development can be significantly shorter using MS when compared to the in-house development of an acceptable immunoassay. Even when commercial kits are available, there is a growing appreciation of the substantial limitations inherent in immunoassays, including lack of quality control for commercial research–grade enzyme-linked immunosorbent assay (ELISA) kits, nonspecific recognition of nontarget proteins, autoantibody and heterophilic antibody interference, poor concordance between kits for the same analyte, and saturation of sandwich assay reagents leading to falsely low results. A well-designed mass spectrometric assay can avoid these issues, which makes it an attractive alternative to immunoassays.

POINTS TO REMEMBER

Targeted proteomics

  • Targeted proteomics experiments only detect preselected proteins and peptides.

  • Mass spectrometry is sometimes used without internal standards in discovery experiments.

  • Incorporation of internal standards into targeted experiments reduces imprecision.

  • Targeted proteomics experiments can have substantially lower imprecision than untargeted experiments.

Bottom-up and top-down experiments

From the preceding sections, it may be apparent that proteins can be quantified by MS using two different approaches. In some cases, proteins are of sufficiently low molecular weight to be detected in their intact state, with or without dissociation in the gas phase in the mass spectrometer. This approach to quantification has been termed top-down proteomics to contrast with bottom-up or proteolysis-aided proteomics, which relies on proteolysis and the quantification of surrogate peptides to determine protein concentration ( Fig. 24.6 ). Both approaches are discussed in the following sections.

FIGURE 24.6, Common workflows for proteomic assays by mass spectrometry (MS). Bottom-up proteomics assays incorporate a proteolytic digestion step and use peptides detected by the mass spectrometer as surrogates for the protein contained in the sample. Intact or top-down proteomics assays require protein enrichment (either biochemical or immunoaffinity) before analysis of the protein by MS. Internal standard peptides (not shown) can be added before or after digestion (ideally before). Intact protein internal standards (not shown) are ideally added before protein enrichment.

Bottom-up targeted proteomics
