Pediatric Cardiovascular Data, Analysis, and Critical Appraisal of the Literature


Introduction

In recent years the volume and variety of pediatric cardiovascular data captured across numerous sources have continued to expand. These datasets are increasingly being integrated and used for research, quality improvement, and other purposes. Regardless of the data source, several important points must be considered when analyzing these data or using the literature to guide evidence-based care in this population. In this chapter we highlight key aspects of pediatric cardiovascular data, analysis, and critical appraisal of the literature.

Pediatric Cardiovascular Data

Current Data Environment

The present decade has ushered in an era of “big data” during which the volume, velocity, and variety of data captured across numerous sources and many different fields have increased exponentially. Along with this, new techniques and capabilities have been developed to better collect, manage, analyze, and provide feedback regarding these data, with the goal of optimizing performance and outcomes across numerous industries. For example, the automotive industry captures data generated by sensors on electric cars to better understand people's driving habits. These data are merged and analyzed with information on the frequency and location of battery charging to aid in better design of the next generation of vehicles and charging infrastructure. In the hotel industry, certain chains merge and analyze weather and airline flight cancellation data, along with information on the geographic location of their hotels. These data are used to target mobile ads to stranded passengers to promote easy booking of nearby hotels.

Health Care Data

Historically in health care and in the hierarchy of medical research, the value of databases, registries, and other data sources in the cycle of scientific discovery and in patient care has not always been recognized. “Mining datasets” and “database research” have been characterized as lesser pursuits compared with basic science research or clinical trials.

However, several recent developments have begun to change the way we view data and their potential value. First, similar to other industries, the volume and granularity of health care data have increased exponentially, including data captured in the electronic health record, clinical registries, research datasets, at the bedside, and from mobile monitors as well as other sources. It has become increasingly recognized that the analysis and integration of these datasets may expand the range of questions that can be answered. For example, early results suggest that the integration of continuous data streams generated by various bedside monitoring systems with data on clinical outcomes may enable better prediction and treatment of adverse events in intensive care settings.

Second, along with this trend toward the increasing availability of data, there has been a simultaneous trend toward declining federal funding to support biomedical research. This has led to further interest in improving our understanding of how to leverage available data sources to power research more efficiently. For example, the use of existing registries or the electronic health record as platforms to support clinical trials has been proposed with the goal of reducing the time and costs associated with data collection.

Finally, the current emphasis in the United States on both improving the quality of health care and lowering costs has necessitated the analysis and integration of both quality and resource utilization data across numerous sources in order to elucidate the landscape of care delivery and outcomes, to investigate relationships between quality and cost, and to develop and investigate strategies for improvement. These and other recent trends have led to a greater recognition in the health care field of the value of leveraging the increasing volume of available data, and numerous recent initiatives have been launched with the goal of further integrating information across sources to conduct novel research and improve care. These include the National Institutes of Health Big Data to Knowledge and Precision Medicine Initiatives, among others.

Pediatric Cardiovascular Data

In 2015, the National Heart, Lung, and Blood Institute of the US National Institutes of Health convened a working group to characterize the current data environment in the field and to offer recommendations for further development and integration. The working group described several strengths and weaknesses related to the existing data environment as detailed in the following sections.

Data Sources

Numerous clinical and quality improvement registries, administrative/billing databases, public health databases, research datasets, and other sources now exist in the field and contain detailed information that is used for pediatric cardiovascular research, surveillance, and quality improvement purposes. Comprehensive listings of available data sources, both in the United States and worldwide (Table 24.1), have recently been published. In addition, data are being increasingly captured via a variety of newer techniques and modalities, including the electronic health record, medical monitors and devices, and genetic and biomarker data. Some centers are also now capturing standardized longer-term outcomes data, such as quality-of-life and neurodevelopmental outcomes.

Table 24.1
Summary of Existing Pediatric Cardiovascular Databases and Registries
Modified from Vener DF, Gaies MG, Jacobs JP, et al. Clinical databases and registries in congenital cardiac surgery, critical care, and anesthesiology worldwide. World J Pediatr Congenit Heart Surg. 2017;8:77–87.

a Formerly the European Association for Cardio-Thoracic Surgery (EACTS) Congenital Heart Surgery Database.
CHD, Congenital heart disease; CHS, congenital heart surgery; CRT, cardiac resynchronization therapy; CVICU, cardiovascular intensive care unit; ECMO, extracorporeal membrane oxygenation; EP, electrophysiology; ICD, implantable cardioverter-defibrillator; ICU, intensive care unit; PA, pulmonary artery; PH, pulmonary hypertension; STS, Society of Thoracic Surgeons.

Infrastructure and Collaboration

Many programs focusing on congenital heart disease across the United States have developed local infrastructure and personnel to support data collection for various registries and other datasets and to support the management and analyses of their local data for administrative, quality improvement, and research purposes. Several centers and research organizations also function as data-coordinating centers, aggregating and analyzing various multicenter datasets in the field.

An environment of collaboration also exists across many congenital heart disease programs and investigators through participation in various multicenter research and quality improvement efforts. This has in many cases extended to collaboration with patient and parent advocacy groups. Examples include the National Pediatric Cardiology Quality Improvement Collaborative, Pediatric Heart Network, Congenital Heart Surgeons Society, Pediatric Cardiac Critical Care Consortium, and many others. Annual meetings of the Multi-societal Database Committee for Pediatric and Congenital Heart Disease have helped facilitate the sharing of ideas and collaboration across the many different registries and databases.

Standardized Nomenclature

Another important aspect of the current pediatric cardiovascular data landscape has been the major effort over the past two decades to develop a standardized nomenclature system. In the 1990s both the European Association for Cardio-Thoracic Surgery (EACTS) and the Society of Thoracic Surgeons (STS) created databases to assess congenital heart surgery outcomes and established the International Congenital Heart Surgery Nomenclature and Database Project. Subsequently, the International Society for Nomenclature of Pediatric and Congenital Heart Disease was formed; it cross-mapped the nomenclature developed by the surgical societies with that of the Association for European Pediatric Cardiology, creating the International Pediatric and Congenital Cardiac Code (IPCCC, http://www.IPCCC.net). The IPCCC is now used by multiple databases spanning pediatric cardiovascular disease, and a recent National Heart, Lung, and Blood Institute working group recommended that the IPCCC nomenclature be used across all datasets in the field when possible.

Current Data Limitations

Although a great deal of progress has been made over the past several years to better capture important pediatric cardiovascular data, many limitations remain. First, there are several limitations related to data collection. Many registries and databases contain duplicate fields, some with nonstandard definitions, leading to redundant data entry, high personnel costs, and duplication of efforts. There is also wide variability in missing data, data errors, mechanisms for data audits and validation, and overall data quality across different datasets. Second, there are limitations related to data integration. Most datasets remain housed in isolated silos, without the ability to easily integrate or share information across datasets. This limits the types of scientific questions that may be answered and adds to high costs and redundancies related to separate data coordinating and analytic centers. Finally, there are limitations related to organizational structure—generally there is a separate governance and organizational structure for each database or registry effort, which adds to the inefficiencies and lack of integration. Some are a relatively small part of larger organizations focused primarily on adult cardiovascular disease, with limited input or leadership from the pediatric cardiac population. This can lead to further challenges in driving change.

Pediatric Cardiovascular Data Sharing and Integration

To address the limitations outlined in the preceding sections, recent work has focused on developing better mechanisms to foster data sharing and integration. These efforts hold the potential to drive efficiencies by minimizing redundancies in data collection, management, and analysis. In turn, this work could save both time and costs. In addition, data integration efforts can support novel investigation not otherwise possible with the use of isolated datasets alone. Data linkages expand the pool of available data for analysis and also capitalize on the strengths and mitigate the weaknesses of different data sources. These data sharing and collaboration activities may take place through several mechanisms and can involve partnerships or data linkage activities on either the “front end” (at or before the time of data collection) or the “back end” (once data have already been entered).

Partnerships Across Databases: Shared Data Fields and Infrastructure

Partnerships between new and/or existing registries and organizations can drive efficiencies in several ways. For example, the Pediatric Acute Care Cardiology Collaborative (PAC3) recently collaborated with the Pediatric Cardiac Critical Care Consortium (PC4) to add data from cardiac step-down units to the intensive care unit data collected by PC4. Data will be collected and submitted together, allowing for integrated feedback, analysis, and improvement activities. This approach is more time- and cost-efficient than creating a separate step-down registry, in which many of the fields regarding patient characteristics, operative data, and clinical course prior to transfer would have been duplicated. Similar efforts have integrated anesthesia data with the STS Congenital Heart Surgery Database and electrophysiology data within the American College of Cardiology Improving Pediatric and Adult Congenital Treatments (IMPACT) registry, which collects cardiac catheterization data. These approaches have involved varying organizational structures governing data access and analysis.

A related method involves a more distributed approach with sharing of common data fields and definitions between organizations, information technology solutions allowing single entry of shared data at the local level, and subsequent submission and distribution of both shared and unique data variables to the appropriate data coordinating centers for each organization/registry. An example of this is the set of shared variables and definitions for certain fields across the STS, PC4, and IMPACT registries.

Linking Existing Datasets

Linking existing data that have already been collected can be accomplished through a variety of mechanisms. Patient records can be linked using unique identifiers such as medical record number or social security number, or, when unique identifiers are not available, through combinations of "indirect" identifiers (such as date of birth, date of admission or discharge, sex, and center where hospitalized).
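As an illustration of the indirect-identifier approach, the following is a minimal sketch of a deterministic linkage using Python and pandas. The dataset names, column names, and values are hypothetical and are intended only to show the mechanics, not to represent any actual registry or administrative file.

```python
# A minimal sketch of linking two de-identified datasets on "indirect"
# identifiers; all names and values below are illustrative assumptions.
import pandas as pd

registry = pd.DataFrame({
    "birth_date":     ["2019-03-02", "2018-11-15"],
    "admission_date": ["2019-03-05", "2019-01-20"],
    "sex":            ["F", "M"],
    "center_id":      ["A01", "B07"],
    "procedure":      ["Norwood", "ASD closure"],
})

administrative = pd.DataFrame({
    "birth_date":     ["2019-03-02", "2018-11-15"],
    "admission_date": ["2019-03-05", "2019-01-20"],
    "sex":            ["F", "M"],
    "center_id":      ["A01", "B07"],
    "total_cost":     [412_000, 58_000],
})

# Deterministic match on the combination of indirect identifiers;
# in practice, match rates and nonunique combinations must be audited.
linked = registry.merge(
    administrative,
    on=["birth_date", "admission_date", "sex", "center_id"],
    how="inner",
    validate="one_to_one",   # raises an error if a combination is not unique
)
print(linked)
```

In real linkages the proportion of records that match, and the handling of records that match more than once, are key quality checks before any analysis is performed.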

These linked datasets have been used to conduct a number of analyses that would not have been possible within individual datasets alone—several examples are highlighted below:

  • Academic outcomes: Clinical data from a state birth defects registry have been linked with state education records to understand academic outcomes in children with congenital heart defects.

  • Comparative effectiveness and cost analyses: Clinical data from the STS Congenital Heart Surgery Database have been linked with resource utilization data from the Children's Hospital Association to perform comparative effectiveness and cost-quality analyses. This linked dataset now spans more than 60,000 records and more than 30 children's hospitals. Similar methods have also been used to link clinical trial data from the Pediatric Heart Network with administrative datasets to clarify the impact of therapies on not only clinical outcomes but also costs of care.

  • Long-term survival and other outcomes: Clinical information from the Pediatric Cardiac Care Consortium (PCCC) registry has been linked with the National Death Index and United Network for Organ Sharing dataset in order to elucidate longer-term outcomes (mortality and transplant status) in patients with congenital heart disease undergoing surgical or catheter-based intervention.

  • Care models: Center-level clinical data from the STS Congenital Heart Surgery Database have been linked with various survey data to clarify the association of clinical outcomes with certain hospital care models and nursing variables.

Data Modules

Methods have also been developed to create data modules enabling efficient collection of supplemental data points alongside an existing registry or database. The modules can be quickly created and deployed to allow timely collection of additional data needed to answer research questions as they arise. For example, this methodology has recently been used by PC4 to study the relationship between the Vasoactive-Inotropic Score and outcome after infant cardiac surgery. A module allowing for capture of additional data related to inotrope use was created, deployed, and linked to the main registry. This facilitated efficient data collection, with 391 infants prospectively enrolled across four centers in just 5 months.

Trial Within a Registry

It has become increasingly recognized that many variables of interest for prospective investigation, including clinical trials, are being captured within clinical registries on a routine basis. It has been proposed that leveraging these existing registry data may be a more efficient way to power prospective research, avoiding duplicate data collection and reducing study costs. These methods have been successfully used to support clinical trials in adult cardiovascular medicine.

In the pediatric cardiovascular realm, the Pediatric Heart Network recently conducted a study to evaluate the completeness and accuracy of sites' local surgical registry data (collected for submission to the STS Database). The results supported using these data for a portion of the data collection required for a prospective study, the Residual Lesion Study, which is ongoing and represents the first use of registry data for this purpose in the field.

Pediatric Cardiovascular Data: Future Directions

While there are now a number of pediatric cardiovascular data sources available for research, quality improvement, and other purposes, important limitations remain. Although several initiatives have supported greater integration and efficiencies across data sources, as described in the preceding sections, most have involved 1:1 data linkages to answer a specific question. More comprehensive approaches are needed to better streamline data collection; integrate information across existing and newer data sources; develop organizational models for more efficient data management, governance, and analysis; and reduce duplicative efforts, personnel, and costs. In addition to supporting more efficient research, these efforts also hold the potential to allow us to answer broader questions rather than those confined to a specific hospitalization, episode of care, or intervention, as is the focus of our current individual registries. Newer analytic approaches such as machine learning techniques are also being further investigated and may allow us to uncover important patterns in the data that would otherwise not be apparent using traditional techniques.

To begin to address these remaining challenges, a series of meetings across multiple stakeholder groups was held over the course of 2017. As a result of these meetings, five initial networks/registries agreed to collaborate and align efforts, forming Cardiac Networks United. These initial five organizations include PC4, PAC3, the National Pediatric Cardiology Quality Improvement Collaborative, the Cardiac Neurodevelopmental Outcomes Collaborative, and the Advanced Cardiac Therapies Improving Outcomes Network. Efforts are ongoing to align this work to foster novel science not possible within individual silos, accelerate the translation of discovery into improvements in care, and reduce infrastructure and personnel costs through the sharing of data and resources.

Measurement and Description of Data

Regardless of the source of data, there are several important considerations to keep in mind when describing and analyzing pediatric cardiovascular data.

Data and Variables

Data are specific pieces of information defined by their level of measurement and their relationship to other data. They are often referred to as variables, since they may take on different values. The type of values that a variable may assume determines the level of measurement, which in turn determines how the values for a given variable should be described and how associations between variables should be assessed.

Categorical variables are those for which the values fall into discrete and mutually exclusive categories. The relationship between the different categories reflects a qualitative difference. For example, for the variable indicating type of atrial septal defect, the possible values could be ostium primum, secundum, or sinus venosus. Variables with only two possible values are referred to as being dichotomous or binary. Examples of dichotomous variables include yes versus no and right versus left.

A specific type of dichotomous categorical variable is the occurrence of a discrete event, such as receiving an intervention or death. Events are almost always associated with a period of time at risk, which is an important aspect of that particular variable. This can be presented as the number of patients experiencing a particular event during a specified period, expressed as a proportion of the total patients at risk for that event. For example, "There were 5 (13%) deaths within 30 days of surgery in 38 patients undergoing Fontan palliation." In more complex datasets, in which patients are followed for varying lengths of time or are lost to follow-up (a situation known as censoring), specific analyses that account for this incomplete follow-up must be used. Kaplan-Meier time-to-event analyses are the most commonly seen in the medical literature (see later).
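For illustration, the following is a minimal sketch of a Kaplan-Meier estimate that accounts for censoring, written in Python with the lifelines package (assumed to be installed). The follow-up times and event indicators are invented for the example and do not come from any study.

```python
# A minimal sketch of a Kaplan-Meier time-to-event analysis with censoring.
from lifelines import KaplanMeierFitter

# Follow-up time in days and an event indicator
# (1 = death observed, 0 = censored, i.e., alive at last contact or lost to follow-up).
durations = [30, 45, 12, 60, 90, 7, 120, 365]
events    = [1,  0,  1,  0,  0,  1, 0,   0]

kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=events, label="Post-Fontan survival")

print(kmf.survival_function_)      # estimated survival probability over time
print(kmf.median_survival_time_)   # may be infinite if <50% of subjects had the event
```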

Ordinal variables reflect a specific type of categorical level of measurement in which the values can be ordered in a quantitative manner. An example would be the subjective grading of valvar regurgitation from echocardiography—trivial is less than mild, which is less than moderate. The categories are discrete and ordered, and the values would be presented in a manner similar to other categorical variables—as frequencies, proportions, and percentages. A specific quantitative value is not assigned to differences between the groups; we merely know that one category is more or less than another.
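As a brief illustration, a minimal Python/pandas sketch of summarizing an ordinal variable as frequencies and percentages follows; the regurgitation grades shown are illustrative values only.

```python
# A minimal sketch of summarizing a categorical (here ordinal) variable
# as counts and percentages; the grades are invented for illustration.
import pandas as pd

grades = pd.Series(
    ["trivial", "mild", "mild", "moderate", "trivial", "mild", "severe"],
    name="regurgitation_grade",
)

counts = grades.value_counts()                        # frequencies
percents = grades.value_counts(normalize=True) * 100  # proportions as percentages

summary = pd.DataFrame({"n": counts, "percent": percents.round(1)})
print(summary)
```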

Quantitative or continuous variables are those where the difference between two values reflects a quantifiable amount. Examples include height, weight, age, ventricular ejection fraction, and blood pressure. When measured repeatedly, continuous variables tend to take on a distribution. A distribution is a description of the relative likelihood of any particular value occurring.

In describing the distribution of a continuous variable, the standard is to present some measure of the center of the values along with the magnitude and spread of their variation. The first step is to examine a frequency plot of the distribution of values. If the distribution is symmetric about its center and bell shaped, we refer to it as being normally distributed. In a normal distribution, the center and the spread (the distance of the values from the center) have specific definable properties, or parameters. The measure of the center is the mean, or average, value, and half of the individual values fall above and half below the mean. The typical measure of variation in a normal distribution is the standard deviation, calculated as the square root of the variance, which is the sum of the squared differences between each value and the mean divided by the number of values. The standard deviation describes the shape of the normal curve and thereby the relationship of all of the values to the mean. Approximately 68% of all values of a variable fall within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3 standard deviations.
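The following minimal sketch illustrates these properties on simulated (not real) data using Python and NumPy, reporting the mean and standard deviation and checking the 68/95/99.7 rule.

```python
# A minimal sketch of describing a normally distributed variable;
# the "birth weights" are simulated values, not patient data.
import numpy as np

rng = np.random.default_rng(0)
weights_kg = rng.normal(loc=3.4, scale=0.5, size=10_000)

mean = weights_kg.mean()
sd = weights_kg.std(ddof=1)   # sample standard deviation (square root of the variance)

for k in (1, 2, 3):
    within = np.mean(np.abs(weights_kg - mean) <= k * sd)
    print(f"within {k} SD of the mean: {within:.1%}")   # roughly 68%, 95%, 99.7%
```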

Not all distributions are normal. If the tails or sides of the distribution are unequal (i.e., lopsided), the distribution is referred to as skewed. Kurtosis refers to a distribution that is either more peaked or flatter than normal. Important skewness or kurtosis can cause the distribution to become nonnormal; the standard parameters of mean and standard deviation then no longer apply. In this case, measures of the center should be chosen that reflect the ranking of values rather than their interval magnitude. When all of the values are ranked, the median is the value at the 50th percentile. For nonnormal data, the greater the skewness, the greater the difference between the median and the calculated mean. Measures of spread in a nonnormal distribution include values at specific percentiles, such as the quartiles, presented as the values at the 25th and 75th percentiles, with the interquartile range being the difference between these two values. Alternatively, the values at the 5th and 95th percentiles or the minimum and maximum values might be presented. Because these measures do not depend on the distribution being normal, they are often referred to as nonparametric measures.
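A companion sketch for a skewed variable, again using simulated data, shows how the median, interquartile range, and other percentiles summarize a nonnormal distribution and how the mean is pulled toward the long tail.

```python
# A minimal sketch of nonparametric summaries for a right-skewed variable
# such as hospital length of stay; the data are simulated for illustration.
import numpy as np

rng = np.random.default_rng(1)
los_days = rng.lognormal(mean=2.0, sigma=0.8, size=5_000)  # skewed distribution

median = np.median(los_days)
q25, q75 = np.percentile(los_days, [25, 75])
p5, p95 = np.percentile(los_days, [5, 95])

print(f"median {median:.1f} days, IQR {q25:.1f}-{q75:.1f}, "
      f"5th-95th percentile {p5:.1f}-{p95:.1f}, "
      f"mean {los_days.mean():.1f} (pulled upward by the skew)")
```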

Validity, Accuracy, and Reliability

Variables have properties reflecting the impact of how the measurements were determined. These properties include validity, accuracy, and reliability.

Validity

Validity assesses whether the measurement used is a true reflection of the desired concept. It answers the question, “Am I really measuring what I think I am measuring?” Validity can be challenging to achieve, particularly when the phenomenon being measured is qualitative and subjective.

If we take aortic valve regurgitation as an example, a subjective grading is often applied during echocardiographic assessment, characterized by the ordinal categories of none, trivial or trace, mild, moderate, and severe. The subjective and qualitative grade is meant to reflect the overall impression of the observer, who takes into account many aspects related to aortic valve regurgitation, such as the width of the jet, the function of the ventricle, the pressure half-time measurement, and diastolic flow reversal in the aorta. In using all of this information, we may give more weight to some measures than others in assigning the final grade of aortic valve regurgitation. If we wished to validate our subjective system of grading, we might start by convening a panel of expert echocardiographers and asking them first to define the concept of aortic valve regurgitation. After discussion, they may agree that no single indirect measure will suffice and that multiple items may need to be considered simultaneously. The individual items and measures are chosen because they have content validity, meaning that they are judged to be related to specific aspects reflecting aortic valve regurgitation, and construct validity, meaning that they are judged to have a plausible causal or physiologic reason for having a relationship to aortic valve regurgitation. Alternatively, the panel may seek to measure aortic valve regurgitation using other methods, such as magnetic resonance imaging or cardiac catheterization. This process is aimed at criterion-related validity, or the degree to which the proposed measure relates to accepted existing measures. They may also seek to assess how the subjective grade relates to clinical or outcome measures, known as predictive validity.

Accuracy

Once a measure is deemed to be valid, its accuracy and precision should be assessed. Accuracy is a reflection of validity in that it assesses how close a measure comes to the truth, but it also encompasses any systematic error or bias in making the measurement. Systematic error refers to variation in the measurements that occurs predominantly in one direction; in other words, the deviation of a measurement from the truth tends to be consistent. Regarding aortic valve regurgitation, this might reflect technical differences in echocardiographic assessment, such as the gain settings or the frequency of the probe used. It may also occur at the level of the observer, who may have a consistent bias in interpreting aortic valve regurgitation, such as grading all physiologic aortic valve regurgitation as mild instead of trace. Alternatively, some observers may place more weight on a specific aspect when assigning a grade, which tends to shift their assignments in one direction.

Reliability

Reliability, or precision, refers to the reproducibility of a measurement under a variety of circumstances and relates to random rather than systematic error. It is the degree to which the same value is obtained when the measurement is made under the same conditions. Some of the random variation in measurements may be attributed to the instruments, such as obtaining the echocardiogram using two different machines. Some may also relate to the subject, such as variations in physiologic state when the echocardiograms were obtained.

The reliability and accuracy of a measurement can be optimized via measurement standardization. Training sessions for observers on assessment and interpretation of a measure can be designed so that criteria for judgment are applied in a uniform manner. Limiting the number of observers, having independent adjudications, and defining and standardizing all aspects of assessment also improve reliability. In our case, this could be achieved by having the same readers assess aortic valve regurgitation using the same echocardiography machine with the same settings in patients of similar fluid status under similar resting conditions.
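One common way to quantify inter-observer reliability for an ordinal grade is a weighted kappa statistic. The following is a minimal sketch using Python and scikit-learn (assumed to be available); the paired readings are invented for illustration and are not drawn from any study.

```python
# A minimal sketch of inter-observer reliability for an ordinal
# regurgitation grade using a weighted Cohen's kappa.
from sklearn.metrics import cohen_kappa_score

reader_1 = ["none", "mild", "mild", "moderate", "severe", "mild", "moderate"]
reader_2 = ["none", "mild", "moderate", "moderate", "severe", "trivial", "moderate"]

# Map the ordered grades to integers so the weighted kappa can penalize
# large disagreements more heavily than adjacent-grade disagreements.
order = {"none": 0, "trivial": 1, "mild": 2, "moderate": 3, "severe": 4}
r1 = [order[g] for g in reader_1]
r2 = [order[g] for g in reader_2]

kappa = cohen_kappa_score(r1, r2, weights="quadratic")
print(f"weighted kappa: {kappa:.2f}")   # 1.0 = perfect agreement, 0 = chance agreement
```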

Analysis of Data

Analysis is the method by which data or measurements are used to answer questions and to assess the confidence with which findings can be inferred beyond the subjects studied. The plan for analysis of the data is an integral part of the study design and protocol. Appropriate planning, strategy, execution, and interpretation are essential elements in the critical appraisal of any research report.

Research Question

Every study must begin with a well-defined question, and the drafting of this question is the first step toward creating a research protocol. The research question often suggests the design of the study, the population to be studied, the measurements to be made, and the plan for analysis of the data. It also determines whether the study is descriptive or comparative. The process of constructing a research question is often iterative. For example, in considering the topic of hypertrophic cardiomyopathy, a descriptive research question might be “What are the outcomes of hypertrophic cardiomyopathy?” This question is nonspecific, but steps can subsequently be taken to refine and focus the question. The first step would be to determine what answers are already known regarding this question and what areas of controversy warrant further study. After a background review, an investigator may further clarify the question by asking the following: “What outcomes do I wish to study?”, “How will I define hypertrophic cardiomyopathy and in what subjects?”, and “At what time point or over what time do I wish to examine these outcomes?” In answering these questions, the research question is revised and further specified to “What is the subsequent risk of sudden death for children with familial hypertrophic cardiomyopathy presenting to a specialized clinic?” This refined question now defines the cohort to be studied—children with familial hypertrophic cardiomyopathy in a specialized clinic and the outcome of interest, sudden death—and it suggests that the study will have some type of observational design. Thus a well-defined and focused research question is essential to considering other aspects of the proposed study or report.

Using Variables to Answer Questions

Once the research question is established, the next step in generating an analysis plan is to select and define variables. Specifically, the researcher must establish the information needed to answer the question. This process should include setting definitions, determining the source(s) of data, and considering issues of measurement validity and reliability.

Types of Variables

Variables can be classified for statistical purposes as either dependent or independent variables. Dependent variables are generally the outcomes of interest, and either change in response to an intervention or are influenced by associated factors. Independent variables are those that may affect the dependent variable. The research question should define the primary independent variable, which is commonly a specific treatment or a key subject characteristic. A detailed consideration of the question should clearly identify the key or primary dependent and independent variables.

In any study there are usually one or two primary outcomes of interest, but there are often additional secondary outcomes. Analysis of secondary outcomes is used to support the primary outcome or to explore and generate additional hypotheses. It should be recognized that the greater the number of outcomes examined in a study (a situation known as multiple comparisons), the more likely it is that one of them will be statistically significant purely by chance. When multiple comparisons are assessed, the threshold required to declare statistical significance must therefore be made more stringent.
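As a simple illustration of adjusting for multiple comparisons, the following sketch applies a Bonferroni correction to a set of hypothetical p-values using Python and statsmodels (assumed to be available); other correction methods exist, and the choice depends on the study design.

```python
# A minimal sketch of a multiple-comparisons adjustment;
# the p-values are illustrative, not from any study.
from statsmodels.stats.multitest import multipletests

p_values = [0.012, 0.048, 0.170, 0.003, 0.090]   # one p-value per secondary outcome

# Bonferroni controls the family-wise error rate at 0.05 by requiring each
# comparison to meet a stricter threshold (0.05 / number of comparisons).
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="bonferroni")

for p, p_adj, sig in zip(p_values, p_adjusted, reject):
    print(f"raw p={p:.3f}  adjusted p={p_adj:.3f}  significant={sig}")
```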

Composite outcomes are a different but also important concept. A composite outcome results when several different outcomes are grouped together into one catchall outcome. As an example, a study of the effect of digoxin in adolescent patients with advanced heart failure might have a composite outcome of admission to the intensive care unit, listing for transplantation, or death. A composite outcome raises the likelihood that the study will accrue enough outcome events to support an analysis. However, the appropriateness of composite outcomes is questionable, and issues have been raised about their validity. First, not all outcomes included in a composite have the same importance for subjects; in our example, admission to the intensive care unit and death, while both serious, would be deemed equivalent by very few people. Second, the creation of a composite outcome might obscure differences between the individual outcomes. Third, the component outcomes may carry different risks and have different associations with the variables under study; in our example, we would not be able to detect whether any variables were associated specifically with intensive care unit admission, only whether they were associated with the composite outcome. Thus specific outcomes should be favored over composite outcomes when feasible and relevant.
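The following minimal Python/pandas sketch shows how a composite outcome might be constructed from component events and why reporting the component rates alongside the composite remains important; the data frame and column names are hypothetical.

```python
# A minimal sketch of building a composite outcome from component events;
# all values and column names are illustrative assumptions.
import pandas as pd

df = pd.DataFrame({
    "icu_admission":      [1, 0, 0, 1, 0],
    "transplant_listing": [0, 0, 1, 0, 0],
    "death":              [0, 0, 0, 1, 0],
})

# Composite = any component event occurred for that subject.
df["composite"] = df[["icu_admission", "transplant_listing", "death"]].max(axis=1)

# Reporting component rates alongside the composite shows which events drive it.
print(df.mean())   # event proportion for each component and for the composite
```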
