Multiplatform Standardization of Breast DWI Protocols: Quality Control and Test Objects


Overview of Breast Diffusion-Weighted Imaging Standardization Efforts

Given the rapid advances in diffusion-weighted imaging (DWI) technology and the multiple parameters influencing quality of breast DWI acquisition and analysis, harmonization of breast DWI protocols necessitates coherent standardization efforts by physicians, researchers, and MR vendors. These standardization efforts will likely include trade-offs between practicality and diagnostic yield.

Rationale and Need for Standardization for Clinical Translation

Clinical translation of developed breast DWI technologies requires design and implementation of standardized acquisition and analysis workflows to improve overall robustness and reduce dependency on site imaging protocols and vendor platforms. Standardization improves reproducibility and accuracy of derived DWI metrics and enables practical application in the clinical setting. Despite the great potential and promise of contrast-free imaging sensitive to tissue microstructure, DWI is often used only qualitatively in the clinic to increase the conspicuity of tissues with impeded diffusion on images with high diffusion weighting (b > 700 s/mm²). The typical breast DWI protocol is an auxiliary to the high-resolution contrast-enhanced T1-weighted (T1w) scan that continues to serve as the primary diagnostic driver in breast magnetic resonance imaging (MRI) (see Chapters 2 and 13).

Unlike conventional qualitative DWI based on subjective impressions, the utilization of quantitative DWI metrics derived from biophysical models potentially enables objective thresholds for disease detection and treatment response monitoring. Rich quantitative DWI research conducted to date has uncovered major sources of technical bias and variability in breast DWI. Improvement of accuracy and precision via amelioration of bias and unnecessary variability is the focus of ongoing DWI standardization efforts. Characterization and reduction of technical errors are prerequisites to define meaningful thresholds for derived quantitative DWI parameters (e.g., apparent diffusion coefficient [ADC], kurtosis, compartment fractions) and for successful widespread translation of DWI technology improvements (see Chapter 12 for multishot, reduced field of view [rFOV], and other advanced acquisition techniques) into the clinical environment.

The common technical errors in breast DWI stem from inherent hardware properties, the acquisition protocol, and the data analysis workflow. Among acquisition parameters, the applied b value sensitization, in terms of DWI gradient strength, duration, ranges, and spatial uniformity, is the most important and challenging factor for reproducible results. Current clinical systems with moderate gradient strength (<50 mT/m) do not routinely provide control over diffusion gradient pulse duration. Diffusion gradient timing is nominally set by the minimum echo time (TE) for the highest b value acquired in a sequence. Therefore, ensuring consistent TE and b range across scanners is a reasonable first standardization step to constrain diffusion encode-time bias. The TE setting also determines T2 weighting and is a major driver of signal-to-noise ratio (SNR) at high b values. It is important to consider potential Rician noise bias in both the protocol design and the applied diffusion model, as it may impact quantitative results (see also Chapters 1 and 8). Additionally, due to spatial gradient nonlinearity, applied b values possess finite spatial nonuniformity and are typically higher than nominal (isocenter value) at anatomical breast locations for horizontal bore scanners. This gradient system-dependent bias requires correction to reduce variability in multicenter, multisystem studies.
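The effect of a b value that is higher than nominal can be illustrated with a minimal sketch. It assumes a mono-exponential signal model and a single scalar ratio of actual to nominal b value at the voxel location (the name `b_scale` is hypothetical). Real gradient nonlinearity is direction dependent and needs system-specific gradient field descriptions, so this is only a first-order illustration and not a vendor correction method.

```python
import math

def adc_two_point(s_low, s_high, b_low, b_high):
    """Mono-exponential ADC (mm^2/s) from signals at two b values (s/mm^2)."""
    return math.log(s_low / s_high) / (b_high - b_low)

def correct_adc_for_b_scale(adc_measured, b_scale):
    """First-order ADC correction for a spatially scaled b value.

    b_scale is an assumed ratio of actual to nominal b value at the voxel
    location (e.g., 1.05 where the applied b value is 5% above nominal).
    Because the signal decays with the actual b value while the fit uses
    the nominal one, the measured ADC is inflated by the same ratio.
    """
    return adc_measured / b_scale

# Example: true ADC of 1.0e-3 mm^2/s, nominal b = 800 s/mm^2, actual b 5% higher.
adc_true = 1.0e-3
b_nominal = 800.0
b_scale = 1.05
signal_b0 = 1.0
signal_high = signal_b0 * math.exp(-b_nominal * b_scale * adc_true)

adc_measured = adc_two_point(signal_b0, signal_high, 0.0, b_nominal)
adc_corrected = correct_adc_for_b_scale(adc_measured, b_scale)
```

In this example the uncorrected ADC is 5% above the true value, which mirrors the direction of bias described for off-isocenter breast locations.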

Use of single-shot (ss) echo-planar imaging (EPI) acquisitions for reasonably short scan duration (<5 minutes) makes clinical breast DWI susceptible to multiple artifacts, including those that stem from imperfect fat suppression, B0 field inhomogeneity, and eddy currents (see Chapter 13). These issues become particularly challenging for advanced DWI techniques that target more subtle signal features, for example, diffusion kurtosis measured over an extended high b value range (see Chapters 1 and 8) or fractional anisotropy obtained from diffusion tensor imaging (DTI) with at least six diffusion sensitizing directions (see Chapter 9). Generation of quantitative DWI metrics based on different biophysical models relies on the availability of sequence parameters (e.g., b values and DWI gradient orientation) retained in image metadata. This information is often stored in nonstandard structures (vendor-specific “private” metadata fields) in image headers or is not routinely available. Visualization and interpretation of DWI results in a clinical picture archiving and communication system (PACS) requires adherence to the imaging and metadata conventions provided by Digital Imaging and Communications in Medicine (DICOM) and the Insight Toolkit (ITK), including intensity scaling and units for quantitative parametric maps.
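The metadata problem can be sketched as a lookup that tries the standard DICOM attribute first and falls back to vendor private fields. The header is modeled here as a plain dictionary keyed by (group, element) so that the example stays self-contained. The private tag numbers are commonly reported examples and should be treated as assumptions: they can change between software versions and must be checked against the vendor conformance statement. The function names are hypothetical.

```python
# Standard DICOM attribute for the diffusion b value.
STANDARD_B_TAG = (0x0018, 0x9087)

# Commonly reported private tags (assumptions, not guaranteed for any
# given scanner software version; verify in the conformance statement).
PRIVATE_B_TAGS = {
    "SIEMENS": (0x0019, 0x100C),
    "GE": (0x0043, 0x1039),
    "PHILIPS": (0x2001, 0x1003),
}

def _first(value):
    """Return the first item of a multi-valued field, or the value itself."""
    return value[0] if isinstance(value, (list, tuple)) else value

def find_b_value(header, manufacturer):
    """Look up the b value in a header dict keyed by (group, element).

    Returns (b_value, source), where source is 'standard', 'private',
    or 'missing' so that downstream analysis can record provenance.
    """
    if STANDARD_B_TAG in header:
        return float(_first(header[STANDARD_B_TAG])), "standard"
    vendor = manufacturer.upper()
    for key, tag in PRIVATE_B_TAGS.items():
        if key in vendor and tag in header:
            return float(_first(header[tag])), "private"
    return None, "missing"

# Example headers (illustrative values only).
standard_header = {(0x0018, 0x9087): 800}
private_header = {(0x0019, 0x100C): "800"}
```

Returning the source alongside the value makes it easy to flag series whose b values came from private fields or could not be found at all.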

Most of the DWI standardization issues are common across different organs, but some are particularly acute for breast DWI (e.g., sensitivity to imperfect fat suppression; Fig. 13.6). The ultimate goal of breast DWI protocol standardization is to implement acquisition parameters and image postprocessing that minimize the variability across scanner platforms due to artifacts and technical biases.

Standardization Initiatives and Guidelines

Standardization is a dynamic process: as new MRI scanners and acquisition sequences are developed, improved DWI protocols become available only on a subset of systems or partway through a subject enrollment interval (e.g., high-resolution DWI with rFOV and segmented EPI acquisition; see Chapter 12). DWI biomarker standardization efforts have been initiated to various degrees by multiple organizations (Table 14.1): Radiological Society of North America (RSNA) Quantitative Imaging Biomarkers Alliance (QIBA) DWI, European Society of Breast Imaging (EUSOBI), National Institute of Standards and Technology (NIST), American College of Radiology Imaging Network (ACRIN), National Clinical Trials Network (NCTN), and Medical Imaging Technology Alliance (MITA). The QIBA DWI and EUSOBI groups aim to establish consensus recommendations, based on published data, to specifically inform breast DWI protocol implementation for reproducible results. They also promote rigorous quality assurance (QA) for clinical trials that use DWI as an imaging readout and seek clinical translation of discovered DWI-based biomarkers.

Table 14.1
Standardization Organizations and Breast DWI-Relevant Resources

Organization | Resource | URL
European Society of Breast Imaging (EUSOBI) | Optimization of breast DWI protocols | https://www.eusobi.org
RSNA Quantitative Imaging Biomarkers Alliance (QIBA) | DWI profile outlining specifications to achieve baseline ADC precision | https://qibawiki.rsna.org/images/6/63/QIBA_DWIProfile_Consensus_Dec2019_Final.pdf
American College of Radiology Imaging Network (ACRIN) | Supports multicenter oncology clinical trials that use imaging end-points | https://ecog-acrin.org
NCI National Clinical Trials Network (NCTN) | Clinical trial QC requirements for cancer centers | https://www.cancer.gov/research/infrastructure/clinical-trials/nctn
National Comprehensive Cancer Network (NCCN) | Global cancer care, research and imaging standards | https://www.nccn.org/
National Institute of Standards and Technology (NIST) | Ground-truth DWI phantom calibration and lending | https://www.nist.gov/programs-projects/nistnibib-medical-imaging-phantom-lending-library
Medical Imaging Technology Alliance (MITA) | Performance standards for diagnostic imaging | https://www.medicalimaging.org/standards
Digital Imaging and Communications in Medicine (DICOM) | Image database and archival metadata standards for DWI and ADC parametric maps | (none listed)
Food and Drug Administration (FDA) | Imaging device specifications and safety for clinical use | (none listed)
FDA Biomarkers, EndpointS, and Other Tools (BEST) | Clinical imaging biomarker requirements | (none listed)
NCI The Cancer Imaging Archive (TCIA) | Sharing annotated cancer imaging DICOM data | https://www.cancerimagingarchive.net/collections
RSNA Quantitative Imaging Data Warehouse (QIDW) | Sharing DWI phantom data, QC tools and DROs | http://qidw.rsna.org/#collections

ADC, Apparent diffusion coefficient; DICOM, Digital Imaging and Communications in Medicine; DRO, digital reference object; DWI, diffusion-weighted imaging; QC, quality control.

The NIST imaging division develops and calibrates DWI reference standards for acquisition optimization and quantitative evaluation of technical biases. This institute also currently provides the quantitative MRI phantom loan library (see Table 14.1) for clinical trials and validation. The MITA division of the National Electrical Manufacturers Association has developed DICOM standards for DWI metadata and for generated parametric maps to ensure compatibility with clinical PACS used by radiologists and to facilitate interoperability of analysis results. These standards require that critical parameters for DWI acquisition (b values, diffusion gradient directions) and analysis (algorithm, output scales, units) are archived with and accessible from the metadata for DWI images and quantitative parametric maps (e.g., ADC).

In practice, the dominant MR manufacturers implement these requirements to variable degrees, for example, routinely storing b values, diffusion gradient directions, and scale/units in optional “private” tags, which can change format between scanner software versions. On the other hand, multiframe 32-bit precision storage of the parametric DICOM volumes is not routinely supported by clinical PACS vendors and DICOM viewer workstations. They also do not routinely support segmentation DICOM for region of interest (ROI) definition. Therefore, for implementation of advanced offline DWI analysis, the onus currently remains with the MR scientists/physicists to update DWI DICOM interpretation dictionaries according to the vendor-provided DICOM conformance statements that are distributed with the scanner software and list the available metadata fields and formats. Furthermore, many relevant parameters, including the DWI gradient waveform (e.g., single spin-echo [SSE] vs. double spin-echo [DSE], or bipolar) and timing (pulse duration and separation), are commonly missing from metadata. These issues need to be resolved to facilitate harmonization and integration of advanced DWI methods into the clinical application workflow.

In the United States, the Food and Drug Administration (FDA) and National Cancer Institute (NCI) NCTN provide general regulations for quality control (QC) requirements in imaging device applications and clinical trials (e.g., FDA “Biomarkers, EndpointS, and Other Tools” [BEST]; see Table 14.1 ). Several imaging accreditation programs address standardization across multiple imaging modalities and include DWI guidelines to a variable degree. The American College of Radiology (ACR) accredits sites and scanners based on submitted test images, site protocols, and personnel qualifications. Accreditation specific to clinical trials using imaging is performed by both for-profit imaging contract research organizations (iCROs) and core labs associated with cooperative therapy groups. One example of the latter is the Eastern Cooperative Oncology Group (ECOG) ACRIN (see Table 14.1 ), which has supported studies to evaluate ADC use for breast cancer therapy response prediction (ACRIN 6698) and diagnosis (ACRIN 6702). Related groups also offer accreditation for clinical trials, with the objective of certifying select sites as “trial ready” in terms of their imaging performance. The National Comprehensive Cancer Network (NCCN) provides disease-based clinical practice guidelines for oncology, which is a primary target of current clinical DWI applications.

The EUSOBI organization recently launched an international initiative dedicated to breast-specific DWI protocol optimization to generate consensus guidelines and promote routine clinical implementation as a part of multiparametric MRI evaluation. These guidelines focus on achieving objective description of impeded tissue water mobility across multiple vendor platforms, including specifications on b values, spatial resolution, repetition and echo times, and ROI placement. This work summarizes consensus on assessment of lesion location and morphology using DWI and ADC, and reflects broad ADC ranges (in μm²/ms) for malignant (0.8–1.3), benign (1.2–2.0), and normal (1.7–2.0) breast tissue cited in current literature. The work also highlights the shortcomings of present nonstandard protocols in defining diagnostically meaningful ADC thresholds by potentially broadening observed ADC ranges.

Another international standardization initiative by QIBA DWI biomarker committee, formed by the RSNA, has developed a DWI profile (see Table 14.1 ) that provides guidelines to establish confidence intervals for interpretation of meaningful biological changes in quantitative ADC metrics for multiple organs, including the breast. As with any quantitative metrics, the confidence intervals of derived breast DWI parameters rely on evaluation of precision (scan-rescan repeatability, cross-platform reproducibility), and technical accuracy (measurement bias). The ultimate goal is to achieve confidence intervals tighter than the measured biological effect. The current QIBA DWI profile contains recommendations for standardization of breast DWI acquisition and analysis and quantitative system performance evaluation to address common technical issues for ADC measurements in multicenter, multiplatform environments. These guidelines ensure that harmonized baseline performance is achieved across multivendor platforms using a DWI phantom, and recommend application of an ADC digital reference object (shared via the RSNA Quantitative Imaging Data Warehouse [QIDW]; see Table 14.1 ) to confirm that image analysis does not introduce substantial technical bias.

Standardization Tools and Workflow: Repeatability and Technical Bias

Practical clinical translation workflow ideally starts from good correlation of pathological findings to the given DWI metric. Then optimization of the acquisition protocol is performed in conjunction with the DWI model constraints to minimize measurement uncertainty and bias, and maximize diagnostic performance, which is ultimately tested in multicenter clinical trials. Presence of system-dependent technical bias increases variability in data collated from multicenter trials leading to inconclusive results and/or necessitating large subject recruitment. Characterization and reporting of the acquisition and fit model parameters, as well as bias and repeatability, is critical for quantitative DWI metrics. The latter define confidence intervals to establish reproducible diagnostic thresholds and enable emerging radiomics and artificial intelligence (AI) applications.

Diffusion scans typically have lower spatial resolution compared with clinical T1- and T2-weighted sequences, and corresponding DWI are ideally coregistered to high-resolution images for (or used to guide) lesion segmentation and characterization. This image processing may introduce additional error sources. Finally, the confidence intervals for diagnostic thresholds are determined by assessing precision and accuracy of the DWI scan protocol and entire analysis workflow. According to QIBA metrology recommendations, the assessment of measurement precision for the chosen quantitative DWI metric relies on test-retest repeatability studies, which use the same scan protocol over a relatively short time interval to ensure negligible change in breast biological condition. The reproducibility studies, on the other hand, evaluate variation across sites or scanner platform and acquisition protocols. The quantitative measurement accuracy (bias) is assessed with a phantom that provides ground-truth values, and platform-dependent bias should be corrected, when possible, or accounted for in the metric confidence intervals.

Repeatability Analysis Guidelines

Test-retest repeatability studies are usually performed using repeated scans with the same protocol parameters for acquisition and analysis on a group of patients. The repeatability coefficient (RC) is assessed from the within-subject standard deviation (wSD) or the within-subject coefficient of variation (wCV, when wSD scales with the mean parameter value). RC provides an estimate for measurement precision, and can be used directly to establish 95% confidence intervals (CIs) for longitudinal studies (e.g., treatment response monitoring), assuming constant technical bias across serial imaging time points.
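As a minimal sketch of these quantities, the function below estimates wSD, wCV, and RC from paired test-retest values using the usual two-measurement formulas (within-subject variance equal to half the squared difference, and RC = 1.96 × √2 × wSD, approximately 2.77 × wSD). The function name and the example ADC values are illustrative only.

```python
import math

def repeatability(test, retest):
    """Estimate wSD, wCV and repeatability coefficients from paired scans.

    test, retest: equal-length sequences of a DWI metric (e.g., lesion
    mean ADC), one pair per subject.
    """
    n = len(test)
    # With two measurements per subject, the within-subject variance
    # estimate for each subject is d^2 / 2.
    variances = [(a - b) ** 2 / 2.0 for a, b in zip(test, retest)]
    wsd = math.sqrt(sum(variances) / n)
    # Relative variance per subject, for metrics whose wSD scales with the mean.
    rel_variances = [
        v / (((a + b) / 2.0) ** 2) for v, a, b in zip(variances, test, retest)
    ]
    wcv = math.sqrt(sum(rel_variances) / n)
    factor = 1.96 * math.sqrt(2.0)  # approximately 2.77
    return {
        "wSD": wsd,
        "wCV": wcv,
        "RC": factor * wsd,
        "RC_percent": 100.0 * factor * wcv,
    }

# Illustrative (not measured) lesion ADC values in um^2/ms.
result = repeatability([1.0, 1.2], [1.1, 1.1])
```

The absolute RC applies when measurement noise is roughly constant, whereas the percent RC is the natural choice when variability grows with the metric value.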

QIBA metrology papers offer guidelines for design and sufficient powering of repeatability studies. In general, it is desirable to obtain test-retest scans on 30 or more subjects (preferably on different days, or at least with repositioning of the patient if on the same day) with consistent acquisition and analysis parameters for derived breast DWI metrics (e.g., ADC mean, percentiles, or cold spot volume). Repeatability analysis, based on Bland–Altman statistics (see Fig. 14.1), assumes that biological tissue characteristics do not change between test and retest, and that measurement noise is the main contributor to the disagreement between the two results. Because the optimal DWI protocol inherently varies with tissue diffusion properties, different repeatability may be observed for normal tissue versus lesions. Hence, distinct CIs are expected for biologically dissimilar lesion ROIs. The majority of breast DWI repeatability studies performed to date have focused on different characteristics of the ADC histograms (see Fig. 14.1) for lesion ROIs, including mean, skew, percentiles, and histogram volumes (e.g., dense lesion volume defined as the number of voxels with ADC up to a given threshold times the voxel volume).
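The Bland–Altman statistics mentioned above can be sketched as follows: the mean test-retest difference and the 95% limits of agreement (mean difference ± 1.96 standard deviations of the differences). The function name and example values are hypothetical.

```python
import statistics

def bland_altman(test, retest):
    """Return (mean difference, lower limit, upper limit) of agreement."""
    diffs = [b - a for a, b in zip(test, retest)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    return mean_diff, mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff

# Illustrative (not measured) lesion ADC values in um^2/ms.
mean_diff, lower, upper = bland_altman([1.0, 1.2, 0.9], [1.1, 1.1, 1.0])
```

A mean difference close to zero supports the assumption of no systematic change between test and retest, and the width of the limits reflects measurement noise.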

Fig. 14.1, ADC test-retest repeatability for multisite breast DWI trial.

Breast DWI Metric Repeatability

Based on the recently published (sufficiently powered) multiplatform test-retest studies, the current QIBA DWI profile includes a longitudinal claim suggesting that meaningful changes in mean ADC of a breast lesion beyond 13% can be detected with 95% confidence. Similarly good repeatability was observed for low ADC histogram percentiles (RC < 20%); however, histogram volumes showed generally poor repeatability (RC > 80%), likely due to high sensitivity to ROI segmentation errors. Higher ADC histogram moments and radiomic features also tend to show higher variability (e.g., 24% for “cold spot” ADC and 85% for entropy). These metrics are apparently more sensitive to noise, although generally fewer test-retest repeatability data are available for evaluation. Several small repeatability studies for advanced DWI metrics (intravoxel incoherent motion [IVIM] perfusion fraction, kurtosis, fractional anisotropy) similarly report lower repeatability compared with that of the mean ADC. This is also consistent with the observation of lower reproducibility (e.g., 38%–48% for kurtosis and D*) between different acquisition protocols. More comprehensive studies are needed to characterize the effect of lesion size and biological condition on DWI measurement precision. The general challenges of improving precision and accuracy, and minimizing scan time, should be addressed by acquisition protocol optimization and advanced image processing (e.g., denoising) to allow clinical application of these promising quantitative breast DWI metrics.
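The longitudinal claim can be read as a simple decision rule, sketched below under the assumption that the percent change is computed relative to the baseline measurement and that the 13% repeatability coefficient applies to the protocol in use. The function names are hypothetical.

```python
def percent_change(baseline, followup):
    """Percent change of a metric relative to its baseline value."""
    return 100.0 * (followup - baseline) / baseline

def is_meaningful_change(baseline, followup, rc_percent=13.0):
    """True if the change exceeds the repeatability coefficient (in %).

    The default of 13% is the mean ADC value quoted in the text; other
    metrics (e.g., histogram volumes) would need their own RC.
    """
    return abs(percent_change(baseline, followup)) > rc_percent

# Illustrative lesion mean ADC values in um^2/ms.
small_change = is_meaningful_change(1.0, 1.1)   # 10% increase
large_change = is_meaningful_change(1.0, 1.2)   # 20% increase
```

Under this rule a 10% increase stays within measurement noise, while a 20% increase would be interpreted as a true change with 95% confidence.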

Bias Analysis and Correction Guidelines

For cross-sectional diagnostic comparisons (e.g., to differentiate malignant from benign breast conditions), confidence intervals will include both precision and bias. Technical bias evaluation is performed with respect to a phantom with known DWI parameter values. Ideally, these values are chosen to mimic biophysical characteristics of breast diffusion supplied by in vivo observations and in silico modeling. Bias can be evaluated as a function of different acquisition parameters, including resolution, fat suppression, and spatial uniformity of b value (Fig. 14.2). A set of physical phantoms has been developed to test acquisition and analysis of breast DWI. Diffusion properties of phantoms are usually confirmed by reference scans where bias is known to be negligible (e.g., high resolution, artifact free, at magnet isocenter), although reference scans are often not practical for daily clinical applications. The bias of the desired clinical protocol under evaluation is then established as the difference between the measured DWI metrics and the phantom reference values. Additionally, for appropriate statistical derivation of the CIs, a linearity test needs to be performed to ensure that the measured metric is linearly correlated with the true values over the tissue-relevant range.
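A minimal sketch of the bias and linearity evaluation, assuming a phantom with several compartments of known ADC: percent bias is computed per compartment, and an ordinary least-squares line of measured versus reference values gives the slope, intercept, and R² used to judge linearity. The function names and phantom values are illustrative, not taken from a specific phantom.

```python
def percent_bias(measured, reference):
    """Percent bias of each measured value relative to its reference."""
    return [100.0 * (m - r) / r for m, r in zip(measured, reference)]

def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, with R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical phantom compartments (um^2/ms) and a scanner reading 5% high.
reference_adc = [0.5, 1.0, 1.5, 2.0]
measured_adc = [1.05 * r for r in reference_adc]

bias = percent_bias(measured_adc, reference_adc)
slope, intercept, r_squared = linear_fit(reference_adc, measured_adc)
```

A slope near 1 with a near-zero intercept and high R² indicates a linear response, whereas a constant percent bias such as the 5% in this example could in principle be corrected or folded into the confidence intervals.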

Fig. 14.2, Gradient system-dependent ADC bias for multiscanner breast DWI study.

Evaluation of bias for advanced quantitative models and fit algorithm fidelity may be performed based on virtual phantoms termed digital reference objects (DROs, e.g., for ADC SNR; see Table 14.1), which provide true parameter values from forward modeling of DWI signals and acquisition conditions. DWI DRO intensities are simulated for physical ranges of tissue diffusion parameters (e.g., ADC, D_a, K_a, D*, and IVIM fraction). The tested fit algorithm results are then compared with the input DRO parameters to assess bias. Another option for DRO generation is to acquire high-resolution images free of artifact and bias using a long acquisition (e.g., high-resolution turbo-spin-echo [TSE]) that is not clinically viable, and impose known artifacts (e.g., partial volume averaging due to reduced resolution) to test the ability of postprocessing to eliminate them with respect to reference images. Sharing acquired test-retest DICOM DWI data (e.g., via the NCI Cancer Imaging Archive [TCIA] collections; see Table 14.1) is also a valuable resource for the breast DWI community to field test emerging denoising, segmentation, and classification algorithms. These tools are particularly susceptible to overtraining on small available data sets, resulting in poor generalization. Notably, shared and pooled test-retest data are an economical means to benchmark improved performance of new algorithms via improved repeatability statistics (reduced technical variability).
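The DRO idea can be sketched in a few lines: forward-model a mono-exponential signal for a known ADC, add noise, fit, and compare with the input. This sketch assumes magnitude reconstruction of a complex signal with Gaussian noise in both channels (which gives Rician-distributed magnitudes) and a two-point fit on averaged magnitude signals. All names and parameter values are illustrative and do not describe the QIBA DRO itself.

```python
import math
import random

def simulate_magnitude(signal, sigma, rng):
    """Magnitude of a complex signal with Gaussian noise in each channel."""
    real = signal + rng.gauss(0.0, sigma)
    imag = rng.gauss(0.0, sigma)
    return math.hypot(real, imag)

def dro_adc_estimate(adc_true, b_high, sigma, n_avg=2000, seed=0):
    """Two-point ADC estimate from averaged noisy magnitude signals.

    adc_true in mm^2/s, b_high in s/mm^2, sigma is the noise standard
    deviation relative to a b = 0 signal of 1.0.
    """
    rng = random.Random(seed)
    s0 = 1.0
    sb = s0 * math.exp(-b_high * adc_true)
    mean_s0 = sum(simulate_magnitude(s0, sigma, rng) for _ in range(n_avg)) / n_avg
    mean_sb = sum(simulate_magnitude(sb, sigma, rng) for _ in range(n_avg)) / n_avg
    return math.log(mean_s0 / mean_sb) / b_high

adc_true = 1.0e-3
noise_free = dro_adc_estimate(adc_true, b_high=800.0, sigma=0.0)
# High b value with signal near the noise floor (SNR of 20 at b = 0).
noisy_high_b = dro_adc_estimate(adc_true, b_high=3000.0, sigma=0.05)
```

Without noise the fit recovers the input ADC, while at high b value the noise floor inflates the measured signal and the fitted ADC falls below the true value, which is the Rician bias that the DRO comparison is designed to expose.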

Optimization and Standardization of Breast DWI Acquisition and Analysis

Challenges of Breast DWI Protocol Standardization for Clinical Translation

Protocol optimization for breast DWI is based on maximizing the diagnostic yield, and minimizing the acquisition time and complexity for practical implementation in the clinical environment. Because DWI is sensitive to a variety of breast microstructural and physiological characteristics, optimization of clinical workflow should ideally start with evaluation of cellular histology for target pathology and its correlation to research DWI findings, for example, changes in vascular fraction probed by low b value ranges versus cellular density at intermediate b values versus impeded cellular diffusion at high b values (Chapters 1 and 8; Figs. 1.1–1.4 in Chapter 1). The clinical DWI protocol should then be adjusted to query pathology-specific tissue microstructure and suppress other (unwanted) components by choosing scan parameters that emphasize DWI contrast-to-noise ratio (CNR). The standardization challenge is in finding parameter ranges (TE, b values, fat suppression) that can be implemented across a variety of clinical scanners and be effective for the study population to probe the distribution of breast tissue diffusivity parameters. Standardization and optimization of breast DWI protocols typically involve establishing consistent b values and TE range, as well as ensuring efficient fat suppression and reduction of EPI artifacts (eddy current and shim distortions; Chapter 13) either by acquisition or image postprocessing and registration.

For instance, acquisition of multiple high b values (≥800 s/mm²) for breast DWI could be more informative for impeded diffusion; however, it requires prolonged examinations for sufficient SNR and may compromise simultaneous quantification of high ADC tissues. Improved spatial resolution and SNR, and reduced distortion and scan time, are common goals of breast DWI protocol optimization and implementation of advanced acquisition techniques (Fig. 14.3).

Fig. 14.3, Example breast DWI for ss-EPI versus MUSE.

Clinical Implementation Considerations (See Also Chapter 13)

For practical consideration, adding DWI to an existing breast MRI protocol that typically contains T2w and dynamic contrast-enhanced MRI (DCE-MRI) acquisitions should strive to minimize disruption to the clinical workflow. Therefore it is important to keep the DWI acquisition time within a reasonable range (ideally <5–7 minutes) while maintaining adequate image quality, CNR, and SNR for radiologists to read the images. This is often difficult to accomplish within the limited time of the clinical breast MRI examination if complete breast coverage in the axial slice (S/I) direction is desired. Additionally, DWI should be acquired before DCE-MRI because gadolinium contrast agent has been shown in some studies to adversely affect the ability to calculate robust tissue ADC statistics, and this effect may be time dependent. The timing of the DWI acquisition needs to be set forth in the imaging protocol, and for multisite clinical trials, ideally controlled for by the central data processing site. Another practical consideration is that DWI images should not interfere with clinical reading; that is, if directional diffusion images are being acquired for offline ADC analysis (e.g., calculation of ADC map for b value subset or eddy current correction), radiologists may request that these images are not sent to PACS; if so, directional DWI data may need to be sent to a separate offline processing computer.

Quantitative Considerations

For multisite quantitative MRI studies, protocol consistency across multiple imaging platforms is important to allow data pooling for statistical analysis. This requires a consistent DWI acquisition protocol (see important factors in Table 14.2) that is implementable across different MRI vendor platforms and software versions. The harmonized DWI protocol should specify values (or ranges) for basic/standard DWI acquisition parameters such as repetition time (TR), TE, b values, and acquisition time. The b values must be consistent across multiple imaging sites and multiple imaging time points for the same patients. Longitudinal DWI data should ideally be acquired on the same MRI scanner using the same breast coil. DWI data must have sufficient SNR to allow derivation of quantitatively accurate model metrics, with growing emphasis and effort in the field directed toward evaluation of the accuracy of breast ADC values. Accurate and reproducible ADC measurements require that TR and TE are within certain ranges to ensure no significant loss of signal (e.g., TR > 4000 ms; TE = 50–100 ms). Another acquisition parameter to stipulate is the range of acceleration factors (high enough to allow acquisition time reduction but low enough to minimize SNR loss). The critical metadata information relevant for downstream analysis and not routinely preserved in DICOM (e.g., actual b value, diffusion encoding direction, and sequence variant) should be recorded.
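A protocol compliance check along these lines can be sketched as follows. It uses the example limits quoted above (TR > 4000 ms; TE = 50–100 ms) as defaults and compares the acquired b values with the harmonized set. The dictionary keys and function name are hypothetical, and a real trial would take its limits from the study protocol.

```python
def check_protocol(protocol, reference_b_values,
                   tr_min_ms=4000.0, te_range_ms=(50.0, 100.0)):
    """Return a list of human-readable deviations from the harmonized protocol.

    protocol: dict with hypothetical keys 'tr_ms', 'te_ms', and 'b_values'.
    An empty list means no deviation was found.
    """
    issues = []
    if protocol["tr_ms"] <= tr_min_ms:
        issues.append(f"TR {protocol['tr_ms']} ms is not above {tr_min_ms} ms")
    te_low, te_high = te_range_ms
    if not te_low <= protocol["te_ms"] <= te_high:
        issues.append(
            f"TE {protocol['te_ms']} ms is outside {te_low}-{te_high} ms"
        )
    if sorted(protocol["b_values"]) != sorted(reference_b_values):
        issues.append(
            f"b values {sorted(protocol['b_values'])} differ from "
            f"{sorted(reference_b_values)}"
        )
    return issues

# Illustrative harmonized b values (s/mm^2) and two site protocols.
harmonized_b = [0, 100, 600, 800]
site_a = {"tr_ms": 5000.0, "te_ms": 70.0, "b_values": [0, 100, 600, 800]}
site_b = {"tr_ms": 3000.0, "te_ms": 110.0, "b_values": [0, 1000]}

issues_a = check_protocol(site_a, harmonized_b)
issues_b = check_protocol(site_b, harmonized_b)
```

Running such a check at the central data processing site before pooling makes protocol deviations visible early, when rescanning or exclusion decisions are still practical.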

Table 14.2
DWI Acquisition Standardization Factors

Acquisition Property | Examples | Effect
Fat suppression | Shim quality, technique, e.g., spectral vs. inversion recovery (SPAIR vs. STIR) | ADC bias, chemical shift artifact, lesion segmentation, geometrical accuracy, image quality, acquisition time
Acquisition timing | TR, TE, receiver bandwidth | SNR, T2-weighting, ADC accuracy, acquisition time
Pulse sequence class | Type, e.g., ss-EPI vs. ms-EPI vs. ssTSE, Cartesian vs. radial | SNR, T2-weighting, image distortion, motion artifact, resolution, acquisition time
Diffusion gradient waveform | DWI pulse width and separation; gradient amplitude (b value) and polarity (e.g., DSE vs. SSE vs. bipolar) | SNR, T2-weighting, ADC value, eddy-current distortion
b value-dependent averaging | Increase number of averages at high b values | Acquisition time, SNR, ADC accuracy
Effective echo spacing | Parallel imaging acceleration factor, multishot, receiver bandwidth | Degree of distortion, ghosting, SNR, acquisition time
Geometry | FOV, acquired voxel size, slice thickness, fold-over direction | Spatial resolution, SNR, quantitative accuracy, acquisition time, breast coverage
Multi-b value sampling | IVIM, ADC, DKI protocols | Quantitative parameter bias, acquisition time
Scan duration | Sequence class, geometry, fat suppression, b value sampling and averaging | Image quality, quantitative accuracy, and clinical feasibility

ADC, Apparent diffusion coefficient; DKI, diffusion kurtosis imaging; DSE, double spin-echo; DWI, diffusion-weighted imaging; FOV, field of view; IVIM, intravoxel incoherent motion; ms-EPI, multishot echo-planar imaging; ss-EPI, single-shot echo-planar imaging; SNR, signal-to-noise ratio; SPAIR, spectral attenuated inversion recovery; SSE, single spin-echo; ssTSE, single-shot turbo spin-echo; STIR, short tau inversion recovery; TE, echo time; TR, repetition time.
