Quantitative Image Analysis

Liu Q, Liu H, Mirian N, Ren S, Viswanath V, Karp JS, Surti S, Liu C. 

Phys Med Biol, vol. 67, pp: 145014, 2022.

Objective. Deep learning denoising networks are typically trained with images that are representative of the testing data. Due to the large variability of the noise levels in positron emission tomography (PET) images, it is challenging to develop a proper training set for general clinical use. Our work aims to develop a personalized denoising strategy for the low-count PET images at various noise levels.

Approach. We first investigated the impact of the noise level in the training images on the model performance. Five 3D U-Net models were trained on five groups of images at different noise levels, and a one-size-fits-all model was trained on images covering a wider range of noise levels. We then developed a personalized weighting method by linearly blending the results from two models trained on 20%-count level images and 60%-count level images to balance the trade-off between noise reduction and spatial blurring. By adjusting the weighting factor, denoising can be conducted in a personalized and task-dependent way.

Main results. The evaluation results of the six models showed that models trained on noisier images had better performance in denoising but introduced more spatial blurriness, and the one-size-fits-all model did not generalize well when deployed for testing images with a wide range of noise levels. The personalized denoising results showed that noisier images require higher weights on noise reduction to maximize the structural similarity and minimize the mean squared error. The model trained on 20%-count level images can produce the best liver lesion detectability.

Significance. Our study demonstrated that in deep learning-based low dose PET denoising, noise levels in the training input images have a substantial impact on the model performance. The proposed personalized denoising strategy utilized two training sets to overcome the drawbacks introduced by each individual network and provided a series of denoised results for clinical reading.
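The linear blending described in the Approach paragraph can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `blend_denoised`, the convention that the weight `w` is applied to the output of the 20%-count model, and the toy volumes are all hypothetical.

```python
import numpy as np

def blend_denoised(out_20, out_60, w):
    """Linearly blend two denoised volumes.

    out_20: output of the model trained on 20%-count images (stronger denoising)
    out_60: output of the model trained on 60%-count images (less blurring)
    w:      weight in [0, 1] placed on out_20 (assumed convention)
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    out_20 = np.asarray(out_20, dtype=float)
    out_60 = np.asarray(out_60, dtype=float)
    if out_20.shape != out_60.shape:
        raise ValueError("volumes must have the same shape")
    return w * out_20 + (1.0 - w) * out_60

# Toy 3D volumes standing in for the two network outputs (hypothetical data).
rng = np.random.default_rng(0)
vol_20 = rng.random((4, 4, 4))
vol_60 = rng.random((4, 4, 4))

# A series of blended results, from least to most weight on noise reduction.
weights = (0.0, 0.25, 0.5, 0.75, 1.0)
series = [blend_denoised(vol_20, vol_60, w) for w in weights]
```

Producing a series of blends rather than a single image mirrors the idea of offering several denoised results for reading; how the weight would be chosen for a given patient or task is not specified here.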

Viswanath V, Zhou R, Lee H, Li S, Cragin A, Doot RK, Mankoff DA, Pantel AR.

J Nucl Med, vol. 62, pp: 1154-1162, 2021.

The PET radiotracer 18F-(2S,4R)4-fluoroglutamine (18F-Gln) reflects glutamine transport and can be used to infer glutamine metabolism. Mouse xenograft studies have demonstrated that 18F-Gln uptake correlates directly with glutamine pool size and is inversely related to glutamine metabolism through the glutaminase enzyme. To provide a framework for the analysis of 18F-Gln-PET, we have examined 18F-Gln uptake kinetics in mouse models of breast cancer at baseline and after inhibition of glutaminase. We describe results of the preclinical analysis and computer simulations with the goal of model validation and performance assessment in anticipation of human breast cancer patient studies. 

Methods: Triple-negative breast cancer and receptor-positive xenografts were implanted in athymic mice. PET mouse imaging was performed at baseline and after treatment with a glutaminase inhibitor or a vehicle solution for 4 mouse groups. Dynamic PET images were obtained for 1 h beginning at the time of intravenous injection of 18F-Gln. Kinetic analysis and computer simulations were performed on representative time-activity curves, testing 1- and 2-compartment models to describe kinetics. 

Results: Dynamic imaging for 1 h captured blood and tumor time-activity curves indicative of largely reversible uptake of 18F-Gln in tumors. Consistent with this observation, a 2-compartment model indicated a relatively low estimate of the rate constant of tracer trapping, suggesting that the 1-compartment model is preferable. Logan plot graphical analysis demonstrated late linearity, supporting reversible kinetics and modeling with a single compartment. Analysis of the mouse data and simulations suggests that estimates of glutamine pool size, specifically the distribution volume (VD) for 18F-Gln, were more reliable using the 1-compartment reversible model than the 2-compartment irreversible model. Tumor-to-blood ratios, a more practical potential proxy of VD, correlated well with volume of distribution from single-compartment models and Logan analyses. 

Conclusion: Kinetic analysis of dynamic 18F-Gln-PET images demonstrated the ability to measure VD to estimate glutamine pool size, a key indicator of cellular glutamine metabolism, by both a 1-compartment model and Logan analysis. Changes in VD with glutaminase inhibition support the ability to assess response to glutamine metabolism-targeted therapy. Concordance of kinetic measures with tumor-to-blood ratios provides a clinically feasible approach to human imaging.
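The relationship between the reversible 1-compartment model and Logan graphical analysis mentioned in the Results can be illustrated numerically. The sketch below is not the study's analysis code: it assumes the 1-compartment model is dCt/dt = K1·Cp − k2·Ct with VD = K1/k2, and the input function, rate constants, and function names are hypothetical values chosen only for illustration.

```python
import numpy as np

def cumtrapz0(y, dt):
    """Cumulative trapezoidal integral of y, starting at zero."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * dt)))

def simulate_one_compartment(cp, dt, k1, k2):
    """Tissue curve for dCt/dt = K1*Cp - k2*Ct, using trapezoidal time stepping."""
    ct = np.zeros_like(cp)
    for n in range(len(cp) - 1):
        drive = 0.5 * k1 * dt * (cp[n] + cp[n + 1])
        ct[n + 1] = (ct[n] * (1.0 - 0.5 * k2 * dt) + drive) / (1.0 + 0.5 * k2 * dt)
    return ct

def logan_slope(t, cp, ct, t_start):
    """Slope of the Logan plot, fitted over samples with t >= t_start."""
    dt = t[1] - t[0]
    late = t >= t_start
    x = cumtrapz0(cp, dt)[late] / ct[late]   # integral of blood / tissue
    y = cumtrapz0(ct, dt)[late] / ct[late]   # integral of tissue / tissue
    slope, _intercept = np.polyfit(x, y, 1)
    return slope

# Hypothetical 60-minute study (time in minutes, arbitrary activity units).
t = np.linspace(0.0, 60.0, 6001)
cp = 10.0 * t * np.exp(-t / 1.5) + 0.5 * np.exp(-t / 60.0) * (1.0 - np.exp(-t))
K1, k2 = 0.3, 0.1                      # assumed rate constants (1/min)
ct = simulate_one_compartment(cp, t[1] - t[0], K1, k2)

vd_model = K1 / k2                     # distribution volume from the model
vd_logan = logan_slope(t, cp, ct, t_start=20.0)
```

For noise-free data generated by a single reversible compartment, the Logan slope recovers K1/k2, which is why late linearity of the plot is consistent with reversible kinetics. Real data add noise, frame averaging, and blood-volume effects that this sketch ignores.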

Schmall JP, Surti S, Otero HJ, Servaes S, Karp JS, States LJ. 

J Nucl Med, vol. 62, pp: 123-130, 2021.

 

In this study, we investigated the diagnostic performance of whole-body 18F-FDG imaging using a PET/MRI scanner with time-of-flight capability for low-dose clinical imaging of pediatric patients. In addition to clinically acquired image data using a dosing regimen of 3.7 MBq/kg, images from simulated low-dose regimens (1.9-0.41 MBq/kg) were evaluated using several metrics: SUV quantitation, qualitative image quality, and lesion detectability. 

Methods: Low-dose images were generated by truncating the list-mode PET data to reduce the count statistics. Changes in PET quantitation for low-dose images were assessed using volume-of-interest analysis of healthy tissue and suspected lesions. Three pediatric radiologists reviewed the image volumes without knowing the dose level. Qualitative image quality was assessed on the basis of Likert scoring. Radiologists were also asked to identify suspected lesions within the liver for PET-only and PET/MR images. Lesion detectability was measured using a receiver-operating-characteristic study and quantified using a free-response receiver-operating-characteristic (FROC) methodology to assess changes in performance for low-dose images. 

Results: Our analysis of volume-of-interest quantitation showed that SUVs remain stable down to ⅓ dose (1.2 MBq/kg). Likert scoring of PET/MR images showed no noticeable trend with dose level; however, scores of PET-only images were lower for low-dose scans, with a 12% reduction for ⅓-dose images compared with full-dose images. There was minimal change in total lesion count for different dose levels; however, all 3 readers had an increase in false-negatives for ⅓-dose images compared with full-dose images. Using the FROC methodology to quantify lesion-detection performance for human observers, no significant differences were observed for the 3 dosing levels when using the averaged reader data (all P values > 0.103). For all readers, the FROC performance was higher for PET/MRI than for PET alone. 

Conclusion: Reductions to the lowest recommended pediatric dosing regimens are possible when using PET/MRI. The data suggest that the administered dose can be decreased to 2.46 MBq/kg, a 33% reduction in PET activity, with no degradation in image quality, leading to a corresponding reduction in absorbed dose.
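The low-dose simulation described in the Methods, truncating list-mode data to reduce count statistics, can be sketched as follows. This is only one plausible reading of "truncating": events are kept from the first fraction of the acquisition, and a constant count rate is assumed (radioactive decay during the scan is ignored). The function name, the 180 s duration, and the synthetic event times are hypothetical.

```python
import numpy as np

def truncate_list_mode(event_times, dose_fraction, scan_duration):
    """Keep only events recorded in the first `dose_fraction` of the scan."""
    if not 0.0 < dose_fraction <= 1.0:
        raise ValueError("dose_fraction must lie in (0, 1]")
    event_times = np.asarray(event_times, dtype=float)
    return event_times[event_times < dose_fraction * scan_duration]

# Synthetic event timestamps at a constant rate (hypothetical data).
rng = np.random.default_rng(0)
scan_duration = 180.0                                   # seconds, assumed
full = np.sort(rng.uniform(0.0, scan_duration, size=100_000))

third = truncate_list_mode(full, 1.0 / 3.0, scan_duration)
count_ratio = third.size / full.size                    # close to 1/3

# Equivalent administered activity for the clinical 3.7 MBq/kg regimen.
full_dose_mbq_per_kg = 3.7
one_third_dose = full_dose_mbq_per_kg / 3.0             # about 1.2 MBq/kg
two_thirds_dose = full_dose_mbq_per_kg * 2.0 / 3.0      # about 2.5 MBq/kg
```

Under the constant-rate assumption, keeping one third of the acquisition time leaves about one third of the counts, which is what makes a truncated scan a stand-in for a proportionally lower injected activity.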

Surti S, Viswanath V, Daube-Witherspoon ME, Conti M, Casey ME, Karp JS.

J Nucl Med, vol. 61, pp: 1684-1690, 2020.

Latest digital whole-body PET scanners provide a combination of higher sensitivity and improved spatial and timing resolution. We performed a lesion detectability study on two generations of Siemens Biograph PET/CT scanners, the mCT and Vision, to study the impact of improved physical performance on clinical performance. Our hypothesis is that the improved performance of the Vision will result in improved lesion detectability, allowing shorter imaging times or equivalently, lower injected dose. 

Methods: Data were acquired with the Society of Nuclear Medicine and Molecular Imaging Clinical Trials Network torso phantom combined with a 20-cm diameter cylindrical phantom. Spherical lesions were emulated by acquiring spheres-in-air data, and combining it with the phantom data to generate combined datasets with embedded lesions of known contrast. Two sphere sizes and uptakes were used: 9.89 mm diameter spheres with 6:1 (lung) and 3:1 (cylinder) and 4.95 mm diameter spheres with 9.6:1 (lung) and 4.5:1 (cylinder) local activity concentration uptakes. Standard image reconstruction was performed: ordinary Poisson ordered subsets expectation maximization algorithm with point spread function and time-of-flight modeling and post-reconstruction smoothing with a 5 mm Gaussian filter. The Vision images were also generated without any post-reconstruction smoothing. Generalized scan statistics methodology was used to estimate the area under the localization receiver operating characteristic curve (ALROC). 

Results: Higher sensitivity and improved TOF performance of the Vision lead to reduced contrast in the background noise nodule distribution. Measured lesion contrast is also higher on the Vision due to its improved spatial resolution. Hence, the ALROC values are noticeably higher for the Vision relative to the mCT. 

Conclusion: Improved overall performance of the Vision provides a factor of 4-6 reduction in imaging time (or injected dose) over the mCT when using the ALROC metric for lesions >9.89 mm in diameter. Smaller lesions are barely detected in the mCT, leading to even higher ALROC gains with the Vision. Improved spatial resolution of the Vision also leads to a higher measured contrast that is closer to the real uptake, implying improved quantification. Post-reconstruction smoothing, however, reduces this improvement in measured contrast, thereby reducing the ALROC values for small, high uptake lesions.

Guerraty MA, Johnson LC, Blankemeyer E, Rader DJ, Moore SC, Metzler SD

J Nucl Cardiol, vol. 28, pp: 2647-2656, 2021.

Background: Despite growing interest in coronary microvascular disease (CMVD), there is a dearth of mechanistic understanding. Mouse models offer opportunities to understand molecular processes in CMVD. We have sought to develop quantitative mouse imaging to assess coronary microvascular function.

Methods: We used 99mTc-sestamibi to measure myocardial blood flow in mice with MILabs U-SPECT+ system. We determined recovery and crosstalk coefficients, the influx rate constant from blood to myocardium (K1), and, using microsphere perfusion, constraints on the extraction fraction curve. We used 99mTc and stannous pyrophosphate for red blood cell imaging to measure intramyocardial blood volume (IMBV) as an alternate measure of microvascular function.

Results: The recovery coefficients for myocardial tissue (RT) and left ventricular arterial blood (RA) were 0.81 ± 0.16 and 1.07 ± 0.12, respectively. The assumption RT = 1 − FBV (fraction blood volume) does not hold in mice. Using a complete mixing matrix to fit a one-compartment model, we measured K1 of 0.57 ± 0.08 min−1. Constraints on the extraction fraction curve for 99mTc-sestamibi in mice for best-fit Renkin–Crone parameters were α = 0.99 and β = 0.39. Additionally, we found that wild-type mice increase their IMBV by 22.9 ± 3.3% under hyperemic conditions.

Conclusions: We have developed a framework for measuring K1 and change in IMBV in mice, demonstrating non-invasive µSPECT-based quantitative imaging of mouse microvascular function.

Johnson LC, Guerraty MA, Moore SC, Metzler SD

Phys Med Biol, vol. 64, pp. 065018, 2019.

Myocardial blood flow and myocardial blood flow reserve (MBFR) measurements are often used clinically to quantify coronary microvascular function. Developing imaging-based methods to measure MBFR for research in mice would be advantageous for evaluating new treatment methods for coronary microvascular disease (CMVD), yet this is more challenging in mice than in humans. This work investigates microSPECT's quantitative capabilities of cardiac imaging by utilizing a multi-part cardiac phantom and applying a known kinetic model to synthesize kinetic data from static data, allowing for assessment of kinetic modeling accuracy. The phantom was designed with four main components: two left-ventricular (LV) myocardial sections and two LV blood-pool sections, sized for end-systole (ES) and end-diastole (ED). Each section of the phantom was imaged separately while acquiring list-mode data. These static, separate-compartment data were manipulated into synthetic dynamic data using a kinetic model representing the myocardium and blood-pool activity concentrations over time and then combined into a set of dynamic image frames and reconstructed. Regions of interest were drawn on the resulting images, and kinetic parameters were estimated. This process was performed for three tracer uptake values (K1), three myocardial wall thicknesses, ten filter parameters, and 20 iterations for 25 noise ensembles. The degree of filtering and iteration number were optimized to minimize the root mean-squared error (RMSE) of K1 values, with the largest number of iterations and minimal filtering yielding the lowest error. Using the optimized parameters, K1 was determined with reasonable error (~3% RMSE) over all wall thicknesses and K1 input values. This work demonstrates that accurate and precise measurements of K1 are possible for the U-SPECT+ system used in this study, for several different uptake rates and LV dimensions. Additionally, it allows for future investigation utilizing other imaging systems, including PET studies with any radiotracer, as well as with additional phantom parts containing lesions.

O’Sullivan F, O’Sullivan JN, Huang J, Doot RK, Muzi M, Schubert EK, Peterson L, Dunnwald LK, Mankoff DA.

J Med Imaging (Bellingham), vol. 5, pp: 011010, 2018.

Blood flow-metabolism mismatch from dynamic positron emission tomography (PET) studies with 15O-labeled water (H2O) and 18F-labeled fluorodeoxyglucose (FDG) has been shown to be a promising diagnostic for locally advanced breast cancer (LABCa) patients. The mismatch measurement involves kinetic analysis with the arterial blood time course (AIF) as an input function. We evaluate the use of a statistical method for AIF extraction (SAIF) in these studies. Fifty-three LABCa patients had dynamic PET studies with H2O and FDG. For each PET study, two AIFs were recovered, an SAIF extraction and also a manual extraction based on a region of interest placed over the left ventricle (LV-ROI). Blood flow-metabolism mismatch was obtained with each AIF, and kinetic and prognostic reliability comparisons were made. Strong correlations were found between kinetic assessments produced by both AIFs. SAIF AIFs retained the full prognostic value, for pathologic response and overall survival, of LV-ROI AIFs.

Panetta JV, Daube-Witherspoon ME, Karp JS.

Med Phys, vol. 44, pp: 3534-3544, 2017.

Purpose: To improve the precision of multicenter clinical trials, several efforts are underway to determine scanner‐specific parameters for harmonization using standardized phantom measurements. The goal of this study was to test the correspondence between quantification in phantom and patient images and validate the use of phantoms for harmonization of patient images.

Methods: The National Electrical Manufacturers' Association image quality phantom with hot spheres was scanned on two time‐of‐flight PET scanners. Whole‐body [18F]‐fluorodeoxyglucose (FDG)‐PET scans were acquired of subjects on the same systems. List‐mode events from spheres (diam.: 10–28 mm) measured in air on each scanner were embedded into the phantom and subject list‐mode data from each scanner to create lesions with known uptake with respect to the local background in the phantom and each subject's liver and lung regions, as a proxy to characterize true lesion quantification. Images were analyzed using the contrast recovery coefficient (CRC) typically used in phantom studies and serving as a surrogate for the standardized uptake value used clinically. Postreconstruction filtering (resolution recovery and Gaussian smoothing) was applied to determine if the effect on the phantom images translates equivalently to subject images. Three postfiltering strategies were selected to harmonize the CRCmean or CRCmax values between the two scanners based on the phantom measurements and then applied to the subject images.

Results: Both the average CRCmean and CRCmax values for lesions embedded in the lung and liver in four subjects (BMI range 25–38) agreed to within 5% with the CRC values for lesions embedded in the phantom for all lesion sizes. In addition, the relative changes in CRCmean and CRCmax resulting from the application of the postfilters on the subject and phantom images were consistent within measurement uncertainty. Further, the root mean squared percent difference (RMSpd) between CRC values on the two scanners calculated over the three sphere sizes was significantly reduced in the subjects using postfiltering strategies chosen to harmonize CRCmean or CRCmax based on phantom measurements: RMSpd of the CRCmean values in subjects was reduced from 36% to < 8% after harmonizing CRCmean, while RMSpd for CRCmax was reduced from ~33% to < 6% after harmonizing CRCmax with a different strategy. However, with this strategy designed to harmonize CRCmax, the RMSpd for CRCmean only improved to ~14% in subjects.

Conclusions: The consistency of the CRC measurements between the phantom and subject data demonstrates that harmonization strategies defined with phantom studies track well to patient images. However, quantitative agreement between different scanners as represented by the RMSpd depends on the metric chosen for harmonization.

Scheuermann JS, Reddin JS, Opanowski A, et al.

J Nucl Med, vol. 58, pp: 1065-1071, 2017.

Objectives Quantitative PET/CT imaging can provide early assessment of tumor response at a molecular level, thereby enabling more objective, efficient, and accurate trials of new therapeutic agents. The primary objective of the Centers of Quantitative Imaging Excellence (CQIE) project, run by the American College of Radiology Imaging Network (ACRIN), is to establish sites within the NCI Cancer Center Program that are capable of conducting clinical trials in which there are integral molecular and/or functional advanced imaging endpoints.

Methods An imaging test suite was developed based on existing ACRIN and ACR protocols. These included SUV measurements, axial uniformity, and contrast ratio tests using uniform cylinders and the ACR phantom. Dynamic imaging tests were also included.

Results A total of 59 NCI Cancer Centers were contacted, and 56 agreed to participate. There were 64 PET/CT systems tested (GE: 6 models and 36 scanners, Siemens: 7 models and 21 scanners, Philips: 3 models and 7 scanners). All systems achieved CQIE certification: 25 systems passed on first attempt, 30 required two attempts, 9 required 3 or more attempts. Reasons for failure were: Quantitative errors (21), incomplete data or forms (15), and incorrect protocols (13). After CQIE certification, SUVs for a uniform cylinder were 1.00±0.03 for body imaging protocols and 0.99±0.04 for brain imaging protocols. Axial variation in uniform cylinders was 5.3% and 3.7% for body and brain imaging protocols.

Conclusions Over half (39/64) of the PET/CT scanners required testing more than once to achieve CQIE certification, with quantitative errors being the most common failure mode. In some cases scanner re-calibration corrected the quantitative errors. After CQIE qualification, SUV and axial uniformity values were within acceptable levels for clinical trials using quantitative PET imaging.

Prior studies have shown that breast cancer patients experienced higher mortality and recurrence risks when their tumors failed to show a decline in blood flow (BF) after neoadjuvant therapy, as measured directly from 15O-water PET scans (1) or indirectly from dynamic 18F-FDG PET scans using changes in 18F-FDG transport (K1), which can be estimated via kinetic image analyses (1,2). The gold standard BF PET tracer, 15O-water (3), is only available at PET imaging centers that have a cyclotron on-site because of the short 2-min half-life of 15O. 18F-FDG is a more widely available radiotracer, with a half-life that allows regional supply to clinical centers. However, the 60-min dynamic 18F-FDG PET imaging protocol used by Dunnwald et al. that enabled estimates of 18F-FDG transport (K1) and metabolic flux (Ki) to predict disease-free survival and overall survival (2) is impractical in a busy clinical setting (4).

Daube-Witherspoon ME, Surti S, Perkins AE, Karp JS.

J Nucl Med, vol. 55, pp. 602-607, 2014.

Inclusion of time-of-flight (TOF) information in PET reconstructions has been demonstrated to improve image quality through better signal-to-noise ratios, faster convergence, better lesion detectability, and better image uniformity. The goal of this work was to assess the impact of TOF information on the accuracy and precision of quantitative measurements of activity uptake in small lesions in clinical studies. 

Methods: Data from small (10-mm diameter) spheres were merged with list-mode data from 6 healthy volunteers after injection of 18F-FDG. Six spheres having known activity uptake with respect to the average whole-body uptake were embedded in both the liver and the lung of the subject’s data. Images were reconstructed with TOF information and without TOF information (non-TOF reconstruction). The measured uptake was compared with the known activity; variability was measured across 60 bootstrapped replicates of the merged data, across the 6 spheres within a given organ, and across all spheres in all subjects. 

Results: The average uptake across all spheres and subjects was approximately 50% higher in the lung and 20% higher in the liver with TOF reconstruction than with non-TOF reconstruction at comparable noise levels. The variabilities across replicates, across spheres within an organ, and across all spheres and subjects were 20%–30% lower with TOF reconstruction than with non-TOF reconstruction in the lung; in the liver, the variabilities were 10%–20% lower with TOF reconstruction than with non-TOF reconstruction. 

Conclusion: TOF reconstruction leads to more accurate and precise measurements, both within a subject and across subjects, of the activity in small lesions under clinical conditions.

Doot RK, McDonald ES, Mankoff DA.

Clin Transl Imaging, vol. 2, pp. 295-303, 2014.

Positron emission tomography (PET) measures of cancer metabolism and cellular proliferation are increasingly being studied as markers of cancer response to treatment, with the goal of using them as predictors of patient therapeutic outcomes—i.e., as surrogate outcome measures. The primary PET radiotracers so far used for monitoring response of cancer to treatment are 18F-fluorodeoxyglucose (FDG) for studying abnormal energy metabolism and 18F-fluorothymidine (FLT) for examining cell proliferation. Both FDG and FLT PET quantitation of cancer response to treatment have been found to correlate with patient outcomes, mostly in single-center studies. The aim of this review is to summarize the impact of commonly selected PET quantitation methods on the ability of PET measures to quantitate cancer response to treatment. An understanding of the biochemistry and kinetics of FDG and FLT uptake and knowledge of the expected tracer uptake by cancerous processes relative to background uptake are required to select appropriate PET quantitation methods for trial testing for correlations between PET measures and patient outcome. PET measures may eventually serve as predictive biomarkers capable of guiding individualized treatment and improving patient outcomes and quality of life by early identification of ineffective therapies. PET can also potentially identify patients who would be good candidates for molecularly targeted drugs and monitor response to these personalized therapies.

Doot RK, Pierce LA, Byrd D, Elston B, Allberg KC, Kinahan PE.

Trans Oncol, vol. 7, pp. 48-54, 2014.

This study investigates measurement biases in longitudinal positron-emission tomography/computed tomography (PET/CT) studies that are due to instrumentation variability including human error. Improved estimation of variability between patient scans is of particular importance for assessing response to therapy and multicenter trials. We used National Institute of Standards and Technology-traceable calibration methodology for solid germanium-68/gallium-68 (68Ge/68Ga) sources used as surrogates for fluorine-18 (18F) in radionuclide activity calibrators. One cross-calibration kit was constructed for both dose calibrators and PET scanners using the same 9-month half-life batch of 68Ge/68Ga in epoxy. Repeat measurements occurred in a local network of PET imaging sites to assess standardized uptake value (SUV) errors over time for six dose calibrators from two major manufacturers and for six PET/CT scanners from three major manufacturers. Bias in activity measures by dose calibrators ranged from −50% to 9% and was relatively stable over time except at one site that modified settings between measurements. Bias in activity concentration measures by PET scanners ranged from −27% to 13% with a median of 174 days between the six repeat scans (range, 29 to 226 days). Corresponding errors in SUV measurements ranged from −20% to 47%. SUV biases were not stable over time with longitudinal differences for individual scanners ranging from −11% to 59%. Bias in SUV measurements varied over time and between scanner sites. These results suggest that attention should be paid to PET scanner calibration for longitudinal studies and use of dose calibrator and scanner cross-calibration kits could be helpful for quality assurance and control.

We investigate an approach to evaluation of emission-tomography (ET) imaging systems used for region-of-interest (ROI) estimation tasks. In the evaluation we employ the concept of “emission counts” (EC), which are the number of events per voxel emitted during a scan. We use the reduction in posterior variance of ROI EC, compared to the prior ROI EC variance, as the metric of primary interest, which we call the “posterior variance reduction index” (PVRI). Systems that achieve a higher PVRI are considered superior to systems with lower PVRI. The approach is independent of the reconstruction method and is applicable to all photon-limited data types including list-mode data. We analyzed this approach using a model of 2-D tomography, and compared our results to the classical theory of tomographic sampling. We found that performance evaluations using the PVRI index were consistent with the classical theory. System evaluation based on EC posterior variance is an intuitively appealing and physically meaningful method that is useful for evaluation of system performance in ROI quantitation tasks.

Moore SC, Southekal S, Park M-A, McQuaid SJ, Kijewski MF, Müller SP.

IEEE Trans Med Imaging, vol. 31, pp. 405-416, 2012.

We have developed a new method of compensating for effects of partial volume and spillover in dual-modality imaging. The approach requires segmentation of just a few tissue types within a small volume-of-interest (VOI) surrounding a lesion; the algorithm estimates simultaneously, from projection data, the activity concentration within each segmented tissue inside the VOI. Measured emission projections were fitted to the sum of resolution-blurred projections of each such tissue, scaled by its unknown activity concentration, plus a global background contribution obtained by reprojection through the reconstructed image volume outside the VOI. The method was evaluated using multiple-pinhole μSPECT data simulated for the MOBY mouse phantom containing two spherical lung tumors and one liver tumor, as well as using multiple-bead phantom data acquired on μSPECT and μCT scanners. Each VOI in the simulation study was 4.8 mm (12 voxels) cubed and, depending on location, contained up to four tissues (tumor, liver, heart, lung) with different values of relative 99mTc concentration. All tumor activity estimates achieved < 3% bias after ~15 ordered-subsets expectation maximization (OSEM) iterations (×10 subsets), with better than 8% precision (≤ 25% greater than the Cramer-Rao lower bound). The projection-based fitting approach also outperformed three standardized uptake value (SUV)-like metrics, one of which was corrected for count spillover. In the bead phantom experiment, the mean ± standard deviation of the bias of VOI estimates of bead concentration were 0.9±9.5%, comparable to those of a perturbation geometric transfer matrix (pGTM) approach (-5.4±8.6%); however, VOI estimates were more stable with increasing iteration number than pGTM estimates, even in the presence of substantial axial misalignment between μCT and μSPECT image volumes.

Doot RK, Scheuermann JS, Christian PE, Karp JS, and Kinahan PE

Med Phys, vol. 37, pp: 6035-46, 2010.

Purpose: The variances and biases inherent in quantifying PET tracer uptake from instrumentation factors are needed to ascertain the significance of any measured differences such as in quantifying response to therapy. The authors studied the repeatability and reproducibility of serial PET measures of activity as a function of object size, acquisition, reconstruction, and analysis method on one scanner and at three PET centers using a single protocol with long half‐life phantoms.

Methods: The authors assessed standard deviations (SDs) and mean biases of consecutive measures of PET activity concentrations in a uniform phantom and a NEMA NU‐2 image quality (IQ) phantom filled with 9‐month half‐life 68Ge in an epoxy matrix. Activity measurements were normalized by dividing by a common decay corrected true value and reported as recovery coefficients (RCs). Each experimental set consisted of 20 consecutive PET scans of either a stationary phantom to evaluate repeatability or a repositioned phantom to assess reproducibility. One site conducted a comprehensive series of repeatability and reproducibility experiments, while two other sites repeated the reproducibility experiments using the same IQ phantom. An equation was derived to estimate the SD of a new PET measure from a known SD based on the ratios of available coincident counts between the two PET measures.

Results: For stationary uniform phantom scans, the SDs of maximum RCs were three to five times less than predicted for uncorrelated pixels within circular regions of interest (ROIs) with diameters ranging from 1 to 15 cm. For stationary IQ phantom scans from 1 cm diameter ROIs, the average SDs of mean and maximum RCs ranged from 1.4% to 8.0%, depending on the methods of acquisition and reconstruction (coefficients of variation range 2.5% to 9.8%). Similar SDs were observed for both analytic and iterative reconstruction methods. SDs of RCs for 2D acquisitions were significantly higher than for 3D acquisitions for the same acquisition and processing parameters. SDs of maximum RCs were larger than corresponding mean values for stationary IQ phantom scans, although the magnitude of difference is reduced due to noise correlations in the image. Increased smoothing decreased SDs and decreased maximum and mean RCs. Reproducibility of GE DSTE, Philips Gemini TF, and Siemens Biograph Hi‐REZ PET/CT scans of the same IQ phantom, with similar acquisition, reconstruction, and repositioning among 20 scans, were, in general, similar (mean and maximum RC SD range 2.5% to 4.8%).

Conclusions: Short‐term scanner variability is low compared to other sources of error. There are tradeoffs in noise and bias depending on acquisition, processing, and analysis methods. The SD of a new PET measure can be estimated from a known SD if the ratios of available coincident counts between the two PET scanner acquisitions are known and both employ the same ROI definition. Results suggest it is feasible to use PET/CTs from different vendors and sites in clinical trials if they are properly cross‐calibrated.
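The recovery-coefficient normalization described in the Methods (measured activity divided by a decay-corrected true value) can be sketched as below. This is an illustrative sketch, not the authors' code: it assumes the long half-life source is 68Ge with a nominal half-life of about 271 days (consistent with the 9-month figure quoted), and the reference activity, elapsed time, and measurement values are hypothetical.

```python
import math
import statistics

GE68_HALF_LIFE_DAYS = 270.95  # nominal 68Ge half-life (assumed value)

def decay_corrected_activity(a0, elapsed_days, half_life_days=GE68_HALF_LIFE_DAYS):
    """True activity concentration after `elapsed_days` of decay from a0."""
    return a0 * math.exp(-math.log(2.0) * elapsed_days / half_life_days)

def recovery_coefficient(measured, a0, elapsed_days):
    """Measured concentration divided by the decay-corrected true value."""
    return measured / decay_corrected_activity(a0, elapsed_days)

# Hypothetical reference concentration (kBq/mL) and days since calibration.
a0 = 10.0
elapsed_days = 100.0

# Hypothetical consecutive ROI measurements of the same phantom (kBq/mL).
measured = [7.70, 7.81, 7.66, 7.74, 7.79, 7.72, 7.68, 7.76]

rcs = [recovery_coefficient(m, a0, elapsed_days) for m in measured]
mean_rc = statistics.mean(rcs)     # mean bias relative to 1.0
sd_rc = statistics.stdev(rcs)      # repeatability of the measure
```

Normalizing to a common decay-corrected value is what allows scans acquired on different days, or at different sites with the same phantom, to be compared on one scale.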

Scheuermann JS, Saffer JR, Karp JS, Levering AM, Siegel BA.

J Nucl Med, vol. 50, pp. 1187-1193, 2009.

The PET Core Laboratory of the American College of Radiology Imaging Network (ACRIN) qualifies sites to participate in multicenter research trials by quantitatively reviewing submitted PET scans of uniform cylinders to verify the accuracy of scanner standardized uptake value (SUV) calibration and qualitatively reviewing clinical PET images from each site. To date, cylinder and patient data from 169 PET scanners have been reviewed, and 146 have been qualified. 

Methods: Each site is required to submit data from 1 uniform cylinder and 2 patient test cases. Submitted phantom data are analyzed by drawing a circular region of interest that encompasses approximately 90% of the diameter of the interior of the phantom and then recording the mean SUV and SD of each transverse slice. In addition, average SUVs are measured in the liver of submitted patient scans. These data illustrate variations of SUVs across PET scanners and across institutions, and comparison of results with values submitted by the site indicates the level of experience of PET camera operators in calculating SUVs.

Results: Of 101 scanner applications for which detailed records of the qualification process were available, 12 (12%) failed because of incorrect SUV or normalization calibrations. For sites to pass, the average cylinder SUV is required to be 1.0 ± 0.1. The average SUVs for uniform cylinder images for the most common scanners evaluated—Siemens Biograph PET/CT (n = 43), GE Discovery LS PET/CT (n = 15), GE Discovery ST PET/CT (n = 34), Philips Allegro PET (n = 5), and Philips Gemini PET/CT (n = 11)—were 0.99, 1.01, 1.00, 0.98, and 0.95, respectively, and the average liver SUVs for submitted test cases were 2.34, 2.13, 2.27, 1.73, and 1.92, respectively.
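
The cylinder check described above can be sketched as follows. This is an illustration under stated assumptions, not the ACRIN laboratory's software: the volume is taken to be a nested list of SUV values indexed as [slice][row][column], the phantom center and interior diameter in pixels are assumed to be known, and all function names are hypothetical. Only the 90% ROI diameter, the per-slice mean and SD, and the 1.0 ± 0.1 criterion come from the text.

```python
from statistics import mean, stdev


def circular_roi_values(slice_2d, center_row, center_col, diameter_px, fraction=0.9):
    """Collect pixel values inside a circular ROI spanning `fraction` of the diameter."""
    radius = 0.5 * fraction * diameter_px
    values = []
    for r, row in enumerate(slice_2d):
        for c, value in enumerate(row):
            if (r - center_row) ** 2 + (c - center_col) ** 2 <= radius ** 2:
                values.append(value)
    return values


def slice_stats(volume, center_row, center_col, diameter_px):
    """Return a (mean SUV, SD) pair for each transverse slice of the volume."""
    stats = []
    for slice_2d in volume:
        values = circular_roi_values(slice_2d, center_row, center_col, diameter_px)
        stats.append((mean(values), stdev(values)))
    return stats


def passes_calibration(stats, target=1.0, tolerance=0.1):
    """Check the average cylinder SUV against the 1.0 +/- 0.1 criterion."""
    average_suv = mean(slice_mean for slice_mean, _ in stats)
    return abs(average_suv - target) <= tolerance
```

For example, a synthetic 3-slice, 21 × 21 volume filled with an SUV of 1.02 would pass, while per-slice means averaging above 1.1 would not.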

Conclusion: Minimizing errors in SUV measurement is critical to achieving accurate quantification in clinical trials. The experience of the ACRIN PET Core Laboratory shows that many sites are unable to maintain accurate SUV calibrations without additional training or supervision. This raises concerns about using SUVs to quantify patient data without verification.

For volume-imaging PET scanners, no septa are used, so that sensitivity is maximized by collecting events oblique to the scanner axis. The authors address two questions: (i) how does the performance of an image reconstruction algorithm for a volume-imaging PET scanner depend on the scanner's general dimensions? and (ii) at what point is a three-dimensional (3D) reconstruction algorithm needed for a volume-imaging scanner as the axial extent is increased? A 3D reconstruction algorithm will accurately incorporate the oblique events in a reconstruction of the original source distribution. From simulations of an existing volume PET scanner with a maximum axial acceptance angle (±α) of α = 9°, however, the authors show that the single-slice rebinning algorithm is a good compromise between sensitivity, speed, and accuracy when compared to standard two-dimensional reconstruction (α = 1°) and a 3D reconstruction with α = 9°. The authors also show with simulations that a new scanner with α = 27° requires 3D reconstruction in order to achieve maximum sensitivity without unacceptable losses in accuracy. Measurements of scanner performance are based on a series of figures of merit that characterize image quality and quantitative accuracy measured from a set of simulated test phantoms.
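
The abstract does not spell out single-slice rebinning. In its usual form, each oblique sinogram measured between two detector rings is added to the transverse slice lying axially midway between them, after which the stack can be reconstructed slice by slice in 2D. The sketch below illustrates that idea only; the data layout (a dictionary keyed by ring pair), the function name `single_slice_rebin`, and the use of a maximum ring difference as a stand-in for the acceptance angle are assumptions, and the normalization for the number of contributing sinograms per slice is left out.

```python
def single_slice_rebin(oblique_sinograms, max_ring_difference):
    """Sum oblique sinograms into the slice midway between their two rings.

    oblique_sinograms: dict mapping (ring_a, ring_b) to a 2D list of counts
        indexed as [angle][radial bin] (hypothetical layout).
    max_ring_difference: ring pairs further apart than this are discarded,
        standing in for the axial acceptance angle.
    Returns a dict mapping slice index (ring_a + ring_b, i.e. half-ring
    steps) to the summed sinogram.
    """
    rebinned = {}
    for (ring_a, ring_b), sinogram in oblique_sinograms.items():
        if abs(ring_a - ring_b) > max_ring_difference:
            continue  # outside the accepted axial angle
        slice_index = ring_a + ring_b  # axial midpoint in half-ring units
        if slice_index not in rebinned:
            # Copy so the caller's data are not modified.
            rebinned[slice_index] = [list(row) for row in sinogram]
        else:
            for out_row, in_row in zip(rebinned[slice_index], sinogram):
                for i, counts in enumerate(in_row):
                    out_row[i] += counts
    return rebinned
```

Because the midpoint assignment ignores the obliquity of each line of response, the approximation degrades as the acceptance angle grows, which is consistent with the abstract's finding that a scanner with α = 27° requires full 3D reconstruction.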

Matej S, Herman GT, Narayan TK, Furuie SS, Lewitt RM, Kinahan PE.

Phys Med Biol, vol. 39, pp. 355-367, 1994.

The relative performance of 5 fully 3D PET reconstruction algorithms is evaluated. The algorithms are a filtered backprojection (FBP) method and 2 variants each of the EM-ML and ART iterative methods. For each of the iterative methods, 1 variant makes use of voxels and the other makes use of 'blobs' (spherically symmetric functions smoothly decaying to zero at their boundaries) as basis functions in its discrete reconstruction model. The methods are evaluated from the point of view of the efficacy of the reconstructions produced by them for 3 typical medical tasks: estimation of the average activity inside specific regions of interest, hot spot detection, and cold spot detection. A free parameter is allowed in the description of each of the 5 algorithms; the parameters are determined by a training process during which a value of the free parameter is selected which (nearly) maximizes a technical figure of merit. Such training and the actual comparative evaluation are done by making use of randomly generated phantoms and their projection data. The methodology allows assignment of levels of statistical significance to claims of the relative superiority of 1 algorithm over another for a particular task. It is found that using blobs as basis functions in the iterative algorithms is definitely advantageous over using voxels. This result has high statistical significance. (A visual illustration of it is given.) Comparing FBP, EM-ML using blobs, and ART using blobs, the authors do not find a clear difference in the overall performance of the investigated variants of the methods. If anything, their results suggest that ART using blobs may be the most efficacious of the 3.

Chitalia R, Viswanath V, Pantel AR, Peterson LM, Gastounioti A, Cohen EA, Muzi M, Karp J, Mankoff DA, Kontos D.

Eur J Nucl Med Mol Imaging, vol. 48, pp. 3990-4001, 2021.

Purpose: Probe-based dynamic (4-D) imaging modalities capture breast intratumor heterogeneity both spatially and kinetically. Characterizing heterogeneity through tumor sub-populations with distinct functional behavior may elucidate tumor biology to improve targeted therapy specificity and enable precision clinical decision making.

Methods: We propose an unsupervised clustering algorithm for 4-D imaging that integrates Markov-Random Field (MRF) image segmentation with time-series analysis to characterize kinetic intratumor heterogeneity. We applied this to dynamic FDG PET scans by identifying distinct time-activity curve (TAC) profiles with spatial proximity constraints. We first evaluated algorithm performance using simulated dynamic data. We then applied our algorithm to a dataset of 50 women with locally advanced breast cancer imaged by dynamic FDG PET prior to treatment and followed to monitor for disease recurrence. A functional tumor heterogeneity (FTH) signature was then extracted from functionally distinct sub-regions within each tumor. Cross-validated time-to-event analysis was performed to assess the prognostic value of FTH signatures compared to established histopathological and kinetic prognostic markers.

Results: Adding FTH signatures to a baseline model of known predictors of disease recurrence and established FDG PET uptake and kinetic markers improved the concordance statistic (C-statistic) from 0.59 to 0.74 (p = 0.005). Unsupervised hierarchical clustering of the FTH signatures identified two significant (p < 0.001) phenotypes of tumor heterogeneity corresponding to high and low FTH. Distributions of FDG flux, or Ki, were significantly different (p = 0.04) across the two phenotypes.

Conclusions: Our findings suggest that imaging markers of FTH add independent value beyond standard PET imaging metrics in predicting recurrence-free survival in breast cancer and thus merit further study.