Perelman School of Medicine at the University of Pennsylvania

Penn Medicine · Imaging AUC

How does the AUC program review evidence?

Penn Medicine adheres to evidence-based processes for developing, modifying, and/or endorsing imaging AUC.  In particular, the Imaging AUC program ensures that AUC (and all components of each set of criteria) are scientifically valid and evidence-based to the greatest extent possible.  All AUC incorporate multidisciplinary stakeholder input.  All key points in individual criteria are identified as evidence-based or consensus-based. 

We use the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system, a formal, peer-reviewed, and widely accepted methodology for assessing the quality of evidence and the strength of recommendations for diagnostic tests or strategies. This system is particularly useful when randomized controlled trial (RCT) data are sparse and diagnostic accuracy studies are more abundant, as is the case in diagnostic imaging. The underlying assumption is that obtaining a better idea of whether a target condition is present or absent will result in superior management of patients and improved outcomes. However, because studies measuring the impact of testing on patient-important outcomes are not always available, the GRADE system allows test accuracy metrics (e.g., sensitivity, specificity) to be used to make inferences about the likely impact on those outcomes. As a result, diagnostic accuracy serves as a surrogate outcome for benefits and harms to patients. The formal methodology is described here.
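
As an illustration of the test accuracy metrics mentioned above, the following minimal Python sketch computes sensitivity, specificity, and positive predictive value (PPV) from a 2x2 comparison of an imaging test against a reference standard. The class name, function names, and patient counts are hypothetical and chosen only for this example; they are not taken from the AUC program or from GRADE.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from comparing an index test against a reference standard."""
    true_pos: int   # test positive, condition present
    false_pos: int  # test positive, condition absent
    false_neg: int  # test negative, condition present
    true_neg: int   # test negative, condition absent


def sensitivity(c: ConfusionCounts) -> float:
    """Proportion of patients with the condition whom the test detects."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ConfusionCounts) -> float:
    """Proportion of patients without the condition whom the test clears."""
    return c.true_neg / (c.true_neg + c.false_pos)


def ppv(c: ConfusionCounts) -> float:
    """Proportion of positive test results that are true positives."""
    return c.true_pos / (c.true_pos + c.false_pos)


# Hypothetical counts, for illustration only.
counts = ConfusionCounts(true_pos=90, false_pos=20, false_neg=10, true_neg=180)

print(f"sensitivity = {sensitivity(counts):.2f}")  # 0.90
print(f"specificity = {specificity(counts):.2f}")  # 0.90
print(f"PPV         = {ppv(counts):.2f}")          # 0.82
```

Unlike sensitivity and specificity, PPV depends on how common the target condition is in the study population, so it may not transfer directly between patient groups.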

The final score assigned to each publication is based on whether the following quality criteria are present:

  • Statistical measures such as sensitivity, PPV, ROC analysis, etc. are present and facilitate comparisons across citations.
  • Measurements of uncertainty such as p-values, confidence intervals, etc. are present to provide a range for the statistical measurement.
  • Timing of the study: prospective studies, which are designed before data collection, tend to reduce bias.
  • Five additional criteria for diagnostic studies only – a comparison with a standard method was made, a reference standard was applied to all subjects in the same way, recruitment was performed systematically, two or more independent readers were employed, and test results were interpreted blind to the reference standard results.
  • Three additional criteria for therapeutic studies only – control and intervention groups are present, subjects were randomly allocated to these groups, and length of follow-up or drop-out factors are reported.

In our literature rating process, the best score for review articles is 4, because they do not typically perform statistical measurements, while observational or experimental papers can score from 1 to 3. For example, if all eight criteria are met for a diagnostic study, or five to six are met for a therapeutic study, that publication receives the best score of 1. Meta-analyses are not rated, since this method is designed to evaluate individual studies only.
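
The best-score rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program's actual rating tool: the criterion identifiers and the function name are hypothetical, and because the text does not state the cut-offs for scores 2 and 3, the sketch only checks whether a publication qualifies for the best score of 1.

```python
from typing import Iterable

# Hypothetical identifiers for the criteria listed above.
GENERAL_CRITERIA = {
    "statistical_measures",
    "uncertainty_measures",
    "prospective_timing",
}
DIAGNOSTIC_CRITERIA = {
    "compared_with_standard_method",
    "reference_standard_applied_uniformly",
    "systematic_recruitment",
    "independent_readers",
    "blinded_interpretation",
}
THERAPEUTIC_CRITERIA = {
    "control_and_intervention_groups",
    "random_allocation",
    "follow_up_or_dropout_reported",
}


def earns_best_score(study_type: str, criteria_met: Iterable[str]) -> bool:
    """Return True if the publication qualifies for the best score of 1.

    Diagnostic studies must meet all eight applicable criteria;
    therapeutic studies must meet at least five of their six.
    """
    met = set(criteria_met)
    if study_type == "diagnostic":
        applicable = GENERAL_CRITERIA | DIAGNOSTIC_CRITERIA
        return applicable <= met
    if study_type == "therapeutic":
        applicable = GENERAL_CRITERIA | THERAPEUTIC_CRITERIA
        return len(applicable & met) >= 5
    # Review articles (best score 4) and meta-analyses (not rated)
    # fall outside this rule, so they are rejected here.
    raise ValueError(f"best-score rule does not apply to {study_type!r}")


# A diagnostic study missing blinded interpretation meets only seven criteria.
seven_of_eight = (GENERAL_CRITERIA | DIAGNOSTIC_CRITERIA) - {"blinded_interpretation"}
print(earns_best_score("diagnostic", seven_of_eight))  # False
```

Keeping the general and study-specific criteria in separate sets mirrors the structure of the list above: three criteria apply to every study, plus five for diagnostic studies or three for therapeutic ones.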