Multimodal Brain Tumor Segmentation Challenge 2020: Tasks




Tasks' Description and Evaluation Framework

In this year's challenge, 3 reference standards are used for the 3 tasks: 1) manual segmentation labels of tumor sub-regions, 2) clinical data on overall survival, and 3) uncertainty estimation for the predicted tumor sub-regions.

Participants are highly encouraged, but not required, to participate in more than one task.

In the interest of transparency and reproducibility, this year we will make publicly available (by the end of the challenge) the scripts used in CBICA's IPP for the evaluation of the participating methods. Furthermore, we intend to make available the actual software tools we have been using to pre-process the BraTS data.

Task 1 (Segmentation): Description

The participants are called to address this task by using the provided clinically-acquired training data to develop their method and produce segmentation labels of the different glioma sub-regions. The sub-regions considered for evaluation are: 1) the "enhancing tumor" (ET), 2) the "tumor core" (TC), and 3) the "whole tumor" (WT) [see figure below]. The ET is described by areas that show hyper-intensity in T1Gd when compared to T1, but also when compared to “healthy” white matter in T1Gd. The TC describes the bulk of the tumor, which is what is typically resected. The TC entails the ET, as well as the necrotic (fluid-filled) and the non-enhancing (solid) parts of the tumor. The appearance of the necrotic (NCR) and the non-enhancing (NET) tumor core is typically hypo-intense in T1Gd when compared to T1. The WT describes the complete extent of the disease, as it entails the TC and the peritumoral edema (ED), which is typically depicted by hyper-intense signal in FLAIR.

The provided segmentation labels have values of 1 for NCR & NET, 2 for ED, 4 for ET, and 0 for everything else.
The participants are called to upload their segmentation labels as a single multi-label file in NIfTI (.nii.gz) format to CBICA's Image Processing Portal for evaluation.
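
For illustration, a minimal sketch of writing such a multi-label file with nibabel is shown below. The file names, the reference scan, and the placeholder prediction array are assumptions for the example, not part of the challenge specification.

```python
# Minimal sketch: writing the Task 1 submission file for one subject.
# `pred_labels` stands in for the method's output and must contain only the
# values 0, 1 (NCR & NET), 2 (ED), and 4 (ET); paths/IDs are hypothetical.
import numpy as np
import nibabel as nib

ref = nib.load("BraTS20_Validation_001_t1.nii.gz")   # reference geometry
pred_labels = np.zeros(ref.shape, dtype=np.uint8)    # placeholder prediction
# ... fill pred_labels with the method's segmentation output ...

seg = nib.Nifti1Image(pred_labels, ref.affine)       # single multi-label volume
nib.save(seg, "BraTS20_Validation_001.nii.gz")
```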

Evaluation Approach

Consistent with the configuration of previous BraTS challenges, we use the "Dice score" and the "Hausdorff distance (95%)". Expanding upon this evaluation scheme, since BraTS'17 we have also used the "Sensitivity" and "Specificity" metrics, which allow determining potential over- or under-segmentations of the tumor sub-regions by the participating methods.
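
The overlap-based metrics reduce to simple voxel counts on binary sub-region masks derived from the label values defined above. The sketch below illustrates the metric definitions only (the Hausdorff distance is omitted); it is not the official evaluation code.

```python
# Minimal sketch of the per-region overlap metrics on binary masks.
import numpy as np

def region_mask(label_map, labels):
    """Binary mask of one sub-region: WT = {1, 2, 4}, TC = {1, 4}, ET = {4}."""
    return np.isin(label_map, labels)

def dice(pred, ref):
    intersection = np.logical_and(pred, ref).sum()
    return 2.0 * intersection / (pred.sum() + ref.sum())

def sensitivity(pred, ref):
    # TP / (TP + FN): fraction of reference voxels recovered by the prediction
    return np.logical_and(pred, ref).sum() / ref.sum()

def specificity(pred, ref):
    # TN / (TN + FP): fraction of background voxels correctly left unlabelled
    return np.logical_and(~pred, ~ref).sum() / (~ref).sum()

# Example for the whole tumor, given hypothetical label maps `pred_map`, `ref_map`:
# wt_pred = region_mask(pred_map, [1, 2, 4])
# wt_ref  = region_mask(ref_map, [1, 2, 4])
# print(dice(wt_pred, wt_ref), sensitivity(wt_pred, wt_ref), specificity(wt_pred, wt_ref))
```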

Task 2 (Overall Survival Prediction): Description

Once the participants produce their segmentation labels in the pre-operative scans, they will be called to use these labels in combination with the provided MRI data to extract the imaging/radiomic features that they consider appropriate, and to analyze them through machine learning algorithms, in an attempt to predict patient overall survival (OS). The participants are not limited to volumetric parameters; they can also consider intensity, morphologic, histogram-based, and textural features, as well as spatial information and glioma diffusion properties extracted from glioma growth models.
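
As one possible illustration of such a pipeline, the sketch below derives simple volumetric features from a predicted label map and fits an off-the-shelf regressor. The feature set, model choice, and all numeric values are hypothetical; this is not a prescribed approach.

```python
# Minimal sketch: volumetric features + age fed to a generic regressor.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def volumetric_features(label_map, voxel_volume_mm3):
    """Volumes (mm^3) of the WT, TC, and ET sub-regions from a BraTS label map."""
    wt = np.isin(label_map, [1, 2, 4]).sum() * voxel_volume_mm3
    tc = np.isin(label_map, [1, 4]).sum() * voxel_volume_mm3
    et = (label_map == 4).sum() * voxel_volume_mm3
    return [wt, tc, et]

# X: one row per training subject (WT, TC, ET volumes + age); y: survival in days.
X = np.array([[1.2e5, 6.0e4, 2.5e4, 61.3],
              [0.8e5, 3.5e4, 1.1e4, 54.0]])   # illustrative numbers only
y = np.array([420.0, 615.0])

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
# predicted_days = model.predict(X_validation)
```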

Note that participants will be evaluated for the predicted OS status of subjects with resection status of GTR (i.e., Gross Total Resection).
The participants are called to upload a .csv file with the subject IDs and the predicted survival values (survival in days) to CBICA's Image Processing Portal for evaluation.
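
A minimal sketch of assembling this submission file is shown below. The subject IDs and values are placeholders, and the exact column conventions should follow the portal's instructions.

```python
# Minimal sketch: writing the survival submission .csv file.
import csv

# Hypothetical predictions: subject ID -> overall survival in days.
predictions = {
    "BraTS20_Validation_001": 412.0,
    "BraTS20_Validation_002": 287.5,
}

with open("survival_predictions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for subject_id, days in predictions.items():
        writer.writerow([subject_id, days])
```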

Evaluation

Following the available literature, as well as considerations for potential clinical translation, we consider two evaluation schemes. First, the participating teams will be evaluated and ranked based on the accuracy of the classification of subjects as long-survivors (e.g., >15 months), short-survivors (e.g., <10 months), and mid-survivors (e.g., between 10 and 15 months). Predictions of the participating teams will be assessed based on accuracy (i.e., the fraction of correctly classified patients) with respect to this grouping. Note that participants are expected to provide predicted survival status only for subjects with available age and a resection status of GTR (i.e., Gross Total Resection). For post-challenge meta-analyses, we will also compare the mean and median squared error of the survival time predictions.
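
The sketch below illustrates this three-class grouping and the accuracy computation, assuming survival is given in days and a month is approximated as 30.44 days (an assumed conversion, not a challenge-defined constant).

```python
# Minimal sketch of the survival grouping used for ranking.
import numpy as np

DAYS_PER_MONTH = 30.44   # assumed conversion; not a challenge-defined constant

def survival_class(days):
    months = days / DAYS_PER_MONTH
    if months > 15:
        return "long"
    if months < 10:
        return "short"
    return "mid"

def classification_accuracy(predicted_days, true_days):
    pred = [survival_class(d) for d in predicted_days]
    true = [survival_class(d) for d in true_days]
    return np.mean([p == t for p, t in zip(pred, true)])

# Example: classification_accuracy([480, 200], [500, 250]) -> 1.0
```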

Task 3 (Uncertainty Estimation): Description

Building upon the BraTS'19 complementary research task, this year we evaluate uncertainty measures in the context of glioma region segmentation. This task rewards methods whose predictions are: (a) confident when correct and (b) uncertain when incorrect. Participants who wish to take part in this task are asked to upload, in addition to their segmentation results from Task 1, 3 uncertainty maps associated with the resulting labels at every voxel.

The uncertainty maps should be associated with the 1) "enhancing tumor" (ET), 2) "tumor core" (TC), and 3) "whole tumor" (WT) regions. In this manner, the uncertainties will be associated with the traditional BraTS Dice metrics. The participants should normalize their uncertainty values to the range 0-100 across the entire dataset, such that "0" represents the most certain prediction and "100" represents the most uncertain. Note that, in any single patient case, the uncertainty values do not need to span the full range [0, 100] (i.e., the algorithm may be confident in its predictions at all voxels for a single patient). To keep storage requirements to a minimum, participants are expected to submit uncertainties as 'uint8' volumes.

The participants are called to upload 4 NIfTI (.nii.gz) volumes (3 uncertainty maps and 1 multi-class segmentation volume from Task 1) to CBICA's Image Processing Portal for evaluation. For each ID in the dataset, participants are expected to upload the following 4 volumes (see the sketch after this list):
1. {ID}.nii.gz (multi-class label map)
2. {ID}_unc_whole.nii.gz (Uncertainty map associated with whole tumor)
3. {ID}_unc_core.nii.gz (Uncertainty map associated with tumor core)
4. {ID}_unc_enhance.nii.gz (Uncertainty map associated with enhancing tumor)
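
A minimal sketch of preparing these four files for one subject is given below, assuming the uncertainty maps are already on the 0-100 scale. The subject ID and the placeholder arrays are hypothetical.

```python
# Minimal sketch: writing the Task 3 submission volumes for one subject.
import numpy as np
import nibabel as nib

subject = "BraTS20_Validation_001"                 # hypothetical subject ID
ref = nib.load(f"{subject}_t1.nii.gz")             # reference geometry

pred_labels = np.zeros(ref.shape, dtype=np.uint8)  # placeholder Task 1 label map
unc_wt = unc_tc = unc_et = np.zeros(ref.shape)     # placeholder uncertainties (0-100)

def save_uint8(volume, path):
    # Enforce the required 0-100 range and 'uint8' storage type.
    data = np.clip(np.rint(volume), 0, 100).astype(np.uint8)
    nib.save(nib.Nifti1Image(data, ref.affine), path)

nib.save(nib.Nifti1Image(pred_labels, ref.affine), f"{subject}.nii.gz")
save_uint8(unc_wt, f"{subject}_unc_whole.nii.gz")
save_uint8(unc_tc, f"{subject}_unc_core.nii.gz")
save_uint8(unc_et, f"{subject}_unc_enhance.nii.gz")
```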

Evaluation

For the task of estimating uncertainty, uncertain voxels will be filtered out at N predetermined uncertainty thresholds T (0 <= T <= 100), and the model performance will be assessed based on the "Dice score" of the remaining voxels at each value of T. For example, T:75 implies that all voxels with uncertainty values >75 will be marked as uncertain; the associated predictions are filtered out and not considered in the subsequent Dice calculations, which use only the predictions at the remaining, unfiltered voxels. This evaluation rewards approaches whose confidence is high for correct assertions (true positives, TP, and true negatives, TN) and low for incorrect assertions (false positives, FP, and false negatives, FN).
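
In code, this thresholded Dice amounts to masking out the uncertain voxels before the overlap computation. A minimal sketch, assuming binary prediction/reference masks for one region and an uncertainty map on the 0-100 scale:

```python
# Minimal sketch of the Dice score restricted to voxels kept at threshold T.
import numpy as np

def filtered_dice(pred, ref, unc, T):
    """Dice over the voxels whose uncertainty is <= T (the rest are filtered out)."""
    keep = unc <= T
    p, r = pred[keep], ref[keep]
    intersection = np.logical_and(p, r).sum()
    return 2.0 * intersection / (p.sum() + r.sum())
```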

This strategy does not, by itself, keep track of the correctly labelled voxels that are filtered out at each threshold level along with the uncertain incorrect labels. To penalize filtering out many correctly predicted voxels (TP, TN) merely to attain high Dice values, an additional assessment component keeps track of the filtered TP and TN voxels. Given that tumor segmentation has a high class imbalance between tumor and healthy tissue, the filtered TP and TN are tracked separately. The ratio of filtered TP (FTP) at threshold T is measured relative to the unfiltered values (T = 100), such that FTP = (TP_100 - TP_T) / TP_100. The ratio of filtered TN (FTN) is calculated analogously. This component penalizes approaches that filter out a large fraction of the TP or TN voxels present at T = 100 in order to attain the reported Dice value.
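
The two ratios follow directly from the definition above; a minimal sketch, under the same assumptions as the previous snippet:

```python
# Minimal sketch of the filtered-TP and filtered-TN ratios at threshold T.
import numpy as np

def filtered_ratios(pred, ref, unc, T):
    """FTP = (TP_100 - TP_T) / TP_100, and the analogous FTN, at threshold T."""
    keep = unc <= T
    tp_100 = np.logical_and(pred, ref).sum()
    tn_100 = np.logical_and(~pred, ~ref).sum()
    tp_T = np.logical_and(np.logical_and(pred, ref), keep).sum()
    tn_T = np.logical_and(np.logical_and(~pred, ~ref), keep).sum()
    return (tp_100 - tp_T) / tp_100, (tn_100 - tn_T) / tn_100
```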

The figure and table below show visual and quantitative examples of the assessment metric based on images from the BraTS dataset. Decreasing T filters out voxels with incorrect assertions, leading to an increase in the Dice value for the remaining voxels. Example 2 shows a marginally higher Dice value than Example 1 at uncertainty thresholds T = 50 and T = 25. However, the ratios of filtered TP and TN indicate that this comes at the expense of marking more TP and TN voxels as uncertain.

Participating teams will be ranked for each BraTS subject separately, according to an integrated score based on the areas under three curves: 1) Dice vs. T, 2) FTP vs. T, and 3) FTN vs. T, for the different values of T. The integrated score is calculated as follows:

score = AUC_1 + (1 - AUC_2) + (1 - AUC_3),

where AUC_1, AUC_2, and AUC_3 are the areas under the Dice vs. T, FTP vs. T, and FTN vs. T curves, respectively.

The final ranking will be based on the average of the rankings for each case.
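
A minimal sketch of computing the per-case score, assuming the three curves are sampled on a common grid of thresholds and each area is normalized to [0, 1] by the threshold range (the normalization and the sample values are illustrative assumptions):

```python
# Minimal sketch of the integrated per-case score from the three curves.
import numpy as np

thresholds = np.array([0.0, 25.0, 50.0, 75.0, 100.0])
dice_vs_T = np.array([0.99, 0.97, 0.965, 0.96, 0.94])      # illustrative values
ftp_vs_T  = np.array([0.30, 0.10, 0.05, 0.00, 0.00])
ftn_vs_T  = np.array([0.010, 0.0019, 0.0016, 0.0015, 0.0])

def auc(values, T):
    # Trapezoidal area under the curve, normalized by the threshold range.
    return np.trapz(values, T) / (T[-1] - T[0])

score = auc(dice_vs_T, thresholds) \
        + (1 - auc(ftp_vs_T, thresholds)) \
        + (1 - auc(ftn_vs_T, thresholds))
```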

 

Figure: Effect of uncertainty thresholding on two different examples of WT segmentation.

 

 

            |        Dice score        |   Ratio of FTP   |     Ratio of FTN
            | T:100  T:75  T:50  T:25  | T:75  T:50  T:25 | T:75   T:50   T:25
Example 1   |  .94   .96   .965  .97   |  0    .05   .1   | .0015  .0016  .0019
Example 2   |  .92   .955  .97   .975  |  0    .15   .25  | .0015  .0026  .0096

Figure: Glioma sub-regions. Image patches with the tumor sub-regions annotated in the different MRI modalities. The image patches show from left to right: the whole tumor (WT - yellow) visible in T2-FLAIR (Fig. A), the tumor core (TC - orange) visible in T2 (Fig. B), the enhancing tumor (ET - light blue) visible in T1-Gd, surrounding the cystic/necrotic components of the core (green) (Fig. C). The segmentations are combined to generate the final labels of the tumor sub-regions (Fig. D): edema/invasion (yellow), non-enhancing solid core (orange), necrotic/cystic core (green), enhancing core (blue). (Figure taken from the BraTS IEEE TMI paper.)

 

Feel free to send any communication related to the BraTS challenge to brats2020@cbica.upenn.edu

MICCAI 2020