Computational Precision Medicine: Radiology-Pathology Challenge on Brain Tumor Classification 2019 (CPM-RadPath): Data
Data Description Overview
To register for participation and gain access to the CPM-RadPath 2019 data, follow the instructions on the "Registration" page.
The CPM-RadPath dataset consists of multi-institutional paired radiology scans and digitized histopathology images of brain gliomas, obtained from the same patients, as well as their diagnostic classification label. Taking into consideration the latest classification of CNS tumors [1, 2], the classes used in the CPM-RadPath challenge are:
- A = Lower grade astrocytoma, IDH-mutant (Grade II or III)
- O = Oligodendroglioma, IDH-mutant, 1p/19q codeleted (Grade II or III)
- G = Glioblastoma and Diffuse astrocytic glioma with molecular features of glioblastoma, IDH-wildtype (Grade IV)
Training data will be released on July 15, allowing participants to train classification algorithms that distinguish among the different glioma classes. The training data consist of paired radiology-pathology images and the ground truth classification file. The classification file is a .csv file (training_data_classification_labels.csv) with three columns: the first column gives the ID of a case, the second the classification label, and the third the age of the subject in days. Example contents of the ground truth classification file are illustrated below:
CPM_RadPath_2019_ID, class, age_in_days
CPM19_TCIA01_1_1, G, 17832
CPM19_TCIA01_2_1, O, 19435
CPM19_TCIA01_21_1, A, 25894
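A minimal Python sketch of reading this file with the standard library is given here. It assumes the layout shown in the example above (a header row and a blank after each comma); the helper name read_labels and the in-memory SAMPLE string are illustrative only, and in practice the handle would come from opening training_data_classification_labels.csv.

```python
import csv
import io
from collections import Counter

# Rows copied from the example above; replace with open(<path to the csv>).
SAMPLE = """CPM_RadPath_2019_ID, class, age_in_days
CPM19_TCIA01_1_1, G, 17832
CPM19_TCIA01_2_1, O, 19435
CPM19_TCIA01_21_1, A, 25894
"""

VALID_CLASSES = {"A", "O", "G"}


def read_labels(handle):
    """Return {case_id: (class_label, age_in_days)} from an open CSV handle."""
    # skipinitialspace=True drops the blank that follows each comma,
    # in the header row as well as in the data rows.
    reader = csv.DictReader(handle, skipinitialspace=True)
    labels = {}
    for row in reader:
        label = row["class"].strip()
        if label not in VALID_CLASSES:
            raise ValueError(f"unexpected class label: {label!r}")
        case_id = row["CPM_RadPath_2019_ID"].strip()
        labels[case_id] = (label, int(row["age_in_days"]))
    return labels


labels = read_labels(io.StringIO(SAMPLE))
# Count cases per class, e.g. to check class balance before training.
class_counts = Counter(label for label, _ in labels.values())
print(labels["CPM19_TCIA01_1_1"], dict(class_counts))
```

Rejecting labels outside A, O and G is a design choice of this sketch, intended to surface formatting problems early.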
Furthermore, a "training_data_pathology_image_info.csv" file is provided that stores the physical pixel size, in microns per pixel, of each histopathology image:
CPM_RadPath_2019_ID, mpp-x, mpp-y
CPM19_CBICA_AAB_1, 0.2515, 0.2515
CPM19_CBICA_AAG_1, 0.2515, 0.2515
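The microns-per-pixel values can be used to bring slides scanned at different magnifications to a common physical resolution. The Python sketch below shows the arithmetic only; the function names and the target resolution of 0.5 microns per pixel are assumptions made for illustration, not values prescribed by the challenge.

```python
def downsample_factor(mpp_x, mpp_y, target_mpp):
    """Return the (x, y) factors by which to shrink an image so that
    each resulting pixel covers target_mpp microns."""
    if min(mpp_x, mpp_y, target_mpp) <= 0:
        raise ValueError("pixel sizes must be positive")
    return target_mpp / mpp_x, target_mpp / mpp_y


def physical_size_mm(width_px, height_px, mpp_x, mpp_y):
    """Return the (width, height) of an image region in millimetres."""
    return width_px * mpp_x / 1000.0, height_px * mpp_y / 1000.0


# mpp values taken from the example rows above; 0.5 is a hypothetical target.
fx, fy = downsample_factor(0.2515, 0.2515, target_mpp=0.5)
print(f"shrink by {fx:.3f} in x and {fy:.3f} in y")
```

A factor below 1 means the slide is already coarser than the target and would need to be upsampled instead.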
Validation data will be released on August 15, through an email pointing to the accompanying leaderboard. These data will allow participants to obtain preliminary results on unseen data and to report them in their submitted papers, in addition to their cross-validated results on the training data. The ground truth classification file will not be provided for the validation data, but multiple submissions to the online evaluation platform will be allowed. The challenge platform will score each submission and post the score to the participant's feed. The scores will also be posted to the leaderboard so that participants can see the scores of the other teams. Scoring will be computed as the number of correctly classified cases divided by the total number of cases, i.e., accuracy.
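For local cross-validation on the training data, the same accuracy measure can be sketched in Python as follows. The function name is illustrative, and treating a case with no prediction as misclassified is an assumption of this sketch; the source does not state how the platform handles missing cases.

```python
def accuracy(predictions, ground_truth):
    """Number of correctly classified cases divided by the total number
    of ground-truth cases. Both arguments map case ID -> class label."""
    if not ground_truth:
        raise ValueError("ground truth is empty")
    correct = sum(
        1
        for case_id, label in ground_truth.items()
        # Assumption: a case without a prediction counts as incorrect.
        if predictions.get(case_id) == label
    )
    return correct / len(ground_truth)
```

For example, two correct labels out of three ground-truth cases give an accuracy of 2/3.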
Participants are asked to submit a .csv file containing the class label predictions of their method in the validation phase. The CSV file must be submitted inside a zip file for scoring by the challenge platform. The format of the CSV file should be as follows. The first row should contain the header (CPM_RadPath_2019_ID,class). In each data row, the first column should contain the case ID and the second column the class label: G for Glioblastoma, A for Astrocytoma, and O for Oligodendroglioma. The contents of an example file are shown below:
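As a rough illustration of this format (not the organizers' example file), the Python sketch below writes a predictions CSV with the stated header and packs it into a zip archive. The function name, the archive member name predictions.csv, and the case IDs in the usage comment are hypothetical; the source does not specify a required file name inside the zip.

```python
import csv
import io
import zipfile

ALLOWED_LABELS = {"G", "A", "O"}


def build_submission_zip(predictions, csv_name="predictions.csv"):
    """Return the bytes of a zip archive holding one CSV of predictions.

    predictions maps case ID -> class label (G, A or O).
    csv_name is a hypothetical member name chosen for this sketch.
    """
    text = io.StringIO()
    writer = csv.writer(text, lineterminator="\n")
    writer.writerow(["CPM_RadPath_2019_ID", "class"])  # header from the text
    for case_id, label in sorted(predictions.items()):
        if label not in ALLOWED_LABELS:
            raise ValueError(f"invalid label {label!r} for case {case_id}")
        writer.writerow([case_id, label])

    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(csv_name, text.getvalue())
    return archive.getvalue()


# Usage with hypothetical case IDs:
# with open("submission.zip", "wb") as fh:
#     fh.write(build_submission_zip({"case_001": "G", "case_002": "A"}))
```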
Finally, all participants will be presented with the same testing data, which will be made available through email during 3-18 September for a limited, controlled time window (48h), within which participants are required to upload their final results to the evaluation portal. As with the validation data, the testing data consist of paired imaging data without a ground truth classification file. Participants will be allowed to submit their classification results to the challenge platform only once. The scores and ranking of each team for this phase will be announced during the challenge at MICCAI 2019.
The top-ranked participating teams will be invited before the end of September to prepare slides for a short oral presentation of their method during the CPM-RadPath challenge.
Radiology Data Description:
The radiology data of the CPM-RadPath challenge consist of multi-institutional, routine, clinically acquired pre-operative multimodal MRI scans of brain gliomas. Specifically, the radiology scans used in this challenge are available as NIfTI files (.nii.gz) and correspond to multi-parametric MRI images comprising a) native T1-weighted (T1), b) post-contrast T1-weighted (T1Gd), c) T2-weighted (T2), and d) T2 Fluid Attenuated Inversion Recovery (T2-FLAIR) volumes. The brain scans were acquired with different clinical protocols and various scanners (1T-3T) at multiple institutions. The data are distributed after pre-processing, i.e., co-registered to the same anatomical template, interpolated to the same resolution (1 mm^3), and skull-stripped.
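A quick sanity check of the stated 1 mm^3 resolution can be sketched in Python with the standard library alone, by reading the dim and pixdim fields of the NIfTI-1 header. This sketch assumes the files are NIfTI-1 (not NIfTI-2) and does not inspect the header's units field, so the values are only presumed to be millimetres; the function names are illustrative. For loading the voxel data itself, a dedicated library such as nibabel is the more usual choice.

```python
import gzip
import struct


def read_nifti1_geometry(path):
    """Return (shape, voxel_size) read from the header of a .nii.gz file."""
    with gzip.open(path, "rb") as fh:
        header = fh.read(348)  # a NIfTI-1 header is 348 bytes long
    if len(header) < 348:
        raise ValueError("file too short for a NIfTI-1 header")

    # sizeof_hdr (bytes 0-3) equals 348; use it to detect the byte order.
    for order in ("<", ">"):
        if struct.unpack(order + "i", header[:4])[0] == 348:
            break
    else:
        raise ValueError("not a NIfTI-1 header")

    dim = struct.unpack(order + "8h", header[40:56])      # dim[0] = rank
    pixdim = struct.unpack(order + "8f", header[76:108])  # grid spacings
    ndim = dim[0]
    if not 1 <= ndim <= 7:
        raise ValueError(f"invalid number of dimensions: {ndim}")
    return dim[1:1 + ndim], pixdim[1:1 + ndim]


def is_isotropic_1mm(voxel_size, tol=1e-3):
    """True if the first three spacings are all within tol of 1.0."""
    return len(voxel_size) >= 3 and all(
        abs(v - 1.0) <= tol for v in voxel_size[:3]
    )
```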
Pathology Data Description:
The histopathology data of the CPM-RadPath challenge contain one digitized whole slide tissue image for each patient, captured from Hematoxylin and Eosin (H&E) stained tissue specimens. The tissue specimens were scanned at 20x or 40x magnification. Note that there may be color and intensity variations among the images because of batch effects and other image acquisition artifacts. The images are stored in tiled TIFF format. Participants can use the OpenSlide library (https://openslide.org) to read the images, or any other library of their preference.
Participants will be required to submit a Docker image (https://www.docker.com) via DockerHub (https://hub.docker.com). The Docker image will contain the implementation of the participant’s classification algorithm. The Docker images will be executed on the challenge platform and final test scores will be computed.
The Docker image will be expected to accept a folder of images and output a classification file in the same format used by the ground truth classification file in the training phase. The input image folder will consist of two subfolders: radiology and pathology. The radiology folder will contain the radiology images for each case in zip files, one zip file per case. The pathology folder will contain the whole slide tissue images. The filename of a whole slide tissue image will match the case ID. For example, a dataset consisting of three cases will have the following folder and file structure in an input image folder:
The docker image will then be executed as follows:
docker run -v input_images:/data/images -v output_result:/data/output brain_classification_docker
The Docker container is expected to write a single CSV file named output_classification.csv to /data/output in the container.
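A minimal Python sketch of an entry point for such a container is given below. Several points are assumptions rather than facts from the challenge description: that each radiology zip file is named after its case ID, that the output needs only the ID and class columns, and that the function names run and classify_case are free to choose. The classify_case body is a placeholder standing in for a participant's trained model.

```python
import csv
from pathlib import Path


def classify_case(radiology_zip, slide_path):
    """Hypothetical placeholder: a real entry would load the MRI volumes
    and the whole slide image here and return 'A', 'O' or 'G'."""
    return "G"


def run(images_dir, output_dir):
    """Classify every case found under images_dir and write the results
    to output_dir/output_classification.csv. Returns the output path."""
    images_dir, output_dir = Path(images_dir), Path(output_dir)

    # Assumption: each radiology zip is named <case ID>.zip.
    radiology = {p.stem: p for p in (images_dir / "radiology").glob("*.zip")}

    rows = []
    for slide in sorted((images_dir / "pathology").iterdir()):
        if not slide.is_file():
            continue
        case_id = slide.stem  # the slide filename matches the case ID
        rows.append((case_id, classify_case(radiology.get(case_id), slide)))

    output_dir.mkdir(parents=True, exist_ok=True)
    out_path = output_dir / "output_classification.csv"
    with out_path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        # Assumption: only the ID and class columns are required.
        writer.writerow(["CPM_RadPath_2019_ID", "class"])
        writer.writerows(rows)
    return out_path


# Inside the container, the mount points from the command above would be used:
# run("/data/images", "/data/output")
```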
Data Usage Agreement / Citations
You are free to use and/or refer to the CPM-RadPath datasets in your own research, provided that you always cite the following manuscript:
T. Kurc, S. Bakas, X. Ren, A. Bagari, A. Momeni, Y. Huang, L. Zhang, A. Kumar, M. Thibault, Q. Qi, Q. Wang, A. Kori, O. Gevaert, Y. Zhang, D. Shen, M. Khened, X. Ding, G. Krishnamurthi, J. Kalpathy-Cramer, J. Davis, T. Zhao, R. Gupta, J. Saltz, K. Farahani. "Segmentation and classification in digital pathology for glioma research: challenges and deep learning approaches". Frontiers in Neuroscience, p. 27, 2020.
Feel free to send any communication related to the CPM-RadPath challenge to email@example.com