© 2022 Information Processing Center - National Research Institute
The mpMRI database
Technical issues
The imaging data was provided by the Lower Silesian Oncology, Pulmonology and Hematology Center (DCOPiH) in Wrocław, Poland. The MRI scanner used at DCOPiH was a 1.5 T Siemens Avanto Fit. The following parameters were used:
T2W
Imaging planes: axial, sagittal, coronal
Slice thickness: 3 mm
Field of view (FOV): 19 to 25 cm (depending on the examination protocol)
In-plane resolution: voxel dimensions for the FOV of 19 cm are 0.7 x 0.7 x 3.0 mm
DWI
Imaging planes: axial
Slice thickness: 4 mm
Field of view (FOV): over 38 cm
In-plane resolution: 2.3 x 2.3 x 4 mm
Low b-value: 50
Intermediate b-values: 400, 800
High b-value: 1200
DCE
Imaging planes: axial
Slice thickness: 2 mm
Field of view (FOV): 20 cm
In-plane resolution: 0.9 x 0.9 x 2 mm
Temporal resolution: 1 min
Total observation time: 6 min
Dose of contrast: 0.1 mmol/kg
Injection rate: 2 cc/s
Fat suppression: yes
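As a rough illustration of how the parameters above relate to one another, the sketch below stores them in a Python dictionary and derives two quantities that the list does not state explicitly: the approximate in-plane matrix size (FOV divided by pixel size) and the number of DCE time points (total observation time divided by temporal resolution). The dictionary layout and the function names are illustrative assumptions, not part of the dataset. The T2W entry uses the 19 cm FOV, and the DWI entry uses 38 cm as a lower bound because the FOV is given only as "over 38 cm".

```python
# Acquisition parameters transcribed from the list above.
# The dictionary layout and key names are illustrative, not part of the dataset.
PROTOCOL = {
    "T2W": {"fov_mm": 190.0, "pixel_mm": 0.7, "slice_mm": 3.0},  # 19 cm FOV variant
    "DWI": {"fov_mm": 380.0, "pixel_mm": 2.3, "slice_mm": 4.0},  # "over 38 cm": lower bound
    "DCE": {"fov_mm": 200.0, "pixel_mm": 0.9, "slice_mm": 2.0},
}


def approx_matrix_size(fov_mm, pixel_mm):
    """Approximate number of pixels along one in-plane axis."""
    return round(fov_mm / pixel_mm)


def dce_time_points(total_min, temporal_resolution_min):
    """Number of DCE acquisitions that fit into the observation window."""
    return int(total_min // temporal_resolution_min)


# Derived values (estimates only; the source does not list matrix sizes).
MATRIX_SIZES = {
    name: approx_matrix_size(p["fov_mm"], p["pixel_mm"])
    for name, p in PROTOCOL.items()
}
DCE_POINTS = dce_time_points(total_min=6, temporal_resolution_min=1)

if __name__ == "__main__":
    print(MATRIX_SIZES)
    print(DCE_POINTS)
```

The derived number of DCE time points agrees with the six DCE time points listed among the source images.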
Label procedures and label standards
All scans were described independently by three professional radiologists, each of whom has a minimum of three years’ experience in describing oncological mpMRI pelvis (prostate) scans and practical knowledge of the PI-RADS 2.1 standard. None of the radiologists had access to historical data: they knew neither the patients’ histories nor the locations of foci subject to fusion biopsy. The data was collected by a team of four radiologists.
Data standards
The dataset consists of:
- source images in the form of DICOM files that contain eleven sequences: T2W (axial, sagittal, and coronal), ADC, DWI (high b-value), and DCE (six time points)
- prostate area anatomical labels (Anatomical_Labels)
- anterior fibromuscular stroma (afs)
- central zone (cz)
- prostate gland (pg)
- peripheral zone (pz)
- left seminal vesicles (sv_l)
- right seminal vesicles (sv_r)
- transition zone (tz)
- urethra (ur)
- lesion area labels (Lesion_Labels)
- lesion1–lesion4 (depending on the number of foci in a scan)
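The label abbreviations listed above can be captured in a small lookup table, which may be convenient when turning label names into readable text. The sketch below is a minimal example; the names `ANATOMICAL_LABELS`, `LESION_LABELS`, and `describe_label` are hypothetical helpers introduced here and are not part of the dataset.

```python
# Abbreviations of the anatomical labels, as listed above.
ANATOMICAL_LABELS = {
    "afs": "anterior fibromuscular stroma",
    "cz": "central zone",
    "pg": "prostate gland",
    "pz": "peripheral zone",
    "sv_l": "left seminal vesicles",
    "sv_r": "right seminal vesicles",
    "tz": "transition zone",
    "ur": "urethra",
}

# Lesion labels run from lesion1 to lesion4, depending on the number of foci.
LESION_LABELS = [f"lesion{i}" for i in range(1, 5)]


def describe_label(name):
    """Return a human-readable description for a label abbreviation."""
    key = name.strip().lower()
    if key in ANATOMICAL_LABELS:
        return ANATOMICAL_LABELS[key]
    if key in LESION_LABELS:
        return f"lesion {key[len('lesion'):]}"
    raise KeyError(f"unknown label: {name!r}")


if __name__ == "__main__":
    for label in ("pz", "sv_l", "lesion2"):
        print(label, "->", describe_label(label))
```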
The data is organised in directories whose subfolders correspond to specific data types (imaging data, anatomical labels, and lesion labels) and to the numbers of consecutive scans. The lesion label data is additionally arranged into subfolders that record the lesion type and the sequence type to which each label pertains. All lesion labels provided by the radiologists who detected the lesions are available.
Two data representation standards were developed:
- DICOM: imaging data is in the form of original DICOM files; visual labels are converted to the DICOM SEG standard
- MHA/NIfTI: imaging data is converted to ITK MetaImage (.mha) files; visual labels are converted to NIfTI (.nii)
The two data structures are presented in the figure below.
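When working with a downloaded copy, it can help to sort files by representation before choosing a reader (libraries such as pydicom or SimpleITK are commonly used for these formats). The sketch below groups paths by file extension using only the standard library. The `.mha` and `.nii` extensions come from the description above; treating `.dcm` as the DICOM extension is an assumption, because DICOM files do not always carry one, and the example paths are invented for illustration.

```python
from pathlib import Path

# Extension-to-representation mapping. ".mha" and ".nii" are named in the
# description above; ".dcm" is an assumption about how DICOM files are named.
REPRESENTATIONS = {
    ".mha": "MHA image",
    ".nii": "NIfTI label",
    ".dcm": "DICOM",
}


def group_by_representation(paths):
    """Group file paths by the data representation implied by their suffix."""
    groups = {}
    for path in paths:
        kind = REPRESENTATIONS.get(Path(path).suffix.lower(), "unknown")
        groups.setdefault(kind, []).append(str(path))
    return groups


# Hypothetical file names, used only to demonstrate the grouping.
EXAMPLE_PATHS = ["case/t2w.mha", "case/pz.nii", "case/slice_001.dcm", "case/notes.txt"]
EXAMPLE_GROUPS = group_by_representation(EXAMPLE_PATHS)

if __name__ == "__main__":
    for kind, files in EXAMPLE_GROUPS.items():
        print(kind, files)
```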
The database contains outlines traced by all of the radiologists. As a result, each lesion has up to three unique outlines in all sequences (radiologists can be identified by the names of the lesion label files, which contain their unique radiologist IDs). The database is possibly the only mpMRI prostate database that contains lesion outlines traced by multiple experts. The differing outlines of a single lesion can serve as a natural form of data augmentation during model training.
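Two common ways of using several outlines of the same lesion are sketched below: a pixel-wise majority vote that merges them into one consensus mask, and random selection of a single reader's mask, for example once per training epoch. This is an illustrative sketch and not a method prescribed by the dataset authors. Masks are represented as nested lists of 0 and 1 to keep the example dependency-free, and the three toy masks are invented for demonstration.

```python
import random


def majority_vote(masks):
    """Pixel-wise majority vote over binary masks of identical shape."""
    if not masks:
        raise ValueError("at least one mask is required")
    n = len(masks)
    rows, cols = len(masks[0]), len(masks[0][0])
    # A pixel is foreground when more than half of the readers marked it.
    return [
        [1 if 2 * sum(m[r][c] for m in masks) > n else 0 for c in range(cols)]
        for r in range(rows)
    ]


def sample_outline(masks, rng=random):
    """Pick one reader's mask at random (a simple augmentation strategy)."""
    return rng.choice(masks)


# Toy 2 x 3 masks from three hypothetical readers.
READER_MASKS = [
    [[0, 1, 1], [0, 1, 0]],
    [[0, 1, 0], [0, 1, 0]],
    [[1, 1, 0], [0, 0, 0]],
]
CONSENSUS = majority_vote(READER_MASKS)

if __name__ == "__main__":
    print(CONSENSUS)
    print(sample_outline(READER_MASKS, random.Random(0)))
```

With only two outlines, the strict majority rule keeps a pixel only when both readers marked it, so it behaves like an intersection.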
The medical data collected to populate the database is characterised by a wide range of clinical variables that could define its degree of clinical relevance (GS value, ISUP group, number and length of sections sampled for examination, number and length of sections affected by lesions, prostate volume, PSA density, EAU risk groups, etc.). This provides much-needed flexibility in the selection of subsets that include various criteria for lesions’ clinical relevance.
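The subset selection described above can be sketched as a simple filter over per-patient records. In the example below, the field names (`isup`, `psa`, `volume`) and the toy records are hypothetical and do not reflect the actual layout of the clinical description file. The default threshold of ISUP 2 or higher is one commonly used convention for clinically significant cancer, not a rule set by the dataset. PSA density is computed as PSA divided by prostate volume, assuming PSA in ng/ml and volume in ml.

```python
def psa_density(psa, prostate_volume):
    """PSA density: PSA level divided by prostate volume."""
    if prostate_volume <= 0:
        raise ValueError("prostate volume must be positive")
    return psa / prostate_volume


def select_cases(records, min_isup=2):
    """Keep records whose ISUP group is known and at least `min_isup`."""
    return [
        r for r in records
        if r["isup"] is not None and r["isup"] >= min_isup
    ]


# Toy records invented for illustration; they are not taken from the dataset.
TOY_RECORDS = [
    {"id": "a", "isup": 1, "psa": 5.0, "volume": 50.0},
    {"id": "b", "isup": 3, "psa": 9.0, "volume": 45.0},
    {"id": "c", "isup": None, "psa": 4.0, "volume": 60.0},  # no biopsy result
]
SELECTED_IDS = [r["id"] for r in select_cases(TOY_RECORDS)]

if __name__ == "__main__":
    print(SELECTED_IDS)
    for r in TOY_RECORDS:
        print(r["id"], round(psa_density(r["psa"], r["volume"]), 3))
```

Changing `min_isup` (or filtering on another variable such as PSA density) yields a different definition of clinical relevance from the same records.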
DCOPiH conducts histopathological examination according to the standards of the Polish Society of Pathologists. Malignancy levels are determined based on the Gleason and WHO/ISUP grading systems; for specific patients, however, they are determined based on the overall results for all biopsy samples.
Because of its size, the data has been split into smaller packages in the form of zip files, which should make downloading considerably easier.
Dataset characteristics
Patients | |
Number of cases | 503 |
– with reliable signs and symptoms | 379 |
– with negative biopsy results (or no biopsy) | 197 |
– with positive biopsy results | 182 |
– ISUP 1 | 79 (43%) |
– ISUP 2 | 51 (28%) |
– ISUP 3 | 37 (20%) |
– ISUP 4 | 9 (5%) |
– ISUP 5 | 6 (3%) |
– with limited signs and symptoms | 124 |
Age median | 68 |
PSA median | 7.71 |
MRI prostate volume median | 48.51 |
Lesions | |
Number of lesions recorded | 393 |
Zone | |
TZ | 205 |
PZ | 188 |
PI-RADS | |
2 | 19 |
3 | 61 |
4 | 134 |
5 | 178 |
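The figures in the table can be cross-checked with a few lines of arithmetic. The sketch below recomputes the ISUP percentages (relative to the 182 cases with positive biopsy results) and verifies that the subgroup counts add up to their parent totals. The variable names are illustrative; the numbers are copied from the table.

```python
# Counts transcribed from the table above.
PATIENTS = {
    "total": 503,
    "reliable": 379,
    "negative_or_no_biopsy": 197,
    "positive_biopsy": 182,
    "limited": 124,
}
ISUP = {1: 79, 2: 51, 3: 37, 4: 9, 5: 6}
ZONES = {"TZ": 205, "PZ": 188}
PI_RADS = {2: 19, 3: 61, 4: 134, 5: 178}
LESIONS_RECORDED = 393

# ISUP shares among cases with positive biopsy results, rounded to whole percent.
ISUP_PERCENT = {
    group: round(100 * count / PATIENTS["positive_biopsy"])
    for group, count in ISUP.items()
}

# Consistency checks between subgroup counts and their parent totals.
CHECKS = {
    "reliable + limited == total":
        PATIENTS["reliable"] + PATIENTS["limited"] == PATIENTS["total"],
    "negative + positive == reliable":
        PATIENTS["negative_or_no_biopsy"] + PATIENTS["positive_biopsy"]
        == PATIENTS["reliable"],
    "ISUP groups == positive":
        sum(ISUP.values()) == PATIENTS["positive_biopsy"],
    "zones == lesions":
        sum(ZONES.values()) == LESIONS_RECORDED,
}
PI_RADS_TOTAL = sum(PI_RADS.values())

if __name__ == "__main__":
    print(ISUP_PERCENT)
    print(CHECKS)
    print(PI_RADS_TOTAL)
```

The recomputed percentages match the table. The PI-RADS counts as listed add up to 392, one fewer than the 393 lesions recorded; the source does not explain the difference.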
Downloadable files - data (last modified on 5 December 2022):
Note: The files below are multi-part zip files (each file is a part of an archive). Save all of the files in the same folder and unpack them using 7-Zip.
DICOM/DICOM SEG data: AI4AR_dicom.zip, z01.zip, z02.zip, z03.zip, z04.zip, z05.zip, z06.zip, z07.zip, z08.zip, z09.zip, z10.zip, z11.zip, z12.zip, z13.zip, z14.zip
MHA/NIfTI data: AI4AR_cont.zip, z01.zip, z02.zip, z03.zip, z04.zip, z05.zip, z06.zip, z07.zip, z08.zip, z09.zip
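Once all parts of an archive are saved in the same folder, extraction can be scripted. The sketch below builds and runs a 7-Zip command from Python. It assumes that the 7-Zip command-line executable is installed and available on the PATH as `7z` (the name differs on some systems) and that pointing 7-Zip at the main `.zip` file is enough for it to pick up the remaining parts. The function names and the output folder are hypothetical.

```python
import subprocess
from pathlib import Path


def build_extract_command(first_part, output_dir):
    """Build the 7-Zip command line for extracting an archive."""
    # "x" extracts with full paths; "-o<dir>" sets the output directory
    # (7-Zip expects no space between -o and the directory name).
    return ["7z", "x", str(first_part), f"-o{output_dir}"]


def extract(first_part, output_dir):
    """Run 7-Zip on the main archive file; raises if it is missing or fails."""
    first_part = Path(first_part)
    if not first_part.is_file():
        raise FileNotFoundError(first_part)
    subprocess.run(build_extract_command(first_part, output_dir), check=True)


if __name__ == "__main__":
    # Prints the command only; call extract(...) to actually unpack the data.
    print(build_extract_command("AI4AR_dicom.zip", "AI4AR_dicom"))
```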
Downloadable files - description (last modified on 1 December 2022):
Medical and radiological description
License
The dataset is available to the public free of charge on the project website (https://ai4ar.opi.org.pl) under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Dataset management
Laboratory of Applied Artificial Intelligence, the National Information Processing Institute, Warsaw, Poland
Contact
Dr Rafał Jóźwiak – Rafal.Jozwiak@opi.org.pl
Dr Jakub Mitura, M.D. – Jakub.Mitura@opi.org.pl