├── LICENSE
├── README.md
├── requirements.txt
├── tcia_dicom_to_nifti.py
├── tcia_dicom_to_nifti_generic.py
├── tcia_nifti_to_hdf5.py
└── tcia_nifti_to_mha.py


/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2022 midas.lab
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # TCIA_processing: tcia_dicom_to_nifti.py
  2 | 
  3 | Script for converting TCIA DICOM data to NIfTI format (dataset: FDG-PET-CT-Lesions, doi: <a href="https://doi.org/10.7937/gkr0-xv29"><img src="https://img.shields.io/badge/DOI-10.7937%2Fgkr0--xv29-blue"></a>).
  4 | 
  5 | ## Requirements
  6 | 
  7 | To run the script you will need several Python packages. Run the following commands in a terminal, one after another:
  8 | 
  9 | ```bash
 10 | pip3 install numpy
 11 | pip3 install dicom2nifti
 12 | pip3 install nibabel
 13 | pip3 install pydicom
 14 | pip3 install tqdm
 15 | pip3 install nilearn
 16 | ```
 17 | If you use a Colab or Jupyter notebook and cannot use the terminal, you can perform these installations by adding a "!" in front of the commands, e.g.
 18 | ```python
 19 | !pip3 install numpy
 20 | ...
 21 | ```
 22 | ## Data structure
 23 | DICOM data downloaded from TCIA will have the following format:
 24 | 
 25 | Directory structure of the original DICOM data within the folder /PATH/TO/DICOM/FDG-PET-CT-Lesions/ :
 26 | 
 27 | <img width="400" alt="image" src="https://user-images.githubusercontent.com/52936169/165639574-58c53bd0-2ff2-4525-9147-f254521840dd.png">
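
The two directory levels below the dataset root (patient, then study) are what the `find_studies` function in the script walks. A minimal sketch of that traversal using only the standard library; unlike the script, it skips stray files and sorts the result, which is an assumption about what is convenient rather than the script's actual behaviour:

```python
from pathlib import Path


def find_studies(dicom_root):
    """Return every <patient>/<study> directory below dicom_root, sorted."""
    root = Path(dicom_root)
    return sorted(
        study
        for patient in root.iterdir() if patient.is_dir()   # first level: patients
        for study in patient.iterdir() if study.is_dir()    # second level: studies
    )


if __name__ == "__main__":
    # placeholder path, replace with your own download location
    for study_dir in find_studies("/PATH/TO/DICOM/FDG-PET-CT-Lesions/"):
        print(study_dir.parent.name, study_dir.name)
```

Running this before the conversion gives a quick count of the studies that will be processed.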
 28 | 
 29 | 
 30 | ## Usage
 31 | 
 32 | In order to run this script use the terminal and navigate to the path where the script is stored, then run:
 33 | 
 34 | ```bash
 35 | python3 tcia_dicom_to_nifti.py /PATH/TO/DICOM/FDG-PET-CT-Lesions/ /PATH/TO/NIFTI/FDG-PET-CT-Lesions/
 36 | 
 37 | ```
 38 | where
 39 | 
 40 | ```/PATH/TO/DICOM/FDG-PET-CT-Lesions/```
 41 | is the directory of the DICOM data downloaded from TCIA (see above: data structure) and
 42 | ```/PATH/TO/NIFTI/FDG-PET-CT-Lesions/```
 43 | is the path you want to store the NIfTI files in.
 44 | 
 45 | You can ignore the nilearn warning:
 46 | 
 47 | ```.../nilearn/image/resampling.py:527: UserWarning: Casting data from int16 to float32 warnings.warn("Casting data from %s to %s" % (data.dtype.name, aux))```
 48 | 
 49 | or suppress warnings by running the script as (after making sure everything works):
 50 | 
 51 | ```bash
 52 | python3 -W ignore tcia_dicom_to_nifti.py /PATH/TO/DICOM/FDG-PET-CT-Lesions/ /PATH/TO/NIFTI/FDG-PET-CT-Lesions/
 53 | ```
 54 | 
 55 | ## Output
 56 | The resulting NIfTI directory will have the following structure:
 57 | 
 58 | <img width="500" alt="image" src="https://user-images.githubusercontent.com/52936169/165639700-164c5778-556f-4492-96ed-fa21a9a51603.png">
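
For every study the script writes five files: `CT.nii.gz`, `CTres.nii.gz`, `PET.nii.gz`, `SUV.nii.gz` and `SEG.nii.gz`. Because a long run can be interrupted, it may help to check the output tree afterwards. The helper below is a hypothetical sketch (`missing_outputs` is not part of this repository) that reports studies lacking any of these files:

```python
from pathlib import Path

# file names written by tcia_dicom_to_nifti.py for each study
EXPECTED_FILES = ("CT.nii.gz", "CTres.nii.gz", "PET.nii.gz", "SUV.nii.gz", "SEG.nii.gz")


def missing_outputs(nifti_root):
    """Map each incomplete study directory to the list of files it lacks."""
    report = {}
    for patient in sorted(Path(nifti_root).iterdir()):
        if not patient.is_dir():
            continue
        for study in sorted(patient.iterdir()):
            if not study.is_dir():
                continue
            missing = [name for name in EXPECTED_FILES if not (study / name).is_file()]
            if missing:
                report[study] = missing
    return report


if __name__ == "__main__":
    # placeholder path, replace with your own output location
    for study, files in missing_outputs("/PATH/TO/NIFTI/FDG-PET-CT-Lesions/").items():
        print(study, "is missing", ", ".join(files))
```

An empty result means every study folder contains all five files; it does not verify the image contents.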
 59 | 
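Among the output files, `SUV.nii.gz` holds the PET volume multiplied by a single factor that the script derives from the DICOM header: injected dose, radionuclide half-life, injection time, acquisition time and patient weight. The arithmetic can be sketched as follows. The function below mirrors the script's calculation but takes plain numbers instead of a DICOM file; the parameter names and the example values are illustrative assumptions, and the units (Bq, seconds, kg) are the ones DICOM normally uses for these fields:

```python
def conv_time(time_str):
    """Convert a DICOM time string 'HHMMSS.FFFFFF' to seconds since midnight."""
    return float(time_str[:2]) * 3600 + float(time_str[2:4]) * 60 + float(time_str[4:13])


def suv_factor(total_dose_bq, half_life_s, start_time, acq_time, weight_kg):
    """Factor that scales PET voxel values (Bq/ml) to body-weight SUV."""
    elapsed_s = conv_time(acq_time) - conv_time(start_time)
    # activity remaining at acquisition time after radioactive decay
    decayed_dose_bq = total_dose_bq * 0.5 ** (elapsed_s / half_life_s)
    # weight is converted from kg to g
    return 1000 * weight_kg / decayed_dose_bq


if __name__ == "__main__":
    # made-up values: 300 MBq injected one hour before the scan, 75 kg patient
    print(suv_factor(3.0e8, 6586.2, "100000.000000", "110000.000000", 75.0))
```

Like the script, this sketch does not handle an injection and acquisition on different sides of midnight.
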
 60 | ## Execution time
 61 | Running the script can take multiple hours.
 62 | 
 63 | # TCIA_processing: tcia_nifti_to_mha.py
 64 | 
 65 | Script for converting TCIA NIfTI data (created using tcia_dicom_to_nifti.py, see above) to MHA files.
 66 | 
 67 | ## Requirements
 68 | 
 69 | To run the script you will need several Python packages. Run the following commands in a terminal, one after another:
 70 | 
 71 | ```bash
 72 | pip3 install SimpleITK
 73 | pip3 install tqdm
 74 | ```
 75 | If you use a Colab or Jupyter notebook and cannot use the terminal, you can perform these installations by adding a "!" in front of the commands, e.g.
 76 | ```python
 77 | !pip3 install SimpleITK
 78 | ...
 79 | ```
 80 | ## Usage
 81 | 
 82 | In order to run this script use the terminal and navigate to the path where the script is stored, then run:
 83 | 
 84 | ```bash
 85 | python3 tcia_nifti_to_mha.py /PATH/TO/NIFTI/FDG-PET-CT-Lesions/ /PATH/TO/MHA/FDG-PET-CT-Lesions/
 86 | ```
 87 | where
 88 | 
 89 | ```/PATH/TO/NIFTI/FDG-PET-CT-Lesions/```
 90 | is the directory of the NIfTI data generated using tcia_dicom_to_nifti.py (see above) and
 91 | ```/PATH/TO/MHA/FDG-PET-CT-Lesions/```
 92 | is the path you want to store the MHA files in.
 93 | 
 94 | You can ignore the SimpleITK warning:
 95 | 
 96 | ```.WARNING: In /tmp/SimpleITK-build/ITK/Modules/IO/Meta/src/itkMetaImageIO.cxx, line 669 MetaImageIO (0x2d9b300): Unsupported or empty metaData item intent_name of type Ssfound, won't be written to image file```
 97 | 
 98 | or suppress warnings by running the script as (after making sure everything works):
 99 | 
100 | ```bash
101 | python3 -W ignore tcia_nifti_to_mha.py /PATH/TO/NIFTI/FDG-PET-CT-Lesions/ /PATH/TO/MHA/FDG-PET-CT-Lesions/
102 | ```
103 | 
104 | # TCIA_processing: tcia_nifti_to_hdf5.py
105 | 
106 | Script for converting TCIA NIfTI data (created using tcia_dicom_to_nifti.py, see above) to a single HDF5 file.
107 | 
108 | ## Requirements
109 | 
110 | To run the script you will need several Python packages. Run the following commands in a terminal, one after another:
111 | 
112 | ```bash
113 | pip3 install numpy
114 | pip3 install h5py
115 | pip3 install tqdm
116 | pip3 install nibabel
117 | ```
118 | If you use a Colab or Jupyter notebook and cannot use the terminal, you can perform these installations by adding a "!" in front of the commands, e.g.
119 | ```python
120 | !pip3 install numpy
121 | ...
122 | ```
123 | ## Usage
124 | 
125 | In order to run this script use the terminal and navigate to the path where the script is stored, then run:
126 | 
127 | ```bash
128 | python3 tcia_nifti_to_hdf5.py /PATH/TO/NIFTI/FDG-PET-CT-Lesions/ /PATH/TO/HDF5/FDG-PET-CT-Lesions.hdf5
129 | 
130 | ```
131 | where
132 | 
133 | ```/PATH/TO/NIFTI/FDG-PET-CT-Lesions/```
134 | is the directory of the NIfTI data generated using tcia_dicom_to_nifti.py (see above) and
135 | ```/PATH/TO/HDF5/FDG-PET-CT-Lesions.hdf5```
136 | is the path and filename of the hdf5 file to be created.
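
Inside the HDF5 file the script stores each study as a group `<patient>/<study>` holding the datasets `suv`, `ctres`, `ct`, `pet` and `seg`. A minimal sketch of reading the file back with `h5py`; the helper names `list_studies` and `read_study` are hypothetical and not part of this repository:

```python
import h5py

DATASET_NAMES = ("suv", "ctres", "ct", "pet", "seg")


def list_studies(h5_path):
    """Return all (patient, study) pairs stored in the file."""
    with h5py.File(h5_path, "r") as h5_file:
        return [(patient, study) for patient in h5_file for study in h5_file[patient]]


def read_study(h5_path, patient, study):
    """Load the five volumes of one study as a dict of NumPy arrays."""
    with h5py.File(h5_path, "r") as h5_file:
        group = h5_file[patient + "/" + study]
        # indexing with [()] reads the whole dataset into memory
        return {name: group[name][()] for name in DATASET_NAMES}


if __name__ == "__main__":
    # placeholder path, replace with your own file
    h5_path = "/PATH/TO/HDF5/FDG-PET-CT-Lesions.hdf5"
    patient, study = list_studies(h5_path)[0]
    volumes = read_study(h5_path, patient, study)
    print(patient, study, {name: vol.shape for name, vol in volumes.items()})
```

Reading one study at a time keeps memory use bounded; the full dataset is large.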
137 | 
138 | ## Package Versions
139 | All scripts were tested under python 3.9 with the following package versions:
140 |    
141 | dicom2nifti==2.3.3
142 | 
143 | nibabel==3.2.2
144 | 
145 | pydicom==2.3.0
146 | 
147 | h5py==3.6.0
148 | 
149 | tqdm==4.64.0
150 | 
151 | SimpleITK==2.1.1.2
152 | 
153 | nilearn==0.9.1
154 | 
155 | numpy==1.22.3
156 | 
157 | ## License
158 | [MIT](https://choosealicense.com/licenses/mit/)
159 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
 1 | numpy
 2 | dicom2nifti
 3 | nibabel
 4 | pydicom
 5 | h5py
 6 | tqdm
 7 | SimpleITK
 8 | nilearn
 9 | 
10 | # All scripts were tested under python 3.9 with the following package versions:
11 |    
12 | #dicom2nifti==2.3.3
13 | #nibabel==3.2.2
14 | #pydicom==2.3.0
15 | #h5py==3.6.0
16 | #tqdm==4.64.0
17 | #SimpleITK==2.1.1.2
18 | #nilearn==0.9.1
19 | #numpy==1.22.3
20 | 


--------------------------------------------------------------------------------
/tcia_dicom_to_nifti.py:
--------------------------------------------------------------------------------
  1 | # data preparation (conversion of DICOM PET/CT studies to nifti format for running automated lesion segmentation)
  2 | 
  3 | # run script from command line as follows:
  4 | # python tcia_dicom_to_nifti.py /PATH/TO/DICOM/FDG-PET-CT-Lesions/ /PATH/TO/NIFTI/FDG-PET-CT-Lesions/
  5 | 
  6 | # you can ignore the nilearn warning:
  7 | # .../nilearn/image/resampling.py:527: UserWarning: Casting data from int16 to float32 warnings.warn("Casting data from %s to %s" % (data.dtype.name, aux)) 
  8 | # or run as python -W ignore tcia_dicom_to_nifti.py /PATH/TO/DICOM/FDG-PET-CT-Lesions/ /PATH/TO/NIFTI/FDG-PET-CT-Lesions/
  9 | 
 10 | import pathlib as plb
 11 | import tempfile
 12 | import os
 13 | import dicom2nifti
 14 | import nibabel as nib
 15 | import numpy as np
 16 | import pydicom
 17 | import sys
 18 | import shutil
 19 | import nilearn.image
 20 | from tqdm import tqdm
 21 | 
 22 | 
 23 | def find_studies(path_to_data):
 24 |     # find all studies
 25 |     dicom_root = plb.Path(path_to_data)
 26 |     patient_dirs = list(dicom_root.glob('*'))
 27 | 
 28 |     study_dirs = []
 29 | 
 30 |     for dir in patient_dirs:
 31 |         sub_dirs = list(dir.glob('*'))
 32 |         #print(sub_dirs)
 33 |         study_dirs.extend(sub_dirs)
 34 |         
 35 |         #dicom_dirs = dicom_dirs.append(dir.glob('*'))
 36 |     return study_dirs
 37 | 
 38 | 
 39 | def identify_modalities(study_dir):
 40 |     # identify CT, PET and mask subfolders; returns a dictionary mapping each modality to its path, plus the StudyInstanceUID under the key "ID"
 41 |     study_dir = plb.Path(study_dir)
 42 |     sub_dirs = list(study_dir.glob('*'))
 43 | 
 44 |     modalities = {}
 45 | 
 46 |     for dir in sub_dirs:
 47 |         first_file = next(dir.glob('*.dcm'))
 48 |         ds = pydicom.dcmread(str(first_file))
 49 |         #print(ds)
 50 |         modality = ds.Modality
 51 |         modalities[modality] = dir
 52 |     
 53 |     modalities["ID"] = ds.StudyInstanceUID
 54 |     return modalities
 55 | 
 56 | 
 57 | def dcm2nii_CT(CT_dcm_path, nii_out_path):
 58 |     # conversion of CT DICOM (in the CT_dcm_path) to nifti and save in nii_out_path
 59 |     with tempfile.TemporaryDirectory() as tmp: #convert CT
 60 |         tmp = plb.Path(str(tmp))
 61 |         # convert dicom directory to nifti
 62 |         # (store results in temp directory)
 63 |         dicom2nifti.convert_directory(CT_dcm_path, str(tmp), 
 64 |                                       compression=True, reorient=True)
 65 |         nii = next(tmp.glob('*nii.gz'))
 66 |         # copy niftis to output folder with consistent naming
 67 |         shutil.copy(nii, nii_out_path/'CT.nii.gz')
 68 | 
 69 | 
 70 | def dcm2nii_PET(PET_dcm_path, nii_out_path):
 71 |     # conversion of PET DICOM (in the PET_dcm_path) to nifti (and SUV nifti) and save in nii_out_path
 72 |     first_pt_dcm = next(PET_dcm_path.glob('*.dcm'))
 73 |     suv_corr_factor = calculate_suv_factor(first_pt_dcm)
 74 | 
 75 |     with tempfile.TemporaryDirectory() as tmp: #convert PET
 76 |         tmp = plb.Path(str(tmp))
 77 |         # convert dicom directory to nifti
 78 |         # (store results in temp directory)
 79 |         dicom2nifti.convert_directory(PET_dcm_path, str(tmp), 
 80 |                                     compression=True, reorient=True)
 81 |         nii = next(tmp.glob('*nii.gz'))
 82 |         # copy nifti to output folder with consistent naming
 83 |         shutil.copy(nii, nii_out_path/'PET.nii.gz')
 84 | 
 85 |         # convert pet images to quantitative suv images and save nifti file
 86 |         suv_pet_nii = convert_pet(nib.load(nii_out_path/'PET.nii.gz'), suv_factor=suv_corr_factor)
 87 |         nib.save(suv_pet_nii, nii_out_path/'SUV.nii.gz')
 88 | 
 89 | 
 90 | def conv_time(time_str):
 91 |     # function for time conversion in DICOM tag
 92 |     return (float(time_str[:2]) * 3600 + float(time_str[2:4]) * 60 + float(time_str[4:13]))
 93 | 
 94 | 
 95 | def calculate_suv_factor(dcm_path):
 96 |     # reads a PET dicom file and calculates the SUV conversion factor
 97 |     ds = pydicom.dcmread(str(dcm_path))
 98 |     total_dose = ds.RadiopharmaceuticalInformationSequence[0].RadionuclideTotalDose
 99 |     start_time = ds.RadiopharmaceuticalInformationSequence[0].RadiopharmaceuticalStartTime
100 |     half_life = ds.RadiopharmaceuticalInformationSequence[0].RadionuclideHalfLife
101 |     acq_time = ds.AcquisitionTime
102 |     weight = ds.PatientWeight
103 |     time_diff = conv_time(acq_time) - conv_time(start_time)
104 |     act_dose = total_dose * 0.5 ** (time_diff / half_life)
105 |     suv_factor = 1000 * weight / act_dose
106 |     return suv_factor
107 | 
108 | 
109 | def convert_pet(pet, suv_factor):
110 |     # function for conversion of PET values to SUV (should work on Siemens PET/CT)
111 |     affine = pet.affine
112 |     pet_data = pet.get_fdata()
113 |     pet_suv_data = (pet_data*suv_factor).astype(np.float32)
114 |     pet_suv = nib.Nifti1Image(pet_suv_data, affine)
115 |     return pet_suv
116 | 
117 | 
118 | def dcm2nii_mask(mask_dcm_path, nii_out_path):
119 |     # conversion of the mask dicom file to nifti (not directly possible with dicom2nifti)
120 |     mask_dcm = list(mask_dcm_path.glob('*.dcm'))[0]
121 |     mask = pydicom.dcmread(str(mask_dcm))
122 |     mask_array = mask.pixel_array
123 |     
124 |     # get mask array to correct orientation (this procedure is dataset specific)
125 |     mask_array = np.transpose(mask_array,(2,1,0) )  
126 |     mask_orientation = mask[0x5200, 0x9229][0].PlaneOrientationSequence[0].ImageOrientationPatient
127 |     if mask_orientation[4] == 1:
128 |         mask_array = np.flip(mask_array, 1 )
129 |     
130 |     # get affine matrix from the corresponding pet             
131 |     pet = nib.load(str(nii_out_path/'PET.nii.gz'))
132 |     pet_affine = pet.affine
133 |     
134 |     # return mask as nifti object
135 |     mask_out = nib.Nifti1Image(mask_array, pet_affine)
136 |     nib.save(mask_out, nii_out_path/'SEG.nii.gz')   
137 |     
138 | 
139 | def resample_ct(nii_out_path):
140 |     # resample CT to PET and mask resolution
141 |     ct   = nib.load(nii_out_path/'CT.nii.gz')
142 |     pet  = nib.load(nii_out_path/'PET.nii.gz')
143 |     CTres = nilearn.image.resample_to_img(ct, pet, fill_value=-1024)
144 |     nib.save(CTres, nii_out_path/'CTres.nii.gz')
145 | 
146 | 
147 | def tcia_to_nifti(tcia_path, nii_out_path, modality='CT'):
148 |     # conversion for a single file
149 |     # creates a nifti file for one patient/study
150 |     # tcia_path:        path to a DICOM directory for a specific study of one patient
151 |     # nii_out_path:     path to a directory where nifti file for one patient, study and modality will be stored
152 |     # modality:         modality to be converted CT, PET or mask ('CT', 'PT', 'SEG')
153 |     os.makedirs(nii_out_path, exist_ok=True)
154 |     if modality == 'CT':
155 |         dcm2nii_CT(tcia_path, nii_out_path)
156 |         resample_ct(nii_out_path)
157 |     elif modality in ('PT', 'PET'):  # 'PT' is the DICOM modality code for PET
158 |         dcm2nii_PET(tcia_path, nii_out_path)
159 |     elif modality == 'SEG':
160 |         dcm2nii_mask(tcia_path, nii_out_path)
161 | 
162 | 
163 | def tcia_to_nifti_study(study_path, nii_out_path):
164 |     # conversion for a single study
165 |     # creates NIfTI files for one patient
166 |     # study_path:       path to a study directory containing all DICOM files for a specific study of one patient
167 |     # nii_out_path:     path to a directory where all nifti files for one patient and study will be stored
168 |     study_path = plb.Path(study_path)
169 |     modalities = identify_modalities(study_path)
170 |     # store the files directly in the given output directory
171 |     nii_out_path = plb.Path(nii_out_path)
172 |     os.makedirs(nii_out_path, exist_ok=True)
173 | 
174 |     ct_dir = modalities["CT"]
175 |     dcm2nii_CT(ct_dir, nii_out_path)
176 | 
177 |     pet_dir = modalities["PT"]
178 |     dcm2nii_PET(pet_dir, nii_out_path)
179 | 
180 |     seg_dir = modalities["SEG"]
181 |     dcm2nii_mask(seg_dir, nii_out_path)
182 | 
183 |     resample_ct(nii_out_path)
184 | 
185 | 
186 | def convert_tcia_to_nifti(study_dirs,nii_out_root):
187 |     # batch conversion of all patients
188 |     for study_dir in tqdm(study_dirs):
189 |         
190 |         patient = study_dir.parent.name
191 |         print("The following patient directory is being processed: ", patient)
192 | 
193 |         modalities = identify_modalities(study_dir)
194 |         nii_out_path = plb.Path(nii_out_root/study_dir.parent.name)
195 |         nii_out_path = nii_out_path/study_dir.name
196 |         os.makedirs(nii_out_path, exist_ok=True)
197 | 
198 |         ct_dir = modalities["CT"]
199 |         dcm2nii_CT(ct_dir, nii_out_path)
200 | 
201 |         pet_dir = modalities["PT"]
202 |         dcm2nii_PET(pet_dir, nii_out_path)
203 | 
204 |         seg_dir = modalities["SEG"]
205 |         dcm2nii_mask(seg_dir, nii_out_path)
206 | 
207 |         resample_ct(nii_out_path)
208 | 
209 | 
210 | if __name__ == "__main__":
211 |     path_to_data = plb.Path(sys.argv[1])  # path to downloaded TCIA DICOM database, e.g. '.../FDG-PET-CT-Lesions/'
212 |     nii_out_root = plb.Path(sys.argv[2])  # path to the NIfTI files to be created, e.g. '...tcia_nifti/FDG-PET-CT-Lesions/'
213 | 
214 |     study_dirs = find_studies(path_to_data)
215 |     convert_tcia_to_nifti(study_dirs, nii_out_root)
216 | 


--------------------------------------------------------------------------------
/tcia_dicom_to_nifti_generic.py:
--------------------------------------------------------------------------------
 1 | # data preparation (conversion of DICOM series to nifti format)
 2 | 
 3 | # run script from command line as follows:
 4 | # python tcia_dicom_to_nifti_generic.py /PATH/TO/DICOM/TCIA_dataset_name/ /PATH/TO/NIFTI/TCIA_dataset_name/
 5 | # if not existing the output folder(s) (/PATH/TO/NIFTI/TCIA_dataset_name/) will be generated 
 6 | 
 7 | import pathlib as plb
 8 | import tempfile
 9 | import os
10 | import dicom2nifti
11 | import nibabel as nib
12 | import numpy as np
13 | import pydicom
14 | import sys
15 | import shutil
16 | from tqdm import tqdm
17 | 
18 | 
19 | def find_studies(path_to_data):
20 |     # find all studies
21 |     dicom_root = plb.Path(path_to_data)
22 |     patient_dirs = list(dicom_root.glob('*'))
23 | 
24 |     study_dirs = []
25 | 
26 |     for dir in patient_dirs:
27 |         sub_dirs = list(dir.glob('*'))
28 |         #print(sub_dirs)
29 |         study_dirs.extend(sub_dirs)
30 |         
31 |         #dicom_dirs = dicom_dirs.append(dir.glob('*'))
32 |     return study_dirs
33 | 
34 | 
35 | def get_series(study_dir):
36 |     # returns paths of series directories
37 |     study_dir = plb.Path(study_dir)
38 |     series_dirs = list(study_dir.glob('*'))
39 | 
40 |     return series_dirs
41 | 
42 | def dcm2nii(dcm_path, nii_out_path):
43 |     # conversion of DICOM to nifti and save in nii_out_path
44 |     
45 |     dicom2nifti.convert_directory(str(dcm_path), str(nii_out_path), 
46 |                                       compression=True, reorient=True)
47 |  
48 | def convert_tcia_to_nifti(study_dirs,nii_out_root):
49 |     # batch conversion of all patients
50 |     for study_dir in tqdm(study_dirs):
51 |         
52 |         patient = study_dir.parent.name
53 |         print("The following patient directory is being processed: ", patient)
54 | 
55 |         series_dirs = get_series(study_dir)
56 | 
57 |         nii_out_path = plb.Path(nii_out_root/study_dir.name)
58 |         os.makedirs(nii_out_path, exist_ok=True)
59 | 
60 |         for series in series_dirs:
61 |             try:
62 |                 dcm2nii(series, nii_out_path)
63 |             except Exception as err:
64 |                 # report which series failed and why, then continue with the next one
65 |                 print('An error occurred, data may be (partially) not converted: ' + str(series) + ' (' + str(err) + ')')
66 |                 
67 |     
68 | if __name__ == "__main__":
69 |     path_to_data = plb.Path(sys.argv[1])  # path to downloaded TCIA DICOM database, e.g. '...TCIA/manifest-1647440690095/FDG-PET-CT-Lesions/'
70 |     nii_out_root = plb.Path(sys.argv[2])  # path to the NIfTI files to be created, e.g. '...tcia_nifti/FDG-PET-CT-Lesions/'
71 | 
72 |     study_dirs = find_studies(path_to_data)
73 |     convert_tcia_to_nifti(study_dirs, nii_out_root)
74 | 


--------------------------------------------------------------------------------
/tcia_nifti_to_hdf5.py:
--------------------------------------------------------------------------------
  1 | # data preparation (conversion of NIfTI PET/CT studies to HDF5 format for running automated lesion segmentation)
  2 | 
  3 | # run script from command line as follows:
  4 | # python tcia_nifti_to_hdf5.py /PATH/TO/NIFTI/FDG-PET-CT-Lesions/ /PATH/TO/HDF5/FDG-PET-CT-Lesions.hdf5
  5 | 
  6 | import h5py
  7 | from tqdm import tqdm
  8 | import pathlib as plb
  9 | import sys
 10 | import os
 11 | import nibabel as nib
 12 | import numpy as np
 13 | 
 14 | def find_studies(path_to_data):
 15 |     # find all studies
 16 |     dicom_root = plb.Path(path_to_data)
 17 |     patient_dirs = list(dicom_root.glob('*'))
 18 | 
 19 |     study_dirs = []
 20 | 
 21 |     for dir in patient_dirs:
 22 |         sub_dirs = list(dir.glob('*'))
 23 |         #print(sub_dirs)
 24 |         study_dirs.extend(sub_dirs)
 25 |         
 26 |         #dicom_dirs = dicom_dirs.append(dir.glob('*'))
 27 |     return study_dirs
 28 | 
 29 | 
 30 | def nifti_to_hdf5(nii_file, path_to_h5_file):
 31 |     # conversion for a single file
 32 |     # creates an hdf5 file holding one nifti volume
 33 |     # nii_file:         path to a single nifti file
 34 |     # path_to_h5_file:  path to the hdf5 file to be created
 35 |     data = nib.load(nii_file)
 36 |     with h5py.File(path_to_h5_file, 'w') as h5_file:
 37 |         h5_file.create_dataset('data', data=data.get_fdata(), compression="gzip")  # create_dataset needs a name; 'data' is a placeholder
 38 | 
 39 | 
 40 | def nifti_to_hdf5_study(study_path, path_to_h5_file):
 41 |     # conversion for a single study
 42 |     # creates an hdf5 file for one patient
 43 |     # study_path:       path to a study directory containing all nifti files for a specific study of one patient
 44 |     # path_to_h5_file:  path to a single hdf5 file for one patient and study
 45 | 
 46 |     study_path = plb.Path(study_path)
 47 |     patient = study_path.parent.name
 48 |     study = study_path.name
 49 | 
 50 |     suv = nib.load(str(study_path / 'SUV.nii.gz'))
 51 |     ctres = nib.load(str(study_path / 'CTres.nii.gz'))
 52 |     ct = nib.load(str(study_path / 'CT.nii.gz'))
 53 |     pet = nib.load(str(study_path / 'PET.nii.gz'))
 54 |     seg = nib.load(str(study_path / 'SEG.nii.gz'))
 55 | 
 56 |     suv = suv.get_fdata()
 57 |     ctres = ctres.get_fdata()
 58 |     ct = ct.get_fdata()
 59 |     pet = pet.get_fdata()
 60 |     seg = seg.get_fdata()
 61 | 
 62 |     with h5py.File(path_to_h5_file, 'w') as h5_file:
 63 |         try:
 64 |             h5_file.create_group(patient + '/' + study)
 65 |             h5_file.create_dataset(patient + '/' + study + '/suv', data=suv, compression="gzip")
 66 |             h5_file.create_dataset(patient + '/' + study + '/ctres', data=ctres, compression="gzip")
 67 |             h5_file.create_dataset(patient + '/' + study + '/ct', data=ct, compression="gzip")
 68 |             h5_file.create_dataset(patient + '/' + study + '/pet', data=pet, compression="gzip")
 69 |             h5_file.create_dataset(patient + '/' + study + '/seg', data=seg, compression="gzip")
 70 |         except:
 71 |             h5_pat = h5_file.create_group(patient)
 72 |             h5_pat.create_group(study)
 73 |             h5_file.create_dataset(patient + '/' + study + '/suv', data=suv, compression="gzip")
 74 |             h5_file.create_dataset(patient + '/' + study + '/ctres', data=ctres, compression="gzip")
 75 |             h5_file.create_dataset(patient + '/' + study + '/ct', data=ct, compression="gzip")
 76 |             h5_file.create_dataset(patient + '/' + study + '/pet', data=pet, compression="gzip")
 77 |             h5_file.create_dataset(patient + '/' + study + '/seg', data=seg, compression="gzip")
 78 | 
 79 | 
 80 | def convert_nifti_to_hdf5(study_dirs, path_to_h5_data):
 81 |     # batch conversion of all patients
 82 |     # creates a single hdf5 file for all patients
 83 |     # study_dirs:       NiFTI study directories for all patients
 84 |     # path_to_h5_data:  path to a single hdf5 file for all patients
 85 | 
 86 |     h5_file = h5py.File(path_to_h5_data, 'w')
 87 | 
 88 |     for pat_dir in tqdm(study_dirs):
 89 | 
 90 |         patient = pat_dir.parent.name
 91 |         study   = pat_dir.name
 92 | 
 93 |         suv    = nib.load(str(pat_dir/'SUV.nii.gz'))
 94 |         ctres  = nib.load(str(pat_dir/'CTres.nii.gz'))
 95 |         ct     = nib.load(str(pat_dir/'CT.nii.gz'))
 96 |         pet    = nib.load(str(pat_dir/'PET.nii.gz'))
 97 |         seg    = nib.load(str(pat_dir/'SEG.nii.gz'))
 98 |         
 99 |         suv   = suv.get_fdata()
100 |         ctres = ctres.get_fdata()
101 |         ct    = ct.get_fdata()
102 |         pet   = pet.get_fdata()
103 |         seg   = seg.get_fdata()
104 | 
105 |         try:
106 |             h5_file.create_group(patient+'/'+study)
107 |             h5_file.create_dataset(patient+'/'+study+'/suv', data=suv, compression="gzip")
108 |             h5_file.create_dataset(patient+'/'+study+'/ctres', data=ctres, compression="gzip")
109 |             h5_file.create_dataset(patient+'/'+study+'/ct', data=ct, compression="gzip")
110 |             h5_file.create_dataset(patient+'/'+study+'/pet', data=pet, compression="gzip")
111 |             h5_file.create_dataset(patient+'/'+study+'/seg', data=seg, compression="gzip")
112 | 
113 |         except:
114 |             h5_pat = h5_file.create_group(patient)
115 |             h5_pat.create_group(study)
116 |             h5_file.create_dataset(patient+'/'+study+'/suv', data=suv, compression="gzip")
117 |             h5_file.create_dataset(patient+'/'+study+'/ctres', data=ctres, compression="gzip")
118 |             h5_file.create_dataset(patient+'/'+study+'/ct', data=ct, compression="gzip")
119 |             h5_file.create_dataset(patient+'/'+study+'/pet', data=pet, compression="gzip")
120 |             h5_file.create_dataset(patient+'/'+study+'/seg', data=seg, compression="gzip")
121 |         
122 |     h5_file.close()
123 | 
124 | 
125 | if __name__ == "__main__":
126 |     path_to_data = sys.argv[1]     # path to converted NiFTI files (see tcia2nifti) from downloaded TCIA DICOM database e.g. '...tcia_nifti/FDG-PET-CT-Lesions/'
127 |     path_to_h5_data = sys.argv[2]  # path to the to be saved HDF5 file, e.g. '...hdf5/FDG-PET-CT-Lesions.hdf5'
128 |     study_dirs = find_studies(path_to_data)
129 |     convert_nifti_to_hdf5(study_dirs, path_to_h5_data)
130 | 
131 | 
132 | 


--------------------------------------------------------------------------------
/tcia_nifti_to_mha.py:
--------------------------------------------------------------------------------
 1 | # converts the entire dataset from the .nii.gz format to the .mha format
 2 | # (the .mha format is required by grand-challenge.org as input and output data of algorithms)
 3 | 
 4 | # run script from command line as follows:
 5 | # python tcia_nifti_to_mha.py /PATH/TO/NIFTI/FDG-PET-CT-Lesions/ /PATH/TO/MHA/FDG-PET-CT-Lesions/
 6 | 
 7 | import SimpleITK as sitk
 8 | import pathlib as plb
 9 | from tqdm import tqdm
10 | import os
11 | import sys
12 | 
13 | def find_studies(path_to_data):  # returns a list of unique study paths within the dataset
14 |     dicom_root = plb.Path(path_to_data)
15 |     patient_dirs = list(dicom_root.glob('*'))
16 | 
17 |     study_dirs = []
18 | 
19 |     for dir in patient_dirs:
20 |         sub_dirs = list(dir.glob('*'))
21 |         #print(sub_dirs)
22 |         study_dirs.extend(sub_dirs)
23 |         
24 |         #dicom_dirs = dicom_dirs.append(dir.glob('*'))
25 |     return study_dirs
26 | 
27 | def nii_to_mha(nii_path, mha_out_path): # converts a .nii.gz file to .mha and saves to a specified path
28 |     img = sitk.ReadImage(nii_path)
29 |     sitk.WriteImage(img, mha_out_path, True)
30 | 
31 | 
32 | def convert_to_mha(study_dirs,path_to_mha_data): # main function converting the entire dataset from .nii.gz to .mha
33 |         
34 |     for study_dir in tqdm(study_dirs):
35 | 
36 |         patient = study_dir.parent.name
37 |         study   = study_dir.name
38 | 
39 |         suv_nii    = str(study_dir/'SUV.nii.gz')
40 |         ctres_nii  = str(study_dir/'CTres.nii.gz')
41 |         ct_nii     = str(study_dir/'CT.nii.gz')
42 |         pet_nii    = str(study_dir/'PET.nii.gz')
43 |         seg_nii    = str(study_dir/'SEG.nii.gz')
44 | 
45 |         suv_mha_dir    = os.path.join(path_to_mha_data, patient, study)
46 |         ctres_mha_dir  = os.path.join(path_to_mha_data, patient, study)
47 |         ct_mha_dir     = os.path.join(path_to_mha_data, patient, study)
48 |         pet_mha_dir    = os.path.join(path_to_mha_data, patient, study)
49 |         seg_mha_dir    = os.path.join(path_to_mha_data, patient, study)
50 | 
51 |         os.makedirs(suv_mha_dir  , exist_ok=True)
52 |         os.makedirs(ctres_mha_dir, exist_ok=True)
53 |         os.makedirs(ct_mha_dir   , exist_ok=True)
54 |         os.makedirs(pet_mha_dir  , exist_ok=True)
55 |         os.makedirs(seg_mha_dir  , exist_ok=True)
56 | 
57 |         nii_to_mha(suv_nii,   os.path.join(suv_mha_dir,'SUV.mha'))
58 |         nii_to_mha(ctres_nii, os.path.join(ctres_mha_dir,'CTres.mha'))
59 |         nii_to_mha(ct_nii,    os.path.join(ct_mha_dir,'CT.mha'))
60 |         nii_to_mha(pet_nii,   os.path.join(pet_mha_dir,'PET.mha'))
61 |         nii_to_mha(seg_nii,   os.path.join(seg_mha_dir,'SEG.mha') )     
62 | 
63 | 
64 | if __name__ == "__main__":
65 | 
66 |     path_to_nii_data = sys.argv[1] # path to nifti data e.g. .../nifti/FDG-PET-CT-Lesions/
67 |     path_to_mha_data = sys.argv[2] # output path for mha data ... /mha/FDG-PET-CT-Lesions/ (will be created if non existing)
68 |     study_dirs = find_studies(path_to_nii_data)
69 | 
70 |     convert_to_mha(study_dirs,path_to_mha_data)
71 | 


--------------------------------------------------------------------------------