├── LICENSE
├── download_abide_preproc_information.txt
├── README.md
└── download_abide_preprocessed_dataset.ipynb
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2023 Shawon Barman
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/download_abide_preproc_information.txt:
--------------------------------------------------------------------------------
1 | The download_abide_preproc.py script allows any user to download outputs from the ABIDE preprocessed data release. The user specifies the desired derivative, pipeline, and noise removal strategy of interest, and the script finds the data on FCP-INDI's S3 bucket, hosted by Amazon Web Services, and downloads the data to a local directory. The script also allows for phenotypic specifications for targeting only the participants whose information meets the desired criteria; these specifications include: diagnosis (either ASD, TDC, or both), an age range (e.g. participants between 2 and 30 years of age), sex (male or female), and site (location where the images were acquired). * Note the script only downloads images where the functional image's mean framewise displacement is less than 0.2.
2 |
3 | At a minimum, the script needs a specific derivative, pipeline, and strategy to search for.
4 | Acceptable derivatives include:
5 | - alff (Amplitude of low frequency fluctuations)
6 | - degree_binarize (Degree centrality with binarized weighting)
7 | - degree_weighted (Degree centrality with correlation weighting)
8 | - eigenvector_binarize (Eigenvector centrality with binarized weighting)
9 | - eigenvector_weighted (Eigenvector centrality with correlation weighting)
10 | - falff (Fractional ALFF)
11 | - func_mask (Functional data mask)
12 | - func_mean (Mean preprocessed functional image)
13 | - func_preproc (Preprocessed functional image)
14 | - lfcd (Local functional connectivity density)
15 | - reho (Regional homogeneity)
16 | - rois_aal (Timeseries extracted from the Automated Anatomical Labeling atlas)
17 | - rois_cc200 (" " from Cameron Craddock's 200 ROI parcellation atlas)
18 | - rois_cc400 (" " " 400 ROI parcellation atlas)
19 | - rois_dosenbach160 (" " from the Dosenbach160 atlas)
20 | - rois_ez (" " from the Eickhoff-Zilles atlas)
21 | - rois_ho (" " from the Harvard-Oxford atlas)
22 | - rois_tt (" " from the Talairach and Tournoux atlas)
23 | - vmhc (Voxel-mirrored homotopic connectivity)
24 |
25 | Acceptable pipelines include:
26 | - ccs
27 | - cpac
28 | - dparsf
29 | - niak
30 |
31 | Acceptable strategies include:
32 | - filt_global (band-pass filtering and global signal regression)
33 | - filt_noglobal (band-pass filtering only)
34 | - nofilt_global (global signal regression only)
35 | - nofilt_noglobal (neither)
36 |
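These three choices determine the S3 path of each downloaded file. Based on the notebook in this repository, downloads follow the pattern:

  https://s3.amazonaws.com/fcp-indi/data/Projects/ABIDE_Initiative/Outputs/<pipeline>/<strategy>/<derivative>/<FILE_ID>_<derivative><ext>

where <ext> is '.1D' for the rois_* derivatives and '.nii.gz' otherwise, and FILE_ID comes from the phenotypic file. For example, pipeline 'cpac', strategy 'filt_global', and derivative 'rois_cc200' with a hypothetical FILE_ID 'Pitt_0050003' would resolve to .../Outputs/cpac/filt_global/rois_cc200/Pitt_0050003_rois_cc200.1D
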
37 | For more information on the ABIDE preprocessed initiative, please check out http://preprocessed-connectomes-project.github.io/abide
38 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # How-to-download-ABIDE-Preprocessed-dataset-for-autism-detection
2 |
3 |
The ABIDE (Autism Brain Imaging Data Exchange) preprocessed dataset is a collection of neuroimaging data specifically curated for autism spectrum disorder (ASD) research. It comprises brain imaging data from individuals with ASD and typically developing controls. The dataset has been preprocessed to remove noise, correct for artifacts, and align the data across different imaging centers and acquisition protocols. This preprocessing makes the data more suitable for comparative and statistical analyses.
4 |
5 | The need for the ABIDE preprocessed dataset arises from the complexity of studying brain connectivity and functional differences in individuals with ASD. Analyzing raw neuroimaging data requires extensive preprocessing to ensure data quality and consistency, which can be challenging and time-consuming. By providing preprocessed data, researchers can focus on analyzing and interpreting the results of studies related to autism, rather than spending significant effort on preprocessing.
6 |
7 | The download_abide_preprocessed_dataset.ipynb notebook is designed to download preprocessed data from the ABIDE dataset for autism research. It targets a specific derivative (a type of processed data), preprocessing pipeline, and noise-removal strategy. The collect_and_download function reads participant information from the ABIDE phenotypic file and filters participants by diagnosis (ASD or typically developing controls) and by mean framewise displacement. For each participant who meets the criteria, the script constructs the download path and saves the data to a specified local directory. The derivative, pipeline, strategy, and diagnosis used for data collection are all customizable. Overall, it automates the retrieval of preprocessed brain imaging data to aid research related to autism spectrum disorder.
8 |
9 | Let's dive into more detailed explanations:
10 |
11 | Purpose and Dataset:
This script is designed to automate the process of downloading preprocessed brain imaging data from the ABIDE (Autism Brain Imaging Data Exchange) dataset. ABIDE provides valuable data for autism research, enabling scientists to investigate brain connectivity patterns and differences in individuals with autism spectrum disorder (ASD) compared to typically developing controls (TDC).
12 | Data Selection Criteria:
The script targets specific subsets of the dataset based on the following criteria (a short usage sketch follows the list):
13 |
14 | - 'derivative:' This refers to a specific measure calculated from the brain imaging data, such as 'rois_cc200' (region of interest connectivity based on the CC200 atlas).
15 | - 'pipeline:' It indicates the preprocessing pipeline used to prepare the data, for instance, 'cpac' (Configurable Pipeline for the Analysis of Connectomes).
16 | - 'strategy:' This represents the noise removal approach applied during preprocessing, like 'filt_global' (global signal regression with band-pass filtering).
17 | - 'diagnosis:' This parameter determines whether to focus on participants with ASD ('asd'), TDC ('tdc'), or both ('both').
18 |
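For orientation, here is a minimal sketch of how these choices are wired together; the variable names below are the ones the notebook itself uses, and collect_and_download is defined in download_abide_preprocessed_dataset.ipynb:

```python
# Minimal sketch using the notebook's own variables.
desired_derivative = 'rois_cc200'            # measure derived from the imaging data
desired_pipeline = 'cpac'                    # preprocessing pipeline
desired_strategy = 'filt_global'             # noise-removal strategy
download_data_dir = 'preprocessed_dataset'   # local folder for downloads
desired_diagnosis = 'both'                   # 'asd', 'tdc', or 'both'

collect_and_download(desired_derivative, desired_pipeline, desired_strategy,
                     download_data_dir, desired_diagnosis)
```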
19 |
20 | Data Retrieval Process:
The 'collect_and_download' function first fetches the phenotypic file from the ABIDE dataset's Amazon S3 storage. It parses the participant information it contains, extracting details such as site, age, sex, diagnosis, and preprocessing quality metrics like mean framewise displacement.
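
A small, self-contained sketch of this first step; the URL is the one hard-coded in the notebook, and the column names are taken from the phenotypic file's header:

```python
import urllib.request as request

s3_prefix = 'https://s3.amazonaws.com/fcp-indi/data/Projects/ABIDE_Initiative'
pheno_url = s3_prefix + '/Phenotypic_V1_0b_preprocessed1.csv'

# Read just the header row and locate the columns the notebook relies on
header = request.urlopen(pheno_url).readline().decode().split(',')
for col in ['SITE_ID', 'FILE_ID', 'AGE_AT_SCAN', 'SEX', 'DX_GROUP', 'func_mean_fd']:
    print(col, 'is column', header.index(col))
```
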
21 | Filtering Participants:
Using these details, the script filters participants who meet the specified criteria. It checks whether the diagnosis matches the desired diagnosis group and whether the mean framewise displacement is below a threshold (0.2). This helps ensure data quality.
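
Condensed from the notebook, the per-participant filter amounts to the following (in the phenotypic file, DX_GROUP is '1' for ASD and '2' for TDC):

```python
def keep_participant(row_dx, row_mean_fd, diagnosis, mean_fd_thresh=0.2):
    # Motion quality check: discard scans with high mean framewise displacement
    if row_mean_fd >= mean_fd_thresh:
        return False
    # Diagnosis check: '1' = ASD, '2' = TDC; 'both' keeps everyone
    if (diagnosis == 'asd' and row_dx != '1') or (diagnosis == 'tdc' and row_dx != '2'):
        return False
    return True
```
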
22 | File Download:
For participants that pass the filtering, the script constructs the download path based on the selected parameters (derivative, pipeline, strategy). It then downloads the corresponding preprocessed data file (in NIfTI or 1D format) from the S3 bucket. Downloaded files are saved in a local directory specified by 'download_data_dir'.
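
The path construction follows a fixed pattern; here is a sketch with a hypothetical FILE_ID ('Pitt_0050003' below is illustrative, real IDs come from the phenotypic file):

```python
import os

s3_prefix = 'https://s3.amazonaws.com/fcp-indi/data/Projects/ABIDE_Initiative'
file_id = 'Pitt_0050003'                      # hypothetical FILE_ID
derivative, extension = 'rois_cc200', '.1D'   # ROI derivatives use .1D, others .nii.gz

filename = file_id + '_' + derivative + extension
s3_path = '/'.join([s3_prefix, 'Outputs', 'cpac', 'filt_global', derivative, filename])
local_path = os.path.join('preprocessed_dataset', 'Outputs', 'cpac',
                          'filt_global', derivative, filename)
```
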
23 | Progress and Completion:
Throughout the download process, the script provides updates on the progress, indicating the percentage of completed downloads. Once all eligible files are downloaded, the script concludes with a "Done!" message.
24 |
25 |
26 | In essence, this script streamlines the process of collecting preprocessed brain imaging data from the ABIDE dataset, allowing researchers to efficiently gather data for their autism-related studies by customizing the data selection criteria.
27 |
28 | To run the code in a Jupyter Notebook, follow these steps:
29 |
30 | - Open Jupyter Notebook: Open your Jupyter Notebook environment. If you're using Anaconda, you can launch Jupyter Notebook from the Anaconda Navigator or by running the command jupyter notebook in your terminal/command prompt.
31 | - Navigate to the Notebook: Using the Jupyter Notebook interface, navigate to the directory where the download_abide_preprocessed_dataset.ipynb notebook is located.
32 | - Open the Notebook: Click on the download_abide_preprocessed_dataset.ipynb notebook to open it.
33 | - Run the Cells: The notebook is divided into cells. Click a cell to select it, then press Shift + Enter, or use the "Run" button in the toolbar, to execute it.
34 |
39 | - Monitor Progress: As you run the cells, the code will execute, and you'll see output messages and potential progress updates, including information about downloaded files.
40 | - Review Output: After running the cells, review the output to ensure that the code executed as expected. You should see messages indicating the progress of downloading and the "Done!" message at the end if everything completed successfully.
41 | - Check Local Directory: Open the specified download_data_dir directory in your file explorer to see the downloaded files.
42 |
--------------------------------------------------------------------------------
/download_abide_preprocessed_dataset.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": null,
6 | "id": "bff35a2d",
7 | "metadata": {},
8 | "outputs": [],
9 | "source": [
10 | "# Main collect and download function\n",
11 | "def collect_and_download(derivative, pipeline, strategy, out_dir, diagnosis):\n",
12 | " import os\n",
13 | " import urllib.request as request\n",
14 | "\n",
15 | " # Init variables\n",
16 | " mean_fd_thresh = 0.2\n",
17 | " s3_prefix = 'https://s3.amazonaws.com/fcp-indi/data/Projects/ABIDE_Initiative'\n",
18 | " s3_pheno_path = '/'.join([s3_prefix, 'Phenotypic_V1_0b_preprocessed1.csv'])\n",
19 | "\n",
20 | " derivative = derivative.lower()\n",
21 | " pipeline = pipeline.lower()\n",
22 | " strategy = strategy.lower()\n",
23 | "\n",
24 | " # Check derivative for extension\n",
25 | " if 'roi' in derivative:\n",
26 | " extension = '.1D'\n",
27 | " else:\n",
28 | " extension = '.nii.gz'\n",
29 | "\n",
30 | " if not os.path.exists(out_dir):\n",
31 | " print('Could not find {0}, creating now...'.format(out_dir))\n",
32 | " os.makedirs(out_dir)\n",
33 | "\n",
34 | " s3_pheno_file = request.urlopen(s3_pheno_path)\n",
35 | " pheno_list = s3_pheno_file.readlines()\n",
36 | "\n",
37 | " header = pheno_list[0].decode().split(',')\n",
38 | " try:\n",
39 | " site_idx = header.index('SITE_ID')\n",
40 | " file_idx = header.index('FILE_ID')\n",
41 | " age_idx = header.index('AGE_AT_SCAN')\n",
42 | " sex_idx = header.index('SEX')\n",
43 | " dx_idx = header.index('DX_GROUP')\n",
44 | " mean_fd_idx = header.index('func_mean_fd')\n",
45 | " except Exception as exc:\n",
46 | " err_msg = 'Unable to extract header information from the pheno file...'\n",
47 | " raise Exception(err_msg)\n",
48 | "\n",
49 | " s3_paths = []\n",
50 | " for pheno_row in pheno_list[1:]:\n",
51 | " cs_row = pheno_row.decode().split(',')\n",
52 | "\n",
53 | " try:\n",
54 | " row_file_id = cs_row[file_idx]\n",
55 | " row_site = cs_row[site_idx]\n",
56 | " row_age = float(cs_row[age_idx])\n",
57 | " row_sex = cs_row[sex_idx]\n",
58 | " row_dx = cs_row[dx_idx]\n",
59 | " row_mean_fd = float(cs_row[mean_fd_idx])\n",
60 | " except Exception as e:\n",
61 | " continue\n",
62 | "\n",
63 | " if row_file_id == 'no_filename':\n",
64 | " continue\n",
65 | " if row_mean_fd >= mean_fd_thresh:\n",
66 | " continue\n",
67 | "\n",
68 | " if (diagnosis == 'asd' and row_dx != '1') or (diagnosis == 'tdc' and row_dx != '2'):\n",
69 | " continue\n",
70 | "\n",
71 | " filename = row_file_id + '_' + derivative + extension\n",
72 | " s3_path = '/'.join([s3_prefix, 'Outputs', pipeline, strategy, derivative, filename])\n",
73 | " s3_paths.append(s3_path)\n",
74 | "\n",
75 | " total_num_files = len(s3_paths)\n",
76 | " for path_idx, s3_path in enumerate(s3_paths):\n",
77 | " rel_path = s3_path.lstrip(s3_prefix)\n",
78 | " download_file = os.path.join(out_dir, rel_path)\n",
79 | " download_dir = os.path.dirname(download_file)\n",
80 | " if not os.path.exists(download_dir):\n",
81 | " os.makedirs(download_dir)\n",
82 | " try:\n",
83 | " if not os.path.exists(download_file):\n",
84 | " print('Retrieving: {0}'.format(download_file))\n",
85 | " request.urlretrieve(s3_path, download_file)\n",
86 | " print('{0:.3f}% percent complete'.format(100*(float(path_idx+1)/total_num_files)))\n",
87 | " else:\n",
88 | " print('File {0} already exists, skipping...'.format(download_file))\n",
89 | " except Exception as exc:\n",
90 | " print('There was a problem downloading {0}.\\n Check input arguments and try again.'.format(s3_path))\n",
91 | "\n",
92 | " print('Done!')\n",
93 | "\n",
94 | "# pipelines = [\"ccs\", \"cpac\", \"dparsf\", \"niak\"]\n",
95 | "# strategies = [\"filt_global\", \"filt_noglobal\", \"nofilt_global\", \"nofilt_noglobal\"]\n",
96 | "# derivatives = [\"alff\", \"degree_binarize\", \"degree_weighted\", \"dual_regression\", \"eigenvector_binarize\", \"eigenvector_weighted\", \"falff\", \"func_mask\", \"func_mean\", \"func_preproc\", \"lfcd\", \"reho\", \"rois_aal\", \"rois_cc200\", \"rois_cc400\", \"rois_dosenbach160\", \"rois_ez\", \"rois_ho\", \"rois_tt\", \"vmhc\"]\n",
97 | "# extensions = [\"1D\", \"nii.gz\"]\n",
98 | " \n",
99 | "# Variables to specify download settings (modify these values as needed)\n",
100 | "desired_derivative = 'rois_cc200' # Derivative of interest (e.g. 'reho')\n",
101 | "desired_pipeline = 'cpac' # Pipeline used to preprocess the data (e.g. 'cpac') \n",
102 | "desired_strategy = 'filt_global' # Noise-removal strategy used during preprocessing\n",
103 | "download_data_dir = 'preprocessed_dataset' # Path to local folder to download files to\n",
104 | "desired_diagnosis = 'both' # 'asd', 'tdc', or 'both' corresponding to the diagnosis of the participants for whom data should be downloaded\n",
105 | "\n",
106 | "# Call the collect and download routine\n",
107 | "collect_and_download(desired_derivative, desired_pipeline, desired_strategy, download_data_dir, desired_diagnosis)"
108 | ]
109 | },
110 | {
111 | "cell_type": "markdown",
112 | "id": "f3df6ec3",
113 | "metadata": {},
114 | "source": [
115 | "## Another way but you need to connect csv file so that get file id"
116 | ]
117 | },
118 | {
119 | "cell_type": "code",
120 | "execution_count": null,
121 | "id": "c5f28928",
122 | "metadata": {},
123 | "outputs": [],
124 | "source": [
125 | "import os\n",
126 | "import requests\n",
127 | "\n",
128 | "# Base URL for the dataset\n",
129 | "base_url = \"https://s3.amazonaws.com/fcp-indi/data/Projects/ABIDE_Initiative/Outputs\"\n",
130 | "\n",
131 | "# Function to construct the download URL\n",
132 | "def construct_url(pipeline, strategy, derivative, file_id, ext):\n",
133 | " return f\"{base_url}/{pipeline}/{strategy}/{derivative}/{file_id}_{derivative}.{ext}\"\n",
134 | "\n",
135 | "# Function to download a file from a given URL\n",
136 | "def download_file(url, destination):\n",
137 | " response = requests.get(url, stream=True)\n",
138 | " with open(destination, \"wb\") as file:\n",
139 | " for chunk in response.iter_content(chunk_size=8192):\n",
140 | " file.write(chunk)\n",
141 | "\n",
142 | "# Display options and get user choice\n",
143 | "def get_user_choice(options):\n",
144 | " for i, option in enumerate(options, start=1):\n",
145 | " print(f\"{i}. {option}\")\n",
146 | " choice = input(\"Enter your choice: \")\n",
147 | " return choice\n",
148 | "\n",
149 | "# Available options\n",
150 | "pipelines = [\"ccs\", \"cpac\", \"dparsf\", \"niak\"]\n",
151 | "strategies = [\"filt_global\", \"filt_noglobal\", \"nofilt_global\", \"nofilt_noglobal\"]\n",
152 | "derivatives = [\"alff\", \"degree_binarize\", \"degree_weighted\", \"dual_regression\", \"eigenvector_binarize\", \"eigenvector_weighted\", \"falff\", \"func_mask\", \"func_mean\", \"func_preproc\", \"lfcd\", \"reho\", \"rois_aal\", \"rois_cc200\", \"rois_cc400\", \"rois_dosenbach160\", \"rois_ez\", \"rois_ho\", \"rois_tt\", \"vmhc\"]\n",
153 | "extensions = [\"1D\", \"nii.gz\"]\n",
154 | "\n",
155 | "# Get user choices\n",
156 | "pipeline_choice = int(get_user_choice(pipelines))\n",
157 | "strategy_choice = int(get_user_choice(strategies))\n",
158 | "derivative_choice = int(get_user_choice(derivatives))\n",
159 | "extension_choice = int(get_user_choice(extensions))\n",
160 | "\n",
161 | "file_id = input(\"Enter FILE_ID value: \")\n",
162 | "\n",
163 | "# Get the chosen options\n",
164 | "chosen_pipeline = pipelines[pipeline_choice - 1]\n",
165 | "chosen_strategy = strategies[strategy_choice - 1]\n",
166 | "chosen_derivative = derivatives[derivative_choice - 1]\n",
167 | "chosen_extension = extensions[extension_choice - 1]\n",
168 | "\n",
169 | "# Construct the download URL\n",
170 | "url = construct_url(chosen_pipeline, chosen_strategy, chosen_derivative, file_id, chosen_extension)\n",
171 | "\n",
172 | "# Specify the destination directory\n",
173 | "destination_dir = \"downloaded_data\"\n",
174 | "os.makedirs(destination_dir, exist_ok=True)\n",
175 | "\n",
176 | "# Specify the destination file path\n",
177 | "destination_file = os.path.join(destination_dir, f\"{file_id}_{chosen_derivative}.{chosen_extension}\")\n",
178 | "\n",
179 | "# Download the file\n",
180 | "download_file(url, destination_file)\n",
181 | "\n",
182 | "print(\"File downloaded successfully!\")"
183 | ]
184 | }
185 | ],
186 | "metadata": {
187 | "kernelspec": {
188 | "display_name": "Python 3 (ipykernel)",
189 | "language": "python",
190 | "name": "python3"
191 | },
192 | "language_info": {
193 | "codemirror_mode": {
194 | "name": "ipython",
195 | "version": 3
196 | },
197 | "file_extension": ".py",
198 | "mimetype": "text/x-python",
199 | "name": "python",
200 | "nbconvert_exporter": "python",
201 | "pygments_lexer": "ipython3",
202 | "version": "3.9.17"
203 | }
204 | },
205 | "nbformat": 4,
206 | "nbformat_minor": 5
207 | }
208 |
--------------------------------------------------------------------------------