├── 01_start_here.ipynb ├── 02_reshape_train_csv.ipynb ├── 03_train_basic_model.ipynb ├── 07_fastai_v2.ipynb ├── 08_CAM_binary_classifier.ipynb └── README.md /01_start_here.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Downloading the data" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Make sure you have the Kaggle CLI installed. You can find the directions for how to set it up [here](https://github.com/Kaggle/kaggle-api)." 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 1, 20 | "metadata": {}, 21 | "outputs": [], 22 | "source": [ 23 | "!mkdir -p data\n", 24 | "!mkdir -p data/raw_data" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "Downloading the dataset can take a while... it's over 150GB!\n", 32 | "\n", 33 | "The following will download the dataset for you but it is going to be extremely slow (for most competitions using the Kaggle CLI works really well - here they set it up so that files are downloaded one by one?!).\n", 34 | "\n", 35 | "What I would recommend is to download all the files in a zip archive using the `Download All` button from the competition website. Once done, extract all the files into `data/raw_data` and we can take it from there." 36 | ] 37 | }, 38 | { 39 | "cell_type": "code", 40 | "execution_count": 2, 41 | "metadata": {}, 42 | "outputs": [], 43 | "source": [ 44 | "# !kaggle competitions download -c rsna-intracranial-hemorrhage-detection -p data/raw_data" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": {}, 50 | "source": [ 51 | "Let's also download a CSV file with train labels and the sample submission." 
52 | ] 53 | }, 54 | { 55 | "cell_type": "code", 56 | "execution_count": 3, 57 | "metadata": {}, 58 | "outputs": [ 59 | { 60 | "name": "stdout", 61 | "output_type": "stream", 62 | "text": [ 63 | "stage_1_sample_submission.csv.zip: Skipping, found more recently modified local copy (use --force to force download)\n", 64 | "stage_1_train.csv.zip: Skipping, found more recently modified local copy (use --force to force download)\n" 65 | ] 66 | } 67 | ], 68 | "source": [ 69 | "!cd data && kaggle competitions download -c rsna-intracranial-hemorrhage-detection -f stage_1_sample_submission.csv\n", 70 | "!cd data && kaggle competitions download -c rsna-intracranial-hemorrhage-detection -f stage_1_train.csv " 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "## Preprocessing the data" 78 | ] 79 | }, 80 | { 81 | "cell_type": "markdown", 82 | "metadata": {}, 83 | "source": [ 84 | "Images are saved in DICOM format. Among other things, we will need pydicom to read the data.\n", 85 | "\n", 86 | "The DICOM format has a couple of gotchas - this [kernel](https://www.kaggle.com/omission/eda-view-dicom-images-with-correct-windowing) on Kaggle is a good starting point.\n", 87 | "\n", 88 | "We will iterate over all the images in `raw_data`, process them slightly and save them as PNG files.\n", 89 | "\n", 90 | "For the window center and width values we will use the brain window values that are commonly used for visualizing [intracranial hemorrhages](https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/discussion/109328#latest-629856)." 
91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": 4, 96 | "metadata": {}, 97 | "outputs": [], 98 | "source": [ 99 | "%matplotlib inline\n", 100 | "\n", 101 | "import PIL\n", 102 | "import pydicom\n", 103 | "import numpy as np\n", 104 | "from pathlib import Path\n", 105 | "from matplotlib import pyplot as plt\n", 106 | "\n", 107 | "import torch\n", 108 | "import fastai\n", 109 | "from fastai.core import parallel" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "execution_count": 5, 115 | "metadata": {}, 116 | "outputs": [ 117 | { 118 | "data": { 119 | "text/plain": [ 120 | "('1.2.0', '1.0.58.dev0')" 121 | ] 122 | }, 123 | "execution_count": 5, 124 | "metadata": {}, 125 | "output_type": "execute_result" 126 | } 127 | ], 128 | "source": [ 129 | "torch.__version__, fastai.__version__" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": 6, 135 | "metadata": {}, 136 | "outputs": [], 137 | "source": [ 138 | "window_center = 40\n", 139 | "window_width = 80" 140 | ] 141 | }, 142 | { 143 | "cell_type": "code", 144 | "execution_count": 7, 145 | "metadata": {}, 146 | "outputs": [], 147 | "source": [ 148 | "paths = Path('data/raw_data/stage_1_train_images/')\n", 149 | "path = list(paths.iterdir())[0]" 150 | ] 151 | }, 152 | { 153 | "cell_type": "code", 154 | "execution_count": 8, 155 | "metadata": {}, 156 | "outputs": [], 157 | "source": [ 158 | "im = pydicom.read_file(str(path))" 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": 9, 164 | "metadata": {}, 165 | "outputs": [], 166 | "source": [ 167 | "def window_and_normalize(im):\n", 168 | " rescaled = im.pixel_array * float(im.RescaleSlope) + float(im.RescaleIntercept)\n", 169 | " windowed = rescaled.clip(min=window_center-window_width, max=window_center+window_width)\n", 170 | "\n", 171 | " return (windowed + np.negative(window_center-window_width)) / (window_width * 2 * 1/255)" 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | 
"execution_count": 10, 177 | "metadata": {}, 178 | "outputs": [ 179 | { 180 | "data": { 181 | "text/plain": [ 182 | "" 183 | ] 184 | }, 185 | "execution_count": 10, 186 | "metadata": {}, 187 | "output_type": "execute_result" 188 | }, 189 | { 190 | "data": { 191 | "image/png": "\n", 192 | "text/plain": [ 193 | "
" 194 | ] 195 | }, 196 | "metadata": { 197 | "needs_background": "light" 198 | }, 199 | "output_type": "display_data" 200 | } 201 | ], 202 | "source": [ 203 | "plt.imshow(window_and_normalize(im), cmap=plt.cm.bone)" 204 | ] 205 | }, 206 | { 207 | "cell_type": "markdown", 208 | "metadata": {}, 209 | "source": [ 210 | "Let's read the data, process it, resize and save to disk." 211 | ] 212 | }, 213 | { 214 | "cell_type": "code", 215 | "execution_count": 11, 216 | "metadata": {}, 217 | "outputs": [], 218 | "source": [ 219 | "!mkdir -p data/112\n", 220 | "!mkdir -p data/112/train \n", 221 | "!mkdir -p data/112/test " 222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": 14, 227 | "metadata": {}, 228 | "outputs": [], 229 | "source": [ 230 | "def resize(src, dst, sz):\n", 231 | " im = pydicom.read_file(str(src))\n", 232 | " ary = window_and_normalize(im)\n", 233 | " im = PIL.Image.fromarray(ary.astype(np.uint8), mode='L')\n", 234 | " im.resize((sz,sz), resample=PIL.Image.BICUBIC).save(f'{dst}/{src.stem}.png')" 235 | ] 236 | }, 237 | { 238 | "cell_type": "code", 239 | "execution_count": 13, 240 | "metadata": {}, 241 | "outputs": [], 242 | "source": [ 243 | "def resize_112(path, _): resize(path, 'data/112/train', 112)" 244 | ] 245 | }, 246 | { 247 | "cell_type": "code", 248 | "execution_count": 14, 249 | "metadata": {}, 250 | "outputs": [ 251 | { 252 | "ename": "ValueError", 253 | "evalue": "The length of the pixel data in the dataset (153710 bytes) doesn't match the expected length (524288 bytes). 
The dataset may be corrupted or there may be an issue with the pixel data handler.", 254 | "output_type": "error", 255 | "traceback": [ 256 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 257 | "\u001b[0;31m_RemoteTraceback\u001b[0m Traceback (most recent call last)", 258 | "\u001b[0;31m_RemoteTraceback\u001b[0m: \n\"\"\"\nTraceback (most recent call last):\n File \"/home/radek/anaconda3/envs/fastai/lib/python3.7/concurrent/futures/process.py\", line 232, in _process_worker\n r = call_item.fn(*call_item.args, **call_item.kwargs)\n File \"\", line 1, in resize_112\n def resize_112(path, _): resize(path, 'data/112', 112)\n File \"\", line 3, in resize\n ary = window_and_normalize(im)\n File \"\", line 2, in window_and_normalize\n rescaled = im.pixel_array * float(im.RescaleSlope) + float(im.RescaleIntercept)\n File \"/home/radek/anaconda3/envs/fastai/lib/python3.7/site-packages/pydicom/dataset.py\", line 1362, in pixel_array\n self.convert_pixel_data()\n File \"/home/radek/anaconda3/envs/fastai/lib/python3.7/site-packages/pydicom/dataset.py\", line 1308, in convert_pixel_data\n raise last_exception\n File \"/home/radek/anaconda3/envs/fastai/lib/python3.7/site-packages/pydicom/dataset.py\", line 1276, in convert_pixel_data\n arr = handler.get_pixeldata(self)\n File \"/home/radek/anaconda3/envs/fastai/lib/python3.7/site-packages/pydicom/pixel_data_handlers/numpy_handler.py\", line 257, in get_pixeldata\n .format(actual_length, padded_expected_len)\nValueError: The length of the pixel data in the dataset (153710 bytes) doesn't match the expected length (524288 bytes). 
The dataset may be corrupted or there may be an issue with the pixel data handler.\n\"\"\"", 259 | "\nThe above exception was the direct cause of the following exception:\n", 260 | "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", 261 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mparallel\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mresize_112\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlist\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpaths\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0miterdir\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmax_workers\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m12\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 262 | "\u001b[0;32m~/work/fastai/fastai/core.py\u001b[0m in \u001b[0;36mparallel\u001b[0;34m(func, arr, max_workers, leave)\u001b[0m\n\u001b[1;32m 353\u001b[0m \u001b[0mresults\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 354\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mf\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mprogress_bar\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mconcurrent\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfutures\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mas_completed\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfutures\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtotal\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0marr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mleave\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mleave\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 355\u001b[0;31m 
\u001b[0mresults\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mresult\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 356\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0many\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mo\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0;32mNone\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mo\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mresults\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mresults\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 357\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", 263 | "\u001b[0;32m~/anaconda3/envs/fastai/lib/python3.7/concurrent/futures/_base.py\u001b[0m in \u001b[0;36mresult\u001b[0;34m(self, timeout)\u001b[0m\n\u001b[1;32m 423\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mCancelledError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 424\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_state\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0mFINISHED\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 425\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__get_result\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 426\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 427\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_condition\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mwait\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtimeout\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 264 | 
"\u001b[0;32m~/anaconda3/envs/fastai/lib/python3.7/concurrent/futures/_base.py\u001b[0m in \u001b[0;36m__get_result\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 382\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__get_result\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 383\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_exception\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 384\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_exception\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 385\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 386\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_result\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 265 | "\u001b[0;31mValueError\u001b[0m: The length of the pixel data in the dataset (153710 bytes) doesn't match the expected length (524288 bytes). The dataset may be corrupted or there may be an issue with the pixel data handler." 266 | ] 267 | } 268 | ], 269 | "source": [ 270 | "parallel(resize_112, list(paths.iterdir()), max_workers=12)" 271 | ] 272 | }, 273 | { 274 | "cell_type": "markdown", 275 | "metadata": {}, 276 | "source": [ 277 | "Something, somewhere, went wrong :). Looks like one of the files is damaged.\n", 278 | "\n", 279 | "Consulting the forums seems we have the [culprit](https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/discussion/109476#latest-629906)." 
280 | ] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "execution_count": 20, 285 | "metadata": {}, 286 | "outputs": [ 287 | { 288 | "name": "stdout", 289 | "output_type": "stream", 290 | "text": [ 291 | "ls: cannot access 'data/112/ID_6431af929.png': No such file or directory\r\n" 292 | ] 293 | } 294 | ], 295 | "source": [ 296 | "ls data/112/train/ID_6431af929.png" 297 | ] 298 | }, 299 | { 300 | "cell_type": "code", 301 | "execution_count": 35, 302 | "metadata": {}, 303 | "outputs": [ 304 | { 305 | "data": { 306 | "text/html": [ 307 | "
\n", 308 | "\n", 321 | "\n", 322 | " \n", 323 | " \n", 324 | " \n", 325 | " \n", 326 | " \n", 327 | " \n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | " \n", 332 | " \n", 333 | " \n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | " \n", 338 | " \n", 339 | " \n", 340 | " \n", 341 | " \n", 342 | " \n", 343 | " \n", 344 | " \n", 345 | " \n", 346 | " \n", 347 | " \n", 348 | " \n", 349 | " \n", 350 | " \n", 351 | " \n", 352 | " \n", 353 | " \n", 354 | " \n", 355 | " \n", 356 | " \n", 357 | " \n", 358 | " \n", 359 | " \n", 360 | " \n", 361 | "
IDLabel
3872988ID_6431af929_epidural0
3872989ID_6431af929_intraparenchymal0
3872990ID_6431af929_intraventricular0
3872991ID_6431af929_subarachnoid0
3872992ID_6431af929_subdural0
3872993ID_6431af929_any0
\n", 362 | "
" 363 | ], 364 | "text/plain": [ 365 | " ID Label\n", 366 | "3872988 ID_6431af929_epidural 0\n", 367 | "3872989 ID_6431af929_intraparenchymal 0\n", 368 | "3872990 ID_6431af929_intraventricular 0\n", 369 | "3872991 ID_6431af929_subarachnoid 0\n", 370 | "3872992 ID_6431af929_subdural 0\n", 371 | "3872993 ID_6431af929_any 0" 372 | ] 373 | }, 374 | "execution_count": 35, 375 | "metadata": {}, 376 | "output_type": "execute_result" 377 | } 378 | ], 379 | "source": [ 380 | "import pandas as pd\n", 381 | "\n", 382 | "df = pd.read_csv('data/stage_1_train.csv.zip')\n", 383 | "df[df.ID.str.match('ID_6431af929')]" 384 | ] 385 | }, 386 | { 387 | "cell_type": "markdown", 388 | "metadata": {}, 389 | "source": [ 390 | "The file is conveniently in the train set. Let's drop it and call it a success." 391 | ] 392 | }, 393 | { 394 | "cell_type": "code", 395 | "execution_count": 36, 396 | "metadata": {}, 397 | "outputs": [], 398 | "source": [ 399 | "df = df[~df.ID.str.match('ID_6431af929')]\n", 400 | "df.to_csv('data/stage_1_train_with_one_image_dropped.csv', index=False)" 401 | ] 402 | }, 403 | { 404 | "cell_type": "code", 405 | "execution_count": 37, 406 | "metadata": {}, 407 | "outputs": [ 408 | { 409 | "name": "stdout", 410 | "output_type": "stream", 411 | "text": [ 412 | "674258\r\n" 413 | ] 414 | } 415 | ], 416 | "source": [ 417 | "!ls data/112 -l | wc -l" 418 | ] 419 | }, 420 | { 421 | "cell_type": "code", 422 | "execution_count": 38, 423 | "metadata": {}, 424 | "outputs": [ 425 | { 426 | "data": { 427 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAHAAAABwCAAAAADji6uXAAAO+0lEQVR4nO1aaXxT15X/v/ckPW3Wbsnyjm2MzW7MZgIkQAkJBBqStFlKEyC0aZZmss2kM+20/SVtM01L10zS6aRJmoakoSUBfhDKEiAEGGwMYbGxscEbtmxrl21J70lP784H2dKTbGND+60+n/Tuvef+7znnnuXeK2CCJmiCJmiCJmiCJuiflhg1c5Oc1A0OZymlUSe3lSkPnCY3CXkjpC9e9d+NgYhISGTn+tybmuJGJJTRJfc/OHnoK/C3T3qaXVDplGJ/X9/4JxnnOJVOps2i1z1NJ1r097deou26HGs0Evb2tnP/SECVOX+Syt2nMj1Mg3y08sM3Ne/ZiAyttWCX3hJwRmL5Weaz40QcByCrpv3uNpknSuWGQX4zXbfHk2f98xs7zSqZUFVxgc9UtnfOnBpq4McFOLYNjaxTNGZbrrUAit9+c+89T+XPiDRF/tD4nZceP6Jb1d7DUMJATDst3Hw++A8BtPIBZUW5QhSFzraQ+b3zDwMUgTqEytfus69R1cfogDvMCobS7ub2gXEA0mP0GxE2LDecO9vQl79+EXvxfStw78E73m5eCu9FtxgOUAOcJteeIecpmlpiGwfgGDbUmstl5nA9rStg6q5miNg7BdQTy1cAJvidvOgVQjIxQqJyrabA9JWyfT+M/L2Apu/d/q7iuFVm7A+Ge3LVaAurY3lAqL0VQQ65HtplUAaJ0B/Tcituw7T+n4pjAY6h0vWPiOaoqsDsbPdE4QQ9oJ4T+f61YNfnYUTfB+fqzeKg0LC8KV9WDsj/be1YJhoDsOpZIIvN6LvaFwsJAy7FJLpaR3buPR+46ABpQRNdQ3IYtYYuKGfUZgCG56rGQrxuv35jARiZ0NEV4gmtEq2YY/3jIXC/PtioZQBQzlBkZ5M2W11MHfZVyhB2h2STZowBeD0bUoVzAL3As7QStIElplPzdD0Amj5e4yQAWJmHip44pWWi/fMqFztbnf3y458s83XcNKDcVAioiq6UOR0cYxU1x8MhDQCYMo/bVH0AJ3MTIBYAMPv+rmPhoEJ7wXt0Tth1PcDrqTS7wgigeKCzV28pyZJ91gYPncsAno7PXTwACP0AoAC75Xu6U7S6uZftYzzVWkZ+cxKq51sZAKzlyhmPRe0nMsBrL9oVAwmjO779KQJajChffpbpVIpCd4+xM0b3DQDK0SP5dSTMjzSg5ayDe3TmVNFxxe3lAF/+ngiAa+RyPAGSRVMNkP30BSbaIPDBDdkdM0AgilHN6AVIUkLa3pXSQ6mO3dLY0X1Bm/N1nW4fBxIAehVpGll526Ox3z0A0CTAhMxPnxU/EwAgnJWcS111WFqMSBLqD1OlpZq8xq4gFeVqqrf86vtzAQD9hWYAgOGZQTOp5lcceAAAs4hEM8LuhcvVAIAQlIl5Cn6WYtIkCFOkSukQg5D3Or/wyrV1Nfkv/vlbAMpvX2YFAEz55rflWRpg1b+4Xp8PANCusspJb3tRJhgagDdZ8Ng0KdVWElCmk6pLTgO4EGnb3qjS4EQvXfyLN8uhXNVWA4BRnJ+jW2LUYtkvqFzrIIP9Sza3o0c+C6KNBcJKNqGpFL1JN410IZQmBqCly/idMh8UghxQPXrqD8IrOc/oGYhRjvvJiYZeKOxf8yZY2Ao2SPqXUmRyJYBgRmKqgdjIgIIoWYucJgB8jszpmd09rpUmANBt/r+n9v/yZDkIASI8gHZvhT7Jk1FEKyI6oPxpAFzCPqw/JYNIAAVJs5oHACWj57QGftGswVbND+iX6uvjfAYABcXfVUiYqqwcVaTNumuNHeATs5n6pXij+aFWAACZ2VzXq5q0Nin6lqWfxDUvWgzALalMiq/muLRZD6/U5gFcAkafWqFLNgot8VYxClBEZRGj3MA9AHCVK9BQgOW20C4fAKBFXahanLZMuY2hnlzPQgGICT1qU4NAEpDIJepREADoc+T
JbLmlAETZqYvZi2QAVv/XY/ElDeRun54yFaEw2aKYA0ANyRZkoimjJCqVArIMQMD3CVTsFgrAyYvloVC88EzU+o1peYgAMGoAQA/EkmZIPQYkAQktUW884ItNgrJcCwDuS4VrVmgAAJVDWyi7Ms43JEtiKm89EE2UU1rfKICxqASQk4OSAx1qJi/OdY6xDUaojld18R8vDVaFfKrOgHNes55O2NDqTemUAMakgApAASiUU+MKscrqh7p0y38AALh/c/ybUqanP3Hm3JLk1jSl+L3UD2NSn7KBiJi3KlsbD1Ely6qdQLiWB/Jlz/5vBYWKraMV7dGaFdO5JGDaciQSCpKuoBXg1z5hLFLG9aYu4i+AdEMBTwTUlupjB4/mAOB9D218+Lf1IQCAa/BsUdfI+qSZLrVUlbiFtGoOySlCFm/0y0sHxV7Cun3GoiLwf6nKzoR80APZvuPX6D+pcsu/dSfAxW0ba3ftP+2X4KWecSR2i0h0JBBNkBx73oAhr6UXCjSqA186FiwJSIY1LXtXRLi5uXGaX18Qb2oJlvZvdHScDQ0NkoCnAkqNS3j1AI40lUtHCltffazSv1qjSbbR+2HrRfYmZsGP3jGtz9c8ZOl0nfzgNMoKZ3e1Do5RjVrE7Vkg+ZhrzWUXXCQSql3EvO4nw+hPyl87yfHwiwCA0sV5xel76dPlo0mYQoqV2jp1nSR2XV1PvmEUhg+kX3gaqBowrHECHU1NACAXJPGayhjOM5KE+Xb1lHXP7orFxfDsvttMHR8uHiFEEAkhRHQRQkjPO4sAmEqk4Vp+6dbRJJQmYDgEU/CKaodtAYDmX+3uBOZWjbjM+ORhIwDYHnlk3746l9/olkgojLpL/VIXFSHOK5Nzrx8t2t3a4AUAl9uKUUhwHabuZikawsU773RuslT9koN2ea0DAKhwIGWsBLBbak8xY4XtQIt/0BhFhf5LkzUpjC7OkqginOH5A2xLRhYOfPXle30/+WuV+KGwpuh8XMJI6slfAhKWhjaYru4ZugfJ+s+v6WO1ZSqA9Onieic73wqv+PpQLWhjAEwGSM59L71a+pl8a01pzNHRCQCg02oKySeTAth+bhBPvfrYE3owC/S9Xi40lOaObPOyHW98OpgnmN54QUHNWuUvfBtWUn/us1ZF3LosO2qJIaqRRlSB0rt5c/HgorQuIXNQra5Dk3IZIfzFQPY8AECroSzesfr5B4rABBEQGprioVKD1IONBNCpTQcsWeko/HFCBxkJj+p8XauIRlgSrDXMlgPAwqEe/c8Bflu3LIrYYODS8CFISaLS3mFBoHyqZ9kIZV30jVYdMeQouSsObdrtU7QpioZu88ykJzLh1KsUCQgZduNqbty4LvEhgho0oOAJenSZQVoGMF5DCkf1+7OWt9FKpTy9CkjQKKV+HE97eW7ySyRDgYEqtjp5F21g86fkKFNZdorX9ooFPZ1Ju2Wk1sEpNkx1NDCWg/ONI41kKxTXov2gbZkim7bR1rk6B7z1F+zJpGvrHRXQJbkqs4NX9DXnlQdy04N/SElTuU55Z1DkVIysbmaiIyKj3B5NIP/yqY9lnuR4bdrdlASQkuzSzLBLFOmMK3tVxWmAfSyQW8eio5VDrs44dCqL7G8354l9vMrk3+ab1pAcr2oZFTAkcfxOkwCIbodLucG868zj9mRPFgCl/hJBQDC4g8YhG+45Snl8ETFby16qL3JJcnleWyqgZNN4JR7KBwAgfK2j4+r27758Kk1K2WxZL+SKaM+VwWI1VnfOSdo+3OrLpC6/ZTZJzabpSuNN/oxIukJBAOgPd2U1+h/3s40lqT6aufZn3XLW1T//DgBAm6MhRHnOXJ5ZRhyvCKt3S4darwN4Ovl78ExGDIcrZ2UE2vffWpJjlrLN2PS2zxssfc4GiO6e5r59DlO23L6M97zWsWGfNLQo9aMDIs1jAHRV9ByeykHUXY0yuqBTlZfoWdawz5D779MgBk+6gqTJbSqJhu6
ZzH38xSM1KQi5yrRYdP0LWh728w2TDISrzrxs4ZsCdy3OGQxa7kuTS8/7P2lV5alsLW3KzRFn7VeKI++fuKvhQsoMZb1pddAYN8ItFfvf3lTMFvSe7MiiMvxbG1cVKMMMabn80cGZ3ZEt7XPXiGruClPEUvLKKScO1leq0jbYwra0GccAdMrsXcfKYuoV9gsGO03ortMOpUJZ9/M2aAfO+cIlS9iQv5to3JySNRz8I28t+yhtgnnv3Bgg6pd+eEq1wi5M9jnKLcZwrMlTSnm7shyRgSYAme6wSZvBOy0qkA+OxXKr9oVT2fVTrqRNOOa7xbru01jxFM8LLth5KtbX2iPLsiqbD3iCIkVgWqiz+KPR/ALy3lmU3LGjO4371h1TnTcmIQ5tEs98GnvQGCgKUpY+xmAJt7baLCGd5bwI+2xLviDntCxzddc1FM37wJPOfXu//0YlRPZjR4/A/uVp+SRIDfQ7nAMZegdt6qp2gd5QBUrDtyqVjdv7MWXKp8PegpjaUNrVynjeD3OeO7dNhGnJWp0zEO6PwMhYOVFzaDejelJFZKQpZ9JHhwTMm/fu8JegsrrtD6U1jeN1revHz3/7nYB318mqMpHKtggi4boDHfWizMRwfr2qRP1mDTBt2e9GeHlayDjTm8Z82ADg/VHsX3MA1+7/qdeZtYzvcvvpCx4xt+qFUrPVljmwrQawrX9vpEfSxRj2NjTOJ9n7Zm67DADlS2eIgk8W8WtlRl7j5Sy+I+c4QP7gpdoRuOhT8175j7S2cT7J/rX37h1XADQ02CdNtYUCAk1aG82F1OdnIgCw3DkSHmZOHy7heN+AP3d9+cQpAOjuPm2YVCA0dPez7PG4m7N3KnaMyFSpwjBFjxcQjcEN9j1RAIi6XDUAEI7Gk1jGN/g3h1I8U3g1dfK0yDO+TROna78v2ZJ6YBNiAFDwBHkr8fwrpuQmb2x4yhs/IDy/iT25MP1Bgl30jPu1pBgk5SDRzcOXNn78KgXA/375+pKTrZKCWVuysGDX0VEZ9OpDJ9LbbvC/GLZ7J39RfS1MAMBgnzlLefqge/TR96180f93AoKaf7utu9PLcZnmIrnvbO31XtKonM4RGm8QEGAmzciyK2ORrktX05PReOjGAQFATlORMZ+XJ2iCJmiC/mno/wGnBidVbHEXtgAAAABJRU5ErkJggg==\n", 428 | "text/plain": [ 429 | "" 430 | ] 431 | }, 432 | "execution_count": 38, 433 | "metadata": {}, 434 | "output_type": "execute_result" 435 | } 436 | ], 437 | "source": [ 438 | "PIL.Image.open('data/112/train/ID_000039fa0.png')" 439 | ] 440 | }, 441 | { 442 | "cell_type": "markdown", 443 | "metadata": {}, 444 | "source": [ 445 | "We now need to do the same thing for the images in the test set." 
446 | ] 447 | }, 448 | { 449 | "cell_type": "code", 450 | "execution_count": 12, 451 | "metadata": {}, 452 | "outputs": [], 453 | "source": [ 454 | "paths = Path('data/raw_data/stage_1_test_images/')\n", 455 | "\n", 456 | "def resize_112(path, _): resize(path, 'data/112/test', 112)" 457 | ] 458 | }, 459 | { 460 | "cell_type": "code", 461 | "execution_count": 15, 462 | "metadata": {}, 463 | "outputs": [], 464 | "source": [ 465 | "parallel(resize_112, list(paths.iterdir()), max_workers=12)" 466 | ] 467 | } 468 | ], 469 | "metadata": { 470 | "kernelspec": { 471 | "display_name": "Python 3", 472 | "language": "python", 473 | "name": "python3" 474 | }, 475 | "language_info": { 476 | "codemirror_mode": { 477 | "name": "ipython", 478 | "version": 3 479 | }, 480 | "file_extension": ".py", 481 | "mimetype": "text/x-python", 482 | "name": "python", 483 | "nbconvert_exporter": "python", 484 | "pygments_lexer": "ipython3", 485 | "version": "3.7.3" 486 | } 487 | }, 488 | "nbformat": 4, 489 | "nbformat_minor": 2 490 | } 491 | -------------------------------------------------------------------------------- /02_reshape_train_csv.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import pandas as pd" 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": 2, 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "df_train = pd.read_csv('data/stage_1_train_with_one_image_dropped.csv')" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 3, 24 | "metadata": {}, 25 | "outputs": [ 26 | { 27 | "data": { 28 | "text/html": [ 29 | "
\n", 30 | "\n", 43 | "\n", 44 | " \n", 45 | " \n", 46 | " \n", 47 | " \n", 48 | " \n", 49 | " \n", 50 | " \n", 51 | " \n", 52 | " \n", 53 | " \n", 54 | " \n", 55 | " \n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | "
IDLabel
0ID_63eb1e259_epidural0
1ID_63eb1e259_intraparenchymal0
2ID_63eb1e259_intraventricular0
3ID_63eb1e259_subarachnoid0
4ID_63eb1e259_subdural0
\n", 79 | "
" 80 | ], 81 | "text/plain": [ 82 | " ID Label\n", 83 | "0 ID_63eb1e259_epidural 0\n", 84 | "1 ID_63eb1e259_intraparenchymal 0\n", 85 | "2 ID_63eb1e259_intraventricular 0\n", 86 | "3 ID_63eb1e259_subarachnoid 0\n", 87 | "4 ID_63eb1e259_subdural 0" 88 | ] 89 | }, 90 | "execution_count": 3, 91 | "metadata": {}, 92 | "output_type": "execute_result" 93 | } 94 | ], 95 | "source": [ 96 | "df_train.head()" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": 4, 102 | "metadata": {}, 103 | "outputs": [], 104 | "source": [ 105 | "df_train['fn'] = df_train.ID.apply(lambda x: '_'.join(x.split('_')[:2]) + '.png')" 106 | ] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": 5, 111 | "metadata": {}, 112 | "outputs": [], 113 | "source": [ 114 | "df_train.columns = ['ID', 'probability', 'fn']" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": 6, 120 | "metadata": {}, 121 | "outputs": [], 122 | "source": [ 123 | "df_train['label'] = df_train.ID.apply(lambda x: x.split('_')[-1])" 124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "execution_count": 7, 129 | "metadata": {}, 130 | "outputs": [ 131 | { 132 | "data": { 133 | "text/html": [ 134 | "
\n", 135 | "\n", 148 | "\n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | "
IDprobabilityfnlabel
0ID_63eb1e259_epidural0ID_63eb1e259.pngepidural
1ID_63eb1e259_intraparenchymal0ID_63eb1e259.pngintraparenchymal
2ID_63eb1e259_intraventricular0ID_63eb1e259.pngintraventricular
3ID_63eb1e259_subarachnoid0ID_63eb1e259.pngsubarachnoid
4ID_63eb1e259_subdural0ID_63eb1e259.pngsubdural
\n", 196 | "
" 197 | ], 198 | "text/plain": [ 199 | " ID probability fn \\\n", 200 | "0 ID_63eb1e259_epidural 0 ID_63eb1e259.png \n", 201 | "1 ID_63eb1e259_intraparenchymal 0 ID_63eb1e259.png \n", 202 | "2 ID_63eb1e259_intraventricular 0 ID_63eb1e259.png \n", 203 | "3 ID_63eb1e259_subarachnoid 0 ID_63eb1e259.png \n", 204 | "4 ID_63eb1e259_subdural 0 ID_63eb1e259.png \n", 205 | "\n", 206 | " label \n", 207 | "0 epidural \n", 208 | "1 intraparenchymal \n", 209 | "2 intraventricular \n", 210 | "3 subarachnoid \n", 211 | "4 subdural " 212 | ] 213 | }, 214 | "execution_count": 7, 215 | "metadata": {}, 216 | "output_type": "execute_result" 217 | } 218 | ], 219 | "source": [ 220 | "df_train.head()" 221 | ] 222 | }, 223 | { 224 | "cell_type": "code", 225 | "execution_count": 8, 226 | "metadata": {}, 227 | "outputs": [ 228 | { 229 | "data": { 230 | "text/plain": [ 231 | "(4045566, 4)" 232 | ] 233 | }, 234 | "execution_count": 8, 235 | "metadata": {}, 236 | "output_type": "execute_result" 237 | } 238 | ], 239 | "source": [ 240 | "df_train.shape" 241 | ] 242 | }, 243 | { 244 | "cell_type": "code", 245 | "execution_count": 9, 246 | "metadata": {}, 247 | "outputs": [], 248 | "source": [ 249 | "df_train.drop_duplicates('ID', inplace=True)" 250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": 10, 255 | "metadata": {}, 256 | "outputs": [ 257 | { 258 | "data": { 259 | "text/plain": [ 260 | "(4045542, 4)" 261 | ] 262 | }, 263 | "execution_count": 10, 264 | "metadata": {}, 265 | "output_type": "execute_result" 266 | } 267 | ], 268 | "source": [ 269 | "df_train.shape" 270 | ] 271 | }, 272 | { 273 | "cell_type": "code", 274 | "execution_count": 11, 275 | "metadata": {}, 276 | "outputs": [], 277 | "source": [ 278 | "pivot = df_train.pivot(index='fn', columns='label', values='probability')" 279 | ] 280 | }, 281 | { 282 | "cell_type": "code", 283 | "execution_count": 12, 284 | "metadata": {}, 285 | "outputs": [ 286 | { 287 | "data": { 288 | "text/html": [ 289 | "
\n", 290 | "\n", 303 | "\n", 304 | " \n", 305 | " \n", 306 | " \n", 307 | " \n", 308 | " \n", 309 | " \n", 310 | " \n", 311 | " \n", 312 | " \n", 313 | " \n", 314 | " \n", 315 | " \n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | " \n", 321 | " \n", 322 | " \n", 323 | " \n", 324 | " \n", 325 | " \n", 326 | " \n", 327 | " \n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | " \n", 332 | " \n", 333 | " \n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | " \n", 338 | " \n", 339 | " \n", 340 | " \n", 341 | " \n", 342 | " \n", 343 | " \n", 344 | " \n", 345 | " \n", 346 | " \n", 347 | " \n", 348 | " \n", 349 | " \n", 350 | " \n", 351 | " \n", 352 | " \n", 353 | " \n", 354 | " \n", 355 | " \n", 356 | " \n", 357 | " \n", 358 | " \n", 359 | " \n", 360 | " \n", 361 | " \n", 362 | " \n", 363 | " \n", 364 | " \n", 365 | " \n", 366 | " \n", 367 | " \n", 368 | " \n", 369 | " \n", 370 | " \n", 371 | "
labelanyepiduralintraparenchymalintraventricularsubarachnoidsubdural
fn
ID_000039fa0.png000000
ID_00005679d.png000000
ID_00008ce3c.png000000
ID_0000950d7.png000000
ID_0000aee4b.png000000
\n", 372 | "
" 373 | ], 374 | "text/plain": [ 375 | "label any epidural intraparenchymal intraventricular \\\n", 376 | "fn \n", 377 | "ID_000039fa0.png 0 0 0 0 \n", 378 | "ID_00005679d.png 0 0 0 0 \n", 379 | "ID_00008ce3c.png 0 0 0 0 \n", 380 | "ID_0000950d7.png 0 0 0 0 \n", 381 | "ID_0000aee4b.png 0 0 0 0 \n", 382 | "\n", 383 | "label subarachnoid subdural \n", 384 | "fn \n", 385 | "ID_000039fa0.png 0 0 \n", 386 | "ID_00005679d.png 0 0 \n", 387 | "ID_00008ce3c.png 0 0 \n", 388 | "ID_0000950d7.png 0 0 \n", 389 | "ID_0000aee4b.png 0 0 " 390 | ] 391 | }, 392 | "execution_count": 12, 393 | "metadata": {}, 394 | "output_type": "execute_result" 395 | } 396 | ], 397 | "source": [ 398 | "pivot.head()" 399 | ] 400 | }, 401 | { 402 | "cell_type": "code", 403 | "execution_count": 13, 404 | "metadata": {}, 405 | "outputs": [], 406 | "source": [ 407 | "pivot.reset_index(inplace=True)" 408 | ] 409 | }, 410 | { 411 | "cell_type": "code", 412 | "execution_count": 14, 413 | "metadata": {}, 414 | "outputs": [ 415 | { 416 | "data": { 417 | "text/html": [ 418 | "
\n", 419 | "\n", 432 | "\n", 433 | " \n", 434 | " \n", 435 | " \n", 436 | " \n", 437 | " \n", 438 | " \n", 439 | " \n", 440 | " \n", 441 | " \n", 442 | " \n", 443 | " \n", 444 | " \n", 445 | " \n", 446 | " \n", 447 | " \n", 448 | " \n", 449 | " \n", 450 | " \n", 451 | " \n", 452 | " \n", 453 | " \n", 454 | " \n", 455 | " \n", 456 | " \n", 457 | " \n", 458 | " \n", 459 | " \n", 460 | " \n", 461 | " \n", 462 | " \n", 463 | " \n", 464 | " \n", 465 | " \n", 466 | " \n", 467 | " \n", 468 | " \n", 469 | " \n", 470 | " \n", 471 | " \n", 472 | " \n", 473 | " \n", 474 | " \n", 475 | " \n", 476 | " \n", 477 | " \n", 478 | " \n", 479 | " \n", 480 | " \n", 481 | " \n", 482 | " \n", 483 | " \n", 484 | " \n", 485 | " \n", 486 | " \n", 487 | " \n", 488 | " \n", 489 | " \n", 490 | " \n", 491 | " \n", 492 | " \n", 493 | " \n", 494 | " \n", 495 | " \n", 496 | " \n", 497 | "
label  fn                any  epidural  intraparenchymal  intraventricular  subarachnoid  subdural
0      ID_000039fa0.png  0    0         0                 0                 0             0
1      ID_00005679d.png  0    0         0                 0                 0             0
2      ID_00008ce3c.png  0    0         0                 0                 0             0
3      ID_0000950d7.png  0    0         0                 0                 0             0
4      ID_0000aee4b.png  0    0         0                 0                 0             0
\n", 498 | "
" 499 | ], 500 | "text/plain": [ 501 | "label fn any epidural intraparenchymal intraventricular \\\n", 502 | "0 ID_000039fa0.png 0 0 0 0 \n", 503 | "1 ID_00005679d.png 0 0 0 0 \n", 504 | "2 ID_00008ce3c.png 0 0 0 0 \n", 505 | "3 ID_0000950d7.png 0 0 0 0 \n", 506 | "4 ID_0000aee4b.png 0 0 0 0 \n", 507 | "\n", 508 | "label subarachnoid subdural \n", 509 | "0 0 0 \n", 510 | "1 0 0 \n", 511 | "2 0 0 \n", 512 | "3 0 0 \n", 513 | "4 0 0 " 514 | ] 515 | }, 516 | "execution_count": 14, 517 | "metadata": {}, 518 | "output_type": "execute_result" 519 | } 520 | ], 521 | "source": [ 522 | "pivot.head()" 523 | ] 524 | }, 525 | { 526 | "cell_type": "code", 527 | "execution_count": 15, 528 | "metadata": {}, 529 | "outputs": [ 530 | { 531 | "data": { 532 | "text/plain": [ 533 | "(674257, 7)" 534 | ] 535 | }, 536 | "execution_count": 15, 537 | "metadata": {}, 538 | "output_type": "execute_result" 539 | } 540 | ], 541 | "source": [ 542 | "pivot.shape" 543 | ] 544 | }, 545 | { 546 | "cell_type": "code", 547 | "execution_count": 16, 548 | "metadata": {}, 549 | "outputs": [], 550 | "source": [ 551 | "pivot.to_csv('data/train_pivot.csv', index=False)" 552 | ] 553 | }, 554 | { 555 | "cell_type": "markdown", 556 | "metadata": {}, 557 | "source": [ 558 | "The pivoted version of the data can be useful down the road so let's go ahead and save it.\n", 559 | "\n", 560 | "Nonetheless, I just checked the documentation and as we will be using the data_block API for our initial model, we need to reformat the data slightly." 
561 | ] 562 | }, 563 | { 564 | "cell_type": "code", 565 | "execution_count": 17, 566 | "metadata": {}, 567 | "outputs": [], 568 | "source": [ 569 | "from collections import defaultdict\n", 570 | "\n", 571 | "d = defaultdict(list)\n", 572 | "for fn in df_train.fn.unique(): d[fn]\n", 573 | "\n", 574 | "for tup in df_train.itertuples():\n", 575 | " if tup.probability: d[tup.fn].append(tup.label)" 576 | ] 577 | }, 578 | { 579 | "cell_type": "code", 580 | "execution_count": 18, 581 | "metadata": {}, 582 | "outputs": [], 583 | "source": [ 584 | "ks, vs = [], []\n", 585 | "\n", 586 | "for k, v in d.items():\n", 587 | " ks.append(k), vs.append(' '.join(v))" 588 | ] 589 | }, 590 | { 591 | "cell_type": "code", 592 | "execution_count": 19, 593 | "metadata": {}, 594 | "outputs": [], 595 | "source": [ 596 | "pd.DataFrame(data={'fn': ks, 'labels': vs}).to_csv('data/train_labels_as_strings.csv', index=False)" 597 | ] 598 | } 599 | ], 600 | "metadata": { 601 | "kernelspec": { 602 | "display_name": "Python 3", 603 | "language": "python", 604 | "name": "python3" 605 | }, 606 | "language_info": { 607 | "codemirror_mode": { 608 | "name": "ipython", 609 | "version": 3 610 | }, 611 | "file_extension": ".py", 612 | "mimetype": "text/x-python", 613 | "name": "python", 614 | "nbconvert_exporter": "python", 615 | "pygments_lexer": "ipython3", 616 | "version": "3.7.3" 617 | } 618 | }, 619 | "nbformat": 4, 620 | "nbformat_minor": 2 621 | } 622 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | [RSNA Intracranial Hemorrhage Detection](https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/overview) Kaggle competition starter pack 2 | 3 | The instructions in the notebooks will take you through downloading the data, processing it, training a basic model and making a submission. 
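As a rough illustration of the processing step, here is a minimal sketch of the label reshaping done in `02_reshape_train_csv.ipynb`: every image ends up with one space-separated string of its positive labels. The function name `labels_as_strings` and the sample rows are hypothetical, and the sketch uses plain tuples instead of the pandas DataFrame that the notebook works with.

```python
from collections import defaultdict


def labels_as_strings(rows):
    """Collapse (fn, label, probability) rows into {fn: 'label1 label2'}.

    Images without any positive label map to an empty string, mirroring the
    notebook, which registers every file name before appending labels.
    """
    labels_by_fn = defaultdict(list)
    for fn, label, probability in rows:
        labels_by_fn[fn]  # touch the key so images with no positives are kept
        if probability:
            labels_by_fn[fn].append(label)
    return {fn: " ".join(labels) for fn, labels in labels_by_fn.items()}


# Made-up rows for illustration only (not taken from the competition data).
rows = [
    ("ID_a.png", "any", 1),
    ("ID_a.png", "subdural", 1),
    ("ID_a.png", "epidural", 0),
    ("ID_b.png", "any", 0),
]
result = labels_as_strings(rows)
print(result)
```

Keeping the empty-string entries matters because images with no hemorrhage still need a row in the labels file.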
 4 | 5 | Twitter [thread](https://twitter.com/radekosmulski/status/1175156772342030337?s=20) 6 | Kaggle [forum post](https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/discussion/109649) 7 | FastAi [forum post](https://forums.fast.ai/t/share-your-work-here-part-2/41392/129?u=radek) 8 | 9 | *EDIT*: Please note that `window_and_normalize` should divide `window_width` by 2 in order to align with the 'brain window' traditionally used by radiologists for visualization. As it stands, the window is twice as wide as intended. There is a good discussion of the relevance of windowing to visualization and modelling in this [Kaggle thread](https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/discussion/109328#latest-630565). 10 | 11 | PyTorch version: 1.2.0 12 | FastAi version: 1.0.58.dev0 (54b757bbfe85df4ccf391dd3da0825c441b1d2da) 13 | --------------------------------------------------------------------------------
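To make the windowing note in the README concrete, here is a minimal sketch of windowing with the half-width applied. It is not the notebook's actual `window_and_normalize`; the name `window_and_normalize_sketch`, its signature, and the default values (center 40, width 80, a commonly quoted brain window) are assumptions for illustration, and it works on a plain list of Hounsfield-unit values rather than an image array.

```python
def window_and_normalize_sketch(hu_values, window_center=40.0, window_width=80.0):
    """Clip Hounsfield-unit values to a window and rescale them to [0, 1].

    Hypothetical helper: the defaults are an assumed brain window, not
    values confirmed by the notebooks.
    """
    half_width = window_width / 2  # the window spans center +/- width / 2
    lower = window_center - half_width
    upper = window_center + half_width
    # Clip each value into [lower, upper], then map that range onto [0, 1].
    clipped = [min(max(v, lower), upper) for v in hu_values]
    return [(v - lower) / (upper - lower) for v in clipped]


scaled = window_and_normalize_sketch([-1000.0, 0.0, 40.0, 80.0, 1000.0])
print(scaled)
```

Without the division by 2, the bounds would be `center - width` and `center + width`, which is the "twice as wide" window the note describes.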