├── .gitattributes ├── .gitignore ├── Lung Cancer 3D CNN.ipynb ├── Project Report.docx └── README.md /.gitattributes: -------------------------------------------------------------------------------- 1 | # Auto detect text files and perform LF normalization 2 | * text=auto -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | env/ 12 | build/ 13 | develop-eggs/ 14 | dist/ 15 | downloads/ 16 | eggs/ 17 | .eggs/ 18 | lib/ 19 | lib64/ 20 | parts/ 21 | sdist/ 22 | var/ 23 | wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *,cover 47 | .hypothesis/ 48 | 49 | # Translations 50 | *.mo 51 | *.pot 52 | 53 | # Django stuff: 54 | *.log 55 | local_settings.py 56 | 57 | # Flask stuff: 58 | instance/ 59 | .webassets-cache 60 | 61 | # Scrapy stuff: 62 | .scrapy 63 | 64 | # Sphinx documentation 65 | docs/_build/ 66 | 67 | # PyBuilder 68 | target/ 69 | 70 | # Jupyter Notebook 71 | .ipynb_checkpoints 72 | 73 | # pyenv 74 | .python-version 75 | 76 | # celery beat schedule file 77 | celerybeat-schedule 78 | 79 | # dotenv 80 | .env 81 | 82 | # virtualenv 83 | .venv/ 84 | venv/ 85 | ENV/ 86 | 87 | # Spyder project settings 88 | .spyderproject 89 | 90 | # Rope project settings 91 | .ropeproject 92 | -------------------------------------------------------------------------------- /Lung Cancer 3D CNN.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Lung Cancer Detection using 3D Convolutional Neural Networks\n" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Kaggle.com - Data Science Bowl 2017 Competition \n", 15 | "\n", 16 | "Competition link: https://www.kaggle.com/c/data-science-bowl-2017\n" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "# Problem statement\n", 24 | "\n", 25 | "\n", 26 | "In the United States, lung cancer strikes 225,000 people every year, and accounts for $12 billion in health care costs. Early detection is critical to give patients the best chance at recovery and survival. One year ago, the office of the U.S. 
Vice President spearheaded a bold new initiative, the Cancer Moonshot, to make a decade's worth of progress in cancer prevention, diagnosis, and treatment in just 5 years. \n", 27 | "\n", 28 | "In 2017, the Data Science Bowl will be a critical milestone in support of the Cancer Moonshot by convening the data science and medical communities to develop lung cancer detection algorithms. Using a data set of thousands of high-resolution lung scans provided by the National Cancer Institute, participants will develop algorithms that accurately determine when lesions in the lungs are cancerous. This will dramatically reduce the false positive rate that plagues the current detection technology, get patients earlier access to life-saving interventions, and give radiologists more time to spend with their patients.\n", 29 | "\n" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "# Goal\n", 37 | "\n", 38 | "The goal of this project is to pre-process the provided data (slices of CT scans) and to train and validate a 3D Convolutional Neural Network on it, producing a model that can determine whether a person has cancer or not. Such a model could greatly help in identifying cancer in its early stages. The aim of this project is therefore an automated method capable of determining whether a patient will be diagnosed with lung cancer." 
39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "## Importing dataset and libraries" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": null, 51 | "metadata": { 52 | "collapsed": true 53 | }, 54 | "outputs": [], 55 | "source": [ 56 | "import numpy as np\n", 57 | "import pandas as pd\n", 58 | "import dicom\n", 59 | "import os\n", 60 | "import matplotlib.pyplot as plt\n", 61 | "import cv2\n", 62 | "import math\n", 63 | "\n", 64 | "##Data directory\n", 65 | "dataDirectory = 'Lung_Cancer/stage1/stage1/'\n", 66 | "lungPatients = os.listdir(dataDirectory)\n", 67 | "\n", 68 | "##Read labels csv \n", 69 | "labels = pd.read_csv('Lung_Cancer/stage1_labels/stage1_labels.csv', index_col=0)\n", 70 | "\n", 71 | "##Setting x*y size to 50\n", 72 | "size = 50\n", 73 | "\n", 74 | "## Setting z-dimension (number of slices) to 20\n", 75 | "NoSlices = 20" 76 | ] 77 | }, 78 | { 79 | "cell_type": "markdown", 80 | "metadata": {}, 81 | "source": [ 82 | "# Data Preprocessing" 83 | ] 84 | }, 85 | { 86 | "cell_type": "markdown", 87 | "metadata": {}, 88 | "source": [ 89 | "## Functions for chunking, averaging, and processing the images\n", 90 | "\n", 91 | "Chunks - Splits a patient's list of slice images into 20 chunks. The number of slices (z-dimension) differs from one scan to another, so a chunks function is used to bring every scan to a uniform depth of 20.\n", 92 | "\n", 93 | "Mean - Averages the slices within a chunk.\n", 94 | "\n", 95 | "Data Processing - Builds the 3D lung image for a patient id (slices sorted by position), resizes each slice, and reduces the scan to 20 chunks. After processing, the output is saved to a numpy file. 
" 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": 2, 101 | "metadata": { 102 | "collapsed": false 103 | }, 104 | "outputs": [ 105 | { 106 | "name": "stdout", 107 | "output_type": "stream", 108 | "text": [ 109 | "Saved - 0\n", 110 | "Data is unlabeled\n", 111 | "Data is unlabeled\n", 112 | "Data is unlabeled\n", 113 | "Data is unlabeled\n", 114 | "Data is unlabeled\n", 115 | "Data is unlabeled\n", 116 | "Saved - 100\n", 117 | "Data is unlabeled\n", 118 | "Data is unlabeled\n", 119 | "Data is unlabeled\n", 120 | "Data is unlabeled\n", 121 | "Data is unlabeled\n", 122 | "Data is unlabeled\n", 123 | "Data is unlabeled\n", 124 | "Data is unlabeled\n", 125 | "Data is unlabeled\n", 126 | "Data is unlabeled\n", 127 | "Data is unlabeled\n", 128 | "Data is unlabeled\n", 129 | "Saved - 200\n", 130 | "Data is unlabeled\n", 131 | "Data is unlabeled\n", 132 | "Data is unlabeled\n", 133 | "Data is unlabeled\n", 134 | "Data is unlabeled\n", 135 | "Data is unlabeled\n", 136 | "Data is unlabeled\n", 137 | "Saved - 300\n", 138 | "Data is unlabeled\n", 139 | "Data is unlabeled\n", 140 | "Data is unlabeled\n", 141 | "Data is unlabeled\n", 142 | "Data is unlabeled\n", 143 | "Data is unlabeled\n", 144 | "Saved - 400\n", 145 | "Data is unlabeled\n", 146 | "Data is unlabeled\n", 147 | "Data is unlabeled\n", 148 | "Data is unlabeled\n", 149 | "Data is unlabeled\n", 150 | "Data is unlabeled\n", 151 | "Data is unlabeled\n", 152 | "Data is unlabeled\n", 153 | "Data is unlabeled\n", 154 | "Data is unlabeled\n", 155 | "Data is unlabeled\n", 156 | "Data is unlabeled\n", 157 | "Saved - 500\n", 158 | "Data is unlabeled\n", 159 | "Data is unlabeled\n", 160 | "Data is unlabeled\n", 161 | "Data is unlabeled\n", 162 | "Data is unlabeled\n", 163 | "Data is unlabeled\n", 164 | "Data is unlabeled\n", 165 | "Data is unlabeled\n", 166 | "Data is unlabeled\n", 167 | "Data is unlabeled\n", 168 | "Data is unlabeled\n", 169 | "Data is unlabeled\n", 170 | "Data is unlabeled\n", 171 | 
"Saved - 600\n", 172 | "Data is unlabeled\n", 173 | "Data is unlabeled\n", 174 | "Data is unlabeled\n", 175 | "Data is unlabeled\n", 176 | "Data is unlabeled\n", 177 | "Data is unlabeled\n", 178 | "Data is unlabeled\n", 179 | "Data is unlabeled\n", 180 | "Data is unlabeled\n", 181 | "Data is unlabeled\n", 182 | "Data is unlabeled\n", 183 | "Data is unlabeled\n", 184 | "Data is unlabeled\n", 185 | "Data is unlabeled\n", 186 | "Saved - 700\n", 187 | "Data is unlabeled\n", 188 | "Data is unlabeled\n", 189 | "Data is unlabeled\n", 190 | "Data is unlabeled\n", 191 | "Data is unlabeled\n", 192 | "Data is unlabeled\n", 193 | "Data is unlabeled\n", 194 | "Data is unlabeled\n", 195 | "Data is unlabeled\n", 196 | "Data is unlabeled\n", 197 | "Data is unlabeled\n", 198 | "Saved - 800\n", 199 | "Data is unlabeled\n", 200 | "Data is unlabeled\n", 201 | "Data is unlabeled\n", 202 | "Data is unlabeled\n", 203 | "Data is unlabeled\n", 204 | "Data is unlabeled\n", 205 | "Data is unlabeled\n", 206 | "Data is unlabeled\n", 207 | "Data is unlabeled\n", 208 | "Data is unlabeled\n", 209 | "Data is unlabeled\n", 210 | "Data is unlabeled\n", 211 | "Data is unlabeled\n", 212 | "Data is unlabeled\n", 213 | "Data is unlabeled\n", 214 | "Data is unlabeled\n", 215 | "Data is unlabeled\n", 216 | "Data is unlabeled\n", 217 | "Data is unlabeled\n", 218 | "Data is unlabeled\n", 219 | "Saved - 900\n", 220 | "Data is unlabeled\n", 221 | "Data is unlabeled\n", 222 | "Data is unlabeled\n", 223 | "Data is unlabeled\n", 224 | "Data is unlabeled\n", 225 | "Data is unlabeled\n", 226 | "Data is unlabeled\n", 227 | "Data is unlabeled\n", 228 | "Data is unlabeled\n", 229 | "Data is unlabeled\n", 230 | "Data is unlabeled\n", 231 | "Data is unlabeled\n", 232 | "Data is unlabeled\n", 233 | "Data is unlabeled\n", 234 | "Data is unlabeled\n", 235 | "Data is unlabeled\n", 236 | "Data is unlabeled\n", 237 | "Saved - 1000\n", 238 | "Data is unlabeled\n", 239 | "Data is unlabeled\n", 240 | "Data is unlabeled\n", 241 
| "Data is unlabeled\n", 242 | "Data is unlabeled\n", 243 | "Data is unlabeled\n", 244 | "Data is unlabeled\n", 245 | "Data is unlabeled\n", 246 | "Data is unlabeled\n", 247 | "Data is unlabeled\n", 248 | "Data is unlabeled\n", 249 | "Data is unlabeled\n", 250 | "Data is unlabeled\n", 251 | "Data is unlabeled\n", 252 | "Saved - 1100\n", 253 | "Data is unlabeled\n", 254 | "Data is unlabeled\n", 255 | "Data is unlabeled\n", 256 | "Data is unlabeled\n", 257 | "Data is unlabeled\n", 258 | "Data is unlabeled\n", 259 | "Data is unlabeled\n", 260 | "Data is unlabeled\n", 261 | "Data is unlabeled\n", 262 | "Data is unlabeled\n", 263 | "Data is unlabeled\n", 264 | "Data is unlabeled\n", 265 | "Data is unlabeled\n", 266 | "Saved - 1200\n", 267 | "Data is unlabeled\n", 268 | "Data is unlabeled\n", 269 | "Data is unlabeled\n", 270 | "Data is unlabeled\n", 271 | "Data is unlabeled\n", 272 | "Data is unlabeled\n", 273 | "Data is unlabeled\n", 274 | "Data is unlabeled\n", 275 | "Data is unlabeled\n", 276 | "Data is unlabeled\n", 277 | "Data is unlabeled\n", 278 | "Data is unlabeled\n", 279 | "Data is unlabeled\n", 280 | "Data is unlabeled\n", 281 | "Data is unlabeled\n", 282 | "Data is unlabeled\n", 283 | "Saved - 1300\n", 284 | "Data is unlabeled\n", 285 | "Data is unlabeled\n", 286 | "Data is unlabeled\n", 287 | "Data is unlabeled\n", 288 | "Data is unlabeled\n", 289 | "Data is unlabeled\n", 290 | "Data is unlabeled\n", 291 | "Data is unlabeled\n", 292 | "Data is unlabeled\n", 293 | "Data is unlabeled\n", 294 | "Data is unlabeled\n", 295 | "Data is unlabeled\n", 296 | "Saved - 1400\n", 297 | "Data is unlabeled\n", 298 | "Data is unlabeled\n", 299 | "Data is unlabeled\n", 300 | "Data is unlabeled\n", 301 | "Data is unlabeled\n", 302 | "Data is unlabeled\n", 303 | "Data is unlabeled\n", 304 | "Data is unlabeled\n", 305 | "Data is unlabeled\n", 306 | "Data is unlabeled\n", 307 | "Data is unlabeled\n", 308 | "Data is unlabeled\n", 309 | "Data is unlabeled\n", 310 | "Data is 
unlabeled\n", 311 | "Data is unlabeled\n", 312 | "Saved - 1500\n", 313 | "Data is unlabeled\n", 314 | "Data is unlabeled\n", 315 | "Data is unlabeled\n", 316 | "Data is unlabeled\n", 317 | "Data is unlabeled\n", 318 | "Data is unlabeled\n", 319 | "Data is unlabeled\n", 320 | "Data is unlabeled\n", 321 | "Data is unlabeled\n", 322 | "Data is unlabeled\n" 323 | ] 324 | } 325 | ], 326 | "source": [ 327 | "def chunks(l, n):\n", 328 | " count = 0\n", 329 | " for i in range(0, len(l), n):\n", 330 | " if (count < NoSlices):\n", 331 | " yield l[i:i + n]\n", 332 | " count = count + 1\n", 333 | "\n", 334 | "\n", 335 | "def mean(l):\n", 336 | " return sum(l) / len(l)\n", 337 | "\n", 338 | "\n", 339 | "def dataProcessing(patient, labels_df, size=50, noslices=20, visualize=False):\n", 340 | " label = labels_df.get_value(patient, 'cancer')\n", 341 | " path = dataDirectory + patient\n", 342 | " slices = [dicom.read_file(path + '/' + s) for s in os.listdir(path)]\n", 343 | " slices.sort(key=lambda x: int(x.ImagePositionPatient[2]))\n", 344 | "\n", 345 | " new_slices = []\n", 346 | " slices = [cv2.resize(np.array(each_slice.pixel_array), (size, size)) for each_slice in slices]\n", 347 | "\n", 348 | " chunk_sizes = math.floor(len(slices) / noslices)\n", 349 | " for slice_chunk in chunks(slices, chunk_sizes):\n", 350 | " slice_chunk = list(map(mean, zip(*slice_chunk)))\n", 351 | " new_slices.append(slice_chunk)\n", 352 | "\n", 353 | " if label == 1:\n", 354 | " label = np.array([0, 1])\n", 355 | " elif label == 0:\n", 356 | " label = np.array([1, 0])\n", 357 | " return np.array(new_slices), label\n", 358 | "\n", 359 | "\n", 360 | "imageData = []\n", 361 | "for num, patient in enumerate(lungPatients):\n", 362 | " if num % 100 == 0:\n", 363 | " print('Saved -', num)\n", 364 | " try:\n", 365 | " img_data, label = dataProcessing(patient, labels, size=size, noslices=NoSlices)\n", 366 | " imageData.append([img_data, label,patient])\n", 367 | " except KeyError as e:\n", 368 | " print('Data 
is unlabeled')\n", 369 | "\n", 370 | " \n", 371 | "##Results are saved as numpy file\n", 372 | "np.save('imageDataNew-{}-{}-{}.npy'.format(size, size, NoSlices), imageData)" 373 | ] 374 | }, 375 | { 376 | "cell_type": "markdown", 377 | "metadata": {}, 378 | "source": [ 379 | "The output above shows that many of the patients in the data directory have no entry in the labels file and are therefore skipped." 380 | ] 381 | }, 382 | { 383 | "cell_type": "markdown", 384 | "metadata": {}, 385 | "source": [ 386 | "# Feeding pre-processed data into 3D Convolution layers" 387 | ] 388 | }, 389 | { 390 | "cell_type": "markdown", 391 | "metadata": {}, 392 | "source": [ 393 | "## Using the TensorFlow framework for Conv3D and importing libraries\n" 394 | ] 395 | }, 396 | { 397 | "cell_type": "code", 398 | "execution_count": null, 399 | "metadata": { 400 | "collapsed": true 401 | }, 402 | "outputs": [], 403 | "source": [ 404 | "import tensorflow as tf\n", 405 | "import pandas as pd\n", 406 | "import tflearn\n", 407 | "from tflearn.layers.conv import conv_3d, max_pool_3d\n", 408 | "from tflearn.layers.core import input_data, dropout, fully_connected\n", 409 | "from tflearn.layers.estimator import regression\n", 410 | "import numpy as np\n", 411 | "import matplotlib.pyplot as plt" 412 | ] 413 | }, 414 | { 415 | "cell_type": "markdown", 416 | "metadata": {}, 417 | "source": [ 418 | "## Loading the numpy file and setting train and validation datasets" 419 | ] 420 | }, 421 | { 422 | "cell_type": "code", 423 | "execution_count": null, 424 | "metadata": { 425 | "collapsed": true 426 | }, 427 | "outputs": [], 428 | "source": [ 429 | "imageData = np.load('imageDataNew-50-50-20.npy')\n", 430 | "trainingData = imageData[0:800]\n", 431 | "validationData = imageData[-200:-100]\n", 432 | "\n", 433 | "x = tf.placeholder('float')\n", 434 | "y = tf.placeholder('float')\n", 435 | "size = 50\n", 436 | "keep_rate = 0.8\n", 437 | "NoSlices = 20" 438 | ] 439 | }, 440 | { 441 | "cell_type": "markdown", 442 | "metadata": {}, 443 | "source": [ 444 | "## Building 3D 
CNN model\n" 445 | ] 446 | }, 447 | { 448 | "cell_type": "code", 449 | "execution_count": 3, 450 | "metadata": { 451 | "collapsed": false 452 | }, 453 | "outputs": [ 454 | { 455 | "name": "stdout", 456 | "output_type": "stream", 457 | "text": [ 458 | "Epoch 1 completed out of 10 loss: 3.88795779023e+13\n", 459 | "Accuracy: 0.7\n", 460 | "Epoch 2 completed out of 10 loss: 1.24509060778e+13\n", 461 | "Accuracy: 0.6\n", 462 | "Epoch 3 completed out of 10 loss: 6.06095143603e+12\n", 463 | "Accuracy: 0.67\n", 464 | "Epoch 4 completed out of 10 loss: 3.88584119686e+12\n", 465 | "Accuracy: 0.71\n", 466 | "Epoch 5 completed out of 10 loss: 3.01922889866e+12\n", 467 | "Accuracy: 0.58\n", 468 | "Epoch 6 completed out of 10 loss: 1.83550548122e+12\n", 469 | "Accuracy: 0.48\n", 470 | "Epoch 7 completed out of 10 loss: 1.58098134931e+12\n", 471 | "Accuracy: 0.61\n", 472 | "Epoch 8 completed out of 10 loss: 1.02996348973e+12\n", 473 | "Accuracy: 0.67\n", 474 | "Epoch 9 completed out of 10 loss: 845322210816.0\n", 475 | "Accuracy: 0.52\n", 476 | "Epoch 10 completed out of 10 loss: 613924536640.0\n", 477 | "Accuracy: 0.6\n", 478 | "Final Accuracy: 0.61\n", 479 | "Patient: dc1ecce5e7f8a4be9082cb5650fa62bd\n", 480 | "Actual: No Cancer\n", 481 | "Predcited: No Cancer\n", 482 | "Patient: dc5cd907d9de1ed0609832f5bf1fc6e2\n", 483 | "Actual: No Cancer\n", 484 | "Predcited: Cancer\n", 485 | "Patient: dc66d11755fd073a59743d0df6b62ee2\n", 486 | "Actual: No Cancer\n", 487 | "Predcited: Cancer\n", 488 | "Patient: dc9854bcdcc71b690d9806438009001d\n", 489 | "Actual: Cancer\n", 490 | "Predcited: No Cancer\n", 491 | "Patient: dcb426dd025b609489c8f520d6d644b7\n", 492 | "Actual: No Cancer\n", 493 | "Predcited: No Cancer\n", 494 | "Patient: dcde02d4757bb845376fa6dbb0351df6\n", 495 | "Actual: No Cancer\n", 496 | "Predcited: No Cancer\n", 497 | "Patient: dcdf0b64b314e08e8f71f3bec9ecb080\n", 498 | "Actual: No Cancer\n", 499 | "Predcited: No Cancer\n", 500 | "Patient: 
dcf5fd36b9fff9183f63df286bf8eef9\n", 501 | "Actual: No Cancer\n", 502 | "Predcited: No Cancer\n", 503 | "Patient: dcf75f484b2d2712e5033ba61fd6e2a0\n", 504 | "Actual: No Cancer\n", 505 | "Predcited: Cancer\n", 506 | "Patient: dd281294b34eb6deb76ef9f38169d50e\n", 507 | "Actual: No Cancer\n", 508 | "Predcited: Cancer\n", 509 | "Patient: dd571c3949cdae0b59fc0542bb23e06a\n", 510 | "Actual: No Cancer\n", 511 | "Predcited: Cancer\n", 512 | "Patient: dd5764803d51c71a27707d9db8c84aac\n", 513 | "Actual: No Cancer\n", 514 | "Predcited: No Cancer\n", 515 | "Patient: de04fbf5e6c2389f0d039398bdcda971\n", 516 | "Actual: No Cancer\n", 517 | "Predcited: No Cancer\n", 518 | "Patient: de4d3724030397e71d2ac2ab16df5fba\n", 519 | "Actual: No Cancer\n", 520 | "Predcited: Cancer\n", 521 | "Patient: de635c85f320131ee743733bb04e65b9\n", 522 | "Actual: No Cancer\n", 523 | "Predcited: Cancer\n", 524 | "Patient: de881c07adc8d53e52391fac066ccb9f\n", 525 | "Actual: No Cancer\n", 526 | "Predcited: No Cancer\n", 527 | "Patient: de9f65a7a70b73ce2ffef1e4a2613eee\n", 528 | "Actual: No Cancer\n", 529 | "Predcited: Cancer\n", 530 | "Patient: df015da931ad5312ee7b24b201b67478\n", 531 | "Actual: Cancer\n", 532 | "Predcited: No Cancer\n", 533 | "Patient: df1354de25723c9a55e1241d4c40ffe2\n", 534 | "Actual: No Cancer\n", 535 | "Predcited: No Cancer\n", 536 | "Patient: df54dc42705decd3f75ec8fd8040e76e\n", 537 | "Actual: No Cancer\n", 538 | "Predcited: No Cancer\n", 539 | "Patient: df75d5a21b4289e8df6e2d0e135ac48f\n", 540 | "Actual: No Cancer\n", 541 | "Predcited: No Cancer\n", 542 | "Patient: df761dd787bfc439890740ccce934f36\n", 543 | "Actual: Cancer\n", 544 | "Predcited: Cancer\n", 545 | "Patient: df8614fd49a196123c5b88584dd5dd65\n", 546 | "Actual: No Cancer\n", 547 | "Predcited: No Cancer\n", 548 | "Patient: e00832e96709eb85f8e0e608ca02c2b5\n", 549 | "Actual: Cancer\n", 550 | "Predcited: No Cancer\n", 551 | "Patient: e10c2b829c39d4a500c09caf04d461a1\n", 552 | "Actual: Cancer\n", 553 | "Predcited: No 
Cancer\n", 554 | "Patient: e127111e994be5f79bb0cea52c9d563e\n", 555 | "Actual: No Cancer\n", 556 | "Predcited: No Cancer\n", 557 | "Patient: e129305f6d074d08cd2de0ebdfeaa576\n", 558 | "Actual: No Cancer\n", 559 | "Predcited: No Cancer\n", 560 | "Patient: e1584618a0c72f124fe618e1ed9b3e55\n", 561 | "Actual: No Cancer\n", 562 | "Predcited: Cancer\n", 563 | "Patient: e163325ccf00afde107c80dfce2bce80\n", 564 | "Actual: No Cancer\n", 565 | "Predcited: No Cancer\n", 566 | "Patient: e188bdeea72bb41d980dc2556dc8aafa\n", 567 | "Actual: No Cancer\n", 568 | "Predcited: No Cancer\n", 569 | "Patient: e1c92d3f85a37bd8bb963345b6d66e03\n", 570 | "Actual: No Cancer\n", 571 | "Predcited: No Cancer\n", 572 | "Patient: e1e47812eecd80466cf7f5b0160de446\n", 573 | "Actual: No Cancer\n", 574 | "Predcited: No Cancer\n", 575 | "Patient: e1f3a01e73d706b7e9c30c0a17a4c0b5\n", 576 | "Actual: No Cancer\n", 577 | "Predcited: No Cancer\n", 578 | "Patient: e2a7eaebd0830061e77690aa48f11936\n", 579 | "Actual: No Cancer\n", 580 | "Predcited: No Cancer\n", 581 | "Patient: e2b7fe7fbb002029640c0e65e3051888\n", 582 | "Actual: Cancer\n", 583 | "Predcited: Cancer\n", 584 | "Patient: e2bcbfe1ab0f9ddc5d6234f819cd5df5\n", 585 | "Actual: No Cancer\n", 586 | "Predcited: No Cancer\n", 587 | "Patient: e2ea2f046495909ff89e18e05f710fee\n", 588 | "Actual: No Cancer\n", 589 | "Predcited: No Cancer\n", 590 | "Patient: e3034ac9c2799b9a9cee2111593d9853\n", 591 | "Actual: No Cancer\n", 592 | "Predcited: No Cancer\n", 593 | "Patient: e3423505ef6b43f03c5d7bde52a5a78c\n", 594 | "Actual: Cancer\n", 595 | "Predcited: No Cancer\n", 596 | "Patient: e38789c5eabb3005bfb82a5298055ba0\n", 597 | "Actual: Cancer\n", 598 | "Predcited: Cancer\n", 599 | "Patient: e3a9a6f8d21c6c459728066bcf18c615\n", 600 | "Actual: No Cancer\n", 601 | "Predcited: No Cancer\n", 602 | "Patient: e3e518324e1a85b85f15d9127ed9ea89\n", 603 | "Actual: No Cancer\n", 604 | "Predcited: No Cancer\n", 605 | "Patient: e414153d0e52f70cbe27c129911445a0\n", 606 | "Actual: 
No Cancer\n", 607 | "Predcited: No Cancer\n", 608 | "Patient: e42815372aa308f5943847ad06f529de\n", 609 | "Actual: No Cancer\n", 610 | "Predcited: No Cancer\n", 611 | "Patient: e43afa905c8e279f818b2d5104f6762b\n", 612 | "Actual: Cancer\n", 613 | "Predcited: No Cancer\n", 614 | "Patient: e4421d2d5318845c1cccbc6fa308a96e\n", 615 | "Actual: No Cancer\n", 616 | "Predcited: No Cancer\n", 617 | "Patient: e4436b5914162ff7efea2bdfb71c19ae\n", 618 | "Actual: No Cancer\n", 619 | "Predcited: Cancer\n", 620 | "Patient: e46973b13a7a6f421430d81fc1dda970\n", 621 | "Actual: No Cancer\n", 622 | "Predcited: No Cancer\n", 623 | "Patient: e4a87107f94e4a8e32b735d18cef1137\n", 624 | "Actual: No Cancer\n", 625 | "Predcited: No Cancer\n", 626 | "Patient: e4ff18b33b7110a64f497e177102f23d\n", 627 | "Actual: No Cancer\n", 628 | "Predcited: No Cancer\n", 629 | "Patient: e537c91cdfa97d20a39df7ef04a52570\n", 630 | "Actual: Cancer\n", 631 | "Predcited: No Cancer\n", 632 | "Patient: e5438d842118e579a340a78f3c5775cc\n", 633 | "Actual: No Cancer\n", 634 | "Predcited: No Cancer\n", 635 | "Patient: e54b574a7e7c650edc224cbdede9e675\n", 636 | "Actual: Cancer\n", 637 | "Predcited: No Cancer\n", 638 | "Patient: e56b9f25a47a42f4ae4085005c46109c\n", 639 | "Actual: Cancer\n", 640 | "Predcited: No Cancer\n", 641 | "Patient: e572e978c2b50aca781e6302937e5b13\n", 642 | "Actual: Cancer\n", 643 | "Predcited: Cancer\n", 644 | "Patient: e58b78dc31d80a50285816f4ecd661e3\n", 645 | "Actual: Cancer\n", 646 | "Predcited: No Cancer\n", 647 | "Patient: e58cc57cab8a1738041b72b156fedc56\n", 648 | "Actual: No Cancer\n", 649 | "Predcited: No Cancer\n", 650 | "Patient: e5c68cfa0f33540da3098800f0daae2c\n", 651 | "Actual: Cancer\n", 652 | "Predcited: No Cancer\n", 653 | "Patient: e5cf847e616cc2fe94816ffa547d2614\n", 654 | "Actual: Cancer\n", 655 | "Predcited: No Cancer\n", 656 | "Patient: e608c0e6cf3adf3c9939593a3c322ef7\n", 657 | "Actual: No Cancer\n", 658 | "Predcited: No Cancer\n", 659 | "Patient: 
e6214ef879c6d01ae598161e50e23c0c\n", 660 | "Actual: No Cancer\n", 661 | "Predcited: No Cancer\n", 662 | "Patient: e63f43056330bc418a11208aa3a9e7f0\n", 663 | "Actual: No Cancer\n", 664 | "Predcited: No Cancer\n", 665 | "Patient: e659f6517c4df17e86d4d87181396ea6\n", 666 | "Actual: Cancer\n", 667 | "Predcited: No Cancer\n", 668 | "Patient: e67bc6cd24a71a486b626592d591a2da\n", 669 | "Actual: No Cancer\n", 670 | "Predcited: No Cancer\n", 671 | "Patient: e6b3e750c6c7a70ca512d77defcfe615\n", 672 | "Actual: Cancer\n", 673 | "Predcited: No Cancer\n", 674 | "Patient: e6d4a747235bfcc1feac759571c8485c\n", 675 | "Actual: No Cancer\n", 676 | "Predcited: No Cancer\n", 677 | "Patient: e6d8b2631843a24e6761f2723ea30788\n", 678 | "Actual: No Cancer\n", 679 | "Predcited: No Cancer\n", 680 | "Patient: e6f4757b8f315f31559c5c256cb8dead\n", 681 | "Actual: No Cancer\n", 682 | "Predcited: No Cancer\n", 683 | "Patient: e709901da9ba15a95d4a29906edc01dd\n", 684 | "Actual: Cancer\n", 685 | "Predcited: No Cancer\n", 686 | "Patient: e787e5fd289a9f1f6bba31569b7ad384\n", 687 | "Actual: No Cancer\n", 688 | "Predcited: No Cancer\n", 689 | "Patient: e79f52e833ccca893509f0fdeeb26e9f\n", 690 | "Actual: No Cancer\n", 691 | "Predcited: No Cancer\n", 692 | "Patient: e7adb2e4409683b9490e34b6b3604d9e\n", 693 | "Actual: No Cancer\n", 694 | "Predcited: Cancer\n", 695 | "Patient: e7cb27a5362a7098e1437bfb1224d2dc\n", 696 | "Actual: No Cancer\n", 697 | "Predcited: No Cancer\n", 698 | "Patient: e7d76f0723911280b64f0f83a4990c97\n", 699 | "Actual: No Cancer\n", 700 | "Predcited: No Cancer\n", 701 | "Patient: e858263b89f0bb57597bcff325eaeecf\n", 702 | "Actual: No Cancer\n", 703 | "Predcited: No Cancer\n", 704 | "Patient: e8be143b9f5e352f71043b24f79f5a17\n", 705 | "Actual: No Cancer\n", 706 | "Predcited: No Cancer\n", 707 | "Patient: e8eb842ee04bbad407f85fe671f24d4f\n", 708 | "Actual: Cancer\n", 709 | "Predcited: No Cancer\n", 710 | "Patient: e92a2ed80510513497d5252b001cfa3e\n", 711 | "Actual: No Cancer\n", 712 | 
"Predcited: No Cancer\n", 713 | "Patient: e977737394cee9abb19ad07310aae8eb\n", 714 | "Actual: No Cancer\n", 715 | "Predcited: No Cancer\n", 716 | "Patient: e9ccf1ce85c39779fafb9ec703c71555\n", 717 | "Actual: Cancer\n", 718 | "Predcited: No Cancer\n", 719 | "Patient: ea7373271a2441b5864df2053c0f5c3e\n", 720 | "Actual: Cancer\n", 721 | "Predcited: No Cancer\n", 722 | "Patient: eacb38abacf1214f3b456b6c9fa78697\n", 723 | "Actual: No Cancer\n", 724 | "Predcited: No Cancer\n", 725 | "Patient: ead64f9269f2200e1d439960a1e069b4\n", 726 | "Actual: No Cancer\n", 727 | "Predcited: No Cancer\n", 728 | "Patient: eaf753dc137e12fd06e96d27f3111043\n", 729 | "Actual: Cancer\n", 730 | "Predcited: No Cancer\n", 731 | "Patient: eb008af181f3791fdce2376cf4773733\n", 732 | "Actual: Cancer\n", 733 | "Predcited: Cancer\n", 734 | "Patient: eb8d5136918d6859ca3cc3abafe369ac\n", 735 | "Actual: No Cancer\n", 736 | "Predcited: No Cancer\n", 737 | "Patient: eba18d04b18084ef64be8f22bb7905ca\n", 738 | "Actual: No Cancer\n", 739 | "Predcited: No Cancer\n", 740 | "Patient: eba4bfb93928d424ff21b5be96b5c09b\n", 741 | "Actual: No Cancer\n", 742 | "Predcited: No Cancer\n", 743 | "Patient: ebd601d40a18634b100c92e7db39f585\n", 744 | "Actual: Cancer\n", 745 | "Predcited: No Cancer\n", 746 | "Patient: ed0f3c1619b2becec76ba5df66e1ea56\n", 747 | "Actual: Cancer\n", 748 | "Predcited: No Cancer\n", 749 | "Patient: ed49b57854f5580658fb3510676e03dd\n", 750 | "Actual: Cancer\n", 751 | "Predcited: Cancer\n", 752 | "Patient: ed83b655a1bbad40a782ad13cf27ce8f\n", 753 | "Actual: No Cancer\n", 754 | "Predcited: No Cancer\n", 755 | "Patient: eda58f4918c4b506cd156702bf8a56a3\n", 756 | "Actual: No Cancer\n", 757 | "Predcited: No Cancer\n", 758 | "Patient: edad1a7e85b5443e0ae9e654d2adbcba\n", 759 | "Actual: No Cancer\n", 760 | "Predcited: No Cancer\n", 761 | "Patient: edae2e1edd1217d0c9e20eff2a7b2dd8\n", 762 | "Actual: No Cancer\n", 763 | "Predcited: No Cancer\n", 764 | "Patient: edbf53a8478049de1494b213fdf942e6\n", 765 | 
"Actual: No Cancer\n", 766 | "Predcited: Cancer\n", 767 | "Patient: ee71210fa398cbb080f6c537a503e806\n", 768 | "Actual: No Cancer\n", 769 | "Predcited: Cancer\n", 770 | "Patient: ee88217bee233a3bfc971b450e3d8b85\n", 771 | "Actual: No Cancer\n", 772 | "Predcited: No Cancer\n", 773 | "Patient: ee984e8fba88691aac4992fbb14f6e97\n", 774 | "Actual: No Cancer\n", 775 | "Predcited: Cancer\n", 776 | "Patient: ee9c580272cd02741df7299892602ac7\n", 777 | "Actual: No Cancer\n", 778 | "Predcited: No Cancer\n", 779 | "Predicted 0 1\n", 780 | "Actual \n", 781 | "0 56 17\n", 782 | "1 18 9\n" 783 | ] 784 | }, 785 | { 786 | "name": "stderr", 787 | "output_type": "stream", 788 | "text": [ 789 | "C:\\Users\\ashwa\\Anaconda3\\envs\\tensorflow-gpu\\lib\\site-packages\\matplotlib\\collections.py:590: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison\n", 790 | " if self._edgecolors == str('face'):\n" 791 | ] 792 | } 793 | ], 794 | "source": [ 795 | "def convolution3d(x, W):\n", 796 | " return tf.nn.conv3d(x, W, strides=[1, 1, 1, 1, 1], padding='SAME')\n", 797 | "\n", 798 | "\n", 799 | "def maxpooling3d(x):\n", 800 | " return tf.nn.max_pool3d(x, ksize=[1, 2, 2, 2, 1], strides=[1, 2, 2, 2, 1], padding='SAME')\n", 801 | "\n", 802 | "\n", 803 | "def cnn(x):\n", 804 | " x = tf.reshape(x, shape=[-1, size, size, NoSlices, 1])\n", 805 | " convolution1 = tf.nn.relu(\n", 806 | " convolution3d(x, tf.Variable(tf.random_normal([3, 3, 3, 1, 32]))) + tf.Variable(tf.random_normal([32])))\n", 807 | " convolution1 = maxpooling3d(convolution1)\n", 808 | " convolution2 = tf.nn.relu(\n", 809 | " convolution3d(convolution1, tf.Variable(tf.random_normal([3, 3, 3, 32, 64]))) + tf.Variable(\n", 810 | " tf.random_normal([64])))\n", 811 | " convolution2 = maxpooling3d(convolution2)\n", 812 | " convolution3 = tf.nn.relu(\n", 813 | " convolution3d(convolution2, tf.Variable(tf.random_normal([3, 3, 3, 64, 128]))) + tf.Variable(\n", 814 | " 
tf.random_normal([128])))\n", 815 | " convolution3 = maxpooling3d(convolution3)\n", 816 | " convolution4 = tf.nn.relu(\n", 817 | " convolution3d(convolution3, tf.Variable(tf.random_normal([3, 3, 3, 128, 256]))) + tf.Variable(\n", 818 | " tf.random_normal([256])))\n", 819 | " convolution4 = maxpooling3d(convolution4)\n", 820 | " convolution5 = tf.nn.relu(\n", 821 | " convolution3d(convolution4, tf.Variable(tf.random_normal([3, 3, 3, 256, 512]))) + tf.Variable(\n", 822 | " tf.random_normal([512])))\n", 823 | " convolution5 = maxpooling3d(convolution4)\n", 824 | " fullyconnected = tf.reshape(convolution5, [-1, 1024])\n", 825 | " fullyconnected = tf.nn.relu(\n", 826 | " tf.matmul(fullyconnected, tf.Variable(tf.random_normal([1024, 1024]))) + tf.Variable(tf.random_normal([1024])))\n", 827 | " fullyconnected = tf.nn.dropout(fullyconnected, keep_rate)\n", 828 | " output = tf.matmul(fullyconnected, tf.Variable(tf.random_normal([1024, 2]))) + tf.Variable(tf.random_normal([2]))\n", 829 | " return output\n", 830 | "\n", 831 | "\n", 832 | "def network(x):\n", 833 | " prediction = cnn(x)\n", 834 | " cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y))\n", 835 | " optimizer = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(cost)\n", 836 | " epochs = 10\n", 837 | " with tf.Session() as session:\n", 838 | " session.run(tf.global_variables_initializer())\n", 839 | " for epoch in range(epochs):\n", 840 | " epoch_loss = 0\n", 841 | " for data in trainingData:\n", 842 | " try:\n", 843 | " X = data[0]\n", 844 | " Y = data[1]\n", 845 | " _, c = session.run([optimizer, cost], feed_dict={x: X, y: Y})\n", 846 | " epoch_loss += c\n", 847 | " except Exception as e:\n", 848 | " pass\n", 849 | "\n", 850 | " correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))\n", 851 | " # if tf.argmax(prediction, 1) == 0:\n", 852 | " accuracy = tf.reduce_mean(tf.cast(correct, 'float'))\n", 853 | " print('Epoch', epoch + 1, 'completed out of', epochs, 
'loss:', epoch_loss)\n", 854 | " # print('Correct:',correct.eval({x:[i[0] for i in validationData], y:[i[1] for i in validationData]}))\n", 855 | " print('Accuracy:', accuracy.eval({x: [i[0] for i in validationData], y: [i[1] for i in validationData]}))\n", 856 | " print('Final Accuracy:', accuracy.eval({x: [i[0] for i in validationData], y: [i[1] for i in validationData]}))\n", 857 | " patients = []\n", 858 | " actual = []\n", 859 | " predicted = []\n", 860 | "\n", 861 | " finalprediction = tf.argmax(prediction, 1)\n", 862 | " actualprediction = tf.argmax(y, 1)\n", 863 | " for i in range(len(validationData)):\n", 864 | " patients.append(validationData[i][2])\n", 865 | " for i in finalprediction.eval({x: [i[0] for i in validationData], y: [i[1] for i in validationData]}):\n", 866 | " if(i==1):\n", 867 | " predicted.append(\"Cancer\")\n", 868 | " else:\n", 869 | " predicted.append(\"No Cancer\")\n", 870 | " for i in actualprediction.eval({x: [i[0] for i in validationData], y: [i[1] for i in validationData]}):\n", 871 | " if(i==1):\n", 872 | " actual.append(\"Cancer\")\n", 873 | " else:\n", 874 | " actual.append(\"No Cancer\")\n", 875 | " for i in range(len(patients)):\n", 876 | " print(\"Patient: \",patients[i])\n", 877 | " print(\"Actual: \", actual[i])\n", 878 | " print(\"Predcited: \", predicted[i])\n", 879 | "\n", 880 | " from sklearn.metrics import confusion_matrix\n", 881 | " y_actual = pd.Series(\n", 882 | " (actualprediction.eval({x: [i[0] for i in validationData], y: [i[1] for i in validationData]})),\n", 883 | " name='Actual')\n", 884 | " y_predicted = pd.Series(\n", 885 | " (finalprediction.eval({x: [i[0] for i in validationData], y: [i[1] for i in validationData]})),\n", 886 | " name='Predicted')\n", 887 | " df_confusion = pd.crosstab(y_actual, y_predicted)\n", 888 | " print(df_confusion)\n", 889 | "\n", 890 | " ## Function to plot confusion matrix\n", 891 | " def plot_confusion_matrix(df_confusion, title='Confusion matrix', cmap=plt.cm.gray_r):\\\n", 
892 | " \n", 893 | " plt.matshow(df_confusion, cmap=cmap) # imshow \n", 894 | " # plt.title(title)\n", 895 | " plt.colorbar()\n", 896 | " tick_marks = np.arange(len(df_confusion.columns))\n", 897 | " plt.xticks(tick_marks, df_confusion.columns, rotation=45)\n", 898 | " plt.yticks(tick_marks, df_confusion.index)\n", 899 | " # plt.tight_layout()\n", 900 | " plt.ylabel(df_confusion.index.name)\n", 901 | " plt.xlabel(df_confusion.columns.name)\n", 902 | " plt.show()\n", 903 | " plot_confusion_matrix(df_confusion)\n", 904 | " # print(y_true,y_pred)\n", 905 | " # print(confusion_matrix(y_true, y_pred))\n", 906 | " # print(actualprediction.eval({x:[i[0] for i in validationData], y:[i[1] for i in validationData]}))\n", 907 | " # print(finalprediction.eval({x:[i[0] for i in validationData], y:[i[1] for i in validationData]}))\n", 908 | "network(x)" 909 | ] 910 | }, 911 | { 912 | "cell_type": "markdown", 913 | "metadata": {}, 914 | "source": [ 915 | "After training for 10 epochs, the accuracy is 61%. The confusion matrix shows 56 true negatives out of 100 validation images. False negatives account for 18 of the 100, which is a serious concern: in cancer prediction a missed case is the most harmful error, so false negatives should be avoided as far as possible." 
916 | ] 917 | }, 918 | { 919 | "cell_type": "markdown", 920 | "metadata": {}, 921 | "source": [ 922 | "## Detecting whether a patient has cancer or not" 923 | ] 924 | }, 925 | { 926 | "cell_type": "code", 927 | "execution_count": 6, 928 | "metadata": { 929 | "collapsed": false 930 | }, 931 | "outputs": [ 932 | { 933 | "name": "stdout", 934 | "output_type": "stream", 935 | "text": [ 936 | "Patient: ffe02fe7d2223743f7fb455dfaff3842\n", 937 | "Predcited: No Cancer\n" 938 | ] 939 | } 940 | ], 941 | "source": [ 942 | "import tensorflow as tf\n", 943 | "import pandas as pd\n", 944 | "import tflearn\n", 945 | "from tflearn.layers.conv import conv_3d, max_pool_3d\n", 946 | "from tflearn.layers.core import input_data, dropout, fully_connected\n", 947 | "from tflearn.layers.estimator import regression\n", 948 | "import numpy as np\n", 949 | "import pandas as pd\n", 950 | "import matplotlib.pyplot as plt\n", 951 | "\n", 952 | "imageData = np.load('imageDataNew-50-50-20.npy')\n", 953 | "trainingData = imageData[0:800]\n", 954 | "testData = imageData[-1:]\n", 955 | "\n", 956 | "x = tf.placeholder('float')\n", 957 | "y = tf.placeholder('float')\n", 958 | "size = 50\n", 959 | "keep_rate = 0.8\n", 960 | "NoSlices = 20\n", 961 | "\n", 962 | "\n", 963 | "def convolution3d(x, W):\n", 964 | " return tf.nn.conv3d(x, W, strides=[1, 1, 1, 1, 1], padding='SAME')\n", 965 | "\n", 966 | "\n", 967 | "def maxpooling3d(x):\n", 968 | " return tf.nn.max_pool3d(x, ksize=[1, 2, 2, 2, 1], strides=[1, 2, 2, 2, 1], padding='SAME')\n", 969 | "\n", 970 | "\n", 971 | "def cnn(x):\n", 972 | " x = tf.reshape(x, shape=[-1, size, size, NoSlices, 1])\n", 973 | " convolution1 = tf.nn.relu(\n", 974 | " convolution3d(x, tf.Variable(tf.random_normal([3, 3, 3, 1, 32]))) + tf.Variable(tf.random_normal([32])))\n", 975 | " convolution1 = maxpooling3d(convolution1)\n", 976 | " convolution2 = tf.nn.relu(\n", 977 | " convolution3d(convolution1, tf.Variable(tf.random_normal([3, 3, 3, 32, 64]))) + tf.Variable(\n", 978 | " 
tf.random_normal([64])))\n", 979 | " convolution2 = maxpooling3d(convolution2)\n", 980 | " convolution3 = tf.nn.relu(\n", 981 | " convolution3d(convolution2, tf.Variable(tf.random_normal([3, 3, 3, 64, 128]))) + tf.Variable(\n", 982 | " tf.random_normal([128])))\n", 983 | " convolution3 = maxpooling3d(convolution3)\n", 984 | " convolution4 = tf.nn.relu(\n", 985 | " convolution3d(convolution3, tf.Variable(tf.random_normal([3, 3, 3, 128, 256]))) + tf.Variable(\n", 986 | " tf.random_normal([256])))\n", 987 | " convolution4 = maxpooling3d(convolution4)\n", 988 | " convolution5 = tf.nn.relu(\n", 989 | " convolution3d(convolution4, tf.Variable(tf.random_normal([3, 3, 3, 256, 512]))) + tf.Variable(\n", 990 | " tf.random_normal([512])))\n", 991 | " convolution5 = maxpooling3d(convolution4)\n", 992 | " fullyconnected = tf.reshape(convolution5, [-1, 1024])\n", 993 | " fullyconnected = tf.nn.relu(\n", 994 | " tf.matmul(fullyconnected, tf.Variable(tf.random_normal([1024, 1024]))) + tf.Variable(tf.random_normal([1024])))\n", 995 | " fullyconnected = tf.nn.dropout(fullyconnected, keep_rate)\n", 996 | " output = tf.matmul(fullyconnected, tf.Variable(tf.random_normal([1024, 2]))) + tf.Variable(tf.random_normal([2]))\n", 997 | " return output\n", 998 | "\n", 999 | "\n", 1000 | "def network(x):\n", 1001 | " prediction = cnn(x)\n", 1002 | " cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y))\n", 1003 | " optimizer = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(cost)\n", 1004 | " epochs = 10\n", 1005 | " with tf.Session() as session:\n", 1006 | " session.run(tf.global_variables_initializer())\n", 1007 | " for epoch in range(epochs):\n", 1008 | " epoch_loss = 0\n", 1009 | " for data in trainingData:\n", 1010 | " try:\n", 1011 | " X = data[0]\n", 1012 | " Y = data[1]\n", 1013 | " _, c = session.run([optimizer, cost], feed_dict={x: X, y: Y})\n", 1014 | " epoch_loss += c\n", 1015 | " except Exception as e:\n", 1016 | " pass\n", 1017 | "\n", 
1018 | " patients = []\n", 1019 | " actual = []\n", 1020 | " predicted = []\n", 1021 | "\n", 1022 | " finalprediction = tf.argmax(prediction, 1)\n", 1023 | " actualprediction = tf.argmax(y, 1)\n", 1024 | " for i in range(len(testData)):\n", 1025 | " patients.append(testData[i][2])\n", 1026 | " for i in finalprediction.eval({x: [i[0] for i in testData], y: [i[1] for i in testData]}):\n", 1027 | " if(i==1):\n", 1028 | " predicted.append(\"Cancer\")\n", 1029 | " else:\n", 1030 | " predicted.append(\"No Cancer\")\n", 1031 | "\n", 1032 | " for i in range(len(patients)):\n", 1033 | " print(\"Patient: \",patients[i])\n", 1034 | " print(\"Predcited: \", predicted[i])\n", 1035 | "\n", 1036 | " \n", 1037 | "network(x)" 1038 | ] 1039 | }, 1040 | { 1041 | "cell_type": "markdown", 1042 | "metadata": {}, 1043 | "source": [ 1044 | "For patient ID ffe02fe7d2223743f7fb455dfaff3842, the model's prediction is: No Cancer." 1045 | ] 1046 | } 1047 | ], 1048 | "metadata": { 1049 | "anaconda-cloud": {}, 1050 | "kernelspec": { 1051 | "display_name": "Python [default]", 1052 | "language": "python", 1053 | "name": "python3" 1054 | }, 1055 | "language_info": { 1056 | "codemirror_mode": { 1057 | "name": "ipython", 1058 | "version": 3 1059 | }, 1060 | "file_extension": ".py", 1061 | "mimetype": "text/x-python", 1062 | "name": "python", 1063 | "nbconvert_exporter": "python", 1064 | "pygments_lexer": "ipython3", 1065 | "version": "3.5.2" 1066 | } 1067 | }, 1068 | "nbformat": 4, 1069 | "nbformat_minor": 2 1070 | } 1071 | -------------------------------------------------------------------------------- /Project Report.docx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/srujanielango/Lung-Cancer-Detection-using-3D-Convolutional-Neural-Networks/fcbfdaeea2747487a33aeb66cb4474faeed76c10/Project Report.docx -------------------------------------------------------------------------------- /README.md: 
-------------------------------------------------------------------------------- 1 | # Implementing 3D CNN for Lung Cancer Detection 2 | 3 | • Download and install CUDA so that the GPU can be used for training, which speeds it up considerably. Also download cuDNN and copy the contents of its folder into the corresponding directories of the CUDA installation 4 | 5 | • Install Anaconda with Python 3.5 6 | 7 | • Create a conda environment from the command prompt and give it a descriptive name (for example, tensorflow-gpu). Follow the instructions on this page to set up TensorFlow with GPU support: https://www.tensorflow.org/install/install_windows 8 | 9 | • Activate the environment 10 | 11 | • Install the libraries listed below, which the notebook imports 12 | 13 | • OpenCV, Dicom, pandas, tensorflow, tflearn, numpy, os (part of the Python standard library), matplotlib, scikit-learn 14 | 15 | • Once the packages are installed, make sure the indentation of the code is preserved exactly, because Python is indentation-sensitive and a misaligned line causes errors 16 | 17 | • Open Jupyter Notebook from within the activated environment 18 | 19 | • Lung Cancer 3D CNN.ipynb contains the 3D CNN model to be trained, as well as the code that predicts whether an individual patient has cancer 20 | 21 | • Once the parameters for each layer of the network are specified, the model is created. Pass the training data to it to obtain the trained model 22 | 23 | • We then run the test data through the trained model to obtain an accuracy. If the accuracy is stagnant, this suggests that the model is overfitting considerably and that the available data is insufficient. Based on the trained model, we can predict whether each patient has cancer 24 | 25 | • The steps above are what is needed to run the code and reproduce the results as closely as possible 26 | --------------------------------------------------------------------------------
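A single accuracy figure can hide the errors that matter most when screening for cancer. As a minimal sketch (not part of the notebook), the helper below turns a 2x2 confusion matrix into accuracy, sensitivity and specificity. The name screening_metrics is hypothetical, and the four counts are an assumption: they are inferred from the validation figures reported in the notebook (61% accuracy, 56 true negatives and 18 false negatives out of 100), which would imply 5 true positives and 21 false positives.

```python
def screening_metrics(tp, fp, tn, fn):
    """Summarize a 2x2 confusion matrix for a cancer / no-cancer classifier."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return {
        # Share of all patients that were classified correctly.
        "accuracy": (tp + tn) / total,
        # Sensitivity (recall): share of actual cancer cases that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of cancer-free patients that were correctly cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Assumed counts, inferred from the notebook's reported validation results.
metrics = screening_metrics(tp=5, fp=21, tn=56, fn=18)
for name in ("accuracy", "sensitivity", "specificity"):
    print("{}: {:.3f}".format(name, metrics[name]))
```

Under these assumed counts the sensitivity is much lower than the accuracy, which is why false negatives are worth tracking separately from the overall score.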