├── README.md
├── assets
│   └── feats.jpg
├── cats_n_dogs.ipynb
├── cats_n_dogs_BN.ipynb
├── img_clf.py
└── vgg_bn.py

/README.md:
--------------------------------------------------------------------------------
# Keras Image Classification

Classifies an image as containing either a dog or a cat (using Kaggle's public dataset), but could easily be extended to other image classification problems.

To run these scripts/notebooks, you must have keras, numpy, scipy, and h5py installed; enabling GPU acceleration is highly recommended if that's an option.

## img_clf.py
After playing around with the hyperparameters a bit, this script reaches around 96-98% accuracy on the validation data, and when tested on Kaggle's hidden test data it achieved a log loss score of around 0.18.

Most of the code/strategy here was based on this Keras tutorial.

Pre-trained VGG16 model weights can be downloaded here.

The data directory structure I used was:

* project
    * data
        * train
            * dogs
            * cats
        * validation
            * dogs
            * cats
        * test
            * test

## cats_n_dogs.ipynb
This produced a slightly better score (0.161 log loss on the Kaggle test set). The improvement most likely comes from using larger images and ensembling a few models, despite the fact that there is no image augmentation in the notebook.

You might run into memory errors because of the large image dimensions; if so, reducing the number of folds and saving the model weights to disk rather than keeping the models in memory should do the trick. The notebook uses a slightly flatter directory structure, with the validation split happening after the images are loaded:

* project
    * data
        * train
            * dogs
            * cats
        * test
            * test

## cats_n_dogs_BN.ipynb
This produced the best score (0.069 log loss without any ensembling). The notebook incorporates some of the techniques from Jeremy Howard's deep learning class, with the inclusion of batch normalization being the biggest factor. I also added extra augmentation to the prediction step, which greatly improved performance.

Pre-trained model weights for VGG16 w/ batch normalization can be downloaded here.

The Vgg16BN class is defined in vgg_bn.py, and the data directory structure used was (note the valid folder name, matching valid_path in the notebook):

* project
    * data
        * train
            * dogs
            * cats
        * valid
            * dogs
            * cats
        * test
            * test
--------------------------------------------------------------------------------
/assets/feats.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/rdcolema/keras-image-classification/9d4b678d33410a7c848dee85e86be5d624de7046/assets/feats.jpg
--------------------------------------------------------------------------------
/cats_n_dogs_BN.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Image Classification of Dogs vs. 
Cats Using CNN Ensemble" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Imports & environment" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 1, 20 | "metadata": { 21 | "collapsed": false 22 | }, 23 | "outputs": [ 24 | { 25 | "name": "stderr", 26 | "output_type": "stream", 27 | "text": [ 28 | "Using Theano backend.\n", 29 | "Using gpu device 0: GeForce GTX 980M (CNMeM is enabled with initial size: 90.0% of memory, cuDNN 5105)\n", 30 | "/home/robert/anaconda3/lib/python3.5/site-packages/theano/sandbox/cuda/__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.\n", 31 | " warnings.warn(warn)\n", 32 | "/home/robert/anaconda3/lib/python3.5/site-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.\n", 33 | " warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')\n", 34 | "/home/robert/anaconda3/lib/python3.5/site-packages/matplotlib/font_manager.py:273: UserWarning: Matplotlib is building the font cache using fc-list. This may take a moment.\n", 35 | " warnings.warn('Matplotlib is building the font cache using fc-list. This may take a moment.')\n" 36 | ] 37 | } 38 | ], 39 | "source": [ 40 | "import os\n", 41 | "import numpy as np\n", 42 | "\n", 43 | "from glob import glob\n", 44 | "from shutil import copyfile\n", 45 | "from vgg_bn import Vgg16BN\n", 46 | "from keras.callbacks import ModelCheckpoint\n", 47 | "\n", 48 | "ROOT_DIR = os.getcwd()\n", 49 | "DATA_HOME_DIR = ROOT_DIR + '/data'\n", 50 | "%matplotlib inline" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "Config & Hyperparameters" 58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": 10, 63 | "metadata": { 64 | "collapsed": true 65 | }, 66 | "outputs": [], 67 | "source": [ 68 | "# paths\n", 69 | "data_path = DATA_HOME_DIR + '/' \n", 70 | "train_path = data_path + '/train/'\n", 71 | "valid_path = data_path + '/valid/'\n", 72 | "test_path = DATA_HOME_DIR + '/test/'\n", 73 | "model_path = ROOT_DIR + '/models/'\n", 74 | "submission_path = ROOT_DIR + '/submissions/'\n", 75 | "\n", 76 | "# data\n", 77 | "img_width, img_height = 224, 224\n", 78 | "batch_size = 64\n", 79 | "nb_train_samples = 23000\n", 80 | "nb_valid_samples = 2000\n", 81 | "nb_test_samples = 12500\n", 82 | "classes = [\"cats\", \"dogs\"]\n", 83 | "n_classes = len(classes)\n", 84 | "\n", 85 | "# model\n", 86 | "nb_epoch = 10\n", 87 | "nb_aug = 5\n", 88 | "lr = 0.001" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "Build the VGG model w/ Batch Normalization" 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": 3, 101 | "metadata": { 102 | "collapsed": false, 103 | "scrolled": true 104 | }, 105 | "outputs": [ 106 | { 107 | "name": "stdout", 108 | "output_type": "stream", 109 | "text": [ 110 | "____________________________________________________________________________________________________\n", 111 | "Layer (type) Output Shape Param # Connected to \n", 112 | "====================================================================================================\n", 113 | "lambda_1 (Lambda) (None, 3, 224, 224) 0 lambda_input_1[0][0] \n", 114 | 
"____________________________________________________________________________________________________\n", 115 | "zeropadding2d_1 (ZeroPadding2D) (None, 3, 226, 226) 0 lambda_1[0][0] \n", 116 | "____________________________________________________________________________________________________\n", 117 | "convolution2d_1 (Convolution2D) (None, 64, 224, 224) 0 zeropadding2d_1[0][0] \n", 118 | "____________________________________________________________________________________________________\n", 119 | "zeropadding2d_2 (ZeroPadding2D) (None, 64, 226, 226) 0 convolution2d_1[0][0] \n", 120 | "____________________________________________________________________________________________________\n", 121 | "convolution2d_2 (Convolution2D) (None, 64, 224, 224) 0 zeropadding2d_2[0][0] \n", 122 | "____________________________________________________________________________________________________\n", 123 | "maxpooling2d_1 (MaxPooling2D) (None, 64, 112, 112) 0 convolution2d_2[0][0] \n", 124 | "____________________________________________________________________________________________________\n", 125 | "zeropadding2d_3 (ZeroPadding2D) (None, 64, 114, 114) 0 maxpooling2d_1[0][0] \n", 126 | "____________________________________________________________________________________________________\n", 127 | "convolution2d_3 (Convolution2D) (None, 128, 112, 112) 0 zeropadding2d_3[0][0] \n", 128 | "____________________________________________________________________________________________________\n", 129 | "zeropadding2d_4 (ZeroPadding2D) (None, 128, 114, 114) 0 convolution2d_3[0][0] \n", 130 | "____________________________________________________________________________________________________\n", 131 | "convolution2d_4 (Convolution2D) (None, 128, 112, 112) 0 zeropadding2d_4[0][0] \n", 132 | "____________________________________________________________________________________________________\n", 133 | "maxpooling2d_2 (MaxPooling2D) (None, 128, 56, 56) 0 convolution2d_4[0][0] \n", 134 | "____________________________________________________________________________________________________\n", 135 | "zeropadding2d_5 (ZeroPadding2D) (None, 128, 58, 58) 0 maxpooling2d_2[0][0] \n", 136 | "____________________________________________________________________________________________________\n", 137 | "convolution2d_5 (Convolution2D) (None, 256, 56, 56) 0 zeropadding2d_5[0][0] \n", 138 | "____________________________________________________________________________________________________\n", 139 | "zeropadding2d_6 (ZeroPadding2D) (None, 256, 58, 58) 0 convolution2d_5[0][0] \n", 140 | "____________________________________________________________________________________________________\n", 141 | "convolution2d_6 (Convolution2D) (None, 256, 56, 56) 0 zeropadding2d_6[0][0] \n", 142 | "____________________________________________________________________________________________________\n", 143 | "zeropadding2d_7 (ZeroPadding2D) (None, 256, 58, 58) 0 convolution2d_6[0][0] \n", 144 | "____________________________________________________________________________________________________\n", 145 | "convolution2d_7 (Convolution2D) (None, 256, 56, 56) 0 zeropadding2d_7[0][0] \n", 146 | "____________________________________________________________________________________________________\n", 147 | "maxpooling2d_3 (MaxPooling2D) (None, 256, 28, 28) 0 convolution2d_7[0][0] \n", 148 | "____________________________________________________________________________________________________\n", 149 | "zeropadding2d_8 (ZeroPadding2D) (None, 256, 
30, 30) 0 maxpooling2d_3[0][0] \n", 150 | "____________________________________________________________________________________________________\n", 151 | "convolution2d_8 (Convolution2D) (None, 512, 28, 28) 0 zeropadding2d_8[0][0] \n", 152 | "____________________________________________________________________________________________________\n", 153 | "zeropadding2d_9 (ZeroPadding2D) (None, 512, 30, 30) 0 convolution2d_8[0][0] \n", 154 | "____________________________________________________________________________________________________\n", 155 | "convolution2d_9 (Convolution2D) (None, 512, 28, 28) 0 zeropadding2d_9[0][0] \n", 156 | "____________________________________________________________________________________________________\n", 157 | "zeropadding2d_10 (ZeroPadding2D) (None, 512, 30, 30) 0 convolution2d_9[0][0] \n", 158 | "____________________________________________________________________________________________________\n", 159 | "convolution2d_10 (Convolution2D) (None, 512, 28, 28) 0 zeropadding2d_10[0][0] \n", 160 | "____________________________________________________________________________________________________\n", 161 | "maxpooling2d_4 (MaxPooling2D) (None, 512, 14, 14) 0 convolution2d_10[0][0] \n", 162 | "____________________________________________________________________________________________________\n", 163 | "zeropadding2d_11 (ZeroPadding2D) (None, 512, 16, 16) 0 maxpooling2d_4[0][0] \n", 164 | "____________________________________________________________________________________________________\n", 165 | "convolution2d_11 (Convolution2D) (None, 512, 14, 14) 0 zeropadding2d_11[0][0] \n", 166 | "____________________________________________________________________________________________________\n", 167 | "zeropadding2d_12 (ZeroPadding2D) (None, 512, 16, 16) 0 convolution2d_11[0][0] \n", 168 | "____________________________________________________________________________________________________\n", 169 | "convolution2d_12 (Convolution2D) (None, 512, 14, 14) 0 zeropadding2d_12[0][0] \n", 170 | "____________________________________________________________________________________________________\n", 171 | "zeropadding2d_13 (ZeroPadding2D) (None, 512, 16, 16) 0 convolution2d_12[0][0] \n", 172 | "____________________________________________________________________________________________________\n", 173 | "convolution2d_13 (Convolution2D) (None, 512, 14, 14) 0 zeropadding2d_13[0][0] \n", 174 | "____________________________________________________________________________________________________\n", 175 | "maxpooling2d_5 (MaxPooling2D) (None, 512, 7, 7) 0 convolution2d_13[0][0] \n", 176 | "____________________________________________________________________________________________________\n", 177 | "flatten_1 (Flatten) (None, 25088) 0 maxpooling2d_5[0][0] \n", 178 | "____________________________________________________________________________________________________\n", 179 | "dense_1 (Dense) (None, 4096) 0 flatten_1[0][0] \n", 180 | "____________________________________________________________________________________________________\n", 181 | "batchnormalization_1 (BatchNormal(None, 4096) 0 dense_1[0][0] \n", 182 | "____________________________________________________________________________________________________\n", 183 | "dropout_1 (Dropout) (None, 4096) 0 batchnormalization_1[0][0] \n", 184 | "____________________________________________________________________________________________________\n", 185 | "dense_2 (Dense) (None, 4096) 0 dropout_1[0][0] \n", 186 | 
"____________________________________________________________________________________________________\n", 187 | "batchnormalization_2 (BatchNormal(None, 4096) 0 dense_2[0][0] \n", 188 | "____________________________________________________________________________________________________\n", 189 | "dropout_2 (Dropout) (None, 4096) 0 batchnormalization_2[0][0] \n", 190 | "____________________________________________________________________________________________________\n", 191 | "dense_4 (Dense) (None, 2) 8194 dropout_2[0][0] \n", 192 | "====================================================================================================\n", 193 | "Total params: 8194\n", 194 | "____________________________________________________________________________________________________\n" 195 | ] 196 | } 197 | ], 198 | "source": [ 199 | "vgg = Vgg16BN(size=(img_width, img_height), n_classes=n_classes, batch_size=batch_size, lr=lr)\n", 200 | "model = vgg.model\n", 201 | "\n", 202 | "model.summary()" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 12, 208 | "metadata": { 209 | "collapsed": true 210 | }, 211 | "outputs": [], 212 | "source": [ 213 | "info_string = \"{0}x{1}_{2}epoch_{3}aug_{4}lr_vgg16-bn\".format(img_width, img_height, nb_epoch, nb_aug, lr)\n", 214 | "ckpt_fn = model_path + '{val_loss:.2f}-loss_' + info_string + '.h5'\n", 215 | "\n", 216 | "ckpt = ModelCheckpoint(filepath=ckpt_fn,\n", 217 | " monitor='val_loss',\n", 218 | " save_best_only=True,\n", 219 | " save_weights_only=True)" 220 | ] 221 | }, 222 | { 223 | "cell_type": "markdown", 224 | "metadata": {}, 225 | "source": [ 226 | "Train the Model" 227 | ] 228 | }, 229 | { 230 | "cell_type": "code", 231 | "execution_count": 13, 232 | "metadata": { 233 | "collapsed": false 234 | }, 235 | "outputs": [], 236 | "source": [ 237 | "vgg.fit(train_path, valid_path,\n", 238 | " nb_trn_samples=nb_train_samples,\n", 239 | " nb_val_samples=nb_valid_samples,\n", 240 | " nb_epoch=nb_epoch,\n", 241 | " callbacks=[ckpt],\n", 242 | " aug=nb_aug)" 243 | ] 244 | }, 245 | { 246 | "cell_type": "markdown", 247 | "metadata": {}, 248 | "source": [ 249 | "Predict on Test Data" 250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": 11, 255 | "metadata": { 256 | "collapsed": false 257 | }, 258 | "outputs": [ 259 | { 260 | "name": "stdout", 261 | "output_type": "stream", 262 | "text": [ 263 | "Generating predictions for Augmentation... 0\n", 264 | "Found 12500 images belonging to 1 classes.\n", 265 | "Generating predictions for Augmentation... 1\n", 266 | "Found 12500 images belonging to 1 classes.\n", 267 | "Generating predictions for Augmentation... 2\n", 268 | "Found 12500 images belonging to 1 classes.\n", 269 | "Generating predictions for Augmentation... 3\n", 270 | "Found 12500 images belonging to 1 classes.\n", 271 | "Generating predictions for Augmentation... 
4\n", 272 | "Found 12500 images belonging to 1 classes.\n", 273 | "Averaging Predictions Across Augmentations...\n" 274 | ] 275 | } 276 | ], 277 | "source": [ 278 | "# generate predictions\n", 279 | "for aug in range(nb_aug):\n", 280 | " print(\"Generating predictions for Augmentation {0}...\",format(aug+1))\n", 281 | " if aug == 0:\n", 282 | " predictions, filenames = vgg.test(test_path, nb_test_samples, aug=nb_aug)\n", 283 | " else:\n", 284 | " aug_pred, filenames = vgg.test(test_path, nb_test_samples, aug=nb_aug)\n", 285 | " predictions += aug_pred\n", 286 | "\n", 287 | "print(\"Averaging Predictions Across Augmentations...\")\n", 288 | "predictions /= nb_aug" 289 | ] 290 | }, 291 | { 292 | "cell_type": "code", 293 | "execution_count": 14, 294 | "metadata": { 295 | "collapsed": false 296 | }, 297 | "outputs": [], 298 | "source": [ 299 | "# clip predictions\n", 300 | "c = 0.01\n", 301 | "preds = np.clip(predictions, c, 1-c)" 302 | ] 303 | }, 304 | { 305 | "cell_type": "code", 306 | "execution_count": 15, 307 | "metadata": { 308 | "collapsed": false 309 | }, 310 | "outputs": [ 311 | { 312 | "name": "stdout", 313 | "output_type": "stream", 314 | "text": [ 315 | "Writing Predictions to CSV...\n", 316 | "0 / 12500\n", 317 | "2500 / 12500\n", 318 | "5000 / 12500\n", 319 | "7500 / 12500\n", 320 | "10000 / 12500\n", 321 | "Done.\n" 322 | ] 323 | } 324 | ], 325 | "source": [ 326 | "sub_file = submission_path + info_string + '.csv'\n", 327 | "\n", 328 | "with open(sub_file, 'w') as f:\n", 329 | " print(\"Writing Predictions to CSV...\")\n", 330 | " f.write('id,label\\n')\n", 331 | " for i, image_name in enumerate(filenames):\n", 332 | " pred = ['%.6f' % p for p in preds[i, :]]\n", 333 | " if i % 2500 == 0:\n", 334 | " print(i, '/', nb_test_samples)\n", 335 | " f.write('%s,%s\\n' % (os.path.basename(image_name).replace('.jpg', ''), (pred[1])))\n", 336 | " print(\"Done.\")" 337 | ] 338 | } 339 | ], 340 | "metadata": { 341 | "anaconda-cloud": {}, 342 | "kernelspec": { 343 | "display_name": "Python [conda root]", 344 | "language": "python", 345 | "name": "conda-root-py" 346 | }, 347 | "language_info": { 348 | "codemirror_mode": { 349 | "name": "ipython", 350 | "version": 3 351 | }, 352 | "file_extension": ".py", 353 | "mimetype": "text/x-python", 354 | "name": "python", 355 | "nbconvert_exporter": "python", 356 | "pygments_lexer": "ipython3", 357 | "version": "3.5.2" 358 | } 359 | }, 360 | "nbformat": 4, 361 | "nbformat_minor": 1 362 | } 363 | -------------------------------------------------------------------------------- /img_clf.py: -------------------------------------------------------------------------------- 1 | from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img 2 | from keras.models import Sequential, model_from_json 3 | from keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D, Activation, Dropout, Flatten, Dense 4 | from keras.callbacks import EarlyStopping 5 | from keras import optimizers 6 | import numpy as np 7 | import csv 8 | from scipy.misc import imresize 9 | import os 10 | import h5py 11 | 12 | 13 | ### paths to weight files 14 | weights_path = '../vgg16_weights.h5' # this is the pretrained vgg16 weights 15 | top_model_weights_path = '../bottleneck_model.h5' # this is the best performing model before fine tuning 16 | 17 | 18 | ### paths to training and testing data 19 | train_data_dir = '../data/train' 20 | validation_data_dir = '../data/validation' 21 | test_data_dir = '../data/test' 22 | 23 | ### other hyperparameters 24 | 
nb_train_samples = 24500
nb_validation_samples = 500
nb_test_samples = 12500
nb_epoch = 25
img_width, img_height = 200, 200

# (you'll have to divide up the dataset into the right directories to match this setup,
# since the kaggle dataset doesn't come with a validation split; a sketch of one way to
# create that split is appended at the end of this document)

early_stopping = EarlyStopping(monitor='val_loss', patience=2, verbose=1, mode='auto')
# ^^ this stops training once the validation loss stops improving


def save_bottleneck_features():
    """builds the pretrained vgg16 model and runs it on our training and validation datasets"""
    datagen = ImageDataGenerator(rescale=1./255)

    # match the vgg16 architecture so we can load the pretrained weights into this model
    model = Sequential()
    model.add(ZeroPadding2D((1, 1), input_shape=(3, img_width, img_height)))

    model.add(Convolution2D(64, 3, 3, activation='relu', name='conv1_1'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(64, 3, 3, activation='relu', name='conv1_2'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))

    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(128, 3, 3, activation='relu', name='conv2_1'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(128, 3, 3, activation='relu', name='conv2_2'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))

    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(256, 3, 3, activation='relu', name='conv3_1'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(256, 3, 3, activation='relu', name='conv3_2'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(256, 3, 3, activation='relu', name='conv3_3'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))

    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_1'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_2'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_3'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))

    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu', name='conv5_1'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu', name='conv5_2'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu', name='conv5_3'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))

    # load VGG16 weights
    f = h5py.File(weights_path, 'r')

    for k in range(f.attrs['nb_layers']):
        if k >= len(model.layers):
            break
        g = f['layer_{}'.format(k)]
        weights = [g['param_{}'.format(p)] for p in range(g.attrs['nb_params'])]
        model.layers[k].set_weights(weights)

    f.close()
    print('Model loaded.')
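
    # run each dataset once through the frozen convolutional base and cache the
    # resulting "bottleneck" features to disk; the dense classifier trained in
    # train_top_model() can then fit on these cached arrays without recomputing
    # the expensive convolutional passes every epoch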

    generator = datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_width, img_height),
        batch_size=32,
        class_mode=None,
        shuffle=False)
    bottleneck_features_train = model.predict_generator(generator, nb_train_samples)
    np.save(open('bottleneck_features_train.npy', 'wb'), bottleneck_features_train)

    generator = datagen.flow_from_directory(
        validation_data_dir,
        target_size=(img_width, img_height),
        batch_size=32,
        class_mode=None,
        shuffle=False)
    bottleneck_features_validation = model.predict_generator(generator, nb_validation_samples)
    np.save(open('bottleneck_features_validation.npy', 'wb'), bottleneck_features_validation)


def train_top_model():
    """trains the classifier"""
    train_data = np.load(open('bottleneck_features_train.npy', 'rb'))
    train_labels = np.array([0] * (nb_train_samples // 2) + [1] * (nb_train_samples // 2))  # first half cats (0), second half dogs (1)

    validation_data = np.load(open('bottleneck_features_validation.npy', 'rb'))
    validation_labels = np.array([0] * (nb_validation_samples // 2) + [1] * (nb_validation_samples // 2))

    model = Sequential()
    model.add(Flatten(input_shape=train_data.shape[1:]))
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))

    model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

    model.fit(train_data, train_labels,
              nb_epoch=nb_epoch,
              batch_size=32,
              validation_data=(validation_data, validation_labels),
              callbacks=[early_stopping])

    # save the model weights
    model.save_weights(top_model_weights_path)


def fine_tune():
    """recreates top model architecture/weights and fine tunes with image augmentation and optimizations"""

    # reconstruct vgg16 model
    model = Sequential()
    model.add(ZeroPadding2D((1, 1), input_shape=(3, img_width, img_height)))

    model.add(Convolution2D(64, 3, 3, activation='relu', name='conv1_1'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(64, 3, 3, activation='relu', name='conv1_2'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))

    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(128, 3, 3, activation='relu', name='conv2_1'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(128, 3, 3, activation='relu', name='conv2_2'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))

    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(256, 3, 3, activation='relu', name='conv3_1'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(256, 3, 3, activation='relu', name='conv3_2'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(256, 3, 3, activation='relu', name='conv3_3'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))

    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_1'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_2'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu', name='conv4_3'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))

    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu', name='conv5_1'))
    model.add(ZeroPadding2D((1, 
1)))
    model.add(Convolution2D(512, 3, 3, activation='relu', name='conv5_2'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Convolution2D(512, 3, 3, activation='relu', name='conv5_3'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))

    # load vgg16 weights
    f = h5py.File(weights_path, 'r')

    for k in range(f.attrs['nb_layers']):
        if k >= len(model.layers):
            break
        g = f['layer_{}'.format(k)]
        weights = [g['param_{}'.format(p)] for p in range(g.attrs['nb_params'])]
        model.layers[k].set_weights(weights)

    f.close()

    # add the classification layers
    top_model = Sequential()
    top_model.add(Flatten(input_shape=model.output_shape[1:]))
    top_model.add(Dense(256, activation='relu'))
    top_model.add(Dropout(0.5))
    top_model.add(Dense(1, activation='sigmoid'))

    top_model.load_weights(top_model_weights_path)

    # add the model on top of the convolutional base
    model.add(top_model)

    # set the first 25 layers (up to the last conv block)
    # to non-trainable (weights will not be updated)
    for layer in model.layers[:25]:
        layer.trainable = False

    # compile the model with a SGD/momentum optimizer
    # and a very slow learning rate.
    model.compile(loss='binary_crossentropy',
                  optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
                  metrics=['accuracy'])

    # prepare data augmentation configuration
    train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

    test_datagen = ImageDataGenerator(rescale=1./255)

    train_generator = train_datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_height, img_width),
        batch_size=32,
        class_mode='binary')

    validation_generator = test_datagen.flow_from_directory(
        validation_data_dir,
        target_size=(img_height, img_width),
        batch_size=32,
        class_mode='binary')

    # fine-tune the model
    model.fit_generator(
        train_generator,
        samples_per_epoch=nb_train_samples,
        nb_epoch=nb_epoch,
        validation_data=validation_generator,
        nb_val_samples=nb_validation_samples,
        callbacks=[early_stopping])

    # save the model
    json_string = model.to_json()

    with open('final_model_architecture.json', 'w') as f:
        f.write(json_string)

    model.save_weights('final_weights.h5')

    # return the model for convenience when making predictions
    return model


def predict_labels(model):
    """writes test image labels and predictions to csv"""

    test_datagen = ImageDataGenerator(rescale=1./255)
    test_generator = test_datagen.flow_from_directory(
        test_data_dir,
        target_size=(img_height, img_width),
        batch_size=32,
        shuffle=False,
        class_mode=None)  # note: this batch generator goes unused; predictions below stream one image at a time

    base_path = test_data_dir + "/test/"

    with open("prediction.csv", "w") as f:
        p_writer = csv.writer(f, delimiter=',', lineterminator='\n')
        for _, _, imgs in os.walk(base_path):
            for im in imgs:
                pic_id = im.split(".")[0]
                img = load_img(base_path + im)
                img = imresize(img, size=(img_height, img_width))
                test_x = img_to_array(img).reshape(3, img_height, img_width)
                test_x = test_x.reshape((1,) + test_x.shape)
                test_generator = test_datagen.flow(test_x,
                                                   batch_size=1,
                                                   shuffle=False)
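                # each image gets its own single-item generator, so
                # predict_generator returns a 1x1 array; [0][0] extracts the
                # scalar probability that the image is a dog (class 1)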
                prediction = model.predict_generator(test_generator, 1)[0][0]
                p_writer.writerow([pic_id, prediction])


def load_model():
    """loads a model from an earlier run"""

    json_file = open('final_model_architecture.json', 'r')
    model_json = json_file.read()
    json_file.close()
    model = model_from_json(model_json)
    model.load_weights('final_weights.h5')
    print('Model loaded.')

    return model


if __name__ == "__main__":
    save_bottleneck_features()
    train_top_model()
    model = fine_tune()
    predict_labels(model)
--------------------------------------------------------------------------------
/vgg_bn.py:
--------------------------------------------------------------------------------
import numpy as np

from keras.layers.normalization import BatchNormalization
from keras.models import Sequential
from keras.layers.core import Flatten, Dense, Dropout, Lambda
from keras.layers.convolutional import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.optimizers import Adam
from keras.preprocessing.image import ImageDataGenerator


vgg_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32).reshape((3, 1, 1))


def vgg_preprocess(x):
    x = x - vgg_mean  # subtract the imagenet channel means vgg was trained with
    return x[:, ::-1]  # reverse the channel axis: rgb -> bgr


class Vgg16BN():
    """The VGG 16 Imagenet model with Batch Normalization for the Dense Layers"""

    def __init__(self, size=(224, 224), n_classes=2, lr=0.001, batch_size=64):
        self.weights_file = 'vgg16_bn.h5'  # download from: http://www.platform.ai/models/
        self.size = size
        self.n_classes = n_classes
        self.lr = lr
        self.batch_size = batch_size
        self.build()

    def predict(self, data):
        return self.model.predict(data)

    def ConvBlock(self, layers, filters):
        model = self.model
        for i in range(layers):
            model.add(ZeroPadding2D((1, 1)))
            model.add(Convolution2D(filters, 3, 3, activation='relu'))
        model.add(MaxPooling2D((2, 2), strides=(2, 2)))

    def FCBlock(self):
        model = self.model
        model.add(Dense(4096, activation='relu'))
        model.add(BatchNormalization())
        model.add(Dropout(0.5))

    def build(self, ft=True):
        model = self.model = Sequential()
        model.add(Lambda(vgg_preprocess, input_shape=(3,) + self.size))

        self.ConvBlock(2, 64)
        self.ConvBlock(2, 128)
        self.ConvBlock(3, 256)
        self.ConvBlock(3, 512)
        self.ConvBlock(3, 512)

        model.add(Flatten())
        self.FCBlock()
        self.FCBlock()
        model.add(Dense(self.n_classes, activation='softmax'))

        model.load_weights(self.weights_file)

        if ft:
            self.finetune()

        self.compile()

    def finetune(self):
        model = self.model
        model.pop()
        for layer in model.layers:
            layer.trainable = False
        model.add(Dense(self.n_classes, activation='softmax'))

    def compile(self):
        self.model.compile(optimizer=Adam(lr=self.lr),
                           loss='categorical_crossentropy', metrics=['accuracy'])

    def fit(self, trn_path, val_path, nb_trn_samples, nb_val_samples, nb_epoch=1, callbacks=None, aug=False):
        if aug:
            train_datagen = ImageDataGenerator(rotation_range=10, width_shift_range=0.05, zoom_range=0.05,
                                               channel_shift_range=10, height_shift_range=0.05, shear_range=0.05,
                                               horizontal_flip=True)
        else:
            train_datagen = ImageDataGenerator()

        trn_gen = train_datagen.flow_from_directory(trn_path, target_size=self.size, batch_size=self.batch_size,
                                                    class_mode='categorical', shuffle=True)

        val_gen = ImageDataGenerator().flow_from_directory(val_path, target_size=self.size, batch_size=self.batch_size,
                                                           class_mode='categorical', shuffle=True)

        self.model.fit_generator(trn_gen, samples_per_epoch=nb_trn_samples, nb_epoch=nb_epoch, verbose=2,
                                 validation_data=val_gen, nb_val_samples=nb_val_samples, callbacks=callbacks)

    def test(self, test_path, nb_test_samples, aug=False):
        if aug:
            test_datagen = ImageDataGenerator(rotation_range=10, width_shift_range=0.05, zoom_range=0.05,
                                              channel_shift_range=10, height_shift_range=0.05, shear_range=0.05,
                                              horizontal_flip=True)
        else:
            test_datagen = ImageDataGenerator()

        test_gen = test_datagen.flow_from_directory(test_path, target_size=self.size, batch_size=self.batch_size,
                                                    class_mode=None, shuffle=False)

        return self.model.predict_generator(test_gen, val_samples=nb_test_samples), test_gen.filenames
--------------------------------------------------------------------------------
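Note on preparing the data for img_clf.py: the Kaggle download only provides a flat train folder (files named like cat.0.jpg and dog.0.jpg) and an unlabeled test folder, so the train/validation class subdirectories described in the README have to be created by hand. Below is a minimal sketch of one way to do that; the source and destination paths and the 250-images-per-class validation size (chosen to match nb_validation_samples = 500) are illustrative assumptions, not code from the repository.

```python
# Illustrative helper (not part of this repo): split Kaggle's flat train folder
# into the data/train/{cats,dogs} and data/validation/{cats,dogs} layout that
# img_clf.py expects.
import os
import random
import shutil

SRC = 'train'            # assumed: flat folder extracted from Kaggle's train.zip
DEST = 'data'            # assumed: project data root
N_VALID_PER_CLASS = 250  # 2 x 250 = 500 validation images, as in img_clf.py

random.seed(0)

for animal in ('cat', 'dog'):
    # kaggle names files like cat.123.jpg / dog.456.jpg
    files = [f for f in os.listdir(SRC) if f.startswith(animal + '.')]
    random.shuffle(files)
    splits = {'validation': files[:N_VALID_PER_CLASS],
              'train': files[N_VALID_PER_CLASS:]}
    for split, names in splits.items():
        out_dir = os.path.join(DEST, split, animal + 's')  # e.g. data/train/cats
        if not os.path.exists(out_dir):
            os.makedirs(out_dir)
        for name in names:
            shutil.copyfile(os.path.join(SRC, name), os.path.join(out_dir, name))
```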