├── .gitattributes ├── LICENSE ├── README.md ├── file3.csv ├── model.png ├── test.html ├── test.ipynb ├── training_log2.txt └── 毕业报告 2.5.pdf /.gitattributes: -------------------------------------------------------------------------------- 1 | *.html linguist-language=jupyter notebook 2 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 Iron 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # dogs-vs-cats 2 | [Dogs vs. Cats](https://www.kaggle.com/c/dogs-vs-cats) is a classic Kaggle competition and also the capstone project of Udacity's Machine Learning Nanodegree. The goal is for the final result on the test set to rank in the top 10% of the Public Leaderboard; the detailed requirements are given in [dog_vs_cat](https://github.com/nd009/capstone/tree/master/dog_vs_cat). 3 | 4 | ### Project materials 5 | Dataset download: [training set + test set](https://www.kaggle.com/c/dogs-vs-cats/data). Kaggle provides two datasets, train and test. The train set contains 25,000 samples, 12,500 cats and 12,500 dogs. 6 | 7 | ![](https://i.imgur.com/kS76chv.jpg) 8 | The test set contains 12,500 images, without labels. 9 | ![](https://i.imgur.com/BW4ga6N.jpg) 10 | 11 | ### Neural network model 12 | This project uses the classic ResNet architecture. Since it was proposed in 2015, ResNet has won first place in the ImageNet classification task; because it combines simplicity with practicality, many later methods have been built on ResNet50 or ResNet101, and it is widely applied in detection, segmentation, recognition, and other areas [2]. 13 | 14 | ![](https://i.imgur.com/qhO6nzU.png) 15 | ![](https://i.imgur.com/vG8fosf.png) 16 | 17 | ### File structure 18 | - file3.csv: prediction results on the test set 19 | - model.png: diagram of the network architecture 20 | - test.ipynb/test.html: program files 21 | - training_log2.txt: training log 22 | - 毕业报告 2.5.pdf: submitted capstone report 23 | -------------------------------------------------------------------------------- /model.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wzyanqi/dogs-vs-cats/7605fa5973a7eb27c4896dc864e98b8a4f7a7286/model.png -------------------------------------------------------------------------------- /test.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [ 10 | { 11 | "name": "stderr", 12 | "output_type": "stream", 13 | "text": [ 14 | "Using TensorFlow backend.\n" 15 | ] 16 | } 17 | ], 18 | "source": [ 19 | "import cv2\n", 20 | "import glob\n", 21 | "import time\n", 22 | "import numpy as np\n", 23 |
"import matplotlib.pyplot as plt\n", 24 | "from keras.models import Model\n", 25 | "from keras.layers import Input, Flatten, Dropout, Dense\n", 26 | "from keras.models import load_model\n", 27 | "from keras.callbacks import ModelCheckpoint\n", 28 | "from sklearn.utils import shuffle\n", 29 | "from keras.preprocessing import image\n", 30 | "from keras.applications.resnet50 import preprocess_input, decode_predictions" 31 | ] 32 | }, 33 | { 34 | "cell_type": "markdown", 35 | "metadata": {}, 36 | "source": [ 37 | "### 自带的预处理方法" 38 | ] 39 | }, 40 | { 41 | "cell_type": "code", 42 | "execution_count": 2, 43 | "metadata": { 44 | "collapsed": false 45 | }, 46 | "outputs": [], 47 | "source": [ 48 | "def get_train_data1(imgDir, size=224):\n", 49 | " imgDir = shuffle(imgDir)\n", 50 | " train_x = np.zeros((len(imgDir), size, size, 3), dtype=np.float32)\n", 51 | " train_y = np.zeros(len(imgDir), dtype=np.uint8)\n", 52 | " for index, file in enumerate(imgDir): \n", 53 | " img = image.load_img(file, target_size=(size, size))\n", 54 | " x = image.img_to_array(img)\n", 55 | " x = np.expand_dims(x, axis=0)\n", 56 | " train_x[index] = preprocess_input(x) \n", 57 | " if 'dog' in file:\n", 58 | " train_y[index] = 1\n", 59 | " return train_x, train_y" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": 3, 65 | "metadata": { 66 | "collapsed": false 67 | }, 68 | "outputs": [ 69 | { 70 | "name": "stdout", 71 | "output_type": "stream", 72 | "text": [ 73 | "Proprecessing begin !\n", 74 | "1215.41 seconds to preprocessing !\n" 75 | ] 76 | } 77 | ], 78 | "source": [ 79 | "# cat 0 dog 1\n", 80 | "print('Proprecessing begin !')\n", 81 | "t=time.time()\n", 82 | "train_dir = glob.glob('train_all/*.jpg')\n", 83 | "train_x, train_y = get_train_data1(train_dir)\n", 84 | "t2 = time.time()\n", 85 | "print(round(t2 - t, 2), 'seconds to preprocessing !')" 86 | ] 87 | }, 88 | { 89 | "cell_type": "markdown", 90 | "metadata": {}, 91 | "source": [ 92 | "## ------------------------------------------------------------------------------------------------------\n", 93 | "### 50% Dropout" 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": 6, 99 | "metadata": { 100 | "collapsed": false 101 | }, 102 | "outputs": [ 103 | { 104 | "name": "stdout", 105 | "output_type": "stream", 106 | "text": [ 107 | "Train on 20000 samples, validate on 5000 samples\n", 108 | "Epoch 1/10\n", 109 | "19990/20000 [============================>.] - ETA: 2s - loss: 0.1564 - acc: 0.9376\n", 110 | "Epoch 00001: val_loss improved from inf to 0.04456, saving model to model_best.h5\n", 111 | "20000/20000 [==============================] - 5523s 276ms/step - loss: 0.1564 - acc: 0.9375 - val_loss: 0.0446 - val_acc: 0.9854\n", 112 | "Epoch 2/10\n", 113 | "19990/20000 [============================>.] - ETA: 2s - loss: 0.0777 - acc: 0.9699\n", 114 | "Epoch 00002: val_loss improved from 0.04456 to 0.03630, saving model to model_best.h5\n", 115 | "20000/20000 [==============================] - 5346s 267ms/step - loss: 0.0777 - acc: 0.9699 - val_loss: 0.0363 - val_acc: 0.9872\n", 116 | "Epoch 3/10\n", 117 | "19990/20000 [============================>.] - ETA: 2s - loss: 0.0541 - acc: 0.9809\n", 118 | "Epoch 00003: val_loss improved from 0.03630 to 0.03114, saving model to model_best.h5\n", 119 | "20000/20000 [==============================] - 5357s 268ms/step - loss: 0.0541 - acc: 0.9809 - val_loss: 0.0311 - val_acc: 0.9892\n", 120 | "Epoch 4/10\n", 121 | "19990/20000 [============================>.] 
- ETA: 2s - loss: 0.0420 - acc: 0.9854\n", 122 | "Epoch 00004: val_loss improved from 0.03114 to 0.03020, saving model to model_best.h5\n", 123 | "20000/20000 [==============================] - 5645s 282ms/step - loss: 0.0419 - acc: 0.9854 - val_loss: 0.0302 - val_acc: 0.9900\n", 124 | "Epoch 5/10\n", 125 | "19990/20000 [============================>.] - ETA: 2s - loss: 0.0328 - acc: 0.9883\n", 126 | "Epoch 00005: val_loss improved from 0.03020 to 0.02960, saving model to model_best.h5\n", 127 | "20000/20000 [==============================] - 5445s 272ms/step - loss: 0.0328 - acc: 0.9883 - val_loss: 0.0296 - val_acc: 0.9900\n", 128 | "Epoch 6/10\n", 129 | "19990/20000 [============================>.] - ETA: 2s - loss: 0.0278 - acc: 0.9905\n", 130 | "Epoch 00006: val_loss did not improve\n", 131 | "20000/20000 [==============================] - 5308s 265ms/step - loss: 0.0278 - acc: 0.9905 - val_loss: 0.0302 - val_acc: 0.9894\n", 132 | "Epoch 7/10\n", 133 | "19990/20000 [============================>.] - ETA: 2s - loss: 0.0199 - acc: 0.9933\n", 134 | "Epoch 00007: val_loss improved from 0.02960 to 0.02953, saving model to model_best.h5\n", 135 | "20000/20000 [==============================] - 5321s 266ms/step - loss: 0.0199 - acc: 0.9933 - val_loss: 0.0295 - val_acc: 0.9898\n", 136 | "Epoch 8/10\n", 137 | "19990/20000 [============================>.] - ETA: 2s - loss: 0.0167 - acc: 0.9943\n", 138 | "Epoch 00008: val_loss improved from 0.02953 to 0.02855, saving model to model_best.h5\n", 139 | "20000/20000 [==============================] - 5297s 265ms/step - loss: 0.0167 - acc: 0.9943 - val_loss: 0.0285 - val_acc: 0.9902\n", 140 | "Epoch 9/10\n", 141 | "19990/20000 [============================>.] - ETA: 2s - loss: 0.0159 - acc: 0.9947\n", 142 | "Epoch 00009: val_loss did not improve\n", 143 | "20000/20000 [==============================] - 5276s 264ms/step - loss: 0.0159 - acc: 0.9947 - val_loss: 0.0290 - val_acc: 0.9894\n", 144 | "Epoch 10/10\n", 145 | "19990/20000 [============================>.] 
- ETA: 2s - loss: 0.0128 - acc: 0.9963\n", 146 | "Epoch 00010: val_loss did not improve\n", 147 | "20000/20000 [==============================] - 5289s 264ms/step - loss: 0.0128 - acc: 0.9963 - val_loss: 0.0286 - val_acc: 0.9898\n" 148 | ] 149 | }, 150 | { 151 | "data": { 152 | "text/plain": [ 153 | "" 154 | ] 155 | }, 156 | "execution_count": 6, 157 | "metadata": {}, 158 | "output_type": "execute_result" 159 | } 160 | ], 161 | "source": [ 162 | "from keras.optimizers import SGD\n", 163 | "from keras.applications.resnet50 import ResNet50\n", 164 | "opt = SGD(lr=0.0001, momentum=0.9)\n", 165 | "myinput = Input(shape=(224, 224, 3))\n", 166 | "base_model = ResNet50(weights='imagenet', input_tensor=myinput, include_top=False)\n", 167 | "x = Flatten()(base_model.output)\n", 168 | "x = Dense(1024, activation='relu')(x)\n", 169 | "x = Dropout(0.5)(x)\n", 170 | "predictions = Dense(1, activation='sigmoid')(x)\n", 171 | "# this is the model we will train\n", 172 | "model = Model(myinput, predictions)\n", 173 | "model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])\n", 174 | "best_model = ModelCheckpoint('model_best.h5', verbose=1, save_best_only=True)\n", 175 | "model.fit(train_x, train_y, validation_split=0.2, shuffle=True, batch_size=16, epochs=10, callbacks=[best_model])\n", 176 | "model.save('model_10.h5')" 177 | ] 178 | }, 179 | { 180 | "cell_type": "markdown", 181 | "metadata": {}, 182 | "source": [ 183 | "## ------------------------------------------------------------------------------------------------------\n", 184 | "### 75% Dropout" 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": 4, 190 | "metadata": { 191 | "collapsed": false 192 | }, 193 | "outputs": [ 194 | { 195 | "name": "stdout", 196 | "output_type": "stream", 197 | "text": [ 198 | "Train on 20000 samples, validate on 5000 samples\n", 199 | "Epoch 1/10\n", 200 | "19984/20000 [============================>.] - ETA: 3s - loss: 0.2106 - acc: 0.9092\n", 201 | "Epoch 00001: val_loss improved from inf to 0.05018, saving model to model_best.h5\n", 202 | "20000/20000 [==============================] - 5164s 258ms/step - loss: 0.2105 - acc: 0.9092 - val_loss: 0.0502 - val_acc: 0.9854\n", 203 | "Epoch 2/10\n", 204 | "19984/20000 [============================>.] - ETA: 3s - loss: 0.0819 - acc: 0.9699\n", 205 | "Epoch 00002: val_loss improved from 0.05018 to 0.03850, saving model to model_best.h5\n", 206 | "20000/20000 [==============================] - 5244s 262ms/step - loss: 0.0819 - acc: 0.9699 - val_loss: 0.0385 - val_acc: 0.9872\n", 207 | "Epoch 3/10\n", 208 | "19984/20000 [============================>.] - ETA: 3s - loss: 0.0610 - acc: 0.9772\n", 209 | "Epoch 00003: val_loss improved from 0.03850 to 0.03489, saving model to model_best.h5\n", 210 | "20000/20000 [==============================] - 5178s 259ms/step - loss: 0.0610 - acc: 0.9771 - val_loss: 0.0349 - val_acc: 0.9880\n", 211 | "Epoch 4/10\n", 212 | "19984/20000 [============================>.] - ETA: 3s - loss: 0.0456 - acc: 0.9841\n", 213 | "Epoch 00004: val_loss improved from 0.03489 to 0.03298, saving model to model_best.h5\n", 214 | "20000/20000 [==============================] - 5251s 263ms/step - loss: 0.0456 - acc: 0.9841 - val_loss: 0.0330 - val_acc: 0.9886\n", 215 | "Epoch 5/10\n", 216 | "19984/20000 [============================>.] 
- ETA: 3s - loss: 0.0371 - acc: 0.9865\n", 217 | "Epoch 00005: val_loss improved from 0.03298 to 0.03106, saving model to model_best.h5\n", 218 | "20000/20000 [==============================] - 5050s 252ms/step - loss: 0.0372 - acc: 0.9865 - val_loss: 0.0311 - val_acc: 0.9890\n", 219 | "Epoch 6/10\n", 220 | "19984/20000 [============================>.] - ETA: 3s - loss: 0.0315 - acc: 0.9885\n", 221 | "Epoch 00006: val_loss improved from 0.03106 to 0.03022, saving model to model_best.h5\n", 222 | "20000/20000 [==============================] - 5065s 253ms/step - loss: 0.0315 - acc: 0.9886 - val_loss: 0.0302 - val_acc: 0.9898\n", 223 | "Epoch 7/10\n", 224 | "19984/20000 [============================>.] - ETA: 3s - loss: 0.0263 - acc: 0.9909\n", 225 | "Epoch 00007: val_loss improved from 0.03022 to 0.02980, saving model to model_best.h5\n", 226 | "20000/20000 [==============================] - 5122s 256ms/step - loss: 0.0263 - acc: 0.9909 - val_loss: 0.0298 - val_acc: 0.9902\n", 227 | "Epoch 8/10\n", 228 | "19984/20000 [============================>.] - ETA: 3s - loss: 0.0213 - acc: 0.9923\n", 229 | "Epoch 00008: val_loss improved from 0.02980 to 0.02937, saving model to model_best.h5\n", 230 | "20000/20000 [==============================] - 5072s 254ms/step - loss: 0.0215 - acc: 0.9922 - val_loss: 0.0294 - val_acc: 0.9906\n", 231 | "Epoch 9/10\n", 232 | "19984/20000 [============================>.] - ETA: 3s - loss: 0.0185 - acc: 0.9931\n", 233 | "Epoch 00009: val_loss did not improve\n", 234 | "20000/20000 [==============================] - 5049s 252ms/step - loss: 0.0185 - acc: 0.9931 - val_loss: 0.0297 - val_acc: 0.9896\n", 235 | "Epoch 10/10\n", 236 | "19984/20000 [============================>.] - ETA: 3s - loss: 0.0164 - acc: 0.9948\n", 237 | "Epoch 00010: val_loss did not improve\n", 238 | "20000/20000 [==============================] - 4931s 247ms/step - loss: 0.0163 - acc: 0.9948 - val_loss: 0.0301 - val_acc: 0.9906\n" 239 | ] 240 | } 241 | ], 242 | "source": [ 243 | "from keras.optimizers import SGD\n", 244 | "from keras.applications.resnet50 import ResNet50\n", 245 | "opt = SGD(lr=0.0001, momentum=0.9)\n", 246 | "myinput = Input(shape=(224, 224, 3))\n", 247 | "base_model = ResNet50(weights='imagenet', input_tensor=myinput, include_top=False)\n", 248 | "x = Flatten()(base_model.output)\n", 249 | "x = Dense(1024, activation='relu')(x)\n", 250 | "x = Dropout(0.75)(x)\n", 251 | "predictions = Dense(1, activation='sigmoid')(x)\n", 252 | "# this is the model we will train\n", 253 | "model = Model(myinput, predictions)\n", 254 | "model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])\n", 255 | "best_model = ModelCheckpoint('model_best.h5', verbose=1, save_best_only=True)\n", 256 | "model.fit(train_x, train_y, validation_split=0.2, shuffle=True, batch_size=16, epochs=10, callbacks=[best_model])\n", 257 | "model.save('model_10.h5')" 258 | ] 259 | }, 260 | { 261 | "cell_type": "markdown", 262 | "metadata": { 263 | "collapsed": true 264 | }, 265 | "source": [ 266 | "## ------------------------------------------------------------------------------------------------------\n", 267 | "### Clean data" 268 | ] 269 | }, 270 | { 271 | "cell_type": "code", 272 | "execution_count": 5, 273 | "metadata": { 274 | "collapsed": false 275 | }, 276 | "outputs": [ 277 | { 278 | "name": "stdout", 279 | "output_type": "stream", 280 | "text": [ 281 | "cat\\cat.10029.jpg\n", 282 | "cat\\cat.10266.jpg\n", 283 | "cat\\cat.10610.jpg\n", 284 | "cat\\cat.10712.jpg\n", 285 | 
"cat\\cat.11184.jpg\n", 286 | "cat\\cat.11222.jpg\n", 287 | "cat\\cat.11281.jpg\n", 288 | "cat\\cat.11565.jpg\n", 289 | "cat\\cat.12272.jpg\n", 290 | "cat\\cat.12499.jpg\n", 291 | "cat\\cat.1361.jpg\n", 292 | "cat\\cat.1962.jpg\n", 293 | "cat\\cat.2337.jpg\n", 294 | "cat\\cat.3202.jpg\n", 295 | "cat\\cat.3658.jpg\n", 296 | "cat\\cat.4085.jpg\n", 297 | "cat\\cat.4308.jpg\n", 298 | "cat\\cat.4360.jpg\n", 299 | "cat\\cat.4688.jpg\n", 300 | "cat\\cat.4986.jpg\n", 301 | "cat\\cat.5355.jpg\n", 302 | "cat\\cat.5418.jpg\n", 303 | "cat\\cat.5583.jpg\n", 304 | "cat\\cat.5795.jpg\n", 305 | "cat\\cat.5834.jpg\n", 306 | "cat\\cat.6304.jpg\n", 307 | "cat\\cat.6402.jpg\n", 308 | "cat\\cat.6655.jpg\n", 309 | "cat\\cat.7564.jpg\n", 310 | "cat\\cat.7655.jpg\n", 311 | "cat\\cat.7671.jpg\n", 312 | "cat\\cat.7703.jpg\n", 313 | "cat\\cat.7920.jpg\n", 314 | "cat\\cat.7968.jpg\n", 315 | "cat\\cat.8138.jpg\n", 316 | "cat\\cat.8456.jpg\n", 317 | "cat\\cat.8504.jpg\n", 318 | "cat\\cat.8828.jpg\n", 319 | "cat\\cat.9290.jpg\n", 320 | "cat\\cat.9596.jpg\n", 321 | "cat\\cat.9897.jpg\n" 322 | ] 323 | } 324 | ], 325 | "source": [ 326 | "import os\n", 327 | "size = 224\n", 328 | "model_test = load_model('model_best.h5')\n", 329 | "cat_dir = glob.glob('cat/*.jpg')\n", 330 | "for file_name in cat_dir: \n", 331 | " img = image.load_img(file_name, target_size=(size, size))\n", 332 | " x = image.img_to_array(img)\n", 333 | " x = np.expand_dims(x, axis=0)\n", 334 | " x = preprocess_input(x)\n", 335 | " if model_test.predict(x) > 0.5:\n", 336 | " print(file_name)\n", 337 | " os.remove(file_name)" 338 | ] 339 | }, 340 | { 341 | "cell_type": "code", 342 | "execution_count": 6, 343 | "metadata": { 344 | "collapsed": false 345 | }, 346 | "outputs": [ 347 | { 348 | "name": "stdout", 349 | "output_type": "stream", 350 | "text": [ 351 | "dog\\dog.10225.jpg\n", 352 | "dog\\dog.10524.jpg\n", 353 | "dog\\dog.10801.jpg\n", 354 | "dog\\dog.10939.jpg\n", 355 | "dog\\dog.11299.jpg\n", 356 | "dog\\dog.11300.jpg\n", 357 | "dog\\dog.11526.jpg\n", 358 | "dog\\dog.11731.jpg\n", 359 | "dog\\dog.12223.jpg\n", 360 | "dog\\dog.2614.jpg\n", 361 | "dog\\dog.3074.jpg\n", 362 | "dog\\dog.3341.jpg\n", 363 | "dog\\dog.3920.jpg\n", 364 | "dog\\dog.4334.jpg\n", 365 | "dog\\dog.4690.jpg\n", 366 | "dog\\dog.5251.jpg\n", 367 | "dog\\dog.5529.jpg\n", 368 | "dog\\dog.5767.jpg\n", 369 | "dog\\dog.6256.jpg\n", 370 | "dog\\dog.6921.jpg\n", 371 | "dog\\dog.7.jpg\n", 372 | "dog\\dog.7076.jpg\n", 373 | "dog\\dog.7332.jpg\n", 374 | "dog\\dog.7413.jpg\n", 375 | "dog\\dog.7692.jpg\n", 376 | "dog\\dog.8671.jpg\n", 377 | "dog\\dog.9517.jpg\n" 378 | ] 379 | } 380 | ], 381 | "source": [ 382 | "import os\n", 383 | "size = 224\n", 384 | "model_test = load_model('model_best.h5')\n", 385 | "dog_dir = glob.glob('dog/*.jpg')\n", 386 | "for file_name in dog_dir: \n", 387 | " img = image.load_img(file_name, target_size=(size, size))\n", 388 | " x = image.img_to_array(img)\n", 389 | " x = np.expand_dims(x, axis=0)\n", 390 | " x = preprocess_input(x)\n", 391 | " if model_test.predict(x) < 0.5:\n", 392 | " print(file_name)\n", 393 | " os.remove(file_name)" 394 | ] 395 | }, 396 | { 397 | "cell_type": "markdown", 398 | "metadata": {}, 399 | "source": [ 400 | "## ------------------------------------------------------------------------------------------------------\n", 401 | "### 75% Dropout Retrain" 402 | ] 403 | }, 404 | { 405 | "cell_type": "code", 406 | "execution_count": 3, 407 | "metadata": { 408 | "collapsed": false 409 | }, 410 | "outputs": [ 411 | { 412 | "name": "stdout", 413 | 
"output_type": "stream", 414 | "text": [ 415 | "Proprecessing begin !\n", 416 | "1170.75 seconds to preprocessing !\n" 417 | ] 418 | } 419 | ], 420 | "source": [ 421 | "# cat 0 dog 1\n", 422 | "print('Proprecessing begin !')\n", 423 | "t=time.time()\n", 424 | "train_dir = glob.glob('clean_train/*.jpg')\n", 425 | "train_x, train_y = get_train_data1(train_dir)\n", 426 | "t2 = time.time()\n", 427 | "print(round(t2 - t, 2), 'seconds to preprocessing !')" 428 | ] 429 | }, 430 | { 431 | "cell_type": "code", 432 | "execution_count": 4, 433 | "metadata": { 434 | "collapsed": false 435 | }, 436 | "outputs": [ 437 | { 438 | "name": "stdout", 439 | "output_type": "stream", 440 | "text": [ 441 | "Train on 19945 samples, validate on 4987 samples\n", 442 | "Epoch 1/10\n", 443 | "19936/19945 [============================>.] - ETA: 2s - loss: 0.1761 - acc: 0.9323\n", 444 | "Epoch 00001: val_loss improved from inf to 0.04538, saving model to model_best.h5\n", 445 | "19945/19945 [==============================] - 5182s 260ms/step - loss: 0.1762 - acc: 0.9323 - val_loss: 0.0454 - val_acc: 0.9880\n", 446 | "Epoch 2/10\n", 447 | "19936/19945 [============================>.] - ETA: 2s - loss: 0.0691 - acc: 0.9755\n", 448 | "Epoch 00002: val_loss improved from 0.04538 to 0.03252, saving model to model_best.h5\n", 449 | "19945/19945 [==============================] - 5002s 251ms/step - loss: 0.0691 - acc: 0.9755 - val_loss: 0.0325 - val_acc: 0.9908\n", 450 | "Epoch 3/10\n", 451 | "19936/19945 [============================>.] - ETA: 2s - loss: 0.0515 - acc: 0.9810\n", 452 | "Epoch 00003: val_loss improved from 0.03252 to 0.02635, saving model to model_best.h5\n", 453 | "19945/19945 [==============================] - 5040s 253ms/step - loss: 0.0515 - acc: 0.9810 - val_loss: 0.0264 - val_acc: 0.9916\n", 454 | "Epoch 4/10\n", 455 | "19936/19945 [============================>.] - ETA: 2s - loss: 0.0395 - acc: 0.9854\n", 456 | "Epoch 00004: val_loss improved from 0.02635 to 0.02282, saving model to model_best.h5\n", 457 | "19945/19945 [==============================] - 4993s 250ms/step - loss: 0.0395 - acc: 0.9854 - val_loss: 0.0228 - val_acc: 0.9918\n", 458 | "Epoch 5/10\n", 459 | "19936/19945 [============================>.] - ETA: 2s - loss: 0.0311 - acc: 0.9888\n", 460 | "Epoch 00005: val_loss improved from 0.02282 to 0.02017, saving model to model_best.h5\n", 461 | "19945/19945 [==============================] - 6052s 303ms/step - loss: 0.0311 - acc: 0.9888 - val_loss: 0.0202 - val_acc: 0.9928\n", 462 | "Epoch 6/10\n", 463 | "19936/19945 [============================>.] - ETA: 2s - loss: 0.0237 - acc: 0.9920\n", 464 | "Epoch 00006: val_loss improved from 0.02017 to 0.01880, saving model to model_best.h5\n", 465 | "19945/19945 [==============================] - 5140s 258ms/step - loss: 0.0237 - acc: 0.9920 - val_loss: 0.0188 - val_acc: 0.9936\n", 466 | "Epoch 7/10\n", 467 | "19936/19945 [============================>.] - ETA: 2s - loss: 0.0218 - acc: 0.9919\n", 468 | "Epoch 00007: val_loss did not improve\n", 469 | "19945/19945 [==============================] - 5105s 256ms/step - loss: 0.0218 - acc: 0.9919 - val_loss: 0.0189 - val_acc: 0.9930\n", 470 | "Epoch 8/10\n", 471 | "19936/19945 [============================>.] 
- ETA: 2s - loss: 0.0199 - acc: 0.9929\n", 472 | "Epoch 00008: val_loss improved from 0.01880 to 0.01825, saving model to model_best.h5\n", 473 | "19945/19945 [==============================] - 5025s 252ms/step - loss: 0.0199 - acc: 0.9929 - val_loss: 0.0183 - val_acc: 0.9934\n", 474 | "Epoch 9/10\n", 475 | "19936/19945 [============================>.] - ETA: 2s - loss: 0.0162 - acc: 0.9948\n", 476 | "Epoch 00009: val_loss did not improve\n", 477 | "19945/19945 [==============================] - 5006s 251ms/step - loss: 0.0162 - acc: 0.9948 - val_loss: 0.0190 - val_acc: 0.9928\n", 478 | "Epoch 10/10\n", 479 | "19936/19945 [============================>.] - ETA: 2s - loss: 0.0139 - acc: 0.9959\n", 480 | "Epoch 00010: val_loss improved from 0.01825 to 0.01776, saving model to model_best.h5\n", 481 | "19945/19945 [==============================] - 5039s 253ms/step - loss: 0.0139 - acc: 0.9958 - val_loss: 0.0178 - val_acc: 0.9932\n" 482 | ] 483 | } 484 | ], 485 | "source": [ 486 | "from keras.optimizers import SGD\n", 487 | "from keras.applications.resnet50 import ResNet50\n", 488 | "opt = SGD(lr=0.0001, momentum=0.9)\n", 489 | "myinput = Input(shape=(224, 224, 3))\n", 490 | "base_model = ResNet50(weights='imagenet', input_tensor=myinput, include_top=False)\n", 491 | "x = Flatten()(base_model.output)\n", 492 | "x = Dense(1024, activation='relu')(x)\n", 493 | "x = Dropout(0.75)(x)\n", 494 | "predictions = Dense(1, activation='sigmoid')(x)\n", 495 | "# this is the model we will train\n", 496 | "model = Model(myinput, predictions)\n", 497 | "model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])\n", 498 | "best_model = ModelCheckpoint('model_best.h5', verbose=1, save_best_only=True)\n", 499 | "model.fit(train_x, train_y, validation_split=0.2, shuffle=True, batch_size=16, epochs=10, callbacks=[best_model])\n", 500 | "model.save('model_10.h5')" 501 | ] 502 | }, 503 | { 504 | "cell_type": "markdown", 505 | "metadata": { 506 | "collapsed": true 507 | }, 508 | "source": [ 509 | "## ------------------------------------------------------------------------------------------------------\n", 510 | "### Drawing model" 511 | ] 512 | }, 513 | { 514 | "cell_type": "code", 515 | "execution_count": 4, 516 | "metadata": { 517 | "collapsed": false 518 | }, 519 | "outputs": [], 520 | "source": [ 521 | "from keras.models import load_model\n", 522 | "model = load_model('model_best.h5')\n", 523 | "from keras.utils.vis_utils import plot_model\n", 524 | "plot_model(model, to_file='model.png',show_shapes=True)" 525 | ] 526 | }, 527 | { 528 | "cell_type": "markdown", 529 | "metadata": {}, 530 | "source": [ 531 | "## ------------------------------------------------------------------------------------------------------\n", 532 | "### Testing" 533 | ] 534 | }, 535 | { 536 | "cell_type": "code", 537 | "execution_count": 2, 538 | "metadata": { 539 | "collapsed": true 540 | }, 541 | "outputs": [], 542 | "source": [ 543 | "import os\n", 544 | "size = 224\n", 545 | "test_dir = glob.glob('test1/*.jpg')\n", 546 | "mytest = np.zeros((12500, size, size, 3), dtype=np.float32)\n", 547 | "test_path = r'E:\\final\\test1'\n", 548 | "for file_name in (test_dir):\n", 549 | " index = int(file_name[6:-4]) - 1\n", 550 | " img = image.load_img(file_name, target_size=(size, size))\n", 551 | " x = image.img_to_array(img)\n", 552 | " x = np.expand_dims(x, axis=0)\n", 553 | " mytest[index] = preprocess_input(x)" 554 | ] 555 | }, 556 | { 557 | "cell_type": "markdown", 558 | "metadata": {}, 559 | "source": [ 560 | "## 
------------------------------------------------------------------------------------------------------\n", 561 | "### Write in file" 562 | ] 563 | }, 564 | { 565 | "cell_type": "code", 566 | "execution_count": 3, 567 | "metadata": { 568 | "collapsed": false 569 | }, 570 | "outputs": [], 571 | "source": [ 572 | "##predict\n", 573 | "with open('file3.csv','w') as f:\n", 574 | " f.write('id,label\\n')\n", 575 | "\n", 576 | "model_test = load_model('model_best.h5')\n", 577 | "\n", 578 | "with open('file3.csv','a') as f:\n", 579 | " for i in range(len(test_dir)):\n", 580 | " predict = model_test.predict(mytest[i:i+1])\n", 581 | " predict = predict[0][0]\n", 582 | " f.write('{},{}\\n'.format(i+1,predict))" 583 | ] 584 | } 585 | ], 586 | "metadata": { 587 | "anaconda-cloud": {}, 588 | "kernelspec": { 589 | "display_name": "Python [conda root]", 590 | "language": "python", 591 | "name": "conda-root-py" 592 | }, 593 | "language_info": { 594 | "codemirror_mode": { 595 | "name": "ipython", 596 | "version": 3 597 | }, 598 | "file_extension": ".py", 599 | "mimetype": "text/x-python", 600 | "name": "python", 601 | "nbconvert_exporter": "python", 602 | "pygments_lexer": "ipython3", 603 | "version": "3.5.2" 604 | } 605 | }, 606 | "nbformat": 4, 607 | "nbformat_minor": 1 608 | } 609 | -------------------------------------------------------------------------------- /training_log2.txt: -------------------------------------------------------------------------------- 1 | Train on 19945 samples, validate on 4987 samples 2 | Epoch 1/10 3 | 19936/19945 [============================>.] - ETA: 2s - loss: 0.1761 - acc: 0.9323 4 | Epoch 00001: val_loss improved from inf to 0.04538, saving model to model_best.h5 5 | 19945/19945 [==============================] - 5182s 260ms/step - loss: 0.1762 - acc: 0.9323 - val_loss: 0.0454 - val_acc: 0.9880 6 | Epoch 2/10 7 | 19936/19945 [============================>.] - ETA: 2s - loss: 0.0691 - acc: 0.9755 8 | Epoch 00002: val_loss improved from 0.04538 to 0.03252, saving model to model_best.h5 9 | 19945/19945 [==============================] - 5002s 251ms/step - loss: 0.0691 - acc: 0.9755 - val_loss: 0.0325 - val_acc: 0.9908 10 | Epoch 3/10 11 | 19936/19945 [============================>.] - ETA: 2s - loss: 0.0515 - acc: 0.9810 12 | Epoch 00003: val_loss improved from 0.03252 to 0.02635, saving model to model_best.h5 13 | 19945/19945 [==============================] - 5040s 253ms/step - loss: 0.0515 - acc: 0.9810 - val_loss: 0.0264 - val_acc: 0.9916 14 | Epoch 4/10 15 | 19936/19945 [============================>.] - ETA: 2s - loss: 0.0395 - acc: 0.9854 16 | Epoch 00004: val_loss improved from 0.02635 to 0.02282, saving model to model_best.h5 17 | 19945/19945 [==============================] - 4993s 250ms/step - loss: 0.0395 - acc: 0.9854 - val_loss: 0.0228 - val_acc: 0.9918 18 | Epoch 5/10 19 | 19936/19945 [============================>.] - ETA: 2s - loss: 0.0311 - acc: 0.9888 20 | Epoch 00005: val_loss improved from 0.02282 to 0.02017, saving model to model_best.h5 21 | 19945/19945 [==============================] - 6052s 303ms/step - loss: 0.0311 - acc: 0.9888 - val_loss: 0.0202 - val_acc: 0.9928 22 | Epoch 6/10 23 | 19936/19945 [============================>.] 
- ETA: 2s - loss: 0.0237 - acc: 0.9920 24 | Epoch 00006: val_loss improved from 0.02017 to 0.01880, saving model to model_best.h5 25 | 19945/19945 [==============================] - 5140s 258ms/step - loss: 0.0237 - acc: 0.9920 - val_loss: 0.0188 - val_acc: 0.9936 26 | Epoch 7/10 27 | 19936/19945 [============================>.] - ETA: 2s - loss: 0.0218 - acc: 0.9919 28 | Epoch 00007: val_loss did not improve 29 | 19945/19945 [==============================] - 5105s 256ms/step - loss: 0.0218 - acc: 0.9919 - val_loss: 0.0189 - val_acc: 0.9930 30 | Epoch 8/10 31 | 19936/19945 [============================>.] - ETA: 2s - loss: 0.0199 - acc: 0.9929 32 | Epoch 00008: val_loss improved from 0.01880 to 0.01825, saving model to model_best.h5 33 | 19945/19945 [==============================] - 5025s 252ms/step - loss: 0.0199 - acc: 0.9929 - val_loss: 0.0183 - val_acc: 0.9934 34 | Epoch 9/10 35 | 19936/19945 [============================>.] - ETA: 2s - loss: 0.0162 - acc: 0.9948 36 | Epoch 00009: val_loss did not improve 37 | 19945/19945 [==============================] - 5006s 251ms/step - loss: 0.0162 - acc: 0.9948 - val_loss: 0.0190 - val_acc: 0.9928 38 | Epoch 10/10 39 | 19936/19945 [============================>.] - ETA: 2s - loss: 0.0139 - acc: 0.9959 40 | Epoch 00010: val_loss improved from 0.01825 to 0.01776, saving model to model_best.h5 41 | 19945/19945 [==============================] - 5039s 253ms/step - loss: 0.0139 - acc: 0.9958 - val_loss: 0.0178 - val_acc: 0.9932 -------------------------------------------------------------------------------- /毕业报告 2.5.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/wzyanqi/dogs-vs-cats/7605fa5973a7eb27c4896dc864e98b8a4f7a7286/毕业报告 2.5.pdf --------------------------------------------------------------------------------