├── Capstone Project_writeup.md
├── Capstone Project_writeup.pdf
├── Dataset
│   ├── pictures_per_wnid_dict.pickle
│   ├── pictures_summary.pickle
│   └── vehicles_synsets.txt
├── README.md
├── Vehicles_Categorization.html
├── Vehicles_Categorization.ipynb
├── Vehicles_Categorization.odt
├── images
│   ├── Layers_1_adam_acc.png
│   ├── Layers_1_adam_loss.png
│   ├── augmented_pics.png
│   ├── car_filters.png
│   ├── cnn_pooling.png
│   ├── dataset_characteristics.png
│   ├── deeplearning_google.png
│   ├── download_process.png
│   ├── image_transformations.png
│   ├── kernel.png
│   ├── model_results.png
│   ├── model_workflow.png
│   ├── monza_analysis.png
│   ├── neural_net.png
│   ├── optimizers.png
│   ├── sgd.png
│   ├── truck_filters.png
│   ├── truck_filters_4.png
│   └── truck_versus_car.png
├── models
│   ├── Layers_1_Adagrad_acc.png
│   ├── Layers_1_Adagrad_loss.png
│   ├── Layers_1_adam_acc.png
│   ├── Layers_1_adam_loss.png
│   ├── Layers_2_Adagrad_acc.png
│   ├── Layers_2_Adagrad_loss.png
│   ├── Layers_2_adam_acc.png
│   ├── Layers_2_adam_loss.png
│   ├── Layers_3_Adagrad_acc.png
│   ├── Layers_3_Adagrad_loss.png
│   ├── Layers_3_adam_acc.png
│   ├── Layers_3_adam_loss.png
│   ├── Layers_4_Adagrad_acc.png
│   ├── Layers_4_Adagrad_loss.png
│   ├── Layers_4_adam_acc.png
│   └── Layers_4_adam_loss.png
└── scripts
    ├── Dataset_wrangling.py
    ├── ImageProcessor.py
    ├── configuration.py
    ├── download_imagenet_images.py
    └── models.py

/Capstone Project_writeup.md:
--------------------------------------------------------------------------------
# Capstone Project
## Machine Learning Engineer Nanodegree
Rafael Castillo Alcibar

September 15th, 2016

## I. Definition

### Project Overview

This project is a proof of concept (POC) in which deep learning techniques are applied to vehicle recognition. This task is particularly important in the area of traffic control and management: for example, companies operating road tolls need to detect fraud, since different fees apply depending on the vehicle type. The images used to train the neural nets are obtained from the [Imagenet](http://image-net.org/) dataset, which publicly distributes image URLs for hundreds of categories. Since the whole experiment is performed on a personal computer with limited computational resources, the POC scope is limited to a simple classification of two kinds of vehicles: trailer trucks versus sports cars. The POC's main goal is to determine the maximum accuracy (the percentage of predictions the model gets right) that neural nets with basic architectures can reach using a limited set of images (fewer than 700) for training.

Drawing

The whole project is built with [Keras](https://github.com/fchollet/keras), a highly modular neural network library written in Python that is capable of running on top of either TensorFlow or Theano and was developed with a focus on enabling fast experimentation. In this case, Theano is the selected backend.
#### Some Deep Learning Background
The first general, working learning algorithm for supervised deep feedforward multilayer perceptrons was published by Ivakhnenko and Lapa in 1965. In 1989, Yann LeCun et al. were able to apply the standard backpropagation algorithm, which had been around as the reverse mode of automatic differentiation since 1970, to a deep neural network with the purpose of recognizing handwritten ZIP codes on mail. Despite the success of applying the algorithm, the time needed to train the network on this dataset was approximately 3 days, making it impractical for general use. According to LeCun, by the early 2000s CNNs were already processing an estimated 10% to 20% of all the checks written in the US in an industrial application. The significant additional impact of deep learning in image and object recognition was felt in the years 2011-2012. Although CNNs trained by backpropagation had been around for decades, fast implementations of CNNs with max-pooling on GPUs were needed to make a dent in computer vision. In 2011, this approach achieved superhuman performance in a visual pattern recognition contest for the first time.

Deep learning is often presented as a step towards realizing strong AI, and thus many organizations have become interested in its use for particular applications. In December 2013, Facebook hired Yann LeCun to head its new artificial intelligence (AI) lab, which develops deep learning techniques to help Facebook with tasks such as automatically tagging uploaded pictures with the names of the people in them.

In 2014, Google bought DeepMind Technologies, a British start-up that developed a system capable of learning how to play Atari video games using only raw pixels as input. In 2015 they demonstrated the AlphaGo system, which achieved one of the long-standing "grand challenges" of AI by learning the game of Go well enough to beat a professional human Go player. [ref 04]


Drawing


### Problem Statement

This project uses a personal computer (Intel® Core™ i5-4310M CPU @ 2.70GHz × 4, 8 GB RAM, 64-bit). Different deep learning models are trained and validated, and their results are compared in order to determine which architecture maximizes prediction scores on the vehicle classification task while minimizing computational cost. Vehicle images are used to train and test the models, while a held-out subset of images is used for validation (images the models have not seen during the train/test phase).

### Metrics

Since this is a binary classification problem (the model basically tries to answer the question "Is the vehicle in this picture a trailer truck or a sports car?"), scores like precision, recall, F-score or accuracy are all suitable. For simplicity, and since the dataset is balanced (there is a similar number of images for each class), accuracy is the score used to evaluate model performance. Accuracy estimates how often the model is correct in its predictions, that is, how often it correctly flags a truck as a truck and a sports car as a sports car.
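As an illustrative sketch of the metric (scikit-learn is already among the project requirements; the label vectors below are made up for the example, not taken from the real dataset):

```
# Minimal sketch: computing accuracy for binary truck/car labels with
# scikit-learn. The label vectors are illustrative only.
from sklearn.metrics import accuracy_score

y_true = [0, 1, 1, 0, 1]  # ground truth: 0 = trailer truck, 1 = sports car
y_pred = [0, 1, 0, 0, 1]  # model predictions
print(accuracy_score(y_true, y_pred))  # 0.8 -> correct in 4 out of 5 cases
```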
On the other hand, since computational resources are also a critical consideration, the number of minutes required to train a model is the second metric used. Combining both scores allows the identification of the model that maximizes accuracy while minimizing computational resources.


## II. Analysis

### Data Exploration

Images are collected from the Imagenet dataset, which contains hundreds of different categories; a [synset](https://en.wikipedia.org/wiki/WordNet) identifier denotes each particular category. In this particular case the following synsets are used:

1. n04467665: Trailer Trucks
2. n04285008: Sports Cars

Since the image URLs are freely available, retrieving the images only requires a small Python script that downloads the pictures over HTTP from the URLs listed at:
```
http://www.image-net.org/api/text/imagenet.synset.geturls?wnid=[wnid]
```

where ```[wnid]``` is one of the synsets selected above. For further reference, please check the [Imagenet documentation](http://image-net.org/download-imageurls).


Drawing

The image below summarizes the main dataset characteristics: a total of 1550 pictures are available, with a mean height and width of 352 and 483 pixels respectively and 3 color channels (RGB). Since there are pictures with a height or width as small as 1px, pictures with less than 150px in either dimension are removed, as their resolution is considered too poor for the project; this step eliminated just 32 images.

Drawing

With regard to the different classes, just one of them includes more than 800 pictures. Since the dataset is at this stage unbalanced and small, the [Keras Data Generator](https://keras.io/preprocessing/image/) utility is employed to generate synthetic images from the pictures already available, as in the example picture below. This utility is crucial to balance the classes and to generate new images to train the models [ref 05].


Drawing

### Exploratory Visualization

In order to use images as input for the deep learning models, they need to be converted into multidimensional arrays of numbers in which each pixel corresponds to a cell of the array. For this process the SciPy ```ndimage``` module is used, as described in this [tutorial](http://www.scipy-lectures.org/advanced/image_processing/). In this project images are resized to 150px in height and width and converted to grayscale (since color is not a determinant characteristic to differentiate between a truck and a sports car), which reduces the number of channels from three (RGB) to one (grayscale).

Drawing

### Algorithms and Techniques

Different deep convolutional neural network architectures are used to perform this task, which nowadays seems to be the best-known approach in the image recognition field. Image categorization is a complex task: for example, a grayscale image of size 150x150 would be flattened into a vector of size 150·150 = 22500 for a fully connected neural network. Such huge dimensionality with no predefined features makes this problem unapproachable for standard supervised learning approaches, even when combined with dimensionality reduction techniques like PCA.

Convolutional nets are selected as the most efficient technique to extract the information that is relevant for classification tasks from, in this case, images. When used for image recognition, convolutional neural networks (CNNs) consist of multiple layers of small kernels which process portions of the input image, called receptive fields. Kernels are small matrices (normally 3x3 or 5x5) applied over the input image to extract features from the data; this technique has been used in image processing for decades, from Photoshop filters to medical imaging. [This blog by Victor Powell](http://setosa.io/ev/image-kernels/) is an excellent resource to understand how kernels work, and the short sketch below shows the same idea in code.
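As a minimal sketch of the idea, assuming SciPy (which the project already uses) and a classic 3x3 edge-detection kernel chosen purely for illustration (it is not a filter learned by the models here; the image path is a placeholder):

```
# Minimal sketch: applying a 3x3 kernel over a grayscale image with SciPy.
# 'some_picture.jpg' is a placeholder path; the edge-detection kernel is a
# classic textbook example, not a filter learned by this project's models.
import numpy as np
from scipy import ndimage
from scipy.signal import convolve2d

image = ndimage.imread('some_picture.jpg', mode='L').astype(float)  # grayscale
kernel = np.array([[-1., -1., -1.],
                   [-1.,  8., -1.],
                   [-1., -1., -1.]])  # responds strongly to edges
features = convolve2d(image, kernel, mode='same', boundary='symm')
```

Each filter in a convolutional layer is exactly such a small matrix, except that its values are learned during training instead of being fixed in advance.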
Drawing

The outputs of these kernels are then tiled so that their input regions overlap, to obtain a better representation of the original image; this is repeated for every such layer. Convolutional networks may include local or global pooling layers, which combine the outputs of neuron clusters. Compared to other image classification algorithms, convolutional neural networks use relatively little pre-processing: the network itself is responsible for learning the filters that in traditional algorithms were hand-engineered. This lack of dependence on prior knowledge and human effort in designing features is a major advantage of CNNs.

Another important concept in CNNs is pooling, which is a form of non-linear down-sampling. There are several non-linear functions that implement pooling, among which max pooling is the most common. It partitions the input image into a set of non-overlapping rectangles and, for each such sub-region, outputs the maximum. The intuition is that once a feature has been found, its exact location is not as important as its rough location relative to other features. The function of the pooling layer is to progressively reduce the spatial size of the representation, which reduces the number of parameters and the amount of computation in the network and hence also helps to control overfitting. It is common to periodically insert a pooling layer between successive convolutional layers in a CNN architecture. The pooling operation provides a form of translation invariance [ref 06]; a small numeric sketch follows below.
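A minimal NumPy sketch of 2x2 max pooling on a toy 4x4 input (the values are illustrative, not project data):

```
# Minimal sketch: 2x2 max pooling of a toy 4x4 "image" with NumPy.
import numpy as np

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 1],
              [0, 1, 5, 2],
              [2, 2, 3, 4]])

# Reshape into non-overlapping 2x2 blocks and keep the maximum of each block:
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[4 2]
               #  [2 5]]
```

Each 2x2 block of the input collapses to a single number, halving both spatial dimensions.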
Drawing

The proposed net architecture for this particular problem is a neural net with 1 to 4 layers, where each layer consists of a CNN + max pooling block. On top of that sits a fully connected net with 150 nodes on the input side and 1 node to output results, with dropout implemented. Dropout is a regularization technique for reducing overfitting in neural networks by preventing complex co-adaptations on the training data; it basically consists of randomly dropping out nodes of the network during training to gain robustness in the model predictions. An example of a proposed architecture is included below:

Drawing


On top of this, two different optimizers are employed: ```Adam``` and ```Adagrad```. Optimizers are used to minimize the ```Cost``` function of a neural net. In the example below, we can see that there are weights (W) and biases (b) for every node and connection between nodes in a neural network:

Drawing

A cost function is a measure of "how good" a neural network did with respect to a given training sample and the expected output. It may also depend on variables such as weights and biases. A cost function is a single value, not a vector, because it rates how well the neural network did as a whole.

Specifically, a cost function is of the form:
```
C(W, B, S, E)
```
where ```W``` is our neural network's weights, ```B``` is our neural network's biases, ```S``` is the input of a single training sample, and ```E``` is the desired output of that training sample. [ref 08]

While there are different ways to define the ```Cost``` function, the goal of optimization is always to minimize it. Different approaches exist; Stochastic Gradient Descent (SGD), which searches for minima iteratively, is the most common one, and different variants of this method give rise to the optimizers employed here:
* **AdaGrad** (for adaptive gradient algorithm) is a modified stochastic gradient descent with a per-parameter learning rate, first published in 2011. Informally, it increases the learning rate for sparser parameters and decreases the learning rate for less sparse ones. This strategy often improves convergence performance over standard stochastic gradient descent.
* **Adam** is also a method in which the learning rate is adapted for each of the parameters. The idea is to divide the learning rate for a weight by a running average of the magnitudes of recent gradients for that weight (a simplified sketch follows the figure below).

Drawing
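As a simplified NumPy sketch of the per-parameter idea (illustrative update rules only; the actual Keras implementations include further details, such as Adam's momentum and bias-correction terms):

```
# Simplified sketch of per-parameter learning rates (NumPy only); this is an
# AdaGrad-style update, not Keras' exact implementation.
import numpy as np

lr, eps = 0.01, 1e-8
w = np.zeros(3)      # parameters
cache = np.zeros(3)  # accumulated squared gradients, one entry per parameter

def adagrad_step(w, grad, cache):
    cache = cache + grad ** 2                  # grows fastest for busy parameters
    w = w - lr * grad / (np.sqrt(cache) + eps) # so their effective step shrinks
    return w, cache
```

Plain SGD would apply the same learning rate ```lr``` to every parameter; here, parameters that have already received large gradients take smaller steps.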
### Benchmark

In the study [Monza: Image Classification of Vehicle Make and Model Using Convolutional Neural Networks and Transfer Learning](http://cs231n.stanford.edu/reports/lediurfinal.pdf), several machine learning approaches are used for car detection and identification. A fine-grained dataset containing 196 different classes of cars is employed. That dataset is particularly challenging due to the freeform nature of the images, which contain cars in many different sizes, shapes and poses; a similar scenario applies to the current dataset, although in this particular case there are just two classes. The study's results are presented in terms of top-1 and top-5 accuracy for the different approaches used. For the deep learning approaches, accuracy values are around 0.8, so this is the value used to benchmark the current results.

Drawing

## III. Methodology

### Data Preprocessing

Vehicle images are downloaded from the Imagenet dataset; the included notebook describes the process step by step. Basically, it is enough to set the configuration parameters in ```configuration.py``` and execute ```download_imagenet_images.py```.

Once all files are downloaded, the different picture classes need to be organized following this structure:
```
dataset\
    train\
        n04467665\
            n04467665_01.png
            n04467665_04.png
            ...
        n04285008\
            n04285008_01.png
            n04285008_04.png
            ...
    test\
        n04467665\
            n04467665_02.png
            n04467665_03.png
            ...
        n04285008\
            n04285008_02.png
            n04285008_03.png
            ...
    validation\
        n04467665\
            n04467665_07.png
            n04467665_09.png
            ...
        n04285008\
            n04285008_07.png
            n04285008_09.png
            ...
```

For this purpose, ```Dataset_wrangling.py``` is employed. The next step is to eliminate those pictures whose height or width is lower than 150px; a threshold of 150px is used since this is the size of the input images for the models and a common image dimension in today's deep learning models. Finally, ```ImageProcessor.py``` is employed to perform several tasks:
1. Generate augmented images from the current images using the Keras utility ImageDataGenerator, to be used to train the models (a minimal sketch follows this list):
   - rescale=1./255: the images come with RGB coefficients in the range 0-255, so the values are normalized to span from 0 to 1 by this scaling
   - rotation_range=40: images are rotated randomly by 0-40 degrees
   - width_shift_range=0.01: range within which the image is randomly translated horizontally
   - height_shift_range=0.1: range within which the image is randomly translated vertically
   - shear_range=0.05: range within which shearing transformations are applied randomly
   - zoom_range=0.1: range within which the image is randomly zoomed
   - fill_mode='nearest': the method with which newly introduced pixels are filled in
2. Resize pictures to a height and width of 150px.
3. Convert pictures to grayscale (since color is not an important feature to distinguish a truck from a car).
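A minimal sketch of the augmentation step (the parameter values mirror the list above and the repository's ```ImageProcessor.py```; the directory names are placeholders):

```
# Minimal sketch: generating augmented pictures with Keras' ImageDataGenerator.
# Parameter values mirror ImageProcessor.py; 'temp' and 'augmented' are
# placeholder directories ('temp' must contain one subfolder of source images).
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.01,
    height_shift_range=0.1,
    shear_range=0.05,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest')

for _ in datagen.flow_from_directory('temp', target_size=(150, 150),
                                     batch_size=32, class_mode=None,
                                     save_to_dir='augmented', save_format='jpg'):
    break  # one batch is enough; otherwise the generator loops indefinitely
```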
### Implementation

For the implementation I have chosen Keras, a neural network library for Theano and TensorFlow written in Python. Different convolutional neural net architectures were applied to the task with the intention of identifying the architecture that reaches a reasonable accuracy with minimum computational resources. The networks consist of an input layer, 1 to 4 convolutional layers, a fully connected layer and an output layer. The convolutional layers use 3x3 convolutions with 32-64 output filters, each followed by a 2x2 max pooling layer. Rectified linear units are used as activation functions, except for the final output neuron, which is a sigmoid. After the fully connected layer a dropout of 0.5 is applied (this helps to prevent overfitting). For the loss function I have used logloss (binary crossentropy). Two different optimizers are used and compared: ```adam``` and ```Adagrad```. A sketch of the single-convolutional-layer variant is included below.
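As a minimal sketch of the 1-layer variant under the Keras 1.x API that the project imports (the filter count and the 150-node fully connected layer follow the description above; treat the exact values as illustrative):

```
# Minimal sketch of the 1-convolutional-layer architecture described above,
# written against the Keras 1.x API used in this project (Theano backend,
# channels-first grayscale input). Exact layer sizes are illustrative.
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense

model = Sequential()
model.add(Convolution2D(32, 3, 3, input_shape=(1, 150, 150)))  # 32 3x3 filters
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(150))
model.add(Activation('relu'))
model.add(Dropout(0.5))  # helps to prevent overfitting
model.add(Dense(1))
model.add(Activation('sigmoid'))  # binary truck/car output
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```

Deeper variants repeat the Convolution2D + MaxPooling2D block 2 to 4 times before the fully connected part.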
Regarding the difficulties encountered in the process, the first was understanding how data augmentation should be carried out: it took a bit of trial and error to figure out how much the images can be distorted such that the car/truck remains in the image at all times.
A further difficulty was figuring out how to visualize the filters. Although I had a [solution to start from](https://keras.io/getting-started/faq/#how-can-i-record-the-training-validation-loss-accuracy-at-each-epoch), the code needed some workarounds, such as referencing my convolutional layers and writing a function to draw the images for all filters.

On the code side, ```models.run_model()``` is used to train the model as well as to validate it on the validation set and generate the learning curves. ```models.build_model()``` is used to build the model with the network architecture defined by the given parameters. Data generated by the function ```models.data_augmentation()``` is used to train/test the model in batches of 32 images over 100 epochs, and to generate validation data to evaluate the trained model on new data not previously used during train/test. ```models.learning_curves()``` is used to represent the process graphically and get an overview of the accuracy and loss values by epoch.

Drawing


### Refinement

Nets with 1 to 4 layers are tested in order to determine which configuration provides the best performance while minimizing computational resources. The picture and table below show the nets' performance in terms of accuracy and minutes to train for the different layer counts and optimizers used:

Drawing

| Net Architecture | Accuracy | Minutes to train |
|----------|:-------------:|------:|
| Layers_1_adam | 0.86 | 21 |
| Layers_1_Adagrad | 0.55 | 16 |
| Layers_2_adam | 0.81 | 26 |
| Layers_2_Adagrad | 0.55 | 22 |
| Layers_3_adam | 0.87 | 25 |
| Layers_3_Adagrad | 0.86 | 23 |
| Layers_4_adam | 0.66 | 24 |
| Layers_4_Adagrad | 0.84 | 25 |


The neural net with 1 layer and the adam optimizer already meets the benchmark criteria, so no further refinement is required. The accuracy and loss curves for the 1-layer/adam configuration are shown in the Results section below.


## IV. Results


### Model Evaluation and Validation

A validation set containing 10% of the dataset, not used during the training/testing phase, is used to validate the results. The final architecture selected, 1 layer with the adam optimizer, reaches an accuracy over 80%, which is in the range of the benchmark results. An accuracy of 80% means that the model is correct in 80 out of 100 predictions. Since the dataset is balanced (thanks to the data augmentation), accuracy is a perfectly valid metric in this scenario and there is no need to investigate alternatives like precision and recall or F-scores.

Drawing

Drawing

### Justification

The result obtained with the selected model was higher than actually expected. Even with a more simplistic setup (just two classes), the model is capable of reaching state-of-the-art accuracy even on the validation set (data completely unseen by the model). We can consider this proof of concept satisfactory, as the model reaches the benchmark results.


## V. Conclusion

### Free-Form Visualization

Below, some original pictures are shown together with how different filters represent them in the different convolutional layers. This gives us an idea of how the neural net decomposes the visual space.


Drawing

Drawing

Drawing

In both examples, for layer 1, the different filters focus mainly on shapes and the images are still recognizable, but in higher layers this no longer happens and the outputs look mostly like noise. As mentioned by ```@fchollet``` in his exceptional [post](https://blog.keras.io/how-convolutional-neural-networks-see-the-world.html): _"Does it mean that convnets are bad tools? Of course not, they serve their purpose just fine. What it means is that we should refrain from our natural tendency to anthropomorphize them and believe that they "understand", say, the concept of dog, or the appearance of a magpie, just because they are able to classify these objects with high accuracy. They don't, at least not to any extent that would make sense to us humans."_

### Reflection

This POC implements a deep learning solution for automatic vehicle recognition. Image recognition has historically been a difficult task; however, over the last few years (thanks to increased computational resources) efficient methods have emerged to approach these kinds of problems. Deep multi-layer neural networks are capable of building up a hierarchy of abstractions that makes it possible to identify complex inputs (i.e. images), and that is the approach selected in this project.

There were two major areas in the project: the first was data collection, the second was model building. Given that the collected dataset is small, a critical part of this project is the use of the Keras data augmentation utility, which helps to prevent overfitting and improve generalization.

After this, building the different models attempted is not particularly complex (thanks to Keras again!), and although there is a significant number of parameters to experiment with (such as the type of activation functions, regularization methods, loss functions, error metrics, nodes in fully connected layers, etc.), the work starts from good architectures published by ```@fchollet``` and builds from there. It is amazing to see how efficient this method is, and how fast it is possible to set up an architecture that performs well on the task.

Although the final method meets the expectations for the problem, further testing with more validation data would be desirable. The bottleneck here was the difficulty around data collection; additional data could be used to guarantee that the model generalizes well enough in largely different environments.



### Improvement

With regard to improvements, as already mentioned, gathering additional data would help generalization. A further area in which to expand the project is multiclass classification, such that the model not only distinguishes cars from trucks but also recognizes many other vehicles, such as vans, motorcycles, etc. In that case it would potentially be necessary to expand the model architecture by adding more layers and neurons, so that the model is expressive enough to accommodate the additional complexity.


### References:

[ref 01]: [Imagenet Classification with deep convolutional neural networks](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf)

[ref 02]: [Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks](http://static.googleusercontent.com/media/research.google.com/es//pubs/archive/42241.pdf)

[ref 03]: Ivakhnenko, A. G. and Lapa, V. G. (1965). Cybernetic Predicting Devices. CCM Information Corporation.
[ref 04]: [Wikipedia Deep Learning History](https://en.wikipedia.org/wiki/Deep_learning#History)

[ref 05]: [Building powerful image classification models using very little data](https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html)

[ref 06]: [Convolutional Neural Network](https://en.wikipedia.org/wiki/Convolutional_neural_network)

[ref 07]: [Monza: Image Classification of Vehicle Make and Model Using Convolutional Neural Networks and Transfer Learning](http://cs231n.stanford.edu/reports/lediurfinal.pdf)

[ref 08]: [Nielsen's book](http://neuralnetworksanddeeplearning.com/)
--------------------------------------------------------------------------------
/Capstone Project_writeup.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/Capstone Project_writeup.pdf
--------------------------------------------------------------------------------
/Dataset/pictures_per_wnid_dict.pickle:
--------------------------------------------------------------------------------
(dp0
S'n04467665'
p1
cnumpy.core.multiarray
scalar
p2
(cnumpy
dtype
p3
(S'i8'
p4
I0
I1
tp5
Rp6
(I3
S'<'
p7
NNNI-1
I-1
I0
tp8
bS'm\x03\x00\x00\x00\x00\x00\x00'
p9
tp10
Rp11
sS'n04285008'
p12
g2
(g6
S'\xd1\x02\x00\x00\x00\x00\x00\x00'
p13
tp14
Rp15
s.
--------------------------------------------------------------------------------
/Dataset/vehicles_synsets.txt:
--------------------------------------------------------------------------------
n04467665
n04285008
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Deep-Learning Vehicle Classification


This project is a proof of concept (POC) in which deep learning techniques are applied to vehicle recognition. This task is particularly important in the area of traffic control and management: for example, companies operating road tolls need to detect fraud, since different fees apply depending on the vehicle type. The images used to train the neural nets are obtained from the [Imagenet](http://image-net.org/) dataset, which publicly distributes image URLs for hundreds of categories. Since the whole experiment is performed on a personal computer with limited computational resources, the POC scope is limited to a simple classification of two kinds of vehicles: trailer trucks versus sports cars. The POC's main goal is to determine the maximum accuracy (the percentage of predictions the model gets right) that neural nets with basic architectures can reach using a limited set of images (fewer than 700) for training.

Drawing

## COMPLETE REPORT:
A report with full descriptions of the motivation, methodology, results, etc. is available here:
[Deep Learning Vehicle Classification-Project_writeup](https://github.com/kingkastle/Deep-Learning---Vehicle-Classification/blob/master/Capstone%20Project_writeup.md)


## Requirements:

The following libraries are necessary:

```
# local scripts:
from scripts import configuration      # includes paths and parameter configurations
from scripts import models             # includes the different models
from scripts import Dataset_wrangling  # includes scripts from downloading pics to generating datasets


# standard libraries
import os
import h5py
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D
import numpy as np
from sklearn.metrics import f1_score
from sklearn.metrics import recall_score
from datetime import datetime
import configuration
import pickle
import multiprocessing
import logging
import urllib2
import download_imagenet_images
import pandas
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from IPython.core.display import HTML, display

```

The project is entirely written in Python 2.7.

## Instructions:

Please follow the instructions given in ```Vehicles_Categorization.ipynb```.


Enjoy!
--------------------------------------------------------------------------------
/Vehicles_Categorization.odt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/Vehicles_Categorization.odt
--------------------------------------------------------------------------------
/images/Layers_1_adam_acc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/images/Layers_1_adam_acc.png
--------------------------------------------------------------------------------
/images/Layers_1_adam_loss.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/images/Layers_1_adam_loss.png
--------------------------------------------------------------------------------
/images/augmented_pics.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/images/augmented_pics.png
--------------------------------------------------------------------------------
/images/car_filters.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/images/car_filters.png
--------------------------------------------------------------------------------
/images/cnn_pooling.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/images/cnn_pooling.png
--------------------------------------------------------------------------------
/images/dataset_characteristics.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/images/dataset_characteristics.png
--------------------------------------------------------------------------------
/images/deeplearning_google.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/images/deeplearning_google.png
--------------------------------------------------------------------------------
/images/download_process.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/images/download_process.png
--------------------------------------------------------------------------------
/images/image_transformations.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/images/image_transformations.png
--------------------------------------------------------------------------------
/images/kernel.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/images/kernel.png
--------------------------------------------------------------------------------
/images/model_results.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/images/model_results.png
--------------------------------------------------------------------------------
/images/model_workflow.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/images/model_workflow.png
--------------------------------------------------------------------------------
/images/monza_analysis.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/images/monza_analysis.png
--------------------------------------------------------------------------------
/images/neural_net.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/images/neural_net.png
--------------------------------------------------------------------------------
/images/optimizers.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/images/optimizers.png
--------------------------------------------------------------------------------
/images/sgd.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/images/sgd.png
--------------------------------------------------------------------------------
/images/truck_filters.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/images/truck_filters.png
--------------------------------------------------------------------------------
/images/truck_filters_4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/images/truck_filters_4.png
--------------------------------------------------------------------------------
/images/truck_versus_car.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/images/truck_versus_car.png
--------------------------------------------------------------------------------
/models/Layers_1_Adagrad_acc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/models/Layers_1_Adagrad_acc.png
--------------------------------------------------------------------------------
/models/Layers_1_Adagrad_loss.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/models/Layers_1_Adagrad_loss.png
--------------------------------------------------------------------------------
/models/Layers_1_adam_acc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/models/Layers_1_adam_acc.png
--------------------------------------------------------------------------------
/models/Layers_1_adam_loss.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/models/Layers_1_adam_loss.png
--------------------------------------------------------------------------------
/models/Layers_2_Adagrad_acc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/models/Layers_2_Adagrad_acc.png
--------------------------------------------------------------------------------
/models/Layers_2_Adagrad_loss.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/models/Layers_2_Adagrad_loss.png
--------------------------------------------------------------------------------
/models/Layers_2_adam_acc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/models/Layers_2_adam_acc.png
--------------------------------------------------------------------------------
/models/Layers_2_adam_loss.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/models/Layers_2_adam_loss.png
--------------------------------------------------------------------------------
/models/Layers_3_Adagrad_acc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/models/Layers_3_Adagrad_acc.png
--------------------------------------------------------------------------------
/models/Layers_3_Adagrad_loss.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/models/Layers_3_Adagrad_loss.png
--------------------------------------------------------------------------------
/models/Layers_3_adam_acc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/models/Layers_3_adam_acc.png
--------------------------------------------------------------------------------
/models/Layers_3_adam_loss.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/models/Layers_3_adam_loss.png
--------------------------------------------------------------------------------
/models/Layers_4_Adagrad_acc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/models/Layers_4_Adagrad_acc.png
--------------------------------------------------------------------------------
/models/Layers_4_Adagrad_loss.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/models/Layers_4_Adagrad_loss.png
--------------------------------------------------------------------------------
/models/Layers_4_adam_acc.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/models/Layers_4_adam_acc.png
--------------------------------------------------------------------------------
/models/Layers_4_adam_loss.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kingkastle/Deep-Learning---Vehicle-Classification/e6bd3ecd42d2b65cd29f0ce6b052cf6f60160390/models/Layers_4_adam_loss.png
--------------------------------------------------------------------------------
/scripts/Dataset_wrangling.py:
--------------------------------------------------------------------------------
'''
Created on Aug 21, 2016

@author: rafaelcastillo

This script is used to build the final dataset from the pictures downloaded
'''

import pandas
import numpy as np
from os import listdir, walk
from os.path import isfile, join
from keras.preprocessing.image import ImageDataGenerator
from scipy import ndimage
import seaborn as sns
sns.set(style="white")
import matplotlib.pyplot as plt
import os, shutil
import random
import Image  # PIL


plt.style.use('seaborn-muted')  # plotting style used for visualizations


def resize_pic(img_path, width, height):
    '''
    This function is used to resize pictures to the desired shape.

    Args:
        * img_path: path to local image (in jpg format) to convert
        * width: desired width of the output picture
        * height: desired height of the output picture

    Return:
        0/1 depending on the conversion status.
        The generated picture is stored under a different name derived from img_path
    '''
    im1 = Image.open(img_path)
    # use cubic spline interpolation in a 4x4 environment to resize the image
    try:
        im4 = im1.resize((width, height), Image.BICUBIC)
        im4.save(img_path.replace('.jpg', '_good_shape.jpg'))
    except:
        # print "Unable to resize file: {0}".format(img_path)
        return 0
    return 1


def generate_pics(dataset_path, family, df_hw, number_pics, width, height):
    '''
    This function generates pictures using the Keras utility ImageDataGenerator.

    Args:
        * dataset_path: dataset path
        * family: vehicle category (wnid)
        * df_hw: dataframe with height, width, channels and family name for each picture
        * number_pics: number of pictures to generate
        * width: desired width of the output picture
        * height: desired height of the output picture

    Return:
        none
    '''

    # Generate a temporary folder and subfolder where the input pics used to
    # generate augmented pictures will be located.
    pictures_path = dataset_path + family
    if not os.path.exists(pictures_path + '/' + 'temp'):
        os.mkdir(pictures_path + '/' + 'temp')
        os.mkdir(pictures_path + '/' + 'temp/pics')

    # Get pictures paths:
    picture_files = df_hw[(df_hw['family'] == family) &
                          (df_hw['height'] >= height) &
                          (df_hw['width'] >= width)]['name'].unique()

    # Select number_pics pictures using a random sample:
    if len(picture_files) < number_pics:
        print """Warning: There are insufficient pictures with optimum size to generate augmented pics.
                 Process will repeat pictures to generate augmented data"""
        pics_diff = number_pics - len(picture_files)
        selected_pics = random.sample(picture_files, len(picture_files)) + random.sample(picture_files, pics_diff)
    else:
        selected_pics = random.sample(picture_files, number_pics)

    # Get the wnid for the vehicle class:
    wnid = selected_pics[0].split("_")[0]

    # Copy those files to the destination folder:
    for pic in selected_pics:
        shutil.copyfile(join(pictures_path, pic), pictures_path + '/' + 'temp/pics/' + pic)

    # Generate augmented pics. Docs: https://keras.io/preprocessing/image/
    datagen = ImageDataGenerator(
        rotation_range=40,
        width_shift_range=0.01,
        height_shift_range=0.1,
        shear_range=0.05,
        zoom_range=0.1,
        horizontal_flip=True,
        fill_mode='nearest')

    # the .flow_from_directory() command below generates batches of randomly
    # transformed images and saves the results to the save_to_dir directory
    for _ in datagen.flow_from_directory(pictures_path + '/' + 'temp',
                                         target_size=(width, height),
                                         batch_size=number_pics,
                                         classes=None,
                                         class_mode=None,
                                         shuffle=True,
                                         save_to_dir=pictures_path,
                                         save_prefix='Generated',
                                         save_format='jpg'):
        break  # otherwise the generator would loop indefinitely

    # Rename generated files:
    for pic in [join(pictures_path, f) for f in os.listdir(pictures_path) if 'Generated' in f]:
        shutil.move(pic, pic.replace('Generated', wnid + '_fake'))

    # Remove temporary folder:
    shutil.rmtree(pictures_path + '/' + 'temp')


def sizes_distribution(dataset_path):
    '''
    This function generates a dataframe that stores the height, width and
    number of RGB channels for all pictures

    Args:
        * dataset_path: dataset path

    Return:
        * df_hw: dataframe with height, width, channels and family name for each picture
    '''
    # Dataframe to store results
    df_hw = pandas.DataFrame(columns=['name', 'height', 'width', 'channels'])
    idx = 0

    # Get a list with all directories in the Dataset
    dir_list = [x[1] for x in walk(dataset_path)][0]
    for dir in dir_list:
        current_dir = dataset_path + "/" + dir
        picture_names = [f for f in listdir(current_dir) if isfile(join(current_dir, f))]
        for picture in picture_names:
            try:
                img = ndimage.imread(current_dir + "/" + picture, mode='RGB').shape
            except:
                print 'Unable to open file {0}'.format(current_dir + "/" + picture)
                os.remove(current_dir + "/" + picture)
                continue
            df_hw.loc[idx] = [picture, img[0], img[1], img[2]]
            idx += 1

    # generate family column
    df_hw['family'] = df_hw['name'].apply(lambda x: x.split("_")[0])

    return df_hw


def generate_sets(dataset_path, output_path, sizes, selected_classes):
    '''
    This function generates the train and test directory structure required by
    the function data_augmentation in models.py

    Args:
        * dataset_path: dataset path with images
        * output_path: train and test dataset path
        * sizes: list with the train/test/validation size distribution, e.g. [0.6, 0.2, 0.2]
        * selected_classes: list with the selected vehicle categories used for modeling

    Return:
        None
    '''

    # The split proportions must add up to 1 (np.isclose avoids float rounding issues):
    assert np.isclose(np.sum(sizes), 1.0), "Total sizes must sum to 1!, e.g. [.8,.1,.1]"

    try:
        os.mkdir(output_path)
        os.mkdir(output_path + '/' + 'train')
        os.mkdir(output_path + '/' + 'test')
        os.mkdir(output_path + '/' + 'validation')
    except:
        pass

    # Get a list with all directories in the Dataset
    dir_list = [x[1] for x in walk(dataset_path)][0]
    for dir in dir_list:

        # Just process those vehicle categories included in selected_classes
        if dir not in selected_classes: continue

        # Create folder in output directory
        current_dir = dataset_path + "/" + dir
        os.mkdir(current_dir.replace(dataset_path, output_path + "/train"))
        os.mkdir(current_dir.replace(dataset_path, output_path + "/test"))
        os.mkdir(current_dir.replace(dataset_path, output_path + "/validation"))

        # Get picture names and shuffle:
        picture_names = [f for f in listdir(current_dir) if isfile(join(current_dir, f))]
        random.shuffle(picture_names)
        train_size = int(len(picture_names) * sizes[0])
        test_size = int(len(picture_names) * sizes[1])

        # Copy files to the corresponding output directory
        for picture in picture_names[:train_size]:
            shutil.copyfile(current_dir + "/" + picture, current_dir.replace(dataset_path, output_path + "/train") + "/" + picture)
        for picture in picture_names[train_size:train_size + test_size]:
            shutil.copyfile(current_dir + "/" + picture, current_dir.replace(dataset_path, output_path + "/test") + "/" + picture)
        for picture in picture_names[train_size + test_size:]:
            shutil.copyfile(current_dir + "/" + picture, current_dir.replace(dataset_path, output_path + "/validation") + "/" + picture)


if __name__ == '__main__':
    print "Dataset Wrangling module loaded..."
--------------------------------------------------------------------------------
/scripts/ImageProcessor.py:
--------------------------------------------------------------------------------
import os, shutil
import random
from os import listdir
from os.path import isfile, join
import Image  # PIL
from scipy import ndimage
from keras.preprocessing.image import ImageDataGenerator


def resize_pic(img_path, width, height):
    '''
    This function is used to resize pictures to the desired shape.

    Args:
        * img_path: path to local image (in jpg format) to convert
        * width: desired width of the output picture
        * height: desired height of the output picture

    Return:
        0/1 depending on the conversion status.
        The generated picture is stored under a different name derived from img_path
    '''
    im1 = Image.open(img_path)
    # use cubic spline interpolation in a 4x4 environment to resize the image
    try:
        im4 = im1.resize((width, height), Image.BICUBIC)
        im4.save(img_path.replace('.jpg', '_good_shape.jpg'))
    except:
        print "Unable to resize file: {0}".format(img_path)
        return 0
    return 1


def generate_pics(pictures_path, number_pics, width, height):
    '''
    This function generates pictures using the Keras utility ImageDataGenerator.

    Args:
        * pictures_path: path to the folder where a particular vehicle class is located
        * number_pics: number of pictures to generate
        * width: desired width of the output picture
        * height: desired height of the output picture

    Return:
        none
    '''

    # Generate a temporary folder and subfolder where the input pics used to
    # generate augmented pictures will be located.
    if not os.path.exists(pictures_path + '/' + 'temp'):
        os.mkdir(pictures_path + '/' + 'temp')
        os.mkdir(pictures_path + '/' + 'temp/pics')

    # Get pictures paths:
    picture_files = [f for f in listdir(pictures_path)
                     if isfile(join(pictures_path, f))
                     if ndimage.imread(join(pictures_path, f), mode='RGB').shape[0] > height]
    if len(picture_files) < number_pics:
        print "Warning: there are insufficient files with optimum size to generate augmented pics"
        number_pics = len(picture_files)

    # Select number_pics pictures using a random sample:
    selected_pics = random.sample(picture_files, number_pics)

    # Get the wnid for the vehicle class:
    wnid = selected_pics[0].split("_")[0]

    # Copy those files to the destination folder:
    for pic in selected_pics:
        shutil.copyfile(join(pictures_path, pic), pictures_path + '/' + 'temp/pics/' + pic)

    # Generate augmented pics. Docs: https://keras.io/preprocessing/image/
    datagen = ImageDataGenerator(
        rotation_range=40,
        width_shift_range=0.01,
        height_shift_range=0.1,
        shear_range=0.05,
        zoom_range=0.1,
        horizontal_flip=True,
        fill_mode='nearest')

    # the .flow_from_directory() command below generates batches of randomly
    # transformed images and saves the results to the save_to_dir directory
    for _ in datagen.flow_from_directory(pictures_path + '/' + 'temp',
                                         target_size=(width, height),
                                         batch_size=number_pics,
                                         classes=None,
                                         class_mode=None,
                                         shuffle=True,
                                         save_to_dir=pictures_path,
                                         save_prefix='Generated',
                                         save_format='jpg'):
        break  # otherwise the generator would loop indefinitely

    # Rename generated files:
    for pic in [join(pictures_path, f) for f in listdir(pictures_path) if 'Generated' in f]:
        shutil.move(pic, pic.replace('Generated', wnid + '_fake'))

    # Remove temporary folder:
    shutil.rmtree(pictures_path + '/' + 'temp')


if __name__ == '__main__':
    # Example invocation (the path below points to a local example directory):
    pictures_path = '/home/rafaelcastillo/MLND/Project5/DeepLearning/Dataset2/n12914923'
    number_pics = 4
    width = 250
    height = 250
    generate_pics(pictures_path, number_pics, width, height)
--------------------------------------------------------------------------------
/scripts/configuration.py:
--------------------------------------------------------------------------------
'''
Created on Aug 27, 2016

@author: rafaelcastillo

This file includes all configuration parameters used along the project
'''

## Configurations during the wrangling phase:

path_dataset = '/home/rafaelcastillo/MLND/Project5/DeepLearning/Dataset/'  # path to dataset in local directory

dataset_train_test = '/home/rafaelcastillo/MLND/Project5/DeepLearning/Dataset_train_test'  # path to dataset in the structure required by the models

sizes = [.6, .3, .1]  # sizes of the different sets: train/test/validation

## Classes:
classes = ['n04467665', 'n04285008']

## Dimensions of the augmented pictures: (vehicles)
height = 405
width = 460


## Models path:
model_path = '/home/rafaelcastillo/MLND/Project5/DeepLearning/models'

## Dimensions of the input pictures fed to the models:
model_height = 150
model_width = 150
--------------------------------------------------------------------------------
/scripts/download_imagenet_images.py:
--------------------------------------------------------------------------------
# -*- coding: utf-8 -*-
'''
Created on Aug 19, 2016

@author: rafaelcastillo


This script is intended to download pictures from Imagenet (http://image-net.org/)
by HTTP request based on the synset, as described here:

http://image-net.org/download-imageurls

This project focuses on vehicle classification; the "WordNet IDs" (wnids) of
the relevant vehicle synsets are stored in the local file:

'vehicles_synsets.txt'

Each wnid included in that file is used to download the pictures of the
corresponding vehicle class from:

http://www.image-net.org/api/text/imagenet.synset.geturls?wnid=[wnid]

'''

import urllib2
import os
import numpy as np
import logging


def download(url,local_dataset,file_name):
    '''
    This function downloads a single picture over HTTP.

    Args:
        * url: url of the picture to download
        * local_dataset: root path where pictures are stored
        * file_name: name given to the picture in the local folder

    Return:
        1: picture downloaded successfully
        0: picture not downloaded
    '''
    try:
        furl = urllib2.urlopen(url)
        # Since some pictures are no longer available, the final url is used to
        # detect redirections (a redirect means the original image is gone):
        finalurl = furl.geturl()
        if url != finalurl:
            logging.info('File no longer available: {0}'.format(url))
            return 0
        wnid = file_name.split("_")[0]
        local_path = local_dataset + "/" + wnid
        if not os.path.exists(local_path):
            os.makedirs(local_path)
        f = open("{0}.jpg".format(local_path + "/" + file_name), 'wb')
        f.write(furl.read())
        f.close()
    except Exception:
        logging.info('Unable to download file {0}'.format(url))
        return 0
    return 1

def worker(procnum, process_number, return_list):
    '''
    Worker function executed by each download process.

    Args:
        * procnum: list with the url, local dataset directory and file_name
          arguments for the download function
        * process_number: index of this process in return_list
        * return_list: shared list used to store the download status

    Return:
        none
    '''
    url, local_dataset, file_name = procnum[0], procnum[1], procnum[2]
    download_status = download(url, local_dataset, file_name)
    return_list[process_number] = download_status

def process_jobs(jobs, return_list):
    '''
    Wait for all started processes in jobs, terminating any that hangs.

    Args:
        * jobs: list with the started processes
        * return_list: shared list with all processes' outputs

    Return:
        Sum of all processes' outputs (number of successful downloads)
    '''
    for p in jobs:
        p.join(2)  # wait at most 2 seconds for the process to finish
        # If the process is still alive after the timeout, kill it:
        if p.is_alive():
            logging.info("Process is running... let's kill it...")
            p.terminate()
    return np.sum(return_list)


if __name__ == '__main__':
    print "download module loaded..."
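
    # Illustrative wiring sketch (commented out; not part of the original
    # script): how worker() and process_jobs() are meant to be combined. The
    # multiprocessing module and the `batch` list of (url, local_dataset,
    # file_name) tuples are assumptions made for the example.
    #
    #   import multiprocessing
    #   manager = multiprocessing.Manager()
    #   return_list = manager.list([0] * len(batch))
    #   jobs = []
    #   for i, args in enumerate(batch):
    #       p = multiprocessing.Process(target=worker, args=(args, i, return_list))
    #       p.start()
    #       jobs.append(p)
    #   downloaded = process_jobs(jobs, return_list)  # number of successful downloads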
--------------------------------------------------------------------------------
/scripts/models.py:
--------------------------------------------------------------------------------
'''
Created on Aug 30, 2016

@author: rafaelcastillo

This script includes the modelling part

ref: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
ref: https://blog.keras.io/how-convolutional-neural-networks-see-the-world.html
'''
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, AveragePooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras import backend as K
from scipy import ndimage
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from datetime import datetime
import pandas, os
import numpy as np
import configuration, Dataset_wrangling
from sklearn.metrics import accuracy_score

def data_augmentation(path_dataset,pic_dims):
    """
    This function generates batches of input data for the models from the
    train and test directories.

    Args:
        * path_dataset: dataset path
        * pic_dims: list with width and height of pictures

    Returns:
        * train and test generator objects yielding the input images
    """

    # augmentation configuration used for training
    train_datagen = ImageDataGenerator(
        rescale=1/255.,
        rotation_range=30,
        width_shift_range=0.1,
        height_shift_range=0.1,
        shear_range=0,
        zoom_range=0.2,
        horizontal_flip=True,
        fill_mode='nearest')

    # testing data is only rescaled, not augmented
    test_datagen = ImageDataGenerator(rescale=1./255)

    # read images from the train directory and generate batches of augmented data
    train_generator = train_datagen.flow_from_directory(
        '{0}/train'.format(path_dataset),
        color_mode="grayscale",
        target_size=(pic_dims[0], pic_dims[1]),
        batch_size=32,
        class_mode='binary')

    # read images from the test directory and generate batches of rescaled data
    validation_generator = test_datagen.flow_from_directory(
        '{0}/test'.format(path_dataset),
        color_mode="grayscale",
        target_size=(pic_dims[0], pic_dims[1]),
        batch_size=32,
        class_mode='binary')

    return train_generator, validation_generator


def learning_curves(model_path,model_name,optimizer,history,show_plots):
    """
    Display and save learning curves.
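
    Two figures are produced (accuracy and loss versus epoch, for both the
    training and the validation set) and saved under model_path as
    <model_name>_acc.png and <model_name>_loss.png.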

    Args:
        * model_path: path to models directory
        * model_name: name of the trained and validated model
        * optimizer: optimizer used during training (not used inside this function)
        * history: Keras History object returned by fit_generator
        * show_plots: whether to show plots while executing
    """

    # accuracy
    plt.figure()
    plt.plot(history.history['acc'])
    plt.plot(history.history['val_acc'])
    plt.title('accuracy of the model')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['training set', 'validation set'], loc='lower right')
    plt.savefig(model_path + "/" + model_name + '_acc.png')
    if show_plots: plt.show()

    # loss
    plt.figure()
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('loss of the model')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['training set', 'validation set'], loc='upper right')
    plt.savefig(model_path + "/" + model_name + '_loss.png')
    if show_plots: plt.show()


def build_model(optimizer,pic_dims,layers):
    """
    Builds the model with the desired hyperparameters.

    Args:
        * optimizer: optimizer used in the model
        * pic_dims: list with width and height of pictures
        * layers: number of conv layers included in the net

    Returns:
        * model: compiled model with the selected optimizer
    """

    # Define the first conv layer
    model = Sequential()
    model.add(Convolution2D(32, 3, 3, activation='relu', input_shape=(1, pic_dims[0], pic_dims[1]), name='conv1'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # Include as many conv layers as defined by the layers parameter
    num_layers = 1
    while num_layers < layers:
        num_layers += 1
        model.add(Convolution2D(64, 3, 3, activation='relu', name='conv{0}'.format(num_layers)))
        model.add(MaxPooling2D(pool_size=(2, 2)))

    # Define a fully connected net on top
    model.add(Flatten())
    model.add(Dense(150))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))

    model.compile(loss='binary_crossentropy',
                  optimizer=optimizer,
                  metrics=['accuracy'])

    return model

def run_model(optimizer,nb_epoch,model_path,path_dataset,pic_dims,layers):
    """
    This function builds and trains the model, then validates it over the validation set.
    Model weights are saved to a local file, together with the training time and
    the accuracy over the validation set.
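    The weights file is written to model_path as <model_name>.h5, where
    model_name follows the pattern "Layers_<layers>_<optimizer>" (the same
    naming visible in the models/ folder of the repository).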

    Args:
        * optimizer: optimizer used to calculate the deep net loss
        * nb_epoch: number of epochs used during the training process
        * model_path: path to models directory
        * path_dataset: path to input images
        * pic_dims: list with width and height of pictures
        * layers: number of conv layers included in the net

    Return:
        * model: trained model
        * model_name: name of the trained and validated model
        * time_to_train: time required to train the model
        * acc: model accuracy over the validation set
    """

    # Generate connectors to the augmented train/test data:
    train_generator, validation_generator = data_augmentation(path_dataset,pic_dims)
    model = build_model(optimizer,pic_dims,layers)

    # Start measuring the training time:
    startTime = datetime.now()

    # Train model:
    history = model.fit_generator(
        train_generator,
        samples_per_epoch=300,
        nb_epoch=nb_epoch,
        validation_data=validation_generator,
        nb_val_samples=100,
        verbose=1)

    # End of training:
    time_to_train = datetime.now() - startTime

    # Generate the learning plots and save the weights:
    model_name = "Layers_{0}_".format(layers) + str(optimizer)
    learning_curves(model_path,model_name,optimizer,history,False)
    model.save_weights(model_path + "/" + model_name + '.h5')

    # Calculate accuracy over the validation set (layers is required to rebuild the net):
    model_parameters = [model_path,optimizer,model_name,pic_dims,layers]
    acc = validate_model(model_parameters,path_dataset)

    return model,model_name,time_to_train,acc

def validate_model(model_parameters,path_dataset):
    '''
    This function tests a model over the validation dataset (rescaled, unaugmented data).

    Args:
        * model_parameters: list that includes [model_path, optimizer, model_name, [height, width], layers]
        * path_dataset: path to dataset (where the validation data is included)

    Returns:
        * Accuracy score
    '''

    model_path, optimizer, model_name, pic_dims, layers = model_parameters

    # Rebuild the architecture and load the trained weights
    model = build_model(optimizer,pic_dims,layers)
    model.load_weights(model_path + "/" + model_name + '.h5')

    # Load the validation data and labels; a single large batch is enough to
    # cover the whole validation set
    test_datagen = ImageDataGenerator(rescale=1./255)
    for test_data in test_datagen.flow_from_directory(
            '{0}/validation'.format(configuration.dataset_train_test),
            target_size=(150, 150),
            batch_size=200,
            color_mode="grayscale",
            class_mode='binary',
            shuffle=False):
        X = test_data[0]
        y_true = test_data[1]
        break  # otherwise the generator would loop indefinitely

    # Predict on the validation data
    y_pred = model.predict(X, batch_size=1, verbose=0)

    # Round predictions to 1s and 0s and flatten to match the shape of y_true
    y_pred = np.around(y_pred).ravel()

    return accuracy_score(y_true, y_pred)

def visualize_filters(model,path_to_image,layer_name,filters):
    '''
    This function visualizes two filters of a given layer of a neural network.

    Args:
        * model: trained Keras model to inspect (passed explicitly instead of
          relying on a global variable, which was undefined in the original version)
        * path_to_image: path to the image
        * layer_name: name of the layer to visualize
        * filters: list of length 2 with the filter indices to visualize

    Returns:
        Generates a visualization
    '''
    # Get the layer index from model.layers
    layer_index = [i for i,x in enumerate(model.layers) if x.name == layer_name][0]

    # Function to get the layer output
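    # (K.function compiles a backend function mapping the model input to the
    # selected layer's output; the extra K.learning_phase() input is fed 0
    # further below, so the forward pass runs in test mode, i.e. dropout off.)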
    get_layer_output = K.function([model.layers[0].input, K.learning_phase()],
                                  [model.layers[layer_index].output])

    # Resize the picture to the input dimensions expected by the model
    Dataset_wrangling.resize_pic(path_to_image,150,150)
    X = ndimage.imread(path_to_image.replace('.jpg','_good_shape.jpg'),flatten=True)
    y = np.expand_dims(X, axis=0)
    os.remove(path_to_image.replace('.jpg','_good_shape.jpg'))

    # Get the layer output for the image (0 = test mode):
    layer_output = get_layer_output([np.expand_dims(y, axis=0), 0])[0]

    filter_1 = filters[0]
    filter_2 = filters[1]

    # Generate the visualization:
    fig = plt.figure()

    ax1 = plt.subplot(131)
    ax2 = plt.subplot(132)
    ax3 = plt.subplot(133)

    ax1.imshow(mpimg.imread(path_to_image))
    ax2.imshow(layer_output[0,filter_1,:,:],cmap='hot',alpha=1)
    ax3.imshow(layer_output[0,filter_2,:,:],cmap='hot',alpha=1)
    ax1.set_title('Original')
    ax2.set_title('Filter: {0}'.format(filter_1))
    ax3.set_title('Filter: {0}'.format(filter_2))

    fig.suptitle('Image representation in Layer {0}'.format(layer_name), fontsize=14, fontweight='bold')

    plt.tight_layout()


if __name__ == '__main__':
    print "Models module loaded..."

    results = pandas.DataFrame(columns=['model_name','time_to_train','accuracy'])
    dataset_path = '/home/rafaelcastillo/MLND/Project5/DeepLearning/Dataset_train_test'
    model_path = '/home/rafaelcastillo/MLND/Project5/DeepLearning/models'
    optimizer_list = ['adam', 'Adagrad']
    pic_dims = [150,150]
    nb_epochs = 100
    # Train and validate every combination of depth and optimizer, writing the
    # results to disk after each run so partial results are not lost:
    for layers in [1,2,3,4]:
        for optimizer in optimizer_list:
            model,model_name,time_to_train,acc = run_model(optimizer, nb_epochs, model_path, dataset_path,pic_dims,layers)
            results.loc[results.shape[0]+1,:] = [model_name,time_to_train,acc]
            results.to_csv(dataset_path + "/" + 'Net_results.csv',sep=",",index=False)
--------------------------------------------------------------------------------