├── LICENSE ├── README.md ├── audio_conv_utils.py ├── imagenet_utils.py ├── inception_resnet_v2.py ├── inception_v3.py ├── mobilenet.py ├── music_tagger_crnn.py ├── resnet50.py ├── vgg16.py ├── vgg19.py └── xception.py /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2016 François Chollet 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Trained image classification models for Keras 2 | 3 | **THIS REPOSITORY IS DEPRECATED. USE THE MODULE `keras.applications` INSTEAD.** 4 | 5 | Pull requests will not be reviewed nor merged. Direct any PRs to `keras.applications`. Issues are not monitored either. 6 | 7 | ---- 8 | 9 | This repository contains code for the following Keras models: 10 | 11 | - VGG16 12 | - VGG19 13 | - ResNet50 14 | - Inception v3 15 | - CRNN for music tagging 16 | 17 | All architectures are compatible with both TensorFlow and Theano, and upon instantiation the models will be built according to the image dimension ordering set in your Keras configuration file at `~/.keras/keras.json`. For instance, if you have set `image_dim_ordering=tf`, then any model loaded from this repository will get built according to the TensorFlow dimension ordering convention, "Width-Height-Depth". 18 | 19 | Pre-trained weights can be automatically loaded upon instantiation (`weights='imagenet'` argument in model constructor for all image models, `weights='msd'` for the music tagging model). Weights are automatically downloaded if necessary, and cached locally in `~/.keras/models/`. 
20 | 21 | ## Examples 22 | 23 | ### Classify images 24 | 25 | ```python 26 | from resnet50 import ResNet50 27 | from keras.preprocessing import image 28 | from imagenet_utils import preprocess_input, decode_predictions 29 | 30 | model = ResNet50(weights='imagenet') 31 | 32 | img_path = 'elephant.jpg' 33 | img = image.load_img(img_path, target_size=(224, 224)) 34 | x = image.img_to_array(img) 35 | x = np.expand_dims(x, axis=0) 36 | x = preprocess_input(x) 37 | 38 | preds = model.predict(x) 39 | print('Predicted:', decode_predictions(preds)) 40 | # print: [[u'n02504458', u'African_elephant']] 41 | ``` 42 | 43 | ### Extract features from images 44 | 45 | ```python 46 | from vgg16 import VGG16 47 | from keras.preprocessing import image 48 | from imagenet_utils import preprocess_input 49 | 50 | model = VGG16(weights='imagenet', include_top=False) 51 | 52 | img_path = 'elephant.jpg' 53 | img = image.load_img(img_path, target_size=(224, 224)) 54 | x = image.img_to_array(img) 55 | x = np.expand_dims(x, axis=0) 56 | x = preprocess_input(x) 57 | 58 | features = model.predict(x) 59 | ``` 60 | 61 | ### Extract features from an arbitrary intermediate layer 62 | 63 | ```python 64 | from vgg19 import VGG19 65 | from keras.preprocessing import image 66 | from imagenet_utils import preprocess_input 67 | from keras.models import Model 68 | 69 | base_model = VGG19(weights='imagenet') 70 | model = Model(input=base_model.input, output=base_model.get_layer('block4_pool').output) 71 | 72 | img_path = 'elephant.jpg' 73 | img = image.load_img(img_path, target_size=(224, 224)) 74 | x = image.img_to_array(img) 75 | x = np.expand_dims(x, axis=0) 76 | x = preprocess_input(x) 77 | 78 | block4_pool_features = model.predict(x) 79 | ``` 80 | 81 | ## References 82 | 83 | - [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556) - please cite this paper if you use the VGG models in your work. 84 | - [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) - please cite this paper if you use the ResNet model in your work. 85 | - [Rethinking the Inception Architecture for Computer Vision](http://arxiv.org/abs/1512.00567) - please cite this paper if you use the Inception v3 model in your work. 86 | - [Music-auto_tagging-keras](https://github.com/keunwoochoi/music-auto_tagging-keras) 87 | 88 | Additionally, don't forget to [cite Keras](https://keras.io/getting-started/faq/#how-should-i-cite-keras) if you use these models. 89 | 90 | 91 | ## License 92 | 93 | - All code in this repository is under the MIT license as specified by the LICENSE file. 94 | - The ResNet50 weights are ported from the ones [released by Kaiming He](https://github.com/KaimingHe/deep-residual-networks) under the [MIT license](https://github.com/KaimingHe/deep-residual-networks/blob/master/LICENSE). 95 | - The VGG16 and VGG19 weights are ported from the ones [released by VGG at Oxford](http://www.robots.ox.ac.uk/~vgg/research/very_deep/) under the [Creative Commons Attribution License](https://creativecommons.org/licenses/by/4.0/). 96 | - The Inception v3 weights are trained by ourselves and are released under the MIT license. 
97 | -------------------------------------------------------------------------------- /audio_conv_utils.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from keras import backend as K 3 | 4 | 5 | TAGS = ['rock', 'pop', 'alternative', 'indie', 'electronic', 6 | 'female vocalists', 'dance', '00s', 'alternative rock', 'jazz', 7 | 'beautiful', 'metal', 'chillout', 'male vocalists', 8 | 'classic rock', 'soul', 'indie rock', 'Mellow', 'electronica', 9 | '80s', 'folk', '90s', 'chill', 'instrumental', 'punk', 10 | 'oldies', 'blues', 'hard rock', 'ambient', 'acoustic', 11 | 'experimental', 'female vocalist', 'guitar', 'Hip-Hop', 12 | '70s', 'party', 'country', 'easy listening', 13 | 'sexy', 'catchy', 'funk', 'electro', 'heavy metal', 14 | 'Progressive rock', '60s', 'rnb', 'indie pop', 15 | 'sad', 'House', 'happy'] 16 | 17 | 18 | def librosa_exists(): 19 | try: 20 | __import__('librosa') 21 | except ImportError: 22 | return False 23 | else: 24 | return True 25 | 26 | 27 | def preprocess_input(audio_path, dim_ordering='default'): 28 | '''Reads an audio file and outputs a Mel-spectrogram. 29 | ''' 30 | if dim_ordering == 'default': 31 | dim_ordering = K.image_dim_ordering() 32 | assert dim_ordering in {'tf', 'th'} 33 | 34 | if librosa_exists(): 35 | import librosa 36 | else: 37 | raise RuntimeError('Librosa is required to process audio files.\n' + 38 | 'Install it via `pip install librosa` \nor visit ' + 39 | 'http://librosa.github.io/librosa/ for details.') 40 | 41 | # mel-spectrogram parameters 42 | SR = 12000 43 | N_FFT = 512 44 | N_MELS = 96 45 | HOP_LEN = 256 46 | DURA = 29.12 47 | 48 | src, sr = librosa.load(audio_path, sr=SR) 49 | n_sample = src.shape[0] 50 | n_sample_wanted = int(DURA * SR) 51 | 52 | # trim the signal at the center 53 | if n_sample < n_sample_wanted: # if too short 54 | src = np.hstack((src, np.zeros((int(DURA * SR) - n_sample,)))) 55 | elif n_sample > n_sample_wanted: # if too long 56 | src = src[(n_sample - n_sample_wanted) / 2: 57 | (n_sample + n_sample_wanted) / 2] 58 | 59 | logam = librosa.logamplitude 60 | melgram = librosa.feature.melspectrogram 61 | x = logam(melgram(y=src, sr=SR, hop_length=HOP_LEN, 62 | n_fft=N_FFT, n_mels=N_MELS) ** 2, 63 | ref_power=1.0) 64 | 65 | if dim_ordering == 'th': 66 | x = np.expand_dims(x, axis=0) 67 | elif dim_ordering == 'tf': 68 | x = np.expand_dims(x, axis=3) 69 | return x 70 | 71 | 72 | def decode_predictions(preds, top_n=5): 73 | '''Decode the output of a music tagger model. 
74 | 75 | # Arguments 76 | preds: 2-dimensional numpy array 77 | top_n: integer in [0, 50], number of items to show 78 | 79 | ''' 80 | assert len(preds.shape) == 2 and preds.shape[1] == 50 81 | results = [] 82 | for pred in preds: 83 | result = zip(TAGS, pred) 84 | result = sorted(result, key=lambda x: x[1], reverse=True) 85 | results.append(result[:top_n]) 86 | return results 87 | -------------------------------------------------------------------------------- /imagenet_utils.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import json 3 | 4 | from keras.utils.data_utils import get_file 5 | from keras import backend as K 6 | 7 | CLASS_INDEX = None 8 | CLASS_INDEX_PATH = 'https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json' 9 | 10 | 11 | def preprocess_input(x, dim_ordering='default'): 12 | if dim_ordering == 'default': 13 | dim_ordering = K.image_dim_ordering() 14 | assert dim_ordering in {'tf', 'th'} 15 | 16 | if dim_ordering == 'th': 17 | x[:, 0, :, :] -= 103.939 18 | x[:, 1, :, :] -= 116.779 19 | x[:, 2, :, :] -= 123.68 20 | # 'RGB'->'BGR' 21 | x = x[:, ::-1, :, :] 22 | else: 23 | x[:, :, :, 0] -= 103.939 24 | x[:, :, :, 1] -= 116.779 25 | x[:, :, :, 2] -= 123.68 26 | # 'RGB'->'BGR' 27 | x = x[:, :, :, ::-1] 28 | return x 29 | 30 | 31 | def decode_predictions(preds, top=5): 32 | global CLASS_INDEX 33 | if len(preds.shape) != 2 or preds.shape[1] != 1000: 34 | raise ValueError('`decode_predictions` expects ' 35 | 'a batch of predictions ' 36 | '(i.e. a 2D array of shape (samples, 1000)). ' 37 | 'Found array with shape: ' + str(preds.shape)) 38 | if CLASS_INDEX is None: 39 | fpath = get_file('imagenet_class_index.json', 40 | CLASS_INDEX_PATH, 41 | cache_subdir='models') 42 | CLASS_INDEX = json.load(open(fpath)) 43 | results = [] 44 | for pred in preds: 45 | top_indices = pred.argsort()[-top:][::-1] 46 | result = [tuple(CLASS_INDEX[str(i)]) + (pred[i],) for i in top_indices] 47 | results.append(result) 48 | return results 49 | -------------------------------------------------------------------------------- /inception_resnet_v2.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """Inception-ResNet V2 model for Keras. 
3 | 4 | Model naming and structure follows TF-slim implementation (which has some additional 5 | layers and different number of filters from the original arXiv paper): 6 | https://github.com/tensorflow/models/blob/master/slim/nets/inception_resnet_v2.py 7 | 8 | Pre-trained ImageNet weights are also converted from TF-slim, which can be found in: 9 | https://github.com/tensorflow/models/tree/master/slim#pre-trained-models 10 | 11 | # Reference 12 | - [Inception-v4, Inception-ResNet and the Impact of 13 | Residual Connections on Learning](https://arxiv.org/abs/1602.07261) 14 | 15 | """ 16 | from __future__ import print_function 17 | from __future__ import absolute_import 18 | 19 | import warnings 20 | import numpy as np 21 | 22 | from keras.preprocessing import image 23 | from keras.models import Model 24 | from keras.layers import Activation 25 | from keras.layers import AveragePooling2D 26 | from keras.layers import BatchNormalization 27 | from keras.layers import Concatenate 28 | from keras.layers import Conv2D 29 | from keras.layers import Dense 30 | from keras.layers import GlobalAveragePooling2D 31 | from keras.layers import GlobalMaxPooling2D 32 | from keras.layers import Input 33 | from keras.layers import Lambda 34 | from keras.layers import MaxPooling2D 35 | from keras.utils.data_utils import get_file 36 | from keras.engine.topology import get_source_inputs 37 | from keras.applications.imagenet_utils import _obtain_input_shape 38 | from keras.applications.imagenet_utils import decode_predictions 39 | from keras import backend as K 40 | 41 | 42 | BASE_WEIGHT_URL = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.7/' 43 | 44 | 45 | def preprocess_input(x): 46 | """Preprocesses a numpy array encoding a batch of images. 47 | 48 | This function applies the "Inception" preprocessing which converts 49 | the RGB values from [0, 255] to [-1, 1]. Note that this preprocessing 50 | function is different from `imagenet_utils.preprocess_input()`. 51 | 52 | # Arguments 53 | x: a 4D numpy array consists of RGB values within [0, 255]. 54 | 55 | # Returns 56 | Preprocessed array. 57 | """ 58 | x /= 255. 59 | x -= 0.5 60 | x *= 2. 61 | return x 62 | 63 | 64 | def conv2d_bn(x, 65 | filters, 66 | kernel_size, 67 | strides=1, 68 | padding='same', 69 | activation='relu', 70 | use_bias=False, 71 | name=None): 72 | """Utility function to apply conv + BN. 73 | 74 | # Arguments 75 | x: input tensor. 76 | filters: filters in `Conv2D`. 77 | kernel_size: kernel size as in `Conv2D`. 78 | padding: padding mode in `Conv2D`. 79 | activation: activation in `Conv2D`. 80 | strides: strides in `Conv2D`. 81 | name: name of the ops; will become `name + '_ac'` for the activation 82 | and `name + '_bn'` for the batch norm layer. 83 | 84 | # Returns 85 | Output tensor after applying `Conv2D` and `BatchNormalization`. 86 | """ 87 | x = Conv2D(filters, 88 | kernel_size, 89 | strides=strides, 90 | padding=padding, 91 | use_bias=use_bias, 92 | name=name)(x) 93 | if not use_bias: 94 | bn_axis = 1 if K.image_data_format() == 'channels_first' else 3 95 | bn_name = None if name is None else name + '_bn' 96 | x = BatchNormalization(axis=bn_axis, scale=False, name=bn_name)(x) 97 | if activation is not None: 98 | ac_name = None if name is None else name + '_ac' 99 | x = Activation(activation, name=ac_name)(x) 100 | return x 101 | 102 | 103 | def inception_resnet_block(x, scale, block_type, block_idx, activation='relu'): 104 | """Adds a Inception-ResNet block. 
105 | 106 | This function builds 3 types of Inception-ResNet blocks mentioned 107 | in the paper, controlled by the `block_type` argument (which is the 108 | block name used in the official TF-slim implementation): 109 | - Inception-ResNet-A: `block_type='block35'` 110 | - Inception-ResNet-B: `block_type='block17'` 111 | - Inception-ResNet-C: `block_type='block8'` 112 | 113 | # Arguments 114 | x: input tensor. 115 | scale: scaling factor to scale the residuals (i.e., the output of 116 | passing `x` through an inception module) before adding them 117 | to the shortcut branch. Let `r` be the output from the residual branch, 118 | the output of this block will be `x + scale * r`. 119 | block_type: `'block35'`, `'block17'` or `'block8'`, determines 120 | the network structure in the residual branch. 121 | block_idx: an `int` used for generating layer names. The Inception-ResNet blocks 122 | are repeated many times in this network. We use `block_idx` to identify 123 | each of the repetitions. For example, the first Inception-ResNet-A block 124 | will have `block_type='block35', block_idx=0`, ane the layer names will have 125 | a common prefix `'block35_0'`. 126 | activation: activation function to use at the end of the block 127 | (see [activations](keras./activations.md)). 128 | When `activation=None`, no activation is applied 129 | (i.e., "linear" activation: `a(x) = x`). 130 | 131 | # Returns 132 | Output tensor for the block. 133 | 134 | # Raises 135 | ValueError: if `block_type` is not one of `'block35'`, 136 | `'block17'` or `'block8'`. 137 | """ 138 | if block_type == 'block35': 139 | branch_0 = conv2d_bn(x, 32, 1) 140 | branch_1 = conv2d_bn(x, 32, 1) 141 | branch_1 = conv2d_bn(branch_1, 32, 3) 142 | branch_2 = conv2d_bn(x, 32, 1) 143 | branch_2 = conv2d_bn(branch_2, 48, 3) 144 | branch_2 = conv2d_bn(branch_2, 64, 3) 145 | branches = [branch_0, branch_1, branch_2] 146 | elif block_type == 'block17': 147 | branch_0 = conv2d_bn(x, 192, 1) 148 | branch_1 = conv2d_bn(x, 128, 1) 149 | branch_1 = conv2d_bn(branch_1, 160, [1, 7]) 150 | branch_1 = conv2d_bn(branch_1, 192, [7, 1]) 151 | branches = [branch_0, branch_1] 152 | elif block_type == 'block8': 153 | branch_0 = conv2d_bn(x, 192, 1) 154 | branch_1 = conv2d_bn(x, 192, 1) 155 | branch_1 = conv2d_bn(branch_1, 224, [1, 3]) 156 | branch_1 = conv2d_bn(branch_1, 256, [3, 1]) 157 | branches = [branch_0, branch_1] 158 | else: 159 | raise ValueError('Unknown Inception-ResNet block type. ' 160 | 'Expects "block35", "block17" or "block8", ' 161 | 'but got: ' + str(block_type)) 162 | 163 | block_name = block_type + '_' + str(block_idx) 164 | channel_axis = 1 if K.image_data_format() == 'channels_first' else 3 165 | mixed = Concatenate(axis=channel_axis, name=block_name + '_mixed')(branches) 166 | up = conv2d_bn(mixed, 167 | K.int_shape(x)[channel_axis], 168 | 1, 169 | activation=None, 170 | use_bias=True, 171 | name=block_name + '_conv') 172 | 173 | x = Lambda(lambda inputs, scale: inputs[0] + inputs[1] * scale, 174 | output_shape=K.int_shape(x)[1:], 175 | arguments={'scale': scale}, 176 | name=block_name)([x, up]) 177 | if activation is not None: 178 | x = Activation(activation, name=block_name + '_ac')(x) 179 | return x 180 | 181 | 182 | def InceptionResNetV2(include_top=True, 183 | weights='imagenet', 184 | input_tensor=None, 185 | input_shape=None, 186 | pooling=None, 187 | classes=1000): 188 | """Instantiates the Inception-ResNet v2 architecture. 189 | 190 | Optionally loads weights pre-trained on ImageNet. 
191 | Note that when using TensorFlow, for best performance you should 192 | set `"image_data_format": "channels_last"` in your Keras config 193 | at `~/.keras/keras.json`. 194 | 195 | The model and the weights are compatible with both TensorFlow and Theano 196 | backends (but not CNTK). The data format convention used by the model is 197 | the one specified in your Keras config file. 198 | 199 | Note that the default input image size for this model is 299x299, instead 200 | of 224x224 as in the VGG16 and ResNet models. Also, the input preprocessing 201 | function is different (i.e., do not use `imagenet_utils.preprocess_input()` 202 | with this model. Use `preprocess_input()` defined in this module instead). 203 | 204 | # Arguments 205 | include_top: whether to include the fully-connected 206 | layer at the top of the network. 207 | weights: one of `None` (random initialization) 208 | or `'imagenet'` (pre-training on ImageNet). 209 | input_tensor: optional Keras tensor (i.e. output of `layers.Input()`) 210 | to use as image input for the model. 211 | input_shape: optional shape tuple, only to be specified 212 | if `include_top` is `False` (otherwise the input shape 213 | has to be `(299, 299, 3)` (with `'channels_last'` data format) 214 | or `(3, 299, 299)` (with `'channels_first'` data format). 215 | It should have exactly 3 inputs channels, 216 | and width and height should be no smaller than 139. 217 | E.g. `(150, 150, 3)` would be one valid value. 218 | pooling: Optional pooling mode for feature extraction 219 | when `include_top` is `False`. 220 | - `None` means that the output of the model will be 221 | the 4D tensor output of the last convolutional layer. 222 | - `'avg'` means that global average pooling 223 | will be applied to the output of the 224 | last convolutional layer, and thus 225 | the output of the model will be a 2D tensor. 226 | - `'max'` means that global max pooling will be applied. 227 | classes: optional number of classes to classify images 228 | into, only to be specified if `include_top` is `True`, and 229 | if no `weights` argument is specified. 230 | 231 | # Returns 232 | A Keras `Model` instance. 233 | 234 | # Raises 235 | ValueError: in case of invalid argument for `weights`, 236 | or invalid input shape. 237 | RuntimeError: If attempting to run this model with an unsupported backend. 
238 | """ 239 | if K.backend() in {'cntk'}: 240 | raise RuntimeError(K.backend() + ' backend is currently unsupported for this model.') 241 | 242 | if weights not in {'imagenet', None}: 243 | raise ValueError('The `weights` argument should be either ' 244 | '`None` (random initialization) or `imagenet` ' 245 | '(pre-training on ImageNet).') 246 | 247 | if weights == 'imagenet' and include_top and classes != 1000: 248 | raise ValueError('If using `weights` as imagenet with `include_top`' 249 | ' as true, `classes` should be 1000') 250 | 251 | # Determine proper input shape 252 | input_shape = _obtain_input_shape( 253 | input_shape, 254 | default_size=299, 255 | min_size=139, 256 | data_format=K.image_data_format(), 257 | require_flatten=False, 258 | weights=weights) 259 | 260 | if input_tensor is None: 261 | img_input = Input(shape=input_shape) 262 | else: 263 | if not K.is_keras_tensor(input_tensor): 264 | img_input = Input(tensor=input_tensor, shape=input_shape) 265 | else: 266 | img_input = input_tensor 267 | 268 | # Stem block: 35 x 35 x 192 269 | x = conv2d_bn(img_input, 32, 3, strides=2, padding='valid') 270 | x = conv2d_bn(x, 32, 3, padding='valid') 271 | x = conv2d_bn(x, 64, 3) 272 | x = MaxPooling2D(3, strides=2)(x) 273 | x = conv2d_bn(x, 80, 1, padding='valid') 274 | x = conv2d_bn(x, 192, 3, padding='valid') 275 | x = MaxPooling2D(3, strides=2)(x) 276 | 277 | # Mixed 5b (Inception-A block): 35 x 35 x 320 278 | branch_0 = conv2d_bn(x, 96, 1) 279 | branch_1 = conv2d_bn(x, 48, 1) 280 | branch_1 = conv2d_bn(branch_1, 64, 5) 281 | branch_2 = conv2d_bn(x, 64, 1) 282 | branch_2 = conv2d_bn(branch_2, 96, 3) 283 | branch_2 = conv2d_bn(branch_2, 96, 3) 284 | branch_pool = AveragePooling2D(3, strides=1, padding='same')(x) 285 | branch_pool = conv2d_bn(branch_pool, 64, 1) 286 | branches = [branch_0, branch_1, branch_2, branch_pool] 287 | channel_axis = 1 if K.image_data_format() == 'channels_first' else 3 288 | x = Concatenate(axis=channel_axis, name='mixed_5b')(branches) 289 | 290 | # 10x block35 (Inception-ResNet-A block): 35 x 35 x 320 291 | for block_idx in range(1, 11): 292 | x = inception_resnet_block(x, 293 | scale=0.17, 294 | block_type='block35', 295 | block_idx=block_idx) 296 | 297 | # Mixed 6a (Reduction-A block): 17 x 17 x 1088 298 | branch_0 = conv2d_bn(x, 384, 3, strides=2, padding='valid') 299 | branch_1 = conv2d_bn(x, 256, 1) 300 | branch_1 = conv2d_bn(branch_1, 256, 3) 301 | branch_1 = conv2d_bn(branch_1, 384, 3, strides=2, padding='valid') 302 | branch_pool = MaxPooling2D(3, strides=2, padding='valid')(x) 303 | branches = [branch_0, branch_1, branch_pool] 304 | x = Concatenate(axis=channel_axis, name='mixed_6a')(branches) 305 | 306 | # 20x block17 (Inception-ResNet-B block): 17 x 17 x 1088 307 | for block_idx in range(1, 21): 308 | x = inception_resnet_block(x, 309 | scale=0.1, 310 | block_type='block17', 311 | block_idx=block_idx) 312 | 313 | # Mixed 7a (Reduction-B block): 8 x 8 x 2080 314 | branch_0 = conv2d_bn(x, 256, 1) 315 | branch_0 = conv2d_bn(branch_0, 384, 3, strides=2, padding='valid') 316 | branch_1 = conv2d_bn(x, 256, 1) 317 | branch_1 = conv2d_bn(branch_1, 288, 3, strides=2, padding='valid') 318 | branch_2 = conv2d_bn(x, 256, 1) 319 | branch_2 = conv2d_bn(branch_2, 288, 3) 320 | branch_2 = conv2d_bn(branch_2, 320, 3, strides=2, padding='valid') 321 | branch_pool = MaxPooling2D(3, strides=2, padding='valid')(x) 322 | branches = [branch_0, branch_1, branch_2, branch_pool] 323 | x = Concatenate(axis=channel_axis, name='mixed_7a')(branches) 324 | 325 | # 10x block8 
(Inception-ResNet-C block): 8 x 8 x 2080 326 | for block_idx in range(1, 10): 327 | x = inception_resnet_block(x, 328 | scale=0.2, 329 | block_type='block8', 330 | block_idx=block_idx) 331 | x = inception_resnet_block(x, 332 | scale=1., 333 | activation=None, 334 | block_type='block8', 335 | block_idx=10) 336 | 337 | # Final convolution block: 8 x 8 x 1536 338 | x = conv2d_bn(x, 1536, 1, name='conv_7b') 339 | 340 | if include_top: 341 | # Classification block 342 | x = GlobalAveragePooling2D(name='avg_pool')(x) 343 | x = Dense(classes, activation='softmax', name='predictions')(x) 344 | else: 345 | if pooling == 'avg': 346 | x = GlobalAveragePooling2D()(x) 347 | elif pooling == 'max': 348 | x = GlobalMaxPooling2D()(x) 349 | 350 | # Ensure that the model takes into account 351 | # any potential predecessors of `input_tensor` 352 | if input_tensor is not None: 353 | inputs = get_source_inputs(input_tensor) 354 | else: 355 | inputs = img_input 356 | 357 | # Create model 358 | model = Model(inputs, x, name='inception_resnet_v2') 359 | 360 | # Load weights 361 | if weights == 'imagenet': 362 | if K.image_data_format() == 'channels_first': 363 | if K.backend() == 'tensorflow': 364 | warnings.warn('You are using the TensorFlow backend, yet you ' 365 | 'are using the Theano ' 366 | 'image data format convention ' 367 | '(`image_data_format="channels_first"`). ' 368 | 'For best performance, set ' 369 | '`image_data_format="channels_last"` in ' 370 | 'your Keras config ' 371 | 'at ~/.keras/keras.json.') 372 | if include_top: 373 | weights_filename = 'inception_resnet_v2_weights_tf_dim_ordering_tf_kernels.h5' 374 | weights_path = get_file(weights_filename, 375 | BASE_WEIGHT_URL + weights_filename, 376 | cache_subdir='models', 377 | md5_hash='e693bd0210a403b3192acc6073ad2e96') 378 | else: 379 | weights_filename = 'inception_resnet_v2_weights_tf_dim_ordering_tf_kernels_notop.h5' 380 | weights_path = get_file(weights_filename, 381 | BASE_WEIGHT_URL + weights_filename, 382 | cache_subdir='models', 383 | md5_hash='d19885ff4a710c122648d3b5c3b684e4') 384 | model.load_weights(weights_path) 385 | 386 | return model 387 | 388 | 389 | if __name__ == '__main__': 390 | model = InceptionResNetV2(include_top=True, weights='imagenet') 391 | 392 | img_path = 'elephant.jpg' 393 | img = image.load_img(img_path, target_size=(299, 299)) 394 | x = image.img_to_array(img) 395 | x = np.expand_dims(x, axis=0) 396 | 397 | x = preprocess_input(x) 398 | 399 | preds = model.predict(x) 400 | print('Predicted:', decode_predictions(preds)) 401 | -------------------------------------------------------------------------------- /inception_v3.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """Inception V3 model for Keras. 3 | 4 | Note that the input image format for this model is different than for 5 | the VGG16 and ResNet models (299x299 instead of 224x224), 6 | and that the input preprocessing function is also different (same as Xception). 
7 | 8 | # Reference 9 | 10 | - [Rethinking the Inception Architecture for Computer Vision](http://arxiv.org/abs/1512.00567) 11 | 12 | """ 13 | from __future__ import print_function 14 | from __future__ import absolute_import 15 | 16 | import warnings 17 | import numpy as np 18 | 19 | from keras.models import Model 20 | from keras import layers 21 | from keras.layers import Activation 22 | from keras.layers import Dense 23 | from keras.layers import Input 24 | from keras.layers import BatchNormalization 25 | from keras.layers import Conv2D 26 | from keras.layers import MaxPooling2D 27 | from keras.layers import AveragePooling2D 28 | from keras.layers import GlobalAveragePooling2D 29 | from keras.layers import GlobalMaxPooling2D 30 | from keras.engine.topology import get_source_inputs 31 | from keras.utils.layer_utils import convert_all_kernels_in_model 32 | from keras.utils.data_utils import get_file 33 | from keras import backend as K 34 | from keras.applications.imagenet_utils import decode_predictions 35 | from keras.applications.imagenet_utils import _obtain_input_shape 36 | from keras.preprocessing import image 37 | 38 | 39 | WEIGHTS_PATH = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.5/inception_v3_weights_tf_dim_ordering_tf_kernels.h5' 40 | WEIGHTS_PATH_NO_TOP = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.5/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5' 41 | 42 | 43 | def conv2d_bn(x, 44 | filters, 45 | num_row, 46 | num_col, 47 | padding='same', 48 | strides=(1, 1), 49 | name=None): 50 | """Utility function to apply conv + BN. 51 | 52 | Arguments: 53 | x: input tensor. 54 | filters: filters in `Conv2D`. 55 | num_row: height of the convolution kernel. 56 | num_col: width of the convolution kernel. 57 | padding: padding mode in `Conv2D`. 58 | strides: strides in `Conv2D`. 59 | name: name of the ops; will become `name + '_conv'` 60 | for the convolution and `name + '_bn'` for the 61 | batch norm layer. 62 | 63 | Returns: 64 | Output tensor after applying `Conv2D` and `BatchNormalization`. 65 | """ 66 | if name is not None: 67 | bn_name = name + '_bn' 68 | conv_name = name + '_conv' 69 | else: 70 | bn_name = None 71 | conv_name = None 72 | if K.image_data_format() == 'channels_first': 73 | bn_axis = 1 74 | else: 75 | bn_axis = 3 76 | x = Conv2D( 77 | filters, (num_row, num_col), 78 | strides=strides, 79 | padding=padding, 80 | use_bias=False, 81 | name=conv_name)(x) 82 | x = BatchNormalization(axis=bn_axis, scale=False, name=bn_name)(x) 83 | x = Activation('relu', name=name)(x) 84 | return x 85 | 86 | 87 | def InceptionV3(include_top=True, 88 | weights='imagenet', 89 | input_tensor=None, 90 | input_shape=None, 91 | pooling=None, 92 | classes=1000): 93 | """Instantiates the Inception v3 architecture. 94 | 95 | Optionally loads weights pre-trained 96 | on ImageNet. Note that when using TensorFlow, 97 | for best performance you should set 98 | `image_data_format="channels_last"` in your Keras config 99 | at ~/.keras/keras.json. 100 | The model and the weights are compatible with both 101 | TensorFlow and Theano. The data format 102 | convention used by the model is the one 103 | specified in your Keras config file. 104 | Note that the default input image size for this model is 299x299. 105 | 106 | Arguments: 107 | include_top: whether to include the fully-connected 108 | layer at the top of the network. 109 | weights: one of `None` (random initialization) 110 | or "imagenet" (pre-training on ImageNet). 
111 | input_tensor: optional Keras tensor (i.e. output of `layers.Input()`) 112 | to use as image input for the model. 113 | input_shape: optional shape tuple, only to be specified 114 | if `include_top` is False (otherwise the input shape 115 | has to be `(299, 299, 3)` (with `channels_last` data format) 116 | or `(3, 299, 299)` (with `channels_first` data format). 117 | It should have exactly 3 inputs channels, 118 | and width and height should be no smaller than 139. 119 | E.g. `(150, 150, 3)` would be one valid value. 120 | pooling: Optional pooling mode for feature extraction 121 | when `include_top` is `False`. 122 | - `None` means that the output of the model will be 123 | the 4D tensor output of the 124 | last convolutional layer. 125 | - `avg` means that global average pooling 126 | will be applied to the output of the 127 | last convolutional layer, and thus 128 | the output of the model will be a 2D tensor. 129 | - `max` means that global max pooling will 130 | be applied. 131 | classes: optional number of classes to classify images 132 | into, only to be specified if `include_top` is True, and 133 | if no `weights` argument is specified. 134 | 135 | Returns: 136 | A Keras model instance. 137 | 138 | Raises: 139 | ValueError: in case of invalid argument for `weights`, 140 | or invalid input shape. 141 | """ 142 | if weights not in {'imagenet', None}: 143 | raise ValueError('The `weights` argument should be either ' 144 | '`None` (random initialization) or `imagenet` ' 145 | '(pre-training on ImageNet).') 146 | 147 | if weights == 'imagenet' and include_top and classes != 1000: 148 | raise ValueError('If using `weights` as imagenet with `include_top`' 149 | ' as true, `classes` should be 1000') 150 | 151 | # Determine proper input shape 152 | input_shape = _obtain_input_shape( 153 | input_shape, 154 | default_size=299, 155 | min_size=139, 156 | data_format=K.image_data_format(), 157 | include_top=include_top) 158 | 159 | if input_tensor is None: 160 | img_input = Input(shape=input_shape) 161 | else: 162 | img_input = Input(tensor=input_tensor, shape=input_shape) 163 | 164 | if K.image_data_format() == 'channels_first': 165 | channel_axis = 1 166 | else: 167 | channel_axis = 3 168 | 169 | x = conv2d_bn(img_input, 32, 3, 3, strides=(2, 2), padding='valid') 170 | x = conv2d_bn(x, 32, 3, 3, padding='valid') 171 | x = conv2d_bn(x, 64, 3, 3) 172 | x = MaxPooling2D((3, 3), strides=(2, 2))(x) 173 | 174 | x = conv2d_bn(x, 80, 1, 1, padding='valid') 175 | x = conv2d_bn(x, 192, 3, 3, padding='valid') 176 | x = MaxPooling2D((3, 3), strides=(2, 2))(x) 177 | 178 | # mixed 0, 1, 2: 35 x 35 x 256 179 | branch1x1 = conv2d_bn(x, 64, 1, 1) 180 | 181 | branch5x5 = conv2d_bn(x, 48, 1, 1) 182 | branch5x5 = conv2d_bn(branch5x5, 64, 5, 5) 183 | 184 | branch3x3dbl = conv2d_bn(x, 64, 1, 1) 185 | branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3) 186 | branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3) 187 | 188 | branch_pool = AveragePooling2D((3, 3), strides=(1, 1), padding='same')(x) 189 | branch_pool = conv2d_bn(branch_pool, 32, 1, 1) 190 | x = layers.concatenate( 191 | [branch1x1, branch5x5, branch3x3dbl, branch_pool], 192 | axis=channel_axis, 193 | name='mixed0') 194 | 195 | # mixed 1: 35 x 35 x 256 196 | branch1x1 = conv2d_bn(x, 64, 1, 1) 197 | 198 | branch5x5 = conv2d_bn(x, 48, 1, 1) 199 | branch5x5 = conv2d_bn(branch5x5, 64, 5, 5) 200 | 201 | branch3x3dbl = conv2d_bn(x, 64, 1, 1) 202 | branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3) 203 | branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3) 204 | 205 | 
branch_pool = AveragePooling2D((3, 3), strides=(1, 1), padding='same')(x) 206 | branch_pool = conv2d_bn(branch_pool, 64, 1, 1) 207 | x = layers.concatenate( 208 | [branch1x1, branch5x5, branch3x3dbl, branch_pool], 209 | axis=channel_axis, 210 | name='mixed1') 211 | 212 | # mixed 2: 35 x 35 x 256 213 | branch1x1 = conv2d_bn(x, 64, 1, 1) 214 | 215 | branch5x5 = conv2d_bn(x, 48, 1, 1) 216 | branch5x5 = conv2d_bn(branch5x5, 64, 5, 5) 217 | 218 | branch3x3dbl = conv2d_bn(x, 64, 1, 1) 219 | branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3) 220 | branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3) 221 | 222 | branch_pool = AveragePooling2D((3, 3), strides=(1, 1), padding='same')(x) 223 | branch_pool = conv2d_bn(branch_pool, 64, 1, 1) 224 | x = layers.concatenate( 225 | [branch1x1, branch5x5, branch3x3dbl, branch_pool], 226 | axis=channel_axis, 227 | name='mixed2') 228 | 229 | # mixed 3: 17 x 17 x 768 230 | branch3x3 = conv2d_bn(x, 384, 3, 3, strides=(2, 2), padding='valid') 231 | 232 | branch3x3dbl = conv2d_bn(x, 64, 1, 1) 233 | branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3) 234 | branch3x3dbl = conv2d_bn( 235 | branch3x3dbl, 96, 3, 3, strides=(2, 2), padding='valid') 236 | 237 | branch_pool = MaxPooling2D((3, 3), strides=(2, 2))(x) 238 | x = layers.concatenate( 239 | [branch3x3, branch3x3dbl, branch_pool], axis=channel_axis, name='mixed3') 240 | 241 | # mixed 4: 17 x 17 x 768 242 | branch1x1 = conv2d_bn(x, 192, 1, 1) 243 | 244 | branch7x7 = conv2d_bn(x, 128, 1, 1) 245 | branch7x7 = conv2d_bn(branch7x7, 128, 1, 7) 246 | branch7x7 = conv2d_bn(branch7x7, 192, 7, 1) 247 | 248 | branch7x7dbl = conv2d_bn(x, 128, 1, 1) 249 | branch7x7dbl = conv2d_bn(branch7x7dbl, 128, 7, 1) 250 | branch7x7dbl = conv2d_bn(branch7x7dbl, 128, 1, 7) 251 | branch7x7dbl = conv2d_bn(branch7x7dbl, 128, 7, 1) 252 | branch7x7dbl = conv2d_bn(branch7x7dbl, 192, 1, 7) 253 | 254 | branch_pool = AveragePooling2D((3, 3), strides=(1, 1), padding='same')(x) 255 | branch_pool = conv2d_bn(branch_pool, 192, 1, 1) 256 | x = layers.concatenate( 257 | [branch1x1, branch7x7, branch7x7dbl, branch_pool], 258 | axis=channel_axis, 259 | name='mixed4') 260 | 261 | # mixed 5, 6: 17 x 17 x 768 262 | for i in range(2): 263 | branch1x1 = conv2d_bn(x, 192, 1, 1) 264 | 265 | branch7x7 = conv2d_bn(x, 160, 1, 1) 266 | branch7x7 = conv2d_bn(branch7x7, 160, 1, 7) 267 | branch7x7 = conv2d_bn(branch7x7, 192, 7, 1) 268 | 269 | branch7x7dbl = conv2d_bn(x, 160, 1, 1) 270 | branch7x7dbl = conv2d_bn(branch7x7dbl, 160, 7, 1) 271 | branch7x7dbl = conv2d_bn(branch7x7dbl, 160, 1, 7) 272 | branch7x7dbl = conv2d_bn(branch7x7dbl, 160, 7, 1) 273 | branch7x7dbl = conv2d_bn(branch7x7dbl, 192, 1, 7) 274 | 275 | branch_pool = AveragePooling2D( 276 | (3, 3), strides=(1, 1), padding='same')(x) 277 | branch_pool = conv2d_bn(branch_pool, 192, 1, 1) 278 | x = layers.concatenate( 279 | [branch1x1, branch7x7, branch7x7dbl, branch_pool], 280 | axis=channel_axis, 281 | name='mixed' + str(5 + i)) 282 | 283 | # mixed 7: 17 x 17 x 768 284 | branch1x1 = conv2d_bn(x, 192, 1, 1) 285 | 286 | branch7x7 = conv2d_bn(x, 192, 1, 1) 287 | branch7x7 = conv2d_bn(branch7x7, 192, 1, 7) 288 | branch7x7 = conv2d_bn(branch7x7, 192, 7, 1) 289 | 290 | branch7x7dbl = conv2d_bn(x, 192, 1, 1) 291 | branch7x7dbl = conv2d_bn(branch7x7dbl, 192, 7, 1) 292 | branch7x7dbl = conv2d_bn(branch7x7dbl, 192, 1, 7) 293 | branch7x7dbl = conv2d_bn(branch7x7dbl, 192, 7, 1) 294 | branch7x7dbl = conv2d_bn(branch7x7dbl, 192, 1, 7) 295 | 296 | branch_pool = AveragePooling2D((3, 3), strides=(1, 1), padding='same')(x) 297 | 
branch_pool = conv2d_bn(branch_pool, 192, 1, 1) 298 | x = layers.concatenate( 299 | [branch1x1, branch7x7, branch7x7dbl, branch_pool], 300 | axis=channel_axis, 301 | name='mixed7') 302 | 303 | # mixed 8: 8 x 8 x 1280 304 | branch3x3 = conv2d_bn(x, 192, 1, 1) 305 | branch3x3 = conv2d_bn(branch3x3, 320, 3, 3, 306 | strides=(2, 2), padding='valid') 307 | 308 | branch7x7x3 = conv2d_bn(x, 192, 1, 1) 309 | branch7x7x3 = conv2d_bn(branch7x7x3, 192, 1, 7) 310 | branch7x7x3 = conv2d_bn(branch7x7x3, 192, 7, 1) 311 | branch7x7x3 = conv2d_bn( 312 | branch7x7x3, 192, 3, 3, strides=(2, 2), padding='valid') 313 | 314 | branch_pool = MaxPooling2D((3, 3), strides=(2, 2))(x) 315 | x = layers.concatenate( 316 | [branch3x3, branch7x7x3, branch_pool], axis=channel_axis, name='mixed8') 317 | 318 | # mixed 9: 8 x 8 x 2048 319 | for i in range(2): 320 | branch1x1 = conv2d_bn(x, 320, 1, 1) 321 | 322 | branch3x3 = conv2d_bn(x, 384, 1, 1) 323 | branch3x3_1 = conv2d_bn(branch3x3, 384, 1, 3) 324 | branch3x3_2 = conv2d_bn(branch3x3, 384, 3, 1) 325 | branch3x3 = layers.concatenate( 326 | [branch3x3_1, branch3x3_2], axis=channel_axis, name='mixed9_' + str(i)) 327 | 328 | branch3x3dbl = conv2d_bn(x, 448, 1, 1) 329 | branch3x3dbl = conv2d_bn(branch3x3dbl, 384, 3, 3) 330 | branch3x3dbl_1 = conv2d_bn(branch3x3dbl, 384, 1, 3) 331 | branch3x3dbl_2 = conv2d_bn(branch3x3dbl, 384, 3, 1) 332 | branch3x3dbl = layers.concatenate( 333 | [branch3x3dbl_1, branch3x3dbl_2], axis=channel_axis) 334 | 335 | branch_pool = AveragePooling2D( 336 | (3, 3), strides=(1, 1), padding='same')(x) 337 | branch_pool = conv2d_bn(branch_pool, 192, 1, 1) 338 | x = layers.concatenate( 339 | [branch1x1, branch3x3, branch3x3dbl, branch_pool], 340 | axis=channel_axis, 341 | name='mixed' + str(9 + i)) 342 | if include_top: 343 | # Classification block 344 | x = GlobalAveragePooling2D(name='avg_pool')(x) 345 | x = Dense(classes, activation='softmax', name='predictions')(x) 346 | else: 347 | if pooling == 'avg': 348 | x = GlobalAveragePooling2D()(x) 349 | elif pooling == 'max': 350 | x = GlobalMaxPooling2D()(x) 351 | 352 | # Ensure that the model takes into account 353 | # any potential predecessors of `input_tensor`. 354 | if input_tensor is not None: 355 | inputs = get_source_inputs(input_tensor) 356 | else: 357 | inputs = img_input 358 | # Create model. 359 | model = Model(inputs, x, name='inception_v3') 360 | 361 | # load weights 362 | if weights == 'imagenet': 363 | if K.image_data_format() == 'channels_first': 364 | if K.backend() == 'tensorflow': 365 | warnings.warn('You are using the TensorFlow backend, yet you ' 366 | 'are using the Theano ' 367 | 'image data format convention ' 368 | '(`image_data_format="channels_first"`). ' 369 | 'For best performance, set ' 370 | '`image_data_format="channels_last"` in ' 371 | 'your Keras config ' 372 | 'at ~/.keras/keras.json.') 373 | if include_top: 374 | weights_path = get_file( 375 | 'inception_v3_weights_tf_dim_ordering_tf_kernels.h5', 376 | WEIGHTS_PATH, 377 | cache_subdir='models', 378 | md5_hash='9a0d58056eeedaa3f26cb7ebd46da564') 379 | else: 380 | weights_path = get_file( 381 | 'inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5', 382 | WEIGHTS_PATH_NO_TOP, 383 | cache_subdir='models', 384 | md5_hash='bcbd6486424b2319ff4ef7d526e38f63') 385 | model.load_weights(weights_path) 386 | if K.backend() == 'theano': 387 | convert_all_kernels_in_model(model) 388 | return model 389 | 390 | 391 | def preprocess_input(x): 392 | x /= 255. 393 | x -= 0.5 394 | x *= 2. 
395 | return x 396 | 397 | 398 | if __name__ == '__main__': 399 | model = InceptionV3(include_top=True, weights='imagenet') 400 | 401 | img_path = 'elephant.jpg' 402 | img = image.load_img(img_path, target_size=(299, 299)) 403 | x = image.img_to_array(img) 404 | x = np.expand_dims(x, axis=0) 405 | 406 | x = preprocess_input(x) 407 | 408 | preds = model.predict(x) 409 | print('Predicted:', decode_predictions(preds)) 410 | -------------------------------------------------------------------------------- /mobilenet.py: -------------------------------------------------------------------------------- 1 | """MobileNet v1 models for Keras. 2 | 3 | Code contributed by Somshubra Majumdar (@titu1994). 4 | 5 | MobileNet is a general architecture and can be used for multiple use cases. 6 | Depending on the use case, it can use different input layer size and 7 | different width factors. This allows different width models to reduce 8 | the number of multiply-adds and thereby 9 | reduce inference cost on mobile devices. 10 | 11 | MobileNets support any input size greater than 32 x 32, with larger image sizes 12 | offering better performance. 13 | The number of parameters and number of multiply-adds 14 | can be modified by using the `alpha` parameter, 15 | which increases/decreases the number of filters in each layer. 16 | By altering the image size and `alpha` parameter, 17 | all 16 models from the paper can be built, with ImageNet weights provided. 18 | 19 | The paper demonstrates the performance of MobileNets using `alpha` values of 20 | 1.0 (also called 100 % MobileNet), 0.75, 0.5 and 0.25. 21 | For each of these `alpha` values, weights for 4 different input image sizes 22 | are provided (224, 192, 160, 128). 23 | 24 | The following table describes the size and accuracy of the 100% MobileNet 25 | on size 224 x 224: 26 | ---------------------------------------------------------------------------- 27 | Width Multiplier (alpha) | ImageNet Acc | Multiply-Adds (M) | Params (M) 28 | ---------------------------------------------------------------------------- 29 | | 1.0 MobileNet-224 | 70.6 % | 529 | 4.2 | 30 | | 0.75 MobileNet-224 | 68.4 % | 325 | 2.6 | 31 | | 0.50 MobileNet-224 | 63.7 % | 149 | 1.3 | 32 | | 0.25 MobileNet-224 | 50.6 % | 41 | 0.5 | 33 | ---------------------------------------------------------------------------- 34 | 35 | The following table describes the performance of 36 | the 100 % MobileNet on various input sizes: 37 | ------------------------------------------------------------------------ 38 | Resolution | ImageNet Acc | Multiply-Adds (M) | Params (M) 39 | ------------------------------------------------------------------------ 40 | | 1.0 MobileNet-224 | 70.6 % | 529 | 4.2 | 41 | | 1.0 MobileNet-192 | 69.1 % | 529 | 4.2 | 42 | | 1.0 MobileNet-160 | 67.2 % | 529 | 4.2 | 43 | | 1.0 MobileNet-128 | 64.4 % | 529 | 4.2 | 44 | ------------------------------------------------------------------------ 45 | 46 | The weights for all 16 models are obtained and translated 47 | from Tensorflow checkpoints found at 48 | https://github.com/tensorflow/models/blob/master/slim/nets/mobilenet_v1.md 49 | 50 | # Reference 51 | - [MobileNets: Efficient Convolutional Neural Networks for 52 | Mobile Vision Applications](https://arxiv.org/pdf/1704.04861.pdf)) 53 | """ 54 | from __future__ import print_function 55 | from __future__ import absolute_import 56 | from __future__ import division 57 | 58 | import warnings 59 | import numpy as np 60 | 61 | from keras.preprocessing import image 62 | 63 | from 
keras.models import Model 64 | from keras.layers import Input 65 | from keras.layers import Activation 66 | from keras.layers import Dropout 67 | from keras.layers import Reshape 68 | from keras.layers import BatchNormalization 69 | from keras.layers import GlobalAveragePooling2D 70 | from keras.layers import GlobalMaxPooling2D 71 | from keras.layers import Conv2D 72 | from keras import initializers 73 | from keras import regularizers 74 | from keras import constraints 75 | from keras.utils import conv_utils 76 | from keras.utils.data_utils import get_file 77 | from keras.engine.topology import get_source_inputs 78 | from keras.engine import InputSpec 79 | from keras.applications.imagenet_utils import _obtain_input_shape 80 | from keras.applications.imagenet_utils import decode_predictions 81 | from keras import backend as K 82 | 83 | 84 | BASE_WEIGHT_PATH = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.6/' 85 | 86 | 87 | def relu6(x): 88 | return K.relu(x, max_value=6) 89 | 90 | 91 | def preprocess_input(x): 92 | x /= 255. 93 | x -= 0.5 94 | x *= 2. 95 | return x 96 | 97 | 98 | class DepthwiseConv2D(Conv2D): 99 | """Depthwise separable 2D convolution. 100 | 101 | Depthwise Separable convolutions consists in performing 102 | just the first step in a depthwise spatial convolution 103 | (which acts on each input channel separately). 104 | The `depth_multiplier` argument controls how many 105 | output channels are generated per input channel in the depthwise step. 106 | 107 | # Arguments 108 | kernel_size: An integer or tuple/list of 2 integers, specifying the 109 | width and height of the 2D convolution window. 110 | Can be a single integer to specify the same value for 111 | all spatial dimensions. 112 | strides: An integer or tuple/list of 2 integers, 113 | specifying the strides of the convolution along the width and height. 114 | Can be a single integer to specify the same value for 115 | all spatial dimensions. 116 | Specifying any stride value != 1 is incompatible with specifying 117 | any `dilation_rate` value != 1. 118 | padding: one of `"valid"` or `"same"` (case-insensitive). 119 | depth_multiplier: The number of depthwise convolution output channels 120 | for each input channel. 121 | The total number of depthwise convolution output 122 | channels will be equal to `filters_in * depth_multiplier`. 123 | data_format: A string, 124 | one of `channels_last` (default) or `channels_first`. 125 | The ordering of the dimensions in the inputs. 126 | `channels_last` corresponds to inputs with shape 127 | `(batch, height, width, channels)` while `channels_first` 128 | corresponds to inputs with shape 129 | `(batch, channels, height, width)`. 130 | It defaults to the `image_data_format` value found in your 131 | Keras config file at `~/.keras/keras.json`. 132 | If you never set it, then it will be "channels_last". 133 | activation: Activation function to use 134 | (see [activations](keras./activations.md)). 135 | If you don't specify anything, no activation is applied 136 | (ie. "linear" activation: `a(x) = x`). 137 | use_bias: Boolean, whether the layer uses a bias vector. 138 | depthwise_initializer: Initializer for the depthwise kernel matrix 139 | (see [initializers](keras./initializers.md)). 140 | bias_initializer: Initializer for the bias vector 141 | (see [initializers](keras./initializers.md)). 142 | depthwise_regularizer: Regularizer function applied to 143 | the depthwise kernel matrix 144 | (see [regularizer](keras./regularizers.md)). 
145 | bias_regularizer: Regularizer function applied to the bias vector 146 | (see [regularizer](keras./regularizers.md)). 147 | activity_regularizer: Regularizer function applied to 148 | the output of the layer (its "activation"). 149 | (see [regularizer](keras./regularizers.md)). 150 | depthwise_constraint: Constraint function applied to 151 | the depthwise kernel matrix 152 | (see [constraints](keras./constraints.md)). 153 | bias_constraint: Constraint function applied to the bias vector 154 | (see [constraints](keras./constraints.md)). 155 | 156 | # Input shape 157 | 4D tensor with shape: 158 | `[batch, channels, rows, cols]` if data_format='channels_first' 159 | or 4D tensor with shape: 160 | `[batch, rows, cols, channels]` if data_format='channels_last'. 161 | 162 | # Output shape 163 | 4D tensor with shape: 164 | `[batch, filters, new_rows, new_cols]` if data_format='channels_first' 165 | or 4D tensor with shape: 166 | `[batch, new_rows, new_cols, filters]` if data_format='channels_last'. 167 | `rows` and `cols` values might have changed due to padding. 168 | """ 169 | 170 | def __init__(self, 171 | kernel_size, 172 | strides=(1, 1), 173 | padding='valid', 174 | depth_multiplier=1, 175 | data_format=None, 176 | activation=None, 177 | use_bias=True, 178 | depthwise_initializer='glorot_uniform', 179 | bias_initializer='zeros', 180 | depthwise_regularizer=None, 181 | bias_regularizer=None, 182 | activity_regularizer=None, 183 | depthwise_constraint=None, 184 | bias_constraint=None, 185 | **kwargs): 186 | super(DepthwiseConv2D, self).__init__( 187 | filters=None, 188 | kernel_size=kernel_size, 189 | strides=strides, 190 | padding=padding, 191 | data_format=data_format, 192 | activation=activation, 193 | use_bias=use_bias, 194 | bias_regularizer=bias_regularizer, 195 | activity_regularizer=activity_regularizer, 196 | bias_constraint=bias_constraint, 197 | **kwargs) 198 | self.depth_multiplier = depth_multiplier 199 | self.depthwise_initializer = initializers.get(depthwise_initializer) 200 | self.depthwise_regularizer = regularizers.get(depthwise_regularizer) 201 | self.depthwise_constraint = constraints.get(depthwise_constraint) 202 | self.bias_initializer = initializers.get(bias_initializer) 203 | 204 | def build(self, input_shape): 205 | if len(input_shape) < 4: 206 | raise ValueError('Inputs to `DepthwiseConv2D` should have rank 4. ' 207 | 'Received input shape:', str(input_shape)) 208 | if self.data_format == 'channels_first': 209 | channel_axis = 1 210 | else: 211 | channel_axis = 3 212 | if input_shape[channel_axis] is None: 213 | raise ValueError('The channel dimension of the inputs to ' 214 | '`DepthwiseConv2D` ' 215 | 'should be defined. Found `None`.') 216 | input_dim = int(input_shape[channel_axis]) 217 | depthwise_kernel_shape = (self.kernel_size[0], 218 | self.kernel_size[1], 219 | input_dim, 220 | self.depth_multiplier) 221 | 222 | self.depthwise_kernel = self.add_weight( 223 | shape=depthwise_kernel_shape, 224 | initializer=self.depthwise_initializer, 225 | name='depthwise_kernel', 226 | regularizer=self.depthwise_regularizer, 227 | constraint=self.depthwise_constraint) 228 | 229 | if self.use_bias: 230 | self.bias = self.add_weight(shape=(input_dim * self.depth_multiplier,), 231 | initializer=self.bias_initializer, 232 | name='bias', 233 | regularizer=self.bias_regularizer, 234 | constraint=self.bias_constraint) 235 | else: 236 | self.bias = None 237 | # Set input spec. 
238 | self.input_spec = InputSpec(ndim=4, axes={channel_axis: input_dim}) 239 | self.built = True 240 | 241 | def call(self, inputs, training=None): 242 | outputs = K.depthwise_conv2d( 243 | inputs, 244 | self.depthwise_kernel, 245 | strides=self.strides, 246 | padding=self.padding, 247 | dilation_rate=self.dilation_rate, 248 | data_format=self.data_format) 249 | 250 | if self.bias: 251 | outputs = K.bias_add( 252 | outputs, 253 | self.bias, 254 | data_format=self.data_format) 255 | 256 | if self.activation is not None: 257 | return self.activation(outputs) 258 | 259 | return outputs 260 | 261 | def compute_output_shape(self, input_shape): 262 | if self.data_format == 'channels_first': 263 | rows = input_shape[2] 264 | cols = input_shape[3] 265 | out_filters = input_shape[1] * self.depth_multiplier 266 | elif self.data_format == 'channels_last': 267 | rows = input_shape[1] 268 | cols = input_shape[2] 269 | out_filters = input_shape[3] * self.depth_multiplier 270 | 271 | rows = conv_utils.conv_output_length(rows, self.kernel_size[0], 272 | self.padding, 273 | self.strides[0]) 274 | cols = conv_utils.conv_output_length(cols, self.kernel_size[1], 275 | self.padding, 276 | self.strides[1]) 277 | 278 | if self.data_format == 'channels_first': 279 | return (input_shape[0], out_filters, rows, cols) 280 | elif self.data_format == 'channels_last': 281 | return (input_shape[0], rows, cols, out_filters) 282 | 283 | def get_config(self): 284 | config = super(DepthwiseConv2D, self).get_config() 285 | config.pop('filters') 286 | config.pop('kernel_initializer') 287 | config.pop('kernel_regularizer') 288 | config.pop('kernel_constraint') 289 | config['depth_multiplier'] = self.depth_multiplier 290 | config['depthwise_initializer'] = initializers.serialize(self.depthwise_initializer) 291 | config['depthwise_regularizer'] = regularizers.serialize(self.depthwise_regularizer) 292 | config['depthwise_constraint'] = constraints.serialize(self.depthwise_constraint) 293 | return config 294 | 295 | 296 | def MobileNet(input_shape=None, 297 | alpha=1.0, 298 | depth_multiplier=1, 299 | dropout=1e-3, 300 | include_top=True, 301 | weights='imagenet', 302 | input_tensor=None, 303 | pooling=None, 304 | classes=1000): 305 | """Instantiates the MobileNet architecture. 306 | 307 | Note that only TensorFlow is supported for now, 308 | therefore it only works with the data format 309 | `image_data_format='channels_last'` in your Keras config 310 | at `~/.keras/keras.json`. 311 | 312 | To load a MobileNet model via `load_model`, import the custom 313 | objects `relu6` and `DepthwiseConv2D` and pass them to the 314 | `custom_objects` parameter. 315 | E.g. 316 | model = load_model('mobilenet.h5', custom_objects={ 317 | 'relu6': mobilenet.relu6, 318 | 'DepthwiseConv2D': mobilenet.DepthwiseConv2D}) 319 | 320 | # Arguments 321 | input_shape: optional shape tuple, only to be specified 322 | if `include_top` is False (otherwise the input shape 323 | has to be `(224, 224, 3)` (with `channels_last` data format) 324 | or (3, 224, 224) (with `channels_first` data format). 325 | It should have exactly 3 inputs channels, 326 | and width and height should be no smaller than 32. 327 | E.g. `(200, 200, 3)` would be one valid value. 328 | alpha: controls the width of the network. 329 | - If `alpha` < 1.0, proportionally decreases the number 330 | of filters in each layer. 331 | - If `alpha` > 1.0, proportionally increases the number 332 | of filters in each layer. 
333 | - If `alpha` = 1, default number of filters from the paper 334 | are used at each layer. 335 | depth_multiplier: depth multiplier for depthwise convolution 336 | (also called the resolution multiplier) 337 | dropout: dropout rate 338 | include_top: whether to include the fully-connected 339 | layer at the top of the network. 340 | weights: `None` (random initialization) or 341 | `imagenet` (ImageNet weights) 342 | input_tensor: optional Keras tensor (i.e. output of 343 | `layers.Input()`) 344 | to use as image input for the model. 345 | pooling: Optional pooling mode for feature extraction 346 | when `include_top` is `False`. 347 | - `None` means that the output of the model 348 | will be the 4D tensor output of the 349 | last convolutional layer. 350 | - `avg` means that global average pooling 351 | will be applied to the output of the 352 | last convolutional layer, and thus 353 | the output of the model will be a 354 | 2D tensor. 355 | - `max` means that global max pooling will 356 | be applied. 357 | classes: optional number of classes to classify images 358 | into, only to be specified if `include_top` is True, and 359 | if no `weights` argument is specified. 360 | 361 | # Returns 362 | A Keras model instance. 363 | 364 | # Raises 365 | ValueError: in case of invalid argument for `weights`, 366 | or invalid input shape. 367 | RuntimeError: If attempting to run this model with a 368 | backend that does not support separable convolutions. 369 | """ 370 | 371 | if K.backend() != 'tensorflow': 372 | raise RuntimeError('Only Tensorflow backend is currently supported, ' 373 | 'as other backends do not support ' 374 | 'depthwise convolution.') 375 | 376 | if weights not in {'imagenet', None}: 377 | raise ValueError('The `weights` argument should be either ' 378 | '`None` (random initialization) or `imagenet` ' 379 | '(pre-training on ImageNet).') 380 | 381 | if weights == 'imagenet' and include_top and classes != 1000: 382 | raise ValueError('If using `weights` as ImageNet with `include_top` ' 383 | 'as true, `classes` should be 1000') 384 | 385 | # Determine proper input shape. 386 | input_shape = _obtain_input_shape(input_shape, 387 | default_size=224, 388 | min_size=32, 389 | data_format=K.image_data_format(), 390 | include_top=include_top or weights) 391 | if K.image_data_format() == 'channels_last': 392 | row_axis, col_axis = (0, 1) 393 | else: 394 | row_axis, col_axis = (1, 2) 395 | rows = input_shape[row_axis] 396 | cols = input_shape[col_axis] 397 | 398 | if weights == 'imagenet': 399 | if depth_multiplier != 1: 400 | raise ValueError('If imagenet weights are being loaded, ' 401 | 'depth multiplier must be 1') 402 | 403 | if alpha not in [0.25, 0.50, 0.75, 1.0]: 404 | raise ValueError('If imagenet weights are being loaded, ' 405 | 'alpha can be one of' 406 | '`0.25`, `0.50`, `0.75` or `1.0` only.') 407 | 408 | if rows != cols or rows not in [128, 160, 192, 224]: 409 | raise ValueError('If imagenet weights are being loaded, ' 410 | 'input must have a static square shape (one of ' 411 | '(128,128), (160,160), (192,192), or (224, 224)).' 412 | ' Input shape provided = %s' % (input_shape,)) 413 | 414 | if K.image_data_format() != 'channels_last': 415 | warnings.warn('The MobileNet family of models is only available ' 416 | 'for the input data format "channels_last" ' 417 | '(width, height, channels). ' 418 | 'However your settings specify the default ' 419 | 'data format "channels_first" (channels, width, height).' 
420 | ' You should set `image_data_format="channels_last"` ' 421 | 'in your Keras config located at ~/.keras/keras.json. ' 422 | 'The model being returned right now will expect inputs ' 423 | 'to follow the "channels_last" data format.') 424 | K.set_image_data_format('channels_last') 425 | old_data_format = 'channels_first' 426 | else: 427 | old_data_format = None 428 | 429 | if input_tensor is None: 430 | img_input = Input(shape=input_shape) 431 | else: 432 | if not K.is_keras_tensor(input_tensor): 433 | img_input = Input(tensor=input_tensor, shape=input_shape) 434 | else: 435 | img_input = input_tensor 436 | 437 | x = _conv_block(img_input, 32, alpha, strides=(2, 2)) 438 | x = _depthwise_conv_block(x, 64, alpha, depth_multiplier, block_id=1) 439 | 440 | x = _depthwise_conv_block(x, 128, alpha, depth_multiplier, 441 | strides=(2, 2), block_id=2) 442 | x = _depthwise_conv_block(x, 128, alpha, depth_multiplier, block_id=3) 443 | 444 | x = _depthwise_conv_block(x, 256, alpha, depth_multiplier, 445 | strides=(2, 2), block_id=4) 446 | x = _depthwise_conv_block(x, 256, alpha, depth_multiplier, block_id=5) 447 | 448 | x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, 449 | strides=(2, 2), block_id=6) 450 | x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=7) 451 | x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=8) 452 | x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=9) 453 | x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=10) 454 | x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=11) 455 | 456 | x = _depthwise_conv_block(x, 1024, alpha, depth_multiplier, 457 | strides=(2, 2), block_id=12) 458 | x = _depthwise_conv_block(x, 1024, alpha, depth_multiplier, block_id=13) 459 | 460 | if include_top: 461 | if K.image_data_format() == 'channels_first': 462 | shape = (int(1024 * alpha), 1, 1) 463 | else: 464 | shape = (1, 1, int(1024 * alpha)) 465 | 466 | x = GlobalAveragePooling2D()(x) 467 | x = Reshape(shape, name='reshape_1')(x) 468 | x = Dropout(dropout, name='dropout')(x) 469 | x = Conv2D(classes, (1, 1), 470 | padding='same', name='conv_preds')(x) 471 | x = Activation('softmax', name='act_softmax')(x) 472 | x = Reshape((classes,), name='reshape_2')(x) 473 | else: 474 | if pooling == 'avg': 475 | x = GlobalAveragePooling2D()(x) 476 | elif pooling == 'max': 477 | x = GlobalMaxPooling2D()(x) 478 | 479 | # Ensure that the model takes into account 480 | # any potential predecessors of `input_tensor`. 481 | if input_tensor is not None: 482 | inputs = get_source_inputs(input_tensor) 483 | else: 484 | inputs = img_input 485 | 486 | # Create model. 
487 |     model = Model(inputs, x, name='mobilenet_%0.2f_%s' % (alpha, rows))
488 | 
489 |     # load weights
490 |     if weights == 'imagenet':
491 |         if K.image_data_format() == 'channels_first':
492 |             raise ValueError('Weights for the "channels_first" data format '
493 |                              'are not available.')
494 |         if alpha == 1.0:
495 |             alpha_text = '1_0'
496 |         elif alpha == 0.75:
497 |             alpha_text = '7_5'
498 |         elif alpha == 0.50:
499 |             alpha_text = '5_0'
500 |         else:
501 |             alpha_text = '2_5'
502 | 
503 |         if include_top:
504 |             model_name = 'mobilenet_%s_%d_tf.h5' % (alpha_text, rows)
505 |             weight_path = BASE_WEIGHT_PATH + model_name
506 |             weights_path = get_file(model_name,
507 |                                     weight_path,
508 |                                     cache_subdir='models')
509 |         else:
510 |             model_name = 'mobilenet_%s_%d_tf_no_top.h5' % (alpha_text, rows)
511 |             weight_path = BASE_WEIGHT_PATH + model_name
512 |             weights_path = get_file(model_name,
513 |                                     weight_path,
514 |                                     cache_subdir='models')
515 |         model.load_weights(weights_path)
516 | 
517 |     if old_data_format:
518 |         K.set_image_data_format(old_data_format)
519 |     return model
520 | 
521 | 
522 | def _conv_block(inputs, filters, alpha, kernel=(3, 3), strides=(1, 1)):
523 |     """Adds an initial convolution layer (with batch normalization and relu6).
524 | 
525 |     # Arguments
526 |         inputs: Input tensor of shape `(rows, cols, 3)`
527 |             (with `channels_last` data format) or
528 |             `(3, rows, cols)` (with `channels_first` data format).
529 |             It should have exactly 3 input channels,
530 |             and width and height should be no smaller than 32.
531 |             E.g. `(224, 224, 3)` would be one valid value.
532 |         filters: Integer, the dimensionality of the output space
533 |             (i.e. the number of output filters in the convolution).
534 |         alpha: controls the width of the network.
535 |             - If `alpha` < 1.0, proportionally decreases the number
536 |                 of filters in each layer.
537 |             - If `alpha` > 1.0, proportionally increases the number
538 |                 of filters in each layer.
539 |             - If `alpha` = 1, default number of filters from the paper
540 |                 are used at each layer.
541 |         kernel: An integer or tuple/list of 2 integers, specifying the
542 |             width and height of the 2D convolution window.
543 |             Can be a single integer to specify the same value for
544 |             all spatial dimensions.
545 |         strides: An integer or tuple/list of 2 integers,
546 |             specifying the strides of the convolution along the width and height.
547 |             Can be a single integer to specify the same value for
548 |             all spatial dimensions.
549 |             Specifying any stride value != 1 is incompatible with specifying
550 |             any `dilation_rate` value != 1.
551 | 
552 |     # Input shape
553 |         4D tensor with shape:
554 |         `(samples, channels, rows, cols)` if data_format='channels_first'
555 |         or 4D tensor with shape:
556 |         `(samples, rows, cols, channels)` if data_format='channels_last'.
557 | 
558 |     # Output shape
559 |         4D tensor with shape:
560 |         `(samples, filters, new_rows, new_cols)` if data_format='channels_first'
561 |         or 4D tensor with shape:
562 |         `(samples, new_rows, new_cols, filters)` if data_format='channels_last'.
563 |         `rows` and `cols` values might have changed due to stride.
564 | 
565 |     # Returns
566 |         Output tensor of block.
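
    # Example
        A minimal sketch of the filter scaling (values are illustrative):
        with `filters=32` and `alpha=0.25`, the block builds
        `Conv2D(int(32 * 0.25), (3, 3), ...)`, i.e. a convolution
        with 8 output filters.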
567 |     """
568 |     channel_axis = 1 if K.image_data_format() == 'channels_first' else -1
569 |     filters = int(filters * alpha)
570 |     x = Conv2D(filters, kernel,
571 |                padding='same',
572 |                use_bias=False,
573 |                strides=strides,
574 |                name='conv1')(inputs)
575 |     x = BatchNormalization(axis=channel_axis, name='conv1_bn')(x)
576 |     return Activation(relu6, name='conv1_relu')(x)
577 | 
578 | 
579 | def _depthwise_conv_block(inputs, pointwise_conv_filters, alpha,
580 |                           depth_multiplier=1, strides=(1, 1), block_id=1):
581 |     """Adds a depthwise convolution block.
582 | 
583 |     A depthwise convolution block consists of a depthwise conv,
584 |     batch normalization, relu6, pointwise convolution,
585 |     batch normalization and relu6 activation.
586 | 
587 |     # Arguments
588 |         inputs: Input tensor of shape `(rows, cols, channels)`
589 |             (with `channels_last` data format) or
590 |             `(channels, rows, cols)` (with `channels_first` data format).
591 |         pointwise_conv_filters: Integer, the dimensionality of the output space
592 |             (i.e. the number of output filters in the pointwise convolution).
593 |         alpha: controls the width of the network.
594 |             - If `alpha` < 1.0, proportionally decreases the number
595 |                 of filters in each layer.
596 |             - If `alpha` > 1.0, proportionally increases the number
597 |                 of filters in each layer.
598 |             - If `alpha` = 1, default number of filters from the paper
599 |                 are used at each layer.
600 |         depth_multiplier: The number of depthwise convolution output channels
601 |             for each input channel.
602 |             The total number of depthwise convolution output
603 |             channels will be equal to `filters_in * depth_multiplier`.
604 |         strides: An integer or tuple/list of 2 integers,
605 |             specifying the strides of the convolution along the width and height.
606 |             Can be a single integer to specify the same value for
607 |             all spatial dimensions.
608 |             Specifying any stride value != 1 is incompatible with specifying
609 |             any `dilation_rate` value != 1.
610 |         block_id: Integer, a unique identifier designating the block number.
611 | 
612 |     # Input shape
613 |         4D tensor with shape:
614 |         `(batch, channels, rows, cols)` if data_format='channels_first'
615 |         or 4D tensor with shape:
616 |         `(batch, rows, cols, channels)` if data_format='channels_last'.
617 | 
618 |     # Output shape
619 |         4D tensor with shape:
620 |         `(batch, filters, new_rows, new_cols)` if data_format='channels_first'
621 |         or 4D tensor with shape:
622 |         `(batch, new_rows, new_cols, filters)` if data_format='channels_last'.
623 |         `rows` and `cols` values might have changed due to stride.
624 | 
625 |     # Returns
626 |         Output tensor of block.
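
    # Example
        With `pointwise_conv_filters=128` and `alpha=0.5` (illustrative
        values), the pointwise convolution is built with
        `int(128 * 0.5) = 64` output filters, and `strides=(2, 2)`
        would additionally halve `rows` and `cols` (padding is 'same').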
627 | """ 628 | channel_axis = 1 if K.image_data_format() == 'channels_first' else -1 629 | pointwise_conv_filters = int(pointwise_conv_filters * alpha) 630 | 631 | x = DepthwiseConv2D((3, 3), 632 | padding='same', 633 | depth_multiplier=depth_multiplier, 634 | strides=strides, 635 | use_bias=False, 636 | name='conv_dw_%d' % block_id)(inputs) 637 | x = BatchNormalization(axis=channel_axis, name='conv_dw_%d_bn' % block_id)(x) 638 | x = Activation(relu6, name='conv_dw_%d_relu' % block_id)(x) 639 | 640 | x = Conv2D(pointwise_conv_filters, (1, 1), 641 | padding='same', 642 | use_bias=False, 643 | strides=(1, 1), 644 | name='conv_pw_%d' % block_id)(x) 645 | x = BatchNormalization(axis=channel_axis, name='conv_pw_%d_bn' % block_id)(x) 646 | return Activation(relu6, name='conv_pw_%d_relu' % block_id)(x) 647 | 648 | 649 | if __name__ == '__main__': 650 | for r in [128, 160, 192, 224]: 651 | for a in [0.25, 0.50, 0.75, 1.0]: 652 | if r == 224: 653 | model = MobileNet(include_top=True, weights='imagenet', 654 | input_shape=(r, r, 3), alpha=a) 655 | 656 | img_path = 'elephant.jpg' 657 | img = image.load_img(img_path, target_size=(r, r)) 658 | x = image.img_to_array(img) 659 | x = np.expand_dims(x, axis=0) 660 | x = preprocess_input(x) 661 | print('Input image shape:', x.shape) 662 | 663 | preds = model.predict(x) 664 | print(np.argmax(preds)) 665 | print('Predicted:', decode_predictions(preds, 1)) 666 | 667 | model = MobileNet(include_top=False, weights='imagenet') 668 | -------------------------------------------------------------------------------- /music_tagger_crnn.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | '''MusicTaggerCRNN model for Keras. 3 | 4 | Code by github.com/keunwoochoi. 5 | 6 | # Reference: 7 | 8 | - [Music-auto_tagging-keras](https://github.com/keunwoochoi/music-auto_tagging-keras) 9 | 10 | ''' 11 | from __future__ import print_function 12 | from __future__ import absolute_import 13 | 14 | import numpy as np 15 | from keras import backend as K 16 | from keras.layers import Input, Dense 17 | from keras.models import Model 18 | from keras.layers import Dense, Dropout, Reshape, Permute 19 | from keras.layers.convolutional import Convolution2D 20 | from keras.layers.convolutional import MaxPooling2D, ZeroPadding2D 21 | from keras.layers.normalization import BatchNormalization 22 | from keras.layers.advanced_activations import ELU 23 | from keras.layers.recurrent import GRU 24 | from keras.utils.data_utils import get_file 25 | from keras.utils.layer_utils import convert_all_kernels_in_model 26 | from audio_conv_utils import decode_predictions, preprocess_input 27 | 28 | TH_WEIGHTS_PATH = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.3/music_tagger_crnn_weights_tf_kernels_th_dim_ordering.h5' 29 | TF_WEIGHTS_PATH = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.3/music_tagger_crnn_weights_tf_kernels_tf_dim_ordering.h5' 30 | 31 | 32 | def MusicTaggerCRNN(weights='msd', input_tensor=None, 33 | include_top=True): 34 | '''Instantiate the MusicTaggerCRNN architecture, 35 | optionally loading weights pre-trained 36 | on Million Song Dataset. Note that when using TensorFlow, 37 | for best performance you should set 38 | `image_dim_ordering="tf"` in your Keras config 39 | at ~/.keras/keras.json. 40 | 41 | The model and the weights are compatible with both 42 | TensorFlow and Theano. 
The dimension ordering
43 |     convention used by the model is the one
44 |     specified in your Keras config file.
45 | 
46 |     For preparing mel-spectrogram input, see
47 |     `audio_conv_utils.py` in [applications](https://github.com/fchollet/keras/tree/master/keras/applications).
48 |     You will need to install [Librosa](http://librosa.github.io/librosa/)
49 |     to use it.
50 | 
51 |     # Arguments
52 |         weights: one of `None` (random initialization)
53 |             or "msd" (pre-training on the Million Song Dataset).
54 |         input_tensor: optional Keras tensor (i.e. output of `layers.Input()`)
55 |             to use as input for the model.
56 |         include_top: whether to include the single fully-connected
57 |             layer (output layer) at the top of the network.
58 |             If False, the network outputs 32-dim features.
59 | 
60 | 
61 |     # Returns
62 |         A Keras model instance.
63 |     '''
64 |     if weights not in {'msd', None}:
65 |         raise ValueError('The `weights` argument should be either '
66 |                          '`None` (random initialization) or `msd` '
67 |                          '(pre-training on Million Song Dataset).')
68 | 
69 |     # Determine proper input shape
70 |     if K.image_dim_ordering() == 'th':
71 |         input_shape = (1, 96, 1366)
72 |     else:
73 |         input_shape = (96, 1366, 1)
74 | 
75 |     if input_tensor is None:
76 |         melgram_input = Input(shape=input_shape)
77 |     else:
78 |         if not K.is_keras_tensor(input_tensor):
79 |             melgram_input = Input(tensor=input_tensor, shape=input_shape)
80 |         else:
81 |             melgram_input = input_tensor
82 | 
83 |     # Determine input axis
84 |     if K.image_dim_ordering() == 'th':
85 |         channel_axis = 1
86 |         freq_axis = 2
87 |         time_axis = 3
88 |     else:
89 |         channel_axis = 3
90 |         freq_axis = 1
91 |         time_axis = 2
92 | 
93 |     # Input block
94 |     x = ZeroPadding2D(padding=(0, 37))(melgram_input)
95 |     x = BatchNormalization(axis=time_axis, name='bn_0_freq')(x)
96 | 
97 |     # Conv block 1
98 |     x = Convolution2D(64, 3, 3, border_mode='same', name='conv1')(x)
99 |     x = BatchNormalization(axis=channel_axis, mode=0, name='bn1')(x)
100 |     x = ELU()(x)
101 |     x = MaxPooling2D(pool_size=(2, 2), strides=(2, 2), name='pool1')(x)
102 | 
103 |     # Conv block 2
104 |     x = Convolution2D(128, 3, 3, border_mode='same', name='conv2')(x)
105 |     x = BatchNormalization(axis=channel_axis, mode=0, name='bn2')(x)
106 |     x = ELU()(x)
107 |     x = MaxPooling2D(pool_size=(3, 3), strides=(3, 3), name='pool2')(x)
108 | 
109 |     # Conv block 3
110 |     x = Convolution2D(128, 3, 3, border_mode='same', name='conv3')(x)
111 |     x = BatchNormalization(axis=channel_axis, mode=0, name='bn3')(x)
112 |     x = ELU()(x)
113 |     x = MaxPooling2D(pool_size=(4, 4), strides=(4, 4), name='pool3')(x)
114 | 
115 |     # Conv block 4
116 |     x = Convolution2D(128, 3, 3, border_mode='same', name='conv4')(x)
117 |     x = BatchNormalization(axis=channel_axis, mode=0, name='bn4')(x)
118 |     x = ELU()(x)
119 |     x = MaxPooling2D(pool_size=(4, 4), strides=(4, 4), name='pool4')(x)
120 | 
121 |     # reshaping
122 |     if K.image_dim_ordering() == 'th':
123 |         x = Permute((3, 1, 2))(x)
124 |     x = Reshape((15, 128))(x)
125 | 
126 |     # GRU block 1, 2, output
127 |     x = GRU(32, return_sequences=True, name='gru1')(x)
128 |     x = GRU(32, return_sequences=False, name='gru2')(x)
129 | 
130 |     if include_top:
131 |         x = Dense(50, activation='sigmoid', name='output')(x)
132 | 
133 |     # Create model
134 |     model = Model(melgram_input, x)
135 |     if weights is None:
136 |         return model
137 |     else:
138 |         # Load weights
139 |         if K.image_dim_ordering() == 'tf':
140 |             weights_path = get_file('music_tagger_crnn_weights_tf_kernels_tf_dim_ordering.h5',
141 |                                     TF_WEIGHTS_PATH,
142 |                                     cache_subdir='models')
143 |         else:
144 |             weights_path = get_file('music_tagger_crnn_weights_tf_kernels_th_dim_ordering.h5',
145 |                                     TH_WEIGHTS_PATH,
146 |                                     cache_subdir='models')
147 |         model.load_weights(weights_path, by_name=True)
148 |         if K.backend() == 'theano':
149 |             convert_all_kernels_in_model(model)
150 |         return model
151 | 
152 | 
153 | if __name__ == '__main__':
154 |     model = MusicTaggerCRNN(weights='msd')
155 | 
156 |     audio_path = 'audio_file.mp3'
157 |     melgram = preprocess_input(audio_path)
158 |     melgrams = np.expand_dims(melgram, axis=0)
159 | 
160 |     preds = model.predict(melgrams)
161 |     print('Predicted:')
162 |     print(decode_predictions(preds))
163 | 
--------------------------------------------------------------------------------
/resnet50.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | '''ResNet50 model for Keras.
3 | 
4 | # Reference:
5 | 
6 | - [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)
7 | 
8 | Adapted from code contributed by BigMoyan.
9 | '''
10 | from __future__ import print_function
11 | 
12 | import numpy as np
13 | import warnings
14 | 
15 | from keras.layers import Input
16 | from keras import layers
17 | from keras.layers import Dense
18 | from keras.layers import Activation
19 | from keras.layers import Flatten
20 | from keras.layers import Conv2D
21 | from keras.layers import MaxPooling2D
22 | from keras.layers import GlobalMaxPooling2D
23 | from keras.layers import ZeroPadding2D
24 | from keras.layers import AveragePooling2D
25 | from keras.layers import GlobalAveragePooling2D
26 | from keras.layers import BatchNormalization
27 | from keras.models import Model
28 | from keras.preprocessing import image
29 | import keras.backend as K
30 | from keras.utils import layer_utils
31 | from keras.utils.data_utils import get_file
32 | from keras.applications.imagenet_utils import decode_predictions
33 | from keras.applications.imagenet_utils import preprocess_input
34 | from keras.applications.imagenet_utils import _obtain_input_shape
35 | from keras.engine.topology import get_source_inputs
36 | 
37 | 
38 | WEIGHTS_PATH = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.2/resnet50_weights_tf_dim_ordering_tf_kernels.h5'
39 | WEIGHTS_PATH_NO_TOP = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.2/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5'
40 | 
41 | 
42 | def identity_block(input_tensor, kernel_size, filters, stage, block):
43 |     """The identity block is the block that has no conv layer at shortcut.
44 | 
45 |     # Arguments
46 |         input_tensor: input tensor
47 |         kernel_size: default 3, the kernel size of middle conv layer at main path
48 |         filters: list of integers, the filters of 3 conv layer at main path
49 |         stage: integer, current stage label, used for generating layer names
50 |         block: 'a','b'..., current block label, used for generating layer names
51 | 
52 |     # Returns
53 |         Output tensor for the block.
54 |     """
55 |     filters1, filters2, filters3 = filters
56 |     if K.image_data_format() == 'channels_last':
57 |         bn_axis = 3
58 |     else:
59 |         bn_axis = 1
60 |     conv_name_base = 'res' + str(stage) + block + '_branch'
61 |     bn_name_base = 'bn' + str(stage) + block + '_branch'
62 | 
63 |     x = Conv2D(filters1, (1, 1), name=conv_name_base + '2a')(input_tensor)
64 |     x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2a')(x)
65 |     x = Activation('relu')(x)
66 | 
67 |     x = Conv2D(filters2, kernel_size,
68 |                padding='same', name=conv_name_base + '2b')(x)
69 |     x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2b')(x)
70 |     x = Activation('relu')(x)
71 | 
72 |     x = Conv2D(filters3, (1, 1), name=conv_name_base + '2c')(x)
73 |     x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2c')(x)
74 | 
75 |     x = layers.add([x, input_tensor])
76 |     x = Activation('relu')(x)
77 |     return x
78 | 
79 | 
80 | def conv_block(input_tensor, kernel_size, filters, stage, block, strides=(2, 2)):
81 |     """conv_block is the block that has a conv layer at shortcut.
82 | 
83 |     # Arguments
84 |         input_tensor: input tensor
85 |         kernel_size: default 3, the kernel size of middle conv layer at main path
86 |         filters: list of integers, the filters of 3 conv layer at main path
87 |         stage: integer, current stage label, used for generating layer names
88 |         block: 'a','b'..., current block label, used for generating layer names
89 | 
90 |     # Returns
91 |         Output tensor for the block.
92 | 
93 |     Note that from stage 3, the first conv layer at main path is with strides=(2, 2),
94 |     and the shortcut should have strides=(2, 2) as well.
95 |     """
96 |     filters1, filters2, filters3 = filters
97 |     if K.image_data_format() == 'channels_last':
98 |         bn_axis = 3
99 |     else:
100 |         bn_axis = 1
101 |     conv_name_base = 'res' + str(stage) + block + '_branch'
102 |     bn_name_base = 'bn' + str(stage) + block + '_branch'
103 | 
104 |     x = Conv2D(filters1, (1, 1), strides=strides,
105 |                name=conv_name_base + '2a')(input_tensor)
106 |     x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2a')(x)
107 |     x = Activation('relu')(x)
108 | 
109 |     x = Conv2D(filters2, kernel_size, padding='same',
110 |                name=conv_name_base + '2b')(x)
111 |     x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2b')(x)
112 |     x = Activation('relu')(x)
113 | 
114 |     x = Conv2D(filters3, (1, 1), name=conv_name_base + '2c')(x)
115 |     x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2c')(x)
116 | 
117 |     shortcut = Conv2D(filters3, (1, 1), strides=strides,
118 |                       name=conv_name_base + '1')(input_tensor)
119 |     shortcut = BatchNormalization(axis=bn_axis, name=bn_name_base + '1')(shortcut)
120 | 
121 |     x = layers.add([x, shortcut])
122 |     x = Activation('relu')(x)
123 |     return x
124 | 
125 | 
126 | def ResNet50(include_top=True, weights='imagenet',
127 |              input_tensor=None, input_shape=None,
128 |              pooling=None,
129 |              classes=1000):
130 |     """Instantiates the ResNet50 architecture.
131 | 
132 |     Optionally loads weights pre-trained
133 |     on ImageNet. Note that when using TensorFlow,
134 |     for best performance you should set
135 |     `image_data_format="channels_last"` in your Keras config
136 |     at ~/.keras/keras.json.
137 | 
138 |     The model and the weights are compatible with both
139 |     TensorFlow and Theano. The data format
140 |     convention used by the model is the one
141 |     specified in your Keras config file.
142 | 
143 |     # Arguments
144 |         include_top: whether to include the fully-connected
145 |             layer at the top of the network.
146 |         weights: one of `None` (random initialization)
147 |             or "imagenet" (pre-training on ImageNet).
148 |         input_tensor: optional Keras tensor (i.e. output of `layers.Input()`)
149 |             to use as image input for the model.
150 |         input_shape: optional shape tuple, only to be specified
151 |             if `include_top` is False; otherwise the input shape
152 |             has to be `(224, 224, 3)` (with `channels_last` data format)
153 |             or `(3, 224, 224)` (with `channels_first` data format).
154 |             It should have exactly 3 input channels,
155 |             and width and height should be no smaller than 197.
156 |             E.g. `(200, 200, 3)` would be one valid value.
157 |         pooling: Optional pooling mode for feature extraction
158 |             when `include_top` is `False`.
159 |             - `None` means that the output of the model will be
160 |                 the 4D tensor output of the
161 |                 last convolutional layer.
162 |             - `avg` means that global average pooling
163 |                 will be applied to the output of the
164 |                 last convolutional layer, and thus
165 |                 the output of the model will be a 2D tensor.
166 |             - `max` means that global max pooling will
167 |                 be applied.
168 |         classes: optional number of classes to classify images
169 |             into, only to be specified if `include_top` is True, and
170 |             if no `weights` argument is specified.
171 | 
172 |     # Returns
173 |         A Keras model instance.
174 | 
175 |     # Raises
176 |         ValueError: in case of invalid argument for `weights`,
177 |             or invalid input shape.
178 |     """
179 |     if weights not in {'imagenet', None}:
180 |         raise ValueError('The `weights` argument should be either '
181 |                          '`None` (random initialization) or `imagenet` '
182 |                          '(pre-training on ImageNet).')
183 | 
184 |     if weights == 'imagenet' and include_top and classes != 1000:
185 |         raise ValueError('If using `weights` as imagenet with `include_top`'
186 |                          ' as true, `classes` should be 1000.')
187 | 
188 |     # Determine proper input shape
189 |     input_shape = _obtain_input_shape(input_shape,
190 |                                       default_size=224,
191 |                                       min_size=197,
192 |                                       data_format=K.image_data_format(),
193 |                                       include_top=include_top)
194 | 
195 |     if input_tensor is None:
196 |         img_input = Input(shape=input_shape)
197 |     else:
198 |         if not K.is_keras_tensor(input_tensor):
199 |             img_input = Input(tensor=input_tensor, shape=input_shape)
200 |         else:
201 |             img_input = input_tensor
202 |     if K.image_data_format() == 'channels_last':
203 |         bn_axis = 3
204 |     else:
205 |         bn_axis = 1
206 | 
207 |     x = ZeroPadding2D((3, 3))(img_input)
208 |     x = Conv2D(64, (7, 7), strides=(2, 2), name='conv1')(x)
209 |     x = BatchNormalization(axis=bn_axis, name='bn_conv1')(x)
210 |     x = Activation('relu')(x)
211 |     x = MaxPooling2D((3, 3), strides=(2, 2))(x)
212 | 
213 |     x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))
214 |     x = identity_block(x, 3, [64, 64, 256], stage=2, block='b')
215 |     x = identity_block(x, 3, [64, 64, 256], stage=2, block='c')
216 | 
217 |     x = conv_block(x, 3, [128, 128, 512], stage=3, block='a')
218 |     x = identity_block(x, 3, [128, 128, 512], stage=3, block='b')
219 |     x = identity_block(x, 3, [128, 128, 512], stage=3, block='c')
220 |     x = identity_block(x, 3, [128, 128, 512], stage=3, block='d')
221 | 
222 |     x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a')
223 |     x = identity_block(x, 3, [256, 256, 1024], stage=4, block='b')
224 |     x = identity_block(x, 3, [256, 256, 1024], stage=4, block='c')
225 |     x = identity_block(x, 3, [256, 256, 1024], stage=4, block='d')
226 |     x = identity_block(x, 3, [256, 256, 1024], stage=4, block='e')
227 |     x = identity_block(x, 3, [256, 256, 1024], stage=4, block='f')
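    # With the default 224x224 input, the spatial size is halved at conv1,
    # at the first max pool, and at each stage's opening conv_block:
    # stage 4 above ends at 14x14, and stage 5 below halves it to 7x7,
    # matching the (7, 7) window of the 'avg_pool' layer.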
228 | 229 | x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a') 230 | x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b') 231 | x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c') 232 | 233 | x = AveragePooling2D((7, 7), name='avg_pool')(x) 234 | 235 | if include_top: 236 | x = Flatten()(x) 237 | x = Dense(classes, activation='softmax', name='fc1000')(x) 238 | else: 239 | if pooling == 'avg': 240 | x = GlobalAveragePooling2D()(x) 241 | elif pooling == 'max': 242 | x = GlobalMaxPooling2D()(x) 243 | 244 | # Ensure that the model takes into account 245 | # any potential predecessors of `input_tensor`. 246 | if input_tensor is not None: 247 | inputs = get_source_inputs(input_tensor) 248 | else: 249 | inputs = img_input 250 | # Create model. 251 | model = Model(inputs, x, name='resnet50') 252 | 253 | # load weights 254 | if weights == 'imagenet': 255 | if include_top: 256 | weights_path = get_file('resnet50_weights_tf_dim_ordering_tf_kernels.h5', 257 | WEIGHTS_PATH, 258 | cache_subdir='models', 259 | md5_hash='a7b3fe01876f51b976af0dea6bc144eb') 260 | else: 261 | weights_path = get_file('resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5', 262 | WEIGHTS_PATH_NO_TOP, 263 | cache_subdir='models', 264 | md5_hash='a268eb855778b3df3c7506639542a6af') 265 | model.load_weights(weights_path) 266 | if K.backend() == 'theano': 267 | layer_utils.convert_all_kernels_in_model(model) 268 | 269 | if K.image_data_format() == 'channels_first': 270 | if include_top: 271 | maxpool = model.get_layer(name='avg_pool') 272 | shape = maxpool.output_shape[1:] 273 | dense = model.get_layer(name='fc1000') 274 | layer_utils.convert_dense_weights_data_format(dense, shape, 'channels_first') 275 | 276 | if K.backend() == 'tensorflow': 277 | warnings.warn('You are using the TensorFlow backend, yet you ' 278 | 'are using the Theano ' 279 | 'image data format convention ' 280 | '(`image_data_format="channels_first"`). ' 281 | 'For best performance, set ' 282 | '`image_data_format="channels_last"` in ' 283 | 'your Keras config ' 284 | 'at ~/.keras/keras.json.') 285 | return model 286 | 287 | 288 | if __name__ == '__main__': 289 | model = ResNet50(include_top=True, weights='imagenet') 290 | 291 | img_path = 'elephant.jpg' 292 | img = image.load_img(img_path, target_size=(224, 224)) 293 | x = image.img_to_array(img) 294 | x = np.expand_dims(x, axis=0) 295 | x = preprocess_input(x) 296 | print('Input image shape:', x.shape) 297 | 298 | preds = model.predict(x) 299 | print('Predicted:', decode_predictions(preds)) 300 | -------------------------------------------------------------------------------- /vgg16.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | '''VGG16 model for Keras. 
3 | 
4 | # Reference:
5 | 
6 | - [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556)
7 | 
8 | '''
9 | from __future__ import print_function
10 | 
11 | import numpy as np
12 | import warnings
13 | 
14 | from keras.models import Model
15 | from keras.layers import Flatten
16 | from keras.layers import Dense
17 | from keras.layers import Input
18 | from keras.layers import Conv2D
19 | from keras.layers import MaxPooling2D
20 | from keras.layers import GlobalMaxPooling2D
21 | from keras.layers import GlobalAveragePooling2D
22 | from keras.preprocessing import image
23 | from keras.utils import layer_utils
24 | from keras.utils.data_utils import get_file
25 | from keras import backend as K
26 | from keras.applications.imagenet_utils import decode_predictions
27 | from keras.applications.imagenet_utils import preprocess_input
28 | from keras.applications.imagenet_utils import _obtain_input_shape
29 | from keras.engine.topology import get_source_inputs
30 | 
31 | 
32 | WEIGHTS_PATH = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5'
33 | WEIGHTS_PATH_NO_TOP = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5'
34 | 
35 | 
36 | def VGG16(include_top=True, weights='imagenet',
37 |           input_tensor=None, input_shape=None,
38 |           pooling=None,
39 |           classes=1000):
40 |     """Instantiates the VGG16 architecture.
41 | 
42 |     Optionally loads weights pre-trained
43 |     on ImageNet. Note that when using TensorFlow,
44 |     for best performance you should set
45 |     `image_data_format="channels_last"` in your Keras config
46 |     at ~/.keras/keras.json.
47 | 
48 |     The model and the weights are compatible with both
49 |     TensorFlow and Theano. The data format
50 |     convention used by the model is the one
51 |     specified in your Keras config file.
52 | 
53 |     # Arguments
54 |         include_top: whether to include the 3 fully-connected
55 |             layers at the top of the network.
56 |         weights: one of `None` (random initialization)
57 |             or "imagenet" (pre-training on ImageNet).
58 |         input_tensor: optional Keras tensor (i.e. output of `layers.Input()`)
59 |             to use as image input for the model.
60 |         input_shape: optional shape tuple, only to be specified
61 |             if `include_top` is False; otherwise the input shape
62 |             has to be `(224, 224, 3)` (with `channels_last` data format)
63 |             or `(3, 224, 224)` (with `channels_first` data format).
64 |             It should have exactly 3 input channels,
65 |             and width and height should be no smaller than 48.
66 |             E.g. `(200, 200, 3)` would be one valid value.
67 |         pooling: Optional pooling mode for feature extraction
68 |             when `include_top` is `False`.
69 |             - `None` means that the output of the model will be
70 |                 the 4D tensor output of the
71 |                 last convolutional layer.
72 |             - `avg` means that global average pooling
73 |                 will be applied to the output of the
74 |                 last convolutional layer, and thus
75 |                 the output of the model will be a 2D tensor.
76 |             - `max` means that global max pooling will
77 |                 be applied.
78 |         classes: optional number of classes to classify images
79 |             into, only to be specified if `include_top` is True, and
80 |             if no `weights` argument is specified.
81 | 
82 |     # Returns
83 |         A Keras model instance.
84 | 
85 |     # Raises
86 |         ValueError: in case of invalid argument for `weights`,
87 |             or invalid input shape.
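
    # Example
        A minimal sketch of a custom `input_shape` (the size is
        illustrative; anything no smaller than 48x48 works without the top):
            model = VGG16(include_top=False, weights='imagenet',
                          input_shape=(200, 200, 3))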
88 | """ 89 | if weights not in {'imagenet', None}: 90 | raise ValueError('The `weights` argument should be either ' 91 | '`None` (random initialization) or `imagenet` ' 92 | '(pre-training on ImageNet).') 93 | 94 | if weights == 'imagenet' and include_top and classes != 1000: 95 | raise ValueError('If using `weights` as imagenet with `include_top`' 96 | ' as true, `classes` should be 1000') 97 | # Determine proper input shape 98 | input_shape = _obtain_input_shape(input_shape, 99 | default_size=224, 100 | min_size=48, 101 | data_format=K.image_data_format(), 102 | include_top=include_top) 103 | 104 | if input_tensor is None: 105 | img_input = Input(shape=input_shape) 106 | else: 107 | if not K.is_keras_tensor(input_tensor): 108 | img_input = Input(tensor=input_tensor, shape=input_shape) 109 | else: 110 | img_input = input_tensor 111 | # Block 1 112 | x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(img_input) 113 | x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x) 114 | x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x) 115 | 116 | # Block 2 117 | x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1')(x) 118 | x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2')(x) 119 | x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x) 120 | 121 | # Block 3 122 | x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1')(x) 123 | x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2')(x) 124 | x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3')(x) 125 | x = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x) 126 | 127 | # Block 4 128 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1')(x) 129 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2')(x) 130 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3')(x) 131 | x = MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x) 132 | 133 | # Block 5 134 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1')(x) 135 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2')(x) 136 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3')(x) 137 | x = MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x) 138 | 139 | if include_top: 140 | # Classification block 141 | x = Flatten(name='flatten')(x) 142 | x = Dense(4096, activation='relu', name='fc1')(x) 143 | x = Dense(4096, activation='relu', name='fc2')(x) 144 | x = Dense(classes, activation='softmax', name='predictions')(x) 145 | else: 146 | if pooling == 'avg': 147 | x = GlobalAveragePooling2D()(x) 148 | elif pooling == 'max': 149 | x = GlobalMaxPooling2D()(x) 150 | 151 | # Ensure that the model takes into account 152 | # any potential predecessors of `input_tensor`. 153 | if input_tensor is not None: 154 | inputs = get_source_inputs(input_tensor) 155 | else: 156 | inputs = img_input 157 | # Create model. 
158 | model = Model(inputs, x, name='vgg16') 159 | 160 | # load weights 161 | if weights == 'imagenet': 162 | if include_top: 163 | weights_path = get_file('vgg16_weights_tf_dim_ordering_tf_kernels.h5', 164 | WEIGHTS_PATH, 165 | cache_subdir='models') 166 | else: 167 | weights_path = get_file('vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5', 168 | WEIGHTS_PATH_NO_TOP, 169 | cache_subdir='models') 170 | model.load_weights(weights_path) 171 | if K.backend() == 'theano': 172 | layer_utils.convert_all_kernels_in_model(model) 173 | 174 | if K.image_data_format() == 'channels_first': 175 | if include_top: 176 | maxpool = model.get_layer(name='block5_pool') 177 | shape = maxpool.output_shape[1:] 178 | dense = model.get_layer(name='fc1') 179 | layer_utils.convert_dense_weights_data_format(dense, shape, 'channels_first') 180 | 181 | if K.backend() == 'tensorflow': 182 | warnings.warn('You are using the TensorFlow backend, yet you ' 183 | 'are using the Theano ' 184 | 'image data format convention ' 185 | '(`image_data_format="channels_first"`). ' 186 | 'For best performance, set ' 187 | '`image_data_format="channels_last"` in ' 188 | 'your Keras config ' 189 | 'at ~/.keras/keras.json.') 190 | return model 191 | 192 | 193 | if __name__ == '__main__': 194 | model = VGG16(include_top=True, weights='imagenet') 195 | 196 | img_path = 'elephant.jpg' 197 | img = image.load_img(img_path, target_size=(224, 224)) 198 | x = image.img_to_array(img) 199 | x = np.expand_dims(x, axis=0) 200 | x = preprocess_input(x) 201 | print('Input image shape:', x.shape) 202 | 203 | preds = model.predict(x) 204 | print('Predicted:', decode_predictions(preds)) 205 | -------------------------------------------------------------------------------- /vgg19.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | '''VGG19 model for Keras. 3 | 4 | # Reference: 5 | 6 | - [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556) 7 | 8 | ''' 9 | from __future__ import print_function 10 | 11 | import numpy as np 12 | import warnings 13 | 14 | from keras.models import Model 15 | from keras.layers import Flatten, Dense, Input 16 | from keras.layers import Conv2D 17 | from keras.layers import MaxPooling2D 18 | from keras.layers import GlobalMaxPooling2D 19 | from keras.layers import GlobalAveragePooling2D 20 | from keras.preprocessing import image 21 | from keras.utils import layer_utils 22 | from keras.utils.data_utils import get_file 23 | from keras import backend as K 24 | from keras.applications.imagenet_utils import decode_predictions 25 | from keras.applications.imagenet_utils import preprocess_input 26 | from keras.applications.imagenet_utils import _obtain_input_shape 27 | from keras.engine.topology import get_source_inputs 28 | 29 | 30 | WEIGHTS_PATH = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg19_weights_tf_dim_ordering_tf_kernels.h5' 31 | WEIGHTS_PATH_NO_TOP = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5' 32 | 33 | 34 | def VGG19(include_top=True, weights='imagenet', 35 | input_tensor=None, input_shape=None, 36 | pooling=None, 37 | classes=1000): 38 | """Instantiates the VGG19 architecture. 39 | 40 | Optionally loads weights pre-trained 41 | on ImageNet. 
Note that when using TensorFlow,
42 |     for best performance you should set
43 |     `image_data_format="channels_last"` in your Keras config
44 |     at ~/.keras/keras.json.
45 | 
46 |     The model and the weights are compatible with both
47 |     TensorFlow and Theano. The data format
48 |     convention used by the model is the one
49 |     specified in your Keras config file.
50 | 
51 |     # Arguments
52 |         include_top: whether to include the 3 fully-connected
53 |             layers at the top of the network.
54 |         weights: one of `None` (random initialization)
55 |             or "imagenet" (pre-training on ImageNet).
56 |         input_tensor: optional Keras tensor (i.e. output of `layers.Input()`)
57 |             to use as image input for the model.
58 |         input_shape: optional shape tuple, only to be specified
59 |             if `include_top` is False; otherwise the input shape
60 |             has to be `(224, 224, 3)` (with `channels_last` data format)
61 |             or `(3, 224, 224)` (with `channels_first` data format).
62 |             It should have exactly 3 input channels,
63 |             and width and height should be no smaller than 48.
64 |             E.g. `(200, 200, 3)` would be one valid value.
65 |         pooling: Optional pooling mode for feature extraction
66 |             when `include_top` is `False`.
67 |             - `None` means that the output of the model will be
68 |                 the 4D tensor output of the
69 |                 last convolutional layer.
70 |             - `avg` means that global average pooling
71 |                 will be applied to the output of the
72 |                 last convolutional layer, and thus
73 |                 the output of the model will be a 2D tensor.
74 |             - `max` means that global max pooling will
75 |                 be applied.
76 |         classes: optional number of classes to classify images
77 |             into, only to be specified if `include_top` is True, and
78 |             if no `weights` argument is specified.
79 | 
80 |     # Returns
81 |         A Keras model instance.
82 | 
83 |     # Raises
84 |         ValueError: in case of invalid argument for `weights`,
85 |             or invalid input shape.
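
    # Example
        A minimal training-from-scratch sketch (the class count is
        illustrative); `classes` may differ from 1000 only when
        `weights=None`:
            model = VGG19(include_top=True, weights=None, classes=10)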
86 | """ 87 | if weights not in {'imagenet', None}: 88 | raise ValueError('The `weights` argument should be either ' 89 | '`None` (random initialization) or `imagenet` ' 90 | '(pre-training on ImageNet).') 91 | 92 | if weights == 'imagenet' and include_top and classes != 1000: 93 | raise ValueError('If using `weights` as imagenet with `include_top`' 94 | ' as true, `classes` should be 1000') 95 | # Determine proper input shape 96 | input_shape = _obtain_input_shape(input_shape, 97 | default_size=224, 98 | min_size=48, 99 | data_format=K.image_data_format(), 100 | include_top=include_top) 101 | 102 | if input_tensor is None: 103 | img_input = Input(shape=input_shape) 104 | else: 105 | if not K.is_keras_tensor(input_tensor): 106 | img_input = Input(tensor=input_tensor, shape=input_shape) 107 | else: 108 | img_input = input_tensor 109 | # Block 1 110 | x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(img_input) 111 | x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x) 112 | x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x) 113 | 114 | # Block 2 115 | x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1')(x) 116 | x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2')(x) 117 | x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x) 118 | 119 | # Block 3 120 | x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1')(x) 121 | x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2')(x) 122 | x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3')(x) 123 | x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv4')(x) 124 | x = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x) 125 | 126 | # Block 4 127 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1')(x) 128 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2')(x) 129 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3')(x) 130 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv4')(x) 131 | x = MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x) 132 | 133 | # Block 5 134 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1')(x) 135 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2')(x) 136 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3')(x) 137 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv4')(x) 138 | x = MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x) 139 | 140 | if include_top: 141 | # Classification block 142 | x = Flatten(name='flatten')(x) 143 | x = Dense(4096, activation='relu', name='fc1')(x) 144 | x = Dense(4096, activation='relu', name='fc2')(x) 145 | x = Dense(classes, activation='softmax', name='predictions')(x) 146 | else: 147 | if pooling == 'avg': 148 | x = GlobalAveragePooling2D()(x) 149 | elif pooling == 'max': 150 | x = GlobalMaxPooling2D()(x) 151 | 152 | # Ensure that the model takes into account 153 | # any potential predecessors of `input_tensor`. 154 | if input_tensor is not None: 155 | inputs = get_source_inputs(input_tensor) 156 | else: 157 | inputs = img_input 158 | # Create model. 
159 |     model = Model(inputs, x, name='vgg19')
160 | 
161 |     # load weights
162 |     if weights == 'imagenet':
163 |         if include_top:
164 |             weights_path = get_file('vgg19_weights_tf_dim_ordering_tf_kernels.h5',
165 |                                     WEIGHTS_PATH,
166 |                                     cache_subdir='models')
167 |         else:
168 |             weights_path = get_file('vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5',
169 |                                     WEIGHTS_PATH_NO_TOP,
170 |                                     cache_subdir='models')
171 |         model.load_weights(weights_path)
172 |         if K.backend() == 'theano':
173 |             layer_utils.convert_all_kernels_in_model(model)
174 | 
175 |         if K.image_data_format() == 'channels_first':
176 |             if include_top:
177 |                 maxpool = model.get_layer(name='block5_pool')
178 |                 shape = maxpool.output_shape[1:]
179 |                 dense = model.get_layer(name='fc1')
180 |                 layer_utils.convert_dense_weights_data_format(dense, shape, 'channels_first')
181 | 
182 |             if K.backend() == 'tensorflow':
183 |                 warnings.warn('You are using the TensorFlow backend, yet you '
184 |                               'are using the Theano '
185 |                               'image data format convention '
186 |                               '(`image_data_format="channels_first"`). '
187 |                               'For best performance, set '
188 |                               '`image_data_format="channels_last"` in '
189 |                               'your Keras config '
190 |                               'at ~/.keras/keras.json.')
191 |     return model
192 | 
193 | 
194 | if __name__ == '__main__':
195 |     model = VGG19(include_top=True, weights='imagenet')
196 | 
197 |     img_path = 'cat.jpg'
198 |     img = image.load_img(img_path, target_size=(224, 224))
199 |     x = image.img_to_array(img)
200 |     x = np.expand_dims(x, axis=0)
201 |     x = preprocess_input(x)
202 |     print('Input image shape:', x.shape)
203 | 
204 |     preds = model.predict(x)
205 |     print('Predicted:', decode_predictions(preds))
206 | 
--------------------------------------------------------------------------------
/xception.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | '''Xception V1 model for Keras.
3 | 
4 | On ImageNet, this model gets to a top-1 validation accuracy of 0.790,
5 | and a top-5 validation accuracy of 0.945.
6 | 
7 | Do note that the input image format for this model is different from
8 | that of the VGG16 and ResNet models (299x299 instead of 224x224),
9 | and that the input preprocessing function
10 | is also different (same as Inception V3).
11 | 
12 | Also do note that this model is only available for the TensorFlow backend,
13 | due to its reliance on `SeparableConvolution` layers.
14 | 
15 | # Reference:
16 | 
17 | - [Xception: Deep Learning with Depthwise Separable Convolutions](https://arxiv.org/abs/1610.02357)
18 | 
19 | '''
20 | from __future__ import print_function
21 | from __future__ import absolute_import
22 | 
23 | import warnings
24 | import numpy as np
25 | 
26 | from keras.preprocessing import image
27 | 
28 | from keras.models import Model
29 | from keras import layers
30 | from keras.layers import Dense
31 | from keras.layers import Input
32 | from keras.layers import BatchNormalization
33 | from keras.layers import Activation
34 | from keras.layers import Conv2D
35 | from keras.layers import SeparableConv2D
36 | from keras.layers import MaxPooling2D
37 | from keras.layers import GlobalAveragePooling2D
38 | from keras.layers import GlobalMaxPooling2D
39 | from keras.engine.topology import get_source_inputs
40 | from keras.utils.data_utils import get_file
41 | from keras import backend as K
42 | from keras.applications.imagenet_utils import decode_predictions
43 | from keras.applications.imagenet_utils import _obtain_input_shape
44 | 
45 | 
46 | TF_WEIGHTS_PATH = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.4/xception_weights_tf_dim_ordering_tf_kernels.h5'
47 | TF_WEIGHTS_PATH_NO_TOP = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.4/xception_weights_tf_dim_ordering_tf_kernels_notop.h5'
48 | 
49 | 
50 | def Xception(include_top=True, weights='imagenet',
51 |              input_tensor=None, input_shape=None,
52 |              pooling=None,
53 |              classes=1000):
54 |     """Instantiates the Xception architecture.
55 | 
56 |     Optionally loads weights pre-trained
57 |     on ImageNet. This model is available for TensorFlow only,
58 |     and can only be used with inputs following the TensorFlow
59 |     data format `(width, height, channels)`.
60 |     You should set `image_data_format="channels_last"` in your Keras config
61 |     located at ~/.keras/keras.json.
62 | 
63 |     Note that the default input image size for this model is 299x299.
64 | 
65 |     # Arguments
66 |         include_top: whether to include the fully-connected
67 |             layer at the top of the network.
68 |         weights: one of `None` (random initialization)
69 |             or "imagenet" (pre-training on ImageNet).
70 |         input_tensor: optional Keras tensor (i.e. output of `layers.Input()`)
71 |             to use as image input for the model.
72 |         input_shape: optional shape tuple, only to be specified
73 |             if `include_top` is False; otherwise the input shape
74 |             has to be `(299, 299, 3)`.
75 |             It should have exactly 3 input channels,
76 |             and width and height should be no smaller than 71.
77 |             E.g. `(150, 150, 3)` would be one valid value.
78 |         pooling: Optional pooling mode for feature extraction
79 |             when `include_top` is `False`.
80 |             - `None` means that the output of the model will be
81 |                 the 4D tensor output of the
82 |                 last convolutional layer.
83 |             - `avg` means that global average pooling
84 |                 will be applied to the output of the
85 |                 last convolutional layer, and thus
86 |                 the output of the model will be a 2D tensor.
87 |             - `max` means that global max pooling will
88 |                 be applied.
89 |         classes: optional number of classes to classify images
90 |             into, only to be specified if `include_top` is True, and
91 |             if no `weights` argument is specified.
92 | 
93 |     # Returns
94 |         A Keras model instance.
95 | 
96 |     # Raises
97 |         ValueError: in case of invalid argument for `weights`,
98 |             or invalid input shape.
99 |         RuntimeError: If attempting to run this model with a
100 |             backend that does not support separable convolutions.
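
    # Example
        A minimal feature-extraction sketch (the input size is
        illustrative; anything no smaller than 71x71 works without
        the top):
            model = Xception(include_top=False, weights='imagenet',
                             input_shape=(150, 150, 3), pooling='avg')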
101 | """ 102 | if weights not in {'imagenet', None}: 103 | raise ValueError('The `weights` argument should be either ' 104 | '`None` (random initialization) or `imagenet` ' 105 | '(pre-training on ImageNet).') 106 | 107 | if weights == 'imagenet' and include_top and classes != 1000: 108 | raise ValueError('If using `weights` as imagenet with `include_top`' 109 | ' as true, `classes` should be 1000') 110 | 111 | if K.backend() != 'tensorflow': 112 | raise RuntimeError('The Xception model is only available with ' 113 | 'the TensorFlow backend.') 114 | if K.image_data_format() != 'channels_last': 115 | warnings.warn('The Xception model is only available for the ' 116 | 'input data format "channels_last" ' 117 | '(width, height, channels). ' 118 | 'However your settings specify the default ' 119 | 'data format "channels_first" (channels, width, height). ' 120 | 'You should set `image_data_format="channels_last"` in your Keras ' 121 | 'config located at ~/.keras/keras.json. ' 122 | 'The model being returned right now will expect inputs ' 123 | 'to follow the "channels_last" data format.') 124 | K.set_image_data_format('channels_last') 125 | old_data_format = 'channels_first' 126 | else: 127 | old_data_format = None 128 | 129 | # Determine proper input shape 130 | input_shape = _obtain_input_shape(input_shape, 131 | default_size=299, 132 | min_size=71, 133 | data_format=K.image_data_format(), 134 | include_top=include_top) 135 | 136 | if input_tensor is None: 137 | img_input = Input(shape=input_shape) 138 | else: 139 | if not K.is_keras_tensor(input_tensor): 140 | img_input = Input(tensor=input_tensor, shape=input_shape) 141 | else: 142 | img_input = input_tensor 143 | 144 | x = Conv2D(32, (3, 3), strides=(2, 2), use_bias=False, name='block1_conv1')(img_input) 145 | x = BatchNormalization(name='block1_conv1_bn')(x) 146 | x = Activation('relu', name='block1_conv1_act')(x) 147 | x = Conv2D(64, (3, 3), use_bias=False, name='block1_conv2')(x) 148 | x = BatchNormalization(name='block1_conv2_bn')(x) 149 | x = Activation('relu', name='block1_conv2_act')(x) 150 | 151 | residual = Conv2D(128, (1, 1), strides=(2, 2), 152 | padding='same', use_bias=False)(x) 153 | residual = BatchNormalization()(residual) 154 | 155 | x = SeparableConv2D(128, (3, 3), padding='same', use_bias=False, name='block2_sepconv1')(x) 156 | x = BatchNormalization(name='block2_sepconv1_bn')(x) 157 | x = Activation('relu', name='block2_sepconv2_act')(x) 158 | x = SeparableConv2D(128, (3, 3), padding='same', use_bias=False, name='block2_sepconv2')(x) 159 | x = BatchNormalization(name='block2_sepconv2_bn')(x) 160 | 161 | x = MaxPooling2D((3, 3), strides=(2, 2), padding='same', name='block2_pool')(x) 162 | x = layers.add([x, residual]) 163 | 164 | residual = Conv2D(256, (1, 1), strides=(2, 2), 165 | padding='same', use_bias=False)(x) 166 | residual = BatchNormalization()(residual) 167 | 168 | x = Activation('relu', name='block3_sepconv1_act')(x) 169 | x = SeparableConv2D(256, (3, 3), padding='same', use_bias=False, name='block3_sepconv1')(x) 170 | x = BatchNormalization(name='block3_sepconv1_bn')(x) 171 | x = Activation('relu', name='block3_sepconv2_act')(x) 172 | x = SeparableConv2D(256, (3, 3), padding='same', use_bias=False, name='block3_sepconv2')(x) 173 | x = BatchNormalization(name='block3_sepconv2_bn')(x) 174 | 175 | x = MaxPooling2D((3, 3), strides=(2, 2), padding='same', name='block3_pool')(x) 176 | x = layers.add([x, residual]) 177 | 178 | residual = Conv2D(728, (1, 1), strides=(2, 2), 179 | padding='same', use_bias=False)(x) 
180 | residual = BatchNormalization()(residual) 181 | 182 | x = Activation('relu', name='block4_sepconv1_act')(x) 183 | x = SeparableConv2D(728, (3, 3), padding='same', use_bias=False, name='block4_sepconv1')(x) 184 | x = BatchNormalization(name='block4_sepconv1_bn')(x) 185 | x = Activation('relu', name='block4_sepconv2_act')(x) 186 | x = SeparableConv2D(728, (3, 3), padding='same', use_bias=False, name='block4_sepconv2')(x) 187 | x = BatchNormalization(name='block4_sepconv2_bn')(x) 188 | 189 | x = MaxPooling2D((3, 3), strides=(2, 2), padding='same', name='block4_pool')(x) 190 | x = layers.add([x, residual]) 191 | 192 | for i in range(8): 193 | residual = x 194 | prefix = 'block' + str(i + 5) 195 | 196 | x = Activation('relu', name=prefix + '_sepconv1_act')(x) 197 | x = SeparableConv2D(728, (3, 3), padding='same', use_bias=False, name=prefix + '_sepconv1')(x) 198 | x = BatchNormalization(name=prefix + '_sepconv1_bn')(x) 199 | x = Activation('relu', name=prefix + '_sepconv2_act')(x) 200 | x = SeparableConv2D(728, (3, 3), padding='same', use_bias=False, name=prefix + '_sepconv2')(x) 201 | x = BatchNormalization(name=prefix + '_sepconv2_bn')(x) 202 | x = Activation('relu', name=prefix + '_sepconv3_act')(x) 203 | x = SeparableConv2D(728, (3, 3), padding='same', use_bias=False, name=prefix + '_sepconv3')(x) 204 | x = BatchNormalization(name=prefix + '_sepconv3_bn')(x) 205 | 206 | x = layers.add([x, residual]) 207 | 208 | residual = Conv2D(1024, (1, 1), strides=(2, 2), 209 | padding='same', use_bias=False)(x) 210 | residual = BatchNormalization()(residual) 211 | 212 | x = Activation('relu', name='block13_sepconv1_act')(x) 213 | x = SeparableConv2D(728, (3, 3), padding='same', use_bias=False, name='block13_sepconv1')(x) 214 | x = BatchNormalization(name='block13_sepconv1_bn')(x) 215 | x = Activation('relu', name='block13_sepconv2_act')(x) 216 | x = SeparableConv2D(1024, (3, 3), padding='same', use_bias=False, name='block13_sepconv2')(x) 217 | x = BatchNormalization(name='block13_sepconv2_bn')(x) 218 | 219 | x = MaxPooling2D((3, 3), strides=(2, 2), padding='same', name='block13_pool')(x) 220 | x = layers.add([x, residual]) 221 | 222 | x = SeparableConv2D(1536, (3, 3), padding='same', use_bias=False, name='block14_sepconv1')(x) 223 | x = BatchNormalization(name='block14_sepconv1_bn')(x) 224 | x = Activation('relu', name='block14_sepconv1_act')(x) 225 | 226 | x = SeparableConv2D(2048, (3, 3), padding='same', use_bias=False, name='block14_sepconv2')(x) 227 | x = BatchNormalization(name='block14_sepconv2_bn')(x) 228 | x = Activation('relu', name='block14_sepconv2_act')(x) 229 | 230 | if include_top: 231 | x = GlobalAveragePooling2D(name='avg_pool')(x) 232 | x = Dense(classes, activation='softmax', name='predictions')(x) 233 | else: 234 | if pooling == 'avg': 235 | x = GlobalAveragePooling2D()(x) 236 | elif pooling == 'max': 237 | x = GlobalMaxPooling2D()(x) 238 | 239 | # Ensure that the model takes into account 240 | # any potential predecessors of `input_tensor`. 241 | if input_tensor is not None: 242 | inputs = get_source_inputs(input_tensor) 243 | else: 244 | inputs = img_input 245 | # Create model. 
246 | model = Model(inputs, x, name='xception') 247 | 248 | # load weights 249 | if weights == 'imagenet': 250 | if include_top: 251 | weights_path = get_file('xception_weights_tf_dim_ordering_tf_kernels.h5', 252 | TF_WEIGHTS_PATH, 253 | cache_subdir='models') 254 | else: 255 | weights_path = get_file('xception_weights_tf_dim_ordering_tf_kernels_notop.h5', 256 | TF_WEIGHTS_PATH_NO_TOP, 257 | cache_subdir='models') 258 | model.load_weights(weights_path) 259 | 260 | if old_data_format: 261 | K.set_image_data_format(old_data_format) 262 | return model 263 | 264 | 265 | def preprocess_input(x): 266 | x /= 255. 267 | x -= 0.5 268 | x *= 2. 269 | return x 270 | 271 | 272 | if __name__ == '__main__': 273 | model = Xception(include_top=True, weights='imagenet') 274 | 275 | img_path = 'elephant.jpg' 276 | img = image.load_img(img_path, target_size=(299, 299)) 277 | x = image.img_to_array(img) 278 | x = np.expand_dims(x, axis=0) 279 | x = preprocess_input(x) 280 | print('Input image shape:', x.shape) 281 | 282 | preds = model.predict(x) 283 | print(np.argmax(preds)) 284 | print('Predicted:', decode_predictions(preds, 1)) 285 | --------------------------------------------------------------------------------
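A note on the `preprocess_input` defined in `xception.py` above: its three in-place operations map pixel values from `[0, 255]` to `[-1, 1]`, which is equivalent to computing `x / 127.5 - 1` (the input array must already be float, as returned by `image.img_to_array`). A minimal sketch, with illustrative values:

```python
import numpy as np

x = np.array([0., 127.5, 255.])
y = (x / 255. - 0.5) * 2.   # same result as preprocess_input, or x / 127.5 - 1
print(y)  # [-1.  0.  1.]
```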