├── LICENSE ├── README.md ├── audio_conv_utils.py ├── imagenet_utils.py ├── inception_resnet_v2.py ├── inception_v3.py ├── mobilenet.py ├── music_tagger_crnn.py ├── resnet50.py ├── vgg16.py ├── vgg19.py └── xception.py /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2016 François Chollet 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Trained image classification models for Keras 2 | 3 | **THIS REPOSITORY IS DEPRECATED. USE THE MODULE `keras.applications` INSTEAD.** 4 | 5 | Pull requests will not be reviewed nor merged. Direct any PRs to `keras.applications`. Issues are not monitored either. 6 | 7 | ---- 8 | 9 | This repository contains code for the following Keras models: 10 | 11 | - VGG16 12 | - VGG19 13 | - ResNet50 14 | - Inception v3 15 | - CRNN for music tagging 16 | 17 | All architectures are compatible with both TensorFlow and Theano, and upon instantiation the models will be built according to the image dimension ordering set in your Keras configuration file at `~/.keras/keras.json`. For instance, if you have set `image_dim_ordering=tf`, then any model loaded from this repository will get built according to the TensorFlow dimension ordering convention, "Width-Height-Depth". 18 | 19 | Pre-trained weights can be automatically loaded upon instantiation (`weights='imagenet'` argument in model constructor for all image models, `weights='msd'` for the music tagging model). Weights are automatically downloaded if necessary, and cached locally in `~/.keras/models/`. 
20 | 21 | ## Examples 22 | 23 | ### Classify images 24 | 25 | ```python 26 | from resnet50 import ResNet50 27 | from keras.preprocessing import image 28 | from imagenet_utils import preprocess_input, decode_predictions 29 | 30 | model = ResNet50(weights='imagenet') 31 | 32 | img_path = 'elephant.jpg' 33 | img = image.load_img(img_path, target_size=(224, 224)) 34 | x = image.img_to_array(img) 35 | x = np.expand_dims(x, axis=0) 36 | x = preprocess_input(x) 37 | 38 | preds = model.predict(x) 39 | print('Predicted:', decode_predictions(preds)) 40 | # print: [[u'n02504458', u'African_elephant']] 41 | ``` 42 | 43 | ### Extract features from images 44 | 45 | ```python 46 | from vgg16 import VGG16 47 | from keras.preprocessing import image 48 | from imagenet_utils import preprocess_input 49 | 50 | model = VGG16(weights='imagenet', include_top=False) 51 | 52 | img_path = 'elephant.jpg' 53 | img = image.load_img(img_path, target_size=(224, 224)) 54 | x = image.img_to_array(img) 55 | x = np.expand_dims(x, axis=0) 56 | x = preprocess_input(x) 57 | 58 | features = model.predict(x) 59 | ``` 60 | 61 | ### Extract features from an arbitrary intermediate layer 62 | 63 | ```python 64 | from vgg19 import VGG19 65 | from keras.preprocessing import image 66 | from imagenet_utils import preprocess_input 67 | from keras.models import Model 68 | 69 | base_model = VGG19(weights='imagenet') 70 | model = Model(input=base_model.input, output=base_model.get_layer('block4_pool').output) 71 | 72 | img_path = 'elephant.jpg' 73 | img = image.load_img(img_path, target_size=(224, 224)) 74 | x = image.img_to_array(img) 75 | x = np.expand_dims(x, axis=0) 76 | x = preprocess_input(x) 77 | 78 | block4_pool_features = model.predict(x) 79 | ``` 80 | 81 | ## References 82 | 83 | - [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556) - please cite this paper if you use the VGG models in your work. 84 | - [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) - please cite this paper if you use the ResNet model in your work. 85 | - [Rethinking the Inception Architecture for Computer Vision](http://arxiv.org/abs/1512.00567) - please cite this paper if you use the Inception v3 model in your work. 86 | - [Music-auto_tagging-keras](https://github.com/keunwoochoi/music-auto_tagging-keras) 87 | 88 | Additionally, don't forget to [cite Keras](https://keras.io/getting-started/faq/#how-should-i-cite-keras) if you use these models. 89 | 90 | 91 | ## License 92 | 93 | - All code in this repository is under the MIT license as specified by the LICENSE file. 94 | - The ResNet50 weights are ported from the ones [released by Kaiming He](https://github.com/KaimingHe/deep-residual-networks) under the [MIT license](https://github.com/KaimingHe/deep-residual-networks/blob/master/LICENSE). 95 | - The VGG16 and VGG19 weights are ported from the ones [released by VGG at Oxford](http://www.robots.ox.ac.uk/~vgg/research/very_deep/) under the [Creative Commons Attribution License](https://creativecommons.org/licenses/by/4.0/). 96 | - The Inception v3 weights are trained by ourselves and are released under the MIT license. 
97 | -------------------------------------------------------------------------------- /audio_conv_utils.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from keras import backend as K 3 | 4 | 5 | TAGS = ['rock', 'pop', 'alternative', 'indie', 'electronic', 6 | 'female vocalists', 'dance', '00s', 'alternative rock', 'jazz', 7 | 'beautiful', 'metal', 'chillout', 'male vocalists', 8 | 'classic rock', 'soul', 'indie rock', 'Mellow', 'electronica', 9 | '80s', 'folk', '90s', 'chill', 'instrumental', 'punk', 10 | 'oldies', 'blues', 'hard rock', 'ambient', 'acoustic', 11 | 'experimental', 'female vocalist', 'guitar', 'Hip-Hop', 12 | '70s', 'party', 'country', 'easy listening', 13 | 'sexy', 'catchy', 'funk', 'electro', 'heavy metal', 14 | 'Progressive rock', '60s', 'rnb', 'indie pop', 15 | 'sad', 'House', 'happy'] 16 | 17 | 18 | def librosa_exists(): 19 | try: 20 | __import__('librosa') 21 | except ImportError: 22 | return False 23 | else: 24 | return True 25 | 26 | 27 | def preprocess_input(audio_path, dim_ordering='default'): 28 | '''Reads an audio file and outputs a Mel-spectrogram. 29 | ''' 30 | if dim_ordering == 'default': 31 | dim_ordering = K.image_dim_ordering() 32 | assert dim_ordering in {'tf', 'th'} 33 | 34 | if librosa_exists(): 35 | import librosa 36 | else: 37 | raise RuntimeError('Librosa is required to process audio files.\n' + 38 | 'Install it via `pip install librosa` \nor visit ' + 39 | 'http://librosa.github.io/librosa/ for details.') 40 | 41 | # mel-spectrogram parameters 42 | SR = 12000 43 | N_FFT = 512 44 | N_MELS = 96 45 | HOP_LEN = 256 46 | DURA = 29.12 47 | 48 | src, sr = librosa.load(audio_path, sr=SR) 49 | n_sample = src.shape[0] 50 | n_sample_wanted = int(DURA * SR) 51 | 52 | # trim the signal at the center 53 | if n_sample < n_sample_wanted: # if too short 54 | src = np.hstack((src, np.zeros((int(DURA * SR) - n_sample,)))) 55 | elif n_sample > n_sample_wanted: # if too long 56 | src = src[(n_sample - n_sample_wanted) / 2: 57 | (n_sample + n_sample_wanted) / 2] 58 | 59 | logam = librosa.logamplitude 60 | melgram = librosa.feature.melspectrogram 61 | x = logam(melgram(y=src, sr=SR, hop_length=HOP_LEN, 62 | n_fft=N_FFT, n_mels=N_MELS) ** 2, 63 | ref_power=1.0) 64 | 65 | if dim_ordering == 'th': 66 | x = np.expand_dims(x, axis=0) 67 | elif dim_ordering == 'tf': 68 | x = np.expand_dims(x, axis=3) 69 | return x 70 | 71 | 72 | def decode_predictions(preds, top_n=5): 73 | '''Decode the output of a music tagger model. 
74 | 75 | # Arguments 76 | preds: 2-dimensional numpy array 77 | top_n: integer in [0, 50], number of items to show 78 | 79 | ''' 80 | assert len(preds.shape) == 2 and preds.shape[1] == 50 81 | results = [] 82 | for pred in preds: 83 | result = zip(TAGS, pred) 84 | result = sorted(result, key=lambda x: x[1], reverse=True) 85 | results.append(result[:top_n]) 86 | return results 87 | -------------------------------------------------------------------------------- /imagenet_utils.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import json 3 | 4 | from keras.utils.data_utils import get_file 5 | from keras import backend as K 6 | 7 | CLASS_INDEX = None 8 | CLASS_INDEX_PATH = 'https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json' 9 | 10 | 11 | def preprocess_input(x, dim_ordering='default'): 12 | if dim_ordering == 'default': 13 | dim_ordering = K.image_dim_ordering() 14 | assert dim_ordering in {'tf', 'th'} 15 | 16 | if dim_ordering == 'th': 17 | x[:, 0, :, :] -= 103.939 18 | x[:, 1, :, :] -= 116.779 19 | x[:, 2, :, :] -= 123.68 20 | # 'RGB'->'BGR' 21 | x = x[:, ::-1, :, :] 22 | else: 23 | x[:, :, :, 0] -= 103.939 24 | x[:, :, :, 1] -= 116.779 25 | x[:, :, :, 2] -= 123.68 26 | # 'RGB'->'BGR' 27 | x = x[:, :, :, ::-1] 28 | return x 29 | 30 | 31 | def decode_predictions(preds, top=5): 32 | global CLASS_INDEX 33 | if len(preds.shape) != 2 or preds.shape[1] != 1000: 34 | raise ValueError('`decode_predictions` expects ' 35 | 'a batch of predictions ' 36 | '(i.e. a 2D array of shape (samples, 1000)). ' 37 | 'Found array with shape: ' + str(preds.shape)) 38 | if CLASS_INDEX is None: 39 | fpath = get_file('imagenet_class_index.json', 40 | CLASS_INDEX_PATH, 41 | cache_subdir='models') 42 | CLASS_INDEX = json.load(open(fpath)) 43 | results = [] 44 | for pred in preds: 45 | top_indices = pred.argsort()[-top:][::-1] 46 | result = [tuple(CLASS_INDEX[str(i)]) + (pred[i],) for i in top_indices] 47 | results.append(result) 48 | return results 49 | -------------------------------------------------------------------------------- /inception_resnet_v2.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """Inception-ResNet V2 model for Keras. 
3 | 4 | Model naming and structure follows TF-slim implementation (which has some additional 5 | layers and different number of filters from the original arXiv paper): 6 | https://github.com/tensorflow/models/blob/master/slim/nets/inception_resnet_v2.py 7 | 8 | Pre-trained ImageNet weights are also converted from TF-slim, which can be found in: 9 | https://github.com/tensorflow/models/tree/master/slim#pre-trained-models 10 | 11 | # Reference 12 | - [Inception-v4, Inception-ResNet and the Impact of 13 | Residual Connections on Learning](https://arxiv.org/abs/1602.07261) 14 | 15 | """ 16 | from __future__ import print_function 17 | from __future__ import absolute_import 18 | 19 | import warnings 20 | import numpy as np 21 | 22 | from keras.preprocessing import image 23 | from keras.models import Model 24 | from keras.layers import Activation 25 | from keras.layers import AveragePooling2D 26 | from keras.layers import BatchNormalization 27 | from keras.layers import Concatenate 28 | from keras.layers import Conv2D 29 | from keras.layers import Dense 30 | from keras.layers import GlobalAveragePooling2D 31 | from keras.layers import GlobalMaxPooling2D 32 | from keras.layers import Input 33 | from keras.layers import Lambda 34 | from keras.layers import MaxPooling2D 35 | from keras.utils.data_utils import get_file 36 | from keras.engine.topology import get_source_inputs 37 | from keras.applications.imagenet_utils import _obtain_input_shape 38 | from keras.applications.imagenet_utils import decode_predictions 39 | from keras import backend as K 40 | 41 | 42 | BASE_WEIGHT_URL = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.7/' 43 | 44 | 45 | def preprocess_input(x): 46 | """Preprocesses a numpy array encoding a batch of images. 47 | 48 | This function applies the "Inception" preprocessing which converts 49 | the RGB values from [0, 255] to [-1, 1]. Note that this preprocessing 50 | function is different from `imagenet_utils.preprocess_input()`. 51 | 52 | # Arguments 53 | x: a 4D numpy array consists of RGB values within [0, 255]. 54 | 55 | # Returns 56 | Preprocessed array. 57 | """ 58 | x /= 255. 59 | x -= 0.5 60 | x *= 2. 61 | return x 62 | 63 | 64 | def conv2d_bn(x, 65 | filters, 66 | kernel_size, 67 | strides=1, 68 | padding='same', 69 | activation='relu', 70 | use_bias=False, 71 | name=None): 72 | """Utility function to apply conv + BN. 73 | 74 | # Arguments 75 | x: input tensor. 76 | filters: filters in `Conv2D`. 77 | kernel_size: kernel size as in `Conv2D`. 78 | padding: padding mode in `Conv2D`. 79 | activation: activation in `Conv2D`. 80 | strides: strides in `Conv2D`. 81 | name: name of the ops; will become `name + '_ac'` for the activation 82 | and `name + '_bn'` for the batch norm layer. 83 | 84 | # Returns 85 | Output tensor after applying `Conv2D` and `BatchNormalization`. 86 | """ 87 | x = Conv2D(filters, 88 | kernel_size, 89 | strides=strides, 90 | padding=padding, 91 | use_bias=use_bias, 92 | name=name)(x) 93 | if not use_bias: 94 | bn_axis = 1 if K.image_data_format() == 'channels_first' else 3 95 | bn_name = None if name is None else name + '_bn' 96 | x = BatchNormalization(axis=bn_axis, scale=False, name=bn_name)(x) 97 | if activation is not None: 98 | ac_name = None if name is None else name + '_ac' 99 | x = Activation(activation, name=ac_name)(x) 100 | return x 101 | 102 | 103 | def inception_resnet_block(x, scale, block_type, block_idx, activation='relu'): 104 | """Adds a Inception-ResNet block. 
105 | 106 | This function builds 3 types of Inception-ResNet blocks mentioned 107 | in the paper, controlled by the `block_type` argument (which is the 108 | block name used in the official TF-slim implementation): 109 | - Inception-ResNet-A: `block_type='block35'` 110 | - Inception-ResNet-B: `block_type='block17'` 111 | - Inception-ResNet-C: `block_type='block8'` 112 | 113 | # Arguments 114 | x: input tensor. 115 | scale: scaling factor to scale the residuals (i.e., the output of 116 | passing `x` through an inception module) before adding them 117 | to the shortcut branch. Let `r` be the output from the residual branch, 118 | the output of this block will be `x + scale * r`. 119 | block_type: `'block35'`, `'block17'` or `'block8'`, determines 120 | the network structure in the residual branch. 121 | block_idx: an `int` used for generating layer names. The Inception-ResNet blocks 122 | are repeated many times in this network. We use `block_idx` to identify 123 | each of the repetitions. For example, the first Inception-ResNet-A block 124 | will have `block_type='block35', block_idx=0`, ane the layer names will have 125 | a common prefix `'block35_0'`. 126 | activation: activation function to use at the end of the block 127 | (see [activations](keras./activations.md)). 128 | When `activation=None`, no activation is applied 129 | (i.e., "linear" activation: `a(x) = x`). 130 | 131 | # Returns 132 | Output tensor for the block. 133 | 134 | # Raises 135 | ValueError: if `block_type` is not one of `'block35'`, 136 | `'block17'` or `'block8'`. 137 | """ 138 | if block_type == 'block35': 139 | branch_0 = conv2d_bn(x, 32, 1) 140 | branch_1 = conv2d_bn(x, 32, 1) 141 | branch_1 = conv2d_bn(branch_1, 32, 3) 142 | branch_2 = conv2d_bn(x, 32, 1) 143 | branch_2 = conv2d_bn(branch_2, 48, 3) 144 | branch_2 = conv2d_bn(branch_2, 64, 3) 145 | branches = [branch_0, branch_1, branch_2] 146 | elif block_type == 'block17': 147 | branch_0 = conv2d_bn(x, 192, 1) 148 | branch_1 = conv2d_bn(x, 128, 1) 149 | branch_1 = conv2d_bn(branch_1, 160, [1, 7]) 150 | branch_1 = conv2d_bn(branch_1, 192, [7, 1]) 151 | branches = [branch_0, branch_1] 152 | elif block_type == 'block8': 153 | branch_0 = conv2d_bn(x, 192, 1) 154 | branch_1 = conv2d_bn(x, 192, 1) 155 | branch_1 = conv2d_bn(branch_1, 224, [1, 3]) 156 | branch_1 = conv2d_bn(branch_1, 256, [3, 1]) 157 | branches = [branch_0, branch_1] 158 | else: 159 | raise ValueError('Unknown Inception-ResNet block type. ' 160 | 'Expects "block35", "block17" or "block8", ' 161 | 'but got: ' + str(block_type)) 162 | 163 | block_name = block_type + '_' + str(block_idx) 164 | channel_axis = 1 if K.image_data_format() == 'channels_first' else 3 165 | mixed = Concatenate(axis=channel_axis, name=block_name + '_mixed')(branches) 166 | up = conv2d_bn(mixed, 167 | K.int_shape(x)[channel_axis], 168 | 1, 169 | activation=None, 170 | use_bias=True, 171 | name=block_name + '_conv') 172 | 173 | x = Lambda(lambda inputs, scale: inputs[0] + inputs[1] * scale, 174 | output_shape=K.int_shape(x)[1:], 175 | arguments={'scale': scale}, 176 | name=block_name)([x, up]) 177 | if activation is not None: 178 | x = Activation(activation, name=block_name + '_ac')(x) 179 | return x 180 | 181 | 182 | def InceptionResNetV2(include_top=True, 183 | weights='imagenet', 184 | input_tensor=None, 185 | input_shape=None, 186 | pooling=None, 187 | classes=1000): 188 | """Instantiates the Inception-ResNet v2 architecture. 189 | 190 | Optionally loads weights pre-trained on ImageNet. 
191 | Note that when using TensorFlow, for best performance you should 192 | set `"image_data_format": "channels_last"` in your Keras config 193 | at `~/.keras/keras.json`. 194 | 195 | The model and the weights are compatible with both TensorFlow and Theano 196 | backends (but not CNTK). The data format convention used by the model is 197 | the one specified in your Keras config file. 198 | 199 | Note that the default input image size for this model is 299x299, instead 200 | of 224x224 as in the VGG16 and ResNet models. Also, the input preprocessing 201 | function is different (i.e., do not use `imagenet_utils.preprocess_input()` 202 | with this model. Use `preprocess_input()` defined in this module instead). 203 | 204 | # Arguments 205 | include_top: whether to include the fully-connected 206 | layer at the top of the network. 207 | weights: one of `None` (random initialization) 208 | or `'imagenet'` (pre-training on ImageNet). 209 | input_tensor: optional Keras tensor (i.e. output of `layers.Input()`) 210 | to use as image input for the model. 211 | input_shape: optional shape tuple, only to be specified 212 | if `include_top` is `False` (otherwise the input shape 213 | has to be `(299, 299, 3)` (with `'channels_last'` data format) 214 | or `(3, 299, 299)` (with `'channels_first'` data format). 215 | It should have exactly 3 inputs channels, 216 | and width and height should be no smaller than 139. 217 | E.g. `(150, 150, 3)` would be one valid value. 218 | pooling: Optional pooling mode for feature extraction 219 | when `include_top` is `False`. 220 | - `None` means that the output of the model will be 221 | the 4D tensor output of the last convolutional layer. 222 | - `'avg'` means that global average pooling 223 | will be applied to the output of the 224 | last convolutional layer, and thus 225 | the output of the model will be a 2D tensor. 226 | - `'max'` means that global max pooling will be applied. 227 | classes: optional number of classes to classify images 228 | into, only to be specified if `include_top` is `True`, and 229 | if no `weights` argument is specified. 230 | 231 | # Returns 232 | A Keras `Model` instance. 233 | 234 | # Raises 235 | ValueError: in case of invalid argument for `weights`, 236 | or invalid input shape. 237 | RuntimeError: If attempting to run this model with an unsupported backend. 
238 | """ 239 | if K.backend() in {'cntk'}: 240 | raise RuntimeError(K.backend() + ' backend is currently unsupported for this model.') 241 | 242 | if weights not in {'imagenet', None}: 243 | raise ValueError('The `weights` argument should be either ' 244 | '`None` (random initialization) or `imagenet` ' 245 | '(pre-training on ImageNet).') 246 | 247 | if weights == 'imagenet' and include_top and classes != 1000: 248 | raise ValueError('If using `weights` as imagenet with `include_top`' 249 | ' as true, `classes` should be 1000') 250 | 251 | # Determine proper input shape 252 | input_shape = _obtain_input_shape( 253 | input_shape, 254 | default_size=299, 255 | min_size=139, 256 | data_format=K.image_data_format(), 257 | require_flatten=False, 258 | weights=weights) 259 | 260 | if input_tensor is None: 261 | img_input = Input(shape=input_shape) 262 | else: 263 | if not K.is_keras_tensor(input_tensor): 264 | img_input = Input(tensor=input_tensor, shape=input_shape) 265 | else: 266 | img_input = input_tensor 267 | 268 | # Stem block: 35 x 35 x 192 269 | x = conv2d_bn(img_input, 32, 3, strides=2, padding='valid') 270 | x = conv2d_bn(x, 32, 3, padding='valid') 271 | x = conv2d_bn(x, 64, 3) 272 | x = MaxPooling2D(3, strides=2)(x) 273 | x = conv2d_bn(x, 80, 1, padding='valid') 274 | x = conv2d_bn(x, 192, 3, padding='valid') 275 | x = MaxPooling2D(3, strides=2)(x) 276 | 277 | # Mixed 5b (Inception-A block): 35 x 35 x 320 278 | branch_0 = conv2d_bn(x, 96, 1) 279 | branch_1 = conv2d_bn(x, 48, 1) 280 | branch_1 = conv2d_bn(branch_1, 64, 5) 281 | branch_2 = conv2d_bn(x, 64, 1) 282 | branch_2 = conv2d_bn(branch_2, 96, 3) 283 | branch_2 = conv2d_bn(branch_2, 96, 3) 284 | branch_pool = AveragePooling2D(3, strides=1, padding='same')(x) 285 | branch_pool = conv2d_bn(branch_pool, 64, 1) 286 | branches = [branch_0, branch_1, branch_2, branch_pool] 287 | channel_axis = 1 if K.image_data_format() == 'channels_first' else 3 288 | x = Concatenate(axis=channel_axis, name='mixed_5b')(branches) 289 | 290 | # 10x block35 (Inception-ResNet-A block): 35 x 35 x 320 291 | for block_idx in range(1, 11): 292 | x = inception_resnet_block(x, 293 | scale=0.17, 294 | block_type='block35', 295 | block_idx=block_idx) 296 | 297 | # Mixed 6a (Reduction-A block): 17 x 17 x 1088 298 | branch_0 = conv2d_bn(x, 384, 3, strides=2, padding='valid') 299 | branch_1 = conv2d_bn(x, 256, 1) 300 | branch_1 = conv2d_bn(branch_1, 256, 3) 301 | branch_1 = conv2d_bn(branch_1, 384, 3, strides=2, padding='valid') 302 | branch_pool = MaxPooling2D(3, strides=2, padding='valid')(x) 303 | branches = [branch_0, branch_1, branch_pool] 304 | x = Concatenate(axis=channel_axis, name='mixed_6a')(branches) 305 | 306 | # 20x block17 (Inception-ResNet-B block): 17 x 17 x 1088 307 | for block_idx in range(1, 21): 308 | x = inception_resnet_block(x, 309 | scale=0.1, 310 | block_type='block17', 311 | block_idx=block_idx) 312 | 313 | # Mixed 7a (Reduction-B block): 8 x 8 x 2080 314 | branch_0 = conv2d_bn(x, 256, 1) 315 | branch_0 = conv2d_bn(branch_0, 384, 3, strides=2, padding='valid') 316 | branch_1 = conv2d_bn(x, 256, 1) 317 | branch_1 = conv2d_bn(branch_1, 288, 3, strides=2, padding='valid') 318 | branch_2 = conv2d_bn(x, 256, 1) 319 | branch_2 = conv2d_bn(branch_2, 288, 3) 320 | branch_2 = conv2d_bn(branch_2, 320, 3, strides=2, padding='valid') 321 | branch_pool = MaxPooling2D(3, strides=2, padding='valid')(x) 322 | branches = [branch_0, branch_1, branch_2, branch_pool] 323 | x = Concatenate(axis=channel_axis, name='mixed_7a')(branches) 324 | 325 | # 10x block8 
(Inception-ResNet-C block): 8 x 8 x 2080 326 | for block_idx in range(1, 10): 327 | x = inception_resnet_block(x, 328 | scale=0.2, 329 | block_type='block8', 330 | block_idx=block_idx) 331 | x = inception_resnet_block(x, 332 | scale=1., 333 | activation=None, 334 | block_type='block8', 335 | block_idx=10) 336 | 337 | # Final convolution block: 8 x 8 x 1536 338 | x = conv2d_bn(x, 1536, 1, name='conv_7b') 339 | 340 | if include_top: 341 | # Classification block 342 | x = GlobalAveragePooling2D(name='avg_pool')(x) 343 | x = Dense(classes, activation='softmax', name='predictions')(x) 344 | else: 345 | if pooling == 'avg': 346 | x = GlobalAveragePooling2D()(x) 347 | elif pooling == 'max': 348 | x = GlobalMaxPooling2D()(x) 349 | 350 | # Ensure that the model takes into account 351 | # any potential predecessors of `input_tensor` 352 | if input_tensor is not None: 353 | inputs = get_source_inputs(input_tensor) 354 | else: 355 | inputs = img_input 356 | 357 | # Create model 358 | model = Model(inputs, x, name='inception_resnet_v2') 359 | 360 | # Load weights 361 | if weights == 'imagenet': 362 | if K.image_data_format() == 'channels_first': 363 | if K.backend() == 'tensorflow': 364 | warnings.warn('You are using the TensorFlow backend, yet you ' 365 | 'are using the Theano ' 366 | 'image data format convention ' 367 | '(`image_data_format="channels_first"`). ' 368 | 'For best performance, set ' 369 | '`image_data_format="channels_last"` in ' 370 | 'your Keras config ' 371 | 'at ~/.keras/keras.json.') 372 | if include_top: 373 | weights_filename = 'inception_resnet_v2_weights_tf_dim_ordering_tf_kernels.h5' 374 | weights_path = get_file(weights_filename, 375 | BASE_WEIGHT_URL + weights_filename, 376 | cache_subdir='models', 377 | md5_hash='e693bd0210a403b3192acc6073ad2e96') 378 | else: 379 | weights_filename = 'inception_resnet_v2_weights_tf_dim_ordering_tf_kernels_notop.h5' 380 | weights_path = get_file(weights_filename, 381 | BASE_WEIGHT_URL + weights_filename, 382 | cache_subdir='models', 383 | md5_hash='d19885ff4a710c122648d3b5c3b684e4') 384 | model.load_weights(weights_path) 385 | 386 | return model 387 | 388 | 389 | if __name__ == '__main__': 390 | model = InceptionResNetV2(include_top=True, weights='imagenet') 391 | 392 | img_path = 'elephant.jpg' 393 | img = image.load_img(img_path, target_size=(299, 299)) 394 | x = image.img_to_array(img) 395 | x = np.expand_dims(x, axis=0) 396 | 397 | x = preprocess_input(x) 398 | 399 | preds = model.predict(x) 400 | print('Predicted:', decode_predictions(preds)) 401 | -------------------------------------------------------------------------------- /inception_v3.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """Inception V3 model for Keras. 3 | 4 | Note that the input image format for this model is different than for 5 | the VGG16 and ResNet models (299x299 instead of 224x224), 6 | and that the input preprocessing function is also different (same as Xception). 
7 | 8 | # Reference 9 | 10 | - [Rethinking the Inception Architecture for Computer Vision](http://arxiv.org/abs/1512.00567) 11 | 12 | """ 13 | from __future__ import print_function 14 | from __future__ import absolute_import 15 | 16 | import warnings 17 | import numpy as np 18 | 19 | from keras.models import Model 20 | from keras import layers 21 | from keras.layers import Activation 22 | from keras.layers import Dense 23 | from keras.layers import Input 24 | from keras.layers import BatchNormalization 25 | from keras.layers import Conv2D 26 | from keras.layers import MaxPooling2D 27 | from keras.layers import AveragePooling2D 28 | from keras.layers import GlobalAveragePooling2D 29 | from keras.layers import GlobalMaxPooling2D 30 | from keras.engine.topology import get_source_inputs 31 | from keras.utils.layer_utils import convert_all_kernels_in_model 32 | from keras.utils.data_utils import get_file 33 | from keras import backend as K 34 | from keras.applications.imagenet_utils import decode_predictions 35 | from keras.applications.imagenet_utils import _obtain_input_shape 36 | from keras.preprocessing import image 37 | 38 | 39 | WEIGHTS_PATH = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.5/inception_v3_weights_tf_dim_ordering_tf_kernels.h5' 40 | WEIGHTS_PATH_NO_TOP = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.5/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5' 41 | 42 | 43 | def conv2d_bn(x, 44 | filters, 45 | num_row, 46 | num_col, 47 | padding='same', 48 | strides=(1, 1), 49 | name=None): 50 | """Utility function to apply conv + BN. 51 | 52 | Arguments: 53 | x: input tensor. 54 | filters: filters in `Conv2D`. 55 | num_row: height of the convolution kernel. 56 | num_col: width of the convolution kernel. 57 | padding: padding mode in `Conv2D`. 58 | strides: strides in `Conv2D`. 59 | name: name of the ops; will become `name + '_conv'` 60 | for the convolution and `name + '_bn'` for the 61 | batch norm layer. 62 | 63 | Returns: 64 | Output tensor after applying `Conv2D` and `BatchNormalization`. 65 | """ 66 | if name is not None: 67 | bn_name = name + '_bn' 68 | conv_name = name + '_conv' 69 | else: 70 | bn_name = None 71 | conv_name = None 72 | if K.image_data_format() == 'channels_first': 73 | bn_axis = 1 74 | else: 75 | bn_axis = 3 76 | x = Conv2D( 77 | filters, (num_row, num_col), 78 | strides=strides, 79 | padding=padding, 80 | use_bias=False, 81 | name=conv_name)(x) 82 | x = BatchNormalization(axis=bn_axis, scale=False, name=bn_name)(x) 83 | x = Activation('relu', name=name)(x) 84 | return x 85 | 86 | 87 | def InceptionV3(include_top=True, 88 | weights='imagenet', 89 | input_tensor=None, 90 | input_shape=None, 91 | pooling=None, 92 | classes=1000): 93 | """Instantiates the Inception v3 architecture. 94 | 95 | Optionally loads weights pre-trained 96 | on ImageNet. Note that when using TensorFlow, 97 | for best performance you should set 98 | `image_data_format="channels_last"` in your Keras config 99 | at ~/.keras/keras.json. 100 | The model and the weights are compatible with both 101 | TensorFlow and Theano. The data format 102 | convention used by the model is the one 103 | specified in your Keras config file. 104 | Note that the default input image size for this model is 299x299. 105 | 106 | Arguments: 107 | include_top: whether to include the fully-connected 108 | layer at the top of the network. 109 | weights: one of `None` (random initialization) 110 | or "imagenet" (pre-training on ImageNet). 
111 | input_tensor: optional Keras tensor (i.e. output of `layers.Input()`) 112 | to use as image input for the model. 113 | input_shape: optional shape tuple, only to be specified 114 | if `include_top` is False (otherwise the input shape 115 | has to be `(299, 299, 3)` (with `channels_last` data format) 116 | or `(3, 299, 299)` (with `channels_first` data format). 117 | It should have exactly 3 inputs channels, 118 | and width and height should be no smaller than 139. 119 | E.g. `(150, 150, 3)` would be one valid value. 120 | pooling: Optional pooling mode for feature extraction 121 | when `include_top` is `False`. 122 | - `None` means that the output of the model will be 123 | the 4D tensor output of the 124 | last convolutional layer. 125 | - `avg` means that global average pooling 126 | will be applied to the output of the 127 | last convolutional layer, and thus 128 | the output of the model will be a 2D tensor. 129 | - `max` means that global max pooling will 130 | be applied. 131 | classes: optional number of classes to classify images 132 | into, only to be specified if `include_top` is True, and 133 | if no `weights` argument is specified. 134 | 135 | Returns: 136 | A Keras model instance. 137 | 138 | Raises: 139 | ValueError: in case of invalid argument for `weights`, 140 | or invalid input shape. 141 | """ 142 | if weights not in {'imagenet', None}: 143 | raise ValueError('The `weights` argument should be either ' 144 | '`None` (random initialization) or `imagenet` ' 145 | '(pre-training on ImageNet).') 146 | 147 | if weights == 'imagenet' and include_top and classes != 1000: 148 | raise ValueError('If using `weights` as imagenet with `include_top`' 149 | ' as true, `classes` should be 1000') 150 | 151 | # Determine proper input shape 152 | input_shape = _obtain_input_shape( 153 | input_shape, 154 | default_size=299, 155 | min_size=139, 156 | data_format=K.image_data_format(), 157 | include_top=include_top) 158 | 159 | if input_tensor is None: 160 | img_input = Input(shape=input_shape) 161 | else: 162 | img_input = Input(tensor=input_tensor, shape=input_shape) 163 | 164 | if K.image_data_format() == 'channels_first': 165 | channel_axis = 1 166 | else: 167 | channel_axis = 3 168 | 169 | x = conv2d_bn(img_input, 32, 3, 3, strides=(2, 2), padding='valid') 170 | x = conv2d_bn(x, 32, 3, 3, padding='valid') 171 | x = conv2d_bn(x, 64, 3, 3) 172 | x = MaxPooling2D((3, 3), strides=(2, 2))(x) 173 | 174 | x = conv2d_bn(x, 80, 1, 1, padding='valid') 175 | x = conv2d_bn(x, 192, 3, 3, padding='valid') 176 | x = MaxPooling2D((3, 3), strides=(2, 2))(x) 177 | 178 | # mixed 0, 1, 2: 35 x 35 x 256 179 | branch1x1 = conv2d_bn(x, 64, 1, 1) 180 | 181 | branch5x5 = conv2d_bn(x, 48, 1, 1) 182 | branch5x5 = conv2d_bn(branch5x5, 64, 5, 5) 183 | 184 | branch3x3dbl = conv2d_bn(x, 64, 1, 1) 185 | branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3) 186 | branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3) 187 | 188 | branch_pool = AveragePooling2D((3, 3), strides=(1, 1), padding='same')(x) 189 | branch_pool = conv2d_bn(branch_pool, 32, 1, 1) 190 | x = layers.concatenate( 191 | [branch1x1, branch5x5, branch3x3dbl, branch_pool], 192 | axis=channel_axis, 193 | name='mixed0') 194 | 195 | # mixed 1: 35 x 35 x 256 196 | branch1x1 = conv2d_bn(x, 64, 1, 1) 197 | 198 | branch5x5 = conv2d_bn(x, 48, 1, 1) 199 | branch5x5 = conv2d_bn(branch5x5, 64, 5, 5) 200 | 201 | branch3x3dbl = conv2d_bn(x, 64, 1, 1) 202 | branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3) 203 | branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3) 204 | 205 | 
branch_pool = AveragePooling2D((3, 3), strides=(1, 1), padding='same')(x) 206 | branch_pool = conv2d_bn(branch_pool, 64, 1, 1) 207 | x = layers.concatenate( 208 | [branch1x1, branch5x5, branch3x3dbl, branch_pool], 209 | axis=channel_axis, 210 | name='mixed1') 211 | 212 | # mixed 2: 35 x 35 x 256 213 | branch1x1 = conv2d_bn(x, 64, 1, 1) 214 | 215 | branch5x5 = conv2d_bn(x, 48, 1, 1) 216 | branch5x5 = conv2d_bn(branch5x5, 64, 5, 5) 217 | 218 | branch3x3dbl = conv2d_bn(x, 64, 1, 1) 219 | branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3) 220 | branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3) 221 | 222 | branch_pool = AveragePooling2D((3, 3), strides=(1, 1), padding='same')(x) 223 | branch_pool = conv2d_bn(branch_pool, 64, 1, 1) 224 | x = layers.concatenate( 225 | [branch1x1, branch5x5, branch3x3dbl, branch_pool], 226 | axis=channel_axis, 227 | name='mixed2') 228 | 229 | # mixed 3: 17 x 17 x 768 230 | branch3x3 = conv2d_bn(x, 384, 3, 3, strides=(2, 2), padding='valid') 231 | 232 | branch3x3dbl = conv2d_bn(x, 64, 1, 1) 233 | branch3x3dbl = conv2d_bn(branch3x3dbl, 96, 3, 3) 234 | branch3x3dbl = conv2d_bn( 235 | branch3x3dbl, 96, 3, 3, strides=(2, 2), padding='valid') 236 | 237 | branch_pool = MaxPooling2D((3, 3), strides=(2, 2))(x) 238 | x = layers.concatenate( 239 | [branch3x3, branch3x3dbl, branch_pool], axis=channel_axis, name='mixed3') 240 | 241 | # mixed 4: 17 x 17 x 768 242 | branch1x1 = conv2d_bn(x, 192, 1, 1) 243 | 244 | branch7x7 = conv2d_bn(x, 128, 1, 1) 245 | branch7x7 = conv2d_bn(branch7x7, 128, 1, 7) 246 | branch7x7 = conv2d_bn(branch7x7, 192, 7, 1) 247 | 248 | branch7x7dbl = conv2d_bn(x, 128, 1, 1) 249 | branch7x7dbl = conv2d_bn(branch7x7dbl, 128, 7, 1) 250 | branch7x7dbl = conv2d_bn(branch7x7dbl, 128, 1, 7) 251 | branch7x7dbl = conv2d_bn(branch7x7dbl, 128, 7, 1) 252 | branch7x7dbl = conv2d_bn(branch7x7dbl, 192, 1, 7) 253 | 254 | branch_pool = AveragePooling2D((3, 3), strides=(1, 1), padding='same')(x) 255 | branch_pool = conv2d_bn(branch_pool, 192, 1, 1) 256 | x = layers.concatenate( 257 | [branch1x1, branch7x7, branch7x7dbl, branch_pool], 258 | axis=channel_axis, 259 | name='mixed4') 260 | 261 | # mixed 5, 6: 17 x 17 x 768 262 | for i in range(2): 263 | branch1x1 = conv2d_bn(x, 192, 1, 1) 264 | 265 | branch7x7 = conv2d_bn(x, 160, 1, 1) 266 | branch7x7 = conv2d_bn(branch7x7, 160, 1, 7) 267 | branch7x7 = conv2d_bn(branch7x7, 192, 7, 1) 268 | 269 | branch7x7dbl = conv2d_bn(x, 160, 1, 1) 270 | branch7x7dbl = conv2d_bn(branch7x7dbl, 160, 7, 1) 271 | branch7x7dbl = conv2d_bn(branch7x7dbl, 160, 1, 7) 272 | branch7x7dbl = conv2d_bn(branch7x7dbl, 160, 7, 1) 273 | branch7x7dbl = conv2d_bn(branch7x7dbl, 192, 1, 7) 274 | 275 | branch_pool = AveragePooling2D( 276 | (3, 3), strides=(1, 1), padding='same')(x) 277 | branch_pool = conv2d_bn(branch_pool, 192, 1, 1) 278 | x = layers.concatenate( 279 | [branch1x1, branch7x7, branch7x7dbl, branch_pool], 280 | axis=channel_axis, 281 | name='mixed' + str(5 + i)) 282 | 283 | # mixed 7: 17 x 17 x 768 284 | branch1x1 = conv2d_bn(x, 192, 1, 1) 285 | 286 | branch7x7 = conv2d_bn(x, 192, 1, 1) 287 | branch7x7 = conv2d_bn(branch7x7, 192, 1, 7) 288 | branch7x7 = conv2d_bn(branch7x7, 192, 7, 1) 289 | 290 | branch7x7dbl = conv2d_bn(x, 192, 1, 1) 291 | branch7x7dbl = conv2d_bn(branch7x7dbl, 192, 7, 1) 292 | branch7x7dbl = conv2d_bn(branch7x7dbl, 192, 1, 7) 293 | branch7x7dbl = conv2d_bn(branch7x7dbl, 192, 7, 1) 294 | branch7x7dbl = conv2d_bn(branch7x7dbl, 192, 1, 7) 295 | 296 | branch_pool = AveragePooling2D((3, 3), strides=(1, 1), padding='same')(x) 297 | 
branch_pool = conv2d_bn(branch_pool, 192, 1, 1) 298 | x = layers.concatenate( 299 | [branch1x1, branch7x7, branch7x7dbl, branch_pool], 300 | axis=channel_axis, 301 | name='mixed7') 302 | 303 | # mixed 8: 8 x 8 x 1280 304 | branch3x3 = conv2d_bn(x, 192, 1, 1) 305 | branch3x3 = conv2d_bn(branch3x3, 320, 3, 3, 306 | strides=(2, 2), padding='valid') 307 | 308 | branch7x7x3 = conv2d_bn(x, 192, 1, 1) 309 | branch7x7x3 = conv2d_bn(branch7x7x3, 192, 1, 7) 310 | branch7x7x3 = conv2d_bn(branch7x7x3, 192, 7, 1) 311 | branch7x7x3 = conv2d_bn( 312 | branch7x7x3, 192, 3, 3, strides=(2, 2), padding='valid') 313 | 314 | branch_pool = MaxPooling2D((3, 3), strides=(2, 2))(x) 315 | x = layers.concatenate( 316 | [branch3x3, branch7x7x3, branch_pool], axis=channel_axis, name='mixed8') 317 | 318 | # mixed 9: 8 x 8 x 2048 319 | for i in range(2): 320 | branch1x1 = conv2d_bn(x, 320, 1, 1) 321 | 322 | branch3x3 = conv2d_bn(x, 384, 1, 1) 323 | branch3x3_1 = conv2d_bn(branch3x3, 384, 1, 3) 324 | branch3x3_2 = conv2d_bn(branch3x3, 384, 3, 1) 325 | branch3x3 = layers.concatenate( 326 | [branch3x3_1, branch3x3_2], axis=channel_axis, name='mixed9_' + str(i)) 327 | 328 | branch3x3dbl = conv2d_bn(x, 448, 1, 1) 329 | branch3x3dbl = conv2d_bn(branch3x3dbl, 384, 3, 3) 330 | branch3x3dbl_1 = conv2d_bn(branch3x3dbl, 384, 1, 3) 331 | branch3x3dbl_2 = conv2d_bn(branch3x3dbl, 384, 3, 1) 332 | branch3x3dbl = layers.concatenate( 333 | [branch3x3dbl_1, branch3x3dbl_2], axis=channel_axis) 334 | 335 | branch_pool = AveragePooling2D( 336 | (3, 3), strides=(1, 1), padding='same')(x) 337 | branch_pool = conv2d_bn(branch_pool, 192, 1, 1) 338 | x = layers.concatenate( 339 | [branch1x1, branch3x3, branch3x3dbl, branch_pool], 340 | axis=channel_axis, 341 | name='mixed' + str(9 + i)) 342 | if include_top: 343 | # Classification block 344 | x = GlobalAveragePooling2D(name='avg_pool')(x) 345 | x = Dense(classes, activation='softmax', name='predictions')(x) 346 | else: 347 | if pooling == 'avg': 348 | x = GlobalAveragePooling2D()(x) 349 | elif pooling == 'max': 350 | x = GlobalMaxPooling2D()(x) 351 | 352 | # Ensure that the model takes into account 353 | # any potential predecessors of `input_tensor`. 354 | if input_tensor is not None: 355 | inputs = get_source_inputs(input_tensor) 356 | else: 357 | inputs = img_input 358 | # Create model. 359 | model = Model(inputs, x, name='inception_v3') 360 | 361 | # load weights 362 | if weights == 'imagenet': 363 | if K.image_data_format() == 'channels_first': 364 | if K.backend() == 'tensorflow': 365 | warnings.warn('You are using the TensorFlow backend, yet you ' 366 | 'are using the Theano ' 367 | 'image data format convention ' 368 | '(`image_data_format="channels_first"`). ' 369 | 'For best performance, set ' 370 | '`image_data_format="channels_last"` in ' 371 | 'your Keras config ' 372 | 'at ~/.keras/keras.json.') 373 | if include_top: 374 | weights_path = get_file( 375 | 'inception_v3_weights_tf_dim_ordering_tf_kernels.h5', 376 | WEIGHTS_PATH, 377 | cache_subdir='models', 378 | md5_hash='9a0d58056eeedaa3f26cb7ebd46da564') 379 | else: 380 | weights_path = get_file( 381 | 'inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5', 382 | WEIGHTS_PATH_NO_TOP, 383 | cache_subdir='models', 384 | md5_hash='bcbd6486424b2319ff4ef7d526e38f63') 385 | model.load_weights(weights_path) 386 | if K.backend() == 'theano': 387 | convert_all_kernels_in_model(model) 388 | return model 389 | 390 | 391 | def preprocess_input(x): 392 | x /= 255. 393 | x -= 0.5 394 | x *= 2. 
395 | return x 396 | 397 | 398 | if __name__ == '__main__': 399 | model = InceptionV3(include_top=True, weights='imagenet') 400 | 401 | img_path = 'elephant.jpg' 402 | img = image.load_img(img_path, target_size=(299, 299)) 403 | x = image.img_to_array(img) 404 | x = np.expand_dims(x, axis=0) 405 | 406 | x = preprocess_input(x) 407 | 408 | preds = model.predict(x) 409 | print('Predicted:', decode_predictions(preds)) 410 | -------------------------------------------------------------------------------- /mobilenet.py: -------------------------------------------------------------------------------- 1 | """MobileNet v1 models for Keras. 2 | 3 | Code contributed by Somshubra Majumdar (@titu1994). 4 | 5 | MobileNet is a general architecture and can be used for multiple use cases. 6 | Depending on the use case, it can use different input layer size and 7 | different width factors. This allows different width models to reduce 8 | the number of multiply-adds and thereby 9 | reduce inference cost on mobile devices. 10 | 11 | MobileNets support any input size greater than 32 x 32, with larger image sizes 12 | offering better performance. 13 | The number of parameters and number of multiply-adds 14 | can be modified by using the `alpha` parameter, 15 | which increases/decreases the number of filters in each layer. 16 | By altering the image size and `alpha` parameter, 17 | all 16 models from the paper can be built, with ImageNet weights provided. 18 | 19 | The paper demonstrates the performance of MobileNets using `alpha` values of 20 | 1.0 (also called 100 % MobileNet), 0.75, 0.5 and 0.25. 21 | For each of these `alpha` values, weights for 4 different input image sizes 22 | are provided (224, 192, 160, 128). 23 | 24 | The following table describes the size and accuracy of the 100% MobileNet 25 | on size 224 x 224: 26 | ---------------------------------------------------------------------------- 27 | Width Multiplier (alpha) | ImageNet Acc | Multiply-Adds (M) | Params (M) 28 | ---------------------------------------------------------------------------- 29 | | 1.0 MobileNet-224 | 70.6 % | 529 | 4.2 | 30 | | 0.75 MobileNet-224 | 68.4 % | 325 | 2.6 | 31 | | 0.50 MobileNet-224 | 63.7 % | 149 | 1.3 | 32 | | 0.25 MobileNet-224 | 50.6 % | 41 | 0.5 | 33 | ---------------------------------------------------------------------------- 34 | 35 | The following table describes the performance of 36 | the 100 % MobileNet on various input sizes: 37 | ------------------------------------------------------------------------ 38 | Resolution | ImageNet Acc | Multiply-Adds (M) | Params (M) 39 | ------------------------------------------------------------------------ 40 | | 1.0 MobileNet-224 | 70.6 % | 529 | 4.2 | 41 | | 1.0 MobileNet-192 | 69.1 % | 529 | 4.2 | 42 | | 1.0 MobileNet-160 | 67.2 % | 529 | 4.2 | 43 | | 1.0 MobileNet-128 | 64.4 % | 529 | 4.2 | 44 | ------------------------------------------------------------------------ 45 | 46 | The weights for all 16 models are obtained and translated 47 | from Tensorflow checkpoints found at 48 | https://github.com/tensorflow/models/blob/master/slim/nets/mobilenet_v1.md 49 | 50 | # Reference 51 | - [MobileNets: Efficient Convolutional Neural Networks for 52 | Mobile Vision Applications](https://arxiv.org/pdf/1704.04861.pdf)) 53 | """ 54 | from __future__ import print_function 55 | from __future__ import absolute_import 56 | from __future__ import division 57 | 58 | import warnings 59 | import numpy as np 60 | 61 | from keras.preprocessing import image 62 | 63 | from 
keras.models import Model 64 | from keras.layers import Input 65 | from keras.layers import Activation 66 | from keras.layers import Dropout 67 | from keras.layers import Reshape 68 | from keras.layers import BatchNormalization 69 | from keras.layers import GlobalAveragePooling2D 70 | from keras.layers import GlobalMaxPooling2D 71 | from keras.layers import Conv2D 72 | from keras import initializers 73 | from keras import regularizers 74 | from keras import constraints 75 | from keras.utils import conv_utils 76 | from keras.utils.data_utils import get_file 77 | from keras.engine.topology import get_source_inputs 78 | from keras.engine import InputSpec 79 | from keras.applications.imagenet_utils import _obtain_input_shape 80 | from keras.applications.imagenet_utils import decode_predictions 81 | from keras import backend as K 82 | 83 | 84 | BASE_WEIGHT_PATH = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.6/' 85 | 86 | 87 | def relu6(x): 88 | return K.relu(x, max_value=6) 89 | 90 | 91 | def preprocess_input(x): 92 | x /= 255. 93 | x -= 0.5 94 | x *= 2. 95 | return x 96 | 97 | 98 | class DepthwiseConv2D(Conv2D): 99 | """Depthwise separable 2D convolution. 100 | 101 | Depthwise Separable convolutions consists in performing 102 | just the first step in a depthwise spatial convolution 103 | (which acts on each input channel separately). 104 | The `depth_multiplier` argument controls how many 105 | output channels are generated per input channel in the depthwise step. 106 | 107 | # Arguments 108 | kernel_size: An integer or tuple/list of 2 integers, specifying the 109 | width and height of the 2D convolution window. 110 | Can be a single integer to specify the same value for 111 | all spatial dimensions. 112 | strides: An integer or tuple/list of 2 integers, 113 | specifying the strides of the convolution along the width and height. 114 | Can be a single integer to specify the same value for 115 | all spatial dimensions. 116 | Specifying any stride value != 1 is incompatible with specifying 117 | any `dilation_rate` value != 1. 118 | padding: one of `"valid"` or `"same"` (case-insensitive). 119 | depth_multiplier: The number of depthwise convolution output channels 120 | for each input channel. 121 | The total number of depthwise convolution output 122 | channels will be equal to `filters_in * depth_multiplier`. 123 | data_format: A string, 124 | one of `channels_last` (default) or `channels_first`. 125 | The ordering of the dimensions in the inputs. 126 | `channels_last` corresponds to inputs with shape 127 | `(batch, height, width, channels)` while `channels_first` 128 | corresponds to inputs with shape 129 | `(batch, channels, height, width)`. 130 | It defaults to the `image_data_format` value found in your 131 | Keras config file at `~/.keras/keras.json`. 132 | If you never set it, then it will be "channels_last". 133 | activation: Activation function to use 134 | (see [activations](keras./activations.md)). 135 | If you don't specify anything, no activation is applied 136 | (ie. "linear" activation: `a(x) = x`). 137 | use_bias: Boolean, whether the layer uses a bias vector. 138 | depthwise_initializer: Initializer for the depthwise kernel matrix 139 | (see [initializers](keras./initializers.md)). 140 | bias_initializer: Initializer for the bias vector 141 | (see [initializers](keras./initializers.md)). 142 | depthwise_regularizer: Regularizer function applied to 143 | the depthwise kernel matrix 144 | (see [regularizer](keras./regularizers.md)). 
145 | bias_regularizer: Regularizer function applied to the bias vector 146 | (see [regularizer](keras./regularizers.md)). 147 | activity_regularizer: Regularizer function applied to 148 | the output of the layer (its "activation"). 149 | (see [regularizer](keras./regularizers.md)). 150 | depthwise_constraint: Constraint function applied to 151 | the depthwise kernel matrix 152 | (see [constraints](keras./constraints.md)). 153 | bias_constraint: Constraint function applied to the bias vector 154 | (see [constraints](keras./constraints.md)). 155 | 156 | # Input shape 157 | 4D tensor with shape: 158 | `[batch, channels, rows, cols]` if data_format='channels_first' 159 | or 4D tensor with shape: 160 | `[batch, rows, cols, channels]` if data_format='channels_last'. 161 | 162 | # Output shape 163 | 4D tensor with shape: 164 | `[batch, filters, new_rows, new_cols]` if data_format='channels_first' 165 | or 4D tensor with shape: 166 | `[batch, new_rows, new_cols, filters]` if data_format='channels_last'. 167 | `rows` and `cols` values might have changed due to padding. 168 | """ 169 | 170 | def __init__(self, 171 | kernel_size, 172 | strides=(1, 1), 173 | padding='valid', 174 | depth_multiplier=1, 175 | data_format=None, 176 | activation=None, 177 | use_bias=True, 178 | depthwise_initializer='glorot_uniform', 179 | bias_initializer='zeros', 180 | depthwise_regularizer=None, 181 | bias_regularizer=None, 182 | activity_regularizer=None, 183 | depthwise_constraint=None, 184 | bias_constraint=None, 185 | **kwargs): 186 | super(DepthwiseConv2D, self).__init__( 187 | filters=None, 188 | kernel_size=kernel_size, 189 | strides=strides, 190 | padding=padding, 191 | data_format=data_format, 192 | activation=activation, 193 | use_bias=use_bias, 194 | bias_regularizer=bias_regularizer, 195 | activity_regularizer=activity_regularizer, 196 | bias_constraint=bias_constraint, 197 | **kwargs) 198 | self.depth_multiplier = depth_multiplier 199 | self.depthwise_initializer = initializers.get(depthwise_initializer) 200 | self.depthwise_regularizer = regularizers.get(depthwise_regularizer) 201 | self.depthwise_constraint = constraints.get(depthwise_constraint) 202 | self.bias_initializer = initializers.get(bias_initializer) 203 | 204 | def build(self, input_shape): 205 | if len(input_shape) < 4: 206 | raise ValueError('Inputs to `DepthwiseConv2D` should have rank 4. ' 207 | 'Received input shape:', str(input_shape)) 208 | if self.data_format == 'channels_first': 209 | channel_axis = 1 210 | else: 211 | channel_axis = 3 212 | if input_shape[channel_axis] is None: 213 | raise ValueError('The channel dimension of the inputs to ' 214 | '`DepthwiseConv2D` ' 215 | 'should be defined. Found `None`.') 216 | input_dim = int(input_shape[channel_axis]) 217 | depthwise_kernel_shape = (self.kernel_size[0], 218 | self.kernel_size[1], 219 | input_dim, 220 | self.depth_multiplier) 221 | 222 | self.depthwise_kernel = self.add_weight( 223 | shape=depthwise_kernel_shape, 224 | initializer=self.depthwise_initializer, 225 | name='depthwise_kernel', 226 | regularizer=self.depthwise_regularizer, 227 | constraint=self.depthwise_constraint) 228 | 229 | if self.use_bias: 230 | self.bias = self.add_weight(shape=(input_dim * self.depth_multiplier,), 231 | initializer=self.bias_initializer, 232 | name='bias', 233 | regularizer=self.bias_regularizer, 234 | constraint=self.bias_constraint) 235 | else: 236 | self.bias = None 237 | # Set input spec. 
238 | self.input_spec = InputSpec(ndim=4, axes={channel_axis: input_dim}) 239 | self.built = True 240 | 241 | def call(self, inputs, training=None): 242 | outputs = K.depthwise_conv2d( 243 | inputs, 244 | self.depthwise_kernel, 245 | strides=self.strides, 246 | padding=self.padding, 247 | dilation_rate=self.dilation_rate, 248 | data_format=self.data_format) 249 | 250 | if self.bias: 251 | outputs = K.bias_add( 252 | outputs, 253 | self.bias, 254 | data_format=self.data_format) 255 | 256 | if self.activation is not None: 257 | return self.activation(outputs) 258 | 259 | return outputs 260 | 261 | def compute_output_shape(self, input_shape): 262 | if self.data_format == 'channels_first': 263 | rows = input_shape[2] 264 | cols = input_shape[3] 265 | out_filters = input_shape[1] * self.depth_multiplier 266 | elif self.data_format == 'channels_last': 267 | rows = input_shape[1] 268 | cols = input_shape[2] 269 | out_filters = input_shape[3] * self.depth_multiplier 270 | 271 | rows = conv_utils.conv_output_length(rows, self.kernel_size[0], 272 | self.padding, 273 | self.strides[0]) 274 | cols = conv_utils.conv_output_length(cols, self.kernel_size[1], 275 | self.padding, 276 | self.strides[1]) 277 | 278 | if self.data_format == 'channels_first': 279 | return (input_shape[0], out_filters, rows, cols) 280 | elif self.data_format == 'channels_last': 281 | return (input_shape[0], rows, cols, out_filters) 282 | 283 | def get_config(self): 284 | config = super(DepthwiseConv2D, self).get_config() 285 | config.pop('filters') 286 | config.pop('kernel_initializer') 287 | config.pop('kernel_regularizer') 288 | config.pop('kernel_constraint') 289 | config['depth_multiplier'] = self.depth_multiplier 290 | config['depthwise_initializer'] = initializers.serialize(self.depthwise_initializer) 291 | config['depthwise_regularizer'] = regularizers.serialize(self.depthwise_regularizer) 292 | config['depthwise_constraint'] = constraints.serialize(self.depthwise_constraint) 293 | return config 294 | 295 | 296 | def MobileNet(input_shape=None, 297 | alpha=1.0, 298 | depth_multiplier=1, 299 | dropout=1e-3, 300 | include_top=True, 301 | weights='imagenet', 302 | input_tensor=None, 303 | pooling=None, 304 | classes=1000): 305 | """Instantiates the MobileNet architecture. 306 | 307 | Note that only TensorFlow is supported for now, 308 | therefore it only works with the data format 309 | `image_data_format='channels_last'` in your Keras config 310 | at `~/.keras/keras.json`. 311 | 312 | To load a MobileNet model via `load_model`, import the custom 313 | objects `relu6` and `DepthwiseConv2D` and pass them to the 314 | `custom_objects` parameter. 315 | E.g. 316 | model = load_model('mobilenet.h5', custom_objects={ 317 | 'relu6': mobilenet.relu6, 318 | 'DepthwiseConv2D': mobilenet.DepthwiseConv2D}) 319 | 320 | # Arguments 321 | input_shape: optional shape tuple, only to be specified 322 | if `include_top` is False (otherwise the input shape 323 | has to be `(224, 224, 3)` (with `channels_last` data format) 324 | or (3, 224, 224) (with `channels_first` data format). 325 | It should have exactly 3 inputs channels, 326 | and width and height should be no smaller than 32. 327 | E.g. `(200, 200, 3)` would be one valid value. 328 | alpha: controls the width of the network. 329 | - If `alpha` < 1.0, proportionally decreases the number 330 | of filters in each layer. 331 | - If `alpha` > 1.0, proportionally increases the number 332 | of filters in each layer. 
333 | - If `alpha` = 1, default number of filters from the paper 334 | are used at each layer. 335 | depth_multiplier: depth multiplier for depthwise convolution 336 | (also called the resolution multiplier) 337 | dropout: dropout rate 338 | include_top: whether to include the fully-connected 339 | layer at the top of the network. 340 | weights: `None` (random initialization) or 341 | `imagenet` (ImageNet weights) 342 | input_tensor: optional Keras tensor (i.e. output of 343 | `layers.Input()`) 344 | to use as image input for the model. 345 | pooling: Optional pooling mode for feature extraction 346 | when `include_top` is `False`. 347 | - `None` means that the output of the model 348 | will be the 4D tensor output of the 349 | last convolutional layer. 350 | - `avg` means that global average pooling 351 | will be applied to the output of the 352 | last convolutional layer, and thus 353 | the output of the model will be a 354 | 2D tensor. 355 | - `max` means that global max pooling will 356 | be applied. 357 | classes: optional number of classes to classify images 358 | into, only to be specified if `include_top` is True, and 359 | if no `weights` argument is specified. 360 | 361 | # Returns 362 | A Keras model instance. 363 | 364 | # Raises 365 | ValueError: in case of invalid argument for `weights`, 366 | or invalid input shape. 367 | RuntimeError: If attempting to run this model with a 368 | backend that does not support separable convolutions. 369 | """ 370 | 371 | if K.backend() != 'tensorflow': 372 | raise RuntimeError('Only Tensorflow backend is currently supported, ' 373 | 'as other backends do not support ' 374 | 'depthwise convolution.') 375 | 376 | if weights not in {'imagenet', None}: 377 | raise ValueError('The `weights` argument should be either ' 378 | '`None` (random initialization) or `imagenet` ' 379 | '(pre-training on ImageNet).') 380 | 381 | if weights == 'imagenet' and include_top and classes != 1000: 382 | raise ValueError('If using `weights` as ImageNet with `include_top` ' 383 | 'as true, `classes` should be 1000') 384 | 385 | # Determine proper input shape. 386 | input_shape = _obtain_input_shape(input_shape, 387 | default_size=224, 388 | min_size=32, 389 | data_format=K.image_data_format(), 390 | include_top=include_top or weights) 391 | if K.image_data_format() == 'channels_last': 392 | row_axis, col_axis = (0, 1) 393 | else: 394 | row_axis, col_axis = (1, 2) 395 | rows = input_shape[row_axis] 396 | cols = input_shape[col_axis] 397 | 398 | if weights == 'imagenet': 399 | if depth_multiplier != 1: 400 | raise ValueError('If imagenet weights are being loaded, ' 401 | 'depth multiplier must be 1') 402 | 403 | if alpha not in [0.25, 0.50, 0.75, 1.0]: 404 | raise ValueError('If imagenet weights are being loaded, ' 405 | 'alpha can be one of' 406 | '`0.25`, `0.50`, `0.75` or `1.0` only.') 407 | 408 | if rows != cols or rows not in [128, 160, 192, 224]: 409 | raise ValueError('If imagenet weights are being loaded, ' 410 | 'input must have a static square shape (one of ' 411 | '(128,128), (160,160), (192,192), or (224, 224)).' 412 | ' Input shape provided = %s' % (input_shape,)) 413 | 414 | if K.image_data_format() != 'channels_last': 415 | warnings.warn('The MobileNet family of models is only available ' 416 | 'for the input data format "channels_last" ' 417 | '(width, height, channels). ' 418 | 'However your settings specify the default ' 419 | 'data format "channels_first" (channels, width, height).' 
420 | ' You should set `image_data_format="channels_last"` ' 421 | 'in your Keras config located at ~/.keras/keras.json. ' 422 | 'The model being returned right now will expect inputs ' 423 | 'to follow the "channels_last" data format.') 424 | K.set_image_data_format('channels_last') 425 | old_data_format = 'channels_first' 426 | else: 427 | old_data_format = None 428 | 429 | if input_tensor is None: 430 | img_input = Input(shape=input_shape) 431 | else: 432 | if not K.is_keras_tensor(input_tensor): 433 | img_input = Input(tensor=input_tensor, shape=input_shape) 434 | else: 435 | img_input = input_tensor 436 | 437 | x = _conv_block(img_input, 32, alpha, strides=(2, 2)) 438 | x = _depthwise_conv_block(x, 64, alpha, depth_multiplier, block_id=1) 439 | 440 | x = _depthwise_conv_block(x, 128, alpha, depth_multiplier, 441 | strides=(2, 2), block_id=2) 442 | x = _depthwise_conv_block(x, 128, alpha, depth_multiplier, block_id=3) 443 | 444 | x = _depthwise_conv_block(x, 256, alpha, depth_multiplier, 445 | strides=(2, 2), block_id=4) 446 | x = _depthwise_conv_block(x, 256, alpha, depth_multiplier, block_id=5) 447 | 448 | x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, 449 | strides=(2, 2), block_id=6) 450 | x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=7) 451 | x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=8) 452 | x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=9) 453 | x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=10) 454 | x = _depthwise_conv_block(x, 512, alpha, depth_multiplier, block_id=11) 455 | 456 | x = _depthwise_conv_block(x, 1024, alpha, depth_multiplier, 457 | strides=(2, 2), block_id=12) 458 | x = _depthwise_conv_block(x, 1024, alpha, depth_multiplier, block_id=13) 459 | 460 | if include_top: 461 | if K.image_data_format() == 'channels_first': 462 | shape = (int(1024 * alpha), 1, 1) 463 | else: 464 | shape = (1, 1, int(1024 * alpha)) 465 | 466 | x = GlobalAveragePooling2D()(x) 467 | x = Reshape(shape, name='reshape_1')(x) 468 | x = Dropout(dropout, name='dropout')(x) 469 | x = Conv2D(classes, (1, 1), 470 | padding='same', name='conv_preds')(x) 471 | x = Activation('softmax', name='act_softmax')(x) 472 | x = Reshape((classes,), name='reshape_2')(x) 473 | else: 474 | if pooling == 'avg': 475 | x = GlobalAveragePooling2D()(x) 476 | elif pooling == 'max': 477 | x = GlobalMaxPooling2D()(x) 478 | 479 | # Ensure that the model takes into account 480 | # any potential predecessors of `input_tensor`. 481 | if input_tensor is not None: 482 | inputs = get_source_inputs(input_tensor) 483 | else: 484 | inputs = img_input 485 | 486 | # Create model. 
487 |     model = Model(inputs, x, name='mobilenet_%0.2f_%s' % (alpha, rows))
488 | 
489 |     # load weights
490 |     if weights == 'imagenet':
491 |         if K.image_data_format() == 'channels_first':
492 |             raise ValueError('Weights for the "channels_first" data format '
493 |                              'are not available.')
494 |         if alpha == 1.0:
495 |             alpha_text = '1_0'
496 |         elif alpha == 0.75:
497 |             alpha_text = '7_5'
498 |         elif alpha == 0.50:
499 |             alpha_text = '5_0'
500 |         else:
501 |             alpha_text = '2_5'
502 | 
503 |         if include_top:
504 |             model_name = 'mobilenet_%s_%d_tf.h5' % (alpha_text, rows)
505 |             weight_path = BASE_WEIGHT_PATH + model_name
506 |             weights_path = get_file(model_name,
507 |                                     weight_path,
508 |                                     cache_subdir='models')
509 |         else:
510 |             model_name = 'mobilenet_%s_%d_tf_no_top.h5' % (alpha_text, rows)
511 |             weight_path = BASE_WEIGHT_PATH + model_name
512 |             weights_path = get_file(model_name,
513 |                                     weight_path,
514 |                                     cache_subdir='models')
515 |         model.load_weights(weights_path)
516 | 
517 |     if old_data_format:
518 |         K.set_image_data_format(old_data_format)
519 |     return model
520 | 
521 | 
522 | def _conv_block(inputs, filters, alpha, kernel=(3, 3), strides=(1, 1)):
523 |     """Adds an initial convolution layer (with batch normalization and relu6).
524 | 
525 |     # Arguments
526 |         inputs: Input tensor of shape `(rows, cols, 3)`
527 |             (with `channels_last` data format) or
528 |             `(3, rows, cols)` (with `channels_first` data format).
529 |             It should have exactly 3 input channels,
530 |             and width and height should be no smaller than 32.
531 |             E.g. `(224, 224, 3)` would be one valid value.
532 |         filters: Integer, the dimensionality of the output space
533 |             (i.e. the number of output filters in the convolution).
534 |         alpha: controls the width of the network.
535 |             - If `alpha` < 1.0, proportionally decreases the number
536 |                 of filters in each layer.
537 |             - If `alpha` > 1.0, proportionally increases the number
538 |                 of filters in each layer.
539 |             - If `alpha` = 1, default number of filters from the paper
540 |                 are used at each layer.
541 |         kernel: An integer or tuple/list of 2 integers, specifying the
542 |             width and height of the 2D convolution window.
543 |             Can be a single integer to specify the same value for
544 |             all spatial dimensions.
545 |         strides: An integer or tuple/list of 2 integers,
546 |             specifying the strides of the convolution along the width and height.
547 |             Can be a single integer to specify the same value for
548 |             all spatial dimensions.
549 |             Specifying any stride value != 1 is incompatible with specifying
550 |             any `dilation_rate` value != 1.
551 | 
552 |     # Input shape
553 |         4D tensor with shape:
554 |         `(samples, channels, rows, cols)` if data_format='channels_first'
555 |         or 4D tensor with shape:
556 |         `(samples, rows, cols, channels)` if data_format='channels_last'.
557 | 
558 |     # Output shape
559 |         4D tensor with shape:
560 |         `(samples, filters, new_rows, new_cols)` if data_format='channels_first'
561 |         or 4D tensor with shape:
562 |         `(samples, new_rows, new_cols, filters)` if data_format='channels_last'.
563 |         `rows` and `cols` values might have changed due to stride.
564 | 
565 |     # Returns
566 |         Output tensor of block.
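
    # Example
        A minimal sketch of the filter scaling (values are illustrative):
        with `filters=32` and `alpha=0.25`, the block builds
        `Conv2D(int(32 * 0.25), (3, 3), ...)`, i.e. a convolution
        with 8 output filters.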
567 |     """
568 |     channel_axis = 1 if K.image_data_format() == 'channels_first' else -1
569 |     filters = int(filters * alpha)
570 |     x = Conv2D(filters, kernel,
571 |                padding='same',
572 |                use_bias=False,
573 |                strides=strides,
574 |                name='conv1')(inputs)
575 |     x = BatchNormalization(axis=channel_axis, name='conv1_bn')(x)
576 |     return Activation(relu6, name='conv1_relu')(x)
577 | 
578 | 
579 | def _depthwise_conv_block(inputs, pointwise_conv_filters, alpha,
580 |                           depth_multiplier=1, strides=(1, 1), block_id=1):
581 |     """Adds a depthwise convolution block.
582 | 
583 |     A depthwise convolution block consists of a depthwise conv,
584 |     batch normalization, relu6, pointwise convolution,
585 |     batch normalization and relu6 activation.
586 | 
587 |     # Arguments
588 |         inputs: Input tensor of shape `(rows, cols, channels)`
589 |             (with `channels_last` data format) or
590 |             `(channels, rows, cols)` (with `channels_first` data format).
591 |         pointwise_conv_filters: Integer, the dimensionality of the output space
592 |             (i.e. the number of output filters in the pointwise convolution).
593 |         alpha: controls the width of the network.
594 |             - If `alpha` < 1.0, proportionally decreases the number
595 |                 of filters in each layer.
596 |             - If `alpha` > 1.0, proportionally increases the number
597 |                 of filters in each layer.
598 |             - If `alpha` = 1, default number of filters from the paper
599 |                 are used at each layer.
600 |         depth_multiplier: The number of depthwise convolution output channels
601 |             for each input channel.
602 |             The total number of depthwise convolution output
603 |             channels will be equal to `filters_in * depth_multiplier`.
604 |         strides: An integer or tuple/list of 2 integers,
605 |             specifying the strides of the convolution along the width and height.
606 |             Can be a single integer to specify the same value for
607 |             all spatial dimensions.
608 |             Specifying any stride value != 1 is incompatible with specifying
609 |             any `dilation_rate` value != 1.
610 |         block_id: Integer, a unique identifier designating the block number.
611 | 
612 |     # Input shape
613 |         4D tensor with shape:
614 |         `(batch, channels, rows, cols)` if data_format='channels_first'
615 |         or 4D tensor with shape:
616 |         `(batch, rows, cols, channels)` if data_format='channels_last'.
617 | 
618 |     # Output shape
619 |         4D tensor with shape:
620 |         `(batch, filters, new_rows, new_cols)` if data_format='channels_first'
621 |         or 4D tensor with shape:
622 |         `(batch, new_rows, new_cols, filters)` if data_format='channels_last'.
623 |         `rows` and `cols` values might have changed due to stride.
624 | 
625 |     # Returns
626 |         Output tensor of block.
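
    # Example
        With `pointwise_conv_filters=128` and `alpha=0.5` (illustrative
        values), the pointwise convolution is built with
        `int(128 * 0.5) = 64` output filters, and `strides=(2, 2)`
        would additionally halve `rows` and `cols` (padding is 'same').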
627 | """ 628 | channel_axis = 1 if K.image_data_format() == 'channels_first' else -1 629 | pointwise_conv_filters = int(pointwise_conv_filters * alpha) 630 | 631 | x = DepthwiseConv2D((3, 3), 632 | padding='same', 633 | depth_multiplier=depth_multiplier, 634 | strides=strides, 635 | use_bias=False, 636 | name='conv_dw_%d' % block_id)(inputs) 637 | x = BatchNormalization(axis=channel_axis, name='conv_dw_%d_bn' % block_id)(x) 638 | x = Activation(relu6, name='conv_dw_%d_relu' % block_id)(x) 639 | 640 | x = Conv2D(pointwise_conv_filters, (1, 1), 641 | padding='same', 642 | use_bias=False, 643 | strides=(1, 1), 644 | name='conv_pw_%d' % block_id)(x) 645 | x = BatchNormalization(axis=channel_axis, name='conv_pw_%d_bn' % block_id)(x) 646 | return Activation(relu6, name='conv_pw_%d_relu' % block_id)(x) 647 | 648 | 649 | if __name__ == '__main__': 650 | for r in [128, 160, 192, 224]: 651 | for a in [0.25, 0.50, 0.75, 1.0]: 652 | if r == 224: 653 | model = MobileNet(include_top=True, weights='imagenet', 654 | input_shape=(r, r, 3), alpha=a) 655 | 656 | img_path = 'elephant.jpg' 657 | img = image.load_img(img_path, target_size=(r, r)) 658 | x = image.img_to_array(img) 659 | x = np.expand_dims(x, axis=0) 660 | x = preprocess_input(x) 661 | print('Input image shape:', x.shape) 662 | 663 | preds = model.predict(x) 664 | print(np.argmax(preds)) 665 | print('Predicted:', decode_predictions(preds, 1)) 666 | 667 | model = MobileNet(include_top=False, weights='imagenet') 668 | -------------------------------------------------------------------------------- /music_tagger_crnn.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | '''MusicTaggerCRNN model for Keras. 3 | 4 | Code by github.com/keunwoochoi. 5 | 6 | # Reference: 7 | 8 | - [Music-auto_tagging-keras](https://github.com/keunwoochoi/music-auto_tagging-keras) 9 | 10 | ''' 11 | from __future__ import print_function 12 | from __future__ import absolute_import 13 | 14 | import numpy as np 15 | from keras import backend as K 16 | from keras.layers import Input, Dense 17 | from keras.models import Model 18 | from keras.layers import Dense, Dropout, Reshape, Permute 19 | from keras.layers.convolutional import Convolution2D 20 | from keras.layers.convolutional import MaxPooling2D, ZeroPadding2D 21 | from keras.layers.normalization import BatchNormalization 22 | from keras.layers.advanced_activations import ELU 23 | from keras.layers.recurrent import GRU 24 | from keras.utils.data_utils import get_file 25 | from keras.utils.layer_utils import convert_all_kernels_in_model 26 | from audio_conv_utils import decode_predictions, preprocess_input 27 | 28 | TH_WEIGHTS_PATH = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.3/music_tagger_crnn_weights_tf_kernels_th_dim_ordering.h5' 29 | TF_WEIGHTS_PATH = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.3/music_tagger_crnn_weights_tf_kernels_tf_dim_ordering.h5' 30 | 31 | 32 | def MusicTaggerCRNN(weights='msd', input_tensor=None, 33 | include_top=True): 34 | '''Instantiate the MusicTaggerCRNN architecture, 35 | optionally loading weights pre-trained 36 | on Million Song Dataset. Note that when using TensorFlow, 37 | for best performance you should set 38 | `image_dim_ordering="tf"` in your Keras config 39 | at ~/.keras/keras.json. 40 | 41 | The model and the weights are compatible with both 42 | TensorFlow and Theano. 
The dimension ordering
43 |     convention used by the model is the one
44 |     specified in your Keras config file.
45 | 
46 |     For preparing mel-spectrogram input, see
47 |     `audio_conv_utils.py` in [applications](https://github.com/fchollet/keras/tree/master/keras/applications).
48 |     You will need to install [Librosa](http://librosa.github.io/librosa/)
49 |     to use it.
50 | 
51 |     # Arguments
52 |         weights: one of `None` (random initialization)
53 |             or "msd" (pre-training on the Million Song Dataset).
54 |         input_tensor: optional Keras tensor (i.e. output of `layers.Input()`)
55 |             to use as input for the model.
56 |         include_top: whether to include the single fully-connected
57 |             layer (output layer) at the top of the network.
58 |             If False, the network outputs 32-dim features.
59 | 
60 | 
61 |     # Returns
62 |         A Keras model instance.
63 |     '''
64 |     if weights not in {'msd', None}:
65 |         raise ValueError('The `weights` argument should be either '
66 |                          '`None` (random initialization) or `msd` '
67 |                          '(pre-training on Million Song Dataset).')
68 | 
69 |     # Determine proper input shape
70 |     if K.image_dim_ordering() == 'th':
71 |         input_shape = (1, 96, 1366)
72 |     else:
73 |         input_shape = (96, 1366, 1)
74 | 
75 |     if input_tensor is None:
76 |         melgram_input = Input(shape=input_shape)
77 |     else:
78 |         if not K.is_keras_tensor(input_tensor):
79 |             melgram_input = Input(tensor=input_tensor, shape=input_shape)
80 |         else:
81 |             melgram_input = input_tensor
82 | 
83 |     # Determine input axis
84 |     if K.image_dim_ordering() == 'th':
85 |         channel_axis = 1
86 |         freq_axis = 2
87 |         time_axis = 3
88 |     else:
89 |         channel_axis = 3
90 |         freq_axis = 1
91 |         time_axis = 2
92 | 
93 |     # Input block
94 |     x = ZeroPadding2D(padding=(0, 37))(melgram_input)
95 |     x = BatchNormalization(axis=time_axis, name='bn_0_freq')(x)
96 | 
97 |     # Conv block 1
98 |     x = Convolution2D(64, 3, 3, border_mode='same', name='conv1')(x)
99 |     x = BatchNormalization(axis=channel_axis, mode=0, name='bn1')(x)
100 |     x = ELU()(x)
101 |     x = MaxPooling2D(pool_size=(2, 2), strides=(2, 2), name='pool1')(x)
102 | 
103 |     # Conv block 2
104 |     x = Convolution2D(128, 3, 3, border_mode='same', name='conv2')(x)
105 |     x = BatchNormalization(axis=channel_axis, mode=0, name='bn2')(x)
106 |     x = ELU()(x)
107 |     x = MaxPooling2D(pool_size=(3, 3), strides=(3, 3), name='pool2')(x)
108 | 
109 |     # Conv block 3
110 |     x = Convolution2D(128, 3, 3, border_mode='same', name='conv3')(x)
111 |     x = BatchNormalization(axis=channel_axis, mode=0, name='bn3')(x)
112 |     x = ELU()(x)
113 |     x = MaxPooling2D(pool_size=(4, 4), strides=(4, 4), name='pool3')(x)
114 | 
115 |     # Conv block 4
116 |     x = Convolution2D(128, 3, 3, border_mode='same', name='conv4')(x)
117 |     x = BatchNormalization(axis=channel_axis, mode=0, name='bn4')(x)
118 |     x = ELU()(x)
119 |     x = MaxPooling2D(pool_size=(4, 4), strides=(4, 4), name='pool4')(x)
120 | 
121 |     # reshaping
122 |     if K.image_dim_ordering() == 'th':
123 |         x = Permute((3, 1, 2))(x)
124 |     x = Reshape((15, 128))(x)
125 | 
126 |     # GRU block 1, 2, output
127 |     x = GRU(32, return_sequences=True, name='gru1')(x)
128 |     x = GRU(32, return_sequences=False, name='gru2')(x)
129 | 
130 |     if include_top:
131 |         x = Dense(50, activation='sigmoid', name='output')(x)
132 | 
133 |     # Create model
134 |     model = Model(melgram_input, x)
135 |     if weights is None:
136 |         return model
137 |     else:
138 |         # Load weights
139 |         if K.image_dim_ordering() == 'tf':
140 |             weights_path = get_file('music_tagger_crnn_weights_tf_kernels_tf_dim_ordering.h5',
141 |                                     TF_WEIGHTS_PATH,
142 |                                     cache_subdir='models')
143 |         else:
144 |             weights_path = get_file('music_tagger_crnn_weights_tf_kernels_th_dim_ordering.h5',
145 |                                     TH_WEIGHTS_PATH,
146 |                                     cache_subdir='models')
147 |         model.load_weights(weights_path, by_name=True)
148 |         if K.backend() == 'theano':
149 |             convert_all_kernels_in_model(model)
150 |         return model
151 | 
152 | 
153 | if __name__ == '__main__':
154 |     model = MusicTaggerCRNN(weights='msd')
155 | 
156 |     audio_path = 'audio_file.mp3'
157 |     melgram = preprocess_input(audio_path)
158 |     melgrams = np.expand_dims(melgram, axis=0)
159 | 
160 |     preds = model.predict(melgrams)
161 |     print('Predicted:')
162 |     print(decode_predictions(preds))
163 | 
--------------------------------------------------------------------------------
/resnet50.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | '''ResNet50 model for Keras.
3 | 
4 | # Reference:
5 | 
6 | - [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)
7 | 
8 | Adapted from code contributed by BigMoyan.
9 | '''
10 | from __future__ import print_function
11 | 
12 | import numpy as np
13 | import warnings
14 | 
15 | from keras.layers import Input
16 | from keras import layers
17 | from keras.layers import Dense
18 | from keras.layers import Activation
19 | from keras.layers import Flatten
20 | from keras.layers import Conv2D
21 | from keras.layers import MaxPooling2D
22 | from keras.layers import GlobalMaxPooling2D
23 | from keras.layers import ZeroPadding2D
24 | from keras.layers import AveragePooling2D
25 | from keras.layers import GlobalAveragePooling2D
26 | from keras.layers import BatchNormalization
27 | from keras.models import Model
28 | from keras.preprocessing import image
29 | import keras.backend as K
30 | from keras.utils import layer_utils
31 | from keras.utils.data_utils import get_file
32 | from keras.applications.imagenet_utils import decode_predictions
33 | from keras.applications.imagenet_utils import preprocess_input
34 | from keras.applications.imagenet_utils import _obtain_input_shape
35 | from keras.engine.topology import get_source_inputs
36 | 
37 | 
38 | WEIGHTS_PATH = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.2/resnet50_weights_tf_dim_ordering_tf_kernels.h5'
39 | WEIGHTS_PATH_NO_TOP = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.2/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5'
40 | 
41 | 
42 | def identity_block(input_tensor, kernel_size, filters, stage, block):
43 |     """The identity block is the block that has no conv layer at shortcut.
44 | 
45 |     # Arguments
46 |         input_tensor: input tensor
47 |         kernel_size: default 3, the kernel size of middle conv layer at main path
48 |         filters: list of integers, the filters of 3 conv layer at main path
49 |         stage: integer, current stage label, used for generating layer names
50 |         block: 'a','b'..., current block label, used for generating layer names
51 | 
52 |     # Returns
53 |         Output tensor for the block.
54 |     """
55 |     filters1, filters2, filters3 = filters
56 |     if K.image_data_format() == 'channels_last':
57 |         bn_axis = 3
58 |     else:
59 |         bn_axis = 1
60 |     conv_name_base = 'res' + str(stage) + block + '_branch'
61 |     bn_name_base = 'bn' + str(stage) + block + '_branch'
62 | 
63 |     x = Conv2D(filters1, (1, 1), name=conv_name_base + '2a')(input_tensor)
64 |     x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2a')(x)
65 |     x = Activation('relu')(x)
66 | 
67 |     x = Conv2D(filters2, kernel_size,
68 |                padding='same', name=conv_name_base + '2b')(x)
69 |     x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2b')(x)
70 |     x = Activation('relu')(x)
71 | 
72 |     x = Conv2D(filters3, (1, 1), name=conv_name_base + '2c')(x)
73 |     x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2c')(x)
74 | 
75 |     x = layers.add([x, input_tensor])
76 |     x = Activation('relu')(x)
77 |     return x
78 | 
79 | 
80 | def conv_block(input_tensor, kernel_size, filters, stage, block, strides=(2, 2)):
81 |     """conv_block is the block that has a conv layer at shortcut.
82 | 
83 |     # Arguments
84 |         input_tensor: input tensor
85 |         kernel_size: default 3, the kernel size of middle conv layer at main path
86 |         filters: list of integers, the filters of 3 conv layer at main path
87 |         stage: integer, current stage label, used for generating layer names
88 |         block: 'a','b'..., current block label, used for generating layer names
89 | 
90 |     # Returns
91 |         Output tensor for the block.
92 | 
93 |     Note that from stage 3, the first conv layer at main path is with strides=(2, 2),
94 |     and the shortcut should have strides=(2, 2) as well.
95 |     """
96 |     filters1, filters2, filters3 = filters
97 |     if K.image_data_format() == 'channels_last':
98 |         bn_axis = 3
99 |     else:
100 |         bn_axis = 1
101 |     conv_name_base = 'res' + str(stage) + block + '_branch'
102 |     bn_name_base = 'bn' + str(stage) + block + '_branch'
103 | 
104 |     x = Conv2D(filters1, (1, 1), strides=strides,
105 |                name=conv_name_base + '2a')(input_tensor)
106 |     x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2a')(x)
107 |     x = Activation('relu')(x)
108 | 
109 |     x = Conv2D(filters2, kernel_size, padding='same',
110 |                name=conv_name_base + '2b')(x)
111 |     x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2b')(x)
112 |     x = Activation('relu')(x)
113 | 
114 |     x = Conv2D(filters3, (1, 1), name=conv_name_base + '2c')(x)
115 |     x = BatchNormalization(axis=bn_axis, name=bn_name_base + '2c')(x)
116 | 
117 |     shortcut = Conv2D(filters3, (1, 1), strides=strides,
118 |                       name=conv_name_base + '1')(input_tensor)
119 |     shortcut = BatchNormalization(axis=bn_axis, name=bn_name_base + '1')(shortcut)
120 | 
121 |     x = layers.add([x, shortcut])
122 |     x = Activation('relu')(x)
123 |     return x
124 | 
125 | 
126 | def ResNet50(include_top=True, weights='imagenet',
127 |              input_tensor=None, input_shape=None,
128 |              pooling=None,
129 |              classes=1000):
130 |     """Instantiates the ResNet50 architecture.
131 | 
132 |     Optionally loads weights pre-trained
133 |     on ImageNet. Note that when using TensorFlow,
134 |     for best performance you should set
135 |     `image_data_format="channels_last"` in your Keras config
136 |     at ~/.keras/keras.json.
137 | 
138 |     The model and the weights are compatible with both
139 |     TensorFlow and Theano. The data format
140 |     convention used by the model is the one
141 |     specified in your Keras config file.
142 | 
143 |     # Arguments
144 |         include_top: whether to include the fully-connected
145 |             layer at the top of the network.
146 |         weights: one of `None` (random initialization)
147 |             or "imagenet" (pre-training on ImageNet).
148 |         input_tensor: optional Keras tensor (i.e. output of `layers.Input()`)
149 |             to use as image input for the model.
150 |         input_shape: optional shape tuple, only to be specified
151 |             if `include_top` is False; otherwise the input shape
152 |             has to be `(224, 224, 3)` (with `channels_last` data format)
153 |             or `(3, 224, 224)` (with `channels_first` data format).
154 |             It should have exactly 3 input channels,
155 |             and width and height should be no smaller than 197.
156 |             E.g. `(200, 200, 3)` would be one valid value.
157 |         pooling: Optional pooling mode for feature extraction
158 |             when `include_top` is `False`.
159 |             - `None` means that the output of the model will be
160 |                 the 4D tensor output of the
161 |                 last convolutional layer.
162 |             - `avg` means that global average pooling
163 |                 will be applied to the output of the
164 |                 last convolutional layer, and thus
165 |                 the output of the model will be a 2D tensor.
166 |             - `max` means that global max pooling will
167 |                 be applied.
168 |         classes: optional number of classes to classify images
169 |             into, only to be specified if `include_top` is True, and
170 |             if no `weights` argument is specified.
171 | 
172 |     # Returns
173 |         A Keras model instance.
174 | 
175 |     # Raises
176 |         ValueError: in case of invalid argument for `weights`,
177 |             or invalid input shape.
178 |     """
179 |     if weights not in {'imagenet', None}:
180 |         raise ValueError('The `weights` argument should be either '
181 |                          '`None` (random initialization) or `imagenet` '
182 |                          '(pre-training on ImageNet).')
183 | 
184 |     if weights == 'imagenet' and include_top and classes != 1000:
185 |         raise ValueError('If using `weights` as imagenet with `include_top`'
186 |                          ' as true, `classes` should be 1000.')
187 | 
188 |     # Determine proper input shape
189 |     input_shape = _obtain_input_shape(input_shape,
190 |                                       default_size=224,
191 |                                       min_size=197,
192 |                                       data_format=K.image_data_format(),
193 |                                       include_top=include_top)
194 | 
195 |     if input_tensor is None:
196 |         img_input = Input(shape=input_shape)
197 |     else:
198 |         if not K.is_keras_tensor(input_tensor):
199 |             img_input = Input(tensor=input_tensor, shape=input_shape)
200 |         else:
201 |             img_input = input_tensor
202 |     if K.image_data_format() == 'channels_last':
203 |         bn_axis = 3
204 |     else:
205 |         bn_axis = 1
206 | 
207 |     x = ZeroPadding2D((3, 3))(img_input)
208 |     x = Conv2D(64, (7, 7), strides=(2, 2), name='conv1')(x)
209 |     x = BatchNormalization(axis=bn_axis, name='bn_conv1')(x)
210 |     x = Activation('relu')(x)
211 |     x = MaxPooling2D((3, 3), strides=(2, 2))(x)
212 | 
213 |     x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))
214 |     x = identity_block(x, 3, [64, 64, 256], stage=2, block='b')
215 |     x = identity_block(x, 3, [64, 64, 256], stage=2, block='c')
216 | 
217 |     x = conv_block(x, 3, [128, 128, 512], stage=3, block='a')
218 |     x = identity_block(x, 3, [128, 128, 512], stage=3, block='b')
219 |     x = identity_block(x, 3, [128, 128, 512], stage=3, block='c')
220 |     x = identity_block(x, 3, [128, 128, 512], stage=3, block='d')
221 | 
222 |     x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a')
223 |     x = identity_block(x, 3, [256, 256, 1024], stage=4, block='b')
224 |     x = identity_block(x, 3, [256, 256, 1024], stage=4, block='c')
225 |     x = identity_block(x, 3, [256, 256, 1024], stage=4, block='d')
226 |     x = identity_block(x, 3, [256, 256, 1024], stage=4, block='e')
227 |     x = identity_block(x, 3, [256, 256, 1024], stage=4, block='f')
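    # With the default 224x224 input, the spatial size is halved at conv1,
    # at the first max pool, and at each stage's opening conv_block:
    # stage 4 above ends at 14x14, and stage 5 below halves it to 7x7,
    # matching the (7, 7) window of the 'avg_pool' layer.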
228 | 229 | x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a') 230 | x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b') 231 | x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c') 232 | 233 | x = AveragePooling2D((7, 7), name='avg_pool')(x) 234 | 235 | if include_top: 236 | x = Flatten()(x) 237 | x = Dense(classes, activation='softmax', name='fc1000')(x) 238 | else: 239 | if pooling == 'avg': 240 | x = GlobalAveragePooling2D()(x) 241 | elif pooling == 'max': 242 | x = GlobalMaxPooling2D()(x) 243 | 244 | # Ensure that the model takes into account 245 | # any potential predecessors of `input_tensor`. 246 | if input_tensor is not None: 247 | inputs = get_source_inputs(input_tensor) 248 | else: 249 | inputs = img_input 250 | # Create model. 251 | model = Model(inputs, x, name='resnet50') 252 | 253 | # load weights 254 | if weights == 'imagenet': 255 | if include_top: 256 | weights_path = get_file('resnet50_weights_tf_dim_ordering_tf_kernels.h5', 257 | WEIGHTS_PATH, 258 | cache_subdir='models', 259 | md5_hash='a7b3fe01876f51b976af0dea6bc144eb') 260 | else: 261 | weights_path = get_file('resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5', 262 | WEIGHTS_PATH_NO_TOP, 263 | cache_subdir='models', 264 | md5_hash='a268eb855778b3df3c7506639542a6af') 265 | model.load_weights(weights_path) 266 | if K.backend() == 'theano': 267 | layer_utils.convert_all_kernels_in_model(model) 268 | 269 | if K.image_data_format() == 'channels_first': 270 | if include_top: 271 | maxpool = model.get_layer(name='avg_pool') 272 | shape = maxpool.output_shape[1:] 273 | dense = model.get_layer(name='fc1000') 274 | layer_utils.convert_dense_weights_data_format(dense, shape, 'channels_first') 275 | 276 | if K.backend() == 'tensorflow': 277 | warnings.warn('You are using the TensorFlow backend, yet you ' 278 | 'are using the Theano ' 279 | 'image data format convention ' 280 | '(`image_data_format="channels_first"`). ' 281 | 'For best performance, set ' 282 | '`image_data_format="channels_last"` in ' 283 | 'your Keras config ' 284 | 'at ~/.keras/keras.json.') 285 | return model 286 | 287 | 288 | if __name__ == '__main__': 289 | model = ResNet50(include_top=True, weights='imagenet') 290 | 291 | img_path = 'elephant.jpg' 292 | img = image.load_img(img_path, target_size=(224, 224)) 293 | x = image.img_to_array(img) 294 | x = np.expand_dims(x, axis=0) 295 | x = preprocess_input(x) 296 | print('Input image shape:', x.shape) 297 | 298 | preds = model.predict(x) 299 | print('Predicted:', decode_predictions(preds)) 300 | -------------------------------------------------------------------------------- /vgg16.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | '''VGG16 model for Keras. 
3 | 
4 | # Reference:
5 | 
6 | - [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556)
7 | 
8 | '''
9 | from __future__ import print_function
10 | 
11 | import numpy as np
12 | import warnings
13 | 
14 | from keras.models import Model
15 | from keras.layers import Flatten
16 | from keras.layers import Dense
17 | from keras.layers import Input
18 | from keras.layers import Conv2D
19 | from keras.layers import MaxPooling2D
20 | from keras.layers import GlobalMaxPooling2D
21 | from keras.layers import GlobalAveragePooling2D
22 | from keras.preprocessing import image
23 | from keras.utils import layer_utils
24 | from keras.utils.data_utils import get_file
25 | from keras import backend as K
26 | from keras.applications.imagenet_utils import decode_predictions
27 | from keras.applications.imagenet_utils import preprocess_input
28 | from keras.applications.imagenet_utils import _obtain_input_shape
29 | from keras.engine.topology import get_source_inputs
30 | 
31 | 
32 | WEIGHTS_PATH = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels.h5'
33 | WEIGHTS_PATH_NO_TOP = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5'
34 | 
35 | 
36 | def VGG16(include_top=True, weights='imagenet',
37 |           input_tensor=None, input_shape=None,
38 |           pooling=None,
39 |           classes=1000):
40 |     """Instantiates the VGG16 architecture.
41 | 
42 |     Optionally loads weights pre-trained
43 |     on ImageNet. Note that when using TensorFlow,
44 |     for best performance you should set
45 |     `image_data_format="channels_last"` in your Keras config
46 |     at ~/.keras/keras.json.
47 | 
48 |     The model and the weights are compatible with both
49 |     TensorFlow and Theano. The data format
50 |     convention used by the model is the one
51 |     specified in your Keras config file.
52 | 
53 |     # Arguments
54 |         include_top: whether to include the 3 fully-connected
55 |             layers at the top of the network.
56 |         weights: one of `None` (random initialization)
57 |             or "imagenet" (pre-training on ImageNet).
58 |         input_tensor: optional Keras tensor (i.e. output of `layers.Input()`)
59 |             to use as image input for the model.
60 |         input_shape: optional shape tuple, only to be specified
61 |             if `include_top` is False; otherwise the input shape
62 |             has to be `(224, 224, 3)` (with `channels_last` data format)
63 |             or `(3, 224, 224)` (with `channels_first` data format).
64 |             It should have exactly 3 input channels,
65 |             and width and height should be no smaller than 48.
66 |             E.g. `(200, 200, 3)` would be one valid value.
67 |         pooling: Optional pooling mode for feature extraction
68 |             when `include_top` is `False`.
69 |             - `None` means that the output of the model will be
70 |                 the 4D tensor output of the
71 |                 last convolutional layer.
72 |             - `avg` means that global average pooling
73 |                 will be applied to the output of the
74 |                 last convolutional layer, and thus
75 |                 the output of the model will be a 2D tensor.
76 |             - `max` means that global max pooling will
77 |                 be applied.
78 |         classes: optional number of classes to classify images
79 |             into, only to be specified if `include_top` is True, and
80 |             if no `weights` argument is specified.
81 | 
82 |     # Returns
83 |         A Keras model instance.
84 | 
85 |     # Raises
86 |         ValueError: in case of invalid argument for `weights`,
87 |             or invalid input shape.
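
    # Example
        A minimal sketch of a custom `input_shape` (the size is
        illustrative; anything no smaller than 48x48 works without the top):
            model = VGG16(include_top=False, weights='imagenet',
                          input_shape=(200, 200, 3))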
88 | """ 89 | if weights not in {'imagenet', None}: 90 | raise ValueError('The `weights` argument should be either ' 91 | '`None` (random initialization) or `imagenet` ' 92 | '(pre-training on ImageNet).') 93 | 94 | if weights == 'imagenet' and include_top and classes != 1000: 95 | raise ValueError('If using `weights` as imagenet with `include_top`' 96 | ' as true, `classes` should be 1000') 97 | # Determine proper input shape 98 | input_shape = _obtain_input_shape(input_shape, 99 | default_size=224, 100 | min_size=48, 101 | data_format=K.image_data_format(), 102 | include_top=include_top) 103 | 104 | if input_tensor is None: 105 | img_input = Input(shape=input_shape) 106 | else: 107 | if not K.is_keras_tensor(input_tensor): 108 | img_input = Input(tensor=input_tensor, shape=input_shape) 109 | else: 110 | img_input = input_tensor 111 | # Block 1 112 | x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(img_input) 113 | x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x) 114 | x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x) 115 | 116 | # Block 2 117 | x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1')(x) 118 | x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2')(x) 119 | x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x) 120 | 121 | # Block 3 122 | x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1')(x) 123 | x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2')(x) 124 | x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3')(x) 125 | x = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x) 126 | 127 | # Block 4 128 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1')(x) 129 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2')(x) 130 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3')(x) 131 | x = MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x) 132 | 133 | # Block 5 134 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1')(x) 135 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2')(x) 136 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3')(x) 137 | x = MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x) 138 | 139 | if include_top: 140 | # Classification block 141 | x = Flatten(name='flatten')(x) 142 | x = Dense(4096, activation='relu', name='fc1')(x) 143 | x = Dense(4096, activation='relu', name='fc2')(x) 144 | x = Dense(classes, activation='softmax', name='predictions')(x) 145 | else: 146 | if pooling == 'avg': 147 | x = GlobalAveragePooling2D()(x) 148 | elif pooling == 'max': 149 | x = GlobalMaxPooling2D()(x) 150 | 151 | # Ensure that the model takes into account 152 | # any potential predecessors of `input_tensor`. 153 | if input_tensor is not None: 154 | inputs = get_source_inputs(input_tensor) 155 | else: 156 | inputs = img_input 157 | # Create model. 
158 | model = Model(inputs, x, name='vgg16') 159 | 160 | # load weights 161 | if weights == 'imagenet': 162 | if include_top: 163 | weights_path = get_file('vgg16_weights_tf_dim_ordering_tf_kernels.h5', 164 | WEIGHTS_PATH, 165 | cache_subdir='models') 166 | else: 167 | weights_path = get_file('vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5', 168 | WEIGHTS_PATH_NO_TOP, 169 | cache_subdir='models') 170 | model.load_weights(weights_path) 171 | if K.backend() == 'theano': 172 | layer_utils.convert_all_kernels_in_model(model) 173 | 174 | if K.image_data_format() == 'channels_first': 175 | if include_top: 176 | maxpool = model.get_layer(name='block5_pool') 177 | shape = maxpool.output_shape[1:] 178 | dense = model.get_layer(name='fc1') 179 | layer_utils.convert_dense_weights_data_format(dense, shape, 'channels_first') 180 | 181 | if K.backend() == 'tensorflow': 182 | warnings.warn('You are using the TensorFlow backend, yet you ' 183 | 'are using the Theano ' 184 | 'image data format convention ' 185 | '(`image_data_format="channels_first"`). ' 186 | 'For best performance, set ' 187 | '`image_data_format="channels_last"` in ' 188 | 'your Keras config ' 189 | 'at ~/.keras/keras.json.') 190 | return model 191 | 192 | 193 | if __name__ == '__main__': 194 | model = VGG16(include_top=True, weights='imagenet') 195 | 196 | img_path = 'elephant.jpg' 197 | img = image.load_img(img_path, target_size=(224, 224)) 198 | x = image.img_to_array(img) 199 | x = np.expand_dims(x, axis=0) 200 | x = preprocess_input(x) 201 | print('Input image shape:', x.shape) 202 | 203 | preds = model.predict(x) 204 | print('Predicted:', decode_predictions(preds)) 205 | -------------------------------------------------------------------------------- /vgg19.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | '''VGG19 model for Keras. 3 | 4 | # Reference: 5 | 6 | - [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556) 7 | 8 | ''' 9 | from __future__ import print_function 10 | 11 | import numpy as np 12 | import warnings 13 | 14 | from keras.models import Model 15 | from keras.layers import Flatten, Dense, Input 16 | from keras.layers import Conv2D 17 | from keras.layers import MaxPooling2D 18 | from keras.layers import GlobalMaxPooling2D 19 | from keras.layers import GlobalAveragePooling2D 20 | from keras.preprocessing import image 21 | from keras.utils import layer_utils 22 | from keras.utils.data_utils import get_file 23 | from keras import backend as K 24 | from keras.applications.imagenet_utils import decode_predictions 25 | from keras.applications.imagenet_utils import preprocess_input 26 | from keras.applications.imagenet_utils import _obtain_input_shape 27 | from keras.engine.topology import get_source_inputs 28 | 29 | 30 | WEIGHTS_PATH = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg19_weights_tf_dim_ordering_tf_kernels.h5' 31 | WEIGHTS_PATH_NO_TOP = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5' 32 | 33 | 34 | def VGG19(include_top=True, weights='imagenet', 35 | input_tensor=None, input_shape=None, 36 | pooling=None, 37 | classes=1000): 38 | """Instantiates the VGG19 architecture. 39 | 40 | Optionally loads weights pre-trained 41 | on ImageNet. 
Note that when using TensorFlow,
42 |     for best performance you should set
43 |     `image_data_format="channels_last"` in your Keras config
44 |     at ~/.keras/keras.json.
45 | 
46 |     The model and the weights are compatible with both
47 |     TensorFlow and Theano. The data format
48 |     convention used by the model is the one
49 |     specified in your Keras config file.
50 | 
51 |     # Arguments
52 |         include_top: whether to include the 3 fully-connected
53 |             layers at the top of the network.
54 |         weights: one of `None` (random initialization)
55 |             or "imagenet" (pre-training on ImageNet).
56 |         input_tensor: optional Keras tensor (i.e. output of `layers.Input()`)
57 |             to use as image input for the model.
58 |         input_shape: optional shape tuple, only to be specified
59 |             if `include_top` is False; otherwise the input shape
60 |             has to be `(224, 224, 3)` (with `channels_last` data format)
61 |             or `(3, 224, 224)` (with `channels_first` data format).
62 |             It should have exactly 3 input channels,
63 |             and width and height should be no smaller than 48.
64 |             E.g. `(200, 200, 3)` would be one valid value.
65 |         pooling: Optional pooling mode for feature extraction
66 |             when `include_top` is `False`.
67 |             - `None` means that the output of the model will be
68 |                 the 4D tensor output of the
69 |                 last convolutional layer.
70 |             - `avg` means that global average pooling
71 |                 will be applied to the output of the
72 |                 last convolutional layer, and thus
73 |                 the output of the model will be a 2D tensor.
74 |             - `max` means that global max pooling will
75 |                 be applied.
76 |         classes: optional number of classes to classify images
77 |             into, only to be specified if `include_top` is True, and
78 |             if no `weights` argument is specified.
79 | 
80 |     # Returns
81 |         A Keras model instance.
82 | 
83 |     # Raises
84 |         ValueError: in case of invalid argument for `weights`,
85 |             or invalid input shape.
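
    # Example
        A minimal training-from-scratch sketch (the class count is
        illustrative); `classes` may differ from 1000 only when
        `weights=None`:
            model = VGG19(include_top=True, weights=None, classes=10)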
86 | """ 87 | if weights not in {'imagenet', None}: 88 | raise ValueError('The `weights` argument should be either ' 89 | '`None` (random initialization) or `imagenet` ' 90 | '(pre-training on ImageNet).') 91 | 92 | if weights == 'imagenet' and include_top and classes != 1000: 93 | raise ValueError('If using `weights` as imagenet with `include_top`' 94 | ' as true, `classes` should be 1000') 95 | # Determine proper input shape 96 | input_shape = _obtain_input_shape(input_shape, 97 | default_size=224, 98 | min_size=48, 99 | data_format=K.image_data_format(), 100 | include_top=include_top) 101 | 102 | if input_tensor is None: 103 | img_input = Input(shape=input_shape) 104 | else: 105 | if not K.is_keras_tensor(input_tensor): 106 | img_input = Input(tensor=input_tensor, shape=input_shape) 107 | else: 108 | img_input = input_tensor 109 | # Block 1 110 | x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(img_input) 111 | x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x) 112 | x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x) 113 | 114 | # Block 2 115 | x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1')(x) 116 | x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2')(x) 117 | x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x) 118 | 119 | # Block 3 120 | x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1')(x) 121 | x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2')(x) 122 | x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3')(x) 123 | x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv4')(x) 124 | x = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x) 125 | 126 | # Block 4 127 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1')(x) 128 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2')(x) 129 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3')(x) 130 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv4')(x) 131 | x = MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x) 132 | 133 | # Block 5 134 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1')(x) 135 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2')(x) 136 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3')(x) 137 | x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv4')(x) 138 | x = MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x) 139 | 140 | if include_top: 141 | # Classification block 142 | x = Flatten(name='flatten')(x) 143 | x = Dense(4096, activation='relu', name='fc1')(x) 144 | x = Dense(4096, activation='relu', name='fc2')(x) 145 | x = Dense(classes, activation='softmax', name='predictions')(x) 146 | else: 147 | if pooling == 'avg': 148 | x = GlobalAveragePooling2D()(x) 149 | elif pooling == 'max': 150 | x = GlobalMaxPooling2D()(x) 151 | 152 | # Ensure that the model takes into account 153 | # any potential predecessors of `input_tensor`. 154 | if input_tensor is not None: 155 | inputs = get_source_inputs(input_tensor) 156 | else: 157 | inputs = img_input 158 | # Create model. 
159 |     model = Model(inputs, x, name='vgg19')
160 | 
161 |     # load weights
162 |     if weights == 'imagenet':
163 |         if include_top:
164 |             weights_path = get_file('vgg19_weights_tf_dim_ordering_tf_kernels.h5',
165 |                                     WEIGHTS_PATH,
166 |                                     cache_subdir='models')
167 |         else:
168 |             weights_path = get_file('vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5',
169 |                                     WEIGHTS_PATH_NO_TOP,
170 |                                     cache_subdir='models')
171 |         model.load_weights(weights_path)
172 |         if K.backend() == 'theano':
173 |             layer_utils.convert_all_kernels_in_model(model)
174 | 
175 |         if K.image_data_format() == 'channels_first':
176 |             if include_top:
177 |                 maxpool = model.get_layer(name='block5_pool')
178 |                 shape = maxpool.output_shape[1:]
179 |                 dense = model.get_layer(name='fc1')
180 |                 layer_utils.convert_dense_weights_data_format(dense, shape, 'channels_first')
181 | 
182 |             if K.backend() == 'tensorflow':
183 |                 warnings.warn('You are using the TensorFlow backend, yet you '
184 |                               'are using the Theano '
185 |                               'image data format convention '
186 |                               '(`image_data_format="channels_first"`). '
187 |                               'For best performance, set '
188 |                               '`image_data_format="channels_last"` in '
189 |                               'your Keras config '
190 |                               'at ~/.keras/keras.json.')
191 |     return model
192 | 
193 | 
194 | if __name__ == '__main__':
195 |     model = VGG19(include_top=True, weights='imagenet')
196 | 
197 |     img_path = 'cat.jpg'
198 |     img = image.load_img(img_path, target_size=(224, 224))
199 |     x = image.img_to_array(img)
200 |     x = np.expand_dims(x, axis=0)
201 |     x = preprocess_input(x)
202 |     print('Input image shape:', x.shape)
203 | 
204 |     preds = model.predict(x)
205 |     print('Predicted:', decode_predictions(preds))
206 | 
--------------------------------------------------------------------------------
/xception.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | '''Xception V1 model for Keras.
3 | 
4 | On ImageNet, this model gets to a top-1 validation accuracy of 0.790,
5 | and a top-5 validation accuracy of 0.945.
6 | 
7 | Do note that the input image format for this model is different from
8 | that of the VGG16 and ResNet models (299x299 instead of 224x224),
9 | and that the input preprocessing function
10 | is also different (same as Inception V3).
11 | 
12 | Also do note that this model is only available for the TensorFlow backend,
13 | due to its reliance on `SeparableConvolution` layers.
14 | 
15 | # Reference:
16 | 
17 | - [Xception: Deep Learning with Depthwise Separable Convolutions](https://arxiv.org/abs/1610.02357)
18 | 
19 | '''
20 | from __future__ import print_function
21 | from __future__ import absolute_import
22 | 
23 | import warnings
24 | import numpy as np
25 | 
26 | from keras.preprocessing import image
27 | 
28 | from keras.models import Model
29 | from keras import layers
30 | from keras.layers import Dense
31 | from keras.layers import Input
32 | from keras.layers import BatchNormalization
33 | from keras.layers import Activation
34 | from keras.layers import Conv2D
35 | from keras.layers import SeparableConv2D
36 | from keras.layers import MaxPooling2D
37 | from keras.layers import GlobalAveragePooling2D
38 | from keras.layers import GlobalMaxPooling2D
39 | from keras.engine.topology import get_source_inputs
40 | from keras.utils.data_utils import get_file
41 | from keras import backend as K
42 | from keras.applications.imagenet_utils import decode_predictions
43 | from keras.applications.imagenet_utils import _obtain_input_shape
44 | 
45 | 
46 | TF_WEIGHTS_PATH = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.4/xception_weights_tf_dim_ordering_tf_kernels.h5'
47 | TF_WEIGHTS_PATH_NO_TOP = 'https://github.com/fchollet/deep-learning-models/releases/download/v0.4/xception_weights_tf_dim_ordering_tf_kernels_notop.h5'
48 | 
49 | 
50 | def Xception(include_top=True, weights='imagenet',
51 |              input_tensor=None, input_shape=None,
52 |              pooling=None,
53 |              classes=1000):
54 |     """Instantiates the Xception architecture.
55 | 
56 |     Optionally loads weights pre-trained
57 |     on ImageNet. This model is available for TensorFlow only,
58 |     and can only be used with inputs following the TensorFlow
59 |     data format `(width, height, channels)`.
60 |     You should set `image_data_format="channels_last"` in your Keras config
61 |     located at ~/.keras/keras.json.
62 | 
63 |     Note that the default input image size for this model is 299x299.
64 | 
65 |     # Arguments
66 |         include_top: whether to include the fully-connected
67 |             layer at the top of the network.
68 |         weights: one of `None` (random initialization)
69 |             or "imagenet" (pre-training on ImageNet).
70 |         input_tensor: optional Keras tensor (i.e. output of `layers.Input()`)
71 |             to use as image input for the model.
72 |         input_shape: optional shape tuple, only to be specified
73 |             if `include_top` is False; otherwise the input shape
74 |             has to be `(299, 299, 3)`.
75 |             It should have exactly 3 input channels,
76 |             and width and height should be no smaller than 71.
77 |             E.g. `(150, 150, 3)` would be one valid value.
78 |         pooling: Optional pooling mode for feature extraction
79 |             when `include_top` is `False`.
80 |             - `None` means that the output of the model will be
81 |                 the 4D tensor output of the
82 |                 last convolutional layer.
83 |             - `avg` means that global average pooling
84 |                 will be applied to the output of the
85 |                 last convolutional layer, and thus
86 |                 the output of the model will be a 2D tensor.
87 |             - `max` means that global max pooling will
88 |                 be applied.
89 |         classes: optional number of classes to classify images
90 |             into, only to be specified if `include_top` is True, and
91 |             if no `weights` argument is specified.
92 | 
93 |     # Returns
94 |         A Keras model instance.
95 | 
96 |     # Raises
97 |         ValueError: in case of invalid argument for `weights`,
98 |             or invalid input shape.
99 |         RuntimeError: If attempting to run this model with a
100 |             backend that does not support separable convolutions.
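
    # Example
        A minimal feature-extraction sketch (the input size is
        illustrative; anything no smaller than 71x71 works without
        the top):
            model = Xception(include_top=False, weights='imagenet',
                             input_shape=(150, 150, 3), pooling='avg')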
101 | """ 102 | if weights not in {'imagenet', None}: 103 | raise ValueError('The `weights` argument should be either ' 104 | '`None` (random initialization) or `imagenet` ' 105 | '(pre-training on ImageNet).') 106 | 107 | if weights == 'imagenet' and include_top and classes != 1000: 108 | raise ValueError('If using `weights` as imagenet with `include_top`' 109 | ' as true, `classes` should be 1000') 110 | 111 | if K.backend() != 'tensorflow': 112 | raise RuntimeError('The Xception model is only available with ' 113 | 'the TensorFlow backend.') 114 | if K.image_data_format() != 'channels_last': 115 | warnings.warn('The Xception model is only available for the ' 116 | 'input data format "channels_last" ' 117 | '(width, height, channels). ' 118 | 'However your settings specify the default ' 119 | 'data format "channels_first" (channels, width, height). ' 120 | 'You should set `image_data_format="channels_last"` in your Keras ' 121 | 'config located at ~/.keras/keras.json. ' 122 | 'The model being returned right now will expect inputs ' 123 | 'to follow the "channels_last" data format.') 124 | K.set_image_data_format('channels_last') 125 | old_data_format = 'channels_first' 126 | else: 127 | old_data_format = None 128 | 129 | # Determine proper input shape 130 | input_shape = _obtain_input_shape(input_shape, 131 | default_size=299, 132 | min_size=71, 133 | data_format=K.image_data_format(), 134 | include_top=include_top) 135 | 136 | if input_tensor is None: 137 | img_input = Input(shape=input_shape) 138 | else: 139 | if not K.is_keras_tensor(input_tensor): 140 | img_input = Input(tensor=input_tensor, shape=input_shape) 141 | else: 142 | img_input = input_tensor 143 | 144 | x = Conv2D(32, (3, 3), strides=(2, 2), use_bias=False, name='block1_conv1')(img_input) 145 | x = BatchNormalization(name='block1_conv1_bn')(x) 146 | x = Activation('relu', name='block1_conv1_act')(x) 147 | x = Conv2D(64, (3, 3), use_bias=False, name='block1_conv2')(x) 148 | x = BatchNormalization(name='block1_conv2_bn')(x) 149 | x = Activation('relu', name='block1_conv2_act')(x) 150 | 151 | residual = Conv2D(128, (1, 1), strides=(2, 2), 152 | padding='same', use_bias=False)(x) 153 | residual = BatchNormalization()(residual) 154 | 155 | x = SeparableConv2D(128, (3, 3), padding='same', use_bias=False, name='block2_sepconv1')(x) 156 | x = BatchNormalization(name='block2_sepconv1_bn')(x) 157 | x = Activation('relu', name='block2_sepconv2_act')(x) 158 | x = SeparableConv2D(128, (3, 3), padding='same', use_bias=False, name='block2_sepconv2')(x) 159 | x = BatchNormalization(name='block2_sepconv2_bn')(x) 160 | 161 | x = MaxPooling2D((3, 3), strides=(2, 2), padding='same', name='block2_pool')(x) 162 | x = layers.add([x, residual]) 163 | 164 | residual = Conv2D(256, (1, 1), strides=(2, 2), 165 | padding='same', use_bias=False)(x) 166 | residual = BatchNormalization()(residual) 167 | 168 | x = Activation('relu', name='block3_sepconv1_act')(x) 169 | x = SeparableConv2D(256, (3, 3), padding='same', use_bias=False, name='block3_sepconv1')(x) 170 | x = BatchNormalization(name='block3_sepconv1_bn')(x) 171 | x = Activation('relu', name='block3_sepconv2_act')(x) 172 | x = SeparableConv2D(256, (3, 3), padding='same', use_bias=False, name='block3_sepconv2')(x) 173 | x = BatchNormalization(name='block3_sepconv2_bn')(x) 174 | 175 | x = MaxPooling2D((3, 3), strides=(2, 2), padding='same', name='block3_pool')(x) 176 | x = layers.add([x, residual]) 177 | 178 | residual = Conv2D(728, (1, 1), strides=(2, 2), 179 | padding='same', use_bias=False)(x) 
180 | residual = BatchNormalization()(residual) 181 | 182 | x = Activation('relu', name='block4_sepconv1_act')(x) 183 | x = SeparableConv2D(728, (3, 3), padding='same', use_bias=False, name='block4_sepconv1')(x) 184 | x = BatchNormalization(name='block4_sepconv1_bn')(x) 185 | x = Activation('relu', name='block4_sepconv2_act')(x) 186 | x = SeparableConv2D(728, (3, 3), padding='same', use_bias=False, name='block4_sepconv2')(x) 187 | x = BatchNormalization(name='block4_sepconv2_bn')(x) 188 | 189 | x = MaxPooling2D((3, 3), strides=(2, 2), padding='same', name='block4_pool')(x) 190 | x = layers.add([x, residual]) 191 | 192 | for i in range(8): 193 | residual = x 194 | prefix = 'block' + str(i + 5) 195 | 196 | x = Activation('relu', name=prefix + '_sepconv1_act')(x) 197 | x = SeparableConv2D(728, (3, 3), padding='same', use_bias=False, name=prefix + '_sepconv1')(x) 198 | x = BatchNormalization(name=prefix + '_sepconv1_bn')(x) 199 | x = Activation('relu', name=prefix + '_sepconv2_act')(x) 200 | x = SeparableConv2D(728, (3, 3), padding='same', use_bias=False, name=prefix + '_sepconv2')(x) 201 | x = BatchNormalization(name=prefix + '_sepconv2_bn')(x) 202 | x = Activation('relu', name=prefix + '_sepconv3_act')(x) 203 | x = SeparableConv2D(728, (3, 3), padding='same', use_bias=False, name=prefix + '_sepconv3')(x) 204 | x = BatchNormalization(name=prefix + '_sepconv3_bn')(x) 205 | 206 | x = layers.add([x, residual]) 207 | 208 | residual = Conv2D(1024, (1, 1), strides=(2, 2), 209 | padding='same', use_bias=False)(x) 210 | residual = BatchNormalization()(residual) 211 | 212 | x = Activation('relu', name='block13_sepconv1_act')(x) 213 | x = SeparableConv2D(728, (3, 3), padding='same', use_bias=False, name='block13_sepconv1')(x) 214 | x = BatchNormalization(name='block13_sepconv1_bn')(x) 215 | x = Activation('relu', name='block13_sepconv2_act')(x) 216 | x = SeparableConv2D(1024, (3, 3), padding='same', use_bias=False, name='block13_sepconv2')(x) 217 | x = BatchNormalization(name='block13_sepconv2_bn')(x) 218 | 219 | x = MaxPooling2D((3, 3), strides=(2, 2), padding='same', name='block13_pool')(x) 220 | x = layers.add([x, residual]) 221 | 222 | x = SeparableConv2D(1536, (3, 3), padding='same', use_bias=False, name='block14_sepconv1')(x) 223 | x = BatchNormalization(name='block14_sepconv1_bn')(x) 224 | x = Activation('relu', name='block14_sepconv1_act')(x) 225 | 226 | x = SeparableConv2D(2048, (3, 3), padding='same', use_bias=False, name='block14_sepconv2')(x) 227 | x = BatchNormalization(name='block14_sepconv2_bn')(x) 228 | x = Activation('relu', name='block14_sepconv2_act')(x) 229 | 230 | if include_top: 231 | x = GlobalAveragePooling2D(name='avg_pool')(x) 232 | x = Dense(classes, activation='softmax', name='predictions')(x) 233 | else: 234 | if pooling == 'avg': 235 | x = GlobalAveragePooling2D()(x) 236 | elif pooling == 'max': 237 | x = GlobalMaxPooling2D()(x) 238 | 239 | # Ensure that the model takes into account 240 | # any potential predecessors of `input_tensor`. 241 | if input_tensor is not None: 242 | inputs = get_source_inputs(input_tensor) 243 | else: 244 | inputs = img_input 245 | # Create model. 
246 | model = Model(inputs, x, name='xception') 247 | 248 | # load weights 249 | if weights == 'imagenet': 250 | if include_top: 251 | weights_path = get_file('xception_weights_tf_dim_ordering_tf_kernels.h5', 252 | TF_WEIGHTS_PATH, 253 | cache_subdir='models') 254 | else: 255 | weights_path = get_file('xception_weights_tf_dim_ordering_tf_kernels_notop.h5', 256 | TF_WEIGHTS_PATH_NO_TOP, 257 | cache_subdir='models') 258 | model.load_weights(weights_path) 259 | 260 | if old_data_format: 261 | K.set_image_data_format(old_data_format) 262 | return model 263 | 264 | 265 | def preprocess_input(x): 266 | x /= 255. 267 | x -= 0.5 268 | x *= 2. 269 | return x 270 | 271 | 272 | if __name__ == '__main__': 273 | model = Xception(include_top=True, weights='imagenet') 274 | 275 | img_path = 'elephant.jpg' 276 | img = image.load_img(img_path, target_size=(299, 299)) 277 | x = image.img_to_array(img) 278 | x = np.expand_dims(x, axis=0) 279 | x = preprocess_input(x) 280 | print('Input image shape:', x.shape) 281 | 282 | preds = model.predict(x) 283 | print(np.argmax(preds)) 284 | print('Predicted:', decode_predictions(preds, 1)) 285 | --------------------------------------------------------------------------------
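A note on the `preprocess_input` defined in `xception.py` above: its three in-place operations map pixel values from `[0, 255]` to `[-1, 1]`, which is equivalent to computing `x / 127.5 - 1` (the input array must already be float, as returned by `image.img_to_array`). A minimal sketch, with illustrative values:

```python
import numpy as np

x = np.array([0., 127.5, 255.])
y = (x / 255. - 0.5) * 2.   # same result as preprocess_input, or x / 127.5 - 1
print(y)  # [-1.  0.  1.]
```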