├── val_error.png
├── val_error_zoom.png
├── cifar10_eval.py
├── cifar10_msra.py
└── README.md

/val_error.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/apark263/cfmz/HEAD/val_error.png

--------------------------------------------------------------------------------
/val_error_zoom.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/apark263/cfmz/HEAD/val_error_zoom.png

--------------------------------------------------------------------------------
/cifar10_eval.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# ----------------------------------------------------------------------------
# Copyright 2016 Nervana Systems Inc.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ----------------------------------------------------------------------------
from neon.util.argparser import NeonArgparser
from neon.util.persist import load_obj
from neon.transforms import Misclassification
from neon.models import Model
from neon.data import ImageLoader

# parse the command line arguments (generates the backend)
parser = NeonArgparser(__doc__)
args = parser.parse_args()

# setup data provider
test_set = ImageLoader(set_name='validation', repo_dir=args.data_dir,
                       inner_size=32, scale_range=40, do_transforms=False)

# load the serialized model and report the misclassification rate as a percentage
model = Model(load_obj(args.model_file), test_set)
print('Error = %.1f%%' % (model.eval(test_set, metric=Misclassification())*100))

--------------------------------------------------------------------------------
/cifar10_msra.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# ----------------------------------------------------------------------------
# Copyright 2016 Nervana Systems Inc.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ----------------------------------------------------------------------------
from neon.util.argparser import NeonArgparser
from neon.initializers import Kaiming, IdentityInit
from neon.layers import Conv, Pooling, GeneralizedCost, Affine, Activation
from neon.layers import MergeSum, SkipNode
from neon.optimizers import GradientDescentMomentum, Schedule
from neon.transforms import Rectlin, Softmax, CrossEntropyMulti, Misclassification
from neon.models import Model
from neon.data import ImageLoader
from neon.callbacks.callbacks import Callbacks

# parse the command line arguments (generates the backend)
parser = NeonArgparser(__doc__)
parser.add_argument('--depth', type=int, default=9,
                    help='depth of each stage (network depth will be 6n+2)')
args = parser.parse_args()

# setup data provider
imgset_options = dict(inner_size=32, scale_range=40, repo_dir=args.data_dir)
train = ImageLoader(set_name='train', shuffle=True, do_transforms=True, **imgset_options)
test = ImageLoader(set_name='validation', shuffle=False, do_transforms=False, **imgset_options)


def conv_params(fsize, nfm, stride=1, relu=True):
    # keyword arguments for a batch-normalized convolution; filters larger than 1x1 get padding 1
    return dict(fshape=(fsize, fsize, nfm), strides=stride, padding=(1 if fsize > 1 else 0),
                activation=(Rectlin() if relu else None),
                init=Kaiming(local=True),
                batch_norm=True)


def id_params(nfm):
    # 1x1, stride-2 convolution with identity init, used on the side path when the main path strides
    return dict(fshape=(1, 1, nfm), strides=2, padding=0, activation=None, init=IdentityInit())


def module_factory(nfm, stride=1):
    # residual module: two 3x3 convolutions summed with the side path, followed by a ReLU
    mainpath = [Conv(**conv_params(3, nfm, stride=stride)),
                Conv(**conv_params(3, nfm, relu=False))]
    sidepath = [SkipNode() if stride == 1 else Conv(**id_params(nfm))]
    module = [MergeSum([mainpath, sidepath]),
              Activation(Rectlin())]
    return module

# Structure of the deep residual part of the network:
# args.depth modules of 2 convolutional
# layers each at feature map depths of 16, 32, 64
nfms = [2**(stage + 4) for stage in sorted(list(range(3)) * args.depth)]
strides = [1] + [1 if cur == prev else 2 for cur, prev in zip(nfms[1:], nfms[:-1])]

# Now construct the network
layers = [Conv(**conv_params(3, 16))]
for nfm, stride in zip(nfms, strides):
    layers.append(module_factory(nfm, stride))
layers.append(Pooling(8, op='avg'))
layers.append(Affine(nout=10, init=Kaiming(local=False), batch_norm=True, activation=Softmax()))

model = Model(layers=layers)
opt = GradientDescentMomentum(0.1, 0.9, wdecay=0.0001,
                              schedule=Schedule([90, 135], 0.1))

# configure callbacks
callbacks = Callbacks(model, eval_set=test, metric=Misclassification(), **args.callback_args)
cost = GeneralizedCost(costfunc=CrossEntropyMulti())

model.fit(train, optimizer=opt, num_epochs=args.epochs, cost=cost, callbacks=callbacks)

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Model
This is an implementation of the deep residual network used for CIFAR-10 as described in [He et al.,
"Deep Residual Learning for Image Recognition"](http://arxiv.org/abs/1512.03385). The model is
structured as a very deep network with skip connections, so that the convolutional parameters
adjust to residual activations. The training protocol uses minimal pre-processing (mean
subtraction) and very simple data augmentation (shuffling, flipping, and cropping). All model
parameters (even batch norm parameters) are updated using stochastic gradient descent with
momentum (0.9) and weight decay (0.0001). The learning rate starts at 0.1 and is dropped only
twice, by a factor of 10 each time (at 90 and 135 epochs).

### Acknowledgments
Many thanks to Dr. He and his team at MSRA for their helpful input in replicating the model as
described in their paper.

### Model script
The model training script is included ([cifar10_msra.py](./cifar10_msra.py)).

### Trained weights
The trained weights files can be downloaded from AWS:

| Model Depth | Model File |
| ----------- | ---------- |
| 20 | [cifar10_msra_020_e180.p](https://s3-us-west-1.amazonaws.com/nervana-modelzoo/cifar10_msra_e180.p) |
| 32 | [cifar10_msra_032_e180.p](https://s3-us-west-1.amazonaws.com/nervana-modelzoo/cifar10_msra_e180.p) |
| 56 | [cifar10_msra_056_e180.p](https://s3-us-west-1.amazonaws.com/nervana-modelzoo/cifar10_msra_e180.p) |
| 110 | [cifar10_msra_110_e180.p](https://s3-us-west-1.amazonaws.com/nervana-modelzoo/cifar10_msra_e180.p) |

### Performance
Training this model with the options described below should achieve above 93.6% top-1
accuracy using only mean subtraction, random cropping, and random flips.

## Instructions
This script was tested with [neon version 1.2.1](https://github.com/NervanaSystems/neon/tree/v1.2.1).
Make sure that your local repo is synced to this commit and run the [installation
procedure](http://neon.nervanasys.com/docs/latest/user_guide.html#installation) before proceeding.
The commit SHA for v1.2.1 is `c460e6c12cc4ea6e7453c0335afadf1f5110a4f7`.

In addition, we use the branch that implements the merge sum layer type.

This example uses the `ImageLoader` module to load the images for consumption while applying random
cropping, flipping, and shuffling. Prior to beginning training, you need to write out the padded
CIFAR-10 images into a macrobatch repository.
From your top-level neon directory, run:

```
neon/data/batch_writer.py \
    --set_type cifar10 \
    --data_dir \
    --macro_size 10000 \
    --target_size 40
```

Note that it is good practice to choose your `data_dir` to be local to your machine in order to
avoid having the `ImageLoader` module perform reads over the network.

Once the batches have been written out, you may initiate training:
```
cifar10_msra.py -r 0 -vv \
    --log \
    --epochs 180 \
    --save_path \
    --eval_freq 1 \
    --backend gpu \
    --data_dir \
    --depth
```

The depth argument is the `n` value discussed in the paper, which represents the number of repeated
residual modules at each filter depth. Since there are 3 stages (one per filter depth), and each
residual module consists of 2 convolutional layers, there will be `6n` total convolutional layers
in the residual part of the network, plus 2 additional layers (input convolutional, and output
linear), making the total network `6n+2` layers deep. For depth arguments of 3, 5, 9, and 18, we get
network depths of 20, 32, 56, and 110.

If you just want to run evaluation, you can use the much simpler script that loads the serialized
model and evaluates it on the validation set:

```
cifar10_eval.py -vv --model_file
```

## Benchmarks
Machine and GPU specs:
```
Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Ubuntu 14.04.2 LTS
GPU: GeForce GTX TITAN X
CUDA Driver Version 7.0
```

The memory usage and per-epoch training time of each network configuration, along with the final
validation error, are shown in the table below. We observed that the error rates were consistently
lower than what was cited in the original paper. Our hypothesis is that this may be due to our
inclusion of a final batch norm transformation at the output affine layer.
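
As a quick check of the `6n+2` arithmetic from the Instructions section, the sketch below rebuilds the feature-map and stride lists the same way `cifar10_msra.py` does, without needing neon. `stage_layout` is a hypothetical helper written for this illustration and is not part of the repository; it uses `list(range(3))` so that it also runs under Python 3.

```python
def stage_layout(n):
    """Return (nfms, strides, total_depth) for n residual modules per stage."""
    # Feature-map counts 16, 32, 64, each repeated n times (mirrors cifar10_msra.py).
    nfms = [2 ** (stage + 4) for stage in sorted(list(range(3)) * n)]
    # Stride 2 wherever the feature-map count changes, stride 1 elsewhere.
    strides = [1] + [1 if cur == prev else 2 for cur, prev in zip(nfms[1:], nfms[:-1])]
    # Two convolutions per module, plus the input convolution and the output linear layer.
    total_depth = 2 * len(nfms) + 2
    return nfms, strides, total_depth


for n in (3, 5, 9, 18):
    nfms, strides, depth = stage_layout(n)
    print('n=%d: %d residual modules, %d strided modules, network depth %d'
          % (n, len(nfms), strides.count(2), depth))
```

For every `n` there are exactly two strided modules, one at each change of filter depth, which is where the script swaps the `SkipNode` side path for a 1x1 convolution.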

| Model Depth  | GPU Memory Footprint | Seconds per Epoch | Validation Error % |
| ------------ | -------------------- | ----------------- | ------------------ |
| 20           | 521 MiB              | 11                | 8.29               |
| 32           | 636 MiB              | 18                | 7.26               |
| 56           | 860 MiB              | 30                | 6.31               |
| 110          | 1277 MiB             | 60                | 6.00               |

The total amount of time to train the 56 layer network for 180 epochs was about 90 minutes with the
described machine and GPU specifications.

The evolution of validation misclassification error for the various layer depths can be seen in the
figures below.

![validation error](./val_error.png)

![validation error zoom](./val_error_zoom.png)

--------------------------------------------------------------------------------
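
The benchmark figures above can be cross-checked with a little arithmetic. The sketch below copies the rows of the benchmark table and derives the approximate training time for 180 epochs and the top-1 accuracy implied by each validation error. The `summarize` helper is hypothetical, and the time estimate assumes that per-epoch time stays constant and ignores evaluation and start-up overhead.

```python
# (model depth, seconds per epoch, validation error %) copied from the benchmark table
BENCHMARKS = [(20, 11, 8.29), (32, 18, 7.26), (56, 30, 6.31), (110, 60, 6.00)]
EPOCHS = 180


def summarize(rows, epochs=EPOCHS):
    """Return (depth, training minutes, top-1 accuracy %) for each benchmark row."""
    summary = []
    for depth, sec_per_epoch, val_err in rows:
        minutes = sec_per_epoch * epochs / 60.0   # seconds per epoch -> total minutes
        accuracy = round(100.0 - val_err, 2)      # accuracy is the complement of the error
        summary.append((depth, minutes, accuracy))
    return summary


for depth, minutes, accuracy in summarize(BENCHMARKS):
    print('depth %3d: about %.0f min for %d epochs, top-1 accuracy %.2f%%'
          % (depth, minutes, EPOCHS, accuracy))
```

For the 56-layer network this gives 30 s x 180 = 5400 s, or 90 minutes, matching the figure quoted above.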