├── val_error.png
├── val_error_zoom.png
├── cifar10_eval.py
├── cifar10_msra.py
└── README.md


/val_error.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/apark263/cfmz/HEAD/val_error.png


--------------------------------------------------------------------------------
/val_error_zoom.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/apark263/cfmz/HEAD/val_error_zoom.png


--------------------------------------------------------------------------------
/cifar10_eval.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# ----------------------------------------------------------------------------
# Copyright 2016 Nervana Systems Inc.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ----------------------------------------------------------------------------
from neon.util.argparser import NeonArgparser
from neon.util.persist import load_obj
from neon.transforms import Misclassification
from neon.models import Model
from neon.data import ImageLoader

# parse the command line arguments (generates the backend)
parser = NeonArgparser(__doc__)
args = parser.parse_args()

# setup data provider: validation set, 32x32 crops of the 40x40 padded images,
# no random transforms
test_set = ImageLoader(set_name='validation', repo_dir=args.data_dir,
                       inner_size=32, scale_range=40, do_transforms=False)

# rebuild the model from the serialized file passed via --model_file
model = Model(load_obj(args.model_file), test_set)

# report top-1 misclassification error as a percentage
# (print() call form works under both Python 2 and Python 3)
print('Error = %.1f%%' % (model.eval(test_set, metric=Misclassification()) * 100))


--------------------------------------------------------------------------------
/cifar10_msra.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
# ----------------------------------------------------------------------------
# Copyright 2016 Nervana Systems Inc.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ----------------------------------------------------------------------------
from neon.util.argparser import NeonArgparser
from neon.initializers import Kaiming, IdentityInit
from neon.layers import Conv, Pooling, GeneralizedCost, Affine, Activation
from neon.layers import MergeSum, SkipNode
from neon.optimizers import GradientDescentMomentum, Schedule
from neon.transforms import Rectlin, Softmax, CrossEntropyMulti, Misclassification
from neon.models import Model
from neon.data import ImageLoader
from neon.callbacks.callbacks import Callbacks

# parse the command line arguments (generates the backend)
parser = NeonArgparser(__doc__)
parser.add_argument('--depth', type=int, default=9,
                    help='depth of each stage (network depth will be 6n+2)')
args = parser.parse_args()

# setup data providers: 32x32 crops of the 40x40 padded images; random crops,
# flips, and shuffling are applied to the training set only
imgset_options = dict(inner_size=32, scale_range=40, repo_dir=args.data_dir)
train = ImageLoader(set_name='train', shuffle=True, do_transforms=True, **imgset_options)
test = ImageLoader(set_name='validation', shuffle=False, do_transforms=False, **imgset_options)


def conv_params(fsize, nfm, stride=1, relu=True):
    # fsize x fsize convolution with nfm output feature maps and batch norm;
    # padding of 1 keeps the spatial size of 3x3 convolutions at stride 1
    return dict(fshape=(fsize, fsize, nfm), strides=stride, padding=(1 if fsize > 1 else 0),
                activation=(Rectlin() if relu else None),
                init=Kaiming(local=True),
                batch_norm=True)


def id_params(nfm):
    # 1x1, stride-2 convolution initialized to identity, used on the shortcut
    # path whenever the main path downsamples
    return dict(fshape=(1, 1, nfm), strides=2, padding=0, activation=None, init=IdentityInit())


def module_factory(nfm, stride=1):
    # residual module: two 3x3 convolutions on the main path, summed with the
    # shortcut path, followed by a ReLU
    mainpath = [Conv(**conv_params(3, nfm, stride=stride)),
                Conv(**conv_params(3, nfm, relu=False))]
    sidepath = [SkipNode() if stride == 1 else Conv(**id_params(nfm))]
    module = [MergeSum([mainpath, sidepath]),
              Activation(Rectlin())]
    return module


# Structure of the deep residual part of the network:
# args.depth modules of 2 convolutional layers each at feature map depths of 16, 32, 64
# (list(range(3)) keeps the list repetition valid under Python 3 as well as Python 2)
nfms = [2**(stage + 4) for stage in sorted(list(range(3)) * args.depth)]
# stride 2 on the first module of each new stage, where the feature map depth doubles
strides = [1] + [1 if cur == prev else 2 for cur, prev in zip(nfms[1:], nfms[:-1])]

# Now construct the network
layers = [Conv(**conv_params(3, 16))]
for nfm, stride in zip(nfms, strides):
    layers.append(module_factory(nfm, stride))
layers.append(Pooling(8, op='avg'))  # average pooling over the final 8x8 feature maps
layers.append(Affine(nout=10, init=Kaiming(local=False), batch_norm=True, activation=Softmax()))

model = Model(layers=layers)
# SGD with momentum 0.9 and weight decay 1e-4; the learning rate starts at 0.1
# and is multiplied by 0.1 at epochs 90 and 135
opt = GradientDescentMomentum(0.1, 0.9, wdecay=0.0001,
                              schedule=Schedule([90, 135], 0.1))

# configure callbacks
callbacks = Callbacks(model, eval_set=test, metric=Misclassification(), **args.callback_args)
cost = GeneralizedCost(costfunc=CrossEntropyMulti())

model.fit(train, optimizer=opt, num_epochs=args.epochs, cost=cost, callbacks=callbacks)


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Model
This is an implementation of the deep residual network for CIFAR-10 described in [He et al.,
"Deep Residual Learning for Image Recognition"](http://arxiv.org/abs/1512.03385).  The model is
a very deep network with skip connections, so that each pair of convolutional layers learns a
residual that is added back to its input.  The training protocol uses minimal pre-processing (mean
subtraction) and very simple data augmentation (shuffling, flipping, and cropping).  All model
parameters (even batch norm parameters) are updated using stochastic gradient descent with
momentum and weight decay.  The learning rate is divided by 10 only twice (at epochs 90 and 135).
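The step schedule (configured in `cifar10_msra.py` as a base rate of 0.1 with `Schedule([90, 135], 0.1)`) can be sketched as a plain function. This is an illustrative reimplementation, not neon's `Schedule` class; `step_lr` is a hypothetical name, and it assumes 0-indexed epochs with each drop taking effect at the start of its milestone epoch, which may differ by one epoch from neon's own convention.

```python
def step_lr(epoch, base_lr=0.1, milestones=(90, 135), gamma=0.1):
    """Learning rate used during `epoch` (assumed 0-indexed).

    The rate starts at base_lr and is multiplied by gamma once for every
    milestone that has been reached: 0.1, then 0.01 from epoch 90,
    then 0.001 from epoch 135.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


for epoch in (0, 89, 90, 134, 135, 179):
    print(epoch, step_lr(epoch))
```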

### Acknowledgments
Many thanks to Dr. He and his team at MSRA for their helpful input in replicating the model as
described in their paper.

### Model script
The model training script ([cifar10_msra.py](./cifar10_msra.py)) and a simpler evaluation script
([cifar10_eval.py](./cifar10_eval.py)) are included.

### Trained weights
The trained weights files can be downloaded from AWS:

| Model Depth | Model File |
| ----------- | ---------- |
|  20 | [cifar10_msra_020_e180.p](https://s3-us-west-1.amazonaws.com/nervana-modelzoo/cifar10_msra_e180.p) |
|  32 | [cifar10_msra_032_e180.p](https://s3-us-west-1.amazonaws.com/nervana-modelzoo/cifar10_msra_e180.p) |
|  56 | [cifar10_msra_056_e180.p](https://s3-us-west-1.amazonaws.com/nervana-modelzoo/cifar10_msra_e180.p) |
| 110 | [cifar10_msra_110_e180.p](https://s3-us-west-1.amazonaws.com/nervana-modelzoo/cifar10_msra_e180.p) |

### Performance
Training this model with the options described below should achieve above 93.6% top-1 accuracy
for the 56- and 110-layer configurations (see the benchmark table below), using only mean
subtraction, random cropping, and random flips.

## Instructions
This script was tested with [neon version 1.2.1](https://github.com/NervanaSystems/neon/tree/v1.2.1).
Make sure that your local repo is synced to this release (commit SHA
`c460e6c12cc4ea6e7453c0335afadf1f5110a4f7`) and run the [installation
procedure](http://neon.nervanasys.com/docs/latest/user_guide.html#installation) before proceeding.

In addition, the script requires a neon branch that implements the `MergeSum` layer type.

This example uses the `ImageLoader` module to load the images for consumption while applying random
cropping, flipping, and shuffling.  Before training, you need to write out the padded
CIFAR-10 images into a macrobatch repository.  From your top-level neon directory, run:

```
neon/data/batch_writer.py \
    --set_type cifar10 \
    --data_dir <path-to-save-batches> \
    --macro_size 10000 \
    --target_size 40
```

Note that it is good practice to choose a `data_dir` that is local to your machine in order to
avoid having the `ImageLoader` module perform reads over the network.
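To make the cropping and flipping concrete, here is a minimal NumPy sketch of the kind of augmentation applied to each padded 40x40 training image: a random 32x32 crop followed by a horizontal flip with probability 0.5. It is an illustrative stand-in, not `ImageLoader`'s actual implementation; `random_crop_flip` is a hypothetical helper, and the channels-first `(channels, height, width)` layout is an assumption.

```python
import numpy as np


def random_crop_flip(img, inner_size=32, rng=None):
    """Random inner_size x inner_size crop plus a random horizontal flip.

    img is assumed to be a channels-first array (channels, height, width),
    e.g. a CIFAR-10 image padded to 40x40 when the macrobatches were written.
    """
    rng = np.random.default_rng() if rng is None else rng
    _, height, width = img.shape
    # choose the top-left corner uniformly among all valid crop positions
    top = rng.integers(0, height - inner_size + 1)
    left = rng.integers(0, width - inner_size + 1)
    patch = img[:, top:top + inner_size, left:left + inner_size]
    # mirror along the width axis with probability 0.5
    if rng.random() < 0.5:
        patch = patch[:, :, ::-1]
    return patch


rng = np.random.default_rng(0)
padded = np.zeros((3, 40, 40), dtype=np.float32)
print(random_crop_flip(padded, rng=rng).shape)  # (3, 32, 32)
```

With a 40x40 input there are 9 x 9 = 81 possible crop positions, each of which may also be mirrored, so the network rarely sees exactly the same 32x32 view twice.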

Once the batches have been written out, you may initiate training:
```
cifar10_msra.py -r 0 -vv \
    --log <logfile> \
    --epochs 180 \
    --save_path <model-save-path> \
    --eval_freq 1 \
    --backend gpu \
    --data_dir <path-to-saved-batches> \
    --depth <n>
```

The depth argument is the `n` value discussed in the paper, which represents the number of repeated
residual modules at each filter depth.  The network has 3 stages (at filter depths of 16, 32, and
64), each containing `n` residual modules of 2 convolutional layers, so there are `6n` convolutional
layers in the residual part of the network.  Adding the input convolutional layer and the output
linear layer makes the total network `6n+2` layers deep.  Depth arguments of 3, 5, 9, and 18 give
network depths of 20, 32, 56, and 110.  If `--depth` is omitted, the script uses `n = 9` (56 layers).
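The layer arithmetic can be checked without neon.  The sketch below rebuilds the per-module feature map depths and strides the same way `cifar10_msra.py` does, and evaluates `6n+2` for the depth arguments listed above; `stage_layout` and `network_depth` are hypothetical helper names used only for illustration.

```python
def stage_layout(n):
    """Feature map depth and stride for each of the 3n residual modules.

    Mirrors the list construction in cifar10_msra.py: n modules at each of
    16, 32, and 64 feature maps, with stride 2 on the first module of a
    new stage (where the depth doubles).
    """
    nfms = [2 ** (stage + 4) for stage in range(3) for _ in range(n)]
    strides = [1] + [1 if cur == prev else 2
                     for cur, prev in zip(nfms[1:], nfms[:-1])]
    return nfms, strides


def network_depth(n):
    # 3 stages x n modules x 2 conv layers, plus the input conv and output affine
    return 6 * n + 2


for n in (3, 5, 9, 18):
    print(n, network_depth(n))
```

The two stride-2 modules are what shrink the 32x32 input to the 8x8 maps that the final average pooling layer consumes.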

If you just want to run evaluation, you can use the much simpler script that loads the serialized
model and evaluates it on the validation set:

```
cifar10_eval.py -vv --model_file <model-save-path> --data_dir <path-to-saved-batches>
```
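The number printed by `cifar10_eval.py` is the top-1 misclassification error: the percentage of validation images whose highest-scoring class differs from the label.  A minimal NumPy sketch of that computation, assuming softmax outputs stored one row per example (`misclassification_pct` is a hypothetical helper, not neon's `Misclassification` transform):

```python
import numpy as np


def misclassification_pct(probs, labels):
    """Top-1 misclassification error, in percent.

    probs:  (num_examples, num_classes) array of softmax outputs
    labels: (num_examples,) array of integer class indices
    """
    predictions = np.argmax(probs, axis=1)  # highest-scoring class per example
    return 100.0 * np.mean(predictions != labels)


probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.5, 0.4, 0.1]])
labels = np.array([0, 2, 1])
print('Error = %.1f%%' % misclassification_pct(probs, labels))  # Error = 33.3%
```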

## Benchmarks
Machine and GPU specs:
```
Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Ubuntu 14.04.2 LTS
GPU: GeForce GTX TITAN X
CUDA Driver Version 7.0
```

The memory usage and per-epoch training time of each network configuration, along with the final
validation error, are shown in the table below.  We observed that the error rates were consistently
lower than those reported in the original paper.  We hypothesize that this may be due to the final
batch norm transformation we include at the output affine layer.

| Model Depth  | GPU Memory Footprint | Seconds per Epoch | Validation Error % |
| ------------ | -------------------- | ----------------- | ------------------ |
|  20 |  521 MiB | 11 | 8.29 |
|  32 |  636 MiB | 18 | 7.26 |
|  56 |  860 MiB | 30 | 6.31 |
| 110 | 1277 MiB | 60 | 6.00 |

The total time to train the 56-layer network for 180 epochs was about 90 minutes on the machine
and GPU described above.
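The same back-of-the-envelope arithmetic extends to the other depths in the benchmark table: total time is roughly seconds per epoch times 180 epochs, and top-1 accuracy is 100 minus the validation error.  This sketch only multiplies the reported numbers; it does not account for any overhead not captured in the per-epoch figure, and the helper names are hypothetical.

```python
# Per-depth measurements copied from the benchmark table above
# (GeForce GTX TITAN X, 180 training epochs).
SECONDS_PER_EPOCH = {20: 11, 32: 18, 56: 30, 110: 60}
VALIDATION_ERROR_PCT = {20: 8.29, 32: 7.26, 56: 6.31, 110: 6.00}
NUM_EPOCHS = 180


def total_training_minutes(depth, epochs=NUM_EPOCHS):
    """Estimated training time: seconds per epoch times number of epochs."""
    return SECONDS_PER_EPOCH[depth] * epochs / 60.0


def top1_accuracy_pct(depth):
    """Top-1 accuracy implied by the reported validation error."""
    return 100.0 - VALIDATION_ERROR_PCT[depth]


for depth in sorted(SECONDS_PER_EPOCH):
    print("depth %3d: ~%5.1f min, %.2f%% top-1 accuracy"
          % (depth, total_training_minutes(depth), top1_accuracy_pct(depth)))
```

For the 56-layer network this gives 30 s per epoch times 180 epochs, or 5400 s (90 minutes), consistent with the figure quoted above.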

The evolution of the validation misclassification error for the various network depths can be seen
in the figures below.

![validation error](./val_error.png)

![validation error zoom](./val_error_zoom.png)



--------------------------------------------------------------------------------