├── 0.6662_imagenet_etinynet_477k-294-best.params
├── README.md
├── etinynet.py
├── test_imagenet.py
└── train_imagenet.py


/0.6662_imagenet_etinynet_477k-294-best.params:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aztc/EtinyNet/d3270389fb09057bbcd9006520e6307ec4984518/0.6662_imagenet_etinynet_477k-294-best.params


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # EtinyNet
 2 | 
 3 | EtinyNet is an extremely tiny CNN backbone for Tiny Machine Learning (TinyML) that aims at executing AI workloads on low-power & low-cost IoT devices with limitted memory, such as Microcontrollor (MCU) , compact Filed Programable Gata Array (FPGA) and small footprint CNNs accelerator. 
 4 | 
 5 | We currently provide two settings of EtinyNet-1.0 and EtinyNet-0.75, which have only 477K and parameters and 360K parameters. The performance of these two models on ImageNet and comparisons with other state-of-the-art lightweight models are shwon below.
 6 | 
 7 | Table 1. Comparison of state-of-the-art small networks over classification accuracy, the model size and MAdds on ImageNet-1000 dataset. “-” mean no reported results available. The input size is 224x224.
 8 | | Model| Params.(M) |  Top-1 Acc. (%)| Top-5 Acc. (%)|
 9 | | ---- | -- |-- |-- |
10 | | MobileNeXt-0.35 | 1.8 |64.7|-|
11 | | MnasNet-A1-0.35 | 1.7 |64.4|85.1|
12 | | MobileNetV2-0.35 | 1.7 |60.3|82.9
13 | | MicroNet-M3 | 1.6 |61.3|82.9|
14 | | MobileNetV3-Small-0.35 | 1.6 |58.0|-|
15 | | ShuffleNetV2-0.5 | 1.4 |61.1|82.6|
16 | | MicroNet-M2 | 1.4 |58.2|80.1|
17 | | MobileNetV2-0.15 | 1.4 |55.1|-|
18 | | MobileNetV1-0.5 | 1.3 |61.7|83.6|
19 | | EfficientNet-B | 1.3 |56.7|79.8|
20 | | **EtinyNet-1.0** | **0.98** |**65.5**|**86.2**|
21 | | **EtinyNet-0.75** | **0.68** |**62.2**|**84.0**|
22 | 
23 | 
24 | 
25 | We deploy the int8 quantized EtinyNet-1.0 on STM32H743 MCU for running object classification and detection. Results are in Tabel 2.
26 | 
27 | Tabel 2. Comparison to MCU designs on ImageNet. EtinyNet obtains the record accuracy of64.7% and 65.8% on STM32F412 and STM32F746.
28 | | Model| STM32F412 |  STM32F746 |
29 | | ---- | -- |-- |
30 | | Rusciet al.  | 60.2% |--|
31 | | MCUNet       | 62.2% |63.5%|
32 | | EtinyNet-1.0 | **64.7%** |**65.8%**|
33 | 
34 | In fact, the EtinyNet can exhibit its powerful performance on the specially designed CNN accelerator Neural Co-processor (NCP). Since EtinyNet comsumes only ~800KB memory (except fully-connected layer), NCP can run it in a single-chip mannar without accessing off-chip memory, saving much energy and latency caused by data transmission. We build a system based on NCP + MCU(STM32L4R9), in which the MCU runs pre-processsing and post-processing while NCP runs EtinyNet. NCP stores weights and feature maps on chip and connects with MCU via SDIO/SPI interface to transmite images and results. The system has a really simple working pipeline as: 1) MCU sends image to NCP, 2) NCP runs EtinyNet, 3) TinyNPU sends results back. With the NCP, we prompt the throughput of entire system to 30fps and reache an extremelly low processing power of 160mW (MCU + NCP)。
35 | 
36 | Here's a video that presents the prototype system.
37 | 
38 | [![EtinyNet](https://i9.ytimg.com/vi/mIZPxtJ-9EY/mq3.jpg?sqp=COju4ZAG&rs=AOn4CLDglN9ujGc3h1syZAd-s9PNYzD9-Q)](https://www.youtube.com/watch?v=mIZPxtJ-9EY)
39 | 
40 | 
41 | We provide the training code for EtinyNet-1.0 (no quantization) as well as the coresponding test code and well-trained parameters, as listed below:
42 | 
43 | 1) train_imagenet.py: training code. The default input size is 224, 300 epoches and  128 batchsize x 8 gpus.
44 | 
45 | 2) etinynet.py: EtinyNet-1.0.
46 | 
47 | 3) 0.6553-imagenet-mobilenet_lite313_477k_nownorm_4433_224-293-best.py: well-trained parameters for EtinyNet-1.0 (no quantization)
48 | 
49 | 4) test_imagenet.py : training code.
50 | 
51 | The MXNet toolbox and '.rec' compressed ImageNet data file were used for training efficiency. Please refer to https://mxnet.incubator.apache.org/versions/1.9.0/ for more detail about '.rec' data.
52 | 
53 | 
54 | 


--------------------------------------------------------------------------------
/etinynet.py:
--------------------------------------------------------------------------------
  1 | from mxnet.gluon import nn
  2 | from mxnet.gluon.nn import BatchNorm
  3 | from mxnet.context import cpu
  4 | from mxnet.gluon.block import HybridBlock
  5 | from gluoncv.nn import ReLU6
  6 | import mxnet as mx
  7 | from gluoncv.model_zoo.ssd import get_ssd
  8 | from gluoncv.model_zoo import get_model
  9 | from gluoncv.nn.coder import MultiPerClassDecoder, NormalizedBoxCenterDecoder
 10 | from gluoncv.model_zoo.ssd.anchor import SSDAnchorGenerator
 11 | from mxnet import autograd
 12 | import numpy as np
 13 | 
 14 | 
 15 | class Clip(HybridBlock):
 16 |     def __init__(self, **kwargs):
 17 |         super(Clip, self).__init__(**kwargs)
 18 |     def hybrid_forward(self, F, x):
 19 |         return F.clip(x, -12, 12, name="clip")
 20 |     
 21 | 
 22 | # pylint: disable= too-many-arguments
 23 | def _add_conv(out, channels=1, kernel=1, stride=1, pad=0, norm=True, in_channels=0, quantized=True,
 24 |               num_group=1, active=True, relu6=False, norm_layer=BatchNorm, norm_kwargs=None):
 25 |     out.add(nn.Conv2D(channels, kernel, stride, pad, groups=num_group, use_bias=False,in_channels=in_channels))
 26 |     if active:
 27 |         out.add(ReLU6() if relu6 else nn.Activation('relu'))
 28 |     if norm:
 29 |         out.add(norm_layer(scale=True, center=True, **({} if norm_kwargs is None else norm_kwargs)))
 30 | 
 31 | 
 32 | class LinearBottleneck(nn.HybridBlock):
 33 |     r"""LinearBottleneck used in MobileNetV2 model from the
 34 |     `"Inverted Residuals and Linear Bottlenecks:
 35 |       Mobile Networks for Classification, Detection and Segmentation"
 36 |     <https://arxiv.org/abs/1801.04381>`_ paper.
 37 | 
 38 |     Parameters
 39 |     ----------
 40 |     in_channels : int
 41 |         Number of input channels.
 42 |     channels : int
 43 |         Number of output channels.
 44 |     t : int
 45 |         Layer expansion ratio.
 46 |     stride : int
 47 |         stride
 48 |     norm_layer : object
 49 |         Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
 50 |         Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
 51 |     norm_kwargs : dict
 52 |         Additional `norm_layer` arguments, for example `num_devices=4`
 53 |         for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
 54 |     """
 55 | 
 56 |     def __init__(self, channels, stride, shortcut,
 57 |                  norm_layer=BatchNorm, norm_kwargs=None, **kwargs):
 58 |         super(LinearBottleneck, self).__init__(**kwargs)
 59 |         self.use_shortcut1 = (stride == 1 and channels[0] == channels[1] and shortcut)
 60 |         self.use_shortcut2 = shortcut
 61 |         with self.name_scope():
 62 |             self.out1 = nn.HybridSequential() # 1x1
 63 |             self.out2 = nn.HybridSequential() # 1x1
 64 |             
 65 |             _add_conv(self.out1,
 66 |                       in_channels=channels[0],
 67 |                       channels=channels[0],
 68 |                       kernel=3,
 69 |                       stride=stride,
 70 |                       pad=1,
 71 |                       num_group=channels[0],
 72 |                       active=False,
 73 |                       relu6=False,
 74 |                       norm_layer=norm_layer, norm_kwargs=norm_kwargs)
 75 |             _add_conv(self.out1,
 76 |                       in_channels=channels[0],
 77 |                       channels=channels[1],
 78 |                       active=True,
 79 |                       relu6=False,
 80 |                       norm_layer=norm_layer, norm_kwargs=norm_kwargs)
 81 |             _add_conv(self.out2,
 82 |                       in_channels=channels[1],
 83 |                       channels=channels[1], 
 84 |                       kernel=3, 
 85 |                       stride=1, 
 86 |                       pad=1,
 87 |                       num_group=channels[1],
 88 |                       active=True, 
 89 |                       relu6=False,
 90 |                       norm_layer=norm_layer, norm_kwargs=norm_kwargs)
 91 |                         
 92 |             
 93 |     def hybrid_forward(self, F, x):    
 94 |         out = self.out1(x)
 95 |         if self.use_shortcut1:        
 96 |             out = F.elemwise_add(out, x)
 97 |         x = out
 98 |         out = self.out2(out)
 99 |         if self.use_shortcut2:
100 |             out = F.elemwise_add(out, x)   
101 |         return out
102 | 
103 | 
104 | 
105 | class Etinynet(nn.HybridBlock):
106 |     r"""MobileNetV2 model from the
107 |     `"Inverted Residuals and Linear Bottlenecks:
108 |       Mobile Networks for Classification, Detection and Segmentation"
109 |     <https://arxiv.org/abs/1801.04381>`_ paper.
110 |     Parameters
111 |     ----------
112 |     multiplier : float, default 1.0
113 |         The width multiplier for controlling the model size. The actual number of channels
114 |         is equal to the original channel size multiplied by this multiplier.
115 |     classes : int, default 1000
116 |         Number of classes for the output layer.
117 |     norm_layer : object
118 |         Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`)
119 |         Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
120 |     norm_kwargs : dict
121 |         Additional `norm_layer` arguments, for example `num_devices=4`
122 |         for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`.
123 |     """
124 |     def __init__(self, multiplier=1.0, classes=1000, norm_layer=BatchNorm, norm_kwargs=None, 
125 |                   ctx=cpu(), root='', pretrained=False, **kwargs):
126 |         super(Etinynet, self).__init__(**kwargs)
127 |         
128 |         with self.name_scope():
129 |             self.features1 = nn.HybridSequential(prefix='features1_')
130 |             self.features2 = nn.HybridSequential(prefix='features2_')
131 |             self.features3 = nn.HybridSequential(prefix='features2_')
132 |             with self.features1.name_scope():
133 |                 _add_conv(self.features1, int(32 * multiplier), kernel=3,
134 |                           stride=(2,2), pad=1, relu6=False,in_channels=3, quantized=True,
135 |                           norm_layer=norm_layer, norm_kwargs=norm_kwargs)
136 |                 
137 |                 self.features1.add(nn.MaxPool2D((2,2)))
138 |                 channels_group = [[32, 32],  [32, 32], [32, 32], [32, 32],
139 |                                   [32, 128], [128, 128], [128, 128], [128, 128]]
140 |                 strides =   [1,1,1,1] + [2,1,1,1] 
141 |                 shortcuts = [0,0,0,0] + [0,0,0,0]
142 |                 for cg, s, sc in zip(channels_group, strides, shortcuts):
143 |                     self.features1.add(LinearBottleneck(channels=np.int32(np.array(cg)*multiplier),
144 |                                                        stride=s,
145 |                                                        shortcut=sc,
146 |                                                        norm_layer=norm_layer,
147 |                                                        norm_kwargs=norm_kwargs))
148 |                 
149 |                            
150 |                 channels_group = [[128, 192], [192, 192], [192, 192]]
151 |                 strides =   [2,1,1]
152 |                 shortcuts = [1,1,1]
153 |                 for cg, s, sc in zip(channels_group, strides, shortcuts):
154 |                     self.features2.add(LinearBottleneck(channels=np.int32(np.array(cg)*multiplier),
155 |                                                        stride=s,
156 |                                                        shortcut=sc,
157 |                                                        norm_layer=norm_layer,
158 |                                                        norm_kwargs=norm_kwargs))
159 |                 
160 |                 
161 |                 channels_group = [[192, 256], [256, 256], [256, 512]]
162 |                 strides =   [2,1,1]
163 |                 shortcuts = [1,1,1]
164 |                 for cg, s, sc in zip(channels_group, strides, shortcuts):
165 |                     self.features3.add(LinearBottleneck(channels=np.int32(np.array(cg)*multiplier),
166 |                                                        stride=s,
167 |                                                        shortcut=sc,
168 |                                                        norm_layer=norm_layer,
169 |                                                        norm_kwargs=norm_kwargs))
170 |                 self.avg = nn.GlobalAvgPool2D()
171 | 
172 | 
173 |             self.output = nn.HybridSequential(prefix='output_')
174 |             with self.output.name_scope():
175 |                  self.output.add(
176 |                      nn.Conv2D(classes, 1, in_channels=int(512*multiplier),prefix='pred_'),
177 |                      nn.Flatten())
178 |           
179 |         
180 |         
181 |     def hybrid_forward(self, F, x):
182 |         x1 = self.features1(x)
183 |         x2 = self.features2(x1)
184 |         x3 = self.features3(x2)  
185 |         x = self.avg(x3)
186 |         x = self.output(x)
187 |         return x
188 | 
189 | 
190 | 
191 | if __name__ == "__main__": 
192 |     model = Etinynet()
193 |     model.initialize()
194 |     model.summary(mx.ndarray.zeros((1,3,256,256)))
195 |         
196 | 
197 | 


--------------------------------------------------------------------------------
/test_imagenet.py:
--------------------------------------------------------------------------------
  1 | import argparse, time, logging, os, math
  2 | #os.environ["CUDA_VISIBLE_DEVICES"] = '4,5,6,7'
  3 | 
  4 | import numpy as np
  5 | import mxnet as mx
  6 | import gluoncv as gcv
  7 | from mxnet import gluon, nd
  8 | from mxnet import autograd as ag
  9 | from mxnet.gluon.data.vision import transforms
 10 | 
 11 | import gluoncv as gcv
 12 | gcv.utils.check_version('0.6.0')
 13 | from gluoncv.data import imagenet
 14 | from gluoncv.model_zoo import get_model
 15 | from gluoncv.utils import makedirs, LRSequential, LRScheduler
 16 | 
 17 | os.environ["CUDA_VISIBLE_DEVICES"] = '4,5,6,7'
 18 | 
 19 | # CLI
 20 | def parse_args():
 21 |     parser = argparse.ArgumentParser(description='Train a model for image classification.')
 22 |     parser.add_argument('--data-dir', type=str, default='~/.mxnet/datasets/imagenet',
 23 |                         help='training and validation pictures to use.')
 24 |     parser.add_argument('--rec-train', type=str, default='~/.mxnet/datasets/imagenet/rec/train.rec',
 25 |                         help='the training data')
 26 |     parser.add_argument('--rec-train-idx', type=str, default='~/.mxnet/datasets/imagenet/rec/train.idx',
 27 |                         help='the index of training data')
 28 |     parser.add_argument('--rec-val', type=str, default='~/.mxnet/datasets/imagenet/rec/val.rec',
 29 |                         help='the validation data')
 30 |     parser.add_argument('--rec-val-idx', type=str, default='~/.mxnet/datasets/imagenet/rec/val.idx',
 31 |                         help='the index of validation data')
 32 |     parser.add_argument('--use-rec', action='store_true',
 33 |                         help='use image record iter for data input. default is false.')
 34 |     parser.add_argument('--batch-size', type=int, default=32,
 35 |                         help='training batch size per device (CPU/GPU).')
 36 |     parser.add_argument('--dtype', type=str, default='float32',
 37 |                         help='data type for training. default is float32')
 38 |     parser.add_argument('--num-gpus', type=int, default=0,
 39 |                         help='number of gpus to use.')
 40 |     parser.add_argument('-j', '--num-data-workers', dest='num_workers', default=4, type=int,
 41 |                         help='number of preprocessing workers')
 42 |     parser.add_argument('--num-epochs', type=int, default=3,
 43 |                         help='number of training epochs.')
 44 |     parser.add_argument('--lr', type=float, default=0.1,
 45 |                         help='learning rate. default is 0.1.')
 46 |     parser.add_argument('--momentum', type=float, default=0.9,
 47 |                         help='momentum value for optimizer, default is 0.9.')
 48 |     parser.add_argument('--wd', type=float, default=0.0001,
 49 |                         help='weight decay rate. default is 0.0001.')
 50 |     parser.add_argument('--lr-mode', type=str, default='step',
 51 |                         help='learning rate scheduler mode. options are step, poly and cosine.')
 52 |     parser.add_argument('--lr-decay', type=float, default=0.1,
 53 |                         help='decay rate of learning rate. default is 0.1.')
 54 |     parser.add_argument('--lr-decay-period', type=int, default=0,
 55 |                         help='interval for periodic learning rate decays. default is 0 to disable.')
 56 |     parser.add_argument('--lr-decay-epoch', type=str, default='40,60',
 57 |                         help='epochs at which learning rate decays. default is 40,60.')
 58 |     parser.add_argument('--warmup-lr', type=float, default=0.0,
 59 |                         help='starting warmup learning rate. default is 0.0.')
 60 |     parser.add_argument('--warmup-epochs', type=int, default=0,
 61 |                         help='number of warmup epochs.')
 62 |     parser.add_argument('--last-gamma', action='store_true',
 63 |                         help='whether to init gamma of the last BN layer in each bottleneck to 0.')
 64 |     parser.add_argument('--mode', type=str,
 65 |                         help='mode in which to train the model. options are symbolic, imperative, hybrid')
 66 |     parser.add_argument('--model', type=str,
 67 |                         help='type of model to use. see vision_model for options.')
 68 |     parser.add_argument('--input-size', type=int, default=224,
 69 |                         help='size of the input image size. default is 224')
 70 |     parser.add_argument('--crop-ratio', type=float, default=0.875,
 71 |                         help='Crop ratio during validation. default is 0.875')
 72 |     parser.add_argument('--use-pretrained', action='store_true',
 73 |                         help='enable using pretrained model from gluon.')
 74 |     parser.add_argument('--use_se', action='store_true',
 75 |                         help='use SE layers or not in resnext. default is false.')
 76 |     parser.add_argument('--mixup', action='store_true',
 77 |                         help='whether train the model with mix-up. default is false.')
 78 |     parser.add_argument('--mixup-alpha', type=float, default=0.2,
 79 |                         help='beta distribution parameter for mixup sampling, default is 0.2.')
 80 |     parser.add_argument('--mixup-off-epoch', type=int, default=0,
 81 |                         help='how many last epochs to train without mixup, default is 0.')
 82 |     parser.add_argument('--label-smoothing', action='store_true',
 83 |                         help='use label smoothing or not in training. default is false.')
 84 |     parser.add_argument('--no-wd', action='store_true',
 85 |                         help='whether to remove weight decay on bias, and beta/gamma for batchnorm layers.')
 86 |     parser.add_argument('--teacher', type=str, default=None,
 87 |                         help='teacher model for distillation training')
 88 |     parser.add_argument('--temperature', type=float, default=20,
 89 |                         help='temperature parameter for distillation teacher model')
 90 |     parser.add_argument('--hard-weight', type=float, default=0.5,
 91 |                         help='weight for the loss of one-hot label for distillation training')
 92 |     parser.add_argument('--batch-norm', action='store_true',
 93 |                         help='enable batch normalization or not in vgg. default is false.')
 94 |     parser.add_argument('--save-frequency', type=int, default=10,
 95 |                         help='frequency of model saving.')
 96 |     parser.add_argument('--save-dir', type=str, default='params',
 97 |                         help='directory of saved models')
 98 |     parser.add_argument('--resume-epoch', type=int, default=0,
 99 |                         help='epoch to resume training from.')
100 |     parser.add_argument('--resume-params', type=str, default='',
101 |                         help='path of parameters to load from.')
102 |     parser.add_argument('--resume-states', type=str, default='',
103 |                         help='path of trainer state to load from.')
104 |     parser.add_argument('--log-interval', type=int, default=50,
105 |                         help='Number of batches to wait before logging.')
106 |     parser.add_argument('--logging-file', type=str, default='train_imagenet.log',
107 |                         help='name of training log file')
108 |     parser.add_argument('--use-gn', action='store_true',
109 |                         help='whether to use group norm.')
110 |     opt = parser.parse_args()
111 |     return opt
112 | 
113 | 
114 | def main():
115 |     opt = parse_args()
116 |     
117 |     opt.rec_train = '/home/xkr/ramdisk/imagenet_train.rec'
118 |     opt.rec_train_idx = '/home/xkr/ramdisk/imagenet_train.idx'
119 |     opt.rec_val = '/home/xkr/ramdisk/imagenet_val.rec'
120 |     opt.rec_val_idx = '/home/xkr/ramdisk/imagenet_val.idx'
121 |     
122 |     opt.use_rec = True
123 |     opt.model = 'test'
124 |     opt.relugar = False
125 |     opt.lamda = 0.001
126 |     opt.mode = 'hybrid'
127 |     opt.dtype = "float32"
128 |     opt.lr = 0.001
129 |     opt.batch_size = 128
130 |     opt.num_gpus = 8
131 |     opt.input_size = 224
132 |     
133 |     
134 |     opt.resume_epoch = 0
135 |     opt.resume_params = "0.6553-imagenet-mobilenet_lite313_477k_nownorm_4433_224-293-best.params"
136 | 
137 |     
138 |     filehandler = logging.FileHandler(opt.logging_file)
139 |     streamhandler = logging.StreamHandler()
140 | 
141 |     logger = logging.getLogger('')
142 |     logger.setLevel(logging.INFO)
143 |     logger.addHandler(filehandler)
144 |     logger.addHandler(streamhandler)
145 | 
146 |     logger.info(opt)
147 | 
148 |     batch_size = opt.batch_size
149 |     classes = 1000
150 |     num_training_samples = 1281167
151 | 
152 |     num_gpus = opt.num_gpus
153 |     batch_size *= max(1, num_gpus)
154 |     context = [mx.gpu(i) for i in range(num_gpus)] if num_gpus > 0 else [mx.cpu()]
155 |     num_workers = opt.num_workers
156 | 
157 |     lr_decay = opt.lr_decay
158 |     lr_decay_period = opt.lr_decay_period
159 |     if opt.lr_decay_period > 0:
160 |         lr_decay_epoch = list(range(lr_decay_period, opt.num_epochs, lr_decay_period))
161 |     else:
162 |         lr_decay_epoch = [int(i) for i in opt.lr_decay_epoch.split(',')]
163 |     lr_decay_epoch = [e - opt.warmup_epochs for e in lr_decay_epoch]
164 |     num_batches = num_training_samples // batch_size
165 | 
166 |     lr_scheduler = LRSequential([
167 |         LRScheduler(opt.lr_mode, base_lr=opt.lr, target_lr=1e-5,
168 |                     nepochs=opt.num_epochs - opt.warmup_epochs,
169 |                     iters_per_epoch=num_batches,
170 |                     step_epoch=lr_decay_epoch,
171 |                     step_factor=lr_decay)
172 |     ])
173 | 
174 |     model_name = opt.model
175 | 
176 |     kwargs = {'ctx': context, 'pretrained': opt.use_pretrained, 'classes': classes}
177 |     if opt.use_gn:
178 |         kwargs['norm_layer'] = gcv.nn.GroupNorm
179 |     if model_name.startswith('vgg'):
180 |         kwargs['batch_norm'] = opt.batch_norm
181 |     elif model_name.startswith('resnext'):
182 |         kwargs['use_se'] = opt.use_se
183 | 
184 |     if opt.last_gamma:
185 |         kwargs['last_gamma'] = True
186 | 
187 |     optimizer = 'nag'
188 |     optimizer_params = {'wd': opt.wd, 'momentum': opt.momentum, 'lr_scheduler': lr_scheduler}
189 |     if opt.dtype != 'float32':
190 |         optimizer_params['multi_precision'] = True
191 | 
192 |   
193 |     from etinynet import Etinynet
194 |     net = Etinynet(classes=1000)
195 | 
196 |     net.cast(opt.dtype)
197 |     if opt.resume_params != '':
198 |         net.load_parameters(opt.resume_params, ctx = context)
199 | #        net.collect_params().load(opt.resume_params, ctx = context)
200 | 
201 |     # teacher model for distillation training
202 |     if opt.teacher is not None and opt.hard_weight < 1.0:
203 |         teacher_name = opt.teacher
204 |         teacher = get_model(teacher_name, pretrained=True, classes=classes, ctx=context)
205 |         teacher.cast(opt.dtype)
206 |         distillation = True
207 |     else:
208 |         distillation = False
209 |     
210 | 
211 | 
212 |     # Two functions for reading data from record file or raw images
213 |     def get_data_rec(rec_train, rec_train_idx, rec_val, rec_val_idx, batch_size, num_workers):
214 |         rec_train = os.path.expanduser(rec_train)
215 |         rec_train_idx = os.path.expanduser(rec_train_idx)
216 |         rec_val = os.path.expanduser(rec_val)
217 |         rec_val_idx = os.path.expanduser(rec_val_idx)
218 |         jitter_param = 0.4
219 |         lighting_param = 0.1
220 |         input_size = opt.input_size
221 |         crop_ratio = opt.crop_ratio if opt.crop_ratio > 0 else 0.875
222 |         resize = int(math.ceil(input_size / crop_ratio))
223 |         mean_rgb = [128, 128, 128]
224 |         std_rgb = [1, 1, 1] 
225 | #        mean_rgb = [123.68, 116.779, 103.939]
226 | #        std_rgb = [58.393, 57.12, 57.375]
227 | 
228 |         def batch_fn(batch, ctx):
229 |             data = gluon.utils.split_and_load(batch.data[0], ctx_list=ctx, batch_axis=0)
230 |             label = gluon.utils.split_and_load(batch.label[0], ctx_list=ctx, batch_axis=0)
231 |             return data, label
232 | 
233 |         train_data = mx.io.ImageRecordIter(
234 |             path_imgrec         = rec_train,
235 |             path_imgidx         = rec_train_idx,
236 |             preprocess_threads  = num_workers,
237 |             shuffle             = True,
238 |             batch_size          = batch_size,
239 | 
240 |             data_shape          = (3, input_size, input_size),
241 |             mean_r              = mean_rgb[0],
242 |             mean_g              = mean_rgb[1],
243 |             mean_b              = mean_rgb[2],
244 |             std_r               = std_rgb[0],
245 |             std_g               = std_rgb[1],
246 |             std_b               = std_rgb[2],
247 |             rand_mirror         = True,
248 |             random_resized_crop = True,
249 |             max_aspect_ratio    = 4. / 3.,
250 |             min_aspect_ratio    = 3. / 4.,
251 |             max_random_area     = 1,
252 |             min_random_area     = 0.16,
253 |             brightness          = jitter_param,
254 |             saturation          = jitter_param,
255 |             contrast            = jitter_param,
256 | #            pca_noise           = lighting_param,
257 |         )
258 |         val_data = mx.io.ImageRecordIter(
259 |             path_imgrec         = rec_val,
260 |             path_imgidx         = rec_val_idx,
261 |             preprocess_threads  = num_workers,
262 |             shuffle             = False,
263 |             batch_size          = batch_size,
264 | 
265 |             resize              = resize,
266 |             data_shape          = (3, input_size, input_size),
267 |             mean_r              = mean_rgb[0],
268 |             mean_g              = mean_rgb[1],
269 |             mean_b              = mean_rgb[2],
270 |             std_r               = std_rgb[0],
271 |             std_g               = std_rgb[1],
272 |             std_b               = std_rgb[2],
273 |         )
274 |         return train_data, val_data, batch_fn
275 | 
276 |     def get_data_loader(data_dir, batch_size, num_workers):
277 |         normalize = transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
278 |         jitter_param = 0.4
279 |         lighting_param = 0.1
280 |         input_size = opt.input_size
281 |         crop_ratio = opt.crop_ratio if opt.crop_ratio > 0 else 0.875
282 |         resize = int(math.ceil(input_size / crop_ratio))
283 | 
284 |         def batch_fn(batch, ctx):
285 |             data = gluon.utils.split_and_load(batch[0], ctx_list=ctx, batch_axis=0)
286 |             label = gluon.utils.split_and_load(batch[1], ctx_list=ctx, batch_axis=0)
287 |             return data, label
288 | 
289 |         transform_train = transforms.Compose([
290 |             transforms.RandomResizedCrop(input_size),
291 |             transforms.RandomFlipLeftRight(),
292 |             transforms.RandomColorJitter(brightness=jitter_param, contrast=jitter_param,
293 |                                         saturation=jitter_param),
294 |             transforms.RandomLighting(lighting_param),
295 |             transforms.ToTensor(),
296 |             normalize
297 |         ])
298 |         transform_test = transforms.Compose([
299 |             transforms.Resize(resize, keep_ratio=True),
300 |             transforms.CenterCrop(input_size),
301 |             transforms.ToTensor(),
302 |             normalize
303 |         ])
304 | 
305 |         train_data = gluon.data.DataLoader(
306 |             imagenet.classification.ImageNet(data_dir, train=True).transform_first(transform_train),
307 |             batch_size=batch_size, shuffle=True, last_batch='discard', num_workers=num_workers)
308 |         val_data = gluon.data.DataLoader(
309 |             imagenet.classification.ImageNet("/home/xkr/data/ImageNet", train=False).transform_first(transform_test),
310 |             batch_size=batch_size, shuffle=False, num_workers=num_workers)
311 | 
312 |         return train_data, val_data, batch_fn
313 | 
314 |     if opt.use_rec:
315 |         train_data, val_data, batch_fn = get_data_rec(opt.rec_train, opt.rec_train_idx,
316 |                                                     opt.rec_val, opt.rec_val_idx,
317 |                                                     batch_size, num_workers)
318 |     else:
319 |         train_data, val_data, batch_fn = get_data_loader(opt.data_dir, batch_size, num_workers)
320 | 
321 |     if opt.mixup:
322 |         train_metric = mx.metric.RMSE()
323 |     else:
324 |         train_metric = mx.metric.Accuracy()
325 |     acc_top1 = mx.metric.Accuracy()
326 |     acc_top5 = mx.metric.TopKAccuracy(5)
327 | 
328 |     save_frequency = opt.save_frequency
329 |     if opt.save_dir and save_frequency:
330 |         save_dir = opt.save_dir
331 |         makedirs(save_dir)
332 |     else:
333 |         save_dir = ''
334 |         save_frequency = 0
335 | 
336 |     def mixup_transform(label, classes, lam=1, eta=0.0):
337 |         if isinstance(label, nd.NDArray):
338 |             label = [label]
339 |         res = []
340 |         for l in label:
341 |             y1 = l.one_hot(classes, on_value = 1 - eta + eta/classes, off_value = eta/classes)
342 |             y2 = l[::-1].one_hot(classes, on_value = 1 - eta + eta/classes, off_value = eta/classes)
343 |             res.append(lam*y1 + (1-lam)*y2)
344 |         return res
345 |         params = net.collect_params()
346 |         weights_3x3 = []
347 |         weights_1x1 = []
348 |         for k, param in params.items():
349 |             shape = param.shape
350 |             if len(shape)==4 and shape[1]==1 and shape[2]==3 and shape[3]==3:
351 |                 weights_3x3.append([param.data(ctx=d) for d in ctx])
352 |                 if report:
353 |                     print(k, shape)
354 |             
355 |             elif len(shape)==4 and shape[0]<shape[1] and shape[2]==1 and shape[3]==1 and shape[1]!=1280:
356 |                 weights_1x1.append([param.data(ctx=d) for d in ctx])
357 |                 if report:
358 |                     print(k, shape)
359 |         report = False
360 |         
361 |         loss = [nd.zeros((1,)).as_in_context(d) for d in ctx]
362 |         for w in weights_3x3:
363 |             for i, d in enumerate(ctx):
364 |                 wd = nd.reshape(w[i], (w[i].shape[0], 9))
365 |                 inner = nd.sum(nd.power(wd, 2), axis=1)
366 |                 inner = inner - 1.0
367 |                 loss[i] = loss[i] + nd.norm(inner, ord=2)
368 |                 
369 |         for w in weights_1x1:
370 |             for i, d in enumerate(ctx):
371 |                 wd = nd.squeeze(w[i])
372 |                 product = nd.dot(wd, wd.T)
373 |                 num = product.shape[0]
374 |                 
375 |                 mask = nd.ones((num, num), ctx=d) - nd.eye(num, ctx=d)
376 |                 product = product * mask
377 |                 loss[i] = loss[i] + nd.norm(product, ord=2)
378 |                 
379 |         return loss
380 |     
381 |     def smooth(label, classes, eta=0.1):
382 |         if isinstance(label, nd.NDArray):
383 |             label = [label]
384 |         smoothed = []
385 |         for l in label:
386 |             res = l.one_hot(classes, on_value = 1 - eta + eta/classes, off_value = eta/classes)
387 |             smoothed.append(res)
388 |         return smoothed
389 |     
390 |     global report
391 |     report = True
392 |     
393 | 
394 | 
395 |     
396 |     
397 |     def test(ctx, val_data):
398 |         if opt.use_rec:
399 |             val_data.reset()
400 |         acc_top1.reset()
401 |         acc_top5.reset()
402 |         with ag.predict_mode():
403 |             for i, batch in enumerate(val_data):
404 |                 data, label = batch_fn(batch, ctx)
405 |                 outputs = [net(X.astype(opt.dtype, copy=False)) for X in data]
406 |                 acc_top1.update(label, outputs)
407 |                 acc_top5.update(label, outputs)
408 |     
409 |             _, top1 = acc_top1.get()
410 |             _, top5 = acc_top5.get()
411 |         return (1-top1, 1-top5)
412 |     
413 |     def train(ctx):
414 |         
415 |         err_top1_val, err_top5_val = test(ctx, val_data)
416 |         logger.info('[Epoch %d] validation: acc-top1=%.5f acc-top5=%.5f'%(0, 1-err_top1_val, 1-err_top5_val))
417 |         
418 |         
419 |     
420 |     if opt.mode == 'hybrid':
421 |         net.hybridize(static_alloc=True, static_shape=True)
422 |         if distillation:
423 |             teacher.hybridize(static_alloc=True, static_shape=True)
424 |     train(context)
425 | 
426 | if __name__ == '__main__':
427 |     main()
428 | 


--------------------------------------------------------------------------------
/train_imagenet.py:
--------------------------------------------------------------------------------
  1 | import argparse, time, logging, os, math
  2 | #os.environ["CUDA_VISIBLE_DEVICES"] = '4,5,6,7'
  3 | 
  4 | import numpy as np
  5 | import mxnet as mx
  6 | import gluoncv as gcv
  7 | from mxnet import gluon, nd
  8 | from mxnet import autograd as ag
  9 | from mxnet.gluon.data.vision import transforms
 10 | import random
 11 | import gluoncv as gcv
 12 | gcv.utils.check_version('0.6.0')
 13 | from gluoncv.data import imagenet
 14 | from gluoncv.model_zoo import get_model
 15 | from gluoncv.utils import makedirs, LRSequential, LRScheduler
 16 | 
 17 | # CLI
 18 | def parse_args():
 19 |     parser = argparse.ArgumentParser(description='Train a model for image classification.')
 20 |     parser.add_argument('--data-dir', type=str, default='~/.mxnet/datasets/imagenet',
 21 |                         help='training and validation pictures to use.')
 22 |     parser.add_argument('--rec-train', type=str, default='~/.mxnet/datasets/imagenet/rec/train.rec',
 23 |                         help='the training data')
 24 |     parser.add_argument('--rec-train-idx', type=str, default='~/.mxnet/datasets/imagenet/rec/train.idx',
 25 |                         help='the index of training data')
 26 |     parser.add_argument('--rec-val', type=str, default='~/.mxnet/datasets/imagenet/rec/val.rec',
 27 |                         help='the validation data')
 28 |     parser.add_argument('--rec-val-idx', type=str, default='~/.mxnet/datasets/imagenet/rec/val.idx',
 29 |                         help='the index of validation data')
 30 |     parser.add_argument('--use-rec', action='store_true',
 31 |                         help='use image record iter for data input. default is false.')
 32 |     parser.add_argument('--batch-size', type=int, default=32,
 33 |                         help='training batch size per device (CPU/GPU).')
 34 |     parser.add_argument('--dtype', type=str, default='float32',
 35 |                         help='data type for training. default is float32')
 36 |     parser.add_argument('--num-gpus', type=int, default=0,
 37 |                         help='number of gpus to use.')
 38 |     parser.add_argument('-j', '--num-data-workers', dest='num_workers', default=4, type=int,
 39 |                         help='number of preprocessing workers')
 40 |     parser.add_argument('--num-epochs', type=int, default=3,
 41 |                         help='number of training epochs.')
 42 |     parser.add_argument('--lr', type=float, default=0.1,
 43 |                         help='learning rate. default is 0.1.')
 44 |     parser.add_argument('--momentum', type=float, default=0.9,
 45 |                         help='momentum value for optimizer, default is 0.9.')
 46 |     parser.add_argument('--wd', type=float, default=0.0001,
 47 |                         help='weight decay rate. default is 0.0001.')
 48 |     parser.add_argument('--lr-mode', type=str, default='step',
 49 |                         help='learning rate scheduler mode. options are step, poly and cosine.')
 50 |     parser.add_argument('--lr-decay', type=float, default=0.1,
 51 |                         help='decay rate of learning rate. default is 0.1.')
 52 |     parser.add_argument('--lr-decay-period', type=int, default=0,
 53 |                         help='interval for periodic learning rate decays. default is 0 to disable.')
 54 |     parser.add_argument('--lr-decay-epoch', type=str, default='40,60',
 55 |                         help='epochs at which learning rate decays. default is 40,60.')
 56 |     parser.add_argument('--warmup-lr', type=float, default=0.0,
 57 |                         help='starting warmup learning rate. default is 0.0.')
 58 |     parser.add_argument('--warmup-epochs', type=int, default=0,
 59 |                         help='number of warmup epochs.')
 60 |     parser.add_argument('--last-gamma', action='store_true',
 61 |                         help='whether to init gamma of the last BN layer in each bottleneck to 0.')
 62 |     parser.add_argument('--mode', type=str,
 63 |                         help='mode in which to train the model. options are symbolic, imperative, hybrid')
 64 |     parser.add_argument('--model', type=str,
 65 |                         help='type of model to use. see vision_model for options.')
 66 |     parser.add_argument('--input-size', type=int, default=224,
 67 |                         help='size of the input image size. default is 224')
 68 |     parser.add_argument('--crop-ratio', type=float, default=0.875,
 69 |                         help='Crop ratio during validation. default is 0.875')
 70 |     parser.add_argument('--use-pretrained', action='store_true',
 71 |                         help='enable using pretrained model from gluon.')
 72 |     parser.add_argument('--use_se', action='store_true',
 73 |                         help='use SE layers or not in resnext. default is false.')
 74 |     parser.add_argument('--mixup', action='store_true',
 75 |                         help='whether train the model with mix-up. default is false.')
 76 |     parser.add_argument('--mixup-alpha', type=float, default=0.2,
 77 |                         help='beta distribution parameter for mixup sampling, default is 0.2.')
 78 |     parser.add_argument('--mixup-off-epoch', type=int, default=0,
 79 |                         help='how many last epochs to train without mixup, default is 0.')
 80 |     parser.add_argument('--label-smoothing', action='store_true',
 81 |                         help='use label smoothing or not in training. default is false.')
 82 |     parser.add_argument('--no-wd', action='store_true',
 83 |                         help='whether to remove weight decay on bias, and beta/gamma for batchnorm layers.')
 84 |     parser.add_argument('--teacher', type=str, default=None,
 85 |                         help='teacher model for distillation training')
 86 |     parser.add_argument('--temperature', type=float, default=20,
 87 |                         help='temperature parameter for distillation teacher model')
 88 |     parser.add_argument('--hard-weight', type=float, default=0.5,
 89 |                         help='weight for the loss of one-hot label for distillation training')
 90 |     parser.add_argument('--batch-norm', action='store_true',
 91 |                         help='enable batch normalization or not in vgg. default is false.')
 92 |     parser.add_argument('--save-frequency', type=int, default=10,
 93 |                         help='frequency of model saving.')
 94 |     parser.add_argument('--save-dir', type=str, default='params',
 95 |                         help='directory of saved models')
 96 |     parser.add_argument('--resume-epoch', type=int, default=0,
 97 |                         help='epoch to resume training from.')
 98 |     parser.add_argument('--resume-params', type=str, default='',
 99 |                         help='path of parameters to load from.')
100 |     parser.add_argument('--resume-states', type=str, default='',
101 |                         help='path of trainer state to load from.')
102 |     parser.add_argument('--log-interval', type=int, default=50,
103 |                         help='Number of batches to wait before logging.')
104 |     parser.add_argument('--logging-file', type=str, default='train_imagenet.log',
105 |                         help='name of training log file')
106 |     parser.add_argument('--use-gn', action='store_true',
107 |                         help='whether to use group norm.')
108 |     opt = parser.parse_args()
109 |     return opt
110 | 
111 | 
112 | def main():
113 |     opt = parse_args()
114 |     
115 |     
116 |     # hyper parameter settings example
117 |     opt.rec_train = '/home/xkr/ramdisk/imagenet_train.rec'
118 |     opt.rec_train_idx = '/home/xkr/ramdisk/imagenet_train.idx'
119 |     opt.rec_val = '/home/xkr/ramdisk/imagenet_val.rec'
120 |     opt.rec_val_idx = '/home/xkr/ramdisk/imagenet_val.idx'
121 |     
122 |     opt.use_rec = True
123 |     opt.model = 'etinynet'
124 |     opt.relugar = False
125 |     opt.lamda = 0.001
126 |     opt.mode = 'hybrid'
127 |     opt.dtype = "float32"
128 |     opt.lr = 0.1
129 |     
130 |     opt.lr_mode = 'cosine'
131 |     opt.lr_decay_epoch='30,60'
132 |     opt.num_epochs = 300
133 |     opt.batch_size = 128
134 |     opt.num_gpus = 8
135 |     opt.warmup_epochs = 0
136 |     opt.label_smoothing = False
137 |     opt.mixup = False
138 |     opt.mixup_alpha = 0.2
139 | 
140 |     opt.hard_weight = 0.75
141 |     opt.no_wd = True
142 |     opt.wd = 1e-4
143 |     opt.last_gamma = False
144 |     opt.save_dir = 'params/params_' + opt.model
145 |     opt.logging_file = 'params/log_' + opt.model + '.log'
146 |     opt.num_workers = 80
147 |     opt.log_interval = 100
148 |     opt.input_size = 224
149 |     opt.save_frequency = 100
150 |      
151 |     opt.quantization_regular = False
152 |     opt.quantizaiton_weight = 0.05
153 |     
154 |     opt.weight_regular = False
155 |     opt.weight_regular_w = 1e-2
156 |     
157 |     opt.resume_epoch = 0
158 |     opt.resume_params = ""
159 |     opt.resume_states = ""
160 | 
161 |     opt.teacher = None
162 |     opt.temperature = 3
163 |     opt.norm_distill = False
164 |     opt.norm_distill_w = 9
165 |     opt.norm = 8
166 |     
167 |     filehandler = logging.FileHandler(opt.logging_file)
168 |     streamhandler = logging.StreamHandler()
169 | 
170 |     logger = logging.getLogger('')
171 |     logger.setLevel(logging.INFO)
172 |     logger.addHandler(filehandler)
173 |     logger.addHandler(streamhandler)
174 | 
175 |     logger.info(opt)
176 | 
177 |     batch_size = opt.batch_size
178 |     classes = 1000
179 |     num_training_samples = 1281167
180 | 
181 |     num_gpus = opt.num_gpus
182 |     batch_size *= max(1, num_gpus)
183 |     context = [mx.gpu(i) for i in range(num_gpus)] if num_gpus > 0 else [mx.cpu()]
184 |     num_workers = opt.num_workers
185 | 
186 |     lr_decay = opt.lr_decay
187 |     lr_decay_period = opt.lr_decay_period
188 |     if opt.lr_decay_period > 0:
189 |         lr_decay_epoch = list(range(lr_decay_period, opt.num_epochs, lr_decay_period))
190 |     else:
191 |         lr_decay_epoch = [int(i) for i in opt.lr_decay_epoch.split(',')]
192 |     lr_decay_epoch = [e - opt.warmup_epochs for e in lr_decay_epoch]
193 |     num_batches = num_training_samples // batch_size
194 | 
195 |     lr_scheduler = LRSequential([
196 |         LRScheduler(opt.lr_mode, base_lr=opt.lr, target_lr=1e-5,
197 |                     nepochs=opt.num_epochs - opt.warmup_epochs,
198 |                     iters_per_epoch=num_batches,
199 |                     step_epoch=lr_decay_epoch,
200 |                     step_factor=lr_decay)
201 |     ])
202 | 
203 |     model_name = opt.model
204 | 
205 |     kwargs = {'ctx': context, 'pretrained': opt.use_pretrained, 'classes': classes}
206 |     if opt.use_gn:
207 |         kwargs['norm_layer'] = gcv.nn.GroupNorm
208 |     if model_name.startswith('vgg'):
209 |         kwargs['batch_norm'] = opt.batch_norm
210 |     elif model_name.startswith('resnext'):
211 |         kwargs['use_se'] = opt.use_se
212 | 
213 |     if opt.last_gamma:
214 |         kwargs['last_gamma'] = True
215 | 
216 |     optimizer = 'nag'
217 |     optimizer_params = {'wd': opt.wd, 'momentum': opt.momentum, 'lr_scheduler': lr_scheduler}
218 |     if opt.dtype != 'float32':
219 |         optimizer_params['multi_precision'] = True
220 | 
221 | 
222 |     if opt.norm_distill:
223 |         from models.mobilenet_lite_distill import MobileNetV2
224 |         kwargs = {'norm': opt.norm,'classes':1000,'switchout':2}
225 |         net = MobileNetV2(**kwargs)
226 |     else:    
227 |         from etinynet import Etinynet
228 |         net = Etinynet(classes=1000)
229 |     
230 |     net.cast(opt.dtype)
231 |     if opt.resume_params != '':
232 |         net.load_parameters(opt.resume_params, ctx = context, allow_missing=True)
233 |         net.initialize(ctx=context)
234 | 
235 |     # teacher model for distillation training
236 |     if opt.teacher is not None and opt.hard_weight < 1.0 and not opt.norm_distill:
237 |         teacher_name = opt.teacher
238 |         teacher = get_model(teacher_name, pretrained=True, classes=classes, ctx=context)
239 |         teacher.cast(opt.dtype)
240 |         distillation = True
241 |     else:
242 |         distillation = False
243 |     
244 |     if opt.norm_distill:
245 |         from models.mobilenetv2_distill import mobilenet_v2_1_0
246 |         kwargs = {'norm': opt.norm, 'classes':1000,'switchout':1}
247 |         teacher = mobilenet_v2_1_0(**kwargs)
248 |         teacher.load_parameters(opt.tea_net_params, ctx = context)
249 |         distillation = True
250 | 
251 |     # Two functions for reading data from record file or raw images
252 |     def get_data_rec(rec_train, rec_train_idx, rec_val, rec_val_idx, batch_size, num_workers):
253 |         rec_train = os.path.expanduser(rec_train)
254 |         rec_train_idx = os.path.expanduser(rec_train_idx)
255 |         rec_val = os.path.expanduser(rec_val)
256 |         rec_val_idx = os.path.expanduser(rec_val_idx)
257 |         jitter_param = 0.4
258 |         lighting_param = 0.1
259 |         input_size = opt.input_size
260 |         crop_ratio = opt.crop_ratio if opt.crop_ratio > 0 else 0.875
261 |         resize = int(math.ceil(input_size / crop_ratio))
262 |         mean_rgb = [128, 128, 128]
263 |         std_rgb = [1, 1, 1] 
264 | #        mean_rgb = [123.68, 116.779, 103.939]
265 | #        std_rgb = [58.393, 57.12, 57.375]
266 | 
267 |         def batch_fn(batch, ctx):
268 |             data = gluon.utils.split_and_load(batch.data[0], ctx_list=ctx, batch_axis=0)
269 |             label = gluon.utils.split_and_load(batch.label[0], ctx_list=ctx, batch_axis=0)
270 |             return data, label
271 | 
272 |         train_data = mx.io.ImageRecordIter(
273 |             path_imgrec         = rec_train,
274 |             path_imgidx         = rec_train_idx,
275 |             preprocess_threads  = num_workers,
276 |             shuffle             = True,
277 |             batch_size          = batch_size,
278 | 
279 |             data_shape          = (3, input_size, input_size),
280 |             mean_r              = mean_rgb[0],
281 |             mean_g              = mean_rgb[1],
282 |             mean_b              = mean_rgb[2],
283 |             std_r               = std_rgb[0],
284 |             std_g               = std_rgb[1],
285 |             std_b               = std_rgb[2],
286 |             rand_mirror         = True,
287 |             random_resized_crop = True,
288 |             max_aspect_ratio    = 4. / 3.,
289 |             min_aspect_ratio    = 3. / 4.,
290 |             max_random_area     = 1,
291 |             min_random_area     = 0.16,
292 |             brightness          = jitter_param,
293 |             saturation          = jitter_param,
294 |             contrast            = jitter_param,
295 | #            pca_noise           = lighting_param,
296 |         )
297 |         val_data = mx.io.ImageRecordIter(
298 |             path_imgrec         = rec_val,
299 |             path_imgidx         = rec_val_idx,
300 |             preprocess_threads  = num_workers,
301 |             shuffle             = False,
302 |             batch_size          = batch_size,
303 | 
304 |             resize              = resize,
305 |             data_shape          = (3, input_size, input_size),
306 |             mean_r              = mean_rgb[0],
307 |             mean_g              = mean_rgb[1],
308 |             mean_b              = mean_rgb[2],
309 |             std_r               = std_rgb[0],
310 |             std_g               = std_rgb[1],
311 |             std_b               = std_rgb[2],
312 |         )
313 |         return train_data, val_data, batch_fn
314 | 
315 |     def get_data_loader(data_dir, batch_size, num_workers):
316 |         normalize = transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
317 |         jitter_param = 0.4
318 |         lighting_param = 0.1
319 |         input_size = opt.input_size
320 |         crop_ratio = opt.crop_ratio if opt.crop_ratio > 0 else 0.875
321 |         resize = int(math.ceil(input_size / crop_ratio))
322 | 
323 |         def batch_fn(batch, ctx):
324 |             data = gluon.utils.split_and_load(batch[0], ctx_list=ctx, batch_axis=0)
325 |             label = gluon.utils.split_and_load(batch[1], ctx_list=ctx, batch_axis=0)
326 |             return data, label
327 | 
328 |         transform_train = transforms.Compose([
329 |             transforms.RandomResizedCrop(input_size),
330 |             transforms.RandomFlipLeftRight(),
331 |             transforms.RandomColorJitter(brightness=jitter_param, contrast=jitter_param,
332 |                                         saturation=jitter_param),
333 |             transforms.RandomLighting(lighting_param),
334 |             transforms.ToTensor(),
335 |             normalize
336 |         ])
337 |         transform_test = transforms.Compose([
338 |             transforms.Resize(resize, keep_ratio=True),
339 |             transforms.CenterCrop(input_size),
340 |             transforms.ToTensor(),
341 |             normalize
342 |         ])
343 | 
344 |         train_data = gluon.data.DataLoader(
345 |             imagenet.classification.ImageNet(data_dir, train=True).transform_first(transform_train),
346 |             batch_size=batch_size, shuffle=True, last_batch='discard', num_workers=num_workers)
347 |         val_data = gluon.data.DataLoader(
348 |             imagenet.classification.ImageNet("/home/xkr/data/ImageNet", train=False).transform_first(transform_test),
349 |             batch_size=batch_size, shuffle=False, num_workers=num_workers)
350 | 
351 |         return train_data, val_data, batch_fn
352 | 
353 |     if opt.use_rec:
354 |         train_data, val_data, batch_fn = get_data_rec(opt.rec_train, opt.rec_train_idx,
355 |                                                     opt.rec_val, opt.rec_val_idx,
356 |                                                     batch_size, num_workers)
357 |     else:
358 |         train_data, val_data, batch_fn = get_data_loader(opt.data_dir, batch_size, num_workers)
359 | 
360 |     if opt.mixup:
361 |         train_metric = mx.metric.RMSE()
362 |     else:
363 |         train_metric = mx.metric.Accuracy()
364 |     acc_top1 = mx.metric.Accuracy()
365 |     acc_top5 = mx.metric.TopKAccuracy(5)
366 | 
367 |     save_frequency = opt.save_frequency
368 |     if opt.save_dir and save_frequency:
369 |         save_dir = opt.save_dir
370 |         makedirs(save_dir)
371 |     else:
372 |         save_dir = ''
373 |         save_frequency = 0
374 | 
375 |     def mixup_transform(label, classes, lam=1, eta=0.0):
376 |         if isinstance(label, nd.NDArray):
377 |             label = [label]
378 |         res = []
379 |         for l in label:
380 |             y1 = l.one_hot(classes, on_value = 1 - eta + eta/classes, off_value = eta/classes)
381 |             y2 = l[::-1].one_hot(classes, on_value = 1 - eta + eta/classes, off_value = eta/classes)
382 |             res.append(lam*y1 + (1-lam)*y2)
383 |         return res
384 |         params = net.collect_params()
385 |         weights_3x3 = []
386 |         weights_1x1 = []
387 |         for k, param in params.items():
388 |             shape = param.shape
389 |             if len(shape)==4 and shape[1]==1 and shape[2]==3 and shape[3]==3:
390 |                 weights_3x3.append([param.data(ctx=d) for d in ctx])
391 |                 if report:
392 |                     print(k, shape)
393 |             
394 |             elif len(shape)==4 and shape[0]<shape[1] and shape[2]==1 and shape[3]==1 and shape[1]!=1280:
395 |                 weights_1x1.append([param.data(ctx=d) for d in ctx])
396 |                 if report:
397 |                     print(k, shape)
398 |         report = False
399 |         
400 |         loss = [nd.zeros((1,)).as_in_context(d) for d in ctx]
401 |         for w in weights_3x3:
402 |             for i, d in enumerate(ctx):
403 |                 wd = nd.reshape(w[i], (w[i].shape[0], 9))
404 |                 inner = nd.sum(nd.power(wd, 2), axis=1)
405 |                 inner = inner - 1.0
406 |                 loss[i] = loss[i] + nd.norm(inner, ord=2)
407 |                 
408 |         for w in weights_1x1:
409 |             for i, d in enumerate(ctx):
410 |                 wd = nd.squeeze(w[i])
411 |                 product = nd.dot(wd, wd.T)
412 |                 num = product.shape[0]
413 |                 
414 |                 mask = nd.ones((num, num), ctx=d) - nd.eye(num, ctx=d)
415 |                 product = product * mask
416 |                 loss[i] = loss[i] + nd.norm(product, ord=2)
417 |                 
418 |         return loss
419 |     
420 |     def smooth(label, classes, eta=0.1):
421 |         if isinstance(label, nd.NDArray):
422 |             label = [label]
423 |         smoothed = []
424 |         for l in label:
425 |             res = l.one_hot(classes, on_value = 1 - eta + eta/classes, off_value = eta/classes)
426 |             smoothed.append(res)
427 |         return smoothed
428 |     
429 |     global report
430 |     report = True
431 |     
432 |     
433 |     def weight_loss(net, ctx):
434 |         global report
435 |         params = net.collect_params()
436 |         keys = list(params.keys())
437 |         weights_3x3 = []
438 |         gamma_3x3 = []
439 |         var_3x3 = []
440 |         weights_1x1 = []
441 |         
442 |         for idx in range(len(keys)):            
443 |             param = params[keys[idx]]
444 |             shape = param.shape
445 |             if len(shape)==4 and shape[1]==1 and shape[2]==3 and shape[3]==3:
446 |                 weights_3x3.append([param.data(ctx=d) for d in ctx])
447 |                 gamma_3x3.append([params[keys[idx+1]].data(ctx=d) for d in ctx])
448 |                 var_3x3.append([params[keys[idx+4]].data(ctx=d) for d in ctx])
449 |                 if report:
450 |                     print(keys[idx], shape)
451 |             
452 |             elif len(shape)==4 and shape[2]==1 and shape[3]==1 and shape[0] != 1000:
453 |                 weights_1x1.append([param.data(ctx=d) for d in ctx])
454 |                 if report:
455 |                     print(keys[idx], shape)
456 |         
457 |         weight_pt = []
458 |         keys = list(pt.keys())
459 |         for key in keys:   
460 |             if "weight" in key and "features1.0" not in key and "output.0" not in key:
461 |                 weight_pt.append(pt[key])
462 | #                print(key,pt[key].shape)
463 |         
464 |         report = False
465 |         loss = [nd.zeros((1,)).as_in_context(d) for d in ctx]
466 |         for l in range(len(weight_pt)):
467 |             for i, d in enumerate(ctx):
468 |                 w3 = weights_3x3[l][i]
469 |                 w1 = weights_1x1[l][i]
470 |                 a  = gamma_3x3[l][i]
471 |                 v =  var_3x3[l][i]
472 |                 
473 |                 a = a / nd.sqrt(v + 1e-5)
474 |                 a = nd.reshape(a,(-1,1,1,1))
475 |                 w3 = a * w3
476 |                 w3 = nd.reshape(w3,(1,-1))
477 |                 w1 = nd.tile(w1,(1,1,1,9))
478 |                 w1 = nd.reshape(w1,(w1.shape[0],-1))
479 |                 wh = w1 * w3
480 |                 wh = nd.reshape(wh,(wh.shape[0],-1,3,3))
481 |                 
482 |                 diff = wh - nd.array(weight_pt[l],ctx=d)
483 |                 loss[i] = loss[i] + nd.norm(diff, ord=2)
484 |              
485 |                 
486 |         return loss
487 |     
488 |     
489 |     def regular_loss(net, ctx):
490 |         global report
491 |         params = net.collect_params()
492 |         weights_3x3 = []
493 |         weights_1x1 = []
494 |         for k, param in params.items():
495 |             shape = param.shape
496 |             if len(shape)==4 and shape[1]==1 and shape[2]==3 and shape[3]==3:
497 |                 weights_3x3.append([param.data(ctx=d) for d in ctx])
498 |                 if report:
499 |                     print(k, shape)
500 |             
501 |             elif len(shape)==4 and shape[0]<shape[1] and shape[2]==1 and shape[3]==1 and shape[1]!=1280:
502 |                 weights_1x1.append([param.data(ctx=d) for d in ctx])
503 |                 if report:
504 |                     print(k, shape)
505 |         report = False
506 |         
507 |         loss = [nd.zeros((1,)).as_in_context(d) for d in ctx]
508 |         for w in weights_3x3:
509 |             for i, d in enumerate(ctx):
510 |                 wd = nd.reshape(w[i], (w[i].shape[0], 9))
511 |                 inner = nd.sum(nd.power(wd, 2), axis=1)
512 |                 inner = inner - 1.0
513 |                 loss[i] = loss[i] + nd.norm(inner, ord=2)
514 |                 
515 |         for w in weights_1x1:
516 |             for i, d in enumerate(ctx):
517 |                 wd = nd.squeeze(w[i])
518 |                 product = nd.dot(wd, wd.T)
519 |                 num = product.shape[0]
520 |                 
521 |                 mask = nd.ones((num, num), ctx=d) - nd.eye(num, ctx=d)
522 |                 product = product * mask
523 |                 loss[i] = loss[i] + nd.norm(product, ord=2)
524 |                 
525 |         return loss
526 |     
527 |     
528 |     def quantization_regular(net, ctx):
529 |         global report
530 |         params = net.collect_params()
531 |         weights = []
532 |         scale_w = []
533 |         keys = list(params.keys())
534 |         for idx in range(len(keys)):            
535 |             param = params[keys[idx]]
536 |             shape = param.shape
537 |             if len(shape)==4 and shape[1]==1 and shape[2]==3 and shape[3]==3:
538 |                 weights.append([param.data(ctx=d) for d in ctx])
539 | #                scale_w.append([params[keys[idx+1]].data(ctx=d) for d in ctx])
540 |                 if report:
541 |                     print(keys[idx], shape)          
542 |             elif len(shape)==4 and shape[0] != 1000 and shape[2]==1 and shape[3]==1:
543 |                 weights.append([param.data(ctx=d) for d in ctx])
544 | #                scale_w.append([params[keys[idx+1]].data(ctx=d) for d in ctx])
545 |                 if report:
546 |                     print(keys[idx], shape)
547 |                     
548 |         report = False
549 |         
550 |         precision_w = 2**(4-1) - 1
551 |         loss = [nd.zeros((1,)).as_in_context(d) for d in ctx]
552 |         
553 |         for l in range(len(weights)):
554 |             for i, d in enumerate(ctx):
555 |                 weight = weights[l][i]
556 |         
557 |                 cur_weight = weight
558 |                 w_quan = cur_weight * precision_w
559 |                 w_round = nd.round(w_quan)
560 |                 diff = w_quan - w_round
561 |                 
562 |                 diff_norm = nd.norm(diff, ord=2)
563 |                 ele_num = np.sqrt(diff.shape[0]*diff.shape[1]*diff.shape[2]*diff.shape[3])
564 |                 diff_norm = diff_norm / ele_num
565 |                 loss[i] = loss[i] + diff_norm
566 |         return loss
567 |     
568 |     
569 |     def test(ctx, val_data):
570 |         if opt.use_rec:
571 |             val_data.reset()
572 |         acc_top1.reset()
573 |         acc_top5.reset()
574 |         with ag.predict_mode():
575 |             for i, batch in enumerate(val_data):
576 |                 data, label = batch_fn(batch, ctx)
577 |                 outputs = [net(X.astype(opt.dtype, copy=False)) for X in data]
578 |                 acc_top1.update(label, outputs)
579 |                 acc_top5.update(label, outputs)
580 |     
581 |             _, top1 = acc_top1.get()
582 |             _, top5 = acc_top5.get()
583 |         return (1-top1, 1-top5)
584 |     
585 |     def train(ctx):
586 |         if isinstance(ctx, mx.Context):
587 |             ctx = [ctx]
588 |         if opt.resume_params == '':
589 |             net.initialize(mx.init.Xavier(rnd_type="uniform", factor_type="in", magnitude=1), ctx=ctx)
590 | 
591 |         if opt.no_wd:
592 |             for k, v in net.collect_params('.*beta|.*gamma|.*bias').items():
593 |                 v.wd_mult = 0.0
594 | 
595 |         trainer = gluon.Trainer(net.collect_params(), optimizer, optimizer_params)
596 |         if opt.resume_states != '':
597 |             trainer.load_states(opt.resume_states)
598 | 
599 |         if opt.label_smoothing or opt.mixup:
600 |             sparse_label_loss = False
601 |         else:
602 |             sparse_label_loss = True
603 |         if distillation:
604 |             if opt.norm_distill:
605 |                 L_ndis = gluon.loss.SoftmaxCrossEntropyLoss(sparse_label=False)
606 |                 L = gluon.loss.SoftmaxCrossEntropyLoss(sparse_label=sparse_label_loss)
607 |             else:
608 |                 L = gcv.loss.DistillationSoftmaxCrossEntropyLoss(temperature=opt.temperature,
609 |                                                                      hard_weight=opt.hard_weight,
610 |                                                                      sparse_label=sparse_label_loss)
611 |         else:
612 |             L = gluon.loss.SoftmaxCrossEntropyLoss(sparse_label=sparse_label_loss)
613 | 
614 |         best_val_score = 1
615 |         
616 |         err_top1_val, err_top5_val = test(ctx, val_data)
617 |         logger.info('[Epoch %d] validation: acc-top1=%.5f acc-top5=%.5f'%(0, 1-err_top1_val, 1-err_top5_val))
618 |         
619 |         
620 |         std = []
621 |         for item in context:
622 |             std.append(nd.reshape(nd.array([58.393, 57.12, 57.375],item),(1,3,1,1)))
623 |         
624 |     
625 |         
626 |         
627 |         for epoch in range(opt.resume_epoch, opt.num_epochs):
628 |             tic = time.time()
629 |             if opt.use_rec:
630 |                 train_data.reset()
631 |             train_metric.reset()
632 |             btic = time.time()
633 |             train_loss1 = 0
634 |             train_loss2 = 0
635 |             
636 |             for i, batch in enumerate(train_data):
637 |                 data, label = batch_fn(batch, ctx)
638 |                 if opt.mixup:
639 |                     lam = np.random.beta(opt.mixup_alpha, opt.mixup_alpha)
640 |                     if epoch >= opt.num_epochs - opt.mixup_off_epoch:
641 |                         lam = 1
642 |                     data = [lam*X + (1-lam)*X[::-1] for X in data]
643 | 
644 |                     if opt.label_smoothing:
645 |                         eta = 0.1
646 |                     else:
647 |                         eta = 0.0
648 |                     label = mixup_transform(label, 
649 |                         classes, lam, eta)
650 | 
651 |                 elif opt.label_smoothing:
652 |                     hard_label = label
653 |                     label = smooth(label, classes)
654 | 
655 | 
656 |                 if distillation:
657 |                     if opt.norm_distill:
658 |                         teacher_prob = [nd.softmax(teacher(X.astype(opt.dtype, copy=False)/S)) for X,S in zip(data,std)]
659 |                     else:
660 |                         teacher_prob = [nd.softmax(teacher(X.astype(opt.dtype, copy=False)/S) / opt.temperature) \
661 |                                         for X,S in zip(data,std)]
662 | 
663 |                 with ag.record():
664 |                     if opt.norm_distill:
665 |                         outputs_tmp = [net(X.astype(opt.dtype, copy=False)) for X in data]
666 |                         outputs = [X[0] for X in outputs_tmp]  
667 |                         outputs_norm = [X[1] for X in outputs_tmp] 
668 |                     else:
669 |                         outputs = [net(X.astype(opt.dtype, copy=False)) for X in data]
670 |                     
671 |                     
672 |                     if opt.quantization_regular:
673 |                         l1 = [L(yhat, y.astype(opt.dtype, copy=False)) for yhat, y in zip(outputs, label)]
674 |                         l2 = quantization_regular(net, ctx)
675 |                         loss = [x + opt.quantizaiton_weight * y for x,y in zip(l1,l2)]
676 |                     
677 |                     elif opt.weight_regular:
678 |                         l1 = [L(yhat, y.astype(opt.dtype, copy=False)) for yhat, y in zip(outputs, label)]
679 |                         l2 = weight_loss(net, ctx)
680 |                         loss = [x + opt.weight_regular_w * y for x,y in zip(l1,l2)]
681 |                         
682 |                     elif opt.relugar:
683 |                         loss2 = regular_loss(net, ctx)
684 |                         loss2 = [l * opt.lamda for l in loss2]
685 |                         
686 |                         if distillation:
687 |                             loss = [L(yhat.astype('float32', copy=False),
688 |                                       y.astype('float32', copy=False),
689 |                                       p.astype('float32', copy=False)) + l2 for yhat, y, p, l2 in zip(outputs, label, teacher_prob, loss2)]
690 |                         else:
691 |                             loss = [L(yhat, y.astype(opt.dtype, copy=False)) + l2 for yhat, y, l2 in zip(outputs, label, loss2)]
692 |                     else:
693 |                         if distillation:
694 |                             if opt.norm_distill:
695 |                                 l1 = [L(yhat, y.astype(opt.dtype, copy=False)) for yhat, y in zip(outputs, label)]
696 |                                 l2 = [L_ndis(yhat, y.astype(opt.dtype, copy=False)) for yhat, y in zip(outputs_norm, teacher_prob)]
697 |                                 loss = [x + opt.norm_distill_w * y for x,y in zip(l1,l2)]  
698 |                             else:
699 |                                 loss = [L(yhat.astype('float32', copy=False),
700 |                                           y.astype('float32', copy=False),
701 |                                           p.astype('float32', copy=False)) for yhat, y, p in zip(outputs, label, teacher_prob)]
702 |                         else:
703 |                             loss = [L(yhat, y.astype(opt.dtype, copy=False)) for yhat, y in zip(outputs, label)]
704 |                         
705 |                 for l in loss:
706 |                     l.backward()
707 |                 trainer.step(batch_size)
708 |                 
709 |                 train_loss1 = sum([l.sum().asscalar() for l in loss])  / batch_size
710 |                 
711 |                 if opt.quantization_regular:
712 |                     train_loss2 = sum([l.sum().asscalar() for l in l2]) / opt.num_gpus
713 |                 elif opt.weight_regular:
714 |                     train_loss2 = sum([l.sum().asscalar() for l in l2]) / opt.num_gpus
715 |                 elif opt.relugar:
716 |                     train_loss2 = sum([l.sum().asscalar() for l in loss2]) / opt.num_gpus
717 |                 
718 | 
719 |                 if opt.mixup:
720 |                     output_softmax = [nd.SoftmaxActivation(out.astype('float32', copy=False)) \
721 |                                     for out in outputs]
722 |                     train_metric.update(label, output_softmax)
723 |                 else:
724 |                     if opt.label_smoothing:
725 |                         train_metric.update(hard_label, outputs)
726 |                     else:
727 |                         train_metric.update(label, outputs)
728 | 
729 |                 if opt.log_interval and not (i+1)%opt.log_interval:
730 |                     train_metric_name, train_metric_score = train_metric.get()
731 |                     logger.info('Epoch[%d] Batch [%d]\tSpeed: %dHz\t%s=%.3f lr=%.5f loss1=%.3f loss2=%.8f'%(
732 |                                 epoch, i, batch_size*opt.log_interval/(time.time()-btic),
733 |                                 train_metric_name, train_metric_score, trainer.learning_rate,
734 |                                 train_loss1, train_loss2))
735 |                     btic = time.time()
736 | 
737 |             train_metric_name, train_metric_score = train_metric.get()
738 |             throughput = int(batch_size * i /(time.time() - tic))
739 | 
740 |             err_top1_val, err_top5_val = test(ctx, val_data)
741 | 
742 |             logger.info('[Epoch %d] training: %s=%f'%(epoch, train_metric_name, train_metric_score))
743 |             logger.info('[Epoch %d] speed: %d Hz  time cost: %f'%(epoch, throughput, time.time()-tic))
744 |             logger.info('[Epoch %d] validation: acc-top1=%.5f acc-top5=%.5f'%(epoch, 1-err_top1_val, 1-err_top5_val))
745 | 
746 |             if err_top1_val < best_val_score:
747 |                 best_val_score = err_top1_val
748 | #                net.export('%s/%.4f-imagenet-%s'%(save_dir, (1-best_val_score), model_name),epoch)
749 |                 net.save_parameters('%s/%.4f-imagenet-%s-%d-best.params'%(save_dir, (1-best_val_score), model_name, epoch))
750 |                 trainer.save_states('%s/%.4f-imagenet-%s-%d-best.states'%(save_dir, (1-best_val_score), model_name, epoch))
751 | 
752 |             if save_frequency and save_dir and (epoch + 1) % save_frequency == 0:
753 |                 net.save_parameters('%s/imagenet-%s-%d.params'%(save_dir, model_name, epoch))
754 |                 trainer.save_states('%s/imagenet-%s-%d.states'%(save_dir, model_name, epoch))
755 | 
756 |         if save_frequency and save_dir:
757 |             net.save_parameters('%s/imagenet-%s-%d.params'%(save_dir, model_name, opt.num_epochs-1))
758 |             trainer.save_states('%s/imagenet-%s-%d.states'%(save_dir, model_name, opt.num_epochs-1))
759 | 
760 | 
761 |     if opt.mode == 'hybrid':
762 |         net.hybridize(static_alloc=True, static_shape=True)
763 |         if distillation:
764 |             teacher.hybridize(static_alloc=True, static_shape=True)
765 |     train(context)
766 | 
767 | if __name__ == '__main__':
768 |     main()
769 | 


--------------------------------------------------------------------------------