├── 0.6662_imagenet_etinynet_477k-294-best.params ├── README.md ├── etinynet.py ├── test_imagenet.py └── train_imagenet.py /0.6662_imagenet_etinynet_477k-294-best.params: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aztc/EtinyNet/d3270389fb09057bbcd9006520e6307ec4984518/0.6662_imagenet_etinynet_477k-294-best.params -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # EtinyNet 2 | 3 | EtinyNet is an extremely tiny CNN backbone for Tiny Machine Learning (TinyML) that aims at executing AI workloads on low-power & low-cost IoT devices with limitted memory, such as Microcontrollor (MCU) , compact Filed Programable Gata Array (FPGA) and small footprint CNNs accelerator. 4 | 5 | We currently provide two settings of EtinyNet-1.0 and EtinyNet-0.75, which have only 477K and parameters and 360K parameters. The performance of these two models on ImageNet and comparisons with other state-of-the-art lightweight models are shwon below. 6 | 7 | Table 1. Comparison of state-of-the-art small networks over classification accuracy, the model size and MAdds on ImageNet-1000 dataset. “-” mean no reported results available. The input size is 224x224. 8 | | Model| Params.(M) | Top-1 Acc. (%)| Top-5 Acc. (%)| 9 | | ---- | -- |-- |-- | 10 | | MobileNeXt-0.35 | 1.8 |64.7|-| 11 | | MnasNet-A1-0.35 | 1.7 |64.4|85.1| 12 | | MobileNetV2-0.35 | 1.7 |60.3|82.9 13 | | MicroNet-M3 | 1.6 |61.3|82.9| 14 | | MobileNetV3-Small-0.35 | 1.6 |58.0|-| 15 | | ShuffleNetV2-0.5 | 1.4 |61.1|82.6| 16 | | MicroNet-M2 | 1.4 |58.2|80.1| 17 | | MobileNetV2-0.15 | 1.4 |55.1|-| 18 | | MobileNetV1-0.5 | 1.3 |61.7|83.6| 19 | | EfficientNet-B | 1.3 |56.7|79.8| 20 | | **EtinyNet-1.0** | **0.98** |**65.5**|**86.2**| 21 | | **EtinyNet-0.75** | **0.68** |**62.2**|**84.0**| 22 | 23 | 24 | 25 | We deploy the int8 quantized EtinyNet-1.0 on STM32H743 MCU for running object classification and detection. Results are in Tabel 2. 26 | 27 | Tabel 2. Comparison to MCU designs on ImageNet. EtinyNet obtains the record accuracy of64.7% and 65.8% on STM32F412 and STM32F746. 28 | | Model| STM32F412 | STM32F746 | 29 | | ---- | -- |-- | 30 | | Rusciet al. | 60.2% |--| 31 | | MCUNet | 62.2% |63.5%| 32 | | EtinyNet-1.0 | **64.7%** |**65.8%**| 33 | 34 | In fact, the EtinyNet can exhibit its powerful performance on the specially designed CNN accelerator Neural Co-processor (NCP). Since EtinyNet comsumes only ~800KB memory (except fully-connected layer), NCP can run it in a single-chip mannar without accessing off-chip memory, saving much energy and latency caused by data transmission. We build a system based on NCP + MCU(STM32L4R9), in which the MCU runs pre-processsing and post-processing while NCP runs EtinyNet. NCP stores weights and feature maps on chip and connects with MCU via SDIO/SPI interface to transmite images and results. The system has a really simple working pipeline as: 1) MCU sends image to NCP, 2) NCP runs EtinyNet, 3) TinyNPU sends results back. With the NCP, we prompt the throughput of entire system to 30fps and reache an extremelly low processing power of 160mW (MCU + NCP)。 35 | 36 | Here's a video that presents the prototype system. 37 | 38 | [![EtinyNet](https://i9.ytimg.com/vi/mIZPxtJ-9EY/mq3.jpg?sqp=COju4ZAG&rs=AOn4CLDglN9ujGc3h1syZAd-s9PNYzD9-Q)](https://www.youtube.com/watch?v=mIZPxtJ-9EY) 39 | 40 | 41 | We provide the training code for EtinyNet-1.0 (no quantization) as well as the coresponding test code and well-trained parameters, as listed below: 42 | 43 | 1) train_imagenet.py: training code. The default input size is 224, 300 epoches and 128 batchsize x 8 gpus. 44 | 45 | 2) etinynet.py: EtinyNet-1.0. 46 | 47 | 3) 0.6553-imagenet-mobilenet_lite313_477k_nownorm_4433_224-293-best.py: well-trained parameters for EtinyNet-1.0 (no quantization) 48 | 49 | 4) test_imagenet.py : training code. 50 | 51 | The MXNet toolbox and '.rec' compressed ImageNet data file were used for training efficiency. Please refer to https://mxnet.incubator.apache.org/versions/1.9.0/ for more detail about '.rec' data. 52 | 53 | 54 | -------------------------------------------------------------------------------- /etinynet.py: -------------------------------------------------------------------------------- 1 | from mxnet.gluon import nn 2 | from mxnet.gluon.nn import BatchNorm 3 | from mxnet.context import cpu 4 | from mxnet.gluon.block import HybridBlock 5 | from gluoncv.nn import ReLU6 6 | import mxnet as mx 7 | from gluoncv.model_zoo.ssd import get_ssd 8 | from gluoncv.model_zoo import get_model 9 | from gluoncv.nn.coder import MultiPerClassDecoder, NormalizedBoxCenterDecoder 10 | from gluoncv.model_zoo.ssd.anchor import SSDAnchorGenerator 11 | from mxnet import autograd 12 | import numpy as np 13 | 14 | 15 | class Clip(HybridBlock): 16 | def __init__(self, **kwargs): 17 | super(Clip, self).__init__(**kwargs) 18 | def hybrid_forward(self, F, x): 19 | return F.clip(x, -12, 12, name="clip") 20 | 21 | 22 | # pylint: disable= too-many-arguments 23 | def _add_conv(out, channels=1, kernel=1, stride=1, pad=0, norm=True, in_channels=0, quantized=True, 24 | num_group=1, active=True, relu6=False, norm_layer=BatchNorm, norm_kwargs=None): 25 | out.add(nn.Conv2D(channels, kernel, stride, pad, groups=num_group, use_bias=False,in_channels=in_channels)) 26 | if active: 27 | out.add(ReLU6() if relu6 else nn.Activation('relu')) 28 | if norm: 29 | out.add(norm_layer(scale=True, center=True, **({} if norm_kwargs is None else norm_kwargs))) 30 | 31 | 32 | class LinearBottleneck(nn.HybridBlock): 33 | r"""LinearBottleneck used in MobileNetV2 model from the 34 | `"Inverted Residuals and Linear Bottlenecks: 35 | Mobile Networks for Classification, Detection and Segmentation" 36 | `_ paper. 37 | 38 | Parameters 39 | ---------- 40 | in_channels : int 41 | Number of input channels. 42 | channels : int 43 | Number of output channels. 44 | t : int 45 | Layer expansion ratio. 46 | stride : int 47 | stride 48 | norm_layer : object 49 | Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`) 50 | Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`. 51 | norm_kwargs : dict 52 | Additional `norm_layer` arguments, for example `num_devices=4` 53 | for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`. 54 | """ 55 | 56 | def __init__(self, channels, stride, shortcut, 57 | norm_layer=BatchNorm, norm_kwargs=None, **kwargs): 58 | super(LinearBottleneck, self).__init__(**kwargs) 59 | self.use_shortcut1 = (stride == 1 and channels[0] == channels[1] and shortcut) 60 | self.use_shortcut2 = shortcut 61 | with self.name_scope(): 62 | self.out1 = nn.HybridSequential() # 1x1 63 | self.out2 = nn.HybridSequential() # 1x1 64 | 65 | _add_conv(self.out1, 66 | in_channels=channels[0], 67 | channels=channels[0], 68 | kernel=3, 69 | stride=stride, 70 | pad=1, 71 | num_group=channels[0], 72 | active=False, 73 | relu6=False, 74 | norm_layer=norm_layer, norm_kwargs=norm_kwargs) 75 | _add_conv(self.out1, 76 | in_channels=channels[0], 77 | channels=channels[1], 78 | active=True, 79 | relu6=False, 80 | norm_layer=norm_layer, norm_kwargs=norm_kwargs) 81 | _add_conv(self.out2, 82 | in_channels=channels[1], 83 | channels=channels[1], 84 | kernel=3, 85 | stride=1, 86 | pad=1, 87 | num_group=channels[1], 88 | active=True, 89 | relu6=False, 90 | norm_layer=norm_layer, norm_kwargs=norm_kwargs) 91 | 92 | 93 | def hybrid_forward(self, F, x): 94 | out = self.out1(x) 95 | if self.use_shortcut1: 96 | out = F.elemwise_add(out, x) 97 | x = out 98 | out = self.out2(out) 99 | if self.use_shortcut2: 100 | out = F.elemwise_add(out, x) 101 | return out 102 | 103 | 104 | 105 | class Etinynet(nn.HybridBlock): 106 | r"""MobileNetV2 model from the 107 | `"Inverted Residuals and Linear Bottlenecks: 108 | Mobile Networks for Classification, Detection and Segmentation" 109 | `_ paper. 110 | Parameters 111 | ---------- 112 | multiplier : float, default 1.0 113 | The width multiplier for controlling the model size. The actual number of channels 114 | is equal to the original channel size multiplied by this multiplier. 115 | classes : int, default 1000 116 | Number of classes for the output layer. 117 | norm_layer : object 118 | Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`) 119 | Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`. 120 | norm_kwargs : dict 121 | Additional `norm_layer` arguments, for example `num_devices=4` 122 | for :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`. 123 | """ 124 | def __init__(self, multiplier=1.0, classes=1000, norm_layer=BatchNorm, norm_kwargs=None, 125 | ctx=cpu(), root='', pretrained=False, **kwargs): 126 | super(Etinynet, self).__init__(**kwargs) 127 | 128 | with self.name_scope(): 129 | self.features1 = nn.HybridSequential(prefix='features1_') 130 | self.features2 = nn.HybridSequential(prefix='features2_') 131 | self.features3 = nn.HybridSequential(prefix='features2_') 132 | with self.features1.name_scope(): 133 | _add_conv(self.features1, int(32 * multiplier), kernel=3, 134 | stride=(2,2), pad=1, relu6=False,in_channels=3, quantized=True, 135 | norm_layer=norm_layer, norm_kwargs=norm_kwargs) 136 | 137 | self.features1.add(nn.MaxPool2D((2,2))) 138 | channels_group = [[32, 32], [32, 32], [32, 32], [32, 32], 139 | [32, 128], [128, 128], [128, 128], [128, 128]] 140 | strides = [1,1,1,1] + [2,1,1,1] 141 | shortcuts = [0,0,0,0] + [0,0,0,0] 142 | for cg, s, sc in zip(channels_group, strides, shortcuts): 143 | self.features1.add(LinearBottleneck(channels=np.int32(np.array(cg)*multiplier), 144 | stride=s, 145 | shortcut=sc, 146 | norm_layer=norm_layer, 147 | norm_kwargs=norm_kwargs)) 148 | 149 | 150 | channels_group = [[128, 192], [192, 192], [192, 192]] 151 | strides = [2,1,1] 152 | shortcuts = [1,1,1] 153 | for cg, s, sc in zip(channels_group, strides, shortcuts): 154 | self.features2.add(LinearBottleneck(channels=np.int32(np.array(cg)*multiplier), 155 | stride=s, 156 | shortcut=sc, 157 | norm_layer=norm_layer, 158 | norm_kwargs=norm_kwargs)) 159 | 160 | 161 | channels_group = [[192, 256], [256, 256], [256, 512]] 162 | strides = [2,1,1] 163 | shortcuts = [1,1,1] 164 | for cg, s, sc in zip(channels_group, strides, shortcuts): 165 | self.features3.add(LinearBottleneck(channels=np.int32(np.array(cg)*multiplier), 166 | stride=s, 167 | shortcut=sc, 168 | norm_layer=norm_layer, 169 | norm_kwargs=norm_kwargs)) 170 | self.avg = nn.GlobalAvgPool2D() 171 | 172 | 173 | self.output = nn.HybridSequential(prefix='output_') 174 | with self.output.name_scope(): 175 | self.output.add( 176 | nn.Conv2D(classes, 1, in_channels=int(512*multiplier),prefix='pred_'), 177 | nn.Flatten()) 178 | 179 | 180 | 181 | def hybrid_forward(self, F, x): 182 | x1 = self.features1(x) 183 | x2 = self.features2(x1) 184 | x3 = self.features3(x2) 185 | x = self.avg(x3) 186 | x = self.output(x) 187 | return x 188 | 189 | 190 | 191 | if __name__ == "__main__": 192 | model = Etinynet() 193 | model.initialize() 194 | model.summary(mx.ndarray.zeros((1,3,256,256))) 195 | 196 | 197 | -------------------------------------------------------------------------------- /test_imagenet.py: -------------------------------------------------------------------------------- 1 | import argparse, time, logging, os, math 2 | #os.environ["CUDA_VISIBLE_DEVICES"] = '4,5,6,7' 3 | 4 | import numpy as np 5 | import mxnet as mx 6 | import gluoncv as gcv 7 | from mxnet import gluon, nd 8 | from mxnet import autograd as ag 9 | from mxnet.gluon.data.vision import transforms 10 | 11 | import gluoncv as gcv 12 | gcv.utils.check_version('0.6.0') 13 | from gluoncv.data import imagenet 14 | from gluoncv.model_zoo import get_model 15 | from gluoncv.utils import makedirs, LRSequential, LRScheduler 16 | 17 | os.environ["CUDA_VISIBLE_DEVICES"] = '4,5,6,7' 18 | 19 | # CLI 20 | def parse_args(): 21 | parser = argparse.ArgumentParser(description='Train a model for image classification.') 22 | parser.add_argument('--data-dir', type=str, default='~/.mxnet/datasets/imagenet', 23 | help='training and validation pictures to use.') 24 | parser.add_argument('--rec-train', type=str, default='~/.mxnet/datasets/imagenet/rec/train.rec', 25 | help='the training data') 26 | parser.add_argument('--rec-train-idx', type=str, default='~/.mxnet/datasets/imagenet/rec/train.idx', 27 | help='the index of training data') 28 | parser.add_argument('--rec-val', type=str, default='~/.mxnet/datasets/imagenet/rec/val.rec', 29 | help='the validation data') 30 | parser.add_argument('--rec-val-idx', type=str, default='~/.mxnet/datasets/imagenet/rec/val.idx', 31 | help='the index of validation data') 32 | parser.add_argument('--use-rec', action='store_true', 33 | help='use image record iter for data input. default is false.') 34 | parser.add_argument('--batch-size', type=int, default=32, 35 | help='training batch size per device (CPU/GPU).') 36 | parser.add_argument('--dtype', type=str, default='float32', 37 | help='data type for training. default is float32') 38 | parser.add_argument('--num-gpus', type=int, default=0, 39 | help='number of gpus to use.') 40 | parser.add_argument('-j', '--num-data-workers', dest='num_workers', default=4, type=int, 41 | help='number of preprocessing workers') 42 | parser.add_argument('--num-epochs', type=int, default=3, 43 | help='number of training epochs.') 44 | parser.add_argument('--lr', type=float, default=0.1, 45 | help='learning rate. default is 0.1.') 46 | parser.add_argument('--momentum', type=float, default=0.9, 47 | help='momentum value for optimizer, default is 0.9.') 48 | parser.add_argument('--wd', type=float, default=0.0001, 49 | help='weight decay rate. default is 0.0001.') 50 | parser.add_argument('--lr-mode', type=str, default='step', 51 | help='learning rate scheduler mode. options are step, poly and cosine.') 52 | parser.add_argument('--lr-decay', type=float, default=0.1, 53 | help='decay rate of learning rate. default is 0.1.') 54 | parser.add_argument('--lr-decay-period', type=int, default=0, 55 | help='interval for periodic learning rate decays. default is 0 to disable.') 56 | parser.add_argument('--lr-decay-epoch', type=str, default='40,60', 57 | help='epochs at which learning rate decays. default is 40,60.') 58 | parser.add_argument('--warmup-lr', type=float, default=0.0, 59 | help='starting warmup learning rate. default is 0.0.') 60 | parser.add_argument('--warmup-epochs', type=int, default=0, 61 | help='number of warmup epochs.') 62 | parser.add_argument('--last-gamma', action='store_true', 63 | help='whether to init gamma of the last BN layer in each bottleneck to 0.') 64 | parser.add_argument('--mode', type=str, 65 | help='mode in which to train the model. options are symbolic, imperative, hybrid') 66 | parser.add_argument('--model', type=str, 67 | help='type of model to use. see vision_model for options.') 68 | parser.add_argument('--input-size', type=int, default=224, 69 | help='size of the input image size. default is 224') 70 | parser.add_argument('--crop-ratio', type=float, default=0.875, 71 | help='Crop ratio during validation. default is 0.875') 72 | parser.add_argument('--use-pretrained', action='store_true', 73 | help='enable using pretrained model from gluon.') 74 | parser.add_argument('--use_se', action='store_true', 75 | help='use SE layers or not in resnext. default is false.') 76 | parser.add_argument('--mixup', action='store_true', 77 | help='whether train the model with mix-up. default is false.') 78 | parser.add_argument('--mixup-alpha', type=float, default=0.2, 79 | help='beta distribution parameter for mixup sampling, default is 0.2.') 80 | parser.add_argument('--mixup-off-epoch', type=int, default=0, 81 | help='how many last epochs to train without mixup, default is 0.') 82 | parser.add_argument('--label-smoothing', action='store_true', 83 | help='use label smoothing or not in training. default is false.') 84 | parser.add_argument('--no-wd', action='store_true', 85 | help='whether to remove weight decay on bias, and beta/gamma for batchnorm layers.') 86 | parser.add_argument('--teacher', type=str, default=None, 87 | help='teacher model for distillation training') 88 | parser.add_argument('--temperature', type=float, default=20, 89 | help='temperature parameter for distillation teacher model') 90 | parser.add_argument('--hard-weight', type=float, default=0.5, 91 | help='weight for the loss of one-hot label for distillation training') 92 | parser.add_argument('--batch-norm', action='store_true', 93 | help='enable batch normalization or not in vgg. default is false.') 94 | parser.add_argument('--save-frequency', type=int, default=10, 95 | help='frequency of model saving.') 96 | parser.add_argument('--save-dir', type=str, default='params', 97 | help='directory of saved models') 98 | parser.add_argument('--resume-epoch', type=int, default=0, 99 | help='epoch to resume training from.') 100 | parser.add_argument('--resume-params', type=str, default='', 101 | help='path of parameters to load from.') 102 | parser.add_argument('--resume-states', type=str, default='', 103 | help='path of trainer state to load from.') 104 | parser.add_argument('--log-interval', type=int, default=50, 105 | help='Number of batches to wait before logging.') 106 | parser.add_argument('--logging-file', type=str, default='train_imagenet.log', 107 | help='name of training log file') 108 | parser.add_argument('--use-gn', action='store_true', 109 | help='whether to use group norm.') 110 | opt = parser.parse_args() 111 | return opt 112 | 113 | 114 | def main(): 115 | opt = parse_args() 116 | 117 | opt.rec_train = '/home/xkr/ramdisk/imagenet_train.rec' 118 | opt.rec_train_idx = '/home/xkr/ramdisk/imagenet_train.idx' 119 | opt.rec_val = '/home/xkr/ramdisk/imagenet_val.rec' 120 | opt.rec_val_idx = '/home/xkr/ramdisk/imagenet_val.idx' 121 | 122 | opt.use_rec = True 123 | opt.model = 'test' 124 | opt.relugar = False 125 | opt.lamda = 0.001 126 | opt.mode = 'hybrid' 127 | opt.dtype = "float32" 128 | opt.lr = 0.001 129 | opt.batch_size = 128 130 | opt.num_gpus = 8 131 | opt.input_size = 224 132 | 133 | 134 | opt.resume_epoch = 0 135 | opt.resume_params = "0.6553-imagenet-mobilenet_lite313_477k_nownorm_4433_224-293-best.params" 136 | 137 | 138 | filehandler = logging.FileHandler(opt.logging_file) 139 | streamhandler = logging.StreamHandler() 140 | 141 | logger = logging.getLogger('') 142 | logger.setLevel(logging.INFO) 143 | logger.addHandler(filehandler) 144 | logger.addHandler(streamhandler) 145 | 146 | logger.info(opt) 147 | 148 | batch_size = opt.batch_size 149 | classes = 1000 150 | num_training_samples = 1281167 151 | 152 | num_gpus = opt.num_gpus 153 | batch_size *= max(1, num_gpus) 154 | context = [mx.gpu(i) for i in range(num_gpus)] if num_gpus > 0 else [mx.cpu()] 155 | num_workers = opt.num_workers 156 | 157 | lr_decay = opt.lr_decay 158 | lr_decay_period = opt.lr_decay_period 159 | if opt.lr_decay_period > 0: 160 | lr_decay_epoch = list(range(lr_decay_period, opt.num_epochs, lr_decay_period)) 161 | else: 162 | lr_decay_epoch = [int(i) for i in opt.lr_decay_epoch.split(',')] 163 | lr_decay_epoch = [e - opt.warmup_epochs for e in lr_decay_epoch] 164 | num_batches = num_training_samples // batch_size 165 | 166 | lr_scheduler = LRSequential([ 167 | LRScheduler(opt.lr_mode, base_lr=opt.lr, target_lr=1e-5, 168 | nepochs=opt.num_epochs - opt.warmup_epochs, 169 | iters_per_epoch=num_batches, 170 | step_epoch=lr_decay_epoch, 171 | step_factor=lr_decay) 172 | ]) 173 | 174 | model_name = opt.model 175 | 176 | kwargs = {'ctx': context, 'pretrained': opt.use_pretrained, 'classes': classes} 177 | if opt.use_gn: 178 | kwargs['norm_layer'] = gcv.nn.GroupNorm 179 | if model_name.startswith('vgg'): 180 | kwargs['batch_norm'] = opt.batch_norm 181 | elif model_name.startswith('resnext'): 182 | kwargs['use_se'] = opt.use_se 183 | 184 | if opt.last_gamma: 185 | kwargs['last_gamma'] = True 186 | 187 | optimizer = 'nag' 188 | optimizer_params = {'wd': opt.wd, 'momentum': opt.momentum, 'lr_scheduler': lr_scheduler} 189 | if opt.dtype != 'float32': 190 | optimizer_params['multi_precision'] = True 191 | 192 | 193 | from etinynet import Etinynet 194 | net = Etinynet(classes=1000) 195 | 196 | net.cast(opt.dtype) 197 | if opt.resume_params != '': 198 | net.load_parameters(opt.resume_params, ctx = context) 199 | # net.collect_params().load(opt.resume_params, ctx = context) 200 | 201 | # teacher model for distillation training 202 | if opt.teacher is not None and opt.hard_weight < 1.0: 203 | teacher_name = opt.teacher 204 | teacher = get_model(teacher_name, pretrained=True, classes=classes, ctx=context) 205 | teacher.cast(opt.dtype) 206 | distillation = True 207 | else: 208 | distillation = False 209 | 210 | 211 | 212 | # Two functions for reading data from record file or raw images 213 | def get_data_rec(rec_train, rec_train_idx, rec_val, rec_val_idx, batch_size, num_workers): 214 | rec_train = os.path.expanduser(rec_train) 215 | rec_train_idx = os.path.expanduser(rec_train_idx) 216 | rec_val = os.path.expanduser(rec_val) 217 | rec_val_idx = os.path.expanduser(rec_val_idx) 218 | jitter_param = 0.4 219 | lighting_param = 0.1 220 | input_size = opt.input_size 221 | crop_ratio = opt.crop_ratio if opt.crop_ratio > 0 else 0.875 222 | resize = int(math.ceil(input_size / crop_ratio)) 223 | mean_rgb = [128, 128, 128] 224 | std_rgb = [1, 1, 1] 225 | # mean_rgb = [123.68, 116.779, 103.939] 226 | # std_rgb = [58.393, 57.12, 57.375] 227 | 228 | def batch_fn(batch, ctx): 229 | data = gluon.utils.split_and_load(batch.data[0], ctx_list=ctx, batch_axis=0) 230 | label = gluon.utils.split_and_load(batch.label[0], ctx_list=ctx, batch_axis=0) 231 | return data, label 232 | 233 | train_data = mx.io.ImageRecordIter( 234 | path_imgrec = rec_train, 235 | path_imgidx = rec_train_idx, 236 | preprocess_threads = num_workers, 237 | shuffle = True, 238 | batch_size = batch_size, 239 | 240 | data_shape = (3, input_size, input_size), 241 | mean_r = mean_rgb[0], 242 | mean_g = mean_rgb[1], 243 | mean_b = mean_rgb[2], 244 | std_r = std_rgb[0], 245 | std_g = std_rgb[1], 246 | std_b = std_rgb[2], 247 | rand_mirror = True, 248 | random_resized_crop = True, 249 | max_aspect_ratio = 4. / 3., 250 | min_aspect_ratio = 3. / 4., 251 | max_random_area = 1, 252 | min_random_area = 0.16, 253 | brightness = jitter_param, 254 | saturation = jitter_param, 255 | contrast = jitter_param, 256 | # pca_noise = lighting_param, 257 | ) 258 | val_data = mx.io.ImageRecordIter( 259 | path_imgrec = rec_val, 260 | path_imgidx = rec_val_idx, 261 | preprocess_threads = num_workers, 262 | shuffle = False, 263 | batch_size = batch_size, 264 | 265 | resize = resize, 266 | data_shape = (3, input_size, input_size), 267 | mean_r = mean_rgb[0], 268 | mean_g = mean_rgb[1], 269 | mean_b = mean_rgb[2], 270 | std_r = std_rgb[0], 271 | std_g = std_rgb[1], 272 | std_b = std_rgb[2], 273 | ) 274 | return train_data, val_data, batch_fn 275 | 276 | def get_data_loader(data_dir, batch_size, num_workers): 277 | normalize = transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) 278 | jitter_param = 0.4 279 | lighting_param = 0.1 280 | input_size = opt.input_size 281 | crop_ratio = opt.crop_ratio if opt.crop_ratio > 0 else 0.875 282 | resize = int(math.ceil(input_size / crop_ratio)) 283 | 284 | def batch_fn(batch, ctx): 285 | data = gluon.utils.split_and_load(batch[0], ctx_list=ctx, batch_axis=0) 286 | label = gluon.utils.split_and_load(batch[1], ctx_list=ctx, batch_axis=0) 287 | return data, label 288 | 289 | transform_train = transforms.Compose([ 290 | transforms.RandomResizedCrop(input_size), 291 | transforms.RandomFlipLeftRight(), 292 | transforms.RandomColorJitter(brightness=jitter_param, contrast=jitter_param, 293 | saturation=jitter_param), 294 | transforms.RandomLighting(lighting_param), 295 | transforms.ToTensor(), 296 | normalize 297 | ]) 298 | transform_test = transforms.Compose([ 299 | transforms.Resize(resize, keep_ratio=True), 300 | transforms.CenterCrop(input_size), 301 | transforms.ToTensor(), 302 | normalize 303 | ]) 304 | 305 | train_data = gluon.data.DataLoader( 306 | imagenet.classification.ImageNet(data_dir, train=True).transform_first(transform_train), 307 | batch_size=batch_size, shuffle=True, last_batch='discard', num_workers=num_workers) 308 | val_data = gluon.data.DataLoader( 309 | imagenet.classification.ImageNet("/home/xkr/data/ImageNet", train=False).transform_first(transform_test), 310 | batch_size=batch_size, shuffle=False, num_workers=num_workers) 311 | 312 | return train_data, val_data, batch_fn 313 | 314 | if opt.use_rec: 315 | train_data, val_data, batch_fn = get_data_rec(opt.rec_train, opt.rec_train_idx, 316 | opt.rec_val, opt.rec_val_idx, 317 | batch_size, num_workers) 318 | else: 319 | train_data, val_data, batch_fn = get_data_loader(opt.data_dir, batch_size, num_workers) 320 | 321 | if opt.mixup: 322 | train_metric = mx.metric.RMSE() 323 | else: 324 | train_metric = mx.metric.Accuracy() 325 | acc_top1 = mx.metric.Accuracy() 326 | acc_top5 = mx.metric.TopKAccuracy(5) 327 | 328 | save_frequency = opt.save_frequency 329 | if opt.save_dir and save_frequency: 330 | save_dir = opt.save_dir 331 | makedirs(save_dir) 332 | else: 333 | save_dir = '' 334 | save_frequency = 0 335 | 336 | def mixup_transform(label, classes, lam=1, eta=0.0): 337 | if isinstance(label, nd.NDArray): 338 | label = [label] 339 | res = [] 340 | for l in label: 341 | y1 = l.one_hot(classes, on_value = 1 - eta + eta/classes, off_value = eta/classes) 342 | y2 = l[::-1].one_hot(classes, on_value = 1 - eta + eta/classes, off_value = eta/classes) 343 | res.append(lam*y1 + (1-lam)*y2) 344 | return res 345 | params = net.collect_params() 346 | weights_3x3 = [] 347 | weights_1x1 = [] 348 | for k, param in params.items(): 349 | shape = param.shape 350 | if len(shape)==4 and shape[1]==1 and shape[2]==3 and shape[3]==3: 351 | weights_3x3.append([param.data(ctx=d) for d in ctx]) 352 | if report: 353 | print(k, shape) 354 | 355 | elif len(shape)==4 and shape[0] 0 else [mx.cpu()] 184 | num_workers = opt.num_workers 185 | 186 | lr_decay = opt.lr_decay 187 | lr_decay_period = opt.lr_decay_period 188 | if opt.lr_decay_period > 0: 189 | lr_decay_epoch = list(range(lr_decay_period, opt.num_epochs, lr_decay_period)) 190 | else: 191 | lr_decay_epoch = [int(i) for i in opt.lr_decay_epoch.split(',')] 192 | lr_decay_epoch = [e - opt.warmup_epochs for e in lr_decay_epoch] 193 | num_batches = num_training_samples // batch_size 194 | 195 | lr_scheduler = LRSequential([ 196 | LRScheduler(opt.lr_mode, base_lr=opt.lr, target_lr=1e-5, 197 | nepochs=opt.num_epochs - opt.warmup_epochs, 198 | iters_per_epoch=num_batches, 199 | step_epoch=lr_decay_epoch, 200 | step_factor=lr_decay) 201 | ]) 202 | 203 | model_name = opt.model 204 | 205 | kwargs = {'ctx': context, 'pretrained': opt.use_pretrained, 'classes': classes} 206 | if opt.use_gn: 207 | kwargs['norm_layer'] = gcv.nn.GroupNorm 208 | if model_name.startswith('vgg'): 209 | kwargs['batch_norm'] = opt.batch_norm 210 | elif model_name.startswith('resnext'): 211 | kwargs['use_se'] = opt.use_se 212 | 213 | if opt.last_gamma: 214 | kwargs['last_gamma'] = True 215 | 216 | optimizer = 'nag' 217 | optimizer_params = {'wd': opt.wd, 'momentum': opt.momentum, 'lr_scheduler': lr_scheduler} 218 | if opt.dtype != 'float32': 219 | optimizer_params['multi_precision'] = True 220 | 221 | 222 | if opt.norm_distill: 223 | from models.mobilenet_lite_distill import MobileNetV2 224 | kwargs = {'norm': opt.norm,'classes':1000,'switchout':2} 225 | net = MobileNetV2(**kwargs) 226 | else: 227 | from etinynet import Etinynet 228 | net = Etinynet(classes=1000) 229 | 230 | net.cast(opt.dtype) 231 | if opt.resume_params != '': 232 | net.load_parameters(opt.resume_params, ctx = context, allow_missing=True) 233 | net.initialize(ctx=context) 234 | 235 | # teacher model for distillation training 236 | if opt.teacher is not None and opt.hard_weight < 1.0 and not opt.norm_distill: 237 | teacher_name = opt.teacher 238 | teacher = get_model(teacher_name, pretrained=True, classes=classes, ctx=context) 239 | teacher.cast(opt.dtype) 240 | distillation = True 241 | else: 242 | distillation = False 243 | 244 | if opt.norm_distill: 245 | from models.mobilenetv2_distill import mobilenet_v2_1_0 246 | kwargs = {'norm': opt.norm, 'classes':1000,'switchout':1} 247 | teacher = mobilenet_v2_1_0(**kwargs) 248 | teacher.load_parameters(opt.tea_net_params, ctx = context) 249 | distillation = True 250 | 251 | # Two functions for reading data from record file or raw images 252 | def get_data_rec(rec_train, rec_train_idx, rec_val, rec_val_idx, batch_size, num_workers): 253 | rec_train = os.path.expanduser(rec_train) 254 | rec_train_idx = os.path.expanduser(rec_train_idx) 255 | rec_val = os.path.expanduser(rec_val) 256 | rec_val_idx = os.path.expanduser(rec_val_idx) 257 | jitter_param = 0.4 258 | lighting_param = 0.1 259 | input_size = opt.input_size 260 | crop_ratio = opt.crop_ratio if opt.crop_ratio > 0 else 0.875 261 | resize = int(math.ceil(input_size / crop_ratio)) 262 | mean_rgb = [128, 128, 128] 263 | std_rgb = [1, 1, 1] 264 | # mean_rgb = [123.68, 116.779, 103.939] 265 | # std_rgb = [58.393, 57.12, 57.375] 266 | 267 | def batch_fn(batch, ctx): 268 | data = gluon.utils.split_and_load(batch.data[0], ctx_list=ctx, batch_axis=0) 269 | label = gluon.utils.split_and_load(batch.label[0], ctx_list=ctx, batch_axis=0) 270 | return data, label 271 | 272 | train_data = mx.io.ImageRecordIter( 273 | path_imgrec = rec_train, 274 | path_imgidx = rec_train_idx, 275 | preprocess_threads = num_workers, 276 | shuffle = True, 277 | batch_size = batch_size, 278 | 279 | data_shape = (3, input_size, input_size), 280 | mean_r = mean_rgb[0], 281 | mean_g = mean_rgb[1], 282 | mean_b = mean_rgb[2], 283 | std_r = std_rgb[0], 284 | std_g = std_rgb[1], 285 | std_b = std_rgb[2], 286 | rand_mirror = True, 287 | random_resized_crop = True, 288 | max_aspect_ratio = 4. / 3., 289 | min_aspect_ratio = 3. / 4., 290 | max_random_area = 1, 291 | min_random_area = 0.16, 292 | brightness = jitter_param, 293 | saturation = jitter_param, 294 | contrast = jitter_param, 295 | # pca_noise = lighting_param, 296 | ) 297 | val_data = mx.io.ImageRecordIter( 298 | path_imgrec = rec_val, 299 | path_imgidx = rec_val_idx, 300 | preprocess_threads = num_workers, 301 | shuffle = False, 302 | batch_size = batch_size, 303 | 304 | resize = resize, 305 | data_shape = (3, input_size, input_size), 306 | mean_r = mean_rgb[0], 307 | mean_g = mean_rgb[1], 308 | mean_b = mean_rgb[2], 309 | std_r = std_rgb[0], 310 | std_g = std_rgb[1], 311 | std_b = std_rgb[2], 312 | ) 313 | return train_data, val_data, batch_fn 314 | 315 | def get_data_loader(data_dir, batch_size, num_workers): 316 | normalize = transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) 317 | jitter_param = 0.4 318 | lighting_param = 0.1 319 | input_size = opt.input_size 320 | crop_ratio = opt.crop_ratio if opt.crop_ratio > 0 else 0.875 321 | resize = int(math.ceil(input_size / crop_ratio)) 322 | 323 | def batch_fn(batch, ctx): 324 | data = gluon.utils.split_and_load(batch[0], ctx_list=ctx, batch_axis=0) 325 | label = gluon.utils.split_and_load(batch[1], ctx_list=ctx, batch_axis=0) 326 | return data, label 327 | 328 | transform_train = transforms.Compose([ 329 | transforms.RandomResizedCrop(input_size), 330 | transforms.RandomFlipLeftRight(), 331 | transforms.RandomColorJitter(brightness=jitter_param, contrast=jitter_param, 332 | saturation=jitter_param), 333 | transforms.RandomLighting(lighting_param), 334 | transforms.ToTensor(), 335 | normalize 336 | ]) 337 | transform_test = transforms.Compose([ 338 | transforms.Resize(resize, keep_ratio=True), 339 | transforms.CenterCrop(input_size), 340 | transforms.ToTensor(), 341 | normalize 342 | ]) 343 | 344 | train_data = gluon.data.DataLoader( 345 | imagenet.classification.ImageNet(data_dir, train=True).transform_first(transform_train), 346 | batch_size=batch_size, shuffle=True, last_batch='discard', num_workers=num_workers) 347 | val_data = gluon.data.DataLoader( 348 | imagenet.classification.ImageNet("/home/xkr/data/ImageNet", train=False).transform_first(transform_test), 349 | batch_size=batch_size, shuffle=False, num_workers=num_workers) 350 | 351 | return train_data, val_data, batch_fn 352 | 353 | if opt.use_rec: 354 | train_data, val_data, batch_fn = get_data_rec(opt.rec_train, opt.rec_train_idx, 355 | opt.rec_val, opt.rec_val_idx, 356 | batch_size, num_workers) 357 | else: 358 | train_data, val_data, batch_fn = get_data_loader(opt.data_dir, batch_size, num_workers) 359 | 360 | if opt.mixup: 361 | train_metric = mx.metric.RMSE() 362 | else: 363 | train_metric = mx.metric.Accuracy() 364 | acc_top1 = mx.metric.Accuracy() 365 | acc_top5 = mx.metric.TopKAccuracy(5) 366 | 367 | save_frequency = opt.save_frequency 368 | if opt.save_dir and save_frequency: 369 | save_dir = opt.save_dir 370 | makedirs(save_dir) 371 | else: 372 | save_dir = '' 373 | save_frequency = 0 374 | 375 | def mixup_transform(label, classes, lam=1, eta=0.0): 376 | if isinstance(label, nd.NDArray): 377 | label = [label] 378 | res = [] 379 | for l in label: 380 | y1 = l.one_hot(classes, on_value = 1 - eta + eta/classes, off_value = eta/classes) 381 | y2 = l[::-1].one_hot(classes, on_value = 1 - eta + eta/classes, off_value = eta/classes) 382 | res.append(lam*y1 + (1-lam)*y2) 383 | return res 384 | params = net.collect_params() 385 | weights_3x3 = [] 386 | weights_1x1 = [] 387 | for k, param in params.items(): 388 | shape = param.shape 389 | if len(shape)==4 and shape[1]==1 and shape[2]==3 and shape[3]==3: 390 | weights_3x3.append([param.data(ctx=d) for d in ctx]) 391 | if report: 392 | print(k, shape) 393 | 394 | elif len(shape)==4 and shape[0]= opt.num_epochs - opt.mixup_off_epoch: 641 | lam = 1 642 | data = [lam*X + (1-lam)*X[::-1] for X in data] 643 | 644 | if opt.label_smoothing: 645 | eta = 0.1 646 | else: 647 | eta = 0.0 648 | label = mixup_transform(label, 649 | classes, lam, eta) 650 | 651 | elif opt.label_smoothing: 652 | hard_label = label 653 | label = smooth(label, classes) 654 | 655 | 656 | if distillation: 657 | if opt.norm_distill: 658 | teacher_prob = [nd.softmax(teacher(X.astype(opt.dtype, copy=False)/S)) for X,S in zip(data,std)] 659 | else: 660 | teacher_prob = [nd.softmax(teacher(X.astype(opt.dtype, copy=False)/S) / opt.temperature) \ 661 | for X,S in zip(data,std)] 662 | 663 | with ag.record(): 664 | if opt.norm_distill: 665 | outputs_tmp = [net(X.astype(opt.dtype, copy=False)) for X in data] 666 | outputs = [X[0] for X in outputs_tmp] 667 | outputs_norm = [X[1] for X in outputs_tmp] 668 | else: 669 | outputs = [net(X.astype(opt.dtype, copy=False)) for X in data] 670 | 671 | 672 | if opt.quantization_regular: 673 | l1 = [L(yhat, y.astype(opt.dtype, copy=False)) for yhat, y in zip(outputs, label)] 674 | l2 = quantization_regular(net, ctx) 675 | loss = [x + opt.quantizaiton_weight * y for x,y in zip(l1,l2)] 676 | 677 | elif opt.weight_regular: 678 | l1 = [L(yhat, y.astype(opt.dtype, copy=False)) for yhat, y in zip(outputs, label)] 679 | l2 = weight_loss(net, ctx) 680 | loss = [x + opt.weight_regular_w * y for x,y in zip(l1,l2)] 681 | 682 | elif opt.relugar: 683 | loss2 = regular_loss(net, ctx) 684 | loss2 = [l * opt.lamda for l in loss2] 685 | 686 | if distillation: 687 | loss = [L(yhat.astype('float32', copy=False), 688 | y.astype('float32', copy=False), 689 | p.astype('float32', copy=False)) + l2 for yhat, y, p, l2 in zip(outputs, label, teacher_prob, loss2)] 690 | else: 691 | loss = [L(yhat, y.astype(opt.dtype, copy=False)) + l2 for yhat, y, l2 in zip(outputs, label, loss2)] 692 | else: 693 | if distillation: 694 | if opt.norm_distill: 695 | l1 = [L(yhat, y.astype(opt.dtype, copy=False)) for yhat, y in zip(outputs, label)] 696 | l2 = [L_ndis(yhat, y.astype(opt.dtype, copy=False)) for yhat, y in zip(outputs_norm, teacher_prob)] 697 | loss = [x + opt.norm_distill_w * y for x,y in zip(l1,l2)] 698 | else: 699 | loss = [L(yhat.astype('float32', copy=False), 700 | y.astype('float32', copy=False), 701 | p.astype('float32', copy=False)) for yhat, y, p in zip(outputs, label, teacher_prob)] 702 | else: 703 | loss = [L(yhat, y.astype(opt.dtype, copy=False)) for yhat, y in zip(outputs, label)] 704 | 705 | for l in loss: 706 | l.backward() 707 | trainer.step(batch_size) 708 | 709 | train_loss1 = sum([l.sum().asscalar() for l in loss]) / batch_size 710 | 711 | if opt.quantization_regular: 712 | train_loss2 = sum([l.sum().asscalar() for l in l2]) / opt.num_gpus 713 | elif opt.weight_regular: 714 | train_loss2 = sum([l.sum().asscalar() for l in l2]) / opt.num_gpus 715 | elif opt.relugar: 716 | train_loss2 = sum([l.sum().asscalar() for l in loss2]) / opt.num_gpus 717 | 718 | 719 | if opt.mixup: 720 | output_softmax = [nd.SoftmaxActivation(out.astype('float32', copy=False)) \ 721 | for out in outputs] 722 | train_metric.update(label, output_softmax) 723 | else: 724 | if opt.label_smoothing: 725 | train_metric.update(hard_label, outputs) 726 | else: 727 | train_metric.update(label, outputs) 728 | 729 | if opt.log_interval and not (i+1)%opt.log_interval: 730 | train_metric_name, train_metric_score = train_metric.get() 731 | logger.info('Epoch[%d] Batch [%d]\tSpeed: %dHz\t%s=%.3f lr=%.5f loss1=%.3f loss2=%.8f'%( 732 | epoch, i, batch_size*opt.log_interval/(time.time()-btic), 733 | train_metric_name, train_metric_score, trainer.learning_rate, 734 | train_loss1, train_loss2)) 735 | btic = time.time() 736 | 737 | train_metric_name, train_metric_score = train_metric.get() 738 | throughput = int(batch_size * i /(time.time() - tic)) 739 | 740 | err_top1_val, err_top5_val = test(ctx, val_data) 741 | 742 | logger.info('[Epoch %d] training: %s=%f'%(epoch, train_metric_name, train_metric_score)) 743 | logger.info('[Epoch %d] speed: %d Hz time cost: %f'%(epoch, throughput, time.time()-tic)) 744 | logger.info('[Epoch %d] validation: acc-top1=%.5f acc-top5=%.5f'%(epoch, 1-err_top1_val, 1-err_top5_val)) 745 | 746 | if err_top1_val < best_val_score: 747 | best_val_score = err_top1_val 748 | # net.export('%s/%.4f-imagenet-%s'%(save_dir, (1-best_val_score), model_name),epoch) 749 | net.save_parameters('%s/%.4f-imagenet-%s-%d-best.params'%(save_dir, (1-best_val_score), model_name, epoch)) 750 | trainer.save_states('%s/%.4f-imagenet-%s-%d-best.states'%(save_dir, (1-best_val_score), model_name, epoch)) 751 | 752 | if save_frequency and save_dir and (epoch + 1) % save_frequency == 0: 753 | net.save_parameters('%s/imagenet-%s-%d.params'%(save_dir, model_name, epoch)) 754 | trainer.save_states('%s/imagenet-%s-%d.states'%(save_dir, model_name, epoch)) 755 | 756 | if save_frequency and save_dir: 757 | net.save_parameters('%s/imagenet-%s-%d.params'%(save_dir, model_name, opt.num_epochs-1)) 758 | trainer.save_states('%s/imagenet-%s-%d.states'%(save_dir, model_name, opt.num_epochs-1)) 759 | 760 | 761 | if opt.mode == 'hybrid': 762 | net.hybridize(static_alloc=True, static_shape=True) 763 | if distillation: 764 | teacher.hybridize(static_alloc=True, static_shape=True) 765 | train(context) 766 | 767 | if __name__ == '__main__': 768 | main() 769 | --------------------------------------------------------------------------------