├── AlexNet_compressed.net
├── LICENSE
├── README.md
├── bvlc_alexnet_deploy.prototxt
└── decode.py

--------------------------------------------------------------------------------
/AlexNet_compressed.net:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/songhan/Deep-Compression-AlexNet/a4ab6859e1bc86dec3607c8cdd53b0a72da6bcda/AlexNet_compressed.net

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
BSD 2-Clause License

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
- March 15, 2019: for our most recent work on model compression and acceleration, please see:

[ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/pdf/1812.00332.pdf) (ICLR'19)

[AMC: AutoML for Model Compression and Acceleration on Mobile Devices](https://arxiv.org/pdf/1802.03494.pdf) (ECCV'18)

[HAQ: Hardware-Aware Automated Quantization](https://arxiv.org/pdf/1811.08886.pdf) (CVPR'19)

[Defensive Quantization: When Efficiency Meets Robustness](https://openreview.net/pdf?id=ryetZ20ctX) (ICLR'19)


# Deep Compression on AlexNet
This is a demo of [Deep Compression](http://arxiv.org/pdf/1510.00149v5.pdf) compressing AlexNet from 233MB to 8.9MB without loss of accuracy. It differs from the paper only in that Huffman coding is not applied. The Deep Compression video from the [ICLR'16 best paper award presentation](https://youtu.be/kQAhW9gh6aU) is available.
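The compressed file stores, for every layer, a small codebook of shared weight values plus one short index per surviving weight (an 8-bit codebook for conv layers and a 4-bit one for fully connected layers, as `decode.py` below assumes). Below is a minimal sketch of that weight-sharing step using k-means, as described in the paper; the function name and the use of scikit-learn are our own choices for illustration, not code from this repo:

    # Sketch of Deep Compression's weight sharing (illustration only).
    # In the real pipeline the layer is pruned first, so most entries are zero.
    import numpy as np
    from sklearn.cluster import KMeans

    def quantize_layer(weights, bits=4):
        nonzero = weights[weights != 0].reshape(-1, 1)
        kmeans = KMeans(n_clusters=2 ** bits, n_init=10).fit(nonzero)
        codebook = kmeans.cluster_centers_.flatten()       # 2**bits shared values
        indices = kmeans.labels_.astype(np.uint8)          # one small index per weight
        return codebook, indices

    codebook, indices = quantize_layer(np.random.randn(64, 64), bits=4)
    print(codebook.size)   # 16 shared values replace 4096 distinct floats

`decode.py` reverses this step: it reads the stored codebook and indices back from `AlexNet_compressed.net` and scatters them into a dense caffemodel.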
# Related Papers
[Learning both Weights and Connections for Efficient Neural Network (NIPS'15)](http://arxiv.org/pdf/1506.02626v3.pdf)

[Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (ICLR'16, best paper award)](http://arxiv.org/pdf/1510.00149v5.pdf)

[EIE: Efficient Inference Engine on Compressed Deep Neural Network (ISCA'16)](http://arxiv.org/pdf/1602.01528v1.pdf)

If you find Deep Compression useful in your research, please consider citing the paper:

    @inproceedings{han2015learning,
      title={Learning both Weights and Connections for Efficient Neural Network},
      author={Han, Song and Pool, Jeff and Tran, John and Dally, William},
      booktitle={Advances in Neural Information Processing Systems (NIPS)},
      pages={1135--1143},
      year={2015}
    }

    @article{han2015deep_compression,
      title={Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding},
      author={Han, Song and Mao, Huizi and Dally, William J},
      journal={International Conference on Learning Representations (ICLR)},
      year={2016}
    }

**A hardware accelerator working directly on the deep compressed model:**

    @article{han2016eie,
      title={EIE: Efficient Inference Engine on Compressed Deep Neural Network},
      author={Han, Song and Liu, Xingyu and Mao, Huizi and Pu, Jing and Pedram, Ardavan and Horowitz, Mark A and Dally, William J},
      journal={International Conference on Computer Architecture (ISCA)},
      year={2016}
    }

# Usage

Set `CAFFE_ROOT` to the root of your Caffe installation, decode the compressed model into a regular caffemodel, then test its accuracy with Caffe:

    export CAFFE_ROOT=/path/to/your/caffe
    python decode.py bvlc_alexnet_deploy.prototxt AlexNet_compressed.net $CAFFE_ROOT/alexnet.caffemodel
    cd $CAFFE_ROOT
    ./build/tools/caffe test --model=models/bvlc_alexnet/train_val.prototxt --weights=alexnet.caffemodel --iterations=1000 --gpu 0

# Test Result

    I1022 20:18:58.336736 13182 caffe.cpp:198] accuracy_top1 = 0.57074
    I1022 20:18:58.336745 13182 caffe.cpp:198] accuracy_top5 = 0.80254
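Before running the full ImageNet test, you can optionally sanity-check the decoded model from Python with a forward pass on random input. This is a minimal sketch, assuming pycaffe is importable from `$CAFFE_ROOT/python` and using the file names from the commands above:

    # Quick sanity check of the decoded model (our sketch, not part of this repo).
    import os
    import sys
    import numpy as np

    sys.path.insert(0, os.path.join(os.environ['CAFFE_ROOT'], 'python'))
    import caffe

    caffe.set_mode_cpu()
    net = caffe.Net('bvlc_alexnet_deploy.prototxt',
                    os.path.join(os.environ['CAFFE_ROOT'], 'alexnet.caffemodel'),
                    caffe.TEST)
    net.blobs['data'].data[...] = np.random.randn(10, 3, 227, 227)
    out = net.forward()
    print(out['prob'].shape)   # expect (10, 1000) class probabilities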
type: "ReLU" 77 | bottom: "conv2" 78 | top: "conv2" 79 | } 80 | layer { 81 | name: "norm2" 82 | type: "LRN" 83 | bottom: "conv2" 84 | top: "norm2" 85 | lrn_param { 86 | local_size: 5 87 | alpha: 0.0001 88 | beta: 0.75 89 | } 90 | } 91 | layer { 92 | name: "pool2" 93 | type: "Pooling" 94 | bottom: "norm2" 95 | top: "pool2" 96 | pooling_param { 97 | pool: MAX 98 | kernel_size: 3 99 | stride: 2 100 | } 101 | } 102 | layer { 103 | name: "conv3" 104 | type: "Convolution" 105 | bottom: "pool2" 106 | top: "conv3" 107 | param { 108 | lr_mult: 1 109 | decay_mult: 1 110 | } 111 | param { 112 | lr_mult: 2 113 | decay_mult: 0 114 | } 115 | convolution_param { 116 | num_output: 384 117 | pad: 1 118 | kernel_size: 3 119 | } 120 | } 121 | layer { 122 | name: "relu3" 123 | type: "ReLU" 124 | bottom: "conv3" 125 | top: "conv3" 126 | } 127 | layer { 128 | name: "conv4" 129 | type: "Convolution" 130 | bottom: "conv3" 131 | top: "conv4" 132 | param { 133 | lr_mult: 1 134 | decay_mult: 1 135 | } 136 | param { 137 | lr_mult: 2 138 | decay_mult: 0 139 | } 140 | convolution_param { 141 | num_output: 384 142 | pad: 1 143 | kernel_size: 3 144 | group: 2 145 | } 146 | } 147 | layer { 148 | name: "relu4" 149 | type: "ReLU" 150 | bottom: "conv4" 151 | top: "conv4" 152 | } 153 | layer { 154 | name: "conv5" 155 | type: "Convolution" 156 | bottom: "conv4" 157 | top: "conv5" 158 | param { 159 | lr_mult: 1 160 | decay_mult: 1 161 | } 162 | param { 163 | lr_mult: 2 164 | decay_mult: 0 165 | } 166 | convolution_param { 167 | num_output: 256 168 | pad: 1 169 | kernel_size: 3 170 | group: 2 171 | } 172 | } 173 | layer { 174 | name: "relu5" 175 | type: "ReLU" 176 | bottom: "conv5" 177 | top: "conv5" 178 | } 179 | layer { 180 | name: "pool5" 181 | type: "Pooling" 182 | bottom: "conv5" 183 | top: "pool5" 184 | pooling_param { 185 | pool: MAX 186 | kernel_size: 3 187 | stride: 2 188 | } 189 | } 190 | layer { 191 | name: "fc6" 192 | type: "InnerProduct" 193 | bottom: "pool5" 194 | top: "fc6" 195 | param { 196 | lr_mult: 1 197 | decay_mult: 1 198 | } 199 | param { 200 | lr_mult: 2 201 | decay_mult: 0 202 | } 203 | inner_product_param { 204 | num_output: 4096 205 | } 206 | } 207 | layer { 208 | name: "relu6" 209 | type: "ReLU" 210 | bottom: "fc6" 211 | top: "fc6" 212 | } 213 | layer { 214 | name: "drop6" 215 | type: "Dropout" 216 | bottom: "fc6" 217 | top: "fc6" 218 | dropout_param { 219 | dropout_ratio: 0.5 220 | } 221 | } 222 | layer { 223 | name: "fc7" 224 | type: "InnerProduct" 225 | bottom: "fc6" 226 | top: "fc7" 227 | param { 228 | lr_mult: 1 229 | decay_mult: 1 230 | } 231 | param { 232 | lr_mult: 2 233 | decay_mult: 0 234 | } 235 | inner_product_param { 236 | num_output: 4096 237 | } 238 | } 239 | layer { 240 | name: "relu7" 241 | type: "ReLU" 242 | bottom: "fc7" 243 | top: "fc7" 244 | } 245 | layer { 246 | name: "drop7" 247 | type: "Dropout" 248 | bottom: "fc7" 249 | top: "fc7" 250 | dropout_param { 251 | dropout_ratio: 0.5 252 | } 253 | } 254 | layer { 255 | name: "fc8" 256 | type: "InnerProduct" 257 | bottom: "fc7" 258 | top: "fc8" 259 | param { 260 | lr_mult: 1 261 | decay_mult: 1 262 | } 263 | param { 264 | lr_mult: 2 265 | decay_mult: 0 266 | } 267 | inner_product_param { 268 | num_output: 1000 269 | } 270 | } 271 | layer { 272 | name: "prob" 273 | type: "Softmax" 274 | bottom: "fc8" 275 | top: "prob" 276 | } 277 | -------------------------------------------------------------------------------- /decode.py: -------------------------------------------------------------------------------- 1 | ''' 2 | If you find Deep 
/decode.py:
--------------------------------------------------------------------------------
'''
If you find Deep Compression useful in your research, please consider citing the paper:

@inproceedings{han2015learning,
  title={Learning both Weights and Connections for Efficient Neural Network},
  author={Han, Song and Pool, Jeff and Tran, John and Dally, William},
  booktitle={Advances in Neural Information Processing Systems (NIPS)},
  pages={1135--1143},
  year={2015}
}

@article{han2015deep_compression,
  title={Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding},
  author={Han, Song and Mao, Huizi and Dally, William J},
  journal={International Conference on Learning Representations (ICLR)},
  year={2016}
}

A hardware accelerator working directly on the deep compressed model:

@article{han2016eie,
  title={EIE: Efficient Inference Engine on Compressed Deep Neural Network},
  author={Han, Song and Liu, Xingyu and Mao, Huizi and Pu, Jing and Pedram, Ardavan and Horowitz, Mark A and Dally, William J},
  journal={International Conference on Computer Architecture (ISCA)},
  year={2016}
}
'''

import sys
import os
import numpy as np
import pickle

help_ = '''
Usage:
    decode.py <deploy.prototxt> <compressed.net> <target.caffemodel>
Set the environment variable CAFFE_ROOT to the root of Caffe before running this demo!
'''

if len(sys.argv) != 4:
    print help_
    sys.exit()
else:
    prototxt = sys.argv[1]
    net_bin = sys.argv[2]
    target = sys.argv[3]

# os.system("cd $CAFFE_ROOT")
try:
    caffe_root = os.environ["CAFFE_ROOT"]
except KeyError:
    print "Set the environment variable CAFFE_ROOT before running the demo!"
    sys.exit()

sys.path.insert(0, caffe_root + '/python')
import caffe

caffe.set_mode_cpu()
net = caffe.Net(prototxt, caffe.TEST)
layers = filter(lambda x: 'conv' in x or 'fc' in x or 'ip' in x, net.params.keys())

fin = open(net_bin, 'rb')

def binary_to_net(weights, spm_stream, ind_stream, codebook, num_nz):
    bits = np.log2(codebook.size)
    if bits == 4:
        slots = 2   # two 4-bit codes packed per byte
    elif bits == 8:
        slots = 1   # one 8-bit code per byte
    else:
        print "Not implemented:", bits
        sys.exit()
    code = np.zeros(weights.size, np.uint8)

    # Recover the quantized codes and relative indices from the byte streams
    spm = np.zeros(num_nz, np.uint8)
    ind = np.zeros(num_nz, np.uint8)
    if slots == 2:
        # unpack two 4-bit codes per byte: low nibble first, then high nibble
        spm[np.arange(0, num_nz, 2)] = spm_stream % (2**4)
        spm[np.arange(1, num_nz, 2)] = spm_stream / (2**4)
    else:
        spm = spm_stream
    # relative indices are always packed as 4-bit nibbles
    ind[np.arange(0, num_nz, 2)] = ind_stream % (2**4)
    ind[np.arange(1, num_nz, 2)] = ind_stream / (2**4)

    # Recover the dense matrix: turn relative jumps into absolute positions,
    # scatter the codes, then look the weight values up in the codebook
    ind = np.cumsum(ind + 1) - 1
    code[ind] = spm
    data = np.reshape(codebook[code], weights.shape)
    np.copyto(weights, data)
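# Tiny worked example of the recovery above (our illustration, not from the
# repo): with relative jumps ind = [0, 2, 1], np.cumsum(ind + 1) - 1 gives
# absolute positions [0, 3, 5], so the three codebook indices in spm land at
# flat offsets 0, 3 and 5 of the weight array; every skipped position stays 0.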
nz_num = np.fromfile(fin, dtype = np.uint32, count = len(layers))
for idx, layer in enumerate(layers):
    # print "Reconstruct layer", layer
    # print "Total Non-zero number:", nz_num[idx]
    # conv layers use an 8-bit codebook, fully connected layers a 4-bit one
    if 'conv' in layer:
        bits = 8
    else:
        bits = 4
    codebook_size = 2 ** bits
    codebook = np.fromfile(fin, dtype = np.float32, count = codebook_size)
    bias = np.fromfile(fin, dtype = np.float32, count = net.params[layer][1].data.size)
    np.copyto(net.params[layer][1].data, bias)

    spm_stream = np.fromfile(fin, dtype = np.uint8, count = (nz_num[idx]-1) / (8/bits) + 1)
    ind_stream = np.fromfile(fin, dtype = np.uint8, count = (nz_num[idx]-1) / 2 + 1)

    binary_to_net(net.params[layer][0].data, spm_stream, ind_stream, codebook, nz_num[idx])

net.save(target)
print "All done! See your output caffemodel and test its accuracy."
--------------------------------------------------------------------------------