├── AlexNet_compressed.net
├── LICENSE
├── README.md
├── bvlc_alexnet_deploy.prototxt
└── decode.py

--------------------------------------------------------------------------------
/AlexNet_compressed.net:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/songhan/Deep-Compression-AlexNet/a4ab6859e1bc86dec3607c8cdd53b0a72da6bcda/AlexNet_compressed.net

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
BSD 2-Clause License

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
- March 15, 2019: for our most recent work on model compression and acceleration, please see:

[ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware](https://arxiv.org/pdf/1812.00332.pdf) (ICLR'19)

[AMC: AutoML for Model Compression and Acceleration on Mobile Devices](https://arxiv.org/pdf/1802.03494.pdf) (ECCV'18)

[HAQ: Hardware-Aware Automated Quantization](https://arxiv.org/pdf/1811.08886.pdf) (CVPR'19)

[Defensive Quantization: When Efficiency Meets Robustness](https://openreview.net/pdf?id=ryetZ20ctX) (ICLR'19)


# Deep Compression on AlexNet
This is a demo of [Deep Compression](http://arxiv.org/pdf/1510.00149v5.pdf) compressing AlexNet from 233MB to 8.9MB without loss of accuracy. It differs from the paper only in that Huffman coding is not applied. The Deep Compression video from the [ICLR'16 best paper award presentation](https://youtu.be/kQAhW9gh6aU) is available.
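The compressed file stores, for every layer, a small codebook of shared weight values plus one short index per surviving weight (an 8-bit codebook for conv layers and a 4-bit one for fully connected layers, as `decode.py` below assumes). Below is a minimal sketch of that weight-sharing step using k-means, as described in the paper; the function name and the use of scikit-learn are our own choices for illustration, not code from this repo:

    # Sketch of Deep Compression's weight sharing (illustration only).
    # In the real pipeline the layer is pruned first, so most entries are zero.
    import numpy as np
    from sklearn.cluster import KMeans

    def quantize_layer(weights, bits=4):
        nonzero = weights[weights != 0].reshape(-1, 1)
        kmeans = KMeans(n_clusters=2 ** bits, n_init=10).fit(nonzero)
        codebook = kmeans.cluster_centers_.flatten()       # 2**bits shared values
        indices = kmeans.labels_.astype(np.uint8)          # one small index per weight
        return codebook, indices

    codebook, indices = quantize_layer(np.random.randn(64, 64), bits=4)
    print(codebook.size)   # 16 shared values replace 4096 distinct floats

`decode.py` reverses this step: it reads the stored codebook and indices back from `AlexNet_compressed.net` and scatters them into a dense caffemodel.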
# Related Papers
[Learning both Weights and Connections for Efficient Neural Network (NIPS'15)](http://arxiv.org/pdf/1506.02626v3.pdf)

[Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (ICLR'16, best paper award)](http://arxiv.org/pdf/1510.00149v5.pdf)

[EIE: Efficient Inference Engine on Compressed Deep Neural Network (ISCA'16)](http://arxiv.org/pdf/1602.01528v1.pdf)

If you find Deep Compression useful in your research, please consider citing the paper:

    @inproceedings{han2015learning,
      title={Learning both Weights and Connections for Efficient Neural Network},
      author={Han, Song and Pool, Jeff and Tran, John and Dally, William},
      booktitle={Advances in Neural Information Processing Systems (NIPS)},
      pages={1135--1143},
      year={2015}
    }

    @article{han2015deep_compression,
      title={Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding},
      author={Han, Song and Mao, Huizi and Dally, William J},
      journal={International Conference on Learning Representations (ICLR)},
      year={2016}
    }

**A hardware accelerator working directly on the deep compressed model:**

    @article{han2016eie,
      title={EIE: Efficient Inference Engine on Compressed Deep Neural Network},
      author={Han, Song and Liu, Xingyu and Mao, Huizi and Pu, Jing and Pedram, Ardavan and Horowitz, Mark A and Dally, William J},
      journal={International Conference on Computer Architecture (ISCA)},
      year={2016}
    }

# Usage

Set `CAFFE_ROOT` to the root of your Caffe installation, decode the compressed model into a regular caffemodel, then test its accuracy with Caffe:

    export CAFFE_ROOT=/path/to/your/caffe
    python decode.py bvlc_alexnet_deploy.prototxt AlexNet_compressed.net $CAFFE_ROOT/alexnet.caffemodel
    cd $CAFFE_ROOT
    ./build/tools/caffe test --model=models/bvlc_alexnet/train_val.prototxt --weights=alexnet.caffemodel --iterations=1000 --gpu 0

# Test Result

    I1022 20:18:58.336736 13182 caffe.cpp:198] accuracy_top1 = 0.57074
    I1022 20:18:58.336745 13182 caffe.cpp:198] accuracy_top5 = 0.80254
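Before running the full ImageNet test, you can optionally sanity-check the decoded model from Python with a forward pass on random input. This is a minimal sketch, assuming pycaffe is importable from `$CAFFE_ROOT/python` and using the file names from the commands above:

    # Quick sanity check of the decoded model (our sketch, not part of this repo).
    import os
    import sys
    import numpy as np

    sys.path.insert(0, os.path.join(os.environ['CAFFE_ROOT'], 'python'))
    import caffe

    caffe.set_mode_cpu()
    net = caffe.Net('bvlc_alexnet_deploy.prototxt',
                    os.path.join(os.environ['CAFFE_ROOT'], 'alexnet.caffemodel'),
                    caffe.TEST)
    net.blobs['data'].data[...] = np.random.randn(10, 3, 227, 227)
    out = net.forward()
    print(out['prob'].shape)   # expect (10, 1000) class probabilities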
type: "ReLU" 77 | bottom: "conv2" 78 | top: "conv2" 79 | } 80 | layer { 81 | name: "norm2" 82 | type: "LRN" 83 | bottom: "conv2" 84 | top: "norm2" 85 | lrn_param { 86 | local_size: 5 87 | alpha: 0.0001 88 | beta: 0.75 89 | } 90 | } 91 | layer { 92 | name: "pool2" 93 | type: "Pooling" 94 | bottom: "norm2" 95 | top: "pool2" 96 | pooling_param { 97 | pool: MAX 98 | kernel_size: 3 99 | stride: 2 100 | } 101 | } 102 | layer { 103 | name: "conv3" 104 | type: "Convolution" 105 | bottom: "pool2" 106 | top: "conv3" 107 | param { 108 | lr_mult: 1 109 | decay_mult: 1 110 | } 111 | param { 112 | lr_mult: 2 113 | decay_mult: 0 114 | } 115 | convolution_param { 116 | num_output: 384 117 | pad: 1 118 | kernel_size: 3 119 | } 120 | } 121 | layer { 122 | name: "relu3" 123 | type: "ReLU" 124 | bottom: "conv3" 125 | top: "conv3" 126 | } 127 | layer { 128 | name: "conv4" 129 | type: "Convolution" 130 | bottom: "conv3" 131 | top: "conv4" 132 | param { 133 | lr_mult: 1 134 | decay_mult: 1 135 | } 136 | param { 137 | lr_mult: 2 138 | decay_mult: 0 139 | } 140 | convolution_param { 141 | num_output: 384 142 | pad: 1 143 | kernel_size: 3 144 | group: 2 145 | } 146 | } 147 | layer { 148 | name: "relu4" 149 | type: "ReLU" 150 | bottom: "conv4" 151 | top: "conv4" 152 | } 153 | layer { 154 | name: "conv5" 155 | type: "Convolution" 156 | bottom: "conv4" 157 | top: "conv5" 158 | param { 159 | lr_mult: 1 160 | decay_mult: 1 161 | } 162 | param { 163 | lr_mult: 2 164 | decay_mult: 0 165 | } 166 | convolution_param { 167 | num_output: 256 168 | pad: 1 169 | kernel_size: 3 170 | group: 2 171 | } 172 | } 173 | layer { 174 | name: "relu5" 175 | type: "ReLU" 176 | bottom: "conv5" 177 | top: "conv5" 178 | } 179 | layer { 180 | name: "pool5" 181 | type: "Pooling" 182 | bottom: "conv5" 183 | top: "pool5" 184 | pooling_param { 185 | pool: MAX 186 | kernel_size: 3 187 | stride: 2 188 | } 189 | } 190 | layer { 191 | name: "fc6" 192 | type: "InnerProduct" 193 | bottom: "pool5" 194 | top: "fc6" 195 | param { 196 | lr_mult: 1 197 | decay_mult: 1 198 | } 199 | param { 200 | lr_mult: 2 201 | decay_mult: 0 202 | } 203 | inner_product_param { 204 | num_output: 4096 205 | } 206 | } 207 | layer { 208 | name: "relu6" 209 | type: "ReLU" 210 | bottom: "fc6" 211 | top: "fc6" 212 | } 213 | layer { 214 | name: "drop6" 215 | type: "Dropout" 216 | bottom: "fc6" 217 | top: "fc6" 218 | dropout_param { 219 | dropout_ratio: 0.5 220 | } 221 | } 222 | layer { 223 | name: "fc7" 224 | type: "InnerProduct" 225 | bottom: "fc6" 226 | top: "fc7" 227 | param { 228 | lr_mult: 1 229 | decay_mult: 1 230 | } 231 | param { 232 | lr_mult: 2 233 | decay_mult: 0 234 | } 235 | inner_product_param { 236 | num_output: 4096 237 | } 238 | } 239 | layer { 240 | name: "relu7" 241 | type: "ReLU" 242 | bottom: "fc7" 243 | top: "fc7" 244 | } 245 | layer { 246 | name: "drop7" 247 | type: "Dropout" 248 | bottom: "fc7" 249 | top: "fc7" 250 | dropout_param { 251 | dropout_ratio: 0.5 252 | } 253 | } 254 | layer { 255 | name: "fc8" 256 | type: "InnerProduct" 257 | bottom: "fc7" 258 | top: "fc8" 259 | param { 260 | lr_mult: 1 261 | decay_mult: 1 262 | } 263 | param { 264 | lr_mult: 2 265 | decay_mult: 0 266 | } 267 | inner_product_param { 268 | num_output: 1000 269 | } 270 | } 271 | layer { 272 | name: "prob" 273 | type: "Softmax" 274 | bottom: "fc8" 275 | top: "prob" 276 | } 277 | -------------------------------------------------------------------------------- /decode.py: -------------------------------------------------------------------------------- 1 | ''' 2 | If you find Deep 
/decode.py:
--------------------------------------------------------------------------------
'''
If you find Deep Compression useful in your research, please consider citing the paper:

@inproceedings{han2015learning,
  title={Learning both Weights and Connections for Efficient Neural Network},
  author={Han, Song and Pool, Jeff and Tran, John and Dally, William},
  booktitle={Advances in Neural Information Processing Systems (NIPS)},
  pages={1135--1143},
  year={2015}
}

@article{han2015deep_compression,
  title={Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding},
  author={Han, Song and Mao, Huizi and Dally, William J},
  journal={International Conference on Learning Representations (ICLR)},
  year={2016}
}

A hardware accelerator working directly on the deep compressed model:

@article{han2016eie,
  title={EIE: Efficient Inference Engine on Compressed Deep Neural Network},
  author={Han, Song and Liu, Xingyu and Mao, Huizi and Pu, Jing and Pedram, Ardavan and Horowitz, Mark A and Dally, William J},
  journal={International Conference on Computer Architecture (ISCA)},
  year={2016}
}
'''

import sys
import os
import numpy as np
import pickle

help_ = '''
Usage:
    decode.py <deploy.prototxt> <compressed.net> <target.caffemodel>
Set the environment variable CAFFE_ROOT to the root of Caffe before running this demo!
'''

if len(sys.argv) != 4:
    print help_
    sys.exit()
else:
    prototxt = sys.argv[1]
    net_bin = sys.argv[2]
    target = sys.argv[3]

# os.system("cd $CAFFE_ROOT")
try:
    caffe_root = os.environ["CAFFE_ROOT"]
except KeyError:
    print "Set the environment variable CAFFE_ROOT before running the demo!"
    sys.exit()

sys.path.insert(0, caffe_root + '/python')
import caffe

caffe.set_mode_cpu()
net = caffe.Net(prototxt, caffe.TEST)
layers = filter(lambda x: 'conv' in x or 'fc' in x or 'ip' in x, net.params.keys())

fin = open(net_bin, 'rb')

def binary_to_net(weights, spm_stream, ind_stream, codebook, num_nz):
    bits = np.log2(codebook.size)
    if bits == 4:
        slots = 2   # two 4-bit codes packed per byte
    elif bits == 8:
        slots = 1   # one 8-bit code per byte
    else:
        print "Not implemented:", bits
        sys.exit()
    code = np.zeros(weights.size, np.uint8)

    # Recover the quantized codes and relative indices from the byte streams
    spm = np.zeros(num_nz, np.uint8)
    ind = np.zeros(num_nz, np.uint8)
    if slots == 2:
        # unpack two 4-bit codes per byte: low nibble first, then high nibble
        spm[np.arange(0, num_nz, 2)] = spm_stream % (2**4)
        spm[np.arange(1, num_nz, 2)] = spm_stream / (2**4)
    else:
        spm = spm_stream
    # relative indices are always packed as 4-bit nibbles
    ind[np.arange(0, num_nz, 2)] = ind_stream % (2**4)
    ind[np.arange(1, num_nz, 2)] = ind_stream / (2**4)

    # Recover the dense matrix: turn relative jumps into absolute positions,
    # scatter the codes, then look the weight values up in the codebook
    ind = np.cumsum(ind + 1) - 1
    code[ind] = spm
    data = np.reshape(codebook[code], weights.shape)
    np.copyto(weights, data)
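# Tiny worked example of the recovery above (our illustration, not from the
# repo): with relative jumps ind = [0, 2, 1], np.cumsum(ind + 1) - 1 gives
# absolute positions [0, 3, 5], so the three codebook indices in spm land at
# flat offsets 0, 3 and 5 of the weight array; every skipped position stays 0.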
nz_num = np.fromfile(fin, dtype = np.uint32, count = len(layers))
for idx, layer in enumerate(layers):
    # print "Reconstruct layer", layer
    # print "Total Non-zero number:", nz_num[idx]
    # conv layers use an 8-bit codebook, fully connected layers a 4-bit one
    if 'conv' in layer:
        bits = 8
    else:
        bits = 4
    codebook_size = 2 ** bits
    codebook = np.fromfile(fin, dtype = np.float32, count = codebook_size)
    bias = np.fromfile(fin, dtype = np.float32, count = net.params[layer][1].data.size)
    np.copyto(net.params[layer][1].data, bias)

    spm_stream = np.fromfile(fin, dtype = np.uint8, count = (nz_num[idx]-1) / (8/bits) + 1)
    ind_stream = np.fromfile(fin, dtype = np.uint8, count = (nz_num[idx]-1) / 2 + 1)

    binary_to_net(net.params[layer][0].data, spm_stream, ind_stream, codebook, nz_num[idx])

net.save(target)
print "All done! See your output caffemodel and test its accuracy."
--------------------------------------------------------------------------------