├── DPN
│   ├── README.md
│   └── data
│       ├── figure_1.png
│       ├── figure_2.png
│       ├── figure_3.png
│       ├── formula_1.png
│       ├── formula_2.png
│       ├── formula_4.png
│       ├── formula_5.png
│       ├── table_1.png
│       ├── table_2.png
│       ├── table_3.png
│       ├── table_4.png
│       ├── table_5.png
│       └── table_6.png
├── DenseNet
│   ├── README.md
│   ├── data
│   │   ├── figure_1.png
│   │   ├── figure_2.png
│   │   ├── figure_3.png
│   │   ├── figure_4.png
│   │   ├── figure_5.png
│   │   ├── formula_1.png
│   │   ├── formula_2.png
│   │   ├── table_1.png
│   │   └── table_2.png
│   ├── densenet_121_deploy.prototxt
│   ├── densenet_121_test.prototxt
│   ├── densenet_121_train.prototxt
│   ├── densenet_deploy.py
│   └── densenet_train_test.py
├── GoogLeNet
│   ├── googlenet_v1_deploy.prototxt
│   └── googlenet_v1_deploy.py
├── README.md
├── ResNeXt
│   ├── README.md
│   ├── data
│   │   ├── figure_1.png
│   │   ├── figure_2.png
│   │   ├── figure_3.png
│   │   ├── figure_4.png
│   │   ├── figure_5.png
│   │   ├── figure_6.png
│   │   ├── figure_7.png
│   │   ├── formula_1.png
│   │   ├── formula_2.png
│   │   ├── formula_3.png
│   │   ├── formula_4.png
│   │   ├── table_1.png
│   │   ├── table_2.png
│   │   ├── table_3.png
│   │   ├── table_4.png
│   │   ├── table_5.png
│   │   ├── table_6.png
│   │   ├── table_7.png
│   │   └── table_8.png
│   ├── resnext_101_deploy.prototxt
│   ├── resnext_101_deploy.py
│   ├── resnext_50_deploy.prototxt
│   └── resnext_50_deploy.py
├── ResNet-v2
│   ├── README.md
│   ├── data
│   │   ├── figure_1.png
│   │   ├── figure_2.png
│   │   ├── figure_3.png
│   │   ├── figure_4.png
│   │   ├── figure_5.png
│   │   ├── figure_6.png
│   │   ├── formula_1.png
│   │   ├── formula_3.png
│   │   ├── formula_4.png
│   │   ├── formula_5.png
│   │   ├── formula_6.png
│   │   ├── formula_7.png
│   │   ├── formula_8.png
│   │   ├── table_1.png
│   │   ├── table_2.png
│   │   ├── table_3.png
│   │   ├── table_4.png
│   │   └── table_5.png
│   ├── resnet_v2_1001_deploy.prototxt
│   ├── resnet_v2_164_deploy.prototxt
│   └── resnet_v2_deploy.py
├── ResNet
│   ├── README.md
│   ├── data
│   │   ├── figure_1.png
│   │   ├── figure_2.png
│   │   ├── figure_3.png
│   │   ├── figure_4.png
│   │   ├── figure_5.png
│   │   ├── figure_6.png
│   │   ├── figure_7.png
│   │   ├── table_1.png
│   │   ├── table_2.png
│   │   ├── table_3.png
│   │   ├── table_4.png
│   │   ├── table_5.png
│   │   ├── table_6.png
│   │   ├── table_7.png
│   │   └── table_8.png
│   ├── resnet_101_deploy.prototxt
│   ├── resnet_101_deploy.py
│   ├── resnet_152_deploy.prototxt
│   ├── resnet_152_deploy.py
│   ├── resnet_50_deploy.prototxt
│   ├── resnet_50_deploy.py
│   ├── resnet_50_train_test.prototxt
│   └── resnet_50_train_test.py
├── SENet
│   ├── README.md
│   └── data
│       ├── figure_1.png
│       ├── figure_2.png
│       ├── figure_3.png
│       ├── figure_4.png
│       ├── figure_5.png
│       ├── figure_6.png
│       ├── figure_7.png
│       ├── figure_8.png
│       ├── table_1.png
│       ├── table_2.png
│       ├── table_3.png
│       ├── table_4.png
│       └── table_5.png
├── ShuffleNet
│   ├── README.md
│   └── data
│       ├── channel_shuffle.jpg
│       ├── figure_1.png
│       ├── figure_2.png
│       ├── table_1.png
│       ├── table_2.png
│       ├── table_3.png
│       ├── table_4.png
│       ├── table_5.png
│       ├── table_6.png
│       ├── table_7.png
│       └── table_8.png
├── VGGNet
│   ├── vgg_16_deploy.prototxt
│   ├── vgg_16_deploy.py
│   ├── vgg_19_deploy.prototxt
│   └── vgg_19_deploy.py
├── WRN
│   ├── README.md
│   ├── data
│   │   ├── figure_1.png
│   │   ├── figure_2.png
│   │   ├── figure_3.png
│   │   ├── figure_4.png
│   │   ├── table_1.png
│   │   ├── table_2.png
│   │   ├── table_3.png
│   │   ├── table_4.png
│   │   ├── table_5.png
│   │   ├── table_6.png
│   │   ├── table_7.png
│   │   ├── table_8.png
│   │   └── table_9.png
│   ├── wrn_28_10_deploy.prototxt
│   └── wrn_deploy.py
└── install_Caffe_CentOS7.sh

/DPN/README.md:
--------------------------------------------------------------------------------
# DPN
[Dual Path Networks](https://arxiv.org/abs/1707.01629)
Yunpeng Chen, Jianan Li, Huaxin Xiao, Xiaojie Jin, Shuicheng Yan, Jiashi Feng
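
### Abstract
This paper presents a simple, efficient and highly modularized Dual Path Network (DPN), which introduces a new topology of internal connection paths.
By revealing the equivalence between ResNet, DenseNet and HORNN (higher order recurrent neural network), we find that ResNet enables the reuse of features,
while DenseNet enables the exploration of new features; both properties help a network learn good representations. The proposed dual path network can share common features while keeping the flexibility to explore new ones,
combining the advantages of these two state-of-the-art networks. Extensive experiments on ImageNet-1k, Places365 and PASCAL VOC show that DPN outperforms previous state-of-the-art networks.
In particular, on ImageNet-1k a shallow DPN outperforms ResNeXt-101(64x4d) with a 26% smaller model, 25% less computation and 8% lower memory consumption,
and a deeper DPN (DPN-131) further improves on the best single-model result while training about 2x faster. Experiments on the other datasets also show that DPN outperforms
ResNet, DenseNet and ResNeXt across multiple applications.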
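
### 1. Introduction
The goal of this paper is to develop a new connection topology for deep networks. It focuses on skip connections, which are widely used in modern deep neural networks.
Skip connections let input information propagate directly to later layers and let gradients flow directly back to earlier layers. This effectively alleviates the vanishing-gradient problem and makes the network easier to optimize.
Deep Residual Networks ([ResNet](https://github.com/binLearning/caffe_toolkit/tree/master/ResNet))
are among the most successful networks that use skip connections. ResNet connects its blocks (i.e. residual functions) directly through skip connections, which ResNet calls residual paths;
the output of a residual path and the output of the residual function are combined by addition to form a residual unit. Several architectures have evolved from ResNet,
such as [WRN](https://github.com/binLearning/caffe_toolkit/tree/master/WRN), Inception-ResNet and
[ResNeXt](https://github.com/binLearning/caffe_toolkit/tree/master/ResNeXt).
Unlike ResNet, which adds the input features to the output features through the residual path, the recently published Dense Convolutional Network
([DenseNet](https://github.com/binLearning/caffe_toolkit/tree/master/DenseNet))
concatenates the input features with the output features through a densely connected path, so that every block receives the raw information from all preceding blocks (note: within the same stage).
This paper studies the respective advantages and limitations of ResNet and DenseNet and then proposes a new path design: the dual path architecture.
We look at DenseNet from a different angle by examining its relation to HORNN, and we also study the relation between ResNet and DenseNet. These studies show that
deep residual networks implicitly reuse features through the residual path, whereas densely connected networks keep exploring new features through the densely connected path.
The dual path network DPN inherits the advantages of both: it both reuses features (re-usage) and explores new ones (re-exploitation). DPN is also parameter-efficient,
computationally cheap, memory-efficient and easy to optimize.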
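
### 2. Related work
AlexNet and VGG are two very important networks. They demonstrated the capability of deep convolutional neural networks and showed that very small (3x3) convolution kernels improve a network's learning capacity.
ResNet uses skip connections, which greatly ease optimization and improve performance, and many later architectures are built on ResNet. DenseNet connects input and output features by concatenating them along the channel dimension,
so the width of the densely connected path grows linearly with depth and the number of parameters grows quadratically. Without an implementation specifically optimized for this, DenseNet consumes a lot of GPU memory,
which limits further accuracy gains from making DenseNet deeper or wider.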
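The quadratic growth can be made concrete with a small arithmetic sketch. It assumes a plain dense block without bottleneck layers, counts only the 3x3 convolution weights (no biases, BN or transition layers), and uses hypothetical values for the block input width `k0` and growth rate `k`. It illustrates the scaling argument and is not a measurement of any DPN or DenseNet model.

```python
def dense_block_conv3x3_params(num_layers, k0, k):
    """Return (input widths, total 3x3-conv weights) for a plain dense block.

    Layer l (1-indexed) receives the block input plus the k maps of every
    earlier layer, i.e. k0 + k * (l - 1) channels, and emits k channels,
    so its 3x3 kernel holds 9 * (k0 + k * (l - 1)) * k weights.
    """
    widths = [k0 + k * (l - 1) for l in range(1, num_layers + 1)]
    total = sum(9 * w * k for w in widths)
    return widths, total


if __name__ == "__main__":
    # Hypothetical block: 16 input channels, growth rate 12.
    for depth in (10, 20, 40):
        widths, total = dense_block_conv3x3_params(depth, k0=16, k=12)
        print(depth, widths[-1], total)
```

With these values, depths 10, 20 and 40 give last-layer widths of 124, 244 and 484 (linear growth), while the weight totals are 75,600, 280,800 and 1,080,000. Each doubling of depth therefore roughly quadruples the weight count.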
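[ResNet-v2](https://github.com/binLearning/caffe_toolkit/tree/master/ResNet-v2) showed how important the residual path (identity mapping) is for alleviating optimization problems.
Other work has studied the relation between ResNet and RNNs; in a similar spirit, this paper studies the relation between DenseNet and HORNN.

### 3. Revisiting ResNet, DenseNet and Higher Order RNN
We first draw an analogy between DenseNet and HORNN and then simplify DenseNet to ResNet, as shown in Figure 1:
![](./data/figure_1.png)
The conclusion is that ResNet encourages feature reuse and reduces feature redundancy, while DenseNet can explore new features but suffers from redundancy.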

### 4. Dual Path Networks
#### 4.2 Dual Path Networks
As shown in Figure 2, DPN combines ResNet and DenseNet. In the actual implementation, ResNeXt rather than ResNet serves as the backbone, and the extra DenseNet-style path is added with a "slice layer" and a "concat layer".
This yields the final DPN.
![](./data/figure_2.png)
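The slice-and-concat wiring can be sketched at the level of channels. In this minimal sketch, each feature map is collapsed to a single float so that only the bookkeeping is visible. `transform` is a hypothetical stand-in for the block's 1x1-3x3(grouped)-1x1 convolutions. Routing the first `res_width` outputs to the residual path and the rest to the dense path is an assumed ordering, not necessarily the one used in the authors' code.

```python
def dual_path_block(residual, dense, transform, res_width):
    """One DPN-style block on per-channel values (a sketch, not a real conv net).

    residual:  list of res_width values carried by the ResNet-style path
    dense:     list of values carried by the DenseNet-style path (grows each block)
    transform: hypothetical function mapping the block input to res_width + k values
    """
    x = residual + dense                      # block input: both paths concatenated
    out = transform(x)
    res_part = out[:res_width]                # "slice" for the residual path
    dense_part = out[res_width:]              # "slice" for the dense path (k new maps)
    new_residual = [r + o for r, o in zip(residual, res_part)]  # element-wise add
    new_dense = dense + dense_part            # "concat": keep old maps, append new ones
    return new_residual, new_dense


if __name__ == "__main__":
    res_width, k = 4, 2
    # Hypothetical transform: every output equals the mean of the inputs.
    transform = lambda x: [sum(x) / len(x)] * (res_width + k)
    residual, dense = [1.0] * res_width, [0.0] * k
    for _ in range(3):
        residual, dense = dual_path_block(residual, dense, transform, res_width)
        print(len(residual), len(dense))
```

The residual path keeps a fixed width and accumulates updates by addition (feature reuse). The dense path grows by `k` channels per block (exploration of new features), so the loop prints widths 4/4, 4/6 and 4/8.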
Table 1 lists the architecture configurations and complexity of DPN, DenseNet and ResNeXt.
![](./data/table_1.png)

### 5. Experiments
Results of DPN on different tasks and datasets:
![](./data/table_2.png)
![](./data/table_3.png)
![](./data/figure_3.png)
![](./data/table_4.png)
![](./data/table_5.png)
![](./data/table_6.png)

--------------------------------------------------------------------------------
/DPN/data/figure_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/DPN/data/figure_1.png
--------------------------------------------------------------------------------
/DPN/data/figure_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/DPN/data/figure_2.png
--------------------------------------------------------------------------------
/DPN/data/figure_3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/DPN/data/figure_3.png
--------------------------------------------------------------------------------
/DPN/data/formula_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/DPN/data/formula_1.png
--------------------------------------------------------------------------------
/DPN/data/formula_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/DPN/data/formula_2.png
--------------------------------------------------------------------------------
/DPN/data/formula_4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/DPN/data/formula_4.png
--------------------------------------------------------------------------------
/DPN/data/formula_5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/DPN/data/formula_5.png
--------------------------------------------------------------------------------
/DPN/data/table_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/DPN/data/table_1.png
--------------------------------------------------------------------------------
/DPN/data/table_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/DPN/data/table_2.png
--------------------------------------------------------------------------------
/DPN/data/table_3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/DPN/data/table_3.png
--------------------------------------------------------------------------------
/DPN/data/table_4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/DPN/data/table_4.png
--------------------------------------------------------------------------------
/DPN/data/table_5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/DPN/data/table_5.png
--------------------------------------------------------------------------------
/DPN/data/table_6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/DPN/data/table_6.png
--------------------------------------------------------------------------------
/DenseNet/README.md:
--------------------------------------------------------------------------------
# DenseNet
[Densely Connected Convolutional Networks](https://arxiv.org/abs/1608.06993)
Gao Huang, Zhuang Liu, Kilian Q. Weinberger, Laurens van der Maaten
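
### Abstract
Recent work has shown that convolutional networks can be substantially deeper, more accurate and more efficient to train if they contain shorter connections between layers.
This paper introduces the Dense Convolutional Network (DenseNet), in which each layer is directly connected to every layer before it. Whereas a traditional convolutional network with L layers has L connections,
DenseNet has L(L+1)/2 direct connections. For each layer, the feature-maps of all preceding layers are used as its inputs.
DenseNet has several advantages: it alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse and substantially reduces the number of parameters. We evaluate DenseNet on four benchmark datasets
(CIFAR-10, CIFAR-100, SVHN and ImageNet), where it improves considerably over most previous state-of-the-art networks. The official implementation (Caffe) is available at
https://github.com/liuzhuang13/DenseNetCaffe.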
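
### 1. Introduction
CNNs have only recently become truly "deep". Highway Networks and ResNet were the first architectures to surpass 100 layers. As networks become deeper, a new problem emerges:
information about the input, or the back-propagated gradient, can vanish after passing through many layers. Several recent works address this problem, such as ResNet, Highway Networks,
ResNet with stochastic depth and FractalNet, and they share one key characteristic: they create direct paths between layers.
To maximize the information flow between layers, this paper proposes a new connectivity pattern in which all layers with matching feature-map sizes are connected directly with each other, as shown in Figure 1.
![](./data/figure_1.png)
ResNet combines the outputs of different branches by summation, whereas DenseNet combines them by concatenation along the channel dimension. Because of this dense connectivity pattern, we call the network a
Dense Convolutional Network (DenseNet).
DenseNet needs fewer parameters than a traditional convolutional network, because it does not need to relearn redundant feature-maps. A traditional feed-forward architecture can be viewed as an algorithm with a state,
and this state is passed from layer to layer. Each layer modifies the state but also passes on information that needs to be preserved. ResNet preserves this information explicitly through identity mappings.
[Deep networks with stochastic depth](https://arxiv.org/abs/1603.09382) showed that many ResNet layers contribute very little to the final result
and can be randomly dropped during training. This makes the state of a ResNet similar to that of an (unrolled) RNN, but because ResNet does not share parameters across layers, it has far more parameters.
DenseNet explicitly separates the information added to the network from the information that is preserved. DenseNet layers can be very narrow (e.g. 12 feature-maps per layer):
each layer adds only a small set of feature-maps to the "collective knowledge" of the network and keeps the remaining feature-maps unchanged, and the final classifier makes its decision based on all feature-maps in the network.
Besides better parameter efficiency, DenseNet has another major advantage: it improves the flow of information and gradients throughout the network, which makes the network easy to optimize.
Each layer has direct access to the gradients from the loss function and to the original input signal, which amounts to implicit deep supervision and helps train deeper networks.
We also observe that dense connections have a regularizing effect, which reduces overfitting when the training set is small.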
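
### 2. Related Work
Networks such as FCN combine multi-level features through skip connections, which effectively improves performance. AdaNet also proposes a network structure with cross-layer connections.
Highway Networks were the first architecture that could be trained effectively with more than 100 layers. ResNet replaces the gated shortcuts of Highway Networks with identity mappings
and achieves large performance gains in many computer vision tasks. Stochastic depth improves ResNet training by randomly dropping layers and successfully trains networks with more than 1000 layers.
That work shows that not all layers are necessary, i.e. deep residual networks contain a great amount of redundancy; DenseNet was partly inspired by this observation.
The pre-activation ResNet ([ResNet-v2](https://github.com/binLearning/caffe_toolkit/tree/master/ResNet-v2))
can also be trained with more than 1000 layers.
Besides making networks deeper, another line of work makes them wider. The Inception module in GoogLeNet concatenates the feature-maps produced by filters of different sizes.
Resnet in Resnet (RiR) proposes a wider residual block. Wide Residual Networks (WRN) show that, as long as a residual network is deep enough,
simply increasing the number of filters per layer improves its performance. FractalNet also achieves good results with a wide network structure.
Instead of drawing representational power from extremely deep or wide architectures, DenseNet exploits the potential of feature reuse with a condensed network,
which is easy to optimize and highly parameter-efficient. Concatenating the feature-maps learned by different layers increases the variation in the input of subsequent layers and improves efficiency; this is the main difference from ResNet.
Inception networks also concatenate feature-maps from different layers, but DenseNet is simpler and more efficient.
Other architectures that perform well include Network in Network (NIN), Deeply Supervised Network (DSN), Ladder Networks and
Deeply-Fused Nets (DFNs).

### 3. DenseNets
**ResNets**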
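ResNet adds an identity-mapping shortcut connection between layers:
![](./data/formula_1.png)
An advantage of ResNet is that the gradient can flow directly from later layers to earlier layers through the identity function. However, **the output of the identity connection and the output of the residual function are combined by summation,
which may impede the information flow in the network**.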
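Eq. (1) is x_ℓ = H_ℓ(x_{ℓ−1}) + x_{ℓ−1}. The toy sketch below contrasts that summation with the concatenation DenseNet uses instead, restricted here to a single previous layer. Feature maps are reduced to one float per channel, and `h` is a hypothetical stand-in for H_ℓ.

```python
def residual_step(x, h):
    """ResNet update x_l = H_l(x_{l-1}) + x_{l-1}: the width stays the same."""
    return [a + b for a, b in zip(h(x), x)]


def concat_step(x, h):
    """Concatenation alternative: keep x_{l-1} untouched and append H_l(x_{l-1})."""
    return x + h(x)


if __name__ == "__main__":
    double = lambda x: [2.0 * v for v in x]   # hypothetical H_l
    x = [1.0, 2.0]
    print(residual_step(x, double))  # [3.0, 6.0]: identity and H_l mixed into one value per channel
    print(concat_step(x, double))    # [1.0, 2.0, 2.0, 4.0]: both kept as separate channels
```

After the summation, later layers can no longer tell which part of each value came from the identity path. This is the loss of information flow that the bold sentence above refers to.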
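**DenseNet**
In DenseNet every layer is directly connected to all of its subsequent layers, as shown in Figure 1. In other words, the input of each layer consists of the feature-maps produced by all preceding layers:
![](./data/formula_2.png)
For ease of implementation, the multiple inputs in Eq. (2) are concatenated into a single tensor.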
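Eq. (2) and its single-tensor implementation can be sketched as follows. The sketch assumes one float per feature map and hypothetical layer functions. It also counts the direct connections: layer ℓ is fed by the ℓ outputs x_0 … x_{ℓ−1}, so a block of L layers has L(L+1)/2 of them, the figure quoted in the abstract.

```python
def dense_block(x0, layers):
    """Apply x_l = H_l([x_0, x_1, ..., x_{l-1}]) for each H_l in `layers`.

    Returns the concatenation [x_0, ..., x_L] and the number of direct
    connections (layer l is fed by the l outputs x_0 ... x_{l-1}).
    """
    outputs = [list(x0)]          # x_0, x_1, ... kept separately only for counting
    connections = 0
    for h in layers:
        concatenated = [v for out in outputs for v in out]  # single "tensor" input
        connections += len(outputs)
        outputs.append(h(concatenated))
    return [v for out in outputs for v in out], connections


if __name__ == "__main__":
    # Hypothetical H_l emitting k = 1 map whose value is its input width.
    width = lambda xs: [float(len(xs))]
    features, links = dense_block([0.0, 0.0], [width] * 3)
    print(features)  # [0.0, 0.0, 2.0, 3.0, 4.0]
    print(links)     # 6 == 3 * 4 / 2
```

The printed widths 2, 3, 4 show each layer seeing one more map than the previous one, which is the k = 1 case of the growth-rate rule described below.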
**Composite function**
As in ResNet-v2, H_ℓ is defined as a composite function of three consecutive operations: batch normalization (BN), a rectified linear unit (ReLU) and a 3×3 convolution (Conv).
**Pooling layers**
DenseNet divides the network into several densely connected dense blocks, as shown in Figure 2. A transition layer between two adjacent blocks changes the feature-map size.
It consists of batch normalization (BN), a 1x1 convolution (Conv) and 2×2 average pooling.
![](./data/figure_2.png)
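The 1x1 convolution and 2×2 average pooling of a transition layer can be sketched in pure Python on nested lists. This is only an illustration: BN is omitted, the weights are hypothetical, and the pooling uses stride 2 and assumes an even height and width.

```python
def conv1x1(fmaps, weights):
    """1x1 convolution without bias.

    fmaps:   C_in feature maps, each a list of rows (H x W)
    weights: C_out rows, each holding C_in coefficients
    """
    height, width = len(fmaps[0]), len(fmaps[0][0])
    return [
        [[sum(w[c] * fmaps[c][i][j] for c in range(len(fmaps))) for j in range(width)]
         for i in range(height)]
        for w in weights
    ]


def avg_pool_2x2(fmap):
    """2x2 average pooling with stride 2 (assumes even height and width)."""
    return [
        [(fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
         for j in range(0, len(fmap[0]), 2)]
        for i in range(0, len(fmap), 2)
    ]


def transition(fmaps, weights):
    """Transition layer without BN: 1x1 conv (channel reduction), then 2x2 avg pooling."""
    return [avg_pool_2x2(m) for m in conv1x1(fmaps, weights)]


if __name__ == "__main__":
    a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    b = [[0] * 4 for _ in range(4)]
    # Two input maps -> one output map that averages them, then 4x4 -> 2x2.
    print(transition([a, b], [[0.5, 0.5]]))  # [[[1.75, 2.75], [5.75, 6.75]]]
```

The 1x1 convolution changes only the number of channels, and the pooling halves the spatial size. These are the two changes a transition layer makes between dense blocks.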
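**Growth rate**
The growth rate k is a new hyperparameter: the number of feature-maps produced by each function H_ℓ. If each H_ℓ produces k feature-maps, the ℓ-th layer of a block has k_0 + k × (ℓ − 1) input feature-maps, where k_0 is the number of channels entering the block.
To keep the network from becoming too wide and to improve parameter efficiency, k should not be too large; 12 or 16 is usually enough. The feature-maps can be viewed as the global state of the network, and every layer adds k new feature-maps to it.
The growth rate therefore controls how much new information each layer contributes to the global state.
**Bottleneck layers**
Although each layer produces only k feature-maps, they add up quickly, so later layers receive a very large number of inputs. This paper addresses the problem with bottleneck layers: a 1x1 convolution before each 3x3 convolution (BN-ReLU-Conv(1x1)-BN-ReLU-Conv(3x3)).
The 1x1 convolution reduces the number of input feature-maps and thus improves computational efficiency. Models with bottleneck layers are denoted DenseNet-B.
Unless noted otherwise, the 1x1 convolution in every bottleneck layer produces 4k feature-maps.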
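A quick weight count shows when the bottleneck pays off. The sketch counts only convolution weights (no BN parameters or biases) for one dense layer with `c_in` input channels and growth rate `k`, using the 4k bottleneck width stated above. The `c_in` values in the usage loop are hypothetical.

```python
def conv_weights(c_in, c_out, ks):
    """Weights in a ks x ks convolution without bias."""
    return c_in * c_out * ks * ks


def plain_layer(c_in, k):
    """BN-ReLU-Conv(3x3): c_in -> k."""
    return conv_weights(c_in, k, 3)


def bottleneck_layer(c_in, k):
    """BN-ReLU-Conv(1x1): c_in -> 4k, then BN-ReLU-Conv(3x3): 4k -> k (DenseNet-B)."""
    return conv_weights(c_in, 4 * k, 1) + conv_weights(4 * k, k, 3)


if __name__ == "__main__":
    k = 32
    for c_in in (64, 256, 1024):
        print(c_in, plain_layer(c_in, k), bottleneck_layer(c_in, k))
```

The difference between the two layers is k(5·c_in − 36k). The bottleneck therefore uses fewer weights only once c_in > 7.2k (about 230 channels for k = 32). Deep inside a block, where c_in reaches hundreds or thousands of channels, the saving is large: 294,912 versus 167,936 weights at c_in = 1024.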
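**Compression**
To make the model even more compact, the transition layers also reduce the number of feature-maps. If a dense block outputs m feature-maps, the following transition layer outputs ⌊θm⌋ of them, where 0 < θ ≤ 1 is the compression factor.
This paper sets θ = 0.5, i.e. each transition layer halves the number of feature-maps. Models that use both bottleneck layers and compression are denoted DenseNet-BC.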
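Together, the growth-rate and compression rules determine every block width. The sketch below mirrors the `pre_fmap` bookkeeping and the layer-count formula in the repository's `densenet_deploy.py`, applied to the DenseNet-121 configuration used there (k = 32, 64 initial channels, blocks of 6, 12, 24 and 16 layers). It is plain arithmetic, not Caffe code. The general depth formula below reduces to the script's `sum(nlayer)*2 + 5` for four blocks.

```python
def densenet_channels(block_sizes, growth_rate, first_nout, theta=0.5):
    """List feature-map counts after each dense block and each transition layer.

    Every layer in a dense block appends growth_rate maps; a transition layer
    keeps floor(theta * m) of the m maps it receives. There is no transition
    after the last block.
    """
    channels = first_nout
    trace = []
    for i, n in enumerate(block_sizes):
        channels += n * growth_rate
        trace.append(channels)
        if i < len(block_sizes) - 1:
            channels = int(theta * channels)   # floor, since channels > 0
            trace.append(channels)
    return trace


def densenet_depth(block_sizes):
    """2 convs per dense layer + first conv + one conv per transition + fc."""
    return 2 * sum(block_sizes) + len(block_sizes) + 1


if __name__ == "__main__":
    print(densenet_channels([6, 12, 24, 16], growth_rate=32, first_nout=64))
    # [256, 128, 512, 256, 1024, 512, 1024]
    print(densenet_depth([6, 12, 24, 16]))  # 121
```

The same functions reproduce the other configurations listed in `construct_net()`. DenseNet-169, -201 and -161 end with 1664, 1920 and 2208 feature-maps (the last with k = 48 and 96 initial channels).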
**Implementation Details**  
See Table 1 for details.
![](./data/table_1.png)

### 4. Experiments
#### 4.3 Classification Results on CIFAR and SVHN
The results are shown in Table 2.
![](./data/table_2.png)
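**Accuracy**
On SVHN, the 250-layer DenseNet-BC does not improve over its shallower counterpart, possibly because SVHN is a relatively easy task and such an extremely deep model overfits it.
**Capacity**
DenseNet keeps improving as L and k increase. This shows that DenseNet can exploit the extra representational power of deeper and wider models, without suffering from overfitting or optimization difficulties.
**Parameter Efficiency**
DenseNet uses its parameters more efficiently than the other models, DenseNet-BC in particular.
**Overfitting**
A positive side effect of this parameter efficiency is that DenseNet is less prone to overfitting, and DenseNet-BC is also effective at avoiding it.
#### 4.4 Classification Results on ImageNet
Figure 3 compares DenseNet with ResNet.
![](./data/figure_3.png)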
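
### 5. Discussion
**Model compactness**
Because DenseNet reuses the feature-maps of earlier layers, it leads to more compact models. Figure 4 compares the parameter efficiency of different networks, and DenseNet-BC is the most parameter-efficient of them.
This agrees with the trend in Figure 3. Figure 4 (right) shows that a DenseNet-BC with only 0.8M trainable parameters matches the accuracy of a 1001-layer ResNet with 10.2M parameters.
![](./data/figure_4.png)
**Implicit Deep Supervision**
Part of DenseNet's improvement may come from implicit deep supervision: through the shortcut connections, every layer receives gradients (additional supervision signals) directly from the loss layer.
Deeply-supervised nets (DSN) explain the benefits of deep supervision. Compared with DSN, the supervision in DenseNet is simpler, because all layers receive gradients from the same single loss function.
**Stochastic vs. deterministic connection**
DenseNet was partly inspired by ResNet with stochastic depth.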
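**Feature Reuse**
Every layer in DenseNet receives the feature-maps of all preceding layers (sometimes through transition layers). To check whether the network actually benefits from this, we take each layer ℓ in a block
and compute the mean absolute weight it assigns to the outputs of each earlier layer s. Figure 5 shows this value for every layer in the three dense blocks; it indicates how strongly a layer depends on each earlier layer.
![](./data/figure_5.png)
Figure 5 shows that:
1. All layers in a block spread their weights over many inputs. Features extracted by the very first layers of a block are therefore also used directly by the last layers of the same block.
2. The transition layers also spread their weights over almost all inputs, so information from the first layers of DenseNet can reach the last layers indirectly.
3. Layers in the second and third blocks assign the least weight to the outputs of the preceding transition layer. This suggests that transition-layer outputs contain many redundant features,
which agrees with DenseNet-BC performing better when these outputs are compressed.
4. The final classification layer relies mostly on the last feature-maps, possibly because the last layers produce higher-level, more discriminative features.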
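The statistic behind Figure 5 can be approximated as follows. The sketch assumes a layer's weights are stored as `filters[out][in]` lists of kernel taps. It also assumes the input channels are ordered by the earlier layer that produced them, as concatenation implies. It computes the plain mean absolute weight described above, and the exact normalization used for the paper's heat map may differ.

```python
def mean_abs_weight_per_source(filters, source_sizes):
    """Mean |w| of one layer's weights, grouped by the earlier layer feeding each input channel.

    filters:      filters[o][c] is the list of kernel taps connecting input
                  channel c to output channel o
    source_sizes: number of input channels contributed by each earlier layer,
                  in concatenation order (the first entry is the block input)
    """
    result = []
    start = 0
    for size in source_sizes:
        taps = [abs(t) for f in filters for c in range(start, start + size) for t in f[c]]
        result.append(sum(taps) / len(taps))
        start += size
    return result


if __name__ == "__main__":
    # Hypothetical 1x1 layer: 2 filters over 3 input channels
    # (1 channel from source 0, 2 channels from source 1).
    filters = [[[1.0], [-3.0], [2.0]],
               [[-1.0], [1.0], [4.0]]]
    print(mean_abs_weight_per_source(filters, [1, 2]))  # [1.0, 2.5]
```

Computing this value for every layer ℓ of a block and every earlier source s gives one row (ℓ) of a heat map like Figure 5.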
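
### 6. Conclusion
This paper proposes a new convolutional network architecture, the Dense Convolutional Network (DenseNet), in which all layers within the same block are connected to each other. DenseNet has fewer parameters
and lower computational cost, yet achieves state-of-the-art results on multiple tasks.
Thanks to its dense connectivity, DenseNet combines the properties of identity mappings, deep supervision and diversified depth.
DenseNet reuses features throughout the network and therefore learns more compact and more accurate models. Because of its compact internal representations and
reduced feature redundancy, DenseNet may serve as a good feature extractor for various computer vision tasks.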

--------------------------------------------------------------------------------
/DenseNet/data/figure_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/DenseNet/data/figure_1.png
--------------------------------------------------------------------------------
/DenseNet/data/figure_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/DenseNet/data/figure_2.png
--------------------------------------------------------------------------------
/DenseNet/data/figure_3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/DenseNet/data/figure_3.png
--------------------------------------------------------------------------------
/DenseNet/data/figure_4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/DenseNet/data/figure_4.png
--------------------------------------------------------------------------------
/DenseNet/data/figure_5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/DenseNet/data/figure_5.png
--------------------------------------------------------------------------------
/DenseNet/data/formula_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/DenseNet/data/formula_1.png
--------------------------------------------------------------------------------
/DenseNet/data/formula_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/DenseNet/data/formula_2.png
--------------------------------------------------------------------------------
/DenseNet/data/table_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/DenseNet/data/table_1.png
--------------------------------------------------------------------------------
/DenseNet/data/table_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/DenseNet/data/table_2.png
--------------------------------------------------------------------------------
/DenseNet/densenet_deploy.py:
--------------------------------------------------------------------------------
from __future__ import division
from __future__ import print_function

from caffe import layers as L, params as P, to_proto
from caffe.proto import caffe_pb2
import caffe


# Batch Normalization - PReLU - Convolution - [Dropout] layer
# used in Dense Block (DB) and Transition Down (TD).
# param:
#   net: caffe.NetSpec that the layers are added to
#   flag: name prefix for the generated layers
#   bottom: bottom blob
#   ks: kernel_size used in convolution layer, [default] 3 in DB, 1 in TD
#   nout: num_output used in convolution layer, also known as "growth rate" in DB
#   stride: stride used in convolution layer, [default] 1 in both DB and TD
#   pad: pad used in convolution layer, [default] 1 in DB, 0 in TD
#   dropout: dropout_ratio used in dropout layer, [default] 0.2
def bn_relu_conv(net, flag, bottom, ks, nout, stride, pad, dropout):
    suffix = '{}x{}'.format(ks, ks)
    flag_bn = '{}_{}_bn'.format(flag, suffix)
    flag_scale = '{}_{}_scale'.format(flag, suffix)
    flag_relu = '{}_{}_relu'.format(flag, suffix)
    flag_conv = '{}_{}_conv'.format(flag, suffix)
    flag_drop = '{}_{}_dropout'.format(flag, suffix)

    net[flag_bn] = L.BatchNorm(bottom, in_place=False,
                               batch_norm_param = dict(use_global_stats=True))
    net[flag_scale] = L.Scale(net[flag_bn], bias_term=True, in_place=True)
    net[flag_relu] = L.PReLU(net[flag_scale], in_place=True)

    net[flag_conv] = L.Convolution(net[flag_relu], num_output=nout,
                                   kernel_size=ks, stride=stride, pad=pad,
                                   bias_term=False)

    if dropout > 0:
        net[flag_drop] = L.Dropout(net[flag_conv], dropout_ratio=dropout)
        return net[flag_drop]

    return net[flag_conv]


# concat layer
# concat input and output blobs in the same bn_relu_conv layer,
# in order to concat any layer to all subsequent layers in the same DB.
# param:
#   bottom: bottom blob
#   num_filter: num_output used in convolution layer, also known as "growth rate" in DB
#   dropout: dropout_ratio
def cat_layer(net, major, minor, bottom, num_filter, dropout):
    flag_brc = 'block_{}_{}'.format(major, minor)
    flag_cat = 'block_{}_{}_concat'.format(major, minor)

    # convolution 1*1, [B] Bottleneck layer in DB (outputs 4 * growth rate maps)
    bottleneck = bn_relu_conv(net, flag_brc, bottom, ks=1, nout=num_filter*4,
                              stride=1, pad=0, dropout=dropout)
    # convolution 3*3
    brc_layer = bn_relu_conv(net, flag_brc, bottleneck, ks=3, nout=num_filter,
                             stride=1, pad=1, dropout=dropout)

    net[flag_cat] = L.Concat(bottom, brc_layer, axis=1)

    return net[flag_cat]


# transition down
# reduce the spatial dimensionality via convolution and pooling.
# param:
#   bottom: bottom blob
#   num_filter: number of input feature maps; the 1x1 convolution outputs num_filter//ratio
#   dropout: dropout_ratio
def transition_down(net, major, bottom, num_filter, dropout):
    flag_brc = 'transition_down_{}'.format(major)
    flag_pool = 'transition_down_{}_pooling'.format(major)

    # [C] Compression: keep 1/ratio (< 1.0) of the feature maps
    ratio = 2 # 1/ratio=0.5
    brc_layer = bn_relu_conv(net, flag_brc, bottom, ks=1, nout=num_filter//ratio,
                             stride=1, pad=0, dropout=dropout)
    net[flag_pool] = L.Pooling(brc_layer, pool=P.Pooling.AVE, kernel_size=2, stride=2)

    return net[flag_pool]


# DenseNet Architecture (deploy: the Input layer fixes the shape to 1x3x224x224)
# param:
#   nlayer: list with the number of dense layers (1x1 bottleneck + 3x3 pairs) in each DB
#   nclass: the number of classes
#   first_nout: num_output used in first convolution layer before entering the first DB,
#               set it to be comparable to growth_rate
#   growth_rate: growth rate, in reference to num_output used in convolution layers in DB
#   dropout: dropout_ratio, set to 0 to disable dropout
def densenet(nlayer, nclass, first_nout=16, growth_rate=16,
             dropout=0.2):

    net = caffe.NetSpec()

    net.data = L.Input(input_param = dict(shape = dict(dim = [1,3,224,224])))

    pre_fmap = 0 # total number of previous feature maps

    # first convolution --------------------------------------------------------
    net.conv_1 = L.Convolution(net.data, num_output=first_nout,
                               kernel_size=7, stride=2, pad=3)
    net.relu_1 = L.PReLU(net.conv_1, in_place=True)
    net.pool_1 = L.Pooling(net.relu_1, pool=P.Pooling.MAX,
                           kernel_size=3, stride=2)

    pre_layer = net.pool_1
    pre_fmap += first_nout

    # DB + TD ------------------------------------------------------------------
    # major/minor are passed as +1 so that layer names are indexed from 1
    # (range instead of the Python-2-only xrange, so this runs on Python 2 and 3)
    for major in range(len(nlayer)-1):
        # DB
        for minor in range(nlayer[major]):
            pre_layer = cat_layer(net, major+1, minor+1, pre_layer, growth_rate, dropout)
            pre_fmap += growth_rate
        # TD
        pre_layer = transition_down(net, major+1, pre_layer, pre_fmap, dropout)
        pre_fmap = pre_fmap // 2

    # last DB, without TD
    major = len(nlayer)
    for minor in range(nlayer[-1]):
        pre_layer = cat_layer(net, major, minor+1, pre_layer, growth_rate, dropout)
        pre_fmap += growth_rate

    # final layers -------------------------------------------------------------
    net.bn_final = L.BatchNorm(pre_layer, in_place=False,
                               batch_norm_param = dict(use_global_stats=True))
    net.scale_finel = L.Scale(net.bn_final, bias_term=True, in_place=True)
    net.relu_final = L.PReLU(net.scale_finel, in_place=True)
    net.pool_final = L.Pooling(net.relu_final, pool=P.Pooling.AVE, global_pooling=True)

    net.fc_class = L.InnerProduct(net.pool_final, num_output=nclass)

    return str(net.to_proto())


def construct_net():
    # DenseNet-121(k=32)
    growth_rate = 32
    nlayer = [6,12,24,16]
    # DenseNet-169(k=32)
    #growth_rate = 32
    #nlayer = [6,12,32,32]
    # DenseNet-201(k=32)
    #growth_rate = 32
    #nlayer = [6,12,48,32]
    # DenseNet-161(k=48)
    #growth_rate = 48
    #nlayer = [6,12,36,24]

    first_nout = growth_rate * 2
    nclass = 1000

    # 2 convs per dense layer + first conv + 3 TD convs + fc (for 4 DBs)
    total_num_layer = sum(nlayer)*2 + 5
    file_name = 'densenet_{}_deploy.prototxt'.format(total_num_layer)
    net_name = 'name: "DenseNet-{}_deploy"\n'.format(total_num_layer)
    net_arch = densenet(nlayer, nclass, first_nout=first_nout, growth_rate=growth_rate)
    with open(file_name, 'w') as f:
        f.write(net_name)
        f.write(net_arch)


if __name__ == '__main__':
    construct_net()

--------------------------------------------------------------------------------
/DenseNet/densenet_train_test.py:
--------------------------------------------------------------------------------
from __future__ import division
from __future__ import print_function

from caffe import layers as L, params as P, to_proto
from caffe.proto import caffe_pb2
import caffe


# Batch Normalization - PReLU - Convolution - [Dropout] layer
# used in Dense Block (DB) and Transition Down (TD).
# param:
#   net: caffe.NetSpec that the layers are added to
#   mode: TRAIN(0) or TEST(1) phase
#   flag: name prefix for the generated layers
#   bottom: bottom blob
#   ks: kernel_size used in convolution layer, [default] 3 in DB, 1 in TD
#   nout: num_output used in convolution layer, also known as "growth rate" in DB
#   stride: stride used in convolution layer, [default] 1 in both DB and TD
#   pad: pad used in convolution layer, [default] 1 in DB, 0 in TD
#   dropout: dropout_ratio used in dropout layer, [default] 0.2
def bn_relu_conv(net, mode, flag, bottom, ks, nout, stride, pad, dropout):
    suffix = '{}x{}'.format(ks, ks)
    flag_bn = '{}_{}_bn'.format(flag, suffix)
    flag_scale = '{}_{}_scale'.format(flag, suffix)
    flag_relu = '{}_{}_relu'.format(flag, suffix)
    flag_conv = '{}_{}_conv'.format(flag, suffix)
    flag_drop = '{}_{}_dropout'.format(flag, suffix)

    use_global_stats = False
    if mode == 1: # TEST phase
        use_global_stats = True

    net[flag_bn] = L.BatchNorm(bottom, in_place=False,
                               batch_norm_param = dict(use_global_stats=use_global_stats),
                               param=[dict(lr_mult=0, decay_mult=0),
                                      dict(lr_mult=0, decay_mult=0),
                                      dict(lr_mult=0, decay_mult=0)])
    net[flag_scale] = L.Scale(net[flag_bn], bias_term=True, in_place=True,
                              filler=dict(value=1), bias_filler=dict(value=0))
    net[flag_relu] = L.PReLU(net[flag_scale], in_place=True)

    net[flag_conv] = L.Convolution(net[flag_relu], num_output=nout,
                                   kernel_size=ks, stride=stride, pad=pad,
                                   weight_filler=dict(type='msra'),
                                   bias_term=False)
    if dropout > 0:
        net[flag_drop] = L.Dropout(net[flag_conv], dropout_ratio=dropout)
        return net[flag_drop]

    return net[flag_conv]


# concat layer
# concat input and output blobs in the same bn_relu_conv layer,
# in order to concat any layer to all subsequent layers in the same DB.
# param:
#   mode: TRAIN(0) or TEST(1) phase
#   bottom: bottom blob
#   num_filter: num_output used in convolution layer, also known as "growth rate" in DB
#   dropout: dropout_ratio
def cat_layer(net, mode, major, minor, bottom, num_filter, dropout):
    flag_brc = 'block_{}_{}'.format(major, minor)
    flag_cat = 'block_{}_{}_concat'.format(major, minor)

    # convolution 1*1, [B] Bottleneck layer in DB (outputs 4 * growth rate maps)
    bottleneck = bn_relu_conv(net, mode, flag_brc, bottom, ks=1, nout=num_filter*4,
                              stride=1, pad=0, dropout=dropout)
    # convolution 3*3
    brc_layer = bn_relu_conv(net, mode, flag_brc, bottleneck, ks=3, nout=num_filter,
                             stride=1, pad=1, dropout=dropout)

    net[flag_cat] = L.Concat(bottom, brc_layer, axis=1)

    return net[flag_cat]


# transition down
# reduce the spatial dimensionality via convolution and pooling.
# param:
#   mode: TRAIN(0) or TEST(1) phase
#   bottom: bottom blob
#   num_filter: number of input feature maps; the 1x1 convolution outputs num_filter//ratio
#   dropout: dropout_ratio
def transition_down(net, mode, major, bottom, num_filter, dropout):
    flag_brc = 'transition_down_{}'.format(major)
    flag_pool = 'transition_down_{}_pooling'.format(major)

    # [C] Compression: keep 1/ratio (< 1.0) of the feature maps
    ratio = 2 # 1/ratio=0.5
    brc_layer = bn_relu_conv(net, mode, flag_brc, bottom, ks=1, nout=num_filter//ratio,
                             stride=1, pad=0, dropout=dropout)
    net[flag_pool] = L.Pooling(brc_layer, pool=P.Pooling.AVE, kernel_size=2, stride=2)

    return net[flag_pool]


# DenseNet Architecture
# param:
#   mode: TRAIN(0) or TEST(1) phase
#   data_file: path of the LMDB used as data source (train or test set)
#   bs: batch_size
#   nlayer: list with the number of dense layers (1x1 bottleneck + 3x3 pairs) in each DB
#   nclass: the number of classes
#   first_nout: num_output used in first convolution layer before entering the first DB,
#               set it to be comparable to growth_rate
#   growth_rate: growth rate, in reference to num_output used in convolution layers in DB
#   dropout: dropout_ratio, set to 0 to disable dropout
def densenet(mode, data_file, bs, nlayer, nclass, first_nout=16, growth_rate=16, dropout=0.2):

    net = caffe.NetSpec()

    # data layer ---------------------------------------------------------------
    mirror = True
    shuffle = True
    if mode == 1: # TEST phase
        mirror = False
        shuffle = False

    # (pixel - 127.5) * 0.0078125 maps [0, 255] to about [-1, 1]
    transform = dict(scale = 0.0078125,
                     mirror = mirror,
                     #crop_size = 224,
                     mean_value = [127.5, 127.5, 127.5])

    net.data, net.label = L.Data(#include = dict(phase = mode),
                                 transform_param = transform,
                                 source = data_file,
                                 batch_size = bs,
                                 backend = P.Data.LMDB,
                                 ntop = 2)
    # net.data, net.label = L.ImageData(#include = dict(phase = mode),
    #                                   transform_param = transform,
    #                                   source = data_file,
    #                                   batch_size = bs,
    #                                   shuffle = shuffle,
    #                                   #new_height = 256,
    #                                   #new_width = 256,
    #                                   #is_color = True,
    #                                   ntop = 2)

    pre_fmap = 0 # total number of previous feature maps

    # first convolution --------------------------------------------------------
    net.conv_1 = L.Convolution(net.data, num_output=first_nout,
                               kernel_size=7, stride=2, pad=3,
                               weight_filler=dict(type='msra'),
                               bias_filler=dict(type='constant'),
                               param=[dict(lr_mult=1, decay_mult=1),
                                      dict(lr_mult=2, decay_mult=0)])

    net.relu_1 = L.PReLU(net.conv_1, in_place=True)

    net.pool_1 = L.Pooling(net.relu_1, pool=P.Pooling.MAX,
                           kernel_size=3, stride=2)

    pre_layer = net.pool_1
    pre_fmap += first_nout

    # DB + TD ------------------------------------------------------------------
    # major/minor are passed as +1 so that layer names are indexed from 1
    # (range instead of the Python-2-only xrange, so this runs on Python 2 and 3)
    for major in range(len(nlayer)-1):
        # DB
        for minor in range(nlayer[major]):
pre_layer = cat_layer(net, mode, major+1, minor+1, pre_layer, growth_rate, dropout) 162 | pre_fmap += growth_rate 163 | # TD 164 | pre_layer = transition_down(net, mode, major+1, pre_layer, pre_fmap, dropout) 165 | pre_fmap = pre_fmap // 2 166 | 167 | # last DB, without TD 168 | major = len(nlayer) 169 | for minor in xrange(nlayer[-1]): 170 | pre_layer = cat_layer(net, mode, major, minor+1, pre_layer, growth_rate, dropout) 171 | pre_fmap += growth_rate 172 | 173 | # final layers ------------------------------------------------------------- 174 | use_global_stats = False 175 | if mode == 1: # TEST phase 176 | use_global_stats = True 177 | net.bn_final = L.BatchNorm(pre_layer, in_place=False, 178 | batch_norm_param = dict(use_global_stats=use_global_stats), 179 | param=[dict(lr_mult=0, decay_mult=0), 180 | dict(lr_mult=0, decay_mult=0), 181 | dict(lr_mult=0, decay_mult=0)]) 182 | net.scale_final = L.Scale(net.bn_final, bias_term=True, in_place=True, 183 | filler=dict(value=1), bias_filler=dict(value=0)) 184 | net.relu_final = L.PReLU(net.scale_final, in_place=True) 185 | net.pool_final = L.Pooling(net.relu_final, pool=P.Pooling.AVE, global_pooling=True) 186 | 187 | net.fc_class = L.InnerProduct(net.pool_final, num_output=nclass, 188 | weight_filler=dict(type='xavier'), 189 | bias_filler=dict(type='constant'), 190 | param=[dict(lr_mult=1, decay_mult=1), 191 | dict(lr_mult=2, decay_mult=0)]) 192 | 193 | net.loss = L.SoftmaxWithLoss(net.fc_class, net.label) 194 | 195 | if mode == 1: 196 | net.accuracy = L.Accuracy(net.fc_class, net.label) 197 | 198 | return str(net.to_proto()) 199 | 200 | 201 | def construct_net(): 202 | # DenseNet-121(k=32) 203 | growth_rate = 32 204 | nlayer = [6,12,24,16] 205 | # DenseNet-169(k=32) 206 | #growth_rate = 32 207 | #nlayer = [6,12,32,32] 208 | # DenseNet-201(k=32) 209 | #growth_rate = 32 210 | #nlayer = [6,12,48,32] 211 | # DenseNet-161(k=48) 212 | #growth_rate = 48 213 | #nlayer = [6,12,36,24] 214 | 215 | first_nout = growth_rate * 2
216 | nclass = 1000 217 | 218 | # train net 219 | mode = 0 220 | bs = 8 221 | data_file = '/data/train_lmdb' 222 | net_arch = densenet(mode, data_file, bs, nlayer, nclass, 223 | first_nout=first_nout, growth_rate=growth_rate) 224 | 225 | total_num_layer = sum(nlayer)*2 + 5 226 | file_name = 'densenet_{}_train.prototxt'.format(total_num_layer) 227 | net_name = 'name: "DenseNet-{}_train"\n'.format(total_num_layer) 228 | with open(file_name, 'w') as f: 229 | f.write(net_name) 230 | f.write(net_arch) 231 | 232 | # test net 233 | mode = 1 234 | bs = 8 235 | data_file = '/data/test_lmdb' 236 | net_arch = densenet(mode, data_file, bs, nlayer, nclass, 237 | first_nout=first_nout, growth_rate=growth_rate) 238 | 239 | total_num_layer = sum(nlayer)*2 + 5 240 | file_name = 'densenet_{}_test.prototxt'.format(total_num_layer) 241 | net_name = 'name: "DenseNet-{}_test"\n'.format(total_num_layer) 242 | with open(file_name, 'w') as f: 243 | f.write(net_name) 244 | f.write(net_arch) 245 | 246 | 247 | if __name__ == '__main__': 248 | construct_net() 249 | -------------------------------------------------------------------------------- /GoogLeNet/googlenet_v1_deploy.prototxt: -------------------------------------------------------------------------------- 1 | name: "GoogLeNet-v1_deploy" 2 | layer { 3 | name: "data" 4 | type: "Input" 5 | top: "data" 6 | input_param { 7 | shape { 8 | dim: 10 9 | dim: 3 10 | dim: 224 11 | dim: 224 12 | } 13 | } 14 | } 15 | layer { 16 | name: "conv1/7x7_s2" 17 | type: "Convolution" 18 | bottom: "data" 19 | top: "conv1/7x7_s2" 20 | convolution_param { 21 | num_output: 64 22 | pad: 3 23 | kernel_size: 7 24 | stride: 2 25 | } 26 | } 27 | layer { 28 | name: "conv1/relu_7x7_s2" 29 | type: "ReLU" 30 | bottom: "conv1/7x7_s2" 31 | top: "conv1/7x7_s2" 32 | } 33 | layer { 34 | name: "pool1/3x3_s2" 35 | type: "Pooling" 36 | bottom: "conv1/7x7_s2" 37 | top: "pool1/3x3_s2" 38 | pooling_param { 39 | pool: MAX 40 | kernel_size: 3 41 | stride: 2 42 | } 43 | } 44 | layer 
{ 45 | name: "conv2/3x3_reduce" 46 | type: "Convolution" 47 | bottom: "pool1/3x3_s2" 48 | top: "conv2/3x3_reduce" 49 | convolution_param { 50 | num_output: 64 51 | pad: 0 52 | kernel_size: 1 53 | stride: 1 54 | } 55 | } 56 | layer { 57 | name: "conv2/relu_3x3_reduce" 58 | type: "ReLU" 59 | bottom: "conv2/3x3_reduce" 60 | top: "conv2/3x3_reduce" 61 | } 62 | layer { 63 | name: "conv2/3x3" 64 | type: "Convolution" 65 | bottom: "conv2/3x3_reduce" 66 | top: "conv2/3x3" 67 | convolution_param { 68 | num_output: 192 69 | pad: 1 70 | kernel_size: 3 71 | stride: 1 72 | } 73 | } 74 | layer { 75 | name: "conv2/relu_3x3" 76 | type: "ReLU" 77 | bottom: "conv2/3x3" 78 | top: "conv2/3x3" 79 | } 80 | layer { 81 | name: "pool2/3x3_s2" 82 | type: "Pooling" 83 | bottom: "conv2/3x3" 84 | top: "pool2/3x3_s2" 85 | pooling_param { 86 | pool: MAX 87 | kernel_size: 3 88 | stride: 2 89 | } 90 | } 91 | layer { 92 | name: "inception_3a/1x1" 93 | type: "Convolution" 94 | bottom: "pool2/3x3_s2" 95 | top: "inception_3a/1x1" 96 | convolution_param { 97 | num_output: 64 98 | pad: 0 99 | kernel_size: 1 100 | stride: 1 101 | } 102 | } 103 | layer { 104 | name: "inception_3a/relu_1x1" 105 | type: "ReLU" 106 | bottom: "inception_3a/1x1" 107 | top: "inception_3a/1x1" 108 | } 109 | layer { 110 | name: "inception_3a/3x3_reduce" 111 | type: "Convolution" 112 | bottom: "pool2/3x3_s2" 113 | top: "inception_3a/3x3_reduce" 114 | convolution_param { 115 | num_output: 96 116 | pad: 0 117 | kernel_size: 1 118 | stride: 1 119 | } 120 | } 121 | layer { 122 | name: "inception_3a/relu_3x3_reduce" 123 | type: "ReLU" 124 | bottom: "inception_3a/3x3_reduce" 125 | top: "inception_3a/3x3_reduce" 126 | } 127 | layer { 128 | name: "inception_3a/3x3 " 129 | type: "Convolution" 130 | bottom: "inception_3a/3x3_reduce" 131 | top: "inception_3a/3x3 " 132 | convolution_param { 133 | num_output: 128 134 | pad: 1 135 | kernel_size: 3 136 | stride: 1 137 | } 138 | } 139 | layer { 140 | name: "inception_3a/relu_3x3 " 141 | type:
"ReLU" 142 | bottom: "inception_3a/3x3 " 143 | top: "inception_3a/3x3 " 144 | } 145 | layer { 146 | name: "inception_3a/5x5_reduce" 147 | type: "Convolution" 148 | bottom: "pool2/3x3_s2" 149 | top: "inception_3a/5x5_reduce" 150 | convolution_param { 151 | num_output: 16 152 | pad: 0 153 | kernel_size: 1 154 | stride: 1 155 | } 156 | } 157 | layer { 158 | name: "inception_3a/relu_5x5_reduce" 159 | type: "ReLU" 160 | bottom: "inception_3a/5x5_reduce" 161 | top: "inception_3a/5x5_reduce" 162 | } 163 | layer { 164 | name: "inception_3a/5x5" 165 | type: "Convolution" 166 | bottom: "inception_3a/5x5_reduce" 167 | top: "inception_3a/5x5" 168 | convolution_param { 169 | num_output: 32 170 | pad: 2 171 | kernel_size: 5 172 | stride: 1 173 | } 174 | } 175 | layer { 176 | name: "inception_3a/relu_5x5" 177 | type: "ReLU" 178 | bottom: "inception_3a/5x5" 179 | top: "inception_3a/5x5" 180 | } 181 | layer { 182 | name: "inception_3a/pool" 183 | type: "Pooling" 184 | bottom: "pool2/3x3_s2" 185 | top: "inception_3a/pool" 186 | pooling_param { 187 | pool: MAX 188 | kernel_size: 3 189 | stride: 1 190 | pad: 1 191 | } 192 | } 193 | layer { 194 | name: "inception_3a/pool_proj" 195 | type: "Convolution" 196 | bottom: "inception_3a/pool" 197 | top: "inception_3a/pool_proj" 198 | convolution_param { 199 | num_output: 32 200 | pad: 0 201 | kernel_size: 1 202 | stride: 1 203 | } 204 | } 205 | layer { 206 | name: "inception_3a/relu_pool_proj" 207 | type: "ReLU" 208 | bottom: "inception_3a/pool_proj" 209 | top: "inception_3a/pool_proj" 210 | } 211 | layer { 212 | name: "inception_3a/output" 213 | type: "Concat" 214 | bottom: "inception_3a/1x1" 215 | bottom: "inception_3a/3x3 " 216 | bottom: "inception_3a/5x5" 217 | bottom: "inception_3a/pool_proj" 218 | top: "inception_3a/output" 219 | } 220 | layer { 221 | name: "inception_3b/1x1" 222 | type: "Convolution" 223 | bottom: "inception_3a/output" 224 | top: "inception_3b/1x1" 225 | convolution_param { 226 | num_output: 128 227 | pad: 0 228 |
kernel_size: 1 229 | stride: 1 230 | } 231 | } 232 | layer { 233 | name: "inception_3b/relu_1x1" 234 | type: "ReLU" 235 | bottom: "inception_3b/1x1" 236 | top: "inception_3b/1x1" 237 | } 238 | layer { 239 | name: "inception_3b/3x3_reduce" 240 | type: "Convolution" 241 | bottom: "inception_3a/output" 242 | top: "inception_3b/3x3_reduce" 243 | convolution_param { 244 | num_output: 128 245 | pad: 0 246 | kernel_size: 1 247 | stride: 1 248 | } 249 | } 250 | layer { 251 | name: "inception_3b/relu_3x3_reduce" 252 | type: "ReLU" 253 | bottom: "inception_3b/3x3_reduce" 254 | top: "inception_3b/3x3_reduce" 255 | } 256 | layer { 257 | name: "inception_3b/3x3 " 258 | type: "Convolution" 259 | bottom: "inception_3b/3x3_reduce" 260 | top: "inception_3b/3x3 " 261 | convolution_param { 262 | num_output: 192 263 | pad: 1 264 | kernel_size: 3 265 | stride: 1 266 | } 267 | } 268 | layer { 269 | name: "inception_3b/relu_3x3 " 270 | type: "ReLU" 271 | bottom: "inception_3b/3x3 " 272 | top: "inception_3b/3x3 " 273 | } 274 | layer { 275 | name: "inception_3b/5x5_reduce" 276 | type: "Convolution" 277 | bottom: "inception_3a/output" 278 | top: "inception_3b/5x5_reduce" 279 | convolution_param { 280 | num_output: 32 281 | pad: 0 282 | kernel_size: 1 283 | stride: 1 284 | } 285 | } 286 | layer { 287 | name: "inception_3b/relu_5x5_reduce" 288 | type: "ReLU" 289 | bottom: "inception_3b/5x5_reduce" 290 | top: "inception_3b/5x5_reduce" 291 | } 292 | layer { 293 | name: "inception_3b/5x5" 294 | type: "Convolution" 295 | bottom: "inception_3b/5x5_reduce" 296 | top: "inception_3b/5x5" 297 | convolution_param { 298 | num_output: 96 299 | pad: 2 300 | kernel_size: 5 301 | stride: 1 302 | } 303 | } 304 | layer { 305 | name: "inception_3b/relu_5x5" 306 | type: "ReLU" 307 | bottom: "inception_3b/5x5" 308 | top: "inception_3b/5x5" 309 | } 310 | layer { 311 | name: "inception_3b/pool" 312 | type: "Pooling" 313 | bottom: "inception_3a/output" 314 | top: "inception_3b/pool" 315 | pooling_param { 316 | 
pool: MAX 317 | kernel_size: 3 318 | stride: 1 319 | pad: 1 320 | } 321 | } 322 | layer { 323 | name: "inception_3b/pool_proj" 324 | type: "Convolution" 325 | bottom: "inception_3b/pool" 326 | top: "inception_3b/pool_proj" 327 | convolution_param { 328 | num_output: 64 329 | pad: 0 330 | kernel_size: 1 331 | stride: 1 332 | } 333 | } 334 | layer { 335 | name: "inception_3b/relu_pool_proj" 336 | type: "ReLU" 337 | bottom: "inception_3b/pool_proj" 338 | top: "inception_3b/pool_proj" 339 | } 340 | layer { 341 | name: "inception_3b/output" 342 | type: "Concat" 343 | bottom: "inception_3b/1x1" 344 | bottom: "inception_3b/3x3 " 345 | bottom: "inception_3b/5x5" 346 | bottom: "inception_3b/pool_proj" 347 | top: "inception_3b/output" 348 | } 349 | layer { 350 | name: "pool3/3x3_s2" 351 | type: "Pooling" 352 | bottom: "inception_3b/output" 353 | top: "pool3/3x3_s2" 354 | pooling_param { 355 | pool: MAX 356 | kernel_size: 3 357 | stride: 2 358 | } 359 | } 360 | layer { 361 | name: "inception_4a/1x1" 362 | type: "Convolution" 363 | bottom: "pool3/3x3_s2" 364 | top: "inception_4a/1x1" 365 | convolution_param { 366 | num_output: 192 367 | pad: 0 368 | kernel_size: 1 369 | stride: 1 370 | } 371 | } 372 | layer { 373 | name: "inception_4a/relu_1x1" 374 | type: "ReLU" 375 | bottom: "inception_4a/1x1" 376 | top: "inception_4a/1x1" 377 | } 378 | layer { 379 | name: "inception_4a/3x3_reduce" 380 | type: "Convolution" 381 | bottom: "pool3/3x3_s2" 382 | top: "inception_4a/3x3_reduce" 383 | convolution_param { 384 | num_output: 96 385 | pad: 0 386 | kernel_size: 1 387 | stride: 1 388 | } 389 | } 390 | layer { 391 | name: "inception_4a/relu_3x3_reduce" 392 | type: "ReLU" 393 | bottom: "inception_4a/3x3_reduce" 394 | top: "inception_4a/3x3_reduce" 395 | } 396 | layer { 397 | name: "inception_4a/3x3 " 398 | type: "Convolution" 399 | bottom: "inception_4a/3x3_reduce" 400 | top: "inception_4a/3x3 " 401 | convolution_param { 402 | num_output: 208 403 | pad: 1 404 | kernel_size: 3 405 | stride:
1 406 | } 407 | } 408 | layer { 409 | name: "inception_4a/relu_3x3 " 410 | type: "ReLU" 411 | bottom: "inception_4a/3x3 " 412 | top: "inception_4a/3x3 " 413 | } 414 | layer { 415 | name: "inception_4a/5x5_reduce" 416 | type: "Convolution" 417 | bottom: "pool3/3x3_s2" 418 | top: "inception_4a/5x5_reduce" 419 | convolution_param { 420 | num_output: 16 421 | pad: 0 422 | kernel_size: 1 423 | stride: 1 424 | } 425 | } 426 | layer { 427 | name: "inception_4a/relu_5x5_reduce" 428 | type: "ReLU" 429 | bottom: "inception_4a/5x5_reduce" 430 | top: "inception_4a/5x5_reduce" 431 | } 432 | layer { 433 | name: "inception_4a/5x5" 434 | type: "Convolution" 435 | bottom: "inception_4a/5x5_reduce" 436 | top: "inception_4a/5x5" 437 | convolution_param { 438 | num_output: 48 439 | pad: 2 440 | kernel_size: 5 441 | stride: 1 442 | } 443 | } 444 | layer { 445 | name: "inception_4a/relu_5x5" 446 | type: "ReLU" 447 | bottom: "inception_4a/5x5" 448 | top: "inception_4a/5x5" 449 | } 450 | layer { 451 | name: "inception_4a/pool" 452 | type: "Pooling" 453 | bottom: "pool3/3x3_s2" 454 | top: "inception_4a/pool" 455 | pooling_param { 456 | pool: MAX 457 | kernel_size: 3 458 | stride: 1 459 | pad: 1 460 | } 461 | } 462 | layer { 463 | name: "inception_4a/pool_proj" 464 | type: "Convolution" 465 | bottom: "inception_4a/pool" 466 | top: "inception_4a/pool_proj" 467 | convolution_param { 468 | num_output: 64 469 | pad: 0 470 | kernel_size: 1 471 | stride: 1 472 | } 473 | } 474 | layer { 475 | name: "inception_4a/relu_pool_proj" 476 | type: "ReLU" 477 | bottom: "inception_4a/pool_proj" 478 | top: "inception_4a/pool_proj" 479 | } 480 | layer { 481 | name: "inception_4a/output" 482 | type: "Concat" 483 | bottom: "inception_4a/1x1" 484 | bottom: "inception_4a/3x3 " 485 | bottom: "inception_4a/5x5" 486 | bottom: "inception_4a/pool_proj" 487 | top: "inception_4a/output" 488 | } 489 | layer { 490 | name: "inception_4b/1x1" 491 | type: "Convolution" 492 | bottom: "inception_4a/output" 493 | top:
"inception_4b/1x1" 494 | convolution_param { 495 | num_output: 160 496 | pad: 0 497 | kernel_size: 1 498 | stride: 1 499 | } 500 | } 501 | layer { 502 | name: "inception_4b/relu_1x1" 503 | type: "ReLU" 504 | bottom: "inception_4b/1x1" 505 | top: "inception_4b/1x1" 506 | } 507 | layer { 508 | name: "inception_4b/3x3_reduce" 509 | type: "Convolution" 510 | bottom: "inception_4a/output" 511 | top: "inception_4b/3x3_reduce" 512 | convolution_param { 513 | num_output: 112 514 | pad: 0 515 | kernel_size: 1 516 | stride: 1 517 | } 518 | } 519 | layer { 520 | name: "inception_4b/relu_3x3_reduce" 521 | type: "ReLU" 522 | bottom: "inception_4b/3x3_reduce" 523 | top: "inception_4b/3x3_reduce" 524 | } 525 | layer { 526 | name: "inception_4b/3x3 " 527 | type: "Convolution" 528 | bottom: "inception_4b/3x3_reduce" 529 | top: "inception_4b/3x3 " 530 | convolution_param { 531 | num_output: 224 532 | pad: 1 533 | kernel_size: 3 534 | stride: 1 535 | } 536 | } 537 | layer { 538 | name: "inception_4b/relu_3x3 " 539 | type: "ReLU" 540 | bottom: "inception_4b/3x3 " 541 | top: "inception_4b/3x3 " 542 | } 543 | layer { 544 | name: "inception_4b/5x5_reduce" 545 | type: "Convolution" 546 | bottom: "inception_4a/output" 547 | top: "inception_4b/5x5_reduce" 548 | convolution_param { 549 | num_output: 24 550 | pad: 0 551 | kernel_size: 1 552 | stride: 1 553 | } 554 | } 555 | layer { 556 | name: "inception_4b/relu_5x5_reduce" 557 | type: "ReLU" 558 | bottom: "inception_4b/5x5_reduce" 559 | top: "inception_4b/5x5_reduce" 560 | } 561 | layer { 562 | name: "inception_4b/5x5" 563 | type: "Convolution" 564 | bottom: "inception_4b/5x5_reduce" 565 | top: "inception_4b/5x5" 566 | convolution_param { 567 | num_output: 64 568 | pad: 2 569 | kernel_size: 5 570 | stride: 1 571 | } 572 | } 573 | layer { 574 | name: "inception_4b/relu_5x5" 575 | type: "ReLU" 576 | bottom: "inception_4b/5x5" 577 | top: "inception_4b/5x5" 578 | } 579 | layer { 580 | name: "inception_4b/pool" 581 | type: "Pooling" 582 | bottom:
"inception_4a/output" 583 | top: "inception_4b/pool" 584 | pooling_param { 585 | pool: MAX 586 | kernel_size: 3 587 | stride: 1 588 | pad: 1 589 | } 590 | } 591 | layer { 592 | name: "inception_4b/pool_proj" 593 | type: "Convolution" 594 | bottom: "inception_4b/pool" 595 | top: "inception_4b/pool_proj" 596 | convolution_param { 597 | num_output: 64 598 | pad: 0 599 | kernel_size: 1 600 | stride: 1 601 | } 602 | } 603 | layer { 604 | name: "inception_4b/relu_pool_proj" 605 | type: "ReLU" 606 | bottom: "inception_4b/pool_proj" 607 | top: "inception_4b/pool_proj" 608 | } 609 | layer { 610 | name: "inception_4b/output" 611 | type: "Concat" 612 | bottom: "inception_4b/1x1" 613 | bottom: "inception_4b/3x3 " 614 | bottom: "inception_4b/5x5" 615 | bottom: "inception_4b/pool_proj" 616 | top: "inception_4b/output" 617 | } 618 | layer { 619 | name: "inception_4c/1x1" 620 | type: "Convolution" 621 | bottom: "inception_4b/output" 622 | top: "inception_4c/1x1" 623 | convolution_param { 624 | num_output: 128 625 | pad: 0 626 | kernel_size: 1 627 | stride: 1 628 | } 629 | } 630 | layer { 631 | name: "inception_4c/relu_1x1" 632 | type: "ReLU" 633 | bottom: "inception_4c/1x1" 634 | top: "inception_4c/1x1" 635 | } 636 | layer { 637 | name: "inception_4c/3x3_reduce" 638 | type: "Convolution" 639 | bottom: "inception_4b/output" 640 | top: "inception_4c/3x3_reduce" 641 | convolution_param { 642 | num_output: 128 643 | pad: 0 644 | kernel_size: 1 645 | stride: 1 646 | } 647 | } 648 | layer { 649 | name: "inception_4c/relu_3x3_reduce" 650 | type: "ReLU" 651 | bottom: "inception_4c/3x3_reduce" 652 | top: "inception_4c/3x3_reduce" 653 | } 654 | layer { 655 | name: "inception_4c/3x3 " 656 | type: "Convolution" 657 | bottom: "inception_4c/3x3_reduce" 658 | top: "inception_4c/3x3 " 659 | convolution_param { 660 | num_output: 256 661 | pad: 1 662 | kernel_size: 3 663 | stride: 1 664 | } 665 | } 666 | layer { 667 | name: "inception_4c/relu_3x3 " 668 | type: "ReLU" 669 | bottom: "inception_4c/3x3 
" 670 | top: "inception_4c/3x3 " 671 | } 672 | layer { 673 | name: "inception_4c/5x5_reduce" 674 | type: "Convolution" 675 | bottom: "inception_4b/output" 676 | top: "inception_4c/5x5_reduce" 677 | convolution_param { 678 | num_output: 24 679 | pad: 0 680 | kernel_size: 1 681 | stride: 1 682 | } 683 | } 684 | layer { 685 | name: "inception_4c/relu_5x5_reduce" 686 | type: "ReLU" 687 | bottom: "inception_4c/5x5_reduce" 688 | top: "inception_4c/5x5_reduce" 689 | } 690 | layer { 691 | name: "inception_4c/5x5" 692 | type: "Convolution" 693 | bottom: "inception_4c/5x5_reduce" 694 | top: "inception_4c/5x5" 695 | convolution_param { 696 | num_output: 64 697 | pad: 2 698 | kernel_size: 5 699 | stride: 1 700 | } 701 | } 702 | layer { 703 | name: "inception_4c/relu_5x5" 704 | type: "ReLU" 705 | bottom: "inception_4c/5x5" 706 | top: "inception_4c/5x5" 707 | } 708 | layer { 709 | name: "inception_4c/pool" 710 | type: "Pooling" 711 | bottom: "inception_4b/output" 712 | top: "inception_4c/pool" 713 | pooling_param { 714 | pool: MAX 715 | kernel_size: 3 716 | stride: 1 717 | pad: 1 718 | } 719 | } 720 | layer { 721 | name: "inception_4c/pool_proj" 722 | type: "Convolution" 723 | bottom: "inception_4c/pool" 724 | top: "inception_4c/pool_proj" 725 | convolution_param { 726 | num_output: 64 727 | pad: 0 728 | kernel_size: 1 729 | stride: 1 730 | } 731 | } 732 | layer { 733 | name: "inception_4c/relu_pool_proj" 734 | type: "ReLU" 735 | bottom: "inception_4c/pool_proj" 736 | top: "inception_4c/pool_proj" 737 | } 738 | layer { 739 | name: "inception_4c/output" 740 | type: "Concat" 741 | bottom: "inception_4c/1x1" 742 | bottom: "inception_4c/3x3 " 743 | bottom: "inception_4c/5x5" 744 | bottom: "inception_4c/pool_proj" 745 | top: "inception_4c/output" 746 | } 747 | layer { 748 | name: "inception_4d/1x1" 749 | type: "Convolution" 750 | bottom: "inception_4c/output" 751 | top: "inception_4d/1x1" 752 | convolution_param { 753 | num_output: 112 754 | pad: 0 755 | kernel_size: 1 756 | stride:
1 757 | } 758 | } 759 | layer { 760 | name: "inception_4d/relu_1x1" 761 | type: "ReLU" 762 | bottom: "inception_4d/1x1" 763 | top: "inception_4d/1x1" 764 | } 765 | layer { 766 | name: "inception_4d/3x3_reduce" 767 | type: "Convolution" 768 | bottom: "inception_4c/output" 769 | top: "inception_4d/3x3_reduce" 770 | convolution_param { 771 | num_output: 144 772 | pad: 0 773 | kernel_size: 1 774 | stride: 1 775 | } 776 | } 777 | layer { 778 | name: "inception_4d/relu_3x3_reduce" 779 | type: "ReLU" 780 | bottom: "inception_4d/3x3_reduce" 781 | top: "inception_4d/3x3_reduce" 782 | } 783 | layer { 784 | name: "inception_4d/3x3 " 785 | type: "Convolution" 786 | bottom: "inception_4d/3x3_reduce" 787 | top: "inception_4d/3x3 " 788 | convolution_param { 789 | num_output: 288 790 | pad: 1 791 | kernel_size: 3 792 | stride: 1 793 | } 794 | } 795 | layer { 796 | name: "inception_4d/relu_3x3 " 797 | type: "ReLU" 798 | bottom: "inception_4d/3x3 " 799 | top: "inception_4d/3x3 " 800 | } 801 | layer { 802 | name: "inception_4d/5x5_reduce" 803 | type: "Convolution" 804 | bottom: "inception_4c/output" 805 | top: "inception_4d/5x5_reduce" 806 | convolution_param { 807 | num_output: 32 808 | pad: 0 809 | kernel_size: 1 810 | stride: 1 811 | } 812 | } 813 | layer { 814 | name: "inception_4d/relu_5x5_reduce" 815 | type: "ReLU" 816 | bottom: "inception_4d/5x5_reduce" 817 | top: "inception_4d/5x5_reduce" 818 | } 819 | layer { 820 | name: "inception_4d/5x5" 821 | type: "Convolution" 822 | bottom: "inception_4d/5x5_reduce" 823 | top: "inception_4d/5x5" 824 | convolution_param { 825 | num_output: 64 826 | pad: 2 827 | kernel_size: 5 828 | stride: 1 829 | } 830 | } 831 | layer { 832 | name: "inception_4d/relu_5x5" 833 | type: "ReLU" 834 | bottom: "inception_4d/5x5" 835 | top: "inception_4d/5x5" 836 | } 837 | layer { 838 | name: "inception_4d/pool" 839 | type: "Pooling" 840 | bottom: "inception_4c/output" 841 | top: "inception_4d/pool" 842 | pooling_param { 843 | pool: MAX 844 | kernel_size: 3 
845 | stride: 1 846 | pad: 1 847 | } 848 | } 849 | layer { 850 | name: "inception_4d/pool_proj" 851 | type: "Convolution" 852 | bottom: "inception_4d/pool" 853 | top: "inception_4d/pool_proj" 854 | convolution_param { 855 | num_output: 64 856 | pad: 0 857 | kernel_size: 1 858 | stride: 1 859 | } 860 | } 861 | layer { 862 | name: "inception_4d/relu_pool_proj" 863 | type: "ReLU" 864 | bottom: "inception_4d/pool_proj" 865 | top: "inception_4d/pool_proj" 866 | } 867 | layer { 868 | name: "inception_4d/output" 869 | type: "Concat" 870 | bottom: "inception_4d/1x1" 871 | bottom: "inception_4d/3x3 " 872 | bottom: "inception_4d/5x5" 873 | bottom: "inception_4d/pool_proj" 874 | top: "inception_4d/output" 875 | } 876 | layer { 877 | name: "inception_4e/1x1" 878 | type: "Convolution" 879 | bottom: "inception_4d/output" 880 | top: "inception_4e/1x1" 881 | convolution_param { 882 | num_output: 256 883 | pad: 0 884 | kernel_size: 1 885 | stride: 1 886 | } 887 | } 888 | layer { 889 | name: "inception_4e/relu_1x1" 890 | type: "ReLU" 891 | bottom: "inception_4e/1x1" 892 | top: "inception_4e/1x1" 893 | } 894 | layer { 895 | name: "inception_4e/3x3_reduce" 896 | type: "Convolution" 897 | bottom: "inception_4d/output" 898 | top: "inception_4e/3x3_reduce" 899 | convolution_param { 900 | num_output: 160 901 | pad: 0 902 | kernel_size: 1 903 | stride: 1 904 | } 905 | } 906 | layer { 907 | name: "inception_4e/relu_3x3_reduce" 908 | type: "ReLU" 909 | bottom: "inception_4e/3x3_reduce" 910 | top: "inception_4e/3x3_reduce" 911 | } 912 | layer { 913 | name: "inception_4e/3x3 " 914 | type: "Convolution" 915 | bottom: "inception_4e/3x3_reduce" 916 | top: "inception_4e/3x3 " 917 | convolution_param { 918 | num_output: 320 919 | pad: 1 920 | kernel_size: 3 921 | stride: 1 922 | } 923 | } 924 | layer { 925 | name: "inception_4e/relu_3x3 " 926 | type: "ReLU" 927 | bottom: "inception_4e/3x3 " 928 | top: "inception_4e/3x3 " 929 | } 930 | layer { 931 | name: "inception_4e/5x5_reduce" 932 | type:
"Convolution" 933 | bottom: "inception_4d/output" 934 | top: "inception_4e/5x5_reduce" 935 | convolution_param { 936 | num_output: 32 937 | pad: 0 938 | kernel_size: 1 939 | stride: 1 940 | } 941 | } 942 | layer { 943 | name: "inception_4e/relu_5x5_reduce" 944 | type: "ReLU" 945 | bottom: "inception_4e/5x5_reduce" 946 | top: "inception_4e/5x5_reduce" 947 | } 948 | layer { 949 | name: "inception_4e/5x5" 950 | type: "Convolution" 951 | bottom: "inception_4e/5x5_reduce" 952 | top: "inception_4e/5x5" 953 | convolution_param { 954 | num_output: 128 955 | pad: 2 956 | kernel_size: 5 957 | stride: 1 958 | } 959 | } 960 | layer { 961 | name: "inception_4e/relu_5x5" 962 | type: "ReLU" 963 | bottom: "inception_4e/5x5" 964 | top: "inception_4e/5x5" 965 | } 966 | layer { 967 | name: "inception_4e/pool" 968 | type: "Pooling" 969 | bottom: "inception_4d/output" 970 | top: "inception_4e/pool" 971 | pooling_param { 972 | pool: MAX 973 | kernel_size: 3 974 | stride: 1 975 | pad: 1 976 | } 977 | } 978 | layer { 979 | name: "inception_4e/pool_proj" 980 | type: "Convolution" 981 | bottom: "inception_4e/pool" 982 | top: "inception_4e/pool_proj" 983 | convolution_param { 984 | num_output: 128 985 | pad: 0 986 | kernel_size: 1 987 | stride: 1 988 | } 989 | } 990 | layer { 991 | name: "inception_4e/relu_pool_proj" 992 | type: "ReLU" 993 | bottom: "inception_4e/pool_proj" 994 | top: "inception_4e/pool_proj" 995 | } 996 | layer { 997 | name: "inception_4e/output" 998 | type: "Concat" 999 | bottom: "inception_4e/1x1" 1000 | bottom: "inception_4e/3x3 " 1001 | bottom: "inception_4e/5x5" 1002 | bottom: "inception_4e/pool_proj" 1003 | top: "inception_4e/output" 1004 | } 1005 | layer { 1006 | name: "pool4/3x3_s2" 1007 | type: "Pooling" 1008 | bottom: "inception_4e/output" 1009 | top: "pool4/3x3_s2" 1010 | pooling_param { 1011 | pool: MAX 1012 | kernel_size: 3 1013 | stride: 2 1014 | } 1015 | } 1016 | layer { 1017 | name: "inception_5a/1x1" 1018 | type: "Convolution" 1019 | bottom: "pool4/3x3_s2"
1020 | top: "inception_5a/1x1" 1021 | convolution_param { 1022 | num_output: 256 1023 | pad: 0 1024 | kernel_size: 1 1025 | stride: 1 1026 | } 1027 | } 1028 | layer { 1029 | name: "inception_5a/relu_1x1" 1030 | type: "ReLU" 1031 | bottom: "inception_5a/1x1" 1032 | top: "inception_5a/1x1" 1033 | } 1034 | layer { 1035 | name: "inception_5a/3x3_reduce" 1036 | type: "Convolution" 1037 | bottom: "pool4/3x3_s2" 1038 | top: "inception_5a/3x3_reduce" 1039 | convolution_param { 1040 | num_output: 160 1041 | pad: 0 1042 | kernel_size: 1 1043 | stride: 1 1044 | } 1045 | } 1046 | layer { 1047 | name: "inception_5a/relu_3x3_reduce" 1048 | type: "ReLU" 1049 | bottom: "inception_5a/3x3_reduce" 1050 | top: "inception_5a/3x3_reduce" 1051 | } 1052 | layer { 1053 | name: "inception_5a/3x3 " 1054 | type: "Convolution" 1055 | bottom: "inception_5a/3x3_reduce" 1056 | top: "inception_5a/3x3 " 1057 | convolution_param { 1058 | num_output: 320 1059 | pad: 1 1060 | kernel_size: 3 1061 | stride: 1 1062 | } 1063 | } 1064 | layer { 1065 | name: "inception_5a/relu_3x3 " 1066 | type: "ReLU" 1067 | bottom: "inception_5a/3x3 " 1068 | top: "inception_5a/3x3 " 1069 | } 1070 | layer { 1071 | name: "inception_5a/5x5_reduce" 1072 | type: "Convolution" 1073 | bottom: "pool4/3x3_s2" 1074 | top: "inception_5a/5x5_reduce" 1075 | convolution_param { 1076 | num_output: 32 1077 | pad: 0 1078 | kernel_size: 1 1079 | stride: 1 1080 | } 1081 | } 1082 | layer { 1083 | name: "inception_5a/relu_5x5_reduce" 1084 | type: "ReLU" 1085 | bottom: "inception_5a/5x5_reduce" 1086 | top: "inception_5a/5x5_reduce" 1087 | } 1088 | layer { 1089 | name: "inception_5a/5x5" 1090 | type: "Convolution" 1091 | bottom: "inception_5a/5x5_reduce" 1092 | top: "inception_5a/5x5" 1093 | convolution_param { 1094 | num_output: 128 1095 | pad: 2 1096 | kernel_size: 5 1097 | stride: 1 1098 | } 1099 | } 1100 | layer { 1101 | name: "inception_5a/relu_5x5" 1102 | type: "ReLU" 1103 | bottom: "inception_5a/5x5" 1104 | top: "inception_5a/5x5" 1105 | 
} 1106 | layer { 1107 | name: "inception_5a/pool" 1108 | type: "Pooling" 1109 | bottom: "pool4/3x3_s2" 1110 | top: "inception_5a/pool" 1111 | pooling_param { 1112 | pool: MAX 1113 | kernel_size: 3 1114 | stride: 1 1115 | pad: 1 1116 | } 1117 | } 1118 | layer { 1119 | name: "inception_5a/pool_proj" 1120 | type: "Convolution" 1121 | bottom: "inception_5a/pool" 1122 | top: "inception_5a/pool_proj" 1123 | convolution_param { 1124 | num_output: 128 1125 | pad: 0 1126 | kernel_size: 1 1127 | stride: 1 1128 | } 1129 | } 1130 | layer { 1131 | name: "inception_5a/relu_pool_proj" 1132 | type: "ReLU" 1133 | bottom: "inception_5a/pool_proj" 1134 | top: "inception_5a/pool_proj" 1135 | } 1136 | layer { 1137 | name: "inception_5a/output" 1138 | type: "Concat" 1139 | bottom: "inception_5a/1x1" 1140 | bottom: "inception_5a/3x3 " 1141 | bottom: "inception_5a/5x5" 1142 | bottom: "inception_5a/pool_proj" 1143 | top: "inception_5a/output" 1144 | } 1145 | layer { 1146 | name: "inception_5b/1x1" 1147 | type: "Convolution" 1148 | bottom: "inception_5a/output" 1149 | top: "inception_5b/1x1" 1150 | convolution_param { 1151 | num_output: 384 1152 | pad: 0 1153 | kernel_size: 1 1154 | stride: 1 1155 | } 1156 | } 1157 | layer { 1158 | name: "inception_5b/relu_1x1" 1159 | type: "ReLU" 1160 | bottom: "inception_5b/1x1" 1161 | top: "inception_5b/1x1" 1162 | } 1163 | layer { 1164 | name: "inception_5b/3x3_reduce" 1165 | type: "Convolution" 1166 | bottom: "inception_5a/output" 1167 | top: "inception_5b/3x3_reduce" 1168 | convolution_param { 1169 | num_output: 192 1170 | pad: 0 1171 | kernel_size: 1 1172 | stride: 1 1173 | } 1174 | } 1175 | layer { 1176 | name: "inception_5b/relu_3x3_reduce" 1177 | type: "ReLU" 1178 | bottom: "inception_5b/3x3_reduce" 1179 | top: "inception_5b/3x3_reduce" 1180 | } 1181 | layer { 1182 | name: "inception_5b/3x3 " 1183 | type: "Convolution" 1184 | bottom: "inception_5b/3x3_reduce" 1185 | top: "inception_5b/3x3 " 1186 | convolution_param { 1187 | num_output: 384 1188 |
pad: 1 1189 | kernel_size: 3 1190 | stride: 1 1191 | } 1192 | } 1193 | layer { 1194 | name: "inception_5b/relu_3x3 " 1195 | type: "ReLU" 1196 | bottom: "inception_5b/3x3 " 1197 | top: "inception_5b/3x3 " 1198 | } 1199 | layer { 1200 | name: "inception_5b/5x5_reduce" 1201 | type: "Convolution" 1202 | bottom: "inception_5a/output" 1203 | top: "inception_5b/5x5_reduce" 1204 | convolution_param { 1205 | num_output: 48 1206 | pad: 0 1207 | kernel_size: 1 1208 | stride: 1 1209 | } 1210 | } 1211 | layer { 1212 | name: "inception_5b/relu_5x5_reduce" 1213 | type: "ReLU" 1214 | bottom: "inception_5b/5x5_reduce" 1215 | top: "inception_5b/5x5_reduce" 1216 | } 1217 | layer { 1218 | name: "inception_5b/5x5" 1219 | type: "Convolution" 1220 | bottom: "inception_5b/5x5_reduce" 1221 | top: "inception_5b/5x5" 1222 | convolution_param { 1223 | num_output: 128 1224 | pad: 2 1225 | kernel_size: 5 1226 | stride: 1 1227 | } 1228 | } 1229 | layer { 1230 | name: "inception_5b/relu_5x5" 1231 | type: "ReLU" 1232 | bottom: "inception_5b/5x5" 1233 | top: "inception_5b/5x5" 1234 | } 1235 | layer { 1236 | name: "inception_5b/pool" 1237 | type: "Pooling" 1238 | bottom: "inception_5a/output" 1239 | top: "inception_5b/pool" 1240 | pooling_param { 1241 | pool: MAX 1242 | kernel_size: 3 1243 | stride: 1 1244 | pad: 1 1245 | } 1246 | } 1247 | layer { 1248 | name: "inception_5b/pool_proj" 1249 | type: "Convolution" 1250 | bottom: "inception_5b/pool" 1251 | top: "inception_5b/pool_proj" 1252 | convolution_param { 1253 | num_output: 128 1254 | pad: 0 1255 | kernel_size: 1 1256 | stride: 1 1257 | } 1258 | } 1259 | layer { 1260 | name: "inception_5b/relu_pool_proj" 1261 | type: "ReLU" 1262 | bottom: "inception_5b/pool_proj" 1263 | top: "inception_5b/pool_proj" 1264 | } 1265 | layer { 1266 | name: "inception_5b/output" 1267 | type: "Concat" 1268 | bottom: "inception_5b/1x1" 1269 | bottom: "inception_5b/3x3 " 1270 | bottom: "inception_5b/5x5" 1271 | bottom: "inception_5b/pool_proj" 1272 | top:
"inception_5b/output" 1273 | } 1274 | layer { 1275 | name: "pool5/7x7_s1" 1276 | type: "Pooling" 1277 | bottom: "inception_5b/output" 1278 | top: "pool5/7x7_s1" 1279 | pooling_param { 1280 | pool: AVE 1281 | kernel_size: 7 1282 | stride: 1 1283 | } 1284 | } 1285 | layer { 1286 | name: "pool5/drop_7x7_s1" 1287 | type: "Dropout" 1288 | bottom: "pool5/7x7_s1" 1289 | top: "pool5/7x7_s1" 1290 | dropout_param { 1291 | dropout_ratio: 0.4 1292 | } 1293 | } 1294 | layer { 1295 | name: "loss3/classifier" 1296 | type: "InnerProduct" 1297 | bottom: "pool5/7x7_s1" 1298 | top: "loss3/classifier" 1299 | inner_product_param { 1300 | num_output: 1000 1301 | } 1302 | } 1303 | layer { 1304 | name: "prob" 1305 | type: "Softmax" 1306 | bottom: "loss3/classifier" 1307 | top: "prob" 1308 | } 1309 | -------------------------------------------------------------------------------- /GoogLeNet/googlenet_v1_deploy.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | from six.moves import xrange 4 | 5 | import caffe 6 | from caffe import layers as L, params as P 7 | from caffe import to_proto 8 | 9 | 10 | # block 11 | # Convolution - ReLU 12 | def _block_cr(major, minor, net, bottom, nout, pad, ks, stride): 13 | conv_layer = '{}/{}'.format(major, minor) 14 | relu_layer = '{}/relu_{}'.format(major, minor) 15 | 16 | net[conv_layer] = L.Convolution(bottom, 17 | num_output = nout, pad = pad, 18 | kernel_size = ks, stride = stride) 19 | net[relu_layer] = L.ReLU(net[conv_layer], in_place = True) 20 | 21 | return net[relu_layer] 22 | 23 | 24 | # Inception v1 25 | def _inception_v1(major, net, bottom, nout): 26 | minor = ['1x1', '3x3_reduce', '3x3 ', '5x5_reduce', '5x5', 'pool_proj'] 27 | 28 | block_1x1 = _block_cr(major, minor[0], net, bottom, nout[0], 0, 1, 1) 29 | 30 | block_3x3_reduce = _block_cr(major, minor[1], net, bottom, nout[1], 0, 1, 1) 31 | block_3x3 = _block_cr(major, minor[2], net, block_3x3_reduce, nout[2], 1, 3, 1)
32 | 
33 |     block_5x5_reduce = _block_cr(major, minor[3], net, bottom, nout[3], 0, 1, 1)
34 |     block_5x5 = _block_cr(major, minor[4], net, block_5x5_reduce, nout[4], 2, 5, 1)
35 | 
36 |     pool_layer = '{}/pool'.format(major)
37 |     net[pool_layer] = L.Pooling(bottom, pool = P.Pooling.MAX,
38 |                                 pad = 1, kernel_size = 3, stride = 1)
39 |     block_pool_proj = _block_cr(major, minor[5], net, net[pool_layer], nout[5], 0, 1, 1)
40 | 
41 |     output_layer = '{}/output'.format(major)
42 |     net[output_layer] = L.Concat(block_1x1, block_3x3, block_5x5, block_pool_proj)
43 | 
44 |     return net[output_layer]
45 | 
46 | 
47 | def construc_net():
48 |     net = caffe.NetSpec()
49 | 
50 |     net.data = L.Input(shape = dict(dim = [10,3,224,224]))
51 |     block_cr_1 = _block_cr('conv1', '7x7_s2', net, net.data, 64, 3, 7, 2)
52 |     pool_layer_1 = 'pool1/3x3_s2'
53 |     net[pool_layer_1] = L.Pooling(block_cr_1, pool = P.Pooling.MAX,
54 |                                   kernel_size = 3, stride = 2)
55 |     # LRN omitted (the original GoogLeNet has an LRN layer here)
56 |     block_cr_2_reduce = _block_cr('conv2', '3x3_reduce', net, net[pool_layer_1], 64, 0, 1, 1)
57 |     block_cr_2 = _block_cr('conv2', '3x3', net, block_cr_2_reduce, 192, 1, 3, 1)
58 |     # LRN omitted (the original GoogLeNet has an LRN layer here)
59 |     pool_layer_2 = 'pool2/3x3_s2'
60 |     net[pool_layer_2] = L.Pooling(block_cr_2, pool = P.Pooling.MAX,
61 |                                   kernel_size = 3, stride = 2)
62 |     inception_3a = _inception_v1('inception_3a', net, net[pool_layer_2], [64,96,128,16,32,32])
63 |     inception_3b = _inception_v1('inception_3b', net, inception_3a, [128,128,192,32,96,64])
64 |     pool_layer_3 = 'pool3/3x3_s2'
65 |     net[pool_layer_3] = L.Pooling(inception_3b, pool = P.Pooling.MAX,
66 |                                   kernel_size = 3, stride = 2)
67 |     inception_4a = _inception_v1('inception_4a', net, net[pool_layer_3], [192,96,208,16,48,64])
68 |     inception_4b = _inception_v1('inception_4b', net, inception_4a, [160,112,224,24,64,64])
69 |     inception_4c = _inception_v1('inception_4c', net, inception_4b, [128,128,256,24,64,64])
70 |     inception_4d = _inception_v1('inception_4d', net, inception_4c, [112,144,288,32,64,64])
71 |     inception_4e = _inception_v1('inception_4e', net, inception_4d, [256,160,320,32,128,128])
72 |     pool_layer_4 = 'pool4/3x3_s2'
73 |     net[pool_layer_4] = L.Pooling(inception_4e, pool = P.Pooling.MAX,
74 |                                   kernel_size = 3, stride = 2)
75 |     inception_5a = _inception_v1('inception_5a', net, net[pool_layer_4], [256,160,320,32,128,128])
76 |     inception_5b = _inception_v1('inception_5b', net, inception_5a, [384,192,384,48,128,128])
77 |     pool_layer_5 = 'pool5/7x7_s1'
78 |     net[pool_layer_5] = L.Pooling(inception_5b, pool = P.Pooling.AVE,
79 |                                   kernel_size = 7, stride = 1)
80 |     pool_layer_5_drop = 'pool5/drop_7x7_s1'
81 |     net[pool_layer_5_drop] = L.Dropout(net[pool_layer_5], dropout_ratio = 0.4, in_place = True)
82 |     fc_layer = 'loss3/classifier'
83 |     net[fc_layer] = L.InnerProduct(net[pool_layer_5_drop], num_output = 1000)
84 |     net.prob = L.Softmax(net[fc_layer])
85 | 
86 |     return net.to_proto()
87 | 
88 | 
89 | def main():
90 |     with open('googlenet_v1_deploy.prototxt', 'w') as f:
91 |         f.write('name: "GoogLeNet-v1_deploy"\n')
92 |         f.write(str(construc_net()))
93 | 
94 | if __name__ == '__main__':
95 |     main()
96 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## caffe_toolkit
2 | A Caffe toolkit for installing Caffe and creating various networks.
3 | 
4 | ### 1. Automatically Install Caffe (CentOS 7)
5 | ```
6 | ./install_Caffe_CentOS7.sh
7 | ```
8 | ### 2. Various Networks
9 | **VGG**
10 | [VGG-16] http://ethereon.github.io/netscope/#/gist/ef51dccffef6cbdd7b4452750e94fdf0
11 | [VGG-19] http://ethereon.github.io/netscope/#/gist/9499eecb7bee2bc701ddcf4b64d58025
12 | **ResNet**
13 | [ResNet-50] http://ethereon.github.io/netscope/#/gist/ada6091c177c2e18440eb077e757afdb
14 | [ResNet-101] http://ethereon.github.io/netscope/#/gist/d0ce7feacfdb498d1f033687c9155bf3
15 | [ResNet-152] http://ethereon.github.io/netscope/#/gist/ea9a36af0542cab6521e37b9340b9283
16 | **ResNet-v2**
17 | [ResNet-v2-164] http://ethereon.github.io/netscope/#/gist/3a7fc9f3e9cb0a8b2e3ef4638796c6d9
18 | **ResNeXt**
19 | [ResNeXt-50] http://ethereon.github.io/netscope/#/gist/2e94631a67ad2a3a7405308db1b2c87f
20 | [ResNeXt-101] http://ethereon.github.io/netscope/#/gist/adb2692e6811ed6ba2cbf3daae7072f9
21 | **WRN**
22 | [WRN-28-10] http://ethereon.github.io/netscope/#/gist/ec4cd13f11c02b5606397d9a5fe8753a
23 | **DenseNet**
24 | [DenseNet-121] http://ethereon.github.io/netscope/#/gist/7767198372b875deef9cc6ed7f465576
25 | 
26 | ### References
27 | [1] David Stutz, caffe-tools, https://github.com/davidstutz/caffe-tools .
28 | [2] Kaiming He, deep-residual-networks, https://github.com/KaimingHe/deep-residual-networks .
29 | [3] Kaiming He, resnet-1k-layers, https://github.com/KaimingHe/resnet-1k-layers .
30 | [4] facebookresearch, ResNeXt, https://github.com/facebookresearch/ResNeXt .
31 | [5] Sergey Zagoruyko, wide-residual-networks, https://github.com/szagoruyko/wide-residual-networks .
32 | [6] Zhuang Liu, DenseNetCaffe, https://github.com/liuzhuang13/DenseNetCaffe .
33 | 
34 | ### Contact Info
35 | If you have any problems with this project, please contact me by email at binlearning@163.com.
36 | 
--------------------------------------------------------------------------------
/ResNeXt/README.md:
--------------------------------------------------------------------------------
1 | # ResNeXt
2 | [Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/abs/1611.05431)
3 | Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He
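4 | 
5 | ### Abstract
6 | This paper presents a highly modularized network architecture that is easy to build: each building block aggregates a set of transformations that share the same topology.
7 | This homogeneous, multi-branch design needs only a few hyper-parameters. The strategy also exposes a new dimension, **"cardinality"**, i.e. the number of transformations in a block,
8 | which is as important a factor as depth and width. Experiments on ImageNet-1K show that, with complexity held fixed,
9 | increasing cardinality improves classification accuracy, and that increasing cardinality is more effective than going deeper or wider. The resulting model, ResNeXt, is the foundation of our entry to the ILSVRC
10 | 2016 classification task. We also evaluate ResNeXt on ImageNet-5K and on COCO detection, where it outperforms its ResNet counterpart.
11 | Official implementation (Torch): https://github.com/facebookresearch/ResNeXt .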
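12 | 
13 | ### 1. Introduction
14 | Research on visual recognition is shifting from "feature engineering" to "network engineering": the main effort now goes into designing architectures that learn better representations.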
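15 | As the number of hyper-parameters (width, i.e. the number of channels of a layer, filter size, stride, etc.) grows, designing architectures becomes increasingly difficult. The success of VGG networks shows that a simple yet effective strategy
16 | (stacking building blocks of the same shape) is enough to build deep networks. ResNet inherits this strategy: the blocks it stacks all share the same topology.
17 | This simple rule reduces the free choices of hyper-parameters, leaving depth as the essential dimension of the network. Simple design rules also lower the risk of the chosen hyper-parameters
18 | over-adapting to a specific dataset; VGG and ResNet have proven robust across a variety of visual and non-visual tasks.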
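19 | Unlike VGG networks, Inception models reach high accuracy at low model complexity through carefully designed topologies. All Inception models share an important property:
20 | they follow a **split-transform-merge** strategy. The input of an Inception block is split into a few lower-dimensional embeddings
21 | (by 1x1 convolutions), transformed by a set of specialized filters (3x3, 5x5, etc.), and merged by concatenation along the channel dimension. The design aims to approach
22 | the representational power of large, dense layers at a considerably lower computational complexity.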
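23 | However, Inception models are cumbersome to implement: they involve a series of intricate hyper-parameters (the number and size of the filters of every transformation must be specified, and modules are customized stage by stage).
24 | With so many interacting factors, it is unclear how to adapt an Inception architecture to new datasets or tasks.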
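25 | This paper combines the VGG/ResNet strategy of repeating blocks of the same shape with the split-transform-merge strategy of Inception to build deep networks in a simple way (Figure 1, right).
26 | With this design, the number of transformations can be scaled freely.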
27 | ![](./data/figure_1.png)
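28 | This design has two equivalent forms (Figure 3). The form in Figure 3(b) resembles an Inception-ResNet module, except that all branches share the same topology;
29 | Figure 3(c) is similar in spirit to the grouped convolutions of AlexNet, although AlexNet adopted grouped convolutions because of the hardware limitations of its time.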
30 | ![](./data/figure_3.png)
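31 | Cardinality is a dimension as important as depth and width. Experiments show that **increasing cardinality is a more effective way to gain accuracy than going deeper or wider**, especially when increasing depth or width starts to give diminishing returns.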
32 | The proposed network is named ResNeXt, after the "next" dimension (cardinality).
33 | 
34 | ### 2. Related Work
35 | **Multi-branch convolutional networks**  
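36 | Multi-branch architectures include the Inception models, ResNet (which can be viewed as a two-branch network whose one branch is the identity mapping), and deep neural decision forests
37 | (tree-patterned multi-branch networks). 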
38 | **Grouped convolutions** 
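39 | Grouped convolutions date back to AlexNet, where the model was split across two GPUs for training. Caffe, Torch, and other libraries support grouped convolutions, mainly for compatibility with AlexNet.
40 | We are not aware of evidence that grouped convolutions improve accuracy. A special case is the channel-wise convolution, in which the number of groups equals the number of channels; it is part of separable convolutions. 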
41 | **Compressing convolutional networks**
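42 | Decomposing a network along the spatial and/or channel dimensions reduces redundancy and can accelerate or compress it. Our method instead targets stronger representational power rather than compression.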
43 | **Ensembling**
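44 | Averaging the predictions of several independently trained networks effectively improves accuracy and is widely used in competitions. Veit et al. (Residual networks behave like
45 | ensembles of relatively shallow networks) interpret a single ResNet as an ensemble of shallower networks, an interpretation rooted in the additive behavior of residual networks analyzed in ResNet-v2.
46 | Our method also uses addition to aggregate a set of transformations into a deep network, but we argue that viewing residual networks as ensembles is imprecise, because the members are trained jointly
47 | rather than independently.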
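48 | 
49 | ### 3. Method
50 | #### 3.1 Template
51 | Following VGG/ResNet, we adopt a highly modularized design: the network is a stack of residual blocks that obey two simple rules: (i) blocks producing spatial maps of the same size
52 | share the same hyper-parameters (width and filter sizes); (ii) each time the spatial map is downsampled by a factor of 2, the width (number of channels) of the blocks is doubled.
53 | The second rule keeps the computational complexity (FLOPs) roughly the same for all blocks.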
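54 | With these two rules we only need to design a template module; every module in the network is derived from it. The rules also narrow the design space, so we can focus on a few key factors.
55 | The networks built by these rules are listed in Table 1.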
56 | ![](./data/table_1.png)
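Rule (ii) can be checked with a little arithmetic. The sketch below counts the multiply-adds of the 3x3 grouped layer in two consecutive stages. It assumes a 224x224 input (so 56x56 and 28x28 feature maps) and the widths used by `resnext_50_deploy.py` in this repo (128 and 256 channels, `group = 32`); the helper name `conv_multiply_adds` is hypothetical, and the stride-2 first block of each stage is ignored.

```python
def conv_multiply_adds(height, width, c_in, c_out, kernel=3, groups=1):
    """Multiply-adds of a kernel x kernel convolution with stride 1 and 'same' padding.

    Every output channel at every spatial position sums over
    kernel * kernel * (c_in // groups) input values.
    """
    return height * width * c_out * kernel * kernel * (c_in // groups)


# Halving the spatial size divides the positions by 4; doubling the width
# multiplies c_in * c_out by 4, so the two effects cancel.
stage2 = conv_multiply_adds(56, 56, 128, 128, groups=32)
stage3 = conv_multiply_adds(28, 28, 256, 256, groups=32)
print(stage2, stage3, stage2 == stage3)
```

The same cancellation holds for any kernel size and any number of groups, which is why the template only needs one width per stage.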
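57 | #### 3.2 Revisiting Simple Neurons
58 | The simplest artificial neurons perform an inner product (weighted sum), the elementary transformation done by fully-connected and convolutional layers. The inner product can be viewed as an aggregating transformation: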
59 | ![](./data/formula_1.png)
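60 | As shown in Figure 2, the inner product can be recast as splitting (x is sliced into its components x_i), transforming (each x_i is scaled to w_i x_i), and aggregating (the scaled components are summed).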
61 | ![](./data/figure_2.png)
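A minimal NumPy sketch of the three steps named above; the function name is hypothetical and the point is only that split, transform, and aggregate reproduce an ordinary dot product.

```python
import numpy as np


def inner_product_as_aggregation(x, w):
    # splitting: x is sliced into its scalar components x_i
    # transforming: each component is scaled, w_i * x_i
    # aggregating: the transformed components are summed
    transformed = [w_i * x_i for w_i, x_i in zip(w, x)]
    return sum(transformed)


x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -1.0, 2.0])
print(inner_product_as_aggregation(x, w), np.dot(w, x))  # 4.5 4.5
```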
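62 | #### 3.3 Aggregated Transformations
63 | Replacing the elementary transformation (w_i x_i) of the inner product with a more generic function, which can itself be a network, turns the aggregation into: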
64 | ![](./data/formula_2.png)
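65 | In Eq. (2), C is the size of the set of transformations to be aggregated; we call it cardinality. C can take an arbitrary value, and it controls the scale of the more complex transformation.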
66 | In this paper all transformations T_i share the same topology (Figure 1, right).
67 | The aggregated transformation of Eq. (2) then serves as the residual function:
68 | ![](./data/formula_3.png)
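A NumPy sketch of Eq. (3), y = x + sum_i T_i(x), under simplifying assumptions: each branch T_i is only a 1x1 reduce, ReLU, 1x1 expand acting on the channel vector of a single pixel (the 3x3 layer of the real block is left out), and the weights are random. `make_branch` and `aggregated_residual` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_branch(channels, width):
    """Random weights for one branch T_i: reduce to `width`, ReLU, expand back to `channels`."""
    w_reduce = rng.standard_normal((width, channels)) * 0.1
    w_expand = rng.standard_normal((channels, width)) * 0.1
    return w_reduce, w_expand


def aggregated_residual(x, branches):
    # y = x + sum_i T_i(x): every branch sees the same input and has the same topology
    y = x.copy()
    for w_reduce, w_expand in branches:
        y += w_expand @ np.maximum(w_reduce @ x, 0.0)
    return y


channels, cardinality, width = 256, 32, 4   # the C = 32, d = 4 setting of Sec. 3.4
branches = [make_branch(channels, width) for _ in range(cardinality)]
x = rng.standard_normal(channels)
y = aggregated_residual(x, branches)
print(y.shape)  # (256,)
```

Because all branches share one topology, C is the only extra knob; the per-branch width d is chosen afterwards to keep the parameter budget fixed.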
69 | **Relation to Inception-ResNet**
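70 | The ResNeXt module (Figure 3(a)) is equivalent to Figure 3(b), which resembles the Inception-ResNet block in that it branches and concatenates inside the residual function. Unlike Inception-ResNet modules, however, all paths of the ResNeXt module share the same topology.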
71 | **Relation to Grouped Convolutions**
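72 | With grouped convolutions the module becomes even more succinct (Figure 3(c)). All the low-dimensional embeddings (the first 1x1 layers) can be replaced by a single, wider layer, because a grouped convolution
73 | splits its input channels into groups, convolves each group separately, and concatenates the results as its output. The block then looks very similar to the original bottleneck block of ResNet (Figure 1, left),
74 | except that it is wider and sparsely connected.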
75 | ![](./data/figure_3.png)
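The equivalence between Figure 3(b) and 3(c) can be illustrated in NumPy. The sketch uses 1x1 transforms on one pixel (a 3x3 grouped convolution has the same block structure, just with 3x3 kernels inside each block) and shows that splitting into C groups, transforming each, and concatenating equals one dense layer with a block-diagonal weight, i.e. the sparse connectivity that `group = 32` gives the 3x3 layer in this repo's ResNeXt scripts. Variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
groups, width = 32, 4                      # C = 32 groups of d = 4 channels
x = rng.standard_normal(groups * width)    # output of the single wide 1x1 layer (128 channels)
group_weights = [rng.standard_normal((width, width)) for _ in range(groups)]

# (a) split - transform - concatenate: each group only sees its own 4 channels
parts = np.split(x, groups)
split_transform_concat = np.concatenate([w @ p for w, p in zip(group_weights, parts)])

# (b) one dense layer whose weight is block-diagonal, i.e. a grouped layer
block_diag = np.zeros((groups * width, groups * width))
for g, w in enumerate(group_weights):
    block_diag[g * width:(g + 1) * width, g * width:(g + 1) * width] = w
grouped = block_diag @ x

print(np.allclose(split_transform_concat, grouped))  # True
# weights actually used: groups * width * width, versus (groups * width) ** 2 for a dense layer
print(np.count_nonzero(block_diag), (groups * width) ** 2)
```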
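76 | Note that these reformulations yield nontrivial topologies only when the block has depth ≥ 3. A block of depth 2 (Figure 4) reformulates trivially into a wide, dense module.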
77 | ![](./data/figure_4.png)
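78 | Note also that the branch transformations need not share the topology shown in Figure 3; they can be arbitrary, heterogeneous functions. We choose homogeneous forms because they are simpler
79 | and more extensible, and because ResNeXt can then be implemented easily with grouped convolutions, as in Figure 3(c).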
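80 | #### 3.4 Model Capacity
81 | ResNeXt improves accuracy while keeping model complexity and the number of parameters unchanged. Complexity and parameter count reflect the inherent capacity of a model and are commonly used when studying deep networks.
82 | To evaluate different cardinalities C at the same complexity while changing as few other hyper-parameters as possible, we adjust the width (number of channels) of the bottleneck (the 3x3 layer)
83 | to compensate for the change in cardinality, because this width is isolated from the input and output of the block. No other hyper-parameter (block depth, input/output width, etc.) needs to change.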
84 | ![](./data/figure_1.png)
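85 | In Figure 1 (left), the original ResNet bottleneck block has 256\*64 + 3\*3\*64\*64 + 64\*256 ≈ 70k parameters. With cardinality C and bottleneck width d (Figure 1, right),
86 | the template block has C\*(256\*d + 3\*3\*d\*d + d\*256) parameters: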
87 | ![](./data/formula_4.png)
88 | With C=32 and d=4, Eq. (4) gives ≈70k parameters, about the same as the original block. Table 2 shows the relationship between C and d.
89 | ![](./data/table_2.png)
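The parameter counts above can be reproduced directly, and the same formula can be used to pick d for each C. The sketch searches for the integer width whose Eq. (4) count is closest to the ResNet block's; the helper names are hypothetical, biases are ignored, and the printed widths are meant to be compared with Table 2 rather than taken as the paper's exact procedure.

```python
def resnet_block_params():
    # 1x1, 256->64  +  3x3, 64->64  +  1x1, 64->256  (biases ignored)
    return 256 * 64 + 3 * 3 * 64 * 64 + 64 * 256


def resnext_block_params(cardinality, width):
    # Eq. (4): C * (256*d + 3*3*d*d + d*256)
    return cardinality * (256 * width + 3 * 3 * width * width + width * 256)


def closest_width(cardinality, target=None):
    # integer width d whose parameter count is closest to the target (default: the ResNet block)
    target = resnet_block_params() if target is None else target
    return min(range(1, 257),
               key=lambda d: abs(resnext_block_params(cardinality, d) - target))


print(resnet_block_params())        # 69632
print(resnext_block_params(32, 4))  # 70144
for c in (1, 2, 4, 8, 32):
    print(c, closest_width(c))
```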
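90 | Table 1 compares ResNet-50 and ResNeXt-50, which have similar complexity. The complexities are only approximately equal, but the difference is small and does not bias the results.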
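91 | 
92 | ### 4. Implementation details
93 | To increase dimensions (when the spatial size is reduced) we use option B of ResNet (projection shortcuts), except that the stride-2 downsampling is performed by the 3x3 convolution of the first block in each stage rather than by a 1x1 layer. Our implementation takes the form of Figure 3(c),
94 | and the arrangement of weight layers, BN, and ReLU inside a block follows the original ResNet rather than ResNet-v2. The three forms in Figure 3 are strictly equivalent:
95 | we trained all three and obtained the same results. We chose Figure 3(c) because it is more succinct and runs faster than the other two.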
96 | 
97 | ### 5. Experiments
98 | #### 5.1 Experiments on ImageNet-1K
99 | **Cardinality vs. Width**  
100 | We first evaluate the effect of cardinality under preserved complexity. Results are in Table 3 and training curves in Figure 5.
101 | ![](./data/table_3.png)
102 | ![](./data/figure_5.png)
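103 | With complexity preserved, the error keeps decreasing as cardinality increases. ResNeXt also has lower training error than ResNet, which suggests that the gains come from stronger representations rather than from regularization.
104 | Table 3 also shows that accuracy saturates once the bottleneck width becomes small, so we adopt a bottleneck width no smaller than 4d. 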
105 | **Increasing Cardinality vs. Deeper/Wider**  
106 | We then compare the gains from increasing depth, width, or cardinality; see Table 4.
107 | ![](./data/table_4.png)
108 | Table 4 shows that increasing cardinality is a more effective way to gain accuracy than going deeper or wider.
109 | **Performance**
110 | Torch's implementation of grouped convolutions is not well optimized, so the runtime overhead is comparatively large.
111 | **Comparisons with state-of-the-art results**
112 | Table 5 compares ResNeXt with previous state-of-the-art models.
113 | ![](./data/table_5.png)
114 | #### 5.2 Experiments on ImageNet-5K
115 | ![](./data/table_6.png)
116 | ![](./data/figure_6.png)
117 | #### 5.3 Experiments on CIFAR
118 | ![](./data/table_7.png)
119 | ![](./data/figure_7.png)
120 | #### 5.4 Experiments on COCO object detection
121 | ![](./data/table_8.png)
122 | -------------------------------------------------------------------------------- /ResNeXt/data/figure_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNeXt/data/figure_1.png -------------------------------------------------------------------------------- /ResNeXt/data/figure_2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNeXt/data/figure_2.png -------------------------------------------------------------------------------- /ResNeXt/data/figure_3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNeXt/data/figure_3.png -------------------------------------------------------------------------------- /ResNeXt/data/figure_4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNeXt/data/figure_4.png -------------------------------------------------------------------------------- /ResNeXt/data/figure_5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNeXt/data/figure_5.png -------------------------------------------------------------------------------- /ResNeXt/data/figure_6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNeXt/data/figure_6.png -------------------------------------------------------------------------------- 
/ResNeXt/data/figure_7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNeXt/data/figure_7.png -------------------------------------------------------------------------------- /ResNeXt/data/formula_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNeXt/data/formula_1.png -------------------------------------------------------------------------------- /ResNeXt/data/formula_2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNeXt/data/formula_2.png -------------------------------------------------------------------------------- /ResNeXt/data/formula_3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNeXt/data/formula_3.png -------------------------------------------------------------------------------- /ResNeXt/data/formula_4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNeXt/data/formula_4.png -------------------------------------------------------------------------------- /ResNeXt/data/table_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNeXt/data/table_1.png -------------------------------------------------------------------------------- /ResNeXt/data/table_2.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNeXt/data/table_2.png -------------------------------------------------------------------------------- /ResNeXt/data/table_3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNeXt/data/table_3.png -------------------------------------------------------------------------------- /ResNeXt/data/table_4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNeXt/data/table_4.png -------------------------------------------------------------------------------- /ResNeXt/data/table_5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNeXt/data/table_5.png -------------------------------------------------------------------------------- /ResNeXt/data/table_6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNeXt/data/table_6.png -------------------------------------------------------------------------------- /ResNeXt/data/table_7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNeXt/data/table_7.png -------------------------------------------------------------------------------- /ResNeXt/data/table_8.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNeXt/data/table_8.png
--------------------------------------------------------------------------------
/ResNeXt/resnext_101_deploy.py:
--------------------------------------------------------------------------------
1 | import os
2 | import numpy as np
3 | 
4 | import caffe
5 | from caffe import layers as L, params as P
6 | from caffe import to_proto
7 | 
8 | # first block
9 | # Convolution - BatchNorm - Scale - ReLU
10 | def _block_first(net, bottom, nout=64, pad=3, ks=7, stride=2):
11 |     net.conv1 = L.Convolution(bottom,
12 |                               num_output = nout, pad = pad,
13 |                               kernel_size = ks, stride = stride,
14 |                               bias_term = False)
15 |     net.bn_conv1 = L.BatchNorm(net.conv1, in_place = True)
16 |     net.scale_conv1 = L.Scale(net.bn_conv1, bias_term = True, in_place = True)
17 |     net.conv1_relu = L.ReLU(net.scale_conv1, in_place = True)
18 | 
19 |     return net.conv1_relu
20 | 
21 | 
22 | # 3(layer) in 1(block)
23 | # Convolution - BatchNorm - Scale
24 | def _block_3in1(major, minor, net, bottom, nout, pad, ks, stride):
25 |     branch_flag = '{}_branch{}'.format(major, minor)
26 |     conv_layer = 'res{}'.format(branch_flag)
27 |     bn_layer = 'bn{}'.format(branch_flag)
28 |     scale_layer = 'scale{}'.format(branch_flag)
29 | 
30 |     net[conv_layer] = L.Convolution(bottom,
31 |                                     num_output = nout, pad = pad,
32 |                                     kernel_size = ks, stride = stride,
33 |                                     bias_term = False)
34 |     net[bn_layer] = L.BatchNorm(net[conv_layer], in_place = True)
35 |     net[scale_layer] = L.Scale(net[bn_layer], bias_term = True, in_place = True)
36 | 
37 |     return net[scale_layer]
38 | 
39 | 
40 | # 4(layer) in 1(block)
41 | # Convolution - BatchNorm - Scale - ReLU
42 | def _block_4in1(major, minor, net, bottom, nout, pad, ks, stride):
43 |     branch_flag = '{}_branch{}'.format(major, minor)
44 |     conv_layer = 'res{}'.format(branch_flag)
45 |     bn_layer = 'bn{}'.format(branch_flag)
46 |     scale_layer = 'scale{}'.format(branch_flag)
47 |     relu_layer = 'res{}_relu'.format(branch_flag)
48 | 
49 |     if ks == 3: # bottleneck layer, grouped convolutions
50 |         net[conv_layer] = L.Convolution(bottom,
51 |                                         num_output = nout, pad = pad,
52 |                                         kernel_size = ks, stride = stride,
53 |                                         bias_term = False,
54 |                                         group = 32) # Cardinality
55 |     else:
56 |         net[conv_layer] = L.Convolution(bottom,
57 |                                         num_output = nout, pad = pad,
58 |                                         kernel_size = ks, stride = stride,
59 |                                         bias_term = False)
60 |     net[bn_layer] = L.BatchNorm(net[conv_layer], in_place = True)
61 |     net[scale_layer] = L.Scale(net[bn_layer], bias_term = True, in_place = True)
62 |     net[relu_layer] = L.ReLU(net[scale_layer], in_place = True)
63 | 
64 |     return net[relu_layer]
65 | 
66 | 
67 | # branch
68 | #           [3in1]   \
69 | #                     | - branch
70 | # 4in1 - 4in1 - 3in1 /
71 | def _branch(major, net, bottom, nout, has_branch1=False, is_branch_2a=False):
72 |     eltwise_layer = 'res{}'.format(major)
73 |     relu_layer = 'res{}_relu'.format(major)
74 | 
75 |     stride = 1
76 |     if has_branch1 and not is_branch_2a:
77 |         stride = 2
78 | 
79 |     branch2_2a = _block_4in1(major, '2a', net, bottom, nout, 0, 1, 1)
80 |     branch2_2b = _block_4in1(major, '2b', net, branch2_2a, nout, 1, 3, stride)
81 |     branch2_2c = _block_3in1(major, '2c', net, branch2_2b, nout*2, 0, 1, 1)
82 | 
83 |     if has_branch1:
84 |         branch1 = _block_3in1(major, '1', net, bottom, nout*2, 0, 1, stride)
85 |         net[eltwise_layer] = L.Eltwise(branch1, branch2_2c)
86 |     else:
87 |         net[eltwise_layer] = L.Eltwise(bottom, branch2_2c)
88 | 
89 |     net[relu_layer] = L.ReLU(net[eltwise_layer], in_place = True)
90 | 
91 |     return net[relu_layer]
92 | 
93 | 
94 | def construc_net():
95 |     net = caffe.NetSpec()
96 | 
97 |     net.data = L.Input(input_param = dict(shape = dict(dim = [1,3,224,224])))
98 | 
99 |     block1 = _block_first(net, net.data)
100 | 
101 |     net.pool1 = L.Pooling(block1, pool = P.Pooling.MAX, kernel_size = 3, stride = 2)
102 | 
103 |     branch_2a = _branch('2a', net, net.pool1, 128, has_branch1 = True, is_branch_2a = True)
104 |     branch_2b = _branch('2b', net, branch_2a, 128)
105 |     branch_2c = _branch('2c', net, branch_2b, 128)
106 | 
107 |     branch_3a = _branch('3a', net, branch_2c, 256, has_branch1 = True)
108 |     branch_3b = _branch('3b', net, branch_3a, 256)
109 |     branch_3c = _branch('3c', net, branch_3b, 256)
110 |     branch_3d = _branch('3d', net, branch_3c, 256)
111 | 
112 |     branch_4a = _branch('4a', net, branch_3d, 512, has_branch1 = True)
113 |     branch_pre = branch_4a
114 |     for idx in range(1, 23):  # conv4_x: 1 + 22 = 23 blocks in total (range works in Python 2 and 3)
115 |         flag = '4b{}'.format(idx)
116 |         branch_pre = _branch(flag, net, branch_pre, 512)
117 | 
118 |     branch_5a = _branch('5a', net, branch_pre, 1024, has_branch1 = True)
119 |     branch_5b = _branch('5b', net, branch_5a, 1024)
120 |     branch_5c = _branch('5c', net, branch_5b, 1024)
121 | 
122 |     net.pool5 = L.Pooling(branch_5c, pool = P.Pooling.AVE, kernel_size = 7, stride = 1)
123 | 
124 |     net.fc6 = L.InnerProduct(net.pool5, num_output = 1000)
125 |     net.prob = L.Softmax(net.fc6)
126 | 
127 |     return net.to_proto()
128 | 
129 | 
130 | def main():
131 |     with open('resnext_101_deploy.prototxt', 'w') as f:
132 |         f.write('name: "ResNeXt-101_deploy"\n')
133 |         f.write(str(construc_net()))
134 | 
135 | if __name__ == '__main__':
136 |     main()
137 | 
--------------------------------------------------------------------------------
/ResNeXt/resnext_50_deploy.py:
--------------------------------------------------------------------------------
1 | import os
2 | import numpy as np
3 | 
4 | import caffe
5 | from caffe import layers as L, params as P
6 | from caffe import to_proto
7 | 
8 | # first block
9 | # Convolution - BatchNorm - Scale - ReLU
10 | def _block_first(net, bottom, nout=64, pad=3, ks=7, stride=2):
11 |     net.conv1 = L.Convolution(bottom,
12 |                               num_output = nout, pad = pad,
13 |                               kernel_size = ks, stride = stride,
14 |                               bias_term = False)
15 |     net.bn_conv1 = L.BatchNorm(net.conv1, in_place = True)
16 |     net.scale_conv1 = L.Scale(net.bn_conv1, bias_term = True, in_place = True)
17 |     net.conv1_relu = L.ReLU(net.scale_conv1, in_place = True)
18 | 
19 |     return net.conv1_relu
20 | 
21 | 
22 | # 3(layer) in 1(block)
23 | # Convolution - BatchNorm - Scale
24 | def _block_3in1(major, minor, net, bottom, nout, pad, ks, stride):
25 |     branch_flag = '{}_branch{}'.format(major, minor)
26 |     conv_layer = 'res{}'.format(branch_flag)
27 |     bn_layer = 'bn{}'.format(branch_flag)
28 |     scale_layer = 'scale{}'.format(branch_flag)
29 | 
30 |     net[conv_layer] = L.Convolution(bottom,
31 |                                     num_output = nout, pad = pad,
32 |                                     kernel_size = ks, stride = stride,
33 |                                     bias_term = False)
34 |     net[bn_layer] = L.BatchNorm(net[conv_layer], in_place = True)
35 |     net[scale_layer] = L.Scale(net[bn_layer], bias_term = True, in_place = True)
36 | 
37 |     return net[scale_layer]
38 | 
39 | 
40 | # 4(layer) in 1(block)
41 | # Convolution - BatchNorm - Scale - ReLU
42 | def _block_4in1(major, minor, net, bottom, nout, pad, ks, stride):
43 |     branch_flag = '{}_branch{}'.format(major, minor)
44 |     conv_layer = 'res{}'.format(branch_flag)
45 |     bn_layer = 'bn{}'.format(branch_flag)
46 |     scale_layer = 'scale{}'.format(branch_flag)
47 |     relu_layer = 'res{}_relu'.format(branch_flag)
48 | 
49 |     if ks == 3: # bottleneck layer, grouped convolutions
50 |         net[conv_layer] = L.Convolution(bottom,
51 |                                         num_output = nout, pad = pad,
52 |                                         kernel_size = ks, stride = stride,
53 |                                         bias_term = False,
54 |                                         group = 32) # Cardinality
55 |     else:
56 |         net[conv_layer] = L.Convolution(bottom,
57 |                                         num_output = nout, pad = pad,
58 |                                         kernel_size = ks, stride = stride,
59 |                                         bias_term = False)
60 |     net[bn_layer] = L.BatchNorm(net[conv_layer], in_place = True)
61 |     net[scale_layer] = L.Scale(net[bn_layer], bias_term = True, in_place = True)
62 |     net[relu_layer] = L.ReLU(net[scale_layer], in_place = True)
63 | 
64 |     return net[relu_layer]
65 | 
66 | 
67 | # branch
68 | #           [3in1]   \
69 | #                     | - branch
70 | # 4in1 - 4in1 - 3in1 /
71 | def _branch(major, net, bottom, nout, has_branch1=False, is_branch_2a=False):
72 |     eltwise_layer = 'res{}'.format(major)
73 |     relu_layer = 'res{}_relu'.format(major)
74 | 
75 |     stride = 1
76 |     if has_branch1 and not is_branch_2a:
77 |         stride = 2
78 | 
79 |     branch2_2a = _block_4in1(major, '2a', net, bottom, nout, 0, 1, 1)
80 |     branch2_2b = _block_4in1(major, '2b', net, branch2_2a, nout, 1, 3, stride)
81 |     branch2_2c = _block_3in1(major, '2c', net, branch2_2b, nout*2, 0, 1, 1)
82 | 
83 |     if has_branch1:
84 |         branch1 = _block_3in1(major, '1', net, bottom, nout*2, 0, 1, stride)
85 |         net[eltwise_layer] = L.Eltwise(branch1, branch2_2c)
86 |     else:
87 |         net[eltwise_layer] = L.Eltwise(bottom, branch2_2c)
88 | 
89 |     net[relu_layer] = L.ReLU(net[eltwise_layer], in_place = True)
90 | 
91 |     return net[relu_layer]
92 | 
93 | 
94 | def construc_net():
95 |     net = caffe.NetSpec()
96 | 
97 |     net.data = L.Input(input_param = dict(shape = dict(dim = [1,3,224,224])))
98 | 
99 |     block1 = _block_first(net, net.data)
100 | 
101 |     net.pool1 = L.Pooling(block1, pool = P.Pooling.MAX, kernel_size = 3, stride = 2)
102 | 
103 |     branch_2a = _branch('2a', net, net.pool1, 128, has_branch1 = True, is_branch_2a = True)
104 |     branch_2b = _branch('2b', net, branch_2a, 128)
105 |     branch_2c = _branch('2c', net, branch_2b, 128)
106 | 
107 |     branch_3a = _branch('3a', net, branch_2c, 256, has_branch1 = True)
108 |     branch_3b = _branch('3b', net, branch_3a, 256)
109 |     branch_3c = _branch('3c', net, branch_3b, 256)
110 |     branch_3d = _branch('3d', net, branch_3c, 256)
111 | 
112 |     branch_4a = _branch('4a', net, branch_3d, 512, has_branch1 = True)
113 |     branch_4b = _branch('4b', net, branch_4a, 512)
114 |     branch_4c = _branch('4c', net, branch_4b, 512)
115 |     branch_4d = _branch('4d', net, branch_4c, 512)
116 |     branch_4e = _branch('4e', net, branch_4d, 512)
117 |     branch_4f = _branch('4f', net, branch_4e, 512)
118 | 
119 |     branch_5a = _branch('5a', net, branch_4f, 1024, has_branch1 = True)
120 |     branch_5b = _branch('5b', net, branch_5a, 1024)
121 |     branch_5c = _branch('5c', net, branch_5b, 1024)
122 | 
123 |     net.pool5 = L.Pooling(branch_5c, pool = P.Pooling.AVE, kernel_size = 7, stride = 1)
124 | 
125 |     net.fc6 = L.InnerProduct(net.pool5, num_output = 1000)
126 |     net.prob = L.Softmax(net.fc6)
127 | 
128 |     return net.to_proto()
129 | 
130 | 
131 | def main():
132 |     with open('resnext_50_deploy.prototxt', 'w') as f:
133 |         f.write('name: "ResNeXt-50_deploy"\n')
134 |         f.write(str(construc_net()))
135 | 
136 | if __name__ == '__main__':
137 |     main()
138 | 
--------------------------------------------------------------------------------
/ResNet-v2/README.md:
--------------------------------------------------------------------------------
1 | # ResNet-v2
2 | [Identity Mappings in Deep Residual Networks](https://arxiv.org/abs/1603.05027)
3 | Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
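4 | 
5 | ### Abstract
6 | Deep residual networks have emerged as a family of extremely deep architectures with compelling accuracy and good convergence behavior.
7 | This paper analyzes the propagation of signals through the residual building blocks. We find that when identity mappings are used both as the skip connections
8 | and as the after-addition activation, forward and backward signals can propagate directly from one block to any other block without passing through any transformation. A series of experiments supports the importance of identity mappings.
9 | Motivated by this, we propose a new residual unit that makes training easier and improves generalization. Official implementation (Torch):
10 | https://github.com/KaimingHe/resnet-1k-layers .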
11 | 
12 | ### 1. Introduction
13 | Deep residual networks (ResNets) consist of many stacked "Residual Units". Each unit can be expressed as:
14 | ![](./data/formula_1.png)
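15 | where x_l and x_{l+1} are the input and output of the l-th unit, F is the residual function, h(x_l) = x_l is an identity mapping in ResNet, and f is the ReLU activation.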
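16 | ResNets more than 100 layers deep have achieved state-of-the-art accuracy on several challenging recognition tasks on ImageNet and MS COCO. The **central idea** of ResNets is to learn the additive residual function F with respect to h(x_l),
17 | with the key choice of an identity mapping h(x_l) = x_l, realized by attaching an identity skip connection (shortcut).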
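18 | This paper analyzes deep residual networks by focusing on creating a "direct" path for propagating information, not only within a residual unit but through the entire network.
19 | If both h(x_l) and f(y_l) are identity mappings, the signal can propagate **directly** from one unit to any other unit in both the forward and backward passes. Experiments show that training generally becomes easier the closer the architecture is to these two conditions.
20 | We experiment with various forms of h(x_l) and find that the identity mapping performs best, with the fastest error reduction and the lowest training loss. These experiments suggest that keeping a "clean" information path
21 | (the grey arrows in Figures 1, 2, and 4) helps optimization.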
22 | ![](./data/figure_1.png)
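23 | To make f(y_l) = y_l an identity mapping, we move the activation functions (ReLU and BN) before the weight layers, a **"pre-activation"** arrangement instead of the conventional
24 | "post-activation", which yields a new residual unit (Figure 1(b)). With this unit, a 1001-layer ResNet trained on CIFAR-10/100
25 | is easier to train and generalizes better than the original ResNet. We also evaluate a 200-layer ResNet with the new unit on ImageNet;
26 | at this depth the original ResNet starts to overfit.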
27 | 
28 | ### 2. Analysis of Deep Residual Networks
29 | The residual unit of the original ResNet is:
30 | ![](./data/formula_1.png)
31 | If both h and f are identity mappings, Eqs. (1) and (2) combine into:
32 | ![](./data/formula_3.png)
33 | Applying Eq. (3) recursively, the feature of any deeper unit L relates to that of any shallower unit l as:
34 | ![](./data/formula_4.png)
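35 | Eq. (4) has two properties: (i) the feature x_L of any deeper unit L equals the feature x_l of any shallower unit l plus a sum of residual functions; (ii) the feature of any deep unit equals the input x_0 plus the outputs of all preceding residual functions.
36 | This differs from a plain network, whose deep features are the result of a series of matrix-vector products (ignoring BN and ReLU). **A residual network sums; a plain network multiplies.**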
37 | For back-propagation, with E denoting the loss, the chain rule gives:
38 | ![](./data/formula_5.png)
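39 | Eq. (5) shows that the gradient also has two additive terms: one propagates information directly back to unit l without passing through any weight layer, and the other passes through the weight layers. Moreover, the second term is unlikely to be exactly -1 for all samples in a
40 | mini-batch, so the gradient of a layer does not vanish even when the weights are arbitrarily small.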
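For reference, a LaTeX transcription of Eqs. (3) to (5) using the symbols of the text (E is the loss). This is our reading of the paper, so please check it against formula_3.png to formula_5.png.

```latex
x_{l+1} = x_l + \mathcal{F}(x_l, \mathcal{W}_l) \qquad (3)

x_L = x_l + \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i) \qquad (4)

\frac{\partial \mathcal{E}}{\partial x_l}
  = \frac{\partial \mathcal{E}}{\partial x_L} \frac{\partial x_L}{\partial x_l}
  = \frac{\partial \mathcal{E}}{\partial x_L}
    \left( 1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i) \right) \qquad (5)
```

Written this way, the gradient at unit l can only vanish if the derivative of the sum is exactly -1, which is the case the paragraph above argues is unlikely across a whole mini-batch.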
41 | 
42 | ### 3. On the Importance of Identity Skip Connections
43 | We first examine the importance of the identity mapping. Consider a simple modification of the shortcut to h(x_l) = λ_l x_l:
44 | ![](./data/formula_6.png)
45 | Applying Eq. (6) recursively, as when going from Eq. (3) to Eq. (4), gives:
46 | ![](./data/formula_7.png)
47 | and the corresponding back-propagation is:
48 | ![](./data/formula_8.png)
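49 | For an extremely deep network, consider the product of the λ_i in the first term of Eq. (8): if λ_i > 1 for all i, it grows exponentially; if λ_i < 1 for all i, it becomes very small or vanishes. Either way the shortcut no longer carries the signal directly,
50 | which forces it to flow through the weight layers. Experiments show that this makes optimization difficult.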
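51 | If the shortcut h(x_l) is a more complex transformation, such as gating or a 1×1 convolution, the first term of Eq. (8) becomes a product of the derivatives of h; this product can likewise impede information propagation and hamper training.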
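A quick numerical illustration of the product term discussed above, assuming for simplicity that every unit between l and L uses the same shortcut scalar λ and that there are 100 such units; `shortcut_gain` is a hypothetical helper.

```python
def shortcut_gain(lam, num_units):
    """Product of the shortcut scalars lambda_i between unit l and unit L.

    Assumes, for illustration only, that every unit uses the same lambda.
    """
    return lam ** num_units


for lam in (0.9, 1.0, 1.1):
    print(lam, shortcut_gain(lam, 100))
```

Only λ = 1, the identity shortcut, leaves the directly propagated term unchanged; a 10% deviation either way already shrinks it below 1e-4 or inflates it beyond 1e4 over 100 units.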
52 | #### 3.1 Experiments on Skip Connections
53 | We compare networks with the different shortcut types of Figure 2. Results are in Table 1, and training curves in Figure 3.
54 | ![](./data/figure_2.png)
55 | ![](./data/table_1.png)
56 | ![](./data/figure_3.png)
57 | With exclusive gating, the initialization of the gate bias b_g strongly affects performance.
58 | #### 3.2 Discussions
59 | Multiplicative manipulations on the shortcut (scaling, gating, 1×1 convolutions, and dropout) hamper information propagation and lead to optimization problems.
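60 | Notably, gating and 1×1 convolutional shortcuts introduce more parameters and should have stronger representational power than identity shortcuts, yet their training error is higher.
61 | This indicates that the degradation of these models is caused by optimization issues rather than by a lack of representational power.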
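62 | 
63 | ### 4. On the Usage of Activation Functions
64 | Section 3 showed the importance of h in Eq. (1) being an identity mapping. We now ask whether performance also improves when f in Eq. (2) is an identity mapping. Making f an identity mapping
65 | requires rearranging the ReLU, BN, and weight layers.
66 | #### 4.1 Experiments on Activation
67 | We examine the arrangements of Figure 4; Table 2 lists the results of the different activation schemes.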
68 | ![](./data/figure_4.png)
69 | ![](./data/table_2.png)
70 | **BN after addition**
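71 | Worse than the baseline: placing BN after the addition alters the signal on the shortcut and impedes information propagation, which shows up as slow reduction of the training error at the start of training.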
72 | **ReLU before addition**
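73 | The output of the residual branch is then always non-negative, so the signal propagated through each unit can only increase. This limits the representational power of the unit, and experimentally this arrangement is also worse than the baseline.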
74 | **Post-activation or pre-activation?**
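75 | In the original design, a ReLU follows the addition and therefore affects both paths of the next residual unit. We instead move it into the residual branch, leaving the shortcut path untouched.
76 | The rearrangement is illustrated in Figure 5.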
77 | ![](./data/figure_5.png)
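78 | Named after the position of the activation relative to the element-wise addition, the previous arrangement is called "post-activation" and the new one "pre-activation".
79 | Table 3 compares the original unit with the pre-activation unit. Pre-activation comes in two variants: ReLU-only pre-activation, or full pre-activation with both BN and ReLU placed before the weight layers;
80 | Table 2 shows that full pre-activation performs better.
81 | ![](./data/table_3.png)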
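The difference between the two arrangements can be sketched in NumPy. Assumptions: dense matrices stand in for the convolutions, BN uses batch statistics without learned scale/shift, and all function names are hypothetical. With the residual branch switched off (w2 = 0), the pre-activation unit passes its input through unchanged (f is the identity), while the post-activation unit clips negative values because of the ReLU after the addition.

```python
import numpy as np


def batch_norm(x, eps=1e-5):
    # normalize each channel over the batch axis; learned scale/shift omitted for brevity
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)


def relu(x):
    return np.maximum(x, 0.0)


def post_activation_unit(x, w1, w2):
    # original unit: weight -> BN -> ReLU -> weight -> BN, addition, then f = ReLU
    residual = batch_norm(relu(batch_norm(x @ w1)) @ w2)
    return relu(x + residual)


def pre_activation_unit(x, w1, w2):
    # full pre-activation: BN -> ReLU -> weight -> BN -> ReLU -> weight, addition, f = identity
    residual = relu(batch_norm(relu(batch_norm(x)) @ w1)) @ w2
    return x + residual


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))          # a batch of 8 feature vectors with 16 channels
w1 = rng.standard_normal((16, 16)) * 0.1
w2 = np.zeros((16, 16))                   # residual branch switched off to expose the shortcut path

print(np.allclose(pre_activation_unit(x, w1, w2), x))   # True: the identity path is untouched
print(np.allclose(post_activation_unit(x, w1, w2), x))  # False: the ReLU after the addition clips negatives
```

This only illustrates the signal path at a single unit; it says nothing about training dynamics, which the paper studies empirically.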
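82 | #### 4.2 Analysis
83 | Pre-activation has two advantages: 1) f becomes an identity mapping, which eases optimization; 2) using BN as pre-activation improves the regularization of the model.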
84 | **Ease of optimization**
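85 | This effect is most obvious when training the 1001-layer ResNet (Figure 1). With the original design, the training error decreases very slowly at the beginning, because f = ReLU truncates negative signals,
86 | so the network cannot approximate the desired function well; with pre-activation, f is an identity mapping and the signal propagates directly between any two units. Our 1001-layer network with pre-activation reduces the training loss very quickly
87 | and reaches the lowest error.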
88 | ![](./data/figure_1.png)
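89 | The truncation caused by f = ReLU has little impact on shallower ResNets (Figure 6, right). We believe that after some training the weights are adjusted so that the unit outputs are mostly non-negative,
90 | and f then no longer truncates the signal. Truncation, however, occurs frequently in networks with more than 1000 layers.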
91 | ![](./data/figure_6.png)
92 | **Reducing overfitting**
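93 | In Figure 6 (right), the pre-activation network has a slightly higher training error but a lower test error, which we attribute to the regularization effect of BN. In the original unit, although BN normalizes the signal,
94 | the result is soon added to the shortcut, so the merged signal is not normalized. In the pre-activation unit, the inputs to all weight layers are normalized.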
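
### 5. Results
Tables 4 and 5 show how the different networks perform on the different datasets. The deeper residual networks built from pre-activation units achieve the best results.
![](./data/table_4.png)
![](./data/table_5.png)
**Computational Cost** The computational complexity of the proposed models is linear in depth. On ImageNet, training a 200-layer ResNet on 8 GPUs takes about three weeks.
### 6. Conclusions
Identity shortcut connections and pre-activation are essential for signals to propagate smoothly through the network.

The paper's appendix describes the implementation details of the various networks.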
105 | -------------------------------------------------------------------------------- /ResNet-v2/data/figure_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet-v2/data/figure_1.png -------------------------------------------------------------------------------- /ResNet-v2/data/figure_2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet-v2/data/figure_2.png -------------------------------------------------------------------------------- /ResNet-v2/data/figure_3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet-v2/data/figure_3.png -------------------------------------------------------------------------------- /ResNet-v2/data/figure_4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet-v2/data/figure_4.png -------------------------------------------------------------------------------- /ResNet-v2/data/figure_5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet-v2/data/figure_5.png -------------------------------------------------------------------------------- /ResNet-v2/data/figure_6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet-v2/data/figure_6.png 
-------------------------------------------------------------------------------- /ResNet-v2/data/formula_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet-v2/data/formula_1.png -------------------------------------------------------------------------------- /ResNet-v2/data/formula_3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet-v2/data/formula_3.png -------------------------------------------------------------------------------- /ResNet-v2/data/formula_4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet-v2/data/formula_4.png -------------------------------------------------------------------------------- /ResNet-v2/data/formula_5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet-v2/data/formula_5.png -------------------------------------------------------------------------------- /ResNet-v2/data/formula_6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet-v2/data/formula_6.png -------------------------------------------------------------------------------- /ResNet-v2/data/formula_7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet-v2/data/formula_7.png 
-------------------------------------------------------------------------------- /ResNet-v2/data/formula_8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet-v2/data/formula_8.png -------------------------------------------------------------------------------- /ResNet-v2/data/table_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet-v2/data/table_1.png -------------------------------------------------------------------------------- /ResNet-v2/data/table_2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet-v2/data/table_2.png -------------------------------------------------------------------------------- /ResNet-v2/data/table_3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet-v2/data/table_3.png -------------------------------------------------------------------------------- /ResNet-v2/data/table_4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet-v2/data/table_4.png -------------------------------------------------------------------------------- /ResNet-v2/data/table_5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet-v2/data/table_5.png -------------------------------------------------------------------------------- 
/ResNet-v2/resnet_v2_deploy.py:
--------------------------------------------------------------------------------
from __future__ import division

import os
import argparse
import numpy as np
from six.moves import xrange

import caffe
from caffe import layers as L, params as P
from caffe import to_proto


# 4(layer) in 1(block)
# BatchNorm - Scale - ReLU - Convolution
def _block_4in1(major, minor, net, bottom, nout, pad, ks, stride):
    block_flag = 'block{}_branch{}'.format(major, minor)
    bn_layer = 'bn_{}'.format(block_flag)
    scale_layer = 'scale_{}'.format(block_flag)
    relu_layer = 'relu_{}'.format(block_flag)
    conv_layer = 'conv_{}'.format(block_flag)

    net[bn_layer] = L.BatchNorm(bottom)
    net[scale_layer] = L.Scale(net[bn_layer], bias_term = True, in_place = True)
    net[relu_layer] = L.ReLU(net[scale_layer], in_place = True)
    net[conv_layer] = L.Convolution(net[relu_layer],
                                    num_output = nout, pad = pad,
                                    kernel_size = ks, stride = stride,
                                    bias_term = False)

    return net[conv_layer]


# block (residual unit)
#  [4in1]              \    for increasing dimensions (decreasing spatial dimensions)
#                       | - block
#  4in1 - 4in1 - 4in1  /
def _block(major, net, bottom, nout, has_block1=False, increasing_dims=True):
    eltwise_layer = 'addition_block{}'.format(major)

    stride = 1
    if has_block1 and increasing_dims:
        stride = 2

    branch2a = _block_4in1(major, '2a', net, bottom, nout//4, 0, 1, stride)
    branch2b = _block_4in1(major, '2b', net, branch2a, nout//4, 1, 3, 1)
    branch2c = _block_4in1(major, '2c', net, branch2b, nout, 0, 1, 1)

    if has_block1:
        branch1 = _block_4in1(major, '1', net, bottom, nout, 0, 1, stride)
        net[eltwise_layer] = L.Eltwise(branch1, branch2c)
    else:
        net[eltwise_layer] = L.Eltwise(bottom, branch2c)

    return net[eltwise_layer]


def construc_net(num_block_per_stage):
    net = caffe.NetSpec()

    net.data = L.Input(input_param = dict(shape = dict(dim = [1,3,32,32])))

    net.conv1 = L.Convolution(net.data, num_output = 16,
                              kernel_size = 3, stride = 1, pad = 1,
                              bias_term = False)

    # stage 1
    block_pre = _block('2_1', net, net.conv1, 64, has_block1 = True, increasing_dims = False)
    for idx in xrange(2,num_block_per_stage+1,1):
        flag = '2_{}'.format(idx)
        block_pre = _block(flag, net, block_pre, 64)

    # stage 2
    block_pre = _block('3_1', net, block_pre, 128, has_block1 = True)
    for idx in xrange(2,num_block_per_stage+1,1):
        flag = '3_{}'.format(idx)
        block_pre = _block(flag, net, block_pre, 128)

    # stage 3
    block_pre = _block('4_1', net, block_pre, 256, has_block1 = True)
    for idx in xrange(2,num_block_per_stage+1,1):
        flag = '4_{}'.format(idx)
        block_pre = _block(flag, net, block_pre, 256)

    net.bn5 = L.BatchNorm(block_pre)
    net.scale5 = L.Scale(net.bn5, bias_term = True, in_place = True)
    net.relu5 = L.ReLU(net.scale5, in_place = True)
    net.pool5 = L.Pooling(net.relu5, pool = P.Pooling.AVE, kernel_size = 8, stride = 1)

    net.fc6 = L.InnerProduct(net.pool5, num_output = 10)
    net.prob = L.Softmax(net.fc6)

    return net.to_proto()


def main(args):
    num_block_per_stage = (args.depth - 2) // 9

    file_name = 'resnet_v2_{}_deploy.prototxt'.format(args.depth)
    net_name = 'name: "ResNet-v2-{}_deploy"\n'.format(args.depth)

    with open(file_name, 'w') as f:
        f.write(net_name)
        f.write(str(construc_net(num_block_per_stage)))


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('depth', type = int,
                        help = 'depth should be 9n+2 (e.g., 164 or 1001 in the paper)')
    args = parser.parse_args()

    main(args)

--------------------------------------------------------------------------------
/ResNet/README.md:
--------------------------------------------------------------------------------
# ResNet
[Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
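
### Abstract
Deeper neural networks are harder to train. This paper presents a residual learning framework that makes it easy to train networks far deeper than those used before. The layers are reformulated to learn residual functions with reference to the layer inputs, rather than unreferenced functions. Extensive experiments show that these residual networks are easy to optimize and gain accuracy as depth increases considerably. On ImageNet, a 152-layer residual network is evaluated. It is 8× deeper than VGG nets yet still has lower complexity. An ensemble of these residual networks achieves 3.57% top-5 error on the ImageNet test set, which won 1st place in the ILSVRC 2015 classification task. The paper also analyzes 100-layer and 1000-layer residual networks on CIFAR-10.
The depth of representations is of central importance for many visual recognition tasks. Solely because of these extremely deep representations, the authors obtain a 28% relative improvement on the COCO object detection dataset. Deep residual networks are the foundation of their submissions to the ILSVRC 2015 and COCO 2015 competitions, where they also won 1st place in ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.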
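
### 1. Introduction
Deep networks naturally combine low/mid/high-level features and classifiers in an end-to-end multi-layer fashion. The "levels" of features can be enriched by stacking more layers, that is, by increasing depth. Recent evidence shows that network depth is crucial to performance.
This raises a question: **is learning better networks as easy as stacking more layers?** The notorious problem of vanishing/exploding gradients, which hampers convergence, stood in the way of answering it. Techniques such as normalized initialization and intermediate normalization layers (BN) have largely addressed this problem. Networks with tens of layers can now be trained to convergence with stochastic gradient descent (SGD) and backpropagation.
However, another obstacle then appears: the degradation problem. As network depth increases, accuracy saturates and then degrades rapidly. Unexpectedly, this degradation is **not caused by overfitting**. Adding more layers to a suitably deep model leads to **higher training error**, whereas overfitting would lower the training error while raising the test error. Figure 1 shows a typical example.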
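![](./data/figure_1.png)
The degradation in training accuracy indicates that not all systems are equally easy to optimize. Consider a shallower architecture and a deeper counterpart that adds more layers on top of it. A solution for the deeper model exists by construction: the added layers are identity mappings, and the other layers are copied from the learned shallower model. This construction shows that the deeper model **should not** have higher training error than its shallower counterpart. Yet experiments show that current solvers cannot find solutions as good as, or better than, this constructed one.
To address the degradation problem, this paper introduces a **deep residual learning** framework. Instead of hoping that a few stacked layers directly fit a desired underlying mapping, these layers are made to fit a residual mapping explicitly. Denote the desired underlying mapping as H(x). The stacked layers then fit the residual mapping F(x) = H(x) - x, and the original mapping becomes H(x) = F(x) + x. The hypothesis is that the residual mapping is easier to optimize than the original, unreferenced mapping. Consider the extreme case where an identity mapping is optimal. Pushing the residual to zero is then much easier than fitting an identity mapping with a stack of nonlinear layers.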
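The formulation F(x) + x can be realized in a feedforward neural network with "shortcut connections", that is, connections that skip one or more layers (see Figure 2). In this paper, the shortcut connections simply perform identity mapping, and their outputs are added to the outputs of the stacked layers (see Figure 2). Identity shortcuts add neither extra parameters nor computational complexity. The modified network can still be trained end-to-end by SGD with backpropagation, and it is easy to build with existing deep learning frameworks such as Caffe.
![](./data/figure_2.png)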
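The following is a minimal NumPy sketch of this idea. It makes two simplifications: dense matrices stand in for convolutions, and BN is omitted. With all weights at zero, the residual block reduces to the identity on non-negative inputs. The plain block with the same weights outputs zeros, and it would need carefully tuned weights to represent an identity mapping. The function names are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def plain_block(x, w1, w2):
    # Two stacked layers that must fit H(x) directly.
    return relu(w2 @ relu(w1 @ x))


def residual_block(x, w1, w2):
    # The same layers fit F(x); the identity shortcut adds x back: y = relu(F(x) + x).
    residual = w2 @ relu(w1 @ x)
    return relu(residual + x)


x = np.array([0.0, 1.5, 2.0])           # non-negative, e.g. the output of a previous ReLU
zeros = np.zeros((3, 3))
print(residual_block(x, zeros, zeros))  # equals x: zero weights give the identity
print(plain_block(x, zeros, zeros))     # all zeros: the plain block collapses
```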
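Experiments show two things. 1) Extremely deep residual networks remain easy to optimize, whereas the corresponding "plain" networks, which simply stack layers, show higher and higher training error as depth grows. 2) Deep residual networks gain accuracy from greatly increased depth and produce results substantially better than previous networks.

### 2. Related Work
**Residual Representations**
In vector quantization, encoding residual vectors is more effective than encoding the original vectors.
The Multigrid method for solving Partial Differential Equations (PDEs) suggests that a suitable reformulation or preconditioning can simplify optimization.
**Shortcut Connections**
Shortcut connections appear in many architectures, either to address vanishing/exploding gradients or to strengthen the representational ability of the network.
Highway Networks are closely related to this paper. Their shortcut connections are controlled by gating functions, and these gates are data-dependent and have parameters. A gated shortcut can therefore be "closed", and the layers then behave like non-residual (plain) functions. By contrast, the identity shortcuts used here are parameter-free and never closed. All information always passes through, and the network only learns residual functions. In addition, Highway Networks have not demonstrated accuracy gains from extremely increased depth.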
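
### 3. Deep Residual Learning
#### 3.1. Residual Learning
Let H(x) denote the underlying mapping to be fit by a few stacked layers, where x is the input to the first of these layers. Suppose multiple nonlinear layers can asymptotically approximate complicated functions. Then they can equally well approximate the residual function H(x) - x, assuming the input and output have the same dimensions. So rather than expecting the stacked layers to approximate H(x), we let them approximate the residual function F(x) = H(x) - x explicitly. Both forms should be able to approximate the desired function, but they may differ in how easy they are to learn.
The degradation problem motivates this reformulation. If the added layers were identity mappings, a deeper model should have no higher training error than its shallower counterpart. The degradation problem suggests that solvers struggle to approximate identity mappings with multiple nonlinear layers. With the residual reformulation, if identity mappings are optimal, the solver can simply drive the weights of the nonlinear layers toward zero to approach identity mappings.
In real cases, identity mappings are unlikely to be optimal, but the reformulation may help precondition the problem. Suppose the optimal function is closer to an identity mapping than to a zero mapping. The solver should then find it easier to learn small perturbations with reference to an identity mapping than to learn the function from scratch. Experiments (see Figure 7) show that the learned residual functions generally have small responses. This suggests that identity mappings provide reasonable preconditioning.
![](./data/figure_7.png)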
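#### 3.2. Identity Mapping by Shortcuts
This paper applies residual learning to every few stacked layers. The building block, shown in Figure 2, is defined as:

$$y = \mathcal{F}(x, \{W_i\}) + x \qquad (1)$$

Here F(x, {W_i}) is the residual mapping to be learned. The operation F + x is carried out by a shortcut connection and element-wise addition, and the shortcut adds neither parameters nor computational complexity. The dimensions of x and F must be equal. When the number of input/output channels changes, x must be transformed to match, for example by a linear projection W_s on the shortcut. The form of the residual function F is flexible. This paper uses two or three layers (see Figure 5), and more layers are possible. With only a single layer, however, Eq. (1) is similar to a linear layer y = W_1 x + x, for which no advantage was observed.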
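When the shapes of x and F differ, the paper puts a linear projection on the shortcut, y = F(x, {W_i}) + W_s x, implemented as a 1x1 convolution. The sketch below assumes (C, H, W) NumPy arrays and random stand-in tensors. `conv1x1` and `shortcut` are hypothetical helpers, not Caffe API.

```python
import numpy as np


def conv1x1(x, w, stride=1):
    """1x1 convolution of a (C_in, H, W) array with weights w of shape (C_out, C_in)."""
    x = x[:, ::stride, ::stride]                # stride s keeps every s-th row/column
    return np.einsum('oc,chw->ohw', w, x)       # per-position channel mixing


def shortcut(x, out_shape, w_s=None, stride=1):
    """Identity when shapes already match, otherwise the projection W_s x."""
    if x.shape == out_shape:
        return x
    return conv1x1(x, w_s, stride)


rng = np.random.default_rng(0)
x = rng.standard_normal((64, 8, 8))             # 64 channels, 8x8 feature map
f_x = rng.standard_normal((128, 4, 4))          # residual-branch output after downsampling
w_s = 0.1 * rng.standard_normal((128, 64))      # hypothetical projection weights
y = f_x + shortcut(x, f_x.shape, w_s, stride=2)
print(y.shape)                                  # (128, 4, 4)
```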
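#### 3.3. Network Architectures
**Plain Networks**
The plain baselines are mainly inspired by VGG nets. The convolutional layers mostly use 3x3 filters and follow two design rules. (i) Layers with the same output feature-map size have the same number of filters. (ii) When the feature-map size is halved, the number of filters is doubled, which keeps the time complexity per layer the same. Convolutional layers with a stride of 2 perform the downsampling. The network ends with a global average pooling layer and a 1000-way fully-connected layer with softmax. There are 34 weighted layers in total (see Figure 3, middle). Note that this model has lower complexity than VGG nets: it needs 3.6 billion FLOPs, only 18% of VGG-19 (19.6 billion FLOPs).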
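**Residual Network**
Inserting shortcut connections into the plain network above turns it into its residual counterpart (see Figure 3, right). When the dimensions increase across blocks, there are two options. (A) The shortcut still performs identity mapping, with zeros padded for the extra dimensions. This option adds no parameters. (B) A projection shortcut, implemented with 1x1 convolutions, matches the dimensions. With either option, a shortcut that goes across feature maps of two sizes is performed with a stride of 2.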
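Option A takes only a few lines to write. The following is a minimal sketch on (C, H, W) NumPy arrays. The helper name is hypothetical, and appending the new zero channels after the existing ones is an assumption.

```python
import numpy as np


def shortcut_option_a(x, out_channels, stride=2):
    """Parameter-free option-A shortcut for a (C, H, W) array.

    Spatial size is reduced by keeping every `stride`-th position, and the extra
    output channels are filled with zeros.
    """
    sub = x[:, ::stride, ::stride]
    extra = out_channels - sub.shape[0]
    return np.pad(sub, ((0, extra), (0, 0), (0, 0)))   # zero-pad along channels only


x = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
y = shortcut_option_a(x, out_channels=4)
print(y.shape)   # (4, 2, 2)
```

The padded channels carry only zeros, so no residual learning happens along those dimensions. Option B replaces this function with a learned 1x1, stride-2 projection.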
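![](./data/figure_3.png)
#### 3.4. Implementation
For ImageNet training, each image is resized so that its shorter side is a random value in [256, 480], for scale augmentation. A 224x224 crop is then randomly sampled from the resized image or its horizontal flip, and the per-pixel mean is subtracted. Standard color augmentation is also applied. Batch normalization follows each convolution and comes before the activation, and the weights use MSRA initialization. Training uses SGD with a mini-batch size of 256. The learning rate starts at 0.1 and is divided by 10 when the error plateaus. Weight decay is 0.0001, momentum is 0.9, and dropout is not used.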
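The learning-rate rule "start at 0.1, divide by 10 when the error plateaus" can be sketched as a small function. The paper does not say how a plateau was detected. The test used here, no improvement for `patience` consecutive epochs, is an assumption, as is its default value of 3.

```python
def plateau_schedule(errors, base_lr=0.1, factor=0.1, patience=3):
    """Return the learning rate used in each epoch, given per-epoch errors.

    The rate is multiplied by `factor` (0.1, i.e. divided by 10) whenever the best
    error has not improved for `patience` consecutive epochs (assumed rule).
    """
    lr, best, stale, rates = base_lr, float('inf'), 0, []
    for err in errors:
        rates.append(lr)
        if err < best:
            best, stale = err, 0
        else:
            stale += 1
            if stale >= patience:
                lr *= factor
                stale = 0
    return rates


print(plateau_schedule([0.9, 0.8, 0.7, 0.7, 0.7, 0.7, 0.6, 0.6]))
```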
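For comparison studies, testing uses the standard 10-crop protocol. For the best results, the scores are computed at multiple scales, with images resized so that the shorter side is in {224, 256, 384, 480, 640}, and then averaged.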
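For reference, here is a sketch of 10-crop testing as it is commonly defined: the four corner crops and the center crop, plus their horizontal mirrors, with the class scores averaged. The `model` argument is a hypothetical callable that maps a batch of crops to class scores. The dummy model in the usage example exists only to exercise the shapes.

```python
import numpy as np


def ten_crops(image, size):
    """Return the 10 standard crops of an (H, W, C) image as a (10, size, size, C) array."""
    h, w = image.shape[:2]
    offsets = [(0, 0), (0, w - size), (h - size, 0), (h - size, w - size),
               ((h - size) // 2, (w - size) // 2)]          # 4 corners + center
    crops = [image[top:top + size, left:left + size] for top, left in offsets]
    crops += [c[:, ::-1] for c in crops]                     # horizontal mirrors
    return np.stack(crops)


def ten_crop_scores(image, size, model):
    """Average the class scores that `model` assigns to the 10 crops."""
    return model(ten_crops(image, size)).mean(axis=0)


# Toy usage: a 256x256 RGB image and a dummy "model" that returns 3 scores per crop.
image = np.random.default_rng(0).random((256, 256, 3))
dummy_model = lambda batch: batch.reshape(len(batch), -1, 3).mean(axis=1)
print(ten_crop_scores(image, 224, dummy_model).shape)   # (3,)
```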

### 4. Experiments
#### 4.1. ImageNet Classification
Several networks are evaluated. Table 1 gives their configurations and Table 2 the results.
![](./data/table_1.png)

![](./data/table_2.png)
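**Plain Networks**
Table 2 shows that the deeper plain network has higher validation error than the shallower one. Comparing their training/validation errors during training (see Figure 4) reveals the degradation problem. The 34-layer plain network has higher training error throughout training, even though the solution space of the 18-layer plain network is a subspace of the 34-layer one's.
This optimization difficulty is unlikely to come from vanishing gradients. The networks are trained with BN, which ensures that signals vanish in neither the forward nor the backward pass. The authors conjecture that deep plain networks may have exponentially low convergence rates, which slows the reduction of training error. Training for longer, however, showed no sign of improvement.
![](./data/figure_4.png)
**Residual Networks**
With residual learning, the situation is reversed. The 34-layer ResNet performs better than the 18-layer ResNet. This shows that residual networks address the degradation problem and gain accuracy from increased depth. The 34-layer ResNet also outperforms the 34-layer plain network, which confirms that residual learning is effective in extremely deep models. Finally, the 18-layer plain and residual networks reach similar accuracy, but the 18-layer ResNet converges faster (see Figure 4). When the network is "not overly deep", current solvers can still find good solutions for the plain network. In that case, the ResNet eases optimization by converging faster early in training.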
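**Identity vs. Projection Shortcuts**
Section 3.3 introduced options (A) and (B) for handling increased dimensions. A third option, (C), makes all shortcuts projections. Table 3 compares the three options, and all three are better than the plain counterpart. B is slightly better than A, probably because the zero-padded dimensions in A have no residual learning. C is marginally better than B but introduces many extra parameters. The differences among A, B, and C are small, which indicates that projection shortcuts are not essential for addressing the degradation problem. To reduce memory/time complexity and model size, this paper therefore uses option B.
![](./data/table_3.png)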
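**Deeper Bottleneck Architectures**
The building block is redesigned as a bottleneck (see Figure 5). The first 1x1 layer reduces the dimensions and the last 1x1 layer restores them, so the 3x3 layer in between works on smaller input/output dimensions. The basic and bottleneck designs have similar time complexity.
Parameter-free identity shortcuts matter especially for the bottleneck architecture. The shortcut connects the two high-dimensional ends of the block, so replacing the identity shortcut with a projection would double both the time complexity and the model size.
![](./data/figure_5.png)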
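Counting multiply-accumulates per output position checks both claims: the two designs have similar time complexity, and a projection would roughly double the bottleneck's cost. The channel widths follow the example in Figure 5 of the paper, namely a 64-d basic block and a 256-d bottleneck block with a 64-d middle layer. The helper name is hypothetical, and the small costs of BN and ReLU are ignored.

```python
def conv_macs_per_position(k, c_in, c_out):
    """Multiply-accumulates of a k x k convolution at one output position (no bias)."""
    return k * k * c_in * c_out


basic = 2 * conv_macs_per_position(3, 64, 64)             # two 3x3 layers, 64-d
bottleneck = (conv_macs_per_position(1, 256, 64)           # 1x1 reduce: 256 -> 64
              + conv_macs_per_position(3, 64, 64)          # 3x3 on 64-d
              + conv_macs_per_position(1, 64, 256))        # 1x1 restore: 64 -> 256
projection = conv_macs_per_position(1, 256, 256)           # 1x1 projection shortcut

print(basic, bottleneck, projection)                       # 73728 69632 65536
print((bottleneck + projection) / bottleneck)              # about 1.94
```

The weight counts are the same numbers, since a 1x1 or 3x3 convolution has one weight per multiply-accumulate at each position. A projection shortcut therefore nearly doubles both the compute and the parameters of a bottleneck block.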
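**50-layer ResNet**
Replacing each 2-layer block with a 3-layer bottleneck block gives a network with 50 weighted layers (see Table 1). Option B is used when dimensions increase, and the model has 3.8 billion FLOPs.
**101-layer and 152-layer ResNets**
Deeper networks are built from more 3-layer blocks (see Table 1). Despite the much greater depth, the 152-layer ResNet (11.3 billion FLOPs) still has lower complexity than VGG-16/19 (15.3/19.6 billion FLOPs).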
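The depths follow from the number of blocks in each stage. The sketch below assumes the per-stage block counts from the paper's Table 1, which also match the loop bounds in this repository's `resnet_101_deploy.py` and `resnet_152_deploy.py`. It counts conv1, the layers on the residual branches, and the final fully-connected layer. Projection shortcuts are not counted.

```python
# Residual blocks per stage (conv2_x, conv3_x, conv4_x, conv5_x) and layers per block.
BLOCKS = {
    34: ([3, 4, 6, 3], 2),     # basic blocks: two 3x3 layers each
    50: ([3, 4, 6, 3], 3),     # bottleneck blocks: 1x1, 3x3, 1x1
    101: ([3, 4, 23, 3], 3),
    152: ([3, 8, 36, 3], 3),
}


def weighted_layers(blocks_per_stage, layers_per_block):
    # conv1 + layers inside the residual branches + the final fully-connected layer
    return 1 + layers_per_block * sum(blocks_per_stage) + 1


for depth, (blocks, layers) in BLOCKS.items():
    print(depth, weighted_layers(blocks, layers))
```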
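Tables 3 and 4 show that the 50/101/152-layer ResNets are more accurate than the 34-layer ones. No degradation is observed, and accuracy improves as depth increases.
![](./data/table_4.png)
**Comparisons with State-of-the-art Methods**
Table 4 shows that ResNets outperform previous networks. A single 152-layer ResNet reaches 4.49% top-5 error, lower than all previous results. An ensemble of six models, two of them 152-layer, reaches 3.57% top-5 error and won 1st place in ILSVRC 2015 (see Table 5).
![](./data/table_5.png)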

#### 4.2. CIFAR-10 and Analysis
Several networks are evaluated on CIFAR-10, and Table 6 lists the results.
![](./data/table_6.png)
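The paper's CIFAR-10 networks have 6n + 2 weighted layers. There are three stages with 16, 32, and 64 filters, each made of n two-layer blocks, plus the first convolution and the final fully-connected layer. The helper below is hypothetical and recovers n from the depth. With 3-layer bottleneck blocks, the same idea gives the 9n + 2 rule used by `resnet_v2_deploy.py` in this repository.

```python
def blocks_per_stage(depth, layers_per_block=2, num_stages=3):
    """Return n for a CIFAR ResNet of the given depth (depth = stages * layers * n + 2)."""
    layers_per_n = num_stages * layers_per_block
    if (depth - 2) % layers_per_n:
        raise ValueError(f"depth must be {layers_per_n}n+2, got {depth}")
    return (depth - 2) // layers_per_n


for depth in (20, 32, 44, 56, 110, 1202):
    print(depth, blocks_per_stage(depth))
print(164, blocks_per_stage(164, layers_per_block=3))   # ResNet-v2 bottleneck: 9n+2
```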
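Figure 6 shows the training/test error of each network during training, and the curves largely match expectations. The error of plain networks rises with depth, which is the degradation problem. The error of residual networks falls as depth increases, which shows that residual learning overcomes degradation. One case needs attention, however: the extremely deep 1202-layer network.
**Exploring Over 1000 layers**
The 1202-layer ResNet still converges and reaches a low training error, but its test performance is worse than that of the 110-layer network. The authors attribute this to overfitting. CIFAR-10 is a small dataset and does not need a network as large as 1202 layers. Moreover, this experiment uses no strong regularization such as maxout or dropout. Such regularization generally gives better results on small datasets, so adding it would probably improve the results of these networks.
![](./data/figure_6.png)
**Analysis of Layer Responses**
Section 3.1 raised the question of layer responses, and this section examines the responses of different networks. Figure 7 shows that the responses of ResNet layers are generally smaller than those of their plain counterparts. This supports the idea that residual functions are generally closer to zero than non-residual functions. Deeper ResNets also have smaller responses: the more layers a residual network has, the less each layer tends to modify the signal.
![](./data/figure_7.png)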
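
#### 4.3 Object Detection on PASCAL and MS COCO
The proposed residual networks also generalize well to other recognition tasks. Tables 7 and 8 show object detection results for different networks on PASCAL VOC and COCO. The detector is based on the Faster R-CNN framework, and the paper's appendix gives the details.
![](./data/table_7.png)
![](./data/table_8.png)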
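

### Questions
A question about this architecture: Section 3.3 describes two ways to match dimensions when they increase, and the paper adopts option B, a 1x1 convolution with stride 2 on the shortcut. Such a shortcut samples only one of every four spatial positions, so it discards 75% of the identity-mapping information. Would a 2x2, stride-2 max/average pooling or convolution work better?
[#50 when the dimensions increase](https://github.com/KaimingHe/deep-residual-networks/issues/50)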
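The sketch below makes the 75% figure concrete. On a single 4x4 channel, it compares which input positions a 1x1, stride-2 shortcut reads with what a 2x2, stride-2 average pooling uses. It only illustrates how much of the input each option covers and says nothing about which one trains better. The helper names are hypothetical.

```python
import numpy as np


def strided_1x1_sample(x, stride=2):
    # Positions a 1x1, stride-2 shortcut reads: one of every stride x stride inputs.
    return x[:, ::stride, ::stride]


def avg_pool_2x2(x):
    # 2x2, stride-2 average pooling on a (C, H, W) array (H and W assumed even):
    # every input position contributes to some output.
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


x = np.arange(16, dtype=float).reshape(1, 4, 4)
print(strided_1x1_sample(x)[0])   # reads 4 of the 16 inputs
print(avg_pool_2x2(x)[0])         # averages all 16 inputs
```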
146 | -------------------------------------------------------------------------------- /ResNet/data/figure_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet/data/figure_1.png -------------------------------------------------------------------------------- /ResNet/data/figure_2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet/data/figure_2.png -------------------------------------------------------------------------------- /ResNet/data/figure_3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet/data/figure_3.png -------------------------------------------------------------------------------- /ResNet/data/figure_4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet/data/figure_4.png -------------------------------------------------------------------------------- /ResNet/data/figure_5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet/data/figure_5.png -------------------------------------------------------------------------------- /ResNet/data/figure_6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet/data/figure_6.png -------------------------------------------------------------------------------- 
/ResNet/data/figure_7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet/data/figure_7.png -------------------------------------------------------------------------------- /ResNet/data/table_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet/data/table_1.png -------------------------------------------------------------------------------- /ResNet/data/table_2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet/data/table_2.png -------------------------------------------------------------------------------- /ResNet/data/table_3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet/data/table_3.png -------------------------------------------------------------------------------- /ResNet/data/table_4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet/data/table_4.png -------------------------------------------------------------------------------- /ResNet/data/table_5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet/data/table_5.png -------------------------------------------------------------------------------- /ResNet/data/table_6.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet/data/table_6.png -------------------------------------------------------------------------------- /ResNet/data/table_7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet/data/table_7.png -------------------------------------------------------------------------------- /ResNet/data/table_8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ResNet/data/table_8.png
--------------------------------------------------------------------------------
/ResNet/resnet_101_deploy.py:
--------------------------------------------------------------------------------
import os
import numpy as np
from six.moves import xrange

import caffe
from caffe import layers as L, params as P
from caffe import to_proto

# first block
# Convolution - BatchNorm - Scale - ReLU
def _block_first(net, bottom, nout=64, pad=3, ks=7, stride=2):
    net.conv1 = L.Convolution(bottom,
                              num_output = nout, pad = pad,
                              kernel_size = ks, stride = stride,
                              bias_term = False)
    net.bn_conv1 = L.BatchNorm(net.conv1, in_place = True)
    net.scale_conv1 = L.Scale(net.bn_conv1, bias_term = True, in_place = True)
    net.conv1_relu = L.ReLU(net.scale_conv1, in_place = True)

    return net.conv1_relu


# 3(layer) in 1(block)
# Convolution - BatchNorm - Scale
def _block_3in1(major, minor, net, bottom, nout, pad, ks, stride):
    branch_flag = '{}_branch{}'.format(major, minor)
    conv_layer = 'res{}'.format(branch_flag)
    bn_layer = 'bn{}'.format(branch_flag)
    scale_layer = 'scale{}'.format(branch_flag)

    net[conv_layer] = L.Convolution(bottom,
                                    num_output = nout, pad = pad,
                                    kernel_size = ks, stride = stride,
                                    bias_term = False)
    net[bn_layer] = L.BatchNorm(net[conv_layer], in_place = True)
    net[scale_layer] = L.Scale(net[bn_layer], bias_term = True, in_place = True)

    return net[scale_layer]


# 4(layer) in 1(block)
# Convolution - BatchNorm - Scale - ReLU
def _block_4in1(major, minor, net, bottom, nout, pad, ks, stride):
    branch_flag = '{}_branch{}'.format(major, minor)
    conv_layer = 'res{}'.format(branch_flag)
    bn_layer = 'bn{}'.format(branch_flag)
    scale_layer = 'scale{}'.format(branch_flag)
    relu_layer = 'res{}_relu'.format(branch_flag)

    net[conv_layer] = L.Convolution(bottom,
                                    num_output = nout, pad = pad,
                                    kernel_size = ks, stride = stride,
                                    bias_term = False)
    net[bn_layer] = L.BatchNorm(net[conv_layer], in_place = True)
    net[scale_layer] = L.Scale(net[bn_layer], bias_term = True, in_place = True)
    net[relu_layer] = L.ReLU(net[scale_layer], in_place = True)

    return net[relu_layer]


# branch
#  [3in1]               \
#                        | - branch
#  4in1 - 4in1 - 3in1   /
def _branch(major, net, bottom, nout, has_branch1=False, is_branch_2a=False):
    eltwise_layer = 'res{}'.format(major)
    relu_layer = 'res{}_relu'.format(major)

    stride = 1
    if has_branch1 and not is_branch_2a:
        stride = 2

    branch2_2a = _block_4in1(major, '2a', net, bottom, nout, 0, 1, stride)
    branch2_2b = _block_4in1(major, '2b', net, branch2_2a, nout, 1, 3, 1)
    branch2_2c = _block_3in1(major, '2c', net, branch2_2b, nout*4, 0, 1, 1)

    if has_branch1:
        branch1 = _block_3in1(major, '1', net, bottom, nout*4, 0, 1, stride)
        net[eltwise_layer] = L.Eltwise(branch1, branch2_2c)
    else:
        net[eltwise_layer] = L.Eltwise(bottom, branch2_2c)

    net[relu_layer] = L.ReLU(net[eltwise_layer], in_place = True)

    return net[relu_layer]


def construc_net():
    net = caffe.NetSpec()

    net.data = L.Input(input_param = dict(shape = dict(dim = [1,3,224,224])))

    block1 = _block_first(net, net.data)

    net.pool1 = L.Pooling(block1, pool = P.Pooling.MAX, kernel_size = 3, stride = 2)

    branch_2a = _branch('2a', net, net.pool1, 64, has_branch1 = True, is_branch_2a = True)
    branch_2b = _branch('2b', net, branch_2a, 64)
    branch_2c = _branch('2c', net, branch_2b, 64)

    branch_3a = _branch('3a', net, branch_2c, 128, has_branch1 = True)
    branch_3b = _branch('3b', net, branch_3a, 128)
    branch_3c = _branch('3c', net, branch_3b, 128)
    branch_3d = _branch('3d', net, branch_3c, 128)

    branch_4a = _branch('4a', net, branch_3d, 256, has_branch1 = True)
    branch_pre = branch_4a
    for idx in xrange(1,23,1): # conv4_x total 1+22=23
        flag = '4b{}'.format(idx)
        branch_pre = _branch(flag, net, branch_pre, 256)

    branch_5a = _branch('5a', net, branch_pre, 512, has_branch1 = True)
    branch_5b = _branch('5b', net, branch_5a, 512)
    branch_5c = _branch('5c', net, branch_5b, 512)

    net.pool5 = L.Pooling(branch_5c, pool = P.Pooling.AVE, kernel_size = 7, stride = 1)

    net.fc6 = L.InnerProduct(net.pool5, num_output = 1000)
    net.prob = L.Softmax(net.fc6)

    return net.to_proto()


def main():
    with open('resnet_101_deploy.prototxt', 'w') as f:
        f.write('name: "ResNet-101_deploy"\n')
        f.write(str(construc_net()))

if __name__ == '__main__':
    main()

--------------------------------------------------------------------------------
/ResNet/resnet_152_deploy.py:
--------------------------------------------------------------------------------
import os
import numpy as np
from six.moves import xrange

import caffe
from caffe import layers as L, params as P
from caffe import to_proto

# first block
# Convolution - BatchNorm - Scale - ReLU
def _block_first(net, bottom, nout=64, pad=3, ks=7, stride=2):
    net.conv1 = L.Convolution(bottom,
                              num_output = nout, pad = pad,
                              kernel_size = ks, stride = stride,
                              bias_term = False)
    net.bn_conv1 = L.BatchNorm(net.conv1, in_place = True)
    net.scale_conv1 = L.Scale(net.bn_conv1, bias_term = True, in_place = True)
    net.conv1_relu = L.ReLU(net.scale_conv1, in_place = True)

    return net.conv1_relu


# 3(layer) in 1(block)
# Convolution - BatchNorm - Scale
def _block_3in1(major, minor, net, bottom, nout, pad, ks, stride):
    branch_flag = '{}_branch{}'.format(major, minor)
    conv_layer = 'res{}'.format(branch_flag)
    bn_layer = 'bn{}'.format(branch_flag)
    scale_layer = 'scale{}'.format(branch_flag)

    net[conv_layer] = L.Convolution(bottom,
                                    num_output = nout, pad = pad,
                                    kernel_size = ks, stride = stride,
                                    bias_term = False)
    net[bn_layer] = L.BatchNorm(net[conv_layer], in_place = True)
    net[scale_layer] = L.Scale(net[bn_layer], bias_term = True, in_place = True)

    return net[scale_layer]


# 4(layer) in 1(block)
# Convolution - BatchNorm - Scale - ReLU
def _block_4in1(major, minor, net, bottom, nout, pad, ks, stride):
    branch_flag = '{}_branch{}'.format(major, minor)
    conv_layer = 'res{}'.format(branch_flag)
    bn_layer = 'bn{}'.format(branch_flag)
    scale_layer = 'scale{}'.format(branch_flag)
    relu_layer = 'res{}_relu'.format(branch_flag)

    net[conv_layer] = L.Convolution(bottom,
                                    num_output = nout, pad = pad,
                                    kernel_size = ks, stride = stride,
                                    bias_term = False)
    net[bn_layer] = L.BatchNorm(net[conv_layer], in_place = True)
    net[scale_layer] = L.Scale(net[bn_layer], bias_term = True, in_place = True)
    net[relu_layer] = L.ReLU(net[scale_layer], in_place = True)

    return net[relu_layer]


# branch
#  [3in1]               \
#                        | - branch
#  4in1 - 4in1 - 3in1   /
def _branch(major, net, bottom, nout, has_branch1=False, is_branch_2a=False):
    eltwise_layer = 'res{}'.format(major)
    relu_layer = 'res{}_relu'.format(major)

    stride = 1
    if has_branch1 and not is_branch_2a:
        stride = 2

    branch2_2a = _block_4in1(major, '2a', net, bottom, nout, 0, 1, stride)
    branch2_2b = _block_4in1(major, '2b', net, branch2_2a, nout, 1, 3, 1)
    branch2_2c = _block_3in1(major, '2c', net, branch2_2b, nout*4, 0, 1, 1)

    if has_branch1:
        branch1 = _block_3in1(major, '1', net, bottom, nout*4, 0, 1, stride)
        net[eltwise_layer] = L.Eltwise(branch1, branch2_2c)
    else:
        net[eltwise_layer] = L.Eltwise(bottom, branch2_2c)

    net[relu_layer] = L.ReLU(net[eltwise_layer], in_place = True)

    return net[relu_layer]


def construc_net():
    net = caffe.NetSpec()

    net.data = L.Input(input_param = dict(shape = dict(dim = [1,3,224,224])))

    block1 = _block_first(net, net.data)

    net.pool1 = L.Pooling(block1, pool = P.Pooling.MAX, kernel_size = 3, stride = 2)

    branch_2a = _branch('2a', net, net.pool1, 64, has_branch1 = True, is_branch_2a = True)
    branch_2b = _branch('2b', net, branch_2a, 64)
    branch_2c = _branch('2c', net, branch_2b, 64)

    branch_3a = _branch('3a', net, branch_2c, 128, has_branch1 = True)
    branch_pre = branch_3a
    for idx in xrange(1,8,1): # conv3_x total 1+7=8
        flag = '3b{}'.format(idx)
        branch_pre = _branch(flag, net, branch_pre, 128)

    branch_4a = _branch('4a', net, branch_pre, 256, has_branch1 = True)
    branch_pre = branch_4a
    for idx in xrange(1,36,1): # conv4_x total 1+35=36
        flag = '4b{}'.format(idx)
        branch_pre = _branch(flag, net, branch_pre, 256)

    branch_5a = _branch('5a', net, branch_pre, 512, has_branch1 = True)
    branch_5b = _branch('5b', net, branch_5a, 512)
    branch_5c = _branch('5c', net, branch_5b, 512)

    net.pool5 = L.Pooling(branch_5c, pool = P.Pooling.AVE, kernel_size = 7, stride = 1)

    net.fc6 = L.InnerProduct(net.pool5, num_output = 1000)
    net.prob = L.Softmax(net.fc6)

    return net.to_proto()


def main():
    with open('resnet_152_deploy.prototxt', 'w') as f:
        f.write('name: "ResNet-152_deploy"\n')
        f.write(str(construc_net()))

if __name__ == '__main__':
    main()

--------------------------------------------------------------------------------
/ResNet/resnet_50_deploy.py:
--------------------------------------------------------------------------------
import os
import numpy as np

import caffe
from caffe import layers as L, params as P
from caffe import to_proto

# first block
# Convolution - BatchNorm - Scale - ReLU
def _block_first(net, bottom, nout=64, pad=3, ks=7, stride=2):
    net.conv1 = L.Convolution(bottom,
                              num_output = nout, pad = pad,
                              kernel_size = ks, stride = stride,
                              bias_term = False)
    net.bn_conv1 = L.BatchNorm(net.conv1, in_place = True)
    net.scale_conv1 = L.Scale(net.bn_conv1, bias_term = True, in_place = True)
    net.conv1_relu = L.ReLU(net.scale_conv1, in_place = True)

    return net.conv1_relu


# 3(layer) in 1(block)
# Convolution - BatchNorm - Scale
def _block_3in1(major, minor, net, bottom, nout, pad, ks, stride):
    branch_flag = '{}_branch{}'.format(major, minor)
    conv_layer = 'res{}'.format(branch_flag)
    bn_layer = 'bn{}'.format(branch_flag)
    scale_layer = 'scale{}'.format(branch_flag)

    net[conv_layer] = L.Convolution(bottom,
                                    num_output = nout, pad = pad,
                                    kernel_size = ks, stride = stride,
                                    bias_term = False)
    net[bn_layer] = L.BatchNorm(net[conv_layer], in_place = True)
    net[scale_layer] = L.Scale(net[bn_layer], bias_term = True, in_place = True)

    return net[scale_layer]


# 4(layer) in 1(block)
# Convolution - BatchNorm - Scale - ReLU
def _block_4in1(major, minor, net, bottom, nout, pad, ks, stride):
    branch_flag = '{}_branch{}'.format(major, minor)
    conv_layer = 'res{}'.format(branch_flag)
    bn_layer = 'bn{}'.format(branch_flag)
    scale_layer = 'scale{}'.format(branch_flag)
    relu_layer = 'res{}_relu'.format(branch_flag)

    net[conv_layer] = L.Convolution(bottom,
                                    num_output = nout, pad = pad,
                                    kernel_size = ks, stride = stride,
                                    bias_term = False)
    net[bn_layer] = L.BatchNorm(net[conv_layer], in_place = True)
    net[scale_layer] = L.Scale(net[bn_layer], bias_term = True, in_place = True)
    net[relu_layer] = L.ReLU(net[scale_layer], in_place = True)

    return net[relu_layer]


# branch
#  [3in1]               \
#                        | - branch
#  4in1 - 4in1 - 3in1   /
def _branch(major, net, bottom, nout, has_branch1=False, is_branch_2a=False):
    eltwise_layer = 'res{}'.format(major)
    relu_layer = 'res{}_relu'.format(major)

    stride = 1
    if has_branch1 and not is_branch_2a:
        stride = 2

    branch2_2a = _block_4in1(major, '2a', net, bottom, nout, 0, 1, stride)
    branch2_2b = _block_4in1(major, '2b', net, branch2_2a, nout, 1, 3, 1)
    branch2_2c = _block_3in1(major, '2c', net, branch2_2b, nout*4, 0, 1, 1)

    if has_branch1:
        branch1 = _block_3in1(major, '1', net, bottom, nout*4, 0, 1, stride)
        net[eltwise_layer] = L.Eltwise(branch1, branch2_2c)
    else:
        net[eltwise_layer] = L.Eltwise(bottom, branch2_2c)

    net[relu_layer] = L.ReLU(net[eltwise_layer], in_place = True)

    return net[relu_layer]


def construc_net():
    net = caffe.NetSpec()

    # 3-channel (RGB) input, consistent with the 101/152-layer scripts
    net.data = L.Input(input_param = dict(shape = dict(dim = [1,3,224,224])))

    block1 = _block_first(net, net.data)

    net.pool1 = L.Pooling(block1, pool = P.Pooling.MAX, kernel_size = 3, stride = 2)

96 | branch_2a = _branch('2a', net, net.pool1, 64, has_branch1 = True, is_branch_2a = True) 97 | branch_2b = _branch('2b', net, branch_2a, 64) 98 | branch_2c = _branch('2c', net, branch_2b, 64) 99 | 100 | branch_3a = _branch('3a', net, branch_2c, 128, has_branch1 = True) 101 | branch_3b = _branch('3b', net, branch_3a, 128) 102 | branch_3c = _branch('3c', net, branch_3b, 128) 103 | branch_3d = _branch('3d', net, branch_3c, 128) 104 | 105 | branch_4a = _branch('4a', net, branch_3d, 256, has_branch1 = True) 106 | branch_4b = _branch('4b', net, branch_4a, 256) 107 | branch_4c = _branch('4c', net, branch_4b, 256) 108 | branch_4d = _branch('4d', net, branch_4c, 256) 109 | branch_4e = _branch('4e', net, branch_4d, 256) 110 | branch_4f = _branch('4f', net, branch_4e, 256) 111 | 112 | branch_5a = _branch('5a', net, branch_4f, 512, has_branch1 = True) 113 | branch_5b = _branch('5b', net, branch_5a, 512) 114 | branch_5c = _branch('5c', net, branch_5b, 512) 115 | 116 | net.pool5 = L.Pooling(branch_5c, pool = P.Pooling.AVE, kernel_size = 7, stride = 1) 117 | 118 | net.fc6 = L.InnerProduct(net.pool5, num_output = 1000) 119 | net.prob = L.Softmax(net.fc6) 120 | 121 | return net.to_proto() 122 | 123 | 124 | def main(): 125 | with open('resnet_50_deploy.prototxt', 'w') as f: 126 | f.write('name: "ResNet-50_deploy"\n') 127 | f.write(str(construc_net())) 128 | 129 | if __name__ == '__main__': 130 | main() 131 | -------------------------------------------------------------------------------- /ResNet/resnet_50_train_test.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | 4 | import caffe 5 | from caffe import layers as L, params as P 6 | from caffe import to_proto 7 | 8 | 9 | # filler 10 | # type: 'constant', 'gaussian', 'msra', 'xavier' 11 | # [default: 'xavier'] 12 | # value: the value in constant filler 13 | # [default: 0.2] 14 | # std: the std value in Gaussian filler 15 | # [default: 0.01] 16 | def 
_filler(filler_type='msra', filler_value=0.2, filler_std=0.01): 17 | filler = {} 18 | 19 | if filler_type == 'msra': 20 | filler = dict(type = filler_type) 21 | elif filler_type == 'xavier': 22 | filler = dict(type = filler_type) 23 | elif filler_type == 'gaussian': 24 | filler = dict(type = filler_type, std = filler_std) 25 | elif filler_type == 'constant': 26 | filler = dict(type = filler_type, value = filler_value) 27 | 28 | return filler 29 | 30 | 31 | # first block 32 | # Convolution - BatchNorm - Scale - ReLU 33 | def _block_first(net, bottom, nout=64, pad=3, ks=7, stride=2): 34 | net.conv1 = L.Convolution(bottom, 35 | num_output = nout, pad = pad, 36 | kernel_size = ks, stride = stride, 37 | weight_filler = _filler(), 38 | bias_term = False) 39 | net.bn_conv1 = L.BatchNorm(net.conv1, in_place = True) 40 | net.scale_conv1 = L.Scale(net.bn_conv1, bias_term = True, in_place = True) 41 | net.conv1_relu = L.ReLU(net.scale_conv1, in_place = True) 42 | 43 | return net.conv1_relu 44 | 45 | 46 | # 3(layer) in 1(block) 47 | # Convolution - BatchNorm - Scale 48 | def _block_3in1(major, minor, net, bottom, nout, pad, ks, stride): 49 | branch_flag = '{}_branch{}'.format(major, minor) 50 | conv_layer = 'res{}'.format(branch_flag) 51 | bn_layer = 'bn{}'.format(branch_flag) 52 | scale_layer = 'scale{}'.format(branch_flag) 53 | 54 | net[conv_layer] = L.Convolution(bottom, 55 | num_output = nout, pad = pad, 56 | kernel_size = ks, stride = stride, 57 | weight_filler = _filler(), 58 | bias_term = False) 59 | net[bn_layer] = L.BatchNorm(net[conv_layer], in_place = True) 60 | net[scale_layer] = L.Scale(net[bn_layer], bias_term = True, in_place = True) 61 | 62 | return net[scale_layer] 63 | 64 | 65 | # 4(layer) in 1(block) 66 | # Convolution - BatchNorm - Scale - ReLU 67 | def _block_4in1(major, minor, net, bottom, nout, pad, ks, stride): 68 | branch_flag = '{}_branch{}'.format(major, minor) 69 | conv_layer = 'res{}'.format(branch_flag) 70 | bn_layer = 'bn{}'.format(branch_flag) 
71 | scale_layer = 'scale{}'.format(branch_flag) 72 | relu_layer = 'res{}_relu'.format(branch_flag) 73 | 74 | net[conv_layer] = L.Convolution(bottom, 75 | num_output = nout, pad = pad, 76 | kernel_size = ks, stride = stride, 77 | weight_filler = _filler(), 78 | bias_term = False) 79 | net[bn_layer] = L.BatchNorm(net[conv_layer], in_place = True) 80 | net[scale_layer] = L.Scale(net[bn_layer], bias_term = True, in_place = True) 81 | net[relu_layer] = L.ReLU(net[scale_layer], in_place = True) 82 | 83 | return net[relu_layer] 84 | 85 | 86 | # branch 87 | # [3in1] \ 88 | # | - branch 89 | # 4in1 - 4in1 - 3in1 / 90 | def _branch(major, net, bottom, nout, has_branch1=False, is_branch_2a=False): 91 | eltwise_layer = 'res{}'.format(major) 92 | relu_layer = 'res{}_relu'.format(major) 93 | 94 | stride = 1 95 | if has_branch1 and not is_branch_2a: 96 | stride = 2 97 | 98 | branch2_2a = _block_4in1(major, '2a', net, bottom, nout, 0, 1, stride) 99 | branch2_2b = _block_4in1(major, '2b', net, branch2_2a, nout, 1, 3, 1) 100 | branch2_2c = _block_3in1(major, '2c', net, branch2_2b, nout*4, 0, 1, 1) 101 | 102 | if has_branch1: 103 | branch1 = _block_3in1(major, '1', net, bottom, nout*4, 0, 1, stride) 104 | net[eltwise_layer] = L.Eltwise(branch1, branch2_2c) 105 | else: 106 | net[eltwise_layer] = L.Eltwise(bottom, branch2_2c) 107 | 108 | net[relu_layer] = L.ReLU(net[eltwise_layer], in_place = True) 109 | 110 | return net[relu_layer] 111 | 112 | 113 | def construc_net(): 114 | net = caffe.NetSpec() 115 | 116 | _transform_train = dict(scale = 0.0078125, # 1/128 117 | mirror = True, 118 | #crop_size = 224, 119 | mean_value = [127.5, 127.5, 127.5]) 120 | _transform_test = dict(scale = 0.0078125, # 1/128 121 | mirror = False, 122 | #crop_size = 224, 123 | mean_value = [127.5, 127.5, 127.5]) 124 | 125 | #net.data, net.label = L.ImageData(include = dict(phase = 0), # TRAIN = 0,in caffe_pb2.py 126 | # transform_param = _transform_train, 127 | # source = '../data/images_train.txt', 128 | # 
batch_size = 32, 129 | # shuffle = True, 130 | # #new_height = 224, 131 | # #new_width = 224, 132 | # #is_color = True, 133 | # ntop = 2) 134 | net.data, net.label = L.Data(include = dict(phase = 0), 135 | transform_param = _transform_train, 136 | source = '../data/images_train_lmdb', 137 | batch_size = 32, 138 | backend = P.Data.LMDB, 139 | ntop = 2) 140 | 141 | # NOTE 142 | data_layer_train = net.to_proto() 143 | 144 | #net.data, net.label = L.ImageData(include = dict(phase = 1), # TEST = 1 145 | # transform_param = _transform_test, 146 | # source = '../data/images_test.txt', 147 | # batch_size = 4, 148 | # shuffle = False, 149 | # #new_height = 224, 150 | # #new_width = 224, 151 | # #is_color = True, 152 | # ntop = 2) 153 | net.data, net.label = L.Data(include = dict(phase = 1), 154 | transform_param = _transform_test, 155 | source = '../data/images_test_lmdb', 156 | batch_size = 4, 157 | backend = P.Data.LMDB, 158 | ntop = 2) 159 | 160 | block1 = _block_first(net, net.data) 161 | 162 | net.pool1 = L.Pooling(block1, pool = P.Pooling.MAX, kernel_size = 3, stride = 2) 163 | 164 | branch_2a = _branch('2a', net, net.pool1, 64, has_branch1 = True, is_branch_2a = True) 165 | branch_2b = _branch('2b', net, branch_2a, 64) 166 | branch_2c = _branch('2c', net, branch_2b, 64) 167 | 168 | branch_3a = _branch('3a', net, branch_2c, 128, has_branch1 = True) 169 | branch_3b = _branch('3b', net, branch_3a, 128) 170 | branch_3c = _branch('3c', net, branch_3b, 128) 171 | branch_3d = _branch('3d', net, branch_3c, 128) 172 | 173 | branch_4a = _branch('4a', net, branch_3d, 256, has_branch1 = True) 174 | branch_4b = _branch('4b', net, branch_4a, 256) 175 | branch_4c = _branch('4c', net, branch_4b, 256) 176 | branch_4d = _branch('4d', net, branch_4c, 256) 177 | branch_4e = _branch('4e', net, branch_4d, 256) 178 | branch_4f = _branch('4f', net, branch_4e, 256) 179 | 180 | branch_5a = _branch('5a', net, branch_4f, 512, has_branch1 = True) 181 | branch_5b = _branch('5b', net, branch_5a, 
512) 182 | branch_5c = _branch('5c', net, branch_5b, 512) 183 | 184 | net.pool5 = L.Pooling(branch_5c, pool = P.Pooling.AVE, kernel_size = 7, stride = 1) 185 | 186 | net.fc6 = L.InnerProduct(net.pool5, num_output = 1000) 187 | 188 | net.loss = L.SoftmaxWithLoss(net.fc6, net.label) 189 | 190 | net.accuracy = L.Accuracy(net.fc6, net.label, include = dict(phase = 1)) 191 | 192 | return str(data_layer_train) + str(net.to_proto()) 193 | 194 | 195 | def main(): 196 | file_name = 'resnet_50_train_test.prototxt' 197 | with open(file_name, 'w') as f: 198 | f.write('name: "ResNet-50_train_test"\n') 199 | f.write(construc_net()) 200 | 201 | 202 | if __name__ == '__main__': 203 | main() 204 | -------------------------------------------------------------------------------- /SENet/README.md: -------------------------------------------------------------------------------- 1 | # SENet 2 | [Squeeze-and-Excitation Networks](https://arxiv.org/abs/1709.01507)
3 | Jie Hu, Li Shen, Gang Sun
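4 |
5 | ### Abstract
6 | Convolutional neural networks are built on the convolution operation, which extracts informative features by fusing spatial and channel-wise information within local receptive fields.
7 | Much prior work improves representational power by strengthening the encoding along the spatial dimension. This paper focuses on the channel dimension instead and proposes a new architectural unit, the "Squeeze-and-Excitation" (SE) block,
8 | which models the interdependencies between channels and adaptively recalibrates channel-wise feature responses. Adding SE blocks to existing state-of-the-art networks increases computational cost only slightly
9 | while substantially improving performance. With SENet, the authors won first place in the ILSVRC 2017 classification task, with a top-5 error of 2.251%.
10 |
11 | ### 1. Introduction
12 | Each convolutional layer has a set of filters that learn to express local spatial connectivity patterns across all input channels. In other words, a convolutional filter extracts information that fuses the spatial and channel dimensions within a local receptive field.
13 | Interleaved with non-linear activations and downsampling, the layers of a CNN build hierarchical patterns with global receptive fields that serve as an image description. Recent work has shown
14 | that performance can be improved, without extra supervision, by adding learning mechanisms that help capture spatial correlations. The Inception architecture, for example, improves performance by embedding multi-scale processing in its modules.
15 | Other work has explored better models of spatial dependence or added spatial attention.
16 | In contrast to the approaches above, this paper investigates a different aspect of architecture design: the relationships between channels. It introduces a new unit, the "Squeeze-and-Excitation" (SE) block,
17 | which aims to improve representational power by explicitly modelling the interdependencies between channels and recalibrating features channel by channel. The network can thus learn to use global information to selectively emphasize
18 | informative features and suppress less useful ones.
19 | Figure 1 shows the basic structure of the SE block. The first step, squeeze, aggregates the global spatial information of each channel into a single value that represents that channel, producing a channel descriptor. The second step, excitation,
20 | learns how strongly each channel should be weighted and rescales the feature maps accordingly; the rescaled feature maps are the output of the SE block.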
21 | ![](./data/figure_1.png)
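22 | In earlier layers, SE blocks improve the quality of the shared low-level representations in a class-agnostic manner; in later layers they become increasingly class-specific.
23 | The benefits of the feature recalibration performed by SE blocks can accumulate through the network.
24 | The SE block is simple and can easily be added to existing architectures with only a small increase in model complexity and computational cost; it also generalizes well across datasets.
25 | SENet won first place in the ILSVRC 2017 classification task. The official implementation (Caffe) is available at https://github.com/hujie-frank/SENet .
26 |
27 | ### 2. Related Work
28 | **Deep architectures**
29 | Many works improve performance by modifying the CNN architecture so that deep features are easier to learn. VGG and Inception networks showed that increasing depth improves performance.
30 | Batch normalization (BN) inserts units that regulate the inputs to each layer, which stabilizes learning and improves gradient propagation so that deeper networks can be trained.
31 | [ResNet](https://github.com/binLearning/caffe_toolkit/tree/master/ResNet) and
32 | [ResNet-v2](https://github.com/binLearning/caffe_toolkit/tree/master/ResNet-v2) add identity-mapping skip connections,
33 | so that the network learns residual functions; this greatly advanced the development of much deeper architectures. [DenseNet](https://github.com/binLearning/caffe_toolkit/tree/master/DenseNet) and
34 | [DPN](https://github.com/binLearning/caffe_toolkit/tree/master/DPN) improve learning and representation in deep networks by reworking the connections between layers.
35 | Another direction changes the form of the modules inside the network. Grouped convolutions can be used to increase cardinality, as shown in Deep Roots and
36 | [ResNeXt](https://github.com/binLearning/caffe_toolkit/tree/master/ResNeXt), which lets the network learn richer representations.
37 | Multi-branch convolutions, as in the Inception family, can be viewed as a generalization of grouped convolutions that allows more flexible operators within a module.
38 | Cross-channel correlations have been treated as new combinations of features, either independently of spatial structure (as in Xception) or jointly by using 1x1 convolutions (as in NIN);
39 | these works mainly aim to reduce model and computational complexity. They assume that channel relationships are instance-agnostic,
40 | i.e. the output depends on the input channels in the same way for every input, regardless of its class. In contrast, this paper proposes a mechanism that
41 | uses global information to model dynamic, non-linear dependencies between channels, which eases learning and improves representational power.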
42 | **Attention and gating mechanisms**
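43 | Attention mechanisms bias the allocation of computational resources towards the most informative parts of the input signal. They have been widely adopted in deep neural networks in recent years and have substantially improved performance on many tasks.
44 | Attention is usually combined with a gating function (e.g. softmax or sigmoid) or with sequential techniques. Highway networks use a gating mechanism to regulate their shortcut connections, and Residual Attention Network
45 | for Image Classification introduces a trunk-and-mask attention mechanism built on hourglass modules, which have been applied successfully to semantic segmentation.
46 | The SE block is a lightweight gating mechanism designed specifically to model the relationships between channels.
47 |
48 | ### 3. Squeeze-and-Excitation Blocks
49 | The output of a convolutional layer does not explicitly model the dependencies between channels. The goal here is to let the network selectively emphasize informative features, so that subsequent layers can make full use of them,
50 | and to suppress less useful ones.
51 | #### 3.1 Squeeze: Global Information Embedding
52 | The block first looks at the signal in each channel of the output features and squeezes the global spatial information into a channel descriptor, using global average pooling to produce one statistic per channel.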
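The squeeze step computes, for each channel c of an H×W feature map, the statistic z_c = (1/(H·W)) Σ_i Σ_j u_c(i, j). The following is a minimal pure-Python sketch of that formula, assuming the feature map is stored as nested lists `u[c][i][j]` (a layout chosen only for illustration). In the Caffe scripts of this repository the same operation would presumably be a `Pooling` layer with `pool: AVE` and `global_pooling: true`.

```python
# Minimal sketch of the SE "squeeze" step: global average pooling that turns
# a C x H x W feature map into a length-C channel descriptor z.
# Assumption: the feature map is a nested list indexed as u[c][i][j].

def squeeze(u):
    """Return z where z[c] is the mean of channel c over all spatial positions."""
    z = []
    for channel in u:
        total = sum(sum(row) for row in channel)
        count = sum(len(row) for row in channel)
        z.append(total / count)
    return z


if __name__ == "__main__":
    # Two channels, each 2 x 2.
    u = [
        [[1.0, 2.0], [3.0, 4.0]],  # mean 2.5
        [[0.0, 0.0], [0.0, 8.0]],  # mean 2.0
    ]
    print(squeeze(u))  # [2.5, 2.0]
```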
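53 | #### 3.2 Excitation: Adaptive Recalibration
54 | The second step captures the dependencies between channels. The function must meet two criteria: it must be flexible, and it must learn a non-mutually-exclusive relationship, since several channels may need to be emphasized at once.
55 | The paper uses a simple gating mechanism with a sigmoid activation. To limit model complexity and aid generalization, the gate is built as a bottleneck of two fully connected (FC) layers:
56 | the first FC layer reduces the dimension to C/r and is followed by a ReLU, and the second restores it to C. The reduction ratio r is a hyperparameter, set to 16 in the paper (see the experiments in Section 6.3). The sigmoid outputs act as per-channel weights that rescale each channel's features according to the input,
57 | which helps make the features more discriminative.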
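A minimal pure-Python sketch of the excitation gate s = sigmoid(W2 · ReLU(W1 · z)) and the channel-wise rescaling that follows it. The toy sizes (C = 4, r = 2), the hand-picked weights, and the omission of FC biases are assumptions for illustration only; a real SE block learns W1 ((C/r) × C) and W2 (C × (C/r)) and uses r = 16.

```python
import math

# Minimal sketch of the SE "excitation" and rescaling steps:
#   s = sigmoid(W2 @ relu(W1 @ z)),  then  out[c] = s[c] * u[c]
# W1 has shape (C/r) x C, W2 has shape C x (C/r); biases are omitted.
# The weights used below are toy values, not trained parameters.

def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def excitation(z, w1, w2):
    """Bottleneck gate: FC (reduce to C/r) -> ReLU -> FC (restore to C) -> sigmoid."""
    hidden = [max(0.0, v) for v in matvec(w1, z)]
    return [1.0 / (1.0 + math.exp(-v)) for v in matvec(w2, hidden)]


def scale(u, s):
    """Multiply every value in channel c of u (nested lists u[c][i][j]) by s[c]."""
    return [[[s_c * v for v in row] for row in channel] for channel, s_c in zip(u, s)]


if __name__ == "__main__":
    z = [1.0, 2.0, 3.0, 4.0]                  # descriptor from the squeeze step (C = 4)
    w1 = [[0.1] * 4, [0.1] * 4]               # (C/r) x C with r = 2
    w2 = [[0.5, 0.5], [-0.5, -0.5], [1.0, 0.0], [0.0, 0.0]]  # C x (C/r)
    s = excitation(z, w1, w2)
    print([round(v, 3) for v in s])           # [0.731, 0.269, 0.731, 0.5]
```

The second and fourth channels show the range of the gate: a negative pre-activation pushes the weight towards 0 (suppression), while a zero pre-activation gives exactly 0.5. Because each weight comes from its own sigmoid rather than a softmax, several channels can be emphasized at once, which is the non-mutually-exclusive property described above.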
58 | #### 3.3 Exemplars: SE-Inception and SE-ResNet
59 | Figures 2 and 3 show how SE blocks are integrated into Inception and ResNet networks.
60 | ![](./data/figure_2.png)
61 | ![](./data/figure_3.png)
62 |
63 | ### 4. Model and Computational Complexity
64 | Table 1 gives the detailed configurations of the networks with SE blocks added.
65 | ![](./data/table_1.png)
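66 | Each SE block consists of a global average pooling operation, two small FC layers, and a simple channel-wise scaling operation; together they add only 0.26% to the computation of ResNet-50.
67 | The extra parameters come from the two FC layers and increase the parameter count of ResNet-50 by about 10%. Most of them come from the final stage, where the channel dimension is largest.
68 | Experiments showed, however, that removing the SE blocks from the final stage has little effect on performance while reducing the parameter increase to about 4%.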
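The ~10% and ~4% figures can be reproduced with a simple count. Each SE block adds two FC layers (C -> C/r -> C), i.e. 2C²/r weights, so the total is (2/r) Σ_s N_s·C_s² over the stages s, with N_s blocks of C_s output channels each. The sketch below assumes the standard ResNet-50 stage layout (3, 4, 6, 3 blocks with 256, 512, 1024, 2048 output channels) and ignores FC biases.

```python
# Rough count of the extra weights that SE blocks add to ResNet-50, using
# (2 / r) * sum over stages of N_s * C_s**2 (FC biases ignored).
# Stage layout: (number of blocks, block output channels), standard ResNet-50.

RESNET50_STAGES = [(3, 256), (4, 512), (6, 1024), (3, 2048)]


def se_extra_params(stages, r=16):
    """Weights of the two FC layers (C -> C/r -> C), summed over all SE blocks."""
    return sum(n * 2 * c * c // r for n, c in stages)


if __name__ == "__main__":
    full = se_extra_params(RESNET50_STAGES)
    no_last = se_extra_params(RESNET50_STAGES[:-1])  # SE removed from the final stage
    print(full, no_last)  # 2514944 942080
```

This gives about 2.5M extra weights, or about 0.94M with the final stage left out; relative to the roughly 25 million parameters of ResNet-50, that is about 10% and 4%, in line with the figures above.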
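69 |
70 | ### 5. Implementation
71 | Data processing and training settings are largely standard. One notable detail is the data-balancing strategy adopted from Relay Backpropagation for Effective Learning
72 | of Deep Convolutional Neural Networks.
73 |
74 | ### 6. Experiments
75 | #### 6.1 ImageNet Classification
76 | Table 2 lists the configurations of the networks used in the experiments, and Figures 4-6 show their training curves.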
77 | ![](./data/table_2.png)
78 | ![](./data/figure_4.png)
79 | ![](./data/figure_5.png)
80 | ![](./data/figure_6.png)
81 | Table 3 shows the performance of the different networks on the ImageNet validation set.
82 | ![](./data/table_3.png)
83 | #### 6.2 Scene Classification
84 | Table 4 compares the performance of the different networks.
85 | ![](./data/table_4.png)
86 | #### 6.3 Analysis and Discussion
87 | **Reduction ratio**
88 | The reduction ratio r introduced in Section 3.2 is a hyperparameter; Table 5 shows how different values of r affect performance.
89 | ![](./data/table_5.png)
90 | As a trade-off between accuracy and complexity, the paper uses r=16.
91 | **The role of Excitation**
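92 | To study the self-gating excitation mechanism, the authors select four classes (Figure 7) and examine the average activations of the SE blocks at different layers; their distributions are shown in Figure 8.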
93 | ![](./data/figure_7.png)
94 | ![](./data/figure_8.png)
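95 | The activation distributions in Figure 8 show that: 1) in earlier layers the distributions are nearly identical across classes, indicating that features at this stage are class-agnostic; 2) in later layers the distributions become increasingly class-specific,
96 | as each class has different preferences for the features; 3) the distributions in SE_5_2 and SE_5_3 are also nearly identical across classes, indicating that these layers matter less for recalibration, so their SE blocks can be removed
97 | to reduce the parameter count, as described in Section 4.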
98 |
99 | ### 7. Conclusion
100 | SE blocks dynamically recalibrate channel-wise features according to the input, improving the representational power of the network. They may also be useful for network pruning and compression.
101 |
--------------------------------------------------------------------------------
/SENet/data/figure_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/SENet/data/figure_1.png
--------------------------------------------------------------------------------
/SENet/data/figure_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/SENet/data/figure_2.png
--------------------------------------------------------------------------------
/SENet/data/figure_3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/SENet/data/figure_3.png
--------------------------------------------------------------------------------
/SENet/data/figure_4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/SENet/data/figure_4.png
--------------------------------------------------------------------------------
/SENet/data/figure_5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/SENet/data/figure_5.png
--------------------------------------------------------------------------------
/SENet/data/figure_6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/SENet/data/figure_6.png
--------------------------------------------------------------------------------
/SENet/data/figure_7.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/SENet/data/figure_7.png
--------------------------------------------------------------------------------
/SENet/data/figure_8.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/SENet/data/figure_8.png
--------------------------------------------------------------------------------
/SENet/data/table_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/SENet/data/table_1.png
--------------------------------------------------------------------------------
/SENet/data/table_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/SENet/data/table_2.png
--------------------------------------------------------------------------------
/SENet/data/table_3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/SENet/data/table_3.png
--------------------------------------------------------------------------------
/SENet/data/table_4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/SENet/data/table_4.png
--------------------------------------------------------------------------------
/SENet/data/table_5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/SENet/data/table_5.png
--------------------------------------------------------------------------------
/ShuffleNet/README.md:
--------------------------------------------------------------------------------
1 | # ShuffleNet
2 | [ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices](https://arxiv.org/abs/1707.01083)
3 | Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, Jian Sun
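4 |
5 | ### Abstract
6 | This paper introduces ShuffleNet, an extremely computation-efficient CNN architecture designed for mobile devices with very limited computing power (e.g. 10-150 MFLOPs). ShuffleNet uses two new operations,
7 | pointwise group convolution and channel shuffle, to greatly reduce computation while maintaining accuracy.
8 | Experiments on ImageNet classification and MS COCO object detection show that ShuffleNet outperforms other structures such as MobileNet. On an ARM-based mobile device, ShuffleNet runs
9 | about 13x faster than AlexNet in practice while maintaining comparable accuracy.
10 |
11 | ### 1. Introduction
12 | This paper focuses on designing a network with very low computational cost but high accuracy, intended for mobile platforms such as drones, robots and smartphones. Previous work mostly reduces computation by pruning, compressing, or using low-bit representations of
13 | a "basic" network architecture; this paper instead explores a basic architecture that is computationally efficient by design.
14 | State-of-the-art basic architectures such as Xception and [ResNeXt](https://github.com/binLearning/caffe_toolkit/tree/master/ResNeXt) become less efficient in extremely small networks,
15 | because most of their cost lies in dense 1x1 convolutions. This paper proposes pointwise group convolutions in place of dense 1x1 convolutions to reduce computational complexity,
16 | and, to overcome the side effects this introduces, a channel shuffle operation that helps information flow across feature channels. ShuffleNet is built on these two operations.
17 | For the same computational budget, ShuffleNet allows more feature-map channels than other popular structures, so it can encode more information, which is especially important for the performance of very small networks.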
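18 |
19 | ### 2. Related Work
20 | **Efficient Model Designs**
21 | The demand for running high-quality deep neural networks on embedded devices has encouraged research on efficient model design. GoogLeNet increases network depth with much lower complexity than simply stacking convolutional layers.
22 | SqueezeNet substantially reduces parameters and computation while maintaining accuracy.
23 | [ResNet](https://github.com/binLearning/caffe_toolkit/tree/master/ResNet) uses a bottleneck structure to improve efficiency.
24 | Group convolution was introduced in AlexNet to distribute the model over two GPUs, and ResNeXt showed that it can also improve accuracy. The depthwise separable convolution proposed in Xception
25 | (a per-channel depthwise convolution followed by a 1x1 convolution across all channels) generalizes the idea of separable convolutions from the Inception series.
26 | MobileNet builds lightweight models from depthwise separable convolutions and achieves state-of-the-art results. This paper generalizes group convolution and depthwise separable convolution in a novel form.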
27 | **Model Acceleration**
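28 | This line of work aims to speed up inference while preserving the accuracy of a pre-trained model. Some methods prune network connections or channels to remove redundant connections from pre-trained models.
29 | Quantization and factorization can also reduce redundancy. Other methods leave the parameters unchanged and instead speed up convolution with FFT-based or other optimized algorithm implementations.
30 | Distillation transfers knowledge from large models into small ones, making the small models easier to train. In contrast to these approaches, this paper focuses on designing better models to improve performance,
31 | rather than accelerating or transferring existing ones.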
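32 |
33 | ### 3. Approach
34 | #### 3.1 Channel Shuffle for Group Convolutions
35 | Xception and ResNeXt introduce depthwise separable convolutions and group convolutions, respectively, to trade off representational capacity against computational cost.
36 | However, neither design fully accounts for the 1x1 convolutions (also called pointwise convolutions), which also require considerable computation. For example,
37 | in ResNeXt only the 3x3 layers use group convolution, so the pointwise convolutions account for 93.4% of the multiply-adds in each residual unit. In extremely small networks, these expensive pointwise convolutions limit the number of channels,
38 | which in turn hurts accuracy.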
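The 93.4% figure follows from the per-unit costs given in Section 3.2: per output pixel, the two 1x1 layers of a ResNeXt unit cost 2cm multiply-adds and the grouped 3x3 layer costs 9m\*m/g. The sketch below assumes a ResNeXt-50 (32x4d) unit from the conv2 stage (c = 256, m = 128, g = 32); that configuration is an assumption, though it reproduces the quoted number.

```python
# Share of multiply-adds spent in the pointwise (1x1) layers of one ResNeXt
# bottleneck unit, per output pixel: the two 1x1 layers cost 2*c*m and the
# grouped 3x3 layer costs 9*m*m/g.

def pointwise_share(c, m, g):
    """Fraction of a unit's multiply-adds that come from its 1x1 layers."""
    pointwise = 2 * c * m
    grouped_3x3 = 9 * m * m / g
    return pointwise / (pointwise + grouped_3x3)


if __name__ == "__main__":
    # Assumed unit: ResNeXt-50 (32x4d), conv2 stage: c = 256, m = 128, g = 32.
    print(round(pointwise_share(256, 128, 32), 3))  # 0.934
```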
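39 | A straightforward solution is to apply sparse channel connections, for example by using group convolution on the 1x1 layers as well. This has a side effect, however:
40 | each output channel is derived only from the input channels in the same group (Figure 1(a)), which blocks information flow between channel groups and weakens the representation.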
41 | ![](./data/figure_1.png)
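42 | If each group convolution can receive input data from the other groups (Figure 1(b)), the input and output channels become fully related. To achieve this, the output of each group can be divided into subgroups,
43 | and each subgroup fed to a different group in the next layer. This can be implemented with a channel shuffle operation: suppose a convolutional layer has g groups
44 | and each group outputs n channels, for g×n output channels in total. Reshape the channel dimension to (g, n), transpose it to (n, g), and flatten it back to g×n channels; the result is shown in Figure 1(c).
45 | The shuffled output is then fed to the next group convolution. The operation still works when the two group convolution layers have different numbers of groups.
46 | The channel shuffle procedure is illustrated below: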
47 | ![](./data/channel_shuffle.jpg)
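The reshape-transpose-flatten procedure can be sketched in a few lines of pure Python. For readability, channels are represented by their indices (an illustrative simplification; in a network each index stands for a whole h×w feature map), and the function name is hypothetical.

```python
# Channel shuffle: reshape the g*n channels to (g, n), transpose to (n, g),
# and flatten. Channels are represented by their indices; in a real network
# each index would be a whole H x W feature map.

def channel_shuffle(channels, groups):
    """Reorder a flat list of g*n channels so each new group mixes all old groups."""
    if len(channels) % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    n = len(channels) // groups
    grid = [channels[k * n:(k + 1) * n] for k in range(groups)]   # shape (g, n)
    return [grid[k][j] for j in range(n) for k in range(groups)]   # transpose + flatten


if __name__ == "__main__":
    print(channel_shuffle(list(range(6)), 2))  # [0, 3, 1, 4, 2, 5]
    print(channel_shuffle(list(range(6)), 3))  # [0, 2, 4, 1, 3, 5]
```

With g = 2 and n = 3, the first three shuffled channels [0, 3, 1] already mix both original groups, so each group of the next group convolution sees inputs from every group of the previous one. On an actual (C, H, W) NumPy array the same operation can be written as `x.reshape(g, n, H, W).transpose(1, 0, 2, 3).reshape(C, H, W)`.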
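48 | #### 3.2 ShuffleNet Unit
49 | Earlier networks (ResNeXt, Xception) apply group or depthwise convolution only to the 3x3 layers. ShuffleNet also applies group convolution to the 1x1 (pointwise) convolutions,
50 | which the paper calls pointwise group convolution (1x1 convolution + group convolution). Combined with channel shuffle, this turns a ResNet unit into a ShuffleNet unit, as shown in Figure 2.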
51 | ![](./data/figure_2.png)
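52 | Starting from a ResNet bottleneck unit, the 3x3 convolution in the residual branch is first replaced with a depthwise convolution (Figure 2(a)).
53 | The first 1x1 convolution is then replaced with a pointwise group convolution followed by a channel shuffle, which forms the ShuffleNet unit (Figure 2(b)). The second pointwise group convolution restores the channel dimension to match the shortcut;
54 | for simplicity no channel shuffle follows it, since adding one gives comparable results. BN and non-linearities are used as in ResNet/ResNeXt,
55 | except that, following Xception, no ReLU is applied after the depthwise convolution. When the spatial size is halved, a 3x3 average pooling layer with stride 2 is added on the shortcut path,
56 | the depthwise convolution uses stride 2, and the element-wise addition is replaced by channel concatenation, which enlarges the number of output channels at little extra cost (Figure 2(c)).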
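The two unit variants differ mainly in how the shortcut is merged, which determines the output shape. Below is a small shape-bookkeeping sketch in pure Python. The function name and example sizes are hypothetical, and it assumes 3x3 kernels with padding 1 and floor-mode output sizes, so a stride-2 layer maps a spatial size s to ceil(s / 2).

```python
# Shape bookkeeping for the two ShuffleNet unit variants:
# stride 1 adds the residual branch to the identity shortcut, so both must
# have the same channel count; stride 2 concatenates the branch with a 3x3,
# stride-2 average-pooled shortcut, so the channel counts add up.
# Assumption: padding 1 and floor-mode sizes, i.e. ceil(size / 2) at stride 2.

def unit_output_shape(in_c, h, w, branch_c, stride):
    """Return (channels, height, width) produced by one ShuffleNet unit."""
    if stride == 1:
        if branch_c != in_c:
            raise ValueError("element-wise addition needs matching channel counts")
        return (in_c, h, w)
    if stride == 2:
        return (in_c + branch_c, (h + 1) // 2, (w + 1) // 2)
    raise ValueError("stride must be 1 or 2")


if __name__ == "__main__":
    # Hypothetical sizes: a 240-channel, 28x28 input entering a downsampling unit.
    print(unit_output_shape(240, 28, 28, 240, stride=2))  # (480, 14, 14)
    print(unit_output_shape(480, 14, 14, 480, stride=1))  # (480, 14, 14)
```

Because concatenation stacks the pooled input alongside the branch output, a downsampling unit can enlarge the channel count without a 1x1 projection on the shortcut.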
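57 | Under the same settings, ShuffleNet has much lower complexity than ResNet and ResNeXt. For an input of size c\*h\*w and m bottleneck channels (1x1 + 3x3 + 1x1),
58 | a ResNet unit requires hw(2cm+9m\*m) FLOPs and a ResNeXt unit requires hw(2cm+9m\*m/g) FLOPs, whereas a ShuffleNet unit requires only hw(2cm/g+9m) FLOPs, where g is the number of groups.
59 | In other words, for a given computational budget ShuffleNet can use wider feature maps. This is critical for small networks, which usually have too few channels to carry enough information.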
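Plugging numbers into the three formulas makes the gap concrete. The sizes below (c = 240, m = 60, a 28x28 map, g = 3) are illustrative assumptions, chosen so that m = c/4 as described in Section 3.3; each term in the formulas counts multiply-adds.

```python
# Per-unit complexity (multiply-adds) of the bottleneck units compared above,
# for an input of c x h x w, m bottleneck channels and g groups.

def resnet_flops(c, h, w, m):
    return h * w * (2 * c * m + 9 * m * m)


def resnext_flops(c, h, w, m, g):
    return h * w * (2 * c * m + 9 * m * m / g)


def shufflenet_flops(c, h, w, m, g):
    return h * w * (2 * c * m / g + 9 * m)


if __name__ == "__main__":
    c, h, w, m, g = 240, 28, 28, 60, 3   # illustrative sizes (assumption)
    results = [("ResNet", resnet_flops(c, h, w, m)),
               ("ResNeXt", resnext_flops(c, h, w, m, g)),
               ("ShuffleNet", shufflenet_flops(c, h, w, m, g))]
    for name, flops in results:
        print(f"{name:10s} {flops / 1e6:.2f} MFLOPs")
```

With these sizes the units cost about 47.98, 31.05 and 7.95 million multiply-adds, respectively, so the ShuffleNet unit is roughly 6x cheaper than the ResNet unit and can afford correspondingly wider feature maps.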
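60 | Although depthwise convolution has very low theoretical complexity, it is hard to implement efficiently on low-power mobile devices because its computation-to-memory-access ratio is worse than that of dense operations; the Xception paper notes the same problem.
61 | ShuffleNet therefore deliberately uses depthwise convolution only on the bottleneck feature maps (the 3x3 layer) to limit this overhead.
62 | #### 3.3 Network Architecture
63 | Table 1 shows the ShuffleNet architecture.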
64 | ![](./data/table_1.png)
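65 | A few points to note: group convolution is not applied to the first pointwise (1x1) layer of Stage 2, because its number of input channels is relatively small; the first block of each stage halves the spatial size;
66 | and the number of bottleneck channels (the 3x3 layer) in each block is 1/4 of the block's output channels.
67 | In a ShuffleNet unit, the group number g controls the connection sparsity of the pointwise convolutions. Table 1 shows that, at roughly constant complexity, more groups allow more output channels,
68 | which helps the network encode more information, although an individual filter may degrade because it sees only a limited number of input channels.
69 | To adjust complexity, the number of channels is multiplied by a scale factor s, and the resulting network is denoted "ShuffleNet sx". Multiplying the channel counts by s scales the overall complexity by roughly s².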
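Substituting the scaled widths sc and sm into the ShuffleNet unit cost from Section 3.2, and assuming h, w and g stay fixed, shows where the s² comes from: the pointwise term grows quadratically, while the depthwise term grows only linearly.

```latex
\begin{aligned}
\mathrm{FLOPs}(s) &= hw\left(\frac{2(sc)(sm)}{g} + 9(sm)\right) \\
                  &= s^{2}\,hw\,\frac{2cm}{g} + s\,hw\,9m \\
                  &\approx s^{2}\,hw\,\frac{2cm}{g}
\end{aligned}
```

The approximation holds when 2cm/g dominates 9m. For example, with the illustrative values c = 240, m = 60 and g = 3, the per-pixel cost rises from 10140 to 39480 multiply-adds at s = 2, a factor of about 3.9.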
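70 |
71 | ### 4. Experiments
72 | #### 4.1 Ablation Study
73 | ShuffleNet is built on pointwise group convolution and channel shuffle; this section evaluates each of them separately.
74 | ##### 4.1.1 On the Importance of Pointwise Group Convolutions
75 | Table 2 shows the performance for different numbers of groups.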
76 | ![](./data/table_2.png)
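77 | Table 2 shows that group convolution improves accuracy, and that smaller networks benefit more from larger group numbers, while larger ones tend to saturate.
78 | "arch2" removes some blocks from Stage 3 but widens the feature maps of each remaining block, keeping the overall complexity roughly unchanged, and it performs better than the original,
79 | which suggests that wider feature maps matter more for smaller models.
80 | ##### 4.1.2 Channel Shuffle vs. No Shuffle
81 | Channel shuffle lets information flow across groups in stacked group convolutions. Table 3 compares performance with and without channel shuffle.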
82 | ![](./data/table_3.png)
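83 | Table 3 shows that when the group number g is large, channel shuffle improves performance substantially, which demonstrates the importance of cross-group information exchange.
84 | #### 4.2 Comparison with Other Structure Units
85 | Table 4 compares ShuffleNet with recent state-of-the-art architectures.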
86 | ![](./data/table_4.png)
87 | #### 4.3 Comparison with MobileNets and Other Frameworks
88 | Table 5 compares ShuffleNet and MobileNet in terms of complexity and accuracy.
89 | ![](./data/table_5.png)
90 | Table 6 compares the complexity of ShuffleNet with several popular networks.
91 | ![](./data/table_6.png)
92 | #### 4.4 Generalization Ability
93 | To evaluate its generalization ability, ShuffleNet is tested on the MS COCO detection dataset within the Faster R-CNN framework; results are shown in Table 7.
94 | ![](./data/table_7.png)
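95 | #### 4.5 Actual Speedup Evaluation
96 | Because of memory access and other overheads, a theoretical 4x reduction in complexity yields only about a 2.6x actual speedup. Table 8 gives the detailed comparison.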
97 | ![](./data/table_8.png)
98 | ShuffleNet 0.5x has a theoretical speedup of 18x over AlexNet and achieves about 13x in practice, which is much faster than other models or acceleration methods with accuracy comparable to AlexNet.
99 |
--------------------------------------------------------------------------------
/ShuffleNet/data/channel_shuffle.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ShuffleNet/data/channel_shuffle.jpg
--------------------------------------------------------------------------------
/ShuffleNet/data/figure_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ShuffleNet/data/figure_1.png
--------------------------------------------------------------------------------
/ShuffleNet/data/figure_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ShuffleNet/data/figure_2.png
--------------------------------------------------------------------------------
/ShuffleNet/data/table_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ShuffleNet/data/table_1.png
--------------------------------------------------------------------------------
/ShuffleNet/data/table_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ShuffleNet/data/table_2.png
--------------------------------------------------------------------------------
/ShuffleNet/data/table_3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ShuffleNet/data/table_3.png
--------------------------------------------------------------------------------
/ShuffleNet/data/table_4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ShuffleNet/data/table_4.png
--------------------------------------------------------------------------------
/ShuffleNet/data/table_5.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ShuffleNet/data/table_5.png
--------------------------------------------------------------------------------
/ShuffleNet/data/table_6.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ShuffleNet/data/table_6.png
--------------------------------------------------------------------------------
/ShuffleNet/data/table_7.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ShuffleNet/data/table_7.png
--------------------------------------------------------------------------------
/ShuffleNet/data/table_8.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/ShuffleNet/data/table_8.png
--------------------------------------------------------------------------------
/VGGNet/vgg_16_deploy.prototxt:
--------------------------------------------------------------------------------
1 | name: "VGG-16_deploy"
2 | layer {
3 |   name: "data"
4 |   type: "Input"
5 |   top: "data"
6 |   input_param {
7 |     shape {
8 |       dim: 10
9 |       dim: 3
10 |       dim: 224
11 |       dim: 224
12 |     }
13 |   }
14 | }
15 | layer {
16 |   name: "conv1_1"
17 |   type: "Convolution"
18 |   bottom: "data"
19 |   top: "conv1_1"
20 |   convolution_param {
21 |     num_output: 64
22 |     pad: 1
23 |     kernel_size: 3
24 |     stride: 1
25 |   }
26 | }
27 | layer {
28 |   name: "relu1_1"
29 |   type: "ReLU"
30 |   bottom: "conv1_1"
31 |   top: "conv1_1"
32 | }
33 | layer {
34 |   name: "conv1_2"
35 |   type: "Convolution"
36 |   bottom: "conv1_1"
37 |   top: "conv1_2"
38 |   convolution_param {
39 |     num_output: 64
40 |     pad: 1
41 |     kernel_size: 3
42 |     stride: 1
43 |   }
44 | }
45 | layer {
46 |   name: "relu1_2"
47 |   type: "ReLU"
48 |   bottom: "conv1_2"
49 |   top: "conv1_2"
50 | }
51 | layer {
52 |   name: "pool1"
53 |   type: "Pooling"
54 |   bottom: "conv1_2"
55 |   top: "pool1"
56 |   pooling_param {
57 |     pool: MAX
58 |     kernel_size: 2
59 |     stride: 2
60 |   }
61 | }
62 | layer {
63 |   name: "conv2_1"
64 |   type: "Convolution"
65 |   bottom: "pool1"
66 |   top: "conv2_1"
67 |   convolution_param {
68 |     num_output: 128
69 |     pad: 1
70 |     kernel_size: 3
71 |     stride: 1
72 |   }
73 | }
74 | layer {
75 |   name: "relu2_1"
76 |   type: "ReLU"
77 |   bottom: "conv2_1"
78 |   top: "conv2_1"
79 | }
80 | layer {
81 |   name: "conv2_2"
82 |   type: "Convolution"
83 |   bottom: "conv2_1"
84 |   top: "conv2_2"
85 |   convolution_param {
86 |     num_output: 128
87 |     pad: 1
88 |     kernel_size: 3
89 |     stride: 1
90 |   }
91 | }
92 | layer {
93 |   name: "relu2_2"
94 |   type: "ReLU"
95 |   bottom: "conv2_2"
96 |   top: "conv2_2"
97 | }
98 | layer {
99 |   name: "pool2"
100 |   type: "Pooling"
101 |   bottom: "conv2_2"
102 |   top: "pool2"
103 |   pooling_param {
104 |     pool: MAX
105 |     kernel_size: 2
106 |     stride: 2
107 |   }
108 | }
109 | layer {
110 |   name: "conv3_1"
111 |   type: "Convolution"
112 |   bottom: "pool2"
113 |   top: "conv3_1"
114 |   convolution_param {
115 |     num_output: 256
116 |     pad: 1
117 |     kernel_size: 3
118 |     stride: 1
119 |   }
120 | }
121 | layer {
122 |   name: "relu3_1"
123 |   type: "ReLU"
124 |   bottom: "conv3_1"
125 |   top: "conv3_1"
126 | }
127 | layer {
128 |   name: "conv3_2"
129 |   type: "Convolution"
130 |   bottom: "conv3_1"
131 |   top: "conv3_2"
132 |   convolution_param {
133 |     num_output: 256
134 |     pad: 1
135 |     kernel_size: 3
136 |     stride: 1
137 |   }
138 | }
139 | layer {
140 |   name: "relu3_2"
141 |   type: "ReLU"
142 |   bottom: "conv3_2"
143 |   top: "conv3_2"
144 | }
145 | layer {
146 |   name: "conv3_3"
147 |   type: "Convolution"
148 |   bottom: "conv3_2"
149 |   top: "conv3_3"
150 |   convolution_param {
151 |     num_output: 256
152 |     pad: 1
153 |     kernel_size: 3
154 |     stride: 1
155 |   }
156 | }
157 | layer {
158 |   name: "relu3_3"
159 |   type: "ReLU"
160 |   bottom: "conv3_3"
161 |   top: "conv3_3"
162 | }
163 | layer {
164 |   name: "pool3"
165 |   type: "Pooling"
166 |   bottom: "conv3_3"
167 |   top: "pool3"
168 |   pooling_param {
169 |     pool: MAX
170 |     kernel_size: 2
171 |     stride: 2
172 |   }
173 | }
174 | layer {
175 |   name: "conv4_1"
176 |   type: "Convolution"
177 |   bottom: "pool3"
178 |   top: "conv4_1"
179 |   convolution_param {
180 |     num_output: 512
181 |     pad: 1
182 |     kernel_size: 3
183 |     stride: 1
184 |   }
185 | }
186 | layer {
187 |   name: "relu4_1"
188 |   type: "ReLU"
189 |   bottom: "conv4_1"
190 |   top: "conv4_1"
191 | }
192 | layer {
193 |   name: "conv4_2"
194 |   type: "Convolution"
195 |   bottom: "conv4_1"
196 |   top: "conv4_2"
197 |   convolution_param {
198 |     num_output: 512
199 |     pad: 1
200 |     kernel_size: 3
201 |     stride: 1
202 |   }
203 | }
204 | layer {
205 |   name: "relu4_2"
206 |   type: "ReLU"
207 |   bottom: "conv4_2"
208 |   top: "conv4_2"
209 | }
210 | layer {
211 |   name: "conv4_3"
212 |   type: "Convolution"
213 |   bottom: "conv4_2"
214 |   top: "conv4_3"
215 |   convolution_param {
216 |     num_output: 512
217 |     pad: 1
218 |     kernel_size: 3
219 |     stride: 1
220 |   }
221 | }
222 | layer {
223 |   name: "relu4_3"
224 |   type: "ReLU"
225 |   bottom: "conv4_3"
226 |   top: "conv4_3"
227 | }
228 | layer {
229 |   name: "pool4"
230 |   type: "Pooling"
231 |   bottom: "conv4_3"
232 |   top: "pool4"
233 |   pooling_param {
234 |     pool: MAX
235 |     kernel_size: 2
236 |     stride: 2
237 |   }
238 | }
239 | layer {
240 |   name: "conv5_1"
241 |   type: "Convolution"
242 |   bottom: "pool4"
243 |   top: "conv5_1"
244 |   convolution_param {
245 |     num_output: 512
246 |     pad: 1
247 |     kernel_size: 3
248 |     stride: 1
249 |   }
250 | }
251 | layer {
252 |   name: "relu5_1"
253 |   type: "ReLU"
254 |   bottom: "conv5_1"
255 |   top: "conv5_1"
256 | }
257 | layer {
258 |   name: "conv5_2"
259 |   type: "Convolution"
260 |   bottom: "conv5_1"
261 |   top: "conv5_2"
262 |   convolution_param {
263 |     num_output: 512
264 |     pad: 1
265 |     kernel_size: 3
266 |     stride: 1
267 |   }
268 | }
269 | layer {
270 |   name: "relu5_2"
271 |   type: "ReLU"
272 |   bottom: "conv5_2"
273 |   top: "conv5_2"
274 | }
275 | layer {
276 |   name: "conv5_3"
277 |   type: "Convolution"
278 |   bottom: "conv5_2"
279 |   top: "conv5_3"
280 |   convolution_param {
281 |     num_output: 512
282 |     pad: 1
283 |     kernel_size: 3
284 |     stride: 1
285 |   }
286 | }
287 | layer {
288 |   name: "relu5_3"
289 |   type: "ReLU"
290 |   bottom: "conv5_3"
291 |   top: "conv5_3"
292 | }
293 | layer {
294 |   name: "pool5"
295 |   type: "Pooling"
296 |   bottom: "conv5_3"
297 |   top: "pool5"
298 |   pooling_param {
299 |     pool: MAX
300 |     kernel_size: 2
301 |     stride: 2
302 |   }
303 | }
304 | layer {
305 |   name: "fc6"
306 |   type: "InnerProduct"
307 |   bottom: "pool5"
308 |   top: "fc6"
309 |   inner_product_param {
310 |     num_output: 4096
311 |   }
312 | }
313 | layer {
314 |   name: "relu6"
315 |   type: "ReLU"
316 |   bottom: "fc6"
317 |   top: "fc6"
318 | }
319 | layer {
320 |   name: "drop6"
321 |   type: "Dropout"
322 |   bottom: "fc6"
323 |   top: "fc6"
324 |   dropout_param {
325 |     dropout_ratio: 0.5
326 |   }
327 | }
328 | layer {
329 |   name: "fc7"
330 |   type: "InnerProduct"
331 |   bottom: "fc6"
332 |   top: "fc7"
333 |   inner_product_param {
334 |     num_output: 4096
335 |   }
336 | }
337 | layer {
338 |   name: "relu7"
339 |   type: "ReLU"
340 |   bottom: "fc7"
341 |   top: "fc7"
342 | }
343 | layer {
344 |   name: "drop7"
345 |   type: "Dropout"
346
| bottom: "fc7" 347 | top: "fc7" 348 | dropout_param { 349 | dropout_ratio: 0.5 350 | } 351 | } 352 | layer { 353 | name: "fc8" 354 | type: "InnerProduct" 355 | bottom: "fc7" 356 | top: "fc8" 357 | inner_product_param { 358 | num_output: 1000 359 | } 360 | } 361 | layer { 362 | name: "prob" 363 | type: "Softmax" 364 | bottom: "fc8" 365 | top: "prob" 366 | } 367 | -------------------------------------------------------------------------------- /VGGNet/vgg_16_deploy.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | from six.moves import xrange 4 | 5 | import caffe 6 | from caffe import layers as L, params as P 7 | from caffe import to_proto 8 | 9 | 10 | # [Convolution - ReLU] * #stack - Pooling 11 | def _block_crp(major, stack_num, net, bottom, nout, pad=1, ks=3, stride=1): 12 | for minor in xrange(stack_num): 13 | conv_layer = 'conv{}_{}'.format(major, minor+1) 14 | relu_layer = 'relu{}_{}'.format(major, minor+1) 15 | 16 | if minor == 0: 17 | bottom_layer = bottom 18 | else: 19 | pre_layer = 'relu{}_{}'.format(major, minor) 20 | bottom_layer = net[pre_layer] 21 | 22 | net[conv_layer] = L.Convolution(bottom_layer, 23 | num_output = nout, pad = pad, 24 | kernel_size = ks, stride = stride) 25 | net[relu_layer] = L.ReLU(net[conv_layer], in_place = True) 26 | 27 | pool_layer = 'pool{}'.format(major) 28 | net[pool_layer] = L.Pooling(net[relu_layer], pool = P.Pooling.MAX, 29 | kernel_size = 2, stride = 2) 30 | 31 | return net[pool_layer] 32 | 33 | 34 | # FullyConnection - ReLU - Dropout 35 | def _block_frd(major, net, bottom, nout, dropratio=0.5): 36 | fc_layer = 'fc{}'.format(major) 37 | relu_layer = 'relu{}'.format(major) 38 | drop_layer = 'drop{}'.format(major) 39 | 40 | net[fc_layer] = L.InnerProduct(bottom, num_output = nout) 41 | net[relu_layer] = L.ReLU(net[fc_layer], in_place = True) 42 | net[drop_layer] = L.Dropout(net[relu_layer], dropout_ratio = dropratio, 43 | in_place = True) 44 | 45 | 
return net[drop_layer] 46 | 47 | 48 | def construc_net(): 49 | net = caffe.NetSpec() 50 | 51 | net.data = L.Input(shape = dict(dim = [10,3,224,224])) 52 | 53 | block_1 = _block_crp('1', 2, net, net.data, 64) 54 | block_2 = _block_crp('2', 2, net, block_1, 128) 55 | block_3 = _block_crp('3', 3, net, block_2, 256) 56 | block_4 = _block_crp('4', 3, net, block_3, 512) 57 | block_5 = _block_crp('5', 3, net, block_4, 512) 58 | 59 | block_6 = _block_frd('6', net, block_5, 4096) 60 | block_7 = _block_frd('7', net, block_6, 4096) 61 | 62 | net.fc8 = L.InnerProduct(block_7, num_output = 1000) 63 | net.prob = L.Softmax(net.fc8) 64 | 65 | return net.to_proto() 66 | 67 | 68 | def main(): 69 | with open('vgg_16_deploy.prototxt', 'w') as f: 70 | f.write('name: "VGG-16_deploy"\n') 71 | f.write(str(construc_net())) 72 | 73 | 74 | if __name__ == '__main__': 75 | main() 76 | -------------------------------------------------------------------------------- /VGGNet/vgg_19_deploy.prototxt: -------------------------------------------------------------------------------- 1 | name: "VGG-19_deploy" 2 | layer { 3 | name: "data" 4 | type: "Input" 5 | top: "data" 6 | input_param { 7 | shape { 8 | dim: 10 9 | dim: 3 10 | dim: 224 11 | dim: 224 12 | } 13 | } 14 | } 15 | layer { 16 | name: "conv1_1" 17 | type: "Convolution" 18 | bottom: "data" 19 | top: "conv1_1" 20 | convolution_param { 21 | num_output: 64 22 | pad: 1 23 | kernel_size: 3 24 | stride: 1 25 | } 26 | } 27 | layer { 28 | name: "relu1_1" 29 | type: "ReLU" 30 | bottom: "conv1_1" 31 | top: "conv1_1" 32 | } 33 | layer { 34 | name: "conv1_2" 35 | type: "Convolution" 36 | bottom: "conv1_1" 37 | top: "conv1_2" 38 | convolution_param { 39 | num_output: 64 40 | pad: 1 41 | kernel_size: 3 42 | stride: 1 43 | } 44 | } 45 | layer { 46 | name: "relu1_2" 47 | type: "ReLU" 48 | bottom: "conv1_2" 49 | top: "conv1_2" 50 | } 51 | layer { 52 | name: "pool1" 53 | type: "Pooling" 54 | bottom: "conv1_2" 55 | top: "pool1" 56 | pooling_param { 57 | pool: 
MAX 58 | kernel_size: 2 59 | stride: 2 60 | } 61 | } 62 | layer { 63 | name: "conv2_1" 64 | type: "Convolution" 65 | bottom: "pool1" 66 | top: "conv2_1" 67 | convolution_param { 68 | num_output: 128 69 | pad: 1 70 | kernel_size: 3 71 | stride: 1 72 | } 73 | } 74 | layer { 75 | name: "relu2_1" 76 | type: "ReLU" 77 | bottom: "conv2_1" 78 | top: "conv2_1" 79 | } 80 | layer { 81 | name: "conv2_2" 82 | type: "Convolution" 83 | bottom: "conv2_1" 84 | top: "conv2_2" 85 | convolution_param { 86 | num_output: 128 87 | pad: 1 88 | kernel_size: 3 89 | stride: 1 90 | } 91 | } 92 | layer { 93 | name: "relu2_2" 94 | type: "ReLU" 95 | bottom: "conv2_2" 96 | top: "conv2_2" 97 | } 98 | layer { 99 | name: "pool2" 100 | type: "Pooling" 101 | bottom: "conv2_2" 102 | top: "pool2" 103 | pooling_param { 104 | pool: MAX 105 | kernel_size: 2 106 | stride: 2 107 | } 108 | } 109 | layer { 110 | name: "conv3_1" 111 | type: "Convolution" 112 | bottom: "pool2" 113 | top: "conv3_1" 114 | convolution_param { 115 | num_output: 256 116 | pad: 1 117 | kernel_size: 3 118 | stride: 1 119 | } 120 | } 121 | layer { 122 | name: "relu3_1" 123 | type: "ReLU" 124 | bottom: "conv3_1" 125 | top: "conv3_1" 126 | } 127 | layer { 128 | name: "conv3_2" 129 | type: "Convolution" 130 | bottom: "conv3_1" 131 | top: "conv3_2" 132 | convolution_param { 133 | num_output: 256 134 | pad: 1 135 | kernel_size: 3 136 | stride: 1 137 | } 138 | } 139 | layer { 140 | name: "relu3_2" 141 | type: "ReLU" 142 | bottom: "conv3_2" 143 | top: "conv3_2" 144 | } 145 | layer { 146 | name: "conv3_3" 147 | type: "Convolution" 148 | bottom: "conv3_2" 149 | top: "conv3_3" 150 | convolution_param { 151 | num_output: 256 152 | pad: 1 153 | kernel_size: 3 154 | stride: 1 155 | } 156 | } 157 | layer { 158 | name: "relu3_3" 159 | type: "ReLU" 160 | bottom: "conv3_3" 161 | top: "conv3_3" 162 | } 163 | layer { 164 | name: "conv3_4" 165 | type: "Convolution" 166 | bottom: "conv3_3" 167 | top: "conv3_4" 168 | convolution_param { 169 | num_output: 
256 170 | pad: 1 171 | kernel_size: 3 172 | stride: 1 173 | } 174 | } 175 | layer { 176 | name: "relu3_4" 177 | type: "ReLU" 178 | bottom: "conv3_4" 179 | top: "conv3_4" 180 | } 181 | layer { 182 | name: "pool3" 183 | type: "Pooling" 184 | bottom: "conv3_4" 185 | top: "pool3" 186 | pooling_param { 187 | pool: MAX 188 | kernel_size: 2 189 | stride: 2 190 | } 191 | } 192 | layer { 193 | name: "conv4_1" 194 | type: "Convolution" 195 | bottom: "pool3" 196 | top: "conv4_1" 197 | convolution_param { 198 | num_output: 512 199 | pad: 1 200 | kernel_size: 3 201 | stride: 1 202 | } 203 | } 204 | layer { 205 | name: "relu4_1" 206 | type: "ReLU" 207 | bottom: "conv4_1" 208 | top: "conv4_1" 209 | } 210 | layer { 211 | name: "conv4_2" 212 | type: "Convolution" 213 | bottom: "conv4_1" 214 | top: "conv4_2" 215 | convolution_param { 216 | num_output: 512 217 | pad: 1 218 | kernel_size: 3 219 | stride: 1 220 | } 221 | } 222 | layer { 223 | name: "relu4_2" 224 | type: "ReLU" 225 | bottom: "conv4_2" 226 | top: "conv4_2" 227 | } 228 | layer { 229 | name: "conv4_3" 230 | type: "Convolution" 231 | bottom: "conv4_2" 232 | top: "conv4_3" 233 | convolution_param { 234 | num_output: 512 235 | pad: 1 236 | kernel_size: 3 237 | stride: 1 238 | } 239 | } 240 | layer { 241 | name: "relu4_3" 242 | type: "ReLU" 243 | bottom: "conv4_3" 244 | top: "conv4_3" 245 | } 246 | layer { 247 | name: "conv4_4" 248 | type: "Convolution" 249 | bottom: "conv4_3" 250 | top: "conv4_4" 251 | convolution_param { 252 | num_output: 512 253 | pad: 1 254 | kernel_size: 3 255 | stride: 1 256 | } 257 | } 258 | layer { 259 | name: "relu4_4" 260 | type: "ReLU" 261 | bottom: "conv4_4" 262 | top: "conv4_4" 263 | } 264 | layer { 265 | name: "pool4" 266 | type: "Pooling" 267 | bottom: "conv4_4" 268 | top: "pool4" 269 | pooling_param { 270 | pool: MAX 271 | kernel_size: 2 272 | stride: 2 273 | } 274 | } 275 | layer { 276 | name: "conv5_1" 277 | type: "Convolution" 278 | bottom: "pool4" 279 | top: "conv5_1" 280 | 
convolution_param { 281 | num_output: 512 282 | pad: 1 283 | kernel_size: 3 284 | stride: 1 285 | } 286 | } 287 | layer { 288 | name: "relu5_1" 289 | type: "ReLU" 290 | bottom: "conv5_1" 291 | top: "conv5_1" 292 | } 293 | layer { 294 | name: "conv5_2" 295 | type: "Convolution" 296 | bottom: "conv5_1" 297 | top: "conv5_2" 298 | convolution_param { 299 | num_output: 512 300 | pad: 1 301 | kernel_size: 3 302 | stride: 1 303 | } 304 | } 305 | layer { 306 | name: "relu5_2" 307 | type: "ReLU" 308 | bottom: "conv5_2" 309 | top: "conv5_2" 310 | } 311 | layer { 312 | name: "conv5_3" 313 | type: "Convolution" 314 | bottom: "conv5_2" 315 | top: "conv5_3" 316 | convolution_param { 317 | num_output: 512 318 | pad: 1 319 | kernel_size: 3 320 | stride: 1 321 | } 322 | } 323 | layer { 324 | name: "relu5_3" 325 | type: "ReLU" 326 | bottom: "conv5_3" 327 | top: "conv5_3" 328 | } 329 | layer { 330 | name: "conv5_4" 331 | type: "Convolution" 332 | bottom: "conv5_3" 333 | top: "conv5_4" 334 | convolution_param { 335 | num_output: 512 336 | pad: 1 337 | kernel_size: 3 338 | stride: 1 339 | } 340 | } 341 | layer { 342 | name: "relu5_4" 343 | type: "ReLU" 344 | bottom: "conv5_4" 345 | top: "conv5_4" 346 | } 347 | layer { 348 | name: "pool5" 349 | type: "Pooling" 350 | bottom: "conv5_4" 351 | top: "pool5" 352 | pooling_param { 353 | pool: MAX 354 | kernel_size: 2 355 | stride: 2 356 | } 357 | } 358 | layer { 359 | name: "fc6" 360 | type: "InnerProduct" 361 | bottom: "pool5" 362 | top: "fc6" 363 | inner_product_param { 364 | num_output: 4096 365 | } 366 | } 367 | layer { 368 | name: "relu6" 369 | type: "ReLU" 370 | bottom: "fc6" 371 | top: "fc6" 372 | } 373 | layer { 374 | name: "drop6" 375 | type: "Dropout" 376 | bottom: "fc6" 377 | top: "fc6" 378 | dropout_param { 379 | dropout_ratio: 0.5 380 | } 381 | } 382 | layer { 383 | name: "fc7" 384 | type: "InnerProduct" 385 | bottom: "fc6" 386 | top: "fc7" 387 | inner_product_param { 388 | num_output: 4096 389 | } 390 | } 391 | layer { 392 | 
name: "relu7" 393 | type: "ReLU" 394 | bottom: "fc7" 395 | top: "fc7" 396 | } 397 | layer { 398 | name: "drop7" 399 | type: "Dropout" 400 | bottom: "fc7" 401 | top: "fc7" 402 | dropout_param { 403 | dropout_ratio: 0.5 404 | } 405 | } 406 | layer { 407 | name: "fc8" 408 | type: "InnerProduct" 409 | bottom: "fc7" 410 | top: "fc8" 411 | inner_product_param { 412 | num_output: 1000 413 | } 414 | } 415 | layer { 416 | name: "prob" 417 | type: "Softmax" 418 | bottom: "fc8" 419 | top: "prob" 420 | } 421 | -------------------------------------------------------------------------------- /VGGNet/vgg_19_deploy.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | from six.moves import xrange 4 | 5 | import caffe 6 | from caffe import layers as L, params as P 7 | from caffe import to_proto 8 | 9 | # [Convolution - ReLU] * #stack - Pooling 10 | def _block_crp(major, stack_num, net, bottom, nout, pad=1, ks=3, stride=1): 11 | for minor in xrange(stack_num): 12 | conv_layer = 'conv{}_{}'.format(major, minor+1) 13 | relu_layer = 'relu{}_{}'.format(major, minor+1) 14 | 15 | if minor == 0: 16 | bottom_layer = bottom 17 | else: 18 | pre_layer = 'relu{}_{}'.format(major, minor) 19 | bottom_layer = net[pre_layer] 20 | 21 | net[conv_layer] = L.Convolution(bottom_layer, 22 | num_output = nout, pad = pad, 23 | kernel_size = ks, stride = stride) 24 | net[relu_layer] = L.ReLU(net[conv_layer], in_place = True) 25 | 26 | pool_layer = 'pool{}'.format(major) 27 | net[pool_layer] = L.Pooling(net[relu_layer], pool = P.Pooling.MAX, 28 | kernel_size = 2, stride = 2) 29 | 30 | return net[pool_layer] 31 | 32 | 33 | # FullyConnection - ReLU - Dropout 34 | def _block_frd(major, net, bottom, nout, dropratio=0.5): 35 | fc_layer = 'fc{}'.format(major) 36 | relu_layer = 'relu{}'.format(major) 37 | drop_layer = 'drop{}'.format(major) 38 | 39 | net[fc_layer] = L.InnerProduct(bottom, num_output = nout) 40 | net[relu_layer] = 
L.ReLU(net[fc_layer], in_place = True) 41 | net[drop_layer] = L.Dropout(net[relu_layer], dropout_ratio = dropratio, 42 | in_place = True) 43 | 44 | return net[drop_layer] 45 | 46 | 47 | def construc_net(): 48 | net = caffe.NetSpec() 49 | 50 | net.data = L.Input(shape = dict(dim = [10,3,224,224])) 51 | 52 | block_1 = _block_crp('1', 2, net, net.data, 64) 53 | block_2 = _block_crp('2', 2, net, block_1, 128) 54 | block_3 = _block_crp('3', 4, net, block_2, 256) 55 | block_4 = _block_crp('4', 4, net, block_3, 512) 56 | block_5 = _block_crp('5', 4, net, block_4, 512) 57 | 58 | block_6 = _block_frd('6', net, block_5, 4096) 59 | block_7 = _block_frd('7', net, block_6, 4096) 60 | 61 | net.fc8 = L.InnerProduct(block_7, num_output = 1000) 62 | net.prob = L.Softmax(net.fc8) 63 | 64 | return net.to_proto() 65 | 66 | 67 | def main(): 68 | with open('vgg_19_deploy.prototxt', 'w') as f: 69 | f.write('name: "VGG-19_deploy"\n') 70 | f.write(str(construc_net())) 71 | 72 | if __name__ == '__main__': 73 | main() 74 | -------------------------------------------------------------------------------- /WRN/README.md: -------------------------------------------------------------------------------- 1 | # WRN 2 | [Wide Residual Networks](https://arxiv.org/abs/1605.07146)
3 | Sergey Zagoruyko, Nikos Komodakis
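4 | 5 | ### Abstract 6 | Deep residual networks can be scaled up to thousands of layers and still improve in performance. However, each fraction of a percent of improved accuracy costs nearly doubling the number of layers, and very deep residual networks reuse their features less and less 7 | (diminishing feature reuse), which makes them very slow to train. To tackle this problem, this paper proposes a new architecture, wide residual networks (WRNs), 8 | which decreases the depth and increases the width of residual networks. Experiments show that WRNs outperform their thin, very deep counterparts: even a simple 16-layer WRN outperforms all previous deep residual networks in both accuracy and efficiency, 9 | including thousand-layer-deep ones. WRNs achieve state-of-the-art results on CIFAR, SVHN and COCO, and significant improvements on ImageNet. The official implementation (Torch) is available at 10 | https://github.com/szagoruyko/wide-residual-networks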
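11 | 12 | ### 1. Introduction 13 | Recent work has repeatedly shown the advantages of deep networks, but training them is hard because of problems such as vanishing/exploding gradients and degradation. Various techniques help train deeper networks, 14 | such as carefully designed initialization strategies (e.g. MSRA initialization), better optimizers, skip connections, knowledge transfer 15 | and layer-wise training.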
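16 | The recent ResNet achieved state-of-the-art results on multiple tasks. Compared with the Inception architecture, ResNet generalizes better, meaning its features can be used more effectively for transfer learning. 17 | Inception-ResNet showed that residual connections speed up the convergence of deep networks. ResNet-v2 studied the ordering of activations within residual blocks, showed the importance of identity mappings in residual networks, 18 | and used the new architecture to train extremely deep networks. Highway networks can also train deep networks; their main difference from ResNet is that their residual connections are gated, with learned gate parameters.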
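19 | Previous studies focused mainly on the ordering of activations within residual blocks or on the depth of residual networks. This paper tries to improve residual networks from a different angle.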
20 | **Width vs depth in residual networks**
21 | To keep the parameter count small while increasing depth, ResNet was designed to be very "thin", and bottleneck blocks were even introduced to make the blocks thinner still.
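22 | Residual blocks with identity mappings make it possible to train very deep networks, but this is also a weakness of residual networks. During back-propagation, nothing forces the gradient to flow through the weighted branch (the residual function) of a block, 23 | so a residual function may learn nothing during training. As a result, only a few blocks may learn useful representations, while most blocks contribute very little to the final prediction. 24 | This problem was formulated as diminishing feature reuse in the Highway network work. ResNet with stochastic depth addresses it by randomly dropping residual blocks during training; 25 | this method can be viewed as a special case of dropout, and its effectiveness supports the hypothesis above.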
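26 | This work builds on [ResNet-v2](https://github.com/binLearning/caffe_toolkit/tree/master/ResNet-v2) and focuses on the width of residual blocks. 27 | Our experiments show that **appropriately widening the blocks of a ResNet improves performance more effectively than increasing its depth, which suggests that the power of residual networks comes mainly from the residual blocks, 28 | with depth playing only a supplementary role.**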
29 | **Use of dropout in ResNet blocks**
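30 | Dropout was mostly applied to the top layers of a network, which contain many parameters (usually fully connected layers), to prevent feature co-adaptation and overfitting. 31 | It was then largely replaced by batch normalization (BN), which also acts as a regularizer and was shown experimentally to give higher accuracy than dropout. 32 | In this paper the widened residual blocks contain many parameters, so we use dropout to prevent overfitting. ResNet-v2 inserted dropout into the shortcut (identity) path and observed worse performance; 33 | we argue that dropout should instead be inserted into the residual-function branch, and experiments show that this improves performance.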
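34 | 35 | ### 2. Wide residual networks 36 | ResNet-v2 uses two types of residual blocks:
37 | **basic**: two consecutive 3x3 convolutions in pre-activation form, as shown in Figure 1(a)
38 | **bottleneck**: one 3x3 convolution surrounded by two 1x1 convolutions (one before and one after), as shown in Figure 1(b)
39 | The bottleneck block was designed to reduce the computational cost of a block as the number of layers grows, that is, to make blocks thinner. Since we want to study the effect of widening blocks, we do not consider the bottleneck block 40 | and use only the basic form in our architectures.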
41 | ![](./data/figure_1.png)
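42 | There are three ways to increase the representational power of residual blocks:
43 | • add more convolutional layers to each block
44 | • widen the convolutional layers by adding more feature maps
45 | • increase the filter size of the convolutional layers
46 | Since VGG and Inception-v4 showed that small filters are more effective, we do not consider filters larger than 3x3. We introduce two factors: the deepening factor l, the number of convolutional layers in a block, 47 | and the widening factor k, which multiplies the number of output feature maps of the convolutional layers. The basic block therefore corresponds to l=2, k=1. Figures 1(a) and 1(c) show the basic block and the widened basic block, respectively.
48 | Table 1 gives the detailed configuration of the proposed residual networks; the widening factor k controls the width of the residual blocks.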
49 | ![](./data/table_1.png)
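As a rough illustration of how n and k fix the layout in Table 1, the sketch below derives the per-group configuration of a CIFAR-style WRN-n-k built from B(3,3) blocks. It assumes a 16-channel stem convolution, three groups with base widths 16, 32 and 64 (the last two downsampling with stride 2), and the relation n = 6N + 4 between depth and blocks per group N; that relation is an assumption here, consistent with WRN-28-10 having four `block_2_*` units of width 160 followed by a stride-2 `block_3_1` of width 320 in `wrn_28_10_deploy.prototxt`. `wrn_config` is a hypothetical helper.

```python
def wrn_config(depth, k):
    """Per-group layout of a CIFAR-style WRN-depth-k built from B(3,3) blocks.

    Assumes depth = 6 * N + 4, where N is the number of blocks per group.
    """
    if (depth - 4) % 6 != 0:
        raise ValueError("depth must have the form 6N + 4")
    n_blocks = (depth - 4) // 6
    # conv1 is a single 3x3 stem convolution, not a residual group.
    groups = [{"name": "conv1", "blocks": 1, "width": 16, "stride": 1}]
    # conv2 keeps the resolution; the first block of conv3 and conv4 downsamples.
    for idx, (base, stride) in enumerate([(16, 1), (32, 2), (64, 2)], start=2):
        groups.append({"name": "conv{}".format(idx), "blocks": n_blocks,
                       "width": base * k, "stride": stride})
    return groups


for g in wrn_config(28, 10):
    print(g["name"], g["blocks"], g["width"], g["stride"])
# conv1 1 16 1
# conv2 4 160 1
# conv3 4 320 2
# conv4 4 640 2
```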
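50 | #### 2.1 Type of convolutions in residual block 51 | B(M) denotes the structure of a residual block, where M is the list of its convolutional layers, each given by its filter size; for example, B(3,1) is a block with a 3x3 convolution followed by a 1x1 convolution. To study the importance of 3x3 convolutions and whether they can be replaced by other types of convolution, 52 | we experiment with the following combinations: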
53 | 1. B(3,3) - original basic block
54 | 2. B(3,1,3) - with one extra 1x1 layer
55 | 3. B(1,3,1) - with the same dimensionality of all convolutions, straightened bottleneck
56 | 4. B(1,3) - the network has alternating 1x1-3x3 convolutions everywhere
57 | 5. B(3,1) - similar idea to the previous block
58 | 6. B(3,1,1) - Network-in-Network style block
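To see how the block types in this list differ in cost, the sketch below counts the convolution weights of a block B(M) in which every layer maps c channels to c channels (biases and BN parameters ignored). The equal-width assumption is a simplification and `block_weights` is a hypothetical helper; it only shows that at a fixed width B(3,3) and B(3,1,3) cost almost twice as much per block as the 1x1-heavy variants, so comparing block types at a similar parameter budget requires adjusting the depth.

```python
def block_weights(kernel_sizes, c):
    """Convolution weights of a residual block B(M) whose layers are all c -> c."""
    return sum(ks * ks * c * c for ks in kernel_sizes)


blocks = {
    "B(3,3)": (3, 3),
    "B(3,1,3)": (3, 1, 3),
    "B(1,3,1)": (1, 3, 1),
    "B(1,3)": (1, 3),
    "B(3,1)": (3, 1),
    "B(3,1,1)": (3, 1, 1),
}
c = 32  # example width, e.g. 16 * k with k = 2
for name, m in blocks.items():
    # block_weights(m, 1) is the coefficient in front of c^2
    print(name, block_weights(m, c), "= {} * c^2".format(block_weights(m, 1)))
```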
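59 | #### 2.2 Number of convolutional layers per residual block 60 | We study how the deepening factor l affects performance. To keep the network complexity roughly constant, l and d (the total number of blocks) have to be varied together: when l increases, d must decrease.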
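A minimal sketch of this trade-off, assuming for illustration only a fixed budget of 36 branch convolutions split evenly over three groups (not necessarily the paper's exact settings): as l grows, the number of blocks d, and hence the number of identity shortcuts, shrinks. `blocks_for_budget` is a hypothetical helper.

```python
def blocks_for_budget(l, budget=36, groups=3):
    """Blocks per group when each block has l convolutions and the total is fixed."""
    if budget % (l * groups) != 0:
        raise ValueError("budget must split evenly into groups of l-conv blocks")
    return budget // (l * groups)


for l in (1, 2, 3, 4):
    per_group = blocks_for_budget(l)
    # every block contributes exactly one residual (identity) connection
    print("l={}: d={} blocks ({} per group)".format(l, per_group * 3, per_group))
```

Going from l=2 (d=18) to l=4 (d=9) halves the number of residual connections under this budget.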
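61 | #### 2.3 Width of residual blocks 62 | We also study the widening factor k. The number of parameters grows linearly with the deepening factor l and with the number of blocks d, whereas both the number of parameters and the computational complexity grow quadratically with k. 63 | Even so, widening layers is computationally more efficient than building a thin but very deep network, because GPUs are much more efficient at parallel computation on large tensors.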
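Both scaling claims can be checked with a rough weight count. The sketch below counts only convolution weights of a CIFAR-style WRN-n-k, assuming n = 6N + 4, base widths 16/32/64, and a 1x1 projection shortcut wherever the channel count or stride changes (note that `wrn_28_10_deploy.prototxt` uses a 3x3 projection instead, which would give a larger count). BN, biases and the classifier are ignored, and `wrn_conv_weights` is a hypothetical helper. Under these assumptions WRN-28-10 has about 36.5M convolution weights, doubling k roughly quadruples the total, and each extra block per group adds a constant amount.

```python
def conv_weights(c_in, c_out, ks):
    """Weights of a bias-free ks x ks convolution."""
    return ks * ks * c_in * c_out


def wrn_conv_weights(depth, k):
    """Convolution weights of a CIFAR-style WRN-depth-k built from B(3,3) blocks."""
    if (depth - 4) % 6 != 0:
        raise ValueError("depth must have the form 6N + 4")
    n_blocks = (depth - 4) // 6
    total = conv_weights(3, 16, 3)  # stem: RGB -> 16 channels
    c_in = 16
    for base, stride in [(16, 1), (32, 2), (64, 2)]:
        c_out = base * k
        for b in range(n_blocks):
            s = stride if b == 0 else 1
            total += conv_weights(c_in, c_out, 3)   # first 3x3 conv
            total += conv_weights(c_out, c_out, 3)  # second 3x3 conv
            if c_in != c_out or s != 1:
                total += conv_weights(c_in, c_out, 1)  # 1x1 projection shortcut
            c_in = c_out
    return total


w10 = wrn_conv_weights(28, 10)
w20 = wrn_conv_weights(28, 20)
print(w10)       # 36454832, i.e. about 36.5M
print(w20 / w10)  # just under 4: quadratic in k
```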
64 | Network architectures before ResNet, such as Inception and VGG, were relatively wide.
65 | WRN-n-k denotes a WRN with n convolutional layers in total and widening factor k. The block type may be appended to the name, e.g. WRN-40-2-B(3,3).
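66 | #### 2.4 Dropout in residual blocks 67 | Widening blocks increases the number of parameters, so a regularization method is needed. Earlier residual networks relied on BN for regularization, but they still required heavy data augmentation. 68 | We add dropout to the residual-function branch, between the two convolutions (Figure 1(d)), to prevent overfitting. In very deep residual networks this should also help with the diminishing feature reuse problem, 69 | because dropout forces the different residual blocks to learn useful representations.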
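The wide-dropout block of Figure 1(d) can be sketched as a NumPy forward pass. This is an illustrative sketch, not the paper's Torch code or this repo's Caffe model: BN is simplified to per-channel normalization with the input's own statistics and no learned scale/shift, biases are omitted, and the block keeps its width so the shortcut is an identity. Dropout sits between the two 3x3 convolutions with rate 0.3, as in `wrn_28_10_deploy.prototxt`, and uses inverted scaling during training (as Caffe's Dropout layer does), so it is simply skipped at test time. All function names are hypothetical.

```python
import numpy as np


def bn_relu(x, eps=1e-5):
    """Simplified BN + ReLU: per-channel normalization with x's own statistics."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return np.maximum((x - mean) / np.sqrt(var + eps), 0.0)


def conv3x3(x, w):
    """3x3 convolution, stride 1, pad 1. x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero padding
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out


def dropout(x, p, rng, train):
    """Inverted dropout: keep units with prob 1-p and rescale; identity at test."""
    if not train or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)


def wide_dropout_block(x, w1, w2, p=0.3, rng=None, train=True):
    """Pre-activation B(3,3) block with dropout between the two convolutions."""
    rng = rng if rng is not None else np.random.default_rng(0)
    h = conv3x3(bn_relu(x), w1)
    h = dropout(h, p, rng, train)
    h = conv3x3(bn_relu(h), w2)
    return x + h  # identity shortcut


rng = np.random.default_rng(0)
c = 8  # stand-in for 16 * k channels
x = rng.standard_normal((c, 6, 6))
w1 = rng.standard_normal((c, c, 3, 3)) * 0.1
w2 = rng.standard_normal((c, c, 3, 3)) * 0.1
y = wide_dropout_block(x, w1, w2, p=0.3, rng=rng, train=True)
print(y.shape)  # (8, 6, 6)
```

Setting the second convolution's weights to zero leaves only the identity shortcut, which is a quick way to check the wiring.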
70 | 71 | ### 3. Experimental results 72 | Experiments are conducted on the CIFAR-10, CIFAR-100, SVHN and ImageNet datasets.
73 | **Type of convolutions in a block**
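74 | Table 2 shows the performance of the different convolution combinations: blocks with a comparable number of parameters perform about the same, so all subsequent experiments use blocks with 3x3 convolutions only.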
75 | ![](./data/table_2.png)
76 | **Number of convolutions per block**
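77 | Table 3 shows the results for different deepening factors l. B(3,3,3) and B(3,3,3,3) perform worse than B(3,3), probably because they leave fewer residual connections in the network (see Section 2.2), which makes it harder to optimize.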
78 | ![](./data/table_3.png)
79 | **Width of residual blocks**  
80 | We test various combinations of k and network depth; Table 4 gives the results.
81 | ![](./data/table_4.png)
82 | Table 5 compares the performance of different networks, and Figure 2 shows the training curves of two representative networks.
83 | ![](./data/table_5.png)
84 | ![](./data/figure_2.png)
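85 | Despite earlier arguments that depth has a regularizing effect and that width causes networks to overfit, we successfully train models with several times more parameters than the 1001-layer ResNet, which train in less time 86 | and also perform better.
87 | To summarize the results above:
88 | • widening consistently improves the performance of residual networks of different depths
89 | • increasing both depth and width helps, until the number of parameters becomes too high and stronger regularization is needed
90 | • depth does not seem to have a regularization effect: wide, shallow networks with the same number of parameters learn representations comparable to or better than those of thin, deep ones. Furthermore, wide networks can successfully learn with 2 or more times as many parameters as thin ones, 91 | which for thin networks would require doubling the depth, making them infeasibly expensive to train.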
92 | **Dropout in residual blocks**
93 | Table 6 shows the effect of dropout.
94 | ![](./data/table_6.png)
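95 | We observed that when training residual networks, between the first and second learning-rate drops, both the validation loss and the validation error go up and oscillate at high values. This may be caused by weight decay, 96 | but lowering the weight decay coefficient leads to a significant drop in accuracy. In most cases, dropout mitigates this effect (see Figures 2 and 3).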
97 | ![](./data/figure_3.png)
98 | Even though the networks use BN, dropout remains an effective regularizer, and it can be combined with widening to further improve performance.
99 | **ImageNet and COCO experiments**
100 | Tables 7-9 give the detailed results.
101 | ![](./data/table_7.png)
102 | ![](./data/table_8.png)
103 | ![](./data/table_9.png)
104 | **Computational efficiency**
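105 | Thin, deep residual networks with small kernels run against the nature of GPU computation because of their sequential structure. Increasing width balances the computation much better, so wide networks are usually more efficient. Figure 4 compares the computation time of different networks.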
106 | ![](./data/figure_4.png)
107 | **Implementation details**
108 | The networks are implemented in Torch; the official code is available at https://github.com/szagoruyko/wide-residual-networks 109 | 110 | ### 4. Conclusions 111 | This paper studies the width of residual networks and the use of dropout in residual blocks. The experiments show that the power of residual networks comes mainly from the residual blocks, not from extreme depth. In addition, 112 | WRNs train faster in some cases. 113 | -------------------------------------------------------------------------------- /WRN/data/figure_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/WRN/data/figure_1.png -------------------------------------------------------------------------------- /WRN/data/figure_2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/WRN/data/figure_2.png -------------------------------------------------------------------------------- /WRN/data/figure_3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/WRN/data/figure_3.png -------------------------------------------------------------------------------- /WRN/data/figure_4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/WRN/data/figure_4.png -------------------------------------------------------------------------------- /WRN/data/table_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/WRN/data/table_1.png 
-------------------------------------------------------------------------------- /WRN/data/table_2.png: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/WRN/data/table_2.png -------------------------------------------------------------------------------- /WRN/data/table_3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/WRN/data/table_3.png -------------------------------------------------------------------------------- /WRN/data/table_4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/WRN/data/table_4.png -------------------------------------------------------------------------------- /WRN/data/table_5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/WRN/data/table_5.png -------------------------------------------------------------------------------- /WRN/data/table_6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/WRN/data/table_6.png -------------------------------------------------------------------------------- /WRN/data/table_7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/WRN/data/table_7.png -------------------------------------------------------------------------------- /WRN/data/table_8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/WRN/data/table_8.png 
-------------------------------------------------------------------------------- /WRN/data/table_9.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/binLearning/caffe_toolkit/78c7172d93230414a54afd0db9eccab4e8612b84/WRN/data/table_9.png -------------------------------------------------------------------------------- /WRN/wrn_28_10_deploy.prototxt: -------------------------------------------------------------------------------- 1 | name: "WRN-28-10_deploy" 2 | layer { 3 | name: "data" 4 | type: "Input" 5 | top: "data" 6 | input_param { 7 | shape { 8 | dim: 1 9 | dim: 3 10 | dim: 32 11 | dim: 32 12 | } 13 | } 14 | } 15 | layer { 16 | name: "conv1" 17 | type: "Convolution" 18 | bottom: "data" 19 | top: "conv1" 20 | convolution_param { 21 | num_output: 16 22 | bias_term: false 23 | pad: 1 24 | kernel_size: 3 25 | stride: 1 26 | } 27 | } 28 | layer { 29 | name: "block_2_1_branch2a_bn" 30 | type: "BatchNorm" 31 | bottom: "conv1" 32 | top: "block_2_1_branch2a_bn" 33 | } 34 | layer { 35 | name: "block_2_1_branch2a_scale" 36 | type: "Scale" 37 | bottom: "block_2_1_branch2a_bn" 38 | top: "block_2_1_branch2a_bn" 39 | scale_param { 40 | bias_term: true 41 | } 42 | } 43 | layer { 44 | name: "block_2_1_branch2a_relu" 45 | type: "ReLU" 46 | bottom: "block_2_1_branch2a_bn" 47 | top: "block_2_1_branch2a_bn" 48 | } 49 | layer { 50 | name: "block_2_1_branch2a_conv" 51 | type: "Convolution" 52 | bottom: "block_2_1_branch2a_bn" 53 | top: "block_2_1_branch2a_conv" 54 | convolution_param { 55 | num_output: 160 56 | bias_term: false 57 | pad: 1 58 | kernel_size: 3 59 | stride: 1 60 | } 61 | } 62 | layer { 63 | name: "block_2_1_dropout" 64 | type: "Dropout" 65 | bottom: "block_2_1_branch2a_conv" 66 | top: "block_2_1_dropout" 67 | dropout_param { 68 | dropout_ratio: 0.3 69 | } 70 | } 71 | layer { 72 | name: "block_2_1_branch2b_bn" 73 | type: "BatchNorm" 74 | bottom: "block_2_1_dropout" 75 | top: "block_2_1_branch2b_bn" 76 
| } 77 | layer { 78 | name: "block_2_1_branch2b_scale" 79 | type: "Scale" 80 | bottom: "block_2_1_branch2b_bn" 81 | top: "block_2_1_branch2b_bn" 82 | scale_param { 83 | bias_term: true 84 | } 85 | } 86 | layer { 87 | name: "block_2_1_branch2b_relu" 88 | type: "ReLU" 89 | bottom: "block_2_1_branch2b_bn" 90 | top: "block_2_1_branch2b_bn" 91 | } 92 | layer { 93 | name: "block_2_1_branch2b_conv" 94 | type: "Convolution" 95 | bottom: "block_2_1_branch2b_bn" 96 | top: "block_2_1_branch2b_conv" 97 | convolution_param { 98 | num_output: 160 99 | bias_term: false 100 | pad: 1 101 | kernel_size: 3 102 | stride: 1 103 | } 104 | } 105 | layer { 106 | name: "block_2_1_branch1_bn" 107 | type: "BatchNorm" 108 | bottom: "conv1" 109 | top: "block_2_1_branch1_bn" 110 | } 111 | layer { 112 | name: "block_2_1_branch1_scale" 113 | type: "Scale" 114 | bottom: "block_2_1_branch1_bn" 115 | top: "block_2_1_branch1_bn" 116 | scale_param { 117 | bias_term: true 118 | } 119 | } 120 | layer { 121 | name: "block_2_1_branch1_relu" 122 | type: "ReLU" 123 | bottom: "block_2_1_branch1_bn" 124 | top: "block_2_1_branch1_bn" 125 | } 126 | layer { 127 | name: "block_2_1_branch1_conv" 128 | type: "Convolution" 129 | bottom: "block_2_1_branch1_bn" 130 | top: "block_2_1_branch1_conv" 131 | convolution_param { 132 | num_output: 160 133 | bias_term: false 134 | pad: 1 135 | kernel_size: 3 136 | stride: 1 137 | } 138 | } 139 | layer { 140 | name: "block_2_1_addition" 141 | type: "Eltwise" 142 | bottom: "block_2_1_branch1_conv" 143 | bottom: "block_2_1_branch2b_conv" 144 | top: "block_2_1_addition" 145 | } 146 | layer { 147 | name: "block_2_2_branch2a_bn" 148 | type: "BatchNorm" 149 | bottom: "block_2_1_addition" 150 | top: "block_2_2_branch2a_bn" 151 | } 152 | layer { 153 | name: "block_2_2_branch2a_scale" 154 | type: "Scale" 155 | bottom: "block_2_2_branch2a_bn" 156 | top: "block_2_2_branch2a_bn" 157 | scale_param { 158 | bias_term: true 159 | } 160 | } 161 | layer { 162 | name: "block_2_2_branch2a_relu" 
163 | type: "ReLU" 164 | bottom: "block_2_2_branch2a_bn" 165 | top: "block_2_2_branch2a_bn" 166 | } 167 | layer { 168 | name: "block_2_2_branch2a_conv" 169 | type: "Convolution" 170 | bottom: "block_2_2_branch2a_bn" 171 | top: "block_2_2_branch2a_conv" 172 | convolution_param { 173 | num_output: 160 174 | bias_term: false 175 | pad: 1 176 | kernel_size: 3 177 | stride: 1 178 | } 179 | } 180 | layer { 181 | name: "block_2_2_dropout" 182 | type: "Dropout" 183 | bottom: "block_2_2_branch2a_conv" 184 | top: "block_2_2_dropout" 185 | dropout_param { 186 | dropout_ratio: 0.3 187 | } 188 | } 189 | layer { 190 | name: "block_2_2_branch2b_bn" 191 | type: "BatchNorm" 192 | bottom: "block_2_2_dropout" 193 | top: "block_2_2_branch2b_bn" 194 | } 195 | layer { 196 | name: "block_2_2_branch2b_scale" 197 | type: "Scale" 198 | bottom: "block_2_2_branch2b_bn" 199 | top: "block_2_2_branch2b_bn" 200 | scale_param { 201 | bias_term: true 202 | } 203 | } 204 | layer { 205 | name: "block_2_2_branch2b_relu" 206 | type: "ReLU" 207 | bottom: "block_2_2_branch2b_bn" 208 | top: "block_2_2_branch2b_bn" 209 | } 210 | layer { 211 | name: "block_2_2_branch2b_conv" 212 | type: "Convolution" 213 | bottom: "block_2_2_branch2b_bn" 214 | top: "block_2_2_branch2b_conv" 215 | convolution_param { 216 | num_output: 160 217 | bias_term: false 218 | pad: 1 219 | kernel_size: 3 220 | stride: 1 221 | } 222 | } 223 | layer { 224 | name: "block_2_2_addition" 225 | type: "Eltwise" 226 | bottom: "block_2_1_addition" 227 | bottom: "block_2_2_branch2b_conv" 228 | top: "block_2_2_addition" 229 | } 230 | layer { 231 | name: "block_2_3_branch2a_bn" 232 | type: "BatchNorm" 233 | bottom: "block_2_2_addition" 234 | top: "block_2_3_branch2a_bn" 235 | } 236 | layer { 237 | name: "block_2_3_branch2a_scale" 238 | type: "Scale" 239 | bottom: "block_2_3_branch2a_bn" 240 | top: "block_2_3_branch2a_bn" 241 | scale_param { 242 | bias_term: true 243 | } 244 | } 245 | layer { 246 | name: "block_2_3_branch2a_relu" 247 | type: "ReLU" 
248 | bottom: "block_2_3_branch2a_bn" 249 | top: "block_2_3_branch2a_bn" 250 | } 251 | layer { 252 | name: "block_2_3_branch2a_conv" 253 | type: "Convolution" 254 | bottom: "block_2_3_branch2a_bn" 255 | top: "block_2_3_branch2a_conv" 256 | convolution_param { 257 | num_output: 160 258 | bias_term: false 259 | pad: 1 260 | kernel_size: 3 261 | stride: 1 262 | } 263 | } 264 | layer { 265 | name: "block_2_3_dropout" 266 | type: "Dropout" 267 | bottom: "block_2_3_branch2a_conv" 268 | top: "block_2_3_dropout" 269 | dropout_param { 270 | dropout_ratio: 0.3 271 | } 272 | } 273 | layer { 274 | name: "block_2_3_branch2b_bn" 275 | type: "BatchNorm" 276 | bottom: "block_2_3_dropout" 277 | top: "block_2_3_branch2b_bn" 278 | } 279 | layer { 280 | name: "block_2_3_branch2b_scale" 281 | type: "Scale" 282 | bottom: "block_2_3_branch2b_bn" 283 | top: "block_2_3_branch2b_bn" 284 | scale_param { 285 | bias_term: true 286 | } 287 | } 288 | layer { 289 | name: "block_2_3_branch2b_relu" 290 | type: "ReLU" 291 | bottom: "block_2_3_branch2b_bn" 292 | top: "block_2_3_branch2b_bn" 293 | } 294 | layer { 295 | name: "block_2_3_branch2b_conv" 296 | type: "Convolution" 297 | bottom: "block_2_3_branch2b_bn" 298 | top: "block_2_3_branch2b_conv" 299 | convolution_param { 300 | num_output: 160 301 | bias_term: false 302 | pad: 1 303 | kernel_size: 3 304 | stride: 1 305 | } 306 | } 307 | layer { 308 | name: "block_2_3_addition" 309 | type: "Eltwise" 310 | bottom: "block_2_2_addition" 311 | bottom: "block_2_3_branch2b_conv" 312 | top: "block_2_3_addition" 313 | } 314 | layer { 315 | name: "block_2_4_branch2a_bn" 316 | type: "BatchNorm" 317 | bottom: "block_2_3_addition" 318 | top: "block_2_4_branch2a_bn" 319 | } 320 | layer { 321 | name: "block_2_4_branch2a_scale" 322 | type: "Scale" 323 | bottom: "block_2_4_branch2a_bn" 324 | top: "block_2_4_branch2a_bn" 325 | scale_param { 326 | bias_term: true 327 | } 328 | } 329 | layer { 330 | name: "block_2_4_branch2a_relu" 331 | type: "ReLU" 332 | bottom: 
"block_2_4_branch2a_bn" 333 | top: "block_2_4_branch2a_bn" 334 | } 335 | layer { 336 | name: "block_2_4_branch2a_conv" 337 | type: "Convolution" 338 | bottom: "block_2_4_branch2a_bn" 339 | top: "block_2_4_branch2a_conv" 340 | convolution_param { 341 | num_output: 160 342 | bias_term: false 343 | pad: 1 344 | kernel_size: 3 345 | stride: 1 346 | } 347 | } 348 | layer { 349 | name: "block_2_4_dropout" 350 | type: "Dropout" 351 | bottom: "block_2_4_branch2a_conv" 352 | top: "block_2_4_dropout" 353 | dropout_param { 354 | dropout_ratio: 0.3 355 | } 356 | } 357 | layer { 358 | name: "block_2_4_branch2b_bn" 359 | type: "BatchNorm" 360 | bottom: "block_2_4_dropout" 361 | top: "block_2_4_branch2b_bn" 362 | } 363 | layer { 364 | name: "block_2_4_branch2b_scale" 365 | type: "Scale" 366 | bottom: "block_2_4_branch2b_bn" 367 | top: "block_2_4_branch2b_bn" 368 | scale_param { 369 | bias_term: true 370 | } 371 | } 372 | layer { 373 | name: "block_2_4_branch2b_relu" 374 | type: "ReLU" 375 | bottom: "block_2_4_branch2b_bn" 376 | top: "block_2_4_branch2b_bn" 377 | } 378 | layer { 379 | name: "block_2_4_branch2b_conv" 380 | type: "Convolution" 381 | bottom: "block_2_4_branch2b_bn" 382 | top: "block_2_4_branch2b_conv" 383 | convolution_param { 384 | num_output: 160 385 | bias_term: false 386 | pad: 1 387 | kernel_size: 3 388 | stride: 1 389 | } 390 | } 391 | layer { 392 | name: "block_2_4_addition" 393 | type: "Eltwise" 394 | bottom: "block_2_3_addition" 395 | bottom: "block_2_4_branch2b_conv" 396 | top: "block_2_4_addition" 397 | } 398 | layer { 399 | name: "block_3_1_branch2a_bn" 400 | type: "BatchNorm" 401 | bottom: "block_2_4_addition" 402 | top: "block_3_1_branch2a_bn" 403 | } 404 | layer { 405 | name: "block_3_1_branch2a_scale" 406 | type: "Scale" 407 | bottom: "block_3_1_branch2a_bn" 408 | top: "block_3_1_branch2a_bn" 409 | scale_param { 410 | bias_term: true 411 | } 412 | } 413 | layer { 414 | name: "block_3_1_branch2a_relu" 415 | type: "ReLU" 416 | bottom: 
"block_3_1_branch2a_bn" 417 | top: "block_3_1_branch2a_bn" 418 | } 419 | layer { 420 | name: "block_3_1_branch2a_conv" 421 | type: "Convolution" 422 | bottom: "block_3_1_branch2a_bn" 423 | top: "block_3_1_branch2a_conv" 424 | convolution_param { 425 | num_output: 320 426 | bias_term: false 427 | pad: 1 428 | kernel_size: 3 429 | stride: 2 430 | } 431 | } 432 | layer { 433 | name: "block_3_1_dropout" 434 | type: "Dropout" 435 | bottom: "block_3_1_branch2a_conv" 436 | top: "block_3_1_dropout" 437 | dropout_param { 438 | dropout_ratio: 0.3 439 | } 440 | } 441 | layer { 442 | name: "block_3_1_branch2b_bn" 443 | type: "BatchNorm" 444 | bottom: "block_3_1_dropout" 445 | top: "block_3_1_branch2b_bn" 446 | } 447 | layer { 448 | name: "block_3_1_branch2b_scale" 449 | type: "Scale" 450 | bottom: "block_3_1_branch2b_bn" 451 | top: "block_3_1_branch2b_bn" 452 | scale_param { 453 | bias_term: true 454 | } 455 | } 456 | layer { 457 | name: "block_3_1_branch2b_relu" 458 | type: "ReLU" 459 | bottom: "block_3_1_branch2b_bn" 460 | top: "block_3_1_branch2b_bn" 461 | } 462 | layer { 463 | name: "block_3_1_branch2b_conv" 464 | type: "Convolution" 465 | bottom: "block_3_1_branch2b_bn" 466 | top: "block_3_1_branch2b_conv" 467 | convolution_param { 468 | num_output: 320 469 | bias_term: false 470 | pad: 1 471 | kernel_size: 3 472 | stride: 1 473 | } 474 | } 475 | layer { 476 | name: "block_3_1_branch1_bn" 477 | type: "BatchNorm" 478 | bottom: "block_2_4_addition" 479 | top: "block_3_1_branch1_bn" 480 | } 481 | layer { 482 | name: "block_3_1_branch1_scale" 483 | type: "Scale" 484 | bottom: "block_3_1_branch1_bn" 485 | top: "block_3_1_branch1_bn" 486 | scale_param { 487 | bias_term: true 488 | } 489 | } 490 | layer { 491 | name: "block_3_1_branch1_relu" 492 | type: "ReLU" 493 | bottom: "block_3_1_branch1_bn" 494 | top: "block_3_1_branch1_bn" 495 | } 496 | layer { 497 | name: "block_3_1_branch1_conv" 498 | type: "Convolution" 499 | bottom: "block_3_1_branch1_bn" 500 | top: 
"block_3_1_branch1_conv" 501 | convolution_param { 502 | num_output: 320 503 | bias_term: false 504 | pad: 1 505 | kernel_size: 3 506 | stride: 2 507 | } 508 | } 509 | layer { 510 | name: "block_3_1_addition" 511 | type: "Eltwise" 512 | bottom: "block_3_1_branch1_conv" 513 | bottom: "block_3_1_branch2b_conv" 514 | top: "block_3_1_addition" 515 | } 516 | layer { 517 | name: "block_3_2_branch2a_bn" 518 | type: "BatchNorm" 519 | bottom: "block_3_1_addition" 520 | top: "block_3_2_branch2a_bn" 521 | } 522 | layer { 523 | name: "block_3_2_branch2a_scale" 524 | type: "Scale" 525 | bottom: "block_3_2_branch2a_bn" 526 | top: "block_3_2_branch2a_bn" 527 | scale_param { 528 | bias_term: true 529 | } 530 | } 531 | layer { 532 | name: "block_3_2_branch2a_relu" 533 | type: "ReLU" 534 | bottom: "block_3_2_branch2a_bn" 535 | top: "block_3_2_branch2a_bn" 536 | } 537 | layer { 538 | name: "block_3_2_branch2a_conv" 539 | type: "Convolution" 540 | bottom: "block_3_2_branch2a_bn" 541 | top: "block_3_2_branch2a_conv" 542 | convolution_param { 543 | num_output: 320 544 | bias_term: false 545 | pad: 1 546 | kernel_size: 3 547 | stride: 1 548 | } 549 | } 550 | layer { 551 | name: "block_3_2_dropout" 552 | type: "Dropout" 553 | bottom: "block_3_2_branch2a_conv" 554 | top: "block_3_2_dropout" 555 | dropout_param { 556 | dropout_ratio: 0.3 557 | } 558 | } 559 | layer { 560 | name: "block_3_2_branch2b_bn" 561 | type: "BatchNorm" 562 | bottom: "block_3_2_dropout" 563 | top: "block_3_2_branch2b_bn" 564 | } 565 | layer { 566 | name: "block_3_2_branch2b_scale" 567 | type: "Scale" 568 | bottom: "block_3_2_branch2b_bn" 569 | top: "block_3_2_branch2b_bn" 570 | scale_param { 571 | bias_term: true 572 | } 573 | } 574 | layer { 575 | name: "block_3_2_branch2b_relu" 576 | type: "ReLU" 577 | bottom: "block_3_2_branch2b_bn" 578 | top: "block_3_2_branch2b_bn" 579 | } 580 | layer { 581 | name: "block_3_2_branch2b_conv" 582 | type: "Convolution" 583 | bottom: "block_3_2_branch2b_bn" 584 | top: 
"block_3_2_branch2b_conv" 585 | convolution_param { 586 | num_output: 320 587 | bias_term: false 588 | pad: 1 589 | kernel_size: 3 590 | stride: 1 591 | } 592 | } 593 | layer { 594 | name: "block_3_2_addition" 595 | type: "Eltwise" 596 | bottom: "block_3_1_addition" 597 | bottom: "block_3_2_branch2b_conv" 598 | top: "block_3_2_addition" 599 | } 600 | layer { 601 | name: "block_3_3_branch2a_bn" 602 | type: "BatchNorm" 603 | bottom: "block_3_2_addition" 604 | top: "block_3_3_branch2a_bn" 605 | } 606 | layer { 607 | name: "block_3_3_branch2a_scale" 608 | type: "Scale" 609 | bottom: "block_3_3_branch2a_bn" 610 | top: "block_3_3_branch2a_bn" 611 | scale_param { 612 | bias_term: true 613 | } 614 | } 615 | layer { 616 | name: "block_3_3_branch2a_relu" 617 | type: "ReLU" 618 | bottom: "block_3_3_branch2a_bn" 619 | top: "block_3_3_branch2a_bn" 620 | } 621 | layer { 622 | name: "block_3_3_branch2a_conv" 623 | type: "Convolution" 624 | bottom: "block_3_3_branch2a_bn" 625 | top: "block_3_3_branch2a_conv" 626 | convolution_param { 627 | num_output: 320 628 | bias_term: false 629 | pad: 1 630 | kernel_size: 3 631 | stride: 1 632 | } 633 | } 634 | layer { 635 | name: "block_3_3_dropout" 636 | type: "Dropout" 637 | bottom: "block_3_3_branch2a_conv" 638 | top: "block_3_3_dropout" 639 | dropout_param { 640 | dropout_ratio: 0.3 641 | } 642 | } 643 | layer { 644 | name: "block_3_3_branch2b_bn" 645 | type: "BatchNorm" 646 | bottom: "block_3_3_dropout" 647 | top: "block_3_3_branch2b_bn" 648 | } 649 | layer { 650 | name: "block_3_3_branch2b_scale" 651 | type: "Scale" 652 | bottom: "block_3_3_branch2b_bn" 653 | top: "block_3_3_branch2b_bn" 654 | scale_param { 655 | bias_term: true 656 | } 657 | } 658 | layer { 659 | name: "block_3_3_branch2b_relu" 660 | type: "ReLU" 661 | bottom: "block_3_3_branch2b_bn" 662 | top: "block_3_3_branch2b_bn" 663 | } 664 | layer { 665 | name: "block_3_3_branch2b_conv" 666 | type: "Convolution" 667 | bottom: "block_3_3_branch2b_bn" 668 | top: 
"block_3_3_branch2b_conv" 669 | convolution_param { 670 | num_output: 320 671 | bias_term: false 672 | pad: 1 673 | kernel_size: 3 674 | stride: 1 675 | } 676 | } 677 | layer { 678 | name: "block_3_3_addition" 679 | type: "Eltwise" 680 | bottom: "block_3_2_addition" 681 | bottom: "block_3_3_branch2b_conv" 682 | top: "block_3_3_addition" 683 | } 684 | layer { 685 | name: "block_3_4_branch2a_bn" 686 | type: "BatchNorm" 687 | bottom: "block_3_3_addition" 688 | top: "block_3_4_branch2a_bn" 689 | } 690 | layer { 691 | name: "block_3_4_branch2a_scale" 692 | type: "Scale" 693 | bottom: "block_3_4_branch2a_bn" 694 | top: "block_3_4_branch2a_bn" 695 | scale_param { 696 | bias_term: true 697 | } 698 | } 699 | layer { 700 | name: "block_3_4_branch2a_relu" 701 | type: "ReLU" 702 | bottom: "block_3_4_branch2a_bn" 703 | top: "block_3_4_branch2a_bn" 704 | } 705 | layer { 706 | name: "block_3_4_branch2a_conv" 707 | type: "Convolution" 708 | bottom: "block_3_4_branch2a_bn" 709 | top: "block_3_4_branch2a_conv" 710 | convolution_param { 711 | num_output: 320 712 | bias_term: false 713 | pad: 1 714 | kernel_size: 3 715 | stride: 1 716 | } 717 | } 718 | layer { 719 | name: "block_3_4_dropout" 720 | type: "Dropout" 721 | bottom: "block_3_4_branch2a_conv" 722 | top: "block_3_4_dropout" 723 | dropout_param { 724 | dropout_ratio: 0.3 725 | } 726 | } 727 | layer { 728 | name: "block_3_4_branch2b_bn" 729 | type: "BatchNorm" 730 | bottom: "block_3_4_dropout" 731 | top: "block_3_4_branch2b_bn" 732 | } 733 | layer { 734 | name: "block_3_4_branch2b_scale" 735 | type: "Scale" 736 | bottom: "block_3_4_branch2b_bn" 737 | top: "block_3_4_branch2b_bn" 738 | scale_param { 739 | bias_term: true 740 | } 741 | } 742 | layer { 743 | name: "block_3_4_branch2b_relu" 744 | type: "ReLU" 745 | bottom: "block_3_4_branch2b_bn" 746 | top: "block_3_4_branch2b_bn" 747 | } 748 | layer { 749 | name: "block_3_4_branch2b_conv" 750 | type: "Convolution" 751 | bottom: "block_3_4_branch2b_bn" 752 | top: 
"block_3_4_branch2b_conv" 753 | convolution_param { 754 | num_output: 320 755 | bias_term: false 756 | pad: 1 757 | kernel_size: 3 758 | stride: 1 759 | } 760 | } 761 | layer { 762 | name: "block_3_4_addition" 763 | type: "Eltwise" 764 | bottom: "block_3_3_addition" 765 | bottom: "block_3_4_branch2b_conv" 766 | top: "block_3_4_addition" 767 | } 768 | layer { 769 | name: "block_4_1_branch2a_bn" 770 | type: "BatchNorm" 771 | bottom: "block_3_4_addition" 772 | top: "block_4_1_branch2a_bn" 773 | } 774 | layer { 775 | name: "block_4_1_branch2a_scale" 776 | type: "Scale" 777 | bottom: "block_4_1_branch2a_bn" 778 | top: "block_4_1_branch2a_bn" 779 | scale_param { 780 | bias_term: true 781 | } 782 | } 783 | layer { 784 | name: "block_4_1_branch2a_relu" 785 | type: "ReLU" 786 | bottom: "block_4_1_branch2a_bn" 787 | top: "block_4_1_branch2a_bn" 788 | } 789 | layer { 790 | name: "block_4_1_branch2a_conv" 791 | type: "Convolution" 792 | bottom: "block_4_1_branch2a_bn" 793 | top: "block_4_1_branch2a_conv" 794 | convolution_param { 795 | num_output: 640 796 | bias_term: false 797 | pad: 1 798 | kernel_size: 3 799 | stride: 2 800 | } 801 | } 802 | layer { 803 | name: "block_4_1_dropout" 804 | type: "Dropout" 805 | bottom: "block_4_1_branch2a_conv" 806 | top: "block_4_1_dropout" 807 | dropout_param { 808 | dropout_ratio: 0.3 809 | } 810 | } 811 | layer { 812 | name: "block_4_1_branch2b_bn" 813 | type: "BatchNorm" 814 | bottom: "block_4_1_dropout" 815 | top: "block_4_1_branch2b_bn" 816 | } 817 | layer { 818 | name: "block_4_1_branch2b_scale" 819 | type: "Scale" 820 | bottom: "block_4_1_branch2b_bn" 821 | top: "block_4_1_branch2b_bn" 822 | scale_param { 823 | bias_term: true 824 | } 825 | } 826 | layer { 827 | name: "block_4_1_branch2b_relu" 828 | type: "ReLU" 829 | bottom: "block_4_1_branch2b_bn" 830 | top: "block_4_1_branch2b_bn" 831 | } 832 | layer { 833 | name: "block_4_1_branch2b_conv" 834 | type: "Convolution" 835 | bottom: "block_4_1_branch2b_bn" 836 | top: 
"block_4_1_branch2b_conv" 837 | convolution_param { 838 | num_output: 640 839 | bias_term: false 840 | pad: 1 841 | kernel_size: 3 842 | stride: 1 843 | } 844 | } 845 | layer { 846 | name: "block_4_1_branch1_bn" 847 | type: "BatchNorm" 848 | bottom: "block_3_4_addition" 849 | top: "block_4_1_branch1_bn" 850 | } 851 | layer { 852 | name: "block_4_1_branch1_scale" 853 | type: "Scale" 854 | bottom: "block_4_1_branch1_bn" 855 | top: "block_4_1_branch1_bn" 856 | scale_param { 857 | bias_term: true 858 | } 859 | } 860 | layer { 861 | name: "block_4_1_branch1_relu" 862 | type: "ReLU" 863 | bottom: "block_4_1_branch1_bn" 864 | top: "block_4_1_branch1_bn" 865 | } 866 | layer { 867 | name: "block_4_1_branch1_conv" 868 | type: "Convolution" 869 | bottom: "block_4_1_branch1_bn" 870 | top: "block_4_1_branch1_conv" 871 | convolution_param { 872 | num_output: 640 873 | bias_term: false 874 | pad: 1 875 | kernel_size: 3 876 | stride: 2 877 | } 878 | } 879 | layer { 880 | name: "block_4_1_addition" 881 | type: "Eltwise" 882 | bottom: "block_4_1_branch1_conv" 883 | bottom: "block_4_1_branch2b_conv" 884 | top: "block_4_1_addition" 885 | } 886 | layer { 887 | name: "block_4_2_branch2a_bn" 888 | type: "BatchNorm" 889 | bottom: "block_4_1_addition" 890 | top: "block_4_2_branch2a_bn" 891 | } 892 | layer { 893 | name: "block_4_2_branch2a_scale" 894 | type: "Scale" 895 | bottom: "block_4_2_branch2a_bn" 896 | top: "block_4_2_branch2a_bn" 897 | scale_param { 898 | bias_term: true 899 | } 900 | } 901 | layer { 902 | name: "block_4_2_branch2a_relu" 903 | type: "ReLU" 904 | bottom: "block_4_2_branch2a_bn" 905 | top: "block_4_2_branch2a_bn" 906 | } 907 | layer { 908 | name: "block_4_2_branch2a_conv" 909 | type: "Convolution" 910 | bottom: "block_4_2_branch2a_bn" 911 | top: "block_4_2_branch2a_conv" 912 | convolution_param { 913 | num_output: 640 914 | bias_term: false 915 | pad: 1 916 | kernel_size: 3 917 | stride: 1 918 | } 919 | } 920 | layer { 921 | name: "block_4_2_dropout" 922 | type: 
"Dropout" 923 | bottom: "block_4_2_branch2a_conv" 924 | top: "block_4_2_dropout" 925 | dropout_param { 926 | dropout_ratio: 0.3 927 | } 928 | } 929 | layer { 930 | name: "block_4_2_branch2b_bn" 931 | type: "BatchNorm" 932 | bottom: "block_4_2_dropout" 933 | top: "block_4_2_branch2b_bn" 934 | } 935 | layer { 936 | name: "block_4_2_branch2b_scale" 937 | type: "Scale" 938 | bottom: "block_4_2_branch2b_bn" 939 | top: "block_4_2_branch2b_bn" 940 | scale_param { 941 | bias_term: true 942 | } 943 | } 944 | layer { 945 | name: "block_4_2_branch2b_relu" 946 | type: "ReLU" 947 | bottom: "block_4_2_branch2b_bn" 948 | top: "block_4_2_branch2b_bn" 949 | } 950 | layer { 951 | name: "block_4_2_branch2b_conv" 952 | type: "Convolution" 953 | bottom: "block_4_2_branch2b_bn" 954 | top: "block_4_2_branch2b_conv" 955 | convolution_param { 956 | num_output: 640 957 | bias_term: false 958 | pad: 1 959 | kernel_size: 3 960 | stride: 1 961 | } 962 | } 963 | layer { 964 | name: "block_4_2_addition" 965 | type: "Eltwise" 966 | bottom: "block_4_1_addition" 967 | bottom: "block_4_2_branch2b_conv" 968 | top: "block_4_2_addition" 969 | } 970 | layer { 971 | name: "block_4_3_branch2a_bn" 972 | type: "BatchNorm" 973 | bottom: "block_4_2_addition" 974 | top: "block_4_3_branch2a_bn" 975 | } 976 | layer { 977 | name: "block_4_3_branch2a_scale" 978 | type: "Scale" 979 | bottom: "block_4_3_branch2a_bn" 980 | top: "block_4_3_branch2a_bn" 981 | scale_param { 982 | bias_term: true 983 | } 984 | } 985 | layer { 986 | name: "block_4_3_branch2a_relu" 987 | type: "ReLU" 988 | bottom: "block_4_3_branch2a_bn" 989 | top: "block_4_3_branch2a_bn" 990 | } 991 | layer { 992 | name: "block_4_3_branch2a_conv" 993 | type: "Convolution" 994 | bottom: "block_4_3_branch2a_bn" 995 | top: "block_4_3_branch2a_conv" 996 | convolution_param { 997 | num_output: 640 998 | bias_term: false 999 | pad: 1 1000 | kernel_size: 3 1001 | stride: 1 1002 | } 1003 | } 1004 | layer { 1005 | name: "block_4_3_dropout" 1006 | type: "Dropout" 
1007 | bottom: "block_4_3_branch2a_conv" 1008 | top: "block_4_3_dropout" 1009 | dropout_param { 1010 | dropout_ratio: 0.3 1011 | } 1012 | } 1013 | layer { 1014 | name: "block_4_3_branch2b_bn" 1015 | type: "BatchNorm" 1016 | bottom: "block_4_3_dropout" 1017 | top: "block_4_3_branch2b_bn" 1018 | } 1019 | layer { 1020 | name: "block_4_3_branch2b_scale" 1021 | type: "Scale" 1022 | bottom: "block_4_3_branch2b_bn" 1023 | top: "block_4_3_branch2b_bn" 1024 | scale_param { 1025 | bias_term: true 1026 | } 1027 | } 1028 | layer { 1029 | name: "block_4_3_branch2b_relu" 1030 | type: "ReLU" 1031 | bottom: "block_4_3_branch2b_bn" 1032 | top: "block_4_3_branch2b_bn" 1033 | } 1034 | layer { 1035 | name: "block_4_3_branch2b_conv" 1036 | type: "Convolution" 1037 | bottom: "block_4_3_branch2b_bn" 1038 | top: "block_4_3_branch2b_conv" 1039 | convolution_param { 1040 | num_output: 640 1041 | bias_term: false 1042 | pad: 1 1043 | kernel_size: 3 1044 | stride: 1 1045 | } 1046 | } 1047 | layer { 1048 | name: "block_4_3_addition" 1049 | type: "Eltwise" 1050 | bottom: "block_4_2_addition" 1051 | bottom: "block_4_3_branch2b_conv" 1052 | top: "block_4_3_addition" 1053 | } 1054 | layer { 1055 | name: "block_4_4_branch2a_bn" 1056 | type: "BatchNorm" 1057 | bottom: "block_4_3_addition" 1058 | top: "block_4_4_branch2a_bn" 1059 | } 1060 | layer { 1061 | name: "block_4_4_branch2a_scale" 1062 | type: "Scale" 1063 | bottom: "block_4_4_branch2a_bn" 1064 | top: "block_4_4_branch2a_bn" 1065 | scale_param { 1066 | bias_term: true 1067 | } 1068 | } 1069 | layer { 1070 | name: "block_4_4_branch2a_relu" 1071 | type: "ReLU" 1072 | bottom: "block_4_4_branch2a_bn" 1073 | top: "block_4_4_branch2a_bn" 1074 | } 1075 | layer { 1076 | name: "block_4_4_branch2a_conv" 1077 | type: "Convolution" 1078 | bottom: "block_4_4_branch2a_bn" 1079 | top: "block_4_4_branch2a_conv" 1080 | convolution_param { 1081 | num_output: 640 1082 | bias_term: false 1083 | pad: 1 1084 | kernel_size: 3 1085 | stride: 1 1086 | } 1087 | } 1088 
| layer { 1089 | name: "block_4_4_dropout" 1090 | type: "Dropout" 1091 | bottom: "block_4_4_branch2a_conv" 1092 | top: "block_4_4_dropout" 1093 | dropout_param { 1094 | dropout_ratio: 0.3 1095 | } 1096 | } 1097 | layer { 1098 | name: "block_4_4_branch2b_bn" 1099 | type: "BatchNorm" 1100 | bottom: "block_4_4_dropout" 1101 | top: "block_4_4_branch2b_bn" 1102 | } 1103 | layer { 1104 | name: "block_4_4_branch2b_scale" 1105 | type: "Scale" 1106 | bottom: "block_4_4_branch2b_bn" 1107 | top: "block_4_4_branch2b_bn" 1108 | scale_param { 1109 | bias_term: true 1110 | } 1111 | } 1112 | layer { 1113 | name: "block_4_4_branch2b_relu" 1114 | type: "ReLU" 1115 | bottom: "block_4_4_branch2b_bn" 1116 | top: "block_4_4_branch2b_bn" 1117 | } 1118 | layer { 1119 | name: "block_4_4_branch2b_conv" 1120 | type: "Convolution" 1121 | bottom: "block_4_4_branch2b_bn" 1122 | top: "block_4_4_branch2b_conv" 1123 | convolution_param { 1124 | num_output: 640 1125 | bias_term: false 1126 | pad: 1 1127 | kernel_size: 3 1128 | stride: 1 1129 | } 1130 | } 1131 | layer { 1132 | name: "block_4_4_addition" 1133 | type: "Eltwise" 1134 | bottom: "block_4_3_addition" 1135 | bottom: "block_4_4_branch2b_conv" 1136 | top: "block_4_4_addition" 1137 | } 1138 | layer { 1139 | name: "bn5" 1140 | type: "BatchNorm" 1141 | bottom: "block_4_4_addition" 1142 | top: "bn5" 1143 | } 1144 | layer { 1145 | name: "scale5" 1146 | type: "Scale" 1147 | bottom: "bn5" 1148 | top: "bn5" 1149 | scale_param { 1150 | bias_term: true 1151 | } 1152 | } 1153 | layer { 1154 | name: "relu5" 1155 | type: "ReLU" 1156 | bottom: "bn5" 1157 | top: "bn5" 1158 | } 1159 | layer { 1160 | name: "pool5" 1161 | type: "Pooling" 1162 | bottom: "bn5" 1163 | top: "pool5" 1164 | pooling_param { 1165 | pool: AVE 1166 | global_pooling: true 1167 | } 1168 | } 1169 | layer { 1170 | name: "fc6" 1171 | type: "InnerProduct" 1172 | bottom: "pool5" 1173 | top: "fc6" 1174 | inner_product_param { 1175 | num_output: 10 1176 | } 1177 | } 1178 | layer { 1179 | name: 
"prob" 1180 | type: "Softmax" 1181 | bottom: "fc6" 1182 | top: "prob" 1183 | } 1184 |
--------------------------------------------------------------------------------
/WRN/wrn_deploy.py:
--------------------------------------------------------------------------------
from __future__ import division

import argparse
from six.moves import xrange

import caffe
from caffe import layers as L, params as P


# 4 layers in 1 block (pre-activation ordering):
# BatchNorm - Scale - ReLU - Convolution
def _block_4in1(major, minor, net, bottom, nout, pad, ks, stride):
    block_flag = 'block_{}_branch{}'.format(major, minor)
    bn_layer = '{}_bn'.format(block_flag)
    scale_layer = '{}_scale'.format(block_flag)
    relu_layer = '{}_relu'.format(block_flag)
    conv_layer = '{}_conv'.format(block_flag)

    net[bn_layer] = L.BatchNorm(bottom)
    net[scale_layer] = L.Scale(net[bn_layer], bias_term = True, in_place = True)
    net[relu_layer] = L.ReLU(net[scale_layer], in_place = True)
    net[conv_layer] = L.Convolution(net[relu_layer],
                                    num_output = nout, pad = pad,
                                    kernel_size = ks, stride = stride,
                                    bias_term = False)

    return net[conv_layer]


# block (residual unit)
#   branch2: 4in1 (3x3) - [Dropout] - 4in1 (3x3)
#   branch1: [4in1 (3x3)], only in the first block of each stage, where the
#            number of channels increases (and the spatial size may decrease)
#   output:  Eltwise sum of branch1 (or the identity shortcut) and branch2
def _block(flag, net, bottom, nout, has_branch1=False, increasing_dims=True, dropout=0.3):
    eltwise_layer = 'block_{}_addition'.format(flag)

    # stride 2 only in a projection block that also halves the spatial size
    stride = 1
    if has_branch1 and increasing_dims:
        stride = 2

    branch2a = _block_4in1(flag, '2a', net, bottom, nout, 1, 3, stride)
    if dropout > 0:
        dropout_layer = 'block_{}_dropout'.format(flag)
        net[dropout_layer] = L.Dropout(branch2a, dropout_ratio=dropout)
        branch2b = _block_4in1(flag, '2b', net, net[dropout_layer], nout, 1, 3, 1)
    else:
        branch2b = _block_4in1(flag, '2b', net, branch2a, nout, 1, 3, 1)

    if has_branch1:
        branch1 = _block_4in1(flag, '1', net, bottom, nout, 1, 3, stride)
        net[eltwise_layer] = L.Eltwise(branch1, branch2b)
    else:
        net[eltwise_layer] = L.Eltwise(bottom, branch2b)

    return net[eltwise_layer]


def construct_net(widening_factor, num_block_per_stage):
    net = caffe.NetSpec()

    net.data = L.Input(input_param = dict(shape = dict(dim = [1,3,32,32])))

    net.conv1 = L.Convolution(net.data, num_output = 16,
                              kernel_size = 3, stride = 1, pad = 1,
                              bias_term = False)

    # stage 1: 16k channels, projection shortcut without downsampling
    num_out = widening_factor * 16
    block_pre = _block('2_1', net, net.conv1, num_out, has_branch1=True, increasing_dims=False)
    for idx in xrange(2, num_block_per_stage + 1):
        flag = '2_{}'.format(idx)
        block_pre = _block(flag, net, block_pre, num_out)

    # stage 2: 32k channels, spatial size halved by the first block
    num_out = widening_factor * 32
    block_pre = _block('3_1', net, block_pre, num_out, has_branch1=True)
    for idx in xrange(2, num_block_per_stage + 1):
        flag = '3_{}'.format(idx)
        block_pre = _block(flag, net, block_pre, num_out)

    # stage 3: 64k channels, spatial size halved by the first block
    num_out = widening_factor * 64
    block_pre = _block('4_1', net, block_pre, num_out, has_branch1=True)
    for idx in xrange(2, num_block_per_stage + 1):
        flag = '4_{}'.format(idx)
        block_pre = _block(flag, net, block_pre, num_out)

    net.bn5 = L.BatchNorm(block_pre)
    net.scale5 = L.Scale(net.bn5, bias_term = True, in_place=True)
    net.relu5 = L.ReLU(net.scale5, in_place = True)
    net.pool5 = L.Pooling(net.relu5, pool = P.Pooling.AVE, global_pooling=True)

    net.fc6 = L.InnerProduct(net.pool5, num_output = 10)
    net.prob = L.Softmax(net.fc6)

    return net.to_proto()


def main(args):
    num_block_per_stage = (args.depth - 4) // 6

    file_name = 'wrn_{}_{}_deploy.prototxt'.format(args.depth, args.wfactor)
    net_name = 'name: "WRN-{}-{}_deploy"\n'.format(args.depth, args.wfactor)

    with open(file_name, 'w') as f:
        f.write(net_name)
        f.write(str(construct_net(args.wfactor, num_block_per_stage)))


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('depth', type = int,
                        help = 'network depth; must be 6n+4 (e.g., 16, 22, 28, 40 in the paper)')
    parser.add_argument('wfactor', type = int,
                        help = 'widening factor k; multiplies the number of feature maps in the conv layers')
    args = parser.parse_args()
    # reject depths that are not 6n+4 (n >= 1) instead of silently truncating
    if args.depth < 10 or (args.depth - 4) % 6 != 0:
        parser.error('depth must be 6n+4 with n >= 1, got {}'.format(args.depth))
    if args.wfactor < 1:
        parser.error('widening factor must be a positive integer')

    main(args)

--------------------------------------------------------------------------------
/install_Caffe_CentOS7.sh:
--------------------------------------------------------------------------------
#! /bin/bash

# set installation location
CAFFE_INSTALL_ROOT=$(pwd)
echo "$CAFFE_INSTALL_ROOT"

# use sudo unless the script is already running as root
SUPER_USER="root"
CURRENT_USER=$(whoami)
if [ "$CURRENT_USER" == "$SUPER_USER" ]; then
    SUDO_OR_NOT=""
else
    SUDO_OR_NOT="sudo"
fi

# install EPEL and update all packages to their latest versions (kernel excluded)
$SUDO_OR_NOT yum -y install epel-release
$SUDO_OR_NOT yum clean all
$SUDO_OR_NOT yum -y update --exclude='kernel*'

# install development tools
$SUDO_OR_NOT yum -y install autoconf automake cmake gcc gcc-c++ libtool make pkgconfig unzip
#$SUDO_OR_NOT yum -y install redhat-rpm-config rpm-build rpm-sign
$SUDO_OR_NOT yum -y install python-devel python-pip

# install prerequisites for Caffe
$SUDO_OR_NOT yum -y install boost-devel glog-devel gflags-devel hdf5-devel leveldb-devel
$SUDO_OR_NOT yum -y install lmdb-devel openblas-devel opencv-devel protobuf-devel snappy-devel

# \$ defers expansion to login time; ${VAR:+:$VAR} avoids a trailing ':' when VAR is unset
echo "export LD_LIBRARY_PATH=/usr/local/lib:/usr/lib64\${LD_LIBRARY_PATH:+:\$LD_LIBRARY_PATH}" >> ~/.bash_profile
source ~/.bash_profile

# get Caffe
cd "$CAFFE_INSTALL_ROOT"
wget https://github.com/BVLC/caffe/archive/master.zip
unzip -o master.zip
mv caffe-master caffe

# prepare the Python bindings (pycaffe); packages come from the Tsinghua PyPI mirror
#pip install --upgrade pip
#pip install --upgrade setuptools
$SUDO_OR_NOT pip install -i https://pypi.tuna.tsinghua.edu.cn/simple --upgrade pip
$SUDO_OR_NOT pip install -i https://pypi.tuna.tsinghua.edu.cn/simple --upgrade setuptools
cd "$CAFFE_INSTALL_ROOT/caffe/python"
# major and minor version digits, e.g. "Python 2.7.5" -> 27
python_version=$(python --version 2>&1 | sed 's/.* \([0-9]\).\([0-9]\).*/\1\2/')
if [ "$python_version" -lt "33" ]; then
    # IPython 6.0+ does not support Python 2.6, 2.7, 3.0, 3.1, or 3.2;
    # line 6 of requirements.txt is the ipython requirement
    sed -i '6s/.*/ipython>=3.0.0,<6.0.0/' requirements.txt
fi
for req in $(cat requirements.txt)
#do sudo pip install $req
do $SUDO_OR_NOT pip install -i https://pypi.tuna.tsinghua.edu.cn/simple "$req"
done

echo "export PYTHONPATH=$(pwd)\${PYTHONPATH:+:\$PYTHONPATH}" >> ~/.bash_profile
source ~/.bash_profile

# modify Makefile.config
# (the sed edits below are line-number based and assume the layout of
#  Makefile.config.example in the downloaded Caffe master)
cd "$CAFFE_INSTALL_ROOT/caffe"
cp -f Makefile.config.example Makefile.config
# use CPU only
sed -i '8s/.*/CPU_ONLY := 1/' Makefile.config # CPU only
# use GPU
#sed -i '5s/.*/USE_CUDNN := 1/' Makefile.config # use cuDNN
#sed -i '29s/.*/CUDA_DIR := \/usr\/local\/cuda/' Makefile.config
sed -i '50s/.*/BLAS := open/' Makefile.config # use OpenBLAS
sed -i '54s/.*/BLAS_INCLUDE := \/usr\/include\/openblas/' Makefile.config
sed -i '55s/.*/BLAS_LIB := \/usr\/lib64/' Makefile.config
# NumPy include directory (.../numpy/core/include); keep only the first match
numpy_include_path=$(dirname "$(dirname "$(find / -name "arrayobject.h" 2>/dev/null | head -n 1)")")
# '#' is used as the sed delimiter because the path contains '/'
sed -i "69s#.*# ${numpy_include_path}#" Makefile.config
sed -i '91s/.*/WITH_PYTHON_LAYER := 1/' Makefile.config # compile Python layer

# compile Caffe and pycaffe, using all available processing units
NUMBER_OF_CORES=$(nproc)
make all -j"$NUMBER_OF_CORES"
make pycaffe -j"$NUMBER_OF_CORES"
make test
make runtest
make distribute

# When the script finishes, run "source ~/.bash_profile" or start a new shell
# so that "python -c 'import caffe'" works.

--------------------------------------------------------------------------------
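The stage arithmetic in `wrn_deploy.py` (depth = 6n+4, widths 16k/32k/64k, stride 2 only in the first block of the last two stages) can be checked without a Caffe install. The sketch below is a hypothetical helper, not part of this repository: `wrn_layout`, `count_weights` and `spatial_sizes` are illustrative names, and the parameter count covers only Convolution and InnerProduct weights (BatchNorm and Scale parameters are left out).

```python
# Hypothetical helper (not part of this repository): mirrors the layout that
# wrn_deploy.py generates, for quick sanity checks without installing Caffe.

def wrn_layout(depth, k):
    """Return (n, stages) for WRN-depth-k.

    n is the number of residual blocks per stage; each stage is a tuple
    (in_channels, width, first_stride).
    """
    if depth < 10 or (depth - 4) % 6 != 0:
        raise ValueError("depth must be 6n+4 with n >= 1, got {}".format(depth))
    n = (depth - 4) // 6
    stages = []
    in_ch = 16  # conv1 always outputs 16 channels
    for base, stride in ((16, 1), (32, 2), (64, 2)):
        width = base * k
        stages.append((in_ch, width, stride))
        in_ch = width
    return n, stages


def count_weights(depth, k, num_classes=10):
    """Count Convolution and InnerProduct parameters (BatchNorm/Scale excluded)."""
    n, stages = wrn_layout(depth, k)
    total = 3 * 16 * 3 * 3  # conv1: 3x3 kernel, 3 -> 16 channels, no bias
    for in_ch, width, _ in stages:
        total += 9 * in_ch * width                # branch2a of the first block
        total += 9 * width * width                # branch2b of the first block
        total += 9 * in_ch * width                # branch1: 3x3 projection shortcut
        total += (n - 1) * 2 * 9 * width * width  # two 3x3 convs in each later block
    last_width = stages[-1][1]
    total += last_width * num_classes + num_classes  # fc6 weights + bias
    return total


def spatial_sizes(input_size=32):
    """Feature-map side length after each stage (3x3 conv, pad 1)."""
    _, stages = wrn_layout(10, 1)  # strides do not depend on depth or k
    sizes = []
    size = input_size
    for _, _, stride in stages:
        size = (size + 2 * 1 - 3) // stride + 1  # floor-based conv output size
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    n, stages = wrn_layout(28, 10)
    print("blocks per stage:", n)
    for in_ch, width, stride in stages:
        print("in={:4d} out={:4d} first stride={}".format(in_ch, width, stride))
    print("feature map sizes:", spatial_sizes())
    print("conv + fc parameters:", count_weights(28, 10))
```

For WRN-28-10 this gives 4 blocks per stage with widths 160, 320 and 640 and feature maps of 32, 16 and 8 pixels per side, which lines up with the `block_2_*`, `block_3_*` and `block_4_*` layers in `wrn_28_10_deploy.prototxt`. Because the shortcut (branch1) in this script is a 3x3 convolution, replacing it with a 1x1 projection would cut the shortcut weights by a factor of nine.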