# Object Detection Summary
## 1. SPP-Net
**Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition** [[Paper](https://arxiv.org/pdf/1406.4729.pdf)] [[Code](https://github.com/ShaoqingRen/SPP_net)]: Kaiming He et al.

#### Net structure
Main structure: ZF + Spatial Pyramid Pooling layer

Spatial Pyramid Pooling layer: SPP extends the BoW (Bag of Words) model by **partitioning the image from coarse to fine granularity and then aggregating the local features (a common practice before CNNs became popular).**

SPP-net replaces the pooling layer after the last conv layer with an SPP layer, which outputs a kM-dimensional vector at each granularity (k: number of filters; M: number of bins at that granularity). The coarsest granularity outputs a single bin, i.e. global pooling; global average pooling can be used to shrink the model, reduce overfitting, and improve accuracy.

**Key property**: it accepts input images of arbitrary size and aspect ratio while producing a fixed-length output vector.<br>

![image1](pic/Selection_048.png)

#### Model Training
**Single-size training**<br>
Input: fixed 224 x 224 crops<br>
conv5 feature map output: a x a (e.g. 13 x 13)<br>
SPP: n x n bins via sliding-window pooling with win = ceil(a/n) and stride = floor(a/n)<br>
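
A minimal NumPy sketch of this window arithmetic, assuming a 13x13 conv5 output with 256 filters and the 4-level pyramid used later for detection (shapes here are illustrative, not the paper's code):

```python
import math
import numpy as np

def spp_level(feature_map, n):
    """Max-pool an (a, a, k) feature map into n x n bins (one pyramid level)."""
    a = feature_map.shape[0]
    win, stride = math.ceil(a / n), math.floor(a / n)
    bins = [feature_map[i*stride:i*stride+win, j*stride:j*stride+win].max(axis=(0, 1))
            for i in range(n) for j in range(n)]
    return np.concatenate(bins)          # n*n*k values for this level

# 4-level pyramid {6x6, 3x3, 2x2, 1x1} on a 13x13 conv5 map with 256 filters
conv5 = np.random.rand(13, 13, 256)
vec = np.concatenate([spp_level(conv5, n) for n in (6, 3, 2, 1)])
print(vec.shape)                         # (12800,) -- fixed length for any input size
```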

**Multi-size training**<br>
Input: 224 x 224 and 180 x 180 (the 180 x 180 images are produced by resizing the 224 x 224 ones)<br>
Since the SPP output length depends only on the pyramid configuration and not on the input size, the networks for 224 and 180 inputs can share all their weights. Training therefore alternates: train one network for a full epoch, keep the weights, train the other network for an epoch, and iterate.

#### SPP-Net for object detection
Unlike R-CNN, which crops proposals from the input image, SPP-Net operates on the feature map, so convolutions are not recomputed for every proposal; this makes it over 100x faster (the mechanism is similar to Fast R-CNN).<br>

![image2](pic/Selection_049.png)

Selective Search extracts 2000 proposals. The image is resized so that min(w, h) = s ∈ S = {480, 576, 688, 864, 1200} before features are extracted. Each candidate proposal is pooled with a 4-level spatial pyramid (1x1, 2x2, 3x3, 6x6, 50 bins in total), and the resulting 256*50 = 12800-d vector is fed to the fc layers, with SVMs as the final classifiers. Positives are the ground-truth bboxes; negatives are boxes with IoU < 0.3 against a positive (negatives with IoU > 0.7 against another sample are removed).

![image3](pic/Selection_050.png)


---
## 2. DeepID-Net
**DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection** [[Paper](https://arxiv.org/pdf/1412.5661.pdf)]: Xiaoou Tang et al.<br>
An extension of R-CNN: box pre-training, cascade on region proposals, deformation layers and context representations.
#### Network structure
**Method:** Selective Search extracts proposals; an R-CNN rejects proposals that are likely background; the cropped bbox image goes into DeepID-Net, which outputs per-class confidences (200 classes); classification scores over the full image (1000 classes) serve as contextual information to refine the bbox classification scores; the outputs of multiple deep models are averaged to raise detection accuracy; R-CNN-style bbox regression is applied at the end.

![image4](pic/Selection_043.png)

**def-pooling layer**: a conv layer outputs C part detection maps of size W x H. Let Mc denote the c-th part detection map and mc(i,j) its (i,j)-th element. The def-pooling layer takes a (2R+1) x (2R+1) region of Mc centered at (sx·x, sy·y) and produces an output of size (W/sx, H/sy) as follows:

![image5](pic/Selection_045.png)

Here m_c,z(δx,δy) is the visual score of placing the c-th part at the deformed position z(δx,δy); a_c,n and d_c,n(δx,δy) are learned by backpropagation; Σ_{n=1..N} a_c,n · d_c,n(δx,δy) is the penalty for moving the part from its assumed anchor (sx·x, sy·y) to z(δx,δy).<br>
When N = 1, a_1 = 1, d_1(δx,δy) = 0 for |δx|, |δy| ≤ k and d_1(δx,δy) = ∞ for |δx|, |δy| > k, the def-pooling layer is equivalent to max-pooling with kernel size k.<br>
![image6](pic/Selection_044.png)
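
The following toy NumPy sketch (hypothetical shapes and penalty maps, not the paper's implementation) evaluates the def-pooling response at a single anchor and checks the max-pooling special case above:

```python
import numpy as np

def def_pool(M, center, R, a, d):
    """Response at one anchor: max over the (2R+1)^2 window of
    visual score minus the summed deformation penalty sum_n a[n]*d[n]."""
    cy, cx = center
    scores = [M[cy + dy, cx + dx] - sum(an * dn[dy + R, dx + R] for an, dn in zip(a, d))
              for dy in range(-R, R + 1) for dx in range(-R, R + 1)]
    return max(scores)

# N = 1, a_1 = 1, penalty 0 within radius k and infinity outside:
R, k = 2, 1
dist = np.abs(np.mgrid[-R:R + 1, -R:R + 1]).max(axis=0)     # Chebyshev distance
d1 = np.where(dist <= k, 0.0, np.inf)
M = np.random.rand(9, 9)
assert def_pool(M, (4, 4), R, [1.0], [d1]) == M[3:6, 3:6].max()  # = max-pooling
```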

**Advantages**: 1. It can replace a conv layer and learn deformation parts of different sizes and semantic meanings: e.g. large parts (upper body) at high levels, mid-level parts (head) in the middle, and small parts (mouth) at low levels. 2. A def-pooling layer can detect several deformable parts that share the same visual cue at once (as shown above), and the visual patterns it learns can be shared across classes.

**Contextual modeling**: concatenate the 1000-class scores as contextual scores with the 200-class scores to form a 1200-d feature vector.

**Multiple deep models**: models trained with different structures, pretraining schemes, loss functions (hinge loss), with or without def-pooling layers, and with or without bbox rejection.



---
## 3. YOLO
**You Only Look Once: Unified, Real-Time Object Detection** [[Paper](https://arxiv.org/pdf/1506.02640.pdf)] [[Code](https://github.com/pjreddie/darknet)]: RGB (Ross Girshick) et al.<br>
Other references: [TensorFlow YOLO object detection on Android](https://github.com/natanielruiz/android-yolo)

**Key idea**: extract features from the whole image to predict all bboxes of all classes at once, which allows end-to-end training and real-time detection.
#### Network structure
**Principle**: the image is divided into an S x S grid; if an object's center falls inside a cell, that cell is responsible for detecting it. Each cell predicts B bboxes plus a confidence score for each, Pr(Object) * IOU(pred, truth), which reflects both how likely the cell contains an object and how accurate the coordinates are. Each bbox prediction has 5 values: (x, y, w, h, confidence). Each cell also predicts C conditional class probabilities Pr(Class_i | Object). These are computed only when the cell contains an object, and **the class probabilities are predicted once per cell, independent of the number of bboxes (B)**. At test time, multiplying the conditional class probabilities by the per-bbox confidences gives a per-class detection score for every bbox (this score encodes both the predicted class and the coordinate accuracy, as below).<br>
![image7](pic/Selection_052.png)
On Pascal VOC, S = 7, B = 2, C = 20, so the final prediction is a 7 x 7 x (5 x 2 + 20) tensor.<br>
![image8](pic/Selection_051.png)
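
A small sketch of how such a tensor decodes at test time (random values stand in for real network output):

```python
import numpy as np

S, B, C = 7, 2, 20
pred = np.random.rand(S, S, B * 5 + C)                 # the 7x7x30 output tensor

boxes = pred[..., :B * 5].reshape(S, S, B, 5)          # (x, y, w, h, confidence) per box
class_prob = pred[..., B * 5:]                         # Pr(class_i | object), once per cell

# class-specific confidence = Pr(class_i | object) * Pr(object) * IOU
scores = class_prob[:, :, None, :] * boxes[..., 4:5]   # (7, 7, 2, 20)
print(scores.reshape(-1, C).shape)                     # 98 boxes -> thresholding + NMS
```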

**Network architecture**:
24 conv + 2 fc; the fast version uses 9 conv + 2 fc.<br>
![image9](pic/Selection_053.png)

#### Model Training
The first 20 conv layers above + average pooling + fc are first pretrained for **classification** on the 1000-class ImageNet (top-5: 88%) with 224x224 inputs. Then 4 conv + 2 fc layers (with randomly initialized weights) are added and the input resolution is raised to 448x448 to train for **detection**. As **preprocessing**, bbox width and height are normalized to 0-1 by the image size, while x and y are parameterized to 0-1 relative to their grid cell.

The last layer uses a linear activation; all other layers use Leaky ReLU:<br>
![image10](pic/Selection_056.png)

Each cell predicts multiple bboxes, but during training we want only one bbox predictor per object. The predictor whose prediction has the highest IOU with the object's ground truth becomes responsible for predicting that object.

**Loss function**<br>
![image11](pic/Selection_054.png)

Here λcoord = 5 and λnoobj = 0.5, which increase the loss on bbox coordinates and decrease the confidence loss of cells that contain no object (most cells contain none). To make coordinate errors on small bboxes matter more than on large ones, the square roots of w and h are used. 1_i^obj indicates whether cell i contains an object (1 if so); 1_ij^obj indicates that the j-th bbox predictor in cell i is responsible for the object (classification error is counted only when the cell contains an object; coordinate error only when the predictor is responsible for the object).

Dataset: VOC2007 + VOC2012

#### Inference
98 (7x7x2) bboxes + NMS

#### Limitation
- Each cell has only 2 bboxes and a single class, so clusters of small objects are hard to detect.
- Generalization to bboxes of new sizes or unusual aspect ratios is poor; localization error is larger than Faster R-CNN's.
- Errors of small and large bboxes are treated the same way, although an error on a small bbox hurts IoU much more.



---
## 4. YOLOv2 (YOLO9000)
**YOLO9000: Better, Faster, Stronger** [[Paper](https://arxiv.org/pdf/1612.08242.pdf)] [[Code](https://github.com/allanzelener/YAD2K)]

#### Improvement
1). Add **BN** (Batch Normalization) after every conv layer;

2). Train the **classification** model with 448x448 inputs as well (raising the training resolution), then fine-tune it for detection;

3). **Anchor boxes**<br>
Anchor boxes replace the fully connected layers after the conv layers, and each anchor box predicts its own class and objectness (in YOLO each grid cell did this): 98 bboxes in YOLO --> >1000 (13x13x9) bboxes in YOLOv2.

Instead of hand-picking anchor box sizes, k-means clustering is used, with the distance function d(box, centroid) = 1 − IOU(box, centroid) rather than Euclidean distance.<br>
![image12](pic/Selection_065.png)
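
A compact sketch of this clustering; `wh` is random stand-in data for the normalized ground-truth (w, h) pairs:

```python
import numpy as np

def iou_wh(wh, centroids):
    """IoU between (w, h) shapes, as if aligned at a common corner."""
    inter = np.minimum(wh[:, None, 0], centroids[None, :, 0]) * \
            np.minimum(wh[:, None, 1], centroids[None, :, 1])
    union = wh.prod(1)[:, None] + centroids.prod(1)[None, :] - inter
    return inter / union

def anchor_kmeans(wh, k, iters=100):
    centroids = wh[np.random.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(1 - iou_wh(wh, centroids), axis=1)   # d = 1 - IOU
        centroids = np.stack([wh[assign == i].mean(0) if (assign == i).any()
                              else centroids[i] for i in range(k)])  # keep empty clusters
    return centroids

wh = np.random.rand(1000, 2)          # ground-truth box shapes, normalized
print(anchor_kmeans(wh, k=5))         # five anchor shapes, as chosen in the paper
```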

4). Coordinates are predicted relative to the grid cell: ground-truth bboxes are bounded to 0-1, and the predicted values are also constrained to 0-1 by a logistic activation. pw and ph are the width and height of the preset anchor box.

![image13](pic/Selection_063.png)

![image14](pic/Selection_057.png)

5). Finer-grained features to capture small objects (the **passthrough layer**): the layer before the final 13x13 feature map (26x26x512) is reshaped to 13x13x2048 (similar to identity mappings in ResNet) and concatenated with the final layer.

6). Since YOLOv2 contains only conv and pooling layers and no fc layers, it accepts inputs of **arbitrary size**. Every 10 batches a new input size is sampled from {320, 352, ..., 608}.

7). **Darknet** replaces VGG as the classification backbone of YOLOv2, cutting computation and raising speed. For detection, the last conv layer of Darknet is replaced with a 3x3 conv layer (1024 filters) + a 1x1 conv layer, and a passthrough layer connects the last 3x3x512 layer (feature map 14x14) with the final 3x3x1024 layer (feature map 7x7).

![image15](pic/Selection_061.png)

8). Hierarchical classification
A WordTree is used to predict the conditional probability at every node: e.g. an object labeled terrier is also labeled dog and mammal. During training/prediction all intermediate WordTree nodes are added to the label set (1000 -> 1369), and the joint probability is computed along the path.

![image16](pic/Selection_060.png)

![image17](pic/Selection_058.png)

![image18](pic/Selection_059.png)


---
## 5. AttentionNet
**AttentionNet: Aggregating Weak Directions for Accurate Object Detection** [[Paper](https://arxiv.org/pdf/1506.07704.pdf)]

![image19](pic/Selection_066.png)

---
## 6. DenseBox
**DenseBox: Unifying Landmark Localization with End to End Object Detection** [[Paper](https://arxiv.org/pdf/1509.04874.pdf)]: Baidu

#### Model structure
The first 12 conv layers of VGG19 are used; an up-sampling layer is added after conv4-4 and combined with conv3-4, after which the network forks into two branches that predict, respectively, the top-left and bottom-right corners of the bbox and the probability that an object is present: t_i = {s, dx^t = x_i − x_t, dy^t = y_i − y_t, dx^b = x_i − x_b, dy^b = y_i − y_b} (the first element is the confidence score, the rest are distances between the output coordinates and the ground-truth bbox corners).

![image20](pic/Selection_067.png)

**Loss function**<br>
![image21](pic/Selection_072.png)

![image22](pic/Selection_073.png) (this formula appears to be wrong)

![image23](pic/Selection_069.png)

![image24](pic/Selection_068.png)

#### Landmark localization
![image25](pic/Selection_070.png)

![image26](pic/Selection_071.png)<br>
L_lm: L2 loss between predicted values and labels; L_rf is the same as L_cls.


---
## 7. SSD
**SSD: Single Shot MultiBox Detector** [[Paper](https://arxiv.org/pdf/1512.02325.pdf)] [[Code](https://github.com/weiliu89/caffe/tree/ssd)]

**Key idea**: a set of preset default boxes of fixed sizes and aspect ratios predicts per-class scores and box offsets (similar to Faster R-CNN's anchor boxes); **feature maps of different resolutions handle objects of different sizes**; there is **no proposal generation** and no subsequent feature resampling at all, which makes training easy. It is fast: 59 fps on a Titan X with 74.3% mAP on 300x300 images on VOC2007.

#### Model structure
VGG16 is the base, followed by several conv layers of decreasing size that serve as feature maps at different scales for detection (**lower layers have small receptive fields suited to small objects; higher layers have large receptive fields suited to large objects**). On each m x n x p feature map, 3 x 3 x p kernels produce either a class score or a coordinate offset relative to a default box.

![image27](pic/Selection_079.png)

**Default box sizes and aspect ratios**<br>
Each feature-map location has (c+4)*k filters, where c is the number of classes, k the number of default boxes per location, and 4 the box shape offsets, so each feature map has (c+4)kmn outputs in total.

![image28](pic/Selection_074.png)

#### Model training
1). Like YOLO and Faster R-CNN, SSD first has to project the ground-truth bboxes onto **every** feature map.
It must be decided which default boxes are responsible for which ground truth: each ground-truth bbox is first matched to the default box with the highest IoU (jaccard overlap), and then any default box whose IoU with some ground truth exceeds 0.5 is also matched. The network can therefore score multiple overlapping default boxes highly instead of only the single best one, as in the sketch below.
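
A sketch of this matching step, taking a precomputed prior-vs-ground-truth IoU matrix as input (the 8732 priors match SSD300's count, but the values here are random):

```python
import numpy as np

def match_priors(iou, threshold=0.5):
    """iou: (num_priors, num_gt). Returns positive mask and matched gt index."""
    matched = iou.argmax(axis=1)                 # best ground truth per prior
    positive = iou.max(axis=1) > threshold       # rule 2: any IoU > 0.5 counts
    best_prior = iou.argmax(axis=0)              # rule 1: best prior per ground truth
    positive[best_prior] = True
    matched[best_prior] = np.arange(iou.shape[1])
    return positive, matched

iou = np.random.rand(8732, 3)                    # 8732 priors vs 3 ground-truth boxes
positive, matched = match_priors(iou)
print(positive.sum(), "positive priors")
```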

**Loss**:<br>
![image29](pic/Selection_076.png)

![image30](pic/Selection_077.png)

![image31](pic/Selection_078.png)

Here N is the number of matched default boxes. The weight α is set to 1, and x_ijk ∈ {1, 0} indicates whether the i-th default box is matched to the j-th ground-truth bbox of class k.

Default boxes do not have to correspond to the actual receptive field of each layer; **the set of default boxes is designed so that particular feature maps correspond to objects of particular sizes.**<br>
The default box scale for each feature map is computed as:<br>
![image32](pic/Selection_080.png)

Here Smin = 0.2 and Smax = 0.9, and m is the number of feature maps, i.e. the lowest layer has scale 0.2 and the highest 0.9.

The aspect ratios are ar ∈ {1, 2, 3, 1/2, 1/3}, giving box width wk^a = Sk*sqrt(ar) and height hk^a = Sk/sqrt(ar).
For ar = 1 an extra box with scale Sk' = sqrt(Sk*Sk+1) is added; each box is centered at ((i+0.5)/|fk|, (j+0.5)/|fk|), where fk is the size of the k-th feature map.

With this design, the 4x4 feature map in Fig. 1 has matching default boxes while the 8x8 one does not.

2). Hard negative mining

3). Data augmentation
- the whole image
- patches with IoU ∈ {0.1, 0.3, 0.5, 0.7, 0.9} against an object box
- random patches

Patch sizes range from 0.1 to 1 of the original image with aspect ratios from 0.5 to 2. Patches are then resized to a fixed size, horizontally flipped with probability 0.5, and subjected to some photo-metric distortions.
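
Putting the scale, aspect-ratio, and center rules above together, a sketch that generates all default boxes; for simplicity it uses all six shapes at every location, whereas the released SSD300 drops two of them on some layers (giving its 8732 priors):

```python
import numpy as np

def default_boxes(fmap_sizes, s_min=0.2, s_max=0.9, ratios=(1, 2, 3, 1/2, 1/3)):
    """All default boxes as (cx, cy, w, h), normalized to [0, 1]."""
    m = len(fmap_sizes)
    scales = [s_min + (s_max - s_min) * k / (m - 1) for k in range(m)] + [1.0]
    out = []
    for k, fk in enumerate(fmap_sizes):
        shapes = [(scales[k] * np.sqrt(r), scales[k] / np.sqrt(r)) for r in ratios]
        shapes.append((np.sqrt(scales[k] * scales[k + 1]),) * 2)  # extra ar = 1 box
        for i in range(fk):
            for j in range(fk):
                cx, cy = (j + 0.5) / fk, (i + 0.5) / fk
                out += [(cx, cy, w, h) for w, h in shapes]
    return np.asarray(out)

print(default_boxes([38, 19, 10, 5, 3, 1]).shape)   # ~11.6k boxes at 6 per location
```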


---
## 8. DSSD
**DSSD: Deconvolutional Single Shot Detector** [[Paper](https://arxiv.org/pdf/1701.06659.pdf)]

**Key idea**: SSD + ResNet-101 with deconvolutional layers (**to capture high-level context**).

#### Model structure
Merely swapping VGG16 for ResNet-101 actually drops mAP from 77.5% to 76.4%, but with the adjustments below the accuracy improves significantly.<br>
![image33](pic/Selection_081.png)

**Prediction module**: a ResNet block is inserted after each feature map before prediction, instead of SSD's direct prediction from the conv layer through an added L2-normalization layer; with this change the accuracy clearly surpasses the VGG16 version.<br>
![image34](pic/Selection_082.png)

**Deconvolutional SSD**: deconv layers are added on top of SSD to capture high-level context. (**Drawbacks: slower training and inference, and no pre-trained model is available for the added layers.**)<br>
![image35](pic/Selection_083.png)

Here "elw product" means element-wise product, used to combine each original conv layer with its corresponding deconv layer.

#### Model training
**Loss**: joint localization loss (smooth L1) + confidence loss (softmax).

Since there is no R-CNN-style resampling step, data augmentation is needed.

K-means clustering is used to choose the default box aspect ratios (**as in YOLOv2**); 7 clusters work best, and the ratio 1.6 (and 1/1.6) is added.

#### FPS
![image36](pic/Selection_084.png)

---
## 9. Inside-Outside Net (ION)
**Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks** [[Paper](https://arxiv.org/pdf/1512.04143.pdf)]: Microsoft, RGB (Ross Girshick)

**Key idea**: contextual information is integrated at prediction time via spatial RNNs (the ION structure); features at several scales are extracted from each ROI with skip pooling; the two are concatenated as the network input.

![image37](pic/Selection_086.png)
#### Model Structure
![image38](pic/Selection_085.png)

2k ROIs from the raw image (as in R-CNN) --> conv3, conv4, conv5 + context features --> L2 normalization, concatenation, re-scaling, dimension reduction (1x1 conv) --> 512x7x7 (**fixed by VGG16**)<br>
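
A shape-level sketch of this skip-pooling path; the channel counts follow VGG16's conv3/4/5, while the re-scale value and the 512-channel context stack are assumptions:

```python
import numpy as np

def l2norm(x, scale):
    """Channel-wise L2 normalization at each spatial position, then re-scale."""
    return scale * x / np.sqrt((x ** 2).sum(axis=0, keepdims=True) + 1e-12)

# ROI-pooled 7x7 features: conv3 (256 ch), conv4 (512), conv5 (512), context (512)
feats = [np.random.rand(c, 7, 7) for c in (256, 512, 512, 512)]
stacked = np.concatenate([l2norm(f, scale=10.0) for f in feats], axis=0)

w = np.random.randn(512, stacked.shape[0]) * 0.01   # 1x1 conv = per-position matmul
reduced = np.einsum('oc,chw->ohw', w, stacked)      # back to 512x7x7 for VGG16's fc6
print(reduced.shape)
```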
Because features from several layers are combined, the initial weights of the 1x1 conv layer must be kept small; **Xavier initialization** is used.

**Context features with IRNNs**<br>
![image39](pic/Selection_087.png)

The RNN reads the input sequence in 4 directions: up, down, left, right. The RNN used here is the **ReLU RNN** (with the recurrent weight matrix initialized to the **identity matrix**, so gradients pass through backpropagation intact; it can work as well as an LSTM while needing far less memory and compute). (Common RNN variants: GRU, LSTM, plain tanh recurrent nets.)

A **1x1 conv** serves as the input-to-hidden transition because it can be shared across all directions; the bias can be shared the same way and folded into the 1x1 conv layer. The IRNN output then concatenates the hidden states of the 4 directions.

The left-to-right IRNN update is as follows (the other directions are analogous):<br>
![image40](pic/Selection_089.png)

When the hidden-transition matrix is fixed to the identity, the update simplifies to:
![image41](pic/Selection_090.png)

In each direction all independent rows/columns are computed in parallel rather than one RNN cell at a time. Semantic labels can be used to regularize the IRNN output, which requires an extra deconv layer (a 32x32 kernel for 16x upsampling) plus cropping. Experiments show that **no dropout** is needed when concatenating the layer outputs, and no bias is needed when training the RNN.

The feature map output by the first 4-direction IRNN summarizes the features around each cell; the following 1x1 conv layers mix this information and reduce its dimension. After the second IRNN, each cell's output depends on both its local input and the global summary, so the context feature carries **global and local** information and varies with position.<br>
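
A sketch of one directional sweep with the simplified identity-recurrence update (input shapes are arbitrary):

```python
import numpy as np

def irnn_sweep(x):
    """Left-to-right IRNN with the recurrent matrix fixed to identity:
    h[:, j] = relu(h[:, j-1] + x[:, j]). Rows are independent, so each
    step updates a whole column of the (H, W, C) map in parallel."""
    h = np.zeros_like(x)
    h[:, 0] = np.maximum(x[:, 0], 0)
    for j in range(1, x.shape[1]):
        h[:, j] = np.maximum(h[:, j - 1] + x[:, j], 0)
    return h

x = np.random.randn(32, 32, 128)      # output of the shared 1x1 input-to-hidden conv
context = irnn_sweep(x)               # one of four directions; concatenate all four
print(context.shape)
```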
![image42](pic/Selection_088.png)


---
# 10. R-FCN
**R-FCN: Object Detection via Region-based Fully Convolutional Networks** [[Paper](https://arxiv.org/pdf/1605.06409.pdf)] [[Code](https://github.com/daijifeng001/R-FCN)]: Kaiming He

**Key idea**: fully convolutional, computation shared over the entire image, position-sensitive score maps; may ignore global information.

![image43](pic/Selection_092.png)

#### Model structure
Region proposal (RPN) + region classification. The last feature map generates k^2 **position-sensitive** score maps (top-left, ..., bottom-right) for each class plus background, i.e. k^2(C+1) channels in total. The final layer of R-FCN is a position-sensitive ROI pooling layer that aggregates the conv outputs into per-ROI scores; unlike Fast R-CNN it uses selective pooling: each of the k x k bins pools from exactly one of the k x k groups of score maps.

![image44](pic/Selection_091.png)

![image45](pic/Selection_096.png)

**Base net**: ResNet-101 + 1x1 conv (2048-d -> 1024-d) + a k^2(C+1)-channel conv layer

**Position-sensitive score map/pooling**<br>
The ROI is divided into k x k bins of size w x h each; the score map of bin (i,j) is then pooled (**position-sensitive ROI pooling**).<br>
![image46](pic/Selection_094.png)

Here r is the pooled response of bin (i,j) for class c, z is one of the k^2(C+1) score maps, and n is the number of pixels in the bin (so this is average pooling).

The k^2 position-sensitive scores then vote for the ROI by averaging, yielding a (C+1)-d vector on which a softmax gives the classification score.<br>
![image47](pic/Selection_098.png)

![image48](pic/Selection_099.png)
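
A NumPy sketch of the selective pooling plus vote described above (integer bin boundaries are a simplification of the actual bin rounding):

```python
import numpy as np

def ps_roi_pool(score_maps, roi, k, C):
    """score_maps: (k*k*(C+1), H, W). Bin (i, j) average-pools only its own
    group of C+1 maps; the k*k bin responses are then averaged into one
    (C+1)-d vote for the softmax."""
    x0, y0, x1, y1 = roi
    xs = np.linspace(x0, x1 + 1, k + 1).astype(int)
    ys = np.linspace(y0, y1 + 1, k + 1).astype(int)
    vote = np.zeros(C + 1)
    for i in range(k):
        for j in range(k):
            g = (i * k + j) * (C + 1)
            vote += score_maps[g:g + C + 1, ys[i]:ys[i+1], xs[j]:xs[j+1]].mean(axis=(1, 2))
    return vote / (k * k)

maps = np.random.rand(3 * 3 * 21, 40, 60)            # k = 3, C = 20 (VOC)
print(ps_roi_pool(maps, roi=(10, 5, 39, 29), k=3, C=20))
```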

Similarly, a **4k^2-d** conv layer is added next to the k^2(C+1)-d conv layer for bbox coordinate regression; it outputs a 4k^2 vector that average voting collapses into a 4-d vector t = (tx, ty, tw, th). **Note that for simplicity the box regression here is class-agnostic; a class-specific regressor would need a 4k^2C conv layer.** (Nothing after the ROI layer has weights to learn, so its training cost is negligible.)

![image50](pic/Selection_093.png)
#### Training
**Loss**: IoU > 0.5 counts as positive<br>
![image49](pic/Selection_097.png)

**OHEM**

Input image: 600 (short side)

**Algorithme à trous**: improves mAP by 2.6%

#### Inference
300 ROIs, NMS with IoU = 0.3

# 11. RetinaNet (Focal loss)
**Focal Loss for Dense Object Detection**: [[Paper](https://arxiv.org/pdf/1708.02002.pdf)] [[Code](https://github.com/unsky/focal-loss)]: Kaiming He, Ross Girshick et al.

**Key idea:** a focal loss replaces the standard cross-entropy loss, preventing the accumulated loss of easily detected samples (easy negatives), caused by the extreme foreground-background class imbalance (e.g. 1:1000), from overwhelming the hard samples (hard negatives) and thereby dominating training.
The idea resembles OHEM, but it works directly on the loss instead of manipulating the training samples the way OHEM does.

#### Focal Loss
Focal loss is applied here to one-stage detection frameworks (YOLO, SSD, FPN, etc.); in two-stage frameworks (R-CNN + RPN) it should bring even better results.

Starting from the original cross-entropy ![image51](pic/Selection_137.png), and defining ![image52](pic/Selection_138.png), we get **CE(p, y) = CE(pt) = -log(pt)**. A weight α ∈ [0, 1] for class 1 and 1-α for class -1 accounts for sample imbalance in CE: **CE(pt) = -αt log(pt)**.

But α only balances the importance of positive vs. negative samples (e.g. their counts); it cannot distinguish **easy** from **hard** samples. So a modulating factor **(1-pt)^γ** with γ >= 0 is introduced, and the focal loss becomes ![image53](pic/Selection_139.png).
With γ = 0, FL = CE. Intuitively, **as pt approaches 1 the factor shrinks, and the loss of that easily classified sample shrinks with it**. With γ = 2 (in practice **γ = 2, α = 0.25 works best**) and pt = 0.9, the FL value is 100x smaller than the CE value; at pt = 0.968 it drops 1000x. Adding α on top of the modulating factor works slightly better than the factor alone: ![image54](pic/Selection_140.png). The gradient is ![image55](pic/Selection_144.png).

![image56](pic/Selection_136.png)

#### Class imbalance
At **model initialization**, a binary classifier normally assigns equal probability to the outputs y = -1/1; under severe sample imbalance, lowering the prior probability to p (e.g. 0.01) turns out to improve training stability.

Two-stage detectors usually do without α-balancing; they rely on the **two-stage cascade** and **biased minibatch sampling** (whose sampling ratio plays the same role as α-balancing). FL replaces these mechanisms in one-stage detection.

#### Retina detector
ResNet + FPN backbone with FCN heads for detection and classification, using levels P3-P7 with anchor sizes from 32x32 to 512x512.

![image57](pic/Selection_141.png)

#### Effect
![image58](pic/Selection_145.png)
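
The loss itself is tiny to write down; a binary-case sketch with the paper's γ = 2, α = 0.25:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """FL(pt) = -alpha_t * (1 - pt)^gamma * log(pt), labels y in {0, 1}."""
    pt = np.where(y == 1, p, 1 - p)
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - pt) ** gamma * np.log(pt)

p = np.array([0.9, 0.968, 0.5])
print(-np.log(p))                            # plain CE
print(focal_loss(p, np.ones(3, dtype=int)))  # easy positives down-weighted ~100-1000x
```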

# 12. CoupleNet
**CoupleNet: Coupling Global Structure with Local Parts for Object Detection**: [[Paper](https://arxiv.org/pdf/1708.02863.pdf)] [[Code](https://github.com/tshizys/CoupleNet)]: Hanqing Lu et al.

**Key idea:** on the proposals generated by an RPN, position-sensitive ROI pooling extracts local information and ordinary ROI pooling extracts global information; combining the two gives better detection. (Slightly slower than R-FCN, with roughly 3 points higher accuracy.)

#### Net Architecture
![image59](pic/Selection_161.png)

ResNet-101 is the backbone; the RPN generates candidate proposals, which then go through two branches: 1) a local part-sensitive FCN and 2) a global region-sensitive FCN. The outputs of the two branches are finally coupled into the object score.

**Local FCN**<br>
Same as R-FCN: a 1x1 conv on the candidate proposal produces K^2(C+1) channels, and voting over the K^2 bins gives the final per-class score.

**Global FCN**<br>
As in Faster R-CNN, a 1024-d 1x1 conv layer is added after conv4 for dimension reduction, followed by an ROI pooling layer; a k x k kernel and a 1x1 conv layer finally output a (C+1)-d vector. To capture global structure better, features are also extracted from a context region twice the proposal's area and concatenated directly with the proposal features before entering the ROI-wise sub-network.

**Coupling structure**<br>
The features from the global and local branches must first be **normalized** to comparable magnitudes (via an L2-Norm layer or a 1x1 conv layer) and then coupled: element-wise sum, element-wise product, or element-wise maximum. The best tested combination is **1x1 conv + element-wise sum**.

![image60](pic/Selection_162.png)
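
A toy sketch of that coupling step; the 1x1 conv weights here are identity placeholders, whereas in the network they are learned:

```python
import numpy as np

def couple(local_vote, global_vote, w_local, w_global):
    """Normalize each branch with its own 1x1 conv, then element-wise sum."""
    return w_local @ local_vote + w_global @ global_vote

C = 21                                        # C+1 scores for Pascal VOC
local_vote, global_vote = np.random.rand(C), np.random.rand(C)
w_l = w_g = np.eye(C)                         # learned in the real network
print(couple(local_vote, global_vote, w_l, w_g).argmax())
```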