├── .gitignore ├── README.md ├── assets ├── .DS_Store ├── 2018-(J)- Deep Learning for Generic Object Detection: A Survey - 1809.02165.pdf ├── 2018-(J)- Recent Advances in Object Detection in the Age of Deep Convolutional Neural Networks - 1809.03193.pdf ├── 2019-(J)-CornerNet-Lite: Efficient Keypoint Based Object Detection - 1904.08900.pdf ├── README.md ├── algorithm │ ├── .DS_Store │ ├── 1811.04533.pdf │ ├── 1904.03797v1.pdf │ ├── RCNN 算法.xmind │ ├── RCNN_algorithm.png │ ├── SPP算法.xmind │ ├── fast_rcnn.png │ ├── faster_rcnn.png │ ├── faster_rcnn_v2.png │ ├── fpn.png │ ├── overfeat.png │ ├── rcnn.png │ └── sppnet.png ├── block_diagram │ ├── SSD-architecture.png │ ├── SSD-framework.png │ ├── cornetnet-lite.png │ ├── fcn.png │ ├── fcn_architecture.png │ ├── fcn_block.png │ ├── fcn_upooling.jpg │ ├── featurized-image-pyramid.png │ ├── fpn.png │ ├── fpn_rpn.jpeg │ ├── lenet_alexnet.png │ ├── mobilenetv1.png │ ├── mobilenetv2.png │ ├── object_detection_block_diagram.ep │ ├── object_detection_block_diagram.pptx │ ├── resnet_architecture.png │ ├── resnet_block.png │ ├── retina-net.png │ ├── shufflenet.png │ ├── vgg16.png │ ├── vgg19.png │ ├── yolo-network-architecture.png │ └── yolo-responsible-predictor.png └── code_diagram │ ├── alexnet_revised.png │ ├── alexnet_revised_v1.png │ ├── lenet_revised.png │ └── vgg16_tl.png ├── dataset └── ChineseFoodDataset │ ├── .DS_Store │ └── chinese_food_spider.py ├── image-retrieval ├── .DS_Store ├── 1998-(J)-Example-Based Learning for View-Based Human Face Detection.pdf ├── 2010-(J)-Object Detection with Discriminatively Trained Part Based Models.pdf └── paper │ ├── .DS_Store │ └── 2015-(J)-CVPR- Deep Learning of Binary Hash Codes for Fast Image Retrieval.pdf └── sample-code ├── .DS_Store ├── network ├── .DS_Store ├── .idea │ ├── misc.xml │ ├── modules.xml │ ├── network.iml │ ├── vcs.xml │ └── workspace.xml ├── alexnet_keras.py ├── cifar10_cnn.py ├── lenet_keras.py ├── resnet.py ├── resnet50.py ├── resnet_common.py ├── resnet_v2.py 
├── resnext.py ├── vgg16.py ├── vgg16_keras.py ├── vgg19.py └── vgg19_keras_cifar100.py ├── nlp └── token_nlp.py └── object_detection ├── .DS_Store └── faster_rcnn ├── faster_rcnn_open_image_dataset.py └── faster_rcnn_train.py /.gitignore: -------------------------------------------------------------------------------- 1 | 2 | dataset/.DS_Store 3 | *.h5 4 | */.idea 5 | */.pytest_cache 6 | */.vscode 7 | *.pkl 8 | *.tgz 9 | sample-code/network/*.jpg 10 | .DS_Store 11 | .idea/* 12 | .vscode/* 13 | assets/block_diagram/.DS_Store 14 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | [图解物体检测 & 网络框架](https://github.com/taylorguo/Deep-Object-Detection/blob/master/assets/README.md) 2 | 3 | Inspired by awesome object detection, Deep Object Detection offers an easy way to understand object detection, explained in Chinese. 4 | 5 | ## 目录 6 | 7 | - [图解网络架构](#图解网络架构) 8 | - [LeNet_AlexNet](#lenet_alexnet) 9 | - [LeNet_AlexNet_Keras代码实现](#lenet_alexnet_keras代码实现) 10 | - [VGG16网络与代码实现](#vgg16网络与代码实现) 11 | - [VGG19网络与代码实现](#vgg19网络与代码实现) 12 | - [Resnet](#resnet) 13 | - [Inception-v4: 2016](#inception-v4-2016) 14 | - [SqueezeNet:2016](#squeezenet2016) 15 | - [DenseNet:2016](#densenet2016) 16 | - [Xception:2016](#xception2016) 17 | - [ResNeXt:2016](#resnext2016) 18 | - [ROR: 2016](#ror-2016) 19 | - [MobileNet-v1:2017](#mobilenet-v12017) 20 | - [ShuffleNet:2017](#shufflenet2017) 21 | - [SENet : 2017](#senet--2017) 22 | - [MobileNet-V2:2018](#mobilenet-v22018) 23 | - [ShuffleNet-V2: 2018](#shufflenet-v2-2018) 24 | - [MobileNet-V3: 2019](#mobilenet-v3-2019) 25 | - [EfficientNet: 2019](#efficientnet-2019) 26 | - [Transformer in Transformer: 2021](#transformer-in-transformer-2021) 27 | - [ViT-Image Recognition at Scale: 2021](#vit-image-recognition-at-scale-2021) 28 | - [Perceiver: 2021](#perceiver-2021) 29 | - [图解Object_Detection框架](#图解object_detection框架) 30 | - [Multi-stage Object 
Detection](#multi-stage-object-detection) 31 | - [RCNN : 2014](#rcnn--2014) 32 | - [SPPnet : 2014](#sppnet--2014) 33 | - [FCN : 2015](#fcn--2015) 34 | - [Fast R-CNN : 2015](#fast-r-cnn--2015) 35 | - [Faster R-CNN : 2015](#faster-r-cnn--2015) 36 | - [FPN : 2016](#fpn--2016) 37 | - [Mask R-CNN : 2017](#mask-r-cnn--2017) 38 | - [Soft-NMS : 2017](#soft-nms--2017) 39 | - [Segmentation is all you need : 2019](#segmentation-is-all-you-need--2019) 40 | - [Single Stage Object Detection](#single-stage-object-detection) 41 | - [DenseBox : 2015](#densebox--2015) 42 | - [SSD : 2016](#ssd--2016) 43 | - [YoLov2 : 2016](#yolov2--2016) 44 | - [RetinaNet : 2017](#retinanet--2017) 45 | - [YoLov3 : 2018](#yolov3--2018) 46 | - [M2Det : 2019](#m2det--2019) 47 | - [CornerNet-Lite : 2019](#cornernet-lite--2019) 48 | - [图解 Action Classification](#图解-action-classification) 49 | - [:lemon: MLAD :date: 2021.03.04v1 :blush: University of Central Florida](#lemon--mlad----date---20210304v1--blush--university-of-central-florida) 50 | - [数据集Object_Detection](#数据集object_detection) 51 | - [General Dataset](#general-dataset) 52 | - [Animal](#animal) 53 | - [Plant](#plant) 54 | - [Food](#food) 55 | - [Transportation](#transportation) 56 | - [Scene](#scene) 57 | - [Face](#face) 58 | 59 | 60 | 61 | # 图解网络架构 62 | 63 | ## LeNet_AlexNet 64 | 65 | 66 | ## LeNet_AlexNet_Keras代码实现 67 | 68 | [LeNet-Keras for mnist handwriting digital image classification](https://github.com/taylorguo/Deep-Object-Detection/blob/master/sample-code/network/lenet_keras.py) 69 | 70 | LeNet-Keras restructure 71 | 72 | 73 | Accuracy: 98.54% 74 | 75 | 76 | =================================== 77 | 78 | [AlexNet-Keras for oxflower17 image classification](https://github.com/taylorguo/Deep-Object-Detection/blob/master/sample-code/network/alexnet_keras.py) 79 | 80 | AlexNet-Keras restructure: 修改后的网络 val_acc: ~80%, 过拟合 81 | 82 | 83 | 84 | 85 | =================================== 86 | ## VGG16网络与代码实现 87 | 88 | 89 | 90 | [VGG16 Keras 
官方代码实现](https://github.com/taylorguo/Deep-Object-Detection/blob/master/sample-code/network/vgg16.py) 91 | 92 | [VGG16-Keras oxflower17 物体分类](https://github.com/taylorguo/Deep-Object-Detection/blob/master/sample-code/network/vgg16_keras.py): 修改后的网络 val_acc: ~86.4%, 过拟合 93 | 94 | 95 | 96 | 97 | ## VGG19网络与代码实现 98 | 99 | 100 | 101 | [VGG19 Keras 官方代码实现](https://github.com/taylorguo/Deep-Object-Detection/blob/master/sample-code/network/vgg19.py) 102 | 103 | 104 | 105 | ## Resnet 106 | 107 | - ResNet [Deep Residual Learning for Image Recognition](https://arxiv.org/pdf/1512.03385.pdf) - CVPR 108 | 109 | - 残差块与直连层: 110 | 111 | 112 | 113 | - 残差网络架构: 114 | 115 | 116 | 117 | - 残差网络中 Shortcut Connection 参考文章 118 | 119 | - 1995 - [Neural networks for pattern recognition - Bishop]() 120 | - 1996 - [Pattern recognition and neural networks - Ripley]() 121 | - 1999 - [Modern applied statistics with s-plus - Venables & Ripley]() 122 | 123 | 124 | - [Highway Networks](https://arxiv.org/pdf/1505.00387v2.pdf), [中文翻译参考](https://www.cnblogs.com/2008nmj/p/9104744.html) 125 | 126 | - [Convolutional Neural Networks at Constrained Time Cost](https://arxiv.org/pdf/1412.1710.pdf) 127 | 128 | - 实验表明: 加深网络, 会出现训练误差 129 | 130 | =================================== 131 | ## Inception-v4: 2016 132 | 133 | - [Inception-v4](https://arxiv.org/pdf/1602.07261v1.pdf), Inception-ResNet and the Impact of Residual Connections on Learning 134 | 135 | 136 | 137 | ## SqueezeNet:2016 138 | 139 | - [SqueezeNet](https://arxiv.org/pdf/1602.07360.pdf): AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size 140 | 141 | 142 | ## DenseNet:2016 143 | 144 | - [DenseNet](https://arxiv.org/pdf/1608.06993.pdf) : Densely Connected Convolutional Networks 145 | 146 | - [DenseNet- Github](https://github.com/liuzhuang13/DenseNet#results-on-imagenet-and-pretrained-models) 147 | - Dense Block 层间链接采用concat, 而不是按元素add 148 | 149 | 150 | ## Xception:2016 151 | 152 | - [Xception](https://arxiv.org/pdf/1610.02357.pdf): 
Deep Learning with Depthwise Separable Convolutions 153 | 154 | 155 | 156 | 157 | ## ResNeXt:2016 158 | 159 | - [ResNeXt](https://arxiv.org/pdf/1611.05431.pdf): Aggregated Residual Transformations for Deep Neural Networks 160 | 161 | 162 | ## ROR: 2016 163 | 164 | - [ROR](https://arxiv.org/pdf/1608.02908.pdf) - Residual Networks of Residual Networks: Multilevel Residual Networks 165 | 166 | 167 | ## MobileNet-v1:2017 168 | 169 | - [MobileNets](https://arxiv.org/pdf/1704.04861.pdf) : Efficient Convolutional Neural Networks for Mobile Vision Applications 170 | 171 | - 图解MobileNetv1: 172 | 173 | 174 | 175 | - 参考资料: 176 | - [tensorflow layers 卷积层 Python定义](https://github.com/tensorflow/tensorflow/blob/43dcd3dc3ee4b090832455acf43e8dd483a6117b/tensorflow/python/layers/convolutional.py#L222) 177 | - [tensorflow base Layers class](https://github.com/tensorflow/tensorflow/blob/43dcd3dc3ee4b090832455acf43e8dd483a6117b/tensorflow/python/layers/base.py#L156) 178 | - [CNN中卷积层的计算细节@zhihu](https://zhuanlan.zhihu.com/p/29119239) 179 | - [CNN中卷积层的计算细节@csdn](https://blog.csdn.net/dcrmg/article/details/79652487) 180 | - [【TensorFlow】理解tf.nn.conv2d方法](https://blog.csdn.net/zuolixiangfisher/article/details/80528989) 181 | - [【tensorflow源码分析】 Conv2d卷积运算](https://www.cnblogs.com/yao62995/p/5773018.html) 182 | - [**『TensorFlow』卷积层、池化层详解**](https://www.cnblogs.com/hellcat/p/7850048.html) 183 | 184 | 185 | ## ShuffleNet:2017 186 | 187 | - [ShuffleNet](https://arxiv.org/pdf/1707.01083.pdf): An Extremely Efficient Convolutional Neural Network for Mobile Devices 188 | 189 | - 图解ShuffleNet单元块: 190 | 191 | 192 | 193 | - Code: 194 | - [ShuffleNet Tensorflow](https://github.com/MG2033/ShuffleNet) 195 | 196 | 197 | ## SENet : 2017 198 | 199 | - [SENet](https://arxiv.org/pdf/1709.01507.pdf) Squeeze-and-Excitation Networks 200 | 201 | 202 | ## MobileNet-V2:2018 203 | 204 | - [MobileNetV2 ](https://arxiv.org/pdf/1801.04381.pdf): Inverted Residuals and Linear Bottlenecks 205 | 206 | - 图解MobileNetv2: 
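As a rough illustration of why the inverted residual block in MobileNetV2 stays cheap, the convolution weight count of one block can be sketched in plain Python. The function names and the example channel sizes are illustrative assumptions, the expansion factor of 6 follows the paper's default, and bias and batch-norm parameters are ignored.

```python
def inverted_residual_params(c_in, c_out, expansion=6, kernel=3):
    """Count conv weights in one MobileNetV2-style inverted residual block.

    Assumes no bias terms and ignores batch-norm parameters.
    """
    hidden = c_in * expansion
    expand = c_in * hidden                 # 1x1 pointwise expansion
    depthwise = kernel * kernel * hidden   # kxk depthwise, one filter per channel
    project = hidden * c_out               # 1x1 linear projection
    return expand + depthwise + project


def standard_conv_params(c_in, c_out, kernel=3):
    """Weights of a plain kxk convolution, for comparison."""
    return kernel * kernel * c_in * c_out


if __name__ == "__main__":
    # Hypothetical example: a 24-channel block expanded to 144 hidden channels.
    print(inverted_residual_params(24, 24))
    # A plain 3x3 convolution operating at the expanded width of 144 channels.
    print(standard_conv_params(144, 144))
```

The comparison is only indicative: it contrasts the block with a standard 3x3 convolution at the expanded width, which is one of several reasonable baselines.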
207 | 208 | 209 | 210 | ## ShuffleNet-V2: 2018 211 | 212 | - [ShuffleNet V2](https://arxiv.org/pdf/1807.11164.pdf): Practical Guidelines for Efficient CNN Architecture Design 213 | 214 | 215 | ## MobileNet-V3: 2019 216 | 217 | - [MobileNet V3](https://arxiv.org/pdf/1905.02244.pdf): Searching for MobileNetV3 218 | 219 | ## EfficientNet: 2019 220 | 221 | - [EfficientNet](https://arxiv.org/pdf/1905.11946.pdf): Rethinking Model Scaling for Convolutional Neural Networks 222 | 223 | 224 | ## Transformer in Transformer: 2021 225 | 226 | - [Transformer in Transformer](https://arxiv.org/pdf/2103.00112v1.pdf) 227 | 228 | 229 | - [TnT PyTorch code](https://github.com/lucidrains/transformer-in-transformer) 230 | 231 | ## ViT-Image Recognition at Scale: 2021 232 | 233 | - [Vision Transformers](https://arxiv.org/pdf/2010.11929.pdf): An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale 234 | 235 | - [ViT image classification - Keras code](https://github.com/keras-team/keras-io/blob/master/examples/vision/image_classification_with_vision_transformer.py) 236 | 237 | 238 | 239 | ## Perceiver: 2021 240 | 241 | - [Perceiver](https://arxiv.org/pdf/2103.03206.pdf) : General Perception with Iterative Attention 242 | 243 | 244 | 245 | [ViT image classification - Keras code](https://github.com/keras-team/keras-io/blob/master/examples/vision/image_classification_with_vision_transformer.py) 246 | 247 | 248 | ============================= 249 | 250 | 251 | 252 | # [图解Object_Detection框架](https://github.com/taylorguo/Deep-Object-Detection/blob/master/assets/README.md) 253 | 254 | 通用文档 255 | 256 | - [cs231n : Spatial Localization and Detection](http://cs231n.stanford.edu/slides/2016/winter1516_lecture8.pdf) 257 | 258 | 259 | 2010 260 | 261 | - [Object Detection with Discriminatively Trained Part Based Models](http://cs.brown.edu/people/pfelzens/papers/lsvm-pami.pdf) 262 | 263 | 264 | 2011 265 | 266 | - [Ensemble of Exemplar-SVMs for Object Detection and 
Beyond](http://www.cs.cmu.edu/~efros/exemplarsvm-iccv11.pdf) 267 | 268 | 269 | 2013 270 | 271 | - [OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks](https://arxiv.org/pdf/1312.6229.pdf) 272 | 273 | - [Code](https://github.com/sermanet/OverFeat) 274 | 275 | - sliding window detector on an image pyramid 276 | 277 | - Overfeat 算法流程: 278 | 279 | 280 | 281 | 2014 282 | 283 | - [VGG: Very Deep Convolutional Networks for Large-Scale Image Recognition](http://www.arxiv.org/pdf/1409.1556.pdf) 284 | 285 | - SPP: [Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition](https://arxiv.org/pdf/1406.4729.pdf) 286 | 287 | 288 | 289 | 2017 290 | 291 | - [On the Origin of Deep Learning](https://arxiv.org/pdf/1702.07800.pdf) 292 | 293 | 2018 294 | 295 | - [A guide to convolution arithmetic for deep learning](https://arxiv.org/pdf/1603.07285.pdf) 296 | 297 | 298 | - [Progressive Neural Architecture Search](https://arxiv.org/pdf/1712.00559.pdf) 299 | 300 | 301 | 302 | 303 | =========================== 304 | 305 | ## Multi-stage Object Detection 306 | 307 | 308 | 309 | 310 | 311 | 312 | ### RCNN : 2014 313 | 314 | - [Region-Based Convolutional Networks for Accurate Object Detection and Segmentation](http://medialab.sjtu.edu.cn/teaching/CV/hw/related_papers/3_detection.pdf) 315 | 316 | - v5 [Rich feature hierarchies for accurate object detection and semantic segmentation](https://arxiv.org/pdf/1311.2524v3.pdf) - CVPR 317 | 318 | - region proposal with scale-normalized before classifying with a ConvNet 319 | 320 | 321 | 322 | - [RCNN Keras Code](https://github.com/yhenon/keras-rcnn) 323 | 324 | 325 | 326 | 327 | 328 | ### SPPnet : 2014 329 | 330 | - SPPnet [Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition](https://arxiv.org/pdf/1406.4729.pdf) - ECCV 331 | 332 | 333 | - [ROI Pooling ](http://wavelab.uwaterloo.ca/wp-content/uploads/2017/04/Lecture_6.pdf) 334 | 335 | 336 | 337 | 338 | 339 | ### FCN : 
2015 340 | 341 | - FCN - [Fully convolutional networks for semantic segmentation](https://arxiv.org/pdf/1411.4038.pdf) - CVPR 342 | - FCN converts the last three fully connected layers into convolutional layers, using multi-channel kernels of matching size, so the input image size can vary 343 | 344 | 345 | 346 | - Network structure for semantic segmentation: 347 | - Take feature maps from different pooling layers and upsample them 348 | - Upsampling uses deconvolution (transposed convolution), which leaves the upsampled result lacking fine detail 349 | - Skip connections fuse feature maps by per-pixel element-wise addition (the add function in Keras) 350 | - Resize the feature map to the original image size for per-pixel prediction 351 | 352 | 353 | 354 | 355 | 356 | - Problem definition for semantic segmentation: 357 | - Per-pixel classification 358 | - The last convolution layer is 1x1x21 (20 VOC object classes + 1 background class) 359 | 360 | 361 | 362 | [Reference (in Chinese): Fully Convolutional Networks (FCN) explained](https://blog.csdn.net/sinat_24143931/article/details/78696442) 363 | 364 | [Reference (in Chinese): Understand FCN in 10 minutes, the pioneering deep model for semantic segmentation](http://www.sohu.com/a/270896638_633698) 365 | 366 | - code: 367 | - [FCN in tensorflow](https://github.com/MarvinTeichmann/tensorflow-fcn) 368 | - [FCN official](https://github.com/shelhamer/fcn.berkeleyvision.org) 369 | 370 | 371 | ### Fast R-CNN : 2015 372 | 373 | - [Fast R-CNN](https://arxiv.org/pdf/1504.08083.pdf) - ICCV 374 | 375 | 376 | 377 | ### Faster R-CNN : 2015 378 | 379 | - [Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks](https://arxiv.org/pdf/1506.01497.pdf) - NIPS 380 | 381 | - RPN (Region Proposal Network) & Anchor Box 382 | 383 | 384 | 385 | - [Convolutional Feature Maps](http://kaiminghe.com/iccv15tutorial/iccv2015_tutorial_convolutional_feature_maps_kaiminghe.pdf) 386 | 387 | 388 | - Instance retrieval: [Faster R-CNN Features for Instance Search](https://arxiv.org/pdf/1604.08893.pdf) 389 | 390 | 391 | 392 | 393 | 394 | ### FPN : 2016 395 | 396 | - [Feature Pyramid Networks for Object Detection](https://arxiv.org/pdf/1612.03144.pdf) 397 | 398 | - The idea comes from feature pyramids in traditional computer vision; deep learning detectors had avoided them because they are compute- and memory-intensive 399 | 400 | 401 | 402 | - Bottom-up pathway (feed-forward): the deepest layer of each stage should have the strongest features 403 | 404 | 405 | 406 | 407 | 408 | - [Reference: Understanding 
FPN](https://medium.com/@jonathan_hui/understanding-feature-pyramid-networks-for-object-detection-fpn-45b227b9106c) 409 | 410 | - Code: 411 | - [FPN in Mask-RCNN Keras Code](https://github.com/matterport/Mask_RCNN/blob/master/mrcnn/model.py) 412 | - [FPN in Tensorflow](https://github.com/yangxue0827/FPN_Tensorflow) 413 | - [FPN in Caffe](https://github.com/unsky/FPN) 414 | 415 | 416 | 417 | ### Mask R-CNN : 2017 418 | 419 | - [Mask R-CNN](https://arxiv.org/pdf/1703.06870.pdf) 420 | - Code: 421 | - [Keras matterport](https://github.com/matterport/Mask_RCNN) 422 | - [Caffe2 Facebook](https://github.com/facebookresearch/Detectron) 423 | - [PyTorch wannabeOG](https://github.com/wannabeOG/Mask-RCNN) 424 | - [MXNet TuSimple](https://github.com/TuSimple/mx-maskrcnn) 425 | - [Chainer DeNA](https://github.com/DeNA/Chainer_Mask_R-CNN) 426 | 427 | 428 | ### Soft-NMS : 2017 429 | 430 | - [Soft-NMS](https://arxiv.org/pdf/1704.04503.pdf) 431 | 432 | 433 | 434 | ### Segmentation is all you need : 2019 435 | 436 | - [Segmentation is All You Need](https://arxiv.org/pdf/1904.13300v1.pdf) 437 | 438 | 439 | ============================ 440 | ## Single Stage Object Detection 441 | 442 | 443 | ### DenseBox : 2015 444 | 445 | - [DenseBox: Unifying Landmark Localization with End to End Object Detection](https://arxiv.org/pdf/1509.04874.pdf) 446 | 447 | ### SSD : 2016 448 | 449 | - [SSD: Single Shot MultiBox Detector](https://arxiv.org/pdf/1512.02325.pdf) - ECCV 450 | 451 | - 工作流程: 452 | 453 | - 特征提取网络为VGG-16, 边界框 和 分类 为特征图金字塔 454 | 455 | - 网络架构: 456 | 457 | 458 | 459 | - 损失函数: 460 | 461 | - 位置Smooth L1 Loss 和 多分类Softmax 的和 462 | 463 | 464 | 465 | 466 | ### YoLov2 : 2016 467 | 468 | - YOLOv2 [YOLO9000: Better, Faster, Stronger](https://arxiv.org/pdf/1612.08242.pdf) 469 | 470 | - 工作流程: 471 | 472 | - 在图像分类任务上预训练 CNN网络 473 | 474 | - 图像拆分为单元格, 如果一个对象的中心在一个单元格内,该单元格就“负责”检测该对象 475 | 476 | 每个单元预测(a)边界框位置,(b)置信度分数,(c)以边界框中的对象的存在为条件的对象类的概率 477 | 478 | - 修改预训练的CNN的最后一层以输出预测张量 479 | 480 | - 网络架构: 481 
| 482 | 483 | 484 | - 损失函数: 485 | 486 | - 2部分组成: 边界框回归 和 分类条件概率 - 都采用平方差的和 487 | 488 | 489 | 490 | 491 | ### RetinaNet : 2017 492 | 493 | - RetinaNet:[Focal Loss for Dense Object Detection](https://arxiv.org/pdf/1708.02002.pdf) 494 | 495 | - 工作流程: 496 | 497 | - 焦点损失为明显的,容易错误分类的情况(具有噪声纹理或部分对象的背景)分配更多权重,并且降低简单情况权重(明显空白背景) 498 | 499 | - 特征提取网络为ResNet, 特征金字塔提高检测性能 500 | 501 | 502 | 503 | - 网络架构: 504 | 505 | 506 | 507 | 508 | 509 | 510 | ### YoLov3 : 2018 511 | 512 | - [YOLOv3: An Incremental Improvement](https://arxiv.org/pdf/1804.02767.pdf) 513 | 514 | - bbox 预测使用尺寸聚类 515 | 516 | - 每个box有4个坐标 517 | 518 | - 训练时, 使用误差平方和损失函数 sum of squared error loss 519 | 520 | - bbox object分值, 用 logistic regression 521 | 522 | - 分类器 使用 logistic regression, 损失函数binary cross-entropy 523 | 524 | - 借鉴了 FPN 网络 525 | 526 | - 特征提取卷积网络 527 | 528 | - 3x3, 1x1 卷积层交替 529 | 530 | - 借鉴了 ResNet, 使用了直连, 分别从卷积层或直连层进行直连 531 | 532 | 533 | ### M2Det : 2019 534 | 535 | - [M2Det](https://arxiv.org/pdf/1811.04533.pdf) 536 | 537 | 538 | ### CornerNet-Lite : 2019 539 | 540 | - [CornerNet-Lite](https://arxiv.org/pdf/1904.08900.pdf) : Efficient Keypoint Based Object Detection 541 | - CornerNet-Saccade: 处理特征图的像素, 一个裁剪多个检测; 离线处理 542 | - CornetNet-Squeeze: 骨干网络, 使用SqueezeNet, 沙漏架构; 实时处理 543 | 544 | 545 | 546 | [参考资料: 目标检测算法总结](https://www.cnblogs.com/guoyaohua/p/8994246.html) 547 | 548 | 549 | 550 | 551 | ============================= 552 | 553 | 554 | # 图解 Action Classification 555 | 556 | ## :lemon: [MLAD](https://arxiv.org/pdf/2103.03027.pdf) :date: 2021.03.04v1 :blush: University of Central Florida 557 | 558 | - [Modeling Multi-Label Action Dependencies for Temporal Action Localization](https://arxiv.org/pdf/2103.03027.pdf) 559 | 560 | Network 561 | 562 | 563 | 564 | 565 | 566 | ============================= 567 | 568 | 569 | 570 | # 数据集Object_Detection 571 | 572 | 不确定每个数据集都包含完整的物体检测数据标注。 573 | 574 | ## General Dataset 575 | 576 | - [数据集收集 Dataset Collection](http://www.escience.cn/people/lichang/Data.html) 
577 | 578 | - [数据集: 25种简介](https://www.easemob.com/news/1433) 579 | 580 | - [CIFAR10](https://figshare.com/articles/dataset/CIFAR10-DVS_New/4724671/2) 581 | 582 | - [ImageNet 最大的图像识别图像库](http://www.image-net.org/) 583 | 584 | - 14,197,122张图像 585 | 586 | - [PASCAL Visual Object Classes Challenge 2008 (VOC2008)](http://host.robots.ox.ac.uk/pascal/VOC/voc2008/htmldoc/voc.html), [VOC-2012](http://pjreddie.com/projects/pascal-voc-dataset-mirror/) 587 | 588 | 589 | - [Open Images dataset(带标注)](https://github.com/openimages/dataset) 590 | 591 | 592 | - 近900万个图像URL数据集, 数千个类的图像级标签边框并且进行了标注。 593 | 594 | - 数据集包含9,011,219张图像的训练集, 41,260张图像的验证集, 125,436张图像的测试集。 595 | 596 | 597 | - [Corel5K 图像集](https://github.com/watersink/Corel5K) 598 | 599 | - Corel5K图像集,共5000幅图片,包含50个语义主题,有公共汽车、恐龙、海滩等。 600 | 601 | 602 | 603 | 604 | 605 | ## Animal 606 | 607 | 608 | [Stanford Dogs 🐶 Dataset : Over 20,000 images of 120 dog breeds](https://www.kaggle.com/jessicali9530/stanford-dogs-dataset) 609 | 610 | 611 | - Context 612 | 613 | The Stanford Dogs dataset contains images of 120 breeds of dogs from around the world. This dataset has been built using images and annotation from ImageNet for the task of fine-grained image categorization. It was originally collected for fine-grain image categorization, a challenging problem as certain dog breeds have near identical features or differ in colour and age. 614 | 615 | 来源于imagenet, 用于图像细粒度分类 616 | 617 | 618 | - Content 619 | 620 | - Number of categories: 120 621 | - Number of images: 20,580 622 | - Annotations: Class labels, Bounding boxes 623 | 624 | 625 | [Honey Bee pollen : High resolution images of individual bees on the ramp](https://www.kaggle.com/ivanfel/honey-bee-pollen) 626 | 627 | - Context 628 | 629 | This image dataset has been created from videos captured at the entrance of a bee colony in June 2017 at the Bee facility of the Gurabo Agricultural Experimental Station of the University of Puerto Rico. 
630 | 631 | Task: classify honey bees 🐝 as pollen-bearing or not pollen-bearing 632 | 633 | - Content 634 | 635 | - images/ contains images of pollen-bearing and non-pollen-bearing honey bees. 636 | 637 | - The prefix of each image name defines its class: e.g. NP1268-15r.jpg for non-pollen and P7797-103r.jpg for pollen-bearing bees. 638 | - The numbers correspond to the frame and item number respectively; note that they are not numbered sequentially. 639 | 640 | 641 | 642 | - Read-skimage.ipynb: a Jupyter notebook with a simple script that loads the data and creates the dataset using the skimage library. 643 | 644 | 645 | 646 | 647 | ## Plant 648 | 649 | [Flowers Recognition : This dataset contains labeled 4242 images of flowers.](https://www.kaggle.com/alxmamaev/flowers-recognition) 650 | 651 | - Context 652 | 653 | This dataset contains 4242 images of flowers. The data was collected from Flickr, Google Images and Yandex Images. You can use this dataset to recognize plants from a photo. 654 | 655 | 656 | 657 | - Content 658 | 659 | - five classes: chamomile, tulip, rose, sunflower, dandelion 660 | - about 800 photos per class 661 | - resolution: about 320x240 pixels 662 | 663 | 664 | [VGG - 17 Category Flower Dataset](http://www.robots.ox.ac.uk/~vgg/data/flowers/17/index.html) 665 | 666 | - Context 667 | 668 | - 17 category flower dataset with 80 images for each class 669 | 670 | 671 | 672 | - Content 673 | 674 | - The datasplits used in this paper are specified in datasplits.mat 675 | 676 | - There are 3 separate splits. The results in the paper are averaged over the 3 splits. 677 | 678 | - Each split has a training file (trn1, trn2, trn3), a validation file (val1, val2, val3) and a test file (tst1, tst2 or tst3). 
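The split naming described above can be turned into lookup keys with a small helper. This is a sketch under assumptions: the function names are hypothetical, and it assumes `datasplits.mat` has already been loaded into a dict-like object keyed by those variable names (for example with `scipy.io.loadmat`, which is not called here; arrays loaded that way may need flattening first).

```python
def split_keys(split):
    """Return the (train, val, test) variable names for split 1, 2 or 3."""
    if split not in (1, 2, 3):
        raise ValueError("split must be 1, 2 or 3")
    return f"trn{split}", f"val{split}", f"tst{split}"


def select_split(splits, split):
    """Pick the train/val/test id lists for one split from a dict-like object."""
    trn, val, tst = split_keys(split)
    return {
        "train": list(splits[trn]),
        "val": list(splits[val]),
        "test": list(splits[tst]),
    }


if __name__ == "__main__":
    # Tiny made-up stand-in for the loaded .mat contents (not real dataset ids).
    fake = {"trn1": [1, 2], "val1": [3], "tst1": [4]}
    print(select_split(fake, 1))
```

Averaging a metric over `select_split(splits, 1)`, `select_split(splits, 2)` and `select_split(splits, 3)` would mirror the three-split protocol mentioned in the dataset notes.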
679 | 680 | 681 | [VGG - 102 Category Flower Dataset](http://www.robots.ox.ac.uk/~vgg/data/flowers/102/index.html) 682 | 683 | - Context 684 | 685 | - 102 category dataset, consisting of 102 flower categories 686 | - Each class consists of between 40 and 258 images 687 | 688 | 689 | - Content 690 | 691 | - The datasplits used in this paper are specified in setid.mat. 692 | 693 | - The results in the paper are produced on a 103 category database. The two categories labeled Petunia have since been merged because they are the same. 694 | - There is a training file (trnid), a validation file (valid) and a test file (tstid). 695 | 696 | 697 | 698 | [Fruits 360 dataset : A dataset with 65429 images of 95 fruits](https://www.kaggle.com/moltean/fruits) 699 | 700 | - Context 701 | 702 | The following fruits are included: Apples (different varieties: Golden, Red Yellow, Granny Smith, Red, Red Delicious), Apricot, Avocado, Avocado ripe, Banana (Yellow, Red, Lady Finger), Cactus fruit, Cantaloupe (2 varieties), Carambula, Cherry (different varieties, Rainier), Cherry Wax (Yellow, Red, Black), Chestnut, Clementine, Cocos, Dates, Granadilla, Grape (Blue, Pink, White (different varieties)), Grapefruit (Pink, White), Guava, Hazelnut, Huckleberry, Kiwi, Kaki, Kumquats, Lemon (normal, Meyer), Lime, Lychee, Mandarine, Mango, Mangostan, Maracuja, Melon Piel de Sapo, Mulberry, Nectarine, Orange, Papaya, Passion fruit, Peach (different varieties), Pepino, Pear (different varieties, Abate, Kaiser, Monster, Williams), Physalis (normal, with Husk), Pineapple (normal, Mini), Pitahaya Red, Plums (different varieties), Pomegranate, Pomelo Sweetie, Quince, Rambutan, Raspberry, Redcurrant, Salak, Strawberry (normal, Wedge), Tamarillo, Tangelo, Tomato (different varieties, Maroon, Cherry Red), Walnut. 703 | 704 | 705 | - Content 706 | 707 | - Total number of images: 65429. 708 | - Training set size: 48905 images (one fruit per image). 709 | - Test set size: 16421 images (one fruit per image). 
710 | - Multi-fruits set size: 103 images (more than one fruit (or fruit class) per image) 711 | - Number of classes: 95 (fruits). 712 | - Image size: 100x100 pixels. 713 | 714 | 715 | - [GitHub download: Fruits-360 dataset](https://github.com/Horea94/Fruit-Images-Dataset) 716 | 717 | 718 | 719 | [Plant Seedlings Classification : Determine the species of a seedling from an image](https://www.kaggle.com/c/plant-seedlings-classification) 720 | 721 | - Context 722 | 723 | - a dataset containing images of approximately 960 unique plants belonging to 12 species at several growth stages 724 | 725 | - Content 726 | 727 | - [A Public Image Database for Benchmark of Plant Seedling Classification Algorithms](https://arxiv.org/abs/1711.05458) 728 | 729 | 730 | [V2 Plant Seedlings Dataset : Images of crop and weed seedlings at different growth stages](https://www.kaggle.com/vbookshelf/v2-plant-seedlings-dataset) 731 | 732 | 733 | - Context 734 | - The V1 version of this dataset was used in the Plant Seedling Classification playground competition here on Kaggle. This is the V2 version. Some samples in V1 contained multiple plants. The dataset’s creators have now removed those samples. 735 | 736 | - Content 737 | 738 | - This dataset contains 5,539 images of crop and weed seedlings. 739 | - The images are grouped into 12 classes as shown in the above pictures. These classes represent common plant species in Danish agriculture. Each class contains rgb images that show plants at different growth stages. 740 | - The images are in various sizes and are in png format. 741 | 742 | 743 | 744 | 745 | 746 | ## Food 747 | 748 | [UEC Food-256 Japan Food](http://foodcam.mobi/dataset256.html) 749 | 750 | - Context 751 | 752 | - The dataset "UEC FOOD 256" contains 256-kind food photos. Each food photo has a bounding box indicating the location of the food item in the photo. 753 | 754 | - Most of the food categories in this dataset are popular foods in Japan and other countries. 
755 | 756 | 757 | - Content 758 | 759 | - [1-256] : directory names correspond to food ID. 760 | - [1-256]/*.jpg : food photo files (some photos are duplicated in two or more directories, since they includes two or more food items.) 761 | - [1-256]/bb_info.txt: bounding box information for the photo files in each directory 762 | 763 | - category.txt : food list including the correspondences between food IDs and food names in English 764 | - category_ja.txt : food list including the correspondences between food IDs and food names in Japanese 765 | - multiple_food.txt: the list representing food photos including two or more food items 766 | 767 | [FoodDD: Food Detection Dataset](http://www.site.uottawa.ca/~shervin/food/), [论文](http://www.site.uottawa.ca/~shervin/pubs/FoodRecognitionDataset-MadiMa.pdf) 768 | 769 | [NutriNet: A Deep Learning Food and Drink Image Recognition System for Dietary Assessment](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5537777/) 770 | 771 | [ChineseFoodNet: A large-scale Image Dataset for Chinese Food Recognition - 2017](https://arxiv.org/pdf/1705.02743.pdf) 772 | 773 | [Yummly-28K - 2017](http://isia.ict.ac.cn/dataset/) 774 | 775 | - Content 776 | 777 | - 27,638 recipes in total. 778 | - Each recipe contains one recipe image, the ingredients, the cuisine and the course information. 779 | - There are 16 kinds of cuisines (e.g,“American”,“Italian” and “Mexican”) 780 | - and 13 kinds of recipe courses (e.g, “Main Dishes”,“Desserts” and “Lunch and Snacks”). 
781 | 782 | [VireoFood-172 dataset](http://vireo.cs.cityu.edu.hk/vireofood172/), [论文-2016](http://vireo.cs.cityu.edu.hk/jingjing/papers/chen2016deep.pdf) 783 | 784 | [Dishes: a restaurant-oriented food dataset - 2015](http://isia.ict.ac.cn/dataset/Geolocation-food/) 785 | 786 | 787 | 788 | 789 | ## Transportation 790 | 791 | 792 | [Boat types recognition : About 1,500 pictures of boats classified in 9 categories](https://www.kaggle.com/clorichel/boat-types-recognition) 793 | 794 | - Context 795 | 796 | This dataset is used on this blog post https://clorichel.com/blog/2018/11/10/machine-learning-and-object-detection/ where you'll train an image recognition model with TensorFlow to find about anything on pictures and videos. 797 | 798 | 799 | 800 | - Content 801 | 802 | 1,500 pictures of boats, of various sizes, but classified by those different types: buoy, cruise ship, ferry boat, freight boat, gondola, inflatable boat, kayak, paper boat, sailboat. 803 | 804 | 805 | 806 | 807 | 808 | ## Scene 809 | 810 | 811 | [Intel Image Classification : Image Scene Classification of Multiclass](https://www.kaggle.com/puneet6060/intel-image-classification) 812 | 813 | - Context 814 | 815 | image data of Natural Scenes around the world 816 | 817 | 818 | 819 | - Content 820 | 821 | - This Data contains around 25k images of size 150x150 distributed under 6 categories. {'buildings' -> 0, 'forest' -> 1, 'glacier' -> 2, 'mountain' -> 3, 'sea' -> 4, 'street' -> 5 } 822 | 823 | - The Train, Test and Prediction data is separated in each zip files. There are around 14k images in Train, 3k in Test and 7k in Prediction. This data was initially published on https://datahack.analyticsvidhya.com by Intel to host a Image classification Challenge. 
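The class-to-index mapping listed above for the Intel Image Classification data translates directly into a lookup table. A minimal sketch, with the variable and function names chosen for illustration only:

```python
# Class indices as listed in the dataset description.
CLASS_TO_INDEX = {
    "buildings": 0,
    "forest": 1,
    "glacier": 2,
    "mountain": 3,
    "sea": 4,
    "street": 5,
}

# Reverse lookup, useful for turning a predicted index back into a name.
INDEX_TO_CLASS = {index: name for name, index in CLASS_TO_INDEX.items()}


def one_hot(name):
    """Encode a class name as a one-hot list of length 6."""
    vector = [0] * len(CLASS_TO_INDEX)
    vector[CLASS_TO_INDEX[name]] = 1
    return vector


if __name__ == "__main__":
    print(one_hot("glacier"))
    print(INDEX_TO_CLASS[5])
```

Keeping both directions of the mapping in one place avoids mismatches between training labels and the names shown at prediction time.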
824 | 825 | 826 | 827 | 828 | 829 | 830 | ## Face 831 | 832 | [CelebFaces Attributes (CelebA) Dataset : Over 200K images of celebrities with 40 binary attribute annotations](https://www.kaggle.com/jessicali9530/celeba-dataset/version/2) 833 | 834 | -------------------------------------------------------------------------------- /assets/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/.DS_Store -------------------------------------------------------------------------------- /assets/2018-(J)- Deep Learning for Generic Object Detection: A Survey - 1809.02165.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/2018-(J)- Deep Learning for Generic Object Detection: A Survey - 1809.02165.pdf -------------------------------------------------------------------------------- /assets/2018-(J)- Recent Advances in Object Detection in the Age of Deep Convolutional Neural Networks - 1809.03193.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/2018-(J)- Recent Advances in Object Detection in the Age of Deep Convolutional Neural Networks - 1809.03193.pdf -------------------------------------------------------------------------------- /assets/2019-(J)-CornerNet-Lite: Efficient Keypoint Based Object Detection - 1904.08900.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/2019-(J)-CornerNet-Lite: Efficient Keypoint Based Object Detection - 1904.08900.pdf 
--------------------------------------------------------------------------------
/assets/README.md:
--------------------------------------------------------------------------------
# Object Detection Frameworks Illustrated

## General Information

- [cs231n : Spatial Localization and Detection](http://cs231n.stanford.edu/slides/2016/winter1516_lecture8.pdf)

2010

- [Object Detection with Discriminatively Trained Part Based Models](http://cs.brown.edu/people/pfelzens/papers/lsvm-pami.pdf)

2011

- [Ensemble of Exemplar-SVMs for Object Detection and Beyond](http://www.cs.cmu.edu/~efros/exemplarsvm-iccv11.pdf)

2012

- [AlexNet]()

2013

- [OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks](https://arxiv.org/pdf/1312.6229.pdf)

  - [Code](https://github.com/sermanet/OverFeat)

  - sliding window detector on an image pyramid

2014

- [VGG]()

- SPP: [Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition](https://arxiv.org/pdf/1406.4729.pdf)

2015

- [Highway Networks](https://arxiv.org/pdf/1505.00387v2.pdf), [Chinese translation for reference](https://www.cnblogs.com/2008nmj/p/9104744.html)

- [Convolutional Neural Networks at Constrained Time Cost](https://arxiv.org/pdf/1412.1710.pdf)

  - Experiments show that making the network deeper leads to higher training error

- ResNet [Deep Residual Learning for Image Recognition](https://arxiv.org/pdf/1512.03385.pdf) - CVPR

  - References for the shortcut connections used in residual networks

    - 1995 - [Neural networks for pattern recognition - Bishop]()
    - 1996 - [Pattern recognition and neural networks - Ripley]()
    - 1999 - [Modern applied statistics with s-plus - Venables & Ripley]()

2017

- [On the Origin of Deep Learning](https://arxiv.org/pdf/1702.07800.pdf)

2018

- [A guide to convolution arithmetic for deep
learning](https://arxiv.org/pdf/1603.07285.pdf)

- [Progressive Neural Architecture Search](https://arxiv.org/pdf/1712.00559.pdf)

============================
## Single Stage Object Detection

2015

- [DenseBox: Unifying Landmark Localization with End to End Object Detection](https://arxiv.org/pdf/1509.04874.pdf)

2016

- [SSD: Single Shot MultiBox Detector](https://arxiv.org/pdf/1512.02325.pdf) - ECCV

  - Workflow:

    - The feature extractor is VGG-16; bounding boxes and class scores are predicted from a pyramid of feature maps

  - Network architecture:

  - Loss function:

    - Sum of a Smooth L1 localization loss and a multi-class softmax classification loss

- YOLOv2 [YOLO9000: Better, Faster, Stronger](https://arxiv.org/pdf/1612.08242.pdf)

  - Workflow:

    - Pre-train a CNN on an image classification task

    - Split the image into grid cells; if the center of an object falls inside a cell, that cell is "responsible" for detecting the object

      Each cell predicts (a) bounding box locations, (b) a confidence score, and (c) class probabilities conditioned on an object being present in the bounding box

    - Modify the last layer of the pre-trained CNN so that it outputs the prediction tensor

  - Network architecture:

  - Loss function:

    - Two parts: bounding box regression and conditional class probabilities, both using a sum of squared errors

2017

- RetinaNet: [Focal Loss for Dense Object Detection](https://arxiv.org/pdf/1708.02002.pdf)

  - Workflow:

    - Focal loss assigns more weight to hard, easily misclassified examples (background with noisy texture, or partial objects) and down-weights easy examples (obviously empty background)

    - The feature extractor is ResNet; a feature pyramid improves detection performance

  - Network architecture:

2018

- [YOLOv3: An Incremental Improvement](https://arxiv.org/pdf/1804.02767.pdf)

  - Bounding box prediction uses dimension clusters

    - Each box has 4 coordinates

    - During training, a sum of squared error loss is used

    - The objectness score of each bounding box is predicted with logistic regression

  - The classifier uses logistic regression with a binary cross-entropy loss

  - Borrows ideas from FPN

  - Feature-extraction convolutional network

    - Alternating 3x3 and 1x1 convolutional layers

    - Borrows from ResNet and uses shortcut connections, taken from either convolutional layers or other shortcut layers

===========================
## Multi-stage Object Detection

2014

- RCNN

  - [Region-Based Convolutional Networks for Accurate Object Detection and Segmentation](http://medialab.sjtu.edu.cn/teaching/CV/hw/related_papers/3_detection.pdf)

  - v5 [Rich feature hierarchies for accurate object detection and semantic segmentation](https://arxiv.org/pdf/1311.2524v3.pdf) - CVPR
    - Region proposals are scale-normalized before being classified by a ConvNet

  - [RCNN Keras Code](https://github.com/yhenon/keras-rcnn)

- SPPnet [Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition](https://arxiv.org/pdf/1406.4729.pdf) - ECCV

  - [ROI Pooling](http://wavelab.uwaterloo.ca/wp-content/uploads/2017/04/Lecture_6.pdf)

2015

- FCN - [Fully convolutional networks for semantic segmentation](https://arxiv.org/pdf/1411.4038.pdf) - CVPR

- [Fast R-CNN](https://arxiv.org/pdf/1504.08083.pdf) - ICCV

- [Faster R-CNN: Towards real-time object detection with region proposal networks](https://arxiv.org/pdf/1506.01497.pdf) - NIPS

  - RPN (Region Proposal Network) & Anchor Box

  - [Convolutional Feature Maps](http://kaiminghe.com/iccv15tutorial/iccv2015_tutorial_convolutional_feature_maps_kaiminghe.pdf)

- Instance retrieval: [Faster R-CNN Features for Instance Search](https://arxiv.org/pdf/1604.08893.pdf)

2016

- [Feature Pyramid Networks for Object Detection](https://arxiv.org/pdf/1612.03144.pdf)

  - The idea comes from the feature pyramids of traditional computer vision; they had not been used in deep learning because they are compute- and memory-intensive

  - Bottom-up pathway (feed-forward): the deepest layer of each stage should have the strongest features

  - [Reference: Understanding
FPN](https://medium.com/@jonathan_hui/understanding-feature-pyramid-networks-for-object-detection-fpn-45b227b9106c) -------------------------------------------------------------------------------- /assets/algorithm/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/algorithm/.DS_Store -------------------------------------------------------------------------------- /assets/algorithm/1811.04533.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/algorithm/1811.04533.pdf -------------------------------------------------------------------------------- /assets/algorithm/1904.03797v1.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/algorithm/1904.03797v1.pdf -------------------------------------------------------------------------------- /assets/algorithm/RCNN 算法.xmind: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/algorithm/RCNN 算法.xmind -------------------------------------------------------------------------------- /assets/algorithm/RCNN_algorithm.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/algorithm/RCNN_algorithm.png -------------------------------------------------------------------------------- /assets/algorithm/SPP算法.xmind: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/algorithm/SPP算法.xmind -------------------------------------------------------------------------------- /assets/algorithm/fast_rcnn.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/algorithm/fast_rcnn.png -------------------------------------------------------------------------------- /assets/algorithm/faster_rcnn.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/algorithm/faster_rcnn.png -------------------------------------------------------------------------------- /assets/algorithm/faster_rcnn_v2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/algorithm/faster_rcnn_v2.png -------------------------------------------------------------------------------- /assets/algorithm/fpn.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/algorithm/fpn.png -------------------------------------------------------------------------------- /assets/algorithm/overfeat.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/algorithm/overfeat.png -------------------------------------------------------------------------------- /assets/algorithm/rcnn.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/algorithm/rcnn.png -------------------------------------------------------------------------------- /assets/algorithm/sppnet.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/algorithm/sppnet.png -------------------------------------------------------------------------------- /assets/block_diagram/SSD-architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/block_diagram/SSD-architecture.png -------------------------------------------------------------------------------- /assets/block_diagram/SSD-framework.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/block_diagram/SSD-framework.png -------------------------------------------------------------------------------- /assets/block_diagram/cornetnet-lite.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/block_diagram/cornetnet-lite.png -------------------------------------------------------------------------------- /assets/block_diagram/fcn.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/block_diagram/fcn.png -------------------------------------------------------------------------------- /assets/block_diagram/fcn_architecture.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/block_diagram/fcn_architecture.png -------------------------------------------------------------------------------- /assets/block_diagram/fcn_block.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/block_diagram/fcn_block.png -------------------------------------------------------------------------------- /assets/block_diagram/fcn_upooling.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/block_diagram/fcn_upooling.jpg -------------------------------------------------------------------------------- /assets/block_diagram/featurized-image-pyramid.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/block_diagram/featurized-image-pyramid.png -------------------------------------------------------------------------------- /assets/block_diagram/fpn.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/block_diagram/fpn.png -------------------------------------------------------------------------------- /assets/block_diagram/fpn_rpn.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/block_diagram/fpn_rpn.jpeg 
-------------------------------------------------------------------------------- /assets/block_diagram/lenet_alexnet.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/block_diagram/lenet_alexnet.png -------------------------------------------------------------------------------- /assets/block_diagram/mobilenetv1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/block_diagram/mobilenetv1.png -------------------------------------------------------------------------------- /assets/block_diagram/mobilenetv2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/block_diagram/mobilenetv2.png -------------------------------------------------------------------------------- /assets/block_diagram/object_detection_block_diagram.pptx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/block_diagram/object_detection_block_diagram.pptx -------------------------------------------------------------------------------- /assets/block_diagram/resnet_architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/block_diagram/resnet_architecture.png -------------------------------------------------------------------------------- /assets/block_diagram/resnet_block.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/block_diagram/resnet_block.png -------------------------------------------------------------------------------- /assets/block_diagram/retina-net.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/block_diagram/retina-net.png -------------------------------------------------------------------------------- /assets/block_diagram/shufflenet.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/block_diagram/shufflenet.png -------------------------------------------------------------------------------- /assets/block_diagram/vgg16.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/block_diagram/vgg16.png -------------------------------------------------------------------------------- /assets/block_diagram/vgg19.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/block_diagram/vgg19.png -------------------------------------------------------------------------------- /assets/block_diagram/yolo-network-architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/block_diagram/yolo-network-architecture.png -------------------------------------------------------------------------------- 
/assets/block_diagram/yolo-responsible-predictor.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/block_diagram/yolo-responsible-predictor.png -------------------------------------------------------------------------------- /assets/code_diagram/alexnet_revised.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/code_diagram/alexnet_revised.png -------------------------------------------------------------------------------- /assets/code_diagram/alexnet_revised_v1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/code_diagram/alexnet_revised_v1.png -------------------------------------------------------------------------------- /assets/code_diagram/lenet_revised.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/code_diagram/lenet_revised.png -------------------------------------------------------------------------------- /assets/code_diagram/vgg16_tl.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/assets/code_diagram/vgg16_tl.png -------------------------------------------------------------------------------- /dataset/ChineseFoodDataset/.DS_Store: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/dataset/ChineseFoodDataset/.DS_Store
--------------------------------------------------------------------------------
/dataset/ChineseFoodDataset/chinese_food_spider.py:
--------------------------------------------------------------------------------
import os

import requests
from lxml import etree


main_page = "https://www.douguo.com/caipu/"
fenlei_url = "https://www.douguo.com/caipu/fenlei"
test_noodle_page = "https://www.douguo.com/caipu/面条/"


def get_html(url):
    """Fetch a page and parse it into an lxml element tree."""
    # A timeout keeps the crawler from hanging on an unresponsive server.
    r = requests.get(url, timeout=10)
    r.encoding = "utf-8"
    return etree.HTML(r.text)


def get_classes_urls_dict(classes_page_url):
    """Map each recipe category name to its listing URL."""
    classes_page_html = get_html(classes_page_url)
    classes = classes_page_html.xpath("//ul/li/a/@title")
    catalog_url = classes_page_html.xpath("//ul[@class='sortlist clearfix']/li/a/@href")

    classes_url = {}
    # zip() stops at the shorter list. The URL is built from the category
    # name; the href itself is not used.
    for name, _href in zip(classes, catalog_url):
        classes_url[name] = main_page + name

    return classes_url


def get_class_page_nums(single_class_url):
    """Return the listing-page URLs of one category."""
    class_page = get_html(single_class_url)
    page_urls = class_page.xpath("//div[@class='pages']/a/@href")

    page_url_prefix = ""
    all_page_num = []
    for href in set(page_urls):
        # Pagination links end in a numeric segment.
        all_page_num.append(int(href.split("/")[-1]))
        page_url_prefix = href[:href.rfind("/") + 1]

    all_page_num.sort()

    if len(all_page_num) > 3:
        d0 = all_page_num[1] - all_page_num[0]
        d1 = all_page_num[2] - all_page_num[1]
        if d0 == d1 and d0 > 0:
            # Evenly spaced numbers: fill in the pages the pager did not list.
            all_page_num = list(range(all_page_num[0], all_page_num[-1] + 1, d0))

    # Always return full URLs so that callers can fetch them directly.
    return [page_url_prefix + str(num) for num in all_page_num]


def get_page_img(page_url):
    """Return the image URLs found on one listing page."""
    page_html = get_html(page_url)
    return page_html.xpath("//ul[@id='jxlist']/li/a/img/@src")


def get_all_urls(cls_url):
    """Collect the image URLs of every page of every category."""
    all_url_list = []
    cls_urls = get_classes_urls_dict(cls_url)
    for i in cls_urls.values():
        for j in get_class_page_nums(i):
            all_url_list.extend(get_page_img(j))

    return all_url_list


def download_img_list(url_list):
    """Download every image URL in url_list into the images/ directory."""
    os.makedirs("images", exist_ok=True)

    for i, each in enumerate(url_list, start=1):
        print('Downloading image ' + str(i) + ', URL: ' + str(each))
        try:
            pic = requests.get(each, timeout=10)
        except requests.exceptions.RequestException:
            # Covers connection errors as well as timeouts.
            print('[Error] This image could not be downloaded')
            continue

        file_path = os.path.join("images", each.split("/")[-1])
        with open(file_path, 'wb') as fp:
            fp.write(pic.content)


def download_img(url_list):
    """Download each image after replacing the size marker in its URL."""
    # Swap the "400x266" thumbnail size in the URL for "yuan" before downloading.
    download_img_list([each.replace("400x266", "yuan") for each in url_list])


def main(url):
    c_dict = get_classes_urls_dict(url)
    for each_cls in c_dict.values():
        for each_page in get_class_page_nums(each_cls):
            download_img(get_page_img(each_page))


if __name__ == "__main__":
    main(fenlei_url)
--------------------------------------------------------------------------------
/image-retrieval/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/image-retrieval/.DS_Store
--------------------------------------------------------------------------------
/image-retrieval/1998-(J)-Example-Based Learning for View-Based Human Face Detection.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/image-retrieval/1998-(J)-Example-Based Learning for View-Based Human Face Detection.pdf
--------------------------------------------------------------------------------
/image-retrieval/2010-(J)-Object Detection with Discriminatively Trained Part Based Models.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/image-retrieval/2010-(J)-Object Detection with Discriminatively Trained Part Based Models.pdf
--------------------------------------------------------------------------------
/image-retrieval/paper/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/image-retrieval/paper/.DS_Store
--------------------------------------------------------------------------------
/image-retrieval/paper/2015-(J)-CVPR- Deep
Learning of Binary Hash Codes for Fast Image Retrieval.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/image-retrieval/paper/2015-(J)-CVPR- Deep Learning of Binary Hash Codes for Fast Image Retrieval.pdf
--------------------------------------------------------------------------------
/sample-code/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/sample-code/.DS_Store
--------------------------------------------------------------------------------
/sample-code/network/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/taylorguo/Deep-Object-Detection/76c602d1ff344c28ce63bd84c8d19242a2411839/sample-code/network/.DS_Store
--------------------------------------------------------------------------------
/sample-code/network/.idea/misc.xml:
--------------------------------------------------------------------------------
/sample-code/network/.idea/modules.xml:
--------------------------------------------------------------------------------
/sample-code/network/.idea/network.iml:
--------------------------------------------------------------------------------
/sample-code/network/.idea/vcs.xml:
--------------------------------------------------------------------------------
/sample-code/network/.idea/workspace.xml:
--------------------------------------------------------------------------------