├── .gitignore
├── LICENSE
├── README.md
├── VOCdevkit
    └── VOC2007
    │   ├── Annotations
    │       └── README.md
    │   ├── ImageSets
    │       └── Main
    │       │   └── README.md
    │   └── JPEGImages
    │       └── README.md
├── get_map.py
├── img
    └── street.jpg
├── logs
    └── README.md
├── model_data
    ├── simhei.ttf
    └── voc_classes.txt
├── nets
    ├── __init__.py
    ├── mobilenet.py
    ├── ssd.py
    └── ssd_training.py
├── predict.py
├── requirements.txt
├── ssd.py
├── summary.py
├── train.py
├── utils
    ├── __init__.py
    ├── anchors.py
    ├── callbacks.py
    ├── dataloader.py
    ├── utils.py
    ├── utils_bbox.py
    └── utils_map.py
├── voc_annotation.py
└── 常见问题汇总.md


/.gitignore:
--------------------------------------------------------------------------------
  1 | # ignore map, miou, datasets
  2 | map_out/
  3 | miou_out/
  4 | VOCdevkit/
  5 | datasets/
  6 | Medical_Datasets/
  7 | lfw/
  8 | logs/
  9 | model_data/
 10 | .temp_map_out/
 11 | 
 12 | # Byte-compiled / optimized / DLL files
 13 | __pycache__/
 14 | *.py[cod]
 15 | *$py.class
 16 | 
 17 | # C extensions
 18 | *.so
 19 | 
 20 | # Distribution / packaging
 21 | .Python
 22 | build/
 23 | develop-eggs/
 24 | dist/
 25 | downloads/
 26 | eggs/
 27 | .eggs/
 28 | lib/
 29 | lib64/
 30 | parts/
 31 | sdist/
 32 | var/
 33 | wheels/
 34 | pip-wheel-metadata/
 35 | share/python-wheels/
 36 | *.egg-info/
 37 | .installed.cfg
 38 | *.egg
 39 | MANIFEST
 40 | 
 41 | # PyInstaller
 42 | #  Usually these files are written by a python script from a template
 43 | #  before PyInstaller builds the exe, so as to inject date/other infos into it.
 44 | *.manifest
 45 | *.spec
 46 | 
 47 | # Installer logs
 48 | pip-log.txt
 49 | pip-delete-this-directory.txt
 50 | 
 51 | # Unit test / coverage reports
 52 | htmlcov/
 53 | .tox/
 54 | .nox/
 55 | .coverage
 56 | .coverage.*
 57 | .cache
 58 | nosetests.xml
 59 | coverage.xml
 60 | *.cover
 61 | *.py,cover
 62 | .hypothesis/
 63 | .pytest_cache/
 64 | 
 65 | # Translations
 66 | *.mo
 67 | *.pot
 68 | 
 69 | # Django stuff:
 70 | *.log
 71 | local_settings.py
 72 | db.sqlite3
 73 | db.sqlite3-journal
 74 | 
 75 | # Flask stuff:
 76 | instance/
 77 | .webassets-cache
 78 | 
 79 | # Scrapy stuff:
 80 | .scrapy
 81 | 
 82 | # Sphinx documentation
 83 | docs/_build/
 84 | 
 85 | # PyBuilder
 86 | target/
 87 | 
 88 | # Jupyter Notebook
 89 | .ipynb_checkpoints
 90 | 
 91 | # IPython
 92 | profile_default/
 93 | ipython_config.py
 94 | 
 95 | # pyenv
 96 | .python-version
 97 | 
 98 | # pipenv
 99 | #   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
100 | #   However, in case of collaboration, if having platform-specific dependencies or dependencies
101 | #   having no cross-platform support, pipenv may install dependencies that don't work, or not
102 | #   install all needed dependencies.
103 | #Pipfile.lock
104 | 
105 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow
106 | __pypackages__/
107 | 
108 | # Celery stuff
109 | celerybeat-schedule
110 | celerybeat.pid
111 | 
112 | # SageMath parsed files
113 | *.sage.py
114 | 
115 | # Environments
116 | .env
117 | .venv
118 | env/
119 | venv/
120 | ENV/
121 | env.bak/
122 | venv.bak/
123 | 
124 | # Spyder project settings
125 | .spyderproject
126 | .spyproject
127 | 
128 | # Rope project settings
129 | .ropeproject
130 | 
131 | # mkdocs documentation
132 | /site
133 | 
134 | # mypy
135 | .mypy_cache/
136 | .dmypy.json
137 | dmypy.json
138 | 
139 | # Pyre type checker
140 | .pyre/
141 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2020 JiaQi Xu
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | ## Mobilenet-SSD: A Lightweight Object Detection Model Implemented in Keras (Paper Version)
  2 | ---
  3 | 
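  4 | I previously implemented a version of mobilenet-SSD, and someone pointed out that it did not follow the structure of the original Mobilenet-SSD. After checking, that appears to be true: the original Mobilenet-SSD does not use the 38x38 feature layer for box regression and classification. I made this version to fix that.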
  5 | 
  6 | ## Contents
  7 | 1. [Top News](#top-news)
  8 | 2. [Related Code](#related-code)
  9 | 3. [Performance](#performance)
 10 | 4. [Environment](#environment)
 11 | 5. [Downloads](#downloads)
 12 | 6. [How to Train](#how-to-train)
 13 | 7. [How to Predict](#how-to-predict)
 14 | 8. [How to Evaluate](#how-to-evaluate)
 15 | 9. [Reference](#reference)
 16 | 
 17 | ## Top News
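 18 | **`2022-04`**: **Added multi-GPU training and per-class object counts.**  
 19 | 
 20 | **`2022-03`**: **Major update: added step and cosine learning-rate decay, a choice of Adam or SGD optimizers, automatic learning-rate scaling with batch_size, and image cropping.**  
 21 | The original repository shown in the BiliBili video is at: https://github.com/bubbliiiing/Mobilenet-SSD-Essay/tree/bilibili
 22 | 
 23 | **`2021-10`**: **Major update: added extensive comments and many adjustable parameters, restructured the code modules, and added FPS measurement, video prediction, and batch prediction.**   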
 24 | 
 25 | ## Related Code
 26 | | Model | Path |
 27 | | :----- | :----- |
 28 | | ssd-keras | https://github.com/bubbliiiing/ssd-keras |
 29 | | mobilenet-ssd-keras | https://github.com/bubbliiiing/mobilenet-ssd-keras |
 30 | | Mobilenet-SSD-Essay | https://github.com/bubbliiiing/Mobilenet-SSD-Essay |
 31 | 
 32 | ## Performance
 33 | | Training dataset | Weights file | Test dataset | Input size | mAP 0.5:0.95 | mAP 0.5 |
 34 | | :-----: | :-----: | :------: | :------: | :------: | :-----: |
 35 | | VOC07+12 | [essay_mobilenet_ssd_weights.h5](https://github.com/bubbliiiing/Mobilenet-SSD-Essay/releases/download/v1.0/essay_mobilenet_ssd_weights.h5) | VOC-Test07 | 300x300| - | 71.48 |
 36 | 
 37 | ## Environment
 38 | tensorflow-gpu==1.13.1  
 39 | keras==2.1.5  
 40 | 
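 41 | ## Downloads
 42 | The essay_mobilenet_ssd_weights and the backbone weights needed for training can be downloaded from Baidu Netdisk.  
 43 | Link: https://pan.baidu.com/s/1QPUpVuxPGrCrALPoEUONLA   
 44 | Extraction code: fi4s    
 45 | 
 46 | The VOC dataset can be downloaded below. It already contains the training, test, and validation sets (the validation set is the same as the test set), so no further split is needed:  
 47 | Link: https://pan.baidu.com/s/1-1Ej6dayrx3g0iAA88uY5A    
 48 | Extraction code: ph32   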
 49 | 
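 50 | ## How to Train
 51 | ### a. Training on the VOC07+12 dataset
 52 | 1. Prepare the dataset   
 53 | **This repo trains on VOC-format data. Before training, download the VOC07+12 dataset and extract it into the root directory.**  
 54 | 
 55 | 2. Process the dataset   
 56 | Set annotation_mode=2 in voc_annotation.py, then run voc_annotation.py to generate 2007_train.txt and 2007_val.txt in the root directory.   
 57 | 
 58 | 3. Train the network   
 59 | The default parameters in train.py are set up for the VOC dataset, so you can start training by running train.py directly.   
 60 | 
 61 | 4. Predict with the trained model   
 62 | Prediction uses two files, ssd.py and predict.py. First edit model_path and classes_path in ssd.py; both must be changed.   
 63 | **model_path points to the trained weights file in the logs folder.   
 64 | classes_path points to the txt file listing the detection classes.**   
 65 | After editing, run predict.py and enter an image path to run detection.   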
 66 | 
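 67 | ### b. Training on your own dataset
 68 | 1. Prepare the dataset  
 69 | **This repo trains on VOC-format data, so build your dataset in that format before training.**    
 70 | Put the label files in the Annotations folder under VOCdevkit/VOC2007.   
 71 | Put the image files in the JPEGImages folder under VOCdevkit/VOC2007.   
 72 | 
 73 | 2. Process the dataset  
 74 | Once the files are in place, use voc_annotation.py to generate 2007_train.txt and 2007_val.txt for training.   
 75 | Edit the parameters in voc_annotation.py. For a first run you only need to change classes_path, which points to the txt file listing the detection classes.   
 76 | When training on your own dataset, create a cls_classes.txt that lists the classes you want to distinguish.   
 77 | For example, model_data/cls_classes.txt contains:      
 78 | ```text
 79 | cat
 80 | dog
 81 | ...
 82 | ```
 83 | Set classes_path in voc_annotation.py to point to cls_classes.txt, then run voc_annotation.py.  
 84 | 
 85 | 3. Train the network  
 86 | **There are many training parameters, all in train.py; read the comments carefully after downloading the repo. The most important one is still classes_path in train.py.**  
 87 | **classes_path points to the txt file listing the detection classes and must be the same txt used in voc_annotation.py. It must be changed when training on your own dataset!**  
 88 | After changing classes_path, run train.py to start training. After several epochs, the weights are saved in the logs folder.  
 89 | 
 90 | 4. Predict with the trained model  
 91 | Prediction uses two files, ssd.py and predict.py. Edit model_path and classes_path in ssd.py.  
 92 | **model_path points to the trained weights file in the logs folder.  
 93 | classes_path points to the txt file listing the detection classes.**  
 94 | After editing, run predict.py and enter an image path to run detection.  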
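The class file referenced by classes_path is plain text with one class name per line. utils/utils.py provides get_classes, which get_map.py unpacks as `class_names, _ = get_classes(classes_path)`; its body is not shown in this section. The following is a minimal sketch of such a reader, assuming it strips whitespace, skips blank lines, and returns the class count as the second value. `read_classes` is a hypothetical name, not the repo's function.

```python
import os
import tempfile


def read_classes(classes_path):
    """Hypothetical reader: one class name per line, blank lines ignored.

    Returns (class_names, number_of_classes).
    """
    with open(classes_path, encoding="utf-8") as f:
        class_names = [line.strip() for line in f if line.strip()]
    return class_names, len(class_names)


if __name__ == "__main__":
    # Write a throwaway cls_classes.txt with the two example classes.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cls_classes.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("cat\ndog\n\n")
        names, num_classes = read_classes(path)
        print(names, num_classes)  # -> ['cat', 'dog'] 2
```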
 95 | 
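 96 | ## How to Predict
 97 | ### a. Using the pretrained weights
 98 | 1. After downloading and extracting the repo, download the weights from Baidu Netdisk, put them in model_data, run predict.py, and enter  
 99 | ```text
100 | img/street.jpg
101 | ```
102 | 2. FPS testing and video detection can be enabled through the settings in predict.py.  
103 | ### b. Using your own trained weights
104 | 1. Train the model following the training steps.  
105 | 2. In ssd.py, edit model_path and classes_path in the section below so they match your trained files; **model_path is the weights file in the logs folder, and classes_path lists the classes that model_path was trained on**.  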
106 | ```python
107 | _defaults = {
108 |     #--------------------------------------------------------------------------#
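109 |     #   When predicting with your own trained model, you must change model_path and classes_path!
110 |     #   model_path points to a weights file in logs, classes_path points to a txt in model_data
111 |     #   If a shape mismatch occurs, also check the model_path and classes_path used during training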
112 |     #--------------------------------------------------------------------------#
113 |     "model_path"        : 'model_data/essay_mobilenet_ssd_weights.h5',
114 |     "classes_path"      : 'model_data/voc_classes.txt',
115 |     #---------------------------------------------------------------------#
116 |     #   Input image size for prediction; use the same size as in training
117 |     #---------------------------------------------------------------------#
118 |     "input_shape"       : [300, 300],
119 |     #---------------------------------------------------------------------#
120 |     #   Only prediction boxes whose score exceeds confidence are kept
121 |     #---------------------------------------------------------------------#
122 |     "confidence"        : 0.5,
123 |     #---------------------------------------------------------------------#
124 |     #   IoU threshold used by non-maximum suppression (nms_iou)
125 |     #---------------------------------------------------------------------#
126 |     "nms_iou"           : 0.45,
127 |     #---------------------------------------------------------------------#
128 |     #   Sizes of the prior (anchor) boxes
129 |     #---------------------------------------------------------------------#
130 |     'anchors_size'      : [30, 60, 111, 162, 213, 264, 315],
131 |     #---------------------------------------------------------------------#
132 |     #   Controls whether letterbox_image is used to resize the input without distortion.
133 |     #   After many tests, disabling letterbox_image and resizing directly gave better results
134 |     #---------------------------------------------------------------------#
135 |     "letterbox_image"   : False,
136 | }
137 | ```
138 | 3. Run predict.py and enter  
139 | ```text
140 | img/street.jpg
141 | ```
142 | 4. FPS testing and video detection can be enabled through the settings in predict.py.  
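The letterbox_image option in _defaults chooses between an aspect-preserving resize (scale to fit, then pad) and a plain stretch to input_shape. The repo's resize helper is not part of this section, so the following only sketches the letterbox geometry, assuming the padding is split evenly on both sides and sizes are truncated to integers; `letterbox_geometry` is a hypothetical name.

```python
def letterbox_geometry(image_w, image_h, target_w, target_h):
    """Return (new_w, new_h, pad_x, pad_y) for an aspect-preserving resize.

    The image is scaled by the largest factor that still fits inside the
    target, then centred; pad_x / pad_y are the offsets of its top-left corner.
    """
    scale = min(target_w / image_w, target_h / image_h)
    new_w = int(image_w * scale)
    new_h = int(image_h * scale)
    pad_x = (target_w - new_w) // 2
    pad_y = (target_h - new_h) // 2
    return new_w, new_h, pad_x, pad_y


if __name__ == "__main__":
    # A 1280x720 frame letterboxed into the 300x300 input_shape.
    print(letterbox_geometry(1280, 720, 300, 300))  # -> (300, 168, 0, 66)
```

With letterbox_image set to False the image is instead stretched straight to 300x300, which changes its aspect ratio; the comment in _defaults reports that this worked better in the author's tests.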
143 | 
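The confidence and nms_iou entries in _defaults act during post-processing: boxes scoring at or below confidence are dropped, and non-maximum suppression then discards any box that overlaps a higher-scoring box of the same class by more than nms_iou. The decoding code (utils/utils_bbox.py) is not shown in this section, so the sketch below is a generic greedy per-class NMS in plain Python, assuming (xmin, ymin, xmax, ymax) boxes; `iou` and `filter_and_nms` are hypothetical names.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(boxes, scores, confidence=0.5, nms_iou=0.45):
    """Greedy NMS for one class; returns indices of kept boxes, best first."""
    # Drop low-scoring boxes, then visit the rest from highest score down.
    order = [i for i in sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
             if scores[i] > confidence]
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= nms_iou for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 260, 260), (0, 0, 50, 50)]
    scores = [0.9, 0.8, 0.7, 0.3]
    print(filter_and_nms(boxes, scores))  # -> [0, 2]
```

Box 1 is suppressed because it overlaps box 0 with an IoU of about 0.92, and box 3 never reaches NMS because its score is below confidence.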
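144 | ## How to Evaluate
145 | ### a. Evaluating on the VOC07+12 test set
146 | 1. This repo evaluates in VOC format. VOC07+12 already comes with a test split, so there is no need to generate the txt files under ImageSets with voc_annotation.py.
147 | 2. Edit model_path and classes_path in ssd.py. **model_path points to the trained weights file in the logs folder. classes_path points to the txt file listing the detection classes.**  
148 | 3. Run get_map.py to get the evaluation results, which are saved in the map_out folder.
149 | 
150 | ### b. Evaluating your own dataset
151 | 1. This repo evaluates in VOC format.  
152 | 2. If you ran voc_annotation.py before training, the dataset has already been split into training, validation, and test sets. To change the size of the test set, edit trainval_percent in voc_annotation.py. trainval_percent sets the ratio of (training + validation) to test images, 9:1 by default; train_percent sets the ratio of training to validation images within (training + validation), also 9:1 by default.
153 | 3. After splitting off the test set with voc_annotation.py, edit classes_path in get_map.py. classes_path points to the txt file listing the detection classes and must be the same txt used for training. This change is required when evaluating your own dataset.
154 | 4. Edit model_path and classes_path in ssd.py. **model_path points to the trained weights file in the logs folder. classes_path points to the txt file listing the detection classes.**  
155 | 5. Run get_map.py to get the evaluation results, which are saved in the map_out folder.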
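The two 9:1 defaults compound: with trainval_percent = 0.9 and train_percent = 0.9, roughly 81% of the images go to training, 9% to validation, and 10% to testing. voc_annotation.py is not shown in this section, so the following is a hypothetical illustration of such a split over a list of image ids, assuming a shuffle with a fixed seed and rounding down with int(); `split_ids` is not the repo's function.

```python
import random


def split_ids(image_ids, trainval_percent=0.9, train_percent=0.9, seed=0):
    """Hypothetical split into (train, val, test) lists of image ids.

    trainval_percent: share of all images that go to train + val.
    train_percent:    share of train + val that goes to train.
    """
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # fixed seed for a repeatable split
    num_trainval = int(len(ids) * trainval_percent)
    num_train = int(num_trainval * train_percent)
    trainval, test = ids[:num_trainval], ids[num_trainval:]
    train, val = trainval[:num_train], trainval[num_train:]
    return train, val, test


if __name__ == "__main__":
    ids = ["%06d" % i for i in range(1000)]  # VOC-style six-digit ids
    train, val, test = split_ids(ids)
    print(len(train), len(val), len(test))  # -> 810 90 100
```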
156 | 
157 | 
158 | ## Reference
159 | https://github.com/pierluigiferrari/ssd_keras  
160 | https://github.com/kuhung/SSD_keras  
161 | https://github.com/ruinmessi/RFBNet
162 | 


--------------------------------------------------------------------------------
/VOCdevkit/VOC2007/Annotations/README.md:
--------------------------------------------------------------------------------
1 | Stores the label (annotation) files
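Each label file here is a Pascal VOC XML annotation. get_map.py (map_mode 0 or 2) reads every object's name, optional difficult flag, and bndbox corners, skips classes that are not in classes_path, and writes one `name xmin ymin xmax ymax` line per object, appending ` difficult` for difficult objects. The snippet below mirrors that logic on a made-up inline annotation; `SAMPLE_XML` and `ground_truth_lines` are illustrative names, not part of the repo.

```python
import xml.etree.ElementTree as ET

# Made-up annotation with one normal and one "difficult" object.
SAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <object><name>dog</name><difficult>0</difficult>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object><name>person</name><difficult>1</difficult>
    <bndbox><xmin>30</xmin><ymin>40</ymin><xmax>130</xmax><ymax>240</ymax></bndbox>
  </object>
</annotation>"""


def ground_truth_lines(xml_text, class_names):
    """Return get_map.py-style ground-truth lines for one annotation."""
    root = ET.fromstring(xml_text)
    lines = []
    for obj in root.findall("object"):
        name = obj.find("name").text
        if name not in class_names:
            continue  # objects of other classes are skipped
        difficult = obj.find("difficult")
        is_difficult = difficult is not None and int(difficult.text) == 1
        box = obj.find("bndbox")
        coords = [box.find(k).text for k in ("xmin", "ymin", "xmax", "ymax")]
        line = " ".join([name] + coords)
        lines.append(line + " difficult" if is_difficult else line)
    return lines


if __name__ == "__main__":
    for line in ground_truth_lines(SAMPLE_XML, ["dog", "person"]):
        print(line)
```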


--------------------------------------------------------------------------------
/VOCdevkit/VOC2007/ImageSets/Main/README.md:
--------------------------------------------------------------------------------
1 | Stores the training index files


--------------------------------------------------------------------------------
/VOCdevkit/VOC2007/JPEGImages/README.md:
--------------------------------------------------------------------------------
1 | Stores the image files


--------------------------------------------------------------------------------
/get_map.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | import xml.etree.ElementTree as ET
  3 | 
  4 | from PIL import Image
  5 | from tqdm import tqdm
  6 | 
  7 | from ssd import SSD
  8 | from utils.utils import get_classes
  9 | from utils.utils_map import get_coco_map, get_map
 10 | 
 11 | if __name__ == "__main__":
 12 |     '''
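 13 |     Unlike AP, which is an area, Recall and Precision depend on the threshold (Confidence): the network has different Recall and Precision values at different thresholds.
 14 |     By default, the Recall and Precision reported by this script are the values at a threshold (Confidence) of 0.5.
 15 | 
 16 |     Because of how mAP is computed, the network must output nearly all prediction boxes so that Recall and Precision can be computed at every threshold.
 17 |     As a result, the txt files in map_out/detection-results/ usually contain more boxes than a direct predict run; the goal is to list every possible prediction box.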
 18 |     '''
 19 |     #------------------------------------------------------------------------------------------------------------------#
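 20 |     #   map_mode specifies what this script computes when run.
 21 |     #   map_mode 0 runs the full mAP pipeline: get predictions, get ground-truth boxes, and compute the VOC mAP.
 22 |     #   map_mode 1 only gets the predictions.
 23 |     #   map_mode 2 only gets the ground-truth boxes.
 24 |     #   map_mode 3 only computes the VOC mAP.
 25 |     #   map_mode 4 uses the COCO toolkit to compute the 0.50:0.95 mAP of the current dataset. It needs the predictions and ground-truth boxes first, and pycocotools must be installed.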
 26 |     #-------------------------------------------------------------------------------------------------------------------#
 27 |     map_mode        = 0
 28 |     #--------------------------------------------------------------------------------------#
 29 |     #   此处的classes_path用于指定需要测量VOC_map的类别
 30 |     #   一般情况下与训练和预测所用的classes_path一致即可
 31 |     #--------------------------------------------------------------------------------------#
 32 |     classes_path    = 'model_data/voc_classes.txt'
 33 |     #--------------------------------------------------------------------------------------#
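 34 |     #   MINOVERLAP specifies which mAP0.x to compute, i.e. the IoU threshold used to match boxes.
 35 |     #   For example, to compute mAP0.75, set MINOVERLAP = 0.75.
 36 |     #
 37 |     #   A prediction box whose overlap (IoU) with a ground-truth box exceeds MINOVERLAP counts as a positive; otherwise it is a negative.
 38 |     #   So the larger MINOVERLAP is, the more precise a prediction must be to count as a positive, and the lower the resulting mAP.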
 39 |     #--------------------------------------------------------------------------------------#
 40 |     MINOVERLAP      = 0.5
 41 |     #--------------------------------------------------------------------------------------#
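 42 |     #   Because of how mAP is computed, the network must output nearly all prediction boxes to compute mAP.
 43 |     #   Therefore confidence should be set as low as possible so that every possible prediction box is kept.
 44 |     #   
 45 |     #   This value is normally left unchanged: computing mAP needs nearly all prediction boxes, so do not change confidence casually.
 46 |     #   To get Recall and Precision at a different threshold, change score_threhold below instead.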
 47 |     #--------------------------------------------------------------------------------------#
 48 |     confidence      = 0.02
 49 |     #--------------------------------------------------------------------------------------#
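 50 |     #   IoU threshold of the non-maximum suppression used during prediction; a larger value means less strict suppression.
 51 |     #   
 52 |     #   This value is normally left unchanged.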
 53 |     #--------------------------------------------------------------------------------------#
 54 |     nms_iou         = 0.5
 55 |     #---------------------------------------------------------------------------------------------------------------#
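 56 |     #   Unlike AP, which is an area, Recall and Precision depend on the threshold, so their values differ at different thresholds.
 57 |     #   
 58 |     #   By default, the Recall and Precision reported here are the values at a threshold of 0.5 (defined here as score_threhold).
 59 |     #   Because computing mAP needs nearly all prediction boxes, the confidence defined above must not be changed casually.
 60 |     #   Instead, score_threhold is defined separately as the threshold, and the Recall and Precision at that threshold are looked up while computing mAP.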
 61 |     #---------------------------------------------------------------------------------------------------------------#
 62 |     score_threhold  = 0.5
 63 |     #-------------------------------------------------------#
 64 |     #   map_vis specifies whether to enable visualization of the VOC mAP computation
 65 |     #-------------------------------------------------------#
 66 |     map_vis         = False
 67 |     #-------------------------------------------------------#
 68 |     #   Points to the folder containing the VOC dataset
 69 |     #   By default, the VOC dataset in the root directory
 70 |     #-------------------------------------------------------#
 71 |     VOCdevkit_path  = 'VOCdevkit'
 72 |     #-------------------------------------------------------#
 73 |     #   Output folder for the results, map_out by default
 74 |     #-------------------------------------------------------#
 75 |     map_out_path    = 'map_out'
 76 | 
 77 |     image_ids = open(os.path.join(VOCdevkit_path, "VOC2007/ImageSets/Main/test.txt")).read().strip().split()
 78 | 
 79 |     if not os.path.exists(map_out_path):
 80 |         os.makedirs(map_out_path)
 81 |     if not os.path.exists(os.path.join(map_out_path, 'ground-truth')):
 82 |         os.makedirs(os.path.join(map_out_path, 'ground-truth'))
 83 |     if not os.path.exists(os.path.join(map_out_path, 'detection-results')):
 84 |         os.makedirs(os.path.join(map_out_path, 'detection-results'))
 85 |     if not os.path.exists(os.path.join(map_out_path, 'images-optional')):
 86 |         os.makedirs(os.path.join(map_out_path, 'images-optional'))
 87 | 
 88 |     class_names, _ = get_classes(classes_path)
 89 | 
 90 |     if map_mode == 0 or map_mode == 1:
 91 |         print("Load model.")
 92 |         ssd = SSD(confidence = confidence, nms_iou = nms_iou)
 93 |         print("Load model done.")
 94 | 
 95 |         print("Get predict result.")
 96 |         for image_id in tqdm(image_ids):
 97 |             image_path  = os.path.join(VOCdevkit_path, "VOC2007/JPEGImages/"+image_id+".jpg")
 98 |             image       = Image.open(image_path)
 99 |             if map_vis:
100 |                 image.save(os.path.join(map_out_path, "images-optional/" + image_id + ".jpg"))
101 |             ssd.get_map_txt(image_id, image, class_names, map_out_path)
102 |         print("Get predict result done.")
103 |         
104 |     if map_mode == 0 or map_mode == 2:
105 |         print("Get ground truth result.")
106 |         for image_id in tqdm(image_ids):
107 |             with open(os.path.join(map_out_path, "ground-truth/"+image_id+".txt"), "w") as new_f:
108 |                 root = ET.parse(os.path.join(VOCdevkit_path, "VOC2007/Annotations/"+image_id+".xml")).getroot()
109 |                 for obj in root.findall('object'):
110 |                     difficult_flag = False
111 |                     if obj.find('difficult')!=None:
112 |                         difficult = obj.find('difficult').text
113 |                         if int(difficult)==1:
114 |                             difficult_flag = True
115 |                     obj_name = obj.find('name').text
116 |                     if obj_name not in class_names:
117 |                         continue
118 |                     bndbox  = obj.find('bndbox')
119 |                     left    = bndbox.find('xmin').text
120 |                     top     = bndbox.find('ymin').text
121 |                     right   = bndbox.find('xmax').text
122 |                     bottom  = bndbox.find('ymax').text
123 | 
124 |                     if difficult_flag:
125 |                         new_f.write("%s %s %s %s %s difficult\n" % (obj_name, left, top, right, bottom))
126 |                     else:
127 |                         new_f.write("%s %s %s %s %s\n" % (obj_name, left, top, right, bottom))
128 |         print("Get ground truth result done.")
129 | 
130 |     if map_mode == 0 or map_mode == 3:
131 |         print("Get map.")
132 |         get_map(MINOVERLAP, True, score_threhold = score_threhold, path = map_out_path)
133 |         print("Get map done.")
134 | 
135 |     if map_mode == 4:
136 |         print("Get map.")
137 |         get_coco_map(class_names = class_names, path = map_out_path)
138 |         print("Get map done.")
139 | 
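get_map.py hands the metric itself to get_map in utils/utils_map.py, which is not shown in this section. To illustrate what "AP is an area" and MINOVERLAP mean, the sketch below computes a single-class AP with the all-point interpolation used by the Pascal VOC devkit since 2010: detections are visited from highest to lowest score, each is matched to the ground-truth box it overlaps most, a match counts as a true positive only if its IoU reaches min_overlap and that box has not been claimed yet, and AP is the area under the monotone precision-recall curve. Difficult boxes and the score_threhold readout are left out, the boundary case (> versus >=) differs between implementations, and whether utils_map.py follows exactly these rules is an assumption. All names are hypothetical.

```python
def iou(a, b):
    """IoU of two (xmin, ymin, xmax, ymax) boxes in continuous coordinates."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, min_overlap=0.5):
    """Single-class AP with all-point interpolation.

    detections:   list of (image_id, score, box)
    ground_truth: dict mapping image_id -> list of boxes
    min_overlap:  plays the role of MINOVERLAP in get_map.py
    """
    num_gt = sum(len(boxes) for boxes in ground_truth.values())
    if num_gt == 0:
        return 0.0
    used = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp, fp = [], []
    # Visit detections from most to least confident.
    for image_id, _, box in sorted(detections, key=lambda d: d[1], reverse=True):
        gts = ground_truth.get(image_id, [])
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(gts):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0 and best_iou >= min_overlap and not used[image_id][best_j]:
            used[image_id][best_j] = True  # first match of this GT box: true positive
            tp.append(1)
            fp.append(0)
        else:
            tp.append(0)  # low IoU or duplicate detection: false positive
            fp.append(1)
    # Cumulative recall / precision, one point per detection.
    recall, precision, ctp, cfp = [], [], 0, 0
    for t, f in zip(tp, fp):
        ctp += t
        cfp += f
        recall.append(ctp / num_gt)
        precision.append(ctp / (ctp + cfp))
    # Make precision monotonically non-increasing, then sum the area under it.
    mrec = [0.0] + recall + [1.0]
    mpre = [0.0] + precision + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


if __name__ == "__main__":
    gt = {"img1": [(0, 0, 10, 10)], "img2": [(0, 0, 10, 10)]}
    dets = [("img1", 0.9, (0, 0, 10, 10)),   # true positive
            ("img1", 0.8, (0, 0, 10, 10)),   # duplicate: false positive
            ("img2", 0.7, (0, 0, 10, 10))]   # true positive
    print(round(average_precision(dets, gt), 4))  # -> 0.8333
```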


--------------------------------------------------------------------------------
/img/street.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bubbliiiing/Mobilenet-SSD-Essay/883aa816986d8a6a8fbacefe348fba7cc40068ac/img/street.jpg


--------------------------------------------------------------------------------
/logs/README.md:
--------------------------------------------------------------------------------
1 | Stores the trained models


--------------------------------------------------------------------------------
/model_data/simhei.ttf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bubbliiiing/Mobilenet-SSD-Essay/883aa816986d8a6a8fbacefe348fba7cc40068ac/model_data/simhei.ttf


--------------------------------------------------------------------------------
/model_data/voc_classes.txt:
--------------------------------------------------------------------------------
 1 | aeroplane
 2 | bicycle
 3 | bird
 4 | boat
 5 | bottle
 6 | bus
 7 | car
 8 | cat
 9 | chair
10 | cow
11 | diningtable
12 | dog
13 | horse
14 | motorbike
15 | person
16 | pottedplant
17 | sheep
18 | sofa
19 | train
20 | tvmonitor


--------------------------------------------------------------------------------
/nets/__init__.py:
--------------------------------------------------------------------------------
1 | #


--------------------------------------------------------------------------------
/nets/mobilenet.py:
--------------------------------------------------------------------------------
  1 | import keras.backend as K
  2 | from keras.layers import (Activation, BatchNormalization, Conv2D,
  3 |                           DepthwiseConv2D)
  4 | 
  5 | 
  6 | def _depthwise_conv_block(inputs, pointwise_conv_filters,
  7 |                           depth_multiplier=1, strides=(1, 1), block_id=1):
  8 | 
  9 |     x = DepthwiseConv2D((3, 3),
 10 |                         padding='same',
 11 |                         depth_multiplier=1,
 12 |                         strides=strides,
 13 |                         use_bias=False,
 14 |                         name='conv_dw_%d' % block_id)(inputs)
 15 | 
 16 |     x = BatchNormalization(name='conv_dw_%d_bn' % block_id)(x)
 17 |     x = Activation(relu6, name='conv_dw_%d_relu' % block_id)(x)
 18 | 
 19 |     x = Conv2D(pointwise_conv_filters, (1, 1),
 20 |                padding='same',
 21 |                use_bias=False,
 22 |                strides=(1, 1),
 23 |                name='conv_pw_%d' % block_id)(x)
 24 |     x = BatchNormalization(name='conv_pw_%d_bn' % block_id)(x)
 25 |     return Activation(relu6, name='conv_pw_%d_relu' % block_id)(x)
 26 | 
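Each _depthwise_conv_block replaces a standard 3x3 convolution with a 3x3 DepthwiseConv2D followed by a 1x1 pointwise Conv2D, both bias-free in the code above. A quick weight count, ignoring the BatchNormalization parameters, shows why this is cheaper; the helper names below are hypothetical.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights of a bias-free k x k Conv2D from c_in to c_out channels."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k=3, depth_multiplier=1):
    """Weights of a bias-free DepthwiseConv2D followed by a bias-free 1x1 Conv2D."""
    depthwise = k * k * c_in * depth_multiplier
    pointwise = c_in * depth_multiplier * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # block_id=13 in mobilenet(): 1024 -> 1024 channels.
    std = standard_conv_params(1024, 1024)
    sep = depthwise_separable_params(1024, 1024)
    print(std, sep, round(std / sep, 2))  # -> 9437184 1057792 8.92
```

The ratio approaches k * k = 9 as the channel count grows, since the 1x1 pointwise term dominates the separable cost.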
 27 | def relu6(x):
 28 |     return K.relu(x, max_value=6)
 29 | 
 30 | 
 31 | def conv2d_bn(x,filters,num_row,num_col,padding='same',stride=1,dilation_rate=1,relu=True):
 32 |     x = Conv2D(
 33 |         filters, (num_row, num_col),
 34 |         strides=(stride,stride),
 35 |         padding=padding,
 36 |         dilation_rate=(dilation_rate, dilation_rate),
 37 |         use_bias=False)(x)
 38 |     x = BatchNormalization()(x)
 39 |     if relu:    
 40 |         x = Activation(relu6)(x)
 41 |     return x
 42 | 
 43 | def mobilenet(input_tensor):
 44 |     #----------------------------Backbone feature extraction: start---------------------------#
 45 |     # SSD structure: the net dict collects feature layers by name
 46 |     net = {} 
 47 |     # Block 1
 48 |     x = input_tensor
 49 |     # 300,300,3 -> 150,150,64
 50 |     x = Conv2D(32, (3,3),
 51 |             padding='same',
 52 |             use_bias=False,
 53 |             strides=(2, 2),
 54 |             name='conv1')(input_tensor)
 55 |     x = BatchNormalization(name='conv1_bn')(x)
 56 |     x = Activation(relu6, name='conv1_relu')(x)
 57 |     x = _depthwise_conv_block(x, 64, 1, block_id=1)
 58 |     
 59 |     # 150,150,64 -> 75,75,128
 60 |     x = _depthwise_conv_block(x, 128, 1,
 61 |                               strides=(2, 2), block_id=2)
 62 |     x = _depthwise_conv_block(x, 128, 1, block_id=3)
 63 | 
 64 |     
 65 |     # Block 3
 66 |     # 75,75,128 -> 38,38,256
 67 |     x = _depthwise_conv_block(x, 256, 1,
 68 |                               strides=(2, 2), block_id=4)
 69 |     
 70 |     x = _depthwise_conv_block(x, 256, 1, block_id=5)
 71 | 
 72 |     # Block 4
 73 |     # 38,38,256 -> 19,19,512
 74 |     x = _depthwise_conv_block(x, 512, 1,
 75 |                               strides=(2, 2), block_id=6)
 76 |     x = _depthwise_conv_block(x, 512, 1, block_id=7)
 77 |     x = _depthwise_conv_block(x, 512, 1, block_id=8)
 78 |     x = _depthwise_conv_block(x, 512, 1, block_id=9)
 79 |     x = _depthwise_conv_block(x, 512, 1, block_id=10)
 80 |     x = _depthwise_conv_block(x, 512, 1, block_id=11)
 81 |     net['conv4_3'] = x
 82 | 
 83 |     # Block 5
 84 |     # 19,19,512 -> 10,10,1024
 85 |     x = _depthwise_conv_block(x, 1024, 1,
 86 |                               strides=(2, 2), block_id=12)
 87 |     x = _depthwise_conv_block(x, 1024, 1, block_id=13)
 88 |     net['fc7'] = x
 89 | 
 90 |     # x = Dropout(0.5, name='drop7')(x)
 91 |     # Block 6
 92 |     # 10,10,1024 -> 5,5,512
 93 |     net['conv6_1'] = conv2d_bn(net['fc7'], 256, 1, 1)
 94 |     net['conv6_2'] = conv2d_bn(net['conv6_1'], 512, 3, 3, stride=2)
 95 | 
 96 |     # Block 7
 97 |     # 5,5,512 -> 3,3,256
 98 |     net['conv7_1'] = conv2d_bn(net['conv6_2'], 128, 1, 1)
 99 |     net['conv7_2'] = conv2d_bn(net['conv7_1'], 256, 3, 3, stride=2)
100 | 
101 |     # Block 8
102 |     # 3,3,256 -> 2,2,256
103 |     net['conv8_1'] = conv2d_bn(net['conv7_2'], 128, 1, 1)
104 |     net['conv8_2'] = conv2d_bn(net['conv8_1'], 256, 3, 3, stride=2)
105 | 
106 |     # Block 9
107 |     # 2,2,256 -> 1,1,128
108 |     net['conv9_1'] = conv2d_bn(net['conv8_2'], 64, 1, 1)
109 |     net['conv9_2'] = conv2d_bn(net['conv9_1'], 128, 3, 3, stride=2)
110 |     #----------------------------Backbone feature extraction: end---------------------------#
111 |     return net
112 | 
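With padding='same', a stride-2 convolution maps a side of length n to ceil(n / 2), which is how the shape comments in mobilenet() go 300 -> 150 -> 75 -> 38 -> 19 -> 10 -> 5 -> 3 -> 2 -> 1; note that conv4_3 ends up at 19x19 rather than 38x38. Combining those sides with the prior counts used by the prediction heads in nets/ssd.py (4 priors per cell on conv4_3, 6 on fc7 and conv6_2 to conv9_2) gives the number of prior boxes predicted per image. This is plain arithmetic written for illustration, not code from the repo.

```python
import math

# Prior boxes per cell for each prediction layer, as in nets/ssd.py:
# conv4_3_norm, fc7, conv6_2, conv7_2, conv8_2, conv9_2.
PRIORS_PER_CELL = [4, 6, 6, 6, 6, 6]


def feature_map_sides(input_size=300, num_stride2=9):
    """Side lengths after each stride-2, 'same'-padded layer (input included)."""
    sides = [input_size]
    for _ in range(num_stride2):
        sides.append(math.ceil(sides[-1] / 2))
    return sides


def total_priors(input_size=300):
    """Number of prior boxes over all six prediction layers."""
    # The last six sides belong to conv4_3 (19) ... conv9_2 (1).
    prediction_sides = feature_map_sides(input_size)[-6:]
    return sum(s * s * p for s, p in zip(prediction_sides, PRIORS_PER_CELL))


if __name__ == "__main__":
    print(feature_map_sides())  # -> [300, 150, 75, 38, 19, 10, 5, 3, 2, 1]
    print(total_priors())       # -> 2278
```

So for a 300x300 input, mbox_loc holds 2278 * 4 = 9112 values per image and mbox_conf holds 2278 * num_classes values.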


--------------------------------------------------------------------------------
/nets/ssd.py:
--------------------------------------------------------------------------------
  1 | import keras.backend as K
  2 | import numpy as np
  3 | from keras.engine.topology import InputSpec, Layer
  4 | from keras.layers import (Activation, Concatenate, Conv2D, Flatten, Input,
  5 |                           Reshape)
  6 | from keras.models import Model
  7 | 
  8 | from nets.mobilenet import mobilenet
  9 | 
 10 | class Normalize(Layer):
 11 |     def __init__(self, scale, **kwargs):
 12 |         self.axis = 3
 13 |         self.scale = scale
 14 |         super(Normalize, self).__init__(**kwargs)
 15 | 
 16 |     def build(self, input_shape):
 17 |         self.input_spec = [InputSpec(shape=input_shape)]
 18 |         shape = (input_shape[self.axis],)
 19 |         init_gamma = self.scale * np.ones(shape)
 20 |         self.gamma = K.variable(init_gamma, name='{}_gamma'.format(self.name))
 21 |         self.trainable_weights = [self.gamma]
 22 | 
 23 |     def call(self, x, mask=None):
 24 |         output = K.l2_normalize(x, self.axis)
 25 |         output *= self.gamma
 26 |         return output
 27 | 
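Normalize rescales the channel vector at every spatial position of conv4_3 to unit L2 norm along axis 3 (channels-last) and then multiplies each channel by a learnable gamma initialised to scale, which is 20 where SSD300 uses it. The pure-Python sketch below applies the same formula to one position without Keras. K.l2_normalize guards against a zero norm internally; the `eps` used here to mimic that guard is an assumption, and `l2_normalize_scale` is a hypothetical name.

```python
import math


def l2_normalize_scale(channels, gamma, eps=1e-12):
    """Normalize one channel vector to unit L2 norm, then scale it per channel.

    Mirrors output = l2_normalize(x) * gamma for a single spatial position.
    eps stands in for the backend's zero-norm guard (assumed, not from the repo).
    """
    norm = math.sqrt(max(sum(c * c for c in channels), eps))
    return [c / norm * g for c, g in zip(channels, gamma)]


if __name__ == "__main__":
    x = [3.0, 4.0]        # L2 norm 5
    gamma = [20.0, 20.0]  # scale=20 as in Normalize(20, ...)
    print(l2_normalize_scale(x, gamma))  # -> [12.0, 16.0]
```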
 28 | def SSD300(input_shape, num_classes=21):
 29 |     #---------------------------------#
 30 |     #   Typical input size is [300,300,3]
 31 |     #---------------------------------#
 32 |     input_tensor = Input(shape=input_shape)
 33 |     
 34 |     # net holds the whole SSD structure; each feature layer can be looked up by its layer name
 35 |     net = mobilenet(input_tensor)
 36 |     
 37 |     #-----------------------Process the extracted backbone features---------------------------#
 38 |     # Apply channel-wise L2 normalization to conv4_3
 39 |     # 19,19,512
 40 |     net['conv4_3_norm'] = Normalize(20, name='conv4_3_norm')(net['conv4_3'])
 41 |     num_priors = 4
 42 |     # Box regression branch
 43 |     # num_priors is the number of prior boxes per grid cell; 4 is the number of box offsets (x, y, h, w)
 44 |     net['conv4_3_norm_mbox_loc']        = Conv2D(num_priors * 4, kernel_size=(3,3), padding='same', name='conv4_3_norm_mbox_loc')(net['conv4_3_norm'])
 45 |     net['conv4_3_norm_mbox_loc_flat']   = Flatten(name='conv4_3_norm_mbox_loc_flat')(net['conv4_3_norm_mbox_loc'])
 46 |     # num_priors is the number of prior boxes per grid cell; num_classes is the number of classes
 47 |     net['conv4_3_norm_mbox_conf']       = Conv2D(num_priors * num_classes, kernel_size=(3,3), padding='same',name='conv4_3_norm_mbox_conf')(net['conv4_3_norm'])
 48 |     net['conv4_3_norm_mbox_conf_flat']  = Flatten(name='conv4_3_norm_mbox_conf_flat')(net['conv4_3_norm_mbox_conf'])
 49 | 
 50 |     # Process the fc7 layer
 51 |     # 10,10,1024
 52 |     num_priors = 6
 53 |     # Box regression branch
 54 |     # num_priors is the number of prior boxes per grid cell; 4 is the number of box offsets (x, y, h, w)
 55 |     net['fc7_mbox_loc']         = Conv2D(num_priors * 4, kernel_size=(3,3),padding='same',name='fc7_mbox_loc')(net['fc7'])
 56 |     net['fc7_mbox_loc_flat']    = Flatten(name='fc7_mbox_loc_flat')(net['fc7_mbox_loc'])
 57 |     # num_priors is the number of prior boxes per grid cell; num_classes is the number of classes
 58 |     net['fc7_mbox_conf']        = Conv2D(num_priors * num_classes, kernel_size=(3,3),padding='same',name='fc7_mbox_conf')(net['fc7'])
 59 |     net['fc7_mbox_conf_flat']   = Flatten(name='fc7_mbox_conf_flat')(net['fc7_mbox_conf'])
 60 | 
 61 |     # Process conv6_2
 62 |     # 5,5,512
 63 |     num_priors = 6
 64 |     # Box regression branch
 65 |     # num_priors is the number of prior boxes per grid cell; 4 is the number of box offsets (x, y, h, w)
 66 |     net['conv6_2_mbox_loc']         = Conv2D(num_priors * 4, kernel_size=(3,3), padding='same',name='conv6_2_mbox_loc')(net['conv6_2'])
 67 |     net['conv6_2_mbox_loc_flat']    = Flatten(name='conv6_2_mbox_loc_flat')(net['conv6_2_mbox_loc'])
 68 |     # num_priors is the number of prior boxes per grid cell; num_classes is the number of classes
 69 |     net['conv6_2_mbox_conf']        = Conv2D(num_priors * num_classes, kernel_size=(3,3), padding='same',name='conv6_2_mbox_conf')(net['conv6_2'])
 70 |     net['conv6_2_mbox_conf_flat']   = Flatten(name='conv6_2_mbox_conf_flat')(net['conv6_2_mbox_conf'])
 71 | 
 72 |     # Process conv7_2
 73 |     # 3,3,256
 74 |     num_priors = 6
 75 |     # Box regression branch
 76 |     # num_priors is the number of prior boxes per grid cell; 4 is the number of box offsets (x, y, h, w)
 77 |     net['conv7_2_mbox_loc']         = Conv2D(num_priors * 4, kernel_size=(3,3), padding='same',name='conv7_2_mbox_loc')(net['conv7_2'])
 78 |     net['conv7_2_mbox_loc_flat']    = Flatten(name='conv7_2_mbox_loc_flat')(net['conv7_2_mbox_loc'])
 79 |     # num_priors is the number of prior boxes per grid cell; num_classes is the number of classes
 80 |     net['conv7_2_mbox_conf']        = Conv2D(num_priors * num_classes, kernel_size=(3,3), padding='same',name='conv7_2_mbox_conf')(net['conv7_2'])
 81 |     net['conv7_2_mbox_conf_flat']   = Flatten(name='conv7_2_mbox_conf_flat')(net['conv7_2_mbox_conf'])
 82 | 
 83 |     # Process conv8_2
 84 |     # 2,2,256
 85 |     num_priors = 6
 86 |     # Box regression branch
 87 |     # num_priors is the number of prior boxes per grid cell; 4 is the number of box offsets (x, y, h, w)
 88 |     net['conv8_2_mbox_loc']         = Conv2D(num_priors * 4, kernel_size=(3,3), padding='same',name='conv8_2_mbox_loc')(net['conv8_2'])
 89 |     net['conv8_2_mbox_loc_flat']    = Flatten(name='conv8_2_mbox_loc_flat')(net['conv8_2_mbox_loc'])
 90 |     # num_priors is the number of prior boxes per grid cell; num_classes is the number of classes
 91 |     net['conv8_2_mbox_conf']        = Conv2D(num_priors * num_classes, kernel_size=(3,3), padding='same',name='conv8_2_mbox_conf')(net['conv8_2'])
 92 |     net['conv8_2_mbox_conf_flat']   = Flatten(name='conv8_2_mbox_conf_flat')(net['conv8_2_mbox_conf'])
 93 | 
 94 |     # Process conv9_2
 95 |     # 1,1,256
 96 |     num_priors = 6
 97 |     # Location (box offset) prediction
 98 |     # num_priors is the number of prior boxes per grid cell; 4 is the x, y, h, w adjustment
 99 |     net['conv9_2_mbox_loc']         = Conv2D(num_priors * 4, kernel_size=(3,3), padding='same',name='conv9_2_mbox_loc')(net['conv9_2'])
100 |     net['conv9_2_mbox_loc_flat']    = Flatten(name='conv9_2_mbox_loc_flat')(net['conv9_2_mbox_loc'])
101 |     # num_priors is the number of prior boxes per grid cell; num_classes is the number of classes
102 |     net['conv9_2_mbox_conf']        = Conv2D(num_priors * num_classes, kernel_size=(3,3), padding='same',name='conv9_2_mbox_conf')(net['conv9_2'])
103 |     net['conv9_2_mbox_conf_flat']   = Flatten(name='conv9_2_mbox_conf_flat')(net['conv9_2_mbox_conf'])
104 |     
105 |     # Stack (concatenate) the results of all feature maps
106 |     net['mbox_loc'] = Concatenate(axis=1, name='mbox_loc')([net['conv4_3_norm_mbox_loc_flat'],
107 |                                                             net['fc7_mbox_loc_flat'],
108 |                                                             net['conv6_2_mbox_loc_flat'],
109 |                                                             net['conv7_2_mbox_loc_flat'],
110 |                                                             net['conv8_2_mbox_loc_flat'],
111 |                                                             net['conv9_2_mbox_loc_flat']])
112 |                                     
113 |     net['mbox_conf'] = Concatenate(axis=1, name='mbox_conf')([net['conv4_3_norm_mbox_conf_flat'],
114 |                                                             net['fc7_mbox_conf_flat'],
115 |                                                             net['conv6_2_mbox_conf_flat'],
116 |                                                             net['conv7_2_mbox_conf_flat'],
117 |                                                             net['conv8_2_mbox_conf_flat'],
118 |                                                             net['conv9_2_mbox_conf_flat']])
119 |     # 2278,4
120 |     net['mbox_loc']     = Reshape((-1, 4), name='mbox_loc_final')(net['mbox_loc'])
121 |     # 2278,21
122 |     net['mbox_conf']    = Reshape((-1, num_classes), name='mbox_conf_logits')(net['mbox_conf'])
123 |     net['mbox_conf']    = Activation('softmax', name='mbox_conf_final')(net['mbox_conf'])
124 |     # 2278,25
125 |     net['predictions']  = Concatenate(axis =-1, name='predictions')([net['mbox_loc'], net['mbox_conf']])
126 | 
127 |     model = Model(input_tensor, net['predictions'])
128 |     return model
129 | 

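The `2278` in the shape comments is the total number of prior boxes across the six feature maps. The sketch below reproduces that arithmetic and the widths of the Conv2D heads (`num_priors * 4` and `num_priors * num_classes`).

It assumes conv4_3_norm is 19x19 with 4 priors per cell and fc7 is 10x10 with 6 priors per cell. Those layers are defined outside this excerpt; these values were chosen because they make the total come out to 2278. The helper names are hypothetical.

```python
# Hypothetical helpers for illustration; not part of the repository.
# ASSUMPTION: conv4_3_norm (19x19, 4 priors) and fc7 (10x10, 6 priors) are not
# shown in this excerpt; they are chosen so the total matches the "2278" comments.
FEATURE_MAPS = [
    # (layer name, grid size, priors per grid cell)
    ("conv4_3_norm", 19, 4),  # assumed
    ("fc7", 10, 6),           # assumed
    ("conv6_2", 5, 6),
    ("conv7_2", 3, 6),
    ("conv8_2", 2, 6),
    ("conv9_2", 1, 6),
]


def count_priors(feature_maps):
    """Return (total number of priors, {layer: number of priors on that layer})."""
    per_layer = {name: size * size * priors for name, size, priors in feature_maps}
    return sum(per_layer.values()), per_layer


def head_channels(num_priors, num_classes):
    """Output channels of the mbox_loc and mbox_conf Conv2D layers for one feature map."""
    return num_priors * 4, num_priors * num_classes


if __name__ == "__main__":
    total, per_layer = count_priors(FEATURE_MAPS)
    print(per_layer)
    print("total priors:", total)
    # 20 VOC classes + 1 background = 21 scores per prior, so each row of the
    # final "predictions" tensor has 4 + 21 = 25 values.
    print("conv6_2 head channels:", head_channels(6, 21))
```

Changing the number of priors per cell on any layer changes both the head widths and the total, so the anchor generator has to agree with the model on these counts.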

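The Flatten -> Concatenate -> Reshape sequence relies on the channels-last layout. For each feature map, the `num_priors * 4` channels of one grid cell are contiguous, so reshaping to `(-1, 4)` yields one row per prior, ordered by grid row, then column, then prior.

The NumPy sketch below mimics that sequence for a single image (no batch axis) so the ordering can be checked. `flatten_heads` and `softmax` are hypothetical helpers, not repository code.

```python
import numpy as np


# Hypothetical helpers for illustration; not part of the repository.
def flatten_heads(head_outputs, values_per_prior):
    """Mimic Flatten -> Concatenate(axis=1) -> Reshape((-1, k)) for one image.

    head_outputs: channels-last arrays of shape (H, W, num_priors * k),
    i.e. Conv2D head outputs with the batch axis removed.
    Returns an array of shape (total_priors, k).
    """
    flat = [h.reshape(-1) for h in head_outputs]   # Flatten: row-major over (H, W, C)
    stacked = np.concatenate(flat, axis=0)         # Concatenate the feature maps
    return stacked.reshape(-1, values_per_prior)   # Reshape((-1, k))


def softmax(x, axis=-1):
    """Numerically stable softmax over the class axis, like Activation('softmax')."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    big = rng.normal(size=(2, 2, 2 * 4))     # 2x2 map, 2 priors per cell
    small = rng.normal(size=(1, 1, 2 * 4))   # 1x1 map, 2 priors per cell
    loc = flatten_heads([big, small], 4)
    print(loc.shape)                          # 2*2*2 + 1*1*2 = 10 priors
    # Prior p of cell (y, x) on the 2x2 map lands in row (y * 2 + x) * 2 + p.
    print(np.allclose(loc[(1 * 2 + 0) * 2 + 1], big[1, 0, 4:8]))

    conf = softmax(flatten_heads([rng.normal(size=(1, 1, 2 * 21))], 21))
    print(conf.shape, conf.sum(axis=-1))      # each row sums to 1
```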
--------------------------------------------------------------------------------
/nets/ssd_training.py:
--------------------------------------------------------------------------------
  1 | import math
  2 | from functools import partial
  3 | 
  4 | import tensorflow as tf
  5 | 
  6 | 
  7 | class MultiboxLoss(object):
  8 |     def __init__(self, num_classes, alpha=1.0, neg_pos_ratio=3.0,
  9 |                  background_label_id=0, negatives_for_hard=100.0):
 10 |         self.num_classes = num_classes
 11 |         self.alpha = alpha
 12 |         self.neg_pos_ratio = neg_pos_ratio
 13 |         if background_label_id != 0:
 14 |             raise Exception('Only 0 as background label id is supported')
 15 |         self.background_label_id = background_label_id
 16 |         self.negatives_for_hard = negatives_for_hard
 17 | 
 18 |     def _l1_smooth_loss(self, y_true, y_pred):
 19 |         abs_loss = tf.abs(y_true - y_pred)
 20 |         sq_loss = 0.5 * (y_true - y_pred)**2
 21 |         l1_loss = tf.where(tf.less(abs_loss, 1.0), sq_loss, abs_loss - 0.5)
 22 |         return tf.reduce_sum(l1_loss, -1)
 23 | 
 24 |     def _softmax_loss(self, y_true, y_pred):
 25 |         y_pred = tf.maximum(y_pred, 1e-7)
 26 |         softmax_loss = -tf.reduce_sum(y_true * tf.log(y_pred),
 27 |                                       axis=-1)
 28 |         return softmax_loss
 29 | 
 30 |     def compute_loss(self, y_true, y_pred):
 31 |         num_boxes = tf.to_float(tf.shape(y_true)[1])
 32 | 
 33 |         # --------------------------------------------- #
 34 |         #   Classification loss
 35 |         #   batch_size,2278,21 -> batch_size,2278
 36 |         # --------------------------------------------- #
 37 |         conf_loss = self._softmax_loss(y_true[:, :, 4:-1],
 38 |                                        y_pred[:, :, 4:])
 39 |         # --------------------------------------------- #
 40 |         #   Box location loss
 41 |         #   batch_size,2278,4 -> batch_size,2278
 42 |         # --------------------------------------------- #
 43 |         loc_loss = self._l1_smooth_loss(y_true[:, :, :4],
 44 |                                         y_pred[:, :, :4])
 45 | 
 46 |         # --------------------------------------------- #
 47 |         #   Sum the losses over all positive samples
 48 |         # --------------------------------------------- #
 49 |         pos_loc_loss = tf.reduce_sum(loc_loss * y_true[:, :, -1],
 50 |                                      axis=1)
 51 |         pos_conf_loss = tf.reduce_sum(conf_loss * y_true[:, :, -1],
 52 |                                       axis=1)
 53 | 
 54 |         # --------------------------------------------- #
 55 |         #   Number of positive samples in each image
 56 |         #   batch_size,
 57 |         # --------------------------------------------- #
 58 |         num_pos = tf.reduce_sum(y_true[:, :, -1], axis=-1)
 59 | 
 60 |         # --------------------------------------------- #
 61 |         #   Number of negative samples in each image
 62 |         #   batch_size,
 63 |         # --------------------------------------------- #
 64 |         num_neg = tf.minimum(self.neg_pos_ratio * num_pos, num_boxes - num_pos)
 65 |         # Mark the entries that are greater than 0
 66 |         pos_num_neg_mask = tf.greater(num_neg, 0)
 67 |         # --------------------------------------------- #
 68 |         #   If no image in the batch has any positive samples,
 69 |         #   fall back to negatives_for_hard (100 by default) prior boxes as negatives
 70 |         # --------------------------------------------- #
 71 |         has_min = tf.to_float(tf.reduce_any(pos_num_neg_mask))
 72 |         num_neg = tf.concat(axis=0, values=[num_neg, [(1 - has_min) * self.negatives_for_hard]])
 73 |         
 74 |         # --------------------------------------------- #
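 75 |         #   From here on, the code differs slightly from the version shown in the video.
 76 |         #   The earlier way of selecting negative samples had some problems,
 77 |         #   so this part of the code was refactored.
 78 |         #   Total number of negative samples for the whole batch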
 79 |         # --------------------------------------------- #
 80 |         num_neg_batch = tf.reduce_sum(tf.boolean_mask(num_neg, tf.greater(num_neg, 0)))
 81 |         num_neg_batch = tf.to_int32(num_neg_batch)
 82 | 
 83 |         # --------------------------------------------- #
 84 |         #   Check the predictions: if a prior box contains no object
 85 |         #   but its predicted probability of not being background is high,
 86 |         #   it is a hard example
 87 |         # --------------------------------------------- #
 88 |         confs_start = 4 + self.background_label_id + 1
 89 |         confs_end   = confs_start + self.num_classes - 1
 90 | 
 91 |         # --------------------------------------------- #
 92 |         #   batch_size,2278
 93 |         #   Sum the non-background probabilities; the larger the sum,
 94 |         #   the harder the example is to classify.
 95 |         # --------------------------------------------- #
 96 |         max_confs = tf.reduce_sum(y_pred[:, :, confs_start:confs_end], axis=2)
 97 | 
 98 |         # --------------------------------------------------- #
 99 |         #   Keep only the prior boxes that contain no object, then pick
100 |         #   the num_neg_batch hardest prior boxes in the whole batch
101 |         #   as negative samples.
102 |         # --------------------------------------------------- #
103 |         max_confs   = tf.reshape(max_confs * (1 - y_true[:, :, -1]), [-1])
104 |         _, indices  = tf.nn.top_k(max_confs, k=num_neg_batch)
105 | 
106 |         neg_conf_loss = tf.gather(tf.reshape(conf_loss, [-1]), indices)
107 | 
108 |         # Normalize by the number of positives (an image with none counts as 1)
109 |         num_pos     = tf.where(tf.not_equal(num_pos, 0), num_pos, tf.ones_like(num_pos))
110 |         total_loss  = tf.reduce_sum(pos_conf_loss) + tf.reduce_sum(neg_conf_loss) + tf.reduce_sum(self.alpha * pos_loc_loss)
111 |         total_loss /= tf.reduce_sum(num_pos)
112 |         return total_loss
113 | 
114 | def get_lr_scheduler(lr_decay_type, lr, min_lr, total_iters, warmup_iters_ratio = 0.05, warmup_lr_ratio = 0.1, no_aug_iter_ratio = 0.05, step_num = 10):
115 |     def yolox_warm_cos_lr(lr, min_lr, total_iters, warmup_total_iters, warmup_lr_start, no_aug_iter, iters):
116 |         if iters <= warmup_total_iters:
117 |             # lr = (lr - warmup_lr_start) * iters / float(warmup_total_iters) + warmup_lr_start
118 |             lr = (lr - warmup_lr_start) * pow(iters / float(warmup_total_iters), 2
119 |             ) + warmup_lr_start
120 |         elif iters >= total_iters - no_aug_iter:
121 |             lr = min_lr
122 |         else:
123 |             lr = min_lr + 0.5 * (lr - min_lr) * (
124 |                 1.0
125 |                 + math.cos(
126 |                     math.pi
127 |                     * (iters - warmup_total_iters)
128 |                     / (total_iters - warmup_total_iters - no_aug_iter)
129 |                 )
130 |             )
131 |         return lr
132 | 
133 |     def step_lr(lr, decay_rate, step_size, iters):
134 |         if step_size < 1:
135 |             raise ValueError("step_size must be at least 1.")
136 |         n       = iters // step_size
137 |         out_lr  = lr * decay_rate ** n
138 |         return out_lr
139 | 
140 |     if lr_decay_type == "cos":
141 |         warmup_total_iters  = min(max(warmup_iters_ratio * total_iters, 1), 3)
142 |         warmup_lr_start     = max(warmup_lr_ratio * lr, 1e-6)
143 |         no_aug_iter         = min(max(no_aug_iter_ratio * total_iters, 1), 15)
144 |         func = partial(yolox_warm_cos_lr ,lr, min_lr, total_iters, warmup_total_iters, warmup_lr_start, no_aug_iter)
145 |     else:
146 |         decay_rate  = (min_lr / lr) ** (1 / (step_num - 1))
147 |         step_size   = total_iters / step_num
148 |         func = partial(step_lr, lr, decay_rate, step_size)
149 | 
150 |     return func
151 | 
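`_l1_smooth_loss` and `_softmax_loss` are small enough to restate in NumPy, which makes it easy to check values by hand.

- The two smooth L1 branches meet at |d| = 1, where both give 0.5. The loss is therefore quadratic for small offsets and linear for large ones.
- The cross-entropy clips predictions at 1e-7, so a zero probability on the true class costs -ln(1e-7) ≈ 16.1 rather than infinity.

This is an illustrative re-statement, not code from the repository.

```python
import numpy as np


# Illustrative NumPy re-statement of the two TensorFlow helpers above.
def smooth_l1(y_true, y_pred):
    """Smooth L1 summed over the last axis (same formula as _l1_smooth_loss)."""
    abs_diff = np.abs(y_true - y_pred)
    per_coord = np.where(abs_diff < 1.0, 0.5 * abs_diff ** 2, abs_diff - 0.5)
    return per_coord.sum(axis=-1)


def softmax_ce(y_true, y_pred, eps=1e-7):
    """Cross-entropy with one-hot targets (same formula as _softmax_loss)."""
    y_pred = np.maximum(y_pred, eps)           # clip to avoid log(0)
    return -(y_true * np.log(y_pred)).sum(axis=-1)


if __name__ == "__main__":
    zeros = np.zeros(4)
    # |d| = 0.5 -> 0.125 each; |d| = 2 -> 1.5; |d| = 3 -> 2.5; total 4.25
    print(smooth_l1(zeros, np.array([0.5, -0.5, 2.0, -3.0])))
    # true class predicted with p = 0.5 -> ln 2
    print(softmax_ce(np.array([0.0, 1.0, 0.0]), np.array([0.2, 0.5, 0.3])))
    # true class predicted with p = 0 -> clipped to -ln(1e-7)
    print(softmax_ce(np.array([1.0, 0.0]), np.array([0.0, 1.0])))
```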

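The hard negative mining in `compute_loss` can be followed on a toy batch with the NumPy re-statement below. It mirrors the TensorFlow steps:

- Each image gets `min(neg_pos_ratio * num_pos, num_boxes - num_pos)` negatives.
- If no image has any, `negatives_for_hard` is used as a fallback.
- A batch-wide top-k runs over the summed non-background probabilities, with positives zeroed out.
- The total is normalized by the number of positives.

A stable `argsort` stands in for `tf.nn.top_k`. The function name `multibox_loss_np` and the toy numbers are illustrative assumptions.

```python
import numpy as np


# Illustrative NumPy re-statement of MultiboxLoss; not part of the repository.
def smooth_l1(y_true, y_pred):
    d = np.abs(y_true - y_pred)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum(axis=-1)


def softmax_ce(y_true, y_pred, eps=1e-7):
    return -(y_true * np.log(np.maximum(y_pred, eps))).sum(axis=-1)


def multibox_loss_np(y_true, y_pred, alpha=1.0, neg_pos_ratio=3.0, negatives_for_hard=100.0):
    """NumPy sketch of MultiboxLoss.compute_loss (background label id 0).

    y_true: (batch, priors, 4 + num_classes + 1); last column is 1 for positive priors.
    y_pred: (batch, priors, 4 + num_classes); column 4 is the background probability.
    """
    num_boxes = y_true.shape[1]
    pos = y_true[:, :, -1]                                     # (batch, priors)

    conf_loss = softmax_ce(y_true[:, :, 4:-1], y_pred[:, :, 4:])
    loc_loss = smooth_l1(y_true[:, :, :4], y_pred[:, :, :4])
    pos_loc_loss = (loc_loss * pos).sum(axis=1)
    pos_conf_loss = (conf_loss * pos).sum(axis=1)

    num_pos = pos.sum(axis=-1)                                 # (batch,)
    num_neg = np.minimum(neg_pos_ratio * num_pos, num_boxes - num_pos)
    # Fallback: if no image has any negatives to mine, use negatives_for_hard instead.
    has_min = float(np.any(num_neg > 0))
    num_neg = np.concatenate([num_neg, [(1 - has_min) * negatives_for_hard]])
    num_neg_batch = int(num_neg[num_neg > 0].sum())

    # Hardness = summed non-background probability; positive priors are zeroed.
    hardness = (y_pred[:, :, 5:].sum(axis=2) * (1 - pos)).reshape(-1)
    # Stable descending sort: on ties the lower index comes first.
    indices = np.argsort(-hardness, kind="stable")[:num_neg_batch]
    neg_conf_loss = conf_loss.reshape(-1)[indices]

    num_pos = np.where(num_pos != 0, num_pos, 1.0)
    total = pos_conf_loss.sum() + neg_conf_loss.sum() + (alpha * pos_loc_loss).sum()
    return total / num_pos.sum()


if __name__ == "__main__":
    # One image, 4 priors, 3 classes (background + 2). Prior 0 is a positive of class 1.
    y_true = np.zeros((1, 4, 8))
    y_true[0, 0, 4:7] = [0, 1, 0]
    y_true[0, 0, 7] = 1
    y_true[0, 1:, 4] = 1                       # priors 1-3 are background
    y_pred = np.zeros((1, 4, 7))
    y_pred[0, 0, 0] = 0.5                      # loc error of 0.5 -> smooth L1 = 0.125
    y_pred[0, :, 4:] = [[0.25, 0.5, 0.25],     # positive: p(true class) = 0.5
                        [0.5, 0.25, 0.25],     # hardness 0.5
                        [0.25, 0.5, 0.25],     # hardness 0.75 (hardest)
                        [1.0, 0.0, 0.0]]       # hardness 0 (easy)
    # neg_pos_ratio=2 -> 2 negatives (priors 2 and 1): 4*ln2 + 0.125
    print(multibox_loss_np(y_true, y_pred, neg_pos_ratio=2.0))
```

As in the TensorFlow code, positives are only zeroed out of the hardness scores, not excluded. When `num_neg_batch` exceeds the number of priors with non-zero hardness, a positive prior can be picked as a "negative" as well. In this toy case with the default `neg_pos_ratio=3`, the third pick is prior 0, and the loss becomes 5 ln 2 + 0.125.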

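Both schedules in `get_lr_scheduler` are plain Python, so their endpoints can be checked without TensorFlow.

- In the step branch, `decay_rate = (min_lr / lr) ** (1 / (step_num - 1))` is chosen so that the rate reaches `min_lr` after the last of the `step_num - 1` decays.
- The cos branch ramps up quadratically from `warmup_lr_ratio * lr` over 1 to 3 iterations. It then follows a half cosine down toward `min_lr`, and holds `min_lr` for the last `no_aug_iter` iterations.

The sketch below copies that arithmetic into two hypothetical factory functions for illustration. It does not import the repository code.

```python
import math


# Hypothetical re-statement of get_lr_scheduler's two branches, for illustration only.
def make_step_lr(lr, min_lr, total_iters, step_num=10):
    """Same arithmetic as the "step" branch of get_lr_scheduler."""
    decay_rate = (min_lr / lr) ** (1 / (step_num - 1))
    step_size = total_iters / step_num
    return lambda it: lr * decay_rate ** (it // step_size)


def make_cos_lr(lr, min_lr, total_iters, warmup_iters_ratio=0.05,
                warmup_lr_ratio=0.1, no_aug_iter_ratio=0.05):
    """Same arithmetic as the "cos" branch (yolox_warm_cos_lr) of get_lr_scheduler."""
    warmup = min(max(warmup_iters_ratio * total_iters, 1), 3)
    warmup_start = max(warmup_lr_ratio * lr, 1e-6)
    no_aug = min(max(no_aug_iter_ratio * total_iters, 1), 15)

    def f(it):
        if it <= warmup:
            # quadratic warm-up from warmup_start to lr
            return (lr - warmup_start) * (it / warmup) ** 2 + warmup_start
        if it >= total_iters - no_aug:
            # flat tail at min_lr
            return min_lr
        progress = (it - warmup) / (total_iters - warmup - no_aug)
        return min_lr + 0.5 * (lr - min_lr) * (1.0 + math.cos(math.pi * progress))
    return f


if __name__ == "__main__":
    step = make_step_lr(1e-2, 1e-4, total_iters=100)
    cos = make_cos_lr(1e-2, 1e-4, total_iters=100)
    for it in (0, 3, 49, 94, 95, 99):
        print(it, round(step(it), 6), round(cos(it), 6))
```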
--------------------------------------------------------------------------------
/predict.py:
--------------------------------------------------------------------------------
  1 | #-----------------------------------------------------------------------#
  2 | #   predict.py combines single-image prediction, camera detection, FPS testing and folder-wide detection
  3 | #   in one file; switch between them by changing mode.
  4 | #-----------------------------------------------------------------------#
  5 | import time
  6 | 
  7 | import cv2
  8 | import numpy as np
  9 | from PIL import Image
 10 | 
 11 | from ssd import SSD
 12 | 
 13 | if __name__ == "__main__":
 14 |     ssd = SSD()
 15 |     #----------------------------------------------------------------------------------------------------------#
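 16 |     #   mode selects the test mode:
 17 |     #   'predict'           single-image prediction. To modify the prediction process, e.g. to save the image or crop objects, read the detailed comments below first.
 18 |     #   'video'             video detection from a camera or a video file; see the comments below.
 19 |     #   'fps'               FPS test on img/street.jpg; see the comments below.
 20 |     #   'dir_predict'       detect every image in a folder and save the results. By default reads img/ and saves to img_out/; see the comments below.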
 21 |     #----------------------------------------------------------------------------------------------------------#
 22 |     mode = "predict"
 23 |     #-------------------------------------------------------------------------#
 24 |     #   crop                whether to crop out the detected targets after single-image prediction
 25 |     #   count               whether to count the detected targets
 26 |     #   crop and count only take effect when mode='predict'
 27 |     #-------------------------------------------------------------------------#
 28 |     crop            = False
 29 |     count           = False
 30 |     #----------------------------------------------------------------------------------------------------------#
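 31 |     #   video_path          path of the video; video_path=0 means detect from the camera.
 32 |     #                       To detect a video file, set e.g. video_path = "xxx.mp4" to read xxx.mp4 from the root directory.
 33 |     #   video_save_path     path to save the video to; video_save_path="" means do not save.
 34 |     #                       To save the video, set e.g. video_save_path = "yyy.mp4" to save it as yyy.mp4 in the root directory.
 35 |     #   video_fps           fps of the saved video
 36 |     #
 37 |     #   video_path, video_save_path and video_fps only take effect when mode='video'
 38 |     #   When saving a video, the save only completes after exiting (Esc in the display window, or Ctrl+C) or reaching the last frame.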
 39 |     #----------------------------------------------------------------------------------------------------------#
 40 |     video_path      = 0
 41 |     video_save_path = ""
 42 |     video_fps       = 25.0
 43 |     #----------------------------------------------------------------------------------------------------------#
 44 |     #   test_interval       number of detections run when measuring fps. In theory, the larger test_interval, the more accurate the fps.
 45 |     #   fps_image_path      image used for the fps test
 46 |     #
 47 |     #   test_interval and fps_image_path only take effect when mode='fps'
 48 |     #----------------------------------------------------------------------------------------------------------#
 49 |     test_interval   = 100
 50 |     fps_image_path  = "img/street.jpg"
 51 |     #-------------------------------------------------------------------------#
 52 |     #   dir_origin_path     folder containing the images to detect
 53 |     #   dir_save_path       folder where the detected images are saved
 54 |     #
 55 |     #   dir_origin_path and dir_save_path only take effect when mode='dir_predict'
 56 |     #-------------------------------------------------------------------------#
 57 |     dir_origin_path = "img/"
 58 |     dir_save_path   = "img_out/"
 59 | 
 60 |     if mode == "predict":
 61 |         '''
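 62 |         1. To save the detected image, call r_image.save("img.jpg"); just edit predict.py directly.
 63 |         2. To get the coordinates of the predicted boxes, go into ssd.detect_image and read the four values top, left, bottom, right in the drawing section.
 64 |         3. To crop targets out with the predicted boxes, go into ssd.detect_image and, in the drawing section, use the top, left, bottom, right values
 65 |         to slice the original image as a matrix.
 66 |         4. To write extra text on the output image, such as the count of a specific detected class, go into ssd.detect_image and test predicted_class in the drawing section,
 67 |         e.g. if predicted_class == 'car': tells whether the current target is a car; then record the count. Use draw.text to write the text.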
 68 |         '''
 69 |         while True:
 70 |             img = input('Input image filename:')
 71 |             try:
 72 |                 image = Image.open(img)
 73 |             except:
 74 |                 print('Open Error! Try again!')
 75 |                 continue
 76 |             else:
 77 |                 r_image = ssd.detect_image(image, crop = crop, count=count)
 78 |                 r_image.show()
 79 | 
 80 |     elif mode == "video":
 81 |         capture = cv2.VideoCapture(video_path)
 82 |         if video_save_path!="":
 83 |             fourcc  = cv2.VideoWriter_fourcc(*'XVID')
 84 |             size    = (int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)), int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)))
 85 |             out     = cv2.VideoWriter(video_save_path, fourcc, video_fps, size)
 86 | 
 87 |         ref, frame = capture.read()
 88 |         if not ref:
 89 |             raise ValueError("Failed to read from the camera (video). Check that the camera is installed correctly (or that the video path is correct).")
 90 | 
 91 |         fps = 0.0
 92 |         while(True):
 93 |             t1 = time.time()
 94 |             # Read a frame
 95 |             ref, frame = capture.read()
 96 |             if not ref:
 97 |                 break
 98 |             # Convert BGR to RGB
 99 |             frame = cv2.cvtColor(frame,cv2.COLOR_BGR2RGB)
100 |             # Convert to a PIL Image
101 |             frame = Image.fromarray(np.uint8(frame))
102 |             # Run detection
103 |             frame = np.array(ssd.detect_image(frame))
104 |             # Convert RGB back to BGR for OpenCV display
105 |             frame = cv2.cvtColor(frame,cv2.COLOR_RGB2BGR)
106 |             
107 |             fps  = ( fps + (1./(time.time()-t1)) ) / 2
108 |             print("fps= %.2f"%(fps))
109 |             frame = cv2.putText(frame, "fps= %.2f"%(fps), (0, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
110 |             
111 |             cv2.imshow("video",frame)
112 |             c= cv2.waitKey(1) & 0xff 
113 |             if video_save_path!="":
114 |                 out.write(frame)
115 | 
116 |             if c==27:
117 |                 capture.release()
118 |                 break
119 | 
120 |         print("Video Detection Done!")
121 |         capture.release()
122 |         if video_save_path!="":
123 |             print("Save processed video to the path :" + video_save_path)
124 |             out.release()
125 |         cv2.destroyAllWindows()
126 |         
127 |     elif mode == "fps":
128 |         img = Image.open(fps_image_path)
129 |         tact_time = ssd.get_FPS(img, test_interval)
130 |         print(str(tact_time) + ' seconds, ' + str(1/tact_time) + ' FPS, @batch_size 1')
131 | 
132 |     elif mode == "dir_predict":
133 |         import os
134 | 
135 |         from tqdm import tqdm
136 | 
137 |         img_names = os.listdir(dir_origin_path)
138 |         for img_name in tqdm(img_names):
139 |             if img_name.lower().endswith(('.bmp', '.dib', '.png', '.jpg', '.jpeg', '.pbm', '.pgm', '.ppm', '.tif', '.tiff')):
140 |                 image_path  = os.path.join(dir_origin_path, img_name)
141 |                 image       = Image.open(image_path)
142 |                 r_image     = ssd.detect_image(image)
143 |                 if not os.path.exists(dir_save_path):
144 |                     os.makedirs(dir_save_path)
145 |                 r_image.save(os.path.join(dir_save_path, img_name.replace(".jpg", ".png")), quality=95, subsampling=0)
146 | 
147 |     else:
148 |         raise AssertionError("Please specify the correct mode: 'predict', 'video', 'fps' or 'dir_predict'.")
149 | 

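The FPS printed in `video` mode is not a plain per-frame rate. `fps = (fps + 1/dt) / 2` is an exponential moving average with weight 1/2, where `dt` covers reading, color conversion and detection of one frame.

Because the average starts from `fps = 0.0`, after n frames at a constant rate r the estimate is r(1 - 2^-n), so the first few readings are low. The sketch below reproduces the update; `update_fps` is a hypothetical name.

```python
# Hypothetical helper reproducing the FPS update in predict.py's video loop.
def update_fps(fps, frame_seconds):
    """Average the old estimate with the instantaneous FPS of the latest frame."""
    return (fps + 1.0 / frame_seconds) / 2


if __name__ == "__main__":
    fps = 0.0                            # predict.py also starts from 0.0
    for n in range(1, 11):
        fps = update_fps(fps, 0.1)       # every frame takes 100 ms -> 10 FPS instantaneous
        print(n, round(fps, 4))          # approaches 10 as 10 * (1 - 2 ** -n)
```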

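In `dir_predict` mode the output name comes from `img_name.replace(".jpg", ".png")`. That only matches the lower-case `.jpg` extension, so a file such as `b.JPG` or `c.jpeg` keeps its original extension.

If every result should be saved as PNG, one option is to swap the extension with `os.path.splitext`, as in this sketch. `plan_outputs` is a hypothetical helper, and its folder defaults copy `dir_origin_path` and `dir_save_path`.

```python
import os

# Same extension filter as the dir_predict loop in predict.py
IMAGE_EXTS = ('.bmp', '.dib', '.png', '.jpg', '.jpeg', '.pbm', '.pgm', '.ppm', '.tif', '.tiff')


# Hypothetical helper for illustration; not part of the repository.
def plan_outputs(img_names, dir_origin_path="img/", dir_save_path="img_out/"):
    """Return (input path, output path) pairs, always saving results as .png."""
    pairs = []
    for img_name in sorted(img_names):
        if not img_name.lower().endswith(IMAGE_EXTS):
            continue                                  # skip non-image files
        stem, _ext = os.path.splitext(img_name)       # "b.JPG" -> ("b", ".JPG")
        pairs.append((os.path.join(dir_origin_path, img_name),
                      os.path.join(dir_save_path, stem + ".png")))
    return pairs


if __name__ == "__main__":
    for src, dst in plan_outputs(["street.jpg", "b.JPG", "notes.txt"]):
        print(src, "->", dst)
```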
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
 1 | scipy==1.2.1
 2 | numpy==1.17.0
 3 | Keras==2.1.5
 4 | matplotlib==3.1.2
 5 | opencv_python==4.1.2.30
 6 | tensorflow_gpu==1.13.2
 7 | tqdm==4.60.0
 8 | Pillow==8.2.0
 9 | h5py==2.10.0
10 | 


--------------------------------------------------------------------------------
/ssd.py:
--------------------------------------------------------------------------------
  1 | import colorsys
  2 | import os
  3 | import time
  4 | 
  5 | import numpy as np
  6 | from keras.applications.imagenet_utils import preprocess_input
  7 | from PIL import ImageDraw, ImageFont
  8 | 
  9 | from nets.ssd import SSD300
 10 | from utils.anchors import get_anchors
 11 | from utils.utils import cvtColor, get_classes, resize_image, show_config
 12 | from utils.utils_bbox import BBoxUtility
 13 | 
 14 | '''
 15 | Must read if you are training on your own dataset!
 16 | '''
 17 | class SSD(object):
 18 |     _defaults = {
 19 |         #--------------------------------------------------------------------------#
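 20 |         #   When predicting with your own trained model, you must change model_path and classes_path!
 21 |         #   model_path points to a weights file in the logs folder; classes_path points to a txt file in model_data.
 22 |         #
 23 |         #   After training there are several weights files in logs; pick one with a low validation loss.
 24 |         #   A low validation loss does not imply a high mAP; it only means the weights generalize well on the validation set.
 25 |         #   If you get a shape mismatch, also check the model_path and classes_path settings used during training.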
 26 |         #--------------------------------------------------------------------------#
 27 |         "model_path"        : 'model_data/essay_mobilenet_ssd_weights.h5',
 28 |         "classes_path"      : 'model_data/voc_classes.txt',
 29 |         #---------------------------------------------------------------------#
 30 |         #   Input image size for prediction; use the same size as in training
 31 |         #---------------------------------------------------------------------#
 32 |         "input_shape"       : [300, 300],
 33 |         #---------------------------------------------------------------------#
 34 |         #   Only predicted boxes whose score exceeds this confidence are kept
 35 |         #---------------------------------------------------------------------#
 36 |         "confidence"        : 0.5,
 37 |         #---------------------------------------------------------------------#
 38 |         #   IoU threshold (nms_iou) used by non-maximum suppression
 39 |         #---------------------------------------------------------------------#
 40 |         "nms_iou"           : 0.45,
 41 |         #---------------------------------------------------------------------#
 42 |         #   Sizes of the prior (anchor) boxes
 43 |         #---------------------------------------------------------------------#
 44 |         'anchors_size'      : [30, 60, 111, 162, 213, 264, 315],
 45 |         #---------------------------------------------------------------------#
 46 |         #   Controls whether letterbox_image is used to resize the input image without distortion;
 47 |         #   repeated tests showed that turning letterbox_image off and resizing directly works better
 48 |         #---------------------------------------------------------------------#
 49 |         "letterbox_image"   : False,
 50 |     }
 51 | 
 52 |     @classmethod
 53 |     def get_defaults(cls, n):
 54 |         if n in cls._defaults:
 55 |             return cls._defaults[n]
 56 |         else:
 57 |             return "Unrecognized attribute name '" + n + "'"
 58 | 
 59 |     #---------------------------------------------------#
 60 |     #   Initialize the SSD
 61 |     #---------------------------------------------------#
 62 |     def __init__(self, **kwargs):
 63 |         self.__dict__.update(self._defaults)
 64 |         for name, value in kwargs.items():
 65 |             setattr(self, name, value)
 66 |         #---------------------------------------------------#
 67 |         #   Count the classes (the +1 below adds the background class)
 68 |         #---------------------------------------------------#
 69 |         self.class_names, self.num_classes  = get_classes(self.classes_path)
 70 |         self.anchors                        = get_anchors(self.input_shape, self.anchors_size)
 71 |         self.num_classes                    = self.num_classes + 1
 72 |         
 73 |         #---------------------------------------------------#
 74 |         #   Assign a different box color to each class
 75 |         #---------------------------------------------------#
 76 |         hsv_tuples = [(x / self.num_classes, 1., 1.) for x in range(self.num_classes)]
 77 |         self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
 78 |         self.colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), self.colors))
 79 | 
 80 |         self.bbox_util = BBoxUtility(self.num_classes, nms_thresh=self.nms_iou)
 81 |         self.generate()
 82 |         
 83 |         show_config(**self._defaults)
 84 | 
 85 |     #---------------------------------------------------#
 86 |     #   Load the model
 87 |     #---------------------------------------------------#
 88 |     def generate(self):
 89 |         model_path = os.path.expanduser(self.model_path)
 90 |         assert model_path.endswith('.h5'), 'Keras model or weights must be a .h5 file.'
 91 |         
 92 |         #-------------------------------#
 93 |         #   Build the model and load the weights
 94 |         #-------------------------------#
 95 |         self.ssd = SSD300([self.input_shape[0], self.input_shape[1], 3], self.num_classes)
 96 |         self.ssd.load_weights(self.model_path, by_name=True)
 97 |         print('{} model, anchors, and classes loaded.'.format(model_path))
 98 | 
 99 |     #---------------------------------------------------#
100 |     #   Detect objects in an image
101 |     #---------------------------------------------------#
102 |     def detect_image(self, image, crop = False, count = False):
103 |         # Height and width of the input image
104 |         image_shape = np.array(np.shape(image)[0:2])
105 |         #---------------------------------------------------------#
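106 |         #   Convert the image to RGB here so that grayscale images do not raise errors during prediction.
107 |         #   The code only supports predicting on RGB images; all other image types are converted to RGB.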
108 |         #---------------------------------------------------------#
109 |         image       = cvtColor(image)
110 |         #---------------------------------------------------------#
111 |         #   Add gray bars to the image for a distortion-free resize;
112 |         #   a plain resize can also be used for detection
113 |         #---------------------------------------------------------#
114 |         image_data = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image)
115 |         #---------------------------------------------------------#
116 |         #   Add the batch_size dimension, then preprocess and normalize the image.
117 |         #---------------------------------------------------------#
118 |         image_data = preprocess_input(np.expand_dims(np.array(image_data, dtype='float32'), 0))
119 | 
120 |         preds      = self.ssd.predict(image_data)
121 |         #-----------------------------------------------------------#
122 |         #   Decode the predictions
123 |         #-----------------------------------------------------------#
124 |         results     = self.bbox_util.decode_box(preds, self.anchors, image_shape, 
125 |                                                 self.input_shape, self.letterbox_image, confidence=self.confidence)
126 |         #--------------------------------------#
127 |         #   If no objects are detected, return the original image
128 |         #--------------------------------------#
129 |         if len(results[0])<=0:
130 |             return image
131 | 
132 |         top_label   = np.array(results[0][:, 4], dtype = 'int32')
133 |         top_conf    = results[0][:, 5]
134 |         top_boxes   = results[0][:, :4]
135 |         #---------------------------------------------------------#
136 |         #   Set the font and the box border thickness
137 |         #---------------------------------------------------------#
138 |         font = ImageFont.truetype(font='model_data/simhei.ttf', size=np.floor(3e-2 * np.shape(image)[1] + 0.5).astype('int32'))
139 |         thickness = max((np.shape(image)[0] + np.shape(image)[1]) // self.input_shape[0], 1)
140 |         #---------------------------------------------------------#
141 |         #   Count the detected targets
142 |         #---------------------------------------------------------#
143 |         if count:
144 |             print("top_label:", top_label)
145 |             classes_nums    = np.zeros([self.num_classes])
146 |             for i in range(self.num_classes):
147 |                 num = np.sum(top_label == i)
148 |                 if num > 0:
149 |                     print(self.class_names[i], " : ", num)
150 |                 classes_nums[i] = num
151 |             print("classes_nums:", classes_nums)
152 |         #---------------------------------------------------------#
153 |         #   Optionally crop out the detected targets
154 |         #---------------------------------------------------------#
155 |         if crop:
156 |             for i, c in list(enumerate(top_boxes)):
157 |                 top, left, bottom, right = top_boxes[i]
158 |                 top     = max(0, np.floor(top).astype('int32'))
159 |                 left    = max(0, np.floor(left).astype('int32'))
160 |                 bottom  = min(image.size[1], np.floor(bottom).astype('int32'))
161 |                 right   = min(image.size[0], np.floor(right).astype('int32'))
162 |                 
163 |                 dir_save_path = "img_crop"
164 |                 if not os.path.exists(dir_save_path):
165 |                     os.makedirs(dir_save_path)
166 |                 crop_image = image.crop([left, top, right, bottom])
167 |                 crop_image.save(os.path.join(dir_save_path, "crop_" + str(i) + ".png"), quality=95, subsampling=0)
168 |                 print("save crop_" + str(i) + ".png to " + dir_save_path)
169 |         #---------------------------------------------------------#
170 |         #   Draw the results
171 |         #---------------------------------------------------------#
172 |         for i, c in list(enumerate(top_label)):
173 |             predicted_class = self.class_names[int(c)]
174 |             box             = top_boxes[i]
175 |             score           = top_conf[i]
176 | 
177 |             top, left, bottom, right = box
178 | 
179 |             top     = max(0, np.floor(top).astype('int32'))
180 |             left    = max(0, np.floor(left).astype('int32'))
181 |             bottom  = min(image.size[1], np.floor(bottom).astype('int32'))
182 |             right   = min(image.size[0], np.floor(right).astype('int32'))
183 | 
184 |             label = '{} {:.2f}'.format(predicted_class, score)
185 |             draw = ImageDraw.Draw(image)
186 |             label_size = draw.textsize(label, font)
187 |             label = label.encode('utf-8')
188 |             print(label, top, left, bottom, right)
189 |             
190 |             if top - label_size[1] >= 0:
191 |                 text_origin = np.array([left, top - label_size[1]])
192 |             else:
193 |                 text_origin = np.array([left, top + 1])
194 | 
195 |             for j in range(thickness):
196 |                 draw.rectangle([left + j, top + j, right - j, bottom - j], outline=self.colors[c])
197 |             draw.rectangle([tuple(text_origin), tuple(text_origin + label_size)], fill=self.colors[c])
198 |             draw.text(text_origin, str(label,'UTF-8'), fill=(0, 0, 0), font=font)
199 |             del draw
200 | 
201 |         return image
202 | 
203 |     def get_FPS(self, image, test_interval):
204 |         image_shape = np.array(np.shape(image)[0:2])
205 |         #---------------------------------------------------------#
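206 |         #   Convert the image to RGB here so that grayscale images do not raise errors during prediction.
207 |         #   The code only supports predicting on RGB images; all other image types are converted to RGB.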
208 |         #---------------------------------------------------------#
209 |         image       = cvtColor(image)
210 |         #---------------------------------------------------------#
211 |         #   Add gray bars to the image for a distortion-free resize;
212 |         #   a plain resize can also be used for detection
213 |         #---------------------------------------------------------#
214 |         image_data = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image)
215 |         #---------------------------------------------------------#
216 |         #   Add the batch_size dimension, then preprocess and normalize the image.
217 |         #---------------------------------------------------------#
218 |         image_data = preprocess_input(np.expand_dims(np.array(image_data, dtype='float32'), 0))
219 | 
220 |         preds      = self.ssd.predict(image_data)
221 |         #-----------------------------------------------------------#
222 |         #   Decode the predictions
223 |         #-----------------------------------------------------------#
224 |         results     = self.bbox_util.decode_box(preds, self.anchors, image_shape, 
225 |                                                 self.input_shape, self.letterbox_image, confidence=self.confidence)
226 |         t1 = time.time()
227 |         for _ in range(test_interval):
228 |             preds      = self.ssd.predict(image_data)
229 |             #-----------------------------------------------------------#
230 |             #   Decode the predictions
231 |             #-----------------------------------------------------------#
232 |             results     = self.bbox_util.decode_box(preds, self.anchors, image_shape, 
233 |                                                     self.input_shape, self.letterbox_image, confidence=self.confidence)
234 |         t2 = time.time()
235 |         tact_time = (t2 - t1) / test_interval
236 |         return tact_time
237 | 
238 |     def get_map_txt(self, image_id, image, class_names, map_out_path):
239 |         f = open(os.path.join(map_out_path, "detection-results/"+image_id+".txt"),"w") 
240 |         image_shape = np.array(np.shape(image)[0:2])
241 |         #---------------------------------------------------------#
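242 |         #   Convert the image to RGB here so that grayscale images do not raise errors during prediction.
243 |         #   The code only supports predicting on RGB images; all other image types are converted to RGB.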
244 |         #---------------------------------------------------------#
245 |         image       = cvtColor(image)
246 |         #---------------------------------------------------------#
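247 |         #   Pad the image with gray bars so it is resized without distortion (letterbox).
248 |         #   Alternatively, the image can be resized directly for detection.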
249 |         #---------------------------------------------------------#
250 |         image_data = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image)
251 |         #---------------------------------------------------------#
252 |         #   Add the batch_size dimension, then preprocess and normalize the image.
253 |         #---------------------------------------------------------#
254 |         image_data = preprocess_input(np.expand_dims(np.array(image_data, dtype='float32'), 0))
255 | 
256 |         preds      = self.ssd.predict(image_data)
257 |         #-----------------------------------------------------------#
258 |         #   Decode the predictions
259 |         #-----------------------------------------------------------#
260 |         results     = self.bbox_util.decode_box(preds, self.anchors, image_shape, 
261 |                                                 self.input_shape, self.letterbox_image, confidence=self.confidence)
262 |         #--------------------------------------#
263 |         #   If nothing was detected, close the (empty) result file and return
264 |         if len(results[0]) <= 0:
265 |             f.close()
266 |             return 
267 | 
268 |         top_label   = results[0][:, 4]
269 |         top_conf    = results[0][:, 5]
270 |         top_boxes   = results[0][:, :4]
271 | 
272 |         for i, c in list(enumerate(top_label)):
273 |             predicted_class = self.class_names[int(c)]
274 |             box             = top_boxes[i]
275 |             score           = str(top_conf[i])
276 |             
277 |             top, left, bottom, right = box
278 | 
279 |             if predicted_class not in class_names:
280 |                 continue
281 | 
282 |             f.write("%s %s %s %s %s %s\n" % (predicted_class, score[:6], str(int(left)), str(int(top)), str(int(right)),str(int(bottom))))
283 | 
284 |         f.close()
285 |         return 
286 | 
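`get_map_txt` above writes one line per kept detection to `detection-results/<image_id>.txt`. Each line has the order `class score left top right bottom`. The function reorders the `top, left, bottom, right` box returned by `decode_box` and truncates the coordinates to integers. Below is a minimal standalone sketch of that formatting. The helper names `format_detection_line` and `write_detections` are hypothetical, and the box is assumed to be a `(top, left, bottom, right)` tuple in pixels. The `with` block in the sketch closes the result file on every path, including when there are no detections.

```python
def format_detection_line(class_name, score, box):
    """Format one detection the way get_map_txt does.

    box is assumed to be (top, left, bottom, right) in pixels.
    The score is truncated (not rounded) to its first 6 characters,
    mirroring str(score)[:6] in the original code.
    """
    top, left, bottom, right = box
    return "%s %s %s %s %s %s\n" % (
        class_name,
        str(score)[:6],
        str(int(left)),
        str(int(top)),
        str(int(right)),
        str(int(bottom)),
    )


def write_detections(path, detections, class_names):
    """Write (class_name, score, box) tuples to path.

    Classes not listed in class_names are skipped. The file is created
    (possibly empty) and always closed, even with no detections.
    """
    with open(path, "w") as f:
        for class_name, score, box in detections:
            if class_name not in class_names:
                continue
            f.write(format_detection_line(class_name, score, box))
```

For example, `format_detection_line("dog", 0.9123456, (10.7, 20.2, 110.9, 220.5))` returns `"dog 0.9123 20 10 220 110\n"`. The score is cut to its first six characters rather than rounded.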


--------------------------------------------------------------------------------
/summary.py:
--------------------------------------------------------------------------------
 1 | #--------------------------------------------#
 2 | #   This script is used to inspect the network structure
 3 | #--------------------------------------------#
 4 | from nets.ssd import SSD300
 5 | from utils.utils import net_flops
 6 | 
 7 | if __name__ == "__main__":
 8 |     input_shape = [300, 300]
 9 |     num_classes = 21
10 | 
11 |     model = SSD300([input_shape[0], input_shape[1], 3], num_classes)
12 |     #--------------------------------------------#
13 |     #   查看网络结构网络结构
14 |     #--------------------------------------------#
15 |     model.summary()
16 |     #--------------------------------------------#
17 |     #   计算网络的FLOPS
18 |     #--------------------------------------------#
19 |     net_flops(model, table=False)
20 |     
21 |     #--------------------------------------------#
22 |     #   获得网络每个层的名称与序号
23 |     #--------------------------------------------#
24 |     # for i,layer in enumerate(model.layers):
25 |     #     print(i,layer.name)
26 | 


--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
  1 | import datetime
  2 | import os
  3 | 
  4 | import keras.backend as K
  5 | import tensorflow as tf
  6 | from keras.callbacks import (EarlyStopping, LearningRateScheduler,
  7 |                              ModelCheckpoint, TensorBoard)
  8 | from keras.layers import Conv2D, Dense, DepthwiseConv2D
  9 | from keras.optimizers import SGD, Adam
 10 | from keras.regularizers import l2
 11 | from keras.utils.multi_gpu_utils import multi_gpu_model
 12 | 
 13 | from nets.ssd import SSD300
 14 | from nets.ssd_training import MultiboxLoss, get_lr_scheduler
 15 | from utils.anchors import get_anchors
 16 | from utils.callbacks import (ExponentDecayScheduler, LossHistory,
 17 |                              ParallelModelCheckpoint, EvalCallback)
 18 | from utils.dataloader import SSDDatasets
 19 | from utils.utils import get_classes, show_config
 20 | 
 21 | tf.logging.set_verbosity(tf.logging.ERROR)
 22 | 
 23 | '''
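 24 | Points to note when training your own object detection model:
 25 | 1. Before training, carefully check that your data meets the format requirements. This repo requires a VOC-format dataset; you need to prepare input images and labels.
 26 |    Input images are .jpg files of any size; they are resized automatically before training.
 27 |    Grayscale images are automatically converted to RGB for training, so no manual change is needed.
 28 |    If your input images do not have a .jpg extension, batch-convert them to .jpg before you start training.
 29 | 
 30 |    Labels are .xml files containing information about the objects to detect; each label file corresponds to an input image file.
 31 | 
 32 | 2. The loss value is used to judge convergence. What matters is the trend: the validation loss should keep decreasing. Once the validation loss basically stops changing, the model has essentially converged.
 33 |    The absolute value of the loss has no particular meaning; whether it is large or small depends only on how it is computed, and it does not need to approach 0. If you want a nicer-looking loss, divide it by 10000 inside the corresponding loss function.
 34 |    The loss values recorded during training are saved in the loss_%Y_%m_%d_%H_%M_%S folder under the logs folder.
 35 |    
 36 | 3. Trained weight files are saved in the logs folder. Each training epoch (Epoch) consists of several training steps (Step), and each step performs one gradient-descent update.
 37 |    Weights are not saved if only a few steps have been trained; make sure the difference between Epoch and Step is clear.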
 38 | '''
 39 | if __name__ == "__main__":
 40 |     #---------------------------------------------------------------------#
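 41 |     #   train_gpu   GPUs used for training
 42 |     #               Defaults to the first card; use [0, 1] for two cards and [0, 1, 2] for three.
 43 |     #               With multiple GPUs, the batch on each card is the total batch size divided by the number of cards.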
 44 |     #---------------------------------------------------------------------#
 45 |     train_gpu       = [0,]
 46 |     #---------------------------------------------------------------------#
 47 |     #   classes_path    Points to a txt file under model_data that lists the classes of your dataset
 48 |     #                   You must change classes_path to match your own dataset before training
 49 |     #---------------------------------------------------------------------#
 50 |     classes_path    = 'model_data/voc_classes.txt'
 51 |     #----------------------------------------------------------------------------------------------------------------------------#
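 52 |     #   See the README for how to download the weight files (available from a cloud drive). The model's pretrained weights are usable across datasets because the features are generic.
 53 |     #   The most important part of the pretrained weights is the backbone (feature extraction network), which performs feature extraction.
 54 |     #   Pretrained weights are needed in 99% of cases; without them the backbone weights are too random, feature extraction is poor, and the training results will be poor too.
 55 |     #
 56 |     #   If training was interrupted, you can set model_path to a weight file in the logs folder to reload the partially trained weights.
 57 |     #   At the same time, adjust the freeze-phase or unfreeze-phase parameters below so that the epoch count stays continuous.
 58 |     #   
 59 |     #   When model_path = '', no weights are loaded for the model.
 60 |     #
 61 |     #   The weights used here are for the whole model, so they are loaded in train.py.
 62 |     #   To train starting from pretrained backbone weights, set model_path to the backbone weights; only the backbone is loaded in that case.
 63 |     #   To train from scratch, set model_path = '' and Freeze_Train = False; training then starts from scratch with no backbone-freezing phase.
 64 |     #   Generally, training from scratch gives very poor results, because the weights are too random and feature extraction is ineffective.
 65 |     #
 66 |     #   Networks are generally not trained from scratch; at least the backbone weights are used. Some papers say pretraining is unnecessary, mainly because their datasets are large and their tuning skills are excellent.
 67 |     #   If you really must train the backbone yourself, look into the ImageNet dataset: first train a classification model, whose backbone is shared with this model, and train from that.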
 68 |     #----------------------------------------------------------------------------------------------------------------------------#
 69 |     model_path      = 'model_data/essay_mobilenet_ssd_weights.h5'
 70 |     #------------------------------------------------------#
 71 |     #   input_shape     Input image size [height, width]
 72 |     #------------------------------------------------------#
 73 |     input_shape     = [300, 300]
 74 |     #------------------------------------------------------#
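 75 |     #   Sets the sizes of the prior (anchor) boxes. The default anchors_size
 76 |     #   was chosen for the VOC dataset and works in most cases!
 77 |     #   To detect small objects, you can modify anchors_size;
 78 |     #   usually it is enough to shrink the shallow-layer anchors, since shallow layers detect small objects!
 79 |     #   For example anchors_size = [21, 45, 99, 153, 207, 261, 315]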
 80 |     #------------------------------------------------------#
 81 |     anchors_size    = [30, 60, 111, 162, 213, 264, 315]
 82 | 
 83 |     #----------------------------------------------------------------------------------------------------------------------------#
 84 |     #   Training has two phases: a frozen phase and an unfrozen phase. The frozen phase exists for users whose machines are not powerful enough.
 85 |     #   Frozen training needs less GPU memory. With a very weak GPU, you can set Freeze_Epoch equal to UnFreeze_Epoch, in which case only frozen training is performed.
 86 |     #
 87 |     #   Some suggested parameter settings are given here; adjust them to your own needs:
 88 |     #   (1) Training from pretrained weights of the whole model:
 89 |     #       Adam:
 90 |     #           Init_Epoch = 0, Freeze_Epoch = 50, UnFreeze_Epoch = 100, Freeze_Train = True, optimizer_type = 'adam', Init_lr = 6e-4, weight_decay = 0. (frozen)
 91 |     #           Init_Epoch = 0, UnFreeze_Epoch = 100, Freeze_Train = False, optimizer_type = 'adam', Init_lr = 6e-4, weight_decay = 0. (not frozen)
 92 |     #       SGD:
 93 |     #           Init_Epoch = 0, Freeze_Epoch = 50, UnFreeze_Epoch = 200, Freeze_Train = True, optimizer_type = 'sgd', Init_lr = 2e-3, weight_decay = 5e-4. (frozen)
 94 |     #           Init_Epoch = 0, UnFreeze_Epoch = 200, Freeze_Train = False, optimizer_type = 'sgd', Init_lr = 2e-3, weight_decay = 5e-4. (not frozen)
 95 |     #       UnFreeze_Epoch can be adjusted between 100 and 300.
 96 |     #   (2) Training from pretrained backbone weights:
 97 |     #       Adam:
 98 |     #           Init_Epoch = 0, Freeze_Epoch = 50, UnFreeze_Epoch = 100, Freeze_Train = True, optimizer_type = 'adam', Init_lr = 6e-4, weight_decay = 0. (frozen)
 99 |     #           Init_Epoch = 0, UnFreeze_Epoch = 100, Freeze_Train = False, optimizer_type = 'adam', Init_lr = 6e-4, weight_decay = 0. (not frozen)
100 |     #       SGD:
101 |     #           Init_Epoch = 0, Freeze_Epoch = 50, UnFreeze_Epoch = 200, Freeze_Train = True, optimizer_type = 'sgd', Init_lr = 2e-3, weight_decay = 5e-4. (frozen)
102 |     #           Init_Epoch = 0, UnFreeze_Epoch = 200, Freeze_Train = False, optimizer_type = 'sgd', Init_lr = 2e-3, weight_decay = 5e-4. (not frozen)
103 |     #       Note: because training starts from pretrained backbone weights, which are not necessarily suited to object detection, more training is needed to escape local optima.
104 |     #             UnFreeze_Epoch can be adjusted between 200 and 300; YOLOv5 and YOLOX both recommend 300.
105 |     #             Adam converges somewhat faster than SGD, so UnFreeze_Epoch can in theory be smaller, but more epochs are still recommended.
106 |     #   (3) Setting batch_size:
107 |     #       Larger is better, within what your GPU can handle. Running out of GPU memory is unrelated to dataset size; if you get an out-of-memory error (OOM or CUDA out of memory), reduce batch_size.
108 |     #       Because of the BatchNorm layers, batch_size must be at least 2; it cannot be 1.
109 |     #       Normally Freeze_batch_size should be 1-2 times Unfreeze_batch_size. Avoid a large gap between them, since it affects the automatic learning-rate adjustment.
110 |     #----------------------------------------------------------------------------------------------------------------------------#
111 |     #------------------------------------------------------------------#
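112 |     #   Freeze-phase training parameters
113 |     #   The backbone is frozen in this phase, so the feature extraction network does not change
114 |     #   Uses less GPU memory; only the rest of the network is fine-tuned
115 |     #   Init_Epoch          Epoch at which training starts. It may be larger than Freeze_Epoch; for example,
116 |     #                       Init_Epoch = 60, Freeze_Epoch = 50, UnFreeze_Epoch = 100
117 |     #                       skips the freeze phase, starts directly at epoch 60, and adjusts the learning rate accordingly.
118 |     #                       (Used when resuming training)
119 |     #   Freeze_Epoch        Epoch at which frozen training ends
120 |     #                       (ignored when Freeze_Train=False)
121 |     #   Freeze_batch_size   batch_size used for frozen training
122 |     #                       (ignored when Freeze_Train=False)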
123 |     #------------------------------------------------------------------#
124 |     Init_Epoch          = 0
125 |     Freeze_Epoch        = 50
126 |     Freeze_batch_size   = 32
127 |     #------------------------------------------------------------------#
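128 |     #   Unfreeze-phase training parameters
129 |     #   The backbone is no longer frozen, so the feature extraction network changes
130 |     #   Uses more GPU memory; all network parameters are updated
131 |     #   UnFreeze_Epoch          Total number of epochs the model is trained for
132 |     #                           SGD takes longer to converge, so use a larger UnFreeze_Epoch
133 |     #                           Adam can use a relatively smaller UnFreeze_Epoch
134 |     #   Unfreeze_batch_size     batch_size after the backbone is unfrozen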
135 |     #------------------------------------------------------------------#
136 |     UnFreeze_Epoch      = 200
137 |     Unfreeze_batch_size = 16
138 |     #------------------------------------------------------------------#
139 |     #   Freeze_Train    Whether to use frozen training
140 |     #                   By default the backbone is trained frozen first, then unfrozen.
141 |     #------------------------------------------------------------------#
142 |     Freeze_Train        = True
143 | 
144 |     #------------------------------------------------------------------#
145 |     #   Other training parameters: learning rate, optimizer and learning-rate decay
146 |     #------------------------------------------------------------------#
147 |     #------------------------------------------------------------------#
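148 |     #   Init_lr         Maximum learning rate of the model
149 |     #                   Recommended Init_lr=6e-4 when using the Adam optimizer
150 |     #                   Recommended Init_lr=2e-3 when using the SGD optimizer
151 |     #   Min_lr          Minimum learning rate of the model, 0.01 times the maximum by default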
152 |     #------------------------------------------------------------------#
153 |     Init_lr             = 2e-3
154 |     Min_lr              = Init_lr * 0.01
155 |     #------------------------------------------------------------------#
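156 |     #   optimizer_type  Optimizer to use; options are adam and sgd
157 |     #                   Recommended Init_lr=6e-4 when using the Adam optimizer
158 |     #                   Recommended Init_lr=2e-3 when using the SGD optimizer
159 |     #   momentum        Momentum parameter of the optimizer (passed as beta_1 for Adam)
160 |     #   weight_decay    Weight decay, helps prevent overfitting
161 |     #                   Weight decay is added here as an L2 loss term, which does not work correctly with adam; set it to 0 when using adam.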
162 |     #------------------------------------------------------------------#
163 |     optimizer_type      = "sgd"
164 |     momentum            = 0.937
165 |     weight_decay        = 5e-4
166 |     #------------------------------------------------------------------#
167 |     #   lr_decay_type   Learning-rate decay schedule; options are 'step' and 'cos'
168 |     #------------------------------------------------------------------#
169 |     lr_decay_type       = 'cos'
170 |     #------------------------------------------------------------------#
171 |     #   save_period     Save the weights every save_period epochs
172 |     #------------------------------------------------------------------#
173 |     save_period         = 10
174 |     #------------------------------------------------------------------#
175 |     #   save_dir        Folder where weights and log files are saved
176 |     #------------------------------------------------------------------#
177 |     save_dir            = 'logs'
178 |     #------------------------------------------------------------------#
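179 |     #   eval_flag       Whether to evaluate during training; evaluation uses the validation set
180 |     #                   Evaluation works better after installing the pycocotools library.
181 |     #   eval_period     Evaluate every eval_period epochs; frequent evaluation is not recommended
182 |     #                   because evaluation takes a lot of time and makes training very slow
183 |     #   The mAP obtained here differs from the one obtained by get_map.py for two reasons:
184 |     #   (1) The mAP here is computed on the validation set.
185 |     #   (2) The evaluation parameters here are conservative, to speed up evaluation.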
186 |     #------------------------------------------------------------------#
187 |     eval_flag           = True
188 |     eval_period         = 10
189 |     #------------------------------------------------------------------#
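190 |     #   num_workers     Number of worker processes for data loading; 1 disables multiprocessing
191 |     #                   Enabling it speeds up data loading but uses more memory
192 |     #                   In keras, enabling it is sometimes much slower instead
193 |     #                   Only enable it when IO is the bottleneck, i.e. when the GPU computes much faster than images can be read.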
194 |     #------------------------------------------------------------------#
195 |     num_workers         = 1
196 | 
197 |     #------------------------------------------------------#
198 |     #   train_annotation_path   Paths and labels of the training images
199 |     #   val_annotation_path     Paths and labels of the validation images
200 |     #------------------------------------------------------#
201 |     train_annotation_path   = '2007_train.txt'
202 |     val_annotation_path     = '2007_val.txt'
203 | 
204 |     #------------------------------------------------------#
205 |     #   Select the GPUs to use
206 |     #------------------------------------------------------#
207 |     os.environ["CUDA_VISIBLE_DEVICES"]  = ','.join(str(x) for x in train_gpu)
208 |     ngpus_per_node                      = len(train_gpu)
209 |     print('Number of devices: {}'.format(ngpus_per_node))
210 | 
211 |     #----------------------------------------------------#
212 |     #   Get the classes and anchors; one extra class is added for the background
213 |     #----------------------------------------------------#
214 |     class_names, num_classes = get_classes(classes_path)
215 |     num_classes += 1
216 |     anchors = get_anchors(input_shape, anchors_size)
217 | 
218 |     K.clear_session()
219 |     model_body = SSD300((input_shape[0], input_shape[1], 3), num_classes)
220 |     if model_path != '':
221 |         #------------------------------------------------------#
222 |         #   Load the pretrained weights (matched by layer name; mismatched shapes are skipped)
223 |         #------------------------------------------------------#
224 |         print('Load weights {}.'.format(model_path))
225 |         model_body.load_weights(model_path, by_name=True, skip_mismatch=True)
226 |         
227 |     if ngpus_per_node > 1:
228 |         model = multi_gpu_model(model_body, gpus=ngpus_per_node)
229 |     else:
230 |         model = model_body
231 |     
232 |     #---------------------------#
233 |     #   Read the annotation txt files of the dataset
234 |     #---------------------------#
235 |     with open(train_annotation_path, encoding='utf-8') as f:
236 |         train_lines = f.readlines()
237 |     with open(val_annotation_path, encoding='utf-8') as f:
238 |         val_lines   = f.readlines()
239 |     num_train   = len(train_lines)
240 |     num_val     = len(val_lines)
241 |     
242 |     show_config(
243 |         classes_path = classes_path, model_path = model_path, input_shape = input_shape, \
244 |         Init_Epoch = Init_Epoch, Freeze_Epoch = Freeze_Epoch, UnFreeze_Epoch = UnFreeze_Epoch, Freeze_batch_size = Freeze_batch_size, Unfreeze_batch_size = Unfreeze_batch_size, Freeze_Train = Freeze_Train, \
245 |         Init_lr = Init_lr, Min_lr = Min_lr, optimizer_type = optimizer_type, momentum = momentum, lr_decay_type = lr_decay_type, \
246 |         save_period = save_period, save_dir = save_dir, num_workers = num_workers, num_train = num_train, num_val = num_val
247 |     )
248 |     #---------------------------------------------------------#
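249 |     #   The total number of epochs is the number of passes over the whole dataset
250 |     #   The total number of steps is the number of gradient-descent updates
251 |     #   Each epoch contains several steps, and each step performs one gradient-descent update.
252 |     #   Only a minimum number of epochs is suggested here (there is no upper limit), and only the unfreeze phase is counted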
253 |     #----------------------------------------------------------#
254 |     wanted_step = 5e4 if optimizer_type == "sgd" else 1.5e4
255 |     total_step  = num_train // Unfreeze_batch_size * UnFreeze_Epoch
256 |     if total_step <= wanted_step:
257 |         if num_train // Unfreeze_batch_size == 0:
258 |             raise ValueError('The dataset is too small to train on; please enlarge the dataset.')
259 |         wanted_epoch = wanted_step // (num_train // Unfreeze_batch_size) + 1
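260 |         print("\n\033[1;33;44m[Warning] When using the %s optimizer, it is recommended to set the total number of training steps above %d.\033[0m"%(optimizer_type, wanted_step))
261 |         print("\033[1;33;44m[Warning] This run has %d training samples, Unfreeze_batch_size %d and %d epochs, giving %d total training steps.\033[0m"%(num_train, Unfreeze_batch_size, UnFreeze_Epoch, total_step))
262 |         print("\033[1;33;44m[Warning] Since the total number of steps %d is below the recommended %d, it is recommended to set the total number of epochs to %d.\033[0m"%(total_step, wanted_step, wanted_epoch))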
263 | 
264 |     for layer in model_body.layers:
265 |         if isinstance(layer, DepthwiseConv2D):
266 |             layer.add_loss(l2(weight_decay)(layer.depthwise_kernel))
267 |         elif isinstance(layer, (Conv2D, Dense)):
268 |             layer.add_loss(l2(weight_decay)(layer.kernel))
269 | 
270 |     #------------------------------------------------------#
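271 |     #   The backbone's features are generic, so frozen training speeds up training
272 |     #   and also keeps the weights from being destroyed early in training.
273 |     #   Init_Epoch is the starting epoch
274 |     #   Freeze_Epoch is the epoch at which frozen training ends
275 |     #   UnFreeze_Epoch is the total number of training epochs
276 |     #   If you get OOM or run out of GPU memory, reduce the batch size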
277 |     #------------------------------------------------------#
278 |     if True:
279 |         if Freeze_Train:
280 |             freeze_layers = 81
281 |             for i in range(freeze_layers): model_body.layers[i].trainable = False
282 |             print('Freeze the first {} layers of total {} layers.'.format(freeze_layers, len(model_body.layers)))
283 | 
284 |         #-------------------------------------------------------------------#
285 |         #   Without frozen training, use Unfreeze_batch_size directly as batch_size
286 |         #-------------------------------------------------------------------#
287 |         batch_size  = Freeze_batch_size if Freeze_Train else Unfreeze_batch_size
288 |         start_epoch = Init_Epoch
289 |         end_epoch   = Freeze_Epoch if Freeze_Train else UnFreeze_Epoch
290 |         
291 |         #-------------------------------------------------------------------#
292 |         #   Adapt the learning rate to the current batch_size
293 |         #-------------------------------------------------------------------#
294 |         nbs             = 64
295 |         lr_limit_max    = 1e-3 if optimizer_type == 'adam' else 5e-2
296 |         lr_limit_min    = 3e-4 if optimizer_type == 'adam' else 5e-5
297 |         Init_lr_fit     = min(max(batch_size / nbs * Init_lr, lr_limit_min), lr_limit_max)
298 |         Min_lr_fit      = min(max(batch_size / nbs * Min_lr, lr_limit_min * 1e-2), lr_limit_max * 1e-2)
299 | 
300 |         optimizer = {
301 |             'adam'  : Adam(lr = Init_lr_fit, beta_1 = momentum),
302 |             'sgd'   : SGD(lr = Init_lr_fit, momentum = momentum, nesterov=True)
303 |         }[optimizer_type]
304 |         model.compile(optimizer=optimizer, loss = MultiboxLoss(num_classes, neg_pos_ratio=3.0).compute_loss)
305 |     
306 |         #---------------------------------------#
307 |         #   Get the learning-rate decay function
308 |         #---------------------------------------#
309 |         lr_scheduler_func = get_lr_scheduler(lr_decay_type, Init_lr_fit, Min_lr_fit, UnFreeze_Epoch)
310 | 
311 |         epoch_step          = num_train // batch_size
312 |         epoch_step_val      = num_val // batch_size
313 | 
314 |         if epoch_step == 0 or epoch_step_val == 0:
315 |             raise ValueError('The dataset is too small to train on; please enlarge the dataset.')
316 | 
317 |         train_dataloader    = SSDDatasets(train_lines, input_shape, anchors, batch_size, num_classes, train = True)
318 |         val_dataloader      = SSDDatasets(val_lines, input_shape, anchors, batch_size, num_classes, train = False)
319 | 
320 |         #-------------------------------------------------------------------------------#
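321 |         #   Training callbacks
322 |         #   logging         Sets where TensorBoard logs are saved
323 |         #   checkpoint      Sets how weights are saved; period controls how many epochs pass between saves
324 |         #   lr_scheduler    Sets how the learning rate decays
325 |         #   early_stopping  Early stopping: ends training when val_loss stops decreasing, meaning the model has basically converged (created below but not included in callbacks)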
326 |         #-------------------------------------------------------------------------------#
327 |         time_str        = datetime.datetime.strftime(datetime.datetime.now(),'%Y_%m_%d_%H_%M_%S')
328 |         log_dir         = os.path.join(save_dir, "loss_" + str(time_str))
329 |         logging         = TensorBoard(log_dir)
330 |         loss_history    = LossHistory(log_dir)
331 |         if ngpus_per_node > 1:
332 |             checkpoint      = ParallelModelCheckpoint(model_body, os.path.join(save_dir, "ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5"), 
333 |                                     monitor = 'val_loss', save_weights_only = True, save_best_only = False, period = save_period)
334 |             checkpoint_last = ParallelModelCheckpoint(model_body, os.path.join(save_dir, "last_epoch_weights.h5"), 
335 |                                     monitor = 'val_loss', save_weights_only = True, save_best_only = False, period = 1)
336 |             checkpoint_best = ParallelModelCheckpoint(model_body, os.path.join(save_dir, "best_epoch_weights.h5"), 
337 |                                     monitor = 'val_loss', save_weights_only = True, save_best_only = True, period = 1)
338 |         else:
339 |             checkpoint      = ModelCheckpoint(os.path.join(save_dir, "ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5"), 
340 |                                     monitor = 'val_loss', save_weights_only = True, save_best_only = False, period = save_period)
341 |             checkpoint_last = ModelCheckpoint(os.path.join(save_dir, "last_epoch_weights.h5"), 
342 |                                     monitor = 'val_loss', save_weights_only = True, save_best_only = False, period = 1)
343 |             checkpoint_best = ModelCheckpoint(os.path.join(save_dir, "best_epoch_weights.h5"), 
344 |                                     monitor = 'val_loss', save_weights_only = True, save_best_only = True, period = 1)
345 |         early_stopping  = EarlyStopping(monitor='val_loss', min_delta = 0, patience = 10, verbose = 1)
346 |         lr_scheduler    = LearningRateScheduler(lr_scheduler_func, verbose = 1)
347 |         eval_callback   = EvalCallback(model_body, input_shape, anchors, class_names, num_classes, val_lines, log_dir, \
348 |                                         eval_flag=eval_flag, period=eval_period)
349 |         callbacks       = [logging, loss_history, checkpoint, checkpoint_last, checkpoint_best, lr_scheduler, eval_callback]
350 | 
351 |         if start_epoch < end_epoch:
352 |             print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size))
353 |             model.fit_generator(
354 |                 generator           = train_dataloader,
355 |                 steps_per_epoch     = epoch_step,
356 |                 validation_data     = val_dataloader,
357 |                 validation_steps    = epoch_step_val,
358 |                 epochs              = end_epoch,
359 |                 initial_epoch       = start_epoch,
360 |                 use_multiprocessing = True if num_workers > 1 else False,
361 |                 workers             = num_workers,
362 |                 callbacks           = callbacks
363 |             )
364 |         #---------------------------------------#
365 |         #   If the model had a frozen phase,
366 |         #   unfreeze it and update the training parameters
367 |         #---------------------------------------#
368 |         if Freeze_Train:
369 |             batch_size  = Unfreeze_batch_size
370 |             start_epoch = Freeze_Epoch if start_epoch < Freeze_Epoch else start_epoch
371 |             end_epoch   = UnFreeze_Epoch
372 |                 
373 |             #-------------------------------------------------------------------#
374 |             #   Adapt the learning rate to the current batch_size
375 |             #-------------------------------------------------------------------#
376 |             nbs             = 64
377 |             lr_limit_max    = 1e-3 if optimizer_type == 'adam' else 5e-2
378 |             lr_limit_min    = 3e-4 if optimizer_type == 'adam' else 5e-5
379 |             Init_lr_fit     = min(max(batch_size / nbs * Init_lr, lr_limit_min), lr_limit_max)
380 |             Min_lr_fit      = min(max(batch_size / nbs * Min_lr, lr_limit_min * 1e-2), lr_limit_max * 1e-2)
381 |             #---------------------------------------#
382 |             #   Get the learning-rate decay function
383 |             #---------------------------------------#
384 |             lr_scheduler_func = get_lr_scheduler(lr_decay_type, Init_lr_fit, Min_lr_fit, UnFreeze_Epoch)
385 |             lr_scheduler    = LearningRateScheduler(lr_scheduler_func, verbose = 1)
386 |             callbacks       = [logging, loss_history, checkpoint, checkpoint_last, checkpoint_best, lr_scheduler, eval_callback]
387 | 
388 |             for i in range(len(model_body.layers)): 
389 |                 model_body.layers[i].trainable = True
390 |             model.compile(optimizer=optimizer, loss = MultiboxLoss(num_classes, neg_pos_ratio=3.0).compute_loss)
391 | 
392 |             epoch_step      = num_train // batch_size
393 |             epoch_step_val  = num_val // batch_size
394 | 
395 |             if epoch_step == 0 or epoch_step_val == 0:
396 |                 raise ValueError("The dataset is too small to continue training; please enlarge the dataset.")
397 | 
398 |             train_dataloader.batch_size    = Unfreeze_batch_size
399 |             val_dataloader.batch_size      = Unfreeze_batch_size
400 | 
401 |             print('Train on {} samples, val on {} samples, with batch size {}.'.format(num_train, num_val, batch_size))
402 |             model.fit_generator(
403 |                 generator           = train_dataloader,
404 |                 steps_per_epoch     = epoch_step,
405 |                 validation_data     = val_dataloader,
406 |                 validation_steps    = epoch_step_val,
407 |                 epochs              = end_epoch,
408 |                 initial_epoch       = start_epoch,
409 |                 use_multiprocessing = True if num_workers > 1 else False,
410 |                 workers             = num_workers,
411 |                 callbacks           = callbacks
412 |             )
413 | 
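The learning-rate and step-count logic in `train.py` above is plain arithmetic. The learning rate is scaled by `batch_size / nbs` with `nbs = 64` and then clamped to an optimizer-specific range. A warning is printed when the unfreeze phase has fewer total steps than `wanted_step`. The sketch below reproduces only that arithmetic, without Keras, so it can be checked by hand. The function names are hypothetical, and the constants are copied from the script.

```python
def fit_learning_rates(batch_size, init_lr, min_lr, optimizer_type, nbs=64):
    """Scale Init_lr/Min_lr by batch_size / nbs and clamp them, as train.py does."""
    lr_limit_max = 1e-3 if optimizer_type == "adam" else 5e-2
    lr_limit_min = 3e-4 if optimizer_type == "adam" else 5e-5
    init_lr_fit = min(max(batch_size / nbs * init_lr, lr_limit_min), lr_limit_max)
    # The minimum learning rate is clamped to a range 100x smaller.
    min_lr_fit = min(max(batch_size / nbs * min_lr, lr_limit_min * 1e-2), lr_limit_max * 1e-2)
    return init_lr_fit, min_lr_fit


def suggested_epochs(num_train, unfreeze_batch_size, unfreeze_epoch, optimizer_type):
    """Return None if the unfreeze phase has enough steps, else a suggested epoch count."""
    wanted_step = 5e4 if optimizer_type == "sgd" else 1.5e4
    steps_per_epoch = num_train // unfreeze_batch_size
    if steps_per_epoch == 0:
        raise ValueError("The dataset is too small to train on.")
    total_step = steps_per_epoch * unfreeze_epoch
    if total_step > wanted_step:
        return None
    return int(wanted_step // steps_per_epoch + 1)
```

With the script's defaults (SGD, `Init_lr = 2e-3`, `Min_lr = Init_lr * 0.01`), the freeze phase (`batch_size = 32`) gets an initial learning rate of about `1e-3`. The unfreeze phase (`batch_size = 16`) gets `5e-4`. For Adam with `Init_lr = 6e-4` and `batch_size = 16`, the scaled value `1.5e-4` is raised to the `3e-4` floor. A hypothetical dataset of 1000 training images with `Unfreeze_batch_size = 16` gives 62 steps per epoch. So 200 SGD epochs (12400 steps) trigger the warning, with a suggested 807 epochs.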


--------------------------------------------------------------------------------
/utils/__init__.py:
--------------------------------------------------------------------------------
1 | #
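`utils/anchors.py`, listed next, derives the prior-box layout from two pieces of arithmetic. First, `get_mobilenet_output_length` applies the convolution output-size formula `(size + 2*padding - kernel) // stride + 1` nine times and keeps the last six sizes. Second, `AnchorBox` emits one box per entry of its expanded aspect-ratio list, where each ratio `ar` contributes both `ar` and `1/ar`. The sketch below reproduces both steps for a square input. The function names are hypothetical. `total_anchor_count` assumes that `get_anchors` pairs the i-th aspect-ratio list with the i-th feature map; the rest of `get_anchors` is not shown here.

```python
def conv_output_length(size, kernel=3, padding=1, stride=2):
    """Standard convolution output-size formula, as in get_mobilenet_output_length."""
    return (size + 2 * padding - kernel) // stride + 1


def mobilenet_ssd_feature_sizes(size, num_convs=9, keep_last=6):
    """Apply nine stride-2 3x3 convolutions and keep the last six sizes."""
    sizes = []
    for _ in range(num_convs):
        size = conv_output_length(size)
        sizes.append(size)
    return sizes[-keep_last:]


def anchors_per_location(aspect_ratios):
    """Mirror AnchorBox.__init__: each ratio ar contributes ar and 1/ar."""
    expanded = []
    for ar in aspect_ratios:
        expanded.append(ar)
        expanded.append(1.0 / ar)
    return len(expanded)


def total_anchor_count(input_size=300):
    """Assumes the i-th aspect-ratio list is used for the i-th (square) feature map."""
    aspect_ratios = [[1, 2], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]]
    sizes = mobilenet_ssd_feature_sizes(input_size)
    return sum(s * s * anchors_per_location(ar) for s, ar in zip(sizes, aspect_ratios))
```

For a 300x300 input, this gives feature maps of 19, 10, 5, 3, 2 and 1 cells per side. There are 4 anchors per location on the first map and 6 on the others. The total is 19·19·4 + 10·10·6 + 5·5·6 + 3·3·6 + 2·2·6 + 1·1·6 = 2278 anchors.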


--------------------------------------------------------------------------------
/utils/anchors.py:
--------------------------------------------------------------------------------
  1 | import numpy as np
  2 | 
  3 | 
  4 | class AnchorBox():
  5 |     def __init__(self, input_shape, min_size, max_size=None, aspect_ratios=None, flip=True):
  6 |         self.input_shape = input_shape
  7 | 
  8 |         self.min_size = min_size
  9 |         self.max_size = max_size
 10 | 
 11 |         self.aspect_ratios = []
 12 |         for ar in aspect_ratios:
 13 |             self.aspect_ratios.append(ar)
 14 |             self.aspect_ratios.append(1.0 / ar)
 15 | 
 16 |     def call(self, layer_shape, mask=None):
 17 |         # --------------------------------- #
 18 |         #   Get the height and width of the input feature layer
 19 |         #   e.g. 19x19
 20 |         # --------------------------------- #
 21 |         layer_height    = layer_shape[0]
 22 |         layer_width     = layer_shape[1]
 23 |         # --------------------------------- #
 24 |         #   Get the height and width of the input image
 25 |         #   e.g. 300x300
 26 |         # --------------------------------- #
 27 |         img_height  = self.input_shape[0]
 28 |         img_width   = self.input_shape[1]
 29 | 
 30 |         box_widths  = []
 31 |         box_heights = []
 32 |         # --------------------------------- #
 33 |         #   self.aspect_ratios usually takes one of two forms:
 34 |         #   [1, 1, 2, 1/2]
 35 |         #   [1, 1, 2, 1/2, 3, 1/3]
 36 |         # --------------------------------- #
 37 |         for ar in self.aspect_ratios:
 38 |             # First add the smaller square (side min_size)
 39 |             if ar == 1 and len(box_widths) == 0:
 40 |                 box_widths.append(self.min_size)
 41 |                 box_heights.append(self.min_size)
 42 |             # 然后添加一个较大的正方形
 43 |             elif ar == 1 and len(box_widths) > 0:
 44 |                 box_widths.append(np.sqrt(self.min_size * self.max_size))
 45 |                 box_heights.append(np.sqrt(self.min_size * self.max_size))
 46 |             # 然后添加长方形
 47 |             elif ar != 1:
 48 |                 box_widths.append(self.min_size * np.sqrt(ar))
 49 |                 box_heights.append(self.min_size / np.sqrt(ar))
 50 | 
 51 |         # --------------------------------- #
 52 |         #   Half the width and height of every anchor box
 53 |         # --------------------------------- #
 54 |         box_widths  = 0.5 * np.array(box_widths)
 55 |         box_heights = 0.5 * np.array(box_heights)
 56 | 
 57 |         # --------------------------------- #
 58 |         #   Stride (in pixels) of this feature layer
 59 |         # --------------------------------- #
 60 |         step_x = img_width / layer_width
 61 |         step_y = img_height / layer_height
 62 | 
 63 |         # --------------------------------- #
 64 |         #   Generate the grid-cell centers
 65 |         # --------------------------------- #
 66 |         linx = np.linspace(0.5 * step_x, img_width - 0.5 * step_x,
 67 |                            layer_width)
 68 |         liny = np.linspace(0.5 * step_y, img_height - 0.5 * step_y,
 69 |                            layer_height)
 70 |         centers_x, centers_y = np.meshgrid(linx, liny)
 71 |         centers_x = centers_x.reshape(-1, 1)
 72 |         centers_y = centers_y.reshape(-1, 1)
 73 | 
 74 |         # Each anchor needs two copies of (centers_x, centers_y): the first gives the top-left corner, the second the bottom-right corner
 75 |         num_anchors_ = len(self.aspect_ratios)
 76 |         anchor_boxes = np.concatenate((centers_x, centers_y), axis=1)
 77 |         anchor_boxes = np.tile(anchor_boxes, (1, 2 * num_anchors_))
 78 |         
 79 |         # Compute the top-left and bottom-right corners of each anchor
 80 |         anchor_boxes[:, ::4]    -= box_widths
 81 |         anchor_boxes[:, 1::4]   -= box_heights
 82 |         anchor_boxes[:, 2::4]   += box_widths
 83 |         anchor_boxes[:, 3::4]   += box_heights
 84 | 
 85 |         # --------------------------------- #
 86 |         #   Convert the anchors to fractional coordinates
 87 |         #   (normalize to [0, 1])
 88 |         # --------------------------------- #
 89 |         anchor_boxes[:, ::2]    /= img_width
 90 |         anchor_boxes[:, 1::2]   /= img_height
 91 |         anchor_boxes = anchor_boxes.reshape(-1, 4)
 92 | 
 93 |         anchor_boxes = np.minimum(np.maximum(anchor_boxes, 0.0), 1.0)
 94 |         return anchor_boxes
 95 | 
 96 | #---------------------------------------------------#
 97 | #   Compute the sizes of the shared feature layers
 98 | #---------------------------------------------------#
 99 | def get_mobilenet_output_length(height, width):
100 |     filter_sizes    = [3, 3, 3, 3, 3, 3, 3, 3, 3]
101 |     padding         = [1, 1, 1, 1, 1, 1, 1, 1, 1]
102 |     stride          = [2, 2, 2, 2, 2, 2, 2, 2, 2]
103 |     feature_heights = []
104 |     feature_widths  = []
105 | 
106 |     for i in range(len(filter_sizes)):
107 |         height  = (height + 2*padding[i] - filter_sizes[i]) // stride[i] + 1
108 |         width   = (width + 2*padding[i] - filter_sizes[i]) // stride[i] + 1
109 |         feature_heights.append(height)
110 |         feature_widths.append(width)
111 |     return np.array(feature_heights)[-6:], np.array(feature_widths)[-6:]
112 | 
113 | def get_anchors(input_shape = [300,300], anchors_size = [30, 60, 111, 162, 213, 264, 315]):
114 |     feature_heights, feature_widths = get_mobilenet_output_length(input_shape[0], input_shape[1])
115 |     aspect_ratios = [[1, 2], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]]
116 |     anchors = []
117 |     for i in range(len(feature_heights)):
118 |         anchors.append(AnchorBox(input_shape, anchors_size[i], max_size = anchors_size[i+1], 
119 |                     aspect_ratios = aspect_ratios[i]).call([feature_heights[i], feature_widths[i]]))
120 | 
121 |     anchors = np.concatenate(anchors, axis=0)
122 |     return anchors
123 | 
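The anchor layout built by `AnchorBox.call`, `get_mobilenet_output_length` and `get_anchors` can be checked with a small standalone sketch. It assumes a square input and square feature maps, and it assumes the `[1, 2]` / `[1, 2, 3]` ratio lists expand to `[1, 1, 2, 1/2]` / `[1, 1, 2, 1/2, 3, 1/3]` as the comment in `AnchorBox.call` describes. `conv_output_sizes` and `layer_anchors` are hypothetical helper names, not functions from this repository.

```python
import numpy as np

def conv_output_sizes(size, n_layers=9, kernel=3, padding=1, stride=2):
    # Repeatedly apply floor((n + 2p - k) / s) + 1, as get_mobilenet_output_length does
    sizes = []
    for _ in range(n_layers):
        size = (size + 2 * padding - kernel) // stride + 1
        sizes.append(size)
    return sizes[-6:]  # the six maps that feed the SSD heads

def layer_anchors(img_size, layer_size, min_size, max_size, ratios):
    # Assumption: square input and square feature map.
    # ratios such as [1, 2] expand to: small square, large square, ar=2, ar=1/2
    side_large = np.sqrt(min_size * max_size)
    ws, hs = [min_size, side_large], [min_size, side_large]
    for ar in ratios:
        if ar == 1:
            continue
        for r in (ar, 1.0 / ar):
            ws.append(min_size * np.sqrt(r))
            hs.append(min_size / np.sqrt(r))
    half_w, half_h = 0.5 * np.array(ws), 0.5 * np.array(hs)

    # Cell centers spaced one stride apart, offset by half a stride
    step = img_size / layer_size
    centers = np.linspace(0.5 * step, img_size - 0.5 * step, layer_size)
    cx, cy = np.meshgrid(centers, centers)
    cx, cy = cx.reshape(-1, 1), cy.reshape(-1, 1)

    # (num_cells, num_anchors, 4) corners -> (num_cells * num_anchors, 4)
    boxes = np.stack([cx - half_w, cy - half_h, cx + half_w, cy + half_h], axis=-1)
    return np.clip(boxes.reshape(-1, 4) / img_size, 0.0, 1.0)

sizes = conv_output_sizes(300)
anchor_sizes = [30, 60, 111, 162, 213, 264, 315]
ratios = [[1, 2]] + [[1, 2, 3]] * 5
all_anchors = np.concatenate([
    layer_anchors(300, s, anchor_sizes[i], anchor_sizes[i + 1], ratios[i])
    for i, s in enumerate(sizes)
])
print(sizes, all_anchors.shape)  # [19, 10, 5, 3, 2, 1] (2278, 4)
```

For a 300x300 input this gives 19*19*4 + (100 + 25 + 9 + 4 + 1)*6 = 2278 anchors, ordered cell by cell with all anchors of one cell adjacent, which is the same ordering the `np.tile` / `reshape(-1, 4)` trick in `AnchorBox.call` produces.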


--------------------------------------------------------------------------------
/utils/callbacks.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | 
  3 | import math
  4 | import keras
  5 | import matplotlib
  6 | matplotlib.use('Agg')
  7 | from matplotlib import pyplot as plt
  8 | import scipy.signal
  9 | 
 10 | import shutil
 11 | import numpy as np
 12 | 
 13 | from keras import backend as K
 14 | from keras.applications.imagenet_utils import preprocess_input
 15 | from PIL import Image
 16 | from tqdm import tqdm
 17 | from .utils import cvtColor, resize_image
 18 | from .utils_bbox import BBoxUtility
 19 | from .utils_map import get_coco_map, get_map
 20 | 
 21 | 
 22 | class LossHistory(keras.callbacks.Callback):
 23 |     def __init__(self, log_dir):
 24 |         self.log_dir    = log_dir
 25 |         self.losses     = []
 26 |         self.val_loss   = []
 27 |         
 28 |         os.makedirs(self.log_dir, exist_ok=True)
 29 | 
 30 |     def on_epoch_end(self, epoch, logs=None):
 31 |         logs = logs or {}
 32 |         os.makedirs(self.log_dir, exist_ok=True)
 33 | 
 34 |         self.losses.append(logs.get('loss'))
 35 |         self.val_loss.append(logs.get('val_loss'))
 36 |         
 37 |         with open(os.path.join(self.log_dir, "epoch_loss.txt"), 'a') as f:
 38 |             f.write(str(logs.get('loss')))
 39 |             f.write("\n")
 40 |         with open(os.path.join(self.log_dir, "epoch_val_loss.txt"), 'a') as f:
 41 |             f.write(str(logs.get('val_loss')))
 42 |             f.write("\n")
 43 |         self.loss_plot()
 44 | 
 45 |     def loss_plot(self):
 46 |         iters = range(len(self.losses))
 47 | 
 48 |         plt.figure()
 49 |         plt.plot(iters, self.losses, 'red', linewidth = 2, label='train loss')
 50 |         plt.plot(iters, self.val_loss, 'coral', linewidth = 2, label='val loss')
 51 |         try:
 52 |             if len(self.losses) < 25:
 53 |                 num = 5
 54 |             else:
 55 |                 num = 15
 56 |             
 57 |             plt.plot(iters, scipy.signal.savgol_filter(self.losses, num, 3), 'green', linestyle = '--', linewidth = 2, label='smooth train loss')
 58 |             plt.plot(iters, scipy.signal.savgol_filter(self.val_loss, num, 3), '#8B4513', linestyle = '--', linewidth = 2, label='smooth val loss')
 59 |         except Exception:  # e.g. too few epochs for the smoothing window
 60 |             pass
 61 | 
 62 |         plt.grid(True)
 63 |         plt.xlabel('Epoch')
 64 |         plt.ylabel('Loss')
 65 |         plt.title('A Loss Curve')
 66 |         plt.legend(loc="upper right")
 67 | 
 68 |         plt.savefig(os.path.join(self.log_dir, "epoch_loss.png"))
 69 | 
 70 |         plt.cla()
 71 |         plt.close("all")
 72 | 
 73 | class ExponentDecayScheduler(keras.callbacks.Callback):
 74 |     def __init__(self,
 75 |                  decay_rate,
 76 |                  verbose=0):
 77 |         super(ExponentDecayScheduler, self).__init__()
 78 |         self.decay_rate         = decay_rate
 79 |         self.verbose            = verbose
 80 |         self.learning_rates     = []
 81 | 
 82 |     def on_epoch_end(self, epoch, logs=None):
 83 |         learning_rate = K.get_value(self.model.optimizer.lr) * self.decay_rate
 84 |         K.set_value(self.model.optimizer.lr, learning_rate)
 85 |         if self.verbose > 0:
 86 |             print('Setting learning rate to %s.' % (learning_rate))
 87 | 
 88 | class WarmUpCosineDecayScheduler(keras.callbacks.Callback):
 89 |     def __init__(self, T_max, eta_min=0, verbose=0):
 90 |         super(WarmUpCosineDecayScheduler, self).__init__()
 91 |         self.T_max      = T_max
 92 |         self.eta_min    = eta_min
 93 |         self.verbose    = verbose
 94 |         self.init_lr    = 0
 95 |         self.last_epoch = 0
 96 | 
 97 |     def on_train_begin(self, logs=None):
 98 |         self.init_lr = K.get_value(self.model.optimizer.lr)
 99 | 
100 |     def on_epoch_end(self, epoch, logs=None):
101 |         learning_rate = self.eta_min + (self.init_lr - self.eta_min) * (1 + math.cos(math.pi * self.last_epoch / self.T_max)) / 2
102 |         self.last_epoch += 1
103 | 
104 |         K.set_value(self.model.optimizer.lr, learning_rate)
105 |         if self.verbose > 0:
106 |             print('Setting learning rate to %s.' % (learning_rate))
107 |     
108 | class ParallelModelCheckpoint(keras.callbacks.ModelCheckpoint):
109 |     def __init__(self, model, filepath, monitor='val_loss', verbose=0,
110 |                  save_best_only=False, save_weights_only=False,
111 |                  mode='auto', period=1):
112 |         self.single_model = model
113 |         super(ParallelModelCheckpoint,self).__init__(filepath, monitor, verbose,save_best_only, save_weights_only,mode, period)
114 | 
115 |     def set_model(self, model):
116 |         super(ParallelModelCheckpoint,self).set_model(self.single_model)
117 | 
118 | class EvalCallback(keras.callbacks.Callback):
119 |     def __init__(self, model_body, input_shape, anchors, class_names, num_classes, val_lines, log_dir,\
120 |             map_out_path=".temp_map_out", max_boxes=100, confidence=0.05, nms_iou=0.5, letterbox_image=True, MINOVERLAP=0.5, eval_flag=True, period=1):
121 |         super(EvalCallback, self).__init__()
122 |         
123 |         self.model_body         = model_body
124 |         self.input_shape        = input_shape
125 |         self.anchors            = anchors
126 |         self.class_names        = class_names
127 |         self.num_classes        = num_classes
128 |         self.val_lines          = val_lines
129 |         self.log_dir            = log_dir
130 |         self.map_out_path       = map_out_path
131 |         self.max_boxes          = max_boxes
132 |         self.confidence         = confidence
133 |         self.nms_iou            = nms_iou
134 |         self.letterbox_image    = letterbox_image
135 |         self.MINOVERLAP         = MINOVERLAP
136 |         self.eval_flag          = eval_flag
137 |         self.period             = period
138 |         
139 |         #---------------------------------------------------------#
140 |         #   BBoxUtility post-processes the raw predictions:
141 |         #   box decoding, non-maximum suppression, confidence thresholding, etc.
142 |         #---------------------------------------------------------#
143 |         self.bbox_util = BBoxUtility(self.num_classes, nms_thresh=self.nms_iou)
144 |         
145 |         self.maps       = [0]
146 |         self.epoches    = [0]
147 |         if self.eval_flag:
148 |             with open(os.path.join(self.log_dir, "epoch_map.txt"), 'a') as f:
149 |                 f.write(str(0))
150 |                 f.write("\n")
151 | 
152 |     def get_map_txt(self, image_id, image, class_names, map_out_path):
153 |         f = open(os.path.join(map_out_path, "detection-results/"+image_id+".txt"),"w") 
154 |         image_shape = np.array(np.shape(image)[0:2])
155 |         #---------------------------------------------------------#
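156 |         #   Convert the image to RGB so grayscale images do not raise errors during prediction.
157 |         #   Only RGB prediction is supported; all other image modes are converted to RGB.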
158 |         #---------------------------------------------------------#
159 |         image       = cvtColor(image)
160 |         #---------------------------------------------------------#
161 |         #   Pad the image with gray bars for a distortion-free resize
162 |         #   (a plain resize also works for detection)
163 |         #---------------------------------------------------------#
164 |         image_data = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image)
165 |         #---------------------------------------------------------#
166 |         #   Add the batch dimension, then preprocess and normalize.
167 |         #---------------------------------------------------------#
168 |         image_data = preprocess_input(np.expand_dims(np.array(image_data, dtype='float32'), 0))
169 | 
170 |         preds      = self.model_body.predict(image_data)
171 |         #-----------------------------------------------------------#
172 |         #   Decode the predictions
173 |         #-----------------------------------------------------------#
174 |         results     = self.bbox_util.decode_box(preds, self.anchors, image_shape, 
175 |                                                 self.input_shape, self.letterbox_image, confidence=self.confidence)
176 |         #--------------------------------------#
177 |         #   If nothing is detected, leave the file empty and return
178 |         #--------------------------------------#
179 |         if len(results[0])<=0:
180 |             f.close()
181 |             return
182 |         top_label   = results[0][:, 4]
183 |         top_conf    = results[0][:, 5]
184 |         top_boxes   = results[0][:, :4]
185 | 
186 |         top_100     = np.argsort(top_conf)[::-1][:self.max_boxes]
187 |         top_boxes   = top_boxes[top_100]
188 |         top_conf    = top_conf[top_100]
189 |         top_label   = top_label[top_100]
190 | 
191 |         for i, c in list(enumerate(top_label)):
192 |             predicted_class = self.class_names[int(c)]
193 |             box             = top_boxes[i]
194 |             score           = str(top_conf[i])
195 | 
196 |             top, left, bottom, right = box
197 |             if predicted_class not in class_names:
198 |                 continue
199 | 
200 |             f.write("%s %s %s %s %s %s\n" % (predicted_class, score[:6], str(int(left)), str(int(top)), str(int(right)),str(int(bottom))))
201 | 
202 |         f.close()
203 |         return 
204 |     
205 |     def on_epoch_end(self, epoch, logs=None):
206 |         temp_epoch = epoch + 1
207 |         if temp_epoch % self.period == 0 and self.eval_flag:
208 |             if not os.path.exists(self.map_out_path):
209 |                 os.makedirs(self.map_out_path)
210 |             if not os.path.exists(os.path.join(self.map_out_path, "ground-truth")):
211 |                 os.makedirs(os.path.join(self.map_out_path, "ground-truth"))
212 |             if not os.path.exists(os.path.join(self.map_out_path, "detection-results")):
213 |                 os.makedirs(os.path.join(self.map_out_path, "detection-results"))
214 |             print("Get map.")
215 |             for annotation_line in tqdm(self.val_lines):
216 |                 line        = annotation_line.split()
217 |                 image_id    = os.path.basename(line[0]).split('.')[0]
218 |                 #------------------------------#
219 |                 #   Read the image (RGB conversion happens in get_map_txt)
220 |                 #------------------------------#
221 |                 image       = Image.open(line[0])
222 |                 #------------------------------#
223 |                 #   Get the ground-truth boxes
224 |                 #------------------------------#
225 |                 gt_boxes    = np.array([np.array(list(map(int,box.split(',')))) for box in line[1:]])
226 |                 #------------------------------#
227 |                 #   Write the detection-results txt
228 |                 #------------------------------#
229 |                 self.get_map_txt(image_id, image, self.class_names, self.map_out_path)
230 |                 
231 |                 #------------------------------#
232 |                 #   Write the ground-truth txt
233 |                 #------------------------------#
234 |                 with open(os.path.join(self.map_out_path, "ground-truth/"+image_id+".txt"), "w") as new_f:
235 |                     for box in gt_boxes:
236 |                         left, top, right, bottom, obj = box
237 |                         obj_name = self.class_names[obj]
238 |                         new_f.write("%s %s %s %s %s\n" % (obj_name, left, top, right, bottom))
239 |                         
240 |             print("Calculate Map.")
241 |             try:
242 |                 temp_map = get_coco_map(class_names = self.class_names, path = self.map_out_path)[1]
243 |             except Exception:  # fall back to get_map if COCO-style evaluation fails
244 |                 temp_map = get_map(self.MINOVERLAP, False, path = self.map_out_path)
245 |             self.maps.append(temp_map)
246 |             self.epoches.append(temp_epoch)
247 | 
248 |             with open(os.path.join(self.log_dir, "epoch_map.txt"), 'a') as f:
249 |                 f.write(str(temp_map))
250 |                 f.write("\n")
251 |             
252 |             plt.figure()
253 |             plt.plot(self.epoches, self.maps, 'red', linewidth = 2, label='train map')
254 | 
255 |             plt.grid(True)
256 |             plt.xlabel('Epoch')
257 |             plt.ylabel('Map %s'%str(self.MINOVERLAP))
258 |             plt.title('A Map Curve')
259 |             plt.legend(loc="upper right")
260 | 
261 |             plt.savefig(os.path.join(self.log_dir, "epoch_map.png"))
262 |             plt.cla()
263 |             plt.close("all")
264 | 
265 |             print("Get map done.")
266 |             shutil.rmtree(self.map_out_path)
267 | 
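The two learning-rate callbacks above reduce to closed-form schedules. The sketch below reproduces them outside Keras so the values can be inspected; the function names are hypothetical. Note that `WarmUpCosineDecayScheduler`, as written, contains only the cosine-decay part (no warm-up phase), and because it updates the rate at the end of each epoch from a counter starting at 0, the first two epochs both run at the initial rate.

```python
import math

def cosine_lr(step, init_lr, T_max, eta_min=0.0):
    # Same closed form as WarmUpCosineDecayScheduler.on_epoch_end
    return eta_min + (init_lr - eta_min) * (1 + math.cos(math.pi * step / T_max)) / 2

def exponential_lr(epoch, init_lr, decay_rate):
    # ExponentDecayScheduler multiplies the lr by decay_rate at every epoch end,
    # so after `epoch` epochs the lr is init_lr * decay_rate ** epoch
    return init_lr * decay_rate ** epoch

def cosine_lrs_per_epoch(num_epochs, init_lr, T_max, eta_min=0.0):
    # Simulate the callback: the lr is set at the *end* of each epoch from a
    # counter that starts at 0, so epochs 0 and 1 both train at init_lr.
    lr, last_epoch, used = init_lr, 0, []
    for _ in range(num_epochs):
        used.append(lr)
        lr = cosine_lr(last_epoch, init_lr, T_max, eta_min)
        last_epoch += 1
    return used

print([round(x, 6) for x in cosine_lrs_per_epoch(4, 1e-3, T_max=2)])
# [0.001, 0.001, 0.0005, 0.0]
```

With `T_max` equal to the total number of epochs, the rate reaches `eta_min` one epoch after the last training epoch, so the final epoch still trains slightly above `eta_min`.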


--------------------------------------------------------------------------------
/utils/dataloader.py:
--------------------------------------------------------------------------------
  1 | import math
  2 | from random import shuffle
  3 | 
  4 | import cv2
  5 | import keras
  6 | import numpy as np
  7 | from keras.applications.imagenet_utils import preprocess_input
  8 | from PIL import Image
  9 | 
 10 | from utils.utils import cvtColor
 11 | 
 12 | 
 13 | class SSDDatasets(keras.utils.Sequence):
 14 |     def __init__(self, annotation_lines, input_shape, anchors, batch_size, num_classes, train, overlap_threshold = 0.5):
 15 |         self.annotation_lines   = annotation_lines
 16 |         self.length             = len(self.annotation_lines)
 17 |         
 18 |         self.input_shape        = input_shape
 19 |         self.anchors            = anchors
 20 |         self.num_anchors        = len(anchors)
 21 |         self.batch_size         = batch_size
 22 |         self.num_classes        = num_classes
 23 |         self.train              = train
 24 |         self.overlap_threshold  = overlap_threshold
 25 | 
 26 |     def __len__(self):
 27 |         return math.ceil(len(self.annotation_lines) / float(self.batch_size))
 28 | 
 29 |     def __getitem__(self, index):
 30 |         image_data  = []
 31 |         box_data    = []
 32 |         for i in range(index * self.batch_size, (index + 1) * self.batch_size):  
 33 |             i           = i % self.length
 34 |             #---------------------------------------------------#
 35 |             #   Apply random augmentation during training
 36 |             #   No augmentation during validation
 37 |             #---------------------------------------------------#
 38 |             image, box  = self.get_random_data(self.annotation_lines[i], self.input_shape, random = self.train)
 39 |             if len(box)!=0:
 40 |                 boxes               = np.array(box[:,:4] , dtype=np.float32)
 41 |                 # Normalize the coordinates to [0, 1]
 42 |                 boxes[:, [0, 2]]    = boxes[:,[0, 2]] / self.input_shape[1]
 43 |                 boxes[:, [1, 3]]    = boxes[:,[1, 3]] / self.input_shape[0]
 44 |                 # One-hot encode the ground-truth class labels
 45 |                 one_hot_label   = np.eye(self.num_classes - 1)[np.array(box[:,4], np.int32)]
 46 |                 box             = np.concatenate([boxes, one_hot_label], axis=-1)
 47 |             box = self.assign_boxes(box)
 48 | 
 49 |             image_data.append(image)               
 50 |             box_data.append(box)
 51 | 
 52 |         return preprocess_input(np.array(image_data, np.float32)), np.array(box_data)
 53 | 
 54 |     def on_epoch_end(self):
 55 |         shuffle(self.annotation_lines)
 56 | 
 57 |     def rand(self, a=0, b=1):
 58 |         return np.random.rand()*(b-a) + a
 59 | 
 60 |     def get_random_data(self, annotation_line, input_shape, jitter=.3, hue=.1, sat=0.7, val=0.4, random=True):
 61 |         line = annotation_line.split()
 62 |         #------------------------------#
 63 |         #   Read the image and convert it to RGB
 64 |         #------------------------------#
 65 |         image   = Image.open(line[0])
 66 |         image   = cvtColor(image)
 67 |         #------------------------------#
 68 |         #   Get the image size and the target size
 69 |         #------------------------------#
 70 |         iw, ih  = image.size
 71 |         h, w    = input_shape
 72 |         #------------------------------#
 73 |         #   Get the ground-truth boxes
 74 |         #------------------------------#
 75 |         box     = np.array([np.array(list(map(int,box.split(',')))) for box in line[1:]])
 76 | 
 77 |         if not random:
 78 |             scale = min(w/iw, h/ih)
 79 |             nw = int(iw*scale)
 80 |             nh = int(ih*scale)
 81 |             dx = (w-nw)//2
 82 |             dy = (h-nh)//2
 83 | 
 84 |             #---------------------------------#
 85 |             #   Fill the unused area with gray bars
 86 |             #---------------------------------#
 87 |             image       = image.resize((nw,nh), Image.BICUBIC)
 88 |             new_image   = Image.new('RGB', (w,h), (128,128,128))
 89 |             new_image.paste(image, (dx, dy))
 90 |             image_data  = np.array(new_image, np.float32)
 91 | 
 92 |             #---------------------------------#
 93 |             #   Adjust the ground-truth boxes
 94 |             #---------------------------------#
 95 |             if len(box)>0:
 96 |                 np.random.shuffle(box)
 97 |                 box[:, [0,2]] = box[:, [0,2]]*nw/iw + dx
 98 |                 box[:, [1,3]] = box[:, [1,3]]*nh/ih + dy
 99 |                 box[:, 0:2][box[:, 0:2]<0] = 0
100 |                 box[:, 2][box[:, 2]>w] = w
101 |                 box[:, 3][box[:, 3]>h] = h
102 |                 box_w = box[:, 2] - box[:, 0]
103 |                 box_h = box[:, 3] - box[:, 1]
104 |                 box = box[np.logical_and(box_w>1, box_h>1)] # discard invalid box
105 | 
106 |             return image_data, box
107 |                 
108 |         #------------------------------------------#
109 |         #   Rescale the image and randomly distort its aspect ratio
110 |         #------------------------------------------#
111 |         new_ar = iw/ih * self.rand(1-jitter,1+jitter) / self.rand(1-jitter,1+jitter)
112 |         scale = self.rand(.25, 2)
113 |         if new_ar < 1:
114 |             nh = int(scale*h)
115 |             nw = int(nh*new_ar)
116 |         else:
117 |             nw = int(scale*w)
118 |             nh = int(nw/new_ar)
119 |         image = image.resize((nw,nh), Image.BICUBIC)
120 | 
121 |         #------------------------------------------#
122 |         #   Fill the unused area with gray bars
123 |         #------------------------------------------#
124 |         dx = int(self.rand(0, w-nw))
125 |         dy = int(self.rand(0, h-nh))
126 |         new_image = Image.new('RGB', (w,h), (128,128,128))
127 |         new_image.paste(image, (dx, dy))
128 |         image = new_image
129 | 
130 |         #------------------------------------------#
131 |         #   Randomly flip the image horizontally
132 |         #------------------------------------------#
133 |         flip = self.rand()<.5
134 |         if flip: image = image.transpose(Image.FLIP_LEFT_RIGHT)
135 | 
136 |         image_data      = np.array(image, np.uint8)
137 |         #---------------------------------#
138 |         #   Color-space (HSV) augmentation:
139 |         #   compute the random gain for each channel
140 |         #---------------------------------#
141 |         r               = np.random.uniform(-1, 1, 3) * [hue, sat, val] + 1
142 |         #---------------------------------#
143 |         #   Convert the image to HSV
144 |         #---------------------------------#
145 |         hue, sat, val   = cv2.split(cv2.cvtColor(image_data, cv2.COLOR_RGB2HSV))
146 |         dtype           = image_data.dtype
147 |         #---------------------------------#
148 |         #   Apply the transform via lookup tables
149 |         #---------------------------------#
150 |         x       = np.arange(0, 256, dtype=r.dtype)
151 |         lut_hue = ((x * r[0]) % 180).astype(dtype)
152 |         lut_sat = np.clip(x * r[1], 0, 255).astype(dtype)
153 |         lut_val = np.clip(x * r[2], 0, 255).astype(dtype)
154 | 
155 |         image_data = cv2.merge((cv2.LUT(hue, lut_hue), cv2.LUT(sat, lut_sat), cv2.LUT(val, lut_val)))
156 |         image_data = cv2.cvtColor(image_data, cv2.COLOR_HSV2RGB)
157 | 
158 |         #---------------------------------#
159 |         #   Adjust the ground-truth boxes
160 |         #---------------------------------#
161 |         if len(box)>0:
162 |             np.random.shuffle(box)
163 |             box[:, [0,2]] = box[:, [0,2]]*nw/iw + dx
164 |             box[:, [1,3]] = box[:, [1,3]]*nh/ih + dy
165 |             if flip: box[:, [0,2]] = w - box[:, [2,0]]
166 |             box[:, 0:2][box[:, 0:2]<0] = 0
167 |             box[:, 2][box[:, 2]>w] = w
168 |             box[:, 3][box[:, 3]>h] = h
169 |             box_w = box[:, 2] - box[:, 0]
170 |             box_h = box[:, 3] - box[:, 1]
171 |             box = box[np.logical_and(box_w>1, box_h>1)] 
172 |         
173 |         return image_data, box
174 | 
175 |     def iou(self, box):
176 |         #---------------------------------------------#
177 |         #   Compute the IoU between one ground-truth box and every anchor
178 |         #   to measure how much they overlap
179 |         #---------------------------------------------#
180 |         inter_upleft    = np.maximum(self.anchors[:, :2], box[:2])
181 |         inter_botright  = np.minimum(self.anchors[:, 2:4], box[2:])
182 | 
183 |         inter_wh    = inter_botright - inter_upleft
184 |         inter_wh    = np.maximum(inter_wh, 0)
185 |         inter       = inter_wh[:, 0] * inter_wh[:, 1]
186 |         #---------------------------------------------# 
187 |         #   Area of the ground-truth box
188 |         #---------------------------------------------#
189 |         area_true = (box[2] - box[0]) * (box[3] - box[1])
190 |         #---------------------------------------------#
191 |         #   Area of each anchor
192 |         #---------------------------------------------#
193 |         area_anchor = (self.anchors[:, 2] - self.anchors[:, 0])*(self.anchors[:, 3] - self.anchors[:, 1])
194 |         #---------------------------------------------#
195 |         #   Compute the IoU
196 |         #---------------------------------------------#
197 |         union = area_true + area_anchor - inter
198 | 
199 |         iou = inter / union
200 |         return iou
201 | 
202 |     def encode_box(self, box, return_iou=True, variances = [0.1, 0.1, 0.2, 0.2]):
203 |         #---------------------------------------------#
204 |         #   Compute the overlap between this ground-truth box and the anchors
205 |         #   iou [self.num_anchors]
206 |         #   encoded_box [self.num_anchors, 5]
207 |         #---------------------------------------------#
208 |         iou = self.iou(box)
209 |         encoded_box = np.zeros((self.num_anchors, 4 + return_iou))
210 |         
211 |         #---------------------------------------------#
212 |         #   Find the anchors that overlap this ground-truth box strongly;
213 |         #   these anchors are responsible for predicting it
214 |         #---------------------------------------------#
215 |         assign_mask = iou > self.overlap_threshold
216 | 
217 |         #---------------------------------------------#
218 |         #   If no anchor has IoU above self.overlap_threshold,
219 |         #   use the anchor with the highest IoU as the positive sample
220 |         #---------------------------------------------#
221 |         if not assign_mask.any():
222 |             assign_mask[iou.argmax()] = True
223 |         
224 |         #---------------------------------------------#
225 |         #   Store the IoU in the last column
226 |         #---------------------------------------------#
227 |         if return_iou:
228 |             encoded_box[:, -1][assign_mask] = iou[assign_mask]
229 |         
230 |         #---------------------------------------------#
231 |         #   Select the matched anchors
232 |         #---------------------------------------------#
233 |         assigned_anchors = self.anchors[assign_mask]
234 | 
235 |         #---------------------------------------------#
236 |         #   Encode: convert the ground-truth box into SSD's prediction format
237 |         #   First compute the ground-truth box center and width/height
238 |         #---------------------------------------------#
239 |         box_center  = 0.5 * (box[:2] + box[2:])
240 |         box_wh      = box[2:] - box[:2]
241 |         #---------------------------------------------#
242 |         #   Then compute the centers and widths/heights of the matched anchors
243 |         #---------------------------------------------#
244 |         assigned_anchors_center = (assigned_anchors[:, 0:2] + assigned_anchors[:, 2:4]) * 0.5
245 |         assigned_anchors_wh     = (assigned_anchors[:, 2:4] - assigned_anchors[:, 0:2])
246 |         
247 |         #------------------------------------------------#
248 |         #   Compute the regression targets SSD should predict:
249 |         #   center offsets first, then width/height log-ratios,
250 |         #   each scaled by the variances, default [0.1,0.1,0.2,0.2]
251 |         #------------------------------------------------#
252 |         encoded_box[:, :2][assign_mask] = box_center - assigned_anchors_center
253 |         encoded_box[:, :2][assign_mask] /= assigned_anchors_wh
254 |         encoded_box[:, :2][assign_mask] /= np.array(variances)[:2]
255 | 
256 |         encoded_box[:, 2:4][assign_mask] = np.log(box_wh / assigned_anchors_wh)
257 |         encoded_box[:, 2:4][assign_mask] /= np.array(variances)[2:4]
258 |         return encoded_box.ravel()
259 | 
260 |     def assign_boxes(self, boxes):
261 |         #---------------------------------------------------#
262 |         #   assignment has 3 parts:
263 |         #   :4      regression targets the network should predict
264 |         #   4:-1    class of the anchor (one-hot, background by default)
265 |         #   -1      whether the anchor contains an object
266 |         #---------------------------------------------------#
267 |         assignment          = np.zeros((self.num_anchors, 4 + self.num_classes + 1))
268 |         assignment[:, 4]    = 1.0
269 |         if len(boxes) == 0:
270 |             return assignment
271 | 
272 |         # Compute the IoU and encoding for every ground-truth box
273 |         encoded_boxes   = np.apply_along_axis(self.encode_box, 1, boxes[:, :4])
274 |         #---------------------------------------------------#
275 |         #   After the reshape, encoded_boxes has shape:
276 |         #   [num_true_box, num_anchors, 4 + 1]
277 |         #   4 = encoded offsets, 1 = IoU
278 |         #---------------------------------------------------#
279 |         encoded_boxes   = encoded_boxes.reshape(-1, self.num_anchors, 5)
280 |         
281 |         #---------------------------------------------------#
282 |         #   [num_anchors] For each anchor, find the ground-truth box with the highest IoU
283 |         #---------------------------------------------------#
284 |         best_iou        = encoded_boxes[:, :, -1].max(axis=0)
285 |         best_iou_idx    = encoded_boxes[:, :, -1].argmax(axis=0)
286 |         best_iou_mask   = best_iou > 0
287 |         best_iou_idx    = best_iou_idx[best_iou_mask]
288 |         
289 |         #---------------------------------------------------#
290 |         #   计算一共有多少先验框满足需求
291 |         #   Count how many anchors were assigned a ground-truth box
292 |         assign_num      = len(best_iou_idx)
293 | 
294 |         # Keep only the encodings of the assigned anchors
295 |         encoded_boxes   = encoded_boxes[:, best_iou_mask, :]
296 |         #---------------------------------------------------#
297 |         #   Write the encoded regression targets
298 |         #---------------------------------------------------#
299 |         assignment[:, :4][best_iou_mask] = encoded_boxes[best_iou_idx,np.arange(assign_num),:4]
300 |         #----------------------------------------------------------#
301 |         #   Column 4 is the background probability; set it to 0 because these anchors contain an object
302 |         #----------------------------------------------------------#
303 |         assignment[:, 4][best_iou_mask]     = 0
304 |         assignment[:, 5:-1][best_iou_mask]  = boxes[best_iou_idx, 4:]
305 |         #----------------------------------------------------------#
306 |         #   Column -1 flags whether the anchor has a matched object
307 |         #----------------------------------------------------------#
308 |         assignment[:, -1][best_iou_mask]    = 1
309 |         # assign_boxes yields the targets the network should predict for this input image
310 |         return assignment
311 | 


--------------------------------------------------------------------------------
/utils/utils.py:
--------------------------------------------------------------------------------
  1 | import numpy as np
  2 | from PIL import Image
  3 | 
  4 | 
  5 | #---------------------------------------------------------#
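  6 | #   Convert the image to RGB so grayscale images do not raise errors during prediction.
  7 | #   Only RGB prediction is supported; all other image modes are converted to RGB.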
  8 | #---------------------------------------------------------#
  9 | def cvtColor(image):
 10 |     if len(np.shape(image)) == 3 and np.shape(image)[2] == 3:
 11 |         return image 
 12 |     else:
 13 |         image = image.convert('RGB')
 14 |         return image 
 15 | 
 16 | #---------------------------------------------------#
 17 | #   Resize the input image
 18 | #---------------------------------------------------#
 19 | def resize_image(image, size, letterbox_image):
 20 |     iw, ih  = image.size
 21 |     w, h    = size
 22 |     if letterbox_image:
 23 |         scale   = min(w/iw, h/ih)
 24 |         nw      = int(iw*scale)
 25 |         nh      = int(ih*scale)
 26 | 
 27 |         image   = image.resize((nw,nh), Image.BICUBIC)
 28 |         new_image = Image.new('RGB', size, (128,128,128))
 29 |         new_image.paste(image, ((w-nw)//2, (h-nh)//2))
 30 |     else:
 31 |         new_image = image.resize((w, h), Image.BICUBIC)
 32 |     return new_image
 33 | 
 34 | #---------------------------------------------------#
 35 | #   Read the class names
 36 | #---------------------------------------------------#
 37 | def get_classes(classes_path):
 38 |     with open(classes_path, encoding='utf-8') as f:
 39 |         class_names = f.readlines()
 40 |     class_names = [c.strip() for c in class_names]
 41 |     return class_names, len(class_names)
 42 | 
 43 | def show_config(**kwargs):
 44 |     print('Configurations:')
 45 |     print('-' * 70)
 46 |     print('|%25s | %40s|' % ('keys', 'values'))
 47 |     print('-' * 70)
 48 |     for key, value in kwargs.items():
 49 |         print('|%25s | %40s|' % (str(key), str(value)))
 50 |     print('-' * 70)
 51 |     
 52 | #-------------------------------------------------------------------------------------------------------------------------------#
 53 | #   From https://github.com/ckyrkou/Keras_FLOP_Estimator 
 54 | #   with many bug fixes
 55 | #-------------------------------------------------------------------------------------------------------------------------------#
 56 | def net_flops(model, table=False, print_result=True):
 57 |     if (table == True):
 58 |         print("\n")
 59 |         print('%25s | %16s | %16s | %16s | %16s | %6s | %6s' % (
 60 |             'Layer Name', 'Input Shape', 'Output Shape', 'Kernel Size', 'Filters', 'Strides', 'FLOPS'))
 61 |         print('=' * 120)
 62 |         
 63 |     #---------------------------------------------------#
 64 |     #   Total FLOPs
 65 |     #---------------------------------------------------#
 66 |     t_flops = 0
 67 |     factor  = 1e9
 68 | 
 69 |     for l in model.layers:
 70 |         try:
 71 |             #--------------------------------------#
 72 |             #   Initialize the per-layer values
 73 |             #--------------------------------------#
 74 |             o_shape, i_shape, strides, ks, filters = ('', '', ''), ('', '', ''), (1, 1), (0, 0), 0
 75 |             flops   = 0
 76 |             #--------------------------------------#
 77 |             #   获得层的名字
 78 |             #--------------------------------------#
 79 |             name    = l.name
 80 |             
 81 |             if ('InputLayer' in str(l)):
 82 |                 i_shape = l.get_input_shape_at(0)[1:4]
 83 |                 o_shape = l.get_output_shape_at(0)[1:4]
 84 |                 
 85 |             #--------------------------------------#
 86 |             #   Reshape layer
 87 |             #--------------------------------------#
 88 |             elif ('Reshape' in str(l)):
 89 |                 i_shape = l.get_input_shape_at(0)[1:4]
 90 |                 o_shape = l.get_output_shape_at(0)[1:4]
 91 | 
 92 |             #--------------------------------------#
 93 |             #   Padding layer
 94 |             #--------------------------------------#
 95 |             elif ('Padding' in str(l)):
 96 |                 i_shape = l.get_input_shape_at(0)[1:4]
 97 |                 o_shape = l.get_output_shape_at(0)[1:4]
 98 | 
 99 |             #--------------------------------------#
100 |             #   Flatten layer
101 |             #--------------------------------------#
102 |             elif ('Flatten' in str(l)):
103 |                 i_shape = l.get_input_shape_at(0)[1:4]
104 |                 o_shape = l.get_output_shape_at(0)[1:4]
105 |                 
106 |             #--------------------------------------#
107 |             #   Activation layer
108 |             #--------------------------------------#
109 |             elif 'Activation' in str(l):
110 |                 i_shape = l.get_input_shape_at(0)[1:4]
111 |                 o_shape = l.get_output_shape_at(0)[1:4]
112 |                 
113 |             #--------------------------------------#
114 |             #   LeakyReLU
115 |             #--------------------------------------#
116 |             elif 'LeakyReLU' in str(l):
117 |                 for i in range(len(l._inbound_nodes)):
118 |                     i_shape = l.get_input_shape_at(i)[1:4]
119 |                     o_shape = l.get_output_shape_at(i)[1:4]
120 |                     
121 |                     flops   += i_shape[0] * i_shape[1] * i_shape[2]
122 |                     
123 |             #--------------------------------------#
124 |             #   Max pooling layer
125 |             #--------------------------------------#
126 |             elif 'MaxPooling' in str(l):
127 |                 i_shape = l.get_input_shape_at(0)[1:4]
128 |                 o_shape = l.get_output_shape_at(0)[1:4]
129 |                     
130 |             #--------------------------------------#
131 |             #   Average pooling layer
132 |             #--------------------------------------#
133 |             elif ('AveragePooling' in str(l) and 'Global' not in str(l)):
134 |                 strides = l.strides
135 |                 ks      = l.pool_size
136 |                 
137 |                 for i in range(len(l._inbound_nodes)):
138 |                     i_shape = l.get_input_shape_at(i)[1:4]
139 |                     o_shape = l.get_output_shape_at(i)[1:4]
140 |                     
141 |                     flops   += o_shape[0] * o_shape[1] * o_shape[2]
142 | 
143 |             #--------------------------------------#
144 |             #   Global average pooling layer
145 |             #--------------------------------------#
146 |             elif ('AveragePooling' in str(l) and 'Global' in str(l)):
147 |                 for i in range(len(l._inbound_nodes)):
148 |                     i_shape = l.get_input_shape_at(i)[1:4]
149 |                     o_shape = l.get_output_shape_at(i)[1:4]
150 |                     
151 |                     flops += (i_shape[0] * i_shape[1] + 1) * i_shape[2]
152 |                 
153 |             #--------------------------------------#
154 |             #   标准化层
155 |             #--------------------------------------#
156 |             elif ('BatchNormalization' in str(l)):
157 |                 for i in range(len(l._inbound_nodes)):
158 |                     i_shape = l.get_input_shape_at(i)[1:4]
159 |                     o_shape = l.get_output_shape_at(i)[1:4]
160 | 
161 |                     temp_flops = 1
162 |                     for dim in i_shape:
163 |                         temp_flops *= dim
164 |                     temp_flops *= 2
165 |                     
166 |                     flops += temp_flops
167 |                 
168 |             #--------------------------------------#
169 |             #   Fully connected (Dense) layer
170 |             #--------------------------------------#
171 |             elif ('Dense' in str(l)):
172 |                 for i in range(len(l._inbound_nodes)):
173 |                     i_shape = l.get_input_shape_at(i)[1:4]
174 |                     o_shape = l.get_output_shape_at(i)[1:4]
175 |                 
176 |                     temp_flops = 1
177 |                     for dim in o_shape:
178 |                         temp_flops *= dim
179 |
180 |                     if i_shape[-1] is None:
181 |                         temp_flops = temp_flops * o_shape[-1]
182 |                     else:
183 |                         temp_flops = temp_flops * i_shape[-1]
184 |                     flops += temp_flops
185 | 
186 |             #--------------------------------------#
187 |             #   Standard convolution layer
188 |             #--------------------------------------#
189 |             elif ('Conv2D' in str(l) and 'DepthwiseConv2D' not in str(l) and 'SeparableConv2D' not in str(l)):
190 |                 strides = l.strides
191 |                 ks      = l.kernel_size
192 |                 filters = l.filters
193 |                 bias    = 1 if l.use_bias else 0
194 |                 
195 |                 for i in range(len(l._inbound_nodes)):
196 |                     i_shape = l.get_input_shape_at(i)[1:4]
197 |                     o_shape = l.get_output_shape_at(i)[1:4]
198 |                     
199 |                     if (filters == None):
200 |                         filters = i_shape[2]
201 |                     flops += filters * o_shape[0] * o_shape[1] * (ks[0] * ks[1] * i_shape[2] + bias)
202 | 
203 |             #--------------------------------------#
204 |             #   Depthwise convolution layer
205 |             #--------------------------------------#
206 |             elif ('Conv2D' in str(l) and 'DepthwiseConv2D' in str(l) and 'SeparableConv2D' not in str(l)):
207 |                 strides = l.strides
208 |                 ks      = l.kernel_size
209 |                 filters = l.filters
210 |                 bias    = 1 if l.use_bias else 0
211 |             
212 |                 for i in range(len(l._inbound_nodes)):
213 |                     i_shape = l.get_input_shape_at(i)[1:4]
214 |                     o_shape = l.get_output_shape_at(i)[1:4]
215 |                     
216 |                     if (filters == None):
217 |                         filters = i_shape[2]
218 |                     flops += filters * o_shape[0] * o_shape[1] * (ks[0] * ks[1] + bias)
219 |                 
220 |             #--------------------------------------#
221 |             #   Separable convolution layer
222 |             #--------------------------------------#
223 |             elif ('Conv2D' in str(l) and 'DepthwiseConv2D' not in str(l) and 'SeparableConv2D' in str(l)):
224 |                 strides = l.strides
225 |                 ks      = l.kernel_size
226 |                 filters = l.filters
227 |                 bias    = 1 if l.use_bias else 0
228 |                 for i in range(len(l._inbound_nodes)):
229 |                     i_shape = l.get_input_shape_at(i)[1:4]
230 |                     o_shape = l.get_output_shape_at(i)[1:4]
231 |                     
232 |                     if (filters == None):
233 |                         filters = i_shape[2]
234 |                     flops += i_shape[2] * o_shape[0] * o_shape[1] * (ks[0] * ks[1] + bias) + \
235 |                              filters * o_shape[0] * o_shape[1] * (1 * 1 * i_shape[2] + bias)
236 |             #--------------------------------------#
237 |             #   Nested model: recurse into it
238 |             #--------------------------------------#
239 |             elif 'Model' in str(l):
240 |                 flops = net_flops(l, print_result=False)
241 |                 
242 |             t_flops += flops
243 | 
244 |             if (table == True):
245 |                 print('%25s | %16s | %16s | %16s | %16s | %6s | %5.4f' % (
246 |                     name[:25], str(i_shape), str(o_shape), str(ks), str(filters), str(strides), flops))
247 |                 
248 |         except:
249 |             pass
250 |     
251 |     t_flops = t_flops * 2  # count each multiply-accumulate as 2 FLOPs
252 |     if print_result:
253 |         show_flops = t_flops / factor
254 |         print('Total GFLOPs: %.3fG' % (show_flops))
255 |     return t_flops
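For a standard `Conv2D` layer, `net_flops` counts `filters * H_out * W_out * (k_h * k_w * C_in + bias)` multiply-accumulates and doubles the grand total at the end. A worked example of that arithmetic for a hypothetical 3x3 convolution with 64 filters on a 300x300x3 input, stride 1 and "same" padding (these numbers are illustrative, not taken from the SSD model); `conv2d_flops` is a hypothetical helper that isolates the Conv2D branch:

```python
def conv2d_flops(out_h, out_w, in_c, filters, kernel=(3, 3), use_bias=True):
    """Mirror of the Conv2D branch of net_flops, including the final x2."""
    bias = 1 if use_bias else 0
    # Each output element needs k_h * k_w * C_in multiply-adds (+1 for bias).
    macs = filters * out_h * out_w * (kernel[0] * kernel[1] * in_c + bias)
    return 2 * macs

flops = conv2d_flops(300, 300, 3, 64)
print(flops)  # 322560000, i.e. about 0.32 GFLOPs
```

Here each output element costs 3 * 3 * 3 + 1 = 28 multiply-accumulates, and there are 64 * 300 * 300 output elements.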

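When `letterbox_image` is true, `resize_image` scales the image by the smaller of the two ratios and pastes it centered on a gray canvas. The geometry can be checked without PIL; `letterbox_geometry` below is a hypothetical helper that reproduces only the arithmetic of that branch:

```python
def letterbox_geometry(image_size, target_size):
    """Return (scale, (nw, nh), (dx, dy)) as computed in resize_image."""
    iw, ih = image_size
    w, h = target_size
    # Keep the aspect ratio: use the ratio of the tighter dimension.
    scale = min(w / iw, h / ih)
    nw, nh = int(iw * scale), int(ih * scale)
    # Top-left paste position that centers the resized image.
    offset = ((w - nw) // 2, (h - nh) // 2)
    return scale, (nw, nh), offset

print(letterbox_geometry((800, 400), (400, 400)))  # (0.5, (400, 200), (0, 100))
```

In this example the gray bands are 100 rows above and below the image; `ssd_correct_boxes` in `utils_bbox.py` undoes this offset and scale when mapping predictions back to the original image.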

--------------------------------------------------------------------------------
/utils/utils_bbox.py:
--------------------------------------------------------------------------------
  1 | import numpy as np
  2 | import tensorflow as tf
  3 | import keras.backend as K
  4 | 
  5 | class BBoxUtility(object):
  6 |     def __init__(self, num_classes, nms_thresh=0.45, top_k=300):
  7 |         self.num_classes    = num_classes
  8 |         self._nms_thresh    = nms_thresh
  9 |         self._top_k         = top_k
 10 |         self.boxes          = K.placeholder(dtype='float32', shape=(None, 4))
 11 |         self.scores         = K.placeholder(dtype='float32', shape=(None,))
 12 |         self.nms            = tf.image.non_max_suppression(self.boxes, self.scores, self._top_k, iou_threshold=self._nms_thresh)
 13 |         self.sess           = K.get_session()
 14 | 
 15 |     def ssd_correct_boxes(self, box_xy, box_wh, input_shape, image_shape, letterbox_image):
 16 |         #-----------------------------------------------------------------#
 17 |         #   把y轴放前面是因为方便预测框和图像的宽高进行相乘
 18 |         #-----------------------------------------------------------------#
 19 |         box_yx = box_xy[..., ::-1]
 20 |         box_hw = box_wh[..., ::-1]
 21 |         input_shape = np.array(input_shape)
 22 |         image_shape = np.array(image_shape)
 23 | 
 24 |         if letterbox_image:
 25 |             #-----------------------------------------------------------------#
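 26 |             #   offset is the normalized offset of the valid (unpadded) image region from the top-left corner
 27 |             #   new_shape is the image height and width after scaling to fit the input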
 28 |             #-----------------------------------------------------------------#
 29 |             new_shape = np.round(image_shape * np.min(input_shape/image_shape))
 30 |             offset  = (input_shape - new_shape)/2./input_shape
 31 |             scale   = input_shape/new_shape
 32 | 
 33 |             box_yx  = (box_yx - offset) * scale
 34 |             box_hw *= scale
 35 | 
 36 |         box_mins    = box_yx - (box_hw / 2.)
 37 |         box_maxes   = box_yx + (box_hw / 2.)
 38 |         boxes  = np.concatenate([box_mins[..., 0:1], box_mins[..., 1:2], box_maxes[..., 0:1], box_maxes[..., 1:2]], axis=-1)
 39 |         boxes *= np.concatenate([image_shape, image_shape], axis=-1)
 40 |         return boxes
 41 | 
 42 |     def decode_boxes(self, mbox_loc, anchors, variances):
 43 |         # Width and height of each anchor
 44 |         anchor_width     = anchors[:, 2] - anchors[:, 0]
 45 |         anchor_height    = anchors[:, 3] - anchors[:, 1]
 46 |         # Center point of each anchor
 47 |         anchor_center_x  = 0.5 * (anchors[:, 2] + anchors[:, 0])
 48 |         anchor_center_y  = 0.5 * (anchors[:, 3] + anchors[:, 1])
 49 | 
 50 |         # Decode the box center from its x/y offset relative to the anchor center
 51 |         decode_bbox_center_x = mbox_loc[:, 0] * anchor_width * variances[0]
 52 |         decode_bbox_center_x += anchor_center_x
 53 |         decode_bbox_center_y = mbox_loc[:, 1] * anchor_height * variances[1]
 54 |         decode_bbox_center_y += anchor_center_y
 55 |         
 56 |         # Decode the box width and height
 57 |         decode_bbox_width   = np.exp(mbox_loc[:, 2] * variances[2])
 58 |         decode_bbox_width   *= anchor_width
 59 |         decode_bbox_height  = np.exp(mbox_loc[:, 3] * variances[3])
 60 |         decode_bbox_height  *= anchor_height
 61 | 
 62 |         # Top-left and bottom-right corners of the decoded box
 63 |         decode_bbox_xmin = decode_bbox_center_x - 0.5 * decode_bbox_width
 64 |         decode_bbox_ymin = decode_bbox_center_y - 0.5 * decode_bbox_height
 65 |         decode_bbox_xmax = decode_bbox_center_x + 0.5 * decode_bbox_width
 66 |         decode_bbox_ymax = decode_bbox_center_y + 0.5 * decode_bbox_height
 67 | 
 68 |         # Stack the corners as (xmin, ymin, xmax, ymax)
 69 |         decode_bbox = np.concatenate((decode_bbox_xmin[:, None],
 70 |                                       decode_bbox_ymin[:, None],
 71 |                                       decode_bbox_xmax[:, None],
 72 |                                       decode_bbox_ymax[:, None]), axis=-1)
 73 |         # Clip coordinates to [0, 1]
 74 |         decode_bbox = np.minimum(np.maximum(decode_bbox, 0.0), 1.0)
 75 |         return decode_bbox
 76 | 
 77 |     def decode_box(self, predictions, anchors, image_shape, input_shape, letterbox_image, variances = [0.1, 0.1, 0.2, 0.2], confidence=0.5):
 78 |         #---------------------------------------------------#
 79 |         #   [:4] holds the box regression predictions
 80 |         #---------------------------------------------------#
 81 |         mbox_loc        = predictions[:, :, :4]
 82 |         #---------------------------------------------------#
 83 |         #   [4:] holds the class confidences (index 0 is background)
 84 |         #---------------------------------------------------#
 85 |         mbox_conf       = predictions[:, :, 4:]
 86 | 
 87 |         results = []
 88 |         #----------------------------------------------------------------------------------------------------------------#
 89 |         #   Process each image. predict.py passes a single image, so this loop runs only once there.
 90 |         #----------------------------------------------------------------------------------------------------------------#
 91 |         for i in range(len(mbox_loc)):
 92 |             results.append([])
 93 |             #--------------------------------#
 94 |             #   Decode the anchors using the regression output
 95 |             #--------------------------------#
 96 |             decode_bbox = self.decode_boxes(mbox_loc[i], anchors, variances)
 97 | 
 98 |             for c in range(1, self.num_classes):
 99 |                 #--------------------------------#
100 |                 #   Confidence of every box for class c (background is skipped)
101 |                 #   and a mask of the boxes above the threshold
102 |                 #--------------------------------#
103 |                 c_confs     = mbox_conf[i, :, c]
104 |                 c_confs_m   = c_confs > confidence
105 |                 if len(c_confs[c_confs_m]) > 0:
106 |                     #-----------------------------------------#
107 |                     #   Boxes whose score exceeds confidence
108 |                     #-----------------------------------------#
109 |                     boxes_to_process = decode_bbox[c_confs_m]
110 |                     confs_to_process = c_confs[c_confs_m]
111 |                     #-----------------------------------------#
112 |                     #   IoU-based non-maximum suppression
113 |                     #-----------------------------------------#
114 |                     idx         = self.sess.run(self.nms, feed_dict={self.boxes: boxes_to_process, self.scores: confs_to_process})
115 |                     #-----------------------------------------#
116 |                     #   Keep the boxes that survive NMS
117 |                     #-----------------------------------------#
118 |                     good_boxes  = boxes_to_process[idx]
119 |                     confs       = confs_to_process[idx][:, None]
120 |                     labels      = (c - 1) * np.ones((len(idx), 1))
121 |                     #-----------------------------------------#
122 |                     #   Stack box coordinates, label and confidence into one row per box
123 |                     #-----------------------------------------#
124 |                     c_pred      = np.concatenate((good_boxes, labels, confs), axis=1)
125 |                     # Append to this image's results
126 |                     results[-1].extend(c_pred)
127 | 
128 |             if len(results[-1]) > 0:
129 |                 results[-1] = np.array(results[-1])
130 |                 box_xy, box_wh = (results[-1][:, 0:2] + results[-1][:, 2:4])/2, results[-1][:, 2:4] - results[-1][:, 0:2]
131 |                 results[-1][:, :4] = self.ssd_correct_boxes(box_xy, box_wh, input_shape, image_shape, letterbox_image)
132 | 
133 |         return results
134 | 
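`decode_boxes` inverts the usual SSD box encoding: the center offsets are scaled by the anchor size and the first two variances, and the width and height by `exp` of the last two. A sketch of the matching encoder and a round-trip check follows. It assumes the training-side encoder uses the same formula and the `decode_box` default variances `[0.1, 0.1, 0.2, 0.2]`; `encode` and `decode` are hypothetical stand-alone functions, and `decode` omits the final clipping to [0, 1].

```python
import numpy as np

VARIANCES = np.array([0.1, 0.1, 0.2, 0.2])

def _center_size(boxes):
    """(xmin, ymin, xmax, ymax) -> center x, center y, width, height."""
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    return boxes[:, 0] + 0.5 * w, boxes[:, 1] + 0.5 * h, w, h

def encode(gt, anchors, variances=VARIANCES):
    """Encode ground-truth corners as offsets relative to the anchors."""
    gx, gy, gw, gh = _center_size(gt)
    ax, ay, aw, ah = _center_size(anchors)
    return np.stack([
        (gx - ax) / aw / variances[0],
        (gy - ay) / ah / variances[1],
        np.log(gw / aw) / variances[2],
        np.log(gh / ah) / variances[3],
    ], axis=-1)

def decode(loc, anchors, variances=VARIANCES):
    """Same math as BBoxUtility.decode_boxes, without the [0, 1] clipping."""
    ax, ay, aw, ah = _center_size(anchors)
    cx = loc[:, 0] * aw * variances[0] + ax
    cy = loc[:, 1] * ah * variances[1] + ay
    w = np.exp(loc[:, 2] * variances[2]) * aw
    h = np.exp(loc[:, 3] * variances[3]) * ah
    return np.stack([cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h], axis=-1)

anchors = np.array([[0.1, 0.1, 0.3, 0.3]])
gt = np.array([[0.12, 0.08, 0.34, 0.30]])
print(np.allclose(decode(encode(gt, anchors), anchors), gt))  # True
```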

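`decode_box` runs per-class NMS through a TF1 session, feeding `tf.image.non_max_suppression` via placeholders. The greedy algorithm behind that call can be sketched in NumPy; this is a swapped-in illustration, not the code path the repository uses, and `nms` is a hypothetical name. It assumes boxes in (xmin, ymin, xmax, ymax) form and drops a box when its IoU with an already kept box exceeds the threshold, keeping at most `top_k` boxes.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.45, top_k=300):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = np.argsort(scores)[::-1]  # highest score first
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0 and len(keep) < top_k:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Intersection of the best box with every remaining box.
        ix1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        iy1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        ix2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        iy2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter)
        # Drop boxes that overlap the kept box too much.
        order = rest[iou <= iou_threshold]
    return np.array(keep, dtype=np.int64)

boxes = np.array([[0.0, 0.0, 1.0, 1.0],
                  [0.05, 0.05, 1.0, 1.0],
                  [2.0, 2.0, 3.0, 3.0]])
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0 2]
```

The second box overlaps the first with IoU 0.9025, so it is suppressed; the third box does not overlap at all and survives.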

--------------------------------------------------------------------------------
/utils/utils_map.py:
--------------------------------------------------------------------------------
  1 | import glob
  2 | import json
  3 | import math
  4 | import operator
  5 | import os
  6 | import shutil
  7 | import sys
  8 | try:
  9 |     from pycocotools.coco import COCO
 10 |     from pycocotools.cocoeval import COCOeval
 11 | except:
 12 |     pass
 13 | import cv2
 14 | import matplotlib
 15 | matplotlib.use('Agg')
 16 | from matplotlib import pyplot as plt
 17 | import numpy as np
 18 | 
 19 | '''
 20 |     0,0 ------> x (width)
 21 |      |
 22 |      |  (Left,Top)
 23 |      |      *_________
 24 |      |      |         |
 25 |             |         |
 26 |      y      |_________|
 27 |   (height)            *
 28 |                 (Right,Bottom)
 29 | '''
 30 | 
 31 | def log_average_miss_rate(precision, fp_cumsum, num_images):
 32 |     """
 33 |         log-average miss rate:
 34 |             Calculated by averaging miss rates at 9 evenly spaced FPPI points
 35 |             between 1e-2 and 1e0 (0.01 and 1), in log-space.
 36 | 
 37 |         output:
 38 |                 lamr | log-average miss rate
 39 |                 mr | miss rate
 40 |                 fppi | false positives per image
 41 | 
 42 |         references:
 43 |             [1] Dollar, Piotr, et al. "Pedestrian Detection: An Evaluation of the
 44 |                State of the Art." Pattern Analysis and Machine Intelligence, IEEE
 45 |                Transactions on 34.4 (2012): 743 - 761.
 46 |     """
 47 | 
 48 |     if precision.size == 0:
 49 |         lamr = 0
 50 |         mr = 1
 51 |         fppi = 0
 52 |         return lamr, mr, fppi
 53 | 
 54 |     fppi = fp_cumsum / float(num_images)
 55 |     mr = (1 - precision)
 56 | 
 57 |     fppi_tmp = np.insert(fppi, 0, -1.0)
 58 |     mr_tmp = np.insert(mr, 0, 1.0)
 59 | 
 60 |     ref = np.logspace(-2.0, 0.0, num = 9)
 61 |     for i, ref_i in enumerate(ref):
 62 |         j = np.where(fppi_tmp <= ref_i)[-1][-1]
 63 |         ref[i] = mr_tmp[j]
 64 | 
 65 |     lamr = math.exp(np.mean(np.log(np.maximum(1e-10, ref))))
 66 | 
 67 |     return lamr, mr, fppi
 68 | 
 69 | """
 70 |  print an error message and exit with a non-zero status
 71 | """
 72 | def error(msg):
 73 |     print(msg)
 74 |     sys.exit(1)
 75 | 
 76 | """
 77 |  check if the value is a float strictly between 0.0 and 1.0
 78 | """
 79 | def is_float_between_0_and_1(value):
 80 |     try:
 81 |         val = float(value)
 82 |         if val > 0.0 and val < 1.0:
 83 |             return True
 84 |         else:
 85 |             return False
 86 |     except ValueError:
 87 |         return False
 88 | 
 89 | """
 90 |  Calculate the AP given the recall and precision array
 91 |     1st) We compute a version of the measured precision/recall curve with
 92 |          precision monotonically decreasing
 93 |     2nd) We compute the AP as the area under this curve by numerical integration.
 94 | """
 95 | def voc_ap(rec, prec):
 96 |     """
 97 |     --- Official matlab code VOC2012---
 98 |     mrec=[0 ; rec ; 1];
 99 |     mpre=[0 ; prec ; 0];
100 |     for i=numel(mpre)-1:-1:1
101 |             mpre(i)=max(mpre(i),mpre(i+1));
102 |     end
103 |     i=find(mrec(2:end)~=mrec(1:end-1))+1;
104 |     ap=sum((mrec(i)-mrec(i-1)).*mpre(i));
105 |     """
106 |     rec.insert(0, 0.0) # insert 0.0 at beginning of list
107 |     rec.append(1.0) # insert 1.0 at end of list
108 |     mrec = rec[:]
109 |     prec.insert(0, 0.0) # insert 0.0 at beginning of list
110 |     prec.append(0.0) # insert 0.0 at end of list
111 |     mpre = prec[:]
112 |     """
113 |      This part makes the precision monotonically decreasing
114 |         (goes from the end to the beginning)
115 |         matlab: for i=numel(mpre)-1:-1:1
116 |                     mpre(i)=max(mpre(i),mpre(i+1));
117 |     """
118 |     for i in range(len(mpre)-2, -1, -1):
119 |         mpre[i] = max(mpre[i], mpre[i+1])
120 |     """
121 |      This part creates a list of indexes where the recall changes
122 |         matlab: i=find(mrec(2:end)~=mrec(1:end-1))+1;
123 |     """
124 |     i_list = []
125 |     for i in range(1, len(mrec)):
126 |         if mrec[i] != mrec[i-1]:
127 |             i_list.append(i) # if it was matlab would be i + 1
128 |     """
129 |      The Average Precision (AP) is the area under the curve
130 |         (numerical integration)
131 |         matlab: ap=sum((mrec(i)-mrec(i-1)).*mpre(i));
132 |     """
133 |     ap = 0.0
134 |     for i in i_list:
135 |         ap += ((mrec[i]-mrec[i-1])*mpre[i])
136 |     return ap, mrec, mpre
137 | 
138 | 
139 | """
140 |  Convert the lines of a file to a list
141 | """
142 | def file_lines_to_list(path):
143 |     # open txt file lines to a list
144 |     with open(path) as f:
145 |         content = f.readlines()
146 |     # remove whitespace characters like `\n` at the end of each line
147 |     content = [x.strip() for x in content]
148 |     return content
149 | 
150 | """
151 |  Draws text in image
152 | """
153 | def draw_text_in_image(img, text, pos, color, line_width):
154 |     font = cv2.FONT_HERSHEY_PLAIN
155 |     fontScale = 1
156 |     lineType = 1
157 |     bottomLeftCornerOfText = pos
158 |     cv2.putText(img, text,
159 |             bottomLeftCornerOfText,
160 |             font,
161 |             fontScale,
162 |             color,
163 |             lineType)
164 |     text_width, _ = cv2.getTextSize(text, font, fontScale, lineType)[0]
165 |     return img, (line_width + text_width)
166 | 
167 | """
168 |  Plot - adjust axes
169 | """
170 | def adjust_axes(r, t, fig, axes):
171 |     # get text width for re-scaling
172 |     bb = t.get_window_extent(renderer=r)
173 |     text_width_inches = bb.width / fig.dpi
174 |     # get axis width in inches
175 |     current_fig_width = fig.get_figwidth()
176 |     new_fig_width = current_fig_width + text_width_inches
177 |     propotion = new_fig_width / current_fig_width
178 |     # get axis limit
179 |     x_lim = axes.get_xlim()
180 |     axes.set_xlim([x_lim[0], x_lim[1]*propotion])
181 | 
182 | """
183 |  Draw plot using Matplotlib
184 | """
185 | def draw_plot_func(dictionary, n_classes, window_title, plot_title, x_label, output_path, to_show, plot_color, true_p_bar):
186 |     # sort the dictionary by increasing value, into a list of tuples (largest bar drawn last, at the top)
187 |     sorted_dic_by_value = sorted(dictionary.items(), key=operator.itemgetter(1))
188 |     # unpacking the list of tuples into two lists
189 |     sorted_keys, sorted_values = zip(*sorted_dic_by_value)
190 |     # 
191 |     if true_p_bar != "":
192 |         """
193 |          Special case to draw in:
194 |             - green -> TP: True Positives (object detected and matches ground-truth)
195 |             - red -> FP: False Positives (object detected but does not match ground-truth)
196 |             - orange -> FN: False Negatives (object not detected but present in the ground-truth)
197 |         """
198 |         fp_sorted = []
199 |         tp_sorted = []
200 |         for key in sorted_keys:
201 |             fp_sorted.append(dictionary[key] - true_p_bar[key])
202 |             tp_sorted.append(true_p_bar[key])
203 |         plt.barh(range(n_classes), fp_sorted, align='center', color='crimson', label='False Positive')
204 |         plt.barh(range(n_classes), tp_sorted, align='center', color='forestgreen', label='True Positive', left=fp_sorted)
205 |         # add legend
206 |         plt.legend(loc='lower right')
207 |         """
208 |          Write number on side of bar
209 |         """
210 |         fig = plt.gcf() # gcf - get current figure
211 |         axes = plt.gca()
212 |         r = fig.canvas.get_renderer()
213 |         for i, val in enumerate(sorted_values):
214 |             fp_val = fp_sorted[i]
215 |             tp_val = tp_sorted[i]
216 |             fp_str_val = " " + str(fp_val)
217 |             tp_str_val = fp_str_val + " " + str(tp_val)
218 |             # trick to paint multicolor with offset:
219 |             # first paint everything and then repaint the first number
220 |             t = plt.text(val, i, tp_str_val, color='forestgreen', va='center', fontweight='bold')
221 |             plt.text(val, i, fp_str_val, color='crimson', va='center', fontweight='bold')
222 |             if i == (len(sorted_values)-1): # largest bar
223 |                 adjust_axes(r, t, fig, axes)
224 |     else:
225 |         plt.barh(range(n_classes), sorted_values, color=plot_color)
226 |         """
227 |          Write number on side of bar
228 |         """
229 |         fig = plt.gcf() # gcf - get current figure
230 |         axes = plt.gca()
231 |         r = fig.canvas.get_renderer()
232 |         for i, val in enumerate(sorted_values):
233 |             str_val = " " + str(val) # add a space before
234 |             if val < 1.0:
235 |                 str_val = " {0:.2f}".format(val)
236 |             t = plt.text(val, i, str_val, color=plot_color, va='center', fontweight='bold')
237 |             # re-set axes to show number inside the figure
238 |             if i == (len(sorted_values)-1): # largest bar
239 |                 adjust_axes(r, t, fig, axes)
240 |     # set window title
241 |     fig.canvas.manager.set_window_title(window_title)
242 |     # write classes in y axis
243 |     tick_font_size = 12
244 |     plt.yticks(range(n_classes), sorted_keys, fontsize=tick_font_size)
245 |     """
246 |      Re-scale height accordingly
247 |     """
248 |     init_height = fig.get_figheight()
249 |     # compute the matrix height in points and inches
250 |     dpi = fig.dpi
251 |     height_pt = n_classes * (tick_font_size * 1.4) # 1.4 (some spacing)
252 |     height_in = height_pt / dpi
253 |     # compute the required figure height 
254 |     top_margin = 0.15 # in percentage of the figure height
255 |     bottom_margin = 0.05 # in percentage of the figure height
256 |     figure_height = height_in / (1 - top_margin - bottom_margin)
257 |     # set new height
258 |     if figure_height > init_height:
259 |         fig.set_figheight(figure_height)
260 | 
261 |     # set plot title
262 |     plt.title(plot_title, fontsize=14)
263 |     # set axis titles
264 |     # plt.xlabel('classes')
265 |     plt.xlabel(x_label, fontsize='large')
266 |     # adjust size of window
267 |     fig.tight_layout()
268 |     # save the plot
269 |     fig.savefig(output_path)
270 |     # show image
271 |     if to_show:
272 |         plt.show()
273 |     # close the plot
274 |     plt.close()
275 | 
276 | def get_map(MINOVERLAP, draw_plot, score_threhold=0.5, path = './map_out'):
277 |     GT_PATH             = os.path.join(path, 'ground-truth')
278 |     DR_PATH             = os.path.join(path, 'detection-results')
279 |     IMG_PATH            = os.path.join(path, 'images-optional')
280 |     TEMP_FILES_PATH     = os.path.join(path, '.temp_files')
281 |     RESULTS_FILES_PATH  = os.path.join(path, 'results')
282 | 
283 |     show_animation = True
284 |     if os.path.exists(IMG_PATH): 
285 |         for dirpath, dirnames, files in os.walk(IMG_PATH):
286 |             if not files:
287 |                 show_animation = False
288 |     else:
289 |         show_animation = False
290 | 
291 |     if not os.path.exists(TEMP_FILES_PATH):
292 |         os.makedirs(TEMP_FILES_PATH)
293 |         
294 |     if os.path.exists(RESULTS_FILES_PATH):
295 |         shutil.rmtree(RESULTS_FILES_PATH)
296 |     # recreate the (now empty) results folder in both cases
297 |     os.makedirs(RESULTS_FILES_PATH)
298 |     if draw_plot:
299 |         try:
300 |             matplotlib.use('TkAgg')
301 |         except:
302 |             pass
303 |         os.makedirs(os.path.join(RESULTS_FILES_PATH, "AP"))
304 |         os.makedirs(os.path.join(RESULTS_FILES_PATH, "F1"))
305 |         os.makedirs(os.path.join(RESULTS_FILES_PATH, "Recall"))
306 |         os.makedirs(os.path.join(RESULTS_FILES_PATH, "Precision"))
307 |     if show_animation:
308 |         os.makedirs(os.path.join(RESULTS_FILES_PATH, "images", "detections_one_by_one"))
309 | 
310 |     ground_truth_files_list = glob.glob(GT_PATH + '/*.txt')
311 |     if len(ground_truth_files_list) == 0:
312 |         error("Error: No ground-truth files found!")
313 |     ground_truth_files_list.sort()
314 |     gt_counter_per_class     = {}
315 |     counter_images_per_class = {}
316 | 
317 |     for txt_file in ground_truth_files_list:
318 |         file_id     = txt_file.split(".txt", 1)[0]
319 |         file_id     = os.path.basename(os.path.normpath(file_id))
320 |         temp_path   = os.path.join(DR_PATH, (file_id + ".txt"))
321 |         if not os.path.exists(temp_path):
322 |             error_msg = "Error. File not found: {}\n".format(temp_path)
323 |             error(error_msg)
324 |         lines_list      = file_lines_to_list(txt_file)
325 |         bounding_boxes  = []
326 |         is_difficult    = False
327 |         already_seen_classes = []
328 |         for line in lines_list:
329 |             try:
330 |                 if "difficult" in line:
331 |                     class_name, left, top, right, bottom, _difficult = line.split()
332 |                     is_difficult = True
333 |                 else:
334 |                     class_name, left, top, right, bottom = line.split()
335 |             except ValueError:
336 |                 if "difficult" in line:
337 |                     line_split  = line.split()
338 |                     _difficult  = line_split[-1]
339 |                     bottom      = line_split[-2]
340 |                     right       = line_split[-3]
341 |                     top         = line_split[-4]
342 |                     left        = line_split[-5]
343 |                     class_name  = ""
344 |                     for name in line_split[:-5]:
345 |                         class_name += name + " "
346 |                     class_name  = class_name[:-1]
347 |                     is_difficult = True
348 |                 else:
349 |                     line_split  = line.split()
350 |                     bottom      = line_split[-1]
351 |                     right       = line_split[-2]
352 |                     top         = line_split[-3]
353 |                     left        = line_split[-4]
354 |                     class_name  = ""
355 |                     for name in line_split[:-4]:
356 |                         class_name += name + " "
357 |                     class_name = class_name[:-1]
358 | 
359 |             bbox = left + " " + top + " " + right + " " + bottom
360 |             if is_difficult:
361 |                 bounding_boxes.append({"class_name":class_name, "bbox":bbox, "used":False, "difficult":True})
362 |                 is_difficult = False
363 |             else:
364 |                 bounding_boxes.append({"class_name":class_name, "bbox":bbox, "used":False})
365 |                 if class_name in gt_counter_per_class:
366 |                     gt_counter_per_class[class_name] += 1
367 |                 else:
368 |                     gt_counter_per_class[class_name] = 1
369 | 
370 |                 if class_name not in already_seen_classes:
371 |                     if class_name in counter_images_per_class:
372 |                         counter_images_per_class[class_name] += 1
373 |                     else:
374 |                         counter_images_per_class[class_name] = 1
375 |                     already_seen_classes.append(class_name)
376 | 
377 |         with open(TEMP_FILES_PATH + "/" + file_id + "_ground_truth.json", 'w') as outfile:
378 |             json.dump(bounding_boxes, outfile)
379 | 
380 |     gt_classes  = list(gt_counter_per_class.keys())
381 |     gt_classes  = sorted(gt_classes)
382 |     n_classes   = len(gt_classes)
383 | 
384 |     dr_files_list = glob.glob(DR_PATH + '/*.txt')
385 |     dr_files_list.sort()
386 |     for class_index, class_name in enumerate(gt_classes):
387 |         bounding_boxes = []
388 |         for txt_file in dr_files_list:
389 |             file_id = txt_file.split(".txt",1)[0]
390 |             file_id = os.path.basename(os.path.normpath(file_id))
391 |             temp_path = os.path.join(GT_PATH, (file_id + ".txt"))
392 |             if class_index == 0:
393 |                 if not os.path.exists(temp_path):
394 |                     error_msg = "Error. File not found: {}\n".format(temp_path)
395 |                     error(error_msg)
396 |             lines = file_lines_to_list(txt_file)
397 |             for line in lines:
398 |                 try:
399 |                     tmp_class_name, confidence, left, top, right, bottom = line.split()
400 |                 except ValueError:
401 |                     line_split      = line.split()
402 |                     bottom          = line_split[-1]
403 |                     right           = line_split[-2]
404 |                     top             = line_split[-3]
405 |                     left            = line_split[-4]
406 |                     confidence      = line_split[-5]
407 |                     tmp_class_name  = ""
408 |                     for name in line_split[:-5]:
409 |                         tmp_class_name += name + " "
410 |                     tmp_class_name  = tmp_class_name[:-1]
411 | 
412 |                 if tmp_class_name == class_name:
413 |                     bbox = left + " " + top + " " + right + " " +bottom
414 |                     bounding_boxes.append({"confidence":confidence, "file_id":file_id, "bbox":bbox})
415 | 
416 |         bounding_boxes.sort(key=lambda x:float(x['confidence']), reverse=True)
417 |         with open(TEMP_FILES_PATH + "/" + class_name + "_dr.json", 'w') as outfile:
418 |             json.dump(bounding_boxes, outfile)
419 | 
420 |     sum_AP = 0.0
421 |     ap_dictionary = {}
422 |     lamr_dictionary = {}
423 |     with open(RESULTS_FILES_PATH + "/results.txt", 'w') as results_file:
424 |         results_file.write("# AP and precision/recall per class\n")
425 |         count_true_positives = {}
426 | 
427 |         for class_index, class_name in enumerate(gt_classes):
428 |             count_true_positives[class_name] = 0
429 |             dr_file = TEMP_FILES_PATH + "/" + class_name + "_dr.json"
430 |             dr_data = json.load(open(dr_file))
431 | 
432 |             nd          = len(dr_data)
433 |             tp          = [0] * nd
434 |             fp          = [0] * nd
435 |             score       = [0] * nd
436 |             score_threhold_idx = 0
437 |             for idx, detection in enumerate(dr_data):
438 |                 file_id     = detection["file_id"]
439 |                 score[idx]  = float(detection["confidence"])
440 |                 if score[idx] >= score_threhold:
441 |                     score_threhold_idx = idx
442 | 
443 |                 if show_animation:
444 |                     ground_truth_img = [os.path.basename(p) for p in glob.glob(os.path.join(glob.escape(IMG_PATH), file_id + ".*"))]
445 |                     if len(ground_truth_img) == 0:
446 |                         error("Error. Image not found with id: " + file_id)
447 |                     elif len(ground_truth_img) > 1:
448 |                         error("Error. Multiple images with id: " + file_id)
449 |                     else:
450 |                         img = cv2.imread(IMG_PATH + "/" + ground_truth_img[0])
451 |                         img_cumulative_path = RESULTS_FILES_PATH + "/images/" + ground_truth_img[0]
452 |                         if os.path.isfile(img_cumulative_path):
453 |                             img_cumulative = cv2.imread(img_cumulative_path)
454 |                         else:
455 |                             img_cumulative = img.copy()
456 |                         bottom_border = 60
457 |                         BLACK = [0, 0, 0]
458 |                         img = cv2.copyMakeBorder(img, 0, bottom_border, 0, 0, cv2.BORDER_CONSTANT, value=BLACK)
459 | 
460 |                 gt_file             = TEMP_FILES_PATH + "/" + file_id + "_ground_truth.json"
461 |                 ground_truth_data   = json.load(open(gt_file))
462 |                 ovmax       = -1
463 |                 gt_match    = -1
464 |                 bb          = [float(x) for x in detection["bbox"].split()]
465 |                 for obj in ground_truth_data:
466 |                     if obj["class_name"] == class_name:
467 |                         bbgt    = [ float(x) for x in obj["bbox"].split() ]
468 |                         bi      = [max(bb[0],bbgt[0]), max(bb[1],bbgt[1]), min(bb[2],bbgt[2]), min(bb[3],bbgt[3])]
469 |                         iw      = bi[2] - bi[0] + 1
470 |                         ih      = bi[3] - bi[1] + 1
471 |                         if iw > 0 and ih > 0:
472 |                             ua = (bb[2] - bb[0] + 1) * (bb[3] - bb[1] + 1) + (bbgt[2] - bbgt[0]
473 |                                             + 1) * (bbgt[3] - bbgt[1] + 1) - iw * ih
474 |                             ov = iw * ih / ua
475 |                             if ov > ovmax:
476 |                                 ovmax = ov
477 |                                 gt_match = obj
478 | 
479 |                 if show_animation:
480 |                     status = "NO MATCH FOUND!" 
481 |                     
482 |                 min_overlap = MINOVERLAP
483 |                 if ovmax >= min_overlap:
484 |                     if "difficult" not in gt_match:
485 |                         if not bool(gt_match["used"]):
486 |                             tp[idx] = 1
487 |                             gt_match["used"] = True
488 |                             count_true_positives[class_name] += 1
489 |                             with open(gt_file, 'w') as f:
490 |                                 f.write(json.dumps(ground_truth_data))
491 |                             if show_animation:
492 |                                 status = "MATCH!"
493 |                         else:
494 |                             fp[idx] = 1
495 |                             if show_animation:
496 |                                 status = "REPEATED MATCH!"
497 |                 else:
498 |                     fp[idx] = 1
499 |                     if ovmax > 0:
500 |                         status = "INSUFFICIENT OVERLAP"
501 | 
502 |                 """
503 |                 Draw image to show animation
504 |                 """
505 |                 if show_animation:
506 |                     height, width = img.shape[:2]
507 |                     white           = (255,255,255)
508 |                     light_blue      = (255,200,100)
509 |                     green           = (0,255,0)
510 |                     light_red       = (30,30,255)
511 |                     margin          = 10
512 |                     # 1st line
513 |                     v_pos           = int(height - margin - (bottom_border / 2.0))
514 |                     text            = "Image: " + ground_truth_img[0] + " "
515 |                     img, line_width = draw_text_in_image(img, text, (margin, v_pos), white, 0)
516 |                     text            = "Class [" + str(class_index) + "/" + str(n_classes) + "]: " + class_name + " "
517 |                     img, line_width = draw_text_in_image(img, text, (margin + line_width, v_pos), light_blue, line_width)
518 |                     if ovmax != -1:
519 |                         color       = light_red
520 |                         if status   == "INSUFFICIENT OVERLAP":
521 |                             text    = "IoU: {0:.2f}% ".format(ovmax*100) + "< {0:.2f}% ".format(min_overlap*100)
522 |                         else:
523 |                             text    = "IoU: {0:.2f}% ".format(ovmax*100) + ">= {0:.2f}% ".format(min_overlap*100)
524 |                             color   = green
525 |                         img, _ = draw_text_in_image(img, text, (margin + line_width, v_pos), color, line_width)
526 |                     # 2nd line
527 |                     v_pos           += int(bottom_border / 2.0)
528 |                     rank_pos        = str(idx+1)
529 |                     text            = "Detection #rank: " + rank_pos + " confidence: {0:.2f}% ".format(float(detection["confidence"])*100)
530 |                     img, line_width = draw_text_in_image(img, text, (margin, v_pos), white, 0)
531 |                     color           = light_red
532 |                     if status == "MATCH!":
533 |                         color = green
534 |                     text            = "Result: " + status + " "
535 |                     img, line_width = draw_text_in_image(img, text, (margin + line_width, v_pos), color, line_width)
536 | 
537 |                     font = cv2.FONT_HERSHEY_SIMPLEX
538 |                     if ovmax > 0: 
539 |                         bbgt = [ int(round(float(x))) for x in gt_match["bbox"].split() ]
540 |                         cv2.rectangle(img,(bbgt[0],bbgt[1]),(bbgt[2],bbgt[3]),light_blue,2)
541 |                         cv2.rectangle(img_cumulative,(bbgt[0],bbgt[1]),(bbgt[2],bbgt[3]),light_blue,2)
542 |                         cv2.putText(img_cumulative, class_name, (bbgt[0],bbgt[1] - 5), font, 0.6, light_blue, 1, cv2.LINE_AA)
543 |                     bb = [int(i) for i in bb]
544 |                     cv2.rectangle(img,(bb[0],bb[1]),(bb[2],bb[3]),color,2)
545 |                     cv2.rectangle(img_cumulative,(bb[0],bb[1]),(bb[2],bb[3]),color,2)
546 |                     cv2.putText(img_cumulative, class_name, (bb[0],bb[1] - 5), font, 0.6, color, 1, cv2.LINE_AA)
547 | 
548 |                     cv2.imshow("Animation", img)
549 |                     cv2.waitKey(20) 
550 |                     output_img_path = RESULTS_FILES_PATH + "/images/detections_one_by_one/" + class_name + "_detection" + str(idx) + ".jpg"
551 |                     cv2.imwrite(output_img_path, img)
552 |                     cv2.imwrite(img_cumulative_path, img_cumulative)
553 | 
554 |             cumsum = 0
555 |             for idx, val in enumerate(fp):
556 |                 fp[idx] += cumsum
557 |                 cumsum += val
558 |                 
559 |             cumsum = 0
560 |             for idx, val in enumerate(tp):
561 |                 tp[idx] += cumsum
562 |                 cumsum += val
563 | 
564 |             rec = tp[:]
565 |             for idx, val in enumerate(tp):
566 |                 rec[idx] = float(tp[idx]) / np.maximum(gt_counter_per_class[class_name], 1)
567 | 
568 |             prec = tp[:]
569 |             for idx, val in enumerate(tp):
570 |                 prec[idx] = float(tp[idx]) / np.maximum((fp[idx] + tp[idx]), 1)
571 | 
572 |             ap, mrec, mprec = voc_ap(rec[:], prec[:])
573 |             F1  = np.array(rec)*np.array(prec)*2 / np.where((np.array(prec)+np.array(rec))==0, 1, (np.array(prec)+np.array(rec)))
574 | 
575 |             sum_AP  += ap
576 |             text    = "{0:.2f}%".format(ap*100) + " = " + class_name + " AP "
577 | 
578 |             if len(prec)>0:
579 |                 F1_text         = "{0:.2f}".format(F1[score_threhold_idx]) + " = " + class_name + " F1 "
580 |                 Recall_text     = "{0:.2f}%".format(rec[score_threhold_idx]*100) + " = " + class_name + " Recall "
581 |                 Precision_text  = "{0:.2f}%".format(prec[score_threhold_idx]*100) + " = " + class_name + " Precision "
582 |             else:
583 |                 F1_text         = "0.00" + " = " + class_name + " F1 " 
584 |                 Recall_text     = "0.00%" + " = " + class_name + " Recall " 
585 |                 Precision_text  = "0.00%" + " = " + class_name + " Precision " 
586 | 
587 |             rounded_prec    = [ '%.2f' % elem for elem in prec ]
588 |             rounded_rec     = [ '%.2f' % elem for elem in rec ]
589 |             results_file.write(text + "\n Precision: " + str(rounded_prec) + "\n Recall: " + str(rounded_rec) + "\n\n")
590 |             
591 |             if len(prec)>0:
592 |                 print(text + "\t||\tscore_threshold=" + str(score_threhold) + " : " + "F1=" + "{0:.2f}".format(F1[score_threhold_idx])\
593 |                     + " ; Recall=" + "{0:.2f}%".format(rec[score_threhold_idx]*100) + " ; Precision=" + "{0:.2f}%".format(prec[score_threhold_idx]*100))
594 |             else:
595 |                 print(text + "\t||\tscore_threshold=" + str(score_threhold) + " : " + "F1=0.00 ; Recall=0.00% ; Precision=0.00%")
596 |             ap_dictionary[class_name] = ap
597 | 
598 |             n_images = counter_images_per_class[class_name]
599 |             lamr, mr, fppi = log_average_miss_rate(np.array(rec), np.array(fp), n_images)
600 |             lamr_dictionary[class_name] = lamr
601 | 
602 |             if draw_plot:
603 |                 plt.plot(rec, prec, '-o')
604 |                 area_under_curve_x = mrec[:-1] + [mrec[-2]] + [mrec[-1]]
605 |                 area_under_curve_y = mprec[:-1] + [0.0] + [mprec[-1]]
606 |                 plt.fill_between(area_under_curve_x, 0, area_under_curve_y, alpha=0.2, edgecolor='r')
607 | 
608 |                 fig = plt.gcf()
609 |                 fig.canvas.manager.set_window_title('AP ' + class_name)
610 | 
611 |                 plt.title('class: ' + text)
612 |                 plt.xlabel('Recall')
613 |                 plt.ylabel('Precision')
614 |                 axes = plt.gca()
615 |                 axes.set_xlim([0.0,1.0])
616 |                 axes.set_ylim([0.0,1.05]) 
617 |                 fig.savefig(RESULTS_FILES_PATH + "/AP/" + class_name + ".png")
618 |                 plt.cla()
619 | 
620 |                 plt.plot(score, F1, "-", color='orangered')
621 |                 plt.title('class: ' + F1_text + "\nscore_threshold=" + str(score_threhold))
622 |                 plt.xlabel('Score_Threshold')
623 |                 plt.ylabel('F1')
624 |                 axes = plt.gca()
625 |                 axes.set_xlim([0.0,1.0])
626 |                 axes.set_ylim([0.0,1.05])
627 |                 fig.savefig(RESULTS_FILES_PATH + "/F1/" + class_name + ".png")
628 |                 plt.cla()
629 | 
630 |                 plt.plot(score, rec, "-H", color='gold')
631 |                 plt.title('class: ' + Recall_text + "\nscore_threshold=" + str(score_threhold))
632 |                 plt.xlabel('Score_Threshold')
633 |                 plt.ylabel('Recall')
634 |                 axes = plt.gca()
635 |                 axes.set_xlim([0.0,1.0])
636 |                 axes.set_ylim([0.0,1.05])
637 |                 fig.savefig(RESULTS_FILES_PATH + "/Recall/" + class_name + ".png")
638 |                 plt.cla()
639 | 
640 |                 plt.plot(score, prec, "-s", color='palevioletred')
641 |                 plt.title('class: ' + Precision_text + "\nscore_threshold=" + str(score_threhold))
642 |                 plt.xlabel('Score_Threshold')
643 |                 plt.ylabel('Precision')
644 |                 axes = plt.gca()
645 |                 axes.set_xlim([0.0,1.0])
646 |                 axes.set_ylim([0.0,1.05])
647 |                 fig.savefig(RESULTS_FILES_PATH + "/Precision/" + class_name + ".png")
648 |                 plt.cla()
649 |                 
650 |         if show_animation:
651 |             cv2.destroyAllWindows()
652 |         if n_classes == 0:
653 |             print("No classes were detected. Please check the label files and whether classes_path in get_map.py has been set correctly.")
654 |             return 0
655 |         results_file.write("\n# mAP of all classes\n")
656 |         mAP     = sum_AP / n_classes
657 |         text    = "mAP = {0:.2f}%".format(mAP*100)
658 |         results_file.write(text + "\n")
659 |         print(text)
660 | 
661 |     shutil.rmtree(TEMP_FILES_PATH)
662 | 
663 |     """
664 |     Count total of detection-results
665 |     """
666 |     det_counter_per_class = {}
667 |     for txt_file in dr_files_list:
668 |         lines_list = file_lines_to_list(txt_file)
669 |         for line in lines_list:
670 |             class_name = " ".join(line.split()[:-5])  # class names may contain spaces; the last five fields are confidence and box
671 |             if class_name in det_counter_per_class:
672 |                 det_counter_per_class[class_name] += 1
673 |             else:
674 |                 det_counter_per_class[class_name] = 1
675 |     dr_classes = list(det_counter_per_class.keys())
676 | 
677 |     """
678 |     Write number of ground-truth objects per class to results.txt
679 |     """
680 |     with open(RESULTS_FILES_PATH + "/results.txt", 'a') as results_file:
681 |         results_file.write("\n# Number of ground-truth objects per class\n")
682 |         for class_name in sorted(gt_counter_per_class):
683 |             results_file.write(class_name + ": " + str(gt_counter_per_class[class_name]) + "\n")
684 | 
685 |     """
686 |     Finish counting true positives
687 |     """
688 |     for class_name in dr_classes:
689 |         if class_name not in gt_classes:
690 |             count_true_positives[class_name] = 0
691 | 
692 |     """
693 |     Write number of detected objects per class to results.txt
694 |     """
695 |     with open(RESULTS_FILES_PATH + "/results.txt", 'a') as results_file:
696 |         results_file.write("\n# Number of detected objects per class\n")
697 |         for class_name in sorted(dr_classes):
698 |             n_det = det_counter_per_class[class_name]
699 |             text = class_name + ": " + str(n_det)
700 |             text += " (tp:" + str(count_true_positives[class_name]) + ""
701 |             text += ", fp:" + str(n_det - count_true_positives[class_name]) + ")\n"
702 |             results_file.write(text)
703 | 
704 |     """
705 |     Plot the total number of occurrences of each class in the ground-truth
706 |     """
707 |     if draw_plot:
708 |         window_title = "ground-truth-info"
709 |         plot_title = "ground-truth\n"
710 |         plot_title += "(" + str(len(ground_truth_files_list)) + " files and " + str(n_classes) + " classes)"
711 |         x_label = "Number of objects per class"
712 |         output_path = RESULTS_FILES_PATH + "/ground-truth-info.png"
713 |         to_show = False
714 |         plot_color = 'forestgreen'
715 |         draw_plot_func(
716 |             gt_counter_per_class,
717 |             n_classes,
718 |             window_title,
719 |             plot_title,
720 |             x_label,
721 |             output_path,
722 |             to_show,
723 |             plot_color,
724 |             '',
725 |             )
726 | 
727 |     # """
728 |     # Plot the total number of occurrences of each class in the "detection-results" folder
729 |     # """
730 |     # if draw_plot:
731 |     #     window_title = "detection-results-info"
732 |     #     # Plot title
733 |     #     plot_title = "detection-results\n"
734 |     #     plot_title += "(" + str(len(dr_files_list)) + " files and "
735 |     #     count_non_zero_values_in_dictionary = sum(int(x) > 0 for x in list(det_counter_per_class.values()))
736 |     #     plot_title += str(count_non_zero_values_in_dictionary) + " detected classes)"
737 |     #     # end Plot title
738 |     #     x_label = "Number of objects per class"
739 |     #     output_path = RESULTS_FILES_PATH + "/detection-results-info.png"
740 |     #     to_show = False
741 |     #     plot_color = 'forestgreen'
742 |     #     true_p_bar = count_true_positives
743 |     #     draw_plot_func(
744 |     #         det_counter_per_class,
745 |     #         len(det_counter_per_class),
746 |     #         window_title,
747 |     #         plot_title,
748 |     #         x_label,
749 |     #         output_path,
750 |     #         to_show,
751 |     #         plot_color,
752 |     #         true_p_bar
753 |     #         )
754 | 
755 |     """
756 |     Draw log-average miss rate plot (Show lamr of all classes in decreasing order)
757 |     """
758 |     if draw_plot:
759 |         window_title = "lamr"
760 |         plot_title = "log-average miss rate"
761 |         x_label = "log-average miss rate"
762 |         output_path = RESULTS_FILES_PATH + "/lamr.png"
763 |         to_show = False
764 |         plot_color = 'royalblue'
765 |         draw_plot_func(
766 |             lamr_dictionary,
767 |             n_classes,
768 |             window_title,
769 |             plot_title,
770 |             x_label,
771 |             output_path,
772 |             to_show,
773 |             plot_color,
774 |             ""
775 |             )
776 | 
777 |     """
778 |     Draw mAP plot (Show AP's of all classes in decreasing order)
779 |     """
780 |     if draw_plot:
781 |         window_title = "mAP"
782 |         plot_title = "mAP = {0:.2f}%".format(mAP*100)
783 |         x_label = "Average Precision"
784 |         output_path = RESULTS_FILES_PATH + "/mAP.png"
785 |         to_show = True
786 |         plot_color = 'royalblue'
787 |         draw_plot_func(
788 |             ap_dictionary,
789 |             n_classes,
790 |             window_title,
791 |             plot_title,
792 |             x_label,
793 |             output_path,
794 |             to_show,
795 |             plot_color,
796 |             ""
797 |             )
798 |     return mAP
799 | 
800 | def preprocess_gt(gt_path, class_names):
801 |     image_ids   = os.listdir(gt_path)
802 |     results = {}
803 | 
804 |     images = []
805 |     bboxes = []
806 |     for i, image_id in enumerate(image_ids):
807 |         lines_list      = file_lines_to_list(os.path.join(gt_path, image_id))
808 |         boxes_per_image = []
809 |         image           = {}
810 |         image_id        = os.path.splitext(image_id)[0]
811 |         image['file_name'] = image_id + '.jpg'
812 |         image['width']     = 1
813 |         image['height']    = 1
814 |         #-----------------------------------------------------------------#
815 |         #   Thanks to 多学学英语吧 for the reminder
816 |         #   This fixes the 'Results do not correspond to current coco set' error
817 |         #-----------------------------------------------------------------#
818 |         image['id']        = str(image_id)
819 | 
820 |         for line in lines_list:
821 |             difficult = 0 
822 |             if "difficult" in line:
823 |                 line_split  = line.split()
824 |                 left, top, right, bottom, _difficult = line_split[-5:]
825 |                 class_name  = ""
826 |                 for name in line_split[:-5]:
827 |                     class_name += name + " "
828 |                 class_name  = class_name[:-1]
829 |                 difficult = 1
830 |             else:
831 |                 line_split  = line.split()
832 |                 left, top, right, bottom = line_split[-4:]
833 |                 class_name  = ""
834 |                 for name in line_split[:-4]:
835 |                     class_name += name + " "
836 |                 class_name = class_name[:-1]
837 |             
838 |             left, top, right, bottom = float(left), float(top), float(right), float(bottom)
839 |             if class_name not in class_names:
840 |                 continue
841 |             cls_id  = class_names.index(class_name) + 1
842 |             bbox    = [left, top, right - left, bottom - top, difficult, str(image_id), cls_id, (right - left) * (bottom - top) - 10.0]
843 |             boxes_per_image.append(bbox)
844 |         images.append(image)
845 |         bboxes.extend(boxes_per_image)
846 |     results['images']        = images
847 | 
848 |     categories = []
849 |     for i, cls in enumerate(class_names):
850 |         category = {}
851 |         category['supercategory']   = cls
852 |         category['name']            = cls
853 |         category['id']              = i + 1
854 |         categories.append(category)
855 |     results['categories']   = categories
856 | 
857 |     annotations = []
858 |     for i, box in enumerate(bboxes):
859 |         annotation = {}
860 |         annotation['area']        = box[-1]
861 |         annotation['category_id'] = box[-2]
862 |         annotation['image_id']    = box[-3]
863 |         annotation['iscrowd']     = box[-4]  # VOC "difficult" flag; COCOeval treats iscrowd boxes as ignore regions
864 |         annotation['bbox']        = box[:4]
865 |         annotation['id']          = i
866 |         annotations.append(annotation)
867 |     results['annotations'] = annotations
868 |     return results
869 | 
870 | def preprocess_dr(dr_path, class_names):
871 |     image_ids = os.listdir(dr_path)
872 |     results = []
873 |     for image_id in image_ids:
874 |         lines_list      = file_lines_to_list(os.path.join(dr_path, image_id))
875 |         image_id        = os.path.splitext(image_id)[0]
876 |         for line in lines_list:
877 |             line_split  = line.split()
878 |             confidence, left, top, right, bottom = line_split[-5:]
879 |             class_name  = ""
880 |             for name in line_split[:-5]:
881 |                 class_name += name + " "
882 |             class_name  = class_name[:-1]
883 |             left, top, right, bottom = float(left), float(top), float(right), float(bottom)
884 |             result                  = {}
885 |             result["image_id"]      = str(image_id)
886 |             if class_name not in class_names:
887 |                 continue
888 |             result["category_id"]   = class_names.index(class_name) + 1
889 |             result["bbox"]          = [left, top, right - left, bottom - top]
890 |             result["score"]         = float(confidence)
891 |             results.append(result)
892 |     return results
893 |  
894 | def get_coco_map(class_names, path):
895 |     GT_PATH     = os.path.join(path, 'ground-truth')
896 |     DR_PATH     = os.path.join(path, 'detection-results')
897 |     COCO_PATH   = os.path.join(path, 'coco_eval')
898 | 
899 |     if not os.path.exists(COCO_PATH):
900 |         os.makedirs(COCO_PATH)
901 | 
902 |     GT_JSON_PATH = os.path.join(COCO_PATH, 'instances_gt.json')
903 |     DR_JSON_PATH = os.path.join(COCO_PATH, 'instances_dr.json')
904 | 
905 |     with open(GT_JSON_PATH, "w") as f:
906 |         results_gt  = preprocess_gt(GT_PATH, class_names)
907 |         json.dump(results_gt, f, indent=4)
908 | 
909 |     with open(DR_JSON_PATH, "w") as f:
910 |         results_dr  = preprocess_dr(DR_PATH, class_names)
911 |         json.dump(results_dr, f, indent=4)
912 |         if len(results_dr) == 0:
913 |             print("No objects were detected.")
914 |             return [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
915 | 
916 |     cocoGt      = COCO(GT_JSON_PATH)
917 |     cocoDt      = cocoGt.loadRes(DR_JSON_PATH)
918 |     cocoEval    = COCOeval(cocoGt, cocoDt, 'bbox') 
919 |     cocoEval.evaluate()
920 |     cocoEval.accumulate()
921 |     cocoEval.summarize()
922 | 
923 |     return cocoEval.stats
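The per-detection matching in get_map above (IoU with the inclusive +1 pixel convention, detections visited in descending confidence, each ground-truth box claimed at most once, boxes flagged "difficult" counted as neither TP nor FP) can be condensed into a small standalone sketch. The names `voc_iou` and `match_detections` and the dictionary layout of the inputs are hypothetical and not part of the repository; the sketch keeps ground truth in memory instead of rewriting the temporary `_ground_truth.json` files.

```python
from typing import Dict, List, Sequence, Tuple

Box = Sequence[float]  # (left, top, right, bottom) in pixel coordinates


def voc_iou(bb: Box, bbgt: Box) -> float:
    """IoU using the inclusive-pixel (+1) convention of the VOC devkit."""
    iw = min(bb[2], bbgt[2]) - max(bb[0], bbgt[0]) + 1
    ih = min(bb[3], bbgt[3]) - max(bb[1], bbgt[1]) + 1
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = ((bb[2] - bb[0] + 1) * (bb[3] - bb[1] + 1)
             + (bbgt[2] - bbgt[0] + 1) * (bbgt[3] - bbgt[1] + 1) - inter)
    return inter / union


def match_detections(
    detections: List[Tuple[float, str, Box]],  # hypothetical layout: (confidence, image_id, box)
    ground_truth: Dict[str, List[dict]],       # image_id -> [{"bbox": box, "difficult": bool}]
    min_overlap: float = 0.5,
) -> Tuple[List[int], List[int]]:
    """Greedy matching for one class; returns per-detection TP and FP flags."""
    used = {img: [False] * len(objs) for img, objs in ground_truth.items()}
    tp: List[int] = []
    fp: List[int] = []
    # Highest-confidence detections get the first chance to claim a GT box
    for _, img, box in sorted(detections, key=lambda d: d[0], reverse=True):
        objs = ground_truth.get(img, [])
        best_iou, best_j = -1.0, -1
        for j, obj in enumerate(objs):
            iou = voc_iou(box, obj["bbox"])
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_iou < min_overlap:
            t, f = 0, 1                      # no GT box overlaps enough
        elif objs[best_j].get("difficult", False):
            t, f = 0, 0                      # "difficult" GT: ignored entirely
        elif not used[img][best_j]:
            used[img][best_j] = True
            t, f = 1, 0                      # first match claims the GT box
        else:
            t, f = 0, 1                      # GT box already claimed: duplicate
        tp.append(t)
        fp.append(f)
    return tp, fp
```

Returning plain 0/1 lists keeps the output in the same shape as the `tp` and `fp` arrays that get_map feeds into its cumulative-sum step.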

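The step from per-detection TP/FP flags to AP and F1 (the cumulative sums and the `rec`, `prec`, `voc_ap` and `F1` lines in get_map) can be written with NumPy as follows. `voc_ap` itself is defined earlier in utils_map.py and is not shown in this excerpt, so `all_point_ap` below is a sketch of the standard VOC2010-style all-point interpolation rather than a copy of that function; the helper names are hypothetical.

```python
from typing import List, Tuple

import numpy as np


def precision_recall(tp: List[int], fp: List[int], n_gt: int) -> Tuple[np.ndarray, np.ndarray]:
    """Cumulative precision/recall over detections sorted by descending confidence."""
    tp_cum = np.cumsum(tp, dtype=float)
    fp_cum = np.cumsum(fp, dtype=float)
    recall = tp_cum / max(n_gt, 1)
    precision = tp_cum / np.maximum(tp_cum + fp_cum, 1)
    return precision, recall


def all_point_ap(precision: np.ndarray, recall: np.ndarray) -> float:
    """Area under the precision envelope (all-point interpolation)."""
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    # Envelope: make precision non-increasing when read from right to left
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]
    # Add one rectangle for every point where recall changes
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))


def f1(precision: np.ndarray, recall: np.ndarray) -> np.ndarray:
    """Element-wise F1, defined as 0 where precision + recall == 0."""
    denom = precision + recall
    safe = np.where(denom == 0, 1.0, denom)
    return np.where(denom == 0, 0.0, 2 * precision * recall / safe)
```

In get_map, the F1, recall and precision that are reported come from index `score_threhold_idx`, i.e. the last detection (in descending-confidence order) whose confidence is still at or above `score_threhold`.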

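`preprocess_gt` and `preprocess_dr` convert the VOC-style text lines (class name, an optional confidence, then `left top right bottom`) into COCO JSON, where boxes are `[x, y, width, height]` and category ids start at 1. Below is a compact sketch of the detection-side conversion. It uses `str.rsplit` with `maxsplit` to peel the numeric fields off the right so that class names containing spaces survive. The function names are hypothetical. Unlike the token-joining loop in the file, `rsplit` keeps the class name's original internal whitespace.

```python
from typing import List, Optional, Tuple


def parse_detection_line(line: str) -> Tuple[str, float, List[float]]:
    """Split '<class name> <conf> <left> <top> <right> <bottom>' from the right."""
    class_name, conf, left, top, right, bottom = line.rsplit(maxsplit=5)
    return class_name, float(conf), [float(left), float(top), float(right), float(bottom)]


def to_coco_result(line: str, image_id: str, class_names: List[str]) -> Optional[dict]:
    """One COCO-style detection dict, or None for classes outside class_names."""
    class_name, score, (left, top, right, bottom) = parse_detection_line(line)
    if class_name not in class_names:
        return None
    return {
        "image_id": image_id,                                 # string id, as in preprocess_dr
        "category_id": class_names.index(class_name) + 1,     # COCO ids are 1-based here
        "bbox": [left, top, right - left, bottom - top],      # corners -> [x, y, w, h]
        "score": score,
    }
```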
--------------------------------------------------------------------------------
/voc_annotation.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | import random
  3 | import xml.etree.ElementTree as ET
  4 | 
  5 | import numpy as np
  6 | 
  7 | from utils.utils import get_classes
  8 | 
  9 | #--------------------------------------------------------------------------------------------------------------------------------#
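 10 | #   annotation_mode specifies what this script computes when it runs
 11 | #   annotation_mode = 0: the full labeling pipeline, i.e. generate the txt files in VOCdevkit/VOC2007/ImageSets as well as 2007_train.txt and 2007_val.txt for training
 12 | #   annotation_mode = 1: only generate the txt files in VOCdevkit/VOC2007/ImageSets
 13 | #   annotation_mode = 2: only generate 2007_train.txt and 2007_val.txt for training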
 14 | #--------------------------------------------------------------------------------------------------------------------------------#
 15 | annotation_mode     = 0
 16 | #-------------------------------------------------------------------#
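 17 | #   Must be changed: used to write the object information into 2007_train.txt and 2007_val.txt
 18 | #   It only needs to match the classes_path used for training and prediction
 19 | #   If the generated 2007_train.txt contains no object information,
 20 | #   the classes have not been set correctly
 21 | #   Only takes effect when annotation_mode is 0 or 2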
 22 | #-------------------------------------------------------------------#
 23 | classes_path        = 'model_data/voc_classes.txt'
 24 | #--------------------------------------------------------------------------------------------------------------------------------#
 25 | #   trainval_percent sets the ratio of (train + val) to test; by default (train + val) : test = 9 : 1
 26 | #   train_percent sets the ratio of train to val within (train + val); by default train : val = 9 : 1
 27 | #   Only takes effect when annotation_mode is 0 or 1
 28 | #--------------------------------------------------------------------------------------------------------------------------------#
 29 | trainval_percent    = 0.9
 30 | train_percent       = 0.9
 31 | #-------------------------------------------------------#
 32 | #   Points to the folder that contains the VOC dataset
 33 | #   By default, the VOC dataset in the repository root
 34 | #-------------------------------------------------------#
 35 | VOCdevkit_path  = 'VOCdevkit'
 36 | 
 37 | VOCdevkit_sets  = [('2007', 'train'), ('2007', 'val')]
 38 | classes, _      = get_classes(classes_path)
 39 | 
 40 | #-------------------------------------------------------#
 41 | #   Counters: number of images per split and number of objects per class
 42 | #-------------------------------------------------------#
 43 | photo_nums  = np.zeros(len(VOCdevkit_sets))
 44 | nums        = np.zeros(len(classes))
 45 | def convert_annotation(year, image_id, list_file):
 46 |     in_file = open(os.path.join(VOCdevkit_path, 'VOC%s/Annotations/%s.xml'%(year, image_id)), encoding='utf-8')
 47 |     tree=ET.parse(in_file)
 48 |     root = tree.getroot()
 49 | 
 50 |     for obj in root.iter('object'):
 51 |         difficult = 0 
 52 |         if obj.find('difficult') is not None:
 53 |             difficult = obj.find('difficult').text
 54 |         cls = obj.find('name').text
 55 |         if cls not in classes or int(difficult)==1:
 56 |             continue
 57 |         cls_id = classes.index(cls)
 58 |         xmlbox = obj.find('bndbox')
 59 |         b = (int(float(xmlbox.find('xmin').text)), int(float(xmlbox.find('ymin').text)), int(float(xmlbox.find('xmax').text)), int(float(xmlbox.find('ymax').text)))
 60 |         list_file.write(" " + ",".join([str(a) for a in b]) + ',' + str(cls_id))
 61 |         
 62 |         nums[cls_id] += 1
 63 |         
 64 | if __name__ == "__main__":
 65 |     random.seed(0)
 66 |     if " " in os.path.abspath(VOCdevkit_path):
 67 |         raise ValueError("The dataset folder path and image file names must not contain spaces, otherwise model training will not work properly. Please rename them.")
 68 | 
 69 |     if annotation_mode == 0 or annotation_mode == 1:
 70 |         print("Generate txt in ImageSets.")
 71 |         xmlfilepath     = os.path.join(VOCdevkit_path, 'VOC2007/Annotations')
 72 |         saveBasePath    = os.path.join(VOCdevkit_path, 'VOC2007/ImageSets/Main')
 73 |         temp_xml        = os.listdir(xmlfilepath)
 74 |         total_xml       = []
 75 |         for xml in temp_xml:
 76 |             if xml.endswith(".xml"):
 77 |                 total_xml.append(xml)
 78 | 
 79 |         num     = len(total_xml)  
 80 |         list    = range(num)  
 81 |         tv      = int(num*trainval_percent)  
 82 |         tr      = int(tv*train_percent)  
 83 |         trainval= random.sample(list,tv)  
 84 |         train   = random.sample(trainval,tr)  
 85 |         
 86 |         print("train and val size",tv)
 87 |         print("train size",tr)
 88 |         ftrainval   = open(os.path.join(saveBasePath,'trainval.txt'), 'w')  
 89 |         ftest       = open(os.path.join(saveBasePath,'test.txt'), 'w')  
 90 |         ftrain      = open(os.path.join(saveBasePath,'train.txt'), 'w')  
 91 |         fval        = open(os.path.join(saveBasePath,'val.txt'), 'w')  
 92 |         
 93 |         for i in list:  
 94 |             name=total_xml[i][:-4]+'\n'  
 95 |             if i in trainval:  
 96 |                 ftrainval.write(name)  
 97 |                 if i in train:  
 98 |                     ftrain.write(name)  
 99 |                 else:  
100 |                     fval.write(name)  
101 |             else:  
102 |                 ftest.write(name)  
103 |         
104 |         ftrainval.close()  
105 |         ftrain.close()  
106 |         fval.close()  
107 |         ftest.close()
108 |         print("Generate txt in ImageSets done.")
109 | 
110 |     if annotation_mode == 0 or annotation_mode == 2:
111 |         print("Generate 2007_train.txt and 2007_val.txt for train.")
112 |         type_index = 0
113 |         for year, image_set in VOCdevkit_sets:
114 |             image_ids = open(os.path.join(VOCdevkit_path, 'VOC%s/ImageSets/Main/%s.txt'%(year, image_set)), encoding='utf-8').read().strip().split()
115 |             list_file = open('%s_%s.txt'%(year, image_set), 'w', encoding='utf-8')
116 |             for image_id in image_ids:
117 |                 list_file.write('%s/VOC%s/JPEGImages/%s.jpg'%(os.path.abspath(VOCdevkit_path), year, image_id))
118 | 
119 |                 convert_annotation(year, image_id, list_file)
120 |                 list_file.write('\n')
121 |             photo_nums[type_index] = len(image_ids)
122 |             type_index += 1
123 |             list_file.close()
124 |         print("Generate 2007_train.txt and 2007_val.txt for train done.")
125 |         
126 |         def printTable(List1, List2):
127 |             for i in range(len(List1[0])):
128 |                 print("|", end=' ')
129 |                 for j in range(len(List1)):
130 |                     print(List1[j][i].rjust(int(List2[j])), end=' ')
131 |                     print("|", end=' ')
132 |                 print()
133 | 
134 |         str_nums = [str(int(x)) for x in nums]
135 |         tableData = [
136 |             classes, str_nums
137 |         ]
138 |         colWidths = [0]*len(tableData)
139 |         len1 = 0
140 |         for i in range(len(tableData)):
141 |             for j in range(len(tableData[i])):
142 |                 if len(tableData[i][j]) > colWidths[i]:
143 |                     colWidths[i] = len(tableData[i][j])
144 |         printTable(tableData, colWidths)
145 | 
146 |         if photo_nums[0] <= 500:
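147 |             print("The training set has 500 images or fewer, which is a small dataset. Set a larger number of training epochs (Epoch) so that there are enough gradient-descent steps (Step).")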
148 | 
149 |         if np.sum(nums) == 0:
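150 |             print("No objects were found in the dataset. Make sure classes_path matches your own dataset and that the label names are correct, otherwise training will have no effect!")
151 |             print("No objects were found in the dataset. Make sure classes_path matches your own dataset and that the label names are correct, otherwise training will have no effect!")
152 |             print("No objects were found in the dataset. Make sure classes_path matches your own dataset and that the label names are correct, otherwise training will have no effect!")
153 |             print("(Important things are worth saying three times.)")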
154 | 
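As a worked example of the two ratios in the configuration comments above, the hypothetical helper below repeats only the size arithmetic of the `__main__` split (the same `int(...)` truncation); it reads and writes no files. With the default 0.9/0.9 settings and 1000 annotated images it gives 810 training, 90 validation and 100 test images.

```python
def split_sizes(num, trainval_percent=0.9, train_percent=0.9):
    """Return (train, val, test) image counts, mirroring voc_annotation.py's truncation."""
    tv = int(num * trainval_percent)  # size of train + val
    tr = int(tv * train_percent)      # size of train inside train + val
    # val is the remainder of train + val; test is the remainder of all images
    return tr, tv - tr, num - tv


print(split_sizes(1000))  # (810, 90, 100)
```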


--------------------------------------------------------------------------------
/常见问题汇总.md:
--------------------------------------------------------------------------------
  1 | The blog version of this FAQ is at [https://blog.csdn.net/weixin_44791964/article/details/107517428](https://blog.csdn.net/weixin_44791964/article/details/107517428).
  2 | 
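  3 | # FAQ
  4 | ## 1. Download issues
  5 | ### a. Downloading the code
  6 | **Q: Could you send me the code? Where do I download it? 
  7 | A: The GitHub link is in the video description. Copy it and you can download the code from there.**
  8 | 
  9 | **Q: Why does the code I downloaded say the archive is corrupted?
 10 | A: Download it from GitHub again.**
 11 | 
 12 | **Q: Why is the code I downloaded different from the code in your videos and blog posts?
 13 | A: I update the code often; the actual code in the repository is what counts.**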
 14 | 
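 15 | ### b. Downloading weights
 16 | **Q: Why is there no .pth or .h5 file under model_data in the code I downloaded? 
 17 | A: I usually upload the weights to GitHub and Baidu Netdisk; you can find the links in the GitHub README.**
 18 | 
 19 | ### c. Downloading datasets
 20 | **Q: Where can I download dataset XXXX?
 21 | A: I usually put dataset download links in the README, and almost all of them are there. If one is missing, let me know promptly by opening a GitHub issue and I will add it**.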
 22 | 
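 23 | ## 2. Environment setup issues
 24 | ### a. Environment setup for 20-series and older GPUs
 25 | **The PyTorch code targets PyTorch 1.2; the corresponding blog post is** [https://blog.csdn.net/weixin_44791964/article/details/106037141](https://blog.csdn.net/weixin_44791964/article/details/106037141).
 26 | 
 27 | **The Keras code targets TensorFlow 1.13.2 and Keras 2.1.5; the corresponding blog post is** [https://blog.csdn.net/weixin_44791964/article/details/104702142](https://blog.csdn.net/weixin_44791964/article/details/104702142).
 28 | 
 29 | **The tf2 code targets TensorFlow 2.2.0 and does not need Keras installed; the corresponding blog post is** [https://blog.csdn.net/weixin_44791964/article/details/109161493](https://blog.csdn.net/weixin_44791964/article/details/109161493).
 30 | 
 31 | **Q: Will your code work with version X of TensorFlow or PyTorch?
 32 | A: It is best to use the configuration I recommend; there are setup tutorials for it! I have not tried other versions. Problems may come up, but they are usually minor and only need small code changes.**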
 33 | 
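 34 | ### b. Environment setup for 30-series GPUs
 35 | Because of framework updates, the setup tutorials above cannot be used with 30-series GPUs.
 36 | The 30-series configurations I have tested and confirmed to work are:
 37 | **PyTorch code: PyTorch 1.7.0, CUDA 11.0, cuDNN 8.0.5; the corresponding blog post is** [https://blog.csdn.net/weixin_44791964/article/details/120668551](https://blog.csdn.net/weixin_44791964/article/details/120668551).
 38 | 
 39 | **The Keras code cannot be set up with CUDA 11 on Windows 10. On Ubuntu, search online for instructions and use TensorFlow 1.15.4 with Keras 2.1.5 or 2.3.1 (a few function interfaces differ, so the code may still need small adjustments).**
 40 | 
 41 | **tf2 code: TensorFlow 2.4.0, CUDA 11.0, cuDNN 8.0.5; the corresponding blog post is** [https://blog.csdn.net/weixin_44791964/article/details/120657664](https://blog.csdn.net/weixin_44791964/article/details/120657664).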
 42 | 
 43 | ### c. CPU environment setup
 44 | **The PyTorch code targets pytorch-cpu 1.2; the corresponding blog post is** [https://blog.csdn.net/weixin_44791964/article/details/120655098](https://blog.csdn.net/weixin_44791964/article/details/120655098).
 45 | 
 46 | **The Keras code targets tensorflow-cpu 1.13.2 and Keras 2.1.5; the corresponding blog post is** [https://blog.csdn.net/weixin_44791964/article/details/120653717](https://blog.csdn.net/weixin_44791964/article/details/120653717).
 47 | 
 48 | **The tf2 code targets tensorflow-cpu 2.2.0 and does not need Keras installed; the corresponding blog post is** [https://blog.csdn.net/weixin_44791964/article/details/120656291](https://blog.csdn.net/weixin_44791964/article/details/120656291).
 49 | 
 50 | 
 51 | ### d. GPU usage and environment usage issues
 52 | **Q: Why isn't training using the GPU even though I installed tensorflow-gpu?
 53 | A: Make sure tensorflow-gpu is installed properly: check the TensorFlow version with pip list, then use Task Manager or the NVIDIA command-line tool to see whether the GPU is being used for training. In Task Manager, look at the GPU memory usage.**
 54 | 
 55 | **Q: I don't seem to be training on the GPU. How can I tell whether the GPU is being used?
 56 | A: The usual way is NVIDIA's command-line tool: on Windows, open cmd and run nvidia-smi to see GPU utilization.**
 57 | ![nvidia-smi output](https://img-blog.csdnimg.cn/f88ef794c9a341918f000eb2b1c67af6.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAQnViYmxpaWlpbmc=,size_20,color_FFFFFF,t_70,g_se,x_16)
 58 | **If you insist on using Task Manager, check under Performance whether the GPU memory is being used, or look at the GPU's Cuda graph rather than Copy.**
 59 | ![Task Manager GPU view](https://img-blog.csdnimg.cn/20201013234241524.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70#pic_center)
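A minimal sketch for checking the same thing from Python, assuming the PyTorch code: `torch.cuda.is_available()` and `torch.cuda.get_device_name()` are standard PyTorch calls, while the helper names `gpu_status` and `nvidia_smi` are hypothetical. For TensorFlow 2.x, `tf.config.list_physical_devices('GPU')` plays a similar role.

```python
import importlib.util
import shutil
import subprocess


def gpu_status():
    """Describe whether PyTorch can see a CUDA GPU; returns a string and never raises."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in the active environment"
    import torch
    if not torch.cuda.is_available():
        return "PyTorch is installed but CUDA is not available (CPU only)"
    return "Using GPU 0: " + torch.cuda.get_device_name(0)


def nvidia_smi():
    """Return the text printed by nvidia-smi, or None if the tool is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return result.stdout


print(gpu_status())
```

If `gpu_status()` reports that PyTorch is not installed even though you installed it, you are most likely running a different environment than the one you installed into.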
 60 | 
 61 | ### e. DLL load failed: The specified module could not be found
 62 | **Q: I get the following error**
 63 | ```python
 64 | Traceback (most recent call last):
 65 |   File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in <module>
 66 |     from tensorflow.python.pywrap_tensorflow_internal import *
 67 |   File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 28, in <module>
 68 |     pywrap_tensorflow_internal = swig_import_helper()
 69 |   File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper
 70 |     _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
 71 |   File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\imp.py", line 243, in load_module
    |     return load_dynamic(name, filename, file)
 72 |   File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\imp.py", line 343, in load_dynamic
 73 |     return _load(spec)
 74 | ImportError: DLL load failed: The specified module could not be found.
 75 | ```
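 76 | **A: If you haven't rebooted, reboot first; otherwise reinstall following the steps. If that still doesn't fix it, send me your GPU model and your CUDA, cuDNN, TF and PyTorch versions in a private message.**
 77 | 
 78 | ### f. "No module named" errors (No module named 'utils.utils', No module named 'matplotlib')
 79 | **Q: Why does it say No module named 'utils.utils' (No module named 'nets.yolo', No module named 'nets.ssd', and similar errors)?
 80 | A: utils does not need to be installed with pip; it is in the root directory of the repository I uploaded. The error appears because you are running from the wrong root directory. Look up relative paths and the root (working) directory and it will make sense.**
 81 | 
 82 | **Q: Why does it say No module named 'matplotlib' (No module named 'PIL', No module named 'cv2', etc.)?
 83 | A: That library is not installed; open a command line and install it, e.g. pip install matplotlib**
 84 | 
 85 | **Q: I already installed opencv (pillow, matplotlib, etc.) with pip, so why does it still say No module named 'cv2'?
 86 | A: You installed it without activating the environment. Activate the corresponding conda environment and install it there.**
 87 | 
 88 | **Q: Why does it say No module named 'torch'?
 89 | A: Honestly, I would really like to know how this keeps happening... it means PyTorch is not installed. There are usually two cases: it really is not installed, or it was installed into another environment and the currently active environment is not the one you installed it in.**
 90 | 
 91 | **Q: Why does it say No module named 'tensorflow'?
 92 | A: Same as above.**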
 93 | 
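 94 | ### g. CUDA installation failures
 95 | Visual Studio usually has to be installed before CUDA; the 2017 version is enough.
 96 | 
 97 | ### h. Ubuntu
 98 | **All the code works on Ubuntu; I have tried both operating systems.**
 99 | 
100 | ### i. Errors reported by VSCode
101 | **Q: Why does VSCode report a huge number of errors?
102 | A: It reports lots of errors for me too, but they don't affect anything; it is a VSCode issue. If you don't want to see them, use PyCharm.
103 | Ideally, set Python: Language Server to Pylance in the settings.**
104 | 
105 | ### j. Training and predicting on the CPU
106 | **For the Keras and tf2 code, to train and predict on the CPU just install the CPU version of TensorFlow.**
107 | 
108 | **For the PyTorch code, to train and predict on the CPU change cuda=True to cuda=False.**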
109 | 
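110 | ### k. tqdm has no attribute 'pos'
111 | **Q: Running the code gives 'tqdm' object has no attribute 'pos'.
112 | A: Reinstall tqdm with a different version.**
113 | 
114 | ### l. decode("utf-8") errors
115 | **Because of an h5py update, installation automatically pulls in h5py 3.0.0 or later, which causes the decode("utf-8") error!
116 | After installing TensorFlow, be sure to install h5py 2.10.0 with the following command!**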
117 | ```
118 | pip install h5py==2.10.0
119 | ```
120 | 
121 | ### m. TypeError: __array__() takes 1 positional argument but 2 were given
122 | This can be fixed by changing the Pillow version.
123 | ```
124 | pip install pillow==8.2.0
125 | ```
126 | ### n. How to check the current CUDA and cuDNN versions
127 | **To check the CUDA version on Windows:
128 | 1. Open a cmd window.
129 | 2. Run nvcc -V.
130 | 3. In the line Cuda compilation tools, release XXXXXXXX, the XXXXXXXX part is the CUDA version.**
131 | ![nvcc -V output](https://img-blog.csdnimg.cn/0389ea35107a408a80ab5cb6590d5a74.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAQnViYmxpaWlpbmc=,size_20,color_FFFFFF,t_70,g_se,x_16)
132 | To check the cuDNN version on Windows:
133 | 1. Go to the CUDA installation directory and open the include folder.
134 | 2. Find the cudnn.h file.
135 | 3. Open it with a text editor and scroll down; the cuDNN version is given by the #define lines.
136 | ```c
137 | #define CUDNN_MAJOR 7
138 | #define CUDNN_MINOR 4
139 | #define CUDNN_PATCHLEVEL 1
140 | ```
141 | This means cuDNN 7.4.1.
142 | ![cudnn.h version defines](https://img-blog.csdnimg.cn/7a86b68b17c84feaa6fa95780d4ae4b4.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAQnViYmxpaWlpbmc=,size_20,color_FFFFFF,t_70,g_se,x_16)
143 | ![cudnn.h version defines](https://img-blog.csdnimg.cn/81bb7c3e13cc492292530e4b69df86a9.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAQnViYmxpaWlpbmc=,size_20,color_FFFFFF,t_70,g_se,x_16)
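The same `#define` lines can also be read programmatically. This sketch (the helper name is hypothetical) extracts the version string from the header text. As far as I know, cuDNN 8 and later keep these defines in `cudnn_version.h` rather than `cudnn.h`, so point it at whichever file contains them. From inside PyTorch, `torch.version.cuda` and `torch.backends.cudnn.version()` report the versions PyTorch itself was built with, which may differ from the system installation.

```python
import re


def parse_cudnn_version(header_text):
    """Return 'MAJOR.MINOR.PATCHLEVEL' from cuDNN #define lines, or None if they are absent."""
    parts = {}
    for key in ("MAJOR", "MINOR", "PATCHLEVEL"):
        # Require '#define' right before the name so derived macros are not matched
        match = re.search(r"#define\s+CUDNN_%s\s+(\d+)" % key, header_text)
        if match is None:
            return None  # this header does not carry the version defines
        parts[key] = match.group(1)
    return "%s.%s.%s" % (parts["MAJOR"], parts["MINOR"], parts["PATCHLEVEL"])


sample = """#define CUDNN_MAJOR 7
#define CUDNN_MINOR 4
#define CUDNN_PATCHLEVEL 1"""
print(parse_cudnn_version(sample))  # 7.4.1
```

To use it on a real installation, read the header with `open(path, encoding="utf-8", errors="ignore").read()` and pass the text in.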
144 | 
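145 | ### o. Why doesn't it work even after following your environment setup?
146 | **Q: Why doesn't it work even though I followed your environment setup?
147 | A: Send me your GPU model and your CUDA, cuDNN, TF and PyTorch versions in a private message on Bilibili.**
148 | 
149 | ### p. Other issues
150 | **Q: Why do I get TypeError: cat() got an unexpected keyword argument 'axis', Traceback (most recent call last), AttributeError: 'Tensor' object has no attribute 'bool'?
151 | A: These are version problems; use PyTorch 1.2 or later.**
152 | 
153 | **Many other odd problems are also version problems; I recommend installing Keras and TensorFlow following my video tutorials. For example, if you installed TensorFlow 2, don't ask me why Keras-yolo and the like won't run. Of course they won't.**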
154 | 
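155 | ## 3. Object detection libraries (also useful for the face detection and classification libraries)
156 | ### a. Shape mismatch issues
157 | #### 1) Shape mismatch during training
158 | **Q: Why does running train.py report a shape mismatch?
159 | A: In the Keras environment, the classes you train on differ from the original ones, so the network structure changes and a few shapes at the very end of the network do not match.**
160 | 
161 | #### 2) Shape mismatch during prediction
162 | **Q: Why does running predict.py tell me the shapes do not match?**
163 | ##### i. copying a param with shape torch.Size([75, 704, 1, 1]) from checkpoint
164 | In PyTorch it looks like this:
165 | ![PyTorch shape mismatch error](https://img-blog.csdnimg.cn/20200722171631901.png)
166 | ##### ii. Shapes are [1,1,1024,75] and [255,1024,1,1]. for 'Assign_360' (op: 'Assign') with input shapes: [1,1,1024,75], [255,1024,1,1].
167 | In Keras it looks like this:
168 | ![Keras shape mismatch error](https://img-blog.csdnimg.cn/20200722171523380.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70)
169 | **A: There are three main causes:
170 | 1. classes_path was not changed before training started.
171 | 2. model_path was not changed for prediction.
172 | 3. classes_path was not changed for prediction.
173 | Please check carefully! Make sure the model_path and classes_path you use correspond to each other! Also check the num_classes or classes_path used during training!**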
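The two error messages above are easier to read with the channel arithmetic in mind. Assuming a YOLOv3-style head (which the shapes in the messages appear to come from), each output layer has anchors × (4 box offsets + 1 objectness + num_classes) channels, so 75 = 3 × (20 + 5) matches the 20 VOC classes and 255 = 3 × (80 + 5) matches the 80 COCO classes. The helper name is hypothetical, and other models (including the SSD code in this repository) use a different formula, but the cause of the mismatch is the same: weights trained for one class count loaded into a head built for another.

```python
def yolo_head_channels(num_classes, anchors_per_scale=3):
    """Output channels of a YOLOv3-style head: anchors x (4 box + 1 objectness + classes)."""
    return anchors_per_scale * (num_classes + 5)


print(yolo_head_channels(20))  # 75  (VOC, 20 classes)
print(yolo_head_channels(80))  # 255 (COCO, 80 classes)
```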
174 | 
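175 | ### b. Out of GPU memory (OOM, RuntimeError: CUDA out of memory)
176 | **Q: Why does the command window scroll by super fast when I run train.py, with something about OOM? 
177 | A: This is from Keras: you ran out of GPU memory. Reduce batch_size. SSD uses the least GPU memory, so I recommend SSD. Roughly:
178 | 2 GB of GPU memory: SSD, YOLOV4-TINY
179 | 4 GB of GPU memory: YOLOV3
180 | 6 GB of GPU memory: YOLOV4, Retinanet, M2det, Efficientdet, Faster RCNN, etc.
181 | 8 GB+ of GPU memory: anything you like.**
182 | **Note that because of BatchNorm2d, batch_size must not be 1; use at least 2.**
183 | 
184 | **Q: Why do I get RuntimeError: CUDA out of memory. Tried to allocate 52.00 MiB (GPU 0; 15.90 GiB total capacity; 14.85 GiB already allocated; 51.88 MiB free; 15.07 GiB reserved in total by PyTorch)? 
185 | A: This is from PyTorch: you ran out of GPU memory. Same as above.**
186 | 
187 | **Q: Why did it run out of GPU memory before using any? 
188 | A: Because it ran out of memory, nothing ends up being used: the model never started training.**
189 | ### c. Why use frozen and unfrozen training? Can I skip it?
190 | **Q: Why do frozen training and unfrozen training?
191 | A: You can skip it. It mainly exists so that people with weaker machines can still train. If your computer really can't cope, set Freeze_Epoch and UnFreeze_Epoch to the same value to do frozen training only.**
192 | 
193 | **This is also the idea of transfer learning: the features extracted by the backbone are general, so freezing it during training speeds training up and also keeps the pretrained weights from being destroyed.**
194 | In the frozen phase, the backbone is frozen and the feature extraction network does not change. GPU memory usage is small and only the rest of the network is fine-tuned.
195 | In the unfrozen phase, the backbone is no longer frozen and the feature extraction network changes. GPU memory usage is larger and all network parameters are updated.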
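The two phases can be sketched as a schedule. This assumes, as the answer above implies, that epochs before Freeze_Epoch train with the backbone frozen and the remaining epochs up to UnFreeze_Epoch train the whole network; the function and parameter names are illustrative, not the repository's actual train.py code.

```python
def training_phase(epoch, freeze_epoch, unfreeze_epoch, freeze_train=True):
    """Return 'frozen', 'unfrozen', or None (training finished) for a 0-based epoch.

    Assumed schedule: epochs [0, freeze_epoch) keep the backbone frozen when
    freeze_train is True; epochs [freeze_epoch, unfreeze_epoch) train everything.
    """
    if epoch >= unfreeze_epoch:
        return None
    if freeze_train and epoch < freeze_epoch:
        # In PyTorch, freezing usually means setting requires_grad = False
        # on every backbone parameter so the optimizer leaves them unchanged.
        return "frozen"
    return "unfrozen"


# Freeze_Epoch == UnFreeze_Epoch: every epoch is a frozen epoch
print({training_phase(e, 50, 50) for e in range(50)})  # {'frozen'}
```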
196 | 
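197 | ### d. My LOSS is huge, is that a problem? (My LOSS is tiny, is that a problem?)
198 | **Q: Why isn't my network converging? The LOSS is XXXX.
199 | A: Different networks have different LOSS values. LOSS is only a reference for checking whether the network converges, not a measure of how good it is. None of my yolo code normalizes the loss, so the values look fairly high. The LOSS value itself doesn't matter; what matters is whether it keeps decreasing and whether the predictions work.**
200 | 
201 | ### e. Why does my trained model produce no predictions?
202 | **Q: Why are my training results poor? Prediction gives no boxes (or inaccurate boxes).
203 | A:**
204 | Consider the following:
205 | 1. Object information: check whether 2007_train.txt contains object information; if not, fix voc_annotation.py.
206 | 2. Dataset: with fewer than 500 images, consider adding more data, and test different models to make sure the dataset itself is fine.
207 | 3. Unfrozen training: if your data looks very different from ordinary scenes, you need further unfrozen training to adjust the backbone and strengthen feature extraction.
208 | 4. Network: for example, SSD is not suited to small objects because its prior boxes are fixed.
209 | 5. Training length: some people train for only a few epochs and report that it doesn't work; train for the full default schedule.
210 | 6. Make sure you actually followed the steps, e.g. whether you changed classes_path in voc_annotation.py.
211 | 7. Different networks have different LOSS values; LOSS is only a reference for checking convergence, not a measure of how good the network is. The LOSS value doesn't matter; what matters is whether it converges.
212 | 8. Whether you modified the backbone: without pretrained weights the network converges poorly, so results are naturally bad.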
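Point 1 can be checked with a short script. It assumes the line format written by voc_annotation.py: the absolute image path followed by one space-separated `xmin,ymin,xmax,ymax,class_id` group per object (which is also why paths must not contain spaces). The helper names are hypothetical.

```python
def parse_annotation_line(line):
    """Split one 2007_train.txt line into (image_path, [(x1, y1, x2, y2, class_id), ...])."""
    tokens = line.split()  # fields are space-separated
    if not tokens:
        return None, []
    boxes = [tuple(int(v) for v in t.split(",")) for t in tokens[1:]]
    return tokens[0], boxes


def lines_without_boxes(annotation_path):
    """Return the image paths in an annotation file that carry no boxes at all."""
    empty = []
    with open(annotation_path, encoding="utf-8") as f:
        for line in f:
            path, boxes = parse_annotation_line(line)
            if path is not None and not boxes:
                empty.append(path)
    return empty
```

For example, `print(lines_without_boxes("2007_train.txt"))` run from the repository root lists the images that ended up with no boxes; if that is most of them, classes_path probably does not match your label names.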
213 | 
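214 | ### f. Why is my computed mAP 0?
215 | **Q: Why are my training results poor? Why is there no mAP?
216 | A:**
217 | First try predicting with predict.py. If predictions work, classes_path in get_map.py is probably set wrong. If there are no predictions, the fix is the same as for question e; check the following:
218 | 1. Object information: check whether 2007_train.txt contains object information; if not, fix voc_annotation.py.
219 | 2. Dataset: with fewer than 500 images, consider adding more data, and test different models to make sure the dataset itself is fine.
220 | 3. Unfrozen training: if your data looks very different from ordinary scenes, you need further unfrozen training to adjust the backbone and strengthen feature extraction.
221 | 4. Network: for example, SSD is not suited to small objects because its prior boxes are fixed.
222 | 5. Training length: some people train for only a few epochs and report that it doesn't work; train for the full default schedule.
223 | 6. Make sure you actually followed the steps, e.g. whether you changed classes_path in voc_annotation.py.
224 | 7. Different networks have different LOSS values; LOSS is only a reference for checking convergence, not a measure of how good the network is. The LOSS value doesn't matter; what matters is whether it converges.
225 | 8. Whether you modified the backbone: without pretrained weights the network converges poorly, so results are naturally bad.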
226 | 
227 | ### g. gbk encoding error ('gbk' codec can't decode byte)
228 | **Q: Why am I getting this gbk encoding error:**
229 | ```python
230 | UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 446: illegal multibyte sequence
231 | ```
232 | **A: Don't use Chinese characters in labels or paths. If you must, pay attention to encoding when processing them: open the files with encoding='utf-8'.**
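A minimal sketch of the fix, assuming a VOC-style XML annotation: passing `encoding="utf-8"` to `open` avoids Python's locale-dependent default, which is GBK on Chinese-language Windows. voc_annotation.py already opens its XML files this way; the helper name here is hypothetical.

```python
import xml.etree.ElementTree as ET


def read_object_names(xml_path):
    """Return the <object><name> values of a VOC annotation, decoding the file as UTF-8."""
    # An explicit encoding avoids the platform default (GBK on Chinese Windows)
    with open(xml_path, encoding="utf-8") as f:
        root = ET.parse(f).getroot()
    return [obj.find("name").text for obj in root.iter("object")]
```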
233 | 
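234 | ### h. My images are xxx*xxx resolution; can I use them?
235 | **Q: My images are xxx*xxx resolution; can I use them?**
236 | **A: Yes. The code resizes and augments them automatically.**
237 | 
238 | ### i. I want data augmentation! How do I do it?
239 | **Q: I want to do data augmentation! How?**
240 | **A: Nothing extra is needed: the code already resizes and augments the data automatically.**
241 | 
242 | ### j. Multi-GPU training
243 | **Q: How do I train on multiple GPUs?
244 | A: Most of the PyTorch code can train on multiple GPUs directly. For Keras, search online; it is not complicated to implement. I don't have multiple GPUs, so I can't test it in detail; you will have to work it out yourselves.**
245 | 
246 | ### k. Can I train on grayscale images?
247 | **Q: Can I train on (or predict) grayscale images?
248 | A: Most of my libraries convert grayscale images to RGB for training and prediction. If the code cannot train on or predict grayscale images, try converting the result of Image.open to RGB in get_random_data, and do the same for prediction. (For reference only.)**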
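A sketch of that conversion with Pillow, assuming you apply it right after `Image.open` in both the data loader and prediction; `to_rgb` is a hypothetical helper name. `convert("RGB")` copies the single gray channel into all three channels, and it also handles palette (`P`) and `RGBA` images.

```python
from PIL import Image


def to_rgb(image):
    """Return a 3-channel RGB version of a PIL image (grayscale 'L', palette 'P', 'RGBA', ...)."""
    if image.mode != "RGB":
        image = image.convert("RGB")
    return image


# Usage sketch: image = to_rgb(Image.open(path))
```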
249 | 
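250 | ### l. Resuming training from a checkpoint
251 | **Q: I have already trained for a few epochs. Can I continue training from there?
252 | A: Yes. Before training, load the weights you trained the same way you load pretrained weights. Trained weights are usually saved in the logs folder; set model_path to the weights you want to start from.**
253 | 
254 | ### m. Can I use the pretrained weights to train on another dataset?
255 | **Q: If I train on another dataset, what do I do about the pretrained weights?**
256 | **A: Pretrained weights work across datasets because the features are general. You need them in 99% of cases; without them the weights are too random, feature extraction is weak, and training results will be poor.**
257 | 
258 | ### n. How do I train the network from scratch?
259 | **Q: How do I train without pretrained weights?
260 | A: Read the comments: in most of the code you set model_path = '' and Freeze_Train = False**. If setting model_path has no effect, **just comment out the code that loads the pretrained weights.**
261 | 
262 | ### o. Why is training from scratch so bad (what if I modified the backbone and the results are poor)?
263 | **Q: Why are my results so bad without pretrained weights?
264 | A: Randomly initialized weights extract poor features, so the trained model performs poorly. Even weights pretrained on voc07+12 and on coco+voc07+12 give different results, so pretrained weights really matter.**
265 | 
266 | **Q: I modified the network. Can I still use the pretrained weights?
267 | A: If you modified the backbone and it is not an existing network, the pretrained weights basically can't be used: either match the weights yourself by checking the convolution kernel shapes, or do the pretraining yourself. If you modified the latter part of the network, the pretrained weights of the backbone in the front part can still be used. For PyTorch code, change the weight-loading logic to compare shapes before loading; for Keras code, just pass by_name=True, skip_mismatch=True.**
268 | Weights can be matched as follows: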
269 | ```python
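270 | import torch
271 | # Load only the pretrained tensors whose name and shape match `model` (assumes `model` and `model_path` exist)
272 | print('Loading weights into state dict...')
273 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
274 | model_dict = model.state_dict()
275 | pretrained_dict = torch.load(model_path, map_location=device)
276 | matched = {}
277 | for k, v in pretrained_dict.items():
278 |     # skip keys the model doesn't have and tensors whose shape differs (e.g. a modified head)
279 |     if k in model_dict and model_dict[k].shape == v.shape:
280 |         matched[k] = v
281 | # unmatched layers keep their current (randomly initialized) weights
282 | model_dict.update(matched)
283 | model.load_state_dict(model_dict)
284 | print('Finished! Loaded %d of %d pretrained tensors.' % (len(matched), len(pretrained_dict)))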
285 | ```
286 | 
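287 | **Q: Why is training from scratch so bad (what if I modified the backbone and the results are poor)?
288 | A: Generally, training a network from scratch gives poor results because the weights are too random and feature extraction is weak, so I strongly, strongly, strongly advise against it! If you must start from scratch, look into the ImageNet dataset: first train a classification model to obtain backbone weights. The backbone of the classification model is shared with this model, so you can train on top of it.
289 | Modifying the backbone causes the same problem: random weights perform poorly.**
290 | 
291 | **Q: How do I train the model from scratch?
292 | A: Without enough compute and tuning skill, training from scratch is pointless. With randomly initialized parameters the model's feature extraction is very poor, and without good tuning skill and compute the network will not converge properly.**
293 | If you must train from scratch, keep these points in mind:
294 |  - Do not load pretrained weights. 
295 |  - Do not use frozen training; comment out the code that freezes the model.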
296 | 
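300 | ### p. Where do your weights come from?
301 | **Q: If the network can't be trained from scratch, where do your weights come from?
302 | A: Some are converted from the official releases and some I trained myself. The ImageNet backbone weights I use are all official.**
303 | 
304 | ### q. Video and webcam detection
305 | **Q: How do I detect with a webcam?
306 | A: Change the parameters in predict.py to run webcam detection; there is also a video explaining the webcam detection approach in detail.**
307 | 
308 | **Q: How do I detect on a video?
309 | A: Same as above.**
310 | 
311 | ### r. How to save the detected images
312 | **Q: How do I save the images after detection?
313 | A: Object detection here generally uses PIL's Image, so look up how to save a PIL Image. See the comments in predict.py for details.**
314 | 
315 | **Q: How do I save the video output?
316 | A: See the comments in predict.py for details.**
317 | 
318 | ### s. Iterating over a folder
319 | **Q: How do I run detection on every image in a folder?
320 | A: Usually, use os.listdir to find all images in the folder, then detect each one following the flow in predict.py. See the comments in predict.py for details.**
321 | 
322 | **Q: How do I run detection on every image in a folder and save the results?
323 | A: To iterate, use os.listdir to find all images in the folder, then detect each one following the flow in predict.py. To save, note that object detection here generally uses PIL's Image, so look up how to save a PIL Image; if a library uses cv2, look up how to save an image with cv2. See the comments in predict.py for details.**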
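A minimal sketch of iterating over a folder and saving the results with PIL, written against a generic `detect` callable (PIL image in, PIL image out) so that it does not depend on any particular model; `predict_folder` and the extension list are illustrative assumptions.

```python
import os

from PIL import Image

# File extensions treated as images (an assumption; extend as needed)
IMAGE_EXTS = (".bmp", ".jpg", ".jpeg", ".png", ".tif", ".tiff")


def predict_folder(detect, src_dir, dst_dir):
    """Apply `detect` (PIL image -> PIL image) to every image in src_dir and save the results."""
    os.makedirs(dst_dir, exist_ok=True)
    saved = []
    for name in sorted(os.listdir(src_dir)):
        if not name.lower().endswith(IMAGE_EXTS):
            continue  # skip non-image files such as .txt or .xml
        image = Image.open(os.path.join(src_dir, name))
        result = detect(image)
        # Save as PNG so the drawn boxes are not degraded by another JPEG compression pass
        out_path = os.path.join(dst_dir, os.path.splitext(name)[0] + ".png")
        result.save(out_path)
        saved.append(out_path)
    return saved
```

Assuming the SSD class in ssd.py exposes `detect_image(image)` the way predict.py uses it, a call could look like `predict_folder(SSD().detect_image, "img/", "img_out/")`. For a cv2-based library, replace `Image.open` and `save` with `cv2.imread` and `cv2.imwrite`.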
324 | 
325 | ### t. Path problems (No such file or directory, StopIteration: [Errno 13] Permission denied: 'XXXXXX')
326 | **Q: Why am I getting errors like these:**
327 | ```python
328 | FileNotFoundError: [Errno 2] No such file or directory
329 | StopIteration: [Errno 13] Permission denied: 'D:\\Study\\Collection\\Dataset\\VOC07+12+test\\VOCdevkit/VOC2007'
330 | ……………………………………
331 | ……………………………………
332 | ```
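333 | **A: Check the folder path and whether the corresponding files exist; also check whether the file paths in 2007_train.txt are correct.**
334 | A few important points about paths:
335 | **Folder names must never contain spaces.
336 | Mind the difference between relative and absolute paths.
337 | Read up on how file paths work.**
338 | 
339 | **Almost every path problem is a root-directory problem; look up the concept of relative paths carefully!**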
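To find the offending entries quickly, the hypothetical helper below checks that the first field of every line in an annotation file such as 2007_train.txt is an existing file. A path that contains a space is cut off at the space when the line is split, so it is reported as missing too, and a directory path (like the one in the Permission denied error above) fails the `isfile` check.

```python
import os


def missing_images(annotation_path):
    """List image paths (first field of each line) in an annotation txt that are not existing files."""
    missing = []
    with open(annotation_path, encoding="utf-8") as f:
        for line in f:
            fields = line.split()
            if fields and not os.path.isfile(fields[0]):
                missing.append(fields[0])  # a path containing spaces shows up here truncated
    return missing
```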
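340 | ### u. Comparison with the original: why is yours different?
341 | **Q: The original code does XXX; why does yours do XXX?
342 | A: Well... that is exactly why mine is not the original...**
343 | 
344 | **Q: How does your code compare with the original? Can it match the original's performance?
345 | A: Basically yes; I tested everything on VOC. I don't have a good GPU, so I can't test or train on COCO.**
346 | 
347 | **Q: Did you implement all of YOLOv4's tricks? How far are you from the original?
348 | A: Not all of the improvements. YOLOv4 uses so many that it is hard to implement, or even list, all of them; I only included some that I found interesting and very effective. The SAM (attention module) mentioned in the paper is not used in the authors' own source code either. There are many other tricks; not all of them help, and I can't implement them all. As for the comparison with the original, I can't train on COCO, but people who have used it report that the gap is small.**
349 | 
350 | ### v. Is my detection speed of xxx normal? Can it be faster?
351 | **Q: What FPS can your code reach? Can it reach XX FPS?
352 | A: FPS depends on the machine: better hardware is faster, weaker hardware is slower.**
353 | 
354 | **Q: Is my detection speed of xxx normal? Can it be faster?
355 | A: It depends on the hardware: better hardware means faster detection. To speed things up on the same hardware, you have to modify the network.**
356 | 
357 | **Q: Why do I only get a dozen or so FPS when testing yolov4 (or other models) on a server?
358 | A: Check that tensorflow-gpu or the GPU build of PyTorch is installed correctly. If it is, use time.time() to find which part of detect_image takes the longest (the network is not the only slow part; other processing such as drawing also takes time).**
359 | 
360 | **Q: Why does the paper say the speed can reach XX, but I can't get it here?
361 | A: Check that tensorflow-gpu or the GPU build of PyTorch is installed correctly. If it is, use time.time() to find which part of detect_image takes the longest (the network is not the only slow part; other processing such as drawing also takes time). Some papers also predict with multiple images per batch, which I did not implement.**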
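A small helper for that kind of measurement, as a sketch: it uses `time.perf_counter()`, a higher-resolution clock than `time.time()`, and accumulates the time per named section. The helper and section names are illustrative. On the GPU, CUDA work is launched asynchronously, so in PyTorch call `torch.cuda.synchronize()` before a section ends, otherwise the network time is attributed to whatever runs next.

```python
import time
from contextlib import contextmanager

# Accumulated seconds per named section (hypothetical helper, not part of the repository)
timings = {}


@contextmanager
def timed(name):
    """Add the wall-clock time spent inside the `with` block to timings[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start


# Usage sketch inside detect_image (section names are illustrative):
#     with timed("preprocess"): ...   # resize / normalize
#     with timed("network"): ...      # forward pass; on GPU, synchronize before leaving
#     with timed("draw"): ...         # drawing boxes and labels
```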
362 | 
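363 | ### w. Predicted image is not displayed
364 | **Q: Why doesn't your code show the image after prediction? It only tells me in the command line which objects it found.
365 | A: Install an image viewer on your system.**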
366 | 
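367 | ### x. Evaluation metrics (object detection mAP, PR curves, Recall, Precision, etc.)
368 | **Q: How do I compute mAP?
369 | A: Watch the mAP video; the process is the same for all of them.**
370 | 
371 | **Q: What is MINOVERLAP in get_map.py for? Is it the IoU?
372 | A: Yes, it is an IoU threshold. It judges how much a predicted box overlaps a ground-truth box: if the overlap is greater than MINOVERLAP, the prediction counts as correct.**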
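For reference, IoU is the area of the intersection of two boxes divided by the area of their union. The sketch below assumes boxes as `(xmin, ymin, xmax, ymax)` in continuous coordinates; VOC-style evaluation code sometimes adds 1 to widths and heights (pixel-inclusive coordinates), so values can differ slightly from get_map.py. The MINOVERLAP value of 0.5 here is an assumption (the usual VOC setting).

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax) in continuous coordinates."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


MINOVERLAP = 0.5
pred, gt = (0, 0, 10, 10), (5, 5, 15, 15)
print(round(iou(pred, gt), 4))   # 0.1429  (25 / 175)
print(iou(pred, gt) >= MINOVERLAP)  # False: not a match at this threshold
```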
373 | 
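374 | **Q: Why is self.confidence (self.score) set so small in get_map.py?
375 | A: Watch the theory part of the mAP video: you need all the results before you can draw the PR curve.**
376 | 
377 | **Q: Can you explain how to draw the PR curve?
378 | A: Watch the mAP video; the results include PR curves.**
379 | 
380 | **Q: How do I compute Recall and Precision?
381 | A: These two metrics are defined relative to a specific confidence threshold; they are also produced when computing mAP.**
382 | 
383 | ### y. Training on the COCO dataset
384 | **Q: How do I train object detection on the COCO dataset?
385 | A: For the txt files needed to train on COCO, refer to qqwweee's yolo3 repository; the format is the same.**
386 | 
387 | ### z. How do I optimize the model? I want better results
388 | **Q: How do I modify the model? I want to publish a small paper!
389 | A: Look at the differences between yolov3 and yolov4, then read the yolov4 paper: as a large-scale tuning exercise it is very instructive and uses many tricks. My advice is to study more classic models, take their standout structures apart and reuse them.**
390 | 
391 | ### aa. Do you have Focal Loss code? How do I add it?
392 | **Q: Do you have code for using Focal Loss in the YOLO series? Does it help?
393 | A: Many people have tried it; the improvement is small (it can even get worse), since YOLO has its own way of balancing positive and negative samples**. As for changing the code, please study the code yourself.
394 | 
395 | ### ab. Deployment (ONNX, TensorRT, etc.)
396 | I have never deployed to phones or other devices, so I don't know much about deployment issues...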
397 | 
398 | ## 4. Semantic segmentation libraries
399 | ### a. Shape mismatch issues
400 | #### 1) Shape mismatch during training
401 | **Q: Why does running train.py report a shape mismatch?
402 | A: In the Keras environment, the classes you train on differ from the original ones, so the network structure changes and a few shapes at the very end of the network do not match.**
403 | 
404 | #### 2) Shape mismatch during prediction
405 | **Q: Why does running predict.py tell me the shapes do not match?**
406 | ##### i. copying a param with shape torch.Size([75, 704, 1, 1]) from checkpoint
407 | In PyTorch it looks like this:
408 | ![PyTorch shape mismatch error](https://img-blog.csdnimg.cn/20200722171631901.png)
409 | ##### ii. Shapes are [1,1,1024,75] and [255,1024,1,1]. for 'Assign_360' (op: 'Assign') with input shapes: [1,1,1024,75], [255,1024,1,1].
410 | In Keras it looks like this:
411 | ![Keras shape mismatch error](https://img-blog.csdnimg.cn/20200722171523380.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70)
412 | **A: There are three main causes:
413 | 1. num_classes in train.py was not changed.
414 | 2. num_classes was not changed for prediction.
415 | 3. model_path was not changed for prediction.
416 | Please check carefully! Check num_classes for both training and prediction!**
417 | 
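418 | ### b. Out of GPU memory (OOM, RuntimeError: CUDA out of memory)
419 | **Q: Why does the command window scroll by super fast when I run train.py, with something about OOM? 
420 | A: This is from Keras: you ran out of GPU memory. Reduce batch_size.**
421 | 
422 | **Note that because of BatchNorm2d, batch_size must not be 1; use at least 2.**
423 | 
424 | **Q: Why do I get RuntimeError: CUDA out of memory. Tried to allocate 52.00 MiB (GPU 0; 15.90 GiB total capacity; 14.85 GiB already allocated; 51.88 MiB free; 15.07 GiB reserved in total by PyTorch)? 
425 | A: This is from PyTorch: you ran out of GPU memory. Same as above.**
426 | 
427 | **Q: Why did it run out of GPU memory before using any? 
428 | A: Because it ran out of memory, nothing ends up being used: the model never started training.**
429 | 
430 | ### c. Why use frozen and unfrozen training? Can I skip it?
431 | **Q: Why do frozen training and unfrozen training?
432 | A: You can skip it. It mainly exists so that people with weaker machines can still train. If your computer really can't cope, set Freeze_Epoch and UnFreeze_Epoch to the same value to do frozen training only.**
433 | 
434 | **This is also the idea of transfer learning: the features extracted by the backbone are general, so freezing it during training speeds training up and also keeps the pretrained weights from being destroyed.**
435 | In the frozen phase, the backbone is frozen and the feature extraction network does not change. GPU memory usage is small and only the rest of the network is fine-tuned.
436 | In the unfrozen phase, the backbone is no longer frozen and the feature extraction network changes. GPU memory usage is larger and all network parameters are updated.
437 | 
438 | ### d. My LOSS is huge, is that a problem? (My LOSS is tiny, is that a problem?)
439 | **Q: Why isn't my network converging? The LOSS is XXXX.
440 | A: Different networks have different LOSS values. LOSS is only a reference for checking whether the network converges, not a measure of how good it is. None of my yolo code normalizes the loss, so the values look fairly high. The LOSS value itself doesn't matter; what matters is whether it keeps decreasing and whether the predictions work.**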
441 | 
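442 | ### e. Why does my trained model produce no predictions?
443 | **Q: Why are my training results poor? The prediction is empty (or inaccurate).
444 | A:**
445 | **Consider the following:
446 | 1. Dataset: this is the most important issue. With fewer than 500 images, consider adding more data. Be sure to check the labels: the video explains the VOC dataset format in detail, but having an input image and an output label is not enough; you must also confirm that every pixel value in the label is the index of its class. Many people's labels are in the wrong format. The most common mistake is a label with a black background and white objects: the object pixels then have the value 255, which cannot be trained; they need to be 1.
447 | 2. Unfrozen training: if your data looks very different from ordinary scenes, you need further unfrozen training to adjust the backbone and strengthen feature extraction.
448 | 3. Network: try different networks.
449 | 4. Training length: some people train for only a few epochs and report that it doesn't work; train for the full default schedule.
450 | 5. Make sure you actually followed the steps.
451 | 6. Different networks have different LOSS values; LOSS is only a reference for checking convergence, not a measure of how good the network is. The LOSS value doesn't matter; what matters is whether it converges.**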
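Label masks can be checked with NumPy. The sketch below (hypothetical helper names) lists the distinct pixel values of a mask, flags the black/white 0-and-255 mistake described in point 1, and remaps 255 to 1 for a single object class. In the official VOC masks, 255 marks "void" border pixels, so a few 255s alongside real class indices are normal; only a mask made of 0 and 255 alone points to the mistake.

```python
import numpy as np


def label_values(mask):
    """Sorted list of the distinct pixel values in a label mask."""
    return [int(v) for v in np.unique(mask)]


def is_black_white_mask(mask):
    """True if the mask uses only 0 and 255, i.e. the black-background/white-object mistake."""
    return set(label_values(mask)) == {0, 255}


def fix_black_white_mask(mask):
    """Map 255 -> 1 so the single object class gets index 1 (background stays 0)."""
    return np.where(mask == 255, 1, mask).astype(np.uint8)
```

Load a mask with `np.array(Image.open(path))`; for palette PNGs this yields the class indices rather than colors. If you fix masks, save them back as PNG, since JPEG compression would introduce intermediate values.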
452 | 
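453 | **Q: Why are my training results poor? Small objects are predicted inaccurately.
454 | A: For deeplab and pspnet, change downsample_factor: with downsample_factor = 16 the network downsamples too much and results suffer; change it to 8.**
455 | 
456 | ### f. Why is my computed mIoU 0?
457 | **Q: Why are my training results poor? Why is my computed mIoU 0?**
458 | A:
459 | Similar to e, **consider the following:
460 | 1. Dataset: this is the most important issue. With fewer than 500 images, consider adding more data. Be sure to check the labels: the video explains the VOC dataset format in detail, but having an input image and an output label is not enough; you must also confirm that every pixel value in the label is the index of its class. Many people's labels are in the wrong format. The most common mistake is a label with a black background and white objects: the object pixels then have the value 255, which cannot be trained; they need to be 1.
461 | 2. Unfrozen training: if your data looks very different from ordinary scenes, you need further unfrozen training to adjust the backbone and strengthen feature extraction.
462 | 3. Network: try different networks.
463 | 4. Training length: some people train for only a few epochs and report that it doesn't work; train for the full default schedule.
464 | 5. Make sure you actually followed the steps.
465 | 6. Different networks have different LOSS values; LOSS is only a reference for checking convergence, not a measure of how good the network is. The LOSS value doesn't matter; what matters is whether it converges.**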
466 | 
467 | ### g. gbk encoding error ('gbk' codec can't decode byte)
468 | **Q: Why am I getting this gbk encoding error:**
469 | ```python
470 | UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 446: illegal multibyte sequence
471 | ```
472 | **A: Don't use Chinese characters in labels or paths. If you must, pay attention to encoding when processing them: open the files with encoding='utf-8'.**
473 | 
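474 | ### h. My images are xxx*xxx resolution; can I use them?
475 | **Q: My images are xxx*xxx resolution; can I use them?**
476 | **A: Yes. The code resizes and augments them automatically.**
477 | 
478 | ### i. I want data augmentation! How do I do it?
479 | **Q: I want to do data augmentation! How?**
480 | **A: Nothing extra is needed: the code already resizes and augments the data automatically.**
481 | 
482 | ### j. Multi-GPU training
483 | **Q: How do I train on multiple GPUs?
484 | A: Most of the PyTorch code can train on multiple GPUs directly. For Keras, search online; it is not complicated to implement. I don't have multiple GPUs, so I can't test it in detail; you will have to work it out yourselves.**
485 | 
486 | ### k. Can I train on grayscale images?
487 | **Q: Can I train on (or predict) grayscale images?
488 | A: Most of my libraries convert grayscale images to RGB for training and prediction. If the code cannot train on or predict grayscale images, try converting the result of Image.open to RGB in get_random_data, and do the same for prediction. (For reference only.)**
489 | 
490 | ### l. Resuming training from a checkpoint
491 | **Q: I have already trained for a few epochs. Can I continue training from there?
492 | A: Yes. Before training, load the weights you trained the same way you load pretrained weights. Trained weights are usually saved in the logs folder; set model_path to the weights you want to start from.**
493 | 
494 | ### m. Can I use the pretrained weights to train on another dataset?
495 | **Q: If I train on another dataset, what do I do about the pretrained weights?**
496 | **A: Pretrained weights work across datasets because the features are general. You need them in 99% of cases; without them the weights are too random, feature extraction is weak, and training results will be poor.**
497 | 
498 | ### n. How do I train the network from scratch?
499 | **Q: How do I train without pretrained weights?
500 | A: Read the comments: in most of the code you set model_path = '' and Freeze_Train = False**. If setting model_path has no effect, **just comment out the code that loads the pretrained weights.**
501 | 
502 | ### o. Why is training from scratch so bad (what if I modified the backbone and the results are poor)?
503 | **Q: Why are my results so bad without pretrained weights?
504 | A: Randomly initialized weights extract poor features, so the trained model performs poorly; pretrained weights really matter.**
505 | 
506 | **Q: I modified the network. Can I still use the pretrained weights?
507 | A: If you modified the backbone and it is not an existing network, the pretrained weights basically can't be used: either match the weights yourself by checking the convolution kernel shapes, or do the pretraining yourself. If you modified the latter part of the network, the pretrained weights of the backbone in the front part can still be used. For PyTorch code, change the weight-loading logic to compare shapes before loading; for Keras code, just pass by_name=True, skip_mismatch=True.**
508 | Weights can be matched as follows: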
509 | ```python
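510 | import torch
511 | # Load only the pretrained tensors whose name and shape match `model` (assumes `model` and `model_path` exist)
512 | print('Loading weights into state dict...')
513 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
514 | model_dict = model.state_dict()
515 | pretrained_dict = torch.load(model_path, map_location=device)
516 | matched = {}
517 | for k, v in pretrained_dict.items():
518 |     # skip keys the model doesn't have and tensors whose shape differs (e.g. a modified head)
519 |     if k in model_dict and model_dict[k].shape == v.shape:
520 |         matched[k] = v
521 | # unmatched layers keep their current (randomly initialized) weights
522 | model_dict.update(matched)
523 | model.load_state_dict(model_dict)
524 | print('Finished! Loaded %d of %d pretrained tensors.' % (len(matched), len(pretrained_dict)))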
525 | ```
526 | 
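527 | **Q: Why is training from scratch so bad (what if I modified the backbone and the results are poor)?
528 | A: Generally, training a network from scratch gives poor results because the weights are too random and feature extraction is weak, so I strongly, strongly, strongly advise against it! If you must start from scratch, look into the ImageNet dataset: first train a classification model to obtain backbone weights. The backbone of the classification model is shared with this model, so you can train on top of it.
529 | Modifying the backbone causes the same problem: random weights perform poorly.**
530 | 
531 | **Q: How do I train the model from scratch?
532 | A: Without enough compute and tuning skill, training from scratch is pointless. With randomly initialized parameters the model's feature extraction is very poor, and without good tuning skill and compute the network will not converge properly.**
533 | If you must train from scratch, keep these points in mind:
534 |  - Do not load pretrained weights. 
535 |  - Do not use frozen training; comment out the code that freezes the model.
536 | 
537 | **Q: Why are my results so bad without pretrained weights?
538 | A: Randomly initialized weights extract poor features, so the trained model performs poorly. Even weights pretrained on voc07+12 and on coco+voc07+12 give different results, so pretrained weights really matter.**
539 | 
540 | ### p. Where do your weights come from?
541 | **Q: If the network can't be trained from scratch, where do your weights come from?
542 | A: Some are converted from the official releases and some I trained myself. The ImageNet backbone weights I use are all official.**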
543 | 
544 | 
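545 | ### q. Video and webcam detection
546 | **Q: How do I detect with a webcam?
547 | A: Change the parameters in predict.py to run webcam detection; there is also a video explaining the webcam detection approach in detail.**
548 | 
549 | **Q: How do I detect on a video?
550 | A: Same as above.**
551 | 
552 | ### r. How to save the detected images
553 | **Q: How do I save the images after detection?
554 | A: The code here generally uses PIL's Image, so look up how to save a PIL Image. See the comments in predict.py for details.**
555 | 
556 | **Q: How do I save the video output?
557 | A: See the comments in predict.py for details.**
558 | 
559 | ### s. Iterating over a folder
560 | **Q: How do I run prediction on every image in a folder?
561 | A: Usually, use os.listdir to find all images in the folder, then predict each one following the flow in predict.py. See the comments in predict.py for details.**
562 | 
563 | **Q: How do I run prediction on every image in a folder and save the results?
564 | A: To iterate, use os.listdir to find all images in the folder, then predict each one following the flow in predict.py. To save, note that the code here generally uses PIL's Image, so look up how to save a PIL Image; if a library uses cv2, look up how to save an image with cv2. See the comments in predict.py for details.**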
565 | 
566 | ### t. Path problems (No such file or directory, StopIteration: [Errno 13] Permission denied: 'XXXXXX')
567 | **Q: Why am I getting errors like these:**
568 | ```python
569 | FileNotFoundError: [Errno 2] No such file or directory
570 | StopIteration: [Errno 13] Permission denied: 'D:\\Study\\Collection\\Dataset\\VOC07+12+test\\VOCdevkit/VOC2007'
571 | ……………………………………
572 | ……………………………………
573 | ```
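574 | **A: Check the folder path and whether the corresponding files exist; also check whether the file paths in 2007_train.txt are correct.**
575 | A few important points about paths:
576 | **Folder names must never contain spaces.
577 | Mind the difference between relative and absolute paths.
578 | Read up on how file paths work.**
579 | 
580 | **Almost every path problem is a root-directory problem; look up the concept of relative paths carefully!**
581 | ### u. Comparison with the original: why is yours different?
582 | **Q: The original code does XXX; why does yours do XXX?
583 | A: Well... that is exactly why mine is not the original...**
584 | 
585 | **Q: How does your code compare with the original? Can it match the original's performance?
586 | A: Basically yes; I tested everything on VOC. I don't have a good GPU, so I can't test or train on COCO.**
587 | 
588 | ### v. Is my prediction speed of xxx normal? Can it be faster?
589 | **Q: What FPS can your code reach? Can it reach XX FPS?
590 | A: FPS depends on the machine: better hardware is faster, weaker hardware is slower.**
591 | 
592 | **Q: Is my prediction speed of xxx normal? Can it be faster?
593 | A: It depends on the hardware: better hardware means faster prediction. To speed things up on the same hardware, you have to modify the network.**
594 | 
595 | **Q: Why does the paper say the speed can reach XX, but I can't get it here?
596 | A: Check that tensorflow-gpu or the GPU build of PyTorch is installed correctly. If it is, use time.time() to find which part of detect_image takes the longest (the network is not the only slow part; other processing such as drawing also takes time). Some papers also predict with multiple images per batch, which I did not implement.**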
597 | 
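598 | ### w. Predicted image is not displayed
599 | **Q: Why doesn't your code show the image after prediction? It only tells me in the command line which objects it found.
600 | A: Install an image viewer on your system.**
601 | 
602 | ### x. Evaluation metrics (mIoU)
603 | **Q: How do I compute mIoU?
604 | A: See the mIoU measurement part of the video.**
605 | 
606 | **Q: How do I compute Recall and Precision?
607 | A: The current code can't produce them yet. Please learn the concept of a confusion matrix and compute them yourselves.**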
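As a sketch of that calculation, build a confusion matrix with rows as ground-truth classes and columns as predicted classes. Then, per class, recall is TP / (TP + FN) (diagonal over row sum), precision is TP / (TP + FP) (diagonal over column sum), and IoU is TP / (TP + FP + FN). The helper names are hypothetical, and this sketch assumes that pixels labelled outside `0..num_classes-1` (such as VOC's 255) should be ignored.

```python
import numpy as np


def confusion_matrix(label, pred, num_classes):
    """num_classes x num_classes counts; rows = ground-truth class, columns = predicted class."""
    label = np.asarray(label).ravel()
    pred = np.asarray(pred).ravel()
    valid = (label >= 0) & (label < num_classes)  # drop ignored pixels such as 255
    idx = num_classes * label[valid].astype(int) + pred[valid].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def per_class_metrics(hist):
    """Per-class recall, precision and IoU; classes with no pixels get 0 instead of NaN."""
    tp = np.diag(hist).astype(float)
    recall = tp / np.maximum(hist.sum(axis=1), 1)      # TP / (TP + FN)
    precision = tp / np.maximum(hist.sum(axis=0), 1)   # TP / (TP + FP)
    ious = tp / np.maximum(hist.sum(axis=1) + hist.sum(axis=0) - tp, 1)
    return recall, precision, ious
```

Accumulate `hist` over the whole validation set (summing the per-image matrices) before calling `per_class_metrics`; averaging the IoU values then gives the mIoU.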
608 | 
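609 | ### y. How do I optimize the model? I want better results
610 | **Q: How do I modify the model? I want to publish a small paper!
611 | A: I recommend the yolov4 paper from object detection: as a large-scale tuning exercise it is very instructive and uses many tricks. My advice is to study more classic models, take their standout structures apart and reuse them.**
612 | 
613 | ### z. Deployment (ONNX, TensorRT, etc.)
614 | I have never deployed to phones or other devices, so I don't know much about deployment issues...
615 | 
616 | ## 5. Discussion group
617 | **Q: Is there a QQ group or something similar?
618 | A: No, no. I don't have time to manage a QQ group...**
619 | 
620 | ## 6. How to learn
621 | **Q: What was your learning path? I'm a complete beginner; how should I learn?
622 | A: A few things to note:
623 | 1. I'm not an expert; there is a lot I don't know, and my learning path won't suit everyone.
624 | 2. My lab doesn't work on deep learning, so I taught myself most things by trial and error, and I can't be sure it's all correct.
625 | 3. Personally, I think learning depends mostly on self-study.**
626 | As for my learning path: I started with Morvan's Python tutorials (莫烦Python) to get into tensorflow, keras and pytorch. After that I learned SSD and YOLO, then got to know many classic convolutional networks, and then started studying lots of different code. My method is to read the code line by line to understand the whole execution flow, how the shapes of the feature layers change, and so on. It took a lot of time and there is no shortcut; you just have to put the time in.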
627 | 


--------------------------------------------------------------------------------