├── .gitignore
├── LICENSE
├── README.md
├── VOCdevkit
    └── VOC2007
    │   ├── Annotations
    │       └── README.md
    │   ├── ImageSets
    │       └── Main
    │       │   └── README.md
    │   └── JPEGImages
    │       └── README.md
├── get_map.py
├── img
    └── street.jpg
├── logs
    └── README.md
├── model_data
    ├── simhei.ttf
    └── voc_classes.txt
├── nets
    ├── __init__.py
    ├── mobilenetv2.py
    ├── resnet.py
    ├── ssd.py
    ├── ssd_training.py
    └── vgg.py
├── predict.py
├── requirements.txt
├── ssd.py
├── summary.py
├── train.py
├── utils
    ├── __init__.py
    ├── anchors.py
    ├── callbacks.py
    ├── dataloader.py
    ├── utils.py
    ├── utils_bbox.py
    ├── utils_fit.py
    └── utils_map.py
├── voc_annotation.py
└── 常见问题汇总.md


/.gitignore:
--------------------------------------------------------------------------------
  1 | # ignore map, miou, datasets
  2 | map_out/
  3 | miou_out/
  4 | VOCdevkit/
  5 | datasets/
  6 | Medical_Datasets/
  7 | lfw/
  8 | logs/
  9 | model_data/
 10 | .temp_map_out/
 11 | 
 12 | # Byte-compiled / optimized / DLL files
 13 | __pycache__/
 14 | *.py[cod]
 15 | *$py.class
 16 | 
 17 | # C extensions
 18 | *.so
 19 | 
 20 | # Distribution / packaging
 21 | .Python
 22 | build/
 23 | develop-eggs/
 24 | dist/
 25 | downloads/
 26 | eggs/
 27 | .eggs/
 28 | lib/
 29 | lib64/
 30 | parts/
 31 | sdist/
 32 | var/
 33 | wheels/
 34 | pip-wheel-metadata/
 35 | share/python-wheels/
 36 | *.egg-info/
 37 | .installed.cfg
 38 | *.egg
 39 | MANIFEST
 40 | 
 41 | # PyInstaller
 42 | #  Usually these files are written by a python script from a template
 43 | #  before PyInstaller builds the exe, so as to inject date/other infos into it.
 44 | *.manifest
 45 | *.spec
 46 | 
 47 | # Installer logs
 48 | pip-log.txt
 49 | pip-delete-this-directory.txt
 50 | 
 51 | # Unit test / coverage reports
 52 | htmlcov/
 53 | .tox/
 54 | .nox/
 55 | .coverage
 56 | .coverage.*
 57 | .cache
 58 | nosetests.xml
 59 | coverage.xml
 60 | *.cover
 61 | *.py,cover
 62 | .hypothesis/
 63 | .pytest_cache/
 64 | 
 65 | # Translations
 66 | *.mo
 67 | *.pot
 68 | 
 69 | # Django stuff:
 70 | *.log
 71 | local_settings.py
 72 | db.sqlite3
 73 | db.sqlite3-journal
 74 | 
 75 | # Flask stuff:
 76 | instance/
 77 | .webassets-cache
 78 | 
 79 | # Scrapy stuff:
 80 | .scrapy
 81 | 
 82 | # Sphinx documentation
 83 | docs/_build/
 84 | 
 85 | # PyBuilder
 86 | target/
 87 | 
 88 | # Jupyter Notebook
 89 | .ipynb_checkpoints
 90 | 
 91 | # IPython
 92 | profile_default/
 93 | ipython_config.py
 94 | 
 95 | # pyenv
 96 | .python-version
 97 | 
 98 | # pipenv
 99 | #   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
100 | #   However, in case of collaboration, if having platform-specific dependencies or dependencies
101 | #   having no cross-platform support, pipenv may install dependencies that don't work, or not
102 | #   install all needed dependencies.
103 | #Pipfile.lock
104 | 
105 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow
106 | __pypackages__/
107 | 
108 | # Celery stuff
109 | celerybeat-schedule
110 | celerybeat.pid
111 | 
112 | # SageMath parsed files
113 | *.sage.py
114 | 
115 | # Environments
116 | .env
117 | .venv
118 | env/
119 | venv/
120 | ENV/
121 | env.bak/
122 | venv.bak/
123 | 
124 | # Spyder project settings
125 | .spyderproject
126 | .spyproject
127 | 
128 | # Rope project settings
129 | .ropeproject
130 | 
131 | # mkdocs documentation
132 | /site
133 | 
134 | # mypy
135 | .mypy_cache/
136 | .dmypy.json
137 | dmypy.json
138 | 
139 | # Pyre type checker
140 | .pyre/
141 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2020 JiaQi Xu
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
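  1 | ## SSD: A PyTorch Implementation of the Single-Shot MultiBox Detector Object Detection Model
  2 | ---
  3 | 
  4 | ## Contents
  5 | 1. [Top News](#top-news)
  6 | 2. [Performance](#performance)
  7 | 3. [Environment](#environment)
  8 | 4. [Download](#download)
  9 | 5. [How2train](#how2train)
 10 | 6. [How2predict](#how2predict)
 11 | 7. [How2eval](#how2eval)
 12 | 8. [Reference](#reference)
 13 | 
 14 | ## Top News
 15 | **`2022-03`**: **Major update: added support for step and cosine learning-rate decay, a choice of Adam or SGD optimizers, automatic learning-rate adjustment based on batch_size, and image cropping.**  
 16 | The original repository used in the BiliBili videos is at: https://github.com/bubbliiiing/ssd-pytorch/tree/bilibili
 17 | 
 18 | **`2021-10`**: **Major update: added the option of a mobilenetv2 backbone, many more comments, and many more adjustable parameters; restructured the code modules; added FPS testing, video prediction, and batch prediction.**   
 19 | 
 20 | ## Performance
 21 | | Training dataset | Weights file | Test dataset | Input image size | mAP 0.5:0.95 | mAP 0.5 |
 22 | | :-----: | :-----: | :------: | :------: | :------: | :-----: |
 23 | | VOC07+12 | [ssd_weights.pth](https://github.com/bubbliiiing/ssd-pytorch/releases/download/v1.0/ssd_weights.pth) | VOC-Test07 | 300x300 | - | 78.55 |
 24 | | VOC07+12 | [mobilenetv2_ssd_weights.pth](https://github.com/bubbliiiing/ssd-pytorch/releases/download/v1.0/mobilenetv2_ssd_weights.pth) | VOC-Test07 | 300x300 | - | 71.32 |
 25 | 
 26 | ## Environment
 27 | torch == 1.2.0
 28 | 
 29 | ## Download
 30 | The ssd_weights.pth file and the backbone weights needed for training can be downloaded from Baidu Netdisk.  
 31 | Link: https://pan.baidu.com/s/1iUVE50oLkzqhtZbUL9el9w     
 32 | Extraction code: jgn8     
 33 | 
 34 | The VOC dataset can be downloaded from the link below. It already contains the training, test, and validation sets (the validation set is the same as the test set), so no further splitting is needed:  
 35 | Link: https://pan.baidu.com/s/1-1Ej6dayrx3g0iAA88uY5A    
 36 | Extraction code: ph32   
 37 | 
 38 | ## How2train
 39 | ### a. Training on the VOC07+12 dataset
 40 | 1. Dataset preparation   
 41 | **This project trains on data in VOC format. Before training, download the VOC07+12 dataset, unzip it, and place it in the root directory.**  
 42 | 
 43 | 2. Dataset processing   
 44 | Set annotation_mode=2 in voc_annotation.py, then run voc_annotation.py to generate 2007_train.txt and 2007_val.txt in the root directory.   
 45 | 
 46 | 3. Start training   
 47 | The default parameters in train.py are set up for the VOC dataset, so running train.py directly starts training.   
 48 | 
 49 | 4. Prediction with the trained model   
 50 | Prediction uses two files, ssd.py and predict.py. First modify model_path and classes_path in ssd.py; both parameters must be changed.   
 51 | **model_path points to the trained weights file in the logs folder.   
 52 | classes_path points to the txt file that lists the detection classes.**   
 53 | After making these changes, run predict.py and enter an image path when prompted to run detection.   
 54 | 
 55 | ### b. Training on your own dataset
 56 | 1. Dataset preparation  
 57 | **This project trains on data in VOC format, so you need to prepare your own dataset in that format before training.**    
 58 | Before training, place the annotation files in the Annotations folder under VOCdevkit/VOC2007.   
 59 | Before training, place the image files in the JPEGImages folder under VOCdevkit/VOC2007.   
 60 | 
 61 | 2. Dataset processing  
 62 | Once the dataset is in place, use voc_annotation.py to generate the 2007_train.txt and 2007_val.txt files used for training.   
 63 | Modify the parameters in voc_annotation.py. For a first training run you only need to change classes_path, which points to the txt file that lists the detection classes.   
 64 | When training on your own dataset, you can create a cls_classes.txt that lists the classes you want to distinguish.   
 65 | The contents of model_data/cls_classes.txt look like:      
 66 | ```python
 67 | cat
 68 | dog
 69 | ...
 70 | ```
 71 | Set classes_path in voc_annotation.py to point to cls_classes.txt, then run voc_annotation.py.  
 72 | 
 73 | 3. Start training  
 74 | **There are many training parameters, all in train.py; read the comments carefully after downloading the repository. The most important one is again classes_path in train.py.**  
 75 | **classes_path points to the txt file that lists the detection classes, the same txt used in voc_annotation.py. It must be changed when training on your own dataset!**  
 76 | After changing classes_path, run train.py to start training. After several epochs, the weights are saved in the logs folder.  
 77 | 
 78 | 4. Prediction with the trained model  
 79 | Prediction uses two files, ssd.py and predict.py. Modify model_path and classes_path in ssd.py.  
 80 | **model_path points to the trained weights file in the logs folder.  
 81 | classes_path points to the txt file that lists the detection classes.**  
 82 | After making these changes, run predict.py and enter an image path when prompted to run detection.  
 83 | 
 84 | ## How2predict
 85 | ### a. Using the pretrained weights
 86 | 1. After downloading and unzipping the repository, download the weights from Baidu Netdisk, place them in model_data, run predict.py, and enter  
 87 | ```python
 88 | img/street.jpg
 89 | ```
 90 | 2. Settings in predict.py enable FPS testing and video detection.  
 91 | ### b. Using your own trained weights
 92 | 1. Train the model by following the training steps.  
 93 | 2. In ssd.py, modify model_path and classes_path in the section below so that they match your trained files; **model_path is the weights file in the logs folder, and classes_path lists the classes that the weights in model_path were trained on**.  
 94 | ```python
 95 | _defaults = {
 96 |     #--------------------------------------------------------------------------#
 97 |     #   To predict with your own trained model, you must change model_path and classes_path!
 98 |     #   model_path points to the weights file in the logs folder; classes_path points to the txt file in model_data
 99 |     #   If a shape mismatch occurs, also check the model_path and classes_path settings used during training
100 |     #--------------------------------------------------------------------------#
101 |     "model_path"        : 'model_data/ssd_weights.pth',
102 |     "classes_path"      : 'model_data/voc_classes.txt',
103 |     #---------------------------------------------------------------------#
104 |     #   Image size used for prediction; use the same size as in training
105 |     #---------------------------------------------------------------------#
106 |     "input_shape"       : [300, 300],
107 |     #-------------------------------#
108 |     #   Backbone selection:
109 |     #   vgg or mobilenetv2
110 |     #-------------------------------#
111 |     "backbone"          : "vgg",
112 |     #---------------------------------------------------------------------#
113 |     #   Only predicted boxes with a score above confidence are kept
114 |     #---------------------------------------------------------------------#
115 |     "confidence"        : 0.5,
116 |     #---------------------------------------------------------------------#
117 |     #   IoU threshold (nms_iou) used for non-maximum suppression
118 |     #---------------------------------------------------------------------#
119 |     "nms_iou"           : 0.45,
120 |     #---------------------------------------------------------------------#
121 |     #   Sizes of the prior boxes (anchors)
122 |     #---------------------------------------------------------------------#
123 |     'anchors_size'      : [30, 60, 111, 162, 213, 264, 315],
124 |     #---------------------------------------------------------------------#
125 |     #   Controls whether letterbox_image is used to resize the input image without distortion.
126 |     #   After repeated testing, disabling letterbox_image and resizing directly gave better results
127 |     #---------------------------------------------------------------------#
128 |     "letterbox_image"   : False,
129 |     #-------------------------------#
130 |     #   Whether to use CUDA
131 |     #   Set to False if no GPU is available
132 |     #-------------------------------#
133 |     "cuda"              : True,
134 | }
135 | ```
136 | 3. Run predict.py and enter  
137 | ```python
138 | img/street.jpg
139 | ```
140 | 4. Settings in predict.py enable FPS testing and video detection.  
141 | 
142 | ## How2eval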
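143 | ### a. Evaluating on the VOC07+12 test set
144 | 1. This project evaluates on data in VOC format. The VOC07+12 test set is already split off, so there is no need to use voc_annotation.py to generate the txt files under the ImageSets folder.
145 | 2. Modify model_path and classes_path in ssd.py. **model_path points to the trained weights file in the logs folder. classes_path points to the txt file that lists the detection classes.**  
146 | 3. Run get_map.py to obtain the evaluation results, which are saved in the map_out folder.
147 | 
148 | ### b. Evaluating on your own dataset
149 | 1. This project evaluates on data in VOC format.  
150 | 2. If voc_annotation.py was run before training, the code has already split the dataset into training, validation, and test sets. To change the proportion of the test set, modify trainval_percent in voc_annotation.py. trainval_percent sets the ratio of (training set + validation set) to test set; by default (training set + validation set):test set = 9:1. train_percent sets the ratio of training set to validation set within (training set + validation set); by default training set:validation set = 9:1.
151 | 3. After splitting off the test set with voc_annotation.py, modify classes_path in get_map.py. classes_path points to the txt file that lists the detection classes, the same txt used in training. It must be changed when evaluating on your own dataset.
152 | 4. Modify model_path and classes_path in ssd.py. **model_path points to the trained weights file in the logs folder. classes_path points to the txt file that lists the detection classes.**  
153 | 5. Run get_map.py to obtain the evaluation results, which are saved in the map_out folder.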
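
The two split ratios, trainval_percent and train_percent, can be illustrated with a short sketch. This is not the code in voc_annotation.py (which is not shown here); the function name `split_ids`, the `seed` argument, and the returned dictionary layout are assumptions made for illustration only.

```python
import random


def split_ids(image_ids, trainval_percent=0.9, train_percent=0.9, seed=0):
    """Split image ids into train / val / test lists (hypothetical helper).

    trainval_percent: fraction of all ids kept for training + validation.
    train_percent:    fraction of the train+val ids used for training.
    """
    rng = random.Random(seed)
    image_ids = list(image_ids)
    indices = list(range(len(image_ids)))

    num_trainval = int(len(indices) * trainval_percent)
    num_train = int(num_trainval * train_percent)

    # First pick the train+val indices, then pick the training subset of those.
    trainval = rng.sample(indices, num_trainval)
    train = set(rng.sample(trainval, num_train))
    trainval = set(trainval)

    split = {"train": [], "val": [], "test": []}
    for i, image_id in enumerate(image_ids):
        if i in train:
            split["train"].append(image_id)
        elif i in trainval:
            split["val"].append(image_id)
        else:
            split["test"].append(image_id)
    return split


if __name__ == "__main__":
    parts = split_ids(["%06d" % i for i in range(1000)])
    print({name: len(ids) for name, ids in parts.items()})
```

With the default 9:1 ratios and 1000 images, this yields 810 training, 90 validation, and 100 test ids.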
154 | 
155 | ## Reference
156 | https://github.com/pierluigiferrari/ssd_keras  
157 | https://github.com/kuhung/SSD_keras  
158 | 
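
The `_defaults` dictionary in ssd.py, together with the call `SSD(confidence = confidence, nms_iou = nms_iou)` in get_map.py, suggests a pattern in which keyword arguments override default settings. The sketch below shows one way such a pattern can work. `DetectorConfig` is a hypothetical class, not the repository's `SSD`, and rejecting unknown keys is a design choice of this sketch rather than documented behavior.

```python
import copy


class DetectorConfig:
    """Hypothetical settings holder: class-level defaults, per-instance overrides."""

    _defaults = {
        "model_path": "model_data/ssd_weights.pth",
        "classes_path": "model_data/voc_classes.txt",
        "input_shape": [300, 300],
        "backbone": "vgg",
        "confidence": 0.5,
        "nms_iou": 0.45,
    }

    def __init__(self, **kwargs):
        # Reject misspelled keys early instead of silently ignoring them
        # (an assumption of this sketch).
        unknown = set(kwargs) - set(self._defaults)
        if unknown:
            raise ValueError("unknown settings: %s" % sorted(unknown))
        # Copy the defaults so mutable values are not shared between instances.
        settings = copy.deepcopy(self._defaults)
        settings.update(kwargs)
        for name, value in settings.items():
            setattr(self, name, value)


if __name__ == "__main__":
    # Evaluation-style settings: a low confidence keeps nearly all boxes.
    cfg = DetectorConfig(confidence=0.02, nms_iou=0.5)
    print(cfg.confidence, cfg.nms_iou, cfg.backbone)
```

Keeping the defaults in one dictionary means a prediction script can use them unchanged, while an evaluation script overrides only the values that differ.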


--------------------------------------------------------------------------------
/VOCdevkit/VOC2007/Annotations/README.md:
--------------------------------------------------------------------------------
1 | Place the annotation (label) files here.


--------------------------------------------------------------------------------
/VOCdevkit/VOC2007/ImageSets/Main/README.md:
--------------------------------------------------------------------------------
1 | Place the training index files here.


--------------------------------------------------------------------------------
/VOCdevkit/VOC2007/JPEGImages/README.md:
--------------------------------------------------------------------------------
1 | Place the image files here.


--------------------------------------------------------------------------------
/get_map.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | import xml.etree.ElementTree as ET
  3 | 
  4 | from PIL import Image
  5 | from tqdm import tqdm
  6 | 
  7 | from utils.utils import get_classes
  8 | from utils.utils_map import get_coco_map, get_map
  9 | from ssd import SSD
 10 | 
 11 | if __name__ == "__main__":
 12 |     '''
 13 |     Unlike AP, Recall and Precision are not area-based measures, so their values change with the confidence threshold.
 14 |     By default, the Recall and Precision computed by this code are the values at a confidence threshold of 0.5.
 15 | 
 16 |     Because of how mAP is computed, the network must output nearly all predicted boxes so that Recall and Precision can be computed at different thresholds.
 17 |     As a result, the txt files in map_out/detection-results/ generally contain more boxes than a direct predict run; the goal is to list every possible predicted box.
 18 |     '''
 19 |     #------------------------------------------------------------------------------------------------------------------#
 20 |     #   map_mode specifies what this script computes when run.
 21 |     #   map_mode = 0: the whole mAP pipeline (get predictions, get ground-truth boxes, compute VOC_map).
 22 |     #   map_mode = 1: only get the prediction results.
 23 |     #   map_mode = 2: only get the ground-truth boxes.
 24 |     #   map_mode = 3: only compute VOC_map.
 25 |     #   map_mode = 4: compute the 0.50:0.95 mAP of the current dataset with the COCO toolbox. Requires the prediction results and ground-truth boxes to exist and pycocotools to be installed.
 26 |     #-------------------------------------------------------------------------------------------------------------------#
 27 |     map_mode        = 0
 28 |     #--------------------------------------------------------------------------------------#
 29 |     #   classes_path specifies the classes for which VOC_map is measured.
 30 |     #   It should normally match the classes_path used for training and prediction.
 31 |     #--------------------------------------------------------------------------------------#
 32 |     classes_path    = 'model_data/voc_classes.txt'
 33 |     #--------------------------------------------------------------------------------------#
 34 |     #   MINOVERLAP specifies which mAP0.x to compute; look up the meaning of mAP0.x if it is unfamiliar.
 35 |     #   For example, to compute mAP0.75, set MINOVERLAP = 0.75.
 36 |     #
 37 |     #   A predicted box whose overlap with a ground-truth box exceeds MINOVERLAP counts as a positive sample; otherwise it is a negative sample.
 38 |     #   So the larger MINOVERLAP is, the more accurate a predicted box must be to count as positive, and the lower the resulting mAP.
 39 |     #--------------------------------------------------------------------------------------#
 40 |     MINOVERLAP      = 0.5
 41 |     #--------------------------------------------------------------------------------------#
 42 |     #   Because of how mAP is computed, the network must output nearly all predicted boxes.
 43 |     #   The confidence value should therefore be set as low as possible to obtain every possible predicted box.
 44 |     #   
 45 |     #   This value is normally left unchanged; since computing mAP needs nearly all predicted boxes, do not change confidence casually.
 46 |     #   To get Recall and Precision at a different threshold, change score_threhold below.
 47 |     #--------------------------------------------------------------------------------------#
 48 |     confidence      = 0.02
 49 |     #--------------------------------------------------------------------------------------#
 50 |     #   IoU threshold used for non-maximum suppression during prediction; a larger value makes suppression less strict.
 51 |     #   
 52 |     #   This value is normally left unchanged.
 53 |     #--------------------------------------------------------------------------------------#
 54 |     nms_iou         = 0.5
 55 |     #---------------------------------------------------------------------------------------------------------------#
 56 |     #   Unlike AP, Recall and Precision are not area-based measures, so their values change with the threshold.
 57 |     #   
 58 |     #   By default, the Recall and Precision computed by this code are the values at a threshold of 0.5 (defined here as score_threhold).
 59 |     #   Since computing mAP needs nearly all predicted boxes, the confidence defined above should not be changed casually.
 60 |     #   score_threhold is defined separately as the threshold at which Recall and Precision are reported during the mAP computation.
 61 |     #---------------------------------------------------------------------------------------------------------------#
 62 |     score_threhold  = 0.5
 63 |     #-------------------------------------------------------#
 64 |     #   map_vis specifies whether to enable visualization of the VOC_map computation
 65 |     #-------------------------------------------------------#
 66 |     map_vis         = False
 67 |     #-------------------------------------------------------#
 68 |     #   Path to the folder containing the VOC dataset
 69 |     #   Defaults to the VOC dataset in the root directory
 70 |     #-------------------------------------------------------#
 71 |     VOCdevkit_path  = 'VOCdevkit'
 72 |     #-------------------------------------------------------#
 73 |     #   Output folder for the results; defaults to map_out
 74 |     #-------------------------------------------------------#
 75 |     map_out_path    = 'map_out'
 76 | 
 77 |     image_ids = open(os.path.join(VOCdevkit_path, "VOC2007/ImageSets/Main/test.txt")).read().strip().split()
 78 | 
 79 |     if not os.path.exists(map_out_path):
 80 |         os.makedirs(map_out_path)
 81 |     if not os.path.exists(os.path.join(map_out_path, 'ground-truth')):
 82 |         os.makedirs(os.path.join(map_out_path, 'ground-truth'))
 83 |     if not os.path.exists(os.path.join(map_out_path, 'detection-results')):
 84 |         os.makedirs(os.path.join(map_out_path, 'detection-results'))
 85 |     if not os.path.exists(os.path.join(map_out_path, 'images-optional')):
 86 |         os.makedirs(os.path.join(map_out_path, 'images-optional'))
 87 | 
 88 |     class_names, _ = get_classes(classes_path)
 89 | 
 90 |     if map_mode == 0 or map_mode == 1:
 91 |         print("Load model.")
 92 |         ssd = SSD(confidence = confidence, nms_iou = nms_iou)
 93 |         print("Load model done.")
 94 | 
 95 |         print("Get predict result.")
 96 |         for image_id in tqdm(image_ids):
 97 |             image_path  = os.path.join(VOCdevkit_path, "VOC2007/JPEGImages/"+image_id+".jpg")
 98 |             image       = Image.open(image_path)
 99 |             if map_vis:
100 |                 image.save(os.path.join(map_out_path, "images-optional/" + image_id + ".jpg"))
101 |             ssd.get_map_txt(image_id, image, class_names, map_out_path)
102 |         print("Get predict result done.")
103 |         
104 |     if map_mode == 0 or map_mode == 2:
105 |         print("Get ground truth result.")
106 |         for image_id in tqdm(image_ids):
107 |             with open(os.path.join(map_out_path, "ground-truth/"+image_id+".txt"), "w") as new_f:
108 |                 root = ET.parse(os.path.join(VOCdevkit_path, "VOC2007/Annotations/"+image_id+".xml")).getroot()
109 |                 for obj in root.findall('object'):
110 |                     difficult_flag = False
111 |                     if obj.find('difficult') is not None:
112 |                         difficult = obj.find('difficult').text
113 |                         if int(difficult)==1:
114 |                             difficult_flag = True
115 |                     obj_name = obj.find('name').text
116 |                     if obj_name not in class_names:
117 |                         continue
118 |                     bndbox  = obj.find('bndbox')
119 |                     left    = bndbox.find('xmin').text
120 |                     top     = bndbox.find('ymin').text
121 |                     right   = bndbox.find('xmax').text
122 |                     bottom  = bndbox.find('ymax').text
123 | 
124 |                     if difficult_flag:
125 |                         new_f.write("%s %s %s %s %s difficult\n" % (obj_name, left, top, right, bottom))
126 |                     else:
127 |                         new_f.write("%s %s %s %s %s\n" % (obj_name, left, top, right, bottom))
128 |         print("Get ground truth result done.")
129 | 
130 |     if map_mode == 0 or map_mode == 3:
131 |         print("Get map.")
132 |         get_map(MINOVERLAP, True, score_threhold = score_threhold, path = map_out_path)
133 |         print("Get map done.")
134 | 
135 |     if map_mode == 4:
136 |         print("Get map.")
137 |         get_coco_map(class_names = class_names, path = map_out_path)
138 |         print("Get map done.")
139 | 
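
The MINOVERLAP rule described in the comments above (a predicted box counts as a positive when its overlap with a ground-truth box exceeds the threshold) can be sketched as follows. This is a minimal illustration that assumes boxes in the `(left, top, right, bottom)` order written to the ground-truth txt files and treats coordinates as continuous values; the repository's utils_map.py is not shown here and may compute the overlap differently. The names `box_iou` and `is_positive` are hypothetical.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_positive(pred_box, gt_box, min_overlap=0.5):
    """A prediction is a positive sample when its IoU exceeds min_overlap."""
    return box_iou(pred_box, gt_box) > min_overlap


if __name__ == "__main__":
    gt = (0, 0, 10, 10)
    pred = (5, 0, 15, 10)  # overlaps half of the ground-truth box
    print(box_iou(pred, gt))            # intersection 50, union 150
    print(is_positive(pred, gt, 0.5))   # IoU of 1/3 is below 0.5
    print(is_positive(pred, gt, 0.3))   # the same box passes a looser threshold
```

Raising `min_overlap` turns borderline matches like this one into negatives, which is why a larger MINOVERLAP lowers the computed mAP.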


--------------------------------------------------------------------------------
/img/street.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bubbliiiing/ssd-pytorch/fedc8a889556c554c5e9b71554e35bd7fa305e62/img/street.jpg


--------------------------------------------------------------------------------
/logs/README.md:
--------------------------------------------------------------------------------
1 | Trained models are saved here.


--------------------------------------------------------------------------------
/model_data/simhei.ttf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bubbliiiing/ssd-pytorch/fedc8a889556c554c5e9b71554e35bd7fa305e62/model_data/simhei.ttf


--------------------------------------------------------------------------------
/model_data/voc_classes.txt:
--------------------------------------------------------------------------------
 1 | aeroplane
 2 | bicycle
 3 | bird
 4 | boat
 5 | bottle
 6 | bus
 7 | car
 8 | cat
 9 | chair
10 | cow
11 | diningtable
12 | dog
13 | horse
14 | motorbike
15 | person
16 | pottedplant
17 | sheep
18 | sofa
19 | train
20 | tvmonitor
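
A class file in this one-name-per-line format can be read with a few lines of Python. The helper below is a hypothetical sketch, not the repository's `utils.utils.get_classes` (whose body is not shown here). It returns a two-element tuple because get_map.py unpacks one; using the class count as the second element is an assumption.

```python
import os
import tempfile


def load_class_names(classes_path):
    """Read one class name per line, skipping blank lines (hypothetical helper)."""
    with open(classes_path, encoding="utf-8") as f:
        class_names = [line.strip() for line in f if line.strip()]
    return class_names, len(class_names)


if __name__ == "__main__":
    # Write a small example file so the sketch runs without the repository.
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "cls_classes.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("cat\ndog\n")
        names, num_classes = load_class_names(path)
        print(names, num_classes)
```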


--------------------------------------------------------------------------------
/nets/__init__.py:
--------------------------------------------------------------------------------
1 | #


--------------------------------------------------------------------------------
/nets/mobilenetv2.py:
--------------------------------------------------------------------------------
  1 | from torch import nn
  2 | from torch.hub import load_state_dict_from_url
  3 | 
  4 | def _make_divisible(v, divisor, min_value=None):
  5 |     if min_value is None:
  6 |         min_value = divisor
  7 |     new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
  8 |     if new_v < 0.9 * v:
  9 |         new_v += divisor
 10 |     return new_v
 11 | 
 12 | class ConvBNReLU(nn.Sequential):
 13 |     def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1):
 14 |         padding = (kernel_size - 1) // 2
 15 |         super(ConvBNReLU, self).__init__(
 16 |             nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False),
 17 |             nn.BatchNorm2d(out_planes),
 18 |             nn.ReLU6(inplace=True)
 19 |         )
 20 |         self.out_channels = out_planes
 21 | 
 22 | class InvertedResidual(nn.Module):
 23 |     def __init__(self, inp, oup, stride, expand_ratio):
 24 |         super(InvertedResidual, self).__init__()
 25 |         self.stride = stride
 26 |         assert stride in [1, 2]
 27 | 
 28 |         hidden_dim = int(round(inp * expand_ratio))
 29 |         self.use_res_connect = self.stride == 1 and inp == oup
 30 | 
 31 |         layers = []
 32 |         if expand_ratio != 1:
 33 |             layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1))
 34 |         layers.extend([
 35 |             ConvBNReLU(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim),
 36 |             nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
 37 |             nn.BatchNorm2d(oup),
 38 |         ])
 39 |         self.conv = nn.Sequential(*layers)
 40 | 
 41 |         self.out_channels = oup
 42 | 
 43 |     def forward(self, x):
 44 |         if self.use_res_connect:
 45 |             return x + self.conv(x)
 46 |         else:
 47 |             return self.conv(x)
 48 | 
 49 | class MobileNetV2(nn.Module):
 50 |     def __init__(self, num_classes=1000, width_mult=1.0, inverted_residual_setting=None, round_nearest=8):
 51 |         super(MobileNetV2, self).__init__()
 52 |         block = InvertedResidual
 53 |         input_channel = 32
 54 |         last_channel = 1280
 55 | 
 56 |         if inverted_residual_setting is None:
 57 |             inverted_residual_setting = [
 58 |                 [1, 16, 1, 1],
 59 |                 [6, 24, 2, 2],
 60 |                 [6, 32, 3, 2],
 61 |                 [6, 64, 4, 2],
 62 |                 [6, 96, 3, 1],
 63 |                 [6, 160, 3, 2],
 64 |                 [6, 320, 1, 1],
 65 |             ]
 66 | 
 67 |         if len(inverted_residual_setting) == 0 or len(inverted_residual_setting[0]) != 4:
 68 |             raise ValueError("inverted_residual_setting should be non-empty "
 69 |                              "or a 4-element list, got {}".format(inverted_residual_setting))
 70 | 
 71 |         input_channel = _make_divisible(input_channel * width_mult, round_nearest)
 72 |         self.last_channel = _make_divisible(last_channel * max(1.0, width_mult), round_nearest)
 73 |         features = [ConvBNReLU(3, input_channel, stride=2)]
 74 |         for t, c, n, s in inverted_residual_setting:
 75 |             output_channel = _make_divisible(c * width_mult, round_nearest)
 76 |             for i in range(n):
 77 |                 stride = s if i == 0 else 1
 78 |                 features.append(block(input_channel, output_channel, stride, expand_ratio=t))
 79 |                 input_channel = output_channel
 80 |         features.append(ConvBNReLU(input_channel, self.last_channel, kernel_size=1))
 81 |         self.features = nn.Sequential(*features)
 82 | 
 83 |         self.classifier = nn.Sequential(
 84 |             nn.Dropout(0.2),
 85 |             nn.Linear(self.last_channel, num_classes),
 86 |         )
 87 | 
 88 |         for m in self.modules():
 89 |             if isinstance(m, nn.Conv2d):
 90 |                 nn.init.kaiming_normal_(m.weight, mode='fan_out')
 91 |                 if m.bias is not None:
 92 |                     nn.init.zeros_(m.bias)
 93 |             elif isinstance(m, nn.BatchNorm2d):
 94 |                 nn.init.ones_(m.weight)
 95 |                 nn.init.zeros_(m.bias)
 96 |             elif isinstance(m, nn.Linear):
 97 |                 nn.init.normal_(m.weight, 0, 0.01)
 98 |                 nn.init.zeros_(m.bias)
 99 | 
100 |     def forward(self, x):
101 |         x = self.features(x)
102 |         x = x.mean([2, 3])
103 |         x = self.classifier(x)
104 |         return x
105 | 
106 | def mobilenet_v2(pretrained=False, progress=True, **kwargs):
107 |     model = MobileNetV2(**kwargs)
108 |     if pretrained:
109 |         state_dict = load_state_dict_from_url('https://download.pytorch.org/models/mobilenet_v2-b0353104.pth', model_dir="./model_data", progress=progress)
110 |         model.load_state_dict(state_dict)
111 |     del model.classifier
112 |     return model
113 | 
114 | if __name__ == "__main__":
115 |     net = mobilenet_v2()
116 |     for i, layer in enumerate(net.features):
117 |         print(i, layer)


--------------------------------------------------------------------------------
/nets/resnet.py:
--------------------------------------------------------------------------------
  1 | import math
  2 | 
  3 | import torch.nn as nn
  4 | import torch.utils.model_zoo as model_zoo
  5 | 
  6 | def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1):
  7 |     return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
  8 |                      padding=dilation, groups=groups, bias=False, dilation=dilation)
  9 | 
 10 | 
 11 | def conv1x1(in_planes, out_planes, stride=1):
 12 |     return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)
 13 | 
 14 | 
 15 | class BasicBlock(nn.Module):
 16 |     expansion = 1
 17 | 
 18 |     def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
 19 |                  base_width=64, dilation=1, norm_layer=None):
 20 |         super(BasicBlock, self).__init__()
 21 |         if norm_layer is None:
 22 |             norm_layer = nn.BatchNorm2d
 23 |         if groups != 1 or base_width != 64:
 24 |             raise ValueError('BasicBlock only supports groups=1 and base_width=64')
 25 |         if dilation > 1:
 26 |             raise NotImplementedError("Dilation > 1 not supported in BasicBlock")
 27 |         self.conv1 = conv3x3(inplanes, planes, stride)
 28 |         self.bn1 = norm_layer(planes)
 29 |         self.relu = nn.ReLU(inplace=True)
 30 |         self.conv2 = conv3x3(planes, planes)
 31 |         self.bn2 = norm_layer(planes)
 32 |         self.downsample = downsample
 33 |         self.stride = stride
 34 | 
 35 |     def forward(self, x):
 36 |         identity = x
 37 | 
 38 |         out = self.conv1(x)
 39 |         out = self.bn1(out)
 40 |         out = self.relu(out)
 41 | 
 42 |         out = self.conv2(out)
 43 |         out = self.bn2(out)
 44 | 
 45 |         if self.downsample is not None:
 46 |             identity = self.downsample(x)
 47 | 
 48 |         out += identity
 49 |         out = self.relu(out)
 50 | 
 51 |         return out
 52 | 
 53 | class Bottleneck(nn.Module):
 54 |     expansion = 4
 55 |     def __init__(self, inplanes, planes, stride=1, downsample=None, groups=1,
 56 |                  base_width=64, dilation=1, norm_layer=None):
 57 |         super(Bottleneck, self).__init__()
 58 |         if norm_layer is None:
 59 |             norm_layer = nn.BatchNorm2d
 60 |         width = int(planes * (base_width / 64.)) * groups
 61 |         # Reduce the number of channels with a 1x1 convolution
 62 |         self.conv1 = conv1x1(inplanes, width)
 63 |         self.bn1 = norm_layer(width)
 64 |         # Extract features with a 3x3 convolution
 65 |         self.conv2 = conv3x3(width, width, stride, groups, dilation)
 66 |         self.bn2 = norm_layer(width)
 67 |         # Increase the number of channels with a 1x1 convolution
 68 |         self.conv3 = conv1x1(width, planes * self.expansion)
 69 |         self.bn3 = norm_layer(planes * self.expansion)
 70 | 
 71 |         self.relu = nn.ReLU(inplace=True)
 72 |         self.downsample = downsample
 73 |         self.stride = stride
 74 | 
 75 |     def forward(self, x):
 76 |         identity = x
 77 | 
 78 |         out = self.conv1(x)
 79 |         out = self.bn1(out)
 80 |         out = self.relu(out)
 81 | 
 82 |         out = self.conv2(out)
 83 |         out = self.bn2(out)
 84 |         out = self.relu(out)
 85 | 
 86 |         out = self.conv3(out)
 87 |         out = self.bn3(out)
 88 | 
 89 |         if self.downsample is not None:
 90 |             identity = self.downsample(x)
 91 | 
 92 |         out += identity
 93 |         out = self.relu(out)
 94 | 
 95 |         return out
 96 | 
 97 | 
 98 | class ResNet(nn.Module):
 99 |     def __init__(self, block, layers, num_classes=1000):
100 |         #-----------------------------------------------------------#
101 |         #   Assume the input image is 600,600,3
102 |         #   and that resnet50 is being used
103 |         #-----------------------------------------------------------#
104 |         self.inplanes = 64
105 |         super(ResNet, self).__init__()
106 |         # 600,600,3 -> 300,300,64
107 |         self.conv1  = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
108 |         self.bn1    = nn.BatchNorm2d(64)
109 |         self.relu   = nn.ReLU(inplace=True)
110 |         # 300,300,64 -> 150,150,64
111 |         self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=0, ceil_mode=True) # change
112 |         # 150,150,64 -> 150,150,256
113 |         self.layer1 = self._make_layer(block, 64, layers[0])
114 |         # 150,150,256 -> 75,75,512
115 |         self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
116 |         # 75,75,512 -> 38,38,1024
117 |         self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
118 |         # 38,38,1024 -> 19,19,2048
119 |         self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
120 |         self.features = [self.conv1, self.bn1, self.relu, self.maxpool, self.layer1, self.layer2, self.layer3]
121 | 
122 |         self.avgpool = nn.AvgPool2d(7)
123 |         self.fc = nn.Linear(512 * block.expansion, num_classes)
124 | 
125 |         for m in self.modules():
126 |             if isinstance(m, nn.Conv2d):
127 |                 n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
128 |                 m.weight.data.normal_(0, math.sqrt(2. / n))
129 |             elif isinstance(m, nn.BatchNorm2d):
130 |                 m.weight.data.fill_(1)
131 |                 m.bias.data.zero_()
132 | 
133 |     def _make_layer(self, block, planes, blocks, stride=1):
134 |         downsample = None
135 |         if stride != 1 or self.inplanes != planes * block.expansion:
136 |             downsample = nn.Sequential(
137 |                 nn.Conv2d(self.inplanes, planes * block.expansion,
138 |                     kernel_size=1, stride=stride, bias=False),
139 |                 nn.BatchNorm2d(planes * block.expansion),
140 |             )
141 | 
142 |         layers = []
143 |         layers.append(block(self.inplanes, planes, stride, downsample))
144 |         self.inplanes = planes * block.expansion
145 |         for i in range(1, blocks):
146 |             layers.append(block(self.inplanes, planes))
147 | 
148 |         return nn.Sequential(*layers)
149 | 
150 |     def forward(self, x):
151 |         x = self.conv1(x)
152 |         x = self.bn1(x)
153 |         x = self.relu(x)
154 |         x = self.maxpool(x)
155 | 
156 |         x = self.layer1(x)
157 |         x = self.layer2(x)
158 |         x = self.layer3(x)
159 |         x = self.layer4(x)
160 | 
161 |         x = self.avgpool(x)
162 |         x = x.view(x.size(0), -1)
163 |         x = self.fc(x)
164 | 
165 |         return x
166 | 
167 | def resnet50(pretrained=False, **kwargs):
168 |     model = ResNet(Bottleneck, [3, 4, 6, 3], **kwargs)
169 |     if pretrained:
170 |         model.load_state_dict(model_zoo.load_url('https://s3.amazonaws.com/pytorch/models/resnet50-19c8e357.pth', model_dir='model_data'), strict=False)
171 |     
172 |     del model.avgpool
173 |     del model.fc
174 |     return model
175 | 
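
The shape comments in `ResNet.__init__` can be checked by hand. The sketch below is an illustration, not part of the repository: `conv_out` and `resnet50_sizes` are hypothetical helper names, and the layer hyperparameters are copied from the listing above (7x7 stride-2 stem, 3x3 stride-2 max pool with `ceil_mode=True`, and a stride-2 3x3 convolution with padding 1 at the start of `layer2` to `layer4`).

```python
import math


def conv_out(size, kernel, stride, padding=0, ceil_mode=False):
    """Output length of a conv/pool layer along one spatial axis."""
    span = (size + 2 * padding - kernel) / stride
    steps = math.ceil(span) if ceil_mode else math.floor(span)
    out = steps + 1
    # With ceil_mode, the last window must still start inside the
    # input (or its left padding); otherwise it is dropped.
    if ceil_mode and (out - 1) * stride >= size + padding:
        out -= 1
    return out


def resnet50_sizes(size):
    """Spatial side length after each stage (hypothetical helper)."""
    sizes = {}
    size = conv_out(size, kernel=7, stride=2, padding=3)
    sizes["conv1"] = size
    size = conv_out(size, kernel=3, stride=2, padding=0, ceil_mode=True)
    sizes["maxpool"] = size
    sizes["layer1"] = size  # layer1 uses stride 1, so the size is unchanged
    for name in ("layer2", "layer3", "layer4"):
        size = conv_out(size, kernel=3, stride=2, padding=1)
        sizes[name] = size
    return sizes


print(resnet50_sizes(600))  # the 600x600 case described in the comments
print(resnet50_sizes(300))  # the 300x300 input that SSD300 feeds in
```

With a 300x300 input, `layer2` comes out at 38x38 and `layer3` at 19x19, which is why `SSD300` takes its first two sources from those stages.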


--------------------------------------------------------------------------------
/nets/ssd.py:
--------------------------------------------------------------------------------
  1 | import torch
  2 | import torch.nn as nn
  3 | import torch.nn.functional as F
  4 | import torch.nn.init as init
  5 | 
  6 | from nets.mobilenetv2 import InvertedResidual, mobilenet_v2
  7 | from nets.vgg import vgg as add_vgg
  8 | from nets.resnet import resnet50
  9 | 
 10 | 
 11 | class L2Norm(nn.Module):
 12 |     def __init__(self,n_channels, scale):
 13 |         super(L2Norm,self).__init__()
 14 |         self.n_channels = n_channels
 15 |         self.gamma      = scale or None
 16 |         self.eps        = 1e-10
 17 |         self.weight     = nn.Parameter(torch.Tensor(self.n_channels))
 18 |         self.reset_parameters()
 19 | 
 20 |     def reset_parameters(self):
 21 |         init.constant_(self.weight,self.gamma)
 22 | 
 23 |     def forward(self, x):
 24 |         norm    = x.pow(2).sum(dim=1, keepdim=True).sqrt()+self.eps
 25 |         #x /= norm
 26 |         x       = torch.div(x,norm)
 27 |         out     = self.weight.unsqueeze(0).unsqueeze(2).unsqueeze(3).expand_as(x) * x
 28 |         return out
 29 | 
 30 | def add_extras(in_channels, backbone_name):
 31 |     layers = []
 32 |     if backbone_name == 'mobilenetv2':
 33 |         layers += [InvertedResidual(in_channels, 512, stride=2, expand_ratio=0.2)]
 34 |         layers += [InvertedResidual(512, 256, stride=2, expand_ratio=0.25)]
 35 |         layers += [InvertedResidual(256, 256, stride=2, expand_ratio=0.5)]
 36 |         layers += [InvertedResidual(256, 64, stride=2, expand_ratio=0.25)]
 37 |     else:
 38 |         # Block 6
 39 |         # 19,19,1024 -> 19,19,256 -> 10,10,512
 40 |         layers += [nn.Conv2d(in_channels, 256, kernel_size=1, stride=1)]
 41 |         layers += [nn.Conv2d(256, 512, kernel_size=3, stride=2, padding=1)]
 42 | 
 43 |         # Block 7
 44 |         # 10,10,512 -> 10,10,128 -> 5,5,256
 45 |         layers += [nn.Conv2d(512, 128, kernel_size=1, stride=1)]
 46 |         layers += [nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1)]
 47 | 
 48 |         # Block 8
 49 |         # 5,5,256 -> 5,5,128 -> 3,3,256
 50 |         layers += [nn.Conv2d(256, 128, kernel_size=1, stride=1)]
 51 |         layers += [nn.Conv2d(128, 256, kernel_size=3, stride=1)]
 52 |         
 53 |         # Block 9
 54 |         # 3,3,256 -> 3,3,128 -> 1,1,256
 55 |         layers += [nn.Conv2d(256, 128, kernel_size=1, stride=1)]
 56 |         layers += [nn.Conv2d(128, 256, kernel_size=3, stride=1)]
 57 | 
 58 |     return nn.ModuleList(layers)
 59 | 
 60 | class SSD300(nn.Module):
 61 |     def __init__(self, num_classes, backbone_name, pretrained = False):
 62 |         super(SSD300, self).__init__()
 63 |         self.num_classes    = num_classes
 64 |         if backbone_name    == "vgg":
 65 |             self.vgg        = add_vgg(pretrained)
 66 |             self.extras     = add_extras(1024, backbone_name)
 67 |             self.L2Norm     = L2Norm(512, 20)
 68 |             mbox            = [4, 6, 6, 6, 4, 4]
 69 |             
 70 |             loc_layers      = []
 71 |             conf_layers     = []
 72 |             backbone_source = [21, -2]
 73 |             #---------------------------------------------------#
 74 |             #   Among the layers returned by add_vgg,
 75 |             #   layer 21 and layer -2 feed the regression and classification heads.
 76 |             #   They are conv4-3 (38,38,512) and conv7 (19,19,1024).
 77 |             #---------------------------------------------------#
 78 |             for k, v in enumerate(backbone_source):
 79 |                 loc_layers  += [nn.Conv2d(self.vgg[v].out_channels, mbox[k] * 4, kernel_size = 3, padding = 1)]
 80 |                 conf_layers += [nn.Conv2d(self.vgg[v].out_channels, mbox[k] * num_classes, kernel_size = 3, padding = 1)]
 81 |             #-------------------------------------------------------------#
 82 |             #   Among the layers returned by add_extras,
 83 |             #   layers 1, 3, 5 and 7 feed the regression and classification heads.
 84 |             #   Their shapes are (10,10,512), (5,5,256), (3,3,256), (1,1,256).
 85 |             #-------------------------------------------------------------#  
 86 |             for k, v in enumerate(self.extras[1::2], 2):
 87 |                 loc_layers  += [nn.Conv2d(v.out_channels, mbox[k] * 4, kernel_size = 3, padding = 1)]
 88 |                 conf_layers += [nn.Conv2d(v.out_channels, mbox[k] * num_classes, kernel_size = 3, padding = 1)]
 89 |         elif backbone_name == "mobilenetv2":
 90 |             self.mobilenet  = mobilenet_v2(pretrained).features
 91 |             self.extras     = add_extras(1280, backbone_name)
 92 |             self.L2Norm     = L2Norm(96, 20)
 93 |             mbox            = [6, 6, 6, 6, 6, 6]
 94 | 
 95 |             loc_layers      = []
 96 |             conf_layers     = []
 97 |             backbone_source = [13, -1]
 98 |             for k, v in enumerate(backbone_source):
 99 |                 loc_layers  += [nn.Conv2d(self.mobilenet[v].out_channels, mbox[k] * 4, kernel_size = 3, padding = 1)]
100 |                 conf_layers += [nn.Conv2d(self.mobilenet[v].out_channels, mbox[k] * num_classes, kernel_size = 3, padding = 1)]
101 |             for k, v in enumerate(self.extras, 2):
102 |                 loc_layers  += [nn.Conv2d(v.out_channels, mbox[k] * 4, kernel_size = 3, padding = 1)]
103 |                 conf_layers += [nn.Conv2d(v.out_channels, mbox[k] * num_classes, kernel_size = 3, padding = 1)]
104 |         elif backbone_name == "resnet50":
105 |             self.resnet     = nn.Sequential(*resnet50(pretrained).features)
106 |             self.extras     = add_extras(1024, backbone_name)
107 |             self.L2Norm     = L2Norm(512, 20)
108 |             mbox            = [4, 6, 6, 6, 4, 4]
109 |             
110 |             loc_layers      = []
111 |             conf_layers     = []
112 |             out_channels    = [512, 1024]
113 |             #---------------------------------------------------#
114 |             #   Among the resnet50 feature layers,
115 |             #   the outputs of layer2 and layer3 feed the regression and classification heads.
116 |             #---------------------------------------------------#
117 |             for k, v in enumerate(out_channels):
118 |                 loc_layers  += [nn.Conv2d(out_channels[k], mbox[k] * 4, kernel_size = 3, padding = 1)]
119 |                 conf_layers += [nn.Conv2d(out_channels[k], mbox[k] * num_classes, kernel_size = 3, padding = 1)]
120 |             #-------------------------------------------------------------#
121 |             #   Among the layers returned by add_extras,
122 |             #   layers 1, 3, 5 and 7 feed the regression and classification heads.
123 |             #   Their shapes are (10,10,512), (5,5,256), (3,3,256), (1,1,256).
124 |             #-------------------------------------------------------------#  
125 |             for k, v in enumerate(self.extras[1::2], 2):
126 |                 loc_layers  += [nn.Conv2d(v.out_channels, mbox[k] * 4, kernel_size = 3, padding = 1)]
127 |                 conf_layers += [nn.Conv2d(v.out_channels, mbox[k] * num_classes, kernel_size = 3, padding = 1)]
128 |         else:
129 |             raise ValueError("Unsupported backbone_name: use 'vgg', 'mobilenetv2' or 'resnet50'.")
130 | 
131 |         self.loc            = nn.ModuleList(loc_layers)
132 |         self.conf           = nn.ModuleList(conf_layers)
133 |         self.backbone_name  = backbone_name
134 |         
135 |     def forward(self, x):
136 |         #---------------------------#
137 |         #   x is 300,300,3
138 |         #---------------------------#
139 |         sources = list()
140 |         loc     = list()
141 |         conf    = list()
142 | 
143 |         #---------------------------#
144 |         #   Get the conv4_3 output
145 |         #   shape is 38,38,512
146 |         #---------------------------#
147 |         if self.backbone_name == "vgg":
148 |             for k in range(23):
149 |                 x = self.vgg[k](x)
150 |         elif self.backbone_name == "mobilenetv2":
151 |             for k in range(14):
152 |                 x = self.mobilenet[k](x)
153 |         elif self.backbone_name == "resnet50":
154 |             for k in range(6):
155 |                 x = self.resnet[k](x)
156 |         #---------------------------#
157 |         #   The conv4_3 output
158 |         #   needs L2 normalization
159 |         #---------------------------#
160 |         s = self.L2Norm(x)
161 |         sources.append(s)
162 | 
163 |         #---------------------------#
164 |         #   Get the conv7 output
165 |         #   shape is 19,19,1024
166 |         #---------------------------#
167 |         if self.backbone_name == "vgg":
168 |             for k in range(23, len(self.vgg)):
169 |                 x = self.vgg[k](x)
170 |         elif self.backbone_name == "mobilenetv2":
171 |             for k in range(14, len(self.mobilenet)):
172 |                 x = self.mobilenet[k](x)
173 |         elif self.backbone_name == "resnet50":
174 |             for k in range(6, len(self.resnet)):
175 |                 x = self.resnet[k](x)
176 | 
177 |         sources.append(x)
178 |         #-------------------------------------------------------------#
179 |         #   Among the layers returned by add_extras,
180 |         #   layers 1, 3, 5 and 7 feed the regression and classification heads.
181 |         #   Their shapes are (10,10,512), (5,5,256), (3,3,256), (1,1,256).
182 |         #-------------------------------------------------------------#      
183 |         for k, v in enumerate(self.extras):
184 |             x = F.relu(v(x), inplace=True)
185 |             if self.backbone_name == "vgg" or self.backbone_name == "resnet50":
186 |                 if k % 2 == 1:
187 |                     sources.append(x)
188 |             else:
189 |                 sources.append(x)
190 | 
191 |         #-------------------------------------------------------------#
192 |         #   Apply the regression and classification heads to the 6 feature layers
193 |         #-------------------------------------------------------------#      
194 |         for (x, l, c) in zip(sources, self.loc, self.conf):
195 |             loc.append(l(x).permute(0, 2, 3, 1).contiguous())
196 |             conf.append(c(x).permute(0, 2, 3, 1).contiguous())
197 | 
198 |         #-------------------------------------------------------------#
199 |         #   Flatten each output so they can be concatenated
200 |         #-------------------------------------------------------------#  
201 |         loc     = torch.cat([o.view(o.size(0), -1) for o in loc], 1)
202 |         conf    = torch.cat([o.view(o.size(0), -1) for o in conf], 1)
203 |         #-------------------------------------------------------------#
204 |         #   loc is reshaped to batch_size, num_anchors, 4
205 |         #   conf is reshaped to batch_size, num_anchors, self.num_classes
206 |         #-------------------------------------------------------------#     
207 |         output = (
208 |             loc.view(loc.size(0), -1, 4),
209 |             conf.view(conf.size(0), -1, self.num_classes),
210 |         )
211 |         return output
212 | 
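
The `mbox` list and the six feature-map shapes together determine how many anchors the network predicts. The sketch below is an illustration under stated assumptions: it uses the vgg/resnet50 `mbox` setting and the feature-map sizes quoted in the comments above, `num_classes = 21` is an assumed VOC-style value (20 classes plus background), and `count_anchors` and `head_channels` are hypothetical helper names.

```python
# Side length of the six source feature maps (from the comments above).
feature_sides = [38, 19, 10, 5, 3, 1]
# Anchors per cell for the vgg / resnet50 backbones.
mbox = [4, 6, 6, 6, 4, 4]
# Assumed value: 20 VOC classes plus background.
num_classes = 21


def count_anchors(sides, boxes_per_cell):
    """Total anchors: one set of boxes per cell of every feature map."""
    return sum(side * side * boxes for side, boxes in zip(sides, boxes_per_cell))


def head_channels(boxes_per_cell, classes):
    """(loc, conf) output channels of the 3x3 head on each feature map."""
    return [(boxes * 4, boxes * classes) for boxes in boxes_per_cell]


print(count_anchors(feature_sides, mbox))   # 8732
print(head_channels(mbox, num_classes)[0])  # (16, 84)
```

The total of 8732 is the `num_anchors` dimension that `forward` produces when it reshapes `loc` and `conf`, and it matches the `8732` quoted in the loss comments.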


--------------------------------------------------------------------------------
/nets/ssd_training.py:
--------------------------------------------------------------------------------
  1 | import math
  2 | from functools import partial
  3 | 
  4 | import torch
  5 | import torch.nn as nn
  6 | 
  7 | 
  8 | class MultiboxLoss(nn.Module):
  9 |     def __init__(self, num_classes, alpha=1.0, neg_pos_ratio=3.0,
 10 |                  background_label_id=0, negatives_for_hard=100.0):
 11 |         super(MultiboxLoss, self).__init__()
 12 |         self.num_classes, self.alpha = num_classes, alpha
 13 |         self.neg_pos_ratio = neg_pos_ratio
 14 |         if background_label_id != 0:
 15 |             raise Exception('Only 0 as background label id is supported')
 16 |         self.background_label_id = background_label_id
 17 |         self.negatives_for_hard = torch.FloatTensor([negatives_for_hard])[0]
 18 | 
 19 |     def _l1_smooth_loss(self, y_true, y_pred):
 20 |         abs_loss = torch.abs(y_true - y_pred)
 21 |         sq_loss = 0.5 * (y_true - y_pred)**2
 22 |         l1_loss = torch.where(abs_loss < 1.0, sq_loss, abs_loss - 0.5)
 23 |         return torch.sum(l1_loss, -1)
 24 | 
 25 |     def _softmax_loss(self, y_true, y_pred):
 26 |         y_pred = torch.clamp(y_pred, min = 1e-7)
 27 |         softmax_loss = -torch.sum(y_true * torch.log(y_pred),
 28 |                                       axis=-1)
 29 |         return softmax_loss
 30 | 
 31 |     def forward(self, y_true, y_pred):
 32 |         # --------------------------------------------- #
 33 |         #   y_true batch_size, 8732, 4 + self.num_classes + 1
 34 |         #   y_pred batch_size, 8732, 4 + self.num_classes
 35 |         # --------------------------------------------- #
 36 |         num_boxes       = y_true.size()[1]
 37 |         y_pred          = torch.cat([y_pred[0], nn.Softmax(-1)(y_pred[1])], dim = -1)
 38 | 
 39 |         # --------------------------------------------- #
 40 |         #   Classification loss
 41 |         #   batch_size,8732,21 -> batch_size,8732
 42 |         # --------------------------------------------- #
 43 |         conf_loss = self._softmax_loss(y_true[:, :, 4:-1], y_pred[:, :, 4:])
 44 |         
 45 |         # --------------------------------------------- #
 46 |         #   Box localization loss
 47 |         #   batch_size,8732,4 -> batch_size,8732
 48 |         # --------------------------------------------- #
 49 |         loc_loss = self._l1_smooth_loss(y_true[:, :, :4],
 50 |                                         y_pred[:, :, :4])
 51 | 
 52 |         # --------------------------------------------- #
 53 |         #   Loss over all positive anchors
 54 |         # --------------------------------------------- #
 55 |         pos_loc_loss = torch.sum(loc_loss * y_true[:, :, -1],
 56 |                                      axis=1)
 57 |         pos_conf_loss = torch.sum(conf_loss * y_true[:, :, -1],
 58 |                                       axis=1)
 59 | 
 60 |         # --------------------------------------------- #
 61 |         #   Number of positive anchors in each image
 62 |         #   num_pos     [batch_size,]
 63 |         # --------------------------------------------- #
 64 |         num_pos = torch.sum(y_true[:, :, -1], axis=-1)
 65 | 
 66 |         # --------------------------------------------- #
 67 |         #   Number of negative anchors in each image
 68 |         #   num_neg     [batch_size,]
 69 |         # --------------------------------------------- #
 70 |         num_neg = torch.min(self.neg_pos_ratio * num_pos, num_boxes - num_pos)
 71 |         # Mask of the images whose negative count is greater than 0
 72 |         pos_num_neg_mask = num_neg > 0
 73 |         # --------------------------------------------- #
 74 |         #   If no image in the batch has any positive anchor,
 75 |         #   fall back to selecting 100 anchors as negatives
 76 |         # --------------------------------------------- #
 77 |         has_min = torch.sum(pos_num_neg_mask)
 78 |         
 79 |         # --------------------------------------------- #
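 80 |         #   From here on, the code differs slightly from the version shown in the video.
 81 |         #   The earlier way of selecting negatives had some problems,
 82 |         #   so I refactored this part of the code.
 83 |         #   Compute the total number of negatives for the whole batch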
 84 |         # --------------------------------------------- #
 85 |         num_neg_batch = torch.sum(num_neg) if has_min > 0 else self.negatives_for_hard
 86 | 
 87 |         # --------------------------------------------- #
 88 |         #   Inspect the predictions: if an anchor contains no object
 89 |         #   but its predicted non-background probability is high,
 90 |         #   it is a hard negative
 91 |         # --------------------------------------------- #
 92 |         confs_start = 4 + self.background_label_id + 1
 93 |         confs_end   = confs_start + self.num_classes - 1
 94 | 
 95 |         # --------------------------------------------- #
 96 |         #   batch_size,8732
 97 |         #   Sum the non-background probabilities; the larger the sum,
 98 |         #   the harder the anchor is to classify.
 99 |         # --------------------------------------------- #
100 |         max_confs = torch.sum(y_pred[:, :, confs_start:confs_end], dim=2)
101 | 
102 |         # --------------------------------------------------- #
103 |         #   Only anchors that contain no object are kept.
104 |         #   Across the whole batch, the num_neg_batch hardest anchors
105 |         #   are selected as negatives.
106 |         # --------------------------------------------------- #
107 |         max_confs   = (max_confs * (1 - y_true[:, :, -1])).view([-1])
108 | 
109 |         _, indices  = torch.topk(max_confs, k = int(num_neg_batch.item()))
110 | 
111 |         neg_conf_loss = torch.gather(conf_loss.view([-1]), 0, indices)
112 | 
113 |         # Normalize
114 |         num_pos     = torch.where(num_pos != 0, num_pos, torch.ones_like(num_pos))
115 |         total_loss  = torch.sum(pos_conf_loss) + torch.sum(neg_conf_loss) + torch.sum(self.alpha * pos_loc_loss)
116 |         total_loss  = total_loss / torch.sum(num_pos)
117 |         return total_loss
118 | 
119 | def weights_init(net, init_type='normal', init_gain=0.02):
120 |     def init_func(m):
121 |         classname = m.__class__.__name__
122 |         if hasattr(m, 'weight') and classname.find('Conv') != -1:
123 |             if init_type == 'normal':
124 |                 torch.nn.init.normal_(m.weight.data, 0.0, init_gain)
125 |             elif init_type == 'xavier':
126 |                 torch.nn.init.xavier_normal_(m.weight.data, gain=init_gain)
127 |             elif init_type == 'kaiming':
128 |                 torch.nn.init.kaiming_normal_(m.weight.data, a=0, mode='fan_in')
129 |             elif init_type == 'orthogonal':
130 |                 torch.nn.init.orthogonal_(m.weight.data, gain=init_gain)
131 |             else:
132 |                 raise NotImplementedError('initialization method [%s] is not implemented' % init_type)
133 |         elif classname.find('BatchNorm2d') != -1:
134 |             torch.nn.init.normal_(m.weight.data, 1.0, 0.02)
135 |             torch.nn.init.constant_(m.bias.data, 0.0)
136 |     print('initialize network with %s type' % init_type)
137 |     net.apply(init_func)
138 | 
139 | def get_lr_scheduler(lr_decay_type, lr, min_lr, total_iters, warmup_iters_ratio = 0.05, warmup_lr_ratio = 0.1, no_aug_iter_ratio = 0.05, step_num = 10):
140 |     def yolox_warm_cos_lr(lr, min_lr, total_iters, warmup_total_iters, warmup_lr_start, no_aug_iter, iters):
141 |         if iters <= warmup_total_iters:
142 |             # lr = (lr - warmup_lr_start) * iters / float(warmup_total_iters) + warmup_lr_start
143 |             lr = (lr - warmup_lr_start) * pow(iters / float(warmup_total_iters), 2) + warmup_lr_start
144 |         elif iters >= total_iters - no_aug_iter:
145 |             lr = min_lr
146 |         else:
147 |             lr = min_lr + 0.5 * (lr - min_lr) * (
148 |                 1.0 + math.cos(math.pi* (iters - warmup_total_iters) / (total_iters - warmup_total_iters - no_aug_iter))
149 |             )
150 |         return lr
151 | 
152 |     def step_lr(lr, decay_rate, step_size, iters):
153 |         if step_size < 1:
154 |             raise ValueError("step_size must be at least 1.")
155 |         n       = iters // step_size
156 |         out_lr  = lr * decay_rate ** n
157 |         return out_lr
158 | 
159 |     if lr_decay_type == "cos":
160 |         warmup_total_iters  = min(max(warmup_iters_ratio * total_iters, 1), 3)
161 |         warmup_lr_start     = max(warmup_lr_ratio * lr, 1e-6)
162 |         no_aug_iter         = min(max(no_aug_iter_ratio * total_iters, 1), 15)
163 |         func = partial(yolox_warm_cos_lr ,lr, min_lr, total_iters, warmup_total_iters, warmup_lr_start, no_aug_iter)
164 |     else:
165 |         decay_rate  = (min_lr / lr) ** (1 / (step_num - 1))
166 |         step_size   = total_iters / step_num
167 |         func = partial(step_lr, lr, decay_rate, step_size)
168 | 
169 |     return func
170 | 
171 | def set_optimizer_lr(optimizer, lr_scheduler_func, epoch):
172 |     lr = lr_scheduler_func(epoch)
173 |     for param_group in optimizer.param_groups:
174 |         param_group['lr'] = lr
175 | 

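For the non-cosine branch of `get_lr_scheduler`, the decay rate is chosen so that the last of the `step_num` steps lands on `min_lr`. The sketch below reproduces that arithmetic in isolation; `step_schedule` is a hypothetical name and the learning-rate values are illustrative assumptions, not defaults from the repository.

```python
import math


def step_schedule(lr, min_lr, total_iters, step_num=10):
    """Return a function epoch -> learning rate, using the step-decay formulas."""
    decay_rate = (min_lr / lr) ** (1 / (step_num - 1))
    step_size = total_iters / step_num

    def lr_at(epoch):
        # Integer number of completed steps at this epoch.
        n = epoch // step_size
        return lr * decay_rate ** n

    return lr_at


# Illustrative values (assumed): decay from 1e-2 to 1e-4 over 100 epochs.
lr_at = step_schedule(lr=1e-2, min_lr=1e-4, total_iters=100)
print(lr_at(0))    # initial learning rate
print(lr_at(99))   # approximately min_lr, reached on the last step
print(math.isclose(lr_at(99), 1e-4, rel_tol=1e-9))
```

Because there are `step_num - 1` decays between the first and the last step, the exponent `1 / (step_num - 1)` makes the final step equal `min_lr` up to floating-point error.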

--------------------------------------------------------------------------------
/nets/vgg.py:
--------------------------------------------------------------------------------
 1 | import torch.nn as nn
 2 | from torch.hub import load_state_dict_from_url
 3 | 
 4 | 
 5 | '''
 6 | This code builds the VGG backbone feature extractor.
 7 | The input image has 3 channels (in_channels = 3).
 8 | 
 9 | 300, 300, 3 -> 300, 300, 64 -> 300, 300, 64 -> 150, 150, 64 -> 150, 150, 128 -> 150, 150, 128 -> 75, 75, 128 ->
10 | 75, 75, 256 -> 75, 75, 256 -> 75, 75, 256 -> 38, 38, 256 -> 38, 38, 512 -> 38, 38, 512 -> 38, 38, 512 -> 19, 19, 512 ->
11 | 19, 19, 512 -> 19, 19, 512 -> 19, 19, 512 -> 19, 19, 512 -> 19, 19, 1024 -> 19, 19, 1024
12 | 
13 | The 38, 38, 512 output has index 22
14 | The 19, 19, 1024 output has index 34
15 | '''
16 | base = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'C', 512, 512, 512, 'M',
17 |             512, 512, 512]
18 | 
19 | def vgg(pretrained = False):
20 |     layers = []
21 |     in_channels = 3
22 |     for v in base:
23 |         if v == 'M':
24 |             layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
25 |         elif v == 'C':
26 |             layers += [nn.MaxPool2d(kernel_size=2, stride=2, ceil_mode=True)]
27 |         else:
28 |             conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
29 |             layers += [conv2d, nn.ReLU(inplace=True)]
30 |             in_channels = v
31 |     # 19, 19, 512 -> 19, 19, 512 
32 |     pool5 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
33 |     # 19, 19, 512 -> 19, 19, 1024
34 |     conv6 = nn.Conv2d(512, 1024, kernel_size=3, padding=6, dilation=6)
35 |     # 19, 19, 1024 -> 19, 19, 1024
36 |     conv7 = nn.Conv2d(1024, 1024, kernel_size=1)
37 |     layers += [pool5, conv6,
38 |                nn.ReLU(inplace=True), conv7, nn.ReLU(inplace=True)]
39 | 
40 |     model = nn.ModuleList(layers)
41 |     if pretrained:
42 |         state_dict = load_state_dict_from_url("https://download.pytorch.org/models/vgg16-397923af.pth", model_dir="./model_data")
43 |         state_dict = {k.replace('features.', '') : v for k, v in state_dict.items()}
44 |         model.load_state_dict(state_dict, strict = False)
45 |     return model
46 | 
47 | if __name__ == "__main__":
48 |     net = vgg()
49 |     for i, layer in enumerate(net):
50 |         print(i, layer)
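
The indices 22 and 34 quoted in the docstring follow from how `vgg()` expands `base`: every integer adds a convolution plus a ReLU, every `'M'` or `'C'` adds one pooling layer, and five more layers are appended at the end. The sketch below counts positions without building any PyTorch modules; `vgg_layer_names` is a hypothetical helper and the name strings are illustrative labels only.

```python
base = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'C', 512, 512, 512, 'M',
        512, 512, 512]


def vgg_layer_names(cfg):
    """List one label per layer, in the order vgg() appends them."""
    names = []
    for v in cfg:
        if v in ('M', 'C'):
            names.append('pool')
        else:
            names += ['conv%d' % v, 'relu']
    # Tail appended after the loop: pool5, conv6, ReLU, conv7, ReLU.
    names += ['pool5', 'conv6', 'relu', 'conv7', 'relu']
    return names


names = vgg_layer_names(base)
print(len(names))            # 35
print(names[21], names[22])  # conv512 relu  (conv4_3 and its ReLU)
print(names[-2], names[34])  # conv7 relu
```

This is consistent with `SSD300` reading `out_channels` from `self.vgg[21]` and `self.vgg[-2]` (the two convolutions) while `forward` runs `range(23)` so that the first source includes the ReLU at index 22.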


--------------------------------------------------------------------------------
/predict.py:
--------------------------------------------------------------------------------
  1 | #-----------------------------------------------------------------------#
  2 | #   predict.py combines single-image prediction, camera/video detection, FPS testing
  3 | #   and directory-traversal detection in one file; set mode to switch between them.
  4 | #-----------------------------------------------------------------------#
  5 | import time
  6 | 
  7 | import cv2
  8 | import numpy as np
  9 | from PIL import Image
 10 | 
 11 | from ssd import SSD
 12 | 
 13 | if __name__ == "__main__":
 14 |     ssd = SSD()
 15 |     #----------------------------------------------------------------------------------------------------------#
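 16 |     #   mode selects the test mode:
 17 |     #   'predict'           single-image prediction; to modify the process (save the image, crop objects, etc.), see the detailed notes below
 18 |     #   'video'             video detection from a camera or a video file; see the notes below.
 19 |     #   'fps'               FPS test using street.jpg in the img folder; see the notes below.
 20 |     #   'dir_predict'       runs detection on every image in a folder and saves the results; by default reads the img folder and writes to img_out; see the notes below.
 21 |     #   'export_onnx'       exports the model to ONNX; requires pytorch 1.7.1 or later.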
 22 |     #----------------------------------------------------------------------------------------------------------#
 23 |     mode = "predict"
 24 |     #-------------------------------------------------------------------------#
 25 |     #   crop                whether to crop the detected objects after single-image prediction
 26 |     #   count               whether to count the detected objects
 27 |     #   crop and count only take effect when mode='predict'
 28 |     #-------------------------------------------------------------------------#
 29 |     crop            = False
 30 |     count           = False
 31 |     #----------------------------------------------------------------------------------------------------------#
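 32 |     #   video_path          path of the video; video_path=0 means use the camera
 33 |     #                       to detect a video file, set e.g. video_path = "xxx.mp4", which reads xxx.mp4 from the root directory.
 34 |     #   video_save_path     path where the video is saved; video_save_path="" means do not save
 35 |     #                       to save the video, set e.g. video_save_path = "yyy.mp4", which writes yyy.mp4 to the root directory.
 36 |     #   video_fps           fps of the saved video
 37 |     #
 38 |     #   video_path, video_save_path and video_fps only take effect when mode='video'
 39 |     #   When saving a video, exit with ctrl+c or let it run to the last frame so that the save completes.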
 40 |     #----------------------------------------------------------------------------------------------------------#
 41 |     video_path      = 0
 42 |     video_save_path = ""
 43 |     video_fps       = 25.0
 44 |     #----------------------------------------------------------------------------------------------------------#
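 45 |     #   test_interval       number of detection runs used when measuring fps; in theory, the larger test_interval is, the more accurate the fps.
 46 |     #   fps_image_path      image used for the fps test
 47 |     #
 48 |     #   test_interval and fps_image_path only take effect when mode='fps'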
 49 |     #----------------------------------------------------------------------------------------------------------#
 50 |     test_interval   = 100
 51 |     fps_image_path  = "img/street.jpg"
 52 |     #-------------------------------------------------------------------------#
 53 |     #   dir_origin_path     folder containing the images to detect
 54 |     #   dir_save_path       folder where the detected images are saved
 55 |     #
 56 |     #   dir_origin_path and dir_save_path only take effect when mode='dir_predict'
 57 |     #-------------------------------------------------------------------------#
 58 |     dir_origin_path = "img/"
 59 |     dir_save_path   = "img_out/"
 60 |     #-------------------------------------------------------------------------#
 61 |     #   simplify            whether to simplify the exported ONNX model
 62 |     #   onnx_save_path      path where the ONNX file is saved
 63 |     #-------------------------------------------------------------------------#
 64 |     simplify        = True
 65 |     onnx_save_path  = "model_data/models.onnx"
 66 |     
 67 |     if mode == "predict":
 68 |         '''
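 69 |         1. To save the detected image, call r_image.save("img.jpg"); edit predict.py directly.
 70 |         2. To get the coordinates of the predicted boxes, go into ssd.detect_image and read top, left, bottom, right in the drawing section.
 71 |         3. To crop the detected objects, go into ssd.detect_image and use the top, left, bottom, right values from the drawing section
 72 |         to slice the original image as an array.
 73 |         4. To write extra text on the result, such as the number of a specific detected object, go into ssd.detect_image and test predicted_class in the drawing section,
 74 |         for example if predicted_class == 'car': checks whether the current object is a car, then keep a count. Use draw.text to write the text.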
 75 |         '''
 76 |         while True:
 77 |             img = input('Input image filename:')
 78 |             try:
 79 |                 image = Image.open(img)
 80 |             except Exception:
 81 |                 print('Open Error! Try again!')
 82 |                 continue
 83 |             else:
 84 |                 r_image = ssd.detect_image(image, crop = crop, count=count)
 85 |                 r_image.show()
 86 | 
 87 |     elif mode == "video":
 88 |         capture = cv2.VideoCapture(video_path)
 89 |         if video_save_path!="":
 90 |             fourcc  = cv2.VideoWriter_fourcc(*'XVID')
 91 |             size    = (int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)), int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)))
 92 |             out     = cv2.VideoWriter(video_save_path, fourcc, video_fps, size)
 93 | 
 94 |         ref, frame = capture.read()
 95 |         if not ref:
 96 |             raise ValueError("Failed to read from the camera (video). Check that the camera is installed correctly (or that the video path is correct).")
 97 | 
 98 |         fps = 0.0
 99 |         while(True):
100 |             t1 = time.time()
101 |             # Read one frame
102 |             ref, frame = capture.read()
103 |             if not ref:
104 |                 break
105 |             # Convert the format, BGR to RGB
106 |             frame = cv2.cvtColor(frame,cv2.COLOR_BGR2RGB)
107 |             # Convert to a PIL Image
108 |             frame = Image.fromarray(np.uint8(frame))
109 |             # Run detection
110 |             frame = np.array(ssd.detect_image(frame))
111 |             # RGB to BGR, the format OpenCV expects for display
112 |             frame = cv2.cvtColor(frame,cv2.COLOR_RGB2BGR)
113 |             
114 |             fps  = ( fps + (1./(time.time()-t1)) ) / 2
115 |             print("fps= %.2f"%(fps))
116 |             frame = cv2.putText(frame, "fps= %.2f"%(fps), (0, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
117 |             
118 |             cv2.imshow("video",frame)
119 |             c= cv2.waitKey(1) & 0xff 
120 |             if video_save_path!="":
121 |                 out.write(frame)
122 | 
123 |             if c==27:
124 |                 capture.release()
125 |                 break
126 | 
127 |         print("Video Detection Done!")
128 |         capture.release()
129 |         if video_save_path!="":
130 |             print("Save processed video to the path :" + video_save_path)
131 |             out.release()
132 |         cv2.destroyAllWindows()
133 |         
134 |     elif mode == "fps":
135 |         img = Image.open(fps_image_path)
136 |         tact_time = ssd.get_FPS(img, test_interval)
137 |         print(str(tact_time) + ' seconds, ' + str(1/tact_time) + 'FPS, @batch_size 1')
138 | 
139 |     elif mode == "dir_predict":
140 |         import os
141 | 
142 |         from tqdm import tqdm
143 | 
144 |         img_names = os.listdir(dir_origin_path)
145 |         for img_name in tqdm(img_names):
146 |             if img_name.lower().endswith(('.bmp', '.dib', '.png', '.jpg', '.jpeg', '.pbm', '.pgm', '.ppm', '.tif', '.tiff')):
147 |                 image_path  = os.path.join(dir_origin_path, img_name)
148 |                 image       = Image.open(image_path)
149 |                 r_image     = ssd.detect_image(image)
150 |                 if not os.path.exists(dir_save_path):
151 |                     os.makedirs(dir_save_path)
152 |                 r_image.save(os.path.join(dir_save_path, img_name.replace(".jpg", ".png")), quality=95, subsampling=0)
153 | 
154 |     elif mode == "export_onnx":
155 |         ssd.convert_to_onnx(simplify, onnx_save_path)
156 |         
157 |     else:
158 |         raise AssertionError("Please specify the correct mode: 'predict', 'video', 'fps', 'dir_predict' or 'export_onnx'.")
159 | 
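
The video loop above smooths its FPS readout by averaging the previous estimate with the instantaneous rate of the latest frame. The following is a minimal standalone sketch of that update rule; the helper name `smooth_fps` and the sample frame times are illustrative assumptions, not part of the repository.

```python
def smooth_fps(previous_fps, frame_seconds):
    """Average the previous estimate with the rate of the latest frame."""
    instantaneous = 1.0 / frame_seconds
    return (previous_fps + instantaneous) / 2.0


# Hypothetical frame times: three frames that each took 50 ms.
fps = 0.0
for frame_seconds in [0.05, 0.05, 0.05]:
    fps = smooth_fps(fps, frame_seconds)
    print("fps= %.2f" % fps)
```

Because the estimate starts at 0.0, the first few readings are biased low and approach the true rate as more frames are processed.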


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
 1 | torch
 2 | torchvision
 3 | tensorboard
 4 | scipy==1.2.1
 5 | numpy==1.17.0
 6 | matplotlib==3.1.2
 7 | opencv_python==4.1.2.30
 8 | tqdm==4.60.0
 9 | Pillow==8.2.0
10 | h5py==2.10.0


--------------------------------------------------------------------------------
/ssd.py:
--------------------------------------------------------------------------------
  1 | import colorsys
  2 | import os
  3 | import time
  4 | import warnings
  5 | 
  6 | import numpy as np
  7 | import torch
  8 | import torch.backends.cudnn as cudnn
  9 | from PIL import Image, ImageDraw, ImageFont
 10 | 
 11 | from nets.ssd import SSD300
 12 | from utils.anchors import get_anchors
 13 | from utils.utils import (cvtColor, get_classes, preprocess_input, resize_image,
 14 |                          show_config)
 15 | from utils.utils_bbox import BBoxUtility
 16 | 
 17 | warnings.filterwarnings("ignore")
 18 | 
 19 | #--------------------------------------------#
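 20 | #   To predict with your own trained model, change three parameters:
 21 | #   model_path, backbone and classes_path must all be updated.
 22 | #   If a shape mismatch occurs,
 23 | #   check the num_classes, model_path and classes_path
 24 | #   values that were used in the training config.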
 25 | #--------------------------------------------#
 26 | class SSD(object):
 27 |     _defaults = {
 28 |         #--------------------------------------------------------------------------#
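 29 |         #   To predict with your own trained model, you must change model_path and classes_path.
 30 |         #   model_path points to a weights file under logs; classes_path points to a txt file under model_data.
 31 |         #
 32 |         #   After training, logs holds several weights files; pick one with a low validation loss.
 33 |         #   A low validation loss does not guarantee a high mAP; it only means the weights generalize well on the validation set.
 34 |         #   If a shape mismatch occurs, also check the model_path and classes_path used during training.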
 35 |         #--------------------------------------------------------------------------#
 36 |         "model_path"        : 'model_data/ssd_weights.pth',
 37 |         "classes_path"      : 'model_data/voc_classes.txt',
 38 |         #---------------------------------------------------------------------#
 39 |         #   Input image size for prediction; use the same value as in training
 40 |         #---------------------------------------------------------------------#
 41 |         "input_shape"       : [300, 300],
 42 |         #-------------------------------#
 43 |         #   Backbone network:
 44 |         #   vgg, mobilenetv2 or resnet50
 45 |         #-------------------------------#
 46 |         "backbone"          : "vgg",
 47 |         #---------------------------------------------------------------------#
 48 |         #   Only predicted boxes whose score exceeds this confidence are kept
 49 |         #---------------------------------------------------------------------#
 50 |         "confidence"        : 0.5,
 51 |         #---------------------------------------------------------------------#
 52 |         #   IoU threshold used by non-maximum suppression
 53 |         #---------------------------------------------------------------------#
 54 |         "nms_iou"           : 0.45,
 55 |         #---------------------------------------------------------------------#
 56 |         #   Sizes of the prior (anchor) boxes
 57 |         #---------------------------------------------------------------------#
 58 |         'anchors_size'      : [30, 60, 111, 162, 213, 264, 315],
 59 |         #---------------------------------------------------------------------#
 60 |         #   Controls whether letterbox_image is used to resize the input without distortion.
 61 |         #   In repeated tests, a plain resize with letterbox_image disabled gave better results.
 62 |         #---------------------------------------------------------------------#
 63 |         "letterbox_image"   : False,
 64 |         #-------------------------------#
 65 |         #   Whether to use CUDA
 66 |         #   Set to False if no GPU is available
 67 |         #-------------------------------#
 68 |         "cuda"              : True,
 69 |     }
 70 | 
 71 |     @classmethod
 72 |     def get_defaults(cls, n):
 73 |         if n in cls._defaults:
 74 |             return cls._defaults[n]
 75 |         else:
 76 |             return "Unrecognized attribute name '" + n + "'"
 77 | 
 78 |     #---------------------------------------------------#
 79 |     #   Initialize SSD
 80 |     #---------------------------------------------------#
 81 |     def __init__(self, **kwargs):
 82 |         self.__dict__.update(self._defaults)
 83 |         for name, value in kwargs.items():
 84 |             setattr(self, name, value)
 85 |         #---------------------------------------------------#
 86 |         #   Load the class names and count the classes
 87 |         #---------------------------------------------------#
 88 |         self.class_names, self.num_classes  = get_classes(self.classes_path)
 89 |         self.anchors                        = torch.from_numpy(get_anchors(self.input_shape, self.anchors_size, self.backbone)).type(torch.FloatTensor)
 90 |         if self.cuda:
 91 |             self.anchors = self.anchors.cuda()
 92 |         self.num_classes                    = self.num_classes + 1
 93 |         
 94 |         #---------------------------------------------------#
 95 |         #   Assign a distinct color to each class for drawing boxes
 96 |         #---------------------------------------------------#
 97 |         hsv_tuples = [(x / self.num_classes, 1., 1.) for x in range(self.num_classes)]
 98 |         self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
 99 |         self.colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), self.colors))
100 | 
101 |         self.bbox_util = BBoxUtility(self.num_classes)
102 |         self.generate()
103 |         
104 |         show_config(**{name: getattr(self, name) for name in self._defaults})
105 | 
106 |     #---------------------------------------------------#
107 |     #   Load the model
108 |     #---------------------------------------------------#
109 |     def generate(self, onnx=False):
110 |         #-------------------------------#
111 |         #   Build the network and load its weights
112 |         #-------------------------------#
113 |         self.net    = SSD300(self.num_classes, self.backbone)
114 |         device      = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
115 |         self.net.load_state_dict(torch.load(self.model_path, map_location=device))
116 |         self.net    = self.net.eval()
117 |         print('{} model, anchors, and classes loaded.'.format(self.model_path))
118 |         if not onnx:
119 |             if self.cuda:
120 |                 self.net = torch.nn.DataParallel(self.net)
121 |                 self.net = self.net.cuda()
122 | 
123 |     #---------------------------------------------------#
124 |     #   Detect objects in an image
125 |     #---------------------------------------------------#
126 |     def detect_image(self, image, crop = False, count = False):
127 |         #---------------------------------------------------#
128 |         #   Get the height and width of the input image
129 |         #---------------------------------------------------#
130 |         image_shape = np.array(np.shape(image)[0:2])
131 |         #---------------------------------------------------------#
132 |         #   Convert the image to RGB so that grayscale input does not fail at prediction.
133 |         #   Only RGB prediction is supported; every other image type is converted to RGB.
134 |         #---------------------------------------------------------#
135 |         image       = cvtColor(image)
136 |         #---------------------------------------------------------#
137 |         #   Pad the image with gray bars for a distortion-free resize,
138 |         #   or resize it directly, depending on letterbox_image
139 |         #---------------------------------------------------------#
140 |         image_data  = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image)
141 |         #---------------------------------------------------------#
142 |         #   Preprocess and normalize the image, move channels first, and add the batch dimension.
143 |         #---------------------------------------------------------#
144 |         image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0)
145 | 
146 |         with torch.no_grad():
147 |             #---------------------------------------------------#
148 |             #   Convert to a torch tensor
149 |             #---------------------------------------------------#
150 |             images = torch.from_numpy(image_data).type(torch.FloatTensor)
151 |             if self.cuda:
152 |                 images = images.cuda()
153 |             #---------------------------------------------------------#
154 |             #   Run the image through the network
155 |             #---------------------------------------------------------#
156 |             outputs     = self.net(images)
157 |             #-----------------------------------------------------------#
158 |             #   Decode the predictions
159 |             #-----------------------------------------------------------#
160 |             results     = self.bbox_util.decode_box(outputs, self.anchors, image_shape, self.input_shape, self.letterbox_image, 
161 |                                                     nms_iou = self.nms_iou, confidence = self.confidence)
162 |             #--------------------------------------#
163 |             #   If nothing was detected, return the original image
164 |             #--------------------------------------#
165 |             if len(results[0]) <= 0:
166 |                 return image
167 | 
168 |             top_label   = np.array(results[0][:, 4], dtype = 'int32')
169 |             top_conf    = results[0][:, 5]
170 |             top_boxes   = results[0][:, :4]
171 |         #---------------------------------------------------------#
172 |         #   Set the font and the box line thickness
173 |         #---------------------------------------------------------#
174 |         font = ImageFont.truetype(font='model_data/simhei.ttf', size=np.floor(3e-2 * np.shape(image)[1] + 0.5).astype('int32'))
175 |         thickness = max((np.shape(image)[0] + np.shape(image)[1]) // self.input_shape[0], 1)
176 |         #---------------------------------------------------------#
177 |         #   Count detections per class
178 |         #---------------------------------------------------------#
179 |         if count:
180 |             print("top_label:", top_label)
181 |             classes_nums    = np.zeros([self.num_classes])
182 |             for i in range(self.num_classes):
183 |                 num = np.sum(top_label == i)
184 |                 if num > 0:
185 |                     print(self.class_names[i], " : ", num)
186 |                 classes_nums[i] = num
187 |             print("classes_nums:", classes_nums)
188 |         #---------------------------------------------------------#
189 |         #   Optionally crop each detected object
190 |         #---------------------------------------------------------#
191 |         if crop:
192 |             for i, c in list(enumerate(top_boxes)):
193 |                 top, left, bottom, right = top_boxes[i]
194 |                 top     = max(0, np.floor(top).astype('int32'))
195 |                 left    = max(0, np.floor(left).astype('int32'))
196 |                 bottom  = min(image.size[1], np.floor(bottom).astype('int32'))
197 |                 right   = min(image.size[0], np.floor(right).astype('int32'))
198 |                 
199 |                 dir_save_path = "img_crop"
200 |                 if not os.path.exists(dir_save_path):
201 |                     os.makedirs(dir_save_path)
202 |                 crop_image = image.crop([left, top, right, bottom])
203 |                 crop_image.save(os.path.join(dir_save_path, "crop_" + str(i) + ".png"), quality=95, subsampling=0)
204 |                 print("save crop_" + str(i) + ".png to " + dir_save_path)
205 |         #---------------------------------------------------------#
206 |         #   Draw the results on the image
207 |         #---------------------------------------------------------#
208 |         for i, c in list(enumerate(top_label)):
209 |             predicted_class = self.class_names[int(c)]
210 |             box             = top_boxes[i]
211 |             score           = top_conf[i]
212 | 
213 |             top, left, bottom, right = box
214 | 
215 |             top     = max(0, np.floor(top).astype('int32'))
216 |             left    = max(0, np.floor(left).astype('int32'))
217 |             bottom  = min(image.size[1], np.floor(bottom).astype('int32'))
218 |             right   = min(image.size[0], np.floor(right).astype('int32'))
219 | 
220 |             label = '{} {:.2f}'.format(predicted_class, score)
221 |             draw = ImageDraw.Draw(image)
222 |             label_size = draw.textsize(label, font)
223 |             label = label.encode('utf-8')
224 |             print(label, top, left, bottom, right)
225 |             
226 |             if top - label_size[1] >= 0:
227 |                 text_origin = np.array([left, top - label_size[1]])
228 |             else:
229 |                 text_origin = np.array([left, top + 1])
230 | 
231 |             for i in range(thickness):
232 |                 draw.rectangle([left + i, top + i, right - i, bottom - i], outline=self.colors[c])
233 |             draw.rectangle([tuple(text_origin), tuple(text_origin + label_size)], fill=self.colors[c])
234 |             draw.text(text_origin, str(label,'UTF-8'), fill=(0, 0, 0), font=font)
235 |             del draw
236 | 
237 |         return image
238 | 
239 |     def get_FPS(self, image, test_interval):
240 |         #---------------------------------------------------#
241 |         #   Get the height and width of the input image
242 |         #---------------------------------------------------#
243 |         image_shape = np.array(np.shape(image)[0:2])
244 |         #---------------------------------------------------------#
245 |         #   Convert the image to RGB so that grayscale input does not fail at prediction.
246 |         #   Only RGB prediction is supported; every other image type is converted to RGB.
247 |         #---------------------------------------------------------#
248 |         image       = cvtColor(image)
249 |         #---------------------------------------------------------#
250 |         #   Pad the image with gray bars for a distortion-free resize,
251 |         #   or resize it directly, depending on letterbox_image
252 |         #---------------------------------------------------------#
253 |         image_data  = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image)
254 |         #---------------------------------------------------------#
255 |         #   Preprocess and normalize the image, move channels first, and add the batch dimension.
256 |         #---------------------------------------------------------#
257 |         image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0)
258 | 
259 |         with torch.no_grad():
260 |             #---------------------------------------------------#
261 |             #   Convert to a torch tensor
262 |             #---------------------------------------------------#
263 |             images = torch.from_numpy(image_data).type(torch.FloatTensor)
264 |             if self.cuda:
265 |                 images = images.cuda()
266 |             #---------------------------------------------------------#
267 |             #   Run the image through the network once as a warm-up
268 |             #---------------------------------------------------------#
269 |             outputs     = self.net(images)
270 |             #-----------------------------------------------------------#
271 |             #   Decode the predictions
272 |             #-----------------------------------------------------------#
273 |             results     = self.bbox_util.decode_box(outputs, self.anchors, image_shape, self.input_shape, self.letterbox_image, 
274 |                                                     nms_iou = self.nms_iou, confidence = self.confidence)
275 | 
276 |         t1 = time.time()
277 |         for _ in range(test_interval):
278 |             with torch.no_grad():
279 |                 #---------------------------------------------------------#
280 |                 #   Run the image through the network
281 |                 #---------------------------------------------------------#
282 |                 outputs     = self.net(images)
283 |                 #-----------------------------------------------------------#
284 |                 #   Decode the predictions
285 |                 #-----------------------------------------------------------#
286 |                 results     = self.bbox_util.decode_box(outputs, self.anchors, image_shape, self.input_shape, self.letterbox_image, 
287 |                                                         nms_iou = self.nms_iou, confidence = self.confidence)
288 | 
289 |         t2 = time.time()
290 |         tact_time = (t2 - t1) / test_interval
291 |         return tact_time
292 | 
293 |     def convert_to_onnx(self, simplify, model_path):
294 |         import onnx
295 |         self.generate(onnx=True)
296 | 
297 |         im                  = torch.zeros(1, 3, *self.input_shape).to('cpu')  # dummy input of shape (1, 3, H, W), BCHW
298 |         input_layer_names   = ["images"]
299 |         output_layer_names  = ["output"]
300 |         
301 |         # Export the model
302 |         print(f'Starting export with onnx {onnx.__version__}.')
303 |         torch.onnx.export(self.net,
304 |                         im,
305 |                         f               = model_path,
306 |                         verbose         = False,
307 |                         opset_version   = 12,
308 |                         training        = torch.onnx.TrainingMode.EVAL,
309 |                         do_constant_folding = True,
310 |                         input_names     = input_layer_names,
311 |                         output_names    = output_layer_names,
312 |                         dynamic_axes    = None)
313 | 
314 |         # Checks
315 |         model_onnx = onnx.load(model_path)  # load onnx model
316 |         onnx.checker.check_model(model_onnx)  # check onnx model
317 | 
318 |         # Simplify onnx
319 |         if simplify:
320 |             import onnxsim
321 |             print(f'Simplifying with onnx-simplifier {onnxsim.__version__}.')
322 |             model_onnx, check = onnxsim.simplify(
323 |                 model_onnx,
324 |                 dynamic_input_shape=False,
325 |                 input_shapes=None)
326 |             assert check, 'assert check failed'
327 |             onnx.save(model_onnx, model_path)
328 | 
329 |         print('Onnx model save as {}'.format(model_path))
330 |     
331 |     def get_map_txt(self, image_id, image, class_names, map_out_path):
332 |         f = open(os.path.join(map_out_path, "detection-results/"+image_id+".txt"),"w") 
333 |         #---------------------------------------------------#
334 |         #   Get the height and width of the input image
335 |         #---------------------------------------------------#
336 |         image_shape = np.array(np.shape(image)[0:2])
337 |         #---------------------------------------------------------#
338 |         #   Convert the image to RGB so that grayscale input does not fail at prediction.
339 |         #   Only RGB prediction is supported; every other image type is converted to RGB.
340 |         #---------------------------------------------------------#
341 |         image       = cvtColor(image)
342 |         #---------------------------------------------------------#
343 |         #   Pad the image with gray bars for a distortion-free resize,
344 |         #   or resize it directly, depending on letterbox_image
345 |         #---------------------------------------------------------#
346 |         image_data  = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image)
347 |         #---------------------------------------------------------#
348 |         #   Preprocess and normalize the image, move channels first, and add the batch dimension.
349 |         #---------------------------------------------------------#
350 |         image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0)
351 | 
352 |         with torch.no_grad():
353 |             #---------------------------------------------------#
354 |             #   Convert to a torch tensor
355 |             #---------------------------------------------------#
356 |             images = torch.from_numpy(image_data).type(torch.FloatTensor)
357 |             if self.cuda:
358 |                 images = images.cuda()
359 |             #---------------------------------------------------------#
360 |             #   Run the image through the network
361 |             #---------------------------------------------------------#
362 |             outputs     = self.net(images)
363 |             #-----------------------------------------------------------#
364 |             #   Decode the predictions
365 |             #-----------------------------------------------------------#
366 |             results     = self.bbox_util.decode_box(outputs, self.anchors, image_shape, self.input_shape, self.letterbox_image, 
367 |                                                     nms_iou = self.nms_iou, confidence = self.confidence)
368 |             #--------------------------------------#
369 |             #   If nothing was detected, return early without writing any results
370 |             #--------------------------------------#
371 |             if len(results[0]) <= 0:
372 |                 return 
373 | 
374 |             top_label   = np.array(results[0][:, 4], dtype = 'int32')
375 |             top_conf    = results[0][:, 5]
376 |             top_boxes   = results[0][:, :4]
377 |         
378 |         for i, c in list(enumerate(top_label)):
379 |             predicted_class = self.class_names[int(c)]
380 |             box             = top_boxes[i]
381 |             score           = str(top_conf[i])
382 | 
383 |             top, left, bottom, right = box
384 |             if predicted_class not in class_names:
385 |                 continue
386 | 
387 |             f.write("%s %s %s %s %s %s\n" % (predicted_class, score[:6], str(int(left)), str(int(top)), str(int(right)),str(int(bottom))))
388 | 
389 |         f.close()
390 |         return 
391 | 
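
In `detect_image`, each decoded box is unpacked as (top, left, bottom, right), floored, clamped to the image bounds, and then handed to PIL in (left, top, right, bottom) order. The sketch below isolates that step so the change in coordinate order is easy to see; the helper name `clamp_box` and the sample values are illustrative assumptions, not part of the repository.

```python
import math


def clamp_box(box, image_width, image_height):
    """Floor a (top, left, bottom, right) box and clamp it to the image.

    Returns the box in PIL order: (left, top, right, bottom).
    """
    top, left, bottom, right = box
    top = max(0, math.floor(top))
    left = max(0, math.floor(left))
    bottom = min(image_height, math.floor(bottom))
    right = min(image_width, math.floor(right))
    return left, top, right, bottom


# Hypothetical box that spills over three edges of a 400x240 image.
pil_box = clamp_box((-3.2, 10.7, 250.9, 410.1), image_width=400, image_height=240)
print(pil_box)
```

The returned tuple can be passed to `Image.crop` or `ImageDraw.rectangle`, both of which take the left/top corner first.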


--------------------------------------------------------------------------------
/summary.py:
--------------------------------------------------------------------------------
 1 | #--------------------------------------------#
 2 | #   This script prints the network structure
 3 | #--------------------------------------------#
 4 | import torch
 5 | from thop import clever_format, profile
 6 | from torchsummary import summary
 7 | 
 8 | from nets.ssd import SSD300
 9 | 
10 | if __name__ == "__main__":
11 |     input_shape = [300, 300]
12 |     num_classes = 21
13 |     backbone    = "vgg"
14 |     
15 |     device  = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
16 |     m       = SSD300(num_classes, backbone).to(device)
17 |     summary(m, (3, input_shape[0], input_shape[1]))
18 |     
19 |     dummy_input     = torch.randn(1, 3, input_shape[0], input_shape[1]).to(device)
20 |     flops, params   = profile(m.to(device), (dummy_input, ), verbose=False)
21 |     #--------------------------------------------------------#
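22 |     #   flops * 2 is used because profile does not count a convolution as two operations.
23 |     #   Some papers count the multiply and the add as two operations; in that case multiply by 2.
24 |     #   Some papers count only multiplications and ignore additions; in that case do not multiply by 2.
25 |     #   This code multiplies by 2, following YOLOX.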
26 |     #--------------------------------------------------------#
27 |     flops           = flops * 2
28 |     flops, params   = clever_format([flops, params], "%.3f")
29 |     print('Total GFLOPS: %s' % (flops))
30 |     print('Total params: %s' % (params))
31 | 
32 | 
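
The comment in `summary.py` explains why the profiled count is doubled: one multiply-accumulate can be reported as one operation or as two (a multiply plus an add). As a rough illustration of the two conventions, the sketch below counts the multiply-accumulates of a single convolution by hand. It assumes no bias term and that the profiler reports one operation per multiply-accumulate, as the source comment implies; `conv_macs` and the layer dimensions are hypothetical.

```python
def conv_macs(out_h, out_w, in_channels, out_channels, kernel_size):
    """Multiply-accumulate count of one 2D convolution, bias ignored."""
    per_output_value = in_channels * kernel_size * kernel_size
    return out_h * out_w * out_channels * per_output_value


# Hypothetical first layer: 3x3 convolution, 3 -> 64 channels, 300x300 output.
macs = conv_macs(300, 300, 3, 64, 3)
flops = macs * 2  # count the multiply and the add separately

print("MACs : %.3f G" % (macs / 1e9))
print("FLOPs: %.3f G" % (flops / 1e9))
```

When comparing against figures from a paper, check which of the two conventions it uses before deciding whether the factor of 2 applies.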


--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
  1 | import datetime
  2 | import os
  3 | import warnings
  4 | from functools import partial
  5 | 
  6 | import numpy as np
  7 | import torch
  8 | import torch.backends.cudnn as cudnn
  9 | import torch.distributed as dist
 10 | import torch.nn as nn
 11 | import torch.optim as optim
 12 | from torch.utils.data import DataLoader
 13 | 
 14 | from nets.ssd import SSD300
 15 | from nets.ssd_training import (MultiboxLoss, get_lr_scheduler,
 16 |                                set_optimizer_lr, weights_init)
 17 | from utils.anchors import get_anchors
 18 | from utils.callbacks import EvalCallback, LossHistory
 19 | from utils.dataloader import SSDDataset, ssd_dataset_collate
 20 | from utils.utils import (download_weights, get_classes, seed_everything,
 21 |                          show_config, worker_init_fn)
 22 | from utils.utils_fit import fit_one_epoch
 23 | 
 24 | warnings.filterwarnings("ignore")
 25 | 
 26 | '''
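 27 | Keep the following points in mind when training your own object detection model:
 28 | 1. Before training, check that your data meets the format requirements. This repo expects a VOC-format dataset, so you need the input images and their labels.
 29 |    Input images are .jpg files of any size; they are resized automatically before being fed to training.
 30 |    Grayscale images are converted to RGB automatically for training; no manual change is needed.
 31 |    If the input images do not have a .jpg suffix, batch-convert them to jpg before training.
 32 | 
 33 |    Labels are .xml files that describe the objects to detect; each label file corresponds to one input image.
 34 | 
 35 | 2. The loss value is used to judge convergence. What matters is the trend: the validation loss should keep decreasing, and once it barely changes the model has essentially converged.
 36 |    The absolute loss value carries no meaning; whether it is large or small depends only on how the loss is computed, and it does not need to approach 0. To make the numbers look nicer, you can divide by 10000 inside the loss function.
 37 |    Loss values recorded during training are saved under logs, in a folder named loss_%Y_%m_%d_%H_%M_%S.
 38 | 
 39 | 3. Trained weights are saved in the logs folder. Each epoch consists of several steps, and each step performs one gradient descent update.
 40 |    Weights are not saved if training ran for only a few steps; make sure the difference between an epoch and a step is clear.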
 41 | '''
 42 | if __name__ == "__main__":
 43 |     #---------------------------------#
 44 |     #   Cuda    Whether to use CUDA
 45 |     #           Set to False if no GPU is available
 46 |     #---------------------------------#
 47 |     Cuda = True
 48 |     #----------------------------------------------#
 49 |     #   Seed    Fixes the random seed
 50 |     #           so that independent training runs give the same results
 51 |     #----------------------------------------------#
 52 |     seed            = 11
 53 |     #---------------------------------------------------------------------#
 54 |     #   distributed     Whether to run single-machine multi-GPU distributed training
 55 |     #                   The terminal commands only support Ubuntu. CUDA_VISIBLE_DEVICES selects the GPUs on Ubuntu.
 56 |     #                   On Windows, DP mode is used by default with all GPUs; DDP is not supported.
 57 |     #   DP mode:
 58 |     #       Set             distributed = False
 59 |     #       In a terminal   CUDA_VISIBLE_DEVICES=0,1 python train.py
 60 |     #   DDP mode:
 61 |     #       Set             distributed = True
 62 |     #       In a terminal   CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 train.py
 63 |     #---------------------------------------------------------------------#
 64 |     distributed     = False
 65 |     #---------------------------------------------------------------------#
 66 |     #   sync_bn     Whether to use sync_bn; available with multiple GPUs in DDP mode
 67 |     #---------------------------------------------------------------------#
 68 |     sync_bn         = False
 69 |     #---------------------------------------------------------------------#
 70 |     #   fp16        Whether to use mixed-precision training
 71 |     #               Cuts GPU memory use by roughly half; requires PyTorch 1.7.1 or later
 72 |     #---------------------------------------------------------------------#
 73 |     fp16            = False
 74 |     #---------------------------------------------------------------------#
 75 |     #   classes_path    Points to the txt file under model_data that matches your dataset
 76 |     #                   Always update classes_path before training so it corresponds to your dataset
 77 |     #---------------------------------------------------------------------#
 78 |     classes_path    = 'model_data/voc_classes.txt'
 79 |     #----------------------------------------------------------------------------------------------------------------------------#
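 80 |     #   See the README for how to download the weights file (available from a network drive). Pretrained weights are shared across datasets because the features are generic.
 81 |     #   The most important part of the pretrained weights is the backbone feature-extraction network.
 82 |     #   Pretrained weights are needed in 99% of cases; without them the backbone weights are too random, feature extraction is poor, and training results suffer.
 83 |     #
 84 |     #   If training was interrupted, set model_path to a weights file under logs to reload the partially trained weights.
 85 |     #   Also adjust the freeze-stage or unfreeze-stage parameters below so the epoch count stays continuous.
 86 |     #
 87 |     #   When model_path = '', no weights are loaded for the whole model.
 88 |     #
 89 |     #   These are weights for the whole model, so they are loaded in train.py; the pretrained flag below does not affect this loading.
 90 |     #   To start from pretrained backbone weights, set model_path = '' and pretrained = True below; only the backbone is then loaded.
 91 |     #   To train from scratch, set model_path = '', pretrained = False and Freeze_Train = False; training then starts from scratch with no backbone-freezing stage.
 92 |     #   Training from scratch generally gives poor results because the weights are too random and feature extraction is weak.
 93 |     #
 94 |     #   Networks are rarely trained from scratch; at least the backbone weights are used. Some papers say pretraining is unnecessary, mainly because they have large datasets and strong tuning expertise.
 95 |     #   If you must train the backbone yourself, look into the ImageNet dataset: train a classification model first, whose backbone is shared with this model, and continue from there.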
 96 |     #----------------------------------------------------------------------------------------------------------------------------#
 97 |     model_path      = 'model_data/ssd_weights.pth'
 98 |     #------------------------------------------------------#
 99 |     #   input_shape     Input shape
100 |     #------------------------------------------------------#
101 |     input_shape     = [300, 300]
102 |     #------------------------------------------------------#
103 |     #   vgg, mobilenetv2 or resnet50
104 |     #------------------------------------------------------#
105 |     backbone        = "vgg"
106 |     #----------------------------------------------------------------------------------------------------------------------------#
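107 |     #   pretrained      Whether to use pretrained backbone weights. These are backbone-only weights, so they are loaded when the model is built.
108 |     #                   If model_path is set, the backbone weights need not be loaded and the value of pretrained is irrelevant.
109 |     #                   If model_path is not set and pretrained = True, only the backbone is loaded before training starts.
110 |     #                   If model_path is not set, pretrained = False and Freeze_Train = False, training starts from scratch with no backbone-freezing stage.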
111 |     #----------------------------------------------------------------------------------------------------------------------------#
112 |     pretrained      = False
113 |     #------------------------------------------------------#
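114 |     #   Sets the prior (anchor) box sizes. The default anchors_size
115 |     #   is tuned for the VOC dataset and works in most cases.
116 |     #   To detect small objects, adjust anchors_size.
117 |     #   Usually it is enough to shrink the priors of the shallow layers, since those layers handle small objects.
118 |     #   For example: anchors_size = [21, 45, 99, 153, 207, 261, 315]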
119 |     #------------------------------------------------------#
120 |     anchors_size    = [30, 60, 111, 162, 213, 264, 315]
121 | 
122 |     #----------------------------------------------------------------------------------------------------------------------------#
123 |     #   Training has two stages: a freeze stage and an unfreeze stage. The freeze stage exists for users with limited hardware.
124 |     #   Frozen training needs less GPU memory. With a very weak GPU, set Freeze_Epoch equal to UnFreeze_Epoch to run frozen training only.
125 |     #
126 |     #   Suggested parameter settings follow; adjust them to your needs:
127 |     #   (1) Starting from pretrained weights for the whole model:
128 |     #       Adam:
129 |     #           Init_Epoch = 0, Freeze_Epoch = 50, UnFreeze_Epoch = 100, Freeze_Train = True, optimizer_type = 'adam', Init_lr = 6e-4, weight_decay = 0. (frozen)
130 |     #           Init_Epoch = 0, UnFreeze_Epoch = 100, Freeze_Train = False, optimizer_type = 'adam', Init_lr = 6e-4, weight_decay = 0. (not frozen)
131 |     #       SGD:
132 |     #           Init_Epoch = 0, Freeze_Epoch = 50, UnFreeze_Epoch = 200, Freeze_Train = True, optimizer_type = 'sgd', Init_lr = 2e-3, weight_decay = 5e-4. (frozen)
133 |     #           Init_Epoch = 0, UnFreeze_Epoch = 200, Freeze_Train = False, optimizer_type = 'sgd', Init_lr = 2e-3, weight_decay = 5e-4. (not frozen)
134 |     #       Note: UnFreeze_Epoch can be tuned between 100 and 300.
135 |     #   (2) Starting from pretrained backbone weights:
136 |     #       Adam:
137 |     #           Init_Epoch = 0, Freeze_Epoch = 50, UnFreeze_Epoch = 100, Freeze_Train = True, optimizer_type = 'adam', Init_lr = 6e-4, weight_decay = 0. (frozen)
138 |     #           Init_Epoch = 0, UnFreeze_Epoch = 100, Freeze_Train = False, optimizer_type = 'adam', Init_lr = 6e-4, weight_decay = 0. (not frozen)
139 |     #       SGD:
140 |     #           Init_Epoch = 0, Freeze_Epoch = 50, UnFreeze_Epoch = 200, Freeze_Train = True, optimizer_type = 'sgd', Init_lr = 2e-3, weight_decay = 5e-4. (frozen)
141 |     #           Init_Epoch = 0, UnFreeze_Epoch = 200, Freeze_Train = False, optimizer_type = 'sgd', Init_lr = 2e-3, weight_decay = 5e-4. (not frozen)
142 |     #       Note: when starting from pretrained backbone weights, the backbone may not suit object detection, so more training is needed to escape local optima.
143 |     #             UnFreeze_Epoch can be tuned between 200 and 300; YOLOV5 and YOLOX both recommend 300.
144 |     #             Adam converges faster than SGD, so UnFreeze_Epoch can in theory be smaller, but more epochs are still recommended.
145 |     #   (3) Setting batch_size:
146 |     #       Use the largest value the GPU can handle. Running out of memory is unrelated to dataset size; if you see OOM or "CUDA out of memory", reduce batch_size.
147 |     #       Because of the BatchNorm layers, batch_size must be at least 2; it cannot be 1.
148 |     #       Freeze_batch_size should normally be 1-2 times Unfreeze_batch_size. Avoid a large gap, because it affects the automatic learning-rate adjustment.
149 |     #----------------------------------------------------------------------------------------------------------------------------#
150 |     #------------------------------------------------------------------#
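151 |     #   Training parameters for the frozen stage
152 |     #   The backbone is frozen here, so the feature-extraction network does not change
153 |     #   Less GPU memory is used; the network is only fine-tuned
154 |     #   Init_Epoch          Epoch at which training starts. It may exceed Freeze_Epoch, e.g.:
155 |     #                       Init_Epoch = 60, Freeze_Epoch = 50, UnFreeze_Epoch = 100
156 |     #                       skips the frozen stage, starts directly at epoch 60 and adjusts the learning rate accordingly.
157 |     #                       (Used when resuming from a checkpoint)
158 |     #   Freeze_Epoch        Number of epochs trained with the backbone frozen
159 |     #                       (ignored when Freeze_Train=False)
160 |     #   Freeze_batch_size   batch_size during frozen training
161 |     #                       (ignored when Freeze_Train=False)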
162 |     #------------------------------------------------------------------#
163 |     Init_Epoch          = 0
164 |     Freeze_Epoch        = 50
165 |     Freeze_batch_size   = 16
166 |     #------------------------------------------------------------------#
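167 |     #   Training parameters for the unfrozen stage
168 |     #   The backbone is no longer frozen, so the feature-extraction network changes
169 |     #   More GPU memory is used; all parameters of the network are updated
170 |     #   UnFreeze_Epoch          Total number of epochs the model is trained for
171 |     #                           SGD takes longer to converge, so use a larger UnFreeze_Epoch
172 |     #                           Adam can use a relatively smaller UnFreeze_Epoch
173 |     #   Unfreeze_batch_size     batch_size after the backbone is unfrozen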
174 |     #------------------------------------------------------------------#
175 |     UnFreeze_Epoch      = 200
176 |     Unfreeze_batch_size = 8
177 |     #------------------------------------------------------------------#
178 |     #   Freeze_Train    Whether to use frozen training
179 |     #                   By default the backbone is trained frozen first, then unfrozen.
180 |     #                   If Freeze_Train=False, the sgd optimizer is recommended
181 |     #------------------------------------------------------------------#
182 |     Freeze_Train        = True
183 | 
184 |     #------------------------------------------------------------------#
185 |     #   Other training parameters: learning rate, optimizer and learning-rate decay
186 |     #------------------------------------------------------------------#
187 |     #------------------------------------------------------------------#
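188 |     #   Init_lr         Maximum learning rate of the model
189 |     #                   Init_lr=6e-4 is recommended with the Adam optimizer
190 |     #                   Init_lr=2e-3 is recommended with the SGD optimizer
191 |     #   Min_lr          Minimum learning rate of the model; defaults to 0.01 times the maximum learning rate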
192 |     #------------------------------------------------------------------#
193 |     Init_lr             = 2e-3
194 |     Min_lr              = Init_lr * 0.01
195 |     #------------------------------------------------------------------#
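196 |     #   optimizer_type  Optimizer to use; options are adam and sgd
197 |     #                   Init_lr=6e-4 is recommended with the Adam optimizer
198 |     #                   Init_lr=2e-3 is recommended with the SGD optimizer
199 |     #   momentum        momentum parameter used inside the optimizer
200 |     #   weight_decay    Weight decay; helps prevent overfitting
201 |     #                   weight_decay does not behave correctly with adam; set it to 0 when using adam.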
202 |     #------------------------------------------------------------------#
203 |     optimizer_type      = "sgd"
204 |     momentum            = 0.937
205 |     weight_decay        = 5e-4
206 |     #------------------------------------------------------------------#
207 |     #   lr_decay_type   Learning-rate decay schedule; options are 'step' and 'cos'
208 |     #------------------------------------------------------------------#
209 |     lr_decay_type       = 'cos'
210 |     #------------------------------------------------------------------#
211 |     #   save_period     Save the weights every save_period epochs
212 |     #------------------------------------------------------------------#
213 |     save_period         = 10
214 |     #------------------------------------------------------------------#
215 |     #   save_dir        Folder where weights and log files are saved
216 |     #------------------------------------------------------------------#
217 |     save_dir            = 'logs'
218 |     #------------------------------------------------------------------#
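219 |     #   eval_flag       Whether to evaluate during training; evaluation runs on the validation set
220 |     #                   Installing the pycocotools library gives a better evaluation experience.
221 |     #   eval_period     Evaluate every eval_period epochs; frequent evaluation is not recommended
222 |     #                   Evaluation is time-consuming, and evaluating often makes training very slow
223 |     #   The mAP obtained here differs from the one produced by get_map.py, for two reasons:
224 |     #   (1) The mAP obtained here is computed on the validation set.
225 |     #   (2) The evaluation parameters here are set conservatively to speed up evaluation.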
226 |     #------------------------------------------------------------------#
227 |     eval_flag           = True
228 |     eval_period         = 10
229 |     #------------------------------------------------------------------#
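230 |     #   num_workers     Number of worker processes used to read data; 0 loads data in the main process
231 |     #                   More workers speed up data loading but use more memory
232 |     #                   In some cases multiple workers make loading considerably slower
233 |     #                   Enable them only when IO is the bottleneck, i.e. when the GPU computes much faster than images can be read.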
234 |     #------------------------------------------------------------------#
235 |     num_workers         = 4
236 | 
237 |     #------------------------------------------------------#
238 |     #   train_annotation_path   Training image paths and labels
239 |     #   val_annotation_path     Validation image paths and labels
240 |     #------------------------------------------------------#
241 |     train_annotation_path   = '2007_train.txt'
242 |     val_annotation_path     = '2007_val.txt'
243 | 
244 |     seed_everything(seed)
245 |     #------------------------------------------------------#
246 |     #   Set the GPUs to use
247 |     #------------------------------------------------------#
248 |     ngpus_per_node  = torch.cuda.device_count()
249 |     if distributed:
250 |         dist.init_process_group(backend="nccl")
251 |         local_rank  = int(os.environ["LOCAL_RANK"])
252 |         rank        = int(os.environ["RANK"])
253 |         device      = torch.device("cuda", local_rank)
254 |         if local_rank == 0:
255 |             print(f"[{os.getpid()}] (rank = {rank}, local_rank = {local_rank}) training...")
256 |             print("Gpu Device Count : ", ngpus_per_node)
257 |     else:
258 |         device          = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
259 |         local_rank      = 0
260 |         rank            = 0
261 | 
262 |     if pretrained:
263 |         if distributed:
264 |             if local_rank == 0:
265 |                 download_weights(backbone)  
266 |             dist.barrier()
267 |         else:
268 |             download_weights(backbone)
269 | 
270 |     #----------------------------------------------------#
271 |     #   Get the classes and anchors
272 |     #----------------------------------------------------#
273 |     class_names, num_classes = get_classes(classes_path)
274 |     num_classes += 1
275 |     anchors = get_anchors(input_shape, anchors_size, backbone)
276 | 
277 |     model = SSD300(num_classes, backbone, pretrained)
278 |     if not pretrained:
279 |         weights_init(model)
280 |     if model_path != '':
281 |         #------------------------------------------------------#
282 |         #   For the weight files, see the README (Baidu Netdisk download)
283 |         #------------------------------------------------------#
284 |         if local_rank == 0:
285 |             print('Load weights {}.'.format(model_path))
286 |         
287 |         #------------------------------------------------------#
288 |         #   Load by matching the keys of the pretrained weights against the keys of the model
289 |         #------------------------------------------------------#
290 |         model_dict      = model.state_dict()
291 |         pretrained_dict = torch.load(model_path, map_location = device)
292 |         load_key, no_load_key, temp_dict = [], [], {}
293 |         for k, v in pretrained_dict.items():
294 |             if k in model_dict.keys() and np.shape(model_dict[k]) == np.shape(v):
295 |                 temp_dict[k] = v
296 |                 load_key.append(k)
297 |             else:
298 |                 no_load_key.append(k)
299 |         model_dict.update(temp_dict)
300 |         model.load_state_dict(model_dict)
301 |         #------------------------------------------------------#
302 |         #   Show the keys that did not match
303 |         #------------------------------------------------------#
304 |         if local_rank == 0:
305 |             print("\nSuccessful Load Key:", str(load_key)[:500], "……\nSuccessful Load Key Num:", len(load_key))
306 |             print("\nFail To Load Key:", str(no_load_key)[:500], "……\nFail To Load Key num:", len(no_load_key))
307 |             print("\n\033[1;33;44mNote: it is normal for the head weights not to load; it is an error if the backbone weights fail to load.\033[0m")
308 | 
309 |     #----------------------#
310 |     #   Get the loss function
311 |     #----------------------#
312 |     criterion       = MultiboxLoss(num_classes, neg_pos_ratio=3.0)
313 |     #----------------------#
314 |     #   Record the loss
315 |     #----------------------#
316 |     if local_rank == 0:
317 |         time_str        = datetime.datetime.strftime(datetime.datetime.now(),'%Y_%m_%d_%H_%M_%S')
318 |         log_dir         = os.path.join(save_dir, "loss_" + str(time_str))
319 |         loss_history    = LossHistory(log_dir, model, input_shape=input_shape)
320 |     else:
321 |         loss_history    = None
322 |         
323 |     #------------------------------------------------------------------#
324 |     #   torch 1.2 does not support amp; torch 1.7.1 or later is recommended for correct fp16 use
325 |     #   This is why the import below shows "could not be resolved" under torch 1.2
326 |     #------------------------------------------------------------------#
327 |     if fp16:
328 |         from torch.cuda.amp import GradScaler as GradScaler
329 |         scaler = GradScaler()
330 |     else:
331 |         scaler = None
332 | 
333 |     model_train     = model.train()
334 |     #----------------------------#
335 |     #   Synchronized BatchNorm across multiple GPUs
336 |     #----------------------------#
337 |     if sync_bn and ngpus_per_node > 1 and distributed:
338 |         model_train = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model_train)
339 |     elif sync_bn:
340 |         print("Sync_bn is not supported on a single GPU or in non-distributed mode.")
341 | 
342 |     if Cuda:
343 |         if distributed:
344 |             #----------------------------#
345 |             #   Run in parallel across multiple GPUs
346 |             #----------------------------#
347 |             model_train = model_train.cuda(local_rank)
348 |             model_train = torch.nn.parallel.DistributedDataParallel(model_train, device_ids=[local_rank], find_unused_parameters=True)
349 |         else:
350 |             model_train = torch.nn.DataParallel(model)
351 |             cudnn.benchmark = True
352 |             model_train = model_train.cuda()
353 | 
354 |     #---------------------------#
355 |     #   Read the txt files for the dataset
356 |     #---------------------------#
357 |     with open(train_annotation_path, encoding='utf-8') as f:
358 |         train_lines = f.readlines()
359 |     with open(val_annotation_path, encoding='utf-8') as f:
360 |         val_lines   = f.readlines()
361 |     num_train   = len(train_lines)
362 |     num_val     = len(val_lines)   
363 |     
364 |     if local_rank == 0:
365 |         show_config(
366 |             classes_path = classes_path, model_path = model_path, input_shape = input_shape, \
367 |             Init_Epoch = Init_Epoch, Freeze_Epoch = Freeze_Epoch, UnFreeze_Epoch = UnFreeze_Epoch, Freeze_batch_size = Freeze_batch_size, Unfreeze_batch_size = Unfreeze_batch_size, Freeze_Train = Freeze_Train, \
368 |             Init_lr = Init_lr, Min_lr = Min_lr, optimizer_type = optimizer_type, momentum = momentum, lr_decay_type = lr_decay_type, \
369 |             save_period = save_period, save_dir = save_dir, num_workers = num_workers, num_train = num_train, num_val = num_val
370 |         )
371 |         #---------------------------------------------------------#
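372 |         #   Total training epochs: the number of full passes over the dataset
373 |         #   Total training steps: the total number of gradient-descent updates
374 |         #   Each epoch consists of several steps, and each step performs one gradient-descent update.
375 |         #   Only a minimum number of epochs is suggested here, with no upper limit; the calculation considers only the unfrozen stage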
376 |         #----------------------------------------------------------#
377 |         wanted_step = 5e4 if optimizer_type == "sgd" else 1.5e4
378 |         total_step  = num_train // Unfreeze_batch_size * UnFreeze_Epoch
379 |         if total_step <= wanted_step:
380 |             if num_train // Unfreeze_batch_size == 0:
381 |                 raise ValueError('The dataset is too small to train on; please add more data.')
382 |             wanted_epoch = wanted_step // (num_train // Unfreeze_batch_size) + 1
383 |             print("\n\033[1;33;44m[Warning] When using the %s optimizer, a total of more than %d training steps is recommended.\033[0m"%(optimizer_type, wanted_step))
384 |             print("\033[1;33;44m[Warning] This run has %d training samples, Unfreeze_batch_size is %d and %d epochs are trained, giving %d total training steps.\033[0m"%(num_train, Unfreeze_batch_size, UnFreeze_Epoch, total_step))
385 |             print("\033[1;33;44m[Warning] The total of %d training steps is below the recommended %d; setting the total number of epochs to %d is recommended.\033[0m"%(total_step, wanted_step, wanted_epoch))
386 | 
387 |     #------------------------------------------------------#
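388 |     #   The backbone's features are generic, so frozen training speeds up training
389 |     #   and also protects the weights from being destroyed early in training.
390 |     #   Init_Epoch is the starting epoch
391 |     #   Freeze_Epoch is the number of frozen-training epochs
392 |     #   UnFreeze_Epoch is the total number of training epochs
393 |     #   On OOM or insufficient GPU memory, reduce the batch size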
394 |     #------------------------------------------------------#
395 |     if True:
396 |         UnFreeze_flag = False
397 |         #------------------------------------#
398 |         #   Freeze part of the network for training
399 |         #------------------------------------#
400 |         if Freeze_Train:
401 |             if backbone == "vgg":
402 |                 for param in model.vgg[:28].parameters():
403 |                     param.requires_grad = False
404 |             elif backbone == "mobilenetv2":
405 |                 for param in model.mobilenet.parameters():
406 |                     param.requires_grad = False
407 |             else:
408 |                 for param in model.resnet.parameters():
409 |                     param.requires_grad = False
410 | 
411 |         #-------------------------------------------------------------------#
412 |         #   Without frozen training, set batch_size directly to Unfreeze_batch_size
413 |         #-------------------------------------------------------------------#
414 |         batch_size = Freeze_batch_size if Freeze_Train else Unfreeze_batch_size
415 | 
416 |         #-------------------------------------------------------------------#
417 |         #   Adapt the learning rate to the current batch_size
418 |         #-------------------------------------------------------------------#
419 |         nbs             = 64
420 |         lr_limit_max    = 1e-3 if optimizer_type == 'adam' else 5e-2
421 |         lr_limit_min    = 3e-4 if optimizer_type == 'adam' else 5e-5
422 |         Init_lr_fit     = min(max(batch_size / nbs * Init_lr, lr_limit_min), lr_limit_max)
423 |         Min_lr_fit      = min(max(batch_size / nbs * Min_lr, lr_limit_min * 1e-2), lr_limit_max * 1e-2)
424 | 
425 |         #---------------------------------------#
426 |         #   Select the optimizer according to optimizer_type
427 |         #---------------------------------------#
428 |         optimizer = {
429 |             'adam'  : optim.Adam(model.parameters(), Init_lr_fit, betas = (momentum, 0.999), weight_decay = weight_decay),
430 |             'sgd'   : optim.SGD(model.parameters(), Init_lr_fit, momentum = momentum, nesterov=True, weight_decay = weight_decay)
431 |         }[optimizer_type]
432 | 
433 |         #---------------------------------------#
434 |         #   Get the learning-rate decay function
435 |         #---------------------------------------#
436 |         lr_scheduler_func = get_lr_scheduler(lr_decay_type, Init_lr_fit, Min_lr_fit, UnFreeze_Epoch)
437 |         
438 |         #---------------------------------------#
439 |         #   Determine the number of steps per epoch
440 |         #---------------------------------------#
441 |         epoch_step      = num_train // batch_size
442 |         epoch_step_val  = num_val // batch_size
443 |         
444 |         if epoch_step == 0 or epoch_step_val == 0:
445 |             raise ValueError("The dataset is too small to continue training; please add more data.")
446 | 
447 |         train_dataset   = SSDDataset(train_lines, input_shape, anchors, batch_size, num_classes, train = True)
448 |         val_dataset     = SSDDataset(val_lines, input_shape, anchors, batch_size, num_classes, train = False)
449 |         
450 |         if distributed:
451 |             train_sampler   = torch.utils.data.distributed.DistributedSampler(train_dataset, shuffle=True,)
452 |             val_sampler     = torch.utils.data.distributed.DistributedSampler(val_dataset, shuffle=False,)
453 |             batch_size      = batch_size // ngpus_per_node
454 |             shuffle         = False
455 |         else:
456 |             train_sampler   = None
457 |             val_sampler     = None
458 |             shuffle         = True
459 | 
460 |         gen             = DataLoader(train_dataset, shuffle = shuffle, batch_size = batch_size, num_workers = num_workers, pin_memory=True,
461 |                                     drop_last=True, collate_fn=ssd_dataset_collate, sampler=train_sampler, 
462 |                                     worker_init_fn=partial(worker_init_fn, rank=rank, seed=seed))
463 |         gen_val         = DataLoader(val_dataset  , shuffle = shuffle, batch_size = batch_size, num_workers = num_workers, pin_memory=True, 
464 |                                     drop_last=True, collate_fn=ssd_dataset_collate, sampler=val_sampler, 
465 |                                     worker_init_fn=partial(worker_init_fn, rank=rank, seed=seed))
466 | 
467 |         #----------------------#
468 |         #   Record the mAP curve from evaluation
469 |         #----------------------#
470 |         if local_rank == 0:
471 |             eval_callback   = EvalCallback(model, input_shape, anchors, class_names, num_classes, val_lines, log_dir, Cuda, \
472 |                                             eval_flag=eval_flag, period=eval_period)
473 |         else:
474 |             eval_callback   = None
475 |         
476 |         #---------------------------------------#
477 |         #   Start training the model
478 |         #---------------------------------------#
479 |         for epoch in range(Init_Epoch, UnFreeze_Epoch):
480 |             #---------------------------------------#
481 |             #   If the model has a frozen part,
482 |             #   unfreeze it and set the parameters
483 |             #---------------------------------------#
484 |             if epoch >= Freeze_Epoch and not UnFreeze_flag and Freeze_Train:
485 |                 batch_size = Unfreeze_batch_size
486 | 
487 |                 #-------------------------------------------------------------------#
488 |                 #   Adapt the learning rate to the current batch_size
489 |                 #-------------------------------------------------------------------#
490 |                 nbs             = 64
491 |                 lr_limit_max    = 1e-3 if optimizer_type == 'adam' else 5e-2
492 |                 lr_limit_min    = 3e-4 if optimizer_type == 'adam' else 5e-5
493 |                 Init_lr_fit     = min(max(batch_size / nbs * Init_lr, lr_limit_min), lr_limit_max)
494 |                 Min_lr_fit      = min(max(batch_size / nbs * Min_lr, lr_limit_min * 1e-2), lr_limit_max * 1e-2)
495 |                 #---------------------------------------#
496 |                 #   Get the learning-rate decay function
497 |                 #---------------------------------------#
498 |                 lr_scheduler_func = get_lr_scheduler(lr_decay_type, Init_lr_fit, Min_lr_fit, UnFreeze_Epoch)
499 |                 
500 |                 if backbone == "vgg":
501 |                     for param in model.vgg[:28].parameters():
502 |                         param.requires_grad = True
503 |                 elif backbone == "mobilenetv2":
504 |                     for param in model.mobilenet.parameters():
505 |                         param.requires_grad = True
506 |                 else:
507 |                     for param in model.resnet.parameters():
508 |                         param.requires_grad = True
509 |                         
510 |                 epoch_step      = num_train // batch_size
511 |                 epoch_step_val  = num_val // batch_size
512 | 
513 |                 if epoch_step == 0 or epoch_step_val == 0:
514 |                     raise ValueError("The dataset is too small to continue training; please add more data.")
515 | 
516 |                 if distributed:
517 |                     batch_size = batch_size // ngpus_per_node
518 |                             
519 |                 gen         = DataLoader(train_dataset, shuffle = shuffle, batch_size = batch_size, num_workers = num_workers, pin_memory=True,
520 |                                             drop_last=True, collate_fn=ssd_dataset_collate, sampler=train_sampler, 
521 |                                             worker_init_fn=partial(worker_init_fn, rank=rank, seed=seed))
522 |                 gen_val     = DataLoader(val_dataset  , shuffle = shuffle, batch_size = batch_size, num_workers = num_workers, pin_memory=True, 
523 |                                             drop_last=True, collate_fn=ssd_dataset_collate, sampler=val_sampler, 
524 |                                             worker_init_fn=partial(worker_init_fn, rank=rank, seed=seed))
525 | 
526 |                 UnFreeze_flag = True
527 | 
528 |             if distributed:
529 |                 train_sampler.set_epoch(epoch)
530 | 
531 |             set_optimizer_lr(optimizer, lr_scheduler_func, epoch)
532 | 
533 |             fit_one_epoch(model_train, model, criterion, loss_history, eval_callback, optimizer, epoch, 
534 |                     epoch_step, epoch_step_val, gen, gen_val, UnFreeze_Epoch, Cuda, fp16, scaler, save_period, save_dir, local_rank)
535 |                         
536 |             if distributed:
537 |                 dist.barrier()
538 | 
539 |         if local_rank == 0:
540 |             loss_history.writer.close()
541 | 
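
The batch-size-dependent learning-rate adjustment in train.py can be tried in isolation. The following is a minimal sketch that copies the clamping formula and its limits from the script above; the function name `fit_lr` and its signature are hypothetical and exist only for this illustration.

```python
import math

def fit_lr(batch_size, init_lr, min_lr, optimizer_type, nbs=64):
    """Scale the learning rates by batch_size / nbs and clamp them.

    Hypothetical helper; the limits mirror the values used in train.py.
    """
    lr_limit_max = 1e-3 if optimizer_type == 'adam' else 5e-2
    lr_limit_min = 3e-4 if optimizer_type == 'adam' else 5e-5
    # Linear scaling relative to the nominal batch size, then clamping.
    init_lr_fit = min(max(batch_size / nbs * init_lr, lr_limit_min), lr_limit_max)
    min_lr_fit = min(max(batch_size / nbs * min_lr, lr_limit_min * 1e-2), lr_limit_max * 1e-2)
    return init_lr_fit, min_lr_fit

# Defaults from train.py: sgd, Init_lr = 2e-3, Min_lr = Init_lr * 0.01
frozen = fit_lr(16, 2e-3, 2e-5, 'sgd')    # frozen stage, Freeze_batch_size = 16
unfrozen = fit_lr(8, 2e-3, 2e-5, 'sgd')   # unfrozen stage, Unfreeze_batch_size = 8
adam = fit_lr(16, 6e-4, 6e-6, 'adam')     # scaled value 1.5e-4 is raised to the lower limit
print(frozen, unfrozen, adam)
```

With the default settings the frozen stage runs at roughly 5e-4 and the unfrozen stage at roughly 2.5e-4. This is why the comments advise against a large gap between Freeze_batch_size and Unfreeze_batch_size: the effective learning rate changes by the same factor when the backbone is unfrozen.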

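
The minimum-epoch warning printed by train.py follows from a short calculation. The sketch below reproduces it under the same thresholds (5e4 steps for sgd, 1.5e4 otherwise); the function name `suggested_epochs` and the sample dataset size of 1000 images are assumptions made for illustration.

```python
def suggested_epochs(num_train, unfreeze_batch_size, unfreeze_epoch, optimizer_type):
    """Return the suggested total epochs, or None if the run is already long enough.

    Hypothetical helper mirroring the step check in train.py.
    """
    wanted_step = 5e4 if optimizer_type == "sgd" else 1.5e4
    steps_per_epoch = num_train // unfreeze_batch_size
    if steps_per_epoch == 0:
        raise ValueError("The dataset is too small to train on.")
    total_step = steps_per_epoch * unfreeze_epoch
    if total_step > wanted_step:
        return None
    # Same rounding as train.py: floor division plus one.
    return int(wanted_step // steps_per_epoch + 1)

# Assumed example: 1000 training images, Unfreeze_batch_size = 8, UnFreeze_Epoch = 200.
sgd_hint = suggested_epochs(1000, 8, 200, "sgd")    # 125 steps/epoch, 25000 steps in total
adam_hint = suggested_epochs(1000, 8, 200, "adam")  # 25000 steps already exceed 1.5e4
print(sgd_hint, adam_hint)
```

In this assumed case sgd would be advised to train for 401 epochs, while adam triggers no warning.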

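
train.py builds its anchors with `get_anchors(input_shape, anchors_size, backbone)`. As a rough check of what that call produces for the vgg backbone, the sketch below recomputes the feature-map sizes and the anchor count using the kernel, padding and stride lists and the aspect ratios from utils/anchors.py. It assumes a 300x300 input; the helper names `vgg_feature_sizes` and `count_anchors` are hypothetical.

```python
def vgg_feature_sizes(size):
    """Side lengths of the six prediction feature maps for a square input."""
    filter_sizes = [3, 3, 3, 3, 3, 3, 3, 3]
    padding = [1, 1, 1, 1, 1, 1, 0, 0]
    stride = [2, 2, 2, 2, 2, 2, 1, 1]
    sizes = []
    for k, p, s in zip(filter_sizes, padding, stride):
        # Standard convolution output-size formula.
        size = (size + 2 * p - k) // s + 1
        sizes.append(size)
    return sizes[-6:]

def count_anchors(size):
    """Total number of anchors over all six feature maps."""
    aspect_ratios = [[1, 2], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2], [1, 2]]
    total = 0
    for side, ars in zip(vgg_feature_sizes(size), aspect_ratios):
        # AnchorBox stores each ratio and its reciprocal, so every cell
        # receives 2 * len(ars) anchors.
        total += side * side * 2 * len(ars)
    return total

sizes = vgg_feature_sizes(300)
total = count_anchors(300)
print(sizes, total)
```

For a 300x300 input this gives feature maps of 38, 19, 10, 5, 3 and 1 cells per side and 8732 anchors in total.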
--------------------------------------------------------------------------------
/utils/__init__.py:
--------------------------------------------------------------------------------
1 | #


--------------------------------------------------------------------------------
/utils/anchors.py:
--------------------------------------------------------------------------------
  1 | import numpy as np
  2 | 
  3 | 
  4 | class AnchorBox():
  5 |     def __init__(self, input_shape, min_size, max_size=None, aspect_ratios=None, flip=True):
  6 |         self.input_shape = input_shape
  7 | 
  8 |         self.min_size = min_size
  9 |         self.max_size = max_size
 10 | 
 11 |         self.aspect_ratios = []
 12 |         for ar in aspect_ratios:
 13 |             self.aspect_ratios.append(ar)
 14 |             self.aspect_ratios.append(1.0 / ar)
 15 | 
 16 |     def call(self, layer_shape, mask=None):
 17 |         # --------------------------------- #
 18 |         #   Get the height and width of the incoming feature layer,
 19 |         #   e.g. 38x38
 20 |         # --------------------------------- #
 21 |         layer_height    = layer_shape[0]
 22 |         layer_width     = layer_shape[1]
 23 |         # --------------------------------- #
 24 |         #   Get the height and width of the input image,
 25 |         #   e.g. 300x300
 26 |         # --------------------------------- #
 27 |         img_height  = self.input_shape[0]
 28 |         img_width   = self.input_shape[1]
 29 | 
 30 |         box_widths  = []
 31 |         box_heights = []
 32 |         # --------------------------------- #
 33 |         #   self.aspect_ratios usually takes one of two forms:
 34 |         #   [1, 1, 2, 1/2]
 35 |         #   [1, 1, 2, 1/2, 3, 1/3]
 36 |         # --------------------------------- #
 37 |         for ar in self.aspect_ratios:
 38 |             # Add a smaller square first
 39 |             if ar == 1 and len(box_widths) == 0:
 40 |                 box_widths.append(self.min_size)
 41 |                 box_heights.append(self.min_size)
 42 |             # Then add a larger square
 43 |             elif ar == 1 and len(box_widths) > 0:
 44 |                 box_widths.append(np.sqrt(self.min_size * self.max_size))
 45 |                 box_heights.append(np.sqrt(self.min_size * self.max_size))
 46 |             # Then add the rectangles
 47 |             elif ar != 1:
 48 |                 box_widths.append(self.min_size * np.sqrt(ar))
 49 |                 box_heights.append(self.min_size / np.sqrt(ar))
 50 | 
 51 |         # --------------------------------- #
 52 |         #   Half the width and height of every anchor
 53 |         # --------------------------------- #
 54 |         box_widths  = 0.5 * np.array(box_widths)
 55 |         box_heights = 0.5 * np.array(box_heights)
 56 | 
 57 |         # --------------------------------- #
 58 |         #   Stride corresponding to each feature layer
 59 |         # --------------------------------- #
 60 |         step_x = img_width / layer_width
 61 |         step_y = img_height / layer_height
 62 | 
 63 |         # --------------------------------- #
 64 |         #   Generate the grid centers
 65 |         # --------------------------------- #
 66 |         linx = np.linspace(0.5 * step_x, img_width - 0.5 * step_x,
 67 |                            layer_width)
 68 |         liny = np.linspace(0.5 * step_y, img_height - 0.5 * step_y,
 69 |                            layer_height)
 70 |         centers_x, centers_y = np.meshgrid(linx, liny)
 71 |         centers_x = centers_x.reshape(-1, 1)
 72 |         centers_y = centers_y.reshape(-1, 1)
 73 | 
 74 |         # Each anchor needs two copies of (centers_x, centers_y): the first gives the top-left corner, the second the bottom-right corner
 75 |         num_anchors_ = len(self.aspect_ratios)
 76 |         anchor_boxes = np.concatenate((centers_x, centers_y), axis=1)
 77 |         anchor_boxes = np.tile(anchor_boxes, (1, 2 * num_anchors_))
 78 |         # Compute the top-left and bottom-right corners of the anchors
 79 |         anchor_boxes[:, ::4]    -= box_widths
 80 |         anchor_boxes[:, 1::4]   -= box_heights
 81 |         anchor_boxes[:, 2::4]   += box_widths
 82 |         anchor_boxes[:, 3::4]   += box_heights
 83 | 
 84 |         # --------------------------------- #
 85 |         #   Convert the anchors to fractional form
 86 |         #   (normalize by the image size)
 87 |         # --------------------------------- #
 88 |         anchor_boxes[:, ::2]    /= img_width
 89 |         anchor_boxes[:, 1::2]   /= img_height
 90 |         anchor_boxes = anchor_boxes.reshape(-1, 4)
 91 | 
 92 |         anchor_boxes = np.minimum(np.maximum(anchor_boxes, 0.0), 1.0)
 93 |         return anchor_boxes
 94 | 
 95 | #---------------------------------------------------#
 96 | #   Computes the sizes of the shared feature layers
 97 | #---------------------------------------------------#
 98 | def get_vgg_output_length(height, width):
 99 |     filter_sizes    = [3, 3, 3, 3, 3, 3, 3, 3]
100 |     padding         = [1, 1, 1, 1, 1, 1, 0, 0]
101 |     stride          = [2, 2, 2, 2, 2, 2, 1, 1]
102 |     feature_heights = []
103 |     feature_widths  = []
104 | 
105 |     for i in range(len(filter_sizes)):
106 |         height  = (height + 2*padding[i] - filter_sizes[i]) // stride[i] + 1
107 |         width   = (width + 2*padding[i] - filter_sizes[i]) // stride[i] + 1
108 |         feature_heights.append(height)
109 |         feature_widths.append(width)
110 |     return np.array(feature_heights)[-6:], np.array(feature_widths)[-6:]
111 |     
112 | def get_mobilenet_output_length(height, width):
113 |     filter_sizes    = [3, 3, 3, 3, 3, 3, 3, 3, 3]
114 |     padding         = [1, 1, 1, 1, 1, 1, 1, 1, 1]
115 |     stride          = [2, 2, 2, 2, 2, 2, 2, 2, 2]
116 |     feature_heights = []
117 |     feature_widths  = []
118 | 
119 |     for i in range(len(filter_sizes)):
120 |         height  = (height + 2*padding[i] - filter_sizes[i]) // stride[i] + 1
121 |         width   = (width + 2*padding[i] - filter_sizes[i]) // stride[i] + 1
122 |         feature_heights.append(height)
123 |         feature_widths.append(width)
124 |     return np.array(feature_heights)[-6:], np.array(feature_widths)[-6:]
125 | 
126 | def get_anchors(input_shape = [300,300], anchors_size = [30, 60, 111, 162, 213, 264, 315], backbone = 'vgg'):
127 |     if backbone == 'vgg' or backbone == 'resnet50':
128 |         feature_heights, feature_widths = get_vgg_output_length(input_shape[0], input_shape[1])
129 |         aspect_ratios = [[1, 2], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2], [1, 2]]
130 |     else:
131 |         feature_heights, feature_widths = get_mobilenet_output_length(input_shape[0], input_shape[1])
132 |         aspect_ratios = [[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]]
133 |         
134 |     anchors = []
135 |     for i in range(len(feature_heights)):
136 |         anchor_boxes = AnchorBox(input_shape, anchors_size[i], max_size = anchors_size[i+1], 
137 |                     aspect_ratios = aspect_ratios[i]).call([feature_heights[i], feature_widths[i]])
138 |         anchors.append(anchor_boxes)
139 | 
140 |     anchors = np.concatenate(anchors, axis=0)
141 |     return anchors
142 | 
143 | if __name__ == '__main__':
144 |     import matplotlib.pyplot as plt
145 |     class AnchorBox_for_Vision():
146 |         def __init__(self, input_shape, min_size, max_size=None, aspect_ratios=None, flip=True):
147 |             # Size of the input image, 300x300
148 |             self.input_shape = input_shape
149 | 
150 |             # Short side of the anchor
151 |             self.min_size = min_size
152 |             # Long side of the anchor
153 |             self.max_size = max_size
154 | 
155 |             # [1, 2] => [1, 1, 2, 1/2]
156 |             # [1, 2, 3] => [1, 1, 2, 1/2, 3, 1/3]
157 |             self.aspect_ratios = []
158 |             for ar in aspect_ratios:
159 |                 self.aspect_ratios.append(ar)
160 |                 self.aspect_ratios.append(1.0 / ar)
161 | 
162 |         def call(self, layer_shape, mask=None):
163 |             # --------------------------------- #
164 |             #   Get the height and width of the incoming feature layer,
165 |             #   e.g. 3x3
166 |             # --------------------------------- #
167 |             layer_height    = layer_shape[0]
168 |             layer_width     = layer_shape[1]
169 |             # --------------------------------- #
170 |             #   Get the height and width of the input image,
171 |             #   e.g. 300x300
172 |             # --------------------------------- #
173 |             img_height  = self.input_shape[0]
174 |             img_width   = self.input_shape[1]
175 |             
176 |             box_widths  = []
177 |             box_heights = []
178 |             # --------------------------------- #
179 |             #   self.aspect_ratios usually takes one of two forms:
180 |             #   [1, 1, 2, 1/2]
181 |             #   [1, 1, 2, 1/2, 3, 1/3]
182 |             # --------------------------------- #
183 |             for ar in self.aspect_ratios:
184 |                 # Add a smaller square first
185 |                 if ar == 1 and len(box_widths) == 0:
186 |                     box_widths.append(self.min_size)
187 |                     box_heights.append(self.min_size)
188 |                 # Then add a larger square
189 |                 elif ar == 1 and len(box_widths) > 0:
190 |                     box_widths.append(np.sqrt(self.min_size * self.max_size))
191 |                     box_heights.append(np.sqrt(self.min_size * self.max_size))
192 |                 # Then add the rectangles
193 |                 elif ar != 1:
194 |                     box_widths.append(self.min_size * np.sqrt(ar))
195 |                     box_heights.append(self.min_size / np.sqrt(ar))
196 | 
197 |             print("box_widths:", box_widths)
198 |             print("box_heights:", box_heights)
199 |             
200 |             # --------------------------------- #
201 |             #   Halve the width and height of every prior box
202 |             # --------------------------------- #
203 |             box_widths  = 0.5 * np.array(box_widths)
204 |             box_heights = 0.5 * np.array(box_heights)
205 | 
206 |             # --------------------------------- #
207 |             #   Stride of each feature layer
208 |             #   A 3x3 layer has a stride of 100
209 |             # --------------------------------- #
210 |             step_x = img_width / layer_width
211 |             step_y = img_height / layer_height
212 | 
213 |             # --------------------------------- #
214 |             #   Generate the grid centers
215 |             # --------------------------------- #
216 |             linx = np.linspace(0.5 * step_x, img_width - 0.5 * step_x, layer_width)
217 |             liny = np.linspace(0.5 * step_y, img_height - 0.5 * step_y, layer_height)
218 |             # Build the grid
219 |             centers_x, centers_y = np.meshgrid(linx, liny)
220 |             centers_x = centers_x.reshape(-1, 1)
221 |             centers_y = centers_y.reshape(-1, 1)
222 | 
223 |             if layer_height == 3:
224 |                 fig = plt.figure()
225 |                 ax = fig.add_subplot(111)
226 |                 plt.ylim(-50,350)
227 |                 plt.xlim(-50,350)
228 |                 plt.scatter(centers_x,centers_y)
229 | 
230 |             # Each prior box needs two (centers_x, centers_y) pairs: the first yields the top-left corner, the second the bottom-right corner
231 |             num_anchors_ = len(self.aspect_ratios)
232 |             anchor_boxes = np.concatenate((centers_x, centers_y), axis=1)
233 |             anchor_boxes = np.tile(anchor_boxes, (1, 2 * num_anchors_))
234 |             
235 |             # Compute the top-left and bottom-right corners of the prior boxes
236 |             anchor_boxes[:, ::4]    -= box_widths
237 |             anchor_boxes[:, 1::4]   -= box_heights
238 |             anchor_boxes[:, 2::4]   += box_widths
239 |             anchor_boxes[:, 3::4]   += box_heights
240 | 
241 |             print(np.shape(anchor_boxes))
242 |             if layer_height == 3:
243 |                 rect1 = plt.Rectangle([anchor_boxes[4, 0],anchor_boxes[4, 1]],box_widths[0]*2,box_heights[0]*2,color="r",fill=False)
244 |                 rect2 = plt.Rectangle([anchor_boxes[4, 4],anchor_boxes[4, 5]],box_widths[1]*2,box_heights[1]*2,color="r",fill=False)
245 |                 rect3 = plt.Rectangle([anchor_boxes[4, 8],anchor_boxes[4, 9]],box_widths[2]*2,box_heights[2]*2,color="r",fill=False)
246 |                 rect4 = plt.Rectangle([anchor_boxes[4, 12],anchor_boxes[4, 13]],box_widths[3]*2,box_heights[3]*2,color="r",fill=False)
247 |                 
248 |                 ax.add_patch(rect1)
249 |                 ax.add_patch(rect2)
250 |                 ax.add_patch(rect3)
251 |                 ax.add_patch(rect4)
252 | 
253 |                 plt.show()
254 |             # --------------------------------- #
255 |             #   Convert the prior boxes to fractional coordinates
256 |             #   (normalize by the image size)
257 |             # --------------------------------- #
258 |             anchor_boxes[:, ::2]    /= img_width
259 |             anchor_boxes[:, 1::2]   /= img_height
260 |             anchor_boxes = anchor_boxes.reshape(-1, 4)
261 | 
262 |             anchor_boxes = np.minimum(np.maximum(anchor_boxes, 0.0), 1.0)
263 |             return anchor_boxes
264 | 
265 |     # The input image size is 300 x 300
266 |     input_shape     = [300, 300] 
267 |     # Specify the prior box sizes, i.e. their width and height
268 |     anchors_size    = [30, 60, 111, 162, 213, 264, 315]
269 |     # feature_heights   [38, 19, 10, 5, 3, 1]
270 |     # feature_widths    [38, 19, 10, 5, 3, 1]
271 |     feature_heights, feature_widths = get_vgg_output_length(input_shape[0], input_shape[1])
272 |     # Aspect ratios that fix the number of prior boxes per cell: 4 or 6
273 |     aspect_ratios                   = [[1, 2], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2], [1, 2]]
274 | 
275 |     anchors = []
276 |     for i in range(len(feature_heights)):
277 |         anchors.append(AnchorBox_for_Vision(input_shape, anchors_size[i], max_size = anchors_size[i+1], 
278 |                     aspect_ratios = aspect_ratios[i]).call([feature_heights[i], feature_widths[i]]))
279 | 
280 |     anchors = np.concatenate(anchors, axis=0)
281 |     print(np.shape(anchors))
282 | 
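
As a quick check on the prior-box logic in `AnchorBox_for_Vision` above, here is a minimal sketch that reproduces the box sizes for one layer and counts the prior boxes for the 300x300 configuration. The helper names `expand_aspect_ratios`, `prior_box_sizes`, and `count_priors` are hypothetical and not part of the repository, and the feature map sizes are copied from the comment in the script rather than computed by `get_vgg_output_length`.

```python
import numpy as np


def expand_aspect_ratios(aspect_ratios):
    # [1, 2] => [1, 1, 2, 1/2], mirroring the constructor above.
    expanded = []
    for ar in aspect_ratios:
        expanded.append(ar)
        expanded.append(1.0 / ar)
    return expanded


def prior_box_sizes(min_size, max_size, aspect_ratios):
    # Returns (widths, heights) in pixels for one feature layer.
    widths, heights = [], []
    for ar in expand_aspect_ratios(aspect_ratios):
        if ar == 1 and len(widths) == 0:
            # Smaller square.
            widths.append(min_size)
            heights.append(min_size)
        elif ar == 1:
            # Larger square, side = sqrt(min_size * max_size).
            side = np.sqrt(min_size * max_size)
            widths.append(side)
            heights.append(side)
        else:
            # Rectangle with the same area as the smaller square.
            widths.append(min_size * np.sqrt(ar))
            heights.append(min_size / np.sqrt(ar))
    return np.array(widths, dtype=float), np.array(heights, dtype=float)


def count_priors(feature_sizes, aspect_ratios_per_layer):
    # Each cell gets 2 * len(ratios) prior boxes (hypothetical helper).
    total = 0
    for size, ratios in zip(feature_sizes, aspect_ratios_per_layer):
        total += size * size * 2 * len(ratios)
    return total


widths, heights = prior_box_sizes(30, 60, [1, 2])
total = count_priors(
    [38, 19, 10, 5, 3, 1],
    [[1, 2], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2], [1, 2]],
)
print(widths, heights, total)
```

With these settings the six layers contribute 4, 6, 6, 6, 4, and 4 boxes per cell, so the concatenated `anchors` array should have 8732 rows.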


--------------------------------------------------------------------------------
/utils/callbacks.py:
--------------------------------------------------------------------------------
  1 | import datetime
  2 | import os
  3 | 
  4 | import torch
  5 | import matplotlib
  6 | matplotlib.use('Agg')
  7 | import scipy.signal
  8 | from matplotlib import pyplot as plt
  9 | from torch.utils.tensorboard import SummaryWriter
 10 | 
 11 | import shutil
 12 | import numpy as np
 13 | 
 14 | from PIL import Image
 15 | from tqdm import tqdm
 16 | from .utils import cvtColor, preprocess_input, resize_image
 17 | from .utils_bbox import BBoxUtility
 18 | from .utils_map import get_coco_map, get_map
 19 | 
 20 | 
 21 | class LossHistory():
 22 |     def __init__(self, log_dir, model, input_shape):
 23 |         self.log_dir    = log_dir
 24 |         self.losses     = []
 25 |         self.val_loss   = []
 26 |         
 27 |         os.makedirs(self.log_dir)
 28 |         self.writer     = SummaryWriter(self.log_dir)
 29 |         try:
 30 |             dummy_input     = torch.randn(2, 3, input_shape[0], input_shape[1])
 31 |             self.writer.add_graph(model, dummy_input)
 32 |         except:
 33 |             pass
 34 | 
 35 |     def append_loss(self, epoch, loss, val_loss):
 36 |         if not os.path.exists(self.log_dir):
 37 |             os.makedirs(self.log_dir)
 38 | 
 39 |         self.losses.append(loss)
 40 |         self.val_loss.append(val_loss)
 41 | 
 42 |         with open(os.path.join(self.log_dir, "epoch_loss.txt"), 'a') as f:
 43 |             f.write(str(loss))
 44 |             f.write("\n")
 45 |         with open(os.path.join(self.log_dir, "epoch_val_loss.txt"), 'a') as f:
 46 |             f.write(str(val_loss))
 47 |             f.write("\n")
 48 | 
 49 |         self.writer.add_scalar('loss', loss, epoch)
 50 |         self.writer.add_scalar('val_loss', val_loss, epoch)
 51 |         self.loss_plot()
 52 | 
 53 |     def loss_plot(self):
 54 |         iters = range(len(self.losses))
 55 | 
 56 |         plt.figure()
 57 |         plt.plot(iters, self.losses, 'red', linewidth = 2, label='train loss')
 58 |         plt.plot(iters, self.val_loss, 'coral', linewidth = 2, label='val loss')
 59 |         try:
 60 |             if len(self.losses) < 25:
 61 |                 num = 5
 62 |             else:
 63 |                 num = 15
 64 |             
 65 |             plt.plot(iters, scipy.signal.savgol_filter(self.losses, num, 3), 'green', linestyle = '--', linewidth = 2, label='smooth train loss')
 66 |             plt.plot(iters, scipy.signal.savgol_filter(self.val_loss, num, 3), '#8B4513', linestyle = '--', linewidth = 2, label='smooth val loss')
 67 |         except:
 68 |             pass
 69 | 
 70 |         plt.grid(True)
 71 |         plt.xlabel('Epoch')
 72 |         plt.ylabel('Loss')
 73 |         plt.legend(loc="upper right")
 74 | 
 75 |         plt.savefig(os.path.join(self.log_dir, "epoch_loss.png"))
 76 | 
 77 |         plt.cla()
 78 |         plt.close("all")
 79 | 
 80 | class EvalCallback():
 81 |     def __init__(self, net, input_shape, anchors, class_names, num_classes, val_lines, log_dir, cuda, \
 82 |             map_out_path=".temp_map_out", max_boxes=100, confidence=0.05, nms_iou=0.5, letterbox_image=True, MINOVERLAP=0.5, eval_flag=True, period=1):
 83 |         super(EvalCallback, self).__init__()
 84 |         
 85 |         self.net                = net
 86 |         self.input_shape        = input_shape
 87 |         self.anchors            = anchors
 88 |         self.class_names        = class_names
 89 |         self.num_classes        = num_classes
 90 |         self.val_lines          = val_lines
 91 |         self.log_dir            = log_dir
 92 |         self.cuda               = cuda
 93 |         self.map_out_path       = map_out_path
 94 |         self.max_boxes          = max_boxes
 95 |         self.confidence         = confidence
 96 |         self.nms_iou            = nms_iou
 97 |         self.letterbox_image    = letterbox_image
 98 |         self.MINOVERLAP         = MINOVERLAP
 99 |         self.eval_flag          = eval_flag
100 |         self.period             = period
101 |         
102 |         self.anchors            = torch.from_numpy(self.anchors).type(torch.FloatTensor)
103 |         if self.cuda:
104 |             self.anchors = self.anchors.cuda()
105 |         
106 |         self.bbox_util = BBoxUtility(self.num_classes)
107 |         
108 |         self.maps       = [0]
109 |         self.epoches    = [0]
110 |         if self.eval_flag:
111 |             with open(os.path.join(self.log_dir, "epoch_map.txt"), 'a') as f:
112 |                 f.write(str(0))
113 |                 f.write("\n")
114 | 
115 |     def get_map_txt(self, image_id, image, class_names, map_out_path):
116 |         f = open(os.path.join(map_out_path, "detection-results/"+image_id+".txt"),"w") 
117 |         #---------------------------------------------------#
118 |         #   Compute the height and width of the input image
119 |         #---------------------------------------------------#
120 |         image_shape = np.array(np.shape(image)[0:2])
121 |         #---------------------------------------------------------#
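122 |         #   Convert the image to RGB here so grayscale images do not raise errors at prediction time.
123 |         #   The code only supports prediction on RGB images; every other image type is converted to RGB.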
124 |         #---------------------------------------------------------#
125 |         image       = cvtColor(image)
126 |         #---------------------------------------------------------#
127 |         #   Pad the image with gray bars for a distortion-free resize
128 |         #   A direct resize can also be used for detection
129 |         #---------------------------------------------------------#
130 |         image_data  = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image)
131 |         #---------------------------------------------------------#
132 |         #   Preprocess the image (mean subtraction), move channels first, and add the batch dimension
133 |         #---------------------------------------------------------#
134 |         image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0)
135 | 
136 |         with torch.no_grad():
137 |             #---------------------------------------------------#
138 |             #   Convert to a torch tensor
139 |             #---------------------------------------------------#
140 |             images = torch.from_numpy(image_data).type(torch.FloatTensor)
141 |             if self.cuda:
142 |                 images = images.cuda()
143 |             #---------------------------------------------------------#
144 |             #   Feed the image into the network for prediction
145 |             #---------------------------------------------------------#
146 |             outputs     = self.net(images)
147 |             #-----------------------------------------------------------#
148 |             #   Decode the predictions
149 |             #-----------------------------------------------------------#
150 |             results     = self.bbox_util.decode_box(outputs, self.anchors, image_shape, self.input_shape, self.letterbox_image, 
151 |                                                     nms_iou = self.nms_iou, confidence = self.confidence)
152 |             #--------------------------------------#
153 |             #   If no object is detected, return early (the results file stays empty)
154 |             #--------------------------------------#
155 |             if len(results[0]) <= 0:
156 |                 return 
157 | 
158 |             top_label   = np.array(results[0][:, 4], dtype = 'int32')
159 |             top_conf    = results[0][:, 5]
160 |             top_boxes   = results[0][:, :4]
161 | 
162 |         top_100     = np.argsort(top_conf)[::-1][:self.max_boxes]
163 |         top_boxes   = top_boxes[top_100]
164 |         top_conf    = top_conf[top_100]
165 |         top_label   = top_label[top_100]
166 | 
167 |         for i, c in list(enumerate(top_label)):
168 |             predicted_class = self.class_names[int(c)]
169 |             box             = top_boxes[i]
170 |             score           = str(top_conf[i])
171 | 
172 |             top, left, bottom, right = box
173 |             if predicted_class not in class_names:
174 |                 continue
175 | 
176 |             f.write("%s %s %s %s %s %s\n" % (predicted_class, score[:6], str(int(left)), str(int(top)), str(int(right)),str(int(bottom))))
177 | 
178 |         f.close()
179 |         return 
180 |     
181 |     def on_epoch_end(self, epoch, model_eval):
182 |         if epoch % self.period == 0 and self.eval_flag:
183 |             self.net = model_eval
184 |             if not os.path.exists(self.map_out_path):
185 |                 os.makedirs(self.map_out_path)
186 |             if not os.path.exists(os.path.join(self.map_out_path, "ground-truth")):
187 |                 os.makedirs(os.path.join(self.map_out_path, "ground-truth"))
188 |             if not os.path.exists(os.path.join(self.map_out_path, "detection-results")):
189 |                 os.makedirs(os.path.join(self.map_out_path, "detection-results"))
190 |             print("Get map.")
191 |             for annotation_line in tqdm(self.val_lines):
192 |                 line        = annotation_line.split()
193 |                 image_id    = os.path.basename(line[0]).split('.')[0]
194 |                 #------------------------------#
195 |                 #   Read the image (it is converted to RGB inside get_map_txt)
196 |                 #------------------------------#
197 |                 image       = Image.open(line[0])
198 |                 #------------------------------#
199 |                 #   Parse the ground-truth boxes
200 |                 #------------------------------#
201 |                 gt_boxes    = np.array([np.array(list(map(int,box.split(',')))) for box in line[1:]])
202 |                 #------------------------------#
203 |                 #   Write the detection-results txt
204 |                 #------------------------------#
205 |                 self.get_map_txt(image_id, image, self.class_names, self.map_out_path)
206 |                 
207 |                 #------------------------------#
208 |                 #   Write the ground-truth txt
209 |                 #------------------------------#
210 |                 with open(os.path.join(self.map_out_path, "ground-truth/"+image_id+".txt"), "w") as new_f:
211 |                     for box in gt_boxes:
212 |                         left, top, right, bottom, obj = box
213 |                         obj_name = self.class_names[obj]
214 |                         new_f.write("%s %s %s %s %s\n" % (obj_name, left, top, right, bottom))
215 |                         
216 |             print("Calculate Map.")
217 |             try:
218 |                 temp_map = get_coco_map(class_names = self.class_names, path = self.map_out_path)[1]
219 |             except:
220 |                 temp_map = get_map(self.MINOVERLAP, False, path = self.map_out_path)
221 |             self.maps.append(temp_map)
222 |             self.epoches.append(epoch)
223 | 
224 |             with open(os.path.join(self.log_dir, "epoch_map.txt"), 'a') as f:
225 |                 f.write(str(temp_map))
226 |                 f.write("\n")
227 |             
228 |             plt.figure()
229 |             plt.plot(self.epoches, self.maps, 'red', linewidth = 2, label='train map')
230 | 
231 |             plt.grid(True)
232 |             plt.xlabel('Epoch')
233 |             plt.ylabel('Map %s'%str(self.MINOVERLAP))
234 |             plt.title('A Map Curve')
235 |             plt.legend(loc="upper right")
236 | 
237 |             plt.savefig(os.path.join(self.log_dir, "epoch_map.png"))
238 |             plt.cla()
239 |             plt.close("all")
240 | 
241 |             print("Get map done.")
242 |             shutil.rmtree(self.map_out_path)
243 | 


--------------------------------------------------------------------------------
/utils/dataloader.py:
--------------------------------------------------------------------------------
  1 | import cv2
  2 | import numpy as np
  3 | import torch
  4 | from PIL import Image
  5 | from torch.utils.data.dataset import Dataset
  6 | 
  7 | from utils.utils import cvtColor, preprocess_input
  8 | 
  9 | 
 10 | class SSDDataset(Dataset):
 11 |     def __init__(self, annotation_lines, input_shape, anchors, batch_size, num_classes, train, overlap_threshold = 0.5):
 12 |         super(SSDDataset, self).__init__()
 13 |         self.annotation_lines   = annotation_lines
 14 |         self.length             = len(self.annotation_lines)
 15 |         
 16 |         self.input_shape        = input_shape
 17 |         self.anchors            = anchors
 18 |         self.num_anchors        = len(anchors)
 19 |         self.batch_size         = batch_size
 20 |         self.num_classes        = num_classes
 21 |         self.train              = train
 22 |         self.overlap_threshold  = overlap_threshold
 23 | 
 24 |     def __len__(self):
 25 |         return self.length
 26 | 
 27 |     def __getitem__(self, index):
 28 |         index = index % self.length
 29 |         #---------------------------------------------------#
 30 |         #   Apply random data augmentation during training
 31 |         #   No random augmentation is applied during validation
 32 |         #---------------------------------------------------#
 33 |         image, box  = self.get_random_data(self.annotation_lines[index], self.input_shape, random = self.train)
 34 |         image_data  = np.transpose(preprocess_input(np.array(image, dtype = np.float32)), (2, 0, 1))
 35 |         if len(box)!=0:
 36 |             boxes               = np.array(box[:,:4] , dtype=np.float32)
 37 |             # Normalize the coordinates to the 0-1 range
 38 |             boxes[:, [0, 2]]    = boxes[:,[0, 2]] / self.input_shape[1]
 39 |             boxes[:, [1, 3]]    = boxes[:,[1, 3]] / self.input_shape[0]
 40 |             # One-hot encode the classes of the ground-truth boxes
 41 |             one_hot_label   = np.eye(self.num_classes - 1)[np.array(box[:,4], np.int32)]
 42 |             box             = np.concatenate([boxes, one_hot_label], axis=-1)
 43 |         box = self.assign_boxes(box)
 44 | 
 45 |         return np.array(image_data, np.float32), np.array(box, np.float32)
 46 | 
 47 |     def rand(self, a=0, b=1):
 48 |         return np.random.rand()*(b-a) + a
 49 | 
 50 |     def get_random_data(self, annotation_line, input_shape, jitter=.3, hue=.1, sat=0.7, val=0.4, random=True):
 51 |         line = annotation_line.split()
 52 |         #------------------------------#
 53 |         #   Read the image and convert it to RGB
 54 |         #------------------------------#
 55 |         image   = Image.open(line[0])
 56 |         image   = cvtColor(image)
 57 |         #------------------------------#
 58 |         #   Get the image width/height and the target height/width
 59 |         #------------------------------#
 60 |         iw, ih  = image.size
 61 |         h, w    = input_shape
 62 |         #------------------------------#
 63 |         #   Parse the ground-truth boxes
 64 |         #------------------------------#
 65 |         box     = np.array([np.array(list(map(int,box.split(',')))) for box in line[1:]])
 66 | 
 67 |         if not random:
 68 |             scale = min(w/iw, h/ih)
 69 |             nw = int(iw*scale)
 70 |             nh = int(ih*scale)
 71 |             dx = (w-nw)//2
 72 |             dy = (h-nh)//2
 73 | 
 74 |             #---------------------------------#
 75 |             #   Pad the remaining area of the image with gray bars
 76 |             #---------------------------------#
 77 |             image       = image.resize((nw,nh), Image.BICUBIC)
 78 |             new_image   = Image.new('RGB', (w,h), (128,128,128))
 79 |             new_image.paste(image, (dx, dy))
 80 |             image_data  = np.array(new_image, np.float32)
 81 | 
 82 |             #---------------------------------#
 83 |             #   Adjust the ground-truth boxes
 84 |             #---------------------------------#
 85 |             if len(box)>0:
 86 |                 np.random.shuffle(box)
 87 |                 box[:, [0,2]] = box[:, [0,2]]*nw/iw + dx
 88 |                 box[:, [1,3]] = box[:, [1,3]]*nh/ih + dy
 89 |                 box[:, 0:2][box[:, 0:2]<0] = 0
 90 |                 box[:, 2][box[:, 2]>w] = w
 91 |                 box[:, 3][box[:, 3]>h] = h
 92 |                 box_w = box[:, 2] - box[:, 0]
 93 |                 box_h = box[:, 3] - box[:, 1]
 94 |                 box = box[np.logical_and(box_w>1, box_h>1)] # discard invalid box
 95 | 
 96 |             return image_data, box
 97 |                 
 98 |         #------------------------------------------#
 99 |         #   Rescale the image and jitter its aspect ratio
100 |         #------------------------------------------#
101 |         new_ar = iw/ih * self.rand(1-jitter,1+jitter) / self.rand(1-jitter,1+jitter)
102 |         scale = self.rand(.25, 2)
103 |         if new_ar < 1:
104 |             nh = int(scale*h)
105 |             nw = int(nh*new_ar)
106 |         else:
107 |             nw = int(scale*w)
108 |             nh = int(nw/new_ar)
109 |         image = image.resize((nw,nh), Image.BICUBIC)
110 | 
111 |         #------------------------------------------#
112 |         #   Pad the remaining area of the image with gray bars
113 |         #------------------------------------------#
114 |         dx = int(self.rand(0, w-nw))
115 |         dy = int(self.rand(0, h-nh))
116 |         new_image = Image.new('RGB', (w,h), (128,128,128))
117 |         new_image.paste(image, (dx, dy))
118 |         image = new_image
119 | 
120 |         #------------------------------------------#
121 |         #   Flip the image
122 |         #------------------------------------------#
123 |         flip = self.rand()<.5
124 |         if flip: image = image.transpose(Image.FLIP_LEFT_RIGHT)
125 | 
126 |         image_data      = np.array(image, np.uint8)
127 |         #---------------------------------#
128 |         #   Apply a color-space transform to the image
129 |         #   Compute the transform parameters
130 |         #---------------------------------#
131 |         r               = np.random.uniform(-1, 1, 3) * [hue, sat, val] + 1
132 |         #---------------------------------#
133 |         #   Convert the image to HSV
134 |         #---------------------------------#
135 |         hue, sat, val   = cv2.split(cv2.cvtColor(image_data, cv2.COLOR_RGB2HSV))
136 |         dtype           = image_data.dtype
137 |         #---------------------------------#
138 |         #   Apply the transform
139 |         #---------------------------------#
140 |         x       = np.arange(0, 256, dtype=r.dtype)
141 |         lut_hue = ((x * r[0]) % 180).astype(dtype)
142 |         lut_sat = np.clip(x * r[1], 0, 255).astype(dtype)
143 |         lut_val = np.clip(x * r[2], 0, 255).astype(dtype)
144 | 
145 |         image_data = cv2.merge((cv2.LUT(hue, lut_hue), cv2.LUT(sat, lut_sat), cv2.LUT(val, lut_val)))
146 |         image_data = cv2.cvtColor(image_data, cv2.COLOR_HSV2RGB)
147 | 
148 |         #---------------------------------#
149 |         #   Adjust the ground-truth boxes
150 |         #---------------------------------#
151 |         if len(box)>0:
152 |             np.random.shuffle(box)
153 |             box[:, [0,2]] = box[:, [0,2]]*nw/iw + dx
154 |             box[:, [1,3]] = box[:, [1,3]]*nh/ih + dy
155 |             if flip: box[:, [0,2]] = w - box[:, [2,0]]
156 |             box[:, 0:2][box[:, 0:2]<0] = 0
157 |             box[:, 2][box[:, 2]>w] = w
158 |             box[:, 3][box[:, 3]>h] = h
159 |             box_w = box[:, 2] - box[:, 0]
160 |             box_h = box[:, 3] - box[:, 1]
161 |             box = box[np.logical_and(box_w>1, box_h>1)] 
162 |         
163 |         return image_data, box
164 | 
165 |     def iou(self, box):
166 |         #---------------------------------------------#
167 |         #   Compute the IoU between a ground-truth box and all prior boxes
168 |         #   to measure how much they overlap
169 |         #---------------------------------------------#
170 |         inter_upleft    = np.maximum(self.anchors[:, :2], box[:2])
171 |         inter_botright  = np.minimum(self.anchors[:, 2:4], box[2:])
172 | 
173 |         inter_wh    = inter_botright - inter_upleft
174 |         inter_wh    = np.maximum(inter_wh, 0)
175 |         inter       = inter_wh[:, 0] * inter_wh[:, 1]
176 |         #---------------------------------------------# 
177 |         #   Area of the ground-truth box
178 |         #---------------------------------------------#
179 |         area_true = (box[2] - box[0]) * (box[3] - box[1])
180 |         #---------------------------------------------#
181 |         #   Area of the prior boxes
182 |         #---------------------------------------------#
183 |         area_gt = (self.anchors[:, 2] - self.anchors[:, 0])*(self.anchors[:, 3] - self.anchors[:, 1])
184 |         #---------------------------------------------#
185 |         #   Compute the IoU
186 |         #---------------------------------------------#
187 |         union = area_true + area_gt - inter
188 | 
189 |         iou = inter / union
190 |         return iou
191 | 
192 |     def encode_box(self, box, return_iou=True, variances = [0.1, 0.1, 0.2, 0.2]):
193 |         #---------------------------------------------#
194 |         #   Compute the overlap between the current ground-truth box and the prior boxes
195 |         #   iou [self.num_anchors]
196 |         #   encoded_box [self.num_anchors, 5]
197 |         #---------------------------------------------#
198 |         iou = self.iou(box)
199 |         encoded_box = np.zeros((self.num_anchors, 4 + return_iou))
200 |         
201 |         #---------------------------------------------#
202 |         #   For this ground-truth box, find the prior boxes with high overlap
203 |         #   Those prior boxes are responsible for predicting it
204 |         #---------------------------------------------#
205 |         assign_mask = iou > self.overlap_threshold
206 | 
207 |         #---------------------------------------------#
208 |         #   If no prior box has an overlap greater than self.overlap_threshold,
209 |         #   take the prior box with the highest overlap as the positive sample
210 |         #---------------------------------------------#
211 |         if not assign_mask.any():
212 |             assign_mask[iou.argmax()] = True
213 |         
214 |         #---------------------------------------------#
215 |         #   Store the IoU of the assigned prior boxes
216 |         #---------------------------------------------#
217 |         if return_iou:
218 |             encoded_box[:, -1][assign_mask] = iou[assign_mask]
219 |         
220 |         #---------------------------------------------#
221 |         #   Select the matching prior boxes
222 |         #---------------------------------------------#
223 |         assigned_anchors = self.anchors[assign_mask]
224 | 
225 |         #---------------------------------------------#
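226 |         #   Reverse encoding: convert the ground-truth box to the format of an SSD prediction
227 |         #   First compute the center and width/height of the ground-truth box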
228 |         #---------------------------------------------#
229 |         box_center  = 0.5 * (box[:2] + box[2:])
230 |         box_wh      = box[2:] - box[:2]
231 |         #---------------------------------------------#
232 |         #   Then compute the center and width/height of the high-overlap prior boxes
233 |         #---------------------------------------------#
234 |         assigned_anchors_center = (assigned_anchors[:, 0:2] + assigned_anchors[:, 2:4]) * 0.5
235 |         assigned_anchors_wh     = (assigned_anchors[:, 2:4] - assigned_anchors[:, 0:2])
236 |         
237 |         #------------------------------------------------#
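238 |         #   Work backwards to the prediction SSD should output:
239 |         #   first the center offsets, then the width and height.
240 |         #   The variances rescale the targets; the default is [0.1,0.1,0.2,0.2]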
241 |         #------------------------------------------------#
242 |         encoded_box[:, :2][assign_mask] = box_center - assigned_anchors_center
243 |         encoded_box[:, :2][assign_mask] /= assigned_anchors_wh
244 |         encoded_box[:, :2][assign_mask] /= np.array(variances)[:2]
245 | 
246 |         encoded_box[:, 2:4][assign_mask] = np.log(box_wh / assigned_anchors_wh)
247 |         encoded_box[:, 2:4][assign_mask] /= np.array(variances)[2:4]
248 |         return encoded_box.ravel()
249 | 
250 |     def assign_boxes(self, boxes):
251 |         #---------------------------------------------------#
252 |         #   assignment has three parts:
253 |         #   :4      regression targets the network should predict
254 |         #   4:-1    class of each prior box, background by default
255 |         #   -1      whether the prior box contains an object
256 |         #---------------------------------------------------#
257 |         assignment          = np.zeros((self.num_anchors, 4 + self.num_classes + 1))
258 |         assignment[:, 4]    = 1.0
259 |         if len(boxes) == 0:
260 |             return assignment
261 | 
262 |         # Encode every ground-truth box and compute its IoU with the prior boxes
263 |         encoded_boxes   = np.apply_along_axis(self.encode_box, 1, boxes[:, :4])
264 |         #---------------------------------------------------#
265 |         #   After the reshape, encoded_boxes has the shape:
266 |         #   [num_true_box, num_anchors, 4 + 1]
267 |         #   4 is the encoded box, 1 is the IoU
268 |         #---------------------------------------------------#
269 |         encoded_boxes   = encoded_boxes.reshape(-1, self.num_anchors, 5)
270 |         
271 |         #---------------------------------------------------#
272 |         #   [num_anchors] For each prior box, find the ground-truth box with the highest overlap
273 |         #---------------------------------------------------#
274 |         best_iou        = encoded_boxes[:, :, -1].max(axis=0)
275 |         best_iou_idx    = encoded_boxes[:, :, -1].argmax(axis=0)
276 |         best_iou_mask   = best_iou > 0
277 |         best_iou_idx    = best_iou_idx[best_iou_mask]
278 |         
279 |         #---------------------------------------------------#
280 |         #   Count how many prior boxes qualify
281 |         #---------------------------------------------------#
282 |         assign_num      = len(best_iou_idx)
283 | 
284 |         # Keep only the encoded boxes of the assigned prior boxes
285 |         encoded_boxes   = encoded_boxes[:, best_iou_mask, :]
286 |         #---------------------------------------------------#
287 |         #   Assign the encoded ground-truth boxes
288 |         #---------------------------------------------------#
289 |         assignment[:, :4][best_iou_mask] = encoded_boxes[best_iou_idx, np.arange(assign_num), :4]
290 |         #----------------------------------------------------------#
291 |         #   Index 4 is the background probability; set it to 0 because these prior boxes have a matching object
292 |         #----------------------------------------------------------#
293 |         assignment[:, 4][best_iou_mask]     = 0
294 |         assignment[:, 5:-1][best_iou_mask]  = boxes[best_iou_idx, 4:]
295 |         #----------------------------------------------------------#
296 |         #   -1 marks whether the prior box has a matching object
297 |         #----------------------------------------------------------#
298 |         assignment[:, -1][best_iou_mask]    = 1
299 |         # assign_boxes therefore yields the prediction the network should produce for this input image
300 |         return assignment
301 | 
302 | # Used as the collate_fn of the DataLoader
303 | def ssd_dataset_collate(batch):
304 |     images = []
305 |     bboxes = []
306 |     for img, box in batch:
307 |         images.append(img)
308 |         bboxes.append(box)
309 |     images = torch.from_numpy(np.array(images)).type(torch.FloatTensor)
310 |     bboxes = torch.from_numpy(np.array(bboxes)).type(torch.FloatTensor)
311 |     return images, bboxes
312 | 
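
The matching and encoding steps in `SSDDataset.iou` and `SSDDataset.encode_box` can be tried on a toy example. The sketch below is a standalone rewrite under stated assumptions: boxes are normalized `(x1, y1, x2, y2)` arrays, and the function names `iou_with_anchors` and `encode_against_anchor` are hypothetical, not part of the repository. It encodes against a single anchor rather than a mask of anchors.

```python
import numpy as np


def iou_with_anchors(anchors, box):
    # anchors: [N, 4], box: [4], both as normalized (x1, y1, x2, y2).
    inter_upleft = np.maximum(anchors[:, :2], box[:2])
    inter_botright = np.minimum(anchors[:, 2:4], box[2:4])
    inter_wh = np.maximum(inter_botright - inter_upleft, 0)
    inter = inter_wh[:, 0] * inter_wh[:, 1]
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_anchors = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    return inter / (area_box + area_anchors - inter)


def encode_against_anchor(box, anchor, variances=(0.1, 0.1, 0.2, 0.2)):
    # Center offsets are scaled by the anchor size, sizes by a log ratio;
    # both are then divided by the variances, as in encode_box.
    variances = np.asarray(variances, dtype=float)
    box_center = 0.5 * (box[:2] + box[2:4])
    box_wh = box[2:4] - box[:2]
    anchor_center = 0.5 * (anchor[:2] + anchor[2:4])
    anchor_wh = anchor[2:4] - anchor[:2]
    offsets = (box_center - anchor_center) / anchor_wh / variances[:2]
    scales = np.log(box_wh / anchor_wh) / variances[2:4]
    return np.concatenate([offsets, scales])


anchors = np.array([[0.0, 0.0, 0.5, 0.5],
                    [0.25, 0.25, 0.75, 0.75],
                    [0.5, 0.5, 1.0, 1.0]])
box = np.array([0.1, 0.1, 0.6, 0.6])

ious = iou_with_anchors(anchors, box)
best = int(ious.argmax())
encoded = encode_against_anchor(box, anchors[best])
print(ious, best, encoded)
```

In this toy case no IoU exceeds the default `overlap_threshold` of 0.5, so the fallback branch in `encode_box` would pick the anchor with the highest IoU, which is what `argmax` does here.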


--------------------------------------------------------------------------------
/utils/utils.py:
--------------------------------------------------------------------------------
  1 | import random
  2 | 
  3 | import numpy as np
  4 | import torch
  5 | from PIL import Image
  6 | 
  7 | 
  8 | #---------------------------------------------------------#
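  9 | #   Convert the image to RGB so that grayscale images do not raise errors during prediction.
 10 | #   The code only supports prediction on RGB images; every other image type is converted to RGB.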
 11 | #---------------------------------------------------------#
 12 | def cvtColor(image):
 13 |     if len(np.shape(image)) == 3 and np.shape(image)[2] == 3:
 14 |         return image 
 15 |     else:
 16 |         image = image.convert('RGB')
 17 |         return image 
 18 | 
 19 | #---------------------------------------------------#
 20 | #   Resize the input image
 21 | #---------------------------------------------------#
 22 | def resize_image(image, size, letterbox_image):
 23 |     iw, ih  = image.size
 24 |     w, h    = size
 25 |     if letterbox_image:
 26 |         scale   = min(w/iw, h/ih)
 27 |         nw      = int(iw*scale)
 28 |         nh      = int(ih*scale)
 29 | 
 30 |         image   = image.resize((nw,nh), Image.BICUBIC)
 31 |         new_image = Image.new('RGB', size, (128,128,128))
 32 |         new_image.paste(image, ((w-nw)//2, (h-nh)//2))
 33 |     else:
 34 |         new_image = image.resize((w, h), Image.BICUBIC)
 35 |     return new_image
 36 | 
 37 | #---------------------------------------------------#
 38 | #   Load the class names
 39 | #---------------------------------------------------#
 40 | def get_classes(classes_path):
 41 |     with open(classes_path, encoding='utf-8') as f:
 42 |         class_names = f.readlines()
 43 |     class_names = [c.strip() for c in class_names]
 44 |     return class_names, len(class_names)
 45 | 
 46 | #---------------------------------------------------#
 47 | #   Preprocess the input by subtracting the per-channel means
 48 | #---------------------------------------------------#
 49 | def preprocess_input(inputs):
 50 |     MEANS = (104, 117, 123)
 51 |     return inputs - MEANS
 52 | 
 53 | #---------------------------------------------------#
 54 | #   Get the current learning rate
 55 | #---------------------------------------------------#
 56 | def get_lr(optimizer):
 57 |     for param_group in optimizer.param_groups:
 58 |         return param_group['lr']
 59 | 
 60 | #---------------------------------------------------#
 61 | #   Set the random seeds
 62 | #---------------------------------------------------#
 63 | def seed_everything(seed=11):
 64 |     random.seed(seed)
 65 |     np.random.seed(seed)
 66 |     torch.manual_seed(seed)
 67 |     torch.cuda.manual_seed(seed)
 68 |     torch.cuda.manual_seed_all(seed)
 69 |     torch.backends.cudnn.deterministic = True
 70 |     torch.backends.cudnn.benchmark = False
 71 | 
 72 | #---------------------------------------------------#
 73 | #   Set the seed for the DataLoader workers
 74 | #---------------------------------------------------#
 75 | def worker_init_fn(worker_id, rank, seed):
 76 |     worker_seed = rank + seed
 77 |     random.seed(worker_seed)
 78 |     np.random.seed(worker_seed)
 79 |     torch.manual_seed(worker_seed)
 80 | 
 81 | def show_config(**kwargs):
 82 |     print('Configurations:')
 83 |     print('-' * 70)
 84 |     print('|%25s | %40s|' % ('keys', 'values'))
 85 |     print('-' * 70)
 86 |     for key, value in kwargs.items():
 87 |         print('|%25s | %40s|' % (str(key), str(value)))
 88 |     print('-' * 70)
 89 | 
 90 | def download_weights(backbone, model_dir="./model_data"):
 91 |     import os
 92 |     from torch.hub import load_state_dict_from_url
 93 |     
 94 |     download_urls = {
 95 |         'vgg'           : 'https://download.pytorch.org/models/vgg16-397923af.pth',
 96 |         'mobilenetv2'   : 'https://download.pytorch.org/models/mobilenet_v2-b0353104.pth',
 97 |         'resnet50'      : 'https://s3.amazonaws.com/pytorch/models/resnet50-19c8e357.pth'
 98 |     }
 99 |     url = download_urls[backbone]
100 |     
101 |     if not os.path.exists(model_dir):
102 |         os.makedirs(model_dir)
103 |     load_state_dict_from_url(url, model_dir)


--------------------------------------------------------------------------------
/utils/utils_bbox.py:
--------------------------------------------------------------------------------
  1 | import numpy as np
  2 | import torch
  3 | from torch import nn
  4 | from torchvision.ops import nms
  5 | 
  6 | 
  7 | class BBoxUtility(object):
  8 |     def __init__(self, num_classes):
  9 |         self.num_classes    = num_classes
 10 | 
 11 |     def ssd_correct_boxes(self, box_xy, box_wh, input_shape, image_shape, letterbox_image):
 12 |         #-----------------------------------------------------------------#
 13 |         #   Put the y axis first so the boxes can be multiplied directly by the image height and width
 14 |         #-----------------------------------------------------------------#
 15 |         box_yx = box_xy[..., ::-1]
 16 |         box_hw = box_wh[..., ::-1]
 17 |         input_shape = np.array(input_shape)
 18 |         image_shape = np.array(image_shape)
 19 | 
 20 |         if letterbox_image:
 21 |             #-----------------------------------------------------------------#
 22 |             #   offset is the shift of the valid image region relative to the top-left corner of the image
 23 |             #   new_shape is the height and width after scaling
 24 |             #-----------------------------------------------------------------#
 25 |             new_shape = np.round(image_shape * np.min(input_shape/image_shape))
 26 |             offset  = (input_shape - new_shape)/2./input_shape
 27 |             scale   = input_shape/new_shape
 28 | 
 29 |             box_yx  = (box_yx - offset) * scale
 30 |             box_hw *= scale
 31 | 
 32 |         box_mins    = box_yx - (box_hw / 2.)
 33 |         box_maxes   = box_yx + (box_hw / 2.)
 34 |         boxes  = np.concatenate([box_mins[..., 0:1], box_mins[..., 1:2], box_maxes[..., 0:1], box_maxes[..., 1:2]], axis=-1)
 35 |         boxes *= np.concatenate([image_shape, image_shape], axis=-1)
 36 |         return boxes
 37 | 
 38 |     def decode_boxes(self, mbox_loc, anchors, variances):
 39 |         # Width and height of the anchors
 40 |         anchor_width     = anchors[:, 2] - anchors[:, 0]
 41 |         anchor_height    = anchors[:, 3] - anchors[:, 1]
 42 |         # Center points of the anchors
 43 |         anchor_center_x  = 0.5 * (anchors[:, 2] + anchors[:, 0])
 44 |         anchor_center_y  = 0.5 * (anchors[:, 3] + anchors[:, 1])
 45 | 
 46 |         # x/y offset of the decoded box center from the anchor center
 47 |         decode_bbox_center_x = mbox_loc[:, 0] * anchor_width * variances[0]
 48 |         decode_bbox_center_x += anchor_center_x
 49 |         decode_bbox_center_y = mbox_loc[:, 1] * anchor_height * variances[0]
 50 |         decode_bbox_center_y += anchor_center_y
 51 |         
 52 |         # Width and height of the decoded box
 53 |         decode_bbox_width   = torch.exp(mbox_loc[:, 2] * variances[1])
 54 |         decode_bbox_width   *= anchor_width
 55 |         decode_bbox_height  = torch.exp(mbox_loc[:, 3] * variances[1])
 56 |         decode_bbox_height  *= anchor_height
 57 | 
 58 |         # Top-left and bottom-right corners of the decoded box
 59 |         decode_bbox_xmin = decode_bbox_center_x - 0.5 * decode_bbox_width
 60 |         decode_bbox_ymin = decode_bbox_center_y - 0.5 * decode_bbox_height
 61 |         decode_bbox_xmax = decode_bbox_center_x + 0.5 * decode_bbox_width
 62 |         decode_bbox_ymax = decode_bbox_center_y + 0.5 * decode_bbox_height
 63 | 
 64 |         # Stack the top-left and bottom-right corners
 65 |         decode_bbox = torch.cat((decode_bbox_xmin[:, None],
 66 |                                       decode_bbox_ymin[:, None],
 67 |                                       decode_bbox_xmax[:, None],
 68 |                                       decode_bbox_ymax[:, None]), dim=-1)
 69 |         # Clamp the coordinates to [0, 1]
 70 |         decode_bbox = torch.min(torch.max(decode_bbox, torch.zeros_like(decode_bbox)), torch.ones_like(decode_bbox))
 71 |         return decode_bbox
 72 | 
 73 |     def decode_box(self, predictions, anchors, image_shape, input_shape, letterbox_image, variances = [0.1, 0.2], nms_iou = 0.3, confidence = 0.5):
 74 |         #---------------------------------------------------#
 75 |         #   predictions[0] holds the box regression output (4 values per anchor)
 76 |         #---------------------------------------------------#
 77 |         mbox_loc        = predictions[0]
 78 |         #---------------------------------------------------#
 79 |         #   Class confidences
 80 |         #---------------------------------------------------#
 81 |         mbox_conf       = nn.Softmax(-1)(predictions[1])
 82 | 
 83 |         results = []
 84 |         #----------------------------------------------------------------------------------------------------------------#
 85 |         #   Process each image; predict.py feeds a single image, so for i in range(len(mbox_loc)) runs only once
 86 |         #----------------------------------------------------------------------------------------------------------------#
 87 |         for i in range(len(mbox_loc)):
 88 |             results.append([])
 89 |             #--------------------------------#
 90 |             #   Decode the anchors using the regression output
 91 |             #--------------------------------#
 92 |             decode_bbox = self.decode_boxes(mbox_loc[i], anchors, variances)
 93 | 
 94 |             for c in range(1, self.num_classes):
 95 |                 #--------------------------------#
 96 |                 #   Take the confidences of all boxes for this class
 97 |                 #   and check whether they exceed the threshold
 98 |                 #--------------------------------#
 99 |                 c_confs     = mbox_conf[i, :, c]
100 |                 c_confs_m   = c_confs > confidence
101 |                 if len(c_confs[c_confs_m]) > 0:
102 |                     #-----------------------------------------#
103 |                     #   Keep the boxes scoring above confidence
104 |                     #-----------------------------------------#
105 |                     boxes_to_process = decode_bbox[c_confs_m]
106 |                     confs_to_process = c_confs[c_confs_m]
107 | 
108 |                     keep = nms(
109 |                         boxes_to_process,
110 |                         confs_to_process,
111 |                         nms_iou
112 |                     )
113 |                     #-----------------------------------------#
114 |                     #   Keep the boxes that survive non-maximum suppression
115 |                     #-----------------------------------------#
116 |                     good_boxes  = boxes_to_process[keep]
117 |                     confs       = confs_to_process[keep][:, None]
118 |                     labels      = (c - 1) * torch.ones((len(keep), 1)).cuda() if confs.is_cuda else (c - 1) * torch.ones((len(keep), 1))
119 |                     #-----------------------------------------#
120 |                     #   Stack the box coordinates, label, and confidence.
121 |                     #-----------------------------------------#
122 |                     c_pred      = torch.cat((good_boxes, labels, confs), dim=1).cpu().numpy()
123 |                     # Append to results
124 |                     results[-1].extend(c_pred)
125 | 
126 |             if len(results[-1]) > 0:
127 |                 results[-1] = np.array(results[-1])
128 |                 box_xy, box_wh = (results[-1][:, 0:2] + results[-1][:, 2:4])/2, results[-1][:, 2:4] - results[-1][:, 0:2]
129 |                 results[-1][:, :4] = self.ssd_correct_boxes(box_xy, box_wh, input_shape, image_shape, letterbox_image)
130 | 
131 |         return results
132 | 


--------------------------------------------------------------------------------
/utils/utils_fit.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | 
  3 | import torch
  4 | from tqdm import tqdm
  5 | 
  6 | from utils.utils import get_lr
  7 | 
  8 | 
  9 | def fit_one_epoch(model_train, model, ssd_loss, loss_history, eval_callback, optimizer, epoch, epoch_step, epoch_step_val, gen, gen_val, Epoch, cuda, fp16, scaler, save_period, save_dir, local_rank=0):
 10 |     total_loss  = 0
 11 |     val_loss    = 0 
 12 | 
 13 |     if local_rank == 0:
 14 |         print('Start Train')
 15 |         pbar = tqdm(total=epoch_step,desc=f'Epoch {epoch + 1}/{Epoch}',postfix=dict,mininterval=0.3)
 16 |     model_train.train()
 17 |     for iteration, batch in enumerate(gen):
 18 |         if iteration >= epoch_step:
 19 |             break
 20 |         images, targets = batch[0], batch[1]
 21 |         with torch.no_grad():
 22 |             if cuda:
 23 |                 images  = images.cuda(local_rank)
 24 |                 targets = targets.cuda(local_rank)
 25 |         if not fp16:
 26 |             #----------------------#
 27 |             #   Forward pass
 28 |             #----------------------#
 29 |             out = model_train(images)
 30 |             #----------------------#
 31 |             #   Zero the gradients
 32 |             #----------------------#
 33 |             optimizer.zero_grad()
 34 |             #----------------------#
 35 |             #   Compute the loss
 36 |             #----------------------#
 37 |             loss = ssd_loss.forward(targets, out)
 38 |             #----------------------#
 39 |             #   Backward pass
 40 |             #----------------------#
 41 |             loss.backward()
 42 |             optimizer.step()
 43 |         else:
 44 |             from torch.cuda.amp import autocast
 45 |             with autocast():
 46 |                 #----------------------#
 47 |                 #   Forward pass
 48 |                 #----------------------#
 49 |                 out = model_train(images)
 50 |                 #----------------------#
 51 |                 #   Zero the gradients
 52 |                 #----------------------#
 53 |                 optimizer.zero_grad()
 54 |                 #----------------------#
 55 |                 #   Compute the loss
 56 |                 #----------------------#
 57 |                 loss = ssd_loss.forward(targets, out)
 58 | 
 59 |             #----------------------#
 60 |             #   Backward pass
 61 |             #----------------------#
 62 |             scaler.scale(loss).backward()
 63 |             scaler.step(optimizer)
 64 |             scaler.update()
 65 | 
 66 |         total_loss += loss.item()
 67 |         
 68 |         if local_rank == 0:
 69 |             pbar.set_postfix(**{'total_loss'    : total_loss / (iteration + 1), 
 70 |                                 'lr'            : get_lr(optimizer)})
 71 |             pbar.update(1)
 72 |                 
 73 |     if local_rank == 0:
 74 |         pbar.close()
 75 |         print('Finish Train')
 76 |         print('Start Validation')
 77 |         pbar = tqdm(total=epoch_step_val, desc=f'Epoch {epoch + 1}/{Epoch}',postfix=dict,mininterval=0.3)
 78 | 
 79 |     model_train.eval()
 80 |     for iteration, batch in enumerate(gen_val):
 81 |         if iteration >= epoch_step_val:
 82 |             break
 83 |         images, targets = batch[0], batch[1]
 84 |         with torch.no_grad():
 85 |             if cuda:
 86 |                 images  = images.cuda(local_rank)
 87 |                 targets = targets.cuda(local_rank)
 88 | 
 89 |             out     = model_train(images)
 90 |             optimizer.zero_grad()
 91 |             loss    = ssd_loss.forward(targets, out)
 92 |             val_loss += loss.item()
 93 | 
 94 |             if local_rank == 0:
 95 |                 pbar.set_postfix(**{'val_loss'      : val_loss / (iteration + 1), 
 96 |                                     'lr'            : get_lr(optimizer)})
 97 |                 pbar.update(1)
 98 | 
 99 |     if local_rank == 0:
100 |         pbar.close()
101 |         print('Finish Validation')
102 |         loss_history.append_loss(epoch + 1, total_loss / epoch_step, val_loss / epoch_step_val)
103 |         eval_callback.on_epoch_end(epoch + 1, model_train)
104 |         print('Epoch:'+ str(epoch+1) + '/' + str(Epoch))
105 |         print('Total Loss: %.3f || Val Loss: %.3f ' % (total_loss / epoch_step, val_loss / epoch_step_val))
106 |         
107 |         #-----------------------------------------------#
108 |         #   Save the weights
109 |         #-----------------------------------------------#
110 |         if (epoch + 1) % save_period == 0 or epoch + 1 == Epoch:
111 |             torch.save(model.state_dict(), os.path.join(save_dir, "ep%03d-loss%.3f-val_loss%.3f.pth" % (epoch + 1, total_loss / epoch_step, val_loss / epoch_step_val)))
112 | 
113 |         if len(loss_history.val_loss) <= 1 or (val_loss / epoch_step_val) <= min(loss_history.val_loss):
114 |             print('Save best model to best_epoch_weights.pth')
115 |             torch.save(model.state_dict(), os.path.join(save_dir, "best_epoch_weights.pth"))
116 |             
117 |         torch.save(model.state_dict(), os.path.join(save_dir, "last_epoch_weights.pth"))


--------------------------------------------------------------------------------
/voc_annotation.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | import random
  3 | import xml.etree.ElementTree as ET
  4 | 
  5 | import numpy as np
  6 | 
  7 | from utils.utils import get_classes
  8 | 
  9 | #--------------------------------------------------------------------------------------------------------------------------------#
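 10 | #   annotation_mode selects what this script computes when run
 11 | #   annotation_mode = 0: the full label-processing pipeline, i.e. the txt files under VOCdevkit/VOC2007/ImageSets plus 2007_train.txt and 2007_val.txt used for training
 12 | #   annotation_mode = 1: only the txt files under VOCdevkit/VOC2007/ImageSets
 13 | #   annotation_mode = 2: only 2007_train.txt and 2007_val.txt used for training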
 14 | #--------------------------------------------------------------------------------------------------------------------------------#
 15 | annotation_mode     = 0
 16 | #-------------------------------------------------------------------#
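 17 | #   Must be modified; provides the object classes used to generate 2007_train.txt and 2007_val.txt
 18 | #   It should match the classes_path used for training and prediction
 19 | #   If the generated 2007_train.txt contains no object information,
 20 | #   the classes are not set correctly
 21 | #   Only used when annotation_mode is 0 or 2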
 22 | #-------------------------------------------------------------------#
 23 | classes_path        = 'model_data/voc_classes.txt'
 24 | #--------------------------------------------------------------------------------------------------------------------------------#
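 25 | #   trainval_percent sets the ratio of (training set + validation set) to test set; by default (training + validation):test = 9:1
 26 | #   train_percent sets the ratio of training set to validation set within (training + validation); by default training:validation = 9:1
 27 | #   Only used when annotation_mode is 0 or 1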
 28 | #--------------------------------------------------------------------------------------------------------------------------------#
 29 | trainval_percent    = 0.9
 30 | train_percent       = 0.9
 31 | #-------------------------------------------------------#
 32 | #   Path to the folder containing the VOC dataset
 33 | #   Defaults to the VOC dataset in the repository root
 34 | #-------------------------------------------------------#
 35 | VOCdevkit_path  = 'VOCdevkit'
 36 | 
 37 | VOCdevkit_sets  = [('2007', 'train'), ('2007', 'val')]
 38 | classes, _      = get_classes(classes_path)
 39 | 
 40 | #-------------------------------------------------------#
 41 | #   Count the number of objects
 42 | #-------------------------------------------------------#
 43 | photo_nums  = np.zeros(len(VOCdevkit_sets))
 44 | nums        = np.zeros(len(classes))
 45 | def convert_annotation(year, image_id, list_file):
 46 |     in_file = open(os.path.join(VOCdevkit_path, 'VOC%s/Annotations/%s.xml'%(year, image_id)), encoding='utf-8')
 47 |     tree=ET.parse(in_file)
 48 |     root = tree.getroot()
 49 | 
 50 |     for obj in root.iter('object'):
 51 |         difficult = 0 
 52 |         if obj.find('difficult') is not None:
 53 |             difficult = obj.find('difficult').text
 54 |         cls = obj.find('name').text
 55 |         if cls not in classes or int(difficult)==1:
 56 |             continue
 57 |         cls_id = classes.index(cls)
 58 |         xmlbox = obj.find('bndbox')
 59 |         b = (int(float(xmlbox.find('xmin').text)), int(float(xmlbox.find('ymin').text)), int(float(xmlbox.find('xmax').text)), int(float(xmlbox.find('ymax').text)))
 60 |         list_file.write(" " + ",".join([str(a) for a in b]) + ',' + str(cls_id))
 61 |         
 62 |         nums[classes.index(cls)] = nums[classes.index(cls)] + 1
 63 |         
 64 | if __name__ == "__main__":
 65 |     random.seed(0)
 66 |     if " " in os.path.abspath(VOCdevkit_path):
 67 |         raise ValueError("The dataset folder path and the image names must not contain spaces, otherwise model training will be affected. Please fix this.")
 68 | 
 69 |     if annotation_mode == 0 or annotation_mode == 1:
 70 |         print("Generate txt in ImageSets.")
 71 |         xmlfilepath     = os.path.join(VOCdevkit_path, 'VOC2007/Annotations')
 72 |         saveBasePath    = os.path.join(VOCdevkit_path, 'VOC2007/ImageSets/Main')
 73 |         temp_xml        = os.listdir(xmlfilepath)
 74 |         total_xml       = []
 75 |         for xml in temp_xml:
 76 |             if xml.endswith(".xml"):
 77 |                 total_xml.append(xml)
 78 | 
 79 |         num     = len(total_xml)  
 80 |         list    = range(num)  
 81 |         tv      = int(num*trainval_percent)  
 82 |         tr      = int(tv*train_percent)  
 83 |         trainval= random.sample(list,tv)  
 84 |         train   = random.sample(trainval,tr)  
 85 |         
 86 |         print("train and val size",tv)
 87 |         print("train size",tr)
 88 |         ftrainval   = open(os.path.join(saveBasePath,'trainval.txt'), 'w')  
 89 |         ftest       = open(os.path.join(saveBasePath,'test.txt'), 'w')  
 90 |         ftrain      = open(os.path.join(saveBasePath,'train.txt'), 'w')  
 91 |         fval        = open(os.path.join(saveBasePath,'val.txt'), 'w')  
 92 |         
 93 |         for i in list:  
 94 |             name=total_xml[i][:-4]+'\n'  
 95 |             if i in trainval:  
 96 |                 ftrainval.write(name)  
 97 |                 if i in train:  
 98 |                     ftrain.write(name)  
 99 |                 else:  
100 |                     fval.write(name)  
101 |             else:  
102 |                 ftest.write(name)  
103 |         
104 |         ftrainval.close()  
105 |         ftrain.close()  
106 |         fval.close()  
107 |         ftest.close()
108 |         print("Generate txt in ImageSets done.")
109 | 
110 |     if annotation_mode == 0 or annotation_mode == 2:
111 |         print("Generate 2007_train.txt and 2007_val.txt for train.")
112 |         type_index = 0
113 |         for year, image_set in VOCdevkit_sets:
114 |             image_ids = open(os.path.join(VOCdevkit_path, 'VOC%s/ImageSets/Main/%s.txt'%(year, image_set)), encoding='utf-8').read().strip().split()
115 |             list_file = open('%s_%s.txt'%(year, image_set), 'w', encoding='utf-8')
116 |             for image_id in image_ids:
117 |                 list_file.write('%s/VOC%s/JPEGImages/%s.jpg'%(os.path.abspath(VOCdevkit_path), year, image_id))
118 | 
119 |                 convert_annotation(year, image_id, list_file)
120 |                 list_file.write('\n')
121 |             photo_nums[type_index] = len(image_ids)
122 |             type_index += 1
123 |             list_file.close()
124 |         print("Generate 2007_train.txt and 2007_val.txt for train done.")
125 |         
126 |         def printTable(List1, List2):
127 |             for i in range(len(List1[0])):
128 |                 print("|", end=' ')
129 |                 for j in range(len(List1)):
130 |                     print(List1[j][i].rjust(int(List2[j])), end=' ')
131 |                     print("|", end=' ')
132 |                 print()
133 | 
134 |         str_nums = [str(int(x)) for x in nums]
135 |         tableData = [
136 |             classes, str_nums
137 |         ]
138 |         colWidths = [0]*len(tableData)
139 |         len1 = 0
140 |         for i in range(len(tableData)):
141 |             for j in range(len(tableData[i])):
142 |                 if len(tableData[i][j]) > colWidths[i]:
143 |                     colWidths[i] = len(tableData[i][j])
144 |         printTable(tableData, colWidths)
145 | 
146 |         if photo_nums[0] <= 500:
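147 |             print("The training set has 500 images or fewer, which is a small dataset. Set a larger number of training epochs (Epoch) to get enough gradient descent steps (Step).")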
148 | 
149 |         if np.sum(nums) == 0:
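150 |             print("No objects were found in the dataset. Update classes_path to match your own dataset and make sure the label names are correct, otherwise training will have no effect!")
151 |             print("No objects were found in the dataset. Update classes_path to match your own dataset and make sure the label names are correct, otherwise training will have no effect!")
152 |             print("No objects were found in the dataset. Update classes_path to match your own dataset and make sure the label names are correct, otherwise training will have no effect!")
153 |             print("(Important things are worth saying three times.)")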
154 | 


--------------------------------------------------------------------------------
/常见问题汇总.md:
--------------------------------------------------------------------------------
  1 | The blog post that collects these questions is at [https://blog.csdn.net/weixin_44791964/article/details/107517428](https://blog.csdn.net/weixin_44791964/article/details/107517428).
  2 | 
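  3 | # Frequently Asked Questions
  4 | ## 1. Download issues
  5 | ### a. Code download
  6 | **Q: Can you send me a copy of the code? Where can I download it? 
  7 | A: The GitHub address is in the video description. Copy it and you can download the code from there.**
  8 | 
  9 | **Q: Why does the code I downloaded report a corrupted archive?
 10 | A: Download it again from GitHub.**
 11 | 
 12 | **Q: Why is the code I downloaded different from the code in your videos and blog posts?
 13 | A: I update the code frequently; the actual code in the repository is authoritative.**
 14 | 
 15 | ### b. Weights download
 16 | **Q: Why is there no .pth or .h5 file under model_data in the code I downloaded? 
 17 | A: I usually upload the weights to GitHub and Baidu Netdisk; the links are in the GitHub README.**
 18 | 
 19 | ### c. Dataset download
 20 | **Q: Where can I download the XXXX dataset?
 21 | A: I usually put dataset download links in the README, and nearly all of them are there. If one is missing, please let me know by opening a GitHub issue.**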
 22 | 
 23 | ## 2. Environment setup issues
 24 | ### a. Environment setup for 20-series and older GPUs
 25 | **The pytorch code targets pytorch 1.2; see the blog post** [https://blog.csdn.net/weixin_44791964/article/details/106037141](https://blog.csdn.net/weixin_44791964/article/details/106037141).
 26 | 
 27 | **The keras code targets tensorflow 1.13.2 with keras 2.1.5; see the blog post** [https://blog.csdn.net/weixin_44791964/article/details/104702142](https://blog.csdn.net/weixin_44791964/article/details/104702142).
 28 | 
 29 | **The tf2 code targets tensorflow 2.2.0 and does not need a separate keras install; see the blog post** [https://blog.csdn.net/weixin_44791964/article/details/109161493](https://blog.csdn.net/weixin_44791964/article/details/109161493).
 30 | 
 31 | **Q: Will your code work with version X of tensorflow or pytorch?
 32 | A: It is best to follow my recommended configuration; setup tutorials are available. I have not tested other versions. Problems may come up, but they are usually minor and need only small code changes.**
 33 | 
 34 | ### b. Environment setup for 30-series GPUs
 35 | Because of framework updates, 30-series GPUs cannot use the environment setup tutorials above.
 36 | The 30-series configurations I have tested so far are:
 37 | **The pytorch code targets pytorch 1.7.0 with cuda 11.0 and cudnn 8.0.5; see the blog post** [https://blog.csdn.net/weixin_44791964/article/details/120668551](https://blog.csdn.net/weixin_44791964/article/details/120668551).
 38 | 
 39 | **The keras code cannot be configured with cuda 11 on Windows 10. On Ubuntu, search online for instructions; use tensorflow 1.15.4 with keras 2.1.5 or 2.3.1 (a few function interfaces differ, so the code may need small adjustments).**
 40 | 
 41 | **The tf2 code targets tensorflow 2.4.0 with cuda 11.0 and cudnn 8.0.5; see the blog post** [https://blog.csdn.net/weixin_44791964/article/details/120657664](https://blog.csdn.net/weixin_44791964/article/details/120657664).
 42 | 
 43 | ### c. CPU environment setup
 44 | **The pytorch code targets pytorch-cpu 1.2; see the blog post** [https://blog.csdn.net/weixin_44791964/article/details/120655098](https://blog.csdn.net/weixin_44791964/article/details/120655098).
 45 | 
 46 | **The keras code targets tensorflow-cpu 1.13.2 with keras 2.1.5; see the blog post** [https://blog.csdn.net/weixin_44791964/article/details/120653717](https://blog.csdn.net/weixin_44791964/article/details/120653717).
 47 | 
 48 | **The tf2 code targets tensorflow-cpu 2.2.0 and does not need a separate keras install; see the blog post** [https://blog.csdn.net/weixin_44791964/article/details/120656291](https://blog.csdn.net/weixin_44791964/article/details/120656291).
 49 | 
 50 | 
 51 | ### d. GPU utilization and environment usage issues
 52 | **Q: Why is the GPU not used for training even though I installed tensorflow-gpu?
 53 | A: Confirm that tensorflow-gpu is installed correctly (check the tensorflow version with pip list), then look at Task Manager or use the nvidia command to see whether the GPU is being used for training. In Task Manager, look at GPU memory usage.**
 54 | 
 55 | **Q: I don't seem to be training on the GPU. How can I check whether the GPU is being used?
 56 | A: The usual way is NVIDIA's command-line tool. On Windows, open cmd and run nvidia-smi to view GPU utilization.**
 57 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/f88ef794c9a341918f000eb2b1c67af6.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAQnViYmxpaWlpbmc=,size_20,color_FFFFFF,t_70,g_se,x_16)
 58 | **If you must use Task Manager, check under the Performance tab whether GPU memory is in use, or look at the Cuda graph rather than the Copy graph.**
 59 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/20201013234241524.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70#pic_center)
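The nvidia-smi check described above can also be scripted. The following is a minimal sketch, not part of this repository: `gpu_memory_report` is a hypothetical helper that shells out to `nvidia-smi` and returns `None` when the tool is missing or fails, so it is safe to run on a CPU-only machine.

```python
import shutil
import subprocess


def gpu_memory_report():
    """Return one line per GPU as reported by nvidia-smi, or None if unavailable.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No NVIDIA driver tools on PATH, so there is nothing to query.
        return None
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name,memory.used,memory.total",
             "--format=csv,noheader"],
            capture_output=True, text=True, timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    if result.returncode != 0:
        return None
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


report = gpu_memory_report()
if report is None:
    print("nvidia-smi not found or failed; no GPU information available")
else:
    for line in report:
        print(line)
```

If the used-memory figure stays near zero while train.py is running, training is most likely happening on the CPU.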
 60 | 
 61 | ### e. DLL load failed: 找不到指定的模块 (the specified module could not be found)
 62 | **Q: The following error appears**
 63 | ```python
 64 | Traceback (most recent call last):
 65 |   File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in <module>
 66 |  from tensorflow.python.pywrap_tensorflow_internal import *
 67 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 28, in <module>
 68 | pywrap_tensorflow_internal = swig_import_helper()
 69 |   File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper
 70 |     _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
 71 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\imp.py", line 243, in load_modulereturn load_dynamic(name, filename, file)
 72 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\imp.py", line 343, in load_dynamic
 73 |     return _load(spec)
 74 | ImportError: DLL load failed: 找不到指定的模块。
 75 | ```
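 76 | **A: If you have not rebooted yet, reboot first. Otherwise reinstall following the steps. If that still does not help, send me a private message with your GPU, CUDA, CUDNN, TF, and PYTORCH versions.**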
 77 | 
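 78 | ### f. no module errors (no module name utils.utils, no module named 'matplotlib')
 79 | **Q: Why do I get no module name utils.utils (or no module name nets.yolo, no module name nets.ssd, and similar errors)?
 80 | A: utils does not need to be installed with pip; it sits in the root of the repository I uploaded. The error means the script is being run from the wrong root directory. Look up the concepts of relative path and root directory and it will become clear.**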
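One way to make the root directory explicit is to put the repository root on `sys.path` before importing. The sketch below assumes you know where the repository was cloned; `add_repo_root` is a hypothetical helper, not something this repository provides.

```python
import os
import sys


def add_repo_root(repo_root):
    """Put repo_root at the front of sys.path so `utils` and `nets` can be imported.

    Hypothetical helper for illustration; pass the folder that contains
    train.py, utils/ and nets/.
    """
    root = os.path.abspath(repo_root)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root
```

For example, calling `add_repo_root("path/to/ssd-pytorch")` (a placeholder path) before `from utils.utils import get_classes` has the same effect as launching Python from the repository root.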
 81 | 
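 82 | **Q: Why do I get no module name matplotlib (or no module name PIL, no module name cv2, and so on)?
 83 | A: The library is not installed. Open a command line and install it, for example: pip install matplotlib**
 84 | 
 85 | **Q: I already installed opencv (pillow, matplotlib, etc.) with pip, so why do I still get no module name cv2?
 86 | A: It was installed without the environment activated. Activate the corresponding conda environment before installing.**
 87 | 
 88 | **Q: Why do I get No module named 'torch'?
 89 | A: There are usually two cases: pytorch is genuinely not installed, or it was installed into a different environment and the currently active environment is not the one you installed it into.**
 90 | 
 91 | **Q: Why do I get No module named 'tensorflow'?
 92 | A: Same as above.**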
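To tell "not installed" apart from "installed into another environment", it helps to print which interpreter is running and which modules it can see. This is a small sketch using only the standard library; `report_modules` is a hypothetical helper name.

```python
import importlib.util
import sys


def report_modules(names):
    """Map each top-level module name to whether the active interpreter can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# sys.executable shows which environment's Python is actually running.
print("interpreter:", sys.executable)
for name, found in report_modules(["torch", "cv2", "PIL", "matplotlib"]).items():
    print(f"{name:12s} {'found' if found else 'MISSING'}")
```

If the interpreter path does not point into the conda environment you installed into, activate that environment and run the check again.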
 93 | 
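 94 | ### g. cuda installation failures
 95 | Visual Studio generally needs to be installed before cuda; the 2017 version is sufficient.
 96 | 
 97 | ### h. Ubuntu
 98 | **All of the code works on Ubuntu; I have tested both systems.**
 99 | 
100 | ### i. Errors reported in VSCODE
101 | **Q: Why does VSCODE report a long list of errors?
102 | A: It does for me too, but they have no effect; it is a VSCODE issue. If you do not want to see them, install Pycharm.
103 | It is best to set Python:Language Server to Pylance in the settings.**
104 | 
105 | ### j. Training and predicting on the CPU
106 | **For the keras and tf2 code, installing the cpu version of tensorflow is enough to train and predict on the CPU.**
107 | 
108 | **For the pytorch code, change cuda=True to cuda=False to train and predict on the CPU.**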
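Instead of editing the flag by hand, the value can be derived at run time. The sketch below is an assumption-laden illustration: `resolve_cuda_flag` is a hypothetical helper, and only `torch.cuda.is_available()` is a real PyTorch call. It falls back to `False` when PyTorch is not installed or reports no usable GPU.

```python
import importlib.util


def resolve_cuda_flag(requested):
    """Return True only if CUDA was requested and PyTorch reports a usable GPU.

    Hypothetical helper for illustration; it is not part of this repository.
    """
    if not requested:
        return False
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment.
        return False
    import torch
    return torch.cuda.is_available()


# The result could be assigned to the cuda variable the scripts already use.
cuda = resolve_cuda_flag(True)
print("cuda =", cuda)
```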
109 | 
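110 | ### k. tqdm has no attribute 'pos'
111 | **Q: Running the code reports 'tqdm' object has no attribute 'pos'.
112 | A: Reinstall tqdm with a different version.**
113 | 
114 | ### l. decode("utf-8") errors
115 | **Because of an h5py update, installation automatically pulls in h5py 3.0.0 or later, which causes the decode("utf-8") error!
116 | After installing tensorflow, be sure to install h5py 2.10.0 with the following command!**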
117 | ```
118 | pip install h5py==2.10.0
119 | ```
120 | 
121 | ### m. TypeError: __array__() takes 1 positional argument but 2 were given
122 | This can be fixed by changing the pillow version.
123 | ```
124 | pip install pillow==8.2.0
125 | ```
126 | ### n. How to check the current cuda and cudnn versions
127 | **To check the cuda version on Windows:
128 | 1. Open a cmd window.
129 | 2. Run nvcc -V.
130 | 3. In the line "Cuda compilation tools, release XXXXXXXX", XXXXXXXX is the cuda version.**
131 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/0389ea35107a408a80ab5cb6590d5a74.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAQnViYmxpaWlpbmc=,size_20,color_FFFFFF,t_70,g_se,x_16)
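132 | To check the cudnn version on Windows:
133 | 1. Go to the cuda installation directory and open the include folder.
134 | 2. Find the cudnn.h file.
135 | 3. Right-click it, open it as text, and scroll down to the #define lines to read the cudnn version.
136 | ```c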
137 | #define CUDNN_MAJOR 7
138 | #define CUDNN_MINOR 4
139 | #define CUDNN_PATCHLEVEL 1
140 | ```
141 | This means cudnn is 7.4.1.
142 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/7a86b68b17c84feaa6fa95780d4ae4b4.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAQnViYmxpaWlpbmc=,size_20,color_FFFFFF,t_70,g_se,x_16)
143 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/81bb7c3e13cc492292530e4b69df86a9.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAQnViYmxpaWlpbmc=,size_20,color_FFFFFF,t_70,g_se,x_16)
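Reading the three `#define` lines can be automated. The following sketch parses header text with a regular expression; `parse_cudnn_version` is a hypothetical helper, and the sample text is the one shown above. Note that on cuDNN 8 and later the defines live in `cudnn_version.h` rather than `cudnn.h`, so the file to read depends on the installed version.

```python
import re


def parse_cudnn_version(header_text):
    """Extract 'MAJOR.MINOR.PATCHLEVEL' from cudnn header text, or None if absent."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"^\s*#define\s+" + key + r"\s+(\d+)",
                          header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


sample = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 4
#define CUDNN_PATCHLEVEL 1
"""
print(parse_cudnn_version(sample))  # 7.4.1
```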
144 | 
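145 | ### o. Why does it still not work after I set up the environment your way?
146 | **Q: UP, why does it still not work after I followed your environment setup?
147 | A: Send me your GPU, CUDA, cuDNN, TF, and PyTorch versions in a Bilibili private message.**
148 | 
149 | ### p. Other issues
150 | **Q: Why do I get TypeError: cat() got an unexpected keyword argument 'axis', Traceback (most recent call last), or AttributeError: 'Tensor' object has no attribute 'bool'?
151 | A: These are version problems; use torch 1.2 or later.**
152 | 
153 | **Many other odd problems also come down to versions. I recommend installing Keras and TensorFlow by following my video tutorial. If you installed TensorFlow 2, for example, there is no need to ask me why Keras-yolo and the like will not run; that combination simply cannot work.**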
154 | 
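155 | ## 3. Object detection FAQ (also applies to the face detection and classification repositories)
156 | ### a. Shape mismatch
157 | #### 1) Shape mismatch during training
158 | **Q: UP, why does train.py report a shape mismatch?
159 | A: In the Keras environment, the number of classes you are training differs from the original, so the network structure changes and a few shapes at the very end of the network do not match.**
160 | 
161 | #### 2) Shape mismatch during prediction
162 | **Q: Why does predict.py report a shape mismatch?**
163 | ##### i. copying a param with shape torch.Size([75, 704, 1, 1]) from checkpoint
164 | In PyTorch it looks like this:
165 | ![Screenshot](https://img-blog.csdnimg.cn/20200722171631901.png)
166 | ##### ii. Shapes are [1,1,1024,75] and [255,1024,1,1]. for 'Assign_360' (op: 'Assign') with input shapes: [1,1,1024,75], [255,1024,1,1].
167 | In Keras it looks like this:
168 | ![Screenshot](https://img-blog.csdnimg.cn/20200722171523380.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70)
169 | **A: There are three main causes:
170 | 1. classes_path was not updated before training started.
171 | 2. model_path was not changed to the weights produced by training.
172 | 3. classes_path was not changed to the one used for training.
173 | Check carefully! Make sure the model_path and classes_path you use correspond to each other, and also check the num_classes or classes_path used during training!**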
174 | 
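175 | ### b. Out of GPU memory (OOM, RuntimeError: CUDA out of memory)
176 | **Q: Why does the console scroll by extremely fast when I run train.py and then report OOM?
177 | A: This happens in Keras and means GPU memory is exhausted. Reduce batch_size. SSD has the smallest GPU memory footprint, so SSD is recommended:
178 | 2 GB of GPU memory: SSD, YOLOV4-TINY
179 | 4 GB of GPU memory: YOLOV3
180 | 6 GB of GPU memory: YOLOV4, Retinanet, M2det, Efficientdet, Faster RCNN, etc.
181 | 8 GB or more: take your pick.**
182 | **Note that because of BatchNorm2d, batch_size cannot be 1; it must be at least 2.**
183 | 
184 | **Q: Why do I get RuntimeError: CUDA out of memory. Tried to allocate 52.00 MiB (GPU 0; 15.90 GiB total capacity; 14.85 GiB already allocated; 51.88 MiB free; 15.07 GiB reserved in total by PyTorch)?
185 | A: This happens in PyTorch and means GPU memory is exhausted; same as above.**
186 | 
187 | **Q: Why do I run out of GPU memory when the GPU memory is not even being used?
188 | A: The allocation already failed, so the model never started training, and naturally nothing shows as in use.**
189 | ### c. Why train with a frozen stage and an unfrozen stage? Can I skip it?
190 | **Q: Why freeze and then unfreeze during training?
191 | A: You can skip it. It mainly exists so that people with limited hardware can still train. If your machine is far too weak, set Freeze_Epoch and UnFreeze_Epoch to the same value to run only the frozen stage.**
192 | 
193 | **This is also the idea behind transfer learning: the features extracted by the backbone are generic, so training with it frozen speeds up training and keeps the weights from being destroyed.**
194 | In the frozen stage the backbone is frozen and the feature extraction network does not change. GPU memory use is lower and the network is only fine-tuned.
195 | In the unfrozen stage the backbone is no longer frozen and the feature extraction network changes. GPU memory use is higher and all parameters of the network are updated.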
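
To see why equal values of Freeze_Epoch and UnFreeze_Epoch leave only the frozen stage, here is a minimal sketch. It assumes epochs in [0, Freeze_Epoch) run frozen and epochs in [Freeze_Epoch, UnFreeze_Epoch) run unfrozen; `training_phase` is a hypothetical helper for illustration, not a function from train.py.

```python
def training_phase(epoch, freeze_epoch, unfreeze_epoch):
    """Return the stage a 0-based epoch falls into under the assumed schedule."""
    if epoch < freeze_epoch:
        return "frozen"      # backbone fixed, lower GPU memory use
    if epoch < unfreeze_epoch:
        return "unfrozen"    # every parameter is updated
    return "done"


# Compare a two-stage schedule with one where both values are equal.
for freeze_epoch, unfreeze_epoch in [(50, 100), (50, 50)]:
    phases = [training_phase(e, freeze_epoch, unfreeze_epoch) for e in range(unfreeze_epoch)]
    print(freeze_epoch, unfreeze_epoch,
          "frozen epochs:", phases.count("frozen"),
          "unfrozen epochs:", phases.count("unfrozen"))
```

With (50, 50) the unfrozen count is zero, which matches the advice above for machines that cannot afford the unfrozen stage.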
196 | 
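197 | ### d. My loss is huge; is something wrong? (My loss is tiny; is something wrong?)
198 | **Q: Why is my network not converging? The loss is XXXX.
199 | A: Loss values differ between networks. Loss is only a reference for checking whether the network is converging, not a measure of how good the network is. My YOLO code does not normalize the loss, so the values look high. The absolute value does not matter; what matters is whether it keeps decreasing and whether predictions work.**
200 | 
201 | ### e. Why does my trained model produce no predictions?
202 | **Q: Why are my training results poor? Prediction shows no boxes (or inaccurate boxes).
203 | A:**
204 | Things to check:
205 | 1. Target information: check whether 2007_train.txt contains target information; if not, modify voc_annotation.py.
206 | 2. Dataset: with fewer than 500 images, consider adding data; also test different models to confirm the dataset itself is sound.
207 | 3. Unfrozen training: if your data distribution differs a lot from ordinary scenes, continue with unfrozen training to adjust the backbone and strengthen feature extraction.
208 | 4. Network: SSD, for example, is not well suited to small objects because its prior boxes are fixed.
209 | 5. Training length: some people train for only a few epochs and report no results; train to completion with the default parameters.
210 | 6. Confirm that you followed every step, for example whether classes in voc_annotation.py was modified.
211 | 7. Loss values differ between networks. Loss is only a reference for checking whether the network is converging, not a measure of how good the network is; the absolute value does not matter, convergence does.
212 | 8. Backbone changes: if you modified the backbone and have no pretrained weights for it, the network converges poorly and results are naturally bad.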
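
Check 1 can be automated with a short script. This is a sketch under an assumption: each line of 2007_train.txt is taken to be an image path followed by zero or more space-separated `xmin,ymin,xmax,ymax,class_id` boxes. Verify that against your own file first; `lines_without_boxes` is a hypothetical helper.

```python
import os
import tempfile


def lines_without_boxes(annotation_path):
    """Return (line_number, image_path) for every line that lists no boxes."""
    empty = []
    with open(annotation_path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            if len(fields) == 1:
                # Only the image path is present: no target information.
                empty.append((number, fields[0]))
    return empty


# Demo on a throwaway file; point the function at your real 2007_train.txt instead.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "2007_train.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("img/a.jpg 10,20,110,220,0 30,40,90,160,3\n")
        f.write("img/b.jpg\n")
    missing = lines_without_boxes(path)

print(missing)  # [(2, 'img/b.jpg')]
```

If most lines come back as missing, the class list used by voc_annotation.py probably does not match the names in your XML files.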
213 | 
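214 | ### f. Why is my computed mAP 0?
215 | **Q: Why are my training results poor? There is no mAP.
216 | A:**
217 | First try predicting with predict.py. If predictions look right, classes_path in get_map.py is probably set incorrectly. If there are no predictions, the fix is the same as in question e; check the following:
218 | 1. Target information: check whether 2007_train.txt contains target information; if not, modify voc_annotation.py.
219 | 2. Dataset: with fewer than 500 images, consider adding data; also test different models to confirm the dataset itself is sound.
220 | 3. Unfrozen training: if your data distribution differs a lot from ordinary scenes, continue with unfrozen training to adjust the backbone and strengthen feature extraction.
221 | 4. Network: SSD, for example, is not well suited to small objects because its prior boxes are fixed.
222 | 5. Training length: some people train for only a few epochs and report no results; train to completion with the default parameters.
223 | 6. Confirm that you followed every step, for example whether classes in voc_annotation.py was modified.
224 | 7. Loss values differ between networks. Loss is only a reference for checking whether the network is converging, not a measure of how good the network is; the absolute value does not matter, convergence does.
225 | 8. Backbone changes: if you modified the backbone and have no pretrained weights for it, the network converges poorly and results are naturally bad.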
226 | 
227 | ### g. gbk encoding error ('gbk' codec can't decode byte)
228 | **Q: Why am I getting a gbk encoding error like this:**
229 | ```python
230 | UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 446: illegal multibyte sequence
231 | ```
232 | **A: Do not use Chinese in labels or paths. If you must, handle the encoding explicitly and open files with encoding set to utf-8.**
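
A minimal sketch of the fix: pass `encoding="utf-8"` to `open` explicitly so Python does not fall back to the platform default (gbk on a Chinese-locale Windows machine). The file name and class names here are made up for the demo.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "classes.txt")

    # Write a class list containing Chinese names, explicitly as UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        f.write("猫\n狗\n")

    # Read it back with the same explicit encoding instead of the platform default.
    with open(path, encoding="utf-8") as f:
        class_names = [line.strip() for line in f if line.strip()]

print(class_names)
```

The same argument applies wherever the code opens a text file, such as a class list or an annotation txt.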
233 | 
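234 | ### h. My images are xxx*xxx; can I use them?
235 | **Q: My images have a resolution of xxx*xxx; can I use them?**
236 | **A: Yes. The code resizes and augments the data automatically.**
237 | 
238 | ### i. I want data augmentation! How do I do it?
239 | **Q: I want to apply data augmentation! How?**
240 | **A: No extra work is needed. The code resizes and augments the data automatically.**
241 | 
242 | ### j. Multi-GPU training
243 | **Q: How do I train on multiple GPUs?
244 | A: Most of the PyTorch code can use GPU training directly. For Keras, search online; it is not hard to implement. I do not have a multi-GPU machine to test it properly, so you will need to work it out yourselves.**
245 | 
246 | ### k. Can I train on grayscale images?
247 | **Q: Can I train on (or predict) grayscale images?
248 | A: Most of my repositories convert grayscale images to RGB for training and prediction. If the code cannot train on or predict grayscale images, try converting the result of Image.open to RGB inside get_random_data, and do the same at prediction time. (For reference only.)**
249 | 
250 | ### l. Resuming training from a checkpoint
251 | **Q: I have already trained for several epochs. Can I continue from there?
252 | A: Yes. Before training, load the trained weights the same way you would load pretrained weights. Trained weights are usually saved in the logs folder; set model_path to the path of the weights you want to start from.**
253 | 
254 | ### m. I want to train on another dataset; can I use the pretrained weights?
255 | **Q: What should I do about pretrained weights if I train on another dataset?**
256 | **A: Pretrained weights carry over across datasets because the features are generic. They should be used in 99% of cases; without them the weights are too random, feature extraction is weak, and training results will be poor.**
257 | 
258 | ### n. How do I train the network from scratch?
259 | **Q: How do I train without pretrained weights?
260 | A: Read the comments. In most of the code you set model_path = '' and Freeze_Train = False**. If setting model_path has no effect, **comment out the code that loads the pretrained weights.**
261 | 
262 | ### o. Why are results so poor when training from scratch (or after changing the backbone)?
263 | **Q: Why are results so poor without pretrained weights?
264 | A: Randomly initialized weights are poor, so the extracted features are poor and training results suffer. Results also differ between voc07+12 and coco+voc07+12. Pretrained weights really matter.**
265 | 
266 | **Q: UP, I modified the network. Can I still use the pretrained weights?
267 | A: If you changed the backbone and it is not an existing network, the pretrained weights basically cannot be used. Either match the convolution kernel shapes in the weights yourself, or pretrain the network yourself. If you changed only the later part, the pretrained weights for the backbone in the front part still work. For PyTorch code, change how weights are loaded so that shapes are checked before loading; for Keras code, by_name=True, skip_mismatch=True is enough.**
268 | Weight matching can be done along these lines: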
269 | ```python
270 | import torch
271 | # Load only the pretrained weights whose name and shape match the current model.
272 | # `model` (a torch.nn.Module) and `model_path` must already be defined.
273 | print('Loading weights into state dict...')
274 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
275 | model_dict = model.state_dict()
276 | pretrained_dict = torch.load(model_path, map_location=device)
277 | matched = {}
278 | for k, v in pretrained_dict.items():
279 |     # Skip keys that are missing from the model or whose shape differs.
280 |     if k in model_dict and model_dict[k].shape == v.shape:
281 |         matched[k] = v
282 | model_dict.update(matched)
283 | model.load_state_dict(model_dict)
284 | print('Finished!')
285 | ```
286 | 
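287 | **Q: Why are results so poor when training from scratch (I changed the backbone and results are bad; what now)?
288 | A: Generally, training from scratch gives very poor results because the weights are too random and feature extraction is weak, so I very, very, very strongly advise against it! If you must start from scratch, look into the ImageNet dataset: first train a classification model to obtain backbone weights. The backbone of the classification model is shared with this model, so train from those weights.
289 | Changing the backbone leads to the same problem: random weights perform very poorly.**
290 | 
291 | **Q: How do I train a model from scratch?
292 | A: Without enough compute and tuning skill, training from scratch is pointless. With randomly initialized parameters the model's feature extraction is very poor, and without good tuning skill and compute the network will not converge properly.**
293 | If you must start from scratch, keep these points in mind during training:
294 |  - Do not load pretrained weights.
295 |  - Do not use frozen training; comment out the code that freezes the model.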
299 | 
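300 | ### p. Where do your weights come from?
301 | **Q: If networks cannot be trained from scratch, where did your weights come from?
302 | A: Some weights were converted from official releases and some I trained myself. All the ImageNet backbone weights I use are official.**
303 | 
304 | ### q. Video detection and webcam detection
305 | **Q: How do I run detection on a webcam?
306 | A: Change the parameters in predict.py to run webcam detection. There is also a video that explains the approach in detail.**
307 | 
308 | **Q: How do I run detection on a video?
309 | A: Same as above.**
310 | 
311 | ### r. How to save detected images
312 | **Q: How do I save an image after detection?
313 | A: Object detection generally uses Image, so look up how to save with Image from the PIL library. See the comments in predict.py for details.**
314 | 
315 | **Q: How do I save a video?
316 | A: See the comments in predict.py for details.**
317 | 
318 | ### s. Iterating over a folder
319 | **Q: How do I run detection on every image in a folder?
320 | A: Use os.listdir to find all the images in the folder, then detect each one following the flow in predict.py. See the comments in predict.py for details.**
321 | 
322 | **Q: How do I iterate over a folder of images and save the results?
323 | A: For iteration, use os.listdir to find all the images in the folder, then detect each one following the flow in predict.py. For saving, object detection generally uses Image, so look up how to save with PIL's Image. If a repository uses cv2, look up how cv2 saves images. See the comments in predict.py for details.**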
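
The listing step might look like the sketch below. `list_images` and the extension tuple are assumptions for illustration; the detection and saving calls are left out because they depend on predict.py.

```python
import os
import tempfile

# Extensions treated as images in this sketch; extend the tuple as needed.
IMAGE_EXTENSIONS = (".bmp", ".jpg", ".jpeg", ".png")


def list_images(folder):
    """Return the sorted paths of image files directly inside `folder`."""
    return [
        os.path.join(folder, name)
        for name in sorted(os.listdir(folder))
        if name.lower().endswith(IMAGE_EXTENSIONS)
    ]


# Demo on a temporary folder holding two image names and one non-image file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.PNG", "a.jpg", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(p) for p in list_images(tmp)]

print(found)  # ['a.jpg', 'b.PNG']
```

Each returned path can then be opened, passed through the detection step, and saved under the same file name in an output folder.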
324 | 
325 | ### t. Path problems (No such file or directory, StopIteration: [Errno 13] Permission denied: 'XXXXXX')
326 | **Q: Why am I getting errors like these:**
327 | ```python
328 | FileNotFoundError: [Errno 2] No such file or directory
329 | StopIteration: [Errno 13] Permission denied: 'D:\\Study\\Collection\\Dataset\\VOC07+12+test\\VOCdevkit/VOC2007'
330 | ……………………………………
331 | ……………………………………
332 | ```
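333 | **A: Check the folder path and confirm the file exists; also check whether the file paths in 2007_train.txt are wrong.**
334 | Key points about paths:
335 | **Never put spaces in folder names.
336 | Mind the difference between relative and absolute paths.
337 | Read up on how paths work.**
338 | 
339 | **Nearly every path problem is a root-directory problem; look up how relative paths are resolved!**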
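
A quick way to see how a relative path resolves is to print it against the current working directory. The sketch below uses a hypothetical helper, `describe_path`, and this repository's VOCdevkit layout as the example path.

```python
import os


def describe_path(path):
    """Show how a path resolves from the current working directory."""
    return {
        "given": path,
        "is_absolute": os.path.isabs(path),
        "resolved": os.path.abspath(path),  # relative paths resolve against os.getcwd()
        "exists": os.path.exists(path),
        "has_space": " " in path,
    }


info = describe_path(os.path.join("VOCdevkit", "VOC2007", "JPEGImages"))
for key, value in info.items():
    print(f"{key}: {value}")
```

If `resolved` is not the folder you expected, the script was started from a different working directory than the one the relative path assumes.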
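340 | ### u. Comparison with the original: why is your code different?
341 | **Q: The original code does XXX; why does yours do XXX?
342 | A: Well... that is exactly why mine is not the original...**
343 | 
344 | **Q: How does your code compare with the original? Does it reach the same results?
345 | A: Mostly yes. I tested everything on VOC data. I do not have a good GPU, so I cannot train or test on COCO.**
346 | 
347 | **Q: Did you implement every YOLOv4 trick, and how far is it from the original?
348 | A: Not all of the improvements are implemented. YOLOV4 uses so many that it is hard to implement and list them all; only the ones I found interesting and very effective are listed here. SAM (the attention module) mentioned in the paper is not used in the author's own source code either. There are many other tricks, not all of which help, and I cannot implement them all. As for the comparison with the original, I cannot train on COCO, but people who have used the code report that the gap is small.**
349 | 
350 | ### v. My detection speed is xxx; is that normal? Can it be faster?
351 | **Q: What FPS can this reach? Can it reach XX FPS?
352 | A: FPS depends on the machine: better hardware is faster, weaker hardware is slower.**
353 | 
354 | **Q: My detection speed is xxx; is that normal? Can it be faster?
355 | A: It depends on the hardware. To go faster on the same hardware you have to modify the network.**
356 | 
357 | **Q: Why do I get only a dozen or so FPS when testing yolov4 (or others) on a server?
358 | A: Check that tensorflow-gpu or the GPU build of PyTorch is installed correctly. If it is, use time.time() to find which part of detect_image takes longest (the network is not the only cost; other steps such as drawing also take time).**
359 | 
360 | **Q: The paper claims a speed of XX; why do I not see it here?
361 | A: Check that tensorflow-gpu or the GPU build of PyTorch is installed correctly. If it is, use time.time() to find which part of detect_image takes longest (the network is not the only cost; other steps such as drawing also take time). Some papers also predict in batches, which I have not implemented.**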
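
The time.time() approach can be sketched as follows. The three stage functions are hypothetical stand-ins for the preprocessing, network, and drawing steps inside a detect_image-style function; they are not the repository's code.

```python
import time


def timed(label, func, *args, timings=None, **kwargs):
    """Run func, add its wall-clock duration to timings[label], return its result."""
    start = time.time()
    result = func(*args, **kwargs)
    if timings is not None:
        timings[label] = timings.get(label, 0.0) + (time.time() - start)
    return result


# Hypothetical stand-ins for the stages of a detection call.
def preprocess(image):
    return [pixel / 255.0 for pixel in image]


def run_network(inputs):
    time.sleep(0.01)  # pretend this is the forward pass
    return sum(inputs)


def draw_boxes(score):
    return f"score={score:.2f}"


timings = {}
inputs = timed("preprocess", preprocess, [0, 128, 255], timings=timings)
score = timed("network", run_network, inputs, timings=timings)
label = timed("draw", draw_boxes, score, timings=timings)

# Print the stages from slowest to fastest.
for stage, seconds in sorted(timings.items(), key=lambda item: -item[1]):
    print(f"{stage}: {seconds * 1000:.2f} ms")
```

On a GPU, keep in mind that work may be queued asynchronously, so a stage that looks cheap can simply be handing its cost to the next one that waits for the result.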
362 | 
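363 | ### w. Predicted image is not displayed
364 | **Q: Why is no image shown after prediction? The console only tells me which objects were found.
365 | A: Install an image viewer on your system.**
366 | 
367 | ### x. Evaluation metrics (mAP, PR curve, Recall, Precision, etc. for object detection)
368 | **Q: How do I compute mAP?
369 | A: Watch the mAP video; the procedure is always the same.**
370 | 
371 | **Q: What is MINOVERLAP in get_map.py for when computing mAP? Is it IoU?
372 | A: Yes, it is an IoU threshold. It judges how much a predicted box overlaps a ground-truth box; if the overlap exceeds MINOVERLAP, the prediction counts as correct.**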
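
For reference, a plain IoU computation looks like the sketch below. It assumes boxes given as (xmin, ymin, xmax, ymax) with continuous coordinates; get_map.py may use a slightly different pixel convention, so treat `box_iou` as an illustration, not the repository's implementation.

```python
def box_iou(box_a, box_b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    # Width and height of the overlap, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


MINOVERLAP = 0.5
prediction = (0, 0, 10, 10)
ground_truth = (5, 0, 15, 10)
iou = box_iou(prediction, ground_truth)
print(f"IoU = {iou:.3f}, counted as correct: {iou > MINOVERLAP}")
```

Here the two boxes share half of each box's area, yet the IoU is only one third, so the prediction would not count as correct at MINOVERLAP = 0.5.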
373 | 
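374 | **Q: Why is self.confidence (self.score) in get_map.py set so low?
375 | A: See the theory part of the mAP video: all results must be known before the PR curve can be drawn.**
376 | 
377 | **Q: Can you explain how to draw the PR curve?
378 | A: Watch the mAP video; the results include PR curves.**
379 | 
380 | **Q: How do I compute Recall and Precision?
381 | A: These two metrics are defined relative to a specific confidence threshold, and they are also produced when computing mAP.**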
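
The dependence on the confidence threshold can be illustrated with a small sketch. It assumes each detection has already been matched against the ground truth and marked as a true or false positive; `precision_recall` and the sample numbers are made up for the demo.

```python
def precision_recall(detections, num_ground_truth, confidence):
    """detections: list of (score, is_true_positive); keeps scores >= confidence."""
    kept = [is_tp for score, is_tp in detections if score >= confidence]
    true_positives = sum(kept)
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / num_ground_truth if num_ground_truth else 0.0
    return precision, recall


# Five detections for one class, with four ground-truth objects in total.
detections = [(0.95, True), (0.80, True), (0.60, False), (0.40, True), (0.10, False)]
num_ground_truth = 4

results = {}
for confidence in (0.5, 0.05):
    results[confidence] = precision_recall(detections, num_ground_truth, confidence)
    precision, recall = results[confidence]
    print(f"confidence >= {confidence}: precision = {precision:.3f}, recall = {recall:.3f}")
```

Lowering the threshold keeps more detections, which is why a very small confidence is used when the whole PR curve has to be traced.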
382 | 
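383 | ### y. Training on the COCO dataset
384 | **Q: How do I train object detection on the COCO dataset?
385 | A: For the txt files needed for COCO training, refer to qqwweee's yolo3 repository; the format is the same.**
386 | 
387 | ### z. UP, how do I optimize the model? I want better results
388 | **Q: UP, how do I modify the model? I want to publish a small paper!
389 | A: Look at the differences between yolov3 and yolov4, then read the yolov4 paper. It is essentially a large-scale tuning exercise that uses many tricks, and it is a very useful reference. My advice is to study classic models, pick apart their standout structures, and use them.**
390 | 
391 | ### aa. UP, do you have Focal Loss code? How do I change it?
392 | **Q: UP, do you have code for using Focal Loss with the YOLO series? Does it help?
393 | A: Many people have tried it and the gain is small (results can even get worse). YOLO has its own way of balancing positive and negative samples**. As for changing the code, read through it carefully yourself.
394 | 
395 | ### ab. Deployment (ONNX, TensorRT, etc.)
396 | I have not actually deployed to phones or similar devices, so I am not familiar with many deployment issues...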
397 | 
398 | ## 4. Semantic segmentation FAQ
399 | ### a. Shape mismatch
400 | #### 1) Shape mismatch during training
401 | **Q: UP, why does train.py report a shape mismatch?
402 | A: In the Keras environment, the number of classes you are training differs from the original, so the network structure changes and a few shapes at the very end of the network do not match.**
403 | 
404 | #### 2) Shape mismatch during prediction
405 | **Q: Why does predict.py report a shape mismatch?**
406 | ##### i. copying a param with shape torch.Size([75, 704, 1, 1]) from checkpoint
407 | In PyTorch it looks like this:
408 | ![Screenshot](https://img-blog.csdnimg.cn/20200722171631901.png)
409 | ##### ii. Shapes are [1,1,1024,75] and [255,1024,1,1]. for 'Assign_360' (op: 'Assign') with input shapes: [1,1,1024,75], [255,1024,1,1].
410 | In Keras it looks like this:
411 | ![Screenshot](https://img-blog.csdnimg.cn/20200722171523380.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70)
412 | **A: There are three main causes:
413 | 1. num_classes in train.py was not updated.
414 | 2. num_classes was not updated for prediction.
415 | 3. model_path was not updated for prediction.
416 | Check carefully! The num_classes used for both training and prediction must be checked!**
417 | 
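418 | ### b. Out of GPU memory (OOM, RuntimeError: CUDA out of memory)
419 | **Q: Why does the console scroll by extremely fast when I run train.py and then report OOM?
420 | A: This happens in Keras and means GPU memory is exhausted. Reduce batch_size.**
421 | 
422 | **Note that because of BatchNorm2d, batch_size cannot be 1; it must be at least 2.**
423 | 
424 | **Q: Why do I get RuntimeError: CUDA out of memory. Tried to allocate 52.00 MiB (GPU 0; 15.90 GiB total capacity; 14.85 GiB already allocated; 51.88 MiB free; 15.07 GiB reserved in total by PyTorch)?
425 | A: This happens in PyTorch and means GPU memory is exhausted; same as above.**
426 | 
427 | **Q: Why do I run out of GPU memory when the GPU memory is not even being used?
428 | A: The allocation already failed, so the model never started training, and naturally nothing shows as in use.**
429 | 
430 | ### c. Why train with a frozen stage and an unfrozen stage? Can I skip it?
431 | **Q: Why freeze and then unfreeze during training?
432 | A: You can skip it. It mainly exists so that people with limited hardware can still train. If your machine is far too weak, set Freeze_Epoch and UnFreeze_Epoch to the same value to run only the frozen stage.**
433 | 
434 | **This is also the idea behind transfer learning: the features extracted by the backbone are generic, so training with it frozen speeds up training and keeps the weights from being destroyed.**
435 | In the frozen stage the backbone is frozen and the feature extraction network does not change. GPU memory use is lower and the network is only fine-tuned.
436 | In the unfrozen stage the backbone is no longer frozen and the feature extraction network changes. GPU memory use is higher and all parameters of the network are updated.
437 | 
438 | ### d. My loss is huge; is something wrong? (My loss is tiny; is something wrong?)
439 | **Q: Why is my network not converging? The loss is XXXX.
440 | A: Loss values differ between networks. Loss is only a reference for checking whether the network is converging, not a measure of how good the network is. My YOLO code does not normalize the loss, so the values look high. The absolute value does not matter; what matters is whether it keeps decreasing and whether predictions work.**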
441 | 
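442 | ### e. Why does my trained model produce no predictions?
443 | **Q: Why are my training results poor? Prediction shows no boxes (or inaccurate boxes).
444 | A:**
445 | **Things to check:
446 | 1. Dataset: this is the most important one. With fewer than 500 images, consider adding data. Be sure to check the dataset labels. The video explains the VOC dataset format in detail, but having input images and output labels is not enough; you also need to confirm that every pixel value in a label equals its class index. Many people have labels in the wrong format. The most common mistake is a black background with white targets, where the target pixels have the value 255 and training cannot work; the target pixels must be 1.
447 | 2. Unfrozen training: if your data distribution differs a lot from ordinary scenes, continue with unfrozen training to adjust the backbone and strengthen feature extraction.
448 | 3. Network: try different networks.
449 | 4. Training length: some people train for only a few epochs and report no results; train to completion with the default parameters.
450 | 5. Confirm that you followed every step.
451 | 6. Loss values differ between networks. Loss is only a reference for checking whether the network is converging, not a measure of how good the network is; the absolute value does not matter, convergence does.**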
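
The label problem in point 1 can be checked and repaired with a remapping step. The sketch below works on a mask represented as nested lists so that it runs without extra libraries; with real label images you would load the pixels first (for example with PIL) and save the result back. Both helpers are hypothetical.

```python
def remap_binary_mask(mask, foreground_value=255, class_index=1):
    """Return a copy of `mask` with foreground_value replaced by class_index."""
    return [
        [class_index if pixel == foreground_value else pixel for pixel in row]
        for row in mask
    ]


def label_values(mask):
    """Return the sorted list of distinct pixel values present in a mask."""
    return sorted({pixel for row in mask for pixel in row})


# A 3x3 label with a black background (0) and a white target (255).
white_on_black = [
    [0, 0, 255],
    [0, 255, 255],
    [0, 0, 0],
]
fixed = remap_binary_mask(white_on_black)
print(label_values(white_on_black), "->", label_values(fixed))
```

Listing the distinct pixel values of a few labels is a fast way to confirm that they run from 0 up to num_classes - 1 before training.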
452 | 
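453 | **Q: Why are my results poor? Small objects are predicted inaccurately.
454 | A: For deeplab and pspnet, try changing downsample_factor. With downsample_factor set to 16 the input is downsampled too much and results suffer; change it to 8.**
455 | 
456 | ### f. Why is my computed mIoU 0?
457 | **Q: Why are my training results poor? The computed mIoU is 0.**
458 | A:
459 | As in e, **things to check:
460 | 1. Dataset: this is the most important one. With fewer than 500 images, consider adding data. Be sure to check the dataset labels. The video explains the VOC dataset format in detail, but having input images and output labels is not enough; you also need to confirm that every pixel value in a label equals its class index. Many people have labels in the wrong format. The most common mistake is a black background with white targets, where the target pixels have the value 255 and training cannot work; the target pixels must be 1.
461 | 2. Unfrozen training: if your data distribution differs a lot from ordinary scenes, continue with unfrozen training to adjust the backbone and strengthen feature extraction.
462 | 3. Network: try different networks.
463 | 4. Training length: some people train for only a few epochs and report no results; train to completion with the default parameters.
464 | 5. Confirm that you followed every step.
465 | 6. Loss values differ between networks. Loss is only a reference for checking whether the network is converging, not a measure of how good the network is; the absolute value does not matter, convergence does.**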
466 | 
467 | ### g. gbk encoding error ('gbk' codec can't decode byte)
468 | **Q: Why am I getting a gbk encoding error like this:**
469 | ```python
470 | UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 446: illegal multibyte sequence
471 | ```
472 | **A: Do not use Chinese in labels or paths. If you must, handle the encoding explicitly and open files with encoding set to utf-8.**
473 | 
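474 | ### h. My images are xxx*xxx; can I use them?
475 | **Q: My images have a resolution of xxx*xxx; can I use them?**
476 | **A: Yes. The code resizes and augments the data automatically.**
477 | 
478 | ### i. I want data augmentation! How do I do it?
479 | **Q: I want to apply data augmentation! How?**
480 | **A: No extra work is needed. The code resizes and augments the data automatically.**
481 | 
482 | ### j. Multi-GPU training
483 | **Q: How do I train on multiple GPUs?
484 | A: Most of the PyTorch code can use GPU training directly. For Keras, search online; it is not hard to implement. I do not have a multi-GPU machine to test it properly, so you will need to work it out yourselves.**
485 | 
486 | ### k. Can I train on grayscale images?
487 | **Q: Can I train on (or predict) grayscale images?
488 | A: Most of my repositories convert grayscale images to RGB for training and prediction. If the code cannot train on or predict grayscale images, try converting the result of Image.open to RGB inside get_random_data, and do the same at prediction time. (For reference only.)**
489 | 
490 | ### l. Resuming training from a checkpoint
491 | **Q: I have already trained for several epochs. Can I continue from there?
492 | A: Yes. Before training, load the trained weights the same way you would load pretrained weights. Trained weights are usually saved in the logs folder; set model_path to the path of the weights you want to start from.**
493 | 
494 | ### m. I want to train on another dataset; can I use the pretrained weights?
495 | **Q: What should I do about pretrained weights if I train on another dataset?**
496 | **A: Pretrained weights carry over across datasets because the features are generic. They should be used in 99% of cases; without them the weights are too random, feature extraction is weak, and training results will be poor.**
497 | 
498 | ### n. How do I train the network from scratch?
499 | **Q: How do I train without pretrained weights?
500 | A: Read the comments. In most of the code you set model_path = '' and Freeze_Train = False**. If setting model_path has no effect, **comment out the code that loads the pretrained weights.**
501 | 
502 | ### o. Why are results so poor when training from scratch (or after changing the backbone)?
503 | **Q: Why are results so poor without pretrained weights?
504 | A: Randomly initialized weights are poor, so the extracted features are poor and training results suffer. Pretrained weights really matter.**
505 | 
506 | **Q: UP, I modified the network. Can I still use the pretrained weights?
507 | A: If you changed the backbone and it is not an existing network, the pretrained weights basically cannot be used. Either match the convolution kernel shapes in the weights yourself, or pretrain the network yourself. If you changed only the later part, the pretrained weights for the backbone in the front part still work. For PyTorch code, change how weights are loaded so that shapes are checked before loading; for Keras code, by_name=True, skip_mismatch=True is enough.**
508 | Weight matching can be done along these lines: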
509 | ```python
510 | import torch
511 | # Load only the pretrained weights whose name and shape match the current model.
512 | # `model` (a torch.nn.Module) and `model_path` must already be defined.
513 | print('Loading weights into state dict...')
514 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
515 | model_dict = model.state_dict()
516 | pretrained_dict = torch.load(model_path, map_location=device)
517 | matched = {}
518 | for k, v in pretrained_dict.items():
519 |     # Skip keys that are missing from the model or whose shape differs.
520 |     if k in model_dict and model_dict[k].shape == v.shape:
521 |         matched[k] = v
522 | model_dict.update(matched)
523 | model.load_state_dict(model_dict)
524 | print('Finished!')
525 | ```
526 | 
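527 | **Q: Why are results so poor when training from scratch (I changed the backbone and results are bad; what now)?
528 | A: Generally, training from scratch gives very poor results because the weights are too random and feature extraction is weak, so I very, very, very strongly advise against it! If you must start from scratch, look into the ImageNet dataset: first train a classification model to obtain backbone weights. The backbone of the classification model is shared with this model, so train from those weights.
529 | Changing the backbone leads to the same problem: random weights perform very poorly.**
530 | 
531 | **Q: How do I train a model from scratch?
532 | A: Without enough compute and tuning skill, training from scratch is pointless. With randomly initialized parameters the model's feature extraction is very poor, and without good tuning skill and compute the network will not converge properly.**
533 | If you must start from scratch, keep these points in mind during training:
534 |  - Do not load pretrained weights.
535 |  - Do not use frozen training; comment out the code that freezes the model.
536 | 
537 | **Q: Why are results so poor without pretrained weights?
538 | A: Randomly initialized weights are poor, so the extracted features are poor and training results suffer. Results also differ between voc07+12 and coco+voc07+12. Pretrained weights really matter.**
539 | 
540 | ### p. Where do your weights come from?
541 | **Q: If networks cannot be trained from scratch, where did your weights come from?
542 | A: Some weights were converted from official releases and some I trained myself. All the ImageNet backbone weights I use are official.**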
543 | 
544 | 
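545 | ### q. Video detection and webcam detection
546 | **Q: How do I run detection on a webcam?
547 | A: Change the parameters in predict.py to run webcam detection. There is also a video that explains the approach in detail.**
548 | 
549 | **Q: How do I run detection on a video?
550 | A: Same as above.**
551 | 
552 | ### r. How to save detected images
553 | **Q: How do I save an image after detection?
554 | A: Object detection generally uses Image, so look up how to save with Image from the PIL library. See the comments in predict.py for details.**
555 | 
556 | **Q: How do I save a video?
557 | A: See the comments in predict.py for details.**
558 | 
559 | ### s. Iterating over a folder
560 | **Q: How do I run detection on every image in a folder?
561 | A: Use os.listdir to find all the images in the folder, then detect each one following the flow in predict.py. See the comments in predict.py for details.**
562 | 
563 | **Q: How do I iterate over a folder of images and save the results?
564 | A: For iteration, use os.listdir to find all the images in the folder, then detect each one following the flow in predict.py. For saving, object detection generally uses Image, so look up how to save with PIL's Image. If a repository uses cv2, look up how cv2 saves images. See the comments in predict.py for details.**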
565 | 
566 | ### t. Path problems (No such file or directory, StopIteration: [Errno 13] Permission denied: 'XXXXXX')
567 | **Q: Why am I getting errors like these:**
568 | ```python
569 | FileNotFoundError: [Errno 2] No such file or directory
570 | StopIteration: [Errno 13] Permission denied: 'D:\\Study\\Collection\\Dataset\\VOC07+12+test\\VOCdevkit/VOC2007'
571 | ……………………………………
572 | ……………………………………
573 | ```
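574 | **A: Check the folder path and confirm the file exists; also check whether the file paths in 2007_train.txt are wrong.**
575 | Key points about paths:
576 | **Never put spaces in folder names.
577 | Mind the difference between relative and absolute paths.
578 | Read up on how paths work.**
579 | 
580 | **Nearly every path problem is a root-directory problem; look up how relative paths are resolved!**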
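581 | ### u. Comparison with the original: why is your code different?
582 | **Q: The original code does XXX; why does yours do XXX?
583 | A: Well... that is exactly why mine is not the original...**
584 | 
585 | **Q: How does your code compare with the original? Does it reach the same results?
586 | A: Mostly yes. I tested everything on VOC data. I do not have a good GPU, so I cannot train or test on COCO.**
587 | 
588 | ### v. My detection speed is xxx; is that normal? Can it be faster?
589 | **Q: What FPS can this reach? Can it reach XX FPS?
590 | A: FPS depends on the machine: better hardware is faster, weaker hardware is slower.**
591 | 
592 | **Q: My detection speed is xxx; is that normal? Can it be faster?
593 | A: It depends on the hardware. To go faster on the same hardware you have to modify the network.**
594 | 
595 | **Q: The paper claims a speed of XX; why do I not see it here?
596 | A: Check that tensorflow-gpu or the GPU build of PyTorch is installed correctly. If it is, use time.time() to find which part of detect_image takes longest (the network is not the only cost; other steps such as drawing also take time). Some papers also predict in batches, which I have not implemented.**
597 | 
598 | ### w. Predicted image is not displayed
599 | **Q: Why is no image shown after prediction? The console only tells me which objects were found.
600 | A: Install an image viewer on your system.**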
601 | 
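602 | ### x. Evaluation metrics (mIoU)
603 | **Q: How do I compute mIoU?
604 | A: See the mIoU measurement part of the video.**
605 | 
606 | **Q: How do I compute Recall and Precision?
607 | A: The current code cannot produce them. Learn the concept of the confusion matrix and compute them yourself.**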
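
As a starting point, per-class Precision, Recall, and IoU can be derived from a confusion matrix as sketched below. The sketch assumes `confusion[i][j]` counts pixels whose true class is i and predicted class is j; the two-class matrix is made-up data, and `per_class_metrics` is a hypothetical helper.

```python
def per_class_metrics(confusion):
    """Return a (precision, recall, iou) tuple for each class.

    confusion[i][j] is the number of pixels of true class i predicted as class j.
    """
    num_classes = len(confusion)
    metrics = []
    for c in range(num_classes):
        true_positive = confusion[c][c]
        predicted = sum(confusion[r][c] for r in range(num_classes))  # column sum
        actual = sum(confusion[c])                                    # row sum
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / actual if actual else 0.0
        union = predicted + actual - true_positive
        iou = true_positive / union if union else 0.0
        metrics.append((precision, recall, iou))
    return metrics


confusion = [
    [90, 10],  # true background: 90 predicted as background, 10 as target
    [20, 80],  # true target: 20 predicted as background, 80 as target
]
metrics = per_class_metrics(confusion)
miou = sum(iou for _, _, iou in metrics) / len(metrics)
for index, (precision, recall, iou) in enumerate(metrics):
    print(f"class {index}: precision = {precision:.3f}, recall = {recall:.3f}, iou = {iou:.3f}")
print(f"mIoU = {miou:.3f}")
```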
608 | 
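609 | ### y. UP, how do I optimize the model? I want better results
610 | **Q: UP, how do I modify the model? I want to publish a small paper!
611 | A: I recommend reading the yolov4 paper from object detection. It is essentially a large-scale tuning exercise that uses many tricks, and it is a very useful reference. My advice is to study classic models, pick apart their standout structures, and use them.**
612 | 
613 | ### z. Deployment (ONNX, TensorRT, etc.)
614 | I have not actually deployed to phones or similar devices, so I am not familiar with many deployment issues...
615 | 
616 | ## 5. Discussion group
617 | **Q: UP, is there a QQ group or something similar?
618 | A: No, I do not have time to manage a QQ group...**
619 | 
620 | ## 6. How to learn
621 | **Q: UP, what was your learning path? I am a complete beginner; how should I study?
622 | A: A few things to note:
623 | 1. I am not an expert. There is a lot I do not know, and my learning path will not suit everyone.
624 | 2. My lab does not work on deep learning, so I am largely self-taught and worked things out on my own; I cannot say whether my approach is right.
625 | 3. Personally, I think learning depends mostly on self-study.**
626 | As for my learning path, I started with Morvan's (莫烦) Python tutorials and got started with tensorflow, keras, and pytorch. After that I studied SSD and YOLO, then learned about many classic convolutional networks, and later moved on to reading many different codebases. My method is to read code line by line to understand the whole execution flow, how the shapes of the feature layers change, and so on. It took a lot of time and there is no shortcut; you just have to put in the time.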
627 | 


--------------------------------------------------------------------------------