├── .gitignore
├── LICENSE
├── README.md
├── VOCdevkit
│   └── VOC2007
│       ├── Annotations
│       │   └── README.md
│       ├── ImageSets
│       │   └── Main
│       │       └── README.md
│       └── JPEGImages
│           └── README.md
├── efficientdet.py
├── get_map.py
├── img
│   └── street.jpg
├── logs
│   └── README.md
├── model_data
│   ├── coco_classes.txt
│   ├── simhei.ttf
│   └── voc_classes.txt
├── nets
│   ├── __init__.py
│   ├── efficientdet.py
│   ├── efficientdet_training.py
│   ├── efficientnet.py
│   └── layers.py
├── predict.py
├── requirements.txt
├── summary.py
├── train.py
├── utils
│   ├── __init__.py
│   ├── anchors.py
│   ├── callbacks.py
│   ├── dataloader.py
│   ├── utils.py
│   ├── utils_bbox.py
│   ├── utils_fit.py
│   └── utils_map.py
├── voc_annotation.py
└── 常见问题汇总.md

/.gitignore:
--------------------------------------------------------------------------------
1 | # ignore map, miou, datasets
2 | map_out/
3 | miou_out/
4 | VOCdevkit/
5 | datasets/
6 | Medical_Datasets/
7 | lfw/
8 | logs/
9 | model_data/
10 | .temp_map_out/
11 |
12 | # Byte-compiled / optimized / DLL files
13 | __pycache__/
14 | *.py[cod]
15 | *$py.class
16 |
17 | # C extensions
18 | *.so
19 |
20 | # Distribution / packaging
21 | .Python
22 | build/
23 | develop-eggs/
24 | dist/
25 | downloads/
26 | eggs/
27 | .eggs/
28 | lib/
29 | lib64/
30 | parts/
31 | sdist/
32 | var/
33 | wheels/
34 | pip-wheel-metadata/
35 | share/python-wheels/
36 | *.egg-info/
37 | .installed.cfg
38 | *.egg
39 | MANIFEST
40 |
41 | # PyInstaller
42 | # Usually these files are written by a python script from a template
43 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
44 | *.manifest
45 | *.spec
46 |
47 | # Installer logs
48 | pip-log.txt
49 | pip-delete-this-directory.txt
50 |
51 | # Unit test / coverage reports
52 | htmlcov/
53 | .tox/
54 | .nox/
55 | .coverage
56 | .coverage.*
57 | .cache
58 | nosetests.xml
59 | coverage.xml
60 | *.cover
61 | *.py,cover
62 | .hypothesis/
63 | .pytest_cache/
64 |
65 | # Translations
66 | *.mo
67 | *.pot
68 |
69 | # Django stuff:
70 | *.log
71 | local_settings.py
72 | db.sqlite3
73 | db.sqlite3-journal
74 |
75 | # Flask stuff:
76 | instance/
77 | .webassets-cache
78 |
79 | # Scrapy stuff:
80 | .scrapy
81 |
82 | # Sphinx documentation
83 | docs/_build/
84 |
85 | # PyBuilder
86 | target/
87 |
88 | # Jupyter Notebook
89 | .ipynb_checkpoints
90 |
91 | # IPython
92 | profile_default/
93 | ipython_config.py
94 |
95 | # pyenv
96 | .python-version
97 |
98 | # pipenv
99 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
100 | # However, in case of collaboration, if having platform-specific dependencies or dependencies
101 | # having no cross-platform support, pipenv may install dependencies that don't work, or not
102 | # install all needed dependencies.
103 | #Pipfile.lock
104 |
105 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow
106 | __pypackages__/
107 |
108 | # Celery stuff
109 | celerybeat-schedule
110 | celerybeat.pid
111 |
112 | # SageMath parsed files
113 | *.sage.py
114 |
115 | # Environments
116 | .env
117 | .venv
118 | env/
119 | venv/
120 | ENV/
121 | env.bak/
122 | venv.bak/
123 |
124 | # Spyder project settings
125 | .spyderproject
126 | .spyproject
127 |
128 | # Rope project settings
129 | .ropeproject
130 |
131 | # mkdocs documentation
132 | /site
133 |
134 | # mypy
135 | .mypy_cache/
136 | .dmypy.json
137 | dmypy.json
138 |
139 | # Pyre type checker
140 | .pyre/
141 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2020 Bubbliiiing
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## EfficientDet:Scalable and Efficient Object Detection 目标检测模型在Pytorch当中的实现
2 | ---
3 |
4 | ## 目录
5 | 1. [仓库更新 Top News](#仓库更新)
6 | 2. [性能情况 Performance](#性能情况)
7 | 3. [所需环境 Environment](#所需环境)
8 | 4. [注意事项 Attention](#注意事项)
9 | 5. [文件下载 Download](#文件下载)
10 | 6. [训练步骤 How2train](#训练步骤)
11 | 7. [预测步骤 How2predict](#预测步骤)
12 | 8. [评估步骤 How2eval](#评估步骤)
13 | 9.
[参考资料 Reference](#Reference) 14 | 15 | ## Top News 16 | **`2022-04`**:**进行了大幅度的更新,支持step、cos学习率下降法、支持adam、sgd优化器选择、支持学习率根据batch_size自适应调整、新增图片裁剪。支持多GPU训练,新增各个种类目标数量计算。** 17 | BiliBili视频中的原仓库地址为:https://github.com/bubbliiiing/efficientdet-pytorch/tree/bilibili 18 | 19 | **`2021-10`**:**进行了大幅度的更新,增加了大量注释、增加了大量可调整参数、对代码的组成模块进行修改、增加fps、视频预测、批量预测等功能。** 20 | 21 | ### 性能情况 22 | | 训练数据集 | 权值文件名称 | 测试数据集 | 输入图片大小 | mAP 0.5:0.95 | 23 | | :-----: | :-----: | :------: | :------: | :------: | 24 | | COCO-Train2017 | [efficientdet-d0.pth](https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientdet-d0.pth) | COCO-Val2017 | 512x512 | 33.1 25 | | COCO-Train2017 | [efficientdet-d1.pth](https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientdet-d1.pth) | COCO-Val2017 | 640x640 | 38.8 26 | | COCO-Train2017 | [efficientdet-d2.pth](https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientdet-d2.pth) | COCO-Val2017 | 768x768 | 42.1 27 | | COCO-Train2017 | [efficientdet-d3.pth](https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientdet-d3.pth) | COCO-Val2017 | 896x896 | 45.6 28 | | COCO-Train2017 | [efficientdet-d4.pth](https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientdet-d4.pth) | COCO-Val2017 | 1024x1024 | 48.8 29 | | COCO-Train2017 | [efficientdet-d5.pth](https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientdet-d5.pth) | COCO-Val2017 | 1280x1280 | 50.2 30 | | COCO-Train2017 | [efficientdet-d6.pth](https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientdet-d6.pth) | COCO-Val2017 | 1408x1408 | 50.7 31 | | COCO-Train2017 | [efficientdet-d7.pth](https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientdet-d7.pth) | COCO-Val2017 | 1536x1536 | 51.2 32 | 33 | ### 所需环境 34 | torch==1.2.0 35 | 36 | ### 文件下载 37 | 训练所需的pth可以在百度网盘下载。 38 | 包括Efficientdet-d0到d7所有权重。 39 | 链接: https://pan.baidu.com/s/1cTNR63gTizlggSgwDrmwxg 40 | 提取码: hk96 41 | 42 | VOC数据集下载地址如下,里面已经包括了训练集、测试集、验证集(与测试集一样),无需再次划分: 43 | 链接: https://pan.baidu.com/s/1-1Ej6dayrx3g0iAA88uY5A 44 | 提取码: ph32 45 | 46 | ## 训练步骤 47 | ### a、训练VOC07+12数据集 48 | 1. 数据集的准备 49 | **本文使用VOC格式进行训练,训练前需要下载好VOC07+12的数据集,解压后放在根目录** 50 | 51 | 2. 数据集的处理 52 | 修改voc_annotation.py里面的annotation_mode=2,运行voc_annotation.py生成根目录下的2007_train.txt和2007_val.txt。 53 | 54 | 3. 开始网络训练 55 | train.py的默认参数用于训练VOC数据集,直接运行train.py即可开始训练。 56 | 57 | 4. 训练结果预测 58 | 训练结果预测需要用到两个文件,分别是efficientdet.py和predict.py。我们首先需要去efficientdet.py里面修改model_path以及classes_path,这两个参数必须要修改。 59 | **model_path指向训练好的权值文件,在logs文件夹里。 60 | classes_path指向检测类别所对应的txt。** 61 | 完成修改后就可以运行predict.py进行检测了。运行后输入图片路径即可检测。 62 | 63 | ### b、训练自己的数据集 64 | 1. 数据集的准备 65 | **本文使用VOC格式进行训练,训练前需要自己制作好数据集,** 66 | 训练前将标签文件放在VOCdevkit文件夹下的VOC2007文件夹下的Annotation中。 67 | 训练前将图片文件放在VOCdevkit文件夹下的VOC2007文件夹下的JPEGImages中。 68 | 69 | 2. 数据集的处理 70 | 在完成数据集的摆放之后,我们需要利用voc_annotation.py获得训练用的2007_train.txt和2007_val.txt。 71 | 修改voc_annotation.py里面的参数。第一次训练可以仅修改classes_path,classes_path用于指向检测类别所对应的txt。 72 | 训练自己的数据集时,可以自己建立一个cls_classes.txt,里面写自己所需要区分的类别。 73 | model_data/cls_classes.txt文件内容为: 74 | ```python 75 | cat 76 | dog 77 | ... 78 | ``` 79 | 修改voc_annotation.py中的classes_path,使其对应cls_classes.txt,并运行voc_annotation.py。 80 | 81 | 3. 
开始网络训练 82 | **训练的参数较多,均在train.py中,大家可以在下载库后仔细看注释,其中最重要的部分依然是train.py里的classes_path。** 83 | **classes_path用于指向检测类别所对应的txt,这个txt和voc_annotation.py里面的txt一样!训练自己的数据集必须要修改!** 84 | 修改完classes_path后就可以运行train.py开始训练了,在训练多个epoch后,权值会生成在logs文件夹中。 85 | 86 | 4. 训练结果预测 87 | 训练结果预测需要用到两个文件,分别是efficientdet.py和predict.py。在efficientdet.py里面修改model_path以及classes_path。 88 | **model_path指向训练好的权值文件,在logs文件夹里。 89 | classes_path指向检测类别所对应的txt。** 90 | 完成修改后就可以运行predict.py进行检测了。运行后输入图片路径即可检测。 91 | 92 | ## 预测步骤 93 | ### a、使用预训练权重 94 | 1. 下载完库后解压,在百度网盘下载权值,放入model_data,运行predict.py,输入 95 | ```python 96 | img/street.jpg 97 | ``` 98 | 2. 在predict.py里面进行设置可以进行fps测试和video视频检测。 99 | ### b、使用自己训练的权重 100 | 1. 按照训练步骤训练。 101 | 2. 在efficientdet.py文件里面,在如下部分修改model_path和classes_path使其对应训练好的文件;**model_path对应logs文件夹下面的权值文件,classes_path是model_path对应分的类**。 102 | ```python 103 | _defaults = { 104 | #--------------------------------------------------------------------------# 105 | # 使用自己训练好的模型进行预测一定要修改model_path和classes_path! 106 | # model_path指向logs文件夹下的权值文件,classes_path指向model_data下的txt 107 | # 如果出现shape不匹配,同时要注意训练时的model_path和classes_path参数的修改 108 | #--------------------------------------------------------------------------# 109 | "model_path" : 'model_data/efficientdet-d0.pth', 110 | "classes_path" : 'model_data/coco_classes.txt', 111 | #---------------------------------------------------------------------# 112 | # 用于选择所使用的模型的版本,0-7 113 | #---------------------------------------------------------------------# 114 | "phi" : 0, 115 | #---------------------------------------------------------------------# 116 | # 只有得分大于置信度的预测框会被保留下来 117 | #---------------------------------------------------------------------# 118 | "confidence" : 0.3, 119 | #---------------------------------------------------------------------# 120 | # 非极大抑制所用到的nms_iou大小 121 | #---------------------------------------------------------------------# 122 | "nms_iou" : 0.3, 123 | #---------------------------------------------------------------------# 124 | # 该变量用于控制是否使用letterbox_image对输入图像进行不失真的resize, 125 | # 在多次测试后,发现关闭letterbox_image直接resize的效果更好 126 | #---------------------------------------------------------------------# 127 | "letterbox_image" : False, 128 | #---------------------------------------------------------------------# 129 | # 是否使用Cuda 130 | # 没有GPU可以设置成False 131 | #---------------------------------------------------------------------# 132 | "cuda" : True 133 | } 134 | ``` 135 | 3. 运行predict.py,输入 136 | ```python 137 | img/street.jpg 138 | ``` 139 | 4. 在predict.py里面进行设置可以进行fps测试和video视频检测。 140 | 141 | ## 评估步骤 142 | ### a、评估VOC07+12的测试集 143 | 1. 本文使用VOC格式进行评估。VOC07+12已经划分好了测试集,无需利用voc_annotation.py生成ImageSets文件夹下的txt。 144 | 2. 在efficientdet.py里面修改model_path以及classes_path。**model_path指向训练好的权值文件,在logs文件夹里。classes_path指向检测类别所对应的txt。** 145 | 3. 运行get_map.py即可获得评估结果,评估结果会保存在map_out文件夹中。 146 | 147 | ### b、评估自己的数据集 148 | 1. 本文使用VOC格式进行评估。 149 | 2. 如果在训练前已经运行过voc_annotation.py文件,代码会自动将数据集划分成训练集、验证集和测试集。如果想要修改测试集的比例,可以修改voc_annotation.py文件下的trainval_percent。trainval_percent用于指定(训练集+验证集)与测试集的比例,默认情况下 (训练集+验证集):测试集 = 9:1。train_percent用于指定(训练集+验证集)中训练集与验证集的比例,默认情况下 训练集:验证集 = 9:1。 150 | 3. 利用voc_annotation.py划分测试集后,前往get_map.py文件修改classes_path,classes_path用于指向检测类别所对应的txt,这个txt和训练时的txt一样。评估自己的数据集必须要修改。 151 | 4. 在efficientdet.py里面修改model_path以及classes_path。**model_path指向训练好的权值文件,在logs文件夹里。classes_path指向检测类别所对应的txt。** 152 | 5. 
运行get_map.py即可获得评估结果,评估结果会保存在map_out文件夹中。 153 | 154 | ### Reference 155 | https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch 156 | https://github.com/Cartucho/mAP 157 | -------------------------------------------------------------------------------- /VOCdevkit/VOC2007/Annotations/README.md: -------------------------------------------------------------------------------- 1 | 存放标签文件 -------------------------------------------------------------------------------- /VOCdevkit/VOC2007/ImageSets/Main/README.md: -------------------------------------------------------------------------------- 1 | 存放训练索引文件 -------------------------------------------------------------------------------- /VOCdevkit/VOC2007/JPEGImages/README.md: -------------------------------------------------------------------------------- 1 | 存放图片文件 -------------------------------------------------------------------------------- /efficientdet.py: -------------------------------------------------------------------------------- 1 | import colorsys 2 | import os 3 | import time 4 | 5 | import numpy as np 6 | import torch 7 | import torch.nn as nn 8 | from PIL import ImageDraw, ImageFont 9 | 10 | from nets.efficientdet import EfficientDetBackbone 11 | from utils.utils import (cvtColor, get_classes, image_sizes, preprocess_input, 12 | resize_image, show_config) 13 | from utils.utils_bbox import decodebox, non_max_suppression 14 | 15 | 16 | #--------------------------------------------# 17 | # 使用自己训练好的模型预测需要修改3个参数 18 | # model_path和classes_path和phi都需要修改! 19 | # 如果出现shape不匹配,一定要注意 20 | # 训练时的model_path和classes_path参数的修改 21 | #--------------------------------------------# 22 | class Efficientdet(object): 23 | _defaults = { 24 | #--------------------------------------------------------------------------# 25 | # 使用自己训练好的模型进行预测一定要修改model_path和classes_path! 
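    # (A hedged usage sketch -- the checkpoint name below is hypothetical,
    #  shown only to illustrate that any key in _defaults can be overridden
    #  through the **kwargs of __init__:
    #      det = Efficientdet(model_path='logs/my_weights.pth',
    #                         classes_path='model_data/cls_classes.txt', phi=0)
    # )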
26 | # model_path指向logs文件夹下的权值文件,classes_path指向model_data下的txt 27 | # 28 | # 训练好后logs文件夹下存在多个权值文件,选择验证集损失较低的即可。 29 | # 验证集损失较低不代表mAP较高,仅代表该权值在验证集上泛化性能较好。 30 | # 如果出现shape不匹配,同时要注意训练时的model_path和classes_path参数的修改 31 | #--------------------------------------------------------------------------# 32 | "model_path" : 'model_data/efficientdet-d0.pth', 33 | "classes_path" : 'model_data/coco_classes.txt', 34 | #---------------------------------------------------------------------# 35 | # 用于选择所使用的模型的版本,0-7 36 | #---------------------------------------------------------------------# 37 | "phi" : 0, 38 | #---------------------------------------------------------------------# 39 | # 只有得分大于置信度的预测框会被保留下来 40 | #---------------------------------------------------------------------# 41 | "confidence" : 0.3, 42 | #---------------------------------------------------------------------# 43 | # 非极大抑制所用到的nms_iou大小 44 | #---------------------------------------------------------------------# 45 | "nms_iou" : 0.3, 46 | #---------------------------------------------------------------------# 47 | # 该变量用于控制是否使用letterbox_image对输入图像进行不失真的resize, 48 | # 在多次测试后,发现关闭letterbox_image直接resize的效果更好 49 | #---------------------------------------------------------------------# 50 | "letterbox_image" : False, 51 | #---------------------------------------------------------------------# 52 | # 是否使用Cuda 53 | # 没有GPU可以设置成False 54 | #---------------------------------------------------------------------# 55 | "cuda" : True 56 | } 57 | 58 | @classmethod 59 | def get_defaults(cls, n): 60 | if n in cls._defaults: 61 | return cls._defaults[n] 62 | else: 63 | return "Unrecognized attribute name '" + n + "'" 64 | 65 | #---------------------------------------------------# 66 | # 初始化Efficientdet 67 | #---------------------------------------------------# 68 | def __init__(self, **kwargs): 69 | self.__dict__.update(self._defaults) 70 | for name, value in kwargs.items(): 71 | setattr(self, name, value) 72 | self._defaults[name] = value 73 | 74 | self.input_shape = [image_sizes[self.phi], image_sizes[self.phi]] 75 | #---------------------------------------------------# 76 | # 计算总的类的数量 77 | #---------------------------------------------------# 78 | self.class_names, self.num_classes = get_classes(self.classes_path) 79 | 80 | #---------------------------------------------------# 81 | # 画框设置不同的颜色 82 | #---------------------------------------------------# 83 | hsv_tuples = [(x / self.num_classes, 1., 1.) 
for x in range(self.num_classes)] 84 | self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples)) 85 | self.colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), self.colors)) 86 | 87 | self.generate() 88 | 89 | show_config(**self._defaults) 90 | 91 | #---------------------------------------------------# 92 | # 载入模型 93 | #---------------------------------------------------# 94 | def generate(self): 95 | #----------------------------------------# 96 | # 创建Efficientdet模型 97 | #----------------------------------------# 98 | self.net = EfficientDetBackbone(self.num_classes, self.phi) 99 | 100 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') 101 | self.net.load_state_dict(torch.load(self.model_path, map_location=device)) 102 | self.net = self.net.eval() 103 | print('{} model, anchors, and classes loaded.'.format(self.model_path)) 104 | 105 | if self.cuda: 106 | self.net = nn.DataParallel(self.net) 107 | self.net = self.net.cuda() 108 | 109 | #---------------------------------------------------# 110 | # 检测图片 111 | #---------------------------------------------------# 112 | def detect_image(self, image, crop = False, count = False): 113 | #---------------------------------------------------# 114 | # 计算输入图片的高和宽 115 | #---------------------------------------------------# 116 | image_shape = np.array(np.shape(image)[0:2]) 117 | #---------------------------------------------------------# 118 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 119 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 120 | #---------------------------------------------------------# 121 | image = cvtColor(image) 122 | #---------------------------------------------------------# 123 | # 给图像增加灰条,实现不失真的resize 124 | # 也可以直接resize进行识别 125 | #---------------------------------------------------------# 126 | image_data = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image) 127 | #---------------------------------------------------------# 128 | # 添加上batch_size维度,图片预处理,归一化。 129 | #---------------------------------------------------------# 130 | image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0) 131 | 132 | with torch.no_grad(): 133 | images = torch.from_numpy(image_data) 134 | if self.cuda: 135 | images = images.cuda() 136 | #---------------------------------------------------------# 137 | # 传入网络当中进行预测 138 | #---------------------------------------------------------# 139 | _, regression, classification, anchors = self.net(images) 140 | 141 | #-----------------------------------------------------------# 142 | # 将预测结果进行解码 143 | #-----------------------------------------------------------# 144 | outputs = decodebox(regression, anchors, self.input_shape) 145 | results = non_max_suppression(torch.cat([outputs, classification], axis=-1), self.input_shape, 146 | image_shape, self.letterbox_image, conf_thres = self.confidence, nms_thres = self.nms_iou) 147 | 148 | if results[0] is None: 149 | return image 150 | 151 | top_label = np.array(results[0][:, 5], dtype = 'int32') 152 | top_conf = results[0][:, 4] 153 | top_boxes = results[0][:, :4] 154 | 155 | #---------------------------------------------------------# 156 | # 设置字体与边框厚度 157 | #---------------------------------------------------------# 158 | font = ImageFont.truetype(font='model_data/simhei.ttf', size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32')) 159 | thickness = int(max((image.size[0] + image.size[1]) // np.mean(self.input_shape), 1)) 160 | 
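        # Worked example of the line above (illustrative numbers): a 1280x720
        # image with the d0 input_shape of 512 gives
        # thickness = max((1280 + 720) // 512, 1) = 3, i.e. a 3-pixel outline.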
#---------------------------------------------------------# 161 | # 计数 162 | #---------------------------------------------------------# 163 | if count: 164 | print("top_label:", top_label) 165 | classes_nums = np.zeros([self.num_classes]) 166 | for i in range(self.num_classes): 167 | num = np.sum(top_label == i) 168 | if num > 0: 169 | print(self.class_names[i], " : ", num) 170 | classes_nums[i] = num 171 | print("classes_nums:", classes_nums) 172 | #---------------------------------------------------------# 173 | # 是否进行目标的裁剪 174 | #---------------------------------------------------------# 175 | if crop: 176 | for i, c in list(enumerate(top_label)): 177 | top, left, bottom, right = top_boxes[i] 178 | top = max(0, np.floor(top).astype('int32')) 179 | left = max(0, np.floor(left).astype('int32')) 180 | bottom = min(image.size[1], np.floor(bottom).astype('int32')) 181 | right = min(image.size[0], np.floor(right).astype('int32')) 182 | 183 | dir_save_path = "img_crop" 184 | if not os.path.exists(dir_save_path): 185 | os.makedirs(dir_save_path) 186 | crop_image = image.crop([left, top, right, bottom]) 187 | crop_image.save(os.path.join(dir_save_path, "crop_" + str(i) + ".png"), quality=95, subsampling=0) 188 | print("save crop_" + str(i) + ".png to " + dir_save_path) 189 | #---------------------------------------------------------# 190 | # 图像绘制 191 | #---------------------------------------------------------# 192 | for i, c in list(enumerate(top_label)): 193 | predicted_class = self.class_names[int(c)] 194 | box = top_boxes[i] 195 | score = top_conf[i] 196 | 197 | top, left, bottom, right = box 198 | 199 | top = max(0, np.floor(top).astype('int32')) 200 | left = max(0, np.floor(left).astype('int32')) 201 | bottom = min(image.size[1], np.floor(bottom).astype('int32')) 202 | right = min(image.size[0], np.floor(right).astype('int32')) 203 | 204 | label = '{} {:.2f}'.format(predicted_class, score) 205 | draw = ImageDraw.Draw(image) 206 | label_size = draw.textsize(label, font) 207 | label = label.encode('utf-8') 208 | print(label, top, left, bottom, right) 209 | 210 | if top - label_size[1] >= 0: 211 | text_origin = np.array([left, top - label_size[1]]) 212 | else: 213 | text_origin = np.array([left, top + 1]) 214 | 215 | for i in range(thickness): 216 | draw.rectangle([left + i, top + i, right - i, bottom - i], outline=self.colors[c]) 217 | draw.rectangle([tuple(text_origin), tuple(text_origin + label_size)], fill=self.colors[c]) 218 | draw.text(text_origin, str(label,'UTF-8'), fill=(0, 0, 0), font=font) 219 | del draw 220 | 221 | return image 222 | 223 | def get_FPS(self, image, test_interval): 224 | image_shape = np.array(np.shape(image)[0:2]) 225 | #---------------------------------------------------------# 226 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 227 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 228 | #---------------------------------------------------------# 229 | image = cvtColor(image) 230 | #---------------------------------------------------------# 231 | # 给图像增加灰条,实现不失真的resize 232 | # 也可以直接resize进行识别 233 | #---------------------------------------------------------# 234 | image_data = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image) 235 | #---------------------------------------------------------# 236 | # 添加上batch_size维度,图片预处理,归一化。 237 | #---------------------------------------------------------# 238 | image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0) 239 | 240 | with torch.no_grad(): 241 | images = 
torch.from_numpy(image_data) 242 | if self.cuda: 243 | images = images.cuda() 244 | #---------------------------------------------------------# 245 | # 传入网络当中进行预测 246 | #---------------------------------------------------------# 247 | _, regression, classification, anchors = self.net(images) 248 | 249 | #-----------------------------------------------------------# 250 | # 将预测结果进行解码 251 | #-----------------------------------------------------------# 252 | outputs = decodebox(regression, anchors, self.input_shape) 253 | results = non_max_suppression(torch.cat([outputs, classification], axis=-1), self.input_shape, 254 | image_shape, self.letterbox_image, conf_thres = self.confidence, nms_thres = self.nms_iou) 255 | 256 | t1 = time.time() 257 | for _ in range(test_interval): 258 | with torch.no_grad(): 259 | #---------------------------------------------------------# 260 | # 传入网络当中进行预测 261 | #---------------------------------------------------------# 262 | _, regression, classification, anchors = self.net(images) 263 | 264 | #-----------------------------------------------------------# 265 | # 将预测结果进行解码 266 | #-----------------------------------------------------------# 267 | outputs = decodebox(regression, anchors, self.input_shape) 268 | results = non_max_suppression(torch.cat([outputs, classification], axis=-1), self.input_shape, 269 | image_shape, self.letterbox_image, conf_thres = self.confidence, nms_thres = self.nms_iou) 270 | 271 | t2 = time.time() 272 | tact_time = (t2 - t1) / test_interval 273 | return tact_time 274 | 275 | #---------------------------------------------------# 276 | # 检测图片 277 | #---------------------------------------------------# 278 | def get_map_txt(self, image_id, image, class_names, map_out_path): 279 | f = open(os.path.join(map_out_path, "detection-results/"+image_id+".txt"),"w") 280 | image_shape = np.array(np.shape(image)[0:2]) 281 | #---------------------------------------------------------# 282 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 283 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 284 | #---------------------------------------------------------# 285 | image = cvtColor(image) 286 | #---------------------------------------------------------# 287 | # 给图像增加灰条,实现不失真的resize 288 | # 也可以直接resize进行识别 289 | #---------------------------------------------------------# 290 | image_data = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image) 291 | #---------------------------------------------------------# 292 | # 添加上batch_size维度,图片预处理,归一化。 293 | #---------------------------------------------------------# 294 | image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0) 295 | 296 | with torch.no_grad(): 297 | images = torch.from_numpy(image_data) 298 | if self.cuda: 299 | images = images.cuda() 300 | #---------------------------------------------------------# 301 | # 传入网络当中进行预测 302 | #---------------------------------------------------------# 303 | _, regression, classification, anchors = self.net(images) 304 | 305 | #-----------------------------------------------------------# 306 | # 将预测结果进行解码 307 | #-----------------------------------------------------------# 308 | outputs = decodebox(regression, anchors, self.input_shape) 309 | results = non_max_suppression(torch.cat([outputs, classification], axis=-1), self.input_shape, 310 | image_shape, self.letterbox_image, conf_thres = self.confidence, nms_thres = self.nms_iou) 311 | 312 | if results[0] is None: 313 | return 314 | 315 | top_label = 
np.array(results[0][:, 5], dtype = 'int32') 316 | top_conf = results[0][:, 4] 317 | top_boxes = results[0][:, :4] 318 | 319 | for i, c in list(enumerate(top_label)): 320 | predicted_class = self.class_names[int(c)] 321 | box = top_boxes[i] 322 | score = str(top_conf[i]) 323 | 324 | top, left, bottom, right = box 325 | if predicted_class not in class_names: 326 | continue 327 | 328 | f.write("%s %s %s %s %s %s\n" % (predicted_class, score[:6], str(int(left)), str(int(top)), str(int(right)),str(int(bottom)))) 329 | 330 | f.close() 331 | return 332 | -------------------------------------------------------------------------------- /get_map.py: -------------------------------------------------------------------------------- 1 | import os 2 | import xml.etree.ElementTree as ET 3 | 4 | from PIL import Image 5 | from tqdm import tqdm 6 | 7 | from utils.utils import get_classes 8 | from utils.utils_map import get_coco_map, get_map 9 | from efficientdet import Efficientdet 10 | 11 | if __name__ == "__main__": 12 | ''' 13 | Recall和Precision不像AP是一个面积的概念,因此在门限值(Confidence)不同时,网络的Recall和Precision值是不同的。 14 | 默认情况下,本代码计算的Recall和Precision代表的是当门限值(Confidence)为0.5时,所对应的Recall和Precision值。 15 | 16 | 受到mAP计算原理的限制,网络在计算mAP时需要获得近乎所有的预测框,这样才可以计算不同门限条件下的Recall和Precision值 17 | 因此,本代码获得的map_out/detection-results/里面的txt的框的数量一般会比直接predict多一些,目的是列出所有可能的预测框, 18 | ''' 19 | #------------------------------------------------------------------------------------------------------------------# 20 | # map_mode用于指定该文件运行时计算的内容 21 | # map_mode为0代表整个map计算流程,包括获得预测结果、获得真实框、计算VOC_map。 22 | # map_mode为1代表仅仅获得预测结果。 23 | # map_mode为2代表仅仅获得真实框。 24 | # map_mode为3代表仅仅计算VOC_map。 25 | # map_mode为4代表利用COCO工具箱计算当前数据集的0.50:0.95map。需要获得预测结果、获得真实框后并安装pycocotools才行 26 | #-------------------------------------------------------------------------------------------------------------------# 27 | map_mode = 0 28 | #--------------------------------------------------------------------------------------# 29 | # 此处的classes_path用于指定需要测量VOC_map的类别 30 | # 一般情况下与训练和预测所用的classes_path一致即可 31 | #--------------------------------------------------------------------------------------# 32 | classes_path = 'model_data/voc_classes.txt' 33 | #--------------------------------------------------------------------------------------# 34 | # MINOVERLAP用于指定想要获得的mAP0.x,mAP0.x的意义是什么请同学们百度一下。 35 | # 比如计算mAP0.75,可以设定MINOVERLAP = 0.75。 36 | # 37 | # 当某一预测框与真实框重合度大于MINOVERLAP时,该预测框被认为是正样本,否则为负样本。 38 | # 因此MINOVERLAP的值越大,预测框要预测的越准确才能被认为是正样本,此时算出来的mAP值越低, 39 | #--------------------------------------------------------------------------------------# 40 | MINOVERLAP = 0.5 41 | #--------------------------------------------------------------------------------------# 42 | # 受到mAP计算原理的限制,网络在计算mAP时需要获得近乎所有的预测框,这样才可以计算mAP 43 | # 因此,confidence的值应当设置的尽量小进而获得全部可能的预测框。 44 | # 45 | # 该值一般不调整。因为计算mAP需要获得近乎所有的预测框,此处的confidence不能随便更改。 46 | # 想要获得不同门限值下的Recall和Precision值,请修改下方的score_threhold。 47 | #--------------------------------------------------------------------------------------# 48 | confidence = 0.02 49 | #--------------------------------------------------------------------------------------# 50 | # 预测时使用到的非极大抑制值的大小,越大表示非极大抑制越不严格。 51 | # 52 | # 该值一般不调整。 53 | #--------------------------------------------------------------------------------------# 54 | nms_iou = 0.5 55 | #---------------------------------------------------------------------------------------------------------------# 56 | # Recall和Precision不像AP是一个面积的概念,因此在门限值不同时,网络的Recall和Precision值是不同的。 57 | # 58 | # 
默认情况下,本代码计算的Recall和Precision代表的是当门限值为0.5(此处定义为score_threhold)时所对应的Recall和Precision值。 59 | # 因为计算mAP需要获得近乎所有的预测框,上面定义的confidence不能随便更改。 60 | # 这里专门定义一个score_threhold用于代表门限值,进而在计算mAP时找到门限值对应的Recall和Precision值。 61 | #---------------------------------------------------------------------------------------------------------------# 62 | score_threhold = 0.5 63 | #-------------------------------------------------------# 64 | # map_vis用于指定是否开启VOC_map计算的可视化 65 | #-------------------------------------------------------# 66 | map_vis = False 67 | #-------------------------------------------------------# 68 | # 指向VOC数据集所在的文件夹 69 | # 默认指向根目录下的VOC数据集 70 | #-------------------------------------------------------# 71 | VOCdevkit_path = 'VOCdevkit' 72 | #-------------------------------------------------------# 73 | # 结果输出的文件夹,默认为map_out 74 | #-------------------------------------------------------# 75 | map_out_path = 'map_out' 76 | 77 | image_ids = open(os.path.join(VOCdevkit_path, "VOC2007/ImageSets/Main/test.txt")).read().strip().split() 78 | 79 | if not os.path.exists(map_out_path): 80 | os.makedirs(map_out_path) 81 | if not os.path.exists(os.path.join(map_out_path, 'ground-truth')): 82 | os.makedirs(os.path.join(map_out_path, 'ground-truth')) 83 | if not os.path.exists(os.path.join(map_out_path, 'detection-results')): 84 | os.makedirs(os.path.join(map_out_path, 'detection-results')) 85 | if not os.path.exists(os.path.join(map_out_path, 'images-optional')): 86 | os.makedirs(os.path.join(map_out_path, 'images-optional')) 87 | 88 | class_names, _ = get_classes(classes_path) 89 | 90 | if map_mode == 0 or map_mode == 1: 91 | print("Load model.") 92 | efficientdet = Efficientdet(confidence = confidence, nms_iou = nms_iou) 93 | print("Load model done.") 94 | 95 | print("Get predict result.") 96 | for image_id in tqdm(image_ids): 97 | image_path = os.path.join(VOCdevkit_path, "VOC2007/JPEGImages/"+image_id+".jpg") 98 | image = Image.open(image_path) 99 | if map_vis: 100 | image.save(os.path.join(map_out_path, "images-optional/" + image_id + ".jpg")) 101 | efficientdet.get_map_txt(image_id, image, class_names, map_out_path) 102 | print("Get predict result done.") 103 | 104 | if map_mode == 0 or map_mode == 2: 105 | print("Get ground truth result.") 106 | for image_id in tqdm(image_ids): 107 | with open(os.path.join(map_out_path, "ground-truth/"+image_id+".txt"), "w") as new_f: 108 | root = ET.parse(os.path.join(VOCdevkit_path, "VOC2007/Annotations/"+image_id+".xml")).getroot() 109 | for obj in root.findall('object'): 110 | difficult_flag = False 111 | if obj.find('difficult')!=None: 112 | difficult = obj.find('difficult').text 113 | if int(difficult)==1: 114 | difficult_flag = True 115 | obj_name = obj.find('name').text 116 | if obj_name not in class_names: 117 | continue 118 | bndbox = obj.find('bndbox') 119 | left = bndbox.find('xmin').text 120 | top = bndbox.find('ymin').text 121 | right = bndbox.find('xmax').text 122 | bottom = bndbox.find('ymax').text 123 | 124 | if difficult_flag: 125 | new_f.write("%s %s %s %s %s difficult\n" % (obj_name, left, top, right, bottom)) 126 | else: 127 | new_f.write("%s %s %s %s %s\n" % (obj_name, left, top, right, bottom)) 128 | print("Get ground truth result done.") 129 | 130 | if map_mode == 0 or map_mode == 3: 131 | print("Get map.") 132 | get_map(MINOVERLAP, True, score_threhold = score_threhold, path = map_out_path) 133 | print("Get map done.") 134 | 135 | if map_mode == 4: 136 | print("Get map.") 137 | get_coco_map(class_names = class_names, path = map_out_path) 138 | 
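        # Note: get_coco_map re-reads map_out/ground-truth and
        # map_out/detection-results, so the prediction and ground-truth steps
        # (map_mode 1 and 2, or 0) must have been run beforehand and
        # pycocotools must be installed.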
print("Get map done.") 139 | -------------------------------------------------------------------------------- /img/street.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bubbliiiing/efficientdet-pytorch/172bd8c85daeba96a03b955e6c4776a1afd99cca/img/street.jpg -------------------------------------------------------------------------------- /logs/README.md: -------------------------------------------------------------------------------- 1 | 训练好的权重会保存在这里 2 | -------------------------------------------------------------------------------- /model_data/coco_classes.txt: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorbike 5 | aeroplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | 13 | stop sign 14 | parking meter 15 | bench 16 | bird 17 | cat 18 | dog 19 | horse 20 | sheep 21 | cow 22 | elephant 23 | bear 24 | zebra 25 | giraffe 26 | 27 | backpack 28 | umbrella 29 | 30 | 31 | handbag 32 | tie 33 | suitcase 34 | frisbee 35 | skis 36 | snowboard 37 | sports ball 38 | kite 39 | baseball bat 40 | baseball glove 41 | skateboard 42 | surfboard 43 | tennis racket 44 | bottle 45 | 46 | wine glass 47 | cup 48 | fork 49 | knife 50 | spoon 51 | bowl 52 | banana 53 | apple 54 | sandwich 55 | orange 56 | broccoli 57 | carrot 58 | hot dog 59 | pizza 60 | donut 61 | cake 62 | chair 63 | sofa 64 | pottedplant 65 | bed 66 | 67 | diningtable 68 | 69 | 70 | toilet 71 | 72 | tvmonitor 73 | laptop 74 | mouse 75 | remote 76 | keyboard 77 | cell phone 78 | microwave 79 | oven 80 | toaster 81 | sink 82 | refrigerator 83 | 84 | book 85 | clock 86 | vase 87 | scissors 88 | teddy bear 89 | hair drier 90 | toothbrush 91 | -------------------------------------------------------------------------------- /model_data/simhei.ttf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bubbliiiing/efficientdet-pytorch/172bd8c85daeba96a03b955e6c4776a1afd99cca/model_data/simhei.ttf -------------------------------------------------------------------------------- /model_data/voc_classes.txt: -------------------------------------------------------------------------------- 1 | aeroplane 2 | bicycle 3 | bird 4 | boat 5 | bottle 6 | bus 7 | car 8 | cat 9 | chair 10 | cow 11 | diningtable 12 | dog 13 | horse 14 | motorbike 15 | person 16 | pottedplant 17 | sheep 18 | sofa 19 | train 20 | tvmonitor -------------------------------------------------------------------------------- /nets/__init__.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------------------------------- /nets/efficientdet.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | from utils.anchors import Anchors 4 | 5 | from nets.efficientnet import EfficientNet as EffNet 6 | from nets.layers import (Conv2dStaticSamePadding, MaxPool2dStaticSamePadding, 7 | MemoryEfficientSwish, Swish) 8 | 9 | 10 | #----------------------------------# 11 | # Xception中深度可分离卷积 12 | # 先3x3的深度可分离卷积 13 | # 再1x1的普通卷积 14 | #----------------------------------# 15 | class SeparableConvBlock(nn.Module): 16 | def __init__(self, in_channels, out_channels=None, norm=True, activation=False, onnx_export=False): 17 | super(SeparableConvBlock, self).__init__() 18 | if out_channels is None: 19 | out_channels = 
in_channels 20 | 21 | self.depthwise_conv = Conv2dStaticSamePadding(in_channels, in_channels, kernel_size=3, stride=1, groups=in_channels, bias=False) 22 | self.pointwise_conv = Conv2dStaticSamePadding(in_channels, out_channels, kernel_size=1, stride=1) 23 | 24 | self.norm = norm 25 | if self.norm: 26 | self.bn = nn.BatchNorm2d(num_features=out_channels, momentum=0.01, eps=1e-3) 27 | 28 | self.activation = activation 29 | if self.activation: 30 | self.swish = MemoryEfficientSwish() if not onnx_export else Swish() 31 | 32 | def forward(self, x): 33 | x = self.depthwise_conv(x) 34 | x = self.pointwise_conv(x) 35 | 36 | if self.norm: 37 | x = self.bn(x) 38 | 39 | if self.activation: 40 | x = self.swish(x) 41 | 42 | return x 43 | 44 | class BiFPN(nn.Module): 45 | def __init__(self, num_channels, conv_channels, first_time=False, epsilon=1e-4, onnx_export=False, attention=True): 46 | super(BiFPN, self).__init__() 47 | self.epsilon = epsilon 48 | self.conv6_up = SeparableConvBlock(num_channels, onnx_export=onnx_export) 49 | self.conv5_up = SeparableConvBlock(num_channels, onnx_export=onnx_export) 50 | self.conv4_up = SeparableConvBlock(num_channels, onnx_export=onnx_export) 51 | self.conv3_up = SeparableConvBlock(num_channels, onnx_export=onnx_export) 52 | 53 | self.conv4_down = SeparableConvBlock(num_channels, onnx_export=onnx_export) 54 | self.conv5_down = SeparableConvBlock(num_channels, onnx_export=onnx_export) 55 | self.conv6_down = SeparableConvBlock(num_channels, onnx_export=onnx_export) 56 | self.conv7_down = SeparableConvBlock(num_channels, onnx_export=onnx_export) 57 | 58 | self.p6_upsample = nn.Upsample(scale_factor=2, mode='nearest') 59 | self.p5_upsample = nn.Upsample(scale_factor=2, mode='nearest') 60 | self.p4_upsample = nn.Upsample(scale_factor=2, mode='nearest') 61 | self.p3_upsample = nn.Upsample(scale_factor=2, mode='nearest') 62 | 63 | self.p4_downsample = MaxPool2dStaticSamePadding(3, 2) 64 | self.p5_downsample = MaxPool2dStaticSamePadding(3, 2) 65 | self.p6_downsample = MaxPool2dStaticSamePadding(3, 2) 66 | self.p7_downsample = MaxPool2dStaticSamePadding(3, 2) 67 | 68 | self.swish = MemoryEfficientSwish() if not onnx_export else Swish() 69 | 70 | self.first_time = first_time 71 | if self.first_time: 72 | # 获取到了efficientnet的最后三层,对其进行通道的下压缩 73 | self.p5_down_channel = nn.Sequential( 74 | Conv2dStaticSamePadding(conv_channels[2], num_channels, 1), 75 | nn.BatchNorm2d(num_channels, momentum=0.01, eps=1e-3), 76 | ) 77 | self.p4_down_channel = nn.Sequential( 78 | Conv2dStaticSamePadding(conv_channels[1], num_channels, 1), 79 | nn.BatchNorm2d(num_channels, momentum=0.01, eps=1e-3), 80 | ) 81 | self.p3_down_channel = nn.Sequential( 82 | Conv2dStaticSamePadding(conv_channels[0], num_channels, 1), 83 | nn.BatchNorm2d(num_channels, momentum=0.01, eps=1e-3), 84 | ) 85 | 86 | # 对输入进来的p5进行宽高的下采样 87 | self.p5_to_p6 = nn.Sequential( 88 | Conv2dStaticSamePadding(conv_channels[2], num_channels, 1), 89 | nn.BatchNorm2d(num_channels, momentum=0.01, eps=1e-3), 90 | MaxPool2dStaticSamePadding(3, 2) 91 | ) 92 | self.p6_to_p7 = nn.Sequential( 93 | MaxPool2dStaticSamePadding(3, 2) 94 | ) 95 | 96 | # BIFPN第一轮的时候,跳线那里并不是同一个in 97 | self.p4_down_channel_2 = nn.Sequential( 98 | Conv2dStaticSamePadding(conv_channels[1], num_channels, 1), 99 | nn.BatchNorm2d(num_channels, momentum=0.01, eps=1e-3), 100 | ) 101 | self.p5_down_channel_2 = nn.Sequential( 102 | Conv2dStaticSamePadding(conv_channels[2], num_channels, 1), 103 | nn.BatchNorm2d(num_channels, momentum=0.01, eps=1e-3), 104 | ) 105 | 106 | # 
简易注意力机制的weights 107 | self.p6_w1 = nn.Parameter(torch.ones(2, dtype=torch.float32), requires_grad=True) 108 | self.p6_w1_relu = nn.ReLU() 109 | self.p5_w1 = nn.Parameter(torch.ones(2, dtype=torch.float32), requires_grad=True) 110 | self.p5_w1_relu = nn.ReLU() 111 | self.p4_w1 = nn.Parameter(torch.ones(2, dtype=torch.float32), requires_grad=True) 112 | self.p4_w1_relu = nn.ReLU() 113 | self.p3_w1 = nn.Parameter(torch.ones(2, dtype=torch.float32), requires_grad=True) 114 | self.p3_w1_relu = nn.ReLU() 115 | 116 | self.p4_w2 = nn.Parameter(torch.ones(3, dtype=torch.float32), requires_grad=True) 117 | self.p4_w2_relu = nn.ReLU() 118 | self.p5_w2 = nn.Parameter(torch.ones(3, dtype=torch.float32), requires_grad=True) 119 | self.p5_w2_relu = nn.ReLU() 120 | self.p6_w2 = nn.Parameter(torch.ones(3, dtype=torch.float32), requires_grad=True) 121 | self.p6_w2_relu = nn.ReLU() 122 | self.p7_w2 = nn.Parameter(torch.ones(2, dtype=torch.float32), requires_grad=True) 123 | self.p7_w2_relu = nn.ReLU() 124 | 125 | self.attention = attention 126 | 127 | def forward(self, inputs): 128 | """ bifpn模块结构示意图 129 | P7_0 -------------------------> P7_2 --------> 130 | |-------------| ↑ 131 | ↓ | 132 | P6_0 ---------> P6_1 ---------> P6_2 --------> 133 | |-------------|--------------↑ ↑ 134 | ↓ | 135 | P5_0 ---------> P5_1 ---------> P5_2 --------> 136 | |-------------|--------------↑ ↑ 137 | ↓ | 138 | P4_0 ---------> P4_1 ---------> P4_2 --------> 139 | |-------------|--------------↑ ↑ 140 | |--------------↓ | 141 | P3_0 -------------------------> P3_2 --------> 142 | """ 143 | if self.attention: 144 | p3_out, p4_out, p5_out, p6_out, p7_out = self._forward_fast_attention(inputs) 145 | else: 146 | p3_out, p4_out, p5_out, p6_out, p7_out = self._forward(inputs) 147 | 148 | return p3_out, p4_out, p5_out, p6_out, p7_out 149 | 150 | def _forward_fast_attention(self, inputs): 151 | #------------------------------------------------# 152 | # 当phi=1、2、3、4、5的时候使用fast_attention 153 | # 获得三个shape的有效特征层 154 | # 分别是C3 64, 64, 40 155 | # C4 32, 32, 112 156 | # C5 16, 16, 320 157 | #------------------------------------------------# 158 | if self.first_time: 159 | #------------------------------------------------------------------------# 160 | # 第一次BIFPN需要 下采样 与 调整通道 获得 p3_in p4_in p5_in p6_in p7_in 161 | #------------------------------------------------------------------------# 162 | p3, p4, p5 = inputs 163 | #-------------------------------------------# 164 | # 首先对通道数进行调整 165 | # C3 64, 64, 40 -> 64, 64, 64 166 | #-------------------------------------------# 167 | p3_in = self.p3_down_channel(p3) 168 | 169 | #-------------------------------------------# 170 | # 首先对通道数进行调整 171 | # C4 32, 32, 112 -> 32, 32, 64 172 | # -> 32, 32, 64 173 | #-------------------------------------------# 174 | p4_in_1 = self.p4_down_channel(p4) 175 | p4_in_2 = self.p4_down_channel_2(p4) 176 | 177 | #-------------------------------------------# 178 | # 首先对通道数进行调整 179 | # C5 16, 16, 320 -> 16, 16, 64 180 | # -> 16, 16, 64 181 | #-------------------------------------------# 182 | p5_in_1 = self.p5_down_channel(p5) 183 | p5_in_2 = self.p5_down_channel_2(p5) 184 | 185 | #-------------------------------------------# 186 | # 对C5进行下采样,调整通道数与宽高 187 | # C5 16, 16, 320 -> 8, 8, 64 188 | #-------------------------------------------# 189 | p6_in = self.p5_to_p6(p5) 190 | #-------------------------------------------# 191 | # 对P6_in进行下采样,调整宽高 192 | # P6_in 8, 8, 64 -> 4, 4, 64 193 | #-------------------------------------------# 194 | p7_in = self.p6_to_p7(p6_in) 195 | 196 
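            #------------------------------------------------------------------#
            # Fast normalized fusion, as used in every weighted sum below:
            #     O = sum_i( ReLU(w_i) / (eps + sum_j ReLU(w_j)) * I_i )
            # Each fusion node owns 2-3 learnable scalars; after ReLU and
            # normalization they act as non-negative mixing coefficients
            # for the incoming feature maps.
            #------------------------------------------------------------------#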
| # 简单的注意力机制,用于确定更关注p7_in还是p6_in 197 | p6_w1 = self.p6_w1_relu(self.p6_w1) 198 | weight = p6_w1 / (torch.sum(p6_w1, dim=0) + self.epsilon) 199 | p6_td= self.conv6_up(self.swish(weight[0] * p6_in + weight[1] * self.p6_upsample(p7_in))) 200 | 201 | # 简单的注意力机制,用于确定更关注p6_up还是p5_in 202 | p5_w1 = self.p5_w1_relu(self.p5_w1) 203 | weight = p5_w1 / (torch.sum(p5_w1, dim=0) + self.epsilon) 204 | p5_td= self.conv5_up(self.swish(weight[0] * p5_in_1 + weight[1] * self.p5_upsample(p6_td))) 205 | 206 | # 简单的注意力机制,用于确定更关注p5_up还是p4_in 207 | p4_w1 = self.p4_w1_relu(self.p4_w1) 208 | weight = p4_w1 / (torch.sum(p4_w1, dim=0) + self.epsilon) 209 | p4_td= self.conv4_up(self.swish(weight[0] * p4_in_1 + weight[1] * self.p4_upsample(p5_td))) 210 | 211 | # 简单的注意力机制,用于确定更关注p4_up还是p3_in 212 | p3_w1 = self.p3_w1_relu(self.p3_w1) 213 | weight = p3_w1 / (torch.sum(p3_w1, dim=0) + self.epsilon) 214 | p3_out = self.conv3_up(self.swish(weight[0] * p3_in + weight[1] * self.p3_upsample(p4_td))) 215 | 216 | # 简单的注意力机制,用于确定更关注p4_in_2还是p4_up还是p3_out 217 | p4_w2 = self.p4_w2_relu(self.p4_w2) 218 | weight = p4_w2 / (torch.sum(p4_w2, dim=0) + self.epsilon) 219 | p4_out = self.conv4_down( 220 | self.swish(weight[0] * p4_in_2 + weight[1] * p4_td+ weight[2] * self.p4_downsample(p3_out))) 221 | 222 | # 简单的注意力机制,用于确定更关注p5_in_2还是p5_up还是p4_out 223 | p5_w2 = self.p5_w2_relu(self.p5_w2) 224 | weight = p5_w2 / (torch.sum(p5_w2, dim=0) + self.epsilon) 225 | p5_out = self.conv5_down( 226 | self.swish(weight[0] * p5_in_2 + weight[1] * p5_td+ weight[2] * self.p5_downsample(p4_out))) 227 | 228 | # 简单的注意力机制,用于确定更关注p6_in还是p6_up还是p5_out 229 | p6_w2 = self.p6_w2_relu(self.p6_w2) 230 | weight = p6_w2 / (torch.sum(p6_w2, dim=0) + self.epsilon) 231 | p6_out = self.conv6_down( 232 | self.swish(weight[0] * p6_in + weight[1] * p6_td+ weight[2] * self.p6_downsample(p5_out))) 233 | 234 | # 简单的注意力机制,用于确定更关注p7_in还是p7_up还是p6_out 235 | p7_w2 = self.p7_w2_relu(self.p7_w2) 236 | weight = p7_w2 / (torch.sum(p7_w2, dim=0) + self.epsilon) 237 | p7_out = self.conv7_down(self.swish(weight[0] * p7_in + weight[1] * self.p7_downsample(p6_out))) 238 | else: 239 | p3_in, p4_in, p5_in, p6_in, p7_in = inputs 240 | 241 | # 简单的注意力机制,用于确定更关注p7_in还是p6_in 242 | p6_w1 = self.p6_w1_relu(self.p6_w1) 243 | weight = p6_w1 / (torch.sum(p6_w1, dim=0) + self.epsilon) 244 | p6_td= self.conv6_up(self.swish(weight[0] * p6_in + weight[1] * self.p6_upsample(p7_in))) 245 | 246 | # 简单的注意力机制,用于确定更关注p6_up还是p5_in 247 | p5_w1 = self.p5_w1_relu(self.p5_w1) 248 | weight = p5_w1 / (torch.sum(p5_w1, dim=0) + self.epsilon) 249 | p5_td= self.conv5_up(self.swish(weight[0] * p5_in + weight[1] * self.p5_upsample(p6_td))) 250 | 251 | # 简单的注意力机制,用于确定更关注p5_up还是p4_in 252 | p4_w1 = self.p4_w1_relu(self.p4_w1) 253 | weight = p4_w1 / (torch.sum(p4_w1, dim=0) + self.epsilon) 254 | p4_td= self.conv4_up(self.swish(weight[0] * p4_in + weight[1] * self.p4_upsample(p5_td))) 255 | 256 | # 简单的注意力机制,用于确定更关注p4_up还是p3_in 257 | p3_w1 = self.p3_w1_relu(self.p3_w1) 258 | weight = p3_w1 / (torch.sum(p3_w1, dim=0) + self.epsilon) 259 | p3_out = self.conv3_up(self.swish(weight[0] * p3_in + weight[1] * self.p3_upsample(p4_td))) 260 | 261 | # 简单的注意力机制,用于确定更关注p4_in还是p4_up还是p3_out 262 | p4_w2 = self.p4_w2_relu(self.p4_w2) 263 | weight = p4_w2 / (torch.sum(p4_w2, dim=0) + self.epsilon) 264 | p4_out = self.conv4_down( 265 | self.swish(weight[0] * p4_in + weight[1] * p4_td+ weight[2] * self.p4_downsample(p3_out))) 266 | 267 | # 简单的注意力机制,用于确定更关注p5_in还是p5_up还是p4_out 268 | p5_w2 = self.p5_w2_relu(self.p5_w2) 269 | weight = p5_w2 / 
(torch.sum(p5_w2, dim=0) + self.epsilon) 270 | p5_out = self.conv5_down( 271 | self.swish(weight[0] * p5_in + weight[1] * p5_td+ weight[2] * self.p5_downsample(p4_out))) 272 | 273 | # 简单的注意力机制,用于确定更关注p6_in还是p6_up还是p5_out 274 | p6_w2 = self.p6_w2_relu(self.p6_w2) 275 | weight = p6_w2 / (torch.sum(p6_w2, dim=0) + self.epsilon) 276 | p6_out = self.conv6_down( 277 | self.swish(weight[0] * p6_in + weight[1] * p6_td+ weight[2] * self.p6_downsample(p5_out))) 278 | 279 | # 简单的注意力机制,用于确定更关注p7_in还是p7_up还是p6_out 280 | p7_w2 = self.p7_w2_relu(self.p7_w2) 281 | weight = p7_w2 / (torch.sum(p7_w2, dim=0) + self.epsilon) 282 | p7_out = self.conv7_down(self.swish(weight[0] * p7_in + weight[1] * self.p7_downsample(p6_out))) 283 | 284 | return p3_out, p4_out, p5_out, p6_out, p7_out 285 | 286 | def _forward(self, inputs): 287 | # 当phi=6、7的时候使用_forward 288 | if self.first_time: 289 | # 第一次BIFPN需要下采样与降通道获得 290 | # p3_in p4_in p5_in p6_in p7_in 291 | p3, p4, p5 = inputs 292 | p3_in = self.p3_down_channel(p3) 293 | p4_in_1 = self.p4_down_channel(p4) 294 | p4_in_2 = self.p4_down_channel_2(p4) 295 | p5_in_1 = self.p5_down_channel(p5) 296 | p5_in_2 = self.p5_down_channel_2(p5) 297 | p6_in = self.p5_to_p6(p5) 298 | p7_in = self.p6_to_p7(p6_in) 299 | 300 | p6_td= self.conv6_up(self.swish(p6_in + self.p6_upsample(p7_in))) 301 | 302 | p5_td= self.conv5_up(self.swish(p5_in_1 + self.p5_upsample(p6_td))) 303 | 304 | p4_td= self.conv4_up(self.swish(p4_in_1 + self.p4_upsample(p5_td))) 305 | 306 | p3_out = self.conv3_up(self.swish(p3_in + self.p3_upsample(p4_td))) 307 | 308 | p4_out = self.conv4_down( 309 | self.swish(p4_in_2 + p4_td+ self.p4_downsample(p3_out))) 310 | 311 | p5_out = self.conv5_down( 312 | self.swish(p5_in_2 + p5_td+ self.p5_downsample(p4_out))) 313 | 314 | p6_out = self.conv6_down( 315 | self.swish(p6_in + p6_td+ self.p6_downsample(p5_out))) 316 | 317 | p7_out = self.conv7_down(self.swish(p7_in + self.p7_downsample(p6_out))) 318 | 319 | else: 320 | p3_in, p4_in, p5_in, p6_in, p7_in = inputs 321 | 322 | p6_td= self.conv6_up(self.swish(p6_in + self.p6_upsample(p7_in))) 323 | 324 | p5_td= self.conv5_up(self.swish(p5_in + self.p5_upsample(p6_td))) 325 | 326 | p4_td= self.conv4_up(self.swish(p4_in + self.p4_upsample(p5_td))) 327 | 328 | p3_out = self.conv3_up(self.swish(p3_in + self.p3_upsample(p4_td))) 329 | 330 | p4_out = self.conv4_down( 331 | self.swish(p4_in + p4_td+ self.p4_downsample(p3_out))) 332 | 333 | p5_out = self.conv5_down( 334 | self.swish(p5_in + p5_td+ self.p5_downsample(p4_out))) 335 | 336 | p6_out = self.conv6_down( 337 | self.swish(p6_in + p6_td+ self.p6_downsample(p5_out))) 338 | 339 | p7_out = self.conv7_down(self.swish(p7_in + self.p7_downsample(p6_out))) 340 | 341 | return p3_out, p4_out, p5_out, p6_out, p7_out 342 | 343 | class BoxNet(nn.Module): 344 | def __init__(self, in_channels, num_anchors, num_layers, onnx_export=False): 345 | super(BoxNet, self).__init__() 346 | self.num_layers = num_layers 347 | 348 | self.conv_list = nn.ModuleList( 349 | [SeparableConvBlock(in_channels, in_channels, norm=False, activation=False) for i in range(num_layers)]) 350 | # 每一个有效特征层对应的Batchnor不同 351 | self.bn_list = nn.ModuleList( 352 | [nn.ModuleList([nn.BatchNorm2d(in_channels, momentum=0.01, eps=1e-3) for i in range(num_layers)]) for j in range(5)]) 353 | # 9 354 | # 4 中心,宽高 355 | self.header = SeparableConvBlock(in_channels, num_anchors * 4, norm=False, activation=False) 356 | self.swish = MemoryEfficientSwish() if not onnx_export else Swish() 357 | 358 | def forward(self, inputs): 359 | feats = [] 
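        # Each level below is reshaped to [batch, H*W*num_anchors, 4] and the
        # five levels are concatenated; for d0 (512x512 input) that is
        # (64^2 + 32^2 + 16^2 + 8^2 + 4^2) * 9 = 49104 anchors in total.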
360 | # 对每个特征层循环 361 | for feat, bn_list in zip(inputs, self.bn_list): 362 | # 每个特征层需要进行num_layer次卷积+标准化+激活函数 363 | for i, bn, conv in zip(range(self.num_layers), bn_list, self.conv_list): 364 | feat = conv(feat) 365 | feat = bn(feat) 366 | feat = self.swish(feat) 367 | feat = self.header(feat) 368 | 369 | feat = feat.permute(0, 2, 3, 1) 370 | feat = feat.contiguous().view(feat.shape[0], -1, 4) 371 | 372 | feats.append(feat) 373 | # 进行一个堆叠 374 | feats = torch.cat(feats, dim=1) 375 | 376 | return feats 377 | 378 | class ClassNet(nn.Module): 379 | def __init__(self, in_channels, num_anchors, num_classes, num_layers, onnx_export=False): 380 | super(ClassNet, self).__init__() 381 | self.num_anchors = num_anchors 382 | self.num_classes = num_classes 383 | self.num_layers = num_layers 384 | self.conv_list = nn.ModuleList( 385 | [SeparableConvBlock(in_channels, in_channels, norm=False, activation=False) for i in range(num_layers)]) 386 | # 每一个有效特征层对应的BatchNorm2d不同 387 | self.bn_list = nn.ModuleList( 388 | [nn.ModuleList([nn.BatchNorm2d(in_channels, momentum=0.01, eps=1e-3) for i in range(num_layers)]) for j in range(5)]) 389 | # num_anchors = 9 390 | # num_anchors num_classes 391 | self.header = SeparableConvBlock(in_channels, num_anchors * num_classes, norm=False, activation=False) 392 | self.swish = MemoryEfficientSwish() if not onnx_export else Swish() 393 | 394 | def forward(self, inputs): 395 | feats = [] 396 | # 对每个特征层循环 397 | for feat, bn_list in zip(inputs, self.bn_list): 398 | for i, bn, conv in zip(range(self.num_layers), bn_list, self.conv_list): 399 | # 每个特征层需要进行num_layer次卷积+标准化+激活函数 400 | feat = conv(feat) 401 | feat = bn(feat) 402 | feat = self.swish(feat) 403 | feat = self.header(feat) 404 | 405 | feat = feat.permute(0, 2, 3, 1) 406 | feat = feat.contiguous().view(feat.shape[0], feat.shape[1], feat.shape[2], self.num_anchors, self.num_classes) 407 | feat = feat.contiguous().view(feat.shape[0], -1, self.num_classes) 408 | 409 | feats.append(feat) 410 | # 进行一个堆叠 411 | feats = torch.cat(feats, dim=1) 412 | # 取sigmoid表示概率 413 | feats = feats.sigmoid() 414 | 415 | return feats 416 | 417 | class EfficientNet(nn.Module): 418 | def __init__(self, phi, pretrained=False): 419 | super(EfficientNet, self).__init__() 420 | model = EffNet.from_pretrained(f'efficientnet-b{phi}', pretrained) 421 | del model._conv_head 422 | del model._bn1 423 | del model._avg_pooling 424 | del model._dropout 425 | del model._fc 426 | self.model = model 427 | 428 | def forward(self, x): 429 | x = self.model._conv_stem(x) 430 | x = self.model._bn0(x) 431 | x = self.model._swish(x) 432 | feature_maps = [] 433 | 434 | last_x = None 435 | for idx, block in enumerate(self.model._blocks): 436 | drop_connect_rate = self.model._global_params.drop_connect_rate 437 | if drop_connect_rate: 438 | drop_connect_rate *= float(idx) / len(self.model._blocks) 439 | x = block(x, drop_connect_rate=drop_connect_rate) 440 | #------------------------------------------------------# 441 | # 取出对应的特征层,如果某个EffcientBlock的步长为2的话 442 | # 意味着它的前一个特征层为有效特征层 443 | # 除此之外,最后一个EffcientBlock的输出为有效特征层 444 | #------------------------------------------------------# 445 | if block._depthwise_conv.stride == [2, 2]: 446 | feature_maps.append(last_x) 447 | elif idx == len(self.model._blocks) - 1: 448 | feature_maps.append(x) 449 | last_x = x 450 | del last_x 451 | return feature_maps[1:] 452 | 453 | class EfficientDetBackbone(nn.Module): 454 | def __init__(self, num_classes = 80, phi = 0, pretrained = False): 455 | super(EfficientDetBackbone, 
self).__init__() 456 | #--------------------------------# 457 | # phi指的是efficientdet的版本 458 | #--------------------------------# 459 | self.phi = phi 460 | #---------------------------------------------------# 461 | # backbone_phi指的是该efficientdet对应的efficient 462 | #---------------------------------------------------# 463 | self.backbone_phi = [0, 1, 2, 3, 4, 5, 6, 6] 464 | #--------------------------------# 465 | # BiFPN所用的通道数 466 | #--------------------------------# 467 | self.fpn_num_filters = [64, 88, 112, 160, 224, 288, 384, 384] 468 | #--------------------------------# 469 | # BiFPN的重复次数 470 | #--------------------------------# 471 | self.fpn_cell_repeats = [3, 4, 5, 6, 7, 7, 8, 8] 472 | #---------------------------------------------------# 473 | # Effcient Head卷积重复次数 474 | #---------------------------------------------------# 475 | self.box_class_repeats = [3, 3, 3, 4, 4, 4, 5, 5] 476 | #---------------------------------------------------# 477 | # 基础的先验框大小 478 | #---------------------------------------------------# 479 | self.anchor_scale = [4., 4., 4., 4., 4., 4., 4., 5.] 480 | num_anchors = 9 481 | 482 | conv_channel_coef = { 483 | 0: [40, 112, 320], 484 | 1: [40, 112, 320], 485 | 2: [48, 120, 352], 486 | 3: [48, 136, 384], 487 | 4: [56, 160, 448], 488 | 5: [64, 176, 512], 489 | 6: [72, 200, 576], 490 | 7: [72, 200, 576], 491 | } 492 | 493 | #------------------------------------------------------# 494 | # 在经过多次BiFPN模块的堆叠后,我们获得的fpn_features 495 | # 假设我们使用的是efficientdet-D0包括五个有效特征层: 496 | # P3_out 64,64,64 497 | # P4_out 32,32,64 498 | # P5_out 16,16,64 499 | # P6_out 8,8,64 500 | # P7_out 4,4,64 501 | #------------------------------------------------------# 502 | self.bifpn = nn.Sequential( 503 | *[BiFPN(self.fpn_num_filters[self.phi], 504 | conv_channel_coef[phi], 505 | True if _ == 0 else False, 506 | attention=True if phi < 6 else False) 507 | for _ in range(self.fpn_cell_repeats[phi])]) 508 | 509 | self.num_classes = num_classes 510 | #------------------------------------------------------# 511 | # 创建efficient head 512 | # 可以将特征层转换成预测结果 513 | #------------------------------------------------------# 514 | self.regressor = BoxNet(in_channels=self.fpn_num_filters[self.phi], num_anchors=num_anchors, 515 | num_layers=self.box_class_repeats[self.phi]) 516 | 517 | self.classifier = ClassNet(in_channels=self.fpn_num_filters[self.phi], num_anchors=num_anchors, 518 | num_classes=num_classes, num_layers=self.box_class_repeats[self.phi]) 519 | 520 | self.anchors = Anchors(anchor_scale=self.anchor_scale[phi]) 521 | 522 | #-------------------------------------------# 523 | # 获得三个shape的有效特征层 524 | # 分别是C3 64, 64, 40 525 | # C4 32, 32, 112 526 | # C5 16, 16, 320 527 | #-------------------------------------------# 528 | self.backbone_net = EfficientNet(self.backbone_phi[phi], pretrained) 529 | 530 | def freeze_bn(self): 531 | for m in self.modules(): 532 | if isinstance(m, nn.BatchNorm2d): 533 | m.eval() 534 | 535 | def forward(self, inputs): 536 | _, p3, p4, p5 = self.backbone_net(inputs) 537 | 538 | features = (p3, p4, p5) 539 | features = self.bifpn(features) 540 | 541 | regression = self.regressor(features) 542 | classification = self.classifier(features) 543 | anchors = self.anchors(inputs) 544 | 545 | return features, regression, classification, anchors 546 | 547 | -------------------------------------------------------------------------------- /nets/efficientdet_training.py: -------------------------------------------------------------------------------- 1 | import math 2 | from functools 
import partial 3 | 4 | import torch 5 | import torch.nn as nn 6 | 7 | 8 | def calc_iou(a, b): 9 | max_length = torch.max(a) 10 | a = a / max_length 11 | b = b / max_length 12 | 13 | area = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1]) 14 | iw = torch.min(torch.unsqueeze(a[:, 3], dim=1), b[:, 2]) - torch.max(torch.unsqueeze(a[:, 1], 1), b[:, 0]) 15 | ih = torch.min(torch.unsqueeze(a[:, 2], dim=1), b[:, 3]) - torch.max(torch.unsqueeze(a[:, 0], 1), b[:, 1]) 16 | iw = torch.clamp(iw, min=0) 17 | ih = torch.clamp(ih, min=0) 18 | ua = torch.unsqueeze((a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1]), dim=1) + area - iw * ih 19 | ua = torch.clamp(ua, min=1e-8) 20 | intersection = iw * ih 21 | IoU = intersection / ua 22 | 23 | return IoU 24 | 25 | def get_target(anchor, bbox_annotation, classification, cuda): 26 | #------------------------------------------------------# 27 | # 计算真实框和先验框的交并比 28 | # anchor num_anchors, 4 29 | # bbox_annotation num_true_boxes, 5 30 | # Iou num_anchors, num_true_boxes 31 | #------------------------------------------------------# 32 | IoU = calc_iou(anchor[:, :], bbox_annotation[:, :4]) 33 | 34 | #------------------------------------------------------# 35 | # 计算与先验框重合度最大的真实框 36 | # IoU_max num_anchors, 37 | # IoU_argmax num_anchors, 38 | #------------------------------------------------------# 39 | IoU_max, IoU_argmax = torch.max(IoU, dim=1) 40 | 41 | #------------------------------------------------------# 42 | # 寻找哪些先验框在计算loss的时候需要忽略 43 | #------------------------------------------------------# 44 | targets = torch.ones_like(classification) * -1 45 | targets = targets.type_as(classification) 46 | 47 | #------------------------------------------# 48 | # 重合度小于0.4需要参与训练 49 | #------------------------------------------# 50 | targets[torch.lt(IoU_max, 0.4), :] = 0 51 | 52 | #--------------------------------------------------# 53 | # 重合度大于0.5需要参与训练,还需要计算回归loss 54 | #--------------------------------------------------# 55 | positive_indices = torch.ge(IoU_max, 0.5) 56 | 57 | #--------------------------------------------------# 58 | # 取出每个先验框最对应的真实框 59 | #--------------------------------------------------# 60 | assigned_annotations = bbox_annotation[IoU_argmax, :] 61 | 62 | #--------------------------------------------------# 63 | # 将对应的种类置为1 64 | #--------------------------------------------------# 65 | targets[positive_indices, :] = 0 66 | targets[positive_indices, assigned_annotations[positive_indices, 4].long()] = 1 67 | #--------------------------------------------------# 68 | # 计算正样本数量 69 | #--------------------------------------------------# 70 | num_positive_anchors = positive_indices.sum() 71 | return targets, num_positive_anchors, positive_indices, assigned_annotations 72 | 73 | def encode_bbox(assigned_annotations, positive_indices, anchor_widths, anchor_heights, anchor_ctr_x, anchor_ctr_y): 74 | #--------------------------------------------------# 75 | # 取出作为正样本的先验框对应的真实框 76 | #--------------------------------------------------# 77 | assigned_annotations = assigned_annotations[positive_indices, :] 78 | 79 | #--------------------------------------------------# 80 | # 取出作为正样本的先验框 81 | #--------------------------------------------------# 82 | anchor_widths_pi = anchor_widths[positive_indices] 83 | anchor_heights_pi = anchor_heights[positive_indices] 84 | anchor_ctr_x_pi = anchor_ctr_x[positive_indices] 85 | anchor_ctr_y_pi = anchor_ctr_y[positive_indices] 86 | 87 | #--------------------------------------------------# 88 | # 计算真实框的宽高与中心 89 | 
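# (added sketch) a worked instance of the encoding below, with made-up numbers:
# for an anchor of width 64 centered at x = 32 and an assigned ground-truth
# box of width 32 centered at x = 40, the regression targets come out as
#     targets_dx = (40 - 32) / 64 = 0.125
#     targets_dw = log(32 / 64) ≈ -0.693
# i.e. center offsets are normalized by the anchor size and the width/height
# ratio is log-encoded; these are the values the BoxNet head learns to predict.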
#--------------------------------------------------# 90 | gt_widths = assigned_annotations[:, 2] - assigned_annotations[:, 0] 91 | gt_heights = assigned_annotations[:, 3] - assigned_annotations[:, 1] 92 | gt_ctr_x = assigned_annotations[:, 0] + 0.5 * gt_widths 93 | gt_ctr_y = assigned_annotations[:, 1] + 0.5 * gt_heights 94 | 95 | gt_widths = torch.clamp(gt_widths, min=1) 96 | gt_heights = torch.clamp(gt_heights, min=1) 97 | 98 | #---------------------------------------------------# 99 | # 利用真实框和先验框进行编码,获得应该有的预测结果 100 | #---------------------------------------------------# 101 | targets_dx = (gt_ctr_x - anchor_ctr_x_pi) / anchor_widths_pi 102 | targets_dy = (gt_ctr_y - anchor_ctr_y_pi) / anchor_heights_pi 103 | targets_dw = torch.log(gt_widths / anchor_widths_pi) 104 | targets_dh = torch.log(gt_heights / anchor_heights_pi) 105 | 106 | targets = torch.stack((targets_dy, targets_dx, targets_dh, targets_dw)) 107 | targets = targets.t() 108 | return targets 109 | 110 | class FocalLoss(nn.Module): 111 | def __init__(self): 112 | super(FocalLoss, self).__init__() 113 | 114 | def forward(self, classifications, regressions, anchors, annotations, alpha = 0.25, gamma = 2.0, cuda = True): 115 | #---------------------------# 116 | # 获得batch_size的大小 117 | #---------------------------# 118 | batch_size = classifications.shape[0] 119 | 120 | #--------------------------------------------# 121 | # 获得先验框,将先验框转换成中心宽高的形式 122 | #--------------------------------------------# 123 | dtype = regressions.dtype 124 | anchor = anchors[0, :, :].to(dtype) 125 | #--------------------------------------------# 126 | # 将先验框转换成中心,宽高的形式 127 | #--------------------------------------------# 128 | anchor_widths = anchor[:, 3] - anchor[:, 1] 129 | anchor_heights = anchor[:, 2] - anchor[:, 0] 130 | anchor_ctr_x = anchor[:, 1] + 0.5 * anchor_widths 131 | anchor_ctr_y = anchor[:, 0] + 0.5 * anchor_heights 132 | 133 | regression_losses = [] 134 | classification_losses = [] 135 | for j in range(batch_size): 136 | #-------------------------------------------------------# 137 | # 取出每张图片对应的真实框、种类预测结果和回归预测结果 138 | #-------------------------------------------------------# 139 | bbox_annotation = annotations[j] 140 | classification = classifications[j, :, :] 141 | regression = regressions[j, :, :] 142 | 143 | classification = torch.clamp(classification, 5e-4, 1.0 - 5e-4) 144 | 145 | if len(bbox_annotation) == 0: 146 | #-------------------------------------------------------# 147 | # 当图片中不存在真实框的时候,所有特征点均为负样本 148 | #-------------------------------------------------------# 149 | alpha_factor = torch.ones_like(classification) * alpha 150 | alpha_factor = alpha_factor.type_as(classification) 151 | 152 | alpha_factor = 1. 
- alpha_factor 153 | focal_weight = classification 154 | focal_weight = alpha_factor * torch.pow(focal_weight, gamma) 155 | 156 | #-------------------------------------------------------# 157 | # 计算特征点对应的交叉熵 158 | #-------------------------------------------------------# 159 | bce = - (torch.log(1.0 - classification)) 160 | 161 | cls_loss = focal_weight * bce 162 | 163 | classification_losses.append(cls_loss.sum()) 164 | #-------------------------------------------------------# 165 | # 回归损失此时为0 166 | #-------------------------------------------------------# 167 | regression_losses.append(torch.tensor(0).type_as(classification)) 168 | 169 | continue 170 | 171 | #------------------------------------------------------# 172 | # 计算真实框和先验框的交并比 173 | # targets num_anchors, num_classes 174 | # num_positive_anchors 正样本的数量 175 | # positive_indices num_anchors, 176 | # assigned_annotations num_anchors, 5 177 | #------------------------------------------------------# 178 | targets, num_positive_anchors, positive_indices, assigned_annotations = get_target(anchor, 179 | bbox_annotation, classification, cuda) 180 | 181 | #------------------------------------------------------# 182 | # 首先计算交叉熵loss 183 | #------------------------------------------------------# 184 | alpha_factor = torch.ones_like(targets) * alpha 185 | alpha_factor = alpha_factor.type_as(classification) 186 | #------------------------------------------------------# 187 | # 这里使用的是Focal loss的思想, 188 | # 易分类样本权值小 189 | # 难分类样本权值大 190 | #------------------------------------------------------# 191 | alpha_factor = torch.where(torch.eq(targets, 1.), alpha_factor, 1. - alpha_factor) 192 | focal_weight = torch.where(torch.eq(targets, 1.), 1. - classification, classification) 193 | focal_weight = alpha_factor * torch.pow(focal_weight, gamma) 194 | 195 | bce = - (targets * torch.log(classification) + (1.0 - targets) * torch.log(1.0 - classification)) 196 | cls_loss = focal_weight * bce 197 | 198 | #------------------------------------------------------# 199 | # 把忽略的先验框的loss置为0 200 | #------------------------------------------------------# 201 | zeros = torch.zeros_like(cls_loss) 202 | zeros = zeros.type_as(cls_loss) 203 | cls_loss = torch.where(torch.ne(targets, -1.0), cls_loss, zeros) 204 | 205 | classification_losses.append(cls_loss.sum() / torch.clamp(num_positive_anchors.to(dtype), min=1.0)) 206 | 207 | #------------------------------------------------------# 208 | # 如果存在先验框为正样本的话 209 | #------------------------------------------------------# 210 | if positive_indices.sum() > 0: 211 | targets = encode_bbox(assigned_annotations, positive_indices, anchor_widths, anchor_heights, anchor_ctr_x, anchor_ctr_y) 212 | #---------------------------------------------------# 213 | # 将网络应该有的预测结果和实际的预测结果进行比较 214 | # 计算smooth l1 loss 215 | #---------------------------------------------------# 216 | regression_diff = torch.abs(targets - regression[positive_indices, :]) 217 | regression_loss = torch.where( 218 | torch.le(regression_diff, 1.0 / 9.0), 219 | 0.5 * 9.0 * torch.pow(regression_diff, 2), 220 | regression_diff - 0.5 / 9.0 221 | ) 222 | regression_losses.append(regression_loss.mean()) 223 | else: 224 | regression_losses.append(torch.tensor(0).type_as(classification)) 225 | 226 | # 计算平均loss并返回 227 | c_loss = torch.stack(classification_losses).mean() 228 | r_loss = torch.stack(regression_losses).mean() 229 | loss = c_loss + r_loss 230 | return loss, c_loss, r_loss 231 | 232 | def get_lr_scheduler(lr_decay_type, lr, min_lr, total_iters, warmup_iters_ratio = 
0.05, warmup_lr_ratio = 0.1, no_aug_iter_ratio = 0.05, step_num = 10): 233 | def yolox_warm_cos_lr(lr, min_lr, total_iters, warmup_total_iters, warmup_lr_start, no_aug_iter, iters): 234 | if iters <= warmup_total_iters: 235 | # lr = (lr - warmup_lr_start) * iters / float(warmup_total_iters) + warmup_lr_start 236 | lr = (lr - warmup_lr_start) * pow(iters / float(warmup_total_iters), 2) + warmup_lr_start 237 | elif iters >= total_iters - no_aug_iter: 238 | lr = min_lr 239 | else: 240 | lr = min_lr + 0.5 * (lr - min_lr) * ( 241 | 1.0 + math.cos(math.pi * (iters - warmup_total_iters) / (total_iters - warmup_total_iters - no_aug_iter)) 242 | ) 243 | return lr 244 | 245 | def step_lr(lr, decay_rate, step_size, iters): 246 | if step_size < 1: 247 | raise ValueError("step_size must be at least 1.") 248 | n = iters // step_size 249 | out_lr = lr * decay_rate ** n 250 | return out_lr 251 | 252 | if lr_decay_type == "cos": 253 | warmup_total_iters = min(max(warmup_iters_ratio * total_iters, 1), 3) 254 | warmup_lr_start = max(warmup_lr_ratio * lr, 1e-6) 255 | no_aug_iter = min(max(no_aug_iter_ratio * total_iters, 1), 15) 256 | func = partial(yolox_warm_cos_lr, lr, min_lr, total_iters, warmup_total_iters, warmup_lr_start, no_aug_iter) 257 | else: 258 | decay_rate = (min_lr / lr) ** (1 / (step_num - 1)) 259 | step_size = total_iters / step_num 260 | func = partial(step_lr, lr, decay_rate, step_size) 261 | 262 | return func 263 | 264 | def set_optimizer_lr(optimizer, lr_scheduler_func, epoch): 265 | lr = lr_scheduler_func(epoch) 266 | for param_group in optimizer.param_groups: 267 | param_group['lr'] = lr 268 | -------------------------------------------------------------------------------- /nets/efficientnet.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch import nn 3 | from torch.nn import functional as F 4 | 5 | from nets.layers import (MemoryEfficientSwish, Swish, drop_connect, 6 | efficientnet_params, get_model_params, 7 | get_same_padding_conv2d, load_pretrained_weights, 8 | round_filters, round_repeats) 9 | 10 | 11 | class MBConvBlock(nn.Module): 12 | ''' 13 | EfficientNet-b0: 14 | [BlockArgs(kernel_size=3, num_repeat=1, input_filters=32, output_filters=16, expand_ratio=1, id_skip=True, stride=[1], se_ratio=0.25), 15 | BlockArgs(kernel_size=3, num_repeat=2, input_filters=16, output_filters=24, expand_ratio=6, id_skip=True, stride=[2], se_ratio=0.25), 16 | BlockArgs(kernel_size=5, num_repeat=2, input_filters=24, output_filters=40, expand_ratio=6, id_skip=True, stride=[2], se_ratio=0.25), 17 | BlockArgs(kernel_size=3, num_repeat=3, input_filters=40, output_filters=80, expand_ratio=6, id_skip=True, stride=[2], se_ratio=0.25), 18 | BlockArgs(kernel_size=5, num_repeat=3, input_filters=80, output_filters=112, expand_ratio=6, id_skip=True, stride=[1], se_ratio=0.25), 19 | BlockArgs(kernel_size=5, num_repeat=4, input_filters=112, output_filters=192, expand_ratio=6, id_skip=True, stride=[2], se_ratio=0.25), 20 | BlockArgs(kernel_size=3, num_repeat=1, input_filters=192, output_filters=320, expand_ratio=6, id_skip=True, stride=[1], se_ratio=0.25)] 21 | 22 | GlobalParams(batch_norm_momentum=0.99, batch_norm_epsilon=0.001, dropout_rate=0.2, num_classes=1000, width_coefficient=1.0, 23 | depth_coefficient=1.0, depth_divisor=8, min_depth=None, drop_connect_rate=0.2, image_size=224) 24 | ''' 25 | def __init__(self, block_args, global_params): 26 | super().__init__() 27 | self._block_args = block_args 28 | # pick a conv implementation (static or dynamic same-padding) 29 | Conv2d =
get_same_padding_conv2d(image_size=global_params.image_size) 30 | 31 | # 获得标准化的参数 32 | self._bn_mom = 1 - global_params.batch_norm_momentum 33 | self._bn_eps = global_params.batch_norm_epsilon 34 | 35 | #----------------------------# 36 | # 计算是否施加注意力机制 37 | #----------------------------# 38 | self.has_se = (self._block_args.se_ratio is not None) and (0 < self._block_args.se_ratio <= 1) 39 | #----------------------------# 40 | # 判断是否添加残差边 41 | #----------------------------# 42 | self.id_skip = block_args.id_skip 43 | 44 | #-------------------------------------------------# 45 | # 利用Inverted residuals 46 | # part1 利用1x1卷积进行通道数上升 47 | #-------------------------------------------------# 48 | inp = self._block_args.input_filters 49 | oup = self._block_args.input_filters * self._block_args.expand_ratio 50 | if self._block_args.expand_ratio != 1: 51 | self._expand_conv = Conv2d(in_channels=inp, out_channels=oup, kernel_size=1, bias=False) 52 | self._bn0 = nn.BatchNorm2d(num_features=oup, momentum=self._bn_mom, eps=self._bn_eps) 53 | 54 | #------------------------------------------------------# 55 | # 如果步长为2x2的话,利用深度可分离卷积进行高宽压缩 56 | # part2 利用3x3卷积对每一个channel进行卷积 57 | #------------------------------------------------------# 58 | k = self._block_args.kernel_size 59 | s = self._block_args.stride 60 | self._depthwise_conv = Conv2d(in_channels=oup, out_channels=oup, groups=oup, kernel_size=k, stride=s, bias=False) 61 | self._bn1 = nn.BatchNorm2d(num_features=oup, momentum=self._bn_mom, eps=self._bn_eps) 62 | 63 | #------------------------------------------------------# 64 | # 完成深度可分离卷积后 65 | # 对深度可分离卷积的结果施加注意力机制 66 | #------------------------------------------------------# 67 | if self.has_se: 68 | num_squeezed_channels = max(1, int(self._block_args.input_filters * self._block_args.se_ratio)) 69 | #------------------------------------------------------# 70 | # 通道先压缩后上升,最后利用sigmoid将值固定到0-1之间 71 | #------------------------------------------------------# 72 | self._se_reduce = Conv2d(in_channels=oup, out_channels=num_squeezed_channels, kernel_size=1) 73 | self._se_expand = Conv2d(in_channels=num_squeezed_channels, out_channels=oup, kernel_size=1) 74 | 75 | #------------------------------------------------------# 76 | # part3 利用1x1卷积进行通道下降 77 | #------------------------------------------------------# 78 | final_oup = self._block_args.output_filters 79 | self._project_conv = Conv2d(in_channels=oup, out_channels=final_oup, kernel_size=1, bias=False) 80 | self._bn2 = nn.BatchNorm2d(num_features=final_oup, momentum=self._bn_mom, eps=self._bn_eps) 81 | 82 | self._swish = MemoryEfficientSwish() 83 | 84 | def forward(self, inputs, drop_connect_rate=None): 85 | x = inputs 86 | #-------------------------------------------------# 87 | # 利用Inverted residuals 88 | # part1 利用1x1卷积进行通道数上升 89 | #-------------------------------------------------# 90 | if self._block_args.expand_ratio != 1: 91 | x = self._swish(self._bn0(self._expand_conv(inputs))) 92 | 93 | #------------------------------------------------------# 94 | # 如果步长为2x2的话,利用深度可分离卷积进行高宽压缩 95 | # part2 利用3x3卷积对每一个channel进行卷积 96 | #------------------------------------------------------# 97 | x = self._swish(self._bn1(self._depthwise_conv(x))) 98 | 99 | #------------------------------------------------------# 100 | # 完成深度可分离卷积后 101 | # 对深度可分离卷积的结果施加注意力机制 102 | #------------------------------------------------------# 103 | if self.has_se: 104 | x_squeezed = F.adaptive_avg_pool2d(x, 1) 105 | x_squeezed = self._se_expand( 106 | self._swish(self._se_reduce(x_squeezed))) 
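# (added note) squeeze-and-excitation: x_squeezed was average-pooled to
# (B, C, 1, 1), reduced to max(1, input_filters * se_ratio) channels and
# expanded back to C; the sigmoid below maps it into (0, 1) so it acts as
# a per-channel gate on the depthwise output x.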
107 | x = torch.sigmoid(x_squeezed) * x 108 | 109 | #------------------------------------------------------# 110 | # part3 利用1x1卷积进行通道下降 111 | #------------------------------------------------------# 112 | x = self._bn2(self._project_conv(x)) 113 | 114 | #------------------------------------------------------# 115 | # part4 如果满足残差条件,那么就增加残差边 116 | #------------------------------------------------------# 117 | input_filters, output_filters = self._block_args.input_filters, self._block_args.output_filters 118 | if self.id_skip and self._block_args.stride == 1 and input_filters == output_filters: 119 | if drop_connect_rate: 120 | x = drop_connect(x, p=drop_connect_rate, 121 | training=self.training) 122 | x = x + inputs # skip connection 123 | return x 124 | 125 | def set_swish(self, memory_efficient=True): 126 | """Sets swish function as memory efficient (for training) or standard (for export)""" 127 | self._swish = MemoryEfficientSwish() if memory_efficient else Swish() 128 | 129 | class EfficientNet(nn.Module): 130 | ''' 131 | EfficientNet-b0: 132 | [BlockArgs(kernel_size=3, num_repeat=1, input_filters=32, output_filters=16, expand_ratio=1, id_skip=True, stride=[1], se_ratio=0.25), 133 | BlockArgs(kernel_size=3, num_repeat=2, input_filters=16, output_filters=24, expand_ratio=6, id_skip=True, stride=[2], se_ratio=0.25), 134 | BlockArgs(kernel_size=5, num_repeat=2, input_filters=24, output_filters=40, expand_ratio=6, id_skip=True, stride=[2], se_ratio=0.25), 135 | BlockArgs(kernel_size=3, num_repeat=3, input_filters=40, output_filters=80, expand_ratio=6, id_skip=True, stride=[2], se_ratio=0.25), 136 | BlockArgs(kernel_size=5, num_repeat=3, input_filters=80, output_filters=112, expand_ratio=6, id_skip=True, stride=[1], se_ratio=0.25), 137 | BlockArgs(kernel_size=5, num_repeat=4, input_filters=112, output_filters=192, expand_ratio=6, id_skip=True, stride=[2], se_ratio=0.25), 138 | BlockArgs(kernel_size=3, num_repeat=1, input_filters=192, output_filters=320, expand_ratio=6, id_skip=True, stride=[1], se_ratio=0.25)] 139 | 140 | GlobalParams(batch_norm_momentum=0.99, batch_norm_epsilon=0.001, dropout_rate=0.2, num_classes=1000, width_coefficient=1.0, 141 | depth_coefficient=1.0, depth_divisor=8, min_depth=None, drop_connect_rate=0.2, image_size=224) 142 | ''' 143 | def __init__(self, blocks_args=None, global_params=None): 144 | super().__init__() 145 | assert isinstance(blocks_args, list), 'blocks_args should be a list' 146 | assert len(blocks_args) > 0, 'block args must be greater than 0' 147 | self._global_params = global_params 148 | self._blocks_args = blocks_args 149 | # 获得一种卷积方法 150 | Conv2d = get_same_padding_conv2d(image_size=global_params.image_size) 151 | 152 | # 获得标准化的参数 153 | bn_mom = 1 - self._global_params.batch_norm_momentum 154 | bn_eps = self._global_params.batch_norm_epsilon 155 | 156 | #-------------------------------------------------# 157 | # 网络主干部分开始 158 | # 设定输入进来的是RGB三通道图像 159 | # 利用round_filters可以使得通道可以被8整除 160 | #-------------------------------------------------# 161 | in_channels = 3 162 | out_channels = round_filters(32, self._global_params) 163 | 164 | #-------------------------------------------------# 165 | # 创建stem部分 166 | #-------------------------------------------------# 167 | self._conv_stem = Conv2d( 168 | in_channels, out_channels, kernel_size=3, stride=2, bias=False) 169 | self._bn0 = nn.BatchNorm2d( 170 | num_features=out_channels, momentum=bn_mom, eps=bn_eps) 171 | 172 | #-------------------------------------------------# 173 | # 在这个地方对大结构块进行循环 174 | 
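# (added note) for the b0 arguments in the class docstring this loop expands
# the seven stages into 1+2+2+3+3+4+1 = 16 MBConvBlocks; only the first block
# of each stage keeps the stage stride, the repeated blocks run with stride 1.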
#-------------------------------------------------# 175 | self._blocks = nn.ModuleList([]) 176 | for i in range(len(self._blocks_args)): 177 | #-------------------------------------------------------------# 178 | # 对每个block的参数进行修改,根据所选的efficient版本进行修改 179 | #-------------------------------------------------------------# 180 | self._blocks_args[i] = self._blocks_args[i]._replace( 181 | input_filters=round_filters(self._blocks_args[i].input_filters, self._global_params), 182 | output_filters=round_filters(self._blocks_args[i].output_filters, self._global_params), 183 | num_repeat=round_repeats(self._blocks_args[i].num_repeat, self._global_params) 184 | ) 185 | 186 | #-------------------------------------------------------------# 187 | # 每个大结构块里面的第一个EfficientBlock 188 | # 都需要考虑步长和输入通道数 189 | #-------------------------------------------------------------# 190 | self._blocks.append(MBConvBlock(self._blocks_args[i], self._global_params)) 191 | 192 | if self._blocks_args[i].num_repeat > 1: 193 | self._blocks_args[i] = self._blocks_args[i]._replace(input_filters=self._blocks_args[i].output_filters, stride=1) 194 | 195 | #---------------------------------------------------------------# 196 | # 在利用第一个EfficientBlock进行通道数的调整或者高和宽的压缩后 197 | # 进行EfficientBlock的堆叠 198 | #---------------------------------------------------------------# 199 | for _ in range(self._blocks_args[i].num_repeat - 1): 200 | self._blocks.append(MBConvBlock(self._blocks_args[i], self._global_params)) 201 | 202 | #----------------------------------------------------------------# 203 | # 这是efficientnet的尾部部分,在进行effcientdet构建的时候没用到 204 | # 只在利用efficientnet进行分类的时候用到。 205 | #----------------------------------------------------------------# 206 | in_channels = self._blocks_args[len(self._blocks_args)-1].output_filters 207 | out_channels = round_filters(1280, self._global_params) 208 | 209 | self._conv_head = Conv2d(in_channels, out_channels, kernel_size=1, bias=False) 210 | self._bn1 = nn.BatchNorm2d(num_features=out_channels, momentum=bn_mom, eps=bn_eps) 211 | 212 | self._avg_pooling = nn.AdaptiveAvgPool2d(1) 213 | self._dropout = nn.Dropout(self._global_params.dropout_rate) 214 | self._fc = nn.Linear(out_channels, self._global_params.num_classes) 215 | 216 | self._swish = MemoryEfficientSwish() 217 | 218 | def set_swish(self, memory_efficient=True): 219 | """Sets swish function as memory efficient (for training) or standard (for export)""" 220 | # swish函数 221 | self._swish = MemoryEfficientSwish() if memory_efficient else Swish() 222 | for block in self._blocks: 223 | block.set_swish(memory_efficient) 224 | 225 | def extract_features(self, inputs): 226 | """ Returns output of the final convolution layer """ 227 | 228 | # Stem 229 | x = self._swish(self._bn0(self._conv_stem(inputs))) 230 | 231 | # Blocks 232 | for idx, block in enumerate(self._blocks): 233 | drop_connect_rate = self._global_params.drop_connect_rate 234 | if drop_connect_rate: 235 | drop_connect_rate *= float(idx) / len(self._blocks) 236 | x = block(x, drop_connect_rate=drop_connect_rate) 237 | # Head 238 | x = self._swish(self._bn1(self._conv_head(x))) 239 | 240 | return x 241 | 242 | def forward(self, inputs): 243 | """ Calls extract_features to extract features, applies final linear layer, and returns logits. 
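A minimal usage sketch of this classification path (hedged; the names come
from this repo and the 224x224 input matches b0):

    blocks_args, global_params = get_model_params('efficientnet-b0', None)
    net = EfficientNet(blocks_args, global_params)
    logits = net(torch.randn(1, 3, 224, 224))   # -> shape (1, 1000)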
""" 244 | bs = inputs.size(0) 245 | # Convolution layers 246 | x = self.extract_features(inputs) 247 | 248 | # Pooling and final linear layer 249 | x = self._avg_pooling(x) 250 | x = x.view(bs, -1) 251 | x = self._dropout(x) 252 | x = self._fc(x) 253 | return x 254 | 255 | @classmethod 256 | def from_name(cls, model_name, override_params=None): 257 | cls._check_model_name_is_valid(model_name) 258 | blocks_args, global_params = get_model_params(model_name, override_params) 259 | return cls(blocks_args, global_params) 260 | 261 | @classmethod 262 | def from_pretrained(cls, model_name, load_weights=True, advprop=True, num_classes=1000, in_channels=3): 263 | model = cls.from_name(model_name, override_params={'num_classes': num_classes}) 264 | if load_weights: 265 | load_pretrained_weights(model, model_name, load_fc=(num_classes == 1000), advprop=advprop) 266 | if in_channels != 3: 267 | Conv2d = get_same_padding_conv2d(image_size = model._global_params.image_size) 268 | out_channels = round_filters(32, model._global_params) 269 | model._conv_stem = Conv2d(in_channels, out_channels, kernel_size=3, stride=2, bias=False) 270 | return model 271 | 272 | @classmethod 273 | def get_image_size(cls, model_name): 274 | cls._check_model_name_is_valid(model_name) 275 | _, _, res, _ = efficientnet_params(model_name) 276 | return res 277 | 278 | @classmethod 279 | def _check_model_name_is_valid(cls, model_name): 280 | """ Validates model name. """ 281 | valid_models = ['efficientnet-b'+str(i) for i in range(9)] 282 | if model_name not in valid_models: 283 | raise ValueError('model_name should be one of: ' + ', '.join(valid_models)) 284 | -------------------------------------------------------------------------------- /nets/layers.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import math 3 | import re 4 | from functools import partial 5 | 6 | import torch 7 | from torch import nn 8 | from torch.nn import functional as F 9 | from torch.utils import model_zoo 10 | 11 | #--------------------------------------------------------------# 12 | # 模型构建的辅助函数 13 | #--------------------------------------------------------------# 14 | GlobalParams = collections.namedtuple('GlobalParams', [ 15 | 'batch_norm_momentum', 'batch_norm_epsilon', 'dropout_rate', 16 | 'num_classes', 'width_coefficient', 'depth_coefficient', 17 | 'depth_divisor', 'min_depth', 'drop_connect_rate', 'image_size']) 18 | 19 | BlockArgs = collections.namedtuple('BlockArgs', [ 20 | 'kernel_size', 'num_repeat', 'input_filters', 'output_filters', 21 | 'expand_ratio', 'id_skip', 'stride', 'se_ratio']) 22 | 23 | GlobalParams.__new__.__defaults__ = (None,) * len(GlobalParams._fields) 24 | BlockArgs.__new__.__defaults__ = (None,) * len(BlockArgs._fields) 25 | 26 | def round_filters(filters, global_params): 27 | """ Calculate and round number of filters based on depth multiplier. """ 28 | multiplier = global_params.width_coefficient 29 | if not multiplier: 30 | return filters 31 | divisor = global_params.depth_divisor 32 | min_depth = global_params.min_depth 33 | filters *= multiplier 34 | min_depth = min_depth or divisor 35 | new_filters = max(min_depth, int(filters + divisor / 2) // divisor * divisor) 36 | if new_filters < 0.9 * filters: # prevent rounding by more than 10% 37 | new_filters += divisor 38 | return int(new_filters) 39 | 40 | 41 | def round_repeats(repeats, global_params): 42 | """ Round number of filters based on depth multiplier. 
""" 43 | multiplier = global_params.depth_coefficient 44 | if not multiplier: 45 | return repeats 46 | return int(math.ceil(multiplier * repeats)) 47 | 48 | 49 | def drop_connect(inputs, p, training): 50 | """ Drop connect. """ 51 | if not training: return inputs 52 | batch_size = inputs.shape[0] 53 | keep_prob = 1 - p 54 | random_tensor = keep_prob 55 | random_tensor += torch.rand([batch_size, 1, 1, 1], dtype=inputs.dtype, device=inputs.device) 56 | binary_tensor = torch.floor(random_tensor) 57 | output = inputs / keep_prob * binary_tensor 58 | return output 59 | 60 | 61 | def get_same_padding_conv2d(image_size=None): 62 | """ Chooses static padding if you have specified an image size, and dynamic padding otherwise. 63 | Static padding is necessary for ONNX exporting of models. """ 64 | if image_size is None: 65 | return Conv2dDynamicSamePadding 66 | else: 67 | return partial(Conv2dStaticSamePadding, image_size=image_size) 68 | 69 | 70 | class Conv2dDynamicSamePadding(nn.Conv2d): 71 | """ 2D Convolutions like TensorFlow, for a dynamic image size """ 72 | 73 | def __init__(self, in_channels, out_channels, kernel_size, stride=1, dilation=1, groups=1, bias=True): 74 | super().__init__(in_channels, out_channels, kernel_size, stride, 0, dilation, groups, bias) 75 | self.stride = self.stride if len(self.stride) == 2 else [self.stride[0]] * 2 76 | 77 | def forward(self, x): 78 | ih, iw = x.size()[-2:] 79 | kh, kw = self.weight.size()[-2:] 80 | sh, sw = self.stride 81 | oh, ow = math.ceil(ih / sh), math.ceil(iw / sw) 82 | pad_h = max((oh - 1) * self.stride[0] + (kh - 1) * self.dilation[0] + 1 - ih, 0) 83 | pad_w = max((ow - 1) * self.stride[1] + (kw - 1) * self.dilation[1] + 1 - iw, 0) 84 | if pad_h > 0 or pad_w > 0: 85 | x = F.pad(x, [pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2]) 86 | return F.conv2d(x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups) 87 | 88 | 89 | class Identity(nn.Module): 90 | def __init__(self, ): 91 | super(Identity, self).__init__() 92 | 93 | def forward(self, input): 94 | return input 95 | 96 | #--------------------------------------------------------------# 97 | # 加载模型参数的辅助函数 98 | #--------------------------------------------------------------# 99 | def efficientnet_params(model_name): 100 | """ Map EfficientNet model name to parameter coefficients. """ 101 | params_dict = { 102 | # Coefficients: width,depth,res,dropout 103 | 'efficientnet-b0': (1.0, 1.0, 224, 0.2), 104 | 'efficientnet-b1': (1.0, 1.1, 240, 0.2), 105 | 'efficientnet-b2': (1.1, 1.2, 260, 0.3), 106 | 'efficientnet-b3': (1.2, 1.4, 300, 0.3), 107 | 'efficientnet-b4': (1.4, 1.8, 380, 0.4), 108 | 'efficientnet-b5': (1.6, 2.2, 456, 0.4), 109 | 'efficientnet-b6': (1.8, 2.6, 528, 0.5), 110 | 'efficientnet-b7': (2.0, 3.1, 600, 0.5), 111 | 'efficientnet-b8': (2.2, 3.6, 672, 0.5), 112 | 'efficientnet-l2': (4.3, 5.3, 800, 0.5), 113 | } 114 | return params_dict[model_name] 115 | 116 | 117 | class BlockDecoder(object): 118 | """ Block Decoder for readability, straight from the official TensorFlow repository """ 119 | 120 | @staticmethod 121 | def _decode_block_string(block_string): 122 | """ Gets a block through a string notation of arguments. 
""" 123 | assert isinstance(block_string, str) 124 | 125 | ops = block_string.split('_') 126 | options = {} 127 | for op in ops: 128 | splits = re.split(r'(\d.*)', op) 129 | if len(splits) >= 2: 130 | key, value = splits[:2] 131 | options[key] = value 132 | 133 | # Check stride 134 | assert (('s' in options and len(options['s']) == 1) or 135 | (len(options['s']) == 2 and options['s'][0] == options['s'][1])) 136 | 137 | return BlockArgs( 138 | kernel_size=int(options['k']), 139 | num_repeat=int(options['r']), 140 | input_filters=int(options['i']), 141 | output_filters=int(options['o']), 142 | expand_ratio=int(options['e']), 143 | id_skip=('noskip' not in block_string), 144 | se_ratio=float(options['se']) if 'se' in options else None, 145 | stride=[int(options['s'][0])]) 146 | 147 | @staticmethod 148 | def _encode_block_string(block): 149 | """Encodes a block to a string.""" 150 | args = [ 151 | 'r%d' % block.num_repeat, 152 | 'k%d' % block.kernel_size, 153 | 's%d%d' % (block.strides[0], block.strides[1]), 154 | 'e%s' % block.expand_ratio, 155 | 'i%d' % block.input_filters, 156 | 'o%d' % block.output_filters 157 | ] 158 | if 0 < block.se_ratio <= 1: 159 | args.append('se%s' % block.se_ratio) 160 | if block.id_skip is False: 161 | args.append('noskip') 162 | return '_'.join(args) 163 | 164 | @staticmethod 165 | def decode(string_list): 166 | """ 167 | Decodes a list of string notations to specify blocks inside the network. 168 | 169 | :param string_list: a list of strings, each string is a notation of block 170 | :return: a list of BlockArgs namedtuples of block args 171 | """ 172 | assert isinstance(string_list, list) 173 | blocks_args = [] 174 | for block_string in string_list: 175 | blocks_args.append(BlockDecoder._decode_block_string(block_string)) 176 | return blocks_args 177 | 178 | @staticmethod 179 | def encode(blocks_args): 180 | """ 181 | Encodes a list of BlockArgs to a list of strings. 182 | 183 | :param blocks_args: a list of BlockArgs namedtuples of block args 184 | :return: a list of strings, each string is a notation of block 185 | """ 186 | block_strings = [] 187 | for block in blocks_args: 188 | block_strings.append(BlockDecoder._encode_block_string(block)) 189 | return block_strings 190 | 191 | def efficientnet(width_coefficient=None, depth_coefficient=None, dropout_rate=0.2, 192 | drop_connect_rate=0.2, image_size=None, num_classes=1000): 193 | """ Creates a efficientnet model. 
""" 194 | 195 | blocks_args = [ 196 | 'r1_k3_s11_e1_i32_o16_se0.25', 'r2_k3_s22_e6_i16_o24_se0.25', 197 | 'r2_k5_s22_e6_i24_o40_se0.25', 'r3_k3_s22_e6_i40_o80_se0.25', 198 | 'r3_k5_s11_e6_i80_o112_se0.25', 'r4_k5_s22_e6_i112_o192_se0.25', 199 | 'r1_k3_s11_e6_i192_o320_se0.25', 200 | ] 201 | blocks_args = BlockDecoder.decode(blocks_args) 202 | 203 | global_params = GlobalParams( 204 | batch_norm_momentum=0.99, 205 | batch_norm_epsilon=1e-3, 206 | dropout_rate=dropout_rate, 207 | drop_connect_rate=drop_connect_rate, 208 | # data_format='channels_last', # removed, this is always true in PyTorch 209 | num_classes=num_classes, 210 | width_coefficient=width_coefficient, 211 | depth_coefficient=depth_coefficient, 212 | depth_divisor=8, 213 | min_depth=None, 214 | image_size=image_size, 215 | ) 216 | 217 | return blocks_args, global_params 218 | 219 | def get_model_params(model_name, override_params): 220 | """ Get the block args and global params for a given model """ 221 | if model_name.startswith('efficientnet'): 222 | w, d, s, p = efficientnet_params(model_name) 223 | # note: all models have drop connect rate = 0.2 224 | blocks_args, global_params = efficientnet( 225 | width_coefficient=w, depth_coefficient=d, dropout_rate=p, image_size=s) 226 | else: 227 | raise NotImplementedError('model name is not pre-defined: %s' % model_name) 228 | if override_params: 229 | # ValueError will be raised here if override_params has fields not included in global_params. 230 | global_params = global_params._replace(**override_params) 231 | return blocks_args, global_params 232 | 233 | url_map = { 234 | 'efficientnet-b0': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b0.pth', 235 | 'efficientnet-b1': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b1.pth', 236 | 'efficientnet-b2': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b2.pth', 237 | 'efficientnet-b3': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b3.pth', 238 | 'efficientnet-b4': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b4.pth', 239 | 'efficientnet-b5': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b5.pth', 240 | 'efficientnet-b6': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b6.pth', 241 | 'efficientnet-b7': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b7.pth', 242 | } 243 | 244 | def load_pretrained_weights(model, model_name, load_fc=True, advprop=False): 245 | """ Loads pretrained weights, and downloads if loading for the first time. 
""" 246 | # AutoAugment or Advprop (different preprocessing) 247 | url_map_ = url_map 248 | state_dict = model_zoo.load_url(url_map_[model_name], map_location=torch.device('cpu'), model_dir="./model_data") 249 | # state_dict = torch.load('../../weights/backbone_efficientnetb0.pth') 250 | if load_fc: 251 | ret = model.load_state_dict(state_dict, strict=False) 252 | print(ret) 253 | else: 254 | state_dict.pop('_fc.weight') 255 | state_dict.pop('_fc.bias') 256 | res = model.load_state_dict(state_dict, strict=False) 257 | assert set(res.missing_keys) == set(['_fc.weight', '_fc.bias']), 'issue loading pretrained weights' 258 | print('Loaded pretrained weights for {}'.format(model_name)) 259 | 260 | class SwishImplementation(torch.autograd.Function): 261 | @staticmethod 262 | def forward(ctx, i): 263 | result = i * torch.sigmoid(i) 264 | ctx.save_for_backward(i) 265 | return result 266 | 267 | @staticmethod 268 | def backward(ctx, grad_output): 269 | i = ctx.saved_variables[0] 270 | sigmoid_i = torch.sigmoid(i) 271 | return grad_output * (sigmoid_i * (1 + i * (1 - sigmoid_i))) 272 | 273 | class MemoryEfficientSwish(nn.Module): 274 | def forward(self, x): 275 | return SwishImplementation.apply(x) 276 | 277 | class Swish(nn.Module): 278 | def forward(self, x): 279 | return x * torch.sigmoid(x) 280 | 281 | class Conv2dStaticSamePadding(nn.Module): 282 | def __init__(self, in_channels, out_channels, kernel_size, stride=1, bias=True, groups=1, dilation=1, **kwargs): 283 | super().__init__() 284 | self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride=stride, 285 | bias=bias, groups=groups) 286 | self.stride = self.conv.stride 287 | self.kernel_size = self.conv.kernel_size 288 | self.dilation = self.conv.dilation 289 | 290 | if isinstance(self.stride, int): 291 | self.stride = [self.stride] * 2 292 | elif len(self.stride) == 1: 293 | self.stride = [self.stride[0]] * 2 294 | 295 | if isinstance(self.kernel_size, int): 296 | self.kernel_size = [self.kernel_size] * 2 297 | elif len(self.kernel_size) == 1: 298 | self.kernel_size = [self.kernel_size[0]] * 2 299 | 300 | def forward(self, x): 301 | h, w = x.shape[-2:] 302 | 303 | extra_h = (math.ceil(w / self.stride[1]) - 1) * self.stride[1] - w + self.kernel_size[1] 304 | extra_v = (math.ceil(h / self.stride[0]) - 1) * self.stride[0] - h + self.kernel_size[0] 305 | 306 | left = extra_h // 2 307 | right = extra_h - left 308 | top = extra_v // 2 309 | bottom = extra_v - top 310 | 311 | x = F.pad(x, [left, right, top, bottom]) 312 | 313 | x = self.conv(x) 314 | return x 315 | 316 | class MaxPool2dStaticSamePadding(nn.Module): 317 | def __init__(self, *args, **kwargs): 318 | super().__init__() 319 | self.pool = nn.MaxPool2d(*args, **kwargs) 320 | self.stride = self.pool.stride 321 | self.kernel_size = self.pool.kernel_size 322 | 323 | if isinstance(self.stride, int): 324 | self.stride = [self.stride] * 2 325 | elif len(self.stride) == 1: 326 | self.stride = [self.stride[0]] * 2 327 | 328 | if isinstance(self.kernel_size, int): 329 | self.kernel_size = [self.kernel_size] * 2 330 | elif len(self.kernel_size) == 1: 331 | self.kernel_size = [self.kernel_size[0]] * 2 332 | 333 | def forward(self, x): 334 | h, w = x.shape[-2:] 335 | 336 | extra_h = (math.ceil(w / self.stride[1]) - 1) * self.stride[1] - w + self.kernel_size[1] 337 | extra_v = (math.ceil(h / self.stride[0]) - 1) * self.stride[0] - h + self.kernel_size[0] 338 | 339 | left = extra_h // 2 340 | right = extra_h - left 341 | top = extra_v // 2 342 | bottom = extra_v - top 343 | 344 | x = 
F.pad(x, [left, right, top, bottom]) 345 | 346 | x = self.pool(x) 347 | return x 348 | -------------------------------------------------------------------------------- /predict.py: -------------------------------------------------------------------------------- 1 | #-----------------------------------------------------------------------# 2 | # predict.py将单张图片预测、摄像头检测、FPS测试和目录遍历检测等功能 3 | # 整合到了一个py文件中,通过指定mode进行模式的修改。 4 | #-----------------------------------------------------------------------# 5 | import time 6 | 7 | import cv2 8 | import numpy as np 9 | from PIL import Image 10 | 11 | from efficientdet import Efficientdet 12 | 13 | if __name__ == "__main__": 14 | efficientdet = Efficientdet() 15 | #----------------------------------------------------------------------------------------------------------# 16 | # mode用于指定测试的模式: 17 | # 'predict' 表示单张图片预测,如果想对预测过程进行修改,如保存图片,截取对象等,可以先看下方详细的注释 18 | # 'video' 表示视频检测,可调用摄像头或者视频进行检测,详情查看下方注释。 19 | # 'fps' 表示测试fps,使用的图片是img里面的street.jpg,详情查看下方注释。 20 | # 'dir_predict' 表示遍历文件夹进行检测并保存。默认遍历img文件夹,保存img_out文件夹,详情查看下方注释。 21 | #----------------------------------------------------------------------------------------------------------# 22 | mode = "predict" 23 | #-------------------------------------------------------------------------# 24 | # crop 指定了是否在单张图片预测后对目标进行截取 25 | # count 指定了是否进行目标的计数 26 | # crop、count仅在mode='predict'时有效 27 | #-------------------------------------------------------------------------# 28 | crop = False 29 | count = False 30 | #----------------------------------------------------------------------------------------------------------# 31 | # video_path 用于指定视频的路径,当video_path=0时表示检测摄像头 32 | # 想要检测视频,则设置如video_path = "xxx.mp4"即可,代表读取出根目录下的xxx.mp4文件。 33 | # video_save_path 表示视频保存的路径,当video_save_path=""时表示不保存 34 | # 想要保存视频,则设置如video_save_path = "yyy.mp4"即可,代表保存为根目录下的yyy.mp4文件。 35 | # video_fps 用于保存的视频的fps 36 | # 37 | # video_path、video_save_path和video_fps仅在mode='video'时有效 38 | # 保存视频时需要ctrl+c退出或者运行到最后一帧才会完成完整的保存步骤。 39 | #----------------------------------------------------------------------------------------------------------# 40 | video_path = 0 41 | video_save_path = "" 42 | video_fps = 25.0 43 | #----------------------------------------------------------------------------------------------------------# 44 | # test_interval 用于指定测量fps的时候,图片检测的次数。理论上test_interval越大,fps越准确。 45 | # fps_image_path 用于指定测试的fps图片 46 | # 47 | # test_interval和fps_image_path仅在mode='fps'有效 48 | #----------------------------------------------------------------------------------------------------------# 49 | test_interval = 100 50 | fps_image_path = "img/street.jpg" 51 | #-------------------------------------------------------------------------# 52 | # dir_origin_path 指定了用于检测的图片的文件夹路径 53 | # dir_save_path 指定了检测完图片的保存路径 54 | # 55 | # dir_origin_path和dir_save_path仅在mode='dir_predict'时有效 56 | #-------------------------------------------------------------------------# 57 | dir_origin_path = "img/" 58 | dir_save_path = "img_out/" 59 | 60 | if mode == "predict": 61 | ''' 62 | 1、如果想要进行检测完的图片的保存,利用r_image.save("img.jpg")即可保存,直接在predict.py里进行修改即可。 63 | 2、如果想要获得预测框的坐标,可以进入efficientdet.detect_image函数,在绘图部分读取top,left,bottom,right这四个值。 64 | 3、如果想要利用预测框截取下目标,可以进入efficientdet.detect_image函数,在绘图部分利用获取到的top,left,bottom,right这四个值 65 | 在原图上利用矩阵的方式进行截取。 66 | 4、如果想要在预测图上写额外的字,比如检测到的特定目标的数量,可以进入efficientdet.detect_image函数,在绘图部分对predicted_class进行判断, 67 | 比如判断if predicted_class == 'car': 即可判断当前目标是否为车,然后记录数量即可。利用draw.text即可写字。 68 | ''' 69 | while True: 70 | img = input('Input image 
filename:') 71 | try: 72 | image = Image.open(img) 73 | except: 74 | print('Open Error! Try again!') 75 | continue 76 | else: 77 | r_image = efficientdet.detect_image(image, crop = crop, count=count) 78 | r_image.show() 79 | 80 | elif mode == "video": 81 | capture = cv2.VideoCapture(video_path) 82 | if video_save_path!="": 83 | fourcc = cv2.VideoWriter_fourcc(*'XVID') 84 | size = (int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)), int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))) 85 | out = cv2.VideoWriter(video_save_path, fourcc, video_fps, size) 86 | 87 | ref, frame = capture.read() 88 | if not ref: 89 | raise ValueError("未能正确读取摄像头(视频),请注意是否正确安装摄像头(是否正确填写视频路径)。") 90 | 91 | fps = 0.0 92 | while(True): 93 | t1 = time.time() 94 | # 读取某一帧 95 | ref, frame = capture.read() 96 | if not ref: 97 | break 98 | # 格式转变,BGRtoRGB 99 | frame = cv2.cvtColor(frame,cv2.COLOR_BGR2RGB) 100 | # 转变成Image 101 | frame = Image.fromarray(np.uint8(frame)) 102 | # 进行检测 103 | frame = np.array(efficientdet.detect_image(frame)) 104 | # RGBtoBGR满足opencv显示格式 105 | frame = cv2.cvtColor(frame,cv2.COLOR_RGB2BGR) 106 | 107 | fps = ( fps + (1./(time.time()-t1)) ) / 2 108 | print("fps= %.2f"%(fps)) 109 | frame = cv2.putText(frame, "fps= %.2f"%(fps), (0, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2) 110 | 111 | cv2.imshow("video",frame) 112 | c= cv2.waitKey(1) & 0xff 113 | if video_save_path!="": 114 | out.write(frame) 115 | 116 | if c==27: 117 | capture.release() 118 | break 119 | 120 | print("Video Detection Done!") 121 | capture.release() 122 | if video_save_path!="": 123 | print("Save processed video to the path :" + video_save_path) 124 | out.release() 125 | cv2.destroyAllWindows() 126 | 127 | elif mode == "fps": 128 | img = Image.open(fps_image_path) 129 | tact_time = efficientdet.get_FPS(img, test_interval) 130 | print(str(tact_time) + ' seconds, ' + str(1/tact_time) + 'FPS, @batch_size 1') 131 | 132 | elif mode == "dir_predict": 133 | import os 134 | 135 | from tqdm import tqdm 136 | 137 | img_names = os.listdir(dir_origin_path) 138 | for img_name in tqdm(img_names): 139 | if img_name.lower().endswith(('.bmp', '.dib', '.png', '.jpg', '.jpeg', '.pbm', '.pgm', '.ppm', '.tif', '.tiff')): 140 | image_path = os.path.join(dir_origin_path, img_name) 141 | image = Image.open(image_path) 142 | r_image = efficientdet.detect_image(image) 143 | if not os.path.exists(dir_save_path): 144 | os.makedirs(dir_save_path) 145 | r_image.save(os.path.join(dir_save_path, img_name.replace(".jpg", ".png")), quality=95, subsampling=0) 146 | 147 | else: 148 | raise AssertionError("Please specify the correct mode: 'predict', 'video', 'fps' or 'dir_predict'.") 149 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | torch 2 | torchvision 3 | tensorboard 4 | scipy==1.2.1 5 | numpy==1.17.0 6 | matplotlib==3.1.2 7 | opencv_python==4.1.2.30 8 | tqdm==4.60.0 9 | Pillow==8.2.0 10 | h5py==2.10.0 -------------------------------------------------------------------------------- /summary.py: -------------------------------------------------------------------------------- 1 | #--------------------------------------------# 2 | # 该部分代码用于看网络参数 3 | #--------------------------------------------# 4 | import torch 5 | from thop import clever_format, profile 6 | 7 | from nets.efficientdet import EfficientDetBackbone 8 | from utils.utils import image_sizes 9 | 10 | if __name__ == '__main__': 11 | phi = 0 12 | input_shape = [image_sizes[phi], 
image_sizes[phi]] 13 | num_classes = 80 14 | 15 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 16 | model = EfficientDetBackbone(num_classes, phi).to(device) 17 | print(model) 18 | print('# generator parameters:', sum(param.numel() for param in model.parameters())) 19 | 20 | dummy_input = torch.randn(1, 3, input_shape[0], input_shape[1]).to(device) 21 | flops, params = profile(model.to(device), (dummy_input, ), verbose=False) 22 | #--------------------------------------------------------# 23 | # flops * 2是因为profile没有将卷积作为两个operations 24 | # 有些论文将卷积算乘法、加法两个operations。此时乘2 25 | # 有些论文只考虑乘法的运算次数,忽略加法。此时不乘2 26 | # 本代码选择乘2,参考YOLOX。 27 | #--------------------------------------------------------# 28 | flops = flops * 2 29 | flops, params = clever_format([flops, params], "%.3f") 30 | print("Total GFLOPs: %s" %(flops)) 31 | print("Total Parameters: %s" %(params)) 32 | -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | #-------------------------------------# 2 | # 对数据集进行训练 3 | #-------------------------------------# 4 | import datetime 5 | import os 6 | import warnings 7 | from functools import partial 8 | 9 | import numpy as np 10 | import torch 11 | import torch.backends.cudnn as cudnn 12 | import torch.distributed as dist 13 | import torch.optim as optim 14 | from torch.utils.data import DataLoader 15 | 16 | from nets.efficientdet import EfficientDetBackbone 17 | from nets.efficientdet_training import (FocalLoss, get_lr_scheduler, 18 | set_optimizer_lr) 19 | from utils.callbacks import EvalCallback, LossHistory 20 | from utils.dataloader import EfficientdetDataset, efficientdet_dataset_collate 21 | from utils.utils import (download_weights, get_classes, image_sizes, 22 | seed_everything, show_config, worker_init_fn) 23 | from utils.utils_fit import fit_one_epoch 24 | 25 | warnings.filterwarnings("ignore") 26 | 27 | ''' 28 | 训练自己的目标检测模型一定需要注意以下几点: 29 | 1、训练前仔细检查自己的格式是否满足要求,该库要求数据集格式为VOC格式,需要准备好的内容有输入图片和标签 30 | 输入图片为.jpg图片,无需固定大小,传入训练前会自动进行resize。 31 | 灰度图会自动转成RGB图片进行训练,无需自己修改。 32 | 输入图片如果后缀非jpg,需要自己批量转成jpg后再开始训练。 33 | 34 | 标签为.xml格式,文件中会有需要检测的目标信息,标签文件和输入图片文件相对应。 35 | 36 | 2、损失值的大小用于判断是否收敛,比较重要的是有收敛的趋势,即验证集损失不断下降,如果验证集损失基本上不改变的话,模型基本上就收敛了。 37 | 损失值的具体大小并没有什么意义,大和小只在于损失的计算方式,并不是接近于0才好。如果想要让损失好看点,可以直接到对应的损失函数里面除上10000。 38 | 训练过程中的损失值会保存在logs文件夹下的loss_%Y_%m_%d_%H_%M_%S文件夹中 39 | 40 | 3、训练好的权值文件保存在logs文件夹中,每个训练世代(Epoch)包含若干训练步长(Step),每个训练步长(Step)进行一次梯度下降。 41 | 如果只是训练了几个Step是不会保存的,Epoch和Step的概念要捋清楚一下。 42 | ''' 43 | if __name__ == "__main__": 44 | #-------------------------------# 45 | # 是否使用Cuda 46 | # 没有GPU可以设置成False 47 | #-------------------------------# 48 | Cuda = True 49 | #----------------------------------------------# 50 | # Seed 用于固定随机种子 51 | # 使得每次独立训练都可以获得一样的结果 52 | #----------------------------------------------# 53 | seed = 11 54 | #---------------------------------------------------------------------# 55 | # distributed 用于指定是否使用单机多卡分布式运行 56 | # 终端指令仅支持Ubuntu。CUDA_VISIBLE_DEVICES用于在Ubuntu下指定显卡。 57 | # Windows系统下默认使用DP模式调用所有显卡,不支持DDP。 58 | # DP模式: 59 | # 设置 distributed = False 60 | # 在终端中输入 CUDA_VISIBLE_DEVICES=0,1 python train.py 61 | # DDP模式: 62 | # 设置 distributed = True 63 | # 在终端中输入 CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 train.py 64 | #---------------------------------------------------------------------# 65 | distributed = False 66 | 
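#---------------------------------------------------------------------#
#   (added sketch) e.g. a two-card DDP run with the flag above flipped on:
#       distributed = True
#       CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 train.py
#   the launcher sets LOCAL_RANK / RANK, which the code further down reads
#   before calling dist.init_process_group(backend="nccl").
#---------------------------------------------------------------------#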
#---------------------------------------------------------------------# 67 | # sync_bn 是否使用sync_bn,DDP模式多卡可用 68 | #---------------------------------------------------------------------# 69 | sync_bn = False 70 | #---------------------------------------------------------------------# 71 | # fp16 是否使用混合精度训练 72 | # 可减少约一半的显存、需要pytorch1.7.1以上 73 | #---------------------------------------------------------------------# 74 | fp16 = False 75 | #---------------------------------------------------------------------# 76 | # classes_path 指向model_data下的txt,与自己训练的数据集相关 77 | # 训练前一定要修改classes_path,使其对应自己的数据集 78 | #---------------------------------------------------------------------# 79 | classes_path = 'model_data/voc_classes.txt' 80 | #---------------------------------------------------------------------# 81 | # 用于选择所使用的模型的版本,0-7 82 | #---------------------------------------------------------------------# 83 | phi = 0 84 | #----------------------------------------------------------------------------------------------------------------------------# 85 | # pretrained 是否使用主干网络的预训练权重,此处使用的是主干的权重,因此是在模型构建的时候进行加载的。 86 | # 如果设置了model_path,则主干的权值无需加载,pretrained的值无意义。 87 | # 如果不设置model_path,pretrained = True,此时仅加载主干开始训练。 88 | # 如果不设置model_path,pretrained = False,Freeze_Train = Fasle,此时从0开始训练,且没有冻结主干的过程。 89 | #----------------------------------------------------------------------------------------------------------------------------# 90 | pretrained = False 91 | #----------------------------------------------------------------------------------------------------------------------------# 92 | # 权值文件的下载请看README,可以通过网盘下载。模型的 预训练权重 对不同数据集是通用的,因为特征是通用的。 93 | # 模型的 预训练权重 比较重要的部分是 主干特征提取网络的权值部分,用于进行特征提取。 94 | # 预训练权重对于99%的情况都必须要用,不用的话主干部分的权值太过随机,特征提取效果不明显,网络训练的结果也不会好 95 | # 96 | # 如果训练过程中存在中断训练的操作,可以将model_path设置成logs文件夹下的权值文件,将已经训练了一部分的权值再次载入。 97 | # 同时修改下方的 冻结阶段 或者 解冻阶段 的参数,来保证模型epoch的连续性。 98 | # 99 | # 当model_path = ''的时候不加载整个模型的权值。 100 | # 101 | # 此处使用的是整个模型的权重,因此是在train.py进行加载的,pretrain不影响此处的权值加载。 102 | # 如果想要让模型从主干的预训练权值开始训练,则设置model_path = '',pretrain = True,此时仅加载主干。 103 | # 如果想要让模型从0开始训练,则设置model_path = '',pretrain = Fasle,Freeze_Train = Fasle,此时从0开始训练,且没有冻结主干的过程。 104 | # 105 | # 一般来讲,网络从0开始的训练效果会很差,因为权值太过随机,特征提取效果不明显,因此非常、非常、非常不建议大家从0开始训练! 
106 | # 如果一定要从0开始,可以了解imagenet数据集,首先训练分类模型,获得网络的主干部分权值,分类模型的 主干部分 和该模型通用,基于此进行训练。 107 | #----------------------------------------------------------------------------------------------------------------------------# 108 | model_path = 'model_data/efficientdet-d0.pth' 109 | #------------------------------------------------------# 110 | # input_shape 输入的shape大小 111 | #------------------------------------------------------# 112 | input_shape = [image_sizes[phi], image_sizes[phi]] 113 | 114 | #----------------------------------------------------------------------------------------------------------------------------# 115 | # 训练分为两个阶段,分别是冻结阶段和解冻阶段。设置冻结阶段是为了满足机器性能不足的同学的训练需求。 116 | # 冻结训练需要的显存较小,显卡非常差的情况下,可设置Freeze_Epoch等于UnFreeze_Epoch,此时仅仅进行冻结训练。 117 | # 118 | # 在此提供若干参数设置建议,各位训练者根据自己的需求进行灵活调整: 119 | # (一)从整个模型的预训练权重开始训练: 120 | # Adam: 121 | # Init_Epoch = 0,Freeze_Epoch = 50,UnFreeze_Epoch = 100,Freeze_Train = True,optimizer_type = 'adam',Init_lr = 3e-4,weight_decay = 0。(冻结) 122 | # Init_Epoch = 0,UnFreeze_Epoch = 100,Freeze_Train = False,optimizer_type = 'adam',Init_lr = 3e-4,weight_decay = 0。(不冻结) 123 | # SGD: 124 | # Init_Epoch = 0,Freeze_Epoch = 50,UnFreeze_Epoch = 200,Freeze_Train = True,optimizer_type = 'sgd',Init_lr = 1e-2,weight_decay = 4e-5。(冻结) 125 | # Init_Epoch = 0,UnFreeze_Epoch = 200,Freeze_Train = False,optimizer_type = 'sgd',Init_lr = 1e-2,weight_decay = 4e-5。(不冻结) 126 | # 其中:UnFreeze_Epoch可以在100-300之间调整。 127 | # (二)从主干网络的预训练权重开始训练: 128 | # Adam: 129 | # Init_Epoch = 0,Freeze_Epoch = 50,UnFreeze_Epoch = 100,Freeze_Train = True,optimizer_type = 'adam',Init_lr = 3e-4,weight_decay = 0。(冻结) 130 | # Init_Epoch = 0,UnFreeze_Epoch = 100,Freeze_Train = False,optimizer_type = 'adam',Init_lr = 3e-4,weight_decay = 0。(不冻结) 131 | # SGD: 132 | # Init_Epoch = 0,Freeze_Epoch = 50,UnFreeze_Epoch = 200,Freeze_Train = True,optimizer_type = 'sgd',Init_lr = 1e-2,weight_decay = 4e-5。(冻结) 133 | # Init_Epoch = 0,UnFreeze_Epoch = 200,Freeze_Train = False,optimizer_type = 'sgd',Init_lr = 1e-2,weight_decay = 4e-5。(不冻结) 134 | # 其中:由于从主干网络的预训练权重开始训练,主干的权值不一定适合目标检测,需要更多的训练跳出局部最优解。 135 | # UnFreeze_Epoch可以在200-300之间调整,YOLOV5和YOLOX均推荐使用300。 136 | # Adam相较于SGD收敛的快一些。因此UnFreeze_Epoch理论上可以小一点,但依然推荐更多的Epoch。 137 | # (三)batch_size的设置: 138 | # 在显卡能够接受的范围内,以大为好。显存不足与数据集大小无关,提示显存不足(OOM或者CUDA out of memory)请调小batch_size。 139 | # 受到BatchNorm层影响,batch_size最小为2,不能为1。 140 | # 正常情况下Freeze_batch_size建议为Unfreeze_batch_size的1-2倍。不建议设置的差距过大,因为关系到学习率的自动调整。 141 | #----------------------------------------------------------------------------------------------------------------------------# 142 | #------------------------------------------------------------------# 143 | # 冻结阶段训练参数 144 | # 此时模型的主干被冻结了,特征提取网络不发生改变 145 | # 占用的显存较小,仅对网络进行微调 146 | # Init_Epoch 模型当前开始的训练世代,其值可以大于Freeze_Epoch,如设置: 147 | # Init_Epoch = 60、Freeze_Epoch = 50、UnFreeze_Epoch = 100 148 | # 会跳过冻结阶段,直接从60代开始,并调整对应的学习率。 149 | # (断点续练时使用) 150 | # Freeze_Epoch 模型冻结训练的Freeze_Epoch 151 | # (当Freeze_Train=False时失效) 152 | # Freeze_batch_size 模型冻结训练的batch_size 153 | # (当Freeze_Train=False时失效) 154 | #------------------------------------------------------------------# 155 | Init_Epoch = 0 156 | Freeze_Epoch = 50 157 | Freeze_batch_size = 8 158 | #------------------------------------------------------------------# 159 | # 解冻阶段训练参数 160 | # 此时模型的主干不被冻结了,特征提取网络会发生改变 161 | # 占用的显存较大,网络所有的参数都会发生改变 162 | # UnFreeze_Epoch 模型总共训练的epoch 163 | # SGD需要更长的时间收敛,因此设置较大的UnFreeze_Epoch 164 | # Adam可以使用相对较小的UnFreeze_Epoch 165 | # Unfreeze_batch_size 模型在解冻后的batch_size 166 | 
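#                       (added sketch) with the defaults below and Freeze_Train = True
#                       the schedule is: epochs 0-49 backbone frozen, batch_size = 8;
#                       epochs 50-99 backbone unfrozen, batch_size = 4.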
#------------------------------------------------------------------# 167 | UnFreeze_Epoch = 100 168 | Unfreeze_batch_size = 4 169 | #------------------------------------------------------------------# 170 | # Freeze_Train 是否进行冻结训练 171 | # 默认先冻结主干训练后解冻训练。 172 | #------------------------------------------------------------------# 173 | Freeze_Train = True 174 | 175 | #------------------------------------------------------------------# 176 | # 其它训练参数:学习率、优化器、学习率下降有关 177 | #------------------------------------------------------------------# 178 | #------------------------------------------------------------------# 179 | # Init_lr 模型的最大学习率 180 | # 当使用Adam优化器时建议设置 Init_lr=3e-4 181 | # 当使用SGD优化器时建议设置 Init_lr=1e-2 182 | # Min_lr 模型的最小学习率,默认为最大学习率的0.01 183 | #------------------------------------------------------------------# 184 | Init_lr = 3e-4 185 | Min_lr = Init_lr * 0.01 186 | #------------------------------------------------------------------# 187 | # optimizer_type 使用到的优化器种类,可选的有adam、sgd 188 | # 当使用Adam优化器时建议设置 Init_lr=3e-4 189 | # 当使用SGD优化器时建议设置 Init_lr=1e-2 190 | # momentum 优化器内部使用到的momentum参数 191 | # weight_decay 权值衰减,可防止过拟合 192 | # adam会导致weight_decay错误,使用adam时建议设置为0。 193 | #------------------------------------------------------------------# 194 | optimizer_type = "adam" 195 | momentum = 0.9 196 | weight_decay = 0 197 | #------------------------------------------------------------------# 198 | # lr_decay_type 使用到的学习率下降方式,可选的有'step'、'cos' 199 | #------------------------------------------------------------------# 200 | lr_decay_type = 'cos' 201 | #------------------------------------------------------------------# 202 | # save_period 多少个epoch保存一次权值 203 | #------------------------------------------------------------------# 204 | save_period = 5 205 | #------------------------------------------------------------------# 206 | # save_dir 权值与日志文件保存的文件夹 207 | #------------------------------------------------------------------# 208 | save_dir = 'logs' 209 | #------------------------------------------------------------------# 210 | # eval_flag 是否在训练时进行评估,评估对象为验证集 211 | # 安装pycocotools库后,评估体验更佳。 212 | # eval_period 代表多少个epoch评估一次,不建议频繁的评估 213 | # 评估需要消耗较多的时间,频繁评估会导致训练非常慢 214 | # 此处获得的mAP会与get_map.py获得的会有所不同,原因有二: 215 | # (一)此处获得的mAP为验证集的mAP。 216 | # (二)此处设置评估参数较为保守,目的是加快评估速度。 217 | #------------------------------------------------------------------# 218 | eval_flag = True 219 | eval_period = 5 220 | #------------------------------------------------------------------# 221 | # num_workers 用于设置是否使用多线程读取数据,1代表关闭多线程 222 | # 开启后会加快数据读取速度,但是会占用更多内存 223 | # 在IO为瓶颈的时候再开启多线程,即GPU运算速度远大于读取图片的速度。 224 | #------------------------------------------------------------------# 225 | num_workers = 4 226 | 227 | #------------------------------------------------------# 228 | # train_annotation_path 训练图片路径和标签 229 | # val_annotation_path 验证图片路径和标签 230 | #------------------------------------------------------# 231 | train_annotation_path = '2007_train.txt' 232 | val_annotation_path = '2007_val.txt' 233 | 234 | seed_everything(seed) 235 | #------------------------------------------------------# 236 | # 设置用到的显卡 237 | #------------------------------------------------------# 238 | ngpus_per_node = torch.cuda.device_count() 239 | if distributed: 240 | dist.init_process_group(backend="nccl") 241 | local_rank = int(os.environ["LOCAL_RANK"]) 242 | rank = int(os.environ["RANK"]) 243 | device = torch.device("cuda", local_rank) 244 | if local_rank == 0: 245 | print(f"[{os.getpid()}] (rank = {rank}, local_rank = {local_rank}) 
training...") 246 | print("Gpu Device Count : ", ngpus_per_node) 247 | else: 248 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') 249 | local_rank = 0 250 | rank = 0 251 | 252 | #----------------------------------------------------# 253 | # 下载预训练权重 254 | #----------------------------------------------------# 255 | if pretrained: 256 | backbone = "efficientnet-b" + str(phi) 257 | if distributed: 258 | if local_rank == 0: 259 | download_weights(backbone) 260 | dist.barrier() 261 | else: 262 | download_weights(backbone) 263 | 264 | #----------------------------------------------------# 265 | # 获取classes和anchor 266 | #----------------------------------------------------# 267 | class_names, num_classes = get_classes(classes_path) 268 | 269 | #------------------------------------------------------# 270 | # 创建EfficientDet模型 271 | # 训练前一定要修改classes_path和对应的txt文件 272 | #------------------------------------------------------# 273 | model = EfficientDetBackbone(num_classes, phi, pretrained) 274 | if model_path != '': 275 | #------------------------------------------------------# 276 | # 权值文件请看README,百度网盘下载 277 | #------------------------------------------------------# 278 | if local_rank == 0: 279 | print('Load weights {}.'.format(model_path)) 280 | 281 | #------------------------------------------------------# 282 | # 根据预训练权重的Key和模型的Key进行加载 283 | #------------------------------------------------------# 284 | model_dict = model.state_dict() 285 | pretrained_dict = torch.load(model_path, map_location = device) 286 | load_key, no_load_key, temp_dict = [], [], {} 287 | for k, v in pretrained_dict.items(): 288 | if k in model_dict.keys() and np.shape(model_dict[k]) == np.shape(v): 289 | temp_dict[k] = v 290 | load_key.append(k) 291 | else: 292 | no_load_key.append(k) 293 | model_dict.update(temp_dict) 294 | model.load_state_dict(model_dict) 295 | #------------------------------------------------------# 296 | # 显示没有匹配上的Key 297 | #------------------------------------------------------# 298 | if local_rank == 0: 299 | print("\nSuccessful Load Key:", str(load_key)[:500], "……\nSuccessful Load Key Num:", len(load_key)) 300 | print("\nFail To Load Key:", str(no_load_key)[:500], "……\nFail To Load Key num:", len(no_load_key)) 301 | print("\n\033[1;33;44m温馨提示,head部分没有载入是正常现象,Backbone部分没有载入是错误的。\033[0m") 302 | 303 | #----------------------# 304 | # 获得损失函数 305 | #----------------------# 306 | focal_loss = FocalLoss() 307 | #----------------------# 308 | # 记录Loss 309 | #----------------------# 310 | if local_rank == 0: 311 | time_str = datetime.datetime.strftime(datetime.datetime.now(),'%Y_%m_%d_%H_%M_%S') 312 | log_dir = os.path.join(save_dir, "loss_" + str(time_str)) 313 | loss_history = LossHistory(log_dir, model, input_shape=input_shape) 314 | else: 315 | loss_history = None 316 | 317 | #------------------------------------------------------------------# 318 | # torch 1.2不支持amp,建议使用torch 1.7.1及以上正确使用fp16 319 | # 因此torch1.2这里显示"could not be resolve" 320 | #------------------------------------------------------------------# 321 | if fp16: 322 | from torch.cuda.amp import GradScaler as GradScaler 323 | scaler = GradScaler() 324 | else: 325 | scaler = None 326 | 327 | model_train = model.train() 328 | #----------------------------# 329 | # 多卡同步Bn 330 | #----------------------------# 331 | if sync_bn and ngpus_per_node > 1 and distributed: 332 | model_train = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model_train) 333 | elif sync_bn: 334 | print("Sync_bn is not support in one gpu or not 
distributed.") 335 | 336 | if Cuda: 337 | if distributed: 338 | #----------------------------# 339 | # 多卡平行运行 340 | #----------------------------# 341 | model_train = model_train.cuda(local_rank) 342 | model_train = torch.nn.parallel.DistributedDataParallel(model_train, device_ids=[local_rank], find_unused_parameters=True) 343 | else: 344 | model_train = torch.nn.DataParallel(model) 345 | cudnn.benchmark = True 346 | model_train = model_train.cuda() 347 | 348 | #---------------------------# 349 | # 读取数据集对应的txt 350 | #---------------------------# 351 | with open(train_annotation_path) as f: 352 | train_lines = f.readlines() 353 | with open(val_annotation_path) as f: 354 | val_lines = f.readlines() 355 | num_train = len(train_lines) 356 | num_val = len(val_lines) 357 | 358 | if local_rank == 0: 359 | show_config( 360 | classes_path = classes_path, model_path = model_path, input_shape = input_shape, \ 361 | Init_Epoch = Init_Epoch, Freeze_Epoch = Freeze_Epoch, UnFreeze_Epoch = UnFreeze_Epoch, Freeze_batch_size = Freeze_batch_size, Unfreeze_batch_size = Unfreeze_batch_size, Freeze_Train = Freeze_Train, \ 362 | Init_lr = Init_lr, Min_lr = Min_lr, optimizer_type = optimizer_type, momentum = momentum, lr_decay_type = lr_decay_type, \ 363 | save_period = save_period, save_dir = save_dir, num_workers = num_workers, num_train = num_train, num_val = num_val 364 | ) 365 | #---------------------------------------------------------# 366 | # 总训练世代指的是遍历全部数据的总次数 367 | # 总训练步长指的是梯度下降的总次数 368 | # 每个训练世代包含若干训练步长,每个训练步长进行一次梯度下降。 369 | # 此处仅建议最低训练世代,上不封顶,计算时只考虑了解冻部分 370 | #----------------------------------------------------------# 371 | wanted_step = 5e4 if optimizer_type == "sgd" else 1.5e4 372 | total_step = num_train // Unfreeze_batch_size * UnFreeze_Epoch 373 | if total_step <= wanted_step: 374 | if num_train // Unfreeze_batch_size == 0: 375 | raise ValueError('数据集过小,无法进行训练,请扩充数据集。') 376 | wanted_epoch = wanted_step // (num_train // Unfreeze_batch_size) + 1 377 | print("\n\033[1;33;44m[Warning] 使用%s优化器时,建议将训练总步长设置到%d以上。\033[0m"%(optimizer_type, wanted_step)) 378 | print("\033[1;33;44m[Warning] 本次运行的总训练数据量为%d,Unfreeze_batch_size为%d,共训练%d个Epoch,计算出总训练步长为%d。\033[0m"%(num_train, Unfreeze_batch_size, UnFreeze_Epoch, total_step)) 379 | print("\033[1;33;44m[Warning] 由于总训练步长为%d,小于建议总步长%d,建议设置总世代为%d。\033[0m"%(total_step, wanted_step, wanted_epoch)) 380 | 381 | #------------------------------------------------------# 382 | # 主干特征提取网络特征通用,冻结训练可以加快训练速度 383 | # 也可以在训练初期防止权值被破坏。 384 | # Init_Epoch为起始世代 385 | # Freeze_Epoch为冻结训练的世代 386 | # UnFreeze_Epoch总训练世代 387 | # 提示OOM或者显存不足请调小Batch_size 388 | #------------------------------------------------------# 389 | if True: 390 | UnFreeze_flag = False 391 | #------------------------------------# 392 | # 冻结一定部分训练 393 | #------------------------------------# 394 | if Freeze_Train: 395 | for param in model.backbone_net.parameters(): 396 | param.requires_grad = False 397 | 398 | #-------------------------------------------------------------------# 399 | # 如果不冻结训练的话,直接设置batch_size为Unfreeze_batch_size 400 | #-------------------------------------------------------------------# 401 | batch_size = Freeze_batch_size if Freeze_Train else Unfreeze_batch_size 402 | 403 | #-------------------------------------------------------------------# 404 | # 判断当前batch_size,自适应调整学习率 405 | #-------------------------------------------------------------------# 406 | nbs = 16 407 | lr_limit_max = 5e-4 if optimizer_type == 'adam' else 1e-1 408 | lr_limit_min = 3e-4 if optimizer_type == 'adam' else 5e-4 409 
| Init_lr_fit = min(max(batch_size / nbs * Init_lr, lr_limit_min), lr_limit_max) 410 | Min_lr_fit = min(max(batch_size / nbs * Min_lr, lr_limit_min * 1e-2), lr_limit_max * 1e-2) 411 | 412 | #---------------------------------------# 413 | # 根据optimizer_type选择优化器 414 | #---------------------------------------# 415 | optimizer = { 416 | 'adam' : optim.Adam(model.parameters(), Init_lr_fit, betas = (momentum, 0.999), weight_decay = weight_decay), 417 | 'sgd' : optim.SGD(model.parameters(), Init_lr_fit, momentum = momentum, nesterov=True, weight_decay = weight_decay) 418 | }[optimizer_type] 419 | 420 | #---------------------------------------# 421 | # 获得学习率下降的公式 422 | #---------------------------------------# 423 | lr_scheduler_func = get_lr_scheduler(lr_decay_type, Init_lr_fit, Min_lr_fit, UnFreeze_Epoch) 424 | 425 | #---------------------------------------# 426 | # 判断每一个世代的长度 427 | #---------------------------------------# 428 | epoch_step = num_train // batch_size 429 | epoch_step_val = num_val // batch_size 430 | 431 | if epoch_step == 0 or epoch_step_val == 0: 432 | raise ValueError("数据集过小,无法继续进行训练,请扩充数据集。") 433 | 434 | train_dataset = EfficientdetDataset(train_lines, input_shape, num_classes, train = True) 435 | val_dataset = EfficientdetDataset(val_lines, input_shape, num_classes, train = False) 436 | 437 | if distributed: 438 | train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset, shuffle=True,) 439 | val_sampler = torch.utils.data.distributed.DistributedSampler(val_dataset, shuffle=False,) 440 | batch_size = batch_size // ngpus_per_node 441 | shuffle = False 442 | else: 443 | train_sampler = None 444 | val_sampler = None 445 | shuffle = True 446 | 447 | gen = DataLoader(train_dataset, shuffle = shuffle, batch_size = batch_size, num_workers = num_workers, pin_memory=True, 448 | drop_last=True, collate_fn=efficientdet_dataset_collate, sampler=train_sampler, 449 | worker_init_fn=partial(worker_init_fn, rank=rank, seed=seed)) 450 | gen_val = DataLoader(val_dataset , shuffle = shuffle, batch_size = batch_size, num_workers = num_workers, pin_memory=True, 451 | drop_last=True, collate_fn=efficientdet_dataset_collate, sampler=val_sampler, 452 | worker_init_fn=partial(worker_init_fn, rank=rank, seed=seed)) 453 | 454 | #----------------------# 455 | # 记录eval的map曲线 456 | #----------------------# 457 | if local_rank == 0: 458 | eval_callback = EvalCallback(model, input_shape, class_names, num_classes, val_lines, log_dir, Cuda, \ 459 | eval_flag=eval_flag, period=eval_period) 460 | else: 461 | eval_callback = None 462 | 463 | #---------------------------------------# 464 | # 开始模型训练 465 | #---------------------------------------# 466 | for epoch in range(Init_Epoch, UnFreeze_Epoch): 467 | #---------------------------------------# 468 | # 如果模型有冻结学习部分 469 | # 则解冻,并设置参数 470 | #---------------------------------------# 471 | if epoch >= Freeze_Epoch and not UnFreeze_flag and Freeze_Train: 472 | batch_size = Unfreeze_batch_size 473 | 474 | #-------------------------------------------------------------------# 475 | # 判断当前batch_size,自适应调整学习率 476 | #-------------------------------------------------------------------# 477 | nbs = 16 478 | lr_limit_max = 5e-4 if optimizer_type == 'adam' else 1e-1 479 | lr_limit_min = 3e-4 if optimizer_type == 'adam' else 5e-4 480 | Init_lr_fit = min(max(batch_size / nbs * Init_lr, lr_limit_min), lr_limit_max) 481 | Min_lr_fit = min(max(batch_size / nbs * Min_lr, lr_limit_min * 1e-2), lr_limit_max * 1e-2) 482 | #---------------------------------------# 
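#-------------------------------------------------------------------#
#   Worked example of the adaptive-LR clamp above (illustrative
#   numbers only, using the Adam limits):
#       nbs = 16, batch_size = 8, Init_lr = 3e-4
#       scaled  : 8 / 16 * 3e-4 = 1.5e-4
#       clamped : min(max(1.5e-4, 3e-4), 5e-4) = 3e-4
#   Small batches are pulled up to lr_limit_min and large batches
#   are capped at lr_limit_max, so the effective LR stays in a
#   safe band regardless of the batch size you pick.
#-------------------------------------------------------------------#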
483 | # Obtain the learning-rate schedule function 484 | #---------------------------------------# 485 | lr_scheduler_func = get_lr_scheduler(lr_decay_type, Init_lr_fit, Min_lr_fit, UnFreeze_Epoch) 486 | 487 | for param in model.backbone_net.parameters(): 488 | param.requires_grad = True 489 | 490 | epoch_step = num_train // batch_size 491 | epoch_step_val = num_val // batch_size 492 | 493 | if epoch_step == 0 or epoch_step_val == 0: 494 | raise ValueError("The dataset is too small to continue training. Please expand the dataset.") 495 | 496 | if distributed: 497 | batch_size = batch_size // ngpus_per_node 498 | 499 | gen = DataLoader(train_dataset, shuffle = shuffle, batch_size = batch_size, num_workers = num_workers, pin_memory=True, 500 | drop_last=True, collate_fn=efficientdet_dataset_collate, sampler=train_sampler, 501 | worker_init_fn=partial(worker_init_fn, rank=rank, seed=seed)) 502 | gen_val = DataLoader(val_dataset , shuffle = shuffle, batch_size = batch_size, num_workers = num_workers, pin_memory=True, 503 | drop_last=True, collate_fn=efficientdet_dataset_collate, sampler=val_sampler, 504 | worker_init_fn=partial(worker_init_fn, rank=rank, seed=seed)) 505 | 506 | UnFreeze_flag = True 507 | 508 | if distributed: 509 | train_sampler.set_epoch(epoch) 510 | 511 | set_optimizer_lr(optimizer, lr_scheduler_func, epoch) 512 | 513 | fit_one_epoch(model_train, model, focal_loss, loss_history, eval_callback, optimizer, epoch, 514 | epoch_step, epoch_step_val, gen, gen_val, UnFreeze_Epoch, Cuda, fp16, scaler, save_period, save_dir, local_rank) 515 | 516 | if distributed: 517 | dist.barrier() 518 | 519 | if local_rank == 0: 520 | loss_history.writer.close() 521 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------------------------------- /utils/anchors.py: -------------------------------------------------------------------------------- 1 | import itertools 2 | 3 | import numpy as np 4 | import torch 5 | import torch.nn as nn 6 | 7 | 8 | class Anchors(nn.Module): 9 | def __init__(self, anchor_scale=4., pyramid_levels=[3, 4, 5, 6, 7]): 10 | super().__init__() 11 | self.anchor_scale = anchor_scale 12 | self.pyramid_levels = pyramid_levels 13 | # strides are [8, 16, 32, 64, 128]: the spacing between feature points on each pyramid level 14 | self.strides = [2 ** x for x in self.pyramid_levels] 15 | self.scales = np.array([2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)]) 16 | self.ratios = [(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)] 17 | 18 | def forward(self, image): 19 | image_shape = image.shape[2:] 20 | 21 | boxes_all = [] 22 | for stride in self.strides: 23 | boxes_level = [] 24 | for scale, ratio in itertools.product(self.scales, self.ratios): 25 | if image_shape[1] % stride != 0: 26 | raise ValueError('input size must be divisible by the stride.') 27 | base_anchor_size = self.anchor_scale * stride * scale 28 | anchor_size_x_2 = base_anchor_size * ratio[0] / 2.0 29 | anchor_size_y_2 = base_anchor_size * ratio[1] / 2.0 30 | x = np.arange(stride / 2, image_shape[1], stride) 31 | y = np.arange(stride / 2, image_shape[0], stride) 32 | 33 | xv, yv = np.meshgrid(x, y) 34 | 35 | xv = xv.reshape(-1) 36 | yv = yv.reshape(-1) 37 | 38 | # y1,x1,y2,x2 39 | boxes = np.vstack((yv - anchor_size_y_2, xv - anchor_size_x_2, 40 | yv + anchor_size_y_2, xv + anchor_size_x_2)) 41 | boxes = np.swapaxes(boxes, 0, 1) 42 | boxes_level.append(np.expand_dims(boxes, axis=1)) 43 | # concat anchors on the same level, then reshape to NxAx4 44 | boxes_level = 
np.concatenate(boxes_level, axis=1) 45 | boxes_all.append(boxes_level.reshape([-1, 4])) 46 | 47 | anchor_boxes = np.vstack(boxes_all) 48 | 49 | anchor_boxes = torch.from_numpy(anchor_boxes).to(image.device) 50 | anchor_boxes = anchor_boxes.unsqueeze(0) 51 | 52 | return anchor_boxes 53 | -------------------------------------------------------------------------------- /utils/callbacks.py: -------------------------------------------------------------------------------- 1 | import datetime 2 | import os 3 | 4 | import matplotlib 5 | import torch 6 | 7 | matplotlib.use('Agg') 8 | from matplotlib import pyplot as plt 9 | import scipy.signal 10 | 11 | import shutil 12 | import numpy as np 13 | from PIL import Image 14 | from torch.utils.tensorboard import SummaryWriter 15 | from tqdm import tqdm 16 | 17 | from .utils import cvtColor, preprocess_input, resize_image 18 | from .utils_bbox import decodebox, non_max_suppression 19 | from .utils_map import get_coco_map, get_map 20 | 21 | 22 | class LossHistory(): 23 | def __init__(self, log_dir, model, input_shape): 24 | self.log_dir = log_dir 25 | self.losses = [] 26 | self.val_loss = [] 27 | 28 | os.makedirs(self.log_dir) 29 | self.writer = SummaryWriter(self.log_dir) 30 | try: 31 | dummy_input = torch.randn(2, 3, input_shape[0], input_shape[1]) 32 | self.writer.add_graph(model, dummy_input) 33 | except: 34 | pass 35 | 36 | def append_loss(self, epoch, loss, val_loss): 37 | if not os.path.exists(self.log_dir): 38 | os.makedirs(self.log_dir) 39 | 40 | self.losses.append(loss) 41 | self.val_loss.append(val_loss) 42 | 43 | with open(os.path.join(self.log_dir, "epoch_loss.txt"), 'a') as f: 44 | f.write(str(loss)) 45 | f.write("\n") 46 | with open(os.path.join(self.log_dir, "epoch_val_loss.txt"), 'a') as f: 47 | f.write(str(val_loss)) 48 | f.write("\n") 49 | 50 | self.writer.add_scalar('loss', loss, epoch) 51 | self.writer.add_scalar('val_loss', val_loss, epoch) 52 | self.loss_plot() 53 | 54 | def loss_plot(self): 55 | iters = range(len(self.losses)) 56 | 57 | plt.figure() 58 | plt.plot(iters, self.losses, 'red', linewidth = 2, label='train loss') 59 | plt.plot(iters, self.val_loss, 'coral', linewidth = 2, label='val loss') 60 | try: 61 | if len(self.losses) < 25: 62 | num = 5 63 | else: 64 | num = 15 65 | 66 | plt.plot(iters, scipy.signal.savgol_filter(self.losses, num, 3), 'green', linestyle = '--', linewidth = 2, label='smooth train loss') 67 | plt.plot(iters, scipy.signal.savgol_filter(self.val_loss, num, 3), '#8B4513', linestyle = '--', linewidth = 2, label='smooth val loss') 68 | except: 69 | pass 70 | 71 | plt.grid(True) 72 | plt.xlabel('Epoch') 73 | plt.ylabel('Loss') 74 | plt.legend(loc="upper right") 75 | 76 | plt.savefig(os.path.join(self.log_dir, "epoch_loss.png")) 77 | 78 | plt.cla() 79 | plt.close("all") 80 | 81 | class EvalCallback(): 82 | def __init__(self, net, input_shape, class_names, num_classes, val_lines, log_dir, cuda, \ 83 | map_out_path=".temp_map_out", max_boxes=100, confidence=0.05, nms_iou=0.5, letterbox_image=True, MINOVERLAP=0.5, eval_flag=True, period=1): 84 | super(EvalCallback, self).__init__() 85 | 86 | self.net = net 87 | self.input_shape = input_shape 88 | self.class_names = class_names 89 | self.num_classes = num_classes 90 | self.val_lines = val_lines 91 | self.log_dir = log_dir 92 | self.cuda = cuda 93 | self.map_out_path = map_out_path 94 | self.max_boxes = max_boxes 95 | self.confidence = confidence 96 | self.nms_iou = nms_iou 97 | self.letterbox_image = letterbox_image 98 | self.MINOVERLAP = MINOVERLAP 99 | 
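#--------------------------------------------------------#
#   Why these defaults: mAP integrates a precision-recall
#   curve, so confidence is kept deliberately low (0.05) to
#   retain low-score detections, while max_boxes=100 caps
#   the per-image detections to keep evaluation fast.
#--------------------------------------------------------#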
self.eval_flag = eval_flag 100 | self.period = period 101 | 102 | self.maps = [0] 103 | self.epoches = [0] 104 | if self.eval_flag: 105 | with open(os.path.join(self.log_dir, "epoch_map.txt"), 'a') as f: 106 | f.write(str(0)) 107 | f.write("\n") 108 | 109 | #---------------------------------------------------# 110 | # 检测图片 111 | #---------------------------------------------------# 112 | def get_map_txt(self, image_id, image, class_names, map_out_path): 113 | f = open(os.path.join(map_out_path, "detection-results/"+image_id+".txt"),"w") 114 | image_shape = np.array(np.shape(image)[0:2]) 115 | #---------------------------------------------------------# 116 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 117 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 118 | #---------------------------------------------------------# 119 | image = cvtColor(image) 120 | #---------------------------------------------------------# 121 | # 给图像增加灰条,实现不失真的resize 122 | # 也可以直接resize进行识别 123 | #---------------------------------------------------------# 124 | image_data = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image) 125 | #---------------------------------------------------------# 126 | # 添加上batch_size维度,图片预处理,归一化。 127 | #---------------------------------------------------------# 128 | image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0) 129 | 130 | with torch.no_grad(): 131 | images = torch.from_numpy(image_data) 132 | if self.cuda: 133 | images = images.cuda() 134 | #---------------------------------------------------------# 135 | # 传入网络当中进行预测 136 | #---------------------------------------------------------# 137 | _, regression, classification, anchors = self.net(images) 138 | 139 | #-----------------------------------------------------------# 140 | # 将预测结果进行解码 141 | #-----------------------------------------------------------# 142 | outputs = decodebox(regression, anchors, self.input_shape) 143 | results = non_max_suppression(torch.cat([outputs, classification], axis=-1), self.input_shape, 144 | image_shape, self.letterbox_image, conf_thres = self.confidence, nms_thres = self.nms_iou) 145 | 146 | if results[0] is None: 147 | return 148 | 149 | top_label = np.array(results[0][:, 5], dtype = 'int32') 150 | top_conf = results[0][:, 4] 151 | top_boxes = results[0][:, :4] 152 | 153 | top_100 = np.argsort(top_conf)[::-1][:self.max_boxes] 154 | top_boxes = top_boxes[top_100] 155 | top_conf = top_conf[top_100] 156 | top_label = top_label[top_100] 157 | 158 | for i, c in list(enumerate(top_label)): 159 | predicted_class = self.class_names[int(c)] 160 | box = top_boxes[i] 161 | score = str(top_conf[i]) 162 | 163 | top, left, bottom, right = box 164 | if predicted_class not in class_names: 165 | continue 166 | 167 | f.write("%s %s %s %s %s %s\n" % (predicted_class, score[:6], str(int(left)), str(int(top)), str(int(right)),str(int(bottom)))) 168 | 169 | f.close() 170 | return 171 | 172 | def on_epoch_end(self, epoch, model_eval): 173 | if epoch % self.period == 0 and self.eval_flag: 174 | self.net = model_eval 175 | if not os.path.exists(self.map_out_path): 176 | os.makedirs(self.map_out_path) 177 | if not os.path.exists(os.path.join(self.map_out_path, "ground-truth")): 178 | os.makedirs(os.path.join(self.map_out_path, "ground-truth")) 179 | if not os.path.exists(os.path.join(self.map_out_path, "detection-results")): 180 | os.makedirs(os.path.join(self.map_out_path, "detection-results")) 181 | print("Get map.") 182 | for annotation_line in 
tqdm(self.val_lines): 183 | line = annotation_line.split() 184 | image_id = os.path.basename(line[0]).split('.')[0] 185 | #------------------------------# 186 | # Read the image; it is converted to RGB inside get_map_txt 187 | #------------------------------# 188 | image = Image.open(line[0]) 189 | #------------------------------# 190 | # Get the ground-truth boxes 191 | #------------------------------# 192 | gt_boxes = np.array([np.array(list(map(int,box.split(',')))) for box in line[1:]]) 193 | #------------------------------# 194 | # Write the detection-result txt 195 | #------------------------------# 196 | self.get_map_txt(image_id, image, self.class_names, self.map_out_path) 197 | 198 | #------------------------------# 199 | # Write the ground-truth txt 200 | #------------------------------# 201 | with open(os.path.join(self.map_out_path, "ground-truth/"+image_id+".txt"), "w") as new_f: 202 | for box in gt_boxes: 203 | left, top, right, bottom, obj = box 204 | obj_name = self.class_names[obj] 205 | new_f.write("%s %s %s %s %s\n" % (obj_name, left, top, right, bottom)) 206 | 207 | print("Calculate mAP.") 208 | try: 209 | temp_map = get_coco_map(class_names = self.class_names, path = self.map_out_path)[1] 210 | except: 211 | temp_map = get_map(self.MINOVERLAP, False, path = self.map_out_path) 212 | self.maps.append(temp_map) 213 | self.epoches.append(epoch) 214 | 215 | with open(os.path.join(self.log_dir, "epoch_map.txt"), 'a') as f: 216 | f.write(str(temp_map)) 217 | f.write("\n") 218 | 219 | plt.figure() 220 | plt.plot(self.epoches, self.maps, 'red', linewidth = 2, label='val mAP') 221 | 222 | plt.grid(True) 223 | plt.xlabel('Epoch') 224 | plt.ylabel('mAP %s'%str(self.MINOVERLAP)) 225 | plt.title('mAP Curve') 226 | plt.legend(loc="upper right") 227 | 228 | plt.savefig(os.path.join(self.log_dir, "epoch_map.png")) 229 | plt.cla() 230 | plt.close("all") 231 | 232 | print("Get mAP done.") 233 | shutil.rmtree(self.map_out_path) 234 | -------------------------------------------------------------------------------- /utils/dataloader.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import numpy as np 3 | import torch 4 | from PIL import Image 5 | from torch.utils.data.dataset import Dataset 6 | 7 | from utils.utils import cvtColor, preprocess_input 8 | 9 | 10 | class EfficientdetDataset(Dataset): 11 | def __init__(self, annotation_lines, input_shape, num_classes, train): 12 | super(EfficientdetDataset, self).__init__() 13 | self.annotation_lines = annotation_lines 14 | self.length = len(self.annotation_lines) 15 | self.input_shape = input_shape 16 | self.num_classes = num_classes 17 | self.train = train 18 | 19 | def __len__(self): 20 | return self.length 21 | 22 | def __getitem__(self, index): 23 | index = index % self.length 24 | 25 | image, box = self.get_random_data(self.annotation_lines[index], self.input_shape, random = self.train) 26 | 27 | image = np.transpose(preprocess_input(np.array(image, dtype=np.float32)),(2,0,1)) 28 | box = np.array(box, dtype=np.float32) 29 | return image, box 30 | 31 | def rand(self, a=0, b=1): 32 | return np.random.rand()*(b-a) + a 33 | 34 | def get_random_data(self, annotation_line, input_shape, jitter=.3, hue=.1, sat=0.7, val=0.4, random=True): 35 | line = annotation_line.split() 36 | #------------------------------# 37 | # Read the image and convert it to RGB 38 | #------------------------------# 39 | image = Image.open(line[0]) 40 | image = cvtColor(image) 41 | #------------------------------# 42 | # Get the image size and the target input size 43 | #------------------------------# 44 | iw, ih = image.size 45 | h, w = input_shape 46 | 
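#------------------------------------------------------------------#
#   The annotation format assumed here is the one written by
#   voc_annotation.py: one image per line, an absolute path followed
#   by space-separated "x1,y1,x2,y2,class_id" groups, e.g. the
#   hypothetical line
#       /data/VOCdevkit/VOC2007/JPEGImages/000012.jpg 48,240,195,371,11
#   so line[0] is the image path and line[1:] are the boxes
#   parsed just below.
#------------------------------------------------------------------#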
#------------------------------# 47 | # Get the ground-truth boxes 48 | #------------------------------# 49 | box = np.array([np.array(list(map(int,box.split(',')))) for box in line[1:]]) 50 | 51 | if not random: 52 | scale = min(w/iw, h/ih) 53 | nw = int(iw*scale) 54 | nh = int(ih*scale) 55 | dx = (w-nw)//2 56 | dy = (h-nh)//2 57 | 58 | #---------------------------------# 59 | # Pad the unused area with gray bars 60 | #---------------------------------# 61 | image = image.resize((nw,nh), Image.BICUBIC) 62 | new_image = Image.new('RGB', (w,h), (128,128,128)) 63 | new_image.paste(image, (dx, dy)) 64 | image_data = np.array(new_image, np.float32) 65 | 66 | #---------------------------------# 67 | # Adjust the ground-truth boxes accordingly 68 | #---------------------------------# 69 | if len(box)>0: 70 | np.random.shuffle(box) 71 | box[:, [0,2]] = box[:, [0,2]]*nw/iw + dx 72 | box[:, [1,3]] = box[:, [1,3]]*nh/ih + dy 73 | box[:, 0:2][box[:, 0:2]<0] = 0 74 | box[:, 2][box[:, 2]>w] = w 75 | box[:, 3][box[:, 3]>h] = h 76 | box_w = box[:, 2] - box[:, 0] 77 | box_h = box[:, 3] - box[:, 1] 78 | box = box[np.logical_and(box_w>1, box_h>1)] # discard invalid box 79 | 80 | return image_data, box 81 | 82 | #------------------------------------------# 83 | # Rescale the image and jitter its aspect ratio 84 | #------------------------------------------# 85 | new_ar = w/h * self.rand(1-jitter,1+jitter) / self.rand(1-jitter,1+jitter) 86 | scale = self.rand(.25, 2) 87 | if new_ar < 1: 88 | nh = int(scale*h) 89 | nw = int(nh*new_ar) 90 | else: 91 | nw = int(scale*w) 92 | nh = int(nw/new_ar) 93 | image = image.resize((nw,nh), Image.BICUBIC) 94 | 95 | #------------------------------------------# 96 | # Pad the unused area with gray bars 97 | #------------------------------------------# 98 | dx = int(self.rand(0, w-nw)) 99 | dy = int(self.rand(0, h-nh)) 100 | new_image = Image.new('RGB', (w,h), (128,128,128)) 101 | new_image.paste(image, (dx, dy)) 102 | image = new_image 103 | 104 | #------------------------------------------# 105 | # Randomly flip the image 106 | #------------------------------------------# 107 | flip = self.rand()<.5 108 | if flip: image = image.transpose(Image.FLIP_LEFT_RIGHT) 109 | 110 | image_data = np.array(image, np.uint8) 111 | #---------------------------------# 112 | # Color-space (HSV) augmentation 113 | # Compute the per-channel gains 114 | #---------------------------------# 115 | r = np.random.uniform(-1, 1, 3) * [hue, sat, val] + 1 116 | #---------------------------------# 117 | # Convert the image to HSV 118 | #---------------------------------# 119 | hue, sat, val = cv2.split(cv2.cvtColor(image_data, cv2.COLOR_RGB2HSV)) 120 | dtype = image_data.dtype 121 | #---------------------------------# 122 | # Apply the lookup-table transform 123 | #---------------------------------# 124 | x = np.arange(0, 256, dtype=r.dtype) 125 | lut_hue = ((x * r[0]) % 180).astype(dtype) 126 | lut_sat = np.clip(x * r[1], 0, 255).astype(dtype) 127 | lut_val = np.clip(x * r[2], 0, 255).astype(dtype) 128 | 129 | image_data = cv2.merge((cv2.LUT(hue, lut_hue), cv2.LUT(sat, lut_sat), cv2.LUT(val, lut_val))) 130 | image_data = cv2.cvtColor(image_data, cv2.COLOR_HSV2RGB) 131 | 132 | #---------------------------------# 133 | # Adjust the ground-truth boxes accordingly 134 | #---------------------------------# 135 | if len(box)>0: 136 | np.random.shuffle(box) 137 | box[:, [0,2]] = box[:, [0,2]]*nw/iw + dx 138 | box[:, [1,3]] = box[:, [1,3]]*nh/ih + dy 139 | if flip: box[:, [0,2]] = w - box[:, [2,0]] 140 | box[:, 0:2][box[:, 0:2]<0] = 0 141 | box[:, 2][box[:, 2]>w] = w 142 | box[:, 3][box[:, 3]>h] = h 143 | box_w = box[:, 2] - box[:, 0] 144 | box_h = box[:, 3] - box[:, 1] 145 | box = box[np.logical_and(box_w>1, box_h>1)] 146 | 147 | return image_data, box 148 | 149 | # 
DataLoader中collate_fn使用 150 | def efficientdet_dataset_collate(batch): 151 | images = [] 152 | bboxes = [] 153 | for img, box in batch: 154 | images.append(img) 155 | bboxes.append(box) 156 | images = torch.from_numpy(np.array(images)).type(torch.FloatTensor) 157 | bboxes = [torch.from_numpy(ann).type(torch.FloatTensor) for ann in bboxes] 158 | return images, bboxes 159 | 160 | -------------------------------------------------------------------------------- /utils/utils.py: -------------------------------------------------------------------------------- 1 | import random 2 | 3 | import numpy as np 4 | import torch 5 | from PIL import Image 6 | 7 | #---------------------------------------------------------------------# 8 | # 用于预测的图像大小,无需修改,由phi选择 9 | #---------------------------------------------------------------------# 10 | image_sizes = [512, 640, 768, 896, 1024, 1280, 1408, 1536] 11 | 12 | #---------------------------------------------------------# 13 | # 将图像转换成RGB图像,防止灰度图在预测时报错。 14 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 15 | #---------------------------------------------------------# 16 | def cvtColor(image): 17 | if len(np.shape(image)) == 3 and np.shape(image)[2] == 3: 18 | return image 19 | else: 20 | image = image.convert('RGB') 21 | return image 22 | 23 | #---------------------------------------------------# 24 | # 对输入图像进行resize 25 | #---------------------------------------------------# 26 | def resize_image(image, size, letterbox_image): 27 | iw, ih = image.size 28 | w, h = size 29 | if letterbox_image: 30 | scale = min(w/iw, h/ih) 31 | nw = int(iw*scale) 32 | nh = int(ih*scale) 33 | 34 | image = image.resize((nw,nh), Image.BICUBIC) 35 | new_image = Image.new('RGB', size, (128,128,128)) 36 | new_image.paste(image, ((w-nw)//2, (h-nh)//2)) 37 | else: 38 | new_image = image.resize((w, h), Image.BICUBIC) 39 | return new_image 40 | 41 | #---------------------------------------------------# 42 | # 获得类 43 | #---------------------------------------------------# 44 | def get_classes(classes_path): 45 | with open(classes_path, encoding='utf-8') as f: 46 | class_names = f.readlines() 47 | class_names = [c.strip() for c in class_names] 48 | return class_names, len(class_names) 49 | 50 | #---------------------------------------------------# 51 | # 获得学习率 52 | #---------------------------------------------------# 53 | def get_lr(optimizer): 54 | for param_group in optimizer.param_groups: 55 | return param_group['lr'] 56 | 57 | #---------------------------------------------------# 58 | # 设置种子 59 | #---------------------------------------------------# 60 | def seed_everything(seed=11): 61 | random.seed(seed) 62 | np.random.seed(seed) 63 | torch.manual_seed(seed) 64 | torch.cuda.manual_seed(seed) 65 | torch.cuda.manual_seed_all(seed) 66 | torch.backends.cudnn.deterministic = True 67 | torch.backends.cudnn.benchmark = False 68 | 69 | #---------------------------------------------------# 70 | # 设置Dataloader的种子 71 | #---------------------------------------------------# 72 | def worker_init_fn(worker_id, rank, seed): 73 | worker_seed = rank + seed 74 | random.seed(worker_seed) 75 | np.random.seed(worker_seed) 76 | torch.manual_seed(worker_seed) 77 | 78 | def preprocess_input(image): 79 | image /= 255 80 | mean = (0.485, 0.456, 0.406) 81 | std = (0.229, 0.224, 0.225) 82 | image -= mean 83 | image /= std 84 | return image 85 | 86 | def show_config(**kwargs): 87 | print('Configurations:') 88 | print('-' * 70) 89 | print('|%25s | %40s|' % ('keys', 'values')) 90 | print('-' * 70) 91 | for key, value in 
kwargs.items(): 92 | print('|%25s | %40s|' % (str(key), str(value))) 93 | print('-' * 70) 94 | 95 | def download_weights(backbone, model_dir="./model_data"): 96 | import os 97 | 98 | from torch.hub import load_state_dict_from_url 99 | 100 | download_urls = { 101 | 'efficientnet-b0': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b0.pth', 102 | 'efficientnet-b1': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b1.pth', 103 | 'efficientnet-b2': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b2.pth', 104 | 'efficientnet-b3': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b3.pth', 105 | 'efficientnet-b4': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b4.pth', 106 | 'efficientnet-b5': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b5.pth', 107 | 'efficientnet-b6': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b6.pth', 108 | 'efficientnet-b7': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b7.pth', 109 | } 110 | url = download_urls[backbone] 111 | 112 | if not os.path.exists(model_dir): 113 | os.makedirs(model_dir) 114 | load_state_dict_from_url(url, model_dir) -------------------------------------------------------------------------------- /utils/utils_bbox.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | from torchvision.ops import nms 4 | 5 | def decodebox(regression, anchors, input_shape): 6 | dtype = regression.dtype 7 | anchors = anchors.to(dtype) 8 | #--------------------------------------# 9 | # 计算先验框的中心 10 | #--------------------------------------# 11 | y_centers_a = (anchors[..., 0] + anchors[..., 2]) / 2 12 | x_centers_a = (anchors[..., 1] + anchors[..., 3]) / 2 13 | 14 | #--------------------------------------# 15 | # 计算先验框的宽高 16 | #--------------------------------------# 17 | ha = anchors[..., 2] - anchors[..., 0] 18 | wa = anchors[..., 3] - anchors[..., 1] 19 | 20 | #--------------------------------------# 21 | # 计算调整后先验框的宽高 22 | # 即计算预测框的宽高 23 | #--------------------------------------# 24 | w = regression[..., 3].exp() * wa 25 | h = regression[..., 2].exp() * ha 26 | 27 | #--------------------------------------# 28 | # 计算调整后先验框的中心 29 | # 即计算预测框的中心 30 | #--------------------------------------# 31 | y_centers = regression[..., 0] * ha + y_centers_a 32 | x_centers = regression[..., 1] * wa + x_centers_a 33 | 34 | #--------------------------------------# 35 | # 计算预测框的左上角右下角 36 | #--------------------------------------# 37 | ymin = y_centers - h / 2. 38 | xmin = x_centers - w / 2. 39 | ymax = y_centers + h / 2. 40 | xmax = x_centers + w / 2. 
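#--------------------------------------#
#   Worked example of the decode above
#   (illustrative numbers only):
#   anchor (y1, x1, y2, x2) = (0, 0, 64, 64)
#       -> center (32, 32), ha = wa = 64
#   regression (dy, dx, dh, dw) = (0.5, 0, log 2, 0)
#       -> h = exp(log 2) * 64 = 128, w = 64
#       -> y_center = 0.5 * 64 + 32 = 64, x_center = 32
#   corners: ymin = 0, xmin = 0, ymax = 128, xmax = 64,
#   stacked below as (xmin, ymin, xmax, ymax) = (0, 0, 64, 128)
#--------------------------------------#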
41 | 42 | #--------------------------------------# 43 | # 将预测框的结果进行堆叠 44 | #--------------------------------------# 45 | boxes = torch.stack([xmin, ymin, xmax, ymax], dim=2) 46 | 47 | # fig = plt.figure() 48 | # ax = fig.add_subplot(121) 49 | # grid_x = x_centers_a[0,-4*4*9:] 50 | # grid_y = y_centers_a[0,-4*4*9:] 51 | # plt.ylim(-600,1200) 52 | # plt.xlim(-600,1200) 53 | # plt.gca().invert_yaxis() 54 | # plt.scatter(grid_x.cpu(),grid_y.cpu()) 55 | 56 | # anchor_left = anchors[0,-4*4*9:,1] 57 | # anchor_top = anchors[0,-4*4*9:,0] 58 | # anchor_w = wa[0,-4*4*9:] 59 | # anchor_h = ha[0,-4*4*9:] 60 | 61 | # for i in range(9,18): 62 | # rect1 = plt.Rectangle([anchor_left[i],anchor_top[i]],anchor_w[i],anchor_h[i],color="r",fill=False) 63 | # ax.add_patch(rect1) 64 | 65 | # ax = fig.add_subplot(122) 66 | 67 | # grid_x = x_centers_a[0,-4*4*9:] 68 | # grid_y = y_centers_a[0,-4*4*9:] 69 | # plt.scatter(grid_x.cpu(),grid_y.cpu()) 70 | # plt.ylim(-600,1200) 71 | # plt.xlim(-600,1200) 72 | # plt.gca().invert_yaxis() 73 | 74 | # y_centers = y_centers[0,-4*4*9:] 75 | # x_centers = x_centers[0,-4*4*9:] 76 | 77 | # pre_left = xmin[0,-4*4*9:] 78 | # pre_top = ymin[0,-4*4*9:] 79 | 80 | # pre_w = xmax[0,-4*4*9:]-xmin[0,-4*4*9:] 81 | # pre_h = ymax[0,-4*4*9:]-ymin[0,-4*4*9:] 82 | 83 | # for i in range(9,18): 84 | # plt.scatter(x_centers[i].cpu(),y_centers[i].cpu(),c='r') 85 | # rect1 = plt.Rectangle([pre_left[i],pre_top[i]],pre_w[i],pre_h[i],color="r",fill=False) 86 | # ax.add_patch(rect1) 87 | 88 | # plt.show() 89 | boxes[:, :, [0, 2]] = boxes[:, :, [0, 2]] / input_shape[1] 90 | boxes[:, :, [1, 3]] = boxes[:, :, [1, 3]] / input_shape[0] 91 | 92 | boxes = torch.clamp(boxes, min = 0, max = 1) 93 | return boxes 94 | 95 | def bbox_iou(box1, box2, x1y1x2y2=True): 96 | """ 97 | 计算IOU 98 | """ 99 | if not x1y1x2y2: 100 | b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2 101 | b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2 102 | b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2 103 | b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2 104 | else: 105 | b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3] 106 | b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3] 107 | 108 | inter_rect_x1 = torch.max(b1_x1, b2_x1) 109 | inter_rect_y1 = torch.max(b1_y1, b2_y1) 110 | inter_rect_x2 = torch.min(b1_x2, b2_x2) 111 | inter_rect_y2 = torch.min(b1_y2, b2_y2) 112 | 113 | inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1, min=0) * \ 114 | torch.clamp(inter_rect_y2 - inter_rect_y1, min=0) 115 | 116 | b1_area = (b1_x2 - b1_x1) * (b1_y2 - b1_y1) 117 | b2_area = (b2_x2 - b2_x1) * (b2_y2 - b2_y1) 118 | 119 | iou = inter_area / torch.clamp(b1_area + b2_area - inter_area, min = 1e-6) 120 | 121 | return iou 122 | 123 | def efficientdet_correct_boxes(box_xy, box_wh, input_shape, image_shape, letterbox_image): 124 | #-----------------------------------------------------------------# 125 | # 把y轴放前面是因为方便预测框和图像的宽高进行相乘 126 | #-----------------------------------------------------------------# 127 | box_yx = box_xy[..., ::-1] 128 | box_hw = box_wh[..., ::-1] 129 | input_shape = np.array(input_shape) 130 | image_shape = np.array(image_shape) 131 | 132 | if letterbox_image: 133 | #-----------------------------------------------------------------# 134 | # 这里求出来的offset是图像有效区域相对于图像左上角的偏移情况 135 | # new_shape指的是宽高缩放情况 136 | #-----------------------------------------------------------------# 137 | new_shape = 
np.round(image_shape * np.min(input_shape/image_shape)) 138 | offset = (input_shape - new_shape)/2./input_shape 139 | scale = input_shape/new_shape 140 | 141 | box_yx = (box_yx - offset) * scale 142 | box_hw *= scale 143 | 144 | box_mins = box_yx - (box_hw / 2.) 145 | box_maxes = box_yx + (box_hw / 2.) 146 | boxes = np.concatenate([box_mins[..., 0:1], box_mins[..., 1:2], box_maxes[..., 0:1], box_maxes[..., 1:2]], axis=-1) 147 | boxes *= np.concatenate([image_shape, image_shape], axis=-1) 148 | return boxes 149 | 150 | def non_max_suppression(prediction, input_shape, image_shape, letterbox_image, conf_thres=0.5, nms_thres=0.4): 151 | output = [None for _ in range(len(prediction))] 152 | 153 | #----------------------------------------------------------# 154 | # 预测只用一张图片,只会进行一次 155 | #----------------------------------------------------------# 156 | for i, image_pred in enumerate(prediction): 157 | #----------------------------------------------------------# 158 | # 对种类预测部分取max。 159 | # class_conf [num_anchors, 1] 种类置信度 160 | # class_pred [num_anchors, 1] 种类 161 | #----------------------------------------------------------# 162 | class_conf, class_pred = torch.max(image_pred[:, 4:], 1, keepdim=True) 163 | 164 | #----------------------------------------------------------# 165 | # 利用置信度进行第一轮筛选 166 | #----------------------------------------------------------# 167 | conf_mask = (class_conf[:, 0] >= conf_thres).squeeze() 168 | 169 | #----------------------------------------------------------# 170 | # 根据置信度进行预测结果的筛选 171 | #----------------------------------------------------------# 172 | image_pred = image_pred[conf_mask] 173 | class_conf = class_conf[conf_mask] 174 | class_pred = class_pred[conf_mask] 175 | if not image_pred.size(0): 176 | continue 177 | #-------------------------------------------------------------------------# 178 | # detections [num_anchors, 6] 179 | # 6的内容为:x1, y1, x2, y2, class_conf, class_pred 180 | #-------------------------------------------------------------------------# 181 | detections = torch.cat((image_pred[:, :4], class_conf.float(), class_pred.float()), 1) 182 | 183 | #------------------------------------------# 184 | # 获得预测结果中包含的所有种类 185 | #------------------------------------------# 186 | unique_labels = detections[:, -1].cpu().unique() 187 | 188 | if prediction.is_cuda: 189 | unique_labels = unique_labels.cuda() 190 | detections = detections.cuda() 191 | 192 | for c in unique_labels: 193 | #------------------------------------------# 194 | # 获得某一类得分筛选后全部的预测结果 195 | #------------------------------------------# 196 | detections_class = detections[detections[:, -1] == c] 197 | 198 | #------------------------------------------# 199 | # 使用官方自带的非极大抑制会速度更快一些! 
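#   An equivalent one-call alternative to this per-class loop is
#   torchvision.ops.batched_nms, which offsets boxes by class index
#   so each class is suppressed independently; a sketch, assuming a
#   torchvision version that ships batched_nms:
#       from torchvision.ops import batched_nms
#       keep_all = batched_nms(detections[:, :4], detections[:, 4],
#                              detections[:, 5].long(), nms_thres)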
200 | #------------------------------------------# 201 | keep = nms( 202 | detections_class[:, :4], 203 | detections_class[:, 4], 204 | nms_thres 205 | ) 206 | max_detections = detections_class[keep] 207 | 208 | # #------------------------------------------# 209 | # # 按照存在物体的置信度排序 210 | # #------------------------------------------# 211 | # _, conf_sort_index = torch.sort(detections_class[:, 4], descending=True) 212 | # detections_class = detections_class[conf_sort_index] 213 | # #------------------------------------------# 214 | # # 进行非极大抑制 215 | # #------------------------------------------# 216 | # max_detections = [] 217 | # while detections_class.size(0): 218 | # #---------------------------------------------------# 219 | # # 取出这一类置信度最高的,一步一步往下判断。 220 | # # 判断重合程度是否大于nms_thres,如果是则去除掉 221 | # #---------------------------------------------------# 222 | # max_detections.append(detections_class[0].unsqueeze(0)) 223 | # if len(detections_class) == 1: 224 | # break 225 | # ious = bbox_iou(max_detections[-1], detections_class[1:]) 226 | # detections_class = detections_class[1:][ious < nms_thres] 227 | # #------------------------------------------# 228 | # # 堆叠 229 | # #------------------------------------------# 230 | # max_detections = torch.cat(max_detections).data 231 | 232 | output[i] = max_detections if output[i] is None else torch.cat((output[i], max_detections)) 233 | 234 | if output[i] is not None: 235 | output[i] = output[i].cpu().numpy() 236 | box_xy, box_wh = (output[i][:, 0:2] + output[i][:, 2:4])/2, output[i][:, 2:4] - output[i][:, 0:2] 237 | output[i][:, :4] = efficientdet_correct_boxes(box_xy, box_wh, input_shape, image_shape, letterbox_image) 238 | return output 239 | -------------------------------------------------------------------------------- /utils/utils_fit.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | import torch 4 | from tqdm import tqdm 5 | 6 | from utils.utils import get_lr 7 | 8 | 9 | def fit_one_epoch(model_train, model, focal_loss, loss_history, eval_callback, optimizer, epoch, epoch_step, epoch_step_val, gen, gen_val, Epoch, cuda, fp16, scaler, save_period, save_dir, local_rank=0): 10 | loss = 0 11 | val_loss = 0 12 | 13 | if local_rank == 0: 14 | print('Start Train') 15 | pbar = tqdm(total=epoch_step,desc=f'Epoch {epoch + 1}/{Epoch}',postfix=dict,mininterval=0.3) 16 | model_train.train() 17 | for iteration, batch in enumerate(gen): 18 | if iteration >= epoch_step: 19 | break 20 | images, targets = batch[0], batch[1] 21 | with torch.no_grad(): 22 | if cuda: 23 | images = images.cuda(local_rank) 24 | targets = [ann.cuda(local_rank) for ann in targets] 25 | #----------------------# 26 | # 清零梯度 27 | #----------------------# 28 | optimizer.zero_grad() 29 | if not fp16: 30 | #-------------------# 31 | # 获得预测结果 32 | #-------------------# 33 | _, regression, classification, anchors = model_train(images) 34 | #-------------------# 35 | # 计算损失 36 | #-------------------# 37 | loss_value, _, _ = focal_loss(classification, regression, anchors, targets, cuda = cuda) 38 | 39 | loss_value.backward() 40 | optimizer.step() 41 | else: 42 | from torch.cuda.amp import autocast 43 | with autocast(): 44 | #-------------------# 45 | # 获得预测结果 46 | #-------------------# 47 | _, regression, classification, anchors = model_train(images) 48 | #-------------------# 49 | # 计算损失 50 | #-------------------# 51 | loss_value, _, _ = focal_loss(classification, regression, anchors, targets, cuda = cuda) 52 | 53 | 
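#----------------------#
#   fp16 note: GradScaler multiplies the loss before backward so
#   small half-precision gradients do not underflow to zero;
#   scaler.step() unscales them (and skips the optimizer step if
#   inf/NaN appears) and scaler.update() re-tunes the scale factor,
#   hence the three-call pattern below.
#----------------------#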
#----------------------# 54 | # 反向传播 55 | #----------------------# 56 | scaler.scale(loss_value).backward() 57 | scaler.step(optimizer) 58 | scaler.update() 59 | 60 | loss += loss_value.item() 61 | 62 | if local_rank == 0: 63 | pbar.set_postfix(**{'loss' : loss / (iteration + 1), 64 | 'lr' : get_lr(optimizer)}) 65 | pbar.update(1) 66 | 67 | if local_rank == 0: 68 | pbar.close() 69 | print('Finish Train') 70 | print('Start Validation') 71 | pbar = tqdm(total=epoch_step_val, desc=f'Epoch {epoch + 1}/{Epoch}',postfix=dict,mininterval=0.3) 72 | 73 | model_train.eval() 74 | for iteration, batch in enumerate(gen_val): 75 | if iteration >= epoch_step_val: 76 | break 77 | images, targets = batch[0], batch[1] 78 | with torch.no_grad(): 79 | if cuda: 80 | images = images.cuda(local_rank) 81 | targets = [ann.cuda(local_rank) for ann in targets] 82 | #----------------------# 83 | # 清零梯度 84 | #----------------------# 85 | optimizer.zero_grad() 86 | #-------------------# 87 | # 获得预测结果 88 | #-------------------# 89 | _, regression, classification, anchors = model_train(images) 90 | #-------------------# 91 | # 计算损失 92 | #-------------------# 93 | loss_value, _, _ = focal_loss(classification, regression, anchors, targets, cuda = cuda) 94 | 95 | val_loss += loss_value.item() 96 | if local_rank == 0: 97 | pbar.set_postfix(**{'val_loss': val_loss / (iteration + 1)}) 98 | pbar.update(1) 99 | 100 | if local_rank == 0: 101 | pbar.close() 102 | print('Finish Validation') 103 | loss_history.append_loss(epoch + 1, loss / epoch_step, val_loss / epoch_step_val) 104 | eval_callback.on_epoch_end(epoch + 1, model_train) 105 | print('Epoch:'+ str(epoch + 1) + '/' + str(Epoch)) 106 | print('Total Loss: %.3f || Val Loss: %.3f ' % (loss / epoch_step, val_loss / epoch_step_val)) 107 | 108 | #-----------------------------------------------# 109 | # 保存权值 110 | #-----------------------------------------------# 111 | if (epoch + 1) % save_period == 0 or epoch + 1 == Epoch: 112 | torch.save(model.state_dict(), os.path.join(save_dir, 'ep%03d-loss%.3f-val_loss%.3f.pth' % (epoch + 1, loss / epoch_step, val_loss / epoch_step_val))) 113 | 114 | if len(loss_history.val_loss) <= 1 or (val_loss / epoch_step_val) <= min(loss_history.val_loss): 115 | print('Save best model to best_epoch_weights.pth') 116 | torch.save(model.state_dict(), os.path.join(save_dir, "best_epoch_weights.pth")) 117 | 118 | torch.save(model.state_dict(), os.path.join(save_dir, "last_epoch_weights.pth")) -------------------------------------------------------------------------------- /voc_annotation.py: -------------------------------------------------------------------------------- 1 | import os 2 | import random 3 | import xml.etree.ElementTree as ET 4 | 5 | import numpy as np 6 | 7 | from utils.utils import get_classes 8 | 9 | #--------------------------------------------------------------------------------------------------------------------------------# 10 | # annotation_mode用于指定该文件运行时计算的内容 11 | # annotation_mode为0代表整个标签处理过程,包括获得VOCdevkit/VOC2007/ImageSets里面的txt以及训练用的2007_train.txt、2007_val.txt 12 | # annotation_mode为1代表获得VOCdevkit/VOC2007/ImageSets里面的txt 13 | # annotation_mode为2代表获得训练用的2007_train.txt、2007_val.txt 14 | #--------------------------------------------------------------------------------------------------------------------------------# 15 | annotation_mode = 0 16 | #-------------------------------------------------------------------# 17 | # 必须要修改,用于生成2007_train.txt、2007_val.txt的目标信息 18 | # 与训练和预测所用的classes_path一致即可 19 | # 如果生成的2007_train.txt里面没有目标信息 
20 | # 那么就是因为classes没有设定正确 21 | # 仅在annotation_mode为0和2的时候有效 22 | #-------------------------------------------------------------------# 23 | classes_path = 'model_data/voc_classes.txt' 24 | #--------------------------------------------------------------------------------------------------------------------------------# 25 | # trainval_percent用于指定(训练集+验证集)与测试集的比例,默认情况下 (训练集+验证集):测试集 = 9:1 26 | # train_percent用于指定(训练集+验证集)中训练集与验证集的比例,默认情况下 训练集:验证集 = 9:1 27 | # 仅在annotation_mode为0和1的时候有效 28 | #--------------------------------------------------------------------------------------------------------------------------------# 29 | trainval_percent = 0.9 30 | train_percent = 0.9 31 | #-------------------------------------------------------# 32 | # 指向VOC数据集所在的文件夹 33 | # 默认指向根目录下的VOC数据集 34 | #-------------------------------------------------------# 35 | VOCdevkit_path = 'VOCdevkit' 36 | 37 | VOCdevkit_sets = [('2007', 'train'), ('2007', 'val')] 38 | classes, _ = get_classes(classes_path) 39 | 40 | #-------------------------------------------------------# 41 | # 统计目标数量 42 | #-------------------------------------------------------# 43 | photo_nums = np.zeros(len(VOCdevkit_sets)) 44 | nums = np.zeros(len(classes)) 45 | def convert_annotation(year, image_id, list_file): 46 | in_file = open(os.path.join(VOCdevkit_path, 'VOC%s/Annotations/%s.xml'%(year, image_id)), encoding='utf-8') 47 | tree=ET.parse(in_file) 48 | root = tree.getroot() 49 | 50 | for obj in root.iter('object'): 51 | difficult = 0 52 | if obj.find('difficult')!=None: 53 | difficult = obj.find('difficult').text 54 | cls = obj.find('name').text 55 | if cls not in classes or int(difficult)==1: 56 | continue 57 | cls_id = classes.index(cls) 58 | xmlbox = obj.find('bndbox') 59 | b = (int(float(xmlbox.find('xmin').text)), int(float(xmlbox.find('ymin').text)), int(float(xmlbox.find('xmax').text)), int(float(xmlbox.find('ymax').text))) 60 | list_file.write(" " + ",".join([str(a) for a in b]) + ',' + str(cls_id)) 61 | 62 | nums[classes.index(cls)] = nums[classes.index(cls)] + 1 63 | 64 | if __name__ == "__main__": 65 | random.seed(0) 66 | if " " in os.path.abspath(VOCdevkit_path): 67 | raise ValueError("数据集存放的文件夹路径与图片名称中不可以存在空格,否则会影响正常的模型训练,请注意修改。") 68 | 69 | if annotation_mode == 0 or annotation_mode == 1: 70 | print("Generate txt in ImageSets.") 71 | xmlfilepath = os.path.join(VOCdevkit_path, 'VOC2007/Annotations') 72 | saveBasePath = os.path.join(VOCdevkit_path, 'VOC2007/ImageSets/Main') 73 | temp_xml = os.listdir(xmlfilepath) 74 | total_xml = [] 75 | for xml in temp_xml: 76 | if xml.endswith(".xml"): 77 | total_xml.append(xml) 78 | 79 | num = len(total_xml) 80 | list = range(num) 81 | tv = int(num*trainval_percent) 82 | tr = int(tv*train_percent) 83 | trainval= random.sample(list,tv) 84 | train = random.sample(trainval,tr) 85 | 86 | print("train and val size",tv) 87 | print("train size",tr) 88 | ftrainval = open(os.path.join(saveBasePath,'trainval.txt'), 'w') 89 | ftest = open(os.path.join(saveBasePath,'test.txt'), 'w') 90 | ftrain = open(os.path.join(saveBasePath,'train.txt'), 'w') 91 | fval = open(os.path.join(saveBasePath,'val.txt'), 'w') 92 | 93 | for i in list: 94 | name=total_xml[i][:-4]+'\n' 95 | if i in trainval: 96 | ftrainval.write(name) 97 | if i in train: 98 | ftrain.write(name) 99 | else: 100 | fval.write(name) 101 | else: 102 | ftest.write(name) 103 | 104 | ftrainval.close() 105 | ftrain.close() 106 | fval.close() 107 | ftest.close() 108 | print("Generate txt in ImageSets done.") 109 | 110 | if annotation_mode == 0 or 
annotation_mode == 2: 111 | print("Generate 2007_train.txt and 2007_val.txt for train.") 112 | type_index = 0 113 | for year, image_set in VOCdevkit_sets: 114 | image_ids = open(os.path.join(VOCdevkit_path, 'VOC%s/ImageSets/Main/%s.txt'%(year, image_set)), encoding='utf-8').read().strip().split() 115 | list_file = open('%s_%s.txt'%(year, image_set), 'w', encoding='utf-8') 116 | for image_id in image_ids: 117 | list_file.write('%s/VOC%s/JPEGImages/%s.jpg'%(os.path.abspath(VOCdevkit_path), year, image_id)) 118 | 119 | convert_annotation(year, image_id, list_file) 120 | list_file.write('\n') 121 | photo_nums[type_index] = len(image_ids) 122 | type_index += 1 123 | list_file.close() 124 | print("Generate 2007_train.txt and 2007_val.txt for train done.") 125 | 126 | def printTable(List1, List2): 127 | for i in range(len(List1[0])): 128 | print("|", end=' ') 129 | for j in range(len(List1)): 130 | print(List1[j][i].rjust(int(List2[j])), end=' ') 131 | print("|", end=' ') 132 | print() 133 | 134 | str_nums = [str(int(x)) for x in nums] 135 | tableData = [ 136 | classes, str_nums 137 | ] 138 | colWidths = [0]*len(tableData) 139 | len1 = 0 140 | for i in range(len(tableData)): 141 | for j in range(len(tableData[i])): 142 | if len(tableData[i][j]) > colWidths[i]: 143 | colWidths[i] = len(tableData[i][j]) 144 | printTable(tableData, colWidths) 145 | 146 | if photo_nums[0] <= 500: 147 | print("训练集数量小于500,属于较小的数据量,请注意设置较大的训练世代(Epoch)以满足足够的梯度下降次数(Step)。") 148 | 149 | if np.sum(nums) == 0: 150 | print("在数据集中并未获得任何目标,请注意修改classes_path对应自己的数据集,并且保证标签名字正确,否则训练将会没有任何效果!") 151 | print("在数据集中并未获得任何目标,请注意修改classes_path对应自己的数据集,并且保证标签名字正确,否则训练将会没有任何效果!") 152 | print("在数据集中并未获得任何目标,请注意修改classes_path对应自己的数据集,并且保证标签名字正确,否则训练将会没有任何效果!") 153 | print("(重要的事情说三遍)。") 154 | -------------------------------------------------------------------------------- /常见问题汇总.md: -------------------------------------------------------------------------------- 1 | 问题汇总的博客地址为[https://blog.csdn.net/weixin_44791964/article/details/107517428](https://blog.csdn.net/weixin_44791964/article/details/107517428)。 2 | 3 | # 问题汇总 4 | ## 1、下载问题 5 | ### a、代码下载 6 | **问:up主,可以给我发一份代码吗,代码在哪里下载啊? 7 | 答:Github上的地址就在视频简介里。复制一下就能进去下载了。** 8 | 9 | **问:up主,为什么我下载的代码提示压缩包损坏? 10 | 答:重新去Github下载。** 11 | 12 | **问:up主,为什么我下载的代码和你在视频以及博客上的代码不一样? 13 | 答:我常常会对代码进行更新,最终以实际的代码为准。** 14 | 15 | ### b、 权值下载 16 | **问:up主,为什么我下载的代码里面,model_data下面没有.pth或者.h5文件? 17 | 答:我一般会把权值上传到Github和百度网盘,在GITHUB的README里面就能找到。** 18 | 19 | ### c、 数据集下载 20 | **问:up主,XXXX数据集在哪里下载啊? 21 | 答:一般数据集的下载地址我会放在README里面,基本上都有,没有的话请及时联系我添加,直接发github的issue即可**。 22 | 23 | ## 2、环境配置问题 24 | ### a、现在库中所用的环境 25 | **pytorch代码对应的pytorch版本为1.2,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/106037141](https://blog.csdn.net/weixin_44791964/article/details/106037141)。 26 | 27 | **keras代码对应的tensorflow版本为1.13.2,keras版本是2.1.5,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/104702142](https://blog.csdn.net/weixin_44791964/article/details/104702142)。 28 | 29 | **tf2代码对应的tensorflow版本为2.2.0,无需安装keras,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/109161493](https://blog.csdn.net/weixin_44791964/article/details/109161493)。 30 | 31 | **问:你的代码某某某版本的tensorflow和pytorch能用嘛? 
32 | 答:最好按照我推荐的配置,配置教程也有!其它版本的我没有试过!可能出现问题但是一般问题不大。仅需要改少量代码即可。** 33 | 34 | ### b、30系列显卡环境配置 35 | 30系显卡由于框架更新不可使用上述环境配置教程。 36 | 当前我已经测试的可以用的30显卡配置如下: 37 | **pytorch代码对应的pytorch版本为1.7.0,cuda为11.0,cudnn为8.0.5**。 38 | 39 | **keras代码无法在win10下配置cuda11,在ubuntu下可以百度查询一下,配置tensorflow版本为1.15.4,keras版本是2.1.5或者2.3.1(少量函数接口不同,代码可能还需要少量调整。)** 40 | 41 | **tf2代码对应的tensorflow版本为2.4.0,cuda为11.0,cudnn为8.0.5**。 42 | 43 | ### c、GPU利用问题与环境使用问题 44 | **问:为什么我安装了tensorflow-gpu但是却没用利用GPU进行训练呢? 45 | 答:确认tensorflow-gpu已经装好,利用pip list查看tensorflow版本,然后查看任务管理器或者利用nvidia命令看看是否使用了gpu进行训练,任务管理器的话要看显存使用情况。** 46 | 47 | **问:up主,我好像没有在用gpu进行训练啊,怎么看是不是用了GPU进行训练? 48 | 答:查看是否使用GPU进行训练一般使用NVIDIA在命令行的查看命令,如果要看任务管理器的话,请看性能部分GPU的显存是否利用,或者查看任务管理器的Cuda,而非Copy。** 49 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/20201013234241524.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70#pic_center) 50 | 51 | **问:up主,为什么我按照你的环境配置后还是不能使用? 52 | 答:请把你的GPU、CUDA、CUDNN、TF版本以及PYTORCH版本B站私聊告诉我。** 53 | 54 | **问:出现如下错误** 55 | ```python 56 | Traceback (most recent call last): 57 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in 58 | from tensorflow.python.pywrap_tensorflow_internal import * 59 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 28, in 60 | pywrap_tensorflow_internal = swig_import_helper() 61 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper 62 | _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description) 63 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\imp.py", line 243, in load_modulereturn load_dynamic(name, filename, file) 64 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\imp.py", line 343, in load_dynamic 65 | return _load(spec) 66 | ImportError: DLL load failed: 找不到指定的模块。 67 | ``` 68 | **答:如果没重启过就重启一下,否则重新按照步骤安装,还无法解决则把你的GPU、CUDA、CUDNN、TF版本以及PYTORCH版本私聊告诉我。** 69 | 70 | ### d、no module问题 71 | **问:为什么提示说no module name utils.utils(no module name nets.yolo、no module name nets.ssd等一系列问题)啊? 72 | 答:utils并不需要用pip装,它就在我上传的仓库的根目录,出现这个问题的原因是根目录不对,查查相对目录和根目录的概念。查了基本上就明白了。** 73 | 74 | **问:为什么提示说no module name matplotlib(no module name PIL,no module name cv2等等)? 75 | 答:这个库没安装打开命令行安装就好。pip install matplotlib** 76 | 77 | **问:为什么我已经用pip装了opencv(pillow、matplotlib等),还是提示no module name cv2? 78 | 答:没有激活环境装,要激活对应的conda环境进行安装才可以正常使用** 79 | 80 | **问:为什么提示说No module named 'torch' ? 81 | 答:其实我也真的很想知道为什么会有这个问题……这个pytorch没装是什么情况?一般就俩情况,一个是真的没装,还有一个是装到其它环境了,当前激活的环境不是自己装的环境。** 82 | 83 | **问:为什么提示说No module named 'tensorflow' ? 84 | 答:同上。** 85 | 86 | ### e、cuda安装失败问题 87 | 一般cuda安装前需要安装Visual Studio,装个2017版本即可。 88 | 89 | ### f、Ubuntu系统问题 90 | **所有代码在Ubuntu下可以使用,我两个系统都试过。** 91 | 92 | ### g、VSCODE提示错误的问题 93 | **问:为什么在VSCODE里面提示一大堆的错误啊? 94 | 答:我也提示一大堆的错误,但是不影响,是VSCODE的问题,如果不想看错误的话就装Pycharm。** 95 | 96 | ### h、使用cpu进行训练与预测的问题 97 | **对于keras和tf2的代码而言,如果想用cpu进行训练和预测,直接装cpu版本的tensorflow就可以了。** 98 | 99 | **对于pytorch的代码而言,如果想用cpu进行训练和预测,需要将cuda=True修改成cuda=False。** 100 | 101 | ### i、tqdm没有pos参数问题 102 | **问:运行代码提示'tqdm' object has no attribute 'pos'。 103 | 答:重装tqdm,换个版本就可以了。** 104 | 105 | ### j、提示decode(“utf-8”)的问题 106 | **由于h5py库的更新,安装过程中会自动安装h5py=3.0.0以上的版本,会导致decode("utf-8")的错误! 
### i. tqdm "pos" problem
**Q: Running the code reports 'tqdm' object has no attribute 'pos'.
A: Reinstall tqdm with a different version.**

### j. decode("utf-8") problems
**Because of an update to the h5py library, installation automatically pulls in h5py 3.0.0 or newer, which causes decode("utf-8") errors!
Be sure to install h5py==2.10.0 after installing tensorflow:**
```
pip install h5py==2.10.0
```

### k. TypeError: __array__() takes 1 positional argument but 2 were given
This can be fixed by changing the pillow version:
```
pip install pillow==8.2.0
```

### l. Other problems
**Q: Why do I get TypeError: cat() got an unexpected keyword argument 'axis', Traceback (most recent call last), or AttributeError: 'Tensor' object has no attribute 'bool'?
A: These are version problems; use torch 1.2 or above.**
**Many other odd problems are also version problems; install Keras and tensorflow following my video tutorial. For instance, if you installed tensorflow 2, don't ask me why Keras-yolo won't run. It simply can't.**

## 3. Object detection FAQ (also applies to the face detection and classification repos)
### a. Shape mismatch problems
#### 1) Shape mismatch during training
**Q: Why does running train.py report a shape mismatch?
A: In the keras environment, because you are training a different number of classes than the original, the network structure changes, so a small mismatch at the very end of the network is expected.**

#### 2) Shape mismatch during prediction
**Q: Why does running predict.py report a shape mismatch?
In Pytorch it looks like this:**
![screenshot](https://img-blog.csdnimg.cn/20200722171631901.png)
In Keras it looks like this:
![screenshot](https://img-blog.csdnimg.cn/20200722171523380.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70)
**A: There are three main causes:
1. In ssd and FasterRCNN, num_classes in train.py may not have been changed.
2. model_path was not changed.
3. classes_path was not changed.
Check these carefully! Make sure the model_path and classes_path you use correspond to each other, and also check the num_classes or classes_path used during training!**

### b. Out-of-memory problems
**Q: Why does the console window flash by and report OOM when I run train.py?
A: That's keras running out of GPU memory; reduce batch_size. SSD has the lowest memory footprint and is recommended for small cards:
2 GB VRAM: SSD, YOLOV4-TINY
4 GB VRAM: YOLOV3
6 GB VRAM: YOLOV4, Retinanet, M2det, Efficientdet, Faster RCNN, etc.
8 GB+ VRAM: take your pick.**
**Note that because of BatchNorm2d, batch_size cannot be 1; it must be at least 2.**

**Q: Why do I get RuntimeError: CUDA out of memory. Tried to allocate 52.00 MiB (GPU 0; 15.90 GiB total capacity; 14.85 GiB already allocated; 51.88 MiB free; 15.07 GiB reserved in total by PyTorch)?
A: That's pytorch running out of GPU memory; same as above.**

**Q: Why does it run out of memory even though GPU utilization never went up?
A: Once it runs out of memory there is nothing left to utilize; the model never started training.**

### c. Training problems (freeze training, LOSS, training quality, etc.)
**Q: Why freeze training first and unfreeze later?
A: It's the idea of transfer learning: the features extracted by the backbone are generic, so freezing it for the first stage speeds up training and keeps the pretrained weights from being destroyed.**
In the freeze stage the backbone is frozen and the feature-extraction network does not change; memory usage is low, and only the rest of the network is fine-tuned.
In the unfreeze stage the backbone is unfrozen and the feature-extraction network changes; memory usage is higher, and every parameter of the network is updated.
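A minimal sketch of the freeze/unfreeze mechanics in PyTorch; the toy two-layer model and the backbone/head split are illustrative, not the repo's actual structure:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 2))  # toy backbone + head
backbone, head = model[0], model[1]

# Freeze stage: backbone weights stay fixed, only the head is fine-tuned.
for p in backbone.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)

# Unfreeze stage: all parameters train again (rebuild or update the optimizer).
for p in backbone.parameters():
    p.requires_grad = True
```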
**Q: Why doesn't my network converge? The LOSS is XXXX.
A: LOSS values differ between networks; LOSS is only a reference for whether the network is converging, not a measure of how good it is. My yolo code does not normalize the loss, so its values look high. The absolute value does not matter; what matters is whether it keeps decreasing and whether predictions work.**

**Q: Why are my training results poor? Prediction finds no boxes (or the boxes are inaccurate).
A:**

Consider the following:
1. Annotation problem: check whether 2007_train.txt actually contains object information; if not, fix voc_annotation.py.
2. Dataset problem: with fewer than 500 images, consider enlarging the dataset, and test several models to confirm the dataset itself is sound.
3. Unfreezing: if your data distribution differs greatly from everyday imagery, unfreeze and fine-tune the backbone to strengthen feature extraction.
4. Network problem: SSD, for example, is poorly suited to small objects because its prior boxes are fixed.
5. Training length: some students train only a few epochs and conclude it doesn't work; train to completion with the default settings.
6. Check that you followed the steps, e.g. whether classes in voc_annotation.py was modified.
7. LOSS values differ between networks; LOSS is only a reference for convergence, not quality. Its absolute value does not matter; what matters is whether it converges.

**Q: Why do I get a gbk codec error:**
```python
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 446: illegal multibyte sequence
```
**A: Do not use Chinese in labels and paths. If you must, handle the encoding explicitly: open files with encoding='utf-8'.**

**Q: My images have resolution xxx*xxx; can I use them?**
**A: Yes; the code resizes and augments them automatically.**

**Q: How do I train on multiple GPUs?
A: Most of the pytorch code can use the GPU directly; for keras, just search online, the implementation is not complicated. I don't have multiple cards and can't test it in detail, so that part is up to you.**
### d. Grayscale images
**Q: Can I train on (and predict) grayscale images?
A: Most of my repos convert grayscale images to RGB for training and prediction. If a repo cannot train or predict on grayscale images, try converting the result of Image.open to RGB inside get_random_data, and do the same at prediction time. (For reference only.)**

### e. Resuming training from a checkpoint
**Q: I have already trained for several epochs; can I continue from there?
A: Yes. Before training, load the previously trained weights exactly as you would load pretrained weights. Trained weights are saved in the logs folder; set model_path to the checkpoint you want to resume from.**

### f. Pretrained weights
**Q: If I want to train on a different dataset, what do I do about the pretrained weights?**
**A: Pretrained weights are transferable across datasets because the features they encode are generic. They are necessary in 99% of cases; without them the initial weights are too random, feature extraction is ineffective, and training results suffer.**

**Q: I modified the network; can I still use the pretrained weights?
A: If you modified the backbone and it is not an existing off-the-shelf network, the pretrained weights are basically unusable: either match tensors yourself by the shape of the convolution kernels, or pretrain from scratch. If you only modified the later half, the backbone's pretrained weights remain usable; in pytorch, adapt the weight-loading code to keep only shape-matching tensors; in keras, simply pass by_name=True, skip_mismatch=True.**
The weight-matching approach can look like this:
```python
import numpy as np
import torch

# Load pretrained weights, keeping only tensors whose shapes match the model.
# `model` and `model_path` come from the surrounding training script.
print('Loading weights into state dict...')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_dict = model.state_dict()
pretrained_dict = torch.load(model_path, map_location=device)
a = {}
for k, v in pretrained_dict.items():
    try:
        if np.shape(model_dict[k]) == np.shape(v):
            a[k] = v
    except KeyError:
        pass
model_dict.update(a)
model.load_state_dict(model_dict)
print('Finished!')
```

**Q: How do I train without pretrained weights?
A: Comment out the code that loads them.**

**Q: Why are my results so poor without pretrained weights?
A: Because randomly initialized weights extract poor features, training suffers; even voc07+12 versus coco+voc07+12 pretraining gives different results. Pretrained weights matter a great deal.**

### g. Video and webcam detection
**Q: How do I detect with a webcam?
A: Changing the parameters in predict.py enables webcam detection; there is also a video that explains the approach in detail.**

**Q: How do I detect on a video?
A: Same as above.**
### h. Training from scratch
**Q: How do I train the model from scratch?
A: With limited compute and limited tuning experience, training from scratch is pointless. With randomly initialized parameters the model's feature-extraction ability is very poor, and without strong tuning skills and compute the network will not converge properly.**
If you must start from scratch, note the following:
- Do not load pretrained weights.
- Do not use freeze training; comment out the code that freezes the model.

**Q: Why are my results so poor without pretrained weights?
A: Same as in section f above: randomly initialized weights extract poor features, so training suffers. Pretrained weights matter a great deal.**

### i. Saving results
**Q: How do I save the detected image?
A: Detection generally works with PIL's Image, so look up how to save a PIL Image; see the comments in predict.py for details.**

**Q: How do I save video output?
A: See the comments in predict.py.**

### j. Iterating over a folder
**Q: How do I iterate over all images in a folder?
A: Generally, use os.listdir to find all images in the folder, then detect each image following the flow in predict.py; see its comments for details.**

**Q: How do I iterate over all images in a folder and save the results?
A: For iteration, use os.listdir to find the images, then detect each one following the flow in predict.py. For saving, detection generally uses PIL's Image, so look up how a PIL Image is saved; if a repo uses cv2, look up how cv2 saves images. See the comments in predict.py, and the sketch below.**
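A minimal sketch of that loop; detect_image here is a stand-in for the real detector's method, and the folder names are illustrative:

```python
import os
from PIL import Image

def detect_image(image):
    # Stand-in for the real detector's detect_image(); returns the image as-is.
    return image

dir_origin_path = "img/"      # input folder (illustrative)
dir_save_path = "img_out/"    # output folder (illustrative)
os.makedirs(dir_save_path, exist_ok=True)

for img_name in os.listdir(dir_origin_path):
    if img_name.lower().endswith(('.bmp', '.jpeg', '.jpg', '.png')):
        image = Image.open(os.path.join(dir_origin_path, img_name))
        r_image = detect_image(image)
        r_image.save(os.path.join(dir_save_path, img_name))
```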
### k. Path problems (No such file or directory)
**Q: Why do I get an error like this:**
```python
FileNotFoundError: [Errno 2] No such file or directory
……………………………………
……………………………………
```
**A: Check the folder paths and whether the file actually exists; also check the file paths inside 2007_train.txt.**
A few important points about paths:
**Folder names must never contain spaces.
Mind the difference between relative and absolute paths.
Read up on how paths are resolved.**

**Almost all path problems come down to the working directory; study how relative paths are resolved!**
### l. Comparison with the original implementation
**Q: How does this code compare with the original? Can it match the original results?
A: Basically yes. I have tested everything on VOC data; I don't have a powerful GPU, so I cannot train or evaluate on COCO.**

**Q: Have you implemented all of yolov4's tricks? How big is the gap to the original?
A: Not all of them. YOLOV4 uses so many improvements that it is hard to implement and even list them all; I only included some that I found interesting and clearly effective. Even the authors' own source code does not use the SAM attention module mentioned in the paper. Not every trick brings a gain, and I cannot implement them all. As for the gap to the original: I don't have the compute to train on COCO, but students who have used the code report the gap is small.**

### m. FPS problems (detection speed)
**Q: What FPS can this reach? Can it reach XX FPS?
A: FPS depends on the machine's specs: better hardware, higher FPS.**

**Q: Why do I get only a dozen FPS testing yolov4 (or others) on a server?
A: Check that tensorflow-gpu or the GPU build of pytorch is correctly installed. If it is, use time.time() to measure which part of detect_image takes longest (it is not only the network; other steps such as drawing also take time).**

**Q: Why does the paper claim speed XX, but I don't see it here?
A: Check that tensorflow-gpu or the GPU build of pytorch is correctly installed. If it is, use time.time() to measure which part of detect_image takes longest (again, not only the network; drawing and other processing also take time). Some papers also use multi-batch prediction, which I have not implemented.**
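A minimal sketch of the time.time() approach; stage_to_measure is a stand-in for whichever part of detect_image you want to profile:

```python
import time

def stage_to_measure():
    # Stand-in for preprocessing, the network forward pass, or drawing.
    time.sleep(0.01)

n = 100
t0 = time.time()
for _ in range(n):
    stage_to_measure()
elapsed = time.time() - t0
print("per call: %.4f s, i.e. %.1f FPS" % (elapsed / n, n / elapsed))
```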
### n. Predicted image not displayed
**Q: Why doesn't your code display the image after prediction? It only lists the detected objects in the console.
A: Install an image viewer on your system.**

### o. Evaluation problems (mAP, PR curves, Recall, Precision for object detection)
**Q: How do I compute mAP?
A: Watch the mAP video; the procedure is the same everywhere.**

**Q: When computing mAP, what is MINOVERLAP in get_map.py? Is it IoU?
A: Yes, it is an IoU threshold: it measures how much a predicted box overlaps a ground-truth box; if the overlap exceeds MINOVERLAP, the prediction counts as correct.**

**Q: Why is self.confidence (self.score) in get_map.py set so low?
A: See the theory part of the mAP video: drawing the PR curve requires keeping all detections, including low-confidence ones.**

**Q: Can you explain how to plot PR curves and the like?
A: See the mAP video; the results include PR curves.**

**Q: How do I compute the Recall and Precision metrics?
A: Both are defined relative to a specific confidence threshold; they are produced as part of the mAP computation.**

### p. Training on COCO
**Q: How do I train object detection on the COCO dataset?
A: The txt files needed for COCO training can be generated following qqwweee's yolo3 repository; the format is identical.**

### q. Model optimization (model modification) problems
**Q: Do you have Focal Loss code for the YOLO series? Does it help?
A: Many people have tried; the gains are small and results sometimes get worse. YOLO has its own way of balancing positive and negative samples.**

**Q: I modified the network; can I still use the pretrained weights?
A: See section 3f above; the answer and the shape-matching snippet there apply unchanged.**

**Q: How do I modify the model? I want to publish a short paper!
A: Read about the differences between yolov3 and yolov4, then read the yolov4 paper; as a large-scale tuning exercise with many tricks it is very instructive. My advice is to study classic models, then extract and reuse their highlight structures.**

### r. Deployment
I have never deployed to phones or similar devices, so there are many deployment problems I simply don't know about...

## 4. Semantic segmentation FAQ
### a. Shape mismatch problems
#### 1) Shape mismatch during training
**Q: Why does running train.py report a shape mismatch?
A: In the keras environment, because you are training a different number of classes than the original, the network structure changes, so a small mismatch at the very end of the network is expected.**

#### 2) Shape mismatch during prediction
**Q: Why does running predict.py report a shape mismatch?
In Pytorch it looks like this:**
![screenshot](https://img-blog.csdnimg.cn/20200722171631901.png)
In Keras it looks like this:
![screenshot](https://img-blog.csdnimg.cn/20200722171523380.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70)
**A: There are two main causes:
1. num_classes in train.py was not changed.
2. num_classes at prediction time was not changed.
Check both carefully! The num_classes used for training and for prediction both need checking!**

### b. Out-of-memory problems
**Q: Why does the console window flash by and report OOM when I run train.py?
A: That's keras running out of GPU memory; reduce batch_size.**

**Note that because of BatchNorm2d, batch_size cannot be 1; it must be at least 2.**

**Q: Why do I get RuntimeError: CUDA out of memory. Tried to allocate 52.00 MiB (GPU 0; 15.90 GiB total capacity; 14.85 GiB already allocated; 51.88 MiB free; 15.07 GiB reserved in total by PyTorch)?
A: That's pytorch running out of GPU memory; same as above.**

**Q: Why does it run out of memory even though GPU utilization never went up?
A: Once it runs out of memory there is nothing left to utilize; the model never started training.**

### c. Training problems (freeze training, LOSS, training quality, etc.)
**Q: Why freeze training first and unfreeze later?
A: It's the idea of transfer learning: the features extracted by the backbone are generic, so freezing it for the first stage speeds up training and keeps the pretrained weights from being destroyed.**
**In the freeze stage the backbone is frozen and the feature-extraction network does not change; memory usage is low, and only the rest of the network is fine-tuned.**
**In the unfreeze stage the backbone is unfrozen and the feature-extraction network changes; memory usage is higher, and every parameter of the network is updated.**

**Q: Why doesn't my network converge? The LOSS is XXXX.
A: LOSS values differ between networks; LOSS is only a reference for whether the network is converging, not a measure of how good it is. My yolo code does not normalize the loss, so its values look high. The absolute value does not matter; what matters is whether it keeps decreasing and whether predictions work.**

**Q: Why are my training results poor? Prediction finds nothing; the result is all black.
A:**
**Consider the following:
1. Dataset problem: this is the most important one. With fewer than 500 images, consider enlarging the dataset, and always check the labels. The video explains the VOC format in detail, but having input images and output labels is not enough: every pixel value in a label must equal the index of its class. Many students' label format is wrong; the most common mistake is a black background with white objects, where object pixels have the value 255 and cannot be trained. Object pixels must have the value 1.
2. Unfreezing: if your data distribution differs greatly from everyday imagery, unfreeze and fine-tune the backbone to strengthen feature extraction.
3. Network problem: try different networks.
4. Training length: some students train only a few epochs and conclude it doesn't work; train to completion with the default settings.
5. Check that you followed the steps.
6. LOSS values differ between networks; LOSS is only a reference for convergence, not quality. Its absolute value does not matter; what matters is whether it converges.**

**Q: Why are my training results poor on small objects?
A: For deeplab and pspnet you can adjust downsample_factor; 16 downsamples too aggressively and works poorly, so try 8.**

**Q: Why do I get a gbk codec error:**
```python
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 446: illegal multibyte sequence
```
**A: Do not use Chinese in labels and paths. If you must, handle the encoding explicitly: open files with encoding='utf-8'.**

**Q: My images have resolution xxx*xxx; can I use them?**
**A: Yes; the code resizes and augments them automatically.**

**Q: How do I train on multiple GPUs?
A: Most of the pytorch code can use the GPU directly; for keras, just search online, the implementation is not complicated. I don't have multiple cards and can't test it in detail, so that part is up to you.**

### d. Grayscale images
**Q: Can I train on (and predict) grayscale images?
A: Most of my repos convert grayscale images to RGB for training and prediction. If a repo cannot train or predict on grayscale images, try converting the result of Image.open to RGB inside get_random_data, and do the same at prediction time. (For reference only.)**

### e. Resuming training from a checkpoint
**Q: I have already trained for several epochs; can I continue from there?
A: Yes. Before training, load the previously trained weights exactly as you would load pretrained weights. Trained weights are saved in the logs folder; set model_path to the checkpoint you want to resume from.**

### f. Pretrained weights

**Q: If I want to train on a different dataset, what do I do about the pretrained weights?**
**A: Pretrained weights are transferable across datasets because the features they encode are generic. They are necessary in 99% of cases; without them the initial weights are too random, feature extraction is ineffective, and training results suffer.**

**Q: I modified the network; can I still use the pretrained weights?
A: If you modified the backbone and it is not an existing off-the-shelf network, the pretrained weights are basically unusable: either match tensors yourself by the shape of the convolution kernels, or pretrain from scratch. If you only modified the later half, the backbone's pretrained weights remain usable; in pytorch, adapt the weight-loading code to keep only shape-matching tensors; in keras, simply pass by_name=True, skip_mismatch=True.**
The weight-matching approach can look like this:

```python
import numpy as np
import torch

# Load pretrained weights, keeping only tensors whose shapes match the model.
# `model` and `model_path` come from the surrounding training script.
print('Loading weights into state dict...')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_dict = model.state_dict()
pretrained_dict = torch.load(model_path, map_location=device)
a = {}
for k, v in pretrained_dict.items():
    try:
        if np.shape(model_dict[k]) == np.shape(v):
            a[k] = v
    except KeyError:
        pass
model_dict.update(a)
model.load_state_dict(model_dict)
print('Finished!')
```

**Q: How do I train without pretrained weights?
A: Comment out the code that loads them.**

**Q: Why are my results so poor without pretrained weights?
A: Because randomly initialized weights extract poor features, training suffers. Pretrained weights matter a great deal.**

### g. Video and webcam detection
**Q: How do I detect with a webcam?
A: Changing the parameters in predict.py enables webcam detection; there is also a video that explains the approach in detail.**

**Q: How do I detect on a video?
A: Same as above.**

### h. Training from scratch
**Q: How do I train the model from scratch?
A: With limited compute and limited tuning experience, training from scratch is pointless. With randomly initialized parameters the model's feature-extraction ability is very poor, and without strong tuning skills and compute the network will not converge properly.**
If you must start from scratch, note the following:
- Do not load pretrained weights.
- Do not use freeze training; comment out the code that freezes the model.

**Q: Why are my results so poor without pretrained weights?
A: Same as in section f above: randomly initialized weights extract poor features, so training suffers. Pretrained weights matter a great deal.**

### i. Saving results
**Q: How do I save the detected image?
A: Detection generally works with PIL's Image, so look up how to save a PIL Image; see the comments in predict.py for details.**

**Q: How do I save video output?
A: See the comments in predict.py.**

### j. Iterating over a folder
**Q: How do I iterate over all images in a folder?
A: Generally, use os.listdir to find all images in the folder, then detect each image following the flow in predict.py; see its comments for details.**

**Q: How do I iterate over all images in a folder and save the results?
A: For iteration, use os.listdir to find the images, then detect each one following the flow in predict.py. For saving, detection generally uses PIL's Image, so look up how a PIL Image is saved; if a repo uses cv2, look up how cv2 saves images. See the comments in predict.py, and the sketch in section 3j above.**

### k. Path problems (No such file or directory)
**Q: Why do I get an error like this:**
```python
FileNotFoundError: [Errno 2] No such file or directory
……………………………………
……………………………………
```

**A: Check the folder paths and whether the file actually exists; also check the file paths inside 2007_train.txt.**
A few important points about paths:
**Folder names must never contain spaces.
Mind the difference between relative and absolute paths.
Read up on how paths are resolved.**

**Almost all path problems come down to the working directory; study how relative paths are resolved!**

### l. FPS problems (detection speed)
**Q: What FPS can this reach? Can it reach XX FPS?
A: FPS depends on the machine's specs: better hardware, higher FPS.**

**Q: Why does the paper claim speed XX, but I don't see it here?
A: Check that tensorflow-gpu or the GPU build of pytorch is correctly installed. If it is, use time.time() to measure which part of detect_image takes longest (not only the network; drawing and other processing also take time). Some papers also use multi-batch prediction, which I have not implemented.**

### m. Predicted image not displayed
**Q: Why doesn't your code display the image after prediction? It only lists the detected objects in the console.
A: Install an image viewer on your system.**

### n. Evaluation problems (miou)
**Q: How do I compute miou?
A: See the miou measurement part of the video.**

**Q: How do I compute the Recall and Precision metrics?
A: The current code cannot produce them; you need to understand the confusion matrix and compute them yourself, for example as sketched below.**
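A minimal numpy sketch of per-class Precision and Recall from a confusion matrix; the 3x3 matrix is toy data:

```python
import numpy as np

# Rows: ground-truth class, columns: predicted class (toy numbers).
cm = np.array([[50,  2,  1],
               [ 3, 45,  4],
               [ 0,  5, 40]])

tp = np.diag(cm).astype(float)
precision = tp / np.maximum(cm.sum(axis=0), 1)  # TP / everything predicted as the class
recall = tp / np.maximum(cm.sum(axis=1), 1)     # TP / all ground truth of the class
print("precision:", precision)
print("recall:", recall)
```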
### o. Model optimization (model modification) problems
**Q: I modified the network; can I still use the pretrained weights?
A: See section 4f above; the answer and the shape-matching snippet there apply unchanged.**

**Q: How do I modify the model? I want to publish a short paper!
A: Read the yolov4 paper from the object-detection world; as a large-scale tuning exercise with many tricks it is very instructive. My advice is to study classic models, then extract and reuse their highlight structures. Common tricks such as attention mechanisms are worth trying.**

### p. Deployment
I have never deployed to phones or similar devices, so there are many deployment problems I simply don't know about...

## 5. Chat group
**Q: Is there a QQ group or something similar?
A: No; I don't have the time to manage a QQ group...**

## 6. How to learn
**Q: What does your learning path look like? I'm a complete beginner; how should I learn?
A: A few caveats first:
1. I am no expert; there is a lot I don't know, and my path will not suit everyone.
2. My lab does not do deep learning, so I taught myself most of this by trial and error, and I cannot vouch that it is the right way.
3. Personally, I think learning comes down mostly to self-study.**
As for the path itself: I started with Mofan (莫烦)'s Python tutorials and got into tensorflow, keras, and pytorch; after that I learned SSD and YOLO, then studied many classic convolutional networks, and then started reading lots of different codebases. My method is to read code line by line, understanding the whole execution flow and how the feature-map shapes change. It took a great deal of time and there is no shortcut; you simply have to put in the hours.
--------------------------------------------------------------------------------