├── .gitignore
├── LICENSE
├── README.md
├── VOCdevkit
│   └── VOC2007
│       ├── Annotations
│       │   └── README.md
│       ├── ImageSets
│       │   └── Main
│       │       └── README.md
│       └── JPEGImages
│           └── README.md
├── efficientdet.py
├── get_map.py
├── img
│   └── street.jpg
├── logs
│   └── README.md
├── model_data
│   ├── coco_classes.txt
│   ├── simhei.ttf
│   └── voc_classes.txt
├── nets
│   ├── __init__.py
│   ├── efficientdet.py
│   ├── efficientdet_training.py
│   ├── efficientnet.py
│   └── layers.py
├── predict.py
├── requirements.txt
├── summary.py
├── train.py
├── utils
│   ├── __init__.py
│   ├── anchors.py
│   ├── callbacks.py
│   ├── dataloader.py
│   ├── utils.py
│   ├── utils_bbox.py
│   ├── utils_fit.py
│   └── utils_map.py
├── voc_annotation.py
└── 常见问题汇总.md

/.gitignore:
--------------------------------------------------------------------------------
1 | # ignore map, miou, datasets
2 | map_out/
3 | miou_out/
4 | VOCdevkit/
5 | datasets/
6 | Medical_Datasets/
7 | lfw/
8 | logs/
9 | model_data/
10 | .temp_map_out/
11 |
12 | # Byte-compiled / optimized / DLL files
13 | __pycache__/
14 | *.py[cod]
15 | *$py.class
16 |
17 | # C extensions
18 | *.so
19 |
20 | # Distribution / packaging
21 | .Python
22 | build/
23 | develop-eggs/
24 | dist/
25 | downloads/
26 | eggs/
27 | .eggs/
28 | lib/
29 | lib64/
30 | parts/
31 | sdist/
32 | var/
33 | wheels/
34 | pip-wheel-metadata/
35 | share/python-wheels/
36 | *.egg-info/
37 | .installed.cfg
38 | *.egg
39 | MANIFEST
40 |
41 | # PyInstaller
42 | # Usually these files are written by a python script from a template
43 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
44 | *.manifest
45 | *.spec
46 |
47 | # Installer logs
48 | pip-log.txt
49 | pip-delete-this-directory.txt
50 |
51 | # Unit test / coverage reports
52 | htmlcov/
53 | .tox/
54 | .nox/
55 | .coverage
56 | .coverage.*
57 | .cache
58 | nosetests.xml
59 | coverage.xml
60 | *.cover
61 | *.py,cover
62 | .hypothesis/
63 | .pytest_cache/
64 |
65 | # Translations
66 | *.mo
67 | *.pot
68 |
69 | # Django stuff:
70 | *.log
71 | local_settings.py
72 | db.sqlite3
73 | db.sqlite3-journal
74 |
75 | # Flask stuff:
76 | instance/
77 | .webassets-cache
78 |
79 | # Scrapy stuff:
80 | .scrapy
81 |
82 | # Sphinx documentation
83 | docs/_build/
84 |
85 | # PyBuilder
86 | target/
87 |
88 | # Jupyter Notebook
89 | .ipynb_checkpoints
90 |
91 | # IPython
92 | profile_default/
93 | ipython_config.py
94 |
95 | # pyenv
96 | .python-version
97 |
98 | # pipenv
99 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
100 | # However, in case of collaboration, if having platform-specific dependencies or dependencies
101 | # having no cross-platform support, pipenv may install dependencies that don't work, or not
102 | # install all needed dependencies.
103 | #Pipfile.lock
104 |
105 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow
106 | __pypackages__/
107 |
108 | # Celery stuff
109 | celerybeat-schedule
110 | celerybeat.pid
111 |
112 | # SageMath parsed files
113 | *.sage.py
114 |
115 | # Environments
116 | .env
117 | .venv
118 | env/
119 | venv/
120 | ENV/
121 | env.bak/
122 | venv.bak/
123 |
124 | # Spyder project settings
125 | .spyderproject
126 | .spyproject
127 |
128 | # Rope project settings
129 | .ropeproject
130 |
131 | # mkdocs documentation
132 | /site
133 |
134 | # mypy
135 | .mypy_cache/
136 | .dmypy.json
137 | dmypy.json
138 |
139 | # Pyre type checker
140 | .pyre/
141 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2020 Bubbliiiing
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## EfficientDet:Scalable and Efficient Object Detection 目标检测模型在Pytorch当中的实现
2 | ---
3 |
4 | ## 目录
5 | 1. [仓库更新 Top News](#仓库更新)
6 | 2. [性能情况 Performance](#性能情况)
7 | 3. [所需环境 Environment](#所需环境)
8 | 4. [注意事项 Attention](#注意事项)
9 | 5. [文件下载 Download](#文件下载)
10 | 6. [训练步骤 How2train](#训练步骤)
11 | 7. [预测步骤 How2predict](#预测步骤)
12 | 8. [评估步骤 How2eval](#评估步骤)
13 | 9.
[参考资料 Reference](#Reference) 14 | 15 | ## Top News 16 | **`2022-04`**:**进行了大幅度的更新,支持step、cos学习率下降法、支持adam、sgd优化器选择、支持学习率根据batch_size自适应调整、新增图片裁剪。支持多GPU训练,新增各个种类目标数量计算。** 17 | BiliBili视频中的原仓库地址为:https://github.com/bubbliiiing/efficientdet-pytorch/tree/bilibili 18 | 19 | **`2021-10`**:**进行了大幅度的更新,增加了大量注释、增加了大量可调整参数、对代码的组成模块进行修改、增加fps、视频预测、批量预测等功能。** 20 | 21 | ### 性能情况 22 | | 训练数据集 | 权值文件名称 | 测试数据集 | 输入图片大小 | mAP 0.5:0.95 | 23 | | :-----: | :-----: | :------: | :------: | :------: | 24 | | COCO-Train2017 | [efficientdet-d0.pth](https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientdet-d0.pth) | COCO-Val2017 | 512x512 | 33.1 25 | | COCO-Train2017 | [efficientdet-d1.pth](https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientdet-d1.pth) | COCO-Val2017 | 640x640 | 38.8 26 | | COCO-Train2017 | [efficientdet-d2.pth](https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientdet-d2.pth) | COCO-Val2017 | 768x768 | 42.1 27 | | COCO-Train2017 | [efficientdet-d3.pth](https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientdet-d3.pth) | COCO-Val2017 | 896x896 | 45.6 28 | | COCO-Train2017 | [efficientdet-d4.pth](https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientdet-d4.pth) | COCO-Val2017 | 1024x1024 | 48.8 29 | | COCO-Train2017 | [efficientdet-d5.pth](https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientdet-d5.pth) | COCO-Val2017 | 1280x1280 | 50.2 30 | | COCO-Train2017 | [efficientdet-d6.pth](https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientdet-d6.pth) | COCO-Val2017 | 1408x1408 | 50.7 31 | | COCO-Train2017 | [efficientdet-d7.pth](https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientdet-d7.pth) | COCO-Val2017 | 1536x1536 | 51.2 32 | 33 | ### 所需环境 34 | torch==1.2.0 35 | 36 | ### 文件下载 37 | 训练所需的pth可以在百度网盘下载。 38 | 包括Efficientdet-d0到d7所有权重。 39 | 链接: https://pan.baidu.com/s/1cTNR63gTizlggSgwDrmwxg 40 | 提取码: hk96 41 | 42 | VOC数据集下载地址如下,里面已经包括了训练集、测试集、验证集(与测试集一样),无需再次划分: 43 | 链接: https://pan.baidu.com/s/1-1Ej6dayrx3g0iAA88uY5A 44 | 提取码: ph32 45 | 46 | ## 训练步骤 47 | ### a、训练VOC07+12数据集 48 | 1. 数据集的准备 49 | **本文使用VOC格式进行训练,训练前需要下载好VOC07+12的数据集,解压后放在根目录** 50 | 51 | 2. 数据集的处理 52 | 修改voc_annotation.py里面的annotation_mode=2,运行voc_annotation.py生成根目录下的2007_train.txt和2007_val.txt。 53 | 54 | 3. 开始网络训练 55 | train.py的默认参数用于训练VOC数据集,直接运行train.py即可开始训练。 56 | 57 | 4. 训练结果预测 58 | 训练结果预测需要用到两个文件,分别是efficientdet.py和predict.py。我们首先需要去efficientdet.py里面修改model_path以及classes_path,这两个参数必须要修改。 59 | **model_path指向训练好的权值文件,在logs文件夹里。 60 | classes_path指向检测类别所对应的txt。** 61 | 完成修改后就可以运行predict.py进行检测了。运行后输入图片路径即可检测。 62 | 63 | ### b、训练自己的数据集 64 | 1. 数据集的准备 65 | **本文使用VOC格式进行训练,训练前需要自己制作好数据集,** 66 | 训练前将标签文件放在VOCdevkit文件夹下的VOC2007文件夹下的Annotation中。 67 | 训练前将图片文件放在VOCdevkit文件夹下的VOC2007文件夹下的JPEGImages中。 68 | 69 | 2. 数据集的处理 70 | 在完成数据集的摆放之后,我们需要利用voc_annotation.py获得训练用的2007_train.txt和2007_val.txt。 71 | 修改voc_annotation.py里面的参数。第一次训练可以仅修改classes_path,classes_path用于指向检测类别所对应的txt。 72 | 训练自己的数据集时,可以自己建立一个cls_classes.txt,里面写自己所需要区分的类别。 73 | model_data/cls_classes.txt文件内容为: 74 | ```python 75 | cat 76 | dog 77 | ... 78 | ``` 79 | 修改voc_annotation.py中的classes_path,使其对应cls_classes.txt,并运行voc_annotation.py。 80 | 81 | 3. 
开始网络训练 82 | **训练的参数较多,均在train.py中,大家可以在下载库后仔细看注释,其中最重要的部分依然是train.py里的classes_path。** 83 | **classes_path用于指向检测类别所对应的txt,这个txt和voc_annotation.py里面的txt一样!训练自己的数据集必须要修改!** 84 | 修改完classes_path后就可以运行train.py开始训练了,在训练多个epoch后,权值会生成在logs文件夹中。 85 | 86 | 4. 训练结果预测 87 | 训练结果预测需要用到两个文件,分别是efficientdet.py和predict.py。在efficientdet.py里面修改model_path以及classes_path。 88 | **model_path指向训练好的权值文件,在logs文件夹里。 89 | classes_path指向检测类别所对应的txt。** 90 | 完成修改后就可以运行predict.py进行检测了。运行后输入图片路径即可检测。 91 | 92 | ## 预测步骤 93 | ### a、使用预训练权重 94 | 1. 下载完库后解压,在百度网盘下载权值,放入model_data,运行predict.py,输入 95 | ```python 96 | img/street.jpg 97 | ``` 98 | 2. 在predict.py里面进行设置可以进行fps测试和video视频检测。 99 | ### b、使用自己训练的权重 100 | 1. 按照训练步骤训练。 101 | 2. 在efficientdet.py文件里面,在如下部分修改model_path和classes_path使其对应训练好的文件;**model_path对应logs文件夹下面的权值文件,classes_path是model_path对应分的类**。 102 | ```python 103 | _defaults = { 104 | #--------------------------------------------------------------------------# 105 | # 使用自己训练好的模型进行预测一定要修改model_path和classes_path! 106 | # model_path指向logs文件夹下的权值文件,classes_path指向model_data下的txt 107 | # 如果出现shape不匹配,同时要注意训练时的model_path和classes_path参数的修改 108 | #--------------------------------------------------------------------------# 109 | "model_path" : 'model_data/efficientdet-d0.pth', 110 | "classes_path" : 'model_data/coco_classes.txt', 111 | #---------------------------------------------------------------------# 112 | # 用于选择所使用的模型的版本,0-7 113 | #---------------------------------------------------------------------# 114 | "phi" : 0, 115 | #---------------------------------------------------------------------# 116 | # 只有得分大于置信度的预测框会被保留下来 117 | #---------------------------------------------------------------------# 118 | "confidence" : 0.3, 119 | #---------------------------------------------------------------------# 120 | # 非极大抑制所用到的nms_iou大小 121 | #---------------------------------------------------------------------# 122 | "nms_iou" : 0.3, 123 | #---------------------------------------------------------------------# 124 | # 该变量用于控制是否使用letterbox_image对输入图像进行不失真的resize, 125 | # 在多次测试后,发现关闭letterbox_image直接resize的效果更好 126 | #---------------------------------------------------------------------# 127 | "letterbox_image" : False, 128 | #---------------------------------------------------------------------# 129 | # 是否使用Cuda 130 | # 没有GPU可以设置成False 131 | #---------------------------------------------------------------------# 132 | "cuda" : True 133 | } 134 | ``` 135 | 3. 运行predict.py,输入 136 | ```python 137 | img/street.jpg 138 | ``` 139 | 4. 在predict.py里面进行设置可以进行fps测试和video视频检测。 140 | 141 | ## 评估步骤 142 | ### a、评估VOC07+12的测试集 143 | 1. 本文使用VOC格式进行评估。VOC07+12已经划分好了测试集,无需利用voc_annotation.py生成ImageSets文件夹下的txt。 144 | 2. 在efficientdet.py里面修改model_path以及classes_path。**model_path指向训练好的权值文件,在logs文件夹里。classes_path指向检测类别所对应的txt。** 145 | 3. 运行get_map.py即可获得评估结果,评估结果会保存在map_out文件夹中。 146 | 147 | ### b、评估自己的数据集 148 | 1. 本文使用VOC格式进行评估。 149 | 2. 如果在训练前已经运行过voc_annotation.py文件,代码会自动将数据集划分成训练集、验证集和测试集。如果想要修改测试集的比例,可以修改voc_annotation.py文件下的trainval_percent。trainval_percent用于指定(训练集+验证集)与测试集的比例,默认情况下 (训练集+验证集):测试集 = 9:1。train_percent用于指定(训练集+验证集)中训练集与验证集的比例,默认情况下 训练集:验证集 = 9:1。 150 | 3. 利用voc_annotation.py划分测试集后,前往get_map.py文件修改classes_path,classes_path用于指向检测类别所对应的txt,这个txt和训练时的txt一样。评估自己的数据集必须要修改。 151 | 4. 在efficientdet.py里面修改model_path以及classes_path。**model_path指向训练好的权值文件,在logs文件夹里。classes_path指向检测类别所对应的txt。** 152 | 5. 
运行get_map.py即可获得评估结果,评估结果会保存在map_out文件夹中。 153 | 154 | ### Reference 155 | https://github.com/zylo117/Yet-Another-EfficientDet-Pytorch 156 | https://github.com/Cartucho/mAP 157 | -------------------------------------------------------------------------------- /VOCdevkit/VOC2007/Annotations/README.md: -------------------------------------------------------------------------------- 1 | 存放标签文件 -------------------------------------------------------------------------------- /VOCdevkit/VOC2007/ImageSets/Main/README.md: -------------------------------------------------------------------------------- 1 | 存放训练索引文件 -------------------------------------------------------------------------------- /VOCdevkit/VOC2007/JPEGImages/README.md: -------------------------------------------------------------------------------- 1 | 存放图片文件 -------------------------------------------------------------------------------- /efficientdet.py: -------------------------------------------------------------------------------- 1 | import colorsys 2 | import os 3 | import time 4 | 5 | import numpy as np 6 | import torch 7 | import torch.nn as nn 8 | from PIL import ImageDraw, ImageFont 9 | 10 | from nets.efficientdet import EfficientDetBackbone 11 | from utils.utils import (cvtColor, get_classes, image_sizes, preprocess_input, 12 | resize_image, show_config) 13 | from utils.utils_bbox import decodebox, non_max_suppression 14 | 15 | 16 | #--------------------------------------------# 17 | # 使用自己训练好的模型预测需要修改3个参数 18 | # model_path和classes_path和phi都需要修改! 19 | # 如果出现shape不匹配,一定要注意 20 | # 训练时的model_path和classes_path参数的修改 21 | #--------------------------------------------# 22 | class Efficientdet(object): 23 | _defaults = { 24 | #--------------------------------------------------------------------------# 25 | # 使用自己训练好的模型进行预测一定要修改model_path和classes_path! 
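    # (A hedged usage sketch -- the checkpoint name below is hypothetical,
    #  shown only to illustrate that any key in _defaults can be overridden
    #  through the **kwargs of __init__:
    #      det = Efficientdet(model_path='logs/my_weights.pth',
    #                         classes_path='model_data/cls_classes.txt', phi=0)
    # )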
26 | # model_path指向logs文件夹下的权值文件,classes_path指向model_data下的txt 27 | # 28 | # 训练好后logs文件夹下存在多个权值文件,选择验证集损失较低的即可。 29 | # 验证集损失较低不代表mAP较高,仅代表该权值在验证集上泛化性能较好。 30 | # 如果出现shape不匹配,同时要注意训练时的model_path和classes_path参数的修改 31 | #--------------------------------------------------------------------------# 32 | "model_path" : 'model_data/efficientdet-d0.pth', 33 | "classes_path" : 'model_data/coco_classes.txt', 34 | #---------------------------------------------------------------------# 35 | # 用于选择所使用的模型的版本,0-7 36 | #---------------------------------------------------------------------# 37 | "phi" : 0, 38 | #---------------------------------------------------------------------# 39 | # 只有得分大于置信度的预测框会被保留下来 40 | #---------------------------------------------------------------------# 41 | "confidence" : 0.3, 42 | #---------------------------------------------------------------------# 43 | # 非极大抑制所用到的nms_iou大小 44 | #---------------------------------------------------------------------# 45 | "nms_iou" : 0.3, 46 | #---------------------------------------------------------------------# 47 | # 该变量用于控制是否使用letterbox_image对输入图像进行不失真的resize, 48 | # 在多次测试后,发现关闭letterbox_image直接resize的效果更好 49 | #---------------------------------------------------------------------# 50 | "letterbox_image" : False, 51 | #---------------------------------------------------------------------# 52 | # 是否使用Cuda 53 | # 没有GPU可以设置成False 54 | #---------------------------------------------------------------------# 55 | "cuda" : True 56 | } 57 | 58 | @classmethod 59 | def get_defaults(cls, n): 60 | if n in cls._defaults: 61 | return cls._defaults[n] 62 | else: 63 | return "Unrecognized attribute name '" + n + "'" 64 | 65 | #---------------------------------------------------# 66 | # 初始化Efficientdet 67 | #---------------------------------------------------# 68 | def __init__(self, **kwargs): 69 | self.__dict__.update(self._defaults) 70 | for name, value in kwargs.items(): 71 | setattr(self, name, value) 72 | self._defaults[name] = value 73 | 74 | self.input_shape = [image_sizes[self.phi], image_sizes[self.phi]] 75 | #---------------------------------------------------# 76 | # 计算总的类的数量 77 | #---------------------------------------------------# 78 | self.class_names, self.num_classes = get_classes(self.classes_path) 79 | 80 | #---------------------------------------------------# 81 | # 画框设置不同的颜色 82 | #---------------------------------------------------# 83 | hsv_tuples = [(x / self.num_classes, 1., 1.) 
for x in range(self.num_classes)] 84 | self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples)) 85 | self.colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), self.colors)) 86 | 87 | self.generate() 88 | 89 | show_config(**self._defaults) 90 | 91 | #---------------------------------------------------# 92 | # 载入模型 93 | #---------------------------------------------------# 94 | def generate(self): 95 | #----------------------------------------# 96 | # 创建Efficientdet模型 97 | #----------------------------------------# 98 | self.net = EfficientDetBackbone(self.num_classes, self.phi) 99 | 100 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') 101 | self.net.load_state_dict(torch.load(self.model_path, map_location=device)) 102 | self.net = self.net.eval() 103 | print('{} model, anchors, and classes loaded.'.format(self.model_path)) 104 | 105 | if self.cuda: 106 | self.net = nn.DataParallel(self.net) 107 | self.net = self.net.cuda() 108 | 109 | #---------------------------------------------------# 110 | # 检测图片 111 | #---------------------------------------------------# 112 | def detect_image(self, image, crop = False, count = False): 113 | #---------------------------------------------------# 114 | # 计算输入图片的高和宽 115 | #---------------------------------------------------# 116 | image_shape = np.array(np.shape(image)[0:2]) 117 | #---------------------------------------------------------# 118 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 119 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 120 | #---------------------------------------------------------# 121 | image = cvtColor(image) 122 | #---------------------------------------------------------# 123 | # 给图像增加灰条,实现不失真的resize 124 | # 也可以直接resize进行识别 125 | #---------------------------------------------------------# 126 | image_data = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image) 127 | #---------------------------------------------------------# 128 | # 添加上batch_size维度,图片预处理,归一化。 129 | #---------------------------------------------------------# 130 | image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0) 131 | 132 | with torch.no_grad(): 133 | images = torch.from_numpy(image_data) 134 | if self.cuda: 135 | images = images.cuda() 136 | #---------------------------------------------------------# 137 | # 传入网络当中进行预测 138 | #---------------------------------------------------------# 139 | _, regression, classification, anchors = self.net(images) 140 | 141 | #-----------------------------------------------------------# 142 | # 将预测结果进行解码 143 | #-----------------------------------------------------------# 144 | outputs = decodebox(regression, anchors, self.input_shape) 145 | results = non_max_suppression(torch.cat([outputs, classification], axis=-1), self.input_shape, 146 | image_shape, self.letterbox_image, conf_thres = self.confidence, nms_thres = self.nms_iou) 147 | 148 | if results[0] is None: 149 | return image 150 | 151 | top_label = np.array(results[0][:, 5], dtype = 'int32') 152 | top_conf = results[0][:, 4] 153 | top_boxes = results[0][:, :4] 154 | 155 | #---------------------------------------------------------# 156 | # 设置字体与边框厚度 157 | #---------------------------------------------------------# 158 | font = ImageFont.truetype(font='model_data/simhei.ttf', size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32')) 159 | thickness = int(max((image.size[0] + image.size[1]) // np.mean(self.input_shape), 1)) 160 | 
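        # Worked example of the line above (illustrative numbers): a 1280x720
        # image with the d0 input_shape of 512 gives
        # thickness = max((1280 + 720) // 512, 1) = 3, i.e. a 3-pixel outline.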
#---------------------------------------------------------# 161 | # 计数 162 | #---------------------------------------------------------# 163 | if count: 164 | print("top_label:", top_label) 165 | classes_nums = np.zeros([self.num_classes]) 166 | for i in range(self.num_classes): 167 | num = np.sum(top_label == i) 168 | if num > 0: 169 | print(self.class_names[i], " : ", num) 170 | classes_nums[i] = num 171 | print("classes_nums:", classes_nums) 172 | #---------------------------------------------------------# 173 | # 是否进行目标的裁剪 174 | #---------------------------------------------------------# 175 | if crop: 176 | for i, c in list(enumerate(top_label)): 177 | top, left, bottom, right = top_boxes[i] 178 | top = max(0, np.floor(top).astype('int32')) 179 | left = max(0, np.floor(left).astype('int32')) 180 | bottom = min(image.size[1], np.floor(bottom).astype('int32')) 181 | right = min(image.size[0], np.floor(right).astype('int32')) 182 | 183 | dir_save_path = "img_crop" 184 | if not os.path.exists(dir_save_path): 185 | os.makedirs(dir_save_path) 186 | crop_image = image.crop([left, top, right, bottom]) 187 | crop_image.save(os.path.join(dir_save_path, "crop_" + str(i) + ".png"), quality=95, subsampling=0) 188 | print("save crop_" + str(i) + ".png to " + dir_save_path) 189 | #---------------------------------------------------------# 190 | # 图像绘制 191 | #---------------------------------------------------------# 192 | for i, c in list(enumerate(top_label)): 193 | predicted_class = self.class_names[int(c)] 194 | box = top_boxes[i] 195 | score = top_conf[i] 196 | 197 | top, left, bottom, right = box 198 | 199 | top = max(0, np.floor(top).astype('int32')) 200 | left = max(0, np.floor(left).astype('int32')) 201 | bottom = min(image.size[1], np.floor(bottom).astype('int32')) 202 | right = min(image.size[0], np.floor(right).astype('int32')) 203 | 204 | label = '{} {:.2f}'.format(predicted_class, score) 205 | draw = ImageDraw.Draw(image) 206 | label_size = draw.textsize(label, font) 207 | label = label.encode('utf-8') 208 | print(label, top, left, bottom, right) 209 | 210 | if top - label_size[1] >= 0: 211 | text_origin = np.array([left, top - label_size[1]]) 212 | else: 213 | text_origin = np.array([left, top + 1]) 214 | 215 | for i in range(thickness): 216 | draw.rectangle([left + i, top + i, right - i, bottom - i], outline=self.colors[c]) 217 | draw.rectangle([tuple(text_origin), tuple(text_origin + label_size)], fill=self.colors[c]) 218 | draw.text(text_origin, str(label,'UTF-8'), fill=(0, 0, 0), font=font) 219 | del draw 220 | 221 | return image 222 | 223 | def get_FPS(self, image, test_interval): 224 | image_shape = np.array(np.shape(image)[0:2]) 225 | #---------------------------------------------------------# 226 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 227 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 228 | #---------------------------------------------------------# 229 | image = cvtColor(image) 230 | #---------------------------------------------------------# 231 | # 给图像增加灰条,实现不失真的resize 232 | # 也可以直接resize进行识别 233 | #---------------------------------------------------------# 234 | image_data = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image) 235 | #---------------------------------------------------------# 236 | # 添加上batch_size维度,图片预处理,归一化。 237 | #---------------------------------------------------------# 238 | image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0) 239 | 240 | with torch.no_grad(): 241 | images = 
torch.from_numpy(image_data) 242 | if self.cuda: 243 | images = images.cuda() 244 | #---------------------------------------------------------# 245 | # 传入网络当中进行预测 246 | #---------------------------------------------------------# 247 | _, regression, classification, anchors = self.net(images) 248 | 249 | #-----------------------------------------------------------# 250 | # 将预测结果进行解码 251 | #-----------------------------------------------------------# 252 | outputs = decodebox(regression, anchors, self.input_shape) 253 | results = non_max_suppression(torch.cat([outputs, classification], axis=-1), self.input_shape, 254 | image_shape, self.letterbox_image, conf_thres = self.confidence, nms_thres = self.nms_iou) 255 | 256 | t1 = time.time() 257 | for _ in range(test_interval): 258 | with torch.no_grad(): 259 | #---------------------------------------------------------# 260 | # 传入网络当中进行预测 261 | #---------------------------------------------------------# 262 | _, regression, classification, anchors = self.net(images) 263 | 264 | #-----------------------------------------------------------# 265 | # 将预测结果进行解码 266 | #-----------------------------------------------------------# 267 | outputs = decodebox(regression, anchors, self.input_shape) 268 | results = non_max_suppression(torch.cat([outputs, classification], axis=-1), self.input_shape, 269 | image_shape, self.letterbox_image, conf_thres = self.confidence, nms_thres = self.nms_iou) 270 | 271 | t2 = time.time() 272 | tact_time = (t2 - t1) / test_interval 273 | return tact_time 274 | 275 | #---------------------------------------------------# 276 | # 检测图片 277 | #---------------------------------------------------# 278 | def get_map_txt(self, image_id, image, class_names, map_out_path): 279 | f = open(os.path.join(map_out_path, "detection-results/"+image_id+".txt"),"w") 280 | image_shape = np.array(np.shape(image)[0:2]) 281 | #---------------------------------------------------------# 282 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 283 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 284 | #---------------------------------------------------------# 285 | image = cvtColor(image) 286 | #---------------------------------------------------------# 287 | # 给图像增加灰条,实现不失真的resize 288 | # 也可以直接resize进行识别 289 | #---------------------------------------------------------# 290 | image_data = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image) 291 | #---------------------------------------------------------# 292 | # 添加上batch_size维度,图片预处理,归一化。 293 | #---------------------------------------------------------# 294 | image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0) 295 | 296 | with torch.no_grad(): 297 | images = torch.from_numpy(image_data) 298 | if self.cuda: 299 | images = images.cuda() 300 | #---------------------------------------------------------# 301 | # 传入网络当中进行预测 302 | #---------------------------------------------------------# 303 | _, regression, classification, anchors = self.net(images) 304 | 305 | #-----------------------------------------------------------# 306 | # 将预测结果进行解码 307 | #-----------------------------------------------------------# 308 | outputs = decodebox(regression, anchors, self.input_shape) 309 | results = non_max_suppression(torch.cat([outputs, classification], axis=-1), self.input_shape, 310 | image_shape, self.letterbox_image, conf_thres = self.confidence, nms_thres = self.nms_iou) 311 | 312 | if results[0] is None: 313 | return 314 | 315 | top_label = 
np.array(results[0][:, 5], dtype = 'int32') 316 | top_conf = results[0][:, 4] 317 | top_boxes = results[0][:, :4] 318 | 319 | for i, c in list(enumerate(top_label)): 320 | predicted_class = self.class_names[int(c)] 321 | box = top_boxes[i] 322 | score = str(top_conf[i]) 323 | 324 | top, left, bottom, right = box 325 | if predicted_class not in class_names: 326 | continue 327 | 328 | f.write("%s %s %s %s %s %s\n" % (predicted_class, score[:6], str(int(left)), str(int(top)), str(int(right)),str(int(bottom)))) 329 | 330 | f.close() 331 | return 332 | -------------------------------------------------------------------------------- /get_map.py: -------------------------------------------------------------------------------- 1 | import os 2 | import xml.etree.ElementTree as ET 3 | 4 | from PIL import Image 5 | from tqdm import tqdm 6 | 7 | from utils.utils import get_classes 8 | from utils.utils_map import get_coco_map, get_map 9 | from efficientdet import Efficientdet 10 | 11 | if __name__ == "__main__": 12 | ''' 13 | Recall和Precision不像AP是一个面积的概念,因此在门限值(Confidence)不同时,网络的Recall和Precision值是不同的。 14 | 默认情况下,本代码计算的Recall和Precision代表的是当门限值(Confidence)为0.5时,所对应的Recall和Precision值。 15 | 16 | 受到mAP计算原理的限制,网络在计算mAP时需要获得近乎所有的预测框,这样才可以计算不同门限条件下的Recall和Precision值 17 | 因此,本代码获得的map_out/detection-results/里面的txt的框的数量一般会比直接predict多一些,目的是列出所有可能的预测框, 18 | ''' 19 | #------------------------------------------------------------------------------------------------------------------# 20 | # map_mode用于指定该文件运行时计算的内容 21 | # map_mode为0代表整个map计算流程,包括获得预测结果、获得真实框、计算VOC_map。 22 | # map_mode为1代表仅仅获得预测结果。 23 | # map_mode为2代表仅仅获得真实框。 24 | # map_mode为3代表仅仅计算VOC_map。 25 | # map_mode为4代表利用COCO工具箱计算当前数据集的0.50:0.95map。需要获得预测结果、获得真实框后并安装pycocotools才行 26 | #-------------------------------------------------------------------------------------------------------------------# 27 | map_mode = 0 28 | #--------------------------------------------------------------------------------------# 29 | # 此处的classes_path用于指定需要测量VOC_map的类别 30 | # 一般情况下与训练和预测所用的classes_path一致即可 31 | #--------------------------------------------------------------------------------------# 32 | classes_path = 'model_data/voc_classes.txt' 33 | #--------------------------------------------------------------------------------------# 34 | # MINOVERLAP用于指定想要获得的mAP0.x,mAP0.x的意义是什么请同学们百度一下。 35 | # 比如计算mAP0.75,可以设定MINOVERLAP = 0.75。 36 | # 37 | # 当某一预测框与真实框重合度大于MINOVERLAP时,该预测框被认为是正样本,否则为负样本。 38 | # 因此MINOVERLAP的值越大,预测框要预测的越准确才能被认为是正样本,此时算出来的mAP值越低, 39 | #--------------------------------------------------------------------------------------# 40 | MINOVERLAP = 0.5 41 | #--------------------------------------------------------------------------------------# 42 | # 受到mAP计算原理的限制,网络在计算mAP时需要获得近乎所有的预测框,这样才可以计算mAP 43 | # 因此,confidence的值应当设置的尽量小进而获得全部可能的预测框。 44 | # 45 | # 该值一般不调整。因为计算mAP需要获得近乎所有的预测框,此处的confidence不能随便更改。 46 | # 想要获得不同门限值下的Recall和Precision值,请修改下方的score_threhold。 47 | #--------------------------------------------------------------------------------------# 48 | confidence = 0.02 49 | #--------------------------------------------------------------------------------------# 50 | # 预测时使用到的非极大抑制值的大小,越大表示非极大抑制越不严格。 51 | # 52 | # 该值一般不调整。 53 | #--------------------------------------------------------------------------------------# 54 | nms_iou = 0.5 55 | #---------------------------------------------------------------------------------------------------------------# 56 | # Recall和Precision不像AP是一个面积的概念,因此在门限值不同时,网络的Recall和Precision值是不同的。 57 | # 58 | # 
默认情况下,本代码计算的Recall和Precision代表的是当门限值为0.5(此处定义为score_threhold)时所对应的Recall和Precision值。 59 | # 因为计算mAP需要获得近乎所有的预测框,上面定义的confidence不能随便更改。 60 | # 这里专门定义一个score_threhold用于代表门限值,进而在计算mAP时找到门限值对应的Recall和Precision值。 61 | #---------------------------------------------------------------------------------------------------------------# 62 | score_threhold = 0.5 63 | #-------------------------------------------------------# 64 | # map_vis用于指定是否开启VOC_map计算的可视化 65 | #-------------------------------------------------------# 66 | map_vis = False 67 | #-------------------------------------------------------# 68 | # 指向VOC数据集所在的文件夹 69 | # 默认指向根目录下的VOC数据集 70 | #-------------------------------------------------------# 71 | VOCdevkit_path = 'VOCdevkit' 72 | #-------------------------------------------------------# 73 | # 结果输出的文件夹,默认为map_out 74 | #-------------------------------------------------------# 75 | map_out_path = 'map_out' 76 | 77 | image_ids = open(os.path.join(VOCdevkit_path, "VOC2007/ImageSets/Main/test.txt")).read().strip().split() 78 | 79 | if not os.path.exists(map_out_path): 80 | os.makedirs(map_out_path) 81 | if not os.path.exists(os.path.join(map_out_path, 'ground-truth')): 82 | os.makedirs(os.path.join(map_out_path, 'ground-truth')) 83 | if not os.path.exists(os.path.join(map_out_path, 'detection-results')): 84 | os.makedirs(os.path.join(map_out_path, 'detection-results')) 85 | if not os.path.exists(os.path.join(map_out_path, 'images-optional')): 86 | os.makedirs(os.path.join(map_out_path, 'images-optional')) 87 | 88 | class_names, _ = get_classes(classes_path) 89 | 90 | if map_mode == 0 or map_mode == 1: 91 | print("Load model.") 92 | efficientdet = Efficientdet(confidence = confidence, nms_iou = nms_iou) 93 | print("Load model done.") 94 | 95 | print("Get predict result.") 96 | for image_id in tqdm(image_ids): 97 | image_path = os.path.join(VOCdevkit_path, "VOC2007/JPEGImages/"+image_id+".jpg") 98 | image = Image.open(image_path) 99 | if map_vis: 100 | image.save(os.path.join(map_out_path, "images-optional/" + image_id + ".jpg")) 101 | efficientdet.get_map_txt(image_id, image, class_names, map_out_path) 102 | print("Get predict result done.") 103 | 104 | if map_mode == 0 or map_mode == 2: 105 | print("Get ground truth result.") 106 | for image_id in tqdm(image_ids): 107 | with open(os.path.join(map_out_path, "ground-truth/"+image_id+".txt"), "w") as new_f: 108 | root = ET.parse(os.path.join(VOCdevkit_path, "VOC2007/Annotations/"+image_id+".xml")).getroot() 109 | for obj in root.findall('object'): 110 | difficult_flag = False 111 | if obj.find('difficult')!=None: 112 | difficult = obj.find('difficult').text 113 | if int(difficult)==1: 114 | difficult_flag = True 115 | obj_name = obj.find('name').text 116 | if obj_name not in class_names: 117 | continue 118 | bndbox = obj.find('bndbox') 119 | left = bndbox.find('xmin').text 120 | top = bndbox.find('ymin').text 121 | right = bndbox.find('xmax').text 122 | bottom = bndbox.find('ymax').text 123 | 124 | if difficult_flag: 125 | new_f.write("%s %s %s %s %s difficult\n" % (obj_name, left, top, right, bottom)) 126 | else: 127 | new_f.write("%s %s %s %s %s\n" % (obj_name, left, top, right, bottom)) 128 | print("Get ground truth result done.") 129 | 130 | if map_mode == 0 or map_mode == 3: 131 | print("Get map.") 132 | get_map(MINOVERLAP, True, score_threhold = score_threhold, path = map_out_path) 133 | print("Get map done.") 134 | 135 | if map_mode == 4: 136 | print("Get map.") 137 | get_coco_map(class_names = class_names, path = map_out_path) 138 | 
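        # Note: get_coco_map re-reads map_out/ground-truth and
        # map_out/detection-results, so the prediction and ground-truth steps
        # (map_mode 1 and 2, or 0) must have been run beforehand and
        # pycocotools must be installed.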
print("Get map done.") 139 | -------------------------------------------------------------------------------- /img/street.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bubbliiiing/efficientdet-pytorch/172bd8c85daeba96a03b955e6c4776a1afd99cca/img/street.jpg -------------------------------------------------------------------------------- /logs/README.md: -------------------------------------------------------------------------------- 1 | 训练好的权重会保存在这里 2 | -------------------------------------------------------------------------------- /model_data/coco_classes.txt: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorbike 5 | aeroplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | 13 | stop sign 14 | parking meter 15 | bench 16 | bird 17 | cat 18 | dog 19 | horse 20 | sheep 21 | cow 22 | elephant 23 | bear 24 | zebra 25 | giraffe 26 | 27 | backpack 28 | umbrella 29 | 30 | 31 | handbag 32 | tie 33 | suitcase 34 | frisbee 35 | skis 36 | snowboard 37 | sports ball 38 | kite 39 | baseball bat 40 | baseball glove 41 | skateboard 42 | surfboard 43 | tennis racket 44 | bottle 45 | 46 | wine glass 47 | cup 48 | fork 49 | knife 50 | spoon 51 | bowl 52 | banana 53 | apple 54 | sandwich 55 | orange 56 | broccoli 57 | carrot 58 | hot dog 59 | pizza 60 | donut 61 | cake 62 | chair 63 | sofa 64 | pottedplant 65 | bed 66 | 67 | diningtable 68 | 69 | 70 | toilet 71 | 72 | tvmonitor 73 | laptop 74 | mouse 75 | remote 76 | keyboard 77 | cell phone 78 | microwave 79 | oven 80 | toaster 81 | sink 82 | refrigerator 83 | 84 | book 85 | clock 86 | vase 87 | scissors 88 | teddy bear 89 | hair drier 90 | toothbrush 91 | -------------------------------------------------------------------------------- /model_data/simhei.ttf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bubbliiiing/efficientdet-pytorch/172bd8c85daeba96a03b955e6c4776a1afd99cca/model_data/simhei.ttf -------------------------------------------------------------------------------- /model_data/voc_classes.txt: -------------------------------------------------------------------------------- 1 | aeroplane 2 | bicycle 3 | bird 4 | boat 5 | bottle 6 | bus 7 | car 8 | cat 9 | chair 10 | cow 11 | diningtable 12 | dog 13 | horse 14 | motorbike 15 | person 16 | pottedplant 17 | sheep 18 | sofa 19 | train 20 | tvmonitor -------------------------------------------------------------------------------- /nets/__init__.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------------------------------- /nets/efficientdet.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | from utils.anchors import Anchors 4 | 5 | from nets.efficientnet import EfficientNet as EffNet 6 | from nets.layers import (Conv2dStaticSamePadding, MaxPool2dStaticSamePadding, 7 | MemoryEfficientSwish, Swish) 8 | 9 | 10 | #----------------------------------# 11 | # Xception中深度可分离卷积 12 | # 先3x3的深度可分离卷积 13 | # 再1x1的普通卷积 14 | #----------------------------------# 15 | class SeparableConvBlock(nn.Module): 16 | def __init__(self, in_channels, out_channels=None, norm=True, activation=False, onnx_export=False): 17 | super(SeparableConvBlock, self).__init__() 18 | if out_channels is None: 19 | out_channels = 
in_channels 20 | 21 | self.depthwise_conv = Conv2dStaticSamePadding(in_channels, in_channels, kernel_size=3, stride=1, groups=in_channels, bias=False) 22 | self.pointwise_conv = Conv2dStaticSamePadding(in_channels, out_channels, kernel_size=1, stride=1) 23 | 24 | self.norm = norm 25 | if self.norm: 26 | self.bn = nn.BatchNorm2d(num_features=out_channels, momentum=0.01, eps=1e-3) 27 | 28 | self.activation = activation 29 | if self.activation: 30 | self.swish = MemoryEfficientSwish() if not onnx_export else Swish() 31 | 32 | def forward(self, x): 33 | x = self.depthwise_conv(x) 34 | x = self.pointwise_conv(x) 35 | 36 | if self.norm: 37 | x = self.bn(x) 38 | 39 | if self.activation: 40 | x = self.swish(x) 41 | 42 | return x 43 | 44 | class BiFPN(nn.Module): 45 | def __init__(self, num_channels, conv_channels, first_time=False, epsilon=1e-4, onnx_export=False, attention=True): 46 | super(BiFPN, self).__init__() 47 | self.epsilon = epsilon 48 | self.conv6_up = SeparableConvBlock(num_channels, onnx_export=onnx_export) 49 | self.conv5_up = SeparableConvBlock(num_channels, onnx_export=onnx_export) 50 | self.conv4_up = SeparableConvBlock(num_channels, onnx_export=onnx_export) 51 | self.conv3_up = SeparableConvBlock(num_channels, onnx_export=onnx_export) 52 | 53 | self.conv4_down = SeparableConvBlock(num_channels, onnx_export=onnx_export) 54 | self.conv5_down = SeparableConvBlock(num_channels, onnx_export=onnx_export) 55 | self.conv6_down = SeparableConvBlock(num_channels, onnx_export=onnx_export) 56 | self.conv7_down = SeparableConvBlock(num_channels, onnx_export=onnx_export) 57 | 58 | self.p6_upsample = nn.Upsample(scale_factor=2, mode='nearest') 59 | self.p5_upsample = nn.Upsample(scale_factor=2, mode='nearest') 60 | self.p4_upsample = nn.Upsample(scale_factor=2, mode='nearest') 61 | self.p3_upsample = nn.Upsample(scale_factor=2, mode='nearest') 62 | 63 | self.p4_downsample = MaxPool2dStaticSamePadding(3, 2) 64 | self.p5_downsample = MaxPool2dStaticSamePadding(3, 2) 65 | self.p6_downsample = MaxPool2dStaticSamePadding(3, 2) 66 | self.p7_downsample = MaxPool2dStaticSamePadding(3, 2) 67 | 68 | self.swish = MemoryEfficientSwish() if not onnx_export else Swish() 69 | 70 | self.first_time = first_time 71 | if self.first_time: 72 | # 获取到了efficientnet的最后三层,对其进行通道的下压缩 73 | self.p5_down_channel = nn.Sequential( 74 | Conv2dStaticSamePadding(conv_channels[2], num_channels, 1), 75 | nn.BatchNorm2d(num_channels, momentum=0.01, eps=1e-3), 76 | ) 77 | self.p4_down_channel = nn.Sequential( 78 | Conv2dStaticSamePadding(conv_channels[1], num_channels, 1), 79 | nn.BatchNorm2d(num_channels, momentum=0.01, eps=1e-3), 80 | ) 81 | self.p3_down_channel = nn.Sequential( 82 | Conv2dStaticSamePadding(conv_channels[0], num_channels, 1), 83 | nn.BatchNorm2d(num_channels, momentum=0.01, eps=1e-3), 84 | ) 85 | 86 | # 对输入进来的p5进行宽高的下采样 87 | self.p5_to_p6 = nn.Sequential( 88 | Conv2dStaticSamePadding(conv_channels[2], num_channels, 1), 89 | nn.BatchNorm2d(num_channels, momentum=0.01, eps=1e-3), 90 | MaxPool2dStaticSamePadding(3, 2) 91 | ) 92 | self.p6_to_p7 = nn.Sequential( 93 | MaxPool2dStaticSamePadding(3, 2) 94 | ) 95 | 96 | # BIFPN第一轮的时候,跳线那里并不是同一个in 97 | self.p4_down_channel_2 = nn.Sequential( 98 | Conv2dStaticSamePadding(conv_channels[1], num_channels, 1), 99 | nn.BatchNorm2d(num_channels, momentum=0.01, eps=1e-3), 100 | ) 101 | self.p5_down_channel_2 = nn.Sequential( 102 | Conv2dStaticSamePadding(conv_channels[2], num_channels, 1), 103 | nn.BatchNorm2d(num_channels, momentum=0.01, eps=1e-3), 104 | ) 105 | 106 | # 
简易注意力机制的weights 107 | self.p6_w1 = nn.Parameter(torch.ones(2, dtype=torch.float32), requires_grad=True) 108 | self.p6_w1_relu = nn.ReLU() 109 | self.p5_w1 = nn.Parameter(torch.ones(2, dtype=torch.float32), requires_grad=True) 110 | self.p5_w1_relu = nn.ReLU() 111 | self.p4_w1 = nn.Parameter(torch.ones(2, dtype=torch.float32), requires_grad=True) 112 | self.p4_w1_relu = nn.ReLU() 113 | self.p3_w1 = nn.Parameter(torch.ones(2, dtype=torch.float32), requires_grad=True) 114 | self.p3_w1_relu = nn.ReLU() 115 | 116 | self.p4_w2 = nn.Parameter(torch.ones(3, dtype=torch.float32), requires_grad=True) 117 | self.p4_w2_relu = nn.ReLU() 118 | self.p5_w2 = nn.Parameter(torch.ones(3, dtype=torch.float32), requires_grad=True) 119 | self.p5_w2_relu = nn.ReLU() 120 | self.p6_w2 = nn.Parameter(torch.ones(3, dtype=torch.float32), requires_grad=True) 121 | self.p6_w2_relu = nn.ReLU() 122 | self.p7_w2 = nn.Parameter(torch.ones(2, dtype=torch.float32), requires_grad=True) 123 | self.p7_w2_relu = nn.ReLU() 124 | 125 | self.attention = attention 126 | 127 | def forward(self, inputs): 128 | """ bifpn模块结构示意图 129 | P7_0 -------------------------> P7_2 --------> 130 | |-------------| ↑ 131 | ↓ | 132 | P6_0 ---------> P6_1 ---------> P6_2 --------> 133 | |-------------|--------------↑ ↑ 134 | ↓ | 135 | P5_0 ---------> P5_1 ---------> P5_2 --------> 136 | |-------------|--------------↑ ↑ 137 | ↓ | 138 | P4_0 ---------> P4_1 ---------> P4_2 --------> 139 | |-------------|--------------↑ ↑ 140 | |--------------↓ | 141 | P3_0 -------------------------> P3_2 --------> 142 | """ 143 | if self.attention: 144 | p3_out, p4_out, p5_out, p6_out, p7_out = self._forward_fast_attention(inputs) 145 | else: 146 | p3_out, p4_out, p5_out, p6_out, p7_out = self._forward(inputs) 147 | 148 | return p3_out, p4_out, p5_out, p6_out, p7_out 149 | 150 | def _forward_fast_attention(self, inputs): 151 | #------------------------------------------------# 152 | # 当phi=1、2、3、4、5的时候使用fast_attention 153 | # 获得三个shape的有效特征层 154 | # 分别是C3 64, 64, 40 155 | # C4 32, 32, 112 156 | # C5 16, 16, 320 157 | #------------------------------------------------# 158 | if self.first_time: 159 | #------------------------------------------------------------------------# 160 | # 第一次BIFPN需要 下采样 与 调整通道 获得 p3_in p4_in p5_in p6_in p7_in 161 | #------------------------------------------------------------------------# 162 | p3, p4, p5 = inputs 163 | #-------------------------------------------# 164 | # 首先对通道数进行调整 165 | # C3 64, 64, 40 -> 64, 64, 64 166 | #-------------------------------------------# 167 | p3_in = self.p3_down_channel(p3) 168 | 169 | #-------------------------------------------# 170 | # 首先对通道数进行调整 171 | # C4 32, 32, 112 -> 32, 32, 64 172 | # -> 32, 32, 64 173 | #-------------------------------------------# 174 | p4_in_1 = self.p4_down_channel(p4) 175 | p4_in_2 = self.p4_down_channel_2(p4) 176 | 177 | #-------------------------------------------# 178 | # 首先对通道数进行调整 179 | # C5 16, 16, 320 -> 16, 16, 64 180 | # -> 16, 16, 64 181 | #-------------------------------------------# 182 | p5_in_1 = self.p5_down_channel(p5) 183 | p5_in_2 = self.p5_down_channel_2(p5) 184 | 185 | #-------------------------------------------# 186 | # 对C5进行下采样,调整通道数与宽高 187 | # C5 16, 16, 320 -> 8, 8, 64 188 | #-------------------------------------------# 189 | p6_in = self.p5_to_p6(p5) 190 | #-------------------------------------------# 191 | # 对P6_in进行下采样,调整宽高 192 | # P6_in 8, 8, 64 -> 4, 4, 64 193 | #-------------------------------------------# 194 | p7_in = self.p6_to_p7(p6_in) 195 | 196 
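            #------------------------------------------------------------------#
            # Fast normalized fusion, as used in every weighted sum below:
            #     O = sum_i( ReLU(w_i) / (eps + sum_j ReLU(w_j)) * I_i )
            # Each fusion node owns 2-3 learnable scalars; after ReLU and
            # normalization they act as non-negative mixing coefficients
            # for the incoming feature maps.
            #------------------------------------------------------------------#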
| # 简单的注意力机制,用于确定更关注p7_in还是p6_in 197 | p6_w1 = self.p6_w1_relu(self.p6_w1) 198 | weight = p6_w1 / (torch.sum(p6_w1, dim=0) + self.epsilon) 199 | p6_td= self.conv6_up(self.swish(weight[0] * p6_in + weight[1] * self.p6_upsample(p7_in))) 200 | 201 | # 简单的注意力机制,用于确定更关注p6_up还是p5_in 202 | p5_w1 = self.p5_w1_relu(self.p5_w1) 203 | weight = p5_w1 / (torch.sum(p5_w1, dim=0) + self.epsilon) 204 | p5_td= self.conv5_up(self.swish(weight[0] * p5_in_1 + weight[1] * self.p5_upsample(p6_td))) 205 | 206 | # 简单的注意力机制,用于确定更关注p5_up还是p4_in 207 | p4_w1 = self.p4_w1_relu(self.p4_w1) 208 | weight = p4_w1 / (torch.sum(p4_w1, dim=0) + self.epsilon) 209 | p4_td= self.conv4_up(self.swish(weight[0] * p4_in_1 + weight[1] * self.p4_upsample(p5_td))) 210 | 211 | # 简单的注意力机制,用于确定更关注p4_up还是p3_in 212 | p3_w1 = self.p3_w1_relu(self.p3_w1) 213 | weight = p3_w1 / (torch.sum(p3_w1, dim=0) + self.epsilon) 214 | p3_out = self.conv3_up(self.swish(weight[0] * p3_in + weight[1] * self.p3_upsample(p4_td))) 215 | 216 | # 简单的注意力机制,用于确定更关注p4_in_2还是p4_up还是p3_out 217 | p4_w2 = self.p4_w2_relu(self.p4_w2) 218 | weight = p4_w2 / (torch.sum(p4_w2, dim=0) + self.epsilon) 219 | p4_out = self.conv4_down( 220 | self.swish(weight[0] * p4_in_2 + weight[1] * p4_td+ weight[2] * self.p4_downsample(p3_out))) 221 | 222 | # 简单的注意力机制,用于确定更关注p5_in_2还是p5_up还是p4_out 223 | p5_w2 = self.p5_w2_relu(self.p5_w2) 224 | weight = p5_w2 / (torch.sum(p5_w2, dim=0) + self.epsilon) 225 | p5_out = self.conv5_down( 226 | self.swish(weight[0] * p5_in_2 + weight[1] * p5_td+ weight[2] * self.p5_downsample(p4_out))) 227 | 228 | # 简单的注意力机制,用于确定更关注p6_in还是p6_up还是p5_out 229 | p6_w2 = self.p6_w2_relu(self.p6_w2) 230 | weight = p6_w2 / (torch.sum(p6_w2, dim=0) + self.epsilon) 231 | p6_out = self.conv6_down( 232 | self.swish(weight[0] * p6_in + weight[1] * p6_td+ weight[2] * self.p6_downsample(p5_out))) 233 | 234 | # 简单的注意力机制,用于确定更关注p7_in还是p7_up还是p6_out 235 | p7_w2 = self.p7_w2_relu(self.p7_w2) 236 | weight = p7_w2 / (torch.sum(p7_w2, dim=0) + self.epsilon) 237 | p7_out = self.conv7_down(self.swish(weight[0] * p7_in + weight[1] * self.p7_downsample(p6_out))) 238 | else: 239 | p3_in, p4_in, p5_in, p6_in, p7_in = inputs 240 | 241 | # 简单的注意力机制,用于确定更关注p7_in还是p6_in 242 | p6_w1 = self.p6_w1_relu(self.p6_w1) 243 | weight = p6_w1 / (torch.sum(p6_w1, dim=0) + self.epsilon) 244 | p6_td= self.conv6_up(self.swish(weight[0] * p6_in + weight[1] * self.p6_upsample(p7_in))) 245 | 246 | # 简单的注意力机制,用于确定更关注p6_up还是p5_in 247 | p5_w1 = self.p5_w1_relu(self.p5_w1) 248 | weight = p5_w1 / (torch.sum(p5_w1, dim=0) + self.epsilon) 249 | p5_td= self.conv5_up(self.swish(weight[0] * p5_in + weight[1] * self.p5_upsample(p6_td))) 250 | 251 | # 简单的注意力机制,用于确定更关注p5_up还是p4_in 252 | p4_w1 = self.p4_w1_relu(self.p4_w1) 253 | weight = p4_w1 / (torch.sum(p4_w1, dim=0) + self.epsilon) 254 | p4_td= self.conv4_up(self.swish(weight[0] * p4_in + weight[1] * self.p4_upsample(p5_td))) 255 | 256 | # 简单的注意力机制,用于确定更关注p4_up还是p3_in 257 | p3_w1 = self.p3_w1_relu(self.p3_w1) 258 | weight = p3_w1 / (torch.sum(p3_w1, dim=0) + self.epsilon) 259 | p3_out = self.conv3_up(self.swish(weight[0] * p3_in + weight[1] * self.p3_upsample(p4_td))) 260 | 261 | # 简单的注意力机制,用于确定更关注p4_in还是p4_up还是p3_out 262 | p4_w2 = self.p4_w2_relu(self.p4_w2) 263 | weight = p4_w2 / (torch.sum(p4_w2, dim=0) + self.epsilon) 264 | p4_out = self.conv4_down( 265 | self.swish(weight[0] * p4_in + weight[1] * p4_td+ weight[2] * self.p4_downsample(p3_out))) 266 | 267 | # 简单的注意力机制,用于确定更关注p5_in还是p5_up还是p4_out 268 | p5_w2 = self.p5_w2_relu(self.p5_w2) 269 | weight = p5_w2 / 
(torch.sum(p5_w2, dim=0) + self.epsilon) 270 | p5_out = self.conv5_down( 271 | self.swish(weight[0] * p5_in + weight[1] * p5_td+ weight[2] * self.p5_downsample(p4_out))) 272 | 273 | # 简单的注意力机制,用于确定更关注p6_in还是p6_up还是p5_out 274 | p6_w2 = self.p6_w2_relu(self.p6_w2) 275 | weight = p6_w2 / (torch.sum(p6_w2, dim=0) + self.epsilon) 276 | p6_out = self.conv6_down( 277 | self.swish(weight[0] * p6_in + weight[1] * p6_td+ weight[2] * self.p6_downsample(p5_out))) 278 | 279 | # 简单的注意力机制,用于确定更关注p7_in还是p7_up还是p6_out 280 | p7_w2 = self.p7_w2_relu(self.p7_w2) 281 | weight = p7_w2 / (torch.sum(p7_w2, dim=0) + self.epsilon) 282 | p7_out = self.conv7_down(self.swish(weight[0] * p7_in + weight[1] * self.p7_downsample(p6_out))) 283 | 284 | return p3_out, p4_out, p5_out, p6_out, p7_out 285 | 286 | def _forward(self, inputs): 287 | # 当phi=6、7的时候使用_forward 288 | if self.first_time: 289 | # 第一次BIFPN需要下采样与降通道获得 290 | # p3_in p4_in p5_in p6_in p7_in 291 | p3, p4, p5 = inputs 292 | p3_in = self.p3_down_channel(p3) 293 | p4_in_1 = self.p4_down_channel(p4) 294 | p4_in_2 = self.p4_down_channel_2(p4) 295 | p5_in_1 = self.p5_down_channel(p5) 296 | p5_in_2 = self.p5_down_channel_2(p5) 297 | p6_in = self.p5_to_p6(p5) 298 | p7_in = self.p6_to_p7(p6_in) 299 | 300 | p6_td= self.conv6_up(self.swish(p6_in + self.p6_upsample(p7_in))) 301 | 302 | p5_td= self.conv5_up(self.swish(p5_in_1 + self.p5_upsample(p6_td))) 303 | 304 | p4_td= self.conv4_up(self.swish(p4_in_1 + self.p4_upsample(p5_td))) 305 | 306 | p3_out = self.conv3_up(self.swish(p3_in + self.p3_upsample(p4_td))) 307 | 308 | p4_out = self.conv4_down( 309 | self.swish(p4_in_2 + p4_td+ self.p4_downsample(p3_out))) 310 | 311 | p5_out = self.conv5_down( 312 | self.swish(p5_in_2 + p5_td+ self.p5_downsample(p4_out))) 313 | 314 | p6_out = self.conv6_down( 315 | self.swish(p6_in + p6_td+ self.p6_downsample(p5_out))) 316 | 317 | p7_out = self.conv7_down(self.swish(p7_in + self.p7_downsample(p6_out))) 318 | 319 | else: 320 | p3_in, p4_in, p5_in, p6_in, p7_in = inputs 321 | 322 | p6_td= self.conv6_up(self.swish(p6_in + self.p6_upsample(p7_in))) 323 | 324 | p5_td= self.conv5_up(self.swish(p5_in + self.p5_upsample(p6_td))) 325 | 326 | p4_td= self.conv4_up(self.swish(p4_in + self.p4_upsample(p5_td))) 327 | 328 | p3_out = self.conv3_up(self.swish(p3_in + self.p3_upsample(p4_td))) 329 | 330 | p4_out = self.conv4_down( 331 | self.swish(p4_in + p4_td+ self.p4_downsample(p3_out))) 332 | 333 | p5_out = self.conv5_down( 334 | self.swish(p5_in + p5_td+ self.p5_downsample(p4_out))) 335 | 336 | p6_out = self.conv6_down( 337 | self.swish(p6_in + p6_td+ self.p6_downsample(p5_out))) 338 | 339 | p7_out = self.conv7_down(self.swish(p7_in + self.p7_downsample(p6_out))) 340 | 341 | return p3_out, p4_out, p5_out, p6_out, p7_out 342 | 343 | class BoxNet(nn.Module): 344 | def __init__(self, in_channels, num_anchors, num_layers, onnx_export=False): 345 | super(BoxNet, self).__init__() 346 | self.num_layers = num_layers 347 | 348 | self.conv_list = nn.ModuleList( 349 | [SeparableConvBlock(in_channels, in_channels, norm=False, activation=False) for i in range(num_layers)]) 350 | # 每一个有效特征层对应的Batchnor不同 351 | self.bn_list = nn.ModuleList( 352 | [nn.ModuleList([nn.BatchNorm2d(in_channels, momentum=0.01, eps=1e-3) for i in range(num_layers)]) for j in range(5)]) 353 | # 9 354 | # 4 中心,宽高 355 | self.header = SeparableConvBlock(in_channels, num_anchors * 4, norm=False, activation=False) 356 | self.swish = MemoryEfficientSwish() if not onnx_export else Swish() 357 | 358 | def forward(self, inputs): 359 | feats = [] 
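        # Each level below is reshaped to [batch, H*W*num_anchors, 4] and the
        # five levels are concatenated; for d0 (512x512 input) that is
        # (64^2 + 32^2 + 16^2 + 8^2 + 4^2) * 9 = 49104 anchors in total.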
360 | # 对每个特征层循环 361 | for feat, bn_list in zip(inputs, self.bn_list): 362 | # 每个特征层需要进行num_layer次卷积+标准化+激活函数 363 | for i, bn, conv in zip(range(self.num_layers), bn_list, self.conv_list): 364 | feat = conv(feat) 365 | feat = bn(feat) 366 | feat = self.swish(feat) 367 | feat = self.header(feat) 368 | 369 | feat = feat.permute(0, 2, 3, 1) 370 | feat = feat.contiguous().view(feat.shape[0], -1, 4) 371 | 372 | feats.append(feat) 373 | # 进行一个堆叠 374 | feats = torch.cat(feats, dim=1) 375 | 376 | return feats 377 | 378 | class ClassNet(nn.Module): 379 | def __init__(self, in_channels, num_anchors, num_classes, num_layers, onnx_export=False): 380 | super(ClassNet, self).__init__() 381 | self.num_anchors = num_anchors 382 | self.num_classes = num_classes 383 | self.num_layers = num_layers 384 | self.conv_list = nn.ModuleList( 385 | [SeparableConvBlock(in_channels, in_channels, norm=False, activation=False) for i in range(num_layers)]) 386 | # 每一个有效特征层对应的BatchNorm2d不同 387 | self.bn_list = nn.ModuleList( 388 | [nn.ModuleList([nn.BatchNorm2d(in_channels, momentum=0.01, eps=1e-3) for i in range(num_layers)]) for j in range(5)]) 389 | # num_anchors = 9 390 | # num_anchors num_classes 391 | self.header = SeparableConvBlock(in_channels, num_anchors * num_classes, norm=False, activation=False) 392 | self.swish = MemoryEfficientSwish() if not onnx_export else Swish() 393 | 394 | def forward(self, inputs): 395 | feats = [] 396 | # 对每个特征层循环 397 | for feat, bn_list in zip(inputs, self.bn_list): 398 | for i, bn, conv in zip(range(self.num_layers), bn_list, self.conv_list): 399 | # 每个特征层需要进行num_layer次卷积+标准化+激活函数 400 | feat = conv(feat) 401 | feat = bn(feat) 402 | feat = self.swish(feat) 403 | feat = self.header(feat) 404 | 405 | feat = feat.permute(0, 2, 3, 1) 406 | feat = feat.contiguous().view(feat.shape[0], feat.shape[1], feat.shape[2], self.num_anchors, self.num_classes) 407 | feat = feat.contiguous().view(feat.shape[0], -1, self.num_classes) 408 | 409 | feats.append(feat) 410 | # 进行一个堆叠 411 | feats = torch.cat(feats, dim=1) 412 | # 取sigmoid表示概率 413 | feats = feats.sigmoid() 414 | 415 | return feats 416 | 417 | class EfficientNet(nn.Module): 418 | def __init__(self, phi, pretrained=False): 419 | super(EfficientNet, self).__init__() 420 | model = EffNet.from_pretrained(f'efficientnet-b{phi}', pretrained) 421 | del model._conv_head 422 | del model._bn1 423 | del model._avg_pooling 424 | del model._dropout 425 | del model._fc 426 | self.model = model 427 | 428 | def forward(self, x): 429 | x = self.model._conv_stem(x) 430 | x = self.model._bn0(x) 431 | x = self.model._swish(x) 432 | feature_maps = [] 433 | 434 | last_x = None 435 | for idx, block in enumerate(self.model._blocks): 436 | drop_connect_rate = self.model._global_params.drop_connect_rate 437 | if drop_connect_rate: 438 | drop_connect_rate *= float(idx) / len(self.model._blocks) 439 | x = block(x, drop_connect_rate=drop_connect_rate) 440 | #------------------------------------------------------# 441 | # 取出对应的特征层,如果某个EffcientBlock的步长为2的话 442 | # 意味着它的前一个特征层为有效特征层 443 | # 除此之外,最后一个EffcientBlock的输出为有效特征层 444 | #------------------------------------------------------# 445 | if block._depthwise_conv.stride == [2, 2]: 446 | feature_maps.append(last_x) 447 | elif idx == len(self.model._blocks) - 1: 448 | feature_maps.append(x) 449 | last_x = x 450 | del last_x 451 | return feature_maps[1:] 452 | 453 | class EfficientDetBackbone(nn.Module): 454 | def __init__(self, num_classes = 80, phi = 0, pretrained = False): 455 | super(EfficientDetBackbone, 
self).__init__() 456 | #--------------------------------# 457 | # phi指的是efficientdet的版本 458 | #--------------------------------# 459 | self.phi = phi 460 | #---------------------------------------------------# 461 | # backbone_phi指的是该efficientdet对应的efficient 462 | #---------------------------------------------------# 463 | self.backbone_phi = [0, 1, 2, 3, 4, 5, 6, 6] 464 | #--------------------------------# 465 | # BiFPN所用的通道数 466 | #--------------------------------# 467 | self.fpn_num_filters = [64, 88, 112, 160, 224, 288, 384, 384] 468 | #--------------------------------# 469 | # BiFPN的重复次数 470 | #--------------------------------# 471 | self.fpn_cell_repeats = [3, 4, 5, 6, 7, 7, 8, 8] 472 | #---------------------------------------------------# 473 | # Effcient Head卷积重复次数 474 | #---------------------------------------------------# 475 | self.box_class_repeats = [3, 3, 3, 4, 4, 4, 5, 5] 476 | #---------------------------------------------------# 477 | # 基础的先验框大小 478 | #---------------------------------------------------# 479 | self.anchor_scale = [4., 4., 4., 4., 4., 4., 4., 5.] 480 | num_anchors = 9 481 | 482 | conv_channel_coef = { 483 | 0: [40, 112, 320], 484 | 1: [40, 112, 320], 485 | 2: [48, 120, 352], 486 | 3: [48, 136, 384], 487 | 4: [56, 160, 448], 488 | 5: [64, 176, 512], 489 | 6: [72, 200, 576], 490 | 7: [72, 200, 576], 491 | } 492 | 493 | #------------------------------------------------------# 494 | # 在经过多次BiFPN模块的堆叠后,我们获得的fpn_features 495 | # 假设我们使用的是efficientdet-D0包括五个有效特征层: 496 | # P3_out 64,64,64 497 | # P4_out 32,32,64 498 | # P5_out 16,16,64 499 | # P6_out 8,8,64 500 | # P7_out 4,4,64 501 | #------------------------------------------------------# 502 | self.bifpn = nn.Sequential( 503 | *[BiFPN(self.fpn_num_filters[self.phi], 504 | conv_channel_coef[phi], 505 | True if _ == 0 else False, 506 | attention=True if phi < 6 else False) 507 | for _ in range(self.fpn_cell_repeats[phi])]) 508 | 509 | self.num_classes = num_classes 510 | #------------------------------------------------------# 511 | # 创建efficient head 512 | # 可以将特征层转换成预测结果 513 | #------------------------------------------------------# 514 | self.regressor = BoxNet(in_channels=self.fpn_num_filters[self.phi], num_anchors=num_anchors, 515 | num_layers=self.box_class_repeats[self.phi]) 516 | 517 | self.classifier = ClassNet(in_channels=self.fpn_num_filters[self.phi], num_anchors=num_anchors, 518 | num_classes=num_classes, num_layers=self.box_class_repeats[self.phi]) 519 | 520 | self.anchors = Anchors(anchor_scale=self.anchor_scale[phi]) 521 | 522 | #-------------------------------------------# 523 | # 获得三个shape的有效特征层 524 | # 分别是C3 64, 64, 40 525 | # C4 32, 32, 112 526 | # C5 16, 16, 320 527 | #-------------------------------------------# 528 | self.backbone_net = EfficientNet(self.backbone_phi[phi], pretrained) 529 | 530 | def freeze_bn(self): 531 | for m in self.modules(): 532 | if isinstance(m, nn.BatchNorm2d): 533 | m.eval() 534 | 535 | def forward(self, inputs): 536 | _, p3, p4, p5 = self.backbone_net(inputs) 537 | 538 | features = (p3, p4, p5) 539 | features = self.bifpn(features) 540 | 541 | regression = self.regressor(features) 542 | classification = self.classifier(features) 543 | anchors = self.anchors(inputs) 544 | 545 | return features, regression, classification, anchors 546 | 547 | -------------------------------------------------------------------------------- /nets/efficientdet_training.py: -------------------------------------------------------------------------------- 1 | import math 2 | from functools 
import partial 3 | 4 | import torch 5 | import torch.nn as nn 6 | 7 | 8 | def calc_iou(a, b): 9 | max_length = torch.max(a) 10 | a = a / max_length 11 | b = b / max_length 12 | 13 | area = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1]) 14 | iw = torch.min(torch.unsqueeze(a[:, 3], dim=1), b[:, 2]) - torch.max(torch.unsqueeze(a[:, 1], 1), b[:, 0]) 15 | ih = torch.min(torch.unsqueeze(a[:, 2], dim=1), b[:, 3]) - torch.max(torch.unsqueeze(a[:, 0], 1), b[:, 1]) 16 | iw = torch.clamp(iw, min=0) 17 | ih = torch.clamp(ih, min=0) 18 | ua = torch.unsqueeze((a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1]), dim=1) + area - iw * ih 19 | ua = torch.clamp(ua, min=1e-8) 20 | intersection = iw * ih 21 | IoU = intersection / ua 22 | 23 | return IoU 24 | 25 | def get_target(anchor, bbox_annotation, classification, cuda): 26 | #------------------------------------------------------# 27 | # 计算真实框和先验框的交并比 28 | # anchor num_anchors, 4 29 | # bbox_annotation num_true_boxes, 5 30 | # Iou num_anchors, num_true_boxes 31 | #------------------------------------------------------# 32 | IoU = calc_iou(anchor[:, :], bbox_annotation[:, :4]) 33 | 34 | #------------------------------------------------------# 35 | # 计算与先验框重合度最大的真实框 36 | # IoU_max num_anchors, 37 | # IoU_argmax num_anchors, 38 | #------------------------------------------------------# 39 | IoU_max, IoU_argmax = torch.max(IoU, dim=1) 40 | 41 | #------------------------------------------------------# 42 | # 寻找哪些先验框在计算loss的时候需要忽略 43 | #------------------------------------------------------# 44 | targets = torch.ones_like(classification) * -1 45 | targets = targets.type_as(classification) 46 | 47 | #------------------------------------------# 48 | # 重合度小于0.4需要参与训练 49 | #------------------------------------------# 50 | targets[torch.lt(IoU_max, 0.4), :] = 0 51 | 52 | #--------------------------------------------------# 53 | # 重合度大于0.5需要参与训练,还需要计算回归loss 54 | #--------------------------------------------------# 55 | positive_indices = torch.ge(IoU_max, 0.5) 56 | 57 | #--------------------------------------------------# 58 | # 取出每个先验框最对应的真实框 59 | #--------------------------------------------------# 60 | assigned_annotations = bbox_annotation[IoU_argmax, :] 61 | 62 | #--------------------------------------------------# 63 | # 将对应的种类置为1 64 | #--------------------------------------------------# 65 | targets[positive_indices, :] = 0 66 | targets[positive_indices, assigned_annotations[positive_indices, 4].long()] = 1 67 | #--------------------------------------------------# 68 | # 计算正样本数量 69 | #--------------------------------------------------# 70 | num_positive_anchors = positive_indices.sum() 71 | return targets, num_positive_anchors, positive_indices, assigned_annotations 72 | 73 | def encode_bbox(assigned_annotations, positive_indices, anchor_widths, anchor_heights, anchor_ctr_x, anchor_ctr_y): 74 | #--------------------------------------------------# 75 | # 取出作为正样本的先验框对应的真实框 76 | #--------------------------------------------------# 77 | assigned_annotations = assigned_annotations[positive_indices, :] 78 | 79 | #--------------------------------------------------# 80 | # 取出作为正样本的先验框 81 | #--------------------------------------------------# 82 | anchor_widths_pi = anchor_widths[positive_indices] 83 | anchor_heights_pi = anchor_heights[positive_indices] 84 | anchor_ctr_x_pi = anchor_ctr_x[positive_indices] 85 | anchor_ctr_y_pi = anchor_ctr_y[positive_indices] 86 | 87 | #--------------------------------------------------# 88 | # 计算真实框的宽高与中心 89 | 
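# (added sketch) a worked instance of the encoding below, with made-up numbers:
# for an anchor of width 64 centered at x = 32 and an assigned ground-truth
# box of width 32 centered at x = 40, the regression targets come out as
#     targets_dx = (40 - 32) / 64 = 0.125
#     targets_dw = log(32 / 64) ≈ -0.693
# i.e. center offsets are normalized by the anchor size and the width/height
# ratio is log-encoded; these are the values the BoxNet head learns to predict.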
#--------------------------------------------------# 90 | gt_widths = assigned_annotations[:, 2] - assigned_annotations[:, 0] 91 | gt_heights = assigned_annotations[:, 3] - assigned_annotations[:, 1] 92 | gt_ctr_x = assigned_annotations[:, 0] + 0.5 * gt_widths 93 | gt_ctr_y = assigned_annotations[:, 1] + 0.5 * gt_heights 94 | 95 | gt_widths = torch.clamp(gt_widths, min=1) 96 | gt_heights = torch.clamp(gt_heights, min=1) 97 | 98 | #---------------------------------------------------# 99 | # 利用真实框和先验框进行编码,获得应该有的预测结果 100 | #---------------------------------------------------# 101 | targets_dx = (gt_ctr_x - anchor_ctr_x_pi) / anchor_widths_pi 102 | targets_dy = (gt_ctr_y - anchor_ctr_y_pi) / anchor_heights_pi 103 | targets_dw = torch.log(gt_widths / anchor_widths_pi) 104 | targets_dh = torch.log(gt_heights / anchor_heights_pi) 105 | 106 | targets = torch.stack((targets_dy, targets_dx, targets_dh, targets_dw)) 107 | targets = targets.t() 108 | return targets 109 | 110 | class FocalLoss(nn.Module): 111 | def __init__(self): 112 | super(FocalLoss, self).__init__() 113 | 114 | def forward(self, classifications, regressions, anchors, annotations, alpha = 0.25, gamma = 2.0, cuda = True): 115 | #---------------------------# 116 | # 获得batch_size的大小 117 | #---------------------------# 118 | batch_size = classifications.shape[0] 119 | 120 | #--------------------------------------------# 121 | # 获得先验框,将先验框转换成中心宽高的形式 122 | #--------------------------------------------# 123 | dtype = regressions.dtype 124 | anchor = anchors[0, :, :].to(dtype) 125 | #--------------------------------------------# 126 | # 将先验框转换成中心,宽高的形式 127 | #--------------------------------------------# 128 | anchor_widths = anchor[:, 3] - anchor[:, 1] 129 | anchor_heights = anchor[:, 2] - anchor[:, 0] 130 | anchor_ctr_x = anchor[:, 1] + 0.5 * anchor_widths 131 | anchor_ctr_y = anchor[:, 0] + 0.5 * anchor_heights 132 | 133 | regression_losses = [] 134 | classification_losses = [] 135 | for j in range(batch_size): 136 | #-------------------------------------------------------# 137 | # 取出每张图片对应的真实框、种类预测结果和回归预测结果 138 | #-------------------------------------------------------# 139 | bbox_annotation = annotations[j] 140 | classification = classifications[j, :, :] 141 | regression = regressions[j, :, :] 142 | 143 | classification = torch.clamp(classification, 5e-4, 1.0 - 5e-4) 144 | 145 | if len(bbox_annotation) == 0: 146 | #-------------------------------------------------------# 147 | # 当图片中不存在真实框的时候,所有特征点均为负样本 148 | #-------------------------------------------------------# 149 | alpha_factor = torch.ones_like(classification) * alpha 150 | alpha_factor = alpha_factor.type_as(classification) 151 | 152 | alpha_factor = 1. 
- alpha_factor 153 | focal_weight = classification 154 | focal_weight = alpha_factor * torch.pow(focal_weight, gamma) 155 | 156 | #-------------------------------------------------------# 157 | # 计算特征点对应的交叉熵 158 | #-------------------------------------------------------# 159 | bce = - (torch.log(1.0 - classification)) 160 | 161 | cls_loss = focal_weight * bce 162 | 163 | classification_losses.append(cls_loss.sum()) 164 | #-------------------------------------------------------# 165 | # 回归损失此时为0 166 | #-------------------------------------------------------# 167 | regression_losses.append(torch.tensor(0).type_as(classification)) 168 | 169 | continue 170 | 171 | #------------------------------------------------------# 172 | # 计算真实框和先验框的交并比 173 | # targets num_anchors, num_classes 174 | # num_positive_anchors 正样本的数量 175 | # positive_indices num_anchors, 176 | # assigned_annotations num_anchors, 5 177 | #------------------------------------------------------# 178 | targets, num_positive_anchors, positive_indices, assigned_annotations = get_target(anchor, 179 | bbox_annotation, classification, cuda) 180 | 181 | #------------------------------------------------------# 182 | # 首先计算交叉熵loss 183 | #------------------------------------------------------# 184 | alpha_factor = torch.ones_like(targets) * alpha 185 | alpha_factor = alpha_factor.type_as(classification) 186 | #------------------------------------------------------# 187 | # 这里使用的是Focal loss的思想, 188 | # 易分类样本权值小 189 | # 难分类样本权值大 190 | #------------------------------------------------------# 191 | alpha_factor = torch.where(torch.eq(targets, 1.), alpha_factor, 1. - alpha_factor) 192 | focal_weight = torch.where(torch.eq(targets, 1.), 1. - classification, classification) 193 | focal_weight = alpha_factor * torch.pow(focal_weight, gamma) 194 | 195 | bce = - (targets * torch.log(classification) + (1.0 - targets) * torch.log(1.0 - classification)) 196 | cls_loss = focal_weight * bce 197 | 198 | #------------------------------------------------------# 199 | # 把忽略的先验框的loss置为0 200 | #------------------------------------------------------# 201 | zeros = torch.zeros_like(cls_loss) 202 | zeros = zeros.type_as(cls_loss) 203 | cls_loss = torch.where(torch.ne(targets, -1.0), cls_loss, zeros) 204 | 205 | classification_losses.append(cls_loss.sum() / torch.clamp(num_positive_anchors.to(dtype), min=1.0)) 206 | 207 | #------------------------------------------------------# 208 | # 如果存在先验框为正样本的话 209 | #------------------------------------------------------# 210 | if positive_indices.sum() > 0: 211 | targets = encode_bbox(assigned_annotations, positive_indices, anchor_widths, anchor_heights, anchor_ctr_x, anchor_ctr_y) 212 | #---------------------------------------------------# 213 | # 将网络应该有的预测结果和实际的预测结果进行比较 214 | # 计算smooth l1 loss 215 | #---------------------------------------------------# 216 | regression_diff = torch.abs(targets - regression[positive_indices, :]) 217 | regression_loss = torch.where( 218 | torch.le(regression_diff, 1.0 / 9.0), 219 | 0.5 * 9.0 * torch.pow(regression_diff, 2), 220 | regression_diff - 0.5 / 9.0 221 | ) 222 | regression_losses.append(regression_loss.mean()) 223 | else: 224 | regression_losses.append(torch.tensor(0).type_as(classification)) 225 | 226 | # 计算平均loss并返回 227 | c_loss = torch.stack(classification_losses).mean() 228 | r_loss = torch.stack(regression_losses).mean() 229 | loss = c_loss + r_loss 230 | return loss, c_loss, r_loss 231 | 232 | def get_lr_scheduler(lr_decay_type, lr, min_lr, total_iters, warmup_iters_ratio = 
0.05, warmup_lr_ratio = 0.1, no_aug_iter_ratio = 0.05, step_num = 10): 233 | def yolox_warm_cos_lr(lr, min_lr, total_iters, warmup_total_iters, warmup_lr_start, no_aug_iter, iters): 234 | if iters <= warmup_total_iters: 235 | # lr = (lr - warmup_lr_start) * iters / float(warmup_total_iters) + warmup_lr_start 236 | lr = (lr - warmup_lr_start) * pow(iters / float(warmup_total_iters), 2) + warmup_lr_start 237 | elif iters >= total_iters - no_aug_iter: 238 | lr = min_lr 239 | else: 240 | lr = min_lr + 0.5 * (lr - min_lr) * ( 241 | 1.0 + math.cos(math.pi * (iters - warmup_total_iters) / (total_iters - warmup_total_iters - no_aug_iter)) 242 | ) 243 | return lr 244 | 245 | def step_lr(lr, decay_rate, step_size, iters): 246 | if step_size < 1: 247 | raise ValueError("step_size must be at least 1.") 248 | n = iters // step_size 249 | out_lr = lr * decay_rate ** n 250 | return out_lr 251 | 252 | if lr_decay_type == "cos": 253 | warmup_total_iters = min(max(warmup_iters_ratio * total_iters, 1), 3) 254 | warmup_lr_start = max(warmup_lr_ratio * lr, 1e-6) 255 | no_aug_iter = min(max(no_aug_iter_ratio * total_iters, 1), 15) 256 | func = partial(yolox_warm_cos_lr, lr, min_lr, total_iters, warmup_total_iters, warmup_lr_start, no_aug_iter) 257 | else: 258 | decay_rate = (min_lr / lr) ** (1 / (step_num - 1)) 259 | step_size = total_iters / step_num 260 | func = partial(step_lr, lr, decay_rate, step_size) 261 | 262 | return func 263 | 264 | def set_optimizer_lr(optimizer, lr_scheduler_func, epoch): 265 | lr = lr_scheduler_func(epoch) 266 | for param_group in optimizer.param_groups: 267 | param_group['lr'] = lr 268 | -------------------------------------------------------------------------------- /nets/efficientnet.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch import nn 3 | from torch.nn import functional as F 4 | 5 | from nets.layers import (MemoryEfficientSwish, Swish, drop_connect, 6 | efficientnet_params, get_model_params, 7 | get_same_padding_conv2d, load_pretrained_weights, 8 | round_filters, round_repeats) 9 | 10 | 11 | class MBConvBlock(nn.Module): 12 | ''' 13 | EfficientNet-b0: 14 | [BlockArgs(kernel_size=3, num_repeat=1, input_filters=32, output_filters=16, expand_ratio=1, id_skip=True, stride=[1], se_ratio=0.25), 15 | BlockArgs(kernel_size=3, num_repeat=2, input_filters=16, output_filters=24, expand_ratio=6, id_skip=True, stride=[2], se_ratio=0.25), 16 | BlockArgs(kernel_size=5, num_repeat=2, input_filters=24, output_filters=40, expand_ratio=6, id_skip=True, stride=[2], se_ratio=0.25), 17 | BlockArgs(kernel_size=3, num_repeat=3, input_filters=40, output_filters=80, expand_ratio=6, id_skip=True, stride=[2], se_ratio=0.25), 18 | BlockArgs(kernel_size=5, num_repeat=3, input_filters=80, output_filters=112, expand_ratio=6, id_skip=True, stride=[1], se_ratio=0.25), 19 | BlockArgs(kernel_size=5, num_repeat=4, input_filters=112, output_filters=192, expand_ratio=6, id_skip=True, stride=[2], se_ratio=0.25), 20 | BlockArgs(kernel_size=3, num_repeat=1, input_filters=192, output_filters=320, expand_ratio=6, id_skip=True, stride=[1], se_ratio=0.25)] 21 | 22 | GlobalParams(batch_norm_momentum=0.99, batch_norm_epsilon=0.001, dropout_rate=0.2, num_classes=1000, width_coefficient=1.0, 23 | depth_coefficient=1.0, depth_divisor=8, min_depth=None, drop_connect_rate=0.2, image_size=224) 24 | ''' 25 | def __init__(self, block_args, global_params): 26 | super().__init__() 27 | self._block_args = block_args 28 | # pick a conv implementation (static or dynamic same-padding) 29 | Conv2d =
get_same_padding_conv2d(image_size=global_params.image_size) 30 | 31 | # 获得标准化的参数 32 | self._bn_mom = 1 - global_params.batch_norm_momentum 33 | self._bn_eps = global_params.batch_norm_epsilon 34 | 35 | #----------------------------# 36 | # 计算是否施加注意力机制 37 | #----------------------------# 38 | self.has_se = (self._block_args.se_ratio is not None) and (0 < self._block_args.se_ratio <= 1) 39 | #----------------------------# 40 | # 判断是否添加残差边 41 | #----------------------------# 42 | self.id_skip = block_args.id_skip 43 | 44 | #-------------------------------------------------# 45 | # 利用Inverted residuals 46 | # part1 利用1x1卷积进行通道数上升 47 | #-------------------------------------------------# 48 | inp = self._block_args.input_filters 49 | oup = self._block_args.input_filters * self._block_args.expand_ratio 50 | if self._block_args.expand_ratio != 1: 51 | self._expand_conv = Conv2d(in_channels=inp, out_channels=oup, kernel_size=1, bias=False) 52 | self._bn0 = nn.BatchNorm2d(num_features=oup, momentum=self._bn_mom, eps=self._bn_eps) 53 | 54 | #------------------------------------------------------# 55 | # 如果步长为2x2的话,利用深度可分离卷积进行高宽压缩 56 | # part2 利用3x3卷积对每一个channel进行卷积 57 | #------------------------------------------------------# 58 | k = self._block_args.kernel_size 59 | s = self._block_args.stride 60 | self._depthwise_conv = Conv2d(in_channels=oup, out_channels=oup, groups=oup, kernel_size=k, stride=s, bias=False) 61 | self._bn1 = nn.BatchNorm2d(num_features=oup, momentum=self._bn_mom, eps=self._bn_eps) 62 | 63 | #------------------------------------------------------# 64 | # 完成深度可分离卷积后 65 | # 对深度可分离卷积的结果施加注意力机制 66 | #------------------------------------------------------# 67 | if self.has_se: 68 | num_squeezed_channels = max(1, int(self._block_args.input_filters * self._block_args.se_ratio)) 69 | #------------------------------------------------------# 70 | # 通道先压缩后上升,最后利用sigmoid将值固定到0-1之间 71 | #------------------------------------------------------# 72 | self._se_reduce = Conv2d(in_channels=oup, out_channels=num_squeezed_channels, kernel_size=1) 73 | self._se_expand = Conv2d(in_channels=num_squeezed_channels, out_channels=oup, kernel_size=1) 74 | 75 | #------------------------------------------------------# 76 | # part3 利用1x1卷积进行通道下降 77 | #------------------------------------------------------# 78 | final_oup = self._block_args.output_filters 79 | self._project_conv = Conv2d(in_channels=oup, out_channels=final_oup, kernel_size=1, bias=False) 80 | self._bn2 = nn.BatchNorm2d(num_features=final_oup, momentum=self._bn_mom, eps=self._bn_eps) 81 | 82 | self._swish = MemoryEfficientSwish() 83 | 84 | def forward(self, inputs, drop_connect_rate=None): 85 | x = inputs 86 | #-------------------------------------------------# 87 | # 利用Inverted residuals 88 | # part1 利用1x1卷积进行通道数上升 89 | #-------------------------------------------------# 90 | if self._block_args.expand_ratio != 1: 91 | x = self._swish(self._bn0(self._expand_conv(inputs))) 92 | 93 | #------------------------------------------------------# 94 | # 如果步长为2x2的话,利用深度可分离卷积进行高宽压缩 95 | # part2 利用3x3卷积对每一个channel进行卷积 96 | #------------------------------------------------------# 97 | x = self._swish(self._bn1(self._depthwise_conv(x))) 98 | 99 | #------------------------------------------------------# 100 | # 完成深度可分离卷积后 101 | # 对深度可分离卷积的结果施加注意力机制 102 | #------------------------------------------------------# 103 | if self.has_se: 104 | x_squeezed = F.adaptive_avg_pool2d(x, 1) 105 | x_squeezed = self._se_expand( 106 | self._swish(self._se_reduce(x_squeezed))) 
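# (added note) squeeze-and-excitation: x_squeezed was average-pooled to
# (B, C, 1, 1), reduced to max(1, input_filters * se_ratio) channels and
# expanded back to C; the sigmoid below maps it into (0, 1) so it acts as
# a per-channel gate on the depthwise output x.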
107 | x = torch.sigmoid(x_squeezed) * x 108 | 109 | #------------------------------------------------------# 110 | # part3 利用1x1卷积进行通道下降 111 | #------------------------------------------------------# 112 | x = self._bn2(self._project_conv(x)) 113 | 114 | #------------------------------------------------------# 115 | # part4 如果满足残差条件,那么就增加残差边 116 | #------------------------------------------------------# 117 | input_filters, output_filters = self._block_args.input_filters, self._block_args.output_filters 118 | if self.id_skip and self._block_args.stride == 1 and input_filters == output_filters: 119 | if drop_connect_rate: 120 | x = drop_connect(x, p=drop_connect_rate, 121 | training=self.training) 122 | x = x + inputs # skip connection 123 | return x 124 | 125 | def set_swish(self, memory_efficient=True): 126 | """Sets swish function as memory efficient (for training) or standard (for export)""" 127 | self._swish = MemoryEfficientSwish() if memory_efficient else Swish() 128 | 129 | class EfficientNet(nn.Module): 130 | ''' 131 | EfficientNet-b0: 132 | [BlockArgs(kernel_size=3, num_repeat=1, input_filters=32, output_filters=16, expand_ratio=1, id_skip=True, stride=[1], se_ratio=0.25), 133 | BlockArgs(kernel_size=3, num_repeat=2, input_filters=16, output_filters=24, expand_ratio=6, id_skip=True, stride=[2], se_ratio=0.25), 134 | BlockArgs(kernel_size=5, num_repeat=2, input_filters=24, output_filters=40, expand_ratio=6, id_skip=True, stride=[2], se_ratio=0.25), 135 | BlockArgs(kernel_size=3, num_repeat=3, input_filters=40, output_filters=80, expand_ratio=6, id_skip=True, stride=[2], se_ratio=0.25), 136 | BlockArgs(kernel_size=5, num_repeat=3, input_filters=80, output_filters=112, expand_ratio=6, id_skip=True, stride=[1], se_ratio=0.25), 137 | BlockArgs(kernel_size=5, num_repeat=4, input_filters=112, output_filters=192, expand_ratio=6, id_skip=True, stride=[2], se_ratio=0.25), 138 | BlockArgs(kernel_size=3, num_repeat=1, input_filters=192, output_filters=320, expand_ratio=6, id_skip=True, stride=[1], se_ratio=0.25)] 139 | 140 | GlobalParams(batch_norm_momentum=0.99, batch_norm_epsilon=0.001, dropout_rate=0.2, num_classes=1000, width_coefficient=1.0, 141 | depth_coefficient=1.0, depth_divisor=8, min_depth=None, drop_connect_rate=0.2, image_size=224) 142 | ''' 143 | def __init__(self, blocks_args=None, global_params=None): 144 | super().__init__() 145 | assert isinstance(blocks_args, list), 'blocks_args should be a list' 146 | assert len(blocks_args) > 0, 'block args must be greater than 0' 147 | self._global_params = global_params 148 | self._blocks_args = blocks_args 149 | # 获得一种卷积方法 150 | Conv2d = get_same_padding_conv2d(image_size=global_params.image_size) 151 | 152 | # 获得标准化的参数 153 | bn_mom = 1 - self._global_params.batch_norm_momentum 154 | bn_eps = self._global_params.batch_norm_epsilon 155 | 156 | #-------------------------------------------------# 157 | # 网络主干部分开始 158 | # 设定输入进来的是RGB三通道图像 159 | # 利用round_filters可以使得通道可以被8整除 160 | #-------------------------------------------------# 161 | in_channels = 3 162 | out_channels = round_filters(32, self._global_params) 163 | 164 | #-------------------------------------------------# 165 | # 创建stem部分 166 | #-------------------------------------------------# 167 | self._conv_stem = Conv2d( 168 | in_channels, out_channels, kernel_size=3, stride=2, bias=False) 169 | self._bn0 = nn.BatchNorm2d( 170 | num_features=out_channels, momentum=bn_mom, eps=bn_eps) 171 | 172 | #-------------------------------------------------# 173 | # 在这个地方对大结构块进行循环 174 | 
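# (added note) for the b0 arguments in the class docstring this loop expands
# the seven stages into 1+2+2+3+3+4+1 = 16 MBConvBlocks; only the first block
# of each stage keeps the stage stride, the repeated blocks run with stride 1.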
#-------------------------------------------------# 175 | self._blocks = nn.ModuleList([]) 176 | for i in range(len(self._blocks_args)): 177 | #-------------------------------------------------------------# 178 | # 对每个block的参数进行修改,根据所选的efficient版本进行修改 179 | #-------------------------------------------------------------# 180 | self._blocks_args[i] = self._blocks_args[i]._replace( 181 | input_filters=round_filters(self._blocks_args[i].input_filters, self._global_params), 182 | output_filters=round_filters(self._blocks_args[i].output_filters, self._global_params), 183 | num_repeat=round_repeats(self._blocks_args[i].num_repeat, self._global_params) 184 | ) 185 | 186 | #-------------------------------------------------------------# 187 | # 每个大结构块里面的第一个EfficientBlock 188 | # 都需要考虑步长和输入通道数 189 | #-------------------------------------------------------------# 190 | self._blocks.append(MBConvBlock(self._blocks_args[i], self._global_params)) 191 | 192 | if self._blocks_args[i].num_repeat > 1: 193 | self._blocks_args[i] = self._blocks_args[i]._replace(input_filters=self._blocks_args[i].output_filters, stride=1) 194 | 195 | #---------------------------------------------------------------# 196 | # 在利用第一个EfficientBlock进行通道数的调整或者高和宽的压缩后 197 | # 进行EfficientBlock的堆叠 198 | #---------------------------------------------------------------# 199 | for _ in range(self._blocks_args[i].num_repeat - 1): 200 | self._blocks.append(MBConvBlock(self._blocks_args[i], self._global_params)) 201 | 202 | #----------------------------------------------------------------# 203 | # 这是efficientnet的尾部部分,在进行effcientdet构建的时候没用到 204 | # 只在利用efficientnet进行分类的时候用到。 205 | #----------------------------------------------------------------# 206 | in_channels = self._blocks_args[len(self._blocks_args)-1].output_filters 207 | out_channels = round_filters(1280, self._global_params) 208 | 209 | self._conv_head = Conv2d(in_channels, out_channels, kernel_size=1, bias=False) 210 | self._bn1 = nn.BatchNorm2d(num_features=out_channels, momentum=bn_mom, eps=bn_eps) 211 | 212 | self._avg_pooling = nn.AdaptiveAvgPool2d(1) 213 | self._dropout = nn.Dropout(self._global_params.dropout_rate) 214 | self._fc = nn.Linear(out_channels, self._global_params.num_classes) 215 | 216 | self._swish = MemoryEfficientSwish() 217 | 218 | def set_swish(self, memory_efficient=True): 219 | """Sets swish function as memory efficient (for training) or standard (for export)""" 220 | # swish函数 221 | self._swish = MemoryEfficientSwish() if memory_efficient else Swish() 222 | for block in self._blocks: 223 | block.set_swish(memory_efficient) 224 | 225 | def extract_features(self, inputs): 226 | """ Returns output of the final convolution layer """ 227 | 228 | # Stem 229 | x = self._swish(self._bn0(self._conv_stem(inputs))) 230 | 231 | # Blocks 232 | for idx, block in enumerate(self._blocks): 233 | drop_connect_rate = self._global_params.drop_connect_rate 234 | if drop_connect_rate: 235 | drop_connect_rate *= float(idx) / len(self._blocks) 236 | x = block(x, drop_connect_rate=drop_connect_rate) 237 | # Head 238 | x = self._swish(self._bn1(self._conv_head(x))) 239 | 240 | return x 241 | 242 | def forward(self, inputs): 243 | """ Calls extract_features to extract features, applies final linear layer, and returns logits. 
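A minimal usage sketch of this classification path (hedged; the names come
from this repo and the 224x224 input matches b0):

    blocks_args, global_params = get_model_params('efficientnet-b0', None)
    net = EfficientNet(blocks_args, global_params)
    logits = net(torch.randn(1, 3, 224, 224))   # -> shape (1, 1000)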
""" 244 | bs = inputs.size(0) 245 | # Convolution layers 246 | x = self.extract_features(inputs) 247 | 248 | # Pooling and final linear layer 249 | x = self._avg_pooling(x) 250 | x = x.view(bs, -1) 251 | x = self._dropout(x) 252 | x = self._fc(x) 253 | return x 254 | 255 | @classmethod 256 | def from_name(cls, model_name, override_params=None): 257 | cls._check_model_name_is_valid(model_name) 258 | blocks_args, global_params = get_model_params(model_name, override_params) 259 | return cls(blocks_args, global_params) 260 | 261 | @classmethod 262 | def from_pretrained(cls, model_name, load_weights=True, advprop=True, num_classes=1000, in_channels=3): 263 | model = cls.from_name(model_name, override_params={'num_classes': num_classes}) 264 | if load_weights: 265 | load_pretrained_weights(model, model_name, load_fc=(num_classes == 1000), advprop=advprop) 266 | if in_channels != 3: 267 | Conv2d = get_same_padding_conv2d(image_size = model._global_params.image_size) 268 | out_channels = round_filters(32, model._global_params) 269 | model._conv_stem = Conv2d(in_channels, out_channels, kernel_size=3, stride=2, bias=False) 270 | return model 271 | 272 | @classmethod 273 | def get_image_size(cls, model_name): 274 | cls._check_model_name_is_valid(model_name) 275 | _, _, res, _ = efficientnet_params(model_name) 276 | return res 277 | 278 | @classmethod 279 | def _check_model_name_is_valid(cls, model_name): 280 | """ Validates model name. """ 281 | valid_models = ['efficientnet-b'+str(i) for i in range(9)] 282 | if model_name not in valid_models: 283 | raise ValueError('model_name should be one of: ' + ', '.join(valid_models)) 284 | -------------------------------------------------------------------------------- /nets/layers.py: -------------------------------------------------------------------------------- 1 | import collections 2 | import math 3 | import re 4 | from functools import partial 5 | 6 | import torch 7 | from torch import nn 8 | from torch.nn import functional as F 9 | from torch.utils import model_zoo 10 | 11 | #--------------------------------------------------------------# 12 | # 模型构建的辅助函数 13 | #--------------------------------------------------------------# 14 | GlobalParams = collections.namedtuple('GlobalParams', [ 15 | 'batch_norm_momentum', 'batch_norm_epsilon', 'dropout_rate', 16 | 'num_classes', 'width_coefficient', 'depth_coefficient', 17 | 'depth_divisor', 'min_depth', 'drop_connect_rate', 'image_size']) 18 | 19 | BlockArgs = collections.namedtuple('BlockArgs', [ 20 | 'kernel_size', 'num_repeat', 'input_filters', 'output_filters', 21 | 'expand_ratio', 'id_skip', 'stride', 'se_ratio']) 22 | 23 | GlobalParams.__new__.__defaults__ = (None,) * len(GlobalParams._fields) 24 | BlockArgs.__new__.__defaults__ = (None,) * len(BlockArgs._fields) 25 | 26 | def round_filters(filters, global_params): 27 | """ Calculate and round number of filters based on depth multiplier. """ 28 | multiplier = global_params.width_coefficient 29 | if not multiplier: 30 | return filters 31 | divisor = global_params.depth_divisor 32 | min_depth = global_params.min_depth 33 | filters *= multiplier 34 | min_depth = min_depth or divisor 35 | new_filters = max(min_depth, int(filters + divisor / 2) // divisor * divisor) 36 | if new_filters < 0.9 * filters: # prevent rounding by more than 10% 37 | new_filters += divisor 38 | return int(new_filters) 39 | 40 | 41 | def round_repeats(repeats, global_params): 42 | """ Round number of filters based on depth multiplier. 
""" 43 | multiplier = global_params.depth_coefficient 44 | if not multiplier: 45 | return repeats 46 | return int(math.ceil(multiplier * repeats)) 47 | 48 | 49 | def drop_connect(inputs, p, training): 50 | """ Drop connect. """ 51 | if not training: return inputs 52 | batch_size = inputs.shape[0] 53 | keep_prob = 1 - p 54 | random_tensor = keep_prob 55 | random_tensor += torch.rand([batch_size, 1, 1, 1], dtype=inputs.dtype, device=inputs.device) 56 | binary_tensor = torch.floor(random_tensor) 57 | output = inputs / keep_prob * binary_tensor 58 | return output 59 | 60 | 61 | def get_same_padding_conv2d(image_size=None): 62 | """ Chooses static padding if you have specified an image size, and dynamic padding otherwise. 63 | Static padding is necessary for ONNX exporting of models. """ 64 | if image_size is None: 65 | return Conv2dDynamicSamePadding 66 | else: 67 | return partial(Conv2dStaticSamePadding, image_size=image_size) 68 | 69 | 70 | class Conv2dDynamicSamePadding(nn.Conv2d): 71 | """ 2D Convolutions like TensorFlow, for a dynamic image size """ 72 | 73 | def __init__(self, in_channels, out_channels, kernel_size, stride=1, dilation=1, groups=1, bias=True): 74 | super().__init__(in_channels, out_channels, kernel_size, stride, 0, dilation, groups, bias) 75 | self.stride = self.stride if len(self.stride) == 2 else [self.stride[0]] * 2 76 | 77 | def forward(self, x): 78 | ih, iw = x.size()[-2:] 79 | kh, kw = self.weight.size()[-2:] 80 | sh, sw = self.stride 81 | oh, ow = math.ceil(ih / sh), math.ceil(iw / sw) 82 | pad_h = max((oh - 1) * self.stride[0] + (kh - 1) * self.dilation[0] + 1 - ih, 0) 83 | pad_w = max((ow - 1) * self.stride[1] + (kw - 1) * self.dilation[1] + 1 - iw, 0) 84 | if pad_h > 0 or pad_w > 0: 85 | x = F.pad(x, [pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2]) 86 | return F.conv2d(x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups) 87 | 88 | 89 | class Identity(nn.Module): 90 | def __init__(self, ): 91 | super(Identity, self).__init__() 92 | 93 | def forward(self, input): 94 | return input 95 | 96 | #--------------------------------------------------------------# 97 | # 加载模型参数的辅助函数 98 | #--------------------------------------------------------------# 99 | def efficientnet_params(model_name): 100 | """ Map EfficientNet model name to parameter coefficients. """ 101 | params_dict = { 102 | # Coefficients: width,depth,res,dropout 103 | 'efficientnet-b0': (1.0, 1.0, 224, 0.2), 104 | 'efficientnet-b1': (1.0, 1.1, 240, 0.2), 105 | 'efficientnet-b2': (1.1, 1.2, 260, 0.3), 106 | 'efficientnet-b3': (1.2, 1.4, 300, 0.3), 107 | 'efficientnet-b4': (1.4, 1.8, 380, 0.4), 108 | 'efficientnet-b5': (1.6, 2.2, 456, 0.4), 109 | 'efficientnet-b6': (1.8, 2.6, 528, 0.5), 110 | 'efficientnet-b7': (2.0, 3.1, 600, 0.5), 111 | 'efficientnet-b8': (2.2, 3.6, 672, 0.5), 112 | 'efficientnet-l2': (4.3, 5.3, 800, 0.5), 113 | } 114 | return params_dict[model_name] 115 | 116 | 117 | class BlockDecoder(object): 118 | """ Block Decoder for readability, straight from the official TensorFlow repository """ 119 | 120 | @staticmethod 121 | def _decode_block_string(block_string): 122 | """ Gets a block through a string notation of arguments. 
""" 123 | assert isinstance(block_string, str) 124 | 125 | ops = block_string.split('_') 126 | options = {} 127 | for op in ops: 128 | splits = re.split(r'(\d.*)', op) 129 | if len(splits) >= 2: 130 | key, value = splits[:2] 131 | options[key] = value 132 | 133 | # Check stride 134 | assert (('s' in options and len(options['s']) == 1) or 135 | (len(options['s']) == 2 and options['s'][0] == options['s'][1])) 136 | 137 | return BlockArgs( 138 | kernel_size=int(options['k']), 139 | num_repeat=int(options['r']), 140 | input_filters=int(options['i']), 141 | output_filters=int(options['o']), 142 | expand_ratio=int(options['e']), 143 | id_skip=('noskip' not in block_string), 144 | se_ratio=float(options['se']) if 'se' in options else None, 145 | stride=[int(options['s'][0])]) 146 | 147 | @staticmethod 148 | def _encode_block_string(block): 149 | """Encodes a block to a string.""" 150 | args = [ 151 | 'r%d' % block.num_repeat, 152 | 'k%d' % block.kernel_size, 153 | 's%d%d' % (block.strides[0], block.strides[1]), 154 | 'e%s' % block.expand_ratio, 155 | 'i%d' % block.input_filters, 156 | 'o%d' % block.output_filters 157 | ] 158 | if 0 < block.se_ratio <= 1: 159 | args.append('se%s' % block.se_ratio) 160 | if block.id_skip is False: 161 | args.append('noskip') 162 | return '_'.join(args) 163 | 164 | @staticmethod 165 | def decode(string_list): 166 | """ 167 | Decodes a list of string notations to specify blocks inside the network. 168 | 169 | :param string_list: a list of strings, each string is a notation of block 170 | :return: a list of BlockArgs namedtuples of block args 171 | """ 172 | assert isinstance(string_list, list) 173 | blocks_args = [] 174 | for block_string in string_list: 175 | blocks_args.append(BlockDecoder._decode_block_string(block_string)) 176 | return blocks_args 177 | 178 | @staticmethod 179 | def encode(blocks_args): 180 | """ 181 | Encodes a list of BlockArgs to a list of strings. 182 | 183 | :param blocks_args: a list of BlockArgs namedtuples of block args 184 | :return: a list of strings, each string is a notation of block 185 | """ 186 | block_strings = [] 187 | for block in blocks_args: 188 | block_strings.append(BlockDecoder._encode_block_string(block)) 189 | return block_strings 190 | 191 | def efficientnet(width_coefficient=None, depth_coefficient=None, dropout_rate=0.2, 192 | drop_connect_rate=0.2, image_size=None, num_classes=1000): 193 | """ Creates a efficientnet model. 
""" 194 | 195 | blocks_args = [ 196 | 'r1_k3_s11_e1_i32_o16_se0.25', 'r2_k3_s22_e6_i16_o24_se0.25', 197 | 'r2_k5_s22_e6_i24_o40_se0.25', 'r3_k3_s22_e6_i40_o80_se0.25', 198 | 'r3_k5_s11_e6_i80_o112_se0.25', 'r4_k5_s22_e6_i112_o192_se0.25', 199 | 'r1_k3_s11_e6_i192_o320_se0.25', 200 | ] 201 | blocks_args = BlockDecoder.decode(blocks_args) 202 | 203 | global_params = GlobalParams( 204 | batch_norm_momentum=0.99, 205 | batch_norm_epsilon=1e-3, 206 | dropout_rate=dropout_rate, 207 | drop_connect_rate=drop_connect_rate, 208 | # data_format='channels_last', # removed, this is always true in PyTorch 209 | num_classes=num_classes, 210 | width_coefficient=width_coefficient, 211 | depth_coefficient=depth_coefficient, 212 | depth_divisor=8, 213 | min_depth=None, 214 | image_size=image_size, 215 | ) 216 | 217 | return blocks_args, global_params 218 | 219 | def get_model_params(model_name, override_params): 220 | """ Get the block args and global params for a given model """ 221 | if model_name.startswith('efficientnet'): 222 | w, d, s, p = efficientnet_params(model_name) 223 | # note: all models have drop connect rate = 0.2 224 | blocks_args, global_params = efficientnet( 225 | width_coefficient=w, depth_coefficient=d, dropout_rate=p, image_size=s) 226 | else: 227 | raise NotImplementedError('model name is not pre-defined: %s' % model_name) 228 | if override_params: 229 | # ValueError will be raised here if override_params has fields not included in global_params. 230 | global_params = global_params._replace(**override_params) 231 | return blocks_args, global_params 232 | 233 | url_map = { 234 | 'efficientnet-b0': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b0.pth', 235 | 'efficientnet-b1': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b1.pth', 236 | 'efficientnet-b2': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b2.pth', 237 | 'efficientnet-b3': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b3.pth', 238 | 'efficientnet-b4': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b4.pth', 239 | 'efficientnet-b5': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b5.pth', 240 | 'efficientnet-b6': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b6.pth', 241 | 'efficientnet-b7': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b7.pth', 242 | } 243 | 244 | def load_pretrained_weights(model, model_name, load_fc=True, advprop=False): 245 | """ Loads pretrained weights, and downloads if loading for the first time. 
""" 246 | # AutoAugment or Advprop (different preprocessing) 247 | url_map_ = url_map 248 | state_dict = model_zoo.load_url(url_map_[model_name], map_location=torch.device('cpu'), model_dir="./model_data") 249 | # state_dict = torch.load('../../weights/backbone_efficientnetb0.pth') 250 | if load_fc: 251 | ret = model.load_state_dict(state_dict, strict=False) 252 | print(ret) 253 | else: 254 | state_dict.pop('_fc.weight') 255 | state_dict.pop('_fc.bias') 256 | res = model.load_state_dict(state_dict, strict=False) 257 | assert set(res.missing_keys) == set(['_fc.weight', '_fc.bias']), 'issue loading pretrained weights' 258 | print('Loaded pretrained weights for {}'.format(model_name)) 259 | 260 | class SwishImplementation(torch.autograd.Function): 261 | @staticmethod 262 | def forward(ctx, i): 263 | result = i * torch.sigmoid(i) 264 | ctx.save_for_backward(i) 265 | return result 266 | 267 | @staticmethod 268 | def backward(ctx, grad_output): 269 | i = ctx.saved_variables[0] 270 | sigmoid_i = torch.sigmoid(i) 271 | return grad_output * (sigmoid_i * (1 + i * (1 - sigmoid_i))) 272 | 273 | class MemoryEfficientSwish(nn.Module): 274 | def forward(self, x): 275 | return SwishImplementation.apply(x) 276 | 277 | class Swish(nn.Module): 278 | def forward(self, x): 279 | return x * torch.sigmoid(x) 280 | 281 | class Conv2dStaticSamePadding(nn.Module): 282 | def __init__(self, in_channels, out_channels, kernel_size, stride=1, bias=True, groups=1, dilation=1, **kwargs): 283 | super().__init__() 284 | self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride=stride, 285 | bias=bias, groups=groups) 286 | self.stride = self.conv.stride 287 | self.kernel_size = self.conv.kernel_size 288 | self.dilation = self.conv.dilation 289 | 290 | if isinstance(self.stride, int): 291 | self.stride = [self.stride] * 2 292 | elif len(self.stride) == 1: 293 | self.stride = [self.stride[0]] * 2 294 | 295 | if isinstance(self.kernel_size, int): 296 | self.kernel_size = [self.kernel_size] * 2 297 | elif len(self.kernel_size) == 1: 298 | self.kernel_size = [self.kernel_size[0]] * 2 299 | 300 | def forward(self, x): 301 | h, w = x.shape[-2:] 302 | 303 | extra_h = (math.ceil(w / self.stride[1]) - 1) * self.stride[1] - w + self.kernel_size[1] 304 | extra_v = (math.ceil(h / self.stride[0]) - 1) * self.stride[0] - h + self.kernel_size[0] 305 | 306 | left = extra_h // 2 307 | right = extra_h - left 308 | top = extra_v // 2 309 | bottom = extra_v - top 310 | 311 | x = F.pad(x, [left, right, top, bottom]) 312 | 313 | x = self.conv(x) 314 | return x 315 | 316 | class MaxPool2dStaticSamePadding(nn.Module): 317 | def __init__(self, *args, **kwargs): 318 | super().__init__() 319 | self.pool = nn.MaxPool2d(*args, **kwargs) 320 | self.stride = self.pool.stride 321 | self.kernel_size = self.pool.kernel_size 322 | 323 | if isinstance(self.stride, int): 324 | self.stride = [self.stride] * 2 325 | elif len(self.stride) == 1: 326 | self.stride = [self.stride[0]] * 2 327 | 328 | if isinstance(self.kernel_size, int): 329 | self.kernel_size = [self.kernel_size] * 2 330 | elif len(self.kernel_size) == 1: 331 | self.kernel_size = [self.kernel_size[0]] * 2 332 | 333 | def forward(self, x): 334 | h, w = x.shape[-2:] 335 | 336 | extra_h = (math.ceil(w / self.stride[1]) - 1) * self.stride[1] - w + self.kernel_size[1] 337 | extra_v = (math.ceil(h / self.stride[0]) - 1) * self.stride[0] - h + self.kernel_size[0] 338 | 339 | left = extra_h // 2 340 | right = extra_h - left 341 | top = extra_v // 2 342 | bottom = extra_v - top 343 | 344 | x = 
F.pad(x, [left, right, top, bottom]) 345 | 346 | x = self.pool(x) 347 | return x 348 | -------------------------------------------------------------------------------- /predict.py: -------------------------------------------------------------------------------- 1 | #-----------------------------------------------------------------------# 2 | # predict.py将单张图片预测、摄像头检测、FPS测试和目录遍历检测等功能 3 | # 整合到了一个py文件中,通过指定mode进行模式的修改。 4 | #-----------------------------------------------------------------------# 5 | import time 6 | 7 | import cv2 8 | import numpy as np 9 | from PIL import Image 10 | 11 | from efficientdet import Efficientdet 12 | 13 | if __name__ == "__main__": 14 | efficientdet = Efficientdet() 15 | #----------------------------------------------------------------------------------------------------------# 16 | # mode用于指定测试的模式: 17 | # 'predict' 表示单张图片预测,如果想对预测过程进行修改,如保存图片,截取对象等,可以先看下方详细的注释 18 | # 'video' 表示视频检测,可调用摄像头或者视频进行检测,详情查看下方注释。 19 | # 'fps' 表示测试fps,使用的图片是img里面的street.jpg,详情查看下方注释。 20 | # 'dir_predict' 表示遍历文件夹进行检测并保存。默认遍历img文件夹,保存img_out文件夹,详情查看下方注释。 21 | #----------------------------------------------------------------------------------------------------------# 22 | mode = "predict" 23 | #-------------------------------------------------------------------------# 24 | # crop 指定了是否在单张图片预测后对目标进行截取 25 | # count 指定了是否进行目标的计数 26 | # crop、count仅在mode='predict'时有效 27 | #-------------------------------------------------------------------------# 28 | crop = False 29 | count = False 30 | #----------------------------------------------------------------------------------------------------------# 31 | # video_path 用于指定视频的路径,当video_path=0时表示检测摄像头 32 | # 想要检测视频,则设置如video_path = "xxx.mp4"即可,代表读取出根目录下的xxx.mp4文件。 33 | # video_save_path 表示视频保存的路径,当video_save_path=""时表示不保存 34 | # 想要保存视频,则设置如video_save_path = "yyy.mp4"即可,代表保存为根目录下的yyy.mp4文件。 35 | # video_fps 用于保存的视频的fps 36 | # 37 | # video_path、video_save_path和video_fps仅在mode='video'时有效 38 | # 保存视频时需要ctrl+c退出或者运行到最后一帧才会完成完整的保存步骤。 39 | #----------------------------------------------------------------------------------------------------------# 40 | video_path = 0 41 | video_save_path = "" 42 | video_fps = 25.0 43 | #----------------------------------------------------------------------------------------------------------# 44 | # test_interval 用于指定测量fps的时候,图片检测的次数。理论上test_interval越大,fps越准确。 45 | # fps_image_path 用于指定测试的fps图片 46 | # 47 | # test_interval和fps_image_path仅在mode='fps'有效 48 | #----------------------------------------------------------------------------------------------------------# 49 | test_interval = 100 50 | fps_image_path = "img/street.jpg" 51 | #-------------------------------------------------------------------------# 52 | # dir_origin_path 指定了用于检测的图片的文件夹路径 53 | # dir_save_path 指定了检测完图片的保存路径 54 | # 55 | # dir_origin_path和dir_save_path仅在mode='dir_predict'时有效 56 | #-------------------------------------------------------------------------# 57 | dir_origin_path = "img/" 58 | dir_save_path = "img_out/" 59 | 60 | if mode == "predict": 61 | ''' 62 | 1、如果想要进行检测完的图片的保存,利用r_image.save("img.jpg")即可保存,直接在predict.py里进行修改即可。 63 | 2、如果想要获得预测框的坐标,可以进入efficientdet.detect_image函数,在绘图部分读取top,left,bottom,right这四个值。 64 | 3、如果想要利用预测框截取下目标,可以进入efficientdet.detect_image函数,在绘图部分利用获取到的top,left,bottom,right这四个值 65 | 在原图上利用矩阵的方式进行截取。 66 | 4、如果想要在预测图上写额外的字,比如检测到的特定目标的数量,可以进入efficientdet.detect_image函数,在绘图部分对predicted_class进行判断, 67 | 比如判断if predicted_class == 'car': 即可判断当前目标是否为车,然后记录数量即可。利用draw.text即可写字。 68 | ''' 69 | while True: 70 | img = input('Input image 
filename:') 71 | try: 72 | image = Image.open(img) 73 | except: 74 | print('Open Error! Try again!') 75 | continue 76 | else: 77 | r_image = efficientdet.detect_image(image, crop = crop, count=count) 78 | r_image.show() 79 | 80 | elif mode == "video": 81 | capture = cv2.VideoCapture(video_path) 82 | if video_save_path!="": 83 | fourcc = cv2.VideoWriter_fourcc(*'XVID') 84 | size = (int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)), int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))) 85 | out = cv2.VideoWriter(video_save_path, fourcc, video_fps, size) 86 | 87 | ref, frame = capture.read() 88 | if not ref: 89 | raise ValueError("未能正确读取摄像头(视频),请注意是否正确安装摄像头(是否正确填写视频路径)。") 90 | 91 | fps = 0.0 92 | while(True): 93 | t1 = time.time() 94 | # 读取某一帧 95 | ref, frame = capture.read() 96 | if not ref: 97 | break 98 | # 格式转变,BGRtoRGB 99 | frame = cv2.cvtColor(frame,cv2.COLOR_BGR2RGB) 100 | # 转变成Image 101 | frame = Image.fromarray(np.uint8(frame)) 102 | # 进行检测 103 | frame = np.array(efficientdet.detect_image(frame)) 104 | # RGBtoBGR满足opencv显示格式 105 | frame = cv2.cvtColor(frame,cv2.COLOR_RGB2BGR) 106 | 107 | fps = ( fps + (1./(time.time()-t1)) ) / 2 108 | print("fps= %.2f"%(fps)) 109 | frame = cv2.putText(frame, "fps= %.2f"%(fps), (0, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2) 110 | 111 | cv2.imshow("video",frame) 112 | c= cv2.waitKey(1) & 0xff 113 | if video_save_path!="": 114 | out.write(frame) 115 | 116 | if c==27: 117 | capture.release() 118 | break 119 | 120 | print("Video Detection Done!") 121 | capture.release() 122 | if video_save_path!="": 123 | print("Save processed video to the path :" + video_save_path) 124 | out.release() 125 | cv2.destroyAllWindows() 126 | 127 | elif mode == "fps": 128 | img = Image.open(fps_image_path) 129 | tact_time = efficientdet.get_FPS(img, test_interval) 130 | print(str(tact_time) + ' seconds, ' + str(1/tact_time) + 'FPS, @batch_size 1') 131 | 132 | elif mode == "dir_predict": 133 | import os 134 | 135 | from tqdm import tqdm 136 | 137 | img_names = os.listdir(dir_origin_path) 138 | for img_name in tqdm(img_names): 139 | if img_name.lower().endswith(('.bmp', '.dib', '.png', '.jpg', '.jpeg', '.pbm', '.pgm', '.ppm', '.tif', '.tiff')): 140 | image_path = os.path.join(dir_origin_path, img_name) 141 | image = Image.open(image_path) 142 | r_image = efficientdet.detect_image(image) 143 | if not os.path.exists(dir_save_path): 144 | os.makedirs(dir_save_path) 145 | r_image.save(os.path.join(dir_save_path, img_name.replace(".jpg", ".png")), quality=95, subsampling=0) 146 | 147 | else: 148 | raise AssertionError("Please specify the correct mode: 'predict', 'video', 'fps' or 'dir_predict'.") 149 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | torch 2 | torchvision 3 | tensorboard 4 | scipy==1.2.1 5 | numpy==1.17.0 6 | matplotlib==3.1.2 7 | opencv_python==4.1.2.30 8 | tqdm==4.60.0 9 | Pillow==8.2.0 10 | h5py==2.10.0 -------------------------------------------------------------------------------- /summary.py: -------------------------------------------------------------------------------- 1 | #--------------------------------------------# 2 | # 该部分代码用于看网络参数 3 | #--------------------------------------------# 4 | import torch 5 | from thop import clever_format, profile 6 | 7 | from nets.efficientdet import EfficientDetBackbone 8 | from utils.utils import image_sizes 9 | 10 | if __name__ == '__main__': 11 | phi = 0 12 | input_shape = [image_sizes[phi], 
image_sizes[phi]] 13 | num_classes = 80 14 | 15 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 16 | model = EfficientDetBackbone(num_classes, phi).to(device) 17 | print(model) 18 | print('# generator parameters:', sum(param.numel() for param in model.parameters())) 19 | 20 | dummy_input = torch.randn(1, 3, input_shape[0], input_shape[1]).to(device) 21 | flops, params = profile(model.to(device), (dummy_input, ), verbose=False) 22 | #--------------------------------------------------------# 23 | # flops * 2是因为profile没有将卷积作为两个operations 24 | # 有些论文将卷积算乘法、加法两个operations。此时乘2 25 | # 有些论文只考虑乘法的运算次数,忽略加法。此时不乘2 26 | # 本代码选择乘2,参考YOLOX。 27 | #--------------------------------------------------------# 28 | flops = flops * 2 29 | flops, params = clever_format([flops, params], "%.3f") 30 | print("Total GFLOPs: %s" %(flops)) 31 | print("Total Parameters: %s" %(params)) 32 | -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | #-------------------------------------# 2 | # 对数据集进行训练 3 | #-------------------------------------# 4 | import datetime 5 | import os 6 | import warnings 7 | from functools import partial 8 | 9 | import numpy as np 10 | import torch 11 | import torch.backends.cudnn as cudnn 12 | import torch.distributed as dist 13 | import torch.optim as optim 14 | from torch.utils.data import DataLoader 15 | 16 | from nets.efficientdet import EfficientDetBackbone 17 | from nets.efficientdet_training import (FocalLoss, get_lr_scheduler, 18 | set_optimizer_lr) 19 | from utils.callbacks import EvalCallback, LossHistory 20 | from utils.dataloader import EfficientdetDataset, efficientdet_dataset_collate 21 | from utils.utils import (download_weights, get_classes, image_sizes, 22 | seed_everything, show_config, worker_init_fn) 23 | from utils.utils_fit import fit_one_epoch 24 | 25 | warnings.filterwarnings("ignore") 26 | 27 | ''' 28 | 训练自己的目标检测模型一定需要注意以下几点: 29 | 1、训练前仔细检查自己的格式是否满足要求,该库要求数据集格式为VOC格式,需要准备好的内容有输入图片和标签 30 | 输入图片为.jpg图片,无需固定大小,传入训练前会自动进行resize。 31 | 灰度图会自动转成RGB图片进行训练,无需自己修改。 32 | 输入图片如果后缀非jpg,需要自己批量转成jpg后再开始训练。 33 | 34 | 标签为.xml格式,文件中会有需要检测的目标信息,标签文件和输入图片文件相对应。 35 | 36 | 2、损失值的大小用于判断是否收敛,比较重要的是有收敛的趋势,即验证集损失不断下降,如果验证集损失基本上不改变的话,模型基本上就收敛了。 37 | 损失值的具体大小并没有什么意义,大和小只在于损失的计算方式,并不是接近于0才好。如果想要让损失好看点,可以直接到对应的损失函数里面除上10000。 38 | 训练过程中的损失值会保存在logs文件夹下的loss_%Y_%m_%d_%H_%M_%S文件夹中 39 | 40 | 3、训练好的权值文件保存在logs文件夹中,每个训练世代(Epoch)包含若干训练步长(Step),每个训练步长(Step)进行一次梯度下降。 41 | 如果只是训练了几个Step是不会保存的,Epoch和Step的概念要捋清楚一下。 42 | ''' 43 | if __name__ == "__main__": 44 | #-------------------------------# 45 | # 是否使用Cuda 46 | # 没有GPU可以设置成False 47 | #-------------------------------# 48 | Cuda = True 49 | #----------------------------------------------# 50 | # Seed 用于固定随机种子 51 | # 使得每次独立训练都可以获得一样的结果 52 | #----------------------------------------------# 53 | seed = 11 54 | #---------------------------------------------------------------------# 55 | # distributed 用于指定是否使用单机多卡分布式运行 56 | # 终端指令仅支持Ubuntu。CUDA_VISIBLE_DEVICES用于在Ubuntu下指定显卡。 57 | # Windows系统下默认使用DP模式调用所有显卡,不支持DDP。 58 | # DP模式: 59 | # 设置 distributed = False 60 | # 在终端中输入 CUDA_VISIBLE_DEVICES=0,1 python train.py 61 | # DDP模式: 62 | # 设置 distributed = True 63 | # 在终端中输入 CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 train.py 64 | #---------------------------------------------------------------------# 65 | distributed = False 66 | 
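#---------------------------------------------------------------------#
#   (added sketch) e.g. a two-card DDP run with the flag above flipped on:
#       distributed = True
#       CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 train.py
#   the launcher sets LOCAL_RANK / RANK, which the code further down reads
#   before calling dist.init_process_group(backend="nccl").
#---------------------------------------------------------------------#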
#---------------------------------------------------------------------# 67 | # sync_bn 是否使用sync_bn,DDP模式多卡可用 68 | #---------------------------------------------------------------------# 69 | sync_bn = False 70 | #---------------------------------------------------------------------# 71 | # fp16 是否使用混合精度训练 72 | # 可减少约一半的显存、需要pytorch1.7.1以上 73 | #---------------------------------------------------------------------# 74 | fp16 = False 75 | #---------------------------------------------------------------------# 76 | # classes_path 指向model_data下的txt,与自己训练的数据集相关 77 | # 训练前一定要修改classes_path,使其对应自己的数据集 78 | #---------------------------------------------------------------------# 79 | classes_path = 'model_data/voc_classes.txt' 80 | #---------------------------------------------------------------------# 81 | # 用于选择所使用的模型的版本,0-7 82 | #---------------------------------------------------------------------# 83 | phi = 0 84 | #----------------------------------------------------------------------------------------------------------------------------# 85 | # pretrained 是否使用主干网络的预训练权重,此处使用的是主干的权重,因此是在模型构建的时候进行加载的。 86 | # 如果设置了model_path,则主干的权值无需加载,pretrained的值无意义。 87 | # 如果不设置model_path,pretrained = True,此时仅加载主干开始训练。 88 | # 如果不设置model_path,pretrained = False,Freeze_Train = Fasle,此时从0开始训练,且没有冻结主干的过程。 89 | #----------------------------------------------------------------------------------------------------------------------------# 90 | pretrained = False 91 | #----------------------------------------------------------------------------------------------------------------------------# 92 | # 权值文件的下载请看README,可以通过网盘下载。模型的 预训练权重 对不同数据集是通用的,因为特征是通用的。 93 | # 模型的 预训练权重 比较重要的部分是 主干特征提取网络的权值部分,用于进行特征提取。 94 | # 预训练权重对于99%的情况都必须要用,不用的话主干部分的权值太过随机,特征提取效果不明显,网络训练的结果也不会好 95 | # 96 | # 如果训练过程中存在中断训练的操作,可以将model_path设置成logs文件夹下的权值文件,将已经训练了一部分的权值再次载入。 97 | # 同时修改下方的 冻结阶段 或者 解冻阶段 的参数,来保证模型epoch的连续性。 98 | # 99 | # 当model_path = ''的时候不加载整个模型的权值。 100 | # 101 | # 此处使用的是整个模型的权重,因此是在train.py进行加载的,pretrain不影响此处的权值加载。 102 | # 如果想要让模型从主干的预训练权值开始训练,则设置model_path = '',pretrain = True,此时仅加载主干。 103 | # 如果想要让模型从0开始训练,则设置model_path = '',pretrain = Fasle,Freeze_Train = Fasle,此时从0开始训练,且没有冻结主干的过程。 104 | # 105 | # 一般来讲,网络从0开始的训练效果会很差,因为权值太过随机,特征提取效果不明显,因此非常、非常、非常不建议大家从0开始训练! 
106 | # 如果一定要从0开始,可以了解imagenet数据集,首先训练分类模型,获得网络的主干部分权值,分类模型的 主干部分 和该模型通用,基于此进行训练。 107 | #----------------------------------------------------------------------------------------------------------------------------# 108 | model_path = 'model_data/efficientdet-d0.pth' 109 | #------------------------------------------------------# 110 | # input_shape 输入的shape大小 111 | #------------------------------------------------------# 112 | input_shape = [image_sizes[phi], image_sizes[phi]] 113 | 114 | #----------------------------------------------------------------------------------------------------------------------------# 115 | # 训练分为两个阶段,分别是冻结阶段和解冻阶段。设置冻结阶段是为了满足机器性能不足的同学的训练需求。 116 | # 冻结训练需要的显存较小,显卡非常差的情况下,可设置Freeze_Epoch等于UnFreeze_Epoch,此时仅仅进行冻结训练。 117 | # 118 | # 在此提供若干参数设置建议,各位训练者根据自己的需求进行灵活调整: 119 | # (一)从整个模型的预训练权重开始训练: 120 | # Adam: 121 | # Init_Epoch = 0,Freeze_Epoch = 50,UnFreeze_Epoch = 100,Freeze_Train = True,optimizer_type = 'adam',Init_lr = 3e-4,weight_decay = 0。(冻结) 122 | # Init_Epoch = 0,UnFreeze_Epoch = 100,Freeze_Train = False,optimizer_type = 'adam',Init_lr = 3e-4,weight_decay = 0。(不冻结) 123 | # SGD: 124 | # Init_Epoch = 0,Freeze_Epoch = 50,UnFreeze_Epoch = 200,Freeze_Train = True,optimizer_type = 'sgd',Init_lr = 1e-2,weight_decay = 4e-5。(冻结) 125 | # Init_Epoch = 0,UnFreeze_Epoch = 200,Freeze_Train = False,optimizer_type = 'sgd',Init_lr = 1e-2,weight_decay = 4e-5。(不冻结) 126 | # 其中:UnFreeze_Epoch可以在100-300之间调整。 127 | # (二)从主干网络的预训练权重开始训练: 128 | # Adam: 129 | # Init_Epoch = 0,Freeze_Epoch = 50,UnFreeze_Epoch = 100,Freeze_Train = True,optimizer_type = 'adam',Init_lr = 3e-4,weight_decay = 0。(冻结) 130 | # Init_Epoch = 0,UnFreeze_Epoch = 100,Freeze_Train = False,optimizer_type = 'adam',Init_lr = 3e-4,weight_decay = 0。(不冻结) 131 | # SGD: 132 | # Init_Epoch = 0,Freeze_Epoch = 50,UnFreeze_Epoch = 200,Freeze_Train = True,optimizer_type = 'sgd',Init_lr = 1e-2,weight_decay = 4e-5。(冻结) 133 | # Init_Epoch = 0,UnFreeze_Epoch = 200,Freeze_Train = False,optimizer_type = 'sgd',Init_lr = 1e-2,weight_decay = 4e-5。(不冻结) 134 | # 其中:由于从主干网络的预训练权重开始训练,主干的权值不一定适合目标检测,需要更多的训练跳出局部最优解。 135 | # UnFreeze_Epoch可以在200-300之间调整,YOLOV5和YOLOX均推荐使用300。 136 | # Adam相较于SGD收敛的快一些。因此UnFreeze_Epoch理论上可以小一点,但依然推荐更多的Epoch。 137 | # (三)batch_size的设置: 138 | # 在显卡能够接受的范围内,以大为好。显存不足与数据集大小无关,提示显存不足(OOM或者CUDA out of memory)请调小batch_size。 139 | # 受到BatchNorm层影响,batch_size最小为2,不能为1。 140 | # 正常情况下Freeze_batch_size建议为Unfreeze_batch_size的1-2倍。不建议设置的差距过大,因为关系到学习率的自动调整。 141 | #----------------------------------------------------------------------------------------------------------------------------# 142 | #------------------------------------------------------------------# 143 | # 冻结阶段训练参数 144 | # 此时模型的主干被冻结了,特征提取网络不发生改变 145 | # 占用的显存较小,仅对网络进行微调 146 | # Init_Epoch 模型当前开始的训练世代,其值可以大于Freeze_Epoch,如设置: 147 | # Init_Epoch = 60、Freeze_Epoch = 50、UnFreeze_Epoch = 100 148 | # 会跳过冻结阶段,直接从60代开始,并调整对应的学习率。 149 | # (断点续练时使用) 150 | # Freeze_Epoch 模型冻结训练的Freeze_Epoch 151 | # (当Freeze_Train=False时失效) 152 | # Freeze_batch_size 模型冻结训练的batch_size 153 | # (当Freeze_Train=False时失效) 154 | #------------------------------------------------------------------# 155 | Init_Epoch = 0 156 | Freeze_Epoch = 50 157 | Freeze_batch_size = 8 158 | #------------------------------------------------------------------# 159 | # 解冻阶段训练参数 160 | # 此时模型的主干不被冻结了,特征提取网络会发生改变 161 | # 占用的显存较大,网络所有的参数都会发生改变 162 | # UnFreeze_Epoch 模型总共训练的epoch 163 | # SGD需要更长的时间收敛,因此设置较大的UnFreeze_Epoch 164 | # Adam可以使用相对较小的UnFreeze_Epoch 165 | # Unfreeze_batch_size 模型在解冻后的batch_size 166 | 
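#                       (added sketch) with the defaults below and Freeze_Train = True
#                       the schedule is: epochs 0-49 backbone frozen, batch_size = 8;
#                       epochs 50-99 backbone unfrozen, batch_size = 4.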
#------------------------------------------------------------------# 167 | UnFreeze_Epoch = 100 168 | Unfreeze_batch_size = 4 169 | #------------------------------------------------------------------# 170 | # Freeze_Train 是否进行冻结训练 171 | # 默认先冻结主干训练后解冻训练。 172 | #------------------------------------------------------------------# 173 | Freeze_Train = True 174 | 175 | #------------------------------------------------------------------# 176 | # 其它训练参数:学习率、优化器、学习率下降有关 177 | #------------------------------------------------------------------# 178 | #------------------------------------------------------------------# 179 | # Init_lr 模型的最大学习率 180 | # 当使用Adam优化器时建议设置 Init_lr=3e-4 181 | # 当使用SGD优化器时建议设置 Init_lr=1e-2 182 | # Min_lr 模型的最小学习率,默认为最大学习率的0.01 183 | #------------------------------------------------------------------# 184 | Init_lr = 3e-4 185 | Min_lr = Init_lr * 0.01 186 | #------------------------------------------------------------------# 187 | # optimizer_type 使用到的优化器种类,可选的有adam、sgd 188 | # 当使用Adam优化器时建议设置 Init_lr=3e-4 189 | # 当使用SGD优化器时建议设置 Init_lr=1e-2 190 | # momentum 优化器内部使用到的momentum参数 191 | # weight_decay 权值衰减,可防止过拟合 192 | # adam会导致weight_decay错误,使用adam时建议设置为0。 193 | #------------------------------------------------------------------# 194 | optimizer_type = "adam" 195 | momentum = 0.9 196 | weight_decay = 0 197 | #------------------------------------------------------------------# 198 | # lr_decay_type 使用到的学习率下降方式,可选的有'step'、'cos' 199 | #------------------------------------------------------------------# 200 | lr_decay_type = 'cos' 201 | #------------------------------------------------------------------# 202 | # save_period 多少个epoch保存一次权值 203 | #------------------------------------------------------------------# 204 | save_period = 5 205 | #------------------------------------------------------------------# 206 | # save_dir 权值与日志文件保存的文件夹 207 | #------------------------------------------------------------------# 208 | save_dir = 'logs' 209 | #------------------------------------------------------------------# 210 | # eval_flag 是否在训练时进行评估,评估对象为验证集 211 | # 安装pycocotools库后,评估体验更佳。 212 | # eval_period 代表多少个epoch评估一次,不建议频繁的评估 213 | # 评估需要消耗较多的时间,频繁评估会导致训练非常慢 214 | # 此处获得的mAP会与get_map.py获得的会有所不同,原因有二: 215 | # (一)此处获得的mAP为验证集的mAP。 216 | # (二)此处设置评估参数较为保守,目的是加快评估速度。 217 | #------------------------------------------------------------------# 218 | eval_flag = True 219 | eval_period = 5 220 | #------------------------------------------------------------------# 221 | # num_workers 用于设置是否使用多线程读取数据,1代表关闭多线程 222 | # 开启后会加快数据读取速度,但是会占用更多内存 223 | # 在IO为瓶颈的时候再开启多线程,即GPU运算速度远大于读取图片的速度。 224 | #------------------------------------------------------------------# 225 | num_workers = 4 226 | 227 | #------------------------------------------------------# 228 | # train_annotation_path 训练图片路径和标签 229 | # val_annotation_path 验证图片路径和标签 230 | #------------------------------------------------------# 231 | train_annotation_path = '2007_train.txt' 232 | val_annotation_path = '2007_val.txt' 233 | 234 | seed_everything(seed) 235 | #------------------------------------------------------# 236 | # 设置用到的显卡 237 | #------------------------------------------------------# 238 | ngpus_per_node = torch.cuda.device_count() 239 | if distributed: 240 | dist.init_process_group(backend="nccl") 241 | local_rank = int(os.environ["LOCAL_RANK"]) 242 | rank = int(os.environ["RANK"]) 243 | device = torch.device("cuda", local_rank) 244 | if local_rank == 0: 245 | print(f"[{os.getpid()}] (rank = {rank}, local_rank = {local_rank}) 
training...") 246 | print("Gpu Device Count : ", ngpus_per_node) 247 | else: 248 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') 249 | local_rank = 0 250 | rank = 0 251 | 252 | #----------------------------------------------------# 253 | # 下载预训练权重 254 | #----------------------------------------------------# 255 | if pretrained: 256 | backbone = "efficientnet-b" + str(phi) 257 | if distributed: 258 | if local_rank == 0: 259 | download_weights(backbone) 260 | dist.barrier() 261 | else: 262 | download_weights(backbone) 263 | 264 | #----------------------------------------------------# 265 | # 获取classes和anchor 266 | #----------------------------------------------------# 267 | class_names, num_classes = get_classes(classes_path) 268 | 269 | #------------------------------------------------------# 270 | # 创建EfficientDet模型 271 | # 训练前一定要修改classes_path和对应的txt文件 272 | #------------------------------------------------------# 273 | model = EfficientDetBackbone(num_classes, phi, pretrained) 274 | if model_path != '': 275 | #------------------------------------------------------# 276 | # 权值文件请看README,百度网盘下载 277 | #------------------------------------------------------# 278 | if local_rank == 0: 279 | print('Load weights {}.'.format(model_path)) 280 | 281 | #------------------------------------------------------# 282 | # 根据预训练权重的Key和模型的Key进行加载 283 | #------------------------------------------------------# 284 | model_dict = model.state_dict() 285 | pretrained_dict = torch.load(model_path, map_location = device) 286 | load_key, no_load_key, temp_dict = [], [], {} 287 | for k, v in pretrained_dict.items(): 288 | if k in model_dict.keys() and np.shape(model_dict[k]) == np.shape(v): 289 | temp_dict[k] = v 290 | load_key.append(k) 291 | else: 292 | no_load_key.append(k) 293 | model_dict.update(temp_dict) 294 | model.load_state_dict(model_dict) 295 | #------------------------------------------------------# 296 | # 显示没有匹配上的Key 297 | #------------------------------------------------------# 298 | if local_rank == 0: 299 | print("\nSuccessful Load Key:", str(load_key)[:500], "……\nSuccessful Load Key Num:", len(load_key)) 300 | print("\nFail To Load Key:", str(no_load_key)[:500], "……\nFail To Load Key num:", len(no_load_key)) 301 | print("\n\033[1;33;44m温馨提示,head部分没有载入是正常现象,Backbone部分没有载入是错误的。\033[0m") 302 | 303 | #----------------------# 304 | # 获得损失函数 305 | #----------------------# 306 | focal_loss = FocalLoss() 307 | #----------------------# 308 | # 记录Loss 309 | #----------------------# 310 | if local_rank == 0: 311 | time_str = datetime.datetime.strftime(datetime.datetime.now(),'%Y_%m_%d_%H_%M_%S') 312 | log_dir = os.path.join(save_dir, "loss_" + str(time_str)) 313 | loss_history = LossHistory(log_dir, model, input_shape=input_shape) 314 | else: 315 | loss_history = None 316 | 317 | #------------------------------------------------------------------# 318 | # torch 1.2不支持amp,建议使用torch 1.7.1及以上正确使用fp16 319 | # 因此torch1.2这里显示"could not be resolve" 320 | #------------------------------------------------------------------# 321 | if fp16: 322 | from torch.cuda.amp import GradScaler as GradScaler 323 | scaler = GradScaler() 324 | else: 325 | scaler = None 326 | 327 | model_train = model.train() 328 | #----------------------------# 329 | # 多卡同步Bn 330 | #----------------------------# 331 | if sync_bn and ngpus_per_node > 1 and distributed: 332 | model_train = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model_train) 333 | elif sync_bn: 334 | print("Sync_bn is not support in one gpu or not 
distributed.") 335 | 336 | if Cuda: 337 | if distributed: 338 | #----------------------------# 339 | # 多卡平行运行 340 | #----------------------------# 341 | model_train = model_train.cuda(local_rank) 342 | model_train = torch.nn.parallel.DistributedDataParallel(model_train, device_ids=[local_rank], find_unused_parameters=True) 343 | else: 344 | model_train = torch.nn.DataParallel(model) 345 | cudnn.benchmark = True 346 | model_train = model_train.cuda() 347 | 348 | #---------------------------# 349 | # 读取数据集对应的txt 350 | #---------------------------# 351 | with open(train_annotation_path) as f: 352 | train_lines = f.readlines() 353 | with open(val_annotation_path) as f: 354 | val_lines = f.readlines() 355 | num_train = len(train_lines) 356 | num_val = len(val_lines) 357 | 358 | if local_rank == 0: 359 | show_config( 360 | classes_path = classes_path, model_path = model_path, input_shape = input_shape, \ 361 | Init_Epoch = Init_Epoch, Freeze_Epoch = Freeze_Epoch, UnFreeze_Epoch = UnFreeze_Epoch, Freeze_batch_size = Freeze_batch_size, Unfreeze_batch_size = Unfreeze_batch_size, Freeze_Train = Freeze_Train, \ 362 | Init_lr = Init_lr, Min_lr = Min_lr, optimizer_type = optimizer_type, momentum = momentum, lr_decay_type = lr_decay_type, \ 363 | save_period = save_period, save_dir = save_dir, num_workers = num_workers, num_train = num_train, num_val = num_val 364 | ) 365 | #---------------------------------------------------------# 366 | # 总训练世代指的是遍历全部数据的总次数 367 | # 总训练步长指的是梯度下降的总次数 368 | # 每个训练世代包含若干训练步长,每个训练步长进行一次梯度下降。 369 | # 此处仅建议最低训练世代,上不封顶,计算时只考虑了解冻部分 370 | #----------------------------------------------------------# 371 | wanted_step = 5e4 if optimizer_type == "sgd" else 1.5e4 372 | total_step = num_train // Unfreeze_batch_size * UnFreeze_Epoch 373 | if total_step <= wanted_step: 374 | if num_train // Unfreeze_batch_size == 0: 375 | raise ValueError('数据集过小,无法进行训练,请扩充数据集。') 376 | wanted_epoch = wanted_step // (num_train // Unfreeze_batch_size) + 1 377 | print("\n\033[1;33;44m[Warning] 使用%s优化器时,建议将训练总步长设置到%d以上。\033[0m"%(optimizer_type, wanted_step)) 378 | print("\033[1;33;44m[Warning] 本次运行的总训练数据量为%d,Unfreeze_batch_size为%d,共训练%d个Epoch,计算出总训练步长为%d。\033[0m"%(num_train, Unfreeze_batch_size, UnFreeze_Epoch, total_step)) 379 | print("\033[1;33;44m[Warning] 由于总训练步长为%d,小于建议总步长%d,建议设置总世代为%d。\033[0m"%(total_step, wanted_step, wanted_epoch)) 380 | 381 | #------------------------------------------------------# 382 | # 主干特征提取网络特征通用,冻结训练可以加快训练速度 383 | # 也可以在训练初期防止权值被破坏。 384 | # Init_Epoch为起始世代 385 | # Freeze_Epoch为冻结训练的世代 386 | # UnFreeze_Epoch总训练世代 387 | # 提示OOM或者显存不足请调小Batch_size 388 | #------------------------------------------------------# 389 | if True: 390 | UnFreeze_flag = False 391 | #------------------------------------# 392 | # 冻结一定部分训练 393 | #------------------------------------# 394 | if Freeze_Train: 395 | for param in model.backbone_net.parameters(): 396 | param.requires_grad = False 397 | 398 | #-------------------------------------------------------------------# 399 | # 如果不冻结训练的话,直接设置batch_size为Unfreeze_batch_size 400 | #-------------------------------------------------------------------# 401 | batch_size = Freeze_batch_size if Freeze_Train else Unfreeze_batch_size 402 | 403 | #-------------------------------------------------------------------# 404 | # 判断当前batch_size,自适应调整学习率 405 | #-------------------------------------------------------------------# 406 | nbs = 16 407 | lr_limit_max = 5e-4 if optimizer_type == 'adam' else 1e-1 408 | lr_limit_min = 3e-4 if optimizer_type == 'adam' else 5e-4 409 
| Init_lr_fit = min(max(batch_size / nbs * Init_lr, lr_limit_min), lr_limit_max) 410 | Min_lr_fit = min(max(batch_size / nbs * Min_lr, lr_limit_min * 1e-2), lr_limit_max * 1e-2) 411 | 412 | #---------------------------------------# 413 | # 根据optimizer_type选择优化器 414 | #---------------------------------------# 415 | optimizer = { 416 | 'adam' : optim.Adam(model.parameters(), Init_lr_fit, betas = (momentum, 0.999), weight_decay = weight_decay), 417 | 'sgd' : optim.SGD(model.parameters(), Init_lr_fit, momentum = momentum, nesterov=True, weight_decay = weight_decay) 418 | }[optimizer_type] 419 | 420 | #---------------------------------------# 421 | # 获得学习率下降的公式 422 | #---------------------------------------# 423 | lr_scheduler_func = get_lr_scheduler(lr_decay_type, Init_lr_fit, Min_lr_fit, UnFreeze_Epoch) 424 | 425 | #---------------------------------------# 426 | # 判断每一个世代的长度 427 | #---------------------------------------# 428 | epoch_step = num_train // batch_size 429 | epoch_step_val = num_val // batch_size 430 | 431 | if epoch_step == 0 or epoch_step_val == 0: 432 | raise ValueError("数据集过小,无法继续进行训练,请扩充数据集。") 433 | 434 | train_dataset = EfficientdetDataset(train_lines, input_shape, num_classes, train = True) 435 | val_dataset = EfficientdetDataset(val_lines, input_shape, num_classes, train = False) 436 | 437 | if distributed: 438 | train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset, shuffle=True,) 439 | val_sampler = torch.utils.data.distributed.DistributedSampler(val_dataset, shuffle=False,) 440 | batch_size = batch_size // ngpus_per_node 441 | shuffle = False 442 | else: 443 | train_sampler = None 444 | val_sampler = None 445 | shuffle = True 446 | 447 | gen = DataLoader(train_dataset, shuffle = shuffle, batch_size = batch_size, num_workers = num_workers, pin_memory=True, 448 | drop_last=True, collate_fn=efficientdet_dataset_collate, sampler=train_sampler, 449 | worker_init_fn=partial(worker_init_fn, rank=rank, seed=seed)) 450 | gen_val = DataLoader(val_dataset , shuffle = shuffle, batch_size = batch_size, num_workers = num_workers, pin_memory=True, 451 | drop_last=True, collate_fn=efficientdet_dataset_collate, sampler=val_sampler, 452 | worker_init_fn=partial(worker_init_fn, rank=rank, seed=seed)) 453 | 454 | #----------------------# 455 | # 记录eval的map曲线 456 | #----------------------# 457 | if local_rank == 0: 458 | eval_callback = EvalCallback(model, input_shape, class_names, num_classes, val_lines, log_dir, Cuda, \ 459 | eval_flag=eval_flag, period=eval_period) 460 | else: 461 | eval_callback = None 462 | 463 | #---------------------------------------# 464 | # 开始模型训练 465 | #---------------------------------------# 466 | for epoch in range(Init_Epoch, UnFreeze_Epoch): 467 | #---------------------------------------# 468 | # 如果模型有冻结学习部分 469 | # 则解冻,并设置参数 470 | #---------------------------------------# 471 | if epoch >= Freeze_Epoch and not UnFreeze_flag and Freeze_Train: 472 | batch_size = Unfreeze_batch_size 473 | 474 | #-------------------------------------------------------------------# 475 | # 判断当前batch_size,自适应调整学习率 476 | #-------------------------------------------------------------------# 477 | nbs = 16 478 | lr_limit_max = 5e-4 if optimizer_type == 'adam' else 1e-1 479 | lr_limit_min = 3e-4 if optimizer_type == 'adam' else 5e-4 480 | Init_lr_fit = min(max(batch_size / nbs * Init_lr, lr_limit_min), lr_limit_max) 481 | Min_lr_fit = min(max(batch_size / nbs * Min_lr, lr_limit_min * 1e-2), lr_limit_max * 1e-2) 482 | #---------------------------------------# 
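#-------------------------------------------------------------------#
#   Worked example of the adaptive-LR clamp above (illustrative
#   numbers only, using the Adam limits):
#       nbs = 16, batch_size = 8, Init_lr = 3e-4
#       scaled  : 8 / 16 * 3e-4 = 1.5e-4
#       clamped : min(max(1.5e-4, 3e-4), 5e-4) = 3e-4
#   Small batches are pulled up to lr_limit_min and large batches
#   are capped at lr_limit_max, so the effective LR stays in a
#   safe band regardless of the batch size you pick.
#-------------------------------------------------------------------#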
483 | # Obtain the learning-rate schedule function 484 | #---------------------------------------# 485 | lr_scheduler_func = get_lr_scheduler(lr_decay_type, Init_lr_fit, Min_lr_fit, UnFreeze_Epoch) 486 | 487 | for param in model.backbone_net.parameters(): 488 | param.requires_grad = True 489 | 490 | epoch_step = num_train // batch_size 491 | epoch_step_val = num_val // batch_size 492 | 493 | if epoch_step == 0 or epoch_step_val == 0: 494 | raise ValueError("The dataset is too small to continue training. Please expand the dataset.") 495 | 496 | if distributed: 497 | batch_size = batch_size // ngpus_per_node 498 | 499 | gen = DataLoader(train_dataset, shuffle = shuffle, batch_size = batch_size, num_workers = num_workers, pin_memory=True, 500 | drop_last=True, collate_fn=efficientdet_dataset_collate, sampler=train_sampler, 501 | worker_init_fn=partial(worker_init_fn, rank=rank, seed=seed)) 502 | gen_val = DataLoader(val_dataset , shuffle = shuffle, batch_size = batch_size, num_workers = num_workers, pin_memory=True, 503 | drop_last=True, collate_fn=efficientdet_dataset_collate, sampler=val_sampler, 504 | worker_init_fn=partial(worker_init_fn, rank=rank, seed=seed)) 505 | 506 | UnFreeze_flag = True 507 | 508 | if distributed: 509 | train_sampler.set_epoch(epoch) 510 | 511 | set_optimizer_lr(optimizer, lr_scheduler_func, epoch) 512 | 513 | fit_one_epoch(model_train, model, focal_loss, loss_history, eval_callback, optimizer, epoch, 514 | epoch_step, epoch_step_val, gen, gen_val, UnFreeze_Epoch, Cuda, fp16, scaler, save_period, save_dir, local_rank) 515 | 516 | if distributed: 517 | dist.barrier() 518 | 519 | if local_rank == 0: 520 | loss_history.writer.close() 521 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------------------------------- /utils/anchors.py: -------------------------------------------------------------------------------- 1 | import itertools 2 | 3 | import numpy as np 4 | import torch 5 | import torch.nn as nn 6 | 7 | 8 | class Anchors(nn.Module): 9 | def __init__(self, anchor_scale=4., pyramid_levels=[3, 4, 5, 6, 7]): 10 | super().__init__() 11 | self.anchor_scale = anchor_scale 12 | self.pyramid_levels = pyramid_levels 13 | # strides are [8, 16, 32, 64, 128]: the spacing between feature points on each pyramid level 14 | self.strides = [2 ** x for x in self.pyramid_levels] 15 | self.scales = np.array([2 ** 0, 2 ** (1.0 / 3.0), 2 ** (2.0 / 3.0)]) 16 | self.ratios = [(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)] 17 | 18 | def forward(self, image): 19 | image_shape = image.shape[2:] 20 | 21 | boxes_all = [] 22 | for stride in self.strides: 23 | boxes_level = [] 24 | for scale, ratio in itertools.product(self.scales, self.ratios): 25 | if image_shape[1] % stride != 0: 26 | raise ValueError('input size must be divisible by the stride.') 27 | base_anchor_size = self.anchor_scale * stride * scale 28 | anchor_size_x_2 = base_anchor_size * ratio[0] / 2.0 29 | anchor_size_y_2 = base_anchor_size * ratio[1] / 2.0 30 | x = np.arange(stride / 2, image_shape[1], stride) 31 | y = np.arange(stride / 2, image_shape[0], stride) 32 | 33 | xv, yv = np.meshgrid(x, y) 34 | 35 | xv = xv.reshape(-1) 36 | yv = yv.reshape(-1) 37 | 38 | # y1,x1,y2,x2 39 | boxes = np.vstack((yv - anchor_size_y_2, xv - anchor_size_x_2, 40 | yv + anchor_size_y_2, xv + anchor_size_x_2)) 41 | boxes = np.swapaxes(boxes, 0, 1) 42 | boxes_level.append(np.expand_dims(boxes, axis=1)) 43 | # concat anchors on the same level, then reshape to NxAx4 44 | boxes_level = 
np.concatenate(boxes_level, axis=1) 45 | boxes_all.append(boxes_level.reshape([-1, 4])) 46 | 47 | anchor_boxes = np.vstack(boxes_all) 48 | 49 | anchor_boxes = torch.from_numpy(anchor_boxes).to(image.device) 50 | anchor_boxes = anchor_boxes.unsqueeze(0) 51 | 52 | return anchor_boxes 53 | -------------------------------------------------------------------------------- /utils/callbacks.py: -------------------------------------------------------------------------------- 1 | import datetime 2 | import os 3 | 4 | import matplotlib 5 | import torch 6 | 7 | matplotlib.use('Agg') 8 | from matplotlib import pyplot as plt 9 | import scipy.signal 10 | 11 | import shutil 12 | import numpy as np 13 | from PIL import Image 14 | from torch.utils.tensorboard import SummaryWriter 15 | from tqdm import tqdm 16 | 17 | from .utils import cvtColor, preprocess_input, resize_image 18 | from .utils_bbox import decodebox, non_max_suppression 19 | from .utils_map import get_coco_map, get_map 20 | 21 | 22 | class LossHistory(): 23 | def __init__(self, log_dir, model, input_shape): 24 | self.log_dir = log_dir 25 | self.losses = [] 26 | self.val_loss = [] 27 | 28 | os.makedirs(self.log_dir) 29 | self.writer = SummaryWriter(self.log_dir) 30 | try: 31 | dummy_input = torch.randn(2, 3, input_shape[0], input_shape[1]) 32 | self.writer.add_graph(model, dummy_input) 33 | except: 34 | pass 35 | 36 | def append_loss(self, epoch, loss, val_loss): 37 | if not os.path.exists(self.log_dir): 38 | os.makedirs(self.log_dir) 39 | 40 | self.losses.append(loss) 41 | self.val_loss.append(val_loss) 42 | 43 | with open(os.path.join(self.log_dir, "epoch_loss.txt"), 'a') as f: 44 | f.write(str(loss)) 45 | f.write("\n") 46 | with open(os.path.join(self.log_dir, "epoch_val_loss.txt"), 'a') as f: 47 | f.write(str(val_loss)) 48 | f.write("\n") 49 | 50 | self.writer.add_scalar('loss', loss, epoch) 51 | self.writer.add_scalar('val_loss', val_loss, epoch) 52 | self.loss_plot() 53 | 54 | def loss_plot(self): 55 | iters = range(len(self.losses)) 56 | 57 | plt.figure() 58 | plt.plot(iters, self.losses, 'red', linewidth = 2, label='train loss') 59 | plt.plot(iters, self.val_loss, 'coral', linewidth = 2, label='val loss') 60 | try: 61 | if len(self.losses) < 25: 62 | num = 5 63 | else: 64 | num = 15 65 | 66 | plt.plot(iters, scipy.signal.savgol_filter(self.losses, num, 3), 'green', linestyle = '--', linewidth = 2, label='smooth train loss') 67 | plt.plot(iters, scipy.signal.savgol_filter(self.val_loss, num, 3), '#8B4513', linestyle = '--', linewidth = 2, label='smooth val loss') 68 | except: 69 | pass 70 | 71 | plt.grid(True) 72 | plt.xlabel('Epoch') 73 | plt.ylabel('Loss') 74 | plt.legend(loc="upper right") 75 | 76 | plt.savefig(os.path.join(self.log_dir, "epoch_loss.png")) 77 | 78 | plt.cla() 79 | plt.close("all") 80 | 81 | class EvalCallback(): 82 | def __init__(self, net, input_shape, class_names, num_classes, val_lines, log_dir, cuda, \ 83 | map_out_path=".temp_map_out", max_boxes=100, confidence=0.05, nms_iou=0.5, letterbox_image=True, MINOVERLAP=0.5, eval_flag=True, period=1): 84 | super(EvalCallback, self).__init__() 85 | 86 | self.net = net 87 | self.input_shape = input_shape 88 | self.class_names = class_names 89 | self.num_classes = num_classes 90 | self.val_lines = val_lines 91 | self.log_dir = log_dir 92 | self.cuda = cuda 93 | self.map_out_path = map_out_path 94 | self.max_boxes = max_boxes 95 | self.confidence = confidence 96 | self.nms_iou = nms_iou 97 | self.letterbox_image = letterbox_image 98 | self.MINOVERLAP = MINOVERLAP 99 | 
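#--------------------------------------------------------#
#   Why these defaults: mAP integrates a precision-recall
#   curve, so confidence is kept deliberately low (0.05) to
#   retain low-score detections, while max_boxes=100 caps
#   the per-image detections to keep evaluation fast.
#--------------------------------------------------------#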
self.eval_flag = eval_flag 100 | self.period = period 101 | 102 | self.maps = [0] 103 | self.epoches = [0] 104 | if self.eval_flag: 105 | with open(os.path.join(self.log_dir, "epoch_map.txt"), 'a') as f: 106 | f.write(str(0)) 107 | f.write("\n") 108 | 109 | #---------------------------------------------------# 110 | # 检测图片 111 | #---------------------------------------------------# 112 | def get_map_txt(self, image_id, image, class_names, map_out_path): 113 | f = open(os.path.join(map_out_path, "detection-results/"+image_id+".txt"),"w") 114 | image_shape = np.array(np.shape(image)[0:2]) 115 | #---------------------------------------------------------# 116 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 117 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 118 | #---------------------------------------------------------# 119 | image = cvtColor(image) 120 | #---------------------------------------------------------# 121 | # 给图像增加灰条,实现不失真的resize 122 | # 也可以直接resize进行识别 123 | #---------------------------------------------------------# 124 | image_data = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image) 125 | #---------------------------------------------------------# 126 | # 添加上batch_size维度,图片预处理,归一化。 127 | #---------------------------------------------------------# 128 | image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0) 129 | 130 | with torch.no_grad(): 131 | images = torch.from_numpy(image_data) 132 | if self.cuda: 133 | images = images.cuda() 134 | #---------------------------------------------------------# 135 | # 传入网络当中进行预测 136 | #---------------------------------------------------------# 137 | _, regression, classification, anchors = self.net(images) 138 | 139 | #-----------------------------------------------------------# 140 | # 将预测结果进行解码 141 | #-----------------------------------------------------------# 142 | outputs = decodebox(regression, anchors, self.input_shape) 143 | results = non_max_suppression(torch.cat([outputs, classification], axis=-1), self.input_shape, 144 | image_shape, self.letterbox_image, conf_thres = self.confidence, nms_thres = self.nms_iou) 145 | 146 | if results[0] is None: 147 | return 148 | 149 | top_label = np.array(results[0][:, 5], dtype = 'int32') 150 | top_conf = results[0][:, 4] 151 | top_boxes = results[0][:, :4] 152 | 153 | top_100 = np.argsort(top_conf)[::-1][:self.max_boxes] 154 | top_boxes = top_boxes[top_100] 155 | top_conf = top_conf[top_100] 156 | top_label = top_label[top_100] 157 | 158 | for i, c in list(enumerate(top_label)): 159 | predicted_class = self.class_names[int(c)] 160 | box = top_boxes[i] 161 | score = str(top_conf[i]) 162 | 163 | top, left, bottom, right = box 164 | if predicted_class not in class_names: 165 | continue 166 | 167 | f.write("%s %s %s %s %s %s\n" % (predicted_class, score[:6], str(int(left)), str(int(top)), str(int(right)),str(int(bottom)))) 168 | 169 | f.close() 170 | return 171 | 172 | def on_epoch_end(self, epoch, model_eval): 173 | if epoch % self.period == 0 and self.eval_flag: 174 | self.net = model_eval 175 | if not os.path.exists(self.map_out_path): 176 | os.makedirs(self.map_out_path) 177 | if not os.path.exists(os.path.join(self.map_out_path, "ground-truth")): 178 | os.makedirs(os.path.join(self.map_out_path, "ground-truth")) 179 | if not os.path.exists(os.path.join(self.map_out_path, "detection-results")): 180 | os.makedirs(os.path.join(self.map_out_path, "detection-results")) 181 | print("Get map.") 182 | for annotation_line in 
tqdm(self.val_lines): 183 | line = annotation_line.split() 184 | image_id = os.path.basename(line[0]).split('.')[0] 185 | #------------------------------# 186 | # Read the image; it is converted to RGB inside get_map_txt 187 | #------------------------------# 188 | image = Image.open(line[0]) 189 | #------------------------------# 190 | # Get the ground-truth boxes 191 | #------------------------------# 192 | gt_boxes = np.array([np.array(list(map(int,box.split(',')))) for box in line[1:]]) 193 | #------------------------------# 194 | # Write the detection-result txt 195 | #------------------------------# 196 | self.get_map_txt(image_id, image, self.class_names, self.map_out_path) 197 | 198 | #------------------------------# 199 | # Write the ground-truth txt 200 | #------------------------------# 201 | with open(os.path.join(self.map_out_path, "ground-truth/"+image_id+".txt"), "w") as new_f: 202 | for box in gt_boxes: 203 | left, top, right, bottom, obj = box 204 | obj_name = self.class_names[obj] 205 | new_f.write("%s %s %s %s %s\n" % (obj_name, left, top, right, bottom)) 206 | 207 | print("Calculate mAP.") 208 | try: 209 | temp_map = get_coco_map(class_names = self.class_names, path = self.map_out_path)[1] 210 | except: 211 | temp_map = get_map(self.MINOVERLAP, False, path = self.map_out_path) 212 | self.maps.append(temp_map) 213 | self.epoches.append(epoch) 214 | 215 | with open(os.path.join(self.log_dir, "epoch_map.txt"), 'a') as f: 216 | f.write(str(temp_map)) 217 | f.write("\n") 218 | 219 | plt.figure() 220 | plt.plot(self.epoches, self.maps, 'red', linewidth = 2, label='val mAP') 221 | 222 | plt.grid(True) 223 | plt.xlabel('Epoch') 224 | plt.ylabel('mAP %s'%str(self.MINOVERLAP)) 225 | plt.title('mAP Curve') 226 | plt.legend(loc="upper right") 227 | 228 | plt.savefig(os.path.join(self.log_dir, "epoch_map.png")) 229 | plt.cla() 230 | plt.close("all") 231 | 232 | print("Get mAP done.") 233 | shutil.rmtree(self.map_out_path) 234 | -------------------------------------------------------------------------------- /utils/dataloader.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import numpy as np 3 | import torch 4 | from PIL import Image 5 | from torch.utils.data.dataset import Dataset 6 | 7 | from utils.utils import cvtColor, preprocess_input 8 | 9 | 10 | class EfficientdetDataset(Dataset): 11 | def __init__(self, annotation_lines, input_shape, num_classes, train): 12 | super(EfficientdetDataset, self).__init__() 13 | self.annotation_lines = annotation_lines 14 | self.length = len(self.annotation_lines) 15 | self.input_shape = input_shape 16 | self.num_classes = num_classes 17 | self.train = train 18 | 19 | def __len__(self): 20 | return self.length 21 | 22 | def __getitem__(self, index): 23 | index = index % self.length 24 | 25 | image, box = self.get_random_data(self.annotation_lines[index], self.input_shape, random = self.train) 26 | 27 | image = np.transpose(preprocess_input(np.array(image, dtype=np.float32)),(2,0,1)) 28 | box = np.array(box, dtype=np.float32) 29 | return image, box 30 | 31 | def rand(self, a=0, b=1): 32 | return np.random.rand()*(b-a) + a 33 | 34 | def get_random_data(self, annotation_line, input_shape, jitter=.3, hue=.1, sat=0.7, val=0.4, random=True): 35 | line = annotation_line.split() 36 | #------------------------------# 37 | # Read the image and convert it to RGB 38 | #------------------------------# 39 | image = Image.open(line[0]) 40 | image = cvtColor(image) 41 | #------------------------------# 42 | # Get the image size and the target input size 43 | #------------------------------# 44 | iw, ih = image.size 45 | h, w = input_shape 46 | 
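#------------------------------------------------------------------#
#   The annotation format assumed here is the one written by
#   voc_annotation.py: one image per line, an absolute path followed
#   by space-separated "x1,y1,x2,y2,class_id" groups, e.g. the
#   hypothetical line
#       /data/VOCdevkit/VOC2007/JPEGImages/000012.jpg 48,240,195,371,11
#   so line[0] is the image path and line[1:] are the boxes
#   parsed just below.
#------------------------------------------------------------------#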
#------------------------------# 47 | # Get the ground-truth boxes 48 | #------------------------------# 49 | box = np.array([np.array(list(map(int,box.split(',')))) for box in line[1:]]) 50 | 51 | if not random: 52 | scale = min(w/iw, h/ih) 53 | nw = int(iw*scale) 54 | nh = int(ih*scale) 55 | dx = (w-nw)//2 56 | dy = (h-nh)//2 57 | 58 | #---------------------------------# 59 | # Pad the unused area with gray bars 60 | #---------------------------------# 61 | image = image.resize((nw,nh), Image.BICUBIC) 62 | new_image = Image.new('RGB', (w,h), (128,128,128)) 63 | new_image.paste(image, (dx, dy)) 64 | image_data = np.array(new_image, np.float32) 65 | 66 | #---------------------------------# 67 | # Adjust the ground-truth boxes accordingly 68 | #---------------------------------# 69 | if len(box)>0: 70 | np.random.shuffle(box) 71 | box[:, [0,2]] = box[:, [0,2]]*nw/iw + dx 72 | box[:, [1,3]] = box[:, [1,3]]*nh/ih + dy 73 | box[:, 0:2][box[:, 0:2]<0] = 0 74 | box[:, 2][box[:, 2]>w] = w 75 | box[:, 3][box[:, 3]>h] = h 76 | box_w = box[:, 2] - box[:, 0] 77 | box_h = box[:, 3] - box[:, 1] 78 | box = box[np.logical_and(box_w>1, box_h>1)] # discard invalid box 79 | 80 | return image_data, box 81 | 82 | #------------------------------------------# 83 | # Rescale the image and jitter its aspect ratio 84 | #------------------------------------------# 85 | new_ar = w/h * self.rand(1-jitter,1+jitter) / self.rand(1-jitter,1+jitter) 86 | scale = self.rand(.25, 2) 87 | if new_ar < 1: 88 | nh = int(scale*h) 89 | nw = int(nh*new_ar) 90 | else: 91 | nw = int(scale*w) 92 | nh = int(nw/new_ar) 93 | image = image.resize((nw,nh), Image.BICUBIC) 94 | 95 | #------------------------------------------# 96 | # Pad the unused area with gray bars 97 | #------------------------------------------# 98 | dx = int(self.rand(0, w-nw)) 99 | dy = int(self.rand(0, h-nh)) 100 | new_image = Image.new('RGB', (w,h), (128,128,128)) 101 | new_image.paste(image, (dx, dy)) 102 | image = new_image 103 | 104 | #------------------------------------------# 105 | # Randomly flip the image 106 | #------------------------------------------# 107 | flip = self.rand()<.5 108 | if flip: image = image.transpose(Image.FLIP_LEFT_RIGHT) 109 | 110 | image_data = np.array(image, np.uint8) 111 | #---------------------------------# 112 | # Color-space (HSV) augmentation 113 | # Compute the per-channel gains 114 | #---------------------------------# 115 | r = np.random.uniform(-1, 1, 3) * [hue, sat, val] + 1 116 | #---------------------------------# 117 | # Convert the image to HSV 118 | #---------------------------------# 119 | hue, sat, val = cv2.split(cv2.cvtColor(image_data, cv2.COLOR_RGB2HSV)) 120 | dtype = image_data.dtype 121 | #---------------------------------# 122 | # Apply the lookup-table transform 123 | #---------------------------------# 124 | x = np.arange(0, 256, dtype=r.dtype) 125 | lut_hue = ((x * r[0]) % 180).astype(dtype) 126 | lut_sat = np.clip(x * r[1], 0, 255).astype(dtype) 127 | lut_val = np.clip(x * r[2], 0, 255).astype(dtype) 128 | 129 | image_data = cv2.merge((cv2.LUT(hue, lut_hue), cv2.LUT(sat, lut_sat), cv2.LUT(val, lut_val))) 130 | image_data = cv2.cvtColor(image_data, cv2.COLOR_HSV2RGB) 131 | 132 | #---------------------------------# 133 | # Adjust the ground-truth boxes accordingly 134 | #---------------------------------# 135 | if len(box)>0: 136 | np.random.shuffle(box) 137 | box[:, [0,2]] = box[:, [0,2]]*nw/iw + dx 138 | box[:, [1,3]] = box[:, [1,3]]*nh/ih + dy 139 | if flip: box[:, [0,2]] = w - box[:, [2,0]] 140 | box[:, 0:2][box[:, 0:2]<0] = 0 141 | box[:, 2][box[:, 2]>w] = w 142 | box[:, 3][box[:, 3]>h] = h 143 | box_w = box[:, 2] - box[:, 0] 144 | box_h = box[:, 3] - box[:, 1] 145 | box = box[np.logical_and(box_w>1, box_h>1)] 146 | 147 | return image_data, box 148 | 149 | # 
DataLoader中collate_fn使用 150 | def efficientdet_dataset_collate(batch): 151 | images = [] 152 | bboxes = [] 153 | for img, box in batch: 154 | images.append(img) 155 | bboxes.append(box) 156 | images = torch.from_numpy(np.array(images)).type(torch.FloatTensor) 157 | bboxes = [torch.from_numpy(ann).type(torch.FloatTensor) for ann in bboxes] 158 | return images, bboxes 159 | 160 | -------------------------------------------------------------------------------- /utils/utils.py: -------------------------------------------------------------------------------- 1 | import random 2 | 3 | import numpy as np 4 | import torch 5 | from PIL import Image 6 | 7 | #---------------------------------------------------------------------# 8 | # 用于预测的图像大小,无需修改,由phi选择 9 | #---------------------------------------------------------------------# 10 | image_sizes = [512, 640, 768, 896, 1024, 1280, 1408, 1536] 11 | 12 | #---------------------------------------------------------# 13 | # 将图像转换成RGB图像,防止灰度图在预测时报错。 14 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 15 | #---------------------------------------------------------# 16 | def cvtColor(image): 17 | if len(np.shape(image)) == 3 and np.shape(image)[2] == 3: 18 | return image 19 | else: 20 | image = image.convert('RGB') 21 | return image 22 | 23 | #---------------------------------------------------# 24 | # 对输入图像进行resize 25 | #---------------------------------------------------# 26 | def resize_image(image, size, letterbox_image): 27 | iw, ih = image.size 28 | w, h = size 29 | if letterbox_image: 30 | scale = min(w/iw, h/ih) 31 | nw = int(iw*scale) 32 | nh = int(ih*scale) 33 | 34 | image = image.resize((nw,nh), Image.BICUBIC) 35 | new_image = Image.new('RGB', size, (128,128,128)) 36 | new_image.paste(image, ((w-nw)//2, (h-nh)//2)) 37 | else: 38 | new_image = image.resize((w, h), Image.BICUBIC) 39 | return new_image 40 | 41 | #---------------------------------------------------# 42 | # 获得类 43 | #---------------------------------------------------# 44 | def get_classes(classes_path): 45 | with open(classes_path, encoding='utf-8') as f: 46 | class_names = f.readlines() 47 | class_names = [c.strip() for c in class_names] 48 | return class_names, len(class_names) 49 | 50 | #---------------------------------------------------# 51 | # 获得学习率 52 | #---------------------------------------------------# 53 | def get_lr(optimizer): 54 | for param_group in optimizer.param_groups: 55 | return param_group['lr'] 56 | 57 | #---------------------------------------------------# 58 | # 设置种子 59 | #---------------------------------------------------# 60 | def seed_everything(seed=11): 61 | random.seed(seed) 62 | np.random.seed(seed) 63 | torch.manual_seed(seed) 64 | torch.cuda.manual_seed(seed) 65 | torch.cuda.manual_seed_all(seed) 66 | torch.backends.cudnn.deterministic = True 67 | torch.backends.cudnn.benchmark = False 68 | 69 | #---------------------------------------------------# 70 | # 设置Dataloader的种子 71 | #---------------------------------------------------# 72 | def worker_init_fn(worker_id, rank, seed): 73 | worker_seed = rank + seed 74 | random.seed(worker_seed) 75 | np.random.seed(worker_seed) 76 | torch.manual_seed(worker_seed) 77 | 78 | def preprocess_input(image): 79 | image /= 255 80 | mean = (0.485, 0.456, 0.406) 81 | std = (0.229, 0.224, 0.225) 82 | image -= mean 83 | image /= std 84 | return image 85 | 86 | def show_config(**kwargs): 87 | print('Configurations:') 88 | print('-' * 70) 89 | print('|%25s | %40s|' % ('keys', 'values')) 90 | print('-' * 70) 91 | for key, value in 
kwargs.items(): 92 | print('|%25s | %40s|' % (str(key), str(value))) 93 | print('-' * 70) 94 | 95 | def download_weights(backbone, model_dir="./model_data"): 96 | import os 97 | 98 | from torch.hub import load_state_dict_from_url 99 | 100 | download_urls = { 101 | 'efficientnet-b0': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b0.pth', 102 | 'efficientnet-b1': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b1.pth', 103 | 'efficientnet-b2': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b2.pth', 104 | 'efficientnet-b3': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b3.pth', 105 | 'efficientnet-b4': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b4.pth', 106 | 'efficientnet-b5': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b5.pth', 107 | 'efficientnet-b6': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b6.pth', 108 | 'efficientnet-b7': 'https://github.com/bubbliiiing/efficientdet-pytorch/releases/download/v1.0/efficientnet-b7.pth', 109 | } 110 | url = download_urls[backbone] 111 | 112 | if not os.path.exists(model_dir): 113 | os.makedirs(model_dir) 114 | load_state_dict_from_url(url, model_dir) -------------------------------------------------------------------------------- /utils/utils_bbox.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | from torchvision.ops import nms 4 | 5 | def decodebox(regression, anchors, input_shape): 6 | dtype = regression.dtype 7 | anchors = anchors.to(dtype) 8 | #--------------------------------------# 9 | # 计算先验框的中心 10 | #--------------------------------------# 11 | y_centers_a = (anchors[..., 0] + anchors[..., 2]) / 2 12 | x_centers_a = (anchors[..., 1] + anchors[..., 3]) / 2 13 | 14 | #--------------------------------------# 15 | # 计算先验框的宽高 16 | #--------------------------------------# 17 | ha = anchors[..., 2] - anchors[..., 0] 18 | wa = anchors[..., 3] - anchors[..., 1] 19 | 20 | #--------------------------------------# 21 | # 计算调整后先验框的宽高 22 | # 即计算预测框的宽高 23 | #--------------------------------------# 24 | w = regression[..., 3].exp() * wa 25 | h = regression[..., 2].exp() * ha 26 | 27 | #--------------------------------------# 28 | # 计算调整后先验框的中心 29 | # 即计算预测框的中心 30 | #--------------------------------------# 31 | y_centers = regression[..., 0] * ha + y_centers_a 32 | x_centers = regression[..., 1] * wa + x_centers_a 33 | 34 | #--------------------------------------# 35 | # 计算预测框的左上角右下角 36 | #--------------------------------------# 37 | ymin = y_centers - h / 2. 38 | xmin = x_centers - w / 2. 39 | ymax = y_centers + h / 2. 40 | xmax = x_centers + w / 2. 
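#--------------------------------------#
#   Worked example of the decode above
#   (illustrative numbers only):
#   anchor (y1, x1, y2, x2) = (0, 0, 64, 64)
#       -> center (32, 32), ha = wa = 64
#   regression (dy, dx, dh, dw) = (0.5, 0, log 2, 0)
#       -> h = exp(log 2) * 64 = 128, w = 64
#       -> y_center = 0.5 * 64 + 32 = 64, x_center = 32
#   corners: ymin = 0, xmin = 0, ymax = 128, xmax = 64,
#   stacked below as (xmin, ymin, xmax, ymax) = (0, 0, 64, 128)
#--------------------------------------#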
41 | 42 | #--------------------------------------# 43 | # 将预测框的结果进行堆叠 44 | #--------------------------------------# 45 | boxes = torch.stack([xmin, ymin, xmax, ymax], dim=2) 46 | 47 | # fig = plt.figure() 48 | # ax = fig.add_subplot(121) 49 | # grid_x = x_centers_a[0,-4*4*9:] 50 | # grid_y = y_centers_a[0,-4*4*9:] 51 | # plt.ylim(-600,1200) 52 | # plt.xlim(-600,1200) 53 | # plt.gca().invert_yaxis() 54 | # plt.scatter(grid_x.cpu(),grid_y.cpu()) 55 | 56 | # anchor_left = anchors[0,-4*4*9:,1] 57 | # anchor_top = anchors[0,-4*4*9:,0] 58 | # anchor_w = wa[0,-4*4*9:] 59 | # anchor_h = ha[0,-4*4*9:] 60 | 61 | # for i in range(9,18): 62 | # rect1 = plt.Rectangle([anchor_left[i],anchor_top[i]],anchor_w[i],anchor_h[i],color="r",fill=False) 63 | # ax.add_patch(rect1) 64 | 65 | # ax = fig.add_subplot(122) 66 | 67 | # grid_x = x_centers_a[0,-4*4*9:] 68 | # grid_y = y_centers_a[0,-4*4*9:] 69 | # plt.scatter(grid_x.cpu(),grid_y.cpu()) 70 | # plt.ylim(-600,1200) 71 | # plt.xlim(-600,1200) 72 | # plt.gca().invert_yaxis() 73 | 74 | # y_centers = y_centers[0,-4*4*9:] 75 | # x_centers = x_centers[0,-4*4*9:] 76 | 77 | # pre_left = xmin[0,-4*4*9:] 78 | # pre_top = ymin[0,-4*4*9:] 79 | 80 | # pre_w = xmax[0,-4*4*9:]-xmin[0,-4*4*9:] 81 | # pre_h = ymax[0,-4*4*9:]-ymin[0,-4*4*9:] 82 | 83 | # for i in range(9,18): 84 | # plt.scatter(x_centers[i].cpu(),y_centers[i].cpu(),c='r') 85 | # rect1 = plt.Rectangle([pre_left[i],pre_top[i]],pre_w[i],pre_h[i],color="r",fill=False) 86 | # ax.add_patch(rect1) 87 | 88 | # plt.show() 89 | boxes[:, :, [0, 2]] = boxes[:, :, [0, 2]] / input_shape[1] 90 | boxes[:, :, [1, 3]] = boxes[:, :, [1, 3]] / input_shape[0] 91 | 92 | boxes = torch.clamp(boxes, min = 0, max = 1) 93 | return boxes 94 | 95 | def bbox_iou(box1, box2, x1y1x2y2=True): 96 | """ 97 | 计算IOU 98 | """ 99 | if not x1y1x2y2: 100 | b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2 101 | b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2 102 | b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2 103 | b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2 104 | else: 105 | b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3] 106 | b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3] 107 | 108 | inter_rect_x1 = torch.max(b1_x1, b2_x1) 109 | inter_rect_y1 = torch.max(b1_y1, b2_y1) 110 | inter_rect_x2 = torch.min(b1_x2, b2_x2) 111 | inter_rect_y2 = torch.min(b1_y2, b2_y2) 112 | 113 | inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1, min=0) * \ 114 | torch.clamp(inter_rect_y2 - inter_rect_y1, min=0) 115 | 116 | b1_area = (b1_x2 - b1_x1) * (b1_y2 - b1_y1) 117 | b2_area = (b2_x2 - b2_x1) * (b2_y2 - b2_y1) 118 | 119 | iou = inter_area / torch.clamp(b1_area + b2_area - inter_area, min = 1e-6) 120 | 121 | return iou 122 | 123 | def efficientdet_correct_boxes(box_xy, box_wh, input_shape, image_shape, letterbox_image): 124 | #-----------------------------------------------------------------# 125 | # 把y轴放前面是因为方便预测框和图像的宽高进行相乘 126 | #-----------------------------------------------------------------# 127 | box_yx = box_xy[..., ::-1] 128 | box_hw = box_wh[..., ::-1] 129 | input_shape = np.array(input_shape) 130 | image_shape = np.array(image_shape) 131 | 132 | if letterbox_image: 133 | #-----------------------------------------------------------------# 134 | # 这里求出来的offset是图像有效区域相对于图像左上角的偏移情况 135 | # new_shape指的是宽高缩放情况 136 | #-----------------------------------------------------------------# 137 | new_shape = 
np.round(image_shape * np.min(input_shape/image_shape)) 138 | offset = (input_shape - new_shape)/2./input_shape 139 | scale = input_shape/new_shape 140 | 141 | box_yx = (box_yx - offset) * scale 142 | box_hw *= scale 143 | 144 | box_mins = box_yx - (box_hw / 2.) 145 | box_maxes = box_yx + (box_hw / 2.) 146 | boxes = np.concatenate([box_mins[..., 0:1], box_mins[..., 1:2], box_maxes[..., 0:1], box_maxes[..., 1:2]], axis=-1) 147 | boxes *= np.concatenate([image_shape, image_shape], axis=-1) 148 | return boxes 149 | 150 | def non_max_suppression(prediction, input_shape, image_shape, letterbox_image, conf_thres=0.5, nms_thres=0.4): 151 | output = [None for _ in range(len(prediction))] 152 | 153 | #----------------------------------------------------------# 154 | # 预测只用一张图片,只会进行一次 155 | #----------------------------------------------------------# 156 | for i, image_pred in enumerate(prediction): 157 | #----------------------------------------------------------# 158 | # 对种类预测部分取max。 159 | # class_conf [num_anchors, 1] 种类置信度 160 | # class_pred [num_anchors, 1] 种类 161 | #----------------------------------------------------------# 162 | class_conf, class_pred = torch.max(image_pred[:, 4:], 1, keepdim=True) 163 | 164 | #----------------------------------------------------------# 165 | # 利用置信度进行第一轮筛选 166 | #----------------------------------------------------------# 167 | conf_mask = (class_conf[:, 0] >= conf_thres).squeeze() 168 | 169 | #----------------------------------------------------------# 170 | # 根据置信度进行预测结果的筛选 171 | #----------------------------------------------------------# 172 | image_pred = image_pred[conf_mask] 173 | class_conf = class_conf[conf_mask] 174 | class_pred = class_pred[conf_mask] 175 | if not image_pred.size(0): 176 | continue 177 | #-------------------------------------------------------------------------# 178 | # detections [num_anchors, 6] 179 | # 6的内容为:x1, y1, x2, y2, class_conf, class_pred 180 | #-------------------------------------------------------------------------# 181 | detections = torch.cat((image_pred[:, :4], class_conf.float(), class_pred.float()), 1) 182 | 183 | #------------------------------------------# 184 | # 获得预测结果中包含的所有种类 185 | #------------------------------------------# 186 | unique_labels = detections[:, -1].cpu().unique() 187 | 188 | if prediction.is_cuda: 189 | unique_labels = unique_labels.cuda() 190 | detections = detections.cuda() 191 | 192 | for c in unique_labels: 193 | #------------------------------------------# 194 | # 获得某一类得分筛选后全部的预测结果 195 | #------------------------------------------# 196 | detections_class = detections[detections[:, -1] == c] 197 | 198 | #------------------------------------------# 199 | # 使用官方自带的非极大抑制会速度更快一些! 
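#   An equivalent one-call alternative to this per-class loop is
#   torchvision.ops.batched_nms, which offsets boxes by class index
#   so each class is suppressed independently; a sketch, assuming a
#   torchvision version that ships batched_nms:
#       from torchvision.ops import batched_nms
#       keep_all = batched_nms(detections[:, :4], detections[:, 4],
#                              detections[:, 5].long(), nms_thres)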
200 | #------------------------------------------# 201 | keep = nms( 202 | detections_class[:, :4], 203 | detections_class[:, 4], 204 | nms_thres 205 | ) 206 | max_detections = detections_class[keep] 207 | 208 | # #------------------------------------------# 209 | # # 按照存在物体的置信度排序 210 | # #------------------------------------------# 211 | # _, conf_sort_index = torch.sort(detections_class[:, 4], descending=True) 212 | # detections_class = detections_class[conf_sort_index] 213 | # #------------------------------------------# 214 | # # 进行非极大抑制 215 | # #------------------------------------------# 216 | # max_detections = [] 217 | # while detections_class.size(0): 218 | # #---------------------------------------------------# 219 | # # 取出这一类置信度最高的,一步一步往下判断。 220 | # # 判断重合程度是否大于nms_thres,如果是则去除掉 221 | # #---------------------------------------------------# 222 | # max_detections.append(detections_class[0].unsqueeze(0)) 223 | # if len(detections_class) == 1: 224 | # break 225 | # ious = bbox_iou(max_detections[-1], detections_class[1:]) 226 | # detections_class = detections_class[1:][ious < nms_thres] 227 | # #------------------------------------------# 228 | # # 堆叠 229 | # #------------------------------------------# 230 | # max_detections = torch.cat(max_detections).data 231 | 232 | output[i] = max_detections if output[i] is None else torch.cat((output[i], max_detections)) 233 | 234 | if output[i] is not None: 235 | output[i] = output[i].cpu().numpy() 236 | box_xy, box_wh = (output[i][:, 0:2] + output[i][:, 2:4])/2, output[i][:, 2:4] - output[i][:, 0:2] 237 | output[i][:, :4] = efficientdet_correct_boxes(box_xy, box_wh, input_shape, image_shape, letterbox_image) 238 | return output 239 | -------------------------------------------------------------------------------- /utils/utils_fit.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | import torch 4 | from tqdm import tqdm 5 | 6 | from utils.utils import get_lr 7 | 8 | 9 | def fit_one_epoch(model_train, model, focal_loss, loss_history, eval_callback, optimizer, epoch, epoch_step, epoch_step_val, gen, gen_val, Epoch, cuda, fp16, scaler, save_period, save_dir, local_rank=0): 10 | loss = 0 11 | val_loss = 0 12 | 13 | if local_rank == 0: 14 | print('Start Train') 15 | pbar = tqdm(total=epoch_step,desc=f'Epoch {epoch + 1}/{Epoch}',postfix=dict,mininterval=0.3) 16 | model_train.train() 17 | for iteration, batch in enumerate(gen): 18 | if iteration >= epoch_step: 19 | break 20 | images, targets = batch[0], batch[1] 21 | with torch.no_grad(): 22 | if cuda: 23 | images = images.cuda(local_rank) 24 | targets = [ann.cuda(local_rank) for ann in targets] 25 | #----------------------# 26 | # 清零梯度 27 | #----------------------# 28 | optimizer.zero_grad() 29 | if not fp16: 30 | #-------------------# 31 | # 获得预测结果 32 | #-------------------# 33 | _, regression, classification, anchors = model_train(images) 34 | #-------------------# 35 | # 计算损失 36 | #-------------------# 37 | loss_value, _, _ = focal_loss(classification, regression, anchors, targets, cuda = cuda) 38 | 39 | loss_value.backward() 40 | optimizer.step() 41 | else: 42 | from torch.cuda.amp import autocast 43 | with autocast(): 44 | #-------------------# 45 | # 获得预测结果 46 | #-------------------# 47 | _, regression, classification, anchors = model_train(images) 48 | #-------------------# 49 | # 计算损失 50 | #-------------------# 51 | loss_value, _, _ = focal_loss(classification, regression, anchors, targets, cuda = cuda) 52 | 53 | 
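#----------------------#
#   fp16 note: GradScaler multiplies the loss before backward so
#   small half-precision gradients do not underflow to zero;
#   scaler.step() unscales them (and skips the optimizer step if
#   inf/NaN appears) and scaler.update() re-tunes the scale factor,
#   hence the three-call pattern below.
#----------------------#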
#----------------------# 54 | # 反向传播 55 | #----------------------# 56 | scaler.scale(loss_value).backward() 57 | scaler.step(optimizer) 58 | scaler.update() 59 | 60 | loss += loss_value.item() 61 | 62 | if local_rank == 0: 63 | pbar.set_postfix(**{'loss' : loss / (iteration + 1), 64 | 'lr' : get_lr(optimizer)}) 65 | pbar.update(1) 66 | 67 | if local_rank == 0: 68 | pbar.close() 69 | print('Finish Train') 70 | print('Start Validation') 71 | pbar = tqdm(total=epoch_step_val, desc=f'Epoch {epoch + 1}/{Epoch}',postfix=dict,mininterval=0.3) 72 | 73 | model_train.eval() 74 | for iteration, batch in enumerate(gen_val): 75 | if iteration >= epoch_step_val: 76 | break 77 | images, targets = batch[0], batch[1] 78 | with torch.no_grad(): 79 | if cuda: 80 | images = images.cuda(local_rank) 81 | targets = [ann.cuda(local_rank) for ann in targets] 82 | #----------------------# 83 | # 清零梯度 84 | #----------------------# 85 | optimizer.zero_grad() 86 | #-------------------# 87 | # 获得预测结果 88 | #-------------------# 89 | _, regression, classification, anchors = model_train(images) 90 | #-------------------# 91 | # 计算损失 92 | #-------------------# 93 | loss_value, _, _ = focal_loss(classification, regression, anchors, targets, cuda = cuda) 94 | 95 | val_loss += loss_value.item() 96 | if local_rank == 0: 97 | pbar.set_postfix(**{'val_loss': val_loss / (iteration + 1)}) 98 | pbar.update(1) 99 | 100 | if local_rank == 0: 101 | pbar.close() 102 | print('Finish Validation') 103 | loss_history.append_loss(epoch + 1, loss / epoch_step, val_loss / epoch_step_val) 104 | eval_callback.on_epoch_end(epoch + 1, model_train) 105 | print('Epoch:'+ str(epoch + 1) + '/' + str(Epoch)) 106 | print('Total Loss: %.3f || Val Loss: %.3f ' % (loss / epoch_step, val_loss / epoch_step_val)) 107 | 108 | #-----------------------------------------------# 109 | # 保存权值 110 | #-----------------------------------------------# 111 | if (epoch + 1) % save_period == 0 or epoch + 1 == Epoch: 112 | torch.save(model.state_dict(), os.path.join(save_dir, 'ep%03d-loss%.3f-val_loss%.3f.pth' % (epoch + 1, loss / epoch_step, val_loss / epoch_step_val))) 113 | 114 | if len(loss_history.val_loss) <= 1 or (val_loss / epoch_step_val) <= min(loss_history.val_loss): 115 | print('Save best model to best_epoch_weights.pth') 116 | torch.save(model.state_dict(), os.path.join(save_dir, "best_epoch_weights.pth")) 117 | 118 | torch.save(model.state_dict(), os.path.join(save_dir, "last_epoch_weights.pth")) -------------------------------------------------------------------------------- /voc_annotation.py: -------------------------------------------------------------------------------- 1 | import os 2 | import random 3 | import xml.etree.ElementTree as ET 4 | 5 | import numpy as np 6 | 7 | from utils.utils import get_classes 8 | 9 | #--------------------------------------------------------------------------------------------------------------------------------# 10 | # annotation_mode用于指定该文件运行时计算的内容 11 | # annotation_mode为0代表整个标签处理过程,包括获得VOCdevkit/VOC2007/ImageSets里面的txt以及训练用的2007_train.txt、2007_val.txt 12 | # annotation_mode为1代表获得VOCdevkit/VOC2007/ImageSets里面的txt 13 | # annotation_mode为2代表获得训练用的2007_train.txt、2007_val.txt 14 | #--------------------------------------------------------------------------------------------------------------------------------# 15 | annotation_mode = 0 16 | #-------------------------------------------------------------------# 17 | # 必须要修改,用于生成2007_train.txt、2007_val.txt的目标信息 18 | # 与训练和预测所用的classes_path一致即可 19 | # 如果生成的2007_train.txt里面没有目标信息 
20 | # 那么就是因为classes没有设定正确 21 | # 仅在annotation_mode为0和2的时候有效 22 | #-------------------------------------------------------------------# 23 | classes_path = 'model_data/voc_classes.txt' 24 | #--------------------------------------------------------------------------------------------------------------------------------# 25 | # trainval_percent用于指定(训练集+验证集)与测试集的比例,默认情况下 (训练集+验证集):测试集 = 9:1 26 | # train_percent用于指定(训练集+验证集)中训练集与验证集的比例,默认情况下 训练集:验证集 = 9:1 27 | # 仅在annotation_mode为0和1的时候有效 28 | #--------------------------------------------------------------------------------------------------------------------------------# 29 | trainval_percent = 0.9 30 | train_percent = 0.9 31 | #-------------------------------------------------------# 32 | # 指向VOC数据集所在的文件夹 33 | # 默认指向根目录下的VOC数据集 34 | #-------------------------------------------------------# 35 | VOCdevkit_path = 'VOCdevkit' 36 | 37 | VOCdevkit_sets = [('2007', 'train'), ('2007', 'val')] 38 | classes, _ = get_classes(classes_path) 39 | 40 | #-------------------------------------------------------# 41 | # 统计目标数量 42 | #-------------------------------------------------------# 43 | photo_nums = np.zeros(len(VOCdevkit_sets)) 44 | nums = np.zeros(len(classes)) 45 | def convert_annotation(year, image_id, list_file): 46 | in_file = open(os.path.join(VOCdevkit_path, 'VOC%s/Annotations/%s.xml'%(year, image_id)), encoding='utf-8') 47 | tree=ET.parse(in_file) 48 | root = tree.getroot() 49 | 50 | for obj in root.iter('object'): 51 | difficult = 0 52 | if obj.find('difficult')!=None: 53 | difficult = obj.find('difficult').text 54 | cls = obj.find('name').text 55 | if cls not in classes or int(difficult)==1: 56 | continue 57 | cls_id = classes.index(cls) 58 | xmlbox = obj.find('bndbox') 59 | b = (int(float(xmlbox.find('xmin').text)), int(float(xmlbox.find('ymin').text)), int(float(xmlbox.find('xmax').text)), int(float(xmlbox.find('ymax').text))) 60 | list_file.write(" " + ",".join([str(a) for a in b]) + ',' + str(cls_id)) 61 | 62 | nums[classes.index(cls)] = nums[classes.index(cls)] + 1 63 | 64 | if __name__ == "__main__": 65 | random.seed(0) 66 | if " " in os.path.abspath(VOCdevkit_path): 67 | raise ValueError("数据集存放的文件夹路径与图片名称中不可以存在空格,否则会影响正常的模型训练,请注意修改。") 68 | 69 | if annotation_mode == 0 or annotation_mode == 1: 70 | print("Generate txt in ImageSets.") 71 | xmlfilepath = os.path.join(VOCdevkit_path, 'VOC2007/Annotations') 72 | saveBasePath = os.path.join(VOCdevkit_path, 'VOC2007/ImageSets/Main') 73 | temp_xml = os.listdir(xmlfilepath) 74 | total_xml = [] 75 | for xml in temp_xml: 76 | if xml.endswith(".xml"): 77 | total_xml.append(xml) 78 | 79 | num = len(total_xml) 80 | list = range(num) 81 | tv = int(num*trainval_percent) 82 | tr = int(tv*train_percent) 83 | trainval= random.sample(list,tv) 84 | train = random.sample(trainval,tr) 85 | 86 | print("train and val size",tv) 87 | print("train size",tr) 88 | ftrainval = open(os.path.join(saveBasePath,'trainval.txt'), 'w') 89 | ftest = open(os.path.join(saveBasePath,'test.txt'), 'w') 90 | ftrain = open(os.path.join(saveBasePath,'train.txt'), 'w') 91 | fval = open(os.path.join(saveBasePath,'val.txt'), 'w') 92 | 93 | for i in list: 94 | name=total_xml[i][:-4]+'\n' 95 | if i in trainval: 96 | ftrainval.write(name) 97 | if i in train: 98 | ftrain.write(name) 99 | else: 100 | fval.write(name) 101 | else: 102 | ftest.write(name) 103 | 104 | ftrainval.close() 105 | ftrain.close() 106 | fval.close() 107 | ftest.close() 108 | print("Generate txt in ImageSets done.") 109 | 110 | if annotation_mode == 0 or 
annotation_mode == 2: 111 | print("Generate 2007_train.txt and 2007_val.txt for train.") 112 | type_index = 0 113 | for year, image_set in VOCdevkit_sets: 114 | image_ids = open(os.path.join(VOCdevkit_path, 'VOC%s/ImageSets/Main/%s.txt'%(year, image_set)), encoding='utf-8').read().strip().split() 115 | list_file = open('%s_%s.txt'%(year, image_set), 'w', encoding='utf-8') 116 | for image_id in image_ids: 117 | list_file.write('%s/VOC%s/JPEGImages/%s.jpg'%(os.path.abspath(VOCdevkit_path), year, image_id)) 118 | 119 | convert_annotation(year, image_id, list_file) 120 | list_file.write('\n') 121 | photo_nums[type_index] = len(image_ids) 122 | type_index += 1 123 | list_file.close() 124 | print("Generate 2007_train.txt and 2007_val.txt for train done.") 125 | 126 | def printTable(List1, List2): 127 | for i in range(len(List1[0])): 128 | print("|", end=' ') 129 | for j in range(len(List1)): 130 | print(List1[j][i].rjust(int(List2[j])), end=' ') 131 | print("|", end=' ') 132 | print() 133 | 134 | str_nums = [str(int(x)) for x in nums] 135 | tableData = [ 136 | classes, str_nums 137 | ] 138 | colWidths = [0]*len(tableData) 139 | len1 = 0 140 | for i in range(len(tableData)): 141 | for j in range(len(tableData[i])): 142 | if len(tableData[i][j]) > colWidths[i]: 143 | colWidths[i] = len(tableData[i][j]) 144 | printTable(tableData, colWidths) 145 | 146 | if photo_nums[0] <= 500: 147 | print("训练集数量小于500,属于较小的数据量,请注意设置较大的训练世代(Epoch)以满足足够的梯度下降次数(Step)。") 148 | 149 | if np.sum(nums) == 0: 150 | print("在数据集中并未获得任何目标,请注意修改classes_path对应自己的数据集,并且保证标签名字正确,否则训练将会没有任何效果!") 151 | print("在数据集中并未获得任何目标,请注意修改classes_path对应自己的数据集,并且保证标签名字正确,否则训练将会没有任何效果!") 152 | print("在数据集中并未获得任何目标,请注意修改classes_path对应自己的数据集,并且保证标签名字正确,否则训练将会没有任何效果!") 153 | print("(重要的事情说三遍)。") 154 | -------------------------------------------------------------------------------- /常见问题汇总.md: -------------------------------------------------------------------------------- 1 | 问题汇总的博客地址为[https://blog.csdn.net/weixin_44791964/article/details/107517428](https://blog.csdn.net/weixin_44791964/article/details/107517428)。 2 | 3 | # 问题汇总 4 | ## 1、下载问题 5 | ### a、代码下载 6 | **问:up主,可以给我发一份代码吗,代码在哪里下载啊? 7 | 答:Github上的地址就在视频简介里。复制一下就能进去下载了。** 8 | 9 | **问:up主,为什么我下载的代码提示压缩包损坏? 10 | 答:重新去Github下载。** 11 | 12 | **问:up主,为什么我下载的代码和你在视频以及博客上的代码不一样? 13 | 答:我常常会对代码进行更新,最终以实际的代码为准。** 14 | 15 | ### b、 权值下载 16 | **问:up主,为什么我下载的代码里面,model_data下面没有.pth或者.h5文件? 17 | 答:我一般会把权值上传到Github和百度网盘,在GITHUB的README里面就能找到。** 18 | 19 | ### c、 数据集下载 20 | **问:up主,XXXX数据集在哪里下载啊? 21 | 答:一般数据集的下载地址我会放在README里面,基本上都有,没有的话请及时联系我添加,直接发github的issue即可**。 22 | 23 | ## 2、环境配置问题 24 | ### a、现在库中所用的环境 25 | **pytorch代码对应的pytorch版本为1.2,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/106037141](https://blog.csdn.net/weixin_44791964/article/details/106037141)。 26 | 27 | **keras代码对应的tensorflow版本为1.13.2,keras版本是2.1.5,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/104702142](https://blog.csdn.net/weixin_44791964/article/details/104702142)。 28 | 29 | **tf2代码对应的tensorflow版本为2.2.0,无需安装keras,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/109161493](https://blog.csdn.net/weixin_44791964/article/details/109161493)。 30 | 31 | **问:你的代码某某某版本的tensorflow和pytorch能用嘛? 
32 | 答:最好按照我推荐的配置,配置教程也有!其它版本的我没有试过!可能出现问题但是一般问题不大。仅需要改少量代码即可。** 33 | 34 | ### b、30系列显卡环境配置 35 | 30系显卡由于框架更新不可使用上述环境配置教程。 36 | 当前我已经测试的可以用的30显卡配置如下: 37 | **pytorch代码对应的pytorch版本为1.7.0,cuda为11.0,cudnn为8.0.5**。 38 | 39 | **keras代码无法在win10下配置cuda11,在ubuntu下可以百度查询一下,配置tensorflow版本为1.15.4,keras版本是2.1.5或者2.3.1(少量函数接口不同,代码可能还需要少量调整。)** 40 | 41 | **tf2代码对应的tensorflow版本为2.4.0,cuda为11.0,cudnn为8.0.5**。 42 | 43 | ### c、GPU利用问题与环境使用问题 44 | **问:为什么我安装了tensorflow-gpu但是却没用利用GPU进行训练呢? 45 | 答:确认tensorflow-gpu已经装好,利用pip list查看tensorflow版本,然后查看任务管理器或者利用nvidia命令看看是否使用了gpu进行训练,任务管理器的话要看显存使用情况。** 46 | 47 | **问:up主,我好像没有在用gpu进行训练啊,怎么看是不是用了GPU进行训练? 48 | 答:查看是否使用GPU进行训练一般使用NVIDIA在命令行的查看命令,如果要看任务管理器的话,请看性能部分GPU的显存是否利用,或者查看任务管理器的Cuda,而非Copy。** 49 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/20201013234241524.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70#pic_center) 50 | 51 | **问:up主,为什么我按照你的环境配置后还是不能使用? 52 | 答:请把你的GPU、CUDA、CUDNN、TF版本以及PYTORCH版本B站私聊告诉我。** 53 | 54 | **问:出现如下错误** 55 | ```python 56 | Traceback (most recent call last): 57 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in 58 | from tensorflow.python.pywrap_tensorflow_internal import * 59 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 28, in 60 | pywrap_tensorflow_internal = swig_import_helper() 61 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper 62 | _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description) 63 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\imp.py", line 243, in load_modulereturn load_dynamic(name, filename, file) 64 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\imp.py", line 343, in load_dynamic 65 | return _load(spec) 66 | ImportError: DLL load failed: 找不到指定的模块。 67 | ``` 68 | **答:如果没重启过就重启一下,否则重新按照步骤安装,还无法解决则把你的GPU、CUDA、CUDNN、TF版本以及PYTORCH版本私聊告诉我。** 69 | 70 | ### d、no module问题 71 | **问:为什么提示说no module name utils.utils(no module name nets.yolo、no module name nets.ssd等一系列问题)啊? 72 | 答:utils并不需要用pip装,它就在我上传的仓库的根目录,出现这个问题的原因是根目录不对,查查相对目录和根目录的概念。查了基本上就明白了。** 73 | 74 | **问:为什么提示说no module name matplotlib(no module name PIL,no module name cv2等等)? 75 | 答:这个库没安装打开命令行安装就好。pip install matplotlib** 76 | 77 | **问:为什么我已经用pip装了opencv(pillow、matplotlib等),还是提示no module name cv2? 78 | 答:没有激活环境装,要激活对应的conda环境进行安装才可以正常使用** 79 | 80 | **问:为什么提示说No module named 'torch' ? 81 | 答:其实我也真的很想知道为什么会有这个问题……这个pytorch没装是什么情况?一般就俩情况,一个是真的没装,还有一个是装到其它环境了,当前激活的环境不是自己装的环境。** 82 | 83 | **问:为什么提示说No module named 'tensorflow' ? 84 | 答:同上。** 85 | 86 | ### e、cuda安装失败问题 87 | 一般cuda安装前需要安装Visual Studio,装个2017版本即可。 88 | 89 | ### f、Ubuntu系统问题 90 | **所有代码在Ubuntu下可以使用,我两个系统都试过。** 91 | 92 | ### g、VSCODE提示错误的问题 93 | **问:为什么在VSCODE里面提示一大堆的错误啊? 94 | 答:我也提示一大堆的错误,但是不影响,是VSCODE的问题,如果不想看错误的话就装Pycharm。** 95 | 96 | ### h、使用cpu进行训练与预测的问题 97 | **对于keras和tf2的代码而言,如果想用cpu进行训练和预测,直接装cpu版本的tensorflow就可以了。** 98 | 99 | **对于pytorch的代码而言,如果想用cpu进行训练和预测,需要将cuda=True修改成cuda=False。** 100 | 101 | ### i、tqdm没有pos参数问题 102 | **问:运行代码提示'tqdm' object has no attribute 'pos'。 103 | 答:重装tqdm,换个版本就可以了。** 104 | 105 | ### j、提示decode(“utf-8”)的问题 106 | **由于h5py库的更新,安装过程中会自动安装h5py=3.0.0以上的版本,会导致decode("utf-8")的错误! 
### i. tqdm "pos" problem
**Q: Running the code reports 'tqdm' object has no attribute 'pos'.
A: Reinstall tqdm with a different version.**

### j. decode("utf-8") problems
**Because of an update to the h5py library, installation automatically pulls in h5py 3.0.0 or newer, which causes decode("utf-8") errors!
Be sure to install h5py==2.10.0 after installing tensorflow:**
```
pip install h5py==2.10.0
```

### k. TypeError: __array__() takes 1 positional argument but 2 were given
This can be fixed by changing the pillow version:
```
pip install pillow==8.2.0
```

### l. Other problems
**Q: Why do I get TypeError: cat() got an unexpected keyword argument 'axis', Traceback (most recent call last), or AttributeError: 'Tensor' object has no attribute 'bool'?
A: These are version problems; use torch 1.2 or above.**
**Many other odd problems are also version problems; install Keras and tensorflow following my video tutorial. For instance, if you installed tensorflow 2, don't ask me why Keras-yolo won't run. It simply can't.**

## 3. Object detection FAQ (also applies to the face detection and classification repos)
### a. Shape mismatch problems
#### 1) Shape mismatch during training
**Q: Why does running train.py report a shape mismatch?
A: In the keras environment, because you are training a different number of classes than the original, the network structure changes, so a small mismatch at the very end of the network is expected.**

#### 2) Shape mismatch during prediction
**Q: Why does running predict.py report a shape mismatch?
In Pytorch it looks like this:**
![screenshot](https://img-blog.csdnimg.cn/20200722171631901.png)
In Keras it looks like this:
![screenshot](https://img-blog.csdnimg.cn/20200722171523380.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70)
**A: There are three main causes:
1. In ssd and FasterRCNN, num_classes in train.py may not have been changed.
2. model_path was not changed.
3. classes_path was not changed.
Check these carefully! Make sure the model_path and classes_path you use correspond to each other, and also check the num_classes or classes_path used during training!**

### b. Out-of-memory problems
**Q: Why does the console window flash by and report OOM when I run train.py?
A: That's keras running out of GPU memory; reduce batch_size. SSD has the lowest memory footprint and is recommended for small cards:
2 GB VRAM: SSD, YOLOV4-TINY
4 GB VRAM: YOLOV3
6 GB VRAM: YOLOV4, Retinanet, M2det, Efficientdet, Faster RCNN, etc.
8 GB+ VRAM: take your pick.**
**Note that because of BatchNorm2d, batch_size cannot be 1; it must be at least 2.**

**Q: Why do I get RuntimeError: CUDA out of memory. Tried to allocate 52.00 MiB (GPU 0; 15.90 GiB total capacity; 14.85 GiB already allocated; 51.88 MiB free; 15.07 GiB reserved in total by PyTorch)?
A: That's pytorch running out of GPU memory; same as above.**

**Q: Why does it run out of memory even though GPU utilization never went up?
A: Once it runs out of memory there is nothing left to utilize; the model never started training.**

### c. Training problems (freeze training, LOSS, training quality, etc.)
**Q: Why freeze training first and unfreeze later?
A: It's the idea of transfer learning: the features extracted by the backbone are generic, so freezing it for the first stage speeds up training and keeps the pretrained weights from being destroyed.**
In the freeze stage the backbone is frozen and the feature-extraction network does not change; memory usage is low, and only the rest of the network is fine-tuned.
In the unfreeze stage the backbone is unfrozen and the feature-extraction network changes; memory usage is higher, and every parameter of the network is updated.
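A minimal sketch of the freeze/unfreeze mechanics in PyTorch; the toy two-layer model and the backbone/head split are illustrative, not the repo's actual structure:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 2))  # toy backbone + head
backbone, head = model[0], model[1]

# Freeze stage: backbone weights stay fixed, only the head is fine-tuned.
for p in backbone.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)

# Unfreeze stage: all parameters train again (rebuild or update the optimizer).
for p in backbone.parameters():
    p.requires_grad = True
```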
**Q: Why doesn't my network converge? The LOSS is XXXX.
A: LOSS values differ between networks; LOSS is only a reference for whether the network is converging, not a measure of how good it is. My yolo code does not normalize the loss, so its values look high. The absolute value does not matter; what matters is whether it keeps decreasing and whether predictions work.**

**Q: Why are my training results poor? Prediction finds no boxes (or the boxes are inaccurate).
A:**

Consider the following:
1. Annotation problem: check whether 2007_train.txt actually contains object information; if not, fix voc_annotation.py.
2. Dataset problem: with fewer than 500 images, consider enlarging the dataset, and test several models to confirm the dataset itself is sound.
3. Unfreezing: if your data distribution differs greatly from everyday imagery, unfreeze and fine-tune the backbone to strengthen feature extraction.
4. Network problem: SSD, for example, is poorly suited to small objects because its prior boxes are fixed.
5. Training length: some students train only a few epochs and conclude it doesn't work; train to completion with the default settings.
6. Check that you followed the steps, e.g. whether classes in voc_annotation.py was modified.
7. LOSS values differ between networks; LOSS is only a reference for convergence, not quality. Its absolute value does not matter; what matters is whether it converges.

**Q: Why do I get a gbk codec error:**
```python
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 446: illegal multibyte sequence
```
**A: Do not use Chinese in labels and paths. If you must, handle the encoding explicitly: open files with encoding='utf-8'.**

**Q: My images have resolution xxx*xxx; can I use them?**
**A: Yes; the code resizes and augments them automatically.**

**Q: How do I train on multiple GPUs?
A: Most of the pytorch code can use the GPU directly; for keras, just search online, the implementation is not complicated. I don't have multiple cards and can't test it in detail, so that part is up to you.**
### d. Grayscale images
**Q: Can I train on (and predict) grayscale images?
A: Most of my repos convert grayscale images to RGB for training and prediction. If a repo cannot train or predict on grayscale images, try converting the result of Image.open to RGB inside get_random_data, and do the same at prediction time. (For reference only.)**

### e. Resuming training from a checkpoint
**Q: I have already trained for several epochs; can I continue from there?
A: Yes. Before training, load the previously trained weights exactly as you would load pretrained weights. Trained weights are saved in the logs folder; set model_path to the checkpoint you want to resume from.**

### f. Pretrained weights
**Q: If I want to train on a different dataset, what do I do about the pretrained weights?**
**A: Pretrained weights are transferable across datasets because the features they encode are generic. They are necessary in 99% of cases; without them the initial weights are too random, feature extraction is ineffective, and training results suffer.**

**Q: I modified the network; can I still use the pretrained weights?
A: If you modified the backbone and it is not an existing off-the-shelf network, the pretrained weights are basically unusable: either match tensors yourself by the shape of the convolution kernels, or pretrain from scratch. If you only modified the later half, the backbone's pretrained weights remain usable; in pytorch, adapt the weight-loading code to keep only shape-matching tensors; in keras, simply pass by_name=True, skip_mismatch=True.**
The weight-matching approach can look like this:
```python
import numpy as np
import torch

# Load pretrained weights, keeping only tensors whose shapes match the model.
# `model` and `model_path` come from the surrounding training script.
print('Loading weights into state dict...')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_dict = model.state_dict()
pretrained_dict = torch.load(model_path, map_location=device)
a = {}
for k, v in pretrained_dict.items():
    try:
        if np.shape(model_dict[k]) == np.shape(v):
            a[k] = v
    except KeyError:
        pass
model_dict.update(a)
model.load_state_dict(model_dict)
print('Finished!')
```

**Q: How do I train without pretrained weights?
A: Comment out the code that loads them.**

**Q: Why are my results so poor without pretrained weights?
A: Because randomly initialized weights extract poor features, training suffers; even voc07+12 versus coco+voc07+12 pretraining gives different results. Pretrained weights matter a great deal.**

### g. Video and webcam detection
**Q: How do I detect with a webcam?
A: Changing the parameters in predict.py enables webcam detection; there is also a video that explains the approach in detail.**

**Q: How do I detect on a video?
A: Same as above.**
### h. Training from scratch
**Q: How do I train the model from scratch?
A: With limited compute and limited tuning experience, training from scratch is pointless. With randomly initialized parameters the model's feature-extraction ability is very poor, and without strong tuning skills and compute the network will not converge properly.**
If you must start from scratch, note the following:
- Do not load pretrained weights.
- Do not use freeze training; comment out the code that freezes the model.

**Q: Why are my results so poor without pretrained weights?
A: Same as in section f above: randomly initialized weights extract poor features, so training suffers. Pretrained weights matter a great deal.**

### i. Saving results
**Q: How do I save the detected image?
A: Detection generally works with PIL's Image, so look up how to save a PIL Image; see the comments in predict.py for details.**

**Q: How do I save video output?
A: See the comments in predict.py.**

### j. Iterating over a folder
**Q: How do I iterate over all images in a folder?
A: Generally, use os.listdir to find all images in the folder, then detect each image following the flow in predict.py; see its comments for details.**

**Q: How do I iterate over all images in a folder and save the results?
A: For iteration, use os.listdir to find the images, then detect each one following the flow in predict.py. For saving, detection generally uses PIL's Image, so look up how a PIL Image is saved; if a repo uses cv2, look up how cv2 saves images. See the comments in predict.py, and the sketch below.**
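A minimal sketch of that loop; detect_image here is a stand-in for the real detector's method, and the folder names are illustrative:

```python
import os
from PIL import Image

def detect_image(image):
    # Stand-in for the real detector's detect_image(); returns the image as-is.
    return image

dir_origin_path = "img/"      # input folder (illustrative)
dir_save_path = "img_out/"    # output folder (illustrative)
os.makedirs(dir_save_path, exist_ok=True)

for img_name in os.listdir(dir_origin_path):
    if img_name.lower().endswith(('.bmp', '.jpeg', '.jpg', '.png')):
        image = Image.open(os.path.join(dir_origin_path, img_name))
        r_image = detect_image(image)
        r_image.save(os.path.join(dir_save_path, img_name))
```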
### k. Path problems (No such file or directory)
**Q: Why do I get an error like this:**
```python
FileNotFoundError: [Errno 2] No such file or directory
……………………………………
……………………………………
```
**A: Check the folder paths and whether the file actually exists; also check the file paths inside 2007_train.txt.**
A few important points about paths:
**Folder names must never contain spaces.
Mind the difference between relative and absolute paths.
Read up on how paths are resolved.**

**Almost all path problems come down to the working directory; study how relative paths are resolved!**
### l. Comparison with the original implementation
**Q: How does this code compare with the original? Can it match the original results?
A: Basically yes. I have tested everything on VOC data; I don't have a powerful GPU, so I cannot train or evaluate on COCO.**

**Q: Have you implemented all of yolov4's tricks? How big is the gap to the original?
A: Not all of them. YOLOV4 uses so many improvements that it is hard to implement and even list them all; I only included some that I found interesting and clearly effective. Even the authors' own source code does not use the SAM attention module mentioned in the paper. Not every trick brings a gain, and I cannot implement them all. As for the gap to the original: I don't have the compute to train on COCO, but students who have used the code report the gap is small.**

### m. FPS problems (detection speed)
**Q: What FPS can this reach? Can it reach XX FPS?
A: FPS depends on the machine's specs: better hardware, higher FPS.**

**Q: Why do I get only a dozen FPS testing yolov4 (or others) on a server?
A: Check that tensorflow-gpu or the GPU build of pytorch is correctly installed. If it is, use time.time() to measure which part of detect_image takes longest (it is not only the network; other steps such as drawing also take time).**

**Q: Why does the paper claim speed XX, but I don't see it here?
A: Check that tensorflow-gpu or the GPU build of pytorch is correctly installed. If it is, use time.time() to measure which part of detect_image takes longest (again, not only the network; drawing and other processing also take time). Some papers also use multi-batch prediction, which I have not implemented.**
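A minimal sketch of the time.time() approach; stage_to_measure is a stand-in for whichever part of detect_image you want to profile:

```python
import time

def stage_to_measure():
    # Stand-in for preprocessing, the network forward pass, or drawing.
    time.sleep(0.01)

n = 100
t0 = time.time()
for _ in range(n):
    stage_to_measure()
elapsed = time.time() - t0
print("per call: %.4f s, i.e. %.1f FPS" % (elapsed / n, n / elapsed))
```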
### n. Predicted image not displayed
**Q: Why doesn't your code display the image after prediction? It only lists the detected objects in the console.
A: Install an image viewer on your system.**

### o. Evaluation problems (mAP, PR curves, Recall, Precision for object detection)
**Q: How do I compute mAP?
A: Watch the mAP video; the procedure is the same everywhere.**

**Q: When computing mAP, what is MINOVERLAP in get_map.py? Is it IoU?
A: Yes, it is an IoU threshold: it measures how much a predicted box overlaps a ground-truth box; if the overlap exceeds MINOVERLAP, the prediction counts as correct.**

**Q: Why is self.confidence (self.score) in get_map.py set so low?
A: See the theory part of the mAP video: drawing the PR curve requires keeping all detections, including low-confidence ones.**

**Q: Can you explain how to plot PR curves and the like?
A: See the mAP video; the results include PR curves.**

**Q: How do I compute the Recall and Precision metrics?
A: Both are defined relative to a specific confidence threshold; they are produced as part of the mAP computation.**

### p. Training on COCO
**Q: How do I train object detection on the COCO dataset?
A: The txt files needed for COCO training can be generated following qqwweee's yolo3 repository; the format is identical.**

### q. Model optimization (model modification) problems
**Q: Do you have Focal Loss code for the YOLO series? Does it help?
A: Many people have tried; the gains are small and results sometimes get worse. YOLO has its own way of balancing positive and negative samples.**

**Q: I modified the network; can I still use the pretrained weights?
A: See section 3f above; the answer and the shape-matching snippet there apply unchanged.**

**Q: How do I modify the model? I want to publish a short paper!
A: Read about the differences between yolov3 and yolov4, then read the yolov4 paper; as a large-scale tuning exercise with many tricks it is very instructive. My advice is to study classic models, then extract and reuse their highlight structures.**

### r. Deployment
I have never deployed to phones or similar devices, so there are many deployment problems I simply don't know about...

## 4. Semantic segmentation FAQ
### a. Shape mismatch problems
#### 1) Shape mismatch during training
**Q: Why does running train.py report a shape mismatch?
A: In the keras environment, because you are training a different number of classes than the original, the network structure changes, so a small mismatch at the very end of the network is expected.**

#### 2) Shape mismatch during prediction
**Q: Why does running predict.py report a shape mismatch?
In Pytorch it looks like this:**
![screenshot](https://img-blog.csdnimg.cn/20200722171631901.png)
In Keras it looks like this:
![screenshot](https://img-blog.csdnimg.cn/20200722171523380.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70)
**A: There are two main causes:
1. num_classes in train.py was not changed.
2. num_classes at prediction time was not changed.
Check both carefully! The num_classes used for training and for prediction both need checking!**

### b. Out-of-memory problems
**Q: Why does the console window flash by and report OOM when I run train.py?
A: That's keras running out of GPU memory; reduce batch_size.**

**Note that because of BatchNorm2d, batch_size cannot be 1; it must be at least 2.**

**Q: Why do I get RuntimeError: CUDA out of memory. Tried to allocate 52.00 MiB (GPU 0; 15.90 GiB total capacity; 14.85 GiB already allocated; 51.88 MiB free; 15.07 GiB reserved in total by PyTorch)?
A: That's pytorch running out of GPU memory; same as above.**

**Q: Why does it run out of memory even though GPU utilization never went up?
A: Once it runs out of memory there is nothing left to utilize; the model never started training.**

### c. Training problems (freeze training, LOSS, training quality, etc.)
**Q: Why freeze training first and unfreeze later?
A: It's the idea of transfer learning: the features extracted by the backbone are generic, so freezing it for the first stage speeds up training and keeps the pretrained weights from being destroyed.**
**In the freeze stage the backbone is frozen and the feature-extraction network does not change; memory usage is low, and only the rest of the network is fine-tuned.**
**In the unfreeze stage the backbone is unfrozen and the feature-extraction network changes; memory usage is higher, and every parameter of the network is updated.**

**Q: Why doesn't my network converge? The LOSS is XXXX.
A: LOSS values differ between networks; LOSS is only a reference for whether the network is converging, not a measure of how good it is. My yolo code does not normalize the loss, so its values look high. The absolute value does not matter; what matters is whether it keeps decreasing and whether predictions work.**

**Q: Why are my training results poor? Prediction finds nothing; the result is all black.
A:**
**Consider the following:
1. Dataset problem: this is the most important one. With fewer than 500 images, consider enlarging the dataset, and always check the labels. The video explains the VOC format in detail, but having input images and output labels is not enough: every pixel value in a label must equal the index of its class. Many students' label format is wrong; the most common mistake is a black background with white objects, where object pixels have the value 255 and cannot be trained. Object pixels must have the value 1.
2. Unfreezing: if your data distribution differs greatly from everyday imagery, unfreeze and fine-tune the backbone to strengthen feature extraction.
3. Network problem: try different networks.
4. Training length: some students train only a few epochs and conclude it doesn't work; train to completion with the default settings.
5. Check that you followed the steps.
6. LOSS values differ between networks; LOSS is only a reference for convergence, not quality. Its absolute value does not matter; what matters is whether it converges.**

**Q: Why are my training results poor on small objects?
A: For deeplab and pspnet you can adjust downsample_factor; 16 downsamples too aggressively and works poorly, so try 8.**

**Q: Why do I get a gbk codec error:**
```python
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 446: illegal multibyte sequence
```
**A: Do not use Chinese in labels and paths. If you must, handle the encoding explicitly: open files with encoding='utf-8'.**

**Q: My images have resolution xxx*xxx; can I use them?**
**A: Yes; the code resizes and augments them automatically.**

**Q: How do I train on multiple GPUs?
A: Most of the pytorch code can use the GPU directly; for keras, just search online, the implementation is not complicated. I don't have multiple cards and can't test it in detail, so that part is up to you.**

### d. Grayscale images
**Q: Can I train on (and predict) grayscale images?
A: Most of my repos convert grayscale images to RGB for training and prediction. If a repo cannot train or predict on grayscale images, try converting the result of Image.open to RGB inside get_random_data, and do the same at prediction time. (For reference only.)**

### e. Resuming training from a checkpoint
**Q: I have already trained for several epochs; can I continue from there?
A: Yes. Before training, load the previously trained weights exactly as you would load pretrained weights. Trained weights are saved in the logs folder; set model_path to the checkpoint you want to resume from.**

### f. Pretrained weights

**Q: If I want to train on a different dataset, what do I do about the pretrained weights?**
**A: Pretrained weights are transferable across datasets because the features they encode are generic. They are necessary in 99% of cases; without them the initial weights are too random, feature extraction is ineffective, and training results suffer.**

**Q: I modified the network; can I still use the pretrained weights?
A: If you modified the backbone and it is not an existing off-the-shelf network, the pretrained weights are basically unusable: either match tensors yourself by the shape of the convolution kernels, or pretrain from scratch. If you only modified the later half, the backbone's pretrained weights remain usable; in pytorch, adapt the weight-loading code to keep only shape-matching tensors; in keras, simply pass by_name=True, skip_mismatch=True.**
The weight-matching approach can look like this:

```python
import numpy as np
import torch

# Load pretrained weights, keeping only tensors whose shapes match the model.
# `model` and `model_path` come from the surrounding training script.
print('Loading weights into state dict...')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model_dict = model.state_dict()
pretrained_dict = torch.load(model_path, map_location=device)
a = {}
for k, v in pretrained_dict.items():
    try:
        if np.shape(model_dict[k]) == np.shape(v):
            a[k] = v
    except KeyError:
        pass
model_dict.update(a)
model.load_state_dict(model_dict)
print('Finished!')
```

**Q: How do I train without pretrained weights?
A: Comment out the code that loads them.**

**Q: Why are my results so poor without pretrained weights?
A: Because randomly initialized weights extract poor features, training suffers. Pretrained weights matter a great deal.**

### g. Video and webcam detection
**Q: How do I detect with a webcam?
A: Changing the parameters in predict.py enables webcam detection; there is also a video that explains the approach in detail.**

**Q: How do I detect on a video?
A: Same as above.**

### h. Training from scratch
**Q: How do I train the model from scratch?
A: With limited compute and limited tuning experience, training from scratch is pointless. With randomly initialized parameters the model's feature-extraction ability is very poor, and without strong tuning skills and compute the network will not converge properly.**
If you must start from scratch, note the following:
- Do not load pretrained weights.
- Do not use freeze training; comment out the code that freezes the model.

**Q: Why are my results so poor without pretrained weights?
A: Same as in section f above: randomly initialized weights extract poor features, so training suffers. Pretrained weights matter a great deal.**

### i. Saving results
**Q: How do I save the detected image?
A: Detection generally works with PIL's Image, so look up how to save a PIL Image; see the comments in predict.py for details.**

**Q: How do I save video output?
A: See the comments in predict.py.**

### j. Iterating over a folder
**Q: How do I iterate over all images in a folder?
A: Generally, use os.listdir to find all images in the folder, then detect each image following the flow in predict.py; see its comments for details.**

**Q: How do I iterate over all images in a folder and save the results?
A: For iteration, use os.listdir to find the images, then detect each one following the flow in predict.py. For saving, detection generally uses PIL's Image, so look up how a PIL Image is saved; if a repo uses cv2, look up how cv2 saves images. See the comments in predict.py, and the sketch in section 3j above.**

### k. Path problems (No such file or directory)
**Q: Why do I get an error like this:**
```python
FileNotFoundError: [Errno 2] No such file or directory
……………………………………
……………………………………
```

**A: Check the folder paths and whether the file actually exists; also check the file paths inside 2007_train.txt.**
A few important points about paths:
**Folder names must never contain spaces.
Mind the difference between relative and absolute paths.
Read up on how paths are resolved.**

**Almost all path problems come down to the working directory; study how relative paths are resolved!**

### l. FPS problems (detection speed)
**Q: What FPS can this reach? Can it reach XX FPS?
A: FPS depends on the machine's specs: better hardware, higher FPS.**

**Q: Why does the paper claim speed XX, but I don't see it here?
A: Check that tensorflow-gpu or the GPU build of pytorch is correctly installed. If it is, use time.time() to measure which part of detect_image takes longest (not only the network; drawing and other processing also take time). Some papers also use multi-batch prediction, which I have not implemented.**

### m. Predicted image not displayed
**Q: Why doesn't your code display the image after prediction? It only lists the detected objects in the console.
A: Install an image viewer on your system.**

### n. Evaluation problems (miou)
**Q: How do I compute miou?
A: See the miou measurement part of the video.**

**Q: How do I compute the Recall and Precision metrics?
A: The current code cannot produce them; you need to understand the confusion matrix and compute them yourself, for example as sketched below.**
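A minimal numpy sketch of per-class Precision and Recall from a confusion matrix; the 3x3 matrix is toy data:

```python
import numpy as np

# Rows: ground-truth class, columns: predicted class (toy numbers).
cm = np.array([[50,  2,  1],
               [ 3, 45,  4],
               [ 0,  5, 40]])

tp = np.diag(cm).astype(float)
precision = tp / np.maximum(cm.sum(axis=0), 1)  # TP / everything predicted as the class
recall = tp / np.maximum(cm.sum(axis=1), 1)     # TP / all ground truth of the class
print("precision:", precision)
print("recall:", recall)
```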
### o. Model optimization (model modification) problems
**Q: I modified the network; can I still use the pretrained weights?
A: See section 4f above; the answer and the shape-matching snippet there apply unchanged.**

**Q: How do I modify the model? I want to publish a short paper!
A: Read the yolov4 paper from the object-detection world; as a large-scale tuning exercise with many tricks it is very instructive. My advice is to study classic models, then extract and reuse their highlight structures. Common tricks such as attention mechanisms are worth trying.**

### p. Deployment
I have never deployed to phones or similar devices, so there are many deployment problems I simply don't know about...

## 5. Chat group
**Q: Is there a QQ group or something similar?
A: No; I don't have the time to manage a QQ group...**

## 6. How to learn
**Q: What does your learning path look like? I'm a complete beginner; how should I learn?
A: A few caveats first:
1. I am no expert; there is a lot I don't know, and my path will not suit everyone.
2. My lab does not do deep learning, so I taught myself most of this by trial and error, and I cannot vouch that it is the right way.
3. Personally, I think learning comes down mostly to self-study.**
As for the path itself: I started with Mofan (莫烦)'s Python tutorials and got into tensorflow, keras, and pytorch; after that I learned SSD and YOLO, then studied many classic convolutional networks, and then started reading lots of different codebases. My method is to read code line by line, understanding the whole execution flow and how the feature-map shapes change. It took a great deal of time and there is no shortcut; you simply have to put in the hours.
--------------------------------------------------------------------------------