├── .gitignore ├── LICENSE ├── README.md ├── VOCdevkit └── VOC2007 │ ├── Annotations │ └── README.md │ ├── ImageSets │ └── Main │ │ └── README.md │ └── JPEGImages │ └── README.md ├── convert_trt.py ├── get_map.py ├── hrsc_annotation.py ├── img └── test.jpg ├── kmeans_for_anchors.py ├── logs └── README.md ├── model_data ├── coco_classes.txt ├── simhei.ttf ├── uav_classes.txt ├── voc_classes.txt └── yolo_anchors.txt ├── nets ├── __init__.py ├── backbone.py ├── yolo.py └── yolo_training.py ├── predict.py ├── requirements.txt ├── summary.py ├── train.py ├── utils ├── __init__.py ├── callbacks.py ├── dataloader.py ├── kld_loss.py ├── nms_rotated │ ├── __init__.py │ ├── nms_rotated_wrapper.py │ ├── setup.py │ └── src │ │ ├── box_iou_rotated_utils.h │ │ ├── nms_rotated_cpu.cpp │ │ ├── nms_rotated_cuda.cu │ │ ├── nms_rotated_ext.cpp │ │ ├── poly_nms_cpu.cpp │ │ └── poly_nms_cuda.cu ├── utils.py ├── utils_bbox.py ├── utils_fit.py ├── utils_map.py └── utils_rbox.py ├── utils_coco ├── coco_annotation.py └── get_map_coco.py ├── voc_annotation.py ├── yolo.py └── 常见问题汇总.md /.gitignore: -------------------------------------------------------------------------------- 1 | # ignore map, miou, datasets 2 | *.mp4 3 | map_out/ 4 | miou_out/ 5 | VOCdevkit/ 6 | datasets/ 7 | Medical_Datasets/ 8 | lfw/ 9 | logs/ 10 | model_data/ 11 | .temp_map_out/ 12 | 13 | # Byte-compiled / optimized / DLL files 14 | __pycache__/ 15 | *.py[cod] 16 | *$py.class 17 | 18 | # C extensions 19 | *.so 20 | 21 | # Distribution / packaging 22 | .Python 23 | build/ 24 | develop-eggs/ 25 | dist/ 26 | downloads/ 27 | eggs/ 28 | .eggs/ 29 | lib/ 30 | lib64/ 31 | parts/ 32 | sdist/ 33 | var/ 34 | wheels/ 35 | pip-wheel-metadata/ 36 | share/python-wheels/ 37 | *.egg-info/ 38 | .installed.cfg 39 | *.egg 40 | MANIFEST 41 | 42 | # PyInstaller 43 | # Usually these files are written by a python script from a template 44 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 45 | *.manifest 46 | *.spec 47 | 48 | # Installer logs 49 | pip-log.txt 50 | pip-delete-this-directory.txt 51 | 52 | # Unit test / coverage reports 53 | htmlcov/ 54 | .tox/ 55 | .nox/ 56 | .coverage 57 | .coverage.* 58 | .cache 59 | nosetests.xml 60 | coverage.xml 61 | *.cover 62 | *.py,cover 63 | .hypothesis/ 64 | .pytest_cache/ 65 | 66 | # Translations 67 | *.mo 68 | *.pot 69 | 70 | # Django stuff: 71 | *.log 72 | local_settings.py 73 | db.sqlite3 74 | db.sqlite3-journal 75 | 76 | # Flask stuff: 77 | instance/ 78 | .webassets-cache 79 | 80 | # Scrapy stuff: 81 | .scrapy 82 | 83 | # Sphinx documentation 84 | docs/_build/ 85 | 86 | # PyBuilder 87 | target/ 88 | 89 | # Jupyter Notebook 90 | .ipynb_checkpoints 91 | 92 | # IPython 93 | profile_default/ 94 | ipython_config.py 95 | 96 | # pyenv 97 | .python-version 98 | 99 | # pipenv 100 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 101 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 102 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 103 | # install all needed dependencies. 104 | #Pipfile.lock 105 | 106 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow 107 | __pypackages__/ 108 | 109 | # Celery stuff 110 | celerybeat-schedule 111 | celerybeat.pid 112 | 113 | # SageMath parsed files 114 | *.sage.py 115 | 116 | # Environments 117 | .env 118 | .venv 119 | env/ 120 | venv/ 121 | ENV/ 122 | env.bak/ 123 | venv.bak/ 124 | 125 | # Spyder project settings 126 | .spyderproject 127 | .spyproject 128 | 129 | # Rope project settings 130 | .ropeproject 131 | 132 | # mkdocs documentation 133 | /site 134 | 135 | # mypy 136 | .mypy_cache/ 137 | .dmypy.json 138 | dmypy.json 139 | 140 | # Pyre type checker 141 | .pyre/ 142 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## YOLOV7-Tiny-OBB:You Only Look Once OBB旋转目标检测模型在pytorch当中的实现 2 | --- 3 | 4 | ## 目录 5 | 1. [仓库更新 Top News](#仓库更新) 6 | 2. [相关仓库 Related code](#相关仓库) 7 | 3. [性能情况 Performance](#性能情况) 8 | 4. [所需环境 Environment](#所需环境) 9 | 5. [文件下载 Download](#文件下载) 10 | 6. [训练步骤 How2train](#训练步骤) 11 | 7. [预测步骤 How2predict](#预测步骤) 12 | 8. [评估步骤 How2eval](#评估步骤) 13 | 9. [参考资料 Reference](#Reference) 14 | 15 | ## Top News 16 | **`2023-02`**:**仓库创建,支持step、cos学习率下降法、支持adam、sgd优化器选择、支持学习率根据batch_size自适应调整、新增图片裁剪、支持多GPU训练、支持各个种类目标数量计算、支持heatmap、支持EMA。** 17 | 18 | ## 相关仓库 19 | | 目标检测模型 | 路径 | 20 | | :----- | :----- | 21 | YoloV7-OBB | https://github.com/Egrt/yolov7-obb 22 | YoloV7-Tiny-OBB | https://github.com/Egrt/yolov7-tiny-obb 23 | 24 | ## 性能情况 25 | | 训练数据集 | 权值文件名称 | 测试数据集 | 输入图片大小 | mAP 0.5 | fps | 26 | | :-----: | :------: | :------: | :------: | :------: | :------: | 27 | | UAV-ROD | [yolov7_tiny_obb_uav](https://github.com/Egrt/yolov7-tiny-obb/releases/download/V1.0.0/yolov7_tiny_obb_uav.pth) | UAV-ROD-Val | 640x640 | 98.00% | 50 | 28 | | UAV-ROD | [yolov7_tiny_trt](https://github.com/Egrt/yolov7-tiny-obb/releases/download/V1.0.0/yolov7_tiny_trt.pth) | UAV-ROD-Val | 640x640 | 97.75% | 120 | 29 | ### 预测结果展示 30 | ![预测结果](img/test.jpg) 31 | ## 所需环境 32 | cuda==11.3 33 | torch==1.10.1 34 | torchvision==0.11.2 35 | 为了使用amp混合精度,推荐使用torch1.7.1以上的版本。 36 | 37 | ## 文件下载 38 | 39 | UAV-ROD数据集下载地址如下,里面已经包括了训练集、测试集、验证集(与测试集一样),无需再次划分: 40 | 链接: https://pan.baidu.com/s/1Ae8AGb2L6zCjCwJFzs2WfA 41 | 提取码: ybec 42 | 43 | ## 训练步骤 44 | ### a、训练VOC07+12数据集 45 | 1. 数据集的准备 46 | **本文使用VOC格式进行训练,训练前需要下载好VOC07+12的数据集,解压后放在根目录** 47 | 48 | 2. 数据集的处理 49 | 修改voc_annotation.py里面的annotation_mode=2,运行voc_annotation.py生成根目录下的2007_train.txt和2007_val.txt。 50 | 生成的数据集格式为image_path, x1, y1, x2, y2, x3, y3, x4, y4(polygon), class。 51 | 52 | 3. 开始网络训练 53 | train.py的默认参数用于训练VOC数据集,直接运行train.py即可开始训练。 54 | 55 | 4. 训练结果预测 56 | 训练结果预测需要用到两个文件,分别是yolo.py和predict.py。我们首先需要去yolo.py里面修改model_path以及classes_path,这两个参数必须要修改。 57 | **model_path指向训练好的权值文件,在logs文件夹里。 58 | classes_path指向检测类别所对应的txt。** 59 | 完成修改后就可以运行predict.py进行检测了。运行后输入图片路径即可检测。 60 | 61 | ### b、训练自己的数据集 62 | 1. 数据集的准备 63 | **本文使用VOC格式进行训练,训练前需要自己制作好数据集,** 64 | 训练前将标签文件放在VOCdevkit文件夹下的VOC2007文件夹下的Annotation中。 65 | 训练前将图片文件放在VOCdevkit文件夹下的VOC2007文件夹下的JPEGImages中。 66 | 67 | 2. 数据集的处理 68 | 在完成数据集的摆放之后,我们需要利用voc_annotation.py获得训练用的2007_train.txt和2007_val.txt。 69 | 修改voc_annotation.py里面的参数。第一次训练可以仅修改classes_path,classes_path用于指向检测类别所对应的txt。 70 | 训练自己的数据集时,可以自己建立一个cls_classes.txt,里面写自己所需要区分的类别。 71 | model_data/cls_classes.txt文件内容为: 72 | ```python 73 | cat 74 | dog 75 | ... 76 | ``` 77 | 修改voc_annotation.py中的classes_path,使其对应cls_classes.txt,并运行voc_annotation.py。 78 | 79 | 3. 
开始网络训练 80 | **训练的参数较多,均在train.py中,大家可以在下载库后仔细看注释,其中最重要的部分依然是train.py里的classes_path。** 81 | **classes_path用于指向检测类别所对应的txt,这个txt和voc_annotation.py里面的txt一样!训练自己的数据集必须要修改!** 82 | 修改完classes_path后就可以运行train.py开始训练了,在训练多个epoch后,权值会生成在logs文件夹中。 83 | 84 | 4. 训练结果预测 85 | 训练结果预测需要用到两个文件,分别是yolo.py和predict.py。在yolo.py里面修改model_path以及classes_path。 86 | **model_path指向训练好的权值文件,在logs文件夹里。 87 | classes_path指向检测类别所对应的txt。** 88 | 完成修改后就可以运行predict.py进行检测了。运行后输入图片路径即可检测。 89 | 90 | ## 预测步骤 91 | ### a、使用预训练权重 92 | 1. 下载完库后解压,在百度网盘下载权值,放入model_data,运行predict.py,输入 93 | ```python 94 | img/street.jpg 95 | ``` 96 | 2. 在predict.py里面进行设置可以进行fps测试和video视频检测。 97 | ### b、使用自己训练的权重 98 | 1. 按照训练步骤训练。 99 | 2. 在yolo.py文件里面,在如下部分修改model_path和classes_path使其对应训练好的文件;**model_path对应logs文件夹下面的权值文件,classes_path是model_path对应分的类**。 100 | ```python 101 | _defaults = { 102 | #--------------------------------------------------------------------------# 103 | # 使用自己训练好的模型进行预测一定要修改model_path和classes_path! 104 | # model_path指向logs文件夹下的权值文件,classes_path指向model_data下的txt 105 | # 106 | # 训练好后logs文件夹下存在多个权值文件,选择验证集损失较低的即可。 107 | # 验证集损失较低不代表mAP较高,仅代表该权值在验证集上泛化性能较好。 108 | # 如果出现shape不匹配,同时要注意训练时的model_path和classes_path参数的修改 109 | #--------------------------------------------------------------------------# 110 | "model_path" : 'model_data/yolov7_weights.pth', 111 | "classes_path" : 'model_data/coco_classes.txt', 112 | #---------------------------------------------------------------------# 113 | # anchors_path代表先验框对应的txt文件,一般不修改。 114 | # anchors_mask用于帮助代码找到对应的先验框,一般不修改。 115 | #---------------------------------------------------------------------# 116 | "anchors_path" : 'model_data/yolo_anchors.txt', 117 | "anchors_mask" : [[6, 7, 8], [3, 4, 5], [0, 1, 2]], 118 | #---------------------------------------------------------------------# 119 | # 输入图片的大小,必须为32的倍数。 120 | #---------------------------------------------------------------------# 121 | "input_shape" : [640, 640], 122 | #---------------------------------------------------------------------# 123 | # 只有得分大于置信度的预测框会被保留下来 124 | #---------------------------------------------------------------------# 125 | "confidence" : 0.5, 126 | #---------------------------------------------------------------------# 127 | # 非极大抑制所用到的nms_iou大小 128 | #---------------------------------------------------------------------# 129 | "nms_iou" : 0.3, 130 | #---------------------------------------------------------------------# 131 | # 该变量用于控制是否使用letterbox_image对输入图像进行不失真的resize, 132 | # 在多次测试后,发现关闭letterbox_image直接resize的效果更好 133 | #---------------------------------------------------------------------# 134 | "letterbox_image" : True, 135 | #-------------------------------# 136 | # 是否使用Cuda 137 | # 没有GPU可以设置成False 138 | #-------------------------------# 139 | "cuda" : True, 140 | } 141 | ``` 142 | 3. 运行predict.py,输入 143 | ```python 144 | img/street.jpg 145 | ``` 146 | 4. 在predict.py里面进行设置可以进行fps测试和video视频检测。 147 | 148 | ## 评估步骤 149 | ### a、评估VOC07+12的测试集 150 | 1. 本文使用VOC格式进行评估。VOC07+12已经划分好了测试集,无需利用voc_annotation.py生成ImageSets文件夹下的txt。 151 | 2. 在yolo.py里面修改model_path以及classes_path。**model_path指向训练好的权值文件,在logs文件夹里。classes_path指向检测类别所对应的txt。** 152 | 3. 运行get_map.py即可获得评估结果,评估结果会保存在map_out文件夹中。 153 | 154 | ### b、评估自己的数据集 155 | 1. 本文使用VOC格式进行评估。 156 | 2. 如果在训练前已经运行过voc_annotation.py文件,代码会自动将数据集划分成训练集、验证集和测试集。如果想要修改测试集的比例,可以修改voc_annotation.py文件下的trainval_percent。trainval_percent用于指定(训练集+验证集)与测试集的比例,默认情况下 (训练集+验证集):测试集 = 9:1。train_percent用于指定(训练集+验证集)中训练集与验证集的比例,默认情况下 训练集:验证集 = 9:1。 157 | 3. 
利用voc_annotation.py划分测试集后,前往get_map.py文件修改classes_path,classes_path用于指向检测类别所对应的txt,这个txt和训练时的txt一样。评估自己的数据集必须要修改。 158 | 4. 在yolo.py里面修改model_path以及classes_path。**model_path指向训练好的权值文件,在logs文件夹里。classes_path指向检测类别所对应的txt。** 159 | 5. 运行get_map.py即可获得评估结果,评估结果会保存在map_out文件夹中。 160 | 161 | ## Reference 162 | https://github.com/WongKinYiu/yolov7 163 | 164 | https://github.com/bubbliiiing/yolov7-tiny-pytorch 165 | -------------------------------------------------------------------------------- /VOCdevkit/VOC2007/Annotations/README.md: -------------------------------------------------------------------------------- 1 | 存放标签文件 -------------------------------------------------------------------------------- /VOCdevkit/VOC2007/ImageSets/Main/README.md: -------------------------------------------------------------------------------- 1 | 存放训练索引文件 -------------------------------------------------------------------------------- /VOCdevkit/VOC2007/JPEGImages/README.md: -------------------------------------------------------------------------------- 1 | 存放图片文件 -------------------------------------------------------------------------------- /convert_trt.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Description: 3 | Author: Egrt 4 | Date: 2023-02-18 16:44:58 5 | LastEditors: Egrt 6 | LastEditTime: 2023-02-18 17:25:24 7 | ''' 8 | import torch 9 | from nets.yolo import YoloBody 10 | from utils.utils import (cvtColor, get_anchors, get_classes, preprocess_input, 11 | resize_image, show_config) 12 | from torch2trt import torch2trt 13 | 14 | class YOLO(object): 15 | _defaults = { 16 | #--------------------------------------------------------------------------# 17 | # 使用自己训练好的模型进行预测一定要修改model_path和classes_path! 18 | # model_path指向logs文件夹下的权值文件,classes_path指向model_data下的txt 19 | # 20 | # 训练好后logs文件夹下存在多个权值文件,选择验证集损失较低的即可。 21 | # 验证集损失较低不代表mAP较高,仅代表该权值在验证集上泛化性能较好。 22 | # 如果出现shape不匹配,同时要注意训练时的model_path和classes_path参数的修改 23 | #--------------------------------------------------------------------------# 24 | "model_path" : 'model_data/yolov7_tiny_obb_uav.pth', 25 | "classes_path" : 'model_data/uav_classes.txt', 26 | #---------------------------------------------------------------------# 27 | # anchors_path代表先验框对应的txt文件,一般不修改。 28 | # anchors_mask用于帮助代码找到对应的先验框,一般不修改。 29 | #---------------------------------------------------------------------# 30 | "anchors_path" : 'model_data/yolo_anchors.txt', 31 | "anchors_mask" : [[6, 7, 8], [3, 4, 5], [0, 1, 2]], 32 | #---------------------------------------------------------------------# 33 | # 输入图片的大小,必须为32的倍数。 34 | #---------------------------------------------------------------------# 35 | "input_shape" : [640, 640], 36 | #---------------------------------------------------------------------# 37 | # 只有得分大于置信度的预测框会被保留下来 38 | #---------------------------------------------------------------------# 39 | "confidence" : 0.5, 40 | #---------------------------------------------------------------------# 41 | # 非极大抑制所用到的nms_iou大小 42 | #---------------------------------------------------------------------# 43 | "nms_iou" : 0.3, 44 | #---------------------------------------------------------------------# 45 | # 该变量用于控制是否使用letterbox_image对输入图像进行不失真的resize, 46 | # 在多次测试后,发现关闭letterbox_image直接resize的效果更好 47 | #---------------------------------------------------------------------# 48 | "letterbox_image" : True, 49 | #-------------------------------# 50 | # 是否使用Cuda 51 | # 没有GPU可以设置成False 52 | #-------------------------------# 53 | "cuda" : True, 
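        # Note (added): every default above can be overridden per instance through keyword
        # arguments -- __init__ below copies any kwargs onto the object (for example
        # YOLO(confidence=0.001, nms_iou=0.5)), so editing this dict in place is optional.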
54 | } 55 | 56 | @classmethod 57 | def get_defaults(cls, n): 58 | if n in cls._defaults: 59 | return cls._defaults[n] 60 | else: 61 | return "Unrecognized attribute name '" + n + "'" 62 | 63 | #---------------------------------------------------# 64 | # 初始化YOLO 65 | #---------------------------------------------------# 66 | def __init__(self, **kwargs): 67 | self.__dict__.update(self._defaults) 68 | for name, value in kwargs.items(): 69 | setattr(self, name, value) 70 | self._defaults[name] = value 71 | 72 | #---------------------------------------------------# 73 | # 获得种类和先验框的数量 74 | #---------------------------------------------------# 75 | self.class_names, self.num_classes = get_classes(self.classes_path) 76 | self.anchors, self.num_anchors = get_anchors(self.anchors_path) 77 | self.generate() 78 | show_config(**self._defaults) 79 | 80 | #---------------------------------------------------# 81 | # 生成模型 82 | #---------------------------------------------------# 83 | def generate(self, onnx=False): 84 | #---------------------------------------------------# 85 | # 建立yolo模型,载入yolo模型的权重 86 | #---------------------------------------------------# 87 | self.net = YoloBody(self.anchors_mask, self.num_classes) 88 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') 89 | self.net.load_state_dict(torch.load(self.model_path, map_location=device)) 90 | self.net = self.net.fuse().eval() 91 | print('{} model, and classes loaded.'.format(self.model_path)) 92 | if self.cuda: 93 | self.net = self.net.cuda() 94 | 95 | # create some regular pytorch model... 96 | model = YOLO().net 97 | 98 | # create example data 99 | x = torch.ones((1, 3, 640, 640)).cuda() 100 | 101 | # convert to TensorRT feeding sample data as input 102 | model_trt = torch2trt(model, [x], fp16_mode=True) 103 | 104 | y = model(x) 105 | y_trt = model_trt(x) 106 | 107 | # check the output against PyTorch 108 | # print(torch.max(torch.abs(y - y_trt))) 109 | 110 | # save the tensorrt model 111 | torch.save(model_trt.state_dict(), 'model_data/yolov7_tiny_trt.pth') 112 | 113 | -------------------------------------------------------------------------------- /get_map.py: -------------------------------------------------------------------------------- 1 | import os 2 | import xml.etree.ElementTree as ET 3 | import cv2 4 | from PIL import Image 5 | from tqdm import tqdm 6 | import numpy as np 7 | from utils.utils import get_classes 8 | from utils.utils_map import get_coco_map, get_map 9 | from utils.utils_rbox import * 10 | from yolo import YOLO 11 | 12 | if __name__ == "__main__": 13 | ''' 14 | Recall和Precision不像AP是一个面积的概念,因此在门限值(Confidence)不同时,网络的Recall和Precision值是不同的。 15 | 默认情况下,本代码计算的Recall和Precision代表的是当门限值(Confidence)为0.5时,所对应的Recall和Precision值。 16 | 17 | 受到mAP计算原理的限制,网络在计算mAP时需要获得近乎所有的预测框,这样才可以计算不同门限条件下的Recall和Precision值 18 | 因此,本代码获得的map_out/detection-results/里面的txt的框的数量一般会比直接predict多一些,目的是列出所有可能的预测框, 19 | ''' 20 | #------------------------------------------------------------------------------------------------------------------# 21 | # map_mode用于指定该文件运行时计算的内容 22 | # map_mode为0代表整个map计算流程,包括获得预测结果、获得真实框、计算VOC_map。 23 | # map_mode为1代表仅仅获得预测结果。 24 | # map_mode为2代表仅仅获得真实框。 25 | # map_mode为3代表仅仅计算VOC_map。 26 | # map_mode为4代表利用COCO工具箱计算当前数据集的0.50:0.95map。需要获得预测结果、获得真实框后并安装pycocotools才行 27 | #-------------------------------------------------------------------------------------------------------------------# 28 | map_mode = 0 29 | #--------------------------------------------------------------------------------------# 30 | # 
此处的classes_path用于指定需要测量VOC_map的类别 31 | # 一般情况下与训练和预测所用的classes_path一致即可 32 | #--------------------------------------------------------------------------------------# 33 | classes_path = 'model_data/uav_classes.txt' 34 | #--------------------------------------------------------------------------------------# 35 | # MINOVERLAP用于指定想要获得的mAP0.x,mAP0.x的意义是什么请同学们百度一下。 36 | # 比如计算mAP0.75,可以设定MINOVERLAP = 0.75。 37 | # 38 | # 当某一预测框与真实框重合度大于MINOVERLAP时,该预测框被认为是正样本,否则为负样本。 39 | # 因此MINOVERLAP的值越大,预测框要预测的越准确才能被认为是正样本,此时算出来的mAP值越低, 40 | #--------------------------------------------------------------------------------------# 41 | MINOVERLAP = 0.5 42 | #--------------------------------------------------------------------------------------# 43 | # 受到mAP计算原理的限制,网络在计算mAP时需要获得近乎所有的预测框,这样才可以计算mAP 44 | # 因此,confidence的值应当设置的尽量小进而获得全部可能的预测框。 45 | # 46 | # 该值一般不调整。因为计算mAP需要获得近乎所有的预测框,此处的confidence不能随便更改。 47 | # 想要获得不同门限值下的Recall和Precision值,请修改下方的score_threhold。 48 | #--------------------------------------------------------------------------------------# 49 | confidence = 0.001 50 | #--------------------------------------------------------------------------------------# 51 | # 预测时使用到的非极大抑制值的大小,越大表示非极大抑制越不严格。 52 | # 53 | # 该值一般不调整。 54 | #--------------------------------------------------------------------------------------# 55 | nms_iou = 0.5 56 | #---------------------------------------------------------------------------------------------------------------# 57 | # Recall和Precision不像AP是一个面积的概念,因此在门限值不同时,网络的Recall和Precision值是不同的。 58 | # 59 | # 默认情况下,本代码计算的Recall和Precision代表的是当门限值为0.5(此处定义为score_threhold)时所对应的Recall和Precision值。 60 | # 因为计算mAP需要获得近乎所有的预测框,上面定义的confidence不能随便更改。 61 | # 这里专门定义一个score_threhold用于代表门限值,进而在计算mAP时找到门限值对应的Recall和Precision值。 62 | #---------------------------------------------------------------------------------------------------------------# 63 | score_threhold = 0.5 64 | #-------------------------------------------------------# 65 | # map_vis用于指定是否开启VOC_map计算的可视化 66 | #-------------------------------------------------------# 67 | map_vis = False 68 | #-------------------------------------------------------# 69 | # 指向VOC数据集所在的文件夹 70 | # 默认指向根目录下的VOC数据集 71 | #-------------------------------------------------------# 72 | VOCdevkit_path = 'VOCdevkit' 73 | #-------------------------------------------------------# 74 | # 结果输出的文件夹,默认为map_out 75 | #-------------------------------------------------------# 76 | map_out_path = 'map_out' 77 | 78 | image_ids = open(os.path.join(VOCdevkit_path, "VOC2007/ImageSets/Main/test.txt")).read().strip().split() 79 | 80 | if not os.path.exists(map_out_path): 81 | os.makedirs(map_out_path) 82 | if not os.path.exists(os.path.join(map_out_path, 'ground-truth')): 83 | os.makedirs(os.path.join(map_out_path, 'ground-truth')) 84 | if not os.path.exists(os.path.join(map_out_path, 'detection-results')): 85 | os.makedirs(os.path.join(map_out_path, 'detection-results')) 86 | if not os.path.exists(os.path.join(map_out_path, 'images-optional')): 87 | os.makedirs(os.path.join(map_out_path, 'images-optional')) 88 | 89 | class_names, _ = get_classes(classes_path) 90 | 91 | if map_mode == 0 or map_mode == 1: 92 | print("Load model.") 93 | yolo = YOLO(confidence = confidence, nms_iou = nms_iou) 94 | print("Load model done.") 95 | 96 | print("Get predict result.") 97 | for image_id in tqdm(image_ids): 98 | image_path = os.path.join(VOCdevkit_path, "VOC2007/JPEGImages/"+image_id+".jpg") 99 | image = Image.open(image_path) 100 | if map_vis: 101 | image.save(os.path.join(map_out_path, 
"images-optional/" + image_id + ".jpg")) 102 | yolo.get_map_txt(image_id, image, class_names, map_out_path) 103 | print("Get predict result done.") 104 | 105 | if map_mode == 0 or map_mode == 2: 106 | print("Get ground truth result.") 107 | for image_id in tqdm(image_ids): 108 | with open(os.path.join(map_out_path, "ground-truth/"+image_id+".txt"), "w") as new_f: 109 | root = ET.parse(os.path.join(VOCdevkit_path, "VOC2007/Annotations/"+image_id+".xml")).getroot() 110 | for obj in root.findall('object'): 111 | difficult_flag = False 112 | if obj.find('difficult')!=None: 113 | difficult = obj.find('difficult').text 114 | if int(difficult)==1: 115 | difficult_flag = True 116 | obj_name = obj.find('name').text 117 | if obj_name not in class_names: 118 | continue 119 | bndbox = obj.find('robndbox') 120 | cx = float(bndbox.find('cx').text) 121 | cy = float(bndbox.find('cy').text) 122 | h = float(bndbox.find('h').text) 123 | w = float(bndbox.find('w').text) 124 | theta = float(bndbox.find('angle').text) 125 | rbox = np.array([[cx, cy, w, h, theta]], dtype=np.float32) 126 | poly = rbox2poly(rbox)[0] 127 | poly = np.float32(poly.reshape(4, 2)) 128 | (x, y), (w, h), angle = cv2.minAreaRect(poly) # θ ∈ [0, 90] 129 | if difficult_flag: 130 | new_f.write("%s %s %s %s %s %s difficult\n" % (obj_name, int(x), int(y), int(w), int(h), angle)) 131 | else: 132 | new_f.write("%s %s %s %s %s %s\n" % (obj_name, int(x), int(y), int(w), int(h), angle)) 133 | print("Get ground truth result done.") 134 | 135 | if map_mode == 0 or map_mode == 3: 136 | print("Get map.") 137 | get_map(MINOVERLAP, True, score_threhold = score_threhold, path = map_out_path) 138 | print("Get map done.") 139 | 140 | if map_mode == 4: 141 | print("Get map.") 142 | get_coco_map(class_names = class_names, path = map_out_path) 143 | print("Get map done.") 144 | -------------------------------------------------------------------------------- /hrsc_annotation.py: -------------------------------------------------------------------------------- 1 | import os 2 | import random 3 | import xml.etree.ElementTree as ET 4 | 5 | import numpy as np 6 | from utils.utils_rbox import * 7 | from utils.utils import get_classes 8 | 9 | #--------------------------------------------------------------------------------------------------------------------------------# 10 | # annotation_mode用于指定该文件运行时计算的内容 11 | # annotation_mode为0代表整个标签处理过程,包括获得VOCdevkit/VOC2007/ImageSets里面的txt以及训练用的2007_train.txt、2007_val.txt 12 | # annotation_mode为1代表获得VOCdevkit/VOC2007/ImageSets里面的txt 13 | # annotation_mode为2代表获得训练用的2007_train.txt、2007_val.txt 14 | #--------------------------------------------------------------------------------------------------------------------------------# 15 | annotation_mode = 0 16 | #-------------------------------------------------------------------# 17 | # 必须要修改,用于生成2007_train.txt、2007_val.txt的目标信息 18 | # 与训练和预测所用的classes_path一致即可 19 | # 如果生成的2007_train.txt里面没有目标信息 20 | # 那么就是因为classes没有设定正确 21 | # 仅在annotation_mode为0和2的时候有效 22 | #-------------------------------------------------------------------# 23 | classes_path = 'model_data/hrsc_classes.txt' 24 | #--------------------------------------------------------------------------------------------------------------------------------# 25 | # trainval_percent用于指定(训练集+验证集)与测试集的比例,默认情况下 (训练集+验证集):测试集 = 9:1 26 | # train_percent用于指定(训练集+验证集)中训练集与验证集的比例,默认情况下 训练集:验证集 = 9:1 27 | # 仅在annotation_mode为0和1的时候有效 28 | 
#--------------------------------------------------------------------------------------------------------------------------------# 29 | trainval_percent = 0.9 30 | train_percent = 0.9 31 | #-------------------------------------------------------# 32 | # 指向VOC数据集所在的文件夹 33 | # 默认指向根目录下的VOC数据集 34 | #-------------------------------------------------------# 35 | VOCdevkit_path = 'VOCdevkit' 36 | 37 | VOCdevkit_sets = [('2007_HRSC', 'train'), ('2007_HRSC', 'val')] 38 | classes, _ = get_classes(classes_path) 39 | 40 | #-------------------------------------------------------# 41 | # 统计目标数量 42 | #-------------------------------------------------------# 43 | photo_nums = np.zeros(len(VOCdevkit_sets)) 44 | nums = np.zeros(len(classes)) 45 | def convert_annotation(year, image_id, list_file): 46 | in_file = open(os.path.join(VOCdevkit_path, 'VOC%s/Annotations/%s.xml'%(year, image_id)), encoding='utf-8') 47 | tree=ET.parse(in_file) 48 | root = tree.getroot().find('HRSC_Objects') 49 | 50 | for obj in root.iter('HRSC_Object'): 51 | difficult = 0 52 | if obj.find('difficult')!=None: 53 | difficult = obj.find('difficult').text 54 | cls = obj.find('name').text 55 | if cls not in classes or int(difficult)==1: 56 | continue 57 | if obj.find('mbox_cx')==None: 58 | continue 59 | cls_id = classes.index(cls) 60 | cx = float(obj.find('mbox_cx').text) 61 | cy = float(obj.find('mbox_cy').text) 62 | w = float(obj.find('mbox_w').text) 63 | h = float(obj.find('mbox_h').text) 64 | angle = float(obj.find('mbox_ang').text) 65 | b = np.array([[cx, cy, w, h, angle]], dtype=np.float32) 66 | b = rbox2poly(b)[0] 67 | b = (b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7]) 68 | list_file.write(" " + ",".join([str(a) for a in b]) + ',' + str(cls_id)) 69 | 70 | nums[classes.index(cls)] = nums[classes.index(cls)] + 1 71 | 72 | if __name__ == "__main__": 73 | random.seed(0) 74 | if " " in os.path.abspath(VOCdevkit_path): 75 | raise ValueError("数据集存放的文件夹路径与图片名称中不可以存在空格,否则会影响正常的模型训练,请注意修改。") 76 | 77 | if annotation_mode == 0 or annotation_mode == 1: 78 | print("Generate txt in ImageSets.") 79 | xmlfilepath = os.path.join(VOCdevkit_path, 'VOC2007_HRSC/Annotations') 80 | saveBasePath = os.path.join(VOCdevkit_path, 'VOC2007_HRSC/ImageSets/Main') 81 | temp_xml = os.listdir(xmlfilepath) 82 | total_xml = [] 83 | for xml in temp_xml: 84 | if xml.endswith(".xml"): 85 | total_xml.append(xml) 86 | 87 | num = len(total_xml) 88 | list = range(num) 89 | tv = int(num*trainval_percent) 90 | tr = int(tv*train_percent) 91 | trainval= random.sample(list,tv) 92 | train = random.sample(trainval,tr) 93 | 94 | print("train and val size",tv) 95 | print("train size",tr) 96 | ftrainval = open(os.path.join(saveBasePath,'trainval.txt'), 'w') 97 | ftest = open(os.path.join(saveBasePath,'test.txt'), 'w') 98 | ftrain = open(os.path.join(saveBasePath,'train.txt'), 'w') 99 | fval = open(os.path.join(saveBasePath,'val.txt'), 'w') 100 | 101 | for i in list: 102 | name=total_xml[i][:-4]+'\n' 103 | if i in trainval: 104 | ftrainval.write(name) 105 | if i in train: 106 | ftrain.write(name) 107 | else: 108 | fval.write(name) 109 | else: 110 | ftest.write(name) 111 | 112 | ftrainval.close() 113 | ftrain.close() 114 | fval.close() 115 | ftest.close() 116 | print("Generate txt in ImageSets done.") 117 | 118 | if annotation_mode == 0 or annotation_mode == 2: 119 | print("Generate 2007_train.txt and 2007_val.txt for train.") 120 | type_index = 0 121 | for year, image_set in VOCdevkit_sets: 122 | image_ids = open(os.path.join(VOCdevkit_path, 'VOC%s/ImageSets/Main/%s.txt'%(year, 
image_set)), encoding='utf-8').read().strip().split() 123 | list_file = open('%s_%s.txt'%(year, image_set), 'w', encoding='utf-8') 124 | for image_id in image_ids: 125 | list_file.write('%s/VOC%s/JPEGImages/%s.bmp'%(os.path.abspath(VOCdevkit_path), year, image_id)) 126 | 127 | convert_annotation(year, image_id, list_file) 128 | list_file.write('\n') 129 | photo_nums[type_index] = len(image_ids) 130 | type_index += 1 131 | list_file.close() 132 | print("Generate 2007_train.txt and 2007_val.txt for train done.") 133 | 134 | def printTable(List1, List2): 135 | for i in range(len(List1[0])): 136 | print("|", end=' ') 137 | for j in range(len(List1)): 138 | print(List1[j][i].rjust(int(List2[j])), end=' ') 139 | print("|", end=' ') 140 | print() 141 | 142 | str_nums = [str(int(x)) for x in nums] 143 | tableData = [ 144 | classes, str_nums 145 | ] 146 | colWidths = [0]*len(tableData) 147 | len1 = 0 148 | for i in range(len(tableData)): 149 | for j in range(len(tableData[i])): 150 | if len(tableData[i][j]) > colWidths[i]: 151 | colWidths[i] = len(tableData[i][j]) 152 | printTable(tableData, colWidths) 153 | 154 | if photo_nums[0] <= 500: 155 | print("训练集数量小于500,属于较小的数据量,请注意设置较大的训练世代(Epoch)以满足足够的梯度下降次数(Step)。") 156 | 157 | if np.sum(nums) == 0: 158 | print("在数据集中并未获得任何目标,请注意修改classes_path对应自己的数据集,并且保证标签名字正确,否则训练将会没有任何效果!") 159 | print("在数据集中并未获得任何目标,请注意修改classes_path对应自己的数据集,并且保证标签名字正确,否则训练将会没有任何效果!") 160 | print("在数据集中并未获得任何目标,请注意修改classes_path对应自己的数据集,并且保证标签名字正确,否则训练将会没有任何效果!") 161 | print("(重要的事情说三遍)。") 162 | -------------------------------------------------------------------------------- /img/test.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Egrt/yolov7-tiny-obb/92139b483f07eaaa61e91138030946976826d0db/img/test.jpg -------------------------------------------------------------------------------- /kmeans_for_anchors.py: -------------------------------------------------------------------------------- 1 | #-------------------------------------------------------------------------------------------------------# 2 | # kmeans虽然会对数据集中的框进行聚类,但是很多数据集由于框的大小相近,聚类出来的9个框相差不大, 3 | # 这样的框反而不利于模型的训练。因为不同的特征层适合不同大小的先验框,shape越小的特征层适合越大的先验框 4 | # 原始网络的先验框已经按大中小比例分配好了,不进行聚类也会有非常好的效果。 5 | #-------------------------------------------------------------------------------------------------------# 6 | import glob 7 | import xml.etree.ElementTree as ET 8 | 9 | import matplotlib.pyplot as plt 10 | import numpy as np 11 | from tqdm import tqdm 12 | 13 | 14 | def cas_ratio(box,cluster): 15 | ratios_of_box_cluster = box / cluster 16 | ratios_of_cluster_box = cluster / box 17 | ratios = np.concatenate([ratios_of_box_cluster, ratios_of_cluster_box], axis = -1) 18 | 19 | return np.max(ratios, -1) 20 | 21 | def avg_ratio(box,cluster): 22 | return np.mean([np.min(cas_ratio(box[i],cluster)) for i in range(box.shape[0])]) 23 | 24 | def kmeans(box,k): 25 | #-------------------------------------------------------------# 26 | # 取出一共有多少框 27 | #-------------------------------------------------------------# 28 | row = box.shape[0] 29 | 30 | #-------------------------------------------------------------# 31 | # 每个框各个点的位置 32 | #-------------------------------------------------------------# 33 | distance = np.empty((row,k)) 34 | 35 | #-------------------------------------------------------------# 36 | # 最后的聚类位置 37 | #-------------------------------------------------------------# 38 | last_clu = np.zeros((row,)) 39 | 40 | np.random.seed() 41 | 42 | 
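    # Distance metric (added note): cas_ratio() scores a box against a cluster centre by the
    # largest width/height ratio between the two (in either direction) rather than by
    # Euclidean distance; each box joins the centre with the smallest score, and every
    # centre is then updated with the per-dimension median of its member boxes.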
#-------------------------------------------------------------# 43 | # 随机选5个当聚类中心 44 | #-------------------------------------------------------------# 45 | cluster = box[np.random.choice(row,k,replace = False)] 46 | 47 | iter = 0 48 | while True: 49 | #-------------------------------------------------------------# 50 | # 计算当前框和先验框的宽高比例 51 | #-------------------------------------------------------------# 52 | for i in range(row): 53 | distance[i] = cas_ratio(box[i],cluster) 54 | 55 | #-------------------------------------------------------------# 56 | # 取出最小点 57 | #-------------------------------------------------------------# 58 | near = np.argmin(distance,axis=1) 59 | 60 | if (last_clu == near).all(): 61 | break 62 | 63 | #-------------------------------------------------------------# 64 | # 求每一个类的中位点 65 | #-------------------------------------------------------------# 66 | for j in range(k): 67 | cluster[j] = np.median( 68 | box[near == j],axis=0) 69 | 70 | last_clu = near 71 | if iter % 5 == 0: 72 | print('iter: {:d}. avg_ratio:{:.2f}'.format(iter, avg_ratio(box,cluster))) 73 | iter += 1 74 | 75 | return cluster, near 76 | 77 | def load_data(path): 78 | data = [] 79 | #-------------------------------------------------------------# 80 | # 对于每一个xml都寻找box 81 | #-------------------------------------------------------------# 82 | for xml_file in tqdm(glob.glob('{}/*xml'.format(path))): 83 | tree = ET.parse(xml_file) 84 | height = int(tree.findtext('./size/height')) 85 | width = int(tree.findtext('./size/width')) 86 | if height<=0 or width<=0: 87 | continue 88 | 89 | #-------------------------------------------------------------# 90 | # 对于每一个目标都获得它的宽高 91 | #-------------------------------------------------------------# 92 | for obj in tree.iter('object'): 93 | xmin = int(float(obj.findtext('bndbox/xmin'))) / width 94 | ymin = int(float(obj.findtext('bndbox/ymin'))) / height 95 | xmax = int(float(obj.findtext('bndbox/xmax'))) / width 96 | ymax = int(float(obj.findtext('bndbox/ymax'))) / height 97 | 98 | xmin = np.float64(xmin) 99 | ymin = np.float64(ymin) 100 | xmax = np.float64(xmax) 101 | ymax = np.float64(ymax) 102 | # 得到宽高 103 | data.append([xmax-xmin,ymax-ymin]) 104 | return np.array(data) 105 | 106 | if __name__ == '__main__': 107 | np.random.seed(0) 108 | #-------------------------------------------------------------# 109 | # 运行该程序会计算'./VOCdevkit/VOC2007/Annotations'的xml 110 | # 会生成yolo_anchors.txt 111 | #-------------------------------------------------------------# 112 | input_shape = [640, 640] 113 | anchors_num = 9 114 | #-------------------------------------------------------------# 115 | # 载入数据集,可以使用VOC的xml 116 | #-------------------------------------------------------------# 117 | path = 'VOCdevkit/VOC2007/Annotations' 118 | 119 | #-------------------------------------------------------------# 120 | # 载入所有的xml 121 | # 存储格式为转化为比例后的width,height 122 | #-------------------------------------------------------------# 123 | print('Load xmls.') 124 | data = load_data(path) 125 | print('Load xmls done.') 126 | 127 | #-------------------------------------------------------------# 128 | # 使用k聚类算法 129 | #-------------------------------------------------------------# 130 | print('K-means boxes.') 131 | cluster, near = kmeans(data, anchors_num) 132 | print('K-means boxes done.') 133 | data = data * np.array([input_shape[1], input_shape[0]]) 134 | cluster = cluster * np.array([input_shape[1], input_shape[0]]) 135 | 136 | #-------------------------------------------------------------# 137 | # 绘图 
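    #   (scatter-plots the boxes of each cluster, marks every cluster centre with a black
    #   'x', and saves the figure as kmeans_for_anchors.jpg in the working directory)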
138 | #-------------------------------------------------------------# 139 | for j in range(anchors_num): 140 | plt.scatter(data[near == j][:,0], data[near == j][:,1]) 141 | plt.scatter(cluster[j][0], cluster[j][1], marker='x', c='black') 142 | plt.savefig("kmeans_for_anchors.jpg") 143 | plt.show() 144 | print('Save kmeans_for_anchors.jpg in root dir.') 145 | 146 | cluster = cluster[np.argsort(cluster[:, 0] * cluster[:, 1])] 147 | print('avg_ratio:{:.2f}'.format(avg_ratio(data, cluster))) 148 | print(cluster) 149 | 150 | f = open("yolo_anchors.txt", 'w') 151 | row = np.shape(cluster)[0] 152 | for i in range(row): 153 | if i == 0: 154 | x_y = "%d,%d" % (cluster[i][0], cluster[i][1]) 155 | else: 156 | x_y = ", %d,%d" % (cluster[i][0], cluster[i][1]) 157 | f.write(x_y) 158 | f.close() 159 | -------------------------------------------------------------------------------- /logs/README.md: -------------------------------------------------------------------------------- 1 | 训练好的权重会保存在这里 2 | -------------------------------------------------------------------------------- /model_data/coco_classes.txt: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorbike 5 | aeroplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | stop sign 13 | parking meter 14 | bench 15 | bird 16 | cat 17 | dog 18 | horse 19 | sheep 20 | cow 21 | elephant 22 | bear 23 | zebra 24 | giraffe 25 | backpack 26 | umbrella 27 | handbag 28 | tie 29 | suitcase 30 | frisbee 31 | skis 32 | snowboard 33 | sports ball 34 | kite 35 | baseball bat 36 | baseball glove 37 | skateboard 38 | surfboard 39 | tennis racket 40 | bottle 41 | wine glass 42 | cup 43 | fork 44 | knife 45 | spoon 46 | bowl 47 | banana 48 | apple 49 | sandwich 50 | orange 51 | broccoli 52 | carrot 53 | hot dog 54 | pizza 55 | donut 56 | cake 57 | chair 58 | sofa 59 | pottedplant 60 | bed 61 | diningtable 62 | toilet 63 | tvmonitor 64 | laptop 65 | mouse 66 | remote 67 | keyboard 68 | cell phone 69 | microwave 70 | oven 71 | toaster 72 | sink 73 | refrigerator 74 | book 75 | clock 76 | vase 77 | scissors 78 | teddy bear 79 | hair drier 80 | toothbrush 81 | -------------------------------------------------------------------------------- /model_data/simhei.ttf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Egrt/yolov7-tiny-obb/92139b483f07eaaa61e91138030946976826d0db/model_data/simhei.ttf -------------------------------------------------------------------------------- /model_data/uav_classes.txt: -------------------------------------------------------------------------------- 1 | car -------------------------------------------------------------------------------- /model_data/voc_classes.txt: -------------------------------------------------------------------------------- 1 | aeroplane 2 | bicycle 3 | bird 4 | boat 5 | bottle 6 | bus 7 | car 8 | cat 9 | chair 10 | cow 11 | diningtable 12 | dog 13 | horse 14 | motorbike 15 | person 16 | pottedplant 17 | sheep 18 | sofa 19 | train 20 | tvmonitor -------------------------------------------------------------------------------- /model_data/yolo_anchors.txt: -------------------------------------------------------------------------------- 1 | 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 -------------------------------------------------------------------------------- /nets/__init__.py: 
-------------------------------------------------------------------------------- 1 | # -------------------------------------------------------------------------------- /nets/backbone.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | 5 | def autopad(k, p=None): 6 | if p is None: 7 | p = k // 2 if isinstance(k, int) else [x // 2 for x in k] 8 | return p 9 | 10 | class Conv(nn.Module): 11 | def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=nn.LeakyReLU(0.1, inplace=True)): # ch_in, ch_out, kernel, stride, padding, groups 12 | super(Conv, self).__init__() 13 | self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False) 14 | self.bn = nn.BatchNorm2d(c2, eps=0.001, momentum=0.03) 15 | self.act = nn.LeakyReLU(0.1, inplace=True) if act is True else (act if isinstance(act, nn.Module) else nn.Identity()) 16 | 17 | def forward(self, x): 18 | return self.act(self.bn(self.conv(x))) 19 | 20 | def fuseforward(self, x): 21 | return self.act(self.conv(x)) 22 | 23 | class Multi_Concat_Block(nn.Module): 24 | def __init__(self, c1, c2, c3, n=4, e=1, ids=[0]): 25 | super(Multi_Concat_Block, self).__init__() 26 | c_ = int(c2 * e) 27 | 28 | self.ids = ids 29 | self.cv1 = Conv(c1, c_, 1, 1) 30 | self.cv2 = Conv(c1, c_, 1, 1) 31 | self.cv3 = nn.ModuleList( 32 | [Conv(c_ if i ==0 else c2, c2, 3, 1) for i in range(n)] 33 | ) 34 | self.cv4 = Conv(c_ * 2 + c2 * (len(ids) - 2), c3, 1, 1) 35 | 36 | def forward(self, x): 37 | x_1 = self.cv1(x) 38 | x_2 = self.cv2(x) 39 | 40 | x_all = [x_1, x_2] 41 | for i in range(len(self.cv3)): 42 | x_2 = self.cv3[i](x_2) 43 | x_all.append(x_2) 44 | 45 | out = self.cv4(torch.cat([x_all[id] for id in self.ids], 1)) 46 | return out 47 | 48 | class MP(nn.Module): 49 | def __init__(self, k=2): 50 | super(MP, self).__init__() 51 | self.m = nn.MaxPool2d(kernel_size=k, stride=k) 52 | 53 | def forward(self, x): 54 | return self.m(x) 55 | 56 | class Backbone(nn.Module): 57 | def __init__(self, transition_channels, block_channels, n, pretrained=False): 58 | super().__init__() 59 | #-----------------------------------------------# 60 | # 输入图片是640, 640, 3 61 | #-----------------------------------------------# 62 | ids = [-1, -2, -3, -4] 63 | # 640, 640, 3 => 320, 320, 64 64 | self.stem = Conv(3, transition_channels * 2, 3, 2) 65 | # 320, 320, 64 => 160, 160, 128 => 160, 160, 128 66 | self.dark2 = nn.Sequential( 67 | Conv(transition_channels * 2, transition_channels * 4, 3, 2), 68 | Multi_Concat_Block(transition_channels * 4, block_channels * 2, transition_channels * 4, n=n, ids=ids), 69 | ) 70 | # 160, 160, 128 => 80, 80, 128 => 80, 80, 256 71 | self.dark3 = nn.Sequential( 72 | MP(), 73 | Multi_Concat_Block(transition_channels * 4, block_channels * 4, transition_channels * 8, n=n, ids=ids), 74 | ) 75 | # 80, 80, 256 => 40, 40, 256 => 40, 40, 512 76 | self.dark4 = nn.Sequential( 77 | MP(), 78 | Multi_Concat_Block(transition_channels * 8, block_channels * 8, transition_channels * 16, n=n, ids=ids), 79 | ) 80 | # 40, 40, 512 => 20, 20, 512 => 20, 20, 1024 81 | self.dark5 = nn.Sequential( 82 | MP(), 83 | Multi_Concat_Block(transition_channels * 16, block_channels * 16, transition_channels * 32, n=n, ids=ids), 84 | ) 85 | 86 | if pretrained: 87 | url = 'https://github.com/bubbliiiing/yolov7-tiny-pytorch/releases/download/v1.0/yolov7_tiny_backbone_weights.pth' 88 | checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location="cpu", model_dir="./model_data") 89 | 
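            # strict=False: only keys present in this backbone are loaded; any missing or
            # unexpected keys in the released backbone-only checkpoint are skipped instead
            # of raising an error.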
self.load_state_dict(checkpoint, strict=False) 90 | print("Load weights from " + url.split('/')[-1]) 91 | 92 | def forward(self, x): 93 | x = self.stem(x) 94 | x = self.dark2(x) 95 | #-----------------------------------------------# 96 | # dark3的输出为80, 80, 256,是一个有效特征层 97 | #-----------------------------------------------# 98 | x = self.dark3(x) 99 | feat1 = x 100 | #-----------------------------------------------# 101 | # dark4的输出为40, 40, 512,是一个有效特征层 102 | #-----------------------------------------------# 103 | x = self.dark4(x) 104 | feat2 = x 105 | #-----------------------------------------------# 106 | # dark5的输出为20, 20, 1024,是一个有效特征层 107 | #-----------------------------------------------# 108 | x = self.dark5(x) 109 | feat3 = x 110 | return feat1, feat2, feat3 111 | -------------------------------------------------------------------------------- /nets/yolo.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | from nets.backbone import Backbone, Multi_Concat_Block, Conv 5 | 6 | 7 | class SPPCSPC(nn.Module): 8 | # CSP https://github.com/WongKinYiu/CrossStagePartialNetworks 9 | def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5, k=(13, 9, 5)): 10 | super(SPPCSPC, self).__init__() 11 | c_ = int(2 * c2 * e) # hidden channels 12 | self.cv1 = Conv(c1, c_, 1, 1) 13 | self.cv2 = Conv(c1, c_, 1, 1) 14 | self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k]) 15 | self.cv3 = Conv(4 * c_, c_, 1, 1) 16 | self.cv4 = Conv(2 * c_, c2, 1, 1) 17 | 18 | def forward(self, x): 19 | x1 = self.cv1(x) 20 | y1 = self.cv3(torch.cat([m(x1) for m in self.m] + [x1], 1)) 21 | y2 = self.cv2(x) 22 | return self.cv4(torch.cat((y1, y2), dim=1)) 23 | 24 | def fuse_conv_and_bn(conv, bn): 25 | fusedconv = nn.Conv2d(conv.in_channels, 26 | conv.out_channels, 27 | kernel_size=conv.kernel_size, 28 | stride=conv.stride, 29 | padding=conv.padding, 30 | groups=conv.groups, 31 | bias=True).requires_grad_(False).to(conv.weight.device) 32 | 33 | w_conv = conv.weight.clone().view(conv.out_channels, -1) 34 | w_bn = torch.diag(bn.weight.div(torch.sqrt(bn.eps + bn.running_var))) 35 | fusedconv.weight.copy_(torch.mm(w_bn, w_conv).view(fusedconv.weight.shape)) 36 | 37 | b_conv = torch.zeros(conv.weight.size(0), device=conv.weight.device) if conv.bias is None else conv.bias 38 | b_bn = bn.bias - bn.weight.mul(bn.running_mean).div(torch.sqrt(bn.running_var + bn.eps)) 39 | fusedconv.bias.copy_(torch.mm(w_bn, b_conv.reshape(-1, 1)).reshape(-1) + b_bn) 40 | return fusedconv 41 | 42 | #---------------------------------------------------# 43 | # yolo_body 44 | #---------------------------------------------------# 45 | class YoloBody(nn.Module): 46 | def __init__(self, anchors_mask, num_classes, pretrained=False): 47 | super(YoloBody, self).__init__() 48 | #-----------------------------------------------# 49 | # 定义了不同yolov7-tiny的参数 50 | #-----------------------------------------------# 51 | transition_channels = 16 52 | block_channels = 16 53 | panet_channels = 16 54 | e = 1 55 | n = 2 56 | ids = [-1, -2, -3, -4] 57 | #-----------------------------------------------# 58 | # 输入图片是640, 640, 3 59 | #-----------------------------------------------# 60 | 61 | #---------------------------------------------------# 62 | # 生成主干模型 63 | # 获得三个有效特征层,他们的shape分别是: 64 | # 80, 80, 512 65 | # 40, 40, 1024 66 | # 20, 20, 1024 67 | #---------------------------------------------------# 68 | self.backbone = Backbone(transition_channels, 
block_channels, n, pretrained=pretrained) 69 | 70 | self.upsample = nn.Upsample(scale_factor=2, mode="nearest") 71 | 72 | self.sppcspc = SPPCSPC(transition_channels * 32, transition_channels * 16) 73 | self.conv_for_P5 = Conv(transition_channels * 16, transition_channels * 8) 74 | self.conv_for_feat2 = Conv(transition_channels * 16, transition_channels * 8) 75 | self.conv3_for_upsample1 = Multi_Concat_Block(transition_channels * 16, panet_channels * 4, transition_channels * 8, e=e, n=n, ids=ids) 76 | 77 | self.conv_for_P4 = Conv(transition_channels * 8, transition_channels * 4) 78 | self.conv_for_feat1 = Conv(transition_channels * 8, transition_channels * 4) 79 | self.conv3_for_upsample2 = Multi_Concat_Block(transition_channels * 8, panet_channels * 2, transition_channels * 4, e=e, n=n, ids=ids) 80 | 81 | self.down_sample1 = Conv(transition_channels * 4, transition_channels * 8, k=3, s=2) 82 | self.conv3_for_downsample1 = Multi_Concat_Block(transition_channels * 16, panet_channels * 4, transition_channels * 8, e=e, n=n, ids=ids) 83 | 84 | self.down_sample2 = Conv(transition_channels * 8, transition_channels * 16, k=3, s=2) 85 | self.conv3_for_downsample2 = Multi_Concat_Block(transition_channels * 32, panet_channels * 8, transition_channels * 16, e=e, n=n, ids=ids) 86 | 87 | self.rep_conv_1 = Conv(transition_channels * 4, transition_channels * 8, 3, 1) 88 | self.rep_conv_2 = Conv(transition_channels * 8, transition_channels * 16, 3, 1) 89 | self.rep_conv_3 = Conv(transition_channels * 16, transition_channels * 32, 3, 1) 90 | 91 | self.yolo_head_P3 = nn.Conv2d(transition_channels * 8, len(anchors_mask[2]) * (6 + num_classes), 1) 92 | self.yolo_head_P4 = nn.Conv2d(transition_channels * 16, len(anchors_mask[1]) * (6 + num_classes), 1) 93 | self.yolo_head_P5 = nn.Conv2d(transition_channels * 32, len(anchors_mask[0]) * (6 + num_classes), 1) 94 | 95 | def fuse(self): 96 | print('Fusing layers... 
') 97 | for m in self.modules(): 98 | if type(m) is Conv and hasattr(m, 'bn'): 99 | m.conv = fuse_conv_and_bn(m.conv, m.bn) 100 | delattr(m, 'bn') 101 | m.forward = m.fuseforward 102 | return self 103 | 104 | def forward(self, x): 105 | # backbone 106 | feat1, feat2, feat3 = self.backbone.forward(x) 107 | 108 | P5 = self.sppcspc(feat3) 109 | P5_conv = self.conv_for_P5(P5) 110 | P5_upsample = self.upsample(P5_conv) 111 | P4 = torch.cat([self.conv_for_feat2(feat2), P5_upsample], 1) 112 | P4 = self.conv3_for_upsample1(P4) 113 | 114 | P4_conv = self.conv_for_P4(P4) 115 | P4_upsample = self.upsample(P4_conv) 116 | P3 = torch.cat([self.conv_for_feat1(feat1), P4_upsample], 1) 117 | P3 = self.conv3_for_upsample2(P3) 118 | 119 | P3_downsample = self.down_sample1(P3) 120 | P4 = torch.cat([P3_downsample, P4], 1) 121 | P4 = self.conv3_for_downsample1(P4) 122 | 123 | P4_downsample = self.down_sample2(P4) 124 | P5 = torch.cat([P4_downsample, P5], 1) 125 | P5 = self.conv3_for_downsample2(P5) 126 | 127 | P3 = self.rep_conv_1(P3) 128 | P4 = self.rep_conv_2(P4) 129 | P5 = self.rep_conv_3(P5) 130 | #---------------------------------------------------# 131 | # 第三个特征层 132 | # y3=(batch_size, 78, 80, 80) 133 | #---------------------------------------------------# 134 | out2 = self.yolo_head_P3(P3) 135 | #---------------------------------------------------# 136 | # 第二个特征层 137 | # y2=(batch_size, 78, 40, 40) 138 | #---------------------------------------------------# 139 | out1 = self.yolo_head_P4(P4) 140 | #---------------------------------------------------# 141 | # 第一个特征层 142 | # y1=(batch_size, 78, 20, 20) 143 | #---------------------------------------------------# 144 | out0 = self.yolo_head_P5(P5) 145 | 146 | return [out0, out1, out2] 147 | -------------------------------------------------------------------------------- /predict.py: -------------------------------------------------------------------------------- 1 | #-----------------------------------------------------------------------# 2 | # predict.py将单张图片预测、摄像头检测、FPS测试和目录遍历检测等功能 3 | # 整合到了一个py文件中,通过指定mode进行模式的修改。 4 | #-----------------------------------------------------------------------# 5 | import time 6 | 7 | import cv2 8 | import numpy as np 9 | from PIL import Image 10 | 11 | from yolo import YOLO 12 | 13 | if __name__ == "__main__": 14 | yolo = YOLO() 15 | #----------------------------------------------------------------------------------------------------------# 16 | # mode用于指定测试的模式: 17 | # 'predict' 表示单张图片预测,如果想对预测过程进行修改,如保存图片,截取对象等,可以先看下方详细的注释 18 | # 'video' 表示视频检测,可调用摄像头或者视频进行检测,详情查看下方注释。 19 | # 'fps' 表示测试fps,使用的图片是img里面的street.jpg,详情查看下方注释。 20 | # 'dir_predict' 表示遍历文件夹进行检测并保存。默认遍历img文件夹,保存img_out文件夹,详情查看下方注释。 21 | # 'heatmap' 表示进行预测结果的热力图可视化,详情查看下方注释。 22 | # 'export_onnx' 表示将模型导出为onnx,需要pytorch1.7.1以上。 23 | #----------------------------------------------------------------------------------------------------------# 24 | mode = "predict" 25 | #-------------------------------------------------------------------------# 26 | # crop 指定了是否在单张图片预测后对目标进行截取 27 | # count 指定了是否进行目标的计数 28 | # crop、count仅在mode='predict'时有效 29 | #-------------------------------------------------------------------------# 30 | crop = False 31 | count = False 32 | #----------------------------------------------------------------------------------------------------------# 33 | # video_path 用于指定视频的路径,当video_path=0时表示检测摄像头 34 | # 想要检测视频,则设置如video_path = "xxx.mp4"即可,代表读取出根目录下的xxx.mp4文件。 35 | # video_save_path 表示视频保存的路径,当video_save_path=""时表示不保存 36 | # 想要保存视频,则设置如video_save_path 
= "yyy.mp4"即可,代表保存为根目录下的yyy.mp4文件。 37 | # video_fps 用于保存的视频的fps 38 | # 39 | # video_path、video_save_path和video_fps仅在mode='video'时有效 40 | # 保存视频时需要ctrl+c退出或者运行到最后一帧才会完成完整的保存步骤。 41 | #----------------------------------------------------------------------------------------------------------# 42 | video_path = "img/input.mp4" 43 | video_save_path = "img/output.mp4" 44 | video_fps = 25.0 45 | #----------------------------------------------------------------------------------------------------------# 46 | # test_interval 用于指定测量fps的时候,图片检测的次数。理论上test_interval越大,fps越准确。 47 | # fps_image_path 用于指定测试的fps图片 48 | # 49 | # test_interval和fps_image_path仅在mode='fps'有效 50 | #----------------------------------------------------------------------------------------------------------# 51 | test_interval = 100 52 | fps_image_path = "img/test.jpg" 53 | #-------------------------------------------------------------------------# 54 | # dir_origin_path 指定了用于检测的图片的文件夹路径 55 | # dir_save_path 指定了检测完图片的保存路径 56 | # 57 | # dir_origin_path和dir_save_path仅在mode='dir_predict'时有效 58 | #-------------------------------------------------------------------------# 59 | dir_origin_path = "img/" 60 | dir_save_path = "img_out/" 61 | #-------------------------------------------------------------------------# 62 | # heatmap_save_path 热力图的保存路径,默认保存在model_data下 63 | # 64 | # heatmap_save_path仅在mode='heatmap'有效 65 | #-------------------------------------------------------------------------# 66 | heatmap_save_path = "model_data/heatmap_vision.png" 67 | #-------------------------------------------------------------------------# 68 | # simplify 使用Simplify onnx 69 | # onnx_save_path 指定了onnx的保存路径 70 | #-------------------------------------------------------------------------# 71 | simplify = True 72 | onnx_save_path = "model_data/models.onnx" 73 | 74 | if mode == "predict": 75 | ''' 76 | 1、如果想要进行检测完的图片的保存,利用r_image.save("img.jpg")即可保存,直接在predict.py里进行修改即可。 77 | 2、如果想要获得预测框的坐标,可以进入yolo.detect_image函数,在绘图部分读取top,left,bottom,right这四个值。 78 | 3、如果想要利用预测框截取下目标,可以进入yolo.detect_image函数,在绘图部分利用获取到的top,left,bottom,right这四个值 79 | 在原图上利用矩阵的方式进行截取。 80 | 4、如果想要在预测图上写额外的字,比如检测到的特定目标的数量,可以进入yolo.detect_image函数,在绘图部分对predicted_class进行判断, 81 | 比如判断if predicted_class == 'car': 即可判断当前目标是否为车,然后记录数量即可。利用draw.text即可写字。 82 | ''' 83 | while True: 84 | img = input('Input image filename:') 85 | try: 86 | image = Image.open(img) 87 | except: 88 | print('Open Error! 
Try again!') 89 | continue 90 | else: 91 | r_image = yolo.detect_image(image, crop = crop, count=count) 92 | r_image.show() 93 | 94 | elif mode == "video": 95 | capture = cv2.VideoCapture(video_path) 96 | if video_save_path!="": 97 | fourcc = cv2.VideoWriter_fourcc(*'XVID') 98 | size = (int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)), int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))) 99 | out = cv2.VideoWriter(video_save_path, fourcc, video_fps, size) 100 | 101 | ref, frame = capture.read() 102 | if not ref: 103 | raise ValueError("未能正确读取摄像头(视频),请注意是否正确安装摄像头(是否正确填写视频路径)。") 104 | 105 | fps = 0.0 106 | while(True): 107 | t1 = time.time() 108 | # 读取某一帧 109 | ref, frame = capture.read() 110 | if not ref: 111 | break 112 | # 格式转变,BGRtoRGB 113 | frame = cv2.cvtColor(frame,cv2.COLOR_BGR2RGB) 114 | # 转变成Image 115 | frame = Image.fromarray(np.uint8(frame)) 116 | # 进行检测 117 | frame = np.array(yolo.detect_image(frame)) 118 | # RGBtoBGR满足opencv显示格式 119 | frame = cv2.cvtColor(frame,cv2.COLOR_RGB2BGR) 120 | 121 | fps = ( fps + (1./(time.time()-t1)) ) / 2 122 | print("fps= %.2f"%(fps)) 123 | frame = cv2.putText(frame, "fps= %.2f"%(fps), (0, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2) 124 | 125 | cv2.imshow("video",frame) 126 | c= cv2.waitKey(1) & 0xff 127 | if video_save_path!="": 128 | out.write(frame) 129 | 130 | if c==27: 131 | capture.release() 132 | break 133 | 134 | print("Video Detection Done!") 135 | capture.release() 136 | if video_save_path!="": 137 | print("Save processed video to the path :" + video_save_path) 138 | out.release() 139 | cv2.destroyAllWindows() 140 | 141 | elif mode == "fps": 142 | img = Image.open(fps_image_path) 143 | tact_time = yolo.get_FPS(img, test_interval) 144 | print(str(tact_time) + ' seconds, ' + str(1/tact_time) + 'FPS, @batch_size 1') 145 | 146 | elif mode == "dir_predict": 147 | import os 148 | 149 | from tqdm import tqdm 150 | 151 | img_names = os.listdir(dir_origin_path) 152 | for img_name in tqdm(img_names): 153 | if img_name.lower().endswith(('.bmp', '.dib', '.png', '.jpg', '.jpeg', '.pbm', '.pgm', '.ppm', '.tif', '.tiff')): 154 | image_path = os.path.join(dir_origin_path, img_name) 155 | image = Image.open(image_path) 156 | r_image = yolo.detect_image(image) 157 | if not os.path.exists(dir_save_path): 158 | os.makedirs(dir_save_path) 159 | r_image.save(os.path.join(dir_save_path, img_name.replace(".jpg", ".png")), quality=95, subsampling=0) 160 | 161 | elif mode == "heatmap": 162 | while True: 163 | img = input('Input image filename:') 164 | try: 165 | image = Image.open(img) 166 | except: 167 | print('Open Error! 
Try again!') 168 | continue 169 | else: 170 | yolo.detect_heatmap(image, heatmap_save_path) 171 | 172 | elif mode == "export_onnx": 173 | yolo.convert_to_onnx(simplify, onnx_save_path) 174 | 175 | else: 176 | raise AssertionError("Please specify the correct mode: 'predict', 'video', 'fps', 'heatmap', 'export_onnx', 'dir_predict'.") 177 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | scipy==1.9.1 2 | numpy==1.23.1 3 | matplotlib==3.4.3 4 | opencv_python==4.7.0 5 | torch==1.10.1 6 | torchvision==0.11.2 7 | tqdm==4.62.2 8 | Pillow==9.3.0 9 | h5py==2.10.0 10 | -------------------------------------------------------------------------------- /summary.py: -------------------------------------------------------------------------------- 1 | #--------------------------------------------# 2 | # 该部分代码用于看网络结构 3 | #--------------------------------------------# 4 | import torch 5 | from thop import clever_format, profile 6 | 7 | from nets.yolo import YoloBody 8 | 9 | if __name__ == "__main__": 10 | input_shape = [640, 640] 11 | anchors_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]] 12 | num_classes = 80 13 | phi = 'l' 14 | 15 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 16 | m = YoloBody(anchors_mask, num_classes, phi, False).to(device) 17 | for i in m.children(): 18 | print(i) 19 | print('==============================') 20 | 21 | dummy_input = torch.randn(1, 3, input_shape[0], input_shape[1]).to(device) 22 | flops, params = profile(m.to(device), (dummy_input, ), verbose=False) 23 | #--------------------------------------------------------# 24 | # flops * 2是因为profile没有将卷积作为两个operations 25 | # 有些论文将卷积算乘法、加法两个operations。此时乘2 26 | # 有些论文只考虑乘法的运算次数,忽略加法。此时不乘2 27 | # 本代码选择乘2,参考YOLOX。 28 | #--------------------------------------------------------# 29 | flops = flops * 2 30 | flops, params = clever_format([flops, params], "%.3f") 31 | print('Total GFLOPS: %s' % (flops)) 32 | print('Total params: %s' % (params)) 33 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------------------------------- /utils/callbacks.py: -------------------------------------------------------------------------------- 1 | import datetime 2 | import os 3 | 4 | import torch 5 | import matplotlib 6 | matplotlib.use('Agg') 7 | import scipy.signal 8 | from matplotlib import pyplot as plt 9 | from torch.utils.tensorboard import SummaryWriter 10 | from utils.utils_rbox import rbox2poly, poly2hbb 11 | import shutil 12 | import numpy as np 13 | 14 | from PIL import Image 15 | from tqdm import tqdm 16 | from .utils import cvtColor, preprocess_input, resize_image 17 | from .utils_bbox import DecodeBox 18 | from .utils_map import get_coco_map, get_map 19 | 20 | 21 | class LossHistory(): 22 | def __init__(self, log_dir, model, input_shape): 23 | self.log_dir = log_dir 24 | self.losses = [] 25 | self.val_loss = [] 26 | 27 | os.makedirs(self.log_dir) 28 | self.writer = SummaryWriter(self.log_dir) 29 | try: 30 | dummy_input = torch.randn(2, 3, input_shape[0], input_shape[1]) 31 | self.writer.add_graph(model, dummy_input) 32 | except: 33 | pass 34 | 35 | def append_loss(self, epoch, loss, val_loss): 36 | if not os.path.exists(self.log_dir): 37 | os.makedirs(self.log_dir) 38 | 39 | self.losses.append(loss) 40 
| self.val_loss.append(val_loss) 41 | 42 | with open(os.path.join(self.log_dir, "epoch_loss.txt"), 'a') as f: 43 | f.write(str(loss)) 44 | f.write("\n") 45 | with open(os.path.join(self.log_dir, "epoch_val_loss.txt"), 'a') as f: 46 | f.write(str(val_loss)) 47 | f.write("\n") 48 | 49 | self.writer.add_scalar('loss', loss, epoch) 50 | self.writer.add_scalar('val_loss', val_loss, epoch) 51 | self.loss_plot() 52 | 53 | def loss_plot(self): 54 | iters = range(len(self.losses)) 55 | 56 | plt.figure() 57 | plt.plot(iters, self.losses, 'red', linewidth = 2, label='train loss') 58 | plt.plot(iters, self.val_loss, 'coral', linewidth = 2, label='val loss') 59 | try: 60 | if len(self.losses) < 25: 61 | num = 5 62 | else: 63 | num = 15 64 | 65 | plt.plot(iters, scipy.signal.savgol_filter(self.losses, num, 3), 'green', linestyle = '--', linewidth = 2, label='smooth train loss') 66 | plt.plot(iters, scipy.signal.savgol_filter(self.val_loss, num, 3), '#8B4513', linestyle = '--', linewidth = 2, label='smooth val loss') 67 | except: 68 | pass 69 | 70 | plt.grid(True) 71 | plt.xlabel('Epoch') 72 | plt.ylabel('Loss') 73 | plt.legend(loc="upper right") 74 | 75 | plt.savefig(os.path.join(self.log_dir, "epoch_loss.png")) 76 | 77 | plt.cla() 78 | plt.close("all") 79 | 80 | class EvalCallback(): 81 | def __init__(self, net, input_shape, anchors, anchors_mask, class_names, num_classes, val_lines, log_dir, cuda, \ 82 | map_out_path=".temp_map_out", max_boxes=100, confidence=0.05, nms_iou=0.5, letterbox_image=False, MINOVERLAP=0.5, eval_flag=True, period=1): 83 | super(EvalCallback, self).__init__() 84 | 85 | self.net = net 86 | self.input_shape = input_shape 87 | self.anchors = anchors 88 | self.anchors_mask = anchors_mask 89 | self.class_names = class_names 90 | self.num_classes = num_classes 91 | self.val_lines = val_lines 92 | self.log_dir = log_dir 93 | self.cuda = cuda 94 | self.map_out_path = map_out_path 95 | self.max_boxes = max_boxes 96 | self.confidence = confidence 97 | self.nms_iou = nms_iou 98 | self.letterbox_image = letterbox_image 99 | self.MINOVERLAP = MINOVERLAP 100 | self.eval_flag = eval_flag 101 | self.period = period 102 | 103 | self.bbox_util = DecodeBox(self.anchors, self.num_classes, (self.input_shape[0], self.input_shape[1]), self.anchors_mask) 104 | 105 | self.maps = [0] 106 | self.epoches = [0] 107 | if self.eval_flag: 108 | with open(os.path.join(self.log_dir, "epoch_map.txt"), 'a') as f: 109 | f.write(str(0)) 110 | f.write("\n") 111 | 112 | def get_map_txt(self, image_id, image, class_names, map_out_path): 113 | f = open(os.path.join(map_out_path, "detection-results/"+image_id+".txt"), "w", encoding='utf-8') 114 | image_shape = np.array(np.shape(image)[0:2]) 115 | #---------------------------------------------------------# 116 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 117 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 118 | #---------------------------------------------------------# 119 | image = cvtColor(image) 120 | #---------------------------------------------------------# 121 | # 给图像增加灰条,实现不失真的resize 122 | # 也可以直接resize进行识别 123 | #---------------------------------------------------------# 124 | image_data = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image) 125 | #---------------------------------------------------------# 126 | # 添加上batch_size维度 127 | #---------------------------------------------------------# 128 | image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0) 129 | 130 | with torch.no_grad(): 131 | 
images = torch.from_numpy(image_data) 132 | if self.cuda: 133 | images = images.cuda() 134 | #---------------------------------------------------------# 135 | # 将图像输入网络当中进行预测! 136 | #---------------------------------------------------------# 137 | outputs = self.net(images) 138 | outputs = self.bbox_util.decode_box(outputs) 139 | #---------------------------------------------------------# 140 | # 将预测框进行堆叠,然后进行非极大抑制 141 | #---------------------------------------------------------# 142 | results = self.bbox_util.non_max_suppression(torch.cat(outputs, 1), self.num_classes, self.input_shape, 143 | image_shape, self.letterbox_image, conf_thres = self.confidence, nms_thres = self.nms_iou) 144 | 145 | if results[0] is None: 146 | return 147 | 148 | top_label = np.array(results[0][:, 7], dtype = 'int32') 149 | top_conf = results[0][:, 5] * results[0][:, 6] 150 | top_rboxes = results[0][:, :5] 151 | top_polys = rbox2poly(top_rboxes) 152 | top_hbbs = poly2hbb(top_polys) 153 | top_100 = np.argsort(top_conf)[::-1][:self.max_boxes] 154 | top_hbbs = top_hbbs[top_100] 155 | top_conf = top_conf[top_100] 156 | top_label = top_label[top_100] 157 | 158 | for i, c in list(enumerate(top_label)): 159 | predicted_class = self.class_names[int(c)] 160 | hbb = top_hbbs[i] 161 | score = str(top_conf[i]) 162 | 163 | xc, yc, w, h = hbb 164 | left = xc - w/2 165 | top = yc - h/2 166 | right = xc + w/2 167 | bottom = yc + h/2 168 | if predicted_class not in class_names: 169 | continue 170 | 171 | f.write("%s %s %s %s %s %s\n" % (predicted_class, score[:6], str(int(left)), str(int(top)), str(int(right)),str(int(bottom)))) 172 | 173 | f.close() 174 | return 175 | 176 | def on_epoch_end(self, epoch, model_eval): 177 | if epoch % self.period == 0 and self.eval_flag: 178 | self.net = model_eval 179 | if not os.path.exists(self.map_out_path): 180 | os.makedirs(self.map_out_path) 181 | if not os.path.exists(os.path.join(self.map_out_path, "ground-truth")): 182 | os.makedirs(os.path.join(self.map_out_path, "ground-truth")) 183 | if not os.path.exists(os.path.join(self.map_out_path, "detection-results")): 184 | os.makedirs(os.path.join(self.map_out_path, "detection-results")) 185 | print("Get map.") 186 | for annotation_line in tqdm(self.val_lines): 187 | line = annotation_line.split() 188 | image_id = os.path.basename(line[0]).split('.')[0] 189 | #------------------------------# 190 | # 读取图像并转换成RGB图像 191 | #------------------------------# 192 | image = Image.open(line[0]) 193 | #------------------------------# 194 | # 获得预测框 195 | #------------------------------# 196 | gt_boxes = np.array([np.array(list(map(float,box.split(',')))) for box in line[1:]]) 197 | #------------------------------# 198 | # 将polygon转换为hbb 199 | #------------------------------# 200 | hbbs = np.zeros((gt_boxes.shape[0], 5)) 201 | hbbs[..., :4] = poly2hbb(gt_boxes[..., :8]) 202 | hbbs[..., 4] = gt_boxes[..., 8] 203 | #------------------------------# 204 | # 获得预测txt 205 | #------------------------------# 206 | self.get_map_txt(image_id, image, self.class_names, self.map_out_path) 207 | 208 | #------------------------------# 209 | # 获得真实框txt 210 | #------------------------------# 211 | with open(os.path.join(self.map_out_path, "ground-truth/"+image_id+".txt"), "w") as new_f: 212 | for hbb in hbbs: 213 | xc, yc, w, h, obj = hbb 214 | left = xc - w/2 215 | top = yc - h/2 216 | right = xc + w/2 217 | bottom = yc + h/2 218 | obj_name = self.class_names[int(obj)] 219 | new_f.write("%s %s %s %s %s\n" % (obj_name, left, top, right, bottom)) 220 | 221 | 
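For mAP, `get_map_txt()` above reduces each rotated prediction (cx, cy, w, h, θ) to a polygon with `rbox2poly` and then to an axis-aligned envelope with `poly2hbb`, and `on_epoch_end()` applies the same reduction to the ground-truth polygons, so both sides of the evaluation are compared as horizontal boxes. utils_rbox.py is not reproduced in this listing, so the NumPy sketch below only illustrates the geometry those helpers presumably implement; the corner ordering and angle convention of the real code may differ.

```python
import numpy as np

def rbox2poly_sketch(rboxes):
    """(n, 5) [cx, cy, w, h, theta(rad)] -> (n, 8) corner polygons (illustrative only)."""
    cx, cy, w, h, theta = [rboxes[:, i] for i in range(5)]
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    # half-extent vectors along the box's own width and height directions
    wx, wy = w / 2 * cos_t, w / 2 * sin_t
    hx, hy = -h / 2 * sin_t, h / 2 * cos_t
    corners = [(cx - wx - hx, cy - wy - hy), (cx + wx - hx, cy + wy - hy),
               (cx + wx + hx, cy + wy + hy), (cx - wx + hx, cy - wy + hy)]
    return np.stack([c for xy in corners for c in xy], axis=-1)

def poly2hbb_sketch(polys):
    """(n, 8) polygons -> (n, 4) horizontal boxes [xc, yc, w, h], the axis-aligned envelope."""
    xs, ys = polys[:, 0::2], polys[:, 1::2]
    x_min, x_max, y_min, y_max = xs.min(1), xs.max(1), ys.min(1), ys.max(1)
    return np.stack([(x_min + x_max) / 2, (y_min + y_max) / 2,
                     x_max - x_min, y_max - y_min], axis=-1)

rbox = np.array([[100.0, 80.0, 60.0, 20.0, np.pi / 6]])    # one rotated box
hbb  = poly2hbb_sketch(rbox2poly_sketch(rbox))
print(hbb)   # [xc, yc, w, h] of the enclosing horizontal box, as written to the txt files
```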
print("Calculate Map.") 222 | try: 223 | temp_map = get_coco_map(class_names = self.class_names, path = self.map_out_path)[1] 224 | except: 225 | temp_map = get_map(self.MINOVERLAP, False, path = self.map_out_path) 226 | self.maps.append(temp_map) 227 | self.epoches.append(epoch) 228 | 229 | with open(os.path.join(self.log_dir, "epoch_map.txt"), 'a') as f: 230 | f.write(str(temp_map)) 231 | f.write("\n") 232 | 233 | plt.figure() 234 | plt.plot(self.epoches, self.maps, 'red', linewidth = 2, label='train map') 235 | 236 | plt.grid(True) 237 | plt.xlabel('Epoch') 238 | plt.ylabel('Map %s'%str(self.MINOVERLAP)) 239 | plt.title('A Map Curve') 240 | plt.legend(loc="upper right") 241 | 242 | plt.savefig(os.path.join(self.log_dir, "epoch_map.png")) 243 | plt.cla() 244 | plt.close("all") 245 | 246 | print("Get map done.") 247 | shutil.rmtree(self.map_out_path) 248 | -------------------------------------------------------------------------------- /utils/dataloader.py: -------------------------------------------------------------------------------- 1 | from random import sample, shuffle 2 | 3 | import cv2 4 | import numpy as np 5 | import torch 6 | from PIL import Image, ImageDraw 7 | from torch.utils.data.dataset import Dataset 8 | 9 | from utils.utils import cvtColor, preprocess_input 10 | from utils.utils_rbox import poly2rbox, rbox2poly 11 | 12 | class YoloDataset(Dataset): 13 | def __init__(self, annotation_lines, input_shape, num_classes, anchors, anchors_mask, epoch_length, \ 14 | mosaic, mixup, mosaic_prob, mixup_prob, train, special_aug_ratio = 0.7): 15 | super(YoloDataset, self).__init__() 16 | self.annotation_lines = annotation_lines 17 | self.input_shape = input_shape 18 | self.num_classes = num_classes 19 | self.anchors = anchors 20 | self.anchors_mask = anchors_mask 21 | self.epoch_length = epoch_length 22 | self.mosaic = mosaic 23 | self.mosaic_prob = mosaic_prob 24 | self.mixup = mixup 25 | self.mixup_prob = mixup_prob 26 | self.train = train 27 | self.special_aug_ratio = special_aug_ratio 28 | 29 | self.epoch_now = -1 30 | self.length = len(self.annotation_lines) 31 | 32 | self.bbox_attrs = 5 + 1 + num_classes 33 | 34 | def __len__(self): 35 | return self.length 36 | 37 | def __getitem__(self, index): 38 | index = index % self.length 39 | 40 | #---------------------------------------------------# 41 | # 训练时进行数据的随机增强 42 | # 验证时不进行数据的随机增强 43 | #---------------------------------------------------# 44 | if self.mosaic and self.rand() < self.mosaic_prob and self.epoch_now < self.epoch_length * self.special_aug_ratio: 45 | lines = sample(self.annotation_lines, 3) 46 | lines.append(self.annotation_lines[index]) 47 | shuffle(lines) 48 | image, rbox = self.get_random_data_with_Mosaic(lines, self.input_shape) 49 | 50 | if self.mixup and self.rand() < self.mixup_prob: 51 | lines = sample(self.annotation_lines, 1) 52 | image_2, rbox_2 = self.get_random_data(lines[0], self.input_shape, random = self.train) 53 | image, rbox = self.get_random_data_with_MixUp(image, rbox, image_2, rbox_2) 54 | else: 55 | image, rbox = self.get_random_data(self.annotation_lines[index], self.input_shape, random = self.train) 56 | 57 | image = np.transpose(preprocess_input(np.array(image, dtype=np.float32)), (2, 0, 1)) 58 | rbox = np.array(rbox, dtype=np.float32) 59 | 60 | #---------------------------------------------------# 61 | # 对真实框进行预处理 62 | #---------------------------------------------------# 63 | nL = len(rbox) 64 | labels_out = np.zeros((nL, 7)) 65 | if nL: 66 | 
#---------------------------------------------------# 67 | # 对真实框进行归一化,调整到0-1之间 68 | #---------------------------------------------------# 69 | rbox[:, [0, 2]] = rbox[:, [0, 2]] / self.input_shape[1] 70 | rbox[:, [1, 3]] = rbox[:, [1, 3]] / self.input_shape[0] 71 | #---------------------------------------------------# 72 | #---------------------------------------------------# 73 | # 调整顺序,符合训练的格式 74 | # labels_out中序号为0的部分在collect时处理 75 | #---------------------------------------------------# 76 | labels_out[:, 1] = rbox[:, -1] 77 | labels_out[:, 2:] = rbox[:, :5] 78 | 79 | return image, labels_out 80 | 81 | def rand(self, a=0, b=1): 82 | return np.random.rand()*(b-a) + a 83 | 84 | def get_random_data(self, annotation_line, input_shape, jitter=.3, hue=.1, sat=0.7, val=0.4, random=True, show=False): 85 | line = annotation_line.split() 86 | #------------------------------# 87 | # 读取图像并转换成RGB图像 88 | #------------------------------# 89 | image = Image.open(line[0]) 90 | image = cvtColor(image) 91 | #------------------------------# 92 | # 获得图像的高宽与目标高宽 93 | #------------------------------# 94 | iw, ih = image.size 95 | h, w = input_shape 96 | #------------------------------# 97 | # 获得预测框 98 | #------------------------------# 99 | box = np.array([np.array(list(map(float,box.split(',')))) for box in line[1:]]) 100 | 101 | if not random: 102 | scale = min(w/iw, h/ih) 103 | nw = int(iw*scale) 104 | nh = int(ih*scale) 105 | dx = (w-nw)//2 106 | dy = (h-nh)//2 107 | 108 | #---------------------------------# 109 | # 将图像多余的部分加上灰条 110 | #---------------------------------# 111 | image = image.resize((nw,nh), Image.BICUBIC) 112 | new_image = Image.new('RGB', (w,h), (128,128,128)) 113 | new_image.paste(image, (dx, dy)) 114 | image_data = np.array(new_image, np.float32) 115 | 116 | #---------------------------------# 117 | # 对真实框进行调整 118 | #---------------------------------# 119 | if len(box)>0: 120 | np.random.shuffle(box) 121 | box[:, [0,2,4,6]] = box[:, [0,2,4,6]]*nw/iw + dx 122 | box[:, [1,3,5,7]] = box[:, [1,3,5,7]]*nh/ih + dy 123 | #------------------------------# 124 | # 将polygon转换为rbox 125 | #------------------------------# 126 | rbox = np.zeros((box.shape[0], 6)) 127 | rbox[..., :5] = poly2rbox(box[..., :8]) 128 | rbox[..., 5] = box[..., 8] 129 | keep = (rbox[:, 0] >= 0) & (rbox[:, 0] < w) \ 130 | & (rbox[:, 1] >= 0) & (rbox[:, 0] < h) \ 131 | & (rbox[:, 2] > 5) | (rbox[:, 3] > 5) 132 | rbox = rbox[keep] 133 | return image_data, rbox 134 | 135 | #------------------------------------------# 136 | # 对图像进行缩放并且进行长和宽的扭曲 137 | #------------------------------------------# 138 | new_ar = iw/ih * self.rand(1-jitter,1+jitter) / self.rand(1-jitter,1+jitter) 139 | scale = self.rand(.25, 2) 140 | if new_ar < 1: 141 | nh = int(scale*h) 142 | nw = int(nh*new_ar) 143 | else: 144 | nw = int(scale*w) 145 | nh = int(nw/new_ar) 146 | image = image.resize((nw,nh), Image.BICUBIC) 147 | 148 | #------------------------------------------# 149 | # 将图像多余的部分加上灰条 150 | #------------------------------------------# 151 | dx = int(self.rand(0, w-nw)) 152 | dy = int(self.rand(0, h-nh)) 153 | new_image = Image.new('RGB', (w,h), (128,128,128)) 154 | new_image.paste(image, (dx, dy)) 155 | image = new_image 156 | #------------------------------------------# 157 | # 翻转图像 158 | #------------------------------------------# 159 | flip = self.rand()<.5 160 | if flip: image = image.transpose(Image.FLIP_LEFT_RIGHT) 161 | 162 | image_data = np.array(image, np.uint8) 163 | #---------------------------------# 164 | # 对图像进行色域变换 165 | # 计算色域变换的参数 166 
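`get_random_data()` above converts each 8-value polygon label into a rotated box with `poly2rbox` before filtering. Since utils_rbox.py is not shown in this listing, the snippet below is only one common way to perform that conversion (via `cv2.minAreaRect`), not the repository's implementation; the actual long-edge and angle-range convention used by `poly2rbox` may differ.

```python
import cv2
import numpy as np

def poly2rbox_sketch(poly):
    """poly: (8,) [x1, y1, ..., x4, y4] -> (cx, cy, w, h, theta) with theta in radians.
    Hedged illustration only, not the repo's utils_rbox.poly2rbox."""
    pts = np.asarray(poly, dtype=np.float32).reshape(4, 2)
    (cx, cy), (w, h), angle_deg = cv2.minAreaRect(pts)
    # OpenCV reports the angle in degrees; the rest of this repo (KLD loss, rotated NMS)
    # works in radians, so convert here.
    return cx, cy, w, h, np.deg2rad(angle_deg)

print(poly2rbox_sketch([0, 0, 40, 0, 40, 20, 0, 20]))   # an axis-aligned 40x20 rectangle
```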
| #---------------------------------# 167 | r = np.random.uniform(-1, 1, 3) * [hue, sat, val] + 1 168 | #---------------------------------# 169 | # 将图像转到HSV上 170 | #---------------------------------# 171 | hue, sat, val = cv2.split(cv2.cvtColor(image_data, cv2.COLOR_RGB2HSV)) 172 | dtype = image_data.dtype 173 | #---------------------------------# 174 | # 应用变换 175 | #---------------------------------# 176 | x = np.arange(0, 256, dtype=r.dtype) 177 | lut_hue = ((x * r[0]) % 180).astype(dtype) 178 | lut_sat = np.clip(x * r[1], 0, 255).astype(dtype) 179 | lut_val = np.clip(x * r[2], 0, 255).astype(dtype) 180 | 181 | image_data = cv2.merge((cv2.LUT(hue, lut_hue), cv2.LUT(sat, lut_sat), cv2.LUT(val, lut_val))) 182 | image_data = cv2.cvtColor(image_data, cv2.COLOR_HSV2RGB) 183 | #---------------------------------# 184 | # 对真实框进行调整 185 | #---------------------------------# 186 | if len(box)>0: 187 | np.random.shuffle(box) 188 | box[:, [0,2,4,6]] = box[:, [0,2,4,6]]*nw/iw + dx 189 | box[:, [1,3,5,7]] = box[:, [1,3,5,7]]*nh/ih + dy 190 | if flip: box[:, [0,2,4,6]] = w - box[:, [0,2,4,6]] 191 | #------------------------------# 192 | # 将polygon转换为rbox 193 | #------------------------------# 194 | rbox = np.zeros((box.shape[0], 6)) 195 | rbox[..., :5] = poly2rbox(box[..., :8]) 196 | rbox[..., 5] = box[..., 8] 197 | keep = (rbox[:, 0] >= 0) & (rbox[:, 0] < w) \ 198 | & (rbox[:, 1] >= 0) & (rbox[:, 0] < h) \ 199 | & (rbox[:, 2] > 5) | (rbox[:, 3] > 5) 200 | rbox = rbox[keep] 201 | #------------------------------# 202 | # 检查旋转框 203 | #------------------------------# 204 | if show: 205 | draw = ImageDraw.Draw(image) 206 | polys = rbox2poly(rbox[..., :5]) 207 | for poly in polys: 208 | draw.polygon(xy=list(poly)) 209 | image.show() 210 | return image_data, rbox 211 | 212 | def merge_rboxes(self, rboxes, cutx, cuty): 213 | merge_rbox = [] 214 | for i in range(len(rboxes)): 215 | for rbox in rboxes[i]: 216 | tmp_rbox = [] 217 | xc, yc, w, h = rbox[0], rbox[1], rbox[2], rbox[3] 218 | tmp_rbox.append(xc) 219 | tmp_rbox.append(yc) 220 | tmp_rbox.append(h) 221 | tmp_rbox.append(w) 222 | tmp_rbox.append(rbox[-1]) 223 | merge_rbox.append(rbox) 224 | merge_rbox = np.array(merge_rbox) 225 | return merge_rbox 226 | 227 | def get_random_data_with_Mosaic(self, annotation_line, input_shape, jitter=0.3, hue=.1, sat=0.7, val=0.4, show=False): 228 | h, w = input_shape 229 | min_offset_x = self.rand(0.3, 0.7) 230 | min_offset_y = self.rand(0.3, 0.7) 231 | 232 | image_datas = [] 233 | rbox_datas = [] 234 | index = 0 235 | for line in annotation_line: 236 | #---------------------------------# 237 | # 每一行进行分割 238 | #---------------------------------# 239 | line_content = line.split() 240 | #---------------------------------# 241 | # 打开图片 242 | #---------------------------------# 243 | image = Image.open(line_content[0]) 244 | image = cvtColor(image) 245 | 246 | #---------------------------------# 247 | # 图片的大小 248 | #---------------------------------# 249 | iw, ih = image.size 250 | #---------------------------------# 251 | # 保存框的位置 252 | #---------------------------------# 253 | box = np.array([np.array(list(map(float,box.split(',')))) for box in line_content[1:]]) 254 | #---------------------------------# 255 | # 是否翻转图片 256 | #---------------------------------# 257 | flip = self.rand()<.5 258 | if flip and len(box)>0: 259 | image = image.transpose(Image.FLIP_LEFT_RIGHT) 260 | box[:, [0,2,4,6]] = iw - box[:, [0,2,4,6]] 261 | #------------------------------------------# 262 | # 对图像进行缩放并且进行长和宽的扭曲 263 | 
#------------------------------------------# 264 | new_ar = iw/ih * self.rand(1-jitter,1+jitter) / self.rand(1-jitter,1+jitter) 265 | scale = self.rand(.4, 1) 266 | if new_ar < 1: 267 | nh = int(scale*h) 268 | nw = int(nh*new_ar) 269 | else: 270 | nw = int(scale*w) 271 | nh = int(nw/new_ar) 272 | image = image.resize((nw, nh), Image.BICUBIC) 273 | 274 | #-----------------------------------------------# 275 | # 将图片进行放置,分别对应四张分割图片的位置 276 | #-----------------------------------------------# 277 | if index == 0: 278 | dx = int(w*min_offset_x) - nw 279 | dy = int(h*min_offset_y) - nh 280 | elif index == 1: 281 | dx = int(w*min_offset_x) - nw 282 | dy = int(h*min_offset_y) 283 | elif index == 2: 284 | dx = int(w*min_offset_x) 285 | dy = int(h*min_offset_y) 286 | elif index == 3: 287 | dx = int(w*min_offset_x) 288 | dy = int(h*min_offset_y) - nh 289 | 290 | new_image = Image.new('RGB', (w,h), (128,128,128)) 291 | new_image.paste(image, (dx, dy)) 292 | image_data = np.array(new_image) 293 | 294 | index = index + 1 295 | rbox_data = [] 296 | #---------------------------------# 297 | # 对rbox进行重新处理 298 | #---------------------------------# 299 | if len(box)>0: 300 | np.random.shuffle(box) 301 | box[:, [0,2,4,6]] = box[:, [0,2,4,6]]*nw/iw + dx 302 | box[:, [1,3,5,7]] = box[:, [1,3,5,7]]*nh/ih + dy 303 | #------------------------------# 304 | # 将polygon转换为rbox 305 | #------------------------------# 306 | rbox = np.zeros((box.shape[0], 6)) 307 | rbox[..., :5] = poly2rbox(box[..., :8]) 308 | rbox[..., 5] = box[..., 8] 309 | keep = (rbox[:, 0] >= 0) & (rbox[:, 0] < w) \ 310 | & (rbox[:, 1] >= 0) & (rbox[:, 0] < h) \ 311 | & (rbox[:, 2] > 5) | (rbox[:, 3] > 5) 312 | rbox = rbox[keep] 313 | rbox_data = np.zeros((len(rbox),6)) 314 | rbox_data[:len(rbox)] = rbox 315 | 316 | image_datas.append(image_data) 317 | rbox_datas.append(rbox_data) 318 | 319 | #---------------------------------# 320 | # 将图片分割,放在一起 321 | #---------------------------------# 322 | cutx = int(w * min_offset_x) 323 | cuty = int(h * min_offset_y) 324 | 325 | new_image = np.zeros([h, w, 3]) 326 | new_image[:cuty, :cutx, :] = image_datas[0][:cuty, :cutx, :] 327 | new_image[cuty:, :cutx, :] = image_datas[1][cuty:, :cutx, :] 328 | new_image[cuty:, cutx:, :] = image_datas[2][cuty:, cutx:, :] 329 | new_image[:cuty, cutx:, :] = image_datas[3][:cuty, cutx:, :] 330 | 331 | new_image = np.array(new_image, np.uint8) 332 | #---------------------------------# 333 | # 对图像进行色域变换 334 | # 计算色域变换的参数 335 | #---------------------------------# 336 | r = np.random.uniform(-1, 1, 3) * [hue, sat, val] + 1 337 | #---------------------------------# 338 | # 将图像转到HSV上 339 | #---------------------------------# 340 | hue, sat, val = cv2.split(cv2.cvtColor(new_image, cv2.COLOR_RGB2HSV)) 341 | dtype = new_image.dtype 342 | #---------------------------------# 343 | # 应用变换 344 | #---------------------------------# 345 | x = np.arange(0, 256, dtype=r.dtype) 346 | lut_hue = ((x * r[0]) % 180).astype(dtype) 347 | lut_sat = np.clip(x * r[1], 0, 255).astype(dtype) 348 | lut_val = np.clip(x * r[2], 0, 255).astype(dtype) 349 | 350 | new_image = cv2.merge((cv2.LUT(hue, lut_hue), cv2.LUT(sat, lut_sat), cv2.LUT(val, lut_val))) 351 | new_image = cv2.cvtColor(new_image, cv2.COLOR_HSV2RGB) 352 | 353 | #---------------------------------# 354 | # 对框进行进一步的处理 355 | #---------------------------------# 356 | new_rboxes = self.merge_rboxes(rbox_datas, cutx, cuty) 357 | #---------------------------------# 358 | # 检查旋转框 359 | #---------------------------------# 360 | if show: 361 | new_img = 
Image.fromarray(new_image) 362 | draw = ImageDraw.Draw(new_img) 363 | polys = rbox2poly(new_rboxes[..., :5]) 364 | for poly in polys: 365 | draw.polygon(xy=list(poly)) 366 | new_img.show() 367 | return new_image, new_rboxes 368 | 369 | def get_random_data_with_MixUp(self, image_1, rbox_1, image_2, rbox_2): 370 | new_image = np.array(image_1, np.float32) * 0.5 + np.array(image_2, np.float32) * 0.5 371 | if len(rbox_1) == 0: 372 | new_rboxes = rbox_2 373 | elif len(rbox_2) == 0: 374 | new_rboxes = rbox_1 375 | else: 376 | new_rboxes = np.concatenate([rbox_1, rbox_2], axis=0) 377 | return new_image, new_rboxes 378 | 379 | 380 | # DataLoader中collate_fn使用 381 | def yolo_dataset_collate(batch): 382 | images = [] 383 | bboxes = [] 384 | for i, (img, box) in enumerate(batch): 385 | images.append(img) 386 | box[:, 0] = i 387 | bboxes.append(box) 388 | 389 | images = torch.from_numpy(np.array(images)).type(torch.FloatTensor) 390 | bboxes = torch.from_numpy(np.concatenate(bboxes, 0)).type(torch.FloatTensor) 391 | return images, bboxes 392 | -------------------------------------------------------------------------------- /utils/kld_loss.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Author: [egrt] 3 | Date: 2023-01-30 18:47:24 4 | LastEditors: Egrt 5 | LastEditTime: 2023-05-26 15:01:10 6 | Description: 7 | ''' 8 | import torch 9 | import torch.nn as nn 10 | 11 | class KLDloss(nn.Module): 12 | 13 | def __init__(self, taf=1.0, fun="sqrt"): 14 | super(KLDloss, self).__init__() 15 | self.fun = fun 16 | self.taf = taf 17 | self.eps = 1e-8 18 | 19 | def forward(self, pred, target): # pred [[x,y,w,h,angle], ...] 20 | #assert pred.shape[0] == target.shape[0] 21 | 22 | pred = pred.view(-1, 5) 23 | target = target.view(-1, 5) 24 | 25 | delta_x = pred[:, 0] - target[:, 0] 26 | delta_y = pred[:, 1] - target[:, 1] 27 | pre_angle_radian = pred[:, 4] 28 | targrt_angle_radian = target[:, 4] 29 | delta_angle_radian = pre_angle_radian - targrt_angle_radian 30 | 31 | kld = 0.5 * ( 32 | 4 * torch.pow( ( delta_x.mul(torch.cos(targrt_angle_radian)) + delta_y.mul(torch.sin(targrt_angle_radian)) ), 2) / torch.pow(target[:, 2], 2) 33 | + 4 * torch.pow( ( delta_y.mul(torch.cos(targrt_angle_radian)) - delta_x.mul(torch.sin(targrt_angle_radian)) ), 2) / torch.pow(target[:, 3], 2) 34 | )\ 35 | + 0.5 * ( 36 | torch.pow(pred[:, 3], 2) / torch.pow(target[:, 2], 2) * torch.pow(torch.sin(delta_angle_radian), 2) 37 | + torch.pow(pred[:, 2], 2) / torch.pow(target[:, 3], 2) * torch.pow(torch.sin(delta_angle_radian), 2) 38 | + torch.pow(pred[:, 3], 2) / torch.pow(target[:, 3], 2) * torch.pow(torch.cos(delta_angle_radian), 2) 39 | + torch.pow(pred[:, 2], 2) / torch.pow(target[:, 2], 2) * torch.pow(torch.cos(delta_angle_radian), 2) 40 | )\ 41 | + 0.5 * ( 42 | torch.log(torch.pow(target[:, 3], 2) / torch.pow(pred[:, 3], 2)) 43 | + torch.log(torch.pow(target[:, 2], 2) / torch.pow(pred[:, 2], 2)) 44 | )\ 45 | - 1.0 46 | 47 | 48 | 49 | if self.fun == "sqrt": 50 | kld = kld.clamp(1e-7).sqrt() 51 | elif self.fun == "log1p": 52 | kld = torch.log1p(kld.clamp(1e-7)) 53 | else: 54 | pass 55 | 56 | kld_loss = 1 - 1 / (self.taf + self.eps + kld) 57 | 58 | return kld_loss 59 | 60 | def compute_kld_loss(targets, preds,taf=1.0,fun='sqrt'): 61 | with torch.no_grad(): 62 | kld_loss_ts_ps = torch.zeros(0, preds.shape[0], device=targets.device) 63 | for target in targets: 64 | target = target.unsqueeze(0).repeat(preds.shape[0], 1) 65 | kld_loss_t_p = kld_loss(preds, target,taf=taf, fun=fun) 66 | kld_loss_ts_ps 
= torch.cat((kld_loss_ts_ps, kld_loss_t_p.unsqueeze(0)), dim=0) 67 | return kld_loss_ts_ps 68 | 69 | 70 | def kld_loss(pred, target, taf=1.0, fun='sqrt'): # pred [[x,y,w,h,angle], ...] 71 | #assert pred.shape[0] == target.shape[0] 72 | 73 | pred = pred.view(-1, 5) 74 | target = target.view(-1, 5) 75 | 76 | delta_x = pred[:, 0] - target[:, 0] 77 | delta_y = pred[:, 1] - target[:, 1] 78 | pre_angle_radian = pred[:, 4] #3.141592653589793 * pred[:, 4] / 180.0 79 | targrt_angle_radian = target[:, 4] #3.141592653589793 * target[:, 4] / 180.0 80 | delta_angle_radian = pre_angle_radian - targrt_angle_radian 81 | 82 | kld = 0.5 * ( 83 | 4 * torch.pow((delta_x.mul(torch.cos(targrt_angle_radian)) + delta_y.mul(torch.sin(targrt_angle_radian))), 84 | 2) / torch.pow(target[:, 2], 2) 85 | + 4 * torch.pow((delta_y.mul(torch.cos(targrt_angle_radian)) - delta_x.mul(torch.sin(targrt_angle_radian))), 86 | 2) / torch.pow(target[:, 3], 2) 87 | ) \ 88 | + 0.5 * ( 89 | torch.pow(pred[:, 3], 2) / torch.pow(target[:, 2], 2) * torch.pow(torch.sin(delta_angle_radian), 2) 90 | + torch.pow(pred[:, 2], 2) / torch.pow(target[:, 3], 2) * torch.pow(torch.sin(delta_angle_radian), 2) 91 | + torch.pow(pred[:, 3], 2) / torch.pow(target[:, 3], 2) * torch.pow(torch.cos(delta_angle_radian), 2) 92 | + torch.pow(pred[:, 2], 2) / torch.pow(target[:, 2], 2) * torch.pow(torch.cos(delta_angle_radian), 2) 93 | ) \ 94 | + 0.5 * ( 95 | torch.log(torch.pow(target[:, 3], 2) / torch.pow(pred[:, 3], 2)) 96 | + torch.log(torch.pow(target[:, 2], 2) / torch.pow(pred[:, 2], 2)) 97 | ) \ 98 | - 1.0 99 | 100 | if fun == "sqrt": 101 | kld = kld.clamp(1e-7).sqrt() 102 | elif fun == "log1p": 103 | kld = torch.log1p(kld.clamp(1e-7)) 104 | else: 105 | pass 106 | 107 | kld_loss = 1 - 1 / (taf + kld) 108 | return kld_loss 109 | 110 | if __name__ == '__main__': 111 | ''' 112 | 测试损失函数 113 | ''' 114 | kld_loss_n = KLDloss(alpha=1,fun='log1p') 115 | pred = torch.tensor([[5, 5, 5, 23, 0.15],[6,6,5,28,0]]).type(torch.float32) 116 | target = torch.tensor([[5, 5, 5, 24, 0],[6,6,5,28,0]]).type(torch.float32) 117 | kld = kld_loss_n(target,pred) -------------------------------------------------------------------------------- /utils/nms_rotated/__init__.py: -------------------------------------------------------------------------------- 1 | from .nms_rotated_wrapper import obb_nms, poly_nms 2 | 3 | __all__ = ['obb_nms', 'poly_nms'] 4 | -------------------------------------------------------------------------------- /utils/nms_rotated/nms_rotated_wrapper.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | 4 | from . import nms_rotated_ext 5 | 6 | def obb_nms(dets, scores, iou_thr, device_id=None): 7 | """ 8 | RIoU NMS - iou_thr. 
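Note that the quick test under `if __name__ == '__main__':` in kld_loss.py above constructs the loss with `KLDloss(alpha=1, fun='log1p')`, but the constructor only accepts `taf` and `fun`, so that call would raise a TypeError. A minimal invocation consistent with the signatures actually defined above:

```python
import torch
from utils.kld_loss import KLDloss   # path as laid out in this repository

kld_loss_fn = KLDloss(taf=1.0, fun='log1p')   # the keyword is 'taf', not 'alpha'

# (cx, cy, w, h, theta-in-radians) boxes, matching forward(pred, target)
pred   = torch.tensor([[5., 5., 5., 23., 0.15], [6., 6., 5., 28., 0.]])
target = torch.tensor([[5., 5., 5., 24., 0.00], [6., 6., 5., 28., 0.]])

loss = kld_loss_fn(pred, target)   # per-box values in (0, 1); smaller means closer boxes
print(loss)
```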
9 | Args: 10 | dets (tensor/array): (num, [cx cy w h θ]) θ∈[-pi/2, pi/2) 11 | scores (tensor/array): (num) 12 | iou_thr (float): (1) 13 | Returns: 14 | dets (tensor): (n_nms, [cx cy w h θ]) 15 | inds (tensor): (n_nms), nms index of dets 16 | """ 17 | if isinstance(dets, torch.Tensor): 18 | is_numpy = False 19 | dets_th = dets 20 | elif isinstance(dets, np.ndarray): 21 | is_numpy = True 22 | device = 'cpu' if device_id is None else f'cuda:{device_id}' 23 | dets_th = torch.from_numpy(dets).to(device) 24 | else: 25 | raise TypeError('dets must be eithr a Tensor or numpy array, ' 26 | f'but got {type(dets)}') 27 | 28 | if dets_th.numel() == 0: # len(dets) 29 | inds = dets_th.new_zeros(0, dtype=torch.int64) 30 | else: 31 | # same bug will happen when bboxes is too small 32 | too_small = dets_th[:, [2, 3]].min(1)[0] < 0.001 # [n] 33 | if too_small.all(): # all the bboxes is too small 34 | inds = dets_th.new_zeros(0, dtype=torch.int64) 35 | else: 36 | ori_inds = torch.arange(dets_th.size(0)) # 0 ~ n-1 37 | ori_inds = ori_inds[~too_small] 38 | dets_th = dets_th[~too_small] # (n_filter, 5) 39 | scores = scores[~too_small] 40 | 41 | inds = nms_rotated_ext.nms_rotated(dets_th, scores, iou_thr) 42 | inds = ori_inds[inds] 43 | 44 | if is_numpy: 45 | inds = inds.cpu().numpy() 46 | return dets[inds, :], inds 47 | 48 | 49 | def poly_nms(dets, iou_thr, device_id=None): 50 | if isinstance(dets, torch.Tensor): 51 | is_numpy = False 52 | dets_th = dets 53 | elif isinstance(dets, np.ndarray): 54 | is_numpy = True 55 | device = 'cpu' if device_id is None else f'cuda:{device_id}' 56 | dets_th = torch.from_numpy(dets).to(device) 57 | else: 58 | raise TypeError('dets must be eithr a Tensor or numpy array, ' 59 | f'but got {type(dets)}') 60 | 61 | if dets_th.device == torch.device('cpu'): 62 | raise NotImplementedError 63 | inds = nms_rotated_ext.nms_poly(dets_th.float(), iou_thr) 64 | 65 | if is_numpy: 66 | inds = inds.cpu().numpy() 67 | return dets[inds, :], inds 68 | 69 | if __name__ == '__main__': 70 | rboxes_opencv = torch.tensor(([136.6, 111.6, 200, 100, -60], 71 | [136.6, 111.6, 100, 200, -30], 72 | [100, 100, 141.4, 141.4, -45], 73 | [100, 100, 141.4, 141.4, -45])) 74 | rboxes_longedge = torch.tensor(([136.6, 111.6, 200, 100, -60], 75 | [136.6, 111.6, 200, 100, 120], 76 | [100, 100, 141.4, 141.4, 45], 77 | [100, 100, 141.4, 141.4, 135])) 78 | -------------------------------------------------------------------------------- /utils/nms_rotated/setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | import os 3 | import subprocess 4 | import time 5 | from setuptools import find_packages, setup 6 | 7 | import torch 8 | from torch.utils.cpp_extension import (BuildExtension, CppExtension, 9 | CUDAExtension) 10 | def make_cuda_ext(name, module, sources, sources_cuda=[]): 11 | 12 | define_macros = [] 13 | extra_compile_args = {'cxx': []} 14 | 15 | if torch.cuda.is_available() or os.getenv('FORCE_CUDA', '0') == '1': 16 | define_macros += [('WITH_CUDA', None)] 17 | extension = CUDAExtension 18 | extra_compile_args['nvcc'] = [ 19 | '-D__CUDA_NO_HALF_OPERATORS__', 20 | '-D__CUDA_NO_HALF_CONVERSIONS__', 21 | '-D__CUDA_NO_HALF2_OPERATORS__', 22 | ] 23 | sources += sources_cuda 24 | else: 25 | print(f'Compiling {name} without CUDA') 26 | extension = CppExtension 27 | # raise EnvironmentError('CUDA is required to compile MMDetection!') 28 | 29 | return extension( 30 | name=f'{module}.{name}', 31 | sources=[os.path.join(*module.split('.'), p) for p in sources], 
32 | define_macros=define_macros, 33 | extra_compile_args=extra_compile_args) 34 | 35 | # python setup.py develop 36 | if __name__ == '__main__': 37 | #write_version_py() 38 | setup( 39 | name='nms_rotated', 40 | ext_modules=[ 41 | make_cuda_ext( 42 | name='nms_rotated_ext', 43 | module='', 44 | sources=[ 45 | 'src/nms_rotated_cpu.cpp', 46 | 'src/nms_rotated_ext.cpp' 47 | ], 48 | sources_cuda=[ 49 | 'src/nms_rotated_cuda.cu', 50 | 'src/poly_nms_cuda.cu', 51 | ]), 52 | ], 53 | cmdclass={'build_ext': BuildExtension}, 54 | zip_safe=False) -------------------------------------------------------------------------------- /utils/nms_rotated/src/box_iou_rotated_utils.h: -------------------------------------------------------------------------------- 1 | // Mortified from 2 | // https://github.com/facebookresearch/detectron2/tree/master/detectron2/layers/csrc/box_iou_rotated 3 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved 4 | #pragma once 5 | 6 | #include 7 | #include 8 | 9 | #if defined(__CUDACC__) || __HCC__ == 1 || __HIP__ == 1 10 | // Designates functions callable from the host (CPU) and the device (GPU) 11 | #define HOST_DEVICE __host__ __device__ 12 | #define HOST_DEVICE_INLINE HOST_DEVICE __forceinline__ 13 | #else 14 | #include 15 | #define HOST_DEVICE 16 | #define HOST_DEVICE_INLINE HOST_DEVICE inline 17 | #endif 18 | 19 | 20 | template 21 | struct RotatedBox { 22 | T x_ctr, y_ctr, w, h, a; 23 | }; 24 | 25 | template 26 | struct Point { 27 | T x, y; 28 | HOST_DEVICE_INLINE Point(const T& px = 0, const T& py = 0) : x(px), y(py) {} 29 | HOST_DEVICE_INLINE Point operator+(const Point& p) const { 30 | return Point(x + p.x, y + p.y); 31 | } 32 | HOST_DEVICE_INLINE Point& operator+=(const Point& p) { 33 | x += p.x; 34 | y += p.y; 35 | return *this; 36 | } 37 | HOST_DEVICE_INLINE Point operator-(const Point& p) const { 38 | return Point(x - p.x, y - p.y); 39 | } 40 | HOST_DEVICE_INLINE Point operator*(const T coeff) const { 41 | return Point(x * coeff, y * coeff); 42 | } 43 | }; 44 | 45 | template 46 | HOST_DEVICE_INLINE T dot_2d(const Point& A, const Point& B) { 47 | return A.x * B.x + A.y * B.y; 48 | } 49 | 50 | // R: result type. can be different from input type 51 | template 52 | HOST_DEVICE_INLINE R cross_2d(const Point& A, const Point& B) { 53 | return static_cast(A.x) * static_cast(B.y) - 54 | static_cast(B.x) * static_cast(A.y); 55 | } 56 | 57 | template 58 | HOST_DEVICE_INLINE void get_rotated_vertices( 59 | const RotatedBox& box, 60 | Point (&pts)[4]) { 61 | // M_PI / 180. 
== 0.01745329251 62 | //double theta = box.a * 0.01745329251; ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 63 | double theta = box.a; 64 | T cosTheta2 = (T)cos(theta) * 0.5f; 65 | T sinTheta2 = (T)sin(theta) * 0.5f; 66 | 67 | // y: top --> down; x: left --> right 68 | pts[0].x = box.x_ctr + sinTheta2 * box.h + cosTheta2 * box.w; 69 | pts[0].y = box.y_ctr + cosTheta2 * box.h - sinTheta2 * box.w; 70 | pts[1].x = box.x_ctr - sinTheta2 * box.h + cosTheta2 * box.w; 71 | pts[1].y = box.y_ctr - cosTheta2 * box.h - sinTheta2 * box.w; 72 | pts[2].x = 2 * box.x_ctr - pts[0].x; 73 | pts[2].y = 2 * box.y_ctr - pts[0].y; 74 | pts[3].x = 2 * box.x_ctr - pts[1].x; 75 | pts[3].y = 2 * box.y_ctr - pts[1].y; 76 | } 77 | 78 | template 79 | HOST_DEVICE_INLINE int get_intersection_points( 80 | const Point (&pts1)[4], 81 | const Point (&pts2)[4], 82 | Point (&intersections)[24]) { 83 | // Line vector 84 | // A line from p1 to p2 is: p1 + (p2-p1)*t, t=[0,1] 85 | Point vec1[4], vec2[4]; 86 | for (int i = 0; i < 4; i++) { 87 | vec1[i] = pts1[(i + 1) % 4] - pts1[i]; 88 | vec2[i] = pts2[(i + 1) % 4] - pts2[i]; 89 | } 90 | 91 | // Line test - test all line combos for intersection 92 | int num = 0; // number of intersections 93 | for (int i = 0; i < 4; i++) { 94 | for (int j = 0; j < 4; j++) { 95 | // Solve for 2x2 Ax=b 96 | T det = cross_2d(vec2[j], vec1[i]); 97 | 98 | // This takes care of parallel lines 99 | if (fabs(det) <= 1e-14) { 100 | continue; 101 | } 102 | 103 | auto vec12 = pts2[j] - pts1[i]; 104 | 105 | T t1 = cross_2d(vec2[j], vec12) / det; 106 | T t2 = cross_2d(vec1[i], vec12) / det; 107 | 108 | if (t1 >= 0.0f && t1 <= 1.0f && t2 >= 0.0f && t2 <= 1.0f) { 109 | intersections[num++] = pts1[i] + vec1[i] * t1; 110 | } 111 | } 112 | } 113 | 114 | // Check for vertices of rect1 inside rect2 115 | { 116 | const auto& AB = vec2[0]; 117 | const auto& DA = vec2[3]; 118 | auto ABdotAB = dot_2d(AB, AB); 119 | auto ADdotAD = dot_2d(DA, DA); 120 | for (int i = 0; i < 4; i++) { 121 | // assume ABCD is the rectangle, and P is the point to be judged 122 | // P is inside ABCD iff. P's projection on AB lies within AB 123 | // and P's projection on AD lies within AD 124 | 125 | auto AP = pts1[i] - pts2[0]; 126 | 127 | auto APdotAB = dot_2d(AP, AB); 128 | auto APdotAD = -dot_2d(AP, DA); 129 | 130 | if ((APdotAB >= 0) && (APdotAD >= 0) && (APdotAB <= ABdotAB) && 131 | (APdotAD <= ADdotAD)) { 132 | intersections[num++] = pts1[i]; 133 | } 134 | } 135 | } 136 | 137 | // Reverse the check - check for vertices of rect2 inside rect1 138 | { 139 | const auto& AB = vec1[0]; 140 | const auto& DA = vec1[3]; 141 | auto ABdotAB = dot_2d(AB, AB); 142 | auto ADdotAD = dot_2d(DA, DA); 143 | for (int i = 0; i < 4; i++) { 144 | auto AP = pts2[i] - pts1[0]; 145 | 146 | auto APdotAB = dot_2d(AP, AB); 147 | auto APdotAD = -dot_2d(AP, DA); 148 | 149 | if ((APdotAB >= 0) && (APdotAD >= 0) && (APdotAB <= ABdotAB) && 150 | (APdotAD <= ADdotAD)) { 151 | intersections[num++] = pts2[i]; 152 | } 153 | } 154 | } 155 | 156 | return num; 157 | } 158 | 159 | template 160 | HOST_DEVICE_INLINE int convex_hull_graham( 161 | const Point (&p)[24], 162 | const int& num_in, 163 | Point (&q)[24], 164 | bool shift_to_zero = false) { 165 | assert(num_in >= 2); 166 | 167 | // Step 1: 168 | // Find point with minimum y 169 | // if more than 1 points have the same minimum y, 170 | // pick the one with the minimum x. 
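As an aside, here is a Python rendering of the corner layout produced by `get_rotated_vertices()` above (θ is consumed directly in radians, as the commented-out degree conversion hints), together with the shoelace-style area that `polygon_area()` computes further down in this header for the clipped intersection hull. This is purely illustrative; the IoU used by rotated NMS is computed by the C++/CUDA code in this directory.

```python
import numpy as np

def rotated_vertices(cx, cy, w, h, theta):
    """Corner layout mirroring get_rotated_vertices() in box_iou_rotated_utils.h."""
    cos2, sin2 = 0.5 * np.cos(theta), 0.5 * np.sin(theta)
    p0 = (cx + sin2 * h + cos2 * w, cy + cos2 * h - sin2 * w)
    p1 = (cx - sin2 * h + cos2 * w, cy - cos2 * h - sin2 * w)
    p2 = (2 * cx - p0[0], 2 * cy - p0[1])   # corner opposite p0
    p3 = (2 * cx - p1[0], 2 * cy - p1[1])   # corner opposite p1
    return np.array([p0, p1, p2, p3])

def shoelace_area(pts):
    """Polygon area via the cross-product sum, as polygon_area() does for the hull."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

corners = rotated_vertices(100.0, 100.0, 141.4, 141.4, np.deg2rad(-45))
print(shoelace_area(corners))   # ~19994 = 141.4 * 141.4, independent of the rotation
```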
171 | int t = 0; 172 | for (int i = 1; i < num_in; i++) { 173 | if (p[i].y < p[t].y || (p[i].y == p[t].y && p[i].x < p[t].x)) { 174 | t = i; 175 | } 176 | } 177 | auto& start = p[t]; // starting point 178 | 179 | // Step 2: 180 | // Subtract starting point from every points (for sorting in the next step) 181 | for (int i = 0; i < num_in; i++) { 182 | q[i] = p[i] - start; 183 | } 184 | 185 | // Swap the starting point to position 0 186 | auto tmp = q[0]; 187 | q[0] = q[t]; 188 | q[t] = tmp; 189 | 190 | // Step 3: 191 | // Sort point 1 ~ num_in according to their relative cross-product values 192 | // (essentially sorting according to angles) 193 | // If the angles are the same, sort according to their distance to origin 194 | T dist[24]; 195 | #if defined(__CUDACC__) || __HCC__ == 1 || __HIP__ == 1 196 | // compute distance to origin before sort, and sort them together with the 197 | // points 198 | for (int i = 0; i < num_in; i++) { 199 | dist[i] = dot_2d(q[i], q[i]); 200 | } 201 | 202 | // CUDA version 203 | // In the future, we can potentially use thrust 204 | // for sorting here to improve speed (though not guaranteed) 205 | for (int i = 1; i < num_in - 1; i++) { 206 | for (int j = i + 1; j < num_in; j++) { 207 | T crossProduct = cross_2d(q[i], q[j]); 208 | if ((crossProduct < -1e-6) || 209 | (fabs(crossProduct) < 1e-6 && dist[i] > dist[j])) { 210 | auto q_tmp = q[i]; 211 | q[i] = q[j]; 212 | q[j] = q_tmp; 213 | auto dist_tmp = dist[i]; 214 | dist[i] = dist[j]; 215 | dist[j] = dist_tmp; 216 | } 217 | } 218 | } 219 | #else 220 | // CPU version 221 | std::sort( 222 | q + 1, q + num_in, [](const Point& A, const Point& B) -> bool { 223 | T temp = cross_2d(A, B); 224 | if (fabs(temp) < 1e-6) { 225 | return dot_2d(A, A) < dot_2d(B, B); 226 | } else { 227 | return temp > 0; 228 | } 229 | }); 230 | // compute distance to origin after sort, since the points are now different. 231 | for (int i = 0; i < num_in; i++) { 232 | dist[i] = dot_2d(q[i], q[i]); 233 | } 234 | #endif 235 | 236 | // Step 4: 237 | // Make sure there are at least 2 points (that don't overlap with each other) 238 | // in the stack 239 | int k; // index of the non-overlapped second point 240 | for (k = 1; k < num_in; k++) { 241 | if (dist[k] > 1e-8) { 242 | break; 243 | } 244 | } 245 | if (k == num_in) { 246 | // We reach the end, which means the convex hull is just one point 247 | q[0] = p[t]; 248 | return 1; 249 | } 250 | q[1] = q[k]; 251 | int m = 2; // 2 points in the stack 252 | // Step 5: 253 | // Finally we can start the scanning process. 254 | // When a non-convex relationship between the 3 points is found 255 | // (either concave shape or duplicated points), 256 | // we pop the previous point from the stack 257 | // until the 3-point relationship is convex again, or 258 | // until the stack only contains two points 259 | for (int i = k + 1; i < num_in; i++) { 260 | while (m > 1) { 261 | auto q1 = q[i] - q[m - 2], q2 = q[m - 1] - q[m - 2]; 262 | // cross_2d() uses FMA and therefore computes round(round(q1.x*q2.y) - 263 | // q2.x*q1.y) So it may not return 0 even when q1==q2. Therefore we 264 | // compare round(q1.x*q2.y) and round(q2.x*q1.y) directly. (round means 265 | // round to nearest floating point). 266 | if (q1.x * q2.y >= q2.x * q1.y) 267 | m--; 268 | else 269 | break; 270 | } 271 | // Using double also helps, but float can solve the issue for now. 
272 | // while (m > 1 && cross_2d(q[i] - q[m - 2], q[m - 1] - q[m - 2]) 273 | // >= 0) { 274 | // m--; 275 | // } 276 | q[m++] = q[i]; 277 | } 278 | 279 | // Step 6 (Optional): 280 | // In general sense we need the original coordinates, so we 281 | // need to shift the points back (reverting Step 2) 282 | // But if we're only interested in getting the area/perimeter of the shape 283 | // We can simply return. 284 | if (!shift_to_zero) { 285 | for (int i = 0; i < m; i++) { 286 | q[i] += start; 287 | } 288 | } 289 | 290 | return m; 291 | } 292 | 293 | template 294 | HOST_DEVICE_INLINE T polygon_area(const Point (&q)[24], const int& m) { 295 | if (m <= 2) { 296 | return 0; 297 | } 298 | 299 | T area = 0; 300 | for (int i = 1; i < m - 1; i++) { 301 | area += fabs(cross_2d(q[i] - q[0], q[i + 1] - q[0])); 302 | } 303 | 304 | return area / 2.0; 305 | } 306 | 307 | template 308 | HOST_DEVICE_INLINE T rotated_boxes_intersection( 309 | const RotatedBox& box1, 310 | const RotatedBox& box2) { 311 | // There are up to 4 x 4 + 4 + 4 = 24 intersections (including dups) returned 312 | // from rotated_rect_intersection_pts 313 | Point intersectPts[24], orderedPts[24]; 314 | 315 | Point pts1[4]; 316 | Point pts2[4]; 317 | get_rotated_vertices(box1, pts1); 318 | get_rotated_vertices(box2, pts2); 319 | 320 | int num = get_intersection_points(pts1, pts2, intersectPts); 321 | 322 | if (num <= 2) { 323 | return 0.0; 324 | } 325 | 326 | // Convex Hull to order the intersection points in clockwise order and find 327 | // the contour area. 328 | int num_convex = convex_hull_graham(intersectPts, num, orderedPts, true); 329 | return polygon_area(orderedPts, num_convex); 330 | } 331 | 332 | 333 | template 334 | HOST_DEVICE_INLINE T 335 | single_box_iou_rotated(T const* const box1_raw, T const* const box2_raw) { 336 | // shift center to the middle point to achieve higher precision in result 337 | RotatedBox box1, box2; 338 | auto center_shift_x = (box1_raw[0] + box2_raw[0]) / 2.0; 339 | auto center_shift_y = (box1_raw[1] + box2_raw[1]) / 2.0; 340 | box1.x_ctr = box1_raw[0] - center_shift_x; 341 | box1.y_ctr = box1_raw[1] - center_shift_y; 342 | box1.w = box1_raw[2]; 343 | box1.h = box1_raw[3]; 344 | box1.a = box1_raw[4]; 345 | box2.x_ctr = box2_raw[0] - center_shift_x; 346 | box2.y_ctr = box2_raw[1] - center_shift_y; 347 | box2.w = box2_raw[2]; 348 | box2.h = box2_raw[3]; 349 | box2.a = box2_raw[4]; 350 | 351 | T area1 = box1.w * box1.h; 352 | T area2 = box2.w * box2.h; 353 | if (area1 < 1e-14 || area2 < 1e-14) { 354 | return 0.f; 355 | } 356 | 357 | T intersection = rotated_boxes_intersection(box1, box2); 358 | T iou = intersection / (area1 + area2 - intersection); 359 | return iou; 360 | } 361 | -------------------------------------------------------------------------------- /utils/nms_rotated/src/nms_rotated_cpu.cpp: -------------------------------------------------------------------------------- 1 | // Modified from 2 | // https://github.com/facebookresearch/detectron2/tree/master/detectron2/layers/csrc/nms_rotated 3 | // Copyright (c) Facebook, Inc. and its affiliates. 
All Rights Reserved 4 | #include 5 | #include "box_iou_rotated_utils.h" 6 | 7 | 8 | template 9 | at::Tensor nms_rotated_cpu_kernel( 10 | const at::Tensor& dets, 11 | const at::Tensor& scores, 12 | const float iou_threshold) { 13 | // nms_rotated_cpu_kernel is modified from torchvision's nms_cpu_kernel, 14 | // however, the code in this function is much shorter because 15 | // we delegate the IoU computation for rotated boxes to 16 | // the single_box_iou_rotated function in box_iou_rotated_utils.h 17 | AT_ASSERTM(dets.device().is_cpu(), "dets must be a CPU tensor"); 18 | AT_ASSERTM(scores.device().is_cpu(), "scores must be a CPU tensor"); 19 | AT_ASSERTM( 20 | dets.scalar_type() == scores.scalar_type(), 21 | "dets should have the same type as scores"); 22 | 23 | if (dets.numel() == 0) { 24 | return at::empty({0}, dets.options().dtype(at::kLong)); 25 | } 26 | 27 | auto order_t = std::get<1>(scores.sort(0, /* descending=*/true)); 28 | 29 | auto ndets = dets.size(0); 30 | at::Tensor suppressed_t = at::zeros({ndets}, dets.options().dtype(at::kByte)); 31 | at::Tensor keep_t = at::zeros({ndets}, dets.options().dtype(at::kLong)); 32 | 33 | auto suppressed = suppressed_t.data_ptr(); 34 | auto keep = keep_t.data_ptr(); 35 | auto order = order_t.data_ptr(); 36 | 37 | int64_t num_to_keep = 0; 38 | 39 | for (int64_t _i = 0; _i < ndets; _i++) { 40 | auto i = order[_i]; 41 | if (suppressed[i] == 1) { 42 | continue; 43 | } 44 | 45 | keep[num_to_keep++] = i; 46 | 47 | for (int64_t _j = _i + 1; _j < ndets; _j++) { 48 | auto j = order[_j]; 49 | if (suppressed[j] == 1) { 50 | continue; 51 | } 52 | 53 | auto ovr = single_box_iou_rotated( 54 | dets[i].data_ptr(), dets[j].data_ptr()); 55 | if (ovr >= iou_threshold) { 56 | suppressed[j] = 1; 57 | } 58 | } 59 | } 60 | return keep_t.narrow(/*dim=*/0, /*start=*/0, /*length=*/num_to_keep); 61 | } 62 | 63 | at::Tensor nms_rotated_cpu( 64 | // input must be contiguous 65 | const at::Tensor& dets, 66 | const at::Tensor& scores, 67 | const float iou_threshold) { 68 | auto result = at::empty({0}, dets.options()); 69 | 70 | AT_DISPATCH_FLOATING_TYPES(dets.scalar_type(), "nms_rotated", [&] { 71 | result = nms_rotated_cpu_kernel(dets, scores, iou_threshold); 72 | }); 73 | return result; 74 | } 75 | -------------------------------------------------------------------------------- /utils/nms_rotated/src/nms_rotated_cuda.cu: -------------------------------------------------------------------------------- 1 | // Modified from 2 | // https://github.com/facebookresearch/detectron2/tree/master/detectron2/layers/csrc/nms_rotated 3 | // Copyright (c) Facebook, Inc. and its affiliates. 
All Rights Reserved 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include "box_iou_rotated_utils.h" 9 | 10 | int const threadsPerBlock = sizeof(unsigned long long) * 8; 11 | 12 | template 13 | __global__ void nms_rotated_cuda_kernel( 14 | const int n_boxes, 15 | const float iou_threshold, 16 | const T* dev_boxes, 17 | unsigned long long* dev_mask) { 18 | // nms_rotated_cuda_kernel is modified from torchvision's nms_cuda_kernel 19 | 20 | const int row_start = blockIdx.y; 21 | const int col_start = blockIdx.x; 22 | 23 | // if (row_start > col_start) return; 24 | 25 | const int row_size = 26 | min(n_boxes - row_start * threadsPerBlock, threadsPerBlock); 27 | const int col_size = 28 | min(n_boxes - col_start * threadsPerBlock, threadsPerBlock); 29 | 30 | // Compared to nms_cuda_kernel, where each box is represented with 4 values 31 | // (x1, y1, x2, y2), each rotated box is represented with 5 values 32 | // (x_center, y_center, width, height, angle_degrees) here. 33 | __shared__ T block_boxes[threadsPerBlock * 5]; 34 | if (threadIdx.x < col_size) { 35 | block_boxes[threadIdx.x * 5 + 0] = 36 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 0]; 37 | block_boxes[threadIdx.x * 5 + 1] = 38 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 1]; 39 | block_boxes[threadIdx.x * 5 + 2] = 40 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 2]; 41 | block_boxes[threadIdx.x * 5 + 3] = 42 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 3]; 43 | block_boxes[threadIdx.x * 5 + 4] = 44 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 4]; 45 | } 46 | __syncthreads(); 47 | 48 | if (threadIdx.x < row_size) { 49 | const int cur_box_idx = threadsPerBlock * row_start + threadIdx.x; 50 | const T* cur_box = dev_boxes + cur_box_idx * 5; 51 | int i = 0; 52 | unsigned long long t = 0; 53 | int start = 0; 54 | if (row_start == col_start) { 55 | start = threadIdx.x + 1; 56 | } 57 | for (i = start; i < col_size; i++) { 58 | // Instead of devIoU used by original horizontal nms, here 59 | // we use the single_box_iou_rotated function from box_iou_rotated_utils.h 60 | if (single_box_iou_rotated(cur_box, block_boxes + i * 5) > 61 | iou_threshold) { 62 | t |= 1ULL << i; 63 | } 64 | } 65 | const int col_blocks = at::cuda::ATenCeilDiv(n_boxes, threadsPerBlock); 66 | dev_mask[cur_box_idx * col_blocks + col_start] = t; 67 | } 68 | } 69 | 70 | 71 | at::Tensor nms_rotated_cuda( 72 | // input must be contiguous 73 | const at::Tensor& dets, 74 | const at::Tensor& scores, 75 | float iou_threshold) { 76 | // using scalar_t = float; 77 | AT_ASSERTM(dets.is_cuda(), "dets must be a CUDA tensor"); 78 | AT_ASSERTM(scores.is_cuda(), "scores must be a CUDA tensor"); 79 | at::cuda::CUDAGuard device_guard(dets.device()); 80 | 81 | auto order_t = std::get<1>(scores.sort(0, /* descending=*/true)); 82 | auto dets_sorted = dets.index_select(0, order_t); 83 | 84 | auto dets_num = dets.size(0); 85 | 86 | const int col_blocks = 87 | at::cuda::ATenCeilDiv(static_cast(dets_num), threadsPerBlock); 88 | 89 | at::Tensor mask = 90 | at::empty({dets_num * col_blocks}, dets.options().dtype(at::kLong)); 91 | 92 | dim3 blocks(col_blocks, col_blocks); 93 | dim3 threads(threadsPerBlock); 94 | cudaStream_t stream = at::cuda::getCurrentCUDAStream(); 95 | 96 | AT_DISPATCH_FLOATING_TYPES( 97 | dets_sorted.scalar_type(), "nms_rotated_kernel_cuda", [&] { 98 | nms_rotated_cuda_kernel<<>>( 99 | dets_num, 100 | iou_threshold, 101 | dets_sorted.data_ptr(), 102 | (unsigned long long*)mask.data_ptr()); 
103 | }); 104 | 105 | at::Tensor mask_cpu = mask.to(at::kCPU); 106 | unsigned long long* mask_host = 107 | (unsigned long long*)mask_cpu.data_ptr(); 108 | 109 | std::vector remv(col_blocks); 110 | memset(&remv[0], 0, sizeof(unsigned long long) * col_blocks); 111 | 112 | at::Tensor keep = 113 | at::empty({dets_num}, dets.options().dtype(at::kLong).device(at::kCPU)); 114 | int64_t* keep_out = keep.data_ptr(); 115 | 116 | int num_to_keep = 0; 117 | for (int i = 0; i < dets_num; i++) { 118 | int nblock = i / threadsPerBlock; 119 | int inblock = i % threadsPerBlock; 120 | 121 | if (!(remv[nblock] & (1ULL << inblock))) { 122 | keep_out[num_to_keep++] = i; 123 | unsigned long long* p = mask_host + i * col_blocks; 124 | for (int j = nblock; j < col_blocks; j++) { 125 | remv[j] |= p[j]; 126 | } 127 | } 128 | } 129 | 130 | AT_CUDA_CHECK(cudaGetLastError()); 131 | return order_t.index( 132 | {keep.narrow(/*dim=*/0, /*start=*/0, /*length=*/num_to_keep) 133 | .to(order_t.device(), keep.scalar_type())}); 134 | } 135 | -------------------------------------------------------------------------------- /utils/nms_rotated/src/nms_rotated_ext.cpp: -------------------------------------------------------------------------------- 1 | // Modified from 2 | // https://github.com/facebookresearch/detectron2/tree/master/detectron2/layers/csrc/nms_rotated 3 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved 4 | #include 5 | #include 6 | 7 | 8 | #ifdef WITH_CUDA 9 | at::Tensor nms_rotated_cuda( 10 | const at::Tensor& dets, 11 | const at::Tensor& scores, 12 | const float iou_threshold); 13 | 14 | at::Tensor poly_nms_cuda( 15 | const at::Tensor boxes, 16 | float nms_overlap_thresh); 17 | #endif 18 | 19 | at::Tensor nms_rotated_cpu( 20 | const at::Tensor& dets, 21 | const at::Tensor& scores, 22 | const float iou_threshold); 23 | 24 | 25 | inline at::Tensor nms_rotated( 26 | const at::Tensor& dets, 27 | const at::Tensor& scores, 28 | const float iou_threshold) { 29 | assert(dets.device().is_cuda() == scores.device().is_cuda()); 30 | if (dets.device().is_cuda()) { 31 | #ifdef WITH_CUDA 32 | return nms_rotated_cuda( 33 | dets.contiguous(), scores.contiguous(), iou_threshold); 34 | #else 35 | AT_ERROR("Not compiled with GPU support"); 36 | #endif 37 | } 38 | return nms_rotated_cpu(dets.contiguous(), scores.contiguous(), iou_threshold); 39 | } 40 | 41 | 42 | inline at::Tensor nms_poly( 43 | const at::Tensor& dets, 44 | const float iou_threshold) { 45 | if (dets.device().is_cuda()) { 46 | #ifdef WITH_CUDA 47 | if (dets.numel() == 0) 48 | return at::empty({0}, dets.options().dtype(at::kLong).device(at::kCPU)); 49 | return poly_nms_cuda(dets, iou_threshold); 50 | #else 51 | AT_ERROR("POLY_NMS is not compiled with GPU support"); 52 | #endif 53 | } 54 | AT_ERROR("POLY_NMS is not implemented on CPU"); 55 | } 56 | 57 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { 58 | m.def("nms_rotated", &nms_rotated, "nms for rotated bboxes"); 59 | m.def("nms_poly", &nms_poly, "nms for poly bboxes"); 60 | } 61 | -------------------------------------------------------------------------------- /utils/nms_rotated/src/poly_nms_cpu.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | 3 | template 4 | at::Tensor poly_nms_cpu_kernel(const at::Tensor& dets, const float threshold) { 5 | 6 | -------------------------------------------------------------------------------- /utils/nms_rotated/src/poly_nms_cuda.cu: 
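nms_rotated_ext.cpp above is what exposes `nms_rotated` and `nms_poly` to Python; the wrapper shown earlier in nms_rotated_wrapper.py reaches them through `obb_nms`. A hedged usage sketch, assuming the extension has been built in place as the `# python setup.py develop` comment in setup.py suggests (the box values here are made up):

```python
# Build the extension first:
#   cd utils/nms_rotated && python setup.py develop
import torch
from utils.nms_rotated import obb_nms

# (cx, cy, w, h, theta) with theta in radians, as documented in nms_rotated_wrapper.py
dets = torch.tensor([[136.6, 111.6, 200., 100., -1.047],   # roughly -60 degrees
                     [136.6, 111.6, 200., 100., -1.040],   # near-duplicate of the first box
                     [300.0, 300.0,  80.,  40.,  0.300]])
scores = torch.tensor([0.9, 0.8, 0.7])

kept_dets, keep_inds = obb_nms(dets, scores, iou_thr=0.5)
print(keep_inds)   # the near-duplicate should be suppressed; CPU or CUDA tensors both work
```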
-------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | #include 5 | #include 6 | 7 | #include 8 | #include 9 | 10 | #define CUDA_CHECK(condition) \ 11 | /* Code block avoids redefinition of cudaError_t error */ \ 12 | do { \ 13 | cudaError_t error = condition; \ 14 | if (error != cudaSuccess) { \ 15 | std::cout << cudaGetErrorString(error) << std::endl; \ 16 | } \ 17 | } while (0) 18 | 19 | #define DIVUP(m,n) ((m) / (n) + ((m) % (n) > 0)) 20 | int const threadsPerBlock = sizeof(unsigned long long) * 8; 21 | 22 | 23 | #define maxn 10 24 | const double eps=1E-8; 25 | 26 | __device__ inline int sig(float d){ 27 | return(d>1E-8)-(d<-1E-8); 28 | } 29 | 30 | __device__ inline int point_eq(const float2 a, const float2 b) { 31 | return sig(a.x - b.x) == 0 && sig(a.y - b.y)==0; 32 | } 33 | 34 | __device__ inline void point_swap(float2 *a, float2 *b) { 35 | float2 temp = *a; 36 | *a = *b; 37 | *b = temp; 38 | } 39 | 40 | __device__ inline void point_reverse(float2 *first, float2* last) 41 | { 42 | while ((first!=last)&&(first!=--last)) { 43 | point_swap (first,last); 44 | ++first; 45 | } 46 | } 47 | 48 | __device__ inline float cross(float2 o,float2 a,float2 b){ //叉积 49 | return(a.x-o.x)*(b.y-o.y)-(b.x-o.x)*(a.y-o.y); 50 | } 51 | __device__ inline float area(float2* ps,int n){ 52 | ps[n]=ps[0]; 53 | float res=0; 54 | for(int i=0;i0) pp[m++]=p[i]; 75 | if(sig(cross(a,b,p[i]))!=sig(cross(a,b,p[i+1]))) 76 | lineCross(a,b,p[i],p[i+1],pp[m++]); 77 | } 78 | n=0; 79 | for(int i=0;i1&&p[n-1]==p[0])n--; 83 | while(n>1&&point_eq(p[n-1], p[0]))n--; 84 | } 85 | 86 | //---------------华丽的分隔线-----------------// 87 | //返回三角形oab和三角形ocd的有向交面积,o是原点// 88 | __device__ inline float intersectArea(float2 a,float2 b,float2 c,float2 d){ 89 | float2 o = make_float2(0,0); 90 | int s1=sig(cross(o,a,b)); 91 | int s2=sig(cross(o,c,d)); 92 | if(s1==0||s2==0)return 0.0;//退化,面积为0 93 | // if(s1==-1) swap(a,b); 94 | // if(s2==-1) swap(c,d); 95 | if (s1 == -1) point_swap(&a, &b); 96 | if (s2 == -1) point_swap(&c, &d); 97 | float2 p[10]={o,a,b}; 98 | int n=3; 99 | float2 pp[maxn]; 100 | polygon_cut(p,n,o,c,pp); 101 | polygon_cut(p,n,c,d,pp); 102 | polygon_cut(p,n,d,o,pp); 103 | float res=fabs(area(p,n)); 104 | if(s1*s2==-1) res=-res;return res; 105 | } 106 | //求两多边形的交面积 107 | __device__ inline float intersectArea(float2*ps1,int n1,float2*ps2,int n2){ 108 | if(area(ps1,n1)<0) point_reverse(ps1,ps1+n1); 109 | if(area(ps2,n2)<0) point_reverse(ps2,ps2+n2); 110 | ps1[n1]=ps1[0]; 111 | ps2[n2]=ps2[0]; 112 | float res=0; 113 | for(int i=0;i nms_overlap_thresh) { 188 | t |= 1ULL << i; 189 | } 190 | } 191 | const int col_blocks = THCCeilDiv(n_polys, threadsPerBlock); 192 | dev_mask[cur_box_idx * col_blocks + col_start] = t; 193 | } 194 | } 195 | 196 | // boxes is a N x 9 tensor 197 | at::Tensor poly_nms_cuda(const at::Tensor boxes, float nms_overlap_thresh) { 198 | 199 | at::DeviceGuard guard(boxes.device()); 200 | 201 | using scalar_t = float; 202 | AT_ASSERTM(boxes.device().is_cuda(), "boxes must be a CUDA tensor"); 203 | auto scores = boxes.select(1, 8); 204 | auto order_t = std::get<1>(scores.sort(0, /*descending=*/true)); 205 | auto boxes_sorted = boxes.index_select(0, order_t); 206 | 207 | int boxes_num = boxes.size(0); 208 | 209 | const int col_blocks = THCCeilDiv(boxes_num, threadsPerBlock); 210 | 211 | scalar_t* boxes_dev = boxes_sorted.data_ptr(); 212 | 213 | THCState *state = at::globalContext().lazyInitCUDA(); 214 | 215 | unsigned long long* mask_dev = NULL; 216 | 217 | 
mask_dev = (unsigned long long*) THCudaMalloc(state, boxes_num * col_blocks * sizeof(unsigned long long)); 218 | 219 | dim3 blocks(THCCeilDiv(boxes_num, threadsPerBlock), 220 | THCCeilDiv(boxes_num, threadsPerBlock)); 221 | dim3 threads(threadsPerBlock); 222 | poly_nms_kernel<<>>(boxes_num, 223 | nms_overlap_thresh, 224 | boxes_dev, 225 | mask_dev); 226 | 227 | std::vector mask_host(boxes_num * col_blocks); 228 | THCudaCheck(cudaMemcpyAsync( 229 | &mask_host[0], 230 | mask_dev, 231 | sizeof(unsigned long long) * boxes_num * col_blocks, 232 | cudaMemcpyDeviceToHost, 233 | at::cuda::getCurrentCUDAStream() 234 | )); 235 | 236 | std::vector remv(col_blocks); 237 | memset(&remv[0], 0, sizeof(unsigned long long) * col_blocks); 238 | 239 | at::Tensor keep = at::empty({boxes_num}, boxes.options().dtype(at::kLong).device(at::kCPU)); 240 | int64_t* keep_out = keep.data_ptr(); 241 | 242 | int num_to_keep = 0; 243 | for (int i = 0; i < boxes_num; i++) { 244 | int nblock = i / threadsPerBlock; 245 | int inblock = i % threadsPerBlock; 246 | 247 | if (!(remv[nblock] & (1ULL << inblock))) { 248 | keep_out[num_to_keep++] = i; 249 | unsigned long long *p = &mask_host[0] + i * col_blocks; 250 | for (int j = nblock; j < col_blocks; j++) { 251 | remv[j] |= p[j]; 252 | } 253 | } 254 | } 255 | 256 | THCudaFree(state, mask_dev); 257 | 258 | return order_t.index({ 259 | keep.narrow(/*dim=*/0, /*start=*/0, /*length=*/num_to_keep).to( 260 | order_t.device(), keep.scalar_type())}); 261 | } 262 | 263 | -------------------------------------------------------------------------------- /utils/utils.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from PIL import Image 3 | 4 | 5 | #---------------------------------------------------------# 6 | # 将图像转换成RGB图像,防止灰度图在预测时报错。 7 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 8 | #---------------------------------------------------------# 9 | def cvtColor(image): 10 | if len(np.shape(image)) == 3 and np.shape(image)[2] == 3: 11 | return image 12 | else: 13 | image = image.convert('RGB') 14 | return image 15 | 16 | #---------------------------------------------------# 17 | # 对输入图像进行resize 18 | #---------------------------------------------------# 19 | def resize_image(image, size, letterbox_image): 20 | iw, ih = image.size 21 | w, h = size 22 | if letterbox_image: 23 | scale = min(w/iw, h/ih) 24 | nw = int(iw*scale) 25 | nh = int(ih*scale) 26 | 27 | image = image.resize((nw,nh), Image.BICUBIC) 28 | new_image = Image.new('RGB', size, (128,128,128)) 29 | new_image.paste(image, ((w-nw)//2, (h-nh)//2)) 30 | else: 31 | new_image = image.resize((w, h), Image.BICUBIC) 32 | return new_image 33 | 34 | #---------------------------------------------------# 35 | # 获得类 36 | #---------------------------------------------------# 37 | def get_classes(classes_path): 38 | with open(classes_path, encoding='utf-8') as f: 39 | class_names = f.readlines() 40 | class_names = [c.strip() for c in class_names] 41 | return class_names, len(class_names) 42 | 43 | #---------------------------------------------------# 44 | # 获得先验框 45 | #---------------------------------------------------# 46 | def get_anchors(anchors_path): 47 | '''loads the anchors from a file''' 48 | with open(anchors_path, encoding='utf-8') as f: 49 | anchors = f.readline() 50 | anchors = [float(x) for x in anchors.split(',')] 51 | anchors = np.array(anchors).reshape(-1, 2) 52 | return anchors, len(anchors) 53 | 54 | #---------------------------------------------------# 55 | # 获得学习率 56 | 
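A hedged illustration of the anchors-file format that `get_anchors()` above expects: a single line of comma-separated values, read back as (width, height) pairs. The pair values below are the ones quoted in the DecodeBox comments in utils/utils_bbox.py further on; the shipped model_data/yolo_anchors.txt may list different numbers.

```python
import numpy as np

anchors_line = "12,16, 19,36, 40,28, 36,75, 76,55, 72,146, 142,110, 192,243, 459,401"

anchors = np.array([float(x) for x in anchors_line.split(',')]).reshape(-1, 2)
print(anchors.shape)        # (9, 2): nine anchor boxes
print(anchors[[6, 7, 8]])   # anchors_mask [6, 7, 8] selects the largest anchors (stride-32 head)
```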
#---------------------------------------------------# 57 | def get_lr(optimizer): 58 | for param_group in optimizer.param_groups: 59 | return param_group['lr'] 60 | 61 | def preprocess_input(image): 62 | image /= 255.0 63 | return image 64 | 65 | def show_config(**kwargs): 66 | print('Configurations:') 67 | print('-' * 70) 68 | print('|%25s | %40s|' % ('keys', 'values')) 69 | print('-' * 70) 70 | for key, value in kwargs.items(): 71 | print('|%25s | %40s|' % (str(key), str(value))) 72 | print('-' * 70) 73 | 74 | def download_weights(model_dir="./model_data"): 75 | import os 76 | from torch.hub import load_state_dict_from_url 77 | 78 | url = 'https://github.com/bubbliiiing/yolov7-tiny-pytorch/releases/download/v1.0/yolov7_tiny_backbone_weights.pth' 79 | 80 | if not os.path.exists(model_dir): 81 | os.makedirs(model_dir) 82 | load_state_dict_from_url(url, model_dir) -------------------------------------------------------------------------------- /utils/utils_bbox.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import math 4 | from utils.utils_rbox import * 5 | from utils.nms_rotated import obb_nms 6 | 7 | class DecodeBox(): 8 | def __init__(self, anchors, num_classes, input_shape, anchors_mask = [[6,7,8], [3,4,5], [0,1,2]]): 9 | super(DecodeBox, self).__init__() 10 | self.anchors = anchors 11 | self.num_classes = num_classes 12 | self.bbox_attrs = 6 + num_classes 13 | self.input_shape = input_shape 14 | #-----------------------------------------------------------# 15 | # 13x13的特征层对应的anchor是[142, 110],[192, 243],[459, 401] 16 | # 26x26的特征层对应的anchor是[36, 75],[76, 55],[72, 146] 17 | # 52x52的特征层对应的anchor是[12, 16],[19, 36],[40, 28] 18 | #-----------------------------------------------------------# 19 | self.anchors_mask = anchors_mask 20 | 21 | def decode_box(self, inputs): 22 | outputs = [] 23 | for i, input in enumerate(inputs): 24 | #-----------------------------------------------# 25 | # 输入的input一共有三个,他们的shape分别是 26 | # batch_size = 1 27 | # batch_size, 3 * (5 + 1 + 80), 20, 20 28 | # batch_size, 255, 40, 40 29 | # batch_size, 255, 80, 80 30 | #-----------------------------------------------# 31 | batch_size = input.size(0) 32 | input_height = input.size(2) 33 | input_width = input.size(3) 34 | 35 | #-----------------------------------------------# 36 | # 输入为640x640时 37 | # stride_h = stride_w = 32、16、8 38 | #-----------------------------------------------# 39 | stride_h = self.input_shape[0] / input_height 40 | stride_w = self.input_shape[1] / input_width 41 | #-------------------------------------------------# 42 | # 此时获得的scaled_anchors大小是相对于特征层的 43 | #-------------------------------------------------# 44 | scaled_anchors = [(anchor_width / stride_w, anchor_height / stride_h) for anchor_width, anchor_height in self.anchors[self.anchors_mask[i]]] 45 | 46 | #-----------------------------------------------# 47 | # 输入的input一共有三个,他们的shape分别是 48 | # batch_size, 3, 20, 20, 85 49 | # batch_size, 3, 40, 40, 85 50 | # batch_size, 3, 80, 80, 85 51 | #-----------------------------------------------# 52 | prediction = input.view(batch_size, len(self.anchors_mask[i]), 53 | self.bbox_attrs, input_height, input_width).permute(0, 1, 3, 4, 2).contiguous() 54 | 55 | #-----------------------------------------------# 56 | # 先验框的中心位置的调整参数 57 | #-----------------------------------------------# 58 | x = torch.sigmoid(prediction[..., 0]) 59 | y = torch.sigmoid(prediction[..., 1]) 60 | #-----------------------------------------------# 61 
| # 先验框的宽高调整参数 62 | #-----------------------------------------------# 63 | w = torch.sigmoid(prediction[..., 2]) 64 | h = torch.sigmoid(prediction[..., 3]) 65 | #-----------------------------------------------# 66 | # 获取旋转角度 67 | #-----------------------------------------------# 68 | angle = torch.sigmoid(prediction[..., 4]) 69 | #-----------------------------------------------# 70 | # 获得置信度,是否有物体 71 | #-----------------------------------------------# 72 | conf = torch.sigmoid(prediction[..., 5]) 73 | #-----------------------------------------------# 74 | # 种类置信度 75 | #-----------------------------------------------# 76 | pred_cls = torch.sigmoid(prediction[..., 6:]) 77 | 78 | FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor 79 | LongTensor = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor 80 | 81 | #----------------------------------------------------------# 82 | # 生成网格,先验框中心,网格左上角 83 | # batch_size,3,20,20 84 | #----------------------------------------------------------# 85 | grid_x = torch.linspace(0, input_width - 1, input_width).repeat(input_height, 1).repeat( 86 | batch_size * len(self.anchors_mask[i]), 1, 1).view(x.shape).type(FloatTensor) 87 | grid_y = torch.linspace(0, input_height - 1, input_height).repeat(input_width, 1).t().repeat( 88 | batch_size * len(self.anchors_mask[i]), 1, 1).view(y.shape).type(FloatTensor) 89 | 90 | #----------------------------------------------------------# 91 | # 按照网格格式生成先验框的宽高 92 | # batch_size,3,20,20 93 | #----------------------------------------------------------# 94 | anchor_w = FloatTensor(scaled_anchors).index_select(1, LongTensor([0])) 95 | anchor_h = FloatTensor(scaled_anchors).index_select(1, LongTensor([1])) 96 | anchor_w = anchor_w.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(w.shape) 97 | anchor_h = anchor_h.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(h.shape) 98 | 99 | #----------------------------------------------------------# 100 | # 利用预测结果对先验框进行调整 101 | # 首先调整先验框的中心,从先验框中心向右下角偏移 102 | # 再调整先验框的宽高。 103 | # x 0 ~ 1 => 0 ~ 2 => -0.5, 1.5 => 负责一定范围的目标的预测 104 | # y 0 ~ 1 => 0 ~ 2 => -0.5, 1.5 => 负责一定范围的目标的预测 105 | # w 0 ~ 1 => 0 ~ 2 => 0 ~ 4 => 先验框的宽高调节范围为0~4倍 106 | # h 0 ~ 1 => 0 ~ 2 => 0 ~ 4 => 先验框的宽高调节范围为0~4倍 107 | #----------------------------------------------------------# 108 | pred_boxes = FloatTensor(prediction[..., :4].shape) 109 | pred_boxes[..., 0] = x.data * 2. - 0.5 + grid_x 110 | pred_boxes[..., 1] = y.data * 2. 
- 0.5 + grid_y 111 | pred_boxes[..., 2] = (w.data * 2) ** 2 * anchor_w 112 | pred_boxes[..., 3] = (h.data * 2) ** 2 * anchor_h 113 | pred_theta = (angle.data - 0.5) * math.pi 114 | #----------------------------------------------------------# 115 | # 将输出结果归一化成小数的形式 116 | #----------------------------------------------------------# 117 | _scale = torch.Tensor([input_width, input_height, input_width, input_height]).type(FloatTensor) 118 | output = torch.cat((pred_boxes.view(batch_size, -1, 4) / _scale, pred_theta.view(batch_size, -1, 1), 119 | conf.view(batch_size, -1, 1), pred_cls.view(batch_size, -1, self.num_classes)), -1) 120 | outputs.append(output.data) 121 | return outputs 122 | 123 | def non_max_suppression(self, prediction, num_classes, input_shape, image_shape, letterbox_image, conf_thres=0.5, nms_thres=0.4): 124 | #----------------------------------------------------------# 125 | # prediction [batch_size, num_anchors, 85] 126 | #----------------------------------------------------------# 127 | 128 | output = [None for _ in range(len(prediction))] 129 | for i, image_pred in enumerate(prediction): 130 | #----------------------------------------------------------# 131 | # 对种类预测部分取max。 132 | # class_conf [num_anchors, 1] 种类置信度 133 | # class_pred [num_anchors, 1] 种类 134 | #----------------------------------------------------------# 135 | class_conf, class_pred = torch.max(image_pred[:, 6:6 + num_classes], 1, keepdim=True) 136 | 137 | #----------------------------------------------------------# 138 | # 利用置信度进行第一轮筛选 139 | #----------------------------------------------------------# 140 | conf_mask = (image_pred[:, 5] * class_conf[:, 0] >= conf_thres).squeeze() 141 | #----------------------------------------------------------# 142 | # 根据置信度进行预测结果的筛选 143 | #----------------------------------------------------------# 144 | image_pred = image_pred[conf_mask] 145 | class_conf = class_conf[conf_mask] 146 | class_pred = class_pred[conf_mask] 147 | if not image_pred.size(0): 148 | continue 149 | #-------------------------------------------------------------------------# 150 | # detections [num_anchors, 8] 151 | # 8的内容为:x, y, w, h, angle, obj_conf, class_conf, class_pred 152 | #-------------------------------------------------------------------------# 153 | detections = torch.cat((image_pred[:, :6], class_conf.float(), class_pred.float()), 1) 154 | 155 | #------------------------------------------# 156 | # 获得预测结果中包含的所有种类 157 | #------------------------------------------# 158 | unique_labels = detections[:, -1].cpu().unique() 159 | 160 | if prediction.is_cuda: 161 | unique_labels = unique_labels.cuda() 162 | detections = detections.cuda() 163 | 164 | for c in unique_labels: 165 | #------------------------------------------# 166 | # 获得某一类得分筛选后全部的预测结果 167 | #------------------------------------------# 168 | detections_class = detections[detections[:, -1] == c] 169 | 170 | #------------------------------------------# 171 | # 使用官方自带的非极大抑制会速度更快一些! 
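                #-------------------------------------------------------------------------#
                # Each row of detections is [x, y, w, h, angle, obj_conf, class_conf, class_pred];
                # obb_nms (from utils.nms_rotated) receives the rotated boxes [x, y, w, h, angle]
                # together with one score per box, here obj_conf * class_conf.
                #-------------------------------------------------------------------------#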
172 | # 筛选出一定区域内,属于同一种类得分最大的框 173 | #------------------------------------------# 174 | _, keep = obb_nms( 175 | detections_class[:, :5], 176 | detections_class[:, 5] * detections_class[:, 6], 177 | nms_thres 178 | ) 179 | max_detections = detections_class[keep] 180 | 181 | # Add max detections to outputs 182 | output[i] = max_detections if output[i] is None else torch.cat((output[i], max_detections)) 183 | 184 | if output[i] is not None: 185 | output[i] = output[i].cpu().numpy() 186 | output[i][:, :5] = self.yolo_correct_boxes(output[i], input_shape, image_shape, letterbox_image) 187 | return output 188 | 189 | def yolo_correct_boxes(self, output, input_shape, image_shape, letterbox_image): 190 | #-----------------------------------------------------------------# 191 | # 把y轴放前面是因为方便预测框和图像的宽高进行相乘 192 | #-----------------------------------------------------------------# 193 | box_xy = output[..., 0:2] 194 | box_wh = output[..., 2:4] 195 | angle = output[..., 4:5] 196 | box_yx = box_xy[..., ::-1] 197 | box_hw = box_wh[..., ::-1] 198 | input_shape = np.array(input_shape) 199 | image_shape = np.array(image_shape) 200 | 201 | if letterbox_image: 202 | #-----------------------------------------------------------------# 203 | # 这里求出来的offset是图像有效区域相对于图像左上角的偏移情况 204 | # new_shape指的是宽高缩放情况 205 | #-----------------------------------------------------------------# 206 | new_shape = np.round(image_shape * np.min(input_shape/image_shape)) 207 | offset = (input_shape - new_shape)/2./input_shape 208 | scale = input_shape/new_shape 209 | 210 | box_yx = (box_yx - offset) * scale 211 | box_hw *= scale 212 | 213 | box_xy = box_yx[:, ::-1] 214 | box_hw = box_wh[:, ::-1] 215 | 216 | rboxes = np.concatenate([box_xy, box_wh, angle], axis=-1) 217 | rboxes[:, [0, 2]] *= image_shape[1] 218 | rboxes[:, [1, 3]] *= image_shape[0] 219 | return rboxes 220 | 221 | if __name__ == "__main__": 222 | import matplotlib.pyplot as plt 223 | import numpy as np 224 | 225 | #---------------------------------------------------# 226 | # 将预测值的每个特征层调成真实值 227 | #---------------------------------------------------# 228 | def get_anchors_and_decode(input, input_shape, anchors, anchors_mask, num_classes): 229 | #-----------------------------------------------# 230 | # input batch_size, 3 * (5 + 1 + num_classes), 20, 20 231 | #-----------------------------------------------# 232 | batch_size = input.size(0) 233 | input_height = input.size(2) 234 | input_width = input.size(3) 235 | 236 | #-----------------------------------------------# 237 | # 输入为640x640时 input_shape = [640, 640] input_height = 20, input_width = 20 238 | # 640 / 20 = 32 239 | # stride_h = stride_w = 32 240 | #-----------------------------------------------# 241 | stride_h = input_shape[0] / input_height 242 | stride_w = input_shape[1] / input_width 243 | #-------------------------------------------------# 244 | # 此时获得的scaled_anchors大小是相对于特征层的 245 | # anchor_width, anchor_height / stride_h, stride_w 246 | #-------------------------------------------------# 247 | scaled_anchors = [(anchor_width / stride_w, anchor_height / stride_h) for anchor_width, anchor_height in anchors[anchors_mask[2]]] 248 | 249 | #-----------------------------------------------# 250 | # batch_size, 3 * (4 + 1 + num_classes), 20, 20 => 251 | # batch_size, 3, 5 + num_classes, 20, 20 => 252 | # batch_size, 3, 20, 20, 4 + 1 + num_classes 253 | #-----------------------------------------------# 254 | prediction = input.view(batch_size, len(anchors_mask[2]), 255 | num_classes + 6, input_height, 
input_width).permute(0, 1, 3, 4, 2).contiguous() 256 | 257 | #-----------------------------------------------# 258 | # 先验框的中心位置的调整参数 259 | #-----------------------------------------------# 260 | x = torch.sigmoid(prediction[..., 0]) 261 | y = torch.sigmoid(prediction[..., 1]) 262 | #-----------------------------------------------# 263 | # 先验框的宽高调整参数 264 | #-----------------------------------------------# 265 | w = torch.sigmoid(prediction[..., 2]) 266 | h = torch.sigmoid(prediction[..., 3]) 267 | #-----------------------------------------------# 268 | # 获得置信度,是否有物体 0 - 1 269 | #-----------------------------------------------# 270 | conf = torch.sigmoid(prediction[..., 5]) 271 | #-----------------------------------------------# 272 | # 种类置信度 0 - 1 273 | #-----------------------------------------------# 274 | pred_cls = torch.sigmoid(prediction[..., 6:]) 275 | 276 | FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor 277 | LongTensor = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor 278 | 279 | #----------------------------------------------------------# 280 | # 生成网格,先验框中心,网格左上角 281 | # batch_size,3,20,20 282 | # range(20) 283 | # [ 284 | # [0, 1, 2, 3 ……, 19], 285 | # [0, 1, 2, 3 ……, 19], 286 | # …… (20次) 287 | # [0, 1, 2, 3 ……, 19] 288 | # ] * (batch_size * 3) 289 | # [batch_size, 3, 20, 20] 290 | # 291 | # [ 292 | # [0, 1, 2, 3 ……, 19], 293 | # [0, 1, 2, 3 ……, 19], 294 | # …… (20次) 295 | # [0, 1, 2, 3 ……, 19] 296 | # ].T * (batch_size * 3) 297 | # [batch_size, 3, 20, 20] 298 | #----------------------------------------------------------# 299 | grid_x = torch.linspace(0, input_width - 1, input_width).repeat(input_height, 1).repeat( 300 | batch_size * len(anchors_mask[2]), 1, 1).view(x.shape).type(FloatTensor) 301 | grid_y = torch.linspace(0, input_height - 1, input_height).repeat(input_width, 1).t().repeat( 302 | batch_size * len(anchors_mask[2]), 1, 1).view(y.shape).type(FloatTensor) 303 | 304 | #----------------------------------------------------------# 305 | # 按照网格格式生成先验框的宽高 306 | # batch_size, 3, 20 * 20 => batch_size, 3, 20, 20 307 | # batch_size, 3, 20 * 20 => batch_size, 3, 20, 20 308 | #----------------------------------------------------------# 309 | anchor_w = FloatTensor(scaled_anchors).index_select(1, LongTensor([0])) 310 | anchor_h = FloatTensor(scaled_anchors).index_select(1, LongTensor([1])) 311 | anchor_w = anchor_w.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(w.shape) 312 | anchor_h = anchor_h.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(h.shape) 313 | 314 | #----------------------------------------------------------# 315 | # 利用预测结果对先验框进行调整 316 | # 首先调整先验框的中心,从先验框中心向右下角偏移 317 | # 再调整先验框的宽高。 318 | # x 0 ~ 1 => 0 ~ 2 => -0.5 ~ 1.5 + grid_x 319 | # y 0 ~ 1 => 0 ~ 2 => -0.5 ~ 1.5 + grid_y 320 | # w 0 ~ 1 => 0 ~ 2 => 0 ~ 4 * anchor_w 321 | # h 0 ~ 1 => 0 ~ 2 => 0 ~ 4 * anchor_h 322 | #----------------------------------------------------------# 323 | pred_boxes = FloatTensor(prediction[..., :4].shape) 324 | pred_boxes[..., 0] = x.data * 2. - 0.5 + grid_x 325 | pred_boxes[..., 1] = y.data * 2. 
- 0.5 + grid_y 326 | pred_boxes[..., 2] = (w.data * 2) ** 2 * anchor_w 327 | pred_boxes[..., 3] = (h.data * 2) ** 2 * anchor_h 328 | 329 | point_h = 5 330 | point_w = 5 331 | 332 | box_xy = pred_boxes[..., 0:2].cpu().numpy() * 32 333 | box_wh = pred_boxes[..., 2:4].cpu().numpy() * 32 334 | grid_x = grid_x.cpu().numpy() * 32 335 | grid_y = grid_y.cpu().numpy() * 32 336 | anchor_w = anchor_w.cpu().numpy() * 32 337 | anchor_h = anchor_h.cpu().numpy() * 32 338 | 339 | fig = plt.figure() 340 | ax = fig.add_subplot(121) 341 | from PIL import Image 342 | img = Image.open("img/street.jpg").resize([640, 640]) 343 | plt.imshow(img, alpha=0.5) 344 | plt.ylim(-30, 650) 345 | plt.xlim(-30, 650) 346 | plt.scatter(grid_x, grid_y) 347 | plt.scatter(point_h * 32, point_w * 32, c='black') 348 | plt.gca().invert_yaxis() 349 | 350 | anchor_left = grid_x - anchor_w / 2 351 | anchor_top = grid_y - anchor_h / 2 352 | 353 | rect1 = plt.Rectangle([anchor_left[0, 0, point_h, point_w],anchor_top[0, 0, point_h, point_w]], \ 354 | anchor_w[0, 0, point_h, point_w],anchor_h[0, 0, point_h, point_w],color="r",fill=False) 355 | rect2 = plt.Rectangle([anchor_left[0, 1, point_h, point_w],anchor_top[0, 1, point_h, point_w]], \ 356 | anchor_w[0, 1, point_h, point_w],anchor_h[0, 1, point_h, point_w],color="r",fill=False) 357 | rect3 = plt.Rectangle([anchor_left[0, 2, point_h, point_w],anchor_top[0, 2, point_h, point_w]], \ 358 | anchor_w[0, 2, point_h, point_w],anchor_h[0, 2, point_h, point_w],color="r",fill=False) 359 | 360 | ax.add_patch(rect1) 361 | ax.add_patch(rect2) 362 | ax.add_patch(rect3) 363 | 364 | ax = fig.add_subplot(122) 365 | plt.imshow(img, alpha=0.5) 366 | plt.ylim(-30, 650) 367 | plt.xlim(-30, 650) 368 | plt.scatter(grid_x, grid_y) 369 | plt.scatter(point_h * 32, point_w * 32, c='black') 370 | plt.scatter(box_xy[0, :, point_h, point_w, 0], box_xy[0, :, point_h, point_w, 1], c='r') 371 | plt.gca().invert_yaxis() 372 | 373 | pre_left = box_xy[...,0] - box_wh[...,0] / 2 374 | pre_top = box_xy[...,1] - box_wh[...,1] / 2 375 | 376 | rect1 = plt.Rectangle([pre_left[0, 0, point_h, point_w], pre_top[0, 0, point_h, point_w]],\ 377 | box_wh[0, 0, point_h, point_w,0], box_wh[0, 0, point_h, point_w,1],color="r",fill=False) 378 | rect2 = plt.Rectangle([pre_left[0, 1, point_h, point_w], pre_top[0, 1, point_h, point_w]],\ 379 | box_wh[0, 1, point_h, point_w,0], box_wh[0, 1, point_h, point_w,1],color="r",fill=False) 380 | rect3 = plt.Rectangle([pre_left[0, 2, point_h, point_w], pre_top[0, 2, point_h, point_w]],\ 381 | box_wh[0, 2, point_h, point_w,0], box_wh[0, 2, point_h, point_w,1],color="r",fill=False) 382 | 383 | ax.add_patch(rect1) 384 | ax.add_patch(rect2) 385 | ax.add_patch(rect3) 386 | 387 | plt.show() 388 | # 389 | feat = torch.from_numpy(np.random.normal(0.2, 0.5, [4, 258, 20, 20])).float() 390 | anchors = np.array([[116, 90], [156, 198], [373, 326], [30,61], [62,45], [59,119], [10,13], [16,30], [33,23]]) 391 | anchors_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]] 392 | get_anchors_and_decode(feat, [640, 640], anchors, anchors_mask, 80) 393 | -------------------------------------------------------------------------------- /utils/utils_fit.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | import torch 4 | from tqdm import tqdm 5 | 6 | from utils.utils import get_lr 7 | 8 | def fit_one_epoch(model_train, model, ema, yolo_loss, loss_history, eval_callback, optimizer, epoch, epoch_step, epoch_step_val, gen, gen_val, Epoch, cuda, fp16, scaler, save_period, save_dir, 
local_rank=0): 9 | loss = 0 10 | val_loss = 0 11 | 12 | if local_rank == 0: 13 | print('Start Train') 14 | pbar = tqdm(total=epoch_step,desc=f'Epoch {epoch + 1}/{Epoch}',postfix=dict,mininterval=0.3) 15 | model_train.train() 16 | for iteration, batch in enumerate(gen): 17 | if iteration >= epoch_step: 18 | break 19 | 20 | images, targets = batch[0], batch[1] 21 | with torch.no_grad(): 22 | if cuda: 23 | images = images.cuda(local_rank) 24 | targets = targets.cuda(local_rank) 25 | #----------------------# 26 | # 清零梯度 27 | #----------------------# 28 | optimizer.zero_grad() 29 | if not fp16: 30 | #----------------------# 31 | # 前向传播 32 | #----------------------# 33 | outputs = model_train(images) 34 | loss_value = yolo_loss(outputs, targets, images) 35 | 36 | #----------------------# 37 | # 反向传播 38 | #----------------------# 39 | loss_value.backward() 40 | optimizer.step() 41 | else: 42 | from torch.cuda.amp import autocast 43 | with autocast(): 44 | #----------------------# 45 | # 前向传播 46 | #----------------------# 47 | outputs = model_train(images) 48 | loss_value = yolo_loss(outputs, targets, images) 49 | 50 | #----------------------# 51 | # 反向传播 52 | #----------------------# 53 | scaler.scale(loss_value).backward() 54 | scaler.step(optimizer) 55 | scaler.update() 56 | if ema: 57 | ema.update(model_train) 58 | 59 | loss += loss_value.item() 60 | 61 | if local_rank == 0: 62 | pbar.set_postfix(**{'loss' : loss / (iteration + 1), 63 | 'lr' : get_lr(optimizer)}) 64 | pbar.update(1) 65 | 66 | if local_rank == 0: 67 | pbar.close() 68 | print('Finish Train') 69 | print('Start Validation') 70 | pbar = tqdm(total=epoch_step_val, desc=f'Epoch {epoch + 1}/{Epoch}',postfix=dict,mininterval=0.3) 71 | 72 | if ema: 73 | model_train_eval = ema.ema 74 | else: 75 | model_train_eval = model_train.eval() 76 | 77 | for iteration, batch in enumerate(gen_val): 78 | if iteration >= epoch_step_val: 79 | break 80 | images, targets = batch[0], batch[1] 81 | with torch.no_grad(): 82 | if cuda: 83 | images = images.cuda(local_rank) 84 | targets = targets.cuda(local_rank) 85 | #----------------------# 86 | # 清零梯度 87 | #----------------------# 88 | optimizer.zero_grad() 89 | #----------------------# 90 | # 前向传播 91 | #----------------------# 92 | outputs = model_train_eval(images) 93 | loss_value = yolo_loss(outputs, targets, images) 94 | 95 | val_loss += loss_value.item() 96 | if local_rank == 0: 97 | pbar.set_postfix(**{'val_loss': val_loss / (iteration + 1)}) 98 | pbar.update(1) 99 | 100 | if local_rank == 0: 101 | pbar.close() 102 | print('Finish Validation') 103 | loss_history.append_loss(epoch + 1, loss / epoch_step, val_loss / epoch_step_val) 104 | eval_callback.on_epoch_end(epoch + 1, model_train_eval) 105 | print('Epoch:'+ str(epoch + 1) + '/' + str(Epoch)) 106 | print('Total Loss: %.3f || Val Loss: %.3f ' % (loss / epoch_step, val_loss / epoch_step_val)) 107 | 108 | #-----------------------------------------------# 109 | # 保存权值 110 | #-----------------------------------------------# 111 | if ema: 112 | save_state_dict = ema.ema.state_dict() 113 | else: 114 | save_state_dict = model.state_dict() 115 | 116 | if (epoch + 1) % save_period == 0 or epoch + 1 == Epoch: 117 | torch.save(save_state_dict, os.path.join(save_dir, "ep%03d-loss%.3f-val_loss%.3f.pth" % (epoch + 1, loss / epoch_step, val_loss / epoch_step_val))) 118 | 119 | if len(loss_history.val_loss) <= 1 or (val_loss / epoch_step_val) <= min(loss_history.val_loss): 120 | print('Save best model to best_epoch_weights.pth') 121 | torch.save(save_state_dict, 
os.path.join(save_dir, "best_epoch_weights.pth")) 122 | 123 | torch.save(save_state_dict, os.path.join(save_dir, "last_epoch_weights.pth")) -------------------------------------------------------------------------------- /utils/utils_rbox.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Author: [egrt] 3 | Date: 2023-01-30 19:00:28 4 | LastEditors: Egrt 5 | LastEditTime: 2023-03-13 16:26:23 6 | Description: Oriented Bounding Boxes utils 7 | ''' 8 | 9 | import numpy as np 10 | import math 11 | pi = np.pi 12 | import cv2 13 | import torch 14 | 15 | def poly2rbox(polys): 16 | """ 17 | Trans poly format to rbox format. 18 | Args: 19 | polys (array): (num_gts, [x1 y1 x2 y2 x3 y3 x4 y4]) 20 | Returns: 21 | rboxes (array): (num_gts, [cx cy l s θ]) 22 | """ 23 | assert polys.shape[-1] == 8 24 | rboxes = [] 25 | for poly in polys: 26 | poly = np.float32(poly.reshape(4, 2)) 27 | (x, y), (w, h), angle = cv2.minAreaRect(poly) # θ ∈ [0, 90] 28 | theta = angle / 180 * pi # 转为pi制 29 | # trans opencv format to longedge format θ ∈ [-pi/2, pi/2] 30 | if w < h: 31 | w, h = h, w 32 | theta += np.pi / 2 33 | while not np.pi / 2 > theta >= -np.pi / 2: 34 | if theta >= np.pi / 2: 35 | theta -= np.pi 36 | else: 37 | theta += np.pi 38 | assert np.pi / 2 > theta >= -np.pi / 2 39 | rboxes.append([x, y, w, h, theta]) 40 | return np.array(rboxes) 41 | 42 | def poly2obb_np_le90(poly): 43 | """Convert polygons to oriented bounding boxes. 44 | Args: 45 | polys (ndarray): [x0,y0,x1,y1,x2,y2,x3,y3] 46 | Returns: 47 | obbs (ndarray): [x_ctr,y_ctr,w,h,angle] 48 | """ 49 | bboxps = np.array(poly).reshape((4, 2)) 50 | rbbox = cv2.minAreaRect(bboxps) 51 | x, y, w, h, a = rbbox[0][0], rbbox[0][1], rbbox[1][0], rbbox[1][1], rbbox[2] 52 | if w < 2 or h < 2: 53 | return 54 | a = a / 180 * np.pi 55 | if w < h: 56 | w, h = h, w 57 | a += np.pi / 2 58 | while not np.pi / 2 > a >= -np.pi / 2: 59 | if a >= np.pi / 2: 60 | a -= np.pi 61 | else: 62 | a += np.pi 63 | assert np.pi / 2 > a >= -np.pi / 2 64 | return x, y, w, h, a 65 | 66 | def poly2hbb(polys): 67 | """ 68 | Trans poly format to hbb format 69 | Args: 70 | rboxes (array/tensor): (num_gts, poly) 71 | 72 | Returns: 73 | hbboxes (array/tensor): (num_gts, [xc yc w h]) 74 | """ 75 | assert polys.shape[-1] == 8 76 | if isinstance(polys, torch.Tensor): 77 | x = polys[:, 0::2] # (num, 4) 78 | y = polys[:, 1::2] 79 | x_max = torch.amax(x, dim=1) # (num) 80 | x_min = torch.amin(x, dim=1) 81 | y_max = torch.amax(y, dim=1) 82 | y_min = torch.amin(y, dim=1) 83 | x_ctr, y_ctr = (x_max + x_min) / 2.0, (y_max + y_min) / 2.0 # (num) 84 | h = y_max - y_min # (num) 85 | w = x_max - x_min 86 | x_ctr, y_ctr, w, h = x_ctr.reshape(-1, 1), y_ctr.reshape(-1, 1), w.reshape(-1, 1), h.reshape(-1, 1) # (num, 1) 87 | hbboxes = torch.cat((x_ctr, y_ctr, w, h), dim=1) 88 | else: 89 | x = polys[:, 0::2] # (num, 4) 90 | y = polys[:, 1::2] 91 | x_max = np.amax(x, axis=1) # (num) 92 | x_min = np.amin(x, axis=1) 93 | y_max = np.amax(y, axis=1) 94 | y_min = np.amin(y, axis=1) 95 | x_ctr, y_ctr = (x_max + x_min) / 2.0, (y_max + y_min) / 2.0 # (num) 96 | h = y_max - y_min # (num) 97 | w = x_max - x_min 98 | x_ctr, y_ctr, w, h = x_ctr.reshape(-1, 1), y_ctr.reshape(-1, 1), w.reshape(-1, 1), h.reshape(-1, 1) # (num, 1) 99 | hbboxes = np.concatenate((x_ctr, y_ctr, w, h), axis=1) 100 | return hbboxes 101 | 102 | def rbox2poly(obboxes): 103 | """Convert oriented bounding boxes to polygons. 
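    In this repo the angle is assumed to follow the long-edge (le90) convention of
    poly2rbox / poly2obb_np_le90, i.e. theta in [-pi/2, pi/2) in radians.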
104 | Args: 105 | obbs (ndarray): [x_ctr,y_ctr,w,h,angle] 106 | Returns: 107 | polys (ndarray): [x0,y0,x1,y1,x2,y2,x3,y3] 108 | """ 109 | try: 110 | center, w, h, theta = np.split(obboxes, (2, 3, 4), axis=-1) 111 | except: 112 | results = np.stack([0., 0., 0., 0., 0., 0., 0., 0.], axis=-1) 113 | return results.reshape(1, -1) 114 | Cos, Sin = np.cos(theta), np.sin(theta) 115 | vector1 = np.concatenate([w / 2 * Cos, w / 2 * Sin], axis=-1) 116 | vector2 = np.concatenate([-h / 2 * Sin, h / 2 * Cos], axis=-1) 117 | point1 = center - vector1 - vector2 118 | point2 = center + vector1 - vector2 119 | point3 = center + vector1 + vector2 120 | point4 = center - vector1 + vector2 121 | polys = np.concatenate([point1, point2, point3, point4], axis=-1) 122 | polys = get_best_begin_point(polys) 123 | return polys 124 | 125 | def cal_line_length(point1, point2): 126 | """Calculate the length of line. 127 | Args: 128 | point1 (List): [x,y] 129 | point2 (List): [x,y] 130 | Returns: 131 | length (float) 132 | """ 133 | return math.sqrt( 134 | math.pow(point1[0] - point2[0], 2) + 135 | math.pow(point1[1] - point2[1], 2)) 136 | 137 | 138 | def get_best_begin_point_single(coordinate): 139 | """Get the best begin point of the single polygon. 140 | Args: 141 | coordinate (List): [x1, y1, x2, y2, x3, y3, x4, y4, score] 142 | Returns: 143 | reorder coordinate (List): [x1, y1, x2, y2, x3, y3, x4, y4, score] 144 | """ 145 | x1, y1, x2, y2, x3, y3, x4, y4 = coordinate 146 | xmin = min(x1, x2, x3, x4) 147 | ymin = min(y1, y2, y3, y4) 148 | xmax = max(x1, x2, x3, x4) 149 | ymax = max(y1, y2, y3, y4) 150 | combine = [[[x1, y1], [x2, y2], [x3, y3], [x4, y4]], 151 | [[x2, y2], [x3, y3], [x4, y4], [x1, y1]], 152 | [[x3, y3], [x4, y4], [x1, y1], [x2, y2]], 153 | [[x4, y4], [x1, y1], [x2, y2], [x3, y3]]] 154 | dst_coordinate = [[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax]] 155 | force = 100000000.0 156 | force_flag = 0 157 | for i in range(4): 158 | temp_force = cal_line_length(combine[i][0], dst_coordinate[0]) \ 159 | + cal_line_length(combine[i][1], dst_coordinate[1]) \ 160 | + cal_line_length(combine[i][2], dst_coordinate[2]) \ 161 | + cal_line_length(combine[i][3], dst_coordinate[3]) 162 | if temp_force < force: 163 | force = temp_force 164 | force_flag = i 165 | if force_flag != 0: 166 | pass 167 | return np.hstack( 168 | (np.array(combine[force_flag]).reshape(8))) 169 | 170 | 171 | def get_best_begin_point(coordinates): 172 | """Get the best begin points of polygons. 173 | Args: 174 | coordinate (ndarray): shape(n, 8). 175 | Returns: 176 | reorder coordinate (ndarray): shape(n, 8). 177 | """ 178 | coordinates = list(map(get_best_begin_point_single, coordinates.tolist())) 179 | coordinates = np.array(coordinates) 180 | return coordinates 181 | 182 | def correct_rboxes(rboxes, image_shape): 183 | """将polys按比例进行缩放 184 | Args: 185 | coordinate (ndarray): shape(n, 8). 186 | Returns: 187 | reorder coordinate (ndarray): shape(n, 8). 
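        Note: rboxes here are assumed to be (n, 5) [cx, cy, w, h, angle] in normalized
        image coordinates; they are converted to corner polygons with rbox2poly and then
        scaled to pixel coordinates by the image width and height.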
188 | """ 189 | polys = rbox2poly(rboxes) 190 | nh, nw = image_shape 191 | polys[:, [0, 2, 4, 6]] *= nw 192 | polys[:, [1, 3, 5, 7]] *= nh 193 | 194 | return polys 195 | -------------------------------------------------------------------------------- /utils_coco/coco_annotation.py: -------------------------------------------------------------------------------- 1 | #-------------------------------------------------------# 2 | # 用于处理COCO数据集,根据json文件生成txt文件用于训练 3 | #-------------------------------------------------------# 4 | import json 5 | import os 6 | from collections import defaultdict 7 | 8 | #-------------------------------------------------------# 9 | # 指向了COCO训练集与验证集图片的路径 10 | #-------------------------------------------------------# 11 | train_datasets_path = "coco_dataset/train2017" 12 | val_datasets_path = "coco_dataset/val2017" 13 | 14 | #-------------------------------------------------------# 15 | # 指向了COCO训练集与验证集标签的路径 16 | #-------------------------------------------------------# 17 | train_annotation_path = "coco_dataset/annotations/instances_train2017.json" 18 | val_annotation_path = "coco_dataset/annotations/instances_val2017.json" 19 | 20 | #-------------------------------------------------------# 21 | # 生成的txt文件路径 22 | #-------------------------------------------------------# 23 | train_output_path = "coco_train.txt" 24 | val_output_path = "coco_val.txt" 25 | 26 | if __name__ == "__main__": 27 | name_box_id = defaultdict(list) 28 | id_name = dict() 29 | f = open(train_annotation_path, encoding='utf-8') 30 | data = json.load(f) 31 | 32 | annotations = data['annotations'] 33 | for ant in annotations: 34 | id = ant['image_id'] 35 | name = os.path.join(train_datasets_path, '%012d.jpg' % id) 36 | cat = ant['category_id'] 37 | if cat >= 1 and cat <= 11: 38 | cat = cat - 1 39 | elif cat >= 13 and cat <= 25: 40 | cat = cat - 2 41 | elif cat >= 27 and cat <= 28: 42 | cat = cat - 3 43 | elif cat >= 31 and cat <= 44: 44 | cat = cat - 5 45 | elif cat >= 46 and cat <= 65: 46 | cat = cat - 6 47 | elif cat == 67: 48 | cat = cat - 7 49 | elif cat == 70: 50 | cat = cat - 9 51 | elif cat >= 72 and cat <= 82: 52 | cat = cat - 10 53 | elif cat >= 84 and cat <= 90: 54 | cat = cat - 11 55 | name_box_id[name].append([ant['bbox'], cat]) 56 | 57 | f = open(train_output_path, 'w') 58 | for key in name_box_id.keys(): 59 | f.write(key) 60 | box_infos = name_box_id[key] 61 | for info in box_infos: 62 | x_min = int(info[0][0]) 63 | y_min = int(info[0][1]) 64 | x_max = x_min + int(info[0][2]) 65 | y_max = y_min + int(info[0][3]) 66 | 67 | box_info = " %d,%d,%d,%d,%d" % ( 68 | x_min, y_min, x_max, y_max, int(info[1])) 69 | f.write(box_info) 70 | f.write('\n') 71 | f.close() 72 | 73 | name_box_id = defaultdict(list) 74 | id_name = dict() 75 | f = open(val_annotation_path, encoding='utf-8') 76 | data = json.load(f) 77 | 78 | annotations = data['annotations'] 79 | for ant in annotations: 80 | id = ant['image_id'] 81 | name = os.path.join(val_datasets_path, '%012d.jpg' % id) 82 | cat = ant['category_id'] 83 | if cat >= 1 and cat <= 11: 84 | cat = cat - 1 85 | elif cat >= 13 and cat <= 25: 86 | cat = cat - 2 87 | elif cat >= 27 and cat <= 28: 88 | cat = cat - 3 89 | elif cat >= 31 and cat <= 44: 90 | cat = cat - 5 91 | elif cat >= 46 and cat <= 65: 92 | cat = cat - 6 93 | elif cat == 67: 94 | cat = cat - 7 95 | elif cat == 70: 96 | cat = cat - 9 97 | elif cat >= 72 and cat <= 82: 98 | cat = cat - 10 99 | elif cat >= 84 and cat <= 90: 100 | cat = cat - 11 101 | name_box_id[name].append([ant['bbox'], cat]) 
102 | 103 | f = open(val_output_path, 'w') 104 | for key in name_box_id.keys(): 105 | f.write(key) 106 | box_infos = name_box_id[key] 107 | for info in box_infos: 108 | x_min = int(info[0][0]) 109 | y_min = int(info[0][1]) 110 | x_max = x_min + int(info[0][2]) 111 | y_max = y_min + int(info[0][3]) 112 | 113 | box_info = " %d,%d,%d,%d,%d" % ( 114 | x_min, y_min, x_max, y_max, int(info[1])) 115 | f.write(box_info) 116 | f.write('\n') 117 | f.close() 118 | -------------------------------------------------------------------------------- /utils_coco/get_map_coco.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | 4 | import numpy as np 5 | import torch 6 | from PIL import Image 7 | from pycocotools.coco import COCO 8 | from pycocotools.cocoeval import COCOeval 9 | from tqdm import tqdm 10 | 11 | from utils.utils import cvtColor, preprocess_input, resize_image 12 | from yolo import YOLO 13 | 14 | #---------------------------------------------------------------------------# 15 | # map_mode用于指定该文件运行时计算的内容 16 | # map_mode为0代表整个map计算流程,包括获得预测结果、计算map。 17 | # map_mode为1代表仅仅获得预测结果。 18 | # map_mode为2代表仅仅获得计算map。 19 | #---------------------------------------------------------------------------# 20 | map_mode = 0 21 | #-------------------------------------------------------# 22 | # 指向了验证集标签与图片路径 23 | #-------------------------------------------------------# 24 | cocoGt_path = 'coco_dataset/annotations/instances_val2017.json' 25 | dataset_img_path = 'coco_dataset/val2017' 26 | #-------------------------------------------------------# 27 | # 结果输出的文件夹,默认为map_out 28 | #-------------------------------------------------------# 29 | temp_save_path = 'map_out/coco_eval' 30 | 31 | class mAP_YOLO(YOLO): 32 | #---------------------------------------------------# 33 | # 检测图片 34 | #---------------------------------------------------# 35 | def detect_image(self, image_id, image, results, clsid2catid): 36 | #---------------------------------------------------# 37 | # 计算输入图片的高和宽 38 | #---------------------------------------------------# 39 | image_shape = np.array(np.shape(image)[0:2]) 40 | #---------------------------------------------------------# 41 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 42 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 43 | #---------------------------------------------------------# 44 | image = cvtColor(image) 45 | #---------------------------------------------------------# 46 | # 给图像增加灰条,实现不失真的resize 47 | # 也可以直接resize进行识别 48 | #---------------------------------------------------------# 49 | image_data = resize_image(image, (self.input_shape[1],self.input_shape[0]), self.letterbox_image) 50 | #---------------------------------------------------------# 51 | # 添加上batch_size维度 52 | #---------------------------------------------------------# 53 | image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0) 54 | 55 | with torch.no_grad(): 56 | images = torch.from_numpy(image_data) 57 | if self.cuda: 58 | images = images.cuda() 59 | #---------------------------------------------------------# 60 | # 将图像输入网络当中进行预测! 
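            # Forward pass -> decode_box restores boxes per feature level ->
            # non_max_suppression filters by confidence and applies rotated NMS.
            # Each kept detection is then appended to results as a COCO-style dict
            # {"image_id", "category_id", "bbox", "score"}.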
61 | #---------------------------------------------------------# 62 | outputs = self.net(images) 63 | outputs = self.bbox_util.decode_box(outputs) 64 | #---------------------------------------------------------# 65 | # 将预测框进行堆叠,然后进行非极大抑制 66 | #---------------------------------------------------------# 67 | outputs = self.bbox_util.non_max_suppression(torch.cat(outputs, 1), self.num_classes, self.input_shape, 68 | image_shape, self.letterbox_image, conf_thres = self.confidence, nms_thres = self.nms_iou) 69 | 70 | if outputs[0] is None: 71 | return results 72 | 73 | top_label = np.array(outputs[0][:, 6], dtype = 'int32') 74 | top_conf = outputs[0][:, 4] * outputs[0][:, 5] 75 | top_boxes = outputs[0][:, :4] 76 | 77 | for i, c in enumerate(top_label): 78 | result = {} 79 | top, left, bottom, right = top_boxes[i] 80 | 81 | result["image_id"] = int(image_id) 82 | result["category_id"] = clsid2catid[c] 83 | result["bbox"] = [float(left),float(top),float(right-left),float(bottom-top)] 84 | result["score"] = float(top_conf[i]) 85 | results.append(result) 86 | return results 87 | 88 | if __name__ == "__main__": 89 | if not os.path.exists(temp_save_path): 90 | os.makedirs(temp_save_path) 91 | 92 | cocoGt = COCO(cocoGt_path) 93 | ids = list(cocoGt.imgToAnns.keys()) 94 | clsid2catid = cocoGt.getCatIds() 95 | 96 | if map_mode == 0 or map_mode == 1: 97 | yolo = mAP_YOLO(confidence = 0.001, nms_iou = 0.65) 98 | 99 | with open(os.path.join(temp_save_path, 'eval_results.json'),"w") as f: 100 | results = [] 101 | for image_id in tqdm(ids): 102 | image_path = os.path.join(dataset_img_path, cocoGt.loadImgs(image_id)[0]['file_name']) 103 | image = Image.open(image_path) 104 | results = yolo.detect_image(image_id, image, results, clsid2catid) 105 | json.dump(results, f) 106 | 107 | if map_mode == 0 or map_mode == 2: 108 | cocoDt = cocoGt.loadRes(os.path.join(temp_save_path, 'eval_results.json')) 109 | cocoEval = COCOeval(cocoGt, cocoDt, 'bbox') 110 | cocoEval.evaluate() 111 | cocoEval.accumulate() 112 | cocoEval.summarize() 113 | print("Get map done.") 114 | -------------------------------------------------------------------------------- /voc_annotation.py: -------------------------------------------------------------------------------- 1 | import os 2 | import random 3 | import xml.etree.ElementTree as ET 4 | 5 | import numpy as np 6 | from utils.utils_rbox import * 7 | from utils.utils import get_classes 8 | 9 | #--------------------------------------------------------------------------------------------------------------------------------# 10 | # annotation_mode用于指定该文件运行时计算的内容 11 | # annotation_mode为0代表整个标签处理过程,包括获得VOCdevkit/VOC2007/ImageSets里面的txt以及训练用的2007_train.txt、2007_val.txt 12 | # annotation_mode为1代表获得VOCdevkit/VOC2007/ImageSets里面的txt 13 | # annotation_mode为2代表获得训练用的2007_train.txt、2007_val.txt 14 | #--------------------------------------------------------------------------------------------------------------------------------# 15 | annotation_mode = 0 16 | #-------------------------------------------------------------------# 17 | # 必须要修改,用于生成2007_train.txt、2007_val.txt的目标信息 18 | # 与训练和预测所用的classes_path一致即可 19 | # 如果生成的2007_train.txt里面没有目标信息 20 | # 那么就是因为classes没有设定正确 21 | # 仅在annotation_mode为0和2的时候有效 22 | #-------------------------------------------------------------------# 23 | classes_path = 'model_data/uav_classes.txt' 24 | #--------------------------------------------------------------------------------------------------------------------------------# 25 | # 
trainval_percent用于指定(训练集+验证集)与测试集的比例,默认情况下 (训练集+验证集):测试集 = 9:1 26 | # train_percent用于指定(训练集+验证集)中训练集与验证集的比例,默认情况下 训练集:验证集 = 9:1 27 | # 仅在annotation_mode为0和1的时候有效 28 | #--------------------------------------------------------------------------------------------------------------------------------# 29 | trainval_percent = 0.9 30 | train_percent = 0.9 31 | #-------------------------------------------------------# 32 | # 指向VOC数据集所在的文件夹 33 | # 默认指向根目录下的VOC数据集 34 | #-------------------------------------------------------# 35 | VOCdevkit_path = 'VOCdevkit' 36 | 37 | VOCdevkit_sets = [('2007', 'train'), ('2007', 'val')] 38 | classes, _ = get_classes(classes_path) 39 | 40 | #-------------------------------------------------------# 41 | # 统计目标数量 42 | #-------------------------------------------------------# 43 | photo_nums = np.zeros(len(VOCdevkit_sets)) 44 | nums = np.zeros(len(classes)) 45 | def convert_annotation(year, image_id, list_file): 46 | in_file = open(os.path.join(VOCdevkit_path, 'VOC%s/Annotations/%s.xml'%(year, image_id)), encoding='utf-8') 47 | tree=ET.parse(in_file) 48 | root = tree.getroot() 49 | 50 | for obj in root.iter('object'): 51 | difficult = 0 52 | if obj.find('difficult')!=None: 53 | difficult = obj.find('difficult').text 54 | cls = obj.find('name').text 55 | if cls not in classes or int(difficult)==1: 56 | continue 57 | cls_id = classes.index(cls) 58 | xmlbox = obj.find('robndbox') 59 | cx = float(xmlbox.find('cx').text) 60 | cy = float(xmlbox.find('cy').text) 61 | h = float(xmlbox.find('h').text) 62 | w = float(xmlbox.find('w').text) 63 | angle = float(xmlbox.find('angle').text) 64 | b = np.array([[cx, cy, w, h, angle]], dtype=np.float32) 65 | b = rbox2poly(b)[0] 66 | b = (b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7]) 67 | list_file.write(" " + ",".join([str(a) for a in b]) + ',' + str(cls_id)) 68 | 69 | nums[classes.index(cls)] = nums[classes.index(cls)] + 1 70 | 71 | if __name__ == "__main__": 72 | random.seed(0) 73 | if " " in os.path.abspath(VOCdevkit_path): 74 | raise ValueError("数据集存放的文件夹路径与图片名称中不可以存在空格,否则会影响正常的模型训练,请注意修改。") 75 | 76 | if annotation_mode == 0 or annotation_mode == 1: 77 | print("Generate txt in ImageSets.") 78 | xmlfilepath = os.path.join(VOCdevkit_path, 'VOC2007/Annotations') 79 | saveBasePath = os.path.join(VOCdevkit_path, 'VOC2007/ImageSets/Main') 80 | temp_xml = os.listdir(xmlfilepath) 81 | total_xml = [] 82 | for xml in temp_xml: 83 | if xml.endswith(".xml"): 84 | total_xml.append(xml) 85 | 86 | num = len(total_xml) 87 | list = range(num) 88 | tv = int(num*trainval_percent) 89 | tr = int(tv*train_percent) 90 | trainval= random.sample(list,tv) 91 | train = random.sample(trainval,tr) 92 | 93 | print("train and val size",tv) 94 | print("train size",tr) 95 | ftrainval = open(os.path.join(saveBasePath,'trainval.txt'), 'w') 96 | ftest = open(os.path.join(saveBasePath,'test.txt'), 'w') 97 | ftrain = open(os.path.join(saveBasePath,'train.txt'), 'w') 98 | fval = open(os.path.join(saveBasePath,'val.txt'), 'w') 99 | 100 | for i in list: 101 | name=total_xml[i][:-4]+'\n' 102 | if i in trainval: 103 | ftrainval.write(name) 104 | if i in train: 105 | ftrain.write(name) 106 | else: 107 | fval.write(name) 108 | else: 109 | ftest.write(name) 110 | 111 | ftrainval.close() 112 | ftrain.close() 113 | fval.close() 114 | ftest.close() 115 | print("Generate txt in ImageSets done.") 116 | 117 | if annotation_mode == 0 or annotation_mode == 2: 118 | print("Generate 2007_train.txt and 2007_val.txt for train.") 119 | type_index = 0 120 | for year, image_set in 
VOCdevkit_sets: 121 | image_ids = open(os.path.join(VOCdevkit_path, 'VOC%s/ImageSets/Main/%s.txt'%(year, image_set)), encoding='utf-8').read().strip().split() 122 | list_file = open('%s_%s.txt'%(year, image_set), 'w', encoding='utf-8') 123 | for image_id in image_ids: 124 | list_file.write('%s/VOC%s/JPEGImages/%s.jpg'%(os.path.abspath(VOCdevkit_path), year, image_id)) 125 | 126 | convert_annotation(year, image_id, list_file) 127 | list_file.write('\n') 128 | photo_nums[type_index] = len(image_ids) 129 | type_index += 1 130 | list_file.close() 131 | print("Generate 2007_train.txt and 2007_val.txt for train done.") 132 | 133 | def printTable(List1, List2): 134 | for i in range(len(List1[0])): 135 | print("|", end=' ') 136 | for j in range(len(List1)): 137 | print(List1[j][i].rjust(int(List2[j])), end=' ') 138 | print("|", end=' ') 139 | print() 140 | 141 | str_nums = [str(int(x)) for x in nums] 142 | tableData = [ 143 | classes, str_nums 144 | ] 145 | colWidths = [0]*len(tableData) 146 | len1 = 0 147 | for i in range(len(tableData)): 148 | for j in range(len(tableData[i])): 149 | if len(tableData[i][j]) > colWidths[i]: 150 | colWidths[i] = len(tableData[i][j]) 151 | printTable(tableData, colWidths) 152 | 153 | if photo_nums[0] <= 500: 154 | print("训练集数量小于500,属于较小的数据量,请注意设置较大的训练世代(Epoch)以满足足够的梯度下降次数(Step)。") 155 | 156 | if np.sum(nums) == 0: 157 | print("在数据集中并未获得任何目标,请注意修改classes_path对应自己的数据集,并且保证标签名字正确,否则训练将会没有任何效果!") 158 | print("在数据集中并未获得任何目标,请注意修改classes_path对应自己的数据集,并且保证标签名字正确,否则训练将会没有任何效果!") 159 | print("在数据集中并未获得任何目标,请注意修改classes_path对应自己的数据集,并且保证标签名字正确,否则训练将会没有任何效果!") 160 | print("(重要的事情说三遍)。") 161 | -------------------------------------------------------------------------------- /yolo.py: -------------------------------------------------------------------------------- 1 | import colorsys 2 | import os 3 | import time 4 | 5 | import numpy as np 6 | import torch 7 | import torch.nn as nn 8 | from PIL import ImageDraw, ImageFont 9 | 10 | from nets.yolo import YoloBody 11 | from utils.utils import (cvtColor, get_anchors, get_classes, preprocess_input, 12 | resize_image, show_config) 13 | from utils.utils_bbox import DecodeBox 14 | from utils.utils_rbox import * 15 | ''' 16 | 训练自己的数据集必看注释! 17 | ''' 18 | class YOLO(object): 19 | _defaults = { 20 | #--------------------------------------------------------------------------# 21 | # 使用自己训练好的模型进行预测一定要修改model_path和classes_path! 
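    # Any of these defaults can also be overridden per instance with keyword arguments,
    # e.g. mAP_YOLO(confidence=0.001, nms_iou=0.65) in utils_coco/get_map_coco.py.
    #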
22 | # model_path指向logs文件夹下的权值文件,classes_path指向model_data下的txt 23 | # 24 | # 训练好后logs文件夹下存在多个权值文件,选择验证集损失较低的即可。 25 | # 验证集损失较低不代表mAP较高,仅代表该权值在验证集上泛化性能较好。 26 | # 如果出现shape不匹配,同时要注意训练时的model_path和classes_path参数的修改 27 | #--------------------------------------------------------------------------# 28 | "model_path" : 'model_data/yolov7_tiny_obb_uav.pth', 29 | "classes_path" : 'model_data/uav_classes.txt', 30 | #---------------------------------------------------------------------# 31 | # anchors_path代表先验框对应的txt文件,一般不修改。 32 | # anchors_mask用于帮助代码找到对应的先验框,一般不修改。 33 | #---------------------------------------------------------------------# 34 | "anchors_path" : 'model_data/yolo_anchors.txt', 35 | "anchors_mask" : [[6, 7, 8], [3, 4, 5], [0, 1, 2]], 36 | #---------------------------------------------------------------------# 37 | # 输入图片的大小,必须为32的倍数。 38 | #---------------------------------------------------------------------# 39 | "input_shape" : [640, 640], 40 | #---------------------------------------------------------------------# 41 | # 只有得分大于置信度的预测框会被保留下来 42 | #---------------------------------------------------------------------# 43 | "confidence" : 0.5, 44 | #---------------------------------------------------------------------# 45 | # 非极大抑制所用到的nms_iou大小 46 | #---------------------------------------------------------------------# 47 | "nms_iou" : 0.3, 48 | #---------------------------------------------------------------------# 49 | # 该变量用于控制是否使用letterbox_image对输入图像进行不失真的resize, 50 | # 在多次测试后,发现关闭letterbox_image直接resize的效果更好 51 | #---------------------------------------------------------------------# 52 | "letterbox_image" : True, 53 | #-------------------------------# 54 | # 是否使用Cuda 55 | # 没有GPU可以设置成False 56 | #-------------------------------# 57 | "cuda" : True, 58 | } 59 | 60 | @classmethod 61 | def get_defaults(cls, n): 62 | if n in cls._defaults: 63 | return cls._defaults[n] 64 | else: 65 | return "Unrecognized attribute name '" + n + "'" 66 | 67 | #---------------------------------------------------# 68 | # 初始化YOLO 69 | #---------------------------------------------------# 70 | def __init__(self, **kwargs): 71 | self.__dict__.update(self._defaults) 72 | for name, value in kwargs.items(): 73 | setattr(self, name, value) 74 | self._defaults[name] = value 75 | 76 | #---------------------------------------------------# 77 | # 获得种类和先验框的数量 78 | #---------------------------------------------------# 79 | self.class_names, self.num_classes = get_classes(self.classes_path) 80 | self.anchors, self.num_anchors = get_anchors(self.anchors_path) 81 | self.bbox_util = DecodeBox(self.anchors, self.num_classes, (self.input_shape[0], self.input_shape[1]), self.anchors_mask) 82 | #---------------------------------------------------# 83 | # 画框设置不同的颜色 84 | #---------------------------------------------------# 85 | hsv_tuples = [(x / self.num_classes, 1., 1.) 
for x in range(self.num_classes)] 86 | self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples)) 87 | self.colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), self.colors)) 88 | self.generate() 89 | 90 | show_config(**self._defaults) 91 | 92 | #---------------------------------------------------# 93 | # 生成模型 94 | #---------------------------------------------------# 95 | def generate(self, onnx=False, trt=False): 96 | #---------------------------------------------------# 97 | # 建立yolo模型,载入yolo模型的权重 98 | #---------------------------------------------------# 99 | self.net = YoloBody(self.anchors_mask, self.num_classes) 100 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') 101 | self.net.load_state_dict(torch.load(self.model_path, map_location=device)) 102 | self.net = self.net.fuse().eval() 103 | print('{} model, and classes loaded.'.format(self.model_path)) 104 | if not onnx: 105 | if self.cuda: 106 | self.net = nn.DataParallel(self.net) 107 | self.net = self.net.cuda() 108 | if trt: 109 | from torch2trt import TRTModule 110 | 111 | model_trt = TRTModule() 112 | model_trt.load_state_dict(torch.load('model_data/yolov7_tiny_trt.pth')) 113 | self.net = model_trt 114 | #---------------------------------------------------# 115 | # 检测图片 116 | #---------------------------------------------------# 117 | def detect_image(self, image, crop = False, count = False): 118 | #---------------------------------------------------# 119 | # 计算输入图片的高和宽 120 | #---------------------------------------------------# 121 | image_shape = np.array(np.shape(image)[0:2]) 122 | #---------------------------------------------------------# 123 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 124 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 125 | #---------------------------------------------------------# 126 | image = cvtColor(image) 127 | #---------------------------------------------------------# 128 | # 给图像增加灰条,实现不失真的resize 129 | # 也可以直接resize进行识别 130 | #---------------------------------------------------------# 131 | image_data = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image) 132 | #---------------------------------------------------------# 133 | # 添加上batch_size维度 134 | # h, w, 3 => 3, h, w => 1, 3, h, w 135 | #---------------------------------------------------------# 136 | image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0) 137 | 138 | with torch.no_grad(): 139 | images = torch.from_numpy(image_data) 140 | if self.cuda: 141 | images = images.cuda() 142 | #---------------------------------------------------------# 143 | # 将图像输入网络当中进行预测! 
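            # Each row of results[0] is [x, y, w, h, angle(rad), obj_conf, class_conf, class_pred];
            # the rotated boxes are converted to four-corner polygons with rbox2poly before drawing.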
144 | #---------------------------------------------------------# 145 | outputs = self.net(images) 146 | outputs = self.bbox_util.decode_box(outputs) 147 | #---------------------------------------------------------# 148 | # 将预测框进行堆叠,然后进行非极大抑制 149 | #---------------------------------------------------------# 150 | results = self.bbox_util.non_max_suppression(torch.cat(outputs, 1), self.num_classes, self.input_shape, 151 | image_shape, self.letterbox_image, conf_thres = self.confidence, nms_thres = self.nms_iou) 152 | 153 | if results[0] is None: 154 | return image 155 | 156 | top_label = np.array(results[0][:, 7], dtype = 'int32') 157 | top_conf = results[0][:, 5] * results[0][:, 6] 158 | top_rboxes = results[0][:, :5] 159 | top_polys = rbox2poly(top_rboxes) 160 | #---------------------------------------------------------# 161 | # 设置字体与边框厚度 162 | #---------------------------------------------------------# 163 | font = ImageFont.truetype(font='model_data/simhei.ttf', size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32')) 164 | thickness = int(max((image.size[0] + image.size[1]) // np.mean(self.input_shape), 1)) 165 | #---------------------------------------------------------# 166 | # 计数 167 | #---------------------------------------------------------# 168 | if count: 169 | print("top_label:", top_label) 170 | classes_nums = np.zeros([self.num_classes]) 171 | for i in range(self.num_classes): 172 | num = np.sum(top_label == i) 173 | if num > 0: 174 | print(self.class_names[i], " : ", num) 175 | classes_nums[i] = num 176 | print("classes_nums:", classes_nums) 177 | #---------------------------------------------------------# 178 | # 图像绘制 179 | #---------------------------------------------------------# 180 | for i, c in list(enumerate(top_label)): 181 | predicted_class = self.class_names[int(c)] 182 | poly = top_polys[i].astype(np.int32) 183 | score = top_conf[i] 184 | 185 | polygon_list = list(poly) 186 | label = '{} {:.2f}'.format(predicted_class, score) 187 | draw = ImageDraw.Draw(image) 188 | label = label.encode('utf-8') 189 | 190 | text_origin = np.array([poly[0], poly[1]], np.int32) 191 | 192 | draw.polygon(xy=polygon_list, outline=self.colors[c]) 193 | draw.text(text_origin, str(label,'UTF-8'), fill=self.colors[c], font=font) 194 | del draw 195 | 196 | return image 197 | 198 | def get_FPS(self, image, test_interval): 199 | image_shape = np.array(np.shape(image)[0:2]) 200 | #---------------------------------------------------------# 201 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 202 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 203 | #---------------------------------------------------------# 204 | image = cvtColor(image) 205 | #---------------------------------------------------------# 206 | # 给图像增加灰条,实现不失真的resize 207 | # 也可以直接resize进行识别 208 | #---------------------------------------------------------# 209 | image_data = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image) 210 | #---------------------------------------------------------# 211 | # 添加上batch_size维度 212 | #---------------------------------------------------------# 213 | image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0) 214 | 215 | with torch.no_grad(): 216 | images = torch.from_numpy(image_data) 217 | if self.cuda: 218 | images = images.cuda() 219 | #---------------------------------------------------------# 220 | # 将图像输入网络当中进行预测! 
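            # The pass below is not timed and mainly serves as a warm-up; the loop that follows
            # averages test_interval full passes (forward + decode + rotated NMS) and the method
            # returns the mean seconds per image.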
221 | #---------------------------------------------------------# 222 | outputs = self.net(images) 223 | outputs = self.bbox_util.decode_box(outputs) 224 | #---------------------------------------------------------# 225 | # 将预测框进行堆叠,然后进行非极大抑制 226 | #---------------------------------------------------------# 227 | results = self.bbox_util.non_max_suppression(torch.cat(outputs, 1), self.num_classes, self.input_shape, 228 | image_shape, self.letterbox_image, conf_thres=self.confidence, nms_thres=self.nms_iou) 229 | 230 | t1 = time.time() 231 | for _ in range(test_interval): 232 | with torch.no_grad(): 233 | #---------------------------------------------------------# 234 | # 将图像输入网络当中进行预测! 235 | #---------------------------------------------------------# 236 | outputs = self.net(images) 237 | outputs = self.bbox_util.decode_box(outputs) 238 | #---------------------------------------------------------# 239 | # 将预测框进行堆叠,然后进行非极大抑制 240 | #---------------------------------------------------------# 241 | results = self.bbox_util.non_max_suppression(torch.cat(outputs, 1), self.num_classes, self.input_shape, 242 | image_shape, self.letterbox_image, conf_thres=self.confidence, nms_thres=self.nms_iou) 243 | 244 | t2 = time.time() 245 | tact_time = (t2 - t1) / test_interval 246 | return tact_time 247 | 248 | def detect_heatmap(self, image, heatmap_save_path): 249 | import cv2 250 | import matplotlib.pyplot as plt 251 | def sigmoid(x): 252 | y = 1.0 / (1.0 + np.exp(-x)) 253 | return y 254 | #---------------------------------------------------------# 255 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 256 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 257 | #---------------------------------------------------------# 258 | image = cvtColor(image) 259 | #---------------------------------------------------------# 260 | # 给图像增加灰条,实现不失真的resize 261 | # 也可以直接resize进行识别 262 | #---------------------------------------------------------# 263 | image_data = resize_image(image, (self.input_shape[1],self.input_shape[0]), self.letterbox_image) 264 | #---------------------------------------------------------# 265 | # 添加上batch_size维度 266 | #---------------------------------------------------------# 267 | image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0) 268 | 269 | with torch.no_grad(): 270 | images = torch.from_numpy(image_data) 271 | if self.cuda: 272 | images = images.cuda() 273 | #---------------------------------------------------------# 274 | # 将图像输入网络当中进行预测! 
275 | #---------------------------------------------------------# 276 | outputs = self.net(images) 277 | 278 | plt.imshow(image, alpha=1) 279 | plt.axis('off') 280 | mask = np.zeros((image.size[1], image.size[0])) 281 | for sub_output in outputs: 282 | sub_output = sub_output.cpu().numpy() 283 | b, c, h, w = np.shape(sub_output) 284 | sub_output = np.transpose(np.reshape(sub_output, [b, 3, -1, h, w]), [0, 3, 4, 1, 2])[0] 285 | score = np.max(sigmoid(sub_output[..., 4]), -1) 286 | score = cv2.resize(score, (image.size[0], image.size[1])) 287 | normed_score = (score * 255).astype('uint8') 288 | mask = np.maximum(mask, normed_score) 289 | 290 | plt.imshow(mask, alpha=0.5, interpolation='nearest', cmap="jet") 291 | 292 | plt.axis('off') 293 | plt.subplots_adjust(top=1, bottom=0, right=1, left=0, hspace=0, wspace=0) 294 | plt.margins(0, 0) 295 | plt.savefig(heatmap_save_path, dpi=200, bbox_inches='tight', pad_inches = -0.1) 296 | print("Save to the " + heatmap_save_path) 297 | plt.show() 298 | 299 | def convert_to_onnx(self, simplify, model_path): 300 | import onnx 301 | self.generate(onnx=True) 302 | 303 | im = torch.zeros(1, 3, *self.input_shape).to('cpu') # image size(1, 3, 512, 512) BCHW 304 | input_layer_names = ["images"] 305 | output_layer_names = ["output"] 306 | 307 | # Export the model 308 | print(f'Starting export with onnx {onnx.__version__}.') 309 | torch.onnx.export(self.net, 310 | im, 311 | f = model_path, 312 | verbose = False, 313 | opset_version = 12, 314 | training = torch.onnx.TrainingMode.EVAL, 315 | do_constant_folding = True, 316 | input_names = input_layer_names, 317 | output_names = output_layer_names, 318 | dynamic_axes = None) 319 | 320 | # Checks 321 | model_onnx = onnx.load(model_path) # load onnx model 322 | onnx.checker.check_model(model_onnx) # check onnx model 323 | 324 | # Simplify onnx 325 | if simplify: 326 | import onnxsim 327 | print(f'Simplifying with onnx-simplifier {onnxsim.__version__}.') 328 | model_onnx, check = onnxsim.simplify( 329 | model_onnx, 330 | dynamic_input_shape=False, 331 | input_shapes=None) 332 | assert check, 'assert check failed' 333 | onnx.save(model_onnx, model_path) 334 | 335 | print('Onnx model save as {}'.format(model_path)) 336 | 337 | def get_map_txt(self, image_id, image, class_names, map_out_path): 338 | f = open(os.path.join(map_out_path, "detection-results/"+image_id+".txt"), "w", encoding='utf-8') 339 | image_shape = np.array(np.shape(image)[0:2]) 340 | #---------------------------------------------------------# 341 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 342 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 343 | #---------------------------------------------------------# 344 | image = cvtColor(image) 345 | #---------------------------------------------------------# 346 | # 给图像增加灰条,实现不失真的resize 347 | # 也可以直接resize进行识别 348 | #---------------------------------------------------------# 349 | image_data = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image) 350 | #---------------------------------------------------------# 351 | # 添加上batch_size维度 352 | #---------------------------------------------------------# 353 | image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0) 354 | 355 | with torch.no_grad(): 356 | images = torch.from_numpy(image_data) 357 | if self.cuda: 358 | images = images.cuda() 359 | #---------------------------------------------------------# 360 | # 将图像输入网络当中进行预测! 
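#   预测结果经解码与非极大抑制后,每个目标按「类别 置信度 xc yc w h 角度(度)」的格式
#   写入 map_out_path/detection-results/<image_id>.txt,供后续 mAP 计算使用。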
361 | #---------------------------------------------------------# 362 | outputs = self.net(images) 363 | outputs = self.bbox_util.decode_box(outputs) 364 | #---------------------------------------------------------# 365 | # 将预测框进行堆叠,然后进行非极大抑制 366 | #---------------------------------------------------------# 367 | results = self.bbox_util.non_max_suppression(torch.cat(outputs, 1), self.num_classes, self.input_shape, 368 | image_shape, self.letterbox_image, conf_thres = self.confidence, nms_thres = self.nms_iou) 369 | 370 | if results[0] is None: 371 | return 372 | 373 | top_label = np.array(results[0][:, 7], dtype = 'int32') 374 | top_conf = results[0][:, 5] * results[0][:, 6] 375 | top_rboxes = results[0][:, :5] 376 | for i, c in list(enumerate(top_label)): 377 | predicted_class = self.class_names[int(c)] 378 | obb = top_rboxes[i] 379 | score = str(top_conf[i]) 380 | 381 | xc, yc, w, h, angle = obb 382 | 383 | if predicted_class not in class_names: 384 | continue 385 | 386 | f.write("%s %s %s %s %s %s %s\n" % (predicted_class, score[:6], str(int(xc)), str(int(yc)), str(int(w)), str(int(h)), str(math.degrees(angle)))) 387 | 388 | f.close() 389 | return 390 | -------------------------------------------------------------------------------- /常见问题汇总.md: -------------------------------------------------------------------------------- 1 | 问题汇总的博客地址为[https://blog.csdn.net/weixin_44791964/article/details/107517428](https://blog.csdn.net/weixin_44791964/article/details/107517428)。 2 | 3 | # 问题汇总 4 | ## 1、下载问题 5 | ### a、代码下载 6 | **问:up主,可以给我发一份代码吗,代码在哪里下载啊? 7 | 答:Github上的地址就在视频简介里。复制一下就能进去下载了。** 8 | 9 | **问:up主,为什么我下载的代码提示压缩包损坏? 10 | 答:重新去Github下载。** 11 | 12 | **问:up主,为什么我下载的代码和你在视频以及博客上的代码不一样? 13 | 答:我常常会对代码进行更新,最终以实际的代码为准。** 14 | 15 | ### b、 权值下载 16 | **问:up主,为什么我下载的代码里面,model_data下面没有.pth或者.h5文件? 17 | 答:我一般会把权值上传到Github和百度网盘,在GITHUB的README里面就能找到。** 18 | 19 | ### c、 数据集下载 20 | **问:up主,XXXX数据集在哪里下载啊? 21 | 答:一般数据集的下载地址我会放在README里面,基本上都有,没有的话请及时联系我添加,直接发github的issue即可**。 22 | 23 | ## 2、环境配置问题 24 | ### a、20系列及以下显卡环境配置 25 | **pytorch代码对应的pytorch版本为1.2,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/106037141](https://blog.csdn.net/weixin_44791964/article/details/106037141)。 26 | 27 | **keras代码对应的tensorflow版本为1.13.2,keras版本是2.1.5,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/104702142](https://blog.csdn.net/weixin_44791964/article/details/104702142)。 28 | 29 | **tf2代码对应的tensorflow版本为2.2.0,无需安装keras,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/109161493](https://blog.csdn.net/weixin_44791964/article/details/109161493)。 30 | 31 | **问:你的代码某某某版本的tensorflow和pytorch能用嘛? 
32 | 答:最好按照我推荐的配置,配置教程也有!其它版本的我没有试过!可能出现问题但是一般问题不大。仅需要改少量代码即可。** 33 | 34 | ### b、30系列显卡环境配置 35 | 30系显卡由于框架更新不可使用上述环境配置教程。 36 | 当前我已经测试的可以用的30显卡配置如下: 37 | **pytorch代码对应的pytorch版本为1.7.0,cuda为11.0,cudnn为8.0.5,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/120668551](https://blog.csdn.net/weixin_44791964/article/details/120668551)。 38 | 39 | **keras代码无法在win10下配置cuda11,在ubuntu下可以百度查询一下,配置tensorflow版本为1.15.4,keras版本是2.1.5或者2.3.1(少量函数接口不同,代码可能还需要少量调整。)** 40 | 41 | **tf2代码对应的tensorflow版本为2.4.0,cuda为11.0,cudnn为8.0.5,博客地址对应为**[https://blog.csdn.net/weixin_44791964/article/details/120657664](https://blog.csdn.net/weixin_44791964/article/details/120657664)。 42 | 43 | ### c、CPU环境配置 44 | **pytorch代码对应的pytorch-cpu版本为1.2,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/120655098](https://blog.csdn.net/weixin_44791964/article/details/120655098) 45 | 46 | **keras代码对应的tensorflow-cpu版本为1.13.2,keras版本是2.1.5,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/120653717](https://blog.csdn.net/weixin_44791964/article/details/120653717)。 47 | 48 | **tf2代码对应的tensorflow-cpu版本为2.2.0,无需安装keras,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/120656291](https://blog.csdn.net/weixin_44791964/article/details/120656291)。 49 | 50 | 51 | ### d、GPU利用问题与环境使用问题 52 | **问:为什么我安装了tensorflow-gpu但是却没用利用GPU进行训练呢? 53 | 答:确认tensorflow-gpu已经装好,利用pip list查看tensorflow版本,然后查看任务管理器或者利用nvidia命令看看是否使用了gpu进行训练,任务管理器的话要看显存使用情况。** 54 | 55 | **问:up主,我好像没有在用gpu进行训练啊,怎么看是不是用了GPU进行训练? 56 | 答:查看是否使用GPU进行训练一般使用NVIDIA在命令行的查看命令。在windows电脑中打开cmd然后利用nvidia-smi指令查看GPU利用情况** 57 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/f88ef794c9a341918f000eb2b1c67af6.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAQnViYmxpaWlpbmc=,size_20,color_FFFFFF,t_70,g_se,x_16) 58 | **如果要一定看任务管理器的话,请看性能部分GPU的显存是否利用,或者查看任务管理器的Cuda,而非Copy。** 59 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/20201013234241524.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70#pic_center) 60 | 61 | ### e、DLL load failed: 找不到指定的模块 62 | **问:出现如下错误** 63 | ```python 64 | Traceback (most recent call last): 65 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in 66 | from tensorflow.python.pywrap_tensorflow_internal import * 67 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 28, in 68 | pywrap_tensorflow_internal = swig_import_helper() 69 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper 70 | _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description) 71 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\imp.py", line 243, in load_modulereturn load_dynamic(name, filename, file) 72 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\imp.py", line 343, in load_dynamic 73 | return _load(spec) 74 | ImportError: DLL load failed: 找不到指定的模块。 75 | ``` 76 | **答:如果没重启过就重启一下,否则重新按照步骤安装,还无法解决则把你的GPU、CUDA、CUDNN、TF版本以及PYTORCH版本私聊告诉我。** 77 | 78 | ### f、no module问题(no module name utils.utils、no module named 'matplotlib' ) 79 | **问:为什么提示说no module name utils.utils(no module name nets.yolo、no module name nets.ssd等一系列问题)啊? 
80 | 答:utils并不需要用pip装,它就在我上传的仓库的根目录,出现这个问题的原因是根目录不对,查查相对目录和根目录的概念。查了基本上就明白了。** 81 | 82 | **问:为什么提示说no module name matplotlib(no module name PIL,no module name cv2等等)? 83 | 答:这个库没安装打开命令行安装就好。pip install matplotlib** 84 | 85 | **问:为什么我已经用pip装了opencv(pillow、matplotlib等),还是提示no module name cv2? 86 | 答:没有激活环境装,要激活对应的conda环境进行安装才可以正常使用** 87 | 88 | **问:为什么提示说No module named 'torch' ? 89 | 答:其实我也真的很想知道为什么会有这个问题……这个pytorch没装是什么情况?一般就俩情况,一个是真的没装,还有一个是装到其它环境了,当前激活的环境不是自己装的环境。** 90 | 91 | **问:为什么提示说No module named 'tensorflow' ? 92 | 答:同上。** 93 | 94 | ### g、cuda安装失败问题 95 | 一般cuda安装前需要安装Visual Studio,装个2017版本即可。 96 | 97 | ### h、Ubuntu系统问题 98 | **所有代码在Ubuntu下可以使用,我两个系统都试过。** 99 | 100 | ### i、VSCODE提示错误的问题 101 | **问:为什么在VSCODE里面提示一大堆的错误啊? 102 | 答:我也提示一大堆的错误,但是不影响,是VSCODE的问题,如果不想看错误的话就装Pycharm。 103 | 最好将设置里面的Python:Language Server,调整为Pylance。** 104 | 105 | ### j、使用cpu进行训练与预测的问题 106 | **对于keras和tf2的代码而言,如果想用cpu进行训练和预测,直接装cpu版本的tensorflow就可以了。** 107 | 108 | **对于pytorch的代码而言,如果想用cpu进行训练和预测,需要将cuda=True修改成cuda=False。** 109 | 110 | ### k、tqdm没有pos参数问题 111 | **问:运行代码提示'tqdm' object has no attribute 'pos'。 112 | 答:重装tqdm,换个版本就可以了。** 113 | 114 | ### l、提示decode(“utf-8”)的问题 115 | **由于h5py库的更新,安装过程中会自动安装h5py=3.0.0以上的版本,会导致decode("utf-8")的错误! 116 | 各位一定要在安装完tensorflow后利用命令装h5py=2.10.0!** 117 | ``` 118 | pip install h5py==2.10.0 119 | ``` 120 | 121 | ### m、提示TypeError: __array__() takes 1 positional argument but 2 were given错误 122 | 可以修改pillow版本解决。 123 | ``` 124 | pip install pillow==8.2.0 125 | ``` 126 | ### n、如何查看当前cuda和cudnn 127 | **window下cuda版本查看方式如下: 128 | 1、打开cmd窗口。 129 | 2、输入nvcc -V。 130 | 3、Cuda compilation tools, release XXXXXXXX中的XXXXXXXX即cuda版本。** 131 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/0389ea35107a408a80ab5cb6590d5a74.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAQnViYmxpaWlpbmc=,size_20,color_FFFFFF,t_70,g_se,x_16) 132 | window下cudnn版本查看方式如下: 133 | 1、进入cuda安装目录,进入incude文件夹。 134 | 2、找到cudnn.h文件。 135 | 3、右键文本打开,下拉,看到#define处可获得cudnn版本。 136 | ```python 137 | #define CUDNN_MAJOR 7 138 | #define CUDNN_MINOR 4 139 | #define CUDNN_PATCHLEVEL 1 140 | ``` 141 | 代表cudnn为7.4.1。 142 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/7a86b68b17c84feaa6fa95780d4ae4b4.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAQnViYmxpaWlpbmc=,size_20,color_FFFFFF,t_70,g_se,x_16) 143 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/81bb7c3e13cc492292530e4b69df86a9.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAQnViYmxpaWlpbmc=,size_20,color_FFFFFF,t_70,g_se,x_16) 144 | 145 | ### o、为什么按照你的环境配置后还是不能使用 146 | **问:up主,为什么我按照你的环境配置后还是不能使用? 147 | 答:请把你的GPU、CUDA、CUDNN、TF版本以及PYTORCH版本B站私聊告诉我。** 148 | 149 | ### p、其它问题 150 | **问:为什么提示TypeError: cat() got an unexpected keyword argument 'axis',Traceback (most recent call last),AttributeError: 'Tensor' object has no attribute 'bool'? 151 | 答:这是版本问题,建议使用torch1.2以上版本** 152 | 153 | **其它有很多稀奇古怪的问题,很多是版本问题,建议按照我的视频教程安装Keras和tensorflow。比如装的是tensorflow2,就不用问我说为什么我没法运行Keras-yolo啥的。那是必然不行的。** 154 | 155 | ## 3、目标检测库问题汇总(人脸检测和分类库也可参考) 156 | ### a、shape不匹配问题。 157 | #### 1)、训练时shape不匹配问题。 158 | **问:up主,为什么运行train.py会提示shape不匹配啊? 159 | 答:在keras环境中,因为你训练的种类和原始的种类不同,网络结构会变化,所以最尾部的shape会有少量不匹配。** 160 | 161 | #### 2)、预测时shape不匹配问题。 162 | **问:为什么我运行predict.py会提示我说shape不匹配呀。** 163 | ##### i、copying a param with shape torch.Size([75, 704, 1, 1]) from checkpoint 164 | 在Pytorch里面是这样的: 165 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/20200722171631901.png) 166 | ##### ii、Shapes are [1,1,1024,75] and [255,1024,1,1]. 
for 'Assign_360' (op: 'Assign') with input shapes: [1,1,1024,75], [255,1024,1,1]. 167 | 在Keras里面是这样的: 168 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/20200722171523380.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70) 169 | **答:原因主要有仨: 170 | 1、训练的classes_path没改,就开始训练了。 171 | 2、训练的model_path没改。 172 | 3、训练的classes_path没改。 173 | 请检查清楚了!确定自己所用的model_path和classes_path是对应的!训练的时候用到的num_classes或者classes_path也需要检查!** 174 | 175 | ### b、显存不足问题(OOM、RuntimeError: CUDA out of memory)。 176 | **问:为什么我运行train.py下面的命令行闪的贼快,还提示OOM啥的? 177 | 答:这是在keras中出现的,爆显存了,可以改小batch_size,SSD的显存占用率是最小的,建议用SSD; 178 | 2G显存:SSD、YOLOV4-TINY 179 | 4G显存:YOLOV3 180 | 6G显存:YOLOV4、Retinanet、M2det、Efficientdet、Faster RCNN等 181 | 8G+显存:随便选吧。** 182 | **需要注意的是,受到BatchNorm2d影响,batch_size不可为1,至少为2。** 183 | 184 | **问:为什么提示 RuntimeError: CUDA out of memory. Tried to allocate 52.00 MiB (GPU 0; 15.90 GiB total capacity; 14.85 GiB already allocated; 51.88 MiB free; 15.07 GiB reserved in total by PyTorch)? 185 | 答:这是pytorch中出现的,爆显存了,同上。** 186 | 187 | **问:为什么我显存都没利用,就直接爆显存了? 188 | 答:都爆显存了,自然就不利用了,模型没有开始训练。** 189 | ### c、为什么要进行冻结训练与解冻训练,不进行行吗? 190 | **问:为什么要冻结训练和解冻训练呀? 191 | 答:可以不进行,本质上是为了保证性能不足的同学的训练,如果电脑性能完全不够,可以将Freeze_Epoch和UnFreeze_Epoch设置成一样,只进行冻结训练。** 192 | 193 | **同时这也是迁移学习的思想,因为神经网络主干特征提取部分所提取到的特征是通用的,我们冻结起来训练可以加快训练效率,也可以防止权值被破坏。** 194 | 在冻结阶段,模型的主干被冻结了,特征提取网络不发生改变。占用的显存较小,仅对网络进行微调。 195 | 在解冻阶段,模型的主干不被冻结了,特征提取网络会发生改变。占用的显存较大,网络所有的参数都会发生改变。 196 | 197 | ### d、我的LOSS好大啊,有问题吗?(我的LOSS好小啊,有问题吗?) 198 | **问:为什么我的网络不收敛啊,LOSS是XXXX。 199 | 答:不同网络的LOSS不同,LOSS只是一个参考指标,用于查看网络是否收敛,而非评价网络好坏,我的yolo代码都没有归一化,所以LOSS值看起来比较高,LOSS的值不重要,重要的是是否在变小,预测是否有效果。** 200 | 201 | ### e、为什么我训练出来的模型没有预测结果? 202 | **问:为什么我的训练效果不好?预测了没有框(框不准)。 203 | 答:** 204 | 考虑几个问题: 205 | 1、目标信息问题,查看2007_train.txt文件是否有目标信息,没有的话请修改voc_annotation.py。 206 | 2、数据集问题,小于500的自行考虑增加数据集,同时测试不同的模型,确认数据集是好的。 207 | 3、是否解冻训练,如果数据集分布与常规画面差距过大需要进一步解冻训练,调整主干,加强特征提取能力。 208 | 4、网络问题,比如SSD不适合小目标,因为先验框固定了。 209 | 5、训练时长问题,有些同学只训练了几代表示没有效果,按默认参数训练完。 210 | 6、确认自己是否按照步骤去做了,如果比如voc_annotation.py里面的classes是否修改了等。 211 | 7、不同网络的LOSS不同,LOSS只是一个参考指标,用于查看网络是否收敛,而非评价网络好坏,LOSS的值不重要,重要的是是否收敛。 212 | 8、是否修改了网络的主干,如果修改了没有预训练权重,网络不容易收敛,自然效果不好。 213 | 214 | ### f、为什么我计算出来的map是0? 215 | **问:为什么我的训练效果不好?没有map? 216 | 答:** 217 | 首先尝试利用predict.py预测一下,如果有效果的话应该是get_map.py里面的classes_path设置错误。如果没有预测结果的话,解决方法同e问题,对下面几点进行检查: 218 | 1、目标信息问题,查看2007_train.txt文件是否有目标信息,没有的话请修改voc_annotation.py。 219 | 2、数据集问题,小于500的自行考虑增加数据集,同时测试不同的模型,确认数据集是好的。 220 | 3、是否解冻训练,如果数据集分布与常规画面差距过大需要进一步解冻训练,调整主干,加强特征提取能力。 221 | 4、网络问题,比如SSD不适合小目标,因为先验框固定了。 222 | 5、训练时长问题,有些同学只训练了几代表示没有效果,按默认参数训练完。 223 | 6、确认自己是否按照步骤去做了,如果比如voc_annotation.py里面的classes是否修改了等。 224 | 7、不同网络的LOSS不同,LOSS只是一个参考指标,用于查看网络是否收敛,而非评价网络好坏,LOSS的值不重要,重要的是是否收敛。 225 | 8、是否修改了网络的主干,如果修改了没有预训练权重,网络不容易收敛,自然效果不好。 226 | 227 | ### g、gbk编码错误('gbk' codec can't decode byte)。 228 | **问:我怎么出现了gbk什么的编码错误啊:** 229 | ```python 230 | UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 446: illegal multibyte sequence 231 | ``` 232 | **答:标签和路径不要使用中文,如果一定要使用中文,请注意处理的时候编码的问题,改成打开文件的encoding方式改为utf-8。** 233 | 234 | ### h、我的图片是xxx*xxx的分辨率的,可以用吗? 235 | **问:我的图片是xxx*xxx的分辨率的,可以用吗!** 236 | **答:可以用,代码里面会自动进行resize与数据增强。** 237 | 238 | ### i、我想进行数据增强!怎么增强? 239 | **问:我想要进行数据增强!怎么做呢?** 240 | **答:可以用,代码里面会自动进行resize与数据增强。** 241 | 242 | ### j、多GPU训练。 243 | **问:怎么进行多GPU训练? 244 | 答:pytorch的大多数代码可以直接使用gpu训练,keras的话直接百度就好了,实现并不复杂,我没有多卡没法详细测试,还需要各位同学自己努力了。** 245 | 246 | ### k、能不能训练灰度图? 247 | **问:能不能训练灰度图(预测灰度图)啊? 
248 | 答:我的大多数库会将灰度图转化成RGB进行训练和预测,如果遇到代码不能训练或者预测灰度图的情况,可以尝试一下在get_random_data里面将Image.open后的结果转换成RGB,预测的时候也这样试试。(仅供参考)** 249 | 250 | ### l、断点续练问题。 251 | **问:我已经训练过几个世代了,能不能从这个基础上继续开始训练 252 | 答:可以,你在训练前,和载入预训练权重一样载入训练过的权重就行了。一般训练好的权重会保存在logs文件夹里面,将model_path修改成你要开始的权值的路径即可。** 253 | 254 | ### m、我要训练其它的数据集,预训练权重能不能用? 255 | **问:如果我要训练其它的数据集,预训练权重要怎么办啊?** 256 | **答:数据的预训练权重对不同数据集是通用的,因为特征是通用的,预训练权重对于99%的情况都必须要用,不用的话权值太过随机,特征提取效果不明显,网络训练的结果也不会好。** 257 | 258 | ### n、网络如何从0开始训练? 259 | **问:我要怎么不使用预训练权重啊? 260 | 答:看一看注释、大多数代码是model_path = '',Freeze_Train = Fasle**,如果设置model_path无用,**那么把载入预训练权重的代码注释了就行。** 261 | 262 | ### o、为什么从0开始训练效果这么差(修改了网络主干,效果不好怎么办)? 263 | **问:为什么我不使用预训练权重效果这么差啊? 264 | 答:因为随机初始化的权值不好,提取的特征不好,也就导致了模型训练的效果不好,voc07+12、coco+voc07+12效果都不一样,预训练权重还是非常重要的。** 265 | 266 | **问:up,我修改了网络,预训练权重还能用吗? 267 | 答:修改了主干的话,如果不是用的现有的网络,基本上预训练权重是不能用的,要么就自己判断权值里卷积核的shape然后自己匹配,要么只能自己预训练去了;修改了后半部分的话,前半部分的主干部分的预训练权重还是可以用的,如果是pytorch代码的话,需要自己修改一下载入权值的方式,判断shape后载入,如果是keras代码,直接by_name=True,skip_mismatch=True即可。** 268 | 权值匹配的方式可以参考如下: 269 | ```python 270 | # 加快模型训练的效率 271 | print('Loading weights into state dict...') 272 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') 273 | model_dict = model.state_dict() 274 | pretrained_dict = torch.load(model_path, map_location=device) 275 | a = {} 276 | for k, v in pretrained_dict.items(): 277 | try: 278 | if np.shape(model_dict[k]) == np.shape(v): 279 | a[k]=v 280 | except: 281 | pass 282 | model_dict.update(a) 283 | model.load_state_dict(model_dict) 284 | print('Finished!') 285 | ``` 286 | 287 | **问:为什么从0开始训练效果这么差(我修改了网络主干,效果不好怎么办)? 288 | 答:一般来讲,网络从0开始的训练效果会很差,因为权值太过随机,特征提取效果不明显,因此非常、非常、非常不建议大家从0开始训练!如果一定要从0开始,可以了解imagenet数据集,首先训练分类模型,获得网络的主干部分权值,分类模型的 主干部分 和该模型通用,基于此进行训练。 289 | 网络修改了主干之后也是同样的问题,随机的权值效果很差。** 290 | 291 | **问:怎么在模型上从0开始训练? 292 | 答:在算力不足与调参能力不足的情况下从0开始训练毫无意义。模型特征提取能力在随机初始化参数的情况下非常差。没有好的参数调节能力和算力,无法使得网络正常收敛。** 293 | 如果一定要从0开始,那么训练的时候请注意几点: 294 | - 不载入预训练权重。 295 | - 不要进行冻结训练,注释冻结模型的代码。 296 | 297 | **问:为什么我不使用预训练权重效果这么差啊? 298 | 答:因为随机初始化的权值不好,提取的特征不好,也就导致了模型训练的效果不好,voc07+12、coco+voc07+12效果都不一样,预训练权重还是非常重要的。** 299 | 300 | ### p、你的权值都是哪里来的? 301 | **问:如果网络不能从0开始训练的话你的权值哪里来的? 302 | 答:有些权值是官方转换过来的,有些权值是自己训练出来的,我用到的主干的imagenet的权值都是官方的。** 303 | 304 | ### q、视频检测与摄像头检测 305 | **问:怎么用摄像头检测呀? 306 | 答:predict.py修改参数可以进行摄像头检测,也有视频详细解释了摄像头检测的思路。** 307 | 308 | **问:怎么用视频检测呀? 309 | 答:同上** 310 | 311 | ### r、如何保存检测出的图片 312 | **问:检测完的图片怎么保存? 313 | 答:一般目标检测用的是Image,所以查询一下PIL库的Image如何进行保存。详细看看predict.py文件的注释。** 314 | 315 | **问:怎么用视频保存呀? 316 | 答:详细看看predict.py文件的注释。** 317 | 318 | ### s、遍历问题 319 | **问:如何对一个文件夹的图片进行遍历? 320 | 答:一般使用os.listdir先找出文件夹里面的所有图片,然后根据predict.py文件里面的执行思路检测图片就行了,详细看看predict.py文件的注释。** 321 | 322 | **问:如何对一个文件夹的图片进行遍历?并且保存。 323 | 答:遍历的话一般使用os.listdir先找出文件夹里面的所有图片,然后根据predict.py文件里面的执行思路检测图片就行了。保存的话一般目标检测用的是Image,所以查询一下PIL库的Image如何进行保存。如果有些库用的是cv2,那就是查一下cv2怎么保存图片。详细看看predict.py文件的注释。** 324 | 325 | ### t、路径问题(No such file or directory、StopIteration: [Errno 13] Permission denied: 'XXXXXX') 326 | **问:我怎么出现了这样的错误呀:** 327 | ```python 328 | FileNotFoundError: 【Errno 2】 No such file or directory 329 | StopIteration: [Errno 13] Permission denied: 'D:\\Study\\Collection\\Dataset\\VOC07+12+test\\VOCdevkit/VOC2007' 330 | …………………………………… 331 | …………………………………… 332 | ``` 333 | **答:去检查一下文件夹路径,查看是否有对应文件;并且检查一下2007_train.txt,其中文件路径是否有错。** 334 | 关于路径有几个重要的点: 335 | **文件夹名称中一定不要有空格。 336 | 注意相对路径和绝对路径。 337 | 多百度路径相关的知识。** 338 | 339 | **所有的路径问题基本上都是根目录问题,好好查一下相对目录的概念!** 340 | ### u、和原版比较问题,你怎么和原版不一样啊? 341 | **问:原版的代码是XXX,为什么你的代码是XXX? 
342 | 答:是啊……这要不怎么说我不是原版呢……** 343 | 344 | **问:你这个代码和原版比怎么样,可以达到原版的效果么? 345 | 答:基本上可以达到,我都用voc数据测过,我没有好显卡,没有能力在coco上测试与训练。** 346 | 347 | **问:你有没有实现yolov4所有的tricks,和原版差距多少? 348 | 答:并没有实现全部的改进部分,由于YOLOV4使用的改进实在太多了,很难完全实现与列出来,这里只列出来了一些我比较感兴趣,而且非常有效的改进。论文中提到的SAM(注意力机制模块),作者自己的源码也没有使用。还有其它很多的tricks,不是所有的tricks都有提升,我也没法实现全部的tricks。至于和原版的比较,我没有能力训练coco数据集,根据使用过的同学反应差距不大。** 349 | 350 | ### v、我的检测速度是xxx正常吗?我的检测速度还能增快吗? 351 | **问:你这个FPS可以到达多少,可以到 XX FPS么? 352 | 答:FPS和机子的配置有关,配置高就快,配置低就慢。** 353 | 354 | **问:我的检测速度是xxx正常吗?我的检测速度还能增快吗? 355 | 答:看配置,配置好速度就快,如果想要配置不变的情况下加快速度,就要修改网络了。** 356 | 357 | **问:为什么我用服务器去测试yolov4(or others)的FPS只有十几? 358 | 答:检查是否正确安装了tensorflow-gpu或者pytorch的gpu版本,如果已经正确安装,可以去利用time.time()的方法查看detect_image里面,哪一段代码耗时更长(不仅只有网络耗时长,其它处理部分也会耗时,如绘图等)。** 359 | 360 | **问:为什么论文中说速度可以达到XX,但是这里却没有? 361 | 答:检查是否正确安装了tensorflow-gpu或者pytorch的gpu版本,如果已经正确安装,可以去利用time.time()的方法查看detect_image里面,哪一段代码耗时更长(不仅只有网络耗时长,其它处理部分也会耗时,如绘图等)。有些论文还会使用多batch进行预测,我并没有去实现这个部分。** 362 | 363 | ### w、预测图片不显示问题 364 | **问:为什么你的代码在预测完成后不显示图片?只是在命令行告诉我有什么目标。 365 | 答:给系统安装一个图片查看器就行了。** 366 | 367 | ### x、算法评价问题(目标检测的map、PR曲线、Recall、Precision等) 368 | **问:怎么计算map? 369 | 答:看map视频,都一个流程。** 370 | 371 | **问:计算map的时候,get_map.py里面有一个MINOVERLAP是什么用的,是iou吗? 372 | 答:是iou,它的作用是判断预测框和真实框的重合成度,如果重合程度大于MINOVERLAP,则预测正确。** 373 | 374 | **问:为什么get_map.py里面的self.confidence(self.score)要设置的那么小? 375 | 答:看一下map的视频的原理部分,要知道所有的结果然后再进行pr曲线的绘制。** 376 | 377 | **问:能不能说说怎么绘制PR曲线啥的呀。 378 | 答:可以看mAP视频,结果里面有PR曲线。** 379 | 380 | **问:怎么计算Recall、Precision指标。 381 | 答:这俩指标应该是相对于特定的置信度的,计算map的时候也会获得。** 382 | 383 | ### y、coco数据集训练问题 384 | **问:目标检测怎么训练COCO数据集啊?。 385 | 答:coco数据训练所需要的txt文件可以参考qqwweee的yolo3的库,格式都是一样的。** 386 | 387 | ### z、UP,怎么优化模型啊?我想提升效果 388 | **问:up,怎么修改模型啊,我想发个小论文! 389 | 答:建议看看yolov3和yolov4的区别,然后看看yolov4的论文,作为一个大型调参现场非常有参考意义,使用了很多tricks。我能给的建议就是多看一些经典模型,然后拆解里面的亮点结构并使用。** 390 | 391 | ### aa、UP,有Focal LOSS的代码吗?怎么改啊? 392 | **问:up,YOLO系列使用Focal LOSS的代码你有吗,有提升吗? 393 | 答:很多人试过,提升效果也不大(甚至变的更Low),它自己有自己的正负样本的平衡方式**。改代码的事情,还是自己好好看看代码吧。 394 | 395 | ### ab、部署问题(ONNX、TensorRT等) 396 | 我没有具体部署到手机等设备上过,所以很多部署问题我并不了解…… 397 | 398 | ## 4、语义分割库问题汇总 399 | ### a、shape不匹配问题 400 | #### 1)、训练时shape不匹配问题 401 | **问:up主,为什么运行train.py会提示shape不匹配啊? 402 | 答:在keras环境中,因为你训练的种类和原始的种类不同,网络结构会变化,所以最尾部的shape会有少量不匹配。** 403 | 404 | #### 2)、预测时shape不匹配问题 405 | **问:为什么我运行predict.py会提示我说shape不匹配呀。** 406 | ##### i、copying a param with shape torch.Size([75, 704, 1, 1]) from checkpoint 407 | 在Pytorch里面是这样的: 408 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/20200722171631901.png) 409 | ##### ii、Shapes are [1,1,1024,75] and [255,1024,1,1]. for 'Assign_360' (op: 'Assign') with input shapes: [1,1,1024,75], [255,1024,1,1]. 410 | 在Keras里面是这样的: 411 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/20200722171523380.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70) 412 | **答:原因主要有二: 413 | 1、train.py里面的num_classes没改。 414 | 2、预测时num_classes没改。 415 | 3、预测时model_path没改。 416 | 请检查清楚!训练和预测的时候用到的num_classes都需要检查!** 417 | 418 | ### b、显存不足问题(OOM、RuntimeError: CUDA out of memory)。 419 | **问:为什么我运行train.py下面的命令行闪的贼快,还提示OOM啥的? 420 | 答:这是在keras中出现的,爆显存了,可以改小batch_size。** 421 | 422 | **需要注意的是,受到BatchNorm2d影响,batch_size不可为1,至少为2。** 423 | 424 | **问:为什么提示 RuntimeError: CUDA out of memory. Tried to allocate 52.00 MiB (GPU 0; 15.90 GiB total capacity; 14.85 GiB already allocated; 51.88 MiB free; 15.07 GiB reserved in total by PyTorch)? 425 | 答:这是pytorch中出现的,爆显存了,同上。** 426 | 427 | **问:为什么我显存都没利用,就直接爆显存了? 
428 | 答:都爆显存了,自然就不利用了,模型没有开始训练。** 429 | 430 | ### c、为什么要进行冻结训练与解冻训练,不进行行吗? 431 | **问:为什么要冻结训练和解冻训练呀? 432 | 答:可以不进行,本质上是为了保证性能不足的同学的训练,如果电脑性能完全不够,可以将Freeze_Epoch和UnFreeze_Epoch设置成一样,只进行冻结训练。** 433 | 434 | **同时这也是迁移学习的思想,因为神经网络主干特征提取部分所提取到的特征是通用的,我们冻结起来训练可以加快训练效率,也可以防止权值被破坏。** 435 | 在冻结阶段,模型的主干被冻结了,特征提取网络不发生改变。占用的显存较小,仅对网络进行微调。 436 | 在解冻阶段,模型的主干不被冻结了,特征提取网络会发生改变。占用的显存较大,网络所有的参数都会发生改变。 437 | 438 | ### d、我的LOSS好大啊,有问题吗?(我的LOSS好小啊,有问题吗?) 439 | **问:为什么我的网络不收敛啊,LOSS是XXXX。 440 | 答:不同网络的LOSS不同,LOSS只是一个参考指标,用于查看网络是否收敛,而非评价网络好坏,我的yolo代码都没有归一化,所以LOSS值看起来比较高,LOSS的值不重要,重要的是是否在变小,预测是否有效果。** 441 | 442 | ### e、为什么我训练出来的模型没有预测结果? 443 | **问:为什么我的训练效果不好?预测了没有框(框不准)。 444 | 答:** 445 | **考虑几个问题: 446 | 1、数据集问题,这是最重要的问题。小于500的自行考虑增加数据集;一定要检查数据集的标签,视频中详细解析了VOC数据集的格式,但并不是有输入图片有输出标签即可,还需要确认标签的每一个像素值是否为它对应的种类。很多同学的标签格式不对,最常见的错误格式就是标签的背景为黑,目标为白,此时目标的像素点值为255,无法正常训练,目标需要为1才行。 447 | 2、是否解冻训练,如果数据集分布与常规画面差距过大需要进一步解冻训练,调整主干,加强特征提取能力。 448 | 3、网络问题,可以尝试不同的网络。 449 | 4、训练时长问题,有些同学只训练了几代表示没有效果,按默认参数训练完。 450 | 5、确认自己是否按照步骤去做了。 451 | 6、不同网络的LOSS不同,LOSS只是一个参考指标,用于查看网络是否收敛,而非评价网络好坏,LOSS的值不重要,重要的是是否收敛。** 452 | 453 | **问:为什么我的训练效果不好?对小目标预测不准确。 454 | 答:对于deeplab和pspnet而言,可以修改一下downsample_factor,当downsample_factor为16的时候下采样倍数过多,效果不太好,可以修改为8。** 455 | 456 | ### f、为什么我计算出来的miou是0? 457 | **问:为什么我的训练效果不好?计算出来的miou是0?。** 458 | 答: 459 | 与e类似,**考虑几个问题: 460 | 1、数据集问题,这是最重要的问题。小于500的自行考虑增加数据集;一定要检查数据集的标签,视频中详细解析了VOC数据集的格式,但并不是有输入图片有输出标签即可,还需要确认标签的每一个像素值是否为它对应的种类。很多同学的标签格式不对,最常见的错误格式就是标签的背景为黑,目标为白,此时目标的像素点值为255,无法正常训练,目标需要为1才行。 461 | 2、是否解冻训练,如果数据集分布与常规画面差距过大需要进一步解冻训练,调整主干,加强特征提取能力。 462 | 3、网络问题,可以尝试不同的网络。 463 | 4、训练时长问题,有些同学只训练了几代表示没有效果,按默认参数训练完。 464 | 5、确认自己是否按照步骤去做了。 465 | 6、不同网络的LOSS不同,LOSS只是一个参考指标,用于查看网络是否收敛,而非评价网络好坏,LOSS的值不重要,重要的是是否收敛。** 466 | 467 | ### g、gbk编码错误('gbk' codec can't decode byte)。 468 | **问:我怎么出现了gbk什么的编码错误啊:** 469 | ```python 470 | UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 446: illegal multibyte sequence 471 | ``` 472 | **答:标签和路径不要使用中文,如果一定要使用中文,请注意处理的时候编码的问题,改成打开文件的encoding方式改为utf-8。** 473 | 474 | ### h、我的图片是xxx*xxx的分辨率的,可以用吗? 475 | **问:我的图片是xxx*xxx的分辨率的,可以用吗!** 476 | **答:可以用,代码里面会自动进行resize与数据增强。** 477 | 478 | ### i、我想进行数据增强!怎么增强? 479 | **问:我想要进行数据增强!怎么做呢?** 480 | **答:可以用,代码里面会自动进行resize与数据增强。** 481 | 482 | ### j、多GPU训练。 483 | **问:怎么进行多GPU训练? 484 | 答:pytorch的大多数代码可以直接使用gpu训练,keras的话直接百度就好了,实现并不复杂,我没有多卡没法详细测试,还需要各位同学自己努力了。** 485 | 486 | ### k、能不能训练灰度图? 487 | **问:能不能训练灰度图(预测灰度图)啊? 488 | 答:我的大多数库会将灰度图转化成RGB进行训练和预测,如果遇到代码不能训练或者预测灰度图的情况,可以尝试一下在get_random_data里面将Image.open后的结果转换成RGB,预测的时候也这样试试。(仅供参考)** 489 | 490 | ### l、断点续练问题。 491 | **问:我已经训练过几个世代了,能不能从这个基础上继续开始训练 492 | 答:可以,你在训练前,和载入预训练权重一样载入训练过的权重就行了。一般训练好的权重会保存在logs文件夹里面,将model_path修改成你要开始的权值的路径即可。** 493 | 494 | ### m、我要训练其它的数据集,预训练权重能不能用? 495 | **问:如果我要训练其它的数据集,预训练权重要怎么办啊?** 496 | **答:数据的预训练权重对不同数据集是通用的,因为特征是通用的,预训练权重对于99%的情况都必须要用,不用的话权值太过随机,特征提取效果不明显,网络训练的结果也不会好。** 497 | 498 | ### n、网络如何从0开始训练? 499 | **问:我要怎么不使用预训练权重啊? 500 | 答:看一看注释、大多数代码是model_path = '',Freeze_Train = Fasle**,如果设置model_path无用,**那么把载入预训练权重的代码注释了就行。** 501 | 502 | ### o、为什么从0开始训练效果这么差(修改了网络主干,效果不好怎么办)? 503 | **问:为什么我不使用预训练权重效果这么差啊? 504 | 答:因为随机初始化的权值不好,提取的特征不好,也就导致了模型训练的效果不好,预训练权重还是非常重要的。** 505 | 506 | **问:up,我修改了网络,预训练权重还能用吗? 
507 | 答:修改了主干的话,如果不是用的现有的网络,基本上预训练权重是不能用的,要么就自己判断权值里卷积核的shape然后自己匹配,要么只能自己预训练去了;修改了后半部分的话,前半部分的主干部分的预训练权重还是可以用的,如果是pytorch代码的话,需要自己修改一下载入权值的方式,判断shape后载入,如果是keras代码,直接by_name=True,skip_mismatch=True即可。** 508 | 权值匹配的方式可以参考如下: 509 | ```python 510 | # 加快模型训练的效率 511 | print('Loading weights into state dict...') 512 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') 513 | model_dict = model.state_dict() 514 | pretrained_dict = torch.load(model_path, map_location=device) 515 | a = {} 516 | for k, v in pretrained_dict.items(): 517 | try: 518 | if np.shape(model_dict[k]) == np.shape(v): 519 | a[k]=v 520 | except: 521 | pass 522 | model_dict.update(a) 523 | model.load_state_dict(model_dict) 524 | print('Finished!') 525 | ``` 526 | 527 | **问:为什么从0开始训练效果这么差(我修改了网络主干,效果不好怎么办)? 528 | 答:一般来讲,网络从0开始的训练效果会很差,因为权值太过随机,特征提取效果不明显,因此非常、非常、非常不建议大家从0开始训练!如果一定要从0开始,可以了解imagenet数据集,首先训练分类模型,获得网络的主干部分权值,分类模型的 主干部分 和该模型通用,基于此进行训练。 529 | 网络修改了主干之后也是同样的问题,随机的权值效果很差。** 530 | 531 | **问:怎么在模型上从0开始训练? 532 | 答:在算力不足与调参能力不足的情况下从0开始训练毫无意义。模型特征提取能力在随机初始化参数的情况下非常差。没有好的参数调节能力和算力,无法使得网络正常收敛。** 533 | 如果一定要从0开始,那么训练的时候请注意几点: 534 | - 不载入预训练权重。 535 | - 不要进行冻结训练,注释冻结模型的代码。 536 | 537 | **问:为什么我不使用预训练权重效果这么差啊? 538 | 答:因为随机初始化的权值不好,提取的特征不好,也就导致了模型训练的效果不好,voc07+12、coco+voc07+12效果都不一样,预训练权重还是非常重要的。** 539 | 540 | ### p、你的权值都是哪里来的? 541 | **问:如果网络不能从0开始训练的话你的权值哪里来的? 542 | 答:有些权值是官方转换过来的,有些权值是自己训练出来的,我用到的主干的imagenet的权值都是官方的。** 543 | 544 | 545 | ### q、视频检测与摄像头检测 546 | **问:怎么用摄像头检测呀? 547 | 答:predict.py修改参数可以进行摄像头检测,也有视频详细解释了摄像头检测的思路。** 548 | 549 | **问:怎么用视频检测呀? 550 | 答:同上** 551 | 552 | ### r、如何保存检测出的图片 553 | **问:检测完的图片怎么保存? 554 | 答:一般目标检测用的是Image,所以查询一下PIL库的Image如何进行保存。详细看看predict.py文件的注释。** 555 | 556 | **问:怎么用视频保存呀? 557 | 答:详细看看predict.py文件的注释。** 558 | 559 | ### s、遍历问题 560 | **问:如何对一个文件夹的图片进行遍历? 561 | 答:一般使用os.listdir先找出文件夹里面的所有图片,然后根据predict.py文件里面的执行思路检测图片就行了,详细看看predict.py文件的注释。** 562 | 563 | **问:如何对一个文件夹的图片进行遍历?并且保存。 564 | 答:遍历的话一般使用os.listdir先找出文件夹里面的所有图片,然后根据predict.py文件里面的执行思路检测图片就行了。保存的话一般目标检测用的是Image,所以查询一下PIL库的Image如何进行保存。如果有些库用的是cv2,那就是查一下cv2怎么保存图片。详细看看predict.py文件的注释。** 565 | 566 | ### t、路径问题(No such file or directory、StopIteration: [Errno 13] Permission denied: 'XXXXXX') 567 | **问:我怎么出现了这样的错误呀:** 568 | ```python 569 | FileNotFoundError: 【Errno 2】 No such file or directory 570 | StopIteration: [Errno 13] Permission denied: 'D:\\Study\\Collection\\Dataset\\VOC07+12+test\\VOCdevkit/VOC2007' 571 | …………………………………… 572 | …………………………………… 573 | ``` 574 | **答:去检查一下文件夹路径,查看是否有对应文件;并且检查一下2007_train.txt,其中文件路径是否有错。** 575 | 关于路径有几个重要的点: 576 | **文件夹名称中一定不要有空格。 577 | 注意相对路径和绝对路径。 578 | 多百度路径相关的知识。** 579 | 580 | **所有的路径问题基本上都是根目录问题,好好查一下相对目录的概念!** 581 | ### u、和原版比较问题,你怎么和原版不一样啊? 582 | **问:原版的代码是XXX,为什么你的代码是XXX? 583 | 答:是啊……这要不怎么说我不是原版呢……** 584 | 585 | **问:你这个代码和原版比怎么样,可以达到原版的效果么? 586 | 答:基本上可以达到,我都用voc数据测过,我没有好显卡,没有能力在coco上测试与训练。** 587 | 588 | ### v、我的检测速度是xxx正常吗?我的检测速度还能增快吗? 589 | **问:你这个FPS可以到达多少,可以到 XX FPS么? 590 | 答:FPS和机子的配置有关,配置高就快,配置低就慢。** 591 | 592 | **问:我的检测速度是xxx正常吗?我的检测速度还能增快吗? 593 | 答:看配置,配置好速度就快,如果想要配置不变的情况下加快速度,就要修改网络了。** 594 | 595 | **问:为什么论文中说速度可以达到XX,但是这里却没有? 596 | 答:检查是否正确安装了tensorflow-gpu或者pytorch的gpu版本,如果已经正确安装,可以去利用time.time()的方法查看detect_image里面,哪一段代码耗时更长(不仅只有网络耗时长,其它处理部分也会耗时,如绘图等)。有些论文还会使用多batch进行预测,我并没有去实现这个部分。** 597 | 598 | ### w、预测图片不显示问题 599 | **问:为什么你的代码在预测完成后不显示图片?只是在命令行告诉我有什么目标。 600 | 答:给系统安装一个图片查看器就行了。** 601 | 602 | ### x、算法评价问题(miou) 603 | **问:怎么计算miou? 
604 | 答:参考视频里的miou测量部分。**
605 |
606 | **问:怎么计算Recall、Precision指标。
607 | 答:现有的代码还无法获得,需要各位同学理解一下混淆矩阵的概念,然后自行计算一下。**
608 |
609 | ### y、UP,怎么优化模型啊?我想提升效果
610 | **问:up,怎么修改模型啊,我想发个小论文!
611 | 答:建议看看目标检测中的yolov4论文,作为一个大型调参现场非常有参考意义,使用了很多tricks。我能给的建议就是多看一些经典模型,然后拆解里面的亮点结构并使用。**
612 |
613 | ### z、部署问题(ONNX、TensorRT等)
614 | 我没有具体部署到手机等设备上过,所以很多部署问题我并不了解……
615 |
616 | ## 5、交流群问题
617 | **问:up,有没有QQ群啥的呢?
618 | 答:没有没有,我没有时间管理QQ群……**
619 |
620 | ## 6、怎么学习的问题
621 | **问:up,你的学习路线怎么样的?我是个小白我要怎么学?
622 | 答:这里有几点需要注意哈
623 | 1、我不是高手,很多东西我也不会,我的学习路线也不一定适用所有人。
624 | 2、我实验室不做深度学习,所以我很多东西都是自学,自己摸索,正确与否我也不知道。
625 | 3、我个人觉得学习更靠自学**
626 | 学习路线的话,我是先学习了莫烦的python教程,从tensorflow、keras、pytorch入门,入门完之后学的SSD、YOLO,然后了解了很多经典的卷积网络,后面就开始学很多不同的代码了。我的学习方法就是一行一行地看,了解整个代码的执行流程、特征层的shape变化等,花了很多时间,也没有什么捷径,就是要花时间吧。
627 | --------------------------------------------------------------------------------