├── .gitignore ├── LICENSE ├── README.md ├── VOCdevkit └── VOC2007 │ ├── Annotations │ └── README.md │ ├── ImageSets │ └── Main │ │ └── README.md │ └── JPEGImages │ └── README.md ├── convert_trt.py ├── get_map.py ├── hrsc_annotation.py ├── img └── test.jpg ├── kmeans_for_anchors.py ├── logs └── README.md ├── model_data ├── coco_classes.txt ├── simhei.ttf ├── uav_classes.txt ├── voc_classes.txt └── yolo_anchors.txt ├── nets ├── __init__.py ├── backbone.py ├── yolo.py └── yolo_training.py ├── predict.py ├── requirements.txt ├── summary.py ├── train.py ├── utils ├── __init__.py ├── callbacks.py ├── dataloader.py ├── kld_loss.py ├── nms_rotated │ ├── __init__.py │ ├── nms_rotated_wrapper.py │ ├── setup.py │ └── src │ │ ├── box_iou_rotated_utils.h │ │ ├── nms_rotated_cpu.cpp │ │ ├── nms_rotated_cuda.cu │ │ ├── nms_rotated_ext.cpp │ │ ├── poly_nms_cpu.cpp │ │ └── poly_nms_cuda.cu ├── utils.py ├── utils_bbox.py ├── utils_fit.py ├── utils_map.py └── utils_rbox.py ├── utils_coco ├── coco_annotation.py └── get_map_coco.py ├── voc_annotation.py ├── yolo.py └── 常见问题汇总.md /.gitignore: -------------------------------------------------------------------------------- 1 | # ignore map, miou, datasets 2 | *.mp4 3 | map_out/ 4 | miou_out/ 5 | VOCdevkit/ 6 | datasets/ 7 | Medical_Datasets/ 8 | lfw/ 9 | logs/ 10 | model_data/ 11 | .temp_map_out/ 12 | 13 | # Byte-compiled / optimized / DLL files 14 | __pycache__/ 15 | *.py[cod] 16 | *$py.class 17 | 18 | # C extensions 19 | *.so 20 | 21 | # Distribution / packaging 22 | .Python 23 | build/ 24 | develop-eggs/ 25 | dist/ 26 | downloads/ 27 | eggs/ 28 | .eggs/ 29 | lib/ 30 | lib64/ 31 | parts/ 32 | sdist/ 33 | var/ 34 | wheels/ 35 | pip-wheel-metadata/ 36 | share/python-wheels/ 37 | *.egg-info/ 38 | .installed.cfg 39 | *.egg 40 | MANIFEST 41 | 42 | # PyInstaller 43 | # Usually these files are written by a python script from a template 44 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 45 | *.manifest 46 | *.spec 47 | 48 | # Installer logs 49 | pip-log.txt 50 | pip-delete-this-directory.txt 51 | 52 | # Unit test / coverage reports 53 | htmlcov/ 54 | .tox/ 55 | .nox/ 56 | .coverage 57 | .coverage.* 58 | .cache 59 | nosetests.xml 60 | coverage.xml 61 | *.cover 62 | *.py,cover 63 | .hypothesis/ 64 | .pytest_cache/ 65 | 66 | # Translations 67 | *.mo 68 | *.pot 69 | 70 | # Django stuff: 71 | *.log 72 | local_settings.py 73 | db.sqlite3 74 | db.sqlite3-journal 75 | 76 | # Flask stuff: 77 | instance/ 78 | .webassets-cache 79 | 80 | # Scrapy stuff: 81 | .scrapy 82 | 83 | # Sphinx documentation 84 | docs/_build/ 85 | 86 | # PyBuilder 87 | target/ 88 | 89 | # Jupyter Notebook 90 | .ipynb_checkpoints 91 | 92 | # IPython 93 | profile_default/ 94 | ipython_config.py 95 | 96 | # pyenv 97 | .python-version 98 | 99 | # pipenv 100 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 101 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 102 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 103 | # install all needed dependencies. 104 | #Pipfile.lock 105 | 106 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow 107 | __pypackages__/ 108 | 109 | # Celery stuff 110 | celerybeat-schedule 111 | celerybeat.pid 112 | 113 | # SageMath parsed files 114 | *.sage.py 115 | 116 | # Environments 117 | .env 118 | .venv 119 | env/ 120 | venv/ 121 | ENV/ 122 | env.bak/ 123 | venv.bak/ 124 | 125 | # Spyder project settings 126 | .spyderproject 127 | .spyproject 128 | 129 | # Rope project settings 130 | .ropeproject 131 | 132 | # mkdocs documentation 133 | /site 134 | 135 | # mypy 136 | .mypy_cache/ 137 | .dmypy.json 138 | dmypy.json 139 | 140 | # Pyre type checker 141 | .pyre/ 142 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## YOLOV7-Tiny-OBB:You Only Look Once OBB旋转目标检测模型在pytorch当中的实现 2 | --- 3 | 4 | ## 目录 5 | 1. [仓库更新 Top News](#仓库更新) 6 | 2. [相关仓库 Related code](#相关仓库) 7 | 3. [性能情况 Performance](#性能情况) 8 | 4. [所需环境 Environment](#所需环境) 9 | 5. [文件下载 Download](#文件下载) 10 | 6. [训练步骤 How2train](#训练步骤) 11 | 7. [预测步骤 How2predict](#预测步骤) 12 | 8. [评估步骤 How2eval](#评估步骤) 13 | 9. [参考资料 Reference](#Reference) 14 | 15 | ## Top News 16 | **`2023-02`**:**仓库创建,支持step、cos学习率下降法、支持adam、sgd优化器选择、支持学习率根据batch_size自适应调整、新增图片裁剪、支持多GPU训练、支持各个种类目标数量计算、支持heatmap、支持EMA。** 17 | 18 | ## 相关仓库 19 | | 目标检测模型 | 路径 | 20 | | :----- | :----- | 21 | YoloV7-OBB | https://github.com/Egrt/yolov7-obb 22 | YoloV7-Tiny-OBB | https://github.com/Egrt/yolov7-tiny-obb 23 | 24 | ## 性能情况 25 | | 训练数据集 | 权值文件名称 | 测试数据集 | 输入图片大小 | mAP 0.5 | fps | 26 | | :-----: | :------: | :------: | :------: | :------: | :------: | 27 | | UAV-ROD | [yolov7_tiny_obb_uav](https://github.com/Egrt/yolov7-tiny-obb/releases/download/V1.0.0/yolov7_tiny_obb_uav.pth) | UAV-ROD-Val | 640x640 | 98.00% | 50 | 28 | | UAV-ROD | [yolov7_tiny_trt](https://github.com/Egrt/yolov7-tiny-obb/releases/download/V1.0.0/yolov7_tiny_trt.pth) | UAV-ROD-Val | 640x640 | 97.75% | 120 | 29 | ### 预测结果展示 30 | ![预测结果](img/test.jpg) 31 | ## 所需环境 32 | cuda==11.3 33 | torch==1.10.1 34 | torchvision==0.11.2 35 | 为了使用amp混合精度,推荐使用torch1.7.1以上的版本。 36 | 37 | ## 文件下载 38 | 39 | UAV-ROD数据集下载地址如下,里面已经包括了训练集、测试集、验证集(与测试集一样),无需再次划分: 40 | 链接: https://pan.baidu.com/s/1Ae8AGb2L6zCjCwJFzs2WfA 41 | 提取码: ybec 42 | 43 | ## 训练步骤 44 | ### a、训练VOC07+12数据集 45 | 1. 数据集的准备 46 | **本文使用VOC格式进行训练,训练前需要下载好VOC07+12的数据集,解压后放在根目录** 47 | 48 | 2. 数据集的处理 49 | 修改voc_annotation.py里面的annotation_mode=2,运行voc_annotation.py生成根目录下的2007_train.txt和2007_val.txt。 50 | 生成的数据集格式为image_path, x1, y1, x2, y2, x3, y3, x4, y4(polygon), class。 51 | 52 | 3. 开始网络训练 53 | train.py的默认参数用于训练VOC数据集,直接运行train.py即可开始训练。 54 | 55 | 4. 训练结果预测 56 | 训练结果预测需要用到两个文件,分别是yolo.py和predict.py。我们首先需要去yolo.py里面修改model_path以及classes_path,这两个参数必须要修改。 57 | **model_path指向训练好的权值文件,在logs文件夹里。 58 | classes_path指向检测类别所对应的txt。** 59 | 完成修改后就可以运行predict.py进行检测了。运行后输入图片路径即可检测。 60 | 61 | ### b、训练自己的数据集 62 | 1. 数据集的准备 63 | **本文使用VOC格式进行训练,训练前需要自己制作好数据集,** 64 | 训练前将标签文件放在VOCdevkit文件夹下的VOC2007文件夹下的Annotation中。 65 | 训练前将图片文件放在VOCdevkit文件夹下的VOC2007文件夹下的JPEGImages中。 66 | 67 | 2. 数据集的处理 68 | 在完成数据集的摆放之后,我们需要利用voc_annotation.py获得训练用的2007_train.txt和2007_val.txt。 69 | 修改voc_annotation.py里面的参数。第一次训练可以仅修改classes_path,classes_path用于指向检测类别所对应的txt。 70 | 训练自己的数据集时,可以自己建立一个cls_classes.txt,里面写自己所需要区分的类别。 71 | model_data/cls_classes.txt文件内容为: 72 | ```python 73 | cat 74 | dog 75 | ... 76 | ``` 77 | 修改voc_annotation.py中的classes_path,使其对应cls_classes.txt,并运行voc_annotation.py。 78 | 79 | 3. 
开始网络训练 80 | **训练的参数较多,均在train.py中,大家可以在下载库后仔细看注释,其中最重要的部分依然是train.py里的classes_path。** 81 | **classes_path用于指向检测类别所对应的txt,这个txt和voc_annotation.py里面的txt一样!训练自己的数据集必须要修改!** 82 | 修改完classes_path后就可以运行train.py开始训练了,在训练多个epoch后,权值会生成在logs文件夹中。 83 | 84 | 4. 训练结果预测 85 | 训练结果预测需要用到两个文件,分别是yolo.py和predict.py。在yolo.py里面修改model_path以及classes_path。 86 | **model_path指向训练好的权值文件,在logs文件夹里。 87 | classes_path指向检测类别所对应的txt。** 88 | 完成修改后就可以运行predict.py进行检测了。运行后输入图片路径即可检测。 89 | 90 | ## 预测步骤 91 | ### a、使用预训练权重 92 | 1. 下载完库后解压,在百度网盘下载权值,放入model_data,运行predict.py,输入 93 | ```python 94 | img/street.jpg 95 | ``` 96 | 2. 在predict.py里面进行设置可以进行fps测试和video视频检测。 97 | ### b、使用自己训练的权重 98 | 1. 按照训练步骤训练。 99 | 2. 在yolo.py文件里面,在如下部分修改model_path和classes_path使其对应训练好的文件;**model_path对应logs文件夹下面的权值文件,classes_path是model_path对应分的类**。 100 | ```python 101 | _defaults = { 102 | #--------------------------------------------------------------------------# 103 | # 使用自己训练好的模型进行预测一定要修改model_path和classes_path! 104 | # model_path指向logs文件夹下的权值文件,classes_path指向model_data下的txt 105 | # 106 | # 训练好后logs文件夹下存在多个权值文件,选择验证集损失较低的即可。 107 | # 验证集损失较低不代表mAP较高,仅代表该权值在验证集上泛化性能较好。 108 | # 如果出现shape不匹配,同时要注意训练时的model_path和classes_path参数的修改 109 | #--------------------------------------------------------------------------# 110 | "model_path" : 'model_data/yolov7_weights.pth', 111 | "classes_path" : 'model_data/coco_classes.txt', 112 | #---------------------------------------------------------------------# 113 | # anchors_path代表先验框对应的txt文件,一般不修改。 114 | # anchors_mask用于帮助代码找到对应的先验框,一般不修改。 115 | #---------------------------------------------------------------------# 116 | "anchors_path" : 'model_data/yolo_anchors.txt', 117 | "anchors_mask" : [[6, 7, 8], [3, 4, 5], [0, 1, 2]], 118 | #---------------------------------------------------------------------# 119 | # 输入图片的大小,必须为32的倍数。 120 | #---------------------------------------------------------------------# 121 | "input_shape" : [640, 640], 122 | #---------------------------------------------------------------------# 123 | # 只有得分大于置信度的预测框会被保留下来 124 | #---------------------------------------------------------------------# 125 | "confidence" : 0.5, 126 | #---------------------------------------------------------------------# 127 | # 非极大抑制所用到的nms_iou大小 128 | #---------------------------------------------------------------------# 129 | "nms_iou" : 0.3, 130 | #---------------------------------------------------------------------# 131 | # 该变量用于控制是否使用letterbox_image对输入图像进行不失真的resize, 132 | # 在多次测试后,发现关闭letterbox_image直接resize的效果更好 133 | #---------------------------------------------------------------------# 134 | "letterbox_image" : True, 135 | #-------------------------------# 136 | # 是否使用Cuda 137 | # 没有GPU可以设置成False 138 | #-------------------------------# 139 | "cuda" : True, 140 | } 141 | ``` 142 | 3. 运行predict.py,输入 143 | ```python 144 | img/street.jpg 145 | ``` 146 | 4. 在predict.py里面进行设置可以进行fps测试和video视频检测。 147 | 148 | ## 评估步骤 149 | ### a、评估VOC07+12的测试集 150 | 1. 本文使用VOC格式进行评估。VOC07+12已经划分好了测试集,无需利用voc_annotation.py生成ImageSets文件夹下的txt。 151 | 2. 在yolo.py里面修改model_path以及classes_path。**model_path指向训练好的权值文件,在logs文件夹里。classes_path指向检测类别所对应的txt。** 152 | 3. 运行get_map.py即可获得评估结果,评估结果会保存在map_out文件夹中。 153 | 154 | ### b、评估自己的数据集 155 | 1. 本文使用VOC格式进行评估。 156 | 2. 如果在训练前已经运行过voc_annotation.py文件,代码会自动将数据集划分成训练集、验证集和测试集。如果想要修改测试集的比例,可以修改voc_annotation.py文件下的trainval_percent。trainval_percent用于指定(训练集+验证集)与测试集的比例,默认情况下 (训练集+验证集):测试集 = 9:1。train_percent用于指定(训练集+验证集)中训练集与验证集的比例,默认情况下 训练集:验证集 = 9:1。 157 | 3. 
利用voc_annotation.py划分测试集后,前往get_map.py文件修改classes_path,classes_path用于指向检测类别所对应的txt,这个txt和训练时的txt一样。评估自己的数据集必须要修改。 158 | 4. 在yolo.py里面修改model_path以及classes_path。**model_path指向训练好的权值文件,在logs文件夹里。classes_path指向检测类别所对应的txt。** 159 | 5. 运行get_map.py即可获得评估结果,评估结果会保存在map_out文件夹中。 160 | 161 | ## Reference 162 | https://github.com/WongKinYiu/yolov7 163 | 164 | https://github.com/bubbliiiing/yolov7-tiny-pytorch 165 | -------------------------------------------------------------------------------- /VOCdevkit/VOC2007/Annotations/README.md: -------------------------------------------------------------------------------- 1 | 存放标签文件 -------------------------------------------------------------------------------- /VOCdevkit/VOC2007/ImageSets/Main/README.md: -------------------------------------------------------------------------------- 1 | 存放训练索引文件 -------------------------------------------------------------------------------- /VOCdevkit/VOC2007/JPEGImages/README.md: -------------------------------------------------------------------------------- 1 | 存放图片文件 -------------------------------------------------------------------------------- /convert_trt.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Description: 3 | Author: Egrt 4 | Date: 2023-02-18 16:44:58 5 | LastEditors: Egrt 6 | LastEditTime: 2023-02-18 17:25:24 7 | ''' 8 | import torch 9 | from nets.yolo import YoloBody 10 | from utils.utils import (cvtColor, get_anchors, get_classes, preprocess_input, 11 | resize_image, show_config) 12 | from torch2trt import torch2trt 13 | 14 | class YOLO(object): 15 | _defaults = { 16 | #--------------------------------------------------------------------------# 17 | # 使用自己训练好的模型进行预测一定要修改model_path和classes_path! 18 | # model_path指向logs文件夹下的权值文件,classes_path指向model_data下的txt 19 | # 20 | # 训练好后logs文件夹下存在多个权值文件,选择验证集损失较低的即可。 21 | # 验证集损失较低不代表mAP较高,仅代表该权值在验证集上泛化性能较好。 22 | # 如果出现shape不匹配,同时要注意训练时的model_path和classes_path参数的修改 23 | #--------------------------------------------------------------------------# 24 | "model_path" : 'model_data/yolov7_tiny_obb_uav.pth', 25 | "classes_path" : 'model_data/uav_classes.txt', 26 | #---------------------------------------------------------------------# 27 | # anchors_path代表先验框对应的txt文件,一般不修改。 28 | # anchors_mask用于帮助代码找到对应的先验框,一般不修改。 29 | #---------------------------------------------------------------------# 30 | "anchors_path" : 'model_data/yolo_anchors.txt', 31 | "anchors_mask" : [[6, 7, 8], [3, 4, 5], [0, 1, 2]], 32 | #---------------------------------------------------------------------# 33 | # 输入图片的大小,必须为32的倍数。 34 | #---------------------------------------------------------------------# 35 | "input_shape" : [640, 640], 36 | #---------------------------------------------------------------------# 37 | # 只有得分大于置信度的预测框会被保留下来 38 | #---------------------------------------------------------------------# 39 | "confidence" : 0.5, 40 | #---------------------------------------------------------------------# 41 | # 非极大抑制所用到的nms_iou大小 42 | #---------------------------------------------------------------------# 43 | "nms_iou" : 0.3, 44 | #---------------------------------------------------------------------# 45 | # 该变量用于控制是否使用letterbox_image对输入图像进行不失真的resize, 46 | # 在多次测试后,发现关闭letterbox_image直接resize的效果更好 47 | #---------------------------------------------------------------------# 48 | "letterbox_image" : True, 49 | #-------------------------------# 50 | # 是否使用Cuda 51 | # 没有GPU可以设置成False 52 | #-------------------------------# 53 | "cuda" : True, 
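        # Note (added): every default above can be overridden per instance through keyword
        # arguments -- __init__ below copies any kwargs onto the object (for example
        # YOLO(confidence=0.001, nms_iou=0.5)), so editing this dict in place is optional.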
54 | } 55 | 56 | @classmethod 57 | def get_defaults(cls, n): 58 | if n in cls._defaults: 59 | return cls._defaults[n] 60 | else: 61 | return "Unrecognized attribute name '" + n + "'" 62 | 63 | #---------------------------------------------------# 64 | # 初始化YOLO 65 | #---------------------------------------------------# 66 | def __init__(self, **kwargs): 67 | self.__dict__.update(self._defaults) 68 | for name, value in kwargs.items(): 69 | setattr(self, name, value) 70 | self._defaults[name] = value 71 | 72 | #---------------------------------------------------# 73 | # 获得种类和先验框的数量 74 | #---------------------------------------------------# 75 | self.class_names, self.num_classes = get_classes(self.classes_path) 76 | self.anchors, self.num_anchors = get_anchors(self.anchors_path) 77 | self.generate() 78 | show_config(**self._defaults) 79 | 80 | #---------------------------------------------------# 81 | # 生成模型 82 | #---------------------------------------------------# 83 | def generate(self, onnx=False): 84 | #---------------------------------------------------# 85 | # 建立yolo模型,载入yolo模型的权重 86 | #---------------------------------------------------# 87 | self.net = YoloBody(self.anchors_mask, self.num_classes) 88 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') 89 | self.net.load_state_dict(torch.load(self.model_path, map_location=device)) 90 | self.net = self.net.fuse().eval() 91 | print('{} model, and classes loaded.'.format(self.model_path)) 92 | if self.cuda: 93 | self.net = self.net.cuda() 94 | 95 | # create some regular pytorch model... 96 | model = YOLO().net 97 | 98 | # create example data 99 | x = torch.ones((1, 3, 640, 640)).cuda() 100 | 101 | # convert to TensorRT feeding sample data as input 102 | model_trt = torch2trt(model, [x], fp16_mode=True) 103 | 104 | y = model(x) 105 | y_trt = model_trt(x) 106 | 107 | # check the output against PyTorch 108 | # print(torch.max(torch.abs(y - y_trt))) 109 | 110 | # save the tensorrt model 111 | torch.save(model_trt.state_dict(), 'model_data/yolov7_tiny_trt.pth') 112 | 113 | -------------------------------------------------------------------------------- /get_map.py: -------------------------------------------------------------------------------- 1 | import os 2 | import xml.etree.ElementTree as ET 3 | import cv2 4 | from PIL import Image 5 | from tqdm import tqdm 6 | import numpy as np 7 | from utils.utils import get_classes 8 | from utils.utils_map import get_coco_map, get_map 9 | from utils.utils_rbox import * 10 | from yolo import YOLO 11 | 12 | if __name__ == "__main__": 13 | ''' 14 | Recall和Precision不像AP是一个面积的概念,因此在门限值(Confidence)不同时,网络的Recall和Precision值是不同的。 15 | 默认情况下,本代码计算的Recall和Precision代表的是当门限值(Confidence)为0.5时,所对应的Recall和Precision值。 16 | 17 | 受到mAP计算原理的限制,网络在计算mAP时需要获得近乎所有的预测框,这样才可以计算不同门限条件下的Recall和Precision值 18 | 因此,本代码获得的map_out/detection-results/里面的txt的框的数量一般会比直接predict多一些,目的是列出所有可能的预测框, 19 | ''' 20 | #------------------------------------------------------------------------------------------------------------------# 21 | # map_mode用于指定该文件运行时计算的内容 22 | # map_mode为0代表整个map计算流程,包括获得预测结果、获得真实框、计算VOC_map。 23 | # map_mode为1代表仅仅获得预测结果。 24 | # map_mode为2代表仅仅获得真实框。 25 | # map_mode为3代表仅仅计算VOC_map。 26 | # map_mode为4代表利用COCO工具箱计算当前数据集的0.50:0.95map。需要获得预测结果、获得真实框后并安装pycocotools才行 27 | #-------------------------------------------------------------------------------------------------------------------# 28 | map_mode = 0 29 | #--------------------------------------------------------------------------------------# 30 | # 
此处的classes_path用于指定需要测量VOC_map的类别 31 | # 一般情况下与训练和预测所用的classes_path一致即可 32 | #--------------------------------------------------------------------------------------# 33 | classes_path = 'model_data/uav_classes.txt' 34 | #--------------------------------------------------------------------------------------# 35 | # MINOVERLAP用于指定想要获得的mAP0.x,mAP0.x的意义是什么请同学们百度一下。 36 | # 比如计算mAP0.75,可以设定MINOVERLAP = 0.75。 37 | # 38 | # 当某一预测框与真实框重合度大于MINOVERLAP时,该预测框被认为是正样本,否则为负样本。 39 | # 因此MINOVERLAP的值越大,预测框要预测的越准确才能被认为是正样本,此时算出来的mAP值越低, 40 | #--------------------------------------------------------------------------------------# 41 | MINOVERLAP = 0.5 42 | #--------------------------------------------------------------------------------------# 43 | # 受到mAP计算原理的限制,网络在计算mAP时需要获得近乎所有的预测框,这样才可以计算mAP 44 | # 因此,confidence的值应当设置的尽量小进而获得全部可能的预测框。 45 | # 46 | # 该值一般不调整。因为计算mAP需要获得近乎所有的预测框,此处的confidence不能随便更改。 47 | # 想要获得不同门限值下的Recall和Precision值,请修改下方的score_threhold。 48 | #--------------------------------------------------------------------------------------# 49 | confidence = 0.001 50 | #--------------------------------------------------------------------------------------# 51 | # 预测时使用到的非极大抑制值的大小,越大表示非极大抑制越不严格。 52 | # 53 | # 该值一般不调整。 54 | #--------------------------------------------------------------------------------------# 55 | nms_iou = 0.5 56 | #---------------------------------------------------------------------------------------------------------------# 57 | # Recall和Precision不像AP是一个面积的概念,因此在门限值不同时,网络的Recall和Precision值是不同的。 58 | # 59 | # 默认情况下,本代码计算的Recall和Precision代表的是当门限值为0.5(此处定义为score_threhold)时所对应的Recall和Precision值。 60 | # 因为计算mAP需要获得近乎所有的预测框,上面定义的confidence不能随便更改。 61 | # 这里专门定义一个score_threhold用于代表门限值,进而在计算mAP时找到门限值对应的Recall和Precision值。 62 | #---------------------------------------------------------------------------------------------------------------# 63 | score_threhold = 0.5 64 | #-------------------------------------------------------# 65 | # map_vis用于指定是否开启VOC_map计算的可视化 66 | #-------------------------------------------------------# 67 | map_vis = False 68 | #-------------------------------------------------------# 69 | # 指向VOC数据集所在的文件夹 70 | # 默认指向根目录下的VOC数据集 71 | #-------------------------------------------------------# 72 | VOCdevkit_path = 'VOCdevkit' 73 | #-------------------------------------------------------# 74 | # 结果输出的文件夹,默认为map_out 75 | #-------------------------------------------------------# 76 | map_out_path = 'map_out' 77 | 78 | image_ids = open(os.path.join(VOCdevkit_path, "VOC2007/ImageSets/Main/test.txt")).read().strip().split() 79 | 80 | if not os.path.exists(map_out_path): 81 | os.makedirs(map_out_path) 82 | if not os.path.exists(os.path.join(map_out_path, 'ground-truth')): 83 | os.makedirs(os.path.join(map_out_path, 'ground-truth')) 84 | if not os.path.exists(os.path.join(map_out_path, 'detection-results')): 85 | os.makedirs(os.path.join(map_out_path, 'detection-results')) 86 | if not os.path.exists(os.path.join(map_out_path, 'images-optional')): 87 | os.makedirs(os.path.join(map_out_path, 'images-optional')) 88 | 89 | class_names, _ = get_classes(classes_path) 90 | 91 | if map_mode == 0 or map_mode == 1: 92 | print("Load model.") 93 | yolo = YOLO(confidence = confidence, nms_iou = nms_iou) 94 | print("Load model done.") 95 | 96 | print("Get predict result.") 97 | for image_id in tqdm(image_ids): 98 | image_path = os.path.join(VOCdevkit_path, "VOC2007/JPEGImages/"+image_id+".jpg") 99 | image = Image.open(image_path) 100 | if map_vis: 101 | image.save(os.path.join(map_out_path, 
"images-optional/" + image_id + ".jpg")) 102 | yolo.get_map_txt(image_id, image, class_names, map_out_path) 103 | print("Get predict result done.") 104 | 105 | if map_mode == 0 or map_mode == 2: 106 | print("Get ground truth result.") 107 | for image_id in tqdm(image_ids): 108 | with open(os.path.join(map_out_path, "ground-truth/"+image_id+".txt"), "w") as new_f: 109 | root = ET.parse(os.path.join(VOCdevkit_path, "VOC2007/Annotations/"+image_id+".xml")).getroot() 110 | for obj in root.findall('object'): 111 | difficult_flag = False 112 | if obj.find('difficult')!=None: 113 | difficult = obj.find('difficult').text 114 | if int(difficult)==1: 115 | difficult_flag = True 116 | obj_name = obj.find('name').text 117 | if obj_name not in class_names: 118 | continue 119 | bndbox = obj.find('robndbox') 120 | cx = float(bndbox.find('cx').text) 121 | cy = float(bndbox.find('cy').text) 122 | h = float(bndbox.find('h').text) 123 | w = float(bndbox.find('w').text) 124 | theta = float(bndbox.find('angle').text) 125 | rbox = np.array([[cx, cy, w, h, theta]], dtype=np.float32) 126 | poly = rbox2poly(rbox)[0] 127 | poly = np.float32(poly.reshape(4, 2)) 128 | (x, y), (w, h), angle = cv2.minAreaRect(poly) # θ ∈ [0, 90] 129 | if difficult_flag: 130 | new_f.write("%s %s %s %s %s %s difficult\n" % (obj_name, int(x), int(y), int(w), int(h), angle)) 131 | else: 132 | new_f.write("%s %s %s %s %s %s\n" % (obj_name, int(x), int(y), int(w), int(h), angle)) 133 | print("Get ground truth result done.") 134 | 135 | if map_mode == 0 or map_mode == 3: 136 | print("Get map.") 137 | get_map(MINOVERLAP, True, score_threhold = score_threhold, path = map_out_path) 138 | print("Get map done.") 139 | 140 | if map_mode == 4: 141 | print("Get map.") 142 | get_coco_map(class_names = class_names, path = map_out_path) 143 | print("Get map done.") 144 | -------------------------------------------------------------------------------- /hrsc_annotation.py: -------------------------------------------------------------------------------- 1 | import os 2 | import random 3 | import xml.etree.ElementTree as ET 4 | 5 | import numpy as np 6 | from utils.utils_rbox import * 7 | from utils.utils import get_classes 8 | 9 | #--------------------------------------------------------------------------------------------------------------------------------# 10 | # annotation_mode用于指定该文件运行时计算的内容 11 | # annotation_mode为0代表整个标签处理过程,包括获得VOCdevkit/VOC2007/ImageSets里面的txt以及训练用的2007_train.txt、2007_val.txt 12 | # annotation_mode为1代表获得VOCdevkit/VOC2007/ImageSets里面的txt 13 | # annotation_mode为2代表获得训练用的2007_train.txt、2007_val.txt 14 | #--------------------------------------------------------------------------------------------------------------------------------# 15 | annotation_mode = 0 16 | #-------------------------------------------------------------------# 17 | # 必须要修改,用于生成2007_train.txt、2007_val.txt的目标信息 18 | # 与训练和预测所用的classes_path一致即可 19 | # 如果生成的2007_train.txt里面没有目标信息 20 | # 那么就是因为classes没有设定正确 21 | # 仅在annotation_mode为0和2的时候有效 22 | #-------------------------------------------------------------------# 23 | classes_path = 'model_data/hrsc_classes.txt' 24 | #--------------------------------------------------------------------------------------------------------------------------------# 25 | # trainval_percent用于指定(训练集+验证集)与测试集的比例,默认情况下 (训练集+验证集):测试集 = 9:1 26 | # train_percent用于指定(训练集+验证集)中训练集与验证集的比例,默认情况下 训练集:验证集 = 9:1 27 | # 仅在annotation_mode为0和1的时候有效 28 | 
#--------------------------------------------------------------------------------------------------------------------------------# 29 | trainval_percent = 0.9 30 | train_percent = 0.9 31 | #-------------------------------------------------------# 32 | # 指向VOC数据集所在的文件夹 33 | # 默认指向根目录下的VOC数据集 34 | #-------------------------------------------------------# 35 | VOCdevkit_path = 'VOCdevkit' 36 | 37 | VOCdevkit_sets = [('2007_HRSC', 'train'), ('2007_HRSC', 'val')] 38 | classes, _ = get_classes(classes_path) 39 | 40 | #-------------------------------------------------------# 41 | # 统计目标数量 42 | #-------------------------------------------------------# 43 | photo_nums = np.zeros(len(VOCdevkit_sets)) 44 | nums = np.zeros(len(classes)) 45 | def convert_annotation(year, image_id, list_file): 46 | in_file = open(os.path.join(VOCdevkit_path, 'VOC%s/Annotations/%s.xml'%(year, image_id)), encoding='utf-8') 47 | tree=ET.parse(in_file) 48 | root = tree.getroot().find('HRSC_Objects') 49 | 50 | for obj in root.iter('HRSC_Object'): 51 | difficult = 0 52 | if obj.find('difficult')!=None: 53 | difficult = obj.find('difficult').text 54 | cls = obj.find('name').text 55 | if cls not in classes or int(difficult)==1: 56 | continue 57 | if obj.find('mbox_cx')==None: 58 | continue 59 | cls_id = classes.index(cls) 60 | cx = float(obj.find('mbox_cx').text) 61 | cy = float(obj.find('mbox_cy').text) 62 | w = float(obj.find('mbox_w').text) 63 | h = float(obj.find('mbox_h').text) 64 | angle = float(obj.find('mbox_ang').text) 65 | b = np.array([[cx, cy, w, h, angle]], dtype=np.float32) 66 | b = rbox2poly(b)[0] 67 | b = (b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7]) 68 | list_file.write(" " + ",".join([str(a) for a in b]) + ',' + str(cls_id)) 69 | 70 | nums[classes.index(cls)] = nums[classes.index(cls)] + 1 71 | 72 | if __name__ == "__main__": 73 | random.seed(0) 74 | if " " in os.path.abspath(VOCdevkit_path): 75 | raise ValueError("数据集存放的文件夹路径与图片名称中不可以存在空格,否则会影响正常的模型训练,请注意修改。") 76 | 77 | if annotation_mode == 0 or annotation_mode == 1: 78 | print("Generate txt in ImageSets.") 79 | xmlfilepath = os.path.join(VOCdevkit_path, 'VOC2007_HRSC/Annotations') 80 | saveBasePath = os.path.join(VOCdevkit_path, 'VOC2007_HRSC/ImageSets/Main') 81 | temp_xml = os.listdir(xmlfilepath) 82 | total_xml = [] 83 | for xml in temp_xml: 84 | if xml.endswith(".xml"): 85 | total_xml.append(xml) 86 | 87 | num = len(total_xml) 88 | list = range(num) 89 | tv = int(num*trainval_percent) 90 | tr = int(tv*train_percent) 91 | trainval= random.sample(list,tv) 92 | train = random.sample(trainval,tr) 93 | 94 | print("train and val size",tv) 95 | print("train size",tr) 96 | ftrainval = open(os.path.join(saveBasePath,'trainval.txt'), 'w') 97 | ftest = open(os.path.join(saveBasePath,'test.txt'), 'w') 98 | ftrain = open(os.path.join(saveBasePath,'train.txt'), 'w') 99 | fval = open(os.path.join(saveBasePath,'val.txt'), 'w') 100 | 101 | for i in list: 102 | name=total_xml[i][:-4]+'\n' 103 | if i in trainval: 104 | ftrainval.write(name) 105 | if i in train: 106 | ftrain.write(name) 107 | else: 108 | fval.write(name) 109 | else: 110 | ftest.write(name) 111 | 112 | ftrainval.close() 113 | ftrain.close() 114 | fval.close() 115 | ftest.close() 116 | print("Generate txt in ImageSets done.") 117 | 118 | if annotation_mode == 0 or annotation_mode == 2: 119 | print("Generate 2007_train.txt and 2007_val.txt for train.") 120 | type_index = 0 121 | for year, image_set in VOCdevkit_sets: 122 | image_ids = open(os.path.join(VOCdevkit_path, 'VOC%s/ImageSets/Main/%s.txt'%(year, 
image_set)), encoding='utf-8').read().strip().split() 123 | list_file = open('%s_%s.txt'%(year, image_set), 'w', encoding='utf-8') 124 | for image_id in image_ids: 125 | list_file.write('%s/VOC%s/JPEGImages/%s.bmp'%(os.path.abspath(VOCdevkit_path), year, image_id)) 126 | 127 | convert_annotation(year, image_id, list_file) 128 | list_file.write('\n') 129 | photo_nums[type_index] = len(image_ids) 130 | type_index += 1 131 | list_file.close() 132 | print("Generate 2007_train.txt and 2007_val.txt for train done.") 133 | 134 | def printTable(List1, List2): 135 | for i in range(len(List1[0])): 136 | print("|", end=' ') 137 | for j in range(len(List1)): 138 | print(List1[j][i].rjust(int(List2[j])), end=' ') 139 | print("|", end=' ') 140 | print() 141 | 142 | str_nums = [str(int(x)) for x in nums] 143 | tableData = [ 144 | classes, str_nums 145 | ] 146 | colWidths = [0]*len(tableData) 147 | len1 = 0 148 | for i in range(len(tableData)): 149 | for j in range(len(tableData[i])): 150 | if len(tableData[i][j]) > colWidths[i]: 151 | colWidths[i] = len(tableData[i][j]) 152 | printTable(tableData, colWidths) 153 | 154 | if photo_nums[0] <= 500: 155 | print("训练集数量小于500,属于较小的数据量,请注意设置较大的训练世代(Epoch)以满足足够的梯度下降次数(Step)。") 156 | 157 | if np.sum(nums) == 0: 158 | print("在数据集中并未获得任何目标,请注意修改classes_path对应自己的数据集,并且保证标签名字正确,否则训练将会没有任何效果!") 159 | print("在数据集中并未获得任何目标,请注意修改classes_path对应自己的数据集,并且保证标签名字正确,否则训练将会没有任何效果!") 160 | print("在数据集中并未获得任何目标,请注意修改classes_path对应自己的数据集,并且保证标签名字正确,否则训练将会没有任何效果!") 161 | print("(重要的事情说三遍)。") 162 | -------------------------------------------------------------------------------- /img/test.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Egrt/yolov7-tiny-obb/92139b483f07eaaa61e91138030946976826d0db/img/test.jpg -------------------------------------------------------------------------------- /kmeans_for_anchors.py: -------------------------------------------------------------------------------- 1 | #-------------------------------------------------------------------------------------------------------# 2 | # kmeans虽然会对数据集中的框进行聚类,但是很多数据集由于框的大小相近,聚类出来的9个框相差不大, 3 | # 这样的框反而不利于模型的训练。因为不同的特征层适合不同大小的先验框,shape越小的特征层适合越大的先验框 4 | # 原始网络的先验框已经按大中小比例分配好了,不进行聚类也会有非常好的效果。 5 | #-------------------------------------------------------------------------------------------------------# 6 | import glob 7 | import xml.etree.ElementTree as ET 8 | 9 | import matplotlib.pyplot as plt 10 | import numpy as np 11 | from tqdm import tqdm 12 | 13 | 14 | def cas_ratio(box,cluster): 15 | ratios_of_box_cluster = box / cluster 16 | ratios_of_cluster_box = cluster / box 17 | ratios = np.concatenate([ratios_of_box_cluster, ratios_of_cluster_box], axis = -1) 18 | 19 | return np.max(ratios, -1) 20 | 21 | def avg_ratio(box,cluster): 22 | return np.mean([np.min(cas_ratio(box[i],cluster)) for i in range(box.shape[0])]) 23 | 24 | def kmeans(box,k): 25 | #-------------------------------------------------------------# 26 | # 取出一共有多少框 27 | #-------------------------------------------------------------# 28 | row = box.shape[0] 29 | 30 | #-------------------------------------------------------------# 31 | # 每个框各个点的位置 32 | #-------------------------------------------------------------# 33 | distance = np.empty((row,k)) 34 | 35 | #-------------------------------------------------------------# 36 | # 最后的聚类位置 37 | #-------------------------------------------------------------# 38 | last_clu = np.zeros((row,)) 39 | 40 | np.random.seed() 41 | 42 | 
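    # Distance metric (added note): cas_ratio() scores a box against a cluster centre by the
    # largest width/height ratio between the two (in either direction) rather than by
    # Euclidean distance; each box joins the centre with the smallest score, and every
    # centre is then updated with the per-dimension median of its member boxes.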
#-------------------------------------------------------------# 43 | # 随机选5个当聚类中心 44 | #-------------------------------------------------------------# 45 | cluster = box[np.random.choice(row,k,replace = False)] 46 | 47 | iter = 0 48 | while True: 49 | #-------------------------------------------------------------# 50 | # 计算当前框和先验框的宽高比例 51 | #-------------------------------------------------------------# 52 | for i in range(row): 53 | distance[i] = cas_ratio(box[i],cluster) 54 | 55 | #-------------------------------------------------------------# 56 | # 取出最小点 57 | #-------------------------------------------------------------# 58 | near = np.argmin(distance,axis=1) 59 | 60 | if (last_clu == near).all(): 61 | break 62 | 63 | #-------------------------------------------------------------# 64 | # 求每一个类的中位点 65 | #-------------------------------------------------------------# 66 | for j in range(k): 67 | cluster[j] = np.median( 68 | box[near == j],axis=0) 69 | 70 | last_clu = near 71 | if iter % 5 == 0: 72 | print('iter: {:d}. avg_ratio:{:.2f}'.format(iter, avg_ratio(box,cluster))) 73 | iter += 1 74 | 75 | return cluster, near 76 | 77 | def load_data(path): 78 | data = [] 79 | #-------------------------------------------------------------# 80 | # 对于每一个xml都寻找box 81 | #-------------------------------------------------------------# 82 | for xml_file in tqdm(glob.glob('{}/*xml'.format(path))): 83 | tree = ET.parse(xml_file) 84 | height = int(tree.findtext('./size/height')) 85 | width = int(tree.findtext('./size/width')) 86 | if height<=0 or width<=0: 87 | continue 88 | 89 | #-------------------------------------------------------------# 90 | # 对于每一个目标都获得它的宽高 91 | #-------------------------------------------------------------# 92 | for obj in tree.iter('object'): 93 | xmin = int(float(obj.findtext('bndbox/xmin'))) / width 94 | ymin = int(float(obj.findtext('bndbox/ymin'))) / height 95 | xmax = int(float(obj.findtext('bndbox/xmax'))) / width 96 | ymax = int(float(obj.findtext('bndbox/ymax'))) / height 97 | 98 | xmin = np.float64(xmin) 99 | ymin = np.float64(ymin) 100 | xmax = np.float64(xmax) 101 | ymax = np.float64(ymax) 102 | # 得到宽高 103 | data.append([xmax-xmin,ymax-ymin]) 104 | return np.array(data) 105 | 106 | if __name__ == '__main__': 107 | np.random.seed(0) 108 | #-------------------------------------------------------------# 109 | # 运行该程序会计算'./VOCdevkit/VOC2007/Annotations'的xml 110 | # 会生成yolo_anchors.txt 111 | #-------------------------------------------------------------# 112 | input_shape = [640, 640] 113 | anchors_num = 9 114 | #-------------------------------------------------------------# 115 | # 载入数据集,可以使用VOC的xml 116 | #-------------------------------------------------------------# 117 | path = 'VOCdevkit/VOC2007/Annotations' 118 | 119 | #-------------------------------------------------------------# 120 | # 载入所有的xml 121 | # 存储格式为转化为比例后的width,height 122 | #-------------------------------------------------------------# 123 | print('Load xmls.') 124 | data = load_data(path) 125 | print('Load xmls done.') 126 | 127 | #-------------------------------------------------------------# 128 | # 使用k聚类算法 129 | #-------------------------------------------------------------# 130 | print('K-means boxes.') 131 | cluster, near = kmeans(data, anchors_num) 132 | print('K-means boxes done.') 133 | data = data * np.array([input_shape[1], input_shape[0]]) 134 | cluster = cluster * np.array([input_shape[1], input_shape[0]]) 135 | 136 | #-------------------------------------------------------------# 137 | # 绘图 
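    #   (scatter-plots the boxes of each cluster, marks every cluster centre with a black
    #   'x', and saves the figure as kmeans_for_anchors.jpg in the working directory)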
138 | #-------------------------------------------------------------# 139 | for j in range(anchors_num): 140 | plt.scatter(data[near == j][:,0], data[near == j][:,1]) 141 | plt.scatter(cluster[j][0], cluster[j][1], marker='x', c='black') 142 | plt.savefig("kmeans_for_anchors.jpg") 143 | plt.show() 144 | print('Save kmeans_for_anchors.jpg in root dir.') 145 | 146 | cluster = cluster[np.argsort(cluster[:, 0] * cluster[:, 1])] 147 | print('avg_ratio:{:.2f}'.format(avg_ratio(data, cluster))) 148 | print(cluster) 149 | 150 | f = open("yolo_anchors.txt", 'w') 151 | row = np.shape(cluster)[0] 152 | for i in range(row): 153 | if i == 0: 154 | x_y = "%d,%d" % (cluster[i][0], cluster[i][1]) 155 | else: 156 | x_y = ", %d,%d" % (cluster[i][0], cluster[i][1]) 157 | f.write(x_y) 158 | f.close() 159 | -------------------------------------------------------------------------------- /logs/README.md: -------------------------------------------------------------------------------- 1 | 训练好的权重会保存在这里 2 | -------------------------------------------------------------------------------- /model_data/coco_classes.txt: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorbike 5 | aeroplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | stop sign 13 | parking meter 14 | bench 15 | bird 16 | cat 17 | dog 18 | horse 19 | sheep 20 | cow 21 | elephant 22 | bear 23 | zebra 24 | giraffe 25 | backpack 26 | umbrella 27 | handbag 28 | tie 29 | suitcase 30 | frisbee 31 | skis 32 | snowboard 33 | sports ball 34 | kite 35 | baseball bat 36 | baseball glove 37 | skateboard 38 | surfboard 39 | tennis racket 40 | bottle 41 | wine glass 42 | cup 43 | fork 44 | knife 45 | spoon 46 | bowl 47 | banana 48 | apple 49 | sandwich 50 | orange 51 | broccoli 52 | carrot 53 | hot dog 54 | pizza 55 | donut 56 | cake 57 | chair 58 | sofa 59 | pottedplant 60 | bed 61 | diningtable 62 | toilet 63 | tvmonitor 64 | laptop 65 | mouse 66 | remote 67 | keyboard 68 | cell phone 69 | microwave 70 | oven 71 | toaster 72 | sink 73 | refrigerator 74 | book 75 | clock 76 | vase 77 | scissors 78 | teddy bear 79 | hair drier 80 | toothbrush 81 | -------------------------------------------------------------------------------- /model_data/simhei.ttf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Egrt/yolov7-tiny-obb/92139b483f07eaaa61e91138030946976826d0db/model_data/simhei.ttf -------------------------------------------------------------------------------- /model_data/uav_classes.txt: -------------------------------------------------------------------------------- 1 | car -------------------------------------------------------------------------------- /model_data/voc_classes.txt: -------------------------------------------------------------------------------- 1 | aeroplane 2 | bicycle 3 | bird 4 | boat 5 | bottle 6 | bus 7 | car 8 | cat 9 | chair 10 | cow 11 | diningtable 12 | dog 13 | horse 14 | motorbike 15 | person 16 | pottedplant 17 | sheep 18 | sofa 19 | train 20 | tvmonitor -------------------------------------------------------------------------------- /model_data/yolo_anchors.txt: -------------------------------------------------------------------------------- 1 | 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 -------------------------------------------------------------------------------- /nets/__init__.py: 
-------------------------------------------------------------------------------- 1 | # -------------------------------------------------------------------------------- /nets/backbone.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | 5 | def autopad(k, p=None): 6 | if p is None: 7 | p = k // 2 if isinstance(k, int) else [x // 2 for x in k] 8 | return p 9 | 10 | class Conv(nn.Module): 11 | def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=nn.LeakyReLU(0.1, inplace=True)): # ch_in, ch_out, kernel, stride, padding, groups 12 | super(Conv, self).__init__() 13 | self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False) 14 | self.bn = nn.BatchNorm2d(c2, eps=0.001, momentum=0.03) 15 | self.act = nn.LeakyReLU(0.1, inplace=True) if act is True else (act if isinstance(act, nn.Module) else nn.Identity()) 16 | 17 | def forward(self, x): 18 | return self.act(self.bn(self.conv(x))) 19 | 20 | def fuseforward(self, x): 21 | return self.act(self.conv(x)) 22 | 23 | class Multi_Concat_Block(nn.Module): 24 | def __init__(self, c1, c2, c3, n=4, e=1, ids=[0]): 25 | super(Multi_Concat_Block, self).__init__() 26 | c_ = int(c2 * e) 27 | 28 | self.ids = ids 29 | self.cv1 = Conv(c1, c_, 1, 1) 30 | self.cv2 = Conv(c1, c_, 1, 1) 31 | self.cv3 = nn.ModuleList( 32 | [Conv(c_ if i ==0 else c2, c2, 3, 1) for i in range(n)] 33 | ) 34 | self.cv4 = Conv(c_ * 2 + c2 * (len(ids) - 2), c3, 1, 1) 35 | 36 | def forward(self, x): 37 | x_1 = self.cv1(x) 38 | x_2 = self.cv2(x) 39 | 40 | x_all = [x_1, x_2] 41 | for i in range(len(self.cv3)): 42 | x_2 = self.cv3[i](x_2) 43 | x_all.append(x_2) 44 | 45 | out = self.cv4(torch.cat([x_all[id] for id in self.ids], 1)) 46 | return out 47 | 48 | class MP(nn.Module): 49 | def __init__(self, k=2): 50 | super(MP, self).__init__() 51 | self.m = nn.MaxPool2d(kernel_size=k, stride=k) 52 | 53 | def forward(self, x): 54 | return self.m(x) 55 | 56 | class Backbone(nn.Module): 57 | def __init__(self, transition_channels, block_channels, n, pretrained=False): 58 | super().__init__() 59 | #-----------------------------------------------# 60 | # 输入图片是640, 640, 3 61 | #-----------------------------------------------# 62 | ids = [-1, -2, -3, -4] 63 | # 640, 640, 3 => 320, 320, 64 64 | self.stem = Conv(3, transition_channels * 2, 3, 2) 65 | # 320, 320, 64 => 160, 160, 128 => 160, 160, 128 66 | self.dark2 = nn.Sequential( 67 | Conv(transition_channels * 2, transition_channels * 4, 3, 2), 68 | Multi_Concat_Block(transition_channels * 4, block_channels * 2, transition_channels * 4, n=n, ids=ids), 69 | ) 70 | # 160, 160, 128 => 80, 80, 128 => 80, 80, 256 71 | self.dark3 = nn.Sequential( 72 | MP(), 73 | Multi_Concat_Block(transition_channels * 4, block_channels * 4, transition_channels * 8, n=n, ids=ids), 74 | ) 75 | # 80, 80, 256 => 40, 40, 256 => 40, 40, 512 76 | self.dark4 = nn.Sequential( 77 | MP(), 78 | Multi_Concat_Block(transition_channels * 8, block_channels * 8, transition_channels * 16, n=n, ids=ids), 79 | ) 80 | # 40, 40, 512 => 20, 20, 512 => 20, 20, 1024 81 | self.dark5 = nn.Sequential( 82 | MP(), 83 | Multi_Concat_Block(transition_channels * 16, block_channels * 16, transition_channels * 32, n=n, ids=ids), 84 | ) 85 | 86 | if pretrained: 87 | url = 'https://github.com/bubbliiiing/yolov7-tiny-pytorch/releases/download/v1.0/yolov7_tiny_backbone_weights.pth' 88 | checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location="cpu", model_dir="./model_data") 89 | 
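            # strict=False: only keys present in this backbone are loaded; any missing or
            # unexpected keys in the released backbone-only checkpoint are skipped instead
            # of raising an error.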
self.load_state_dict(checkpoint, strict=False) 90 | print("Load weights from " + url.split('/')[-1]) 91 | 92 | def forward(self, x): 93 | x = self.stem(x) 94 | x = self.dark2(x) 95 | #-----------------------------------------------# 96 | # dark3的输出为80, 80, 256,是一个有效特征层 97 | #-----------------------------------------------# 98 | x = self.dark3(x) 99 | feat1 = x 100 | #-----------------------------------------------# 101 | # dark4的输出为40, 40, 512,是一个有效特征层 102 | #-----------------------------------------------# 103 | x = self.dark4(x) 104 | feat2 = x 105 | #-----------------------------------------------# 106 | # dark5的输出为20, 20, 1024,是一个有效特征层 107 | #-----------------------------------------------# 108 | x = self.dark5(x) 109 | feat3 = x 110 | return feat1, feat2, feat3 111 | -------------------------------------------------------------------------------- /nets/yolo.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | from nets.backbone import Backbone, Multi_Concat_Block, Conv 5 | 6 | 7 | class SPPCSPC(nn.Module): 8 | # CSP https://github.com/WongKinYiu/CrossStagePartialNetworks 9 | def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5, k=(13, 9, 5)): 10 | super(SPPCSPC, self).__init__() 11 | c_ = int(2 * c2 * e) # hidden channels 12 | self.cv1 = Conv(c1, c_, 1, 1) 13 | self.cv2 = Conv(c1, c_, 1, 1) 14 | self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k]) 15 | self.cv3 = Conv(4 * c_, c_, 1, 1) 16 | self.cv4 = Conv(2 * c_, c2, 1, 1) 17 | 18 | def forward(self, x): 19 | x1 = self.cv1(x) 20 | y1 = self.cv3(torch.cat([m(x1) for m in self.m] + [x1], 1)) 21 | y2 = self.cv2(x) 22 | return self.cv4(torch.cat((y1, y2), dim=1)) 23 | 24 | def fuse_conv_and_bn(conv, bn): 25 | fusedconv = nn.Conv2d(conv.in_channels, 26 | conv.out_channels, 27 | kernel_size=conv.kernel_size, 28 | stride=conv.stride, 29 | padding=conv.padding, 30 | groups=conv.groups, 31 | bias=True).requires_grad_(False).to(conv.weight.device) 32 | 33 | w_conv = conv.weight.clone().view(conv.out_channels, -1) 34 | w_bn = torch.diag(bn.weight.div(torch.sqrt(bn.eps + bn.running_var))) 35 | fusedconv.weight.copy_(torch.mm(w_bn, w_conv).view(fusedconv.weight.shape)) 36 | 37 | b_conv = torch.zeros(conv.weight.size(0), device=conv.weight.device) if conv.bias is None else conv.bias 38 | b_bn = bn.bias - bn.weight.mul(bn.running_mean).div(torch.sqrt(bn.running_var + bn.eps)) 39 | fusedconv.bias.copy_(torch.mm(w_bn, b_conv.reshape(-1, 1)).reshape(-1) + b_bn) 40 | return fusedconv 41 | 42 | #---------------------------------------------------# 43 | # yolo_body 44 | #---------------------------------------------------# 45 | class YoloBody(nn.Module): 46 | def __init__(self, anchors_mask, num_classes, pretrained=False): 47 | super(YoloBody, self).__init__() 48 | #-----------------------------------------------# 49 | # 定义了不同yolov7-tiny的参数 50 | #-----------------------------------------------# 51 | transition_channels = 16 52 | block_channels = 16 53 | panet_channels = 16 54 | e = 1 55 | n = 2 56 | ids = [-1, -2, -3, -4] 57 | #-----------------------------------------------# 58 | # 输入图片是640, 640, 3 59 | #-----------------------------------------------# 60 | 61 | #---------------------------------------------------# 62 | # 生成主干模型 63 | # 获得三个有效特征层,他们的shape分别是: 64 | # 80, 80, 512 65 | # 40, 40, 1024 66 | # 20, 20, 1024 67 | #---------------------------------------------------# 68 | self.backbone = Backbone(transition_channels, 
block_channels, n, pretrained=pretrained) 69 | 70 | self.upsample = nn.Upsample(scale_factor=2, mode="nearest") 71 | 72 | self.sppcspc = SPPCSPC(transition_channels * 32, transition_channels * 16) 73 | self.conv_for_P5 = Conv(transition_channels * 16, transition_channels * 8) 74 | self.conv_for_feat2 = Conv(transition_channels * 16, transition_channels * 8) 75 | self.conv3_for_upsample1 = Multi_Concat_Block(transition_channels * 16, panet_channels * 4, transition_channels * 8, e=e, n=n, ids=ids) 76 | 77 | self.conv_for_P4 = Conv(transition_channels * 8, transition_channels * 4) 78 | self.conv_for_feat1 = Conv(transition_channels * 8, transition_channels * 4) 79 | self.conv3_for_upsample2 = Multi_Concat_Block(transition_channels * 8, panet_channels * 2, transition_channels * 4, e=e, n=n, ids=ids) 80 | 81 | self.down_sample1 = Conv(transition_channels * 4, transition_channels * 8, k=3, s=2) 82 | self.conv3_for_downsample1 = Multi_Concat_Block(transition_channels * 16, panet_channels * 4, transition_channels * 8, e=e, n=n, ids=ids) 83 | 84 | self.down_sample2 = Conv(transition_channels * 8, transition_channels * 16, k=3, s=2) 85 | self.conv3_for_downsample2 = Multi_Concat_Block(transition_channels * 32, panet_channels * 8, transition_channels * 16, e=e, n=n, ids=ids) 86 | 87 | self.rep_conv_1 = Conv(transition_channels * 4, transition_channels * 8, 3, 1) 88 | self.rep_conv_2 = Conv(transition_channels * 8, transition_channels * 16, 3, 1) 89 | self.rep_conv_3 = Conv(transition_channels * 16, transition_channels * 32, 3, 1) 90 | 91 | self.yolo_head_P3 = nn.Conv2d(transition_channels * 8, len(anchors_mask[2]) * (6 + num_classes), 1) 92 | self.yolo_head_P4 = nn.Conv2d(transition_channels * 16, len(anchors_mask[1]) * (6 + num_classes), 1) 93 | self.yolo_head_P5 = nn.Conv2d(transition_channels * 32, len(anchors_mask[0]) * (6 + num_classes), 1) 94 | 95 | def fuse(self): 96 | print('Fusing layers... 
') 97 | for m in self.modules(): 98 | if type(m) is Conv and hasattr(m, 'bn'): 99 | m.conv = fuse_conv_and_bn(m.conv, m.bn) 100 | delattr(m, 'bn') 101 | m.forward = m.fuseforward 102 | return self 103 | 104 | def forward(self, x): 105 | # backbone 106 | feat1, feat2, feat3 = self.backbone.forward(x) 107 | 108 | P5 = self.sppcspc(feat3) 109 | P5_conv = self.conv_for_P5(P5) 110 | P5_upsample = self.upsample(P5_conv) 111 | P4 = torch.cat([self.conv_for_feat2(feat2), P5_upsample], 1) 112 | P4 = self.conv3_for_upsample1(P4) 113 | 114 | P4_conv = self.conv_for_P4(P4) 115 | P4_upsample = self.upsample(P4_conv) 116 | P3 = torch.cat([self.conv_for_feat1(feat1), P4_upsample], 1) 117 | P3 = self.conv3_for_upsample2(P3) 118 | 119 | P3_downsample = self.down_sample1(P3) 120 | P4 = torch.cat([P3_downsample, P4], 1) 121 | P4 = self.conv3_for_downsample1(P4) 122 | 123 | P4_downsample = self.down_sample2(P4) 124 | P5 = torch.cat([P4_downsample, P5], 1) 125 | P5 = self.conv3_for_downsample2(P5) 126 | 127 | P3 = self.rep_conv_1(P3) 128 | P4 = self.rep_conv_2(P4) 129 | P5 = self.rep_conv_3(P5) 130 | #---------------------------------------------------# 131 | # 第三个特征层 132 | # y3=(batch_size, 78, 80, 80) 133 | #---------------------------------------------------# 134 | out2 = self.yolo_head_P3(P3) 135 | #---------------------------------------------------# 136 | # 第二个特征层 137 | # y2=(batch_size, 78, 40, 40) 138 | #---------------------------------------------------# 139 | out1 = self.yolo_head_P4(P4) 140 | #---------------------------------------------------# 141 | # 第一个特征层 142 | # y1=(batch_size, 78, 20, 20) 143 | #---------------------------------------------------# 144 | out0 = self.yolo_head_P5(P5) 145 | 146 | return [out0, out1, out2] 147 | -------------------------------------------------------------------------------- /predict.py: -------------------------------------------------------------------------------- 1 | #-----------------------------------------------------------------------# 2 | # predict.py将单张图片预测、摄像头检测、FPS测试和目录遍历检测等功能 3 | # 整合到了一个py文件中,通过指定mode进行模式的修改。 4 | #-----------------------------------------------------------------------# 5 | import time 6 | 7 | import cv2 8 | import numpy as np 9 | from PIL import Image 10 | 11 | from yolo import YOLO 12 | 13 | if __name__ == "__main__": 14 | yolo = YOLO() 15 | #----------------------------------------------------------------------------------------------------------# 16 | # mode用于指定测试的模式: 17 | # 'predict' 表示单张图片预测,如果想对预测过程进行修改,如保存图片,截取对象等,可以先看下方详细的注释 18 | # 'video' 表示视频检测,可调用摄像头或者视频进行检测,详情查看下方注释。 19 | # 'fps' 表示测试fps,使用的图片是img里面的street.jpg,详情查看下方注释。 20 | # 'dir_predict' 表示遍历文件夹进行检测并保存。默认遍历img文件夹,保存img_out文件夹,详情查看下方注释。 21 | # 'heatmap' 表示进行预测结果的热力图可视化,详情查看下方注释。 22 | # 'export_onnx' 表示将模型导出为onnx,需要pytorch1.7.1以上。 23 | #----------------------------------------------------------------------------------------------------------# 24 | mode = "predict" 25 | #-------------------------------------------------------------------------# 26 | # crop 指定了是否在单张图片预测后对目标进行截取 27 | # count 指定了是否进行目标的计数 28 | # crop、count仅在mode='predict'时有效 29 | #-------------------------------------------------------------------------# 30 | crop = False 31 | count = False 32 | #----------------------------------------------------------------------------------------------------------# 33 | # video_path 用于指定视频的路径,当video_path=0时表示检测摄像头 34 | # 想要检测视频,则设置如video_path = "xxx.mp4"即可,代表读取出根目录下的xxx.mp4文件。 35 | # video_save_path 表示视频保存的路径,当video_save_path=""时表示不保存 36 | # 想要保存视频,则设置如video_save_path 
= "yyy.mp4"即可,代表保存为根目录下的yyy.mp4文件。 37 | # video_fps 用于保存的视频的fps 38 | # 39 | # video_path、video_save_path和video_fps仅在mode='video'时有效 40 | # 保存视频时需要ctrl+c退出或者运行到最后一帧才会完成完整的保存步骤。 41 | #----------------------------------------------------------------------------------------------------------# 42 | video_path = "img/input.mp4" 43 | video_save_path = "img/output.mp4" 44 | video_fps = 25.0 45 | #----------------------------------------------------------------------------------------------------------# 46 | # test_interval 用于指定测量fps的时候,图片检测的次数。理论上test_interval越大,fps越准确。 47 | # fps_image_path 用于指定测试的fps图片 48 | # 49 | # test_interval和fps_image_path仅在mode='fps'有效 50 | #----------------------------------------------------------------------------------------------------------# 51 | test_interval = 100 52 | fps_image_path = "img/test.jpg" 53 | #-------------------------------------------------------------------------# 54 | # dir_origin_path 指定了用于检测的图片的文件夹路径 55 | # dir_save_path 指定了检测完图片的保存路径 56 | # 57 | # dir_origin_path和dir_save_path仅在mode='dir_predict'时有效 58 | #-------------------------------------------------------------------------# 59 | dir_origin_path = "img/" 60 | dir_save_path = "img_out/" 61 | #-------------------------------------------------------------------------# 62 | # heatmap_save_path 热力图的保存路径,默认保存在model_data下 63 | # 64 | # heatmap_save_path仅在mode='heatmap'有效 65 | #-------------------------------------------------------------------------# 66 | heatmap_save_path = "model_data/heatmap_vision.png" 67 | #-------------------------------------------------------------------------# 68 | # simplify 使用Simplify onnx 69 | # onnx_save_path 指定了onnx的保存路径 70 | #-------------------------------------------------------------------------# 71 | simplify = True 72 | onnx_save_path = "model_data/models.onnx" 73 | 74 | if mode == "predict": 75 | ''' 76 | 1、如果想要进行检测完的图片的保存,利用r_image.save("img.jpg")即可保存,直接在predict.py里进行修改即可。 77 | 2、如果想要获得预测框的坐标,可以进入yolo.detect_image函数,在绘图部分读取top,left,bottom,right这四个值。 78 | 3、如果想要利用预测框截取下目标,可以进入yolo.detect_image函数,在绘图部分利用获取到的top,left,bottom,right这四个值 79 | 在原图上利用矩阵的方式进行截取。 80 | 4、如果想要在预测图上写额外的字,比如检测到的特定目标的数量,可以进入yolo.detect_image函数,在绘图部分对predicted_class进行判断, 81 | 比如判断if predicted_class == 'car': 即可判断当前目标是否为车,然后记录数量即可。利用draw.text即可写字。 82 | ''' 83 | while True: 84 | img = input('Input image filename:') 85 | try: 86 | image = Image.open(img) 87 | except: 88 | print('Open Error! 
Try again!') 89 | continue 90 | else: 91 | r_image = yolo.detect_image(image, crop = crop, count=count) 92 | r_image.show() 93 | 94 | elif mode == "video": 95 | capture = cv2.VideoCapture(video_path) 96 | if video_save_path!="": 97 | fourcc = cv2.VideoWriter_fourcc(*'XVID') 98 | size = (int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)), int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))) 99 | out = cv2.VideoWriter(video_save_path, fourcc, video_fps, size) 100 | 101 | ref, frame = capture.read() 102 | if not ref: 103 | raise ValueError("未能正确读取摄像头(视频),请注意是否正确安装摄像头(是否正确填写视频路径)。") 104 | 105 | fps = 0.0 106 | while(True): 107 | t1 = time.time() 108 | # 读取某一帧 109 | ref, frame = capture.read() 110 | if not ref: 111 | break 112 | # 格式转变,BGRtoRGB 113 | frame = cv2.cvtColor(frame,cv2.COLOR_BGR2RGB) 114 | # 转变成Image 115 | frame = Image.fromarray(np.uint8(frame)) 116 | # 进行检测 117 | frame = np.array(yolo.detect_image(frame)) 118 | # RGBtoBGR满足opencv显示格式 119 | frame = cv2.cvtColor(frame,cv2.COLOR_RGB2BGR) 120 | 121 | fps = ( fps + (1./(time.time()-t1)) ) / 2 122 | print("fps= %.2f"%(fps)) 123 | frame = cv2.putText(frame, "fps= %.2f"%(fps), (0, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2) 124 | 125 | cv2.imshow("video",frame) 126 | c= cv2.waitKey(1) & 0xff 127 | if video_save_path!="": 128 | out.write(frame) 129 | 130 | if c==27: 131 | capture.release() 132 | break 133 | 134 | print("Video Detection Done!") 135 | capture.release() 136 | if video_save_path!="": 137 | print("Save processed video to the path :" + video_save_path) 138 | out.release() 139 | cv2.destroyAllWindows() 140 | 141 | elif mode == "fps": 142 | img = Image.open(fps_image_path) 143 | tact_time = yolo.get_FPS(img, test_interval) 144 | print(str(tact_time) + ' seconds, ' + str(1/tact_time) + 'FPS, @batch_size 1') 145 | 146 | elif mode == "dir_predict": 147 | import os 148 | 149 | from tqdm import tqdm 150 | 151 | img_names = os.listdir(dir_origin_path) 152 | for img_name in tqdm(img_names): 153 | if img_name.lower().endswith(('.bmp', '.dib', '.png', '.jpg', '.jpeg', '.pbm', '.pgm', '.ppm', '.tif', '.tiff')): 154 | image_path = os.path.join(dir_origin_path, img_name) 155 | image = Image.open(image_path) 156 | r_image = yolo.detect_image(image) 157 | if not os.path.exists(dir_save_path): 158 | os.makedirs(dir_save_path) 159 | r_image.save(os.path.join(dir_save_path, img_name.replace(".jpg", ".png")), quality=95, subsampling=0) 160 | 161 | elif mode == "heatmap": 162 | while True: 163 | img = input('Input image filename:') 164 | try: 165 | image = Image.open(img) 166 | except: 167 | print('Open Error! 
Try again!') 168 | continue 169 | else: 170 | yolo.detect_heatmap(image, heatmap_save_path) 171 | 172 | elif mode == "export_onnx": 173 | yolo.convert_to_onnx(simplify, onnx_save_path) 174 | 175 | else: 176 | raise AssertionError("Please specify the correct mode: 'predict', 'video', 'fps', 'heatmap', 'export_onnx', 'dir_predict'.") 177 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | scipy==1.9.1 2 | numpy==1.23.1 3 | matplotlib==3.4.3 4 | opencv_python==4.7.0 5 | torch==1.10.1 6 | torchvision==0.11.2 7 | tqdm==4.62.2 8 | Pillow==9.3.0 9 | h5py==2.10.0 10 | -------------------------------------------------------------------------------- /summary.py: -------------------------------------------------------------------------------- 1 | #--------------------------------------------# 2 | # 该部分代码用于看网络结构 3 | #--------------------------------------------# 4 | import torch 5 | from thop import clever_format, profile 6 | 7 | from nets.yolo import YoloBody 8 | 9 | if __name__ == "__main__": 10 | input_shape = [640, 640] 11 | anchors_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]] 12 | num_classes = 80 13 | phi = 'l' 14 | 15 | device = torch.device("cuda" if torch.cuda.is_available() else "cpu") 16 | m = YoloBody(anchors_mask, num_classes, phi, False).to(device) 17 | for i in m.children(): 18 | print(i) 19 | print('==============================') 20 | 21 | dummy_input = torch.randn(1, 3, input_shape[0], input_shape[1]).to(device) 22 | flops, params = profile(m.to(device), (dummy_input, ), verbose=False) 23 | #--------------------------------------------------------# 24 | # flops * 2是因为profile没有将卷积作为两个operations 25 | # 有些论文将卷积算乘法、加法两个operations。此时乘2 26 | # 有些论文只考虑乘法的运算次数,忽略加法。此时不乘2 27 | # 本代码选择乘2,参考YOLOX。 28 | #--------------------------------------------------------# 29 | flops = flops * 2 30 | flops, params = clever_format([flops, params], "%.3f") 31 | print('Total GFLOPS: %s' % (flops)) 32 | print('Total params: %s' % (params)) 33 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- 1 | # -------------------------------------------------------------------------------- /utils/callbacks.py: -------------------------------------------------------------------------------- 1 | import datetime 2 | import os 3 | 4 | import torch 5 | import matplotlib 6 | matplotlib.use('Agg') 7 | import scipy.signal 8 | from matplotlib import pyplot as plt 9 | from torch.utils.tensorboard import SummaryWriter 10 | from utils.utils_rbox import rbox2poly, poly2hbb 11 | import shutil 12 | import numpy as np 13 | 14 | from PIL import Image 15 | from tqdm import tqdm 16 | from .utils import cvtColor, preprocess_input, resize_image 17 | from .utils_bbox import DecodeBox 18 | from .utils_map import get_coco_map, get_map 19 | 20 | 21 | class LossHistory(): 22 | def __init__(self, log_dir, model, input_shape): 23 | self.log_dir = log_dir 24 | self.losses = [] 25 | self.val_loss = [] 26 | 27 | os.makedirs(self.log_dir) 28 | self.writer = SummaryWriter(self.log_dir) 29 | try: 30 | dummy_input = torch.randn(2, 3, input_shape[0], input_shape[1]) 31 | self.writer.add_graph(model, dummy_input) 32 | except: 33 | pass 34 | 35 | def append_loss(self, epoch, loss, val_loss): 36 | if not os.path.exists(self.log_dir): 37 | os.makedirs(self.log_dir) 38 | 39 | self.losses.append(loss) 40 
| self.val_loss.append(val_loss) 41 | 42 | with open(os.path.join(self.log_dir, "epoch_loss.txt"), 'a') as f: 43 | f.write(str(loss)) 44 | f.write("\n") 45 | with open(os.path.join(self.log_dir, "epoch_val_loss.txt"), 'a') as f: 46 | f.write(str(val_loss)) 47 | f.write("\n") 48 | 49 | self.writer.add_scalar('loss', loss, epoch) 50 | self.writer.add_scalar('val_loss', val_loss, epoch) 51 | self.loss_plot() 52 | 53 | def loss_plot(self): 54 | iters = range(len(self.losses)) 55 | 56 | plt.figure() 57 | plt.plot(iters, self.losses, 'red', linewidth = 2, label='train loss') 58 | plt.plot(iters, self.val_loss, 'coral', linewidth = 2, label='val loss') 59 | try: 60 | if len(self.losses) < 25: 61 | num = 5 62 | else: 63 | num = 15 64 | 65 | plt.plot(iters, scipy.signal.savgol_filter(self.losses, num, 3), 'green', linestyle = '--', linewidth = 2, label='smooth train loss') 66 | plt.plot(iters, scipy.signal.savgol_filter(self.val_loss, num, 3), '#8B4513', linestyle = '--', linewidth = 2, label='smooth val loss') 67 | except: 68 | pass 69 | 70 | plt.grid(True) 71 | plt.xlabel('Epoch') 72 | plt.ylabel('Loss') 73 | plt.legend(loc="upper right") 74 | 75 | plt.savefig(os.path.join(self.log_dir, "epoch_loss.png")) 76 | 77 | plt.cla() 78 | plt.close("all") 79 | 80 | class EvalCallback(): 81 | def __init__(self, net, input_shape, anchors, anchors_mask, class_names, num_classes, val_lines, log_dir, cuda, \ 82 | map_out_path=".temp_map_out", max_boxes=100, confidence=0.05, nms_iou=0.5, letterbox_image=False, MINOVERLAP=0.5, eval_flag=True, period=1): 83 | super(EvalCallback, self).__init__() 84 | 85 | self.net = net 86 | self.input_shape = input_shape 87 | self.anchors = anchors 88 | self.anchors_mask = anchors_mask 89 | self.class_names = class_names 90 | self.num_classes = num_classes 91 | self.val_lines = val_lines 92 | self.log_dir = log_dir 93 | self.cuda = cuda 94 | self.map_out_path = map_out_path 95 | self.max_boxes = max_boxes 96 | self.confidence = confidence 97 | self.nms_iou = nms_iou 98 | self.letterbox_image = letterbox_image 99 | self.MINOVERLAP = MINOVERLAP 100 | self.eval_flag = eval_flag 101 | self.period = period 102 | 103 | self.bbox_util = DecodeBox(self.anchors, self.num_classes, (self.input_shape[0], self.input_shape[1]), self.anchors_mask) 104 | 105 | self.maps = [0] 106 | self.epoches = [0] 107 | if self.eval_flag: 108 | with open(os.path.join(self.log_dir, "epoch_map.txt"), 'a') as f: 109 | f.write(str(0)) 110 | f.write("\n") 111 | 112 | def get_map_txt(self, image_id, image, class_names, map_out_path): 113 | f = open(os.path.join(map_out_path, "detection-results/"+image_id+".txt"), "w", encoding='utf-8') 114 | image_shape = np.array(np.shape(image)[0:2]) 115 | #---------------------------------------------------------# 116 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 117 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 118 | #---------------------------------------------------------# 119 | image = cvtColor(image) 120 | #---------------------------------------------------------# 121 | # 给图像增加灰条,实现不失真的resize 122 | # 也可以直接resize进行识别 123 | #---------------------------------------------------------# 124 | image_data = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image) 125 | #---------------------------------------------------------# 126 | # 添加上batch_size维度 127 | #---------------------------------------------------------# 128 | image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0) 129 | 130 | with torch.no_grad(): 131 | 
images = torch.from_numpy(image_data) 132 | if self.cuda: 133 | images = images.cuda() 134 | #---------------------------------------------------------# 135 | # 将图像输入网络当中进行预测! 136 | #---------------------------------------------------------# 137 | outputs = self.net(images) 138 | outputs = self.bbox_util.decode_box(outputs) 139 | #---------------------------------------------------------# 140 | # 将预测框进行堆叠,然后进行非极大抑制 141 | #---------------------------------------------------------# 142 | results = self.bbox_util.non_max_suppression(torch.cat(outputs, 1), self.num_classes, self.input_shape, 143 | image_shape, self.letterbox_image, conf_thres = self.confidence, nms_thres = self.nms_iou) 144 | 145 | if results[0] is None: 146 | return 147 | 148 | top_label = np.array(results[0][:, 7], dtype = 'int32') 149 | top_conf = results[0][:, 5] * results[0][:, 6] 150 | top_rboxes = results[0][:, :5] 151 | top_polys = rbox2poly(top_rboxes) 152 | top_hbbs = poly2hbb(top_polys) 153 | top_100 = np.argsort(top_conf)[::-1][:self.max_boxes] 154 | top_hbbs = top_hbbs[top_100] 155 | top_conf = top_conf[top_100] 156 | top_label = top_label[top_100] 157 | 158 | for i, c in list(enumerate(top_label)): 159 | predicted_class = self.class_names[int(c)] 160 | hbb = top_hbbs[i] 161 | score = str(top_conf[i]) 162 | 163 | xc, yc, w, h = hbb 164 | left = xc - w/2 165 | top = yc - h/2 166 | right = xc + w/2 167 | bottom = yc + h/2 168 | if predicted_class not in class_names: 169 | continue 170 | 171 | f.write("%s %s %s %s %s %s\n" % (predicted_class, score[:6], str(int(left)), str(int(top)), str(int(right)),str(int(bottom)))) 172 | 173 | f.close() 174 | return 175 | 176 | def on_epoch_end(self, epoch, model_eval): 177 | if epoch % self.period == 0 and self.eval_flag: 178 | self.net = model_eval 179 | if not os.path.exists(self.map_out_path): 180 | os.makedirs(self.map_out_path) 181 | if not os.path.exists(os.path.join(self.map_out_path, "ground-truth")): 182 | os.makedirs(os.path.join(self.map_out_path, "ground-truth")) 183 | if not os.path.exists(os.path.join(self.map_out_path, "detection-results")): 184 | os.makedirs(os.path.join(self.map_out_path, "detection-results")) 185 | print("Get map.") 186 | for annotation_line in tqdm(self.val_lines): 187 | line = annotation_line.split() 188 | image_id = os.path.basename(line[0]).split('.')[0] 189 | #------------------------------# 190 | # 读取图像并转换成RGB图像 191 | #------------------------------# 192 | image = Image.open(line[0]) 193 | #------------------------------# 194 | # 获得预测框 195 | #------------------------------# 196 | gt_boxes = np.array([np.array(list(map(float,box.split(',')))) for box in line[1:]]) 197 | #------------------------------# 198 | # 将polygon转换为hbb 199 | #------------------------------# 200 | hbbs = np.zeros((gt_boxes.shape[0], 5)) 201 | hbbs[..., :4] = poly2hbb(gt_boxes[..., :8]) 202 | hbbs[..., 4] = gt_boxes[..., 8] 203 | #------------------------------# 204 | # 获得预测txt 205 | #------------------------------# 206 | self.get_map_txt(image_id, image, self.class_names, self.map_out_path) 207 | 208 | #------------------------------# 209 | # 获得真实框txt 210 | #------------------------------# 211 | with open(os.path.join(self.map_out_path, "ground-truth/"+image_id+".txt"), "w") as new_f: 212 | for hbb in hbbs: 213 | xc, yc, w, h, obj = hbb 214 | left = xc - w/2 215 | top = yc - h/2 216 | right = xc + w/2 217 | bottom = yc + h/2 218 | obj_name = self.class_names[int(obj)] 219 | new_f.write("%s %s %s %s %s\n" % (obj_name, left, top, right, bottom)) 220 | 221 | 
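For mAP, `get_map_txt()` above reduces each rotated prediction (cx, cy, w, h, θ) to a polygon with `rbox2poly` and then to an axis-aligned envelope with `poly2hbb`, and `on_epoch_end()` applies the same reduction to the ground-truth polygons, so both sides of the evaluation are compared as horizontal boxes. utils_rbox.py is not reproduced in this listing, so the NumPy sketch below only illustrates the geometry those helpers presumably implement; the corner ordering and angle convention of the real code may differ.

```python
import numpy as np

def rbox2poly_sketch(rboxes):
    """(n, 5) [cx, cy, w, h, theta(rad)] -> (n, 8) corner polygons (illustrative only)."""
    cx, cy, w, h, theta = [rboxes[:, i] for i in range(5)]
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    # half-extent vectors along the box's own width and height directions
    wx, wy = w / 2 * cos_t, w / 2 * sin_t
    hx, hy = -h / 2 * sin_t, h / 2 * cos_t
    corners = [(cx - wx - hx, cy - wy - hy), (cx + wx - hx, cy + wy - hy),
               (cx + wx + hx, cy + wy + hy), (cx - wx + hx, cy - wy + hy)]
    return np.stack([c for xy in corners for c in xy], axis=-1)

def poly2hbb_sketch(polys):
    """(n, 8) polygons -> (n, 4) horizontal boxes [xc, yc, w, h], the axis-aligned envelope."""
    xs, ys = polys[:, 0::2], polys[:, 1::2]
    x_min, x_max, y_min, y_max = xs.min(1), xs.max(1), ys.min(1), ys.max(1)
    return np.stack([(x_min + x_max) / 2, (y_min + y_max) / 2,
                     x_max - x_min, y_max - y_min], axis=-1)

rbox = np.array([[100.0, 80.0, 60.0, 20.0, np.pi / 6]])    # one rotated box
hbb  = poly2hbb_sketch(rbox2poly_sketch(rbox))
print(hbb)   # [xc, yc, w, h] of the enclosing horizontal box, as written to the txt files
```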
print("Calculate Map.") 222 | try: 223 | temp_map = get_coco_map(class_names = self.class_names, path = self.map_out_path)[1] 224 | except: 225 | temp_map = get_map(self.MINOVERLAP, False, path = self.map_out_path) 226 | self.maps.append(temp_map) 227 | self.epoches.append(epoch) 228 | 229 | with open(os.path.join(self.log_dir, "epoch_map.txt"), 'a') as f: 230 | f.write(str(temp_map)) 231 | f.write("\n") 232 | 233 | plt.figure() 234 | plt.plot(self.epoches, self.maps, 'red', linewidth = 2, label='train map') 235 | 236 | plt.grid(True) 237 | plt.xlabel('Epoch') 238 | plt.ylabel('Map %s'%str(self.MINOVERLAP)) 239 | plt.title('A Map Curve') 240 | plt.legend(loc="upper right") 241 | 242 | plt.savefig(os.path.join(self.log_dir, "epoch_map.png")) 243 | plt.cla() 244 | plt.close("all") 245 | 246 | print("Get map done.") 247 | shutil.rmtree(self.map_out_path) 248 | -------------------------------------------------------------------------------- /utils/dataloader.py: -------------------------------------------------------------------------------- 1 | from random import sample, shuffle 2 | 3 | import cv2 4 | import numpy as np 5 | import torch 6 | from PIL import Image, ImageDraw 7 | from torch.utils.data.dataset import Dataset 8 | 9 | from utils.utils import cvtColor, preprocess_input 10 | from utils.utils_rbox import poly2rbox, rbox2poly 11 | 12 | class YoloDataset(Dataset): 13 | def __init__(self, annotation_lines, input_shape, num_classes, anchors, anchors_mask, epoch_length, \ 14 | mosaic, mixup, mosaic_prob, mixup_prob, train, special_aug_ratio = 0.7): 15 | super(YoloDataset, self).__init__() 16 | self.annotation_lines = annotation_lines 17 | self.input_shape = input_shape 18 | self.num_classes = num_classes 19 | self.anchors = anchors 20 | self.anchors_mask = anchors_mask 21 | self.epoch_length = epoch_length 22 | self.mosaic = mosaic 23 | self.mosaic_prob = mosaic_prob 24 | self.mixup = mixup 25 | self.mixup_prob = mixup_prob 26 | self.train = train 27 | self.special_aug_ratio = special_aug_ratio 28 | 29 | self.epoch_now = -1 30 | self.length = len(self.annotation_lines) 31 | 32 | self.bbox_attrs = 5 + 1 + num_classes 33 | 34 | def __len__(self): 35 | return self.length 36 | 37 | def __getitem__(self, index): 38 | index = index % self.length 39 | 40 | #---------------------------------------------------# 41 | # 训练时进行数据的随机增强 42 | # 验证时不进行数据的随机增强 43 | #---------------------------------------------------# 44 | if self.mosaic and self.rand() < self.mosaic_prob and self.epoch_now < self.epoch_length * self.special_aug_ratio: 45 | lines = sample(self.annotation_lines, 3) 46 | lines.append(self.annotation_lines[index]) 47 | shuffle(lines) 48 | image, rbox = self.get_random_data_with_Mosaic(lines, self.input_shape) 49 | 50 | if self.mixup and self.rand() < self.mixup_prob: 51 | lines = sample(self.annotation_lines, 1) 52 | image_2, rbox_2 = self.get_random_data(lines[0], self.input_shape, random = self.train) 53 | image, rbox = self.get_random_data_with_MixUp(image, rbox, image_2, rbox_2) 54 | else: 55 | image, rbox = self.get_random_data(self.annotation_lines[index], self.input_shape, random = self.train) 56 | 57 | image = np.transpose(preprocess_input(np.array(image, dtype=np.float32)), (2, 0, 1)) 58 | rbox = np.array(rbox, dtype=np.float32) 59 | 60 | #---------------------------------------------------# 61 | # 对真实框进行预处理 62 | #---------------------------------------------------# 63 | nL = len(rbox) 64 | labels_out = np.zeros((nL, 7)) 65 | if nL: 66 | 
#---------------------------------------------------# 67 | # 对真实框进行归一化,调整到0-1之间 68 | #---------------------------------------------------# 69 | rbox[:, [0, 2]] = rbox[:, [0, 2]] / self.input_shape[1] 70 | rbox[:, [1, 3]] = rbox[:, [1, 3]] / self.input_shape[0] 71 | #---------------------------------------------------# 72 | #---------------------------------------------------# 73 | # 调整顺序,符合训练的格式 74 | # labels_out中序号为0的部分在collect时处理 75 | #---------------------------------------------------# 76 | labels_out[:, 1] = rbox[:, -1] 77 | labels_out[:, 2:] = rbox[:, :5] 78 | 79 | return image, labels_out 80 | 81 | def rand(self, a=0, b=1): 82 | return np.random.rand()*(b-a) + a 83 | 84 | def get_random_data(self, annotation_line, input_shape, jitter=.3, hue=.1, sat=0.7, val=0.4, random=True, show=False): 85 | line = annotation_line.split() 86 | #------------------------------# 87 | # 读取图像并转换成RGB图像 88 | #------------------------------# 89 | image = Image.open(line[0]) 90 | image = cvtColor(image) 91 | #------------------------------# 92 | # 获得图像的高宽与目标高宽 93 | #------------------------------# 94 | iw, ih = image.size 95 | h, w = input_shape 96 | #------------------------------# 97 | # 获得预测框 98 | #------------------------------# 99 | box = np.array([np.array(list(map(float,box.split(',')))) for box in line[1:]]) 100 | 101 | if not random: 102 | scale = min(w/iw, h/ih) 103 | nw = int(iw*scale) 104 | nh = int(ih*scale) 105 | dx = (w-nw)//2 106 | dy = (h-nh)//2 107 | 108 | #---------------------------------# 109 | # 将图像多余的部分加上灰条 110 | #---------------------------------# 111 | image = image.resize((nw,nh), Image.BICUBIC) 112 | new_image = Image.new('RGB', (w,h), (128,128,128)) 113 | new_image.paste(image, (dx, dy)) 114 | image_data = np.array(new_image, np.float32) 115 | 116 | #---------------------------------# 117 | # 对真实框进行调整 118 | #---------------------------------# 119 | if len(box)>0: 120 | np.random.shuffle(box) 121 | box[:, [0,2,4,6]] = box[:, [0,2,4,6]]*nw/iw + dx 122 | box[:, [1,3,5,7]] = box[:, [1,3,5,7]]*nh/ih + dy 123 | #------------------------------# 124 | # 将polygon转换为rbox 125 | #------------------------------# 126 | rbox = np.zeros((box.shape[0], 6)) 127 | rbox[..., :5] = poly2rbox(box[..., :8]) 128 | rbox[..., 5] = box[..., 8] 129 | keep = (rbox[:, 0] >= 0) & (rbox[:, 0] < w) \ 130 | & (rbox[:, 1] >= 0) & (rbox[:, 0] < h) \ 131 | & (rbox[:, 2] > 5) | (rbox[:, 3] > 5) 132 | rbox = rbox[keep] 133 | return image_data, rbox 134 | 135 | #------------------------------------------# 136 | # 对图像进行缩放并且进行长和宽的扭曲 137 | #------------------------------------------# 138 | new_ar = iw/ih * self.rand(1-jitter,1+jitter) / self.rand(1-jitter,1+jitter) 139 | scale = self.rand(.25, 2) 140 | if new_ar < 1: 141 | nh = int(scale*h) 142 | nw = int(nh*new_ar) 143 | else: 144 | nw = int(scale*w) 145 | nh = int(nw/new_ar) 146 | image = image.resize((nw,nh), Image.BICUBIC) 147 | 148 | #------------------------------------------# 149 | # 将图像多余的部分加上灰条 150 | #------------------------------------------# 151 | dx = int(self.rand(0, w-nw)) 152 | dy = int(self.rand(0, h-nh)) 153 | new_image = Image.new('RGB', (w,h), (128,128,128)) 154 | new_image.paste(image, (dx, dy)) 155 | image = new_image 156 | #------------------------------------------# 157 | # 翻转图像 158 | #------------------------------------------# 159 | flip = self.rand()<.5 160 | if flip: image = image.transpose(Image.FLIP_LEFT_RIGHT) 161 | 162 | image_data = np.array(image, np.uint8) 163 | #---------------------------------# 164 | # 对图像进行色域变换 165 | # 计算色域变换的参数 166 
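`get_random_data()` above converts each 8-value polygon label into a rotated box with `poly2rbox` before filtering. Since utils_rbox.py is not shown in this listing, the snippet below is only one common way to perform that conversion (via `cv2.minAreaRect`), not the repository's implementation; the actual long-edge and angle-range convention used by `poly2rbox` may differ.

```python
import cv2
import numpy as np

def poly2rbox_sketch(poly):
    """poly: (8,) [x1, y1, ..., x4, y4] -> (cx, cy, w, h, theta) with theta in radians.
    Hedged illustration only, not the repo's utils_rbox.poly2rbox."""
    pts = np.asarray(poly, dtype=np.float32).reshape(4, 2)
    (cx, cy), (w, h), angle_deg = cv2.minAreaRect(pts)
    # OpenCV reports the angle in degrees; the rest of this repo (KLD loss, rotated NMS)
    # works in radians, so convert here.
    return cx, cy, w, h, np.deg2rad(angle_deg)

print(poly2rbox_sketch([0, 0, 40, 0, 40, 20, 0, 20]))   # an axis-aligned 40x20 rectangle
```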
| #---------------------------------# 167 | r = np.random.uniform(-1, 1, 3) * [hue, sat, val] + 1 168 | #---------------------------------# 169 | # 将图像转到HSV上 170 | #---------------------------------# 171 | hue, sat, val = cv2.split(cv2.cvtColor(image_data, cv2.COLOR_RGB2HSV)) 172 | dtype = image_data.dtype 173 | #---------------------------------# 174 | # 应用变换 175 | #---------------------------------# 176 | x = np.arange(0, 256, dtype=r.dtype) 177 | lut_hue = ((x * r[0]) % 180).astype(dtype) 178 | lut_sat = np.clip(x * r[1], 0, 255).astype(dtype) 179 | lut_val = np.clip(x * r[2], 0, 255).astype(dtype) 180 | 181 | image_data = cv2.merge((cv2.LUT(hue, lut_hue), cv2.LUT(sat, lut_sat), cv2.LUT(val, lut_val))) 182 | image_data = cv2.cvtColor(image_data, cv2.COLOR_HSV2RGB) 183 | #---------------------------------# 184 | # 对真实框进行调整 185 | #---------------------------------# 186 | if len(box)>0: 187 | np.random.shuffle(box) 188 | box[:, [0,2,4,6]] = box[:, [0,2,4,6]]*nw/iw + dx 189 | box[:, [1,3,5,7]] = box[:, [1,3,5,7]]*nh/ih + dy 190 | if flip: box[:, [0,2,4,6]] = w - box[:, [0,2,4,6]] 191 | #------------------------------# 192 | # 将polygon转换为rbox 193 | #------------------------------# 194 | rbox = np.zeros((box.shape[0], 6)) 195 | rbox[..., :5] = poly2rbox(box[..., :8]) 196 | rbox[..., 5] = box[..., 8] 197 | keep = (rbox[:, 0] >= 0) & (rbox[:, 0] < w) \ 198 | & (rbox[:, 1] >= 0) & (rbox[:, 0] < h) \ 199 | & (rbox[:, 2] > 5) | (rbox[:, 3] > 5) 200 | rbox = rbox[keep] 201 | #------------------------------# 202 | # 检查旋转框 203 | #------------------------------# 204 | if show: 205 | draw = ImageDraw.Draw(image) 206 | polys = rbox2poly(rbox[..., :5]) 207 | for poly in polys: 208 | draw.polygon(xy=list(poly)) 209 | image.show() 210 | return image_data, rbox 211 | 212 | def merge_rboxes(self, rboxes, cutx, cuty): 213 | merge_rbox = [] 214 | for i in range(len(rboxes)): 215 | for rbox in rboxes[i]: 216 | tmp_rbox = [] 217 | xc, yc, w, h = rbox[0], rbox[1], rbox[2], rbox[3] 218 | tmp_rbox.append(xc) 219 | tmp_rbox.append(yc) 220 | tmp_rbox.append(h) 221 | tmp_rbox.append(w) 222 | tmp_rbox.append(rbox[-1]) 223 | merge_rbox.append(rbox) 224 | merge_rbox = np.array(merge_rbox) 225 | return merge_rbox 226 | 227 | def get_random_data_with_Mosaic(self, annotation_line, input_shape, jitter=0.3, hue=.1, sat=0.7, val=0.4, show=False): 228 | h, w = input_shape 229 | min_offset_x = self.rand(0.3, 0.7) 230 | min_offset_y = self.rand(0.3, 0.7) 231 | 232 | image_datas = [] 233 | rbox_datas = [] 234 | index = 0 235 | for line in annotation_line: 236 | #---------------------------------# 237 | # 每一行进行分割 238 | #---------------------------------# 239 | line_content = line.split() 240 | #---------------------------------# 241 | # 打开图片 242 | #---------------------------------# 243 | image = Image.open(line_content[0]) 244 | image = cvtColor(image) 245 | 246 | #---------------------------------# 247 | # 图片的大小 248 | #---------------------------------# 249 | iw, ih = image.size 250 | #---------------------------------# 251 | # 保存框的位置 252 | #---------------------------------# 253 | box = np.array([np.array(list(map(float,box.split(',')))) for box in line_content[1:]]) 254 | #---------------------------------# 255 | # 是否翻转图片 256 | #---------------------------------# 257 | flip = self.rand()<.5 258 | if flip and len(box)>0: 259 | image = image.transpose(Image.FLIP_LEFT_RIGHT) 260 | box[:, [0,2,4,6]] = iw - box[:, [0,2,4,6]] 261 | #------------------------------------------# 262 | # 对图像进行缩放并且进行长和宽的扭曲 263 | 
#------------------------------------------# 264 | new_ar = iw/ih * self.rand(1-jitter,1+jitter) / self.rand(1-jitter,1+jitter) 265 | scale = self.rand(.4, 1) 266 | if new_ar < 1: 267 | nh = int(scale*h) 268 | nw = int(nh*new_ar) 269 | else: 270 | nw = int(scale*w) 271 | nh = int(nw/new_ar) 272 | image = image.resize((nw, nh), Image.BICUBIC) 273 | 274 | #-----------------------------------------------# 275 | # 将图片进行放置,分别对应四张分割图片的位置 276 | #-----------------------------------------------# 277 | if index == 0: 278 | dx = int(w*min_offset_x) - nw 279 | dy = int(h*min_offset_y) - nh 280 | elif index == 1: 281 | dx = int(w*min_offset_x) - nw 282 | dy = int(h*min_offset_y) 283 | elif index == 2: 284 | dx = int(w*min_offset_x) 285 | dy = int(h*min_offset_y) 286 | elif index == 3: 287 | dx = int(w*min_offset_x) 288 | dy = int(h*min_offset_y) - nh 289 | 290 | new_image = Image.new('RGB', (w,h), (128,128,128)) 291 | new_image.paste(image, (dx, dy)) 292 | image_data = np.array(new_image) 293 | 294 | index = index + 1 295 | rbox_data = [] 296 | #---------------------------------# 297 | # 对rbox进行重新处理 298 | #---------------------------------# 299 | if len(box)>0: 300 | np.random.shuffle(box) 301 | box[:, [0,2,4,6]] = box[:, [0,2,4,6]]*nw/iw + dx 302 | box[:, [1,3,5,7]] = box[:, [1,3,5,7]]*nh/ih + dy 303 | #------------------------------# 304 | # 将polygon转换为rbox 305 | #------------------------------# 306 | rbox = np.zeros((box.shape[0], 6)) 307 | rbox[..., :5] = poly2rbox(box[..., :8]) 308 | rbox[..., 5] = box[..., 8] 309 | keep = (rbox[:, 0] >= 0) & (rbox[:, 0] < w) \ 310 | & (rbox[:, 1] >= 0) & (rbox[:, 0] < h) \ 311 | & (rbox[:, 2] > 5) | (rbox[:, 3] > 5) 312 | rbox = rbox[keep] 313 | rbox_data = np.zeros((len(rbox),6)) 314 | rbox_data[:len(rbox)] = rbox 315 | 316 | image_datas.append(image_data) 317 | rbox_datas.append(rbox_data) 318 | 319 | #---------------------------------# 320 | # 将图片分割,放在一起 321 | #---------------------------------# 322 | cutx = int(w * min_offset_x) 323 | cuty = int(h * min_offset_y) 324 | 325 | new_image = np.zeros([h, w, 3]) 326 | new_image[:cuty, :cutx, :] = image_datas[0][:cuty, :cutx, :] 327 | new_image[cuty:, :cutx, :] = image_datas[1][cuty:, :cutx, :] 328 | new_image[cuty:, cutx:, :] = image_datas[2][cuty:, cutx:, :] 329 | new_image[:cuty, cutx:, :] = image_datas[3][:cuty, cutx:, :] 330 | 331 | new_image = np.array(new_image, np.uint8) 332 | #---------------------------------# 333 | # 对图像进行色域变换 334 | # 计算色域变换的参数 335 | #---------------------------------# 336 | r = np.random.uniform(-1, 1, 3) * [hue, sat, val] + 1 337 | #---------------------------------# 338 | # 将图像转到HSV上 339 | #---------------------------------# 340 | hue, sat, val = cv2.split(cv2.cvtColor(new_image, cv2.COLOR_RGB2HSV)) 341 | dtype = new_image.dtype 342 | #---------------------------------# 343 | # 应用变换 344 | #---------------------------------# 345 | x = np.arange(0, 256, dtype=r.dtype) 346 | lut_hue = ((x * r[0]) % 180).astype(dtype) 347 | lut_sat = np.clip(x * r[1], 0, 255).astype(dtype) 348 | lut_val = np.clip(x * r[2], 0, 255).astype(dtype) 349 | 350 | new_image = cv2.merge((cv2.LUT(hue, lut_hue), cv2.LUT(sat, lut_sat), cv2.LUT(val, lut_val))) 351 | new_image = cv2.cvtColor(new_image, cv2.COLOR_HSV2RGB) 352 | 353 | #---------------------------------# 354 | # 对框进行进一步的处理 355 | #---------------------------------# 356 | new_rboxes = self.merge_rboxes(rbox_datas, cutx, cuty) 357 | #---------------------------------# 358 | # 检查旋转框 359 | #---------------------------------# 360 | if show: 361 | new_img = 
Image.fromarray(new_image) 362 | draw = ImageDraw.Draw(new_img) 363 | polys = rbox2poly(new_rboxes[..., :5]) 364 | for poly in polys: 365 | draw.polygon(xy=list(poly)) 366 | new_img.show() 367 | return new_image, new_rboxes 368 | 369 | def get_random_data_with_MixUp(self, image_1, rbox_1, image_2, rbox_2): 370 | new_image = np.array(image_1, np.float32) * 0.5 + np.array(image_2, np.float32) * 0.5 371 | if len(rbox_1) == 0: 372 | new_rboxes = rbox_2 373 | elif len(rbox_2) == 0: 374 | new_rboxes = rbox_1 375 | else: 376 | new_rboxes = np.concatenate([rbox_1, rbox_2], axis=0) 377 | return new_image, new_rboxes 378 | 379 | 380 | # DataLoader中collate_fn使用 381 | def yolo_dataset_collate(batch): 382 | images = [] 383 | bboxes = [] 384 | for i, (img, box) in enumerate(batch): 385 | images.append(img) 386 | box[:, 0] = i 387 | bboxes.append(box) 388 | 389 | images = torch.from_numpy(np.array(images)).type(torch.FloatTensor) 390 | bboxes = torch.from_numpy(np.concatenate(bboxes, 0)).type(torch.FloatTensor) 391 | return images, bboxes 392 | -------------------------------------------------------------------------------- /utils/kld_loss.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Author: [egrt] 3 | Date: 2023-01-30 18:47:24 4 | LastEditors: Egrt 5 | LastEditTime: 2023-05-26 15:01:10 6 | Description: 7 | ''' 8 | import torch 9 | import torch.nn as nn 10 | 11 | class KLDloss(nn.Module): 12 | 13 | def __init__(self, taf=1.0, fun="sqrt"): 14 | super(KLDloss, self).__init__() 15 | self.fun = fun 16 | self.taf = taf 17 | self.eps = 1e-8 18 | 19 | def forward(self, pred, target): # pred [[x,y,w,h,angle], ...] 20 | #assert pred.shape[0] == target.shape[0] 21 | 22 | pred = pred.view(-1, 5) 23 | target = target.view(-1, 5) 24 | 25 | delta_x = pred[:, 0] - target[:, 0] 26 | delta_y = pred[:, 1] - target[:, 1] 27 | pre_angle_radian = pred[:, 4] 28 | targrt_angle_radian = target[:, 4] 29 | delta_angle_radian = pre_angle_radian - targrt_angle_radian 30 | 31 | kld = 0.5 * ( 32 | 4 * torch.pow( ( delta_x.mul(torch.cos(targrt_angle_radian)) + delta_y.mul(torch.sin(targrt_angle_radian)) ), 2) / torch.pow(target[:, 2], 2) 33 | + 4 * torch.pow( ( delta_y.mul(torch.cos(targrt_angle_radian)) - delta_x.mul(torch.sin(targrt_angle_radian)) ), 2) / torch.pow(target[:, 3], 2) 34 | )\ 35 | + 0.5 * ( 36 | torch.pow(pred[:, 3], 2) / torch.pow(target[:, 2], 2) * torch.pow(torch.sin(delta_angle_radian), 2) 37 | + torch.pow(pred[:, 2], 2) / torch.pow(target[:, 3], 2) * torch.pow(torch.sin(delta_angle_radian), 2) 38 | + torch.pow(pred[:, 3], 2) / torch.pow(target[:, 3], 2) * torch.pow(torch.cos(delta_angle_radian), 2) 39 | + torch.pow(pred[:, 2], 2) / torch.pow(target[:, 2], 2) * torch.pow(torch.cos(delta_angle_radian), 2) 40 | )\ 41 | + 0.5 * ( 42 | torch.log(torch.pow(target[:, 3], 2) / torch.pow(pred[:, 3], 2)) 43 | + torch.log(torch.pow(target[:, 2], 2) / torch.pow(pred[:, 2], 2)) 44 | )\ 45 | - 1.0 46 | 47 | 48 | 49 | if self.fun == "sqrt": 50 | kld = kld.clamp(1e-7).sqrt() 51 | elif self.fun == "log1p": 52 | kld = torch.log1p(kld.clamp(1e-7)) 53 | else: 54 | pass 55 | 56 | kld_loss = 1 - 1 / (self.taf + self.eps + kld) 57 | 58 | return kld_loss 59 | 60 | def compute_kld_loss(targets, preds,taf=1.0,fun='sqrt'): 61 | with torch.no_grad(): 62 | kld_loss_ts_ps = torch.zeros(0, preds.shape[0], device=targets.device) 63 | for target in targets: 64 | target = target.unsqueeze(0).repeat(preds.shape[0], 1) 65 | kld_loss_t_p = kld_loss(preds, target,taf=taf, fun=fun) 66 | kld_loss_ts_ps 
= torch.cat((kld_loss_ts_ps, kld_loss_t_p.unsqueeze(0)), dim=0) 67 | return kld_loss_ts_ps 68 | 69 | 70 | def kld_loss(pred, target, taf=1.0, fun='sqrt'): # pred [[x,y,w,h,angle], ...] 71 | #assert pred.shape[0] == target.shape[0] 72 | 73 | pred = pred.view(-1, 5) 74 | target = target.view(-1, 5) 75 | 76 | delta_x = pred[:, 0] - target[:, 0] 77 | delta_y = pred[:, 1] - target[:, 1] 78 | pre_angle_radian = pred[:, 4] #3.141592653589793 * pred[:, 4] / 180.0 79 | targrt_angle_radian = target[:, 4] #3.141592653589793 * target[:, 4] / 180.0 80 | delta_angle_radian = pre_angle_radian - targrt_angle_radian 81 | 82 | kld = 0.5 * ( 83 | 4 * torch.pow((delta_x.mul(torch.cos(targrt_angle_radian)) + delta_y.mul(torch.sin(targrt_angle_radian))), 84 | 2) / torch.pow(target[:, 2], 2) 85 | + 4 * torch.pow((delta_y.mul(torch.cos(targrt_angle_radian)) - delta_x.mul(torch.sin(targrt_angle_radian))), 86 | 2) / torch.pow(target[:, 3], 2) 87 | ) \ 88 | + 0.5 * ( 89 | torch.pow(pred[:, 3], 2) / torch.pow(target[:, 2], 2) * torch.pow(torch.sin(delta_angle_radian), 2) 90 | + torch.pow(pred[:, 2], 2) / torch.pow(target[:, 3], 2) * torch.pow(torch.sin(delta_angle_radian), 2) 91 | + torch.pow(pred[:, 3], 2) / torch.pow(target[:, 3], 2) * torch.pow(torch.cos(delta_angle_radian), 2) 92 | + torch.pow(pred[:, 2], 2) / torch.pow(target[:, 2], 2) * torch.pow(torch.cos(delta_angle_radian), 2) 93 | ) \ 94 | + 0.5 * ( 95 | torch.log(torch.pow(target[:, 3], 2) / torch.pow(pred[:, 3], 2)) 96 | + torch.log(torch.pow(target[:, 2], 2) / torch.pow(pred[:, 2], 2)) 97 | ) \ 98 | - 1.0 99 | 100 | if fun == "sqrt": 101 | kld = kld.clamp(1e-7).sqrt() 102 | elif fun == "log1p": 103 | kld = torch.log1p(kld.clamp(1e-7)) 104 | else: 105 | pass 106 | 107 | kld_loss = 1 - 1 / (taf + kld) 108 | return kld_loss 109 | 110 | if __name__ == '__main__': 111 | ''' 112 | 测试损失函数 113 | ''' 114 | kld_loss_n = KLDloss(alpha=1,fun='log1p') 115 | pred = torch.tensor([[5, 5, 5, 23, 0.15],[6,6,5,28,0]]).type(torch.float32) 116 | target = torch.tensor([[5, 5, 5, 24, 0],[6,6,5,28,0]]).type(torch.float32) 117 | kld = kld_loss_n(target,pred) -------------------------------------------------------------------------------- /utils/nms_rotated/__init__.py: -------------------------------------------------------------------------------- 1 | from .nms_rotated_wrapper import obb_nms, poly_nms 2 | 3 | __all__ = ['obb_nms', 'poly_nms'] 4 | -------------------------------------------------------------------------------- /utils/nms_rotated/nms_rotated_wrapper.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | 4 | from . import nms_rotated_ext 5 | 6 | def obb_nms(dets, scores, iou_thr, device_id=None): 7 | """ 8 | RIoU NMS - iou_thr. 
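Note that the quick test under `if __name__ == '__main__':` in kld_loss.py above constructs the loss with `KLDloss(alpha=1, fun='log1p')`, but the constructor only accepts `taf` and `fun`, so that call would raise a TypeError. A minimal invocation consistent with the signatures actually defined above:

```python
import torch
from utils.kld_loss import KLDloss   # path as laid out in this repository

kld_loss_fn = KLDloss(taf=1.0, fun='log1p')   # the keyword is 'taf', not 'alpha'

# (cx, cy, w, h, theta-in-radians) boxes, matching forward(pred, target)
pred   = torch.tensor([[5., 5., 5., 23., 0.15], [6., 6., 5., 28., 0.]])
target = torch.tensor([[5., 5., 5., 24., 0.00], [6., 6., 5., 28., 0.]])

loss = kld_loss_fn(pred, target)   # per-box values in (0, 1); smaller means closer boxes
print(loss)
```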
9 | Args: 10 | dets (tensor/array): (num, [cx cy w h θ]) θ∈[-pi/2, pi/2) 11 | scores (tensor/array): (num) 12 | iou_thr (float): (1) 13 | Returns: 14 | dets (tensor): (n_nms, [cx cy w h θ]) 15 | inds (tensor): (n_nms), nms index of dets 16 | """ 17 | if isinstance(dets, torch.Tensor): 18 | is_numpy = False 19 | dets_th = dets 20 | elif isinstance(dets, np.ndarray): 21 | is_numpy = True 22 | device = 'cpu' if device_id is None else f'cuda:{device_id}' 23 | dets_th = torch.from_numpy(dets).to(device) 24 | else: 25 | raise TypeError('dets must be eithr a Tensor or numpy array, ' 26 | f'but got {type(dets)}') 27 | 28 | if dets_th.numel() == 0: # len(dets) 29 | inds = dets_th.new_zeros(0, dtype=torch.int64) 30 | else: 31 | # same bug will happen when bboxes is too small 32 | too_small = dets_th[:, [2, 3]].min(1)[0] < 0.001 # [n] 33 | if too_small.all(): # all the bboxes is too small 34 | inds = dets_th.new_zeros(0, dtype=torch.int64) 35 | else: 36 | ori_inds = torch.arange(dets_th.size(0)) # 0 ~ n-1 37 | ori_inds = ori_inds[~too_small] 38 | dets_th = dets_th[~too_small] # (n_filter, 5) 39 | scores = scores[~too_small] 40 | 41 | inds = nms_rotated_ext.nms_rotated(dets_th, scores, iou_thr) 42 | inds = ori_inds[inds] 43 | 44 | if is_numpy: 45 | inds = inds.cpu().numpy() 46 | return dets[inds, :], inds 47 | 48 | 49 | def poly_nms(dets, iou_thr, device_id=None): 50 | if isinstance(dets, torch.Tensor): 51 | is_numpy = False 52 | dets_th = dets 53 | elif isinstance(dets, np.ndarray): 54 | is_numpy = True 55 | device = 'cpu' if device_id is None else f'cuda:{device_id}' 56 | dets_th = torch.from_numpy(dets).to(device) 57 | else: 58 | raise TypeError('dets must be eithr a Tensor or numpy array, ' 59 | f'but got {type(dets)}') 60 | 61 | if dets_th.device == torch.device('cpu'): 62 | raise NotImplementedError 63 | inds = nms_rotated_ext.nms_poly(dets_th.float(), iou_thr) 64 | 65 | if is_numpy: 66 | inds = inds.cpu().numpy() 67 | return dets[inds, :], inds 68 | 69 | if __name__ == '__main__': 70 | rboxes_opencv = torch.tensor(([136.6, 111.6, 200, 100, -60], 71 | [136.6, 111.6, 100, 200, -30], 72 | [100, 100, 141.4, 141.4, -45], 73 | [100, 100, 141.4, 141.4, -45])) 74 | rboxes_longedge = torch.tensor(([136.6, 111.6, 200, 100, -60], 75 | [136.6, 111.6, 200, 100, 120], 76 | [100, 100, 141.4, 141.4, 45], 77 | [100, 100, 141.4, 141.4, 135])) 78 | -------------------------------------------------------------------------------- /utils/nms_rotated/setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | import os 3 | import subprocess 4 | import time 5 | from setuptools import find_packages, setup 6 | 7 | import torch 8 | from torch.utils.cpp_extension import (BuildExtension, CppExtension, 9 | CUDAExtension) 10 | def make_cuda_ext(name, module, sources, sources_cuda=[]): 11 | 12 | define_macros = [] 13 | extra_compile_args = {'cxx': []} 14 | 15 | if torch.cuda.is_available() or os.getenv('FORCE_CUDA', '0') == '1': 16 | define_macros += [('WITH_CUDA', None)] 17 | extension = CUDAExtension 18 | extra_compile_args['nvcc'] = [ 19 | '-D__CUDA_NO_HALF_OPERATORS__', 20 | '-D__CUDA_NO_HALF_CONVERSIONS__', 21 | '-D__CUDA_NO_HALF2_OPERATORS__', 22 | ] 23 | sources += sources_cuda 24 | else: 25 | print(f'Compiling {name} without CUDA') 26 | extension = CppExtension 27 | # raise EnvironmentError('CUDA is required to compile MMDetection!') 28 | 29 | return extension( 30 | name=f'{module}.{name}', 31 | sources=[os.path.join(*module.split('.'), p) for p in sources], 
32 | define_macros=define_macros, 33 | extra_compile_args=extra_compile_args) 34 | 35 | # python setup.py develop 36 | if __name__ == '__main__': 37 | #write_version_py() 38 | setup( 39 | name='nms_rotated', 40 | ext_modules=[ 41 | make_cuda_ext( 42 | name='nms_rotated_ext', 43 | module='', 44 | sources=[ 45 | 'src/nms_rotated_cpu.cpp', 46 | 'src/nms_rotated_ext.cpp' 47 | ], 48 | sources_cuda=[ 49 | 'src/nms_rotated_cuda.cu', 50 | 'src/poly_nms_cuda.cu', 51 | ]), 52 | ], 53 | cmdclass={'build_ext': BuildExtension}, 54 | zip_safe=False) -------------------------------------------------------------------------------- /utils/nms_rotated/src/box_iou_rotated_utils.h: -------------------------------------------------------------------------------- 1 | // Mortified from 2 | // https://github.com/facebookresearch/detectron2/tree/master/detectron2/layers/csrc/box_iou_rotated 3 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved 4 | #pragma once 5 | 6 | #include 7 | #include 8 | 9 | #if defined(__CUDACC__) || __HCC__ == 1 || __HIP__ == 1 10 | // Designates functions callable from the host (CPU) and the device (GPU) 11 | #define HOST_DEVICE __host__ __device__ 12 | #define HOST_DEVICE_INLINE HOST_DEVICE __forceinline__ 13 | #else 14 | #include 15 | #define HOST_DEVICE 16 | #define HOST_DEVICE_INLINE HOST_DEVICE inline 17 | #endif 18 | 19 | 20 | template 21 | struct RotatedBox { 22 | T x_ctr, y_ctr, w, h, a; 23 | }; 24 | 25 | template 26 | struct Point { 27 | T x, y; 28 | HOST_DEVICE_INLINE Point(const T& px = 0, const T& py = 0) : x(px), y(py) {} 29 | HOST_DEVICE_INLINE Point operator+(const Point& p) const { 30 | return Point(x + p.x, y + p.y); 31 | } 32 | HOST_DEVICE_INLINE Point& operator+=(const Point& p) { 33 | x += p.x; 34 | y += p.y; 35 | return *this; 36 | } 37 | HOST_DEVICE_INLINE Point operator-(const Point& p) const { 38 | return Point(x - p.x, y - p.y); 39 | } 40 | HOST_DEVICE_INLINE Point operator*(const T coeff) const { 41 | return Point(x * coeff, y * coeff); 42 | } 43 | }; 44 | 45 | template 46 | HOST_DEVICE_INLINE T dot_2d(const Point& A, const Point& B) { 47 | return A.x * B.x + A.y * B.y; 48 | } 49 | 50 | // R: result type. can be different from input type 51 | template 52 | HOST_DEVICE_INLINE R cross_2d(const Point& A, const Point& B) { 53 | return static_cast(A.x) * static_cast(B.y) - 54 | static_cast(B.x) * static_cast(A.y); 55 | } 56 | 57 | template 58 | HOST_DEVICE_INLINE void get_rotated_vertices( 59 | const RotatedBox& box, 60 | Point (&pts)[4]) { 61 | // M_PI / 180. 
== 0.01745329251 62 | //double theta = box.a * 0.01745329251; ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 63 | double theta = box.a; 64 | T cosTheta2 = (T)cos(theta) * 0.5f; 65 | T sinTheta2 = (T)sin(theta) * 0.5f; 66 | 67 | // y: top --> down; x: left --> right 68 | pts[0].x = box.x_ctr + sinTheta2 * box.h + cosTheta2 * box.w; 69 | pts[0].y = box.y_ctr + cosTheta2 * box.h - sinTheta2 * box.w; 70 | pts[1].x = box.x_ctr - sinTheta2 * box.h + cosTheta2 * box.w; 71 | pts[1].y = box.y_ctr - cosTheta2 * box.h - sinTheta2 * box.w; 72 | pts[2].x = 2 * box.x_ctr - pts[0].x; 73 | pts[2].y = 2 * box.y_ctr - pts[0].y; 74 | pts[3].x = 2 * box.x_ctr - pts[1].x; 75 | pts[3].y = 2 * box.y_ctr - pts[1].y; 76 | } 77 | 78 | template 79 | HOST_DEVICE_INLINE int get_intersection_points( 80 | const Point (&pts1)[4], 81 | const Point (&pts2)[4], 82 | Point (&intersections)[24]) { 83 | // Line vector 84 | // A line from p1 to p2 is: p1 + (p2-p1)*t, t=[0,1] 85 | Point vec1[4], vec2[4]; 86 | for (int i = 0; i < 4; i++) { 87 | vec1[i] = pts1[(i + 1) % 4] - pts1[i]; 88 | vec2[i] = pts2[(i + 1) % 4] - pts2[i]; 89 | } 90 | 91 | // Line test - test all line combos for intersection 92 | int num = 0; // number of intersections 93 | for (int i = 0; i < 4; i++) { 94 | for (int j = 0; j < 4; j++) { 95 | // Solve for 2x2 Ax=b 96 | T det = cross_2d(vec2[j], vec1[i]); 97 | 98 | // This takes care of parallel lines 99 | if (fabs(det) <= 1e-14) { 100 | continue; 101 | } 102 | 103 | auto vec12 = pts2[j] - pts1[i]; 104 | 105 | T t1 = cross_2d(vec2[j], vec12) / det; 106 | T t2 = cross_2d(vec1[i], vec12) / det; 107 | 108 | if (t1 >= 0.0f && t1 <= 1.0f && t2 >= 0.0f && t2 <= 1.0f) { 109 | intersections[num++] = pts1[i] + vec1[i] * t1; 110 | } 111 | } 112 | } 113 | 114 | // Check for vertices of rect1 inside rect2 115 | { 116 | const auto& AB = vec2[0]; 117 | const auto& DA = vec2[3]; 118 | auto ABdotAB = dot_2d(AB, AB); 119 | auto ADdotAD = dot_2d(DA, DA); 120 | for (int i = 0; i < 4; i++) { 121 | // assume ABCD is the rectangle, and P is the point to be judged 122 | // P is inside ABCD iff. P's projection on AB lies within AB 123 | // and P's projection on AD lies within AD 124 | 125 | auto AP = pts1[i] - pts2[0]; 126 | 127 | auto APdotAB = dot_2d(AP, AB); 128 | auto APdotAD = -dot_2d(AP, DA); 129 | 130 | if ((APdotAB >= 0) && (APdotAD >= 0) && (APdotAB <= ABdotAB) && 131 | (APdotAD <= ADdotAD)) { 132 | intersections[num++] = pts1[i]; 133 | } 134 | } 135 | } 136 | 137 | // Reverse the check - check for vertices of rect2 inside rect1 138 | { 139 | const auto& AB = vec1[0]; 140 | const auto& DA = vec1[3]; 141 | auto ABdotAB = dot_2d(AB, AB); 142 | auto ADdotAD = dot_2d(DA, DA); 143 | for (int i = 0; i < 4; i++) { 144 | auto AP = pts2[i] - pts1[0]; 145 | 146 | auto APdotAB = dot_2d(AP, AB); 147 | auto APdotAD = -dot_2d(AP, DA); 148 | 149 | if ((APdotAB >= 0) && (APdotAD >= 0) && (APdotAB <= ABdotAB) && 150 | (APdotAD <= ADdotAD)) { 151 | intersections[num++] = pts2[i]; 152 | } 153 | } 154 | } 155 | 156 | return num; 157 | } 158 | 159 | template 160 | HOST_DEVICE_INLINE int convex_hull_graham( 161 | const Point (&p)[24], 162 | const int& num_in, 163 | Point (&q)[24], 164 | bool shift_to_zero = false) { 165 | assert(num_in >= 2); 166 | 167 | // Step 1: 168 | // Find point with minimum y 169 | // if more than 1 points have the same minimum y, 170 | // pick the one with the minimum x. 
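As an aside, here is a Python rendering of the corner layout produced by `get_rotated_vertices()` above (θ is consumed directly in radians, as the commented-out degree conversion hints), together with the shoelace-style area that `polygon_area()` computes further down in this header for the clipped intersection hull. This is purely illustrative; the IoU used by rotated NMS is computed by the C++/CUDA code in this directory.

```python
import numpy as np

def rotated_vertices(cx, cy, w, h, theta):
    """Corner layout mirroring get_rotated_vertices() in box_iou_rotated_utils.h."""
    cos2, sin2 = 0.5 * np.cos(theta), 0.5 * np.sin(theta)
    p0 = (cx + sin2 * h + cos2 * w, cy + cos2 * h - sin2 * w)
    p1 = (cx - sin2 * h + cos2 * w, cy - cos2 * h - sin2 * w)
    p2 = (2 * cx - p0[0], 2 * cy - p0[1])   # corner opposite p0
    p3 = (2 * cx - p1[0], 2 * cy - p1[1])   # corner opposite p1
    return np.array([p0, p1, p2, p3])

def shoelace_area(pts):
    """Polygon area via the cross-product sum, as polygon_area() does for the hull."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

corners = rotated_vertices(100.0, 100.0, 141.4, 141.4, np.deg2rad(-45))
print(shoelace_area(corners))   # ~19994 = 141.4 * 141.4, independent of the rotation
```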
171 | int t = 0; 172 | for (int i = 1; i < num_in; i++) { 173 | if (p[i].y < p[t].y || (p[i].y == p[t].y && p[i].x < p[t].x)) { 174 | t = i; 175 | } 176 | } 177 | auto& start = p[t]; // starting point 178 | 179 | // Step 2: 180 | // Subtract starting point from every points (for sorting in the next step) 181 | for (int i = 0; i < num_in; i++) { 182 | q[i] = p[i] - start; 183 | } 184 | 185 | // Swap the starting point to position 0 186 | auto tmp = q[0]; 187 | q[0] = q[t]; 188 | q[t] = tmp; 189 | 190 | // Step 3: 191 | // Sort point 1 ~ num_in according to their relative cross-product values 192 | // (essentially sorting according to angles) 193 | // If the angles are the same, sort according to their distance to origin 194 | T dist[24]; 195 | #if defined(__CUDACC__) || __HCC__ == 1 || __HIP__ == 1 196 | // compute distance to origin before sort, and sort them together with the 197 | // points 198 | for (int i = 0; i < num_in; i++) { 199 | dist[i] = dot_2d(q[i], q[i]); 200 | } 201 | 202 | // CUDA version 203 | // In the future, we can potentially use thrust 204 | // for sorting here to improve speed (though not guaranteed) 205 | for (int i = 1; i < num_in - 1; i++) { 206 | for (int j = i + 1; j < num_in; j++) { 207 | T crossProduct = cross_2d(q[i], q[j]); 208 | if ((crossProduct < -1e-6) || 209 | (fabs(crossProduct) < 1e-6 && dist[i] > dist[j])) { 210 | auto q_tmp = q[i]; 211 | q[i] = q[j]; 212 | q[j] = q_tmp; 213 | auto dist_tmp = dist[i]; 214 | dist[i] = dist[j]; 215 | dist[j] = dist_tmp; 216 | } 217 | } 218 | } 219 | #else 220 | // CPU version 221 | std::sort( 222 | q + 1, q + num_in, [](const Point& A, const Point& B) -> bool { 223 | T temp = cross_2d(A, B); 224 | if (fabs(temp) < 1e-6) { 225 | return dot_2d(A, A) < dot_2d(B, B); 226 | } else { 227 | return temp > 0; 228 | } 229 | }); 230 | // compute distance to origin after sort, since the points are now different. 231 | for (int i = 0; i < num_in; i++) { 232 | dist[i] = dot_2d(q[i], q[i]); 233 | } 234 | #endif 235 | 236 | // Step 4: 237 | // Make sure there are at least 2 points (that don't overlap with each other) 238 | // in the stack 239 | int k; // index of the non-overlapped second point 240 | for (k = 1; k < num_in; k++) { 241 | if (dist[k] > 1e-8) { 242 | break; 243 | } 244 | } 245 | if (k == num_in) { 246 | // We reach the end, which means the convex hull is just one point 247 | q[0] = p[t]; 248 | return 1; 249 | } 250 | q[1] = q[k]; 251 | int m = 2; // 2 points in the stack 252 | // Step 5: 253 | // Finally we can start the scanning process. 254 | // When a non-convex relationship between the 3 points is found 255 | // (either concave shape or duplicated points), 256 | // we pop the previous point from the stack 257 | // until the 3-point relationship is convex again, or 258 | // until the stack only contains two points 259 | for (int i = k + 1; i < num_in; i++) { 260 | while (m > 1) { 261 | auto q1 = q[i] - q[m - 2], q2 = q[m - 1] - q[m - 2]; 262 | // cross_2d() uses FMA and therefore computes round(round(q1.x*q2.y) - 263 | // q2.x*q1.y) So it may not return 0 even when q1==q2. Therefore we 264 | // compare round(q1.x*q2.y) and round(q2.x*q1.y) directly. (round means 265 | // round to nearest floating point). 266 | if (q1.x * q2.y >= q2.x * q1.y) 267 | m--; 268 | else 269 | break; 270 | } 271 | // Using double also helps, but float can solve the issue for now. 
272 | // while (m > 1 && cross_2d(q[i] - q[m - 2], q[m - 1] - q[m - 2]) 273 | // >= 0) { 274 | // m--; 275 | // } 276 | q[m++] = q[i]; 277 | } 278 | 279 | // Step 6 (Optional): 280 | // In general sense we need the original coordinates, so we 281 | // need to shift the points back (reverting Step 2) 282 | // But if we're only interested in getting the area/perimeter of the shape 283 | // We can simply return. 284 | if (!shift_to_zero) { 285 | for (int i = 0; i < m; i++) { 286 | q[i] += start; 287 | } 288 | } 289 | 290 | return m; 291 | } 292 | 293 | template 294 | HOST_DEVICE_INLINE T polygon_area(const Point (&q)[24], const int& m) { 295 | if (m <= 2) { 296 | return 0; 297 | } 298 | 299 | T area = 0; 300 | for (int i = 1; i < m - 1; i++) { 301 | area += fabs(cross_2d(q[i] - q[0], q[i + 1] - q[0])); 302 | } 303 | 304 | return area / 2.0; 305 | } 306 | 307 | template 308 | HOST_DEVICE_INLINE T rotated_boxes_intersection( 309 | const RotatedBox& box1, 310 | const RotatedBox& box2) { 311 | // There are up to 4 x 4 + 4 + 4 = 24 intersections (including dups) returned 312 | // from rotated_rect_intersection_pts 313 | Point intersectPts[24], orderedPts[24]; 314 | 315 | Point pts1[4]; 316 | Point pts2[4]; 317 | get_rotated_vertices(box1, pts1); 318 | get_rotated_vertices(box2, pts2); 319 | 320 | int num = get_intersection_points(pts1, pts2, intersectPts); 321 | 322 | if (num <= 2) { 323 | return 0.0; 324 | } 325 | 326 | // Convex Hull to order the intersection points in clockwise order and find 327 | // the contour area. 328 | int num_convex = convex_hull_graham(intersectPts, num, orderedPts, true); 329 | return polygon_area(orderedPts, num_convex); 330 | } 331 | 332 | 333 | template 334 | HOST_DEVICE_INLINE T 335 | single_box_iou_rotated(T const* const box1_raw, T const* const box2_raw) { 336 | // shift center to the middle point to achieve higher precision in result 337 | RotatedBox box1, box2; 338 | auto center_shift_x = (box1_raw[0] + box2_raw[0]) / 2.0; 339 | auto center_shift_y = (box1_raw[1] + box2_raw[1]) / 2.0; 340 | box1.x_ctr = box1_raw[0] - center_shift_x; 341 | box1.y_ctr = box1_raw[1] - center_shift_y; 342 | box1.w = box1_raw[2]; 343 | box1.h = box1_raw[3]; 344 | box1.a = box1_raw[4]; 345 | box2.x_ctr = box2_raw[0] - center_shift_x; 346 | box2.y_ctr = box2_raw[1] - center_shift_y; 347 | box2.w = box2_raw[2]; 348 | box2.h = box2_raw[3]; 349 | box2.a = box2_raw[4]; 350 | 351 | T area1 = box1.w * box1.h; 352 | T area2 = box2.w * box2.h; 353 | if (area1 < 1e-14 || area2 < 1e-14) { 354 | return 0.f; 355 | } 356 | 357 | T intersection = rotated_boxes_intersection(box1, box2); 358 | T iou = intersection / (area1 + area2 - intersection); 359 | return iou; 360 | } 361 | -------------------------------------------------------------------------------- /utils/nms_rotated/src/nms_rotated_cpu.cpp: -------------------------------------------------------------------------------- 1 | // Modified from 2 | // https://github.com/facebookresearch/detectron2/tree/master/detectron2/layers/csrc/nms_rotated 3 | // Copyright (c) Facebook, Inc. and its affiliates. 
All Rights Reserved 4 | #include 5 | #include "box_iou_rotated_utils.h" 6 | 7 | 8 | template 9 | at::Tensor nms_rotated_cpu_kernel( 10 | const at::Tensor& dets, 11 | const at::Tensor& scores, 12 | const float iou_threshold) { 13 | // nms_rotated_cpu_kernel is modified from torchvision's nms_cpu_kernel, 14 | // however, the code in this function is much shorter because 15 | // we delegate the IoU computation for rotated boxes to 16 | // the single_box_iou_rotated function in box_iou_rotated_utils.h 17 | AT_ASSERTM(dets.device().is_cpu(), "dets must be a CPU tensor"); 18 | AT_ASSERTM(scores.device().is_cpu(), "scores must be a CPU tensor"); 19 | AT_ASSERTM( 20 | dets.scalar_type() == scores.scalar_type(), 21 | "dets should have the same type as scores"); 22 | 23 | if (dets.numel() == 0) { 24 | return at::empty({0}, dets.options().dtype(at::kLong)); 25 | } 26 | 27 | auto order_t = std::get<1>(scores.sort(0, /* descending=*/true)); 28 | 29 | auto ndets = dets.size(0); 30 | at::Tensor suppressed_t = at::zeros({ndets}, dets.options().dtype(at::kByte)); 31 | at::Tensor keep_t = at::zeros({ndets}, dets.options().dtype(at::kLong)); 32 | 33 | auto suppressed = suppressed_t.data_ptr(); 34 | auto keep = keep_t.data_ptr(); 35 | auto order = order_t.data_ptr(); 36 | 37 | int64_t num_to_keep = 0; 38 | 39 | for (int64_t _i = 0; _i < ndets; _i++) { 40 | auto i = order[_i]; 41 | if (suppressed[i] == 1) { 42 | continue; 43 | } 44 | 45 | keep[num_to_keep++] = i; 46 | 47 | for (int64_t _j = _i + 1; _j < ndets; _j++) { 48 | auto j = order[_j]; 49 | if (suppressed[j] == 1) { 50 | continue; 51 | } 52 | 53 | auto ovr = single_box_iou_rotated( 54 | dets[i].data_ptr(), dets[j].data_ptr()); 55 | if (ovr >= iou_threshold) { 56 | suppressed[j] = 1; 57 | } 58 | } 59 | } 60 | return keep_t.narrow(/*dim=*/0, /*start=*/0, /*length=*/num_to_keep); 61 | } 62 | 63 | at::Tensor nms_rotated_cpu( 64 | // input must be contiguous 65 | const at::Tensor& dets, 66 | const at::Tensor& scores, 67 | const float iou_threshold) { 68 | auto result = at::empty({0}, dets.options()); 69 | 70 | AT_DISPATCH_FLOATING_TYPES(dets.scalar_type(), "nms_rotated", [&] { 71 | result = nms_rotated_cpu_kernel(dets, scores, iou_threshold); 72 | }); 73 | return result; 74 | } 75 | -------------------------------------------------------------------------------- /utils/nms_rotated/src/nms_rotated_cuda.cu: -------------------------------------------------------------------------------- 1 | // Modified from 2 | // https://github.com/facebookresearch/detectron2/tree/master/detectron2/layers/csrc/nms_rotated 3 | // Copyright (c) Facebook, Inc. and its affiliates. 
All Rights Reserved 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include "box_iou_rotated_utils.h" 9 | 10 | int const threadsPerBlock = sizeof(unsigned long long) * 8; 11 | 12 | template 13 | __global__ void nms_rotated_cuda_kernel( 14 | const int n_boxes, 15 | const float iou_threshold, 16 | const T* dev_boxes, 17 | unsigned long long* dev_mask) { 18 | // nms_rotated_cuda_kernel is modified from torchvision's nms_cuda_kernel 19 | 20 | const int row_start = blockIdx.y; 21 | const int col_start = blockIdx.x; 22 | 23 | // if (row_start > col_start) return; 24 | 25 | const int row_size = 26 | min(n_boxes - row_start * threadsPerBlock, threadsPerBlock); 27 | const int col_size = 28 | min(n_boxes - col_start * threadsPerBlock, threadsPerBlock); 29 | 30 | // Compared to nms_cuda_kernel, where each box is represented with 4 values 31 | // (x1, y1, x2, y2), each rotated box is represented with 5 values 32 | // (x_center, y_center, width, height, angle_degrees) here. 33 | __shared__ T block_boxes[threadsPerBlock * 5]; 34 | if (threadIdx.x < col_size) { 35 | block_boxes[threadIdx.x * 5 + 0] = 36 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 0]; 37 | block_boxes[threadIdx.x * 5 + 1] = 38 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 1]; 39 | block_boxes[threadIdx.x * 5 + 2] = 40 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 2]; 41 | block_boxes[threadIdx.x * 5 + 3] = 42 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 3]; 43 | block_boxes[threadIdx.x * 5 + 4] = 44 | dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 5 + 4]; 45 | } 46 | __syncthreads(); 47 | 48 | if (threadIdx.x < row_size) { 49 | const int cur_box_idx = threadsPerBlock * row_start + threadIdx.x; 50 | const T* cur_box = dev_boxes + cur_box_idx * 5; 51 | int i = 0; 52 | unsigned long long t = 0; 53 | int start = 0; 54 | if (row_start == col_start) { 55 | start = threadIdx.x + 1; 56 | } 57 | for (i = start; i < col_size; i++) { 58 | // Instead of devIoU used by original horizontal nms, here 59 | // we use the single_box_iou_rotated function from box_iou_rotated_utils.h 60 | if (single_box_iou_rotated(cur_box, block_boxes + i * 5) > 61 | iou_threshold) { 62 | t |= 1ULL << i; 63 | } 64 | } 65 | const int col_blocks = at::cuda::ATenCeilDiv(n_boxes, threadsPerBlock); 66 | dev_mask[cur_box_idx * col_blocks + col_start] = t; 67 | } 68 | } 69 | 70 | 71 | at::Tensor nms_rotated_cuda( 72 | // input must be contiguous 73 | const at::Tensor& dets, 74 | const at::Tensor& scores, 75 | float iou_threshold) { 76 | // using scalar_t = float; 77 | AT_ASSERTM(dets.is_cuda(), "dets must be a CUDA tensor"); 78 | AT_ASSERTM(scores.is_cuda(), "scores must be a CUDA tensor"); 79 | at::cuda::CUDAGuard device_guard(dets.device()); 80 | 81 | auto order_t = std::get<1>(scores.sort(0, /* descending=*/true)); 82 | auto dets_sorted = dets.index_select(0, order_t); 83 | 84 | auto dets_num = dets.size(0); 85 | 86 | const int col_blocks = 87 | at::cuda::ATenCeilDiv(static_cast(dets_num), threadsPerBlock); 88 | 89 | at::Tensor mask = 90 | at::empty({dets_num * col_blocks}, dets.options().dtype(at::kLong)); 91 | 92 | dim3 blocks(col_blocks, col_blocks); 93 | dim3 threads(threadsPerBlock); 94 | cudaStream_t stream = at::cuda::getCurrentCUDAStream(); 95 | 96 | AT_DISPATCH_FLOATING_TYPES( 97 | dets_sorted.scalar_type(), "nms_rotated_kernel_cuda", [&] { 98 | nms_rotated_cuda_kernel<<>>( 99 | dets_num, 100 | iou_threshold, 101 | dets_sorted.data_ptr(), 102 | (unsigned long long*)mask.data_ptr()); 
103 | }); 104 | 105 | at::Tensor mask_cpu = mask.to(at::kCPU); 106 | unsigned long long* mask_host = 107 | (unsigned long long*)mask_cpu.data_ptr(); 108 | 109 | std::vector remv(col_blocks); 110 | memset(&remv[0], 0, sizeof(unsigned long long) * col_blocks); 111 | 112 | at::Tensor keep = 113 | at::empty({dets_num}, dets.options().dtype(at::kLong).device(at::kCPU)); 114 | int64_t* keep_out = keep.data_ptr(); 115 | 116 | int num_to_keep = 0; 117 | for (int i = 0; i < dets_num; i++) { 118 | int nblock = i / threadsPerBlock; 119 | int inblock = i % threadsPerBlock; 120 | 121 | if (!(remv[nblock] & (1ULL << inblock))) { 122 | keep_out[num_to_keep++] = i; 123 | unsigned long long* p = mask_host + i * col_blocks; 124 | for (int j = nblock; j < col_blocks; j++) { 125 | remv[j] |= p[j]; 126 | } 127 | } 128 | } 129 | 130 | AT_CUDA_CHECK(cudaGetLastError()); 131 | return order_t.index( 132 | {keep.narrow(/*dim=*/0, /*start=*/0, /*length=*/num_to_keep) 133 | .to(order_t.device(), keep.scalar_type())}); 134 | } 135 | -------------------------------------------------------------------------------- /utils/nms_rotated/src/nms_rotated_ext.cpp: -------------------------------------------------------------------------------- 1 | // Modified from 2 | // https://github.com/facebookresearch/detectron2/tree/master/detectron2/layers/csrc/nms_rotated 3 | // Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved 4 | #include 5 | #include 6 | 7 | 8 | #ifdef WITH_CUDA 9 | at::Tensor nms_rotated_cuda( 10 | const at::Tensor& dets, 11 | const at::Tensor& scores, 12 | const float iou_threshold); 13 | 14 | at::Tensor poly_nms_cuda( 15 | const at::Tensor boxes, 16 | float nms_overlap_thresh); 17 | #endif 18 | 19 | at::Tensor nms_rotated_cpu( 20 | const at::Tensor& dets, 21 | const at::Tensor& scores, 22 | const float iou_threshold); 23 | 24 | 25 | inline at::Tensor nms_rotated( 26 | const at::Tensor& dets, 27 | const at::Tensor& scores, 28 | const float iou_threshold) { 29 | assert(dets.device().is_cuda() == scores.device().is_cuda()); 30 | if (dets.device().is_cuda()) { 31 | #ifdef WITH_CUDA 32 | return nms_rotated_cuda( 33 | dets.contiguous(), scores.contiguous(), iou_threshold); 34 | #else 35 | AT_ERROR("Not compiled with GPU support"); 36 | #endif 37 | } 38 | return nms_rotated_cpu(dets.contiguous(), scores.contiguous(), iou_threshold); 39 | } 40 | 41 | 42 | inline at::Tensor nms_poly( 43 | const at::Tensor& dets, 44 | const float iou_threshold) { 45 | if (dets.device().is_cuda()) { 46 | #ifdef WITH_CUDA 47 | if (dets.numel() == 0) 48 | return at::empty({0}, dets.options().dtype(at::kLong).device(at::kCPU)); 49 | return poly_nms_cuda(dets, iou_threshold); 50 | #else 51 | AT_ERROR("POLY_NMS is not compiled with GPU support"); 52 | #endif 53 | } 54 | AT_ERROR("POLY_NMS is not implemented on CPU"); 55 | } 56 | 57 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { 58 | m.def("nms_rotated", &nms_rotated, "nms for rotated bboxes"); 59 | m.def("nms_poly", &nms_poly, "nms for poly bboxes"); 60 | } 61 | -------------------------------------------------------------------------------- /utils/nms_rotated/src/poly_nms_cpu.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | 3 | template 4 | at::Tensor poly_nms_cpu_kernel(const at::Tensor& dets, const float threshold) { 5 | 6 | -------------------------------------------------------------------------------- /utils/nms_rotated/src/poly_nms_cuda.cu: 
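nms_rotated_ext.cpp above is what exposes `nms_rotated` and `nms_poly` to Python; the wrapper shown earlier in nms_rotated_wrapper.py reaches them through `obb_nms`. A hedged usage sketch, assuming the extension has been built in place as the `# python setup.py develop` comment in setup.py suggests (the box values here are made up):

```python
# Build the extension first:
#   cd utils/nms_rotated && python setup.py develop
import torch
from utils.nms_rotated import obb_nms

# (cx, cy, w, h, theta) with theta in radians, as documented in nms_rotated_wrapper.py
dets = torch.tensor([[136.6, 111.6, 200., 100., -1.047],   # roughly -60 degrees
                     [136.6, 111.6, 200., 100., -1.040],   # near-duplicate of the first box
                     [300.0, 300.0,  80.,  40.,  0.300]])
scores = torch.tensor([0.9, 0.8, 0.7])

kept_dets, keep_inds = obb_nms(dets, scores, iou_thr=0.5)
print(keep_inds)   # the near-duplicate should be suppressed; CPU or CUDA tensors both work
```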
-------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | 4 | #include 5 | #include 6 | 7 | #include 8 | #include 9 | 10 | #define CUDA_CHECK(condition) \ 11 | /* Code block avoids redefinition of cudaError_t error */ \ 12 | do { \ 13 | cudaError_t error = condition; \ 14 | if (error != cudaSuccess) { \ 15 | std::cout << cudaGetErrorString(error) << std::endl; \ 16 | } \ 17 | } while (0) 18 | 19 | #define DIVUP(m,n) ((m) / (n) + ((m) % (n) > 0)) 20 | int const threadsPerBlock = sizeof(unsigned long long) * 8; 21 | 22 | 23 | #define maxn 10 24 | const double eps=1E-8; 25 | 26 | __device__ inline int sig(float d){ 27 | return(d>1E-8)-(d<-1E-8); 28 | } 29 | 30 | __device__ inline int point_eq(const float2 a, const float2 b) { 31 | return sig(a.x - b.x) == 0 && sig(a.y - b.y)==0; 32 | } 33 | 34 | __device__ inline void point_swap(float2 *a, float2 *b) { 35 | float2 temp = *a; 36 | *a = *b; 37 | *b = temp; 38 | } 39 | 40 | __device__ inline void point_reverse(float2 *first, float2* last) 41 | { 42 | while ((first!=last)&&(first!=--last)) { 43 | point_swap (first,last); 44 | ++first; 45 | } 46 | } 47 | 48 | __device__ inline float cross(float2 o,float2 a,float2 b){ //叉积 49 | return(a.x-o.x)*(b.y-o.y)-(b.x-o.x)*(a.y-o.y); 50 | } 51 | __device__ inline float area(float2* ps,int n){ 52 | ps[n]=ps[0]; 53 | float res=0; 54 | for(int i=0;i0) pp[m++]=p[i]; 75 | if(sig(cross(a,b,p[i]))!=sig(cross(a,b,p[i+1]))) 76 | lineCross(a,b,p[i],p[i+1],pp[m++]); 77 | } 78 | n=0; 79 | for(int i=0;i1&&p[n-1]==p[0])n--; 83 | while(n>1&&point_eq(p[n-1], p[0]))n--; 84 | } 85 | 86 | //---------------华丽的分隔线-----------------// 87 | //返回三角形oab和三角形ocd的有向交面积,o是原点// 88 | __device__ inline float intersectArea(float2 a,float2 b,float2 c,float2 d){ 89 | float2 o = make_float2(0,0); 90 | int s1=sig(cross(o,a,b)); 91 | int s2=sig(cross(o,c,d)); 92 | if(s1==0||s2==0)return 0.0;//退化,面积为0 93 | // if(s1==-1) swap(a,b); 94 | // if(s2==-1) swap(c,d); 95 | if (s1 == -1) point_swap(&a, &b); 96 | if (s2 == -1) point_swap(&c, &d); 97 | float2 p[10]={o,a,b}; 98 | int n=3; 99 | float2 pp[maxn]; 100 | polygon_cut(p,n,o,c,pp); 101 | polygon_cut(p,n,c,d,pp); 102 | polygon_cut(p,n,d,o,pp); 103 | float res=fabs(area(p,n)); 104 | if(s1*s2==-1) res=-res;return res; 105 | } 106 | //求两多边形的交面积 107 | __device__ inline float intersectArea(float2*ps1,int n1,float2*ps2,int n2){ 108 | if(area(ps1,n1)<0) point_reverse(ps1,ps1+n1); 109 | if(area(ps2,n2)<0) point_reverse(ps2,ps2+n2); 110 | ps1[n1]=ps1[0]; 111 | ps2[n2]=ps2[0]; 112 | float res=0; 113 | for(int i=0;i nms_overlap_thresh) { 188 | t |= 1ULL << i; 189 | } 190 | } 191 | const int col_blocks = THCCeilDiv(n_polys, threadsPerBlock); 192 | dev_mask[cur_box_idx * col_blocks + col_start] = t; 193 | } 194 | } 195 | 196 | // boxes is a N x 9 tensor 197 | at::Tensor poly_nms_cuda(const at::Tensor boxes, float nms_overlap_thresh) { 198 | 199 | at::DeviceGuard guard(boxes.device()); 200 | 201 | using scalar_t = float; 202 | AT_ASSERTM(boxes.device().is_cuda(), "boxes must be a CUDA tensor"); 203 | auto scores = boxes.select(1, 8); 204 | auto order_t = std::get<1>(scores.sort(0, /*descending=*/true)); 205 | auto boxes_sorted = boxes.index_select(0, order_t); 206 | 207 | int boxes_num = boxes.size(0); 208 | 209 | const int col_blocks = THCCeilDiv(boxes_num, threadsPerBlock); 210 | 211 | scalar_t* boxes_dev = boxes_sorted.data_ptr(); 212 | 213 | THCState *state = at::globalContext().lazyInitCUDA(); 214 | 215 | unsigned long long* mask_dev = NULL; 216 | 217 | 
mask_dev = (unsigned long long*) THCudaMalloc(state, boxes_num * col_blocks * sizeof(unsigned long long)); 218 | 219 | dim3 blocks(THCCeilDiv(boxes_num, threadsPerBlock), 220 | THCCeilDiv(boxes_num, threadsPerBlock)); 221 | dim3 threads(threadsPerBlock); 222 | poly_nms_kernel<<>>(boxes_num, 223 | nms_overlap_thresh, 224 | boxes_dev, 225 | mask_dev); 226 | 227 | std::vector mask_host(boxes_num * col_blocks); 228 | THCudaCheck(cudaMemcpyAsync( 229 | &mask_host[0], 230 | mask_dev, 231 | sizeof(unsigned long long) * boxes_num * col_blocks, 232 | cudaMemcpyDeviceToHost, 233 | at::cuda::getCurrentCUDAStream() 234 | )); 235 | 236 | std::vector remv(col_blocks); 237 | memset(&remv[0], 0, sizeof(unsigned long long) * col_blocks); 238 | 239 | at::Tensor keep = at::empty({boxes_num}, boxes.options().dtype(at::kLong).device(at::kCPU)); 240 | int64_t* keep_out = keep.data_ptr(); 241 | 242 | int num_to_keep = 0; 243 | for (int i = 0; i < boxes_num; i++) { 244 | int nblock = i / threadsPerBlock; 245 | int inblock = i % threadsPerBlock; 246 | 247 | if (!(remv[nblock] & (1ULL << inblock))) { 248 | keep_out[num_to_keep++] = i; 249 | unsigned long long *p = &mask_host[0] + i * col_blocks; 250 | for (int j = nblock; j < col_blocks; j++) { 251 | remv[j] |= p[j]; 252 | } 253 | } 254 | } 255 | 256 | THCudaFree(state, mask_dev); 257 | 258 | return order_t.index({ 259 | keep.narrow(/*dim=*/0, /*start=*/0, /*length=*/num_to_keep).to( 260 | order_t.device(), keep.scalar_type())}); 261 | } 262 | 263 | -------------------------------------------------------------------------------- /utils/utils.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from PIL import Image 3 | 4 | 5 | #---------------------------------------------------------# 6 | # 将图像转换成RGB图像,防止灰度图在预测时报错。 7 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 8 | #---------------------------------------------------------# 9 | def cvtColor(image): 10 | if len(np.shape(image)) == 3 and np.shape(image)[2] == 3: 11 | return image 12 | else: 13 | image = image.convert('RGB') 14 | return image 15 | 16 | #---------------------------------------------------# 17 | # 对输入图像进行resize 18 | #---------------------------------------------------# 19 | def resize_image(image, size, letterbox_image): 20 | iw, ih = image.size 21 | w, h = size 22 | if letterbox_image: 23 | scale = min(w/iw, h/ih) 24 | nw = int(iw*scale) 25 | nh = int(ih*scale) 26 | 27 | image = image.resize((nw,nh), Image.BICUBIC) 28 | new_image = Image.new('RGB', size, (128,128,128)) 29 | new_image.paste(image, ((w-nw)//2, (h-nh)//2)) 30 | else: 31 | new_image = image.resize((w, h), Image.BICUBIC) 32 | return new_image 33 | 34 | #---------------------------------------------------# 35 | # 获得类 36 | #---------------------------------------------------# 37 | def get_classes(classes_path): 38 | with open(classes_path, encoding='utf-8') as f: 39 | class_names = f.readlines() 40 | class_names = [c.strip() for c in class_names] 41 | return class_names, len(class_names) 42 | 43 | #---------------------------------------------------# 44 | # 获得先验框 45 | #---------------------------------------------------# 46 | def get_anchors(anchors_path): 47 | '''loads the anchors from a file''' 48 | with open(anchors_path, encoding='utf-8') as f: 49 | anchors = f.readline() 50 | anchors = [float(x) for x in anchors.split(',')] 51 | anchors = np.array(anchors).reshape(-1, 2) 52 | return anchors, len(anchors) 53 | 54 | #---------------------------------------------------# 55 | # 获得学习率 56 | 
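A hedged illustration of the anchors-file format that `get_anchors()` above expects: a single line of comma-separated values, read back as (width, height) pairs. The pair values below are the ones quoted in the DecodeBox comments in utils/utils_bbox.py further on; the shipped model_data/yolo_anchors.txt may list different numbers.

```python
import numpy as np

anchors_line = "12,16, 19,36, 40,28, 36,75, 76,55, 72,146, 142,110, 192,243, 459,401"

anchors = np.array([float(x) for x in anchors_line.split(',')]).reshape(-1, 2)
print(anchors.shape)        # (9, 2): nine anchor boxes
print(anchors[[6, 7, 8]])   # anchors_mask [6, 7, 8] selects the largest anchors (stride-32 head)
```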
#---------------------------------------------------# 57 | def get_lr(optimizer): 58 | for param_group in optimizer.param_groups: 59 | return param_group['lr'] 60 | 61 | def preprocess_input(image): 62 | image /= 255.0 63 | return image 64 | 65 | def show_config(**kwargs): 66 | print('Configurations:') 67 | print('-' * 70) 68 | print('|%25s | %40s|' % ('keys', 'values')) 69 | print('-' * 70) 70 | for key, value in kwargs.items(): 71 | print('|%25s | %40s|' % (str(key), str(value))) 72 | print('-' * 70) 73 | 74 | def download_weights(model_dir="./model_data"): 75 | import os 76 | from torch.hub import load_state_dict_from_url 77 | 78 | url = 'https://github.com/bubbliiiing/yolov7-tiny-pytorch/releases/download/v1.0/yolov7_tiny_backbone_weights.pth' 79 | 80 | if not os.path.exists(model_dir): 81 | os.makedirs(model_dir) 82 | load_state_dict_from_url(url, model_dir) -------------------------------------------------------------------------------- /utils/utils_bbox.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import math 4 | from utils.utils_rbox import * 5 | from utils.nms_rotated import obb_nms 6 | 7 | class DecodeBox(): 8 | def __init__(self, anchors, num_classes, input_shape, anchors_mask = [[6,7,8], [3,4,5], [0,1,2]]): 9 | super(DecodeBox, self).__init__() 10 | self.anchors = anchors 11 | self.num_classes = num_classes 12 | self.bbox_attrs = 6 + num_classes 13 | self.input_shape = input_shape 14 | #-----------------------------------------------------------# 15 | # 13x13的特征层对应的anchor是[142, 110],[192, 243],[459, 401] 16 | # 26x26的特征层对应的anchor是[36, 75],[76, 55],[72, 146] 17 | # 52x52的特征层对应的anchor是[12, 16],[19, 36],[40, 28] 18 | #-----------------------------------------------------------# 19 | self.anchors_mask = anchors_mask 20 | 21 | def decode_box(self, inputs): 22 | outputs = [] 23 | for i, input in enumerate(inputs): 24 | #-----------------------------------------------# 25 | # 输入的input一共有三个,他们的shape分别是 26 | # batch_size = 1 27 | # batch_size, 3 * (5 + 1 + 80), 20, 20 28 | # batch_size, 255, 40, 40 29 | # batch_size, 255, 80, 80 30 | #-----------------------------------------------# 31 | batch_size = input.size(0) 32 | input_height = input.size(2) 33 | input_width = input.size(3) 34 | 35 | #-----------------------------------------------# 36 | # 输入为640x640时 37 | # stride_h = stride_w = 32、16、8 38 | #-----------------------------------------------# 39 | stride_h = self.input_shape[0] / input_height 40 | stride_w = self.input_shape[1] / input_width 41 | #-------------------------------------------------# 42 | # 此时获得的scaled_anchors大小是相对于特征层的 43 | #-------------------------------------------------# 44 | scaled_anchors = [(anchor_width / stride_w, anchor_height / stride_h) for anchor_width, anchor_height in self.anchors[self.anchors_mask[i]]] 45 | 46 | #-----------------------------------------------# 47 | # 输入的input一共有三个,他们的shape分别是 48 | # batch_size, 3, 20, 20, 85 49 | # batch_size, 3, 40, 40, 85 50 | # batch_size, 3, 80, 80, 85 51 | #-----------------------------------------------# 52 | prediction = input.view(batch_size, len(self.anchors_mask[i]), 53 | self.bbox_attrs, input_height, input_width).permute(0, 1, 3, 4, 2).contiguous() 54 | 55 | #-----------------------------------------------# 56 | # 先验框的中心位置的调整参数 57 | #-----------------------------------------------# 58 | x = torch.sigmoid(prediction[..., 0]) 59 | y = torch.sigmoid(prediction[..., 1]) 60 | #-----------------------------------------------# 61 
| # 先验框的宽高调整参数 62 | #-----------------------------------------------# 63 | w = torch.sigmoid(prediction[..., 2]) 64 | h = torch.sigmoid(prediction[..., 3]) 65 | #-----------------------------------------------# 66 | # 获取旋转角度 67 | #-----------------------------------------------# 68 | angle = torch.sigmoid(prediction[..., 4]) 69 | #-----------------------------------------------# 70 | # 获得置信度,是否有物体 71 | #-----------------------------------------------# 72 | conf = torch.sigmoid(prediction[..., 5]) 73 | #-----------------------------------------------# 74 | # 种类置信度 75 | #-----------------------------------------------# 76 | pred_cls = torch.sigmoid(prediction[..., 6:]) 77 | 78 | FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor 79 | LongTensor = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor 80 | 81 | #----------------------------------------------------------# 82 | # 生成网格,先验框中心,网格左上角 83 | # batch_size,3,20,20 84 | #----------------------------------------------------------# 85 | grid_x = torch.linspace(0, input_width - 1, input_width).repeat(input_height, 1).repeat( 86 | batch_size * len(self.anchors_mask[i]), 1, 1).view(x.shape).type(FloatTensor) 87 | grid_y = torch.linspace(0, input_height - 1, input_height).repeat(input_width, 1).t().repeat( 88 | batch_size * len(self.anchors_mask[i]), 1, 1).view(y.shape).type(FloatTensor) 89 | 90 | #----------------------------------------------------------# 91 | # 按照网格格式生成先验框的宽高 92 | # batch_size,3,20,20 93 | #----------------------------------------------------------# 94 | anchor_w = FloatTensor(scaled_anchors).index_select(1, LongTensor([0])) 95 | anchor_h = FloatTensor(scaled_anchors).index_select(1, LongTensor([1])) 96 | anchor_w = anchor_w.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(w.shape) 97 | anchor_h = anchor_h.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(h.shape) 98 | 99 | #----------------------------------------------------------# 100 | # 利用预测结果对先验框进行调整 101 | # 首先调整先验框的中心,从先验框中心向右下角偏移 102 | # 再调整先验框的宽高。 103 | # x 0 ~ 1 => 0 ~ 2 => -0.5, 1.5 => 负责一定范围的目标的预测 104 | # y 0 ~ 1 => 0 ~ 2 => -0.5, 1.5 => 负责一定范围的目标的预测 105 | # w 0 ~ 1 => 0 ~ 2 => 0 ~ 4 => 先验框的宽高调节范围为0~4倍 106 | # h 0 ~ 1 => 0 ~ 2 => 0 ~ 4 => 先验框的宽高调节范围为0~4倍 107 | #----------------------------------------------------------# 108 | pred_boxes = FloatTensor(prediction[..., :4].shape) 109 | pred_boxes[..., 0] = x.data * 2. - 0.5 + grid_x 110 | pred_boxes[..., 1] = y.data * 2. 
- 0.5 + grid_y 111 | pred_boxes[..., 2] = (w.data * 2) ** 2 * anchor_w 112 | pred_boxes[..., 3] = (h.data * 2) ** 2 * anchor_h 113 | pred_theta = (angle.data - 0.5) * math.pi 114 | #----------------------------------------------------------# 115 | # 将输出结果归一化成小数的形式 116 | #----------------------------------------------------------# 117 | _scale = torch.Tensor([input_width, input_height, input_width, input_height]).type(FloatTensor) 118 | output = torch.cat((pred_boxes.view(batch_size, -1, 4) / _scale, pred_theta.view(batch_size, -1, 1), 119 | conf.view(batch_size, -1, 1), pred_cls.view(batch_size, -1, self.num_classes)), -1) 120 | outputs.append(output.data) 121 | return outputs 122 | 123 | def non_max_suppression(self, prediction, num_classes, input_shape, image_shape, letterbox_image, conf_thres=0.5, nms_thres=0.4): 124 | #----------------------------------------------------------# 125 | # prediction [batch_size, num_anchors, 85] 126 | #----------------------------------------------------------# 127 | 128 | output = [None for _ in range(len(prediction))] 129 | for i, image_pred in enumerate(prediction): 130 | #----------------------------------------------------------# 131 | # 对种类预测部分取max。 132 | # class_conf [num_anchors, 1] 种类置信度 133 | # class_pred [num_anchors, 1] 种类 134 | #----------------------------------------------------------# 135 | class_conf, class_pred = torch.max(image_pred[:, 6:6 + num_classes], 1, keepdim=True) 136 | 137 | #----------------------------------------------------------# 138 | # 利用置信度进行第一轮筛选 139 | #----------------------------------------------------------# 140 | conf_mask = (image_pred[:, 5] * class_conf[:, 0] >= conf_thres).squeeze() 141 | #----------------------------------------------------------# 142 | # 根据置信度进行预测结果的筛选 143 | #----------------------------------------------------------# 144 | image_pred = image_pred[conf_mask] 145 | class_conf = class_conf[conf_mask] 146 | class_pred = class_pred[conf_mask] 147 | if not image_pred.size(0): 148 | continue 149 | #-------------------------------------------------------------------------# 150 | # detections [num_anchors, 8] 151 | # 8的内容为:x, y, w, h, angle, obj_conf, class_conf, class_pred 152 | #-------------------------------------------------------------------------# 153 | detections = torch.cat((image_pred[:, :6], class_conf.float(), class_pred.float()), 1) 154 | 155 | #------------------------------------------# 156 | # 获得预测结果中包含的所有种类 157 | #------------------------------------------# 158 | unique_labels = detections[:, -1].cpu().unique() 159 | 160 | if prediction.is_cuda: 161 | unique_labels = unique_labels.cuda() 162 | detections = detections.cuda() 163 | 164 | for c in unique_labels: 165 | #------------------------------------------# 166 | # 获得某一类得分筛选后全部的预测结果 167 | #------------------------------------------# 168 | detections_class = detections[detections[:, -1] == c] 169 | 170 | #------------------------------------------# 171 | # 使用官方自带的非极大抑制会速度更快一些! 
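                #-------------------------------------------------------------------------#
                # Each row of detections is [x, y, w, h, angle, obj_conf, class_conf, class_pred];
                # obb_nms (from utils.nms_rotated) receives the rotated boxes [x, y, w, h, angle]
                # together with one score per box, here obj_conf * class_conf.
                #-------------------------------------------------------------------------#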
172 | # 筛选出一定区域内,属于同一种类得分最大的框 173 | #------------------------------------------# 174 | _, keep = obb_nms( 175 | detections_class[:, :5], 176 | detections_class[:, 5] * detections_class[:, 6], 177 | nms_thres 178 | ) 179 | max_detections = detections_class[keep] 180 | 181 | # Add max detections to outputs 182 | output[i] = max_detections if output[i] is None else torch.cat((output[i], max_detections)) 183 | 184 | if output[i] is not None: 185 | output[i] = output[i].cpu().numpy() 186 | output[i][:, :5] = self.yolo_correct_boxes(output[i], input_shape, image_shape, letterbox_image) 187 | return output 188 | 189 | def yolo_correct_boxes(self, output, input_shape, image_shape, letterbox_image): 190 | #-----------------------------------------------------------------# 191 | # 把y轴放前面是因为方便预测框和图像的宽高进行相乘 192 | #-----------------------------------------------------------------# 193 | box_xy = output[..., 0:2] 194 | box_wh = output[..., 2:4] 195 | angle = output[..., 4:5] 196 | box_yx = box_xy[..., ::-1] 197 | box_hw = box_wh[..., ::-1] 198 | input_shape = np.array(input_shape) 199 | image_shape = np.array(image_shape) 200 | 201 | if letterbox_image: 202 | #-----------------------------------------------------------------# 203 | # 这里求出来的offset是图像有效区域相对于图像左上角的偏移情况 204 | # new_shape指的是宽高缩放情况 205 | #-----------------------------------------------------------------# 206 | new_shape = np.round(image_shape * np.min(input_shape/image_shape)) 207 | offset = (input_shape - new_shape)/2./input_shape 208 | scale = input_shape/new_shape 209 | 210 | box_yx = (box_yx - offset) * scale 211 | box_hw *= scale 212 | 213 | box_xy = box_yx[:, ::-1] 214 | box_hw = box_wh[:, ::-1] 215 | 216 | rboxes = np.concatenate([box_xy, box_wh, angle], axis=-1) 217 | rboxes[:, [0, 2]] *= image_shape[1] 218 | rboxes[:, [1, 3]] *= image_shape[0] 219 | return rboxes 220 | 221 | if __name__ == "__main__": 222 | import matplotlib.pyplot as plt 223 | import numpy as np 224 | 225 | #---------------------------------------------------# 226 | # 将预测值的每个特征层调成真实值 227 | #---------------------------------------------------# 228 | def get_anchors_and_decode(input, input_shape, anchors, anchors_mask, num_classes): 229 | #-----------------------------------------------# 230 | # input batch_size, 3 * (5 + 1 + num_classes), 20, 20 231 | #-----------------------------------------------# 232 | batch_size = input.size(0) 233 | input_height = input.size(2) 234 | input_width = input.size(3) 235 | 236 | #-----------------------------------------------# 237 | # 输入为640x640时 input_shape = [640, 640] input_height = 20, input_width = 20 238 | # 640 / 20 = 32 239 | # stride_h = stride_w = 32 240 | #-----------------------------------------------# 241 | stride_h = input_shape[0] / input_height 242 | stride_w = input_shape[1] / input_width 243 | #-------------------------------------------------# 244 | # 此时获得的scaled_anchors大小是相对于特征层的 245 | # anchor_width, anchor_height / stride_h, stride_w 246 | #-------------------------------------------------# 247 | scaled_anchors = [(anchor_width / stride_w, anchor_height / stride_h) for anchor_width, anchor_height in anchors[anchors_mask[2]]] 248 | 249 | #-----------------------------------------------# 250 | # batch_size, 3 * (4 + 1 + num_classes), 20, 20 => 251 | # batch_size, 3, 5 + num_classes, 20, 20 => 252 | # batch_size, 3, 20, 20, 4 + 1 + num_classes 253 | #-----------------------------------------------# 254 | prediction = input.view(batch_size, len(anchors_mask[2]), 255 | num_classes + 6, input_height, 
input_width).permute(0, 1, 3, 4, 2).contiguous() 256 | 257 | #-----------------------------------------------# 258 | # 先验框的中心位置的调整参数 259 | #-----------------------------------------------# 260 | x = torch.sigmoid(prediction[..., 0]) 261 | y = torch.sigmoid(prediction[..., 1]) 262 | #-----------------------------------------------# 263 | # 先验框的宽高调整参数 264 | #-----------------------------------------------# 265 | w = torch.sigmoid(prediction[..., 2]) 266 | h = torch.sigmoid(prediction[..., 3]) 267 | #-----------------------------------------------# 268 | # 获得置信度,是否有物体 0 - 1 269 | #-----------------------------------------------# 270 | conf = torch.sigmoid(prediction[..., 5]) 271 | #-----------------------------------------------# 272 | # 种类置信度 0 - 1 273 | #-----------------------------------------------# 274 | pred_cls = torch.sigmoid(prediction[..., 6:]) 275 | 276 | FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor 277 | LongTensor = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor 278 | 279 | #----------------------------------------------------------# 280 | # 生成网格,先验框中心,网格左上角 281 | # batch_size,3,20,20 282 | # range(20) 283 | # [ 284 | # [0, 1, 2, 3 ……, 19], 285 | # [0, 1, 2, 3 ……, 19], 286 | # …… (20次) 287 | # [0, 1, 2, 3 ……, 19] 288 | # ] * (batch_size * 3) 289 | # [batch_size, 3, 20, 20] 290 | # 291 | # [ 292 | # [0, 1, 2, 3 ……, 19], 293 | # [0, 1, 2, 3 ……, 19], 294 | # …… (20次) 295 | # [0, 1, 2, 3 ……, 19] 296 | # ].T * (batch_size * 3) 297 | # [batch_size, 3, 20, 20] 298 | #----------------------------------------------------------# 299 | grid_x = torch.linspace(0, input_width - 1, input_width).repeat(input_height, 1).repeat( 300 | batch_size * len(anchors_mask[2]), 1, 1).view(x.shape).type(FloatTensor) 301 | grid_y = torch.linspace(0, input_height - 1, input_height).repeat(input_width, 1).t().repeat( 302 | batch_size * len(anchors_mask[2]), 1, 1).view(y.shape).type(FloatTensor) 303 | 304 | #----------------------------------------------------------# 305 | # 按照网格格式生成先验框的宽高 306 | # batch_size, 3, 20 * 20 => batch_size, 3, 20, 20 307 | # batch_size, 3, 20 * 20 => batch_size, 3, 20, 20 308 | #----------------------------------------------------------# 309 | anchor_w = FloatTensor(scaled_anchors).index_select(1, LongTensor([0])) 310 | anchor_h = FloatTensor(scaled_anchors).index_select(1, LongTensor([1])) 311 | anchor_w = anchor_w.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(w.shape) 312 | anchor_h = anchor_h.repeat(batch_size, 1).repeat(1, 1, input_height * input_width).view(h.shape) 313 | 314 | #----------------------------------------------------------# 315 | # 利用预测结果对先验框进行调整 316 | # 首先调整先验框的中心,从先验框中心向右下角偏移 317 | # 再调整先验框的宽高。 318 | # x 0 ~ 1 => 0 ~ 2 => -0.5 ~ 1.5 + grid_x 319 | # y 0 ~ 1 => 0 ~ 2 => -0.5 ~ 1.5 + grid_y 320 | # w 0 ~ 1 => 0 ~ 2 => 0 ~ 4 * anchor_w 321 | # h 0 ~ 1 => 0 ~ 2 => 0 ~ 4 * anchor_h 322 | #----------------------------------------------------------# 323 | pred_boxes = FloatTensor(prediction[..., :4].shape) 324 | pred_boxes[..., 0] = x.data * 2. - 0.5 + grid_x 325 | pred_boxes[..., 1] = y.data * 2. 
- 0.5 + grid_y 326 | pred_boxes[..., 2] = (w.data * 2) ** 2 * anchor_w 327 | pred_boxes[..., 3] = (h.data * 2) ** 2 * anchor_h 328 | 329 | point_h = 5 330 | point_w = 5 331 | 332 | box_xy = pred_boxes[..., 0:2].cpu().numpy() * 32 333 | box_wh = pred_boxes[..., 2:4].cpu().numpy() * 32 334 | grid_x = grid_x.cpu().numpy() * 32 335 | grid_y = grid_y.cpu().numpy() * 32 336 | anchor_w = anchor_w.cpu().numpy() * 32 337 | anchor_h = anchor_h.cpu().numpy() * 32 338 | 339 | fig = plt.figure() 340 | ax = fig.add_subplot(121) 341 | from PIL import Image 342 | img = Image.open("img/street.jpg").resize([640, 640]) 343 | plt.imshow(img, alpha=0.5) 344 | plt.ylim(-30, 650) 345 | plt.xlim(-30, 650) 346 | plt.scatter(grid_x, grid_y) 347 | plt.scatter(point_h * 32, point_w * 32, c='black') 348 | plt.gca().invert_yaxis() 349 | 350 | anchor_left = grid_x - anchor_w / 2 351 | anchor_top = grid_y - anchor_h / 2 352 | 353 | rect1 = plt.Rectangle([anchor_left[0, 0, point_h, point_w],anchor_top[0, 0, point_h, point_w]], \ 354 | anchor_w[0, 0, point_h, point_w],anchor_h[0, 0, point_h, point_w],color="r",fill=False) 355 | rect2 = plt.Rectangle([anchor_left[0, 1, point_h, point_w],anchor_top[0, 1, point_h, point_w]], \ 356 | anchor_w[0, 1, point_h, point_w],anchor_h[0, 1, point_h, point_w],color="r",fill=False) 357 | rect3 = plt.Rectangle([anchor_left[0, 2, point_h, point_w],anchor_top[0, 2, point_h, point_w]], \ 358 | anchor_w[0, 2, point_h, point_w],anchor_h[0, 2, point_h, point_w],color="r",fill=False) 359 | 360 | ax.add_patch(rect1) 361 | ax.add_patch(rect2) 362 | ax.add_patch(rect3) 363 | 364 | ax = fig.add_subplot(122) 365 | plt.imshow(img, alpha=0.5) 366 | plt.ylim(-30, 650) 367 | plt.xlim(-30, 650) 368 | plt.scatter(grid_x, grid_y) 369 | plt.scatter(point_h * 32, point_w * 32, c='black') 370 | plt.scatter(box_xy[0, :, point_h, point_w, 0], box_xy[0, :, point_h, point_w, 1], c='r') 371 | plt.gca().invert_yaxis() 372 | 373 | pre_left = box_xy[...,0] - box_wh[...,0] / 2 374 | pre_top = box_xy[...,1] - box_wh[...,1] / 2 375 | 376 | rect1 = plt.Rectangle([pre_left[0, 0, point_h, point_w], pre_top[0, 0, point_h, point_w]],\ 377 | box_wh[0, 0, point_h, point_w,0], box_wh[0, 0, point_h, point_w,1],color="r",fill=False) 378 | rect2 = plt.Rectangle([pre_left[0, 1, point_h, point_w], pre_top[0, 1, point_h, point_w]],\ 379 | box_wh[0, 1, point_h, point_w,0], box_wh[0, 1, point_h, point_w,1],color="r",fill=False) 380 | rect3 = plt.Rectangle([pre_left[0, 2, point_h, point_w], pre_top[0, 2, point_h, point_w]],\ 381 | box_wh[0, 2, point_h, point_w,0], box_wh[0, 2, point_h, point_w,1],color="r",fill=False) 382 | 383 | ax.add_patch(rect1) 384 | ax.add_patch(rect2) 385 | ax.add_patch(rect3) 386 | 387 | plt.show() 388 | # 389 | feat = torch.from_numpy(np.random.normal(0.2, 0.5, [4, 258, 20, 20])).float() 390 | anchors = np.array([[116, 90], [156, 198], [373, 326], [30,61], [62,45], [59,119], [10,13], [16,30], [33,23]]) 391 | anchors_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]] 392 | get_anchors_and_decode(feat, [640, 640], anchors, anchors_mask, 80) 393 | -------------------------------------------------------------------------------- /utils/utils_fit.py: -------------------------------------------------------------------------------- 1 | import os 2 | 3 | import torch 4 | from tqdm import tqdm 5 | 6 | from utils.utils import get_lr 7 | 8 | def fit_one_epoch(model_train, model, ema, yolo_loss, loss_history, eval_callback, optimizer, epoch, epoch_step, epoch_step_val, gen, gen_val, Epoch, cuda, fp16, scaler, save_period, save_dir, 
local_rank=0): 9 | loss = 0 10 | val_loss = 0 11 | 12 | if local_rank == 0: 13 | print('Start Train') 14 | pbar = tqdm(total=epoch_step,desc=f'Epoch {epoch + 1}/{Epoch}',postfix=dict,mininterval=0.3) 15 | model_train.train() 16 | for iteration, batch in enumerate(gen): 17 | if iteration >= epoch_step: 18 | break 19 | 20 | images, targets = batch[0], batch[1] 21 | with torch.no_grad(): 22 | if cuda: 23 | images = images.cuda(local_rank) 24 | targets = targets.cuda(local_rank) 25 | #----------------------# 26 | # 清零梯度 27 | #----------------------# 28 | optimizer.zero_grad() 29 | if not fp16: 30 | #----------------------# 31 | # 前向传播 32 | #----------------------# 33 | outputs = model_train(images) 34 | loss_value = yolo_loss(outputs, targets, images) 35 | 36 | #----------------------# 37 | # 反向传播 38 | #----------------------# 39 | loss_value.backward() 40 | optimizer.step() 41 | else: 42 | from torch.cuda.amp import autocast 43 | with autocast(): 44 | #----------------------# 45 | # 前向传播 46 | #----------------------# 47 | outputs = model_train(images) 48 | loss_value = yolo_loss(outputs, targets, images) 49 | 50 | #----------------------# 51 | # 反向传播 52 | #----------------------# 53 | scaler.scale(loss_value).backward() 54 | scaler.step(optimizer) 55 | scaler.update() 56 | if ema: 57 | ema.update(model_train) 58 | 59 | loss += loss_value.item() 60 | 61 | if local_rank == 0: 62 | pbar.set_postfix(**{'loss' : loss / (iteration + 1), 63 | 'lr' : get_lr(optimizer)}) 64 | pbar.update(1) 65 | 66 | if local_rank == 0: 67 | pbar.close() 68 | print('Finish Train') 69 | print('Start Validation') 70 | pbar = tqdm(total=epoch_step_val, desc=f'Epoch {epoch + 1}/{Epoch}',postfix=dict,mininterval=0.3) 71 | 72 | if ema: 73 | model_train_eval = ema.ema 74 | else: 75 | model_train_eval = model_train.eval() 76 | 77 | for iteration, batch in enumerate(gen_val): 78 | if iteration >= epoch_step_val: 79 | break 80 | images, targets = batch[0], batch[1] 81 | with torch.no_grad(): 82 | if cuda: 83 | images = images.cuda(local_rank) 84 | targets = targets.cuda(local_rank) 85 | #----------------------# 86 | # 清零梯度 87 | #----------------------# 88 | optimizer.zero_grad() 89 | #----------------------# 90 | # 前向传播 91 | #----------------------# 92 | outputs = model_train_eval(images) 93 | loss_value = yolo_loss(outputs, targets, images) 94 | 95 | val_loss += loss_value.item() 96 | if local_rank == 0: 97 | pbar.set_postfix(**{'val_loss': val_loss / (iteration + 1)}) 98 | pbar.update(1) 99 | 100 | if local_rank == 0: 101 | pbar.close() 102 | print('Finish Validation') 103 | loss_history.append_loss(epoch + 1, loss / epoch_step, val_loss / epoch_step_val) 104 | eval_callback.on_epoch_end(epoch + 1, model_train_eval) 105 | print('Epoch:'+ str(epoch + 1) + '/' + str(Epoch)) 106 | print('Total Loss: %.3f || Val Loss: %.3f ' % (loss / epoch_step, val_loss / epoch_step_val)) 107 | 108 | #-----------------------------------------------# 109 | # 保存权值 110 | #-----------------------------------------------# 111 | if ema: 112 | save_state_dict = ema.ema.state_dict() 113 | else: 114 | save_state_dict = model.state_dict() 115 | 116 | if (epoch + 1) % save_period == 0 or epoch + 1 == Epoch: 117 | torch.save(save_state_dict, os.path.join(save_dir, "ep%03d-loss%.3f-val_loss%.3f.pth" % (epoch + 1, loss / epoch_step, val_loss / epoch_step_val))) 118 | 119 | if len(loss_history.val_loss) <= 1 or (val_loss / epoch_step_val) <= min(loss_history.val_loss): 120 | print('Save best model to best_epoch_weights.pth') 121 | torch.save(save_state_dict, 
os.path.join(save_dir, "best_epoch_weights.pth")) 122 | 123 | torch.save(save_state_dict, os.path.join(save_dir, "last_epoch_weights.pth")) -------------------------------------------------------------------------------- /utils/utils_rbox.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Author: [egrt] 3 | Date: 2023-01-30 19:00:28 4 | LastEditors: Egrt 5 | LastEditTime: 2023-03-13 16:26:23 6 | Description: Oriented Bounding Boxes utils 7 | ''' 8 | 9 | import numpy as np 10 | import math 11 | pi = np.pi 12 | import cv2 13 | import torch 14 | 15 | def poly2rbox(polys): 16 | """ 17 | Trans poly format to rbox format. 18 | Args: 19 | polys (array): (num_gts, [x1 y1 x2 y2 x3 y3 x4 y4]) 20 | Returns: 21 | rboxes (array): (num_gts, [cx cy l s θ]) 22 | """ 23 | assert polys.shape[-1] == 8 24 | rboxes = [] 25 | for poly in polys: 26 | poly = np.float32(poly.reshape(4, 2)) 27 | (x, y), (w, h), angle = cv2.minAreaRect(poly) # θ ∈ [0, 90] 28 | theta = angle / 180 * pi # 转为pi制 29 | # trans opencv format to longedge format θ ∈ [-pi/2, pi/2] 30 | if w < h: 31 | w, h = h, w 32 | theta += np.pi / 2 33 | while not np.pi / 2 > theta >= -np.pi / 2: 34 | if theta >= np.pi / 2: 35 | theta -= np.pi 36 | else: 37 | theta += np.pi 38 | assert np.pi / 2 > theta >= -np.pi / 2 39 | rboxes.append([x, y, w, h, theta]) 40 | return np.array(rboxes) 41 | 42 | def poly2obb_np_le90(poly): 43 | """Convert polygons to oriented bounding boxes. 44 | Args: 45 | polys (ndarray): [x0,y0,x1,y1,x2,y2,x3,y3] 46 | Returns: 47 | obbs (ndarray): [x_ctr,y_ctr,w,h,angle] 48 | """ 49 | bboxps = np.array(poly).reshape((4, 2)) 50 | rbbox = cv2.minAreaRect(bboxps) 51 | x, y, w, h, a = rbbox[0][0], rbbox[0][1], rbbox[1][0], rbbox[1][1], rbbox[2] 52 | if w < 2 or h < 2: 53 | return 54 | a = a / 180 * np.pi 55 | if w < h: 56 | w, h = h, w 57 | a += np.pi / 2 58 | while not np.pi / 2 > a >= -np.pi / 2: 59 | if a >= np.pi / 2: 60 | a -= np.pi 61 | else: 62 | a += np.pi 63 | assert np.pi / 2 > a >= -np.pi / 2 64 | return x, y, w, h, a 65 | 66 | def poly2hbb(polys): 67 | """ 68 | Trans poly format to hbb format 69 | Args: 70 | rboxes (array/tensor): (num_gts, poly) 71 | 72 | Returns: 73 | hbboxes (array/tensor): (num_gts, [xc yc w h]) 74 | """ 75 | assert polys.shape[-1] == 8 76 | if isinstance(polys, torch.Tensor): 77 | x = polys[:, 0::2] # (num, 4) 78 | y = polys[:, 1::2] 79 | x_max = torch.amax(x, dim=1) # (num) 80 | x_min = torch.amin(x, dim=1) 81 | y_max = torch.amax(y, dim=1) 82 | y_min = torch.amin(y, dim=1) 83 | x_ctr, y_ctr = (x_max + x_min) / 2.0, (y_max + y_min) / 2.0 # (num) 84 | h = y_max - y_min # (num) 85 | w = x_max - x_min 86 | x_ctr, y_ctr, w, h = x_ctr.reshape(-1, 1), y_ctr.reshape(-1, 1), w.reshape(-1, 1), h.reshape(-1, 1) # (num, 1) 87 | hbboxes = torch.cat((x_ctr, y_ctr, w, h), dim=1) 88 | else: 89 | x = polys[:, 0::2] # (num, 4) 90 | y = polys[:, 1::2] 91 | x_max = np.amax(x, axis=1) # (num) 92 | x_min = np.amin(x, axis=1) 93 | y_max = np.amax(y, axis=1) 94 | y_min = np.amin(y, axis=1) 95 | x_ctr, y_ctr = (x_max + x_min) / 2.0, (y_max + y_min) / 2.0 # (num) 96 | h = y_max - y_min # (num) 97 | w = x_max - x_min 98 | x_ctr, y_ctr, w, h = x_ctr.reshape(-1, 1), y_ctr.reshape(-1, 1), w.reshape(-1, 1), h.reshape(-1, 1) # (num, 1) 99 | hbboxes = np.concatenate((x_ctr, y_ctr, w, h), axis=1) 100 | return hbboxes 101 | 102 | def rbox2poly(obboxes): 103 | """Convert oriented bounding boxes to polygons. 
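    In this repo the angle is assumed to follow the long-edge (le90) convention of
    poly2rbox / poly2obb_np_le90, i.e. theta in [-pi/2, pi/2) in radians.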
104 | Args: 105 | obbs (ndarray): [x_ctr,y_ctr,w,h,angle] 106 | Returns: 107 | polys (ndarray): [x0,y0,x1,y1,x2,y2,x3,y3] 108 | """ 109 | try: 110 | center, w, h, theta = np.split(obboxes, (2, 3, 4), axis=-1) 111 | except: 112 | results = np.stack([0., 0., 0., 0., 0., 0., 0., 0.], axis=-1) 113 | return results.reshape(1, -1) 114 | Cos, Sin = np.cos(theta), np.sin(theta) 115 | vector1 = np.concatenate([w / 2 * Cos, w / 2 * Sin], axis=-1) 116 | vector2 = np.concatenate([-h / 2 * Sin, h / 2 * Cos], axis=-1) 117 | point1 = center - vector1 - vector2 118 | point2 = center + vector1 - vector2 119 | point3 = center + vector1 + vector2 120 | point4 = center - vector1 + vector2 121 | polys = np.concatenate([point1, point2, point3, point4], axis=-1) 122 | polys = get_best_begin_point(polys) 123 | return polys 124 | 125 | def cal_line_length(point1, point2): 126 | """Calculate the length of line. 127 | Args: 128 | point1 (List): [x,y] 129 | point2 (List): [x,y] 130 | Returns: 131 | length (float) 132 | """ 133 | return math.sqrt( 134 | math.pow(point1[0] - point2[0], 2) + 135 | math.pow(point1[1] - point2[1], 2)) 136 | 137 | 138 | def get_best_begin_point_single(coordinate): 139 | """Get the best begin point of the single polygon. 140 | Args: 141 | coordinate (List): [x1, y1, x2, y2, x3, y3, x4, y4, score] 142 | Returns: 143 | reorder coordinate (List): [x1, y1, x2, y2, x3, y3, x4, y4, score] 144 | """ 145 | x1, y1, x2, y2, x3, y3, x4, y4 = coordinate 146 | xmin = min(x1, x2, x3, x4) 147 | ymin = min(y1, y2, y3, y4) 148 | xmax = max(x1, x2, x3, x4) 149 | ymax = max(y1, y2, y3, y4) 150 | combine = [[[x1, y1], [x2, y2], [x3, y3], [x4, y4]], 151 | [[x2, y2], [x3, y3], [x4, y4], [x1, y1]], 152 | [[x3, y3], [x4, y4], [x1, y1], [x2, y2]], 153 | [[x4, y4], [x1, y1], [x2, y2], [x3, y3]]] 154 | dst_coordinate = [[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax]] 155 | force = 100000000.0 156 | force_flag = 0 157 | for i in range(4): 158 | temp_force = cal_line_length(combine[i][0], dst_coordinate[0]) \ 159 | + cal_line_length(combine[i][1], dst_coordinate[1]) \ 160 | + cal_line_length(combine[i][2], dst_coordinate[2]) \ 161 | + cal_line_length(combine[i][3], dst_coordinate[3]) 162 | if temp_force < force: 163 | force = temp_force 164 | force_flag = i 165 | if force_flag != 0: 166 | pass 167 | return np.hstack( 168 | (np.array(combine[force_flag]).reshape(8))) 169 | 170 | 171 | def get_best_begin_point(coordinates): 172 | """Get the best begin points of polygons. 173 | Args: 174 | coordinate (ndarray): shape(n, 8). 175 | Returns: 176 | reorder coordinate (ndarray): shape(n, 8). 177 | """ 178 | coordinates = list(map(get_best_begin_point_single, coordinates.tolist())) 179 | coordinates = np.array(coordinates) 180 | return coordinates 181 | 182 | def correct_rboxes(rboxes, image_shape): 183 | """将polys按比例进行缩放 184 | Args: 185 | coordinate (ndarray): shape(n, 8). 186 | Returns: 187 | reorder coordinate (ndarray): shape(n, 8). 
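        Note: rboxes here are assumed to be (n, 5) [cx, cy, w, h, angle] in normalized
        image coordinates; they are converted to corner polygons with rbox2poly and then
        scaled to pixel coordinates by the image width and height.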
188 | """ 189 | polys = rbox2poly(rboxes) 190 | nh, nw = image_shape 191 | polys[:, [0, 2, 4, 6]] *= nw 192 | polys[:, [1, 3, 5, 7]] *= nh 193 | 194 | return polys 195 | -------------------------------------------------------------------------------- /utils_coco/coco_annotation.py: -------------------------------------------------------------------------------- 1 | #-------------------------------------------------------# 2 | # 用于处理COCO数据集,根据json文件生成txt文件用于训练 3 | #-------------------------------------------------------# 4 | import json 5 | import os 6 | from collections import defaultdict 7 | 8 | #-------------------------------------------------------# 9 | # 指向了COCO训练集与验证集图片的路径 10 | #-------------------------------------------------------# 11 | train_datasets_path = "coco_dataset/train2017" 12 | val_datasets_path = "coco_dataset/val2017" 13 | 14 | #-------------------------------------------------------# 15 | # 指向了COCO训练集与验证集标签的路径 16 | #-------------------------------------------------------# 17 | train_annotation_path = "coco_dataset/annotations/instances_train2017.json" 18 | val_annotation_path = "coco_dataset/annotations/instances_val2017.json" 19 | 20 | #-------------------------------------------------------# 21 | # 生成的txt文件路径 22 | #-------------------------------------------------------# 23 | train_output_path = "coco_train.txt" 24 | val_output_path = "coco_val.txt" 25 | 26 | if __name__ == "__main__": 27 | name_box_id = defaultdict(list) 28 | id_name = dict() 29 | f = open(train_annotation_path, encoding='utf-8') 30 | data = json.load(f) 31 | 32 | annotations = data['annotations'] 33 | for ant in annotations: 34 | id = ant['image_id'] 35 | name = os.path.join(train_datasets_path, '%012d.jpg' % id) 36 | cat = ant['category_id'] 37 | if cat >= 1 and cat <= 11: 38 | cat = cat - 1 39 | elif cat >= 13 and cat <= 25: 40 | cat = cat - 2 41 | elif cat >= 27 and cat <= 28: 42 | cat = cat - 3 43 | elif cat >= 31 and cat <= 44: 44 | cat = cat - 5 45 | elif cat >= 46 and cat <= 65: 46 | cat = cat - 6 47 | elif cat == 67: 48 | cat = cat - 7 49 | elif cat == 70: 50 | cat = cat - 9 51 | elif cat >= 72 and cat <= 82: 52 | cat = cat - 10 53 | elif cat >= 84 and cat <= 90: 54 | cat = cat - 11 55 | name_box_id[name].append([ant['bbox'], cat]) 56 | 57 | f = open(train_output_path, 'w') 58 | for key in name_box_id.keys(): 59 | f.write(key) 60 | box_infos = name_box_id[key] 61 | for info in box_infos: 62 | x_min = int(info[0][0]) 63 | y_min = int(info[0][1]) 64 | x_max = x_min + int(info[0][2]) 65 | y_max = y_min + int(info[0][3]) 66 | 67 | box_info = " %d,%d,%d,%d,%d" % ( 68 | x_min, y_min, x_max, y_max, int(info[1])) 69 | f.write(box_info) 70 | f.write('\n') 71 | f.close() 72 | 73 | name_box_id = defaultdict(list) 74 | id_name = dict() 75 | f = open(val_annotation_path, encoding='utf-8') 76 | data = json.load(f) 77 | 78 | annotations = data['annotations'] 79 | for ant in annotations: 80 | id = ant['image_id'] 81 | name = os.path.join(val_datasets_path, '%012d.jpg' % id) 82 | cat = ant['category_id'] 83 | if cat >= 1 and cat <= 11: 84 | cat = cat - 1 85 | elif cat >= 13 and cat <= 25: 86 | cat = cat - 2 87 | elif cat >= 27 and cat <= 28: 88 | cat = cat - 3 89 | elif cat >= 31 and cat <= 44: 90 | cat = cat - 5 91 | elif cat >= 46 and cat <= 65: 92 | cat = cat - 6 93 | elif cat == 67: 94 | cat = cat - 7 95 | elif cat == 70: 96 | cat = cat - 9 97 | elif cat >= 72 and cat <= 82: 98 | cat = cat - 10 99 | elif cat >= 84 and cat <= 90: 100 | cat = cat - 11 101 | name_box_id[name].append([ant['bbox'], cat]) 
102 | 103 | f = open(val_output_path, 'w') 104 | for key in name_box_id.keys(): 105 | f.write(key) 106 | box_infos = name_box_id[key] 107 | for info in box_infos: 108 | x_min = int(info[0][0]) 109 | y_min = int(info[0][1]) 110 | x_max = x_min + int(info[0][2]) 111 | y_max = y_min + int(info[0][3]) 112 | 113 | box_info = " %d,%d,%d,%d,%d" % ( 114 | x_min, y_min, x_max, y_max, int(info[1])) 115 | f.write(box_info) 116 | f.write('\n') 117 | f.close() 118 | -------------------------------------------------------------------------------- /utils_coco/get_map_coco.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | 4 | import numpy as np 5 | import torch 6 | from PIL import Image 7 | from pycocotools.coco import COCO 8 | from pycocotools.cocoeval import COCOeval 9 | from tqdm import tqdm 10 | 11 | from utils.utils import cvtColor, preprocess_input, resize_image 12 | from yolo import YOLO 13 | 14 | #---------------------------------------------------------------------------# 15 | # map_mode用于指定该文件运行时计算的内容 16 | # map_mode为0代表整个map计算流程,包括获得预测结果、计算map。 17 | # map_mode为1代表仅仅获得预测结果。 18 | # map_mode为2代表仅仅获得计算map。 19 | #---------------------------------------------------------------------------# 20 | map_mode = 0 21 | #-------------------------------------------------------# 22 | # 指向了验证集标签与图片路径 23 | #-------------------------------------------------------# 24 | cocoGt_path = 'coco_dataset/annotations/instances_val2017.json' 25 | dataset_img_path = 'coco_dataset/val2017' 26 | #-------------------------------------------------------# 27 | # 结果输出的文件夹,默认为map_out 28 | #-------------------------------------------------------# 29 | temp_save_path = 'map_out/coco_eval' 30 | 31 | class mAP_YOLO(YOLO): 32 | #---------------------------------------------------# 33 | # 检测图片 34 | #---------------------------------------------------# 35 | def detect_image(self, image_id, image, results, clsid2catid): 36 | #---------------------------------------------------# 37 | # 计算输入图片的高和宽 38 | #---------------------------------------------------# 39 | image_shape = np.array(np.shape(image)[0:2]) 40 | #---------------------------------------------------------# 41 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 42 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 43 | #---------------------------------------------------------# 44 | image = cvtColor(image) 45 | #---------------------------------------------------------# 46 | # 给图像增加灰条,实现不失真的resize 47 | # 也可以直接resize进行识别 48 | #---------------------------------------------------------# 49 | image_data = resize_image(image, (self.input_shape[1],self.input_shape[0]), self.letterbox_image) 50 | #---------------------------------------------------------# 51 | # 添加上batch_size维度 52 | #---------------------------------------------------------# 53 | image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0) 54 | 55 | with torch.no_grad(): 56 | images = torch.from_numpy(image_data) 57 | if self.cuda: 58 | images = images.cuda() 59 | #---------------------------------------------------------# 60 | # 将图像输入网络当中进行预测! 
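            # Forward pass -> decode_box restores boxes per feature level ->
            # non_max_suppression filters by confidence and applies rotated NMS.
            # Each kept detection is then appended to results as a COCO-style dict
            # {"image_id", "category_id", "bbox", "score"}.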
61 | #---------------------------------------------------------# 62 | outputs = self.net(images) 63 | outputs = self.bbox_util.decode_box(outputs) 64 | #---------------------------------------------------------# 65 | # 将预测框进行堆叠,然后进行非极大抑制 66 | #---------------------------------------------------------# 67 | outputs = self.bbox_util.non_max_suppression(torch.cat(outputs, 1), self.num_classes, self.input_shape, 68 | image_shape, self.letterbox_image, conf_thres = self.confidence, nms_thres = self.nms_iou) 69 | 70 | if outputs[0] is None: 71 | return results 72 | 73 | top_label = np.array(outputs[0][:, 6], dtype = 'int32') 74 | top_conf = outputs[0][:, 4] * outputs[0][:, 5] 75 | top_boxes = outputs[0][:, :4] 76 | 77 | for i, c in enumerate(top_label): 78 | result = {} 79 | top, left, bottom, right = top_boxes[i] 80 | 81 | result["image_id"] = int(image_id) 82 | result["category_id"] = clsid2catid[c] 83 | result["bbox"] = [float(left),float(top),float(right-left),float(bottom-top)] 84 | result["score"] = float(top_conf[i]) 85 | results.append(result) 86 | return results 87 | 88 | if __name__ == "__main__": 89 | if not os.path.exists(temp_save_path): 90 | os.makedirs(temp_save_path) 91 | 92 | cocoGt = COCO(cocoGt_path) 93 | ids = list(cocoGt.imgToAnns.keys()) 94 | clsid2catid = cocoGt.getCatIds() 95 | 96 | if map_mode == 0 or map_mode == 1: 97 | yolo = mAP_YOLO(confidence = 0.001, nms_iou = 0.65) 98 | 99 | with open(os.path.join(temp_save_path, 'eval_results.json'),"w") as f: 100 | results = [] 101 | for image_id in tqdm(ids): 102 | image_path = os.path.join(dataset_img_path, cocoGt.loadImgs(image_id)[0]['file_name']) 103 | image = Image.open(image_path) 104 | results = yolo.detect_image(image_id, image, results, clsid2catid) 105 | json.dump(results, f) 106 | 107 | if map_mode == 0 or map_mode == 2: 108 | cocoDt = cocoGt.loadRes(os.path.join(temp_save_path, 'eval_results.json')) 109 | cocoEval = COCOeval(cocoGt, cocoDt, 'bbox') 110 | cocoEval.evaluate() 111 | cocoEval.accumulate() 112 | cocoEval.summarize() 113 | print("Get map done.") 114 | -------------------------------------------------------------------------------- /voc_annotation.py: -------------------------------------------------------------------------------- 1 | import os 2 | import random 3 | import xml.etree.ElementTree as ET 4 | 5 | import numpy as np 6 | from utils.utils_rbox import * 7 | from utils.utils import get_classes 8 | 9 | #--------------------------------------------------------------------------------------------------------------------------------# 10 | # annotation_mode用于指定该文件运行时计算的内容 11 | # annotation_mode为0代表整个标签处理过程,包括获得VOCdevkit/VOC2007/ImageSets里面的txt以及训练用的2007_train.txt、2007_val.txt 12 | # annotation_mode为1代表获得VOCdevkit/VOC2007/ImageSets里面的txt 13 | # annotation_mode为2代表获得训练用的2007_train.txt、2007_val.txt 14 | #--------------------------------------------------------------------------------------------------------------------------------# 15 | annotation_mode = 0 16 | #-------------------------------------------------------------------# 17 | # 必须要修改,用于生成2007_train.txt、2007_val.txt的目标信息 18 | # 与训练和预测所用的classes_path一致即可 19 | # 如果生成的2007_train.txt里面没有目标信息 20 | # 那么就是因为classes没有设定正确 21 | # 仅在annotation_mode为0和2的时候有效 22 | #-------------------------------------------------------------------# 23 | classes_path = 'model_data/uav_classes.txt' 24 | #--------------------------------------------------------------------------------------------------------------------------------# 25 | # 
trainval_percent用于指定(训练集+验证集)与测试集的比例,默认情况下 (训练集+验证集):测试集 = 9:1 26 | # train_percent用于指定(训练集+验证集)中训练集与验证集的比例,默认情况下 训练集:验证集 = 9:1 27 | # 仅在annotation_mode为0和1的时候有效 28 | #--------------------------------------------------------------------------------------------------------------------------------# 29 | trainval_percent = 0.9 30 | train_percent = 0.9 31 | #-------------------------------------------------------# 32 | # 指向VOC数据集所在的文件夹 33 | # 默认指向根目录下的VOC数据集 34 | #-------------------------------------------------------# 35 | VOCdevkit_path = 'VOCdevkit' 36 | 37 | VOCdevkit_sets = [('2007', 'train'), ('2007', 'val')] 38 | classes, _ = get_classes(classes_path) 39 | 40 | #-------------------------------------------------------# 41 | # 统计目标数量 42 | #-------------------------------------------------------# 43 | photo_nums = np.zeros(len(VOCdevkit_sets)) 44 | nums = np.zeros(len(classes)) 45 | def convert_annotation(year, image_id, list_file): 46 | in_file = open(os.path.join(VOCdevkit_path, 'VOC%s/Annotations/%s.xml'%(year, image_id)), encoding='utf-8') 47 | tree=ET.parse(in_file) 48 | root = tree.getroot() 49 | 50 | for obj in root.iter('object'): 51 | difficult = 0 52 | if obj.find('difficult')!=None: 53 | difficult = obj.find('difficult').text 54 | cls = obj.find('name').text 55 | if cls not in classes or int(difficult)==1: 56 | continue 57 | cls_id = classes.index(cls) 58 | xmlbox = obj.find('robndbox') 59 | cx = float(xmlbox.find('cx').text) 60 | cy = float(xmlbox.find('cy').text) 61 | h = float(xmlbox.find('h').text) 62 | w = float(xmlbox.find('w').text) 63 | angle = float(xmlbox.find('angle').text) 64 | b = np.array([[cx, cy, w, h, angle]], dtype=np.float32) 65 | b = rbox2poly(b)[0] 66 | b = (b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7]) 67 | list_file.write(" " + ",".join([str(a) for a in b]) + ',' + str(cls_id)) 68 | 69 | nums[classes.index(cls)] = nums[classes.index(cls)] + 1 70 | 71 | if __name__ == "__main__": 72 | random.seed(0) 73 | if " " in os.path.abspath(VOCdevkit_path): 74 | raise ValueError("数据集存放的文件夹路径与图片名称中不可以存在空格,否则会影响正常的模型训练,请注意修改。") 75 | 76 | if annotation_mode == 0 or annotation_mode == 1: 77 | print("Generate txt in ImageSets.") 78 | xmlfilepath = os.path.join(VOCdevkit_path, 'VOC2007/Annotations') 79 | saveBasePath = os.path.join(VOCdevkit_path, 'VOC2007/ImageSets/Main') 80 | temp_xml = os.listdir(xmlfilepath) 81 | total_xml = [] 82 | for xml in temp_xml: 83 | if xml.endswith(".xml"): 84 | total_xml.append(xml) 85 | 86 | num = len(total_xml) 87 | list = range(num) 88 | tv = int(num*trainval_percent) 89 | tr = int(tv*train_percent) 90 | trainval= random.sample(list,tv) 91 | train = random.sample(trainval,tr) 92 | 93 | print("train and val size",tv) 94 | print("train size",tr) 95 | ftrainval = open(os.path.join(saveBasePath,'trainval.txt'), 'w') 96 | ftest = open(os.path.join(saveBasePath,'test.txt'), 'w') 97 | ftrain = open(os.path.join(saveBasePath,'train.txt'), 'w') 98 | fval = open(os.path.join(saveBasePath,'val.txt'), 'w') 99 | 100 | for i in list: 101 | name=total_xml[i][:-4]+'\n' 102 | if i in trainval: 103 | ftrainval.write(name) 104 | if i in train: 105 | ftrain.write(name) 106 | else: 107 | fval.write(name) 108 | else: 109 | ftest.write(name) 110 | 111 | ftrainval.close() 112 | ftrain.close() 113 | fval.close() 114 | ftest.close() 115 | print("Generate txt in ImageSets done.") 116 | 117 | if annotation_mode == 0 or annotation_mode == 2: 118 | print("Generate 2007_train.txt and 2007_val.txt for train.") 119 | type_index = 0 120 | for year, image_set in 
VOCdevkit_sets: 121 | image_ids = open(os.path.join(VOCdevkit_path, 'VOC%s/ImageSets/Main/%s.txt'%(year, image_set)), encoding='utf-8').read().strip().split() 122 | list_file = open('%s_%s.txt'%(year, image_set), 'w', encoding='utf-8') 123 | for image_id in image_ids: 124 | list_file.write('%s/VOC%s/JPEGImages/%s.jpg'%(os.path.abspath(VOCdevkit_path), year, image_id)) 125 | 126 | convert_annotation(year, image_id, list_file) 127 | list_file.write('\n') 128 | photo_nums[type_index] = len(image_ids) 129 | type_index += 1 130 | list_file.close() 131 | print("Generate 2007_train.txt and 2007_val.txt for train done.") 132 | 133 | def printTable(List1, List2): 134 | for i in range(len(List1[0])): 135 | print("|", end=' ') 136 | for j in range(len(List1)): 137 | print(List1[j][i].rjust(int(List2[j])), end=' ') 138 | print("|", end=' ') 139 | print() 140 | 141 | str_nums = [str(int(x)) for x in nums] 142 | tableData = [ 143 | classes, str_nums 144 | ] 145 | colWidths = [0]*len(tableData) 146 | len1 = 0 147 | for i in range(len(tableData)): 148 | for j in range(len(tableData[i])): 149 | if len(tableData[i][j]) > colWidths[i]: 150 | colWidths[i] = len(tableData[i][j]) 151 | printTable(tableData, colWidths) 152 | 153 | if photo_nums[0] <= 500: 154 | print("训练集数量小于500,属于较小的数据量,请注意设置较大的训练世代(Epoch)以满足足够的梯度下降次数(Step)。") 155 | 156 | if np.sum(nums) == 0: 157 | print("在数据集中并未获得任何目标,请注意修改classes_path对应自己的数据集,并且保证标签名字正确,否则训练将会没有任何效果!") 158 | print("在数据集中并未获得任何目标,请注意修改classes_path对应自己的数据集,并且保证标签名字正确,否则训练将会没有任何效果!") 159 | print("在数据集中并未获得任何目标,请注意修改classes_path对应自己的数据集,并且保证标签名字正确,否则训练将会没有任何效果!") 160 | print("(重要的事情说三遍)。") 161 | -------------------------------------------------------------------------------- /yolo.py: -------------------------------------------------------------------------------- 1 | import colorsys 2 | import os 3 | import time 4 | 5 | import numpy as np 6 | import torch 7 | import torch.nn as nn 8 | from PIL import ImageDraw, ImageFont 9 | 10 | from nets.yolo import YoloBody 11 | from utils.utils import (cvtColor, get_anchors, get_classes, preprocess_input, 12 | resize_image, show_config) 13 | from utils.utils_bbox import DecodeBox 14 | from utils.utils_rbox import * 15 | ''' 16 | 训练自己的数据集必看注释! 17 | ''' 18 | class YOLO(object): 19 | _defaults = { 20 | #--------------------------------------------------------------------------# 21 | # 使用自己训练好的模型进行预测一定要修改model_path和classes_path! 
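    # Any of these defaults can also be overridden per instance with keyword arguments,
    # e.g. mAP_YOLO(confidence=0.001, nms_iou=0.65) in utils_coco/get_map_coco.py.
    #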
22 | # model_path指向logs文件夹下的权值文件,classes_path指向model_data下的txt 23 | # 24 | # 训练好后logs文件夹下存在多个权值文件,选择验证集损失较低的即可。 25 | # 验证集损失较低不代表mAP较高,仅代表该权值在验证集上泛化性能较好。 26 | # 如果出现shape不匹配,同时要注意训练时的model_path和classes_path参数的修改 27 | #--------------------------------------------------------------------------# 28 | "model_path" : 'model_data/yolov7_tiny_obb_uav.pth', 29 | "classes_path" : 'model_data/uav_classes.txt', 30 | #---------------------------------------------------------------------# 31 | # anchors_path代表先验框对应的txt文件,一般不修改。 32 | # anchors_mask用于帮助代码找到对应的先验框,一般不修改。 33 | #---------------------------------------------------------------------# 34 | "anchors_path" : 'model_data/yolo_anchors.txt', 35 | "anchors_mask" : [[6, 7, 8], [3, 4, 5], [0, 1, 2]], 36 | #---------------------------------------------------------------------# 37 | # 输入图片的大小,必须为32的倍数。 38 | #---------------------------------------------------------------------# 39 | "input_shape" : [640, 640], 40 | #---------------------------------------------------------------------# 41 | # 只有得分大于置信度的预测框会被保留下来 42 | #---------------------------------------------------------------------# 43 | "confidence" : 0.5, 44 | #---------------------------------------------------------------------# 45 | # 非极大抑制所用到的nms_iou大小 46 | #---------------------------------------------------------------------# 47 | "nms_iou" : 0.3, 48 | #---------------------------------------------------------------------# 49 | # 该变量用于控制是否使用letterbox_image对输入图像进行不失真的resize, 50 | # 在多次测试后,发现关闭letterbox_image直接resize的效果更好 51 | #---------------------------------------------------------------------# 52 | "letterbox_image" : True, 53 | #-------------------------------# 54 | # 是否使用Cuda 55 | # 没有GPU可以设置成False 56 | #-------------------------------# 57 | "cuda" : True, 58 | } 59 | 60 | @classmethod 61 | def get_defaults(cls, n): 62 | if n in cls._defaults: 63 | return cls._defaults[n] 64 | else: 65 | return "Unrecognized attribute name '" + n + "'" 66 | 67 | #---------------------------------------------------# 68 | # 初始化YOLO 69 | #---------------------------------------------------# 70 | def __init__(self, **kwargs): 71 | self.__dict__.update(self._defaults) 72 | for name, value in kwargs.items(): 73 | setattr(self, name, value) 74 | self._defaults[name] = value 75 | 76 | #---------------------------------------------------# 77 | # 获得种类和先验框的数量 78 | #---------------------------------------------------# 79 | self.class_names, self.num_classes = get_classes(self.classes_path) 80 | self.anchors, self.num_anchors = get_anchors(self.anchors_path) 81 | self.bbox_util = DecodeBox(self.anchors, self.num_classes, (self.input_shape[0], self.input_shape[1]), self.anchors_mask) 82 | #---------------------------------------------------# 83 | # 画框设置不同的颜色 84 | #---------------------------------------------------# 85 | hsv_tuples = [(x / self.num_classes, 1., 1.) 
for x in range(self.num_classes)] 86 | self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples)) 87 | self.colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), self.colors)) 88 | self.generate() 89 | 90 | show_config(**self._defaults) 91 | 92 | #---------------------------------------------------# 93 | # 生成模型 94 | #---------------------------------------------------# 95 | def generate(self, onnx=False, trt=False): 96 | #---------------------------------------------------# 97 | # 建立yolo模型,载入yolo模型的权重 98 | #---------------------------------------------------# 99 | self.net = YoloBody(self.anchors_mask, self.num_classes) 100 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') 101 | self.net.load_state_dict(torch.load(self.model_path, map_location=device)) 102 | self.net = self.net.fuse().eval() 103 | print('{} model, and classes loaded.'.format(self.model_path)) 104 | if not onnx: 105 | if self.cuda: 106 | self.net = nn.DataParallel(self.net) 107 | self.net = self.net.cuda() 108 | if trt: 109 | from torch2trt import TRTModule 110 | 111 | model_trt = TRTModule() 112 | model_trt.load_state_dict(torch.load('model_data/yolov7_tiny_trt.pth')) 113 | self.net = model_trt 114 | #---------------------------------------------------# 115 | # 检测图片 116 | #---------------------------------------------------# 117 | def detect_image(self, image, crop = False, count = False): 118 | #---------------------------------------------------# 119 | # 计算输入图片的高和宽 120 | #---------------------------------------------------# 121 | image_shape = np.array(np.shape(image)[0:2]) 122 | #---------------------------------------------------------# 123 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 124 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 125 | #---------------------------------------------------------# 126 | image = cvtColor(image) 127 | #---------------------------------------------------------# 128 | # 给图像增加灰条,实现不失真的resize 129 | # 也可以直接resize进行识别 130 | #---------------------------------------------------------# 131 | image_data = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image) 132 | #---------------------------------------------------------# 133 | # 添加上batch_size维度 134 | # h, w, 3 => 3, h, w => 1, 3, h, w 135 | #---------------------------------------------------------# 136 | image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0) 137 | 138 | with torch.no_grad(): 139 | images = torch.from_numpy(image_data) 140 | if self.cuda: 141 | images = images.cuda() 142 | #---------------------------------------------------------# 143 | # 将图像输入网络当中进行预测! 
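            # Each row of results[0] is [x, y, w, h, angle(rad), obj_conf, class_conf, class_pred];
            # the rotated boxes are converted to four-corner polygons with rbox2poly before drawing.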
144 | #---------------------------------------------------------# 145 | outputs = self.net(images) 146 | outputs = self.bbox_util.decode_box(outputs) 147 | #---------------------------------------------------------# 148 | # 将预测框进行堆叠,然后进行非极大抑制 149 | #---------------------------------------------------------# 150 | results = self.bbox_util.non_max_suppression(torch.cat(outputs, 1), self.num_classes, self.input_shape, 151 | image_shape, self.letterbox_image, conf_thres = self.confidence, nms_thres = self.nms_iou) 152 | 153 | if results[0] is None: 154 | return image 155 | 156 | top_label = np.array(results[0][:, 7], dtype = 'int32') 157 | top_conf = results[0][:, 5] * results[0][:, 6] 158 | top_rboxes = results[0][:, :5] 159 | top_polys = rbox2poly(top_rboxes) 160 | #---------------------------------------------------------# 161 | # 设置字体与边框厚度 162 | #---------------------------------------------------------# 163 | font = ImageFont.truetype(font='model_data/simhei.ttf', size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32')) 164 | thickness = int(max((image.size[0] + image.size[1]) // np.mean(self.input_shape), 1)) 165 | #---------------------------------------------------------# 166 | # 计数 167 | #---------------------------------------------------------# 168 | if count: 169 | print("top_label:", top_label) 170 | classes_nums = np.zeros([self.num_classes]) 171 | for i in range(self.num_classes): 172 | num = np.sum(top_label == i) 173 | if num > 0: 174 | print(self.class_names[i], " : ", num) 175 | classes_nums[i] = num 176 | print("classes_nums:", classes_nums) 177 | #---------------------------------------------------------# 178 | # 图像绘制 179 | #---------------------------------------------------------# 180 | for i, c in list(enumerate(top_label)): 181 | predicted_class = self.class_names[int(c)] 182 | poly = top_polys[i].astype(np.int32) 183 | score = top_conf[i] 184 | 185 | polygon_list = list(poly) 186 | label = '{} {:.2f}'.format(predicted_class, score) 187 | draw = ImageDraw.Draw(image) 188 | label = label.encode('utf-8') 189 | 190 | text_origin = np.array([poly[0], poly[1]], np.int32) 191 | 192 | draw.polygon(xy=polygon_list, outline=self.colors[c]) 193 | draw.text(text_origin, str(label,'UTF-8'), fill=self.colors[c], font=font) 194 | del draw 195 | 196 | return image 197 | 198 | def get_FPS(self, image, test_interval): 199 | image_shape = np.array(np.shape(image)[0:2]) 200 | #---------------------------------------------------------# 201 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 202 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 203 | #---------------------------------------------------------# 204 | image = cvtColor(image) 205 | #---------------------------------------------------------# 206 | # 给图像增加灰条,实现不失真的resize 207 | # 也可以直接resize进行识别 208 | #---------------------------------------------------------# 209 | image_data = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image) 210 | #---------------------------------------------------------# 211 | # 添加上batch_size维度 212 | #---------------------------------------------------------# 213 | image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0) 214 | 215 | with torch.no_grad(): 216 | images = torch.from_numpy(image_data) 217 | if self.cuda: 218 | images = images.cuda() 219 | #---------------------------------------------------------# 220 | # 将图像输入网络当中进行预测! 
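            # The pass below is not timed and mainly serves as a warm-up; the loop that follows
            # averages test_interval full passes (forward + decode + rotated NMS) and the method
            # returns the mean seconds per image.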
221 | #---------------------------------------------------------# 222 | outputs = self.net(images) 223 | outputs = self.bbox_util.decode_box(outputs) 224 | #---------------------------------------------------------# 225 | # 将预测框进行堆叠,然后进行非极大抑制 226 | #---------------------------------------------------------# 227 | results = self.bbox_util.non_max_suppression(torch.cat(outputs, 1), self.num_classes, self.input_shape, 228 | image_shape, self.letterbox_image, conf_thres=self.confidence, nms_thres=self.nms_iou) 229 | 230 | t1 = time.time() 231 | for _ in range(test_interval): 232 | with torch.no_grad(): 233 | #---------------------------------------------------------# 234 | # 将图像输入网络当中进行预测! 235 | #---------------------------------------------------------# 236 | outputs = self.net(images) 237 | outputs = self.bbox_util.decode_box(outputs) 238 | #---------------------------------------------------------# 239 | # 将预测框进行堆叠,然后进行非极大抑制 240 | #---------------------------------------------------------# 241 | results = self.bbox_util.non_max_suppression(torch.cat(outputs, 1), self.num_classes, self.input_shape, 242 | image_shape, self.letterbox_image, conf_thres=self.confidence, nms_thres=self.nms_iou) 243 | 244 | t2 = time.time() 245 | tact_time = (t2 - t1) / test_interval 246 | return tact_time 247 | 248 | def detect_heatmap(self, image, heatmap_save_path): 249 | import cv2 250 | import matplotlib.pyplot as plt 251 | def sigmoid(x): 252 | y = 1.0 / (1.0 + np.exp(-x)) 253 | return y 254 | #---------------------------------------------------------# 255 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 256 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 257 | #---------------------------------------------------------# 258 | image = cvtColor(image) 259 | #---------------------------------------------------------# 260 | # 给图像增加灰条,实现不失真的resize 261 | # 也可以直接resize进行识别 262 | #---------------------------------------------------------# 263 | image_data = resize_image(image, (self.input_shape[1],self.input_shape[0]), self.letterbox_image) 264 | #---------------------------------------------------------# 265 | # 添加上batch_size维度 266 | #---------------------------------------------------------# 267 | image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0) 268 | 269 | with torch.no_grad(): 270 | images = torch.from_numpy(image_data) 271 | if self.cuda: 272 | images = images.cuda() 273 | #---------------------------------------------------------# 274 | # 将图像输入网络当中进行预测! 
275 | #---------------------------------------------------------# 276 | outputs = self.net(images) 277 | 278 | plt.imshow(image, alpha=1) 279 | plt.axis('off') 280 | mask = np.zeros((image.size[1], image.size[0])) 281 | for sub_output in outputs: 282 | sub_output = sub_output.cpu().numpy() 283 | b, c, h, w = np.shape(sub_output) 284 | sub_output = np.transpose(np.reshape(sub_output, [b, 3, -1, h, w]), [0, 3, 4, 1, 2])[0] 285 | score = np.max(sigmoid(sub_output[..., 4]), -1) 286 | score = cv2.resize(score, (image.size[0], image.size[1])) 287 | normed_score = (score * 255).astype('uint8') 288 | mask = np.maximum(mask, normed_score) 289 | 290 | plt.imshow(mask, alpha=0.5, interpolation='nearest', cmap="jet") 291 | 292 | plt.axis('off') 293 | plt.subplots_adjust(top=1, bottom=0, right=1, left=0, hspace=0, wspace=0) 294 | plt.margins(0, 0) 295 | plt.savefig(heatmap_save_path, dpi=200, bbox_inches='tight', pad_inches = -0.1) 296 | print("Save to the " + heatmap_save_path) 297 | plt.show() 298 | 299 | def convert_to_onnx(self, simplify, model_path): 300 | import onnx 301 | self.generate(onnx=True) 302 | 303 | im = torch.zeros(1, 3, *self.input_shape).to('cpu') # image size(1, 3, 512, 512) BCHW 304 | input_layer_names = ["images"] 305 | output_layer_names = ["output"] 306 | 307 | # Export the model 308 | print(f'Starting export with onnx {onnx.__version__}.') 309 | torch.onnx.export(self.net, 310 | im, 311 | f = model_path, 312 | verbose = False, 313 | opset_version = 12, 314 | training = torch.onnx.TrainingMode.EVAL, 315 | do_constant_folding = True, 316 | input_names = input_layer_names, 317 | output_names = output_layer_names, 318 | dynamic_axes = None) 319 | 320 | # Checks 321 | model_onnx = onnx.load(model_path) # load onnx model 322 | onnx.checker.check_model(model_onnx) # check onnx model 323 | 324 | # Simplify onnx 325 | if simplify: 326 | import onnxsim 327 | print(f'Simplifying with onnx-simplifier {onnxsim.__version__}.') 328 | model_onnx, check = onnxsim.simplify( 329 | model_onnx, 330 | dynamic_input_shape=False, 331 | input_shapes=None) 332 | assert check, 'assert check failed' 333 | onnx.save(model_onnx, model_path) 334 | 335 | print('Onnx model save as {}'.format(model_path)) 336 | 337 | def get_map_txt(self, image_id, image, class_names, map_out_path): 338 | f = open(os.path.join(map_out_path, "detection-results/"+image_id+".txt"), "w", encoding='utf-8') 339 | image_shape = np.array(np.shape(image)[0:2]) 340 | #---------------------------------------------------------# 341 | # 在这里将图像转换成RGB图像,防止灰度图在预测时报错。 342 | # 代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB 343 | #---------------------------------------------------------# 344 | image = cvtColor(image) 345 | #---------------------------------------------------------# 346 | # 给图像增加灰条,实现不失真的resize 347 | # 也可以直接resize进行识别 348 | #---------------------------------------------------------# 349 | image_data = resize_image(image, (self.input_shape[1], self.input_shape[0]), self.letterbox_image) 350 | #---------------------------------------------------------# 351 | # 添加上batch_size维度 352 | #---------------------------------------------------------# 353 | image_data = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0) 354 | 355 | with torch.no_grad(): 356 | images = torch.from_numpy(image_data) 357 | if self.cuda: 358 | images = images.cuda() 359 | #---------------------------------------------------------# 360 | # 将图像输入网络当中进行预测! 
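#   预测结果经解码与非极大抑制后,每个目标按「类别 置信度 xc yc w h 角度(度)」的格式
#   写入 map_out_path/detection-results/<image_id>.txt,供后续 mAP 计算使用。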
361 | #---------------------------------------------------------# 362 | outputs = self.net(images) 363 | outputs = self.bbox_util.decode_box(outputs) 364 | #---------------------------------------------------------# 365 | # 将预测框进行堆叠,然后进行非极大抑制 366 | #---------------------------------------------------------# 367 | results = self.bbox_util.non_max_suppression(torch.cat(outputs, 1), self.num_classes, self.input_shape, 368 | image_shape, self.letterbox_image, conf_thres = self.confidence, nms_thres = self.nms_iou) 369 | 370 | if results[0] is None: 371 | return 372 | 373 | top_label = np.array(results[0][:, 7], dtype = 'int32') 374 | top_conf = results[0][:, 5] * results[0][:, 6] 375 | top_rboxes = results[0][:, :5] 376 | for i, c in list(enumerate(top_label)): 377 | predicted_class = self.class_names[int(c)] 378 | obb = top_rboxes[i] 379 | score = str(top_conf[i]) 380 | 381 | xc, yc, w, h, angle = obb 382 | 383 | if predicted_class not in class_names: 384 | continue 385 | 386 | f.write("%s %s %s %s %s %s %s\n" % (predicted_class, score[:6], str(int(xc)), str(int(yc)), str(int(w)), str(int(h)), str(math.degrees(angle)))) 387 | 388 | f.close() 389 | return 390 | -------------------------------------------------------------------------------- /常见问题汇总.md: -------------------------------------------------------------------------------- 1 | 问题汇总的博客地址为[https://blog.csdn.net/weixin_44791964/article/details/107517428](https://blog.csdn.net/weixin_44791964/article/details/107517428)。 2 | 3 | # 问题汇总 4 | ## 1、下载问题 5 | ### a、代码下载 6 | **问:up主,可以给我发一份代码吗,代码在哪里下载啊? 7 | 答:Github上的地址就在视频简介里。复制一下就能进去下载了。** 8 | 9 | **问:up主,为什么我下载的代码提示压缩包损坏? 10 | 答:重新去Github下载。** 11 | 12 | **问:up主,为什么我下载的代码和你在视频以及博客上的代码不一样? 13 | 答:我常常会对代码进行更新,最终以实际的代码为准。** 14 | 15 | ### b、 权值下载 16 | **问:up主,为什么我下载的代码里面,model_data下面没有.pth或者.h5文件? 17 | 答:我一般会把权值上传到Github和百度网盘,在GITHUB的README里面就能找到。** 18 | 19 | ### c、 数据集下载 20 | **问:up主,XXXX数据集在哪里下载啊? 21 | 答:一般数据集的下载地址我会放在README里面,基本上都有,没有的话请及时联系我添加,直接发github的issue即可**。 22 | 23 | ## 2、环境配置问题 24 | ### a、20系列及以下显卡环境配置 25 | **pytorch代码对应的pytorch版本为1.2,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/106037141](https://blog.csdn.net/weixin_44791964/article/details/106037141)。 26 | 27 | **keras代码对应的tensorflow版本为1.13.2,keras版本是2.1.5,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/104702142](https://blog.csdn.net/weixin_44791964/article/details/104702142)。 28 | 29 | **tf2代码对应的tensorflow版本为2.2.0,无需安装keras,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/109161493](https://blog.csdn.net/weixin_44791964/article/details/109161493)。 30 | 31 | **问:你的代码某某某版本的tensorflow和pytorch能用嘛? 
32 | 答:最好按照我推荐的配置,配置教程也有!其它版本的我没有试过!可能出现问题但是一般问题不大。仅需要改少量代码即可。** 33 | 34 | ### b、30系列显卡环境配置 35 | 30系显卡由于框架更新不可使用上述环境配置教程。 36 | 当前我已经测试的可以用的30显卡配置如下: 37 | **pytorch代码对应的pytorch版本为1.7.0,cuda为11.0,cudnn为8.0.5,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/120668551](https://blog.csdn.net/weixin_44791964/article/details/120668551)。 38 | 39 | **keras代码无法在win10下配置cuda11,在ubuntu下可以百度查询一下,配置tensorflow版本为1.15.4,keras版本是2.1.5或者2.3.1(少量函数接口不同,代码可能还需要少量调整。)** 40 | 41 | **tf2代码对应的tensorflow版本为2.4.0,cuda为11.0,cudnn为8.0.5,博客地址对应为**[https://blog.csdn.net/weixin_44791964/article/details/120657664](https://blog.csdn.net/weixin_44791964/article/details/120657664)。 42 | 43 | ### c、CPU环境配置 44 | **pytorch代码对应的pytorch-cpu版本为1.2,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/120655098](https://blog.csdn.net/weixin_44791964/article/details/120655098) 45 | 46 | **keras代码对应的tensorflow-cpu版本为1.13.2,keras版本是2.1.5,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/120653717](https://blog.csdn.net/weixin_44791964/article/details/120653717)。 47 | 48 | **tf2代码对应的tensorflow-cpu版本为2.2.0,无需安装keras,博客地址对应**[https://blog.csdn.net/weixin_44791964/article/details/120656291](https://blog.csdn.net/weixin_44791964/article/details/120656291)。 49 | 50 | 51 | ### d、GPU利用问题与环境使用问题 52 | **问:为什么我安装了tensorflow-gpu但是却没用利用GPU进行训练呢? 53 | 答:确认tensorflow-gpu已经装好,利用pip list查看tensorflow版本,然后查看任务管理器或者利用nvidia命令看看是否使用了gpu进行训练,任务管理器的话要看显存使用情况。** 54 | 55 | **问:up主,我好像没有在用gpu进行训练啊,怎么看是不是用了GPU进行训练? 56 | 答:查看是否使用GPU进行训练一般使用NVIDIA在命令行的查看命令。在windows电脑中打开cmd然后利用nvidia-smi指令查看GPU利用情况** 57 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/f88ef794c9a341918f000eb2b1c67af6.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAQnViYmxpaWlpbmc=,size_20,color_FFFFFF,t_70,g_se,x_16) 58 | **如果要一定看任务管理器的话,请看性能部分GPU的显存是否利用,或者查看任务管理器的Cuda,而非Copy。** 59 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/20201013234241524.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70#pic_center) 60 | 61 | ### e、DLL load failed: 找不到指定的模块 62 | **问:出现如下错误** 63 | ```python 64 | Traceback (most recent call last): 65 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in 66 | from tensorflow.python.pywrap_tensorflow_internal import * 67 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 28, in 68 | pywrap_tensorflow_internal = swig_import_helper() 69 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper 70 | _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description) 71 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\imp.py", line 243, in load_modulereturn load_dynamic(name, filename, file) 72 | File "C:\Users\focus\Anaconda3\ana\envs\tensorflow-gpu\lib\imp.py", line 343, in load_dynamic 73 | return _load(spec) 74 | ImportError: DLL load failed: 找不到指定的模块。 75 | ``` 76 | **答:如果没重启过就重启一下,否则重新按照步骤安装,还无法解决则把你的GPU、CUDA、CUDNN、TF版本以及PYTORCH版本私聊告诉我。** 77 | 78 | ### f、no module问题(no module name utils.utils、no module named 'matplotlib' ) 79 | **问:为什么提示说no module name utils.utils(no module name nets.yolo、no module name nets.ssd等一系列问题)啊? 
80 | 答:utils并不需要用pip装,它就在我上传的仓库的根目录,出现这个问题的原因是根目录不对,查查相对目录和根目录的概念。查了基本上就明白了。** 81 | 82 | **问:为什么提示说no module name matplotlib(no module name PIL,no module name cv2等等)? 83 | 答:这个库没安装打开命令行安装就好。pip install matplotlib** 84 | 85 | **问:为什么我已经用pip装了opencv(pillow、matplotlib等),还是提示no module name cv2? 86 | 答:没有激活环境装,要激活对应的conda环境进行安装才可以正常使用** 87 | 88 | **问:为什么提示说No module named 'torch' ? 89 | 答:其实我也真的很想知道为什么会有这个问题……这个pytorch没装是什么情况?一般就俩情况,一个是真的没装,还有一个是装到其它环境了,当前激活的环境不是自己装的环境。** 90 | 91 | **问:为什么提示说No module named 'tensorflow' ? 92 | 答:同上。** 93 | 94 | ### g、cuda安装失败问题 95 | 一般cuda安装前需要安装Visual Studio,装个2017版本即可。 96 | 97 | ### h、Ubuntu系统问题 98 | **所有代码在Ubuntu下可以使用,我两个系统都试过。** 99 | 100 | ### i、VSCODE提示错误的问题 101 | **问:为什么在VSCODE里面提示一大堆的错误啊? 102 | 答:我也提示一大堆的错误,但是不影响,是VSCODE的问题,如果不想看错误的话就装Pycharm。 103 | 最好将设置里面的Python:Language Server,调整为Pylance。** 104 | 105 | ### j、使用cpu进行训练与预测的问题 106 | **对于keras和tf2的代码而言,如果想用cpu进行训练和预测,直接装cpu版本的tensorflow就可以了。** 107 | 108 | **对于pytorch的代码而言,如果想用cpu进行训练和预测,需要将cuda=True修改成cuda=False。** 109 | 110 | ### k、tqdm没有pos参数问题 111 | **问:运行代码提示'tqdm' object has no attribute 'pos'。 112 | 答:重装tqdm,换个版本就可以了。** 113 | 114 | ### l、提示decode(“utf-8”)的问题 115 | **由于h5py库的更新,安装过程中会自动安装h5py=3.0.0以上的版本,会导致decode("utf-8")的错误! 116 | 各位一定要在安装完tensorflow后利用命令装h5py=2.10.0!** 117 | ``` 118 | pip install h5py==2.10.0 119 | ``` 120 | 121 | ### m、提示TypeError: __array__() takes 1 positional argument but 2 were given错误 122 | 可以修改pillow版本解决。 123 | ``` 124 | pip install pillow==8.2.0 125 | ``` 126 | ### n、如何查看当前cuda和cudnn 127 | **window下cuda版本查看方式如下: 128 | 1、打开cmd窗口。 129 | 2、输入nvcc -V。 130 | 3、Cuda compilation tools, release XXXXXXXX中的XXXXXXXX即cuda版本。** 131 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/0389ea35107a408a80ab5cb6590d5a74.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAQnViYmxpaWlpbmc=,size_20,color_FFFFFF,t_70,g_se,x_16) 132 | window下cudnn版本查看方式如下: 133 | 1、进入cuda安装目录,进入incude文件夹。 134 | 2、找到cudnn.h文件。 135 | 3、右键文本打开,下拉,看到#define处可获得cudnn版本。 136 | ```python 137 | #define CUDNN_MAJOR 7 138 | #define CUDNN_MINOR 4 139 | #define CUDNN_PATCHLEVEL 1 140 | ``` 141 | 代表cudnn为7.4.1。 142 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/7a86b68b17c84feaa6fa95780d4ae4b4.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAQnViYmxpaWlpbmc=,size_20,color_FFFFFF,t_70,g_se,x_16) 143 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/81bb7c3e13cc492292530e4b69df86a9.png?x-oss-process=image/watermark,type_d3F5LXplbmhlaQ,shadow_50,text_Q1NETiBAQnViYmxpaWlpbmc=,size_20,color_FFFFFF,t_70,g_se,x_16) 144 | 145 | ### o、为什么按照你的环境配置后还是不能使用 146 | **问:up主,为什么我按照你的环境配置后还是不能使用? 147 | 答:请把你的GPU、CUDA、CUDNN、TF版本以及PYTORCH版本B站私聊告诉我。** 148 | 149 | ### p、其它问题 150 | **问:为什么提示TypeError: cat() got an unexpected keyword argument 'axis',Traceback (most recent call last),AttributeError: 'Tensor' object has no attribute 'bool'? 151 | 答:这是版本问题,建议使用torch1.2以上版本** 152 | 153 | **其它有很多稀奇古怪的问题,很多是版本问题,建议按照我的视频教程安装Keras和tensorflow。比如装的是tensorflow2,就不用问我说为什么我没法运行Keras-yolo啥的。那是必然不行的。** 154 | 155 | ## 3、目标检测库问题汇总(人脸检测和分类库也可参考) 156 | ### a、shape不匹配问题。 157 | #### 1)、训练时shape不匹配问题。 158 | **问:up主,为什么运行train.py会提示shape不匹配啊? 159 | 答:在keras环境中,因为你训练的种类和原始的种类不同,网络结构会变化,所以最尾部的shape会有少量不匹配。** 160 | 161 | #### 2)、预测时shape不匹配问题。 162 | **问:为什么我运行predict.py会提示我说shape不匹配呀。** 163 | ##### i、copying a param with shape torch.Size([75, 704, 1, 1]) from checkpoint 164 | 在Pytorch里面是这样的: 165 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/20200722171631901.png) 166 | ##### ii、Shapes are [1,1,1024,75] and [255,1024,1,1]. 
for 'Assign_360' (op: 'Assign') with input shapes: [1,1,1024,75], [255,1024,1,1]. 167 | 在Keras里面是这样的: 168 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/20200722171523380.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70) 169 | **答:原因主要有仨: 170 | 1、训练的classes_path没改,就开始训练了。 171 | 2、训练的model_path没改。 172 | 3、训练的classes_path没改。 173 | 请检查清楚了!确定自己所用的model_path和classes_path是对应的!训练的时候用到的num_classes或者classes_path也需要检查!** 174 | 175 | ### b、显存不足问题(OOM、RuntimeError: CUDA out of memory)。 176 | **问:为什么我运行train.py下面的命令行闪的贼快,还提示OOM啥的? 177 | 答:这是在keras中出现的,爆显存了,可以改小batch_size,SSD的显存占用率是最小的,建议用SSD; 178 | 2G显存:SSD、YOLOV4-TINY 179 | 4G显存:YOLOV3 180 | 6G显存:YOLOV4、Retinanet、M2det、Efficientdet、Faster RCNN等 181 | 8G+显存:随便选吧。** 182 | **需要注意的是,受到BatchNorm2d影响,batch_size不可为1,至少为2。** 183 | 184 | **问:为什么提示 RuntimeError: CUDA out of memory. Tried to allocate 52.00 MiB (GPU 0; 15.90 GiB total capacity; 14.85 GiB already allocated; 51.88 MiB free; 15.07 GiB reserved in total by PyTorch)? 185 | 答:这是pytorch中出现的,爆显存了,同上。** 186 | 187 | **问:为什么我显存都没利用,就直接爆显存了? 188 | 答:都爆显存了,自然就不利用了,模型没有开始训练。** 189 | ### c、为什么要进行冻结训练与解冻训练,不进行行吗? 190 | **问:为什么要冻结训练和解冻训练呀? 191 | 答:可以不进行,本质上是为了保证性能不足的同学的训练,如果电脑性能完全不够,可以将Freeze_Epoch和UnFreeze_Epoch设置成一样,只进行冻结训练。** 192 | 193 | **同时这也是迁移学习的思想,因为神经网络主干特征提取部分所提取到的特征是通用的,我们冻结起来训练可以加快训练效率,也可以防止权值被破坏。** 194 | 在冻结阶段,模型的主干被冻结了,特征提取网络不发生改变。占用的显存较小,仅对网络进行微调。 195 | 在解冻阶段,模型的主干不被冻结了,特征提取网络会发生改变。占用的显存较大,网络所有的参数都会发生改变。 196 | 197 | ### d、我的LOSS好大啊,有问题吗?(我的LOSS好小啊,有问题吗?) 198 | **问:为什么我的网络不收敛啊,LOSS是XXXX。 199 | 答:不同网络的LOSS不同,LOSS只是一个参考指标,用于查看网络是否收敛,而非评价网络好坏,我的yolo代码都没有归一化,所以LOSS值看起来比较高,LOSS的值不重要,重要的是是否在变小,预测是否有效果。** 200 | 201 | ### e、为什么我训练出来的模型没有预测结果? 202 | **问:为什么我的训练效果不好?预测了没有框(框不准)。 203 | 答:** 204 | 考虑几个问题: 205 | 1、目标信息问题,查看2007_train.txt文件是否有目标信息,没有的话请修改voc_annotation.py。 206 | 2、数据集问题,小于500的自行考虑增加数据集,同时测试不同的模型,确认数据集是好的。 207 | 3、是否解冻训练,如果数据集分布与常规画面差距过大需要进一步解冻训练,调整主干,加强特征提取能力。 208 | 4、网络问题,比如SSD不适合小目标,因为先验框固定了。 209 | 5、训练时长问题,有些同学只训练了几代表示没有效果,按默认参数训练完。 210 | 6、确认自己是否按照步骤去做了,如果比如voc_annotation.py里面的classes是否修改了等。 211 | 7、不同网络的LOSS不同,LOSS只是一个参考指标,用于查看网络是否收敛,而非评价网络好坏,LOSS的值不重要,重要的是是否收敛。 212 | 8、是否修改了网络的主干,如果修改了没有预训练权重,网络不容易收敛,自然效果不好。 213 | 214 | ### f、为什么我计算出来的map是0? 215 | **问:为什么我的训练效果不好?没有map? 216 | 答:** 217 | 首先尝试利用predict.py预测一下,如果有效果的话应该是get_map.py里面的classes_path设置错误。如果没有预测结果的话,解决方法同e问题,对下面几点进行检查: 218 | 1、目标信息问题,查看2007_train.txt文件是否有目标信息,没有的话请修改voc_annotation.py。 219 | 2、数据集问题,小于500的自行考虑增加数据集,同时测试不同的模型,确认数据集是好的。 220 | 3、是否解冻训练,如果数据集分布与常规画面差距过大需要进一步解冻训练,调整主干,加强特征提取能力。 221 | 4、网络问题,比如SSD不适合小目标,因为先验框固定了。 222 | 5、训练时长问题,有些同学只训练了几代表示没有效果,按默认参数训练完。 223 | 6、确认自己是否按照步骤去做了,如果比如voc_annotation.py里面的classes是否修改了等。 224 | 7、不同网络的LOSS不同,LOSS只是一个参考指标,用于查看网络是否收敛,而非评价网络好坏,LOSS的值不重要,重要的是是否收敛。 225 | 8、是否修改了网络的主干,如果修改了没有预训练权重,网络不容易收敛,自然效果不好。 226 | 227 | ### g、gbk编码错误('gbk' codec can't decode byte)。 228 | **问:我怎么出现了gbk什么的编码错误啊:** 229 | ```python 230 | UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 446: illegal multibyte sequence 231 | ``` 232 | **答:标签和路径不要使用中文,如果一定要使用中文,请注意处理的时候编码的问题,改成打开文件的encoding方式改为utf-8。** 233 | 234 | ### h、我的图片是xxx*xxx的分辨率的,可以用吗? 235 | **问:我的图片是xxx*xxx的分辨率的,可以用吗!** 236 | **答:可以用,代码里面会自动进行resize与数据增强。** 237 | 238 | ### i、我想进行数据增强!怎么增强? 239 | **问:我想要进行数据增强!怎么做呢?** 240 | **答:可以用,代码里面会自动进行resize与数据增强。** 241 | 242 | ### j、多GPU训练。 243 | **问:怎么进行多GPU训练? 244 | 答:pytorch的大多数代码可以直接使用gpu训练,keras的话直接百度就好了,实现并不复杂,我没有多卡没法详细测试,还需要各位同学自己努力了。** 245 | 246 | ### k、能不能训练灰度图? 247 | **问:能不能训练灰度图(预测灰度图)啊? 
248 | 答:我的大多数库会将灰度图转化成RGB进行训练和预测,如果遇到代码不能训练或者预测灰度图的情况,可以尝试一下在get_random_data里面将Image.open后的结果转换成RGB,预测的时候也这样试试。(仅供参考)** 249 | 250 | ### l、断点续练问题。 251 | **问:我已经训练过几个世代了,能不能从这个基础上继续开始训练 252 | 答:可以,你在训练前,和载入预训练权重一样载入训练过的权重就行了。一般训练好的权重会保存在logs文件夹里面,将model_path修改成你要开始的权值的路径即可。** 253 | 254 | ### m、我要训练其它的数据集,预训练权重能不能用? 255 | **问:如果我要训练其它的数据集,预训练权重要怎么办啊?** 256 | **答:数据的预训练权重对不同数据集是通用的,因为特征是通用的,预训练权重对于99%的情况都必须要用,不用的话权值太过随机,特征提取效果不明显,网络训练的结果也不会好。** 257 | 258 | ### n、网络如何从0开始训练? 259 | **问:我要怎么不使用预训练权重啊? 260 | 答:看一看注释、大多数代码是model_path = '',Freeze_Train = Fasle**,如果设置model_path无用,**那么把载入预训练权重的代码注释了就行。** 261 | 262 | ### o、为什么从0开始训练效果这么差(修改了网络主干,效果不好怎么办)? 263 | **问:为什么我不使用预训练权重效果这么差啊? 264 | 答:因为随机初始化的权值不好,提取的特征不好,也就导致了模型训练的效果不好,voc07+12、coco+voc07+12效果都不一样,预训练权重还是非常重要的。** 265 | 266 | **问:up,我修改了网络,预训练权重还能用吗? 267 | 答:修改了主干的话,如果不是用的现有的网络,基本上预训练权重是不能用的,要么就自己判断权值里卷积核的shape然后自己匹配,要么只能自己预训练去了;修改了后半部分的话,前半部分的主干部分的预训练权重还是可以用的,如果是pytorch代码的话,需要自己修改一下载入权值的方式,判断shape后载入,如果是keras代码,直接by_name=True,skip_mismatch=True即可。** 268 | 权值匹配的方式可以参考如下: 269 | ```python 270 | # 加快模型训练的效率 271 | print('Loading weights into state dict...') 272 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') 273 | model_dict = model.state_dict() 274 | pretrained_dict = torch.load(model_path, map_location=device) 275 | a = {} 276 | for k, v in pretrained_dict.items(): 277 | try: 278 | if np.shape(model_dict[k]) == np.shape(v): 279 | a[k]=v 280 | except: 281 | pass 282 | model_dict.update(a) 283 | model.load_state_dict(model_dict) 284 | print('Finished!') 285 | ``` 286 | 287 | **问:为什么从0开始训练效果这么差(我修改了网络主干,效果不好怎么办)? 288 | 答:一般来讲,网络从0开始的训练效果会很差,因为权值太过随机,特征提取效果不明显,因此非常、非常、非常不建议大家从0开始训练!如果一定要从0开始,可以了解imagenet数据集,首先训练分类模型,获得网络的主干部分权值,分类模型的 主干部分 和该模型通用,基于此进行训练。 289 | 网络修改了主干之后也是同样的问题,随机的权值效果很差。** 290 | 291 | **问:怎么在模型上从0开始训练? 292 | 答:在算力不足与调参能力不足的情况下从0开始训练毫无意义。模型特征提取能力在随机初始化参数的情况下非常差。没有好的参数调节能力和算力,无法使得网络正常收敛。** 293 | 如果一定要从0开始,那么训练的时候请注意几点: 294 | - 不载入预训练权重。 295 | - 不要进行冻结训练,注释冻结模型的代码。 296 | 297 | **问:为什么我不使用预训练权重效果这么差啊? 298 | 答:因为随机初始化的权值不好,提取的特征不好,也就导致了模型训练的效果不好,voc07+12、coco+voc07+12效果都不一样,预训练权重还是非常重要的。** 299 | 300 | ### p、你的权值都是哪里来的? 301 | **问:如果网络不能从0开始训练的话你的权值哪里来的? 302 | 答:有些权值是官方转换过来的,有些权值是自己训练出来的,我用到的主干的imagenet的权值都是官方的。** 303 | 304 | ### q、视频检测与摄像头检测 305 | **问:怎么用摄像头检测呀? 306 | 答:predict.py修改参数可以进行摄像头检测,也有视频详细解释了摄像头检测的思路。** 307 | 308 | **问:怎么用视频检测呀? 309 | 答:同上** 310 | 311 | ### r、如何保存检测出的图片 312 | **问:检测完的图片怎么保存? 313 | 答:一般目标检测用的是Image,所以查询一下PIL库的Image如何进行保存。详细看看predict.py文件的注释。** 314 | 315 | **问:怎么用视频保存呀? 316 | 答:详细看看predict.py文件的注释。** 317 | 318 | ### s、遍历问题 319 | **问:如何对一个文件夹的图片进行遍历? 320 | 答:一般使用os.listdir先找出文件夹里面的所有图片,然后根据predict.py文件里面的执行思路检测图片就行了,详细看看predict.py文件的注释。** 321 | 322 | **问:如何对一个文件夹的图片进行遍历?并且保存。 323 | 答:遍历的话一般使用os.listdir先找出文件夹里面的所有图片,然后根据predict.py文件里面的执行思路检测图片就行了。保存的话一般目标检测用的是Image,所以查询一下PIL库的Image如何进行保存。如果有些库用的是cv2,那就是查一下cv2怎么保存图片。详细看看predict.py文件的注释。** 324 | 325 | ### t、路径问题(No such file or directory、StopIteration: [Errno 13] Permission denied: 'XXXXXX') 326 | **问:我怎么出现了这样的错误呀:** 327 | ```python 328 | FileNotFoundError: 【Errno 2】 No such file or directory 329 | StopIteration: [Errno 13] Permission denied: 'D:\\Study\\Collection\\Dataset\\VOC07+12+test\\VOCdevkit/VOC2007' 330 | …………………………………… 331 | …………………………………… 332 | ``` 333 | **答:去检查一下文件夹路径,查看是否有对应文件;并且检查一下2007_train.txt,其中文件路径是否有错。** 334 | 关于路径有几个重要的点: 335 | **文件夹名称中一定不要有空格。 336 | 注意相对路径和绝对路径。 337 | 多百度路径相关的知识。** 338 | 339 | **所有的路径问题基本上都是根目录问题,好好查一下相对目录的概念!** 340 | ### u、和原版比较问题,你怎么和原版不一样啊? 341 | **问:原版的代码是XXX,为什么你的代码是XXX? 
342 | 答:是啊……这要不怎么说我不是原版呢……** 343 | 344 | **问:你这个代码和原版比怎么样,可以达到原版的效果么? 345 | 答:基本上可以达到,我都用voc数据测过,我没有好显卡,没有能力在coco上测试与训练。** 346 | 347 | **问:你有没有实现yolov4所有的tricks,和原版差距多少? 348 | 答:并没有实现全部的改进部分,由于YOLOV4使用的改进实在太多了,很难完全实现与列出来,这里只列出来了一些我比较感兴趣,而且非常有效的改进。论文中提到的SAM(注意力机制模块),作者自己的源码也没有使用。还有其它很多的tricks,不是所有的tricks都有提升,我也没法实现全部的tricks。至于和原版的比较,我没有能力训练coco数据集,根据使用过的同学反应差距不大。** 349 | 350 | ### v、我的检测速度是xxx正常吗?我的检测速度还能增快吗? 351 | **问:你这个FPS可以到达多少,可以到 XX FPS么? 352 | 答:FPS和机子的配置有关,配置高就快,配置低就慢。** 353 | 354 | **问:我的检测速度是xxx正常吗?我的检测速度还能增快吗? 355 | 答:看配置,配置好速度就快,如果想要配置不变的情况下加快速度,就要修改网络了。** 356 | 357 | **问:为什么我用服务器去测试yolov4(or others)的FPS只有十几? 358 | 答:检查是否正确安装了tensorflow-gpu或者pytorch的gpu版本,如果已经正确安装,可以去利用time.time()的方法查看detect_image里面,哪一段代码耗时更长(不仅只有网络耗时长,其它处理部分也会耗时,如绘图等)。** 359 | 360 | **问:为什么论文中说速度可以达到XX,但是这里却没有? 361 | 答:检查是否正确安装了tensorflow-gpu或者pytorch的gpu版本,如果已经正确安装,可以去利用time.time()的方法查看detect_image里面,哪一段代码耗时更长(不仅只有网络耗时长,其它处理部分也会耗时,如绘图等)。有些论文还会使用多batch进行预测,我并没有去实现这个部分。** 362 | 363 | ### w、预测图片不显示问题 364 | **问:为什么你的代码在预测完成后不显示图片?只是在命令行告诉我有什么目标。 365 | 答:给系统安装一个图片查看器就行了。** 366 | 367 | ### x、算法评价问题(目标检测的map、PR曲线、Recall、Precision等) 368 | **问:怎么计算map? 369 | 答:看map视频,都一个流程。** 370 | 371 | **问:计算map的时候,get_map.py里面有一个MINOVERLAP是什么用的,是iou吗? 372 | 答:是iou,它的作用是判断预测框和真实框的重合成度,如果重合程度大于MINOVERLAP,则预测正确。** 373 | 374 | **问:为什么get_map.py里面的self.confidence(self.score)要设置的那么小? 375 | 答:看一下map的视频的原理部分,要知道所有的结果然后再进行pr曲线的绘制。** 376 | 377 | **问:能不能说说怎么绘制PR曲线啥的呀。 378 | 答:可以看mAP视频,结果里面有PR曲线。** 379 | 380 | **问:怎么计算Recall、Precision指标。 381 | 答:这俩指标应该是相对于特定的置信度的,计算map的时候也会获得。** 382 | 383 | ### y、coco数据集训练问题 384 | **问:目标检测怎么训练COCO数据集啊?。 385 | 答:coco数据训练所需要的txt文件可以参考qqwweee的yolo3的库,格式都是一样的。** 386 | 387 | ### z、UP,怎么优化模型啊?我想提升效果 388 | **问:up,怎么修改模型啊,我想发个小论文! 389 | 答:建议看看yolov3和yolov4的区别,然后看看yolov4的论文,作为一个大型调参现场非常有参考意义,使用了很多tricks。我能给的建议就是多看一些经典模型,然后拆解里面的亮点结构并使用。** 390 | 391 | ### aa、UP,有Focal LOSS的代码吗?怎么改啊? 392 | **问:up,YOLO系列使用Focal LOSS的代码你有吗,有提升吗? 393 | 答:很多人试过,提升效果也不大(甚至变的更Low),它自己有自己的正负样本的平衡方式**。改代码的事情,还是自己好好看看代码吧。 394 | 395 | ### ab、部署问题(ONNX、TensorRT等) 396 | 我没有具体部署到手机等设备上过,所以很多部署问题我并不了解…… 397 | 398 | ## 4、语义分割库问题汇总 399 | ### a、shape不匹配问题 400 | #### 1)、训练时shape不匹配问题 401 | **问:up主,为什么运行train.py会提示shape不匹配啊? 402 | 答:在keras环境中,因为你训练的种类和原始的种类不同,网络结构会变化,所以最尾部的shape会有少量不匹配。** 403 | 404 | #### 2)、预测时shape不匹配问题 405 | **问:为什么我运行predict.py会提示我说shape不匹配呀。** 406 | ##### i、copying a param with shape torch.Size([75, 704, 1, 1]) from checkpoint 407 | 在Pytorch里面是这样的: 408 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/20200722171631901.png) 409 | ##### ii、Shapes are [1,1,1024,75] and [255,1024,1,1]. for 'Assign_360' (op: 'Assign') with input shapes: [1,1,1024,75], [255,1024,1,1]. 410 | 在Keras里面是这样的: 411 | ![在这里插入图片描述](https://img-blog.csdnimg.cn/20200722171523380.png?x-oss-process=image/watermark,type_ZmFuZ3poZW5naGVpdGk,shadow_10,text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80NDc5MTk2NA==,size_16,color_FFFFFF,t_70) 412 | **答:原因主要有二: 413 | 1、train.py里面的num_classes没改。 414 | 2、预测时num_classes没改。 415 | 3、预测时model_path没改。 416 | 请检查清楚!训练和预测的时候用到的num_classes都需要检查!** 417 | 418 | ### b、显存不足问题(OOM、RuntimeError: CUDA out of memory)。 419 | **问:为什么我运行train.py下面的命令行闪的贼快,还提示OOM啥的? 420 | 答:这是在keras中出现的,爆显存了,可以改小batch_size。** 421 | 422 | **需要注意的是,受到BatchNorm2d影响,batch_size不可为1,至少为2。** 423 | 424 | **问:为什么提示 RuntimeError: CUDA out of memory. Tried to allocate 52.00 MiB (GPU 0; 15.90 GiB total capacity; 14.85 GiB already allocated; 51.88 MiB free; 15.07 GiB reserved in total by PyTorch)? 425 | 答:这是pytorch中出现的,爆显存了,同上。** 426 | 427 | **问:为什么我显存都没利用,就直接爆显存了? 
428 | 答:都爆显存了,自然就不利用了,模型没有开始训练。** 429 | 430 | ### c、为什么要进行冻结训练与解冻训练,不进行行吗? 431 | **问:为什么要冻结训练和解冻训练呀? 432 | 答:可以不进行,本质上是为了保证性能不足的同学的训练,如果电脑性能完全不够,可以将Freeze_Epoch和UnFreeze_Epoch设置成一样,只进行冻结训练。** 433 | 434 | **同时这也是迁移学习的思想,因为神经网络主干特征提取部分所提取到的特征是通用的,我们冻结起来训练可以加快训练效率,也可以防止权值被破坏。** 435 | 在冻结阶段,模型的主干被冻结了,特征提取网络不发生改变。占用的显存较小,仅对网络进行微调。 436 | 在解冻阶段,模型的主干不被冻结了,特征提取网络会发生改变。占用的显存较大,网络所有的参数都会发生改变。 437 | 438 | ### d、我的LOSS好大啊,有问题吗?(我的LOSS好小啊,有问题吗?) 439 | **问:为什么我的网络不收敛啊,LOSS是XXXX。 440 | 答:不同网络的LOSS不同,LOSS只是一个参考指标,用于查看网络是否收敛,而非评价网络好坏,我的yolo代码都没有归一化,所以LOSS值看起来比较高,LOSS的值不重要,重要的是是否在变小,预测是否有效果。** 441 | 442 | ### e、为什么我训练出来的模型没有预测结果? 443 | **问:为什么我的训练效果不好?预测了没有框(框不准)。 444 | 答:** 445 | **考虑几个问题: 446 | 1、数据集问题,这是最重要的问题。小于500的自行考虑增加数据集;一定要检查数据集的标签,视频中详细解析了VOC数据集的格式,但并不是有输入图片有输出标签即可,还需要确认标签的每一个像素值是否为它对应的种类。很多同学的标签格式不对,最常见的错误格式就是标签的背景为黑,目标为白,此时目标的像素点值为255,无法正常训练,目标需要为1才行。 447 | 2、是否解冻训练,如果数据集分布与常规画面差距过大需要进一步解冻训练,调整主干,加强特征提取能力。 448 | 3、网络问题,可以尝试不同的网络。 449 | 4、训练时长问题,有些同学只训练了几代表示没有效果,按默认参数训练完。 450 | 5、确认自己是否按照步骤去做了。 451 | 6、不同网络的LOSS不同,LOSS只是一个参考指标,用于查看网络是否收敛,而非评价网络好坏,LOSS的值不重要,重要的是是否收敛。** 452 | 453 | **问:为什么我的训练效果不好?对小目标预测不准确。 454 | 答:对于deeplab和pspnet而言,可以修改一下downsample_factor,当downsample_factor为16的时候下采样倍数过多,效果不太好,可以修改为8。** 455 | 456 | ### f、为什么我计算出来的miou是0? 457 | **问:为什么我的训练效果不好?计算出来的miou是0?。** 458 | 答: 459 | 与e类似,**考虑几个问题: 460 | 1、数据集问题,这是最重要的问题。小于500的自行考虑增加数据集;一定要检查数据集的标签,视频中详细解析了VOC数据集的格式,但并不是有输入图片有输出标签即可,还需要确认标签的每一个像素值是否为它对应的种类。很多同学的标签格式不对,最常见的错误格式就是标签的背景为黑,目标为白,此时目标的像素点值为255,无法正常训练,目标需要为1才行。 461 | 2、是否解冻训练,如果数据集分布与常规画面差距过大需要进一步解冻训练,调整主干,加强特征提取能力。 462 | 3、网络问题,可以尝试不同的网络。 463 | 4、训练时长问题,有些同学只训练了几代表示没有效果,按默认参数训练完。 464 | 5、确认自己是否按照步骤去做了。 465 | 6、不同网络的LOSS不同,LOSS只是一个参考指标,用于查看网络是否收敛,而非评价网络好坏,LOSS的值不重要,重要的是是否收敛。** 466 | 467 | ### g、gbk编码错误('gbk' codec can't decode byte)。 468 | **问:我怎么出现了gbk什么的编码错误啊:** 469 | ```python 470 | UnicodeDecodeError: 'gbk' codec can't decode byte 0xa6 in position 446: illegal multibyte sequence 471 | ``` 472 | **答:标签和路径不要使用中文,如果一定要使用中文,请注意处理的时候编码的问题,改成打开文件的encoding方式改为utf-8。** 473 | 474 | ### h、我的图片是xxx*xxx的分辨率的,可以用吗? 475 | **问:我的图片是xxx*xxx的分辨率的,可以用吗!** 476 | **答:可以用,代码里面会自动进行resize与数据增强。** 477 | 478 | ### i、我想进行数据增强!怎么增强? 479 | **问:我想要进行数据增强!怎么做呢?** 480 | **答:可以用,代码里面会自动进行resize与数据增强。** 481 | 482 | ### j、多GPU训练。 483 | **问:怎么进行多GPU训练? 484 | 答:pytorch的大多数代码可以直接使用gpu训练,keras的话直接百度就好了,实现并不复杂,我没有多卡没法详细测试,还需要各位同学自己努力了。** 485 | 486 | ### k、能不能训练灰度图? 487 | **问:能不能训练灰度图(预测灰度图)啊? 488 | 答:我的大多数库会将灰度图转化成RGB进行训练和预测,如果遇到代码不能训练或者预测灰度图的情况,可以尝试一下在get_random_data里面将Image.open后的结果转换成RGB,预测的时候也这样试试。(仅供参考)** 489 | 490 | ### l、断点续练问题。 491 | **问:我已经训练过几个世代了,能不能从这个基础上继续开始训练 492 | 答:可以,你在训练前,和载入预训练权重一样载入训练过的权重就行了。一般训练好的权重会保存在logs文件夹里面,将model_path修改成你要开始的权值的路径即可。** 493 | 494 | ### m、我要训练其它的数据集,预训练权重能不能用? 495 | **问:如果我要训练其它的数据集,预训练权重要怎么办啊?** 496 | **答:数据的预训练权重对不同数据集是通用的,因为特征是通用的,预训练权重对于99%的情况都必须要用,不用的话权值太过随机,特征提取效果不明显,网络训练的结果也不会好。** 497 | 498 | ### n、网络如何从0开始训练? 499 | **问:我要怎么不使用预训练权重啊? 500 | 答:看一看注释、大多数代码是model_path = '',Freeze_Train = Fasle**,如果设置model_path无用,**那么把载入预训练权重的代码注释了就行。** 501 | 502 | ### o、为什么从0开始训练效果这么差(修改了网络主干,效果不好怎么办)? 503 | **问:为什么我不使用预训练权重效果这么差啊? 504 | 答:因为随机初始化的权值不好,提取的特征不好,也就导致了模型训练的效果不好,预训练权重还是非常重要的。** 505 | 506 | **问:up,我修改了网络,预训练权重还能用吗? 
507 | 答:修改了主干的话,如果不是用的现有的网络,基本上预训练权重是不能用的,要么就自己判断权值里卷积核的shape然后自己匹配,要么只能自己预训练去了;修改了后半部分的话,前半部分的主干部分的预训练权重还是可以用的,如果是pytorch代码的话,需要自己修改一下载入权值的方式,判断shape后载入,如果是keras代码,直接by_name=True,skip_mismatch=True即可。** 508 | 权值匹配的方式可以参考如下: 509 | ```python 510 | # 加快模型训练的效率 511 | print('Loading weights into state dict...') 512 | device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') 513 | model_dict = model.state_dict() 514 | pretrained_dict = torch.load(model_path, map_location=device) 515 | a = {} 516 | for k, v in pretrained_dict.items(): 517 | try: 518 | if np.shape(model_dict[k]) == np.shape(v): 519 | a[k]=v 520 | except: 521 | pass 522 | model_dict.update(a) 523 | model.load_state_dict(model_dict) 524 | print('Finished!') 525 | ``` 526 | 527 | **问:为什么从0开始训练效果这么差(我修改了网络主干,效果不好怎么办)? 528 | 答:一般来讲,网络从0开始的训练效果会很差,因为权值太过随机,特征提取效果不明显,因此非常、非常、非常不建议大家从0开始训练!如果一定要从0开始,可以了解imagenet数据集,首先训练分类模型,获得网络的主干部分权值,分类模型的 主干部分 和该模型通用,基于此进行训练。 529 | 网络修改了主干之后也是同样的问题,随机的权值效果很差。** 530 | 531 | **问:怎么在模型上从0开始训练? 532 | 答:在算力不足与调参能力不足的情况下从0开始训练毫无意义。模型特征提取能力在随机初始化参数的情况下非常差。没有好的参数调节能力和算力,无法使得网络正常收敛。** 533 | 如果一定要从0开始,那么训练的时候请注意几点: 534 | - 不载入预训练权重。 535 | - 不要进行冻结训练,注释冻结模型的代码。 536 | 537 | **问:为什么我不使用预训练权重效果这么差啊? 538 | 答:因为随机初始化的权值不好,提取的特征不好,也就导致了模型训练的效果不好,voc07+12、coco+voc07+12效果都不一样,预训练权重还是非常重要的。** 539 | 540 | ### p、你的权值都是哪里来的? 541 | **问:如果网络不能从0开始训练的话你的权值哪里来的? 542 | 答:有些权值是官方转换过来的,有些权值是自己训练出来的,我用到的主干的imagenet的权值都是官方的。** 543 | 544 | 545 | ### q、视频检测与摄像头检测 546 | **问:怎么用摄像头检测呀? 547 | 答:predict.py修改参数可以进行摄像头检测,也有视频详细解释了摄像头检测的思路。** 548 | 549 | **问:怎么用视频检测呀? 550 | 答:同上** 551 | 552 | ### r、如何保存检测出的图片 553 | **问:检测完的图片怎么保存? 554 | 答:一般目标检测用的是Image,所以查询一下PIL库的Image如何进行保存。详细看看predict.py文件的注释。** 555 | 556 | **问:怎么用视频保存呀? 557 | 答:详细看看predict.py文件的注释。** 558 | 559 | ### s、遍历问题 560 | **问:如何对一个文件夹的图片进行遍历? 561 | 答:一般使用os.listdir先找出文件夹里面的所有图片,然后根据predict.py文件里面的执行思路检测图片就行了,详细看看predict.py文件的注释。** 562 | 563 | **问:如何对一个文件夹的图片进行遍历?并且保存。 564 | 答:遍历的话一般使用os.listdir先找出文件夹里面的所有图片,然后根据predict.py文件里面的执行思路检测图片就行了。保存的话一般目标检测用的是Image,所以查询一下PIL库的Image如何进行保存。如果有些库用的是cv2,那就是查一下cv2怎么保存图片。详细看看predict.py文件的注释。** 565 | 566 | ### t、路径问题(No such file or directory、StopIteration: [Errno 13] Permission denied: 'XXXXXX') 567 | **问:我怎么出现了这样的错误呀:** 568 | ```python 569 | FileNotFoundError: 【Errno 2】 No such file or directory 570 | StopIteration: [Errno 13] Permission denied: 'D:\\Study\\Collection\\Dataset\\VOC07+12+test\\VOCdevkit/VOC2007' 571 | …………………………………… 572 | …………………………………… 573 | ``` 574 | **答:去检查一下文件夹路径,查看是否有对应文件;并且检查一下2007_train.txt,其中文件路径是否有错。** 575 | 关于路径有几个重要的点: 576 | **文件夹名称中一定不要有空格。 577 | 注意相对路径和绝对路径。 578 | 多百度路径相关的知识。** 579 | 580 | **所有的路径问题基本上都是根目录问题,好好查一下相对目录的概念!** 581 | ### u、和原版比较问题,你怎么和原版不一样啊? 582 | **问:原版的代码是XXX,为什么你的代码是XXX? 583 | 答:是啊……这要不怎么说我不是原版呢……** 584 | 585 | **问:你这个代码和原版比怎么样,可以达到原版的效果么? 586 | 答:基本上可以达到,我都用voc数据测过,我没有好显卡,没有能力在coco上测试与训练。** 587 | 588 | ### v、我的检测速度是xxx正常吗?我的检测速度还能增快吗? 589 | **问:你这个FPS可以到达多少,可以到 XX FPS么? 590 | 答:FPS和机子的配置有关,配置高就快,配置低就慢。** 591 | 592 | **问:我的检测速度是xxx正常吗?我的检测速度还能增快吗? 593 | 答:看配置,配置好速度就快,如果想要配置不变的情况下加快速度,就要修改网络了。** 594 | 595 | **问:为什么论文中说速度可以达到XX,但是这里却没有? 596 | 答:检查是否正确安装了tensorflow-gpu或者pytorch的gpu版本,如果已经正确安装,可以去利用time.time()的方法查看detect_image里面,哪一段代码耗时更长(不仅只有网络耗时长,其它处理部分也会耗时,如绘图等)。有些论文还会使用多batch进行预测,我并没有去实现这个部分。** 597 | 598 | ### w、预测图片不显示问题 599 | **问:为什么你的代码在预测完成后不显示图片?只是在命令行告诉我有什么目标。 600 | 答:给系统安装一个图片查看器就行了。** 601 | 602 | ### x、算法评价问题(miou) 603 | **问:怎么计算miou? 
604 | 答:参考视频里的miou测量部分。**
605 |
606 | **问:怎么计算Recall、Precision指标。
607 | 答:现有的代码还无法获得,需要各位同学理解一下混淆矩阵的概念,然后自行计算一下。**
608 |
609 | ### y、UP,怎么优化模型啊?我想提升效果
610 | **问:up,怎么修改模型啊,我想发个小论文!
611 | 答:建议看看目标检测中的yolov4论文,作为一个大型调参现场非常有参考意义,使用了很多tricks。我能给的建议就是多看一些经典模型,然后拆解里面的亮点结构并使用。**
612 |
613 | ### z、部署问题(ONNX、TensorRT等)
614 | 我没有具体部署到手机等设备上过,所以很多部署问题我并不了解……
615 |
616 | ## 5、交流群问题
617 | **问:up,有没有QQ群啥的呢?
618 | 答:没有没有,我没有时间管理QQ群……**
619 |
620 | ## 6、怎么学习的问题
621 | **问:up,你的学习路线怎么样的?我是个小白我要怎么学?
622 | 答:这里有几点需要注意哈
623 | 1、我不是高手,很多东西我也不会,我的学习路线也不一定适用所有人。
624 | 2、我实验室不做深度学习,所以我很多东西都是自学,自己摸索,正确与否我也不知道。
625 | 3、我个人觉得学习更靠自学**
626 | 学习路线的话,我是先学习了莫烦的python教程,从tensorflow、keras、pytorch入门,入门完之后学的SSD、YOLO,然后了解了很多经典的卷积网络,后面就开始学很多不同的代码了。我的学习方法就是一行一行地看,了解整个代码的执行流程、特征层的shape变化等,花了很多时间,也没有什么捷径,就是要花时间吧。
627 | --------------------------------------------------------------------------------