├── README.md ├── app.py ├── cam ├── 1.png ├── 2.png ├── __pycache__ │ ├── base_camera.cpython-37.pyc │ └── base_camera.cpython-38.pyc ├── base_camera.py ├── camera.py ├── coco.names ├── result.png ├── test.jpg ├── test_re.jpg └── train.jpg ├── center ├── get_train_val.py └── xml_yolo.py ├── config ├── score.yaml ├── yolov3-spp.yaml ├── yolov5l.yaml ├── yolov5m.yaml ├── yolov5s.yaml └── yolov5x.yaml ├── detect.py ├── inference ├── inputs │ └── 2007_000033.jpg └── outputs │ └── 2007_000033.jpg ├── models ├── __pycache__ │ ├── common.cpython-37.pyc │ ├── de.cpython-37.pyc │ ├── experimental.cpython-37.pyc │ └── yolo.cpython-37.pyc ├── common.py ├── de.py ├── experimental.py ├── onnx_export.py └── yolo.py ├── requirements.txt ├── static ├── client.js ├── style.css ├── style1.css └── worker.js ├── templates └── index1.html ├── test.py ├── train.py └── utils ├── __init__.py ├── __pycache__ ├── __init__.cpython-37.pyc ├── datasets.cpython-37.pyc ├── google_utils.cpython-37.pyc ├── torch_utils.cpython-37.pyc └── utils.cpython-37.pyc ├── activations.py ├── datasets.py ├── google_utils.py ├── torch_utils.py └── utils.py /README.md: -------------------------------------------------------------------------------- 1 | # Training YOLOv5 on your own dataset (detailed walkthrough) and deploying it with Flask 2 | 3 | #### Dependencies 4 | - torch 5 | - torchvision 6 | - numpy 7 | - opencv-python 8 | - lxml 9 | - tqdm 10 | - flask 11 | - pillow 12 | - tensorboard 13 | - matplotlib 14 | - pycocotools 15 | 16 | #### On Windows, use pycocotools-windows instead of pycocotools 17 | 18 | #### Install all dependencies 19 | ``` 20 | pip install -r requirements.txt 21 | ``` 22 | ### 1. Prepare the dataset 23 | 24 | This walkthrough uses the PASCAL VOC dataset as an example, [Baidu netdisk, extraction code: 07wp](https://pan.baidu.com/s/1u8k9wlLUklyLxQnaSrG4xQ) 25 | Put the downloaded dataset under the datasets directory. 26 | The dataset is structured as follows: 27 | ``` 28 | ---VOC2012 29 | --------Annotations 30 | ---------------xml0 31 | ---------------xml1 32 | --------JPEGImages 33 | ---------------img0 34 | ---------------img1 35 | --------pascal_voc_classes.txt 36 | ``` 37 | Annotations holds all the xml files, JPEGImages holds all the images, and pascal_voc_classes.txt is the class list. 38 | 39 | #### Generating the label files 40 | YOLO label files have the following format: 41 | ``` 42 | 102 0.682813 0.415278 0.237500 0.502778 43 | 102 0.914844 0.396528 0.168750 0.451389 44 | 45 | The first value is the label, i.e. the class of the object in the image. 46 | The next four values give the object's position (x_center, y_center, w, h): the relative coordinates of the object's center and its relative width and height. 47 | The example above contains two objects. 48 | ``` 49 | If you already have label files like these, skip straight to the next step. 50 | If you do not, you can annotate with [labelimg, extraction code dbi2](https://pan.baidu.com/s/1oEFodW83koHLcGasRoBZhA), which produces xml label files that are then converted to YOLO-format label files. labelimg is very easy to use, so it is not covered here. 51 | 52 | To convert xml label files to YOLO format: 53 | 54 | ``` 55 | python center/xml_yolo.py 56 | ``` 57 | 58 | pascal_voc_classes.txt is the file that lists your classes as a JSON-style array. The VOC classes look like this: 59 | ```python 60 | ["aeroplane","bicycle", "bird","boat","bottle","bus","car","cat","chair","cow","diningtable","dog","horse","motorbike","person","pottedplant","sheep","sofa","train", "tvmonitor"] 61 | ``` 62 | #### Directory structure after running the script above 63 | ``` 64 | ---VOC2012 65 | --------Annotations 66 | --------JPEGImages 67 | --------pascal_voc_classes.json 68 | ---yolodata 69 | --------images 70 | --------labels 71 | ``` 72 | 73 | ### 2. Split into training and validation sets 74 | The split is simple: shuffle the original data and divide it 9:1 into a training set and a validation set (a minimal sketch of this logic is shown after the directory listing below). Run: 75 | 76 | ``` 77 | python center/get_train_val.py 78 | ``` 79 | ##### Running the script above produces the following structure 80 | ``` 81 | ---VOC2012 82 | --------Annotations 83 | --------JPEGImages 84 | --------pascal_voc_classes.json 85 | ---yolodata 86 | --------images 87 | --------labels 88 | ---traindata 89 | --------images 90 | ----------------train 91 | ----------------val 92 | --------labels 93 | ----------------train 94 | ----------------val 95 | ```
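The split performed by center/get_train_val.py boils down to the logic below. This is only a minimal sketch, assuming the yolo_data layout from step 1 and .jpg images; the actual script does the same thing with numpy shuffling and a tqdm progress bar.

```python
import os
import random
import shutil

# Sketch of the 9:1 train/val split (same idea as center/get_train_val.py).
image_root = './datasets/yolo_data/images'
label_root = './datasets/yolo_data/labels'
save_root = './datasets/traindata'

labels = sorted(os.listdir(label_root))
random.seed(10101)                      # fixed seed so the split is reproducible
random.shuffle(labels)
num_val = int(len(labels) * 0.1)        # 10% validation, 90% training
splits = {'val': labels[:num_val], 'train': labels[num_val:]}

for split, files in splits.items():
    for sub in ('images', 'labels'):
        os.makedirs(os.path.join(save_root, sub, split), exist_ok=True)
    for label_file in files:
        stem = os.path.splitext(label_file)[0]
        shutil.copyfile(os.path.join(image_root, stem + '.jpg'),
                        os.path.join(save_root, 'images', split, stem + '.jpg'))
        shutil.copyfile(os.path.join(label_root, label_file),
                        os.path.join(save_root, 'labels', split, label_file))
```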
96 | ##### traindata is the final training data you need 97 | 98 | ### 3. Train the model 99 | 100 | Training YOLOv5 is straightforward. The code in this repository has been simplified; it is organized as follows: 101 | 102 | ``` 103 | dataset # datasets 104 | ------traindata # training data 105 | inference # input/output 106 | ------inputs # input data 107 | ------outputs # output data 108 | config # configuration files 109 | ------score.yaml # training configuration 110 | ------yolov5l.yaml # model configuration 111 | models # model code 112 | runs # logs 113 | utils # utility code 114 | weights # saved weights, last.pt and best.pt 115 | train.py # training script 116 | detect.py # detection/test script 117 | ``` 118 | 119 | score.yaml is explained below: 120 | ``` 121 | # train and val datasets (image directory) 122 | train: ./datasets/traindata/images/train/ 123 | val: ./datasets/traindata/images/val/ 124 | # number of classes 125 | nc: 2 126 | # class names 127 | names: ['苹果','香蕉'] 128 | ``` 129 | 130 | - train: path to the training images 131 | - val: path to the validation images 132 | - nc: number of classes 133 | - names: the corresponding class names 134 | 135 | 136 | ##### yolov5l.yaml is explained below: 137 | 138 | ``` 139 | nc: 2 # number of classes 140 | depth_multiple: 1.0 # model depth multiple 141 | width_multiple: 1.0 # layer channel multiple 142 | anchors: 143 | - [10,13, 16,30, 33,23] # P3/8 144 | - [30,61, 62,45, 59,119] # P4/16 145 | - [116,90, 156,198, 373,326] # P5/32 146 | backbone: 147 | # [from, number, module, args] 148 | [[-1, 1, Focus, [64, 3]], # 1-P1/2 149 | [-1, 1, Conv, [128, 3, 2]], # 2-P2/4 150 | [-1, 3, Bottleneck, [128]], 151 | [-1, 1, Conv, [256, 3, 2]], # 4-P3/8 152 | [-1, 9, BottleneckCSP, [256]], 153 | [-1, 1, Conv, [512, 3, 2]], # 6-P4/16 154 | [-1, 9, BottleneckCSP, [512]], 155 | [-1, 1, Conv, [1024, 3, 2]], # 8-P5/32 156 | [-1, 1, SPP, [1024, [5, 9, 13]]], 157 | [-1, 6, BottleneckCSP, [1024]], # 10 158 | ] 159 | head: 160 | [[-1, 3, BottleneckCSP, [1024, False]], # 11 161 | [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1, 0]], # 12 (P5/32-large) 162 | [-2, 1, nn.Upsample, [None, 2, 'nearest']], 163 | [[-1, 6], 1, Concat, [1]], # cat backbone P4 164 | [-1, 1, Conv, [512, 1, 1]], 165 | [-1, 3, BottleneckCSP, [512, False]], 166 | [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1, 0]], # 17 (P4/16-medium) 167 | [-2, 1, nn.Upsample, [None, 2, 'nearest']], 168 | [[-1, 4], 1, Concat, [1]], # cat backbone P3 169 | [-1, 1, Conv, [256, 1, 1]], 170 | [-1, 3, BottleneckCSP, [256, False]], 171 | [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1, 0]], # 22 (P3/8-small) 172 | [[], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5) 173 | ] 174 | ``` 175 | - nc: number of target classes 176 | - depth_multiple and width_multiple: control model depth and width; different values correspond to the s, m, l and x models. 177 | - anchors: prior boxes obtained by k-means clustering of the ground-truth boxes; the target boxes are predicted relative to these priors. 178 | - YOLOv5 generates anchors automatically: it runs k-means with Euclidean distance and then applies a genetic algorithm to mutate them into the final anchors. In my experience, k-means with Euclidean distance gives slightly worse results than k-means with a 1 - IoU distance (message me if you want that source; a minimal sketch of the idea is also shown at the end of this section), although in practice the difference is negligible. 179 | - backbone: the network that extracts image features. 180 | - head: the network that produces the final predictions. 181 | 182 | 183 | ##### Configuring train.py is very simple: 184 | ![train.py arguments](cam/1.png) 185 | 186 | We only need to modify the following parameters: 187 | ``` 188 | epochs: number of training epochs 189 | batch_size: number of images per batch 190 | cfg: path to the model configuration file 191 | data: path to the training configuration file 192 | weights: weights to load, for resuming training from a checkpoint 193 | ``` 194 | Run in a terminal (yolov5l by default) 195 | ``` 196 | python train.py 197 | ``` 198 | to start training. 199 | 200 | ##### Training process 201 | 202 | ![](cam/train.jpg) 203 | 204 | ##### Training results 205 | 206 | ![](cam/result.png) 207 |
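The anchors note above mentions k-means with a 1 - IoU distance. Below is a minimal sketch of that idea, not the exact code referred to above; the label directory and the 640x640 training resolution are assumptions, and the printed (w, h) pairs would replace the anchors entries in the model yaml, three per detection scale.

```python
import glob
import numpy as np

def iou_wh(wh, centers):
    # IoU between every (w, h) box and every cluster center, with boxes aligned at a common corner.
    inter = np.minimum(wh[:, None, 0], centers[None, :, 0]) * np.minimum(wh[:, None, 1], centers[None, :, 1])
    union = wh[:, None, 0] * wh[:, None, 1] + centers[None, :, 0] * centers[None, :, 1] - inter
    return inter / union

def kmeans_iou(wh, k=9, iters=300, seed=0):
    # Lloyd-style k-means that uses 1 - IoU instead of Euclidean distance.
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), size=k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(wh, centers), axis=1)               # highest IoU = smallest 1 - IoU
        new_centers = np.array([wh[assign == i].mean(axis=0) if np.any(assign == i) else centers[i]
                                for i in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers[np.argsort(centers.prod(axis=1))]                  # sort by area, small to large

if __name__ == '__main__':
    wh = []
    for txt in glob.glob('./datasets/traindata/labels/train/*.txt'):  # assumed label location
        with open(txt) as f:
            for line in f:
                _, _, _, w, h = map(float, line.split())              # class x_center y_center w h
                wh.append([w * 640, h * 640])                         # assume 640x640 training resolution
    print(kmeans_iou(np.array(wh)).round(1))
```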
208 | ### 4. Test the model 209 | 210 | ![](cam/2.png) 211 | 212 | ##### Three parameters need to be modified 213 | ``` 214 | source: path to the images/videos to run detection on 215 | out: path where the results are saved 216 | weights: path to the trained model weights 217 | ``` 218 | ##### You can also test with weights trained on the COCO dataset; put them in the weights folder 219 | 220 | [Baidu netdisk, extraction code: hhbb](https://pan.baidu.com/s/18AD8HpLhcRGSKOwGwPJMMg) 221 | 222 | Run in a terminal 223 | ``` 224 | python detect.py 225 | ``` 226 | to start detection. 227 | 228 | ##### Test results 229 | 230 | ![](cam/test.jpg) 231 | 232 | ![](cam/test_re.jpg) 233 | 234 | ### 5. Deploy with Flask 235 | 236 | Deployment with Flask is very simple. If anything is unclear, see my earlier blog posts: 237 | 238 | [Deploying a Python/Flask project on Alibaba Cloud ECS, simple and clear, without nginx or uwsgi](https://blog.csdn.net/qq_44523137/article/details/112676287?spm=1001.2014.3001.5501) 239 | 240 | [An object detection and multi-object tracking web platform based on yolov3-deepsort-flask](https://blog.csdn.net/qq_44523137/article/details/116323516?spm=1001.2014.3001.5501) 241 | 242 | 243 | 244 | Run in a terminal 245 | ``` 246 | python app.py 247 | ``` 248 | then open the page in your browser and upload an image to run detection. 249 | 250 | 251 | 252 | 253 | -------------------------------------------------------------------------------- /app.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import time 3 | from flask import Flask, request, Response,render_template 4 | import json 5 | from cam.base_camera import BaseCamera 6 | 7 | from models.de import detect,get_model 8 | import os 9 | os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE" 10 | app = Flask(__name__) 11 | class_names = [c.strip() for c in open(r'cam/coco.names').readlines()] 12 | file_name = ['jpg','jpeg','png'] 13 | 14 | yolov5_model = get_model() 15 | 16 | @app.route('/images', methods= ['POST']) 17 | def get_image(): 18 | image = request.files["images"] 19 | image_name = image.filename 20 | image.save(os.path.join(os.getcwd(), image_name)) 21 | if image_name.split(".")[-1] in file_name: 22 | img = cv2.imread(image_name) 23 | img = detect(yolov5_model,img) 24 | _, img_encoded = cv2.imencode('.jpg', img) 25 | response = img_encoded.tobytes() 26 | os.remove(image_name) 27 | try: 28 | return Response(response=response, status=200, mimetype='image/jpg') 29 | except: 30 | return render_template('index1.html') 31 | @app.route('/') 32 | def upload_file(): 33 | return render_template('index1.html') 34 | if __name__ == '__main__': 35 | # Run locally 36 | app.run(debug=True, host='127.0.0.1', port=5000) 37 | #Run on the server 38 | # app.run(debug=True, host = '0.0.0.0', port=5000) 39 | -------------------------------------------------------------------------------- /cam/1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Transformer-man/yolov5-flask/36573a0b6e91d5a91f3394af278f5a5e768efae7/cam/1.png -------------------------------------------------------------------------------- /cam/2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Transformer-man/yolov5-flask/36573a0b6e91d5a91f3394af278f5a5e768efae7/cam/2.png -------------------------------------------------------------------------------- /cam/__pycache__/base_camera.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Transformer-man/yolov5-flask/36573a0b6e91d5a91f3394af278f5a5e768efae7/cam/__pycache__/base_camera.cpython-37.pyc -------------------------------------------------------------------------------- /cam/__pycache__/base_camera.cpython-38.pyc: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/Transformer-man/yolov5-flask/36573a0b6e91d5a91f3394af278f5a5e768efae7/cam/__pycache__/base_camera.cpython-38.pyc -------------------------------------------------------------------------------- /cam/base_camera.py: -------------------------------------------------------------------------------- 1 | import time 2 | import threading 3 | try: 4 | from greenlet import getcurrent as get_ident 5 | except ImportError: 6 | try: 7 | from thread import get_ident 8 | except ImportError: 9 | from _thread import get_ident 10 | 11 | 12 | class CameraEvent(object): 13 | """An Event-like class that signals all active clients when a new frame is 14 | available. 15 | """ 16 | def __init__(self): 17 | self.events = {} 18 | 19 | def wait(self): 20 | """Invoked from each client's thread to wait for the next frame.""" 21 | ident = get_ident() 22 | if ident not in self.events: 23 | # this is a new client 24 | # add an entry for it in the self.events dict 25 | # each entry has two elements, a threading.Event() and a timestamp 26 | self.events[ident] = [threading.Event(), time.time()] 27 | return self.events[ident][0].wait() 28 | 29 | def set(self): 30 | """Invoked by the camera thread when a new frame is available.""" 31 | now = time.time() 32 | remove = None 33 | for ident, event in self.events.items(): 34 | if not event[0].isSet(): 35 | # if this client's event is not set, then set it 36 | # also update the last set timestamp to now 37 | event[0].set() 38 | event[1] = now 39 | else: 40 | # if the client's event is already set, it means the client 41 | # did not process a previous frame 42 | # if the event stays set for more than 5 seconds, then assume 43 | # the client is gone and remove it 44 | if now - event[1] > 5: 45 | remove = ident 46 | if remove: 47 | del self.events[remove] 48 | 49 | def clear(self): 50 | """Invoked from each client's thread after a frame was processed.""" 51 | self.events[get_ident()][0].clear() 52 | 53 | 54 | class BaseCamera(object): 55 | thread = None # background thread that reads frames from camera 56 | frame = None # current frame is stored here by background thread 57 | last_access = 0 # time of last client access to the camera 58 | event = CameraEvent() 59 | 60 | def __init__(self): 61 | """Start the background camera thread if it isn't running yet.""" 62 | if BaseCamera.thread is None: 63 | BaseCamera.last_access = time.time() 64 | 65 | # start background frame thread 66 | BaseCamera.thread = threading.Thread(target=self._thread) 67 | BaseCamera.thread.start() 68 | 69 | # wait until frames are available 70 | while self.get_frame() is None: 71 | time.sleep(0) 72 | 73 | def get_frame(self): 74 | """Return the current camera frame.""" 75 | BaseCamera.last_access = time.time() 76 | 77 | # wait for a signal from the camera thread 78 | BaseCamera.event.wait() 79 | BaseCamera.event.clear() 80 | 81 | return BaseCamera.frame 82 | 83 | @staticmethod 84 | def frames(path): 85 | """"Generator that returns frames from the camera.""" 86 | raise RuntimeError('Must be implemented by subclasses.') 87 | 88 | @classmethod 89 | def _thread(cls): 90 | """Camera background thread.""" 91 | print('Starting camera thread.') 92 | frames_iterator = cls.frames() 93 | for frame in frames_iterator: 94 | BaseCamera.frame = frame 95 | BaseCamera.event.set() # send signal to clients 96 | time.sleep(0) 97 | 98 | # if there hasn't been any clients asking for frames in 99 | # the last 10 seconds then stop the thread 100 | if time.time() - BaseCamera.last_access > 60: 101 | 
frames_iterator.close() 102 | print('Stopping camera thread due to inactivity.') 103 | break 104 | BaseCamera.thread = None -------------------------------------------------------------------------------- /cam/camera.py: -------------------------------------------------------------------------------- 1 | 2 | from cam.base_camera import BaseCamera 3 | import cv2 4 | import tensorflow as tf 5 | from yolov3_tf2.models import YoloV3 6 | from yolov3_tf2.dataset import transform_images 7 | from yolov3_tf2.utils import draw_outputs 8 | 9 | # customize your API through the following parameters 10 | classes_path = 'coco.names' 11 | weights_path = './weights/yolov3.tf' 12 | tiny = False # set to True if using a Yolov3 Tiny model 13 | size = 416 # size images are resized to for model 14 | output_path = './detections/' # path to output folder where images with detections are saved 15 | num_classes = 80 # number of classes in model 16 | 17 | # load in weights and classes 18 | physical_devices = tf.config.experimental.list_physical_devices('GPU') 19 | if len(physical_devices) > 0: 20 | tf.config.experimental.set_memory_growth(physical_devices[0], True) 21 | 22 | 23 | yolo = YoloV3(classes=num_classes) 24 | 25 | yolo.load_weights(weights_path).expect_partial() 26 | print('weights loaded') 27 | 28 | class_names = [c.strip() for c in open(classes_path).readlines()] 29 | print('classes loaded') 30 | 31 | 32 | class Camera(BaseCamera): 33 | 34 | @staticmethod 35 | def frames(): 36 | cam = cv2.VideoCapture(r'./finish.mp4') 37 | if not cam.isOpened(): 38 | raise RuntimeError('Could not start camera.') 39 | 40 | while True: 41 | # read current frame 42 | _, img = cam.read() 43 | try: 44 | if CameraParams.gray: 45 | img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) 46 | if CameraParams.gaussian: 47 | img_raw = tf.convert_to_tensor(img) 48 | img_raw = tf.expand_dims(img_raw, 0) 49 | # img detect 50 | img_raw = transform_images(img_raw, size) 51 | boxes, scores, classes, nums = yolo(img_raw) 52 | img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR) 53 | img = draw_outputs(img, (boxes, scores, classes, nums), class_names) 54 | if CameraParams.sobel: 55 | if(len(img.shape) == 3): 56 | img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) 57 | img = cv2.Sobel(img,cv2.CV_64F,1,0,ksize=5) # x 58 | img = cv2.Sobel(img,cv2.CV_64F,0,1,ksize=5) # y 59 | if CameraParams.canny: 60 | img = cv2.Canny(img, 100, 200, 3, L2gradient=True) 61 | except Exception as e: 62 | print(e) 63 | # encode as a jpeg image and return it 64 | yield cv2.imencode('.jpg', img)[1].tobytes() 65 | 66 | class CameraParams(): 67 | 68 | gray = False 69 | gaussian = False 70 | sobel = False 71 | canny = False 72 | def __init__(self, gray, gaussian, sobel, canny, yolo): 73 | self.gray = gray 74 | self.gaussian = gaussian 75 | self.sobel = sobel 76 | self.canny = canny 77 | self.yolo 78 | -------------------------------------------------------------------------------- /cam/coco.names: -------------------------------------------------------------------------------- 1 | person 2 | bicycle 3 | car 4 | motorbike 5 | aeroplane 6 | bus 7 | train 8 | truck 9 | boat 10 | traffic light 11 | fire hydrant 12 | stop sign 13 | parking meter 14 | bench 15 | bird 16 | cat 17 | dog 18 | horse 19 | sheep 20 | cow 21 | elephant 22 | bear 23 | zebra 24 | giraffe 25 | backpack 26 | umbrella 27 | handbag 28 | tie 29 | suitcase 30 | frisbee 31 | skis 32 | snowboard 33 | sports ball 34 | kite 35 | baseball bat 36 | baseball glove 37 | skateboard 38 | surfboard 39 | tennis racket 40 | bottle 41 | wine 
glass 42 | cup 43 | fork 44 | knife 45 | spoon 46 | bowl 47 | banana 48 | apple 49 | sandwich 50 | orange 51 | broccoli 52 | carrot 53 | hot dog 54 | pizza 55 | donut 56 | cake 57 | chair 58 | sofa 59 | pottedplant 60 | bed 61 | diningtable 62 | toilet 63 | tvmonitor 64 | laptop 65 | mouse 66 | remote 67 | keyboard 68 | cell phone 69 | microwave 70 | oven 71 | toaster 72 | sink 73 | refrigerator 74 | book 75 | clock 76 | vase 77 | scissors 78 | teddy bear 79 | hair drier 80 | toothbrush 81 | -------------------------------------------------------------------------------- /cam/result.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Transformer-man/yolov5-flask/36573a0b6e91d5a91f3394af278f5a5e768efae7/cam/result.png -------------------------------------------------------------------------------- /cam/test.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Transformer-man/yolov5-flask/36573a0b6e91d5a91f3394af278f5a5e768efae7/cam/test.jpg -------------------------------------------------------------------------------- /cam/test_re.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Transformer-man/yolov5-flask/36573a0b6e91d5a91f3394af278f5a5e768efae7/cam/test_re.jpg -------------------------------------------------------------------------------- /cam/train.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Transformer-man/yolov5-flask/36573a0b6e91d5a91f3394af278f5a5e768efae7/cam/train.jpg -------------------------------------------------------------------------------- /center/get_train_val.py: -------------------------------------------------------------------------------- 1 | import os,shutil 2 | import numpy as np 3 | import cv2 4 | from tqdm import tqdm 5 | #上一步保存的所有image和label文件路径 6 | image_root = r'../datasets/yolo_data/images' 7 | label_root = r'../datasets/yolo_data/labels' 8 | names = [] 9 | for root,dir,files in os.walk(label_root ): 10 | for file in files: 11 | names.append(file) 12 | val_split = 0.1 13 | np.random.seed(10101) 14 | np.random.shuffle(names) 15 | num_val = int(len(names)*val_split) 16 | num_train = len(names) - num_val 17 | trains = names[:num_train] 18 | vals = names[num_train:] 19 | #保存路径 20 | save_path_img = r'../datasets//traindata' 21 | if not os.path.exists(save_path_img): 22 | os.mkdir(save_path_img) 23 | def get_train_val_data(img_root,txt_root,save_path_img,files,typ): 24 | def get_path(root_path,path1): 25 | path = os.path.join(root_path,path1) 26 | if not os.path.exists(path): 27 | os.mkdir(path) 28 | return path 29 | for val in tqdm(files): 30 | txt_path = os.path.join(txt_root,val) 31 | img_path = os.path.join(img_root,val.split('.')[0]+'.jpg') 32 | img_path1 = get_path(save_path_img,'images') 33 | txt_path1 = get_path(save_path_img,'labels') 34 | rt_img = get_path(img_path1,typ) 35 | rt_txt = get_path(txt_path1,typ) 36 | txt_path1 = os.path.join(rt_txt,val) 37 | img_path1 = os.path.join(rt_img,val.split('.')[0]+'.jpg') 38 | shutil.copyfile(img_path, img_path1) 39 | shutil.copyfile(txt_path,txt_path1) 40 | get_train_val_data(image_root,label_root,save_path_img,vals,'val') 41 | get_train_val_data(image_root,label_root,save_path_img,trains,'train') 42 | 43 | 44 | 45 | 46 | 47 | -------------------------------------------------------------------------------- /center/xml_yolo.py: 
-------------------------------------------------------------------------------- 1 | import os 2 | from tqdm import tqdm 3 | from lxml import etree 4 | import json 5 | import shutil 6 | # 原始xml路径和image路径 7 | xml_root_path = r'../datasets/VOC2012/Annotations' 8 | img_root_path = r'../datasets/VOC2012/JPEGImages' 9 | # 保存的图片和yolo格式label路径。要新建文件夹 10 | def get_path(path): 11 | if not os.path.exists(path): 12 | os.mkdir(path) 13 | return path 14 | get_path(r'../datasets/yolo_data') 15 | save_label_path = get_path(r'../datasets/yolo_data/labels') 16 | save_images_path = get_path(r'../datasets/yolo_data/images') 17 | def parse_xml_to_dict(xml): 18 | if len(xml) == 0: # 遍历到底层,直接返回tag对应的信息 19 | return {xml.tag: xml.text} 20 | result = {} 21 | for child in xml: 22 | child_result = parse_xml_to_dict(child) # 递归遍历标签信息 23 | if child.tag != 'object': 24 | result[child.tag] = child_result[child.tag] 25 | else: 26 | if child.tag not in result: # 因为object可能有多个,所以需要放入列表里 27 | result[child.tag] = [] 28 | result[child.tag].append(child_result[child.tag]) 29 | return {xml.tag: result} 30 | def translate_info(file_names, img_root_path, class_list): 31 | for root,dirs,files in os.walk(file_names): 32 | for file in tqdm(files): 33 | # 检查xml文件是否存在 34 | xml_path = os.path.join(root, file) 35 | # read xml 36 | with open(xml_path) as fid: 37 | xml_str = fid.read() 38 | xml = etree.fromstring(xml_str) 39 | data = parse_xml_to_dict(xml)["annotation"] 40 | img_height = int(data["size"]["height"]) 41 | img_width = int(data["size"]["width"]) 42 | img_path = data["filename"] 43 | 44 | # write object info into txt 45 | assert "object" in data.keys(), "file: '{}' lack of object key.".format(xml_path) 46 | if len(data["object"]) == 0: 47 | # 如果xml文件中没有目标就直接忽略该样本 48 | print("Warning: in '{}' xml, there are no objects.".format(xml_path)) 49 | continue 50 | with open(os.path.join(save_label_path, file.split(".")[0] + ".txt"), "w") as f: 51 | for index, obj in enumerate(data["object"]): 52 | # 获取每个object的box信息 53 | xmin = float(obj["bndbox"]["xmin"]) 54 | xmax = float(obj["bndbox"]["xmax"]) 55 | ymin = float(obj["bndbox"]["ymin"]) 56 | ymax = float(obj["bndbox"]["ymax"]) 57 | class_name = obj["name"] 58 | class_index = class_list.index(class_name) 59 | # 进一步检查数据,有的标注信息中可能有w或h为0的情况,这样的数据会导致计算回归loss为nan 60 | if xmax <= xmin or ymax <= ymin: 61 | print("Warning: in '{}' xml, there are some bbox w/h <=0".format(xml_path)) 62 | continue 63 | # 将box信息转换到yolo格式 64 | xcenter = xmin + (xmax - xmin) / 2 65 | ycenter = ymin + (ymax - ymin) / 2 66 | w = xmax - xmin 67 | h = ymax - ymin 68 | # 绝对坐标转相对坐标,保存6位小数 69 | xcenter = round(xcenter / img_width, 6) 70 | ycenter = round(ycenter / img_height, 6) 71 | w = round(w / img_width, 6) 72 | h = round(h / img_height, 6) 73 | info = [str(i) for i in [class_index, xcenter, ycenter, w, h]] 74 | if index == 0: 75 | f.write(" ".join(info)) 76 | else: 77 | f.write("\n" + " ".join(info)) 78 | # copy image into save_images_path 79 | path_copy_to = os.path.join(save_images_path,file.split(".")[0] + ".jpg") 80 | shutil.copyfile(os.path.join(img_root_path, img_path), path_copy_to) 81 | 82 | label_json_path = r'../datasets/VOC2012/pascal_voc_classes.txt' 83 | with open(label_json_path, 'r') as f: 84 | label_file = f.readlines() 85 | class_list = label_file[0].split(',') 86 | translate_info(xml_root_path, img_root_path, class_list) -------------------------------------------------------------------------------- /config/score.yaml: -------------------------------------------------------------------------------- 
1 | # train and val datasets (image directory or *.txt file with image paths) 2 | train: ./datasets/traindata/images/train/ 3 | val: ./datasets/traindata/images/val/ 4 | # number of classes 5 | nc: 20 6 | # class names 7 | names: ["aeroplane","bicycle","bird","boat","bottle","bus","car","cat","chair","cow","diningtable","dog","horse","motorbike","person","pottedplant","sheep","sofa","train","tvmonitor"] -------------------------------------------------------------------------------- /config/yolov3-spp.yaml: -------------------------------------------------------------------------------- 1 | # parameters 2 | nc: 80 # number of classes 3 | depth_multiple: 1.0 # expand model depth 4 | width_multiple: 1.0 # expand layer channels 5 | 6 | # anchors 7 | anchors: 8 | - [10,13, 16,30, 33,23] # P3/8 9 | - [30,61, 62,45, 59,119] # P4/16 10 | - [116,90, 156,198, 373,326] # P5/32 11 | 12 | # darknet53 backbone 13 | backbone: 14 | # [from, number, module, args] 15 | [[-1, 1, Conv, [32, 3, 1]], # 0 16 | [-1, 1, Conv, [64, 3, 2]], # 1-P1/2 17 | [-1, 1, Bottleneck, [64]], 18 | [-1, 1, Conv, [128, 3, 2]], # 3-P2/4 19 | [-1, 2, Bottleneck, [128]], 20 | [-1, 1, Conv, [256, 3, 2]], # 5-P3/8 21 | [-1, 8, Bottleneck, [256]], 22 | [-1, 1, Conv, [512, 3, 2]], # 7-P4/16 23 | [-1, 8, Bottleneck, [512]], 24 | [-1, 1, Conv, [1024, 3, 2]], # 9-P5/32 25 | [-1, 4, Bottleneck, [1024]], # 10 26 | ] 27 | 28 | # yolov3-spp head 29 | # na = len(anchors[0]) 30 | head: 31 | [[-1, 1, Bottleneck, [1024, False]], # 11 32 | [-1, 1, SPP, [512, [5, 9, 13]]], 33 | [-1, 1, Conv, [1024, 3, 1]], 34 | [-1, 1, Conv, [512, 1, 1]], 35 | [-1, 1, Conv, [1024, 3, 1]], 36 | [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]], # 16 (P5/32-large) 37 | 38 | [-3, 1, Conv, [256, 1, 1]], 39 | [-1, 1, nn.Upsample, [None, 2, 'nearest']], 40 | [[-1, 8], 1, Concat, [1]], # cat backbone P4 41 | [-1, 1, Bottleneck, [512, False]], 42 | [-1, 1, Bottleneck, [512, False]], 43 | [-1, 1, Conv, [256, 1, 1]], 44 | [-1, 1, Conv, [512, 3, 1]], 45 | [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]], # 24 (P4/16-medium) 46 | 47 | [-3, 1, Conv, [128, 1, 1]], 48 | [-1, 1, nn.Upsample, [None, 2, 'nearest']], 49 | [[-1, 6], 1, Concat, [1]], # cat backbone P3 50 | [-1, 1, Bottleneck, [256, False]], 51 | [-1, 2, Bottleneck, [256, False]], 52 | [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1]], # 30 (P3/8-small) 53 | 54 | [[], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5) 55 | ] 56 | -------------------------------------------------------------------------------- /config/yolov5l.yaml: -------------------------------------------------------------------------------- 1 | # parameters 2 | nc: 20 # number of classes 3 | depth_multiple: 1.0 # model depth multiple 4 | width_multiple: 1.0 # layer channel multiple 5 | 6 | # anchors 7 | anchors: 8 | - [10,13, 16,30, 33,23] # P3/8 9 | - [30,61, 62,45, 59,119] # P4/16 10 | - [116,90, 156,198, 373,326] # P5/32 11 | 12 | # yolov5 backbone 13 | backbone: 14 | # [from, number, module, args] 15 | [[-1, 1, Focus, [64, 3]], # 1-P1/2 16 | [-1, 1, Conv, [128, 3, 2]], # 2-P2/4 17 | [-1, 3, Bottleneck, [128]], 18 | [-1, 1, Conv, [256, 3, 2]], # 4-P3/8 19 | [-1, 9, BottleneckCSP, [256]], 20 | [-1, 1, Conv, [512, 3, 2]], # 6-P4/16 21 | [-1, 9, BottleneckCSP, [512]], 22 | [-1, 1, Conv, [1024, 3, 2]], # 8-P5/32 23 | [-1, 1, SPP, [1024, [5, 9, 13]]], 24 | [-1, 6, BottleneckCSP, [1024]], # 10 25 | ] 26 | 27 | # yolov5 head 28 | head: 29 | [[-1, 3, BottleneckCSP, [1024, False]], # 11 30 | [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1, 0]], # 12 (P5/32-large) 31 | 32 | [-2, 1, 
nn.Upsample, [None, 2, 'nearest']], 33 | [[-1, 6], 1, Concat, [1]], # cat backbone P4 34 | [-1, 1, Conv, [512, 1, 1]], 35 | [-1, 3, BottleneckCSP, [512, False]], 36 | [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1, 0]], # 17 (P4/16-medium) 37 | 38 | [-2, 1, nn.Upsample, [None, 2, 'nearest']], 39 | [[-1, 4], 1, Concat, [1]], # cat backbone P3 40 | [-1, 1, Conv, [256, 1, 1]], 41 | [-1, 3, BottleneckCSP, [256, False]], 42 | [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1, 0]], # 22 (P3/8-small) 43 | 44 | [[], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5) 45 | ] 46 | -------------------------------------------------------------------------------- /config/yolov5m.yaml: -------------------------------------------------------------------------------- 1 | # parameters 2 | nc: 20 # number of classes 3 | depth_multiple: 0.67 # model depth multiple 4 | width_multiple: 0.75 # layer channel multiple 5 | 6 | # anchors 7 | anchors: 8 | - [10,13, 16,30, 33,23] # P3/8 9 | - [30,61, 62,45, 59,119] # P4/16 10 | - [116,90, 156,198, 373,326] # P5/32 11 | 12 | # yolov5 backbone 13 | backbone: 14 | # [from, number, module, args] 15 | [[-1, 1, Focus, [64, 3]], # 1-P1/2 16 | [-1, 1, Conv, [128, 3, 2]], # 2-P2/4 17 | [-1, 3, Bottleneck, [128]], 18 | [-1, 1, Conv, [256, 3, 2]], # 4-P3/8 19 | [-1, 9, BottleneckCSP, [256]], 20 | [-1, 1, Conv, [512, 3, 2]], # 6-P4/16 21 | [-1, 9, BottleneckCSP, [512]], 22 | [-1, 1, Conv, [1024, 3, 2]], # 8-P5/32 23 | [-1, 1, SPP, [1024, [5, 9, 13]]], 24 | [-1, 6, BottleneckCSP, [1024]], # 10 25 | ] 26 | 27 | # yolov5 head 28 | head: 29 | [[-1, 3, BottleneckCSP, [1024, False]], # 11 30 | [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1, 0]], # 12 (P5/32-large) 31 | 32 | [-2, 1, nn.Upsample, [None, 2, 'nearest']], 33 | [[-1, 6], 1, Concat, [1]], # cat backbone P4 34 | [-1, 1, Conv, [512, 1, 1]], 35 | [-1, 3, BottleneckCSP, [512, False]], 36 | [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1, 0]], # 17 (P4/16-medium) 37 | 38 | [-2, 1, nn.Upsample, [None, 2, 'nearest']], 39 | [[-1, 4], 1, Concat, [1]], # cat backbone P3 40 | [-1, 1, Conv, [256, 1, 1]], 41 | [-1, 3, BottleneckCSP, [256, False]], 42 | [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1, 0]], # 22 (P3/8-small) 43 | 44 | [[], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5) 45 | ] 46 | -------------------------------------------------------------------------------- /config/yolov5s.yaml: -------------------------------------------------------------------------------- 1 | # parameters 2 | nc: 20 # number of classes 3 | depth_multiple: 0.33 # model depth multiple 4 | width_multiple: 0.50 # layer channel multiple 5 | 6 | # anchors 7 | anchors: 8 | - [10,13, 16,30, 33,23] # P3/8 9 | - [30,61, 62,45, 59,119] # P4/16 10 | - [116,90, 156,198, 373,326] # P5/32 11 | 12 | # yolov5 backbone 13 | backbone: 14 | # [from, number, module, args] 15 | [[-1, 1, Focus, [64, 3]], # 1-P1/2 16 | [-1, 1, Conv, [128, 3, 2]], # 2-P2/4 17 | [-1, 3, Bottleneck, [128]], 18 | [-1, 1, Conv, [256, 3, 2]], # 4-P3/8 19 | [-1, 9, BottleneckCSP, [256]], 20 | [-1, 1, Conv, [512, 3, 2]], # 6-P4/16 21 | [-1, 9, BottleneckCSP, [512]], 22 | [-1, 1, Conv, [1024, 3, 2]], # 8-P5/32 23 | [-1, 1, SPP, [1024, [5, 9, 13]]], 24 | [-1, 6, BottleneckCSP, [1024]], # 10 25 | ] 26 | 27 | # yolov5 head 28 | head: 29 | [[-1, 3, BottleneckCSP, [1024, False]], # 11 30 | [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1, 0]], # 12 (P5/32-large) 31 | 32 | [-2, 1, nn.Upsample, [None, 2, 'nearest']], 33 | [[-1, 6], 1, Concat, [1]], # cat backbone P4 34 | [-1, 1, Conv, [512, 1, 1]], 35 | [-1, 3, BottleneckCSP, [512, False]], 36 | [-1, 1, 
nn.Conv2d, [na * (nc + 5), 1, 1, 0]], # 17 (P4/16-medium) 37 | 38 | [-2, 1, nn.Upsample, [None, 2, 'nearest']], 39 | [[-1, 4], 1, Concat, [1]], # cat backbone P3 40 | [-1, 1, Conv, [256, 1, 1]], 41 | [-1, 3, BottleneckCSP, [256, False]], 42 | [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1, 0]], # 22 (P3/8-small) 43 | 44 | [[], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5) 45 | ] 46 | -------------------------------------------------------------------------------- /config/yolov5x.yaml: -------------------------------------------------------------------------------- 1 | # parameters 2 | nc: 80 # number of classes 3 | depth_multiple: 1.33 # model depth multiple 4 | width_multiple: 1.25 # layer channel multiple 5 | 6 | # anchors 7 | anchors: 8 | - [10,13, 16,30, 33,23] # P3/8 9 | - [30,61, 62,45, 59,119] # P4/16 10 | - [116,90, 156,198, 373,326] # P5/32 11 | 12 | # yolov5 backbone 13 | backbone: 14 | # [from, number, module, args] 15 | [[-1, 1, Focus, [64, 3]], # 1-P1/2 16 | [-1, 1, Conv, [128, 3, 2]], # 2-P2/4 17 | [-1, 3, Bottleneck, [128]], 18 | [-1, 1, Conv, [256, 3, 2]], # 4-P3/8 19 | [-1, 9, BottleneckCSP, [256]], 20 | [-1, 1, Conv, [512, 3, 2]], # 6-P4/16 21 | [-1, 9, BottleneckCSP, [512]], 22 | [-1, 1, Conv, [1024, 3, 2]], # 8-P5/32 23 | [-1, 1, SPP, [1024, [5, 9, 13]]], 24 | [-1, 6, BottleneckCSP, [1024]], # 10 25 | ] 26 | 27 | # yolov5 head 28 | head: 29 | [[-1, 3, BottleneckCSP, [1024, False]], # 11 30 | [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1, 0]], # 12 (P5/32-large) 31 | 32 | [-2, 1, nn.Upsample, [None, 2, 'nearest']], 33 | [[-1, 6], 1, Concat, [1]], # cat backbone P4 34 | [-1, 1, Conv, [512, 1, 1]], 35 | [-1, 3, BottleneckCSP, [512, False]], 36 | [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1, 0]], # 17 (P4/16-medium) 37 | 38 | [-2, 1, nn.Upsample, [None, 2, 'nearest']], 39 | [[-1, 4], 1, Concat, [1]], # cat backbone P3 40 | [-1, 1, Conv, [256, 1, 1]], 41 | [-1, 3, BottleneckCSP, [256, False]], 42 | [-1, 1, nn.Conv2d, [na * (nc + 5), 1, 1, 0]], # 22 (P3/8-small) 43 | 44 | [[], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5) 45 | ] 46 | -------------------------------------------------------------------------------- /detect.py: -------------------------------------------------------------------------------- 1 | from utils.datasets import * 2 | from utils.utils import * 3 | 4 | def detect(source, out, weights): 5 | source, out, weights, imgsz = source, out, weights, 640 6 | # Initialize 7 | device = torch_utils.select_device('cpu') 8 | if os.path.exists(out): 9 | shutil.rmtree(out) # delete output folder 10 | os.makedirs(out) # make new output folder 11 | # Load model 12 | google_utils.attempt_download(weights) 13 | model = torch.load(weights, map_location=device)['model'] 14 | model.to(device).eval() 15 | vid_path, vid_writer = None, None 16 | dataset = LoadImages(source, img_size=imgsz) 17 | # Get names and colors 18 | names = model.names if hasattr(model, 'names') else model.modules.names 19 | colors = [[random.randint(0, 255) for _ in range(3)] for _ in range(len(names))] 20 | # Run inference 21 | t0 = time.time() 22 | for path, img, im0s, vid_cap in dataset: 23 | t1 = time.time() 24 | img = torch.from_numpy(img).to(device) 25 | img = img.float() # uint8 to fp16/32 26 | img /= 255.0 # 0 - 255 to 0.0 - 1.0 27 | if img.ndimension() == 3: 28 | img = img.unsqueeze(0) 29 | # Inference 30 | pred = model(img, augment=False)[0] 31 | pred = non_max_suppression(pred, 0.4, 0.5, 32 | fast=True, classes=None, agnostic=False) 33 | # Process detections 34 | for i, det in enumerate(pred): # detections per 
image 35 | p, s, im0 = path, '', im0s 36 | save_path = str(Path(out) / Path(p).name) 37 | s += '%gx%g ' % img.shape[2:] # print string 38 | if det is not None and len(det): 39 | # Rescale boxes from img_size to im0 size 40 | det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round() 41 | # Print results 42 | for c in det[:, -1].unique(): 43 | n = (det[:, -1] == c).sum() # detections per class 44 | s += '%g %ss, ' % (n, names[int(c)]) # add to string 45 | for *xyxy, conf, cls in det: 46 | # Add bbox to image 47 | label = '%s%.2f' % (names[int(cls)], conf) 48 | im0 = plot_one_box(xyxy, im0, label=label, color=colors[int(cls)], line_thickness=1) 49 | # xmin,ymin, xmax,ymax = int(xyxy[0]), int(xyxy[1]),int(xyxy[2]), int(xyxy[3]) 50 | # xcenter = xmin + (xmax - xmin) / 2 51 | # ycenter = ymin + (ymax - ymin) / 2 52 | # w = xmax - xmin 53 | # h = ymax - ymin 54 | # Save results (image with detections) 55 | print('%sDone. (%.3fs)' % (s, time.time() - t1)) 56 | if dataset.mode == 'images': 57 | cv2.imwrite(save_path, im0) 58 | else: 59 | if vid_path != save_path: # new video 60 | vid_path = save_path 61 | if isinstance(vid_writer, cv2.VideoWriter): 62 | vid_writer.release() # release previous video writer 63 | fps = vid_cap.get(cv2.CAP_PROP_FPS) 64 | w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH)) 65 | h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) 66 | vid_writer = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*opt.fourcc), fps, (w, h)) 67 | vid_writer.write(im0) 68 | 69 | print('Done. (%.3fs)' % (time.time() - t0)) 70 | 71 | 72 | source = './inference/inputs' 73 | out = './inference/outputs' 74 | weights = './weights/yolov5l.pt' 75 | 76 | with torch.no_grad(): 77 | detect(source, out, weights) 78 | -------------------------------------------------------------------------------- /inference/inputs/2007_000033.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Transformer-man/yolov5-flask/36573a0b6e91d5a91f3394af278f5a5e768efae7/inference/inputs/2007_000033.jpg -------------------------------------------------------------------------------- /inference/outputs/2007_000033.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Transformer-man/yolov5-flask/36573a0b6e91d5a91f3394af278f5a5e768efae7/inference/outputs/2007_000033.jpg -------------------------------------------------------------------------------- /models/__pycache__/common.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Transformer-man/yolov5-flask/36573a0b6e91d5a91f3394af278f5a5e768efae7/models/__pycache__/common.cpython-37.pyc -------------------------------------------------------------------------------- /models/__pycache__/de.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Transformer-man/yolov5-flask/36573a0b6e91d5a91f3394af278f5a5e768efae7/models/__pycache__/de.cpython-37.pyc -------------------------------------------------------------------------------- /models/__pycache__/experimental.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Transformer-man/yolov5-flask/36573a0b6e91d5a91f3394af278f5a5e768efae7/models/__pycache__/experimental.cpython-37.pyc -------------------------------------------------------------------------------- 
/models/__pycache__/yolo.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Transformer-man/yolov5-flask/36573a0b6e91d5a91f3394af278f5a5e768efae7/models/__pycache__/yolo.cpython-37.pyc -------------------------------------------------------------------------------- /models/common.py: -------------------------------------------------------------------------------- 1 | # This file contains modules common to various models 2 | 3 | 4 | from utils.utils import * 5 | 6 | 7 | def DWConv(c1, c2, k=1, s=1, act=True): 8 | # Depthwise convolution 9 | return Conv(c1, c2, k, s, g=math.gcd(c1, c2), act=act) 10 | 11 | 12 | class Conv(nn.Module): 13 | # Standard convolution 14 | def __init__(self, c1, c2, k=1, s=1, g=1, act=True): # ch_in, ch_out, kernel, stride, groups 15 | super(Conv, self).__init__() 16 | self.conv = nn.Conv2d(c1, c2, k, s, k // 2, groups=g, bias=False) 17 | self.bn = nn.BatchNorm2d(c2) 18 | self.act = nn.LeakyReLU(0.1, inplace=True) if act else nn.Identity() 19 | 20 | def forward(self, x): 21 | return self.act(self.bn(self.conv(x))) 22 | 23 | def fuseforward(self, x): 24 | return self.act(self.conv(x)) 25 | 26 | 27 | class Bottleneck(nn.Module): 28 | # Standard bottleneck 29 | def __init__(self, c1, c2, shortcut=True, g=1, e=0.5): # ch_in, ch_out, shortcut, groups, expansion 30 | super(Bottleneck, self).__init__() 31 | c_ = int(c2 * e) # hidden channels 32 | self.cv1 = Conv(c1, c_, 1, 1) 33 | self.cv2 = Conv(c_, c2, 3, 1, g=g) 34 | self.add = shortcut and c1 == c2 35 | 36 | def forward(self, x): 37 | return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x)) 38 | 39 | 40 | class BottleneckCSP(nn.Module): 41 | # CSP Bottleneck https://github.com/WongKinYiu/CrossStagePartialNetworks 42 | def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5): # ch_in, ch_out, number, shortcut, groups, expansion 43 | super(BottleneckCSP, self).__init__() 44 | c_ = int(c2 * e) # hidden channels 45 | self.cv1 = Conv(c1, c_, 1, 1) 46 | self.cv2 = nn.Conv2d(c1, c_, 1, 1, bias=False) 47 | self.cv3 = nn.Conv2d(c_, c_, 1, 1, bias=False) 48 | self.cv4 = Conv(c2, c2, 1, 1) 49 | self.bn = nn.BatchNorm2d(2 * c_) # applied to cat(cv2, cv3) 50 | self.act = nn.LeakyReLU(0.1, inplace=True) 51 | self.m = nn.Sequential(*[Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)]) 52 | 53 | def forward(self, x): 54 | y1 = self.cv3(self.m(self.cv1(x))) 55 | y2 = self.cv2(x) 56 | return self.cv4(self.act(self.bn(torch.cat((y1, y2), dim=1)))) 57 | 58 | 59 | class SPP(nn.Module): 60 | # Spatial pyramid pooling layer used in YOLOv3-SPP 61 | def __init__(self, c1, c2, k=(5, 9, 13)): 62 | super(SPP, self).__init__() 63 | c_ = c1 // 2 # hidden channels 64 | self.cv1 = Conv(c1, c_, 1, 1) 65 | self.cv2 = Conv(c_ * (len(k) + 1), c2, 1, 1) 66 | self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k]) 67 | 68 | def forward(self, x): 69 | x = self.cv1(x) 70 | return self.cv2(torch.cat([x] + [m(x) for m in self.m], 1)) 71 | 72 | 73 | class Flatten(nn.Module): 74 | # Use after nn.AdaptiveAvgPool2d(1) to remove last 2 dimensions 75 | def forward(self, x): 76 | return x.view(x.size(0), -1) 77 | 78 | 79 | class Focus(nn.Module): 80 | # Focus wh information into c-space 81 | def __init__(self, c1, c2, k=1): 82 | super(Focus, self).__init__() 83 | self.conv = Conv(c1 * 4, c2, k, 1) 84 | 85 | def forward(self, x): # x(b,c,w,h) -> y(b,4c,w/2,h/2) 86 | return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], 
x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1)) 87 | 88 | 89 | class Concat(nn.Module): 90 | # Concatenate a list of tensors along dimension 91 | def __init__(self, dimension=1): 92 | super(Concat, self).__init__() 93 | self.d = dimension 94 | 95 | def forward(self, x): 96 | return torch.cat(x, self.d) 97 | -------------------------------------------------------------------------------- /models/de.py: -------------------------------------------------------------------------------- 1 | from utils.datasets import * 2 | from utils.utils import * 3 | 4 | 5 | def get_model(): 6 | weights = r'./weights/yolov5s.pt' 7 | device = torch.device("cuda" if (torch.cuda.is_available()) else "cpu") 8 | google_utils.attempt_download(weights) 9 | model = torch.load(weights, map_location=device)['model'] 10 | model.to(device).eval() 11 | return model 12 | 13 | 14 | def letterbox(img, new_shape=(416, 416), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True): 15 | # Resize image to a 32-pixel-multiple rectangle https://github.com/ultralytics/yolov3/issues/232 16 | shape = img.shape[:2] # current shape [height, width] 17 | if isinstance(new_shape, int): 18 | new_shape = (new_shape, new_shape) 19 | 20 | # Scale ratio (new / old) 21 | r = min(new_shape[0] / shape[0], new_shape[1] / shape[1]) 22 | if not scaleup: # only scale down, do not scale up (for better test mAP) 23 | r = min(r, 1.0) 24 | # Compute padding 25 | ratio = r, r # width, height ratios 26 | new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r)) 27 | dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1] # wh padding 28 | if auto: # minimum rectangle 29 | dw, dh = np.mod(dw, 64), np.mod(dh, 64) # wh padding 30 | elif scaleFill: # stretch 31 | dw, dh = 0.0, 0.0 32 | new_unpad = new_shape 33 | ratio = new_shape[0] / shape[1], new_shape[1] / shape[0] # width, height ratios 34 | 35 | dw /= 2 # divide padding into 2 sides 36 | dh /= 2 37 | if shape[::-1] != new_unpad: # resize 38 | img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR) 39 | top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1)) 40 | left, right = int(round(dw - 0.1)), int(round(dw + 0.1)) 41 | img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color) # add border 42 | return img, ratio, (dw, dh) 43 | 44 | def detect(model, im0s): 45 | t0 = time.time() 46 | device = torch.device("cuda" if (torch.cuda.is_available()) else "cpu") 47 | names = model.names if hasattr(model, 'names') else model.modules.names 48 | colors = [[random.randint(0, 255) for _ in range(3)] for _ in range(len(names))] 49 | img = letterbox(im0s, new_shape=640)[0] 50 | img = img[:, :, ::-1].transpose(2, 0, 1) # BGR to RGB, to 3x416x416 51 | img = np.ascontiguousarray(img) 52 | img = torch.from_numpy(img).to(device) 53 | img = img.float() 54 | img /= 255.0 # 0 - 255 to 0.0 - 1.0 55 | if img.ndimension() == 3: 56 | img = img.unsqueeze(0) 57 | pred = model(img, augment=False)[0] 58 | pred = non_max_suppression(pred, 0.4, 0.5, 59 | fast=True, classes=None, agnostic=False) 60 | for i, det in enumerate(pred): # detections per image 61 | im0 = im0s 62 | if det is not None and len(det): 63 | det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round() 64 | for *xyxy, conf, cls in det: 65 | label = '%s%.2f' % (names[int(cls)], conf) 66 | im0 = plot_one_box(xyxy, im0, label=label, color=colors[int(cls)], line_thickness=1) 67 | print('Done. 
(%.3fs)' % (time.time() - t0)) 68 | return im0 69 | 70 | -------------------------------------------------------------------------------- /models/experimental.py: -------------------------------------------------------------------------------- 1 | from models.common import * 2 | 3 | 4 | class Sum(nn.Module): 5 | # Weighted sum of 2 or more layers https://arxiv.org/abs/1911.09070 6 | def __init__(self, n, weight=False): # n: number of inputs 7 | super(Sum, self).__init__() 8 | self.weight = weight # apply weights boolean 9 | self.iter = range(n - 1) # iter object 10 | if weight: 11 | self.w = nn.Parameter(-torch.arange(1., n) / 2, requires_grad=True) # layer weights 12 | 13 | def forward(self, x): 14 | y = x[0] # no weight 15 | if self.weight: 16 | w = torch.sigmoid(self.w) * 2 17 | for i in self.iter: 18 | y = y + x[i + 1] * w[i] 19 | else: 20 | for i in self.iter: 21 | y = y + x[i + 1] 22 | return y 23 | 24 | 25 | class GhostConv(nn.Module): 26 | # Ghost Convolution https://github.com/huawei-noah/ghostnet 27 | def __init__(self, c1, c2, k=1, s=1, g=1, act=True): # ch_in, ch_out, kernel, stride, groups 28 | super(GhostConv, self).__init__() 29 | c_ = c2 // 2 # hidden channels 30 | self.cv1 = Conv(c1, c_, k, s, g, act) 31 | self.cv2 = Conv(c_, c_, 5, 1, c_, act) 32 | 33 | def forward(self, x): 34 | y = self.cv1(x) 35 | return torch.cat([y, self.cv2(y)], 1) 36 | 37 | 38 | class GhostBottleneck(nn.Module): 39 | # Ghost Bottleneck https://github.com/huawei-noah/ghostnet 40 | def __init__(self, c1, c2, k, s): 41 | super(GhostBottleneck, self).__init__() 42 | c_ = c2 // 2 43 | self.conv = nn.Sequential(GhostConv(c1, c_, 1, 1), # pw 44 | DWConv(c_, c_, k, s, act=False) if s == 2 else nn.Identity(), # dw 45 | GhostConv(c_, c2, 1, 1, act=False)) # pw-linear 46 | self.shortcut = nn.Sequential(DWConv(c1, c1, k, s, act=False), 47 | Conv(c1, c2, 1, 1, act=False)) if s == 2 else nn.Identity() 48 | 49 | def forward(self, x): 50 | return self.conv(x) + self.shortcut(x) 51 | 52 | 53 | class ConvPlus(nn.Module): 54 | # Plus-shaped convolution 55 | def __init__(self, c1, c2, k=3, s=1, g=1, bias=True): # ch_in, ch_out, kernel, stride, groups 56 | super(ConvPlus, self).__init__() 57 | self.cv1 = nn.Conv2d(c1, c2, (k, 1), s, (k // 2, 0), groups=g, bias=bias) 58 | self.cv2 = nn.Conv2d(c1, c2, (1, k), s, (0, k // 2), groups=g, bias=bias) 59 | 60 | def forward(self, x): 61 | return self.cv1(x) + self.cv2(x) 62 | 63 | 64 | class MixConv2d(nn.Module): 65 | # Mixed Depthwise Conv https://arxiv.org/abs/1907.09595 66 | def __init__(self, c1, c2, k=(1, 3), s=1, equal_ch=True): 67 | super(MixConv2d, self).__init__() 68 | groups = len(k) 69 | if equal_ch: # equal c_ per group 70 | i = torch.linspace(0, groups - 1E-6, c2).floor() # c2 indices 71 | c_ = [(i == g).sum() for g in range(groups)] # intermediate channels 72 | else: # equal weight.numel() per group 73 | b = [c2] + [0] * groups 74 | a = np.eye(groups + 1, groups, k=-1) 75 | a -= np.roll(a, 1, axis=1) 76 | a *= np.array(k) ** 2 77 | a[0] = 1 78 | c_ = np.linalg.lstsq(a, b, rcond=None)[0].round() # solve for equal weight indices, ax = b 79 | 80 | self.m = nn.ModuleList([nn.Conv2d(c1, int(c_[g]), k[g], s, k[g] // 2, bias=False) for g in range(groups)]) 81 | self.bn = nn.BatchNorm2d(c2) 82 | self.act = nn.LeakyReLU(0.1, inplace=True) 83 | 84 | def forward(self, x): 85 | return x + self.act(self.bn(torch.cat([m(x) for m in self.m], 1))) 86 | -------------------------------------------------------------------------------- /models/onnx_export.py: 
-------------------------------------------------------------------------------- 1 | """Exports a pytorch *.pt model to *.onnx format 2 | 3 | Usage: 4 | import torch 5 | $ export PYTHONPATH="$PWD" && python models/onnx_export.py --weights ./weights/yolov5s.pt --img 640 --batch 1 6 | """ 7 | 8 | import argparse 9 | 10 | import onnx 11 | 12 | from models.common import * 13 | 14 | if __name__ == '__main__': 15 | parser = argparse.ArgumentParser() 16 | parser.add_argument('--weights', type=str, default='./yolov5s.pt', help='weights path') 17 | parser.add_argument('--img-size', nargs='+', type=int, default=[640, 640], help='image size') 18 | parser.add_argument('--batch-size', type=int, default=1, help='batch size') 19 | opt = parser.parse_args() 20 | print(opt) 21 | 22 | # Parameters 23 | f = opt.weights.replace('.pt', '.onnx') # onnx filename 24 | img = torch.zeros((opt.batch_size, 3, *opt.img_size)) # image size, (1, 3, 320, 192) iDetection 25 | 26 | # Load pytorch model 27 | google_utils.attempt_download(opt.weights) 28 | model = torch.load(opt.weights)['model'] 29 | model.eval() 30 | model.fuse() 31 | 32 | # Export to onnx 33 | model.model[-1].export = True # set Detect() layer export=True 34 | _ = model(img) # dry run 35 | torch.onnx.export(model, img, f, verbose=False, opset_version=11, input_names=['images'], 36 | output_names=['output']) # output_names=['classes', 'boxes'] 37 | 38 | # Check onnx model 39 | model = onnx.load(f) # load onnx model 40 | onnx.checker.check_model(model) # check onnx model 41 | print(onnx.helper.printable_graph(model.graph)) # print a human readable representation of the graph 42 | print('Export complete. ONNX model saved to %s\nView with https://github.com/lutzroeder/netron' % f) 43 | -------------------------------------------------------------------------------- /models/yolo.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | 3 | import yaml 4 | 5 | from models.experimental import * 6 | 7 | 8 | class Detect(nn.Module): 9 | def __init__(self, nc=80, anchors=()): # detection layer 10 | super(Detect, self).__init__() 11 | self.stride = None # strides computed during build 12 | self.nc = nc # number of classes 13 | self.no = nc + 5 # number of outputs per anchor 14 | self.nl = len(anchors) # number of detection layers 15 | self.na = len(anchors[0]) // 2 # number of anchors 16 | self.grid = [torch.zeros(1)] * self.nl # init grid 17 | a = torch.tensor(anchors).float().view(self.nl, -1, 2) 18 | self.register_buffer('anchors', a) # shape(nl,na,2) 19 | self.register_buffer('anchor_grid', a.clone().view(self.nl, 1, -1, 1, 1, 2)) # shape(nl,1,na,1,1,2) 20 | self.export = False # onnx export 21 | 22 | def forward(self, x): 23 | # x = x.copy() # for profiling 24 | z = [] # inference output 25 | self.training |= self.export 26 | for i in range(self.nl): 27 | bs, _, ny, nx = x[i].shape # x(bs,255,20,20) to x(bs,3,20,20,85) 28 | x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous() 29 | 30 | if not self.training: # inference 31 | if self.grid[i].shape[2:4] != x[i].shape[2:4]: 32 | self.grid[i] = self._make_grid(nx, ny).to(x[i].device) 33 | 34 | y = x[i].sigmoid() 35 | y[..., 0:2] = (y[..., 0:2] * 2. 
- 0.5 + self.grid[i].to(x[i].device)) * self.stride[i] # xy 36 | y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i] # wh 37 | z.append(y.view(bs, -1, self.no)) 38 | 39 | return x if self.training else (torch.cat(z, 1), x) 40 | 41 | @staticmethod 42 | def _make_grid(nx=20, ny=20): 43 | yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)]) 44 | return torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float() 45 | 46 | 47 | class Model(nn.Module): 48 | def __init__(self, model_cfg='yolov5s.yaml', ch=3, nc=None): # model, input channels, number of classes 49 | super(Model, self).__init__() 50 | if type(model_cfg) is dict: 51 | self.md = model_cfg # model dict 52 | else: # is *.yaml 53 | with open(model_cfg) as f: 54 | self.md = yaml.load(f, Loader=yaml.FullLoader) # model dict 55 | 56 | # Define model 57 | if nc: 58 | self.md['nc'] = nc # override yaml value 59 | self.model, self.save = parse_model(self.md, ch=[ch]) # model, savelist, ch_out 60 | # print([x.shape for x in self.forward(torch.zeros(1, ch, 64, 64))]) 61 | 62 | # Build strides, anchors 63 | m = self.model[-1] # Detect() 64 | m.stride = torch.tensor([64 / x.shape[-2] for x in self.forward(torch.zeros(1, ch, 64, 64))]) # forward 65 | m.anchors /= m.stride.view(-1, 1, 1) 66 | self.stride = m.stride 67 | 68 | # Init weights, biases 69 | torch_utils.initialize_weights(self) 70 | self._initialize_biases() # only run once 71 | torch_utils.model_info(self) 72 | print('') 73 | 74 | def forward(self, x, augment=False, profile=False): 75 | if augment: 76 | img_size = x.shape[-2:] # height, width 77 | s = [0.83, 0.67] # scales 78 | y = [] 79 | for i, xi in enumerate((x, 80 | torch_utils.scale_img(x.flip(3), s[0]), # flip-lr and scale 81 | torch_utils.scale_img(x, s[1]), # scale 82 | )): 83 | # cv2.imwrite('img%g.jpg' % i, 255 * xi[0].numpy().transpose((1, 2, 0))[:, :, ::-1]) 84 | y.append(self.forward_once(xi)[0]) 85 | 86 | y[1][..., :4] /= s[0] # scale 87 | y[1][..., 0] = img_size[1] - y[1][..., 0] # flip lr 88 | y[2][..., :4] /= s[1] # scale 89 | return torch.cat(y, 1), None # augmented inference, train 90 | else: 91 | return self.forward_once(x, profile) # single-scale inference, train 92 | 93 | def forward_once(self, x, profile=False): 94 | y, dt = [], [] # outputs 95 | for m in self.model: 96 | if m.f != -1: # if not from previous layer 97 | x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f] # from earlier layers 98 | 99 | if profile: 100 | import thop 101 | o = thop.profile(m, inputs=(x,), verbose=False)[0] / 1E9 * 2 # FLOPS 102 | t = torch_utils.time_synchronized() 103 | for _ in range(10): 104 | _ = m(x) 105 | dt.append((torch_utils.time_synchronized() - t) * 100) 106 | print('%10.1f%10.0f%10.1fms %-40s' % (o, m.np, dt[-1], m.type)) 107 | 108 | x = m(x) # run 109 | y.append(x if m.i in self.save else None) # save output 110 | 111 | if profile: 112 | print('%.1fms total' % sum(dt)) 113 | return x 114 | 115 | def _initialize_biases(self, cf=None): # initialize biases into Detect(), cf is class frequency 116 | # cf = torch.bincount(torch.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlength=nc) + 1. 
117 | m = self.model[-1] # Detect() module 118 | for f, s in zip(m.f, m.stride): #  from 119 | mi = self.model[f % m.i] 120 | b = mi.bias.view(m.na, -1) # conv.bias(255) to (3,85) 121 | # b[:, 4] += math.log(8 / (640 / s) ** 2) # obj (8 objects per 640 image) 122 | # b[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum()) # cls 123 | b.data[:, 4] += math.log(8 / (640 / s) ** 2) # obj (8 objects per 640 image) 124 | b.data[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum()) # cls 125 | 126 | mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True) 127 | # def _initialize_biases(self, cf=None): # initialize biases into Detect(), cf is class frequency 128 | # # cf = torch.bincount(torch.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlength=nc) + 1. 129 | # m = self.model[-1] # Detect() module 130 | # for mi, s in zip(m.m, m.stride): # from 131 | # b = mi.bias.view(m.na, -1) # conv.bias(255) to (3,85) 132 | # with torch.no_grad(): 133 | # b[:, 4] += math.log(8 / (640 / s) ** 2) # obj (8 objects per 640 image) 134 | # b[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum()) # cls 135 | # mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True) 136 | 137 | def _print_biases(self): 138 | m = self.model[-1] # Detect() module 139 | for f in sorted([x % m.i for x in m.f]): #  from 140 | b = self.model[f].bias.detach().view(m.na, -1).T # conv.bias(255) to (3,85) 141 | print(('%g Conv2d.bias:' + '%10.3g' * 6) % (f, *b[:5].mean(1).tolist(), b[5:].mean())) 142 | 143 | # def _print_weights(self): 144 | # for m in self.model.modules(): 145 | # if type(m) is Bottleneck: 146 | # print('%10.3g' % (m.w.detach().sigmoid() * 2)) # shortcut weights 147 | 148 | def fuse(self): # fuse model Conv2d() + BatchNorm2d() layers 149 | print('Fusing layers...') 150 | for m in self.model.modules(): 151 | if type(m) is Conv: 152 | m.conv = torch_utils.fuse_conv_and_bn(m.conv, m.bn) # update conv 153 | m.bn = None # remove batchnorm 154 | m.forward = m.fuseforward # update forward 155 | torch_utils.model_info(self) 156 | 157 | 158 | def parse_model(md, ch): # model_dict, input_channels(3) 159 | print('\n%3s%15s%3s%10s %-40s%-30s' % ('', 'from', 'n', 'params', 'module', 'arguments')) 160 | anchors, nc, gd, gw = md['anchors'], md['nc'], md['depth_multiple'], md['width_multiple'] 161 | na = (len(anchors[0]) // 2) # number of anchors 162 | no = na * (nc + 5) # number of outputs = anchors * (classes + 5) 163 | 164 | layers, save, c2 = [], [], ch[-1] # layers, savelist, ch out 165 | for i, (f, n, m, args) in enumerate(md['backbone'] + md['head']): # from, number, module, args 166 | m = eval(m) if isinstance(m, str) else m # eval strings 167 | for j, a in enumerate(args): 168 | try: 169 | args[j] = eval(a) if isinstance(a, str) else a # eval strings 170 | except: 171 | pass 172 | 173 | n = max(round(n * gd), 1) if n > 1 else n # depth gain 174 | if m in [nn.Conv2d, Conv, Bottleneck, SPP, DWConv, MixConv2d, Focus, ConvPlus, BottleneckCSP]: 175 | c1, c2 = ch[f], args[0] 176 | 177 | # Normal 178 | # if i > 0 and args[0] != no: # channel expansion factor 179 | # ex = 1.75 # exponential (default 2.0) 180 | # e = math.log(c2 / ch[1]) / math.log(2) 181 | # c2 = int(ch[1] * ex ** e) 182 | # if m != Focus: 183 | c2 = make_divisible(c2 * gw, 8) if c2 != no else c2 184 | 185 | # Experimental 186 | # if i > 0 and args[0] != no: # channel expansion factor 187 | # ex = 1 + gw # exponential (default 2.0) 188 | # ch1 = 32 # ch[1] 189 | # 
e = math.log(c2 / ch1) / math.log(2) # level 1-n 190 | # c2 = int(ch1 * ex ** e) 191 | # if m != Focus: 192 | # c2 = make_divisible(c2, 8) if c2 != no else c2 193 | 194 | args = [c1, c2, *args[1:]] 195 | if m is BottleneckCSP: 196 | args.insert(2, n) 197 | n = 1 198 | elif m is nn.BatchNorm2d: 199 | args = [ch[f]] 200 | elif m is Concat: 201 | c2 = sum([ch[-1 if x == -1 else x + 1] for x in f]) 202 | elif m is Detect: 203 | f = f or list(reversed([(-1 if j == i else j - 1) for j, x in enumerate(ch) if x == no])) 204 | else: 205 | c2 = ch[f] 206 | 207 | m_ = nn.Sequential(*[m(*args) for _ in range(n)]) if n > 1 else m(*args) # module 208 | t = str(m)[8:-2].replace('__main__.', '') # module type 209 | np = sum([x.numel() for x in m_.parameters()]) # number params 210 | m_.i, m_.f, m_.type, m_.np = i, f, t, np # attach index, 'from' index, type, number params 211 | print('%3s%15s%3s%10.0f %-40s%-30s' % (i, f, n, np, t, args)) # print 212 | save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1) # append to savelist 213 | layers.append(m_) 214 | ch.append(c2) 215 | return nn.Sequential(*layers), sorted(save) 216 | 217 | 218 | if __name__ == '__main__': 219 | parser = argparse.ArgumentParser() 220 | parser.add_argument('--cfg', type=str, default='yolov5s.yaml', help='model.yaml') 221 | parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu') 222 | opt = parser.parse_args() 223 | opt.cfg = glob.glob('./**/' + opt.cfg, recursive=True)[0] # find file 224 | 225 | device = torch_utils.select_device(opt.device) 226 | 227 | # Create model 228 | model = Model(opt.cfg).to(device) 229 | model.train() 230 | 231 | # Profile 232 | # img = torch.rand(8 if torch.cuda.is_available() else 1, 3, 640, 640).to(device) 233 | # y = model(img, profile=True) 234 | # print([y[0].shape] + [x.shape for x in y[1]]) 235 | 236 | # ONNX export 237 | # model.model[-1].export = True 238 | # torch.onnx.export(model, img, f.replace('.yaml', '.onnx'), verbose=True, opset_version=11) 239 | 240 | # Tensorboard 241 | # from torch.utils.tensorboard import SummaryWriter 242 | # tb_writer = SummaryWriter() 243 | # print("Run 'tensorboard --logdir=models/runs' to view tensorboard at http://localhost:6006/") 244 | # tb_writer.add_graph(model.model, img) # add model to tensorboard 245 | # tb_writer.add_image('test', img[0], dataformats='CWH') # add model to tensorboard 246 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | torch==1.8.0 2 | torchvision==0.9.0 3 | numpy 4 | opencv-python 5 | lxml 6 | tqdm 7 | flask 8 | pillow 9 | tensorboard 10 | pycocotools # pycocotools-windows -------------------------------------------------------------------------------- /static/client.js: -------------------------------------------------------------------------------- 1 | var el = x => document.getElementById(x); 2 | 3 | function showPicker() { 4 | el("file-input").click(); 5 | } 6 | 7 | function showPicked(input) { 8 | el("upload-label").innerHTML = input.files[0].name; 9 | 10 | var reader = new FileReader(); 11 | reader.onload = function (e) { 12 | if (e.target.result.split("/")[0].split(":")[1] == "image"){ 13 | el("image-picked").src = e.target.result; 14 | el("image-picked").className = ""; 15 | el("image-picked1").className = "no-display"; 16 | } 17 | else{ 18 | el("image-picked1").src = e.target.result; 19 | el("image-picked1").className = ""; 20 | 
el("image-picked").className = "no-display"; 21 | } 22 | }; 23 | reader.readAsDataURL(input.files[0]); 24 | } -------------------------------------------------------------------------------- /static/style.css: -------------------------------------------------------------------------------- 1 | .modal { 2 | display: none; 3 | position: fixed; 4 | z-index: 1000; 5 | top: 0; 6 | left: 0; 7 | height: 100%; 8 | width: 100%; 9 | background: rgba( 255, 255, 255, .8 ) 10 | url('/static/ajax-loader.gif') 11 | 50% 50% 12 | no-repeat; 13 | } 14 | 15 | /* When the body has the loading class, we turn 16 | the scrollbar off with overflow:hidden */ 17 | body.loading .modal { 18 | overflow: hidden; 19 | } 20 | 21 | /* Anytime the body has the loading class, our 22 | modal element will be visible */ 23 | body.loading .modal { 24 | display: block; 25 | } -------------------------------------------------------------------------------- /static/style1.css: -------------------------------------------------------------------------------- 1 | body { 2 | background-color: #fff; 3 | } 4 | 5 | .no-display { 6 | display: none; 7 | } 8 | 9 | .center { 10 | margin: auto; 11 | padding: 10px 50px; 12 | text-align: center; 13 | font-size: 14px; 14 | } 15 | 16 | .title { 17 | font-size: 30px; 18 | margin-top: 1em; 19 | margin-bottom: 1em; 20 | color: #262626; 21 | } 22 | 23 | .content { 24 | margin-top: 10em; 25 | } 26 | 27 | .analyze { 28 | margin-top: 5em; 29 | } 30 | 31 | .upload-label { 32 | padding: 10px; 33 | font-size: 12px; 34 | } 35 | 36 | .result-label { 37 | margin-top: 0.5em; 38 | padding: 10px; 39 | font-size: 13px; 40 | } 41 | 42 | button.choose-file-button { 43 | width: 200px; 44 | height: 40px; 45 | border-radius: 2px; 46 | background-color: #ffffff; 47 | border: solid 1px #ff8100; 48 | font-size: 13px; 49 | color: #ff8100; 50 | } 51 | 52 | button.analyze-button { 53 | width: 200px; 54 | height: 40px; 55 | border: solid 1px #ff8100; 56 | border-radius: 2px; 57 | background-color: #ff8100; 58 | font-size: 13px; 59 | color: #ffffff; 60 | } 61 | 62 | button:focus { 63 | outline: 0; 64 | } 65 | -------------------------------------------------------------------------------- /static/worker.js: -------------------------------------------------------------------------------- 1 | $('#detections').hide() 2 | var $loading = $('#loading').hide(); 3 | 4 | $('#updateCamera').click(function (event) { 5 | event.preventDefault(); 6 | const data = { 7 | "gray": $('#gray').is(":checked"), 8 | "gaussian": $('#gaussian').is(":checked"), 9 | "sobel": $('#sobel').is(":checked"), 10 | "canny": $('#canny').is(":checked"), 11 | } 12 | console.log(data) 13 | $.ajax({ 14 | type: 'POST', 15 | url: '/cameraParams', 16 | data: data, 17 | success: function (success) { 18 | console.log(success) 19 | }, error: function (error) { 20 | console.log(error) 21 | } 22 | }) 23 | }); 24 | 25 | var loadFile = function (event) { 26 | var output = document.getElementById('input'); 27 | output.src = URL.createObjectURL(event.target.files[0]); 28 | }; 29 | 30 | $(document) 31 | .ajaxStart(function () { 32 | $loading.show(); 33 | }) 34 | .ajaxStop(function () { 35 | $loading.hide(); 36 | }); 37 | -------------------------------------------------------------------------------- /templates/index1.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 10 | 11 | 12 | 13 | yolo deepsort 14 | 15 | 16 |
[templates/index1.html, source lines 17-53: the HTML markup was lost when this dump was generated; the only visible text that survives is the page heading "Target Detection and Multi-Target Tracking Platform" and the image-preview label "Chosen Image".]
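Because the template markup above did not survive, here is a minimal sketch of how a Flask app could serve this page and back the two endpoints that static/client.js and static/worker.js expect. It is illustrative only and is not the repository's app.py (whose contents are not included in this dump): the '/analyze' route name and the 'file' form field are hypothetical, while the '/cameraParams' route and its gray/gaussian/sobel/canny flags are taken from worker.js, and the checkpoint is loaded the same way test.py loads weights.
```python
# Illustrative sketch only; not the repository's app.py.
# Hypothetical names: the '/analyze' route and the 'file' form field.
# Grounded names: '/cameraParams' and its flags come from static/worker.js,
# 'weights/best.pt' is where train.py saves the best checkpoint, and the
# torch.load(...)['model'] pattern mirrors test.py.
import io

import torch
from flask import Flask, render_template, request
from PIL import Image

app = Flask(__name__)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = torch.load('weights/best.pt', map_location=device)['model'].to(device).eval()


@app.route('/')
def index():
    # Serve the upload page defined by templates/index1.html
    return render_template('index1.html')


@app.route('/analyze', methods=['POST'])  # hypothetical endpoint name
def analyze():
    # Decode the uploaded image from the multipart form field
    img = Image.open(io.BytesIO(request.files['file'].read())).convert('RGB')
    # ... letterbox-resize, run `model`, apply non_max_suppression(), and
    # return the annotated image or the detections as JSON ...
    return 'ok'


@app.route('/cameraParams', methods=['POST'])  # posted to by static/worker.js
def camera_params():
    # Checkbox states sent by worker.js arrive as the strings 'true'/'false'
    flags = {k: request.form.get(k) == 'true' for k in ('gray', 'gaussian', 'sobel', 'canny')}
    print(flags)
    return 'ok'


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```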
54 | 55 | -------------------------------------------------------------------------------- /test.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import json 3 | 4 | import yaml 5 | from torch.utils.data import DataLoader 6 | 7 | from utils.datasets import * 8 | from utils.utils import * 9 | 10 | 11 | def test(data, 12 | weights=None, 13 | batch_size=16, 14 | imgsz=640, 15 | conf_thres=0.001, 16 | iou_thres=0.6, # for nms 17 | save_json=False, 18 | single_cls=False, 19 | augment=False, 20 | model=None, 21 | dataloader=None, 22 | fast=False, 23 | verbose=False): # 0 fast, 1 accurate 24 | # Initialize/load model and set device 25 | if model is None: 26 | device = torch_utils.select_device(opt.device, batch_size=batch_size) 27 | 28 | # Remove previous 29 | for f in glob.glob('test_batch*.jpg'): 30 | os.remove(f) 31 | 32 | # Load model 33 | google_utils.attempt_download(weights) 34 | model = torch.load(weights, map_location=device)['model'] 35 | torch_utils.model_info(model) 36 | # model.fuse() 37 | model.to(device) 38 | 39 | if device.type != 'cpu' and torch.cuda.device_count() > 1: 40 | model = nn.DataParallel(model) 41 | 42 | training = False 43 | else: # called by train.py 44 | device = next(model.parameters()).device # get model device 45 | training = True 46 | 47 | # Configure run 48 | with open(data) as f: 49 | data = yaml.load(f, Loader=yaml.FullLoader) # model dict 50 | nc = 1 if single_cls else int(data['nc']) # number of classes 51 | iouv = torch.linspace(0.5, 0.95, 10).to(device) # iou vector for mAP@0.5:0.95 52 | # iouv = iouv[0].view(1) # comment for mAP@0.5:0.95 53 | niou = iouv.numel() 54 | 55 | # Dataloader 56 | if dataloader is None: 57 | fast |= conf_thres > 0.001 # enable fast mode 58 | path = data['test'] if opt.task == 'test' else data['val'] # path to val/test images 59 | dataset = LoadImagesAndLabels(path, 60 | imgsz, 61 | batch_size, 62 | rect=True, # rectangular inference 63 | single_cls=opt.single_cls, # single class mode 64 | pad=0.0 if fast else 0.5) # padding 65 | batch_size = min(batch_size, len(dataset)) 66 | nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8]) # number of workers 67 | dataloader = DataLoader(dataset, 68 | batch_size=batch_size, 69 | num_workers=nw, 70 | pin_memory=True, 71 | collate_fn=dataset.collate_fn) 72 | 73 | seen = 0 74 | model.eval() 75 | _ = model(torch.zeros((1, 3, imgsz, imgsz), device=device)) if device.type != 'cpu' else None # run once 76 | names = model.names if hasattr(model, 'names') else model.module.names 77 | coco91class = coco80_to_coco91_class() 78 | s = ('%20s' + '%12s' * 6) % ('Class', 'Images', 'Targets', 'P', 'R', 'mAP@.5', 'mAP@.5:.95') 79 | p, r, f1, mp, mr, map50, map, t0, t1 = 0., 0., 0., 0., 0., 0., 0., 0., 0. 
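# Accumulators above: p/r/f1 become per-class precision, recall and F1 arrays,
# mp/mr are their means, map50 is mAP@0.5, map is mAP@0.5:0.95, and t0/t1
# collect inference and NMS time for the speed summary printed after evaluation.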
80 | loss = torch.zeros(3, device=device) 81 | jdict, stats, ap, ap_class = [], [], [], [] 82 | for batch_i, (imgs, targets, paths, shapes) in enumerate(tqdm(dataloader, desc=s)): 83 | imgs = imgs.to(device).float() / 255.0 # uint8 to float32, 0 - 255 to 0.0 - 1.0 84 | targets = targets.to(device) 85 | nb, _, height, width = imgs.shape # batch size, channels, height, width 86 | whwh = torch.Tensor([width, height, width, height]).to(device) 87 | 88 | # Disable gradients 89 | with torch.no_grad(): 90 | # Run model 91 | t = torch_utils.time_synchronized() 92 | inf_out, train_out = model(imgs, augment=augment) # inference and training outputs 93 | t0 += torch_utils.time_synchronized() - t 94 | 95 | # Compute loss 96 | if training: # if model has loss hyperparameters 97 | loss += compute_loss(train_out, targets, model)[1][:3] # GIoU, obj, cls 98 | 99 | # Run NMS 100 | t = torch_utils.time_synchronized() 101 | output = non_max_suppression(inf_out, conf_thres=conf_thres, iou_thres=iou_thres, fast=fast) 102 | t1 += torch_utils.time_synchronized() - t 103 | 104 | # Statistics per image 105 | for si, pred in enumerate(output): 106 | labels = targets[targets[:, 0] == si, 1:] 107 | nl = len(labels) 108 | tcls = labels[:, 0].tolist() if nl else [] # target class 109 | seen += 1 110 | 111 | if pred is None: 112 | if nl: 113 | stats.append((torch.zeros(0, niou, dtype=torch.bool), torch.Tensor(), torch.Tensor(), tcls)) 114 | continue 115 | 116 | # Append to text file 117 | # with open('test.txt', 'a') as file: 118 | # [file.write('%11.5g' * 7 % tuple(x) + '\n') for x in pred] 119 | 120 | # Clip boxes to image bounds 121 | clip_coords(pred, (height, width)) 122 | 123 | # Append to pycocotools JSON dictionary 124 | if save_json: 125 | # [{"image_id": 42, "category_id": 18, "bbox": [258.15, 41.29, 348.26, 243.78], "score": 0.236}, ... 
126 | image_id = int(Path(paths[si]).stem.split('_')[-1]) 127 | box = pred[:, :4].clone() # xyxy 128 | scale_coords(imgs[si].shape[1:], box, shapes[si][0], shapes[si][1]) # to original shape 129 | box = xyxy2xywh(box) # xywh 130 | box[:, :2] -= box[:, 2:] / 2 # xy center to top-left corner 131 | for p, b in zip(pred.tolist(), box.tolist()): 132 | jdict.append({'image_id': image_id, 133 | 'category_id': coco91class[int(p[5])], 134 | 'bbox': [round(x, 3) for x in b], 135 | 'score': round(p[4], 5)}) 136 | 137 | # Assign all predictions as incorrect 138 | correct = torch.zeros(pred.shape[0], niou, dtype=torch.bool, device=device) 139 | if nl: 140 | detected = [] # target indices 141 | tcls_tensor = labels[:, 0] 142 | 143 | # target boxes 144 | tbox = xywh2xyxy(labels[:, 1:5]) * whwh 145 | 146 | # Per target class 147 | for cls in torch.unique(tcls_tensor): 148 | ti = (cls == tcls_tensor).nonzero().view(-1) # prediction indices 149 | pi = (cls == pred[:, 5]).nonzero().view(-1) # target indices 150 | 151 | # Search for detections 152 | if pi.shape[0]: 153 | # Prediction to target ious 154 | ious, i = box_iou(pred[pi, :4], tbox[ti]).max(1) # best ious, indices 155 | 156 | # Append detections 157 | for j in (ious > iouv[0]).nonzero(): 158 | d = ti[i[j]] # detected target 159 | if d not in detected: 160 | detected.append(d) 161 | correct[pi[j]] = ious[j] > iouv # iou_thres is 1xn 162 | if len(detected) == nl: # all targets already located in image 163 | break 164 | 165 | # Append statistics (correct, conf, pcls, tcls) 166 | stats.append((correct.cpu(), pred[:, 4].cpu(), pred[:, 5].cpu(), tcls)) 167 | 168 | # Plot images 169 | if batch_i < 1: 170 | f = 'test_batch%g_gt.jpg' % batch_i # filename 171 | plot_images(imgs, targets, paths, f, names) # ground truth 172 | f = 'test_batch%g_pred.jpg' % batch_i 173 | plot_images(imgs, output_to_target(output, width, height), paths, f, names) # predictions 174 | 175 | # Compute statistics 176 | stats = [np.concatenate(x, 0) for x in zip(*stats)] # to numpy 177 | if len(stats): 178 | p, r, ap, f1, ap_class = ap_per_class(*stats) 179 | p, r, ap50, ap = p[:, 0], r[:, 0], ap[:, 0], ap.mean(1) # [P, R, AP@0.5, AP@0.5:0.95] 180 | mp, mr, map50, map = p.mean(), r.mean(), ap50.mean(), ap.mean() 181 | nt = np.bincount(stats[3].astype(np.int64), minlength=nc) # number of targets per class 182 | else: 183 | nt = torch.zeros(1) 184 | 185 | # Print results 186 | pf = '%20s' + '%12.3g' * 6 # print format 187 | print(pf % ('all', seen, nt.sum(), mp, mr, map50, map)) 188 | 189 | # Print results per class 190 | if verbose and nc > 1 and len(stats): 191 | for i, c in enumerate(ap_class): 192 | print(pf % (names[c], seen, nt[c], p[i], r[i], ap50[i], ap[i])) 193 | 194 | # Print speeds 195 | t = tuple(x / seen * 1E3 for x in (t0, t1, t0 + t1)) + (imgsz, imgsz, batch_size) # tuple 196 | if not training: 197 | print('Speed: %.1f/%.1f/%.1f ms inference/NMS/total per %gx%g image at batch-size %g' % t) 198 | 199 | # Save JSON 200 | if save_json and map50 and len(jdict): 201 | imgIds = [int(Path(x).stem.split('_')[-1]) for x in dataloader.dataset.img_files] 202 | f = 'detections_val2017_%s_results.json' % \ 203 | (weights.split(os.sep)[-1].replace('.pt', '') if weights else '') # filename 204 | print('\nCOCO mAP with pycocotools... saving %s...' 
% f) 205 | with open(f, 'w') as file: 206 | json.dump(jdict, file) 207 | 208 | try: 209 | from pycocotools.coco import COCO 210 | from pycocotools.cocoeval import COCOeval 211 | 212 | # https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb 213 | cocoGt = COCO(glob.glob('../coco/annotations/instances_val*.json')[0]) # initialize COCO ground truth api 214 | cocoDt = cocoGt.loadRes(f) # initialize COCO pred api 215 | 216 | cocoEval = COCOeval(cocoGt, cocoDt, 'bbox') 217 | cocoEval.params.imgIds = imgIds # [:32] # only evaluate these images 218 | cocoEval.evaluate() 219 | cocoEval.accumulate() 220 | cocoEval.summarize() 221 | map, map50 = cocoEval.stats[:2] # update to pycocotools results (mAP@0.5:0.95, mAP@0.5) 222 | except: 223 | print('WARNING: pycocotools must be installed with numpy==1.17 to run correctly. ' 224 | 'See https://github.com/cocodataset/cocoapi/issues/356') 225 | 226 | # Return results 227 | maps = np.zeros(nc) + map 228 | for i, c in enumerate(ap_class): 229 | maps[c] = ap[i] 230 | return (mp, mr, map50, map, *(loss.cpu() / len(dataloader)).tolist()), maps, t 231 | 232 | 233 | if __name__ == '__main__': 234 | parser = argparse.ArgumentParser(prog='test.py') 235 | parser.add_argument('--weights', type=str, default='weights/yolov5s.pt', help='model.pt path') 236 | parser.add_argument('--data', type=str, default='data/coco.yaml', help='*.data path') 237 | parser.add_argument('--batch-size', type=int, default=32, help='size of each image batch') 238 | parser.add_argument('--img-size', type=int, default=640, help='inference size (pixels)') 239 | parser.add_argument('--conf-thres', type=float, default=0.001, help='object confidence threshold') 240 | parser.add_argument('--iou-thres', type=float, default=0.65, help='IOU threshold for NMS') 241 | parser.add_argument('--save-json', action='store_true', help='save a cocoapi-compatible JSON results file') 242 | parser.add_argument('--task', default='val', help="'val', 'test', 'study'") 243 | parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu') 244 | parser.add_argument('--single-cls', action='store_true', help='treat as single-class dataset') 245 | parser.add_argument('--augment', action='store_true', help='augmented inference') 246 | parser.add_argument('--verbose', action='store_true', help='report mAP by class') 247 | opt = parser.parse_args() 248 | opt.save_json = opt.save_json or opt.data.endswith('coco.yaml') 249 | opt.data = glob.glob('./**/' + opt.data, recursive=True)[0] # find file 250 | print(opt) 251 | 252 | # task = 'val', 'test', 'study' 253 | if opt.task in ['val', 'test']: # (default) run normally 254 | test(opt.data, 255 | opt.weights, 256 | opt.batch_size, 257 | opt.img_size, 258 | opt.conf_thres, 259 | opt.iou_thres, 260 | opt.save_json, 261 | opt.single_cls, 262 | opt.augment) 263 | 264 | elif opt.task == 'study': # run over a range of settings and save/plot 265 | for weights in ['yolov5s.pt', 'yolov5m.pt', 'yolov5l.pt', 'yolov5x.pt']: 266 | f = 'study_%s_%s.txt' % (Path(opt.data).stem, Path(weights).stem) # filename to save to 267 | x = list(range(288, 896, 64)) # x axis 268 | y = [] # y axis 269 | for i in x: # img-size 270 | print('\nRunning %s point %s...' 
% (f, i)) 271 | r, _, t = test(opt.data, weights, opt.batch_size, i, opt.conf_thres, opt.iou_thres, opt.save_json) 272 | y.append(r + t) # results and times 273 | np.savetxt(f, y, fmt='%10.4g') # save 274 | os.system('zip -r study.zip study_*.txt') 275 | # plot_study_txt(f, x) # plot 276 | -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import torch.distributed as dist 3 | import torch.nn.functional as F 4 | import torch.optim as optim 5 | import torch.optim.lr_scheduler as lr_scheduler 6 | import yaml 7 | from torch.utils.tensorboard import SummaryWriter 8 | import test # import test.py to get mAP after each epoch 9 | from models.yolo import Model 10 | from utils.datasets import * 11 | from utils.utils import * 12 | mixed_precision = True 13 | try: # Mixed precision training https://github.com/NVIDIA/apex 14 | from apex import amp 15 | except: 16 | print('Apex recommended for faster mixed precision training: https://github.com/NVIDIA/apex') 17 | mixed_precision = False # not installed 18 | wdir = 'weights' + os.sep # weights dir 19 | last = wdir + 'last.pt' 20 | best = wdir + 'best.pt' 21 | results_file = 'results.txt' 22 | # Hyperparameters 23 | hyp = {'lr0': 0.01, # initial learning rate (SGD=1E-2, Adam=1E-3) 24 | 'momentum': 0.937, # SGD momentum 25 | 'weight_decay': 5e-4, # optimizer weight decay 26 | 'giou': 0.05, # giou loss gain 27 | 'cls': 0.58, # cls loss gain 28 | 'cls_pw': 1.0, # cls BCELoss positive_weight 29 | 'obj': 1.0, # obj loss gain (*=img_size/320 if img_size != 320) 30 | 'obj_pw': 1.0, # obj BCELoss positive_weight 31 | 'iou_t': 0.20, # iou training threshold 32 | 'anchor_t': 4.0, # anchor-multiple threshold 33 | 'fl_gamma': 0, # focal loss gamma (efficientDet default is gamma=1.5) 34 | 'hsv_h': 0.014, # image HSV-Hue augmentation (fraction) 35 | 'hsv_s': 0.68, # image HSV-Saturation augmentation (fraction) 36 | 'hsv_v': 0.36, # image HSV-Value augmentation (fraction) 37 | 'degrees': 0.0, # image rotation (+/- deg) 38 | 'translate': 0.0, # image translation (+/- fraction) 39 | 'scale': 0.5, # image scale (+/- gain) 40 | 'shear': 0.0} # image shear (+/- deg) 41 | print(hyp) 42 | 43 | # Overwrite hyp with hyp*.txt (optional) 44 | f = glob.glob('hyp*.txt') 45 | if f: 46 | print('Using %s' % f[0]) 47 | for k, v in zip(hyp.keys(), np.loadtxt(f[0])): 48 | hyp[k] = v 49 | 50 | # Print focal loss if gamma > 0 51 | if hyp['fl_gamma']: 52 | print('Using FocalLoss(gamma=%g)' % hyp['fl_gamma']) 53 | 54 | def train(hyp): 55 | epochs = opt.epochs # 300 56 | batch_size = opt.batch_size # 64 57 | weights = opt.weights # initial training weights 58 | 59 | # Configure 60 | init_seeds(1) 61 | with open(opt.data,'r',encoding='UTF-8') as f: 62 | data_dict = yaml.load(f, Loader=yaml.FullLoader) # model dict 63 | train_path = data_dict['train'] 64 | test_path = data_dict['val'] 65 | nc = 1 if opt.single_cls else int(data_dict['nc']) # number of classes 66 | 67 | # Remove previous results 68 | for f in glob.glob('*_batch*.jpg') + glob.glob(results_file): 69 | os.remove(f) 70 | 71 | # Create model 72 | model = Model(opt.cfg).to(device) 73 | assert model.md['nc'] == nc, '%s nc=%g classes but %s nc=%g classes' % (opt.data, nc, opt.cfg, model.md['nc']) 74 | 75 | # Image sizes 76 | gs = int(max(model.stride)) # grid size (max stride) 77 | if any(x % gs != 0 for x in opt.img_size): 78 | print('WARNING: --img-size %g,%g must be multiple of %s max stride 
%g' % (*opt.img_size, opt.cfg, gs)) 79 | imgsz, imgsz_test = [make_divisible(x, gs) for x in opt.img_size] # image sizes (train, test) 80 | 81 | # Optimizer 82 | nbs = 64 # nominal batch size 83 | accumulate = max(round(nbs / batch_size), 1) # accumulate loss before optimizing 84 | hyp['weight_decay'] *= batch_size * accumulate / nbs # scale weight_decay 85 | pg0, pg1, pg2 = [], [], [] # optimizer parameter groups 86 | for k, v in model.named_parameters(): 87 | if v.requires_grad: 88 | if '.bias' in k: 89 | pg2.append(v) # biases 90 | elif '.weight' in k and '.bn' not in k: 91 | pg1.append(v) # apply weight decay 92 | else: 93 | pg0.append(v) # all else 94 | 95 | optimizer = optim.Adam(pg0, lr=hyp['lr0']) if opt.adam else \ 96 | optim.SGD(pg0, lr=hyp['lr0'], momentum=hyp['momentum'], nesterov=True) 97 | optimizer.add_param_group({'params': pg1, 'weight_decay': hyp['weight_decay']}) # add pg1 with weight_decay 98 | optimizer.add_param_group({'params': pg2}) # add pg2 (biases) 99 | print('Optimizer groups: %g .bias, %g conv.weight, %g other' % (len(pg2), len(pg1), len(pg0))) 100 | del pg0, pg1, pg2 101 | 102 | # Load Model 103 | google_utils.attempt_download(weights) 104 | start_epoch, best_fitness = 0, 0.0 105 | if weights.endswith('.pt'): # pytorch format 106 | ckpt = torch.load(weights, map_location=device) # load checkpoint 107 | 108 | # load model 109 | try: 110 | ckpt['model'] = \ 111 | {k: v for k, v in ckpt['model'].state_dict().items() if model.state_dict()[k].numel() == v.numel()} 112 | model.load_state_dict(ckpt['model'], strict=False) 113 | except KeyError as e: 114 | s = "%s is not compatible with %s. Specify --weights '' or specify a --cfg compatible with %s." \ 115 | % (opt.weights, opt.cfg, opt.weights) 116 | raise KeyError(s) from e 117 | 118 | # load optimizer 119 | if ckpt['optimizer'] is not None: 120 | optimizer.load_state_dict(ckpt['optimizer']) 121 | best_fitness = ckpt['best_fitness'] 122 | 123 | # load results 124 | if ckpt.get('training_results') is not None: 125 | with open(results_file, 'w') as file: 126 | file.write(ckpt['training_results']) # write results.txt 127 | 128 | start_epoch = ckpt['epoch'] + 1 129 | del ckpt 130 | 131 | if mixed_precision: 132 | model, optimizer = amp.initialize(model, optimizer, opt_level='O1', verbosity=0) 133 | 134 | lf = lambda x: (((1 + math.cos(x * math.pi / epochs)) / 2) ** 1.0) * 0.9 + 0.1 # cosine 135 | scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf) 136 | scheduler.last_epoch = start_epoch - 1 # do not move 137 | 138 | # Initialize distributed training 139 | if device.type != 'cpu' and torch.cuda.device_count() > 1 and torch.distributed.is_available(): 140 | dist.init_process_group(backend='nccl', # distributed backend 141 | init_method='tcp://127.0.0.1:9999', # init method 142 | world_size=1, # number of nodes 143 | rank=0) # node rank 144 | model = torch.nn.parallel.DistributedDataParallel(model) 145 | 146 | # Dataset 147 | dataset = LoadImagesAndLabels(train_path, imgsz, batch_size, 148 | augment=True, 149 | hyp=hyp, # augmentation hyperparameters 150 | rect=opt.rect, # rectangular training 151 | cache_images=opt.cache_images, 152 | single_cls=opt.single_cls) 153 | mlc = np.concatenate(dataset.labels, 0)[:, 0].max() # max label class 154 | assert mlc < nc, 'Label class %g exceeds nc=%g in %s. Correct your labels or your model.' 
% (mlc, nc, opt.cfg) 155 | 156 | # Dataloader 157 | batch_size = min(batch_size, len(dataset)) 158 | nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8]) # number of workers 159 | nw = 0 160 | dataloader = torch.utils.data.DataLoader(dataset, 161 | batch_size=batch_size, 162 | num_workers=nw, 163 | shuffle=not opt.rect, # Shuffle=True unless rectangular training is used 164 | pin_memory=True, 165 | collate_fn=dataset.collate_fn) 166 | 167 | # Testloader 168 | testloader = torch.utils.data.DataLoader(LoadImagesAndLabels(test_path, imgsz_test, batch_size, 169 | hyp=hyp, 170 | rect=True, 171 | cache_images=opt.cache_images, 172 | single_cls=opt.single_cls), 173 | batch_size=batch_size, 174 | num_workers=nw, 175 | pin_memory=True, 176 | collate_fn=dataset.collate_fn) 177 | 178 | # Model parameters 179 | hyp['cls'] *= nc / 80. # scale coco-tuned hyp['cls'] to current dataset 180 | model.nc = nc # attach number of classes to model 181 | model.hyp = hyp # attach hyperparameters to model 182 | model.gr = 1.0 # giou loss ratio (obj_loss = 1.0 or giou) 183 | model.class_weights = labels_to_class_weights(dataset.labels, nc).to(device) # attach class weights 184 | model.names = data_dict['names'] 185 | 186 | # class frequency 187 | labels = np.concatenate(dataset.labels, 0) 188 | c = torch.tensor(labels[:, 0]) # classes 189 | tb_writer.add_histogram('classes', c, 0) 190 | 191 | # Exponential moving average 192 | ema = torch_utils.ModelEMA(model) 193 | 194 | # Start training 195 | t0 = time.time() 196 | nb = len(dataloader) # number of batches 197 | n_burn = max(3 * nb, 1e3) # burn-in iterations, max(3 epochs, 1k iterations) 198 | maps = np.zeros(nc) # mAP per class 199 | results = (0, 0, 0, 0, 0, 0, 0) # 'P', 'R', 'mAP', 'F1', 'val GIoU', 'val Objectness', 'val Classification' 200 | print('Image sizes %g train, %g test' % (imgsz, imgsz_test)) 201 | print('Using %g dataloader workers' % nw) 202 | print('Starting training for %g epochs...' 
% epochs) 203 | # torch.autograd.set_detect_anomaly(True) 204 | for epoch in range(start_epoch, epochs): # epoch ------------------------------------------------------------------ 205 | model.train() 206 | 207 | # Update image weights (optional) 208 | if dataset.image_weights: 209 | w = model.class_weights.cpu().numpy() * (1 - maps) ** 2 # class weights 210 | image_weights = labels_to_image_weights(dataset.labels, nc=nc, class_weights=w) 211 | dataset.indices = random.choices(range(dataset.n), weights=image_weights, k=dataset.n) # rand weighted idx 212 | 213 | mloss = torch.zeros(4, device=device) # mean losses 214 | print(('\n' + '%10s' * 8) % ('Epoch', 'gpu_mem', 'GIoU', 'obj', 'cls', 'total', 'targets', 'img_size')) 215 | try: 216 | pbar = tqdm(enumerate(dataloader), total=nb) # progress bar 217 | for i, (imgs, targets, paths, _) in pbar: # batch ------------------------------------------------------------- 218 | ni = i + nb * epoch # number integrated batches (since train start) 219 | imgs = imgs.to(device).float() / 255.0 # uint8 to float32, 0 - 255 to 0.0 - 1.0 220 | 221 | # Burn-in 222 | if ni <= n_burn: 223 | xi = [0, n_burn] # x interp 224 | # model.gr = np.interp(ni, xi, [0.0, 1.0]) # giou loss ratio (obj_loss = 1.0 or giou) 225 | accumulate = max(1, np.interp(ni, xi, [1, nbs / batch_size]).round()) 226 | for j, x in enumerate(optimizer.param_groups): 227 | # bias lr falls from 0.1 to lr0, all other lrs rise from 0.0 to lr0 228 | x['lr'] = np.interp(ni, xi, [0.1 if j == 2 else 0.0, x['initial_lr'] * lf(epoch)]) 229 | if 'momentum' in x: 230 | x['momentum'] = np.interp(ni, xi, [0.9, hyp['momentum']]) 231 | 232 | # Multi-scale 233 | if opt.multi_scale: 234 | sz = random.randrange(imgsz * 0.5, imgsz * 1.5 + gs) // gs * gs # size 235 | sf = sz / max(imgs.shape[2:]) # scale factor 236 | if sf != 1: 237 | ns = [math.ceil(x * sf / gs) * gs for x in imgs.shape[2:]] # new shape (stretched to gs-multiple) 238 | imgs = F.interpolate(imgs, size=ns, mode='bilinear', align_corners=False) 239 | 240 | # Forward 241 | pred = model(imgs) 242 | 243 | # Loss 244 | loss, loss_items = compute_loss(pred, targets.to(device), model) 245 | if not torch.isfinite(loss): 246 | print('WARNING: non-finite loss, ending training ', loss_items) 247 | return results 248 | 249 | # Backward 250 | if mixed_precision: 251 | with amp.scale_loss(loss, optimizer) as scaled_loss: 252 | scaled_loss.backward() 253 | else: 254 | loss.backward() 255 | 256 | # Optimize 257 | if ni % accumulate == 0: 258 | optimizer.step() 259 | optimizer.zero_grad() 260 | ema.update(model) 261 | 262 | # Print 263 | mloss = (mloss * i + loss_items) / (i + 1) # update mean losses 264 | mem = '%.3gG' % (torch.cuda.memory_cached() / 1E9 if torch.cuda.is_available() else 0) # (GB) 265 | s = ('%10s' * 2 + '%10.4g' * 6) % ( 266 | '%g/%g' % (epoch, epochs - 1), mem, *mloss, targets.shape[0], imgs.shape[-1]) 267 | pbar.set_description(s) 268 | 269 | # Plot 270 | if ni < 3: 271 | f = 'train_batch%g.jpg' % i # filename 272 | res = plot_images(images=imgs, targets=targets, paths=paths, fname=f) 273 | if tb_writer: 274 | tb_writer.add_image(f, res, dataformats='HWC', global_step=epoch) 275 | # tb_writer.add_graph(model, imgs) # add model to tensorboard 276 | # end batch ------------------------------------------------------------------------------------------------ 277 | except: 278 | pass 279 | # Scheduler 280 | scheduler.step() 281 | 282 | torch.cuda.empty_cache() 283 | # mAP 284 | ema.update_attr(model) 285 | final_epoch = epoch + 1 == epochs 286 | if 
not opt.notest or final_epoch: # Calculate mAP 287 | results, maps, times = test.test(opt.data, 288 | batch_size=batch_size, 289 | imgsz=imgsz_test, 290 | save_json=final_epoch and opt.data.endswith(os.sep + 'coco.yaml'), 291 | model=ema.ema, 292 | single_cls=opt.single_cls, 293 | dataloader=testloader, 294 | fast=ni < n_burn) 295 | 296 | # Write 297 | with open(results_file, 'a') as f: 298 | f.write(s + '%10.4g' * 7 % results + '\n') # P, R, mAP, F1, test_losses=(GIoU, obj, cls) 299 | if len(opt.name) and opt.bucket: 300 | os.system('gsutil cp results.txt gs://%s/results/results%s.txt' % (opt.bucket, opt.name)) 301 | 302 | # Tensorboard 303 | if tb_writer: 304 | tags = ['train/giou_loss', 'train/obj_loss', 'train/cls_loss', 305 | 'metrics/precision', 'metrics/recall', 'metrics/mAP_0.5', 'metrics/F1', 306 | 'val/giou_loss', 'val/obj_loss', 'val/cls_loss'] 307 | for x, tag in zip(list(mloss[:-1]) + list(results), tags): 308 | tb_writer.add_scalar(tag, x, epoch) 309 | 310 | # Update best mAP 311 | fi = fitness(np.array(results).reshape(1, -1)) # fitness_i = weighted combination of [P, R, mAP, F1] 312 | if fi > best_fitness: 313 | best_fitness = fi 314 | 315 | # Save model 316 | save = (not opt.nosave) or (final_epoch and not opt.evolve) 317 | if save: 318 | with open(results_file, 'r') as f: # create checkpoint 319 | ckpt = {'epoch': epoch, 320 | 'best_fitness': best_fitness, 321 | 'training_results': f.read(), 322 | 'model': ema.ema.module if hasattr(model, 'module') else ema.ema, 323 | 'optimizer': None if final_epoch else optimizer.state_dict()} 324 | 325 | # Save last, best and delete 326 | torch.save(ckpt, last) 327 | if (best_fitness == fi) and not final_epoch: 328 | torch.save(ckpt, best) 329 | del ckpt 330 | 331 | # end epoch ---------------------------------------------------------------------------------------------------- 332 | # end training 333 | 334 | n = opt.name 335 | if len(n): 336 | n = '_' + n if not n.isnumeric() else n 337 | fresults, flast, fbest = 'results%s.txt' % n, wdir + 'last%s.pt' % n, wdir + 'best%s.pt' % n 338 | for f1, f2 in zip([wdir + 'last.pt', wdir + 'best.pt', 'results.txt'], [flast, fbest, fresults]): 339 | if os.path.exists(f1): 340 | os.rename(f1, f2) # rename 341 | ispt = f2.endswith('.pt') # is *.pt 342 | strip_optimizer(f2) if ispt else None # strip optimizer 343 | os.system('gsutil cp %s gs://%s/weights' % (f2, opt.bucket)) if opt.bucket and ispt else None # upload 344 | 345 | if not opt.evolve: 346 | # plot_results() # save as results.png 347 | pass 348 | print('%g epochs completed in %.3f hours.\n' % (epoch - start_epoch + 1, (time.time() - t0) / 3600)) 349 | dist.destroy_process_group() if torch.cuda.device_count() > 1 else None 350 | torch.cuda.empty_cache() 351 | return results 352 | 353 | if __name__ == '__main__': 354 | parser = argparse.ArgumentParser() 355 | parser.add_argument('--epochs', type=int, default=300) 356 | parser.add_argument('--batch-size', type=int, default=1) 357 | parser.add_argument('--cfg', type=str, default='./config/yolov5l.yaml', help='*.cfg path') 358 | parser.add_argument('--data', type=str, default='./config/score.yaml', help='*.data path') 359 | parser.add_argument('--img-size', nargs='+', type=int, default=[640, 640], help='train,test sizes') 360 | parser.add_argument('--rect', action='store_true', help='rectangular training') 361 | parser.add_argument('--resume', action='store_true', help='resume training from last.pt') 362 | parser.add_argument('--nosave', action='store_true', help='only save final checkpoint') 
363 | parser.add_argument('--notest', action='store_true', help='only test final epoch') 364 | parser.add_argument('--evolve', action='store_true', help='evolve hyperparameters') 365 | parser.add_argument('--bucket', type=str, default='', help='gsutil bucket') 366 | parser.add_argument('--cache-images', action='store_true', help='cache images for faster training') 367 | parser.add_argument('--weights', type=str, default='', help='initial weights path') 368 | parser.add_argument('--name', default='', help='renames results.txt to results_name.txt if supplied') 369 | parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu') 370 | parser.add_argument('--adam', action='store_true', help='use adam optimizer') 371 | parser.add_argument('--multi-scale', action='store_true', help='vary img-size +/- 50%') 372 | parser.add_argument('--single-cls', action='store_true', help='train as single-class dataset') 373 | opt = parser.parse_args() 374 | opt.weights = last if opt.resume else opt.weights 375 | print(opt) 376 | opt.img_size.extend([opt.img_size[-1]] * (2 - len(opt.img_size))) # extend to 2 sizes (train, test) 377 | device = torch_utils.select_device(opt.device, apex=mixed_precision, batch_size=opt.batch_size) 378 | # check_git_status() 379 | if device.type == 'cpu': 380 | mixed_precision = False 381 | # Train 382 | # if not opt.evolve: 383 | tb_writer = SummaryWriter(comment=opt.name) 384 | print('Start Tensorboard with "tensorboard --logdir=runs", view at http://localhost:6006/') 385 | train(hyp) 386 | # Evolve hyperparameters (optional) 387 | # else: 388 | # tb_writer = None 389 | # opt.notest, opt.nosave = True, True # only test/save final epoch 390 | # if opt.bucket: 391 | # os.system('gsutil cp gs://%s/evolve.txt .' % opt.bucket) # download evolve.txt if exists 392 | # for _ in range(10): # generations to evolve 393 | # if os.path.exists('evolve.txt'): # if evolve.txt exists: select best hyps and mutate 394 | # # Select parent(s) 395 | # parent = 'single' # parent selection method: 'single' or 'weighted' 396 | # x = np.loadtxt('evolve.txt', ndmin=2) 397 | # n = min(5, len(x)) # number of previous results to consider 398 | # x = x[np.argsort(-fitness(x))][:n] # top n mutations 399 | # w = fitness(x) - fitness(x).min() # weights 400 | # if parent == 'single' or len(x) == 1: 401 | # # x = x[random.randint(0, n - 1)] # random selection 402 | # x = x[random.choices(range(n), weights=w)[0]] # weighted selection 403 | # elif parent == 'weighted': 404 | # x = (x * w.reshape(n, 1)).sum(0) / w.sum() # weighted combination 405 | 406 | # # Mutate 407 | # mp, s = 0.9, 0.2 # mutation probability, sigma 408 | # npr = np.random 409 | # npr.seed(int(time.time())) 410 | # g = np.array([1, 1, 1, 1, 1, 1, 1, 0, .1, 1, 0, 1, 1, 1, 1, 1, 1, 1]) # gains 411 | # ng = len(g) 412 | # v = np.ones(ng) 413 | # while all(v == 1): # mutate until a change occurs (prevent duplicates) 414 | # v = (g * (npr.random(ng) < mp) * npr.randn(ng) * npr.random() * s + 1).clip(0.3, 3.0) 415 | # for i, k in enumerate(hyp.keys()): # plt.hist(v.ravel(), 300) 416 | # hyp[k] = x[i + 7] * v[i] # mutate 417 | 418 | # # Clip to limits 419 | # keys = ['lr0', 'iou_t', 'momentum', 'weight_decay', 'hsv_s', 'hsv_v', 'translate', 'scale', 'fl_gamma'] 420 | # limits = [(1e-5, 1e-2), (0.00, 0.70), (0.60, 0.98), (0, 0.001), (0, .9), (0, .9), (0, .9), (0, .9), (0, 3)] 421 | # for k, v in zip(keys, limits): 422 | # hyp[k] = np.clip(hyp[k], v[0], v[1]) 423 | # # Train mutation 424 | # results = train(hyp.copy()) 425 | # # 
Write mutation results 426 | # print_mutation(hyp, results, opt.bucket) 427 | # # Plot results 428 | # # plot_evolution_results(hyp) 429 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Transformer-man/yolov5-flask/36573a0b6e91d5a91f3394af278f5a5e768efae7/utils/__init__.py -------------------------------------------------------------------------------- /utils/__pycache__/__init__.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Transformer-man/yolov5-flask/36573a0b6e91d5a91f3394af278f5a5e768efae7/utils/__pycache__/__init__.cpython-37.pyc -------------------------------------------------------------------------------- /utils/__pycache__/datasets.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Transformer-man/yolov5-flask/36573a0b6e91d5a91f3394af278f5a5e768efae7/utils/__pycache__/datasets.cpython-37.pyc -------------------------------------------------------------------------------- /utils/__pycache__/google_utils.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Transformer-man/yolov5-flask/36573a0b6e91d5a91f3394af278f5a5e768efae7/utils/__pycache__/google_utils.cpython-37.pyc -------------------------------------------------------------------------------- /utils/__pycache__/torch_utils.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Transformer-man/yolov5-flask/36573a0b6e91d5a91f3394af278f5a5e768efae7/utils/__pycache__/torch_utils.cpython-37.pyc -------------------------------------------------------------------------------- /utils/__pycache__/utils.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Transformer-man/yolov5-flask/36573a0b6e91d5a91f3394af278f5a5e768efae7/utils/__pycache__/utils.cpython-37.pyc -------------------------------------------------------------------------------- /utils/activations.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.functional as F 3 | import torch.nn as nn 4 | 5 | 6 | # Swish ------------------------------------------------------------------------ 7 | class SwishImplementation(torch.autograd.Function): 8 | @staticmethod 9 | def forward(ctx, x): 10 | ctx.save_for_backward(x) 11 | return x * torch.sigmoid(x) 12 | 13 | @staticmethod 14 | def backward(ctx, grad_output): 15 | x = ctx.saved_tensors[0] 16 | sx = torch.sigmoid(x) 17 | return grad_output * (sx * (1 + x * (1 - sx))) 18 | 19 | 20 | class MemoryEfficientSwish(nn.Module): 21 | @staticmethod 22 | def forward(x): 23 | return SwishImplementation.apply(x) 24 | 25 | 26 | class HardSwish(nn.Module): # https://arxiv.org/pdf/1905.02244.pdf 27 | @staticmethod 28 | def forward(x): 29 | return x * F.hardtanh(x + 3, 0., 6., True) / 6. 
30 | 31 | 32 | class Swish(nn.Module): 33 | @staticmethod 34 | def forward(x): 35 | return x * torch.sigmoid(x) 36 | 37 | 38 | # Mish ------------------------------------------------------------------------ 39 | class MishImplementation(torch.autograd.Function): 40 | @staticmethod 41 | def forward(ctx, x): 42 | ctx.save_for_backward(x) 43 | return x.mul(torch.tanh(F.softplus(x))) # x * tanh(ln(1 + exp(x))) 44 | 45 | @staticmethod 46 | def backward(ctx, grad_output): 47 | x = ctx.saved_tensors[0] 48 | sx = torch.sigmoid(x) 49 | fx = F.softplus(x).tanh() 50 | return grad_output * (fx + x * sx * (1 - fx * fx)) 51 | 52 | 53 | class MemoryEfficientMish(nn.Module): 54 | @staticmethod 55 | def forward(x): 56 | return MishImplementation.apply(x) 57 | 58 | 59 | class Mish(nn.Module): # https://github.com/digantamisra98/Mish 60 | @staticmethod 61 | def forward(x): 62 | return x * F.softplus(x).tanh() 63 | -------------------------------------------------------------------------------- /utils/datasets.py: -------------------------------------------------------------------------------- 1 | import glob 2 | import math 3 | import os 4 | import random 5 | import shutil 6 | import time 7 | from pathlib import Path 8 | from threading import Thread 9 | 10 | import cv2 11 | import numpy as np 12 | import torch 13 | from PIL import Image, ExifTags 14 | from torch.utils.data import Dataset 15 | from tqdm import tqdm 16 | 17 | from utils.utils import xyxy2xywh, xywh2xyxy 18 | 19 | help_url = 'https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data' 20 | img_formats = ['.bmp', '.jpg', '.jpeg', '.png', '.tif', '.dng'] 21 | vid_formats = ['.mov', '.avi', '.mp4'] 22 | 23 | # Get orientation exif tag 24 | for orientation in ExifTags.TAGS.keys(): 25 | if ExifTags.TAGS[orientation] == 'Orientation': 26 | break 27 | 28 | 29 | def exif_size(img): 30 | # Returns exif-corrected PIL size 31 | s = img.size # (width, height) 32 | try: 33 | rotation = dict(img._getexif().items())[orientation] 34 | if rotation == 6: # rotation 270 35 | s = (s[1], s[0]) 36 | elif rotation == 8: # rotation 90 37 | s = (s[1], s[0]) 38 | except: 39 | pass 40 | 41 | return s 42 | 43 | 44 | class LoadImages: # for inference 45 | def __init__(self, path, img_size=416): 46 | path = str(Path(path)) # os-agnostic 47 | files = [] 48 | if os.path.isdir(path): 49 | files = sorted(glob.glob(os.path.join(path, '*.*'))) 50 | elif os.path.isfile(path): 51 | files = [path] 52 | 53 | images = [x for x in files if os.path.splitext(x)[-1].lower() in img_formats] 54 | videos = [x for x in files if os.path.splitext(x)[-1].lower() in vid_formats] 55 | nI, nV = len(images), len(videos) 56 | 57 | self.img_size = img_size 58 | self.files = images + videos 59 | self.nF = nI + nV # number of files 60 | self.video_flag = [False] * nI + [True] * nV 61 | self.mode = 'images' 62 | if any(videos): 63 | self.new_video(videos[0]) # new video 64 | else: 65 | self.cap = None 66 | assert self.nF > 0, 'No images or videos found in ' + path 67 | 68 | def __iter__(self): 69 | self.count = 0 70 | return self 71 | 72 | def __next__(self): 73 | if self.count == self.nF: 74 | raise StopIteration 75 | path = self.files[self.count] 76 | 77 | if self.video_flag[self.count]: 78 | # Read video 79 | self.mode = 'video' 80 | ret_val, img0 = self.cap.read() 81 | if not ret_val: 82 | self.count += 1 83 | self.cap.release() 84 | if self.count == self.nF: # last video 85 | raise StopIteration 86 | else: 87 | path = self.files[self.count] 88 | self.new_video(path) 89 | ret_val, img0 = 
self.cap.read() 90 | 91 | self.frame += 1 92 | print('video %g/%g (%g/%g) %s: ' % (self.count + 1, self.nF, self.frame, self.nframes, path), end='') 93 | 94 | else: 95 | # Read image 96 | self.count += 1 97 | img0 = cv2.imread(path) # BGR 98 | assert img0 is not None, 'Image Not Found ' + path 99 | print('image %g/%g %s: ' % (self.count, self.nF, path), end='') 100 | 101 | # Padded resize 102 | img = letterbox(img0, new_shape=self.img_size)[0] 103 | 104 | # Convert 105 | img = img[:, :, ::-1].transpose(2, 0, 1) # BGR to RGB, to 3x416x416 106 | img = np.ascontiguousarray(img) 107 | 108 | # cv2.imwrite(path + '.letterbox.jpg', 255 * img.transpose((1, 2, 0))[:, :, ::-1]) # save letterbox image 109 | return path, img, img0, self.cap 110 | 111 | def new_video(self, path): 112 | self.frame = 0 113 | self.cap = cv2.VideoCapture(path) 114 | self.nframes = int(self.cap.get(cv2.CAP_PROP_FRAME_COUNT)) 115 | 116 | def __len__(self): 117 | return self.nF # number of files 118 | 119 | # def LoadImages(img0): # for inference 120 | # 121 | # img = letterbox(img0, new_shape=640)[0] 122 | # 123 | # img = img[:, :, ::-1].transpose(2, 0, 1) # BGR to RGB, to 3x416x416 124 | # img = np.ascontiguousarray(img) 125 | # 126 | # return img, img0 127 | 128 | 129 | 130 | class LoadWebcam: # for inference 131 | def __init__(self, pipe=0, img_size=416): 132 | self.img_size = img_size 133 | 134 | if pipe == '0': 135 | pipe = 0 # local camera 136 | # pipe = 'rtsp://192.168.1.64/1' # IP camera 137 | # pipe = 'rtsp://username:password@192.168.1.64/1' # IP camera with login 138 | # pipe = 'rtsp://170.93.143.139/rtplive/470011e600ef003a004ee33696235daa' # IP traffic camera 139 | # pipe = 'http://wmccpinetop.axiscam.net/mjpg/video.mjpg' # IP golf camera 140 | 141 | # https://answers.opencv.org/question/215996/changing-gstreamer-pipeline-to-opencv-in-pythonsolved/ 142 | # pipe = '"rtspsrc location="rtsp://username:password@192.168.1.64/1" latency=10 ! appsink' # GStreamer 143 | 144 | # https://answers.opencv.org/question/200787/video-acceleration-gstremer-pipeline-in-videocapture/ 145 | # https://stackoverflow.com/questions/54095699/install-gstreamer-support-for-opencv-python-package # install help 146 | # pipe = "rtspsrc location=rtsp://root:root@192.168.0.91:554/axis-media/media.amp?videocodec=h264&resolution=3840x2160 protocols=GST_RTSP_LOWER_TRANS_TCP ! rtph264depay ! queue ! vaapih264dec ! videoconvert ! 
appsink" # GStreamer 147 | 148 | self.pipe = pipe 149 | self.cap = cv2.VideoCapture(pipe) # video capture object 150 | self.cap.set(cv2.CAP_PROP_BUFFERSIZE, 3) # set buffer size 151 | 152 | def __iter__(self): 153 | self.count = -1 154 | return self 155 | 156 | def __next__(self): 157 | self.count += 1 158 | if cv2.waitKey(1) == ord('q'): # q to quit 159 | self.cap.release() 160 | cv2.destroyAllWindows() 161 | raise StopIteration 162 | 163 | # Read frame 164 | if self.pipe == 0: # local camera 165 | ret_val, img0 = self.cap.read() 166 | img0 = cv2.flip(img0, 1) # flip left-right 167 | else: # IP camera 168 | n = 0 169 | while True: 170 | n += 1 171 | self.cap.grab() 172 | if n % 30 == 0: # skip frames 173 | ret_val, img0 = self.cap.retrieve() 174 | if ret_val: 175 | break 176 | 177 | # Print 178 | assert ret_val, 'Camera Error %s' % self.pipe 179 | img_path = 'webcam.jpg' 180 | print('webcam %g: ' % self.count, end='') 181 | 182 | # Padded resize 183 | img = letterbox(img0, new_shape=self.img_size)[0] 184 | 185 | # Convert 186 | img = img[:, :, ::-1].transpose(2, 0, 1) # BGR to RGB, to 3x416x416 187 | img = np.ascontiguousarray(img) 188 | 189 | return img_path, img, img0, None 190 | 191 | def __len__(self): 192 | return 0 193 | 194 | 195 | class LoadStreams: # multiple IP or RTSP cameras 196 | def __init__(self, sources='streams.txt', img_size=416): 197 | self.mode = 'images' 198 | self.img_size = img_size 199 | 200 | if os.path.isfile(sources): 201 | with open(sources, 'r') as f: 202 | sources = [x.strip() for x in f.read().splitlines() if len(x.strip())] 203 | else: 204 | sources = [sources] 205 | 206 | n = len(sources) 207 | self.imgs = [None] * n 208 | self.sources = sources 209 | for i, s in enumerate(sources): 210 | # Start the thread to read frames from the video stream 211 | print('%g/%g: %s... ' % (i + 1, n, s), end='') 212 | cap = cv2.VideoCapture(0 if s == '0' else s) 213 | assert cap.isOpened(), 'Failed to open %s' % s 214 | w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) 215 | h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) 216 | fps = cap.get(cv2.CAP_PROP_FPS) % 100 217 | _, self.imgs[i] = cap.read() # guarantee first frame 218 | thread = Thread(target=self.update, args=([i, cap]), daemon=True) 219 | print(' success (%gx%g at %.2f FPS).' % (w, h, fps)) 220 | thread.start() 221 | print('') # newline 222 | 223 | # check for common shapes 224 | s = np.stack([letterbox(x, new_shape=self.img_size)[0].shape for x in self.imgs], 0) # inference shapes 225 | self.rect = np.unique(s, axis=0).shape[0] == 1 # rect inference if all shapes equal 226 | if not self.rect: 227 | print('WARNING: Different stream shapes detected. 
For optimal performance supply similarly-shaped streams.') 228 | 229 | def update(self, index, cap): 230 | # Read next stream frame in a daemon thread 231 | n = 0 232 | while cap.isOpened(): 233 | n += 1 234 | # _, self.imgs[index] = cap.read() 235 | cap.grab() 236 | if n == 4: # read every 4th frame 237 | _, self.imgs[index] = cap.retrieve() 238 | n = 0 239 | time.sleep(0.01) # wait time 240 | 241 | def __iter__(self): 242 | self.count = -1 243 | return self 244 | 245 | def __next__(self): 246 | self.count += 1 247 | img0 = self.imgs.copy() 248 | if cv2.waitKey(1) == ord('q'): # q to quit 249 | cv2.destroyAllWindows() 250 | raise StopIteration 251 | 252 | # Letterbox 253 | img = [letterbox(x, new_shape=self.img_size, auto=self.rect)[0] for x in img0] 254 | 255 | # Stack 256 | img = np.stack(img, 0) 257 | 258 | # Convert 259 | img = img[:, :, :, ::-1].transpose(0, 3, 1, 2) # BGR to RGB, to bsx3x416x416 260 | img = np.ascontiguousarray(img) 261 | 262 | return self.sources, img, img0, None 263 | 264 | def __len__(self): 265 | return 0 # 1E12 frames = 32 streams at 30 FPS for 30 years 266 | 267 | 268 | class LoadImagesAndLabels(Dataset): # for training/testing 269 | def __init__(self, path, img_size=416, batch_size=16, augment=False, hyp=None, rect=False, image_weights=False, 270 | cache_images=False, single_cls=False, pad=0.0): 271 | try: 272 | path = str(Path(path)) # os-agnostic 273 | parent = str(Path(path).parent) + os.sep 274 | if os.path.isfile(path): # file 275 | with open(path, 'r') as f: 276 | f = f.read().splitlines() 277 | f = [x.replace('./', parent) if x.startswith('./') else x for x in f] # local to global path 278 | elif os.path.isdir(path): # folder 279 | f = glob.iglob(path + os.sep + '*.*') 280 | else: 281 | raise Exception('%s does not exist' % path) 282 | self.img_files = [x.replace('/', os.sep) for x in f if os.path.splitext(x)[-1].lower() in img_formats] 283 | except: 284 | raise Exception('Error loading data from %s. See %s' % (path, help_url)) 285 | 286 | n = len(self.img_files) 287 | assert n > 0, 'No images found in %s. 
See %s' % (path, help_url) 288 | bi = np.floor(np.arange(n) / batch_size).astype(np.int) # batch index 289 | nb = bi[-1] + 1 # number of batches 290 | 291 | self.n = n # number of images 292 | self.batch = bi # batch index of image 293 | self.img_size = img_size 294 | self.augment = augment 295 | self.hyp = hyp 296 | self.image_weights = image_weights 297 | self.rect = False if image_weights else rect 298 | self.mosaic = self.augment and not self.rect # load 4 images at a time into a mosaic (only during training) 299 | 300 | # Define labels 301 | self.label_files = [x.replace('images', 'labels').replace(os.path.splitext(x)[-1], '.txt') 302 | for x in self.img_files] 303 | 304 | # Rectangular Training https://github.com/ultralytics/yolov3/issues/232 305 | if self.rect: 306 | # Read image shapes (wh) 307 | sp = path.replace('.txt', '') + '.shapes' # shapefile path 308 | try: 309 | with open(sp, 'r') as f: # read existing shapefile 310 | s = [x.split() for x in f.read().splitlines()] 311 | assert len(s) == n, 'Shapefile out of sync' 312 | except: 313 | s = [exif_size(Image.open(f)) for f in tqdm(self.img_files, desc='Reading image shapes')] 314 | np.savetxt(sp, s, fmt='%g') # overwrites existing (if any) 315 | 316 | # Sort by aspect ratio 317 | s = np.array(s, dtype=np.float64) 318 | ar = s[:, 1] / s[:, 0] # aspect ratio 319 | irect = ar.argsort() 320 | self.img_files = [self.img_files[i] for i in irect] 321 | self.label_files = [self.label_files[i] for i in irect] 322 | self.shapes = s[irect] # wh 323 | ar = ar[irect] 324 | 325 | # Set training image shapes 326 | shapes = [[1, 1]] * nb 327 | for i in range(nb): 328 | ari = ar[bi == i] 329 | mini, maxi = ari.min(), ari.max() 330 | if maxi < 1: 331 | shapes[i] = [maxi, 1] 332 | elif mini > 1: 333 | shapes[i] = [1, 1 / mini] 334 | 335 | self.batch_shapes = np.ceil(np.array(shapes) * img_size / 32. 
+ pad).astype(np.int) * 32 336 | 337 | # Cache labels 338 | self.imgs = [None] * n 339 | self.labels = [np.zeros((0, 5), dtype=np.float32)] * n 340 | create_datasubset, extract_bounding_boxes, labels_loaded = False, False, False 341 | nm, nf, ne, ns, nd = 0, 0, 0, 0, 0 # number missing, found, empty, datasubset, duplicate 342 | np_labels_path = str(Path(self.label_files[0]).parent) + '.npy' # saved labels in *.npy file 343 | if os.path.isfile(np_labels_path): 344 | s = np_labels_path # print string 345 | x = np.load(np_labels_path, allow_pickle=True) 346 | if len(x) == n: 347 | self.labels = x 348 | labels_loaded = True 349 | else: 350 | s = path.replace('images', 'labels') 351 | 352 | pbar = tqdm(self.label_files) 353 | for i, file in enumerate(pbar): 354 | if labels_loaded: 355 | l = self.labels[i] 356 | # np.savetxt(file, l, '%g') # save *.txt from *.npy file 357 | else: 358 | try: 359 | with open(file, 'r') as f: 360 | l = np.array([x.split() for x in f.read().splitlines()], dtype=np.float32) 361 | except: 362 | nm += 1 # print('missing labels for image %s' % self.img_files[i]) # file missing 363 | continue 364 | 365 | if l.shape[0]: 366 | assert l.shape[1] == 5, '> 5 label columns: %s' % file 367 | assert (l >= 0).all(), 'negative labels: %s' % file 368 | assert (l[:, 1:] <= 1).all(), 'non-normalized or out of bounds coordinate labels: %s' % file 369 | if np.unique(l, axis=0).shape[0] < l.shape[0]: # duplicate rows 370 | nd += 1 # print('WARNING: duplicate rows in %s' % self.label_files[i]) # duplicate rows 371 | if single_cls: 372 | l[:, 0] = 0 # force dataset into single-class mode 373 | self.labels[i] = l 374 | nf += 1 # file found 375 | 376 | # Create subdataset (a smaller dataset) 377 | if create_datasubset and ns < 1E4: 378 | if ns == 0: 379 | create_folder(path='./datasubset') 380 | os.makedirs('./datasubset/images') 381 | exclude_classes = 43 382 | if exclude_classes not in l[:, 0]: 383 | ns += 1 384 | # shutil.copy(src=self.img_files[i], dst='./datasubset/images/') # copy image 385 | with open('./datasubset/images.txt', 'a') as f: 386 | f.write(self.img_files[i] + '\n') 387 | 388 | # Extract object detection boxes for a second stage classifier 389 | if extract_bounding_boxes: 390 | p = Path(self.img_files[i]) 391 | img = cv2.imread(str(p)) 392 | h, w = img.shape[:2] 393 | for j, x in enumerate(l): 394 | f = '%s%sclassifier%s%g_%g_%s' % (p.parent.parent, os.sep, os.sep, x[0], j, p.name) 395 | if not os.path.exists(Path(f).parent): 396 | os.makedirs(Path(f).parent) # make new output folder 397 | 398 | b = x[1:] * [w, h, w, h] # box 399 | b[2:] = b[2:].max() # rectangle to square 400 | b[2:] = b[2:] * 1.3 + 30 # pad 401 | b = xywh2xyxy(b.reshape(-1, 4)).ravel().astype(np.int) 402 | 403 | b[[0, 2]] = np.clip(b[[0, 2]], 0, w) # clip boxes outside of image 404 | b[[1, 3]] = np.clip(b[[1, 3]], 0, h) 405 | assert cv2.imwrite(f, img[b[1]:b[3], b[0]:b[2]]), 'Failure extracting classifier boxes' 406 | else: 407 | ne += 1 # print('empty labels for image %s' % self.img_files[i]) # file empty 408 | # os.system("rm '%s' '%s'" % (self.img_files[i], self.label_files[i])) # remove 409 | 410 | pbar.desc = 'Caching labels %s (%g found, %g missing, %g empty, %g duplicate, for %g images)' % ( 411 | s, nf, nm, ne, nd, n) 412 | assert nf > 0 or n == 20288, 'No labels found in %s. 
See %s' % (os.path.dirname(file) + os.sep, help_url) 413 | if not labels_loaded and n > 1000: 414 | print('Saving labels to %s for faster future loading' % np_labels_path) 415 | np.save(np_labels_path, self.labels) # save for next time 416 | 417 | # Cache images into memory for faster training (WARNING: large datasets may exceed system RAM) 418 | if cache_images: # if training 419 | gb = 0 # Gigabytes of cached images 420 | pbar = tqdm(range(len(self.img_files)), desc='Caching images') 421 | self.img_hw0, self.img_hw = [None] * n, [None] * n 422 | for i in pbar: # max 10k images 423 | self.imgs[i], self.img_hw0[i], self.img_hw[i] = load_image(self, i) # img, hw_original, hw_resized 424 | gb += self.imgs[i].nbytes 425 | pbar.desc = 'Caching images (%.1fGB)' % (gb / 1E9) 426 | 427 | # Detect corrupted images https://medium.com/joelthchao/programmatically-detect-corrupted-image-8c1b2006c3d3 428 | detect_corrupted_images = False 429 | if detect_corrupted_images: 430 | from skimage import io # conda install -c conda-forge scikit-image 431 | for file in tqdm(self.img_files, desc='Detecting corrupted images'): 432 | try: 433 | _ = io.imread(file) 434 | except: 435 | print('Corrupted image detected: %s' % file) 436 | 437 | def __len__(self): 438 | return len(self.img_files) 439 | 440 | # def __iter__(self): 441 | # self.count = -1 442 | # print('ran dataset iter') 443 | # #self.shuffled_vector = np.random.permutation(self.nF) if self.augment else np.arange(self.nF) 444 | # return self 445 | 446 | def __getitem__(self, index): 447 | if self.image_weights: 448 | index = self.indices[index] 449 | 450 | hyp = self.hyp 451 | if self.mosaic: 452 | # Load mosaic 453 | img, labels = load_mosaic(self, index) 454 | shapes = None 455 | 456 | else: 457 | # Load image 458 | img, (h0, w0), (h, w) = load_image(self, index) 459 | 460 | # Letterbox 461 | shape = self.batch_shapes[self.batch[index]] if self.rect else self.img_size # final letterboxed shape 462 | img, ratio, pad = letterbox(img, shape, auto=False, scaleup=self.augment) 463 | shapes = (h0, w0), ((h / h0, w / w0), pad) # for COCO mAP rescaling 464 | 465 | # Load labels 466 | labels = [] 467 | x = self.labels[index] 468 | if x.size > 0: 469 | # Normalized xywh to pixel xyxy format 470 | labels = x.copy() 471 | labels[:, 1] = ratio[0] * w * (x[:, 1] - x[:, 3] / 2) + pad[0] # pad width 472 | labels[:, 2] = ratio[1] * h * (x[:, 2] - x[:, 4] / 2) + pad[1] # pad height 473 | labels[:, 3] = ratio[0] * w * (x[:, 1] + x[:, 3] / 2) + pad[0] 474 | labels[:, 4] = ratio[1] * h * (x[:, 2] + x[:, 4] / 2) + pad[1] 475 | 476 | if self.augment: 477 | # Augment imagespace 478 | if not self.mosaic: 479 | img, labels = random_affine(img, labels, 480 | degrees=hyp['degrees'], 481 | translate=hyp['translate'], 482 | scale=hyp['scale'], 483 | shear=hyp['shear']) 484 | 485 | # Augment colorspace 486 | augment_hsv(img, hgain=hyp['hsv_h'], sgain=hyp['hsv_s'], vgain=hyp['hsv_v']) 487 | 488 | # Apply cutouts 489 | # if random.random() < 0.9: 490 | # labels = cutout(img, labels) 491 | 492 | nL = len(labels) # number of labels 493 | if nL: 494 | # convert xyxy to xywh 495 | labels[:, 1:5] = xyxy2xywh(labels[:, 1:5]) 496 | 497 | # Normalize coordinates 0 - 1 498 | labels[:, [2, 4]] /= img.shape[0] # height 499 | labels[:, [1, 3]] /= img.shape[1] # width 500 | 501 | if self.augment: 502 | # random left-right flip 503 | lr_flip = True 504 | if lr_flip and random.random() < 0.5: 505 | img = np.fliplr(img) 506 | if nL: 507 | labels[:, 1] = 1 - labels[:, 1] 508 | 509 | # random up-down 
flip 510 | ud_flip = False 511 | if ud_flip and random.random() < 0.5: 512 | img = np.flipud(img) 513 | if nL: 514 | labels[:, 2] = 1 - labels[:, 2] 515 | 516 | labels_out = torch.zeros((nL, 6)) 517 | if nL: 518 | labels_out[:, 1:] = torch.from_numpy(labels) 519 | 520 | # Convert 521 | img = img[:, :, ::-1].transpose(2, 0, 1) # BGR to RGB, to 3x416x416 522 | img = np.ascontiguousarray(img) 523 | 524 | return torch.from_numpy(img), labels_out, self.img_files[index], shapes 525 | 526 | @staticmethod 527 | def collate_fn(batch): 528 | img, label, path, shapes = zip(*batch) # transposed 529 | for i, l in enumerate(label): 530 | l[:, 0] = i # add target image index for build_targets() 531 | return torch.stack(img, 0), torch.cat(label, 0), path, shapes 532 | 533 | 534 | def load_image(self, index): 535 | # loads 1 image from dataset, returns img, original hw, resized hw 536 | img = self.imgs[index] 537 | if img is None: # not cached 538 | path = self.img_files[index] 539 | img = cv2.imread(path) # BGR 540 | assert img is not None, 'Image Not Found ' + path 541 | h0, w0 = img.shape[:2] # orig hw 542 | r = self.img_size / max(h0, w0) # resize image to img_size 543 | if r != 1: # always resize down, only resize up if training with augmentation 544 | interp = cv2.INTER_AREA if r < 1 and not self.augment else cv2.INTER_LINEAR 545 | img = cv2.resize(img, (int(w0 * r), int(h0 * r)), interpolation=interp) 546 | return img, (h0, w0), img.shape[:2] # img, hw_original, hw_resized 547 | else: 548 | return self.imgs[index], self.img_hw0[index], self.img_hw[index] # img, hw_original, hw_resized 549 | 550 | 551 | def augment_hsv(img, hgain=0.5, sgain=0.5, vgain=0.5): 552 | r = np.random.uniform(-1, 1, 3) * [hgain, sgain, vgain] + 1 # random gains 553 | hue, sat, val = cv2.split(cv2.cvtColor(img, cv2.COLOR_BGR2HSV)) 554 | dtype = img.dtype # uint8 555 | 556 | x = np.arange(0, 256, dtype=np.int16) 557 | lut_hue = ((x * r[0]) % 180).astype(dtype) 558 | lut_sat = np.clip(x * r[1], 0, 255).astype(dtype) 559 | lut_val = np.clip(x * r[2], 0, 255).astype(dtype) 560 | 561 | img_hsv = cv2.merge((cv2.LUT(hue, lut_hue), cv2.LUT(sat, lut_sat), cv2.LUT(val, lut_val))).astype(dtype) 562 | cv2.cvtColor(img_hsv, cv2.COLOR_HSV2BGR, dst=img) # no return needed 563 | 564 | # Histogram equalization 565 | # if random.random() < 0.2: 566 | # for i in range(3): 567 | # img[:, :, i] = cv2.equalizeHist(img[:, :, i]) 568 | 569 | 570 | def load_mosaic(self, index): 571 | # loads images in a mosaic 572 | 573 | labels4 = [] 574 | s = self.img_size 575 | xc, yc = [int(random.uniform(s * 0.5, s * 1.5)) for _ in range(2)] # mosaic center x, y 576 | indices = [index] + [random.randint(0, len(self.labels) - 1) for _ in range(3)] # 3 additional image indices 577 | for i, index in enumerate(indices): 578 | # Load image 579 | img, _, (h, w) = load_image(self, index) 580 | 581 | # place img in img4 582 | if i == 0: # top left 583 | img4 = np.full((s * 2, s * 2, img.shape[2]), 114, dtype=np.uint8) # base image with 4 tiles 584 | x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc # xmin, ymin, xmax, ymax (large image) 585 | x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h # xmin, ymin, xmax, ymax (small image) 586 | elif i == 1: # top right 587 | x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc 588 | x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h 589 | elif i == 2: # bottom left 590 | x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h) 591 | x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, max(xc, 
w), min(y2a - y1a, h) 592 | elif i == 3: # bottom right 593 | x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h) 594 | x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h) 595 | 596 | img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b] # img4[ymin:ymax, xmin:xmax] 597 | padw = x1a - x1b 598 | padh = y1a - y1b 599 | 600 | # Labels 601 | x = self.labels[index] 602 | labels = x.copy() 603 | if x.size > 0: # Normalized xywh to pixel xyxy format 604 | labels[:, 1] = w * (x[:, 1] - x[:, 3] / 2) + padw 605 | labels[:, 2] = h * (x[:, 2] - x[:, 4] / 2) + padh 606 | labels[:, 3] = w * (x[:, 1] + x[:, 3] / 2) + padw 607 | labels[:, 4] = h * (x[:, 2] + x[:, 4] / 2) + padh 608 | labels4.append(labels) 609 | 610 | # Concat/clip labels 611 | if len(labels4): 612 | labels4 = np.concatenate(labels4, 0) 613 | # np.clip(labels4[:, 1:] - s / 2, 0, s, out=labels4[:, 1:]) # use with center crop 614 | np.clip(labels4[:, 1:], 0, 2 * s, out=labels4[:, 1:]) # use with random_affine 615 | 616 | # Augment 617 | # img4 = img4[s // 2: int(s * 1.5), s // 2:int(s * 1.5)] # center crop (WARNING, requires box pruning) 618 | img4, labels4 = random_affine(img4, labels4, 619 | degrees=self.hyp['degrees'], 620 | translate=self.hyp['translate'], 621 | scale=self.hyp['scale'], 622 | shear=self.hyp['shear'], 623 | border=-s // 2) # border to remove 624 | 625 | return img4, labels4 626 | 627 | 628 | def letterbox(img, new_shape=(416, 416), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True): 629 | # Resize image to a 32-pixel-multiple rectangle https://github.com/ultralytics/yolov3/issues/232 630 | shape = img.shape[:2] # current shape [height, width] 631 | if isinstance(new_shape, int): 632 | new_shape = (new_shape, new_shape) 633 | 634 | # Scale ratio (new / old) 635 | r = min(new_shape[0] / shape[0], new_shape[1] / shape[1]) 636 | if not scaleup: # only scale down, do not scale up (for better test mAP) 637 | r = min(r, 1.0) 638 | 639 | # Compute padding 640 | ratio = r, r # width, height ratios 641 | new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r)) 642 | dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1] # wh padding 643 | if auto: # minimum rectangle 644 | dw, dh = np.mod(dw, 64), np.mod(dh, 64) # wh padding 645 | elif scaleFill: # stretch 646 | dw, dh = 0.0, 0.0 647 | new_unpad = new_shape 648 | ratio = new_shape[0] / shape[1], new_shape[1] / shape[0] # width, height ratios 649 | 650 | dw /= 2 # divide padding into 2 sides 651 | dh /= 2 652 | 653 | if shape[::-1] != new_unpad: # resize 654 | img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR) 655 | top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1)) 656 | left, right = int(round(dw - 0.1)), int(round(dw + 0.1)) 657 | img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color) # add border 658 | return img, ratio, (dw, dh) 659 | 660 | 661 | def random_affine(img, targets=(), degrees=10, translate=.1, scale=.1, shear=10, border=0): 662 | # torchvision.transforms.RandomAffine(degrees=(-10, 10), translate=(.1, .1), scale=(.9, 1.1), shear=(-10, 10)) 663 | # https://medium.com/uruvideo/dataset-augmentation-with-random-homographies-a8f4b44830d4 664 | # targets = [cls, xyxy] 665 | 666 | height = img.shape[0] + border * 2 667 | width = img.shape[1] + border * 2 668 | 669 | # Rotation and Scale 670 | R = np.eye(3) 671 | a = random.uniform(-degrees, degrees) 672 | # a += random.choice([-180, -90, 0, 90]) # add 90deg rotations to small rotations 673 | s = random.uniform(1 - 
scale, 1 + scale) 674 | # s = 2 ** random.uniform(-scale, scale) 675 | R[:2] = cv2.getRotationMatrix2D(angle=a, center=(img.shape[1] / 2, img.shape[0] / 2), scale=s) 676 | 677 | # Translation 678 | T = np.eye(3) 679 | T[0, 2] = random.uniform(-translate, translate) * img.shape[0] + border # x translation (pixels) 680 | T[1, 2] = random.uniform(-translate, translate) * img.shape[1] + border # y translation (pixels) 681 | 682 | # Shear 683 | S = np.eye(3) 684 | S[0, 1] = math.tan(random.uniform(-shear, shear) * math.pi / 180) # x shear (deg) 685 | S[1, 0] = math.tan(random.uniform(-shear, shear) * math.pi / 180) # y shear (deg) 686 | 687 | # Combined rotation matrix 688 | M = S @ T @ R # ORDER IS IMPORTANT HERE!! 689 | if (border != 0) or (M != np.eye(3)).any(): # image changed 690 | img = cv2.warpAffine(img, M[:2], dsize=(width, height), flags=cv2.INTER_LINEAR, borderValue=(114, 114, 114)) 691 | 692 | # Transform label coordinates 693 | n = len(targets) 694 | if n: 695 | # warp points 696 | xy = np.ones((n * 4, 3)) 697 | xy[:, :2] = targets[:, [1, 2, 3, 4, 1, 4, 3, 2]].reshape(n * 4, 2) # x1y1, x2y2, x1y2, x2y1 698 | xy = (xy @ M.T)[:, :2].reshape(n, 8) 699 | 700 | # create new boxes 701 | x = xy[:, [0, 2, 4, 6]] 702 | y = xy[:, [1, 3, 5, 7]] 703 | xy = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T 704 | 705 | # # apply angle-based reduction of bounding boxes 706 | # radians = a * math.pi / 180 707 | # reduction = max(abs(math.sin(radians)), abs(math.cos(radians))) ** 0.5 708 | # x = (xy[:, 2] + xy[:, 0]) / 2 709 | # y = (xy[:, 3] + xy[:, 1]) / 2 710 | # w = (xy[:, 2] - xy[:, 0]) * reduction 711 | # h = (xy[:, 3] - xy[:, 1]) * reduction 712 | # xy = np.concatenate((x - w / 2, y - h / 2, x + w / 2, y + h / 2)).reshape(4, n).T 713 | 714 | # reject warped points outside of image 715 | xy[:, [0, 2]] = xy[:, [0, 2]].clip(0, width) 716 | xy[:, [1, 3]] = xy[:, [1, 3]].clip(0, height) 717 | w = xy[:, 2] - xy[:, 0] 718 | h = xy[:, 3] - xy[:, 1] 719 | area = w * h 720 | area0 = (targets[:, 3] - targets[:, 1]) * (targets[:, 4] - targets[:, 2]) 721 | ar = np.maximum(w / (h + 1e-16), h / (w + 1e-16)) # aspect ratio 722 | i = (w > 4) & (h > 4) & (area / (area0 * s + 1e-16) > 0.2) & (ar < 10) 723 | 724 | targets = targets[i] 725 | targets[:, 1:5] = xy[i] 726 | 727 | return img, targets 728 | 729 | 730 | def cutout(image, labels): 731 | # https://arxiv.org/abs/1708.04552 732 | # https://github.com/hysts/pytorch_cutout/blob/master/dataloader.py 733 | # https://towardsdatascience.com/when-conventional-wisdom-fails-revisiting-data-augmentation-for-self-driving-cars-4831998c5509 734 | h, w = image.shape[:2] 735 | 736 | def bbox_ioa(box1, box2): 737 | # Returns the intersection over box2 area given box1, box2. box1 is 4, box2 is nx4. 
boxes are x1y1x2y2 738 | box2 = box2.transpose() 739 | 740 | # Get the coordinates of bounding boxes 741 | b1_x1, b1_y1, b1_x2, b1_y2 = box1[0], box1[1], box1[2], box1[3] 742 | b2_x1, b2_y1, b2_x2, b2_y2 = box2[0], box2[1], box2[2], box2[3] 743 | 744 | # Intersection area 745 | inter_area = (np.minimum(b1_x2, b2_x2) - np.maximum(b1_x1, b2_x1)).clip(0) * \ 746 | (np.minimum(b1_y2, b2_y2) - np.maximum(b1_y1, b2_y1)).clip(0) 747 | 748 | # box2 area 749 | box2_area = (b2_x2 - b2_x1) * (b2_y2 - b2_y1) + 1e-16 750 | 751 | # Intersection over box2 area 752 | return inter_area / box2_area 753 | 754 | # create random masks 755 | scales = [0.5] * 1 + [0.25] * 2 + [0.125] * 4 + [0.0625] * 8 + [0.03125] * 16 # image size fraction 756 | for s in scales: 757 | mask_h = random.randint(1, int(h * s)) 758 | mask_w = random.randint(1, int(w * s)) 759 | 760 | # box 761 | xmin = max(0, random.randint(0, w) - mask_w // 2) 762 | ymin = max(0, random.randint(0, h) - mask_h // 2) 763 | xmax = min(w, xmin + mask_w) 764 | ymax = min(h, ymin + mask_h) 765 | 766 | # apply random color mask 767 | image[ymin:ymax, xmin:xmax] = [random.randint(64, 191) for _ in range(3)] 768 | 769 | # return unobscured labels 770 | if len(labels) and s > 0.03: 771 | box = np.array([xmin, ymin, xmax, ymax], dtype=np.float32) 772 | ioa = bbox_ioa(box, labels[:, 1:5]) # intersection over area 773 | labels = labels[ioa < 0.60] # remove >60% obscured labels 774 | 775 | return labels 776 | 777 | 778 | def reduce_img_size(path='../data/sm4/images', img_size=1024): # from utils.datasets import *; reduce_img_size() 779 | # creates a new ./images_reduced folder with reduced size images of maximum size img_size 780 | path_new = path + '_reduced' # reduced images path 781 | create_folder(path_new) 782 | for f in tqdm(glob.glob('%s/*.*' % path)): 783 | try: 784 | img = cv2.imread(f) 785 | h, w = img.shape[:2] 786 | r = img_size / max(h, w) # size ratio 787 | if r < 1.0: 788 | img = cv2.resize(img, (int(w * r), int(h * r)), interpolation=cv2.INTER_AREA) # _LINEAR fastest 789 | fnew = f.replace(path, path_new) # .replace(Path(f).suffix, '.jpg') 790 | cv2.imwrite(fnew, img) 791 | except: 792 | print('WARNING: image failure %s' % f) 793 | 794 | 795 | def convert_images2bmp(): # from utils.datasets import *; convert_images2bmp() 796 | # Save images 797 | formats = [x.lower() for x in img_formats] + [x.upper() for x in img_formats] 798 | # for path in ['../coco/images/val2014', '../coco/images/train2014']: 799 | for path in ['../data/sm4/images', '../data/sm4/background']: 800 | create_folder(path + 'bmp') 801 | for ext in formats: # ['.bmp', '.jpg', '.jpeg', '.png', '.tif', '.dng'] 802 | for f in tqdm(glob.glob('%s/*%s' % (path, ext)), desc='Converting %s' % ext): 803 | cv2.imwrite(f.replace(ext.lower(), '.bmp').replace(path, path + 'bmp'), cv2.imread(f)) 804 | 805 | # Save labels 806 | # for path in ['../coco/trainvalno5k.txt', '../coco/5k.txt']: 807 | for file in ['../data/sm4/out_train.txt', '../data/sm4/out_test.txt']: 808 | with open(file, 'r') as f: 809 | lines = f.read() 810 | # lines = f.read().replace('2014/', '2014bmp/') # coco 811 | lines = lines.replace('/images', '/imagesbmp') 812 | lines = lines.replace('/background', '/backgroundbmp') 813 | for ext in formats: 814 | lines = lines.replace(ext, '.bmp') 815 | with open(file.replace('.txt', 'bmp.txt'), 'w') as f: 816 | f.write(lines) 817 | 818 | 819 | def recursive_dataset2bmp(dataset='../data/sm4_bmp'): # from utils.datasets import *; recursive_dataset2bmp() 820 | # Converts dataset to bmp 
(for faster training) 821 | formats = [x.lower() for x in img_formats] + [x.upper() for x in img_formats] 822 | for a, b, files in os.walk(dataset): 823 | for file in tqdm(files, desc=a): 824 | p = a + '/' + file 825 | s = Path(file).suffix 826 | if s == '.txt': # replace text 827 | with open(p, 'r') as f: 828 | lines = f.read() 829 | for f in formats: 830 | lines = lines.replace(f, '.bmp') 831 | with open(p, 'w') as f: 832 | f.write(lines) 833 | elif s in formats: # replace image 834 | cv2.imwrite(p.replace(s, '.bmp'), cv2.imread(p)) 835 | if s != '.bmp': 836 | os.system("rm '%s'" % p) 837 | 838 | 839 | def imagelist2folder(path='data/coco_64img.txt'): # from utils.datasets import *; imagelist2folder() 840 | # Copies all the images in a text file (list of images) into a folder 841 | create_folder(path[:-4]) 842 | with open(path, 'r') as f: 843 | for line in f.read().splitlines(): 844 | os.system('cp "%s" %s' % (line, path[:-4])) 845 | print(line) 846 | 847 | 848 | def create_folder(path='./new_folder'): 849 | # Create folder 850 | if os.path.exists(path): 851 | shutil.rmtree(path) # delete output folder 852 | os.makedirs(path) # make new output folder 853 | -------------------------------------------------------------------------------- /utils/google_utils.py: -------------------------------------------------------------------------------- 1 | # This file contains google utils: https://cloud.google.com/storage/docs/reference/libraries 2 | # pip install --upgrade google-cloud-storage 3 | # from google.cloud import storage 4 | 5 | import os 6 | import time 7 | from pathlib import Path 8 | 9 | 10 | def attempt_download(weights): 11 | # Attempt to download pretrained weights if not found locally 12 | weights = weights.strip() 13 | msg = weights + ' missing, try downloading from https://drive.google.com/drive/folders/1Drs_Aiu7xx6S-ix95f9kNsA6ueKRpN2J' 14 | 15 | r = 1 16 | if len(weights) > 0 and not os.path.isfile(weights): 17 | d = {'yolov3-spp.pt': '1mM67oNw4fZoIOL1c8M3hHmj66d8e-ni_', # yolov3-spp.yaml 18 | 'yolov5s.pt': '1R5T6rIyy3lLwgFXNms8whc-387H0tMQO', # yolov5s.yaml 19 | 'yolov5m.pt': '1vobuEExpWQVpXExsJ2w-Mbf3HJjWkQJr', # yolov5m.yaml 20 | 'yolov5l.pt': '1hrlqD1Wdei7UT4OgT785BEk1JwnSvNEV', # yolov5l.yaml 21 | 'yolov5x.pt': '1mM8aZJlWTxOg7BZJvNUMrTnA2AbeCVzS', # yolov5x.yaml 22 | } 23 | 24 | file = Path(weights).name 25 | if file in d: 26 | r = gdrive_download(id=d[file], name=weights) 27 | 28 | # Error check 29 | if not (r == 0 and os.path.exists(weights) and os.path.getsize(weights) > 1E6): # weights exist and > 1MB 30 | os.system('rm ' + weights) # remove partial downloads 31 | raise Exception(msg) 32 | 33 | 34 | def gdrive_download(id='1HaXkef9z6y5l4vUnCYgdmEAj61c6bfWO', name='coco.zip'): 35 | # https://gist.github.com/tanaikech/f0f2d122e05bf5f971611258c22c110f 36 | # Downloads a file from Google Drive, accepting presented query 37 | # from utils.google_utils import *; gdrive_download() 38 | t = time.time() 39 | 40 | print('Downloading https://drive.google.com/uc?export=download&id=%s as %s... 
' % (id, name), end='') 41 | os.remove(name) if os.path.exists(name) else None # remove existing 42 | os.remove('cookie') if os.path.exists('cookie') else None 43 | 44 | # Attempt file download 45 | os.system("curl -c ./cookie -s -L \"https://drive.google.com/uc?export=download&id=%s\" > /dev/null" % id) 46 | if os.path.exists('cookie'): # large file 47 | s = "curl -Lb ./cookie \"https://drive.google.com/uc?export=download&confirm=`awk '/download/ {print $NF}' ./cookie`&id=%s\" -o %s" % ( 48 | id, name) 49 | else: # small file 50 | s = "curl -s -L -o %s 'https://drive.google.com/uc?export=download&id=%s'" % (name, id) 51 | r = os.system(s) # execute, capture return values 52 | os.remove('cookie') if os.path.exists('cookie') else None 53 | 54 | # Error check 55 | if r != 0: 56 | os.remove(name) if os.path.exists(name) else None # remove partial 57 | print('Download error ') # raise Exception('Download error') 58 | return r 59 | 60 | # Unzip if archive 61 | if name.endswith('.zip'): 62 | print('unzipping... ', end='') 63 | os.system('unzip -q %s' % name) # unzip 64 | os.remove(name) # remove zip to free space 65 | 66 | print('Done (%.1fs)' % (time.time() - t)) 67 | return r 68 | 69 | # def upload_blob(bucket_name, source_file_name, destination_blob_name): 70 | # # Uploads a file to a bucket 71 | # # https://cloud.google.com/storage/docs/uploading-objects#storage-upload-object-python 72 | # 73 | # storage_client = storage.Client() 74 | # bucket = storage_client.get_bucket(bucket_name) 75 | # blob = bucket.blob(destination_blob_name) 76 | # 77 | # blob.upload_from_filename(source_file_name) 78 | # 79 | # print('File {} uploaded to {}.'.format( 80 | # source_file_name, 81 | # destination_blob_name)) 82 | # 83 | # 84 | # def download_blob(bucket_name, source_blob_name, destination_file_name): 85 | # # Uploads a blob from a bucket 86 | # storage_client = storage.Client() 87 | # bucket = storage_client.get_bucket(bucket_name) 88 | # blob = bucket.blob(source_blob_name) 89 | # 90 | # blob.download_to_filename(destination_file_name) 91 | # 92 | # print('Blob {} downloaded to {}.'.format( 93 | # source_blob_name, 94 | # destination_file_name)) 95 | -------------------------------------------------------------------------------- /utils/torch_utils.py: -------------------------------------------------------------------------------- 1 | import math 2 | import os 3 | import time 4 | from copy import deepcopy 5 | 6 | import torch 7 | import torch.backends.cudnn as cudnn 8 | import torch.nn as nn 9 | import torch.nn.functional as F 10 | 11 | 12 | def init_seeds(seed=0): 13 | torch.manual_seed(seed) 14 | 15 | # Speed-reproducibility tradeoff https://pytorch.org/docs/stable/notes/randomness.html 16 | if seed == 0: # slower, more reproducible 17 | cudnn.deterministic = True 18 | cudnn.benchmark = False 19 | else: # faster, less reproducible 20 | cudnn.deterministic = False 21 | cudnn.benchmark = True 22 | 23 | 24 | def select_device(device='', apex=False, batch_size=None): 25 | # device = 'cpu' or '0' or '0,1,2,3' 26 | cpu_request = device.lower() == 'cpu' 27 | if device and not cpu_request: # if device requested other than 'cpu' 28 | os.environ['CUDA_VISIBLE_DEVICES'] = device # set environment variable 29 | assert torch.cuda.is_available(), 'CUDA unavailable, invalid device %s requested' % device # check availablity 30 | 31 | cuda = False if cpu_request else torch.cuda.is_available() 32 | if cuda: 33 | c = 1024 ** 2 # bytes to MB 34 | ng = torch.cuda.device_count() 35 | if ng > 1 and batch_size: # check 
that batch_size is compatible with device_count 36 | assert batch_size % ng == 0, 'batch-size %g not multiple of GPU count %g' % (batch_size, ng) 37 | x = [torch.cuda.get_device_properties(i) for i in range(ng)] 38 | s = 'Using CUDA ' + ('Apex ' if apex else '') # apex for mixed precision https://github.com/NVIDIA/apex 39 | for i in range(0, ng): 40 | if i == 1: 41 | s = ' ' * len(s) 42 | print("%sdevice%g _CudaDeviceProperties(name='%s', total_memory=%dMB)" % 43 | (s, i, x[i].name, x[i].total_memory / c)) 44 | else: 45 | print('Using CPU') 46 | 47 | print('') # skip a line 48 | return torch.device('cuda:0' if cuda else 'cpu') 49 | 50 | 51 | def time_synchronized(): 52 | torch.cuda.synchronize() if torch.cuda.is_available() else None 53 | return time.time() 54 | 55 | 56 | def initialize_weights(model): 57 | for m in model.modules(): 58 | t = type(m) 59 | if t is nn.Conv2d: 60 | pass # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 61 | elif t is nn.BatchNorm2d: 62 | m.eps = 1e-4 63 | m.momentum = 0.03 64 | elif t in [nn.LeakyReLU, nn.ReLU, nn.ReLU6]: 65 | m.inplace = True 66 | 67 | 68 | def find_modules(model, mclass=nn.Conv2d): 69 | # finds layer indices matching module class 'mclass' 70 | return [i for i, m in enumerate(model.module_list) if isinstance(m, mclass)] 71 | 72 | 73 | def fuse_conv_and_bn(conv, bn): 74 | # https://tehnokv.com/posts/fusing-batchnorm-and-conv/ 75 | with torch.no_grad(): 76 | # init 77 | fusedconv = torch.nn.Conv2d(conv.in_channels, 78 | conv.out_channels, 79 | kernel_size=conv.kernel_size, 80 | stride=conv.stride, 81 | padding=conv.padding, 82 | bias=True) 83 | 84 | # prepare filters 85 | w_conv = conv.weight.clone().view(conv.out_channels, -1) 86 | w_bn = torch.diag(bn.weight.div(torch.sqrt(bn.eps + bn.running_var))) 87 | fusedconv.weight.copy_(torch.mm(w_bn, w_conv).view(fusedconv.weight.size())) 88 | 89 | # prepare spatial bias 90 | if conv.bias is not None: 91 | b_conv = conv.bias 92 | else: 93 | b_conv = torch.zeros(conv.weight.size(0), device=conv.weight.device) 94 | b_bn = bn.bias - bn.weight.mul(bn.running_mean).div(torch.sqrt(bn.running_var + bn.eps)) 95 | fusedconv.bias.copy_(torch.mm(w_bn, b_conv.reshape(-1, 1)).reshape(-1) + b_bn) 96 | 97 | return fusedconv 98 | 99 | 100 | def model_info(model, verbose=False): 101 | # Plots a line-by-line description of a PyTorch model 102 | n_p = sum(x.numel() for x in model.parameters()) # number parameters 103 | n_g = sum(x.numel() for x in model.parameters() if x.requires_grad) # number gradients 104 | if verbose: 105 | print('%5s %40s %9s %12s %20s %10s %10s' % ('layer', 'name', 'gradient', 'parameters', 'shape', 'mu', 'sigma')) 106 | for i, (name, p) in enumerate(model.named_parameters()): 107 | name = name.replace('module_list.', '') 108 | print('%5g %40s %9s %12g %20s %10.3g %10.3g' % 109 | (i, name, p.requires_grad, p.numel(), list(p.shape), p.mean(), p.std())) 110 | 111 | try: # FLOPS 112 | from thop import profile 113 | macs, _ = profile(model, inputs=(torch.zeros(1, 3, 480, 640),), verbose=False) 114 | fs = ', %.1f GFLOPS' % (macs / 1E9 * 2) 115 | except: 116 | fs = '' 117 | 118 | print('Model Summary: %g layers, %g parameters, %g gradients%s' % (len(list(model.parameters())), n_p, n_g, fs)) 119 | 120 | 121 | def load_classifier(name='resnet101', n=2): 122 | # Loads a pretrained model reshaped to n-class output 123 | import pretrainedmodels # https://github.com/Cadene/pretrained-models.pytorch#torchvision 124 | model = pretrainedmodels.__dict__[name](num_classes=1000, 
pretrained='imagenet') 125 | 126 | # Display model properties 127 | for x in ['model.input_size', 'model.input_space', 'model.input_range', 'model.mean', 'model.std']: 128 | print(x + ' =', eval(x)) 129 | 130 | # Reshape output to n classes 131 | filters = model.last_linear.weight.shape[1] 132 | model.last_linear.bias = torch.nn.Parameter(torch.zeros(n)) 133 | model.last_linear.weight = torch.nn.Parameter(torch.zeros(n, filters)) 134 | model.last_linear.out_features = n 135 | return model 136 | 137 | 138 | def scale_img(img, ratio=1.0, same_shape=False): # img(16,3,256,416), r=ratio 139 | # scales img(bs,3,y,x) by ratio 140 | h, w = img.shape[2:] 141 | s = (int(h * ratio), int(w * ratio)) # new size 142 | img = F.interpolate(img, size=s, mode='bilinear', align_corners=False) # resize 143 | if not same_shape: # pad/crop img 144 | gs = 32 # (pixels) grid size 145 | h, w = [math.ceil(x * ratio / gs) * gs for x in (h, w)] 146 | return F.pad(img, [0, w - s[1], 0, h - s[0]], value=0.447) # value = imagenet mean 147 | 148 | 149 | class ModelEMA: 150 | """ Model Exponential Moving Average from https://github.com/rwightman/pytorch-image-models 151 | Keep a moving average of everything in the model state_dict (parameters and buffers). 152 | This is intended to allow functionality like 153 | https://www.tensorflow.org/api_docs/python/tf/train/ExponentialMovingAverage 154 | A smoothed version of the weights is necessary for some training schemes to perform well. 155 | E.g. Google's hyper-params for training MNASNet, MobileNet-V3, EfficientNet, etc that use 156 | RMSprop with a short 2.4-3 epoch decay period and slow LR decay rate of .96-.99 requires EMA 157 | smoothing of weights to match results. Pay attention to the decay constant you are using 158 | relative to your update count per epoch. 159 | To keep EMA from using GPU resources, set device='cpu'. This will save a bit of memory but 160 | disable validation of the EMA weights. Validation will have to be done manually in a separate 161 | process, or after the training stops converging. 162 | This class is sensitive where it is initialized in the sequence of model init, 163 | GPU assignment and distributed training wrappers. 164 | I've tested with the sequence in my own train.py for torch.DataParallel, apex.DDP, and single-GPU. 165 | """ 166 | 167 | def __init__(self, model, decay=0.9999, device=''): 168 | # make a copy of the model for accumulating moving average of weights 169 | self.ema = deepcopy(model) 170 | self.ema.eval() 171 | self.updates = 0 # number of EMA updates 172 | self.decay = lambda x: decay * (1 - math.exp(-x / 2000)) # decay exponential ramp (to help early epochs) 173 | self.device = device # perform ema on different device from model if set 174 | if device: 175 | self.ema.to(device=device) 176 | for p in self.ema.parameters(): 177 | p.requires_grad_(False) 178 | 179 | def update(self, model): 180 | self.updates += 1 181 | d = self.decay(self.updates) 182 | with torch.no_grad(): 183 | if type(model) in (nn.parallel.DataParallel, nn.parallel.DistributedDataParallel): 184 | msd, esd = model.module.state_dict(), self.ema.module.state_dict() 185 | else: 186 | msd, esd = model.state_dict(), self.ema.state_dict() 187 | 188 | for k, v in esd.items(): 189 | if v.dtype.is_floating_point: 190 | v *= d 191 | v += (1. 
- d) * msd[k].detach() 192 | 193 | def update_attr(self, model): 194 | # Assign attributes (which may change during training) 195 | for k in model.__dict__.keys(): 196 | if not k.startswith('_'): 197 | setattr(self.ema, k, getattr(model, k)) 198 | -------------------------------------------------------------------------------- /utils/utils.py: -------------------------------------------------------------------------------- 1 | import glob 2 | import math 3 | import os 4 | import random 5 | import shutil 6 | import subprocess 7 | import time 8 | from copy import copy 9 | from pathlib import Path 10 | from sys import platform 11 | 12 | import cv2 13 | import matplotlib 14 | import matplotlib.pyplot as plt 15 | import numpy as np 16 | import torch 17 | import torch.nn as nn 18 | import torchvision 19 | from scipy.signal import butter, filtfilt 20 | from tqdm import tqdm 21 | 22 | from PIL import Image,ImageDraw,ImageFont 23 | 24 | from . import torch_utils, google_utils #  torch_utils, google_utils 25 | 26 | # Set printoptions 27 | torch.set_printoptions(linewidth=320, precision=5, profile='long') 28 | np.set_printoptions(linewidth=320, formatter={'float_kind': '{:11.5g}'.format}) # format short g, %precision=5 29 | matplotlib.rc('font', **{'size': 11}) 30 | 31 | # Prevent OpenCV from multithreading (to use PyTorch DataLoader) 32 | cv2.setNumThreads(0) 33 | 34 | 35 | def init_seeds(seed=0): 36 | random.seed(seed) 37 | np.random.seed(seed) 38 | torch_utils.init_seeds(seed=seed) 39 | 40 | 41 | def check_git_status(): 42 | if platform in ['linux', 'darwin']: 43 | # Suggest 'git pull' if repo is out of date 44 | s = subprocess.check_output('if [ -d .git ]; then git fetch && git status -uno; fi', shell=True).decode('utf-8') 45 | if 'Your branch is behind' in s: 46 | print(s[s.find('Your branch is behind'):s.find('\n\n')] + '\n') 47 | 48 | 49 | def make_divisible(x, divisor): 50 | # Returns x evenly divisble by divisor 51 | return math.ceil(x / divisor) * divisor 52 | 53 | 54 | def labels_to_class_weights(labels, nc=80): 55 | # Get class weights (inverse frequency) from training labels 56 | if labels[0] is None: # no labels loaded 57 | return torch.Tensor() 58 | 59 | labels = np.concatenate(labels, 0) # labels.shape = (866643, 5) for COCO 60 | classes = labels[:, 0].astype(np.int) # labels = [class xywh] 61 | weights = np.bincount(classes, minlength=nc) # occurences per class 62 | 63 | # Prepend gridpoint count (for uCE trianing) 64 | # gpi = ((320 / 32 * np.array([1, 2, 4])) ** 2 * 3).sum() # gridpoints per image 65 | # weights = np.hstack([gpi * len(labels) - weights.sum() * 9, weights * 9]) ** 0.5 # prepend gridpoints to start 66 | 67 | weights[weights == 0] = 1 # replace empty bins with 1 68 | weights = 1 / weights # number of targets per class 69 | weights /= weights.sum() # normalize 70 | return torch.from_numpy(weights) 71 | 72 | 73 | def labels_to_image_weights(labels, nc=80, class_weights=np.ones(80)): 74 | # Produces image weights based on class mAPs 75 | n = len(labels) 76 | class_counts = np.array([np.bincount(labels[i][:, 0].astype(np.int), minlength=nc) for i in range(n)]) 77 | image_weights = (class_weights.reshape(1, nc) * class_counts).sum(1) 78 | # index = random.choices(range(n), weights=image_weights, k=1) # weight image sample 79 | return image_weights 80 | 81 | 82 | def coco80_to_coco91_class(): # converts 80-index (val2014) to 91-index (paper) 83 | # https://tech.amikelive.com/node-718/what-object-categories-labels-are-in-coco-dataset/ 84 | # a = np.loadtxt('data/coco.names', 
dtype='str', delimiter='\n') 85 | # b = np.loadtxt('data/coco_paper.names', dtype='str', delimiter='\n') 86 | # x1 = [list(a[i] == b).index(True) + 1 for i in range(80)] # darknet to coco 87 | # x2 = [list(b[i] == a).index(True) if any(b[i] == a) else None for i in range(91)] # coco to darknet 88 | x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 28, 31, 32, 33, 34, 89 | 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 90 | 64, 65, 67, 70, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 84, 85, 86, 87, 88, 89, 90] 91 | return x 92 | 93 | 94 | def xyxy2xywh(x): 95 | # Convert nx4 boxes from [x1, y1, x2, y2] to [x, y, w, h] where xy1=top-left, xy2=bottom-right 96 | y = torch.zeros_like(x) if isinstance(x, torch.Tensor) else np.zeros_like(x) 97 | y[:, 0] = (x[:, 0] + x[:, 2]) / 2 # x center 98 | y[:, 1] = (x[:, 1] + x[:, 3]) / 2 # y center 99 | y[:, 2] = x[:, 2] - x[:, 0] # width 100 | y[:, 3] = x[:, 3] - x[:, 1] # height 101 | return y 102 | 103 | 104 | def xywh2xyxy(x): 105 | # Convert nx4 boxes from [x, y, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right 106 | y = torch.zeros_like(x) if isinstance(x, torch.Tensor) else np.zeros_like(x) 107 | y[:, 0] = x[:, 0] - x[:, 2] / 2 # top left x 108 | y[:, 1] = x[:, 1] - x[:, 3] / 2 # top left y 109 | y[:, 2] = x[:, 0] + x[:, 2] / 2 # bottom right x 110 | y[:, 3] = x[:, 1] + x[:, 3] / 2 # bottom right y 111 | return y 112 | 113 | 114 | def scale_coords(img1_shape, coords, img0_shape, ratio_pad=None): 115 | # Rescale coords (xyxy) from img1_shape to img0_shape 116 | if ratio_pad is None: # calculate from img0_shape 117 | gain = max(img1_shape) / max(img0_shape) # gain = old / new 118 | pad = (img1_shape[1] - img0_shape[1] * gain) / 2, (img1_shape[0] - img0_shape[0] * gain) / 2 # wh padding 119 | else: 120 | gain = ratio_pad[0][0] 121 | pad = ratio_pad[1] 122 | 123 | coords[:, [0, 2]] -= pad[0] # x padding 124 | coords[:, [1, 3]] -= pad[1] # y padding 125 | coords[:, :4] /= gain 126 | clip_coords(coords, img0_shape) 127 | return coords 128 | 129 | 130 | def clip_coords(boxes, img_shape): 131 | # Clip bounding xyxy bounding boxes to image shape (height, width) 132 | boxes[:, 0].clamp_(0, img_shape[1]) # x1 133 | boxes[:, 1].clamp_(0, img_shape[0]) # y1 134 | boxes[:, 2].clamp_(0, img_shape[1]) # x2 135 | boxes[:, 3].clamp_(0, img_shape[0]) # y2 136 | 137 | 138 | def ap_per_class(tp, conf, pred_cls, target_cls): 139 | """ Compute the average precision, given the recall and precision curves. 140 | Source: https://github.com/rafaelpadilla/Object-Detection-Metrics. 141 | # Arguments 142 | tp: True positives (nparray, nx1 or nx10). 143 | conf: Objectness value from 0-1 (nparray). 144 | pred_cls: Predicted object classes (nparray). 145 | target_cls: True object classes (nparray). 146 | # Returns 147 | The average precision as computed in py-faster-rcnn. 148 | """ 149 | 150 | # Sort by objectness 151 | i = np.argsort(-conf) 152 | tp, conf, pred_cls = tp[i], conf[i], pred_cls[i] 153 | 154 | # Find unique classes 155 | unique_classes = np.unique(target_cls) 156 | 157 | # Create Precision-Recall curve and compute AP for each class 158 | pr_score = 0.1 # score to evaluate P and R https://github.com/ultralytics/yolov3/issues/898 159 | s = [unique_classes.shape[0], tp.shape[1]] # number class, number iou thresholds (i.e. 
10 for mAP0.5...0.95) 160 | ap, p, r = np.zeros(s), np.zeros(s), np.zeros(s) 161 | for ci, c in enumerate(unique_classes): 162 | i = pred_cls == c 163 | n_gt = (target_cls == c).sum() # Number of ground truth objects 164 | n_p = i.sum() # Number of predicted objects 165 | 166 | if n_p == 0 or n_gt == 0: 167 | continue 168 | else: 169 | # Accumulate FPs and TPs 170 | fpc = (1 - tp[i]).cumsum(0) 171 | tpc = tp[i].cumsum(0) 172 | 173 | # Recall 174 | recall = tpc / (n_gt + 1e-16) # recall curve 175 | r[ci] = np.interp(-pr_score, -conf[i], recall[:, 0]) # r at pr_score, negative x, xp because xp decreases 176 | 177 | # Precision 178 | precision = tpc / (tpc + fpc) # precision curve 179 | p[ci] = np.interp(-pr_score, -conf[i], precision[:, 0]) # p at pr_score 180 | 181 | # AP from recall-precision curve 182 | for j in range(tp.shape[1]): 183 | ap[ci, j] = compute_ap(recall[:, j], precision[:, j]) 184 | 185 | # Plot 186 | # fig, ax = plt.subplots(1, 1, figsize=(5, 5)) 187 | # ax.plot(recall, precision) 188 | # ax.set_xlabel('Recall') 189 | # ax.set_ylabel('Precision') 190 | # ax.set_xlim(0, 1.01) 191 | # ax.set_ylim(0, 1.01) 192 | # fig.tight_layout() 193 | # fig.savefig('PR_curve.png', dpi=300) 194 | 195 | # Compute F1 score (harmonic mean of precision and recall) 196 | f1 = 2 * p * r / (p + r + 1e-16) 197 | 198 | return p, r, ap, f1, unique_classes.astype('int32') 199 | 200 | 201 | def compute_ap(recall, precision): 202 | """ Compute the average precision, given the recall and precision curves. 203 | Source: https://github.com/rbgirshick/py-faster-rcnn. 204 | # Arguments 205 | recall: The recall curve (list). 206 | precision: The precision curve (list). 207 | # Returns 208 | The average precision as computed in py-faster-rcnn. 209 | """ 210 | 211 | # Append sentinel values to beginning and end 212 | mrec = np.concatenate(([0.], recall, [min(recall[-1] + 1E-3, 1.)])) 213 | mpre = np.concatenate(([0.], precision, [0.])) 214 | 215 | # Compute the precision envelope 216 | mpre = np.flip(np.maximum.accumulate(np.flip(mpre))) 217 | 218 | # Integrate area under curve 219 | method = 'interp' # methods: 'continuous', 'interp' 220 | if method == 'interp': 221 | x = np.linspace(0, 1, 101) # 101-point interp (COCO) 222 | ap = np.trapz(np.interp(x, mrec, mpre), x) # integrate 223 | else: # 'continuous' 224 | i = np.where(mrec[1:] != mrec[:-1])[0] # points where x axis (recall) changes 225 | ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]) # area under curve 226 | 227 | return ap 228 | 229 | 230 | def bbox_iou(box1, box2, x1y1x2y2=True, GIoU=False, DIoU=False, CIoU=False): 231 | # Returns the IoU of box1 to box2. 
box1 is 4, box2 is nx4 232 | box2 = box2.t() 233 | 234 | # Get the coordinates of bounding boxes 235 | if x1y1x2y2: # x1, y1, x2, y2 = box1 236 | b1_x1, b1_y1, b1_x2, b1_y2 = box1[0], box1[1], box1[2], box1[3] 237 | b2_x1, b2_y1, b2_x2, b2_y2 = box2[0], box2[1], box2[2], box2[3] 238 | else: # transform from xywh to xyxy 239 | b1_x1, b1_x2 = box1[0] - box1[2] / 2, box1[0] + box1[2] / 2 240 | b1_y1, b1_y2 = box1[1] - box1[3] / 2, box1[1] + box1[3] / 2 241 | b2_x1, b2_x2 = box2[0] - box2[2] / 2, box2[0] + box2[2] / 2 242 | b2_y1, b2_y2 = box2[1] - box2[3] / 2, box2[1] + box2[3] / 2 243 | 244 | # Intersection area 245 | inter = (torch.min(b1_x2, b2_x2) - torch.max(b1_x1, b2_x1)).clamp(0) * \ 246 | (torch.min(b1_y2, b2_y2) - torch.max(b1_y1, b2_y1)).clamp(0) 247 | 248 | # Union Area 249 | w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1 250 | w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1 251 | union = (w1 * h1 + 1e-16) + w2 * h2 - inter 252 | 253 | iou = inter / union # iou 254 | if GIoU or DIoU or CIoU: 255 | cw = torch.max(b1_x2, b2_x2) - torch.min(b1_x1, b2_x1) # convex (smallest enclosing box) width 256 | ch = torch.max(b1_y2, b2_y2) - torch.min(b1_y1, b2_y1) # convex height 257 | if GIoU: # Generalized IoU https://arxiv.org/pdf/1902.09630.pdf 258 | c_area = cw * ch + 1e-16 # convex area 259 | return iou - (c_area - union) / c_area # GIoU 260 | if DIoU or CIoU: # Distance or Complete IoU https://arxiv.org/abs/1911.08287v1 261 | # convex diagonal squared 262 | c2 = cw ** 2 + ch ** 2 + 1e-16 263 | # centerpoint distance squared 264 | rho2 = ((b2_x1 + b2_x2) - (b1_x1 + b1_x2)) ** 2 / 4 + ((b2_y1 + b2_y2) - (b1_y1 + b1_y2)) ** 2 / 4 265 | if DIoU: 266 | return iou - rho2 / c2 # DIoU 267 | elif CIoU: # https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/box/box_utils.py#L47 268 | v = (4 / math.pi ** 2) * torch.pow(torch.atan(w2 / h2) - torch.atan(w1 / h1), 2) 269 | with torch.no_grad(): 270 | alpha = v / (1 - iou + v) 271 | return iou - (rho2 / c2 + v * alpha) # CIoU 272 | 273 | return iou 274 | 275 | 276 | def box_iou(box1, box2): 277 | # https://github.com/pytorch/vision/blob/master/torchvision/ops/boxes.py 278 | """ 279 | Return intersection-over-union (Jaccard index) of boxes. 280 | Both sets of boxes are expected to be in (x1, y1, x2, y2) format. 281 | Arguments: 282 | box1 (Tensor[N, 4]) 283 | box2 (Tensor[M, 4]) 284 | Returns: 285 | iou (Tensor[N, M]): the NxM matrix containing the pairwise 286 | IoU values for every element in boxes1 and boxes2 287 | """ 288 | 289 | def box_area(box): 290 | # box = 4xn 291 | return (box[2] - box[0]) * (box[3] - box[1]) 292 | 293 | area1 = box_area(box1.t()) 294 | area2 = box_area(box2.t()) 295 | 296 | # inter(N,M) = (rb(N,M,2) - lt(N,M,2)).clamp(0).prod(2) 297 | inter = (torch.min(box1[:, None, 2:], box2[:, 2:]) - torch.max(box1[:, None, :2], box2[:, :2])).clamp(0).prod(2) 298 | return inter / (area1[:, None] + area2 - inter) # iou = inter / (area1 + area2 - inter) 299 | 300 | 301 | def wh_iou(wh1, wh2): 302 | # Returns the nxm IoU matrix. wh1 is nx2, wh2 is mx2 303 | wh1 = wh1[:, None] # [N,1,2] 304 | wh2 = wh2[None] # [1,M,2] 305 | inter = torch.min(wh1, wh2).prod(2) # [N,M] 306 | return inter / (wh1.prod(2) + wh2.prod(2) - inter) # iou = inter / (area1 + area2 - inter) 307 | 308 | 309 | class FocalLoss(nn.Module): 310 | # Wraps focal loss around existing loss_fcn(), i.e. 
criteria = FocalLoss(nn.BCEWithLogitsLoss(), gamma=1.5) 311 | def __init__(self, loss_fcn, gamma=1.5, alpha=0.25): 312 | super(FocalLoss, self).__init__() 313 | self.loss_fcn = loss_fcn # must be nn.BCEWithLogitsLoss() 314 | self.gamma = gamma 315 | self.alpha = alpha 316 | self.reduction = loss_fcn.reduction 317 | self.loss_fcn.reduction = 'none' # required to apply FL to each element 318 | 319 | def forward(self, pred, true): 320 | loss = self.loss_fcn(pred, true) 321 | # p_t = torch.exp(-loss) 322 | # loss *= self.alpha * (1.000001 - p_t) ** self.gamma # non-zero power for gradient stability 323 | 324 | # TF implementation https://github.com/tensorflow/addons/blob/v0.7.1/tensorflow_addons/losses/focal_loss.py 325 | pred_prob = torch.sigmoid(pred) # prob from logits 326 | p_t = true * pred_prob + (1 - true) * (1 - pred_prob) 327 | alpha_factor = true * self.alpha + (1 - true) * (1 - self.alpha) 328 | modulating_factor = (1.0 - p_t) ** self.gamma 329 | loss *= alpha_factor * modulating_factor 330 | 331 | if self.reduction == 'mean': 332 | return loss.mean() 333 | elif self.reduction == 'sum': 334 | return loss.sum() 335 | else: # 'none' 336 | return loss 337 | 338 | 339 | def smooth_BCE(eps=0.1): # https://github.com/ultralytics/yolov3/issues/238#issuecomment-598028441 340 | # return positive, negative label smoothing BCE targets 341 | return 1.0 - 0.5 * eps, 0.5 * eps 342 | 343 | 344 | class BCEBlurWithLogitsLoss(nn.Module): 345 | # BCEwithLogitLoss() with reduced missing label effects. 346 | def __init__(self, alpha=0.05): 347 | super(BCEBlurWithLogitsLoss, self).__init__() 348 | self.loss_fcn = nn.BCEWithLogitsLoss(reduction='none') # must be nn.BCEWithLogitsLoss() 349 | self.alpha = alpha 350 | 351 | def forward(self, pred, true): 352 | loss = self.loss_fcn(pred, true) 353 | pred = torch.sigmoid(pred) # prob from logits 354 | dx = pred - true # reduce only missing label effects 355 | # dx = (pred - true).abs() # reduce missing label and false label effects 356 | alpha_factor = 1 - torch.exp((dx - 1) / (self.alpha + 1e-4)) 357 | loss *= alpha_factor 358 | return loss.mean() 359 | 360 | 361 | def compute_loss(p, targets, model): # predictions, targets, model 362 | ft = torch.cuda.FloatTensor if p[0].is_cuda else torch.Tensor 363 | lcls, lbox, lobj = ft([0]), ft([0]), ft([0]) 364 | tcls, tbox, indices, anchors = build_targets(p, targets, model) # targets 365 | h = model.hyp # hyperparameters 366 | red = 'mean' # Loss reduction (sum or mean) 367 | 368 | # Define criteria 369 | BCEcls = nn.BCEWithLogitsLoss(pos_weight=ft([h['cls_pw']]), reduction=red) 370 | BCEobj = nn.BCEWithLogitsLoss(pos_weight=ft([h['obj_pw']]), reduction=red) 371 | 372 | # class label smoothing https://arxiv.org/pdf/1902.04103.pdf eqn 3 373 | cp, cn = smooth_BCE(eps=0.0) 374 | 375 | # focal loss 376 | g = h['fl_gamma'] # focal loss gamma 377 | if g > 0: 378 | BCEcls, BCEobj = FocalLoss(BCEcls, g), FocalLoss(BCEobj, g) 379 | 380 | # per output 381 | nt = 0 # targets 382 | for i, pi in enumerate(p): # layer index, layer predictions 383 | b, a, gj, gi = indices[i] # image, anchor, gridy, gridx 384 | tobj = torch.zeros_like(pi[..., 0]) # target obj 385 | 386 | nb = b.shape[0] # number of targets 387 | if nb: 388 | nt += nb # cumulative targets 389 | ps = pi[b, a, gj, gi] # prediction subset corresponding to targets 390 | 391 | # GIoU 392 | pxy = ps[:, :2].sigmoid() * 2. 
- 0.5 393 | pwh = (ps[:, 2:4].sigmoid() * 2) ** 2 * anchors[i] 394 | pbox = torch.cat((pxy, pwh), 1) # predicted box 395 | giou = bbox_iou(pbox.t(), tbox[i], x1y1x2y2=False, GIoU=True) # giou(prediction, target) 396 | lbox += (1.0 - giou).sum() if red == 'sum' else (1.0 - giou).mean() # giou loss 397 | 398 | # Obj 399 | tobj[b, a, gj, gi] = (1.0 - model.gr) + model.gr * giou.detach().clamp(0).type(tobj.dtype) # giou ratio 400 | 401 | # Class 402 | if model.nc > 1: # cls loss (only if multiple classes) 403 | t = torch.full_like(ps[:, 5:], cn) # targets 404 | t[range(nb), tcls[i]] = cp 405 | lcls += BCEcls(ps[:, 5:], t) # BCE 406 | 407 | # Append targets to text file 408 | # with open('targets.txt', 'a') as file: 409 | # [file.write('%11.5g ' * 4 % tuple(x) + '\n') for x in torch.cat((txy[i], twh[i]), 1)] 410 | 411 | lobj += BCEobj(pi[..., 4], tobj) # obj loss 412 | 413 | lbox *= h['giou'] 414 | lobj *= h['obj'] 415 | lcls *= h['cls'] 416 | bs = tobj.shape[0] # batch size 417 | if red == 'sum': 418 | g = 3.0 # loss gain 419 | lobj *= g / bs 420 | if nt: 421 | lcls *= g / nt / model.nc 422 | lbox *= g / nt 423 | 424 | loss = lbox + lobj + lcls 425 | return loss * bs, torch.cat((lbox, lobj, lcls, loss)).detach() 426 | 427 | 428 | def build_targets(p, targets, model): 429 | # Build targets for compute_loss(), input targets(image,class,x,y,w,h) 430 | det = model.module.model[-1] if type(model) in (nn.parallel.DataParallel, nn.parallel.DistributedDataParallel) \ 431 | else model.model[-1] # Detect() module 432 | na, nt = det.na, targets.shape[0] # number of anchors, targets 433 | tcls, tbox, indices, anch = [], [], [], [] 434 | gain = torch.ones(6, device=targets.device) # normalized to gridspace gain 435 | off = torch.tensor([[1, 0], [0, 1], [-1, 0], [0, -1]], device=targets.device).float() # overlap offsets 436 | at = torch.arange(na).view(na, 1).repeat(1, nt) # anchor tensor, same as .repeat_interleave(nt) 437 | 438 | style = 'rect4' 439 | for i in range(det.nl): 440 | anchors = det.anchors[i] 441 | gain[2:] = torch.tensor(p[i].shape)[[3, 2, 3, 2]] # xyxy gain 442 | 443 | # Match targets to anchors 444 | a, t, offsets = [], targets * gain, 0 445 | if nt: 446 | r = t[None, :, 4:6] / anchors[:, None] # wh ratio 447 | j = torch.max(r, 1. / r).max(2)[0] < model.hyp['anchor_t'] # compare 448 | # j = wh_iou(anchors, t[:, 4:6]) > model.hyp['iou_t'] # iou(3,n) = wh_iou(anchors(3,2), gwh(n,2)) 449 | a, t = at[j], t.repeat(na, 1, 1)[j] # filter 450 | 451 | # overlaps 452 | gxy = t[:, 2:4] # grid xy 453 | z = torch.zeros_like(gxy) 454 | if style == 'rect2': 455 | g = 0.2 # offset 456 | j, k = ((gxy % 1. < g) & (gxy > 1.)).T 457 | a, t = torch.cat((a, a[j], a[k]), 0), torch.cat((t, t[j], t[k]), 0) 458 | offsets = torch.cat((z, z[j] + off[0], z[k] + off[1]), 0) * g 459 | 460 | elif style == 'rect4': 461 | g = 0.5 # offset 462 | j, k = ((gxy % 1. < g) & (gxy > 1.)).T 463 | l, m = ((gxy % 1. 
> (1 - g)) & (gxy < (gain[[2, 3]] - 1.))).T 464 | a, t = torch.cat((a, a[j], a[k], a[l], a[m]), 0), torch.cat((t, t[j], t[k], t[l], t[m]), 0) 465 | offsets = torch.cat((z, z[j] + off[0], z[k] + off[1], z[l] + off[2], z[m] + off[3]), 0) * g 466 | 467 | # Define 468 | b, c = t[:, :2].long().T # image, class 469 | gxy = t[:, 2:4] # grid xy 470 | gwh = t[:, 4:6] # grid wh 471 | gij = (gxy - offsets).long() 472 | gi, gj = gij.T # grid xy indices 473 | 474 | # Append 475 | indices.append((b, a, gj, gi)) # image, anchor, grid indices 476 | tbox.append(torch.cat((gxy - gij, gwh), 1)) # box 477 | anch.append(anchors[a]) # anchors 478 | tcls.append(c) # class 479 | 480 | return tcls, tbox, indices, anch 481 | 482 | 483 | def non_max_suppression(prediction, conf_thres=0.1, iou_thres=0.6, fast=False, classes=None, agnostic=False): 484 | """ 485 | Performs Non-Maximum Suppression on inference results 486 | Returns detections with shape: 487 | nx6 (x1, y1, x2, y2, conf, cls) 488 | """ 489 | nc = prediction[0].shape[1] - 5 # number of classes 490 | xc = prediction[..., 4] > conf_thres # candidates 491 | 492 | # Settings 493 | min_wh, max_wh = 2, 4096 # (pixels) minimum and maximum box width and height 494 | max_det = 300 # maximum number of detections per image 495 | time_limit = 10.0 # seconds to quit after 496 | redundant = True # require redundant detections 497 | fast |= conf_thres > 0.001 # fast mode 498 | if fast: 499 | merge = False 500 | multi_label = False 501 | else: 502 | merge = True # merge for best mAP (adds 0.5ms/img) 503 | multi_label = nc > 1 # multiple labels per box (adds 0.5ms/img) 504 | 505 | t = time.time() 506 | output = [None] * prediction.shape[0] 507 | for xi, x in enumerate(prediction): # image index, image inference 508 | # Apply constraints 509 | # x[((x[..., 2:4] < min_wh) | (x[..., 2:4] > max_wh)).any(1), 4] = 0 # width-height 510 | x = x[xc[xi]] # confidence 511 | 512 | # If none remain process next image 513 | if not x.shape[0]: 514 | continue 515 | 516 | # Compute conf 517 | x[:, 5:] *= x[:, 4:5] # conf = obj_conf * cls_conf 518 | 519 | # Box (center x, center y, width, height) to (x1, y1, x2, y2) 520 | box = xywh2xyxy(x[:, :4]) 521 | 522 | # Detections matrix nx6 (xyxy, conf, cls) 523 | if multi_label: 524 | i, j = (x[:, 5:] > conf_thres).nonzero().t() 525 | x = torch.cat((box[i], x[i, j + 5, None], j[:, None].float()), 1) 526 | else: # best class only 527 | conf, j = x[:, 5:].max(1, keepdim=True) 528 | x = torch.cat((box, conf, j.float()), 1)[conf.view(-1) > conf_thres] 529 | 530 | # Filter by class 531 | if classes: 532 | x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)] 533 | 534 | # Apply finite constraint 535 | # if not torch.isfinite(x).all(): 536 | # x = x[torch.isfinite(x).all(1)] 537 | 538 | # If none remain process next image 539 | n = x.shape[0] # number of boxes 540 | if not n: 541 | continue 542 | 543 | # Sort by confidence 544 | # x = x[x[:, 4].argsort(descending=True)] 545 | 546 | # Batched NMS 547 | c = x[:, 5:6] * (0 if agnostic else max_wh) # classes 548 | boxes, scores = x[:, :4] + c, x[:, 4] # boxes (offset by class), scores 549 | i = torchvision.ops.boxes.nms(boxes, scores, iou_thres) 550 | if i.shape[0] > max_det: # limit detections 551 | i = i[:max_det] 552 | if merge and (1 < n < 3E3): # Merge NMS (boxes merged using weighted mean) 553 | try: # update boxes as boxes(i,4) = weights(i,n) * boxes(n,4) 554 | iou = box_iou(boxes[i], boxes) > iou_thres # iou matrix 555 | weights = iou * scores[None] # box weights 556 | x[i, :4] = 
torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True) # merged boxes 557 | if redundant: 558 | i = i[iou.sum(1) > 1] # require redundancy 559 | except: # possible CUDA error https://github.com/ultralytics/yolov3/issues/1139 560 | print(x, i, x.shape, i.shape) 561 | pass 562 | 563 | output[xi] = x[i] 564 | if (time.time() - t) > time_limit: 565 | break # time limit exceeded 566 | 567 | return output 568 | 569 | 570 | def strip_optimizer(f='weights/best.pt'): # from utils.utils import *; strip_optimizer() 571 | # Strip optimizer from *.pt files for lighter files (reduced by 1/2 size) 572 | x = torch.load(f, map_location=torch.device('cpu')) 573 | x['optimizer'] = None 574 | torch.save(x, f) 575 | print('Optimizer stripped from %s' % f) 576 | 577 | 578 | def create_backbone(f='weights/best.pt', s='weights/backbone.pt'): # from utils.utils import *; create_backbone() 579 | # create backbone 's' from 'f' 580 | device = torch.device('cpu') 581 | x = torch.load(f, map_location=device) 582 | torch.save(x, s) # update model if SourceChangeWarning 583 | x = torch.load(s, map_location=device) 584 | 585 | x['optimizer'] = None 586 | x['training_results'] = None 587 | x['epoch'] = -1 588 | for p in x['model'].parameters(): 589 | p.requires_grad = True 590 | torch.save(x, s) 591 | print('%s modified for backbone use and saved as %s' % (f, s)) 592 | 593 | 594 | def coco_class_count(path='../coco/labels/train2014/'): 595 | # Histogram of occurrences per class 596 | nc = 80 # number classes 597 | x = np.zeros(nc, dtype='int32') 598 | files = sorted(glob.glob('%s/*.*' % path)) 599 | for i, file in enumerate(files): 600 | labels = np.loadtxt(file, dtype=np.float32).reshape(-1, 5) 601 | x += np.bincount(labels[:, 0].astype('int32'), minlength=nc) 602 | print(i, len(files)) 603 | 604 | 605 | def coco_only_people(path='../coco/labels/train2017/'): # from utils.utils import *; coco_only_people() 606 | # Find images with only people 607 | files = sorted(glob.glob('%s/*.*' % path)) 608 | for i, file in enumerate(files): 609 | labels = np.loadtxt(file, dtype=np.float32).reshape(-1, 5) 610 | if all(labels[:, 0] == 0): 611 | print(labels.shape[0], file) 612 | 613 | 614 | def crop_images_random(path='../images/', scale=0.50): # from utils.utils import *; crop_images_random() 615 | # crops images into random squares up to scale fraction 616 | # WARNING: overwrites images! 617 | for file in tqdm(sorted(glob.glob('%s/*.*' % path))): 618 | img = cv2.imread(file) # BGR 619 | if img is not None: 620 | h, w = img.shape[:2] 621 | 622 | # create random mask 623 | a = 30 # minimum size (pixels) 624 | mask_h = random.randint(a, int(max(a, h * scale))) # mask height 625 | mask_w = mask_h # mask width 626 | 627 | # box 628 | xmin = max(0, random.randint(0, w) - mask_w // 2) 629 | ymin = max(0, random.randint(0, h) - mask_h // 2) 630 | xmax = min(w, xmin + mask_w) 631 | ymax = min(h, ymin + mask_h) 632 | 633 | # apply random color mask 634 | cv2.imwrite(file, img[ymin:ymax, xmin:xmax]) 635 | 636 | 637 | def coco_single_class_labels(path='../coco/labels/train2014/', label_class=43): 638 | # Makes single-class coco datasets. 
from utils.utils import *; coco_single_class_labels() 639 | if os.path.exists('new/'): 640 | shutil.rmtree('new/') # delete output folder 641 | os.makedirs('new/') # make new output folder 642 | os.makedirs('new/labels/') 643 | os.makedirs('new/images/') 644 | for file in tqdm(sorted(glob.glob('%s/*.*' % path))): 645 | with open(file, 'r') as f: 646 | labels = np.array([x.split() for x in f.read().splitlines()], dtype=np.float32) 647 | i = labels[:, 0] == label_class 648 | if any(i): 649 | img_file = file.replace('labels', 'images').replace('txt', 'jpg') 650 | labels[:, 0] = 0 # reset class to 0 651 | with open('new/images.txt', 'a') as f: # add image to dataset list 652 | f.write(img_file + '\n') 653 | with open('new/labels/' + Path(file).name, 'a') as f: # write label 654 | for l in labels[i]: 655 | f.write('%g %.6f %.6f %.6f %.6f\n' % tuple(l)) 656 | shutil.copyfile(src=img_file, dst='new/images/' + Path(file).name.replace('txt', 'jpg')) # copy images 657 | 658 | 659 | def kmean_anchors(path='./data/coco128.txt', n=9, img_size=(640, 640), thr=0.20, gen=1000): 660 | # Creates kmeans anchors for use in *.cfg files: from utils.utils import *; _ = kmean_anchors() 661 | # n: number of anchors 662 | # img_size: (min, max) image size used for multi-scale training (can be same values) 663 | # thr: IoU threshold hyperparameter used for training (0.0 - 1.0) 664 | # gen: generations to evolve anchors using genetic algorithm 665 | from utils.datasets import LoadImagesAndLabels 666 | 667 | def print_results(k): 668 | k = k[np.argsort(k.prod(1))] # sort small to large 669 | iou = wh_iou(wh, torch.Tensor(k)) 670 | max_iou = iou.max(1)[0] 671 | bpr, aat = (max_iou > thr).float().mean(), (iou > thr).float().mean() * n # best possible recall, anch > thr 672 | 673 | # thr = 5.0 674 | # r = wh[:, None] / k[None] 675 | # ar = torch.max(r, 1. / r).max(2)[0] 676 | # max_ar = ar.min(1)[0] 677 | # bpr, aat = (max_ar < thr).float().mean(), (ar < thr).float().mean() * n # best possible recall, anch > thr 678 | 679 | print('%.2f iou_thr: %.3f best possible recall, %.2f anchors > thr' % (thr, bpr, aat)) 680 | print('n=%g, img_size=%s, IoU_all=%.3f/%.3f-mean/best, IoU>thr=%.3f-mean: ' % 681 | (n, img_size, iou.mean(), max_iou.mean(), iou[iou > thr].mean()), end='') 682 | for i, x in enumerate(k): 683 | print('%i,%i' % (round(x[0]), round(x[1])), end=', ' if i < len(k) - 1 else '\n') # use in *.cfg 684 | return k 685 | 686 | def fitness(k): # mutation fitness 687 | iou = wh_iou(wh, torch.Tensor(k)) # iou 688 | max_iou = iou.max(1)[0] 689 | return (max_iou * (max_iou > thr).float()).mean() # product 690 | 691 | # def fitness_ratio(k): # mutation fitness 692 | # # wh(5316,2), k(9,2) 693 | # r = wh[:, None] / k[None] 694 | # x = torch.max(r, 1. / r).max(2)[0] 695 | # m = x.min(1)[0] 696 | # return 1. 
/ (m * (m < 5).float()).mean() # product 697 | 698 | # Get label wh 699 | wh = [] 700 | dataset = LoadImagesAndLabels(path, augment=True, rect=True) 701 | nr = 1 if img_size[0] == img_size[1] else 3 # number augmentation repetitions 702 | for s, l in zip(dataset.shapes, dataset.labels): 703 | # wh.append(l[:, 3:5] * (s / s.max())) # image normalized to letterbox normalized wh 704 | wh.append(l[:, 3:5] * s) # image normalized to pixels 705 | wh = np.concatenate(wh, 0).repeat(nr, axis=0) # augment 3x 706 | # wh *= np.random.uniform(img_size[0], img_size[1], size=(wh.shape[0], 1)) # normalized to pixels (multi-scale) 707 | wh = wh[(wh > 2.0).all(1)] # remove below threshold boxes (< 2 pixels wh) 708 | 709 | # Kmeans calculation 710 | from scipy.cluster.vq import kmeans 711 | print('Running kmeans for %g anchors on %g points...' % (n, len(wh))) 712 | s = wh.std(0) # sigmas for whitening 713 | k, dist = kmeans(wh / s, n, iter=30) # points, mean distance 714 | k *= s 715 | wh = torch.Tensor(wh) 716 | k = print_results(k) 717 | 718 | # # Plot 719 | # k, d = [None] * 20, [None] * 20 720 | # for i in tqdm(range(1, 21)): 721 | # k[i-1], d[i-1] = kmeans(wh / s, i) # points, mean distance 722 | # fig, ax = plt.subplots(1, 2, figsize=(14, 7)) 723 | # ax = ax.ravel() 724 | # ax[0].plot(np.arange(1, 21), np.array(d) ** 2, marker='.') 725 | # fig, ax = plt.subplots(1, 2, figsize=(14, 7)) # plot wh 726 | # ax[0].hist(wh[wh[:, 0]<100, 0],400) 727 | # ax[1].hist(wh[wh[:, 1]<100, 1],400) 728 | # fig.tight_layout() 729 | # fig.savefig('wh.png', dpi=200) 730 | 731 | # Evolve 732 | npr = np.random 733 | f, sh, mp, s = fitness(k), k.shape, 0.9, 0.1 # fitness, generations, mutation prob, sigma 734 | for _ in tqdm(range(gen), desc='Evolving anchors'): 735 | v = np.ones(sh) 736 | while (v == 1).all(): # mutate until a change occurs (prevent duplicates) 737 | v = ((npr.random(sh) < mp) * npr.random() * npr.randn(*sh) * s + 1).clip(0.3, 3.0) 738 | kg = (k.copy() * v).clip(min=2.0) 739 | fg = fitness(kg) 740 | if fg > f: 741 | f, k = fg, kg.copy() 742 | print_results(k) 743 | k = print_results(k) 744 | 745 | return k 746 | 747 | 748 | def print_mutation(hyp, results, bucket=''): 749 | # Print mutation results to evolve.txt (for use with train.py --evolve) 750 | a = '%10s' * len(hyp) % tuple(hyp.keys()) # hyperparam keys 751 | b = '%10.3g' * len(hyp) % tuple(hyp.values()) # hyperparam values 752 | c = '%10.4g' * len(results) % results # results (P, R, mAP, F1, test_loss) 753 | print('\n%s\n%s\nEvolved fitness: %s\n' % (a, b, c)) 754 | 755 | if bucket: 756 | os.system('gsutil cp gs://%s/evolve.txt .' 
% bucket) # download evolve.txt 757 | 758 |     with open('evolve.txt', 'a') as f: # append result 759 |         f.write(c + b + '\n') 760 |     x = np.unique(np.loadtxt('evolve.txt', ndmin=2), axis=0) # load unique rows 761 |     np.savetxt('evolve.txt', x[np.argsort(-fitness(x))], '%10.3g') # save sort by fitness 762 | 763 |     if bucket: 764 |         os.system('gsutil cp evolve.txt gs://%s' % bucket) # upload evolve.txt 765 | 766 | 767 | def apply_classifier(x, model, img, im0): 768 |     # applies a second stage classifier to yolo outputs 769 |     im0 = [im0] if isinstance(im0, np.ndarray) else im0 770 |     for i, d in enumerate(x): # per image 771 |         if d is not None and len(d): 772 |             d = d.clone() 773 | 774 |             # Reshape and pad cutouts 775 |             b = xyxy2xywh(d[:, :4]) # boxes 776 |             b[:, 2:] = b[:, 2:].max(1)[0].unsqueeze(1) # rectangle to square 777 |             b[:, 2:] = b[:, 2:] * 1.3 + 30 # pad 778 |             d[:, :4] = xywh2xyxy(b).long() 779 | 780 |             # Rescale boxes from img_size to im0 size 781 |             scale_coords(img.shape[2:], d[:, :4], im0[i].shape) 782 | 783 |             # Classes 784 |             pred_cls1 = d[:, 5].long() 785 |             ims = [] 786 |             for j, a in enumerate(d): # per item 787 |                 cutout = im0[i][int(a[1]):int(a[3]), int(a[0]):int(a[2])] 788 |                 im = cv2.resize(cutout, (224, 224)) # BGR 789 |                 # cv2.imwrite('test%i.jpg' % j, cutout) 790 | 791 |                 im = im[:, :, ::-1].transpose(2, 0, 1) # BGR to RGB, to 3x224x224 792 |                 im = np.ascontiguousarray(im, dtype=np.float32) # uint8 to float32 793 |                 im /= 255.0 # 0 - 255 to 0.0 - 1.0 794 |                 ims.append(im) 795 | 796 |             pred_cls2 = model(torch.Tensor(ims).to(d.device)).argmax(1) # classifier prediction 797 |             x[i] = x[i][pred_cls1 == pred_cls2] # retain matching class detections 798 | 799 |     return x 800 | 801 | 802 | def fitness(x): 803 |     # Returns fitness (for use with results.txt or evolve.txt) 804 |     w = [0.0, 0.0, 0.1, 0.9] # weights for [P, R, mAP@0.5, mAP@0.5:0.95] 805 |     return (x[:, :4] * w).sum(1) 806 | 807 | 808 | def output_to_target(output, width, height): 809 |     """ 810 |     Convert a YOLO model output to target format 811 |     [batch_id, class_id, x, y, w, h, conf] 812 |     """ 813 |     if isinstance(output, torch.Tensor): 814 |         output = output.cpu().numpy() 815 | 816 |     targets = [] 817 |     for i, o in enumerate(output): 818 |         if o is not None: 819 |             for pred in o: 820 |                 box = pred[:4] 821 |                 w = (box[2] - box[0]) / width 822 |                 h = (box[3] - box[1]) / height 823 |                 x = box[0] / width + w / 2 824 |                 y = box[1] / height + h / 2 825 |                 conf = pred[4] 826 |                 cls = int(pred[5]) 827 | 828 |                 targets.append([i, cls, x, y, w, h, conf]) 829 | 830 |     return np.array(targets) 831 | 832 | 833 | # Plotting functions --------------------------------------------------------------------------------------------------- 834 | def butter_lowpass_filtfilt(data, cutoff=1500, fs=50000, order=5): 835 |     # https://stackoverflow.com/questions/28536191/how-to-filter-smooth-with-scipy-numpy 836 |     def butter_lowpass(cutoff, fs, order): 837 |         nyq = 0.5 * fs 838 |         normal_cutoff = cutoff / nyq 839 |         b, a = butter(order, normal_cutoff, btype='low', analog=False) 840 |         return b, a 841 | 842 |     b, a = butter_lowpass(cutoff, fs, order=order) 843 |     return filtfilt(b, a, data) # forward-backward filter 844 | 845 | 846 | def cv2AddChineseText(img, text, position, textColor=(0, 255, 0), textSize=30): 847 |     if (isinstance(img, np.ndarray)): # check whether this is an OpenCV image (BGR ndarray) 848 |         img = Image.fromarray(cv2.cvtColor(img, cv2.COLOR_BGR2RGB)) 849 |     # create a drawing handle for the image 850 |     draw = ImageDraw.Draw(img) 851 |     # font style 852 |     fontStyle = ImageFont.truetype( 853 |         "simsun.ttc", textSize, encoding="utf-8")
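    # NOTE: cv2.putText only supports Hershey fonts and cannot draw Chinese/CJK glyphs,
    # so the image is converted to a PIL Image above, the label is rendered below with a
    # TrueType font, and the result is converted back to a BGR ndarray before returning.
    # "simsun.ttc" is assumed to be resolvable on the system (it ships with Windows);
    # on Linux/macOS a different font file or an explicit font path will likely be needed.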
854 |     # draw the text 855 |     draw.text(position, text, textColor, font=fontStyle) 856 |     # convert back to an OpenCV (BGR) image 857 |     return cv2.cvtColor(np.asarray(img), cv2.COLOR_RGB2BGR) 858 | 859 | def plot_one_box(x, img, color=None, label=None, line_thickness=None): 860 |     # Plots one bounding box on image img 861 |     tl = line_thickness or round(0.002 * (img.shape[0] + img.shape[1]) / 2) + 1 # line/font thickness 862 |     color = color or [random.randint(0, 255) for _ in range(3)] 863 |     c1, c2 = (int(x[0]), int(x[1])), (int(x[2]), int(x[3])) 864 |     cv2.rectangle(img, c1, c2, color, thickness=tl, lineType=cv2.LINE_AA) 865 |     if label: 866 |         tf = max(tl - 1, 1) # font thickness 867 |         t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0] 868 |         c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3 869 |         cv2.rectangle(img, c1, c2, color, -1, cv2.LINE_AA) # filled 870 |         img = cv2AddChineseText(img, label, (c1[0], c1[1] - 14), textColor=(255, 255, 255), textSize=15) 871 |         font = cv2.FONT_HERSHEY_SIMPLEX 872 |         cv2.putText(img, "YOLO v5 by HuBin", (40, 40), font, 0.1, (0, 255, 0), 1) 873 |         # cv2.putText(img, label, (c1[0], c1[1] - 2), 0, tl / 3, [225, 255, 255], thickness=tf, lineType=cv2.LINE_AA) 874 | 875 |     return img 876 | 877 | def plot_wh_methods(): # from utils.utils import *; plot_wh_methods() 878 |     # Compares the two methods for width-height anchor multiplication 879 |     # https://github.com/ultralytics/yolov3/issues/168 880 |     x = np.arange(-4.0, 4.0, .1) 881 |     ya = np.exp(x) 882 |     yb = torch.sigmoid(torch.from_numpy(x)).numpy() * 2 883 | 884 |     fig = plt.figure(figsize=(6, 3), dpi=150) 885 |     plt.plot(x, ya, '.-', label='yolo method') 886 |     plt.plot(x, yb ** 2, '.-', label='^2 power method') 887 |     plt.plot(x, yb ** 2.5, '.-', label='^2.5 power method') 888 |     plt.xlim(left=-4, right=4) 889 |     plt.ylim(bottom=0, top=6) 890 |     plt.xlabel('input') 891 |     plt.ylabel('output') 892 |     plt.legend() 893 |     fig.tight_layout() 894 |     fig.savefig('comparison.png', dpi=200) 895 | 896 | 897 | def plot_images(images, targets, paths=None, fname='images.jpg', names=None, max_size=640, max_subplots=16): 898 |     tl = 3 # line thickness 899 |     tf = max(tl - 1, 1) # font thickness 900 |     if os.path.isfile(fname): # do not overwrite 901 |         return None 902 | 903 |     if isinstance(images, torch.Tensor): 904 |         images = images.cpu().numpy() 905 | 906 |     if isinstance(targets, torch.Tensor): 907 |         targets = targets.cpu().numpy() 908 | 909 |     # un-normalise 910 |     if np.max(images[0]) <= 1: 911 |         images *= 255 912 | 913 |     bs, _, h, w = images.shape # batch size, _, height, width 914 |     bs = min(bs, max_subplots) # limit plot images 915 |     ns = np.ceil(bs ** 0.5) # number of subplots (square) 916 | 917 |     # Check if we should resize 918 |     scale_factor = max_size / max(h, w) 919 |     if scale_factor < 1: 920 |         h = math.ceil(scale_factor * h) 921 |         w = math.ceil(scale_factor * w) 922 | 923 |     # Empty array for output 924 |     mosaic = np.full((int(ns * h), int(ns * w), 3), 255, dtype=np.uint8) 925 | 926 |     # Fix class - colour map 927 |     prop_cycle = plt.rcParams['axes.prop_cycle'] 928 |     # https://stackoverflow.com/questions/51350872/python-from-color-name-to-rgb 929 |     hex2rgb = lambda h: tuple(int(h[1 + i:1 + i + 2], 16) for i in (0, 2, 4)) 930 |     color_lut = [hex2rgb(h) for h in prop_cycle.by_key()['color']] 931 | 932 |     for i, img in enumerate(images): 933 |         if i == max_subplots: # if last batch has fewer images than we expect 934 |             break 935 | 936 |         block_x = int(w * (i // ns)) 937 |         block_y = int(h * (i % ns)) 938 | 939 |         img = img.transpose(1, 2, 0) 940 |         if scale_factor < 1: 941
| img = cv2.resize(img, (w, h)) 942 | 943 | mosaic[block_y:block_y + h, block_x:block_x + w, :] = img 944 | if len(targets) > 0: 945 | image_targets = targets[targets[:, 0] == i] 946 | boxes = xywh2xyxy(image_targets[:, 2:6]).T 947 | classes = image_targets[:, 1].astype('int') 948 | gt = image_targets.shape[1] == 6 # ground truth if no conf column 949 | conf = None if gt else image_targets[:, 6] # check for confidence presence (gt vs pred) 950 | 951 | boxes[[0, 2]] *= w 952 | boxes[[0, 2]] += block_x 953 | boxes[[1, 3]] *= h 954 | boxes[[1, 3]] += block_y 955 | for j, box in enumerate(boxes.T): 956 | cls = int(classes[j]) 957 | color = color_lut[cls % len(color_lut)] 958 | cls = names[cls] if names else cls 959 | if gt or conf[j] > 0.3: # 0.3 conf thresh 960 | label = '%s' % cls if gt else '%s %.1f' % (cls, conf[j]) 961 | plot_one_box(box, mosaic, label=label, color=color, line_thickness=tl) 962 | 963 | # Draw image filename labels 964 | if paths is not None: 965 | label = os.path.basename(paths[i])[:40] # trim to 40 char 966 | t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0] 967 | cv2.putText(mosaic, label, (block_x + 5, block_y + t_size[1] + 5), 0, tl / 3, [220, 220, 220], thickness=tf, 968 | lineType=cv2.LINE_AA) 969 | 970 | # Image border 971 | cv2.rectangle(mosaic, (block_x, block_y), (block_x + w, block_y + h), (255, 255, 255), thickness=3) 972 | 973 | if fname is not None: 974 | mosaic = cv2.resize(mosaic, (int(ns * w * 0.5), int(ns * h * 0.5)), interpolation=cv2.INTER_AREA) 975 | cv2.imwrite(fname, cv2.cvtColor(mosaic, cv2.COLOR_BGR2RGB)) 976 | 977 | return mosaic 978 | 979 | 980 | def plot_lr_scheduler(optimizer, scheduler, epochs=300): 981 | # Plot LR simulating training for full epochs 982 | optimizer, scheduler = copy(optimizer), copy(scheduler) # do not modify originals 983 | y = [] 984 | for _ in range(epochs): 985 | scheduler.step() 986 | y.append(optimizer.param_groups[0]['lr']) 987 | plt.plot(y, '.-', label='LR') 988 | plt.xlabel('epoch') 989 | plt.ylabel('LR') 990 | plt.grid() 991 | plt.xlim(0, epochs) 992 | plt.ylim(0) 993 | plt.tight_layout() 994 | plt.savefig('LR.png', dpi=200) 995 | 996 | 997 | def plot_test_txt(): # from utils.utils import *; plot_test() 998 | # Plot test.txt histograms 999 | x = np.loadtxt('test.txt', dtype=np.float32) 1000 | box = xyxy2xywh(x[:, :4]) 1001 | cx, cy = box[:, 0], box[:, 1] 1002 | 1003 | fig, ax = plt.subplots(1, 1, figsize=(6, 6), tight_layout=True) 1004 | ax.hist2d(cx, cy, bins=600, cmax=10, cmin=0) 1005 | ax.set_aspect('equal') 1006 | plt.savefig('hist2d.png', dpi=300) 1007 | 1008 | fig, ax = plt.subplots(1, 2, figsize=(12, 6), tight_layout=True) 1009 | ax[0].hist(cx, bins=600) 1010 | ax[1].hist(cy, bins=600) 1011 | plt.savefig('hist1d.png', dpi=200) 1012 | 1013 | 1014 | def plot_targets_txt(): # from utils.utils import *; plot_targets_txt() 1015 | # Plot targets.txt histograms 1016 | x = np.loadtxt('targets.txt', dtype=np.float32).T 1017 | s = ['x targets', 'y targets', 'width targets', 'height targets'] 1018 | fig, ax = plt.subplots(2, 2, figsize=(8, 8), tight_layout=True) 1019 | ax = ax.ravel() 1020 | for i in range(4): 1021 | ax[i].hist(x[i], bins=100, label='%.3g +/- %.3g' % (x[i].mean(), x[i].std())) 1022 | ax[i].legend() 1023 | ax[i].set_title(s[i]) 1024 | plt.savefig('targets.jpg', dpi=200) 1025 | 1026 | 1027 | def plot_study_txt(f='study.txt', x=None): # from utils.utils import *; plot_study_txt() 1028 | # Plot study.txt generated by test.py 1029 | fig, ax = plt.subplots(2, 4, figsize=(10, 6), 
tight_layout=True) 1030 | ax = ax.ravel() 1031 | 1032 | fig2, ax2 = plt.subplots(1, 1, figsize=(8, 4), tight_layout=True) 1033 | for f in ['coco_study/study_coco_yolov5%s.txt' % x for x in ['s', 'm', 'l', 'x']]: 1034 | y = np.loadtxt(f, dtype=np.float32, usecols=[0, 1, 2, 3, 7, 8, 9], ndmin=2).T 1035 | x = np.arange(y.shape[1]) if x is None else np.array(x) 1036 | s = ['P', 'R', 'mAP@.5', 'mAP@.5:.95', 't_inference (ms/img)', 't_NMS (ms/img)', 't_total (ms/img)'] 1037 | for i in range(7): 1038 | ax[i].plot(x, y[i], '.-', linewidth=2, markersize=8) 1039 | ax[i].set_title(s[i]) 1040 | 1041 | j = y[3].argmax() + 1 1042 | ax2.plot(y[6, :j], y[3, :j] * 1E2, '.-', linewidth=2, markersize=8, 1043 | label=Path(f).stem.replace('study_coco_', '').replace('yolo', 'YOLO')) 1044 | 1045 | ax2.plot(1E3 / np.array([209, 140, 97, 58, 35, 18]), [33.5, 39.1, 42.5, 45.9, 49., 50.5], 1046 | 'k.-', linewidth=2, markersize=8, alpha=.25, label='EfficientDet') 1047 | ax2.set_xlim(0, 30) 1048 | ax2.set_ylim(25, 50) 1049 | ax2.set_xlabel('GPU Latency (ms)') 1050 | ax2.set_ylabel('COCO AP val') 1051 | ax2.legend(loc='lower right') 1052 | ax2.grid() 1053 | plt.savefig('study_mAP_latency.png', dpi=300) 1054 | plt.savefig(f.replace('.txt', '.png'), dpi=200) 1055 | 1056 | 1057 | def plot_labels(labels): 1058 | # plot dataset labels 1059 | c, b = labels[:, 0], labels[:, 1:].transpose() # classees, boxes 1060 | 1061 | def hist2d(x, y, n=100): 1062 | xedges, yedges = np.linspace(x.min(), x.max(), n), np.linspace(y.min(), y.max(), n) 1063 | hist, xedges, yedges = np.histogram2d(x, y, (xedges, yedges)) 1064 | xidx = np.clip(np.digitize(x, xedges) - 1, 0, hist.shape[0] - 1) 1065 | yidx = np.clip(np.digitize(y, yedges) - 1, 0, hist.shape[1] - 1) 1066 | return np.log(hist[xidx, yidx]) 1067 | 1068 | fig, ax = plt.subplots(2, 2, figsize=(8, 8), tight_layout=True) 1069 | ax = ax.ravel() 1070 | ax[0].hist(c, bins=int(c.max() + 1)) 1071 | ax[0].set_xlabel('classes') 1072 | ax[1].scatter(b[0], b[1], c=hist2d(b[0], b[1], 90), cmap='jet') 1073 | ax[1].set_xlabel('x') 1074 | ax[1].set_ylabel('y') 1075 | ax[2].scatter(b[2], b[3], c=hist2d(b[2], b[3], 90), cmap='jet') 1076 | ax[2].set_xlabel('width') 1077 | ax[2].set_ylabel('height') 1078 | plt.savefig('labels.png', dpi=200) 1079 | 1080 | 1081 | def plot_evolution_results(hyp): # from utils.utils import *; plot_evolution_results(hyp) 1082 | # Plot hyperparameter evolution results in evolve.txt 1083 | x = np.loadtxt('evolve.txt', ndmin=2) 1084 | f = fitness(x) 1085 | # weights = (f - f.min()) ** 2 # for weighted results 1086 | plt.figure(figsize=(12, 10), tight_layout=True) 1087 | matplotlib.rc('font', **{'size': 8}) 1088 | for i, (k, v) in enumerate(hyp.items()): 1089 | y = x[:, i + 7] 1090 | # mu = (y * weights).sum() / weights.sum() # best weighted result 1091 | mu = y[f.argmax()] # best single result 1092 | plt.subplot(4, 5, i + 1) 1093 | plt.plot(mu, f.max(), 'o', markersize=10) 1094 | plt.plot(y, f, '.') 1095 | plt.title('%s = %.3g' % (k, mu), fontdict={'size': 9}) # limit to 40 characters 1096 | print('%15s: %.3g' % (k, mu)) 1097 | plt.savefig('evolve.png', dpi=200) 1098 | 1099 | 1100 | def plot_results_overlay(start=0, stop=0): # from utils.utils import *; plot_results_overlay() 1101 | # Plot training 'results*.txt', overlaying train and val losses 1102 | s = ['train', 'train', 'train', 'Precision', 'mAP@0.5', 'val', 'val', 'val', 'Recall', 'mAP@0.5:0.95'] # legends 1103 | t = ['GIoU', 'Objectness', 'Classification', 'P-R', 'mAP-F1'] # titles 1104 | for f in 
sorted(glob.glob('results*.txt') + glob.glob('../../Downloads/results*.txt')): 1105 | results = np.loadtxt(f, usecols=[2, 3, 4, 8, 9, 12, 13, 14, 10, 11], ndmin=2).T 1106 | n = results.shape[1] # number of rows 1107 | x = range(start, min(stop, n) if stop else n) 1108 | fig, ax = plt.subplots(1, 5, figsize=(14, 3.5), tight_layout=True) 1109 | ax = ax.ravel() 1110 | for i in range(5): 1111 | for j in [i, i + 5]: 1112 | y = results[j, x] 1113 | ax[i].plot(x, y, marker='.', label=s[j]) 1114 | # y_smooth = butter_lowpass_filtfilt(y) 1115 | # ax[i].plot(x, np.gradient(y_smooth), marker='.', label=s[j]) 1116 | 1117 | ax[i].set_title(t[i]) 1118 | ax[i].legend() 1119 | ax[i].set_ylabel(f) if i == 0 else None # add filename 1120 | fig.savefig(f.replace('.txt', '.png'), dpi=200) 1121 | 1122 | 1123 | def plot_results(start=0, stop=0, bucket='', id=(), labels=()): # from utils.utils import *; plot_results() 1124 | # Plot training 'results*.txt' as seen in https://github.com/ultralytics/yolov5#reproduce-our-training 1125 | fig, ax = plt.subplots(2, 5, figsize=(12, 6)) 1126 | ax = ax.ravel() 1127 | s = ['GIoU', 'Objectness', 'Classification', 'Precision', 'Recall', 1128 | 'val GIoU', 'val Objectness', 'val Classification', 'mAP@0.5', 'mAP@0.5:0.95'] 1129 | if bucket: 1130 | os.system('rm -rf storage.googleapis.com') 1131 | files = ['https://storage.googleapis.com/%s/results%g.txt' % (bucket, x) for x in id] 1132 | else: 1133 | files = glob.glob('results*.txt') + glob.glob('../../Downloads/results*.txt') 1134 | for fi, f in enumerate(files): 1135 | try: 1136 | results = np.loadtxt(f, usecols=[2, 3, 4, 8, 9, 12, 13, 14, 10, 11], ndmin=2).T 1137 | n = results.shape[1] # number of rows 1138 | x = range(start, min(stop, n) if stop else n) 1139 | for i in range(10): 1140 | y = results[i, x] 1141 | if i in [0, 1, 2, 5, 6, 7]: 1142 | y[y == 0] = np.nan # dont show zero loss values 1143 | # y /= y[0] # normalize 1144 | label = labels[fi] if len(labels) else Path(f).stem 1145 | ax[i].plot(x, y, marker='.', label=label, linewidth=2, markersize=8) 1146 | ax[i].set_title(s[i]) 1147 | # if i in [5, 6, 7]: # share train and val loss y axes 1148 | # ax[i].get_shared_y_axes().join(ax[i], ax[i - 5]) 1149 | except: 1150 | print('Warning: Plotting error for %s, skipping file' % f) 1151 | 1152 | fig.tight_layout() 1153 | ax[1].legend() 1154 | fig.savefig('results.png', dpi=200) 1155 | --------------------------------------------------------------------------------
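The inline "# from utils.utils import *; ..." comments above indicate that several of these helpers can be run standalone as well as being imported by the training and testing scripts. A minimal usage sketch, assuming the working directory is the repository root; the dataset and checkpoint paths are placeholders and must match your own layout:

```python
# Illustrative standalone use of a few utils.utils helpers (paths are placeholders).
from utils.utils import kmean_anchors, strip_optimizer, plot_results

# Re-cluster anchors for a custom dataset; 'path' must be something
# LoadImagesAndLabels accepts (an image folder or a list file).
anchors = kmean_anchors(path='./datasets/traindata/images/train/', n=9,
                        img_size=(640, 640), thr=0.20, gen=1000)

# Drop the optimizer state from the final checkpoint to roughly halve its size.
strip_optimizer('weights/best.pt')

# Plot the curves from any results*.txt in the working directory to results.png.
plot_results()
```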