├── README.md
├── inference_ros_trt_v2.py
├── split_train_val.py
└── voc2yolo_label.py

/README.md:
--------------------------------------------------------------------------------
1 | ### YOLOv5-Pytorch-TensorRT
2 | 
3 | ### 1 Training in PyTorch
4 | ### 1.1 Download YOLOv5-Pytorch
5 | ```
6 | cd ~/Desktop
7 | mkdir workspace
8 | cd workspace
9 | git clone https://github.com/ultralytics/yolov5
10 | ```
11 | 
12 | ### 1.2 Prepare the dataset folders
13 | ```
14 | cd ~/Desktop/workspace
15 | mkdir dataset
16 | cd dataset
17 | mkdir annotations
18 | mkdir images
19 | mkdir ImageSets
20 | cd ImageSets
21 | mkdir Main
22 | ```
23 | 
24 | ### 1.3 Place your images and annotations into the images and annotations folders respectively
25 | ### 1.4 Put ```split_train_val.py``` and ```voc2yolo_label.py``` into the dataset folder
26 | ### 1.5 Change ```voc2yolo_label.py``` lines 8-13 to your configuration
27 | ```
28 | classes = ["obstacle", "human", "injury"] # change to your own classes
29 | image_dir = "/home/laitathei/Desktop/workspace/dataset/images/"
30 | labels_dir = "/home/laitathei/Desktop/workspace/dataset/labels/"
31 | annotations_dir = "/home/laitathei/Desktop/workspace/dataset/annotations/"
32 | ImageSets_Main_dir = "/home/laitathei/Desktop/workspace/dataset/ImageSets/Main/"
33 | dataset_dir = "/home/laitathei/Desktop/workspace/dataset/"
34 | ```
35 | 
36 | ### 1.6 Dataset folder structure
37 | ```
38 | ├── dataset
39 | ├── annotations
40 | │   ├── xxx.xml
41 | │
42 | ├── ImageSets
43 | │   ├── Main
44 | │       ├── test.txt
45 | │       ├── train.txt
46 | │       ├── trainval.txt
47 | │       ├── val.txt
48 | │
49 | ├── labels
50 | │   ├── test
51 | │       ├── xxx.txt
52 | |       ├── ...
53 | │   ├── val
54 | │       ├── xxx.txt
55 | |       ├── ...
56 | │   ├── train
57 | │       ├── xxx.txt
58 | |       ├── ...
59 | │
60 | ├── images
61 | │   ├── test
62 | │       ├── xxx.jpg
63 | |       ├── ...
64 | │   ├── val
65 | │       ├── xxx.jpg
66 | |       ├── ...
67 | │   ├── train
68 | │       ├── xxx.jpg
69 | |       ├── ...
70 | ├── train.txt
71 | ├── train_dummy.txt
72 | ├── test.txt
73 | ├── test_dummy.txt
74 | ├── valid.txt
75 | ├── valid_dummy.txt
76 | ├── split_train_val.py
77 | └── voc2yolo_label.py
78 | ```
79 | 
80 | ### 1.7 Prepare your dataset yaml file: download the official weights from https://github.com/ultralytics/yolov5/releases/tag/v6.1 and place them into the yolov5/weight folder
81 | ```
82 | cd ~/Desktop/workspace/yolov5
83 | mkdir weight
84 | cd weight
85 | wget https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5n.pt
86 | cd ~/Desktop/workspace/yolov5/data
87 | gedit custom_dataset.yaml
88 | ```
89 | 
90 | ### 1.8 custom_dataset.yaml content: change the configuration, such as the number of classes and the class names, to match your dataset
91 | ```
92 | train: /home/laitathei/Desktop/workspace/dataset/train.txt
93 | val: /home/laitathei/Desktop/workspace/dataset/val.txt
94 | test: /home/laitathei/Desktop/workspace/dataset/test.txt
95 | test_xml: /home/laitathei/Desktop/workspace/dataset/annotations
96 | 
97 | # Classes
98 | nc: 3  # number of classes
99 | names: ['obstacle', 'human', 'injury']  # class names
100 | ```
101 | 
102 | ### 1.9 Train on your dataset
103 | ```
104 | cd ~/Desktop/workspace/yolov5/
105 | python3 train.py --weights /home/laitathei/Desktop/workspace/yolov5/weight/yolov5n.pt --cfg /home/laitathei/Desktop/workspace/yolov5/models/yolov5n.yaml --data /home/laitathei/Desktop/workspace/yolov5/data/custom_dataset.yaml --epochs 100
106 | ```
107 | 
108 | ### 1.10 Training result: best.pt and last.pt are in FP32 format
109 | ```
110 | 100 epochs completed in 0.091 hours.
111 | Optimizer stripped from runs/train/exp/weights/last.pt, 3.9MB
112 | Optimizer stripped from runs/train/exp/weights/best.pt, 3.9MB
113 | 
114 | Validating runs/train/exp/weights/best.pt...
115 | Fusing layers...
116 | Model Summary: 213 layers, 1763224 parameters, 0 gradients, 4.2 GFLOPs
117 | Class Images Labels P R mAP@.5 mAP@.5:.95: 100%|██████████| 2/2 [00:00<00:00, 3.06it/s]
118 | all 55 215 0.986 0.977 0.994 0.916
119 | obstacle 55 215 0.986 0.977 0.994 0.916
120 | Results saved to runs/train/exp
121 | ```
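Optionally, before moving on to TensorRT, you can sanity-check the trained FP32 weights with yolov5's own ```detect.py```. This is only a quick visual check; the ```--source``` path below is an example, point it at any test image or folder:
```
cd ~/Desktop/workspace/yolov5
python3 detect.py --weights ./runs/train/exp/weights/best.pt --source /home/laitathei/Desktop/workspace/dataset/images/test --conf-thres 0.5
```
The annotated images are saved under ```runs/detect/exp```.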
122 | 
123 | ### 2 TensorRT conversion
124 | ### 2.1 Convert ```best.pt```->```best.wts```->```best.engine``` with TensorRTx
125 | ### 2.2 Put ```gen_wts.py``` (from the TensorRTx yolov5 folder) into the yolov5 folder
126 | ### 2.3 Convert ```best.pt```->```best.wts``` with ```gen_wts.py```
127 | ```
128 | cd ~/Desktop/workspace/yolov5
129 | python3 gen_wts.py -w ./runs/train/exp/weights/best.pt -o ./runs/train/exp/weights/best.wts
130 | 
131 | ## Below content will show if the program succeeds
132 | YOLOv5 🚀 v6.1-11-g63ddb6f torch 1.10.2+cu113 CPU
133 | ```
134 | 
135 | ### 2.4 Convert ```best.wts```->```best.engine``` with the TensorRTx cmake build; best.engine is in FP16 format
136 | ```
137 | cd ~/Desktop/workspace/tensorrtx/yolov5
138 | mkdir build
139 | cd build
140 | 
141 | ## Place best.wts into the build folder
142 | ## Update CLASS_NUM in yololayer.h if your model is trained on a custom dataset
143 | ## before
144 | static constexpr int CLASS_NUM = 80; // line 20
145 | static constexpr int INPUT_H = 640; // line 21 yolov5's input height and width must be divisible by 32.
146 | static constexpr int INPUT_W = 640; // line 22
147 | ## after
148 | static constexpr int CLASS_NUM = 3; // line 20
149 | static constexpr int INPUT_H = 640; // line 21 yolov5's input height and width must be divisible by 32.
150 | static constexpr int INPUT_W = 640; // line 22
151 | 
152 | cmake ..
153 | make
154 | 
155 | ## Below content will show if the program succeeds
156 | [100%] Linking CXX executable yolov5
157 | [100%] Built target yolov5
158 | 
159 | sudo ./yolov5 -s best.wts best.engine n
160 | ## Below content will show if the program succeeds
161 | Loading weights: best.wts
162 | Building engine, please wait for a while...
163 | Build engine successfully!
164 | ```
165 | 
166 | ### 2.5 Run inference with the engine using ROS and a RealSense D455 camera
167 | ### 2.6 Put ```inference_ros_trt_v2.py``` into the tensorrtx/yolov5 folder
168 | ```
169 | cd ~/Desktop/workspace/tensorrtx/yolov5
170 | python3 inference_ros_trt_v2.py
171 | 
172 | ## Change the RealSense camera topic, PLUGIN_LIBRARY, and engine_file_path if required
173 | ```
174 | 
-------------------------------------------------------------------------------- /inference_ros_trt_v2.py: --------------------------------------------------------------------------------
1 | """
2 | An example that uses TensorRT's Python API to make inferences.
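The script loads the compiled TensorRT engine and the libmyplugins.so plugin, subscribes to the RealSense color/depth image and camera_info topics, runs YOLOv5 detection on every color frame, converts each 2D box into a 3D bounding box using the depth image and the camera intrinsics, and publishes a jsk_recognition_msgs BoundingBoxArray plus per-class object counts while showing the annotated image with OpenCV.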
3 | """ 4 | import ctypes 5 | import os 6 | import shutil 7 | import random 8 | import sys 9 | import threading 10 | import time 11 | import cv2 12 | import numpy as np 13 | import pycuda.autoinit 14 | import pycuda.driver as cuda 15 | import tensorrt as trt 16 | 17 | # ros package 18 | import rospy 19 | from sensor_msgs import point_cloud2 20 | from sensor_msgs.msg import Image, CameraInfo, PointCloud2 21 | from std_msgs.msg import String 22 | from jsk_recognition_msgs.msg import BoundingBox, BoundingBoxArray 23 | from geometry_msgs.msg import PoseStamped 24 | from std_msgs.msg import Int32 25 | from PIL import Image as PIL_Image 26 | import cv_bridge 27 | import numpy as np 28 | 29 | CONF_THRESH = 0.5 30 | IOU_THRESHOLD = 0.4 31 | 32 | camera_frame_name="D435i::camera_depth_frame" 33 | 34 | def plot_one_box(x, img, color=None, label=None, line_thickness=None): 35 | """ 36 | description: Plots one bounding box on image img, 37 | this function comes from YoLov5 project. 38 | param: 39 | x: a box likes [x1,y1,x2,y2] 40 | img: a opencv image object 41 | color: color to draw rectangle, such as (0,255,0) 42 | label: str 43 | line_thickness: int 44 | return: 45 | no return 46 | 47 | """ 48 | tl = ( 49 | line_thickness or round(0.002 * (img.shape[0] + img.shape[1]) / 2) + 1 50 | ) # line/font thickness 51 | #color = color or [random.randint(0, 255) for _ in range(3)] 52 | color = (0,0,255) 53 | c1, c2 = (int(x[0]), int(x[1])), (int(x[2]), int(x[3])) 54 | cv2.rectangle(img, c1, c2, color, thickness=tl, lineType=cv2.LINE_AA) 55 | if label: 56 | tf = max(tl - 1, 1) # font thickness 57 | t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0] 58 | c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3 59 | cv2.rectangle(img, c1, c2, color, -1, cv2.LINE_AA) # filled 60 | cv2.putText( 61 | img, 62 | label, 63 | (c1[0], c1[1] - 2), 64 | 0, 65 | tl / 3, 66 | [225, 255, 255], 67 | thickness=tf, 68 | lineType=cv2.LINE_AA, 69 | ) 70 | 71 | 72 | class YoLov5TRT(object): 73 | """ 74 | description: A YOLOv5 class that warps TensorRT ops, preprocess and postprocess ops. 75 | """ 76 | 77 | def __init__(self, engine_file_path): 78 | # Create a Context on this device, 79 | self.ctx = cuda.Device(0).make_context() 80 | stream = cuda.Stream() 81 | TRT_LOGGER = trt.Logger(trt.Logger.INFO) 82 | runtime = trt.Runtime(TRT_LOGGER) 83 | 84 | # Deserialize the engine from file 85 | with open(engine_file_path, "rb") as f: 86 | engine = runtime.deserialize_cuda_engine(f.read()) 87 | context = engine.create_execution_context() 88 | 89 | host_inputs = [] 90 | cuda_inputs = [] 91 | host_outputs = [] 92 | cuda_outputs = [] 93 | bindings = [] 94 | 95 | for binding in engine: 96 | print('bingding:', binding, engine.get_binding_shape(binding)) 97 | size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size 98 | dtype = trt.nptype(engine.get_binding_dtype(binding)) 99 | # Allocate host and device buffers 100 | host_mem = cuda.pagelocked_empty(size, dtype) 101 | cuda_mem = cuda.mem_alloc(host_mem.nbytes) 102 | # Append the device buffer to device bindings. 103 | bindings.append(int(cuda_mem)) 104 | # Append to the appropriate list. 
105 | if engine.binding_is_input(binding): 106 | self.input_w = engine.get_binding_shape(binding)[-1] 107 | self.input_h = engine.get_binding_shape(binding)[-2] 108 | host_inputs.append(host_mem) 109 | cuda_inputs.append(cuda_mem) 110 | else: 111 | host_outputs.append(host_mem) 112 | cuda_outputs.append(cuda_mem) 113 | 114 | # Store 115 | self.stream = stream 116 | self.context = context 117 | self.engine = engine 118 | self.host_inputs = host_inputs 119 | self.cuda_inputs = cuda_inputs 120 | self.host_outputs = host_outputs 121 | self.cuda_outputs = cuda_outputs 122 | self.bindings = bindings 123 | self.batch_size = engine.max_batch_size 124 | 125 | def infer(self, raw_image_generator, rgb_image): 126 | threading.Thread.__init__(self) 127 | # Make self the active context, pushing it on top of the context stack. 128 | self.ctx.push() 129 | # Restore 130 | stream = self.stream 131 | context = self.context 132 | engine = self.engine 133 | host_inputs = self.host_inputs 134 | cuda_inputs = self.cuda_inputs 135 | host_outputs = self.host_outputs 136 | cuda_outputs = self.cuda_outputs 137 | bindings = self.bindings 138 | # Do image preprocess 139 | batch_image_raw = [] 140 | batch_origin_h = [] 141 | batch_origin_w = [] 142 | batch_input_image = np.empty(shape=[self.batch_size, 3, self.input_h, self.input_w]) 143 | #print("raw_image_generator: {}".format(raw_image_generator)) 144 | 145 | 146 | input_image, image_raw, origin_h, origin_w = self.preprocess_image(rgb_image) 147 | batch_image_raw.append(image_raw) 148 | batch_origin_h.append(origin_h) 149 | batch_origin_w.append(origin_w) 150 | np.copyto(batch_input_image, input_image) 151 | batch_input_image = np.ascontiguousarray(batch_input_image) 152 | 153 | # Copy input image to host buffer 154 | np.copyto(host_inputs[0], batch_input_image.ravel()) 155 | start = time.time() 156 | # Transfer input data to the GPU. 157 | cuda.memcpy_htod_async(cuda_inputs[0], host_inputs[0], stream) 158 | # Run inference. 159 | context.execute_async(batch_size=self.batch_size, bindings=bindings, stream_handle=stream.handle) 160 | # Transfer predictions back from the GPU. 161 | cuda.memcpy_dtoh_async(host_outputs[0], cuda_outputs[0], stream) 162 | # Synchronize the stream 163 | stream.synchronize() 164 | end = time.time() 165 | # Remove any context from the top of the context stack, deactivating it. 166 | self.ctx.pop() 167 | # Here we use the first row of output in that batch_size = 1 168 | output = host_outputs[0] 169 | # Do postprocess 170 | for i in range(self.batch_size): 171 | result_boxes, result_scores, result_classid = self.post_process( 172 | output[i * 6001: (i + 1) * 6001], batch_origin_h[i], batch_origin_w[i] 173 | ) 174 | # Draw rectangles and labels on the original image 175 | for j in range(len(result_boxes)): 176 | box = result_boxes[j] 177 | plot_one_box( 178 | box, 179 | batch_image_raw[i], 180 | label="{}:{:.2f}".format( 181 | categories[int(result_classid[j])], result_scores[j] 182 | ), 183 | ) 184 | return batch_image_raw, end - start 185 | 186 | def destroy(self): 187 | # Remove any context from the top of the context stack, deactivating it. 188 | self.ctx.pop() 189 | 190 | def preprocess_image(self, raw_bgr_image): 191 | """ 192 | description: Convert BGR image to RGB, 193 | resize and pad it to target size, normalize to [0,1], 194 | transform to NCHW format. 
195 | param: 196 | input_image_path: str, image path 197 | return: 198 | image: the processed image 199 | image_raw: the original image 200 | h: original height 201 | w: original width 202 | """ 203 | image_raw = raw_bgr_image 204 | h, w, c = image_raw.shape 205 | image = cv2.cvtColor(image_raw, cv2.COLOR_BGR2RGB) 206 | # Calculate widht and height and paddings 207 | r_w = self.input_w / w 208 | r_h = self.input_h / h 209 | if r_h > r_w: 210 | tw = self.input_w 211 | th = int(r_w * h) 212 | tx1 = tx2 = 0 213 | ty1 = int((self.input_h - th) / 2) 214 | ty2 = self.input_h - th - ty1 215 | else: 216 | tw = int(r_h * w) 217 | th = self.input_h 218 | tx1 = int((self.input_w - tw) / 2) 219 | tx2 = self.input_w - tw - tx1 220 | ty1 = ty2 = 0 221 | # Resize the image with long side while maintaining ratio 222 | image = cv2.resize(image, (tw, th)) 223 | # Pad the short side with (128,128,128) 224 | image = cv2.copyMakeBorder( 225 | image, ty1, ty2, tx1, tx2, cv2.BORDER_CONSTANT, None, (128, 128, 128) 226 | ) 227 | image = image.astype(np.float32) 228 | # Normalize to [0,1] 229 | image /= 255.0 230 | # HWC to CHW format: 231 | image = np.transpose(image, [2, 0, 1]) 232 | # CHW to NCHW format 233 | image = np.expand_dims(image, axis=0) 234 | # Convert the image to row-major order, also known as "C order": 235 | image = np.ascontiguousarray(image) 236 | return image, image_raw, h, w 237 | 238 | def xywh2xyxy(self, origin_h, origin_w, x): 239 | """ 240 | description: Convert nx4 boxes from [x, y, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right 241 | param: 242 | origin_h: height of original image 243 | origin_w: width of original image 244 | x: A boxes numpy, each row is a box [center_x, center_y, w, h] 245 | return: 246 | y: A boxes numpy, each row is a box [x1, y1, x2, y2] 247 | """ 248 | y = np.zeros_like(x) 249 | r_w = self.input_w / origin_w 250 | r_h = self.input_h / origin_h 251 | if r_h > r_w: 252 | y[:, 0] = x[:, 0] - x[:, 2] / 2 253 | y[:, 2] = x[:, 0] + x[:, 2] / 2 254 | y[:, 1] = x[:, 1] - x[:, 3] / 2 - (self.input_h - r_w * origin_h) / 2 255 | y[:, 3] = x[:, 1] + x[:, 3] / 2 - (self.input_h - r_w * origin_h) / 2 256 | y /= r_w 257 | else: 258 | y[:, 0] = x[:, 0] - x[:, 2] / 2 - (self.input_w - r_h * origin_w) / 2 259 | y[:, 2] = x[:, 0] + x[:, 2] / 2 - (self.input_w - r_h * origin_w) / 2 260 | y[:, 1] = x[:, 1] - x[:, 3] / 2 261 | y[:, 3] = x[:, 1] + x[:, 3] / 2 262 | y /= r_h 263 | 264 | return y 265 | 266 | def post_process(self, output, origin_h, origin_w): 267 | """ 268 | description: postprocess the prediction 269 | param: 270 | output: A numpy likes [num_boxes,cx,cy,w,h,conf,cls_id, cx,cy,w,h,conf,cls_id, ...] 
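(layout detail: the buffer starts with one float holding the number of valid boxes, followed by fixed 6-float detection slots; the 6001-float slice taken in infer() therefore corresponds to at most 1000 boxes per image)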
271 | origin_h: height of original image 272 | origin_w: width of original image 273 | return: 274 | result_boxes: finally boxes, a boxes numpy, each row is a box [x1, y1, x2, y2] 275 | result_scores: finally scores, a numpy, each element is the score correspoing to box 276 | result_classid: finally classid, a numpy, each element is the classid correspoing to box 277 | """ 278 | # Get the num of boxes detected 279 | num = int(output[0]) 280 | # Reshape to a two dimentional ndarray 281 | pred = np.reshape(output[1:], (-1, 6))[:num, :] 282 | # Do nms 283 | boxes = self.non_max_suppression(pred, origin_h, origin_w, conf_thres=CONF_THRESH, nms_thres=IOU_THRESHOLD) 284 | self.result_boxes = boxes[:, :4] if len(boxes) else np.array([]) 285 | result_scores = boxes[:, 4] if len(boxes) else np.array([]) 286 | result_classid = boxes[:, 5] if len(boxes) else np.array([]) 287 | self.number_of_obstacle = np.count_nonzero(result_classid == 0) 288 | self.number_of_human = np.count_nonzero(result_classid == 1) 289 | self.number_of_injury = np.count_nonzero(result_classid == 2) 290 | return self.result_boxes, result_scores, result_classid 291 | 292 | 293 | 294 | def bbox_iou(self, box1, box2, x1y1x2y2=True): 295 | """ 296 | description: compute the IoU of two bounding boxes 297 | param: 298 | box1: A box coordinate (can be (x1, y1, x2, y2) or (x, y, w, h)) 299 | box2: A box coordinate (can be (x1, y1, x2, y2) or (x, y, w, h)) 300 | x1y1x2y2: select the coordinate format 301 | return: 302 | iou: computed iou 303 | """ 304 | if not x1y1x2y2: 305 | # Transform from center and width to exact coordinates 306 | b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2 307 | b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2 308 | b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2 309 | b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2 310 | else: 311 | # Get the coordinates of bounding boxes 312 | b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3] 313 | b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3] 314 | 315 | # Get the coordinates of the intersection rectangle 316 | inter_rect_x1 = np.maximum(b1_x1, b2_x1) 317 | inter_rect_y1 = np.maximum(b1_y1, b2_y1) 318 | inter_rect_x2 = np.minimum(b1_x2, b2_x2) 319 | inter_rect_y2 = np.minimum(b1_y2, b2_y2) 320 | # Intersection area 321 | inter_area = np.clip(inter_rect_x2 - inter_rect_x1 + 1, 0, None) * \ 322 | np.clip(inter_rect_y2 - inter_rect_y1 + 1, 0, None) 323 | # Union Area 324 | b1_area = (b1_x2 - b1_x1 + 1) * (b1_y2 - b1_y1 + 1) 325 | b2_area = (b2_x2 - b2_x1 + 1) * (b2_y2 - b2_y1 + 1) 326 | 327 | iou = inter_area / (b1_area + b2_area - inter_area + 1e-16) 328 | 329 | return iou 330 | 331 | def non_max_suppression(self, prediction, origin_h, origin_w, conf_thres=0.5, nms_thres=0.4): 332 | """ 333 | description: Removes detections with lower object confidence score than 'conf_thres' and performs 334 | Non-Maximum Suppression to further filter detections. 
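Note that the suppression is class-aware: a box is only discarded when it overlaps a higher-confidence box that has the same class id.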
335 | param: 336 | prediction: detections, (x1, y1, x2, y2, conf, cls_id) 337 | origin_h: original image height 338 | origin_w: original image width 339 | conf_thres: a confidence threshold to filter detections 340 | nms_thres: a iou threshold to filter detections 341 | return: 342 | boxes: output after nms with the shape (x1, y1, x2, y2, conf, cls_id) 343 | """ 344 | # Get the boxes that score > CONF_THRESH 345 | boxes = prediction[prediction[:, 4] >= conf_thres] 346 | # Trandform bbox from [center_x, center_y, w, h] to [x1, y1, x2, y2] 347 | boxes[:, :4] = self.xywh2xyxy(origin_h, origin_w, boxes[:, :4]) 348 | # clip the coordinates 349 | boxes[:, 0] = np.clip(boxes[:, 0], 0, origin_w -1) 350 | boxes[:, 2] = np.clip(boxes[:, 2], 0, origin_w -1) 351 | boxes[:, 1] = np.clip(boxes[:, 1], 0, origin_h -1) 352 | boxes[:, 3] = np.clip(boxes[:, 3], 0, origin_h -1) 353 | # Object confidence 354 | confs = boxes[:, 4] 355 | # Sort by the confs 356 | boxes = boxes[np.argsort(-confs)] 357 | # Perform non-maximum suppression 358 | keep_boxes = [] 359 | while boxes.shape[0]: 360 | large_overlap = self.bbox_iou(np.expand_dims(boxes[0, :4], 0), boxes[:, :4]) > nms_thres 361 | label_match = boxes[0, -1] == boxes[:, -1] 362 | # Indices of boxes with lower confidence scores, large IOUs and matching labels 363 | invalid = large_overlap & label_match 364 | keep_boxes += [boxes[0]] 365 | boxes = boxes[~invalid] 366 | boxes = np.stack(keep_boxes, 0) if len(keep_boxes) else np.array([]) 367 | return boxes 368 | 369 | 370 | class inferThread(threading.Thread): 371 | def __init__(self, yolov5_wrapper, rgb_image): 372 | threading.Thread.__init__(self) 373 | self.yolov5_wrapper = yolov5_wrapper 374 | self.rgb_image = rgb_image 375 | 376 | def run(self): 377 | batch_image_raw, use_time = self.yolov5_wrapper.infer(self.rgb_image,self.rgb_image) 378 | 379 | 380 | 381 | 382 | 383 | class object_detect: 384 | def __init__(self): 385 | self.bridge = cv_bridge.CvBridge() 386 | rospy.init_node('object_detection_tensorrt') 387 | 388 | # Subscribe color and depth image 389 | rospy.Subscriber("/camera/color/image_raw",Image,self.color_callback) 390 | rospy.Subscriber("/camera/depth/image_raw",Image,self.depth_callback) 391 | #rospy.Subscriber("/camera/depth/color/points",PointCloud2,self.pointcloud2_callback) 392 | 393 | # Subscribe camera info 394 | rospy.Subscriber("/camera/depth/camera_info",CameraInfo,self.depth_camera_info_callback) 395 | rospy.Subscriber("/camera/color/camera_info",CameraInfo,self.color_camera_info_callback) 396 | 397 | #self.box_pub = rospy.Publisher("/desired/input/box", BoundingBox, queue_size=1) 398 | self.box_array_pub = rospy.Publisher("/desired/input/box_array", BoundingBoxArray, queue_size=1) 399 | 400 | self.number_of_obstacle_pub = rospy.Publisher("/detection_result/number_of_obstacle", Int32, queue_size=1) 401 | self.number_of_human_pub = rospy.Publisher("/detection_result/number_of_human", Int32, queue_size=1) 402 | self.number_of_injury_pub = rospy.Publisher("/detection_result/number_of_injury", Int32, queue_size=1) 403 | 404 | def depth_callback(self,data): 405 | # Depth image callback 406 | self.depth_image = self.bridge.imgmsg_to_cv2(data) 407 | self.depth_image = np.array(self.depth_image, dtype=np.float32) 408 | self.depth_array = self.depth_image/1000.0 409 | 410 | def color_callback(self,data): 411 | # RGB image callback 412 | self.rgb_image = self.bridge.imgmsg_to_cv2(data, "bgr8") 413 | 414 | 415 | # this is depth camera info callback 416 | def depth_camera_info_callback(self, 
data): 417 | self.depth_height = data.height 418 | self.depth_width = data.width 419 | 420 | # Pixels coordinates of the principal point (center of projection) 421 | self.depth_u = data.P[2] 422 | self.depth_v = data.P[6] 423 | 424 | # Focal Length of image (multiple of pixel width and height) 425 | self.depth_fx = data.P[0] 426 | self.depth_fy = data.P[5] 427 | 428 | # this is color camera info callback 429 | def color_camera_info_callback(self, data): 430 | self.rgb_height = data.height 431 | self.rgb_width = data.width 432 | 433 | # Pixels coordinates of the principal point (center of projection) 434 | self.rgb_u = data.P[2] 435 | self.rgb_v = data.P[6] 436 | 437 | # Focal Length of image (multiple of pixel width and height) 438 | self.rgb_fx = data.P[0] 439 | self.rgb_fy = data.P[5] 440 | 441 | 442 | def transformation(self,top_left_x, top_left_y, bottom_right_x, bottom_right_y, center_x, center_y): 443 | 444 | if (top_left_x==0)&(top_left_y==0)&(bottom_right_x==0)&(bottom_right_y==0): 445 | expected_3d_center_distance=0.0 446 | expected_3d_center_x=0.0 447 | expected_3d_center_y=0.0 448 | expected_3d_top_left_x=0.0 449 | expected_3d_top_left_y=0.0 450 | expected_3d_bottom_right_x=0.0 451 | expected_3d_bottom_right_y=0.0 452 | top_left_x=top_left_x-100 453 | bottom_right_x=bottom_right_x+100 454 | if top_left_x<=0: 455 | top_left_x=0 456 | if bottom_right_x>=640: 457 | bottom_right_x=640 458 | 459 | if 145=640: 522 | center_x=480 523 | elif center_x<=0: 524 | center_x=0 525 | 526 | distance_z = self.depth_array[center_y,center_x] 527 | expected_3d_center_distance = distance_z 528 | expected_3d_top_left_x = ((top_left_x - self.rgb_u)*distance_z)/self.rgb_fx 529 | expected_3d_top_left_y = ((top_left_y - self.rgb_v)*distance_z)/self.rgb_fy 530 | 531 | expected_3d_bottom_right_x = ((bottom_right_x - self.rgb_u)*distance_z)/self.rgb_fx 532 | expected_3d_bottom_right_y = ((bottom_right_y - self.rgb_v)*distance_z)/self.rgb_fy 533 | 534 | expected_3d_center_x = ((center_x - self.rgb_u)*distance_z)/self.rgb_fx 535 | expected_3d_center_y = ((center_y - self.rgb_v)*distance_z)/self.rgb_fy 536 | 537 | return expected_3d_center_distance,expected_3d_center_x,expected_3d_center_y, expected_3d_top_left_x, expected_3d_top_left_y, expected_3d_bottom_right_x, expected_3d_bottom_right_y 538 | 539 | 540 | def box_calculation(self,center_x,center_y,top_left_x,top_left_y,bottom_right_x,bottom_right_y): 541 | box_dimensions_x = abs(top_left_x)-abs(bottom_right_x) 542 | if (top_left_x < 0)&(bottom_right_x > 0): 543 | box_dimensions_x = abs(top_left_x)+abs(bottom_right_x) 544 | box_dimensions_x = abs(box_dimensions_x) 545 | 546 | box_dimensions_y = abs(top_left_y)-abs(bottom_right_y) 547 | if (top_left_y < 0)&(bottom_right_y > 0): 548 | box_dimensions_y = abs(top_left_y)+abs(bottom_right_y) 549 | box_dimensions_y = abs(box_dimensions_y) 550 | 551 | box_position_x = center_x 552 | box_position_y = center_y 553 | box_dimension_z = 3 554 | 555 | return box_dimensions_x,box_dimensions_y,box_dimension_z,box_position_x,box_position_y 556 | 557 | 558 | if __name__ == "__main__": 559 | # load custom plugin and engine 560 | PLUGIN_LIBRARY = "build/libmyplugins.so" 561 | engine_file_path = "build/best.engine" 562 | if len(sys.argv) > 1: 563 | engine_file_path = sys.argv[1] 564 | if len(sys.argv) > 2: 565 | PLUGIN_LIBRARY = sys.argv[2] 566 | ctypes.CDLL(PLUGIN_LIBRARY) 567 | # load coco labels 568 | detection = object_detect() 569 | categories = ["obstacle", "human", "injury"] 570 | # a YoLov5TRT instance 571 | 
yolov5_wrapper = YoLov5TRT(engine_file_path) 572 | while(True): 573 | rospy.wait_for_message("/camera/color/image_raw",Image) 574 | rospy.wait_for_message("/camera/depth/image_raw",Image) 575 | # create a new thread to do inference 576 | #thread1 = inferThread(yolov5_wrapper, detection.rgb_image) 577 | #thread1.start() 578 | #thread1.join() 579 | t1 = time.time() 580 | batch_image_raw, use_time = yolov5_wrapper.infer(detection.rgb_image,detection.rgb_image) 581 | result = batch_image_raw[0] 582 | print('time->{:.2f}ms'.format(use_time * 1000)) 583 | print("Inference fps= %.2f"%(1/((use_time * 1000)/1000))) 584 | #result = cv2.putText(result, "Inference fps= %.2f"%(1/((use_time * 1000)/1000)), (0, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2) 585 | result = cv2.putText(result, "obstacle= {}".format(yolov5_wrapper.number_of_obstacle), (0, 60), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2) 586 | result = cv2.putText(result, "human= {}".format(yolov5_wrapper.number_of_human), (0, 90), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2) 587 | result = cv2.putText(result, "injury= {}".format(yolov5_wrapper.number_of_injury), (0, 120), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2) 588 | detection.number_of_obstacle_pub.publish(yolov5_wrapper.number_of_obstacle) 589 | detection.number_of_human_pub.publish(yolov5_wrapper.number_of_human) 590 | detection.number_of_injury_pub.publish(yolov5_wrapper.number_of_injury) 591 | box_array = BoundingBoxArray() 592 | for i in range(len(yolov5_wrapper.result_boxes)): 593 | box = BoundingBox() 594 | top_left_x = yolov5_wrapper.result_boxes[i][0] 595 | top_left_y = yolov5_wrapper.result_boxes[i][1] 596 | bottom_right_x = yolov5_wrapper.result_boxes[i][2] 597 | bottom_right_y = yolov5_wrapper.result_boxes[i][3] 598 | center_x = int((bottom_right_x+top_left_x)/2) 599 | center_y = int((bottom_right_y+top_left_y)/2) 600 | box_position_z,cx,cy,x1,y1,x2,y2 = detection.transformation(top_left_x, top_left_y, bottom_right_x, bottom_right_y, center_x, center_y) 601 | box_size_x,box_size_y,box_size_z,box_position_x,box_position_y = detection.box_calculation(cx,cy,x1,y1,x2,y2) 602 | box.header.stamp = rospy.Time.now() 603 | box.header.frame_id = camera_frame_name 604 | box.pose.orientation.w = 1 605 | box.pose.position.x = box_position_x # increase in program = move to right in rviz 606 | box.pose.position.y = box_position_y # increase in program = downward in rviz 607 | box.pose.position.z = box_position_z # increase in program = move away in rviz (directly use the depth distance) 608 | box.dimensions.x = box_size_x 609 | box.dimensions.y = box_size_y 610 | box.dimensions.z = box_size_z 611 | box_array.boxes.append(box) 612 | box_array.header.frame_id = camera_frame_name 613 | box_array.header.stamp = rospy.Time.now() 614 | fps = (1.0/(time.time()-t1)) 615 | #print("Inference include 2D to 3D fps: {}".format(fps)) 616 | result = cv2.putText(result, "fps: %.2f"%(fps), (0, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2) 617 | detection.box_array_pub.publish(box_array) 618 | cv2.imshow("Detection", result) 619 | cv2.waitKey(1) 620 | 621 | 622 | -------------------------------------------------------------------------------- /split_train_val.py: -------------------------------------------------------------------------------- 1 | # coding:utf-8 2 | 3 | import os 4 | import random 5 | import argparse 6 | import shutil 7 | 8 | parser = argparse.ArgumentParser() 9 | # xml文件的地址,根据自己的数据进行修改 xml一般存放在Annotations下 10 | parser.add_argument('--xml_path', default='annotations', type=str, 
help='input xml label path')
11 | # dataset split output folder; point this to ImageSets/Main under your own dataset
12 | parser.add_argument('--txt_path', default='ImageSets/Main', type=str, help='output txt label path')
13 | opt = parser.parse_args()
14 | 
15 | trainval_percent = 1.0
16 | train_percent = 0.9
17 | xmlfilepath = opt.xml_path
18 | txtsavepath = opt.txt_path
19 | total_xml = os.listdir(xmlfilepath)
20 | if not os.path.exists(txtsavepath):
21 |     os.makedirs(txtsavepath)
22 | 
23 | num = len(total_xml)
24 | list_index = range(num)
25 | tv = int(num * trainval_percent)
26 | tr = int(tv * train_percent)
27 | trainval = random.sample(list_index, tv)
28 | train = random.sample(trainval, tr)
29 | 
30 | file_trainval = open(txtsavepath + '/trainval.txt', 'w')
31 | file_test = open(txtsavepath + '/test.txt', 'w')
32 | file_train = open(txtsavepath + '/train.txt', 'w')
33 | file_val = open(txtsavepath + '/val.txt', 'w')
34 | 
35 | for i in list_index:
36 |     name = total_xml[i][:-4] + '\n'
37 |     if i in trainval:
38 |         file_trainval.write(name)
39 |         if i in train:
40 |             file_train.write(name)
41 |         else:
42 |             file_val.write(name)
43 |     else:
44 |         file_test.write(name)
45 | 
46 | file_trainval.close()
47 | file_train.close()
48 | file_val.close()
49 | file_test.close()
50 | 
-------------------------------------------------------------------------------- /voc2yolo_label.py: --------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | import xml.etree.ElementTree as ET
3 | import os
4 | from os import getcwd
5 | import shutil
6 | 
7 | sets = ['train', 'val', 'test']
8 | classes = ["obstacle", "human", "injury"]  # change to your own classes
9 | image_dir = "/home/laitathei/Desktop/workspace/dataset/images/"
10 | labels_dir = "/home/laitathei/Desktop/workspace/dataset/labels/"
11 | annotations_dir = "/home/laitathei/Desktop/workspace/dataset/annotations/"
12 | ImageSets_Main_dir = "/home/laitathei/Desktop/workspace/dataset/ImageSets/Main/"
13 | dataset_dir = "/home/laitathei/Desktop/workspace/dataset/"
14 | 
15 | def convert(size, box):
16 |     dw = 1. / (size[0])
17 |     dh = 1.
/ (size[1]) 18 | x = (box[0] + box[1]) / 2.0 - 1 19 | y = (box[2] + box[3]) / 2.0 - 1 20 | w = box[1] - box[0] 21 | h = box[3] - box[2] 22 | x = x * dw 23 | w = w * dw 24 | y = y * dh 25 | h = h * dh 26 | return x, y, w, h 27 | 28 | def convert_annotation_train(image_id): 29 | in_file = open(annotations_dir+'%s.xml' % (image_id), encoding='UTF-8') 30 | out_file = open(labels_dir+'train/%s.txt' % (image_id), 'w') 31 | tree = ET.parse(in_file) 32 | root = tree.getroot() 33 | size = root.find('size') 34 | w = int(size.find('width').text) 35 | h = int(size.find('height').text) 36 | for obj in root.iter('object'): 37 | # difficult = obj.find('difficult').text 38 | difficult = obj.find('difficult').text 39 | cls = obj.find('name').text 40 | if cls not in classes or int(difficult) == 1: 41 | continue 42 | cls_id = classes.index(cls) 43 | xmlbox = obj.find('bndbox') 44 | b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text), 45 | float(xmlbox.find('ymax').text)) 46 | b1, b2, b3, b4 = b 47 | # 标注越界修正 48 | if b2 > w: 49 | b2 = w 50 | if b4 > h: 51 | b4 = h 52 | b = (b1, b2, b3, b4) 53 | bb = convert((w, h), b) 54 | out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n') 55 | 56 | def convert_annotation_test(image_id): 57 | in_file = open(annotations_dir+'%s.xml' % (image_id), encoding='UTF-8') 58 | out_file = open(labels_dir+'test/%s.txt' % (image_id), 'w') 59 | tree = ET.parse(in_file) 60 | root = tree.getroot() 61 | size = root.find('size') 62 | w = int(size.find('width').text) 63 | h = int(size.find('height').text) 64 | for obj in root.iter('object'): 65 | # difficult = obj.find('difficult').text 66 | difficult = obj.find('difficult').text 67 | cls = obj.find('name').text 68 | if cls not in classes or int(difficult) == 1: 69 | continue 70 | cls_id = classes.index(cls) 71 | xmlbox = obj.find('bndbox') 72 | b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text), 73 | float(xmlbox.find('ymax').text)) 74 | b1, b2, b3, b4 = b 75 | # 标注越界修正 76 | if b2 > w: 77 | b2 = w 78 | if b4 > h: 79 | b4 = h 80 | b = (b1, b2, b3, b4) 81 | bb = convert((w, h), b) 82 | out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n') 83 | 84 | def convert_annotation_val(image_id): 85 | in_file = open(annotations_dir+'%s.xml' % (image_id), encoding='UTF-8') 86 | out_file = open(labels_dir+'val/%s.txt' % (image_id), 'w') 87 | tree = ET.parse(in_file) 88 | root = tree.getroot() 89 | size = root.find('size') 90 | w = int(size.find('width').text) 91 | h = int(size.find('height').text) 92 | for obj in root.iter('object'): 93 | # difficult = obj.find('difficult').text 94 | difficult = obj.find('difficult').text 95 | cls = obj.find('name').text 96 | if cls not in classes or int(difficult) == 1: 97 | continue 98 | cls_id = classes.index(cls) 99 | xmlbox = obj.find('bndbox') 100 | b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text), 101 | float(xmlbox.find('ymax').text)) 102 | b1, b2, b3, b4 = b 103 | # 标注越界修正 104 | if b2 > w: 105 | b2 = w 106 | if b4 > h: 107 | b4 = h 108 | b = (b1, b2, b3, b4) 109 | bb = convert((w, h), b) 110 | out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n') 111 | 112 | images_train_dir = os.path.join(image_dir, "train/") 113 | if not os.path.isdir(images_train_dir): # create image train folder if not exist 114 | os.mkdir(images_train_dir) 115 | images_val_dir = os.path.join(image_dir, "val/") # create image 
val folder if not exist 116 | if not os.path.isdir(images_val_dir): 117 | os.mkdir(images_val_dir) 118 | images_test_dir = os.path.join(image_dir, "test/") # create image test folder if not exist 119 | if not os.path.isdir(images_test_dir): 120 | os.mkdir(images_test_dir) 121 | 122 | if not os.path.exists(labels_dir): # create label folder 123 | os.makedirs(labels_dir) 124 | labels_train_dir = os.path.join(labels_dir, "train/") # create label train folder if not exist 125 | if not os.path.isdir(labels_train_dir): 126 | os.mkdir(labels_train_dir) 127 | labels_val_dir = os.path.join(labels_dir, "val/") # create label val folder if not exist 128 | if not os.path.isdir(labels_val_dir): 129 | os.mkdir(labels_val_dir) 130 | labels_test_dir = os.path.join(labels_dir, "test/") # create label test folder if not exist 131 | if not os.path.isdir(labels_test_dir): 132 | os.mkdir(labels_test_dir) 133 | 134 | train_dummy = open(dataset_dir+"train_dummy.txt", 'w') # create dummy train file which did include image train folder path 135 | val_dummy = open(dataset_dir+"val_dummy.txt", 'w') # create dummy val file which did include image val folder path 136 | test_dummy = open(dataset_dir+"test_dummy.txt", 'w') # create dummy val file which did include image test folder path 137 | 138 | for image_set in sets: 139 | print(image_set) 140 | image_ids = open(ImageSets_Main_dir+'%s.txt' % (image_set)).read().strip().split() 141 | list_file = open(dataset_dir+'%s.txt' % (image_set), 'w') 142 | for image_id in image_ids: 143 | if image_set=="train": 144 | train_dummy.write(image_dir+'%s.jpg\n' % (image_id)) 145 | list_file.write(image_dir+"train/"+'%s.jpg\n' % (image_id)) # place the file path with image train folder 146 | convert_annotation_train(image_id) 147 | if image_set=="val": 148 | train_dummy.close() 149 | val_dummy.write(image_dir+'%s.jpg\n' % (image_id)) 150 | list_file.write(image_dir+"val/"+'%s.jpg\n' % (image_id)) # place the file path with image val folder 151 | convert_annotation_val(image_id) 152 | if image_set=="test": 153 | val_dummy.close() 154 | test_dummy.write(image_dir+'%s.jpg\n' % (image_id)) 155 | list_file.write(image_dir+"test/"+'%s.jpg\n' % (image_id)) # place the file path with image test folder 156 | convert_annotation_test(image_id) 157 | test_dummy.close() 158 | list_file.close() 159 | 160 | train_dummy.close() 161 | val_dummy.close() 162 | test_dummy.close() 163 | 164 | train_image_file = open(dataset_dir+"train_dummy.txt", 'r') 165 | val_image_file = open(dataset_dir+"val_dummy.txt", 'r') 166 | test_image_file = open(dataset_dir+"test_dummy.txt", 'r') 167 | 168 | train_image_file_line = train_image_file.readlines() 169 | val_image_file_line = val_image_file.readlines() 170 | test_image_file_line = test_image_file.readlines() 171 | 172 | for i in range(len(train_image_file_line)): 173 | shutil.move(train_image_file_line[i].strip(), images_train_dir) # move the image into image train folder 174 | print("train"+str(i)) 175 | 176 | for i in range(len(val_image_file_line)): 177 | shutil.move(val_image_file_line[i].strip(), images_val_dir) # move the image into image val folder 178 | print("val"+str(i)) 179 | 180 | for i in range(len(test_image_file_line)): 181 | shutil.move(test_image_file_line[i].strip(), images_test_dir) # move the image into image test folder 182 | print("test"+str(i)) 183 | 184 | --------------------------------------------------------------------------------
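For reference, a typical run of the two dataset scripts from the dataset folder, assuming the default argument values and the directory layout shown above (run ```split_train_val.py``` first to create the ImageSets/Main lists, then ```voc2yolo_label.py``` to write the YOLO-format labels and sort the images into the train/val/test folders):
```
cd ~/Desktop/workspace/dataset
python3 split_train_val.py --xml_path annotations --txt_path ImageSets/Main
python3 voc2yolo_label.py
```
Each line of a generated label file is ```class_id x_center y_center width height``` with all coordinates normalized to [0,1].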