├── README.md ├── coco_dataset.py ├── data ├── dinner.png ├── dinner_result.png ├── face.jpg ├── face.png ├── face_result.png ├── football.jpg ├── football_detected.jpg ├── hand.jpg ├── hand.png ├── hand_result.png ├── people.png ├── people_result.png ├── person.png └── person_result.png ├── entity.py ├── face_detector.py ├── gen_ignore_mask.py ├── getData.sh ├── hand_detector.py ├── models ├── CocoPoseNet.py ├── FaceNet.py └── HandNet.py ├── openpose.py ├── pose_detect.py └── train.py /README.md: -------------------------------------------------------------------------------- 1 | # Pytorch0.4.1_Realtime\_Multi-Person\_Pose\_Estimation 2 | 3 | This is an implementation of [Realtime Multi-Person Pose Estimation](https://arxiv.org/abs/1611.08050) in PyTorch. 4 | The original project is here. 5 | 6 | This repo is mainly based on the [Chainer Implementation](https://github.com/DeNA/Chainer_Realtime_Multi-Person_Pose_Estimation) from DeNA. 7 | 8 | This project is licensed under the terms of the [license](https://github.com/DeNA/Chainer_Realtime_Multi-Person_Pose_Estimation/blob/master/LICENSE). 9 | 10 | The main contributions are: 11 | 12 | 1. The backbone is switched from Chainer to PyTorch 13 | 2. Pretrained PyTorch models are provided 14 | 3. Chinese comments are added for a better understanding of the method 15 | 16 | ## Content 17 | 18 | 1. [Converting caffe model](#convert-caffe-model-to-chainer-model) 19 | 2. [Testing](#test-using-the-trained-model) 20 | 3. [Training](#train-your-model) 21 | 22 | ## Test using the trained model 23 | 24 | Download the following pretrained models to the `models` folder: 25 | 26 | posenet.pth : [@Google Drive](https://drive.google.com/open?id=19AIYt2lez5V3x4wFVJvVvWwQpB8uoQp2) [@One Drive](https://1drv.ms/u/s!AhMqVPD44cDOhxrwKTyv9yv3FRVq) 27 | 28 | facenet.pth : [@Google Drive](https://drive.google.com/open?id=1zjv4fQt3Sd567VpesqAEO4JVsjB79xpZ) [@One Drive](https://1drv.ms/u/s!AhMqVPD44cDOhxwu2Nmf1eXNmOXd) 29 | 30 | handnet.pth : [@Google Drive](https://drive.google.com/open?id=1LdWngNbcamMJFAuRaqT45Iar2tA_eUXd) [@One Drive](https://1drv.ms/u/s!AhMqVPD44cDOhxs1EIBYqksR6avn) 31 | 32 | To estimate poses, execute the following command with the weight file and an image file as arguments. 33 | The resulting image will be saved as `result.jpg`. 34 | 35 | ``` 36 | python pose_detect.py models/posenet.pth -i data/football.jpg 37 | ``` 38 | 39 |
40 | *(The input image `data/football.jpg` and the detection result `data/football_detected.jpg` are shown here.)* 41 | 42 | 43 |
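The same command works for any of the other sample images in `data/`, for example:

```
python pose_detect.py models/posenet.pth -i data/dinner.png
```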
44 | 45 | 46 | 47 | Similarly, execute the following command for face estimation. 48 | The resulting image will be saved as `result.png`. 49 | 50 | ``` 51 | python face_detector.py models/facenet.pth -i data/face.png 52 | ``` 53 | 54 |
55 | *(The input image `data/face.png` and the detection result `data/face_result.png` are shown here.)* 56 | 57 | 58 |
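If you prefer to call the face detector from Python rather than from the command line, the following sketch mirrors what `face_detector.py` does in its `__main__` block (the class and helper names come from that file, and the output path matches the script's default):

```
from face_detector import FaceDetector, draw_face_keypoints
import cv2

# load the pretrained weights (the model is moved to the GPU automatically if one is available)
face_detector = FaceDetector('models/facenet.pth')

img = cv2.imread('data/face.png')

# one [x, y, confidence] entry (or None) per facial landmark
face_keypoints = face_detector.detect(img)

# draw the landmarks and save the result
img = draw_face_keypoints(cv2.cvtColor(img, cv2.COLOR_BGR2RGB), face_keypoints, (0, 0))
cv2.imwrite('result.png', img)
```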
59 | 60 | 61 | 62 | Similarly, execute the following command for hand estimation. 63 | The resulting image will be saved as `result.png`. 64 | 65 | ``` 66 | python hand_detector.py models/handnet.pth -i data/hand.jpg 67 | ``` 68 | 69 |
70 | *(The input image `data/hand.jpg` and the detection result `data/hand_result.png` are shown here.)* 71 | 72 | 73 |
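The hand detector can be driven from Python in the same way; this sketch mirrors the `__main__` block of `hand_detector.py`:

```
from hand_detector import HandDetector, draw_hand_keypoints
import cv2

hand_detector = HandDetector('models/handnet.pth')

img = cv2.imread('data/hand.jpg')

# hand_type can be "right" or "left"; left hands are flipped internally before inference
hand_keypoints = hand_detector.detect(img, hand_type="right")

# draw the detected keypoints and finger lines, then save the result
img = draw_hand_keypoints(cv2.cvtColor(img, cv2.COLOR_BGR2RGB), hand_keypoints, (0, 0))
cv2.imwrite('result.png', img)
```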
74 | 75 | 76 | 77 | ## Train your model 78 | 79 | This is the training procedure using the COCO 2017 dataset. 80 | 81 | ### Download COCO 2017 dataset 82 | 83 | ``` 84 | cd data 85 | bash getData.sh 86 | ``` 87 | 88 | If you have already downloaded the dataset yourself, skip this step and change `coco_dir` in `entity.py` to the path of the existing dataset. 89 | 90 | ### Setup COCO API 91 | 92 | ``` 93 | git clone https://github.com/cocodataset/cocoapi.git 94 | cd cocoapi/PythonAPI/ 95 | make 96 | python setup.py install 97 | cd ../../ 98 | ``` 99 | 100 | ### Download [VGG-19 pretrained model](https://1drv.ms/u/s!AhMqVPD44cDOhx3dz655sCwOck2X) to the `models` folder 101 | 102 | ### Generate and save image masks 103 | 104 | Mask images are created in order to filter out regions containing people who were not labeled with any keypoints. 105 | The `--vis` option can be used to visualize the mask generated for each image. 106 | 107 | ``` 108 | python gen_ignore_mask.py 109 | ``` 110 | 111 | ### Train with COCO dataset 112 | 113 | Every 1000 iterations, the current weight parameters are saved to a weight file such as `model_iter_1000`. 114 | 115 | ``` 116 | python train.py 117 | ``` 118 | 119 | More training configuration options are in the `entity.py` file. 120 | 121 | ## Related repository 122 | 123 | - CVPR'16, [Convolutional Pose Machines](https://github.com/shihenw/convolutional-pose-machines-release). 124 | - CVPR'17, [Realtime Multi-Person Pose Estimation](https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation). 125 | 126 | 127 | 128 | ## Citation 129 | 130 | Please cite the original paper in your publications if it helps your research: 131 | 132 | ``` 133 | @InProceedings{cao2017realtime, 134 | title = {Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields}, 135 | author = {Zhe Cao and Tomas Simon and Shih-En Wei and Yaser Sheikh}, 136 | booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, 137 | year = {2017} 138 | } 139 | ``` -------------------------------------------------------------------------------- /coco_dataset.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import cv2 4 | import math 5 | import random 6 | import numpy as np 7 | import torch 8 | from torch.utils.data import Dataset 9 | from pycocotools.coco import COCO 10 | 11 | from entity import JointType, params 12 | 13 | class CocoDataset(Dataset): 14 | def __init__(self, coco, insize, mode='train', n_samples=None): 15 | self.coco = coco 16 | assert mode in ['train', 'val', 'eval'], 'Data loading mode is invalid.'
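        # 'train' and 'val' produce augmented (img, pafs, heatmaps, ignore_mask) training samples,
        # while 'eval' returns the raw image together with its COCO annotations and image id (see __getitem__)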
17 | self.mode = mode 18 | self.catIds = coco.getCatIds(catNms=['person']) 19 | self.imgIds = sorted(coco.getImgIds(catIds=self.catIds)) 20 | if self.mode in ['val', 'eval'] and n_samples is not None: 21 | self.imgIds = random.sample(self.imgIds, n_samples) 22 | print('{} images: {}'.format(mode, len(self))) 23 | self.insize = insize 24 | 25 | def __len__(self): 26 | return len(self.imgIds) 27 | 28 | def overlay_paf(self, img, paf): 29 | hue = ((np.arctan2(paf[1], paf[0]) / np.pi) / -2 + 0.5) 30 | saturation = np.sqrt(paf[0] ** 2 + paf[1] ** 2) 31 | saturation[saturation > 1.0] = 1.0 32 | value = saturation.copy() 33 | hsv_paf = np.vstack((hue[np.newaxis], saturation[np.newaxis], value[np.newaxis])).transpose(1, 2, 0) 34 | rgb_paf = cv2.cvtColor((hsv_paf * 255).astype(np.uint8), cv2.COLOR_HSV2BGR) 35 | img = cv2.addWeighted(img, 0.6, rgb_paf, 0.4, 0) 36 | return img 37 | 38 | def overlay_pafs(self, img, pafs): 39 | mix_paf = np.zeros((2,) + img.shape[:-1]) 40 | paf_flags = np.zeros(mix_paf.shape) # for constant paf 41 | 42 | for paf in pafs.reshape((int(pafs.shape[0]/2), 2,) + pafs.shape[1:]): 43 | paf_flags = paf != 0 44 | paf_flags += np.broadcast_to(paf_flags[0] | paf_flags[1], paf.shape) 45 | mix_paf += paf 46 | 47 | mix_paf[paf_flags > 0] /= paf_flags[paf_flags > 0] 48 | img = self.overlay_paf(img, mix_paf) 49 | return img 50 | 51 | def overlay_heatmap(self, img, heatmap): 52 | rgb_heatmap = cv2.applyColorMap((heatmap * 255).astype(np.uint8), cv2.COLORMAP_JET) 53 | img = cv2.addWeighted(img, 0.6, rgb_heatmap, 0.4, 0) 54 | return img 55 | 56 | def overlay_ignore_mask(self, img, ignore_mask): 57 | img = img * np.repeat((ignore_mask == 0).astype(np.uint8)[:, :, None], 3, axis=2) 58 | return img 59 | 60 | def get_pose_bboxes(self, poses): 61 | pose_bboxes = [] 62 | for pose in poses: 63 | x1 = pose[pose[:, 2] > 0][:, 0].min() 64 | y1 = pose[pose[:, 2] > 0][:, 1].min() 65 | x2 = pose[pose[:, 2] > 0][:, 0].max() 66 | y2 = pose[pose[:, 2] > 0][:, 1].max() 67 | pose_bboxes.append([x1, y1, x2, y2]) 68 | pose_bboxes = np.array(pose_bboxes) 69 | return pose_bboxes 70 | 71 | def resize_data(self, img, ignore_mask, poses, shape): 72 | """resize img, mask and annotations""" 73 | img_h, img_w, _ = img.shape 74 | 75 | resized_img = cv2.resize(img, shape) 76 | ignore_mask = cv2.resize(ignore_mask.astype(np.uint8), shape).astype('bool') 77 | poses[:, :, :2] = (poses[:, :, :2] * np.array(shape) / np.array((img_w, img_h))) 78 | return resized_img, ignore_mask, poses 79 | 80 | def random_resize_img(self, img, ignore_mask, poses): 81 | h, w, _ = img.shape 82 | joint_bboxes = self.get_pose_bboxes(poses) 83 | bbox_sizes = ((joint_bboxes[:, 2:] - joint_bboxes[:, :2] + 1)**2).sum(axis=1)**0.5 84 | 85 | min_scale = params['min_box_size']/bbox_sizes.min() 86 | max_scale = params['max_box_size']/bbox_sizes.max() 87 | 88 | # print(len(bbox_sizes)) 89 | # print('min: {}, max: {}'.format(min_scale, max_scale)) 90 | 91 | min_scale = min(max(min_scale, params['min_scale']), 1) 92 | max_scale = min(max(max_scale, 1), params['max_scale']) 93 | 94 | # print('min: {}, max: {}'.format(min_scale, max_scale)) 95 | 96 | scale = float((max_scale - min_scale) * random.random() + min_scale) 97 | shape = (round(w * scale), round(h * scale)) 98 | 99 | # print(scale) 100 | 101 | resized_img, resized_mask, resized_poses = self.resize_data(img, ignore_mask, poses, shape) 102 | return resized_img, resized_mask, poses 103 | 104 | def random_rotate_img(self, img, mask, poses): 105 | h, w, _ = img.shape 106 | # degree = (random.random() 
- 0.5) * 2 * params['max_rotate_degree'] 107 | degree = np.random.randn() / 3 * params['max_rotate_degree'] 108 | rad = degree * math.pi / 180 109 | center = (w / 2, h / 2) 110 | R = cv2.getRotationMatrix2D(center, degree, 1) 111 | bbox = (w*abs(math.cos(rad)) + h*abs(math.sin(rad)), w*abs(math.sin(rad)) + h*abs(math.cos(rad))) 112 | R[0, 2] += bbox[0] / 2 - center[0] 113 | R[1, 2] += bbox[1] / 2 - center[1] 114 | rotate_img = cv2.warpAffine(img, R, (int(bbox[0]+0.5), int(bbox[1]+0.5)), flags=cv2.INTER_CUBIC, 115 | borderMode=cv2.BORDER_CONSTANT, borderValue=[127.5, 127.5, 127.5]) 116 | rotate_mask = cv2.warpAffine(mask.astype('uint8')*255, R, (int(bbox[0]+0.5), int(bbox[1]+0.5))) > 0 117 | 118 | tmp_poses = np.ones_like(poses) 119 | tmp_poses[:, :, :2] = poses[:, :, :2].copy() 120 | tmp_rotate_poses = np.dot(tmp_poses, R.T) # apply rotation matrix to the poses 121 | rotate_poses = poses.copy() # to keep visibility flag 122 | rotate_poses[:, :, :2] = tmp_rotate_poses 123 | return rotate_img, rotate_mask, rotate_poses 124 | 125 | def random_crop_img(self, img, ignore_mask, poses): 126 | h, w, _ = img.shape 127 | insize = self.insize 128 | joint_bboxes = self.get_pose_bboxes(poses) 129 | bbox = random.choice(joint_bboxes) # select a bbox randomly 130 | bbox_center = bbox[:2] + (bbox[2:] - bbox[:2])/2 131 | 132 | r_xy = np.random.rand(2) 133 | perturb = ((r_xy - 0.5) * 2 * params['center_perterb_max']) 134 | center = (bbox_center + perturb + 0.5).astype('i') 135 | 136 | crop_img = np.zeros((insize, insize, 3), 'uint8') + 127.5 137 | crop_mask = np.zeros((insize, insize), 'bool') 138 | 139 | offset = (center - (insize-1)/2 + 0.5).astype('i') 140 | offset_ = (center + (insize-1)/2 - (w-1, h-1) + 0.5).astype('i') 141 | 142 | x1, y1 = (center - (insize-1)/2 + 0.5).astype('i') 143 | x2, y2 = (center + (insize-1)/2 + 0.5).astype('i') 144 | 145 | x1 = max(x1, 0) 146 | y1 = max(y1, 0) 147 | x2 = min(x2, w-1) 148 | y2 = min(y2, h-1) 149 | 150 | x_from = -offset[0] if offset[0] < 0 else 0 151 | y_from = -offset[1] if offset[1] < 0 else 0 152 | x_to = insize - offset_[0] - 1 if offset_[0] >= 0 else insize - 1 153 | y_to = insize - offset_[1] - 1 if offset_[1] >= 0 else insize - 1 154 | 155 | crop_img[y_from:y_to+1, x_from:x_to+1] = img[y1:y2+1, x1:x2+1].copy() 156 | crop_mask[y_from:y_to+1, x_from:x_to+1] = ignore_mask[y1:y2+1, x1:x2+1].copy() 157 | 158 | poses[:, :, :2] -= offset 159 | return crop_img.astype('uint8'), crop_mask, poses 160 | 161 | def distort_color(self, img): 162 | img_max = np.broadcast_to(np.array(255, dtype=np.uint8), img.shape[:-1]) 163 | img_min = np.zeros(img.shape[:-1], dtype=np.uint8) 164 | 165 | hsv_img = cv2.cvtColor(img.copy(), cv2.COLOR_BGR2HSV).astype(np.int32) 166 | hsv_img[:, :, 0] = np.maximum(np.minimum(hsv_img[:, :, 0] - 10 + np.random.randint(20 + 1), img_max), img_min) # hue 167 | hsv_img[:, :, 1] = np.maximum(np.minimum(hsv_img[:, :, 1] - 40 + np.random.randint(80 + 1), img_max), img_min) # saturation 168 | hsv_img[:, :, 2] = np.maximum(np.minimum(hsv_img[:, :, 2] - 30 + np.random.randint(60 + 1), img_max), img_min) # value 169 | hsv_img = hsv_img.astype(np.uint8) 170 | 171 | distorted_img = cv2.cvtColor(hsv_img, cv2.COLOR_HSV2BGR) 172 | return distorted_img 173 | 174 | def flip_img(self, img, mask, poses): 175 | flipped_img = cv2.flip(img, 1) 176 | flipped_mask = cv2.flip(mask.astype(np.uint8), 1).astype('bool') 177 | poses[:, :, 0] = img.shape[1] - 1 - poses[:, :, 0] 178 | 179 | def swap_joints(poses, joint_type_1, joint_type_2): 180 | tmp = poses[:, 
joint_type_1].copy() 181 | poses[:, joint_type_1] = poses[:, joint_type_2] 182 | poses[:, joint_type_2] = tmp 183 | 184 | swap_joints(poses, JointType.LeftEye, JointType.RightEye) 185 | swap_joints(poses, JointType.LeftEar, JointType.RightEar) 186 | swap_joints(poses, JointType.LeftShoulder, JointType.RightShoulder) 187 | swap_joints(poses, JointType.LeftElbow, JointType.RightElbow) 188 | swap_joints(poses, JointType.LeftHand, JointType.RightHand) 189 | swap_joints(poses, JointType.LeftWaist, JointType.RightWaist) 190 | swap_joints(poses, JointType.LeftKnee, JointType.RightKnee) 191 | swap_joints(poses, JointType.LeftFoot, JointType.RightFoot) 192 | return flipped_img, flipped_mask, poses 193 | 194 | def augment_data(self, img, ignore_mask, poses): 195 | aug_img = img.copy() 196 | aug_img, ignore_mask, poses = self.random_resize_img(aug_img, ignore_mask, poses) 197 | aug_img, ignore_mask, poses = self.random_rotate_img(aug_img, ignore_mask, poses) 198 | aug_img, ignore_mask, poses = self.random_crop_img(aug_img, ignore_mask, poses) 199 | if np.random.randint(2): 200 | aug_img = self.distort_color(aug_img) 201 | if np.random.randint(2): 202 | aug_img, ignore_mask, poses = self.flip_img(aug_img, ignore_mask, poses) 203 | 204 | return aug_img, ignore_mask, poses 205 | 206 | # return shape: (height, width) 207 | def generate_gaussian_heatmap(self, shape, joint, sigma): 208 | x, y = joint 209 | grid_x = np.tile(np.arange(shape[1]), (shape[0], 1)) 210 | grid_y = np.tile(np.arange(shape[0]), (shape[1], 1)).transpose() 211 | grid_distance = (grid_x - x) ** 2 + (grid_y - y) ** 2 212 | gaussian_heatmap = np.exp(-0.5 * grid_distance / sigma**2) 213 | #产生的就是一整张图的gaussian分布,只不过里中心点远的点非常非常小 214 | return gaussian_heatmap 215 | 216 | def generate_heatmaps(self, img, poses, heatmap_sigma): 217 | heatmaps = np.zeros((0,) + img.shape[:-1]) 218 | sum_heatmap = np.zeros(img.shape[:-1]) 219 | for joint_index in range(len(JointType)): 220 | heatmap = np.zeros(img.shape[:-1]) 221 | for pose in poses: 222 | if pose[joint_index, 2] > 0: 223 | jointmap = self.generate_gaussian_heatmap(img.shape[:-1], pose[joint_index][:2], heatmap_sigma) 224 | heatmap[jointmap > heatmap] = jointmap[jointmap > heatmap] 225 | sum_heatmap[jointmap > sum_heatmap] = jointmap[jointmap > sum_heatmap] 226 | heatmaps = np.vstack((heatmaps, heatmap.reshape((1,) + heatmap.shape))) 227 | bg_heatmap = 1 - sum_heatmap # background channel 228 | heatmaps = np.vstack((heatmaps, bg_heatmap[None])) 229 | ''' 230 | We take the maximum of the confidence maps insteaof the average so that thprecision of close by peaks remains distinct, 231 | as illus- trated in the right figure. At test time, we predict confidence maps (as shown in the first row of Fig. 4), 232 | and obtain body part candidates by performing non-maximum suppression. 233 | At test time, we predict confidence maps (as shown in the first row of Fig. 4), 234 | and obtain body part candidates by performing non-maximum suppression. 
235 | ''' 236 | return heatmaps.astype('f') 237 | 238 | # return shape: (2, height, width) 239 | def generate_constant_paf(self, shape, joint_from, joint_to, paf_width): 240 | if np.array_equal(joint_from, joint_to): # same joint 241 | return np.zeros((2,) + shape[:-1]) 242 | 243 | joint_distance = np.linalg.norm(joint_to - joint_from) 244 | unit_vector = (joint_to - joint_from) / joint_distance 245 | rad = np.pi / 2 246 | rot_matrix = np.array([[np.cos(rad), np.sin(rad)], [-np.sin(rad), np.cos(rad)]]) 247 | vertical_unit_vector = np.dot(rot_matrix, unit_vector) # 垂直分量 248 | grid_x = np.tile(np.arange(shape[1]), (shape[0], 1)) 249 | grid_y = np.tile(np.arange(shape[0]), (shape[1], 1)).transpose() # grid_x, grid_y用来遍历图上的每一个点 250 | horizontal_inner_product = unit_vector[0] * (grid_x - joint_from[0]) + unit_vector[1] * (grid_y - joint_from[1]) 251 | horizontal_paf_flag = (0 <= horizontal_inner_product) & (horizontal_inner_product <= joint_distance) 252 | ''' 253 | 相当于遍历图上的每一个点,从这个点到joint_from的向量与unit_vector点乘 254 | 两个向量点乘相当于取一个向量在另一个向量方向上的投影 255 | 如果点乘大于0,那就可以判断这个点在不在这个躯干的方向上了, 256 | (0 <= horizontal_inner_product) & (horizontal_inner_product <= joint_distance) 257 | 这个限制条件是保证在与躯干水平的方向上,找出所有落在躯干范围内的点 258 | 然而还要判断这个点离躯干的距离有多远 259 | ''' 260 | vertical_inner_product = vertical_unit_vector[0] * (grid_x - joint_from[0]) + vertical_unit_vector[1] * (grid_y - joint_from[1]) 261 | vertical_paf_flag = np.abs(vertical_inner_product) <= paf_width # paf_width : 8 262 | ''' 263 | 要判断这个点离躯干的距离有多远,只要拿与起始点的向量点乘垂直分量就可以了, 264 | 所以这里的限制条件是paf_width, 不然一个手臂就无限粗了 265 | vertical_paf_flag = np.abs(vertical_inner_product) <= paf_width 266 | 这个限制条件是保证在与躯干垂直的方向上,找出所有落在躯干范围内的点(这个躯干范围看来是手工定义的) 267 | ''' 268 | paf_flag = horizontal_paf_flag & vertical_paf_flag # 合并两个限制条件 269 | constant_paf = np.stack((paf_flag, paf_flag)) * np.broadcast_to(unit_vector, shape[:-1] + (2,)).transpose(2, 0, 1) 270 | # constant_paf.shape : (2, 368, 368), 上面这一步就是把2维的unit_vector broadcast到所有paf_flag为true的点上去 271 | # constant_paf里面有368*368个点,每个点上有两个值,代表一个矢量 272 | # constant_paf里的这些矢量只会取两种值,要么是(0,0),要么是unit_vector的值 273 | '''最后,这个函数完成的是论文里公式8和公式9,相关说明也可以看论文这一段的描述''' 274 | return constant_paf 275 | 276 | def generate_pafs(self, img, poses, paf_sigma): 277 | pafs = np.zeros((0,) + img.shape[:-1]) 278 | 279 | for limb in params['limbs_point']: 280 | paf = np.zeros((2,) + img.shape[:-1]) 281 | paf_flags = np.zeros(paf.shape) # for constant paf 282 | 283 | for pose in poses: 284 | joint_from, joint_to = pose[limb] 285 | if joint_from[2] > 0 and joint_to[2] > 0: 286 | limb_paf = self.generate_constant_paf(img.shape, joint_from[:2], joint_to[:2], paf_sigma) #[2,368,368] 287 | limb_paf_flags = limb_paf != 0 288 | paf_flags += np.broadcast_to(limb_paf_flags[0] | limb_paf_flags[1], limb_paf.shape) 289 | ''' 290 | 这个flags的作用是计数,在遍历了一张图上的所有人体之后,有的地方可能会有重叠, 291 | 比如说两个人的左手臂交织在一起,重叠的部分就累加了两次, 292 | 这里计数了之后,后面可以用来求均值 293 | ''' 294 | paf += limb_paf 295 | 296 | paf[paf_flags > 0] /= paf_flags[paf_flags > 0] # 求均值 297 | pafs = np.vstack((pafs, paf)) 298 | return pafs.astype('f') 299 | 300 | def get_img_annotation(self, ind=None, img_id=None): 301 | """インデックスまたは img_id から coco annotation dataを抽出、条件に満たない場合はNoneを返す """ 302 | '''从索引或img_id中提取coco注释数据,如果不符合条件,则返回None''' 303 | annotations = None 304 | 305 | if ind is not None: 306 | img_id = self.imgIds[ind] 307 | anno_ids = self.coco.getAnnIds(imgIds=[img_id], iscrowd=None) 308 | 309 | # annotation for that image 310 | if len(anno_ids) > 0: 311 | annotations_for_img = self.coco.loadAnns(anno_ids) 312 | 313 | 
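            # keep only person annotations with at least params['min_keypoints'] labeled keypoints
            # and an area larger than params['min_area']; if none qualify, annotations stays None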
person_cnt = 0 314 | valid_annotations_for_img = [] 315 | for annotation in annotations_for_img: 316 | # if too few keypoints or too small 317 | if annotation['num_keypoints'] >= params['min_keypoints'] and annotation['area'] > params['min_area']: 318 | person_cnt += 1 319 | valid_annotations_for_img.append(annotation) 320 | 321 | # if person annotation 322 | if person_cnt > 0: 323 | annotations = valid_annotations_for_img 324 | 325 | if self.mode == 'train': 326 | img_path = os.path.join(params['coco_dir'], 'train2017', self.coco.loadImgs([img_id])[0]['file_name']) 327 | mask_path = os.path.join(params['coco_dir'], 'ignore_mask_train2017', '{:012d}.png'.format(img_id)) 328 | else: 329 | img_path = os.path.join(params['coco_dir'], 'val2017', self.coco.loadImgs([img_id])[0]['file_name']) 330 | mask_path = os.path.join(params['coco_dir'], 'ignore_mask_val2017', '{:012d}.png'.format(img_id)) 331 | img = cv2.imread(img_path) 332 | ignore_mask = cv2.imread(mask_path, 0) 333 | if ignore_mask is None: 334 | ignore_mask = np.zeros(img.shape[:2], 'bool') 335 | else: 336 | ignore_mask = ignore_mask == 255 337 | 338 | if self.mode == 'eval': 339 | return img, img_id, annotations_for_img, ignore_mask 340 | return img, img_id, annotations, ignore_mask 341 | 342 | def parse_coco_annotation(self, annotations): 343 | """coco annotation dataのアノテーションをposes配列に変換""" 344 | '''将coco注释数据注释转换为姿势数组''' 345 | poses = np.zeros((0, len(JointType), 3), dtype=np.int32) 346 | 347 | for ann in annotations: 348 | ann_pose = np.array(ann['keypoints']).reshape(-1, 3) 349 | pose = np.zeros((1, len(JointType), 3), dtype=np.int32) 350 | 351 | # convert poses position 352 | for i, joint_index in enumerate(params['coco_joint_indices']): 353 | pose[0][joint_index] = ann_pose[i] 354 | 355 | # compute neck position 356 | if pose[0][JointType.LeftShoulder][2] > 0 and pose[0][JointType.RightShoulder][2] > 0: 357 | pose[0][JointType.Neck][0] = int((pose[0][JointType.LeftShoulder][0] + pose[0][JointType.RightShoulder][0]) / 2) 358 | pose[0][JointType.Neck][1] = int((pose[0][JointType.LeftShoulder][1] + pose[0][JointType.RightShoulder][1]) / 2) 359 | pose[0][JointType.Neck][2] = 2 360 | 361 | poses = np.vstack((poses, pose)) 362 | 363 | # gt_pose = np.array(ann['keypoints']).reshape(-1, 3) 364 | return poses 365 | 366 | def generate_labels(self, img, poses, ignore_mask): 367 | img, ignore_mask, poses = self.augment_data(img, ignore_mask, poses) 368 | resized_img, ignore_mask, resized_poses = self.resize_data(img, ignore_mask, poses, shape=(self.insize, self.insize)) 369 | 370 | heatmaps = self.generate_heatmaps(resized_img, resized_poses, params['heatmap_sigma']) 371 | pafs = self.generate_pafs(resized_img, resized_poses, params['paf_sigma']) # params['paf_sigma']: 8 372 | ignore_mask = cv2.morphologyEx(ignore_mask.astype('uint8'), cv2.MORPH_DILATE, np.ones((16, 16))).astype('bool') 373 | return resized_img, pafs, heatmaps, ignore_mask 374 | 375 | def preprocess(self, img): 376 | x_data = img.astype('f') 377 | x_data /= 255 378 | x_data -= 0.5 379 | x_data = x_data.transpose(2, 0, 1) 380 | return x_data 381 | 382 | def __getitem__(self, i): 383 | img, img_id, annotations, ignore_mask = self.get_img_annotation(ind=i) 384 | 385 | if self.mode == 'eval': 386 | # don't need to make heatmaps/pafs 387 | return img, annotations, img_id 388 | 389 | # if no annotations are available 390 | while annotations is None: 391 | img_id = self.imgIds[np.random.randint(len(self))] 392 | img, img_id, annotations, ignore_mask = 
self.get_img_annotation(img_id=img_id) 393 | 394 | poses = self.parse_coco_annotation(annotations) 395 | resized_img, pafs, heatmaps, ignore_mask = self.generate_labels(img, poses, ignore_mask) 396 | resized_img = self.preprocess(resized_img) 397 | resized_img = torch.tensor(resized_img) 398 | pafs = torch.tensor(pafs) 399 | heatmaps = torch.tensor(heatmaps) 400 | ignore_mask = torch.tensor(ignore_mask.astype('f')) 401 | return resized_img, pafs, heatmaps, ignore_mask -------------------------------------------------------------------------------- /data/dinner.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/dinner.png -------------------------------------------------------------------------------- /data/dinner_result.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/dinner_result.png -------------------------------------------------------------------------------- /data/face.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/face.jpg -------------------------------------------------------------------------------- /data/face.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/face.png -------------------------------------------------------------------------------- /data/face_result.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/face_result.png -------------------------------------------------------------------------------- /data/football.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/football.jpg -------------------------------------------------------------------------------- /data/football_detected.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/football_detected.jpg -------------------------------------------------------------------------------- /data/hand.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/hand.jpg -------------------------------------------------------------------------------- /data/hand.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/hand.png -------------------------------------------------------------------------------- /data/hand_result.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/hand_result.png 
-------------------------------------------------------------------------------- /data/people.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/people.png -------------------------------------------------------------------------------- /data/people_result.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/people_result.png -------------------------------------------------------------------------------- /data/person.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/person.png -------------------------------------------------------------------------------- /data/person_result.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/person_result.png -------------------------------------------------------------------------------- /entity.py: -------------------------------------------------------------------------------- 1 | from enum import IntEnum 2 | 3 | from models.CocoPoseNet import CocoPoseNet 4 | 5 | from models.FaceNet import FaceNet 6 | from models.HandNet import HandNet 7 | from pathlib import Path 8 | 9 | class JointType(IntEnum): 10 | """関節の種類を表す """ 11 | Nose = 0 12 | """ 鼻 """ 13 | Neck = 1 14 | """ 首 """ 15 | RightShoulder = 2 16 | """ 右肩 """ 17 | RightElbow = 3 18 | """ 右肘 """ 19 | RightHand = 4 20 | """ 右手 """ 21 | LeftShoulder = 5 22 | """ 左肩 """ 23 | LeftElbow = 6 24 | """ 左肘 """ 25 | LeftHand = 7 26 | """ 左手 """ 27 | RightWaist = 8 28 | """ 右腰 """ 29 | RightKnee = 9 30 | """ 右膝 """ 31 | RightFoot = 10 32 | """ 右足 """ 33 | LeftWaist = 11 34 | """ 左腰 """ 35 | LeftKnee = 12 36 | """ 左膝 """ 37 | LeftFoot = 13 38 | """ 左足 """ 39 | RightEye = 14 40 | """ 右目 """ 41 | LeftEye = 15 42 | """ 左目 """ 43 | RightEar = 16 44 | """ 右耳 """ 45 | LeftEar = 17 46 | """ 左耳 """ 47 | 48 | params = { 49 | 'coco_dir': 'coco2017', 50 | 'archs': { 51 | 'posenet': CocoPoseNet, 52 | 'facenet': FaceNet, 53 | 'handnet': HandNet, 54 | }, 55 | 'pretrained_path' : 'models/pretrained_vgg_base.pth', 56 | # training params 57 | 'min_keypoints': 5, 58 | 'min_area': 32 * 32, 59 | 'insize': 368, 60 | 'downscale': 8, 61 | 'paf_sigma': 8, 62 | 'heatmap_sigma': 7, 63 | 'batch_size': 10, 64 | 'lr': 1e-4, 65 | 'num_workers': 2, 66 | 'eva_num': 100, 67 | 'board_loss_interval': 100, 68 | 'eval_interval': 4, 69 | 'board_pred_image_interval': 2, 70 | 'save_interval': 2, 71 | 'log_path': 'work_space/log', 72 | 'work_space': Path('work_space'), 73 | 74 | 'min_box_size': 64, 75 | 'max_box_size': 512, 76 | 'min_scale': 0.5, 77 | 'max_scale': 2.0, 78 | 'max_rotate_degree': 40, 79 | 'center_perterb_max': 40, 80 | 81 | # inference params 82 | 'inference_img_size': 368, 83 | 'inference_scales': [0.5, 1, 1.5, 2], 84 | # 'inference_scales': [1.0], 85 | 'heatmap_size': 320, 86 | 'gaussian_sigma': 2.5, 87 | 'ksize': 17, 88 | 'n_integ_points': 10, 89 | 'n_integ_points_thresh': 8, 90 | 'heatmap_peak_thresh': 0.05, 91 | 'inner_product_thresh': 0.05, 92 | 'limb_length_ratio': 1.0, 93 | 'length_penalty_value': 1, 94 | 'n_subset_limbs_thresh': 3, 95 | 'subset_score_thresh': 
0.2, 96 | 'limbs_point': [ 97 | [JointType.Neck, JointType.RightWaist], 98 | [JointType.RightWaist, JointType.RightKnee], 99 | [JointType.RightKnee, JointType.RightFoot], 100 | [JointType.Neck, JointType.LeftWaist], 101 | [JointType.LeftWaist, JointType.LeftKnee], 102 | [JointType.LeftKnee, JointType.LeftFoot], 103 | [JointType.Neck, JointType.RightShoulder], 104 | [JointType.RightShoulder, JointType.RightElbow], 105 | [JointType.RightElbow, JointType.RightHand], 106 | [JointType.RightShoulder, JointType.RightEar], 107 | [JointType.Neck, JointType.LeftShoulder], 108 | [JointType.LeftShoulder, JointType.LeftElbow], 109 | [JointType.LeftElbow, JointType.LeftHand], 110 | [JointType.LeftShoulder, JointType.LeftEar], 111 | [JointType.Neck, JointType.Nose], 112 | [JointType.Nose, JointType.RightEye], 113 | [JointType.Nose, JointType.LeftEye], 114 | [JointType.RightEye, JointType.RightEar], 115 | [JointType.LeftEye, JointType.LeftEar] 116 | ], 117 | 'coco_joint_indices': [ 118 | JointType.Nose, 119 | JointType.LeftEye, 120 | JointType.RightEye, 121 | JointType.LeftEar, 122 | JointType.RightEar, 123 | JointType.LeftShoulder, 124 | JointType.RightShoulder, 125 | JointType.LeftElbow, 126 | JointType.RightElbow, 127 | JointType.LeftHand, 128 | JointType.RightHand, 129 | JointType.LeftWaist, 130 | JointType.RightWaist, 131 | JointType.LeftKnee, 132 | JointType.RightKnee, 133 | JointType.LeftFoot, 134 | JointType.RightFoot 135 | ], 136 | 137 | # face params 138 | 'face_inference_img_size': 368, 139 | 'face_heatmap_peak_thresh': 0.1, 140 | 'face_crop_scale': 1.5, 141 | 'face_line_indices': [ 142 | [0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7], [7, 8], [8, 9], [9, 10], [10, 11], [11, 12], [12, 13], [13, 14], [14, 15], [15, 16], # 輪郭 143 | [17, 18], [18, 19], [19, 20], [20, 21], # 右眉 144 | [22, 23], [23, 24], [24, 25], [25, 26], # 左眉 145 | [27, 28], [28, 29], [29, 30], # 鼻 146 | [31, 32], [32, 33], [33, 34], [34, 35], # 鼻下の横線 147 | [36, 37], [37, 38], [38, 39], [39, 40], [40, 41], [41, 36], # 右目 148 | [42, 43], [43, 44], [44, 45], [45, 46], [46, 47], [47, 42], # 左目 149 | [48, 49], [49, 50], [50, 51], [51, 52], [52, 53], [53, 54], [54, 55], [55, 56], [56, 57], [57, 58], [58, 59], [59, 48], # 唇外輪 150 | [60, 61], [61, 62], [62, 63], [63, 64], [64, 65], [65, 66], [66, 67], [67, 60] # 唇内輪 151 | ], 152 | 153 | # hand params 154 | 'hand_inference_img_size': 368, 155 | 'hand_heatmap_peak_thresh': 0.1, 156 | 'fingers_indices': [ 157 | [[0, 1], [1, 2], [2, 3], [3, 4]], 158 | [[0, 5], [5, 6], [6, 7], [7, 8]], 159 | [[0, 9], [9, 10], [10, 11], [11, 12]], 160 | [[0, 13], [13, 14], [14, 15], [15, 16]], 161 | [[0, 17], [17, 18], [18, 19], [19, 20]], 162 | ], 163 | } 164 | -------------------------------------------------------------------------------- /face_detector.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import argparse 3 | import numpy as np 4 | from scipy.ndimage.filters import gaussian_filter 5 | import torch 6 | import torch.nn.functional as F 7 | from entity import params 8 | from models.FaceNet import FaceNet 9 | 10 | class FaceDetector(object): 11 | def __init__(self, weights_file): 12 | print('Loading FaceNet...') 13 | self.model = FaceNet() 14 | self.model.load_state_dict(torch.load(weights_file)) 15 | 16 | self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 17 | self.model = self.model.to(self.device) 18 | 19 | def detect(self, face_img, fast_mode=False): 20 | face_img_h, face_img_w, _ = face_img.shape 21 | 22 | 
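        # resize to the fixed inference size, scale pixels to roughly [-0.5, 0.5] and reorder to NCHW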
resized_image = cv2.resize(face_img, (params["face_inference_img_size"], params["face_inference_img_size"])) 23 | x_data = np.array(resized_image[np.newaxis], dtype=np.float32).transpose(0, 3, 1, 2) / 256 - 0.5 24 | x_data = torch.tensor(x_data).to(self.device) 25 | x_data.requires_grad = False 26 | 27 | with torch.no_grad(): 28 | hs = self.model(x_data) 29 | heatmaps = F.interpolate(hs[-1], (face_img_h, face_img_w), mode='bilinear', align_corners=True).cpu().numpy()[0] 30 | 31 | keypoints = self.compute_peaks_from_heatmaps(heatmaps) 32 | return keypoints 33 | 34 | def compute_peaks_from_heatmaps(self, heatmaps): 35 | keypoints = [] 36 | 37 | for i in range(heatmaps.shape[0] - 1): 38 | heatmap = gaussian_filter(heatmaps[i], sigma=params['gaussian_sigma']) 39 | max_value = heatmap.max() 40 | if max_value > params['face_heatmap_peak_thresh']: 41 | coords = np.array(np.where(heatmap==max_value)).flatten().tolist() 42 | keypoints.append([coords[1], coords[0], max_value]) # x, y, conf 43 | else: 44 | keypoints.append(None) 45 | 46 | return keypoints 47 | 48 | def draw_face_keypoints(orig_img, face_keypoints, left_top): 49 | orig_img = cv2.cvtColor(orig_img, cv2.COLOR_BGR2RGB) 50 | img = orig_img.copy() 51 | left, top = left_top 52 | 53 | for keypoint in face_keypoints: 54 | if keypoint: 55 | x, y, conf = keypoint 56 | cv2.circle(img, (x + left, y + top), 2, (255, 255, 0), -1) 57 | 58 | for face_line_index in params["face_line_indices"]: 59 | keypoint_from = face_keypoints[face_line_index[0]] 60 | keypoint_to = face_keypoints[face_line_index[1]] 61 | 62 | if keypoint_from and keypoint_to: 63 | keypoint_from_x, keypoint_from_y, _ = keypoint_from 64 | keypoint_to_x, keypoint_to_y, _ = keypoint_to 65 | cv2.line(img, (keypoint_from_x + left, keypoint_from_y + top), (keypoint_to_x + left, keypoint_to_y + top), (255, 255, 0), 1) 66 | 67 | return img 68 | 69 | def crop_face(img, rect): 70 | orig_img_h, orig_img_w, _ = img.shape 71 | crop_center_x = rect[0] + rect[2] / 2 72 | crop_center_y = rect[1] + rect[3] / 2 73 | crop_width = rect[2] * params['face_crop_scale'] 74 | crop_height = rect[3] * params['face_crop_scale'] 75 | crop_left = max(0, int(crop_center_x - crop_width / 2)) 76 | crop_top = max(0, int(crop_center_y - crop_height / 2)) 77 | crop_right = min(orig_img_w-1, int(crop_center_x + crop_width / 2)) 78 | crop_bottom = min(orig_img_h-1, int(crop_center_y + crop_height / 2)) 79 | cropped_face = img[crop_top:crop_bottom, crop_left:crop_right] 80 | max_edge_len = np.max(cropped_face.shape[:-1]) 81 | padded_face = np.zeros((max_edge_len, max_edge_len, cropped_face.shape[-1]), dtype=np.uint8) 82 | padded_face[0:cropped_face.shape[0], 0:cropped_face.shape[1]] = cropped_face 83 | 84 | return padded_face, (crop_left, crop_top) 85 | 86 | if __name__ == '__main__': 87 | parser = argparse.ArgumentParser(description='Face detector') 88 | parser.add_argument('weights', help='weights file path') 89 | parser.add_argument('--img', '-i', help='image file path') 90 | args = parser.parse_args() 91 | 92 | # load model 93 | face_detector = FaceDetector(args.weights) 94 | 95 | # read image 96 | img = cv2.imread(args.img) 97 | 98 | # inference 99 | face_keypoints = face_detector.detect(img) 100 | 101 | # draw and save image 102 | img = draw_face_keypoints(cv2.cvtColor(img, cv2.COLOR_BGR2RGB), face_keypoints, (0, 0)) 103 | print('Saving result into result.png...') 104 | cv2.imwrite('result.png', img) 105 | -------------------------------------------------------------------------------- /gen_ignore_mask.py: 
-------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import cv2 4 | import argparse 5 | import numpy as np 6 | from tqdm import tqdm 7 | 8 | from pycocotools.coco import COCO 9 | 10 | from entity import params 11 | 12 | 13 | class CocoDataLoader(object): 14 | def __init__(self, coco, mode='train'): 15 | self.coco = coco 16 | assert mode in ['train', 'val'], 'Data loading mode is invalid.' 17 | self.mode = mode 18 | self.catIds = coco.getCatIds() # catNms=['person'] 19 | self.imgIds = sorted(coco.getImgIds(catIds=self.catIds)) 20 | 21 | def __len__(self): 22 | return len(self.imgIds) 23 | 24 | def gen_masks(self, img, annotations): 25 | mask_all = np.zeros(img.shape[:2], 'bool') 26 | mask_miss = np.zeros(img.shape[:2], 'bool') 27 | for ann in annotations: 28 | mask = self.coco.annToMask(ann).astype('bool') 29 | if ann['iscrowd'] == 1: 30 | intxn = mask_all & mask 31 | mask_miss = np.bitwise_or(mask_miss.astype(int) , np.subtract(mask, intxn, dtype=np.int32)) 32 | mask_all = np.bitwise_or(mask_all.astype(int) , mask.astype(int)) 33 | elif ann['num_keypoints'] < params['min_keypoints'] or ann['area'] <= params['min_area']: 34 | mask_all = np.bitwise_or(mask_all.astype(int) , mask.astype(int)) 35 | mask_miss = np.bitwise_or(mask_miss.astype(int) , mask.astype(int)) 36 | else: 37 | mask_all = np.bitwise_or(mask_all.astype(int) , mask.astype(int)) 38 | return mask_all, mask_miss 39 | 40 | def dwaw_gen_masks(self, img, mask, color=(0, 0, 1)): 41 | bimsk = np.repeat(mask[:, :, np.newaxis], 3, axis=2) 42 | mskd = img * bimsk.astype(np.int32) 43 | clmsk = np.ones(bimsk.shape) * bimsk 44 | for i in range(3): 45 | clmsk[:, :, i] = clmsk[:, :, i] * color[i] * 255 46 | img = img + 0.7 * clmsk - 0.7 * mskd 47 | return img.astype(np.uint8) 48 | 49 | def draw_masks_and_keypoints(self, img, annotations): 50 | for ann in annotations: 51 | # masks 52 | mask = self.coco.annToMask(ann).astype(np.uint8) 53 | if ann['iscrowd'] == 1: 54 | color = (0, 0, 1) 55 | elif ann['num_keypoints'] == 0: 56 | color = (0, 1, 0) 57 | else: 58 | color = (1, 0, 0) 59 | bimsk = np.repeat(mask[:, :, np.newaxis], 3, axis=2) 60 | mskd = img * bimsk.astype(np.int32) 61 | clmsk = np.ones(bimsk.shape) * bimsk 62 | for i in range(3): 63 | clmsk[:, :, i] = clmsk[:, :, i] * color[i] * 255 64 | img = img + 0.7 * clmsk - 0.7 * mskd 65 | 66 | # keypoints 67 | for x, y, v in np.array(ann['keypoints']).reshape(-1, 3): 68 | if v == 1: 69 | cv2.circle(img, (x, y), 3, (255, 255, 0), -1) 70 | elif v == 2: 71 | cv2.circle(img, (x, y), 3, (255, 0, 255), -1) 72 | return img.astype(np.uint8) 73 | 74 | def get_img_annotation(self, ind=None, img_id=None): 75 | """インデックスまたは img_id から coco annotation dataを抽出、条件に満たない場合はNoneを返す """ 76 | if ind is not None: 77 | img_id = self.imgIds[ind] 78 | 79 | anno_ids = self.coco.getAnnIds(imgIds=[img_id]) 80 | annotations = self.coco.loadAnns(anno_ids) 81 | 82 | img_file = os.path.join(params['coco_dir'], self.mode+'2017', self.coco.loadImgs([img_id])[0]['file_name']) 83 | img = cv2.imread(img_file) 84 | return img, annotations, img_id 85 | 86 | 87 | if __name__ == '__main__': 88 | parser = argparse.ArgumentParser() 89 | parser.add_argument('--vis', action='store_true', help='visualize annotations and ignore masks') 90 | args = parser.parse_args() 91 | 92 | for mode in ['train', 'val']: 93 | coco = COCO(os.path.join(params['coco_dir'], 'annotations/person_keypoints_{}2017.json'.format(mode))) 94 | data_loader = CocoDataLoader(coco, mode=mode) 95 | 96 | 
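        # masks are written to <coco_dir>/ignore_mask_<mode>2017/ and read back by CocoDataset during training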
save_dir = os.path.join(params['coco_dir'], 'ignore_mask_{}2017'.format(mode)) 97 | if not os.path.exists(save_dir): 98 | os.makedirs(save_dir) 99 | 100 | for i in tqdm(range(len(data_loader))): 101 | img, annotations, img_id = data_loader.get_img_annotation(ind=i) 102 | mask_all, mask_miss = data_loader.gen_masks(img, annotations) 103 | 104 | if args.vis: 105 | ann_img = data_loader.draw_masks_and_keypoints(img, annotations) 106 | msk_img = data_loader.dwaw_gen_masks(img, mask_miss) 107 | cv2.imshow('image', np.hstack((ann_img, msk_img))) 108 | k = cv2.waitKey() 109 | if k == ord('q'): 110 | break 111 | elif k == ord('s'): 112 | cv2.imwrite('aaa.png', np.hstack((ann_img, msk_img))) 113 | 114 | if np.any(mask_miss) and not args.vis: 115 | mask_miss = mask_miss.astype(np.uint8) * 255 116 | save_path = os.path.join(save_dir, '{:012d}.png'.format(img_id)) 117 | cv2.imwrite(save_path, mask_miss) 118 | -------------------------------------------------------------------------------- /getData.sh: -------------------------------------------------------------------------------- 1 | # get COCO dataset 2 | mkdir coco 3 | cd coco 4 | 5 | wget http://images.cocodataset.org/zips/train2017.zip 6 | wget http://images.cocodataset.org/zips/val2017.zip 7 | wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip 8 | 9 | unzip train2017.zip 10 | unzip val2017.zip 11 | unzip annotations_trainval2017.zip 12 | 13 | rm -f train2017.zip 14 | rm -f val2017.zip 15 | rm -f annotations_trainval2017.zip 16 | -------------------------------------------------------------------------------- /hand_detector.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import argparse 3 | import numpy as np 4 | from scipy.ndimage.filters import gaussian_filter 5 | import torch 6 | import torch.nn.functional as F 7 | from entity import params 8 | from models.HandNet import HandNet 9 | 10 | class HandDetector(object): 11 | def __init__(self, weights_file): 12 | print('Loading HandNet...') 13 | self.model = HandNet() 14 | self.model.load_state_dict(torch.load(weights_file)) 15 | 16 | self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 17 | self.model = self.model.to(self.device) 18 | 19 | def detect(self, hand_img, fast_mode=False, hand_type="right"): 20 | if hand_type == "left": 21 | hand_img = cv2.flip(hand_img, 1) 22 | 23 | hand_img_h, hand_img_w, _ = hand_img.shape 24 | 25 | resized_image = cv2.resize(hand_img, (params["hand_inference_img_size"], params["hand_inference_img_size"])) 26 | x_data = np.array(resized_image[np.newaxis], dtype=np.float32).transpose(0, 3, 1, 2) / 256 - 0.5 27 | x_data = torch.tensor(x_data).to(self.device) 28 | x_data.requires_grad = False 29 | with torch.no_grad(): 30 | hs = self.model(x_data) 31 | 32 | heatmaps = F.interpolate(hs[-1], (hand_img_h, hand_img_w), mode='bilinear', align_corners=True).cpu().numpy()[0] 33 | 34 | if hand_type == "left": 35 | heatmaps = cv2.flip(heatmaps.transpose(1, 2, 0), 1).transpose(2, 0, 1) 36 | 37 | keypoints = self.compute_peaks_from_heatmaps(heatmaps) 38 | 39 | return keypoints 40 | 41 | def compute_peaks_from_heatmaps(self, heatmaps): 42 | keypoints = [] 43 | 44 | for i in range(heatmaps.shape[0] - 1): 45 | heatmap = gaussian_filter(heatmaps[i], sigma=params['gaussian_sigma']) 46 | max_value = heatmap.max() 47 | if max_value > params['hand_heatmap_peak_thresh']: 48 | coords = np.array(np.where(heatmap==max_value)).flatten().tolist() 49 | keypoints.append([coords[1], coords[0], 
max_value]) # x, y, conf 50 | else: 51 | keypoints.append(None) 52 | 53 | return keypoints 54 | 55 | def draw_hand_keypoints(orig_img, hand_keypoints, left_top): 56 | orig_img = cv2.cvtColor(orig_img, cv2.COLOR_BGR2RGB) 57 | img = orig_img.copy() 58 | left, top = left_top 59 | 60 | finger_colors = [ 61 | (0, 0, 255), 62 | (0, 255, 255), 63 | (0, 255, 0), 64 | (255, 0, 0), 65 | (255, 0, 255), 66 | ] 67 | 68 | for i, finger_indices in enumerate(params["fingers_indices"]): 69 | for finger_line_index in finger_indices: 70 | keypoint_from = hand_keypoints[finger_line_index[0]] 71 | keypoint_to = hand_keypoints[finger_line_index[1]] 72 | 73 | if keypoint_from: 74 | keypoint_from_x, keypoint_from_y, _ = keypoint_from 75 | cv2.circle(img, (keypoint_from_x + left, keypoint_from_y + top), 3, finger_colors[i], -1) 76 | 77 | if keypoint_to: 78 | keypoint_to_x, keypoint_to_y, _ = keypoint_to 79 | cv2.circle(img, (keypoint_to_x + left, keypoint_to_y + top), 3, finger_colors[i], -1) 80 | 81 | if keypoint_from and keypoint_to: 82 | cv2.line(img, (keypoint_from_x + left, keypoint_from_y + top), (keypoint_to_x + left, keypoint_to_y + top), finger_colors[i], 1) 83 | 84 | return img 85 | 86 | if __name__ == '__main__': 87 | parser = argparse.ArgumentParser(description='Face detector') 88 | parser.add_argument('weights', help='weights file path') 89 | parser.add_argument('--img', '-i', help='image file path') 90 | args = parser.parse_args() 91 | 92 | # load model 93 | hand_detector = HandDetector(args.weights) 94 | 95 | # read image 96 | img = cv2.imread(args.img) 97 | 98 | # inference 99 | hand_keypoints = hand_detector.detect(img, hand_type="right") 100 | 101 | # draw and save image 102 | img = draw_hand_keypoints(cv2.cvtColor(img, cv2.COLOR_BGR2RGB), hand_keypoints, (0, 0)) 103 | print('Saving result into result.png...') 104 | cv2.imwrite('result.png', img) 105 | -------------------------------------------------------------------------------- /models/CocoPoseNet.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch.nn import Conv2d, Module, ReLU, MaxPool2d, init 3 | import torch.nn.functional as F 4 | import numpy as np 5 | 6 | def compute_loss(pafs_ys, heatmaps_ys, pafs_t, heatmaps_t, ignore_mask): 7 | heatmap_loss_log = [] 8 | paf_loss_log = [] 9 | total_loss = 0 10 | 11 | paf_masks = ignore_mask.unsqueeze(1).repeat([1, pafs_t.shape[1], 1, 1]) 12 | heatmap_masks = ignore_mask.unsqueeze(1).repeat([1, heatmaps_t.shape[1], 1, 1]) 13 | 14 | # compute loss on each stage 15 | for pafs_y, heatmaps_y in zip(pafs_ys, heatmaps_ys): 16 | stage_pafs_t = pafs_t.clone() 17 | stage_heatmaps_t = heatmaps_t.clone() 18 | stage_paf_masks = paf_masks.clone() 19 | stage_heatmap_masks = heatmap_masks.clone() 20 | 21 | if pafs_y.shape != stage_pafs_t.shape: 22 | with torch.no_grad(): 23 | stage_pafs_t = F.interpolate(stage_pafs_t, pafs_y.shape[2:], mode='bilinear', align_corners=True) 24 | stage_heatmaps_t = F.interpolate(stage_heatmaps_t, heatmaps_y.shape[2:], mode='bilinear', align_corners=True) 25 | stage_paf_masks = F.interpolate(stage_paf_masks, pafs_y.shape[2:]) > 0 26 | stage_heatmap_masks = F.interpolate(stage_heatmap_masks, heatmaps_y.shape[2:]) > 0 27 | 28 | with torch.no_grad(): 29 | stage_pafs_t[stage_paf_masks == 1] = pafs_y.detach()[stage_paf_masks == 1] 30 | stage_heatmaps_t[stage_heatmap_masks == 1] = heatmaps_y.detach()[stage_heatmap_masks == 1] 31 | 32 | pafs_loss = mean_square_error(pafs_y, stage_pafs_t) 33 | heatmaps_loss = 
mean_square_error(heatmaps_y, stage_heatmaps_t) 34 | 35 | total_loss += pafs_loss + heatmaps_loss 36 | 37 | paf_loss_log.append(pafs_loss.item()) 38 | heatmap_loss_log.append(heatmaps_loss.item()) 39 | 40 | return total_loss, np.array(paf_loss_log), np.array(heatmap_loss_log) 41 | 42 | def mean_square_error(pred, target): 43 | assert pred.shape == target.shape, 'x and y should in same shape' 44 | return torch.sum((pred - target) ** 2) / target.nelement() 45 | 46 | class CocoPoseNet(Module): 47 | insize = 368 48 | def __init__(self, path = None): 49 | super(CocoPoseNet, self).__init__() 50 | self.base = Base_model() 51 | self.stage_1 = Stage_1() 52 | self.stage_2 = Stage_x() 53 | self.stage_3 = Stage_x() 54 | self.stage_4 = Stage_x() 55 | self.stage_5 = Stage_x() 56 | self.stage_6 = Stage_x() 57 | for m in self.modules(): 58 | if isinstance(m, Conv2d): 59 | init.constant_(m.bias, 0) 60 | if path: 61 | self.base.vgg_base.load_state_dict(torch.load(path)) 62 | 63 | def forward(self, x): 64 | heatmaps = [] 65 | pafs = [] 66 | feature_map = self.base(x) 67 | h1, h2 = self.stage_1(feature_map) 68 | pafs.append(h1) 69 | heatmaps.append(h2) 70 | h1, h2 = self.stage_2(torch.cat([h1, h2, feature_map], dim = 1)) 71 | pafs.append(h1) 72 | heatmaps.append(h2) 73 | h1, h2 = self.stage_3(torch.cat([h1, h2, feature_map], dim = 1)) 74 | pafs.append(h1) 75 | heatmaps.append(h2) 76 | h1, h2 = self.stage_4(torch.cat([h1, h2, feature_map], dim = 1)) 77 | pafs.append(h1) 78 | heatmaps.append(h2) 79 | h1, h2 = self.stage_5(torch.cat([h1, h2, feature_map], dim = 1)) 80 | pafs.append(h1) 81 | heatmaps.append(h2) 82 | h1, h2 = self.stage_6(torch.cat([h1, h2, feature_map], dim = 1)) 83 | pafs.append(h1) 84 | heatmaps.append(h2) 85 | return pafs, heatmaps 86 | 87 | class VGG_Base(Module): 88 | def __init__(self): 89 | super(VGG_Base, self).__init__() 90 | self.conv1_1 = Conv2d(in_channels = 3, out_channels = 64, kernel_size = 3, stride = 1, padding = 1) 91 | self.conv1_2 = Conv2d(in_channels = 64, out_channels = 64, kernel_size = 3, stride = 1, padding = 1) 92 | self.conv2_1 = Conv2d(in_channels = 64, out_channels = 128, kernel_size = 3, stride = 1, padding = 1) 93 | self.conv2_2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 3, stride = 1, padding = 1) 94 | self.conv3_1 = Conv2d(in_channels = 128, out_channels = 256, kernel_size = 3, stride = 1, padding = 1) 95 | self.conv3_2 = Conv2d(in_channels = 256, out_channels = 256, kernel_size = 3, stride = 1, padding = 1) 96 | self.conv3_3 = Conv2d(in_channels = 256, out_channels = 256, kernel_size = 3, stride = 1, padding = 1) 97 | self.conv3_4 = Conv2d(in_channels = 256, out_channels = 256, kernel_size = 3, stride = 1, padding = 1) 98 | self.conv4_1 = Conv2d(in_channels = 256, out_channels = 512, kernel_size = 3, stride = 1, padding = 1) 99 | self.conv4_2 = Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, stride = 1, padding = 1) 100 | self.relu = ReLU() 101 | self.max_pooling_2d = MaxPool2d(kernel_size = 2, stride = 2) 102 | 103 | def forward(self, x): 104 | x = self.relu(self.conv1_1(x)) 105 | x = self.relu(self.conv1_2(x)) 106 | x = self.max_pooling_2d(x) 107 | x = self.relu(self.conv2_1(x)) 108 | x = self.relu(self.conv2_2(x)) 109 | x = self.max_pooling_2d(x) 110 | x = self.relu(self.conv3_1(x)) 111 | x = self.relu(self.conv3_2(x)) 112 | x = self.relu(self.conv3_3(x)) 113 | x = self.relu(self.conv3_4(x)) 114 | x = self.max_pooling_2d(x) 115 | x = self.relu(self.conv4_1(x)) 116 | x = self.relu(self.conv4_2(x)) 117 | return x 118 | 119 | 
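# Base_model: the VGG-19 front end above plus two CPM-specific 3x3 convs (conv4_3_CPM, conv4_4_CPM)
# that reduce the features to 128 channels; this map feeds Stage_1 directly and is concatenated
# with the previous stage's PAF/heatmap outputs for Stage_2 through Stage_6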
class Base_model(Module): 120 | def __init__(self): 121 | super(Base_model, self).__init__() 122 | self.vgg_base = VGG_Base() 123 | self.conv4_3_CPM = Conv2d(in_channels=512, out_channels=256, kernel_size = 3, stride = 1, padding = 1) 124 | self.conv4_4_CPM = Conv2d(in_channels=256, out_channels=128, kernel_size = 3, stride = 1, padding = 1) 125 | self.relu = ReLU() 126 | def forward(self, x): 127 | x = self.vgg_base(x) 128 | x = self.relu(self.conv4_3_CPM(x)) 129 | x = self.relu(self.conv4_4_CPM(x)) 130 | return x 131 | 132 | class Stage_1(Module): 133 | def __init__(self): 134 | super(Stage_1, self).__init__() 135 | self.conv1_CPM_L1 = Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=1) 136 | self.conv2_CPM_L1 = Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=1) 137 | self.conv3_CPM_L1 = Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=1) 138 | self.conv4_CPM_L1 = Conv2d(in_channels=128, out_channels=512, kernel_size=1, stride=1, padding=0) 139 | self.conv5_CPM_L1 = Conv2d(in_channels=512, out_channels=38, kernel_size=1, stride=1, padding=0) 140 | self.conv1_CPM_L2 = Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=1) 141 | self.conv2_CPM_L2 = Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=1) 142 | self.conv3_CPM_L2 = Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=1) 143 | self.conv4_CPM_L2 = Conv2d(in_channels=128, out_channels=512, kernel_size=1, stride=1, padding=0) 144 | self.conv5_CPM_L2 = Conv2d(in_channels=512, out_channels=19, kernel_size=1, stride=1, padding=0) 145 | self.relu = ReLU() 146 | 147 | def forward(self, x): 148 | h1 = self.relu(self.conv1_CPM_L1(x)) # branch1 149 | h1 = self.relu(self.conv2_CPM_L1(h1)) 150 | h1 = self.relu(self.conv3_CPM_L1(h1)) 151 | h1 = self.relu(self.conv4_CPM_L1(h1)) 152 | h1 = self.conv5_CPM_L1(h1) 153 | h2 = self.relu(self.conv1_CPM_L2(x)) # branch2 154 | h2 = self.relu(self.conv2_CPM_L2(h2)) 155 | h2 = self.relu(self.conv3_CPM_L2(h2)) 156 | h2 = self.relu(self.conv4_CPM_L2(h2)) 157 | h2 = self.conv5_CPM_L2(h2) 158 | return h1, h2 159 | 160 | class Stage_x(Module): 161 | def __init__(self): 162 | super(Stage_x, self).__init__() 163 | self.conv1_L1 = Conv2d(in_channels = 185, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 164 | self.conv2_L1 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 165 | self.conv3_L1 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 166 | self.conv4_L1 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 167 | self.conv5_L1 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 168 | self.conv6_L1 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 1, stride = 1, padding = 0) 169 | self.conv7_L1 = Conv2d(in_channels = 128, out_channels = 38, kernel_size = 1, stride = 1, padding = 0) 170 | self.conv1_L2 = Conv2d(in_channels = 185, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 171 | self.conv2_L2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 172 | self.conv3_L2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 173 | self.conv4_L2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 174 | self.conv5_L2 = Conv2d(in_channels = 128, out_channels = 128, 
kernel_size = 7, stride = 1, padding = 3) 175 | self.conv6_L2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 1, stride = 1, padding = 0) 176 | self.conv7_L2 = Conv2d(in_channels = 128, out_channels = 19, kernel_size = 1, stride = 1, padding = 0) 177 | self.relu = ReLU() 178 | 179 | def forward(self, x): 180 | h1 = self.relu(self.conv1_L1(x)) # branch1 181 | h1 = self.relu(self.conv2_L1(h1)) 182 | h1 = self.relu(self.conv3_L1(h1)) 183 | h1 = self.relu(self.conv4_L1(h1)) 184 | h1 = self.relu(self.conv5_L1(h1)) 185 | h1 = self.relu(self.conv6_L1(h1)) 186 | h1 = self.conv7_L1(h1) 187 | h2 = self.relu(self.conv1_L2(x)) # branch2 188 | h2 = self.relu(self.conv2_L2(h2)) 189 | h2 = self.relu(self.conv3_L2(h2)) 190 | h2 = self.relu(self.conv4_L2(h2)) 191 | h2 = self.relu(self.conv5_L2(h2)) 192 | h2 = self.relu(self.conv6_L2(h2)) 193 | h2 = self.conv7_L2(h2) 194 | return h1, h2 195 | -------------------------------------------------------------------------------- /models/FaceNet.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch.nn import Conv2d, Module, ReLU, MaxPool2d, init 3 | import torch.nn.functional as F 4 | 5 | class FaceNet(Module): 6 | insize = 368 7 | 8 | def __init__(self): 9 | super(FaceNet, self).__init__() 10 | # cnn to make feature map 11 | self.relu = ReLU() 12 | self.max_pooling_2d = MaxPool2d(kernel_size = 2, stride = 2) 13 | self.conv1_1 = Conv2d(in_channels = 3, out_channels = 64, kernel_size = 3, stride = 1, padding = 1) 14 | self.conv1_2 = Conv2d(in_channels = 64, out_channels = 64, kernel_size = 3, stride = 1, padding = 1) 15 | self.conv2_1 = Conv2d(in_channels = 64, out_channels = 128, kernel_size = 3, stride = 1, padding = 1) 16 | self.conv2_2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 3, stride = 1, padding = 1) 17 | self.conv3_1 = Conv2d(in_channels = 128, out_channels = 256, kernel_size = 3, stride = 1, padding = 1) 18 | self.conv3_2 = Conv2d(in_channels = 256, out_channels = 256, kernel_size = 3, stride = 1, padding = 1) 19 | self.conv3_3 = Conv2d(in_channels = 256, out_channels = 256, kernel_size = 3, stride = 1, padding = 1) 20 | self.conv3_4 = Conv2d(in_channels = 256, out_channels = 256, kernel_size = 3, stride = 1, padding = 1) 21 | self.conv4_1 = Conv2d(in_channels = 256, out_channels = 512, kernel_size = 3, stride = 1, padding = 1) 22 | self.conv4_2 = Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, stride = 1, padding = 1) 23 | self.conv4_3 = Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, stride = 1, padding = 1) 24 | self.conv4_4 = Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, stride = 1, padding = 1) 25 | self.conv5_1 = Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, stride = 1, padding = 1) 26 | self.conv5_2 = Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, stride = 1, padding = 1) 27 | self.conv5_3_CPM = Conv2d(in_channels = 512, out_channels = 128, kernel_size = 3, stride = 1, padding = 1) 28 | 29 | # stage1 30 | self.conv6_1_CPM = Conv2d(in_channels = 128, out_channels = 512, kernel_size = 1, stride = 1, padding = 0) 31 | self.conv6_2_CPM = Conv2d(in_channels = 512, out_channels = 71, kernel_size = 1, stride = 1, padding = 0) 32 | 33 | # stage2 34 | self.Mconv1_stage2 = Conv2d(in_channels = 199, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 35 | self.Mconv2_stage2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 36 | 
self.Mconv3_stage2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 37 | self.Mconv4_stage2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 38 | self.Mconv5_stage2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 39 | self.Mconv6_stage2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 1, stride = 1, padding = 0) 40 | self.Mconv7_stage2 = Conv2d(in_channels = 128, out_channels = 71, kernel_size = 1, stride = 1, padding = 0) 41 | 42 | # stage3 43 | self.Mconv1_stage3 = Conv2d(in_channels = 199, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 44 | self.Mconv2_stage3 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 45 | self.Mconv3_stage3 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 46 | self.Mconv4_stage3 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 47 | self.Mconv5_stage3 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 48 | self.Mconv6_stage3 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 1, stride = 1, padding = 0) 49 | self.Mconv7_stage3 = Conv2d(in_channels = 128, out_channels = 71, kernel_size = 1, stride = 1, padding = 0) 50 | 51 | # stage4 52 | self.Mconv1_stage4 = Conv2d(in_channels = 199, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 53 | self.Mconv2_stage4 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 54 | self.Mconv3_stage4 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 55 | self.Mconv4_stage4 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 56 | self.Mconv5_stage4 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 57 | self.Mconv6_stage4 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 1, stride = 1, padding = 0) 58 | self.Mconv7_stage4 = Conv2d(in_channels = 128, out_channels = 71, kernel_size = 1, stride = 1, padding = 0) 59 | 60 | # stage5 61 | self.Mconv1_stage5 = Conv2d(in_channels = 199, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 62 | self.Mconv2_stage5 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 63 | self.Mconv3_stage5 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 64 | self.Mconv4_stage5 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 65 | self.Mconv5_stage5 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 66 | self.Mconv6_stage5 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 1, stride = 1, padding = 0) 67 | self.Mconv7_stage5 = Conv2d(in_channels = 128, out_channels = 71, kernel_size = 1, stride = 1, padding = 0) 68 | 69 | # stage6 70 | self.Mconv1_stage6 = Conv2d(in_channels = 199, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 71 | self.Mconv2_stage6 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 72 | self.Mconv3_stage6 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 73 | self.Mconv4_stage6 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 74 | self.Mconv5_stage6 = Conv2d(in_channels = 128, out_channels = 
128, kernel_size = 7, stride = 1, padding = 3) 75 | self.Mconv6_stage6 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 1, stride = 1, padding = 0) 76 | self.Mconv7_stage6 = Conv2d(in_channels = 128, out_channels = 71, kernel_size = 1, stride = 1, padding = 0) 77 | 78 | for m in self.modules(): 79 | if isinstance(m, Conv2d): 80 | init.constant_(m.bias, 0) 81 | 82 | def __call__(self, x): 83 | heatmaps = [] 84 | 85 | h = self.relu(self.conv1_1(x)) 86 | h = self.relu(self.conv1_2(h)) 87 | h = self.max_pooling_2d(h) 88 | h = self.relu(self.conv2_1(h)) 89 | h = self.relu(self.conv2_2(h)) 90 | h = self.max_pooling_2d(h) 91 | h = self.relu(self.conv3_1(h)) 92 | h = self.relu(self.conv3_2(h)) 93 | h = self.relu(self.conv3_3(h)) 94 | h = self.relu(self.conv3_4(h)) 95 | h = self.max_pooling_2d(h) 96 | h = self.relu(self.conv4_1(h)) 97 | h = self.relu(self.conv4_2(h)) 98 | h = self.relu(self.conv4_3(h)) 99 | h = self.relu(self.conv4_4(h)) 100 | h = self.relu(self.conv5_1(h)) 101 | h = self.relu(self.conv5_2(h)) 102 | h = self.relu(self.conv5_3_CPM(h)) 103 | feature_map = h 104 | 105 | # stage1 106 | h = self.relu(self.conv6_1_CPM(h)) 107 | h = self.conv6_2_CPM(h) 108 | heatmaps.append(h) 109 | 110 | # stage2 111 | h = torch.cat([h, feature_map], dim= 1) # channel concat 112 | h = self.relu(self.Mconv1_stage2(h)) 113 | h = self.relu(self.Mconv2_stage2(h)) 114 | h = self.relu(self.Mconv3_stage2(h)) 115 | h = self.relu(self.Mconv4_stage2(h)) 116 | h = self.relu(self.Mconv5_stage2(h)) 117 | h = self.relu(self.Mconv6_stage2(h)) 118 | h = self.Mconv7_stage2(h) 119 | heatmaps.append(h) 120 | 121 | # stage3 122 | h = torch.cat([h, feature_map], dim= 1) # channel concat 123 | h = self.relu(self.Mconv1_stage3(h)) 124 | h = self.relu(self.Mconv2_stage3(h)) 125 | h = self.relu(self.Mconv3_stage3(h)) 126 | h = self.relu(self.Mconv4_stage3(h)) 127 | h = self.relu(self.Mconv5_stage3(h)) 128 | h = self.relu(self.Mconv6_stage3(h)) 129 | h = self.Mconv7_stage3(h) 130 | heatmaps.append(h) 131 | 132 | # stage4 133 | h = torch.cat([h, feature_map], dim= 1) # channel concat 134 | h = self.relu(self.Mconv1_stage4(h)) 135 | h = self.relu(self.Mconv2_stage4(h)) 136 | h = self.relu(self.Mconv3_stage4(h)) 137 | h = self.relu(self.Mconv4_stage4(h)) 138 | h = self.relu(self.Mconv5_stage4(h)) 139 | h = self.relu(self.Mconv6_stage4(h)) 140 | h = self.Mconv7_stage4(h) 141 | heatmaps.append(h) 142 | 143 | # stage5 144 | h = torch.cat([h, feature_map], dim= 1) # channel concat 145 | h = self.relu(self.Mconv1_stage5(h)) 146 | h = self.relu(self.Mconv2_stage5(h)) 147 | h = self.relu(self.Mconv3_stage5(h)) 148 | h = self.relu(self.Mconv4_stage5(h)) 149 | h = self.relu(self.Mconv5_stage5(h)) 150 | h = self.relu(self.Mconv6_stage5(h)) 151 | h = self.Mconv7_stage5(h) 152 | heatmaps.append(h) 153 | 154 | # stage6 155 | h = torch.cat([h, feature_map], dim= 1) # channel concat 156 | h = self.relu(self.Mconv1_stage6(h)) 157 | h = self.relu(self.Mconv2_stage6(h)) 158 | h = self.relu(self.Mconv3_stage6(h)) 159 | h = self.relu(self.Mconv4_stage6(h)) 160 | h = self.relu(self.Mconv5_stage6(h)) 161 | h = self.relu(self.Mconv6_stage6(h)) 162 | h = self.Mconv7_stage6(h) 163 | heatmaps.append(h) 164 | 165 | return heatmaps -------------------------------------------------------------------------------- /models/HandNet.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch.nn import Conv2d, Module, ReLU, MaxPool2d, init 3 | import torch.nn.functional as F 4 | 5 | class 
HandNet(Module): 6 | insize = 368 7 | 8 | def __init__(self): 9 | super(HandNet, self).__init__() 10 | # cnn to make feature map 11 | self.relu = ReLU() 12 | self.max_pooling_2d = MaxPool2d(kernel_size = 2, stride = 2) 13 | self.conv1_1 = Conv2d(in_channels = 3, out_channels = 64, kernel_size = 3, stride = 1, padding = 1) 14 | self.conv1_2 = Conv2d(in_channels = 64, out_channels = 64, kernel_size = 3, stride = 1, padding = 1) 15 | self.conv2_1 = Conv2d(in_channels = 64, out_channels = 128, kernel_size = 3, stride = 1, padding = 1) 16 | self.conv2_2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 3, stride = 1, padding = 1) 17 | self.conv3_1 = Conv2d(in_channels = 128, out_channels = 256, kernel_size = 3, stride = 1, padding = 1) 18 | self.conv3_2 = Conv2d(in_channels = 256, out_channels = 256, kernel_size = 3, stride = 1, padding = 1) 19 | self.conv3_3 = Conv2d(in_channels = 256, out_channels = 256, kernel_size = 3, stride = 1, padding = 1) 20 | self.conv3_4 = Conv2d(in_channels = 256, out_channels = 256, kernel_size = 3, stride = 1, padding = 1) 21 | self.conv4_1 = Conv2d(in_channels = 256, out_channels = 512, kernel_size = 3, stride = 1, padding = 1) 22 | self.conv4_2 = Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, stride = 1, padding = 1) 23 | self.conv4_3 = Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, stride = 1, padding = 1) 24 | self.conv4_4 = Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, stride = 1, padding = 1) 25 | self.conv5_1 = Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, stride = 1, padding = 1) 26 | self.conv5_2 = Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, stride = 1, padding = 1) 27 | self.conv5_3_CPM = Conv2d(in_channels = 512, out_channels = 128, kernel_size = 3, stride = 1, padding = 1) 28 | 29 | # stage1 30 | self.conv6_1_CPM = Conv2d(in_channels = 128, out_channels = 512, kernel_size = 1, stride = 1, padding = 0) 31 | self.conv6_2_CPM = Conv2d(in_channels = 512, out_channels = 22, kernel_size = 1, stride = 1, padding = 0) 32 | 33 | # stage2 34 | self.Mconv1_stage2 = Conv2d(in_channels = 150, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 35 | self.Mconv2_stage2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 36 | self.Mconv3_stage2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 37 | self.Mconv4_stage2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 38 | self.Mconv5_stage2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 39 | self.Mconv6_stage2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 1, stride = 1, padding = 0) 40 | self.Mconv7_stage2 = Conv2d(in_channels = 128, out_channels = 22, kernel_size = 1, stride = 1, padding = 0) 41 | 42 | # stage3 43 | self.Mconv1_stage3 = Conv2d(in_channels = 150, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 44 | self.Mconv2_stage3 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 45 | self.Mconv3_stage3 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 46 | self.Mconv4_stage3 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 47 | self.Mconv5_stage3 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 48 | self.Mconv6_stage3 = Conv2d(in_channels = 128, out_channels = 
128, kernel_size = 1, stride = 1, padding = 0) 49 | self.Mconv7_stage3 = Conv2d(in_channels = 128, out_channels = 22, kernel_size = 1, stride = 1, padding = 0) 50 | 51 | # stage4 52 | self.Mconv1_stage4 = Conv2d(in_channels = 150, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 53 | self.Mconv2_stage4 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 54 | self.Mconv3_stage4 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 55 | self.Mconv4_stage4 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 56 | self.Mconv5_stage4 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 57 | self.Mconv6_stage4 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 1, stride = 1, padding = 0) 58 | self.Mconv7_stage4 = Conv2d(in_channels = 128, out_channels = 22, kernel_size = 1, stride = 1, padding = 0) 59 | 60 | # stage5 61 | self.Mconv1_stage5 = Conv2d(in_channels = 150, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 62 | self.Mconv2_stage5 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 63 | self.Mconv3_stage5 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 64 | self.Mconv4_stage5 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 65 | self.Mconv5_stage5 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 66 | self.Mconv6_stage5 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 1, stride = 1, padding = 0) 67 | self.Mconv7_stage5 = Conv2d(in_channels = 128, out_channels = 22, kernel_size = 1, stride = 1, padding = 0) 68 | 69 | # stage6 70 | self.Mconv1_stage6 = Conv2d(in_channels = 150, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 71 | self.Mconv2_stage6 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 72 | self.Mconv3_stage6 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 73 | self.Mconv4_stage6 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 74 | self.Mconv5_stage6 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3) 75 | self.Mconv6_stage6 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 1, stride = 1, padding = 0) 76 | self.Mconv7_stage6 = Conv2d(in_channels = 128, out_channels = 22, kernel_size = 1, stride = 1, padding = 0) 77 | 78 | for m in self.modules(): 79 | if isinstance(m, Conv2d): 80 | init.constant_(m.bias, 0) 81 | 82 | def __call__(self, x): 83 | heatmaps = [] 84 | 85 | h = self.relu(self.conv1_1(x)) 86 | h = self.relu(self.conv1_2(h)) 87 | h = self.max_pooling_2d(h) 88 | h = self.relu(self.conv2_1(h)) 89 | h = self.relu(self.conv2_2(h)) 90 | h = self.max_pooling_2d(h) 91 | h = self.relu(self.conv3_1(h)) 92 | h = self.relu(self.conv3_2(h)) 93 | h = self.relu(self.conv3_3(h)) 94 | h = self.relu(self.conv3_4(h)) 95 | h = self.max_pooling_2d(h) 96 | h = self.relu(self.conv4_1(h)) 97 | h = self.relu(self.conv4_2(h)) 98 | h = self.relu(self.conv4_3(h)) 99 | h = self.relu(self.conv4_4(h)) 100 | h = self.relu(self.conv5_1(h)) 101 | h = self.relu(self.conv5_2(h)) 102 | h = self.relu(self.conv5_3_CPM(h)) 103 | feature_map = h 104 | 105 | # stage1 106 | h = self.relu(self.conv6_1_CPM(h)) 107 | h = self.conv6_2_CPM(h) 108 | heatmaps.append(h) 109 | 110 
| # stage2 111 | h = torch.cat([h, feature_map], dim= 1) # channel concat 112 | h = self.relu(self.Mconv1_stage2(h)) 113 | h = self.relu(self.Mconv2_stage2(h)) 114 | h = self.relu(self.Mconv3_stage2(h)) 115 | h = self.relu(self.Mconv4_stage2(h)) 116 | h = self.relu(self.Mconv5_stage2(h)) 117 | h = self.relu(self.Mconv6_stage2(h)) 118 | h = self.Mconv7_stage2(h) 119 | heatmaps.append(h) 120 | 121 | # stage3 122 | h = torch.cat([h, feature_map], dim= 1) # channel concat 123 | h = self.relu(self.Mconv1_stage3(h)) 124 | h = self.relu(self.Mconv2_stage3(h)) 125 | h = self.relu(self.Mconv3_stage3(h)) 126 | h = self.relu(self.Mconv4_stage3(h)) 127 | h = self.relu(self.Mconv5_stage3(h)) 128 | h = self.relu(self.Mconv6_stage3(h)) 129 | h = self.Mconv7_stage3(h) 130 | heatmaps.append(h) 131 | 132 | # stage4 133 | h = torch.cat([h, feature_map], dim= 1) # channel concat 134 | h = self.relu(self.Mconv1_stage4(h)) 135 | h = self.relu(self.Mconv2_stage4(h)) 136 | h = self.relu(self.Mconv3_stage4(h)) 137 | h = self.relu(self.Mconv4_stage4(h)) 138 | h = self.relu(self.Mconv5_stage4(h)) 139 | h = self.relu(self.Mconv6_stage4(h)) 140 | h = self.Mconv7_stage4(h) 141 | heatmaps.append(h) 142 | 143 | # stage5 144 | h = torch.cat([h, feature_map], dim= 1) # channel concat 145 | h = self.relu(self.Mconv1_stage5(h)) 146 | h = self.relu(self.Mconv2_stage5(h)) 147 | h = self.relu(self.Mconv3_stage5(h)) 148 | h = self.relu(self.Mconv4_stage5(h)) 149 | h = self.relu(self.Mconv5_stage5(h)) 150 | h = self.relu(self.Mconv6_stage5(h)) 151 | h = self.Mconv7_stage5(h) 152 | heatmaps.append(h) 153 | 154 | # stage6 155 | h = torch.cat([h, feature_map], dim= 1) # channel concat 156 | h = self.relu(self.Mconv1_stage6(h)) 157 | h = self.relu(self.Mconv2_stage6(h)) 158 | h = self.relu(self.Mconv3_stage6(h)) 159 | h = self.relu(self.Mconv4_stage6(h)) 160 | h = self.relu(self.Mconv5_stage6(h)) 161 | h = self.relu(self.Mconv6_stage6(h)) 162 | h = self.Mconv7_stage6(h) 163 | heatmaps.append(h) 164 | 165 | return heatmaps -------------------------------------------------------------------------------- /openpose.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import math 3 | import time 4 | import numpy as np 5 | from scipy.ndimage.filters import gaussian_filter 6 | import os 7 | import torch 8 | import torch.nn.functional as F 9 | from tensorboardX import SummaryWriter 10 | from entity import params, JointType 11 | from models.CocoPoseNet import CocoPoseNet, compute_loss 12 | from tqdm import tqdm 13 | from torch.utils.data import DataLoader 14 | from torch.optim import Adam 15 | from datetime import datetime 16 | import time 17 | from matplotlib import pyplot as plt 18 | 19 | def get_time(): 20 | return (str(datetime.now())[:-10]).replace(' ','-').replace(':','-') 21 | 22 | class Openpose(object): 23 | def __init__(self, arch='posenet', weights_file=None, training = True): 24 | self.arch = arch 25 | if weights_file: 26 | self.model = params['archs'][arch]() 27 | self.model.load_state_dict(torch.load(weights_file)) 28 | else: 29 | self.model = params['archs'][arch](params['pretrained_path']) 30 | 31 | self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") 32 | self.model = self.model.to(self.device) 33 | 34 | if training: 35 | from pycocotools.coco import COCO 36 | from coco_dataset import CocoDataset 37 | for para in self.model.base.vgg_base.parameters(): 38 | para.requires_grad = False 39 | coco_train = COCO(os.path.join(params['coco_dir'], 
'annotations/person_keypoints_train2017.json')) 40 | coco_val = COCO(os.path.join(params['coco_dir'], 'annotations/person_keypoints_val2017.json')) 41 | self.train_loader = DataLoader(CocoDataset(coco_train, params['insize']), 42 | params['batch_size'], 43 | shuffle=True, 44 | pin_memory=False, 45 | num_workers=params['num_workers']) 46 | self.val_loader = DataLoader(CocoDataset(coco_val, params['insize'], mode = 'val'), 47 | params['batch_size'], 48 | shuffle=False, 49 | pin_memory=False, 50 | num_workers=params['num_workers']) 51 | self.train_length = len(self.train_loader) 52 | self.val_length = len(self.val_loader) 53 | self.step = 0 54 | self.writer = SummaryWriter(params['log_path']) 55 | self.board_loss_every = self.train_length // params['board_loss_interval'] 56 | self.evaluate_every = self.train_length // params['eval_interval'] 57 | self.board_pred_image_every = self.train_length // params['board_pred_image_interval'] 58 | self.save_every = self.train_length // params['save_interval'] 59 | self.optimizer = Adam([ 60 | {'params' : [*self.model.parameters()][20:24], 'lr' : params['lr'] / 4}, 61 | {'params' : [*self.model.parameters()][24:], 'lr' : params['lr']}]) 62 | # test only codes 63 | # self.board_loss_every = 5 64 | # self.evaluate_every = 5 65 | # self.board_pred_image_every = 5 66 | # self.save_every = 5 67 | 68 | def board_scalars(self, key, loss, paf_log, heatmap_log): 69 | self.writer.add_scalar('{}_loss'.format(key), loss, self.step) 70 | for stage, (paf_loss, heatmap_loss) in enumerate(zip(paf_log, heatmap_log)): 71 | self.writer.add_scalar('{}_paf_loss_stage{}'.format(key, stage), paf_loss, self.step) 72 | self.writer.add_scalar('{}_heatmap_loss_stage{}'.format(key, stage), heatmap_loss, self.step) 73 | 74 | def evaluate(self, num = 50): 75 | self.model.eval() 76 | count = 0 77 | running_loss = 0. 78 | running_paf_log = 0. 79 | running_heatmap_log = 0. 
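(Editor's aside on the optimizer set-up in `Openpose.__init__` above.) The constructor freezes `self.model.base.vgg_base`, then builds an Adam optimizer with two parameter groups: a reduced rate (`params['lr'] / 4`) for one slice of parameters (the `conv4_3_CPM`/`conv4_4_CPM` adaptation layers, assuming the usual 10-convolution VGG prefix) and the full rate for the stage networks; `train()` later unfreezes the VGG base and adds it as an extra group at the reduced rate. Below is a minimal, self-contained sketch of that pattern — `ToyNet`, its layer names and the learning-rate values are invented for illustration and are not part of this repository.

```
import torch
from torch import nn
from torch.optim import Adam

class ToyNet(nn.Module):
    """Stand-in for CocoPoseNet: a pretrained 'base' plus newly added 'head' layers."""
    def __init__(self):
        super().__init__()
        self.base = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())  # pretrained backbone
        self.head = nn.Conv2d(8, 2, 1)                                       # new task-specific layer

net = ToyNet()

# 1) Freeze the backbone so only the new layers train at first.
for p in net.base.parameters():
    p.requires_grad = False

# 2) Give the new layers their own learning rate via an Adam parameter group.
optimizer = Adam([{'params': net.head.parameters(), 'lr': 1e-4}])

# 3) Later in training (cf. the `self.step == 2000` branch of train() further below),
#    unfreeze the backbone and add it as a second group at a smaller rate.
for p in net.base.parameters():
    p.requires_grad = True
optimizer.add_param_group({'params': net.base.parameters(), 'lr': 1e-4 / 4})
```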
80 | with torch.no_grad(): 81 | for imgs, pafs, heatmaps, ignore_mask in iter(self.val_loader): 82 | imgs, pafs, heatmaps, ignore_mask = imgs.to(self.device), pafs.to(self.device), heatmaps.to(self.device), ignore_mask.to(self.device) 83 | pafs_ys, heatmaps_ys = self.model(imgs) 84 | total_loss, paf_loss_log, heatmap_loss_log = compute_loss(pafs_ys, heatmaps_ys, pafs, heatmaps, ignore_mask) 85 | running_loss += total_loss.item() 86 | running_paf_log += paf_loss_log 87 | running_heatmap_log += heatmap_loss_log 88 | count += 1 89 | if count >= num: 90 | break 91 | return running_loss / num, running_paf_log / num, running_heatmap_log / num 92 | 93 | def save_state(self, val_loss, to_save_folder=False, model_only=False): 94 | if to_save_folder: 95 | save_path = params['work_space']/'save' 96 | else: 97 | save_path = params['work_space']/'model' 98 | time = get_time() 99 | torch.save( 100 | self.model.state_dict(), save_path / 101 | ('model_{}_val_loss:{}_step:{}.pth'.format(time, val_loss, self.step))) 102 | if not model_only: 103 | torch.save( 104 | self.optimizer.state_dict(), save_path / 105 | ('optimizer_{}_val_loss:{}_step:{}.pth'.format(time, val_loss, self.step))) 106 | 107 | def load_state(self, fixed_str, from_save_folder=False, model_only=False): 108 | if from_save_folder: 109 | save_path = params['work_space']/'save' 110 | else: 111 | save_path = params['work_space']/'model' 112 | self.model.load_state_dict(torch.load(save_path/'model_{}'.format(fixed_str))) 113 | print('load model_{}'.format(fixed_str)) 114 | if not model_only: 115 | self.optimizer.load_state_dict(torch.load(save_path/'optimizer_{}'.format(fixed_str))) 116 | print('load optimizer_{}'.format(fixed_str)) 117 | 118 | def resume_training_load(self, from_save_folder=False): 119 | if from_save_folder: 120 | save_path = params['work_space']/'save' 121 | else: 122 | save_path = params['work_space']/'model' 123 | sorted_files = sorted([*save_path.iterdir()], key=lambda x: os.path.getmtime(x), reverse=True) 124 | seeking_flag = True 125 | index = 0 126 | while seeking_flag: 127 | if index > len(sorted_files) - 2: 128 | break 129 | file_a = sorted_files[index] 130 | file_b = sorted_files[index + 1] 131 | if file_a.name.startswith('model'): 132 | fix_str = file_a.name[6:] 133 | self.step = int(fix_str.split(':')[-1].split('.')[0]) + 1 134 | if file_b.name == ''.join(['optimizer', '_', fix_str]): 135 | if self.step > 2000: 136 | for para in self.model.base.vgg_base.parameters(): 137 | para.requires_grad = True 138 | self.optimizer.add_param_group({'params' : [*self.model.base.vgg_base.parameters()], 'lr' : params['lr'] / 4}) 139 | self.load_state(fix_str, from_save_folder) 140 | print(self.optimizer) 141 | return 142 | else: 143 | index += 1 144 | continue 145 | elif file_a.name.startswith('optimizer'): 146 | fix_str = file_a.name[10:] 147 | self.step = int(fix_str.split(':')[-1].split('.')[0]) + 1 148 | if file_b.name == ''.join(['model', '_', fix_str]): 149 | if self.step > 2000: 150 | for para in self.model.base.vgg_base.parameters(): 151 | para.requires_grad = True 152 | self.optimizer.add_param_group({'params' : [*self.model.base.vgg_base.parameters()], 'lr' : params['lr'] / 4}) 153 | self.load_state(fix_str, from_save_folder) 154 | print(self.optimizer) 155 | return 156 | else: 157 | index += 1 158 | continue 159 | else: 160 | index += 1 161 | continue 162 | print('no available files founded') 163 | return 164 | 165 | def find_lr(self, 166 | init_value=1e-8, 167 | final_value=10., 168 | beta=0.98, 169 | bloding_scale=4., 
170 | num=None): 171 | if not num: 172 | num = len(self.train_loader) 173 | mult = (final_value / init_value)**(1 / num) 174 | lr = init_value 175 | for params in self.optimizer.param_groups: 176 | params['lr'] = lr 177 | self.model.train() 178 | avg_loss = 0. 179 | best_loss = 0. 180 | batch_num = 0 181 | losses = [] 182 | log_lrs = [] 183 | for i, (imgs, pafs, heatmaps, ignore_mask) in tqdm(enumerate(self.train_loader), total=num): 184 | 185 | imgs, pafs, heatmaps, ignore_mask = imgs.to(self.device), pafs.to(self.device), heatmaps.to(self.device), ignore_mask.to(self.device) 186 | self.optimizer.zero_grad() 187 | batch_num += 1 188 | pafs_ys, heatmaps_ys = self.model(imgs) 189 | loss, _, _ = compute_loss(pafs_ys, heatmaps_ys, pafs, heatmaps, ignore_mask) 190 | 191 | self.optimizer.step() 192 | 193 | #Compute the smoothed loss 194 | avg_loss = beta * avg_loss + (1 - beta) * loss.item() 195 | self.writer.add_scalar('avg_loss', avg_loss, batch_num) 196 | smoothed_loss = avg_loss / (1 - beta**batch_num) 197 | self.writer.add_scalar('smoothed_loss', smoothed_loss,batch_num) 198 | #Stop if the loss is exploding 199 | if batch_num > 1 and smoothed_loss > bloding_scale * best_loss: 200 | print('exited with best_loss at {}'.format(best_loss)) 201 | plt.plot(log_lrs[10:-5], losses[10:-5]) 202 | return log_lrs, losses 203 | #Record the best loss 204 | if smoothed_loss < best_loss or batch_num == 1: 205 | best_loss = smoothed_loss 206 | #Store the values 207 | losses.append(smoothed_loss) 208 | log_lrs.append(math.log10(lr)) 209 | self.writer.add_scalar('log_lr', math.log10(lr), batch_num) 210 | #Do the SGD step 211 | #Update the lr for the next step 212 | 213 | loss.backward() 214 | self.optimizer.step() 215 | 216 | lr *= mult 217 | for params in self.optimizer.param_groups: 218 | params['lr'] = lr 219 | if batch_num > num: 220 | plt.plot(log_lrs[10:-5], losses[10:-5]) 221 | return log_lrs, losses 222 | 223 | def lr_schedule(self): 224 | for params in self.optimizer.param_groups: 225 | params['lr'] /= 10. 226 | print(self.optimizer) 227 | 228 | def train(self, resume = False): 229 | running_loss = 0. 230 | running_paf_log = 0. 231 | running_heatmap_log = 0. 232 | if resume: 233 | self.resume_training_load() 234 | for epoch in range(60): 235 | for imgs, pafs, heatmaps, ignore_mask in tqdm(iter(self.train_loader)): 236 | if self.step == 2000: 237 | for para in self.model.base.vgg_base.parameters(): 238 | para.requires_grad = True 239 | self.optimizer.add_param_group({'params' : [*self.model.base.vgg_base.parameters()], 'lr' : params['lr'] / 4}) 240 | if self.step == 100000 or self.step == 200000: 241 | self.lr_schedule() 242 | 243 | imgs, pafs, heatmaps, ignore_mask = imgs.to(self.device), pafs.to(self.device), heatmaps.to(self.device), ignore_mask.to(self.device) 244 | self.optimizer.zero_grad() 245 | pafs_ys, heatmaps_ys = self.model(imgs) 246 | total_loss, paf_loss_log, heatmap_loss_log = compute_loss(pafs_ys, heatmaps_ys, pafs, heatmaps, ignore_mask) 247 | total_loss.backward() 248 | self.optimizer.step() 249 | running_loss += total_loss.item() 250 | running_paf_log += paf_loss_log 251 | running_heatmap_log += heatmap_loss_log 252 | 253 | if (self.step % self.board_loss_every == 0) & (self.step != 0): 254 | self.board_scalars('train', 255 | running_loss / self.board_loss_every, 256 | running_paf_log / self.board_loss_every, 257 | running_heatmap_log / self.board_loss_every) 258 | running_loss = 0. 259 | running_paf_log = 0. 260 | running_heatmap_log = 0. 
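(Editor's aside on `find_lr()` above.) It is a learning-rate range test in the style of Leslie Smith: the learning rate is swept geometrically from `init_value` to `final_value` over one pass of the loader while an exponentially smoothed loss is tracked, and the sweep stops once the smoothed loss exceeds `bloding_scale` times the best loss seen. The sketch below restates that logic in a self-contained form; `model`, `loader`, `optimizer` and `loss_fn` are placeholders, not this repository's objects. Note that the sketch calls `loss.backward()` before `optimizer.step()`, whereas `find_lr()` above also issues an extra `optimizer.step()` immediately after computing the loss; the ordering shown here is the conventional one.

```
import math

def lr_range_test(model, loader, optimizer, loss_fn,
                  init_value=1e-8, final_value=10.0, beta=0.98):
    num = len(loader)
    mult = (final_value / init_value) ** (1 / num)   # geometric LR multiplier per step
    lr = init_value
    avg_loss = 0.0
    best_loss = float('inf')
    log_lrs, losses = [], []
    for batch_num, (x, y) in enumerate(loader, start=1):
        for group in optimizer.param_groups:
            group['lr'] = lr                          # set the LR for this step
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        # bias-corrected exponential moving average of the raw loss
        avg_loss = beta * avg_loss + (1 - beta) * loss.item()
        smoothed = avg_loss / (1 - beta ** batch_num)
        if batch_num > 1 and smoothed > 4 * best_loss:
            break                                     # loss has exploded: stop the sweep
        best_loss = min(best_loss, smoothed)
        losses.append(smoothed)
        log_lrs.append(math.log10(lr))
        loss.backward()                               # backward before the optimizer step
        optimizer.step()
        lr *= mult                                    # increase the LR for the next batch
    return log_lrs, losses
```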
261 | 262 | if (self.step % self.evaluate_every == 0) & (self.step != 0): 263 | val_loss, paf_loss_val_log, heatmap_loss_val_log = self.evaluate(num = params['eva_num']) 264 | self.model.train() 265 | self.board_scalars('val', val_loss, paf_loss_val_log, heatmap_loss_val_log) 266 | 267 | if (self.step % self.board_pred_image_every == 0) & (self.step != 0): 268 | self.model.eval() 269 | with torch.no_grad(): 270 | for i in range(20): 271 | img_id = self.val_loader.dataset.imgIds[i] 272 | img_path = os.path.join(params['coco_dir'], 'val2017', self.val_loader.dataset.coco.loadImgs([img_id])[0]['file_name']) 273 | img = cv2.imread(img_path) 274 | # inference 275 | poses, _ = self.detect(img) 276 | # draw and save image 277 | img = draw_person_pose(img, poses) 278 | img = torch.tensor(img.transpose(2,0,1)) 279 | self.writer.add_image('pred_image_{}'.format(i), img, global_step=self.step) 280 | self.model.train() 281 | 282 | if (self.step % self.save_every == 0) & (self.step != 0): 283 | self.save_state(val_loss) 284 | 285 | self.step += 1 286 | if self.step > 300000: 287 | break 288 | 289 | def pad_image(self, img, stride, pad_value): 290 | h, w, _ = img.shape 291 | 292 | pad = [0] * 2 293 | pad[0] = (stride - (h % stride)) % stride # down 294 | pad[1] = (stride - (w % stride)) % stride # right 295 | 296 | img_padded = np.zeros((h+pad[0], w+pad[1], 3), 'uint8') + pad_value 297 | img_padded[:h, :w, :] = img.copy() 298 | return img_padded, pad 299 | 300 | def compute_optimal_size(self, orig_img, img_size, stride=8): 301 | """画像の幅と高さがstrideの倍数になるように調節する""" 302 | orig_img_h, orig_img_w, _ = orig_img.shape 303 | aspect = orig_img_h / orig_img_w 304 | if orig_img_h < orig_img_w: 305 | img_h = img_size 306 | img_w = np.round(img_size / aspect).astype(int) 307 | surplus = img_w % stride 308 | if surplus != 0: 309 | img_w += stride - surplus 310 | else: 311 | img_w = img_size 312 | img_h = np.round(img_size * aspect).astype(int) 313 | surplus = img_h % stride 314 | if surplus != 0: 315 | img_h += stride - surplus 316 | return (img_w, img_h) 317 | 318 | def compute_peaks_from_heatmaps(self, heatmaps): 319 | """all_peaks: shape = [N, 5], column = (jointtype, x, y, score, index)""" 320 | #heatmaps.shape : (19, 584, 584) 321 | #heatmaps[-1]是背景,训练时有用,推断时没用,这里去掉 322 | heatmaps = heatmaps[:-1] 323 | 324 | all_peaks = [] 325 | peak_counter = 0 326 | for i , heatmap in enumerate(heatmaps): 327 | heatmap = gaussian_filter(heatmap, sigma=params['gaussian_sigma']) 328 | ''' 329 | 可以和下面的GPU codes对比一下, 330 | 这里的gaussian_filter其实就是拿一个gaussian_kernel在输出的heatmaps上depth为1的平面卷积, 331 | 因为网络拟合的heatmap也是在目标点上生成的gaussian heatmap, 332 | 这样卷积比较合理地找到最贴近目标点的坐标 333 | ''' 334 | map_left = np.zeros(heatmap.shape) 335 | map_right = np.zeros(heatmap.shape) 336 | map_top = np.zeros(heatmap.shape) 337 | map_bottom = np.zeros(heatmap.shape) 338 | ''' 339 | 我的理解,其实这里left和top, right和bottom搞反了,但是不影响最终结果 340 | ''' 341 | map_left[1:, :] = heatmap[:-1, :] 342 | map_right[:-1, :] = heatmap[1:, :] 343 | map_top[:, 1:] = heatmap[:, :-1] 344 | map_bottom[:, :-1] = heatmap[:, 1:] 345 | 346 | peaks_binary = np.logical_and.reduce(( 347 | heatmap > params['heatmap_peak_thresh'], 348 | heatmap > map_left, 349 | heatmap > map_right, 350 | heatmap > map_top, 351 | heatmap > map_bottom, 352 | )) 353 | ''' 354 | 这一步操作厉害了,找的是heatmap上满足如下两个条件的所有的点: 355 | 1. 该点的值大于 params['heatmap_peak_thresh'],默认是0.05 356 | 2. 
其次,该点的值比它上、下、左、右四个点的值都要大 357 | 满足以上条件为True,否则False 358 | ''' 359 | peaks = zip(np.nonzero(peaks_binary)[1], np.nonzero(peaks_binary)[0]) # [(x, y), (x, y)...]のpeak座標配列 360 | '''np.nonzero返回的坐标格式是[y,x],这里被改成了[x,y]''' 361 | peaks_with_score = [(i,) + peak_pos + (heatmap[peak_pos[1], peak_pos[0]],) for peak_pos in peaks] 362 | ''' 363 | [(0, 387, 47, 0.050346997), 364 | (0, 388, 47, 0.050751492), 365 | (0, 389, 47, 0.051055912), 366 | .....] 367 | (关节点的index, x坐标, y坐标, heatmap value) 368 | ''' 369 | peaks_id = range(peak_counter, peak_counter + len(peaks_with_score)) 370 | peaks_with_score_and_id = [peaks_with_score[i] + (peaks_id[i], ) for i in range(len(peaks_id))] 371 | ''' 372 | [(0, 387, 47, 0.050346997, 0), 373 | (0, 388, 47, 0.050751492, 1), 374 | (0, 389, 47, 0.051055912, 2), 375 | (0, 390, 47, 0.051255725, 3), 376 | ......] 377 | 这一步还把序号带上了 378 | ''' 379 | peak_counter += len(peaks_with_score_and_id) 380 | all_peaks.append(peaks_with_score_and_id) 381 | all_peaks = np.array([peak for peaks_each_category in all_peaks for peak in peaks_each_category]) 382 | '''还可以这样写啊,两层语法糖''' 383 | return all_peaks 384 | 385 | def compute_candidate_connections(self, paf, cand_a, cand_b, img_len, params): 386 | candidate_connections = [] 387 | for joint_a in cand_a: 388 | for joint_b in cand_b: # jointは(x, y)座標 389 | vector = joint_b[:2] - joint_a[:2] 390 | norm = np.linalg.norm(vector) 391 | if norm == 0: 392 | continue 393 | ys = np.linspace(joint_a[1], joint_b[1], num=params['n_integ_points']) 394 | xs = np.linspace(joint_a[0], joint_b[0], num=params['n_integ_points']) 395 | integ_points = np.stack([ys, xs]).T.round().astype('i') 396 | ''' 397 | # joint_aとjoint_bの2点間を結ぶ線分上の座標点 [[x1, y1], [x2, y2]...] 398 | # 连接joint_a和joint_b的线段上的坐标点[[x1,y1],[x2,y2] ...] 399 | params['n_integ_points'] = 10 400 | integ_points = 401 | array([[ 32, 242], 402 | [ 36, 241], 403 | [ 39, 240], 404 | [ 43, 239], 405 | [ 47, 238], 406 | [ 50, 236], 407 | [ 54, 235], 408 | [ 58, 234], 409 | [ 61, 233], 410 | [ 65, 232]], dtype=int32) 411 | 通过在连接joint_a和joint_b的线段上sample 10个点,取整之后得到坐标值 412 | ''' 413 | paf_in_edge = np.hstack([paf[0][np.hsplit(integ_points, 2)], paf[1][np.hsplit(integ_points, 2)]]) 414 | ''' 415 | paf_in_edge.shape : (10, 2) 416 | paf_in_edge代表在这10个点上的paf预测值 417 | ''' 418 | unit_vector = vector / norm 419 | inner_products = np.dot(paf_in_edge, unit_vector) 420 | integ_value = inner_products.sum() / len(inner_products) 421 | ''' 422 | 以上三行相当于论文中的公式10 423 | 通过sample的方法来代替求积分 424 | ''' 425 | ''' 426 | # vectorの長さが基準値以上の時にペナルティを与える 427 | # 当vector的长度大于或等于参考值时给予惩罚 428 | params['limb_length_ratio'] = 1 429 | params['length_penalty_value'] = 1 430 | img_len = 原始图片的width 431 | ''' 432 | integ_value_with_dist_prior = integ_value + min(params['limb_length_ratio'] * img_len / norm - params['length_penalty_value'], 0) 433 | ''' 434 | params['inner_product_thresh'] = 0.05 435 | params['n_integ_points_thresh'] = 8 436 | 以下条件控制表示, 437 | 只有当这10个点里面至少有8个点的paf向量值与连接joint_a与joint_b之间的单位向量的点积大于0.05时, 438 | 并且这10个点的平均值(近似积分值)> 0时,才认为存在一条可能的connection,并把它记录下来 439 | ''' 440 | n_valid_points = sum(inner_products > params['inner_product_thresh']) 441 | if n_valid_points > params['n_integ_points_thresh'] and integ_value_with_dist_prior > 0: 442 | candidate_connections.append([int(joint_a[3]), int(joint_b[3]), integ_value_with_dist_prior]) 443 | ''' 444 | 这里记录下来的是joint_a和joint_b在all_peaks里的序号,还有积分值 445 | joint_a,joint_b是从cand_a, cand_b中枚举出来的, 446 | 而cand_a和cand_b都是从all_peaks里面map出来的 447 | ''' 448 | candidate_connections = 
sorted(candidate_connections, key=lambda x: x[2], reverse=True) 449 | '''在所有取到的可能connection里通过积分值的大小排个序''' 450 | return candidate_connections 451 | ''' 452 | len(all_connections) = 19 453 | 正好代表了19种可能的躯干,每一种躯干里面又是一个代表了所有可能connections的array 454 | 比如all_connections[2] = 455 | array([[47. , 51. , 0.86362792], 456 | [46. , 50. , 0.71809054], 457 | [45. , 49. , 0.59873392], 458 | [44. , 48. , 0.3711632 ]]) 459 | 里面有4个连接,这4个连接有可能属于不同的人的, 460 | 然后通过最后一步操作grouping_key_points,把每个人的躯干给组合起来 461 | ''' 462 | 463 | def compute_connections(self, pafs, all_peaks, img_len, params): 464 | all_connections = [] 465 | for i in range(len(params['limbs_point'])): 466 | ''' 467 | params['limbs_point']: 468 | [[, ], 469 | [, ], 470 | [, ], 471 | [, ], 472 | [, ], 473 | [, ], 474 | [, ], 475 | [, ], 476 | [, ], 477 | [, ], 478 | [, ], 479 | [, ], 480 | [, ], 481 | [, ], 482 | [, ], 483 | [, ], 484 | [, ], 485 | [, ], 486 | [, ]] 487 | 代表的是limb躯干的数量,也就是PAF的种类,总共19种, 488 | 可以理解这个大循环执行的是一个个的任务, 489 | 第一次只是执行寻找比如说Neck到右腰的所有的可能的connection, 490 | 然后是右腰到右膝的所有可能的connection,总共19次任务 491 | ''' 492 | paf_index = [i*2, i*2 + 1] 493 | paf = pafs[paf_index] # shape: (2, 320, 320) 494 | ''' 495 | 这里paf的channel为什么等于2 ? 496 | 因为paf代表的是一个向量场,在躯干覆盖在图片上的这一块区域内, 497 | 每个点都代表一个2维向量, 498 | ground truth的paf代表的是从joint_a到joint_b的单位向量(预测的值自然也会与GT接近) 499 | ''' 500 | limb_point = params['limbs_point'][i] # example: [, ] 501 | cand_a = all_peaks[all_peaks[:, 0] == limb_point[0]][:, 1:] 502 | cand_b = all_peaks[all_peaks[:, 0] == limb_point[1]][:, 1:] 503 | ''' 504 | all_peaks[:, 0]代表peak归属于哪个joint的序号 505 | cand_a表示candidate_a 506 | 表示对应这个关节的所有peak的[x,y,value,idx] 507 | 因为一张图里可能有很多个人, 508 | 这里对应这个关节的peak点也很可能分别属于不同的人 509 | 接下来就要在两两相连的两种关节之间,通过PAF去计算哪些关节是可能相连的 510 | 注意每次任务只会仅仅在两种关节间寻找可能的connection 511 | ''' 512 | 513 | if len(cand_a) > 0 and len(cand_b) > 0: 514 | candidate_connections = self.compute_candidate_connections(paf, cand_a, cand_b, img_len, params) 515 | ''' 516 | candidate_connections: 517 | [[9, 42, 0.8681351658332168], 518 | [8, 41, 0.8360657979306498], 519 | [10, 43, 0.7184696600989704], 520 | [7, 40, 0.619533988669367], 521 | [6, 39, 0.25027479198405156]] 522 | ############## [index_a, index_b, value] 523 | ''' 524 | connections = np.zeros((0, 3)) 525 | ''' 526 | connections : array([], shape=(0, 3), dtype=float64) 527 | 又学到一招,创建一个0行3列的array,相当于一个模板,给后续工作服务 528 | ''' 529 | for index_a, index_b, score in candidate_connections: 530 | if index_a not in connections[:, 0] and index_b not in connections[:, 1]: 531 | ''' 532 | 这个if条件起到了抑制重复连接的作用 533 | index_a代表起始点,index_b代表终点 534 | 一个peak点(或者说检测出来的关节点), 535 | 在一次任务里只可能有一次机会作为起始点或终点, 536 | 因为前面已经根据积分值的大小做了排序, 537 | 积分值最高的自然会被优先选择,后面的可能的connection如果重复了,就被抑制掉了 538 | ''' 539 | connections = np.vstack([connections, [index_a, index_b, score]]) 540 | if len(connections) >= min(len(cand_a), len(cand_b)): 541 | break 542 | all_connections.append(connections) 543 | else: 544 | all_connections.append(np.zeros((0, 3))) 545 | return all_connections 546 | 547 | def grouping_key_points(self, all_connections, candidate_peaks, params): 548 | subsets = -1 * np.ones((0, 20)) 549 | ''' 550 | subsets是用来记录grouping结果的array 551 | 为什么是20列,前面18位用来记录关节点信息 552 | 总共18种关节,每一位对应candidate_peaks中的一个peak的序号 553 | 倒数第二位用来记录整个subset的得分 554 | 倒数第一位用来记录这个subset已经记录了多少个关节 555 | 18种关节不见得都要找到,缺失了就沿用默认值-1 556 | 每一个subset就代表可能检测到的一个人体 557 | ''' 558 | 559 | for l, connections in enumerate(all_connections): 560 | joint_a, joint_b = params['limbs_point'][l] 561 | ''' 562 | 
19种躯干的连接,按照params['limbs_point'][l]的顺序依次读取之前预测出来的可能的connection 563 | 比如 (, ) 564 | 注意这里如果按照顺序来,是不需要走回头路的,没有重复,因为人的躯干就这么几种,不存在重复的问题 565 | ''' 566 | for ind_a, ind_b, score in connections[:, :3]: 567 | ''' 568 | 内循环,针对同一种躯干类型,之前的步骤已经预测出了很多个connection, 569 | 这些connection按道理应该属于不同的人体,因为一个人不可能有两个相同的躯干, 570 | 接下来几个if条件判断的是这3种情况,根据当前这个connection的两个端点的归属情况: 571 | 572 | 1. 如果都不属于任何一个已有的subset(joint_found_cnt == 0),则根据这个connection创建一个新的subset 573 | 2. 如果只有一个属于一个已有的subset,另一个哪都不属于,这种情况最简单,直接在这个已有的subset上再添加一个connection(躯干) 574 | 3. 如果在两个已有的subset上都能找到这两个端点之中的任何一个,那还要看: 575 | a. 如果这两个subset上完全没有任何重复的关节,那就把这两个subset合并 576 | b. 如果有重复,那就把这个connection在这两个subset上都添加上去(反正最后会把得分较低的subset删掉) 577 | 4. 如果有3个以上的subset都包含有这两个端点之中的任何一个,直接pass 578 | ''' 579 | ind_a, ind_b = int(ind_a), int(ind_b) 580 | joint_found_cnt = 0 581 | joint_found_subset_index = [-1, -1] 582 | for subset_ind, subset in enumerate(subsets): 583 | # そのconnectionのjointをもってるsubsetがいる場合 584 | # 如果存在具有该连接的联合的子集 585 | if subset[joint_a] == ind_a or subset[joint_b] == ind_b: 586 | joint_found_subset_index[joint_found_cnt] = subset_ind 587 | joint_found_cnt += 1 588 | # 上面这个for循环遍历所有已有的subset,判断当前connection是两个端点到底和几个subset重合 589 | # print('joint_found_cnt : {}'.format(joint_found_cnt)) 590 | # print('joint_a : {}, joint_b : {}'.format(joint_a, joint_b)) 591 | # print('ind_a : {}, ind_b : {}'.format(ind_a, ind_b)) 592 | if joint_found_cnt == 1: 593 | # ''' 594 | # 只有一个subset有重合的情况 595 | # そのconnectionのどちらかのjointをsubsetが持っている場合 596 | # 如果子集具有该连接的一个关节 597 | # ''' 598 | found_subset = subsets[joint_found_subset_index[0]] 599 | # 肩->耳のconnectionの組合せを除いて、始点の一致しか起こり得ない。肩->耳の場合、終点が一致していた場合は、既に顔のbone検出済みなので処理不要。 600 | # 除了肩 - 耳连接的组合,只能出现起点的匹配。 在肩膀 - >耳朵的情况下,如果端点匹配,则已经不必处理,因为已经检测到面部的骨骼。 601 | if found_subset[joint_b] != ind_b: 602 | found_subset[joint_b] = ind_b 603 | found_subset[-1] += 1 # increment joint count 604 | found_subset[-2] += candidate_peaks[ind_b, 3] + score # joint bのscoreとconnectionの積分値を加算 # 添加关节b的得分和连接的积分值 605 | 606 | elif joint_found_cnt == 2: # '''有2个subset有重合的情况''' 607 | # subset1にjoint1が、subset2にjoint2がある場合(肩->耳のconnectionの組合せした起こり得ない) 608 | # 如果子集1中存在关节1而子集2中存在关节2(通过组合肩 - >耳连接不会发生) 609 | # print('limb {}: 2 subsets have any joint'.format(l)) 610 | found_subset_1 = subsets[joint_found_subset_index[0]] 611 | found_subset_2 = subsets[joint_found_subset_index[1]] 612 | 613 | membership = ((found_subset_1 >= 0).astype(int) + (found_subset_2 >= 0).astype(int))[:-2] 614 | if not np.any(membership == 2): # merge two subsets when no duplication 615 | found_subset_1[:-2] += found_subset_2[:-2] + 1 # default is -1 616 | found_subset_1[-2:] += found_subset_2[-2:] 617 | found_subset_1[-2:] += score # 这一步应该是错误吧,应该是found_subset_1[-2] += score, 没有必要把score值加到joint_count上面去, 不过不影响最终结果 618 | # connectionの積分値のみ加算(jointのscoreはmerge時に全て加算済み) 619 | # 仅添加连接的积分值(在合并时添加联合分数) 620 | subsets = np.delete(subsets, joint_found_subset_index[1], axis=0) 621 | else: 622 | if found_subset_1[joint_a] == -1: 623 | found_subset_1[joint_a] = ind_a 624 | found_subset_1[-1] += 1 625 | found_subset_1[-2] += candidate_peaks[ind_a, 3] + score 626 | elif found_subset_1[joint_b] == -1: 627 | found_subset_1[joint_b] = ind_b 628 | found_subset_1[-1] += 1 629 | found_subset_1[-2] += candidate_peaks[ind_b, 3] + score 630 | if found_subset_2[joint_a] == -1: 631 | found_subset_2[joint_a] = ind_a 632 | found_subset_2[-1] += 1 633 | found_subset_2[-2] += candidate_peaks[ind_a, 3] + score 634 | elif found_subset_2[joint_b] == -1: 635 | found_subset_2[joint_b] = ind_b 636 | found_subset_2[-1] 
+= 1 637 | found_subset_2[-2] += candidate_peaks[ind_b, 3] + score 638 | 639 | elif joint_found_cnt == 0 and l != 9 and l != 13: 640 | # 新規subset作成, 肩耳のconnectionは新規group対象外 641 | # 如果没有任何现成的subset匹配,则创建新的子集,肩耳连接不适用于创建新组 642 | row = -1 * np.ones(20) 643 | row[joint_a] = ind_a 644 | row[joint_b] = ind_b 645 | row[-1] = 2 646 | row[-2] = sum(candidate_peaks[[ind_a, ind_b], 3]) + score 647 | subsets = np.vstack([subsets, row]) 648 | elif joint_found_cnt >= 3: 649 | pass 650 | 651 | # delete low score subsets 652 | keep = np.logical_and(subsets[:, -1] >= params['n_subset_limbs_thresh'], subsets[:, -2]/subsets[:, -1] >= params['subset_score_thresh']) 653 | # params['n_subset_limbs_thresh'] = 3 654 | # params['subset_score_thresh'] = 0.2 655 | subsets = subsets[keep] 656 | return subsets 657 | 658 | 659 | def subsets_to_pose_array(self, subsets, all_peaks): 660 | ''' 661 | 这个函数没啥, 662 | 就是根据每一个subsets里的peak点的id, 663 | 去all_peaks里面取对应的坐标,然后组装成输出 664 | ''' 665 | person_pose_array = [] 666 | for subset in subsets: 667 | joints = [] 668 | for joint_index in subset[:18].astype('i'): 669 | if joint_index >= 0: 670 | joint = all_peaks[joint_index][1:3].tolist() 671 | joint.append(2) 672 | joints.append(joint) 673 | else: 674 | joints.append([0, 0, 0]) 675 | person_pose_array.append(np.array(joints)) 676 | person_pose_array = np.array(person_pose_array) 677 | return person_pose_array 678 | 679 | def compute_limbs_length(self, joints): 680 | limbs = [] 681 | limbs_len = np.zeros(len(params["limbs_point"])) 682 | for i, joint_indices in enumerate(params["limbs_point"]): 683 | if joints[joint_indices[0]] is not None and joints[joint_indices[1]] is not None: 684 | limbs.append([joints[joint_indices[0]], joints[joint_indices[1]]]) 685 | limbs_len[i] = np.linalg.norm(joints[joint_indices[1]][:-1] - joints[joint_indices[0]][:-1]) 686 | else: 687 | limbs.append(None) 688 | 689 | return limbs_len, limbs 690 | 691 | def compute_unit_length(self, limbs_len): 692 | unit_length = 0 693 | base_limbs_len = limbs_len[[14, 3, 0, 13, 9]] # (鼻首、首左腰、首右腰、肩左耳、肩右耳)の長さの比率(このどれかが存在すればこれを優先的に単位長さの計算する) 694 | non_zero_limbs_len = base_limbs_len > 0 695 | if len(np.nonzero(non_zero_limbs_len)[0]) > 0: 696 | limbs_len_ratio = np.array([0.85, 2.2, 2.2, 0.85, 0.85]) 697 | unit_length = np.sum(base_limbs_len[non_zero_limbs_len] / limbs_len_ratio[non_zero_limbs_len]) / len(np.nonzero(non_zero_limbs_len)[0]) 698 | else: 699 | limbs_len_ratio = np.array([2.2, 1.7, 1.7, 2.2, 1.7, 1.7, 0.6, 0.93, 0.65, 0.85, 0.6, 0.93, 0.65, 0.85, 1, 0.2, 0.2, 0.25, 0.25]) 700 | non_zero_limbs_len = limbs_len > 0 701 | unit_length = np.sum(limbs_len[non_zero_limbs_len] / limbs_len_ratio[non_zero_limbs_len]) / len(np.nonzero(non_zero_limbs_len)[0]) 702 | 703 | return unit_length 704 | 705 | def get_unit_length(self, person_pose): 706 | limbs_length, limbs = self.compute_limbs_length(person_pose) 707 | unit_length = self.compute_unit_length(limbs_length) 708 | 709 | return unit_length 710 | 711 | def crop_around_keypoint(self, img, keypoint, crop_size): 712 | x, y = keypoint 713 | left = int(x - crop_size) 714 | top = int(y - crop_size) 715 | right = int(x + crop_size) 716 | bottom = int(y + crop_size) 717 | bbox = (left, top, right, bottom) 718 | 719 | cropped_img = self.crop_image(img, bbox) 720 | 721 | return cropped_img, bbox 722 | 723 | def crop_person(self, img, person_pose, unit_length): 724 | top_joint_priority = [4, 5, 6, 12, 16, 7, 13, 17, 8, 10, 14, 9, 11, 15, 2, 3, 0, 1, sys.maxsize] 725 | bottom_joint_priority = [9, 6, 7, 14, 16, 8, 15, 17, 4, 
2, 0, 5, 3, 1, 10, 11, 12, 13, sys.maxsize] 726 | 727 | top_joint_index = len(top_joint_priority) - 1 728 | bottom_joint_index = len(bottom_joint_priority) - 1 729 | left_joint_index = 0 730 | right_joint_index = 0 731 | top_pos = sys.maxsize 732 | bottom_pos = 0 733 | left_pos = sys.maxsize 734 | right_pos = 0 735 | 736 | for i, joint in enumerate(person_pose): 737 | if joint[2] > 0: 738 | if top_joint_priority[i] < top_joint_priority[top_joint_index]: 739 | top_joint_index = i 740 | elif bottom_joint_priority[i] < bottom_joint_priority[bottom_joint_index]: 741 | bottom_joint_index = i 742 | if joint[1] < top_pos: 743 | top_pos = joint[1] 744 | elif joint[1] > bottom_pos: 745 | bottom_pos = joint[1] 746 | 747 | if joint[0] < left_pos: 748 | left_pos = joint[0] 749 | left_joint_index = i 750 | elif joint[0] > right_pos: 751 | right_pos = joint[0] 752 | right_joint_index = i 753 | 754 | top_padding_radio = [0.9, 1.9, 1.9, 2.9, 3.7, 1.9, 2.9, 3.7, 4.0, 5.5, 7.0, 4.0, 5.5, 7.0, 0.7, 0.8, 0.7, 0.8] 755 | bottom_padding_radio = [6.9, 5.9, 5.9, 4.9, 4.1, 5.9, 4.9, 4.1, 3.8, 2.3, 0.8, 3.8, 2.3, 0.8, 7.1, 7.0, 7.1, 7.0] 756 | 757 | left = (left_pos - 0.3 * unit_length).astype(int) 758 | right = (right_pos + 0.3 * unit_length).astype(int) 759 | top = (top_pos - top_padding_radio[top_joint_index] * unit_length).astype(int) 760 | bottom = (bottom_pos + bottom_padding_radio[bottom_joint_index] * unit_length).astype(int) 761 | bbox = (left, top, right, bottom) 762 | 763 | cropped_img = self.crop_image(img, bbox) 764 | return cropped_img, bbox 765 | 766 | def crop_face(self, img, person_pose, unit_length): 767 | face_size = unit_length 768 | face_img = None 769 | bbox = None 770 | 771 | # if have nose 772 | if person_pose[JointType.Nose][2] > 0: 773 | nose_pos = person_pose[JointType.Nose][:2] 774 | face_top = int(nose_pos[1] - face_size * 1.2) 775 | face_bottom = int(nose_pos[1] + face_size * 0.8) 776 | face_left = int(nose_pos[0] - face_size) 777 | face_right = int(nose_pos[0] + face_size) 778 | bbox = (face_left, face_top, face_right, face_bottom) 779 | face_img = self.crop_image(img, bbox) 780 | 781 | return face_img, bbox 782 | 783 | def crop_hands(self, img, person_pose, unit_length): 784 | hands = { 785 | "left": None, 786 | "right": None 787 | } 788 | 789 | if person_pose[JointType.LeftHand][2] > 0: 790 | crop_center = person_pose[JointType.LeftHand][:-1] 791 | if person_pose[JointType.LeftElbow][2] > 0: 792 | direction_vec = person_pose[JointType.LeftHand][:-1] - person_pose[JointType.LeftElbow][:-1] 793 | crop_center += (0.3 * direction_vec).astype(crop_center.dtype) 794 | hand_img, bbox = self.crop_around_keypoint(img, crop_center, unit_length * 0.95) 795 | hands["left"] = { 796 | "img": hand_img, 797 | "bbox": bbox 798 | } 799 | 800 | if person_pose[JointType.RightHand][2] > 0: 801 | crop_center = person_pose[JointType.RightHand][:-1] 802 | if person_pose[JointType.RightElbow][2] > 0: 803 | direction_vec = person_pose[JointType.RightHand][:-1] - person_pose[JointType.RightElbow][:-1] 804 | crop_center += (0.3 * direction_vec).astype(crop_center.dtype) 805 | hand_img, bbox = self.crop_around_keypoint(img, crop_center, unit_length * 0.95) 806 | hands["right"] = { 807 | "img": hand_img, 808 | "bbox": bbox 809 | } 810 | 811 | return hands 812 | 813 | def crop_image(self, img, bbox): 814 | left, top, right, bottom = bbox 815 | img_h, img_w, img_ch = img.shape 816 | box_h = bottom - top 817 | box_w = right - left 818 | 819 | crop_left = max(0, left) 820 | crop_top = max(0, top) 821 | crop_right = 
min(img_w, right) 822 | crop_bottom = min(img_h, bottom) 823 | crop_h = crop_bottom - crop_top 824 | crop_w = crop_right - crop_left 825 | cropped_img = img[crop_top:crop_bottom, crop_left:crop_right] 826 | 827 | bias_x = bias_y = 0 828 | if left < crop_left: 829 | bias_x = crop_left - left 830 | if top < crop_top: 831 | bias_y = crop_top - top 832 | 833 | # pad 834 | padded_img = np.zeros((box_h, box_w, img_ch), dtype=np.uint8) 835 | padded_img[bias_y:bias_y+crop_h, bias_x:bias_x+crop_w] = cropped_img 836 | return padded_img 837 | 838 | def preprocess(self, img): 839 | x_data = img.astype('f') 840 | x_data /= 255 841 | x_data -= 0.5 842 | x_data = x_data.transpose(2, 0, 1)[None] 843 | return x_data 844 | 845 | def detect_precise(self, orig_img): 846 | orig_img_h, orig_img_w, _ = orig_img.shape 847 | 848 | pafs_sum = 0 849 | heatmaps_sum = 0 850 | 851 | interpolation = cv2.INTER_CUBIC 852 | 853 | for scale in params['inference_scales']: 854 | # TTA, multl scale testing, scale in [0.5, 1, 1.5, 2] 855 | multiplier = scale * params['inference_img_size'] / min(orig_img.shape[:2]) 856 | # 通过scale和实际输入img尺寸判断缩放参数,然后resize输入图片 857 | img = cv2.resize(orig_img, (math.ceil(orig_img_w*multiplier), math.ceil(orig_img_h*multiplier)), interpolation=interpolation) 858 | # bbox = (params['inference_img_size'], max(params['inference_img_size'], img.shape[1])) 859 | # 这个bbox有什么用 ? 可以删掉吧 860 | padded_img, pad = self.pad_image(img, params['downscale'], (104, 117, 123)) 861 | # 图片经过manpool缩小8倍,如果不能整除,就pad一下,(104, 117, 123)是输入数据集的均值 ? 862 | 863 | x_data = self.preprocess(padded_img) 864 | x_data = torch.tensor(x_data).to(self.device) 865 | x_data.requires_grad = False 866 | 867 | with torch.no_grad(): 868 | 869 | h1s, h2s = self.model(x_data) #输出的是6组相同尺寸的feature,训练的时候是都有用,但是推断的时候就只用最后一组 870 | 871 | tmp_paf = h1s[-1][0].cpu().numpy().transpose(1, 2, 0) 872 | tmp_heatmap = h2s[-1][0].cpu().numpy().transpose(1, 2, 0) 873 | 874 | p_h, p_w = padded_img.shape[:2] 875 | tmp_paf = cv2.resize(tmp_paf, (p_w, p_h), interpolation=interpolation) 876 | #首先,paf先 resize到padded_img的尺寸 877 | tmp_paf = tmp_paf[:p_h-pad[0], :p_w-pad[1], :] 878 | #去掉padding 879 | pafs_sum += cv2.resize(tmp_paf, (orig_img_w, orig_img_h), interpolation=interpolation) 880 | #再resize回原始的输入img的尺寸 881 | 882 | tmp_heatmap = cv2.resize(tmp_heatmap, (0, 0), fx=params['downscale'], fy=params['downscale'], interpolation=interpolation) 883 | tmp_heatmap = tmp_heatmap[:padded_img.shape[0]-pad[0], :padded_img.shape[1]-pad[1], :] 884 | heatmaps_sum += cv2.resize(tmp_heatmap, (orig_img_w, orig_img_h), interpolation=interpolation) 885 | #heat_map的操作和pafs一样 886 | 887 | #经过多个scale的feature计算,再对pafs_sum和heatmaps_sum求均值,就得到了TTA最终的输出feature 888 | 889 | self.pafs = (pafs_sum / len(params['inference_scales'])).transpose(2, 0, 1) 890 | self.heatmaps = (heatmaps_sum / len(params['inference_scales'])).transpose(2, 0, 1) 891 | 892 | self.all_peaks = self.compute_peaks_from_heatmaps(self.heatmaps) 893 | if len(self.all_peaks) == 0: 894 | return np.empty((0, len(JointType), 3)), np.empty(0) 895 | all_connections = self.compute_connections(self.pafs, self.all_peaks, orig_img_w, params) 896 | subsets = self.grouping_key_points(all_connections, self.all_peaks, params) 897 | poses = self.subsets_to_pose_array(subsets, self.all_peaks) 898 | scores = subsets[:, -2] 899 | return poses, scores 900 | 901 | def detect(self, orig_img, precise = False): 902 | orig_img = orig_img.copy() 903 | if precise: 904 | return self.detect_precise(orig_img) 905 | orig_img_h, orig_img_w, _ = 
orig_img.shape 906 | 907 | input_w, input_h = self.compute_optimal_size(orig_img, params['inference_img_size']) 908 | map_w, map_h = self.compute_optimal_size(orig_img, params['heatmap_size']) 909 | 910 | resized_image = cv2.resize(orig_img, (input_w, input_h)) 911 | x_data = self.preprocess(resized_image) 912 | 913 | x_data = torch.tensor(x_data).to(self.device) 914 | x_data.requires_grad = False 915 | 916 | with torch.no_grad(): 917 | 918 | h1s, h2s = self.model(x_data) 919 | 920 | pafs = F.interpolate(h1s[-1], (map_h, map_w), mode='bilinear', align_corners=True).cpu().numpy()[0] 921 | heatmaps = F.interpolate(h2s[-1], (map_h, map_w), mode='bilinear', align_corners=True).cpu().numpy()[0] 922 | 923 | all_peaks = self.compute_peaks_from_heatmaps(heatmaps) 924 | if len(all_peaks) == 0: 925 | return np.empty((0, len(JointType), 3)), np.empty(0) 926 | all_connections = self.compute_connections(pafs, all_peaks, map_w, params) 927 | subsets = self.grouping_key_points(all_connections, all_peaks, params) 928 | all_peaks[:, 1] *= orig_img_w / map_w 929 | all_peaks[:, 2] *= orig_img_h / map_h 930 | poses = self.subsets_to_pose_array(subsets, all_peaks) 931 | scores = subsets[:, -2] 932 | return poses, scores 933 | 934 | 935 | def draw_person_pose(orig_img, poses): 936 | orig_img = cv2.cvtColor(orig_img, cv2.COLOR_BGR2RGB) 937 | if len(poses) == 0: 938 | return orig_img 939 | 940 | limb_colors = [ 941 | [0, 255, 0], [0, 255, 85], [0, 255, 170], [0, 255, 255], [0, 170, 255], 942 | [0, 85, 255], [255, 0, 0], [255, 85, 0], [255, 170, 0], [255, 255, 0.], 943 | [255, 0, 85], [170, 255, 0], [85, 255, 0], [170, 0, 255.], [0, 0, 255], 944 | [0, 0, 255], [255, 0, 255], [170, 0, 255], [255, 0, 170], 945 | ] 946 | 947 | joint_colors = [ 948 | [255, 0, 0], [255, 85, 0], [255, 170, 0], [255, 255, 0], [170, 255, 0], 949 | [85, 255, 0], [0, 255, 0], [0, 255, 85], [0, 255, 170], [0, 255, 255], 950 | [0, 170, 255], [0, 85, 255], [0, 0, 255], [85, 0, 255], [170, 0, 255], 951 | [255, 0, 255], [255, 0, 170], [255, 0, 85]] 952 | 953 | canvas = orig_img.copy() 954 | 955 | # limbs 956 | for pose in poses.round().astype('i'): 957 | for i, (limb, color) in enumerate(zip(params['limbs_point'], limb_colors)): 958 | if i != 9 and i != 13: # don't show ear-shoulder connection 959 | limb_ind = np.array(limb) 960 | if np.all(pose[limb_ind][:, 2] != 0): 961 | joint1, joint2 = pose[limb_ind][:, :2] 962 | cv2.line(canvas, tuple(joint1), tuple(joint2), color, 2) 963 | 964 | # joints 965 | for pose in poses.round().astype('i'): 966 | for i, ((x, y, v), color) in enumerate(zip(pose, joint_colors)): 967 | if v != 0: 968 | cv2.circle(canvas, (x, y), 3, color, -1) 969 | return canvas -------------------------------------------------------------------------------- /pose_detect.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import argparse 3 | from openpose import Openpose, draw_person_pose 4 | 5 | if __name__ == '__main__': 6 | parser = argparse.ArgumentParser(description='Pose detector') 7 | parser.add_argument('weights', help='weights file path') 8 | parser.add_argument('--img', '-i', help='image file path') 9 | parser.add_argument('--precise', '-p', action='store_true', help='do precise inference') 10 | args = parser.parse_args() 11 | 12 | # load model 13 | openpose = Openpose(weights_file = args.weights, training = False) 14 | 15 | # read image 16 | img = cv2.imread(args.img) 17 | 18 | # inference 19 | poses, _ = openpose.detect(img, precise=args.precise) 20 | 21 | # draw and 
save image 22 | img = draw_person_pose(cv2.cvtColor(img, cv2.COLOR_BGR2RGB), poses) 23 | 24 | print('Saving result into result.png...') 25 | cv2.imwrite('result.png', img) -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | from openpose import Openpose 2 | import argparse 3 | 4 | def parse_args(): 5 | parser = argparse.ArgumentParser(description="Train openpose") 6 | parser.add_argument("-r", "--resume", help="whether to resume from the latest saved model", action="store_true") 7 | parser.add_argument("-save", "--from_save_folder", help="whether to resume from the save path", action="store_true") 8 | args = parser.parse_args() 9 | return args 10 | 11 | if __name__ == '__main__': 12 | args = parse_args() 13 | openpose = Openpose() 14 | if args.resume: 15 | openpose.resume_training_load(from_save_folder = args.from_save_folder) 16 | openpose.train() --------------------------------------------------------------------------------
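(Editor's aside.) The core of `Openpose.compute_peaks_from_heatmaps` in `openpose.py` above is a simple non-maximum suppression: smooth each joint's heatmap with a Gaussian, then keep the pixels that exceed `params['heatmap_peak_thresh']` and are strictly greater than their four neighbours. A stripped-down NumPy/SciPy sketch of that step is shown below; the 0.05 threshold mirrors the default mentioned in the comments above, while the smoothing sigma and the synthetic test heat-map are purely illustrative.

```
import numpy as np
from scipy.ndimage import gaussian_filter

def find_peaks(heatmap, thresh=0.05, sigma=2.5):
    hm = gaussian_filter(heatmap, sigma=sigma)
    # shifted copies of the map: each pixel is compared against its 4 neighbours
    up = np.zeros_like(hm)
    up[1:, :] = hm[:-1, :]
    down = np.zeros_like(hm)
    down[:-1, :] = hm[1:, :]
    left = np.zeros_like(hm)
    left[:, 1:] = hm[:, :-1]
    right = np.zeros_like(hm)
    right[:, :-1] = hm[:, 1:]
    peaks = (hm > thresh) & (hm > up) & (hm > down) & (hm > left) & (hm > right)
    ys, xs = np.nonzero(peaks)
    # return (x, y, score) triples, matching the (x, y) ordering used in openpose.py
    return [(int(x), int(y), float(hm[y, x])) for y, x in zip(ys, xs)]

# Tiny example: a single synthetic blob yields exactly one peak at its centre.
hm = np.zeros((32, 32), dtype=np.float32)
hm[16, 16] = 10.0
print(find_peaks(hm))   # -> [(16, 16, ~0.25)]
```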