├── README.md
├── coco_dataset.py
├── data
├── dinner.png
├── dinner_result.png
├── face.jpg
├── face.png
├── face_result.png
├── football.jpg
├── football_detected.jpg
├── hand.jpg
├── hand.png
├── hand_result.png
├── people.png
├── people_result.png
├── person.png
└── person_result.png
├── entity.py
├── face_detector.py
├── gen_ignore_mask.py
├── getData.sh
├── hand_detector.py
├── models
├── CocoPoseNet.py
├── FaceNet.py
└── HandNet.py
├── openpose.py
├── pose_detect.py
└── train.py
/README.md:
--------------------------------------------------------------------------------
1 | # Pytorch0.4.1_Realtime\_Multi-Person\_Pose\_Estimation
2 |
3 | This is a PyTorch implementation of [Realtime Multi-Person Pose Estimation](https://arxiv.org/abs/1611.08050).
4 | The original project is [Realtime_Multi-Person_Pose_Estimation](https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation).
5 |
6 | This repo is mainly based on the [Chainer Implementation](https://github.com/DeNA/Chainer_Realtime_Multi-Person_Pose_Estimation) from DeNA.
7 |
8 | This project is licensed under the terms of the original [license](https://github.com/DeNA/Chainer_Realtime_Multi-Person_Pose_Estimation/blob/master/LICENSE).
9 |
10 | The main contributions are:
11 |
12 | 1. The Chainer backbone is switched to PyTorch
13 | 2. Pretrained PyTorch models are provided
14 | 3. Chinese comments are added to make the method easier to understand
15 |
16 | ## Content
17 |
18 | 1. [Converting caffe model](#convert-caffe-model-to-chainer-model)
19 | 2. [Testing](#test-using-the-trained-model)
20 | 3. [Training](#train-your-model)
21 |
22 | ## Test using the trained model
23 |
24 | Download the following pretrained models into the `models` folder:
25 |
26 | posenet.pth : [@Google Drive](https://drive.google.com/open?id=19AIYt2lez5V3x4wFVJvVvWwQpB8uoQp2) [@One Drive](https://1drv.ms/u/s!AhMqVPD44cDOhxrwKTyv9yv3FRVq)
27 |
28 | facenet.pth : [@Google Drive](https://drive.google.com/open?id=1zjv4fQt3Sd567VpesqAEO4JVsjB79xpZ) [@One Drive](https://1drv.ms/u/s!AhMqVPD44cDOhxwu2Nmf1eXNmOXd)
29 |
30 | handnet.pth : [@Google Drive](https://drive.google.com/open?id=1LdWngNbcamMJFAuRaqT45Iar2tA_eUXd) [@One Drive](https://1drv.ms/u/s!AhMqVPD44cDOhxs1EIBYqksR6avn)
31 |
32 | Run the following command with the weight file and an image file as arguments to estimate poses.
33 | The resulting image will be saved as `result.jpg`.
34 |
35 | ```
36 | python pose_detect.py models/posenet.pth -i data/football.jpg
37 | ```
38 |
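You can also try the other sample images shipped in `data/`; the invocation is the same, only the image path changes, e.g.:

```
python pose_detect.py models/posenet.pth -i data/people.png
```
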
47 | Similarly, execute the following command for face estimation.
48 | The resulting image will be saved as `result.png`.
49 |
50 | ```
51 | python face_detector.py models/facenet.pth -i data/face.png
52 | ```
53 |
62 | Similarly, execute the following command for hand estimation.
63 | The resulting image will be saved as `result.png`.
64 |
65 | ```
66 | python hand_detector.py models/handnet.pth -i data/hand.jpg
67 | ```
68 |
77 | ## Train your model
78 |
79 | This section describes the training procedure using the COCO 2017 dataset.
80 |
81 | ### Download COCO 2017 dataset
82 |
83 | ```
84 | cd data
85 | bash getData.sh
86 | ```
87 |
88 | If you have already downloaded the dataset yourself, skip this step and set `coco_dir` in `entity.py` to the path of the existing dataset.
89 |
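For example, assuming the dataset lives at `/path/to/coco2017` (an illustrative path), the corresponding entry in the `params` dict of `entity.py` would be:

```
'coco_dir': '/path/to/coco2017',
```
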
90 | ### Setup COCO API
91 |
92 | ```
93 | git clone https://github.com/cocodataset/cocoapi.git
94 | cd cocoapi/PythonAPI/
95 | make
96 | python setup.py install
97 | cd ../../
98 | ```
99 |
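As an optional sanity check, make sure the API imports cleanly:

```
python -c "from pycocotools.coco import COCO; print('pycocotools OK')"
```
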
100 | ### Download the [VGG-19 pretrained model](https://1drv.ms/u/s!AhMqVPD44cDOhx3dz655sCwOck2X) to the `models` folder
101 |
102 | ### Generate and save image masks
103 |
104 | Mask images are generated to exclude person regions that are not annotated with any keypoints from the training loss.
105 | The `--vis` option can be used to visualize the mask generated for each image.
106 |
107 | ```
108 | python gen_ignore_mask.py
109 | ```
110 |
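For example, to inspect the masks interactively before generating them (the viewer loop in `gen_ignore_mask.py` quits on `q` and saves a snapshot on `s`):

```
python gen_ignore_mask.py --vis
```
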
111 | ### Train with COCO dataset
112 |
113 | Every 1000 iterations, the latest weights are saved to a file such as `model_iter_1000`.
114 |
115 | ```
116 | python train.py
117 | ```
118 |
119 | More training configuration options can be found in `entity.py`.
120 |
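Some of the defaults defined there (see `entity.py` for the full list):

```
'insize': 368,        # network input resolution
'batch_size': 10,
'lr': 1e-4,
'paf_sigma': 8,       # limb width used when generating part affinity fields
'heatmap_sigma': 7,   # spread of the keypoint heatmaps
```
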
121 | ## Related repository
122 |
123 | - CVPR'16, [Convolutional Pose Machines](https://github.com/shihenw/convolutional-pose-machines-release).
124 | - CVPR'17, [Realtime Multi-Person Pose Estimation](https://github.com/ZheC/Realtime_Multi-Person_Pose_Estimation).
125 |
126 |
127 |
128 | ## Citation
129 |
130 | Please cite the original paper in your publications if it helps your research:
131 |
132 | ```
133 | @InProceedings{cao2017realtime,
134 | title = {Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields},
135 | author = {Zhe Cao and Tomas Simon and Shih-En Wei and Yaser Sheikh},
136 | booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
137 | year = {2017}
138 | }
139 | ```
--------------------------------------------------------------------------------
/coco_dataset.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 | import cv2
4 | import math
5 | import random
6 | import numpy as np
7 | import torch
8 | from torch.utils.data import Dataset
9 | from pycocotools.coco import COCO
10 |
11 | from entity import JointType, params
12 |
13 | class CocoDataset(Dataset):
14 | def __init__(self, coco, insize, mode='train', n_samples=None):
15 | self.coco = coco
16 | assert mode in ['train', 'val', 'eval'], 'Data loading mode is invalid.'
17 | self.mode = mode
18 | self.catIds = coco.getCatIds(catNms=['person'])
19 | self.imgIds = sorted(coco.getImgIds(catIds=self.catIds))
20 | if self.mode in ['val', 'eval'] and n_samples is not None:
21 | self.imgIds = random.sample(self.imgIds, n_samples)
22 | print('{} images: {}'.format(mode, len(self)))
23 | self.insize = insize
24 |
25 | def __len__(self):
26 | return len(self.imgIds)
27 |
28 | def overlay_paf(self, img, paf):
29 | hue = ((np.arctan2(paf[1], paf[0]) / np.pi) / -2 + 0.5)
30 | saturation = np.sqrt(paf[0] ** 2 + paf[1] ** 2)
31 | saturation[saturation > 1.0] = 1.0
32 | value = saturation.copy()
33 | hsv_paf = np.vstack((hue[np.newaxis], saturation[np.newaxis], value[np.newaxis])).transpose(1, 2, 0)
34 | rgb_paf = cv2.cvtColor((hsv_paf * 255).astype(np.uint8), cv2.COLOR_HSV2BGR)
35 | img = cv2.addWeighted(img, 0.6, rgb_paf, 0.4, 0)
36 | return img
37 |
38 | def overlay_pafs(self, img, pafs):
39 | mix_paf = np.zeros((2,) + img.shape[:-1])
40 | paf_flags = np.zeros(mix_paf.shape) # for constant paf
41 |
42 | for paf in pafs.reshape((int(pafs.shape[0]/2), 2,) + pafs.shape[1:]):
43 | flags = paf != 0  # per-limb flags; keep the accumulated paf_flags intact
44 | paf_flags += np.broadcast_to(flags[0] | flags[1], paf.shape)
45 | mix_paf += paf
46 |
47 | mix_paf[paf_flags > 0] /= paf_flags[paf_flags > 0]
48 | img = self.overlay_paf(img, mix_paf)
49 | return img
50 |
51 | def overlay_heatmap(self, img, heatmap):
52 | rgb_heatmap = cv2.applyColorMap((heatmap * 255).astype(np.uint8), cv2.COLORMAP_JET)
53 | img = cv2.addWeighted(img, 0.6, rgb_heatmap, 0.4, 0)
54 | return img
55 |
56 | def overlay_ignore_mask(self, img, ignore_mask):
57 | img = img * np.repeat((ignore_mask == 0).astype(np.uint8)[:, :, None], 3, axis=2)
58 | return img
59 |
60 | def get_pose_bboxes(self, poses):
61 | pose_bboxes = []
62 | for pose in poses:
63 | x1 = pose[pose[:, 2] > 0][:, 0].min()
64 | y1 = pose[pose[:, 2] > 0][:, 1].min()
65 | x2 = pose[pose[:, 2] > 0][:, 0].max()
66 | y2 = pose[pose[:, 2] > 0][:, 1].max()
67 | pose_bboxes.append([x1, y1, x2, y2])
68 | pose_bboxes = np.array(pose_bboxes)
69 | return pose_bboxes
70 |
71 | def resize_data(self, img, ignore_mask, poses, shape):
72 | """resize img, mask and annotations"""
73 | img_h, img_w, _ = img.shape
74 |
75 | resized_img = cv2.resize(img, shape)
76 | ignore_mask = cv2.resize(ignore_mask.astype(np.uint8), shape).astype('bool')
77 | poses[:, :, :2] = (poses[:, :, :2] * np.array(shape) / np.array((img_w, img_h)))
78 | return resized_img, ignore_mask, poses
79 |
80 | def random_resize_img(self, img, ignore_mask, poses):
81 | h, w, _ = img.shape
82 | joint_bboxes = self.get_pose_bboxes(poses)
83 | bbox_sizes = ((joint_bboxes[:, 2:] - joint_bboxes[:, :2] + 1)**2).sum(axis=1)**0.5
84 |
85 | min_scale = params['min_box_size']/bbox_sizes.min()
86 | max_scale = params['max_box_size']/bbox_sizes.max()
87 |
88 | # print(len(bbox_sizes))
89 | # print('min: {}, max: {}'.format(min_scale, max_scale))
90 |
91 | min_scale = min(max(min_scale, params['min_scale']), 1)
92 | max_scale = min(max(max_scale, 1), params['max_scale'])
93 |
94 | # print('min: {}, max: {}'.format(min_scale, max_scale))
95 |
96 | scale = float((max_scale - min_scale) * random.random() + min_scale)
97 | shape = (round(w * scale), round(h * scale))
98 |
99 | # print(scale)
100 |
101 | resized_img, resized_mask, resized_poses = self.resize_data(img, ignore_mask, poses, shape)
102 | return resized_img, resized_mask, resized_poses
103 |
104 | def random_rotate_img(self, img, mask, poses):
105 | h, w, _ = img.shape
106 | # degree = (random.random() - 0.5) * 2 * params['max_rotate_degree']
107 | degree = np.random.randn() / 3 * params['max_rotate_degree']
108 | rad = degree * math.pi / 180
109 | center = (w / 2, h / 2)
110 | R = cv2.getRotationMatrix2D(center, degree, 1)
111 | bbox = (w*abs(math.cos(rad)) + h*abs(math.sin(rad)), w*abs(math.sin(rad)) + h*abs(math.cos(rad)))
112 | R[0, 2] += bbox[0] / 2 - center[0]
113 | R[1, 2] += bbox[1] / 2 - center[1]
114 | rotate_img = cv2.warpAffine(img, R, (int(bbox[0]+0.5), int(bbox[1]+0.5)), flags=cv2.INTER_CUBIC,
115 | borderMode=cv2.BORDER_CONSTANT, borderValue=[127.5, 127.5, 127.5])
116 | rotate_mask = cv2.warpAffine(mask.astype('uint8')*255, R, (int(bbox[0]+0.5), int(bbox[1]+0.5))) > 0
117 |
118 | tmp_poses = np.ones_like(poses)
119 | tmp_poses[:, :, :2] = poses[:, :, :2].copy()
120 | tmp_rotate_poses = np.dot(tmp_poses, R.T) # apply rotation matrix to the poses
121 | rotate_poses = poses.copy() # to keep visibility flag
122 | rotate_poses[:, :, :2] = tmp_rotate_poses
123 | return rotate_img, rotate_mask, rotate_poses
124 |
125 | def random_crop_img(self, img, ignore_mask, poses):
126 | h, w, _ = img.shape
127 | insize = self.insize
128 | joint_bboxes = self.get_pose_bboxes(poses)
129 | bbox = random.choice(joint_bboxes) # select a bbox randomly
130 | bbox_center = bbox[:2] + (bbox[2:] - bbox[:2])/2
131 |
132 | r_xy = np.random.rand(2)
133 | perturb = ((r_xy - 0.5) * 2 * params['center_perterb_max'])
134 | center = (bbox_center + perturb + 0.5).astype('i')
135 |
136 | crop_img = np.zeros((insize, insize, 3), 'uint8') + 127.5
137 | crop_mask = np.zeros((insize, insize), 'bool')
138 |
139 | offset = (center - (insize-1)/2 + 0.5).astype('i')
140 | offset_ = (center + (insize-1)/2 - (w-1, h-1) + 0.5).astype('i')
141 |
142 | x1, y1 = (center - (insize-1)/2 + 0.5).astype('i')
143 | x2, y2 = (center + (insize-1)/2 + 0.5).astype('i')
144 |
145 | x1 = max(x1, 0)
146 | y1 = max(y1, 0)
147 | x2 = min(x2, w-1)
148 | y2 = min(y2, h-1)
149 |
150 | x_from = -offset[0] if offset[0] < 0 else 0
151 | y_from = -offset[1] if offset[1] < 0 else 0
152 | x_to = insize - offset_[0] - 1 if offset_[0] >= 0 else insize - 1
153 | y_to = insize - offset_[1] - 1 if offset_[1] >= 0 else insize - 1
154 |
155 | crop_img[y_from:y_to+1, x_from:x_to+1] = img[y1:y2+1, x1:x2+1].copy()
156 | crop_mask[y_from:y_to+1, x_from:x_to+1] = ignore_mask[y1:y2+1, x1:x2+1].copy()
157 |
158 | poses[:, :, :2] -= offset
159 | return crop_img.astype('uint8'), crop_mask, poses
160 |
161 | def distort_color(self, img):
162 | img_max = np.broadcast_to(np.array(255, dtype=np.uint8), img.shape[:-1])
163 | img_min = np.zeros(img.shape[:-1], dtype=np.uint8)
164 |
165 | hsv_img = cv2.cvtColor(img.copy(), cv2.COLOR_BGR2HSV).astype(np.int32)
166 | hsv_img[:, :, 0] = np.maximum(np.minimum(hsv_img[:, :, 0] - 10 + np.random.randint(20 + 1), img_max), img_min) # hue
167 | hsv_img[:, :, 1] = np.maximum(np.minimum(hsv_img[:, :, 1] - 40 + np.random.randint(80 + 1), img_max), img_min) # saturation
168 | hsv_img[:, :, 2] = np.maximum(np.minimum(hsv_img[:, :, 2] - 30 + np.random.randint(60 + 1), img_max), img_min) # value
169 | hsv_img = hsv_img.astype(np.uint8)
170 |
171 | distorted_img = cv2.cvtColor(hsv_img, cv2.COLOR_HSV2BGR)
172 | return distorted_img
173 |
174 | def flip_img(self, img, mask, poses):
175 | flipped_img = cv2.flip(img, 1)
176 | flipped_mask = cv2.flip(mask.astype(np.uint8), 1).astype('bool')
177 | poses[:, :, 0] = img.shape[1] - 1 - poses[:, :, 0]
178 |
179 | def swap_joints(poses, joint_type_1, joint_type_2):
180 | tmp = poses[:, joint_type_1].copy()
181 | poses[:, joint_type_1] = poses[:, joint_type_2]
182 | poses[:, joint_type_2] = tmp
183 |
184 | swap_joints(poses, JointType.LeftEye, JointType.RightEye)
185 | swap_joints(poses, JointType.LeftEar, JointType.RightEar)
186 | swap_joints(poses, JointType.LeftShoulder, JointType.RightShoulder)
187 | swap_joints(poses, JointType.LeftElbow, JointType.RightElbow)
188 | swap_joints(poses, JointType.LeftHand, JointType.RightHand)
189 | swap_joints(poses, JointType.LeftWaist, JointType.RightWaist)
190 | swap_joints(poses, JointType.LeftKnee, JointType.RightKnee)
191 | swap_joints(poses, JointType.LeftFoot, JointType.RightFoot)
192 | return flipped_img, flipped_mask, poses
193 |
194 | def augment_data(self, img, ignore_mask, poses):
195 | aug_img = img.copy()
196 | aug_img, ignore_mask, poses = self.random_resize_img(aug_img, ignore_mask, poses)
197 | aug_img, ignore_mask, poses = self.random_rotate_img(aug_img, ignore_mask, poses)
198 | aug_img, ignore_mask, poses = self.random_crop_img(aug_img, ignore_mask, poses)
199 | if np.random.randint(2):
200 | aug_img = self.distort_color(aug_img)
201 | if np.random.randint(2):
202 | aug_img, ignore_mask, poses = self.flip_img(aug_img, ignore_mask, poses)
203 |
204 | return aug_img, ignore_mask, poses
205 |
206 | # return shape: (height, width)
207 | def generate_gaussian_heatmap(self, shape, joint, sigma):
208 | x, y = joint
209 | grid_x = np.tile(np.arange(shape[1]), (shape[0], 1))
210 | grid_y = np.tile(np.arange(shape[0]), (shape[1], 1)).transpose()
211 | grid_distance = (grid_x - x) ** 2 + (grid_y - y) ** 2
212 | gaussian_heatmap = np.exp(-0.5 * grid_distance / sigma**2)
213 | # this produces a Gaussian over the whole image; values far from the center are just vanishingly small
214 | return gaussian_heatmap
215 |
216 | def generate_heatmaps(self, img, poses, heatmap_sigma):
217 | heatmaps = np.zeros((0,) + img.shape[:-1])
218 | sum_heatmap = np.zeros(img.shape[:-1])
219 | for joint_index in range(len(JointType)):
220 | heatmap = np.zeros(img.shape[:-1])
221 | for pose in poses:
222 | if pose[joint_index, 2] > 0:
223 | jointmap = self.generate_gaussian_heatmap(img.shape[:-1], pose[joint_index][:2], heatmap_sigma)
224 | heatmap[jointmap > heatmap] = jointmap[jointmap > heatmap]
225 | sum_heatmap[jointmap > sum_heatmap] = jointmap[jointmap > sum_heatmap]
226 | heatmaps = np.vstack((heatmaps, heatmap.reshape((1,) + heatmap.shape)))
227 | bg_heatmap = 1 - sum_heatmap # background channel
228 | heatmaps = np.vstack((heatmaps, bg_heatmap[None]))
229 | '''
230 | From the paper: we take the maximum of the confidence maps instead of the average so that
231 | the precision of close-by peaks remains distinct, as illustrated in the right figure.
232 | At test time, we predict confidence maps (as shown in the first row of Fig. 4),
233 | and obtain body part candidates by performing non-maximum suppression.
234 | '''
236 | return heatmaps.astype('f')
237 |
238 | # return shape: (2, height, width)
239 | def generate_constant_paf(self, shape, joint_from, joint_to, paf_width):
240 | if np.array_equal(joint_from, joint_to): # same joint
241 | return np.zeros((2,) + shape[:-1])
242 |
243 | joint_distance = np.linalg.norm(joint_to - joint_from)
244 | unit_vector = (joint_to - joint_from) / joint_distance
245 | rad = np.pi / 2
246 | rot_matrix = np.array([[np.cos(rad), np.sin(rad)], [-np.sin(rad), np.cos(rad)]])
247 | vertical_unit_vector = np.dot(rot_matrix, unit_vector) # unit vector perpendicular to the limb
248 | grid_x = np.tile(np.arange(shape[1]), (shape[0], 1))
249 | grid_y = np.tile(np.arange(shape[0]), (shape[1], 1)).transpose() # grid_x, grid_y enumerate every pixel in the image
250 | horizontal_inner_product = unit_vector[0] * (grid_x - joint_from[0]) + unit_vector[1] * (grid_y - joint_from[1])
251 | horizontal_paf_flag = (0 <= horizontal_inner_product) & (horizontal_inner_product <= joint_distance)
252 | '''
253 | For every pixel, take the vector from joint_from to that pixel and project it onto unit_vector.
254 | The dot product gives the length of that projection along the limb direction,
255 | so it tells us whether the pixel lies along this limb.
256 | (0 <= horizontal_inner_product) & (horizontal_inner_product <= joint_distance)
257 | keeps, along the limb direction, only the pixels that fall between the two joints.
258 | We still need to check how far the pixel is from the limb in the perpendicular direction.
259 | '''
260 | vertical_inner_product = vertical_unit_vector[0] * (grid_x - joint_from[0]) + vertical_unit_vector[1] * (grid_y - joint_from[1])
261 | vertical_paf_flag = np.abs(vertical_inner_product) <= paf_width # paf_width : 8
262 | '''
263 | To measure how far a pixel is from the limb, project the vector from joint_from onto the perpendicular unit vector;
264 | the threshold is paf_width, otherwise an arm would become infinitely thick.
265 | vertical_paf_flag = np.abs(vertical_inner_product) <= paf_width
266 | keeps, perpendicular to the limb, only the pixels within the (hand-defined) limb width.
267 | '''
268 | paf_flag = horizontal_paf_flag & vertical_paf_flag # combine the two constraints
269 | constant_paf = np.stack((paf_flag, paf_flag)) * np.broadcast_to(unit_vector, shape[:-1] + (2,)).transpose(2, 0, 1)
270 | # constant_paf.shape : (2, 368, 368); this broadcasts the 2-d unit_vector onto every pixel where paf_flag is True
271 | # constant_paf stores a 2-d vector at each of the 368*368 pixels
272 | # each of these vectors takes one of only two values: (0, 0) or unit_vector
273 | '''In the end this function implements Eq. 8 and Eq. 9 of the paper; see the corresponding section for details.'''
274 | return constant_paf
275 |
276 | def generate_pafs(self, img, poses, paf_sigma):
277 | pafs = np.zeros((0,) + img.shape[:-1])
278 |
279 | for limb in params['limbs_point']:
280 | paf = np.zeros((2,) + img.shape[:-1])
281 | paf_flags = np.zeros(paf.shape) # for constant paf
282 |
283 | for pose in poses:
284 | joint_from, joint_to = pose[limb]
285 | if joint_from[2] > 0 and joint_to[2] > 0:
286 | limb_paf = self.generate_constant_paf(img.shape, joint_from[:2], joint_to[:2], paf_sigma) #[2,368,368]
287 | limb_paf_flags = limb_paf != 0
288 | paf_flags += np.broadcast_to(limb_paf_flags[0] | limb_paf_flags[1], limb_paf.shape)
289 | '''
290 | paf_flags counts overlaps: after iterating over all people in an image, some regions may overlap,
291 | e.g. two people's left forearms crossing, so overlapping pixels are accumulated more than once;
292 | the count is used below to average the vectors.
293 | '''
294 | paf += limb_paf
295 |
296 | paf[paf_flags > 0] /= paf_flags[paf_flags > 0] # average where limbs overlap
297 | pafs = np.vstack((pafs, paf))
298 | return pafs.astype('f')
299 |
300 | def get_img_annotation(self, ind=None, img_id=None):
301 | """インデックスまたは img_id から coco annotation dataを抽出、条件に満たない場合はNoneを返す """
302 | '''从索引或img_id中提取coco注释数据,如果不符合条件,则返回None'''
303 | annotations = None
304 |
305 | if ind is not None:
306 | img_id = self.imgIds[ind]
307 | anno_ids = self.coco.getAnnIds(imgIds=[img_id], iscrowd=None)
308 |
309 | # annotation for that image
310 | if len(anno_ids) > 0:
311 | annotations_for_img = self.coco.loadAnns(anno_ids)
312 |
313 | person_cnt = 0
314 | valid_annotations_for_img = []
315 | for annotation in annotations_for_img:
316 | # if too few keypoints or too small
317 | if annotation['num_keypoints'] >= params['min_keypoints'] and annotation['area'] > params['min_area']:
318 | person_cnt += 1
319 | valid_annotations_for_img.append(annotation)
320 |
321 | # if person annotation
322 | if person_cnt > 0:
323 | annotations = valid_annotations_for_img
324 |
325 | if self.mode == 'train':
326 | img_path = os.path.join(params['coco_dir'], 'train2017', self.coco.loadImgs([img_id])[0]['file_name'])
327 | mask_path = os.path.join(params['coco_dir'], 'ignore_mask_train2017', '{:012d}.png'.format(img_id))
328 | else:
329 | img_path = os.path.join(params['coco_dir'], 'val2017', self.coco.loadImgs([img_id])[0]['file_name'])
330 | mask_path = os.path.join(params['coco_dir'], 'ignore_mask_val2017', '{:012d}.png'.format(img_id))
331 | img = cv2.imread(img_path)
332 | ignore_mask = cv2.imread(mask_path, 0)
333 | if ignore_mask is None:
334 | ignore_mask = np.zeros(img.shape[:2], 'bool')
335 | else:
336 | ignore_mask = ignore_mask == 255
337 |
338 | if self.mode == 'eval':
339 | return img, img_id, annotations_for_img, ignore_mask
340 | return img, img_id, annotations, ignore_mask
341 |
342 | def parse_coco_annotation(self, annotations):
343 | """coco annotation dataのアノテーションをposes配列に変換"""
344 | '''将coco注释数据注释转换为姿势数组'''
345 | poses = np.zeros((0, len(JointType), 3), dtype=np.int32)
346 |
347 | for ann in annotations:
348 | ann_pose = np.array(ann['keypoints']).reshape(-1, 3)
349 | pose = np.zeros((1, len(JointType), 3), dtype=np.int32)
350 |
351 | # convert poses position
352 | for i, joint_index in enumerate(params['coco_joint_indices']):
353 | pose[0][joint_index] = ann_pose[i]
354 |
355 | # compute neck position
356 | if pose[0][JointType.LeftShoulder][2] > 0 and pose[0][JointType.RightShoulder][2] > 0:
357 | pose[0][JointType.Neck][0] = int((pose[0][JointType.LeftShoulder][0] + pose[0][JointType.RightShoulder][0]) / 2)
358 | pose[0][JointType.Neck][1] = int((pose[0][JointType.LeftShoulder][1] + pose[0][JointType.RightShoulder][1]) / 2)
359 | pose[0][JointType.Neck][2] = 2
360 |
361 | poses = np.vstack((poses, pose))
362 |
363 | # gt_pose = np.array(ann['keypoints']).reshape(-1, 3)
364 | return poses
365 |
366 | def generate_labels(self, img, poses, ignore_mask):
367 | img, ignore_mask, poses = self.augment_data(img, ignore_mask, poses)
368 | resized_img, ignore_mask, resized_poses = self.resize_data(img, ignore_mask, poses, shape=(self.insize, self.insize))
369 |
370 | heatmaps = self.generate_heatmaps(resized_img, resized_poses, params['heatmap_sigma'])
371 | pafs = self.generate_pafs(resized_img, resized_poses, params['paf_sigma']) # params['paf_sigma']: 8
372 | ignore_mask = cv2.morphologyEx(ignore_mask.astype('uint8'), cv2.MORPH_DILATE, np.ones((16, 16))).astype('bool')
373 | return resized_img, pafs, heatmaps, ignore_mask
374 |
375 | def preprocess(self, img):
376 | x_data = img.astype('f')
377 | x_data /= 255
378 | x_data -= 0.5
379 | x_data = x_data.transpose(2, 0, 1)
380 | return x_data
381 |
382 | def __getitem__(self, i):
383 | img, img_id, annotations, ignore_mask = self.get_img_annotation(ind=i)
384 |
385 | if self.mode == 'eval':
386 | # don't need to make heatmaps/pafs
387 | return img, annotations, img_id
388 |
389 | # if no annotations are available
390 | while annotations is None:
391 | img_id = self.imgIds[np.random.randint(len(self))]
392 | img, img_id, annotations, ignore_mask = self.get_img_annotation(img_id=img_id)
393 |
394 | poses = self.parse_coco_annotation(annotations)
395 | resized_img, pafs, heatmaps, ignore_mask = self.generate_labels(img, poses, ignore_mask)
396 | resized_img = self.preprocess(resized_img)
397 | resized_img = torch.tensor(resized_img)
398 | pafs = torch.tensor(pafs)
399 | heatmaps = torch.tensor(heatmaps)
400 | ignore_mask = torch.tensor(ignore_mask.astype('f'))
401 | return resized_img, pafs, heatmaps, ignore_mask
--------------------------------------------------------------------------------
/data/dinner.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/dinner.png
--------------------------------------------------------------------------------
/data/dinner_result.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/dinner_result.png
--------------------------------------------------------------------------------
/data/face.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/face.jpg
--------------------------------------------------------------------------------
/data/face.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/face.png
--------------------------------------------------------------------------------
/data/face_result.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/face_result.png
--------------------------------------------------------------------------------
/data/football.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/football.jpg
--------------------------------------------------------------------------------
/data/football_detected.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/football_detected.jpg
--------------------------------------------------------------------------------
/data/hand.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/hand.jpg
--------------------------------------------------------------------------------
/data/hand.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/hand.png
--------------------------------------------------------------------------------
/data/hand_result.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/hand_result.png
--------------------------------------------------------------------------------
/data/people.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/people.png
--------------------------------------------------------------------------------
/data/people_result.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/people_result.png
--------------------------------------------------------------------------------
/data/person.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/person.png
--------------------------------------------------------------------------------
/data/person_result.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TreB1eN/Pytorch0.4.1_Openpose/358ace1708116edc174dd0a2dbdf0f7f7195a7b2/data/person_result.png
--------------------------------------------------------------------------------
/entity.py:
--------------------------------------------------------------------------------
1 | from enum import IntEnum
2 |
3 | from models.CocoPoseNet import CocoPoseNet
4 |
5 | from models.FaceNet import FaceNet
6 | from models.HandNet import HandNet
7 | from pathlib import Path
8 |
9 | class JointType(IntEnum):
10 | """関節の種類を表す """
11 | Nose = 0
12 | """ 鼻 """
13 | Neck = 1
14 | """ 首 """
15 | RightShoulder = 2
16 | """ 右肩 """
17 | RightElbow = 3
18 | """ 右肘 """
19 | RightHand = 4
20 | """ 右手 """
21 | LeftShoulder = 5
22 | """ 左肩 """
23 | LeftElbow = 6
24 | """ 左肘 """
25 | LeftHand = 7
26 | """ 左手 """
27 | RightWaist = 8
28 | """ 右腰 """
29 | RightKnee = 9
30 | """ 右膝 """
31 | RightFoot = 10
32 | """ 右足 """
33 | LeftWaist = 11
34 | """ 左腰 """
35 | LeftKnee = 12
36 | """ 左膝 """
37 | LeftFoot = 13
38 | """ 左足 """
39 | RightEye = 14
40 | """ 右目 """
41 | LeftEye = 15
42 | """ 左目 """
43 | RightEar = 16
44 | """ 右耳 """
45 | LeftEar = 17
46 | """ 左耳 """
47 |
48 | params = {
49 | 'coco_dir': 'coco2017',
50 | 'archs': {
51 | 'posenet': CocoPoseNet,
52 | 'facenet': FaceNet,
53 | 'handnet': HandNet,
54 | },
55 | 'pretrained_path' : 'models/pretrained_vgg_base.pth',
56 | # training params
57 | 'min_keypoints': 5,
58 | 'min_area': 32 * 32,
59 | 'insize': 368,
60 | 'downscale': 8,
61 | 'paf_sigma': 8,
62 | 'heatmap_sigma': 7,
63 | 'batch_size': 10,
64 | 'lr': 1e-4,
65 | 'num_workers': 2,
66 | 'eva_num': 100,
67 | 'board_loss_interval': 100,
68 | 'eval_interval': 4,
69 | 'board_pred_image_interval': 2,
70 | 'save_interval': 2,
71 | 'log_path': 'work_space/log',
72 | 'work_space': Path('work_space'),
73 |
74 | 'min_box_size': 64,
75 | 'max_box_size': 512,
76 | 'min_scale': 0.5,
77 | 'max_scale': 2.0,
78 | 'max_rotate_degree': 40,
79 | 'center_perterb_max': 40,
80 |
81 | # inference params
82 | 'inference_img_size': 368,
83 | 'inference_scales': [0.5, 1, 1.5, 2],
84 | # 'inference_scales': [1.0],
85 | 'heatmap_size': 320,
86 | 'gaussian_sigma': 2.5,
87 | 'ksize': 17,
88 | 'n_integ_points': 10,
89 | 'n_integ_points_thresh': 8,
90 | 'heatmap_peak_thresh': 0.05,
91 | 'inner_product_thresh': 0.05,
92 | 'limb_length_ratio': 1.0,
93 | 'length_penalty_value': 1,
94 | 'n_subset_limbs_thresh': 3,
95 | 'subset_score_thresh': 0.2,
96 | 'limbs_point': [
97 | [JointType.Neck, JointType.RightWaist],
98 | [JointType.RightWaist, JointType.RightKnee],
99 | [JointType.RightKnee, JointType.RightFoot],
100 | [JointType.Neck, JointType.LeftWaist],
101 | [JointType.LeftWaist, JointType.LeftKnee],
102 | [JointType.LeftKnee, JointType.LeftFoot],
103 | [JointType.Neck, JointType.RightShoulder],
104 | [JointType.RightShoulder, JointType.RightElbow],
105 | [JointType.RightElbow, JointType.RightHand],
106 | [JointType.RightShoulder, JointType.RightEar],
107 | [JointType.Neck, JointType.LeftShoulder],
108 | [JointType.LeftShoulder, JointType.LeftElbow],
109 | [JointType.LeftElbow, JointType.LeftHand],
110 | [JointType.LeftShoulder, JointType.LeftEar],
111 | [JointType.Neck, JointType.Nose],
112 | [JointType.Nose, JointType.RightEye],
113 | [JointType.Nose, JointType.LeftEye],
114 | [JointType.RightEye, JointType.RightEar],
115 | [JointType.LeftEye, JointType.LeftEar]
116 | ],
117 | 'coco_joint_indices': [
118 | JointType.Nose,
119 | JointType.LeftEye,
120 | JointType.RightEye,
121 | JointType.LeftEar,
122 | JointType.RightEar,
123 | JointType.LeftShoulder,
124 | JointType.RightShoulder,
125 | JointType.LeftElbow,
126 | JointType.RightElbow,
127 | JointType.LeftHand,
128 | JointType.RightHand,
129 | JointType.LeftWaist,
130 | JointType.RightWaist,
131 | JointType.LeftKnee,
132 | JointType.RightKnee,
133 | JointType.LeftFoot,
134 | JointType.RightFoot
135 | ],
136 |
137 | # face params
138 | 'face_inference_img_size': 368,
139 | 'face_heatmap_peak_thresh': 0.1,
140 | 'face_crop_scale': 1.5,
141 | 'face_line_indices': [
142 | [0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7], [7, 8], [8, 9], [9, 10], [10, 11], [11, 12], [12, 13], [13, 14], [14, 15], [15, 16], # face outline (jawline)
143 | [17, 18], [18, 19], [19, 20], [20, 21], # right eyebrow
144 | [22, 23], [23, 24], [24, 25], [25, 26], # left eyebrow
145 | [27, 28], [28, 29], [29, 30], # nose bridge
146 | [31, 32], [32, 33], [33, 34], [34, 35], # horizontal line under the nose
147 | [36, 37], [37, 38], [38, 39], [39, 40], [40, 41], [41, 36], # right eye
148 | [42, 43], [43, 44], [44, 45], [45, 46], [46, 47], [47, 42], # left eye
149 | [48, 49], [49, 50], [50, 51], [51, 52], [52, 53], [53, 54], [54, 55], [55, 56], [56, 57], [57, 58], [58, 59], [59, 48], # outer lip contour
150 | [60, 61], [61, 62], [62, 63], [63, 64], [64, 65], [65, 66], [66, 67], [67, 60] # inner lip contour
151 | ],
152 |
153 | # hand params
154 | 'hand_inference_img_size': 368,
155 | 'hand_heatmap_peak_thresh': 0.1,
156 | 'fingers_indices': [
157 | [[0, 1], [1, 2], [2, 3], [3, 4]],
158 | [[0, 5], [5, 6], [6, 7], [7, 8]],
159 | [[0, 9], [9, 10], [10, 11], [11, 12]],
160 | [[0, 13], [13, 14], [14, 15], [15, 16]],
161 | [[0, 17], [17, 18], [18, 19], [19, 20]],
162 | ],
163 | }
164 |
--------------------------------------------------------------------------------
/face_detector.py:
--------------------------------------------------------------------------------
1 | import cv2
2 | import argparse
3 | import numpy as np
4 | from scipy.ndimage.filters import gaussian_filter
5 | import torch
6 | import torch.nn.functional as F
7 | from entity import params
8 | from models.FaceNet import FaceNet
9 |
10 | class FaceDetector(object):
11 | def __init__(self, weights_file):
12 | print('Loading FaceNet...')
13 | self.model = FaceNet()
14 | self.model.load_state_dict(torch.load(weights_file))
15 |
16 | self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
17 | self.model = self.model.to(self.device)
18 |
19 | def detect(self, face_img, fast_mode=False):
20 | face_img_h, face_img_w, _ = face_img.shape
21 |
22 | resized_image = cv2.resize(face_img, (params["face_inference_img_size"], params["face_inference_img_size"]))
23 | x_data = np.array(resized_image[np.newaxis], dtype=np.float32).transpose(0, 3, 1, 2) / 256 - 0.5
24 | x_data = torch.tensor(x_data).to(self.device)
25 | x_data.requires_grad = False
26 |
27 | with torch.no_grad():
28 | hs = self.model(x_data)
29 | heatmaps = F.interpolate(hs[-1], (face_img_h, face_img_w), mode='bilinear', align_corners=True).cpu().numpy()[0]
30 |
31 | keypoints = self.compute_peaks_from_heatmaps(heatmaps)
32 | return keypoints
33 |
34 | def compute_peaks_from_heatmaps(self, heatmaps):
35 | keypoints = []
36 |
37 | for i in range(heatmaps.shape[0] - 1):
38 | heatmap = gaussian_filter(heatmaps[i], sigma=params['gaussian_sigma'])
39 | max_value = heatmap.max()
40 | if max_value > params['face_heatmap_peak_thresh']:
41 | coords = np.array(np.where(heatmap==max_value)).flatten().tolist()
42 | keypoints.append([coords[1], coords[0], max_value]) # x, y, conf
43 | else:
44 | keypoints.append(None)
45 |
46 | return keypoints
47 |
48 | def draw_face_keypoints(orig_img, face_keypoints, left_top):
49 | orig_img = cv2.cvtColor(orig_img, cv2.COLOR_BGR2RGB)
50 | img = orig_img.copy()
51 | left, top = left_top
52 |
53 | for keypoint in face_keypoints:
54 | if keypoint:
55 | x, y, conf = keypoint
56 | cv2.circle(img, (x + left, y + top), 2, (255, 255, 0), -1)
57 |
58 | for face_line_index in params["face_line_indices"]:
59 | keypoint_from = face_keypoints[face_line_index[0]]
60 | keypoint_to = face_keypoints[face_line_index[1]]
61 |
62 | if keypoint_from and keypoint_to:
63 | keypoint_from_x, keypoint_from_y, _ = keypoint_from
64 | keypoint_to_x, keypoint_to_y, _ = keypoint_to
65 | cv2.line(img, (keypoint_from_x + left, keypoint_from_y + top), (keypoint_to_x + left, keypoint_to_y + top), (255, 255, 0), 1)
66 |
67 | return img
68 |
69 | def crop_face(img, rect):
70 | orig_img_h, orig_img_w, _ = img.shape
71 | crop_center_x = rect[0] + rect[2] / 2
72 | crop_center_y = rect[1] + rect[3] / 2
73 | crop_width = rect[2] * params['face_crop_scale']
74 | crop_height = rect[3] * params['face_crop_scale']
75 | crop_left = max(0, int(crop_center_x - crop_width / 2))
76 | crop_top = max(0, int(crop_center_y - crop_height / 2))
77 | crop_right = min(orig_img_w-1, int(crop_center_x + crop_width / 2))
78 | crop_bottom = min(orig_img_h-1, int(crop_center_y + crop_height / 2))
79 | cropped_face = img[crop_top:crop_bottom, crop_left:crop_right]
80 | max_edge_len = np.max(cropped_face.shape[:-1])
81 | padded_face = np.zeros((max_edge_len, max_edge_len, cropped_face.shape[-1]), dtype=np.uint8)
82 | padded_face[0:cropped_face.shape[0], 0:cropped_face.shape[1]] = cropped_face
83 |
84 | return padded_face, (crop_left, crop_top)
85 |
86 | if __name__ == '__main__':
87 | parser = argparse.ArgumentParser(description='Face detector')
88 | parser.add_argument('weights', help='weights file path')
89 | parser.add_argument('--img', '-i', help='image file path')
90 | args = parser.parse_args()
91 |
92 | # load model
93 | face_detector = FaceDetector(args.weights)
94 |
95 | # read image
96 | img = cv2.imread(args.img)
97 |
98 | # inference
99 | face_keypoints = face_detector.detect(img)
100 |
101 | # draw and save image
102 | img = draw_face_keypoints(cv2.cvtColor(img, cv2.COLOR_BGR2RGB), face_keypoints, (0, 0))
103 | print('Saving result into result.png...')
104 | cv2.imwrite('result.png', img)
105 |
--------------------------------------------------------------------------------
/gen_ignore_mask.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 | import cv2
4 | import argparse
5 | import numpy as np
6 | from tqdm import tqdm
7 |
8 | from pycocotools.coco import COCO
9 |
10 | from entity import params
11 |
12 |
13 | class CocoDataLoader(object):
14 | def __init__(self, coco, mode='train'):
15 | self.coco = coco
16 | assert mode in ['train', 'val'], 'Data loading mode is invalid.'
17 | self.mode = mode
18 | self.catIds = coco.getCatIds() # catNms=['person']
19 | self.imgIds = sorted(coco.getImgIds(catIds=self.catIds))
20 |
21 | def __len__(self):
22 | return len(self.imgIds)
23 |
24 | def gen_masks(self, img, annotations):
25 | mask_all = np.zeros(img.shape[:2], 'bool')
26 | mask_miss = np.zeros(img.shape[:2], 'bool')
27 | for ann in annotations:
28 | mask = self.coco.annToMask(ann).astype('bool')
29 | if ann['iscrowd'] == 1:
30 | intxn = mask_all & mask
31 | mask_miss = np.bitwise_or(mask_miss.astype(int) , np.subtract(mask, intxn, dtype=np.int32))
32 | mask_all = np.bitwise_or(mask_all.astype(int) , mask.astype(int))
33 | elif ann['num_keypoints'] < params['min_keypoints'] or ann['area'] <= params['min_area']:
34 | mask_all = np.bitwise_or(mask_all.astype(int) , mask.astype(int))
35 | mask_miss = np.bitwise_or(mask_miss.astype(int) , mask.astype(int))
36 | else:
37 | mask_all = np.bitwise_or(mask_all.astype(int) , mask.astype(int))
38 | return mask_all, mask_miss
39 |
40 | def draw_gen_masks(self, img, mask, color=(0, 0, 1)):
41 | bimsk = np.repeat(mask[:, :, np.newaxis], 3, axis=2)
42 | mskd = img * bimsk.astype(np.int32)
43 | clmsk = np.ones(bimsk.shape) * bimsk
44 | for i in range(3):
45 | clmsk[:, :, i] = clmsk[:, :, i] * color[i] * 255
46 | img = img + 0.7 * clmsk - 0.7 * mskd
47 | return img.astype(np.uint8)
48 |
49 | def draw_masks_and_keypoints(self, img, annotations):
50 | for ann in annotations:
51 | # masks
52 | mask = self.coco.annToMask(ann).astype(np.uint8)
53 | if ann['iscrowd'] == 1:
54 | color = (0, 0, 1)
55 | elif ann['num_keypoints'] == 0:
56 | color = (0, 1, 0)
57 | else:
58 | color = (1, 0, 0)
59 | bimsk = np.repeat(mask[:, :, np.newaxis], 3, axis=2)
60 | mskd = img * bimsk.astype(np.int32)
61 | clmsk = np.ones(bimsk.shape) * bimsk
62 | for i in range(3):
63 | clmsk[:, :, i] = clmsk[:, :, i] * color[i] * 255
64 | img = img + 0.7 * clmsk - 0.7 * mskd
65 |
66 | # keypoints
67 | for x, y, v in np.array(ann['keypoints']).reshape(-1, 3):
68 | if v == 1:
69 | cv2.circle(img, (x, y), 3, (255, 255, 0), -1)
70 | elif v == 2:
71 | cv2.circle(img, (x, y), 3, (255, 0, 255), -1)
72 | return img.astype(np.uint8)
73 |
74 | def get_img_annotation(self, ind=None, img_id=None):
75 | """インデックスまたは img_id から coco annotation dataを抽出、条件に満たない場合はNoneを返す """
76 | if ind is not None:
77 | img_id = self.imgIds[ind]
78 |
79 | anno_ids = self.coco.getAnnIds(imgIds=[img_id])
80 | annotations = self.coco.loadAnns(anno_ids)
81 |
82 | img_file = os.path.join(params['coco_dir'], self.mode+'2017', self.coco.loadImgs([img_id])[0]['file_name'])
83 | img = cv2.imread(img_file)
84 | return img, annotations, img_id
85 |
86 |
87 | if __name__ == '__main__':
88 | parser = argparse.ArgumentParser()
89 | parser.add_argument('--vis', action='store_true', help='visualize annotations and ignore masks')
90 | args = parser.parse_args()
91 |
92 | for mode in ['train', 'val']:
93 | coco = COCO(os.path.join(params['coco_dir'], 'annotations/person_keypoints_{}2017.json'.format(mode)))
94 | data_loader = CocoDataLoader(coco, mode=mode)
95 |
96 | save_dir = os.path.join(params['coco_dir'], 'ignore_mask_{}2017'.format(mode))
97 | if not os.path.exists(save_dir):
98 | os.makedirs(save_dir)
99 |
100 | for i in tqdm(range(len(data_loader))):
101 | img, annotations, img_id = data_loader.get_img_annotation(ind=i)
102 | mask_all, mask_miss = data_loader.gen_masks(img, annotations)
103 |
104 | if args.vis:
105 | ann_img = data_loader.draw_masks_and_keypoints(img, annotations)
106 | msk_img = data_loader.draw_gen_masks(img, mask_miss)
107 | cv2.imshow('image', np.hstack((ann_img, msk_img)))
108 | k = cv2.waitKey()
109 | if k == ord('q'):
110 | break
111 | elif k == ord('s'):
112 | cv2.imwrite('aaa.png', np.hstack((ann_img, msk_img)))
113 |
114 | if np.any(mask_miss) and not args.vis:
115 | mask_miss = mask_miss.astype(np.uint8) * 255
116 | save_path = os.path.join(save_dir, '{:012d}.png'.format(img_id))
117 | cv2.imwrite(save_path, mask_miss)
118 |
--------------------------------------------------------------------------------
/getData.sh:
--------------------------------------------------------------------------------
1 | # get COCO dataset
2 | mkdir coco2017 # directory name must match 'coco_dir' in entity.py
3 | cd coco2017
4 |
5 | wget http://images.cocodataset.org/zips/train2017.zip
6 | wget http://images.cocodataset.org/zips/val2017.zip
7 | wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
8 |
9 | unzip train2017.zip
10 | unzip val2017.zip
11 | unzip annotations_trainval2017.zip
12 |
13 | rm -f train2017.zip
14 | rm -f val2017.zip
15 | rm -f annotations_trainval2017.zip
16 |
--------------------------------------------------------------------------------
/hand_detector.py:
--------------------------------------------------------------------------------
1 | import cv2
2 | import argparse
3 | import numpy as np
4 | from scipy.ndimage.filters import gaussian_filter
5 | import torch
6 | import torch.nn.functional as F
7 | from entity import params
8 | from models.HandNet import HandNet
9 |
10 | class HandDetector(object):
11 | def __init__(self, weights_file):
12 | print('Loading HandNet...')
13 | self.model = HandNet()
14 | self.model.load_state_dict(torch.load(weights_file))
15 |
16 | self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
17 | self.model = self.model.to(self.device)
18 |
19 | def detect(self, hand_img, fast_mode=False, hand_type="right"):
20 | if hand_type == "left":
21 | hand_img = cv2.flip(hand_img, 1)
22 |
23 | hand_img_h, hand_img_w, _ = hand_img.shape
24 |
25 | resized_image = cv2.resize(hand_img, (params["hand_inference_img_size"], params["hand_inference_img_size"]))
26 | x_data = np.array(resized_image[np.newaxis], dtype=np.float32).transpose(0, 3, 1, 2) / 256 - 0.5
27 | x_data = torch.tensor(x_data).to(self.device)
28 | x_data.requires_grad = False
29 | with torch.no_grad():
30 | hs = self.model(x_data)
31 |
32 | heatmaps = F.interpolate(hs[-1], (hand_img_h, hand_img_w), mode='bilinear', align_corners=True).cpu().numpy()[0]
33 |
34 | if hand_type == "left":
35 | heatmaps = cv2.flip(heatmaps.transpose(1, 2, 0), 1).transpose(2, 0, 1)
36 |
37 | keypoints = self.compute_peaks_from_heatmaps(heatmaps)
38 |
39 | return keypoints
40 |
41 | def compute_peaks_from_heatmaps(self, heatmaps):
42 | keypoints = []
43 |
44 | for i in range(heatmaps.shape[0] - 1):
45 | heatmap = gaussian_filter(heatmaps[i], sigma=params['gaussian_sigma'])
46 | max_value = heatmap.max()
47 | if max_value > params['hand_heatmap_peak_thresh']:
48 | coords = np.array(np.where(heatmap==max_value)).flatten().tolist()
49 | keypoints.append([coords[1], coords[0], max_value]) # x, y, conf
50 | else:
51 | keypoints.append(None)
52 |
53 | return keypoints
54 |
55 | def draw_hand_keypoints(orig_img, hand_keypoints, left_top):
56 | orig_img = cv2.cvtColor(orig_img, cv2.COLOR_BGR2RGB)
57 | img = orig_img.copy()
58 | left, top = left_top
59 |
60 | finger_colors = [
61 | (0, 0, 255),
62 | (0, 255, 255),
63 | (0, 255, 0),
64 | (255, 0, 0),
65 | (255, 0, 255),
66 | ]
67 |
68 | for i, finger_indices in enumerate(params["fingers_indices"]):
69 | for finger_line_index in finger_indices:
70 | keypoint_from = hand_keypoints[finger_line_index[0]]
71 | keypoint_to = hand_keypoints[finger_line_index[1]]
72 |
73 | if keypoint_from:
74 | keypoint_from_x, keypoint_from_y, _ = keypoint_from
75 | cv2.circle(img, (keypoint_from_x + left, keypoint_from_y + top), 3, finger_colors[i], -1)
76 |
77 | if keypoint_to:
78 | keypoint_to_x, keypoint_to_y, _ = keypoint_to
79 | cv2.circle(img, (keypoint_to_x + left, keypoint_to_y + top), 3, finger_colors[i], -1)
80 |
81 | if keypoint_from and keypoint_to:
82 | cv2.line(img, (keypoint_from_x + left, keypoint_from_y + top), (keypoint_to_x + left, keypoint_to_y + top), finger_colors[i], 1)
83 |
84 | return img
85 |
86 | if __name__ == '__main__':
87 | parser = argparse.ArgumentParser(description='Hand detector')
88 | parser.add_argument('weights', help='weights file path')
89 | parser.add_argument('--img', '-i', help='image file path')
90 | args = parser.parse_args()
91 |
92 | # load model
93 | hand_detector = HandDetector(args.weights)
94 |
95 | # read image
96 | img = cv2.imread(args.img)
97 |
98 | # inference
99 | hand_keypoints = hand_detector.detect(img, hand_type="right")
100 |
101 | # draw and save image
102 | img = draw_hand_keypoints(cv2.cvtColor(img, cv2.COLOR_BGR2RGB), hand_keypoints, (0, 0))
103 | print('Saving result into result.png...')
104 | cv2.imwrite('result.png', img)
105 |
--------------------------------------------------------------------------------
/models/CocoPoseNet.py:
--------------------------------------------------------------------------------
1 | import torch
2 | from torch.nn import Conv2d, Module, ReLU, MaxPool2d, init
3 | import torch.nn.functional as F
4 | import numpy as np
5 |
6 | def compute_loss(pafs_ys, heatmaps_ys, pafs_t, heatmaps_t, ignore_mask):
7 | heatmap_loss_log = []
8 | paf_loss_log = []
9 | total_loss = 0
10 |
11 | paf_masks = ignore_mask.unsqueeze(1).repeat([1, pafs_t.shape[1], 1, 1])
12 | heatmap_masks = ignore_mask.unsqueeze(1).repeat([1, heatmaps_t.shape[1], 1, 1])
13 |
14 | # compute loss on each stage
15 | for pafs_y, heatmaps_y in zip(pafs_ys, heatmaps_ys):
16 | stage_pafs_t = pafs_t.clone()
17 | stage_heatmaps_t = heatmaps_t.clone()
18 | stage_paf_masks = paf_masks.clone()
19 | stage_heatmap_masks = heatmap_masks.clone()
20 |
21 | if pafs_y.shape != stage_pafs_t.shape:
22 | with torch.no_grad():
23 | stage_pafs_t = F.interpolate(stage_pafs_t, pafs_y.shape[2:], mode='bilinear', align_corners=True)
24 | stage_heatmaps_t = F.interpolate(stage_heatmaps_t, heatmaps_y.shape[2:], mode='bilinear', align_corners=True)
25 | stage_paf_masks = F.interpolate(stage_paf_masks, pafs_y.shape[2:]) > 0
26 | stage_heatmap_masks = F.interpolate(stage_heatmap_masks, heatmaps_y.shape[2:]) > 0
27 |
28 | with torch.no_grad():
29 | stage_pafs_t[stage_paf_masks == 1] = pafs_y.detach()[stage_paf_masks == 1]
30 | stage_heatmaps_t[stage_heatmap_masks == 1] = heatmaps_y.detach()[stage_heatmap_masks == 1]
31 |
32 | pafs_loss = mean_square_error(pafs_y, stage_pafs_t)
33 | heatmaps_loss = mean_square_error(heatmaps_y, stage_heatmaps_t)
34 |
35 | total_loss += pafs_loss + heatmaps_loss
36 |
37 | paf_loss_log.append(pafs_loss.item())
38 | heatmap_loss_log.append(heatmaps_loss.item())
39 |
40 | return total_loss, np.array(paf_loss_log), np.array(heatmap_loss_log)
41 |
42 | def mean_square_error(pred, target):
43 | assert pred.shape == target.shape, 'pred and target must have the same shape'
44 | return torch.sum((pred - target) ** 2) / target.nelement()
45 |
46 | class CocoPoseNet(Module):
47 | insize = 368
48 | def __init__(self, path = None):
49 | super(CocoPoseNet, self).__init__()
50 | self.base = Base_model()
51 | self.stage_1 = Stage_1()
52 | self.stage_2 = Stage_x()
53 | self.stage_3 = Stage_x()
54 | self.stage_4 = Stage_x()
55 | self.stage_5 = Stage_x()
56 | self.stage_6 = Stage_x()
57 | for m in self.modules():
58 | if isinstance(m, Conv2d):
59 | init.constant_(m.bias, 0)
60 | if path:
61 | self.base.vgg_base.load_state_dict(torch.load(path))
62 |
63 | def forward(self, x):
64 | heatmaps = []
65 | pafs = []
66 | feature_map = self.base(x)
67 | h1, h2 = self.stage_1(feature_map)
68 | pafs.append(h1)
69 | heatmaps.append(h2)
70 | h1, h2 = self.stage_2(torch.cat([h1, h2, feature_map], dim = 1))
71 | pafs.append(h1)
72 | heatmaps.append(h2)
73 | h1, h2 = self.stage_3(torch.cat([h1, h2, feature_map], dim = 1))
74 | pafs.append(h1)
75 | heatmaps.append(h2)
76 | h1, h2 = self.stage_4(torch.cat([h1, h2, feature_map], dim = 1))
77 | pafs.append(h1)
78 | heatmaps.append(h2)
79 | h1, h2 = self.stage_5(torch.cat([h1, h2, feature_map], dim = 1))
80 | pafs.append(h1)
81 | heatmaps.append(h2)
82 | h1, h2 = self.stage_6(torch.cat([h1, h2, feature_map], dim = 1))
83 | pafs.append(h1)
84 | heatmaps.append(h2)
85 | return pafs, heatmaps
86 |
87 | class VGG_Base(Module):
88 | def __init__(self):
89 | super(VGG_Base, self).__init__()
90 | self.conv1_1 = Conv2d(in_channels = 3, out_channels = 64, kernel_size = 3, stride = 1, padding = 1)
91 | self.conv1_2 = Conv2d(in_channels = 64, out_channels = 64, kernel_size = 3, stride = 1, padding = 1)
92 | self.conv2_1 = Conv2d(in_channels = 64, out_channels = 128, kernel_size = 3, stride = 1, padding = 1)
93 | self.conv2_2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 3, stride = 1, padding = 1)
94 | self.conv3_1 = Conv2d(in_channels = 128, out_channels = 256, kernel_size = 3, stride = 1, padding = 1)
95 | self.conv3_2 = Conv2d(in_channels = 256, out_channels = 256, kernel_size = 3, stride = 1, padding = 1)
96 | self.conv3_3 = Conv2d(in_channels = 256, out_channels = 256, kernel_size = 3, stride = 1, padding = 1)
97 | self.conv3_4 = Conv2d(in_channels = 256, out_channels = 256, kernel_size = 3, stride = 1, padding = 1)
98 | self.conv4_1 = Conv2d(in_channels = 256, out_channels = 512, kernel_size = 3, stride = 1, padding = 1)
99 | self.conv4_2 = Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, stride = 1, padding = 1)
100 | self.relu = ReLU()
101 | self.max_pooling_2d = MaxPool2d(kernel_size = 2, stride = 2)
102 |
103 | def forward(self, x):
104 | x = self.relu(self.conv1_1(x))
105 | x = self.relu(self.conv1_2(x))
106 | x = self.max_pooling_2d(x)
107 | x = self.relu(self.conv2_1(x))
108 | x = self.relu(self.conv2_2(x))
109 | x = self.max_pooling_2d(x)
110 | x = self.relu(self.conv3_1(x))
111 | x = self.relu(self.conv3_2(x))
112 | x = self.relu(self.conv3_3(x))
113 | x = self.relu(self.conv3_4(x))
114 | x = self.max_pooling_2d(x)
115 | x = self.relu(self.conv4_1(x))
116 | x = self.relu(self.conv4_2(x))
117 | return x
118 |
119 | class Base_model(Module):
120 | def __init__(self):
121 | super(Base_model, self).__init__()
122 | self.vgg_base = VGG_Base()
123 | self.conv4_3_CPM = Conv2d(in_channels=512, out_channels=256, kernel_size = 3, stride = 1, padding = 1)
124 | self.conv4_4_CPM = Conv2d(in_channels=256, out_channels=128, kernel_size = 3, stride = 1, padding = 1)
125 | self.relu = ReLU()
126 | def forward(self, x):
127 | x = self.vgg_base(x)
128 | x = self.relu(self.conv4_3_CPM(x))
129 | x = self.relu(self.conv4_4_CPM(x))
130 | return x
131 |
132 | class Stage_1(Module):
133 | def __init__(self):
134 | super(Stage_1, self).__init__()
135 | self.conv1_CPM_L1 = Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=1)
136 | self.conv2_CPM_L1 = Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=1)
137 | self.conv3_CPM_L1 = Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=1)
138 | self.conv4_CPM_L1 = Conv2d(in_channels=128, out_channels=512, kernel_size=1, stride=1, padding=0)
139 | self.conv5_CPM_L1 = Conv2d(in_channels=512, out_channels=38, kernel_size=1, stride=1, padding=0)
140 | self.conv1_CPM_L2 = Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=1)
141 | self.conv2_CPM_L2 = Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=1)
142 | self.conv3_CPM_L2 = Conv2d(in_channels=128, out_channels=128, kernel_size=3, stride=1, padding=1)
143 | self.conv4_CPM_L2 = Conv2d(in_channels=128, out_channels=512, kernel_size=1, stride=1, padding=0)
144 | self.conv5_CPM_L2 = Conv2d(in_channels=512, out_channels=19, kernel_size=1, stride=1, padding=0)
145 | self.relu = ReLU()
146 |
147 | def forward(self, x):
148 | h1 = self.relu(self.conv1_CPM_L1(x)) # branch1
149 | h1 = self.relu(self.conv2_CPM_L1(h1))
150 | h1 = self.relu(self.conv3_CPM_L1(h1))
151 | h1 = self.relu(self.conv4_CPM_L1(h1))
152 | h1 = self.conv5_CPM_L1(h1)
153 | h2 = self.relu(self.conv1_CPM_L2(x)) # branch2
154 | h2 = self.relu(self.conv2_CPM_L2(h2))
155 | h2 = self.relu(self.conv3_CPM_L2(h2))
156 | h2 = self.relu(self.conv4_CPM_L2(h2))
157 | h2 = self.conv5_CPM_L2(h2)
158 | return h1, h2
159 |
160 | class Stage_x(Module):
161 | def __init__(self):
162 | super(Stage_x, self).__init__()
163 | self.conv1_L1 = Conv2d(in_channels = 185, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
164 | self.conv2_L1 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
165 | self.conv3_L1 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
166 | self.conv4_L1 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
167 | self.conv5_L1 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
168 | self.conv6_L1 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 1, stride = 1, padding = 0)
169 | self.conv7_L1 = Conv2d(in_channels = 128, out_channels = 38, kernel_size = 1, stride = 1, padding = 0)
170 | self.conv1_L2 = Conv2d(in_channels = 185, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
171 | self.conv2_L2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
172 | self.conv3_L2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
173 | self.conv4_L2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
174 | self.conv5_L2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
175 | self.conv6_L2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 1, stride = 1, padding = 0)
176 | self.conv7_L2 = Conv2d(in_channels = 128, out_channels = 19, kernel_size = 1, stride = 1, padding = 0)
177 | self.relu = ReLU()
178 |
179 | def forward(self, x):
180 | h1 = self.relu(self.conv1_L1(x)) # branch1
181 | h1 = self.relu(self.conv2_L1(h1))
182 | h1 = self.relu(self.conv3_L1(h1))
183 | h1 = self.relu(self.conv4_L1(h1))
184 | h1 = self.relu(self.conv5_L1(h1))
185 | h1 = self.relu(self.conv6_L1(h1))
186 | h1 = self.conv7_L1(h1)
187 | h2 = self.relu(self.conv1_L2(x)) # branch2
188 | h2 = self.relu(self.conv2_L2(h2))
189 | h2 = self.relu(self.conv3_L2(h2))
190 | h2 = self.relu(self.conv4_L2(h2))
191 | h2 = self.relu(self.conv5_L2(h2))
192 | h2 = self.relu(self.conv6_L2(h2))
193 | h2 = self.conv7_L2(h2)
194 | return h1, h2
195 |
--------------------------------------------------------------------------------
/models/FaceNet.py:
--------------------------------------------------------------------------------
1 | import torch
2 | from torch.nn import Conv2d, Module, ReLU, MaxPool2d, init
3 | import torch.nn.functional as F
4 |
5 | class FaceNet(Module):
6 | insize = 368
7 |
8 | def __init__(self):
9 | super(FaceNet, self).__init__()
10 | # cnn to make feature map
11 | self.relu = ReLU()
12 | self.max_pooling_2d = MaxPool2d(kernel_size = 2, stride = 2)
13 | self.conv1_1 = Conv2d(in_channels = 3, out_channels = 64, kernel_size = 3, stride = 1, padding = 1)
14 | self.conv1_2 = Conv2d(in_channels = 64, out_channels = 64, kernel_size = 3, stride = 1, padding = 1)
15 | self.conv2_1 = Conv2d(in_channels = 64, out_channels = 128, kernel_size = 3, stride = 1, padding = 1)
16 | self.conv2_2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 3, stride = 1, padding = 1)
17 | self.conv3_1 = Conv2d(in_channels = 128, out_channels = 256, kernel_size = 3, stride = 1, padding = 1)
18 | self.conv3_2 = Conv2d(in_channels = 256, out_channels = 256, kernel_size = 3, stride = 1, padding = 1)
19 | self.conv3_3 = Conv2d(in_channels = 256, out_channels = 256, kernel_size = 3, stride = 1, padding = 1)
20 | self.conv3_4 = Conv2d(in_channels = 256, out_channels = 256, kernel_size = 3, stride = 1, padding = 1)
21 | self.conv4_1 = Conv2d(in_channels = 256, out_channels = 512, kernel_size = 3, stride = 1, padding = 1)
22 | self.conv4_2 = Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, stride = 1, padding = 1)
23 | self.conv4_3 = Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, stride = 1, padding = 1)
24 | self.conv4_4 = Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, stride = 1, padding = 1)
25 | self.conv5_1 = Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, stride = 1, padding = 1)
26 | self.conv5_2 = Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, stride = 1, padding = 1)
27 | self.conv5_3_CPM = Conv2d(in_channels = 512, out_channels = 128, kernel_size = 3, stride = 1, padding = 1)
28 |
29 | # stage1
30 | self.conv6_1_CPM = Conv2d(in_channels = 128, out_channels = 512, kernel_size = 1, stride = 1, padding = 0)
31 | self.conv6_2_CPM = Conv2d(in_channels = 512, out_channels = 71, kernel_size = 1, stride = 1, padding = 0)
32 |
33 | # stage2
34 | self.Mconv1_stage2 = Conv2d(in_channels = 199, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
35 | self.Mconv2_stage2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
36 | self.Mconv3_stage2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
37 | self.Mconv4_stage2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
38 | self.Mconv5_stage2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
39 | self.Mconv6_stage2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 1, stride = 1, padding = 0)
40 | self.Mconv7_stage2 = Conv2d(in_channels = 128, out_channels = 71, kernel_size = 1, stride = 1, padding = 0)
41 |
42 | # stage3
43 | self.Mconv1_stage3 = Conv2d(in_channels = 199, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
44 | self.Mconv2_stage3 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
45 | self.Mconv3_stage3 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
46 | self.Mconv4_stage3 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
47 | self.Mconv5_stage3 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
48 | self.Mconv6_stage3 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 1, stride = 1, padding = 0)
49 | self.Mconv7_stage3 = Conv2d(in_channels = 128, out_channels = 71, kernel_size = 1, stride = 1, padding = 0)
50 |
51 | # stage4
52 | self.Mconv1_stage4 = Conv2d(in_channels = 199, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
53 | self.Mconv2_stage4 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
54 | self.Mconv3_stage4 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
55 | self.Mconv4_stage4 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
56 | self.Mconv5_stage4 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
57 | self.Mconv6_stage4 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 1, stride = 1, padding = 0)
58 | self.Mconv7_stage4 = Conv2d(in_channels = 128, out_channels = 71, kernel_size = 1, stride = 1, padding = 0)
59 |
60 | # stage5
61 | self.Mconv1_stage5 = Conv2d(in_channels = 199, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
62 | self.Mconv2_stage5 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
63 | self.Mconv3_stage5 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
64 | self.Mconv4_stage5 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
65 | self.Mconv5_stage5 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
66 | self.Mconv6_stage5 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 1, stride = 1, padding = 0)
67 | self.Mconv7_stage5 = Conv2d(in_channels = 128, out_channels = 71, kernel_size = 1, stride = 1, padding = 0)
68 |
69 | # stage6
70 | self.Mconv1_stage6 = Conv2d(in_channels = 199, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
71 | self.Mconv2_stage6 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
72 | self.Mconv3_stage6 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
73 | self.Mconv4_stage6 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
74 | self.Mconv5_stage6 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
75 | self.Mconv6_stage6 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 1, stride = 1, padding = 0)
76 | self.Mconv7_stage6 = Conv2d(in_channels = 128, out_channels = 71, kernel_size = 1, stride = 1, padding = 0)
77 |
78 | for m in self.modules():
79 | if isinstance(m, Conv2d):
80 | init.constant_(m.bias, 0)
81 |
82 | def __call__(self, x):
83 | heatmaps = []
84 |
85 | h = self.relu(self.conv1_1(x))
86 | h = self.relu(self.conv1_2(h))
87 | h = self.max_pooling_2d(h)
88 | h = self.relu(self.conv2_1(h))
89 | h = self.relu(self.conv2_2(h))
90 | h = self.max_pooling_2d(h)
91 | h = self.relu(self.conv3_1(h))
92 | h = self.relu(self.conv3_2(h))
93 | h = self.relu(self.conv3_3(h))
94 | h = self.relu(self.conv3_4(h))
95 | h = self.max_pooling_2d(h)
96 | h = self.relu(self.conv4_1(h))
97 | h = self.relu(self.conv4_2(h))
98 | h = self.relu(self.conv4_3(h))
99 | h = self.relu(self.conv4_4(h))
100 | h = self.relu(self.conv5_1(h))
101 | h = self.relu(self.conv5_2(h))
102 | h = self.relu(self.conv5_3_CPM(h))
103 | feature_map = h
104 |
105 | # stage1
106 | h = self.relu(self.conv6_1_CPM(h))
107 | h = self.conv6_2_CPM(h)
108 | heatmaps.append(h)
109 |
110 | # stage2
111 | h = torch.cat([h, feature_map], dim= 1) # channel concat
112 | h = self.relu(self.Mconv1_stage2(h))
113 | h = self.relu(self.Mconv2_stage2(h))
114 | h = self.relu(self.Mconv3_stage2(h))
115 | h = self.relu(self.Mconv4_stage2(h))
116 | h = self.relu(self.Mconv5_stage2(h))
117 | h = self.relu(self.Mconv6_stage2(h))
118 | h = self.Mconv7_stage2(h)
119 | heatmaps.append(h)
120 |
121 | # stage3
122 | h = torch.cat([h, feature_map], dim= 1) # channel concat
123 | h = self.relu(self.Mconv1_stage3(h))
124 | h = self.relu(self.Mconv2_stage3(h))
125 | h = self.relu(self.Mconv3_stage3(h))
126 | h = self.relu(self.Mconv4_stage3(h))
127 | h = self.relu(self.Mconv5_stage3(h))
128 | h = self.relu(self.Mconv6_stage3(h))
129 | h = self.Mconv7_stage3(h)
130 | heatmaps.append(h)
131 |
132 | # stage4
133 | h = torch.cat([h, feature_map], dim= 1) # channel concat
134 | h = self.relu(self.Mconv1_stage4(h))
135 | h = self.relu(self.Mconv2_stage4(h))
136 | h = self.relu(self.Mconv3_stage4(h))
137 | h = self.relu(self.Mconv4_stage4(h))
138 | h = self.relu(self.Mconv5_stage4(h))
139 | h = self.relu(self.Mconv6_stage4(h))
140 | h = self.Mconv7_stage4(h)
141 | heatmaps.append(h)
142 |
143 | # stage5
144 | h = torch.cat([h, feature_map], dim= 1) # channel concat
145 | h = self.relu(self.Mconv1_stage5(h))
146 | h = self.relu(self.Mconv2_stage5(h))
147 | h = self.relu(self.Mconv3_stage5(h))
148 | h = self.relu(self.Mconv4_stage5(h))
149 | h = self.relu(self.Mconv5_stage5(h))
150 | h = self.relu(self.Mconv6_stage5(h))
151 | h = self.Mconv7_stage5(h)
152 | heatmaps.append(h)
153 |
154 | # stage6
155 | h = torch.cat([h, feature_map], dim= 1) # channel concat
156 | h = self.relu(self.Mconv1_stage6(h))
157 | h = self.relu(self.Mconv2_stage6(h))
158 | h = self.relu(self.Mconv3_stage6(h))
159 | h = self.relu(self.Mconv4_stage6(h))
160 | h = self.relu(self.Mconv5_stage6(h))
161 | h = self.relu(self.Mconv6_stage6(h))
162 | h = self.Mconv7_stage6(h)
163 | heatmaps.append(h)
164 |
165 | return heatmaps
--------------------------------------------------------------------------------
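As a quick smoke test of FaceNet above, the sketch below runs the network on a dummy input and checks the expected output shapes. It assumes the repository root is on the Python path (the same `models.*` import style used in openpose.py) and is only an illustration, not part of the repo:

```
import torch
from models.FaceNet import FaceNet

model = FaceNet().eval()
x = torch.zeros(1, 3, FaceNet.insize, FaceNet.insize)  # 1 x 3 x 368 x 368 dummy image
with torch.no_grad():
    heatmaps = model(x)

# one 71-channel heatmap tensor per stage, at 1/8 resolution (three 2x2 max-poolings)
assert len(heatmaps) == 6
assert heatmaps[-1].shape == (1, 71, FaceNet.insize // 8, FaceNet.insize // 8)
```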
/models/HandNet.py:
--------------------------------------------------------------------------------
1 | import torch
2 | from torch.nn import Conv2d, Module, ReLU, MaxPool2d, init
3 | import torch.nn.functional as F
4 |
5 | class HandNet(Module):
6 | insize = 368
7 |
8 | def __init__(self):
9 | super(HandNet, self).__init__()
10 | # cnn to make feature map
11 | self.relu = ReLU()
12 | self.max_pooling_2d = MaxPool2d(kernel_size = 2, stride = 2)
13 | self.conv1_1 = Conv2d(in_channels = 3, out_channels = 64, kernel_size = 3, stride = 1, padding = 1)
14 | self.conv1_2 = Conv2d(in_channels = 64, out_channels = 64, kernel_size = 3, stride = 1, padding = 1)
15 | self.conv2_1 = Conv2d(in_channels = 64, out_channels = 128, kernel_size = 3, stride = 1, padding = 1)
16 | self.conv2_2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 3, stride = 1, padding = 1)
17 | self.conv3_1 = Conv2d(in_channels = 128, out_channels = 256, kernel_size = 3, stride = 1, padding = 1)
18 | self.conv3_2 = Conv2d(in_channels = 256, out_channels = 256, kernel_size = 3, stride = 1, padding = 1)
19 | self.conv3_3 = Conv2d(in_channels = 256, out_channels = 256, kernel_size = 3, stride = 1, padding = 1)
20 | self.conv3_4 = Conv2d(in_channels = 256, out_channels = 256, kernel_size = 3, stride = 1, padding = 1)
21 | self.conv4_1 = Conv2d(in_channels = 256, out_channels = 512, kernel_size = 3, stride = 1, padding = 1)
22 | self.conv4_2 = Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, stride = 1, padding = 1)
23 | self.conv4_3 = Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, stride = 1, padding = 1)
24 | self.conv4_4 = Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, stride = 1, padding = 1)
25 | self.conv5_1 = Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, stride = 1, padding = 1)
26 | self.conv5_2 = Conv2d(in_channels = 512, out_channels = 512, kernel_size = 3, stride = 1, padding = 1)
27 | self.conv5_3_CPM = Conv2d(in_channels = 512, out_channels = 128, kernel_size = 3, stride = 1, padding = 1)
28 |
29 | # stage1
30 | self.conv6_1_CPM = Conv2d(in_channels = 128, out_channels = 512, kernel_size = 1, stride = 1, padding = 0)
31 | self.conv6_2_CPM = Conv2d(in_channels = 512, out_channels = 22, kernel_size = 1, stride = 1, padding = 0)
32 |
33 | # stage2
34 | self.Mconv1_stage2 = Conv2d(in_channels = 150, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
35 | self.Mconv2_stage2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
36 | self.Mconv3_stage2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
37 | self.Mconv4_stage2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
38 | self.Mconv5_stage2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
39 | self.Mconv6_stage2 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 1, stride = 1, padding = 0)
40 | self.Mconv7_stage2 = Conv2d(in_channels = 128, out_channels = 22, kernel_size = 1, stride = 1, padding = 0)
41 |
42 | # stage3
43 | self.Mconv1_stage3 = Conv2d(in_channels = 150, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
44 | self.Mconv2_stage3 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
45 | self.Mconv3_stage3 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
46 | self.Mconv4_stage3 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
47 | self.Mconv5_stage3 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
48 | self.Mconv6_stage3 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 1, stride = 1, padding = 0)
49 | self.Mconv7_stage3 = Conv2d(in_channels = 128, out_channels = 22, kernel_size = 1, stride = 1, padding = 0)
50 |
51 | # stage4
52 | self.Mconv1_stage4 = Conv2d(in_channels = 150, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
53 | self.Mconv2_stage4 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
54 | self.Mconv3_stage4 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
55 | self.Mconv4_stage4 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
56 | self.Mconv5_stage4 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
57 | self.Mconv6_stage4 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 1, stride = 1, padding = 0)
58 | self.Mconv7_stage4 = Conv2d(in_channels = 128, out_channels = 22, kernel_size = 1, stride = 1, padding = 0)
59 |
60 | # stage5
61 | self.Mconv1_stage5 = Conv2d(in_channels = 150, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
62 | self.Mconv2_stage5 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
63 | self.Mconv3_stage5 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
64 | self.Mconv4_stage5 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
65 | self.Mconv5_stage5 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
66 | self.Mconv6_stage5 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 1, stride = 1, padding = 0)
67 | self.Mconv7_stage5 = Conv2d(in_channels = 128, out_channels = 22, kernel_size = 1, stride = 1, padding = 0)
68 |
69 | # stage6
70 | self.Mconv1_stage6 = Conv2d(in_channels = 150, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
71 | self.Mconv2_stage6 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
72 | self.Mconv3_stage6 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
73 | self.Mconv4_stage6 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
74 | self.Mconv5_stage6 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 7, stride = 1, padding = 3)
75 | self.Mconv6_stage6 = Conv2d(in_channels = 128, out_channels = 128, kernel_size = 1, stride = 1, padding = 0)
76 | self.Mconv7_stage6 = Conv2d(in_channels = 128, out_channels = 22, kernel_size = 1, stride = 1, padding = 0)
77 |
78 | for m in self.modules():
79 | if isinstance(m, Conv2d):
80 | init.constant_(m.bias, 0)
81 |
82 | def __call__(self, x):
83 | heatmaps = []
84 |
85 | h = self.relu(self.conv1_1(x))
86 | h = self.relu(self.conv1_2(h))
87 | h = self.max_pooling_2d(h)
88 | h = self.relu(self.conv2_1(h))
89 | h = self.relu(self.conv2_2(h))
90 | h = self.max_pooling_2d(h)
91 | h = self.relu(self.conv3_1(h))
92 | h = self.relu(self.conv3_2(h))
93 | h = self.relu(self.conv3_3(h))
94 | h = self.relu(self.conv3_4(h))
95 | h = self.max_pooling_2d(h)
96 | h = self.relu(self.conv4_1(h))
97 | h = self.relu(self.conv4_2(h))
98 | h = self.relu(self.conv4_3(h))
99 | h = self.relu(self.conv4_4(h))
100 | h = self.relu(self.conv5_1(h))
101 | h = self.relu(self.conv5_2(h))
102 | h = self.relu(self.conv5_3_CPM(h))
103 | feature_map = h
104 |
105 | # stage1
106 | h = self.relu(self.conv6_1_CPM(h))
107 | h = self.conv6_2_CPM(h)
108 | heatmaps.append(h)
109 |
110 | # stage2
111 | h = torch.cat([h, feature_map], dim= 1) # channel concat
112 | h = self.relu(self.Mconv1_stage2(h))
113 | h = self.relu(self.Mconv2_stage2(h))
114 | h = self.relu(self.Mconv3_stage2(h))
115 | h = self.relu(self.Mconv4_stage2(h))
116 | h = self.relu(self.Mconv5_stage2(h))
117 | h = self.relu(self.Mconv6_stage2(h))
118 | h = self.Mconv7_stage2(h)
119 | heatmaps.append(h)
120 |
121 | # stage3
122 | h = torch.cat([h, feature_map], dim= 1) # channel concat
123 | h = self.relu(self.Mconv1_stage3(h))
124 | h = self.relu(self.Mconv2_stage3(h))
125 | h = self.relu(self.Mconv3_stage3(h))
126 | h = self.relu(self.Mconv4_stage3(h))
127 | h = self.relu(self.Mconv5_stage3(h))
128 | h = self.relu(self.Mconv6_stage3(h))
129 | h = self.Mconv7_stage3(h)
130 | heatmaps.append(h)
131 |
132 | # stage4
133 | h = torch.cat([h, feature_map], dim= 1) # channel concat
134 | h = self.relu(self.Mconv1_stage4(h))
135 | h = self.relu(self.Mconv2_stage4(h))
136 | h = self.relu(self.Mconv3_stage4(h))
137 | h = self.relu(self.Mconv4_stage4(h))
138 | h = self.relu(self.Mconv5_stage4(h))
139 | h = self.relu(self.Mconv6_stage4(h))
140 | h = self.Mconv7_stage4(h)
141 | heatmaps.append(h)
142 |
143 | # stage5
144 | h = torch.cat([h, feature_map], dim= 1) # channel concat
145 | h = self.relu(self.Mconv1_stage5(h))
146 | h = self.relu(self.Mconv2_stage5(h))
147 | h = self.relu(self.Mconv3_stage5(h))
148 | h = self.relu(self.Mconv4_stage5(h))
149 | h = self.relu(self.Mconv5_stage5(h))
150 | h = self.relu(self.Mconv6_stage5(h))
151 | h = self.Mconv7_stage5(h)
152 | heatmaps.append(h)
153 |
154 | # stage6
155 | h = torch.cat([h, feature_map], dim= 1) # channel concat
156 | h = self.relu(self.Mconv1_stage6(h))
157 | h = self.relu(self.Mconv2_stage6(h))
158 | h = self.relu(self.Mconv3_stage6(h))
159 | h = self.relu(self.Mconv4_stage6(h))
160 | h = self.relu(self.Mconv5_stage6(h))
161 | h = self.relu(self.Mconv6_stage6(h))
162 | h = self.Mconv7_stage6(h)
163 | heatmaps.append(h)
164 |
165 | return heatmaps
--------------------------------------------------------------------------------
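HandNet is structurally identical to FaceNet above; only the number of output maps differs (22 hand keypoint maps instead of 71 face maps), which is also where the refinement-stage input widths come from. A small illustrative check of that arithmetic (same assumptions as the notes above, not code from this repo):

```
feature_channels = 128       # conv5_3_CPM output, concatenated into every refinement stage
face_maps, hand_maps = 71, 22
assert feature_channels + face_maps == 199   # FaceNet Mconv1_stageN in_channels
assert feature_channels + hand_maps == 150   # HandNet Mconv1_stageN in_channels
```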
/openpose.py:
--------------------------------------------------------------------------------
1 | import cv2
2 | import math
3 | import time
4 | import numpy as np
5 | from scipy.ndimage.filters import gaussian_filter
6 | import os
7 | import torch
8 | import torch.nn.functional as F
9 | from tensorboardX import SummaryWriter
10 | from entity import params, JointType
11 | from models.CocoPoseNet import CocoPoseNet, compute_loss
12 | from tqdm import tqdm
13 | from torch.utils.data import DataLoader
14 | from torch.optim import Adam
15 | from datetime import datetime
16 | import sys  # needed by crop_person (sys.maxsize)
17 | from matplotlib import pyplot as plt
18 |
19 | def get_time():
20 | return (str(datetime.now())[:-10]).replace(' ','-').replace(':','-')
21 |
22 | class Openpose(object):
23 | def __init__(self, arch='posenet', weights_file=None, training = True):
24 | self.arch = arch
25 | if weights_file:
26 | self.model = params['archs'][arch]()
27 | self.model.load_state_dict(torch.load(weights_file))
28 | else:
29 | self.model = params['archs'][arch](params['pretrained_path'])
30 |
31 | self.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
32 | self.model = self.model.to(self.device)
33 |
34 | if training:
35 | from pycocotools.coco import COCO
36 | from coco_dataset import CocoDataset
37 | for para in self.model.base.vgg_base.parameters():
38 | para.requires_grad = False
39 | coco_train = COCO(os.path.join(params['coco_dir'], 'annotations/person_keypoints_train2017.json'))
40 | coco_val = COCO(os.path.join(params['coco_dir'], 'annotations/person_keypoints_val2017.json'))
41 | self.train_loader = DataLoader(CocoDataset(coco_train, params['insize']),
42 | params['batch_size'],
43 | shuffle=True,
44 | pin_memory=False,
45 | num_workers=params['num_workers'])
46 | self.val_loader = DataLoader(CocoDataset(coco_val, params['insize'], mode = 'val'),
47 | params['batch_size'],
48 | shuffle=False,
49 | pin_memory=False,
50 | num_workers=params['num_workers'])
51 | self.train_length = len(self.train_loader)
52 | self.val_length = len(self.val_loader)
53 | self.step = 0
54 | self.writer = SummaryWriter(params['log_path'])
55 | self.board_loss_every = self.train_length // params['board_loss_interval']
56 | self.evaluate_every = self.train_length // params['eval_interval']
57 | self.board_pred_image_every = self.train_length // params['board_pred_image_interval']
58 | self.save_every = self.train_length // params['save_interval']
59 | self.optimizer = Adam([
60 | {'params' : [*self.model.parameters()][20:24], 'lr' : params['lr'] / 4},
61 | {'params' : [*self.model.parameters()][24:], 'lr' : params['lr']}])
62 | # test only codes
63 | # self.board_loss_every = 5
64 | # self.evaluate_every = 5
65 | # self.board_pred_image_every = 5
66 | # self.save_every = 5
67 |
68 | def board_scalars(self, key, loss, paf_log, heatmap_log):
69 | self.writer.add_scalar('{}_loss'.format(key), loss, self.step)
70 | for stage, (paf_loss, heatmap_loss) in enumerate(zip(paf_log, heatmap_log)):
71 | self.writer.add_scalar('{}_paf_loss_stage{}'.format(key, stage), paf_loss, self.step)
72 | self.writer.add_scalar('{}_heatmap_loss_stage{}'.format(key, stage), heatmap_loss, self.step)
73 |
74 | def evaluate(self, num = 50):
75 | self.model.eval()
76 | count = 0
77 | running_loss = 0.
78 | running_paf_log = 0.
79 | running_heatmap_log = 0.
80 | with torch.no_grad():
81 | for imgs, pafs, heatmaps, ignore_mask in iter(self.val_loader):
82 | imgs, pafs, heatmaps, ignore_mask = imgs.to(self.device), pafs.to(self.device), heatmaps.to(self.device), ignore_mask.to(self.device)
83 | pafs_ys, heatmaps_ys = self.model(imgs)
84 | total_loss, paf_loss_log, heatmap_loss_log = compute_loss(pafs_ys, heatmaps_ys, pafs, heatmaps, ignore_mask)
85 | running_loss += total_loss.item()
86 | running_paf_log += paf_loss_log
87 | running_heatmap_log += heatmap_loss_log
88 | count += 1
89 | if count >= num:
90 | break
91 | return running_loss / num, running_paf_log / num, running_heatmap_log / num
92 |
93 | def save_state(self, val_loss, to_save_folder=False, model_only=False):
94 | if to_save_folder:
95 | save_path = params['work_space']/'save'
96 | else:
97 | save_path = params['work_space']/'model'
98 | time = get_time()
99 | torch.save(
100 | self.model.state_dict(), save_path /
101 | ('model_{}_val_loss:{}_step:{}.pth'.format(time, val_loss, self.step)))
102 | if not model_only:
103 | torch.save(
104 | self.optimizer.state_dict(), save_path /
105 | ('optimizer_{}_val_loss:{}_step:{}.pth'.format(time, val_loss, self.step)))
106 |
107 | def load_state(self, fixed_str, from_save_folder=False, model_only=False):
108 | if from_save_folder:
109 | save_path = params['work_space']/'save'
110 | else:
111 | save_path = params['work_space']/'model'
112 | self.model.load_state_dict(torch.load(save_path/'model_{}'.format(fixed_str)))
113 | print('load model_{}'.format(fixed_str))
114 | if not model_only:
115 | self.optimizer.load_state_dict(torch.load(save_path/'optimizer_{}'.format(fixed_str)))
116 | print('load optimizer_{}'.format(fixed_str))
117 |
118 | def resume_training_load(self, from_save_folder=False):
119 | if from_save_folder:
120 | save_path = params['work_space']/'save'
121 | else:
122 | save_path = params['work_space']/'model'
123 | sorted_files = sorted([*save_path.iterdir()], key=lambda x: os.path.getmtime(x), reverse=True)
124 | seeking_flag = True
125 | index = 0
126 | while seeking_flag:
127 | if index > len(sorted_files) - 2:
128 | break
129 | file_a = sorted_files[index]
130 | file_b = sorted_files[index + 1]
131 | if file_a.name.startswith('model'):
132 | fix_str = file_a.name[6:]
133 | self.step = int(fix_str.split(':')[-1].split('.')[0]) + 1
134 | if file_b.name == ''.join(['optimizer', '_', fix_str]):
135 | if self.step > 2000:
136 | for para in self.model.base.vgg_base.parameters():
137 | para.requires_grad = True
138 | self.optimizer.add_param_group({'params' : [*self.model.base.vgg_base.parameters()], 'lr' : params['lr'] / 4})
139 | self.load_state(fix_str, from_save_folder)
140 | print(self.optimizer)
141 | return
142 | else:
143 | index += 1
144 | continue
145 | elif file_a.name.startswith('optimizer'):
146 | fix_str = file_a.name[10:]
147 | self.step = int(fix_str.split(':')[-1].split('.')[0]) + 1
148 | if file_b.name == ''.join(['model', '_', fix_str]):
149 | if self.step > 2000:
150 | for para in self.model.base.vgg_base.parameters():
151 | para.requires_grad = True
152 | self.optimizer.add_param_group({'params' : [*self.model.base.vgg_base.parameters()], 'lr' : params['lr'] / 4})
153 | self.load_state(fix_str, from_save_folder)
154 | print(self.optimizer)
155 | return
156 | else:
157 | index += 1
158 | continue
159 | else:
160 | index += 1
161 | continue
162 |         print('no available files found')
163 | return
164 |
165 | def find_lr(self,
166 | init_value=1e-8,
167 | final_value=10.,
168 | beta=0.98,
169 | bloding_scale=4.,
170 | num=None):
171 | if not num:
172 | num = len(self.train_loader)
173 | mult = (final_value / init_value)**(1 / num)
174 | lr = init_value
175 | for params in self.optimizer.param_groups:
176 | params['lr'] = lr
177 | self.model.train()
178 | avg_loss = 0.
179 | best_loss = 0.
180 | batch_num = 0
181 | losses = []
182 | log_lrs = []
183 | for i, (imgs, pafs, heatmaps, ignore_mask) in tqdm(enumerate(self.train_loader), total=num):
184 |
185 | imgs, pafs, heatmaps, ignore_mask = imgs.to(self.device), pafs.to(self.device), heatmaps.to(self.device), ignore_mask.to(self.device)
186 | self.optimizer.zero_grad()
187 | batch_num += 1
188 | pafs_ys, heatmaps_ys = self.model(imgs)
189 | loss, _, _ = compute_loss(pafs_ys, heatmaps_ys, pafs, heatmaps, ignore_mask)
190 | 
193 | #Compute the smoothed loss
194 | avg_loss = beta * avg_loss + (1 - beta) * loss.item()
195 | self.writer.add_scalar('avg_loss', avg_loss, batch_num)
196 | smoothed_loss = avg_loss / (1 - beta**batch_num)
197 | self.writer.add_scalar('smoothed_loss', smoothed_loss,batch_num)
198 | #Stop if the loss is exploding
199 | if batch_num > 1 and smoothed_loss > bloding_scale * best_loss:
200 | print('exited with best_loss at {}'.format(best_loss))
201 | plt.plot(log_lrs[10:-5], losses[10:-5])
202 | return log_lrs, losses
203 | #Record the best loss
204 | if smoothed_loss < best_loss or batch_num == 1:
205 | best_loss = smoothed_loss
206 | #Store the values
207 | losses.append(smoothed_loss)
208 | log_lrs.append(math.log10(lr))
209 | self.writer.add_scalar('log_lr', math.log10(lr), batch_num)
210 | #Do the SGD step
211 | #Update the lr for the next step
212 |
213 | loss.backward()
214 | self.optimizer.step()
215 |
216 | lr *= mult
217 | for params in self.optimizer.param_groups:
218 | params['lr'] = lr
219 | if batch_num > num:
220 | plt.plot(log_lrs[10:-5], losses[10:-5])
221 | return log_lrs, losses
222 |
223 | def lr_schedule(self):
224 | for params in self.optimizer.param_groups:
225 | params['lr'] /= 10.
226 | print(self.optimizer)
227 |
228 | def train(self, resume = False):
229 | running_loss = 0.
230 | running_paf_log = 0.
231 | running_heatmap_log = 0.
232 | if resume:
233 | self.resume_training_load()
234 | for epoch in range(60):
235 | for imgs, pafs, heatmaps, ignore_mask in tqdm(iter(self.train_loader)):
236 | if self.step == 2000:
237 | for para in self.model.base.vgg_base.parameters():
238 | para.requires_grad = True
239 | self.optimizer.add_param_group({'params' : [*self.model.base.vgg_base.parameters()], 'lr' : params['lr'] / 4})
240 | if self.step == 100000 or self.step == 200000:
241 | self.lr_schedule()
242 |
243 | imgs, pafs, heatmaps, ignore_mask = imgs.to(self.device), pafs.to(self.device), heatmaps.to(self.device), ignore_mask.to(self.device)
244 | self.optimizer.zero_grad()
245 | pafs_ys, heatmaps_ys = self.model(imgs)
246 | total_loss, paf_loss_log, heatmap_loss_log = compute_loss(pafs_ys, heatmaps_ys, pafs, heatmaps, ignore_mask)
247 | total_loss.backward()
248 | self.optimizer.step()
249 | running_loss += total_loss.item()
250 | running_paf_log += paf_loss_log
251 | running_heatmap_log += heatmap_loss_log
252 |
253 | if (self.step % self.board_loss_every == 0) & (self.step != 0):
254 | self.board_scalars('train',
255 | running_loss / self.board_loss_every,
256 | running_paf_log / self.board_loss_every,
257 | running_heatmap_log / self.board_loss_every)
258 | running_loss = 0.
259 | running_paf_log = 0.
260 | running_heatmap_log = 0.
261 |
262 | if (self.step % self.evaluate_every == 0) & (self.step != 0):
263 | val_loss, paf_loss_val_log, heatmap_loss_val_log = self.evaluate(num = params['eva_num'])
264 | self.model.train()
265 | self.board_scalars('val', val_loss, paf_loss_val_log, heatmap_loss_val_log)
266 |
267 | if (self.step % self.board_pred_image_every == 0) & (self.step != 0):
268 | self.model.eval()
269 | with torch.no_grad():
270 | for i in range(20):
271 | img_id = self.val_loader.dataset.imgIds[i]
272 | img_path = os.path.join(params['coco_dir'], 'val2017', self.val_loader.dataset.coco.loadImgs([img_id])[0]['file_name'])
273 | img = cv2.imread(img_path)
274 | # inference
275 | poses, _ = self.detect(img)
276 | # draw and save image
277 | img = draw_person_pose(img, poses)
278 | img = torch.tensor(img.transpose(2,0,1))
279 | self.writer.add_image('pred_image_{}'.format(i), img, global_step=self.step)
280 | self.model.train()
281 |
282 | if (self.step % self.save_every == 0) & (self.step != 0):
283 | self.save_state(val_loss)
284 |
285 | self.step += 1
286 | if self.step > 300000:
287 | break
288 |
289 | def pad_image(self, img, stride, pad_value):
290 | h, w, _ = img.shape
291 |
292 | pad = [0] * 2
293 | pad[0] = (stride - (h % stride)) % stride # down
294 | pad[1] = (stride - (w % stride)) % stride # right
295 |
296 | img_padded = np.zeros((h+pad[0], w+pad[1], 3), 'uint8') + pad_value
297 | img_padded[:h, :w, :] = img.copy()
298 | return img_padded, pad
299 |
300 | def compute_optimal_size(self, orig_img, img_size, stride=8):
301 |         """Adjust the image width and height so that they become multiples of stride"""
302 | orig_img_h, orig_img_w, _ = orig_img.shape
303 | aspect = orig_img_h / orig_img_w
304 | if orig_img_h < orig_img_w:
305 | img_h = img_size
306 | img_w = np.round(img_size / aspect).astype(int)
307 | surplus = img_w % stride
308 | if surplus != 0:
309 | img_w += stride - surplus
310 | else:
311 | img_w = img_size
312 | img_h = np.round(img_size * aspect).astype(int)
313 | surplus = img_h % stride
314 | if surplus != 0:
315 | img_h += stride - surplus
316 | return (img_w, img_h)
317 |
318 | def compute_peaks_from_heatmaps(self, heatmaps):
319 | """all_peaks: shape = [N, 5], column = (jointtype, x, y, score, index)"""
320 | #heatmaps.shape : (19, 584, 584)
321 |         #heatmaps[-1] is the background channel; it is used during training but not at inference, so it is dropped here
322 | heatmaps = heatmaps[:-1]
323 |
324 | all_peaks = []
325 | peak_counter = 0
326 | for i , heatmap in enumerate(heatmaps):
327 | heatmap = gaussian_filter(heatmap, sigma=params['gaussian_sigma'])
328 | '''
329 |             You can compare this with the GPU code below:
330 |             gaussian_filter here simply convolves each output heatmap (depth 1) with a Gaussian kernel.
331 |             Since the heatmaps the network regresses are themselves Gaussian heatmaps centred on the target points,
332 |             this convolution is a reasonable way to find the coordinates closest to the target point
333 | '''
334 | map_left = np.zeros(heatmap.shape)
335 | map_right = np.zeros(heatmap.shape)
336 | map_top = np.zeros(heatmap.shape)
337 | map_bottom = np.zeros(heatmap.shape)
338 | '''
339 |             My understanding: left and top, right and bottom are actually swapped here, but it does not affect the final result
340 | '''
341 | map_left[1:, :] = heatmap[:-1, :]
342 | map_right[:-1, :] = heatmap[1:, :]
343 | map_top[:, 1:] = heatmap[:, :-1]
344 | map_bottom[:, :-1] = heatmap[:, 1:]
345 |
346 | peaks_binary = np.logical_and.reduce((
347 | heatmap > params['heatmap_peak_thresh'],
348 | heatmap > map_left,
349 | heatmap > map_right,
350 | heatmap > map_top,
351 | heatmap > map_bottom,
352 | ))
353 | '''
354 |             A nice trick: this finds every point on the heatmap that satisfies two conditions:
355 |             1. its value is greater than params['heatmap_peak_thresh'] (0.05 by default)
356 |             2. its value is greater than those of its four neighbours (up, down, left, right)
357 |             Points satisfying both conditions become True, everything else False
358 | '''
359 |             peaks = zip(np.nonzero(peaks_binary)[1], np.nonzero(peaks_binary)[0]) # array of peak coordinates [(x, y), (x, y)...]
360 |             '''np.nonzero returns coordinates as [y, x]; here they are reordered to [x, y]'''
361 | peaks_with_score = [(i,) + peak_pos + (heatmap[peak_pos[1], peak_pos[0]],) for peak_pos in peaks]
362 | '''
363 | [(0, 387, 47, 0.050346997),
364 | (0, 388, 47, 0.050751492),
365 | (0, 389, 47, 0.051055912),
366 | .....]
367 |             (joint index, x coordinate, y coordinate, heatmap value)
368 | '''
369 | peaks_id = range(peak_counter, peak_counter + len(peaks_with_score))
370 | peaks_with_score_and_id = [peaks_with_score[i] + (peaks_id[i], ) for i in range(len(peaks_id))]
371 | '''
372 | [(0, 387, 47, 0.050346997, 0),
373 | (0, 388, 47, 0.050751492, 1),
374 | (0, 389, 47, 0.051055912, 2),
375 | (0, 390, 47, 0.051255725, 3),
376 | ......]
377 |             this step also attaches a running index to each peak
378 | '''
379 | peak_counter += len(peaks_with_score_and_id)
380 | all_peaks.append(peaks_with_score_and_id)
381 | all_peaks = np.array([peak for peaks_each_category in all_peaks for peak in peaks_each_category])
382 |         '''neat, you can write it like this: two layers of syntactic sugar (a nested comprehension that flattens the per-joint peak lists)'''
383 | return all_peaks
384 |
385 | def compute_candidate_connections(self, paf, cand_a, cand_b, img_len, params):
386 | candidate_connections = []
387 | for joint_a in cand_a:
388 |             for joint_b in cand_b: # each joint is an (x, y) coordinate
389 | vector = joint_b[:2] - joint_a[:2]
390 | norm = np.linalg.norm(vector)
391 | if norm == 0:
392 | continue
393 | ys = np.linspace(joint_a[1], joint_b[1], num=params['n_integ_points'])
394 | xs = np.linspace(joint_a[0], joint_b[0], num=params['n_integ_points'])
395 | integ_points = np.stack([ys, xs]).T.round().astype('i')
396 | '''
397 |                 # coordinates of the points on the line segment connecting joint_a and joint_b: [[x1, y1], [x2, y2] ...]
399 | params['n_integ_points'] = 10
400 | integ_points =
401 | array([[ 32, 242],
402 | [ 36, 241],
403 | [ 39, 240],
404 | [ 43, 239],
405 | [ 47, 238],
406 | [ 50, 236],
407 | [ 54, 235],
408 | [ 58, 234],
409 | [ 61, 233],
410 | [ 65, 232]], dtype=int32)
411 |                 obtained by sampling 10 points along the segment connecting joint_a and joint_b and rounding to integer coordinates
412 | '''
413 | paf_in_edge = np.hstack([paf[0][np.hsplit(integ_points, 2)], paf[1][np.hsplit(integ_points, 2)]])
414 | '''
415 | paf_in_edge.shape : (10, 2)
416 |                 paf_in_edge holds the predicted PAF vectors at these 10 sample points
417 | '''
418 | unit_vector = vector / norm
419 | inner_products = np.dot(paf_in_edge, unit_vector)
420 | integ_value = inner_products.sum() / len(inner_products)
421 | '''
422 |                 The three lines above correspond to Eq. 10 in the paper:
423 |                 the line integral is approximated by averaging over sampled points
424 | '''
425 | '''
426 |                 # penalize the connection when the vector is longer than the reference length
428 | params['limb_length_ratio'] = 1
429 | params['length_penalty_value'] = 1
430 |                 img_len = width of the original image
431 | '''
432 | integ_value_with_dist_prior = integ_value + min(params['limb_length_ratio'] * img_len / norm - params['length_penalty_value'], 0)
433 | '''
434 | params['inner_product_thresh'] = 0.05
435 | params['n_integ_points_thresh'] = 8
436 |                 The conditions below mean that a candidate connection is kept only when
437 |                 more than 8 of the 10 sampled points have a dot product with the joint_a -> joint_b unit vector greater than 0.05,
438 |                 and the mean over the 10 points (the approximate integral) is > 0; only then is it recorded as a possible connection.
439 | '''
440 | n_valid_points = sum(inner_products > params['inner_product_thresh'])
441 | if n_valid_points > params['n_integ_points_thresh'] and integ_value_with_dist_prior > 0:
442 | candidate_connections.append([int(joint_a[3]), int(joint_b[3]), integ_value_with_dist_prior])
443 | '''
444 |                     What is recorded here are the indices of joint_a and joint_b within all_peaks, plus the integral value.
445 |                     joint_a and joint_b are enumerated from cand_a and cand_b,
446 |                     which are themselves selected out of all_peaks.
447 | '''
448 | candidate_connections = sorted(candidate_connections, key=lambda x: x[2], reverse=True)
449 |         '''sort all candidate connections found, by their integral value, in descending order'''
450 | return candidate_connections
451 | '''
452 | len(all_connections) = 19
453 |     one entry per limb type (19 possible limbs); each entry is an array of all candidate connections for that limb
454 |     e.g. all_connections[2] =
455 | array([[47. , 51. , 0.86362792],
456 | [46. , 50. , 0.71809054],
457 | [45. , 49. , 0.59873392],
458 | [44. , 48. , 0.3711632 ]])
459 |     here there are 4 connections, and these 4 connections may belong to different people;
460 |     the final step, grouping_key_points, then assembles each person's limbs
461 | '''
462 |
463 | def compute_connections(self, pafs, all_peaks, img_len, params):
464 | all_connections = []
465 | for i in range(len(params['limbs_point'])):
466 |             '''
467 |             params['limbs_point'] (from entity.py) is a list of 19 [JointType_a, JointType_b] pairs,
468 |             one pair per limb, i.e. one per PAF type.
469 |             Think of this outer loop as running one task per limb:
470 |             the first pass only looks for possible connections from, say, the neck to the right hip,
471 |             the next from the right hip to the right knee, and so on, 19 tasks in total.
472 |             '''
492 | paf_index = [i*2, i*2 + 1]
493 | paf = pafs[paf_index] # shape: (2, 320, 320)
494 | '''
495 |             Why does paf have 2 channels here?
496 |             Because a PAF is a vector field: within the region of the image covered by this limb,
497 |             every point carries a 2-D vector,
498 |             and the ground-truth PAF is the unit vector pointing from joint_a to joint_b (the predicted values naturally stay close to the GT)
499 | '''
500 |             limb_point = params['limbs_point'][i] # the [JointType_a, JointType_b] pair for this limb
501 | cand_a = all_peaks[all_peaks[:, 0] == limb_point[0]][:, 1:]
502 | cand_b = all_peaks[all_peaks[:, 0] == limb_point[1]][:, 1:]
503 | '''
504 |             all_peaks[:, 0] is the index of the joint type each peak belongs to.
505 |             cand_a stands for candidate_a:
506 |             the [x, y, value, idx] of every peak assigned to this joint type.
507 |             Since an image may contain several people,
508 |             the peaks for this joint type will likely belong to different people.
509 |             Next, for each pair of connected joint types, the PAF is used to decide which joints are likely connected.
510 |             Note that each task only looks for connections between the two joint types of the current limb.
511 | '''
512 |
513 | if len(cand_a) > 0 and len(cand_b) > 0:
514 | candidate_connections = self.compute_candidate_connections(paf, cand_a, cand_b, img_len, params)
515 | '''
516 | candidate_connections:
517 | [[9, 42, 0.8681351658332168],
518 | [8, 41, 0.8360657979306498],
519 | [10, 43, 0.7184696600989704],
520 | [7, 40, 0.619533988669367],
521 | [6, 39, 0.25027479198405156]]
522 | ############## [index_a, index_b, value]
523 | '''
524 | connections = np.zeros((0, 3))
525 | '''
526 | connections : array([], shape=(0, 3), dtype=float64)
527 |             a handy trick: create an array with 0 rows and 3 columns as a template onto which the accepted connections are stacked
528 | '''
529 | for index_a, index_b, score in candidate_connections:
530 | if index_a not in connections[:, 0] and index_b not in connections[:, 1]:
531 | '''
532 |                     This if-condition suppresses duplicate connections.
533 |                     index_a is the start point, index_b the end point.
534 |                     A peak (i.e. a detected joint)
535 |                     can serve as a start or end point at most once within a single task;
536 |                     since the candidates were already sorted by integral value,
537 |                     the highest-scoring connection is chosen first and later candidates that would reuse a joint are suppressed.
538 | '''
539 | connections = np.vstack([connections, [index_a, index_b, score]])
540 | if len(connections) >= min(len(cand_a), len(cand_b)):
541 | break
542 | all_connections.append(connections)
543 | else:
544 | all_connections.append(np.zeros((0, 3)))
545 | return all_connections
546 |
547 | def grouping_key_points(self, all_connections, candidate_peaks, params):
548 | subsets = -1 * np.ones((0, 20))
549 | '''
550 |         subsets is the array that records the grouping result.
551 |         Why 20 columns? The first 18 record the joint information:
552 |         there are 18 joint types, and each of these columns stores the index of one peak in candidate_peaks.
553 |         The second-to-last column records the overall score of the subset,
554 |         and the last column records how many joints this subset has collected so far.
555 |         Not all 18 joints have to be found; missing ones keep the default value -1.
556 |         Each subset represents one candidate detected person.
557 | '''
558 |
559 | for l, connections in enumerate(all_connections):
560 | joint_a, joint_b = params['limbs_point'][l]
561 | '''
562 |             The 19 limb types are read in the order given by params['limbs_point'][l],
563 |             fetching the candidate connections predicted earlier for each limb type.
564 |             Note that processing them in this fixed order never needs to backtrack and never repeats, because a person only has this fixed set of limbs.
565 | '''
566 | for ind_a, ind_b, score in connections[:, :3]:
567 | '''
568 |                 Inner loop: for a single limb type, the previous step already produced multiple connections.
569 |                 These connections should belong to different people, since one person cannot have two identical limbs.
570 |                 The if-branches below distinguish the following cases, based on which existing subsets the two endpoints of the current connection belong to:
571 | 
572 |                 1. If neither endpoint belongs to any existing subset (joint_found_cnt == 0), create a new subset from this connection.
573 |                 2. If exactly one endpoint belongs to an existing subset and the other belongs to none, this is the simplest case: just add the connection (limb) to that subset.
574 |                 3. If the two endpoints are found across two existing subsets, then:
575 |                 a. if the two subsets share no joint at all, merge them;
576 |                 b. if they do share joints, add the connection to both subsets (the lower-scoring subset will be deleted at the end anyway).
577 |                 4. If three or more subsets contain either endpoint, just pass.
578 | '''
579 | ind_a, ind_b = int(ind_a), int(ind_b)
580 | joint_found_cnt = 0
581 | joint_found_subset_index = [-1, -1]
582 | for subset_ind, subset in enumerate(subsets):
583 |                     # if some existing subset already contains a joint of this connection
585 | if subset[joint_a] == ind_a or subset[joint_b] == ind_b:
586 | joint_found_subset_index[joint_found_cnt] = subset_ind
587 | joint_found_cnt += 1
588 |                 # the for-loop above walks over every existing subset and counts how many of them contain either endpoint of the current connection
589 | # print('joint_found_cnt : {}'.format(joint_found_cnt))
590 | # print('joint_a : {}, joint_b : {}'.format(joint_a, joint_b))
591 | # print('ind_a : {}, ind_b : {}'.format(ind_a, ind_b))
592 | if joint_found_cnt == 1:
593 |                     # case: exactly one existing subset contains one of this connection's joints
598 | found_subset = subsets[joint_found_subset_index[0]]
599 |                     # Except for the shoulder->ear connection combinations, only the start point can match.
600 |                     # For shoulder->ear, if the end point already matches, the face bones were detected earlier and nothing more needs to be done.
601 | if found_subset[joint_b] != ind_b:
602 | found_subset[joint_b] = ind_b
603 | found_subset[-1] += 1 # increment joint count
604 |                         found_subset[-2] += candidate_peaks[ind_b, 3] + score # add joint b's score and the connection's integral value
605 |
606 |                 elif joint_found_cnt == 2: # case: two existing subsets contain one of the joints
607 |                     # joint1 is in subset1 and joint2 is in subset2 (this cannot arise from the shoulder->ear connection combinations)
609 | # print('limb {}: 2 subsets have any joint'.format(l))
610 | found_subset_1 = subsets[joint_found_subset_index[0]]
611 | found_subset_2 = subsets[joint_found_subset_index[1]]
612 |
613 | membership = ((found_subset_1 >= 0).astype(int) + (found_subset_2 >= 0).astype(int))[:-2]
614 | if not np.any(membership == 2): # merge two subsets when no duplication
615 | found_subset_1[:-2] += found_subset_2[:-2] + 1 # default is -1
616 | found_subset_1[-2:] += found_subset_2[-2:]
617 |                         found_subset_1[-2:] += score # this looks like a bug: it should be found_subset_1[-2] += score, there is no need to add the score to the joint count, but it does not change the final result
618 |                         # only the connection's integral value is added here (the joint scores were all added already during the merge)
620 | subsets = np.delete(subsets, joint_found_subset_index[1], axis=0)
621 | else:
622 | if found_subset_1[joint_a] == -1:
623 | found_subset_1[joint_a] = ind_a
624 | found_subset_1[-1] += 1
625 | found_subset_1[-2] += candidate_peaks[ind_a, 3] + score
626 | elif found_subset_1[joint_b] == -1:
627 | found_subset_1[joint_b] = ind_b
628 | found_subset_1[-1] += 1
629 | found_subset_1[-2] += candidate_peaks[ind_b, 3] + score
630 | if found_subset_2[joint_a] == -1:
631 | found_subset_2[joint_a] = ind_a
632 | found_subset_2[-1] += 1
633 | found_subset_2[-2] += candidate_peaks[ind_a, 3] + score
634 | elif found_subset_2[joint_b] == -1:
635 | found_subset_2[joint_b] = ind_b
636 | found_subset_2[-1] += 1
637 | found_subset_2[-2] += candidate_peaks[ind_b, 3] + score
638 |
639 | elif joint_found_cnt == 0 and l != 9 and l != 13:
640 |                     # create a new subset if no existing subset matches; shoulder-ear connections are not allowed to start a new group
642 | row = -1 * np.ones(20)
643 | row[joint_a] = ind_a
644 | row[joint_b] = ind_b
645 | row[-1] = 2
646 | row[-2] = sum(candidate_peaks[[ind_a, ind_b], 3]) + score
647 | subsets = np.vstack([subsets, row])
648 | elif joint_found_cnt >= 3:
649 | pass
650 |
651 | # delete low score subsets
652 | keep = np.logical_and(subsets[:, -1] >= params['n_subset_limbs_thresh'], subsets[:, -2]/subsets[:, -1] >= params['subset_score_thresh'])
653 | # params['n_subset_limbs_thresh'] = 3
654 | # params['subset_score_thresh'] = 0.2
655 | subsets = subsets[keep]
656 | return subsets
657 |
658 |
659 | def subsets_to_pose_array(self, subsets, all_peaks):
660 | '''
661 | 这个函数没啥,
662 | 就是根据每一个subsets里的peak点的id,
663 | 去all_peaks里面取对应的坐标,然后组装成输出
664 | '''
665 | person_pose_array = []
666 | for subset in subsets:
667 | joints = []
668 | for joint_index in subset[:18].astype('i'):
669 | if joint_index >= 0:
670 | joint = all_peaks[joint_index][1:3].tolist()
671 | joint.append(2)
672 | joints.append(joint)
673 | else:
674 | joints.append([0, 0, 0])
675 | person_pose_array.append(np.array(joints))
676 | person_pose_array = np.array(person_pose_array)
677 | return person_pose_array
678 |
679 | def compute_limbs_length(self, joints):
680 | limbs = []
681 | limbs_len = np.zeros(len(params["limbs_point"]))
682 | for i, joint_indices in enumerate(params["limbs_point"]):
683 | if joints[joint_indices[0]] is not None and joints[joint_indices[1]] is not None:
684 | limbs.append([joints[joint_indices[0]], joints[joint_indices[1]]])
685 | limbs_len[i] = np.linalg.norm(joints[joint_indices[1]][:-1] - joints[joint_indices[0]][:-1])
686 | else:
687 | limbs.append(None)
688 |
689 | return limbs_len, limbs
690 |
691 | def compute_unit_length(self, limbs_len):
692 | unit_length = 0
693 |         base_limbs_len = limbs_len[[14, 3, 0, 13, 9]] # length ratios of (nose-neck, neck-left hip, neck-right hip, shoulder-left ear, shoulder-right ear); if any of these exists, it is used preferentially to compute the unit length
694 | non_zero_limbs_len = base_limbs_len > 0
695 | if len(np.nonzero(non_zero_limbs_len)[0]) > 0:
696 | limbs_len_ratio = np.array([0.85, 2.2, 2.2, 0.85, 0.85])
697 | unit_length = np.sum(base_limbs_len[non_zero_limbs_len] / limbs_len_ratio[non_zero_limbs_len]) / len(np.nonzero(non_zero_limbs_len)[0])
698 | else:
699 | limbs_len_ratio = np.array([2.2, 1.7, 1.7, 2.2, 1.7, 1.7, 0.6, 0.93, 0.65, 0.85, 0.6, 0.93, 0.65, 0.85, 1, 0.2, 0.2, 0.25, 0.25])
700 | non_zero_limbs_len = limbs_len > 0
701 | unit_length = np.sum(limbs_len[non_zero_limbs_len] / limbs_len_ratio[non_zero_limbs_len]) / len(np.nonzero(non_zero_limbs_len)[0])
702 |
703 | return unit_length
704 |
705 | def get_unit_length(self, person_pose):
706 | limbs_length, limbs = self.compute_limbs_length(person_pose)
707 | unit_length = self.compute_unit_length(limbs_length)
708 |
709 | return unit_length
710 |
711 | def crop_around_keypoint(self, img, keypoint, crop_size):
712 | x, y = keypoint
713 | left = int(x - crop_size)
714 | top = int(y - crop_size)
715 | right = int(x + crop_size)
716 | bottom = int(y + crop_size)
717 | bbox = (left, top, right, bottom)
718 |
719 | cropped_img = self.crop_image(img, bbox)
720 |
721 | return cropped_img, bbox
722 |
723 | def crop_person(self, img, person_pose, unit_length):
724 | top_joint_priority = [4, 5, 6, 12, 16, 7, 13, 17, 8, 10, 14, 9, 11, 15, 2, 3, 0, 1, sys.maxsize]
725 | bottom_joint_priority = [9, 6, 7, 14, 16, 8, 15, 17, 4, 2, 0, 5, 3, 1, 10, 11, 12, 13, sys.maxsize]
726 |
727 | top_joint_index = len(top_joint_priority) - 1
728 | bottom_joint_index = len(bottom_joint_priority) - 1
729 | left_joint_index = 0
730 | right_joint_index = 0
731 | top_pos = sys.maxsize
732 | bottom_pos = 0
733 | left_pos = sys.maxsize
734 | right_pos = 0
735 |
736 | for i, joint in enumerate(person_pose):
737 | if joint[2] > 0:
738 | if top_joint_priority[i] < top_joint_priority[top_joint_index]:
739 | top_joint_index = i
740 | elif bottom_joint_priority[i] < bottom_joint_priority[bottom_joint_index]:
741 | bottom_joint_index = i
742 | if joint[1] < top_pos:
743 | top_pos = joint[1]
744 | elif joint[1] > bottom_pos:
745 | bottom_pos = joint[1]
746 |
747 | if joint[0] < left_pos:
748 | left_pos = joint[0]
749 | left_joint_index = i
750 | elif joint[0] > right_pos:
751 | right_pos = joint[0]
752 | right_joint_index = i
753 |
754 | top_padding_radio = [0.9, 1.9, 1.9, 2.9, 3.7, 1.9, 2.9, 3.7, 4.0, 5.5, 7.0, 4.0, 5.5, 7.0, 0.7, 0.8, 0.7, 0.8]
755 | bottom_padding_radio = [6.9, 5.9, 5.9, 4.9, 4.1, 5.9, 4.9, 4.1, 3.8, 2.3, 0.8, 3.8, 2.3, 0.8, 7.1, 7.0, 7.1, 7.0]
756 |
757 | left = (left_pos - 0.3 * unit_length).astype(int)
758 | right = (right_pos + 0.3 * unit_length).astype(int)
759 | top = (top_pos - top_padding_radio[top_joint_index] * unit_length).astype(int)
760 | bottom = (bottom_pos + bottom_padding_radio[bottom_joint_index] * unit_length).astype(int)
761 | bbox = (left, top, right, bottom)
762 |
763 | cropped_img = self.crop_image(img, bbox)
764 | return cropped_img, bbox
765 |
766 | def crop_face(self, img, person_pose, unit_length):
767 | face_size = unit_length
768 | face_img = None
769 | bbox = None
770 |
771 | # if have nose
772 | if person_pose[JointType.Nose][2] > 0:
773 | nose_pos = person_pose[JointType.Nose][:2]
774 | face_top = int(nose_pos[1] - face_size * 1.2)
775 | face_bottom = int(nose_pos[1] + face_size * 0.8)
776 | face_left = int(nose_pos[0] - face_size)
777 | face_right = int(nose_pos[0] + face_size)
778 | bbox = (face_left, face_top, face_right, face_bottom)
779 | face_img = self.crop_image(img, bbox)
780 |
781 | return face_img, bbox
782 |
783 | def crop_hands(self, img, person_pose, unit_length):
784 | hands = {
785 | "left": None,
786 | "right": None
787 | }
788 |
789 | if person_pose[JointType.LeftHand][2] > 0:
790 | crop_center = person_pose[JointType.LeftHand][:-1]
791 | if person_pose[JointType.LeftElbow][2] > 0:
792 | direction_vec = person_pose[JointType.LeftHand][:-1] - person_pose[JointType.LeftElbow][:-1]
793 | crop_center += (0.3 * direction_vec).astype(crop_center.dtype)
794 | hand_img, bbox = self.crop_around_keypoint(img, crop_center, unit_length * 0.95)
795 | hands["left"] = {
796 | "img": hand_img,
797 | "bbox": bbox
798 | }
799 |
800 | if person_pose[JointType.RightHand][2] > 0:
801 | crop_center = person_pose[JointType.RightHand][:-1]
802 | if person_pose[JointType.RightElbow][2] > 0:
803 | direction_vec = person_pose[JointType.RightHand][:-1] - person_pose[JointType.RightElbow][:-1]
804 | crop_center += (0.3 * direction_vec).astype(crop_center.dtype)
805 | hand_img, bbox = self.crop_around_keypoint(img, crop_center, unit_length * 0.95)
806 | hands["right"] = {
807 | "img": hand_img,
808 | "bbox": bbox
809 | }
810 |
811 | return hands
812 |
813 | def crop_image(self, img, bbox):
814 | left, top, right, bottom = bbox
815 | img_h, img_w, img_ch = img.shape
816 | box_h = bottom - top
817 | box_w = right - left
818 |
819 | crop_left = max(0, left)
820 | crop_top = max(0, top)
821 | crop_right = min(img_w, right)
822 | crop_bottom = min(img_h, bottom)
823 | crop_h = crop_bottom - crop_top
824 | crop_w = crop_right - crop_left
825 | cropped_img = img[crop_top:crop_bottom, crop_left:crop_right]
826 |
827 | bias_x = bias_y = 0
828 | if left < crop_left:
829 | bias_x = crop_left - left
830 | if top < crop_top:
831 | bias_y = crop_top - top
832 |
833 | # pad
834 | padded_img = np.zeros((box_h, box_w, img_ch), dtype=np.uint8)
835 | padded_img[bias_y:bias_y+crop_h, bias_x:bias_x+crop_w] = cropped_img
836 | return padded_img
837 |
838 | def preprocess(self, img):
839 | x_data = img.astype('f')
840 | x_data /= 255
841 | x_data -= 0.5
842 | x_data = x_data.transpose(2, 0, 1)[None]
843 | return x_data
844 |
845 | def detect_precise(self, orig_img):
846 | orig_img_h, orig_img_w, _ = orig_img.shape
847 |
848 | pafs_sum = 0
849 | heatmaps_sum = 0
850 |
851 | interpolation = cv2.INTER_CUBIC
852 |
853 | for scale in params['inference_scales']:
854 |             # TTA: multi-scale testing, scale in [0.5, 1, 1.5, 2]
855 | multiplier = scale * params['inference_img_size'] / min(orig_img.shape[:2])
856 |             # use scale and the actual input image size to compute the resize factor, then resize the input image
857 | img = cv2.resize(orig_img, (math.ceil(orig_img_w*multiplier), math.ceil(orig_img_h*multiplier)), interpolation=interpolation)
858 | # bbox = (params['inference_img_size'], max(params['inference_img_size'], img.shape[1]))
859 |             # what is this bbox for? it can probably be removed
860 | padded_img, pad = self.pad_image(img, params['downscale'], (104, 117, 123))
861 |             # max-pooling shrinks the image by a factor of 8; pad if the size is not divisible; (104, 117, 123) is presumably the mean of the input dataset?
862 |
863 | x_data = self.preprocess(padded_img)
864 | x_data = torch.tensor(x_data).to(self.device)
865 | x_data.requires_grad = False
866 |
867 | with torch.no_grad():
868 |
869 |                 h1s, h2s = self.model(x_data) # the model outputs 6 sets of same-sized features; all are used during training, but only the last set is used at inference
870 |
871 | tmp_paf = h1s[-1][0].cpu().numpy().transpose(1, 2, 0)
872 | tmp_heatmap = h2s[-1][0].cpu().numpy().transpose(1, 2, 0)
873 |
874 | p_h, p_w = padded_img.shape[:2]
875 | tmp_paf = cv2.resize(tmp_paf, (p_w, p_h), interpolation=interpolation)
876 |             # first, resize the paf to the size of padded_img
877 | tmp_paf = tmp_paf[:p_h-pad[0], :p_w-pad[1], :]
878 |             # remove the padding
879 | pafs_sum += cv2.resize(tmp_paf, (orig_img_w, orig_img_h), interpolation=interpolation)
880 |             # then resize back to the original input image size
881 |
882 | tmp_heatmap = cv2.resize(tmp_heatmap, (0, 0), fx=params['downscale'], fy=params['downscale'], interpolation=interpolation)
883 | tmp_heatmap = tmp_heatmap[:padded_img.shape[0]-pad[0], :padded_img.shape[1]-pad[1], :]
884 | heatmaps_sum += cv2.resize(tmp_heatmap, (orig_img_w, orig_img_h), interpolation=interpolation)
885 |             # the heatmap goes through the same operations as the pafs
886 |
887 |         # after accumulating over all scales, averaging pafs_sum and heatmaps_sum gives the final TTA output features
888 |
889 | self.pafs = (pafs_sum / len(params['inference_scales'])).transpose(2, 0, 1)
890 | self.heatmaps = (heatmaps_sum / len(params['inference_scales'])).transpose(2, 0, 1)
891 |
892 | self.all_peaks = self.compute_peaks_from_heatmaps(self.heatmaps)
893 | if len(self.all_peaks) == 0:
894 | return np.empty((0, len(JointType), 3)), np.empty(0)
895 | all_connections = self.compute_connections(self.pafs, self.all_peaks, orig_img_w, params)
896 | subsets = self.grouping_key_points(all_connections, self.all_peaks, params)
897 | poses = self.subsets_to_pose_array(subsets, self.all_peaks)
898 | scores = subsets[:, -2]
899 | return poses, scores
900 |
901 | def detect(self, orig_img, precise = False):
902 | orig_img = orig_img.copy()
903 | if precise:
904 | return self.detect_precise(orig_img)
905 | orig_img_h, orig_img_w, _ = orig_img.shape
906 |
907 | input_w, input_h = self.compute_optimal_size(orig_img, params['inference_img_size'])
908 | map_w, map_h = self.compute_optimal_size(orig_img, params['heatmap_size'])
909 |
910 | resized_image = cv2.resize(orig_img, (input_w, input_h))
911 | x_data = self.preprocess(resized_image)
912 |
913 | x_data = torch.tensor(x_data).to(self.device)
914 | x_data.requires_grad = False
915 |
916 | with torch.no_grad():
917 |
918 | h1s, h2s = self.model(x_data)
919 |
920 | pafs = F.interpolate(h1s[-1], (map_h, map_w), mode='bilinear', align_corners=True).cpu().numpy()[0]
921 | heatmaps = F.interpolate(h2s[-1], (map_h, map_w), mode='bilinear', align_corners=True).cpu().numpy()[0]
922 |
923 | all_peaks = self.compute_peaks_from_heatmaps(heatmaps)
924 | if len(all_peaks) == 0:
925 | return np.empty((0, len(JointType), 3)), np.empty(0)
926 | all_connections = self.compute_connections(pafs, all_peaks, map_w, params)
927 | subsets = self.grouping_key_points(all_connections, all_peaks, params)
928 | all_peaks[:, 1] *= orig_img_w / map_w
929 | all_peaks[:, 2] *= orig_img_h / map_h
930 | poses = self.subsets_to_pose_array(subsets, all_peaks)
931 | scores = subsets[:, -2]
932 | return poses, scores
933 |
934 |
935 | def draw_person_pose(orig_img, poses):
936 | orig_img = cv2.cvtColor(orig_img, cv2.COLOR_BGR2RGB)
937 | if len(poses) == 0:
938 | return orig_img
939 |
940 | limb_colors = [
941 | [0, 255, 0], [0, 255, 85], [0, 255, 170], [0, 255, 255], [0, 170, 255],
942 | [0, 85, 255], [255, 0, 0], [255, 85, 0], [255, 170, 0], [255, 255, 0.],
943 | [255, 0, 85], [170, 255, 0], [85, 255, 0], [170, 0, 255.], [0, 0, 255],
944 | [0, 0, 255], [255, 0, 255], [170, 0, 255], [255, 0, 170],
945 | ]
946 |
947 | joint_colors = [
948 | [255, 0, 0], [255, 85, 0], [255, 170, 0], [255, 255, 0], [170, 255, 0],
949 | [85, 255, 0], [0, 255, 0], [0, 255, 85], [0, 255, 170], [0, 255, 255],
950 | [0, 170, 255], [0, 85, 255], [0, 0, 255], [85, 0, 255], [170, 0, 255],
951 | [255, 0, 255], [255, 0, 170], [255, 0, 85]]
952 |
953 | canvas = orig_img.copy()
954 |
955 | # limbs
956 | for pose in poses.round().astype('i'):
957 | for i, (limb, color) in enumerate(zip(params['limbs_point'], limb_colors)):
958 | if i != 9 and i != 13: # don't show ear-shoulder connection
959 | limb_ind = np.array(limb)
960 | if np.all(pose[limb_ind][:, 2] != 0):
961 | joint1, joint2 = pose[limb_ind][:, :2]
962 | cv2.line(canvas, tuple(joint1), tuple(joint2), color, 2)
963 |
964 | # joints
965 | for pose in poses.round().astype('i'):
966 | for i, ((x, y, v), color) in enumerate(zip(pose, joint_colors)):
967 | if v != 0:
968 | cv2.circle(canvas, (x, y), 3, color, -1)
969 | return canvas
--------------------------------------------------------------------------------
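The peak-finding step inside compute_peaks_from_heatmaps above is worth restating on its own: smooth each heatmap channel, then keep every point that exceeds the threshold and is larger than its four neighbours. The self-contained sketch below mirrors that logic on a toy channel; the `thresh` and `sigma` arguments stand in for params['heatmap_peak_thresh'] and params['gaussian_sigma'], and the function is only an illustration, not part of the repo:

```
import numpy as np
from scipy.ndimage import gaussian_filter

def find_peaks(heatmap, thresh=0.05, sigma=1.0):
    """Keep points above thresh that are strictly larger than their 4 neighbours."""
    hm = gaussian_filter(heatmap, sigma=sigma)
    up = np.zeros_like(hm)
    up[1:, :] = hm[:-1, :]        # value of the pixel above
    down = np.zeros_like(hm)
    down[:-1, :] = hm[1:, :]      # value of the pixel below
    left = np.zeros_like(hm)
    left[:, 1:] = hm[:, :-1]      # value of the pixel to the left
    right = np.zeros_like(hm)
    right[:, :-1] = hm[:, 1:]     # value of the pixel to the right
    mask = (hm > thresh) & (hm > up) & (hm > down) & (hm > left) & (hm > right)
    ys, xs = np.nonzero(mask)
    return [(int(x), int(y), float(hm[y, x])) for x, y in zip(xs, ys)]  # (x, y, score)

toy = np.zeros((46, 46), dtype=np.float32)
toy[10, 12] = 1.0   # two isolated spikes -> two peaks after smoothing
toy[30, 35] = 1.0
print(find_peaks(toy))  # -> [(12, 10, ...), (35, 30, ...)]
```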
/pose_detect.py:
--------------------------------------------------------------------------------
1 | import cv2
2 | import argparse
3 | from openpose import Openpose, draw_person_pose
4 |
5 | if __name__ == '__main__':
6 | parser = argparse.ArgumentParser(description='Pose detector')
7 | parser.add_argument('weights', help='weights file path')
8 | parser.add_argument('--img', '-i', help='image file path')
9 | parser.add_argument('--precise', '-p', action='store_true', help='do precise inference')
10 | args = parser.parse_args()
11 |
12 | # load model
13 | openpose = Openpose(weights_file = args.weights, training = False)
14 |
15 | # read image
16 | img = cv2.imread(args.img)
17 |
18 | # inference
19 | poses, _ = openpose.detect(img, precise=args.precise)
20 |
21 | # draw and save image
22 | img = draw_person_pose(cv2.cvtColor(img, cv2.COLOR_BGR2RGB), poses)
23 |
24 | print('Saving result into result.png...')
25 | cv2.imwrite('result.png', img)
--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
1 | from openpose import Openpose
2 | import argparse
3 |
4 | def parse_args():
5 | parser = argparse.ArgumentParser(description="Train openpose")
6 |     parser.add_argument("-r", "--resume", help="whether to resume from the latest saved model", action="store_true")
7 |     parser.add_argument("-save", "--from_save_folder", help="whether to resume from the save path", action="store_true")
8 | args = parser.parse_args()
9 | return args
10 |
11 | if __name__ == '__main__':
12 | args = parse_args()
13 | openpose = Openpose()
14 | if args.resume:
15 | openpose.resume_training_load(from_save_folder = args.from_save_folder)
16 | openpose.train()
--------------------------------------------------------------------------------