├── .gitignore
├── LICENSE
├── README.md
├── bvh_skeleton
│   ├── __init__.py
│   ├── bvh_helper.py
│   ├── cmu_skeleton.py
│   ├── coco_skeleton.py
│   ├── h36m_original_skeleton.py
│   ├── h36m_skeleton.py
│   ├── math3d.py
│   └── openpose_skeleton.py
├── cameras.h5
├── demo.ipynb
├── miscs
│   ├── cxk.mp4
│   ├── cxk_cache
│   │   ├── 2d_pose.npy
│   │   ├── 3d_pose.npy
│   │   └── cxk.bvh
│   ├── demo
│   │   ├── cxk_2d_pose.gif
│   │   ├── cxk_3d_pose.gif
│   │   ├── cxk_bvh.gif
│   │   ├── cxk_retargeting.gif
│   │   └── demo.gif
│   ├── girl_model
│   │   ├── girl.mhx2
│   │   └── textures
│   │       ├── brown_eye.png
│   │       ├── female_casualsuit02_ao.png
│   │       ├── female_casualsuit02_diffuse.png
│   │       ├── female_casualsuit02_normal.png
│   │       ├── middleage_lightskinned_female_diffuse2.png
│   │       └── ponytail01_diffuse.png
│   └── model_link.txt
├── pose_estimator_2d
│   ├── __init__.py
│   ├── estimator_2d.py
│   └── openpose_estimator.py
├── pose_estimator_3d
│   ├── __init__.py
│   ├── dataset
│   │   ├── __init__.py
│   │   └── wild_pose_dataset.py
│   ├── estimator_3d.py
│   └── model
│       ├── __init__.py
│       ├── factory.py
│       ├── linear_model.py
│       ├── module.py
│       └── video_pose.py
└── utils
    ├── __init__.py
    ├── camera.py
    ├── smooth.py
    └── vis.py
/.gitignore:
--------------------------------------------------------------------------------
1 | __pycache__/
2 | .ipynb_checkpoints/
3 | .vscode/
4 | .idea/
5 | config_file/
6 | models/
7 | *.pyc
8 | *.swp
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2020 KevinLTT
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # video2bvh
2 |
3 | video2bvh extracts human motion from video and saves it as a BVH mocap file.
4 |
5 | 
6 |
7 | ## Introduction
8 |
9 | video2bvh consists of 3 modules: pose_estimator_2d, pose_estimator_3d, and bvh_skeleton.
10 | - **pose_estimator_2d**: The 3D pose estimation models we use are two-stage models (image -> 2D pose -> 3D pose), so this module estimates 2D human pose (2D joint keypoint positions) from images. We choose [OpenPose](https://github.com/CMU-Perceptual-Computing-Lab/openpose) as the 2D estimator; it detects 2D joint keypoints accurately at real-time speed.
11 | - **pose_estimator_3d**: We provide 2 models to estimate 3D human pose.
12 |   - [3d-pose-baseline](https://github.com/una-dinosauria/3d-pose-baseline): Proposed by Julieta Martinez, Rayat Hossain, Javier Romero, and James J. Little in ICCV 2017. [[PAPER]](https://arxiv.org/pdf/1705.03098.pdf)[[CODE]](https://github.com/una-dinosauria/3d-pose-baseline). It takes a single-frame 2D pose as input. The original implementation is based on TensorFlow; we reimplemented it in PyTorch.
13 |   - [VideoPose3D](https://github.com/facebookresearch/VideoPose3D): Proposed by Dario Pavllo, Christoph Feichtenhofer, David Grangier, and Michael Auli in CVPR 2019. [[PAPER]](https://arxiv.org/abs/1811.11742)[[CODE]](https://github.com/facebookresearch/VideoPose3D). It takes a 2D pose sequence as input. We slightly modified the original implementation.
14 | - **bvh_skeleton**: This module estimates skeleton information from the 3D pose, converts the 3D pose to joint angles, and writes the motion data to a BVH file. A short sketch of how the three modules fit together is shown below.
15 |
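A minimal sketch of how the modules are wired together (the same classes and checkpoint paths that demo.ipynb uses; the OpenPose model folder is a placeholder you must point at your own build):

```python
from pose_estimator_2d import openpose_estimator   # image -> 2D keypoints (OpenPose BODY_25)
from pose_estimator_3d import estimator_3d         # 2D keypoint sequence -> 3D pose
from bvh_skeleton import cmu_skeleton              # 3D pose -> BVH hierarchy + motion

e2d = openpose_estimator.OpenPoseEstimator(model_folder='/path/to/openpose/models/')
e3d = estimator_3d.Estimator3D(
    config_file='models/openpose_video_pose_243f/video_pose.yaml',
    checkpoint_file='models/openpose_video_pose_243f/best_58.58.pth',
)
skeleton = cmu_skeleton.CMUSkeleton()

# per frame:  keypoints = e2d.estimate(img_list=[frame])[0]        # (num_person, 25, 3)
# per video:  pose_3d   = e3d.estimate(pose_2d, image_width=w, image_height=h)
# finally:    skeleton.poses2bvh(pose_3d, output_file='out.bvh')
```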
16 |
17 | ## Dependencies
18 | - [OpenPose](https://github.com/CMU-Perceptual-Computing-Lab/openpose): See the official OpenPose [installation.md](https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/doc/installation.md#python-api) for help. Be sure to turn on the `BUILD_PYTHON` flag while building (a quick import check is sketched below).
19 | - [pytorch](https://github.com/pytorch/pytorch).
20 | - [python-opencv](https://opencv.org/).
21 | - [numpy](https://numpy.org/)
22 |
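A quick sanity check that the OpenPose Python bindings are importable (a sketch; the `sys.path` entry is a placeholder for wherever your OpenPose build placed the Python package):

```python
import sys

sys.path.append('/path/to/openpose/build/python')  # placeholder: your OpenPose build output
from openpose import pyopenpose as op  # an ImportError here usually means BUILD_PYTHON was off
print(op.WrapperPython)
```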
23 |
24 | ## Pre-trained models
25 | The original models provided by [3d-pose-baseline](https://github.com/una-dinosauria/3d-pose-baseline) and [VideoPose3D](https://github.com/facebookresearch/VideoPose3D) use the [Human3.6M](http://vision.imar.ro/human3.6m/description.php) 17-joint skeleton as the input format (see [bvh_skeleton/h36m_skeleton.py](https://github.com/KevinLTT/video2bvh/raw/master/bvh_skeleton/h36m_skeleton.py)), but OpenPose's detection results use 25 joints (see OpenPose [output.md](https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/doc/output.md#pose-output-format-body_25)). So we retrained these models from scratch, using 2D poses estimated by OpenPose on the [Human3.6M](http://vision.imar.ro/human3.6m/description.php) dataset.
26 |
27 | The training procedure is almost the same as in the original implementations. We use subjects S1, S5, S6, S7, and S8 as the training set, and S9 and S11 as the test set. For 3d-pose-baseline, the best MPJPE is 64.12 mm (Protocol #1), and for VideoPose3D the best MPJPE is 58.58 mm (Protocol #1). The pre-trained models can be downloaded from the following links.
28 |
29 | * [Google Drive](https://drive.google.com/drive/folders/1M2s32xQkrDhDLz-VqzvocMuoaSGR1MfX?usp=sharin)
30 | * [Baidu Disk](https://pan.baidu.com/s/1-SRaS5FwC30-Pf_gL8bbXQ) (code: fmpz)
31 |
32 | After you download the `models` folder, place or link it under the root directory of this project.
33 |
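For reference, [demo.ipynb](https://github.com/KevinLTT/video2bvh/raw/master/demo.ipynb) loads the VideoPose3D checkpoint from the paths below, so a quick check that the folder landed in the right place might look like this:

```python
from pathlib import Path

# Paths used by demo.ipynb; adjust if you use the 3d-pose-baseline checkpoint instead.
cfg = Path('models/openpose_video_pose_243f/video_pose.yaml')
ckpt = Path('models/openpose_video_pose_243f/best_58.58.pth')
assert cfg.exists() and ckpt.exists(), 'place the downloaded models/ folder at the project root'
```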
34 |
35 | ## Quick Start
36 | Open [demo.ipynb](https://github.com/KevinLTT/video2bvh/raw/master/demo.ipynb) in Jupyter Notebook and follow the instructions. As you will see in [demo.ipynb](https://github.com/KevinLTT/video2bvh/raw/master/demo.ipynb), video2bvh converts a video to a BVH file in 3 main steps, each of which is sketched below.
37 |
38 | ### 1. Estimate 2D pose from video
39 |
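A condensed version of this step, adapted from demo.ipynb (a sketch assuming a single person per frame; the OpenPose model folder is a placeholder):

```python
import cv2
import numpy as np
from pathlib import Path
from pose_estimator_2d import openpose_estimator
from utils import smooth

e2d = openpose_estimator.OpenPoseEstimator(model_folder='/path/to/openpose/models/')

video_file = Path('miscs/cxk.mp4')
output_dir = Path(f'miscs/{video_file.stem}_cache')
output_dir.mkdir(parents=True, exist_ok=True)

cap = cv2.VideoCapture(str(video_file))
keypoints_list, img_width, img_height = [], None, None
while True:
    ret, frame = cap.read()
    if not ret:
        break
    img_height, img_width = frame.shape[:2]
    # shape (num_person, 25, 3); last dimension is (x, y, confidence)
    keypoints = e2d.estimate(img_list=[frame])[0]
    if isinstance(keypoints, np.ndarray) and keypoints.ndim == 3:
        keypoints_list.append(keypoints[0])   # keep the first detected person
    else:
        keypoints_list.append(None)           # detection failed for this frame
cap.release()

# drop frames where detection failed and cache the 2D pose
keypoints_list = smooth.filter_missing_value(keypoints_list=keypoints_list, method='ignore')
pose2d = np.stack(keypoints_list)[:, :, :2]
np.save(output_dir / '2d_pose.npy', pose2d)
```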
40 |
41 |
42 |
43 | ### 2. Estimate 3D pose from 2D pose
44 |
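Also condensed from demo.ipynb: lift the cached 2D pose to 3D, then move it from camera to world coordinates using the Human3.6M camera parameters in `cameras.h5` (the subject and camera ids below simply reuse the ones from the demo):

```python
import numpy as np
from pose_estimator_3d import estimator_3d
from utils import camera

e3d = estimator_3d.Estimator3D(
    config_file='models/openpose_video_pose_243f/video_pose.yaml',
    checkpoint_file='models/openpose_video_pose_243f/best_58.58.pth',
)

pose2d = np.load('miscs/cxk_cache/2d_pose.npy')
# img_width / img_height come from step 1
pose3d = e3d.estimate(pose2d, image_width=img_width, image_height=img_height)

cam_params = camera.load_camera_params('cameras.h5')['S1']['55011271']
pose3d_world = camera.camera2world(pose=pose3d, R=cam_params['R'], T=0)
pose3d_world[:, :, 2] -= np.min(pose3d_world[:, :, 2])  # rebase the height so the feet touch the ground
np.save('miscs/cxk_cache/3d_pose.npy', pose3d_world)
```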
45 |
46 |
47 |
48 | ### 3. Convert 3D pose to bvh motion capture file
49 |
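Finally, the world-space 3D pose is converted to joint angles and written out as BVH; demo.ipynb shows both the CMU-style and the Human3.6M-style skeleton writers:

```python
import numpy as np
from bvh_skeleton import cmu_skeleton, h36m_skeleton

pose3d_world = np.load('miscs/cxk_cache/3d_pose.npy')

cmu_skel = cmu_skeleton.CMUSkeleton()
channels, header = cmu_skel.poses2bvh(pose3d_world, output_file='miscs/cxk_cache/cxk.bvh')

# the same motion written with the Human3.6M joint layout
h36m_skeleton.H36mSkeleton().poses2bvh(pose3d_world, output_file='miscs/h36m_cxk.bvh')
```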
50 |
51 |
52 |
53 |
54 | ## Retargeting
55 | Once you have the BVH file, you can easily retarget the motion to other 3D character models with existing tools. The girl model we used was created with [MakeHuman](http://www.makehumancommunity.org/), and the demo is rendered with [Blender](https://www.blender.org/). The [MakeWalk](http://www.makehumancommunity.org/wiki/Documentation:MakeWalk) plugin does the retargeting work.
56 |
57 |
58 |
59 |
60 |
61 | ## TODO
62 | - [ ] Add more 2D estimators, such as [HRNet](https://github.com/leoxiaobin/deep-high-resolution-net.pytorch) and [PoseResNet](https://github.com/microsoft/human-pose-estimation.pytorch).
63 | - [ ] Smoothing 2D pose and 3D pose.
64 | - [ ] Real-time demo.
--------------------------------------------------------------------------------
/bvh_skeleton/__init__.py:
--------------------------------------------------------------------------------
1 | from . import h36m_original_skeleton
2 | from . import h36m_skeleton
3 | from . import openpose_skeleton
--------------------------------------------------------------------------------
/bvh_skeleton/bvh_helper.py:
--------------------------------------------------------------------------------
1 | import os
2 | from pathlib import Path
3 |
4 |
5 | class BvhNode(object):
6 | def __init__(
7 | self, name, offset, rotation_order,
8 | children=None, parent=None, is_root=False, is_end_site=False
9 | ):
10 | if not is_end_site and \
11 | rotation_order not in ['xyz', 'xzy', 'yxz', 'yzx', 'zxy', 'zyx']:
12 |             raise ValueError(f'Invalid rotation order: {rotation_order}')
13 | self.name = name
14 | self.offset = offset
15 | self.rotation_order = rotation_order
16 | self.children = children
17 | self.parent = parent
18 | self.is_root = is_root
19 | self.is_end_site = is_end_site
20 |
21 |
22 | class BvhHeader(object):
23 | def __init__(self, root, nodes):
24 | self.root = root
25 | self.nodes = nodes
26 |
27 |
28 | def write_header(writer, node, level):
29 | indent = ' ' * 4 * level
30 | if node.is_root:
31 | writer.write(f'{indent}ROOT {node.name}\n')
32 | channel_num = 6
33 | elif node.is_end_site:
34 | writer.write(f'{indent}End Site\n')
35 | channel_num = 0
36 | else:
37 | writer.write(f'{indent}JOINT {node.name}\n')
38 | channel_num = 3
39 | writer.write(f'{indent}{"{"}\n')
40 |
41 | indent = ' ' * 4 * (level + 1)
42 | writer.write(
43 | f'{indent}OFFSET '
44 | f'{node.offset[0]} {node.offset[1]} {node.offset[2]}\n'
45 | )
46 | if channel_num:
47 | channel_line = f'{indent}CHANNELS {channel_num} '
48 | if node.is_root:
49 | channel_line += f'Xposition Yposition Zposition '
50 | channel_line += ' '.join([
51 | f'{axis.upper()}rotation'
52 | for axis in node.rotation_order
53 | ])
54 | writer.write(channel_line + '\n')
55 |
56 | for child in node.children:
57 | write_header(writer, child, level + 1)
58 |
59 | indent = ' ' * 4 * level
60 | writer.write(f'{indent}{"}"}\n')
61 |
62 |
63 | def write_bvh(output_file, header, channels, frame_rate=30):
64 | output_file = Path(output_file)
65 | if not output_file.parent.exists():
66 | os.makedirs(output_file.parent)
67 |
68 | with output_file.open('w') as f:
69 | f.write('HIERARCHY\n')
70 | write_header(writer=f, node=header.root, level=0)
71 |
72 | f.write('MOTION\n')
73 | f.write(f'Frames: {len(channels)}\n')
74 | f.write(f'Frame Time: {1 / frame_rate}\n')
75 |
76 | for channel in channels:
77 | f.write(' '.join([f'{element}' for element in channel]) + '\n')
--------------------------------------------------------------------------------
/bvh_skeleton/cmu_skeleton.py:
--------------------------------------------------------------------------------
1 | from . import math3d
2 | from . import bvh_helper
3 |
4 | import numpy as np
5 | from pprint import pprint
6 |
7 |
8 | class CMUSkeleton(object):
9 |
10 | def __init__(self):
11 | self.root = 'Hips'
12 | self.keypoint2index = {
13 | 'Hips': 0,
14 | 'RightUpLeg': 1,
15 | 'RightLeg': 2,
16 | 'RightFoot': 3,
17 | 'LeftUpLeg': 4,
18 | 'LeftLeg': 5,
19 | 'LeftFoot': 6,
20 | 'Spine': 7,
21 | 'Spine1': 8,
22 | 'Neck1': 9,
23 | 'HeadEndSite': 10,
24 | 'LeftArm': 11,
25 | 'LeftForeArm': 12,
26 | 'LeftHand': 13,
27 | 'RightArm': 14,
28 | 'RightForeArm': 15,
29 | 'RightHand': 16,
30 | 'RightHipJoint': -1,
31 | 'RightFootEndSite': -1,
32 | 'LeftHipJoint': -1,
33 | 'LeftFootEndSite': -1,
34 | 'LeftShoulder': -1,
35 | 'LeftHandEndSite': -1,
36 | 'RightShoulder': -1,
37 | 'RightHandEndSite': -1,
38 | 'LowerBack': -1,
39 | 'Neck': -1
40 | }
41 | self.index2keypoint = {v: k for k, v in self.keypoint2index.items()}
42 | self.keypoint_num = len(self.keypoint2index)
43 |
44 | self.children = {
45 | 'Hips': ['LeftHipJoint', 'LowerBack', 'RightHipJoint'],
46 | 'LeftHipJoint': ['LeftUpLeg'],
47 | 'LeftUpLeg': ['LeftLeg'],
48 | 'LeftLeg': ['LeftFoot'],
49 | 'LeftFoot': ['LeftFootEndSite'],
50 | 'LeftFootEndSite': [],
51 | 'LowerBack': ['Spine'],
52 | 'Spine': ['Spine1'],
53 | 'Spine1': ['LeftShoulder', 'Neck', 'RightShoulder'],
54 | 'LeftShoulder': ['LeftArm'],
55 | 'LeftArm': ['LeftForeArm'],
56 | 'LeftForeArm': ['LeftHand'],
57 | 'LeftHand': ['LeftHandEndSite'],
58 | 'LeftHandEndSite': [],
59 | 'Neck': ['Neck1'],
60 | 'Neck1': ['HeadEndSite'],
61 | 'HeadEndSite': [],
62 | 'RightShoulder': ['RightArm'],
63 | 'RightArm': ['RightForeArm'],
64 | 'RightForeArm': ['RightHand'],
65 | 'RightHand': ['RightHandEndSite'],
66 | 'RightHandEndSite': [],
67 | 'RightHipJoint': ['RightUpLeg'],
68 | 'RightUpLeg': ['RightLeg'],
69 | 'RightLeg': ['RightFoot'],
70 | 'RightFoot': ['RightFootEndSite'],
71 | 'RightFootEndSite': [],
72 | }
73 | self.parent = {self.root: None}
74 | for parent, children in self.children.items():
75 | for child in children:
76 | self.parent[child] = parent
77 |
78 | self.left_joints = [
79 | joint for joint in self.keypoint2index
80 | if 'Left' in joint
81 | ]
82 | self.right_joints = [
83 | joint for joint in self.keypoint2index
84 | if 'Right' in joint
85 | ]
86 |
87 | # T-pose
88 | self.initial_directions = {
89 | 'Hips': [0, 0, 0],
90 | 'LeftHipJoint': [1, 0, 0],
91 | 'LeftUpLeg': [1, 0, 0],
92 | 'LeftLeg': [0, 0, -1],
93 | 'LeftFoot': [0, 0, -1],
94 | 'LeftFootEndSite': [0, -1, 0],
95 | 'LowerBack': [0, 0, 1],
96 | 'Spine': [0, 0, 1],
97 | 'Spine1': [0, 0, 1],
98 | 'LeftShoulder': [1, 0, 0],
99 | 'LeftArm': [1, 0, 0],
100 | 'LeftForeArm': [1, 0, 0],
101 | 'LeftHand': [1, 0, 0],
102 | 'LeftHandEndSite': [1, 0, 0],
103 | 'Neck': [0, 0, 1],
104 | 'Neck1': [0, 0, 1],
105 | 'HeadEndSite': [0, 0, 1],
106 | 'RightShoulder': [-1, 0, 0],
107 | 'RightArm': [-1, 0, 0],
108 | 'RightForeArm': [-1, 0, 0],
109 | 'RightHand': [-1, 0, 0],
110 | 'RightHandEndSite': [-1, 0, 0],
111 | 'RightHipJoint': [-1, 0, 0],
112 | 'RightUpLeg': [-1, 0, 0],
113 | 'RightLeg': [0, 0, -1],
114 | 'RightFoot': [0, 0, -1],
115 | 'RightFootEndSite': [0, -1, 0]
116 | }
117 |
118 |
119 | def get_initial_offset(self, poses_3d):
120 | # TODO: RANSAC
121 | bone_lens = {self.root: [0]}
122 | stack = [self.root]
123 | while stack:
124 | parent = stack.pop()
125 | p_idx = self.keypoint2index[parent]
126 | p_name = parent
127 | while p_idx == -1:
128 | # find real parent
129 | p_name = self.parent[p_name]
130 | p_idx = self.keypoint2index[p_name]
131 | for child in self.children[parent]:
132 | stack.append(child)
133 |
134 | if self.keypoint2index[child] == -1:
135 | bone_lens[child] = [0.1]
136 | else:
137 | c_idx = self.keypoint2index[child]
138 | bone_lens[child] = np.linalg.norm(
139 | poses_3d[:, p_idx] - poses_3d[:, c_idx],
140 | axis=1
141 | )
142 |
143 | bone_len = {}
144 | for joint in self.keypoint2index:
145 | if 'Left' in joint or 'Right' in joint:
146 | base_name = joint.replace('Left', '').replace('Right', '')
147 | left_len = np.mean(bone_lens['Left' + base_name])
148 | right_len = np.mean(bone_lens['Right' + base_name])
149 | bone_len[joint] = (left_len + right_len) / 2
150 | else:
151 | bone_len[joint] = np.mean(bone_lens[joint])
152 |
153 | initial_offset = {}
154 | for joint, direction in self.initial_directions.items():
155 | direction = np.array(direction) / max(np.linalg.norm(direction), 1e-12)
156 | initial_offset[joint] = direction * bone_len[joint]
157 |
158 | return initial_offset
159 |
160 |
161 | def get_bvh_header(self, poses_3d):
162 | initial_offset = self.get_initial_offset(poses_3d)
163 |
164 | nodes = {}
165 | for joint in self.keypoint2index:
166 | is_root = joint == self.root
167 | is_end_site = 'EndSite' in joint
168 | nodes[joint] = bvh_helper.BvhNode(
169 | name=joint,
170 | offset=initial_offset[joint],
171 | rotation_order='zxy' if not is_end_site else '',
172 | is_root=is_root,
173 | is_end_site=is_end_site,
174 | )
175 | for joint, children in self.children.items():
176 | nodes[joint].children = [nodes[child] for child in children]
177 | for child in children:
178 | nodes[child].parent = nodes[joint]
179 |
180 | header = bvh_helper.BvhHeader(root=nodes[self.root], nodes=nodes)
181 | return header
182 |
183 |
184 | def pose2euler(self, pose, header):
185 | channel = []
186 | quats = {}
187 | eulers = {}
188 | stack = [header.root]
189 | while stack:
190 | node = stack.pop()
191 | joint = node.name
192 | joint_idx = self.keypoint2index[joint]
193 |
194 | if node.is_root:
195 | channel.extend(pose[joint_idx])
196 |
197 | index = self.keypoint2index
198 | order = None
199 | if joint == 'Hips':
200 | x_dir = pose[index['LeftUpLeg']] - pose[index['RightUpLeg']]
201 | y_dir = None
202 | z_dir = pose[index['Spine']] - pose[joint_idx]
203 | order = 'zyx'
204 | elif joint in ['RightUpLeg', 'RightLeg']:
205 | child_idx = self.keypoint2index[node.children[0].name]
206 | x_dir = pose[index['Hips']] - pose[index['RightUpLeg']]
207 | y_dir = None
208 | z_dir = pose[joint_idx] - pose[child_idx]
209 | order = 'zyx'
210 | elif joint in ['LeftUpLeg', 'LeftLeg']:
211 | child_idx = self.keypoint2index[node.children[0].name]
212 | x_dir = pose[index['LeftUpLeg']] - pose[index['Hips']]
213 | y_dir = None
214 | z_dir = pose[joint_idx] - pose[child_idx]
215 | order = 'zyx'
216 | elif joint == 'Spine':
217 | x_dir = pose[index['LeftUpLeg']] - pose[index['RightUpLeg']]
218 | y_dir = None
219 | z_dir = pose[index['Spine1']] - pose[joint_idx]
220 | order = 'zyx'
221 | elif joint == 'Spine1':
222 | x_dir = pose[index['LeftArm']] - \
223 | pose[index['RightArm']]
224 | y_dir = None
225 | z_dir = pose[joint_idx] - pose[index['Spine']]
226 | order = 'zyx'
227 | elif joint == 'Neck1':
228 | x_dir = None
229 | y_dir = pose[index['Spine1']] - pose[joint_idx]
230 | z_dir = pose[index['HeadEndSite']] - pose[index['Spine1']]
231 | order = 'zxy'
232 | elif joint == 'LeftArm':
233 | x_dir = pose[index['LeftForeArm']] - pose[joint_idx]
234 | y_dir = pose[index['LeftForeArm']] - pose[index['LeftHand']]
235 | z_dir = None
236 | order = 'xzy'
237 | elif joint == 'LeftForeArm':
238 | x_dir = pose[index['LeftHand']] - pose[joint_idx]
239 | y_dir = pose[joint_idx] - pose[index['LeftArm']]
240 | z_dir = None
241 | order = 'xzy'
242 | elif joint == 'RightArm':
243 | x_dir = pose[joint_idx] - pose[index['RightForeArm']]
244 | y_dir = pose[index['RightForeArm']] - pose[index['RightHand']]
245 | z_dir = None
246 | order = 'xzy'
247 | elif joint == 'RightForeArm':
248 | x_dir = pose[joint_idx] - pose[index['RightHand']]
249 | y_dir = pose[joint_idx] - pose[index['RightArm']]
250 | z_dir = None
251 | order = 'xzy'
252 |
253 | if order:
254 | dcm = math3d.dcm_from_axis(x_dir, y_dir, z_dir, order)
255 | quats[joint] = math3d.dcm2quat(dcm)
256 | else:
257 | quats[joint] = quats[self.parent[joint]].copy()
258 |
259 | local_quat = quats[joint].copy()
260 | if node.parent:
261 | local_quat = math3d.quat_divide(
262 | q=quats[joint], r=quats[node.parent.name]
263 | )
264 |
265 | euler = math3d.quat2euler(
266 | q=local_quat, order=node.rotation_order
267 | )
268 | euler = np.rad2deg(euler)
269 | eulers[joint] = euler
270 | channel.extend(euler)
271 |
272 | for child in node.children[::-1]:
273 | if not child.is_end_site:
274 | stack.append(child)
275 |
276 | return channel
277 |
278 |
279 | def poses2bvh(self, poses_3d, header=None, output_file=None):
280 | if not header:
281 | header = self.get_bvh_header(poses_3d)
282 |
283 | channels = []
284 | for frame, pose in enumerate(poses_3d):
285 | channels.append(self.pose2euler(pose, header))
286 |
287 | if output_file:
288 | bvh_helper.write_bvh(output_file, header, channels)
289 |
290 | return channels, header
--------------------------------------------------------------------------------
/bvh_skeleton/coco_skeleton.py:
--------------------------------------------------------------------------------
1 | class COCOSkeleton(object):
2 |
3 | def __init__(self):
4 | self.root = 'Neck' # median of left shoulder and right shoulder
5 | self.keypoint2index = {
6 | 'Nose': 0,
7 | 'LeftEye': 1,
8 | 'RightEye': 2,
9 | 'LeftEar': 3,
10 | 'RightEar': 4,
11 | 'LeftShoulder': 5,
12 | 'RightShoulder': 6,
13 | 'LeftElbow': 7,
14 | 'RightElbow': 8,
15 | 'LeftWrist': 9,
16 | 'RightWrist': 10,
17 | 'LeftHip': 11,
18 | 'RightHip': 12,
19 | 'LeftKnee': 13,
20 | 'RightKnee': 14,
21 | 'LeftAnkle': 15,
22 | 'RightAnkle': 16,
23 | 'Neck': 17
24 | }
25 | self.index2keypoint = {v: k for k, v in self.keypoint2index.items()}
26 | self.keypoint_num = len(self.keypoint2index)
27 |
28 | self.children = {
29 | 'Neck': [
30 | 'Nose', 'LeftShoulder', 'RightShoulder', 'LeftHip', 'RightHip'
31 | ],
32 | 'Nose': ['LeftEye', 'RightEye'],
33 | 'LeftEye': ['LeftEar'],
34 | 'LeftEar': [],
35 | 'RightEye': ['RightEar'],
36 | 'RightEar': [],
37 | 'LeftShoulder': ['LeftElbow'],
38 | 'LeftElbow': ['LeftWrist'],
39 | 'LeftWrist': [],
40 | 'RightShoulder': ['RightElbow'],
41 | 'RightElbow': ['RightWrist'],
42 | 'RightWrist': [],
43 | 'LeftHip': ['LeftKnee'],
44 | 'LeftKnee': ['LeftAnkle'],
45 | 'LeftAnkle': [],
46 | 'RightHip': ['RightKnee'],
47 | 'RightKnee': ['RightAnkle'],
48 | 'RightAnkle': []
49 | }
50 | self.parent = {self.root: None}
51 | for parent, children in self.children.items():
52 | for child in children:
53 | self.parent[child] = parent
--------------------------------------------------------------------------------
/bvh_skeleton/h36m_original_skeleton.py:
--------------------------------------------------------------------------------
1 | class H36mOriginalSkeleton(object):
2 |
3 | def __init__(self):
4 | self.root = 'Hip'
5 | self.keypoint2index = {
6 | 'Hip': 0,
7 | 'RightUpLeg': 1,
8 | 'RightLeg': 2,
9 | 'RightFoot': 3,
10 | 'RightToeBase': 4,
11 | 'RightToeBaseEndSite': 5,
12 | 'LeftUpLeg': 6,
13 | 'LeftLeg': 7,
14 | 'LeftFoot': 8,
15 | 'LeftToeBase': 9,
16 | 'LeftToeBaseEndSite': 10,
17 | 'Spine': 11,
18 | 'Spine1': 12,
19 | 'Neck': 13,
20 | 'Head': 14,
21 | 'HeadEndSite': 15,
22 | 'LeftShoulder': 16,
23 | 'LeftArm': 17,
24 | 'LeftForeArm': 18,
25 | 'LeftHand': 19,
26 | 'LeftHandThumb': 20,
27 | 'LeftHandThumbEndSite': 21,
28 | 'LeftWristEnd': 22,
29 | 'LeftWristEndEndSite': 23,
30 | 'RightShoulder': 24,
31 | 'RightArm': 25,
32 | 'RightForeArm': 26,
33 | 'RightHand': 27,
34 | 'RightHandThumb': 28,
35 | 'RightHandThumbEndSite': 29,
36 | 'RightWristEnd': 30,
37 | 'RightWristEndEndSite': 31
38 | }
39 | self.index2keypoint = {v: k for k, v in self.keypoint2index.items()}
40 | self.keypoint_num = len(self.keypoint2index)
41 |
42 | self.children = {
43 | 'Hip': ['RightUpLeg', 'LeftUpLeg', 'Spine'],
44 | 'RightUpLeg': ['RightLeg'],
45 | 'RightLeg': ['RightFoot'],
46 | 'RightFoot': ['RightToeBase'],
47 | 'RightToeBase': ['RightToeBaseEndSite'],
48 | 'RightToeBaseEndSite': [],
49 | 'LeftUpLeg': ['LeftLeg'],
50 | 'LeftLeg': ['LeftFoot'],
51 | 'LeftFoot': ['LeftToeBase'],
52 | 'LeftToeBase': ['LeftToeBaseEndSite'],
53 | 'LeftToeBaseEndSite': [],
54 | 'Spine': ['Spine1'],
55 | 'Spine1': ['Neck', 'LeftShoulder', 'RightShoulder'],
56 | 'Neck': ['Head'],
57 | 'Head': ['HeadEndSite'],
58 | 'HeadEndSite': [],
59 | 'LeftShoulder': ['LeftArm'],
60 | 'LeftArm': ['LeftForeArm'],
61 | 'LeftForeArm': ['LeftHand'],
62 | 'LeftHand': ['LeftHandThumb', 'LeftWristEnd'],
63 | 'LeftHandThumb': ['LeftHandThumbEndSite'],
64 | 'LeftHandThumbEndSite': [],
65 | 'LeftWristEnd': ['LeftWristEndEndSite'],
66 | 'LeftWristEndEndSite': [],
67 | 'RightShoulder': ['RightArm'],
68 | 'RightArm': ['RightForeArm'],
69 | 'RightForeArm': ['RightHand'],
70 | 'RightHand': ['RightHandThumb', 'RightWristEnd'],
71 | 'RightHandThumb': ['RightHandThumbEndSite'],
72 | 'RightHandThumbEndSite': [],
73 | 'RightWristEnd': ['RightWristEndEndSite'],
74 | 'RightWristEndEndSite': [],
75 | }
76 | self.parent = {self.root: None}
77 | for parent, children in self.children.items():
78 | for child in children:
79 | self.parent[child] = parent
80 |
81 | self.left_joints = [
82 | joint for joint in self.keypoint2index
83 | if 'Left' in joint
84 | ]
85 | self.right_joints = [
86 | joint for joint in self.keypoint2index
87 | if 'Right' in joint
88 | ]
--------------------------------------------------------------------------------
/bvh_skeleton/h36m_skeleton.py:
--------------------------------------------------------------------------------
1 | from . import math3d
2 | from . import bvh_helper
3 |
4 | import numpy as np
5 |
6 |
7 | class H36mSkeleton(object):
8 |
9 | def __init__(self):
10 | self.root = 'Hip'
11 | self.keypoint2index = {
12 | 'Hip': 0,
13 | 'RightHip': 1,
14 | 'RightKnee': 2,
15 | 'RightAnkle': 3,
16 | 'LeftHip': 4,
17 | 'LeftKnee': 5,
18 | 'LeftAnkle': 6,
19 | 'Spine': 7,
20 | 'Thorax': 8,
21 | 'Neck': 9,
22 | 'HeadEndSite': 10,
23 | 'LeftShoulder': 11,
24 | 'LeftElbow': 12,
25 | 'LeftWrist': 13,
26 | 'RightShoulder': 14,
27 | 'RightElbow': 15,
28 | 'RightWrist': 16,
29 | 'RightAnkleEndSite': -1,
30 | 'LeftAnkleEndSite': -1,
31 | 'LeftWristEndSite': -1,
32 | 'RightWristEndSite': -1
33 | }
34 | self.index2keypoint = {v: k for k, v in self.keypoint2index.items()}
35 | self.keypoint_num = len(self.keypoint2index)
36 |
37 | self.children = {
38 | 'Hip': ['RightHip', 'LeftHip', 'Spine'],
39 | 'RightHip': ['RightKnee'],
40 | 'RightKnee': ['RightAnkle'],
41 | 'RightAnkle': ['RightAnkleEndSite'],
42 | 'RightAnkleEndSite': [],
43 | 'LeftHip': ['LeftKnee'],
44 | 'LeftKnee': ['LeftAnkle'],
45 | 'LeftAnkle': ['LeftAnkleEndSite'],
46 | 'LeftAnkleEndSite': [],
47 | 'Spine': ['Thorax'],
48 | 'Thorax': ['Neck', 'LeftShoulder', 'RightShoulder'],
49 | 'Neck': ['HeadEndSite'],
50 | 'HeadEndSite': [], # Head is an end site
51 | 'LeftShoulder': ['LeftElbow'],
52 | 'LeftElbow': ['LeftWrist'],
53 | 'LeftWrist': ['LeftWristEndSite'],
54 | 'LeftWristEndSite': [],
55 | 'RightShoulder': ['RightElbow'],
56 | 'RightElbow': ['RightWrist'],
57 | 'RightWrist': ['RightWristEndSite'],
58 | 'RightWristEndSite': []
59 | }
60 | self.parent = {self.root: None}
61 | for parent, children in self.children.items():
62 | for child in children:
63 | self.parent[child] = parent
64 |
65 | self.left_joints = [
66 | joint for joint in self.keypoint2index
67 | if 'Left' in joint
68 | ]
69 | self.right_joints = [
70 | joint for joint in self.keypoint2index
71 | if 'Right' in joint
72 | ]
73 |
74 | # T-pose
75 | self.initial_directions = {
76 | 'Hip': [0, 0, 0],
77 | 'RightHip': [-1, 0, 0],
78 | 'RightKnee': [0, 0, -1],
79 | 'RightAnkle': [0, 0, -1],
80 | 'RightAnkleEndSite': [0, -1, 0],
81 | 'LeftHip': [1, 0, 0],
82 | 'LeftKnee': [0, 0, -1],
83 | 'LeftAnkle': [0, 0, -1],
84 | 'LeftAnkleEndSite': [0, -1, 0],
85 | 'Spine': [0, 0, 1],
86 | 'Thorax': [0, 0, 1],
87 | 'Neck': [0, 0, 1],
88 | 'HeadEndSite': [0, 0, 1],
89 | 'LeftShoulder': [1, 0, 0],
90 | 'LeftElbow': [1, 0, 0],
91 | 'LeftWrist': [1, 0, 0],
92 | 'LeftWristEndSite': [1, 0, 0],
93 | 'RightShoulder': [-1, 0, 0],
94 | 'RightElbow': [-1, 0, 0],
95 | 'RightWrist': [-1, 0, 0],
96 | 'RightWristEndSite': [-1, 0, 0]
97 | }
98 |
99 |
100 | def get_initial_offset(self, poses_3d):
101 | # TODO: RANSAC
102 | bone_lens = {self.root: [0]}
103 | stack = [self.root]
104 | while stack:
105 | parent = stack.pop()
106 | p_idx = self.keypoint2index[parent]
107 | for child in self.children[parent]:
108 | if 'EndSite' in child:
109 | bone_lens[child] = 0.4 * bone_lens[parent]
110 | continue
111 | stack.append(child)
112 |
113 | c_idx = self.keypoint2index[child]
114 | bone_lens[child] = np.linalg.norm(
115 | poses_3d[:, p_idx] - poses_3d[:, c_idx],
116 | axis=1
117 | )
118 |
119 | bone_len = {}
120 | for joint in self.keypoint2index:
121 | if 'Left' in joint or 'Right' in joint:
122 | base_name = joint.replace('Left', '').replace('Right', '')
123 | left_len = np.mean(bone_lens['Left' + base_name])
124 | right_len = np.mean(bone_lens['Right' + base_name])
125 | bone_len[joint] = (left_len + right_len) / 2
126 | else:
127 | bone_len[joint] = np.mean(bone_lens[joint])
128 |
129 | initial_offset = {}
130 | for joint, direction in self.initial_directions.items():
131 | direction = np.array(direction) / max(np.linalg.norm(direction), 1e-12)
132 | initial_offset[joint] = direction * bone_len[joint]
133 |
134 | return initial_offset
135 |
136 |
137 | def get_bvh_header(self, poses_3d):
138 | initial_offset = self.get_initial_offset(poses_3d)
139 |
140 | nodes = {}
141 | for joint in self.keypoint2index:
142 | is_root = joint == self.root
143 | is_end_site = 'EndSite' in joint
144 | nodes[joint] = bvh_helper.BvhNode(
145 | name=joint,
146 | offset=initial_offset[joint],
147 | rotation_order='zxy' if not is_end_site else '',
148 | is_root=is_root,
149 | is_end_site=is_end_site,
150 | )
151 | for joint, children in self.children.items():
152 | nodes[joint].children = [nodes[child] for child in children]
153 | for child in children:
154 | nodes[child].parent = nodes[joint]
155 |
156 | header = bvh_helper.BvhHeader(root=nodes[self.root], nodes=nodes)
157 | return header
158 |
159 |
160 | def pose2euler(self, pose, header):
161 | channel = []
162 | quats = {}
163 | eulers = {}
164 | stack = [header.root]
165 | while stack:
166 | node = stack.pop()
167 | joint = node.name
168 | joint_idx = self.keypoint2index[joint]
169 |
170 | if node.is_root:
171 | channel.extend(pose[joint_idx])
172 |
173 | index = self.keypoint2index
174 | order = None
175 | if joint == 'Hip':
176 | x_dir = pose[index['LeftHip']] - pose[index['RightHip']]
177 | y_dir = None
178 | z_dir = pose[index['Spine']] - pose[joint_idx]
179 | order = 'zyx'
180 | elif joint in ['RightHip', 'RightKnee']:
181 | child_idx = self.keypoint2index[node.children[0].name]
182 | x_dir = pose[index['Hip']] - pose[index['RightHip']]
183 | y_dir = None
184 | z_dir = pose[joint_idx] - pose[child_idx]
185 | order = 'zyx'
186 | elif joint in ['LeftHip', 'LeftKnee']:
187 | child_idx = self.keypoint2index[node.children[0].name]
188 | x_dir = pose[index['LeftHip']] - pose[index['Hip']]
189 | y_dir = None
190 | z_dir = pose[joint_idx] - pose[child_idx]
191 | order = 'zyx'
192 | elif joint == 'Spine':
193 | x_dir = pose[index['LeftHip']] - pose[index['RightHip']]
194 | y_dir = None
195 | z_dir = pose[index['Thorax']] - pose[joint_idx]
196 | order = 'zyx'
197 | elif joint == 'Thorax':
198 | x_dir = pose[index['LeftShoulder']] - \
199 | pose[index['RightShoulder']]
200 | y_dir = None
201 | z_dir = pose[joint_idx] - pose[index['Spine']]
202 | order = 'zyx'
203 | elif joint == 'Neck':
204 | x_dir = None
205 | y_dir = pose[index['Thorax']] - pose[joint_idx]
206 | z_dir = pose[index['HeadEndSite']] - pose[index['Thorax']]
207 | order = 'zxy'
208 | elif joint == 'LeftShoulder':
209 | x_dir = pose[index['LeftElbow']] - pose[joint_idx]
210 | y_dir = pose[index['LeftElbow']] - pose[index['LeftWrist']]
211 | z_dir = None
212 | order = 'xzy'
213 | elif joint == 'LeftElbow':
214 | x_dir = pose[index['LeftWrist']] - pose[joint_idx]
215 | y_dir = pose[joint_idx] - pose[index['LeftShoulder']]
216 | z_dir = None
217 | order = 'xzy'
218 | elif joint == 'RightShoulder':
219 | x_dir = pose[joint_idx] - pose[index['RightElbow']]
220 | y_dir = pose[index['RightElbow']] - pose[index['RightWrist']]
221 | z_dir = None
222 | order = 'xzy'
223 | elif joint == 'RightElbow':
224 | x_dir = pose[joint_idx] - pose[index['RightWrist']]
225 | y_dir = pose[joint_idx] - pose[index['RightShoulder']]
226 | z_dir = None
227 | order = 'xzy'
228 | if order:
229 | dcm = math3d.dcm_from_axis(x_dir, y_dir, z_dir, order)
230 | quats[joint] = math3d.dcm2quat(dcm)
231 | else:
232 | quats[joint] = quats[self.parent[joint]].copy()
233 |
234 | local_quat = quats[joint].copy()
235 | if node.parent:
236 | local_quat = math3d.quat_divide(
237 | q=quats[joint], r=quats[node.parent.name]
238 | )
239 |
240 | euler = math3d.quat2euler(
241 | q=local_quat, order=node.rotation_order
242 | )
243 | euler = np.rad2deg(euler)
244 | eulers[joint] = euler
245 | channel.extend(euler)
246 |
247 | for child in node.children[::-1]:
248 | if not child.is_end_site:
249 | stack.append(child)
250 |
251 | return channel
252 |
253 |
254 | def poses2bvh(self, poses_3d, header=None, output_file=None):
255 | if not header:
256 | header = self.get_bvh_header(poses_3d)
257 |
258 | channels = []
259 | for frame, pose in enumerate(poses_3d):
260 | channels.append(self.pose2euler(pose, header))
261 |
262 | if output_file:
263 | bvh_helper.write_bvh(output_file, header, channels)
264 |
265 | return channels, header
--------------------------------------------------------------------------------
/bvh_skeleton/math3d.py:
--------------------------------------------------------------------------------
1 | """
2 | ! left handed coordinate, z-up, y-forward
3 | ! left to right rotation matrix multiply: v'=vR
4 | ! non-standard quaternion multiply
5 | """
6 |
7 | import numpy as np
8 |
9 |
10 | def normalize(x):
11 | return x / max(np.linalg.norm(x), 1e-12)
12 |
13 |
14 | def dcm_from_axis(x_dir, y_dir, z_dir, order):
15 | assert order in ['yzx', 'yxz', 'xyz', 'xzy', 'zxy', 'zyx']
16 |
17 | axis = {'x': x_dir, 'y': y_dir, 'z': z_dir}
18 | name = ['x', 'y', 'z']
19 | idx0 = name.index(order[0])
20 | idx1 = name.index(order[1])
21 | idx2 = name.index(order[2])
22 |
23 | axis[order[0]] = normalize(axis[order[0]])
24 | axis[order[1]] = normalize(np.cross(
25 | axis[name[(idx1 + 1) % 3]], axis[name[(idx1 + 2) % 3]]
26 | ))
27 | axis[order[2]] = normalize(np.cross(
28 | axis[name[(idx2 + 1) % 3]], axis[name[(idx2 + 2) % 3]]
29 | ))
30 |
31 | dcm = np.asarray([axis['x'], axis['y'], axis['z']])
32 |
33 | return dcm
34 |
35 |
36 | def dcm2quat(dcm):
37 | q = np.zeros([4])
38 | tr = np.trace(dcm)
39 |
40 | if tr > 0:
41 | sqtrp1 = np.sqrt(tr + 1.0)
42 | q[0] = 0.5 * sqtrp1
43 | q[1] = (dcm[1, 2] - dcm[2, 1]) / (2.0 * sqtrp1)
44 | q[2] = (dcm[2, 0] - dcm[0, 2]) / (2.0 * sqtrp1)
45 | q[3] = (dcm[0, 1] - dcm[1, 0]) / (2.0 * sqtrp1)
46 | else:
47 | d = np.diag(dcm)
48 | if d[1] > d[0] and d[1] > d[2]:
49 | sqdip1 = np.sqrt(d[1] - d[0] - d[2] + 1.0)
50 | q[2] = 0.5 * sqdip1
51 |
52 | if sqdip1 != 0:
53 | sqdip1 = 0.5 / sqdip1
54 |
55 | q[0] = (dcm[2, 0] - dcm[0, 2]) * sqdip1
56 | q[1] = (dcm[0, 1] + dcm[1, 0]) * sqdip1
57 | q[3] = (dcm[1, 2] + dcm[2, 1]) * sqdip1
58 |
59 | elif d[2] > d[0]:
60 | sqdip1 = np.sqrt(d[2] - d[0] - d[1] + 1.0)
61 | q[3] = 0.5 * sqdip1
62 |
63 | if sqdip1 != 0:
64 | sqdip1 = 0.5 / sqdip1
65 |
66 | q[0] = (dcm[0, 1] - dcm[1, 0]) * sqdip1
67 | q[1] = (dcm[2, 0] + dcm[0, 2]) * sqdip1
68 | q[2] = (dcm[1, 2] + dcm[2, 1]) * sqdip1
69 |
70 | else:
71 | sqdip1 = np.sqrt(d[0] - d[1] - d[2] + 1.0)
72 | q[1] = 0.5 * sqdip1
73 |
74 | if sqdip1 != 0:
75 | sqdip1 = 0.5 / sqdip1
76 |
77 | q[0] = (dcm[1, 2] - dcm[2, 1]) * sqdip1
78 | q[2] = (dcm[0, 1] + dcm[1, 0]) * sqdip1
79 | q[3] = (dcm[2, 0] + dcm[0, 2]) * sqdip1
80 |
81 | return q
82 |
83 |
84 | def quat_dot(q0, q1):
85 | original_shape = q0.shape
86 | q0 = np.reshape(q0, [-1, 4])
87 | q1 = np.reshape(q1, [-1, 4])
88 |
89 | w0, x0, y0, z0 = q0[:, 0], q0[:, 1], q0[:, 2], q0[:, 3]
90 | w1, x1, y1, z1 = q1[:, 0], q1[:, 1], q1[:, 2], q1[:, 3]
91 |     q_product = w0 * w1 + x0 * x1 + y0 * y1 + z0 * z1
92 | q_product = np.expand_dims(q_product, axis=1)
93 | q_product = np.tile(q_product, [1, 4])
94 |
95 | return np.reshape(q_product, original_shape)
96 |
97 |
98 | def quat_inverse(q):
99 | original_shape = q.shape
100 | q = np.reshape(q, [-1, 4])
101 |
102 | q_conj = [q[:, 0], -q[:, 1], -q[:, 2], -q[:, 3]]
103 | q_conj = np.stack(q_conj, axis=1)
104 | q_inv = np.divide(q_conj, quat_dot(q_conj, q_conj))
105 |
106 | return np.reshape(q_inv, original_shape)
107 |
108 |
109 | def quat_mul(q0, q1):
110 | original_shape = q0.shape
111 | q1 = np.reshape(q1, [-1, 4, 1])
112 | q0 = np.reshape(q0, [-1, 1, 4])
113 | terms = np.matmul(q1, q0)
114 | w = terms[:, 0, 0] - terms[:, 1, 1] - terms[:, 2, 2] - terms[:, 3, 3]
115 | x = terms[:, 0, 1] + terms[:, 1, 0] - terms[:, 2, 3] + terms[:, 3, 2]
116 | y = terms[:, 0, 2] + terms[:, 1, 3] + terms[:, 2, 0] - terms[:, 3, 1]
117 | z = terms[:, 0, 3] - terms[:, 1, 2] + terms[:, 2, 1] + terms[:, 3, 0]
118 |
119 | q_product = np.stack([w, x, y, z], axis=1)
120 | return np.reshape(q_product, original_shape)
121 |
122 |
123 | def quat_divide(q, r):
124 | return quat_mul(quat_inverse(r), q)
125 |
126 |
127 | def quat2euler(q, order='zxy', eps=1e-8):
128 | original_shape = list(q.shape)
129 | original_shape[-1] = 3
130 | q = np.reshape(q, [-1, 4])
131 |
132 | q0 = q[:, 0]
133 | q1 = q[:, 1]
134 | q2 = q[:, 2]
135 | q3 = q[:, 3]
136 |
137 | if order == 'zxy':
138 | x = np.arcsin(np.clip(2 * (q0 * q1 + q2 * q3), -1 + eps, 1 - eps))
139 | y = np.arctan2(2 * (q0 * q2 - q1 * q3), 1 - 2 * (q1 * q1 + q2 * q2))
140 | z = np.arctan2(2 * (q0 * q3 - q1 * q2), 1 - 2 * (q1 * q1 + q3 * q3))
141 | euler = np.stack([z, x, y], axis=1)
142 | else:
143 | raise ValueError('Not implemented')
144 |
145 | return np.reshape(euler, original_shape)
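

# Example (a quick sanity check, not used elsewhere in the library): build a
# frame from axis directions and recover the Euler angles. An identity frame
# yields the unit quaternion and zero rotation:
#
#   dcm = dcm_from_axis(np.array([1., 0., 0.]), None, np.array([0., 0., 1.]), order='zyx')
#   quat = dcm2quat(dcm)               # -> [1, 0, 0, 0]
#   euler = quat2euler(quat, 'zxy')    # -> [0, 0, 0] (radians, in z-x-y order)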
--------------------------------------------------------------------------------
/bvh_skeleton/openpose_skeleton.py:
--------------------------------------------------------------------------------
1 | class OpenPoseSkeleton(object):
2 |
3 | def __init__(self):
4 | self.root = 'MidHip'
5 | self.keypoint2index = {
6 | 'Nose': 0,
7 | 'Neck': 1,
8 | 'RShoulder': 2,
9 | 'RElbow': 3,
10 | 'RWrist': 4,
11 | 'LShoulder': 5,
12 | 'LElbow': 6,
13 | 'LWrist': 7,
14 | 'MidHip': 8,
15 | 'RHip': 9,
16 | 'RKnee': 10,
17 | 'RAnkle': 11,
18 | 'LHip': 12,
19 | 'LKnee': 13,
20 | 'LAnkle': 14,
21 | 'REye': 15,
22 | 'LEye': 16,
23 | 'REar': 17,
24 | 'LEar': 18,
25 | 'LBigToe': 19,
26 | 'LSmallToe': 20,
27 | 'LHeel': 21,
28 | 'RBigToe': 22,
29 | 'RSmallToe': 23,
30 | 'RHeel': 24
31 | }
32 | self.index2keypoint = {v: k for k, v in self.keypoint2index.items()}
33 | self.keypoint_num = len(self.keypoint2index)
34 |
35 | self.children = {
36 | 'MidHip': ['Neck', 'RHip', 'LHip'],
37 | 'Neck': ['Nose', 'RShoulder', 'LShoulder'],
38 | 'Nose': ['REye', 'LEye'],
39 | 'REye': ['REar'],
40 | 'REar': [],
41 | 'LEye': ['LEar'],
42 | 'LEar': [],
43 | 'RShoulder': ['RElbow'],
44 | 'RElbow': ['RWrist'],
45 | 'RWrist': [],
46 | 'LShoulder': ['LElbow'],
47 | 'LElbow': ['LWrist'],
48 | 'LWrist': [],
49 | 'RHip': ['RKnee'],
50 | 'RKnee': ['RAnkle'],
51 | 'RAnkle': ['RBigToe', 'RSmallToe', 'RHeel'],
52 | 'RBigToe': [],
53 | 'RSmallToe': [],
54 | 'RHeel': [],
55 | 'LHip': ['LKnee'],
56 | 'LKnee': ['LAnkle'],
57 | 'LAnkle': ['LBigToe', 'LSmallToe', 'LHeel'],
58 | 'LBigToe': [],
59 | 'LSmallToe': [],
60 | 'LHeel': [],
61 | }
62 | self.parent = {self.root: None}
63 | for parent, children in self.children.items():
64 | for child in children:
65 | self.parent[child] = parent
--------------------------------------------------------------------------------
/cameras.h5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/cameras.h5
--------------------------------------------------------------------------------
/demo.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": null,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "from pose_estimator_2d import openpose_estimator\n",
10 | "from pose_estimator_3d import estimator_3d\n",
11 | "from utils import smooth, vis, camera\n",
12 | "from bvh_skeleton import openpose_skeleton, h36m_skeleton, cmu_skeleton\n",
13 | "\n",
14 | "import cv2\n",
15 | "import importlib\n",
16 | "import numpy as np\n",
17 | "import os\n",
18 | "from pathlib import Path\n",
19 | "from IPython.display import HTML"
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "metadata": {},
25 | "source": [
26 | "## Initialize 2d pose estimator"
27 | ]
28 | },
29 | {
30 | "cell_type": "code",
31 | "execution_count": null,
32 | "metadata": {},
33 | "outputs": [],
34 | "source": [
35 | "# more 2d pose estimators like HRNet, PoseResNet, CPN, etc., will be added later\n",
36 | "e2d = openpose_estimator.OpenPoseEstimator(model_folder='/openpose/models/') # set model_folder to /path/to/openpose/models"
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
43 | "## Estimate 2D pose from video"
44 | ]
45 | },
46 | {
47 | "cell_type": "code",
48 | "execution_count": null,
49 | "metadata": {},
50 | "outputs": [],
51 | "source": [
52 | "video_file = Path('miscs/cxk.mp4') # video file to process\n",
53 | "output_dir = Path(f'miscs/{video_file.stem}_cache')\n",
54 | "if not output_dir.exists():\n",
55 | " os.makedirs(output_dir)\n",
56 | " \n",
57 | "cap = cv2.VideoCapture(str(video_file))\n",
58 | "keypoints_list = []\n",
59 | "img_width, img_height = None, None\n",
60 | "while True:\n",
61 | " ret, frame = cap.read()\n",
62 | " if not ret:\n",
63 | " break\n",
64 | " img_height = frame.shape[0]\n",
65 | " img_width = frame.shape[1]\n",
66 | " \n",
67 | " # returned shape will be (num_of_human, 25, 3)\n",
68 | " # last dimension includes (x, y, confidence)\n",
69 | " keypoints = e2d.estimate(img_list=[frame])[0]\n",
70 | " if not isinstance(keypoints, np.ndarray) or len(keypoints.shape) != 3:\n",
71 | " # failed to detect human\n",
72 | " keypoints_list.append(None)\n",
73 | " else:\n",
74 | " # we assume that the image only contains 1 person\n",
75 |     "        # multi-person video needs some extra processing like grouping\n",
76 |     "        # maybe we will implement it in the future\n",
77 | " keypoints_list.append(keypoints[0])\n",
78 | "cap.release()"
79 | ]
80 | },
81 | {
82 | "cell_type": "markdown",
83 | "metadata": {},
84 | "source": [
85 | "## Process 2D pose"
86 | ]
87 | },
88 | {
89 | "cell_type": "code",
90 | "execution_count": null,
91 | "metadata": {},
92 | "outputs": [],
93 | "source": [
94 | "# filter out failed result\n",
95 | "keypoints_list = smooth.filter_missing_value(\n",
96 | " keypoints_list=keypoints_list,\n",
97 | " method='ignore' # interpolation method will be implemented later\n",
98 | ")\n",
99 | "\n",
100 | "# smooth process will be implemented later\n",
101 | "\n",
102 | "# save 2d pose result\n",
103 | "pose2d = np.stack(keypoints_list)[:, :, :2]\n",
104 | "pose2d_file = Path(output_dir / '2d_pose.npy')\n",
105 | "np.save(pose2d_file, pose2d)"
106 | ]
107 | },
108 | {
109 | "cell_type": "markdown",
110 | "metadata": {},
111 | "source": [
112 | "## Visualize 2D pose"
113 | ]
114 | },
115 | {
116 | "cell_type": "code",
117 | "execution_count": null,
118 | "metadata": {},
119 | "outputs": [],
120 | "source": [
121 | "cap = cv2.VideoCapture(str(video_file))\n",
122 | "vis_result_dir = output_dir / '2d_pose_vis' # path to save the visualized images\n",
123 | "if not vis_result_dir.exists():\n",
124 | " os.makedirs(vis_result_dir)\n",
125 | " \n",
126 | "op_skel = openpose_skeleton.OpenPoseSkeleton()\n",
127 | "\n",
128 | "for i, keypoints in enumerate(keypoints_list):\n",
129 | " ret, frame = cap.read()\n",
130 | " if not ret:\n",
131 | " break\n",
132 | " \n",
133 | " # keypoint whose detect confidence under kp_thresh will not be visualized\n",
134 | " vis.vis_2d_keypoints(\n",
135 | " keypoints=keypoints,\n",
136 | " img=frame,\n",
137 | " skeleton=op_skel,\n",
138 | " kp_thresh=0.4,\n",
139 | " output_file=vis_result_dir / f'{i:04d}.png'\n",
140 | " )\n",
141 | "cap.release()"
142 | ]
143 | },
144 | {
145 | "cell_type": "markdown",
146 | "metadata": {},
147 | "source": [
148 | "## Initialize 3D pose estimator"
149 | ]
150 | },
151 | {
152 | "cell_type": "code",
153 | "execution_count": null,
154 | "metadata": {},
155 | "outputs": [],
156 | "source": [
157 | "importlib.reload(estimator_3d)\n",
158 | "e3d = estimator_3d.Estimator3D(\n",
159 | " config_file='models/openpose_video_pose_243f/video_pose.yaml',\n",
160 | " checkpoint_file='models/openpose_video_pose_243f/best_58.58.pth'\n",
161 | ")"
162 | ]
163 | },
164 | {
165 | "cell_type": "markdown",
166 | "metadata": {},
167 | "source": [
168 | "## Estimate 3D pose from 2D pose"
169 | ]
170 | },
171 | {
172 | "cell_type": "code",
173 | "execution_count": null,
174 | "metadata": {},
175 | "outputs": [],
176 | "source": [
177 | "pose2d = np.load(pose2d_file)\n",
178 | "pose3d = e3d.estimate(pose2d, image_width=img_width, image_height=img_height)"
179 | ]
180 | },
181 | {
182 | "cell_type": "markdown",
183 | "metadata": {},
184 | "source": [
185 | "## Convert 3D pose from camera coordinates to world coordinates"
186 | ]
187 | },
188 | {
189 | "cell_type": "code",
190 | "execution_count": null,
191 | "metadata": {},
192 | "outputs": [],
193 | "source": [
194 | "subject = 'S1'\n",
195 | "cam_id = '55011271'\n",
196 | "cam_params = camera.load_camera_params('cameras.h5')[subject][cam_id]\n",
197 | "R = cam_params['R']\n",
198 | "T = 0\n",
199 | "azimuth = cam_params['azimuth']\n",
200 | "\n",
201 | "pose3d_world = camera.camera2world(pose=pose3d, R=R, T=T)\n",
202 | "pose3d_world[:, :, 2] -= np.min(pose3d_world[:, :, 2]) # rebase the height\n",
203 | "\n",
204 | "pose3d_file = output_dir / '3d_pose.npy'\n",
205 | "np.save(pose3d_file, pose3d_world)"
206 | ]
207 | },
208 | {
209 | "cell_type": "markdown",
210 | "metadata": {},
211 | "source": [
212 | "## Visualize 3D pose"
213 | ]
214 | },
215 | {
216 | "cell_type": "code",
217 | "execution_count": null,
218 | "metadata": {},
219 | "outputs": [],
220 | "source": [
221 | "h36m_skel = h36m_skeleton.H36mSkeleton()\n",
222 | "gif_file = output_dir / '3d_pose_300.gif' # output format can be .gif or .mp4 \n",
223 | "\n",
224 | "ani = vis.vis_3d_keypoints_sequence(\n",
225 | " keypoints_sequence=pose3d_world[0:300],\n",
226 | " skeleton=h36m_skel,\n",
227 | " azimuth=azimuth,\n",
228 | " fps=60,\n",
229 | " output_file=gif_file\n",
230 | ")\n",
231 | "HTML(ani.to_jshtml())"
232 | ]
233 | },
234 | {
235 | "cell_type": "markdown",
236 | "metadata": {},
237 | "source": [
238 | "## Convert 3D pose to BVH"
239 | ]
240 | },
241 | {
242 | "cell_type": "code",
243 | "execution_count": null,
244 | "metadata": {},
245 | "outputs": [],
246 | "source": [
247 | "bvh_file = output_dir / f'{video_file.stem}.bvh'\n",
248 | "cmu_skel = cmu_skeleton.CMUSkeleton()\n",
249 | "channels, header = cmu_skel.poses2bvh(pose3d_world, output_file=bvh_file)"
250 | ]
251 | },
252 | {
253 | "cell_type": "code",
254 | "execution_count": null,
255 | "metadata": {},
256 | "outputs": [],
257 | "source": [
258 | "output = 'miscs/h36m_cxk.bvh'\n",
259 | "h36m_skel = h36m_skeleton.H36mSkeleton()\n",
260 | "_ = h36m_skel.poses2bvh(pose3d_world, output_file=output)"
261 | ]
262 | }
263 | ],
264 | "metadata": {
265 | "kernelspec": {
266 | "display_name": "Python 3",
267 | "language": "python",
268 | "name": "python3"
269 | },
270 | "language_info": {
271 | "codemirror_mode": {
272 | "name": "ipython",
273 | "version": 3
274 | },
275 | "file_extension": ".py",
276 | "mimetype": "text/x-python",
277 | "name": "python",
278 | "nbconvert_exporter": "python",
279 | "pygments_lexer": "ipython3",
280 | "version": "3.6.8"
281 | }
282 | },
283 | "nbformat": 4,
284 | "nbformat_minor": 2
285 | }
286 |
--------------------------------------------------------------------------------
/miscs/cxk.mp4:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/cxk.mp4
--------------------------------------------------------------------------------
/miscs/cxk_cache/2d_pose.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/cxk_cache/2d_pose.npy
--------------------------------------------------------------------------------
/miscs/cxk_cache/3d_pose.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/cxk_cache/3d_pose.npy
--------------------------------------------------------------------------------
/miscs/demo/cxk_2d_pose.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/demo/cxk_2d_pose.gif
--------------------------------------------------------------------------------
/miscs/demo/cxk_3d_pose.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/demo/cxk_3d_pose.gif
--------------------------------------------------------------------------------
/miscs/demo/cxk_bvh.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/demo/cxk_bvh.gif
--------------------------------------------------------------------------------
/miscs/demo/cxk_retargeting.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/demo/cxk_retargeting.gif
--------------------------------------------------------------------------------
/miscs/demo/demo.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/demo/demo.gif
--------------------------------------------------------------------------------
/miscs/girl_model/textures/brown_eye.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/girl_model/textures/brown_eye.png
--------------------------------------------------------------------------------
/miscs/girl_model/textures/female_casualsuit02_ao.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/girl_model/textures/female_casualsuit02_ao.png
--------------------------------------------------------------------------------
/miscs/girl_model/textures/female_casualsuit02_diffuse.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/girl_model/textures/female_casualsuit02_diffuse.png
--------------------------------------------------------------------------------
/miscs/girl_model/textures/female_casualsuit02_normal.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/girl_model/textures/female_casualsuit02_normal.png
--------------------------------------------------------------------------------
/miscs/girl_model/textures/middleage_lightskinned_female_diffuse2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/girl_model/textures/middleage_lightskinned_female_diffuse2.png
--------------------------------------------------------------------------------
/miscs/girl_model/textures/ponytail01_diffuse.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/girl_model/textures/ponytail01_diffuse.png
--------------------------------------------------------------------------------
/miscs/model_link.txt:
--------------------------------------------------------------------------------
1 | Baidu disk:
2 | https://pan.baidu.com/s/1-SRaS5FwC30-Pf_gL8bbXQ (code: fmpz)
3 |
4 | Google drive:
5 | https://drive.google.com/drive/folders/1M2s32xQkrDhDLz-VqzvocMuoaSGR1MfX?usp=sharin
6 |
--------------------------------------------------------------------------------
/pose_estimator_2d/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/pose_estimator_2d/__init__.py
--------------------------------------------------------------------------------
/pose_estimator_2d/estimator_2d.py:
--------------------------------------------------------------------------------
1 | import abc
2 |
3 | class Estimator2D(abc.ABC):
4 | """Base class of 2D human pose estimator."""
5 |
6 | def __init__(self):
7 | pass
8 |
9 |     @abc.abstractmethod
10 | def estimate(self, img_list, bbox_list=None):
11 | """
12 | Args:
13 |             img_list: List of images read by OpenCV (channel order BGR).
14 | bbox_list: List of bounding-box (left_top x, left_top y,
15 | bbox_width, bbox_height).
16 | Return:
17 | keypoints_list: List of keypoint position (joint_num, x, y,
18 | confidence)
19 | """
20 | pass
--------------------------------------------------------------------------------
/pose_estimator_2d/openpose_estimator.py:
--------------------------------------------------------------------------------
1 | from .estimator_2d import Estimator2D
2 | from openpose import pyopenpose as op
3 |
4 |
5 | class OpenPoseEstimator(Estimator2D):
6 |
7 | def __init__(self, model_folder):
8 | """
9 | OpenPose 2D pose estimator. See [https://github.com/
10 | CMU-Perceptual-Computing-Lab/openpose/tree/ master/examples/
11 | tutorial_api_python] for help.
12 | Args:
13 | """
14 | super().__init__()
15 | params = {'model_folder': model_folder, 'render_pose': 0}
16 | self.opWrapper = op.WrapperPython()
17 | self.opWrapper.configure(params)
18 | self.opWrapper.start()
19 |
20 | def estimate(self, img_list, bbox_list=None):
21 | """See base class."""
22 | keypoints_list = []
23 | for i, img in enumerate(img_list):
24 | if bbox_list:
25 | x, y, w, h = bbox_list[i]
26 | img = img[y:y+h, x:x+w]
27 | datum = op.Datum()
28 | datum.cvInputData = img
29 | self.opWrapper.emplaceAndPop([datum])
30 | keypoints = datum.poseKeypoints
31 | if bbox_list:
32 | # TODO: restore coordinate
33 | pass
34 | keypoints_list.append(datum.poseKeypoints)
35 |
36 | return keypoints_list
--------------------------------------------------------------------------------
/pose_estimator_3d/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/pose_estimator_3d/__init__.py
--------------------------------------------------------------------------------
/pose_estimator_3d/dataset/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/pose_estimator_3d/dataset/__init__.py
--------------------------------------------------------------------------------
/pose_estimator_3d/dataset/wild_pose_dataset.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import torch
3 |
4 |
5 | class WildPoseDataset(object):
6 |
7 | def __init__(self, input_poses, seq_len, image_width, image_height):
8 | self.seq_len = seq_len
9 | self.input_poses = normalize_screen_coordiantes(input_poses, image_width, image_height)
10 |
11 |
12 | def __len__(self):
13 | return self.input_poses.shape[0]
14 |
15 |
16 | def __getitem__(self, idx):
17 | frame = idx
18 | start = frame - self.seq_len//2
19 | end = frame + self.seq_len//2 + 1
20 |
21 | valid_start = max(0, start)
22 | valid_end = min(self.input_poses.shape[0], end)
23 | pad = (valid_start - start, end - valid_end)
24 | input_pose = self.input_poses[valid_start:valid_end]
25 | if pad != (0, 0):
26 | input_pose = np.pad(input_pose, (pad, (0, 0), (0, 0)), 'edge')
27 | if input_pose.shape[0] == 1:
28 | # squeeze time dimension if sequence length is 1
29 | input_pose = np.squeeze(input_pose, axis=0)
30 |
31 | sample = { 'input_pose': input_pose }
32 | return sample
33 |
34 |
35 | def normalize_screen_coordiantes(pose, w, h):
36 | """
37 | Args:
38 | pose: numpy array with shape (joint, 2).
39 | Return:
40 | normalized pose that [0, WIDTH] is maped to [-1, 1] while preserving the aspect ratio.
41 | """
42 | assert pose.shape[-1] == 2
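    # e.g. for a 1000x500 image: (0, 0) -> (-1, -0.5), (500, 250) -> (0, 0),
    # (1000, 500) -> (1, 0.5): x spans [-1, 1] and y keeps the 2:1 aspect ratio.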
43 | return pose/w*2 - [1, h/w]
44 |
45 |
46 | def flip_pose(pose, lefts, rights):
47 | if isinstance(pose, np.ndarray):
48 | p = pose.copy()
49 | elif isinstance(pose, torch.Tensor):
50 | p = pose.clone()
51 | else:
52 | raise TypeError(f'{type(pose)}')
53 |
54 | p[..., 0] *= -1
55 | p[..., lefts + rights, :] = p[..., rights + lefts, :]
56 | return p
57 |
--------------------------------------------------------------------------------
/pose_estimator_3d/estimator_3d.py:
--------------------------------------------------------------------------------
1 | from .model.factory import create_model
2 | from .dataset.wild_pose_dataset import WildPoseDataset
3 |
4 | import numpy as np
5 | import pprint
6 | import torch
7 | import torch.utils.data
8 | import yaml
9 | from easydict import EasyDict
10 |
11 |
12 | class Estimator3D(object):
13 | """Base class of 3D human pose estimator."""
14 |
15 | def __init__(self, config_file, checkpoint_file):
16 | with open(config_file, 'r') as f:
17 | print(f'=> Read 3D estimator config from {config_file}.')
18 | self.cfg = EasyDict(yaml.load(f, Loader=yaml.Loader))
19 | pprint.pprint(self.cfg)
20 | self.model = create_model(self.cfg, checkpoint_file)
21 | self.device = torch.device(
22 | 'cuda' if torch.cuda.is_available() else 'cpu'
23 | )
24 | print(f'=> Use device {self.device}.')
25 | self.model = self.model.to(self.device)
26 |
27 | def estimate(self, poses_2d, image_width, image_height):
28 | # pylint: disable=no-member
29 | dataset = WildPoseDataset(
30 | input_poses=poses_2d,
31 | seq_len=self.cfg.DATASET.SEQ_LEN,
32 | image_width=image_width,
33 | image_height=image_height
34 | )
35 | loader = torch.utils.data.DataLoader(
36 | dataset=dataset,
37 | batch_size=self.cfg.TRAIN.BATCH_SIZE
38 | )
39 | poses_3d = np.zeros((poses_2d.shape[0], self.cfg.DATASET.OUT_JOINT, 3))
40 | frame = 0
41 | print('=> Begin to estimate 3D poses.')
42 | with torch.no_grad():
43 | for batch in loader:
44 |                 input_pose = batch['input_pose'].float().to(self.device)
45 |
46 | output = self.model(input_pose)
47 | if self.cfg.DATASET.TEST_FLIP:
48 | input_lefts = self.cfg.DATASET.INPUT_LEFT_JOINTS
49 | input_rights = self.cfg.DATASET.INPUT_RIGHT_JOINTS
50 | output_lefts = self.cfg.DATASET.OUTPUT_LEFT_JOINTS
51 | output_rights = self.cfg.DATASET.OUTPUT_RIGHT_JOINTS
52 |
53 | flip_input_pose = input_pose.clone()
54 | flip_input_pose[..., :, 0] *= -1
55 | flip_input_pose[..., input_lefts + input_rights, :] = flip_input_pose[..., input_rights + input_lefts, :]
56 |
57 | flip_output = self.model(flip_input_pose)
58 | flip_output[..., :, 0] *= -1
59 | flip_output[..., output_lefts + output_rights, :] = flip_output[..., output_rights + output_lefts, :]
60 |
61 | output = (output + flip_output) / 2
62 | output[:, 0] = 0 # center the root joint
63 | output *= 1000 # m -> mm
64 |
65 | batch_size = output.shape[0]
66 | poses_3d[frame:frame+batch_size] = output.cpu().numpy()
67 | frame += batch_size
68 | print(f'{frame} / {poses_2d.shape[0]}')
69 |
70 | return poses_3d
--------------------------------------------------------------------------------
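A minimal usage sketch of Estimator3D, assuming a config file and checkpoint are available at the hypothetical paths below (the real config/checkpoint files are distributed separately) and that the cached 2D poses are stored as a (frames, joints, 2) array in pixel coordinates:

import numpy as np
from pose_estimator_3d.estimator_3d import Estimator3D

config_file = 'models/video_pose.yaml'        # hypothetical path
checkpoint_file = 'models/video_pose.pth'     # hypothetical path
estimator = Estimator3D(config_file, checkpoint_file)

poses_2d = np.load('miscs/cxk_cache/2d_pose.npy')    # assumed shape: (frames, joints, 2)
poses_3d = estimator.estimate(poses_2d, image_width=1920, image_height=1080)  # sizes are illustrative
print(poses_3d.shape)                                 # (frames, OUT_JOINT, 3), root-centered, in mm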
/pose_estimator_3d/model/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/pose_estimator_3d/model/__init__.py
--------------------------------------------------------------------------------
/pose_estimator_3d/model/factory.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import yaml
3 | from easydict import EasyDict
4 |
5 |
6 | def create_model(cfg, checkpoint_file):
7 | if cfg.MODEL.NAME == 'linear_model':
8 | from .linear_model import LinearModel
9 | model = LinearModel(
10 | in_joint=cfg.DATASET.IN_JOINT,
11 | in_channel=cfg.DATASET.IN_CHANNEL,
12 | out_joint=cfg.DATASET.OUT_JOINT,
13 | out_channel=cfg.DATASET.OUT_CHANNEL,
14 | block_num=cfg.MODEL.BLOCK_NUM,
15 | hidden_size=cfg.MODEL.HIDDEN_SIZE,
16 | dropout=cfg.MODEL.DROPOUT,
17 | bias=cfg.MODEL.BIAS,
18 | residual=cfg.MODEL.RESIDUAL
19 | )
20 | elif cfg.MODEL.NAME == 'video_pose':
21 | from .video_pose import VideoPose
22 | model = VideoPose(
23 | in_joint=cfg.DATASET.IN_JOINT,
24 | in_channel=cfg.DATASET.IN_CHANNEL,
25 | out_joint=cfg.DATASET.OUT_JOINT,
26 | out_channel=cfg.DATASET.OUT_CHANNEL,
27 | filter_widths=cfg.MODEL.FILTER_WIDTHS,
28 | hidden_size=cfg.MODEL.HIDDEN_SIZE,
29 | dropout=cfg.MODEL.DROPOUT,
30 | dsc=cfg.MODEL.DSC
31 | )
32 | else:
33 | raise ValueError(f'Model name {cfg.MODEL.NAME} is invalid.')
34 |
35 | print(f'=> Load checkpoint {checkpoint_file}')
36 |     pretrained_dict = torch.load(checkpoint_file, map_location='cpu')['model_state']
37 | model_dict = model.state_dict()
38 | pretrained_dict = {
39 | k: v for k, v in pretrained_dict.items()
40 | if k in model_dict
41 | }
42 | model_dict.update(pretrained_dict)
43 | model.load_state_dict(model_dict)
44 |
45 | model = model.eval()
46 |
47 | return model
48 |
--------------------------------------------------------------------------------
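create_model only dereferences a handful of config fields. A minimal sketch of an EasyDict carrying the keys read by the linear_model branch; the values are illustrative, not the project's shipped configuration:

from easydict import EasyDict
from pose_estimator_3d.model.factory import create_model

cfg = EasyDict({
    'MODEL': {
        'NAME': 'linear_model',
        'BLOCK_NUM': 2,
        'HIDDEN_SIZE': 1024,
        'DROPOUT': 0.25,
        'BIAS': True,
        'RESIDUAL': True,
    },
    'DATASET': {
        'IN_JOINT': 17, 'IN_CHANNEL': 2,     # 2D input joints
        'OUT_JOINT': 17, 'OUT_CHANNEL': 3,   # 3D output joints
    },
})
# model = create_model(cfg, 'models/linear_model.pth')  # checkpoint path is hypothetical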
/pose_estimator_3d/model/linear_model.py:
--------------------------------------------------------------------------------
1 | from .module import ResidualBlock, get_activation
2 |
3 | import torch
4 | import torch.nn as nn
5 |
6 |
7 | class LinearModel(nn.Module):
8 |
9 | def __init__(self, in_joint, in_channel, out_joint, out_channel, block_num, hidden_size,
10 | activation='relu', dropout=0.25, bias=True, residual=True):
11 | super().__init__()
12 |
13 | self.in_joint = in_joint
14 | self.out_joint = out_joint
15 | self.out_channel = out_channel
16 |
17 | self.activation = get_activation(activation)
18 | self.drop = nn.Dropout(dropout)
19 | self.expand_fc = nn.Linear(in_joint*in_channel, hidden_size, bias=bias)
20 | self.expand_bn = nn.BatchNorm1d(hidden_size)
21 | self.blocks = nn.Sequential(*[
22 | ResidualBlock(hidden_size, activation, dropout, residual, bias)
23 | for i in range(block_num)
24 | ])
25 | self.shrink_fc = nn.Linear(hidden_size, out_joint*out_channel, bias=bias)
26 |
27 |
28 | def forward(self, x):
29 | batch_size = x.shape[0]
30 | x = x.view(batch_size, -1)
31 |
32 | x = self.drop(self.activation(self.expand_bn(self.expand_fc(x))))
33 | x = self.blocks(x)
34 | x = self.shrink_fc(x)
35 |
36 | x = x.view(batch_size, self.out_joint, self.out_channel)
37 | return x
--------------------------------------------------------------------------------
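For a sense of the tensor shapes, a quick sketch that runs LinearModel on random single-frame 2D poses (the 17-joint sizes are illustrative):

import torch
from pose_estimator_3d.model.linear_model import LinearModel

model = LinearModel(
    in_joint=17, in_channel=2,     # 2D input: 17 joints x (x, y)
    out_joint=17, out_channel=3,   # 3D output: 17 joints x (x, y, z)
    block_num=2, hidden_size=1024,
).eval()

x = torch.randn(8, 17, 2)          # batch of 8 single-frame 2D poses
with torch.no_grad():
    y = model(x)
print(y.shape)                      # torch.Size([8, 17, 3])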
/pose_estimator_3d/model/module.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 |
4 |
5 | def get_activation(name):
6 | if name == 'relu':
7 | return nn.ReLU()
8 | elif name == 'leaky_relu':
9 | return nn.LeakyReLU()
10 | else:
11 | raise ValueError(f'Activation "{name}" is invalid.')
12 |
13 |
14 | class ResidualBlock(nn.Module):
15 |
16 | def __init__(self, hidden_size, activation='relu', dropout=0, residual=True, bias=False):
17 | super(ResidualBlock, self).__init__()
18 |
19 | self.fc1 = nn.Linear(hidden_size, hidden_size, bias=bias)
20 | self.fc2 = nn.Linear(hidden_size, hidden_size, bias=bias)
21 | self.bn1 = nn.BatchNorm1d(hidden_size)
22 | self.bn2 = nn.BatchNorm1d(hidden_size)
23 | self.activation = get_activation(activation)
24 | self.drop = nn.Dropout(dropout)
25 | self.residual = lambda x: x if residual else 0
26 |
27 |
28 | def forward(self, x):
29 | res = self.residual(x)
30 | x = self.drop(self.activation(self.bn1(self.fc1(x))))
31 | x = self.drop(self.activation(self.bn2(self.fc2(x))))
32 | return x + res
33 |
34 |
35 | class DepthwiseSeparableConv1d(nn.Module):
36 |
37 | def __init__(self, in_channels, out_channels, kernel_size,
38 | bias=False, dilation=1, padding=0, stride=1):
39 | super(DepthwiseSeparableConv1d, self).__init__()
40 |
41 | self.depthwise_conv = nn.Conv1d(
42 | in_channels=in_channels, out_channels=in_channels,
43 | kernel_size=kernel_size, groups=in_channels,
44 | bias=bias, stride=stride, padding=padding, dilation=dilation,)
45 | self.pointwise_conv = nn.Conv1d(
46 | in_channels=in_channels, out_channels=out_channels,
47 | kernel_size=1, groups=1,
48 | bias=bias, stride=1, padding=0, dilation=1
49 | )
50 |
51 |
52 | def forward(self, x):
53 | x = self.depthwise_conv(x)
54 | x = self.pointwise_conv(x)
55 | return x
--------------------------------------------------------------------------------
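DepthwiseSeparableConv1d replaces one k-wide dense convolution with a k-wide per-channel convolution followed by a 1x1 mixing convolution, which cuts the parameter count roughly by a factor of the kernel size when in_channels equals out_channels. A quick check with illustrative sizes:

import torch.nn as nn
from pose_estimator_3d.model.module import DepthwiseSeparableConv1d

def param_count(m):
    return sum(p.numel() for p in m.parameters())

dense = nn.Conv1d(1024, 1024, kernel_size=3, bias=False)
separable = DepthwiseSeparableConv1d(1024, 1024, kernel_size=3)

print(param_count(dense))      # 1024 * 1024 * 3 = 3,145,728
print(param_count(separable))  # 1024 * 3 + 1024 * 1024 = 1,051,648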
/pose_estimator_3d/model/video_pose.py:
--------------------------------------------------------------------------------
1 | from .module import DepthwiseSeparableConv1d
2 |
3 | import torch
4 | import torch.nn as nn
5 |
6 |
7 | class VideoPose(nn.Module):
8 |
9 | def __init__(self, in_joint, in_channel, out_joint, out_channel, filter_widths, hidden_size, dropout, dsc):
10 | super().__init__()
11 |
12 | self.train_model = None
13 | self.eval_model = TemporalModel(
14 | in_joint, in_channel, out_joint, out_channel, filter_widths, hidden_size, dropout, dsc
15 | )
16 | self.current_model = self.eval_model
17 |
18 | def forward(self, x):
19 | return self.current_model(x)
20 |
21 |
22 | class TemporalModelBase(nn.Module):
23 | """
24 | Do not instantiate this class.
25 | """
26 |
27 | def __init__(self, in_joint, in_channel, out_joint, out_channel, filter_widths, hidden_size, dropout, dsc):
28 | super().__init__()
29 |
30 | # Validate input
31 | for fw in filter_widths:
32 | assert fw % 2 != 0, 'Only odd filter widths are supported'
33 |
34 | self.in_joint = in_joint
35 | self.out_joint = out_joint
36 | self.filter_widths = filter_widths
37 | self.out_channel = out_channel
38 |
39 | self.drop = nn.Dropout(dropout)
40 | self.relu = nn.ReLU(inplace=True)
41 |
42 | self.pad = [ filter_widths[0] // 2 ]
43 | self.expand_bn = nn.BatchNorm1d(hidden_size, momentum=0.1)
44 | self.shrink = nn.Conv1d(hidden_size, out_joint * out_channel, 1)
45 |
46 |
47 | def set_bn_momentum(self, momentum):
48 | self.expand_bn.momentum = momentum
49 | for bn in self.layers_bn:
50 | bn.momentum = momentum
51 |
52 |
53 | def forward(self, x):
54 | assert len(x.shape) == 4
55 | assert x.shape[-2] == self.in_joint
56 |
57 | batch_size, seq_len, joint, channel = x.shape
58 | x = x.view(batch_size, seq_len, -1)
59 | x = x.permute(0, 2, 1) # channel first
60 |
61 | x = self._forward_blocks(x)
62 |
63 | x = x.permute(0, 2, 1) # channel last
64 | x = x.view(batch_size, self.out_joint, self.out_channel)
65 | return x
66 |
67 |
68 | class TemporalModel(TemporalModelBase):
69 | """
70 | Reference 3D pose estimation model with temporal convolutions.
71 | This implementation can be used for all use-cases.
72 | """
73 |
74 | def __init__(self, in_joint, in_channel, out_joint, out_channel, filter_widths, hidden_size, dropout, dsc):
75 | super().__init__(in_joint, in_channel, out_joint, out_channel,
76 | filter_widths, hidden_size, dropout, dsc)
77 |
78 | self.expand_conv = nn.Conv1d(in_joint*in_channel, hidden_size, filter_widths[0], bias=False)
79 |
80 | layers_conv = []
81 | layers_bn = []
82 |
83 | next_dilation = filter_widths[0]
84 | conv_class = DepthwiseSeparableConv1d if dsc else nn.Conv1d
85 | for i in range(1, len(filter_widths)):
86 | self.pad.append((filter_widths[i] - 1)*next_dilation // 2)
87 | layers_conv.append(conv_class(
88 | hidden_size, hidden_size, filter_widths[i], dilation=next_dilation, bias=False
89 | ))
90 | layers_bn.append(nn.BatchNorm1d(hidden_size, momentum=0.1))
91 | layers_conv.append(nn.Conv1d(hidden_size, hidden_size, 1, dilation=1, bias=False))
92 | layers_bn.append(nn.BatchNorm1d(hidden_size, momentum=0.1))
93 |
94 | next_dilation *= filter_widths[i]
95 |
96 | self.layers_conv = nn.ModuleList(layers_conv)
97 | self.layers_bn = nn.ModuleList(layers_bn)
98 |
99 | def _forward_blocks(self, x):
100 | x = self.drop(self.relu(self.expand_bn(self.expand_conv(x))))
101 |
102 | for i in range(len(self.pad) - 1):
103 | pad = self.pad[i+1]
104 | res = x[:, :, pad : x.shape[2] - pad]
105 |
106 | x = self.drop(self.relu(self.layers_bn[2*i](self.layers_conv[2*i](x))))
107 | x = res + self.drop(self.relu(self.layers_bn[2*i + 1](self.layers_conv[2*i + 1](x))))
108 |
109 | x = self.shrink(x)
110 | return x
111 |
112 |
113 | class TemporalModelOptimized1f(TemporalModelBase):
114 | """
115 | 3D pose estimation model optimized for single-frame batching, i.e.
116 | where batches have input length = receptive field, and output length = 1.
117 | This scenario is only used for training when stride == 1.
118 |
119 | This implementation replaces dilated convolutions with strided convolutions
120 | to avoid generating unused intermediate results. The weights are interchangeable
121 | with the reference implementation.
122 | """
123 |
124 | def __init__(self, in_joint, in_channel, out_joint, out_channel,
125 | filter_widths, hidden_size, dropout, dsc):
126 | super().__init__(in_joint, in_channel, out_joint, out_channel,
127 | filter_widths, hidden_size, dropout, dsc)
128 |
129 | self.expand_conv = nn.Conv1d(in_joint*in_channel, hidden_size, filter_widths[0],
130 | stride=filter_widths[0], bias=False)
131 |
132 | layers_conv = []
133 | layers_bn = []
134 |
135 | next_dilation = filter_widths[0]
136 | conv_class = DepthwiseSeparableConv1d if dsc else nn.Conv1d
137 | for i in range(1, len(filter_widths)):
138 | self.pad.append((filter_widths[i] - 1)*next_dilation // 2)
139 | layers_conv.append(conv_class(
140 | hidden_size, hidden_size, filter_widths[i], stride=filter_widths[i], bias=False
141 | ))
142 | layers_bn.append(nn.BatchNorm1d(hidden_size, momentum=0.1))
143 | layers_conv.append(nn.Conv1d(hidden_size, hidden_size, 1, dilation=1, bias=False))
144 | layers_bn.append(nn.BatchNorm1d(hidden_size, momentum=0.1))
145 | next_dilation *= filter_widths[i]
146 |
147 | self.layers_conv = nn.ModuleList(layers_conv)
148 | self.layers_bn = nn.ModuleList(layers_bn)
149 |
150 | def _forward_blocks(self, x):
151 | x = self.drop(self.relu(self.expand_bn(self.expand_conv(x))))
152 |
153 | for i in range(len(self.pad) - 1):
154 | res = x[:, :, self.filter_widths[i+1]//2 :: self.filter_widths[i+1]]
155 |
156 | x = self.drop(self.relu(self.layers_bn[2*i](self.layers_conv[2*i](x))))
157 | x = res + self.drop(self.relu(self.layers_bn[2*i + 1](self.layers_conv[2*i + 1](x))))
158 |
159 | x = self.shrink(x)
160 | return x
--------------------------------------------------------------------------------
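In both temporal models the dilation (or stride) of each block equals the product of the earlier filter widths, so the receptive field of the whole network is the product of filter_widths. A small helper to sanity-check how many 2D frames one 3D prediction depends on (the width lists are illustrative):

def receptive_field(filter_widths):
    """Number of input frames that influence a single output frame."""
    frames = 1
    for fw in filter_widths:
        frames *= fw
    return frames

print(receptive_field([3, 3, 3]))         # 27
print(receptive_field([3, 3, 3, 3, 3]))   # 243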
/utils/__init__.py:
--------------------------------------------------------------------------------
1 | from . import smooth, camera, vis
--------------------------------------------------------------------------------
/utils/camera.py:
--------------------------------------------------------------------------------
1 | import h5py
2 | import numpy as np
3 | from pathlib import Path
4 |
5 | def load_camera_params(file):
6 | cam_file = Path(file)
7 | cam_params = {}
8 | azimuth = {
9 | '54138969': 70, '55011271': -70, '58860488': 110, '60457274': -100
10 | }
11 |     with h5py.File(cam_file, 'r') as f:
12 | subjects = [1, 5, 6, 7, 8, 9, 11]
13 | for s in subjects:
14 | cam_params[f'S{s}'] = {}
15 | for _, params in f[f'subject{s}'].items():
16 | name = params['Name']
17 | name = ''.join([chr(c) for c in name])
18 | val = {}
19 | val['R'] = np.array(params['R'])
20 | val['T'] = np.array(params['T'])
21 | val['c'] = np.array(params['c'])
22 | val['f'] = np.array(params['f'])
23 | val['k'] = np.array(params['k'])
24 | val['p'] = np.array(params['p'])
25 | val['azimuth'] = azimuth[name]
26 | cam_params[f'S{s}'][name] = val
27 |
28 | return cam_params
29 |
30 |
31 | def world2camera(pose, R, T):
32 | """
33 | Args:
34 |         pose: numpy array with shape (..., 3)
35 |         R: numpy array with shape (3, 3)
36 |         T: numpy array with shape (3, 1)
37 | """
38 | assert pose.shape[-1] == 3
39 | original_shape = pose.shape
40 | pose_world = pose.copy().reshape((-1, 3)).T
41 | pose_cam = np.matmul(R.T, pose_world - T)
42 | pose_cam = pose_cam.T.reshape(original_shape)
43 | return pose_cam
44 |
45 |
46 | def camera2world(pose, R, T):
47 | """
48 | Args:
49 | pose: numpy array with shape (..., 3)
50 | R: numpy array with shape (3, 3)
51 |         T: numpy array with shape (3, 1)
52 | """
53 | assert pose.shape[-1] == 3
54 | original_shape = pose.shape
55 | pose_cam = pose.copy().reshape((-1, 3)).T
56 | pose_world = np.matmul(R, pose_cam) + T
57 | pose_world = pose_world.T.reshape(original_shape)
58 | return pose_world
59 |
--------------------------------------------------------------------------------
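world2camera applies the inverse of the rigid transform that camera2world applies, so composing the two should return the original points. A small round-trip check with a synthetic rotation and translation (illustrative values, not real Human3.6M camera parameters):

import numpy as np
from utils.camera import world2camera, camera2world

theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])   # rotation about the z axis
T = np.array([[100.0], [200.0], [300.0]])              # camera position, shape (3, 1)

pose_world = np.random.rand(17, 3) * 1000.0            # 17 joints in world coordinates
pose_cam = world2camera(pose_world, R, T)
assert np.allclose(camera2world(pose_cam, R, T), pose_world)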
/utils/smooth.py:
--------------------------------------------------------------------------------
1 | def filter_missing_value(keypoints_list, method='ignore'):
2 |     # TODO: implement 'interpolate' method.
3 | """Filter missing value in pose list.
4 | Args:
5 | keypoints_list: Estimate result returned by 2d estimator. Missing value
6 | will be None.
7 | method: 'ignore' -> drop missing value.
8 | Return:
9 | Keypoints list without missing value.
10 | """
11 |
12 | result = []
13 | if method == 'ignore':
14 | result = [pose for pose in keypoints_list if pose is not None]
15 | else:
16 | raise ValueError(f'{method} is not a valid method.')
17 |
18 | return result
--------------------------------------------------------------------------------
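The 'interpolate' method mentioned in the TODO is not implemented yet. A hedged sketch of one way it could look, assuming every non-missing entry is a fixed-shape array for a single person and filling gaps by linear interpolation between the nearest valid frames (an illustration, not the project's implementation):

import numpy as np

def interpolate_missing(keypoints_list):
    """Fill None entries by linearly interpolating each coordinate over the frame index."""
    valid_idx = [i for i, kp in enumerate(keypoints_list) if kp is not None]
    if not valid_idx:
        return []
    valid = np.stack([np.asarray(keypoints_list[i], dtype=float) for i in valid_idx])
    shape = valid.shape[1:]                        # e.g. (num_joints, 3)
    flat = valid.reshape(len(valid_idx), -1)       # (num_valid, num_joints * 3)

    frames = np.arange(len(keypoints_list))
    filled = np.stack(
        [np.interp(frames, valid_idx, flat[:, c]) for c in range(flat.shape[1])],
        axis=-1,
    )                                              # (num_frames, num_joints * 3)
    return [f.reshape(shape) for f in filled]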
/utils/vis.py:
--------------------------------------------------------------------------------
1 | from . import camera
2 |
3 | import cv2
4 | import numpy as np
5 | import os
6 | from pathlib import Path
7 |
8 | import matplotlib.pyplot as plt
9 | import mpl_toolkits.mplot3d.axes3d
10 | from matplotlib.animation import FuncAnimation, writers
11 |
12 |
13 | def vis_2d_keypoints(
14 | keypoints, img, skeleton, kp_thresh,
15 | alpha=0.7, output_file=None, show_name=False):
16 |
17 | # Convert from plt 0-1 RGBA colors to 0-255 BGR colors for opencv.
18 | cmap = plt.get_cmap('rainbow')
19 | colors = [cmap(i) for i in np.linspace(0, 1, skeleton.keypoint_num)]
20 | colors = [(c[2] * 255, c[1] * 255, c[0] * 255) for c in colors]
21 |
22 | mask = img.copy()
23 | root = skeleton.root
24 | stack = [root]
25 | while stack:
26 | parent = stack.pop()
27 | p_idx = skeleton.keypoint2index[parent]
28 | p_pos = int(keypoints[p_idx, 0]), int(keypoints[p_idx, 1])
29 | p_score = keypoints[p_idx, 2] if kp_thresh is not None else None
30 | if kp_thresh is None or p_score > kp_thresh:
31 | cv2.circle(
32 | mask, p_pos, radius=3,
33 | color=colors[p_idx], thickness=-1, lineType=cv2.LINE_AA)
34 | if show_name:
35 | cv2.putText(mask, parent, p_pos, cv2.FONT_HERSHEY_SIMPLEX,
36 | 0.5, (0, 255, 0))
37 | for child in skeleton.children[parent]:
38 | if child not in skeleton.keypoint2index or \
39 | skeleton.keypoint2index[child] < 0:
40 | continue
41 | stack.append(child)
42 | c_idx = skeleton.keypoint2index[child]
43 | c_pos = int(keypoints[c_idx, 0]), int(keypoints[c_idx, 1])
44 |             c_score = keypoints[c_idx, 2] if kp_thresh is not None else None
45 | if kp_thresh is None or \
46 | (p_score > kp_thresh and c_score > kp_thresh):
47 | cv2.line(
48 | mask, p_pos, c_pos,
49 | color=colors[c_idx], thickness=2, lineType=cv2.LINE_AA)
50 |
51 | vis_result = cv2.addWeighted(img, 1.0 - alpha, mask, alpha, 0)
52 | if output_file:
53 | file = Path(output_file)
54 | if not file.parent.exists():
55 | os.makedirs(file.parent)
56 | cv2.imwrite(str(output_file), vis_result)
57 |
58 | return vis_result
59 |
60 |
61 | def vis_3d_keypoints(keypoints, skeleton, azimuth, elev=15):
62 | x_max, x_min = np.max(keypoints[:, 0]), np.min(keypoints[:, 0])
63 | y_max, y_min = np.max(keypoints[:, 1]), np.min(keypoints[:, 1])
64 | z_max, z_min = np.max(keypoints[:, 2]), np.min(keypoints[:, 2])
65 | radius = max(x_max - x_min, y_max - y_min, z_max - z_min) / 2
66 |
67 | fig = plt.figure()
68 | ax = fig.add_subplot(111, projection='3d')
69 | ax.view_init(elev=elev, azim=azimuth)
70 | ax.set_xlim3d([-radius, radius])
71 | ax.set_ylim3d([-radius, radius])
72 | ax.set_zlim3d([0, 2 * radius])
73 |
74 | root = skeleton.root
75 | stack = [root]
76 | while stack:
77 | parent = stack.pop()
78 | p_idx = skeleton.keypoint2index[parent]
79 | p_pos = keypoints[p_idx]
80 | for child in skeleton.children[parent]:
81 | if skeleton.keypoint2index.get(child, -1) == -1:
82 | continue
83 | stack.append(child)
84 | c_idx = skeleton.keypoint2index[child]
85 | c_pos = keypoints[c_idx]
86 | if child in skeleton.left_joints:
87 | color = 'b'
88 | elif child in skeleton.right_joints:
89 | color = 'r'
90 | else:
91 | color = 'k'
92 | line = ax.plot(
93 | xs=[p_pos[0], c_pos[0]],
94 | ys=[p_pos[1], c_pos[1]],
95 | zs=[p_pos[2], c_pos[2]],
96 | c=color, marker='.', zdir='z'
97 | )
98 |
99 | return
100 |
101 |
102 | def vis_3d_keypoints_sequence(
103 | keypoints_sequence, skeleton, azimuth,
104 | fps=30, elev=15, output_file=None
105 | ):
106 | kps_sequence = keypoints_sequence
107 | x_max, x_min = np.max(kps_sequence[:, :, 0]), np.min(kps_sequence[:, :, 0])
108 | y_max, y_min = np.max(kps_sequence[:, :, 1]), np.min(kps_sequence[:, :, 1])
109 | z_max, z_min = np.max(kps_sequence[:, :, 2]), np.min(kps_sequence[:, :, 2])
110 | radius = max(x_max - x_min, y_max - y_min, z_max - z_min) / 2
111 |
112 | fig = plt.figure()
113 | ax = fig.add_subplot(111, projection='3d')
114 | ax.view_init(elev=elev, azim=azimuth)
115 | ax.set_xlim3d([-radius, radius])
116 | ax.set_ylim3d([-radius, radius])
117 | ax.set_zlim3d([0, 2 * radius])
118 |
119 | initialized = False
120 | lines = []
121 |
122 | def update(frame):
123 | nonlocal initialized
124 |
125 | if not initialized:
126 | root = skeleton.root
127 | stack = [root]
128 | while stack:
129 | parent = stack.pop()
130 | p_idx = skeleton.keypoint2index[parent]
131 | p_pos = kps_sequence[0, p_idx]
132 | for child in skeleton.children[parent]:
133 | if skeleton.keypoint2index.get(child, -1) == -1:
134 | continue
135 | stack.append(child)
136 | c_idx = skeleton.keypoint2index[child]
137 | c_pos = kps_sequence[0, c_idx]
138 | if child in skeleton.left_joints:
139 | color = 'b'
140 | elif child in skeleton.right_joints:
141 | color = 'r'
142 | else:
143 | color = 'k'
144 | line = ax.plot(
145 | xs=[p_pos[0], c_pos[0]],
146 | ys=[p_pos[1], c_pos[1]],
147 | zs=[p_pos[2], c_pos[2]],
148 | c=color, marker='.', zdir='z'
149 | )
150 | lines.append(line)
151 | initialized = True
152 | else:
153 | line_idx = 0
154 | root = skeleton.root
155 | stack = [root]
156 | while stack:
157 | parent = stack.pop()
158 | p_idx = skeleton.keypoint2index[parent]
159 | p_pos = kps_sequence[frame, p_idx]
160 | for child in skeleton.children[parent]:
161 | if skeleton.keypoint2index.get(child, -1) == -1:
162 | continue
163 | stack.append(child)
164 | c_idx = skeleton.keypoint2index[child]
165 | c_pos = kps_sequence[frame, c_idx]
166 | if child in skeleton.left_joints:
167 | color = 'b'
168 | elif child in skeleton.right_joints:
169 | color = 'r'
170 | else:
171 | color = 'k'
172 | lines[line_idx][0].set_xdata([p_pos[0], c_pos[0]])
173 | lines[line_idx][0].set_ydata([p_pos[1], c_pos[1]])
174 |                     lines[line_idx][0].set_3d_properties([p_pos[2], c_pos[2]])
175 | line_idx += 1
176 |
177 | anim = FuncAnimation(
178 | fig=fig, func=update, frames=kps_sequence.shape[0], interval=1000 / fps
179 | )
180 |
181 | if output_file:
182 | output_file = Path(output_file)
183 | if not output_file.parent.exists():
184 | os.makedirs(output_file.parent)
185 |         if output_file.suffix == '.mp4':
186 |             Writer = writers['ffmpeg']
187 |             writer = Writer(fps=fps, metadata={}, bitrate=3000)
188 |             anim.save(str(output_file), writer=writer)
189 |         elif output_file.suffix == '.gif':
190 |             anim.save(str(output_file), dpi=80, writer='imagemagick')
191 |         else:
192 |             raise ValueError('Unsupported output format. '
193 |                 'Only mp4 and gif are supported.')
194 |
195 | return anim
196 |
--------------------------------------------------------------------------------