├── .gitignore
├── LICENSE
├── README.md
├── bvh_skeleton
│   ├── __init__.py
│   ├── bvh_helper.py
│   ├── cmu_skeleton.py
│   ├── coco_skeleton.py
│   ├── h36m_original_skeleton.py
│   ├── h36m_skeleton.py
│   ├── math3d.py
│   └── openpose_skeleton.py
├── cameras.h5
├── demo.ipynb
├── miscs
│   ├── cxk.mp4
│   ├── cxk_cache
│   │   ├── 2d_pose.npy
│   │   ├── 3d_pose.npy
│   │   └── cxk.bvh
│   ├── demo
│   │   ├── cxk_2d_pose.gif
│   │   ├── cxk_3d_pose.gif
│   │   ├── cxk_bvh.gif
│   │   ├── cxk_retargeting.gif
│   │   └── demo.gif
│   ├── girl_model
│   │   ├── girl.mhx2
│   │   └── textures
│   │       ├── brown_eye.png
│   │       ├── female_casualsuit02_ao.png
│   │       ├── female_casualsuit02_diffuse.png
│   │       ├── female_casualsuit02_normal.png
│   │       ├── middleage_lightskinned_female_diffuse2.png
│   │       └── ponytail01_diffuse.png
│   └── model_link.txt
├── pose_estimator_2d
│   ├── __init__.py
│   ├── estimator_2d.py
│   └── openpose_estimator.py
├── pose_estimator_3d
│   ├── __init__.py
│   ├── dataset
│   │   ├── __init__.py
│   │   └── wild_pose_dataset.py
│   ├── estimator_3d.py
│   └── model
│       ├── __init__.py
│       ├── factory.py
│       ├── linear_model.py
│       ├── module.py
│       └── video_pose.py
└── utils
    ├── __init__.py
    ├── camera.py
    ├── smooth.py
    └── vis.py

/.gitignore:
--------------------------------------------------------------------------------
1 | __pycache__/
2 | .ipynb_checkpoints/
3 | .vscode/
4 | .idea/
5 | config_file/
6 | models/
7 | *.pyc
8 | *.swp
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2020 KevinLTT
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # video2bvh
2 | 
3 | video2bvh extracts human motion from video and saves it as a BVH mocap file.
4 | 
5 | ![demo](https://github.com/KevinLTT/video2bvh/raw/master/miscs/demo/demo.gif)
6 | 
7 | ## Introduction
8 | 
9 | video2bvh consists of 3 modules: pose_estimator_2d, pose_estimator_3d, and bvh_skeleton.
10 | - **pose_estimator_2d**: Since the 3D pose estimation models we use are two-stage models (image -> 2D pose -> 3D pose), this module estimates the 2D human pose (2D joint keypoint positions) from each image. We choose [OpenPose](https://github.com/CMU-Perceptual-Computing-Lab/openpose) as the 2D estimator. It detects 2D joint keypoints accurately at real-time speed.
11 | - **pose_estimator_3d**: We provide 2 models to estimate 3D human pose.
12 |   - [3d-pose-baseline](https://github.com/una-dinosauria/3d-pose-baseline): This model was proposed by Julieta Martinez, Rayat Hossain, Javier Romero, and James J. Little in ICCV 2017 [[PAPER]](https://arxiv.org/pdf/1705.03098.pdf) [[CODE]](https://github.com/una-dinosauria/3d-pose-baseline). It uses a single-frame 2D pose as input. Its original implementation is based on TensorFlow; we reimplemented it in PyTorch.
13 |   - [VideoPose3D](https://github.com/facebookresearch/VideoPose3D): This model was proposed by Dario Pavllo, Christoph Feichtenhofer, David Grangier, and Michael Auli in CVPR 2019 [[PAPER]](https://arxiv.org/abs/1811.11742) [[CODE]](https://github.com/facebookresearch/VideoPose3D). It uses a 2D pose sequence as input. We slightly modified the original implementation.
14 | - **bvh_skeleton**: This module estimates skeleton information from the 3D pose, converts the 3D pose to joint angles, and writes the motion data to a BVH file.
15 | 
16 | 
17 | ## Dependencies
18 | - [OpenPose](https://github.com/CMU-Perceptual-Computing-Lab/openpose): See the official OpenPose [installation.md](https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/doc/installation.md#python-api) for help. Note that the `BUILD_PYTHON` flag must be turned on while building.
19 | - [pytorch](https://github.com/pytorch/pytorch).
20 | - [python-opencv](https://opencv.org/).
21 | - [numpy](https://numpy.org/)
22 | 
23 | 
24 | ## Pre-trained models
25 | The original models provided by [3d-pose-baseline](https://github.com/una-dinosauria/3d-pose-baseline) and [VideoPose3D](https://github.com/facebookresearch/VideoPose3D) use the [Human3.6M](http://vision.imar.ro/human3.6m/description.php) 17-joint skeleton as the input format (see [bvh_skeleton/h36m_skeleton.py](https://github.com/KevinLTT/video2bvh/raw/master/bvh_skeleton/h36m_skeleton.py)), but OpenPose's detection results use 25 joints (see OpenPose [output.md](https://github.com/CMU-Perceptual-Computing-Lab/openpose/blob/master/doc/output.md#pose-output-format-body_25)). So, we trained these models from scratch on the [Human3.6M](http://vision.imar.ro/human3.6m/description.php) dataset using 2D poses estimated by OpenPose.
26 | 
27 | The training process is almost the same as in the original implementations. We use subjects S1, S5, S6, S7, and S8 as the training set, and S9 and S11 as the test set. For 3d-pose-baseline, the best MPJPE is 64.12 mm (Protocol #1), and for VideoPose3D the best MPJPE is 58.58 mm (Protocol #1). The pre-trained models can be downloaded from the following links.
28 | 
29 | * [Google Drive](https://drive.google.com/drive/folders/1M2s32xQkrDhDLz-VqzvocMuoaSGR1MfX?usp=sharin)
30 | * [Baidu Disk](https://pan.baidu.com/s/1-SRaS5FwC30-Pf_gL8bbXQ) (code: fmpz)
31 | 
32 | After you download the `models` folder, place or link it under the root directory of this project.
33 | 
34 | 
35 | ## Quick Start
36 | Open [demo.ipynb](https://github.com/KevinLTT/video2bvh/raw/master/demo.ipynb) in Jupyter Notebook and follow the instructions. As you will see in [demo.ipynb](https://github.com/KevinLTT/video2bvh/raw/master/demo.ipynb), video2bvh converts a video to a BVH file in 3 main steps.
37 | 
38 | ### 1. Estimate 2D pose from video
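The snippet below is a condensed sketch of this step, following the cells in demo.ipynb; the OpenPose `model_folder` and the video path are assumptions to adapt to your setup.

```python
import cv2
import numpy as np
from pose_estimator_2d import openpose_estimator

# model_folder must point to your local OpenPose models directory (assumption)
e2d = openpose_estimator.OpenPoseEstimator(model_folder='/openpose/models/')

cap = cv2.VideoCapture('miscs/cxk.mp4')
keypoints_list, img_width, img_height = [], None, None
while True:
    ret, frame = cap.read()
    if not ret:
        break
    img_height, img_width = frame.shape[:2]
    # estimate() returns one array per input image with shape
    # (num_people, 25, 3); the last dimension is (x, y, confidence)
    keypoints = e2d.estimate(img_list=[frame])[0]
    if isinstance(keypoints, np.ndarray) and keypoints.ndim == 3:
        keypoints_list.append(keypoints[0])  # assume one person per frame
    else:
        keypoints_list.append(None)          # no person detected in this frame
cap.release()
```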

40 | 41 |
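Continuing the sketch above, frames without a detection are filtered out and the (x, y) coordinates are cached for the 3D stage, as in demo.ipynb (the cache directory is the one used by the demo):

```python
import numpy as np
from pathlib import Path
from utils import smooth

# drop frames where no person was detected ('ignore' simply removes them;
# interpolation is listed as future work)
keypoints_list = smooth.filter_missing_value(
    keypoints_list=keypoints_list, method='ignore'
)

# keep only (x, y) per joint and save the sequence for the 3D estimator
pose2d = np.stack(keypoints_list)[:, :, :2]
output_dir = Path('miscs/cxk_cache')
output_dir.mkdir(parents=True, exist_ok=True)
np.save(output_dir / '2d_pose.npy', pose2d)
```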

42 | 43 | ### 2. Estimate 3D pose from 2D pose 44 |
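A minimal sketch of this step, again following demo.ipynb; the config and checkpoint paths assume the pre-trained VideoPose3D model downloaded into `models/` as described above.

```python
import numpy as np
from pose_estimator_3d import estimator_3d

e3d = estimator_3d.Estimator3D(
    config_file='models/openpose_video_pose_243f/video_pose.yaml',
    checkpoint_file='models/openpose_video_pose_243f/best_58.58.pth'
)

# pose2d: (frames, 25, 2) OpenPose keypoints from step 1;
# img_width / img_height are the video dimensions recorded there
pose2d = np.load('miscs/cxk_cache/2d_pose.npy')
pose3d = e3d.estimate(pose2d, image_width=img_width, image_height=img_height)
# pose3d: (frames, 17, 3) Human3.6M joints in millimeters, camera coordinates
```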

45 | 46 |
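The estimator outputs poses in camera coordinates, so demo.ipynb converts them to world coordinates with the Human3.6M camera parameters shipped in cameras.h5 before writing BVH; the subject and camera IDs below are the ones the notebook picks.

```python
import numpy as np
from utils import camera

cam_params = camera.load_camera_params('cameras.h5')['S1']['55011271']
pose3d_world = camera.camera2world(pose=pose3d, R=cam_params['R'], T=0)
pose3d_world[:, :, 2] -= np.min(pose3d_world[:, :, 2])  # rebase the height to the ground
np.save('miscs/cxk_cache/3d_pose.npy', pose3d_world)
```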

47 | 
48 | ### 3. Convert 3D pose to BVH motion capture file
49 | 
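A sketch of the final conversion, mirroring demo.ipynb; bvh_skeleton can write either a CMU-style hierarchy (mapping the 17 estimated joints onto CMU joint names) or the raw Human3.6M 17-joint hierarchy.

```python
from bvh_skeleton import cmu_skeleton, h36m_skeleton

# CMU-style skeleton
channels, header = cmu_skeleton.CMUSkeleton().poses2bvh(
    pose3d_world, output_file='miscs/cxk_cache/cxk.bvh'
)

# alternatively, write the Human3.6M 17-joint hierarchy directly
_ = h36m_skeleton.H36mSkeleton().poses2bvh(
    pose3d_world, output_file='miscs/h36m_cxk.bvh'
)
```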

50 | 51 |
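For reference, the file written by bvh_skeleton/bvh_helper.py follows the standard BVH layout sketched below; joint names come from the chosen skeleton class, while the offsets and frame count here are illustrative placeholders rather than actual output.

```
HIERARCHY
ROOT Hips
{
    OFFSET 0.0 0.0 0.0
    CHANNELS 6 Xposition Yposition Zposition Zrotation Xrotation Yrotation
    JOINT LeftHipJoint
    {
        OFFSET ...
        CHANNELS 3 Zrotation Xrotation Yrotation
        ...
    }
    ...
}
MOTION
Frames: 300
Frame Time: 0.03333333333333333
<root position and per-joint Euler angles, one line per frame>
```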

52 | 
53 | 
54 | ## Retargeting
55 | Once you get the BVH file, you can easily retarget the motion to other 3D character models with existing tools. The girl model we used was created with [MakeHuman](http://www.makehumancommunity.org/), and the demo was rendered with [Blender](https://www.blender.org/). The [MakeWalk](http://www.makehumancommunity.org/wiki/Documentation:MakeWalk) plugin does the retargeting work for us.
56 | 

58 | 59 |
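If you prefer to script the Blender side rather than click through MakeWalk, a rough sketch that simply loads the generated file as an animated armature looks like this; it uses Blender's bundled BVH importer, not code from this repo, and the file path is an assumption.

```python
# run inside Blender, e.g. `blender --python import_cxk_bvh.py`
import bpy

bpy.ops.import_anim.bvh(
    filepath='miscs/cxk_cache/cxk.bvh',
    rotate_mode='NATIVE',    # keep the per-joint rotation order stored in the file
    update_scene_fps=True    # match the scene FPS to the file's frame time
)
```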

60 | 61 | ## TODO 62 | - [ ] Add more 2D estimators, such as [HRNet](https://github.com/leoxiaobin/deep-high-resolution-net.pytorch) and [PoseResNet](https://github.com/microsoft/human-pose-estimation.pytorch). 63 | - [ ] Smoothing 2D pose and 3D pose. 64 | - [ ] Real-time demo. -------------------------------------------------------------------------------- /bvh_skeleton/__init__.py: -------------------------------------------------------------------------------- 1 | from . import h36m_original_skeleton 2 | from . import h36m_skeleton 3 | from . import openpose_skeleton -------------------------------------------------------------------------------- /bvh_skeleton/bvh_helper.py: -------------------------------------------------------------------------------- 1 | import os 2 | from pathlib import Path 3 | 4 | 5 | class BvhNode(object): 6 | def __init__( 7 | self, name, offset, rotation_order, 8 | children=None, parent=None, is_root=False, is_end_site=False 9 | ): 10 | if not is_end_site and \ 11 | rotation_order not in ['xyz', 'xzy', 'yxz', 'yzx', 'zxy', 'zyx']: 12 | raise ValueError(f'Rotation order invalid.') 13 | self.name = name 14 | self.offset = offset 15 | self.rotation_order = rotation_order 16 | self.children = children 17 | self.parent = parent 18 | self.is_root = is_root 19 | self.is_end_site = is_end_site 20 | 21 | 22 | class BvhHeader(object): 23 | def __init__(self, root, nodes): 24 | self.root = root 25 | self.nodes = nodes 26 | 27 | 28 | def write_header(writer, node, level): 29 | indent = ' ' * 4 * level 30 | if node.is_root: 31 | writer.write(f'{indent}ROOT {node.name}\n') 32 | channel_num = 6 33 | elif node.is_end_site: 34 | writer.write(f'{indent}End Site\n') 35 | channel_num = 0 36 | else: 37 | writer.write(f'{indent}JOINT {node.name}\n') 38 | channel_num = 3 39 | writer.write(f'{indent}{"{"}\n') 40 | 41 | indent = ' ' * 4 * (level + 1) 42 | writer.write( 43 | f'{indent}OFFSET ' 44 | f'{node.offset[0]} {node.offset[1]} {node.offset[2]}\n' 45 | ) 46 | if channel_num: 47 | channel_line = f'{indent}CHANNELS {channel_num} ' 48 | if node.is_root: 49 | channel_line += f'Xposition Yposition Zposition ' 50 | channel_line += ' '.join([ 51 | f'{axis.upper()}rotation' 52 | for axis in node.rotation_order 53 | ]) 54 | writer.write(channel_line + '\n') 55 | 56 | for child in node.children: 57 | write_header(writer, child, level + 1) 58 | 59 | indent = ' ' * 4 * level 60 | writer.write(f'{indent}{"}"}\n') 61 | 62 | 63 | def write_bvh(output_file, header, channels, frame_rate=30): 64 | output_file = Path(output_file) 65 | if not output_file.parent.exists(): 66 | os.makedirs(output_file.parent) 67 | 68 | with output_file.open('w') as f: 69 | f.write('HIERARCHY\n') 70 | write_header(writer=f, node=header.root, level=0) 71 | 72 | f.write('MOTION\n') 73 | f.write(f'Frames: {len(channels)}\n') 74 | f.write(f'Frame Time: {1 / frame_rate}\n') 75 | 76 | for channel in channels: 77 | f.write(' '.join([f'{element}' for element in channel]) + '\n') -------------------------------------------------------------------------------- /bvh_skeleton/cmu_skeleton.py: -------------------------------------------------------------------------------- 1 | from . import math3d 2 | from . 
import bvh_helper 3 | 4 | import numpy as np 5 | from pprint import pprint 6 | 7 | 8 | class CMUSkeleton(object): 9 | 10 | def __init__(self): 11 | self.root = 'Hips' 12 | self.keypoint2index = { 13 | 'Hips': 0, 14 | 'RightUpLeg': 1, 15 | 'RightLeg': 2, 16 | 'RightFoot': 3, 17 | 'LeftUpLeg': 4, 18 | 'LeftLeg': 5, 19 | 'LeftFoot': 6, 20 | 'Spine': 7, 21 | 'Spine1': 8, 22 | 'Neck1': 9, 23 | 'HeadEndSite': 10, 24 | 'LeftArm': 11, 25 | 'LeftForeArm': 12, 26 | 'LeftHand': 13, 27 | 'RightArm': 14, 28 | 'RightForeArm': 15, 29 | 'RightHand': 16, 30 | 'RightHipJoint': -1, 31 | 'RightFootEndSite': -1, 32 | 'LeftHipJoint': -1, 33 | 'LeftFootEndSite': -1, 34 | 'LeftShoulder': -1, 35 | 'LeftHandEndSite': -1, 36 | 'RightShoulder': -1, 37 | 'RightHandEndSite': -1, 38 | 'LowerBack': -1, 39 | 'Neck': -1 40 | } 41 | self.index2keypoint = {v: k for k, v in self.keypoint2index.items()} 42 | self.keypoint_num = len(self.keypoint2index) 43 | 44 | self.children = { 45 | 'Hips': ['LeftHipJoint', 'LowerBack', 'RightHipJoint'], 46 | 'LeftHipJoint': ['LeftUpLeg'], 47 | 'LeftUpLeg': ['LeftLeg'], 48 | 'LeftLeg': ['LeftFoot'], 49 | 'LeftFoot': ['LeftFootEndSite'], 50 | 'LeftFootEndSite': [], 51 | 'LowerBack': ['Spine'], 52 | 'Spine': ['Spine1'], 53 | 'Spine1': ['LeftShoulder', 'Neck', 'RightShoulder'], 54 | 'LeftShoulder': ['LeftArm'], 55 | 'LeftArm': ['LeftForeArm'], 56 | 'LeftForeArm': ['LeftHand'], 57 | 'LeftHand': ['LeftHandEndSite'], 58 | 'LeftHandEndSite': [], 59 | 'Neck': ['Neck1'], 60 | 'Neck1': ['HeadEndSite'], 61 | 'HeadEndSite': [], 62 | 'RightShoulder': ['RightArm'], 63 | 'RightArm': ['RightForeArm'], 64 | 'RightForeArm': ['RightHand'], 65 | 'RightHand': ['RightHandEndSite'], 66 | 'RightHandEndSite': [], 67 | 'RightHipJoint': ['RightUpLeg'], 68 | 'RightUpLeg': ['RightLeg'], 69 | 'RightLeg': ['RightFoot'], 70 | 'RightFoot': ['RightFootEndSite'], 71 | 'RightFootEndSite': [], 72 | } 73 | self.parent = {self.root: None} 74 | for parent, children in self.children.items(): 75 | for child in children: 76 | self.parent[child] = parent 77 | 78 | self.left_joints = [ 79 | joint for joint in self.keypoint2index 80 | if 'Left' in joint 81 | ] 82 | self.right_joints = [ 83 | joint for joint in self.keypoint2index 84 | if 'Right' in joint 85 | ] 86 | 87 | # T-pose 88 | self.initial_directions = { 89 | 'Hips': [0, 0, 0], 90 | 'LeftHipJoint': [1, 0, 0], 91 | 'LeftUpLeg': [1, 0, 0], 92 | 'LeftLeg': [0, 0, -1], 93 | 'LeftFoot': [0, 0, -1], 94 | 'LeftFootEndSite': [0, -1, 0], 95 | 'LowerBack': [0, 0, 1], 96 | 'Spine': [0, 0, 1], 97 | 'Spine1': [0, 0, 1], 98 | 'LeftShoulder': [1, 0, 0], 99 | 'LeftArm': [1, 0, 0], 100 | 'LeftForeArm': [1, 0, 0], 101 | 'LeftHand': [1, 0, 0], 102 | 'LeftHandEndSite': [1, 0, 0], 103 | 'Neck': [0, 0, 1], 104 | 'Neck1': [0, 0, 1], 105 | 'HeadEndSite': [0, 0, 1], 106 | 'RightShoulder': [-1, 0, 0], 107 | 'RightArm': [-1, 0, 0], 108 | 'RightForeArm': [-1, 0, 0], 109 | 'RightHand': [-1, 0, 0], 110 | 'RightHandEndSite': [-1, 0, 0], 111 | 'RightHipJoint': [-1, 0, 0], 112 | 'RightUpLeg': [-1, 0, 0], 113 | 'RightLeg': [0, 0, -1], 114 | 'RightFoot': [0, 0, -1], 115 | 'RightFootEndSite': [0, -1, 0] 116 | } 117 | 118 | 119 | def get_initial_offset(self, poses_3d): 120 | # TODO: RANSAC 121 | bone_lens = {self.root: [0]} 122 | stack = [self.root] 123 | while stack: 124 | parent = stack.pop() 125 | p_idx = self.keypoint2index[parent] 126 | p_name = parent 127 | while p_idx == -1: 128 | # find real parent 129 | p_name = self.parent[p_name] 130 | p_idx = self.keypoint2index[p_name] 131 | for child in 
self.children[parent]: 132 | stack.append(child) 133 | 134 | if self.keypoint2index[child] == -1: 135 | bone_lens[child] = [0.1] 136 | else: 137 | c_idx = self.keypoint2index[child] 138 | bone_lens[child] = np.linalg.norm( 139 | poses_3d[:, p_idx] - poses_3d[:, c_idx], 140 | axis=1 141 | ) 142 | 143 | bone_len = {} 144 | for joint in self.keypoint2index: 145 | if 'Left' in joint or 'Right' in joint: 146 | base_name = joint.replace('Left', '').replace('Right', '') 147 | left_len = np.mean(bone_lens['Left' + base_name]) 148 | right_len = np.mean(bone_lens['Right' + base_name]) 149 | bone_len[joint] = (left_len + right_len) / 2 150 | else: 151 | bone_len[joint] = np.mean(bone_lens[joint]) 152 | 153 | initial_offset = {} 154 | for joint, direction in self.initial_directions.items(): 155 | direction = np.array(direction) / max(np.linalg.norm(direction), 1e-12) 156 | initial_offset[joint] = direction * bone_len[joint] 157 | 158 | return initial_offset 159 | 160 | 161 | def get_bvh_header(self, poses_3d): 162 | initial_offset = self.get_initial_offset(poses_3d) 163 | 164 | nodes = {} 165 | for joint in self.keypoint2index: 166 | is_root = joint == self.root 167 | is_end_site = 'EndSite' in joint 168 | nodes[joint] = bvh_helper.BvhNode( 169 | name=joint, 170 | offset=initial_offset[joint], 171 | rotation_order='zxy' if not is_end_site else '', 172 | is_root=is_root, 173 | is_end_site=is_end_site, 174 | ) 175 | for joint, children in self.children.items(): 176 | nodes[joint].children = [nodes[child] for child in children] 177 | for child in children: 178 | nodes[child].parent = nodes[joint] 179 | 180 | header = bvh_helper.BvhHeader(root=nodes[self.root], nodes=nodes) 181 | return header 182 | 183 | 184 | def pose2euler(self, pose, header): 185 | channel = [] 186 | quats = {} 187 | eulers = {} 188 | stack = [header.root] 189 | while stack: 190 | node = stack.pop() 191 | joint = node.name 192 | joint_idx = self.keypoint2index[joint] 193 | 194 | if node.is_root: 195 | channel.extend(pose[joint_idx]) 196 | 197 | index = self.keypoint2index 198 | order = None 199 | if joint == 'Hips': 200 | x_dir = pose[index['LeftUpLeg']] - pose[index['RightUpLeg']] 201 | y_dir = None 202 | z_dir = pose[index['Spine']] - pose[joint_idx] 203 | order = 'zyx' 204 | elif joint in ['RightUpLeg', 'RightLeg']: 205 | child_idx = self.keypoint2index[node.children[0].name] 206 | x_dir = pose[index['Hips']] - pose[index['RightUpLeg']] 207 | y_dir = None 208 | z_dir = pose[joint_idx] - pose[child_idx] 209 | order = 'zyx' 210 | elif joint in ['LeftUpLeg', 'LeftLeg']: 211 | child_idx = self.keypoint2index[node.children[0].name] 212 | x_dir = pose[index['LeftUpLeg']] - pose[index['Hips']] 213 | y_dir = None 214 | z_dir = pose[joint_idx] - pose[child_idx] 215 | order = 'zyx' 216 | elif joint == 'Spine': 217 | x_dir = pose[index['LeftUpLeg']] - pose[index['RightUpLeg']] 218 | y_dir = None 219 | z_dir = pose[index['Spine1']] - pose[joint_idx] 220 | order = 'zyx' 221 | elif joint == 'Spine1': 222 | x_dir = pose[index['LeftArm']] - \ 223 | pose[index['RightArm']] 224 | y_dir = None 225 | z_dir = pose[joint_idx] - pose[index['Spine']] 226 | order = 'zyx' 227 | elif joint == 'Neck1': 228 | x_dir = None 229 | y_dir = pose[index['Spine1']] - pose[joint_idx] 230 | z_dir = pose[index['HeadEndSite']] - pose[index['Spine1']] 231 | order = 'zxy' 232 | elif joint == 'LeftArm': 233 | x_dir = pose[index['LeftForeArm']] - pose[joint_idx] 234 | y_dir = pose[index['LeftForeArm']] - pose[index['LeftHand']] 235 | z_dir = None 236 | order = 'xzy' 237 | 
elif joint == 'LeftForeArm': 238 | x_dir = pose[index['LeftHand']] - pose[joint_idx] 239 | y_dir = pose[joint_idx] - pose[index['LeftArm']] 240 | z_dir = None 241 | order = 'xzy' 242 | elif joint == 'RightArm': 243 | x_dir = pose[joint_idx] - pose[index['RightForeArm']] 244 | y_dir = pose[index['RightForeArm']] - pose[index['RightHand']] 245 | z_dir = None 246 | order = 'xzy' 247 | elif joint == 'RightForeArm': 248 | x_dir = pose[joint_idx] - pose[index['RightHand']] 249 | y_dir = pose[joint_idx] - pose[index['RightArm']] 250 | z_dir = None 251 | order = 'xzy' 252 | 253 | if order: 254 | dcm = math3d.dcm_from_axis(x_dir, y_dir, z_dir, order) 255 | quats[joint] = math3d.dcm2quat(dcm) 256 | else: 257 | quats[joint] = quats[self.parent[joint]].copy() 258 | 259 | local_quat = quats[joint].copy() 260 | if node.parent: 261 | local_quat = math3d.quat_divide( 262 | q=quats[joint], r=quats[node.parent.name] 263 | ) 264 | 265 | euler = math3d.quat2euler( 266 | q=local_quat, order=node.rotation_order 267 | ) 268 | euler = np.rad2deg(euler) 269 | eulers[joint] = euler 270 | channel.extend(euler) 271 | 272 | for child in node.children[::-1]: 273 | if not child.is_end_site: 274 | stack.append(child) 275 | 276 | return channel 277 | 278 | 279 | def poses2bvh(self, poses_3d, header=None, output_file=None): 280 | if not header: 281 | header = self.get_bvh_header(poses_3d) 282 | 283 | channels = [] 284 | for frame, pose in enumerate(poses_3d): 285 | channels.append(self.pose2euler(pose, header)) 286 | 287 | if output_file: 288 | bvh_helper.write_bvh(output_file, header, channels) 289 | 290 | return channels, header -------------------------------------------------------------------------------- /bvh_skeleton/coco_skeleton.py: -------------------------------------------------------------------------------- 1 | class COCOSkeleton(object): 2 | 3 | def __init__(self): 4 | self.root = 'Neck' # median of left shoulder and right shoulder 5 | self.keypoint2index = { 6 | 'Nose': 0, 7 | 'LeftEye': 1, 8 | 'RightEye': 2, 9 | 'LeftEar': 3, 10 | 'RightEar': 4, 11 | 'LeftShoulder': 5, 12 | 'RightShoulder': 6, 13 | 'LeftElbow': 7, 14 | 'RightElbow': 8, 15 | 'LeftWrist': 9, 16 | 'RightWrist': 10, 17 | 'LeftHip': 11, 18 | 'RightHip': 12, 19 | 'LeftKnee': 13, 20 | 'RightKnee': 14, 21 | 'LeftAnkle': 15, 22 | 'RightAnkle': 16, 23 | 'Neck': 17 24 | } 25 | self.index2keypoint = {v: k for k, v in self.keypoint2index.items()} 26 | self.keypoint_num = len(self.keypoint2index) 27 | 28 | self.children = { 29 | 'Neck': [ 30 | 'Nose', 'LeftShoulder', 'RightShoulder', 'LeftHip', 'RightHip' 31 | ], 32 | 'Nose': ['LeftEye', 'RightEye'], 33 | 'LeftEye': ['LeftEar'], 34 | 'LeftEar': [], 35 | 'RightEye': ['RightEar'], 36 | 'RightEar': [], 37 | 'LeftShoulder': ['LeftElbow'], 38 | 'LeftElbow': ['LeftWrist'], 39 | 'LeftWrist': [], 40 | 'RightShoulder': ['RightElbow'], 41 | 'RightElbow': ['RightWrist'], 42 | 'RightWrist': [], 43 | 'LeftHip': ['LeftKnee'], 44 | 'LeftKnee': ['LeftAnkle'], 45 | 'LeftAnkle': [], 46 | 'RightHip': ['RightKnee'], 47 | 'RightKnee': ['RightAnkle'], 48 | 'RightAnkle': [] 49 | } 50 | self.parent = {self.root: None} 51 | for parent, children in self.children.items(): 52 | for child in children: 53 | self.parent[child] = parent -------------------------------------------------------------------------------- /bvh_skeleton/h36m_original_skeleton.py: -------------------------------------------------------------------------------- 1 | class H36mOriginalSkeleton(object): 2 | 3 | def __init__(self): 4 | self.root = 'Hip' 5 | 
self.keypoint2index = { 6 | 'Hip': 0, 7 | 'RightUpLeg': 1, 8 | 'RightLeg': 2, 9 | 'RightFoot': 3, 10 | 'RightToeBase': 4, 11 | 'RightToeBaseEndSite': 5, 12 | 'LeftUpLeg': 6, 13 | 'LeftLeg': 7, 14 | 'LeftFoot': 8, 15 | 'LeftToeBase': 9, 16 | 'LeftToeBaseEndSite': 10, 17 | 'Spine': 11, 18 | 'Spine1': 12, 19 | 'Neck': 13, 20 | 'Head': 14, 21 | 'HeadEndSite': 15, 22 | 'LeftShoulder': 16, 23 | 'LeftArm': 17, 24 | 'LeftForeArm': 18, 25 | 'LeftHand': 19, 26 | 'LeftHandThumb': 20, 27 | 'LeftHandThumbEndSite': 21, 28 | 'LeftWristEnd': 22, 29 | 'LeftWristEndEndSite': 23, 30 | 'RightShoulder': 24, 31 | 'RightArm': 25, 32 | 'RightForeArm': 26, 33 | 'RightHand': 27, 34 | 'RightHandThumb': 28, 35 | 'RightHandThumbEndSite': 29, 36 | 'RightWristEnd': 30, 37 | 'RightWristEndEndSite': 31 38 | } 39 | self.index2keypoint = {v: k for k, v in self.keypoint2index.items()} 40 | self.keypoint_num = len(self.keypoint2index) 41 | 42 | self.children = { 43 | 'Hip': ['RightUpLeg', 'LeftUpLeg', 'Spine'], 44 | 'RightUpLeg': ['RightLeg'], 45 | 'RightLeg': ['RightFoot'], 46 | 'RightFoot': ['RightToeBase'], 47 | 'RightToeBase': ['RightToeBaseEndSite'], 48 | 'RightToeBaseEndSite': [], 49 | 'LeftUpLeg': ['LeftLeg'], 50 | 'LeftLeg': ['LeftFoot'], 51 | 'LeftFoot': ['LeftToeBase'], 52 | 'LeftToeBase': ['LeftToeBaseEndSite'], 53 | 'LeftToeBaseEndSite': [], 54 | 'Spine': ['Spine1'], 55 | 'Spine1': ['Neck', 'LeftShoulder', 'RightShoulder'], 56 | 'Neck': ['Head'], 57 | 'Head': ['HeadEndSite'], 58 | 'HeadEndSite': [], 59 | 'LeftShoulder': ['LeftArm'], 60 | 'LeftArm': ['LeftForeArm'], 61 | 'LeftForeArm': ['LeftHand'], 62 | 'LeftHand': ['LeftHandThumb', 'LeftWristEnd'], 63 | 'LeftHandThumb': ['LeftHandThumbEndSite'], 64 | 'LeftHandThumbEndSite': [], 65 | 'LeftWristEnd': ['LeftWristEndEndSite'], 66 | 'LeftWristEndEndSite': [], 67 | 'RightShoulder': ['RightArm'], 68 | 'RightArm': ['RightForeArm'], 69 | 'RightForeArm': ['RightHand'], 70 | 'RightHand': ['RightHandThumb', 'RightWristEnd'], 71 | 'RightHandThumb': ['RightHandThumbEndSite'], 72 | 'RightHandThumbEndSite': [], 73 | 'RightWristEnd': ['RightWristEndEndSite'], 74 | 'RightWristEndEndSite': [], 75 | } 76 | self.parent = {self.root: None} 77 | for parent, children in self.children.items(): 78 | for child in children: 79 | self.parent[child] = parent 80 | 81 | self.left_joints = [ 82 | joint for joint in self.keypoint2index 83 | if 'Left' in joint 84 | ] 85 | self.right_joints = [ 86 | joint for joint in self.keypoint2index 87 | if 'Right' in joint 88 | ] -------------------------------------------------------------------------------- /bvh_skeleton/h36m_skeleton.py: -------------------------------------------------------------------------------- 1 | from . import math3d 2 | from . 
import bvh_helper 3 | 4 | import numpy as np 5 | 6 | 7 | class H36mSkeleton(object): 8 | 9 | def __init__(self): 10 | self.root = 'Hip' 11 | self.keypoint2index = { 12 | 'Hip': 0, 13 | 'RightHip': 1, 14 | 'RightKnee': 2, 15 | 'RightAnkle': 3, 16 | 'LeftHip': 4, 17 | 'LeftKnee': 5, 18 | 'LeftAnkle': 6, 19 | 'Spine': 7, 20 | 'Thorax': 8, 21 | 'Neck': 9, 22 | 'HeadEndSite': 10, 23 | 'LeftShoulder': 11, 24 | 'LeftElbow': 12, 25 | 'LeftWrist': 13, 26 | 'RightShoulder': 14, 27 | 'RightElbow': 15, 28 | 'RightWrist': 16, 29 | 'RightAnkleEndSite': -1, 30 | 'LeftAnkleEndSite': -1, 31 | 'LeftWristEndSite': -1, 32 | 'RightWristEndSite': -1 33 | } 34 | self.index2keypoint = {v: k for k, v in self.keypoint2index.items()} 35 | self.keypoint_num = len(self.keypoint2index) 36 | 37 | self.children = { 38 | 'Hip': ['RightHip', 'LeftHip', 'Spine'], 39 | 'RightHip': ['RightKnee'], 40 | 'RightKnee': ['RightAnkle'], 41 | 'RightAnkle': ['RightAnkleEndSite'], 42 | 'RightAnkleEndSite': [], 43 | 'LeftHip': ['LeftKnee'], 44 | 'LeftKnee': ['LeftAnkle'], 45 | 'LeftAnkle': ['LeftAnkleEndSite'], 46 | 'LeftAnkleEndSite': [], 47 | 'Spine': ['Thorax'], 48 | 'Thorax': ['Neck', 'LeftShoulder', 'RightShoulder'], 49 | 'Neck': ['HeadEndSite'], 50 | 'HeadEndSite': [], # Head is an end site 51 | 'LeftShoulder': ['LeftElbow'], 52 | 'LeftElbow': ['LeftWrist'], 53 | 'LeftWrist': ['LeftWristEndSite'], 54 | 'LeftWristEndSite': [], 55 | 'RightShoulder': ['RightElbow'], 56 | 'RightElbow': ['RightWrist'], 57 | 'RightWrist': ['RightWristEndSite'], 58 | 'RightWristEndSite': [] 59 | } 60 | self.parent = {self.root: None} 61 | for parent, children in self.children.items(): 62 | for child in children: 63 | self.parent[child] = parent 64 | 65 | self.left_joints = [ 66 | joint for joint in self.keypoint2index 67 | if 'Left' in joint 68 | ] 69 | self.right_joints = [ 70 | joint for joint in self.keypoint2index 71 | if 'Right' in joint 72 | ] 73 | 74 | # T-pose 75 | self.initial_directions = { 76 | 'Hip': [0, 0, 0], 77 | 'RightHip': [-1, 0, 0], 78 | 'RightKnee': [0, 0, -1], 79 | 'RightAnkle': [0, 0, -1], 80 | 'RightAnkleEndSite': [0, -1, 0], 81 | 'LeftHip': [1, 0, 0], 82 | 'LeftKnee': [0, 0, -1], 83 | 'LeftAnkle': [0, 0, -1], 84 | 'LeftAnkleEndSite': [0, -1, 0], 85 | 'Spine': [0, 0, 1], 86 | 'Thorax': [0, 0, 1], 87 | 'Neck': [0, 0, 1], 88 | 'HeadEndSite': [0, 0, 1], 89 | 'LeftShoulder': [1, 0, 0], 90 | 'LeftElbow': [1, 0, 0], 91 | 'LeftWrist': [1, 0, 0], 92 | 'LeftWristEndSite': [1, 0, 0], 93 | 'RightShoulder': [-1, 0, 0], 94 | 'RightElbow': [-1, 0, 0], 95 | 'RightWrist': [-1, 0, 0], 96 | 'RightWristEndSite': [-1, 0, 0] 97 | } 98 | 99 | 100 | def get_initial_offset(self, poses_3d): 101 | # TODO: RANSAC 102 | bone_lens = {self.root: [0]} 103 | stack = [self.root] 104 | while stack: 105 | parent = stack.pop() 106 | p_idx = self.keypoint2index[parent] 107 | for child in self.children[parent]: 108 | if 'EndSite' in child: 109 | bone_lens[child] = 0.4 * bone_lens[parent] 110 | continue 111 | stack.append(child) 112 | 113 | c_idx = self.keypoint2index[child] 114 | bone_lens[child] = np.linalg.norm( 115 | poses_3d[:, p_idx] - poses_3d[:, c_idx], 116 | axis=1 117 | ) 118 | 119 | bone_len = {} 120 | for joint in self.keypoint2index: 121 | if 'Left' in joint or 'Right' in joint: 122 | base_name = joint.replace('Left', '').replace('Right', '') 123 | left_len = np.mean(bone_lens['Left' + base_name]) 124 | right_len = np.mean(bone_lens['Right' + base_name]) 125 | bone_len[joint] = (left_len + right_len) / 2 126 | else: 127 | bone_len[joint] = 
np.mean(bone_lens[joint]) 128 | 129 | initial_offset = {} 130 | for joint, direction in self.initial_directions.items(): 131 | direction = np.array(direction) / max(np.linalg.norm(direction), 1e-12) 132 | initial_offset[joint] = direction * bone_len[joint] 133 | 134 | return initial_offset 135 | 136 | 137 | def get_bvh_header(self, poses_3d): 138 | initial_offset = self.get_initial_offset(poses_3d) 139 | 140 | nodes = {} 141 | for joint in self.keypoint2index: 142 | is_root = joint == self.root 143 | is_end_site = 'EndSite' in joint 144 | nodes[joint] = bvh_helper.BvhNode( 145 | name=joint, 146 | offset=initial_offset[joint], 147 | rotation_order='zxy' if not is_end_site else '', 148 | is_root=is_root, 149 | is_end_site=is_end_site, 150 | ) 151 | for joint, children in self.children.items(): 152 | nodes[joint].children = [nodes[child] for child in children] 153 | for child in children: 154 | nodes[child].parent = nodes[joint] 155 | 156 | header = bvh_helper.BvhHeader(root=nodes[self.root], nodes=nodes) 157 | return header 158 | 159 | 160 | def pose2euler(self, pose, header): 161 | channel = [] 162 | quats = {} 163 | eulers = {} 164 | stack = [header.root] 165 | while stack: 166 | node = stack.pop() 167 | joint = node.name 168 | joint_idx = self.keypoint2index[joint] 169 | 170 | if node.is_root: 171 | channel.extend(pose[joint_idx]) 172 | 173 | index = self.keypoint2index 174 | order = None 175 | if joint == 'Hip': 176 | x_dir = pose[index['LeftHip']] - pose[index['RightHip']] 177 | y_dir = None 178 | z_dir = pose[index['Spine']] - pose[joint_idx] 179 | order = 'zyx' 180 | elif joint in ['RightHip', 'RightKnee']: 181 | child_idx = self.keypoint2index[node.children[0].name] 182 | x_dir = pose[index['Hip']] - pose[index['RightHip']] 183 | y_dir = None 184 | z_dir = pose[joint_idx] - pose[child_idx] 185 | order = 'zyx' 186 | elif joint in ['LeftHip', 'LeftKnee']: 187 | child_idx = self.keypoint2index[node.children[0].name] 188 | x_dir = pose[index['LeftHip']] - pose[index['Hip']] 189 | y_dir = None 190 | z_dir = pose[joint_idx] - pose[child_idx] 191 | order = 'zyx' 192 | elif joint == 'Spine': 193 | x_dir = pose[index['LeftHip']] - pose[index['RightHip']] 194 | y_dir = None 195 | z_dir = pose[index['Thorax']] - pose[joint_idx] 196 | order = 'zyx' 197 | elif joint == 'Thorax': 198 | x_dir = pose[index['LeftShoulder']] - \ 199 | pose[index['RightShoulder']] 200 | y_dir = None 201 | z_dir = pose[joint_idx] - pose[index['Spine']] 202 | order = 'zyx' 203 | elif joint == 'Neck': 204 | x_dir = None 205 | y_dir = pose[index['Thorax']] - pose[joint_idx] 206 | z_dir = pose[index['HeadEndSite']] - pose[index['Thorax']] 207 | order = 'zxy' 208 | elif joint == 'LeftShoulder': 209 | x_dir = pose[index['LeftElbow']] - pose[joint_idx] 210 | y_dir = pose[index['LeftElbow']] - pose[index['LeftWrist']] 211 | z_dir = None 212 | order = 'xzy' 213 | elif joint == 'LeftElbow': 214 | x_dir = pose[index['LeftWrist']] - pose[joint_idx] 215 | y_dir = pose[joint_idx] - pose[index['LeftShoulder']] 216 | z_dir = None 217 | order = 'xzy' 218 | elif joint == 'RightShoulder': 219 | x_dir = pose[joint_idx] - pose[index['RightElbow']] 220 | y_dir = pose[index['RightElbow']] - pose[index['RightWrist']] 221 | z_dir = None 222 | order = 'xzy' 223 | elif joint == 'RightElbow': 224 | x_dir = pose[joint_idx] - pose[index['RightWrist']] 225 | y_dir = pose[joint_idx] - pose[index['RightShoulder']] 226 | z_dir = None 227 | order = 'xzy' 228 | if order: 229 | dcm = math3d.dcm_from_axis(x_dir, y_dir, z_dir, order) 230 | quats[joint] = 
math3d.dcm2quat(dcm) 231 | else: 232 | quats[joint] = quats[self.parent[joint]].copy() 233 | 234 | local_quat = quats[joint].copy() 235 | if node.parent: 236 | local_quat = math3d.quat_divide( 237 | q=quats[joint], r=quats[node.parent.name] 238 | ) 239 | 240 | euler = math3d.quat2euler( 241 | q=local_quat, order=node.rotation_order 242 | ) 243 | euler = np.rad2deg(euler) 244 | eulers[joint] = euler 245 | channel.extend(euler) 246 | 247 | for child in node.children[::-1]: 248 | if not child.is_end_site: 249 | stack.append(child) 250 | 251 | return channel 252 | 253 | 254 | def poses2bvh(self, poses_3d, header=None, output_file=None): 255 | if not header: 256 | header = self.get_bvh_header(poses_3d) 257 | 258 | channels = [] 259 | for frame, pose in enumerate(poses_3d): 260 | channels.append(self.pose2euler(pose, header)) 261 | 262 | if output_file: 263 | bvh_helper.write_bvh(output_file, header, channels) 264 | 265 | return channels, header -------------------------------------------------------------------------------- /bvh_skeleton/math3d.py: -------------------------------------------------------------------------------- 1 | """ 2 | ! left handed coordinate, z-up, y-forward 3 | ! left to right rotation matrix multiply: v'=vR 4 | ! non-standard quaternion multiply 5 | """ 6 | 7 | import numpy as np 8 | 9 | 10 | def normalize(x): 11 | return x / max(np.linalg.norm(x), 1e-12) 12 | 13 | 14 | def dcm_from_axis(x_dir, y_dir, z_dir, order): 15 | assert order in ['yzx', 'yxz', 'xyz', 'xzy', 'zxy', 'zyx'] 16 | 17 | axis = {'x': x_dir, 'y': y_dir, 'z': z_dir} 18 | name = ['x', 'y', 'z'] 19 | idx0 = name.index(order[0]) 20 | idx1 = name.index(order[1]) 21 | idx2 = name.index(order[2]) 22 | 23 | axis[order[0]] = normalize(axis[order[0]]) 24 | axis[order[1]] = normalize(np.cross( 25 | axis[name[(idx1 + 1) % 3]], axis[name[(idx1 + 2) % 3]] 26 | )) 27 | axis[order[2]] = normalize(np.cross( 28 | axis[name[(idx2 + 1) % 3]], axis[name[(idx2 + 2) % 3]] 29 | )) 30 | 31 | dcm = np.asarray([axis['x'], axis['y'], axis['z']]) 32 | 33 | return dcm 34 | 35 | 36 | def dcm2quat(dcm): 37 | q = np.zeros([4]) 38 | tr = np.trace(dcm) 39 | 40 | if tr > 0: 41 | sqtrp1 = np.sqrt(tr + 1.0) 42 | q[0] = 0.5 * sqtrp1 43 | q[1] = (dcm[1, 2] - dcm[2, 1]) / (2.0 * sqtrp1) 44 | q[2] = (dcm[2, 0] - dcm[0, 2]) / (2.0 * sqtrp1) 45 | q[3] = (dcm[0, 1] - dcm[1, 0]) / (2.0 * sqtrp1) 46 | else: 47 | d = np.diag(dcm) 48 | if d[1] > d[0] and d[1] > d[2]: 49 | sqdip1 = np.sqrt(d[1] - d[0] - d[2] + 1.0) 50 | q[2] = 0.5 * sqdip1 51 | 52 | if sqdip1 != 0: 53 | sqdip1 = 0.5 / sqdip1 54 | 55 | q[0] = (dcm[2, 0] - dcm[0, 2]) * sqdip1 56 | q[1] = (dcm[0, 1] + dcm[1, 0]) * sqdip1 57 | q[3] = (dcm[1, 2] + dcm[2, 1]) * sqdip1 58 | 59 | elif d[2] > d[0]: 60 | sqdip1 = np.sqrt(d[2] - d[0] - d[1] + 1.0) 61 | q[3] = 0.5 * sqdip1 62 | 63 | if sqdip1 != 0: 64 | sqdip1 = 0.5 / sqdip1 65 | 66 | q[0] = (dcm[0, 1] - dcm[1, 0]) * sqdip1 67 | q[1] = (dcm[2, 0] + dcm[0, 2]) * sqdip1 68 | q[2] = (dcm[1, 2] + dcm[2, 1]) * sqdip1 69 | 70 | else: 71 | sqdip1 = np.sqrt(d[0] - d[1] - d[2] + 1.0) 72 | q[1] = 0.5 * sqdip1 73 | 74 | if sqdip1 != 0: 75 | sqdip1 = 0.5 / sqdip1 76 | 77 | q[0] = (dcm[1, 2] - dcm[2, 1]) * sqdip1 78 | q[2] = (dcm[0, 1] + dcm[1, 0]) * sqdip1 79 | q[3] = (dcm[2, 0] + dcm[0, 2]) * sqdip1 80 | 81 | return q 82 | 83 | 84 | def quat_dot(q0, q1): 85 | original_shape = q0.shape 86 | q0 = np.reshape(q0, [-1, 4]) 87 | q1 = np.reshape(q1, [-1, 4]) 88 | 89 | w0, x0, y0, z0 = q0[:, 0], q0[:, 1], q0[:, 2], q0[:, 3] 90 | w1, x1, y1, z1 = q1[:, 0], q1[:, 1], 
q1[:, 2], q1[:, 3] 91 | q_product = w0 * w1 + x1 * x1 + y0 * y1 + z0 * z1 92 | q_product = np.expand_dims(q_product, axis=1) 93 | q_product = np.tile(q_product, [1, 4]) 94 | 95 | return np.reshape(q_product, original_shape) 96 | 97 | 98 | def quat_inverse(q): 99 | original_shape = q.shape 100 | q = np.reshape(q, [-1, 4]) 101 | 102 | q_conj = [q[:, 0], -q[:, 1], -q[:, 2], -q[:, 3]] 103 | q_conj = np.stack(q_conj, axis=1) 104 | q_inv = np.divide(q_conj, quat_dot(q_conj, q_conj)) 105 | 106 | return np.reshape(q_inv, original_shape) 107 | 108 | 109 | def quat_mul(q0, q1): 110 | original_shape = q0.shape 111 | q1 = np.reshape(q1, [-1, 4, 1]) 112 | q0 = np.reshape(q0, [-1, 1, 4]) 113 | terms = np.matmul(q1, q0) 114 | w = terms[:, 0, 0] - terms[:, 1, 1] - terms[:, 2, 2] - terms[:, 3, 3] 115 | x = terms[:, 0, 1] + terms[:, 1, 0] - terms[:, 2, 3] + terms[:, 3, 2] 116 | y = terms[:, 0, 2] + terms[:, 1, 3] + terms[:, 2, 0] - terms[:, 3, 1] 117 | z = terms[:, 0, 3] - terms[:, 1, 2] + terms[:, 2, 1] + terms[:, 3, 0] 118 | 119 | q_product = np.stack([w, x, y, z], axis=1) 120 | return np.reshape(q_product, original_shape) 121 | 122 | 123 | def quat_divide(q, r): 124 | return quat_mul(quat_inverse(r), q) 125 | 126 | 127 | def quat2euler(q, order='zxy', eps=1e-8): 128 | original_shape = list(q.shape) 129 | original_shape[-1] = 3 130 | q = np.reshape(q, [-1, 4]) 131 | 132 | q0 = q[:, 0] 133 | q1 = q[:, 1] 134 | q2 = q[:, 2] 135 | q3 = q[:, 3] 136 | 137 | if order == 'zxy': 138 | x = np.arcsin(np.clip(2 * (q0 * q1 + q2 * q3), -1 + eps, 1 - eps)) 139 | y = np.arctan2(2 * (q0 * q2 - q1 * q3), 1 - 2 * (q1 * q1 + q2 * q2)) 140 | z = np.arctan2(2 * (q0 * q3 - q1 * q2), 1 - 2 * (q1 * q1 + q3 * q3)) 141 | euler = np.stack([z, x, y], axis=1) 142 | else: 143 | raise ValueError('Not implemented') 144 | 145 | return np.reshape(euler, original_shape) -------------------------------------------------------------------------------- /bvh_skeleton/openpose_skeleton.py: -------------------------------------------------------------------------------- 1 | class OpenPoseSkeleton(object): 2 | 3 | def __init__(self): 4 | self.root = 'MidHip' 5 | self.keypoint2index = { 6 | 'Nose': 0, 7 | 'Neck': 1, 8 | 'RShoulder': 2, 9 | 'RElbow': 3, 10 | 'RWrist': 4, 11 | 'LShoulder': 5, 12 | 'LElbow': 6, 13 | 'LWrist': 7, 14 | 'MidHip': 8, 15 | 'RHip': 9, 16 | 'RKnee': 10, 17 | 'RAnkle': 11, 18 | 'LHip': 12, 19 | 'LKnee': 13, 20 | 'LAnkle': 14, 21 | 'REye': 15, 22 | 'LEye': 16, 23 | 'REar': 17, 24 | 'LEar': 18, 25 | 'LBigToe': 19, 26 | 'LSmallToe': 20, 27 | 'LHeel': 21, 28 | 'RBigToe': 22, 29 | 'RSmallToe': 23, 30 | 'RHeel': 24 31 | } 32 | self.index2keypoint = {v: k for k, v in self.keypoint2index.items()} 33 | self.keypoint_num = len(self.keypoint2index) 34 | 35 | self.children = { 36 | 'MidHip': ['Neck', 'RHip', 'LHip'], 37 | 'Neck': ['Nose', 'RShoulder', 'LShoulder'], 38 | 'Nose': ['REye', 'LEye'], 39 | 'REye': ['REar'], 40 | 'REar': [], 41 | 'LEye': ['LEar'], 42 | 'LEar': [], 43 | 'RShoulder': ['RElbow'], 44 | 'RElbow': ['RWrist'], 45 | 'RWrist': [], 46 | 'LShoulder': ['LElbow'], 47 | 'LElbow': ['LWrist'], 48 | 'LWrist': [], 49 | 'RHip': ['RKnee'], 50 | 'RKnee': ['RAnkle'], 51 | 'RAnkle': ['RBigToe', 'RSmallToe', 'RHeel'], 52 | 'RBigToe': [], 53 | 'RSmallToe': [], 54 | 'RHeel': [], 55 | 'LHip': ['LKnee'], 56 | 'LKnee': ['LAnkle'], 57 | 'LAnkle': ['LBigToe', 'LSmallToe', 'LHeel'], 58 | 'LBigToe': [], 59 | 'LSmallToe': [], 60 | 'LHeel': [], 61 | } 62 | self.parent = {self.root: None} 63 | for parent, children in self.children.items(): 64 | 
for child in children: 65 | self.parent[child] = parent -------------------------------------------------------------------------------- /cameras.h5: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/cameras.h5 -------------------------------------------------------------------------------- /demo.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "from pose_estimator_2d import openpose_estimator\n", 10 | "from pose_estimator_3d import estimator_3d\n", 11 | "from utils import smooth, vis, camera\n", 12 | "from bvh_skeleton import openpose_skeleton, h36m_skeleton, cmu_skeleton\n", 13 | "\n", 14 | "import cv2\n", 15 | "import importlib\n", 16 | "import numpy as np\n", 17 | "import os\n", 18 | "from pathlib import Path\n", 19 | "from IPython.display import HTML" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": {}, 25 | "source": [ 26 | "## Initialize 2d pose estimator" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": null, 32 | "metadata": {}, 33 | "outputs": [], 34 | "source": [ 35 | "# more 2d pose estimators like HRNet, PoseResNet, CPN, etc., will be added later\n", 36 | "e2d = openpose_estimator.OpenPoseEstimator(model_folder='/openpose/models/') # set model_folder to /path/to/openpose/models" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": {}, 42 | "source": [ 43 | "## Estimate 2D pose from video" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": null, 49 | "metadata": {}, 50 | "outputs": [], 51 | "source": [ 52 | "video_file = Path('miscs/cxk.mp4') # video file to process\n", 53 | "output_dir = Path(f'miscs/{video_file.stem}_cache')\n", 54 | "if not output_dir.exists():\n", 55 | " os.makedirs(output_dir)\n", 56 | " \n", 57 | "cap = cv2.VideoCapture(str(video_file))\n", 58 | "keypoints_list = []\n", 59 | "img_width, img_height = None, None\n", 60 | "while True:\n", 61 | " ret, frame = cap.read()\n", 62 | " if not ret:\n", 63 | " break\n", 64 | " img_height = frame.shape[0]\n", 65 | " img_width = frame.shape[1]\n", 66 | " \n", 67 | " # returned shape will be (num_of_human, 25, 3)\n", 68 | " # last dimension includes (x, y, confidence)\n", 69 | " keypoints = e2d.estimate(img_list=[frame])[0]\n", 70 | " if not isinstance(keypoints, np.ndarray) or len(keypoints.shape) != 3:\n", 71 | " # failed to detect human\n", 72 | " keypoints_list.append(None)\n", 73 | " else:\n", 74 | " # we assume that the image only contains 1 person\n", 75 | " # multi-person video needs some extra processes like grouping\n", 76 | " # maybe we will implemented it in the future\n", 77 | " keypoints_list.append(keypoints[0])\n", 78 | "cap.release()" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "## Process 2D pose" 86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": null, 91 | "metadata": {}, 92 | "outputs": [], 93 | "source": [ 94 | "# filter out failed result\n", 95 | "keypoints_list = smooth.filter_missing_value(\n", 96 | " keypoints_list=keypoints_list,\n", 97 | " method='ignore' # interpolation method will be implemented later\n", 98 | ")\n", 99 | "\n", 100 | "# smooth process will be implemented later\n", 101 | "\n", 102 | "# save 2d pose result\n", 103 | "pose2d = 
np.stack(keypoints_list)[:, :, :2]\n", 104 | "pose2d_file = Path(output_dir / '2d_pose.npy')\n", 105 | "np.save(pose2d_file, pose2d)" 106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "## Visualize 2D pose" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": null, 118 | "metadata": {}, 119 | "outputs": [], 120 | "source": [ 121 | "cap = cv2.VideoCapture(str(video_file))\n", 122 | "vis_result_dir = output_dir / '2d_pose_vis' # path to save the visualized images\n", 123 | "if not vis_result_dir.exists():\n", 124 | " os.makedirs(vis_result_dir)\n", 125 | " \n", 126 | "op_skel = openpose_skeleton.OpenPoseSkeleton()\n", 127 | "\n", 128 | "for i, keypoints in enumerate(keypoints_list):\n", 129 | " ret, frame = cap.read()\n", 130 | " if not ret:\n", 131 | " break\n", 132 | " \n", 133 | " # keypoint whose detect confidence under kp_thresh will not be visualized\n", 134 | " vis.vis_2d_keypoints(\n", 135 | " keypoints=keypoints,\n", 136 | " img=frame,\n", 137 | " skeleton=op_skel,\n", 138 | " kp_thresh=0.4,\n", 139 | " output_file=vis_result_dir / f'{i:04d}.png'\n", 140 | " )\n", 141 | "cap.release()" 142 | ] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": {}, 147 | "source": [ 148 | "## Initialize 3D pose estimator" 149 | ] 150 | }, 151 | { 152 | "cell_type": "code", 153 | "execution_count": null, 154 | "metadata": {}, 155 | "outputs": [], 156 | "source": [ 157 | "importlib.reload(estimator_3d)\n", 158 | "e3d = estimator_3d.Estimator3D(\n", 159 | " config_file='models/openpose_video_pose_243f/video_pose.yaml',\n", 160 | " checkpoint_file='models/openpose_video_pose_243f/best_58.58.pth'\n", 161 | ")" 162 | ] 163 | }, 164 | { 165 | "cell_type": "markdown", 166 | "metadata": {}, 167 | "source": [ 168 | "## Estimate 3D pose from 2D pose" 169 | ] 170 | }, 171 | { 172 | "cell_type": "code", 173 | "execution_count": null, 174 | "metadata": {}, 175 | "outputs": [], 176 | "source": [ 177 | "pose2d = np.load(pose2d_file)\n", 178 | "pose3d = e3d.estimate(pose2d, image_width=img_width, image_height=img_height)" 179 | ] 180 | }, 181 | { 182 | "cell_type": "markdown", 183 | "metadata": {}, 184 | "source": [ 185 | "## Convert 3D pose from camera coordinates to world coordinates" 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": null, 191 | "metadata": {}, 192 | "outputs": [], 193 | "source": [ 194 | "subject = 'S1'\n", 195 | "cam_id = '55011271'\n", 196 | "cam_params = camera.load_camera_params('cameras.h5')[subject][cam_id]\n", 197 | "R = cam_params['R']\n", 198 | "T = 0\n", 199 | "azimuth = cam_params['azimuth']\n", 200 | "\n", 201 | "pose3d_world = camera.camera2world(pose=pose3d, R=R, T=T)\n", 202 | "pose3d_world[:, :, 2] -= np.min(pose3d_world[:, :, 2]) # rebase the height\n", 203 | "\n", 204 | "pose3d_file = output_dir / '3d_pose.npy'\n", 205 | "np.save(pose3d_file, pose3d_world)" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "## Visualize 3D pose" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": null, 218 | "metadata": {}, 219 | "outputs": [], 220 | "source": [ 221 | "h36m_skel = h36m_skeleton.H36mSkeleton()\n", 222 | "gif_file = output_dir / '3d_pose_300.gif' # output format can be .gif or .mp4 \n", 223 | "\n", 224 | "ani = vis.vis_3d_keypoints_sequence(\n", 225 | " keypoints_sequence=pose3d_world[0:300],\n", 226 | " skeleton=h36m_skel,\n", 227 | " azimuth=azimuth,\n", 228 | " fps=60,\n", 229 | " 
output_file=gif_file\n", 230 | ")\n", 231 | "HTML(ani.to_jshtml())" 232 | ] 233 | }, 234 | { 235 | "cell_type": "markdown", 236 | "metadata": {}, 237 | "source": [ 238 | "## Convert 3D pose to BVH" 239 | ] 240 | }, 241 | { 242 | "cell_type": "code", 243 | "execution_count": null, 244 | "metadata": {}, 245 | "outputs": [], 246 | "source": [ 247 | "bvh_file = output_dir / f'{video_file.stem}.bvh'\n", 248 | "cmu_skel = cmu_skeleton.CMUSkeleton()\n", 249 | "channels, header = cmu_skel.poses2bvh(pose3d_world, output_file=bvh_file)" 250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": null, 255 | "metadata": {}, 256 | "outputs": [], 257 | "source": [ 258 | "output = 'miscs/h36m_cxk.bvh'\n", 259 | "h36m_skel = h36m_skeleton.H36mSkeleton()\n", 260 | "_ = h36m_skel.poses2bvh(pose3d_world, output_file=output)" 261 | ] 262 | } 263 | ], 264 | "metadata": { 265 | "kernelspec": { 266 | "display_name": "Python 3", 267 | "language": "python", 268 | "name": "python3" 269 | }, 270 | "language_info": { 271 | "codemirror_mode": { 272 | "name": "ipython", 273 | "version": 3 274 | }, 275 | "file_extension": ".py", 276 | "mimetype": "text/x-python", 277 | "name": "python", 278 | "nbconvert_exporter": "python", 279 | "pygments_lexer": "ipython3", 280 | "version": "3.6.8" 281 | } 282 | }, 283 | "nbformat": 4, 284 | "nbformat_minor": 2 285 | } 286 | -------------------------------------------------------------------------------- /miscs/cxk.mp4: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/cxk.mp4 -------------------------------------------------------------------------------- /miscs/cxk_cache/2d_pose.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/cxk_cache/2d_pose.npy -------------------------------------------------------------------------------- /miscs/cxk_cache/3d_pose.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/cxk_cache/3d_pose.npy -------------------------------------------------------------------------------- /miscs/demo/cxk_2d_pose.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/demo/cxk_2d_pose.gif -------------------------------------------------------------------------------- /miscs/demo/cxk_3d_pose.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/demo/cxk_3d_pose.gif -------------------------------------------------------------------------------- /miscs/demo/cxk_bvh.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/demo/cxk_bvh.gif -------------------------------------------------------------------------------- /miscs/demo/cxk_retargeting.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/demo/cxk_retargeting.gif 
-------------------------------------------------------------------------------- /miscs/demo/demo.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/demo/demo.gif -------------------------------------------------------------------------------- /miscs/girl_model/textures/brown_eye.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/girl_model/textures/brown_eye.png -------------------------------------------------------------------------------- /miscs/girl_model/textures/female_casualsuit02_ao.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/girl_model/textures/female_casualsuit02_ao.png -------------------------------------------------------------------------------- /miscs/girl_model/textures/female_casualsuit02_diffuse.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/girl_model/textures/female_casualsuit02_diffuse.png -------------------------------------------------------------------------------- /miscs/girl_model/textures/female_casualsuit02_normal.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/girl_model/textures/female_casualsuit02_normal.png -------------------------------------------------------------------------------- /miscs/girl_model/textures/middleage_lightskinned_female_diffuse2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/girl_model/textures/middleage_lightskinned_female_diffuse2.png -------------------------------------------------------------------------------- /miscs/girl_model/textures/ponytail01_diffuse.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/miscs/girl_model/textures/ponytail01_diffuse.png -------------------------------------------------------------------------------- /miscs/model_link.txt: -------------------------------------------------------------------------------- 1 | Baidu disk: 2 | https://pan.baidu.com/s/1-SRaS5FwC30-Pf_gL8bbXQ(code: fmpz) 3 | 4 | Google drive: 5 | https://drive.google.com/drive/folders/1M2s32xQkrDhDLz-VqzvocMuoaSGR1MfX?usp=sharin 6 | -------------------------------------------------------------------------------- /pose_estimator_2d/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/pose_estimator_2d/__init__.py -------------------------------------------------------------------------------- /pose_estimator_2d/estimator_2d.py: -------------------------------------------------------------------------------- 1 | import abc 2 | 3 | class Estimator2D(object): 4 | """Base class of 2D human pose estimator.""" 5 | 6 | def __init__(self): 7 | pass 8 | 9 | 
@abc.abstractclassmethod 10 | def estimate(self, img_list, bbox_list=None): 11 | """ 12 | Args: 13 | img_list: List of image read by opencv(channel order BGR). 14 | bbox_list: List of bounding-box (left_top x, left_top y, 15 | bbox_width, bbox_height). 16 | Return: 17 | keypoints_list: List of keypoint position (joint_num, x, y, 18 | confidence) 19 | """ 20 | pass -------------------------------------------------------------------------------- /pose_estimator_2d/openpose_estimator.py: -------------------------------------------------------------------------------- 1 | from .estimator_2d import Estimator2D 2 | from openpose import pyopenpose as op 3 | 4 | 5 | class OpenPoseEstimator(Estimator2D): 6 | 7 | def __init__(self, model_folder): 8 | """ 9 | OpenPose 2D pose estimator. See [https://github.com/ 10 | CMU-Perceptual-Computing-Lab/openpose/tree/ master/examples/ 11 | tutorial_api_python] for help. 12 | Args: 13 | """ 14 | super().__init__() 15 | params = {'model_folder': model_folder, 'render_pose': 0} 16 | self.opWrapper = op.WrapperPython() 17 | self.opWrapper.configure(params) 18 | self.opWrapper.start() 19 | 20 | def estimate(self, img_list, bbox_list=None): 21 | """See base class.""" 22 | keypoints_list = [] 23 | for i, img in enumerate(img_list): 24 | if bbox_list: 25 | x, y, w, h = bbox_list[i] 26 | img = img[y:y+h, x:x+w] 27 | datum = op.Datum() 28 | datum.cvInputData = img 29 | self.opWrapper.emplaceAndPop([datum]) 30 | keypoints = datum.poseKeypoints 31 | if bbox_list: 32 | # TODO: restore coordinate 33 | pass 34 | keypoints_list.append(datum.poseKeypoints) 35 | 36 | return keypoints_list -------------------------------------------------------------------------------- /pose_estimator_3d/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/pose_estimator_3d/__init__.py -------------------------------------------------------------------------------- /pose_estimator_3d/dataset/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/pose_estimator_3d/dataset/__init__.py -------------------------------------------------------------------------------- /pose_estimator_3d/dataset/wild_pose_dataset.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | 4 | 5 | class WildPoseDataset(object): 6 | 7 | def __init__(self, input_poses, seq_len, image_width, image_height): 8 | self.seq_len = seq_len 9 | self.input_poses = normalize_screen_coordiantes(input_poses, image_width, image_height) 10 | 11 | 12 | def __len__(self): 13 | return self.input_poses.shape[0] 14 | 15 | 16 | def __getitem__(self, idx): 17 | frame = idx 18 | start = frame - self.seq_len//2 19 | end = frame + self.seq_len//2 + 1 20 | 21 | valid_start = max(0, start) 22 | valid_end = min(self.input_poses.shape[0], end) 23 | pad = (valid_start - start, end - valid_end) 24 | input_pose = self.input_poses[valid_start:valid_end] 25 | if pad != (0, 0): 26 | input_pose = np.pad(input_pose, (pad, (0, 0), (0, 0)), 'edge') 27 | if input_pose.shape[0] == 1: 28 | # squeeze time dimension if sequence length is 1 29 | input_pose = np.squeeze(input_pose, axis=0) 30 | 31 | sample = { 'input_pose': input_pose } 32 | return sample 33 | 34 | 35 | def normalize_screen_coordiantes(pose, w, h): 36 | """ 37 | Args: 
38 | pose: numpy array with shape (joint, 2). 39 | Return: 40 | normalized pose that [0, WIDTH] is maped to [-1, 1] while preserving the aspect ratio. 41 | """ 42 | assert pose.shape[-1] == 2 43 | return pose/w*2 - [1, h/w] 44 | 45 | 46 | def flip_pose(pose, lefts, rights): 47 | if isinstance(pose, np.ndarray): 48 | p = pose.copy() 49 | elif isinstance(pose, torch.Tensor): 50 | p = pose.clone() 51 | else: 52 | raise TypeError(f'{type(pose)}') 53 | 54 | p[..., 0] *= -1 55 | p[..., lefts + rights, :] = p[..., rights + lefts, :] 56 | return p 57 | -------------------------------------------------------------------------------- /pose_estimator_3d/estimator_3d.py: -------------------------------------------------------------------------------- 1 | from .model.factory import create_model 2 | from .dataset.wild_pose_dataset import WildPoseDataset 3 | 4 | import numpy as np 5 | import pprint 6 | import torch 7 | import torch.utils.data 8 | import yaml 9 | from easydict import EasyDict 10 | 11 | 12 | class Estimator3D(object): 13 | """Base class of 3D human pose estimator.""" 14 | 15 | def __init__(self, config_file, checkpoint_file): 16 | with open(config_file, 'r') as f: 17 | print(f'=> Read 3D estimator config from {config_file}.') 18 | self.cfg = EasyDict(yaml.load(f, Loader=yaml.Loader)) 19 | pprint.pprint(self.cfg) 20 | self.model = create_model(self.cfg, checkpoint_file) 21 | self.device = torch.device( 22 | 'cuda' if torch.cuda.is_available() else 'cpu' 23 | ) 24 | print(f'=> Use device {self.device}.') 25 | self.model = self.model.to(self.device) 26 | 27 | def estimate(self, poses_2d, image_width, image_height): 28 | # pylint: disable=no-member 29 | dataset = WildPoseDataset( 30 | input_poses=poses_2d, 31 | seq_len=self.cfg.DATASET.SEQ_LEN, 32 | image_width=image_width, 33 | image_height=image_height 34 | ) 35 | loader = torch.utils.data.DataLoader( 36 | dataset=dataset, 37 | batch_size=self.cfg.TRAIN.BATCH_SIZE 38 | ) 39 | poses_3d = np.zeros((poses_2d.shape[0], self.cfg.DATASET.OUT_JOINT, 3)) 40 | frame = 0 41 | print('=> Begin to estimate 3D poses.') 42 | with torch.no_grad(): 43 | for batch in loader: 44 | input_pose = batch['input_pose'].float().cuda() 45 | 46 | output = self.model(input_pose) 47 | if self.cfg.DATASET.TEST_FLIP: 48 | input_lefts = self.cfg.DATASET.INPUT_LEFT_JOINTS 49 | input_rights = self.cfg.DATASET.INPUT_RIGHT_JOINTS 50 | output_lefts = self.cfg.DATASET.OUTPUT_LEFT_JOINTS 51 | output_rights = self.cfg.DATASET.OUTPUT_RIGHT_JOINTS 52 | 53 | flip_input_pose = input_pose.clone() 54 | flip_input_pose[..., :, 0] *= -1 55 | flip_input_pose[..., input_lefts + input_rights, :] = flip_input_pose[..., input_rights + input_lefts, :] 56 | 57 | flip_output = self.model(flip_input_pose) 58 | flip_output[..., :, 0] *= -1 59 | flip_output[..., output_lefts + output_rights, :] = flip_output[..., output_rights + output_lefts, :] 60 | 61 | output = (output + flip_output) / 2 62 | output[:, 0] = 0 # center the root joint 63 | output *= 1000 # m -> mm 64 | 65 | batch_size = output.shape[0] 66 | poses_3d[frame:frame+batch_size] = output.cpu().numpy() 67 | frame += batch_size 68 | print(f'{frame} / {poses_2d.shape[0]}') 69 | 70 | return poses_3d -------------------------------------------------------------------------------- /pose_estimator_3d/model/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/KevinLTT/video2bvh/312d18f53bf31c37adcaf07c97098b67dbf9804a/pose_estimator_3d/model/__init__.py 
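Estimator3D loads an EasyDict-style YAML config whose keys are consumed by estimator_3d.py above and factory.py below. The shipped models/openpose_video_pose_243f/video_pose.yaml is authoritative; the following is only a hypothetical sketch of its shape, with placeholder values and the joint-index lists elided.

```yaml
MODEL:
  NAME: video_pose                 # selects VideoPose in factory.create_model
  FILTER_WIDTHS: [3, 3, 3, 3, 3]   # 3^5 = 243-frame receptive field (assumed)
  HIDDEN_SIZE: 1024                # placeholder
  DROPOUT: 0.25                    # placeholder
  DSC: false                       # depthwise-separable convolutions on/off
DATASET:
  IN_JOINT: 25                     # OpenPose BODY_25 input
  IN_CHANNEL: 2
  OUT_JOINT: 17                    # Human3.6M output skeleton
  OUT_CHANNEL: 3
  SEQ_LEN: 243
  TEST_FLIP: true
  INPUT_LEFT_JOINTS: [...]         # left/right joint index lists elided
  INPUT_RIGHT_JOINTS: [...]
  OUTPUT_LEFT_JOINTS: [...]
  OUTPUT_RIGHT_JOINTS: [...]
TRAIN:
  BATCH_SIZE: 64                   # placeholder
```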
-------------------------------------------------------------------------------- /pose_estimator_3d/model/factory.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import yaml 3 | from easydict import EasyDict 4 | 5 | 6 | def create_model(cfg, checkpoint_file): 7 | if cfg.MODEL.NAME == 'linear_model': 8 | from .linear_model import LinearModel 9 | model = LinearModel( 10 | in_joint=cfg.DATASET.IN_JOINT, 11 | in_channel=cfg.DATASET.IN_CHANNEL, 12 | out_joint=cfg.DATASET.OUT_JOINT, 13 | out_channel=cfg.DATASET.OUT_CHANNEL, 14 | block_num=cfg.MODEL.BLOCK_NUM, 15 | hidden_size=cfg.MODEL.HIDDEN_SIZE, 16 | dropout=cfg.MODEL.DROPOUT, 17 | bias=cfg.MODEL.BIAS, 18 | residual=cfg.MODEL.RESIDUAL 19 | ) 20 | elif cfg.MODEL.NAME == 'video_pose': 21 | from .video_pose import VideoPose 22 | model = VideoPose( 23 | in_joint=cfg.DATASET.IN_JOINT, 24 | in_channel=cfg.DATASET.IN_CHANNEL, 25 | out_joint=cfg.DATASET.OUT_JOINT, 26 | out_channel=cfg.DATASET.OUT_CHANNEL, 27 | filter_widths=cfg.MODEL.FILTER_WIDTHS, 28 | hidden_size=cfg.MODEL.HIDDEN_SIZE, 29 | dropout=cfg.MODEL.DROPOUT, 30 | dsc=cfg.MODEL.DSC 31 | ) 32 | else: 33 | raise ValueError(f'Model name {cfg.MODEL.NAME} is invalid.') 34 | 35 | print(f'=> Load checkpoint {checkpoint_file}') 36 | pretrained_dict = torch.load(checkpoint_file)['model_state'] 37 | model_dict = model.state_dict() 38 | pretrained_dict = { 39 | k: v for k, v in pretrained_dict.items() 40 | if k in model_dict 41 | } 42 | model_dict.update(pretrained_dict) 43 | model.load_state_dict(model_dict) 44 | 45 | model = model.eval() 46 | 47 | return model 48 | -------------------------------------------------------------------------------- /pose_estimator_3d/model/linear_model.py: -------------------------------------------------------------------------------- 1 | from .module import ResidualBlock, get_activation 2 | 3 | import torch 4 | import torch.nn as nn 5 | 6 | 7 | class LinearModel(nn.Module): 8 | 9 | def __init__(self, in_joint, in_channel, out_joint, out_channel, block_num, hidden_size, 10 | activation='relu', dropout=0.25, bias=True, residual=True): 11 | super().__init__() 12 | 13 | self.in_joint = in_joint 14 | self.out_joint = out_joint 15 | self.out_channel = out_channel 16 | 17 | self.activation = get_activation(activation) 18 | self.drop = nn.Dropout(dropout) 19 | self.expand_fc = nn.Linear(in_joint*in_channel, hidden_size, bias=bias) 20 | self.expand_bn = nn.BatchNorm1d(hidden_size) 21 | self.blocks = nn.Sequential(*[ 22 | ResidualBlock(hidden_size, activation, dropout, residual, bias) 23 | for i in range(block_num) 24 | ]) 25 | self.shrink_fc = nn.Linear(hidden_size, out_joint*out_channel, bias=bias) 26 | 27 | 28 | def forward(self, x): 29 | batch_size = x.shape[0] 30 | x = x.view(batch_size, -1) 31 | 32 | x = self.drop(self.activation(self.expand_bn(self.expand_fc(x)))) 33 | x = self.blocks(x) 34 | x = self.shrink_fc(x) 35 | 36 | x = x.view(batch_size, self.out_joint, self.out_channel) 37 | return x -------------------------------------------------------------------------------- /pose_estimator_3d/model/module.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | 4 | 5 | def get_activation(name): 6 | if name == 'relu': 7 | return nn.ReLU() 8 | elif name == 'leaky_relu': 9 | return nn.LeakyReLU() 10 | else: 11 | raise ValueError(f'Activation "{name}" is invalid.') 12 | 13 | 14 | class ResidualBlock(nn.Module): 15 | 16 | def __init__(self, 
hidden_size, activation='relu', dropout=0, residual=True, bias=False): 17 | super(ResidualBlock, self).__init__() 18 | 19 | self.fc1 = nn.Linear(hidden_size, hidden_size, bias=bias) 20 | self.fc2 = nn.Linear(hidden_size, hidden_size, bias=bias) 21 | self.bn1 = nn.BatchNorm1d(hidden_size) 22 | self.bn2 = nn.BatchNorm1d(hidden_size) 23 | self.activation = get_activation(activation) 24 | self.drop = nn.Dropout(dropout) 25 | self.residual = lambda x: x if residual else 0 26 | 27 | 28 | def forward(self, x): 29 | res = self.residual(x) 30 | x = self.drop(self.activation(self.bn1(self.fc1(x)))) 31 | x = self.drop(self.activation(self.bn2(self.fc2(x)))) 32 | return x + res 33 | 34 | 35 | class DepthwiseSeparableConv1d(nn.Module): 36 | 37 | def __init__(self, in_channels, out_channels, kernel_size, 38 | bias=False, dilation=1, padding=0, stride=1): 39 | super(DepthwiseSeparableConv1d, self).__init__() 40 | 41 | self.depthwise_conv = nn.Conv1d( 42 | in_channels=in_channels, out_channels=in_channels, 43 | kernel_size=kernel_size, groups=in_channels, 44 | bias=bias, stride=stride, padding=padding, dilation=dilation,) 45 | self.pointwise_conv = nn.Conv1d( 46 | in_channels=in_channels, out_channels=out_channels, 47 | kernel_size=1, groups=1, 48 | bias=bias, stride=1, padding=0, dilation=1 49 | ) 50 | 51 | 52 | def forward(self, x): 53 | x = self.depthwise_conv(x) 54 | x = self.pointwise_conv(x) 55 | return x -------------------------------------------------------------------------------- /pose_estimator_3d/model/video_pose.py: -------------------------------------------------------------------------------- 1 | from .module import DepthwiseSeparableConv1d 2 | 3 | import torch 4 | import torch.nn as nn 5 | 6 | 7 | class VideoPose(nn.Module): 8 | 9 | def __init__(self, in_joint, in_channel, out_joint, out_channel, filter_widths, hidden_size, dropout, dsc): 10 | super().__init__() 11 | 12 | self.train_model = None 13 | self.eval_model = TemporalModel( 14 | in_joint, in_channel, out_joint, out_channel, filter_widths, hidden_size, dropout, dsc 15 | ) 16 | self.current_model = self.eval_model 17 | 18 | def forward(self, x): 19 | return self.current_model(x) 20 | 21 | 22 | class TemporalModelBase(nn.Module): 23 | """ 24 | Do not instantiate this class. 
25 | """ 26 | 27 | def __init__(self, in_joint, in_channel, out_joint, out_channel, filter_widths, hidden_size, dropout, dsc): 28 | super().__init__() 29 | 30 | # Validate input 31 | for fw in filter_widths: 32 | assert fw % 2 != 0, 'Only odd filter widths are supported' 33 | 34 | self.in_joint = in_joint 35 | self.out_joint = out_joint 36 | self.filter_widths = filter_widths 37 | self.out_channel = out_channel 38 | 39 | self.drop = nn.Dropout(dropout) 40 | self.relu = nn.ReLU(inplace=True) 41 | 42 | self.pad = [ filter_widths[0] // 2 ] 43 | self.expand_bn = nn.BatchNorm1d(hidden_size, momentum=0.1) 44 | self.shrink = nn.Conv1d(hidden_size, out_joint * out_channel, 1) 45 | 46 | 47 | def set_bn_momentum(self, momentum): 48 | self.expand_bn.momentum = momentum 49 | for bn in self.layers_bn: 50 | bn.momentum = momentum 51 | 52 | 53 | def forward(self, x): 54 | assert len(x.shape) == 4 55 | assert x.shape[-2] == self.in_joint 56 | 57 | batch_size, seq_len, joint, channel = x.shape 58 | x = x.view(batch_size, seq_len, -1) 59 | x = x.permute(0, 2, 1) # channel first 60 | 61 | x = self._forward_blocks(x) 62 | 63 | x = x.permute(0, 2, 1) # channel last 64 | x = x.view(batch_size, self.out_joint, self.out_channel) 65 | return x 66 | 67 | 68 | class TemporalModel(TemporalModelBase): 69 | """ 70 | Reference 3D pose estimation model with temporal convolutions. 71 | This implementation can be used for all use-cases. 72 | """ 73 | 74 | def __init__(self, in_joint, in_channel, out_joint, out_channel, filter_widths, hidden_size, dropout, dsc): 75 | super().__init__(in_joint, in_channel, out_joint, out_channel, 76 | filter_widths, hidden_size, dropout, dsc) 77 | 78 | self.expand_conv = nn.Conv1d(in_joint*in_channel, hidden_size, filter_widths[0], bias=False) 79 | 80 | layers_conv = [] 81 | layers_bn = [] 82 | 83 | next_dilation = filter_widths[0] 84 | conv_class = DepthwiseSeparableConv1d if dsc else nn.Conv1d 85 | for i in range(1, len(filter_widths)): 86 | self.pad.append((filter_widths[i] - 1)*next_dilation // 2) 87 | layers_conv.append(conv_class( 88 | hidden_size, hidden_size, filter_widths[i], dilation=next_dilation, bias=False 89 | )) 90 | layers_bn.append(nn.BatchNorm1d(hidden_size, momentum=0.1)) 91 | layers_conv.append(nn.Conv1d(hidden_size, hidden_size, 1, dilation=1, bias=False)) 92 | layers_bn.append(nn.BatchNorm1d(hidden_size, momentum=0.1)) 93 | 94 | next_dilation *= filter_widths[i] 95 | 96 | self.layers_conv = nn.ModuleList(layers_conv) 97 | self.layers_bn = nn.ModuleList(layers_bn) 98 | 99 | def _forward_blocks(self, x): 100 | x = self.drop(self.relu(self.expand_bn(self.expand_conv(x)))) 101 | 102 | for i in range(len(self.pad) - 1): 103 | pad = self.pad[i+1] 104 | res = x[:, :, pad : x.shape[2] - pad] 105 | 106 | x = self.drop(self.relu(self.layers_bn[2*i](self.layers_conv[2*i](x)))) 107 | x = res + self.drop(self.relu(self.layers_bn[2*i + 1](self.layers_conv[2*i + 1](x)))) 108 | 109 | x = self.shrink(x) 110 | return x 111 | 112 | 113 | class TemporalModelOptimized1f(TemporalModelBase): 114 | """ 115 | 3D pose estimation model optimized for single-frame batching, i.e. 116 | where batches have input length = receptive field, and output length = 1. 117 | This scenario is only used for training when stride == 1. 118 | 119 | This implementation replaces dilated convolutions with strided convolutions 120 | to avoid generating unused intermediate results. The weights are interchangeable 121 | with the reference implementation. 
122 | """ 123 | 124 | def __init__(self, in_joint, in_channel, out_joint, out_channel, 125 | filter_widths, hidden_size, dropout, dsc): 126 | super().__init__(in_joint, in_channel, out_joint, out_channel, 127 | filter_widths, hidden_size, dropout, dsc) 128 | 129 | self.expand_conv = nn.Conv1d(in_joint*in_channel, hidden_size, filter_widths[0], 130 | stride=filter_widths[0], bias=False) 131 | 132 | layers_conv = [] 133 | layers_bn = [] 134 | 135 | next_dilation = filter_widths[0] 136 | conv_class = DepthwiseSeparableConv1d if dsc else nn.Conv1d 137 | for i in range(1, len(filter_widths)): 138 | self.pad.append((filter_widths[i] - 1)*next_dilation // 2) 139 | layers_conv.append(conv_class( 140 | hidden_size, hidden_size, filter_widths[i], stride=filter_widths[i], bias=False 141 | )) 142 | layers_bn.append(nn.BatchNorm1d(hidden_size, momentum=0.1)) 143 | layers_conv.append(nn.Conv1d(hidden_size, hidden_size, 1, dilation=1, bias=False)) 144 | layers_bn.append(nn.BatchNorm1d(hidden_size, momentum=0.1)) 145 | next_dilation *= filter_widths[i] 146 | 147 | self.layers_conv = nn.ModuleList(layers_conv) 148 | self.layers_bn = nn.ModuleList(layers_bn) 149 | 150 | def _forward_blocks(self, x): 151 | x = self.drop(self.relu(self.expand_bn(self.expand_conv(x)))) 152 | 153 | for i in range(len(self.pad) - 1): 154 | res = x[:, :, self.filter_widths[i+1]//2 :: self.filter_widths[i+1]] 155 | 156 | x = self.drop(self.relu(self.layers_bn[2*i](self.layers_conv[2*i](x)))) 157 | x = res + self.drop(self.relu(self.layers_bn[2*i + 1](self.layers_conv[2*i + 1](x)))) 158 | 159 | x = self.shrink(x) 160 | return x -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- 1 | from . 
import smooth, camera, vis -------------------------------------------------------------------------------- /utils/camera.py: -------------------------------------------------------------------------------- 1 | import h5py 2 | import numpy as np 3 | from pathlib import Path 4 | 5 | def load_camera_params(file): 6 | cam_file = Path(file) 7 | cam_params = {} 8 | azimuth = { 9 | '54138969': 70, '55011271': -70, '58860488': 110, '60457274': -100 10 | } 11 | with h5py.File(cam_file, 'r') as f: 12 | subjects = [1, 5, 6, 7, 8, 9, 11] 13 | for s in subjects: 14 | cam_params[f'S{s}'] = {} 15 | for _, params in f[f'subject{s}'].items(): 16 | name = params['Name'] 17 | name = ''.join([chr(c) for c in name]) 18 | val = {} 19 | val['R'] = np.array(params['R']) 20 | val['T'] = np.array(params['T']) 21 | val['c'] = np.array(params['c']) 22 | val['f'] = np.array(params['f']) 23 | val['k'] = np.array(params['k']) 24 | val['p'] = np.array(params['p']) 25 | val['azimuth'] = azimuth[name] 26 | cam_params[f'S{s}'][name] = val 27 | 28 | return cam_params 29 | 30 | 31 | def world2camera(pose, R, T): 32 | """ 33 | Args: 34 | pose: numpy array with shape (..., 3) 35 | R: numpy array with shape (3, 3) 36 | T: numpy array with shape (3, 1) 37 | """ 38 | assert pose.shape[-1] == 3 39 | original_shape = pose.shape 40 | pose_world = pose.copy().reshape((-1, 3)).T 41 | pose_cam = np.matmul(R.T, pose_world - T) 42 | pose_cam = pose_cam.T.reshape(original_shape) 43 | return pose_cam 44 | 45 | 46 | def camera2world(pose, R, T): 47 | """ 48 | Args: 49 | pose: numpy array with shape (..., 3) 50 | R: numpy array with shape (3, 3) 51 | T: numpy array with shape (3, 1) 52 | """ 53 | assert pose.shape[-1] == 3 54 | original_shape = pose.shape 55 | pose_cam = pose.copy().reshape((-1, 3)).T 56 | pose_world = np.matmul(R, pose_cam) + T 57 | pose_world = pose_world.T.reshape(original_shape) 58 | return pose_world 59 | -------------------------------------------------------------------------------- /utils/smooth.py: -------------------------------------------------------------------------------- 1 | def filter_missing_value(keypoints_list, method='ignore'): 2 | # TODO: implement 'interpolate' method. 3 | """Filter missing values in the pose list. 4 | Args: 5 | keypoints_list: Estimation result returned by the 2D estimator. Missing 6 | entries are None. 7 | method: 'ignore' -> drop missing values. 8 | Return: 9 | Keypoints list without missing values. 10 | """ 11 | 12 | result = [] 13 | if method == 'ignore': 14 | result = [pose for pose in keypoints_list if pose is not None] 15 | else: 16 | raise ValueError(f'{method} is not a valid method.') 17 | 18 | return result -------------------------------------------------------------------------------- /utils/vis.py: -------------------------------------------------------------------------------- 1 | from . import camera 2 | 3 | import cv2 4 | import numpy as np 5 | import os 6 | from pathlib import Path 7 | 8 | import matplotlib.pyplot as plt 9 | import mpl_toolkits.mplot3d.axes3d 10 | from matplotlib.animation import FuncAnimation, writers 11 | 12 | 13 | def vis_2d_keypoints( 14 | keypoints, img, skeleton, kp_thresh, 15 | alpha=0.7, output_file=None, show_name=False): 16 | 17 | # Convert from plt 0-1 RGBA colors to 0-255 BGR colors for opencv.
18 | cmap = plt.get_cmap('rainbow') 19 | colors = [cmap(i) for i in np.linspace(0, 1, skeleton.keypoint_num)] 20 | colors = [(c[2] * 255, c[1] * 255, c[0] * 255) for c in colors] 21 | 22 | mask = img.copy() 23 | root = skeleton.root 24 | stack = [root] 25 | while stack: 26 | parent = stack.pop() 27 | p_idx = skeleton.keypoint2index[parent] 28 | p_pos = int(keypoints[p_idx, 0]), int(keypoints[p_idx, 1]) 29 | p_score = keypoints[p_idx, 2] if kp_thresh is not None else None 30 | if kp_thresh is None or p_score > kp_thresh: 31 | cv2.circle( 32 | mask, p_pos, radius=3, 33 | color=colors[p_idx], thickness=-1, lineType=cv2.LINE_AA) 34 | if show_name: 35 | cv2.putText(mask, parent, p_pos, cv2.FONT_HERSHEY_SIMPLEX, 36 | 0.5, (0, 255, 0)) 37 | for child in skeleton.children[parent]: 38 | if child not in skeleton.keypoint2index or \ 39 | skeleton.keypoint2index[child] < 0: 40 | continue 41 | stack.append(child) 42 | c_idx = skeleton.keypoint2index[child] 43 | c_pos = int(keypoints[c_idx, 0]), int(keypoints[c_idx, 1]) 44 | c_score = keypoints[c_idx, 2] if kp_thresh else None 45 | if kp_thresh is None or \ 46 | (p_score > kp_thresh and c_score > kp_thresh): 47 | cv2.line( 48 | mask, p_pos, c_pos, 49 | color=colors[c_idx], thickness=2, lineType=cv2.LINE_AA) 50 | 51 | vis_result = cv2.addWeighted(img, 1.0 - alpha, mask, alpha, 0) 52 | if output_file: 53 | file = Path(output_file) 54 | if not file.parent.exists(): 55 | os.makedirs(file.parent) 56 | cv2.imwrite(str(output_file), vis_result) 57 | 58 | return vis_result 59 | 60 | 61 | def vis_3d_keypoints( keypoints, skeleton, azimuth, elev=15): 62 | x_max, x_min = np.max(keypoints[:, 0]), np.min(keypoints[:, 0]) 63 | y_max, y_min = np.max(keypoints[:, 1]), np.min(keypoints[:, 1]) 64 | z_max, z_min = np.max(keypoints[:, 2]), np.min(keypoints[:, 2]) 65 | radius = max(x_max - x_min, y_max - y_min, z_max - z_min) / 2 66 | 67 | fig = plt.figure() 68 | ax = fig.add_subplot(111, projection='3d') 69 | ax.view_init(elev=elev, azim=azimuth) 70 | ax.set_xlim3d([-radius, radius]) 71 | ax.set_ylim3d([-radius, radius]) 72 | ax.set_zlim3d([0, 2 * radius]) 73 | 74 | root = skeleton.root 75 | stack = [root] 76 | while stack: 77 | parent = stack.pop() 78 | p_idx = skeleton.keypoint2index[parent] 79 | p_pos = keypoints[p_idx] 80 | for child in skeleton.children[parent]: 81 | if skeleton.keypoint2index.get(child, -1) == -1: 82 | continue 83 | stack.append(child) 84 | c_idx = skeleton.keypoint2index[child] 85 | c_pos = keypoints[c_idx] 86 | if child in skeleton.left_joints: 87 | color = 'b' 88 | elif child in skeleton.right_joints: 89 | color = 'r' 90 | else: 91 | color = 'k' 92 | line = ax.plot( 93 | xs=[p_pos[0], c_pos[0]], 94 | ys=[p_pos[1], c_pos[1]], 95 | zs=[p_pos[2], c_pos[2]], 96 | c=color, marker='.', zdir='z' 97 | ) 98 | 99 | return 100 | 101 | 102 | def vis_3d_keypoints_sequence( 103 | keypoints_sequence, skeleton, azimuth, 104 | fps=30, elev=15, output_file=None 105 | ): 106 | kps_sequence = keypoints_sequence 107 | x_max, x_min = np.max(kps_sequence[:, :, 0]), np.min(kps_sequence[:, :, 0]) 108 | y_max, y_min = np.max(kps_sequence[:, :, 1]), np.min(kps_sequence[:, :, 1]) 109 | z_max, z_min = np.max(kps_sequence[:, :, 2]), np.min(kps_sequence[:, :, 2]) 110 | radius = max(x_max - x_min, y_max - y_min, z_max - z_min) / 2 111 | 112 | fig = plt.figure() 113 | ax = fig.add_subplot(111, projection='3d') 114 | ax.view_init(elev=elev, azim=azimuth) 115 | ax.set_xlim3d([-radius, radius]) 116 | ax.set_ylim3d([-radius, radius]) 117 | ax.set_zlim3d([0, 2 * radius]) 118 | 119 | 
initialized = False 120 | lines = [] 121 | 122 | def update(frame): 123 | nonlocal initialized 124 | 125 | if not initialized: 126 | root = skeleton.root 127 | stack = [root] 128 | while stack: 129 | parent = stack.pop() 130 | p_idx = skeleton.keypoint2index[parent] 131 | p_pos = kps_sequence[0, p_idx] 132 | for child in skeleton.children[parent]: 133 | if skeleton.keypoint2index.get(child, -1) == -1: 134 | continue 135 | stack.append(child) 136 | c_idx = skeleton.keypoint2index[child] 137 | c_pos = kps_sequence[0, c_idx] 138 | if child in skeleton.left_joints: 139 | color = 'b' 140 | elif child in skeleton.right_joints: 141 | color = 'r' 142 | else: 143 | color = 'k' 144 | line = ax.plot( 145 | xs=[p_pos[0], c_pos[0]], 146 | ys=[p_pos[1], c_pos[1]], 147 | zs=[p_pos[2], c_pos[2]], 148 | c=color, marker='.', zdir='z' 149 | ) 150 | lines.append(line) 151 | initialized = True 152 | else: 153 | line_idx = 0 154 | root = skeleton.root 155 | stack = [root] 156 | while stack: 157 | parent = stack.pop() 158 | p_idx = skeleton.keypoint2index[parent] 159 | p_pos = kps_sequence[frame, p_idx] 160 | for child in skeleton.children[parent]: 161 | if skeleton.keypoint2index.get(child, -1) == -1: 162 | continue 163 | stack.append(child) 164 | c_idx = skeleton.keypoint2index[child] 165 | c_pos = kps_sequence[frame, c_idx] 166 | if child in skeleton.left_joints: 167 | color = 'b' 168 | elif child in skeleton.right_joints: 169 | color = 'r' 170 | else: 171 | color = 'k' 172 | lines[line_idx][0].set_xdata([p_pos[0], c_pos[0]]) 173 | lines[line_idx][0].set_ydata([p_pos[1], c_pos[1]]) 174 | lines[line_idx][0].set_3d_properties( [p_pos[2], c_pos[2]]) 175 | line_idx += 1 176 | 177 | anim = FuncAnimation( 178 | fig=fig, func=update, frames=kps_sequence.shape[0], interval=1000 / fps 179 | ) 180 | 181 | if output_file: 182 | output_file = Path(output_file) 183 | if not output_file.parent.exists(): 184 | os.makedirs(output_file.parent) 185 | if output_file.suffix == '.mp4': 186 | Writer = writers['ffmpeg'] 187 | writer = Writer(fps=fps, metadata={}, bitrate=3000) 188 | anim.save(output_file, writer=writer) 189 | elif output_file.suffix == '.gif': 190 | anim.save(output_file, dpi=80, writer='imagemagick') 191 | else: 192 | raise ValueError(f'Unsupported output format.' 193 | f'Only mp4 and gif are supported.') 194 | 195 | return anim 196 | --------------------------------------------------------------------------------
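To round off the utilities above, here is a minimal sketch that lifts the cached camera-frame 3D poses into world coordinates with `utils.camera` and renders them with `utils.vis`. The cached file path, the choice of Human3.6M subject/camera, the output path, and the `H36mSkeleton` class name (inferred from `bvh_skeleton/h36m_skeleton.py`) are assumptions, not guarantees from the source.

```python
import numpy as np

from utils import camera, vis
from bvh_skeleton import h36m_skeleton  # H36mSkeleton class name assumed from the module layout

# Cached 3D poses from the demo, assumed to be in camera coordinates with shape (frames, 17, 3).
poses_3d = np.load('miscs/cxk_cache/3d_pose.npy')

# Pick one Human3.6M camera; the subject/camera choice here is arbitrary.
params = camera.load_camera_params('cameras.h5')['S1']['54138969']

# Rotate/translate the whole sequence into world coordinates.
poses_world = camera.camera2world(pose=poses_3d, R=params['R'], T=params['T'])
poses_world[:, :, 2] -= np.min(poses_world[:, :, 2])  # drop the sequence onto the ground plane

# Render an animated 3D skeleton; a .gif suffix uses the imagemagick writer, .mp4 uses ffmpeg.
vis.vis_3d_keypoints_sequence(
    keypoints_sequence=poses_world,
    skeleton=h36m_skeleton.H36mSkeleton(),
    azimuth=params['azimuth'],
    fps=30,
    output_file='miscs/demo/cxk_world_pose.gif',  # hypothetical output path
)
```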