├── LICENSE ├── README.md ├── assets ├── grasp_fanta │ ├── depths.npy │ └── rgbs.npy ├── imgs │ └── mainfig.png ├── pick_up_phone │ ├── depths.npy │ └── rgbs.npy └── utils │ ├── gripper_point_cloud_dense.npy │ ├── head_cam_intrinsic_matrix_aligned_depth.npy │ └── intrinsics_rgb_d455.npy ├── conda_env.yaml ├── global_vars.py ├── gripper_point_cloud_dense.npy ├── renderer.py ├── requirements.txt ├── util_functions.py ├── video_to_gripper_hamer_kpts.py └── vizualizer.ipynb /LICENSE: -------------------------------------------------------------------------------- 1 | BSD 3-Clause License 2 | 3 | Copyright (c) 2025, Georgios Papagiannis 4 | 5 | Redistribution and use in source and binary forms, with or without 6 | modification, are permitted provided that the following conditions are met: 7 | 8 | 1. Redistributions of source code must retain the above copyright notice, this 9 | list of conditions and the following disclaimer. 10 | 11 | 2. Redistributions in binary form must reproduce the above copyright notice, 12 | this list of conditions and the following disclaimer in the documentation 13 | and/or other materials provided with the distribution. 14 | 15 | 3. Neither the name of the copyright holder nor the names of its 16 | contributors may be used to endorse or promote products derived from 17 | this software without specific prior written permission. 18 | 19 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 20 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 21 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 22 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 23 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 24 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 25 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 26 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 27 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 28 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 29 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## 🦾 R+X: Retrieval and Execution from Everyday Human Videos 2 | ## 🎥 Extracting robotic actions from human video demonstrations 3 | 4 | ![teaser](./assets/imgs/mainfig.png) 5 | 6 | This repository provides the code used to extract robot actions given a human video demonstration for the paper **R+X: Retrieval and Execution from Everyday Human Videos** [[paper](https://arxiv.org/abs/2407.12957)]. R+X is divided into a Retrieval and an Execution phase. The execution phase leverages the actions extracted from the human videos to execute tasks zero-shot in new settings. R+X's execution phase relies on Keypoint Action Tokens (KAT). For code on KAT please refer to: [colab](https://colab.research.google.com/drive/1ZxZDahvLBE2EGo8_98Lk0Zr8IvUIDjMc?usp=sharing). This is a simplified version in that it is not optimised to process large scale video datasets in parallel, instead it sequentially processes each frame in a video, runs on cpu (apart from the hand-tracking model) and acts as a good starting point demonstrating how to extract actions from human videos. We provide one video of the "grasp franta" and "pick up phone" tasks used in our experiments. Even though our experiments involved a human recording videos while moving, here we provide an example with a moving camera and one with a static camera each with different intrinsics. 7 | 8 | ### 📋 Installation 9 | 10 | Download repository: 11 | 12 | ```bash 13 | git clone https://github.com/gpapagiannis/r-plus-x-hand2actions.git 14 | cd r-plus-x-hand2actions 15 | ``` 16 | 17 | Create a conda environment. Needs python >= 3.10: 18 | 19 | ```bash 20 | conda env create -f conda_env.yaml 21 | conda activate rpx 22 | ``` 23 | 24 | Install requirements.txt 25 | 26 | ```bash 27 | pip install -r requirements.txt 28 | ``` 29 | 30 | - **Install** HaMeR for hand tracking. Follow the instructions in [https://github.com/geopavlakos/hamer](https://github.com/geopavlakos/hamer) 31 | - After installing HaMeR, go to: **_DATA > hamer_ckpts > model_config.yaml**. Then in the model_config.yaml set: **FOCAL_LENGTH = 918**. This ensures compatibility with the camera intrinsics used for the provided examples. We will fix that soon to be set automatically given any intrinsics matrix. 32 | - After installing HaMeR, **replace** the renderer.py file in "hamer/hamer/utils/renderer.py" with the renderer.py provided here [renderer.py](renderer.py) 33 | - **Install** the Pytorch implementation of MANO hand model [https://github.com/otaheri/MANO](https://github.com/otaheri/MANO) 34 | 35 | 36 | 37 | 38 | ### 👟 Running the code ... 39 | 40 | The whole method is contained in [video_to_gripper_hamer_kpts](video_to_gripper_hamer_kpts.py). Just run: 41 | 42 | ```bash 43 | python video_to_gripper_hamer_kpts.py 44 | ``` 45 | By default this will extract the actions for the "grasp_fanta" human video demo found [here](./assets/grasp_fanta/). For the pick_up_phone task (and its corresponding intrinsics), as well as to vizualize step by step the hand extraction process see the [global_vars.py](global_vars.py) file. 46 | 47 | 48 | ### ✏️ Important Note 49 | 50 | Note: Our method is optimised for the Robotiq Gripper 2F-85 which was used for our experiments. The model we used is in [gripper_point_cloud_dense.npy](gripper_point_cloud_dense.npy). In principle, the heuristics used to map the gripper to the hand could be applied to most parallel jaw gripper, however you would need to manually define the gripper points you would like to map the hand to. We have done this in line 552 of [video_to_gripper_hamer_kpts.py](video_to_gripper_hamer_kpts.py). \ 51 | 52 | Line 552: 53 | 54 | ```bash 55 | dense_pcd_kpts = {"index_front": 517980, "thumb_front": 248802, "wrist": 246448} 56 | ``` 57 | 58 | The numbers correspond to the indices of the points in the point cloud: [gripper_point_cloud_dense.npy](gripper_point_cloud_dense.npy). You can vizualize the action extraction process both in our code and by visiting the bottom of our website: [https://www.robot-learning.uk/r-plus-x](https://www.robot-learning.uk/r-plus-x). If you use a different parallel jaw gripper, convert the mesh to a point cloud, select the indices you would like to map from the gripper to the tip of the index, the tip of the thumb and the mean distance between index mcp and the thumb dip of the hand. 59 | 60 | ### 🎞️ Vizualizing the results 61 | 62 | The notebook [vizualizer.ipynb](vizualizer.ipynb) provides minimal code to vizualize the hand to gripper action extraction results. For each task a scene_files folder is created that contains for each video frame, the frame's point cloud, the detected hand mesh, the gripper pose in the camera's frame, the gripper's point cloud and a file hand_joints_kpts_3d.npy that includes the sequence of gripper poses which could be used for further processing to train a model. 63 | -------------------------------------------------------------------------------- /assets/grasp_fanta/depths.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gpapagiannis/r-plus-x-hand2actions/b269966cb99bd7b5031dd1d4e7c4727c38055a87/assets/grasp_fanta/depths.npy -------------------------------------------------------------------------------- /assets/grasp_fanta/rgbs.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gpapagiannis/r-plus-x-hand2actions/b269966cb99bd7b5031dd1d4e7c4727c38055a87/assets/grasp_fanta/rgbs.npy -------------------------------------------------------------------------------- /assets/imgs/mainfig.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gpapagiannis/r-plus-x-hand2actions/b269966cb99bd7b5031dd1d4e7c4727c38055a87/assets/imgs/mainfig.png -------------------------------------------------------------------------------- /assets/pick_up_phone/depths.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gpapagiannis/r-plus-x-hand2actions/b269966cb99bd7b5031dd1d4e7c4727c38055a87/assets/pick_up_phone/depths.npy -------------------------------------------------------------------------------- /assets/pick_up_phone/rgbs.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gpapagiannis/r-plus-x-hand2actions/b269966cb99bd7b5031dd1d4e7c4727c38055a87/assets/pick_up_phone/rgbs.npy -------------------------------------------------------------------------------- /assets/utils/gripper_point_cloud_dense.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gpapagiannis/r-plus-x-hand2actions/b269966cb99bd7b5031dd1d4e7c4727c38055a87/assets/utils/gripper_point_cloud_dense.npy -------------------------------------------------------------------------------- /assets/utils/head_cam_intrinsic_matrix_aligned_depth.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gpapagiannis/r-plus-x-hand2actions/b269966cb99bd7b5031dd1d4e7c4727c38055a87/assets/utils/head_cam_intrinsic_matrix_aligned_depth.npy -------------------------------------------------------------------------------- /assets/utils/intrinsics_rgb_d455.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gpapagiannis/r-plus-x-hand2actions/b269966cb99bd7b5031dd1d4e7c4727c38055a87/assets/utils/intrinsics_rgb_d455.npy -------------------------------------------------------------------------------- /conda_env.yaml: -------------------------------------------------------------------------------- 1 | name: rpx 2 | channels: 3 | - pytorch 4 | - nvidia 5 | - conda-forge 6 | dependencies: 7 | - python=3.10.12 -------------------------------------------------------------------------------- /global_vars.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import os 3 | 4 | task_folder = "assets/grasp_fanta" # folder with the task, also do "assets/pick_up_phone" for pick up phone. if you change task uncomment the corresponding intrinsics matrix below. 5 | VIZ = False # visualize the action extraction/hand tracking process 6 | IS_PRESS_TASK = False # extract robot action for press task 7 | LOAD_SCENE_DATA_FOR_PROCESSING=False # load scene data after extracting hand poses to create videos etc. for vizualization 8 | INTRINSICS_REAL_CAMERA = np.load("assets/utils/intrinsics_rgb_d455.npy") # intrinsics for camera used for grasp_fanta 9 | # INTRINSICS_REAL_CAMERA = np.load("assets/utils/head_cam_intrinsic_matrix_aligned_depth.npy") # intrinsics for camera used for pick_up_phone 10 | VIZ_DEMO = False # vizualize video of the demonstration before processing 11 | START = 0 # which part of the video to use start:end 12 | END = 1000000 13 | FRAME_STEP = 1 # use every nth frame 14 | hands_rgb = np.load(f"{task_folder}/rgbs.npy")[START:END] 15 | hands_depth = np.load(f"{task_folder}/depths.npy")[START:END] 16 | MODEL_MANO_PATH = 'hamer/_DATA/_DATA/data/mano' # path to mano model 17 | SCENE_FILES_FOLDER = f"{task_folder}/scene_files" 18 | os.makedirs(SCENE_FILES_FOLDER, exist_ok=True) # make dir if not exists 19 | INTRINSICS_HAMER_RENDERER = np.eye(4) 20 | INTRINSICS_HAMER_RENDERER[0 ,0] = 2295.0 21 | INTRINSICS_HAMER_RENDERER[1, 1] = 2295.0 22 | INTRINSICS_HAMER_RENDERER[0, 2] = 320.0 23 | INTRINSICS_HAMER_RENDERER[1, 2] = 240.0 24 | # probably do not need to change 25 | T_OPENGL_TO_OPENCV = np.array([[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]]) 26 | HUMAN_HAND_COLOR=(0.999, 0.6745, 0.4117) 27 | MANO_HAND_IDS = {"wrist": 0, "index_mcp": 1, "index_pip": 2, 28 | "index_dip": 3, "middle_mcp": 4, "middle_pip": 5, 29 | "middle_dip": 6, "pinkie_mcp": 7, "pinkie_pip": 8, 30 | "pinkie_dip": 9, "ring_mcp": 10, "ring_pip": 11, 31 | "ring_dip": 12, "thumb_mcp": 13, "thumb_pip": 14, 32 | "thumb_dip": 15, "thumb_tip": 16, "index_tip": 17, 33 | "middle_tip": 18, "ring_tip": 19, "pinky_tip": 20} 34 | DISTANCE_BETWEEN_GRIPPERS_FINGERS = 0.08507 35 | 36 | -------------------------------------------------------------------------------- /gripper_point_cloud_dense.npy: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/gpapagiannis/r-plus-x-hand2actions/b269966cb99bd7b5031dd1d4e7c4727c38055a87/gripper_point_cloud_dense.npy -------------------------------------------------------------------------------- /renderer.py: -------------------------------------------------------------------------------- 1 | import os 2 | if 'PYOPENGL_PLATFORM' not in os.environ: 3 | os.environ['PYOPENGL_PLATFORM'] = 'egl' 4 | import torch 5 | import numpy as np 6 | import pyrender 7 | import trimesh 8 | import cv2 9 | from yacs.config import CfgNode 10 | from typing import List, Optional 11 | 12 | def cam_crop_to_full(cam_bbox, box_center, box_size, img_size, focal_length=5000.): 13 | # Convert cam_bbox to full image 14 | img_w, img_h = img_size[:, 0], img_size[:, 1] 15 | cx, cy, b = box_center[:, 0], box_center[:, 1], box_size 16 | w_2, h_2 = img_w / 2., img_h / 2. 17 | bs = b * cam_bbox[:, 0] + 1e-9 18 | tz = 2 * focal_length / bs 19 | tx = (2 * (cx - w_2) / bs) + cam_bbox[:, 1] 20 | ty = (2 * (cy - h_2) / bs) + cam_bbox[:, 2] 21 | full_cam = torch.stack([tx, ty, tz], dim=-1) 22 | return full_cam 23 | 24 | def get_light_poses(n_lights=5, elevation=np.pi / 3, dist=12): 25 | # get lights in a circle around origin at elevation 26 | thetas = elevation * np.ones(n_lights) 27 | phis = 2 * np.pi * np.arange(n_lights) / n_lights 28 | poses = [] 29 | trans = make_translation(torch.tensor([0, 0, dist])) 30 | for phi, theta in zip(phis, thetas): 31 | rot = make_rotation(rx=-theta, ry=phi, order="xyz") 32 | poses.append((rot @ trans).numpy()) 33 | return poses 34 | 35 | def make_translation(t): 36 | return make_4x4_pose(torch.eye(3), t) 37 | 38 | def make_rotation(rx=0, ry=0, rz=0, order="xyz"): 39 | Rx = rotx(rx) 40 | Ry = roty(ry) 41 | Rz = rotz(rz) 42 | if order == "xyz": 43 | R = Rz @ Ry @ Rx 44 | elif order == "xzy": 45 | R = Ry @ Rz @ Rx 46 | elif order == "yxz": 47 | R = Rz @ Rx @ Ry 48 | elif order == "yzx": 49 | R = Rx @ Rz @ Ry 50 | elif order == "zyx": 51 | R = Rx @ Ry @ Rz 52 | elif order == "zxy": 53 | R = Ry @ Rx @ Rz 54 | return make_4x4_pose(R, torch.zeros(3)) 55 | 56 | def make_4x4_pose(R, t): 57 | """ 58 | :param R (*, 3, 3) 59 | :param t (*, 3) 60 | return (*, 4, 4) 61 | """ 62 | dims = R.shape[:-2] 63 | pose_3x4 = torch.cat([R, t.view(*dims, 3, 1)], dim=-1) 64 | bottom = ( 65 | torch.tensor([0, 0, 0, 1], device=R.device) 66 | .reshape(*(1,) * len(dims), 1, 4) 67 | .expand(*dims, 1, 4) 68 | ) 69 | return torch.cat([pose_3x4, bottom], dim=-2) 70 | 71 | 72 | def rotx(theta): 73 | return torch.tensor( 74 | [ 75 | [1, 0, 0], 76 | [0, np.cos(theta), -np.sin(theta)], 77 | [0, np.sin(theta), np.cos(theta)], 78 | ], 79 | dtype=torch.float32, 80 | ) 81 | 82 | 83 | def roty(theta): 84 | return torch.tensor( 85 | [ 86 | [np.cos(theta), 0, np.sin(theta)], 87 | [0, 1, 0], 88 | [-np.sin(theta), 0, np.cos(theta)], 89 | ], 90 | dtype=torch.float32, 91 | ) 92 | 93 | 94 | def rotz(theta): 95 | return torch.tensor( 96 | [ 97 | [np.cos(theta), -np.sin(theta), 0], 98 | [np.sin(theta), np.cos(theta), 0], 99 | [0, 0, 1], 100 | ], 101 | dtype=torch.float32, 102 | ) 103 | 104 | 105 | def create_raymond_lights() -> List[pyrender.Node]: 106 | """ 107 | Return raymond light nodes for the scene. 108 | """ 109 | thetas = np.pi * np.array([1.0 / 6.0, 1.0 / 6.0, 1.0 / 6.0]) 110 | phis = np.pi * np.array([0.0, 2.0 / 3.0, 4.0 / 3.0]) 111 | 112 | nodes = [] 113 | 114 | for phi, theta in zip(phis, thetas): 115 | xp = np.sin(theta) * np.cos(phi) 116 | yp = np.sin(theta) * np.sin(phi) 117 | zp = np.cos(theta) 118 | 119 | z = np.array([xp, yp, zp]) 120 | z = z / np.linalg.norm(z) 121 | x = np.array([-z[1], z[0], 0.0]) 122 | if np.linalg.norm(x) == 0: 123 | x = np.array([1.0, 0.0, 0.0]) 124 | x = x / np.linalg.norm(x) 125 | y = np.cross(z, x) 126 | 127 | matrix = np.eye(4) 128 | matrix[:3,:3] = np.c_[x,y,z] 129 | nodes.append(pyrender.Node( 130 | light=pyrender.DirectionalLight(color=np.ones(3), intensity=1.0), 131 | matrix=matrix 132 | )) 133 | 134 | return nodes 135 | 136 | class Renderer: 137 | 138 | def __init__(self, cfg: CfgNode, faces: np.array, load_from_custom_file=False): 139 | """ 140 | Wrapper around the pyrender renderer to render MANO meshes. 141 | Args: 142 | cfg (CfgNode): Model config file. 143 | faces (np.array): Array of shape (F, 3) containing the mesh faces. 144 | """ 145 | self.cfg = cfg 146 | self.focal_length = cfg.EXTRA.FOCAL_LENGTH 147 | self.img_res = cfg.MODEL.IMAGE_SIZE 148 | 149 | # add faces that make the hand mesh watertight 150 | faces_new = np.array([[92, 38, 234], 151 | [234, 38, 239], 152 | [38, 122, 239], 153 | [239, 122, 279], 154 | [122, 118, 279], 155 | [279, 118, 215], 156 | [118, 117, 215], 157 | [215, 117, 214], 158 | [117, 119, 214], 159 | [214, 119, 121], 160 | [119, 120, 121], 161 | [121, 120, 78], 162 | [120, 108, 78], 163 | [78, 108, 79]]) 164 | faces = np.concatenate([faces, faces_new], axis=0) 165 | 166 | self.camera_center = [self.img_res // 2, self.img_res // 2] 167 | self.faces = faces 168 | self.faces_left = self.faces[:,[0,2,1]] 169 | 170 | def __call__(self, 171 | vertices: np.array, 172 | camera_translation: np.array, 173 | image: torch.Tensor, 174 | full_frame: bool = False, 175 | imgname: Optional[str] = None, 176 | side_view=False, rot_angle=90, 177 | mesh_base_color=(1.0, 1.0, 0.9), 178 | scene_bg_color=(0,0,0), 179 | return_rgba=False, 180 | no_camera=False, 181 | ) -> np.array: 182 | """ 183 | Render meshes on input image 184 | Args: 185 | vertices (np.array): Array of shape (V, 3) containing the mesh vertices. 186 | camera_translation (np.array): Array of shape (3,) with the camera translation. 187 | image (torch.Tensor): Tensor of shape (3, H, W) containing the image crop with normalized pixel values. 188 | full_frame (bool): If True, then render on the full image. 189 | imgname (Optional[str]): Contains the original image filenamee. Used only if full_frame == True. 190 | """ 191 | if full_frame: 192 | image = cv2.imread(imgname).astype(np.float32)[:, :, ::-1] / 255. 193 | else: 194 | image = image.clone() * torch.tensor(self.cfg.MODEL.IMAGE_STD, device=image.device).reshape(3,1,1) 195 | image = image + torch.tensor(self.cfg.MODEL.IMAGE_MEAN, device=image.device).reshape(3,1,1) 196 | image = image.permute(1, 2, 0).cpu().numpy() 197 | 198 | renderer = pyrender.OffscreenRenderer(viewport_width=image.shape[1], 199 | viewport_height=image.shape[0], 200 | point_size=1.0) 201 | material = pyrender.MetallicRoughnessMaterial( 202 | metallicFactor=0.0, 203 | alphaMode='OPAQUE', 204 | baseColorFactor=(*mesh_base_color, 1.0)) 205 | 206 | camera_translation[0] *= -1. 207 | 208 | mesh = trimesh.Trimesh(vertices.copy(), self.faces.copy()) 209 | if side_view: 210 | rot = trimesh.transformations.rotation_matrix( 211 | np.radians(rot_angle), [0, 1, 0]) 212 | mesh.apply_transform(rot) 213 | rot = trimesh.transformations.rotation_matrix( 214 | np.radians(180), [1, 0, 0]) 215 | mesh.apply_transform(rot) 216 | 217 | 218 | mesh = pyrender.Mesh.from_trimesh(mesh, material=material) 219 | 220 | scene = pyrender.Scene(bg_color=[*scene_bg_color, 0.0], 221 | ambient_light=(0.3, 0.3, 0.3)) 222 | scene.add(mesh, 'mesh') 223 | 224 | 225 | # Convert mesh to point cloud 226 | # mesh_pc = pyrender.Mesh.from_points(mesh.vertices) 227 | 228 | # # Show point cloud using open3d 229 | # print('Showing point cloud=====================') 230 | # import open3d as o3d 231 | # pcd = o3d.geometry.PointCloud() 232 | # pcd.points = o3d.utility.Vector3dVector(mesh_pc.vertices) 233 | # o3d.visualization.draw_geometries([pcd]) 234 | 235 | 236 | 237 | camera_pose = np.eye(4) 238 | if not no_camera: 239 | camera_pose[:3, 3] = camera_translation 240 | camera_center = [image.shape[1] / 2., image.shape[0] / 2.] 241 | camera = pyrender.IntrinsicsCamera(fx=self.focal_length, fy=self.focal_length, 242 | cx=camera_center[0], cy=camera_center[1], zfar=1e12) 243 | scene.add(camera, pose=camera_pose) 244 | 245 | 246 | light_nodes = create_raymond_lights() 247 | for node in light_nodes: 248 | scene.add_node(node) 249 | 250 | color, rend_depth = renderer.render(scene, flags=pyrender.RenderFlags.RGBA) 251 | color = color.astype(np.float32) / 255.0 252 | renderer.delete() 253 | 254 | if return_rgba: 255 | return color, rend_depth, mesh 256 | 257 | valid_mask = (color[:, :, -1])[:, :, np.newaxis] 258 | if not side_view: 259 | output_img = (color[:, :, :3] * valid_mask + (1 - valid_mask) * image) 260 | else: 261 | output_img = color[:, :, :3] 262 | 263 | output_img = output_img.astype(np.float32) 264 | return output_img, rend_depth, mesh 265 | 266 | def vertices_to_trimesh(self, vertices, camera_translation, mesh_base_color=(1.0, 1.0, 0.9), 267 | rot_axis=[1,0,0], rot_angle=0, is_right=1): 268 | # material = pyrender.MetallicRoughnessMaterial( 269 | # metallicFactor=0.0, 270 | # alphaMode='OPAQUE', 271 | # baseColorFactor=(*mesh_base_color, 1.0)) 272 | vertex_colors = np.array([(*mesh_base_color, 1.0)] * vertices.shape[0]) 273 | if is_right: 274 | mesh = trimesh.Trimesh(vertices.copy() + camera_translation, self.faces.copy(), vertex_colors=vertex_colors) 275 | else: 276 | mesh = trimesh.Trimesh(vertices.copy() + camera_translation, self.faces_left.copy(), vertex_colors=vertex_colors) 277 | # mesh = trimesh.Trimesh(vertices.copy(), self.faces.copy()) 278 | 279 | rot = trimesh.transformations.rotation_matrix( 280 | np.radians(rot_angle), rot_axis) 281 | mesh.apply_transform(rot) 282 | 283 | # Show point cloud using open3d 284 | # print('Showing point cloud=====================') 285 | # import open3d as o3d 286 | # pcd = o3d.geometry.PointCloud() 287 | # pcd.points = o3d.utility.Vector3dVector(mesh.vertices) 288 | # o3d.visualization.draw_geometries([pcd]) 289 | 290 | rot = trimesh.transformations.rotation_matrix( 291 | np.radians(180), [1, 0, 0]) 292 | mesh.apply_transform(rot) 293 | return mesh 294 | 295 | def render_rgba( 296 | self, 297 | vertices: np.array, 298 | cam_t = None, 299 | rot=None, 300 | rot_axis=[1,0,0], 301 | rot_angle=0, 302 | camera_z=3, 303 | # camera_translation: np.array, 304 | mesh_base_color=(1.0, 1.0, 0.9), 305 | scene_bg_color=(0,0,0), 306 | render_res=[256, 256], 307 | focal_length=None, 308 | is_right=None, 309 | ): 310 | 311 | renderer = pyrender.OffscreenRenderer(viewport_width=render_res[0], 312 | viewport_height=render_res[1], 313 | point_size=1.0) 314 | # material = pyrender.MetallicRoughnessMaterial( 315 | # metallicFactor=0.0, 316 | # alphaMode='OPAQUE', 317 | # baseColorFactor=(*mesh_base_color, 1.0)) 318 | 319 | focal_length = focal_length if focal_length is not None else self.focal_length 320 | 321 | if cam_t is not None: 322 | camera_translation = cam_t.copy() 323 | camera_translation[0] *= -1. 324 | else: 325 | camera_translation = np.array([0, 0, camera_z * focal_length/render_res[1]]) 326 | 327 | mesh = self.vertices_to_trimesh(vertices, np.array([0, 0, 0]), mesh_base_color, rot_axis, rot_angle, is_right=is_right) 328 | mesh = pyrender.Mesh.from_trimesh(mesh) 329 | # mesh = pyrender.Mesh.from_trimesh(mesh, material=material) 330 | 331 | scene = pyrender.Scene(bg_color=[*scene_bg_color, 0.0], 332 | ambient_light=(0.3, 0.3, 0.3)) 333 | scene.add(mesh, 'mesh') 334 | 335 | camera_pose = np.eye(4) 336 | camera_pose[:3, 3] = camera_translation 337 | camera_center = [render_res[0] / 2., render_res[1] / 2.] 338 | camera = pyrender.IntrinsicsCamera(fx=focal_length, fy=focal_length, 339 | cx=camera_center[0], cy=camera_center[1], zfar=1e12) 340 | 341 | # Create camera node and add it to pyRender scene 342 | camera_node = pyrender.Node(camera=camera, matrix=camera_pose) 343 | scene.add_node(camera_node) 344 | self.add_point_lighting(scene, camera_node) 345 | self.add_lighting(scene, camera_node) 346 | 347 | light_nodes = create_raymond_lights() 348 | for node in light_nodes: 349 | scene.add_node(node) 350 | 351 | color, rend_depth = renderer.render(scene, flags=pyrender.RenderFlags.RGBA) 352 | color = color.astype(np.float32) / 255.0 353 | renderer.delete() 354 | 355 | return color 356 | 357 | def render_rgba_multiple( 358 | self, 359 | vertices: List[np.array], 360 | cam_t: List[np.array], 361 | rot_axis=[1,0,0], 362 | rot_angle=0, 363 | mesh_base_color=(1.0, 1.0, 0.9), 364 | scene_bg_color=(0,0,0), 365 | render_res=[256, 256], 366 | focal_length=None, 367 | is_right=None, 368 | ): 369 | 370 | renderer = pyrender.OffscreenRenderer(viewport_width=render_res[0], 371 | viewport_height=render_res[1], 372 | point_size=1.0) 373 | # material = pyrender.MetallicRoughnessMaterial( 374 | # metallicFactor=0.0, 375 | # alphaMode='OPAQUE', 376 | # baseColorFactor=(*mesh_base_color, 1.0)) 377 | 378 | if is_right is None: 379 | is_right = [1 for _ in range(len(vertices))] 380 | 381 | mesh_list = [pyrender.Mesh.from_trimesh(self.vertices_to_trimesh(vvv, ttt.copy(), mesh_base_color, rot_axis, rot_angle, is_right=sss)) for vvv,ttt,sss in zip(vertices, cam_t, is_right)] 382 | trimesh_list = [self.vertices_to_trimesh(vvv, ttt.copy(), mesh_base_color, rot_axis, rot_angle, is_right=sss) for vvv,ttt,sss in zip(vertices, cam_t, is_right)] 383 | scene = pyrender.Scene(bg_color=[*scene_bg_color, 0.0], 384 | ambient_light=(0.3, 0.3, 0.3)) 385 | for i,mesh in enumerate(mesh_list): 386 | scene.add(mesh, f'mesh_{i}') 387 | 388 | camera_pose = np.eye(4) 389 | # camera_pose[:3, 3] = camera_translation 390 | camera_center = [render_res[0] / 2., render_res[1] / 2.] 391 | focal_length = focal_length if focal_length is not None else self.focal_length 392 | print(f'[render_rgba_multiple] Focal length: {focal_length}') 393 | print(f'[render_rgba_multiple] Camera center: {camera_center}') 394 | camera = pyrender.IntrinsicsCamera(fx=focal_length, fy=focal_length, 395 | cx=camera_center[0], cy=camera_center[1], zfar=1e12) 396 | 397 | # Create camera node and add it to pyRender scene 398 | camera_node = pyrender.Node(camera=camera, matrix=camera_pose) 399 | scene.add_node(camera_node) 400 | self.add_point_lighting(scene, camera_node) 401 | self.add_lighting(scene, camera_node) 402 | 403 | light_nodes = create_raymond_lights() 404 | for node in light_nodes: 405 | scene.add_node(node) 406 | 407 | color, rend_depth = renderer.render(scene, flags=pyrender.RenderFlags.RGBA) 408 | # Convert normalized depth to world space depth 409 | # rend_depth = rend_depth * 2.0 - 1.0 410 | # rend_depth = 2.0 * focal_length / (rend_depth * render_res[1]) 411 | 412 | color = color.astype(np.float32) / 255.0 413 | renderer.delete() 414 | 415 | return color, rend_depth, trimesh_list 416 | 417 | def add_lighting(self, scene, cam_node, color=np.ones(3), intensity=1.0): 418 | # from phalp.visualize.py_renderer import get_light_poses 419 | light_poses = get_light_poses() 420 | light_poses.append(np.eye(4)) 421 | cam_pose = scene.get_pose(cam_node) 422 | for i, pose in enumerate(light_poses): 423 | matrix = cam_pose @ pose 424 | node = pyrender.Node( 425 | name=f"light-{i:02d}", 426 | light=pyrender.DirectionalLight(color=color, intensity=intensity), 427 | matrix=matrix, 428 | ) 429 | if scene.has_node(node): 430 | continue 431 | scene.add_node(node) 432 | 433 | def add_point_lighting(self, scene, cam_node, color=np.ones(3), intensity=1.0): 434 | # from phalp.visualize.py_renderer import get_light_poses 435 | light_poses = get_light_poses(dist=0.5) 436 | light_poses.append(np.eye(4)) 437 | cam_pose = scene.get_pose(cam_node) 438 | for i, pose in enumerate(light_poses): 439 | matrix = cam_pose @ pose 440 | # node = pyrender.Node( 441 | # name=f"light-{i:02d}", 442 | # light=pyrender.DirectionalLight(color=color, intensity=intensity), 443 | # matrix=matrix, 444 | # ) 445 | node = pyrender.Node( 446 | name=f"plight-{i:02d}", 447 | light=pyrender.PointLight(color=color, intensity=intensity), 448 | matrix=matrix, 449 | ) 450 | if scene.has_node(node): 451 | continue 452 | scene.add_node(node) 453 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | absl-py==2.2.0 2 | addict==2.4.0 3 | aiohappyeyeballs==2.6.1 4 | aiohttp==3.11.14 5 | aiosignal==1.3.2 6 | antlr4-python3-runtime==4.9.3 7 | async-timeout==5.0.1 8 | attrs==25.3.0 9 | beautifulsoup4==4.13.3 10 | black==25.1.0 11 | blinker==1.9.0 12 | braceexpand==0.1.7 13 | 14 | certifi==2025.1.31 15 | cffi==1.17.1 16 | charset-normalizer==3.4.1 17 | 18 | click==8.1.8 19 | cloudpickle==3.1.1 20 | ConfigArgParse==1.7 21 | 22 | cryptease==0.2.0 23 | cryptography==44.0.2 24 | 25 | 26 | dash==3.0.0 27 | 28 | einops==0.8.1 29 | fastjsonschema==2.21.1 30 | filelock==3.18.0 31 | Flask==3.0.3 32 | flatbuffers==25.2.10 33 | 34 | 35 | frozenlist==1.5.0 36 | fsspec==2025.3.0 37 | future==1.0.0 38 | fvcore==0.1.5.post20221221 39 | gdown==5.2.0 40 | grpcio==1.71.0 41 | huggingface-hub==0.29.3 42 | hydra-core==1.3.2 43 | idna==3.10 44 | 45 | importlib_metadata==8.6.1 46 | iopath==0.1.9 47 | itsdangerous==2.2.0 48 | jax==0.5.3 49 | jaxlib==0.5.3 50 | Jinja2==3.1.6 51 | joblib==1.4.2 52 | json-tricks==3.17.3 53 | jsonschema==4.23.0 54 | jsonschema-specifications==2024.10.1 55 | jupyter_core==5.7.2 56 | 57 | lazy_loader==0.4 58 | lightning-utilities==0.14.2 59 | lxml==5.3.1 60 | Markdown==3.7 61 | markdown-it-py==3.0.0 62 | MarkupSafe==3.0.2 63 | matplotlib==3.10.0 64 | mdurl==0.1.2 65 | mediapipe==0.10.21 66 | ml_dtypes==0.5.1 67 | mmcv==1.3.9 68 | mmengine==0.10.7 69 | mpmath==1.3.0 70 | multidict==6.2.0 71 | munkres==1.1.4 72 | mypy-extensions==1.0.0 73 | narwhals==1.32.0 74 | nbformat==5.10.4 75 | nest-asyncio==1.6.0 76 | 77 | numpy==1.26.4 78 | omegaconf==2.3.0 79 | open3d==0.19.0 80 | opencv-contrib-python==4.11.0.86 81 | opencv-python==4.11.0.86 82 | opt_einsum==3.4.0 83 | 84 | pandas==2.2.3 85 | pathspec==0.12.1 86 | pillow==11.1.0 87 | platformdirs==4.3.7 88 | plotly==6.0.1 89 | portalocker==3.1.1 90 | propcache==0.3.0 91 | protobuf==4.25.6 92 | 93 | pycparser==2.22 94 | 95 | Pygments==2.19.1 96 | PyOpenGL==3.1.0 97 | PyOpenGL-accelerate==3.1.9 98 | 99 | pyquaternion==0.9.9 100 | 101 | PySocks==1.7.1 102 | 103 | pytorch-lightning==2.5.1 104 | pytz==2025.1 105 | PyYAML==6.0.2 106 | referencing==0.36.2 107 | requests==2.32.3 108 | retrying==1.3.4 109 | rich==13.9.4 110 | rpds-py==0.23.1 111 | safetensors==0.5.3 112 | scikit-image==0.25.2 113 | scikit-learn==1.6.1 114 | scipy==1.15.2 115 | sentencepiece==0.2.0 116 | six==1.17.0 117 | smplx==0.1.28 118 | sounddevice==0.5.1 119 | soupsieve==2.6 120 | stringcase==1.2.0 121 | sympy==1.13.1 122 | tabulate==0.9.0 123 | tensorboard==2.19.0 124 | tensorboard-data-server==0.7.2 125 | termcolor==2.5.0 126 | threadpoolctl==3.6.0 127 | tifffile==2025.3.13 128 | timm==1.0.15 129 | tomli==2.2.1 130 | torch==2.6.0 131 | torchaudio==2.6.0 132 | torchgeometry==0.1.2 133 | torchmetrics==1.7.0 134 | torchvision==0.21.0 135 | tqdm==4.67.1 136 | traitlets==5.14.3 137 | 138 | typing_extensions==4.12.2 139 | tzdata==2025.2 140 | 141 | urllib3==2.3.0 142 | webdataset==0.2.111 143 | Werkzeug==3.0.6 144 | xtcocotools==1.14.3 145 | yacs==0.1.8 146 | yapf==0.43.0 147 | yarl==1.18.3 148 | zipp==3.21.0 149 | -------------------------------------------------------------------------------- /util_functions.py: -------------------------------------------------------------------------------- 1 | from scipy.spatial.transform import Rotation as R 2 | from scipy.spatial.transform import Slerp 3 | import open3d as o3d 4 | import numpy as np 5 | 6 | def point_cloud_to_depth_image(point_cloud, fx, fy, cx, cy, width=1280, height=720): 7 | """ 8 | Convert a 3D point cloud into a depth image. 9 | 10 | Parameters: 11 | - point_cloud: A 2D numpy array of shape (N, 3), containing the (X, Y, Z) coordinates of each point. 12 | - fx, fy: The focal lengths of the camera in pixels. 13 | - cx, cy: The optical center (principal point) of the camera, typically the image center. 14 | - width, height: The dimensions of the resulting depth image. 15 | 16 | Returns: 17 | - A 2D numpy array of shape (height, width), representing the depth image, where each pixel contains 18 | the depth value of the corresponding point in the point cloud. Pixels with no corresponding point 19 | are set to 0. 20 | """ 21 | # Initialize the depth image with zeros (no depth) 22 | depth_image = np.zeros((height, width), dtype=np.float32) 23 | 24 | # Extract X, Y, Z coordinates 25 | X, Y, Z = point_cloud[:, 0], point_cloud[:, 1], point_cloud[:, 2] 26 | 27 | # Exclude points with no depth or behind the camera 28 | # valid = Z > 0 29 | # X = X[valid] 30 | # Y = Y[valid] 31 | # Z = Z[valid] 32 | 33 | # Project points onto the image plane 34 | u = (X * fx / Z) + cx 35 | v = (Y * fy / Z) + cy 36 | 37 | # Convert coordinates to integer pixel indices 38 | u = np.round(u).astype(int) 39 | v = np.round(v).astype(int) 40 | 41 | # Filter out points that fall outside the image 42 | valid = (u >= 0) & (u < width) & (v >= 0) & (v < height) 43 | u = u[valid] 44 | v = v[valid] 45 | Z = Z[valid] 46 | 47 | # Update the depth image 48 | depth_image[v, u] = Z 49 | 50 | return depth_image 51 | 52 | 53 | def depth_to_point_cloud(depth_im, fx, fy, cx, cy): 54 | rows, cols = depth_im.shape 55 | x = np.arange(0, cols) 56 | y = np.arange(0, rows) 57 | xx, yy = np.meshgrid(x, y) 58 | X = (xx - cx) * depth_im / fx 59 | Y = (yy - cy) * depth_im / fy 60 | Z = depth_im 61 | return np.stack([X, Y, Z], axis=-1) 62 | 63 | 64 | def find_scaled_transformation(p, q, use_scale=True): 65 | # p, q are Nx3 arrays of corresponding points 66 | # Calculate centroids 67 | Cp = np.mean(p, axis=0) 68 | Cq = np.mean(q, axis=0) 69 | # Center points around the centroids 70 | p_centered = p - Cp 71 | q_centered = q - Cq 72 | # Compute scale 73 | # Using the ratio of sums of distances from the centroids 74 | scale = np.sqrt((q_centered ** 2).sum() / (p_centered ** 2).sum()) 75 | 76 | if not use_scale: 77 | scale = 1 78 | # Compute covariance matrix H 79 | H = np.dot(p_centered.T, q_centered) 80 | # SVD on covariance matrix 81 | U, S, Vt = np.linalg.svd(H) 82 | R = np.dot(Vt.T, U.T) 83 | # Ensure a proper rotation matrix (det(R) = 1) 84 | if np.linalg.det(R) < 0: 85 | Vt[-1, :] *= -1 86 | R = np.dot(Vt.T, U.T) 87 | # Compute translation 88 | t = Cq.T - scale * np.dot(R, Cp.T) 89 | # Construct the transformation matrix 90 | T = np.eye(4) 91 | T[:3, :3] = scale * R 92 | T[:3, 3] = t 93 | return T 94 | 95 | 96 | def apply_transformation(p, T): 97 | """ 98 | Apply a 4x4 transformation matrix to a set of 3D points. 99 | 100 | Parameters: 101 | - p: A 2D numpy array of shape (N, 3), containing the (X, Y, Z) coordinates of each point. 102 | - T: The 4x4 transformation matrix. 103 | 104 | Returns: 105 | - A 2D numpy array of shape (N, 3), containing the transformed (X, Y, Z) coordinates. 106 | """ 107 | # Homogenize the points in p 108 | p_homogeneous = np.hstack((p, np.ones((p.shape[0], 1)))) 109 | # Apply the transformation matrix to each point 110 | p_transformed_homogeneous = np.dot(T, p_homogeneous.T).T 111 | # Dehomogenize the points 112 | p_transformed = p_transformed_homogeneous[:, :3] / p_transformed_homogeneous[:, 3, np.newaxis] 113 | return p_transformed 114 | 115 | 116 | def compute_principal_axis(point_cloud, switch_principal_axis=False, return_eigenstuff=False): 117 | # Extract principal axis of the point cloud 118 | # Compute the covariance matrix 119 | cov = np.cov(point_cloud.T) 120 | # Compute the eigenvalues and eigenvectors 121 | eigenvalues, eigenvectors = np.linalg.eig(cov) 122 | # Sort the eigenvectors by decreasing eigenvalues 123 | idx = eigenvalues.argsort()[::-1] 124 | eigenvalues = eigenvalues[idx] 125 | eigenvectors = eigenvectors[:, idx] 126 | # Extract the principal axis 127 | principal_axis = eigenvectors[:, 0] 128 | second_axis = eigenvectors[:, 1] 129 | # Make the vector longer by keep its direction 130 | principal_axis = principal_axis * 10 131 | second_axis = second_axis * 10 132 | if return_eigenstuff: 133 | return eigenvalues, eigenvectors 134 | if switch_principal_axis: 135 | # return second_axis, principal_axis 136 | return second_axis, eigenvectors[:, 2] * 10 137 | return principal_axis, second_axis 138 | 139 | 140 | def scale_gripper_match_hand(gripper_pcd, hand_pcd, vizualize=False): 141 | """Scaled the gripper to match the hand 1-to-1 142 | This is approximate and may not need to be engineered; but that's okay. 143 | """ 144 | gripper_bb = gripper_pcd.get_axis_aligned_bounding_box() 145 | hand_bb = hand_pcd.get_axis_aligned_bounding_box() 146 | # get bounding box volume 147 | volume_ratio = hand_bb.volume() / gripper_bb.volume() 148 | gripper = gripper_pcd.points 149 | hand = hand_pcd.points 150 | scale_x = (hand_bb.get_max_bound()[0] - hand_bb.get_min_bound()[0]) / ( 151 | gripper_bb.get_max_bound()[0] - gripper_bb.get_min_bound()[0]) 152 | scale_y = (hand_bb.get_max_bound()[1] - hand_bb.get_min_bound()[1]) / ( 153 | gripper_bb.get_max_bound()[1] - gripper_bb.get_min_bound()[1]) 154 | scale_z = (hand_bb.get_max_bound()[2] - hand_bb.get_min_bound()[2]) / ( 155 | gripper_bb.get_max_bound()[2] - gripper_bb.get_min_bound()[2]) 156 | center = gripper_pcd.get_center() 157 | mean_scaling = (scale_x + scale_y + scale_z) / 3 158 | # pcd_scaled = (gripper - center) * np.array([scale_x, scale_y, scale_z]) + center 159 | pcd_scaled = (gripper - center) * np.array([mean_scaling, mean_scaling, mean_scaling]) + center 160 | pcd_scaled_o3d = o3d.geometry.PointCloud() 161 | pcd_scaled_o3d.points = o3d.utility.Vector3dVector(pcd_scaled) 162 | 163 | if vizualize: 164 | o3d.visualization.draw_geometries([pcd_scaled_o3d, hand_pcd]) 165 | 166 | return pcd_scaled_o3d 167 | 168 | 169 | def np_to_o3d(points): 170 | pcd = o3d.geometry.PointCloud() 171 | pcd.points = o3d.utility.Vector3dVector(points) 172 | return pcd 173 | 174 | 175 | def interpolate_rotation(r1, r2, num_steps): 176 | r1 = R.from_matrix(r1) 177 | r2 = R.from_matrix(r2) 178 | # r1 and r2 in one Rotation instance 179 | key_rots = R.from_matrix([r1.as_matrix(), r2.as_matrix()]) 180 | # interpolate with Slerp 181 | slerp = Slerp([0, 1], key_rots) 182 | times = np.linspace(0, 1, num_steps) 183 | qs = slerp(times) 184 | 185 | return qs 186 | 187 | 188 | def interpolate_translation(t1, t2, num_steps): 189 | ts = np.zeros((num_steps, 3)) 190 | for i in range(num_steps): 191 | ts[i] = t1 + (t2 - t1) * i / (num_steps - 1) 192 | return ts 193 | 194 | 195 | def interpolate_pose(pose1, pose2, num_steps): 196 | r1 = pose1[:3, :3] 197 | t1 = pose1[:3, 3] 198 | r2 = pose2[:3, :3] 199 | t2 = pose2[:3, 3] 200 | rs = interpolate_rotation(r1, r2, num_steps) 201 | ts = interpolate_translation(t1, t2, num_steps) 202 | poses = np.zeros((num_steps, 4, 4)) 203 | for i in range(num_steps): 204 | poses[i, :3, :3] = rs[i].as_matrix() # R.from_quat(rs[i]).as_matrix() 205 | poses[i, :3, 3] = ts[i] 206 | poses[i, 3, 3] = 1 207 | return poses 208 | 209 | 210 | def interpolate_pose_sequence(poses, num_steps): 211 | interpolated_poses = [] 212 | for i in range(len(poses) - 1): 213 | interpolated_poses.append(interpolate_pose(poses[i], poses[i + 1], num_steps)) 214 | return np.vstack(interpolated_poses) 215 | 216 | def rotation_matrix(axis, theta): 217 | """ 218 | Return the rotation matrix associated with counterclockwise rotation about 219 | the given axis by theta radians. 220 | """ 221 | axis = np.asarray(axis) 222 | axis = axis / np.sqrt(np.dot(axis, axis)) 223 | a = np.cos(theta / 2.0) 224 | b, c, d = -axis * np.sin(theta / 2.0) 225 | aa, bb, cc, dd = a * a, b * b, c * c, d * d 226 | bc, ad, ac, ab, bd, cd = b * c, a * d, a * c, a * b, b * d, c * d 227 | return np.array([[aa + bb - cc - dd, 2 * (bc + ad), 2 * (bd - ac)], 228 | [2 * (bc - ad), aa + cc - bb - dd, 2 * (cd + ab)], 229 | [2 * (bd + ac), 2 * (cd - ab), aa + dd - bb - cc]]) 230 | -------------------------------------------------------------------------------- /video_to_gripper_hamer_kpts.py: -------------------------------------------------------------------------------- 1 | # IMPORTS, FILE LOADS, DETECTRON, RENDERER PREP ETC. 2 | from pathlib import Path 3 | import torch 4 | import argparse 5 | import os 6 | from hamer.configs import CACHE_DIR_HAMER 7 | from hamer.models import HAMER, download_models, load_hamer, DEFAULT_CHECKPOINT 8 | from hamer.utils import recursive_to 9 | from hamer.datasets.vitdet_dataset import ViTDetDataset, DEFAULT_MEAN, DEFAULT_STD 10 | from hamer.utils.renderer import Renderer, cam_crop_to_full 11 | from vitpose_model import ViTPoseModel 12 | from detectron2.utils.logger import setup_logger 13 | setup_logger() 14 | import cv2 15 | 16 | from detectron2.utils.visualizer import Visualizer 17 | from detectron2.data import MetadataCatalog 18 | 19 | import mediapipe as mp 20 | import cv2 as cv 21 | import matplotlib.pyplot as plt 22 | import copy 23 | import time 24 | from mano.utils import Mesh, Struct, colors, to_np, to_tensor 25 | import pickle 26 | import os.path as osp 27 | import trimesh 28 | from mano import lbs 29 | from mano.joints_info import TIP_IDS 30 | import tqdm 31 | 32 | # local imports 33 | from util_functions import * 34 | from global_vars import * 35 | 36 | args = argparse.Namespace() 37 | args.checkpoint = DEFAULT_CHECKPOINT 38 | args.batch_size = 1 39 | args.rescale_factor = 2.0 40 | args.body_detector = 'vitdet' 41 | 42 | IM_WIDTH = hands_rgb[0].shape[1] 43 | IM_HEIGHT = hands_rgb[0].shape[0] 44 | print("Image width: ", IM_WIDTH) 45 | print("Image height: ", IM_HEIGHT) 46 | 47 | INTRINSICS_HAMER_RENDERER[0, 2] = IM_WIDTH / 2 48 | INTRINSICS_HAMER_RENDERER[1, 2] = IM_HEIGHT / 2 49 | 50 | if VIZ_DEMO: 51 | # show rgbs as video 52 | for i in range(hands_rgb.shape[0]): 53 | im = cv2.cvtColor(hands_rgb[i], cv2.COLOR_BGR2RGB) 54 | cv2.imshow("rgb", im) 55 | cv2.waitKey(1) 56 | time.sleep(0.01) 57 | cv2.destroyAllWindows() 58 | 59 | # Download and load checkpoints 60 | download_models(CACHE_DIR_HAMER) 61 | model, model_cfg = load_hamer(args.checkpoint) 62 | 63 | # Setup HaMeR model 64 | device = torch.device('cpu') 65 | model = model.to(device) 66 | model.eval() 67 | # Load detector 68 | from hamer.utils.utils_detectron2 import DefaultPredictor_Lazy 69 | if args.body_detector == 'vitdet': 70 | from detectron2.config import LazyConfig 71 | import hamer 72 | cfg_path = Path(hamer.__file__).parent/'configs'/'cascade_mask_rcnn_vitdet_h_75ep.py' 73 | detectron2_cfg = LazyConfig.load(str(cfg_path)) 74 | detectron2_cfg.train.init_checkpoint = "https://dl.fbaipublicfiles.com/detectron2/ViTDet/COCO/cascade_mask_rcnn_vitdet_h/f328730692/model_final_f05665.pkl" 75 | for i in range(3): 76 | detectron2_cfg.model.roi_heads.box_predictors[i].test_score_thresh = 0.25 77 | detector = DefaultPredictor_Lazy(detectron2_cfg) 78 | elif args.body_detector == 'regnety': 79 | from detectron2 import model_zoo 80 | from detectron2.config import get_cfg 81 | detectron2_cfg = model_zoo.get_config('new_baselines/mask_rcnn_regnety_4gf_dds_FPN_400ep_LSJ.py', trained=True) 82 | detectron2_cfg.model.roi_heads.box_predictor.test_score_thresh = 0.5 83 | detectron2_cfg.model.roi_heads.box_predictor.test_nms_thresh = 0.4 84 | detector = DefaultPredictor_Lazy(detectron2_cfg) 85 | # keypoint detector 86 | cpm = ViTPoseModel(device) 87 | # Setup the renderer 88 | renderer = Renderer(model_cfg, faces=model.mano.faces, load_from_custom_file=True) 89 | 90 | print("Existing focal length: ", model_cfg.EXTRA.FOCAL_LENGTH) 91 | 92 | 93 | 94 | def get_hand_and_rendered_depth(rgb): 95 | # Get all demo images ends with .jpg or .png 96 | # Iterate over all images in folder 97 | img_cv2 = rgb 98 | img_cv2 = cv2.cvtColor(img_cv2, cv2.COLOR_BGR2RGB) 99 | # Detect humans in image 100 | det_out = detector(img_cv2) 101 | img = img_cv2.copy()[:, :, ::-1] 102 | det_instances = det_out['instances'] 103 | valid_idx = (det_instances.pred_classes==0) & (det_instances.scores > 0.5) 104 | pred_bboxes=det_instances.pred_boxes.tensor[valid_idx].cpu().numpy() 105 | pred_scores=det_instances.scores[valid_idx].cpu().numpy() 106 | # Detect human keypoints for each person 107 | vitposes_out = cpm.predict_pose(img, [np.concatenate([pred_bboxes, pred_scores[:, None]], axis=1)],) 108 | bboxes = [] 109 | is_right = [] 110 | # Use hands based on hand keypoint detections 111 | for vitposes in vitposes_out: 112 | left_hand_keyp = vitposes['keypoints'][-42:-21] 113 | right_hand_keyp = vitposes['keypoints'][-21:] 114 | # Rejecting not confident detections 115 | keyp = left_hand_keyp 116 | valid = keyp[:,2] > 0.5 117 | if sum(valid) > 3: 118 | bbox = [keyp[valid,0].min(), keyp[valid,1].min(), keyp[valid,0].max(), keyp[valid,1].max()] 119 | bboxes.append(bbox) 120 | is_right.append(0) 121 | keyp = right_hand_keyp 122 | valid = keyp[:,2] > 0.5 123 | if sum(valid) > 3: 124 | bbox = [keyp[valid,0].min(), keyp[valid,1].min(), keyp[valid,0].max(), keyp[valid,1].max()] 125 | bboxes.append(bbox) 126 | is_right.append(1) 127 | 128 | boxes = np.stack(bboxes) 129 | right = np.stack(is_right) 130 | # Run reconstruction on all detected hands 131 | dataset = ViTDetDataset(model_cfg, img_cv2, boxes, right, rescale_factor=args.rescale_factor) 132 | dataloader = torch.utils.data.DataLoader(dataset, batch_size=8, shuffle=False, num_workers=0) 133 | all_verts = [] 134 | all_cam_t = [] 135 | all_right = [] 136 | 137 | for batch in dataloader: 138 | batch = recursive_to(batch, device) 139 | with torch.no_grad(): 140 | out = model(batch) 141 | multiplier = (2*batch['right']-1) 142 | pred_cam = out['pred_cam'] 143 | pred_cam[:,1] = multiplier*pred_cam[:,1] 144 | box_center = batch["box_center"].float() 145 | box_size = batch["box_size"].float() 146 | img_size = batch["img_size"].float() 147 | multiplier = (2*batch['right']-1) 148 | scaled_focal_length = model_cfg.EXTRA.FOCAL_LENGTH / model_cfg.MODEL.IMAGE_SIZE * img_size.max() 149 | INTRINSICS_HAMER_RENDERER[0 ,0] = scaled_focal_length.item() 150 | INTRINSICS_HAMER_RENDERER[1, 1] = scaled_focal_length.item() 151 | pred_cam_t_full = cam_crop_to_full(pred_cam, box_center, box_size, img_size, scaled_focal_length).detach().cpu().numpy() 152 | # Render the result 153 | batch_size = batch['img'].shape[0] 154 | print(f'Batch size: {batch_size}') 155 | for n in range(batch_size): 156 | # Get filename from path img_path 157 | input_patch = batch['img'][n].cpu() * (DEFAULT_STD[:,None,None]/255) + (DEFAULT_MEAN[:,None,None]/255) 158 | input_patch = input_patch.permute(1,2,0).numpy() 159 | # Add all verts and cams to list 160 | verts = out['pred_vertices'][n].detach().cpu().numpy() 161 | is_right = batch['right'][n].cpu().numpy() 162 | verts[:,0] = (2*is_right-1)*verts[:,0] 163 | cam_t = pred_cam_t_full[n] 164 | all_verts.append(verts) 165 | all_cam_t.append(cam_t) 166 | all_right.append(is_right) 167 | all_verts_as_np = np.asarray(all_verts) 168 | all_verts_as_np = all_verts_as_np[0] 169 | all_verts_list = [all_verts_as_np] 170 | 171 | misc_args = dict( 172 | mesh_base_color=HUMAN_HAND_COLOR, 173 | scene_bg_color=(1, 1, 1), 174 | focal_length=scaled_focal_length, 175 | ) 176 | cam_view, rend_depth_front_view, mesh_list = renderer.render_rgba_multiple(all_verts_list, cam_t=all_cam_t, render_res=img_size[n], is_right=all_right, **misc_args) 177 | # Overlay image 178 | input_img = img_cv2.astype(np.float32)[:,:,::-1]/255.0 179 | input_img = np.concatenate([input_img, np.ones_like(input_img[:,:,:1])], axis=2) # Add alpha channel 180 | 181 | 182 | 183 | input_img_overlay = input_img[:,:,:3] * (1-cam_view[:,:,3:]) + cam_view[:,:,:3] * cam_view[:,:,3:] 184 | camera_translation = cam_t.copy() 185 | hand_mesh = renderer.vertices_to_trimesh(verts, camera_translation, HUMAN_HAND_COLOR, is_right=is_right) 186 | return input_img_overlay, cam_view, rend_depth_front_view, det_out, out, (hand_mesh, camera_translation, verts), (mesh_list, all_cam_t, all_verts) 187 | 188 | 189 | def get_hand_mask_from_detectron(det_out, show_mask=False, rgb_im=None): 190 | if show_mask and rgb_im is not None: 191 | cfg = get_cfg() 192 | cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml")) 193 | cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5 # set threshold for this model 194 | cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml") 195 | # Make prediction 196 | outputs = det_out 197 | v = Visualizer(rgb_im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2) 198 | v = v.draw_instance_predictions(outputs["instances"].to("cpu")) 199 | try: 200 | human_mask = det_out["instances"].pred_masks[np.argwhere(det_out['instances'].pred_classes == 0)[0,0].item()].cpu().numpy() 201 | except: 202 | human_mask = det_out["instances"].pred_masks[np.argwhere(det_out['instances'].pred_classes.cpu().numpy() == 0)[0,0].item()].cpu().numpy() 203 | 204 | return human_mask 205 | 206 | 207 | def get_hand_keypoints_from_mediapipe(rgb_im, viz_keypoints=False): 208 | # # Initialize MediaPipe Hands. 209 | mp_hands = mp.solutions.hands 210 | hands = mp_hands.Hands(static_image_mode=True, max_num_hands=2, min_detection_confidence=0.5) 211 | color_image = copy.deepcopy((rgb_im * 255.0).astype(np.uint8)) 212 | if viz_keypoints: 213 | plt.imshow(rgb_im) 214 | plt.axis('off') 215 | plt.show() 216 | # Process the image to find hand landmarks. 217 | results = hands.process(color_image) 218 | d_val = -1 219 | # Draw the hand landmarks on the image. 220 | if results.multi_hand_landmarks: 221 | for hand_landmarks in results.multi_hand_landmarks: 222 | for landmark in hand_landmarks.landmark: 223 | # Draw a circle on the image. 224 | cx, cy = int(landmark.x * color_image.shape[1]), int(landmark.y * color_image.shape[0]) 225 | cv.circle(color_image, (cx, cy), 5, (0, 255, 0), -1) 226 | mp_drawing = mp.solutions.drawing_utils 227 | image = color_image 228 | if results.multi_hand_landmarks: 229 | for hand_landmarks in results.multi_hand_landmarks: 230 | mp_drawing.draw_landmarks(image, hand_landmarks, mp_hands.HAND_CONNECTIONS) 231 | # Example: Get the tip of the index finger 232 | index_tip = hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP] 233 | # print(f"Index Finger Tip Coordinates: (x: {index_tip.x}, y: {index_tip.y}, z: {index_tip.z})") 234 | index_x, index_y = int(index_tip.x * image.shape[1]), int(index_tip.y * image.shape[0]) 235 | thumb_tip = hand_landmarks.landmark[mp_hands.HandLandmark.THUMB_TIP] 236 | # print(f"Thumb Tip Coordinates: (x: {thumb_tip.x}, y: {thumb_tip.y}, z: {thumb_tip.z})") 237 | thumb_x, thumb_y = int(thumb_tip.x * image.shape[1]), int(thumb_tip.y * image.shape[0]) 238 | middle_tip = hand_landmarks.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_TIP] 239 | # print(f"Middle Finger Tip Coordinates: (x: {middle_tip.x}, y: {middle_tip.y}, z: {middle_tip.z})") 240 | middle_x, middle_y = int(middle_tip.x * image.shape[1]), int(middle_tip.y * image.shape[0]) 241 | # Extract a hand mesh from the hand landmarks and display it. 242 | hand_mesh = mp_hands.HAND_CONNECTIONS 243 | hand_mesh = [list(pair) for pair in hand_mesh] 244 | # Draw the hand mesh on the image. 245 | if results.multi_hand_landmarks: 246 | for hand_landmarks in results.multi_hand_landmarks: 247 | for connection in hand_mesh: 248 | # Get the coordinates of the two points. 249 | start = (int(hand_landmarks.landmark[connection[0]].x * image.shape[1]), 250 | int(hand_landmarks.landmark[connection[0]].y * image.shape[0])) 251 | end = (int(hand_landmarks.landmark[connection[1]].x * image.shape[1]), 252 | int(hand_landmarks.landmark[connection[1]].y * image.shape[0])) 253 | # Draw a line connecting the two points. 254 | cv.line(image, start, end, (0, 255, 0), 2) 255 | 256 | # Draw a cirlce on the pixel of the index finger tip using matplotlib. 257 | thumb_cmc = hand_landmarks.landmark[mp_hands.HandLandmark.THUMB_CMC] 258 | thumb_cmc_x, thumb_cmc_y = int(thumb_cmc.x * image.shape[1]), int(thumb_cmc.y * image.shape[0]) 259 | 260 | thumb_IP = hand_landmarks.landmark[mp_hands.HandLandmark.THUMB_IP] 261 | thumb_IP_x, thumb_IP_y = int(thumb_IP.x * image.shape[1]), int(thumb_IP.y * image.shape[0]) 262 | 263 | thumb_MCP = hand_landmarks.landmark[mp_hands.HandLandmark.THUMB_MCP] 264 | thumb_MCP_x, thumb_MCP_y = int(thumb_MCP.x * image.shape[1]), int(thumb_MCP.y * image.shape[0]) 265 | 266 | index_mc = hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_MCP] 267 | index_mc_x, index_mc_y = int(index_mc.x * image.shape[1]), int(index_mc.y * image.shape[0]) 268 | 269 | index_pip = hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_PIP] 270 | index_pip_x, index_pip_y = int(index_pip.x * image.shape[1]), int(index_pip.y * image.shape[0]) 271 | 272 | index_dip = hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_DIP] 273 | index_dip_x, index_dip_y = int(index_dip.x * image.shape[1]), int(index_dip.y * image.shape[0]) 274 | 275 | if viz_keypoints: 276 | plt.imshow(rgb_im) 277 | plt.scatter(index_x, index_y, color='blue', s=10) 278 | plt.scatter(index_dip_x, index_dip_y, color='orange', s=10) 279 | 280 | plt.scatter(index_pip_x, index_pip_y, color='purple', s=10) 281 | plt.scatter(thumb_x, thumb_y, color='blue', s=10) 282 | plt.scatter(thumb_IP_x, thumb_IP_y, color='orange', s=10) 283 | plt.scatter(thumb_MCP_x, thumb_MCP_y, color='purple', s=10) 284 | 285 | index_finger_1_x, index_finger_1_y = index_x, index_y 286 | index_finger_2_x, index_finger_2_y = index_dip_x, index_dip_y 287 | index_finger_3_x, index_finger_3_y = index_pip_x, index_pip_y 288 | 289 | thumb_1_x, thumb_1_y = thumb_x, thumb_y 290 | thumb_2_x, thumb_2_y = thumb_IP_x, thumb_IP_y 291 | thumb_3_x, thumb_3_y = thumb_MCP_x, thumb_MCP_y 292 | return middle_x, middle_y, index_finger_1_x, index_finger_1_y, index_finger_2_x, index_finger_2_y, index_finger_3_x, index_finger_3_y, thumb_1_x, thumb_1_y, thumb_2_x, thumb_2_y, thumb_3_x, thumb_3_y 293 | 294 | def align_gripper_to_hand(hand_point_cloud, hand_pcd_as_o3d, gripper_pcd, gripper_pcd_as_o3d, vizualize=False): 295 | 296 | # Compute the principal axis of the point cloud 297 | principal_axis_h, second_axis_h = compute_principal_axis(hand_point_cloud) 298 | principal_axis_g, second_axis_g = compute_principal_axis(gripper_pcd, switch_principal_axis=True) 299 | middle_point_gripper = np.mean(gripper_pcd, axis=0) 300 | middle_point_hand = np.mean(hand_point_cloud, axis=0) 301 | if vizualize: 302 | # Plot a vector in the direction of the principal axis in open3d 303 | line_set_h1 = o3d.geometry.LineSet() 304 | line_set_h1.points = o3d.utility.Vector3dVector([middle_point_hand, middle_point_hand + principal_axis_h]) 305 | line_set_h1.lines = o3d.utility.Vector2iVector([[0, 1]]) 306 | # make line red 307 | line_set_h1.colors = o3d.utility.Vector3dVector([[1, 0, 0]]) 308 | line_set_h2 = o3d.geometry.LineSet() 309 | line_set_h2.points = o3d.utility.Vector3dVector([middle_point_hand, middle_point_hand - principal_axis_h]) 310 | line_set_h2.lines = o3d.utility.Vector2iVector([[0, 1]]) 311 | # make line red 312 | line_set_h2.colors = o3d.utility.Vector3dVector([[1, 0, 0]]) 313 | line_set_h3 = o3d.geometry.LineSet() 314 | line_set_h3.points = o3d.utility.Vector3dVector([middle_point_hand, middle_point_hand + second_axis_h]) 315 | line_set_h3.lines = o3d.utility.Vector2iVector([[0, 1]]) 316 | # make line green 317 | line_set_h3.colors = o3d.utility.Vector3dVector([[0, 1, 0]]) 318 | line_set_h4 = o3d.geometry.LineSet() 319 | line_set_h4.points = o3d.utility.Vector3dVector([middle_point_hand, middle_point_hand - second_axis_h]) 320 | line_set_h4.lines = o3d.utility.Vector2iVector([[0, 1]]) 321 | # make line green 322 | line_set_h4.colors = o3d.utility.Vector3dVector([[0, 1, 0]]) 323 | line_set_g1 = o3d.geometry.LineSet() 324 | line_set_g1.points = o3d.utility.Vector3dVector([middle_point_gripper, middle_point_gripper + principal_axis_g]) 325 | line_set_g1.lines = o3d.utility.Vector2iVector([[0, 1]]) 326 | # make line red 327 | line_set_g1.colors = o3d.utility.Vector3dVector([[1, 0, 0]]) 328 | line_set_g2 = o3d.geometry.LineSet() 329 | line_set_g2.points = o3d.utility.Vector3dVector([middle_point_gripper, middle_point_gripper - principal_axis_g]) 330 | line_set_g2.lines = o3d.utility.Vector2iVector([[0, 1]]) 331 | # make line red 332 | line_set_g2.colors = o3d.utility.Vector3dVector([[1, 0, 0]]) 333 | line_set_g3 = o3d.geometry.LineSet() 334 | line_set_g3.points = o3d.utility.Vector3dVector([middle_point_gripper, middle_point_gripper + second_axis_g]) 335 | line_set_g3.lines = o3d.utility.Vector2iVector([[0, 1]]) 336 | # make line green 337 | line_set_g3.colors = o3d.utility.Vector3dVector([[0, 1, 0]]) 338 | line_set_g4 = o3d.geometry.LineSet() 339 | line_set_g4.points = o3d.utility.Vector3dVector([middle_point_gripper, middle_point_gripper - second_axis_g]) 340 | line_set_g4.lines = o3d.utility.Vector2iVector([[0, 1]]) 341 | # make line green 342 | line_set_g4.colors = o3d.utility.Vector3dVector([[0, 1, 0]]) 343 | # Plot the coordinate frame of the gripper in open3d 344 | gripper_coord = o3d.geometry.TriangleMesh.create_coordinate_frame(size=.100, origin=middle_point_gripper) 345 | T_gripper_coord = np.eye(4) 346 | T_gripper_coord[3, :3] = middle_point_gripper 347 | # Show in open3d 348 | o3d.visualization.draw_geometries([hand_pcd_as_o3d, gripper_pcd_as_o3d, line_set_h2, line_set_h3, line_set_h4, line_set_g2, line_set_g3, line_set_g4, gripper_coord]) 349 | # Extract points on each pair of principal axes to compute the relative transformation 350 | # Extract points on the principal axis of the hand 351 | q = np.array([middle_point_hand, 352 | hand_point_cloud[0] + principal_axis_h, 353 | # middle_point_hand - principal_axis_h, 354 | middle_point_hand + second_axis_h, 355 | middle_point_hand - second_axis_h]) 356 | p = np.array([middle_point_gripper, 357 | # middle_point_gripper + principal_axis_g, 358 | gripper_pcd[0] - principal_axis_g, 359 | middle_point_gripper + second_axis_g, 360 | middle_point_gripper - second_axis_g]) 361 | # Compute the relative transformation between the two pairs of principal axes 362 | T = find_scaled_transformation(p, q, use_scale=True) 363 | # Apply the transformation matrix to the point cloud 364 | gripper_aligned_to_hand_pcd = apply_transformation(gripper_pcd, T) 365 | # Show in open3d 366 | gripper_aligned_to_hand_pcd_as_o3d = o3d.geometry.PointCloud() 367 | gripper_aligned_to_hand_pcd_as_o3d.points = o3d.utility.Vector3dVector(gripper_aligned_to_hand_pcd) 368 | gripper_aligned_to_hand_pcd_as_o3d.paint_uniform_color([0, 1, 0]) 369 | 370 | if vizualize: 371 | o3d.visualization.draw_geometries([hand_pcd_as_o3d, gripper_pcd_as_o3d, gripper_aligned_to_hand_pcd_as_o3d]) 372 | return gripper_aligned_to_hand_pcd_as_o3d 373 | 374 | def get_hand_pcd_in_scene_from_rendered_model(rgb_im, depth_im, rend_depth_front_view, new_depth_image, human_mask, vizualize=False): 375 | 376 | point_cloud_camera = depth_to_point_cloud(depth_im, INTRINSICS_REAL_CAMERA[0, 0], INTRINSICS_REAL_CAMERA[1, 1], INTRINSICS_REAL_CAMERA[0, 2], INTRINSICS_REAL_CAMERA[1, 2]) # Camera intrinsics, depth at real scale, although the shape is not accurate 377 | point_cloud_camera = point_cloud_camera.reshape(-1, 3) 378 | point_cloud_rescaled_depth = depth_to_point_cloud(new_depth_image, INTRINSICS_HAMER_RENDERER[0, 0], INTRINSICS_HAMER_RENDERER[1, 1], INTRINSICS_HAMER_RENDERER[0, 2], INTRINSICS_HAMER_RENDERER[1, 2]) # Hamer intrinsics, depth at real scale, shape does not match real 379 | point_cloud_rescaled_depth = point_cloud_rescaled_depth.reshape(-1, 3) 380 | pcd_camera = o3d.geometry.PointCloud() 381 | pcd_camera.points = o3d.utility.Vector3dVector(point_cloud_camera) 382 | pcd_rescaled_depth = o3d.geometry.PointCloud() 383 | pcd_rescaled_depth.points = o3d.utility.Vector3dVector(point_cloud_rescaled_depth) 384 | 385 | show_scaled_up_pcd = False 386 | remove_outliers=True # does not work that well, a big hacky 387 | # print("point cloud camera shape: ", point_cloud_camera.shape) 388 | point_cloud_camera = point_cloud_camera.reshape(IM_HEIGHT, IM_WIDTH, 3) 389 | point_cloud_rescaled_depth = point_cloud_rescaled_depth.reshape(IM_HEIGHT, IM_WIDTH, 3) 390 | # Extract common points between rendered model and live image 391 | common_points_binary = np.zeros_like(new_depth_image) 392 | common_points_binary[new_depth_image > 0] = 1 # binary mask of points in the rendered model depth 393 | 394 | depth_im_hand = human_mask * depth_im # extract human hand from the live image 395 | depth_im_hand = common_points_binary * depth_im_hand # extract common points between rendered depth model and live image 396 | if vizualize: 397 | print(depth_im_hand) 398 | plt.imshow(depth_im_hand) 399 | plt.title("Depth image of hand after human mask") 400 | plt.show() 401 | 402 | 403 | if remove_outliers: # a bit hacky, does not do much, if not cause problems 404 | mean_depth = np.median(new_depth_image[new_depth_image > 0]) 405 | depth_im_hand[np.abs(depth_im_hand) > mean_depth + 0.2] = 0 # remove all points 2m away from the camera 406 | 407 | common_points = np.zeros_like(depth_im_hand) 408 | common_points[depth_im_hand > 0] = 1 # Binary mask of common points between rendered model depth and live image 409 | 410 | if vizualize: 411 | plt.imshow(common_points) 412 | plt.title("Common points") 413 | plt.show() 414 | 415 | common_pts_indices = np.argwhere(common_points > 0) # extract common points indices 416 | q = point_cloud_camera[common_pts_indices[:, 0], common_pts_indices[:, 1]] 417 | p = point_cloud_rescaled_depth[common_pts_indices[:, 0], common_pts_indices[:, 1]] 418 | 419 | T = find_scaled_transformation(p, q, use_scale=False) # Compute scale and rotation matrix Tp = q 420 | pcd_rendered_hand_to_live_hand = apply_transformation(point_cloud_rescaled_depth.reshape(-1, 3), T) 421 | pcd_rendered_hand_to_live_hand = pcd_rendered_hand_to_live_hand.reshape(-1, 3) 422 | pcd_rendered_hand_to_live_hand_full = copy.deepcopy(pcd_rendered_hand_to_live_hand) 423 | # pcd_rendered_hand_to_live_hand = pcd_rendered_hand_to_live_hand[pcd_rendered_hand_to_live_hand[:, 2] > .400] 424 | 425 | # Show the point cloud in open3d 426 | pcd_hand_to_scale = o3d.geometry.PointCloud() 427 | pcd_hand_to_scale.points = o3d.utility.Vector3dVector(pcd_rendered_hand_to_live_hand) 428 | hand_point_cloud = pcd_rendered_hand_to_live_hand # from above 429 | hand_point_cloud_full = pcd_rendered_hand_to_live_hand_full 430 | use_scaled_up_hand = True 431 | use_both_projected_hands = False 432 | clean_live_image_background = True 433 | if clean_live_image_background: 434 | mean_depth_hand = np.mean(new_depth_image[new_depth_image > 0]) 435 | std_depth_hand = np.std(new_depth_image[new_depth_image > 0]) 436 | depth_im[depth_im > mean_depth_hand + 1000] = 0 437 | depth_im[depth_im < mean_depth_hand - 1000] = 0 438 | if not use_scaled_up_hand and not use_both_projected_hands: 439 | hand_point_cloud = depth_to_point_cloud(new_depth_image, INTRINSICS_REAL_CAMERA[0, 0], INTRINSICS_REAL_CAMERA[1, 1], INTRINSICS_REAL_CAMERA[0, 2], INTRINSICS_REAL_CAMERA[1, 2]) 440 | hand_point_cloud = hand_point_cloud.reshape(-1, 3) 441 | hand_point_cloud_full = hand_point_cloud 442 | if use_both_projected_hands: 443 | hand_point_cloud_via_intrinsics = depth_to_point_cloud(new_depth_image, INTRINSICS_REAL_CAMERA[0, 0], INTRINSICS_REAL_CAMERA[1, 1], INTRINSICS_REAL_CAMERA[0, 2], INTRINSICS_REAL_CAMERA[1, 2]) 444 | hand_point_cloud_via_intrinsics = hand_point_cloud_via_intrinsics.reshape(-1, 3) 445 | image_point_cloud = depth_to_point_cloud(depth_im, INTRINSICS_REAL_CAMERA[0, 0], INTRINSICS_REAL_CAMERA[1, 1], INTRINSICS_REAL_CAMERA[0, 2], INTRINSICS_REAL_CAMERA[1, 2]) 446 | image_point_cloud = image_point_cloud.reshape(-1, 3) 447 | # Point cloud to open3d point cloud 448 | pcd_hand_to_scale = o3d.geometry.PointCloud() 449 | pcd_hand_to_scale.points = o3d.utility.Vector3dVector(hand_point_cloud[hand_point_cloud[:, 2] > 0]) 450 | # Set pcd color to red 451 | pcd_hand_to_scale.paint_uniform_color([1, 0, 0]) 452 | pcd_image = o3d.geometry.PointCloud() 453 | pcd_image.points = o3d.utility.Vector3dVector(image_point_cloud) 454 | # Set color to RGB 455 | pcd_image.colors = o3d.utility.Vector3dVector(rgb_im.reshape(-1, 3)) 456 | # Set pcd color to blue 457 | if use_both_projected_hands and vizualize: 458 | pcd_via_intrinsics = o3d.geometry.PointCloud() 459 | pcd_via_intrinsics.points = o3d.utility.Vector3dVector(hand_point_cloud_via_intrinsics[hand_point_cloud_via_intrinsics[:, 2] > 0]) 460 | pcd_via_intrinsics.paint_uniform_color([0, 1, 0]) 461 | pcd_hand_to_scale.paint_uniform_color([1, 0, 0]) 462 | 463 | return hand_point_cloud, hand_point_cloud_full, image_point_cloud, pcd_hand_to_scale, pcd_image, T 464 | 465 | def align_gripper_to_hand(hand_point_cloud, hand_pcd_as_o3d, gripper_pcd, gripper_pcd_as_o3d, vizualize=False): 466 | 467 | # Compute the principal axis of the point cloud 468 | principal_axis_h, second_axis_h = compute_principal_axis(hand_point_cloud) 469 | principal_axis_g, second_axis_g = compute_principal_axis(gripper_pcd, switch_principal_axis=True) 470 | middle_point_gripper = np.mean(gripper_pcd, axis=0) 471 | middle_point_hand = np.mean(hand_point_cloud, axis=0) 472 | if vizualize: 473 | # Plot a vector in the direction of the principal axis in open3d 474 | line_set_h1 = o3d.geometry.LineSet() 475 | line_set_h1.points = o3d.utility.Vector3dVector([middle_point_hand, middle_point_hand + principal_axis_h]) 476 | line_set_h1.lines = o3d.utility.Vector2iVector([[0, 1]]) 477 | # make line red 478 | line_set_h1.colors = o3d.utility.Vector3dVector([[1, 0, 0]]) 479 | line_set_h2 = o3d.geometry.LineSet() 480 | line_set_h2.points = o3d.utility.Vector3dVector([middle_point_hand, middle_point_hand - principal_axis_h]) 481 | line_set_h2.lines = o3d.utility.Vector2iVector([[0, 1]]) 482 | # make line red 483 | line_set_h2.colors = o3d.utility.Vector3dVector([[1, 0, 0]]) 484 | line_set_h3 = o3d.geometry.LineSet() 485 | line_set_h3.points = o3d.utility.Vector3dVector([middle_point_hand, middle_point_hand + second_axis_h]) 486 | line_set_h3.lines = o3d.utility.Vector2iVector([[0, 1]]) 487 | # make line green 488 | line_set_h3.colors = o3d.utility.Vector3dVector([[0, 1, 0]]) 489 | line_set_h4 = o3d.geometry.LineSet() 490 | line_set_h4.points = o3d.utility.Vector3dVector([middle_point_hand, middle_point_hand - second_axis_h]) 491 | line_set_h4.lines = o3d.utility.Vector2iVector([[0, 1]]) 492 | # make line green 493 | line_set_h4.colors = o3d.utility.Vector3dVector([[0, 1, 0]]) 494 | line_set_g1 = o3d.geometry.LineSet() 495 | line_set_g1.points = o3d.utility.Vector3dVector([middle_point_gripper, middle_point_gripper + principal_axis_g]) 496 | line_set_g1.lines = o3d.utility.Vector2iVector([[0, 1]]) 497 | # make line red 498 | line_set_g1.colors = o3d.utility.Vector3dVector([[1, 0, 0]]) 499 | line_set_g2 = o3d.geometry.LineSet() 500 | line_set_g2.points = o3d.utility.Vector3dVector([middle_point_gripper, middle_point_gripper - principal_axis_g]) 501 | line_set_g2.lines = o3d.utility.Vector2iVector([[0, 1]]) 502 | # make line red 503 | line_set_g2.colors = o3d.utility.Vector3dVector([[1, 0, 0]]) 504 | line_set_g3 = o3d.geometry.LineSet() 505 | line_set_g3.points = o3d.utility.Vector3dVector([middle_point_gripper, middle_point_gripper + second_axis_g]) 506 | line_set_g3.lines = o3d.utility.Vector2iVector([[0, 1]]) 507 | # make line green 508 | line_set_g3.colors = o3d.utility.Vector3dVector([[0, 1, 0]]) 509 | line_set_g4 = o3d.geometry.LineSet() 510 | line_set_g4.points = o3d.utility.Vector3dVector([middle_point_gripper, middle_point_gripper - second_axis_g]) 511 | line_set_g4.lines = o3d.utility.Vector2iVector([[0, 1]]) 512 | # make line green 513 | line_set_g4.colors = o3d.utility.Vector3dVector([[0, 1, 0]]) 514 | # Plot the coordinate frame of the gripper in open3d 515 | gripper_coord = o3d.geometry.TriangleMesh.create_coordinate_frame(size=.100, origin=middle_point_gripper) 516 | T_gripper_coord = np.eye(4) 517 | T_gripper_coord[3, :3] = middle_point_gripper 518 | # Show in open3d 519 | o3d.visualization.draw_geometries([hand_pcd_as_o3d, gripper_pcd_as_o3d, line_set_h2, line_set_h3, line_set_h4, line_set_g2, line_set_g3, line_set_g4, gripper_coord]) 520 | # Extract points on each pair of principal axes to compute the relative transformation 521 | # Extract points on the principal axis of the hand 522 | q = np.array([middle_point_hand, 523 | hand_point_cloud[0] + principal_axis_h, 524 | # middle_point_hand - principal_axis_h, 525 | middle_point_hand + second_axis_h, 526 | middle_point_hand - second_axis_h]) 527 | p = np.array([middle_point_gripper, 528 | # middle_point_gripper + principal_axis_g, 529 | gripper_pcd[0] - principal_axis_g, 530 | middle_point_gripper + second_axis_g, 531 | middle_point_gripper - second_axis_g]) 532 | # Compute the relative transformation between the two pairs of principal axes 533 | T = find_scaled_transformation(p, q, use_scale=True) 534 | # Apply the transformation matrix to the point cloud 535 | gripper_aligned_to_hand_pcd = apply_transformation(gripper_pcd, T) 536 | # Show in open3d 537 | gripper_aligned_to_hand_pcd_as_o3d = o3d.geometry.PointCloud() 538 | gripper_aligned_to_hand_pcd_as_o3d.points = o3d.utility.Vector3dVector(gripper_aligned_to_hand_pcd) 539 | gripper_aligned_to_hand_pcd_as_o3d.paint_uniform_color([0, 1, 0]) 540 | 541 | if vizualize: 542 | o3d.visualization.draw_geometries([hand_pcd_as_o3d, gripper_pcd_as_o3d, gripper_aligned_to_hand_pcd_as_o3d]) 543 | return gripper_aligned_to_hand_pcd_as_o3d 544 | 545 | def align_gripper_with_hand_fingers(gripper_scaled_to_hand_pcd, 546 | hand_pcd_as_o3d, 547 | key_fingers_points, 548 | gripper_aligned_to_hand_pcd_as_o3d, 549 | gripper_pcd_dense_mesh, 550 | use_only_thumb_keypoints=False, 551 | use_only_index_keypoints=False, 552 | rescale_gripper_to_hand_opening=False, 553 | rescale_hand_to_gripper_opening=False, 554 | vizualize=False, 555 | bias_transformation=np.eye(4)): 556 | assert (use_only_index_keypoints == use_only_thumb_keypoints) or use_only_index_keypoints == False , f'Either ONLY index {use_only_thumb_keypoints} or thumb keypoints {use_only_index_keypoints} can be used. Both False=Use both' 557 | assert (rescale_gripper_to_hand_opening == rescale_hand_to_gripper_opening) or rescale_gripper_to_hand_opening == False, f'Either rescale gripper to hand opening {rescale_gripper_to_hand_opening} or rescale hand to gripper opening {rescale_hand_to_gripper_opening} can be used. Both False=Do not rescale' 558 | 559 | dense_pcd_kpts = {"index_front": 517980, "thumb_front": 248802, "wrist": 246448} 560 | gripper_fingers = np.array([gripper_scaled_to_hand_pcd.points[dense_pcd_kpts["index_front"]], 561 | gripper_scaled_to_hand_pcd.points[dense_pcd_kpts["index_front"]], 562 | gripper_scaled_to_hand_pcd.points[dense_pcd_kpts["index_front"]], 563 | gripper_scaled_to_hand_pcd.points[dense_pcd_kpts["thumb_front"]], 564 | gripper_scaled_to_hand_pcd.points[dense_pcd_kpts["thumb_front"]], 565 | gripper_scaled_to_hand_pcd.points[dense_pcd_kpts["thumb_front"]], 566 | gripper_scaled_to_hand_pcd.points[dense_pcd_kpts["wrist"]]]) 567 | 568 | if rescale_gripper_to_hand_opening: 569 | distance_between_thumb_and_index = np.linalg.norm(key_fingers_points[0] - key_fingers_points[3]) 570 | distance_between_thumb_and_index_gripper = np.linalg.norm(gripper_fingers[0] - gripper_fingers[3]) 571 | scaling_factor = distance_between_thumb_and_index / distance_between_thumb_and_index_gripper 572 | center_gripper = gripper_scaled_to_hand_pcd.get_center() 573 | gripper_scaled_to_hand_pcd_np = (np.asarray(gripper_scaled_to_hand_pcd.points) - center_gripper) * np.array([scaling_factor, scaling_factor, scaling_factor]) + center_gripper 574 | gripper_scaled_to_hand_pcd.points = o3d.utility.Vector3dVector(gripper_scaled_to_hand_pcd_np) 575 | 576 | if rescale_hand_to_gripper_opening: 577 | distance_between_thumb_and_index = np.linalg.norm(key_fingers_points[0] - key_fingers_points[3]) 578 | distance_between_thumb_and_index_gripper = np.linalg.norm(gripper_fingers[0] - gripper_fingers[3]) 579 | scaling_factor = distance_between_thumb_and_index_gripper / distance_between_thumb_and_index 580 | center_hand = hand_pcd_as_o3d.get_center() 581 | hand_pcd_as_o3d_np = (np.asarray(hand_pcd_as_o3d.points) - center_hand) * np.array([scaling_factor, scaling_factor, scaling_factor]) + center_hand 582 | hand_pcd_as_o3d.points = o3d.utility.Vector3dVector(hand_pcd_as_o3d_np) 583 | # rescale the key fingers points 584 | key_finger_points_center = np.mean(key_fingers_points, axis=0) 585 | key_fingers_points = (key_fingers_points - key_finger_points_center) * np.array([scaling_factor, scaling_factor, scaling_factor]) + key_finger_points_center 586 | 587 | 588 | # determine a line that goes through the index and thumb of the hand 589 | kpt_o3d_sphere = [] 590 | count = 0 591 | # key_fingers_points_4pts = np.array([key_fingers_points[0], key_fingers_points[1], key_fingers_points[3], key_fingers_points[4]]) 592 | key_fingers_points_4pts = np.array([key_fingers_points[0], key_fingers_points[3], key_fingers_points[-1]]) 593 | 594 | key_fingers_points = np.array(key_fingers_points_4pts) 595 | 596 | alpha = 1 597 | line_point = key_fingers_points[0] + alpha * (key_fingers_points[1] - key_fingers_points[0]) 598 | unit_vec_difference = (key_fingers_points[1] - key_fingers_points[0]) / np.linalg.norm(key_fingers_points[1] - key_fingers_points[0]) 599 | distance_gripper_fingers = np.linalg.norm(gripper_fingers[0] - gripper_fingers[4]) 600 | distance_key_fingers = np.linalg.norm(key_fingers_points[0] - key_fingers_points[1]) 601 | difference_half = np.abs(distance_gripper_fingers - distance_key_fingers)/2 602 | pt1 = key_fingers_points[0] - unit_vec_difference * difference_half 603 | pt2 = key_fingers_points[1] + unit_vec_difference * difference_half 604 | middle_finger_point = key_fingers_points[0] + unit_vec_difference * distance_key_fingers/2 605 | distance_middle_griper_middle_finger = np.linalg.norm(gripper_fingers[-1] - middle_finger_point) 606 | unit_difference_between_middle_finger_point_and_key_fingers_last = (middle_finger_point - key_fingers_points[-1]) / np.linalg.norm(middle_finger_point - key_fingers_points[-1]) 607 | 608 | new_hand_point = middle_finger_point - unit_difference_between_middle_finger_point_and_key_fingers_last * distance_middle_griper_middle_finger 609 | distance_pt1_pt2 = np.linalg.norm(pt1 - pt2) 610 | print(f"Distance between pt1 and pt2: {distance_pt1_pt2}") 611 | print(f"Distance between gripper fingers: {distance_gripper_fingers}") 612 | key_fingers_points = np.array([pt1, pt2, key_fingers_points[-1]]) 613 | # key_fingers_points = np.array([pt1, pt2]) 614 | 615 | for kpt in key_fingers_points: 616 | sphere = o3d.geometry.TriangleMesh.create_sphere(radius=.003) 617 | sphere.compute_vertex_normals() 618 | count += 1 619 | if count % 3 == 0: 620 | red, green, blue = 1, 0, 1 621 | elif count % 3 == 1: 622 | red, green, blue = 0,1,0 623 | else: 624 | red, green, blue = 1, 0.5, 0 625 | sphere.paint_uniform_color([red, green, blue]) 626 | sphere.translate(kpt) 627 | kpt_o3d_sphere.append(sphere) 628 | 629 | # add middle finger point 630 | sphere = o3d.geometry.TriangleMesh.create_sphere(radius=.003) 631 | sphere.compute_vertex_normals() 632 | sphere.paint_uniform_color([1, .5, .777]) 633 | sphere.translate(middle_finger_point) 634 | kpt_o3d_sphere.append(sphere) 635 | 636 | # gripper_fingers_4pts = np.array([gripper_fingers[1], gripper_fingers[2], gripper_fingers[4], gripper_fingers[5]]) 637 | gripper_fingers_4pts = np.array([gripper_fingers[0], gripper_fingers[4], gripper_fingers[-1]]) 638 | # gripper_fingers_4pts = np.array([gripper_fingers[0], gripper_fingers[4]]) 639 | 640 | 641 | gripper_fingers = np.array(gripper_fingers_4pts) 642 | gripper_fingers_o3d = [] 643 | count = 0 644 | # Create vizualizer to sequentilaly add spheres to the gripper fingers 645 | for kpt in gripper_fingers: 646 | sphere = o3d.geometry.TriangleMesh.create_sphere(radius=.003) 647 | sphere.compute_vertex_normals() 648 | count += 1 649 | if count % 3 == 0: 650 | red, green, blue = 1, 0, 1 # color: purple 651 | elif count % 3 == 1: 652 | red, green, blue = 0, 1, 0 653 | else: 654 | red, green, blue = 1, 0.5, 0 655 | sphere.paint_uniform_color([red, green, blue]) 656 | sphere.translate(kpt) 657 | gripper_fingers_o3d.append(sphere) 658 | 659 | if use_only_thumb_keypoints: 660 | key_fingers_points = key_fingers_points[2:] 661 | gripper_fingers = gripper_fingers[2:] 662 | gripper_fingers_o3d = gripper_fingers_o3d[2:] 663 | kpt_o3d_sphere = kpt_o3d_sphere[2:] 664 | 665 | if use_only_index_keypoints: 666 | key_fingers_points = key_fingers_points[:4] 667 | gripper_fingers = gripper_fingers[:4] 668 | gripper_fingers_o3d = gripper_fingers_o3d[:4] 669 | kpt_o3d_sphere = kpt_o3d_sphere[:4] 670 | 671 | 672 | T = find_scaled_transformation(gripper_fingers[:2], key_fingers_points[:2], use_scale=False) 673 | # transform the gripper_fingers_o3d to the hand frame 674 | for sphere in gripper_fingers_o3d: 675 | sphere.transform(T) 676 | 677 | gripper_pcd_before_transform = copy.deepcopy(gripper_scaled_to_hand_pcd) 678 | gripper_scaled_to_hand_pcd.transform(T) 679 | # Assume R is the rotation matrix you've computed and t is the translation 680 | # Transform z2 681 | R = T[:3, :3] 682 | t = T[:3, 3] 683 | z1 = key_fingers_points[-1] 684 | x1 = key_fingers_points[0] 685 | y1 = key_fingers_points[1] 686 | z2 = gripper_fingers[-1] 687 | x2 = gripper_fingers[0] 688 | y2 = gripper_fingers[1] 689 | 690 | z2_transformed = R @ z2 + t 691 | # Compute rotation axis (using x2 and y2 after transformation) 692 | x2_transformed = R @ x2 + t 693 | y2_transformed = R @ y2 + t 694 | rotation_axis = y2_transformed - x2_transformed #np.cross(x2_transformed - y2_transformed, z2_transformed - y2_transformed) 695 | 696 | 697 | # find theta that bring z2 as closest as possible to z2 while keeping the rotation axis the same 698 | distance = 10e10 699 | rotation_theta = None 700 | for theta in np.linspace(0, 2 * np.pi, 1000): 701 | R_additional = rotation_matrix(rotation_axis, theta) 702 | z2_final = (z2_transformed - (y2_transformed + x2_transformed) / 2) @ R_additional.T + (y2_transformed + x2_transformed) / 2 703 | distance_temp = np.linalg.norm(z2_final - z1) 704 | if distance_temp < distance: 705 | distance = distance_temp 706 | rotation_theta = theta 707 | 708 | # Apply rotation about the axis 709 | R_additional = rotation_matrix(rotation_axis, rotation_theta) 710 | z2_final = (z2_transformed - (y2_transformed + x2_transformed) / 2) @ R_additional.T + (y2_transformed + x2_transformed) / 2 711 | 712 | T2 = np.eye(4) 713 | T2[:3, :3] = R_additional 714 | gripper_scaled_to_hand_pcd_points = np.asarray(gripper_scaled_to_hand_pcd.points) 715 | gripper_scaled_to_hand_pcd_points = (gripper_scaled_to_hand_pcd_points - (y2_transformed + x2_transformed) / 2) @ R_additional.T + (y2_transformed + x2_transformed) / 2 716 | gripper_scaled_to_hand_pcd.points = o3d.utility.Vector3dVector(gripper_scaled_to_hand_pcd_points) 717 | # gripper_scaled_to_hand_pcd.transform(T2) 718 | gripper_aligned_to_hand_pcd_as_o3d.paint_uniform_color([.1, 1, 1]) 719 | 720 | # z2_final = gripper_scaled_to_hand_pcd.points[dense_pcd_kpts["wrist"]] 721 | # add z2_final to sphere 722 | sphere = o3d.geometry.TriangleMesh.create_sphere(radius=.003) 723 | sphere.compute_vertex_normals() 724 | sphere.paint_uniform_color([0, 0, 0]) 725 | sphere.translate(z2_final) 726 | kpt_o3d_sphere.append(sphere) 727 | 728 | if bias_transformation is None: 729 | bias_transformation = np.eye(4) 730 | # apply bias transformation in the gripper frame 731 | gripper_pose, gripper_zero_mean = get_gripper_transform_in_camera_frame(gripper_scaled_to_hand_pcd, 732 | gripper_pcd_dense_mesh, 733 | return_zero_meaned_gripper=True, 734 | vizualize=vizualize) 735 | gripper_pose = gripper_pose @ bias_transformation 736 | gripper_zero_mean.transform(gripper_pose) 737 | gripper_scaled_to_hand_pcd = copy.deepcopy(gripper_zero_mean) 738 | 739 | if vizualize: 740 | # o3d.visualization.draw_geometries([pcd_image, gripper_scaled_to_hand_pcd]) 741 | line_o3d = o3d.geometry.LineSet() 742 | line_o3d.points = o3d.utility.Vector3dVector([key_fingers_points[0], key_fingers_points[0] + unit_vec_difference * 3]) 743 | line_o3d.lines = o3d.utility.Vector2iVector([[0, 1]]) 744 | 745 | line_o3d_2 = o3d.geometry.LineSet() 746 | line_o3d_2.points = o3d.utility.Vector3dVector([key_fingers_points[0], key_fingers_points[0] - unit_vec_difference * 3]) 747 | line_o3d_2.lines = o3d.utility.Vector2iVector([[0, 1]]) 748 | 749 | line_o3d_3 = o3d.geometry.LineSet() 750 | line_o3d_3.points = o3d.utility.Vector3dVector([middle_finger_point, key_fingers_points[1] + unit_difference_between_middle_finger_point_and_key_fingers_last * 3]) 751 | line_o3d_3.lines = o3d.utility.Vector2iVector([[0, 1]]) 752 | 753 | line_o3d_4 = o3d.geometry.LineSet() 754 | line_o3d_4.points = o3d.utility.Vector3dVector([middle_finger_point, key_fingers_points[1] - unit_difference_between_middle_finger_point_and_key_fingers_last * 3]) 755 | line_o3d_4.lines = o3d.utility.Vector2iVector([[0, 1]]) 756 | 757 | 758 | line_o3d_rotation_axis = o3d.geometry.LineSet() 759 | line_o3d_rotation_axis.points = o3d.utility.Vector3dVector([x2_transformed, x2_transformed + 10 * rotation_axis]) 760 | line_o3d_rotation_axis.lines = o3d.utility.Vector2iVector([[0, 1]]) 761 | 762 | line_o3d_rotation_axis_2 = o3d.geometry.LineSet() 763 | line_o3d_rotation_axis_2.points = o3d.utility.Vector3dVector([y2_transformed, y2_transformed - 10 * rotation_axis]) 764 | line_o3d_rotation_axis_2.lines = o3d.utility.Vector2iVector([[0, 1]]) 765 | 766 | gripper_frame_coord = o3d.geometry.TriangleMesh.create_coordinate_frame(size=.100, origin=[0, 0, 0]) 767 | gripper_frame_coord.transform(gripper_pose) 768 | gripper_scaled_to_hand_pcd.paint_uniform_color([0, 1, 0]) 769 | hand_pcd_as_o3d.paint_uniform_color([1, 0, 0]) 770 | o3d.visualization.draw_geometries([hand_pcd_as_o3d, gripper_scaled_to_hand_pcd, gripper_frame_coord] + gripper_fingers_o3d + kpt_o3d_sphere + [line_o3d, line_o3d_2, line_o3d_3, line_o3d_4, line_o3d_rotation_axis, line_o3d_rotation_axis_2]) 771 | # o3d.visualization.draw_geometries([hand_pcd_as_o3d, gripper_scaled_to_hand_pcd] + gripper_fingers_o3d + kpt_o3d_sphere + [line_o3d_rotation_axis]) 772 | 773 | 774 | return gripper_scaled_to_hand_pcd, gripper_pose, distance_key_fingers 775 | 776 | def align_hand_to_gripper_press(gripper_pcd, 777 | actions, 778 | vizualize=False, 779 | bias_transformation=np.eye(4)): 780 | 781 | gripper_pcd_original_mesh = copy.deepcopy(gripper_pcd) 782 | dense_pcd_kpts = {"index_front": 517980, 783 | "index_middle": 231197, 784 | "index_bottom":335530, 785 | "thumb_front": 248802, 786 | "thumb_middle":71859, 787 | "thumb_bottom":523328, 788 | "wrist": 246448} 789 | 790 | gripper_fingers = np.array([gripper_pcd.points[dense_pcd_kpts["index_front"]], 791 | gripper_pcd.points[dense_pcd_kpts["index_middle"]], 792 | gripper_pcd.points[dense_pcd_kpts["index_bottom"]], 793 | gripper_pcd.points[dense_pcd_kpts["thumb_front"]], 794 | gripper_pcd.points[dense_pcd_kpts["thumb_middle"]], 795 | gripper_pcd.points[dense_pcd_kpts["thumb_bottom"]], 796 | gripper_pcd.points[dense_pcd_kpts["wrist"]]]) 797 | 798 | gripper_fingers[0] = gripper_fingers[0] - (gripper_fingers[0] - gripper_fingers[3]) / 2 799 | gripper_fingers[4] = gripper_fingers[1] - (gripper_fingers[1] - gripper_fingers[4]) / 2 800 | gripper_fingers[-1] = gripper_fingers[2] - (gripper_fingers[2] - gripper_fingers[5]) / 2 801 | key_fingers_points = actions 802 | 803 | kpt_o3d_sphere = [] 804 | count = 0 805 | 806 | for kpt in key_fingers_points: 807 | sphere = o3d.geometry.TriangleMesh.create_sphere(radius=.003) 808 | sphere.compute_vertex_normals() 809 | count += 1 810 | if count % 3 == 0: 811 | red, green, blue = 1, 0, 1 812 | elif count % 3 == 1: 813 | red, green, blue = 0,1,0 814 | else: 815 | red, green, blue = 1, 0.5, 0 816 | sphere.paint_uniform_color([red, green, blue]) 817 | sphere.translate(kpt) 818 | kpt_o3d_sphere.append(sphere) 819 | 820 | 821 | gripper_fingers_4pts = np.array([gripper_fingers[0], gripper_fingers[4], gripper_fingers[-1]]) 822 | gripper_fingers = np.array(gripper_fingers_4pts) 823 | gripper_fingers_o3d = [] 824 | count = 0 825 | # Create vizualizer to sequentilaly add spheres to the gripper fingers 826 | for kpt in gripper_fingers: 827 | sphere = o3d.geometry.TriangleMesh.create_sphere(radius=.003) 828 | sphere.compute_vertex_normals() 829 | count += 1 830 | if count % 3 == 0: 831 | red, green, blue = 1, 0, 1 # color: purple 832 | elif count % 3 == 1: 833 | red, green, blue = 0, 1, 0 834 | else: 835 | red, green, blue = 1, 0.5, 0 836 | sphere.paint_uniform_color([red, green, blue]) 837 | sphere.translate(kpt) 838 | gripper_fingers_o3d.append(sphere) 839 | 840 | # o3d.visualization.draw_geometries([gripper_pcd] + gripper_fingers_o3d ) 841 | 842 | 843 | T = find_scaled_transformation(gripper_fingers, key_fingers_points, use_scale=False) 844 | # transform the gripper_fingers_o3d to the hand frame 845 | for sphere in gripper_fingers_o3d: 846 | sphere.transform(T) 847 | gripper_pcd.transform(T) 848 | 849 | 850 | if bias_transformation is None: 851 | bias_transformation = np.eye(4) 852 | # apply bias transformation in the gripper frame 853 | gripper_pose, gripper_zero_mean = get_gripper_transform_in_camera_frame(gripper_pcd, 854 | gripper_pcd_original_mesh, 855 | return_zero_meaned_gripper=True, 856 | vizualize=vizualize) 857 | gripper_pose = gripper_pose @ bias_transformation 858 | gripper_zero_mean.transform(gripper_pose) 859 | gripper_pcd = copy.deepcopy(gripper_zero_mean) 860 | 861 | if vizualize: 862 | # o3d.visualization.draw_geometries([pcd_image, gripper_scaled_to_hand_pcd]) 863 | 864 | gripper_frame_coord = o3d.geometry.TriangleMesh.create_coordinate_frame(size=.100, origin=[0, 0, 0]) 865 | gripper_frame_coord.transform(gripper_pose) 866 | gripper_pcd.paint_uniform_color([0.0,0.0,0.0]) 867 | o3d.visualization.draw_geometries([gripper_pcd] + kpt_o3d_sphere ) 868 | # o3d.visualization.draw_geometries([hand_pcd_as_o3d, gripper_scaled_to_hand_pcd] + gripper_fingers_o3d + kpt_o3d_sphere + [line_o3d_rotation_axis]) 869 | 870 | 871 | return gripper_pcd, gripper_pose 872 | 873 | def get_gripper_transform_in_camera_frame(gripper_scaled_to_hand_pcd, original_hand_pcd, vizualize=False, return_zero_meaned_gripper=False): 874 | # Add a world frame 875 | world_coord = o3d.geometry.TriangleMesh.create_coordinate_frame(size=.100, origin=[0, 0, 0]) 876 | # Show in open3d 877 | # o3d.visualization.draw_geometries([gripper_scaled_to_hand_pcd, sphere]) 878 | gripper_zero_origin = copy.deepcopy(np.asarray(original_hand_pcd.points)) 879 | # zero mean the z-axis 880 | gripper_zero_origin[:, 2] = gripper_zero_origin[:, 2] - np.mean(gripper_zero_origin[:, 2]) 881 | # rotate 90 degrees around the x-axis 882 | R = np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]]) 883 | gripper_zero_origin = np.dot(R, gripper_zero_origin.T).T 884 | gripper_zero_origin_o3d = o3d.geometry.PointCloud() 885 | gripper_zero_origin_o3d.points = o3d.utility.Vector3dVector(gripper_zero_origin) 886 | p = np.asarray(gripper_zero_origin_o3d.points) 887 | q = np.asarray(gripper_scaled_to_hand_pcd.points) 888 | T = find_scaled_transformation(p, q, use_scale=False) 889 | gripper_coord = o3d.geometry.TriangleMesh.create_coordinate_frame(size=.100, origin=[0, 0, 0]) 890 | gripper_coord.transform(T) 891 | 892 | if vizualize: 893 | o3d.visualization.draw_geometries([gripper_scaled_to_hand_pcd, gripper_zero_origin_o3d, gripper_coord, world_coord]) 894 | if return_zero_meaned_gripper: 895 | return T, gripper_zero_origin_o3d 896 | return T 897 | 898 | 899 | def get_mano_hand_joints_from_vertices(vertices, camera_translation, model_path = MODEL_MANO_PATH): 900 | is_rhand = True 901 | ext='pkl' 902 | data_struct = None 903 | if data_struct is None: 904 | # Load the model 905 | if osp.isdir(model_path): 906 | model_fn = 'MANO_{}.{ext}'.format('RIGHT' if is_rhand else 'LEFT', ext=ext) 907 | mano_path = os.path.join(model_path, model_fn) 908 | else: 909 | mano_path = model_path 910 | is_rhand = True if 'RIGHT' in os.path.basename(model_path) else False 911 | assert osp.exists(mano_path), 'Path {} does not exist!'.format( 912 | mano_path) 913 | if ext == 'pkl': 914 | with open(mano_path, 'rb') as mano_file: 915 | model_data = pickle.load(mano_file, encoding='latin1') 916 | elif ext == 'npz': 917 | model_data = np.load(mano_path, allow_pickle=True) 918 | else: 919 | raise ValueError('Unknown extension: {}'.format(ext)) 920 | data_struct = Struct(**model_data) 921 | 922 | def add_joints(vertices,joints, joint_ids = None): 923 | tip_ids = TIP_IDS['mano'] 924 | dev = vertices.device 925 | if joint_ids is None: 926 | joint_ids = to_tensor(list(tip_ids.values()), 927 | dtype=torch.long).to(dev) 928 | extra_joints = torch.index_select(vertices, 1, joint_ids) 929 | joints = torch.cat([joints, extra_joints], dim=1) 930 | return joints 931 | 932 | data_struct.J_regressor = torch.from_numpy(data_struct.J_regressor.todense()).float() 933 | joints_predicted = lbs.vertices2joints(data_struct.J_regressor, torch.tensor(vertices).unsqueeze(0)) 934 | joints_predicted = add_joints(torch.tensor(vertices).unsqueeze(0), joints_predicted) 935 | 936 | hand_camera_translation_torch = torch.tensor(camera_translation).unsqueeze(0).unsqueeze(0) 937 | # the below comes from renderer.py (ln: 400) to match how the actual hand mesh is 938 | # transformed before rendering 939 | rot_axis=[1,0,0] 940 | rot_angle= 180 941 | rot = trimesh.transformations.rotation_matrix(np.radians(rot_angle), rot_axis) 942 | joints_predicted_rotated = joints_predicted @ rot[:3, :3].T 943 | joints_predicted_translated = joints_predicted_rotated - hand_camera_translation_torch 944 | 945 | return joints_predicted_translated 946 | 947 | def joints_np_joint_meshes(joints, radius=.005, vc=colors['green']): 948 | joints = to_np(joints) 949 | if joints.ndim <3: 950 | joints = joints.reshape(1,-1,3) 951 | meshes = [] 952 | for j in joints: 953 | joint_mesh = Mesh(vertices=j, radius=radius, vc=vc) 954 | meshes.append(joint_mesh) 955 | return meshes 956 | 957 | def get_joints_of_hand_mesh(mesh, vertices, camera_translation): 958 | joints = get_mano_hand_joints_from_vertices(vertices, camera_translation) 959 | joint_meshes = joints_np_joint_meshes(joints) 960 | joint_meshes[0].vertices = joint_meshes[0].vertices + mesh.vertices.mean(0) - joint_meshes[0].vertices.mean(0) # align joints with the hand mesh 961 | joints = joints.squeeze(0) 962 | joints = joints + mesh.vertices.mean(0) - torch.mean(joints, axis=0) # align joints with the hand mesh 963 | return joint_meshes, joints 964 | 965 | def get_hand_keypoints_from_mano_model(joints, rgb_im=None, vizualize=False): 966 | 967 | joint_mesh = np.linalg.inv(T_OPENGL_TO_OPENCV) @ np.vstack((joints.T, np.ones((1, joints.shape[0])))) 968 | joint_mesh = joint_mesh[:3, :].T 969 | jm_X, jm_Y, jm_Z = joint_mesh[:, 0], joint_mesh[:, 1], joint_mesh[:, 2] 970 | x_pixel = (jm_X * INTRINSICS_HAMER_RENDERER[0, 0] / jm_Z) + INTRINSICS_HAMER_RENDERER[0, 2] 971 | y_pixel = (jm_Y * INTRINSICS_HAMER_RENDERER[1, 1] / jm_Z) + INTRINSICS_HAMER_RENDERER[1, 2] 972 | joint_mesh_2d = np.vstack((x_pixel, y_pixel)).T 973 | joint_mesh_2d = joint_mesh_2d.astype(int) 974 | middle_x, middle_y = joint_mesh_2d[MANO_HAND_IDS["middle_tip"]] 975 | index_finger_1_x, index_finger_1_y = joint_mesh_2d[MANO_HAND_IDS["index_tip"]] 976 | index_finger_2_x, index_finger_2_y = joint_mesh_2d[MANO_HAND_IDS["index_pip"]] 977 | index_finger_3_x, index_finger_3_y = joint_mesh_2d[MANO_HAND_IDS["index_dip"]] 978 | thumb_1_x, thumb_1_y = joint_mesh_2d[MANO_HAND_IDS["thumb_tip"]] 979 | thumb_2_x, thumb_2_y = joint_mesh_2d[MANO_HAND_IDS["thumb_pip"]] 980 | thumb_3_x, thumb_3_y = joint_mesh_2d[MANO_HAND_IDS["thumb_dip"]] 981 | if vizualize: 982 | assert rgb_im is not None, "Must provide RGB to vizualize" 983 | 984 | joint_mesh_depth_im = point_cloud_to_depth_image(copy.deepcopy(joint_mesh), 985 | fx=INTRINSICS_HAMER_RENDERER[0, 0], 986 | fy=INTRINSICS_HAMER_RENDERER[1, 1], 987 | cx=INTRINSICS_HAMER_RENDERER[0, 2], 988 | cy=INTRINSICS_HAMER_RENDERER[1, 2], 989 | width=int(INTRINSICS_HAMER_RENDERER[0, 2] * 2), 990 | height=int(INTRINSICS_HAMER_RENDERER[1, 2] * 2)) 991 | joint_mesh_depth_im = np.asarray(joint_mesh_depth_im)[..., np.newaxis] 992 | plt.imshow(joint_mesh_depth_im) 993 | plt.show() 994 | 995 | plt.imshow(rgb_im) 996 | plt.scatter(index_finger_1_x, index_finger_1_y, color='blue', s=10) 997 | plt.scatter(index_finger_2_x, index_finger_2_y, color='orange', s=10) 998 | plt.scatter(index_finger_3_x, index_finger_3_y, color='purple', s=10) 999 | plt.scatter(thumb_1_x, thumb_1_y, color='blue', s=10) 1000 | plt.scatter(thumb_2_x, thumb_2_y, color='orange', s=10) 1001 | plt.scatter(thumb_3_x, thumb_3_y, color='purple', s=10) 1002 | plt.show() 1003 | 1004 | return middle_x, middle_y, index_finger_1_x, index_finger_1_y, index_finger_2_x, index_finger_2_y, \ 1005 | index_finger_3_x, index_finger_3_y, thumb_1_x, thumb_1_y, thumb_2_x, thumb_2_y, thumb_3_x, thumb_3_y 1006 | 1007 | 1008 | def mesh_and_joints_to_world_metric_space(mesh, joints, T_mesh_to_live, scaling_rendered_to_live, live_image_pcd = None, vizualize=False): 1009 | # Visualize alignment of partial point cloud in the world frame 1010 | points_mesh = mesh.sample(10000) 1011 | # points_mesh in camera coordinates 1012 | points_mesh = np.linalg.inv(T_OPENGL_TO_OPENCV) @ np.vstack((points_mesh.T, np.ones((1, points_mesh.shape[0])))) 1013 | points_mesh = points_mesh[:3, :].T 1014 | # points_mesh = points_mesh * scaling_rendered_to_live 1015 | points_mesh = apply_transformation(points_mesh, T_mesh_to_live) 1016 | 1017 | joint_mesh = np.linalg.inv(T_OPENGL_TO_OPENCV) @ np.vstack((joints.T, np.ones((1, joints.shape[0])))) 1018 | joint_mesh = joint_mesh[:3, :].T 1019 | # joint_mesh = joint_mesh * scaling_rendered_to_live 1020 | joint_mesh = apply_transformation(joint_mesh, T_mesh_to_live) 1021 | 1022 | mesh.apply_transform(np.linalg.inv(T_OPENGL_TO_OPENCV)) # probably could have done this before points_mesh to save some lines, but yeah it is what it is 1023 | mesh.apply_scale(scaling_rendered_to_live) 1024 | mesh.apply_transform(T_mesh_to_live) 1025 | mesh_o3d = o3d.geometry.TriangleMesh() 1026 | mesh_o3d.vertices = o3d.utility.Vector3dVector(mesh.vertices) 1027 | mesh_o3d.triangles = o3d.utility.Vector3iVector(mesh.faces) 1028 | mesh_o3d.compute_vertex_normals() 1029 | mesh_o3d.vertex_colors = o3d.utility.Vector3dVector(mesh.visual.vertex_colors[:, :3] / 255.0) 1030 | 1031 | if vizualize: 1032 | assert live_image_pcd is not None, "Need to provide the live image point cloud to vizualize mesh and joints in live image space" 1033 | 1034 | 1035 | pcd_mesh = o3d.geometry.PointCloud() 1036 | pcd_mesh.points = o3d.utility.Vector3dVector(points_mesh) 1037 | # pcd joints as speheres 1038 | pcd_joints = o3d.geometry.TriangleMesh.create_coordinate_frame(size=.100, origin=joint_mesh[0]) 1039 | for j in joint_mesh: 1040 | sphere = o3d.geometry.TriangleMesh.create_sphere(radius=.005) 1041 | sphere.compute_vertex_normals() 1042 | sphere.paint_uniform_color([1, 0, 0]) 1043 | sphere.translate(j) 1044 | pcd_joints += sphere 1045 | o3d.visualization.draw_geometries([pcd_mesh, pcd_joints, live_image_pcd, mesh_o3d]) 1046 | return points_mesh, joint_mesh, mesh_o3d 1047 | 1048 | 1049 | def get_gripper_pose_from_frame(rgb, depth, use_mediapipe_for_hand_kpts=False, vizualize=False, press_task=False, scale_depth_image=False): 1050 | depth = np.array(depth).astype(np.float32) 1051 | rgb_im, rgb_hand_only, rend_depth_front_view, det_out, hamer_output, hand_mesh_params, all_mesh_params = get_hand_and_rendered_depth(rgb) 1052 | depth_im = depth 1053 | human_mask = get_hand_mask_from_detectron(det_out) 1054 | all_meshes, all_cameras, all_vertices = all_mesh_params[0], all_mesh_params[1], all_mesh_params[2] 1055 | rgb_hand_only = rgb_hand_only[:,:,:3] 1056 | joint_meshes, joints_coords = get_joints_of_hand_mesh(copy.deepcopy(all_meshes[0]), copy.deepcopy(all_vertices[0]), copy.deepcopy(all_cameras[0])) 1057 | if use_mediapipe_for_hand_kpts: 1058 | middle_x, middle_y, index_finger_1_x, index_finger_1_y, index_finger_2_x, index_finger_2_y, index_finger_3_x, index_finger_3_y, thumb_1_x, thumb_1_y, thumb_2_x, thumb_2_y, thumb_3_x, thumb_3_y = get_hand_keypoints_from_mediapipe(rgb_im, viz_keypoints=vizualize) 1059 | else: 1060 | middle_x, middle_y, index_finger_1_x, index_finger_1_y, index_finger_2_x, index_finger_2_y, index_finger_3_x, index_finger_3_y, thumb_1_x, thumb_1_y, thumb_2_x, thumb_2_y, thumb_3_x, thumb_3_y = get_hand_keypoints_from_mano_model(joints_coords, rgb_im=rgb_im, vizualize=vizualize) 1061 | scaling_factor = 1 1062 | if scale_depth_image: 1063 | # generally no need to use this 1064 | d_val = -1 1065 | base_value_finger = 'index' # index, middle, thumb 1066 | base_val = copy.deepcopy(rend_depth_front_view[index_finger_1_y, index_finger_1_x]) 1067 | idx_x, idx_y = index_finger_1_x, index_finger_1_y 1068 | if base_val == 0 or depth_im[idx_y, idx_x] == 0: 1069 | base_value_finger = 'middle' # index, middle, thumb 1070 | base_val = copy.deepcopy(rend_depth_front_view[middle_y, middle_x]) 1071 | idx_x, idx_y = middle_x, middle_y 1072 | if base_val == 0 or depth_im[idx_y, idx_x] == 0: 1073 | base_value_finger = 'thumb' 1074 | base_val = copy.deepcopy(rend_depth_front_view[thumb_1_y, thumb_1_x]) 1075 | idx_x, idx_y = thumb_1_x, thumb_1_y 1076 | d_val = copy.deepcopy(depth_im[idx_y, idx_x]) 1077 | percentage_rend_depth_front_view = rend_depth_front_view / base_val 1078 | scaled_depth_image = percentage_rend_depth_front_view * d_val 1079 | scaling_factor = d_val / base_val 1080 | print(f'Scaling factor {base_value_finger}: {scaling_factor}') 1081 | else: 1082 | scaled_depth_image = copy.deepcopy(rend_depth_front_view) 1083 | hand_point_cloud, hand_point_cloud_full, image_point_cloud, hand_pcd_as_o3d, live_image_pcd_as_o3d, T_mesh_to_live = get_hand_pcd_in_scene_from_rendered_model(rgb_im, 1084 | depth_im, 1085 | rend_depth_front_view, 1086 | scaled_depth_image, 1087 | human_mask, 1088 | vizualize=vizualize) 1089 | mesh_live_np, joints_live, hand_mesh_in_live_metric_space = mesh_and_joints_to_world_metric_space(copy.deepcopy(all_meshes[0]), 1090 | copy.deepcopy(joints_coords), 1091 | T_mesh_to_live, 1092 | scaling_factor, 1093 | live_image_pcd=live_image_pcd_as_o3d, 1094 | vizualize=vizualize) 1095 | gripper_pcd = np.load('assets/utils/gripper_point_cloud_dense.npy') 1096 | gripper_pcd = gripper_pcd / 1000 # scale down by 1000 1097 | gripper_pcd_as_o3d = o3d.geometry.PointCloud() 1098 | gripper_pcd_as_o3d.points = o3d.utility.Vector3dVector(gripper_pcd) 1099 | gripper_aligned_to_hand_pcd_as_o3d = align_gripper_to_hand(hand_point_cloud, 1100 | hand_pcd_as_o3d, 1101 | gripper_pcd, 1102 | gripper_pcd_as_o3d, 1103 | vizualize=vizualize) 1104 | 1105 | key_fingers_points = np.array([joints_live[MANO_HAND_IDS["index_tip"]], 1106 | joints_live[MANO_HAND_IDS["index_dip"]], 1107 | joints_live[MANO_HAND_IDS["index_pip"]], 1108 | joints_live[MANO_HAND_IDS["thumb_tip"]], 1109 | joints_live[MANO_HAND_IDS["thumb_dip"]], 1110 | joints_live[MANO_HAND_IDS["thumb_pip"]], 1111 | joints_live[MANO_HAND_IDS["wrist"]], 1112 | (joints_live[MANO_HAND_IDS["index_mcp"]] + joints_live[MANO_HAND_IDS["thumb_dip"]]) / 2]) 1113 | key_fingers_points = key_fingers_points.reshape(-1, 3) 1114 | gripper_scaled_to_hand_pcd = copy.deepcopy(gripper_aligned_to_hand_pcd_as_o3d) # replace, so no scale applied. Comment out if scaling - although likely you don't need to 1115 | gripper_scaled_to_hand_pcd.paint_uniform_color([0, 0, 1]) 1116 | bias_T = np.eye(4) 1117 | bias_T[2, 3] = 0.01 # NOTE: This is explicit to the robotiq CAD model gripper we had 1118 | if not press_task: 1119 | gripper_scaled_to_hand_pcd, gripper_pose, distance_hand_fingers = align_gripper_with_hand_fingers(gripper_scaled_to_hand_pcd, 1120 | np_to_o3d(mesh_live_np), 1121 | key_fingers_points, 1122 | gripper_aligned_to_hand_pcd_as_o3d, 1123 | gripper_pcd_as_o3d, 1124 | use_only_thumb_keypoints=False, 1125 | use_only_index_keypoints=False, 1126 | rescale_gripper_to_hand_opening=False, 1127 | rescale_hand_to_gripper_opening=False, 1128 | bias_transformation=bias_T, 1129 | vizualize=vizualize) 1130 | gripper_opening_percent = np.min([distance_hand_fingers / DISTANCE_BETWEEN_GRIPPERS_FINGERS, 1]) 1131 | else: 1132 | gripper_scaled_to_hand_pcd, gripper_pose = align_hand_to_gripper_press(gripper_scaled_to_hand_pcd, 1133 | np.array([key_fingers_points[0], key_fingers_points[1], key_fingers_points[2]]), 1134 | vizualize=vizualize, 1135 | bias_transformation=bias_T) 1136 | gripper_opening_percent = 0 1137 | 1138 | return gripper_scaled_to_hand_pcd, live_image_pcd_as_o3d, gripper_pose, gripper_opening_percent, hand_mesh_in_live_metric_space, joints_live 1139 | 1140 | 1141 | 1142 | 1143 | 1144 | gripper_poses = [] 1145 | gripper_pcds = [] 1146 | live_image_pcds = [] 1147 | meshes = [] 1148 | joint_actions = [] 1149 | gripper_actions = [] 1150 | if not LOAD_SCENE_DATA_FOR_PROCESSING: 1151 | print("Processing frames...") 1152 | for idx in tqdm.tqdm(range(0, hands_rgb.shape[0], FRAME_STEP)): 1153 | try: 1154 | rgb = hands_rgb[idx] 1155 | depth = hands_depth[idx] / 1000 1156 | gripper_scaled_to_hand_pcd, live_image_pcd_as_o3d, gripper_pose, gripper_opening_percent, hand_mesh, hand_joints_kpts = get_gripper_pose_from_frame(rgb, depth, vizualize=VIZ, press_task=IS_PRESS_TASK) 1157 | gripper_poses.append(gripper_pose) 1158 | gripper_pcds.append(gripper_scaled_to_hand_pcd) 1159 | live_image_pcds.append(live_image_pcd_as_o3d) 1160 | gripper_actions.append(gripper_opening_percent) 1161 | meshes.append(hand_mesh) 1162 | joint_actions.append(hand_joints_kpts) 1163 | o3d.io.write_point_cloud(f'{SCENE_FILES_FOLDER}/gripper_pcd_{idx}.ply', gripper_scaled_to_hand_pcd) 1164 | o3d.io.write_point_cloud(f'{SCENE_FILES_FOLDER}/live_image_pcd_{idx}.ply', live_image_pcd_as_o3d) 1165 | o3d.io.write_triangle_mesh(f'{SCENE_FILES_FOLDER}/hand_mesh_{idx}.ply', hand_mesh) 1166 | np.save(f'{SCENE_FILES_FOLDER}/gripper_pose_{idx}.npy', gripper_pose) 1167 | np.save(f'{SCENE_FILES_FOLDER}/gripper_actions_{idx}.npy', gripper_actions) 1168 | np.save(f'{SCENE_FILES_FOLDER}/hand_joints_kpts_{idx}.npy', hand_joints_kpts) 1169 | except Exception as e: 1170 | print(f'Error in frame {idx}: {e}') 1171 | continue 1172 | print("Saving joint actions 3D...") 1173 | joint_actions = np.asarray(joint_actions) 1174 | joint_actions = np.array(joint_actions) 1175 | np.save(f'{SCENE_FILES_FOLDER}/hand_joints_kpts_3d.npy', joint_actions) 1176 | print("Projecting joint actions to 2D...") 1177 | jm_X, jm_Y, jm_Z = joint_actions[:, :, 0], joint_actions[:, :, 1], joint_actions[:, :, 2] 1178 | x_pixel = (jm_X * INTRINSICS_REAL_CAMERA[0, 0] / jm_Z) + INTRINSICS_REAL_CAMERA[0, 2] 1179 | y_pixel = (jm_Y * INTRINSICS_REAL_CAMERA[1, 1] / jm_Z) + INTRINSICS_REAL_CAMERA[1, 2] 1180 | joints_acts_2d = np.stack([x_pixel, y_pixel], axis=2) 1181 | print("Saving joint actions 2D...") 1182 | np.save(f'{SCENE_FILES_FOLDER}/hand_joints_kpts_2d.npy', joints_acts_2d) 1183 | 1184 | 1185 | else: 1186 | print("Loading frames...") 1187 | gripper_poses = [] 1188 | gripper_pcds = [] 1189 | live_image_pcds = [] 1190 | meshes = [] 1191 | for idx in tqdm.tqdm(range(0, hands_rgb.shape[0], FRAME_STEP)): 1192 | gripper_pose = np.load(f'{SCENE_FILES_FOLDER}/gripper_pose_{idx}.npy') 1193 | gripper_pcd = o3d.io.read_point_cloud(f'{SCENE_FILES_FOLDER}/gripper_pcd_{idx}.ply') 1194 | live_image_pcd = o3d.io.read_point_cloud(f'{SCENE_FILES_FOLDER}/live_image_pcd_{idx}.ply') 1195 | hand_mesh = o3d.io.read_triangle_mesh(f'{SCENE_FILES_FOLDER}/hand_mesh_{idx}.ply') 1196 | hand_joints_kpts = np.load(f'{SCENE_FILES_FOLDER}/hand_joints_kpts_{idx}.npy') 1197 | gripper_poses.append(gripper_pose) 1198 | gripper_pcds.append(gripper_pcd) 1199 | live_image_pcds.append(live_image_pcd) 1200 | meshes.append(hand_mesh) 1201 | 1202 | print("Interpolate trajectories...") 1203 | # smooth gripper_poses trajectory 1204 | interpolated_gripper_poses = interpolate_pose_sequence(gripper_poses, FRAME_STEP) 1205 | np.save(f'{SCENE_FILES_FOLDER}/interpolated_gripper_poses.npy', interpolated_gripper_poses) 1206 | print("Print saved interpolated trajectory (no smoothing)") 1207 | print("Extracting linear component to smooth...") 1208 | interpolated_gripper_poses_xyz = [] 1209 | for pose in tqdm.tqdm(interpolated_gripper_poses): 1210 | interpolated_gripper_poses_xyz.append(pose[:3, 3]) 1211 | interpolated_gripper_poses_xyz = np.array(interpolated_gripper_poses_xyz) 1212 | 1213 | window_size = FRAME_STEP 1214 | print(f'Print weighted average smoothing with window: {window_size}...') 1215 | interpolated_gripper_poses_filtered = copy.deepcopy(interpolated_gripper_poses_xyz) 1216 | for i in tqdm.tqdm(range(window_size, interpolated_gripper_poses_xyz.shape[0] - window_size)): 1217 | interpolated_gripper_poses_filtered[i] = np.mean(interpolated_gripper_poses_xyz[i - window_size: i + window_size], axis=0) 1218 | interpolated_gripper_poses_filtered = np.array(interpolated_gripper_poses_filtered) 1219 | 1220 | # for i in tqdm.tqdm(range(interpolated_gripper_poses.shape[0])): 1221 | # interpolated_gripper_poses[i, :3, 3] = interpolated_gripper_poses_filtered[i] 1222 | # np.save(f'{scene_files_folder_name}/interpolated_gripper_poses_filtered.npy', interpolated_gripper_poses) 1223 | print("Saved interpolated + smoothed trajectory") 1224 | gripper_pcd0 = gripper_pcds[0] 1225 | gripper_pose0 = gripper_poses[0] 1226 | gripper_pose0_inv = np.linalg.inv(gripper_pose0) 1227 | 1228 | print("Generating gripper pcds for interpolated poses...") 1229 | interpolated_gripper_poses_coord_frames = [] 1230 | interpolated_gripper_pcds = [] 1231 | for pose in tqdm.tqdm(interpolated_gripper_poses): 1232 | coord_frame = o3d.geometry.TriangleMesh.create_coordinate_frame(size=.100, origin=[0, 0, 0]) 1233 | coord_frame.transform(pose) 1234 | interpolated_gripper_poses_coord_frames.append(coord_frame) 1235 | interpolated_gripper_pcds.append(copy.deepcopy(gripper_pcd0).transform(pose @ gripper_pose0_inv)) 1236 | 1237 | print("Projecting target gripper pcds (waypoints) to 2D image...") 1238 | gripper_projections = [] 1239 | for i in tqdm.tqdm(range(len(gripper_pcds))): 1240 | gripper_pcd = gripper_pcds[i] 1241 | gripper_pcd_np = np.asarray(gripper_pcd.points) 1242 | gripper_pcd_depth_im = point_cloud_to_depth_image(gripper_pcd_np, 1243 | INTRINSICS_REAL_CAMERA[0, 0], 1244 | INTRINSICS_REAL_CAMERA[1, 1], 1245 | INTRINSICS_REAL_CAMERA[0, 2], 1246 | INTRINSICS_REAL_CAMERA[1, 2], 1247 | width=IM_WIDTH, 1248 | height=IM_HEIGHT) 1249 | gripper_projections.append(gripper_pcd_depth_im) 1250 | 1251 | print("Projecting interpolated gripper pcds to 2D image...") 1252 | interpolated_gripper_projections = [] 1253 | for i in tqdm.tqdm(range(len(interpolated_gripper_pcds))): 1254 | gripper_pcd = interpolated_gripper_pcds[i] 1255 | gripper_pcd_np = np.asarray(gripper_pcd.points) 1256 | gripper_pcd_depth_im = point_cloud_to_depth_image(gripper_pcd_np, 1257 | INTRINSICS_REAL_CAMERA[0, 0], 1258 | INTRINSICS_REAL_CAMERA[1, 1], 1259 | INTRINSICS_REAL_CAMERA[0, 2], 1260 | INTRINSICS_REAL_CAMERA[1, 2], 1261 | width=IM_WIDTH, 1262 | height=IM_HEIGHT) 1263 | interpolated_gripper_projections.append(gripper_pcd_depth_im) 1264 | 1265 | print("Generating video with projected pcds...") 1266 | cnt = 0 1267 | cnt_gripper_proj = 0 1268 | video_ims = [] 1269 | for i in range(0, hands_rgb.shape[0]): 1270 | gripper_proj = interpolated_gripper_projections[cnt][:, :, np.newaxis] 1271 | gripper_proj[gripper_proj > 0] = 1 1272 | im = .5 * gripper_proj + 1 * hands_rgb[i]/255 + 0.5 * gripper_projections[cnt_gripper_proj][:, :, np.newaxis].repeat(3, axis=2) * np.array([0, 1, 0]) 1273 | if cnt % FRAME_STEP == 0: 1274 | cnt_gripper_proj += 1 1275 | im = im[..., ::-1] 1276 | video_ims.append(im) 1277 | cnt += 1 1278 | if cnt >= len(interpolated_gripper_projections) or cnt_gripper_proj >= len(gripper_projections): 1279 | break 1280 | 1281 | print("Saving video smoothed interpolated...") 1282 | # save video as mp4 1283 | height, width, layers = video_ims[0].shape 1284 | size = (width,height) 1285 | out = cv2.VideoWriter(f'{SCENE_FILES_FOLDER}/../gripper_overlayed_video.mp4', cv2.VideoWriter_fourcc(*'DIVX'), 10, size) 1286 | for i in range(len(video_ims)): 1287 | im = (video_ims[i] * 255).astype(np.uint8) 1288 | out.write(im) 1289 | 1290 | out.release() 1291 | 1292 | 1293 | -------------------------------------------------------------------------------- /vizualizer.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import open3d as o3d\n", 10 | "import numpy as np\n", 11 | "import matplotlib.pyplot as plt\n", 12 | "from util_functions import *\n", 13 | "from global_vars import *\n" 14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | "execution_count": null, 19 | "metadata": {}, 20 | "outputs": [], 21 | "source": [ 22 | "# load rgbs, depths and hand joint actions\n", 23 | "path = 'assets/grasp_fanta/'\n", 24 | "rgbs = np.load(f'{path}rgbs.npy')\n", 25 | "depths = np.load(f'{path}depths.npy')" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 13, 31 | "metadata": {}, 32 | "outputs": [], 33 | "source": [ 34 | "\n", 35 | "MANO_HAND_IDS = {\"wrist\": 0, \"index_mcp\": 1, \"index_pip\": 2, \n", 36 | " \"index_dip\": 3, \"middle_mcp\": 4, \"middle_pip\": 5, \n", 37 | " \"middle_dip\": 6, \"pinkie_mcp\": 7, \"pinkie_pip\": 8, \n", 38 | " \"pinkie_dip\": 9, \"ring_mcp\": 10, \"ring_pip\": 11, \n", 39 | " \"ring_dip\": 12, \"thumb_mcp\": 13, \"thumb_pip\": 14, \n", 40 | " \"thumb_dip\": 15, \"thumb_tip\": 16, \"index_tip\": 17, \n", 41 | " \"middle_tip\": 18, \"ring_tip\": 19, \"pinky_tip\": 20}\n", 42 | "\n", 43 | "\"\"\"\n", 44 | "the below is what you likely need. you can vizualize them\n", 45 | "at the following cell.\"\"\"\n", 46 | "actions = np.load(f'{path}/scene_files/hand_joints_kpts_3d.npy')\n", 47 | "rgbs = rgbs[:len(actions)] # sometimes some actions are missing because they were not extracted during processing as for some frames there were no hand present and so they are automatically discarded\n", 48 | "depths = depths[:len(actions)]" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 14, 54 | "metadata": {}, 55 | "outputs": [ 56 | { 57 | "name": "stderr", 58 | "output_type": "stream", 59 | "text": [ 60 | "RPly: Unable to open file\n", 61 | "RPly: Unable to open file\n", 62 | "RPly: Unable to open file\n" 63 | ] 64 | }, 65 | { 66 | "name": "stdout", 67 | "output_type": "stream", 68 | "text": [ 69 | "\u001b[1;33m[Open3D WARNING] Read PLY failed: unable to open file: assets/pick_up_phone//scene_files/live_image_pcd_12.ply\u001b[0;m\n", 70 | "\u001b[1;33m[Open3D WARNING] Read PLY failed: unable to open file: assets/pick_up_phone//scene_files/hand_mesh_12.ply\u001b[0;m\n", 71 | "\u001b[1;33m[Open3D WARNING] Read PLY failed: unable to open file: assets/pick_up_phone//scene_files/gripper_pcd_12.ply\u001b[0;m\n" 72 | ] 73 | } 74 | ], 75 | "source": [ 76 | "\n", 77 | "# load scene files\n", 78 | "\n", 79 | "depth_pcds = []\n", 80 | "hand_mesh_pcds = []\n", 81 | "gripper_pcds = []\n", 82 | "sphere_viz_actions = []\n", 83 | "depth_pcds_no_stabilization = []\n", 84 | "hand_joints = []\n", 85 | "for i in range(rgbs.shape[0]):\n", 86 | "\n", 87 | " frame = i\n", 88 | " depth_pcd = o3d.io.read_point_cloud(f\"{path}/scene_files/live_image_pcd_{frame}.ply\")\n", 89 | " # remove far away depth for visualization\n", 90 | " depth_pcd_np = np.asarray(depth_pcd.points)\n", 91 | " depth_pcd_cols = np.asarray(depth_pcd.colors)\n", 92 | " depth_pcd_cols = depth_pcd_cols[depth_pcd_np[:, 2] < 1]\n", 93 | " depth_pcd_np = depth_pcd_np[depth_pcd_np[:, 2] < 1]\n", 94 | " depth_pcd.points = o3d.utility.Vector3dVector(depth_pcd_np)\n", 95 | " depth_pcd.colors = o3d.utility.Vector3dVector(depth_pcd_cols)\n", 96 | "\n", 97 | " hand_mesh_pcd = o3d.io.read_triangle_mesh(f\"{path}/scene_files/hand_mesh_{frame}.ply\")\n", 98 | " gripper_pcd = o3d.io.read_point_cloud(f\"{path}/scene_files/gripper_pcd_{frame}.ply\")\n", 99 | " depth_pcds.append(depth_pcd)\n", 100 | " hand_mesh_pcds.append(hand_mesh_pcd)\n", 101 | " gripper_pcds.append(gripper_pcd)\n", 102 | " action = actions[i]\n", 103 | " sphere_viz_actions = []\n", 104 | " for a in actions[i]:\n", 105 | " sphere = o3d.geometry.TriangleMesh.create_sphere(radius=0.005)\n", 106 | " sphere.compute_vertex_normals()\n", 107 | " sphere.paint_uniform_color([0, 0, 1])\n", 108 | " sphere.translate(a)\n", 109 | " sphere_viz_actions.append(sphere)\n", 110 | " sphere_viz_actions.append(sphere)\n", 111 | " hand_joints.append(sphere_viz_actions)\n", 112 | "\n", 113 | "\n", 114 | "\n" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": 15, 120 | "metadata": {}, 121 | "outputs": [], 122 | "source": [ 123 | "idx = 10 # pick the index\n", 124 | "\n", 125 | "o3d.visualization.draw_geometries([depth_pcds[idx], gripper_pcds[idx], hand_mesh_pcds[idx]]) \n" 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "execution_count": null, 131 | "metadata": {}, 132 | "outputs": [], 133 | "source": [] 134 | } 135 | ], 136 | "metadata": { 137 | "kernelspec": { 138 | "display_name": "base", 139 | "language": "python", 140 | "name": "python3" 141 | }, 142 | "language_info": { 143 | "codemirror_mode": { 144 | "name": "ipython", 145 | "version": 3 146 | }, 147 | "file_extension": ".py", 148 | "mimetype": "text/x-python", 149 | "name": "python", 150 | "nbconvert_exporter": "python", 151 | "pygments_lexer": "ipython3", 152 | "version": "3.11.5" 153 | } 154 | }, 155 | "nbformat": 4, 156 | "nbformat_minor": 2 157 | } 158 | --------------------------------------------------------------------------------