├── LICENSE
├── README.md
├── assets
    ├── grasp_fanta
    │   ├── depths.npy
    │   └── rgbs.npy
    ├── imgs
    │   └── mainfig.png
    ├── pick_up_phone
    │   ├── depths.npy
    │   └── rgbs.npy
    └── utils
    │   ├── gripper_point_cloud_dense.npy
    │   ├── head_cam_intrinsic_matrix_aligned_depth.npy
    │   └── intrinsics_rgb_d455.npy
├── conda_env.yaml
├── global_vars.py
├── gripper_point_cloud_dense.npy
├── renderer.py
├── requirements.txt
├── util_functions.py
├── video_to_gripper_hamer_kpts.py
└── vizualizer.ipynb


/LICENSE:
--------------------------------------------------------------------------------
 1 | BSD 3-Clause License
 2 | 
 3 | Copyright (c) 2025, Georgios Papagiannis
 4 | 
 5 | Redistribution and use in source and binary forms, with or without
 6 | modification, are permitted provided that the following conditions are met:
 7 | 
 8 | 1. Redistributions of source code must retain the above copyright notice, this
 9 |    list of conditions and the following disclaimer.
10 | 
11 | 2. Redistributions in binary form must reproduce the above copyright notice,
12 |    this list of conditions and the following disclaimer in the documentation
13 |    and/or other materials provided with the distribution.
14 | 
15 | 3. Neither the name of the copyright holder nor the names of its
16 |    contributors may be used to endorse or promote products derived from
17 |    this software without specific prior written permission.
18 | 
19 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
20 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
21 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
22 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
23 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
24 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
25 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
26 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
27 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
28 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
29 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | ## 🦾 R+X:  Retrieval and Execution from Everyday Human Videos
 2 | ## 🎥 Extracting robotic actions from human video demonstrations
 3 | 
 4 | ![teaser](./assets/imgs/mainfig.png)
 5 | 
 6 | This repository provides the code used to extract robot actions given a human video demonstration for the paper **R+X:  Retrieval and Execution from Everyday Human Videos** [[paper](https://arxiv.org/abs/2407.12957)]. R+X is divided into a Retrieval and an Execution phase. The execution phase leverages the actions extracted from the human videos to execute tasks zero-shot in new settings. R+X's execution phase relies on Keypoint Action Tokens (KAT). For code on KAT please refer to: [colab](https://colab.research.google.com/drive/1ZxZDahvLBE2EGo8_98Lk0Zr8IvUIDjMc?usp=sharing). This is a simplified version in that it is not optimised to process large scale video datasets in parallel, instead it sequentially processes each frame in a video, runs on cpu (apart from the hand-tracking model) and acts as a good starting point demonstrating how to extract actions from human videos. We provide one video of the "grasp franta" and "pick up phone" tasks used in our experiments. Even though our experiments involved a human recording videos while moving, here we provide an example with a moving camera and one with a static camera each with different intrinsics.
 7 | 
 8 | ### 📋 Installation
 9 | 
10 | Download repository:
11 | 
12 | ```bash
13 | git clone https://github.com/gpapagiannis/r-plus-x-hand2actions.git
14 | cd r-plus-x-hand2actions
15 | ```
16 | 
17 | Create a conda environment. Needs python >= 3.10:
18 | 
19 | ```bash
20 | conda env create -f conda_env.yaml
21 | conda activate rpx
22 | ```
23 | 
24 | Install requirements.txt
25 | 
26 | ```bash
27 | pip install -r requirements.txt
28 | ```
29 | 
30 | - **Install** HaMeR for hand tracking. Follow the instructions in [https://github.com/geopavlakos/hamer](https://github.com/geopavlakos/hamer) 
31 | - After installing HaMeR, go to:  **_DATA > hamer_ckpts > model_config.yaml**. Then in the model_config.yaml set: **FOCAL_LENGTH = 918**. This ensures compatibility with the camera intrinsics used for the provided examples. We will fix that soon to be set automatically given any intrinsics matrix. 
32 | - After installing HaMeR, **replace** the renderer.py file in "hamer/hamer/utils/renderer.py" with the renderer.py provided here [renderer.py](renderer.py) 
33 | - **Install** the Pytorch implementation of MANO hand model [https://github.com/otaheri/MANO](https://github.com/otaheri/MANO) 
34 | 
35 | 
36 | 
37 | 
38 | ### 👟 Running the code ...
39 | 
40 | The whole method is contained in [video_to_gripper_hamer_kpts](video_to_gripper_hamer_kpts.py). Just run:
41 | 
42 | ```bash
43 | python video_to_gripper_hamer_kpts.py
44 | ```
45 | By default this will extract the actions for the "grasp_fanta" human video demo found [here](./assets/grasp_fanta/). For the pick_up_phone task (and its corresponding intrinsics), as well as to vizualize step by step the hand extraction process see the [global_vars.py](global_vars.py) file.
46 | 
47 | 
48 | ### ✏️ Important Note
49 | 
50 | Note: Our method is optimised for the Robotiq Gripper 2F-85 which was used for our experiments. The model we used is in [gripper_point_cloud_dense.npy](gripper_point_cloud_dense.npy). In principle, the heuristics used to map the gripper to the hand could be applied to most parallel jaw gripper, however you would need to manually define the gripper points you would like to map the hand to. We have done this in line 552 of [video_to_gripper_hamer_kpts.py](video_to_gripper_hamer_kpts.py). \
51 | 
52 | Line 552:
53 | 
54 | ```bash
55 | dense_pcd_kpts = {"index_front": 517980, "thumb_front": 248802, "wrist": 246448}
56 | ```
57 | 
58 | The numbers correspond to the indices of the points in the point cloud: [gripper_point_cloud_dense.npy](gripper_point_cloud_dense.npy). You can vizualize the action extraction process both in our code and by visiting the bottom of our website: [https://www.robot-learning.uk/r-plus-x](https://www.robot-learning.uk/r-plus-x). If you use a different parallel jaw gripper, convert the mesh to a point cloud, select the indices you would like to map from the gripper to the tip of the index, the tip of the thumb and the mean distance between index mcp and the thumb dip of the hand.
59 | 
60 | ### 🎞️ Vizualizing the results
61 | 
62 | The notebook [vizualizer.ipynb](vizualizer.ipynb) provides minimal code to vizualize the hand to gripper action extraction results. For each task a scene_files folder is created that contains for each video frame, the frame's point cloud, the detected hand mesh, the gripper pose in the camera's frame, the gripper's point cloud and a file hand_joints_kpts_3d.npy that includes the sequence of gripper poses which could be used for further processing to train a model.
63 | 


--------------------------------------------------------------------------------
/assets/grasp_fanta/depths.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gpapagiannis/r-plus-x-hand2actions/b269966cb99bd7b5031dd1d4e7c4727c38055a87/assets/grasp_fanta/depths.npy


--------------------------------------------------------------------------------
/assets/grasp_fanta/rgbs.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gpapagiannis/r-plus-x-hand2actions/b269966cb99bd7b5031dd1d4e7c4727c38055a87/assets/grasp_fanta/rgbs.npy


--------------------------------------------------------------------------------
/assets/imgs/mainfig.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gpapagiannis/r-plus-x-hand2actions/b269966cb99bd7b5031dd1d4e7c4727c38055a87/assets/imgs/mainfig.png


--------------------------------------------------------------------------------
/assets/pick_up_phone/depths.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gpapagiannis/r-plus-x-hand2actions/b269966cb99bd7b5031dd1d4e7c4727c38055a87/assets/pick_up_phone/depths.npy


--------------------------------------------------------------------------------
/assets/pick_up_phone/rgbs.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gpapagiannis/r-plus-x-hand2actions/b269966cb99bd7b5031dd1d4e7c4727c38055a87/assets/pick_up_phone/rgbs.npy


--------------------------------------------------------------------------------
/assets/utils/gripper_point_cloud_dense.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gpapagiannis/r-plus-x-hand2actions/b269966cb99bd7b5031dd1d4e7c4727c38055a87/assets/utils/gripper_point_cloud_dense.npy


--------------------------------------------------------------------------------
/assets/utils/head_cam_intrinsic_matrix_aligned_depth.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gpapagiannis/r-plus-x-hand2actions/b269966cb99bd7b5031dd1d4e7c4727c38055a87/assets/utils/head_cam_intrinsic_matrix_aligned_depth.npy


--------------------------------------------------------------------------------
/assets/utils/intrinsics_rgb_d455.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gpapagiannis/r-plus-x-hand2actions/b269966cb99bd7b5031dd1d4e7c4727c38055a87/assets/utils/intrinsics_rgb_d455.npy


--------------------------------------------------------------------------------
/conda_env.yaml:
--------------------------------------------------------------------------------
1 | name: rpx
2 | channels:
3 |   - pytorch
4 |   - nvidia
5 |   - conda-forge
6 | dependencies:
7 |   - python=3.10.12


--------------------------------------------------------------------------------
/global_vars.py:
--------------------------------------------------------------------------------
 1 | import numpy as np
 2 | import os
 3 | 
 4 | task_folder = "assets/grasp_fanta" # folder with the task, also do "assets/pick_up_phone" for pick up phone. if you change task uncomment the corresponding intrinsics matrix below.
 5 | VIZ = False # visualize the action extraction/hand tracking process
 6 | IS_PRESS_TASK = False # extract robot action for press task
 7 | LOAD_SCENE_DATA_FOR_PROCESSING=False # load scene data after extracting hand poses to create videos etc. for vizualization
 8 | INTRINSICS_REAL_CAMERA = np.load("assets/utils/intrinsics_rgb_d455.npy") # intrinsics for camera used for grasp_fanta
 9 | # INTRINSICS_REAL_CAMERA = np.load("assets/utils/head_cam_intrinsic_matrix_aligned_depth.npy") # intrinsics for camera used for pick_up_phone
10 | VIZ_DEMO = False # vizualize video of the demonstration before processing
11 | START = 0 # which part of the video to use start:end
12 | END = 1000000
13 | FRAME_STEP = 1 # use every nth frame
14 | hands_rgb = np.load(f"{task_folder}/rgbs.npy")[START:END]
15 | hands_depth = np.load(f"{task_folder}/depths.npy")[START:END]
16 | MODEL_MANO_PATH = 'hamer/_DATA/_DATA/data/mano' # path to mano model
17 | SCENE_FILES_FOLDER = f"{task_folder}/scene_files"
18 | os.makedirs(SCENE_FILES_FOLDER, exist_ok=True) # make dir if not exists
19 | INTRINSICS_HAMER_RENDERER = np.eye(4)
20 | INTRINSICS_HAMER_RENDERER[0 ,0] = 2295.0
21 | INTRINSICS_HAMER_RENDERER[1, 1] = 2295.0
22 | INTRINSICS_HAMER_RENDERER[0, 2] = 320.0
23 | INTRINSICS_HAMER_RENDERER[1, 2] = 240.0
24 | # probably do not need to change
25 | T_OPENGL_TO_OPENCV = np.array([[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]])
26 | HUMAN_HAND_COLOR=(0.999, 0.6745, 0.4117)
27 | MANO_HAND_IDS = {"wrist": 0,        "index_mcp": 1,     "index_pip": 2, 
28 |                  "index_dip": 3,    "middle_mcp": 4,    "middle_pip": 5, 
29 |                  "middle_dip": 6,   "pinkie_mcp": 7,    "pinkie_pip": 8, 
30 |                  "pinkie_dip": 9,   "ring_mcp": 10,     "ring_pip": 11, 
31 |                  "ring_dip": 12,    "thumb_mcp": 13,    "thumb_pip": 14, 
32 |                  "thumb_dip": 15,   "thumb_tip": 16,    "index_tip": 17, 
33 |                  "middle_tip": 18,  "ring_tip": 19,     "pinky_tip": 20}
34 | DISTANCE_BETWEEN_GRIPPERS_FINGERS = 0.08507
35 | 
36 | 


--------------------------------------------------------------------------------
/gripper_point_cloud_dense.npy:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gpapagiannis/r-plus-x-hand2actions/b269966cb99bd7b5031dd1d4e7c4727c38055a87/gripper_point_cloud_dense.npy


--------------------------------------------------------------------------------
/renderer.py:
--------------------------------------------------------------------------------
  1 | import os
  2 | if 'PYOPENGL_PLATFORM' not in os.environ:
  3 |    os.environ['PYOPENGL_PLATFORM'] = 'egl'
  4 | import torch
  5 | import numpy as np
  6 | import pyrender
  7 | import trimesh
  8 | import cv2
  9 | from yacs.config import CfgNode
 10 | from typing import List, Optional
 11 | 
 12 | def cam_crop_to_full(cam_bbox, box_center, box_size, img_size, focal_length=5000.):
 13 |     # Convert cam_bbox to full image
 14 |     img_w, img_h = img_size[:, 0], img_size[:, 1]
 15 |     cx, cy, b = box_center[:, 0], box_center[:, 1], box_size
 16 |     w_2, h_2 = img_w / 2., img_h / 2.
 17 |     bs = b * cam_bbox[:, 0] + 1e-9
 18 |     tz = 2 * focal_length / bs
 19 |     tx = (2 * (cx - w_2) / bs) + cam_bbox[:, 1]
 20 |     ty = (2 * (cy - h_2) / bs) + cam_bbox[:, 2]
 21 |     full_cam = torch.stack([tx, ty, tz], dim=-1)
 22 |     return full_cam
 23 | 
 24 | def get_light_poses(n_lights=5, elevation=np.pi / 3, dist=12):
 25 |     # get lights in a circle around origin at elevation
 26 |     thetas = elevation * np.ones(n_lights)
 27 |     phis = 2 * np.pi * np.arange(n_lights) / n_lights
 28 |     poses = []
 29 |     trans = make_translation(torch.tensor([0, 0, dist]))
 30 |     for phi, theta in zip(phis, thetas):
 31 |         rot = make_rotation(rx=-theta, ry=phi, order="xyz")
 32 |         poses.append((rot @ trans).numpy())
 33 |     return poses
 34 | 
 35 | def make_translation(t):
 36 |     return make_4x4_pose(torch.eye(3), t)
 37 | 
 38 | def make_rotation(rx=0, ry=0, rz=0, order="xyz"):
 39 |     Rx = rotx(rx)
 40 |     Ry = roty(ry)
 41 |     Rz = rotz(rz)
 42 |     if order == "xyz":
 43 |         R = Rz @ Ry @ Rx
 44 |     elif order == "xzy":
 45 |         R = Ry @ Rz @ Rx
 46 |     elif order == "yxz":
 47 |         R = Rz @ Rx @ Ry
 48 |     elif order == "yzx":
 49 |         R = Rx @ Rz @ Ry
 50 |     elif order == "zyx":
 51 |         R = Rx @ Ry @ Rz
 52 |     elif order == "zxy":
 53 |         R = Ry @ Rx @ Rz
 54 |     return make_4x4_pose(R, torch.zeros(3))
 55 | 
 56 | def make_4x4_pose(R, t):
 57 |     """
 58 |     :param R (*, 3, 3)
 59 |     :param t (*, 3)
 60 |     return (*, 4, 4)
 61 |     """
 62 |     dims = R.shape[:-2]
 63 |     pose_3x4 = torch.cat([R, t.view(*dims, 3, 1)], dim=-1)
 64 |     bottom = (
 65 |         torch.tensor([0, 0, 0, 1], device=R.device)
 66 |         .reshape(*(1,) * len(dims), 1, 4)
 67 |         .expand(*dims, 1, 4)
 68 |     )
 69 |     return torch.cat([pose_3x4, bottom], dim=-2)
 70 | 
 71 | 
 72 | def rotx(theta):
 73 |     return torch.tensor(
 74 |         [
 75 |             [1, 0, 0],
 76 |             [0, np.cos(theta), -np.sin(theta)],
 77 |             [0, np.sin(theta), np.cos(theta)],
 78 |         ],
 79 |         dtype=torch.float32,
 80 |     )
 81 | 
 82 | 
 83 | def roty(theta):
 84 |     return torch.tensor(
 85 |         [
 86 |             [np.cos(theta), 0, np.sin(theta)],
 87 |             [0, 1, 0],
 88 |             [-np.sin(theta), 0, np.cos(theta)],
 89 |         ],
 90 |         dtype=torch.float32,
 91 |     )
 92 | 
 93 | 
 94 | def rotz(theta):
 95 |     return torch.tensor(
 96 |         [
 97 |             [np.cos(theta), -np.sin(theta), 0],
 98 |             [np.sin(theta), np.cos(theta), 0],
 99 |             [0, 0, 1],
100 |         ],
101 |         dtype=torch.float32,
102 |     )
103 |     
104 | 
105 | def create_raymond_lights() -> List[pyrender.Node]:
106 |     """
107 |     Return raymond light nodes for the scene.
108 |     """
109 |     thetas = np.pi * np.array([1.0 / 6.0, 1.0 / 6.0, 1.0 / 6.0])
110 |     phis = np.pi * np.array([0.0, 2.0 / 3.0, 4.0 / 3.0])
111 | 
112 |     nodes = []
113 | 
114 |     for phi, theta in zip(phis, thetas):
115 |         xp = np.sin(theta) * np.cos(phi)
116 |         yp = np.sin(theta) * np.sin(phi)
117 |         zp = np.cos(theta)
118 | 
119 |         z = np.array([xp, yp, zp])
120 |         z = z / np.linalg.norm(z)
121 |         x = np.array([-z[1], z[0], 0.0])
122 |         if np.linalg.norm(x) == 0:
123 |             x = np.array([1.0, 0.0, 0.0])
124 |         x = x / np.linalg.norm(x)
125 |         y = np.cross(z, x)
126 | 
127 |         matrix = np.eye(4)
128 |         matrix[:3,:3] = np.c_[x,y,z]
129 |         nodes.append(pyrender.Node(
130 |             light=pyrender.DirectionalLight(color=np.ones(3), intensity=1.0),
131 |             matrix=matrix
132 |         ))
133 | 
134 |     return nodes
135 | 
136 | class Renderer:
137 | 
138 |     def __init__(self, cfg: CfgNode, faces: np.array, load_from_custom_file=False):
139 |         """
140 |         Wrapper around the pyrender renderer to render MANO meshes.
141 |         Args:
142 |             cfg (CfgNode): Model config file.
143 |             faces (np.array): Array of shape (F, 3) containing the mesh faces.
144 |         """
145 |         self.cfg = cfg
146 |         self.focal_length = cfg.EXTRA.FOCAL_LENGTH
147 |         self.img_res = cfg.MODEL.IMAGE_SIZE
148 | 
149 |         # add faces that make the hand mesh watertight
150 |         faces_new = np.array([[92, 38, 234],
151 |                               [234, 38, 239],
152 |                               [38, 122, 239],
153 |                               [239, 122, 279],
154 |                               [122, 118, 279],
155 |                               [279, 118, 215],
156 |                               [118, 117, 215],
157 |                               [215, 117, 214],
158 |                               [117, 119, 214],
159 |                               [214, 119, 121],
160 |                               [119, 120, 121],
161 |                               [121, 120, 78],
162 |                               [120, 108, 78],
163 |                               [78, 108, 79]])
164 |         faces = np.concatenate([faces, faces_new], axis=0)
165 |         
166 |         self.camera_center = [self.img_res // 2, self.img_res // 2]
167 |         self.faces = faces
168 |         self.faces_left = self.faces[:,[0,2,1]]
169 | 
170 |     def __call__(self,
171 |                 vertices: np.array,
172 |                 camera_translation: np.array,
173 |                 image: torch.Tensor,
174 |                 full_frame: bool = False,
175 |                 imgname: Optional[str] = None,
176 |                 side_view=False, rot_angle=90,
177 |                 mesh_base_color=(1.0, 1.0, 0.9),
178 |                 scene_bg_color=(0,0,0),
179 |                 return_rgba=False,
180 |                 no_camera=False,
181 |                 ) -> np.array:
182 |         """
183 |         Render meshes on input image
184 |         Args:
185 |             vertices (np.array): Array of shape (V, 3) containing the mesh vertices.
186 |             camera_translation (np.array): Array of shape (3,) with the camera translation.
187 |             image (torch.Tensor): Tensor of shape (3, H, W) containing the image crop with normalized pixel values.
188 |             full_frame (bool): If True, then render on the full image.
189 |             imgname (Optional[str]): Contains the original image filenamee. Used only if full_frame == True.
190 |         """
191 |         if full_frame:
192 |             image = cv2.imread(imgname).astype(np.float32)[:, :, ::-1] / 255.
193 |         else:
194 |             image = image.clone() * torch.tensor(self.cfg.MODEL.IMAGE_STD, device=image.device).reshape(3,1,1)
195 |             image = image + torch.tensor(self.cfg.MODEL.IMAGE_MEAN, device=image.device).reshape(3,1,1)
196 |             image = image.permute(1, 2, 0).cpu().numpy()
197 | 
198 |         renderer = pyrender.OffscreenRenderer(viewport_width=image.shape[1],
199 |                                               viewport_height=image.shape[0],
200 |                                               point_size=1.0)
201 |         material = pyrender.MetallicRoughnessMaterial(
202 |             metallicFactor=0.0,
203 |             alphaMode='OPAQUE',
204 |             baseColorFactor=(*mesh_base_color, 1.0))
205 | 
206 |         camera_translation[0] *= -1.
207 | 
208 |         mesh = trimesh.Trimesh(vertices.copy(), self.faces.copy())
209 |         if side_view:
210 |             rot = trimesh.transformations.rotation_matrix(
211 |                 np.radians(rot_angle), [0, 1, 0])
212 |             mesh.apply_transform(rot)
213 |         rot = trimesh.transformations.rotation_matrix(
214 |             np.radians(180), [1, 0, 0])
215 |         mesh.apply_transform(rot)
216 | 
217 | 
218 |         mesh = pyrender.Mesh.from_trimesh(mesh, material=material)
219 | 
220 |         scene = pyrender.Scene(bg_color=[*scene_bg_color, 0.0],
221 |                                ambient_light=(0.3, 0.3, 0.3))
222 |         scene.add(mesh, 'mesh')
223 | 
224 | 
225 |         # Convert mesh to point cloud
226 |         # mesh_pc = pyrender.Mesh.from_points(mesh.vertices)
227 | 
228 |         # # Show point cloud using open3d
229 |         # print('Showing point cloud=====================')
230 |         # import open3d as o3d
231 |         # pcd = o3d.geometry.PointCloud()
232 |         # pcd.points = o3d.utility.Vector3dVector(mesh_pc.vertices)
233 |         # o3d.visualization.draw_geometries([pcd])
234 | 
235 | 
236 | 
237 |         camera_pose = np.eye(4)
238 |         if not no_camera:
239 |             camera_pose[:3, 3] = camera_translation
240 |         camera_center = [image.shape[1] / 2., image.shape[0] / 2.]
241 |         camera = pyrender.IntrinsicsCamera(fx=self.focal_length, fy=self.focal_length,
242 |                                             cx=camera_center[0], cy=camera_center[1], zfar=1e12)
243 |         scene.add(camera, pose=camera_pose)
244 | 
245 | 
246 |         light_nodes = create_raymond_lights()
247 |         for node in light_nodes:
248 |             scene.add_node(node)
249 | 
250 |         color, rend_depth = renderer.render(scene, flags=pyrender.RenderFlags.RGBA)
251 |         color = color.astype(np.float32) / 255.0
252 |         renderer.delete()
253 | 
254 |         if return_rgba:
255 |             return color, rend_depth, mesh
256 | 
257 |         valid_mask = (color[:, :, -1])[:, :, np.newaxis]
258 |         if not side_view:
259 |             output_img = (color[:, :, :3] * valid_mask + (1 - valid_mask) * image)
260 |         else:
261 |             output_img = color[:, :, :3]
262 | 
263 |         output_img = output_img.astype(np.float32)
264 |         return output_img, rend_depth, mesh
265 | 
266 |     def vertices_to_trimesh(self, vertices, camera_translation, mesh_base_color=(1.0, 1.0, 0.9), 
267 |                             rot_axis=[1,0,0], rot_angle=0, is_right=1):
268 |         # material = pyrender.MetallicRoughnessMaterial(
269 |         #     metallicFactor=0.0,
270 |         #     alphaMode='OPAQUE',
271 |         #     baseColorFactor=(*mesh_base_color, 1.0))
272 |         vertex_colors = np.array([(*mesh_base_color, 1.0)] * vertices.shape[0])
273 |         if is_right:
274 |             mesh = trimesh.Trimesh(vertices.copy() + camera_translation, self.faces.copy(), vertex_colors=vertex_colors)
275 |         else:
276 |             mesh = trimesh.Trimesh(vertices.copy() + camera_translation, self.faces_left.copy(), vertex_colors=vertex_colors)
277 |         # mesh = trimesh.Trimesh(vertices.copy(), self.faces.copy())
278 |         
279 |         rot = trimesh.transformations.rotation_matrix(
280 |                 np.radians(rot_angle), rot_axis)
281 |         mesh.apply_transform(rot)
282 | 
283 |         # Show point cloud using open3d
284 |         # print('Showing point cloud=====================')
285 |         # import open3d as o3d
286 |         # pcd = o3d.geometry.PointCloud()
287 |         # pcd.points = o3d.utility.Vector3dVector(mesh.vertices)
288 |         # o3d.visualization.draw_geometries([pcd])
289 | 
290 |         rot = trimesh.transformations.rotation_matrix(
291 |             np.radians(180), [1, 0, 0])
292 |         mesh.apply_transform(rot)
293 |         return mesh
294 | 
295 |     def render_rgba(
296 |             self,
297 |             vertices: np.array,
298 |             cam_t = None,
299 |             rot=None,
300 |             rot_axis=[1,0,0],
301 |             rot_angle=0,
302 |             camera_z=3,
303 |             # camera_translation: np.array,
304 |             mesh_base_color=(1.0, 1.0, 0.9),
305 |             scene_bg_color=(0,0,0),
306 |             render_res=[256, 256],
307 |             focal_length=None,
308 |             is_right=None,
309 |         ):
310 | 
311 |         renderer = pyrender.OffscreenRenderer(viewport_width=render_res[0],
312 |                                               viewport_height=render_res[1],
313 |                                               point_size=1.0)
314 |         # material = pyrender.MetallicRoughnessMaterial(
315 |         #     metallicFactor=0.0,
316 |         #     alphaMode='OPAQUE',
317 |         #     baseColorFactor=(*mesh_base_color, 1.0))
318 | 
319 |         focal_length = focal_length if focal_length is not None else self.focal_length
320 | 
321 |         if cam_t is not None:
322 |             camera_translation = cam_t.copy()
323 |             camera_translation[0] *= -1.
324 |         else:
325 |             camera_translation = np.array([0, 0, camera_z * focal_length/render_res[1]])
326 | 
327 |         mesh = self.vertices_to_trimesh(vertices, np.array([0, 0, 0]), mesh_base_color, rot_axis, rot_angle, is_right=is_right)
328 |         mesh = pyrender.Mesh.from_trimesh(mesh)
329 |         # mesh = pyrender.Mesh.from_trimesh(mesh, material=material)
330 | 
331 |         scene = pyrender.Scene(bg_color=[*scene_bg_color, 0.0],
332 |                                ambient_light=(0.3, 0.3, 0.3))
333 |         scene.add(mesh, 'mesh')
334 | 
335 |         camera_pose = np.eye(4)
336 |         camera_pose[:3, 3] = camera_translation
337 |         camera_center = [render_res[0] / 2., render_res[1] / 2.]
338 |         camera = pyrender.IntrinsicsCamera(fx=focal_length, fy=focal_length,
339 |                                            cx=camera_center[0], cy=camera_center[1], zfar=1e12)
340 | 
341 |         # Create camera node and add it to pyRender scene
342 |         camera_node = pyrender.Node(camera=camera, matrix=camera_pose)
343 |         scene.add_node(camera_node)
344 |         self.add_point_lighting(scene, camera_node)
345 |         self.add_lighting(scene, camera_node)
346 | 
347 |         light_nodes = create_raymond_lights()
348 |         for node in light_nodes:
349 |             scene.add_node(node)
350 | 
351 |         color, rend_depth = renderer.render(scene, flags=pyrender.RenderFlags.RGBA)
352 |         color = color.astype(np.float32) / 255.0
353 |         renderer.delete()
354 | 
355 |         return color
356 | 
357 |     def render_rgba_multiple(
358 |             self,
359 |             vertices: List[np.array],
360 |             cam_t: List[np.array],
361 |             rot_axis=[1,0,0],
362 |             rot_angle=0,
363 |             mesh_base_color=(1.0, 1.0, 0.9),
364 |             scene_bg_color=(0,0,0),
365 |             render_res=[256, 256],
366 |             focal_length=None,
367 |             is_right=None,
368 |         ):
369 | 
370 |         renderer = pyrender.OffscreenRenderer(viewport_width=render_res[0],
371 |                                               viewport_height=render_res[1],
372 |                                               point_size=1.0)
373 |         # material = pyrender.MetallicRoughnessMaterial(
374 |         #     metallicFactor=0.0,
375 |         #     alphaMode='OPAQUE',
376 |         #     baseColorFactor=(*mesh_base_color, 1.0))
377 | 
378 |         if is_right is None:
379 |             is_right = [1 for _ in range(len(vertices))]
380 | 
381 |         mesh_list = [pyrender.Mesh.from_trimesh(self.vertices_to_trimesh(vvv, ttt.copy(), mesh_base_color, rot_axis, rot_angle, is_right=sss)) for vvv,ttt,sss in zip(vertices, cam_t, is_right)]
382 |         trimesh_list = [self.vertices_to_trimesh(vvv, ttt.copy(), mesh_base_color, rot_axis, rot_angle, is_right=sss) for vvv,ttt,sss in zip(vertices, cam_t, is_right)]
383 |         scene = pyrender.Scene(bg_color=[*scene_bg_color, 0.0],
384 |                                ambient_light=(0.3, 0.3, 0.3))
385 |         for i,mesh in enumerate(mesh_list):
386 |             scene.add(mesh, f'mesh_{i}')
387 | 
388 |         camera_pose = np.eye(4)
389 |         # camera_pose[:3, 3] = camera_translation
390 |         camera_center = [render_res[0] / 2., render_res[1] / 2.]
391 |         focal_length = focal_length if focal_length is not None else self.focal_length
392 |         print(f'[render_rgba_multiple] Focal length: {focal_length}')
393 |         print(f'[render_rgba_multiple] Camera center: {camera_center}')
394 |         camera = pyrender.IntrinsicsCamera(fx=focal_length, fy=focal_length,
395 |                                            cx=camera_center[0], cy=camera_center[1], zfar=1e12)
396 | 
397 |         # Create camera node and add it to pyRender scene
398 |         camera_node = pyrender.Node(camera=camera, matrix=camera_pose)
399 |         scene.add_node(camera_node)
400 |         self.add_point_lighting(scene, camera_node)
401 |         self.add_lighting(scene, camera_node)
402 | 
403 |         light_nodes = create_raymond_lights()
404 |         for node in light_nodes:
405 |             scene.add_node(node)
406 | 
407 |         color, rend_depth = renderer.render(scene, flags=pyrender.RenderFlags.RGBA)
408 |         # Convert normalized depth to world space depth
409 |         # rend_depth = rend_depth * 2.0 - 1.0
410 |         # rend_depth = 2.0 * focal_length / (rend_depth * render_res[1])
411 | 
412 |         color = color.astype(np.float32) / 255.0
413 |         renderer.delete()
414 | 
415 |         return color, rend_depth, trimesh_list
416 | 
417 |     def add_lighting(self, scene, cam_node, color=np.ones(3), intensity=1.0):
418 |         # from phalp.visualize.py_renderer import get_light_poses
419 |         light_poses = get_light_poses()
420 |         light_poses.append(np.eye(4))
421 |         cam_pose = scene.get_pose(cam_node)
422 |         for i, pose in enumerate(light_poses):
423 |             matrix = cam_pose @ pose
424 |             node = pyrender.Node(
425 |                 name=f"light-{i:02d}",
426 |                 light=pyrender.DirectionalLight(color=color, intensity=intensity),
427 |                 matrix=matrix,
428 |             )
429 |             if scene.has_node(node):
430 |                 continue
431 |             scene.add_node(node)
432 | 
433 |     def add_point_lighting(self, scene, cam_node, color=np.ones(3), intensity=1.0):
434 |         # from phalp.visualize.py_renderer import get_light_poses
435 |         light_poses = get_light_poses(dist=0.5)
436 |         light_poses.append(np.eye(4))
437 |         cam_pose = scene.get_pose(cam_node)
438 |         for i, pose in enumerate(light_poses):
439 |             matrix = cam_pose @ pose
440 |             # node = pyrender.Node(
441 |             #     name=f"light-{i:02d}",
442 |             #     light=pyrender.DirectionalLight(color=color, intensity=intensity),
443 |             #     matrix=matrix,
444 |             # )
445 |             node = pyrender.Node(
446 |                 name=f"plight-{i:02d}",
447 |                 light=pyrender.PointLight(color=color, intensity=intensity),
448 |                 matrix=matrix,
449 |             )
450 |             if scene.has_node(node):
451 |                 continue
452 |             scene.add_node(node)
453 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
  1 | absl-py==2.2.0
  2 | addict==2.4.0
  3 | aiohappyeyeballs==2.6.1
  4 | aiohttp==3.11.14
  5 | aiosignal==1.3.2
  6 | antlr4-python3-runtime==4.9.3
  7 | async-timeout==5.0.1
  8 | attrs==25.3.0
  9 | beautifulsoup4==4.13.3
 10 | black==25.1.0
 11 | blinker==1.9.0
 12 | braceexpand==0.1.7
 13 | 
 14 | certifi==2025.1.31
 15 | cffi==1.17.1
 16 | charset-normalizer==3.4.1
 17 | 
 18 | click==8.1.8
 19 | cloudpickle==3.1.1
 20 | ConfigArgParse==1.7
 21 | 
 22 | cryptease==0.2.0
 23 | cryptography==44.0.2
 24 | 
 25 | 
 26 | dash==3.0.0
 27 | 
 28 | einops==0.8.1
 29 | fastjsonschema==2.21.1
 30 | filelock==3.18.0
 31 | Flask==3.0.3
 32 | flatbuffers==25.2.10
 33 | 
 34 | 
 35 | frozenlist==1.5.0
 36 | fsspec==2025.3.0
 37 | future==1.0.0
 38 | fvcore==0.1.5.post20221221
 39 | gdown==5.2.0
 40 | grpcio==1.71.0
 41 | huggingface-hub==0.29.3
 42 | hydra-core==1.3.2
 43 | idna==3.10
 44 | 
 45 | importlib_metadata==8.6.1
 46 | iopath==0.1.9
 47 | itsdangerous==2.2.0
 48 | jax==0.5.3
 49 | jaxlib==0.5.3
 50 | Jinja2==3.1.6
 51 | joblib==1.4.2
 52 | json-tricks==3.17.3
 53 | jsonschema==4.23.0
 54 | jsonschema-specifications==2024.10.1
 55 | jupyter_core==5.7.2
 56 | 
 57 | lazy_loader==0.4
 58 | lightning-utilities==0.14.2
 59 | lxml==5.3.1
 60 | Markdown==3.7
 61 | markdown-it-py==3.0.0
 62 | MarkupSafe==3.0.2
 63 | matplotlib==3.10.0
 64 | mdurl==0.1.2
 65 | mediapipe==0.10.21
 66 | ml_dtypes==0.5.1
 67 | mmcv==1.3.9
 68 | mmengine==0.10.7
 69 | mpmath==1.3.0
 70 | multidict==6.2.0
 71 | munkres==1.1.4
 72 | mypy-extensions==1.0.0
 73 | narwhals==1.32.0
 74 | nbformat==5.10.4
 75 | nest-asyncio==1.6.0
 76 | 
 77 | numpy==1.26.4
 78 | omegaconf==2.3.0
 79 | open3d==0.19.0
 80 | opencv-contrib-python==4.11.0.86
 81 | opencv-python==4.11.0.86
 82 | opt_einsum==3.4.0
 83 | 
 84 | pandas==2.2.3
 85 | pathspec==0.12.1
 86 | pillow==11.1.0
 87 | platformdirs==4.3.7
 88 | plotly==6.0.1
 89 | portalocker==3.1.1
 90 | propcache==0.3.0
 91 | protobuf==4.25.6
 92 | 
 93 | pycparser==2.22
 94 | 
 95 | Pygments==2.19.1
 96 | PyOpenGL==3.1.0
 97 | PyOpenGL-accelerate==3.1.9
 98 | 
 99 | pyquaternion==0.9.9
100 | 
101 | PySocks==1.7.1
102 | 
103 | pytorch-lightning==2.5.1
104 | pytz==2025.1
105 | PyYAML==6.0.2
106 | referencing==0.36.2
107 | requests==2.32.3
108 | retrying==1.3.4
109 | rich==13.9.4
110 | rpds-py==0.23.1
111 | safetensors==0.5.3
112 | scikit-image==0.25.2
113 | scikit-learn==1.6.1
114 | scipy==1.15.2
115 | sentencepiece==0.2.0
116 | six==1.17.0
117 | smplx==0.1.28
118 | sounddevice==0.5.1
119 | soupsieve==2.6
120 | stringcase==1.2.0
121 | sympy==1.13.1
122 | tabulate==0.9.0
123 | tensorboard==2.19.0
124 | tensorboard-data-server==0.7.2
125 | termcolor==2.5.0
126 | threadpoolctl==3.6.0
127 | tifffile==2025.3.13
128 | timm==1.0.15
129 | tomli==2.2.1
130 | torch==2.6.0
131 | torchaudio==2.6.0
132 | torchgeometry==0.1.2
133 | torchmetrics==1.7.0
134 | torchvision==0.21.0
135 | tqdm==4.67.1
136 | traitlets==5.14.3
137 | 
138 | typing_extensions==4.12.2
139 | tzdata==2025.2
140 | 
141 | urllib3==2.3.0
142 | webdataset==0.2.111
143 | Werkzeug==3.0.6
144 | xtcocotools==1.14.3
145 | yacs==0.1.8
146 | yapf==0.43.0
147 | yarl==1.18.3
148 | zipp==3.21.0
149 | 


--------------------------------------------------------------------------------
/util_functions.py:
--------------------------------------------------------------------------------
  1 | from scipy.spatial.transform import Rotation as R
  2 | from scipy.spatial.transform import Slerp
  3 | import open3d as o3d
  4 | import numpy as np
  5 | 
  6 | def point_cloud_to_depth_image(point_cloud, fx, fy, cx, cy, width=1280, height=720):
  7 |     """
  8 |     Convert a 3D point cloud into a depth image.
  9 | 
 10 |     Parameters:
 11 |     - point_cloud: A 2D numpy array of shape (N, 3), containing the (X, Y, Z) coordinates of each point.
 12 |     - fx, fy: The focal lengths of the camera in pixels.
 13 |     - cx, cy: The optical center (principal point) of the camera, typically the image center.
 14 |     - width, height: The dimensions of the resulting depth image.
 15 | 
 16 |     Returns:
 17 |     - A 2D numpy array of shape (height, width), representing the depth image, where each pixel contains
 18 |       the depth value of the corresponding point in the point cloud. Pixels with no corresponding point
 19 |       are set to 0.
 20 |     """
 21 |     # Initialize the depth image with zeros (no depth)
 22 |     depth_image = np.zeros((height, width), dtype=np.float32)
 23 | 
 24 |     # Extract X, Y, Z coordinates
 25 |     X, Y, Z = point_cloud[:, 0], point_cloud[:, 1], point_cloud[:, 2]
 26 | 
 27 |     # Exclude points with no depth or behind the camera
 28 |     # valid = Z > 0
 29 |     # X = X[valid]
 30 |     # Y = Y[valid]
 31 |     # Z = Z[valid]
 32 | 
 33 |     # Project points onto the image plane
 34 |     u = (X * fx / Z) + cx
 35 |     v = (Y * fy / Z) + cy
 36 | 
 37 |     # Convert coordinates to integer pixel indices
 38 |     u = np.round(u).astype(int)
 39 |     v = np.round(v).astype(int)
 40 | 
 41 |     # Filter out points that fall outside the image
 42 |     valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)
 43 |     u = u[valid]
 44 |     v = v[valid]
 45 |     Z = Z[valid]
 46 | 
 47 |     # Update the depth image
 48 |     depth_image[v, u] = Z
 49 | 
 50 |     return depth_image
 51 | 
 52 | 
 53 | def depth_to_point_cloud(depth_im, fx, fy, cx, cy):
 54 |     rows, cols = depth_im.shape
 55 |     x = np.arange(0, cols)
 56 |     y = np.arange(0, rows)
 57 |     xx, yy = np.meshgrid(x, y)
 58 |     X = (xx - cx) * depth_im / fx
 59 |     Y = (yy - cy) * depth_im / fy
 60 |     Z = depth_im
 61 |     return np.stack([X, Y, Z], axis=-1)
 62 | 
 63 | 
 64 | def find_scaled_transformation(p, q, use_scale=True):
 65 |     # p, q are Nx3 arrays of corresponding points
 66 |     # Calculate centroids
 67 |     Cp = np.mean(p, axis=0)
 68 |     Cq = np.mean(q, axis=0)
 69 |     # Center points around the centroids
 70 |     p_centered = p - Cp
 71 |     q_centered = q - Cq
 72 |     # Compute scale
 73 |     # Using the ratio of sums of distances from the centroids
 74 |     scale = np.sqrt((q_centered ** 2).sum() / (p_centered ** 2).sum())
 75 | 
 76 |     if not use_scale:
 77 |         scale = 1
 78 |     # Compute covariance matrix H
 79 |     H = np.dot(p_centered.T, q_centered)
 80 |     # SVD on covariance matrix
 81 |     U, S, Vt = np.linalg.svd(H)
 82 |     R = np.dot(Vt.T, U.T)
 83 |     # Ensure a proper rotation matrix (det(R) = 1)
 84 |     if np.linalg.det(R) < 0:
 85 |         Vt[-1, :] *= -1
 86 |         R = np.dot(Vt.T, U.T)
 87 |     # Compute translation
 88 |     t = Cq.T - scale * np.dot(R, Cp.T)
 89 |     # Construct the transformation matrix
 90 |     T = np.eye(4)
 91 |     T[:3, :3] = scale * R
 92 |     T[:3, 3] = t
 93 |     return T
 94 | 
 95 | 
 96 | def apply_transformation(p, T):
 97 |     """
 98 |     Apply a 4x4 transformation matrix to a set of 3D points.
 99 | 
100 |     Parameters:
101 |     - p: A 2D numpy array of shape (N, 3), containing the (X, Y, Z) coordinates of each point.
102 |     - T: The 4x4 transformation matrix.
103 | 
104 |     Returns:
105 |     - A 2D numpy array of shape (N, 3), containing the transformed (X, Y, Z) coordinates.
106 |     """
107 |     # Homogenize the points in p
108 |     p_homogeneous = np.hstack((p, np.ones((p.shape[0], 1))))
109 |     # Apply the transformation matrix to each point
110 |     p_transformed_homogeneous = np.dot(T, p_homogeneous.T).T
111 |     # Dehomogenize the points
112 |     p_transformed = p_transformed_homogeneous[:, :3] / p_transformed_homogeneous[:, 3, np.newaxis]
113 |     return p_transformed
114 | 
115 | 
116 | def compute_principal_axis(point_cloud, switch_principal_axis=False, return_eigenstuff=False):
117 |     # Extract principal axis of the point cloud
118 |     # Compute the covariance matrix
119 |     cov = np.cov(point_cloud.T)
120 |     # Compute the eigenvalues and eigenvectors
121 |     eigenvalues, eigenvectors = np.linalg.eig(cov)
122 |     # Sort the eigenvectors by decreasing eigenvalues
123 |     idx = eigenvalues.argsort()[::-1]
124 |     eigenvalues = eigenvalues[idx]
125 |     eigenvectors = eigenvectors[:, idx]
126 |     # Extract the principal axis
127 |     principal_axis = eigenvectors[:, 0]
128 |     second_axis = eigenvectors[:, 1]
129 |     # Make the vector longer by keep its direction
130 |     principal_axis = principal_axis * 10
131 |     second_axis = second_axis * 10
132 |     if return_eigenstuff:
133 |         return eigenvalues, eigenvectors
134 |     if switch_principal_axis:
135 |         # return second_axis, principal_axis
136 |         return second_axis, eigenvectors[:, 2] * 10
137 |     return principal_axis, second_axis
138 | 
139 | 
140 | def scale_gripper_match_hand(gripper_pcd, hand_pcd, vizualize=False):
141 |     """Scaled the gripper to match the hand 1-to-1
142 |     This is approximate and may not need to be engineered; but that's okay.
143 |     """
144 |     gripper_bb = gripper_pcd.get_axis_aligned_bounding_box()
145 |     hand_bb = hand_pcd.get_axis_aligned_bounding_box()
146 |     # get bounding box volume
147 |     volume_ratio = hand_bb.volume() / gripper_bb.volume()
148 |     gripper = gripper_pcd.points
149 |     hand = hand_pcd.points
150 |     scale_x = (hand_bb.get_max_bound()[0] - hand_bb.get_min_bound()[0]) / (
151 |                 gripper_bb.get_max_bound()[0] - gripper_bb.get_min_bound()[0])
152 |     scale_y = (hand_bb.get_max_bound()[1] - hand_bb.get_min_bound()[1]) / (
153 |                 gripper_bb.get_max_bound()[1] - gripper_bb.get_min_bound()[1])
154 |     scale_z = (hand_bb.get_max_bound()[2] - hand_bb.get_min_bound()[2]) / (
155 |                 gripper_bb.get_max_bound()[2] - gripper_bb.get_min_bound()[2])
156 |     center = gripper_pcd.get_center()
157 |     mean_scaling = (scale_x + scale_y + scale_z) / 3
158 |     # pcd_scaled = (gripper - center) * np.array([scale_x, scale_y, scale_z]) + center
159 |     pcd_scaled = (gripper - center) * np.array([mean_scaling, mean_scaling, mean_scaling]) + center
160 |     pcd_scaled_o3d = o3d.geometry.PointCloud()
161 |     pcd_scaled_o3d.points = o3d.utility.Vector3dVector(pcd_scaled)
162 | 
163 |     if vizualize:
164 |         o3d.visualization.draw_geometries([pcd_scaled_o3d, hand_pcd])
165 | 
166 |     return pcd_scaled_o3d
167 | 
168 | 
169 | def np_to_o3d(points):
170 |     pcd = o3d.geometry.PointCloud()
171 |     pcd.points = o3d.utility.Vector3dVector(points)
172 |     return pcd
173 | 
174 | 
175 | def interpolate_rotation(r1, r2, num_steps):
176 |     r1 = R.from_matrix(r1)
177 |     r2 = R.from_matrix(r2)
178 |     # r1 and r2 in one Rotation instance
179 |     key_rots = R.from_matrix([r1.as_matrix(), r2.as_matrix()])
180 |     # interpolate with Slerp
181 |     slerp = Slerp([0, 1], key_rots)
182 |     times = np.linspace(0, 1, num_steps)
183 |     qs = slerp(times)
184 | 
185 |     return qs
186 | 
187 | 
188 | def interpolate_translation(t1, t2, num_steps):
189 |     ts = np.zeros((num_steps, 3))
190 |     for i in range(num_steps):
191 |         ts[i] = t1 + (t2 - t1) * i / (num_steps - 1)
192 |     return ts
193 | 
194 | 
195 | def interpolate_pose(pose1, pose2, num_steps):
196 |     r1 = pose1[:3, :3]
197 |     t1 = pose1[:3, 3]
198 |     r2 = pose2[:3, :3]
199 |     t2 = pose2[:3, 3]
200 |     rs = interpolate_rotation(r1, r2, num_steps)
201 |     ts = interpolate_translation(t1, t2, num_steps)
202 |     poses = np.zeros((num_steps, 4, 4))
203 |     for i in range(num_steps):
204 |         poses[i, :3, :3] = rs[i].as_matrix()  # R.from_quat(rs[i]).as_matrix()
205 |         poses[i, :3, 3] = ts[i]
206 |         poses[i, 3, 3] = 1
207 |     return poses
208 | 
209 | 
210 | def interpolate_pose_sequence(poses, num_steps):
211 |     interpolated_poses = []
212 |     for i in range(len(poses) - 1):
213 |         interpolated_poses.append(interpolate_pose(poses[i], poses[i + 1], num_steps))
214 |     return np.vstack(interpolated_poses)
215 | 
216 | def rotation_matrix(axis, theta):
217 |     """
218 |     Return the rotation matrix associated with counterclockwise rotation about
219 |     the given axis by theta radians.
220 |     """
221 |     axis = np.asarray(axis)
222 |     axis = axis / np.sqrt(np.dot(axis, axis))
223 |     a = np.cos(theta / 2.0)
224 |     b, c, d = -axis * np.sin(theta / 2.0)
225 |     aa, bb, cc, dd = a * a, b * b, c * c, d * d
226 |     bc, ad, ac, ab, bd, cd = b * c, a * d, a * c, a * b, b * d, c * d
227 |     return np.array([[aa + bb - cc - dd, 2 * (bc + ad), 2 * (bd - ac)],
228 |                     [2 * (bc - ad), aa + cc - bb - dd, 2 * (cd + ab)],
229 |                     [2 * (bd + ac), 2 * (cd - ab), aa + dd - bb - cc]])
230 | 


--------------------------------------------------------------------------------
/video_to_gripper_hamer_kpts.py:
--------------------------------------------------------------------------------
   1 | # IMPORTS, FILE LOADS, DETECTRON, RENDERER PREP ETC.
   2 | from pathlib import Path
   3 | import torch
   4 | import argparse
   5 | import os
   6 | from hamer.configs import CACHE_DIR_HAMER
   7 | from hamer.models import HAMER, download_models, load_hamer, DEFAULT_CHECKPOINT
   8 | from hamer.utils import recursive_to
   9 | from hamer.datasets.vitdet_dataset import ViTDetDataset, DEFAULT_MEAN, DEFAULT_STD
  10 | from hamer.utils.renderer import Renderer, cam_crop_to_full
  11 | from vitpose_model import ViTPoseModel
  12 | from detectron2.utils.logger import setup_logger
  13 | setup_logger()
  14 | import cv2
  15 | 
  16 | from detectron2.utils.visualizer import Visualizer
  17 | from detectron2.data import MetadataCatalog
  18 | 
  19 | import mediapipe   as mp
  20 | import cv2 as cv
  21 | import matplotlib.pyplot as plt
  22 | import copy
  23 | import time
  24 | from mano.utils import Mesh, Struct, colors, to_np, to_tensor
  25 | import pickle
  26 | import os.path as osp
  27 | import trimesh
  28 | from mano import lbs
  29 | from mano.joints_info import TIP_IDS
  30 | import tqdm
  31 | 
  32 | # local imports
  33 | from util_functions import *
  34 | from global_vars import *
  35 | 
  36 | args = argparse.Namespace()
  37 | args.checkpoint = DEFAULT_CHECKPOINT
  38 | args.batch_size = 1
  39 | args.rescale_factor = 2.0
  40 | args.body_detector = 'vitdet'
  41 | 
  42 | IM_WIDTH = hands_rgb[0].shape[1]
  43 | IM_HEIGHT = hands_rgb[0].shape[0]
  44 | print("Image width: ", IM_WIDTH)
  45 | print("Image height: ", IM_HEIGHT)
  46 | 
  47 | INTRINSICS_HAMER_RENDERER[0, 2] = IM_WIDTH / 2
  48 | INTRINSICS_HAMER_RENDERER[1, 2] = IM_HEIGHT / 2
  49 | 
  50 | if VIZ_DEMO:
  51 |     # show rgbs as video
  52 |     for i in range(hands_rgb.shape[0]):
  53 |         im = cv2.cvtColor(hands_rgb[i], cv2.COLOR_BGR2RGB)
  54 |         cv2.imshow("rgb", im)
  55 |         cv2.waitKey(1)
  56 |         time.sleep(0.01)
  57 |     cv2.destroyAllWindows()
  58 | 
  59 | # Download and load checkpoints
  60 | download_models(CACHE_DIR_HAMER)
  61 | model, model_cfg = load_hamer(args.checkpoint)
  62 | 
  63 | # Setup HaMeR model
  64 | device = torch.device('cpu')
  65 | model = model.to(device)
  66 | model.eval()
  67 | # Load detector
  68 | from hamer.utils.utils_detectron2 import DefaultPredictor_Lazy
  69 | if args.body_detector == 'vitdet':
  70 |     from detectron2.config import LazyConfig
  71 |     import hamer
  72 |     cfg_path = Path(hamer.__file__).parent/'configs'/'cascade_mask_rcnn_vitdet_h_75ep.py'
  73 |     detectron2_cfg = LazyConfig.load(str(cfg_path))
  74 |     detectron2_cfg.train.init_checkpoint = "https://dl.fbaipublicfiles.com/detectron2/ViTDet/COCO/cascade_mask_rcnn_vitdet_h/f328730692/model_final_f05665.pkl"
  75 |     for i in range(3):
  76 |         detectron2_cfg.model.roi_heads.box_predictors[i].test_score_thresh = 0.25
  77 |     detector = DefaultPredictor_Lazy(detectron2_cfg)
  78 | elif args.body_detector == 'regnety':
  79 |     from detectron2 import model_zoo
  80 |     from detectron2.config import get_cfg
  81 |     detectron2_cfg = model_zoo.get_config('new_baselines/mask_rcnn_regnety_4gf_dds_FPN_400ep_LSJ.py', trained=True)
  82 |     detectron2_cfg.model.roi_heads.box_predictor.test_score_thresh = 0.5
  83 |     detectron2_cfg.model.roi_heads.box_predictor.test_nms_thresh   = 0.4
  84 |     detector       = DefaultPredictor_Lazy(detectron2_cfg)
  85 | # keypoint detector
  86 | cpm = ViTPoseModel(device)
  87 | # Setup the renderer
  88 | renderer = Renderer(model_cfg, faces=model.mano.faces, load_from_custom_file=True)
  89 | 
  90 | print("Existing focal length: ", model_cfg.EXTRA.FOCAL_LENGTH)
  91 | 
  92 | 
  93 | 
  94 | def get_hand_and_rendered_depth(rgb):
  95 |     # Get all demo images ends with .jpg or .png
  96 |     # Iterate over all images in folder
  97 |     img_cv2 = rgb
  98 |     img_cv2 = cv2.cvtColor(img_cv2, cv2.COLOR_BGR2RGB)
  99 |     # Detect humans in image
 100 |     det_out = detector(img_cv2)
 101 |     img = img_cv2.copy()[:, :, ::-1]
 102 |     det_instances = det_out['instances']
 103 |     valid_idx = (det_instances.pred_classes==0) & (det_instances.scores > 0.5)
 104 |     pred_bboxes=det_instances.pred_boxes.tensor[valid_idx].cpu().numpy()
 105 |     pred_scores=det_instances.scores[valid_idx].cpu().numpy()
 106 |     # Detect human keypoints for each person
 107 |     vitposes_out = cpm.predict_pose(img, [np.concatenate([pred_bboxes, pred_scores[:, None]], axis=1)],)
 108 |     bboxes = []
 109 |     is_right = []
 110 |     # Use hands based on hand keypoint detections
 111 |     for vitposes in vitposes_out:
 112 |         left_hand_keyp = vitposes['keypoints'][-42:-21]
 113 |         right_hand_keyp = vitposes['keypoints'][-21:]
 114 |         # Rejecting not confident detections
 115 |         keyp = left_hand_keyp
 116 |         valid = keyp[:,2] > 0.5
 117 |         if sum(valid) > 3:
 118 |             bbox = [keyp[valid,0].min(), keyp[valid,1].min(), keyp[valid,0].max(), keyp[valid,1].max()]
 119 |             bboxes.append(bbox)
 120 |             is_right.append(0)
 121 |         keyp = right_hand_keyp
 122 |         valid = keyp[:,2] > 0.5
 123 |         if sum(valid) > 3:
 124 |             bbox = [keyp[valid,0].min(), keyp[valid,1].min(), keyp[valid,0].max(), keyp[valid,1].max()]
 125 |             bboxes.append(bbox)
 126 |             is_right.append(1)
 127 | 
 128 |     boxes = np.stack(bboxes)
 129 |     right = np.stack(is_right)
 130 |     # Run reconstruction on all detected hands
 131 |     dataset = ViTDetDataset(model_cfg, img_cv2, boxes, right, rescale_factor=args.rescale_factor)
 132 |     dataloader = torch.utils.data.DataLoader(dataset, batch_size=8, shuffle=False, num_workers=0)
 133 |     all_verts = []
 134 |     all_cam_t = []
 135 |     all_right = []
 136 |     
 137 |     for batch in dataloader:
 138 |         batch = recursive_to(batch, device)
 139 |         with torch.no_grad():
 140 |             out = model(batch)
 141 |         multiplier = (2*batch['right']-1)
 142 |         pred_cam = out['pred_cam']
 143 |         pred_cam[:,1] = multiplier*pred_cam[:,1]
 144 |         box_center = batch["box_center"].float()
 145 |         box_size = batch["box_size"].float()
 146 |         img_size = batch["img_size"].float()
 147 |         multiplier = (2*batch['right']-1)
 148 |         scaled_focal_length = model_cfg.EXTRA.FOCAL_LENGTH / model_cfg.MODEL.IMAGE_SIZE * img_size.max()
 149 |         INTRINSICS_HAMER_RENDERER[0 ,0] = scaled_focal_length.item()
 150 |         INTRINSICS_HAMER_RENDERER[1, 1] = scaled_focal_length.item()
 151 |         pred_cam_t_full = cam_crop_to_full(pred_cam, box_center, box_size, img_size, scaled_focal_length).detach().cpu().numpy()
 152 |         # Render the result
 153 |         batch_size = batch['img'].shape[0]
 154 |         print(f'Batch size: {batch_size}')
 155 |         for n in range(batch_size):
 156 |             # Get filename from path img_path
 157 |             input_patch = batch['img'][n].cpu() * (DEFAULT_STD[:,None,None]/255) + (DEFAULT_MEAN[:,None,None]/255)
 158 |             input_patch = input_patch.permute(1,2,0).numpy()
 159 |             # Add all verts and cams to list
 160 |             verts = out['pred_vertices'][n].detach().cpu().numpy()
 161 |             is_right = batch['right'][n].cpu().numpy()
 162 |             verts[:,0] = (2*is_right-1)*verts[:,0]
 163 |             cam_t = pred_cam_t_full[n]
 164 |             all_verts.append(verts)
 165 |             all_cam_t.append(cam_t)
 166 |             all_right.append(is_right)
 167 |     all_verts_as_np = np.asarray(all_verts)
 168 |     all_verts_as_np = all_verts_as_np[0]
 169 |     all_verts_list = [all_verts_as_np]
 170 | 
 171 |     misc_args = dict(
 172 |         mesh_base_color=HUMAN_HAND_COLOR,
 173 |         scene_bg_color=(1, 1, 1),
 174 |         focal_length=scaled_focal_length,
 175 |     )
 176 |     cam_view, rend_depth_front_view, mesh_list = renderer.render_rgba_multiple(all_verts_list, cam_t=all_cam_t, render_res=img_size[n], is_right=all_right, **misc_args)
 177 |     # Overlay image
 178 |     input_img = img_cv2.astype(np.float32)[:,:,::-1]/255.0
 179 |     input_img = np.concatenate([input_img, np.ones_like(input_img[:,:,:1])], axis=2) # Add alpha channel
 180 | 
 181 | 
 182 |     
 183 |     input_img_overlay = input_img[:,:,:3] * (1-cam_view[:,:,3:]) + cam_view[:,:,:3] * cam_view[:,:,3:]
 184 |     camera_translation = cam_t.copy()
 185 |     hand_mesh = renderer.vertices_to_trimesh(verts, camera_translation, HUMAN_HAND_COLOR, is_right=is_right)
 186 |     return input_img_overlay, cam_view, rend_depth_front_view, det_out, out, (hand_mesh, camera_translation, verts), (mesh_list, all_cam_t, all_verts)
 187 | 
 188 | 
 189 | def get_hand_mask_from_detectron(det_out, show_mask=False, rgb_im=None):
 190 |     if show_mask and rgb_im is not None:
 191 |         cfg = get_cfg()
 192 |         cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml"))
 193 |         cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set threshold for this model
 194 |         cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml")
 195 |         # Make prediction
 196 |         outputs = det_out
 197 |         v = Visualizer(rgb_im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
 198 |         v = v.draw_instance_predictions(outputs["instances"].to("cpu"))
 199 |     try:
 200 |         human_mask = det_out["instances"].pred_masks[np.argwhere(det_out['instances'].pred_classes == 0)[0,0].item()].cpu().numpy()
 201 |     except:
 202 |         human_mask = det_out["instances"].pred_masks[np.argwhere(det_out['instances'].pred_classes.cpu().numpy() == 0)[0,0].item()].cpu().numpy()
 203 | 
 204 |     return human_mask
 205 | 
 206 | 
 207 | def get_hand_keypoints_from_mediapipe(rgb_im, viz_keypoints=False):
 208 |     # # Initialize MediaPipe Hands.
 209 |     mp_hands = mp.solutions.hands
 210 |     hands = mp_hands.Hands(static_image_mode=True, max_num_hands=2, min_detection_confidence=0.5)
 211 |     color_image = copy.deepcopy((rgb_im * 255.0).astype(np.uint8))
 212 |     if viz_keypoints:
 213 |         plt.imshow(rgb_im)
 214 |         plt.axis('off')
 215 |         plt.show()
 216 |     # Process the image to find hand landmarks.
 217 |     results = hands.process(color_image)
 218 |     d_val = -1
 219 |     # Draw the hand landmarks on the image.
 220 |     if results.multi_hand_landmarks:
 221 |         for hand_landmarks in results.multi_hand_landmarks:
 222 |             for landmark in hand_landmarks.landmark:
 223 |                 # Draw a circle on the image.
 224 |                 cx, cy = int(landmark.x * color_image.shape[1]), int(landmark.y * color_image.shape[0])
 225 |                 cv.circle(color_image, (cx, cy), 5, (0, 255, 0), -1)
 226 |     mp_drawing = mp.solutions.drawing_utils
 227 |     image = color_image
 228 |     if results.multi_hand_landmarks:
 229 |         for hand_landmarks in results.multi_hand_landmarks:
 230 |             mp_drawing.draw_landmarks(image, hand_landmarks, mp_hands.HAND_CONNECTIONS)
 231 |             # Example: Get the tip of the index finger
 232 |             index_tip = hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP]
 233 |             # print(f"Index Finger Tip Coordinates: (x: {index_tip.x}, y: {index_tip.y}, z: {index_tip.z})")
 234 |             index_x, index_y = int(index_tip.x * image.shape[1]), int(index_tip.y * image.shape[0])
 235 |             thumb_tip = hand_landmarks.landmark[mp_hands.HandLandmark.THUMB_TIP]
 236 |             # print(f"Thumb Tip Coordinates: (x: {thumb_tip.x}, y: {thumb_tip.y}, z: {thumb_tip.z})")
 237 |             thumb_x, thumb_y = int(thumb_tip.x * image.shape[1]), int(thumb_tip.y * image.shape[0])
 238 |             middle_tip = hand_landmarks.landmark[mp_hands.HandLandmark.MIDDLE_FINGER_TIP]
 239 |             # print(f"Middle Finger Tip Coordinates: (x: {middle_tip.x}, y: {middle_tip.y}, z: {middle_tip.z})")
 240 |             middle_x, middle_y = int(middle_tip.x * image.shape[1]), int(middle_tip.y * image.shape[0])
 241 |     # Extract a hand mesh from the hand landmarks and display it.
 242 |     hand_mesh = mp_hands.HAND_CONNECTIONS
 243 |     hand_mesh = [list(pair) for pair in hand_mesh]
 244 |     # Draw the hand mesh on the image.
 245 |     if results.multi_hand_landmarks:
 246 |         for hand_landmarks in results.multi_hand_landmarks:
 247 |             for connection in hand_mesh:
 248 |                 # Get the coordinates of the two points.
 249 |                 start = (int(hand_landmarks.landmark[connection[0]].x * image.shape[1]),
 250 |                         int(hand_landmarks.landmark[connection[0]].y * image.shape[0]))
 251 |                 end = (int(hand_landmarks.landmark[connection[1]].x * image.shape[1]),
 252 |                     int(hand_landmarks.landmark[connection[1]].y * image.shape[0]))
 253 |                 # Draw a line connecting the two points.
 254 |                 cv.line(image, start, end, (0, 255, 0), 2)
 255 |         
 256 |     # Draw a cirlce on the pixel of the index finger tip using matplotlib.
 257 |     thumb_cmc = hand_landmarks.landmark[mp_hands.HandLandmark.THUMB_CMC]
 258 |     thumb_cmc_x, thumb_cmc_y = int(thumb_cmc.x * image.shape[1]), int(thumb_cmc.y * image.shape[0])
 259 | 
 260 |     thumb_IP = hand_landmarks.landmark[mp_hands.HandLandmark.THUMB_IP]
 261 |     thumb_IP_x, thumb_IP_y = int(thumb_IP.x * image.shape[1]), int(thumb_IP.y * image.shape[0])
 262 | 
 263 |     thumb_MCP = hand_landmarks.landmark[mp_hands.HandLandmark.THUMB_MCP]
 264 |     thumb_MCP_x, thumb_MCP_y = int(thumb_MCP.x * image.shape[1]), int(thumb_MCP.y * image.shape[0])
 265 | 
 266 |     index_mc = hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_MCP]
 267 |     index_mc_x, index_mc_y = int(index_mc.x * image.shape[1]), int(index_mc.y * image.shape[0])
 268 | 
 269 |     index_pip = hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_PIP]
 270 |     index_pip_x, index_pip_y = int(index_pip.x * image.shape[1]), int(index_pip.y * image.shape[0])
 271 | 
 272 |     index_dip = hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_DIP]
 273 |     index_dip_x, index_dip_y = int(index_dip.x * image.shape[1]), int(index_dip.y * image.shape[0])
 274 |     
 275 |     if viz_keypoints:
 276 |         plt.imshow(rgb_im)
 277 |         plt.scatter(index_x, index_y, color='blue', s=10)
 278 |         plt.scatter(index_dip_x, index_dip_y, color='orange', s=10)
 279 | 
 280 |         plt.scatter(index_pip_x, index_pip_y, color='purple', s=10)
 281 |         plt.scatter(thumb_x, thumb_y, color='blue', s=10)
 282 |         plt.scatter(thumb_IP_x, thumb_IP_y, color='orange', s=10)
 283 |         plt.scatter(thumb_MCP_x, thumb_MCP_y, color='purple', s=10)
 284 | 
 285 |     index_finger_1_x, index_finger_1_y = index_x, index_y
 286 |     index_finger_2_x, index_finger_2_y = index_dip_x, index_dip_y
 287 |     index_finger_3_x, index_finger_3_y = index_pip_x, index_pip_y
 288 | 
 289 |     thumb_1_x, thumb_1_y = thumb_x, thumb_y
 290 |     thumb_2_x, thumb_2_y = thumb_IP_x, thumb_IP_y
 291 |     thumb_3_x, thumb_3_y = thumb_MCP_x, thumb_MCP_y
 292 |     return middle_x, middle_y, index_finger_1_x, index_finger_1_y, index_finger_2_x, index_finger_2_y, index_finger_3_x, index_finger_3_y, thumb_1_x, thumb_1_y, thumb_2_x, thumb_2_y, thumb_3_x, thumb_3_y
 293 | 
 294 | def align_gripper_to_hand(hand_point_cloud, hand_pcd_as_o3d, gripper_pcd, gripper_pcd_as_o3d, vizualize=False):
 295 | 
 296 |     # Compute the principal axis of the point cloud
 297 |     principal_axis_h, second_axis_h = compute_principal_axis(hand_point_cloud)
 298 |     principal_axis_g, second_axis_g = compute_principal_axis(gripper_pcd, switch_principal_axis=True)
 299 |     middle_point_gripper = np.mean(gripper_pcd, axis=0)
 300 |     middle_point_hand = np.mean(hand_point_cloud, axis=0)
 301 |     if vizualize:
 302 |         # Plot a vector in the direction of the principal axis in open3d
 303 |         line_set_h1 = o3d.geometry.LineSet()
 304 |         line_set_h1.points = o3d.utility.Vector3dVector([middle_point_hand, middle_point_hand + principal_axis_h])
 305 |         line_set_h1.lines = o3d.utility.Vector2iVector([[0, 1]])
 306 |         # make line red
 307 |         line_set_h1.colors = o3d.utility.Vector3dVector([[1, 0, 0]])
 308 |         line_set_h2 = o3d.geometry.LineSet()
 309 |         line_set_h2.points = o3d.utility.Vector3dVector([middle_point_hand, middle_point_hand - principal_axis_h])
 310 |         line_set_h2.lines = o3d.utility.Vector2iVector([[0, 1]])
 311 |         # make line red
 312 |         line_set_h2.colors = o3d.utility.Vector3dVector([[1, 0, 0]])
 313 |         line_set_h3 = o3d.geometry.LineSet()
 314 |         line_set_h3.points = o3d.utility.Vector3dVector([middle_point_hand, middle_point_hand + second_axis_h])
 315 |         line_set_h3.lines = o3d.utility.Vector2iVector([[0, 1]])
 316 |         # make line green
 317 |         line_set_h3.colors = o3d.utility.Vector3dVector([[0, 1, 0]])
 318 |         line_set_h4 = o3d.geometry.LineSet()
 319 |         line_set_h4.points = o3d.utility.Vector3dVector([middle_point_hand, middle_point_hand - second_axis_h])
 320 |         line_set_h4.lines = o3d.utility.Vector2iVector([[0, 1]])
 321 |         # make line green
 322 |         line_set_h4.colors = o3d.utility.Vector3dVector([[0, 1, 0]])
 323 |         line_set_g1 = o3d.geometry.LineSet()
 324 |         line_set_g1.points = o3d.utility.Vector3dVector([middle_point_gripper, middle_point_gripper + principal_axis_g])
 325 |         line_set_g1.lines = o3d.utility.Vector2iVector([[0, 1]])
 326 |         # make line red
 327 |         line_set_g1.colors = o3d.utility.Vector3dVector([[1, 0, 0]])
 328 |         line_set_g2 = o3d.geometry.LineSet()
 329 |         line_set_g2.points = o3d.utility.Vector3dVector([middle_point_gripper, middle_point_gripper - principal_axis_g])
 330 |         line_set_g2.lines = o3d.utility.Vector2iVector([[0, 1]])
 331 |         # make line red
 332 |         line_set_g2.colors = o3d.utility.Vector3dVector([[1, 0, 0]])
 333 |         line_set_g3 = o3d.geometry.LineSet()
 334 |         line_set_g3.points = o3d.utility.Vector3dVector([middle_point_gripper, middle_point_gripper + second_axis_g])
 335 |         line_set_g3.lines = o3d.utility.Vector2iVector([[0, 1]])
 336 |         # make line green
 337 |         line_set_g3.colors = o3d.utility.Vector3dVector([[0, 1, 0]])
 338 |         line_set_g4 = o3d.geometry.LineSet()
 339 |         line_set_g4.points = o3d.utility.Vector3dVector([middle_point_gripper, middle_point_gripper - second_axis_g])
 340 |         line_set_g4.lines = o3d.utility.Vector2iVector([[0, 1]])
 341 |         # make line green
 342 |         line_set_g4.colors = o3d.utility.Vector3dVector([[0, 1, 0]])
 343 |         # Plot the coordinate frame of the gripper in open3d
 344 |         gripper_coord = o3d.geometry.TriangleMesh.create_coordinate_frame(size=.100, origin=middle_point_gripper)
 345 |         T_gripper_coord = np.eye(4)
 346 |         T_gripper_coord[3, :3] = middle_point_gripper
 347 |         # Show in open3d
 348 |         o3d.visualization.draw_geometries([hand_pcd_as_o3d, gripper_pcd_as_o3d,  line_set_h2, line_set_h3, line_set_h4, line_set_g2, line_set_g3, line_set_g4, gripper_coord])
 349 |     # Extract points on each pair of principal axes to compute the relative transformation
 350 |     # Extract points on the principal axis of the hand
 351 |     q = np.array([middle_point_hand, 
 352 |                 hand_point_cloud[0] + principal_axis_h, 
 353 |                 #   middle_point_hand - principal_axis_h, 
 354 |                 middle_point_hand + second_axis_h,
 355 |                 middle_point_hand - second_axis_h])
 356 |     p = np.array([middle_point_gripper,
 357 |                     # middle_point_gripper + principal_axis_g,
 358 |                     gripper_pcd[0] - principal_axis_g,
 359 |                     middle_point_gripper + second_axis_g,
 360 |                     middle_point_gripper - second_axis_g])
 361 |     # Compute the relative transformation between the two pairs of principal axes
 362 |     T = find_scaled_transformation(p, q, use_scale=True)
 363 |     # Apply the transformation matrix to the point cloud
 364 |     gripper_aligned_to_hand_pcd = apply_transformation(gripper_pcd, T)
 365 |     # Show in open3d
 366 |     gripper_aligned_to_hand_pcd_as_o3d = o3d.geometry.PointCloud()
 367 |     gripper_aligned_to_hand_pcd_as_o3d.points = o3d.utility.Vector3dVector(gripper_aligned_to_hand_pcd)
 368 |     gripper_aligned_to_hand_pcd_as_o3d.paint_uniform_color([0, 1, 0])
 369 | 
 370 |     if vizualize:
 371 |         o3d.visualization.draw_geometries([hand_pcd_as_o3d, gripper_pcd_as_o3d, gripper_aligned_to_hand_pcd_as_o3d])
 372 |     return gripper_aligned_to_hand_pcd_as_o3d
 373 | 
 374 | def get_hand_pcd_in_scene_from_rendered_model(rgb_im, depth_im, rend_depth_front_view, new_depth_image, human_mask, vizualize=False):
 375 | 
 376 |     point_cloud_camera = depth_to_point_cloud(depth_im, INTRINSICS_REAL_CAMERA[0, 0], INTRINSICS_REAL_CAMERA[1, 1], INTRINSICS_REAL_CAMERA[0, 2], INTRINSICS_REAL_CAMERA[1, 2]) # Camera intrinsics, depth at real scale, although the shape is not accurate
 377 |     point_cloud_camera = point_cloud_camera.reshape(-1, 3)
 378 |     point_cloud_rescaled_depth = depth_to_point_cloud(new_depth_image, INTRINSICS_HAMER_RENDERER[0, 0], INTRINSICS_HAMER_RENDERER[1, 1], INTRINSICS_HAMER_RENDERER[0, 2],  INTRINSICS_HAMER_RENDERER[1, 2]) # Hamer intrinsics, depth at real scale, shape does not match real
 379 |     point_cloud_rescaled_depth = point_cloud_rescaled_depth.reshape(-1, 3)
 380 |     pcd_camera = o3d.geometry.PointCloud()
 381 |     pcd_camera.points = o3d.utility.Vector3dVector(point_cloud_camera)
 382 |     pcd_rescaled_depth = o3d.geometry.PointCloud()
 383 |     pcd_rescaled_depth.points = o3d.utility.Vector3dVector(point_cloud_rescaled_depth)
 384 | 
 385 |     show_scaled_up_pcd = False
 386 |     remove_outliers=True # does not work that well, a big hacky
 387 |     # print("point cloud camera shape: ", point_cloud_camera.shape)
 388 |     point_cloud_camera = point_cloud_camera.reshape(IM_HEIGHT, IM_WIDTH, 3)
 389 |     point_cloud_rescaled_depth = point_cloud_rescaled_depth.reshape(IM_HEIGHT, IM_WIDTH, 3)
 390 |     # Extract common points between rendered model and live image
 391 |     common_points_binary = np.zeros_like(new_depth_image)
 392 |     common_points_binary[new_depth_image > 0] = 1 # binary mask of points in the rendered model depth
 393 | 
 394 |     depth_im_hand =  human_mask * depth_im # extract human hand from the live image
 395 |     depth_im_hand = common_points_binary * depth_im_hand # extract common points between rendered depth model and live image
 396 |     if vizualize:
 397 |         print(depth_im_hand)
 398 |         plt.imshow(depth_im_hand)
 399 |         plt.title("Depth image of hand after human mask")
 400 |         plt.show()
 401 | 
 402 |     
 403 |     if remove_outliers: # a bit hacky, does not do much, if not cause problems
 404 |         mean_depth = np.median(new_depth_image[new_depth_image > 0])
 405 |         depth_im_hand[np.abs(depth_im_hand) > mean_depth + 0.2] = 0 # remove all points 2m away from the camera
 406 | 
 407 |     common_points = np.zeros_like(depth_im_hand)
 408 |     common_points[depth_im_hand > 0] = 1 # Binary mask of common points between rendered model depth and live image
 409 | 
 410 |     if vizualize:
 411 |         plt.imshow(common_points)
 412 |         plt.title("Common points")
 413 |         plt.show()
 414 | 
 415 |     common_pts_indices = np.argwhere(common_points > 0) # extract common points indices
 416 |     q = point_cloud_camera[common_pts_indices[:, 0], common_pts_indices[:, 1]]
 417 |     p = point_cloud_rescaled_depth[common_pts_indices[:, 0], common_pts_indices[:, 1]]
 418 | 
 419 |     T = find_scaled_transformation(p, q, use_scale=False) # Compute scale and rotation matrix Tp = q
 420 |     pcd_rendered_hand_to_live_hand = apply_transformation(point_cloud_rescaled_depth.reshape(-1, 3), T)
 421 |     pcd_rendered_hand_to_live_hand = pcd_rendered_hand_to_live_hand.reshape(-1, 3)
 422 |     pcd_rendered_hand_to_live_hand_full = copy.deepcopy(pcd_rendered_hand_to_live_hand)
 423 |     # pcd_rendered_hand_to_live_hand = pcd_rendered_hand_to_live_hand[pcd_rendered_hand_to_live_hand[:, 2] > .400]
 424 | 
 425 |     # Show the point cloud in open3d
 426 |     pcd_hand_to_scale = o3d.geometry.PointCloud()
 427 |     pcd_hand_to_scale.points = o3d.utility.Vector3dVector(pcd_rendered_hand_to_live_hand)
 428 |     hand_point_cloud = pcd_rendered_hand_to_live_hand # from above
 429 |     hand_point_cloud_full = pcd_rendered_hand_to_live_hand_full
 430 |     use_scaled_up_hand = True
 431 |     use_both_projected_hands = False
 432 |     clean_live_image_background = True
 433 |     if clean_live_image_background:
 434 |         mean_depth_hand = np.mean(new_depth_image[new_depth_image > 0])
 435 |         std_depth_hand = np.std(new_depth_image[new_depth_image > 0])
 436 |         depth_im[depth_im > mean_depth_hand + 1000] = 0
 437 |         depth_im[depth_im < mean_depth_hand - 1000] = 0
 438 |     if not use_scaled_up_hand and not use_both_projected_hands:
 439 |         hand_point_cloud = depth_to_point_cloud(new_depth_image, INTRINSICS_REAL_CAMERA[0, 0], INTRINSICS_REAL_CAMERA[1, 1], INTRINSICS_REAL_CAMERA[0, 2], INTRINSICS_REAL_CAMERA[1, 2])
 440 |         hand_point_cloud = hand_point_cloud.reshape(-1, 3)
 441 |         hand_point_cloud_full = hand_point_cloud
 442 |     if use_both_projected_hands:
 443 |         hand_point_cloud_via_intrinsics = depth_to_point_cloud(new_depth_image, INTRINSICS_REAL_CAMERA[0, 0], INTRINSICS_REAL_CAMERA[1, 1], INTRINSICS_REAL_CAMERA[0, 2], INTRINSICS_REAL_CAMERA[1, 2])
 444 |         hand_point_cloud_via_intrinsics = hand_point_cloud_via_intrinsics.reshape(-1, 3)
 445 |     image_point_cloud = depth_to_point_cloud(depth_im, INTRINSICS_REAL_CAMERA[0, 0], INTRINSICS_REAL_CAMERA[1, 1], INTRINSICS_REAL_CAMERA[0, 2], INTRINSICS_REAL_CAMERA[1, 2])
 446 |     image_point_cloud = image_point_cloud.reshape(-1, 3)
 447 |     # Point cloud to open3d point cloud
 448 |     pcd_hand_to_scale = o3d.geometry.PointCloud()
 449 |     pcd_hand_to_scale.points = o3d.utility.Vector3dVector(hand_point_cloud[hand_point_cloud[:, 2] > 0])
 450 |     # Set pcd color to red
 451 |     pcd_hand_to_scale.paint_uniform_color([1, 0, 0])
 452 |     pcd_image = o3d.geometry.PointCloud()
 453 |     pcd_image.points = o3d.utility.Vector3dVector(image_point_cloud)
 454 |     # Set color to RGB
 455 |     pcd_image.colors = o3d.utility.Vector3dVector(rgb_im.reshape(-1, 3))
 456 |     # Set pcd color to blue
 457 |     if use_both_projected_hands and vizualize:
 458 |         pcd_via_intrinsics = o3d.geometry.PointCloud()
 459 |         pcd_via_intrinsics.points = o3d.utility.Vector3dVector(hand_point_cloud_via_intrinsics[hand_point_cloud_via_intrinsics[:, 2] > 0])
 460 |         pcd_via_intrinsics.paint_uniform_color([0, 1, 0])
 461 |         pcd_hand_to_scale.paint_uniform_color([1, 0, 0])
 462 | 
 463 |     return hand_point_cloud, hand_point_cloud_full, image_point_cloud, pcd_hand_to_scale, pcd_image, T
 464 | 
 465 | def align_gripper_to_hand(hand_point_cloud, hand_pcd_as_o3d, gripper_pcd, gripper_pcd_as_o3d, vizualize=False):
 466 | 
 467 |     # Compute the principal axis of the point cloud
 468 |     principal_axis_h, second_axis_h = compute_principal_axis(hand_point_cloud)
 469 |     principal_axis_g, second_axis_g = compute_principal_axis(gripper_pcd, switch_principal_axis=True)
 470 |     middle_point_gripper = np.mean(gripper_pcd, axis=0)
 471 |     middle_point_hand = np.mean(hand_point_cloud, axis=0)
 472 |     if vizualize:
 473 |         # Plot a vector in the direction of the principal axis in open3d
 474 |         line_set_h1 = o3d.geometry.LineSet()
 475 |         line_set_h1.points = o3d.utility.Vector3dVector([middle_point_hand, middle_point_hand + principal_axis_h])
 476 |         line_set_h1.lines = o3d.utility.Vector2iVector([[0, 1]])
 477 |         # make line red
 478 |         line_set_h1.colors = o3d.utility.Vector3dVector([[1, 0, 0]])
 479 |         line_set_h2 = o3d.geometry.LineSet()
 480 |         line_set_h2.points = o3d.utility.Vector3dVector([middle_point_hand, middle_point_hand - principal_axis_h])
 481 |         line_set_h2.lines = o3d.utility.Vector2iVector([[0, 1]])
 482 |         # make line red
 483 |         line_set_h2.colors = o3d.utility.Vector3dVector([[1, 0, 0]])
 484 |         line_set_h3 = o3d.geometry.LineSet()
 485 |         line_set_h3.points = o3d.utility.Vector3dVector([middle_point_hand, middle_point_hand + second_axis_h])
 486 |         line_set_h3.lines = o3d.utility.Vector2iVector([[0, 1]])
 487 |         # make line green
 488 |         line_set_h3.colors = o3d.utility.Vector3dVector([[0, 1, 0]])
 489 |         line_set_h4 = o3d.geometry.LineSet()
 490 |         line_set_h4.points = o3d.utility.Vector3dVector([middle_point_hand, middle_point_hand - second_axis_h])
 491 |         line_set_h4.lines = o3d.utility.Vector2iVector([[0, 1]])
 492 |         # make line green
 493 |         line_set_h4.colors = o3d.utility.Vector3dVector([[0, 1, 0]])
 494 |         line_set_g1 = o3d.geometry.LineSet()
 495 |         line_set_g1.points = o3d.utility.Vector3dVector([middle_point_gripper, middle_point_gripper + principal_axis_g])
 496 |         line_set_g1.lines = o3d.utility.Vector2iVector([[0, 1]])
 497 |         # make line red
 498 |         line_set_g1.colors = o3d.utility.Vector3dVector([[1, 0, 0]])
 499 |         line_set_g2 = o3d.geometry.LineSet()
 500 |         line_set_g2.points = o3d.utility.Vector3dVector([middle_point_gripper, middle_point_gripper - principal_axis_g])
 501 |         line_set_g2.lines = o3d.utility.Vector2iVector([[0, 1]])
 502 |         # make line red
 503 |         line_set_g2.colors = o3d.utility.Vector3dVector([[1, 0, 0]])
 504 |         line_set_g3 = o3d.geometry.LineSet()
 505 |         line_set_g3.points = o3d.utility.Vector3dVector([middle_point_gripper, middle_point_gripper + second_axis_g])
 506 |         line_set_g3.lines = o3d.utility.Vector2iVector([[0, 1]])
 507 |         # make line green
 508 |         line_set_g3.colors = o3d.utility.Vector3dVector([[0, 1, 0]])
 509 |         line_set_g4 = o3d.geometry.LineSet()
 510 |         line_set_g4.points = o3d.utility.Vector3dVector([middle_point_gripper, middle_point_gripper - second_axis_g])
 511 |         line_set_g4.lines = o3d.utility.Vector2iVector([[0, 1]])
 512 |         # make line green
 513 |         line_set_g4.colors = o3d.utility.Vector3dVector([[0, 1, 0]])
 514 |         # Plot the coordinate frame of the gripper in open3d
 515 |         gripper_coord = o3d.geometry.TriangleMesh.create_coordinate_frame(size=.100, origin=middle_point_gripper)
 516 |         T_gripper_coord = np.eye(4)
 517 |         T_gripper_coord[3, :3] = middle_point_gripper
 518 |         # Show in open3d
 519 |         o3d.visualization.draw_geometries([hand_pcd_as_o3d, gripper_pcd_as_o3d,  line_set_h2, line_set_h3, line_set_h4, line_set_g2, line_set_g3, line_set_g4, gripper_coord])
 520 |     # Extract points on each pair of principal axes to compute the relative transformation
 521 |     # Extract points on the principal axis of the hand
 522 |     q = np.array([middle_point_hand, 
 523 |                 hand_point_cloud[0] + principal_axis_h, 
 524 |                 #   middle_point_hand - principal_axis_h, 
 525 |                 middle_point_hand + second_axis_h,
 526 |                 middle_point_hand - second_axis_h])
 527 |     p = np.array([middle_point_gripper,
 528 |                     # middle_point_gripper + principal_axis_g,
 529 |                     gripper_pcd[0] - principal_axis_g,
 530 |                     middle_point_gripper + second_axis_g,
 531 |                     middle_point_gripper - second_axis_g])
 532 |     # Compute the relative transformation between the two pairs of principal axes
 533 |     T = find_scaled_transformation(p, q, use_scale=True)
 534 |     # Apply the transformation matrix to the point cloud
 535 |     gripper_aligned_to_hand_pcd = apply_transformation(gripper_pcd, T)
 536 |     # Show in open3d
 537 |     gripper_aligned_to_hand_pcd_as_o3d = o3d.geometry.PointCloud()
 538 |     gripper_aligned_to_hand_pcd_as_o3d.points = o3d.utility.Vector3dVector(gripper_aligned_to_hand_pcd)
 539 |     gripper_aligned_to_hand_pcd_as_o3d.paint_uniform_color([0, 1, 0])
 540 | 
 541 |     if vizualize:
 542 |         o3d.visualization.draw_geometries([hand_pcd_as_o3d, gripper_pcd_as_o3d, gripper_aligned_to_hand_pcd_as_o3d])
 543 |     return gripper_aligned_to_hand_pcd_as_o3d
 544 | 
 545 | def align_gripper_with_hand_fingers(gripper_scaled_to_hand_pcd, 
 546 |                                     hand_pcd_as_o3d, 
 547 |                                     key_fingers_points, 
 548 |                                     gripper_aligned_to_hand_pcd_as_o3d,
 549 |                                     gripper_pcd_dense_mesh,  
 550 |                                     use_only_thumb_keypoints=False,
 551 |                                     use_only_index_keypoints=False,
 552 |                                     rescale_gripper_to_hand_opening=False, 
 553 |                                     rescale_hand_to_gripper_opening=False,
 554 |                                     vizualize=False,
 555 |                                     bias_transformation=np.eye(4)):
 556 |     assert (use_only_index_keypoints == use_only_thumb_keypoints) or use_only_index_keypoints == False , f'Either ONLY index {use_only_thumb_keypoints} or thumb keypoints {use_only_index_keypoints} can be used. Both False=Use both'
 557 |     assert (rescale_gripper_to_hand_opening == rescale_hand_to_gripper_opening) or rescale_gripper_to_hand_opening == False, f'Either rescale gripper to hand opening {rescale_gripper_to_hand_opening} or rescale hand to gripper opening {rescale_hand_to_gripper_opening} can be used. Both False=Do not rescale'
 558 | 
 559 |     dense_pcd_kpts = {"index_front": 517980, "thumb_front": 248802, "wrist": 246448}
 560 |     gripper_fingers = np.array([gripper_scaled_to_hand_pcd.points[dense_pcd_kpts["index_front"]], 
 561 |                                 gripper_scaled_to_hand_pcd.points[dense_pcd_kpts["index_front"]],
 562 |                                 gripper_scaled_to_hand_pcd.points[dense_pcd_kpts["index_front"]],
 563 |                                 gripper_scaled_to_hand_pcd.points[dense_pcd_kpts["thumb_front"]],
 564 |                                 gripper_scaled_to_hand_pcd.points[dense_pcd_kpts["thumb_front"]],
 565 |                                 gripper_scaled_to_hand_pcd.points[dense_pcd_kpts["thumb_front"]],
 566 |                                 gripper_scaled_to_hand_pcd.points[dense_pcd_kpts["wrist"]]])
 567 |     
 568 |     if rescale_gripper_to_hand_opening:
 569 |         distance_between_thumb_and_index = np.linalg.norm(key_fingers_points[0] - key_fingers_points[3])
 570 |         distance_between_thumb_and_index_gripper = np.linalg.norm(gripper_fingers[0] - gripper_fingers[3])
 571 |         scaling_factor = distance_between_thumb_and_index / distance_between_thumb_and_index_gripper
 572 |         center_gripper = gripper_scaled_to_hand_pcd.get_center()
 573 |         gripper_scaled_to_hand_pcd_np = (np.asarray(gripper_scaled_to_hand_pcd.points) - center_gripper) * np.array([scaling_factor, scaling_factor, scaling_factor]) + center_gripper
 574 |         gripper_scaled_to_hand_pcd.points = o3d.utility.Vector3dVector(gripper_scaled_to_hand_pcd_np)
 575 | 
 576 |     if rescale_hand_to_gripper_opening:
 577 |         distance_between_thumb_and_index = np.linalg.norm(key_fingers_points[0] - key_fingers_points[3])
 578 |         distance_between_thumb_and_index_gripper = np.linalg.norm(gripper_fingers[0] - gripper_fingers[3])
 579 |         scaling_factor = distance_between_thumb_and_index_gripper / distance_between_thumb_and_index
 580 |         center_hand = hand_pcd_as_o3d.get_center()
 581 |         hand_pcd_as_o3d_np = (np.asarray(hand_pcd_as_o3d.points) - center_hand) * np.array([scaling_factor, scaling_factor, scaling_factor]) + center_hand
 582 |         hand_pcd_as_o3d.points = o3d.utility.Vector3dVector(hand_pcd_as_o3d_np)
 583 |         # rescale the key fingers points
 584 |         key_finger_points_center = np.mean(key_fingers_points, axis=0)
 585 |         key_fingers_points = (key_fingers_points - key_finger_points_center) * np.array([scaling_factor, scaling_factor, scaling_factor]) + key_finger_points_center
 586 |         
 587 | 
 588 |     # determine a line that goes through the index and thumb of the hand
 589 |     kpt_o3d_sphere = []
 590 |     count = 0
 591 |     # key_fingers_points_4pts = np.array([key_fingers_points[0], key_fingers_points[1], key_fingers_points[3], key_fingers_points[4]])
 592 |     key_fingers_points_4pts = np.array([key_fingers_points[0], key_fingers_points[3], key_fingers_points[-1]])
 593 | 
 594 |     key_fingers_points = np.array(key_fingers_points_4pts)
 595 | 
 596 |     alpha = 1
 597 |     line_point = key_fingers_points[0] + alpha * (key_fingers_points[1] - key_fingers_points[0])
 598 |     unit_vec_difference = (key_fingers_points[1] - key_fingers_points[0]) / np.linalg.norm(key_fingers_points[1] - key_fingers_points[0])
 599 |     distance_gripper_fingers = np.linalg.norm(gripper_fingers[0] - gripper_fingers[4])
 600 |     distance_key_fingers = np.linalg.norm(key_fingers_points[0] - key_fingers_points[1])
 601 |     difference_half = np.abs(distance_gripper_fingers - distance_key_fingers)/2
 602 |     pt1 = key_fingers_points[0] - unit_vec_difference * difference_half
 603 |     pt2 = key_fingers_points[1] + unit_vec_difference * difference_half
 604 |     middle_finger_point = key_fingers_points[0] + unit_vec_difference * distance_key_fingers/2
 605 |     distance_middle_griper_middle_finger = np.linalg.norm(gripper_fingers[-1] - middle_finger_point)
 606 |     unit_difference_between_middle_finger_point_and_key_fingers_last = (middle_finger_point - key_fingers_points[-1]) / np.linalg.norm(middle_finger_point - key_fingers_points[-1])
 607 | 
 608 |     new_hand_point = middle_finger_point - unit_difference_between_middle_finger_point_and_key_fingers_last * distance_middle_griper_middle_finger
 609 |     distance_pt1_pt2 = np.linalg.norm(pt1 - pt2)
 610 |     print(f"Distance between pt1 and pt2: {distance_pt1_pt2}")
 611 |     print(f"Distance between gripper fingers: {distance_gripper_fingers}")
 612 |     key_fingers_points = np.array([pt1, pt2, key_fingers_points[-1]])
 613 |     # key_fingers_points = np.array([pt1, pt2])
 614 | 
 615 |     for kpt in key_fingers_points:
 616 |         sphere = o3d.geometry.TriangleMesh.create_sphere(radius=.003)
 617 |         sphere.compute_vertex_normals()
 618 |         count += 1
 619 |         if count % 3 == 0:
 620 |             red, green, blue = 1, 0, 1
 621 |         elif count % 3 == 1:
 622 |             red, green, blue = 0,1,0
 623 |         else:
 624 |             red, green, blue = 1, 0.5, 0
 625 |         sphere.paint_uniform_color([red, green, blue])
 626 |         sphere.translate(kpt)
 627 |         kpt_o3d_sphere.append(sphere)
 628 |     
 629 |     # add middle finger point
 630 |     sphere = o3d.geometry.TriangleMesh.create_sphere(radius=.003)
 631 |     sphere.compute_vertex_normals()
 632 |     sphere.paint_uniform_color([1, .5, .777])
 633 |     sphere.translate(middle_finger_point)
 634 |     kpt_o3d_sphere.append(sphere)
 635 | 
 636 |     # gripper_fingers_4pts = np.array([gripper_fingers[1], gripper_fingers[2], gripper_fingers[4], gripper_fingers[5]])
 637 |     gripper_fingers_4pts = np.array([gripper_fingers[0], gripper_fingers[4], gripper_fingers[-1]])
 638 |     # gripper_fingers_4pts = np.array([gripper_fingers[0], gripper_fingers[4]])
 639 | 
 640 | 
 641 |     gripper_fingers = np.array(gripper_fingers_4pts)
 642 |     gripper_fingers_o3d = []
 643 |     count = 0
 644 |     # Create vizualizer to sequentilaly add spheres to the gripper fingers
 645 |     for kpt in gripper_fingers:
 646 |         sphere = o3d.geometry.TriangleMesh.create_sphere(radius=.003)
 647 |         sphere.compute_vertex_normals()
 648 |         count += 1
 649 |         if count % 3 == 0:
 650 |             red, green, blue = 1, 0, 1 # color: purple
 651 |         elif count % 3 == 1:
 652 |             red, green, blue = 0, 1, 0
 653 |         else:
 654 |             red, green, blue = 1, 0.5, 0
 655 |         sphere.paint_uniform_color([red, green, blue])
 656 |         sphere.translate(kpt)
 657 |         gripper_fingers_o3d.append(sphere)
 658 |     
 659 |     if use_only_thumb_keypoints:
 660 |         key_fingers_points = key_fingers_points[2:]
 661 |         gripper_fingers = gripper_fingers[2:]
 662 |         gripper_fingers_o3d = gripper_fingers_o3d[2:]
 663 |         kpt_o3d_sphere = kpt_o3d_sphere[2:]
 664 | 
 665 |     if use_only_index_keypoints:
 666 |         key_fingers_points = key_fingers_points[:4]
 667 |         gripper_fingers = gripper_fingers[:4]
 668 |         gripper_fingers_o3d = gripper_fingers_o3d[:4]
 669 |         kpt_o3d_sphere = kpt_o3d_sphere[:4]
 670 | 
 671 | 
 672 |     T = find_scaled_transformation(gripper_fingers[:2], key_fingers_points[:2], use_scale=False)
 673 |     # transform the gripper_fingers_o3d to the hand frame
 674 |     for sphere in gripper_fingers_o3d:
 675 |         sphere.transform(T)
 676 | 
 677 |     gripper_pcd_before_transform = copy.deepcopy(gripper_scaled_to_hand_pcd)
 678 |     gripper_scaled_to_hand_pcd.transform(T)
 679 |     # Assume R is the rotation matrix you've computed and t is the translation
 680 |     # Transform z2
 681 |     R = T[:3, :3]
 682 |     t = T[:3, 3]
 683 |     z1 = key_fingers_points[-1]
 684 |     x1 = key_fingers_points[0]
 685 |     y1 = key_fingers_points[1]
 686 |     z2 = gripper_fingers[-1]
 687 |     x2 = gripper_fingers[0]
 688 |     y2 = gripper_fingers[1]
 689 | 
 690 |     z2_transformed = R @ z2 + t
 691 |     # Compute rotation axis (using x2 and y2 after transformation)
 692 |     x2_transformed = R @ x2 + t
 693 |     y2_transformed = R @ y2 + t
 694 |     rotation_axis = y2_transformed - x2_transformed #np.cross(x2_transformed - y2_transformed, z2_transformed - y2_transformed)
 695 | 
 696 | 
 697 |     # find theta that bring z2 as closest as possible to z2 while keeping the rotation axis the same
 698 |     distance = 10e10
 699 |     rotation_theta = None
 700 |     for theta in np.linspace(0, 2 * np.pi, 1000):
 701 |         R_additional = rotation_matrix(rotation_axis, theta)
 702 |         z2_final = (z2_transformed -  (y2_transformed + x2_transformed) / 2) @ R_additional.T + (y2_transformed + x2_transformed) / 2
 703 |         distance_temp = np.linalg.norm(z2_final - z1)
 704 |         if distance_temp < distance:
 705 |             distance = distance_temp
 706 |             rotation_theta = theta
 707 | 
 708 |     # Apply rotation about the axis
 709 |     R_additional = rotation_matrix(rotation_axis, rotation_theta)
 710 |     z2_final = (z2_transformed -  (y2_transformed + x2_transformed) / 2) @ R_additional.T + (y2_transformed + x2_transformed) / 2
 711 | 
 712 |     T2 = np.eye(4)
 713 |     T2[:3, :3] = R_additional
 714 |     gripper_scaled_to_hand_pcd_points = np.asarray(gripper_scaled_to_hand_pcd.points)
 715 |     gripper_scaled_to_hand_pcd_points = (gripper_scaled_to_hand_pcd_points - (y2_transformed + x2_transformed) / 2) @ R_additional.T + (y2_transformed + x2_transformed) / 2
 716 |     gripper_scaled_to_hand_pcd.points = o3d.utility.Vector3dVector(gripper_scaled_to_hand_pcd_points)
 717 |     # gripper_scaled_to_hand_pcd.transform(T2)
 718 |     gripper_aligned_to_hand_pcd_as_o3d.paint_uniform_color([.1, 1, 1])
 719 | 
 720 |     # z2_final = gripper_scaled_to_hand_pcd.points[dense_pcd_kpts["wrist"]]
 721 |     # add z2_final to sphere
 722 |     sphere = o3d.geometry.TriangleMesh.create_sphere(radius=.003)
 723 |     sphere.compute_vertex_normals()
 724 |     sphere.paint_uniform_color([0, 0, 0])
 725 |     sphere.translate(z2_final)
 726 |     kpt_o3d_sphere.append(sphere)
 727 | 
 728 |     if bias_transformation is None:
 729 |         bias_transformation = np.eye(4)
 730 |     # apply bias transformation in the gripper frame
 731 |     gripper_pose, gripper_zero_mean = get_gripper_transform_in_camera_frame(gripper_scaled_to_hand_pcd, 
 732 |                                                                             gripper_pcd_dense_mesh, 
 733 |                                                                             return_zero_meaned_gripper=True,
 734 |                                                                             vizualize=vizualize)
 735 |     gripper_pose = gripper_pose @ bias_transformation
 736 |     gripper_zero_mean.transform(gripper_pose)
 737 |     gripper_scaled_to_hand_pcd = copy.deepcopy(gripper_zero_mean)
 738 |         
 739 |     if vizualize:
 740 |             # o3d.visualization.draw_geometries([pcd_image, gripper_scaled_to_hand_pcd])
 741 |         line_o3d = o3d.geometry.LineSet()
 742 |         line_o3d.points = o3d.utility.Vector3dVector([key_fingers_points[0], key_fingers_points[0] + unit_vec_difference * 3])
 743 |         line_o3d.lines = o3d.utility.Vector2iVector([[0, 1]])
 744 | 
 745 |         line_o3d_2 = o3d.geometry.LineSet()
 746 |         line_o3d_2.points = o3d.utility.Vector3dVector([key_fingers_points[0], key_fingers_points[0] - unit_vec_difference * 3])
 747 |         line_o3d_2.lines = o3d.utility.Vector2iVector([[0, 1]])
 748 | 
 749 |         line_o3d_3 = o3d.geometry.LineSet()
 750 |         line_o3d_3.points = o3d.utility.Vector3dVector([middle_finger_point, key_fingers_points[1] + unit_difference_between_middle_finger_point_and_key_fingers_last * 3])
 751 |         line_o3d_3.lines = o3d.utility.Vector2iVector([[0, 1]])
 752 | 
 753 |         line_o3d_4 = o3d.geometry.LineSet()
 754 |         line_o3d_4.points = o3d.utility.Vector3dVector([middle_finger_point, key_fingers_points[1] - unit_difference_between_middle_finger_point_and_key_fingers_last * 3])
 755 |         line_o3d_4.lines = o3d.utility.Vector2iVector([[0, 1]])
 756 | 
 757 | 
 758 |         line_o3d_rotation_axis = o3d.geometry.LineSet()
 759 |         line_o3d_rotation_axis.points = o3d.utility.Vector3dVector([x2_transformed, x2_transformed + 10 * rotation_axis])
 760 |         line_o3d_rotation_axis.lines = o3d.utility.Vector2iVector([[0, 1]])
 761 | 
 762 |         line_o3d_rotation_axis_2 = o3d.geometry.LineSet()
 763 |         line_o3d_rotation_axis_2.points = o3d.utility.Vector3dVector([y2_transformed, y2_transformed - 10 * rotation_axis])
 764 |         line_o3d_rotation_axis_2.lines = o3d.utility.Vector2iVector([[0, 1]])
 765 | 
 766 |         gripper_frame_coord = o3d.geometry.TriangleMesh.create_coordinate_frame(size=.100, origin=[0, 0, 0])
 767 |         gripper_frame_coord.transform(gripper_pose)
 768 |         gripper_scaled_to_hand_pcd.paint_uniform_color([0, 1, 0])
 769 |         hand_pcd_as_o3d.paint_uniform_color([1, 0, 0])
 770 |         o3d.visualization.draw_geometries([hand_pcd_as_o3d, gripper_scaled_to_hand_pcd, gripper_frame_coord] + gripper_fingers_o3d + kpt_o3d_sphere + [line_o3d, line_o3d_2, line_o3d_3, line_o3d_4, line_o3d_rotation_axis, line_o3d_rotation_axis_2])
 771 |         # o3d.visualization.draw_geometries([hand_pcd_as_o3d, gripper_scaled_to_hand_pcd] + gripper_fingers_o3d + kpt_o3d_sphere + [line_o3d_rotation_axis])
 772 | 
 773 |     
 774 |     return gripper_scaled_to_hand_pcd, gripper_pose, distance_key_fingers
 775 | 
 776 | def align_hand_to_gripper_press(gripper_pcd, 
 777 |                             actions, 
 778 |                             vizualize=False,
 779 |                             bias_transformation=np.eye(4)):
 780 |     
 781 |     gripper_pcd_original_mesh = copy.deepcopy(gripper_pcd)
 782 |     dense_pcd_kpts = {"index_front": 517980, 
 783 |                       "index_middle": 231197, 
 784 |                       "index_bottom":335530, 
 785 |                       "thumb_front": 248802, 
 786 |                       "thumb_middle":71859, 
 787 |                       "thumb_bottom":523328, 
 788 |                       "wrist": 246448}
 789 |     
 790 |     gripper_fingers = np.array([gripper_pcd.points[dense_pcd_kpts["index_front"]], 
 791 |                                 gripper_pcd.points[dense_pcd_kpts["index_middle"]],
 792 |                                 gripper_pcd.points[dense_pcd_kpts["index_bottom"]],
 793 |                                 gripper_pcd.points[dense_pcd_kpts["thumb_front"]],
 794 |                                 gripper_pcd.points[dense_pcd_kpts["thumb_middle"]],
 795 |                                 gripper_pcd.points[dense_pcd_kpts["thumb_bottom"]],
 796 |                                 gripper_pcd.points[dense_pcd_kpts["wrist"]]])
 797 |     
 798 |     gripper_fingers[0] = gripper_fingers[0] - (gripper_fingers[0] - gripper_fingers[3]) / 2 
 799 |     gripper_fingers[4] = gripper_fingers[1] - (gripper_fingers[1] - gripper_fingers[4]) / 2
 800 |     gripper_fingers[-1] = gripper_fingers[2] - (gripper_fingers[2] - gripper_fingers[5]) / 2
 801 |     key_fingers_points = actions
 802 | 
 803 |     kpt_o3d_sphere = []
 804 |     count = 0
 805 | 
 806 |     for kpt in key_fingers_points:
 807 |         sphere = o3d.geometry.TriangleMesh.create_sphere(radius=.003)
 808 |         sphere.compute_vertex_normals()
 809 |         count += 1
 810 |         if count % 3 == 0:
 811 |             red, green, blue = 1, 0, 1
 812 |         elif count % 3 == 1:
 813 |             red, green, blue = 0,1,0
 814 |         else:
 815 |             red, green, blue = 1, 0.5, 0
 816 |         sphere.paint_uniform_color([red, green, blue])
 817 |         sphere.translate(kpt)
 818 |         kpt_o3d_sphere.append(sphere)
 819 |     
 820 | 
 821 |     gripper_fingers_4pts = np.array([gripper_fingers[0], gripper_fingers[4], gripper_fingers[-1]])
 822 |     gripper_fingers = np.array(gripper_fingers_4pts)
 823 |     gripper_fingers_o3d = []
 824 |     count = 0
 825 |     # Create vizualizer to sequentilaly add spheres to the gripper fingers
 826 |     for kpt in gripper_fingers:
 827 |         sphere = o3d.geometry.TriangleMesh.create_sphere(radius=.003)
 828 |         sphere.compute_vertex_normals()
 829 |         count += 1
 830 |         if count % 3 == 0:
 831 |             red, green, blue = 1, 0, 1 # color: purple
 832 |         elif count % 3 == 1:
 833 |             red, green, blue = 0, 1, 0
 834 |         else:
 835 |             red, green, blue = 1, 0.5, 0
 836 |         sphere.paint_uniform_color([red, green, blue])
 837 |         sphere.translate(kpt)
 838 |         gripper_fingers_o3d.append(sphere)
 839 | 
 840 |     # o3d.visualization.draw_geometries([gripper_pcd] + gripper_fingers_o3d )
 841 | 
 842 | 
 843 |     T = find_scaled_transformation(gripper_fingers, key_fingers_points, use_scale=False)
 844 |     # transform the gripper_fingers_o3d to the hand frame
 845 |     for sphere in gripper_fingers_o3d:
 846 |         sphere.transform(T)
 847 |     gripper_pcd.transform(T)
 848 | 
 849 | 
 850 |     if bias_transformation is None:
 851 |         bias_transformation = np.eye(4)
 852 |     # apply bias transformation in the gripper frame
 853 |     gripper_pose, gripper_zero_mean = get_gripper_transform_in_camera_frame(gripper_pcd, 
 854 |                                                                             gripper_pcd_original_mesh, 
 855 |                                                                             return_zero_meaned_gripper=True,
 856 |                                                                             vizualize=vizualize)
 857 |     gripper_pose = gripper_pose @ bias_transformation
 858 |     gripper_zero_mean.transform(gripper_pose)
 859 |     gripper_pcd = copy.deepcopy(gripper_zero_mean)
 860 |         
 861 |     if vizualize:
 862 |             # o3d.visualization.draw_geometries([pcd_image, gripper_scaled_to_hand_pcd])
 863 | 
 864 |         gripper_frame_coord = o3d.geometry.TriangleMesh.create_coordinate_frame(size=.100, origin=[0, 0, 0])
 865 |         gripper_frame_coord.transform(gripper_pose)
 866 |         gripper_pcd.paint_uniform_color([0.0,0.0,0.0])
 867 |         o3d.visualization.draw_geometries([gripper_pcd] + kpt_o3d_sphere )
 868 |         # o3d.visualization.draw_geometries([hand_pcd_as_o3d, gripper_scaled_to_hand_pcd] + gripper_fingers_o3d + kpt_o3d_sphere + [line_o3d_rotation_axis])
 869 | 
 870 |     
 871 |     return gripper_pcd, gripper_pose
 872 | 
 873 | def get_gripper_transform_in_camera_frame(gripper_scaled_to_hand_pcd, original_hand_pcd, vizualize=False, return_zero_meaned_gripper=False):
 874 |     # Add a world frame
 875 |     world_coord = o3d.geometry.TriangleMesh.create_coordinate_frame(size=.100, origin=[0, 0, 0])
 876 |     # Show in open3d
 877 |     # o3d.visualization.draw_geometries([gripper_scaled_to_hand_pcd, sphere])
 878 |     gripper_zero_origin = copy.deepcopy(np.asarray(original_hand_pcd.points))
 879 |     # zero mean the z-axis
 880 |     gripper_zero_origin[:, 2] = gripper_zero_origin[:, 2] - np.mean(gripper_zero_origin[:, 2])
 881 |     # rotate 90 degrees around the x-axis
 882 |     R = np.array([[1, 0, 0], [0, 0, -1], [0, 1, 0]])
 883 |     gripper_zero_origin = np.dot(R, gripper_zero_origin.T).T
 884 |     gripper_zero_origin_o3d = o3d.geometry.PointCloud()
 885 |     gripper_zero_origin_o3d.points = o3d.utility.Vector3dVector(gripper_zero_origin)
 886 |     p = np.asarray(gripper_zero_origin_o3d.points)
 887 |     q = np.asarray(gripper_scaled_to_hand_pcd.points)
 888 |     T = find_scaled_transformation(p, q, use_scale=False)
 889 |     gripper_coord = o3d.geometry.TriangleMesh.create_coordinate_frame(size=.100, origin=[0, 0, 0])
 890 |     gripper_coord.transform(T)
 891 | 
 892 |     if vizualize:
 893 |         o3d.visualization.draw_geometries([gripper_scaled_to_hand_pcd, gripper_zero_origin_o3d, gripper_coord, world_coord])
 894 |     if return_zero_meaned_gripper:
 895 |         return T, gripper_zero_origin_o3d
 896 |     return T
 897 | 
 898 | 
 899 | def get_mano_hand_joints_from_vertices(vertices, camera_translation, model_path = MODEL_MANO_PATH):
 900 |     is_rhand = True
 901 |     ext='pkl'
 902 |     data_struct = None
 903 |     if data_struct is None:
 904 |             # Load the model
 905 |             if osp.isdir(model_path):
 906 |                 model_fn = 'MANO_{}.{ext}'.format('RIGHT' if is_rhand else 'LEFT', ext=ext)
 907 |                 mano_path = os.path.join(model_path, model_fn)
 908 |             else:
 909 |                 mano_path = model_path
 910 |                 is_rhand = True if 'RIGHT' in os.path.basename(model_path) else False
 911 |             assert osp.exists(mano_path), 'Path {} does not exist!'.format(
 912 |                 mano_path)
 913 |             if ext == 'pkl':
 914 |                 with open(mano_path, 'rb') as mano_file:
 915 |                     model_data = pickle.load(mano_file, encoding='latin1')
 916 |             elif ext == 'npz':
 917 |                 model_data = np.load(mano_path, allow_pickle=True)
 918 |             else:
 919 |                 raise ValueError('Unknown extension: {}'.format(ext))
 920 |             data_struct = Struct(**model_data)
 921 |             
 922 |     def add_joints(vertices,joints, joint_ids = None):
 923 |       tip_ids = TIP_IDS['mano']
 924 |       dev = vertices.device
 925 |       if joint_ids is None:
 926 |           joint_ids = to_tensor(list(tip_ids.values()),
 927 |                                 dtype=torch.long).to(dev)
 928 |       extra_joints = torch.index_select(vertices, 1, joint_ids)
 929 |       joints = torch.cat([joints, extra_joints], dim=1)
 930 |       return joints
 931 |     
 932 |     data_struct.J_regressor = torch.from_numpy(data_struct.J_regressor.todense()).float()
 933 |     joints_predicted = lbs.vertices2joints(data_struct.J_regressor, torch.tensor(vertices).unsqueeze(0))
 934 |     joints_predicted = add_joints(torch.tensor(vertices).unsqueeze(0), joints_predicted)
 935 | 
 936 |     hand_camera_translation_torch = torch.tensor(camera_translation).unsqueeze(0).unsqueeze(0)
 937 |     # the below comes from renderer.py (ln: 400) to match how the actual hand mesh is
 938 |     # transformed before rendering
 939 |     rot_axis=[1,0,0]
 940 |     rot_angle= 180
 941 |     rot = trimesh.transformations.rotation_matrix(np.radians(rot_angle), rot_axis)
 942 |     joints_predicted_rotated = joints_predicted @ rot[:3, :3].T
 943 |     joints_predicted_translated = joints_predicted_rotated - hand_camera_translation_torch 
 944 | 
 945 |     return joints_predicted_translated
 946 | 
 947 | def joints_np_joint_meshes(joints, radius=.005, vc=colors['green']):
 948 |     joints = to_np(joints)
 949 |     if joints.ndim <3:
 950 |         joints = joints.reshape(1,-1,3)
 951 |     meshes = []
 952 |     for j in joints:
 953 |         joint_mesh = Mesh(vertices=j, radius=radius, vc=vc)
 954 |         meshes.append(joint_mesh)
 955 |     return  meshes
 956 | 
 957 | def get_joints_of_hand_mesh(mesh, vertices, camera_translation):
 958 |     joints = get_mano_hand_joints_from_vertices(vertices, camera_translation)
 959 |     joint_meshes = joints_np_joint_meshes(joints)
 960 |     joint_meshes[0].vertices = joint_meshes[0].vertices + mesh.vertices.mean(0) - joint_meshes[0].vertices.mean(0) # align joints with the hand mesh
 961 |     joints = joints.squeeze(0)
 962 |     joints = joints + mesh.vertices.mean(0) - torch.mean(joints, axis=0) # align joints with the hand mesh
 963 |     return joint_meshes, joints
 964 | 
 965 | def get_hand_keypoints_from_mano_model(joints, rgb_im=None, vizualize=False):
 966 | 
 967 |     joint_mesh = np.linalg.inv(T_OPENGL_TO_OPENCV) @ np.vstack((joints.T, np.ones((1, joints.shape[0]))))
 968 |     joint_mesh = joint_mesh[:3, :].T
 969 |     jm_X, jm_Y, jm_Z = joint_mesh[:, 0], joint_mesh[:, 1], joint_mesh[:, 2]
 970 |     x_pixel = (jm_X * INTRINSICS_HAMER_RENDERER[0, 0] / jm_Z) + INTRINSICS_HAMER_RENDERER[0, 2]
 971 |     y_pixel = (jm_Y * INTRINSICS_HAMER_RENDERER[1, 1] / jm_Z) + INTRINSICS_HAMER_RENDERER[1, 2]
 972 |     joint_mesh_2d = np.vstack((x_pixel, y_pixel)).T
 973 |     joint_mesh_2d = joint_mesh_2d.astype(int)
 974 |     middle_x, middle_y = joint_mesh_2d[MANO_HAND_IDS["middle_tip"]]
 975 |     index_finger_1_x, index_finger_1_y = joint_mesh_2d[MANO_HAND_IDS["index_tip"]]
 976 |     index_finger_2_x, index_finger_2_y = joint_mesh_2d[MANO_HAND_IDS["index_pip"]] 
 977 |     index_finger_3_x, index_finger_3_y = joint_mesh_2d[MANO_HAND_IDS["index_dip"]]
 978 |     thumb_1_x, thumb_1_y = joint_mesh_2d[MANO_HAND_IDS["thumb_tip"]]
 979 |     thumb_2_x, thumb_2_y = joint_mesh_2d[MANO_HAND_IDS["thumb_pip"]]
 980 |     thumb_3_x, thumb_3_y = joint_mesh_2d[MANO_HAND_IDS["thumb_dip"]]
 981 |     if vizualize:
 982 |         assert rgb_im is not None, "Must provide RGB to vizualize"
 983 | 
 984 |         joint_mesh_depth_im = point_cloud_to_depth_image(copy.deepcopy(joint_mesh), 
 985 |                                                       fx=INTRINSICS_HAMER_RENDERER[0, 0],
 986 |                                                       fy=INTRINSICS_HAMER_RENDERER[1, 1],
 987 |                                                       cx=INTRINSICS_HAMER_RENDERER[0, 2],
 988 |                                                       cy=INTRINSICS_HAMER_RENDERER[1, 2],
 989 |                                                       width=int(INTRINSICS_HAMER_RENDERER[0, 2] * 2),
 990 |                                                       height=int(INTRINSICS_HAMER_RENDERER[1, 2] * 2))
 991 |         joint_mesh_depth_im = np.asarray(joint_mesh_depth_im)[..., np.newaxis]
 992 |         plt.imshow(joint_mesh_depth_im)
 993 |         plt.show()
 994 |   
 995 |         plt.imshow(rgb_im)
 996 |         plt.scatter(index_finger_1_x, index_finger_1_y, color='blue', s=10)
 997 |         plt.scatter(index_finger_2_x, index_finger_2_y, color='orange', s=10)
 998 |         plt.scatter(index_finger_3_x, index_finger_3_y, color='purple', s=10)
 999 |         plt.scatter(thumb_1_x, thumb_1_y, color='blue', s=10)
1000 |         plt.scatter(thumb_2_x, thumb_2_y, color='orange', s=10)
1001 |         plt.scatter(thumb_3_x, thumb_3_y, color='purple', s=10)
1002 |         plt.show()    
1003 |         
1004 |     return middle_x, middle_y, index_finger_1_x, index_finger_1_y, index_finger_2_x, index_finger_2_y, \
1005 |         index_finger_3_x, index_finger_3_y, thumb_1_x, thumb_1_y, thumb_2_x, thumb_2_y, thumb_3_x, thumb_3_y 
1006 | 
1007 | 
1008 | def mesh_and_joints_to_world_metric_space(mesh, joints, T_mesh_to_live, scaling_rendered_to_live, live_image_pcd = None, vizualize=False):
1009 |     # Visualize alignment of partial point cloud in the world frame
1010 |     points_mesh = mesh.sample(10000)
1011 |     # points_mesh in camera coordinates
1012 |     points_mesh = np.linalg.inv(T_OPENGL_TO_OPENCV) @ np.vstack((points_mesh.T, np.ones((1, points_mesh.shape[0]))))
1013 |     points_mesh = points_mesh[:3, :].T
1014 |     # points_mesh = points_mesh * scaling_rendered_to_live
1015 |     points_mesh = apply_transformation(points_mesh, T_mesh_to_live)
1016 | 
1017 |     joint_mesh = np.linalg.inv(T_OPENGL_TO_OPENCV) @ np.vstack((joints.T, np.ones((1, joints.shape[0]))))
1018 |     joint_mesh = joint_mesh[:3, :].T
1019 |     # joint_mesh = joint_mesh * scaling_rendered_to_live
1020 |     joint_mesh = apply_transformation(joint_mesh, T_mesh_to_live)
1021 | 
1022 |     mesh.apply_transform(np.linalg.inv(T_OPENGL_TO_OPENCV)) # probably could have done this before points_mesh to save some lines, but yeah it is what it is
1023 |     mesh.apply_scale(scaling_rendered_to_live)
1024 |     mesh.apply_transform(T_mesh_to_live)
1025 |     mesh_o3d = o3d.geometry.TriangleMesh()
1026 |     mesh_o3d.vertices = o3d.utility.Vector3dVector(mesh.vertices)
1027 |     mesh_o3d.triangles = o3d.utility.Vector3iVector(mesh.faces)
1028 |     mesh_o3d.compute_vertex_normals()
1029 |     mesh_o3d.vertex_colors = o3d.utility.Vector3dVector(mesh.visual.vertex_colors[:, :3] / 255.0)
1030 | 
1031 |     if vizualize:
1032 |         assert live_image_pcd is not None, "Need to provide the live image point cloud to vizualize mesh and joints in live image space"
1033 | 
1034 | 
1035 |         pcd_mesh = o3d.geometry.PointCloud()
1036 |         pcd_mesh.points = o3d.utility.Vector3dVector(points_mesh)
1037 |         # pcd joints as speheres
1038 |         pcd_joints = o3d.geometry.TriangleMesh.create_coordinate_frame(size=.100, origin=joint_mesh[0])
1039 |         for j in joint_mesh:
1040 |             sphere = o3d.geometry.TriangleMesh.create_sphere(radius=.005)
1041 |             sphere.compute_vertex_normals()
1042 |             sphere.paint_uniform_color([1, 0, 0])
1043 |             sphere.translate(j)
1044 |             pcd_joints += sphere
1045 |         o3d.visualization.draw_geometries([pcd_mesh, pcd_joints, live_image_pcd, mesh_o3d])
1046 |     return points_mesh, joint_mesh, mesh_o3d
1047 | 
1048 | 
1049 | def get_gripper_pose_from_frame(rgb, depth, use_mediapipe_for_hand_kpts=False, vizualize=False, press_task=False, scale_depth_image=False):
1050 |     depth = np.array(depth).astype(np.float32)
1051 |     rgb_im, rgb_hand_only, rend_depth_front_view, det_out, hamer_output, hand_mesh_params, all_mesh_params = get_hand_and_rendered_depth(rgb)
1052 |     depth_im = depth
1053 |     human_mask = get_hand_mask_from_detectron(det_out)
1054 |     all_meshes, all_cameras, all_vertices = all_mesh_params[0], all_mesh_params[1], all_mesh_params[2]
1055 |     rgb_hand_only = rgb_hand_only[:,:,:3]
1056 |     joint_meshes, joints_coords = get_joints_of_hand_mesh(copy.deepcopy(all_meshes[0]), copy.deepcopy(all_vertices[0]), copy.deepcopy(all_cameras[0]))
1057 |     if use_mediapipe_for_hand_kpts: 
1058 |         middle_x, middle_y, index_finger_1_x, index_finger_1_y, index_finger_2_x, index_finger_2_y, index_finger_3_x, index_finger_3_y, thumb_1_x, thumb_1_y, thumb_2_x, thumb_2_y, thumb_3_x, thumb_3_y = get_hand_keypoints_from_mediapipe(rgb_im, viz_keypoints=vizualize)
1059 |     else:
1060 |         middle_x, middle_y, index_finger_1_x, index_finger_1_y, index_finger_2_x, index_finger_2_y, index_finger_3_x, index_finger_3_y, thumb_1_x, thumb_1_y, thumb_2_x, thumb_2_y, thumb_3_x, thumb_3_y = get_hand_keypoints_from_mano_model(joints_coords, rgb_im=rgb_im, vizualize=vizualize)
1061 |     scaling_factor = 1
1062 |     if scale_depth_image:
1063 |         # generally no need to use this
1064 |         d_val = -1
1065 |         base_value_finger = 'index' # index, middle, thumb
1066 |         base_val = copy.deepcopy(rend_depth_front_view[index_finger_1_y, index_finger_1_x])
1067 |         idx_x, idx_y = index_finger_1_x, index_finger_1_y
1068 |         if base_val == 0 or depth_im[idx_y, idx_x] == 0:
1069 |             base_value_finger = 'middle' # index, middle, thumb
1070 |             base_val = copy.deepcopy(rend_depth_front_view[middle_y, middle_x])
1071 |             idx_x, idx_y = middle_x, middle_y
1072 |             if base_val == 0 or depth_im[idx_y, idx_x] == 0:
1073 |                 base_value_finger = 'thumb'
1074 |                 base_val = copy.deepcopy(rend_depth_front_view[thumb_1_y, thumb_1_x])
1075 |                 idx_x, idx_y = thumb_1_x, thumb_1_y
1076 |         d_val = copy.deepcopy(depth_im[idx_y, idx_x])
1077 |         percentage_rend_depth_front_view = rend_depth_front_view / base_val
1078 |         scaled_depth_image = percentage_rend_depth_front_view * d_val
1079 |         scaling_factor = d_val / base_val
1080 |         print(f'Scaling factor {base_value_finger}: {scaling_factor}')
1081 |     else:
1082 |         scaled_depth_image = copy.deepcopy(rend_depth_front_view)
1083 |     hand_point_cloud, hand_point_cloud_full, image_point_cloud, hand_pcd_as_o3d, live_image_pcd_as_o3d, T_mesh_to_live = get_hand_pcd_in_scene_from_rendered_model(rgb_im, 
1084 |                                                                                                                                                depth_im, 
1085 |                                                                                                                                                rend_depth_front_view, 
1086 |                                                                                                                                                scaled_depth_image, 
1087 |                                                                                                                                                human_mask, 
1088 |                                                                                                                                                vizualize=vizualize) 
1089 |     mesh_live_np, joints_live, hand_mesh_in_live_metric_space = mesh_and_joints_to_world_metric_space(copy.deepcopy(all_meshes[0]), 
1090 |                                                                copy.deepcopy(joints_coords), 
1091 |                                                                T_mesh_to_live, 
1092 |                                                                scaling_factor, 
1093 |                                                                live_image_pcd=live_image_pcd_as_o3d, 
1094 |                                                                vizualize=vizualize)   
1095 |     gripper_pcd = np.load('assets/utils/gripper_point_cloud_dense.npy')
1096 |     gripper_pcd = gripper_pcd / 1000 # scale down by 1000
1097 |     gripper_pcd_as_o3d = o3d.geometry.PointCloud()
1098 |     gripper_pcd_as_o3d.points = o3d.utility.Vector3dVector(gripper_pcd)
1099 |     gripper_aligned_to_hand_pcd_as_o3d = align_gripper_to_hand(hand_point_cloud, 
1100 |                                                             hand_pcd_as_o3d, 
1101 |                                                             gripper_pcd, 
1102 |                                                             gripper_pcd_as_o3d, 
1103 |                                                             vizualize=vizualize)
1104 | 
1105 |     key_fingers_points = np.array([joints_live[MANO_HAND_IDS["index_tip"]],
1106 |                                 joints_live[MANO_HAND_IDS["index_dip"]],
1107 |                                 joints_live[MANO_HAND_IDS["index_pip"]],
1108 |                                 joints_live[MANO_HAND_IDS["thumb_tip"]], 
1109 |                                 joints_live[MANO_HAND_IDS["thumb_dip"]], 
1110 |                                 joints_live[MANO_HAND_IDS["thumb_pip"]], 
1111 |                                 joints_live[MANO_HAND_IDS["wrist"]],
1112 |                                 (joints_live[MANO_HAND_IDS["index_mcp"]] + joints_live[MANO_HAND_IDS["thumb_dip"]]) / 2])
1113 |     key_fingers_points = key_fingers_points.reshape(-1, 3)
1114 |     gripper_scaled_to_hand_pcd = copy.deepcopy(gripper_aligned_to_hand_pcd_as_o3d) # replace, so no scale applied. Comment out if scaling - although likely you don't need to
1115 |     gripper_scaled_to_hand_pcd.paint_uniform_color([0, 0, 1])
1116 |     bias_T  = np.eye(4)
1117 |     bias_T[2, 3] = 0.01 # NOTE: This is explicit to the robotiq CAD model gripper we had
1118 |     if not press_task:
1119 |         gripper_scaled_to_hand_pcd, gripper_pose, distance_hand_fingers = align_gripper_with_hand_fingers(gripper_scaled_to_hand_pcd, 
1120 |                                                                     np_to_o3d(mesh_live_np), 
1121 |                                                                     key_fingers_points, 
1122 |                                                                     gripper_aligned_to_hand_pcd_as_o3d,
1123 |                                                                     gripper_pcd_as_o3d, 
1124 |                                                                     use_only_thumb_keypoints=False,
1125 |                                                                     use_only_index_keypoints=False,
1126 |                                                                     rescale_gripper_to_hand_opening=False,
1127 |                                                                     rescale_hand_to_gripper_opening=False,
1128 |                                                                     bias_transformation=bias_T,
1129 |                                                                     vizualize=vizualize)
1130 |         gripper_opening_percent = np.min([distance_hand_fingers / DISTANCE_BETWEEN_GRIPPERS_FINGERS, 1])
1131 |     else:
1132 |         gripper_scaled_to_hand_pcd, gripper_pose = align_hand_to_gripper_press(gripper_scaled_to_hand_pcd, 
1133 |                                                                                 np.array([key_fingers_points[0], key_fingers_points[1], key_fingers_points[2]]), 
1134 |                                                                                 vizualize=vizualize,
1135 |                                                                                 bias_transformation=bias_T)
1136 |         gripper_opening_percent = 0
1137 |         
1138 |     return gripper_scaled_to_hand_pcd, live_image_pcd_as_o3d, gripper_pose, gripper_opening_percent, hand_mesh_in_live_metric_space, joints_live
1139 | 
1140 | 
1141 | 
1142 | 
1143 | 
1144 | gripper_poses = []
1145 | gripper_pcds = []
1146 | live_image_pcds = []
1147 | meshes = []
1148 | joint_actions = []
1149 | gripper_actions = []
1150 | if not LOAD_SCENE_DATA_FOR_PROCESSING:
1151 |     print("Processing frames...")
1152 |     for idx in tqdm.tqdm(range(0, hands_rgb.shape[0], FRAME_STEP)):
1153 |         try:
1154 |             rgb = hands_rgb[idx]
1155 |             depth = hands_depth[idx] / 1000
1156 |             gripper_scaled_to_hand_pcd, live_image_pcd_as_o3d, gripper_pose, gripper_opening_percent, hand_mesh, hand_joints_kpts = get_gripper_pose_from_frame(rgb, depth, vizualize=VIZ, press_task=IS_PRESS_TASK)
1157 |             gripper_poses.append(gripper_pose)
1158 |             gripper_pcds.append(gripper_scaled_to_hand_pcd)
1159 |             live_image_pcds.append(live_image_pcd_as_o3d)
1160 |             gripper_actions.append(gripper_opening_percent)
1161 |             meshes.append(hand_mesh)
1162 |             joint_actions.append(hand_joints_kpts)
1163 |             o3d.io.write_point_cloud(f'{SCENE_FILES_FOLDER}/gripper_pcd_{idx}.ply', gripper_scaled_to_hand_pcd)
1164 |             o3d.io.write_point_cloud(f'{SCENE_FILES_FOLDER}/live_image_pcd_{idx}.ply', live_image_pcd_as_o3d)
1165 |             o3d.io.write_triangle_mesh(f'{SCENE_FILES_FOLDER}/hand_mesh_{idx}.ply', hand_mesh)
1166 |             np.save(f'{SCENE_FILES_FOLDER}/gripper_pose_{idx}.npy', gripper_pose)
1167 |             np.save(f'{SCENE_FILES_FOLDER}/gripper_actions_{idx}.npy', gripper_actions)
1168 |             np.save(f'{SCENE_FILES_FOLDER}/hand_joints_kpts_{idx}.npy', hand_joints_kpts)
1169 |         except Exception as e:
1170 |             print(f'Error in frame {idx}: {e}')
1171 |             continue
1172 |     print("Saving joint actions 3D...")
1173 |     joint_actions = np.asarray(joint_actions)
1174 |     joint_actions = np.array(joint_actions)
1175 |     np.save(f'{SCENE_FILES_FOLDER}/hand_joints_kpts_3d.npy', joint_actions)
1176 |     print("Projecting joint actions to 2D...")
1177 |     jm_X, jm_Y, jm_Z = joint_actions[:, :, 0], joint_actions[:, :, 1], joint_actions[:, :, 2]
1178 |     x_pixel = (jm_X * INTRINSICS_REAL_CAMERA[0, 0] / jm_Z) + INTRINSICS_REAL_CAMERA[0, 2]
1179 |     y_pixel = (jm_Y * INTRINSICS_REAL_CAMERA[1, 1] / jm_Z) + INTRINSICS_REAL_CAMERA[1, 2]
1180 |     joints_acts_2d = np.stack([x_pixel, y_pixel], axis=2)
1181 |     print("Saving joint actions 2D...")
1182 |     np.save(f'{SCENE_FILES_FOLDER}/hand_joints_kpts_2d.npy', joints_acts_2d)
1183 | 
1184 | 
1185 | else:
1186 |     print("Loading frames...")
1187 |     gripper_poses = []
1188 |     gripper_pcds = []
1189 |     live_image_pcds = []
1190 |     meshes = []
1191 |     for idx in tqdm.tqdm(range(0, hands_rgb.shape[0], FRAME_STEP)):
1192 |         gripper_pose = np.load(f'{SCENE_FILES_FOLDER}/gripper_pose_{idx}.npy')
1193 |         gripper_pcd = o3d.io.read_point_cloud(f'{SCENE_FILES_FOLDER}/gripper_pcd_{idx}.ply')
1194 |         live_image_pcd = o3d.io.read_point_cloud(f'{SCENE_FILES_FOLDER}/live_image_pcd_{idx}.ply')
1195 |         hand_mesh = o3d.io.read_triangle_mesh(f'{SCENE_FILES_FOLDER}/hand_mesh_{idx}.ply')
1196 |         hand_joints_kpts = np.load(f'{SCENE_FILES_FOLDER}/hand_joints_kpts_{idx}.npy')
1197 |         gripper_poses.append(gripper_pose)
1198 |         gripper_pcds.append(gripper_pcd)
1199 |         live_image_pcds.append(live_image_pcd)
1200 |         meshes.append(hand_mesh)
1201 | 
1202 | print("Interpolate trajectories...")
1203 | # smooth gripper_poses trajectory
1204 | interpolated_gripper_poses = interpolate_pose_sequence(gripper_poses, FRAME_STEP)
1205 | np.save(f'{SCENE_FILES_FOLDER}/interpolated_gripper_poses.npy', interpolated_gripper_poses)
1206 | print("Print saved interpolated trajectory (no smoothing)")
1207 | print("Extracting linear component to smooth...")
1208 | interpolated_gripper_poses_xyz = []
1209 | for pose in tqdm.tqdm(interpolated_gripper_poses):
1210 |     interpolated_gripper_poses_xyz.append(pose[:3, 3])
1211 | interpolated_gripper_poses_xyz = np.array(interpolated_gripper_poses_xyz)
1212 | 
1213 | window_size = FRAME_STEP
1214 | print(f'Print weighted average smoothing with window: {window_size}...')
1215 | interpolated_gripper_poses_filtered = copy.deepcopy(interpolated_gripper_poses_xyz)
1216 | for i in tqdm.tqdm(range(window_size, interpolated_gripper_poses_xyz.shape[0] - window_size)):
1217 |     interpolated_gripper_poses_filtered[i] = np.mean(interpolated_gripper_poses_xyz[i - window_size: i + window_size], axis=0)
1218 | interpolated_gripper_poses_filtered = np.array(interpolated_gripper_poses_filtered)
1219 | 
1220 | # for i in tqdm.tqdm(range(interpolated_gripper_poses.shape[0])):
1221 | #     interpolated_gripper_poses[i, :3, 3] = interpolated_gripper_poses_filtered[i]
1222 | # np.save(f'{scene_files_folder_name}/interpolated_gripper_poses_filtered.npy', interpolated_gripper_poses)
1223 | print("Saved interpolated + smoothed trajectory")
1224 | gripper_pcd0  = gripper_pcds[0]
1225 | gripper_pose0 = gripper_poses[0]
1226 | gripper_pose0_inv = np.linalg.inv(gripper_pose0)
1227 | 
1228 | print("Generating gripper pcds for interpolated poses...")
1229 | interpolated_gripper_poses_coord_frames = []
1230 | interpolated_gripper_pcds = []
1231 | for pose in tqdm.tqdm(interpolated_gripper_poses):
1232 |     coord_frame = o3d.geometry.TriangleMesh.create_coordinate_frame(size=.100, origin=[0, 0, 0])
1233 |     coord_frame.transform(pose)
1234 |     interpolated_gripper_poses_coord_frames.append(coord_frame)
1235 |     interpolated_gripper_pcds.append(copy.deepcopy(gripper_pcd0).transform(pose @ gripper_pose0_inv))
1236 | 
1237 | print("Projecting target gripper pcds (waypoints) to 2D image...")
1238 | gripper_projections = []
1239 | for i in tqdm.tqdm(range(len(gripper_pcds))):
1240 |     gripper_pcd = gripper_pcds[i]
1241 |     gripper_pcd_np = np.asarray(gripper_pcd.points)
1242 |     gripper_pcd_depth_im = point_cloud_to_depth_image(gripper_pcd_np,
1243 |                                                       INTRINSICS_REAL_CAMERA[0, 0],
1244 |                                                       INTRINSICS_REAL_CAMERA[1, 1],
1245 |                                                       INTRINSICS_REAL_CAMERA[0, 2],
1246 |                                                       INTRINSICS_REAL_CAMERA[1, 2],
1247 |                                                       width=IM_WIDTH,
1248 |                                                       height=IM_HEIGHT)
1249 |     gripper_projections.append(gripper_pcd_depth_im)
1250 | 
1251 | print("Projecting interpolated gripper pcds to 2D image...")
1252 | interpolated_gripper_projections = []
1253 | for i in tqdm.tqdm(range(len(interpolated_gripper_pcds))):
1254 |     gripper_pcd = interpolated_gripper_pcds[i]
1255 |     gripper_pcd_np = np.asarray(gripper_pcd.points)
1256 |     gripper_pcd_depth_im = point_cloud_to_depth_image(gripper_pcd_np,
1257 |                                                       INTRINSICS_REAL_CAMERA[0, 0],
1258 |                                                       INTRINSICS_REAL_CAMERA[1, 1],
1259 |                                                       INTRINSICS_REAL_CAMERA[0, 2],
1260 |                                                       INTRINSICS_REAL_CAMERA[1, 2],
1261 |                                                       width=IM_WIDTH,
1262 |                                                       height=IM_HEIGHT)
1263 |     interpolated_gripper_projections.append(gripper_pcd_depth_im)
1264 | 
1265 | print("Generating video with projected pcds...")
1266 | cnt = 0
1267 | cnt_gripper_proj = 0
1268 | video_ims = []
1269 | for i in range(0, hands_rgb.shape[0]):
1270 |     gripper_proj = interpolated_gripper_projections[cnt][:, :, np.newaxis] 
1271 |     gripper_proj[gripper_proj > 0] = 1
1272 |     im = .5 * gripper_proj + 1 * hands_rgb[i]/255 + 0.5 * gripper_projections[cnt_gripper_proj][:, :, np.newaxis].repeat(3, axis=2) * np.array([0, 1, 0])
1273 |     if cnt % FRAME_STEP == 0:
1274 |         cnt_gripper_proj += 1
1275 |     im = im[..., ::-1]
1276 |     video_ims.append(im)
1277 |     cnt += 1
1278 |     if cnt >= len(interpolated_gripper_projections) or cnt_gripper_proj >= len(gripper_projections):
1279 |         break
1280 | 
1281 | print("Saving video smoothed interpolated...")
1282 | # save video as mp4 
1283 | height, width, layers = video_ims[0].shape
1284 | size = (width,height)
1285 | out = cv2.VideoWriter(f'{SCENE_FILES_FOLDER}/../gripper_overlayed_video.mp4', cv2.VideoWriter_fourcc(*'DIVX'), 10, size)
1286 | for i in range(len(video_ims)):
1287 |     im = (video_ims[i] * 255).astype(np.uint8)
1288 |     out.write(im)
1289 | 
1290 | out.release()
1291 | 
1292 | 
1293 | 


--------------------------------------------------------------------------------
/vizualizer.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "code",
  5 |    "execution_count": 1,
  6 |    "metadata": {},
  7 |    "outputs": [],
  8 |    "source": [
  9 |     "import open3d as o3d\n",
 10 |     "import numpy as np\n",
 11 |     "import matplotlib.pyplot as plt\n",
 12 |     "from util_functions import *\n",
 13 |     "from global_vars import *\n"
 14 |    ]
 15 |   },
 16 |   {
 17 |    "cell_type": "code",
 18 |    "execution_count": null,
 19 |    "metadata": {},
 20 |    "outputs": [],
 21 |    "source": [
 22 |     "# load rgbs, depths and hand joint actions\n",
 23 |     "path = 'assets/grasp_fanta/'\n",
 24 |     "rgbs = np.load(f'{path}rgbs.npy')\n",
 25 |     "depths = np.load(f'{path}depths.npy')"
 26 |    ]
 27 |   },
 28 |   {
 29 |    "cell_type": "code",
 30 |    "execution_count": 13,
 31 |    "metadata": {},
 32 |    "outputs": [],
 33 |    "source": [
 34 |     "\n",
 35 |     "MANO_HAND_IDS = {\"wrist\": 0,        \"index_mcp\": 1,     \"index_pip\": 2, \n",
 36 |     "                 \"index_dip\": 3,    \"middle_mcp\": 4,    \"middle_pip\": 5, \n",
 37 |     "                 \"middle_dip\": 6,   \"pinkie_mcp\": 7,    \"pinkie_pip\": 8, \n",
 38 |     "                 \"pinkie_dip\": 9,   \"ring_mcp\": 10,     \"ring_pip\": 11, \n",
 39 |     "                 \"ring_dip\": 12,    \"thumb_mcp\": 13,    \"thumb_pip\": 14, \n",
 40 |     "                 \"thumb_dip\": 15,   \"thumb_tip\": 16,    \"index_tip\": 17, \n",
 41 |     "                 \"middle_tip\": 18,  \"ring_tip\": 19,     \"pinky_tip\": 20}\n",
 42 |     "\n",
 43 |     "\"\"\"\n",
 44 |     "the below is what you likely need. you can vizualize them\n",
 45 |     "at the following cell.\"\"\"\n",
 46 |     "actions = np.load(f'{path}/scene_files/hand_joints_kpts_3d.npy')\n",
 47 |     "rgbs = rgbs[:len(actions)] # sometimes some actions are missing because they were not extracted during processing as for some frames there were no hand present and so they are automatically discarded\n",
 48 |     "depths = depths[:len(actions)]"
 49 |    ]
 50 |   },
 51 |   {
 52 |    "cell_type": "code",
 53 |    "execution_count": 14,
 54 |    "metadata": {},
 55 |    "outputs": [
 56 |     {
 57 |      "name": "stderr",
 58 |      "output_type": "stream",
 59 |      "text": [
 60 |       "RPly: Unable to open file\n",
 61 |       "RPly: Unable to open file\n",
 62 |       "RPly: Unable to open file\n"
 63 |      ]
 64 |     },
 65 |     {
 66 |      "name": "stdout",
 67 |      "output_type": "stream",
 68 |      "text": [
 69 |       "\u001b[1;33m[Open3D WARNING] Read PLY failed: unable to open file: assets/pick_up_phone//scene_files/live_image_pcd_12.ply\u001b[0;m\n",
 70 |       "\u001b[1;33m[Open3D WARNING] Read PLY failed: unable to open file: assets/pick_up_phone//scene_files/hand_mesh_12.ply\u001b[0;m\n",
 71 |       "\u001b[1;33m[Open3D WARNING] Read PLY failed: unable to open file: assets/pick_up_phone//scene_files/gripper_pcd_12.ply\u001b[0;m\n"
 72 |      ]
 73 |     }
 74 |    ],
 75 |    "source": [
 76 |     "\n",
 77 |     "# load scene files\n",
 78 |     "\n",
 79 |     "depth_pcds = []\n",
 80 |     "hand_mesh_pcds = []\n",
 81 |     "gripper_pcds = []\n",
 82 |     "sphere_viz_actions = []\n",
 83 |     "depth_pcds_no_stabilization = []\n",
 84 |     "hand_joints = []\n",
 85 |     "for i in range(rgbs.shape[0]):\n",
 86 |     "\n",
 87 |     "    frame = i\n",
 88 |     "    depth_pcd = o3d.io.read_point_cloud(f\"{path}/scene_files/live_image_pcd_{frame}.ply\")\n",
 89 |     "    # remove far away depth for visualization\n",
 90 |     "    depth_pcd_np = np.asarray(depth_pcd.points)\n",
 91 |     "    depth_pcd_cols = np.asarray(depth_pcd.colors)\n",
 92 |     "    depth_pcd_cols = depth_pcd_cols[depth_pcd_np[:, 2] < 1]\n",
 93 |     "    depth_pcd_np = depth_pcd_np[depth_pcd_np[:, 2] < 1]\n",
 94 |     "    depth_pcd.points = o3d.utility.Vector3dVector(depth_pcd_np)\n",
 95 |     "    depth_pcd.colors = o3d.utility.Vector3dVector(depth_pcd_cols)\n",
 96 |     "\n",
 97 |     "    hand_mesh_pcd = o3d.io.read_triangle_mesh(f\"{path}/scene_files/hand_mesh_{frame}.ply\")\n",
 98 |     "    gripper_pcd = o3d.io.read_point_cloud(f\"{path}/scene_files/gripper_pcd_{frame}.ply\")\n",
 99 |     "    depth_pcds.append(depth_pcd)\n",
100 |     "    hand_mesh_pcds.append(hand_mesh_pcd)\n",
101 |     "    gripper_pcds.append(gripper_pcd)\n",
102 |     "    action = actions[i]\n",
103 |     "    sphere_viz_actions = []\n",
104 |     "    for a in actions[i]:\n",
105 |     "        sphere = o3d.geometry.TriangleMesh.create_sphere(radius=0.005)\n",
106 |     "        sphere.compute_vertex_normals()\n",
107 |     "        sphere.paint_uniform_color([0, 0, 1])\n",
108 |     "        sphere.translate(a)\n",
109 |     "        sphere_viz_actions.append(sphere)\n",
110 |     "        sphere_viz_actions.append(sphere)\n",
111 |     "    hand_joints.append(sphere_viz_actions)\n",
112 |     "\n",
113 |     "\n",
114 |     "\n"
115 |    ]
116 |   },
117 |   {
118 |    "cell_type": "code",
119 |    "execution_count": 15,
120 |    "metadata": {},
121 |    "outputs": [],
122 |    "source": [
123 |     "idx = 10 # pick the index\n",
124 |     "\n",
125 |     "o3d.visualization.draw_geometries([depth_pcds[idx], gripper_pcds[idx], hand_mesh_pcds[idx]]) \n"
126 |    ]
127 |   },
128 |   {
129 |    "cell_type": "code",
130 |    "execution_count": null,
131 |    "metadata": {},
132 |    "outputs": [],
133 |    "source": []
134 |   }
135 |  ],
136 |  "metadata": {
137 |   "kernelspec": {
138 |    "display_name": "base",
139 |    "language": "python",
140 |    "name": "python3"
141 |   },
142 |   "language_info": {
143 |    "codemirror_mode": {
144 |     "name": "ipython",
145 |     "version": 3
146 |    },
147 |    "file_extension": ".py",
148 |    "mimetype": "text/x-python",
149 |    "name": "python",
150 |    "nbconvert_exporter": "python",
151 |    "pygments_lexer": "ipython3",
152 |    "version": "3.11.5"
153 |   }
154 |  },
155 |  "nbformat": 4,
156 |  "nbformat_minor": 2
157 | }
158 | 


--------------------------------------------------------------------------------