├── .gitignore ├── README.md ├── data ├── .gitkeep └── demo │ ├── depths │ ├── img_001000.jpg.exr │ ├── img_001796.jpg.exr │ └── img_002376.jpg.exr │ ├── imgs │ ├── img_001000.jpg │ ├── img_001796.jpg │ └── img_002376.jpg │ └── out │ └── .gitkeep ├── dataset ├── __init__.py ├── demo_dataset.py ├── real_depth_utils.py └── test_dataset.py ├── demo.py ├── evaluation ├── __init__.py └── python_evaluate_3d_our_dataset.py ├── experiments └── sceneego │ └── test │ └── sceneego.yaml ├── models ├── .gitkeep └── sceneego │ └── checkpoints │ └── .gitkeep ├── network ├── __init__.py ├── pose_resnet.py ├── v2v.py └── voxel_net_depth.py ├── requirements.txt ├── resources └── Wang_CVPR_2023.gif ├── test.py ├── utils ├── __init__.py ├── calculate_errors.py ├── cfg.py ├── data_transforms.py ├── depth2pointcloud.py ├── fisheye │ ├── FishEyeCalibrated.py │ ├── FishEyeEquisolid.py │ ├── __init__.py │ ├── fisheye.calibration.json │ ├── fisheye.calibration_05_08.json │ └── mean3D.mat ├── get_predict.py ├── img.py ├── misc.py ├── multiview.py ├── op.py ├── pose_visualization_utils.py ├── rigid_transform_with_scale.py ├── skeleton.py └── volumetric.py └── visualize.py /.gitignore: -------------------------------------------------------------------------------- 1 | .idea 2 | 3 | 4 | *.tar 5 | 6 | *.ply 7 | 8 | *.pkl -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # SceneEgo 2 | 3 | Official implementation of paper: 4 | 5 | **Scene-aware Egocentric 3D Human Pose Estimation** 6 | 7 | *Jian Wang, Diogo Luvizon, Weipeng Xu, Lingjie Liu, Kripasindhu Sarkar, Christian Theobalt* 8 | 9 | *CVPR 2023* 10 | 11 | [[Project Page](https://people.mpi-inf.mpg.de/~jianwang/projects/sceneego/)] <---- The dataset link in the project page might have expired. 12 | 13 | [[SceneEgo Datasets (Train and Test)](https://edmond.mpg.de/dataset.xhtml?persistentId=doi:10.17617/3.VCIHDO)] 14 | 15 | 16 | 17 | [[EgoGTA](https://edmond.mpg.de/dataset.xhtml?persistentId=doi:10.17617/3.MYZMVZ)] [[EgoPW-Scene](https://edmond.mpg.de/dataset.xhtml?persistentId=doi:10.17617/3.EAFCFH)] 18 | 19 | ![Demo image](./resources/Wang_CVPR_2023.gif) 20 | 21 | ### Annotation format in Test dataset 22 | 23 | The annotation of the dataset is saved in "annotation.pkl" of each sequence. Load the pickle file with: 24 | 25 | ```python 26 | with open('annotation.pkl', 'rb') as f: 27 | data = pickle.load(f) 28 | print(data[0].keys()) 29 | ``` 30 | The data is a Python list, each item is a Python dict containing the annotations: 31 | - ext_id: the annotation id of external multiview mocap system; 32 | - calib_board_pose: the 6d pose of the calibration board on the head; 33 | - ego_pose_gt: the ground truth human body pose under the egocentric camera coordinate system, the joint sequence is: Neck, Right Shoulder, Right Elbow, Right Wrist, Left Shoulder, Left Elbow, Left Wrist, Right Hip, Right Knee, Right Ankle, Right Toe, Left Hip, Left Knee, Left Ankle, Left Toe; 34 | - ext_pose_gt: the human pose ground truth in the mocap system coordinate; 35 | - image_name: name of image under directory "imgs"; 36 | - ego_camera_matrix: the 6d pose of the egocentric camera on the head. 
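For convenience, the joint order above can be turned into an index lookup. The sketch below is illustrative rather than part of the repository; it assumes each `ego_pose_gt` entry is an array of shape (15, 3) following the joint order listed above, and that some frames may have no ground-truth pose (handled the same way in `dataset/test_dataset.py`):

```python
# Illustrative only: index the documented joint order (assumes pose arrays of shape (15, 3)).
JOINT_NAMES = [
    'Neck', 'Right Shoulder', 'Right Elbow', 'Right Wrist',
    'Left Shoulder', 'Left Elbow', 'Left Wrist',
    'Right Hip', 'Right Knee', 'Right Ankle', 'Right Toe',
    'Left Hip', 'Left Knee', 'Left Ankle', 'Left Toe',
]

for item in data:                      # `data` loaded from annotation.pkl as shown above
    pose = item['ego_pose_gt']
    if pose is None:                   # frames without ground-truth pose are skipped
        continue
    neck = pose[JOINT_NAMES.index('Neck')]
    print(item['image_name'], neck)    # 3D neck position in the egocentric camera frame
```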
37 | 38 | The id of the egocentric camera can also be obtained with the synchronization file with: 39 | ```python 40 | 41 | with open('syn.json', 'r') as f: 42 | syn_data = json.load(f) 43 | 44 | ego_start_frame = syn_data['ego'] 45 | ext_start_frame = syn_data['ext'] 46 | ego_id = ext_id - ext_start_frame + ego_start_frame 47 | egocentric_image_name = "img_%06d.jpg" % ego_id 48 | ``` 49 | 50 | ### Install 51 | 52 | 1. Create a new anaconda environment 53 | 54 | ```shell 55 | conda create -n sceneego python=3.9 56 | 57 | conda activate sceneego 58 | ``` 59 | 60 | 2. Install pytorch 1.13.1 from https://pytorch.org/get-started/previous-versions/ 61 | 62 | 3. Install other dependencies 63 | ```shell 64 | pip install -r requirements.txt 65 | ``` 66 | ### Run the demo 67 | 68 | 1. Download [pre-trained pose estimation model](https://nextcloud.mpi-klsb.mpg.de/index.php/s/DGB6XKEPwwQbmTi) and put it under ```models/sceneego/checkpoints``` 69 | 70 | 2. run: 71 | ```shell 72 | python demo.py --config experiments/sceneego/test/sceneego.yaml --img_dir data/demo/imgs --depth_dir data/demo/depths --output_dir data/demo/out --vis True 73 | ``` 74 | The result will be shown with the open3d visualizer and the predicted pose is saved at ```data/demo/out```. 75 | 76 | 3. The predicted pose is saved as the pkl file (e.g. ```img_001000.jpg.pkl```). To visualize the predicted result, run: 77 | ```shell 78 | python visualize.py --img_path data/demo/imgs/img_001000.jpg --depth_path data/demo/depths/img_001000.jpg.exr --pose_path data/demo/out/img_001000.jpg.pkl 79 | ``` 80 | The result will be shown with the open3d visualizer. 81 | 82 | ### Test on your own dataset 83 | If you want to test on your own dataset, after obtaining egocentric frames, you need to: 84 | 85 | 1. Run the egocentric human body segmentation network to get the human body segmentation for each frame: 86 | 87 | See repo: [Egocentric Human Body Segmentation](https://github.com/yt4766269/EgocentricHumanBodySeg) 88 | 89 | 3. Run the depth estimator to get the scene depth map for each frame: 90 | 91 | See repo: [Egocentric Depth Estimator](https://github.com/yt4766269/EgocentricDepthEstimator) 92 | 93 | 94 | ### Citation 95 | 96 | If you find this work or code is helpful in your research, please cite: 97 | ```` 98 | @inproceedings{wang2023scene, 99 | title={Scene-aware Egocentric 3D Human Pose Estimation}, 100 | author={Wang, Jian and Luvizon, Diogo and Xu, Weipeng and Liu, Lingjie and Sarkar, Kripasindhu and Theobalt, Christian}, 101 | booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, 102 | pages={13031--13040}, 103 | year={2023} 104 | } 105 | ```` 106 | 107 | [//]: # (### Test on real-world dataset) 108 | 109 | [//]: # () 110 | [//]: # (1. Download [pre-trained pose estimation model](https://nextcloud.mpi-klsb.mpg.de/index.php/s/DGB6XKEPwwQbmTi) and put it under ```models/sceneego/checkpoints```) 111 | 112 | [//]: # () 113 | [//]: # () 114 | [//]: # (2. Download the test dataset from to ```data/sceneego```) 115 | 116 | [//]: # () 117 | [//]: # (3. 
run:) 118 | 119 | [//]: # (```shell) 120 | 121 | [//]: # (python test.py --data_path data/sceneego) 122 | 123 | [//]: # (```) 124 | 125 | 126 | 127 | 128 | 129 | 130 | -------------------------------------------------------------------------------- /data/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/data/.gitkeep -------------------------------------------------------------------------------- /data/demo/depths/img_001000.jpg.exr: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/data/demo/depths/img_001000.jpg.exr -------------------------------------------------------------------------------- /data/demo/depths/img_001796.jpg.exr: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/data/demo/depths/img_001796.jpg.exr -------------------------------------------------------------------------------- /data/demo/depths/img_002376.jpg.exr: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/data/demo/depths/img_002376.jpg.exr -------------------------------------------------------------------------------- /data/demo/imgs/img_001000.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/data/demo/imgs/img_001000.jpg -------------------------------------------------------------------------------- /data/demo/imgs/img_001796.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/data/demo/imgs/img_001796.jpg -------------------------------------------------------------------------------- /data/demo/imgs/img_002376.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/data/demo/imgs/img_002376.jpg -------------------------------------------------------------------------------- /data/demo/out/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/data/demo/out/.gitkeep -------------------------------------------------------------------------------- /dataset/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/dataset/__init__.py -------------------------------------------------------------------------------- /dataset/demo_dataset.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import pickle 4 | import sys 5 | 6 | import cv2 7 | import numpy as np 8 | import torch 9 | from torch.utils.data import Dataset 10 | 11 | # import utils.data_transforms as transforms 12 | from utils.data_transforms import Normalize, ToTensor 13 | 14 | from dataset.real_depth_utils import 
depth_map_to_voxel 15 | from utils.fisheye.FishEyeCalibrated import FishEyeCameraCalibrated 16 | 17 | 18 | class DemoDataset(Dataset): 19 | 20 | def calculated_ray_direction(self, image_width, image_height): 21 | points = np.zeros(shape=(image_width, image_height, 2)) 22 | x_range = np.array(range(image_width)) 23 | y_range = np.array(range(image_height)) 24 | points[:, :, 0] = np.add(points[:, :, 0].transpose(), x_range).transpose() 25 | points[:, :, 1] = np.add(points[:, :, 1], y_range) 26 | points = points.reshape((-1, 2)) 27 | ray = self.camera_model.camera2world_ray(points) 28 | return ray 29 | 30 | def __init__(self, config, img_dir, depth_dir, voxel_output=False, img_mean=(0.485, 0.456, 0.406), 31 | img_std=(0.229, 0.224, 0.225)): 32 | self.img_dir = img_dir 33 | self.depth_dir = depth_dir 34 | self.voxel_output = voxel_output 35 | 36 | self.normalize = Normalize(mean=img_mean, std=img_std) 37 | self.to_tensor = ToTensor() 38 | 39 | self.img_size = config.image_shape 40 | 41 | self.camera_model_path = config.dataset.camera_calibration_path 42 | self.camera_model = FishEyeCameraCalibrated(calibration_file_path=self.camera_model_path) 43 | 44 | self.ray = self.calculated_ray_direction(config.dataset.image_width, config.dataset.image_height) 45 | 46 | self.data_list = self.get_input_data(self.img_dir, self.depth_dir) 47 | 48 | self.cuboid_side = config.model.cuboid_side 49 | self.volume_size = config.model.volume_size 50 | 51 | def get_input_data(self, img_dir, depth_dir): 52 | print("start loading test file") 53 | data_list = [] 54 | 55 | for img_name in os.listdir(img_dir): 56 | img_path = os.path.join(img_dir, img_name) 57 | depth_path = os.path.join(depth_dir, f'{img_name}.exr') 58 | if not os.path.exists(depth_path): 59 | raise Exception(f"The depth map {depth_path} does not exist!") 60 | 61 | data_list.append({'img_path': img_path, 'depth_path': depth_path}) 62 | return data_list 63 | 64 | def __len__(self): 65 | return len(self.data_list) 66 | 67 | def __getitem__(self, index): 68 | img_path = self.data_list[index]['img_path'] 69 | depth_path = self.data_list[index]['depth_path'] 70 | 71 | 72 | raw_img = cv2.imread(img_path) 73 | raw_img = raw_img[:, 128: -128, :] 74 | 75 | # data augmentation 76 | img = cv2.resize(raw_img, dsize=(256, 256)) / 255. 
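        # preprocessing recap: the 1280x1024 frame is center-cropped to 1024x1024
        # (128 px removed on each side), resized to 256x256 and scaled to [0, 1];
        # the channel flip below converts OpenCV's BGR layout to RGB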
77 | 78 | img_rgb = img[:, :, ::-1] 79 | img_rgb = np.ascontiguousarray(img_rgb) 80 | 81 | img_torch = self.normalize(img) 82 | img_torch = self.to_tensor(img_torch) 83 | img_rgb = self.normalize(img_rgb) 84 | img_rgb_torch = self.to_tensor(img_rgb) 85 | 86 | depth_map = cv2.imread(depth_path, cv2.IMREAD_ANYCOLOR | cv2.IMREAD_ANYDEPTH) 87 | if depth_map.shape[0] != 1024 or depth_map.shape[1] != 1280: 88 | depth_map = cv2.resize(depth_map, (1280, 1024), interpolation=cv2.INTER_NEAREST) 89 | if len(depth_map.shape) == 3: 90 | depth_map = depth_map[:, :, 0] 91 | depth_map[depth_map > 10] = 10 92 | 93 | if self.voxel_output is True: 94 | depth_scene_info = depth_map_to_voxel(self.ray, depth_map, self.cuboid_side, self.volume_size) 95 | else: 96 | depth_scene_info = torch.from_numpy(depth_map).float() 97 | 98 | return img_torch, img_rgb_torch, depth_scene_info, img_path 99 | -------------------------------------------------------------------------------- /dataset/real_depth_utils.py: -------------------------------------------------------------------------------- 1 | import json 2 | import numpy as np 3 | from copy import copy 4 | import torch 5 | 6 | def calcualate_depth_scale(depth_scale_json_file, log_err=False): 7 | with open(depth_scale_json_file, 'r') as f: 8 | depth_scale_data_list = json.load(f) 9 | # print(depth_scale_data_list) 10 | 11 | scale_list = [] 12 | for scale_data in depth_scale_data_list: 13 | x1 = scale_data['x1'] 14 | x2 = scale_data['x2'] 15 | 16 | x1 = np.asarray(x1) 17 | x2 = np.asarray(x2) 18 | 19 | distance = np.linalg.norm(x2 - x1) 20 | # print(distance) 21 | scale = scale_data['real'] / distance 22 | scale_list.append(scale) 23 | if log_err: 24 | print(scale_list) 25 | print(np.std(scale_list) / np.average(scale_list)) 26 | # print(np.average(scale_list)) 27 | return np.average(scale_list) 28 | 29 | def depth_map_to_voxel(ray, depth, cuboid_side, volume_size): 30 | # directly multiply the depth on the pre-calculated rays 31 | depth_img_flat = depth.T.reshape((-1)) 32 | point_cloud = ray.T * depth_img_flat 33 | point_cloud = point_cloud.T 34 | 35 | # import open3d 36 | # point_cloud_vis = open3d.geometry.PointCloud() 37 | # point_cloud_vis.points = open3d.utility.Vector3dVector(point_cloud) 38 | # coord = open3d.geometry.TriangleMesh.create_coordinate_frame() 39 | # open3d.visualization.draw_geometries([point_cloud_vis, coord]) 40 | 41 | # point cloud to voxel 42 | voxel_torch = point_cloud_to_voxel_pytorch(point_cloud, cuboid_side, volume_size) 43 | return voxel_torch 44 | 45 | def point_cloud_to_voxel_pytorch(point_cloud, cuboid_side, volume_size): 46 | scene_point_cloud_local = copy(point_cloud) 47 | scene_point_cloud_local[:, 0] = (scene_point_cloud_local[:, 48 | 0] + cuboid_side / 2) * volume_size / cuboid_side 49 | scene_point_cloud_local[:, 1] = (scene_point_cloud_local[:, 50 | 1] + cuboid_side / 2) * volume_size / cuboid_side 51 | scene_point_cloud_local[:, 2] = (scene_point_cloud_local[:, 2]) * volume_size / cuboid_side 52 | 53 | scene_point_cloud_local = np.round_(scene_point_cloud_local) 54 | good_indices = np.logical_and(volume_size-1 >= scene_point_cloud_local, scene_point_cloud_local >= 0) 55 | good_indices = np.all(good_indices, axis=1) 56 | scene_point_cloud_local = scene_point_cloud_local[good_indices] 57 | # scene_point_cloud_local = np.clip(scene_point_cloud_local, a_min=0, a_max=self.volume_size - 1).astype(np.int) 58 | voxel_torch = torch.zeros(size=(volume_size, volume_size, volume_size)) 59 | voxel_torch[scene_point_cloud_local.T] = 1 60 | 
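    # voxel_torch is a binary occupancy grid of shape (volume_size, volume_size, volume_size):
    # voxels hit by at least one scene point are set to 1, all others remain 0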
return voxel_torch 61 | 62 | if __name__ == '__main__': 63 | json_file = r'\\winfs-inf\CT\EgoMocap\work\EgoBodyInContext\sfm_test_data\jian3\scale.json' 64 | result = calcualate_depth_scale(json_file, log_err=True) 65 | print(result) -------------------------------------------------------------------------------- /dataset/test_dataset.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import pickle 4 | import sys 5 | 6 | import cv2 7 | import numpy as np 8 | import torch 9 | from torch.utils.data import Dataset 10 | 11 | # import utils.data_transforms as transforms 12 | from utils.data_transforms import Normalize, ToTensor 13 | 14 | from utils.calculate_errors import align_skeleton, calculate_error 15 | from dataset.real_depth_utils import depth_map_to_voxel 16 | from utils.fisheye.FishEyeCalibrated import FishEyeCameraCalibrated 17 | 18 | 19 | class TestDataset(Dataset): 20 | 21 | def calculated_ray_direction(self, image_width, image_height): 22 | points = np.zeros(shape=(image_width, image_height, 2)) 23 | x_range = np.array(range(image_width)) 24 | y_range = np.array(range(image_height)) 25 | points[:, :, 0] = np.add(points[:, :, 0].transpose(), x_range).transpose() 26 | points[:, :, 1] = np.add(points[:, :, 1], y_range) 27 | points = points.reshape((-1, 2)) 28 | ray = self.camera_model.camera2world_ray(points) 29 | return ray 30 | 31 | def __init__(self, config, root_dir, seq_name, estimated_depth_name=None, voxel_output=True, img_mean=(0.485, 0.456, 0.406), 32 | img_std=(0.229, 0.224, 0.225), with_one_depth=False): 33 | self.voxel_output = voxel_output 34 | self.estimated_depth_name = estimated_depth_name 35 | self.with_one_depth = with_one_depth # only for visualization 36 | 37 | self.normalize = Normalize(mean=img_mean, std=img_std) 38 | self.to_tensor = ToTensor() 39 | 40 | self.img_size = config.image_shape 41 | 42 | self.camera_model_path = config.dataset.camera_calibration_path 43 | self.camera_model = FishEyeCameraCalibrated(calibration_file_path=self.camera_model_path) 44 | 45 | self.ray = self.calculated_ray_direction(config.dataset.image_width, config.dataset.image_height) 46 | 47 | self.image_path_list, self.gt_pose_list, self.depth_map_list = self.get_gt_data(root_dir, seq_name) 48 | 49 | assert len(self.image_path_list) == len(self.gt_pose_list) and len(self.gt_pose_list) == len( 50 | self.depth_map_list) 51 | 52 | self.cuboid_side = config.model.cuboid_side 53 | self.volume_size = config.model.volume_size 54 | 55 | def get_gt_data(self, root_dir, seq_name): 56 | print("start loading test file") 57 | base_path = os.path.join(root_dir, seq_name) 58 | 59 | img_data_path = os.path.join(base_path, 'imgs') 60 | gt_path = os.path.join(base_path, 'local_pose_gt.pkl') 61 | if self.estimated_depth_name is not None: 62 | depth_path = os.path.join(base_path, self.estimated_depth_name) 63 | else: 64 | depth_path = os.path.join(base_path, 'rendered', 'depths') 65 | syn_path = os.path.join(base_path, 'syn.json') 66 | 67 | with open(syn_path, 'r') as f: 68 | syn_data = json.load(f) 69 | 70 | ego_start_frame = syn_data['ego'] 71 | ext_start_frame = syn_data['ext'] 72 | 73 | with open(gt_path, 'rb') as f: 74 | pose_gt_data = pickle.load(f) 75 | 76 | image_path_list = [] 77 | gt_pose_list = [] 78 | depth_path_list = [] 79 | 80 | for pose_gt_item in pose_gt_data: 81 | ext_id = pose_gt_item['ext_id'] 82 | ego_pose_gt = pose_gt_item['ego_pose_gt'] 83 | if ego_pose_gt is None: 84 | continue 85 | ego_id = ext_id - 
ext_start_frame + ego_start_frame 86 | egocentric_image_name = "img_%06d.jpg" % ego_id 87 | depth_name = "img_%06d" % ego_id 88 | 89 | image_path = os.path.join(img_data_path, egocentric_image_name) 90 | if not os.path.exists(image_path): 91 | continue 92 | image_path_list.append(image_path) 93 | if self.estimated_depth_name is not None: 94 | depth_full_path = os.path.join(depth_path, 'img_%06d.jpg.exr' % ego_id) 95 | else: 96 | depth_full_path = os.path.join(depth_path, depth_name, 'Image0001.exr') 97 | depth_path_list.append(depth_full_path) 98 | gt_pose_list.append(ego_pose_gt) 99 | print("dataset length: {}".format(len(image_path_list))) 100 | return image_path_list, gt_pose_list, depth_path_list 101 | 102 | def evaluate_mpjpe(self, predicted_pose_list): 103 | gt_pose_list = self.gt_pose_list 104 | 105 | mpjpe = calculate_error(predicted_pose_list, gt_pose_list) 106 | 107 | # align the estimated result and original result 108 | aligned_estimated_result, gt_seq = align_skeleton(predicted_pose_list, gt_pose_list, None) 109 | 110 | pampjpe = calculate_error(aligned_estimated_result, gt_seq) 111 | 112 | return mpjpe, pampjpe 113 | 114 | def __len__(self): 115 | return len(self.image_path_list) 116 | 117 | def __getitem__(self, index): 118 | img_path = self.image_path_list[index] 119 | if self.with_one_depth: 120 | # only for visualization 121 | depth_path = self.depth_map_list[2070] 122 | else: 123 | depth_path = self.depth_map_list[index] 124 | 125 | raw_img = cv2.imread(img_path) 126 | raw_img = raw_img[:, 128: -128, :] 127 | # data augmentation 128 | img = cv2.resize(raw_img, dsize=(256, 256)) / 255. 129 | 130 | img_rgb = img[:, :, ::-1] 131 | img_rgb = np.ascontiguousarray(img_rgb) 132 | 133 | img_torch = self.normalize(img) 134 | img_torch = self.to_tensor(img_torch) 135 | img_rgb = self.normalize(img_rgb) 136 | img_rgb_torch = self.to_tensor(img_rgb) 137 | 138 | depth_map = cv2.imread(depth_path, cv2.IMREAD_ANYCOLOR | cv2.IMREAD_ANYDEPTH) 139 | if depth_map.shape[0] != 1024 or depth_map.shape[1] != 1280: 140 | depth_map = cv2.resize(depth_map, (1280, 1024), interpolation=cv2.INTER_NEAREST) 141 | if len(depth_map.shape) == 3: 142 | depth_map = depth_map[:, :, 0] 143 | depth_map[depth_map > 10] = 10 144 | 145 | if self.voxel_output is True: 146 | depth_scene_info = depth_map_to_voxel(self.ray, depth_map, self.cuboid_side, self.volume_size) 147 | else: 148 | depth_scene_info = torch.from_numpy(depth_map).float() 149 | 150 | return img_torch, img_rgb_torch, depth_scene_info, img_path 151 | 152 | def main(): 153 | dataset = TestDataset(config=None, seq_name='new_jian1', 154 | voxel_output=False) 155 | 156 | img_torch, img_rgb_torch, depth_scene_info, img_path = dataset[100] 157 | 158 | 159 | if __name__ == '__main__': 160 | main() -------------------------------------------------------------------------------- /demo.py: -------------------------------------------------------------------------------- 1 | import os 2 | from pprint import pprint 3 | 4 | import torch 5 | from torch.utils.data import DataLoader 6 | from tqdm import tqdm 7 | 8 | from dataset.demo_dataset import DemoDataset 9 | from network.voxel_net_depth import VoxelNetwork_depth 10 | from utils import cfg 11 | from utils.skeleton import Skeleton 12 | import argparse 13 | import pickle 14 | from visualize import visualize 15 | 16 | os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1" 17 | 18 | 19 | class Demo: 20 | def __init__(self, config, img_dir, depth_dir): 21 | self.device = torch.device('cuda') if torch.cuda.is_available() else 
torch.device('cpu') 22 | self.demo_dataset = DemoDataset(config, img_dir, depth_dir, voxel_output=False) 23 | self.demo_dataloader = DataLoader(self.demo_dataset, batch_size=1, shuffle=False, drop_last=False, 24 | num_workers=0) 25 | 26 | self.network = VoxelNetwork_depth(config) 27 | 28 | # load the network model 29 | model_path = config.test.model_path 30 | loads = torch.load(model_path) 31 | self.network.load_state_dict(loads['state_dict']) 32 | 33 | self.network = self.network.to(self.device) 34 | 35 | self.skeleton = Skeleton(calibration_path=config.dataset.camera_calibration_path) 36 | 37 | def run(self, config): 38 | print('---------------------Start Training-----------------------') 39 | pprint(config.__dict__) 40 | self.network.eval() 41 | 42 | result_list = [] 43 | with torch.no_grad(): 44 | for i, (img, img_rgb, depth_info, img_path) in tqdm(enumerate(self.demo_dataloader)): 45 | img = img.to(self.device) 46 | img_rgb = img_rgb.to(self.device) 47 | 48 | depth_info = depth_info.to(self.device) 49 | 50 | grid_coord_proj_batch = self.network.grid_coord_proj_batch 51 | coord_volumes = self.network.coord_volumes 52 | 53 | vol_keypoints_3d, features, volumes, coord_volumes = self.network(img, grid_coord_proj_batch, 54 | coord_volumes, 55 | depth_map_batch=depth_info) 56 | 57 | predicted_keypoints_batch = vol_keypoints_3d.cpu().numpy() 58 | 59 | assert len(vol_keypoints_3d) == 1 and len(img_path) == 1 # make sure the batch is 1 60 | 61 | result_list.append({'img_path': img_path[0], 'predicted_keypoints': predicted_keypoints_batch[0]}) 62 | 63 | # save predicted joint to output dir 64 | 65 | return result_list 66 | 67 | 68 | def main(): 69 | parser = argparse.ArgumentParser() 70 | parser.add_argument('--config', type=str, required=False, default='experiments/sceneego/test/sceneego.yaml') 71 | parser.add_argument("--img_dir", type=str, required=False, default='data/demo/imgs') 72 | parser.add_argument("--depth_dir", type=str, required=False, default='data/demo/depths') 73 | parser.add_argument("--output_dir", type=str, required=False, default='data/demo/out') 74 | parser.add_argument("--vis", type=str, required=False, default='false') 75 | 76 | args = parser.parse_args() 77 | 78 | config_path = args.config 79 | img_dir = args.img_dir 80 | depth_dir = args.depth_dir 81 | output_dir = args.output_dir 82 | vis = args.vis 83 | 84 | config = cfg.load_config(config_path) 85 | demo = Demo(config, img_dir, depth_dir) 86 | result_list = demo.run(config) 87 | 88 | for result_dict in result_list: 89 | # save predicted joint list 90 | img_path = result_dict['img_path'] 91 | img_name = os.path.split(img_path)[1] 92 | pose_pred = result_dict['predicted_keypoints'] 93 | out_path = os.path.join(output_dir, f'{img_name}.pkl') 94 | depth_path = os.path.join(depth_dir, f'{img_name}.exr') 95 | 96 | with open(out_path, 'wb') as f: 97 | pickle.dump(pose_pred, f) 98 | 99 | # visualize the pose and depth map 100 | if vis.lower() == 'true': 101 | visualize(img_path, depth_path, out_path) 102 | 103 | 104 | if __name__ == '__main__': 105 | main() 106 | -------------------------------------------------------------------------------- /evaluation/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/evaluation/__init__.py -------------------------------------------------------------------------------- /evaluation/python_evaluate_3d_our_dataset.py: 
-------------------------------------------------------------------------------- 1 | from scipy.io import loadmat 2 | import open3d 3 | from utils.skeleton import Skeleton 4 | import numpy as np 5 | from utils.calculate_errors import align_skeleton, calculate_error 6 | import os 7 | from natsort import natsorted 8 | from tqdm import tqdm 9 | import pickle 10 | 11 | jian3_motion_type = { 12 | 'start': 557, 13 | 'walking': [np.asarray([557, 897]), np.asarray([1137, 1327])], 14 | 'running': [np.asarray([897, 1137])], 15 | 'boxing': [np.asarray([1327, 1417])], 16 | 'stretching': [np.asarray([1417, 1587])], 17 | 'waving': [np.asarray([1587, 1687])], 18 | 'sitting': [np.asarray([1687, 1857])]} 19 | 20 | studio_jian1_motion_type = { 21 | 'start': 503, 22 | 'walking': [np.asarray([503, 1723])], 23 | 'running': [np.asarray([1723, 2153])], 24 | 'crouching': [np.asarray([2153, 2393])], 25 | 'boxing': [np.asarray([2393, 2883])], 26 | 'dancing': [np.asarray([2883, 3223])], 27 | 'stretching': [np.asarray([3223, 3553])], 28 | 'waving': [np.asarray([3553, 3603])]} 29 | 30 | studio_lingjie1_motion_type = { 31 | 'start': 551, 32 | 'walking': [np.asarray([551, 1761])], 33 | 'crouching': [np.asarray([1761, 2031])], 34 | 'boxing': [np.asarray([2031, 2331])], 35 | 'dancing': [np.asarray([2331, 2691])], 36 | 'stretching': [np.asarray([2691, 2991])], 37 | 'waving': [np.asarray([2991, 3251])]} 38 | 39 | studio_jian2_motion_type = { 40 | 'start': 600, 41 | 'walking': [np.asarray([600, 1920])], 42 | 'dancing': [np.asarray([2170, 2310])], 43 | 'playingballs': [np.asarray([1920, 2170])], 44 | 'opendoor': [np.asarray([2310, 2740])], 45 | 'playgolf': [np.asarray([2740, 2980])], 46 | 'talking': [np.asarray([2980, 3210])], 47 | 'shootingarrow': [np.asarray([3210, 3400])]} 48 | 49 | studio_lingjie2_motion_type = { 50 | 'start': 438, 51 | 'walking': [np.asarray([438, 1048])], 52 | 'running': [np.asarray([1048, 1548])], 53 | 'playingballs': [np.asarray([1548, 1748])], 54 | 'opendoor': [np.asarray([1748, 2008])], 55 | 'playgolf': [np.asarray([2008, 2288])], 56 | 'talking': [np.asarray([2288, 2528])], 57 | 'shootingarrow': [np.asarray([2528, 2738])]} 58 | 59 | path_dict = { 60 | 'jian3': { 61 | 'gt_path': r'/HPS/Mo2Cap2Plus/work/MakeWeipengStudioTestData/data/jian3/jian3.pkl', 62 | 'start_frame': 557, 63 | 'end_frame': 1857, 64 | "predicted_path": r'/HPS/Mo2Cap2Plus1/static00/EgocentricData/REC08102020/jian3' 65 | }, 66 | 'studio-jian1': { 67 | 'gt_path': r'/HPS/Mo2Cap2Plus/work/MakeWeipengStudioTestData/data/studio-jian1/jian1.pkl', 68 | 'start_frame': 503, 69 | 'end_frame': 3603, 70 | "predicted_path": r'/HPS/Mo2Cap2Plus1/static00/EgocentricData/REC23102020/studio-jian1' 71 | }, 72 | 'studio-jian2': { 73 | 'gt_path': r'/HPS/Mo2Cap2Plus/work/MakeWeipengStudioTestData/data/studio-jian2/jian2.pkl', 74 | 'start_frame': 600, 75 | 'end_frame': 3400, 76 | "predicted_path": r'/HPS/Mo2Cap2Plus1/static00/EgocentricData/REC23102020/studio-jian2' 77 | }, 78 | 'studio-lingjie1': { 79 | 'gt_path': r'/HPS/Mo2Cap2Plus/work/MakeWeipengStudioTestData/data/studio-lingjie1/lingjie1.pkl', 80 | 'start_frame': 551, 81 | 'end_frame': 3251, 82 | "predicted_path": r'/HPS/Mo2Cap2Plus1/static00/EgocentricData/REC23102020/studio-lingjie1' 83 | }, 84 | 'studio-lingjie2': { 85 | 'gt_path': r'/HPS/Mo2Cap2Plus/work/MakeWeipengStudioTestData/data/studio-lingjie2/lingjie2.pkl', 86 | 'start_frame': 438, 87 | 'end_frame': 2738, 88 | "predicted_path": r'/HPS/Mo2Cap2Plus1/static00/EgocentricData/REC23102020/studio-lingjie2' 89 | } 90 | } 91 | 92 | 93 | 
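# same sequences as above, with the cluster paths replaced by mapped network-drive paths for local runs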
path_dict_local = { 94 | 'jian3': { 95 | 'gt_path': r'X:/Mo2Cap2Plus/work/MakeWeipengStudioTestData/data/jian3/jian3.pkl', 96 | 'start_frame': 557, 97 | 'end_frame': 1857, 98 | "predicted_path": r'X:/Mo2Cap2Plus1/static00/EgocentricData/REC08102020/jian3' 99 | }, 100 | 'studio-jian1': { 101 | 'gt_path': r'X:/Mo2Cap2Plus/work/MakeWeipengStudioTestData/data/studio-jian1/jian1.pkl', 102 | 'start_frame': 503, 103 | 'end_frame': 3603, 104 | "predicted_path": r'X:/Mo2Cap2Plus1/static00/EgocentricData/REC23102020/studio-jian1' 105 | }, 106 | 'studio-jian2': { 107 | 'gt_path': r'X:/Mo2Cap2Plus/work/MakeWeipengStudioTestData/data/studio-jian2/jian2.pkl', 108 | 'start_frame': 600, 109 | 'end_frame': 3400, 110 | "predicted_path": r'X:/Mo2Cap2Plus1/static00/EgocentricData/REC23102020/studio-jian2' 111 | }, 112 | 'studio-lingjie1': { 113 | 'gt_path': r'X:/Mo2Cap2Plus/work/MakeWeipengStudioTestData/data/studio-lingjie1/lingjie1.pkl', 114 | 'start_frame': 551, 115 | 'end_frame': 3251, 116 | "predicted_path": r'X:/Mo2Cap2Plus1/static00/EgocentricData/REC23102020/studio-lingjie1' 117 | }, 118 | 'studio-lingjie2': { 119 | 'gt_path': r'X:/Mo2Cap2Plus/work/MakeWeipengStudioTestData/data/studio-lingjie2/lingjie2.pkl', 120 | 'start_frame': 438, 121 | 'end_frame': 2738, 122 | "predicted_path": r'X:/Mo2Cap2Plus1/static00/EgocentricData/REC23102020/studio-lingjie2' 123 | } 124 | } 125 | 126 | def evaluate_3d_our_dataset(sequence_name, predicted_pose, scale=True, select_start_to_end=True): 127 | gt_path = path_dict[sequence_name]['gt_path'] 128 | start_frame = path_dict[sequence_name]['start_frame'] 129 | end_frame = path_dict[sequence_name]['end_frame'] 130 | predicted_path = path_dict[sequence_name]['predicted_path'] 131 | 132 | def load_gt_data(gt_path, start_frame, end_frame, mat_start_frame): 133 | with open(gt_path, 'rb') as f: 134 | pose_gt = pickle.load(f) 135 | clip = [] 136 | for i in range(start_frame, end_frame): 137 | clip.append(pose_gt[i - mat_start_frame]) 138 | 139 | skeleton_list = clip 140 | 141 | return np.asarray(skeleton_list) 142 | 143 | gt_pose_list = load_gt_data(gt_path, start_frame, end_frame, start_frame) 144 | 145 | # get predicted pose 146 | 147 | skeleton_model = Skeleton(calibration_path='utils/fisheye/fisheye.calibration.json') 148 | 149 | # sort the predicted poses 150 | # predicted_pose = natsorted(predicted_pose, key=lambda pose: pose[0]) 151 | # 152 | # predicted_pose_list = [pose_tuple[1] for pose_tuple in predicted_pose] 153 | if select_start_to_end: 154 | predicted_pose_list = predicted_pose[start_frame: end_frame] 155 | else: 156 | predicted_pose_list = predicted_pose 157 | 158 | aligned_estimated_result, gt_seq = align_skeleton(predicted_pose_list, gt_pose_list, None, scale=scale) 159 | 160 | aligned_original_mpjpe = calculate_error(aligned_estimated_result, gt_seq) 161 | 162 | # align the estimated result and original result 163 | aligned_estimated_result, gt_seq = align_skeleton(predicted_pose_list, gt_pose_list, skeleton_model, scale=scale) 164 | 165 | bone_length_aligned_original_mpjpe = calculate_error(aligned_estimated_result, gt_seq) 166 | 167 | print(aligned_original_mpjpe) 168 | print(bone_length_aligned_original_mpjpe) 169 | 170 | calculate_different_motion(aligned_estimated_result, gt_seq, sequence_name) 171 | 172 | return aligned_original_mpjpe, bone_length_aligned_original_mpjpe 173 | 174 | 175 | def calculate_different_motion(estimated_pose, gt_pose, data_dir): 176 | if 'jian3' in data_dir: 177 | motion_type = jian3_motion_type 178 | if 'jian1' in data_dir: 
179 | motion_type = studio_jian1_motion_type 180 | if 'jian2' in data_dir: 181 | motion_type = studio_jian2_motion_type 182 | if 'lingjie1' in data_dir: 183 | motion_type = studio_lingjie1_motion_type 184 | if 'lingjie2' in data_dir: 185 | motion_type = studio_lingjie2_motion_type 186 | 187 | skeleton_model = Skeleton(None) 188 | start_frame = motion_type['start'] 189 | 190 | for motion in motion_type.keys(): 191 | if motion == 'start': 192 | continue 193 | estimated_mpjpe = 0 194 | for motion_range in motion_type[motion]: 195 | estimated_pose_motion = estimated_pose[motion_range[0] - start_frame: motion_range[1] - start_frame] 196 | gt_pose_motion = gt_pose[motion_range[0] - start_frame: motion_range[1] - start_frame] 197 | aligned_estimated_result, final_gt_seq = align_skeleton(estimated_pose_motion, gt_pose_motion, 198 | skeleton_model) 199 | estimated_mpjpe += calculate_error(aligned_estimated_result, final_gt_seq) 200 | estimated_mpjpe /= len(motion_type[motion]) 201 | print("{}: {}".format(motion, estimated_mpjpe)) 202 | 203 | 204 | if __name__ == '__main__': 205 | evaluate_3d_our_dataset('jian3', heatmap_name='da_external_more_gpu_no_mid_loss', 206 | depth_name='finetune_depth_spin_iter_0_depth_5', 207 | load_predicted=False) 208 | evaluate_3d_our_dataset('studio-jian1', heatmap_name='da_external_more_gpu_no_mid_loss', 209 | depth_name='finetune_depth_spin_iter_0_depth_5', 210 | load_predicted=False) 211 | evaluate_3d_our_dataset('studio-jian2', heatmap_name='da_external_more_gpu_no_mid_loss', 212 | depth_name='finetune_depth_spin_iter_0_depth_5', 213 | load_predicted=False) 214 | evaluate_3d_our_dataset('studio-lingjie1', heatmap_name='da_external_more_gpu_no_mid_loss', 215 | depth_name='finetune_depth_spin_iter_0_depth_5', 216 | load_predicted=False) 217 | evaluate_3d_our_dataset('studio-lingjie2', heatmap_name='da_external_more_gpu_no_mid_loss', 218 | depth_name='finetune_depth_spin_iter_0_depth_5', 219 | load_predicted=False) 220 | -------------------------------------------------------------------------------- /experiments/sceneego/test/sceneego.yaml: -------------------------------------------------------------------------------- 1 | title: "sceneego" 2 | kind: "mo2cap2" 3 | vis_freq: 1000 4 | vis_n_elements: 10 5 | 6 | image_shape: [256, 256] 7 | heatmap_shape: [1024, 1280] 8 | 9 | test: 10 | batch_size: 8 11 | model_path: "models/sceneego/checkpoints/6.pth.tar" 12 | depth_model_path: "network_models/wo_body_lr_1e-4_finetune_cleaned_data/iter_38000.pth.tar" 13 | 14 | opt: 15 | criterion: "MAE" 16 | 17 | use_volumetric_ce_loss: true 18 | volumetric_ce_loss_weight: 0.01 19 | 20 | n_objects_per_epoch: 15000 21 | n_epochs: 10 22 | 23 | batch_size: 40 24 | val_batch_size: 10 25 | 26 | train_2d: false 27 | 28 | lr: 0.001 29 | process_features_lr: 0.001 30 | volume_net_lr: 0.001 31 | 32 | scale_keypoints_3d: 0.1 33 | 34 | log_step: 1000 35 | 36 | model: 37 | name: "vol" 38 | kind: "mo2cap2" 39 | volume_aggregation_method: "softmax" 40 | with_scene: true 41 | with_intersection: false 42 | init_weights: false 43 | checkpoint: "" 44 | 45 | load_model: True 46 | model_path: "logs/egopw_with_depth/checkpoints/5.pth.tar" 47 | depth_model_path: "network_models/wo_body_lr_1e-4_finetune_cleaned_data/iter_38000.pth.tar" 48 | 49 | cuboid_side: 2 50 | 51 | volume_size: 64 52 | volume_multiplier: 1.0 53 | volume_softmax: true 54 | 55 | heatmap_softmax: true 56 | heatmap_multiplier: 100.0 57 | 58 | backbone: 59 | name: "resnet50" 60 | style: "simple" 61 | 62 | init_weights: true 63 | 
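    # local_checkpoint toggles loading the 2D backbone weights from the `checkpoint` path below
    # (read in network/voxel_net_depth.py via config.model.backbone.local_checkpoint)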
local_checkpoint: false 64 | checkpoint: "/HPS/Mo2Cap2Plus1/work/Mo2Cap2Finetune/logs/finetune2D_spin_without_da_iter_0_new/checkpoints/6.pth.tar" 65 | # checkpoint: "X:/Mo2Cap2Plus1/work/Mo2Cap2Finetune/logs/finetune2D_spin_without_da_iter_0_new/checkpoints/6.pth.tar" 66 | 67 | num_joints: 15 68 | num_layers: 50 69 | 70 | dataset: 71 | kind: "egopw_with_depth" 72 | camera_calibration_path: "utils/fisheye/fisheye.calibration_05_08.json" 73 | old_camera_calibration_path: "utils/fisheye/fisheye.calibration.json" 74 | image_width: 1280 75 | image_height: 1024 76 | 77 | train: 78 | mo2cap2_root: "/HPS/Mo2Cap2Plus/static00/Datasets/Mo2Cap2/data/training_data_full_annotated" 79 | # wild_data_root: "/HPS/Mo2Cap2Plus1/static00/ExternalEgo/External_camera_all" 80 | # rendered_depth_path: "/CT/EgoMocap/work/EgoBodyInContext/sfm_data" 81 | wild_data_root: "X:/Mo2Cap2Plus1/static00/ExternalEgo/External_camera_all" 82 | rendered_depth_path: "Z:/EgoMocap/work/EgoBodyInContext/sfm_data" 83 | 84 | with_damaged_actions: true 85 | undistort_images: true 86 | 87 | scale_bbox: 1.0 88 | 89 | 90 | shuffle: true 91 | randomize_n_views: false 92 | min_n_views: null 93 | max_n_views: null 94 | num_workers: 5 95 | 96 | 97 | val: 98 | h36m_root: "./data/human36m/processed/" 99 | labels_path: "./data/human36m/extra/human36m-multiview-labels-GTbboxes.npy" 100 | pred_results_path: "./data/pretrained/human36m/human36m_alg_10-04-2019/checkpoints/0060/results/val.pkl" 101 | 102 | with_damaged_actions: true 103 | undistort_images: true 104 | 105 | scale_bbox: 1.0 106 | 107 | shuffle: false 108 | randomize_n_views: false 109 | min_n_views: null 110 | max_n_views: null 111 | num_workers: 10 112 | 113 | retain_every_n_frames_in_test: 1 114 | -------------------------------------------------------------------------------- /models/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/models/.gitkeep -------------------------------------------------------------------------------- /models/sceneego/checkpoints/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/models/sceneego/checkpoints/.gitkeep -------------------------------------------------------------------------------- /network/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/network/__init__.py -------------------------------------------------------------------------------- /network/pose_resnet.py: -------------------------------------------------------------------------------- 1 | import os 2 | import logging 3 | 4 | import torch 5 | import torch.nn as nn 6 | from collections import OrderedDict 7 | from torch.nn.functional import interpolate 8 | 9 | 10 | BN_MOMENTUM = 0.1 11 | logger = logging.getLogger(__name__) 12 | 13 | 14 | def conv3x3(in_planes, out_planes, stride=1): 15 | """3x3 convolution with padding""" 16 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 17 | padding=1, bias=False) 18 | 19 | 20 | class BasicBlock(nn.Module): 21 | expansion = 1 22 | 23 | def __init__(self, inplanes, planes, stride=1, downsample=None): 24 | super(BasicBlock, self).__init__() 25 | self.conv1 = conv3x3(inplanes, planes, stride) 26 | self.bn1 = 
nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 27 | self.relu = nn.ReLU(inplace=True) 28 | self.conv2 = conv3x3(planes, planes) 29 | self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 30 | self.downsample = downsample 31 | self.stride = stride 32 | 33 | def forward(self, x): 34 | residual = x 35 | 36 | out = self.conv1(x) 37 | out = self.bn1(out) 38 | out = self.relu(out) 39 | 40 | out = self.conv2(out) 41 | out = self.bn2(out) 42 | 43 | if self.downsample is not None: 44 | residual = self.downsample(x) 45 | 46 | out += residual 47 | out = self.relu(out) 48 | 49 | return out 50 | 51 | 52 | class Bottleneck(nn.Module): 53 | expansion = 4 54 | 55 | def __init__(self, inplanes, planes, stride=1, downsample=None): 56 | super(Bottleneck, self).__init__() 57 | self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False) 58 | self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 59 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, 60 | padding=1, bias=False) 61 | self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 62 | self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, 63 | bias=False) 64 | self.bn3 = nn.BatchNorm2d(planes * self.expansion, 65 | momentum=BN_MOMENTUM) 66 | self.relu = nn.ReLU(inplace=True) 67 | self.downsample = downsample 68 | self.stride = stride 69 | 70 | def forward(self, x): 71 | residual = x 72 | 73 | out = self.conv1(x) 74 | out = self.bn1(out) 75 | out = self.relu(out) 76 | 77 | out = self.conv2(out) 78 | out = self.bn2(out) 79 | out = self.relu(out) 80 | 81 | out = self.conv3(out) 82 | out = self.bn3(out) 83 | 84 | if self.downsample is not None: 85 | residual = self.downsample(x) 86 | 87 | out += residual 88 | out = self.relu(out) 89 | 90 | return out 91 | 92 | 93 | class Bottleneck_CAFFE(nn.Module): 94 | expansion = 4 95 | 96 | def __init__(self, inplanes, planes, stride=1, downsample=None): 97 | super(Bottleneck_CAFFE, self).__init__() 98 | # add stride to conv1x1 99 | self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, stride=stride, bias=False) 100 | self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 101 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, 102 | padding=1, bias=False) 103 | self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 104 | self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, 105 | bias=False) 106 | self.bn3 = nn.BatchNorm2d(planes * self.expansion, 107 | momentum=BN_MOMENTUM) 108 | self.relu = nn.ReLU(inplace=True) 109 | self.downsample = downsample 110 | self.stride = stride 111 | 112 | def forward(self, x): 113 | residual = x 114 | 115 | out = self.conv1(x) 116 | out = self.bn1(out) 117 | out = self.relu(out) 118 | 119 | out = self.conv2(out) 120 | out = self.bn2(out) 121 | out = self.relu(out) 122 | 123 | out = self.conv3(out) 124 | out = self.bn3(out) 125 | 126 | if self.downsample is not None: 127 | residual = self.downsample(x) 128 | 129 | out += residual 130 | out = self.relu(out) 131 | 132 | return out 133 | 134 | 135 | class PoseResNet(nn.Module): 136 | 137 | def __init__(self, block, layers): 138 | self.inplanes = 64 139 | self.deconv_with_bias = False 140 | 141 | super(PoseResNet, self).__init__() 142 | 143 | self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, 144 | bias=False) 145 | self.bn1 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM) 146 | self.relu = nn.ReLU(inplace=True) 147 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) 148 | self.layer1 = self._make_layer(block, 64, layers[0]) 149 | self.layer2 
= self._make_layer(block, 128, layers[1], stride=2) 150 | self.layer3 = self._make_layer(block, 256, layers[2], stride=2) 151 | self.layer4 = self._make_layer(block, 512, layers[3], stride=2) 152 | 153 | # used for deconv layers 154 | self.deconv_layers = self._make_deconv_layer( 155 | 3, 156 | [256, 256, 256], 157 | [4, 4, 4], 158 | ) 159 | 160 | self.final_layer = nn.Conv2d( 161 | in_channels=256, 162 | out_channels=16, 163 | kernel_size=1, 164 | stride=1, 165 | padding=0 166 | ) 167 | 168 | def _make_layer(self, block, planes, blocks, stride=1): 169 | downsample = None 170 | if stride != 1 or self.inplanes != planes * block.expansion: 171 | downsample = nn.Sequential( 172 | nn.Conv2d(self.inplanes, planes * block.expansion, 173 | kernel_size=1, stride=stride, bias=False), 174 | nn.BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM), 175 | ) 176 | 177 | layers = [] 178 | layers.append(block(self.inplanes, planes, stride, downsample)) 179 | self.inplanes = planes * block.expansion 180 | for i in range(1, blocks): 181 | layers.append(block(self.inplanes, planes)) 182 | 183 | return nn.Sequential(*layers) 184 | 185 | def _get_deconv_cfg(self, deconv_kernel, index): 186 | if deconv_kernel == 4: 187 | padding = 1 188 | output_padding = 0 189 | elif deconv_kernel == 3: 190 | padding = 1 191 | output_padding = 1 192 | elif deconv_kernel == 2: 193 | padding = 0 194 | output_padding = 0 195 | 196 | return deconv_kernel, padding, output_padding 197 | 198 | def _make_deconv_layer(self, num_layers, num_filters, num_kernels): 199 | assert num_layers == len(num_filters), \ 200 | 'ERROR: num_deconv_layers is different len(num_deconv_filters)' 201 | assert num_layers == len(num_kernels), \ 202 | 'ERROR: num_deconv_layers is different len(num_deconv_filters)' 203 | 204 | layers = [] 205 | for i in range(num_layers): 206 | kernel, padding, output_padding = \ 207 | self._get_deconv_cfg(num_kernels[i], i) 208 | 209 | planes = num_filters[i] 210 | layers.append( 211 | nn.ConvTranspose2d( 212 | in_channels=self.inplanes, 213 | out_channels=planes, 214 | kernel_size=kernel, 215 | stride=2, 216 | padding=padding, 217 | output_padding=output_padding, 218 | bias=self.deconv_with_bias)) 219 | layers.append(nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)) 220 | layers.append(nn.ReLU(inplace=True)) 221 | self.inplanes = planes 222 | 223 | return nn.Sequential(*layers) 224 | 225 | def forward(self, x, return_mid_layer=False): 226 | x = self.conv1(x) 227 | x = self.bn1(x) 228 | x = self.relu(x) 229 | x = self.maxpool(x) 230 | 231 | x = self.layer1(x) 232 | x = self.layer2(x) 233 | x = self.layer3(x) 234 | x = self.layer4(x) 235 | 236 | mid_layer_feature = x 237 | 238 | x = self.deconv_layers(x) 239 | 240 | features = x 241 | x = self.final_layer(x) 242 | heatmaps = x[:, :15, :, :] 243 | if return_mid_layer is False: 244 | return heatmaps, features 245 | else: 246 | return heatmaps, features, mid_layer_feature 247 | 248 | 249 | def init_weights(self, pretrained=''): 250 | if os.path.isfile(pretrained): 251 | logger.info('=> init deconv weights from normal distribution') 252 | for name, m in self.deconv_layers.named_modules(): 253 | if isinstance(m, nn.ConvTranspose2d): 254 | logger.info('=> init {}.weight as normal(0, 0.001)'.format(name)) 255 | logger.info('=> init {}.bias as 0'.format(name)) 256 | nn.init.normal_(m.weight, std=0.001) 257 | if self.deconv_with_bias: 258 | nn.init.constant_(m.bias, 0) 259 | elif isinstance(m, nn.BatchNorm2d): 260 | logger.info('=> init {}.weight as 1'.format(name)) 261 | 
logger.info('=> init {}.bias as 0'.format(name)) 262 | nn.init.constant_(m.weight, 1) 263 | nn.init.constant_(m.bias, 0) 264 | logger.info('=> init final conv weights from normal distribution') 265 | for m in self.final_layer.modules(): 266 | if isinstance(m, nn.Conv2d): 267 | # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 268 | logger.info('=> init {}.weight as normal(0, 0.001)'.format(name)) 269 | logger.info('=> init {}.bias as 0'.format(name)) 270 | nn.init.normal_(m.weight, std=0.001) 271 | nn.init.constant_(m.bias, 0) 272 | 273 | # pretrained_state_dict = torch.load(pretrained) 274 | logger.info('=> loading pretrained model {}'.format(pretrained)) 275 | # self.load_state_dict(pretrained_state_dict, strict=False) 276 | checkpoint = torch.load(pretrained) 277 | if isinstance(checkpoint, OrderedDict): 278 | state_dict = checkpoint 279 | elif isinstance(checkpoint, dict) and 'state_dict' in checkpoint: 280 | state_dict_old = checkpoint['state_dict'] 281 | state_dict = OrderedDict() 282 | # delete 'module.' because it is saved from DataParallel module 283 | for key in state_dict_old.keys(): 284 | if key.startswith('module.'): 285 | # state_dict[key[7:]] = state_dict[key] 286 | # state_dict.pop(key) 287 | state_dict[key[7:]] = state_dict_old[key] 288 | else: 289 | state_dict[key] = state_dict_old[key] 290 | else: 291 | raise RuntimeError( 292 | 'No state_dict found in checkpoint file {}'.format(pretrained)) 293 | self.load_state_dict(state_dict, strict=False) 294 | else: 295 | logger.error('=> imagenet pretrained model dose not exist') 296 | logger.error('=> please download it first') 297 | raise ValueError('imagenet pretrained model does not exist') 298 | 299 | 300 | resnet_spec = {18: (BasicBlock, [2, 2, 2, 2]), 301 | 34: (BasicBlock, [3, 4, 6, 3]), 302 | 50: (Bottleneck, [3, 4, 6, 3]), 303 | 101: (Bottleneck, [3, 4, 23, 3]), 304 | 152: (Bottleneck, [3, 8, 36, 3])} 305 | 306 | 307 | def load_state_dict(model, new_state_dict): 308 | state = model.state_dict() 309 | state.update(new_state_dict) 310 | model.load_state_dict(state) 311 | return model 312 | 313 | def get_pose_net(model_path='models/pose_resnet_50_256x256.pth.tar', state_dict=None): 314 | num_layers = 50 315 | 316 | block_class, layers = resnet_spec[num_layers] 317 | 318 | model = PoseResNet(block_class, layers) 319 | 320 | if state_dict is None: 321 | if model_path is not None: 322 | state_dict = torch.load(model_path) 323 | model = load_state_dict(model, state_dict) 324 | else: 325 | # model.load_state_dict(state_dict) 326 | keys_list = list(state_dict.keys()) 327 | 328 | if keys_list[0].startswith('module'): 329 | state_dict = {k[7:]: v for k, v in state_dict.items()} 330 | model = load_state_dict(model, state_dict) 331 | 332 | return model 333 | 334 | 335 | if __name__ == '__main__': 336 | network = get_pose_net(model_path='../models/pose_resnet_50_256x256.pth.tar', with_bone_length=False) 337 | test_input = torch.zeros([4, 3, 256, 256]) 338 | res = network.forward_2D_pose(test_input, slice=False) 339 | print(res[1].shape) 340 | print(res[2].shape) -------------------------------------------------------------------------------- /network/v2v.py: -------------------------------------------------------------------------------- 1 | # Reference: https://github.com/dragonbook/V2V-PoseNet-pytorch 2 | 3 | import torch.nn as nn 4 | import torch.nn.functional as F 5 | import torch 6 | 7 | 8 | class Basic3DBlock(nn.Module): 9 | def __init__(self, in_planes, out_planes, kernel_size): 10 | super(Basic3DBlock, 
self).__init__() 11 | self.block = nn.Sequential( 12 | nn.Conv3d(in_planes, out_planes, kernel_size=kernel_size, stride=1, padding=((kernel_size-1)//2)), 13 | nn.BatchNorm3d(out_planes), 14 | nn.ReLU(True) 15 | ) 16 | 17 | def forward(self, x): 18 | return self.block(x) 19 | 20 | 21 | class Res3DBlock(nn.Module): 22 | def __init__(self, in_planes, out_planes): 23 | super(Res3DBlock, self).__init__() 24 | self.res_branch = nn.Sequential( 25 | nn.Conv3d(in_planes, out_planes, kernel_size=3, stride=1, padding=1), 26 | nn.BatchNorm3d(out_planes), 27 | nn.ReLU(True), 28 | nn.Conv3d(out_planes, out_planes, kernel_size=3, stride=1, padding=1), 29 | nn.BatchNorm3d(out_planes) 30 | ) 31 | 32 | if in_planes == out_planes: 33 | self.skip_con = nn.Sequential() 34 | else: 35 | self.skip_con = nn.Sequential( 36 | nn.Conv3d(in_planes, out_planes, kernel_size=1, stride=1, padding=0), 37 | nn.BatchNorm3d(out_planes) 38 | ) 39 | 40 | def forward(self, x): 41 | res = self.res_branch(x) 42 | skip = self.skip_con(x) 43 | return F.relu(res + skip, True) 44 | 45 | 46 | class Pool3DBlock(nn.Module): 47 | def __init__(self, pool_size): 48 | super(Pool3DBlock, self).__init__() 49 | self.pool_size = pool_size 50 | 51 | def forward(self, x): 52 | return F.max_pool3d(x, kernel_size=self.pool_size, stride=self.pool_size) 53 | 54 | 55 | class Upsample3DBlock(nn.Module): 56 | def __init__(self, in_planes, out_planes, kernel_size, stride): 57 | super(Upsample3DBlock, self).__init__() 58 | assert(kernel_size == 2) 59 | assert(stride == 2) 60 | self.block = nn.Sequential( 61 | nn.ConvTranspose3d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=0, output_padding=0), 62 | nn.BatchNorm3d(out_planes), 63 | nn.ReLU(True) 64 | ) 65 | 66 | def forward(self, x): 67 | return self.block(x) 68 | 69 | 70 | class EncoderDecorder(nn.Module): 71 | def __init__(self): 72 | super().__init__() 73 | 74 | self.encoder_pool1 = Pool3DBlock(2) 75 | self.encoder_res1 = Res3DBlock(32, 64) 76 | self.encoder_pool2 = Pool3DBlock(2) 77 | self.encoder_res2 = Res3DBlock(64, 128) 78 | self.encoder_pool3 = Pool3DBlock(2) 79 | self.encoder_res3 = Res3DBlock(128, 128) 80 | self.encoder_pool4 = Pool3DBlock(2) 81 | self.encoder_res4 = Res3DBlock(128, 128) 82 | self.encoder_pool5 = Pool3DBlock(2) 83 | self.encoder_res5 = Res3DBlock(128, 128) 84 | 85 | self.mid_res = Res3DBlock(128, 128) 86 | 87 | self.decoder_res5 = Res3DBlock(128, 128) 88 | self.decoder_upsample5 = Upsample3DBlock(128, 128, 2, 2) 89 | self.decoder_res4 = Res3DBlock(128, 128) 90 | self.decoder_upsample4 = Upsample3DBlock(128, 128, 2, 2) 91 | self.decoder_res3 = Res3DBlock(128, 128) 92 | self.decoder_upsample3 = Upsample3DBlock(128, 128, 2, 2) 93 | self.decoder_res2 = Res3DBlock(128, 128) 94 | self.decoder_upsample2 = Upsample3DBlock(128, 64, 2, 2) 95 | self.decoder_res1 = Res3DBlock(64, 64) 96 | self.decoder_upsample1 = Upsample3DBlock(64, 32, 2, 2) 97 | 98 | self.skip_res1 = Res3DBlock(32, 32) 99 | self.skip_res2 = Res3DBlock(64, 64) 100 | self.skip_res3 = Res3DBlock(128, 128) 101 | self.skip_res4 = Res3DBlock(128, 128) 102 | self.skip_res5 = Res3DBlock(128, 128) 103 | 104 | def forward(self, x): 105 | skip_x1 = self.skip_res1(x) 106 | x = self.encoder_pool1(x) 107 | x = self.encoder_res1(x) 108 | skip_x2 = self.skip_res2(x) 109 | x = self.encoder_pool2(x) 110 | x = self.encoder_res2(x) 111 | skip_x3 = self.skip_res3(x) 112 | x = self.encoder_pool3(x) 113 | x = self.encoder_res3(x) 114 | skip_x4 = self.skip_res4(x) 115 | x = self.encoder_pool4(x) 116 | x = 
self.encoder_res4(x) 117 | skip_x5 = self.skip_res5(x) 118 | x = self.encoder_pool5(x) 119 | x = self.encoder_res5(x) 120 | 121 | x = self.mid_res(x) 122 | 123 | x = self.decoder_res5(x) 124 | x = self.decoder_upsample5(x) 125 | x = x + skip_x5 126 | x = self.decoder_res4(x) 127 | x = self.decoder_upsample4(x) 128 | x = x + skip_x4 129 | x = self.decoder_res3(x) 130 | x = self.decoder_upsample3(x) 131 | x = x + skip_x3 132 | x = self.decoder_res2(x) 133 | x = self.decoder_upsample2(x) 134 | x = x + skip_x2 135 | x = self.decoder_res1(x) 136 | x = self.decoder_upsample1(x) 137 | x = x + skip_x1 138 | 139 | return x 140 | 141 | 142 | class V2VModel(nn.Module): 143 | def __init__(self, input_channels, output_channels): 144 | super().__init__() 145 | 146 | self.front_layers = nn.Sequential( 147 | Basic3DBlock(input_channels, 16, 7), 148 | Res3DBlock(16, 32), 149 | Res3DBlock(32, 32), 150 | Res3DBlock(32, 32) 151 | ) 152 | 153 | self.encoder_decoder = EncoderDecorder() 154 | 155 | self.back_layers = nn.Sequential( 156 | Res3DBlock(32, 32), 157 | Basic3DBlock(32, 32, 1), 158 | Basic3DBlock(32, 32, 1), 159 | ) 160 | 161 | self.output_layer = nn.Conv3d(32, output_channels, kernel_size=1, stride=1, padding=0) 162 | 163 | self._initialize_weights() 164 | 165 | def forward(self, x): 166 | x = self.front_layers(x) 167 | x = self.encoder_decoder(x) 168 | x = self.back_layers(x) 169 | x = self.output_layer(x) 170 | return x 171 | 172 | def _initialize_weights(self): 173 | for m in self.modules(): 174 | if isinstance(m, nn.Conv3d): 175 | nn.init.xavier_normal_(m.weight) 176 | # nn.init.normal_(m.weight, 0, 0.001) 177 | nn.init.constant_(m.bias, 0) 178 | elif isinstance(m, nn.ConvTranspose3d): 179 | nn.init.xavier_normal_(m.weight) 180 | # nn.init.normal_(m.weight, 0, 0.001) 181 | nn.init.constant_(m.bias, 0) 182 | 183 | 184 | class EncoderDecoderSimple(nn.Module): 185 | def __init__(self): 186 | super().__init__() 187 | 188 | self.encoder_pool1 = Pool3DBlock(2) 189 | self.encoder_res1 = Res3DBlock(32, 64) 190 | self.encoder_pool2 = Pool3DBlock(2) 191 | self.encoder_res2 = Res3DBlock(64, 128) 192 | 193 | self.mid_res = Res3DBlock(128, 128) 194 | 195 | self.decoder_res2 = Res3DBlock(128, 128) 196 | self.decoder_upsample2 = Upsample3DBlock(128, 64, 2, 2) 197 | self.decoder_res1 = Res3DBlock(64, 64) 198 | self.decoder_upsample1 = Upsample3DBlock(64, 32, 2, 2) 199 | 200 | self.skip_res1 = Res3DBlock(32, 32) 201 | self.skip_res2 = Res3DBlock(64, 64) 202 | 203 | def forward(self, x): 204 | skip_x1 = self.skip_res1(x) 205 | x = self.encoder_pool1(x) 206 | x = self.encoder_res1(x) 207 | skip_x2 = self.skip_res2(x) 208 | x = self.encoder_pool2(x) 209 | x = self.encoder_res2(x) 210 | 211 | x = self.mid_res(x) 212 | 213 | x = self.decoder_res2(x) 214 | x = self.decoder_upsample2(x) 215 | x = x + skip_x2 216 | x = self.decoder_res1(x) 217 | x = self.decoder_upsample1(x) 218 | x = x + skip_x1 219 | 220 | return x 221 | 222 | 223 | class V2VModelSimple(nn.Module): 224 | def __init__(self, input_channels, output_channels): 225 | super().__init__() 226 | 227 | self.front_layers = nn.Sequential( 228 | Basic3DBlock(input_channels, 32, 7), 229 | ) 230 | 231 | self.encoder_decoder = EncoderDecoderSimple() 232 | 233 | self.back_layers = nn.Sequential( 234 | Basic3DBlock(32, 32, 1), 235 | ) 236 | 237 | self.output_layer = nn.Conv3d(32, output_channels, kernel_size=1, stride=1, padding=0) 238 | 239 | self._initialize_weights() 240 | 241 | def forward(self, x): 242 | x = self.front_layers(x) 243 | x = self.encoder_decoder(x) 244 
| x = self.back_layers(x) 245 | x = self.output_layer(x) 246 | return x 247 | 248 | def _initialize_weights(self): 249 | for m in self.modules(): 250 | if isinstance(m, nn.Conv3d): 251 | nn.init.xavier_normal_(m.weight) 252 | # nn.init.normal_(m.weight, 0, 0.001) 253 | nn.init.constant_(m.bias, 0) 254 | elif isinstance(m, nn.ConvTranspose3d): 255 | nn.init.xavier_normal_(m.weight) 256 | # nn.init.normal_(m.weight, 0, 0.001) 257 | nn.init.constant_(m.bias, 0) 258 | 259 | if __name__ == '__main__': 260 | import time 261 | model = V2VModel(input_channels=32, output_channels=15) 262 | model = model.cuda() 263 | 264 | for i in range(10): 265 | input_tensor = torch.randn(8, 32, 64, 64, 64).cuda() 266 | start_time = time.time() 267 | output_tensor = model(input_tensor) 268 | end_time = time.time() 269 | print('time for one batch: {}'.format(end_time - start_time)) 270 | # print(output_tensor.shape) -------------------------------------------------------------------------------- /network/voxel_net_depth.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from copy import copy 3 | import cv2 4 | 5 | # from utils_proj.fisheye.FishEyeCalibrated import FishEyeCameraCalibrated 6 | # from utils_proj.fisheye.FishEyeEquisolid import FishEyeCameraEquisolid 7 | 8 | import torch 9 | from torch import nn 10 | 11 | from utils import op 12 | from torch.nn.functional import interpolate 13 | 14 | from network import pose_resnet 15 | from network.v2v import V2VModel 16 | from utils.fisheye.FishEyeCalibrated import FishEyeCameraCalibrated 17 | 18 | 19 | class VoxelNetwork_depth(nn.Module): 20 | def __init__(self, config, device='cuda'): 21 | super(VoxelNetwork_depth, self).__init__() 22 | 23 | self.device = device 24 | self.num_joints = config.model.backbone.num_joints 25 | 26 | # volume 27 | self.volume_softmax = config.model.volume_softmax 28 | self.volume_multiplier = config.model.volume_multiplier 29 | self.volume_size = config.model.volume_size 30 | 31 | self.cuboid_side = config.model.cuboid_side 32 | 33 | self.kind = config.model.kind 34 | 35 | # heatmap 36 | self.heatmap_softmax = config.model.heatmap_softmax 37 | self.heatmap_multiplier = config.model.heatmap_multiplier 38 | 39 | # # transfer 40 | # self.transfer_cmu_to_human36m = config.model.transfer_cmu_to_human36m if hasattr(config.model, "transfer_cmu_to_human36m") else False 41 | 42 | load_checkpoint = config.model.backbone.local_checkpoint 43 | if load_checkpoint: 44 | network_path = config.model.backbone.checkpoint 45 | network_loads = torch.load(network_path) 46 | 47 | self.backbone = pose_resnet.get_pose_net(state_dict=network_loads['state_dict']) 48 | else: 49 | print('Do not load checkpoint') 50 | self.backbone = pose_resnet.get_pose_net(None) 51 | self.backbone = self.backbone.to(device) 52 | 53 | if config.opt.train_2d is False: 54 | for p in self.backbone.parameters(): 55 | p.requires_grad = False 56 | 57 | # resize and pad feature for reprojection 58 | self.process_features = nn.Sequential( 59 | nn.Conv2d(256, 32, 1), 60 | nn.Upsample(size=(1024, 1024)), 61 | nn.ConstantPad2d(padding=(128, 128, 0, 0), value=0.0) 62 | ) 63 | self.process_features = self.process_features.to(device) 64 | 65 | self.with_scene = config.model.with_scene 66 | if config.model.with_scene is True: 67 | if config.model.with_intersection is True: 68 | volume_input_channel_num = 32 + 1 + 32 69 | self.with_intersection = True 70 | else: 71 | volume_input_channel_num = 32 + 1 72 | self.with_intersection = False 
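# volume_input_channel_num: 32 unprojected image-feature channels plus 1 scene-occupancy channel,
# plus another 32 channels for the feature/scene intersection volume when with_intersection is enabled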
73 | else: 74 | volume_input_channel_num = 32 75 | 76 | self.volume_net = V2VModel(volume_input_channel_num, self.num_joints) 77 | self.volume_net = self.volume_net.to(device) 78 | 79 | print('build coord volume') 80 | self.coord_volume = self.build_coord_volume() 81 | self.coord_volumes = self.coord_volume.unsqueeze(0).expand(config.opt.batch_size, 82 | -1, -1, -1, -1) 83 | self.coord_volumes = self.coord_volumes.to(device) 84 | 85 | self.fisheye_camera_model = FishEyeCameraCalibrated( 86 | calibration_file_path=config.dataset.camera_calibration_path) 87 | print('build reprojected grid coord') 88 | self.grid_coord_proj = op.get_projected_2d_points_with_coord_volumes(fisheye_model=self.fisheye_camera_model, 89 | coord_volume=self.coord_volume) 90 | self.grid_coord_proj.requires_grad = False 91 | 92 | self.grid_coord_proj_batch = op.get_grid_coord_proj_batch(self.grid_coord_proj, 93 | batch_size=config.opt.batch_size, 94 | heatmap_shape=config.heatmap_shape) 95 | self.grid_coord_proj_batch.requires_grad = False 96 | self.grid_coord_proj_batch = self.grid_coord_proj_batch.to(device) 97 | 98 | # self.fisheye_camera_model_depth = FishEyeCameraEquisolid(focal_length=9, sensor_size=32, img_size=(1280, 1024)) 99 | 100 | # self.ray_torch = self.calculated_ray_direction(config.dataset.image_width, 101 | # config.dataset.image_height).to(device) 102 | # self.ray_torch.requires_grad = False 103 | 104 | self.ray = self.calculated_ray_direction_numpy(config.dataset.image_width, 105 | config.dataset.image_height) 106 | 107 | self.image_width = config.dataset.image_width 108 | self.image_height = config.dataset.image_height 109 | 110 | def build_coord_volume(self): 111 | """ 112 | get coord volume and prepare for the re-projection process 113 | :param self: 114 | :return: 115 | """ 116 | # build coord volumes 117 | sides = np.array([self.cuboid_side, self.cuboid_side, self.cuboid_side]) 118 | 119 | position = np.array([-self.cuboid_side / 2, -self.cuboid_side / 2, 0]) 120 | # build coord volume 121 | xxx, yyy, zzz = torch.meshgrid(torch.arange(self.volume_size), 122 | torch.arange(self.volume_size), 123 | torch.arange(self.volume_size)) 124 | grid = torch.stack([xxx, yyy, zzz], dim=-1).type(torch.float) 125 | grid = grid.reshape((-1, 3)) 126 | 127 | grid_coord = torch.zeros_like(grid) 128 | grid_coord[:, 0] = position[0] + (sides[0] / (self.volume_size - 1)) * grid[:, 0] 129 | grid_coord[:, 1] = position[1] + (sides[1] / (self.volume_size - 1)) * grid[:, 1] 130 | grid_coord[:, 2] = position[2] + (sides[2] / (self.volume_size - 1)) * grid[:, 2] 131 | 132 | coord_volume = grid_coord.reshape(self.volume_size, self.volume_size, self.volume_size, 3) 133 | 134 | return coord_volume 135 | 136 | def calculated_ray_direction(self, image_width, image_height): 137 | points = np.zeros(shape=(image_width, image_height, 2)) 138 | x_range = np.array(range(image_width)) 139 | y_range = np.array(range(image_height)) 140 | points[:, :, 0] = np.add(points[:, :, 0].transpose(), x_range).transpose() 141 | points[:, :, 1] = np.add(points[:, :, 1], y_range) 142 | points = points.reshape((-1, 2)) 143 | ray = self.fisheye_camera_model.camera2world_ray(points) 144 | ray_torch = torch.from_numpy(ray) 145 | return ray_torch 146 | 147 | def calculated_ray_direction_numpy(self, image_width, image_height): 148 | points = np.zeros(shape=(image_width, image_height, 2)) 149 | x_range = np.array(range(image_width)) 150 | y_range = np.array(range(image_height)) 151 | points[:, :, 0] = np.add(points[:, :, 0].transpose(), 
x_range).transpose() 152 | points[:, :, 1] = np.add(points[:, :, 1], y_range) 153 | points = points.reshape((-1, 2)) 154 | ray = self.fisheye_camera_model.camera2world_ray(points) 155 | return ray 156 | 157 | def depth_to_voxel_pytorch(self, depth): 158 | # directly multiply the depth on the pre-calculated rays 159 | # resize depth to (image_width, image_height) 160 | depth = depth.unsqueeze(0) 161 | depth = interpolate(depth, (self.image_height, self.image_height), mode='nearest') 162 | depth = torch.nn.functional.pad(depth, pad=[128, 128]) 163 | depth = depth.squeeze(0)[0] 164 | print(depth.shape) 165 | depth_img_flat = depth.transpose(0, 1).reshape(-1) 166 | point_cloud = self.ray_torch.transpose(0, 1) * depth_img_flat 167 | point_cloud = point_cloud.transpose(0, 1) 168 | 169 | # point cloud to voxel 170 | 171 | voxel_torch = self.point_cloud_to_voxel_pytorch(point_cloud) 172 | return voxel_torch 173 | 174 | def point_cloud_to_voxel_pytorch(self, point_cloud): 175 | scene_point_cloud_local = point_cloud 176 | 177 | scene_point_cloud_local[:, 0] = (scene_point_cloud_local[:, 178 | 0] + self.cuboid_side / 2) * self.volume_size / self.cuboid_side 179 | scene_point_cloud_local[:, 1] = (scene_point_cloud_local[:, 180 | 1] + self.cuboid_side / 2) * self.volume_size / self.cuboid_side 181 | scene_point_cloud_local[:, 2] = (scene_point_cloud_local[:, 2]) * self.volume_size / self.cuboid_side 182 | 183 | scene_point_cloud_local = torch.round(scene_point_cloud_local).long() 184 | 185 | # note: the gradient is zero here!!! 186 | good_indices = torch.logical_and(self.volume_size-1 >= scene_point_cloud_local, scene_point_cloud_local >= 0) 187 | good_indices = torch.all(good_indices, dim=1) 188 | scene_point_cloud_local = scene_point_cloud_local[good_indices] 189 | # scene_point_cloud_local = np.clip(scene_point_cloud_local, a_min=0, a_max=self.volume_size - 1).astype(np.int) 190 | voxel_torch = torch.zeros(size=(self.volume_size, self.volume_size, self.volume_size)).to(self.device) 191 | voxel_torch[scene_point_cloud_local.cpu().numpy().T] = 1 192 | return voxel_torch 193 | 194 | def depth_map_to_voxel_numpy(self, depth): 195 | # directly multiply the depth on the pre-calculated rays 196 | depth = depth.view(depth.size(-2), depth.size(-1)).cpu().detach().numpy() 197 | depth = cv2.resize(depth, dsize=(1024, 1024), interpolation=cv2.INTER_NEAREST) 198 | depth = np.pad(depth, ((0, 0), (128, 128)), 'constant', constant_values=0) 199 | depth_img_flat = depth.T.reshape((-1)) 200 | point_cloud = self.ray.T * depth_img_flat 201 | point_cloud = point_cloud.T 202 | 203 | # point cloud to voxel 204 | voxel_torch = self.point_cloud_to_voxel_numpy(point_cloud) 205 | return voxel_torch 206 | 207 | def point_cloud_to_voxel_numpy(self, point_cloud): 208 | scene_point_cloud_local = copy(point_cloud) 209 | scene_point_cloud_local[:, 0] = (scene_point_cloud_local[:, 210 | 0] + self.cuboid_side / 2) * self.volume_size / self.cuboid_side 211 | scene_point_cloud_local[:, 1] = (scene_point_cloud_local[:, 212 | 1] + self.cuboid_side / 2) * self.volume_size / self.cuboid_side 213 | scene_point_cloud_local[:, 2] = (scene_point_cloud_local[:, 2]) * self.volume_size / self.cuboid_side 214 | 215 | scene_point_cloud_local = np.round_(scene_point_cloud_local) 216 | good_indices = np.logical_and(self.volume_size-1 >= scene_point_cloud_local, scene_point_cloud_local >= 0) 217 | good_indices = np.all(good_indices, axis=1) 218 | scene_point_cloud_local = scene_point_cloud_local[good_indices] 219 | # scene_point_cloud_local = 
np.clip(scene_point_cloud_local, a_min=0, a_max=self.volume_size - 1).astype(np.int) 220 | voxel_torch = torch.zeros(size=(self.volume_size, self.volume_size, self.volume_size)).to(self.device) 221 | voxel_torch[scene_point_cloud_local.T] = 1 222 | return voxel_torch 223 | 224 | def forward(self, images, grid_coord_proj_batch, coord_volumes, scene_volumes=None, depth_map_batch=None): 225 | """ 226 | side: the length of volume square side, like 2 meters or 3 meters 227 | volume_size: the number of grid, like we have 64 or 32 grids for each side 228 | :param images: 229 | :return: 230 | """ 231 | device = images.device 232 | batch_size = images.shape[0] 233 | 234 | # forward backbone 235 | heatmaps, features = self.backbone(images) 236 | 237 | # process features before unprojecting 238 | features = self.process_features(features) 239 | 240 | # lift to volume 241 | if features.shape[0] < grid_coord_proj_batch.shape[0]: 242 | grid_coord_proj_batch = grid_coord_proj_batch[:features.shape[0]] 243 | volumes = op.unproject_heatmaps_one_view_batch(features, grid_coord_proj_batch, self.volume_size) 244 | 245 | if self.with_scene is True: 246 | if scene_volumes is not None: 247 | # combine scene volume with project pose volume 248 | scene_volumes = torch.unsqueeze(scene_volumes, dim=1) 249 | volumes = torch.cat([volumes, scene_volumes], dim=1) 250 | elif depth_map_batch is not None: 251 | voxel_list = [] 252 | for depth_map in depth_map_batch: 253 | voxel_torch = self.depth_map_to_voxel_numpy(depth_map) 254 | # show_voxel_torch(voxel_torch) 255 | voxel_list.append(voxel_torch) 256 | scene_volumes = torch.stack(voxel_list, dim=0) 257 | scene_volumes = torch.unsqueeze(scene_volumes, dim=1) 258 | if self.with_intersection is True: 259 | intersection = volumes * scene_volumes 260 | volumes = torch.cat([volumes, intersection, scene_volumes], dim=1) 261 | else: 262 | volumes = torch.cat([volumes, scene_volumes], dim=1) 263 | else: 264 | print("no scene volume or depth input!") 265 | return None 266 | 267 | # integral 3d 268 | volumes = self.volume_net(volumes) 269 | if volumes.shape[0] < coord_volumes.shape[0]: 270 | coord_volumes = coord_volumes[:volumes.shape[0]] 271 | vol_keypoints_3d, volumes = op.integrate_tensor_3d_with_coordinates(volumes * self.volume_multiplier, 272 | coord_volumes, 273 | softmax=self.volume_softmax) 274 | 275 | return vol_keypoints_3d, features, volumes, self.coord_volumes 276 | 277 | 278 | def run_voxel_net(): 279 | image_batch = torch.ones(size=(4, 3, 256, 256)) 280 | from utils import cfg 281 | config_path = 'experiments/mo2cap2/train/mo2cap2_vol_softmax.yaml' 282 | config = cfg.load_config(config_path) 283 | projection_network = VoxelNetwork_depth(config=config, device='cuda') 284 | 285 | vol_keypoints_3d, features, volumes, coord_volumes = projection_network(image_batch) 286 | print(vol_keypoints_3d.shape) 287 | 288 | 289 | def show_voxel_torch(voxel): 290 | import matplotlib.pyplot as plt 291 | # show voxel result 292 | voxel_np = voxel.cpu().numpy() 293 | fig = plt.figure() 294 | ax = fig.gca(projection='3d') 295 | # ax.set_aspect('equal') 296 | 297 | ax.voxels(voxel_np, edgecolor="k") 298 | plt.savefig('tmp/0.png') 299 | exit(0) 300 | # plt.show() 301 | 302 | def calculate_max_position(): 303 | volumes = torch.ones(size=(4, 15, 64, 64, 64)) * 0 304 | 305 | volumes[:, :, 32, 32, 32] = 1 306 | volumes[:, :, 31, 31, 31] = 1 307 | 308 | from utils import cfg 309 | config_path = '../experiments/mo2cap2/train/mo2cap2_vol_softmax.yaml' 310 | config = cfg.load_config(config_path) 
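# the network below is built only to reuse its precomputed coord_volumes for this soft-argmax sanity check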
311 | projection_network = VoxelNetwork_depth(config=config, device='cpu') 312 | 313 | coord_volumes = projection_network.coord_volumes 314 | 315 | vol_keypoints_3d, volumes = op.integrate_tensor_3d_with_coordinates(volumes, 316 | coord_volumes, 317 | softmax=True) 318 | 319 | print(vol_keypoints_3d) 320 | print(vol_keypoints_3d.shape) 321 | 322 | 323 | def visualize_grid_coord_proj(): 324 | from utils import cfg 325 | from visualization.visualization_projected_grid import draw_points 326 | import cv2 327 | config_path = 'experiments/mo2cap2/train/mo2cap2_vol_softmax.yaml' 328 | config = cfg.load_config(config_path) 329 | projection_network = VoxelNetwork_depth(config=config, device='cpu') 330 | 331 | grid_coord_proj = projection_network.grid_coord_proj 332 | 333 | canvas = cv2.imread( 334 | r'\\winfs-inf\HPS\Mo2Cap2Plus1\static00\EgocentricData\old_data\kitchen_2\imgs\img_-04032020185303-3.jpg') 335 | 336 | canvas = draw_points(canvas, grid_coord_proj) 337 | 338 | cv2.imshow('img', canvas) 339 | cv2.waitKey(0) 340 | 341 | 342 | def test_reproject_feature(): 343 | from utils import cfg 344 | import cv2 345 | config_path = 'experiments/local/train/mo2cap2_vol_softmax.yaml' 346 | config = cfg.load_config(config_path) 347 | projection_network = VoxelNetwork_depth(config=config, device='cpu') 348 | 349 | # features = torch.zeros(size=(4, 15, 1024, 1280)) 350 | # 351 | # features[:, :, 490:570, 620:700] = 10 352 | 353 | # read feature from egocentric map 354 | 355 | heatmap_path = r'X:\Mo2Cap2Plus1\static00\EgocentricData\REC23102020\studio-jian1\da_external_layer4\img-10232020185916-700.mat' 356 | from scipy.io import loadmat 357 | heatmap = loadmat(heatmap_path) 358 | heatmap = heatmap['heatmap'] 359 | # recover heatmap 360 | heatmap = cv2.resize(heatmap, dsize=(1024, 1024), interpolation=cv2.INTER_NEAREST) 361 | heatmap = np.pad(heatmap, ((0, 0), (128, 128), (0, 0)), mode='edge') 362 | heatmap = heatmap.transpose((2, 0, 1)) 363 | 364 | heatmap = np.expand_dims(heatmap, axis=0) 365 | 366 | features = torch.from_numpy(heatmap) 367 | 368 | grid_coord_proj_batch = op.get_grid_coord_proj_batch(projection_network.grid_coord_proj, 369 | batch_size=features.shape[0], 370 | heatmap_shape=(1024, 1280), 371 | device='cpu') 372 | 373 | volumes_batch = op.unproject_heatmaps_one_view_batch(features, grid_coord_proj_batch, 374 | projection_network.volume_size) 375 | 376 | volumes = op.unproject_heatmaps_one_view(features, projection_network.grid_coord_proj, 377 | projection_network.volume_size) 378 | 379 | print(torch.sum(torch.abs(volumes - volumes_batch))) 380 | # # visualize volume 381 | # print(volumes.shape) 382 | # for i in range(15): 383 | # volume_single = volumes[0][i] 384 | # 385 | # zoomed_volume_single = zoom(volume_single, 0.5) * 100 386 | # point_cloud = visualize_3D_grid_single_grid(zoomed_volume_single) 387 | # coord = open3d.geometry.TriangleMesh.create_coordinate_frame(size=10) 388 | # open3d.visualization.draw_geometries([point_cloud, coord]) 389 | 390 | 391 | if __name__ == '__main__': 392 | # visualize_grid_coord_proj() 393 | # test_reproject_feature() 394 | # calculate_max_position() 395 | run_voxel_net() 396 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | numpy 2 | open3d 3 | tqdm 4 | opencv-python 5 | scipy 6 | matplotlib 7 | natsort -------------------------------------------------------------------------------- /resources/Wang_CVPR_2023.gif: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/resources/Wang_CVPR_2023.gif -------------------------------------------------------------------------------- /test.py: -------------------------------------------------------------------------------- 1 | import os 2 | from pprint import pprint 3 | import torch 4 | from torch.utils.data import DataLoader 5 | from tqdm import tqdm 6 | 7 | from dataset.test_dataset import TestDataset 8 | from network.voxel_net_depth import VoxelNetwork_depth 9 | from utils import cfg 10 | from utils.skeleton import Skeleton 11 | 12 | os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1" 13 | 14 | 15 | class Test: 16 | def __init__(self, config, seq_name, estimated_depth_name=None): 17 | self.device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu') 18 | self.context_dataset = TestDataset(config, seq_name=seq_name, voxel_output=False, 19 | estimated_depth_name=estimated_depth_name) 20 | self.context_dataloader = DataLoader(self.context_dataset, batch_size=config.test.batch_size, shuffle=False, 21 | drop_last=False, num_workers=1) 22 | 23 | self.network = VoxelNetwork_depth(config, device=self.device) 24 | 25 | # load the network model 26 | model_path = config.test.model_path 27 | loads = torch.load(model_path) 28 | self.network.load_state_dict(loads['state_dict']) 29 | 30 | self.network = self.network.to(self.device) 31 | 32 | self.skeleton = Skeleton(calibration_path=config.dataset.camera_calibration_path) 33 | 34 | def run(self, config): 35 | print('---------------------Start Testing-----------------------') 36 | pprint(config.__dict__) 37 | self.network.eval() 38 | 39 | predicted_joint_list = [] 40 | with torch.no_grad(): 41 | for i, (img, img_rgb, depth_info, img_path) in tqdm(enumerate(self.context_dataloader)): 42 | img = img.to(self.device) 43 | img_rgb = img_rgb.to(self.device) 44 | 45 | depth_info = depth_info.to(self.device) 46 | 47 | grid_coord_proj_batch = self.network.grid_coord_proj_batch 48 | coord_volumes = self.network.coord_volumes 49 | 50 | vol_keypoints_3d, features, volumes, coord_volumes = self.network(img, grid_coord_proj_batch, 51 | coord_volumes, 52 | depth_map_batch=depth_info) 53 | 54 | predicted_keypoints_batch = vol_keypoints_3d.cpu().numpy() 55 | 56 | predicted_joint_list.extend(predicted_keypoints_batch) 57 | # print(len(predicted_joint_list)) 58 | 59 | return predicted_joint_list 60 | 61 | 62 | if __name__ == '__main__': 63 | config_path = 'experiments/sceneego/test/sceneego.yaml' 64 | import pickle 65 | 66 | seq_name = 'new_diogo1' 67 | 68 | config = cfg.load_config(config_path) 69 | test = Test(config, seq_name, estimated_depth_name='matterport_green') 70 | predicted_joint_list = test.run(config) 71 | 72 | mpjpe, pampjpe = test.context_dataset.evaluate_mpjpe(predicted_joint_list) 73 | 74 | print('mpjpe: {}'.format(mpjpe)) 75 | print('pa mpjpe: {}'.format(pampjpe)) 76 | 77 | # save predicted joint list 78 | 79 | save_dir = r'/HPS/ScanNet/work/egocentric_view/25082022/diogo1/out' 80 | if not os.path.isdir(save_dir): 81 | os.makedirs(save_dir) 82 | save_path = os.path.join(save_dir, 'no_body_diogo1.pkl') 83 | 84 | save_obj = predicted_joint_list 85 | with open(save_path, 'wb') as f: 86 | pickle.dump(save_obj, f) 87 | -------------------------------------------------------------------------------- /utils/__init__.py:
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/utils/__init__.py -------------------------------------------------------------------------------- /utils/calculate_errors.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | from utils.rigid_transform_with_scale import umeyama 4 | from utils.skeleton import Skeleton 5 | from copy import deepcopy 6 | 7 | 8 | def global_align_skeleton_seq(estimated_seq, gt_seq): 9 | estimated_seq = np.asarray(estimated_seq).reshape((-1, 3)) 10 | gt_seq = np.asarray(gt_seq).reshape((-1, 3)) 11 | # aligned_pose_list = np.zeros_like(estimated_seq) 12 | # for s in range(estimated_seq.shape[0]): 13 | # pose_p = estimated_seq[s] 14 | # pose_gt_bs = gt_seq[s] 15 | c, R, t = umeyama(estimated_seq, gt_seq) 16 | pose_p = estimated_seq.dot(R) * c + t 17 | # aligned_pose_list[s] = pose_p 18 | 19 | return pose_p.reshape((-1, 15, 3)) 20 | 21 | 22 | def calculate_error(estimated_seq, gt_seq): 23 | estimated_seq = np.asarray(estimated_seq) 24 | gt_seq = np.asarray(gt_seq) 25 | distance = estimated_seq - gt_seq 26 | distance = np.linalg.norm(distance, axis=2) 27 | m_distance = np.mean(distance) 28 | return m_distance 29 | 30 | 31 | def calculate_slam_error(estimated_seq, gt_seq, align=False): 32 | # seq shape: n_seq, 15, 3 33 | estimated_seq = np.asarray(estimated_seq) 34 | gt_seq = np.asarray(gt_seq) 35 | estimated_root_seq = (estimated_seq[:, 7, :] + estimated_seq[:, 11, :]) / 2 36 | gt_root_seq = (gt_seq[:, 7, :] + gt_seq[:, 11, :]) / 2 37 | 38 | if align is True: 39 | c, R, t = umeyama(estimated_root_seq, gt_root_seq) 40 | estimated_root_seq = estimated_root_seq.dot(R) * c + t 41 | 42 | distance = estimated_root_seq - gt_root_seq 43 | distance = np.linalg.norm(distance, axis=1) 44 | m_distance = np.mean(distance) 45 | return m_distance 46 | 47 | def align_skeleton_size(estimated_seq, gt_seq): 48 | estimated_seq = deepcopy(np.asarray(estimated_seq)) 49 | gt_seq = deepcopy(np.asarray(gt_seq)) 50 | aligned_pose_list = np.zeros_like(estimated_seq) 51 | for s in range(estimated_seq.shape[0]): 52 | pose_p = estimated_seq[s] 53 | pose_gt_bs = gt_seq[s] 54 | c, R, t = umeyama(pose_p, pose_gt_bs) 55 | pose_p = pose_p * c 56 | aligned_pose_list[s] = pose_p 57 | 58 | return aligned_pose_list 59 | 60 | def align_skeleton(estimated_seq, gt_seq, skeleton_model=None, scale=True): 61 | estimated_seq = deepcopy(np.asarray(estimated_seq)) 62 | gt_seq = deepcopy(np.asarray(gt_seq)) 63 | if skeleton_model is not None: 64 | for i in range(len(estimated_seq)): 65 | estimated_seq[i] = skeleton_model.skeleton_resize_single( 66 | estimated_seq[i], 67 | bone_length_file='utils/fisheye/mean3D.mat') 68 | for i in range(len(gt_seq)): 69 | gt_seq[i] = skeleton_model.skeleton_resize_single( 70 | gt_seq[i], 71 | bone_length_file='utils/fisheye/mean3D.mat') 72 | 73 | aligned_pose_list = np.zeros_like(estimated_seq) 74 | for s in range(estimated_seq.shape[0]): 75 | pose_p = estimated_seq[s] 76 | pose_gt_bs = gt_seq[s] 77 | if scale is False: 78 | # if scale is False, first align the center of each pose 79 | pose_p_center = np.mean(pose_p, axis=0) 80 | pose_gt_center = np.mean(pose_gt_bs, axis=0) 81 | pose_p -= pose_p_center 82 | pose_gt_bs -= pose_gt_center 83 | 84 | c, R, t = umeyama(pose_p, pose_gt_bs) 85 | if scale is True: 86 | pose_p = pose_p.dot(R) * c + t 87 | else: 88 | pose_p = pose_p.dot(R) + t
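# umeyama returns scale c, rotation R and translation t; with scale=True the full similarity
# transform is applied, otherwise only the rigid (rotation + translation) part is used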
89 | aligned_pose_list[s] = pose_p 90 | 91 | return aligned_pose_list, gt_seq 92 | 93 | 94 | def calculate_joint_error(estimated_seq, gt_seq): 95 | estimated_seq = np.asarray(estimated_seq) 96 | gt_seq = np.asarray(gt_seq) 97 | distance = estimated_seq - gt_seq 98 | distance = np.linalg.norm(distance, axis=2) 99 | joints_distance = np.mean(distance, axis=0) 100 | return joints_distance 101 | 102 | 103 | def calculate_errors(final_estimated_seq, mid_estimated_seq, final_optimized_seq=None, final_gt_seq=None): 104 | skeleton_model = Skeleton(None) 105 | original_global_mpjpe = calculate_error(final_estimated_seq, final_gt_seq) 106 | mid_global_mpjpe = calculate_error(mid_estimated_seq, final_gt_seq) 107 | optimized_global_mpjpe = calculate_error(final_optimized_seq, final_gt_seq) 108 | 109 | original_camera_pos_error = calculate_slam_error(final_estimated_seq, final_gt_seq) 110 | optimized_camera_pos_error = calculate_slam_error(final_optimized_seq, final_gt_seq) 111 | 112 | 113 | 114 | # align the estimated result and original result 115 | 116 | aligned_estimated_seq_result = global_align_skeleton_seq(final_estimated_seq, final_gt_seq) 117 | aligned_estimated_mid_seq_result = global_align_skeleton_seq(mid_estimated_seq, final_gt_seq) 118 | aligned_optimized_seq_result = global_align_skeleton_seq(final_optimized_seq, final_gt_seq) 119 | 120 | original_aligned_camera_pos_error = calculate_slam_error(aligned_estimated_seq_result, final_gt_seq, align=False) 121 | mid_aligned_camera_pose_error = calculate_slam_error(aligned_estimated_mid_seq_result, final_gt_seq, align=False) 122 | optimized_aligned_camera_pos_error = calculate_slam_error(aligned_optimized_seq_result, final_gt_seq, align=False) 123 | 124 | aligned_original_seq_mpjpe = calculate_error(aligned_estimated_seq_result, final_gt_seq) 125 | aligned_mid_seq_mpjpe = calculate_error(aligned_estimated_mid_seq_result, final_gt_seq) 126 | aligned_optimized_seq_mpjpe = calculate_error(aligned_optimized_seq_result, final_gt_seq) 127 | 128 | # align the estimated result and original result 129 | aligned_estimated_result, final_gt_seq = align_skeleton(final_estimated_seq, final_gt_seq, None) 130 | aligned_mid_optimized_result, final_gt_seq = align_skeleton(mid_estimated_seq, final_gt_seq, None) 131 | aligned_optimized_result, final_gt_seq = align_skeleton(final_optimized_seq, final_gt_seq, None) 132 | 133 | aligned_original_mpjpe = calculate_error(aligned_estimated_result, final_gt_seq) 134 | aligned_mid_optimized_mpjpe = calculate_error(aligned_mid_optimized_result, final_gt_seq) 135 | aligned_optimized_mpjpe = calculate_error(aligned_optimized_result, final_gt_seq) 136 | 137 | # align the estimated result and original result 138 | aligned_estimated_result, final_gt_seq = align_skeleton(final_estimated_seq, final_gt_seq, skeleton_model) 139 | aligned_mid_optimized_result, final_gt_seq = align_skeleton(mid_estimated_seq, final_gt_seq, skeleton_model) 140 | aligned_optimized_result, final_gt_seq = align_skeleton(final_optimized_seq, final_gt_seq, skeleton_model) 141 | 142 | bone_length_aligned_original_mpjpe = calculate_error(aligned_estimated_result, final_gt_seq) 143 | bone_length_aligned_mid_optimized_mpjpe = calculate_error(aligned_mid_optimized_result, final_gt_seq) 144 | bone_length_aligned_optimized_mpjpe = calculate_error(aligned_optimized_result, final_gt_seq) 145 | joints_error = calculate_joint_error(aligned_optimized_result, final_gt_seq) 146 | 147 | from collections import OrderedDict 148 | result = 
OrderedDict({'original_global_mpjpe': original_global_mpjpe, 149 | 'mid_global_mpjpe': mid_global_mpjpe, 150 | 'optimized_global_mpjpe': optimized_global_mpjpe, 151 | 'original_camera_pos_error': original_camera_pos_error, 152 | 'optimized_camera_pos_error': optimized_camera_pos_error, 153 | 154 | 'original_aligned_camera_pos_error': original_aligned_camera_pos_error, 155 | 'mid_aligned_camera_pose_error': mid_aligned_camera_pose_error, 156 | 'optimized_aligned_camera_pos_error': optimized_aligned_camera_pos_error, 157 | 158 | 'original_aligned_global_mpjpe': aligned_original_seq_mpjpe, 159 | "aligned_mid_seq_mpjpe": aligned_mid_seq_mpjpe, 160 | 'optimized_aligned_global_mpjpe': aligned_optimized_seq_mpjpe, 161 | 'aligned_original_mpjpe': aligned_original_mpjpe, 162 | 'aligned_mid_optimized_mpjpe': aligned_mid_optimized_mpjpe, 163 | 'aligned_optimized_mpjpe': aligned_optimized_mpjpe, 164 | 'bone_length_aligned_original_mpjpe': bone_length_aligned_original_mpjpe, 165 | 'bone_length_aligned_mid_optimized_mpjpe': bone_length_aligned_mid_optimized_mpjpe, 166 | 'bone_length_aligned_optimized_mpjpe': bone_length_aligned_optimized_mpjpe, 167 | 'joints_error': joints_error}) 168 | return result 169 | -------------------------------------------------------------------------------- /utils/cfg.py: -------------------------------------------------------------------------------- 1 | import yaml 2 | from easydict import EasyDict as edict 3 | 4 | 5 | def load_config(path): 6 | with open(path) as fin: 7 | config = edict(yaml.safe_load(fin)) 8 | 9 | return config 10 | -------------------------------------------------------------------------------- /utils/data_transforms.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # Developed by Haozhe Xie 4 | # References: 5 | # - https://github.com/xiumingzhang/GenRe-ShapeHD 6 | 7 | import cv2 8 | import matplotlib.pyplot as plt 9 | import matplotlib.patches as patches 10 | import numpy as np 11 | import os 12 | import random 13 | import torch 14 | 15 | 16 | class Compose(object): 17 | """ Composes several transforms together. 18 | For example: 19 | >>> transforms.Compose([ 20 | >>> transforms.RandomBackground(), 21 | >>> transforms.CenterCrop(127, 127, 3), 22 | >>> ]) 23 | """ 24 | 25 | def __init__(self, transforms): 26 | self.transforms = transforms 27 | 28 | def __call__(self, image, bounding_box=None): 29 | for t in self.transforms: 30 | if t.__class__.__name__ == 'RandomCrop' or t.__class__.__name__ == 'CenterCrop': 31 | image = t(image, bounding_box) 32 | else: 33 | image = t(image) 34 | 35 | return image 36 | 37 | 38 | class ToTensor(object): 39 | """ 40 | Convert a numpy.ndarray to tensor. 
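Expects an HWC numpy array and returns a CHW float tensor.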
41 | """ 42 | 43 | def __call__(self, image): 44 | assert (isinstance(image, np.ndarray)) 45 | # HWC to CHW 46 | array = np.transpose(image, (2, 0, 1)) 47 | # handle numpy array 48 | tensor = torch.from_numpy(array) 49 | return tensor.float() 50 | 51 | 52 | class BGR2RGB: 53 | """ 54 | convert BGR image to RGB image 55 | """ 56 | 57 | def __call__(self, image): 58 | # BGR to RGB 59 | return image[:, :, ::-1] 60 | 61 | 62 | class Normalize(object): 63 | def __init__(self, mean, std): 64 | self.mean = mean 65 | self.std = std 66 | 67 | def __call__(self, image): 68 | assert (isinstance(image, np.ndarray)) 69 | image -= self.mean 70 | image /= self.std 71 | 72 | return image 73 | 74 | 75 | class SimpleNormalize(object): 76 | def __init__(self): 77 | self.mean = 0.4 78 | 79 | def __call__(self, image): 80 | assert (isinstance(image, np.ndarray)) 81 | image -= self.mean 82 | return image 83 | 84 | 85 | class RandomPermuteRGB(object): 86 | """ 87 | Random permute RGB channels??? 88 | """ 89 | 90 | def __call__(self, image): 91 | assert (isinstance(image, np.ndarray)) 92 | 93 | random_permutation = np.random.permutation(3) 94 | image = image[..., random_permutation] 95 | 96 | return image 97 | 98 | 99 | class CenterCrop(object): 100 | def __init__(self, img_size, crop_size): 101 | """Set the height and weight before and after cropping""" 102 | self.img_size_h = img_size[0] 103 | self.img_size_w = img_size[1] 104 | self.crop_size_h = crop_size[0] 105 | self.crop_size_w = crop_size[1] 106 | 107 | def __call__(self, image, bounding_box=None): 108 | 109 | img_height, img_width, _ = image.shape 110 | 111 | if bounding_box is not None: 112 | bounding_box = [ 113 | bounding_box[0], 114 | bounding_box[1], 115 | bounding_box[2], 116 | bounding_box[3] 117 | ] # yapf: disable 118 | 119 | # Calculate the size of bounding boxes 120 | bbox_width = bounding_box[2] - bounding_box[0] 121 | bbox_height = bounding_box[3] - bounding_box[1] 122 | bbox_x_mid = (bounding_box[2] + bounding_box[0]) * .5 123 | bbox_y_mid = (bounding_box[3] + bounding_box[1]) * .5 124 | 125 | # Make the crop area as a square 126 | square_object_size = max(bbox_width, bbox_height) 127 | 128 | x_left = int(bbox_x_mid - square_object_size * .5) 129 | x_right = int(bbox_x_mid + square_object_size * .5) 130 | y_top = int(bbox_y_mid - square_object_size * .5) 131 | y_bottom = int(bbox_y_mid + square_object_size * .5) 132 | 133 | # If the crop position is out of the image, fix it with padding 134 | pad_x_left = 0 135 | if x_left < 0: 136 | pad_x_left = -x_left 137 | x_left = 0 138 | pad_x_right = 0 139 | if x_right >= img_width: 140 | pad_x_right = x_right - img_width + 1 141 | x_right = img_width - 1 142 | pad_y_top = 0 143 | if y_top < 0: 144 | pad_y_top = -y_top 145 | y_top = 0 146 | pad_y_bottom = 0 147 | if y_bottom >= img_height: 148 | pad_y_bottom = y_bottom - img_height + 1 149 | y_bottom = img_height - 1 150 | 151 | # Padding the image and resize the image 152 | processed_image = np.pad( 153 | image[y_top:y_bottom + 1, x_left:x_right + 1], ((pad_y_top, pad_y_bottom), (pad_x_left, pad_x_right), 154 | (0, 0)), 155 | mode='edge') 156 | processed_image = cv2.resize(processed_image, (self.img_size_w, self.img_size_h)) 157 | else: 158 | if img_height > self.crop_size_h and img_width > self.crop_size_w: 159 | x_left = int(img_width - self.crop_size_w) // 2 160 | x_right = int(x_left + self.crop_size_w) 161 | y_top = int(img_height - self.crop_size_h) // 2 162 | y_bottom = int(y_top + self.crop_size_h) 163 | else: 164 | x_left = 0 165 | x_right 
= img_width 166 | y_top = 0 167 | y_bottom = img_height 168 | 169 | processed_image = cv2.resize(image[y_top:y_bottom, x_left:x_right], (self.img_size_w, self.img_size_h)) 170 | 171 | return processed_image 172 | 173 | 174 | class RandomCrop(object): 175 | def __init__(self, img_size, crop_size): 176 | """Set the height and weight before and after cropping""" 177 | self.img_size_h = img_size[0] 178 | self.img_size_w = img_size[1] 179 | self.crop_size_h = crop_size[0] 180 | self.crop_size_w = crop_size[1] 181 | 182 | def __call__(self, image, silhouette=None, bounding_box=None): 183 | 184 | img_height, img_width, crop_size_c = image.shape 185 | 186 | if bounding_box is not None: 187 | bounding_box = [ 188 | bounding_box[0], 189 | bounding_box[1], 190 | bounding_box[2], 191 | bounding_box[3] 192 | ] # yapf: disable 193 | 194 | # Calculate the size of bounding boxes 195 | bbox_width = bounding_box[2] - bounding_box[0] 196 | bbox_height = bounding_box[3] - bounding_box[1] 197 | bbox_x_mid = (bounding_box[2] + bounding_box[0]) * .5 198 | bbox_y_mid = (bounding_box[3] + bounding_box[1]) * .5 199 | 200 | # Make the crop area as a square 201 | square_object_size = max(bbox_width, bbox_height) 202 | square_object_size = square_object_size * random.uniform(0.8, 1.2) 203 | 204 | x_left = int(bbox_x_mid - square_object_size * random.uniform(.4, .6)) 205 | x_right = int(bbox_x_mid + square_object_size * random.uniform(.4, .6)) 206 | y_top = int(bbox_y_mid - square_object_size * random.uniform(.4, .6)) 207 | y_bottom = int(bbox_y_mid + square_object_size * random.uniform(.4, .6)) 208 | 209 | # If the crop position is out of the image, fix it with padding 210 | pad_x_left = 0 211 | if x_left < 0: 212 | pad_x_left = -x_left 213 | x_left = 0 214 | pad_x_right = 0 215 | if x_right >= img_width: 216 | pad_x_right = x_right - img_width + 1 217 | x_right = img_width - 1 218 | pad_y_top = 0 219 | if y_top < 0: 220 | pad_y_top = -y_top 221 | y_top = 0 222 | pad_y_bottom = 0 223 | if y_bottom >= img_height: 224 | pad_y_bottom = y_bottom - img_height + 1 225 | y_bottom = img_height - 1 226 | 227 | # Padding the image and resize the image 228 | processed_image = np.pad( 229 | image[y_top:y_bottom + 1, x_left:x_right + 1], ((pad_y_top, pad_y_bottom), (pad_x_left, pad_x_right), 230 | (0, 0)), mode='edge') 231 | processed_image = cv2.resize(processed_image, (self.img_size_w, self.img_size_h)) 232 | if silhouette is not None: 233 | processed_silhouette = np.pad( 234 | silhouette[y_top:y_bottom + 1, x_left:x_right + 1], 235 | ((pad_y_top, pad_y_bottom), (pad_x_left, pad_x_right), 236 | (0, 0)), mode='constant') 237 | processed_silhouette = cv2.resize(processed_silhouette, (self.img_size_w, self.img_size_h)) 238 | else: 239 | processed_silhouette = None 240 | 241 | 242 | else: 243 | if img_height > self.crop_size_h and img_width > self.crop_size_w: 244 | x_left = int(img_width - self.crop_size_w) // 2 245 | x_right = int(x_left + self.crop_size_w) 246 | y_top = int(img_height - self.crop_size_h) // 2 247 | y_bottom = int(y_top + self.crop_size_h) 248 | else: 249 | x_left = 0 250 | x_right = img_width 251 | y_top = 0 252 | y_bottom = img_height 253 | 254 | processed_image = cv2.resize(image[y_top:y_bottom, x_left:x_right], (self.img_size_w, self.img_size_h)) 255 | if silhouette is not None: 256 | processed_silhouette = cv2.resize(silhouette[y_top:y_bottom, x_left:x_right], 257 | (self.img_size_w, self.img_size_h)) 258 | else: 259 | processed_silhouette = None 260 | 261 | return processed_image, processed_silhouette 262 | 
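# A minimal usage sketch (not part of the original file; the sizes and image path are hypothetical):
# the transforms above are meant to be chained with Compose, e.g. cropping the central 512x512 region
# of a 1024x1280 fisheye frame and resizing it to a 256x256 network input:
#
#     transform = Compose([
#         CenterCrop(img_size=(256, 256), crop_size=(512, 512)),
#         Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
#         ToTensor(),
#     ])
#     img = cv2.imread('data/demo/imgs/img_001000.jpg').astype(np.float32) / 255.
#     img_tensor = transform(img, bounding_box=None)  # 3 x 256 x 256 float tensor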
263 | 264 | class RandomFlip(object): 265 | def __call__(self, image, silhouette=None): 266 | assert (isinstance(image, np.ndarray)) 267 | 268 | if random.randint(0, 1): 269 | image = np.fliplr(image) 270 | if silhouette is not None: 271 | silhouette = np.fliplr(silhouette) 272 | 273 | return image, silhouette 274 | 275 | 276 | class RandomNoise(object): 277 | def __init__(self, 278 | noise_std, 279 | eigvals=(0.2175, 0.0188, 0.0045), 280 | eigvecs=((-0.5675, 0.7192, 0.4009), (-0.5808, -0.0045, -0.8140), (-0.5836, -0.6948, 0.4203))): 281 | self.noise_std = noise_std 282 | self.eigvals = np.array(eigvals) 283 | self.eigvecs = np.array(eigvecs) 284 | 285 | def __call__(self, image): 286 | alpha = np.random.normal(loc=0, scale=self.noise_std, size=3) 287 | noise_rgb = \ 288 | np.sum( 289 | np.multiply( 290 | np.multiply( 291 | self.eigvecs, 292 | np.tile(alpha, (3, 1)) 293 | ), 294 | np.tile(self.eigvals, (3, 1)) 295 | ), 296 | axis=1 297 | ) 298 | 299 | # Allocate new space for storing processed images 300 | img_height, img_width, img_channels = image.shape 301 | assert (img_channels == 3), "Please use RandomBackground to normalize image channels" 302 | for i in range(img_channels): 303 | image[:, :, i] += noise_rgb[i] 304 | 305 | return image 306 | 307 | 308 | class ColorJitter(object): 309 | def __init__(self, color_add, color_mul): 310 | self.color_add_low = 0 311 | self.color_add_high = color_add 312 | self.color_mul_low = 1 - color_mul 313 | self.color_mul_high = 1 + color_mul 314 | 315 | def __call__(self, rendering_image): 316 | color_add = np.random.uniform(self.color_add_low, self.color_add_high, size=(1, 1, 3)) 317 | color_mul = np.random.uniform(self.color_mul_low, self.color_mul_high, size=(1, 1, 3)) 318 | rendering_image = rendering_image + color_add 319 | rendering_image = rendering_image * color_mul 320 | return rendering_image 321 | 322 | 323 | if __name__ == '__main__': 324 | from config import consts 325 | import skimage.io as img_io 326 | 327 | # test random noise 328 | random_noise = RandomNoise(consts.img.noise_std) 329 | image = img_io.imread('/home/wangjian/Develop/3DReconstruction/tmp/bobo.jpg') / 255. 
330 | # img_io.imshow(image) 331 | # img_io.show() 332 | image = random_noise(image) 333 | image = image / np.max(image, (0, 1)) 334 | img_io.imshow(image) 335 | img_io.show() 336 | -------------------------------------------------------------------------------- /utils/depth2pointcloud.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import numpy as np 3 | import open3d 4 | from utils.fisheye.FishEyeCalibrated import FishEyeCameraCalibrated 5 | from utils.fisheye.FishEyeEquisolid import FishEyeCameraEquisolid 6 | import os 7 | 8 | 9 | class Depth2PointCloud: 10 | def __init__(self, visualization, camera_model='utils/fisheye/fisheye.calibration_05_08.json', 11 | post_process=True): 12 | if camera_model == 'FishEyeCameraEquisolid': 13 | self.camera = FishEyeCameraEquisolid(focal_length=9, sensor_size=32, img_size=(1280, 1024)) 14 | else: 15 | self.camera = FishEyeCameraCalibrated(calibration_file_path=camera_model) 16 | self.visualization = visualization 17 | self.post_process = post_process 18 | 19 | def depth2pointcloud(self, camera, depth_img, real_img): 20 | depth_img = depth_img.transpose() 21 | real_img = np.transpose(real_img, axes=(1, 0, 2)) 22 | points = np.zeros(shape=(depth_img.shape[0], depth_img.shape[1], 2)) 23 | x_range = np.array(range(depth_img.shape[0])) 24 | y_range = np.array(range(depth_img.shape[1])) 25 | points[:, :, 0] = np.add(points[:, :, 0].transpose(), x_range).transpose() 26 | points[:, :, 1] = np.add(points[:, :, 1], y_range) 27 | points = points.reshape((-1, 2)) 28 | depth_img_flat = depth_img.reshape((-1)) 29 | # opencv color to RGB color between [0, 1) 30 | colors = real_img[:, :, ::-1] 31 | colors = colors.reshape((-1, 3)) / 255. 32 | points_3d = camera.camera2world(point=points, depth=depth_img_flat) 33 | return points_3d, colors 34 | 35 | 36 | def depth2pointcloud_no_color(self, camera, depth_img): 37 | depth_img = depth_img.transpose() 38 | points = np.zeros(shape=(depth_img.shape[0], depth_img.shape[1], 2)) 39 | x_range = np.array(range(depth_img.shape[0])) 40 | y_range = np.array(range(depth_img.shape[1])) 41 | points[:, :, 0] = np.add(points[:, :, 0].transpose(), x_range).transpose() 42 | points[:, :, 1] = np.add(points[:, :, 1], y_range) 43 | points = points.reshape((-1, 2)) 44 | depth_img_flat = depth_img.reshape((-1)) 45 | # opencv color to RGB color between [0, 1) 46 | points_3d = camera.camera2world(point=points, depth=depth_img_flat) 47 | final_point = [] 48 | for point in points_3d: 49 | if point[2] > 0.1: 50 | final_point.append(point) 51 | return final_point 52 | 53 | def __get_img_mask(self, img_width=1280, img_height=1024): 54 | radius = int(img_height / 2 - 30) 55 | mask = np.zeros(shape=[img_height, img_width, 3]) 56 | cv2.circle(mask, center=(img_width // 2, img_height // 2), radius=radius, color=(255, 255, 255), thickness=-1) 57 | return mask / 255. 
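# postprocess below keeps only points with z > 0.1, dropping zero-depth pixels that would
# otherwise collapse onto the camera origin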
58 | 59 | def postprocess(self, point_3d, colors): 60 | final_point = [] 61 | final_color = [] 62 | for point, color in zip(point_3d, colors): 63 | if point[2] > 0.1: 64 | final_point.append(point) 65 | final_color.append(color) 66 | return final_point, final_color 67 | 68 | def __visualize(self, point_cloud): 69 | mesh_frame = open3d.geometry.TriangleMesh.create_coordinate_frame() 70 | open3d.visualization.draw_geometries([point_cloud, mesh_frame]) 71 | 72 | def get_point_cloud_single_image(self, depth_path, img_path, output_path=None): 73 | depth = cv2.imread(depth_path, cv2.IMREAD_ANYCOLOR | cv2.IMREAD_ANYDEPTH) 74 | depth = cv2.resize(depth, dsize=(1280, 1024), interpolation=cv2.INTER_NEAREST) 75 | if len(depth.shape) == 3: 76 | depth = depth[:, :, 0] 77 | depth[depth > 100] = 0 78 | 79 | img = cv2.imread(img_path) 80 | img = cv2.resize(img, dsize=(1280, 1024), interpolation=cv2.INTER_LINEAR) 81 | 82 | point_3d, colors = self.depth2pointcloud(self.camera, depth, img) 83 | 84 | if self.post_process: 85 | point_3d, colors = self.postprocess(point_3d, colors) 86 | 87 | # visualization 88 | point_cloud = open3d.geometry.PointCloud() 89 | point_cloud.points = open3d.utility.Vector3dVector(point_3d) 90 | point_cloud.colors = open3d.utility.Vector3dVector(colors) 91 | 92 | if self.visualization: 93 | self.__visualize(point_cloud) 94 | if output_path is not None: 95 | open3d.io.write_point_cloud(output_path, point_cloud) 96 | 97 | return point_cloud 98 | 99 | 100 | if __name__ == '__main__': 101 | root_path = r'\\winfs-inf\CT\EgoMocap\work\EgocentricDepthEstimation\data' 102 | img_name_map = {'kitchen': 'fc2_save_2017-11-08-124903-0100.jpg', 103 | 'office': 'fc2_save_2017-11-08-162032-0000.jpg', 104 | 'img': '004.png', 105 | 'kitchen_jian': 'img_-04032020183051-58.jpg', 106 | 'kitchen_2': 'fc2_save_2017-11-08-124903-2300.jpg', 107 | 'office_2': 'fc2_save_2017-11-08-162032-1300.jpg', 108 | 'studio': 'frame_c_0_f_0180.png', 109 | 'office_3': 'fc2_save_2017-11-06-152157-0120.jpg', 110 | 'me': 'me_processed.jpg', 111 | 'scannet': '000.png', 112 | 'kitchen_3': 'fc2_save_2017-11-08-124903-0793.jpg', 113 | 'real': '0.png', 114 | 'shakehead': '2.jpg', 115 | 'kitchen_rot': '1.jpg', 116 | 'kitchen_seq': 'fc2_save_2017-11-08-124903-0108.jpg', 117 | 'synthetic_data': '000010.png', 118 | 'synthetic_data_2': '000000.png', 119 | 'jian3': 'img-10082020170800-793.jpg'} 120 | scene_dir = r'kitchen' 121 | img_name = img_name_map[scene_dir] 122 | depth_name = img_name + '.exr' 123 | depth_dir = 'wo_body_lr_1e-4_finetune' 124 | depth_path = os.path.join(root_path, scene_dir, depth_dir, depth_name) 125 | img_path = os.path.join(root_path, scene_dir, img_name) 126 | 127 | get_point_cloud = Depth2PointCloud(visualization=True) 128 | 129 | get_point_cloud.get_point_cloud_single_image(depth_path, img_path, 130 | output_path=os.path.join(r'F:\Develop\egocentricdepthestimation\reconstructed_scene_pointcloud', 131 | '{}_{}_{}.ply'.format(scene_dir, depth_dir, depth_name))) 132 | -------------------------------------------------------------------------------- /utils/fisheye/FishEyeCalibrated.py: -------------------------------------------------------------------------------- 1 | import json 2 | import numpy as np 3 | import torch 4 | from copy import deepcopy 5 | 6 | 7 | class FishEyeCameraCalibrated: 8 | def __init__(self, calibration_file_path, use_gpu=False): 9 | with open(calibration_file_path) as f: 10 | calibration_data = json.load(f) 11 | self.intrinsic = np.array(calibration_data['intrinsic']) 12 | self.img_size = 
np.array(calibration_data['size'])  # w, h 13 | self.fisheye_polynomial = np.array(calibration_data['polynomialC2W']) 14 | self.fisheye_inverse_polynomial = np.array(calibration_data['polynomialW2C']) 15 | self.img_center = np.array([self.intrinsic[0][2], self.intrinsic[1][2]]) 16 | self.use_gpu = use_gpu 17 | # self.img_center = np.array([self.img_size[0] / 2, self.img_size[1] / 2]) 18 | 19 | def camera2world(self, point: np.ndarray, depth: np.ndarray): 20 | """ 21 | point: np.ndarray of 2D points on image (n * 2) 22 | depth: np.ndarray of depth of every 2D point (n) 23 | """ 24 | depth = depth.astype(np.float32) 25 | point_centered = point.astype(np.float32) - self.img_center 26 | x = point_centered[:, 0] 27 | y = point_centered[:, 1] 28 | distance_from_center = np.sqrt(np.square(x) + np.square(y)) 29 | 30 | z = np.polyval(p=self.fisheye_polynomial[::-1], x=distance_from_center) 31 | point_3d = np.array([x, y, -z]) # 3, n 32 | norm = np.linalg.norm(point_3d, axis=0) 33 | point_3d = point_3d / norm * depth 34 | return point_3d.transpose() 35 | 36 | def camera2world_ray(self, point: np.ndarray): 37 | """ 38 | calculate the ray direction from the image points 39 | point: np.ndarray of 2D points on image (n * 2) 40 | return: unit ray direction of every 2D point (n * 3) 41 | """ 42 | point_centered = point.astype(np.float64) - self.img_center 43 | x = point_centered[:, 0] 44 | y = point_centered[:, 1] 45 | distance_from_center = np.sqrt(np.square(x) + np.square(y)) 46 | 47 | z = np.polyval(p=self.fisheye_polynomial[::-1], x=distance_from_center) 48 | point_3d = np.array([x, y, -z]) # 3, n 49 | norm = np.linalg.norm(point_3d, axis=0) 50 | point_3d = point_3d / norm 51 | return point_3d.transpose() 52 | 53 | def getPolyVal(self, p, x): 54 | curVal = torch.zeros_like(x) 55 | for curValIndex in range(len(p) - 1): 56 | curVal = (curVal + p[curValIndex]) * x 57 | return curVal + p[len(p) - 1] 58 | 59 | def camera2world_pytorch(self, point: torch.Tensor, depth: torch.Tensor): 60 | """ 61 | point: torch.Tensor of 2D points on image (n * 2) 62 | depth: torch.Tensor of depth of every 2D point (n) 63 | """ 64 | img_center = torch.from_numpy(self.img_center).float().to(point.device) 65 | point_centered = point.float() - img_center 66 | x = point_centered[:, 0] 67 | y = point_centered[:, 1] 68 | distance_from_center = torch.sqrt(torch.square(x) + torch.square(y)) 69 | 70 | z = self.getPolyVal(p=self.fisheye_polynomial[::-1], x=distance_from_center) 71 | point_3d = torch.stack([x, y, -z]).float() # 3, n 72 | norm = torch.norm(point_3d, dim=0) 73 | point_3d = point_3d / norm * depth.float() 74 | return point_3d.t() 75 | 76 | def camera2world_ray_pytorch(self, point: torch.Tensor): 77 | """ 78 | point: torch.Tensor of 2D points on image (n * 2) 79 | return: unit ray direction of every 2D point (n * 3) 80 | """ 81 | point_centered = point.float() - self.img_center 82 | x = point_centered[:, 0] 83 | y = point_centered[:, 1] 84 | distance_from_center = torch.sqrt(torch.square(x) + torch.square(y)) 85 | 86 | z = self.getPolyVal(p=self.fisheye_polynomial[::-1], x=distance_from_center) 87 | point_3d = torch.stack([x, y, -z]).float() # 3, n 88 | norm = torch.norm(point_3d, dim=0) 89 | point_3d = point_3d / norm 90 | return point_3d.t() 91 | 92 | def world2camera(self, point3D): 93 | point3D = deepcopy(point3D) 94 | point3D[:, 2] = point3D[:, 2] * -1 95 | point3D = point3D.T 96 | xc, yc = self.img_center[0], self.img_center[1] 97 | point2D = [] 98 | 99 | norm = np.linalg.norm(point3D[:2], axis=0) 100 | 101 | if (norm !=
0).all(): 102 | theta = np.arctan(point3D[2] / norm) 103 | invnorm = 1.0 / norm 104 | t = theta 105 | rho = self.fisheye_inverse_polynomial[0] 106 | t_i = 1.0 107 | 108 | for i in range(1, len(self.fisheye_inverse_polynomial)): 109 | t_i *= t 110 | rho += t_i * self.fisheye_inverse_polynomial[i] 111 | 112 | x = point3D[0] * invnorm * rho 113 | y = point3D[1] * invnorm * rho 114 | 115 | point2D.append(x + xc) 116 | point2D.append(y + yc) 117 | else: 118 | point2D.append(xc) 119 | point2D.append(yc) 120 | raise Exception("norm is zero!") 121 | 122 | return np.asarray(point2D).T 123 | 124 | def world2camera_with_depth(self, point3D): 125 | point3D_cloned = deepcopy(point3D) 126 | point2D = self.world2camera(point3D_cloned) 127 | 128 | depth = np.linalg.norm(point3D, axis=-1) 129 | return point2D, depth 130 | 131 | def world2camera_pytorch_with_depth(self, point3D): 132 | point2D = self.world2camera_pytorch(point3D) 133 | 134 | depth = torch.norm(point3D, dim=-1) 135 | return point2D, depth 136 | 137 | def world2camera_pytorch(self, point3d_original: torch.Tensor, normalize=False): 138 | """ 139 | 140 | Args: 141 | point3d_original: point 142 | normalize: normalize to -1, 1 143 | 144 | Returns: 145 | 146 | """ 147 | fisheye_inv_polynomial = self.fisheye_inverse_polynomial 148 | point3d = point3d_original.clone() 149 | point3d[:, 2] = point3d_original[:, 2] * -1 150 | point3d = point3d.transpose(0, 1) 151 | xc, yc = self.img_center[0], self.img_center[1] 152 | xc = torch.Tensor([xc]).float().to(point3d.device) 153 | yc = torch.Tensor([yc]).float().to(point3d.device) 154 | point2d = torch.empty((2, point3d.shape[-1])).to(point3d.device) 155 | 156 | norm = torch.norm(point3d[:2], dim=0) 157 | 158 | if (norm != 0).all(): 159 | theta = torch.atan(point3d[2] / norm) 160 | invnorm = 1.0 / norm 161 | t = theta 162 | rho = fisheye_inv_polynomial[0] 163 | t_i = 1.0 164 | 165 | for i in range(1, len(fisheye_inv_polynomial)): 166 | t_i *= t 167 | rho += t_i * fisheye_inv_polynomial[i] 168 | 169 | x = point3d[0] * invnorm * rho 170 | y = point3d[1] * invnorm * rho 171 | 172 | point2d[0] = x + xc 173 | point2d[1] = y + yc 174 | else: 175 | point2d[0] = xc 176 | point2d[1] = yc 177 | raise Exception("norm is zero!") 178 | 179 | # if normalize, the point result will be -1 to 1from 180 | if normalize is True: 181 | image_w, image_h = self.img_size[0], self.img_size[1] 182 | assert image_w > image_h 183 | point2d[0] = point2d[0] - (image_w - image_h) // 2 # to square image 184 | point2d = point2d / (image_h - 1) * 2 # to [0, 2] 185 | point2d -= 1 186 | 187 | return point2d.transpose(0, 1) 188 | 189 | def undistort(self, point_2Ds): 190 | """ 191 | undistort the input 2d points in fisheye camera 192 | """ 193 | point_length = point_2Ds.shape[0] 194 | depths = np.ones(shape=point_length) 195 | point_3Ds = self.camera2world(point_2Ds, depths) 196 | 197 | # point_3Ds_homo = np.ones((point_length, 4)) 198 | # point_3Ds_homo[:, :3] = point_3Ds 199 | 200 | projected_2d_points = (self.intrinsic[:3, :3] @ point_3Ds.T).T 201 | projected_2d_points = projected_2d_points[:, :2] / projected_2d_points[:, 2:] 202 | return projected_2d_points 203 | 204 | 205 | if __name__ == '__main__': 206 | camera = FishEyeCameraCalibrated(r'Z:\EgoMocap\work\EgocentricFullBody\mmpose\utils\fisheye_camera\fisheye.calibration_01_12.json') 207 | point = np.array([[660, 520], [520, 660], [123, 456]]) 208 | depth = np.array([30, 30, 40]) 209 | point = torch.from_numpy(point).cuda() 210 | depth = torch.from_numpy(depth).cuda() 211 | point3d = 
camera.camera2world_pytorch(point, depth) 212 | print(point3d) 213 | 214 | # reprojected_point_2d = camera.world2camera(point3d) 215 | # print(reprojected_point_2d) 216 | 217 | reprojected_point_2d = camera.world2camera_pytorch(point3d.cpu(), normalize=False) 218 | print(reprojected_point_2d) 219 | -------------------------------------------------------------------------------- /utils/fisheye/FishEyeEquisolid.py: -------------------------------------------------------------------------------- 1 | import json 2 | import numpy as np 3 | import torch 4 | 5 | 6 | class FishEyeCameraEquisolid: 7 | def __init__(self, focal_length, sensor_size, img_size, use_gpu=False): 8 | """ 9 | @param focal_length: focal length of camera in mm 10 | @param sensor_size: sensor size of camera in mm 11 | @param img_size: image size of w, h 12 | @param use_gpu: whether use the gpu to accelerate the calculation 13 | """ 14 | self.sensor_size = sensor_size 15 | self.img_size = np.asarray(img_size) 16 | self.use_gpu = use_gpu 17 | # calculate the focal length in pixel 18 | self.focal_length = focal_length / np.max(sensor_size) * np.max(img_size) 19 | 20 | # calculate the image center 21 | self.img_center = self.img_size / 2 + 1e-10 22 | # get max distance from image center 23 | self.max_distance = self.focal_length * np.sqrt(2) 24 | 25 | if self.use_gpu: 26 | gpu_ok = torch.cuda.is_available() 27 | if gpu_ok is False: 28 | raise Exception("GPU is not available!") 29 | 30 | def camera2world(self, point: np.ndarray, depth: np.ndarray): 31 | """ 32 | @param point: 2d point in shape: (n, 2) 33 | @param depth: depth of every points 34 | @return: 3d position of every point 35 | """ 36 | # get the distance of point to center 37 | depth = depth.astype(np.float32) 38 | point_centered = point.astype(np.float32) - self.img_center 39 | x = point_centered[:, 0] 40 | y = point_centered[:, 1] 41 | distance_from_center = np.sqrt(np.square(x) + np.square(y)) 42 | distance_from_center[distance_from_center > self.max_distance - 30] = self.max_distance 43 | 44 | theta = 2 * np.arcsin(distance_from_center / (2 * self.focal_length)) 45 | Z = distance_from_center / np.tan(theta) 46 | 47 | # square_sin_theta_div_2 = np.square(distance_from_center / (2 * self.focal_length)) 48 | # tan_theta_div_1 = np.sqrt(1 / (4 * square_sin_theta_div_2 * (1 - square_sin_theta_div_2)) - 1) 49 | # Z = distance_from_center * tan_theta_div_1 50 | point_3d = np.array([x, y, Z]) 51 | norm = np.linalg.norm(point_3d, axis=0) 52 | point_3d = point_3d / norm * depth 53 | return point_3d.transpose() 54 | 55 | def camera2world_pytorch(self, point, depth): 56 | """ 57 | @param point: 2d point in shape: (n, 2) 58 | @param depth: depth of every points 59 | @return: 3d position of every point 60 | """ 61 | # get the distance of point to center 62 | depth = depth.float() 63 | img_center = torch.from_numpy(self.img_center) 64 | point_centered = point.float() - img_center.to(point.device) 65 | x = point_centered[:, 0] 66 | y = point_centered[:, 1] 67 | distance_from_center = torch.sqrt(torch.square(x) + torch.square(y)) 68 | distance_from_center[distance_from_center > self.max_distance - 30] = self.max_distance 69 | 70 | theta = 2 * torch.arcsin(distance_from_center / (2 * self.focal_length)) 71 | Z = distance_from_center / torch.tan(theta) 72 | 73 | point_3d = torch.stack([x, y, Z]) 74 | norm = torch.norm(point_3d, dim=0) 75 | point_3d = point_3d / norm * depth 76 | point_3d = point_3d.float() 77 | return point_3d.T 78 | 79 | def world2camera(self, point3D): 80 | # 
calculate depth 81 | x = point3D[:, 0] 82 | y = point3D[:, 1] 83 | z = point3D[:, 2] 84 | depth = np.linalg.norm(point3D, axis=-1) 85 | 86 | # calculate theta 87 | distance_to_center_3d = np.sqrt(np.square(x) + np.square(y)) 88 | tan_theta = distance_to_center_3d / z 89 | theta = np.arctan(tan_theta) 90 | 91 | R = 2 * self.focal_length * np.sin(theta / 2) 92 | a = np.sqrt(np.square(R) / (np.square(x) + np.square(y))) 93 | X = a * x 94 | Y = a * y 95 | point2D = np.array([X, Y]).T + self.img_center 96 | return point2D, depth 97 | 98 | def world2camera_pytorch(self, point3D): 99 | # calculate depth 100 | x = point3D[:, 0] 101 | y = point3D[:, 1] 102 | z = point3D[:, 2] 103 | depth = torch.norm(point3D, dim=-1) 104 | 105 | # calculate theta 106 | distance_to_center_3d = torch.sqrt(torch.square(x) + torch.square(y)) 107 | tan_theta = distance_to_center_3d / z 108 | 109 | theta = torch.arctan(tan_theta) 110 | 111 | R = 2 * self.focal_length * torch.sin(theta / 2) 112 | a = torch.sqrt(torch.square(R) / (torch.square(x) + torch.square(y))) 113 | X = a * x 114 | Y = a * y 115 | img_center = torch.from_numpy(self.img_center) 116 | point2D = torch.stack([X, Y], dim=-1) + img_center.to(point3D.device) 117 | return point2D.float(), depth.float() 118 | 119 | 120 | 121 | def main(): 122 | camera = FishEyeCameraEquisolid(focal_length=9, sensor_size=32, img_size=(1280, 1024)) 123 | point = np.array([[660, 120], [660, 420], ]) 124 | depth = np.array([10, 100]) 125 | point3d = camera.camera2world(point, depth) 126 | print(point3d) 127 | point = torch.asarray([[660, 120], [660, 420], ]) 128 | depth = torch.asarray([10, 100]) 129 | point3d = camera.camera2world_pytorch(point, depth) 130 | print(point3d) 131 | # point3d = torch.from_numpy(point3d).cuda() 132 | # point3d.requires_grad = True 133 | print(camera.world2camera_pytorch(point3d)) 134 | 135 | if __name__ == '__main__': 136 | main() -------------------------------------------------------------------------------- /utils/fisheye/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/utils/fisheye/__init__.py -------------------------------------------------------------------------------- /utils/fisheye/fisheye.calibration.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "egosyn", 3 | "sensor": [1, 1], 4 | "size": [1280, 1024], 5 | "animated": 0, 6 | "intrinsic": [ 7 | [500, 0, 6.597087109684564E+02, 0], 8 | [0, 500, 5.300556618148025E+02, 0], 9 | [0, 0, 1, 0], 10 | [0, 0, 0, 1] 11 | ], 12 | "imageCircleRadius": 5.120000000000000E+02, 13 | "polynomialC2W": [-2.924126419694919E+02, 0.000000000000000E+00, 1.075613595858202E-03, 2.072664555244253E-07, 14 | 4.493499097653669E-10, -1.192028310212584E-15, -1.822337421183959E-17], 15 | "polynomialW2C": [4.785893205484341E+02, 3.503715828980770E+02, 7.900065565120241E+01, 6.228794005673283E+01, 16 | 3.264466851189552E+01, 1.568380500967838E+01, 7.766879336977007E+00, 2.190791369989537E+00, 17 | -1.084229689289942E-01, -1.903842667463734E-01, -2.776267870029922E-02], 18 | "affine": [9.997488697329212E-01, -2.240239372797548E-04, 3.318272123957599E-04], 19 | "extrinsic": [ 20 | [1, 0, 0, 0], 21 | [0, 1, 0, 0], 22 | [0, 0, 1, 0], 23 | [0, 0, 0, 1] 24 | ], 25 | "radial":0 26 | } 27 | 28 | -------------------------------------------------------------------------------- /utils/fisheye/fisheye.calibration_05_08.json: 
-------------------------------------------------------------------------------- 1 | { 2 | "name": "egosyn", 3 | "sensor": [1, 1], 4 | "size": [1280, 1024], 5 | "animated": 0, 6 | "intrinsic": [ 7 | [500, 0, 614.93216999, 0], 8 | [0, 500, 527.53465649, 0], 9 | [0, 0, 1, 0], 10 | [0, 0, 0, 1] 11 | ], 12 | "imageCircleRadius": 5.120000000000000E+02, 13 | "polynomialC2W": [-2.924126419694919E+02, 0.000000000000000E+00, 1.075613595858202E-03, 2.072664555244253E-07, 14 | 4.493499097653669E-10, -1.192028310212584E-15, -1.822337421183959E-17], 15 | "polynomialW2C": [4.785893205484341E+02, 3.503715828980770E+02, 7.900065565120241E+01, 6.228794005673283E+01, 16 | 3.264466851189552E+01, 1.568380500967838E+01, 7.766879336977007E+00, 2.190791369989537E+00, 17 | -1.084229689289942E-01, -1.903842667463734E-01, -2.776267870029922E-02], 18 | "affine": [9.997488697329212E-01, -2.240239372797548E-04, 3.318272123957599E-04], 19 | "extrinsic": [ 20 | [1, 0, 0, 0], 21 | [0, 1, 0, 0], 22 | [0, 0, 1, 0], 23 | [0, 0, 0, 1] 24 | ], 25 | "radial":0 26 | } 27 | 28 | -------------------------------------------------------------------------------- /utils/fisheye/mean3D.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/utils/fisheye/mean3D.mat -------------------------------------------------------------------------------- /utils/get_predict.py: -------------------------------------------------------------------------------- 1 | import math 2 | 3 | import numpy as np 4 | 5 | 6 | def get_max_preds(batch_heatmaps): 7 | ''' 8 | get predictions from score maps 9 | heatmaps: numpy.ndarray([batch_size, num_joints, height, width]) 10 | ''' 11 | assert isinstance(batch_heatmaps, np.ndarray), \ 12 | 'batch_heatmaps should be numpy.ndarray' 13 | assert batch_heatmaps.ndim == 4, 'batch_images should be 4-ndim' 14 | 15 | batch_size = batch_heatmaps.shape[0] 16 | num_joints = batch_heatmaps.shape[1] 17 | width = batch_heatmaps.shape[3] 18 | heatmaps_reshaped = batch_heatmaps.reshape((batch_size, num_joints, -1)) 19 | idx = np.argmax(heatmaps_reshaped, 2) 20 | maxvals = np.amax(heatmaps_reshaped, 2) 21 | 22 | maxvals = maxvals.reshape((batch_size, num_joints, 1)) 23 | idx = idx.reshape((batch_size, num_joints, 1)) 24 | 25 | preds = np.tile(idx, (1, 1, 2)).astype(np.float32) 26 | 27 | preds[:, :, 0] = (preds[:, :, 0]) % width 28 | preds[:, :, 1] = np.floor((preds[:, :, 1]) / width) 29 | 30 | pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2)) 31 | pred_mask = pred_mask.astype(np.float32) 32 | 33 | preds *= pred_mask 34 | return preds, maxvals 35 | -------------------------------------------------------------------------------- /utils/img.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import cv2 3 | from PIL import Image 4 | 5 | import torch 6 | 7 | IMAGENET_MEAN, IMAGENET_STD = np.array([0.485, 0.456, 0.406]), np.array([0.229, 0.224, 0.225]) 8 | 9 | 10 | def crop_image(image, bbox): 11 | """Crops area from image specified as bbox. 
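
The `get_max_preds` helper in `utils/get_predict.py` above decodes each heatmap by taking the spatial argmax and splitting the flat index back into (x, y) pixel coordinates. A minimal check of that convention on toy heatmaps (shapes and values below are illustrative only, not repository data):

```python
import numpy as np

from utils.get_predict import get_max_preds

# One image, two joints, 4x6 heatmaps with peaks at (x=5, y=0) and (x=2, y=3).
heatmaps = np.zeros((1, 2, 4, 6), dtype=np.float32)
heatmaps[0, 0, 0, 5] = 1.0
heatmaps[0, 1, 3, 2] = 0.5

preds, maxvals = get_max_preds(heatmaps)
print(preds[0])     # [[5. 0.], [2. 3.]]  (x, y) of the per-joint argmax
print(maxvals[0])   # [[1. ], [0.5]]      peak confidence of each joint
```
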
Always returns area of size as bbox filling missing parts with zeros 12 | Args: 13 | image numpy array of shape (height, width, 3): input image 14 | bbox tuple of size 4: input bbox (left, upper, right, lower) 15 | 16 | Returns: 17 | cropped_image numpy array of shape (height, width, 3): resulting cropped image 18 | 19 | """ 20 | 21 | image_pil = Image.fromarray(image) 22 | image_pil = image_pil.crop(bbox) 23 | 24 | return np.asarray(image_pil) 25 | 26 | 27 | def resize_image(image, shape): 28 | return cv2.resize(image, (shape[1], shape[0]), interpolation=cv2.INTER_AREA) 29 | 30 | 31 | def get_square_bbox(bbox): 32 | """Makes square bbox from any bbox by stretching of minimal length side 33 | 34 | Args: 35 | bbox tuple of size 4: input bbox (left, upper, right, lower) 36 | 37 | Returns: 38 | bbox: tuple of size 4: resulting square bbox (left, upper, right, lower) 39 | """ 40 | 41 | left, upper, right, lower = bbox 42 | width, height = right - left, lower - upper 43 | 44 | if width > height: 45 | y_center = (upper + lower) // 2 46 | upper = y_center - width // 2 47 | lower = upper + width 48 | else: 49 | x_center = (left + right) // 2 50 | left = x_center - height // 2 51 | right = left + height 52 | 53 | return left, upper, right, lower 54 | 55 | 56 | def scale_bbox(bbox, scale): 57 | left, upper, right, lower = bbox 58 | width, height = right - left, lower - upper 59 | 60 | x_center, y_center = (right + left) // 2, (lower + upper) // 2 61 | new_width, new_height = int(scale * width), int(scale * height) 62 | 63 | new_left = x_center - new_width // 2 64 | new_right = new_left + new_width 65 | 66 | new_upper = y_center - new_height // 2 67 | new_lower = new_upper + new_height 68 | 69 | return new_left, new_upper, new_right, new_lower 70 | 71 | 72 | def to_numpy(tensor): 73 | if torch.is_tensor(tensor): 74 | return tensor.cpu().detach().numpy() 75 | elif type(tensor).__module__ != 'numpy': 76 | raise ValueError("Cannot convert {} to numpy array" 77 | .format(type(tensor))) 78 | return tensor 79 | 80 | 81 | def to_torch(ndarray): 82 | if type(ndarray).__module__ == 'numpy': 83 | return torch.from_numpy(ndarray) 84 | elif not torch.is_tensor(ndarray): 85 | raise ValueError("Cannot convert {} to torch tensor" 86 | .format(type(ndarray))) 87 | return ndarray 88 | 89 | 90 | def image_batch_to_numpy(image_batch): 91 | image_batch = to_numpy(image_batch) 92 | image_batch = np.transpose(image_batch, (0, 2, 3, 1)) # BxCxHxW -> BxHxWxC 93 | return image_batch 94 | 95 | 96 | def image_batch_to_torch(image_batch): 97 | image_batch = np.transpose(image_batch, (0, 3, 1, 2)) # BxHxWxC -> BxCxHxW 98 | image_batch = to_torch(image_batch).float() 99 | return image_batch 100 | 101 | 102 | def normalize_image(image): 103 | """Normalizes image using ImageNet mean and std 104 | 105 | Args: 106 | image numpy array of shape (h, w, 3): image 107 | 108 | Returns normalized_image numpy array of shape (h, w, 3): normalized image 109 | """ 110 | return (image / 255.0 - IMAGENET_MEAN) / IMAGENET_STD 111 | 112 | 113 | def denormalize_image(image): 114 | """Reverse to normalize_image() function""" 115 | return np.clip(255.0 * (image * IMAGENET_STD + IMAGENET_MEAN), 0, 255) 116 | -------------------------------------------------------------------------------- /utils/misc.py: -------------------------------------------------------------------------------- 1 | import os 2 | import yaml 3 | import json 4 | import re 5 | 6 | import torch 7 | 8 | 9 | def config_to_str(config): 10 | return 
yaml.dump(yaml.safe_load(json.dumps(config))) # fuck yeah 11 | 12 | 13 | class AverageMeter(object): 14 | """Computes and stores the average and current value""" 15 | def __init__(self): 16 | self.reset() 17 | 18 | def reset(self): 19 | self.val = 0 20 | self.avg = 0 21 | self.sum = 0 22 | self.count = 0 23 | 24 | def update(self, val, n=1): 25 | self.val = val 26 | self.sum += val * n 27 | self.count += n 28 | self.avg = self.sum / self.count 29 | 30 | 31 | def calc_gradient_norm(named_parameters): 32 | total_norm = 0.0 33 | for name, p in named_parameters: 34 | # print(name) 35 | param_norm = p.grad.data.norm(2) 36 | total_norm += param_norm.item() ** 2 37 | 38 | total_norm = total_norm ** (1. / 2) 39 | 40 | return total_norm 41 | -------------------------------------------------------------------------------- /utils/multiview.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | from utils.fisheye.FishEyeCalibrated import FishEyeCameraCalibrated 4 | 5 | 6 | class Camera: 7 | def __init__(self, R, t, K, dist=None, name=""): 8 | self.R = np.array(R).copy() 9 | assert self.R.shape == (3, 3) 10 | 11 | self.t = np.array(t).copy() 12 | assert self.t.size == 3 13 | self.t = self.t.reshape(3, 1) 14 | 15 | self.K = np.array(K).copy() 16 | assert self.K.shape == (3, 3) 17 | 18 | self.dist = dist 19 | if self.dist is not None: 20 | self.dist = np.array(self.dist).copy().flatten() 21 | 22 | self.name = name 23 | 24 | def update_after_crop(self, bbox): 25 | left, upper, right, lower = bbox 26 | 27 | cx, cy = self.K[0, 2], self.K[1, 2] 28 | 29 | new_cx = cx - left 30 | new_cy = cy - upper 31 | 32 | self.K[0, 2], self.K[1, 2] = new_cx, new_cy 33 | 34 | def update_after_resize(self, image_shape, new_image_shape): 35 | height, width = image_shape 36 | new_height, new_width = new_image_shape 37 | 38 | fx, fy, cx, cy = self.K[0, 0], self.K[1, 1], self.K[0, 2], self.K[1, 2] 39 | 40 | new_fx = fx * (new_width / width) 41 | new_fy = fy * (new_height / height) 42 | new_cx = cx * (new_width / width) 43 | new_cy = cy * (new_height / height) 44 | 45 | self.K[0, 0], self.K[1, 1], self.K[0, 2], self.K[1, 2] = new_fx, new_fy, new_cx, new_cy 46 | 47 | @property 48 | def projection(self): 49 | return self.K.dot(self.extrinsics) 50 | 51 | @property 52 | def extrinsics(self): 53 | return np.hstack([self.R, self.t]) 54 | 55 | 56 | def euclidean_to_homogeneous(points): 57 | """Converts euclidean points to homogeneous 58 | 59 | Args: 60 | points numpy array or torch tensor of shape (N, M): N euclidean points of dimension M 61 | 62 | Returns: 63 | numpy array or torch tensor of shape (N, M + 1): homogeneous points 64 | """ 65 | if isinstance(points, np.ndarray): 66 | return np.hstack([points, np.ones((len(points), 1))]) 67 | elif torch.is_tensor(points): 68 | return torch.cat([points, torch.ones((points.shape[0], 1), dtype=points.dtype, device=points.device)], dim=1) 69 | else: 70 | raise TypeError("Works only with numpy arrays and PyTorch tensors.") 71 | 72 | 73 | def homogeneous_to_euclidean(points): 74 | """Converts homogeneous points to euclidean 75 | 76 | Args: 77 | points numpy array or torch tensor of shape (N, M + 1): N homogeneous points of dimension M 78 | 79 | Returns: 80 | numpy array or torch tensor of shape (N, M): euclidean points 81 | """ 82 | if isinstance(points, np.ndarray): 83 | return (points.T[:-1] / points.T[-1]).T 84 | elif torch.is_tensor(points): 85 | return (points.transpose(1, 0)[:-1] / points.transpose(1, 
0)[-1]).transpose(1, 0) 86 | else: 87 | raise TypeError("Works only with numpy arrays and PyTorch tensors.") 88 | 89 | 90 | def project_3d_points_to_image_plane_without_distortion(proj_matrix, points_3d, convert_back_to_euclidean=True): 91 | """Project 3D points to image plane not taking into account distortion 92 | Args: 93 | proj_matrix numpy array or torch tensor of shape (3, 4): projection matrix 94 | points_3d numpy array or torch tensor of shape (N, 3): 3D points 95 | convert_back_to_euclidean bool: if True, then resulting points will be converted to euclidean coordinates 96 | NOTE: division by zero can be here if z = 0 97 | Returns: 98 | numpy array or torch tensor of shape (N, 2): 3D points projected to image plane 99 | """ 100 | if isinstance(proj_matrix, np.ndarray) and isinstance(points_3d, np.ndarray): 101 | result = euclidean_to_homogeneous(points_3d) @ proj_matrix.T 102 | if convert_back_to_euclidean: 103 | result = homogeneous_to_euclidean(result) 104 | return result 105 | elif torch.is_tensor(proj_matrix) and torch.is_tensor(points_3d): 106 | result = euclidean_to_homogeneous(points_3d) @ proj_matrix.t() 107 | if convert_back_to_euclidean: 108 | result = homogeneous_to_euclidean(result) 109 | return result 110 | else: 111 | raise TypeError("Works only with numpy arrays and PyTorch tensors.") 112 | 113 | 114 | def project_3d_points_to_image_fisheye_camera(fisheye_camera_model: FishEyeCameraCalibrated, 115 | points_3d): 116 | """Project 3D points to image plane 117 | Args: 118 | fisheye camera model: model of fisheye camera 119 | points_3d numpy array or torch tensor of shape (N, 3): 3D points 120 | convert_back_to_euclidean bool: if True, then resulting points will be converted to euclidean coordinates 121 | NOTE: division by zero can be here if z = 0 122 | Returns: 123 | numpy array or torch tensor of shape (N, 2): 3D points projected to image plane 124 | """ 125 | if isinstance(points_3d, np.ndarray): 126 | result = fisheye_camera_model.world2camera(points_3d) 127 | return result 128 | elif torch.is_tensor(points_3d): 129 | result = fisheye_camera_model.world2camera_pytorch(points_3d) 130 | return result 131 | else: 132 | raise TypeError("Works only with numpy arrays and PyTorch tensors.") 133 | 134 | 135 | 136 | def calc_reprojection_error_matrix(keypoints_3d, keypoints_2d_list, proj_matricies): 137 | reprojection_error_matrix = [] 138 | for keypoints_2d, proj_matrix in zip(keypoints_2d_list, proj_matricies): 139 | keypoints_2d_projected = project_3d_points_to_image_plane_without_distortion(proj_matrix, keypoints_3d) 140 | reprojection_error = 1 / 2 * np.sqrt(np.sum((keypoints_2d - keypoints_2d_projected) ** 2, axis=1)) 141 | reprojection_error_matrix.append(reprojection_error) 142 | 143 | return np.vstack(reprojection_error_matrix).T 144 | -------------------------------------------------------------------------------- /utils/op.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | 7 | from utils import multiview 8 | 9 | 10 | def integrate_tensor_2d(heatmaps, softmax=True): 11 | """Applies softmax to heatmaps and integrates them to get their's "center of masses" 12 | 13 | Args: 14 | heatmaps torch tensor of shape (batch_size, n_heatmaps, h, w): input heatmaps 15 | 16 | Returns: 17 | coordinates torch tensor of shape (batch_size, n_heatmaps, 2): coordinates of center of masses of all heatmaps 18 | 19 | """ 20 | batch_size, 
n_heatmaps, h, w = heatmaps.shape 21 | 22 | heatmaps = heatmaps.reshape((batch_size, n_heatmaps, -1)) 23 | if softmax: 24 | heatmaps = nn.functional.softmax(heatmaps, dim=2) 25 | else: 26 | heatmaps = nn.functional.relu(heatmaps) 27 | 28 | heatmaps = heatmaps.reshape((batch_size, n_heatmaps, h, w)) 29 | 30 | mass_x = heatmaps.sum(dim=2) 31 | mass_y = heatmaps.sum(dim=3) 32 | 33 | mass_times_coord_x = mass_x * torch.arange(w).type(torch.float).to(mass_x.device) 34 | mass_times_coord_y = mass_y * torch.arange(h).type(torch.float).to(mass_y.device) 35 | 36 | x = mass_times_coord_x.sum(dim=2, keepdim=True) 37 | y = mass_times_coord_y.sum(dim=2, keepdim=True) 38 | 39 | if not softmax: 40 | x = x / mass_x.sum(dim=2, keepdim=True) 41 | y = y / mass_y.sum(dim=2, keepdim=True) 42 | 43 | coordinates = torch.cat((x, y), dim=2) 44 | coordinates = coordinates.reshape((batch_size, n_heatmaps, 2)) 45 | 46 | return coordinates, heatmaps 47 | 48 | 49 | def integrate_tensor_3d(volumes, softmax=True): 50 | batch_size, n_volumes, x_size, y_size, z_size = volumes.shape 51 | 52 | volumes = volumes.reshape((batch_size, n_volumes, -1)) 53 | if softmax: 54 | volumes = nn.functional.softmax(volumes, dim=2) 55 | else: 56 | volumes = nn.functional.relu(volumes) 57 | 58 | volumes = volumes.reshape((batch_size, n_volumes, x_size, y_size, z_size)) 59 | 60 | mass_x = volumes.sum(dim=3).sum(dim=3) 61 | mass_y = volumes.sum(dim=2).sum(dim=3) 62 | mass_z = volumes.sum(dim=2).sum(dim=2) 63 | 64 | mass_times_coord_x = mass_x * torch.arange(x_size).type(torch.float).to(mass_x.device) 65 | mass_times_coord_y = mass_y * torch.arange(y_size).type(torch.float).to(mass_y.device) 66 | mass_times_coord_z = mass_z * torch.arange(z_size).type(torch.float).to(mass_z.device) 67 | 68 | x = mass_times_coord_x.sum(dim=2, keepdim=True) 69 | y = mass_times_coord_y.sum(dim=2, keepdim=True) 70 | z = mass_times_coord_z.sum(dim=2, keepdim=True) 71 | 72 | if not softmax: 73 | x = x / mass_x.sum(dim=2, keepdim=True) 74 | y = y / mass_y.sum(dim=2, keepdim=True) 75 | z = z / mass_z.sum(dim=2, keepdim=True) 76 | 77 | coordinates = torch.cat((x, y, z), dim=2) 78 | coordinates = coordinates.reshape((batch_size, n_volumes, 3)) 79 | 80 | return coordinates, volumes 81 | 82 | 83 | def integrate_tensor_3d_with_coordinates(volumes, coord_volumes, softmax=True): 84 | 85 | batch_size, n_volumes, x_size, y_size, z_size = volumes.shape 86 | volumes = volumes.reshape((batch_size, n_volumes, -1)) 87 | if softmax: 88 | volumes = nn.functional.softmax(volumes, dim=2) 89 | else: 90 | # need to be normalized 91 | volumes = nn.functional.relu(volumes) 92 | 93 | volumes = volumes.reshape((batch_size, n_volumes, x_size, y_size, z_size)) 94 | coordinates = torch.einsum("bnxyz, bxyzc -> bnc", volumes, coord_volumes) 95 | 96 | return coordinates, volumes 97 | 98 | def get_projected_2d_points_with_coord_volumes(fisheye_model, coord_volume): 99 | """ 100 | :param fisheye_model: 101 | :param coord_volumes: no batch dimension 102 | :return: 103 | """ 104 | # Note: coord volumes are the same among all of the batches, so we only need to 105 | # get the coord volume for one batch and copy it to others 106 | 107 | device = coord_volume.device 108 | volume_shape = coord_volume.shape # x_len, y_len, z_len 109 | 110 | grid_coord = coord_volume.reshape((-1, 3)) 111 | 112 | ####note: precalculated reprojected points! 
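
`integrate_tensor_2d` above implements a soft-argmax: a softmax over the flattened heatmap followed by the expectation of the pixel coordinates under that distribution. A self-contained sketch of the same centre-of-mass computation on a toy 5x5 heatmap (toy values, not repository data):

```python
import torch
import torch.nn.functional as F

# Toy heatmap: batch of 1, 1 joint, 5x5 grid with a peak at (x=3, y=1).
h, w = 5, 5
heatmap = torch.zeros(1, 1, h, w)
heatmap[0, 0, 1, 3] = 10.0  # row (y) = 1, column (x) = 3

# Softmax over the flattened spatial dimensions, as in integrate_tensor_2d.
weights = F.softmax(heatmap.reshape(1, 1, -1), dim=2).reshape(1, 1, h, w)

# Expected coordinate = sum over the grid of probability * coordinate.
xs = torch.arange(w, dtype=torch.float)
ys = torch.arange(h, dtype=torch.float)
x = (weights.sum(dim=2) * xs).sum(dim=2)   # marginal over rows, then weighted x
y = (weights.sum(dim=3) * ys).sum(dim=2)   # marginal over columns, then weighted y

print(x.item(), y.item())  # close to (3.0, 1.0); a sharper peak moves it closer
```
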
113 | grid_coord_proj = multiview.project_3d_points_to_image_fisheye_camera( 114 | fisheye_model, grid_coord 115 | ) 116 | return grid_coord_proj 117 | 118 | 119 | def get_distance_with_coord_volumes(coord_volume): 120 | """ 121 | :param fisheye_model: 122 | :param coord_volumes: no batch dimension 123 | :return: 124 | """ 125 | # Note: coord volumes are the same among all of the batches, so we only need to 126 | # get the coord volume for one batch and copy it to others 127 | 128 | grid_coord = coord_volume.reshape((-1, 3)) 129 | 130 | ####note: precalculated distance! 131 | grid_coord_distance = torch.norm(grid_coord, dim=-1) 132 | return grid_coord_distance 133 | 134 | 135 | def unproject_heatmaps_one_view(heatmaps, grid_coord_proj, volume_size): 136 | 137 | ''' 138 | project the heatmap based on the camera parameters of egocentric fisheye camera 139 | :param heatmaps: 140 | :param fisheye_model: fisheye camera model 141 | :param coord_volumes: shape: batch_size, n_joints, x_len, y_len, z_len 142 | :return: 143 | ''' 144 | # Note: the coord volume is the same for all images, thus we can calculate it in advance. 145 | # We do not need to calculate 146 | # it within the iteration 147 | device = heatmaps.device 148 | batch_size, n_joints, heatmap_shape = heatmaps.shape[0], heatmaps.shape[1], tuple(heatmaps.shape[2:]) 149 | volume_shape = (volume_size, volume_size, volume_size) 150 | 151 | volume_batch = torch.zeros(batch_size, n_joints, *volume_shape, device=device) 152 | 153 | # TODO: speed up this this loop 154 | for batch_i in range(batch_size): 155 | heatmap = heatmaps[batch_i] 156 | heatmap = heatmap.unsqueeze(0) 157 | 158 | # transform to [-1.0, 1.0] range 159 | # note: in grid_coord_proj, the format is like (x, y), however, 160 | # note: when we sample the points, we need (y, x) 161 | grid_coord_proj_transformed = torch.zeros_like(grid_coord_proj) 162 | grid_coord_proj_transformed[:, 0] = 2 * (grid_coord_proj[:, 0] / heatmap_shape[1] - 0.5) 163 | grid_coord_proj_transformed[:, 1] = 2 * (grid_coord_proj[:, 1] / heatmap_shape[0] - 0.5) 164 | 165 | # prepare to F.grid_sample 166 | grid_coord_proj_transformed = grid_coord_proj_transformed.unsqueeze(1).unsqueeze(0) 167 | 168 | current_volume = F.grid_sample(heatmap, grid_coord_proj_transformed, align_corners=True) 169 | 170 | # reshape back to volume 171 | current_volume = current_volume.view(n_joints, *volume_shape) 172 | 173 | volume_batch[batch_i] = current_volume 174 | 175 | return volume_batch 176 | 177 | def get_grid_coord_proj_batch(grid_coord_proj, batch_size, heatmap_shape): 178 | grid_coord_proj_transformed = torch.zeros_like(grid_coord_proj) 179 | grid_coord_proj_transformed[:, 0] = 2 * (grid_coord_proj[:, 0] / heatmap_shape[1] - 0.5) 180 | grid_coord_proj_transformed[:, 1] = 2 * (grid_coord_proj[:, 1] / heatmap_shape[0] - 0.5) 181 | grid_coord_proj_transformed = grid_coord_proj_transformed.unsqueeze(1).unsqueeze(0) 182 | grid_coord_proj_transformed_batch = grid_coord_proj_transformed.expand(batch_size, -1, -1, -1) 183 | 184 | return grid_coord_proj_transformed_batch 185 | 186 | 187 | def get_grid_coord_distance_batch(grid_coord_distance, batch_size, joint_num=15): 188 | grid_coord_distance = grid_coord_distance.unsqueeze(1).unsqueeze(0) 189 | grid_coord_distance_batch = grid_coord_distance.expand(batch_size, joint_num, -1, -1) 190 | 191 | return grid_coord_distance_batch 192 | 193 | 194 | def unproject_heatmaps_one_view_batch(heatmaps, grid_coord_proj_transformed_batch, volume_size): 195 | 196 | ''' 197 | project the heatmap 
based on the camera parameters of egocentric fisheye camera 198 | :param heatmaps: 199 | :param fisheye_model: fisheye camera model 200 | :param coord_volumes: shape: batch_size, n_joints, x_len, y_len, z_len 201 | :return: 202 | ''' 203 | # Note: the coord volume is the same for all images, thus we can calculate it in advance. 204 | # We do not need to calculate 205 | # it within the iteration 206 | batch_size, n_joints, heatmap_shape = heatmaps.shape[0], heatmaps.shape[1], tuple(heatmaps.shape[2:]) 207 | volume_shape = (volume_size, volume_size, volume_size) 208 | 209 | current_volume = F.grid_sample(heatmaps, grid_coord_proj_transformed_batch, align_corners=True) 210 | 211 | # reshape back to volume 212 | volume_batch = current_volume.view(batch_size, n_joints, *volume_shape) 213 | 214 | return volume_batch 215 | 216 | def gaussian_2d_pdf(coords, means, sigmas, normalize=True): 217 | normalization = 1.0 218 | if normalize: 219 | normalization = (2 * np.pi * sigmas[:, 0] * sigmas[:, 0]) 220 | 221 | exp = torch.exp(-((coords[:, 0] - means[:, 0]) ** 2 / sigmas[:, 0] ** 2 + (coords[:, 1] - means[:, 1]) ** 2 / sigmas[:, 1] ** 2) / 2) 222 | return exp / normalization 223 | 224 | 225 | def render_points_as_2d_gaussians(points, sigmas, image_shape, normalize=True): 226 | device = points.device 227 | n_points = points.shape[0] 228 | 229 | yy, xx = torch.meshgrid(torch.arange(image_shape[0]).to(device), torch.arange(image_shape[1]).to(device)) 230 | grid = torch.stack([xx, yy], dim=-1).type(torch.float32) 231 | grid = grid.unsqueeze(0).repeat(n_points, 1, 1, 1) # (n_points, h, w, 2) 232 | grid = grid.reshape((-1, 2)) 233 | 234 | points = points.unsqueeze(1).unsqueeze(1).repeat(1, image_shape[0], image_shape[1], 1) 235 | points = points.reshape(-1, 2) 236 | 237 | sigmas = sigmas.unsqueeze(1).unsqueeze(1).repeat(1, image_shape[0], image_shape[1], 1) 238 | sigmas = sigmas.reshape(-1, 2) 239 | 240 | images = gaussian_2d_pdf(grid, points, sigmas, normalize=normalize) 241 | images = images.reshape(n_points, *image_shape) 242 | 243 | return images 244 | -------------------------------------------------------------------------------- /utils/pose_visualization_utils.py: -------------------------------------------------------------------------------- 1 | 2 | import open3d 3 | import numpy as np 4 | from scipy.spatial.transform import Rotation 5 | 6 | def get_sphere(position, radius=1.0, color=(0.1, 0.1, 0.7)): 7 | mesh_sphere: open3d.geometry.TriangleMesh = open3d.geometry.TriangleMesh.create_sphere(radius=radius) 8 | mesh_sphere.paint_uniform_color(color) 9 | 10 | # translate to position 11 | mesh_sphere = mesh_sphere.translate(position, relative=False) 12 | return mesh_sphere 13 | 14 | def rotation_matrix_from_vectors(vec1, vec2): 15 | """ Find the rotation matrix that aligns vec1 to vec2 16 | :param vec1: A 3d "source" vector 17 | :param vec2: A 3d "destination" vector 18 | :return mat: A transform matrix (3x3) which when applied to vec1, aligns it with vec2. 
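
`rotation_matrix_from_vectors` is later used by `get_cylinder` below to rotate open3d's default z-aligned cylinder onto a bone direction. A quick standalone check of the documented contract, with arbitrary example vectors:

```python
import numpy as np

from utils.pose_visualization_utils import rotation_matrix_from_vectors

# Contract check: R maps the direction of vec1 onto the direction of vec2.
vec1 = np.array([0.0, 0.0, 1.0])     # open3d cylinders are created along +z
vec2 = np.array([1.0, 2.0, 2.0])     # an arbitrary bone direction

R = rotation_matrix_from_vectors(vec1, vec2)
a = vec1 / np.linalg.norm(vec1)
b = vec2 / np.linalg.norm(vec2)
print(np.allclose(R @ a, b))         # True: the rotated axis points along vec2
```
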
19 | """ 20 | a, b = (vec1 / np.linalg.norm(vec1)).reshape(3), (vec2 / np.linalg.norm(vec2)).reshape(3) 21 | v = np.cross(a, b) 22 | c = np.dot(a, b) 23 | s = np.linalg.norm(v) 24 | if np.abs(s) < 1e-6: 25 | rotation_matrix = np.eye(3) 26 | else: 27 | kmat = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]]) 28 | rotation_matrix = np.eye(3) + kmat + kmat.dot(kmat) * ((1 - c) / (s ** 2)) 29 | return rotation_matrix 30 | 31 | def get_cylinder(start_point, end_point, radius=0.3, color=(0.1, 0.9, 0.1)): 32 | center = (start_point + end_point) / 2 33 | height = np.linalg.norm(start_point - end_point) 34 | mesh_cylinder: open3d.geometry.TriangleMesh = open3d.geometry.TriangleMesh.create_cylinder(radius=radius, height=height) 35 | mesh_cylinder.paint_uniform_color(color) 36 | 37 | # translate and rotate to position 38 | # rotate vector 39 | rot_vec = end_point - start_point 40 | rot_vec = rot_vec / np.linalg.norm(rot_vec) 41 | rot_0 = np.array([0, 0, 1]) 42 | rot_mat = rotation_matrix_from_vectors(rot_0, rot_vec) 43 | # if open3d.__version__ >= '0.9.0.0': 44 | # rotation_param = rot_mat 45 | # else: 46 | # rotation_param = Rotation.from_matrix(rot_mat).as_euler('xyz') 47 | rotation_param = rot_mat 48 | mesh_cylinder = mesh_cylinder.rotate(rotation_param) 49 | mesh_cylinder = mesh_cylinder.translate(center, relative=False) 50 | return mesh_cylinder 51 | 52 | if __name__ == '__main__': 53 | point1 = np.array([-1, 11, 8]) 54 | point2 = np.array([12, -1, 5]) 55 | sphere1 = get_sphere(position=point1, radius=0.1) 56 | sphere2 = get_sphere(position=point2, radius=0.1) 57 | cylinder = get_cylinder(start_point=point1, end_point=point2, radius=0.02) 58 | 59 | mesh_frame = open3d.geometry.TriangleMesh.create_coordinate_frame(size=0.5) 60 | 61 | open3d.visualization.draw_geometries( 62 | [sphere1, sphere2, cylinder, mesh_frame]) -------------------------------------------------------------------------------- /utils/rigid_transform_with_scale.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import numpy.linalg 3 | import random 4 | import torch 5 | 6 | # Relevant links: 7 | # - http://stackoverflow.com/a/32244818/263061 (solution with scale) 8 | # - "Least-Squares Rigid Motion Using SVD" (no scale but easy proofs and explains how weights could be added) 9 | 10 | 11 | # Rigidly (+scale) aligns two point clouds with know point-to-point correspondences 12 | # with least-squares error. 13 | # Returns (scale factor c, rotation matrix R, translation vector t) such that 14 | # Q = P*cR + t 15 | # if they align perfectly, or such that 16 | # SUM over point i ( | P_i*cR + t - Q_i |^2 ) 17 | # is minimised if they don't align perfectly. 
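
A quick numerical check of the convention documented above, using the `umeyama` routine defined just below: build a known similarity transform, apply it to random points as Q = P*cR + t, and confirm that (c, R, t) is recovered. The specific scale, angle, and translation are arbitrary test values.

```python
import numpy as np

from utils.rigid_transform_with_scale import umeyama

rng = np.random.default_rng(0)
P = rng.normal(size=(15, 3))                     # e.g. a 15-joint skeleton

# A known similarity transform: scale c, rotation R about z, translation t.
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
c_true, t_true = 0.8, np.array([0.1, -0.2, 0.3])

# The documented convention: Q = P * cR + t, with points as row vectors.
Q = P @ (c_true * R_true) + t_true

c, R, t = umeyama(P, Q)
print(np.allclose(c, c_true), np.allclose(R, R_true), np.allclose(t, t_true))
# Expected: True True True (up to numerical precision).
```
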
18 | def umeyama(P, Q): 19 | assert P.shape == Q.shape 20 | n, dim = P.shape 21 | 22 | centeredP = P - P.mean(axis=0) 23 | centeredQ = Q - Q.mean(axis=0) 24 | 25 | C = np.dot(np.transpose(centeredP), centeredQ) / n 26 | 27 | 28 | 29 | V, S, W = np.linalg.svd(C) 30 | d = (np.linalg.det(V) * np.linalg.det(W)) < 0.0 31 | 32 | if d: 33 | S[-1] = -S[-1] 34 | V[:, -1] = -V[:, -1] 35 | 36 | R = np.dot(V, W) 37 | 38 | varP = np.var(P, axis=0).sum() 39 | c = 1/varP * np.sum(S) # scale factor 40 | 41 | t = Q.mean(axis=0) - P.mean(axis=0).dot(c*R) 42 | 43 | return c, R, t 44 | 45 | def umeyama_pytorch(P, Q): 46 | assert P.shape == Q.shape 47 | n, dim = P.shape 48 | 49 | centeredP = P - torch.mean(P, dim=0) 50 | centeredQ = Q - torch.mean(Q, dim=0) 51 | 52 | C = centeredP.T @ centeredQ / n 53 | 54 | V, S, W = torch.svd(C) 55 | W = W.T 56 | d = (torch.det(V) * torch.det(W)) < 0.0 57 | 58 | if d: 59 | S[-1] = -S[-1] 60 | V[:, -1] = -V[:, -1] 61 | 62 | R = V @ W 63 | 64 | 65 | varP = torch.sum(torch.var(P, dim=0, unbiased=False)) 66 | c = 1 / varP * torch.sum(S) # scale factor 67 | 68 | t = torch.mean(Q, dim=0) - torch.mean(P, dim=0).matmul(c * R) 69 | 70 | return c, R, t 71 | 72 | def umeyama_ransac(P, Q, epsilon=0.2, n_iters=80): 73 | assert P.shape == Q.shape 74 | inliner_set = [] 75 | point_length = P.shape[0] 76 | for i in range(n_iters): 77 | sampled_points = random.sample(list(range(point_length)), 4) 78 | sampled_P = P[sampled_points] 79 | sampled_Q = Q[sampled_points] 80 | c, R, t = umeyama(sampled_P, sampled_Q) 81 | 82 | projected_P = P @ R * c + t 83 | new_inliner_set = [] 84 | for j in range(point_length): 85 | if np.linalg.norm(projected_P[j] - Q[j], ord=2) < epsilon: 86 | new_inliner_set.append(j) 87 | if len(new_inliner_set) > len(inliner_set): 88 | inliner_set = new_inliner_set 89 | 90 | sampled_P = P[inliner_set] 91 | sampled_Q = Q[inliner_set] 92 | c, R, t = umeyama(sampled_P, sampled_Q) 93 | return c, R, t 94 | 95 | def umeyama_dim_2(P, Q): 96 | assert P.shape == Q.shape 97 | n, dim1 = P.shape 98 | 99 | centeredP = P 100 | centeredQ = Q 101 | 102 | C = np.dot(np.transpose(centeredP), centeredQ) / n 103 | 104 | V, S, W = np.linalg.svd(C) 105 | d = (np.linalg.det(V) * np.linalg.det(W)) < 0.0 106 | 107 | if d: 108 | S[-1] = -S[-1] 109 | V[:, -1] = -V[:, -1] 110 | 111 | R = np.dot(V, W) 112 | 113 | varP = np.var(P, axis=0).sum() 114 | c = 1/varP * np.sum(S) # scale factor 115 | 116 | t = Q.mean(axis=0) - P.mean(axis=0).dot(c*R) 117 | 118 | return c, R, t 119 | 120 | 121 | def umeyama(P, Q): 122 | assert P.shape == Q.shape 123 | n, dim = P.shape 124 | 125 | centeredP = P - P.mean(axis=0) 126 | centeredQ = Q - Q.mean(axis=0) 127 | 128 | C = np.dot(np.transpose(centeredP), centeredQ) / n 129 | 130 | 131 | 132 | V, S, W = np.linalg.svd(C) 133 | d = (np.linalg.det(V) * np.linalg.det(W)) < 0.0 134 | 135 | if d: 136 | S[-1] = -S[-1] 137 | V[:, -1] = -V[:, -1] 138 | 139 | R = np.dot(V, W) 140 | 141 | varP = np.var(P, axis=0).sum() 142 | c = 1/varP * np.sum(S) # scale factor 143 | 144 | t = Q.mean(axis=0) - P.mean(axis=0).dot(c*R) 145 | 146 | return c, R, t 147 | 148 | if __name__ == '__main__': 149 | a = np.random.normal(size=(15, 3)) 150 | b = np.random.normal(size=(15, 3)) 151 | 152 | result1 = umeyama(a.copy(), b.copy()) 153 | print(result1) 154 | result2 = umeyama_pytorch(torch.from_numpy(a), torch.from_numpy(b)) 155 | print(result2) -------------------------------------------------------------------------------- /utils/skeleton.py: 
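
The `Skeleton` class in the file that follows hard-codes the 15-joint layout (matching the joint sequence listed in the README) together with the kinematic parent of each joint. As a small illustration, the sketch below prints per-bone lengths of a predicted pose; it assumes `demo.py` has already written `data/demo/out/img_001000.jpg.pkl` and that the saved pose is a 15x3 array in the egocentric camera frame.

```python
import pickle

import numpy as np

from utils.skeleton import Skeleton

# Per-bone lengths of a predicted pose, using the kinematic tree defined
# in the Skeleton class below.
with open('data/demo/out/img_001000.jpg.pkl', 'rb') as f:
    pose = pickle.load(f)                          # assumed 15 x 3 joint positions

parents = Skeleton.kinematic_parents
bone_lengths = np.linalg.norm(pose - pose[parents], axis=1)
for name, length in zip(Skeleton.heatmap_sequence, bone_lengths):
    print(f'{name:>15s}: {length:.3f}')
```
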
-------------------------------------------------------------------------------- 1 | # pose visualizer 2 | # 1. read and generate 3D skeleton from heat map and depth 3 | # 2. convert 3D skeleton to skeleton mesh 4 | from utils.fisheye.FishEyeEquisolid import FishEyeCameraEquisolid 5 | from utils.fisheye.FishEyeCalibrated import FishEyeCameraCalibrated 6 | import numpy as np 7 | import open3d 8 | from utils.pose_visualization_utils import get_cylinder, get_sphere 9 | from scipy.io import loadmat 10 | import cv2 11 | import os 12 | from tqdm import tqdm 13 | from scipy.ndimage.filters import gaussian_filter1d 14 | 15 | 16 | class Skeleton: 17 | heatmap_sequence = ["Neck", "Right_shoulder", "Right_elbow", "Right_wrist", "Left_shoulder", "Left_elbow", 18 | "Left_wrist", "Right_hip", "Right_knee", "Right_ankle", "Right_foot", "Left_hip", 19 | "Left_knee", "Left_ankle", "Left_foot"] 20 | lines = [(0, 1), (0, 4), (1, 2), (2, 3), (4, 5), (5, 6), (1, 7), (4, 11), (7, 8), (8, 9), (9, 10), 21 | (11, 12), (12, 13), (13, 14), (7, 11)] 22 | kinematic_parents = [0, 0, 1, 2, 0, 4, 5, 1, 7, 8, 9, 4, 11, 12, 13] 23 | 24 | def __init__(self, calibration_path): 25 | 26 | self.skeleton = None 27 | self.skeleton_mesh = None 28 | if calibration_path is None: 29 | print('use FishEyeCameraEquisolid') 30 | self.camera = FishEyeCameraEquisolid(focal_length=9, sensor_size=32, img_size=(1280, 1024)) 31 | else: 32 | self.camera = FishEyeCameraCalibrated(calibration_file_path=calibration_path) 33 | 34 | def set_skeleton(self, heatmap, depth, bone_length=None): 35 | heatmap = np.expand_dims(heatmap, axis=0) 36 | preds, _ = self.get_max_preds(heatmap) 37 | pred = preds[0] 38 | 39 | points_3d = self.camera.camera2world(pred, depth) 40 | # print('------------------------') 41 | # print(self.camera.camera2world(np.array([[640, 1000]]), np.array([1]))) 42 | if bone_length is not None: 43 | points_3d = self._skeleton_resize(points_3d, bone_length) 44 | return points_3d 45 | 46 | def get_2d_pose_from_heatmap(self, heatmap): 47 | heatmap = np.expand_dims(heatmap, axis=0) 48 | preds, _ = self.get_max_preds(heatmap) 49 | pred = preds[0] 50 | return pred 51 | 52 | def joints_2_mesh(self, joints_3d, joint_color=(0.1, 0.1, 0.7), bone_color=(0.1, 0.9, 0.1)): 53 | self.skeleton = joints_3d 54 | self.skeleton_to_mesh(joint_color, bone_color) 55 | skeleton_mesh = self.skeleton_mesh 56 | self.skeleton_mesh = None 57 | self.skeleton = None 58 | return skeleton_mesh 59 | 60 | def joint_list_2_mesh_list(self, joints_3d_list): 61 | mesh_list = [] 62 | for joints_3d in joints_3d_list: 63 | mesh_list.append(self.joints_2_mesh(joints_3d)) 64 | return mesh_list 65 | 66 | def get_skeleton_mesh(self): 67 | if self.skeleton_mesh is None: 68 | raise Exception("Skeleton is not prepared.") 69 | else: 70 | return self.skeleton_mesh 71 | 72 | def save_skeleton_mesh(self, out_path): 73 | if self.skeleton_mesh is None: 74 | raise Exception("Skeleton is not prepared.") 75 | else: 76 | open3d.io.write_triangle_mesh(out_path, mesh=self.skeleton_mesh) 77 | 78 | def set_skeleton_from_file(self, heatmap_file, depth_file, bone_length_file=None, to_mesh=True): 79 | # load the average bone length 80 | if bone_length_file is not None: 81 | bone_length_mat = loadmat(bone_length_file) 82 | mean3D = bone_length_mat['mean3D'].T # convert shape to 15 * 3 83 | bones_mean = mean3D - mean3D[self.kinematic_parents, :] 84 | bone_length = np.linalg.norm(bones_mean, axis=1) 85 | else: 86 | bone_length = None 87 | heatmap_mat = loadmat(heatmap_file) 88 | depth_mat = 
loadmat(depth_file) 89 | depth = depth_mat['depth'][0] 90 | heatmap = heatmap_mat['heatmap'] 91 | heatmap = cv2.resize(heatmap, dsize=(1024, 1024), interpolation=cv2.INTER_NEAREST) 92 | heatmap = np.pad(heatmap, ((0, 0), (128, 128), (0, 0)), 'constant', constant_values=0) 93 | heatmap = heatmap.transpose((2, 0, 1)) 94 | return self.set_skeleton(heatmap, depth, bone_length, to_mesh) 95 | 96 | def skeleton_resize_seq(self, joint_list, bone_length_file): 97 | bone_length_mat = loadmat(bone_length_file) 98 | mean3D = bone_length_mat['mean3D'].T # convert shape to 15 * 3 99 | bones_mean = mean3D - mean3D[self.kinematic_parents, :] 100 | bone_length = np.linalg.norm(bones_mean, axis=1) 101 | 102 | for i in range(len(joint_list)): 103 | joint_list[i] = self._skeleton_resize(joint_list[i], bone_length) 104 | return joint_list 105 | 106 | def skeleton_resize_single(self, joint, bone_length_file): 107 | bone_length_mat = loadmat(bone_length_file) 108 | mean3D = bone_length_mat['mean3D'].T # convert shape to 15 * 3 109 | bones_mean = mean3D - mean3D[self.kinematic_parents, :] 110 | bone_length = np.linalg.norm(bones_mean, axis=1) 111 | 112 | joint = self._skeleton_resize(joint, bone_length) 113 | return joint 114 | 115 | def skeleton_resize_standard_skeleton(self, joint_input, joint_standard): 116 | """ 117 | 118 | :param joint_input: input joint shape: 15 * 3 119 | :param joint_standard: standard joint shape: 15 * 3 120 | :return: 121 | """ 122 | bones_mean = joint_standard - joint_standard[self.kinematic_parents, :] 123 | bone_length = np.linalg.norm(bones_mean, axis=1) * 1000. 124 | 125 | joint = self._skeleton_resize(joint_input, bone_length) 126 | return joint 127 | 128 | def _skeleton_resize(self, points_3d, bone_length): 129 | # resize the skeleton to the normal size (why we should do that?) 
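
The `_skeleton_resize` step being defined here keeps each estimated bone's direction but rescales its length to a reference value, then rebuilds the joints root-to-leaf along `kinematic_parents` (the repository version additionally divides by 1000, apparently because the reference lengths from `mean3D.mat` are stored in millimetres). A toy 3-joint sketch of the same idea, with made-up numbers:

```python
import numpy as np

# Toy kinematic chain: joint 0 is the root, joint i's parent is listed in parents.
parents = [0, 0, 1]
points = np.array([[0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.6],    # estimated bone 0->1 has length 0.6
                   [0.0, 0.4, 0.6]])   # estimated bone 1->2 has length 0.4
target_length = np.array([0.0, 0.5, 0.3])   # reference bone lengths

bone_vec = points - points[parents]                 # per-joint bone vectors
bone_len = np.linalg.norm(bone_vec, axis=1)
scale = np.concatenate(([0.0], target_length[1:] / bone_len[1:]))
bone_vec_resized = bone_vec * scale[:, None]        # keep direction, fix length

# Rebuild joints root-to-leaf, as in the loop of _skeleton_resize.
resized = points.copy()
for i in range(len(parents)):
    resized[i] = resized[parents[i]] + bone_vec_resized[i]
print(np.linalg.norm(resized - resized[parents], axis=1))  # -> [0.  0.5 0.3]
```
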
130 | estimated_bone_vec = points_3d - points_3d[self.kinematic_parents, :] 131 | estimated_bone_length = np.linalg.norm(estimated_bone_vec, axis=1) 132 | multi = bone_length[1:] / estimated_bone_length[1:] 133 | multi = np.concatenate(([0], multi)) 134 | multi = np.stack([multi] * 3, axis=1) 135 | resized_bones_vec = estimated_bone_vec * multi / 1000 136 | 137 | joints_rescaled = points_3d 138 | for i in range(joints_rescaled.shape[0]): 139 | joints_rescaled[i, :] = joints_rescaled[self.kinematic_parents[i], :] + resized_bones_vec[i, :] 140 | return joints_rescaled 141 | 142 | def render(self): 143 | mesh_frame = open3d.geometry.TriangleMesh.create_coordinate_frame(size=1) 144 | open3d.visualization.draw_geometries([self.skeleton_mesh, mesh_frame]) 145 | 146 | def skeleton_to_mesh(self, joint_color=(0.1, 0.1, 0.7), bone_color=(0.1, 0.9, 0.1)): 147 | final_mesh = open3d.geometry.TriangleMesh() 148 | for i in range(len(self.skeleton)): 149 | keypoint_mesh = get_sphere(position=self.skeleton[i], radius=0.03, color=joint_color) 150 | final_mesh = final_mesh + keypoint_mesh 151 | 152 | for line in self.lines: 153 | line_start_i = line[0] 154 | line_end_i = line[1] 155 | 156 | start_point = self.skeleton[line_start_i] 157 | end_point = self.skeleton[line_end_i] 158 | 159 | line_mesh = get_cylinder(start_point, end_point, radius=0.0075, color=bone_color) 160 | final_mesh += line_mesh 161 | self.skeleton_mesh = final_mesh 162 | return final_mesh 163 | 164 | def smooth(self, pose_sequence, sigma): 165 | """ 166 | gaussian smooth pose 167 | :param pose_sequence_2d: pose sequence, input is a list with every element is 15 * 2 body pose 168 | :param kernel_size: kernel size of guassian smooth 169 | :return: smoothed 2d pose 170 | """ 171 | pose_sequence = np.asarray(pose_sequence) 172 | pose_sequence_result = np.zeros_like(pose_sequence) 173 | keypoint_num = pose_sequence.shape[1] 174 | for i in range(keypoint_num): 175 | pose_sequence_i = pose_sequence[:, i, :] 176 | pose_sequence_filtered = gaussian_filter1d(pose_sequence_i, sigma, axis=0) 177 | pose_sequence_result[:, i, :] = pose_sequence_filtered 178 | return pose_sequence_result 179 | 180 | def get_max_preds(self, batch_heatmaps): 181 | ''' 182 | get predictions from score maps 183 | heatmaps: numpy.ndarray([batch_size, num_joints, height, width]) 184 | ''' 185 | assert isinstance(batch_heatmaps, np.ndarray), \ 186 | 'batch_heatmaps should be numpy.ndarray' 187 | assert batch_heatmaps.ndim == 4, 'batch_images should be 4-ndim' 188 | 189 | batch_size = batch_heatmaps.shape[0] 190 | num_joints = batch_heatmaps.shape[1] 191 | width = batch_heatmaps.shape[3] 192 | heatmaps_reshaped = batch_heatmaps.reshape((batch_size, num_joints, -1)) 193 | idx = np.argmax(heatmaps_reshaped, 2) 194 | maxvals = np.amax(heatmaps_reshaped, 2) 195 | 196 | maxvals = maxvals.reshape((batch_size, num_joints, 1)) 197 | idx = idx.reshape((batch_size, num_joints, 1)) 198 | 199 | preds = np.tile(idx, (1, 1, 2)).astype(np.float32) 200 | 201 | preds[:, :, 0] = (preds[:, :, 0]) % width 202 | preds[:, :, 1] = np.floor((preds[:, :, 1]) / width) 203 | 204 | pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2)) 205 | pred_mask = pred_mask.astype(np.float32) 206 | 207 | preds *= pred_mask 208 | return preds, maxvals 209 | 210 | 211 | if __name__ == '__main__': 212 | skeleton = Skeleton( 213 | calibration_path='/home/wangjian/Develop/egocentricvisualization/pose/fisheye/fisheye.calibration.json') 214 | data_path = r'/home/wangjian/Develop/egocentricvisualization/data_2' 215 | 
heatmap_dir = os.path.join(data_path, 'heatmaps') 216 | depth_dir = os.path.join(data_path, 'depths') 217 | out_dir = os.path.join(data_path, 'smooth_skeleton_mesh') 218 | if not os.path.isdir(out_dir): 219 | os.mkdir(out_dir) 220 | skeleon_list = [] 221 | out_path_list = [] 222 | for heatmap_name in tqdm(sorted(os.listdir(heatmap_dir))): 223 | heatmap_path = os.path.join(heatmap_dir, heatmap_name) 224 | mat_id = heatmap_name 225 | depth_path = os.path.join(depth_dir, mat_id) 226 | 227 | skeleton_array = skeleton.set_skeleton_from_file(heatmap_path, 228 | depth_path, 229 | # bone_length_file=r'/home/wangjian/Develop/egocentricvisualization/pose/fisheye/mean3D.mat', 230 | to_mesh=False) 231 | out_path = os.path.join(out_dir, mat_id + ".ply") 232 | skeleon_list.append(skeleton_array) 233 | out_path_list.append(out_path) 234 | 235 | smoothed_skeleton = skeleton.smooth(skeleon_list, sigma=1) 236 | print("saving to ply") 237 | for i in tqdm(range(len(smoothed_skeleton))): 238 | skeleton.skeleton = smoothed_skeleton[i] 239 | skeleton.skeleton_to_mesh() 240 | skeleton.save_skeleton_mesh(out_path_list[i]) 241 | 242 | # skeleton.set_skeleton_from_file(r'X:\Mo2Cap2Plus\static00\Datasets\Mo2Cap2\ego_system_test\sitting\heatmaps\img-04052020001910-937.mat', 243 | # r'X:\Mo2Cap2Plus\static00\Datasets\Mo2Cap2\ego_system_test\sitting\depths\img-04052020001910-937.mat', 244 | # # bone_length_file=r'F:\Develop\EgocentricSystemVisualization\pose\fisheye\mean3D.mat') 245 | # ) 246 | # 247 | # skeleton.render() 248 | # print(skeleton.skeleton) 249 | -------------------------------------------------------------------------------- /utils/volumetric.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import cv2 3 | import torch 4 | 5 | from utils import multiview 6 | from utils.pose_visualization_utils import get_cylinder, get_sphere 7 | import open3d 8 | 9 | 10 | class Point3D: 11 | def __init__(self, point, size=3, color=(0, 0, 255)): 12 | self.point = point 13 | self.size = size 14 | self.color = color 15 | 16 | def render(self, proj_matrix, canvas): 17 | point_2d = multiview.project_3d_points_to_image_plane_without_distortion( 18 | proj_matrix, np.array([self.point]) 19 | )[0] 20 | 21 | point_2d = tuple(map(int, point_2d)) 22 | cv2.circle(canvas, point_2d, self.size, self.color, self.size) 23 | 24 | return canvas 25 | 26 | def render_open3d(self): 27 | point_mesh = get_sphere(self.point, radius=0.02, color=self.color) 28 | return point_mesh 29 | 30 | 31 | 32 | class Line3D: 33 | def __init__(self, start_point, end_point, size=2, color=(0, 0, 255)): 34 | self.start_point, self.end_point = start_point, end_point 35 | self.size = size 36 | self.color = color 37 | 38 | def render(self, proj_matrix, canvas): 39 | start_point_2d, end_point_2d = multiview.project_3d_points_to_image_plane_without_distortion( 40 | proj_matrix, np.array([self.start_point, self.end_point]) 41 | ) 42 | 43 | start_point_2d = tuple(map(int, start_point_2d)) 44 | end_point_2d = tuple(map(int, end_point_2d)) 45 | 46 | cv2.line(canvas, start_point_2d, end_point_2d, self.color, self.size) 47 | 48 | return canvas 49 | 50 | def render_open3d(self): 51 | line_mesh = get_cylinder(self.start_point, self.end_point, radius=0.005) 52 | return line_mesh 53 | 54 | 55 | class Cuboid3D: 56 | def __init__(self, position, sides): 57 | self.position = position 58 | self.sides = sides 59 | 60 | def build(self): 61 | primitives = [] 62 | 63 | line_color = (255, 255, 0) 64 | 65 | start = self.position 
+ np.array([0, 0, 0]) 66 | primitives.append(Line3D(start, start + np.array([self.sides[0], 0, 0]), color=(255, 0, 0))) 67 | primitives.append(Line3D(start, start + np.array([0, self.sides[1], 0]), color=(0, 255, 0))) 68 | primitives.append(Line3D(start, start + np.array([0, 0, self.sides[2]]), color=(0, 0, 255))) 69 | 70 | start = self.position + np.array([self.sides[0], 0, self.sides[2]]) 71 | primitives.append(Line3D(start, start + np.array([-self.sides[0], 0, 0]), color=line_color)) 72 | primitives.append(Line3D(start, start + np.array([0, self.sides[1], 0]), color=line_color)) 73 | primitives.append(Line3D(start, start + np.array([0, 0, -self.sides[2]]), color=line_color)) 74 | 75 | start = self.position + np.array([self.sides[0], self.sides[1], 0]) 76 | primitives.append(Line3D(start, start + np.array([-self.sides[0], 0, 0]), color=line_color)) 77 | primitives.append(Line3D(start, start + np.array([0, -self.sides[1], 0]), color=line_color)) 78 | primitives.append(Line3D(start, start + np.array([0, 0, self.sides[2]]), color=line_color)) 79 | 80 | start = self.position + np.array([0, self.sides[1], self.sides[2]]) 81 | primitives.append(Line3D(start, start + np.array([self.sides[0], 0, 0]), color=line_color)) 82 | primitives.append(Line3D(start, start + np.array([0, -self.sides[1], 0]), color=line_color)) 83 | primitives.append(Line3D(start, start + np.array([0, 0, -self.sides[2]]), color=line_color)) 84 | 85 | return primitives 86 | 87 | def render(self, proj_matrix, canvas): 88 | # TODO: support rotation 89 | 90 | primitives = self.build() 91 | 92 | for primitive in primitives: 93 | canvas = primitive.render(proj_matrix, canvas) 94 | 95 | return canvas 96 | 97 | def render_open3d(self): 98 | primitives = self.build() 99 | 100 | mesh_canvas = [] 101 | 102 | for primitive in primitives: 103 | mesh = primitive.render_open3d() 104 | mesh_canvas.append(mesh) 105 | 106 | return mesh_canvas 107 | 108 | 109 | def get_rotation_matrix(axis, theta): 110 | """Returns the rotation matrix associated with counterclockwise rotation about 111 | the given axis by theta radians. 
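
`get_rotation_matrix` being defined here builds an axis-angle rotation, and `rotate_coord_volume` below applies it to every voxel centre of a coordinate volume. A small sanity check, assuming the package is importable from the repository root; the volume contents are toy values:

```python
import numpy as np
import torch

from utils.volumetric import get_rotation_matrix, rotate_coord_volume

# A quarter turn about the z axis should map the x axis onto the y axis.
R = get_rotation_matrix(axis=[0, 0, 1], theta=np.pi / 2)
print(np.round(R @ np.array([1.0, 0.0, 0.0]), 6))   # -> [0. 1. 0.]

# rotate_coord_volume applies the same rotation to every voxel centre of a
# coordinate volume shaped (x_len, y_len, z_len, 3).
coord_volume = torch.zeros(2, 2, 2, 3)
coord_volume[..., 0] = 1.0                           # all centres at (1, 0, 0)
rotated = rotate_coord_volume(coord_volume, theta=np.pi / 2, axis=[0, 0, 1])
print(rotated[0, 0, 0])                              # ~ tensor([0., 1., 0.]) up to numerical noise
```
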
112 | """ 113 | axis = np.asarray(axis) 114 | axis = axis / np.sqrt(np.dot(axis, axis)) 115 | a = np.cos(theta / 2.0) 116 | b, c, d = -axis * np.sin(theta / 2.0) 117 | aa, bb, cc, dd = a * a, b * b, c * c, d * d 118 | bc, ad, ac, ab, bd, cd = b * c, a * d, a * c, a * b, b * d, c * d 119 | return np.array([[aa + bb - cc - dd, 2 * (bc + ad), 2 * (bd - ac)], 120 | [2 * (bc - ad), aa + cc - bb - dd, 2 * (cd + ab)], 121 | [2 * (bd + ac), 2 * (cd - ab), aa + dd - bb - cc]]) 122 | 123 | 124 | def rotate_coord_volume(coord_volume, theta, axis): 125 | shape = coord_volume.shape 126 | device = coord_volume.device 127 | 128 | rot = get_rotation_matrix(axis, theta) 129 | rot = torch.from_numpy(rot).type(torch.float).to(device) 130 | 131 | coord_volume = coord_volume.view(-1, 3) 132 | coord_volume = rot.mm(coord_volume.t()).t() 133 | 134 | coord_volume = coord_volume.view(*shape) 135 | 136 | return coord_volume 137 | 138 | if __name__ == '__main__': 139 | cuboid3D = Cuboid3D(position=(-1, -1, 0), sides=(2, 2, 2)) 140 | 141 | mesh_list = cuboid3D.render_open3d() 142 | mesh_list.append(open3d.geometry.TriangleMesh.create_coordinate_frame()) 143 | 144 | open3d.visualization.draw_geometries(mesh_list) 145 | -------------------------------------------------------------------------------- /visualize.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1" 4 | from utils.skeleton import Skeleton 5 | import pickle 6 | import open3d 7 | 8 | from utils.depth2pointcloud import Depth2PointCloud 9 | 10 | 11 | def visualize(img_path, depth_path, pred_pose_path): 12 | skeleton = Skeleton(calibration_path='utils/fisheye/fisheye.calibration_05_08.json') 13 | 14 | with open(pred_pose_path, 'rb') as f: 15 | predicted_pose = pickle.load(f) 16 | 17 | predicted_pose_mesh = skeleton.joints_2_mesh(predicted_pose) 18 | 19 | get_point_cloud = Depth2PointCloud(visualization=False, 20 | camera_model='utils/fisheye/fisheye.calibration_05_08.json') 21 | 22 | scene = get_point_cloud.get_point_cloud_single_image(depth_path, img_path, output_path=None) 23 | 24 | open3d.visualization.draw_geometries([scene, predicted_pose_mesh]) 25 | 26 | def main(): 27 | parser = argparse.ArgumentParser() 28 | parser.add_argument("--img_path", type=str, required=True) 29 | parser.add_argument("--depth_path", type=str, required=True) 30 | parser.add_argument("--pose_path", type=str, required=True) 31 | 32 | args = parser.parse_args() 33 | img_path = args.img_path 34 | depth_path = args.depth_path 35 | pose_path = args.pose_path 36 | 37 | visualize(img_path, depth_path, pose_path) 38 | 39 | 40 | if __name__ == '__main__': 41 | main() --------------------------------------------------------------------------------
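
Finally, for reference, the same visualization can be driven over every prediction written by `demo.py`. The loop below is an illustrative sketch rather than a script shipped with the repository; it reuses the calls from `visualize.py` above and assumes the demo directory layout shown in the file tree (`img_*.jpg`, `img_*.jpg.exr`, `img_*.jpg.pkl`).

```python
import os
os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1"   # set before OpenCV is imported, as in visualize.py

import pickle

import open3d

from utils.skeleton import Skeleton
from utils.depth2pointcloud import Depth2PointCloud

calib = 'utils/fisheye/fisheye.calibration_05_08.json'
skeleton = Skeleton(calibration_path=calib)
point_cloud_builder = Depth2PointCloud(visualization=False, camera_model=calib)

img_dir, depth_dir, out_dir = 'data/demo/imgs', 'data/demo/depths', 'data/demo/out'
for pkl_name in sorted(os.listdir(out_dir)):
    if not pkl_name.endswith('.pkl'):
        continue
    img_name = pkl_name[:-len('.pkl')]                  # e.g. img_001000.jpg
    with open(os.path.join(out_dir, pkl_name), 'rb') as f:
        predicted_pose = pickle.load(f)                 # predicted joints in the egocentric camera frame
    pose_mesh = skeleton.joints_2_mesh(predicted_pose)
    scene = point_cloud_builder.get_point_cloud_single_image(
        os.path.join(depth_dir, img_name + '.exr'),     # e.g. img_001000.jpg.exr
        os.path.join(img_dir, img_name),
        output_path=None)
    open3d.visualization.draw_geometries([scene, pose_mesh])
```
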