├── .gitignore ├── README.md ├── data ├── .gitkeep └── demo │ ├── depths │ ├── img_001000.jpg.exr │ ├── img_001796.jpg.exr │ └── img_002376.jpg.exr │ ├── imgs │ ├── img_001000.jpg │ ├── img_001796.jpg │ └── img_002376.jpg │ └── out │ └── .gitkeep ├── dataset ├── __init__.py ├── demo_dataset.py ├── real_depth_utils.py └── test_dataset.py ├── demo.py ├── evaluation ├── __init__.py └── python_evaluate_3d_our_dataset.py ├── experiments └── sceneego │ └── test │ └── sceneego.yaml ├── models ├── .gitkeep └── sceneego │ └── checkpoints │ └── .gitkeep ├── network ├── __init__.py ├── pose_resnet.py ├── v2v.py └── voxel_net_depth.py ├── requirements.txt ├── resources └── Wang_CVPR_2023.gif ├── test.py ├── utils ├── __init__.py ├── calculate_errors.py ├── cfg.py ├── data_transforms.py ├── depth2pointcloud.py ├── fisheye │ ├── FishEyeCalibrated.py │ ├── FishEyeEquisolid.py │ ├── __init__.py │ ├── fisheye.calibration.json │ ├── fisheye.calibration_05_08.json │ └── mean3D.mat ├── get_predict.py ├── img.py ├── misc.py ├── multiview.py ├── op.py ├── pose_visualization_utils.py ├── rigid_transform_with_scale.py ├── skeleton.py └── volumetric.py └── visualize.py /.gitignore: -------------------------------------------------------------------------------- 1 | .idea 2 | 3 | 4 | *.tar 5 | 6 | *.ply 7 | 8 | *.pkl -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # SceneEgo 2 | 3 | Official implementation of paper: 4 | 5 | **Scene-aware Egocentric 3D Human Pose Estimation** 6 | 7 | *Jian Wang, Diogo Luvizon, Weipeng Xu, Lingjie Liu, Kripasindhu Sarkar, Christian Theobalt* 8 | 9 | *CVPR 2023* 10 | 11 | [[Project Page](https://people.mpi-inf.mpg.de/~jianwang/projects/sceneego/)] <---- The dataset link in the project page might have expired. 12 | 13 | [[SceneEgo Datasets (Train and Test)](https://edmond.mpg.de/dataset.xhtml?persistentId=doi:10.17617/3.VCIHDO)] 14 | 15 | 16 | 17 | [[EgoGTA](https://edmond.mpg.de/dataset.xhtml?persistentId=doi:10.17617/3.MYZMVZ)] [[EgoPW-Scene](https://edmond.mpg.de/dataset.xhtml?persistentId=doi:10.17617/3.EAFCFH)] 18 | 19 | ![Demo image](./resources/Wang_CVPR_2023.gif) 20 | 21 | ### Annotation format in Test dataset 22 | 23 | The annotation of the dataset is saved in "annotation.pkl" of each sequence. Load the pickle file with: 24 | 25 | ```python 26 | with open('annotation.pkl', 'rb') as f: 27 | data = pickle.load(f) 28 | print(data[0].keys()) 29 | ``` 30 | The data is a Python list, each item is a Python dict containing the annotations: 31 | - ext_id: the annotation id of external multiview mocap system; 32 | - calib_board_pose: the 6d pose of the calibration board on the head; 33 | - ego_pose_gt: the ground truth human body pose under the egocentric camera coordinate system, the joint sequence is: Neck, Right Shoulder, Right Elbow, Right Wrist, Left Shoulder, Left Elbow, Left Wrist, Right Hip, Right Knee, Right Ankle, Right Toe, Left Hip, Left Knee, Left Ankle, Left Toe; 34 | - ext_pose_gt: the human pose ground truth in the mocap system coordinate; 35 | - image_name: name of image under directory "imgs"; 36 | - ego_camera_matrix: the 6d pose of the egocentric camera on the head. 
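For convenience, the joint order above can be turned into an index lookup. The sketch below is illustrative rather than part of the repository; it assumes each `ego_pose_gt` entry is an array of shape (15, 3) following the joint order listed above, and that some frames may have no ground-truth pose (handled the same way in `dataset/test_dataset.py`):

```python
# Illustrative only: index the documented joint order (assumes pose arrays of shape (15, 3)).
JOINT_NAMES = [
    'Neck', 'Right Shoulder', 'Right Elbow', 'Right Wrist',
    'Left Shoulder', 'Left Elbow', 'Left Wrist',
    'Right Hip', 'Right Knee', 'Right Ankle', 'Right Toe',
    'Left Hip', 'Left Knee', 'Left Ankle', 'Left Toe',
]

for item in data:                      # `data` loaded from annotation.pkl as shown above
    pose = item['ego_pose_gt']
    if pose is None:                   # frames without ground-truth pose are skipped
        continue
    neck = pose[JOINT_NAMES.index('Neck')]
    print(item['image_name'], neck)    # 3D neck position in the egocentric camera frame
```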
37 | 38 | The id of the egocentric camera can also be obtained with the synchronization file with: 39 | ```python 40 | 41 | with open('syn.json', 'r') as f: 42 | syn_data = json.load(f) 43 | 44 | ego_start_frame = syn_data['ego'] 45 | ext_start_frame = syn_data['ext'] 46 | ego_id = ext_id - ext_start_frame + ego_start_frame 47 | egocentric_image_name = "img_%06d.jpg" % ego_id 48 | ``` 49 | 50 | ### Install 51 | 52 | 1. Create a new anaconda environment 53 | 54 | ```shell 55 | conda create -n sceneego python=3.9 56 | 57 | conda activate sceneego 58 | ``` 59 | 60 | 2. Install pytorch 1.13.1 from https://pytorch.org/get-started/previous-versions/ 61 | 62 | 3. Install other dependencies 63 | ```shell 64 | pip install -r requirements.txt 65 | ``` 66 | ### Run the demo 67 | 68 | 1. Download [pre-trained pose estimation model](https://nextcloud.mpi-klsb.mpg.de/index.php/s/DGB6XKEPwwQbmTi) and put it under ```models/sceneego/checkpoints``` 69 | 70 | 2. run: 71 | ```shell 72 | python demo.py --config experiments/sceneego/test/sceneego.yaml --img_dir data/demo/imgs --depth_dir data/demo/depths --output_dir data/demo/out --vis True 73 | ``` 74 | The result will be shown with the open3d visualizer and the predicted pose is saved at ```data/demo/out```. 75 | 76 | 3. The predicted pose is saved as the pkl file (e.g. ```img_001000.jpg.pkl```). To visualize the predicted result, run: 77 | ```shell 78 | python visualize.py --img_path data/demo/imgs/img_001000.jpg --depth_path data/demo/depths/img_001000.jpg.exr --pose_path data/demo/out/img_001000.jpg.pkl 79 | ``` 80 | The result will be shown with the open3d visualizer. 81 | 82 | ### Test on your own dataset 83 | If you want to test on your own dataset, after obtaining egocentric frames, you need to: 84 | 85 | 1. Run the egocentric human body segmentation network to get the human body segmentation for each frame: 86 | 87 | See repo: [Egocentric Human Body Segmentation](https://github.com/yt4766269/EgocentricHumanBodySeg) 88 | 89 | 3. Run the depth estimator to get the scene depth map for each frame: 90 | 91 | See repo: [Egocentric Depth Estimator](https://github.com/yt4766269/EgocentricDepthEstimator) 92 | 93 | 94 | ### Citation 95 | 96 | If you find this work or code is helpful in your research, please cite: 97 | ```` 98 | @inproceedings{wang2023scene, 99 | title={Scene-aware Egocentric 3D Human Pose Estimation}, 100 | author={Wang, Jian and Luvizon, Diogo and Xu, Weipeng and Liu, Lingjie and Sarkar, Kripasindhu and Theobalt, Christian}, 101 | booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, 102 | pages={13031--13040}, 103 | year={2023} 104 | } 105 | ```` 106 | 107 | [//]: # (### Test on real-world dataset) 108 | 109 | [//]: # () 110 | [//]: # (1. Download [pre-trained pose estimation model](https://nextcloud.mpi-klsb.mpg.de/index.php/s/DGB6XKEPwwQbmTi) and put it under ```models/sceneego/checkpoints```) 111 | 112 | [//]: # () 113 | [//]: # () 114 | [//]: # (2. Download the test dataset from to ```data/sceneego```) 115 | 116 | [//]: # () 117 | [//]: # (3. 
run:) 118 | 119 | [//]: # (```shell) 120 | 121 | [//]: # (python test.py --data_path data/sceneego) 122 | 123 | [//]: # (```) 124 | 125 | 126 | 127 | 128 | 129 | 130 | -------------------------------------------------------------------------------- /data/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/data/.gitkeep -------------------------------------------------------------------------------- /data/demo/depths/img_001000.jpg.exr: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/data/demo/depths/img_001000.jpg.exr -------------------------------------------------------------------------------- /data/demo/depths/img_001796.jpg.exr: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/data/demo/depths/img_001796.jpg.exr -------------------------------------------------------------------------------- /data/demo/depths/img_002376.jpg.exr: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/data/demo/depths/img_002376.jpg.exr -------------------------------------------------------------------------------- /data/demo/imgs/img_001000.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/data/demo/imgs/img_001000.jpg -------------------------------------------------------------------------------- /data/demo/imgs/img_001796.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/data/demo/imgs/img_001796.jpg -------------------------------------------------------------------------------- /data/demo/imgs/img_002376.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/data/demo/imgs/img_002376.jpg -------------------------------------------------------------------------------- /data/demo/out/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/data/demo/out/.gitkeep -------------------------------------------------------------------------------- /dataset/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/dataset/__init__.py -------------------------------------------------------------------------------- /dataset/demo_dataset.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import pickle 4 | import sys 5 | 6 | import cv2 7 | import numpy as np 8 | import torch 9 | from torch.utils.data import Dataset 10 | 11 | # import utils.data_transforms as transforms 12 | from utils.data_transforms import Normalize, ToTensor 13 | 14 | from dataset.real_depth_utils import 
depth_map_to_voxel 15 | from utils.fisheye.FishEyeCalibrated import FishEyeCameraCalibrated 16 | 17 | 18 | class DemoDataset(Dataset): 19 | 20 | def calculated_ray_direction(self, image_width, image_height): 21 | points = np.zeros(shape=(image_width, image_height, 2)) 22 | x_range = np.array(range(image_width)) 23 | y_range = np.array(range(image_height)) 24 | points[:, :, 0] = np.add(points[:, :, 0].transpose(), x_range).transpose() 25 | points[:, :, 1] = np.add(points[:, :, 1], y_range) 26 | points = points.reshape((-1, 2)) 27 | ray = self.camera_model.camera2world_ray(points) 28 | return ray 29 | 30 | def __init__(self, config, img_dir, depth_dir, voxel_output=False, img_mean=(0.485, 0.456, 0.406), 31 | img_std=(0.229, 0.224, 0.225)): 32 | self.img_dir = img_dir 33 | self.depth_dir = depth_dir 34 | self.voxel_output = voxel_output 35 | 36 | self.normalize = Normalize(mean=img_mean, std=img_std) 37 | self.to_tensor = ToTensor() 38 | 39 | self.img_size = config.image_shape 40 | 41 | self.camera_model_path = config.dataset.camera_calibration_path 42 | self.camera_model = FishEyeCameraCalibrated(calibration_file_path=self.camera_model_path) 43 | 44 | self.ray = self.calculated_ray_direction(config.dataset.image_width, config.dataset.image_height) 45 | 46 | self.data_list = self.get_input_data(self.img_dir, self.depth_dir) 47 | 48 | self.cuboid_side = config.model.cuboid_side 49 | self.volume_size = config.model.volume_size 50 | 51 | def get_input_data(self, img_dir, depth_dir): 52 | print("start loading test file") 53 | data_list = [] 54 | 55 | for img_name in os.listdir(img_dir): 56 | img_path = os.path.join(img_dir, img_name) 57 | depth_path = os.path.join(depth_dir, f'{img_name}.exr') 58 | if not os.path.exists(depth_path): 59 | raise Exception(f"The depth map {depth_path} does not exist!") 60 | 61 | data_list.append({'img_path': img_path, 'depth_path': depth_path}) 62 | return data_list 63 | 64 | def __len__(self): 65 | return len(self.data_list) 66 | 67 | def __getitem__(self, index): 68 | img_path = self.data_list[index]['img_path'] 69 | depth_path = self.data_list[index]['depth_path'] 70 | 71 | 72 | raw_img = cv2.imread(img_path) 73 | raw_img = raw_img[:, 128: -128, :] 74 | 75 | # data augmentation 76 | img = cv2.resize(raw_img, dsize=(256, 256)) / 255. 
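        # preprocessing recap: the 1280x1024 frame is center-cropped to 1024x1024
        # (128 px removed on each side), resized to 256x256 and scaled to [0, 1];
        # the channel flip below converts OpenCV's BGR layout to RGB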
77 | 78 | img_rgb = img[:, :, ::-1] 79 | img_rgb = np.ascontiguousarray(img_rgb) 80 | 81 | img_torch = self.normalize(img) 82 | img_torch = self.to_tensor(img_torch) 83 | img_rgb = self.normalize(img_rgb) 84 | img_rgb_torch = self.to_tensor(img_rgb) 85 | 86 | depth_map = cv2.imread(depth_path, cv2.IMREAD_ANYCOLOR | cv2.IMREAD_ANYDEPTH) 87 | if depth_map.shape[0] != 1024 or depth_map.shape[1] != 1280: 88 | depth_map = cv2.resize(depth_map, (1280, 1024), interpolation=cv2.INTER_NEAREST) 89 | if len(depth_map.shape) == 3: 90 | depth_map = depth_map[:, :, 0] 91 | depth_map[depth_map > 10] = 10 92 | 93 | if self.voxel_output is True: 94 | depth_scene_info = depth_map_to_voxel(self.ray, depth_map, self.cuboid_side, self.volume_size) 95 | else: 96 | depth_scene_info = torch.from_numpy(depth_map).float() 97 | 98 | return img_torch, img_rgb_torch, depth_scene_info, img_path 99 | -------------------------------------------------------------------------------- /dataset/real_depth_utils.py: -------------------------------------------------------------------------------- 1 | import json 2 | import numpy as np 3 | from copy import copy 4 | import torch 5 | 6 | def calcualate_depth_scale(depth_scale_json_file, log_err=False): 7 | with open(depth_scale_json_file, 'r') as f: 8 | depth_scale_data_list = json.load(f) 9 | # print(depth_scale_data_list) 10 | 11 | scale_list = [] 12 | for scale_data in depth_scale_data_list: 13 | x1 = scale_data['x1'] 14 | x2 = scale_data['x2'] 15 | 16 | x1 = np.asarray(x1) 17 | x2 = np.asarray(x2) 18 | 19 | distance = np.linalg.norm(x2 - x1) 20 | # print(distance) 21 | scale = scale_data['real'] / distance 22 | scale_list.append(scale) 23 | if log_err: 24 | print(scale_list) 25 | print(np.std(scale_list) / np.average(scale_list)) 26 | # print(np.average(scale_list)) 27 | return np.average(scale_list) 28 | 29 | def depth_map_to_voxel(ray, depth, cuboid_side, volume_size): 30 | # directly multiply the depth on the pre-calculated rays 31 | depth_img_flat = depth.T.reshape((-1)) 32 | point_cloud = ray.T * depth_img_flat 33 | point_cloud = point_cloud.T 34 | 35 | # import open3d 36 | # point_cloud_vis = open3d.geometry.PointCloud() 37 | # point_cloud_vis.points = open3d.utility.Vector3dVector(point_cloud) 38 | # coord = open3d.geometry.TriangleMesh.create_coordinate_frame() 39 | # open3d.visualization.draw_geometries([point_cloud_vis, coord]) 40 | 41 | # point cloud to voxel 42 | voxel_torch = point_cloud_to_voxel_pytorch(point_cloud, cuboid_side, volume_size) 43 | return voxel_torch 44 | 45 | def point_cloud_to_voxel_pytorch(point_cloud, cuboid_side, volume_size): 46 | scene_point_cloud_local = copy(point_cloud) 47 | scene_point_cloud_local[:, 0] = (scene_point_cloud_local[:, 48 | 0] + cuboid_side / 2) * volume_size / cuboid_side 49 | scene_point_cloud_local[:, 1] = (scene_point_cloud_local[:, 50 | 1] + cuboid_side / 2) * volume_size / cuboid_side 51 | scene_point_cloud_local[:, 2] = (scene_point_cloud_local[:, 2]) * volume_size / cuboid_side 52 | 53 | scene_point_cloud_local = np.round_(scene_point_cloud_local) 54 | good_indices = np.logical_and(volume_size-1 >= scene_point_cloud_local, scene_point_cloud_local >= 0) 55 | good_indices = np.all(good_indices, axis=1) 56 | scene_point_cloud_local = scene_point_cloud_local[good_indices] 57 | # scene_point_cloud_local = np.clip(scene_point_cloud_local, a_min=0, a_max=self.volume_size - 1).astype(np.int) 58 | voxel_torch = torch.zeros(size=(volume_size, volume_size, volume_size)) 59 | voxel_torch[scene_point_cloud_local.T] = 1 60 | 
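    # voxel_torch is a binary occupancy grid of shape (volume_size, volume_size, volume_size):
    # voxels hit by at least one scene point are set to 1, all others remain 0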
return voxel_torch 61 | 62 | if __name__ == '__main__': 63 | json_file = r'\\winfs-inf\CT\EgoMocap\work\EgoBodyInContext\sfm_test_data\jian3\scale.json' 64 | result = calcualate_depth_scale(json_file, log_err=True) 65 | print(result) -------------------------------------------------------------------------------- /dataset/test_dataset.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import pickle 4 | import sys 5 | 6 | import cv2 7 | import numpy as np 8 | import torch 9 | from torch.utils.data import Dataset 10 | 11 | # import utils.data_transforms as transforms 12 | from utils.data_transforms import Normalize, ToTensor 13 | 14 | from utils.calculate_errors import align_skeleton, calculate_error 15 | from dataset.real_depth_utils import depth_map_to_voxel 16 | from utils.fisheye.FishEyeCalibrated import FishEyeCameraCalibrated 17 | 18 | 19 | class TestDataset(Dataset): 20 | 21 | def calculated_ray_direction(self, image_width, image_height): 22 | points = np.zeros(shape=(image_width, image_height, 2)) 23 | x_range = np.array(range(image_width)) 24 | y_range = np.array(range(image_height)) 25 | points[:, :, 0] = np.add(points[:, :, 0].transpose(), x_range).transpose() 26 | points[:, :, 1] = np.add(points[:, :, 1], y_range) 27 | points = points.reshape((-1, 2)) 28 | ray = self.camera_model.camera2world_ray(points) 29 | return ray 30 | 31 | def __init__(self, config, root_dir, seq_name, estimated_depth_name=None, voxel_output=True, img_mean=(0.485, 0.456, 0.406), 32 | img_std=(0.229, 0.224, 0.225), with_one_depth=False): 33 | self.voxel_output = voxel_output 34 | self.estimated_depth_name = estimated_depth_name 35 | self.with_one_depth = with_one_depth # only for visualization 36 | 37 | self.normalize = Normalize(mean=img_mean, std=img_std) 38 | self.to_tensor = ToTensor() 39 | 40 | self.img_size = config.image_shape 41 | 42 | self.camera_model_path = config.dataset.camera_calibration_path 43 | self.camera_model = FishEyeCameraCalibrated(calibration_file_path=self.camera_model_path) 44 | 45 | self.ray = self.calculated_ray_direction(config.dataset.image_width, config.dataset.image_height) 46 | 47 | self.image_path_list, self.gt_pose_list, self.depth_map_list = self.get_gt_data(root_dir, seq_name) 48 | 49 | assert len(self.image_path_list) == len(self.gt_pose_list) and len(self.gt_pose_list) == len( 50 | self.depth_map_list) 51 | 52 | self.cuboid_side = config.model.cuboid_side 53 | self.volume_size = config.model.volume_size 54 | 55 | def get_gt_data(self, root_dir, seq_name): 56 | print("start loading test file") 57 | base_path = os.path.join(root_dir, seq_name) 58 | 59 | img_data_path = os.path.join(base_path, 'imgs') 60 | gt_path = os.path.join(base_path, 'local_pose_gt.pkl') 61 | if self.estimated_depth_name is not None: 62 | depth_path = os.path.join(base_path, self.estimated_depth_name) 63 | else: 64 | depth_path = os.path.join(base_path, 'rendered', 'depths') 65 | syn_path = os.path.join(base_path, 'syn.json') 66 | 67 | with open(syn_path, 'r') as f: 68 | syn_data = json.load(f) 69 | 70 | ego_start_frame = syn_data['ego'] 71 | ext_start_frame = syn_data['ext'] 72 | 73 | with open(gt_path, 'rb') as f: 74 | pose_gt_data = pickle.load(f) 75 | 76 | image_path_list = [] 77 | gt_pose_list = [] 78 | depth_path_list = [] 79 | 80 | for pose_gt_item in pose_gt_data: 81 | ext_id = pose_gt_item['ext_id'] 82 | ego_pose_gt = pose_gt_item['ego_pose_gt'] 83 | if ego_pose_gt is None: 84 | continue 85 | ego_id = ext_id - 
ext_start_frame + ego_start_frame 86 | egocentric_image_name = "img_%06d.jpg" % ego_id 87 | depth_name = "img_%06d" % ego_id 88 | 89 | image_path = os.path.join(img_data_path, egocentric_image_name) 90 | if not os.path.exists(image_path): 91 | continue 92 | image_path_list.append(image_path) 93 | if self.estimated_depth_name is not None: 94 | depth_full_path = os.path.join(depth_path, 'img_%06d.jpg.exr' % ego_id) 95 | else: 96 | depth_full_path = os.path.join(depth_path, depth_name, 'Image0001.exr') 97 | depth_path_list.append(depth_full_path) 98 | gt_pose_list.append(ego_pose_gt) 99 | print("dataset length: {}".format(len(image_path_list))) 100 | return image_path_list, gt_pose_list, depth_path_list 101 | 102 | def evaluate_mpjpe(self, predicted_pose_list): 103 | gt_pose_list = self.gt_pose_list 104 | 105 | mpjpe = calculate_error(predicted_pose_list, gt_pose_list) 106 | 107 | # align the estimated result and original result 108 | aligned_estimated_result, gt_seq = align_skeleton(predicted_pose_list, gt_pose_list, None) 109 | 110 | pampjpe = calculate_error(aligned_estimated_result, gt_seq) 111 | 112 | return mpjpe, pampjpe 113 | 114 | def __len__(self): 115 | return len(self.image_path_list) 116 | 117 | def __getitem__(self, index): 118 | img_path = self.image_path_list[index] 119 | if self.with_one_depth: 120 | # only for visualization 121 | depth_path = self.depth_map_list[2070] 122 | else: 123 | depth_path = self.depth_map_list[index] 124 | 125 | raw_img = cv2.imread(img_path) 126 | raw_img = raw_img[:, 128: -128, :] 127 | # data augmentation 128 | img = cv2.resize(raw_img, dsize=(256, 256)) / 255. 129 | 130 | img_rgb = img[:, :, ::-1] 131 | img_rgb = np.ascontiguousarray(img_rgb) 132 | 133 | img_torch = self.normalize(img) 134 | img_torch = self.to_tensor(img_torch) 135 | img_rgb = self.normalize(img_rgb) 136 | img_rgb_torch = self.to_tensor(img_rgb) 137 | 138 | depth_map = cv2.imread(depth_path, cv2.IMREAD_ANYCOLOR | cv2.IMREAD_ANYDEPTH) 139 | if depth_map.shape[0] != 1024 or depth_map.shape[1] != 1280: 140 | depth_map = cv2.resize(depth_map, (1280, 1024), interpolation=cv2.INTER_NEAREST) 141 | if len(depth_map.shape) == 3: 142 | depth_map = depth_map[:, :, 0] 143 | depth_map[depth_map > 10] = 10 144 | 145 | if self.voxel_output is True: 146 | depth_scene_info = depth_map_to_voxel(self.ray, depth_map, self.cuboid_side, self.volume_size) 147 | else: 148 | depth_scene_info = torch.from_numpy(depth_map).float() 149 | 150 | return img_torch, img_rgb_torch, depth_scene_info, img_path 151 | 152 | def main(): 153 | dataset = TestDataset(config=None, seq_name='new_jian1', 154 | voxel_output=False) 155 | 156 | img_torch, img_rgb_torch, depth_scene_info, img_path = dataset[100] 157 | 158 | 159 | if __name__ == '__main__': 160 | main() -------------------------------------------------------------------------------- /demo.py: -------------------------------------------------------------------------------- 1 | import os 2 | from pprint import pprint 3 | 4 | import torch 5 | from torch.utils.data import DataLoader 6 | from tqdm import tqdm 7 | 8 | from dataset.demo_dataset import DemoDataset 9 | from network.voxel_net_depth import VoxelNetwork_depth 10 | from utils import cfg 11 | from utils.skeleton import Skeleton 12 | import argparse 13 | import pickle 14 | from visualize import visualize 15 | 16 | os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1" 17 | 18 | 19 | class Demo: 20 | def __init__(self, config, img_dir, depth_dir): 21 | self.device = torch.device('cuda') if torch.cuda.is_available() else 
torch.device('cpu') 22 | self.demo_dataset = DemoDataset(config, img_dir, depth_dir, voxel_output=False) 23 | self.demo_dataloader = DataLoader(self.demo_dataset, batch_size=1, shuffle=False, drop_last=False, 24 | num_workers=0) 25 | 26 | self.network = VoxelNetwork_depth(config) 27 | 28 | # load the network model 29 | model_path = config.test.model_path 30 | loads = torch.load(model_path) 31 | self.network.load_state_dict(loads['state_dict']) 32 | 33 | self.network = self.network.to(self.device) 34 | 35 | self.skeleton = Skeleton(calibration_path=config.dataset.camera_calibration_path) 36 | 37 | def run(self, config): 38 | print('---------------------Start Training-----------------------') 39 | pprint(config.__dict__) 40 | self.network.eval() 41 | 42 | result_list = [] 43 | with torch.no_grad(): 44 | for i, (img, img_rgb, depth_info, img_path) in tqdm(enumerate(self.demo_dataloader)): 45 | img = img.to(self.device) 46 | img_rgb = img_rgb.to(self.device) 47 | 48 | depth_info = depth_info.to(self.device) 49 | 50 | grid_coord_proj_batch = self.network.grid_coord_proj_batch 51 | coord_volumes = self.network.coord_volumes 52 | 53 | vol_keypoints_3d, features, volumes, coord_volumes = self.network(img, grid_coord_proj_batch, 54 | coord_volumes, 55 | depth_map_batch=depth_info) 56 | 57 | predicted_keypoints_batch = vol_keypoints_3d.cpu().numpy() 58 | 59 | assert len(vol_keypoints_3d) == 1 and len(img_path) == 1 # make sure the batch is 1 60 | 61 | result_list.append({'img_path': img_path[0], 'predicted_keypoints': predicted_keypoints_batch[0]}) 62 | 63 | # save predicted joint to output dir 64 | 65 | return result_list 66 | 67 | 68 | def main(): 69 | parser = argparse.ArgumentParser() 70 | parser.add_argument('--config', type=str, required=False, default='experiments/sceneego/test/sceneego.yaml') 71 | parser.add_argument("--img_dir", type=str, required=False, default='data/demo/imgs') 72 | parser.add_argument("--depth_dir", type=str, required=False, default='data/demo/depths') 73 | parser.add_argument("--output_dir", type=str, required=False, default='data/demo/out') 74 | parser.add_argument("--vis", type=str, required=False, default='false') 75 | 76 | args = parser.parse_args() 77 | 78 | config_path = args.config 79 | img_dir = args.img_dir 80 | depth_dir = args.depth_dir 81 | output_dir = args.output_dir 82 | vis = args.vis 83 | 84 | config = cfg.load_config(config_path) 85 | demo = Demo(config, img_dir, depth_dir) 86 | result_list = demo.run(config) 87 | 88 | for result_dict in result_list: 89 | # save predicted joint list 90 | img_path = result_dict['img_path'] 91 | img_name = os.path.split(img_path)[1] 92 | pose_pred = result_dict['predicted_keypoints'] 93 | out_path = os.path.join(output_dir, f'{img_name}.pkl') 94 | depth_path = os.path.join(depth_dir, f'{img_name}.exr') 95 | 96 | with open(out_path, 'wb') as f: 97 | pickle.dump(pose_pred, f) 98 | 99 | # visualize the pose and depth map 100 | if vis.lower() == 'true': 101 | visualize(img_path, depth_path, out_path) 102 | 103 | 104 | if __name__ == '__main__': 105 | main() 106 | -------------------------------------------------------------------------------- /evaluation/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/evaluation/__init__.py -------------------------------------------------------------------------------- /evaluation/python_evaluate_3d_our_dataset.py: 
-------------------------------------------------------------------------------- 1 | from scipy.io import loadmat 2 | import open3d 3 | from utils.skeleton import Skeleton 4 | import numpy as np 5 | from utils.calculate_errors import align_skeleton, calculate_error 6 | import os 7 | from natsort import natsorted 8 | from tqdm import tqdm 9 | import pickle 10 | 11 | jian3_motion_type = { 12 | 'start': 557, 13 | 'walking': [np.asarray([557, 897]), np.asarray([1137, 1327])], 14 | 'running': [np.asarray([897, 1137])], 15 | 'boxing': [np.asarray([1327, 1417])], 16 | 'stretching': [np.asarray([1417, 1587])], 17 | 'waving': [np.asarray([1587, 1687])], 18 | 'sitting': [np.asarray([1687, 1857])]} 19 | 20 | studio_jian1_motion_type = { 21 | 'start': 503, 22 | 'walking': [np.asarray([503, 1723])], 23 | 'running': [np.asarray([1723, 2153])], 24 | 'crouching': [np.asarray([2153, 2393])], 25 | 'boxing': [np.asarray([2393, 2883])], 26 | 'dancing': [np.asarray([2883, 3223])], 27 | 'stretching': [np.asarray([3223, 3553])], 28 | 'waving': [np.asarray([3553, 3603])]} 29 | 30 | studio_lingjie1_motion_type = { 31 | 'start': 551, 32 | 'walking': [np.asarray([551, 1761])], 33 | 'crouching': [np.asarray([1761, 2031])], 34 | 'boxing': [np.asarray([2031, 2331])], 35 | 'dancing': [np.asarray([2331, 2691])], 36 | 'stretching': [np.asarray([2691, 2991])], 37 | 'waving': [np.asarray([2991, 3251])]} 38 | 39 | studio_jian2_motion_type = { 40 | 'start': 600, 41 | 'walking': [np.asarray([600, 1920])], 42 | 'dancing': [np.asarray([2170, 2310])], 43 | 'playingballs': [np.asarray([1920, 2170])], 44 | 'opendoor': [np.asarray([2310, 2740])], 45 | 'playgolf': [np.asarray([2740, 2980])], 46 | 'talking': [np.asarray([2980, 3210])], 47 | 'shootingarrow': [np.asarray([3210, 3400])]} 48 | 49 | studio_lingjie2_motion_type = { 50 | 'start': 438, 51 | 'walking': [np.asarray([438, 1048])], 52 | 'running': [np.asarray([1048, 1548])], 53 | 'playingballs': [np.asarray([1548, 1748])], 54 | 'opendoor': [np.asarray([1748, 2008])], 55 | 'playgolf': [np.asarray([2008, 2288])], 56 | 'talking': [np.asarray([2288, 2528])], 57 | 'shootingarrow': [np.asarray([2528, 2738])]} 58 | 59 | path_dict = { 60 | 'jian3': { 61 | 'gt_path': r'/HPS/Mo2Cap2Plus/work/MakeWeipengStudioTestData/data/jian3/jian3.pkl', 62 | 'start_frame': 557, 63 | 'end_frame': 1857, 64 | "predicted_path": r'/HPS/Mo2Cap2Plus1/static00/EgocentricData/REC08102020/jian3' 65 | }, 66 | 'studio-jian1': { 67 | 'gt_path': r'/HPS/Mo2Cap2Plus/work/MakeWeipengStudioTestData/data/studio-jian1/jian1.pkl', 68 | 'start_frame': 503, 69 | 'end_frame': 3603, 70 | "predicted_path": r'/HPS/Mo2Cap2Plus1/static00/EgocentricData/REC23102020/studio-jian1' 71 | }, 72 | 'studio-jian2': { 73 | 'gt_path': r'/HPS/Mo2Cap2Plus/work/MakeWeipengStudioTestData/data/studio-jian2/jian2.pkl', 74 | 'start_frame': 600, 75 | 'end_frame': 3400, 76 | "predicted_path": r'/HPS/Mo2Cap2Plus1/static00/EgocentricData/REC23102020/studio-jian2' 77 | }, 78 | 'studio-lingjie1': { 79 | 'gt_path': r'/HPS/Mo2Cap2Plus/work/MakeWeipengStudioTestData/data/studio-lingjie1/lingjie1.pkl', 80 | 'start_frame': 551, 81 | 'end_frame': 3251, 82 | "predicted_path": r'/HPS/Mo2Cap2Plus1/static00/EgocentricData/REC23102020/studio-lingjie1' 83 | }, 84 | 'studio-lingjie2': { 85 | 'gt_path': r'/HPS/Mo2Cap2Plus/work/MakeWeipengStudioTestData/data/studio-lingjie2/lingjie2.pkl', 86 | 'start_frame': 438, 87 | 'end_frame': 2738, 88 | "predicted_path": r'/HPS/Mo2Cap2Plus1/static00/EgocentricData/REC23102020/studio-lingjie2' 89 | } 90 | } 91 | 92 | 93 | 
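# same sequences as above, with the cluster paths replaced by mapped network-drive paths for local runs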
path_dict_local = { 94 | 'jian3': { 95 | 'gt_path': r'X:/Mo2Cap2Plus/work/MakeWeipengStudioTestData/data/jian3/jian3.pkl', 96 | 'start_frame': 557, 97 | 'end_frame': 1857, 98 | "predicted_path": r'X:/Mo2Cap2Plus1/static00/EgocentricData/REC08102020/jian3' 99 | }, 100 | 'studio-jian1': { 101 | 'gt_path': r'X:/Mo2Cap2Plus/work/MakeWeipengStudioTestData/data/studio-jian1/jian1.pkl', 102 | 'start_frame': 503, 103 | 'end_frame': 3603, 104 | "predicted_path": r'X:/Mo2Cap2Plus1/static00/EgocentricData/REC23102020/studio-jian1' 105 | }, 106 | 'studio-jian2': { 107 | 'gt_path': r'X:/Mo2Cap2Plus/work/MakeWeipengStudioTestData/data/studio-jian2/jian2.pkl', 108 | 'start_frame': 600, 109 | 'end_frame': 3400, 110 | "predicted_path": r'X:/Mo2Cap2Plus1/static00/EgocentricData/REC23102020/studio-jian2' 111 | }, 112 | 'studio-lingjie1': { 113 | 'gt_path': r'X:/Mo2Cap2Plus/work/MakeWeipengStudioTestData/data/studio-lingjie1/lingjie1.pkl', 114 | 'start_frame': 551, 115 | 'end_frame': 3251, 116 | "predicted_path": r'X:/Mo2Cap2Plus1/static00/EgocentricData/REC23102020/studio-lingjie1' 117 | }, 118 | 'studio-lingjie2': { 119 | 'gt_path': r'X:/Mo2Cap2Plus/work/MakeWeipengStudioTestData/data/studio-lingjie2/lingjie2.pkl', 120 | 'start_frame': 438, 121 | 'end_frame': 2738, 122 | "predicted_path": r'X:/Mo2Cap2Plus1/static00/EgocentricData/REC23102020/studio-lingjie2' 123 | } 124 | } 125 | 126 | def evaluate_3d_our_dataset(sequence_name, predicted_pose, scale=True, select_start_to_end=True): 127 | gt_path = path_dict[sequence_name]['gt_path'] 128 | start_frame = path_dict[sequence_name]['start_frame'] 129 | end_frame = path_dict[sequence_name]['end_frame'] 130 | predicted_path = path_dict[sequence_name]['predicted_path'] 131 | 132 | def load_gt_data(gt_path, start_frame, end_frame, mat_start_frame): 133 | with open(gt_path, 'rb') as f: 134 | pose_gt = pickle.load(f) 135 | clip = [] 136 | for i in range(start_frame, end_frame): 137 | clip.append(pose_gt[i - mat_start_frame]) 138 | 139 | skeleton_list = clip 140 | 141 | return np.asarray(skeleton_list) 142 | 143 | gt_pose_list = load_gt_data(gt_path, start_frame, end_frame, start_frame) 144 | 145 | # get predicted pose 146 | 147 | skeleton_model = Skeleton(calibration_path='utils/fisheye/fisheye.calibration.json') 148 | 149 | # sort the predicted poses 150 | # predicted_pose = natsorted(predicted_pose, key=lambda pose: pose[0]) 151 | # 152 | # predicted_pose_list = [pose_tuple[1] for pose_tuple in predicted_pose] 153 | if select_start_to_end: 154 | predicted_pose_list = predicted_pose[start_frame: end_frame] 155 | else: 156 | predicted_pose_list = predicted_pose 157 | 158 | aligned_estimated_result, gt_seq = align_skeleton(predicted_pose_list, gt_pose_list, None, scale=scale) 159 | 160 | aligned_original_mpjpe = calculate_error(aligned_estimated_result, gt_seq) 161 | 162 | # align the estimated result and original result 163 | aligned_estimated_result, gt_seq = align_skeleton(predicted_pose_list, gt_pose_list, skeleton_model, scale=scale) 164 | 165 | bone_length_aligned_original_mpjpe = calculate_error(aligned_estimated_result, gt_seq) 166 | 167 | print(aligned_original_mpjpe) 168 | print(bone_length_aligned_original_mpjpe) 169 | 170 | calculate_different_motion(aligned_estimated_result, gt_seq, sequence_name) 171 | 172 | return aligned_original_mpjpe, bone_length_aligned_original_mpjpe 173 | 174 | 175 | def calculate_different_motion(estimated_pose, gt_pose, data_dir): 176 | if 'jian3' in data_dir: 177 | motion_type = jian3_motion_type 178 | if 'jian1' in data_dir: 
179 | motion_type = studio_jian1_motion_type 180 | if 'jian2' in data_dir: 181 | motion_type = studio_jian2_motion_type 182 | if 'lingjie1' in data_dir: 183 | motion_type = studio_lingjie1_motion_type 184 | if 'lingjie2' in data_dir: 185 | motion_type = studio_lingjie2_motion_type 186 | 187 | skeleton_model = Skeleton(None) 188 | start_frame = motion_type['start'] 189 | 190 | for motion in motion_type.keys(): 191 | if motion == 'start': 192 | continue 193 | estimated_mpjpe = 0 194 | for motion_range in motion_type[motion]: 195 | estimated_pose_motion = estimated_pose[motion_range[0] - start_frame: motion_range[1] - start_frame] 196 | gt_pose_motion = gt_pose[motion_range[0] - start_frame: motion_range[1] - start_frame] 197 | aligned_estimated_result, final_gt_seq = align_skeleton(estimated_pose_motion, gt_pose_motion, 198 | skeleton_model) 199 | estimated_mpjpe += calculate_error(aligned_estimated_result, final_gt_seq) 200 | estimated_mpjpe /= len(motion_type[motion]) 201 | print("{}: {}".format(motion, estimated_mpjpe)) 202 | 203 | 204 | if __name__ == '__main__': 205 | evaluate_3d_our_dataset('jian3', heatmap_name='da_external_more_gpu_no_mid_loss', 206 | depth_name='finetune_depth_spin_iter_0_depth_5', 207 | load_predicted=False) 208 | evaluate_3d_our_dataset('studio-jian1', heatmap_name='da_external_more_gpu_no_mid_loss', 209 | depth_name='finetune_depth_spin_iter_0_depth_5', 210 | load_predicted=False) 211 | evaluate_3d_our_dataset('studio-jian2', heatmap_name='da_external_more_gpu_no_mid_loss', 212 | depth_name='finetune_depth_spin_iter_0_depth_5', 213 | load_predicted=False) 214 | evaluate_3d_our_dataset('studio-lingjie1', heatmap_name='da_external_more_gpu_no_mid_loss', 215 | depth_name='finetune_depth_spin_iter_0_depth_5', 216 | load_predicted=False) 217 | evaluate_3d_our_dataset('studio-lingjie2', heatmap_name='da_external_more_gpu_no_mid_loss', 218 | depth_name='finetune_depth_spin_iter_0_depth_5', 219 | load_predicted=False) 220 | -------------------------------------------------------------------------------- /experiments/sceneego/test/sceneego.yaml: -------------------------------------------------------------------------------- 1 | title: "sceneego" 2 | kind: "mo2cap2" 3 | vis_freq: 1000 4 | vis_n_elements: 10 5 | 6 | image_shape: [256, 256] 7 | heatmap_shape: [1024, 1280] 8 | 9 | test: 10 | batch_size: 8 11 | model_path: "models/sceneego/checkpoints/6.pth.tar" 12 | depth_model_path: "network_models/wo_body_lr_1e-4_finetune_cleaned_data/iter_38000.pth.tar" 13 | 14 | opt: 15 | criterion: "MAE" 16 | 17 | use_volumetric_ce_loss: true 18 | volumetric_ce_loss_weight: 0.01 19 | 20 | n_objects_per_epoch: 15000 21 | n_epochs: 10 22 | 23 | batch_size: 40 24 | val_batch_size: 10 25 | 26 | train_2d: false 27 | 28 | lr: 0.001 29 | process_features_lr: 0.001 30 | volume_net_lr: 0.001 31 | 32 | scale_keypoints_3d: 0.1 33 | 34 | log_step: 1000 35 | 36 | model: 37 | name: "vol" 38 | kind: "mo2cap2" 39 | volume_aggregation_method: "softmax" 40 | with_scene: true 41 | with_intersection: false 42 | init_weights: false 43 | checkpoint: "" 44 | 45 | load_model: True 46 | model_path: "logs/egopw_with_depth/checkpoints/5.pth.tar" 47 | depth_model_path: "network_models/wo_body_lr_1e-4_finetune_cleaned_data/iter_38000.pth.tar" 48 | 49 | cuboid_side: 2 50 | 51 | volume_size: 64 52 | volume_multiplier: 1.0 53 | volume_softmax: true 54 | 55 | heatmap_softmax: true 56 | heatmap_multiplier: 100.0 57 | 58 | backbone: 59 | name: "resnet50" 60 | style: "simple" 61 | 62 | init_weights: true 63 | 
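    # local_checkpoint toggles loading the 2D backbone weights from the `checkpoint` path below
    # (read in network/voxel_net_depth.py via config.model.backbone.local_checkpoint)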
local_checkpoint: false 64 | checkpoint: "/HPS/Mo2Cap2Plus1/work/Mo2Cap2Finetune/logs/finetune2D_spin_without_da_iter_0_new/checkpoints/6.pth.tar" 65 | # checkpoint: "X:/Mo2Cap2Plus1/work/Mo2Cap2Finetune/logs/finetune2D_spin_without_da_iter_0_new/checkpoints/6.pth.tar" 66 | 67 | num_joints: 15 68 | num_layers: 50 69 | 70 | dataset: 71 | kind: "egopw_with_depth" 72 | camera_calibration_path: "utils/fisheye/fisheye.calibration_05_08.json" 73 | old_camera_calibration_path: "utils/fisheye/fisheye.calibration.json" 74 | image_width: 1280 75 | image_height: 1024 76 | 77 | train: 78 | mo2cap2_root: "/HPS/Mo2Cap2Plus/static00/Datasets/Mo2Cap2/data/training_data_full_annotated" 79 | # wild_data_root: "/HPS/Mo2Cap2Plus1/static00/ExternalEgo/External_camera_all" 80 | # rendered_depth_path: "/CT/EgoMocap/work/EgoBodyInContext/sfm_data" 81 | wild_data_root: "X:/Mo2Cap2Plus1/static00/ExternalEgo/External_camera_all" 82 | rendered_depth_path: "Z:/EgoMocap/work/EgoBodyInContext/sfm_data" 83 | 84 | with_damaged_actions: true 85 | undistort_images: true 86 | 87 | scale_bbox: 1.0 88 | 89 | 90 | shuffle: true 91 | randomize_n_views: false 92 | min_n_views: null 93 | max_n_views: null 94 | num_workers: 5 95 | 96 | 97 | val: 98 | h36m_root: "./data/human36m/processed/" 99 | labels_path: "./data/human36m/extra/human36m-multiview-labels-GTbboxes.npy" 100 | pred_results_path: "./data/pretrained/human36m/human36m_alg_10-04-2019/checkpoints/0060/results/val.pkl" 101 | 102 | with_damaged_actions: true 103 | undistort_images: true 104 | 105 | scale_bbox: 1.0 106 | 107 | shuffle: false 108 | randomize_n_views: false 109 | min_n_views: null 110 | max_n_views: null 111 | num_workers: 10 112 | 113 | retain_every_n_frames_in_test: 1 114 | -------------------------------------------------------------------------------- /models/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/models/.gitkeep -------------------------------------------------------------------------------- /models/sceneego/checkpoints/.gitkeep: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/models/sceneego/checkpoints/.gitkeep -------------------------------------------------------------------------------- /network/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/network/__init__.py -------------------------------------------------------------------------------- /network/pose_resnet.py: -------------------------------------------------------------------------------- 1 | import os 2 | import logging 3 | 4 | import torch 5 | import torch.nn as nn 6 | from collections import OrderedDict 7 | from torch.nn.functional import interpolate 8 | 9 | 10 | BN_MOMENTUM = 0.1 11 | logger = logging.getLogger(__name__) 12 | 13 | 14 | def conv3x3(in_planes, out_planes, stride=1): 15 | """3x3 convolution with padding""" 16 | return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, 17 | padding=1, bias=False) 18 | 19 | 20 | class BasicBlock(nn.Module): 21 | expansion = 1 22 | 23 | def __init__(self, inplanes, planes, stride=1, downsample=None): 24 | super(BasicBlock, self).__init__() 25 | self.conv1 = conv3x3(inplanes, planes, stride) 26 | self.bn1 = 
nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 27 | self.relu = nn.ReLU(inplace=True) 28 | self.conv2 = conv3x3(planes, planes) 29 | self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 30 | self.downsample = downsample 31 | self.stride = stride 32 | 33 | def forward(self, x): 34 | residual = x 35 | 36 | out = self.conv1(x) 37 | out = self.bn1(out) 38 | out = self.relu(out) 39 | 40 | out = self.conv2(out) 41 | out = self.bn2(out) 42 | 43 | if self.downsample is not None: 44 | residual = self.downsample(x) 45 | 46 | out += residual 47 | out = self.relu(out) 48 | 49 | return out 50 | 51 | 52 | class Bottleneck(nn.Module): 53 | expansion = 4 54 | 55 | def __init__(self, inplanes, planes, stride=1, downsample=None): 56 | super(Bottleneck, self).__init__() 57 | self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False) 58 | self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 59 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, 60 | padding=1, bias=False) 61 | self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 62 | self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, 63 | bias=False) 64 | self.bn3 = nn.BatchNorm2d(planes * self.expansion, 65 | momentum=BN_MOMENTUM) 66 | self.relu = nn.ReLU(inplace=True) 67 | self.downsample = downsample 68 | self.stride = stride 69 | 70 | def forward(self, x): 71 | residual = x 72 | 73 | out = self.conv1(x) 74 | out = self.bn1(out) 75 | out = self.relu(out) 76 | 77 | out = self.conv2(out) 78 | out = self.bn2(out) 79 | out = self.relu(out) 80 | 81 | out = self.conv3(out) 82 | out = self.bn3(out) 83 | 84 | if self.downsample is not None: 85 | residual = self.downsample(x) 86 | 87 | out += residual 88 | out = self.relu(out) 89 | 90 | return out 91 | 92 | 93 | class Bottleneck_CAFFE(nn.Module): 94 | expansion = 4 95 | 96 | def __init__(self, inplanes, planes, stride=1, downsample=None): 97 | super(Bottleneck_CAFFE, self).__init__() 98 | # add stride to conv1x1 99 | self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, stride=stride, bias=False) 100 | self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 101 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, 102 | padding=1, bias=False) 103 | self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM) 104 | self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, 105 | bias=False) 106 | self.bn3 = nn.BatchNorm2d(planes * self.expansion, 107 | momentum=BN_MOMENTUM) 108 | self.relu = nn.ReLU(inplace=True) 109 | self.downsample = downsample 110 | self.stride = stride 111 | 112 | def forward(self, x): 113 | residual = x 114 | 115 | out = self.conv1(x) 116 | out = self.bn1(out) 117 | out = self.relu(out) 118 | 119 | out = self.conv2(out) 120 | out = self.bn2(out) 121 | out = self.relu(out) 122 | 123 | out = self.conv3(out) 124 | out = self.bn3(out) 125 | 126 | if self.downsample is not None: 127 | residual = self.downsample(x) 128 | 129 | out += residual 130 | out = self.relu(out) 131 | 132 | return out 133 | 134 | 135 | class PoseResNet(nn.Module): 136 | 137 | def __init__(self, block, layers): 138 | self.inplanes = 64 139 | self.deconv_with_bias = False 140 | 141 | super(PoseResNet, self).__init__() 142 | 143 | self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, 144 | bias=False) 145 | self.bn1 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM) 146 | self.relu = nn.ReLU(inplace=True) 147 | self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) 148 | self.layer1 = self._make_layer(block, 64, layers[0]) 149 | self.layer2 
= self._make_layer(block, 128, layers[1], stride=2) 150 | self.layer3 = self._make_layer(block, 256, layers[2], stride=2) 151 | self.layer4 = self._make_layer(block, 512, layers[3], stride=2) 152 | 153 | # used for deconv layers 154 | self.deconv_layers = self._make_deconv_layer( 155 | 3, 156 | [256, 256, 256], 157 | [4, 4, 4], 158 | ) 159 | 160 | self.final_layer = nn.Conv2d( 161 | in_channels=256, 162 | out_channels=16, 163 | kernel_size=1, 164 | stride=1, 165 | padding=0 166 | ) 167 | 168 | def _make_layer(self, block, planes, blocks, stride=1): 169 | downsample = None 170 | if stride != 1 or self.inplanes != planes * block.expansion: 171 | downsample = nn.Sequential( 172 | nn.Conv2d(self.inplanes, planes * block.expansion, 173 | kernel_size=1, stride=stride, bias=False), 174 | nn.BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM), 175 | ) 176 | 177 | layers = [] 178 | layers.append(block(self.inplanes, planes, stride, downsample)) 179 | self.inplanes = planes * block.expansion 180 | for i in range(1, blocks): 181 | layers.append(block(self.inplanes, planes)) 182 | 183 | return nn.Sequential(*layers) 184 | 185 | def _get_deconv_cfg(self, deconv_kernel, index): 186 | if deconv_kernel == 4: 187 | padding = 1 188 | output_padding = 0 189 | elif deconv_kernel == 3: 190 | padding = 1 191 | output_padding = 1 192 | elif deconv_kernel == 2: 193 | padding = 0 194 | output_padding = 0 195 | 196 | return deconv_kernel, padding, output_padding 197 | 198 | def _make_deconv_layer(self, num_layers, num_filters, num_kernels): 199 | assert num_layers == len(num_filters), \ 200 | 'ERROR: num_deconv_layers is different len(num_deconv_filters)' 201 | assert num_layers == len(num_kernels), \ 202 | 'ERROR: num_deconv_layers is different len(num_deconv_filters)' 203 | 204 | layers = [] 205 | for i in range(num_layers): 206 | kernel, padding, output_padding = \ 207 | self._get_deconv_cfg(num_kernels[i], i) 208 | 209 | planes = num_filters[i] 210 | layers.append( 211 | nn.ConvTranspose2d( 212 | in_channels=self.inplanes, 213 | out_channels=planes, 214 | kernel_size=kernel, 215 | stride=2, 216 | padding=padding, 217 | output_padding=output_padding, 218 | bias=self.deconv_with_bias)) 219 | layers.append(nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)) 220 | layers.append(nn.ReLU(inplace=True)) 221 | self.inplanes = planes 222 | 223 | return nn.Sequential(*layers) 224 | 225 | def forward(self, x, return_mid_layer=False): 226 | x = self.conv1(x) 227 | x = self.bn1(x) 228 | x = self.relu(x) 229 | x = self.maxpool(x) 230 | 231 | x = self.layer1(x) 232 | x = self.layer2(x) 233 | x = self.layer3(x) 234 | x = self.layer4(x) 235 | 236 | mid_layer_feature = x 237 | 238 | x = self.deconv_layers(x) 239 | 240 | features = x 241 | x = self.final_layer(x) 242 | heatmaps = x[:, :15, :, :] 243 | if return_mid_layer is False: 244 | return heatmaps, features 245 | else: 246 | return heatmaps, features, mid_layer_feature 247 | 248 | 249 | def init_weights(self, pretrained=''): 250 | if os.path.isfile(pretrained): 251 | logger.info('=> init deconv weights from normal distribution') 252 | for name, m in self.deconv_layers.named_modules(): 253 | if isinstance(m, nn.ConvTranspose2d): 254 | logger.info('=> init {}.weight as normal(0, 0.001)'.format(name)) 255 | logger.info('=> init {}.bias as 0'.format(name)) 256 | nn.init.normal_(m.weight, std=0.001) 257 | if self.deconv_with_bias: 258 | nn.init.constant_(m.bias, 0) 259 | elif isinstance(m, nn.BatchNorm2d): 260 | logger.info('=> init {}.weight as 1'.format(name)) 261 | 
logger.info('=> init {}.bias as 0'.format(name)) 262 | nn.init.constant_(m.weight, 1) 263 | nn.init.constant_(m.bias, 0) 264 | logger.info('=> init final conv weights from normal distribution') 265 | for m in self.final_layer.modules(): 266 | if isinstance(m, nn.Conv2d): 267 | # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 268 | logger.info('=> init {}.weight as normal(0, 0.001)'.format(name)) 269 | logger.info('=> init {}.bias as 0'.format(name)) 270 | nn.init.normal_(m.weight, std=0.001) 271 | nn.init.constant_(m.bias, 0) 272 | 273 | # pretrained_state_dict = torch.load(pretrained) 274 | logger.info('=> loading pretrained model {}'.format(pretrained)) 275 | # self.load_state_dict(pretrained_state_dict, strict=False) 276 | checkpoint = torch.load(pretrained) 277 | if isinstance(checkpoint, OrderedDict): 278 | state_dict = checkpoint 279 | elif isinstance(checkpoint, dict) and 'state_dict' in checkpoint: 280 | state_dict_old = checkpoint['state_dict'] 281 | state_dict = OrderedDict() 282 | # delete 'module.' because it is saved from DataParallel module 283 | for key in state_dict_old.keys(): 284 | if key.startswith('module.'): 285 | # state_dict[key[7:]] = state_dict[key] 286 | # state_dict.pop(key) 287 | state_dict[key[7:]] = state_dict_old[key] 288 | else: 289 | state_dict[key] = state_dict_old[key] 290 | else: 291 | raise RuntimeError( 292 | 'No state_dict found in checkpoint file {}'.format(pretrained)) 293 | self.load_state_dict(state_dict, strict=False) 294 | else: 295 | logger.error('=> imagenet pretrained model dose not exist') 296 | logger.error('=> please download it first') 297 | raise ValueError('imagenet pretrained model does not exist') 298 | 299 | 300 | resnet_spec = {18: (BasicBlock, [2, 2, 2, 2]), 301 | 34: (BasicBlock, [3, 4, 6, 3]), 302 | 50: (Bottleneck, [3, 4, 6, 3]), 303 | 101: (Bottleneck, [3, 4, 23, 3]), 304 | 152: (Bottleneck, [3, 8, 36, 3])} 305 | 306 | 307 | def load_state_dict(model, new_state_dict): 308 | state = model.state_dict() 309 | state.update(new_state_dict) 310 | model.load_state_dict(state) 311 | return model 312 | 313 | def get_pose_net(model_path='models/pose_resnet_50_256x256.pth.tar', state_dict=None): 314 | num_layers = 50 315 | 316 | block_class, layers = resnet_spec[num_layers] 317 | 318 | model = PoseResNet(block_class, layers) 319 | 320 | if state_dict is None: 321 | if model_path is not None: 322 | state_dict = torch.load(model_path) 323 | model = load_state_dict(model, state_dict) 324 | else: 325 | # model.load_state_dict(state_dict) 326 | keys_list = list(state_dict.keys()) 327 | 328 | if keys_list[0].startswith('module'): 329 | state_dict = {k[7:]: v for k, v in state_dict.items()} 330 | model = load_state_dict(model, state_dict) 331 | 332 | return model 333 | 334 | 335 | if __name__ == '__main__': 336 | network = get_pose_net(model_path='../models/pose_resnet_50_256x256.pth.tar', with_bone_length=False) 337 | test_input = torch.zeros([4, 3, 256, 256]) 338 | res = network.forward_2D_pose(test_input, slice=False) 339 | print(res[1].shape) 340 | print(res[2].shape) -------------------------------------------------------------------------------- /network/v2v.py: -------------------------------------------------------------------------------- 1 | # Reference: https://github.com/dragonbook/V2V-PoseNet-pytorch 2 | 3 | import torch.nn as nn 4 | import torch.nn.functional as F 5 | import torch 6 | 7 | 8 | class Basic3DBlock(nn.Module): 9 | def __init__(self, in_planes, out_planes, kernel_size): 10 | super(Basic3DBlock, 
self).__init__() 11 | self.block = nn.Sequential( 12 | nn.Conv3d(in_planes, out_planes, kernel_size=kernel_size, stride=1, padding=((kernel_size-1)//2)), 13 | nn.BatchNorm3d(out_planes), 14 | nn.ReLU(True) 15 | ) 16 | 17 | def forward(self, x): 18 | return self.block(x) 19 | 20 | 21 | class Res3DBlock(nn.Module): 22 | def __init__(self, in_planes, out_planes): 23 | super(Res3DBlock, self).__init__() 24 | self.res_branch = nn.Sequential( 25 | nn.Conv3d(in_planes, out_planes, kernel_size=3, stride=1, padding=1), 26 | nn.BatchNorm3d(out_planes), 27 | nn.ReLU(True), 28 | nn.Conv3d(out_planes, out_planes, kernel_size=3, stride=1, padding=1), 29 | nn.BatchNorm3d(out_planes) 30 | ) 31 | 32 | if in_planes == out_planes: 33 | self.skip_con = nn.Sequential() 34 | else: 35 | self.skip_con = nn.Sequential( 36 | nn.Conv3d(in_planes, out_planes, kernel_size=1, stride=1, padding=0), 37 | nn.BatchNorm3d(out_planes) 38 | ) 39 | 40 | def forward(self, x): 41 | res = self.res_branch(x) 42 | skip = self.skip_con(x) 43 | return F.relu(res + skip, True) 44 | 45 | 46 | class Pool3DBlock(nn.Module): 47 | def __init__(self, pool_size): 48 | super(Pool3DBlock, self).__init__() 49 | self.pool_size = pool_size 50 | 51 | def forward(self, x): 52 | return F.max_pool3d(x, kernel_size=self.pool_size, stride=self.pool_size) 53 | 54 | 55 | class Upsample3DBlock(nn.Module): 56 | def __init__(self, in_planes, out_planes, kernel_size, stride): 57 | super(Upsample3DBlock, self).__init__() 58 | assert(kernel_size == 2) 59 | assert(stride == 2) 60 | self.block = nn.Sequential( 61 | nn.ConvTranspose3d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=0, output_padding=0), 62 | nn.BatchNorm3d(out_planes), 63 | nn.ReLU(True) 64 | ) 65 | 66 | def forward(self, x): 67 | return self.block(x) 68 | 69 | 70 | class EncoderDecorder(nn.Module): 71 | def __init__(self): 72 | super().__init__() 73 | 74 | self.encoder_pool1 = Pool3DBlock(2) 75 | self.encoder_res1 = Res3DBlock(32, 64) 76 | self.encoder_pool2 = Pool3DBlock(2) 77 | self.encoder_res2 = Res3DBlock(64, 128) 78 | self.encoder_pool3 = Pool3DBlock(2) 79 | self.encoder_res3 = Res3DBlock(128, 128) 80 | self.encoder_pool4 = Pool3DBlock(2) 81 | self.encoder_res4 = Res3DBlock(128, 128) 82 | self.encoder_pool5 = Pool3DBlock(2) 83 | self.encoder_res5 = Res3DBlock(128, 128) 84 | 85 | self.mid_res = Res3DBlock(128, 128) 86 | 87 | self.decoder_res5 = Res3DBlock(128, 128) 88 | self.decoder_upsample5 = Upsample3DBlock(128, 128, 2, 2) 89 | self.decoder_res4 = Res3DBlock(128, 128) 90 | self.decoder_upsample4 = Upsample3DBlock(128, 128, 2, 2) 91 | self.decoder_res3 = Res3DBlock(128, 128) 92 | self.decoder_upsample3 = Upsample3DBlock(128, 128, 2, 2) 93 | self.decoder_res2 = Res3DBlock(128, 128) 94 | self.decoder_upsample2 = Upsample3DBlock(128, 64, 2, 2) 95 | self.decoder_res1 = Res3DBlock(64, 64) 96 | self.decoder_upsample1 = Upsample3DBlock(64, 32, 2, 2) 97 | 98 | self.skip_res1 = Res3DBlock(32, 32) 99 | self.skip_res2 = Res3DBlock(64, 64) 100 | self.skip_res3 = Res3DBlock(128, 128) 101 | self.skip_res4 = Res3DBlock(128, 128) 102 | self.skip_res5 = Res3DBlock(128, 128) 103 | 104 | def forward(self, x): 105 | skip_x1 = self.skip_res1(x) 106 | x = self.encoder_pool1(x) 107 | x = self.encoder_res1(x) 108 | skip_x2 = self.skip_res2(x) 109 | x = self.encoder_pool2(x) 110 | x = self.encoder_res2(x) 111 | skip_x3 = self.skip_res3(x) 112 | x = self.encoder_pool3(x) 113 | x = self.encoder_res3(x) 114 | skip_x4 = self.skip_res4(x) 115 | x = self.encoder_pool4(x) 116 | x = 
self.encoder_res4(x) 117 | skip_x5 = self.skip_res5(x) 118 | x = self.encoder_pool5(x) 119 | x = self.encoder_res5(x) 120 | 121 | x = self.mid_res(x) 122 | 123 | x = self.decoder_res5(x) 124 | x = self.decoder_upsample5(x) 125 | x = x + skip_x5 126 | x = self.decoder_res4(x) 127 | x = self.decoder_upsample4(x) 128 | x = x + skip_x4 129 | x = self.decoder_res3(x) 130 | x = self.decoder_upsample3(x) 131 | x = x + skip_x3 132 | x = self.decoder_res2(x) 133 | x = self.decoder_upsample2(x) 134 | x = x + skip_x2 135 | x = self.decoder_res1(x) 136 | x = self.decoder_upsample1(x) 137 | x = x + skip_x1 138 | 139 | return x 140 | 141 | 142 | class V2VModel(nn.Module): 143 | def __init__(self, input_channels, output_channels): 144 | super().__init__() 145 | 146 | self.front_layers = nn.Sequential( 147 | Basic3DBlock(input_channels, 16, 7), 148 | Res3DBlock(16, 32), 149 | Res3DBlock(32, 32), 150 | Res3DBlock(32, 32) 151 | ) 152 | 153 | self.encoder_decoder = EncoderDecorder() 154 | 155 | self.back_layers = nn.Sequential( 156 | Res3DBlock(32, 32), 157 | Basic3DBlock(32, 32, 1), 158 | Basic3DBlock(32, 32, 1), 159 | ) 160 | 161 | self.output_layer = nn.Conv3d(32, output_channels, kernel_size=1, stride=1, padding=0) 162 | 163 | self._initialize_weights() 164 | 165 | def forward(self, x): 166 | x = self.front_layers(x) 167 | x = self.encoder_decoder(x) 168 | x = self.back_layers(x) 169 | x = self.output_layer(x) 170 | return x 171 | 172 | def _initialize_weights(self): 173 | for m in self.modules(): 174 | if isinstance(m, nn.Conv3d): 175 | nn.init.xavier_normal_(m.weight) 176 | # nn.init.normal_(m.weight, 0, 0.001) 177 | nn.init.constant_(m.bias, 0) 178 | elif isinstance(m, nn.ConvTranspose3d): 179 | nn.init.xavier_normal_(m.weight) 180 | # nn.init.normal_(m.weight, 0, 0.001) 181 | nn.init.constant_(m.bias, 0) 182 | 183 | 184 | class EncoderDecoderSimple(nn.Module): 185 | def __init__(self): 186 | super().__init__() 187 | 188 | self.encoder_pool1 = Pool3DBlock(2) 189 | self.encoder_res1 = Res3DBlock(32, 64) 190 | self.encoder_pool2 = Pool3DBlock(2) 191 | self.encoder_res2 = Res3DBlock(64, 128) 192 | 193 | self.mid_res = Res3DBlock(128, 128) 194 | 195 | self.decoder_res2 = Res3DBlock(128, 128) 196 | self.decoder_upsample2 = Upsample3DBlock(128, 64, 2, 2) 197 | self.decoder_res1 = Res3DBlock(64, 64) 198 | self.decoder_upsample1 = Upsample3DBlock(64, 32, 2, 2) 199 | 200 | self.skip_res1 = Res3DBlock(32, 32) 201 | self.skip_res2 = Res3DBlock(64, 64) 202 | 203 | def forward(self, x): 204 | skip_x1 = self.skip_res1(x) 205 | x = self.encoder_pool1(x) 206 | x = self.encoder_res1(x) 207 | skip_x2 = self.skip_res2(x) 208 | x = self.encoder_pool2(x) 209 | x = self.encoder_res2(x) 210 | 211 | x = self.mid_res(x) 212 | 213 | x = self.decoder_res2(x) 214 | x = self.decoder_upsample2(x) 215 | x = x + skip_x2 216 | x = self.decoder_res1(x) 217 | x = self.decoder_upsample1(x) 218 | x = x + skip_x1 219 | 220 | return x 221 | 222 | 223 | class V2VModelSimple(nn.Module): 224 | def __init__(self, input_channels, output_channels): 225 | super().__init__() 226 | 227 | self.front_layers = nn.Sequential( 228 | Basic3DBlock(input_channels, 32, 7), 229 | ) 230 | 231 | self.encoder_decoder = EncoderDecoderSimple() 232 | 233 | self.back_layers = nn.Sequential( 234 | Basic3DBlock(32, 32, 1), 235 | ) 236 | 237 | self.output_layer = nn.Conv3d(32, output_channels, kernel_size=1, stride=1, padding=0) 238 | 239 | self._initialize_weights() 240 | 241 | def forward(self, x): 242 | x = self.front_layers(x) 243 | x = self.encoder_decoder(x) 244 
| x = self.back_layers(x) 245 | x = self.output_layer(x) 246 | return x 247 | 248 | def _initialize_weights(self): 249 | for m in self.modules(): 250 | if isinstance(m, nn.Conv3d): 251 | nn.init.xavier_normal_(m.weight) 252 | # nn.init.normal_(m.weight, 0, 0.001) 253 | nn.init.constant_(m.bias, 0) 254 | elif isinstance(m, nn.ConvTranspose3d): 255 | nn.init.xavier_normal_(m.weight) 256 | # nn.init.normal_(m.weight, 0, 0.001) 257 | nn.init.constant_(m.bias, 0) 258 | 259 | if __name__ == '__main__': 260 | import time 261 | model = V2VModel(input_channels=32, output_channels=15) 262 | model = model.cuda() 263 | 264 | for i in range(10): 265 | input_tensor = torch.randn(8, 32, 64, 64, 64).cuda() 266 | start_time = time.time() 267 | output_tensor = model(input_tensor) 268 | end_time = time.time() 269 | print('time for one batch: {}'.format(end_time - start_time)) 270 | # print(output_tensor.shape) -------------------------------------------------------------------------------- /network/voxel_net_depth.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from copy import copy 3 | import cv2 4 | 5 | # from utils_proj.fisheye.FishEyeCalibrated import FishEyeCameraCalibrated 6 | # from utils_proj.fisheye.FishEyeEquisolid import FishEyeCameraEquisolid 7 | 8 | import torch 9 | from torch import nn 10 | 11 | from utils import op 12 | from torch.nn.functional import interpolate 13 | 14 | from network import pose_resnet 15 | from network.v2v import V2VModel 16 | from utils.fisheye.FishEyeCalibrated import FishEyeCameraCalibrated 17 | 18 | 19 | class VoxelNetwork_depth(nn.Module): 20 | def __init__(self, config, device='cuda'): 21 | super(VoxelNetwork_depth, self).__init__() 22 | 23 | self.device = device 24 | self.num_joints = config.model.backbone.num_joints 25 | 26 | # volume 27 | self.volume_softmax = config.model.volume_softmax 28 | self.volume_multiplier = config.model.volume_multiplier 29 | self.volume_size = config.model.volume_size 30 | 31 | self.cuboid_side = config.model.cuboid_side 32 | 33 | self.kind = config.model.kind 34 | 35 | # heatmap 36 | self.heatmap_softmax = config.model.heatmap_softmax 37 | self.heatmap_multiplier = config.model.heatmap_multiplier 38 | 39 | # # transfer 40 | # self.transfer_cmu_to_human36m = config.model.transfer_cmu_to_human36m if hasattr(config.model, "transfer_cmu_to_human36m") else False 41 | 42 | load_checkpoint = config.model.backbone.local_checkpoint 43 | if load_checkpoint: 44 | network_path = config.model.backbone.checkpoint 45 | network_loads = torch.load(network_path) 46 | 47 | self.backbone = pose_resnet.get_pose_net(state_dict=network_loads['state_dict']) 48 | else: 49 | print('Do not load checkpoint') 50 | self.backbone = pose_resnet.get_pose_net(None) 51 | self.backbone = self.backbone.to(device) 52 | 53 | if config.opt.train_2d is False: 54 | for p in self.backbone.parameters(): 55 | p.requires_grad = False 56 | 57 | # resize and pad feature for reprojection 58 | self.process_features = nn.Sequential( 59 | nn.Conv2d(256, 32, 1), 60 | nn.Upsample(size=(1024, 1024)), 61 | nn.ConstantPad2d(padding=(128, 128, 0, 0), value=0.0) 62 | ) 63 | self.process_features = self.process_features.to(device) 64 | 65 | self.with_scene = config.model.with_scene 66 | if config.model.with_scene is True: 67 | if config.model.with_intersection is True: 68 | volume_input_channel_num = 32 + 1 + 32 69 | self.with_intersection = True 70 | else: 71 | volume_input_channel_num = 32 + 1 72 | self.with_intersection = False 
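# volume_input_channel_num: 32 unprojected image-feature channels plus 1 scene-occupancy channel,
# plus another 32 channels for the feature/scene intersection volume when with_intersection is enabled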
73 | else: 74 | volume_input_channel_num = 32 75 | 76 | self.volume_net = V2VModel(volume_input_channel_num, self.num_joints) 77 | self.volume_net = self.volume_net.to(device) 78 | 79 | print('build coord volume') 80 | self.coord_volume = self.build_coord_volume() 81 | self.coord_volumes = self.coord_volume.unsqueeze(0).expand(config.opt.batch_size, 82 | -1, -1, -1, -1) 83 | self.coord_volumes = self.coord_volumes.to(device) 84 | 85 | self.fisheye_camera_model = FishEyeCameraCalibrated( 86 | calibration_file_path=config.dataset.camera_calibration_path) 87 | print('build reprojected grid coord') 88 | self.grid_coord_proj = op.get_projected_2d_points_with_coord_volumes(fisheye_model=self.fisheye_camera_model, 89 | coord_volume=self.coord_volume) 90 | self.grid_coord_proj.requires_grad = False 91 | 92 | self.grid_coord_proj_batch = op.get_grid_coord_proj_batch(self.grid_coord_proj, 93 | batch_size=config.opt.batch_size, 94 | heatmap_shape=config.heatmap_shape) 95 | self.grid_coord_proj_batch.requires_grad = False 96 | self.grid_coord_proj_batch = self.grid_coord_proj_batch.to(device) 97 | 98 | # self.fisheye_camera_model_depth = FishEyeCameraEquisolid(focal_length=9, sensor_size=32, img_size=(1280, 1024)) 99 | 100 | # self.ray_torch = self.calculated_ray_direction(config.dataset.image_width, 101 | # config.dataset.image_height).to(device) 102 | # self.ray_torch.requires_grad = False 103 | 104 | self.ray = self.calculated_ray_direction_numpy(config.dataset.image_width, 105 | config.dataset.image_height) 106 | 107 | self.image_width = config.dataset.image_width 108 | self.image_height = config.dataset.image_height 109 | 110 | def build_coord_volume(self): 111 | """ 112 | get coord volume and prepare for the re-projection process 113 | :param self: 114 | :return: 115 | """ 116 | # build coord volumes 117 | sides = np.array([self.cuboid_side, self.cuboid_side, self.cuboid_side]) 118 | 119 | position = np.array([-self.cuboid_side / 2, -self.cuboid_side / 2, 0]) 120 | # build coord volume 121 | xxx, yyy, zzz = torch.meshgrid(torch.arange(self.volume_size), 122 | torch.arange(self.volume_size), 123 | torch.arange(self.volume_size)) 124 | grid = torch.stack([xxx, yyy, zzz], dim=-1).type(torch.float) 125 | grid = grid.reshape((-1, 3)) 126 | 127 | grid_coord = torch.zeros_like(grid) 128 | grid_coord[:, 0] = position[0] + (sides[0] / (self.volume_size - 1)) * grid[:, 0] 129 | grid_coord[:, 1] = position[1] + (sides[1] / (self.volume_size - 1)) * grid[:, 1] 130 | grid_coord[:, 2] = position[2] + (sides[2] / (self.volume_size - 1)) * grid[:, 2] 131 | 132 | coord_volume = grid_coord.reshape(self.volume_size, self.volume_size, self.volume_size, 3) 133 | 134 | return coord_volume 135 | 136 | def calculated_ray_direction(self, image_width, image_height): 137 | points = np.zeros(shape=(image_width, image_height, 2)) 138 | x_range = np.array(range(image_width)) 139 | y_range = np.array(range(image_height)) 140 | points[:, :, 0] = np.add(points[:, :, 0].transpose(), x_range).transpose() 141 | points[:, :, 1] = np.add(points[:, :, 1], y_range) 142 | points = points.reshape((-1, 2)) 143 | ray = self.fisheye_camera_model.camera2world_ray(points) 144 | ray_torch = torch.from_numpy(ray) 145 | return ray_torch 146 | 147 | def calculated_ray_direction_numpy(self, image_width, image_height): 148 | points = np.zeros(shape=(image_width, image_height, 2)) 149 | x_range = np.array(range(image_width)) 150 | y_range = np.array(range(image_height)) 151 | points[:, :, 0] = np.add(points[:, :, 0].transpose(), 
x_range).transpose() 152 | points[:, :, 1] = np.add(points[:, :, 1], y_range) 153 | points = points.reshape((-1, 2)) 154 | ray = self.fisheye_camera_model.camera2world_ray(points) 155 | return ray 156 | 157 | def depth_to_voxel_pytorch(self, depth): 158 | # directly multiply the depth on the pre-calculated rays 159 | # resize depth to (image_width, image_height) 160 | depth = depth.unsqueeze(0) 161 | depth = interpolate(depth, (self.image_height, self.image_height), mode='nearest') 162 | depth = torch.nn.functional.pad(depth, pad=[128, 128]) 163 | depth = depth.squeeze(0)[0] 164 | print(depth.shape) 165 | depth_img_flat = depth.transpose(0, 1).reshape(-1) 166 | point_cloud = self.ray_torch.transpose(0, 1) * depth_img_flat 167 | point_cloud = point_cloud.transpose(0, 1) 168 | 169 | # point cloud to voxel 170 | 171 | voxel_torch = self.point_cloud_to_voxel_pytorch(point_cloud) 172 | return voxel_torch 173 | 174 | def point_cloud_to_voxel_pytorch(self, point_cloud): 175 | scene_point_cloud_local = point_cloud 176 | 177 | scene_point_cloud_local[:, 0] = (scene_point_cloud_local[:, 178 | 0] + self.cuboid_side / 2) * self.volume_size / self.cuboid_side 179 | scene_point_cloud_local[:, 1] = (scene_point_cloud_local[:, 180 | 1] + self.cuboid_side / 2) * self.volume_size / self.cuboid_side 181 | scene_point_cloud_local[:, 2] = (scene_point_cloud_local[:, 2]) * self.volume_size / self.cuboid_side 182 | 183 | scene_point_cloud_local = torch.round(scene_point_cloud_local).long() 184 | 185 | # note: the gradient is zero here!!! 186 | good_indices = torch.logical_and(self.volume_size-1 >= scene_point_cloud_local, scene_point_cloud_local >= 0) 187 | good_indices = torch.all(good_indices, dim=1) 188 | scene_point_cloud_local = scene_point_cloud_local[good_indices] 189 | # scene_point_cloud_local = np.clip(scene_point_cloud_local, a_min=0, a_max=self.volume_size - 1).astype(np.int) 190 | voxel_torch = torch.zeros(size=(self.volume_size, self.volume_size, self.volume_size)).to(self.device) 191 | voxel_torch[scene_point_cloud_local.cpu().numpy().T] = 1 192 | return voxel_torch 193 | 194 | def depth_map_to_voxel_numpy(self, depth): 195 | # directly multiply the depth on the pre-calculated rays 196 | depth = depth.view(depth.size(-2), depth.size(-1)).cpu().detach().numpy() 197 | depth = cv2.resize(depth, dsize=(1024, 1024), interpolation=cv2.INTER_NEAREST) 198 | depth = np.pad(depth, ((0, 0), (128, 128)), 'constant', constant_values=0) 199 | depth_img_flat = depth.T.reshape((-1)) 200 | point_cloud = self.ray.T * depth_img_flat 201 | point_cloud = point_cloud.T 202 | 203 | # point cloud to voxel 204 | voxel_torch = self.point_cloud_to_voxel_numpy(point_cloud) 205 | return voxel_torch 206 | 207 | def point_cloud_to_voxel_numpy(self, point_cloud): 208 | scene_point_cloud_local = copy(point_cloud) 209 | scene_point_cloud_local[:, 0] = (scene_point_cloud_local[:, 210 | 0] + self.cuboid_side / 2) * self.volume_size / self.cuboid_side 211 | scene_point_cloud_local[:, 1] = (scene_point_cloud_local[:, 212 | 1] + self.cuboid_side / 2) * self.volume_size / self.cuboid_side 213 | scene_point_cloud_local[:, 2] = (scene_point_cloud_local[:, 2]) * self.volume_size / self.cuboid_side 214 | 215 | scene_point_cloud_local = np.round_(scene_point_cloud_local) 216 | good_indices = np.logical_and(self.volume_size-1 >= scene_point_cloud_local, scene_point_cloud_local >= 0) 217 | good_indices = np.all(good_indices, axis=1) 218 | scene_point_cloud_local = scene_point_cloud_local[good_indices] 219 | # scene_point_cloud_local = 
np.clip(scene_point_cloud_local, a_min=0, a_max=self.volume_size - 1).astype(np.int) 220 | voxel_torch = torch.zeros(size=(self.volume_size, self.volume_size, self.volume_size)).to(self.device) 221 | voxel_torch[scene_point_cloud_local.T] = 1 222 | return voxel_torch 223 | 224 | def forward(self, images, grid_coord_proj_batch, coord_volumes, scene_volumes=None, depth_map_batch=None): 225 | """ 226 | side: the length of volume square side, like 2 meters or 3 meters 227 | volume_size: the number of grid, like we have 64 or 32 grids for each side 228 | :param images: 229 | :return: 230 | """ 231 | device = images.device 232 | batch_size = images.shape[0] 233 | 234 | # forward backbone 235 | heatmaps, features = self.backbone(images) 236 | 237 | # process features before unprojecting 238 | features = self.process_features(features) 239 | 240 | # lift to volume 241 | if features.shape[0] < grid_coord_proj_batch.shape[0]: 242 | grid_coord_proj_batch = grid_coord_proj_batch[:features.shape[0]] 243 | volumes = op.unproject_heatmaps_one_view_batch(features, grid_coord_proj_batch, self.volume_size) 244 | 245 | if self.with_scene is True: 246 | if scene_volumes is not None: 247 | # combine scene volume with project pose volume 248 | scene_volumes = torch.unsqueeze(scene_volumes, dim=1) 249 | volumes = torch.cat([volumes, scene_volumes], dim=1) 250 | elif depth_map_batch is not None: 251 | voxel_list = [] 252 | for depth_map in depth_map_batch: 253 | voxel_torch = self.depth_map_to_voxel_numpy(depth_map) 254 | # show_voxel_torch(voxel_torch) 255 | voxel_list.append(voxel_torch) 256 | scene_volumes = torch.stack(voxel_list, dim=0) 257 | scene_volumes = torch.unsqueeze(scene_volumes, dim=1) 258 | if self.with_intersection is True: 259 | intersection = volumes * scene_volumes 260 | volumes = torch.cat([volumes, intersection, scene_volumes], dim=1) 261 | else: 262 | volumes = torch.cat([volumes, scene_volumes], dim=1) 263 | else: 264 | print("no scene volume or depth input!") 265 | return None 266 | 267 | # integral 3d 268 | volumes = self.volume_net(volumes) 269 | if volumes.shape[0] < coord_volumes.shape[0]: 270 | coord_volumes = coord_volumes[:volumes.shape[0]] 271 | vol_keypoints_3d, volumes = op.integrate_tensor_3d_with_coordinates(volumes * self.volume_multiplier, 272 | coord_volumes, 273 | softmax=self.volume_softmax) 274 | 275 | return vol_keypoints_3d, features, volumes, self.coord_volumes 276 | 277 | 278 | def run_voxel_net(): 279 | image_batch = torch.ones(size=(4, 3, 256, 256)) 280 | from utils import cfg 281 | config_path = 'experiments/mo2cap2/train/mo2cap2_vol_softmax.yaml' 282 | config = cfg.load_config(config_path) 283 | projection_network = VoxelNetwork_depth(config=config, device='cuda') 284 | 285 | vol_keypoints_3d, features, volumes, coord_volumes = projection_network(image_batch) 286 | print(vol_keypoints_3d.shape) 287 | 288 | 289 | def show_voxel_torch(voxel): 290 | import matplotlib.pyplot as plt 291 | # show voxel result 292 | voxel_np = voxel.cpu().numpy() 293 | fig = plt.figure() 294 | ax = fig.gca(projection='3d') 295 | # ax.set_aspect('equal') 296 | 297 | ax.voxels(voxel_np, edgecolor="k") 298 | plt.savefig('tmp/0.png') 299 | exit(0) 300 | # plt.show() 301 | 302 | def calculate_max_position(): 303 | volumes = torch.ones(size=(4, 15, 64, 64, 64)) * 0 304 | 305 | volumes[:, :, 32, 32, 32] = 1 306 | volumes[:, :, 31, 31, 31] = 1 307 | 308 | from utils import cfg 309 | config_path = '../experiments/mo2cap2/train/mo2cap2_vol_softmax.yaml' 310 | config = cfg.load_config(config_path) 
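# the network below is built only to reuse its precomputed coord_volumes for this soft-argmax sanity check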
311 | projection_network = VoxelNetwork_depth(config=config, device='cpu') 312 | 313 | coord_volumes = projection_network.coord_volumes 314 | 315 | vol_keypoints_3d, volumes = op.integrate_tensor_3d_with_coordinates(volumes, 316 | coord_volumes, 317 | softmax=True) 318 | 319 | print(vol_keypoints_3d) 320 | print(vol_keypoints_3d.shape) 321 | 322 | 323 | def visualize_grid_coord_proj(): 324 | from utils import cfg 325 | from visualization.visualization_projected_grid import draw_points 326 | import cv2 327 | config_path = 'experiments/mo2cap2/train/mo2cap2_vol_softmax.yaml' 328 | config = cfg.load_config(config_path) 329 | projection_network = VoxelNetwork_depth(config=config, device='cpu') 330 | 331 | grid_coord_proj = projection_network.grid_coord_proj 332 | 333 | canvas = cv2.imread( 334 | r'\\winfs-inf\HPS\Mo2Cap2Plus1\static00\EgocentricData\old_data\kitchen_2\imgs\img_-04032020185303-3.jpg') 335 | 336 | canvas = draw_points(canvas, grid_coord_proj) 337 | 338 | cv2.imshow('img', canvas) 339 | cv2.waitKey(0) 340 | 341 | 342 | def test_reproject_feature(): 343 | from utils import cfg 344 | import cv2 345 | config_path = 'experiments/local/train/mo2cap2_vol_softmax.yaml' 346 | config = cfg.load_config(config_path) 347 | projection_network = VoxelNetwork_depth(config=config, device='cpu') 348 | 349 | # features = torch.zeros(size=(4, 15, 1024, 1280)) 350 | # 351 | # features[:, :, 490:570, 620:700] = 10 352 | 353 | # read feature from egocentric map 354 | 355 | heatmap_path = r'X:\Mo2Cap2Plus1\static00\EgocentricData\REC23102020\studio-jian1\da_external_layer4\img-10232020185916-700.mat' 356 | from scipy.io import loadmat 357 | heatmap = loadmat(heatmap_path) 358 | heatmap = heatmap['heatmap'] 359 | # recover heatmap 360 | heatmap = cv2.resize(heatmap, dsize=(1024, 1024), interpolation=cv2.INTER_NEAREST) 361 | heatmap = np.pad(heatmap, ((0, 0), (128, 128), (0, 0)), mode='edge') 362 | heatmap = heatmap.transpose((2, 0, 1)) 363 | 364 | heatmap = np.expand_dims(heatmap, axis=0) 365 | 366 | features = torch.from_numpy(heatmap) 367 | 368 | grid_coord_proj_batch = op.get_grid_coord_proj_batch(projection_network.grid_coord_proj, 369 | batch_size=features.shape[0], 370 | heatmap_shape=(1024, 1280), 371 | device='cpu') 372 | 373 | volumes_batch = op.unproject_heatmaps_one_view_batch(features, grid_coord_proj_batch, 374 | projection_network.volume_size) 375 | 376 | volumes = op.unproject_heatmaps_one_view(features, projection_network.grid_coord_proj, 377 | projection_network.volume_size) 378 | 379 | print(torch.sum(torch.abs(volumes - volumes_batch))) 380 | # # visualize volume 381 | # print(volumes.shape) 382 | # for i in range(15): 383 | # volume_single = volumes[0][i] 384 | # 385 | # zoomed_volume_single = zoom(volume_single, 0.5) * 100 386 | # point_cloud = visualize_3D_grid_single_grid(zoomed_volume_single) 387 | # coord = open3d.geometry.TriangleMesh.create_coordinate_frame(size=10) 388 | # open3d.visualization.draw_geometries([point_cloud, coord]) 389 | 390 | 391 | if __name__ == '__main__': 392 | # visualize_grid_coord_proj() 393 | # test_reproject_feature() 394 | # calculate_max_position() 395 | run_voxel_net() 396 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | numpy 2 | open3d 3 | tqdm 4 | opencv-python 5 | scipy 6 | matplotlib 7 | natsort -------------------------------------------------------------------------------- /resources/Wang_CVPR_2023.gif: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/resources/Wang_CVPR_2023.gif -------------------------------------------------------------------------------- /test.py: -------------------------------------------------------------------------------- 1 | import os 2 | from pprint import pprint 3 | import torch 4 | from torch.utils.data import DataLoader 5 | from tqdm import tqdm 6 | 7 | from dataset.test_dataset import TestDataset 8 | from network.voxel_net_depth import VoxelNetwork_depth 9 | from utils import cfg 10 | from utils.skeleton import Skeleton 11 | 12 | os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1" 13 | 14 | 15 | class Test: 16 | def __init__(self, config, seq_name, estimated_depth_name=None): 17 | self.device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu') 18 | self.context_dataset = TestDataset(config, seq_name=seq_name, voxel_output=False, 19 | estimated_depth_name=estimated_depth_name) 20 | self.context_dataloader = DataLoader(self.context_dataset, batch_size=config.test.batch_size, shuffle=False, 21 | drop_last=False, num_workers=1) 22 | 23 | self.network = VoxelNetwork_depth(config, device=self.device) 24 | 25 | # load the network model 26 | model_path = config.test.model_path 27 | loads = torch.load(model_path) 28 | self.network.load_state_dict(loads['state_dict']) 29 | 30 | self.network = self.network.to(self.device) 31 | 32 | self.skeleton = Skeleton(calibration_path=config.dataset.camera_calibration_path) 33 | 34 | def run(self, config): 35 | print('---------------------Start Testing-----------------------') 36 | pprint(config.__dict__) 37 | self.network.eval() 38 | 39 | predicted_joint_list = [] 40 | with torch.no_grad(): 41 | for i, (img, img_rgb, depth_info, img_path) in tqdm(enumerate(self.context_dataloader)): 42 | img = img.to(self.device) 43 | img_rgb = img_rgb.to(self.device) 44 | 45 | depth_info = depth_info.to(self.device) 46 | 47 | grid_coord_proj_batch = self.network.grid_coord_proj_batch 48 | coord_volumes = self.network.coord_volumes 49 | 50 | vol_keypoints_3d, features, volumes, coord_volumes = self.network(img, grid_coord_proj_batch, 51 | coord_volumes, 52 | depth_map_batch=depth_info) 53 | 54 | predicted_keypoints_batch = vol_keypoints_3d.cpu().numpy() 55 | 56 | predicted_joint_list.extend(predicted_keypoints_batch) 57 | # print(len(predicted_joint_list)) 58 | 59 | return predicted_joint_list 60 | 61 | 62 | if __name__ == '__main__': 63 | config_path = 'experiments/sceneego/test/sceneego.yaml' 64 | import pickle 65 | 66 | seq_name = 'new_diogo1' 67 | 68 | config = cfg.load_config(config_path) 69 | test = Test(config, seq_name, estimated_depth_name='matterport_green') 70 | predicted_joint_list = test.run(config) 71 | 72 | mpjpe, pampjpe = test.context_dataset.evaluate_mpjpe(predicted_joint_list) 73 | 74 | print('mpjpe: {}'.format(mpjpe)) 75 | print('pa mpjpe: {}'.format(pampjpe)) 76 | 77 | # save predicted joint list 78 | 79 | save_dir = r'/HPS/ScanNet/work/egocentric_view/25082022/diogo1/out' 80 | if not os.path.isdir(save_dir): 81 | os.makedirs(save_dir) 82 | save_path = os.path.join(save_dir, 'no_body_diogo1.pkl') 83 | 84 | save_obj = predicted_joint_list 85 | with open(save_path, 'wb') as f: 86 | pickle.dump(save_obj, f) 87 | -------------------------------------------------------------------------------- /utils/__init__.py:
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/utils/__init__.py -------------------------------------------------------------------------------- /utils/calculate_errors.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | from utils.rigid_transform_with_scale import umeyama 4 | from utils.skeleton import Skeleton 5 | from copy import deepcopy 6 | 7 | 8 | def global_align_skeleton_seq(estimated_seq, gt_seq): 9 | estimated_seq = np.asarray(estimated_seq).reshape((-1, 3)) 10 | gt_seq = np.asarray(gt_seq).reshape((-1, 3)) 11 | # aligned_pose_list = np.zeros_like(estimated_seq) 12 | # for s in range(estimated_seq.shape[0]): 13 | # pose_p = estimated_seq[s] 14 | # pose_gt_bs = gt_seq[s] 15 | c, R, t = umeyama(estimated_seq, gt_seq) 16 | pose_p = estimated_seq.dot(R) * c + t 17 | # aligned_pose_list[s] = pose_p 18 | 19 | return pose_p.reshape((-1, 15, 3)) 20 | 21 | 22 | def calculate_error(estimated_seq, gt_seq): 23 | estimated_seq = np.asarray(estimated_seq) 24 | gt_seq = np.asarray(gt_seq) 25 | distance = estimated_seq - gt_seq 26 | distance = np.linalg.norm(distance, axis=2) 27 | m_distance = np.mean(distance) 28 | return m_distance 29 | 30 | 31 | def calculate_slam_error(estimated_seq, gt_seq, align=False): 32 | # seq shape: n_seq, 15, 3 33 | estimated_seq = np.asarray(estimated_seq) 34 | gt_seq = np.asarray(gt_seq) 35 | estimated_root_seq = (estimated_seq[:, 7, :] + estimated_seq[:, 11, :]) / 2 36 | gt_root_seq = (gt_seq[:, 7, :] + gt_seq[:, 11, :]) / 2 37 | 38 | if align is True: 39 | c, R, t = umeyama(estimated_root_seq, gt_root_seq) 40 | estimated_root_seq = estimated_root_seq.dot(R) * c + t 41 | 42 | distance = estimated_root_seq - gt_root_seq 43 | distance = np.linalg.norm(distance, axis=1) 44 | m_distance = np.mean(distance) 45 | return m_distance 46 | 47 | def align_skeleton_size(estimated_seq, gt_seq): 48 | estimated_seq = deepcopy(np.asarray(estimated_seq)) 49 | gt_seq = deepcopy(np.asarray(gt_seq)) 50 | aligned_pose_list = np.zeros_like(estimated_seq) 51 | for s in range(estimated_seq.shape[0]): 52 | pose_p = estimated_seq[s] 53 | pose_gt_bs = gt_seq[s] 54 | c, R, t = umeyama(pose_p, pose_gt_bs) 55 | pose_p = pose_p * c 56 | aligned_pose_list[s] = pose_p 57 | 58 | return aligned_pose_list 59 | 60 | def align_skeleton(estimated_seq, gt_seq, skeleton_model=None, scale=True): 61 | estimated_seq = deepcopy(np.asarray(estimated_seq)) 62 | gt_seq = deepcopy(np.asarray(gt_seq)) 63 | if skeleton_model is not None: 64 | for i in range(len(estimated_seq)): 65 | estimated_seq[i] = skeleton_model.skeleton_resize_single( 66 | estimated_seq[i], 67 | bone_length_file='utils/fisheye/mean3D.mat') 68 | for i in range(len(gt_seq)): 69 | gt_seq[i] = skeleton_model.skeleton_resize_single( 70 | gt_seq[i], 71 | bone_length_file='utils/fisheye/mean3D.mat') 72 | 73 | aligned_pose_list = np.zeros_like(estimated_seq) 74 | for s in range(estimated_seq.shape[0]): 75 | pose_p = estimated_seq[s] 76 | pose_gt_bs = gt_seq[s] 77 | if scale is False: 78 | # if scale is False, first align the center of each pose 79 | pose_p_center = np.mean(pose_p, axis=0) 80 | pose_gt_center = np.mean(pose_gt_bs, axis=0) 81 | pose_p -= pose_p_center 82 | pose_gt_bs -= pose_gt_center 83 | 84 | c, R, t = umeyama(pose_p, pose_gt_bs) 85 | if scale is True: 86 | pose_p = pose_p.dot(R) * c + t 87 | else: 88 | pose_p = pose_p.dot(R) + t
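# umeyama returns scale c, rotation R and translation t; with scale=True the full similarity
# transform is applied, otherwise only the rigid (rotation + translation) part is used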
89 | aligned_pose_list[s] = pose_p 90 | 91 | return aligned_pose_list, gt_seq 92 | 93 | 94 | def calculate_joint_error(estimated_seq, gt_seq): 95 | estimated_seq = np.asarray(estimated_seq) 96 | gt_seq = np.asarray(gt_seq) 97 | distance = estimated_seq - gt_seq 98 | distance = np.linalg.norm(distance, axis=2) 99 | joints_distance = np.mean(distance, axis=0) 100 | return joints_distance 101 | 102 | 103 | def calculate_errors(final_estimated_seq, mid_estimated_seq, final_optimized_seq=None, final_gt_seq=None): 104 | skeleton_model = Skeleton(None) 105 | original_global_mpjpe = calculate_error(final_estimated_seq, final_gt_seq) 106 | mid_global_mpjpe = calculate_error(mid_estimated_seq, final_gt_seq) 107 | optimized_global_mpjpe = calculate_error(final_optimized_seq, final_gt_seq) 108 | 109 | original_camera_pos_error = calculate_slam_error(final_estimated_seq, final_gt_seq) 110 | optimized_camera_pos_error = calculate_slam_error(final_optimized_seq, final_gt_seq) 111 | 112 | 113 | 114 | # align the estimated result and original result 115 | 116 | aligned_estimated_seq_result = global_align_skeleton_seq(final_estimated_seq, final_gt_seq) 117 | aligned_estimated_mid_seq_result = global_align_skeleton_seq(mid_estimated_seq, final_gt_seq) 118 | aligned_optimized_seq_result = global_align_skeleton_seq(final_optimized_seq, final_gt_seq) 119 | 120 | original_aligned_camera_pos_error = calculate_slam_error(aligned_estimated_seq_result, final_gt_seq, align=False) 121 | mid_aligned_camera_pose_error = calculate_slam_error(aligned_estimated_mid_seq_result, final_gt_seq, align=False) 122 | optimized_aligned_camera_pos_error = calculate_slam_error(aligned_optimized_seq_result, final_gt_seq, align=False) 123 | 124 | aligned_original_seq_mpjpe = calculate_error(aligned_estimated_seq_result, final_gt_seq) 125 | aligned_mid_seq_mpjpe = calculate_error(aligned_estimated_mid_seq_result, final_gt_seq) 126 | aligned_optimized_seq_mpjpe = calculate_error(aligned_optimized_seq_result, final_gt_seq) 127 | 128 | # align the estimated result and original result 129 | aligned_estimated_result, final_gt_seq = align_skeleton(final_estimated_seq, final_gt_seq, None) 130 | aligned_mid_optimized_result, final_gt_seq = align_skeleton(mid_estimated_seq, final_gt_seq, None) 131 | aligned_optimized_result, final_gt_seq = align_skeleton(final_optimized_seq, final_gt_seq, None) 132 | 133 | aligned_original_mpjpe = calculate_error(aligned_estimated_result, final_gt_seq) 134 | aligned_mid_optimized_mpjpe = calculate_error(aligned_mid_optimized_result, final_gt_seq) 135 | aligned_optimized_mpjpe = calculate_error(aligned_optimized_result, final_gt_seq) 136 | 137 | # align the estimated result and original result 138 | aligned_estimated_result, final_gt_seq = align_skeleton(final_estimated_seq, final_gt_seq, skeleton_model) 139 | aligned_mid_optimized_result, final_gt_seq = align_skeleton(mid_estimated_seq, final_gt_seq, skeleton_model) 140 | aligned_optimized_result, final_gt_seq = align_skeleton(final_optimized_seq, final_gt_seq, skeleton_model) 141 | 142 | bone_length_aligned_original_mpjpe = calculate_error(aligned_estimated_result, final_gt_seq) 143 | bone_length_aligned_mid_optimized_mpjpe = calculate_error(aligned_mid_optimized_result, final_gt_seq) 144 | bone_length_aligned_optimized_mpjpe = calculate_error(aligned_optimized_result, final_gt_seq) 145 | joints_error = calculate_joint_error(aligned_optimized_result, final_gt_seq) 146 | 147 | from collections import OrderedDict 148 | result = 
OrderedDict({'original_global_mpjpe': original_global_mpjpe, 149 | 'mid_global_mpjpe': mid_global_mpjpe, 150 | 'optimized_global_mpjpe': optimized_global_mpjpe, 151 | 'original_camera_pos_error': original_camera_pos_error, 152 | 'optimized_camera_pos_error': optimized_camera_pos_error, 153 | 154 | 'original_aligned_camera_pos_error': original_aligned_camera_pos_error, 155 | 'mid_aligned_camera_pose_error': mid_aligned_camera_pose_error, 156 | 'optimized_aligned_camera_pos_error': optimized_aligned_camera_pos_error, 157 | 158 | 'original_aligned_global_mpjpe': aligned_original_seq_mpjpe, 159 | "aligned_mid_seq_mpjpe": aligned_mid_seq_mpjpe, 160 | 'optimized_aligned_global_mpjpe': aligned_optimized_seq_mpjpe, 161 | 'aligned_original_mpjpe': aligned_original_mpjpe, 162 | 'aligned_mid_optimized_mpjpe': aligned_mid_optimized_mpjpe, 163 | 'aligned_optimized_mpjpe': aligned_optimized_mpjpe, 164 | 'bone_length_aligned_original_mpjpe': bone_length_aligned_original_mpjpe, 165 | 'bone_length_aligned_mid_optimized_mpjpe': bone_length_aligned_mid_optimized_mpjpe, 166 | 'bone_length_aligned_optimized_mpjpe': bone_length_aligned_optimized_mpjpe, 167 | 'joints_error': joints_error}) 168 | return result 169 | -------------------------------------------------------------------------------- /utils/cfg.py: -------------------------------------------------------------------------------- 1 | import yaml 2 | from easydict import EasyDict as edict 3 | 4 | 5 | def load_config(path): 6 | with open(path) as fin: 7 | config = edict(yaml.safe_load(fin)) 8 | 9 | return config 10 | -------------------------------------------------------------------------------- /utils/data_transforms.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # Developed by Haozhe Xie 4 | # References: 5 | # - https://github.com/xiumingzhang/GenRe-ShapeHD 6 | 7 | import cv2 8 | import matplotlib.pyplot as plt 9 | import matplotlib.patches as patches 10 | import numpy as np 11 | import os 12 | import random 13 | import torch 14 | 15 | 16 | class Compose(object): 17 | """ Composes several transforms together. 18 | For example: 19 | >>> transforms.Compose([ 20 | >>> transforms.RandomBackground(), 21 | >>> transforms.CenterCrop(127, 127, 3), 22 | >>> ]) 23 | """ 24 | 25 | def __init__(self, transforms): 26 | self.transforms = transforms 27 | 28 | def __call__(self, image, bounding_box=None): 29 | for t in self.transforms: 30 | if t.__class__.__name__ == 'RandomCrop' or t.__class__.__name__ == 'CenterCrop': 31 | image = t(image, bounding_box) 32 | else: 33 | image = t(image) 34 | 35 | return image 36 | 37 | 38 | class ToTensor(object): 39 | """ 40 | Convert a numpy.ndarray to tensor. 
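Expects an HWC numpy array and returns a CHW float tensor.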
41 | """ 42 | 43 | def __call__(self, image): 44 | assert (isinstance(image, np.ndarray)) 45 | # HWC to CHW 46 | array = np.transpose(image, (2, 0, 1)) 47 | # handle numpy array 48 | tensor = torch.from_numpy(array) 49 | return tensor.float() 50 | 51 | 52 | class BGR2RGB: 53 | """ 54 | convert BGR image to RGB image 55 | """ 56 | 57 | def __call__(self, image): 58 | # BGR to RGB 59 | return image[:, :, ::-1] 60 | 61 | 62 | class Normalize(object): 63 | def __init__(self, mean, std): 64 | self.mean = mean 65 | self.std = std 66 | 67 | def __call__(self, image): 68 | assert (isinstance(image, np.ndarray)) 69 | image -= self.mean 70 | image /= self.std 71 | 72 | return image 73 | 74 | 75 | class SimpleNormalize(object): 76 | def __init__(self): 77 | self.mean = 0.4 78 | 79 | def __call__(self, image): 80 | assert (isinstance(image, np.ndarray)) 81 | image -= self.mean 82 | return image 83 | 84 | 85 | class RandomPermuteRGB(object): 86 | """ 87 | Random permute RGB channels??? 88 | """ 89 | 90 | def __call__(self, image): 91 | assert (isinstance(image, np.ndarray)) 92 | 93 | random_permutation = np.random.permutation(3) 94 | image = image[..., random_permutation] 95 | 96 | return image 97 | 98 | 99 | class CenterCrop(object): 100 | def __init__(self, img_size, crop_size): 101 | """Set the height and weight before and after cropping""" 102 | self.img_size_h = img_size[0] 103 | self.img_size_w = img_size[1] 104 | self.crop_size_h = crop_size[0] 105 | self.crop_size_w = crop_size[1] 106 | 107 | def __call__(self, image, bounding_box=None): 108 | 109 | img_height, img_width, _ = image.shape 110 | 111 | if bounding_box is not None: 112 | bounding_box = [ 113 | bounding_box[0], 114 | bounding_box[1], 115 | bounding_box[2], 116 | bounding_box[3] 117 | ] # yapf: disable 118 | 119 | # Calculate the size of bounding boxes 120 | bbox_width = bounding_box[2] - bounding_box[0] 121 | bbox_height = bounding_box[3] - bounding_box[1] 122 | bbox_x_mid = (bounding_box[2] + bounding_box[0]) * .5 123 | bbox_y_mid = (bounding_box[3] + bounding_box[1]) * .5 124 | 125 | # Make the crop area as a square 126 | square_object_size = max(bbox_width, bbox_height) 127 | 128 | x_left = int(bbox_x_mid - square_object_size * .5) 129 | x_right = int(bbox_x_mid + square_object_size * .5) 130 | y_top = int(bbox_y_mid - square_object_size * .5) 131 | y_bottom = int(bbox_y_mid + square_object_size * .5) 132 | 133 | # If the crop position is out of the image, fix it with padding 134 | pad_x_left = 0 135 | if x_left < 0: 136 | pad_x_left = -x_left 137 | x_left = 0 138 | pad_x_right = 0 139 | if x_right >= img_width: 140 | pad_x_right = x_right - img_width + 1 141 | x_right = img_width - 1 142 | pad_y_top = 0 143 | if y_top < 0: 144 | pad_y_top = -y_top 145 | y_top = 0 146 | pad_y_bottom = 0 147 | if y_bottom >= img_height: 148 | pad_y_bottom = y_bottom - img_height + 1 149 | y_bottom = img_height - 1 150 | 151 | # Padding the image and resize the image 152 | processed_image = np.pad( 153 | image[y_top:y_bottom + 1, x_left:x_right + 1], ((pad_y_top, pad_y_bottom), (pad_x_left, pad_x_right), 154 | (0, 0)), 155 | mode='edge') 156 | processed_image = cv2.resize(processed_image, (self.img_size_w, self.img_size_h)) 157 | else: 158 | if img_height > self.crop_size_h and img_width > self.crop_size_w: 159 | x_left = int(img_width - self.crop_size_w) // 2 160 | x_right = int(x_left + self.crop_size_w) 161 | y_top = int(img_height - self.crop_size_h) // 2 162 | y_bottom = int(y_top + self.crop_size_h) 163 | else: 164 | x_left = 0 165 | x_right 
= img_width 166 | y_top = 0 167 | y_bottom = img_height 168 | 169 | processed_image = cv2.resize(image[y_top:y_bottom, x_left:x_right], (self.img_size_w, self.img_size_h)) 170 | 171 | return processed_image 172 | 173 | 174 | class RandomCrop(object): 175 | def __init__(self, img_size, crop_size): 176 | """Set the height and weight before and after cropping""" 177 | self.img_size_h = img_size[0] 178 | self.img_size_w = img_size[1] 179 | self.crop_size_h = crop_size[0] 180 | self.crop_size_w = crop_size[1] 181 | 182 | def __call__(self, image, silhouette=None, bounding_box=None): 183 | 184 | img_height, img_width, crop_size_c = image.shape 185 | 186 | if bounding_box is not None: 187 | bounding_box = [ 188 | bounding_box[0], 189 | bounding_box[1], 190 | bounding_box[2], 191 | bounding_box[3] 192 | ] # yapf: disable 193 | 194 | # Calculate the size of bounding boxes 195 | bbox_width = bounding_box[2] - bounding_box[0] 196 | bbox_height = bounding_box[3] - bounding_box[1] 197 | bbox_x_mid = (bounding_box[2] + bounding_box[0]) * .5 198 | bbox_y_mid = (bounding_box[3] + bounding_box[1]) * .5 199 | 200 | # Make the crop area as a square 201 | square_object_size = max(bbox_width, bbox_height) 202 | square_object_size = square_object_size * random.uniform(0.8, 1.2) 203 | 204 | x_left = int(bbox_x_mid - square_object_size * random.uniform(.4, .6)) 205 | x_right = int(bbox_x_mid + square_object_size * random.uniform(.4, .6)) 206 | y_top = int(bbox_y_mid - square_object_size * random.uniform(.4, .6)) 207 | y_bottom = int(bbox_y_mid + square_object_size * random.uniform(.4, .6)) 208 | 209 | # If the crop position is out of the image, fix it with padding 210 | pad_x_left = 0 211 | if x_left < 0: 212 | pad_x_left = -x_left 213 | x_left = 0 214 | pad_x_right = 0 215 | if x_right >= img_width: 216 | pad_x_right = x_right - img_width + 1 217 | x_right = img_width - 1 218 | pad_y_top = 0 219 | if y_top < 0: 220 | pad_y_top = -y_top 221 | y_top = 0 222 | pad_y_bottom = 0 223 | if y_bottom >= img_height: 224 | pad_y_bottom = y_bottom - img_height + 1 225 | y_bottom = img_height - 1 226 | 227 | # Padding the image and resize the image 228 | processed_image = np.pad( 229 | image[y_top:y_bottom + 1, x_left:x_right + 1], ((pad_y_top, pad_y_bottom), (pad_x_left, pad_x_right), 230 | (0, 0)), mode='edge') 231 | processed_image = cv2.resize(processed_image, (self.img_size_w, self.img_size_h)) 232 | if silhouette is not None: 233 | processed_silhouette = np.pad( 234 | silhouette[y_top:y_bottom + 1, x_left:x_right + 1], 235 | ((pad_y_top, pad_y_bottom), (pad_x_left, pad_x_right), 236 | (0, 0)), mode='constant') 237 | processed_silhouette = cv2.resize(processed_silhouette, (self.img_size_w, self.img_size_h)) 238 | else: 239 | processed_silhouette = None 240 | 241 | 242 | else: 243 | if img_height > self.crop_size_h and img_width > self.crop_size_w: 244 | x_left = int(img_width - self.crop_size_w) // 2 245 | x_right = int(x_left + self.crop_size_w) 246 | y_top = int(img_height - self.crop_size_h) // 2 247 | y_bottom = int(y_top + self.crop_size_h) 248 | else: 249 | x_left = 0 250 | x_right = img_width 251 | y_top = 0 252 | y_bottom = img_height 253 | 254 | processed_image = cv2.resize(image[y_top:y_bottom, x_left:x_right], (self.img_size_w, self.img_size_h)) 255 | if silhouette is not None: 256 | processed_silhouette = cv2.resize(silhouette[y_top:y_bottom, x_left:x_right], 257 | (self.img_size_w, self.img_size_h)) 258 | else: 259 | processed_silhouette = None 260 | 261 | return processed_image, processed_silhouette 262 | 
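# A minimal usage sketch (not part of the original file; the sizes and image path are hypothetical):
# the transforms above are meant to be chained with Compose, e.g. cropping the central 512x512 region
# of a 1024x1280 fisheye frame and resizing it to a 256x256 network input:
#
#     transform = Compose([
#         CenterCrop(img_size=(256, 256), crop_size=(512, 512)),
#         Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
#         ToTensor(),
#     ])
#     img = cv2.imread('data/demo/imgs/img_001000.jpg').astype(np.float32) / 255.
#     img_tensor = transform(img, bounding_box=None)  # 3 x 256 x 256 float tensor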
263 | 264 | class RandomFlip(object): 265 | def __call__(self, image, silhouette=None): 266 | assert (isinstance(image, np.ndarray)) 267 | 268 | if random.randint(0, 1): 269 | image = np.fliplr(image) 270 | if silhouette is not None: 271 | silhouette = np.fliplr(silhouette) 272 | 273 | return image, silhouette 274 | 275 | 276 | class RandomNoise(object): 277 | def __init__(self, 278 | noise_std, 279 | eigvals=(0.2175, 0.0188, 0.0045), 280 | eigvecs=((-0.5675, 0.7192, 0.4009), (-0.5808, -0.0045, -0.8140), (-0.5836, -0.6948, 0.4203))): 281 | self.noise_std = noise_std 282 | self.eigvals = np.array(eigvals) 283 | self.eigvecs = np.array(eigvecs) 284 | 285 | def __call__(self, image): 286 | alpha = np.random.normal(loc=0, scale=self.noise_std, size=3) 287 | noise_rgb = \ 288 | np.sum( 289 | np.multiply( 290 | np.multiply( 291 | self.eigvecs, 292 | np.tile(alpha, (3, 1)) 293 | ), 294 | np.tile(self.eigvals, (3, 1)) 295 | ), 296 | axis=1 297 | ) 298 | 299 | # Allocate new space for storing processed images 300 | img_height, img_width, img_channels = image.shape 301 | assert (img_channels == 3), "Please use RandomBackground to normalize image channels" 302 | for i in range(img_channels): 303 | image[:, :, i] += noise_rgb[i] 304 | 305 | return image 306 | 307 | 308 | class ColorJitter(object): 309 | def __init__(self, color_add, color_mul): 310 | self.color_add_low = 0 311 | self.color_add_high = color_add 312 | self.color_mul_low = 1 - color_mul 313 | self.color_mul_high = 1 + color_mul 314 | 315 | def __call__(self, rendering_image): 316 | color_add = np.random.uniform(self.color_add_low, self.color_add_high, size=(1, 1, 3)) 317 | color_mul = np.random.uniform(self.color_mul_low, self.color_mul_high, size=(1, 1, 3)) 318 | rendering_image = rendering_image + color_add 319 | rendering_image = rendering_image * color_mul 320 | return rendering_image 321 | 322 | 323 | if __name__ == '__main__': 324 | from config import consts 325 | import skimage.io as img_io 326 | 327 | # test random noise 328 | random_noise = RandomNoise(consts.img.noise_std) 329 | image = img_io.imread('/home/wangjian/Develop/3DReconstruction/tmp/bobo.jpg') / 255. 
330 | # img_io.imshow(image) 331 | # img_io.show() 332 | image = random_noise(image) 333 | image = image / np.max(image, (0, 1)) 334 | img_io.imshow(image) 335 | img_io.show() 336 | -------------------------------------------------------------------------------- /utils/depth2pointcloud.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import numpy as np 3 | import open3d 4 | from utils.fisheye.FishEyeCalibrated import FishEyeCameraCalibrated 5 | from utils.fisheye.FishEyeEquisolid import FishEyeCameraEquisolid 6 | import os 7 | 8 | 9 | class Depth2PointCloud: 10 | def __init__(self, visualization, camera_model='utils/fisheye/fisheye.calibration_05_08.json', 11 | post_process=True): 12 | if camera_model == 'FishEyeCameraEquisolid': 13 | self.camera = FishEyeCameraEquisolid(focal_length=9, sensor_size=32, img_size=(1280, 1024)) 14 | else: 15 | self.camera = FishEyeCameraCalibrated(calibration_file_path=camera_model) 16 | self.visualization = visualization 17 | self.post_process = post_process 18 | 19 | def depth2pointcloud(self, camera, depth_img, real_img): 20 | depth_img = depth_img.transpose() 21 | real_img = np.transpose(real_img, axes=(1, 0, 2)) 22 | points = np.zeros(shape=(depth_img.shape[0], depth_img.shape[1], 2)) 23 | x_range = np.array(range(depth_img.shape[0])) 24 | y_range = np.array(range(depth_img.shape[1])) 25 | points[:, :, 0] = np.add(points[:, :, 0].transpose(), x_range).transpose() 26 | points[:, :, 1] = np.add(points[:, :, 1], y_range) 27 | points = points.reshape((-1, 2)) 28 | depth_img_flat = depth_img.reshape((-1)) 29 | # opencv color to RGB color between [0, 1) 30 | colors = real_img[:, :, ::-1] 31 | colors = colors.reshape((-1, 3)) / 255. 32 | points_3d = camera.camera2world(point=points, depth=depth_img_flat) 33 | return points_3d, colors 34 | 35 | 36 | def depth2pointcloud_no_color(self, camera, depth_img): 37 | depth_img = depth_img.transpose() 38 | points = np.zeros(shape=(depth_img.shape[0], depth_img.shape[1], 2)) 39 | x_range = np.array(range(depth_img.shape[0])) 40 | y_range = np.array(range(depth_img.shape[1])) 41 | points[:, :, 0] = np.add(points[:, :, 0].transpose(), x_range).transpose() 42 | points[:, :, 1] = np.add(points[:, :, 1], y_range) 43 | points = points.reshape((-1, 2)) 44 | depth_img_flat = depth_img.reshape((-1)) 45 | # opencv color to RGB color between [0, 1) 46 | points_3d = camera.camera2world(point=points, depth=depth_img_flat) 47 | final_point = [] 48 | for point in points_3d: 49 | if point[2] > 0.1: 50 | final_point.append(point) 51 | return final_point 52 | 53 | def __get_img_mask(self, img_width=1280, img_height=1024): 54 | radius = int(img_height / 2 - 30) 55 | mask = np.zeros(shape=[img_height, img_width, 3]) 56 | cv2.circle(mask, center=(img_width // 2, img_height // 2), radius=radius, color=(255, 255, 255), thickness=-1) 57 | return mask / 255. 
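# postprocess below keeps only points with z > 0.1, dropping zero-depth pixels that would
# otherwise collapse onto the camera origin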
58 | 59 | def postprocess(self, point_3d, colors): 60 | final_point = [] 61 | final_color = [] 62 | for point, color in zip(point_3d, colors): 63 | if point[2] > 0.1: 64 | final_point.append(point) 65 | final_color.append(color) 66 | return final_point, final_color 67 | 68 | def __visualize(self, point_cloud): 69 | mesh_frame = open3d.geometry.TriangleMesh.create_coordinate_frame() 70 | open3d.visualization.draw_geometries([point_cloud, mesh_frame]) 71 | 72 | def get_point_cloud_single_image(self, depth_path, img_path, output_path=None): 73 | depth = cv2.imread(depth_path, cv2.IMREAD_ANYCOLOR | cv2.IMREAD_ANYDEPTH) 74 | depth = cv2.resize(depth, dsize=(1280, 1024), interpolation=cv2.INTER_NEAREST) 75 | if len(depth.shape) == 3: 76 | depth = depth[:, :, 0] 77 | depth[depth > 100] = 0 78 | 79 | img = cv2.imread(img_path) 80 | img = cv2.resize(img, dsize=(1280, 1024), interpolation=cv2.INTER_LINEAR) 81 | 82 | point_3d, colors = self.depth2pointcloud(self.camera, depth, img) 83 | 84 | if self.post_process: 85 | point_3d, colors = self.postprocess(point_3d, colors) 86 | 87 | # visualization 88 | point_cloud = open3d.geometry.PointCloud() 89 | point_cloud.points = open3d.utility.Vector3dVector(point_3d) 90 | point_cloud.colors = open3d.utility.Vector3dVector(colors) 91 | 92 | if self.visualization: 93 | self.__visualize(point_cloud) 94 | if output_path is not None: 95 | open3d.io.write_point_cloud(output_path, point_cloud) 96 | 97 | return point_cloud 98 | 99 | 100 | if __name__ == '__main__': 101 | root_path = r'\\winfs-inf\CT\EgoMocap\work\EgocentricDepthEstimation\data' 102 | img_name_map = {'kitchen': 'fc2_save_2017-11-08-124903-0100.jpg', 103 | 'office': 'fc2_save_2017-11-08-162032-0000.jpg', 104 | 'img': '004.png', 105 | 'kitchen_jian': 'img_-04032020183051-58.jpg', 106 | 'kitchen_2': 'fc2_save_2017-11-08-124903-2300.jpg', 107 | 'office_2': 'fc2_save_2017-11-08-162032-1300.jpg', 108 | 'studio': 'frame_c_0_f_0180.png', 109 | 'office_3': 'fc2_save_2017-11-06-152157-0120.jpg', 110 | 'me': 'me_processed.jpg', 111 | 'scannet': '000.png', 112 | 'kitchen_3': 'fc2_save_2017-11-08-124903-0793.jpg', 113 | 'real': '0.png', 114 | 'shakehead': '2.jpg', 115 | 'kitchen_rot': '1.jpg', 116 | 'kitchen_seq': 'fc2_save_2017-11-08-124903-0108.jpg', 117 | 'synthetic_data': '000010.png', 118 | 'synthetic_data_2': '000000.png', 119 | 'jian3': 'img-10082020170800-793.jpg'} 120 | scene_dir = r'kitchen' 121 | img_name = img_name_map[scene_dir] 122 | depth_name = img_name + '.exr' 123 | depth_dir = 'wo_body_lr_1e-4_finetune' 124 | depth_path = os.path.join(root_path, scene_dir, depth_dir, depth_name) 125 | img_path = os.path.join(root_path, scene_dir, img_name) 126 | 127 | get_point_cloud = Depth2PointCloud(visualization=True) 128 | 129 | get_point_cloud.get_point_cloud_single_image(depth_path, img_path, 130 | output_path=os.path.join(r'F:\Develop\egocentricdepthestimation\reconstructed_scene_pointcloud', 131 | '{}_{}_{}.ply'.format(scene_dir, depth_dir, depth_name))) 132 | -------------------------------------------------------------------------------- /utils/fisheye/FishEyeCalibrated.py: -------------------------------------------------------------------------------- 1 | import json 2 | import numpy as np 3 | import torch 4 | from copy import deepcopy 5 | 6 | 7 | class FishEyeCameraCalibrated: 8 | def __init__(self, calibration_file_path, use_gpu=False): 9 | with open(calibration_file_path) as f: 10 | calibration_data = json.load(f) 11 | self.intrinsic = np.array(calibration_data['intrinsic']) 12 | self.img_size = 
np.array(calibration_data['size'])  # w, h 13 | self.fisheye_polynomial = np.array(calibration_data['polynomialC2W']) 14 | self.fisheye_inverse_polynomial = np.array(calibration_data['polynomialW2C']) 15 | self.img_center = np.array([self.intrinsic[0][2], self.intrinsic[1][2]]) 16 | self.use_gpu = use_gpu 17 | # self.img_center = np.array([self.img_size[0] / 2, self.img_size[1] / 2]) 18 | 19 | def camera2world(self, point: np.ndarray, depth: np.ndarray): 20 | """ 21 | point: np.ndarray of 2D points on image (n * 2) 22 | depth: np.ndarray of depth of every 2D point (n) 23 | """ 24 | depth = depth.astype(np.float32) 25 | point_centered = point.astype(np.float32) - self.img_center 26 | x = point_centered[:, 0] 27 | y = point_centered[:, 1] 28 | distance_from_center = np.sqrt(np.square(x) + np.square(y)) 29 | 30 | z = np.polyval(p=self.fisheye_polynomial[::-1], x=distance_from_center) 31 | point_3d = np.array([x, y, -z]) # 3, n 32 | norm = np.linalg.norm(point_3d, axis=0) 33 | point_3d = point_3d / norm * depth 34 | return point_3d.transpose() 35 | 36 | def camera2world_ray(self, point: np.ndarray): 37 | """ 38 | calculate the ray direction from the image points 39 | point: np.ndarray of 2D points on image (n * 2) 40 | return: unit ray direction of every 2D point (n * 3) 41 | """ 42 | point_centered = point.astype(np.float64) - self.img_center 43 | x = point_centered[:, 0] 44 | y = point_centered[:, 1] 45 | distance_from_center = np.sqrt(np.square(x) + np.square(y)) 46 | 47 | z = np.polyval(p=self.fisheye_polynomial[::-1], x=distance_from_center) 48 | point_3d = np.array([x, y, -z]) # 3, n 49 | norm = np.linalg.norm(point_3d, axis=0) 50 | point_3d = point_3d / norm 51 | return point_3d.transpose() 52 | 53 | def getPolyVal(self, p, x): 54 | curVal = torch.zeros_like(x) 55 | for curValIndex in range(len(p) - 1): 56 | curVal = (curVal + p[curValIndex]) * x 57 | return curVal + p[len(p) - 1] 58 | 59 | def camera2world_pytorch(self, point: torch.Tensor, depth: torch.Tensor): 60 | """ 61 | point: torch.Tensor of 2D points on image (n * 2) 62 | depth: torch.Tensor of depth of every 2D point (n) 63 | """ 64 | img_center = torch.from_numpy(self.img_center).float().to(point.device) 65 | point_centered = point.float() - img_center 66 | x = point_centered[:, 0] 67 | y = point_centered[:, 1] 68 | distance_from_center = torch.sqrt(torch.square(x) + torch.square(y)) 69 | 70 | z = self.getPolyVal(p=self.fisheye_polynomial[::-1], x=distance_from_center) 71 | point_3d = torch.stack([x, y, -z]).float() # 3, n 72 | norm = torch.norm(point_3d, dim=0) 73 | point_3d = point_3d / norm * depth.float() 74 | return point_3d.t() 75 | 76 | def camera2world_ray_pytorch(self, point: torch.Tensor): 77 | """ 78 | point: torch.Tensor of 2D points on image (n * 2) 79 | return: unit ray direction of every 2D point (n * 3) 80 | """ 81 | point_centered = point.float() - self.img_center 82 | x = point_centered[:, 0] 83 | y = point_centered[:, 1] 84 | distance_from_center = torch.sqrt(torch.square(x) + torch.square(y)) 85 | 86 | z = self.getPolyVal(p=self.fisheye_polynomial[::-1], x=distance_from_center) 87 | point_3d = torch.stack([x, y, -z]).float() # 3, n 88 | norm = torch.norm(point_3d, dim=0) 89 | point_3d = point_3d / norm 90 | return point_3d.t() 91 | 92 | def world2camera(self, point3D): 93 | point3D = deepcopy(point3D) 94 | point3D[:, 2] = point3D[:, 2] * -1 95 | point3D = point3D.T 96 | xc, yc = self.img_center[0], self.img_center[1] 97 | point2D = [] 98 | 99 | norm = np.linalg.norm(point3D[:2], axis=0) 100 | 101 | if (norm !=
0).all(): 102 | theta = np.arctan(point3D[2] / norm) 103 | invnorm = 1.0 / norm 104 | t = theta 105 | rho = self.fisheye_inverse_polynomial[0] 106 | t_i = 1.0 107 | 108 | for i in range(1, len(self.fisheye_inverse_polynomial)): 109 | t_i *= t 110 | rho += t_i * self.fisheye_inverse_polynomial[i] 111 | 112 | x = point3D[0] * invnorm * rho 113 | y = point3D[1] * invnorm * rho 114 | 115 | point2D.append(x + xc) 116 | point2D.append(y + yc) 117 | else: 118 | point2D.append(xc) 119 | point2D.append(yc) 120 | raise Exception("norm is zero!") 121 | 122 | return np.asarray(point2D).T 123 | 124 | def world2camera_with_depth(self, point3D): 125 | point3D_cloned = deepcopy(point3D) 126 | point2D = self.world2camera(point3D_cloned) 127 | 128 | depth = np.linalg.norm(point3D, axis=-1) 129 | return point2D, depth 130 | 131 | def world2camera_pytorch_with_depth(self, point3D): 132 | point2D = self.world2camera_pytorch(point3D) 133 | 134 | depth = torch.norm(point3D, dim=-1) 135 | return point2D, depth 136 | 137 | def world2camera_pytorch(self, point3d_original: torch.Tensor, normalize=False): 138 | """ 139 | 140 | Args: 141 | point3d_original: point 142 | normalize: normalize to -1, 1 143 | 144 | Returns: 145 | 146 | """ 147 | fisheye_inv_polynomial = self.fisheye_inverse_polynomial 148 | point3d = point3d_original.clone() 149 | point3d[:, 2] = point3d_original[:, 2] * -1 150 | point3d = point3d.transpose(0, 1) 151 | xc, yc = self.img_center[0], self.img_center[1] 152 | xc = torch.Tensor([xc]).float().to(point3d.device) 153 | yc = torch.Tensor([yc]).float().to(point3d.device) 154 | point2d = torch.empty((2, point3d.shape[-1])).to(point3d.device) 155 | 156 | norm = torch.norm(point3d[:2], dim=0) 157 | 158 | if (norm != 0).all(): 159 | theta = torch.atan(point3d[2] / norm) 160 | invnorm = 1.0 / norm 161 | t = theta 162 | rho = fisheye_inv_polynomial[0] 163 | t_i = 1.0 164 | 165 | for i in range(1, len(fisheye_inv_polynomial)): 166 | t_i *= t 167 | rho += t_i * fisheye_inv_polynomial[i] 168 | 169 | x = point3d[0] * invnorm * rho 170 | y = point3d[1] * invnorm * rho 171 | 172 | point2d[0] = x + xc 173 | point2d[1] = y + yc 174 | else: 175 | point2d[0] = xc 176 | point2d[1] = yc 177 | raise Exception("norm is zero!") 178 | 179 | # if normalize, the point result will be -1 to 1from 180 | if normalize is True: 181 | image_w, image_h = self.img_size[0], self.img_size[1] 182 | assert image_w > image_h 183 | point2d[0] = point2d[0] - (image_w - image_h) // 2 # to square image 184 | point2d = point2d / (image_h - 1) * 2 # to [0, 2] 185 | point2d -= 1 186 | 187 | return point2d.transpose(0, 1) 188 | 189 | def undistort(self, point_2Ds): 190 | """ 191 | undistort the input 2d points in fisheye camera 192 | """ 193 | point_length = point_2Ds.shape[0] 194 | depths = np.ones(shape=point_length) 195 | point_3Ds = self.camera2world(point_2Ds, depths) 196 | 197 | # point_3Ds_homo = np.ones((point_length, 4)) 198 | # point_3Ds_homo[:, :3] = point_3Ds 199 | 200 | projected_2d_points = (self.intrinsic[:3, :3] @ point_3Ds.T).T 201 | projected_2d_points = projected_2d_points[:, :2] / projected_2d_points[:, 2:] 202 | return projected_2d_points 203 | 204 | 205 | if __name__ == '__main__': 206 | camera = FishEyeCameraCalibrated(r'Z:\EgoMocap\work\EgocentricFullBody\mmpose\utils\fisheye_camera\fisheye.calibration_01_12.json') 207 | point = np.array([[660, 520], [520, 660], [123, 456]]) 208 | depth = np.array([30, 30, 40]) 209 | point = torch.from_numpy(point).cuda() 210 | depth = torch.from_numpy(depth).cuda() 211 | point3d = 
camera.camera2world_pytorch(point, depth) 212 | print(point3d) 213 | 214 | # reprojected_point_2d = camera.world2camera(point3d) 215 | # print(reprojected_point_2d) 216 | 217 | reprojected_point_2d = camera.world2camera_pytorch(point3d.cpu(), normalize=False) 218 | print(reprojected_point_2d) 219 | -------------------------------------------------------------------------------- /utils/fisheye/FishEyeEquisolid.py: -------------------------------------------------------------------------------- 1 | import json 2 | import numpy as np 3 | import torch 4 | 5 | 6 | class FishEyeCameraEquisolid: 7 | def __init__(self, focal_length, sensor_size, img_size, use_gpu=False): 8 | """ 9 | @param focal_length: focal length of camera in mm 10 | @param sensor_size: sensor size of camera in mm 11 | @param img_size: image size of w, h 12 | @param use_gpu: whether use the gpu to accelerate the calculation 13 | """ 14 | self.sensor_size = sensor_size 15 | self.img_size = np.asarray(img_size) 16 | self.use_gpu = use_gpu 17 | # calculate the focal length in pixel 18 | self.focal_length = focal_length / np.max(sensor_size) * np.max(img_size) 19 | 20 | # calculate the image center 21 | self.img_center = self.img_size / 2 + 1e-10 22 | # get max distance from image center 23 | self.max_distance = self.focal_length * np.sqrt(2) 24 | 25 | if self.use_gpu: 26 | gpu_ok = torch.cuda.is_available() 27 | if gpu_ok is False: 28 | raise Exception("GPU is not available!") 29 | 30 | def camera2world(self, point: np.ndarray, depth: np.ndarray): 31 | """ 32 | @param point: 2d point in shape: (n, 2) 33 | @param depth: depth of every points 34 | @return: 3d position of every point 35 | """ 36 | # get the distance of point to center 37 | depth = depth.astype(np.float32) 38 | point_centered = point.astype(np.float32) - self.img_center 39 | x = point_centered[:, 0] 40 | y = point_centered[:, 1] 41 | distance_from_center = np.sqrt(np.square(x) + np.square(y)) 42 | distance_from_center[distance_from_center > self.max_distance - 30] = self.max_distance 43 | 44 | theta = 2 * np.arcsin(distance_from_center / (2 * self.focal_length)) 45 | Z = distance_from_center / np.tan(theta) 46 | 47 | # square_sin_theta_div_2 = np.square(distance_from_center / (2 * self.focal_length)) 48 | # tan_theta_div_1 = np.sqrt(1 / (4 * square_sin_theta_div_2 * (1 - square_sin_theta_div_2)) - 1) 49 | # Z = distance_from_center * tan_theta_div_1 50 | point_3d = np.array([x, y, Z]) 51 | norm = np.linalg.norm(point_3d, axis=0) 52 | point_3d = point_3d / norm * depth 53 | return point_3d.transpose() 54 | 55 | def camera2world_pytorch(self, point, depth): 56 | """ 57 | @param point: 2d point in shape: (n, 2) 58 | @param depth: depth of every points 59 | @return: 3d position of every point 60 | """ 61 | # get the distance of point to center 62 | depth = depth.float() 63 | img_center = torch.from_numpy(self.img_center) 64 | point_centered = point.float() - img_center.to(point.device) 65 | x = point_centered[:, 0] 66 | y = point_centered[:, 1] 67 | distance_from_center = torch.sqrt(torch.square(x) + torch.square(y)) 68 | distance_from_center[distance_from_center > self.max_distance - 30] = self.max_distance 69 | 70 | theta = 2 * torch.arcsin(distance_from_center / (2 * self.focal_length)) 71 | Z = distance_from_center / torch.tan(theta) 72 | 73 | point_3d = torch.stack([x, y, Z]) 74 | norm = torch.norm(point_3d, dim=0) 75 | point_3d = point_3d / norm * depth 76 | point_3d = point_3d.float() 77 | return point_3d.T 78 | 79 | def world2camera(self, point3D): 80 | # 
calculate depth 81 | x = point3D[:, 0] 82 | y = point3D[:, 1] 83 | z = point3D[:, 2] 84 | depth = np.linalg.norm(point3D, axis=-1) 85 | 86 | # calculate theta 87 | distance_to_center_3d = np.sqrt(np.square(x) + np.square(y)) 88 | tan_theta = distance_to_center_3d / z 89 | theta = np.arctan(tan_theta) 90 | 91 | R = 2 * self.focal_length * np.sin(theta / 2) 92 | a = np.sqrt(np.square(R) / (np.square(x) + np.square(y))) 93 | X = a * x 94 | Y = a * y 95 | point2D = np.array([X, Y]).T + self.img_center 96 | return point2D, depth 97 | 98 | def world2camera_pytorch(self, point3D): 99 | # calculate depth 100 | x = point3D[:, 0] 101 | y = point3D[:, 1] 102 | z = point3D[:, 2] 103 | depth = torch.norm(point3D, dim=-1) 104 | 105 | # calculate theta 106 | distance_to_center_3d = torch.sqrt(torch.square(x) + torch.square(y)) 107 | tan_theta = distance_to_center_3d / z 108 | 109 | theta = torch.arctan(tan_theta) 110 | 111 | R = 2 * self.focal_length * torch.sin(theta / 2) 112 | a = torch.sqrt(torch.square(R) / (torch.square(x) + torch.square(y))) 113 | X = a * x 114 | Y = a * y 115 | img_center = torch.from_numpy(self.img_center) 116 | point2D = torch.stack([X, Y], dim=-1) + img_center.to(point3D.device) 117 | return point2D.float(), depth.float() 118 | 119 | 120 | 121 | def main(): 122 | camera = FishEyeCameraEquisolid(focal_length=9, sensor_size=32, img_size=(1280, 1024)) 123 | point = np.array([[660, 120], [660, 420], ]) 124 | depth = np.array([10, 100]) 125 | point3d = camera.camera2world(point, depth) 126 | print(point3d) 127 | point = torch.asarray([[660, 120], [660, 420], ]) 128 | depth = torch.asarray([10, 100]) 129 | point3d = camera.camera2world_pytorch(point, depth) 130 | print(point3d) 131 | # point3d = torch.from_numpy(point3d).cuda() 132 | # point3d.requires_grad = True 133 | print(camera.world2camera_pytorch(point3d)) 134 | 135 | if __name__ == '__main__': 136 | main() -------------------------------------------------------------------------------- /utils/fisheye/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/utils/fisheye/__init__.py -------------------------------------------------------------------------------- /utils/fisheye/fisheye.calibration.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "egosyn", 3 | "sensor": [1, 1], 4 | "size": [1280, 1024], 5 | "animated": 0, 6 | "intrinsic": [ 7 | [500, 0, 6.597087109684564E+02, 0], 8 | [0, 500, 5.300556618148025E+02, 0], 9 | [0, 0, 1, 0], 10 | [0, 0, 0, 1] 11 | ], 12 | "imageCircleRadius": 5.120000000000000E+02, 13 | "polynomialC2W": [-2.924126419694919E+02, 0.000000000000000E+00, 1.075613595858202E-03, 2.072664555244253E-07, 14 | 4.493499097653669E-10, -1.192028310212584E-15, -1.822337421183959E-17], 15 | "polynomialW2C": [4.785893205484341E+02, 3.503715828980770E+02, 7.900065565120241E+01, 6.228794005673283E+01, 16 | 3.264466851189552E+01, 1.568380500967838E+01, 7.766879336977007E+00, 2.190791369989537E+00, 17 | -1.084229689289942E-01, -1.903842667463734E-01, -2.776267870029922E-02], 18 | "affine": [9.997488697329212E-01, -2.240239372797548E-04, 3.318272123957599E-04], 19 | "extrinsic": [ 20 | [1, 0, 0, 0], 21 | [0, 1, 0, 0], 22 | [0, 0, 1, 0], 23 | [0, 0, 0, 1] 24 | ], 25 | "radial":0 26 | } 27 | 28 | -------------------------------------------------------------------------------- /utils/fisheye/fisheye.calibration_05_08.json: 
-------------------------------------------------------------------------------- 1 | { 2 | "name": "egosyn", 3 | "sensor": [1, 1], 4 | "size": [1280, 1024], 5 | "animated": 0, 6 | "intrinsic": [ 7 | [500, 0, 614.93216999, 0], 8 | [0, 500, 527.53465649, 0], 9 | [0, 0, 1, 0], 10 | [0, 0, 0, 1] 11 | ], 12 | "imageCircleRadius": 5.120000000000000E+02, 13 | "polynomialC2W": [-2.924126419694919E+02, 0.000000000000000E+00, 1.075613595858202E-03, 2.072664555244253E-07, 14 | 4.493499097653669E-10, -1.192028310212584E-15, -1.822337421183959E-17], 15 | "polynomialW2C": [4.785893205484341E+02, 3.503715828980770E+02, 7.900065565120241E+01, 6.228794005673283E+01, 16 | 3.264466851189552E+01, 1.568380500967838E+01, 7.766879336977007E+00, 2.190791369989537E+00, 17 | -1.084229689289942E-01, -1.903842667463734E-01, -2.776267870029922E-02], 18 | "affine": [9.997488697329212E-01, -2.240239372797548E-04, 3.318272123957599E-04], 19 | "extrinsic": [ 20 | [1, 0, 0, 0], 21 | [0, 1, 0, 0], 22 | [0, 0, 1, 0], 23 | [0, 0, 0, 1] 24 | ], 25 | "radial":0 26 | } 27 | 28 | -------------------------------------------------------------------------------- /utils/fisheye/mean3D.mat: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jianwang-mpi/SceneEgo/a85ab5966dd5fd03e97fcee6e420d3c77fddfd34/utils/fisheye/mean3D.mat -------------------------------------------------------------------------------- /utils/get_predict.py: -------------------------------------------------------------------------------- 1 | import math 2 | 3 | import numpy as np 4 | 5 | 6 | def get_max_preds(batch_heatmaps): 7 | ''' 8 | get predictions from score maps 9 | heatmaps: numpy.ndarray([batch_size, num_joints, height, width]) 10 | ''' 11 | assert isinstance(batch_heatmaps, np.ndarray), \ 12 | 'batch_heatmaps should be numpy.ndarray' 13 | assert batch_heatmaps.ndim == 4, 'batch_images should be 4-ndim' 14 | 15 | batch_size = batch_heatmaps.shape[0] 16 | num_joints = batch_heatmaps.shape[1] 17 | width = batch_heatmaps.shape[3] 18 | heatmaps_reshaped = batch_heatmaps.reshape((batch_size, num_joints, -1)) 19 | idx = np.argmax(heatmaps_reshaped, 2) 20 | maxvals = np.amax(heatmaps_reshaped, 2) 21 | 22 | maxvals = maxvals.reshape((batch_size, num_joints, 1)) 23 | idx = idx.reshape((batch_size, num_joints, 1)) 24 | 25 | preds = np.tile(idx, (1, 1, 2)).astype(np.float32) 26 | 27 | preds[:, :, 0] = (preds[:, :, 0]) % width 28 | preds[:, :, 1] = np.floor((preds[:, :, 1]) / width) 29 | 30 | pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2)) 31 | pred_mask = pred_mask.astype(np.float32) 32 | 33 | preds *= pred_mask 34 | return preds, maxvals 35 | -------------------------------------------------------------------------------- /utils/img.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import cv2 3 | from PIL import Image 4 | 5 | import torch 6 | 7 | IMAGENET_MEAN, IMAGENET_STD = np.array([0.485, 0.456, 0.406]), np.array([0.229, 0.224, 0.225]) 8 | 9 | 10 | def crop_image(image, bbox): 11 | """Crops area from image specified as bbox. 
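
The `get_max_preds` helper in `utils/get_predict.py` above decodes each heatmap by taking the spatial argmax and splitting the flat index back into (x, y) pixel coordinates. A minimal check of that convention on toy heatmaps (shapes and values below are illustrative only, not repository data):

```python
import numpy as np

from utils.get_predict import get_max_preds

# One image, two joints, 4x6 heatmaps with peaks at (x=5, y=0) and (x=2, y=3).
heatmaps = np.zeros((1, 2, 4, 6), dtype=np.float32)
heatmaps[0, 0, 0, 5] = 1.0
heatmaps[0, 1, 3, 2] = 0.5

preds, maxvals = get_max_preds(heatmaps)
print(preds[0])     # [[5. 0.], [2. 3.]]  (x, y) of the per-joint argmax
print(maxvals[0])   # [[1. ], [0.5]]      peak confidence of each joint
```
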
Always returns area of size as bbox filling missing parts with zeros 12 | Args: 13 | image numpy array of shape (height, width, 3): input image 14 | bbox tuple of size 4: input bbox (left, upper, right, lower) 15 | 16 | Returns: 17 | cropped_image numpy array of shape (height, width, 3): resulting cropped image 18 | 19 | """ 20 | 21 | image_pil = Image.fromarray(image) 22 | image_pil = image_pil.crop(bbox) 23 | 24 | return np.asarray(image_pil) 25 | 26 | 27 | def resize_image(image, shape): 28 | return cv2.resize(image, (shape[1], shape[0]), interpolation=cv2.INTER_AREA) 29 | 30 | 31 | def get_square_bbox(bbox): 32 | """Makes square bbox from any bbox by stretching of minimal length side 33 | 34 | Args: 35 | bbox tuple of size 4: input bbox (left, upper, right, lower) 36 | 37 | Returns: 38 | bbox: tuple of size 4: resulting square bbox (left, upper, right, lower) 39 | """ 40 | 41 | left, upper, right, lower = bbox 42 | width, height = right - left, lower - upper 43 | 44 | if width > height: 45 | y_center = (upper + lower) // 2 46 | upper = y_center - width // 2 47 | lower = upper + width 48 | else: 49 | x_center = (left + right) // 2 50 | left = x_center - height // 2 51 | right = left + height 52 | 53 | return left, upper, right, lower 54 | 55 | 56 | def scale_bbox(bbox, scale): 57 | left, upper, right, lower = bbox 58 | width, height = right - left, lower - upper 59 | 60 | x_center, y_center = (right + left) // 2, (lower + upper) // 2 61 | new_width, new_height = int(scale * width), int(scale * height) 62 | 63 | new_left = x_center - new_width // 2 64 | new_right = new_left + new_width 65 | 66 | new_upper = y_center - new_height // 2 67 | new_lower = new_upper + new_height 68 | 69 | return new_left, new_upper, new_right, new_lower 70 | 71 | 72 | def to_numpy(tensor): 73 | if torch.is_tensor(tensor): 74 | return tensor.cpu().detach().numpy() 75 | elif type(tensor).__module__ != 'numpy': 76 | raise ValueError("Cannot convert {} to numpy array" 77 | .format(type(tensor))) 78 | return tensor 79 | 80 | 81 | def to_torch(ndarray): 82 | if type(ndarray).__module__ == 'numpy': 83 | return torch.from_numpy(ndarray) 84 | elif not torch.is_tensor(ndarray): 85 | raise ValueError("Cannot convert {} to torch tensor" 86 | .format(type(ndarray))) 87 | return ndarray 88 | 89 | 90 | def image_batch_to_numpy(image_batch): 91 | image_batch = to_numpy(image_batch) 92 | image_batch = np.transpose(image_batch, (0, 2, 3, 1)) # BxCxHxW -> BxHxWxC 93 | return image_batch 94 | 95 | 96 | def image_batch_to_torch(image_batch): 97 | image_batch = np.transpose(image_batch, (0, 3, 1, 2)) # BxHxWxC -> BxCxHxW 98 | image_batch = to_torch(image_batch).float() 99 | return image_batch 100 | 101 | 102 | def normalize_image(image): 103 | """Normalizes image using ImageNet mean and std 104 | 105 | Args: 106 | image numpy array of shape (h, w, 3): image 107 | 108 | Returns normalized_image numpy array of shape (h, w, 3): normalized image 109 | """ 110 | return (image / 255.0 - IMAGENET_MEAN) / IMAGENET_STD 111 | 112 | 113 | def denormalize_image(image): 114 | """Reverse to normalize_image() function""" 115 | return np.clip(255.0 * (image * IMAGENET_STD + IMAGENET_MEAN), 0, 255) 116 | -------------------------------------------------------------------------------- /utils/misc.py: -------------------------------------------------------------------------------- 1 | import os 2 | import yaml 3 | import json 4 | import re 5 | 6 | import torch 7 | 8 | 9 | def config_to_str(config): 10 | return 
yaml.dump(yaml.safe_load(json.dumps(config))) # fuck yeah 11 | 12 | 13 | class AverageMeter(object): 14 | """Computes and stores the average and current value""" 15 | def __init__(self): 16 | self.reset() 17 | 18 | def reset(self): 19 | self.val = 0 20 | self.avg = 0 21 | self.sum = 0 22 | self.count = 0 23 | 24 | def update(self, val, n=1): 25 | self.val = val 26 | self.sum += val * n 27 | self.count += n 28 | self.avg = self.sum / self.count 29 | 30 | 31 | def calc_gradient_norm(named_parameters): 32 | total_norm = 0.0 33 | for name, p in named_parameters: 34 | # print(name) 35 | param_norm = p.grad.data.norm(2) 36 | total_norm += param_norm.item() ** 2 37 | 38 | total_norm = total_norm ** (1. / 2) 39 | 40 | return total_norm 41 | -------------------------------------------------------------------------------- /utils/multiview.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | from utils.fisheye.FishEyeCalibrated import FishEyeCameraCalibrated 4 | 5 | 6 | class Camera: 7 | def __init__(self, R, t, K, dist=None, name=""): 8 | self.R = np.array(R).copy() 9 | assert self.R.shape == (3, 3) 10 | 11 | self.t = np.array(t).copy() 12 | assert self.t.size == 3 13 | self.t = self.t.reshape(3, 1) 14 | 15 | self.K = np.array(K).copy() 16 | assert self.K.shape == (3, 3) 17 | 18 | self.dist = dist 19 | if self.dist is not None: 20 | self.dist = np.array(self.dist).copy().flatten() 21 | 22 | self.name = name 23 | 24 | def update_after_crop(self, bbox): 25 | left, upper, right, lower = bbox 26 | 27 | cx, cy = self.K[0, 2], self.K[1, 2] 28 | 29 | new_cx = cx - left 30 | new_cy = cy - upper 31 | 32 | self.K[0, 2], self.K[1, 2] = new_cx, new_cy 33 | 34 | def update_after_resize(self, image_shape, new_image_shape): 35 | height, width = image_shape 36 | new_height, new_width = new_image_shape 37 | 38 | fx, fy, cx, cy = self.K[0, 0], self.K[1, 1], self.K[0, 2], self.K[1, 2] 39 | 40 | new_fx = fx * (new_width / width) 41 | new_fy = fy * (new_height / height) 42 | new_cx = cx * (new_width / width) 43 | new_cy = cy * (new_height / height) 44 | 45 | self.K[0, 0], self.K[1, 1], self.K[0, 2], self.K[1, 2] = new_fx, new_fy, new_cx, new_cy 46 | 47 | @property 48 | def projection(self): 49 | return self.K.dot(self.extrinsics) 50 | 51 | @property 52 | def extrinsics(self): 53 | return np.hstack([self.R, self.t]) 54 | 55 | 56 | def euclidean_to_homogeneous(points): 57 | """Converts euclidean points to homogeneous 58 | 59 | Args: 60 | points numpy array or torch tensor of shape (N, M): N euclidean points of dimension M 61 | 62 | Returns: 63 | numpy array or torch tensor of shape (N, M + 1): homogeneous points 64 | """ 65 | if isinstance(points, np.ndarray): 66 | return np.hstack([points, np.ones((len(points), 1))]) 67 | elif torch.is_tensor(points): 68 | return torch.cat([points, torch.ones((points.shape[0], 1), dtype=points.dtype, device=points.device)], dim=1) 69 | else: 70 | raise TypeError("Works only with numpy arrays and PyTorch tensors.") 71 | 72 | 73 | def homogeneous_to_euclidean(points): 74 | """Converts homogeneous points to euclidean 75 | 76 | Args: 77 | points numpy array or torch tensor of shape (N, M + 1): N homogeneous points of dimension M 78 | 79 | Returns: 80 | numpy array or torch tensor of shape (N, M): euclidean points 81 | """ 82 | if isinstance(points, np.ndarray): 83 | return (points.T[:-1] / points.T[-1]).T 84 | elif torch.is_tensor(points): 85 | return (points.transpose(1, 0)[:-1] / points.transpose(1, 
0)[-1]).transpose(1, 0) 86 | else: 87 | raise TypeError("Works only with numpy arrays and PyTorch tensors.") 88 | 89 | 90 | def project_3d_points_to_image_plane_without_distortion(proj_matrix, points_3d, convert_back_to_euclidean=True): 91 | """Project 3D points to image plane not taking into account distortion 92 | Args: 93 | proj_matrix numpy array or torch tensor of shape (3, 4): projection matrix 94 | points_3d numpy array or torch tensor of shape (N, 3): 3D points 95 | convert_back_to_euclidean bool: if True, then resulting points will be converted to euclidean coordinates 96 | NOTE: division by zero can be here if z = 0 97 | Returns: 98 | numpy array or torch tensor of shape (N, 2): 3D points projected to image plane 99 | """ 100 | if isinstance(proj_matrix, np.ndarray) and isinstance(points_3d, np.ndarray): 101 | result = euclidean_to_homogeneous(points_3d) @ proj_matrix.T 102 | if convert_back_to_euclidean: 103 | result = homogeneous_to_euclidean(result) 104 | return result 105 | elif torch.is_tensor(proj_matrix) and torch.is_tensor(points_3d): 106 | result = euclidean_to_homogeneous(points_3d) @ proj_matrix.t() 107 | if convert_back_to_euclidean: 108 | result = homogeneous_to_euclidean(result) 109 | return result 110 | else: 111 | raise TypeError("Works only with numpy arrays and PyTorch tensors.") 112 | 113 | 114 | def project_3d_points_to_image_fisheye_camera(fisheye_camera_model: FishEyeCameraCalibrated, 115 | points_3d): 116 | """Project 3D points to image plane 117 | Args: 118 | fisheye camera model: model of fisheye camera 119 | points_3d numpy array or torch tensor of shape (N, 3): 3D points 120 | convert_back_to_euclidean bool: if True, then resulting points will be converted to euclidean coordinates 121 | NOTE: division by zero can be here if z = 0 122 | Returns: 123 | numpy array or torch tensor of shape (N, 2): 3D points projected to image plane 124 | """ 125 | if isinstance(points_3d, np.ndarray): 126 | result = fisheye_camera_model.world2camera(points_3d) 127 | return result 128 | elif torch.is_tensor(points_3d): 129 | result = fisheye_camera_model.world2camera_pytorch(points_3d) 130 | return result 131 | else: 132 | raise TypeError("Works only with numpy arrays and PyTorch tensors.") 133 | 134 | 135 | 136 | def calc_reprojection_error_matrix(keypoints_3d, keypoints_2d_list, proj_matricies): 137 | reprojection_error_matrix = [] 138 | for keypoints_2d, proj_matrix in zip(keypoints_2d_list, proj_matricies): 139 | keypoints_2d_projected = project_3d_points_to_image_plane_without_distortion(proj_matrix, keypoints_3d) 140 | reprojection_error = 1 / 2 * np.sqrt(np.sum((keypoints_2d - keypoints_2d_projected) ** 2, axis=1)) 141 | reprojection_error_matrix.append(reprojection_error) 142 | 143 | return np.vstack(reprojection_error_matrix).T 144 | -------------------------------------------------------------------------------- /utils/op.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | 7 | from utils import multiview 8 | 9 | 10 | def integrate_tensor_2d(heatmaps, softmax=True): 11 | """Applies softmax to heatmaps and integrates them to get their's "center of masses" 12 | 13 | Args: 14 | heatmaps torch tensor of shape (batch_size, n_heatmaps, h, w): input heatmaps 15 | 16 | Returns: 17 | coordinates torch tensor of shape (batch_size, n_heatmaps, 2): coordinates of center of masses of all heatmaps 18 | 19 | """ 20 | batch_size, 
n_heatmaps, h, w = heatmaps.shape 21 | 22 | heatmaps = heatmaps.reshape((batch_size, n_heatmaps, -1)) 23 | if softmax: 24 | heatmaps = nn.functional.softmax(heatmaps, dim=2) 25 | else: 26 | heatmaps = nn.functional.relu(heatmaps) 27 | 28 | heatmaps = heatmaps.reshape((batch_size, n_heatmaps, h, w)) 29 | 30 | mass_x = heatmaps.sum(dim=2) 31 | mass_y = heatmaps.sum(dim=3) 32 | 33 | mass_times_coord_x = mass_x * torch.arange(w).type(torch.float).to(mass_x.device) 34 | mass_times_coord_y = mass_y * torch.arange(h).type(torch.float).to(mass_y.device) 35 | 36 | x = mass_times_coord_x.sum(dim=2, keepdim=True) 37 | y = mass_times_coord_y.sum(dim=2, keepdim=True) 38 | 39 | if not softmax: 40 | x = x / mass_x.sum(dim=2, keepdim=True) 41 | y = y / mass_y.sum(dim=2, keepdim=True) 42 | 43 | coordinates = torch.cat((x, y), dim=2) 44 | coordinates = coordinates.reshape((batch_size, n_heatmaps, 2)) 45 | 46 | return coordinates, heatmaps 47 | 48 | 49 | def integrate_tensor_3d(volumes, softmax=True): 50 | batch_size, n_volumes, x_size, y_size, z_size = volumes.shape 51 | 52 | volumes = volumes.reshape((batch_size, n_volumes, -1)) 53 | if softmax: 54 | volumes = nn.functional.softmax(volumes, dim=2) 55 | else: 56 | volumes = nn.functional.relu(volumes) 57 | 58 | volumes = volumes.reshape((batch_size, n_volumes, x_size, y_size, z_size)) 59 | 60 | mass_x = volumes.sum(dim=3).sum(dim=3) 61 | mass_y = volumes.sum(dim=2).sum(dim=3) 62 | mass_z = volumes.sum(dim=2).sum(dim=2) 63 | 64 | mass_times_coord_x = mass_x * torch.arange(x_size).type(torch.float).to(mass_x.device) 65 | mass_times_coord_y = mass_y * torch.arange(y_size).type(torch.float).to(mass_y.device) 66 | mass_times_coord_z = mass_z * torch.arange(z_size).type(torch.float).to(mass_z.device) 67 | 68 | x = mass_times_coord_x.sum(dim=2, keepdim=True) 69 | y = mass_times_coord_y.sum(dim=2, keepdim=True) 70 | z = mass_times_coord_z.sum(dim=2, keepdim=True) 71 | 72 | if not softmax: 73 | x = x / mass_x.sum(dim=2, keepdim=True) 74 | y = y / mass_y.sum(dim=2, keepdim=True) 75 | z = z / mass_z.sum(dim=2, keepdim=True) 76 | 77 | coordinates = torch.cat((x, y, z), dim=2) 78 | coordinates = coordinates.reshape((batch_size, n_volumes, 3)) 79 | 80 | return coordinates, volumes 81 | 82 | 83 | def integrate_tensor_3d_with_coordinates(volumes, coord_volumes, softmax=True): 84 | 85 | batch_size, n_volumes, x_size, y_size, z_size = volumes.shape 86 | volumes = volumes.reshape((batch_size, n_volumes, -1)) 87 | if softmax: 88 | volumes = nn.functional.softmax(volumes, dim=2) 89 | else: 90 | # need to be normalized 91 | volumes = nn.functional.relu(volumes) 92 | 93 | volumes = volumes.reshape((batch_size, n_volumes, x_size, y_size, z_size)) 94 | coordinates = torch.einsum("bnxyz, bxyzc -> bnc", volumes, coord_volumes) 95 | 96 | return coordinates, volumes 97 | 98 | def get_projected_2d_points_with_coord_volumes(fisheye_model, coord_volume): 99 | """ 100 | :param fisheye_model: 101 | :param coord_volumes: no batch dimension 102 | :return: 103 | """ 104 | # Note: coord volumes are the same among all of the batches, so we only need to 105 | # get the coord volume for one batch and copy it to others 106 | 107 | device = coord_volume.device 108 | volume_shape = coord_volume.shape # x_len, y_len, z_len 109 | 110 | grid_coord = coord_volume.reshape((-1, 3)) 111 | 112 | ####note: precalculated reprojected points! 
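
`integrate_tensor_2d` above implements a soft-argmax: a softmax over the flattened heatmap followed by the expectation of the pixel coordinates under that distribution. A self-contained sketch of the same centre-of-mass computation on a toy 5x5 heatmap (toy values, not repository data):

```python
import torch
import torch.nn.functional as F

# Toy heatmap: batch of 1, 1 joint, 5x5 grid with a peak at (x=3, y=1).
h, w = 5, 5
heatmap = torch.zeros(1, 1, h, w)
heatmap[0, 0, 1, 3] = 10.0  # row (y) = 1, column (x) = 3

# Softmax over the flattened spatial dimensions, as in integrate_tensor_2d.
weights = F.softmax(heatmap.reshape(1, 1, -1), dim=2).reshape(1, 1, h, w)

# Expected coordinate = sum over the grid of probability * coordinate.
xs = torch.arange(w, dtype=torch.float)
ys = torch.arange(h, dtype=torch.float)
x = (weights.sum(dim=2) * xs).sum(dim=2)   # marginal over rows, then weighted x
y = (weights.sum(dim=3) * ys).sum(dim=2)   # marginal over columns, then weighted y

print(x.item(), y.item())  # close to (3.0, 1.0); a sharper peak moves it closer
```
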
113 | grid_coord_proj = multiview.project_3d_points_to_image_fisheye_camera( 114 | fisheye_model, grid_coord 115 | ) 116 | return grid_coord_proj 117 | 118 | 119 | def get_distance_with_coord_volumes(coord_volume): 120 | """ 121 | :param fisheye_model: 122 | :param coord_volumes: no batch dimension 123 | :return: 124 | """ 125 | # Note: coord volumes are the same among all of the batches, so we only need to 126 | # get the coord volume for one batch and copy it to others 127 | 128 | grid_coord = coord_volume.reshape((-1, 3)) 129 | 130 | ####note: precalculated distance! 131 | grid_coord_distance = torch.norm(grid_coord, dim=-1) 132 | return grid_coord_distance 133 | 134 | 135 | def unproject_heatmaps_one_view(heatmaps, grid_coord_proj, volume_size): 136 | 137 | ''' 138 | project the heatmap based on the camera parameters of egocentric fisheye camera 139 | :param heatmaps: 140 | :param fisheye_model: fisheye camera model 141 | :param coord_volumes: shape: batch_size, n_joints, x_len, y_len, z_len 142 | :return: 143 | ''' 144 | # Note: the coord volume is the same for all images, thus we can calculate it in advance. 145 | # We do not need to calculate 146 | # it within the iteration 147 | device = heatmaps.device 148 | batch_size, n_joints, heatmap_shape = heatmaps.shape[0], heatmaps.shape[1], tuple(heatmaps.shape[2:]) 149 | volume_shape = (volume_size, volume_size, volume_size) 150 | 151 | volume_batch = torch.zeros(batch_size, n_joints, *volume_shape, device=device) 152 | 153 | # TODO: speed up this this loop 154 | for batch_i in range(batch_size): 155 | heatmap = heatmaps[batch_i] 156 | heatmap = heatmap.unsqueeze(0) 157 | 158 | # transform to [-1.0, 1.0] range 159 | # note: in grid_coord_proj, the format is like (x, y), however, 160 | # note: when we sample the points, we need (y, x) 161 | grid_coord_proj_transformed = torch.zeros_like(grid_coord_proj) 162 | grid_coord_proj_transformed[:, 0] = 2 * (grid_coord_proj[:, 0] / heatmap_shape[1] - 0.5) 163 | grid_coord_proj_transformed[:, 1] = 2 * (grid_coord_proj[:, 1] / heatmap_shape[0] - 0.5) 164 | 165 | # prepare to F.grid_sample 166 | grid_coord_proj_transformed = grid_coord_proj_transformed.unsqueeze(1).unsqueeze(0) 167 | 168 | current_volume = F.grid_sample(heatmap, grid_coord_proj_transformed, align_corners=True) 169 | 170 | # reshape back to volume 171 | current_volume = current_volume.view(n_joints, *volume_shape) 172 | 173 | volume_batch[batch_i] = current_volume 174 | 175 | return volume_batch 176 | 177 | def get_grid_coord_proj_batch(grid_coord_proj, batch_size, heatmap_shape): 178 | grid_coord_proj_transformed = torch.zeros_like(grid_coord_proj) 179 | grid_coord_proj_transformed[:, 0] = 2 * (grid_coord_proj[:, 0] / heatmap_shape[1] - 0.5) 180 | grid_coord_proj_transformed[:, 1] = 2 * (grid_coord_proj[:, 1] / heatmap_shape[0] - 0.5) 181 | grid_coord_proj_transformed = grid_coord_proj_transformed.unsqueeze(1).unsqueeze(0) 182 | grid_coord_proj_transformed_batch = grid_coord_proj_transformed.expand(batch_size, -1, -1, -1) 183 | 184 | return grid_coord_proj_transformed_batch 185 | 186 | 187 | def get_grid_coord_distance_batch(grid_coord_distance, batch_size, joint_num=15): 188 | grid_coord_distance = grid_coord_distance.unsqueeze(1).unsqueeze(0) 189 | grid_coord_distance_batch = grid_coord_distance.expand(batch_size, joint_num, -1, -1) 190 | 191 | return grid_coord_distance_batch 192 | 193 | 194 | def unproject_heatmaps_one_view_batch(heatmaps, grid_coord_proj_transformed_batch, volume_size): 195 | 196 | ''' 197 | project the heatmap 
based on the camera parameters of egocentric fisheye camera 198 | :param heatmaps: 199 | :param fisheye_model: fisheye camera model 200 | :param coord_volumes: shape: batch_size, n_joints, x_len, y_len, z_len 201 | :return: 202 | ''' 203 | # Note: the coord volume is the same for all images, thus we can calculate it in advance. 204 | # We do not need to calculate 205 | # it within the iteration 206 | batch_size, n_joints, heatmap_shape = heatmaps.shape[0], heatmaps.shape[1], tuple(heatmaps.shape[2:]) 207 | volume_shape = (volume_size, volume_size, volume_size) 208 | 209 | current_volume = F.grid_sample(heatmaps, grid_coord_proj_transformed_batch, align_corners=True) 210 | 211 | # reshape back to volume 212 | volume_batch = current_volume.view(batch_size, n_joints, *volume_shape) 213 | 214 | return volume_batch 215 | 216 | def gaussian_2d_pdf(coords, means, sigmas, normalize=True): 217 | normalization = 1.0 218 | if normalize: 219 | normalization = (2 * np.pi * sigmas[:, 0] * sigmas[:, 0]) 220 | 221 | exp = torch.exp(-((coords[:, 0] - means[:, 0]) ** 2 / sigmas[:, 0] ** 2 + (coords[:, 1] - means[:, 1]) ** 2 / sigmas[:, 1] ** 2) / 2) 222 | return exp / normalization 223 | 224 | 225 | def render_points_as_2d_gaussians(points, sigmas, image_shape, normalize=True): 226 | device = points.device 227 | n_points = points.shape[0] 228 | 229 | yy, xx = torch.meshgrid(torch.arange(image_shape[0]).to(device), torch.arange(image_shape[1]).to(device)) 230 | grid = torch.stack([xx, yy], dim=-1).type(torch.float32) 231 | grid = grid.unsqueeze(0).repeat(n_points, 1, 1, 1) # (n_points, h, w, 2) 232 | grid = grid.reshape((-1, 2)) 233 | 234 | points = points.unsqueeze(1).unsqueeze(1).repeat(1, image_shape[0], image_shape[1], 1) 235 | points = points.reshape(-1, 2) 236 | 237 | sigmas = sigmas.unsqueeze(1).unsqueeze(1).repeat(1, image_shape[0], image_shape[1], 1) 238 | sigmas = sigmas.reshape(-1, 2) 239 | 240 | images = gaussian_2d_pdf(grid, points, sigmas, normalize=normalize) 241 | images = images.reshape(n_points, *image_shape) 242 | 243 | return images 244 | -------------------------------------------------------------------------------- /utils/pose_visualization_utils.py: -------------------------------------------------------------------------------- 1 | 2 | import open3d 3 | import numpy as np 4 | from scipy.spatial.transform import Rotation 5 | 6 | def get_sphere(position, radius=1.0, color=(0.1, 0.1, 0.7)): 7 | mesh_sphere: open3d.geometry.TriangleMesh = open3d.geometry.TriangleMesh.create_sphere(radius=radius) 8 | mesh_sphere.paint_uniform_color(color) 9 | 10 | # translate to position 11 | mesh_sphere = mesh_sphere.translate(position, relative=False) 12 | return mesh_sphere 13 | 14 | def rotation_matrix_from_vectors(vec1, vec2): 15 | """ Find the rotation matrix that aligns vec1 to vec2 16 | :param vec1: A 3d "source" vector 17 | :param vec2: A 3d "destination" vector 18 | :return mat: A transform matrix (3x3) which when applied to vec1, aligns it with vec2. 
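
`rotation_matrix_from_vectors` is later used by `get_cylinder` below to rotate open3d's default z-aligned cylinder onto a bone direction. A quick standalone check of the documented contract, with arbitrary example vectors:

```python
import numpy as np

from utils.pose_visualization_utils import rotation_matrix_from_vectors

# Contract check: R maps the direction of vec1 onto the direction of vec2.
vec1 = np.array([0.0, 0.0, 1.0])     # open3d cylinders are created along +z
vec2 = np.array([1.0, 2.0, 2.0])     # an arbitrary bone direction

R = rotation_matrix_from_vectors(vec1, vec2)
a = vec1 / np.linalg.norm(vec1)
b = vec2 / np.linalg.norm(vec2)
print(np.allclose(R @ a, b))         # True: the rotated axis points along vec2
```
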
19 | """ 20 | a, b = (vec1 / np.linalg.norm(vec1)).reshape(3), (vec2 / np.linalg.norm(vec2)).reshape(3) 21 | v = np.cross(a, b) 22 | c = np.dot(a, b) 23 | s = np.linalg.norm(v) 24 | if np.abs(s) < 1e-6: 25 | rotation_matrix = np.eye(3) 26 | else: 27 | kmat = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]]) 28 | rotation_matrix = np.eye(3) + kmat + kmat.dot(kmat) * ((1 - c) / (s ** 2)) 29 | return rotation_matrix 30 | 31 | def get_cylinder(start_point, end_point, radius=0.3, color=(0.1, 0.9, 0.1)): 32 | center = (start_point + end_point) / 2 33 | height = np.linalg.norm(start_point - end_point) 34 | mesh_cylinder: open3d.geometry.TriangleMesh = open3d.geometry.TriangleMesh.create_cylinder(radius=radius, height=height) 35 | mesh_cylinder.paint_uniform_color(color) 36 | 37 | # translate and rotate to position 38 | # rotate vector 39 | rot_vec = end_point - start_point 40 | rot_vec = rot_vec / np.linalg.norm(rot_vec) 41 | rot_0 = np.array([0, 0, 1]) 42 | rot_mat = rotation_matrix_from_vectors(rot_0, rot_vec) 43 | # if open3d.__version__ >= '0.9.0.0': 44 | # rotation_param = rot_mat 45 | # else: 46 | # rotation_param = Rotation.from_matrix(rot_mat).as_euler('xyz') 47 | rotation_param = rot_mat 48 | mesh_cylinder = mesh_cylinder.rotate(rotation_param) 49 | mesh_cylinder = mesh_cylinder.translate(center, relative=False) 50 | return mesh_cylinder 51 | 52 | if __name__ == '__main__': 53 | point1 = np.array([-1, 11, 8]) 54 | point2 = np.array([12, -1, 5]) 55 | sphere1 = get_sphere(position=point1, radius=0.1) 56 | sphere2 = get_sphere(position=point2, radius=0.1) 57 | cylinder = get_cylinder(start_point=point1, end_point=point2, radius=0.02) 58 | 59 | mesh_frame = open3d.geometry.TriangleMesh.create_coordinate_frame(size=0.5) 60 | 61 | open3d.visualization.draw_geometries( 62 | [sphere1, sphere2, cylinder, mesh_frame]) -------------------------------------------------------------------------------- /utils/rigid_transform_with_scale.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import numpy.linalg 3 | import random 4 | import torch 5 | 6 | # Relevant links: 7 | # - http://stackoverflow.com/a/32244818/263061 (solution with scale) 8 | # - "Least-Squares Rigid Motion Using SVD" (no scale but easy proofs and explains how weights could be added) 9 | 10 | 11 | # Rigidly (+scale) aligns two point clouds with know point-to-point correspondences 12 | # with least-squares error. 13 | # Returns (scale factor c, rotation matrix R, translation vector t) such that 14 | # Q = P*cR + t 15 | # if they align perfectly, or such that 16 | # SUM over point i ( | P_i*cR + t - Q_i |^2 ) 17 | # is minimised if they don't align perfectly. 
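
A quick numerical check of the convention documented above, using the `umeyama` routine defined just below: build a known similarity transform, apply it to random points as Q = P*cR + t, and confirm that (c, R, t) is recovered. The specific scale, angle, and translation are arbitrary test values.

```python
import numpy as np

from utils.rigid_transform_with_scale import umeyama

rng = np.random.default_rng(0)
P = rng.normal(size=(15, 3))                     # e.g. a 15-joint skeleton

# A known similarity transform: scale c, rotation R about z, translation t.
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
c_true, t_true = 0.8, np.array([0.1, -0.2, 0.3])

# The documented convention: Q = P * cR + t, with points as row vectors.
Q = P @ (c_true * R_true) + t_true

c, R, t = umeyama(P, Q)
print(np.allclose(c, c_true), np.allclose(R, R_true), np.allclose(t, t_true))
# Expected: True True True (up to numerical precision).
```
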
18 | def umeyama(P, Q): 19 | assert P.shape == Q.shape 20 | n, dim = P.shape 21 | 22 | centeredP = P - P.mean(axis=0) 23 | centeredQ = Q - Q.mean(axis=0) 24 | 25 | C = np.dot(np.transpose(centeredP), centeredQ) / n 26 | 27 | 28 | 29 | V, S, W = np.linalg.svd(C) 30 | d = (np.linalg.det(V) * np.linalg.det(W)) < 0.0 31 | 32 | if d: 33 | S[-1] = -S[-1] 34 | V[:, -1] = -V[:, -1] 35 | 36 | R = np.dot(V, W) 37 | 38 | varP = np.var(P, axis=0).sum() 39 | c = 1/varP * np.sum(S) # scale factor 40 | 41 | t = Q.mean(axis=0) - P.mean(axis=0).dot(c*R) 42 | 43 | return c, R, t 44 | 45 | def umeyama_pytorch(P, Q): 46 | assert P.shape == Q.shape 47 | n, dim = P.shape 48 | 49 | centeredP = P - torch.mean(P, dim=0) 50 | centeredQ = Q - torch.mean(Q, dim=0) 51 | 52 | C = centeredP.T @ centeredQ / n 53 | 54 | V, S, W = torch.svd(C) 55 | W = W.T 56 | d = (torch.det(V) * torch.det(W)) < 0.0 57 | 58 | if d: 59 | S[-1] = -S[-1] 60 | V[:, -1] = -V[:, -1] 61 | 62 | R = V @ W 63 | 64 | 65 | varP = torch.sum(torch.var(P, dim=0, unbiased=False)) 66 | c = 1 / varP * torch.sum(S) # scale factor 67 | 68 | t = torch.mean(Q, dim=0) - torch.mean(P, dim=0).matmul(c * R) 69 | 70 | return c, R, t 71 | 72 | def umeyama_ransac(P, Q, epsilon=0.2, n_iters=80): 73 | assert P.shape == Q.shape 74 | inliner_set = [] 75 | point_length = P.shape[0] 76 | for i in range(n_iters): 77 | sampled_points = random.sample(list(range(point_length)), 4) 78 | sampled_P = P[sampled_points] 79 | sampled_Q = Q[sampled_points] 80 | c, R, t = umeyama(sampled_P, sampled_Q) 81 | 82 | projected_P = P @ R * c + t 83 | new_inliner_set = [] 84 | for j in range(point_length): 85 | if np.linalg.norm(projected_P[j] - Q[j], ord=2) < epsilon: 86 | new_inliner_set.append(j) 87 | if len(new_inliner_set) > len(inliner_set): 88 | inliner_set = new_inliner_set 89 | 90 | sampled_P = P[inliner_set] 91 | sampled_Q = Q[inliner_set] 92 | c, R, t = umeyama(sampled_P, sampled_Q) 93 | return c, R, t 94 | 95 | def umeyama_dim_2(P, Q): 96 | assert P.shape == Q.shape 97 | n, dim1 = P.shape 98 | 99 | centeredP = P 100 | centeredQ = Q 101 | 102 | C = np.dot(np.transpose(centeredP), centeredQ) / n 103 | 104 | V, S, W = np.linalg.svd(C) 105 | d = (np.linalg.det(V) * np.linalg.det(W)) < 0.0 106 | 107 | if d: 108 | S[-1] = -S[-1] 109 | V[:, -1] = -V[:, -1] 110 | 111 | R = np.dot(V, W) 112 | 113 | varP = np.var(P, axis=0).sum() 114 | c = 1/varP * np.sum(S) # scale factor 115 | 116 | t = Q.mean(axis=0) - P.mean(axis=0).dot(c*R) 117 | 118 | return c, R, t 119 | 120 | 121 | def umeyama(P, Q): 122 | assert P.shape == Q.shape 123 | n, dim = P.shape 124 | 125 | centeredP = P - P.mean(axis=0) 126 | centeredQ = Q - Q.mean(axis=0) 127 | 128 | C = np.dot(np.transpose(centeredP), centeredQ) / n 129 | 130 | 131 | 132 | V, S, W = np.linalg.svd(C) 133 | d = (np.linalg.det(V) * np.linalg.det(W)) < 0.0 134 | 135 | if d: 136 | S[-1] = -S[-1] 137 | V[:, -1] = -V[:, -1] 138 | 139 | R = np.dot(V, W) 140 | 141 | varP = np.var(P, axis=0).sum() 142 | c = 1/varP * np.sum(S) # scale factor 143 | 144 | t = Q.mean(axis=0) - P.mean(axis=0).dot(c*R) 145 | 146 | return c, R, t 147 | 148 | if __name__ == '__main__': 149 | a = np.random.normal(size=(15, 3)) 150 | b = np.random.normal(size=(15, 3)) 151 | 152 | result1 = umeyama(a.copy(), b.copy()) 153 | print(result1) 154 | result2 = umeyama_pytorch(torch.from_numpy(a), torch.from_numpy(b)) 155 | print(result2) -------------------------------------------------------------------------------- /utils/skeleton.py: 
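
The `Skeleton` class in the file that follows hard-codes the 15-joint layout (matching the joint sequence listed in the README) together with the kinematic parent of each joint. As a small illustration, the sketch below prints per-bone lengths of a predicted pose; it assumes `demo.py` has already written `data/demo/out/img_001000.jpg.pkl` and that the saved pose is a 15x3 array in the egocentric camera frame.

```python
import pickle

import numpy as np

from utils.skeleton import Skeleton

# Per-bone lengths of a predicted pose, using the kinematic tree defined
# in the Skeleton class below.
with open('data/demo/out/img_001000.jpg.pkl', 'rb') as f:
    pose = pickle.load(f)                          # assumed 15 x 3 joint positions

parents = Skeleton.kinematic_parents
bone_lengths = np.linalg.norm(pose - pose[parents], axis=1)
for name, length in zip(Skeleton.heatmap_sequence, bone_lengths):
    print(f'{name:>15s}: {length:.3f}')
```
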
-------------------------------------------------------------------------------- 1 | # pose visualizer 2 | # 1. read and generate 3D skeleton from heat map and depth 3 | # 2. convert 3D skeleton to skeleton mesh 4 | from utils.fisheye.FishEyeEquisolid import FishEyeCameraEquisolid 5 | from utils.fisheye.FishEyeCalibrated import FishEyeCameraCalibrated 6 | import numpy as np 7 | import open3d 8 | from utils.pose_visualization_utils import get_cylinder, get_sphere 9 | from scipy.io import loadmat 10 | import cv2 11 | import os 12 | from tqdm import tqdm 13 | from scipy.ndimage.filters import gaussian_filter1d 14 | 15 | 16 | class Skeleton: 17 | heatmap_sequence = ["Neck", "Right_shoulder", "Right_elbow", "Right_wrist", "Left_shoulder", "Left_elbow", 18 | "Left_wrist", "Right_hip", "Right_knee", "Right_ankle", "Right_foot", "Left_hip", 19 | "Left_knee", "Left_ankle", "Left_foot"] 20 | lines = [(0, 1), (0, 4), (1, 2), (2, 3), (4, 5), (5, 6), (1, 7), (4, 11), (7, 8), (8, 9), (9, 10), 21 | (11, 12), (12, 13), (13, 14), (7, 11)] 22 | kinematic_parents = [0, 0, 1, 2, 0, 4, 5, 1, 7, 8, 9, 4, 11, 12, 13] 23 | 24 | def __init__(self, calibration_path): 25 | 26 | self.skeleton = None 27 | self.skeleton_mesh = None 28 | if calibration_path is None: 29 | print('use FishEyeCameraEquisolid') 30 | self.camera = FishEyeCameraEquisolid(focal_length=9, sensor_size=32, img_size=(1280, 1024)) 31 | else: 32 | self.camera = FishEyeCameraCalibrated(calibration_file_path=calibration_path) 33 | 34 | def set_skeleton(self, heatmap, depth, bone_length=None): 35 | heatmap = np.expand_dims(heatmap, axis=0) 36 | preds, _ = self.get_max_preds(heatmap) 37 | pred = preds[0] 38 | 39 | points_3d = self.camera.camera2world(pred, depth) 40 | # print('------------------------') 41 | # print(self.camera.camera2world(np.array([[640, 1000]]), np.array([1]))) 42 | if bone_length is not None: 43 | points_3d = self._skeleton_resize(points_3d, bone_length) 44 | return points_3d 45 | 46 | def get_2d_pose_from_heatmap(self, heatmap): 47 | heatmap = np.expand_dims(heatmap, axis=0) 48 | preds, _ = self.get_max_preds(heatmap) 49 | pred = preds[0] 50 | return pred 51 | 52 | def joints_2_mesh(self, joints_3d, joint_color=(0.1, 0.1, 0.7), bone_color=(0.1, 0.9, 0.1)): 53 | self.skeleton = joints_3d 54 | self.skeleton_to_mesh(joint_color, bone_color) 55 | skeleton_mesh = self.skeleton_mesh 56 | self.skeleton_mesh = None 57 | self.skeleton = None 58 | return skeleton_mesh 59 | 60 | def joint_list_2_mesh_list(self, joints_3d_list): 61 | mesh_list = [] 62 | for joints_3d in joints_3d_list: 63 | mesh_list.append(self.joints_2_mesh(joints_3d)) 64 | return mesh_list 65 | 66 | def get_skeleton_mesh(self): 67 | if self.skeleton_mesh is None: 68 | raise Exception("Skeleton is not prepared.") 69 | else: 70 | return self.skeleton_mesh 71 | 72 | def save_skeleton_mesh(self, out_path): 73 | if self.skeleton_mesh is None: 74 | raise Exception("Skeleton is not prepared.") 75 | else: 76 | open3d.io.write_triangle_mesh(out_path, mesh=self.skeleton_mesh) 77 | 78 | def set_skeleton_from_file(self, heatmap_file, depth_file, bone_length_file=None, to_mesh=True): 79 | # load the average bone length 80 | if bone_length_file is not None: 81 | bone_length_mat = loadmat(bone_length_file) 82 | mean3D = bone_length_mat['mean3D'].T # convert shape to 15 * 3 83 | bones_mean = mean3D - mean3D[self.kinematic_parents, :] 84 | bone_length = np.linalg.norm(bones_mean, axis=1) 85 | else: 86 | bone_length = None 87 | heatmap_mat = loadmat(heatmap_file) 88 | depth_mat = 
loadmat(depth_file) 89 | depth = depth_mat['depth'][0] 90 | heatmap = heatmap_mat['heatmap'] 91 | heatmap = cv2.resize(heatmap, dsize=(1024, 1024), interpolation=cv2.INTER_NEAREST) 92 | heatmap = np.pad(heatmap, ((0, 0), (128, 128), (0, 0)), 'constant', constant_values=0) 93 | heatmap = heatmap.transpose((2, 0, 1)) 94 | return self.set_skeleton(heatmap, depth, bone_length, to_mesh) 95 | 96 | def skeleton_resize_seq(self, joint_list, bone_length_file): 97 | bone_length_mat = loadmat(bone_length_file) 98 | mean3D = bone_length_mat['mean3D'].T # convert shape to 15 * 3 99 | bones_mean = mean3D - mean3D[self.kinematic_parents, :] 100 | bone_length = np.linalg.norm(bones_mean, axis=1) 101 | 102 | for i in range(len(joint_list)): 103 | joint_list[i] = self._skeleton_resize(joint_list[i], bone_length) 104 | return joint_list 105 | 106 | def skeleton_resize_single(self, joint, bone_length_file): 107 | bone_length_mat = loadmat(bone_length_file) 108 | mean3D = bone_length_mat['mean3D'].T # convert shape to 15 * 3 109 | bones_mean = mean3D - mean3D[self.kinematic_parents, :] 110 | bone_length = np.linalg.norm(bones_mean, axis=1) 111 | 112 | joint = self._skeleton_resize(joint, bone_length) 113 | return joint 114 | 115 | def skeleton_resize_standard_skeleton(self, joint_input, joint_standard): 116 | """ 117 | 118 | :param joint_input: input joint shape: 15 * 3 119 | :param joint_standard: standard joint shape: 15 * 3 120 | :return: 121 | """ 122 | bones_mean = joint_standard - joint_standard[self.kinematic_parents, :] 123 | bone_length = np.linalg.norm(bones_mean, axis=1) * 1000. 124 | 125 | joint = self._skeleton_resize(joint_input, bone_length) 126 | return joint 127 | 128 | def _skeleton_resize(self, points_3d, bone_length): 129 | # resize the skeleton to the normal size (why we should do that?) 
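
The `_skeleton_resize` step being defined here keeps each estimated bone's direction but rescales its length to a reference value, then rebuilds the joints root-to-leaf along `kinematic_parents` (the repository version additionally divides by 1000, apparently because the reference lengths from `mean3D.mat` are stored in millimetres). A toy 3-joint sketch of the same idea, with made-up numbers:

```python
import numpy as np

# Toy kinematic chain: joint 0 is the root, joint i's parent is listed in parents.
parents = [0, 0, 1]
points = np.array([[0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.6],    # estimated bone 0->1 has length 0.6
                   [0.0, 0.4, 0.6]])   # estimated bone 1->2 has length 0.4
target_length = np.array([0.0, 0.5, 0.3])   # reference bone lengths

bone_vec = points - points[parents]                 # per-joint bone vectors
bone_len = np.linalg.norm(bone_vec, axis=1)
scale = np.concatenate(([0.0], target_length[1:] / bone_len[1:]))
bone_vec_resized = bone_vec * scale[:, None]        # keep direction, fix length

# Rebuild joints root-to-leaf, as in the loop of _skeleton_resize.
resized = points.copy()
for i in range(len(parents)):
    resized[i] = resized[parents[i]] + bone_vec_resized[i]
print(np.linalg.norm(resized - resized[parents], axis=1))  # -> [0.  0.5 0.3]
```
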
130 | estimated_bone_vec = points_3d - points_3d[self.kinematic_parents, :] 131 | estimated_bone_length = np.linalg.norm(estimated_bone_vec, axis=1) 132 | multi = bone_length[1:] / estimated_bone_length[1:] 133 | multi = np.concatenate(([0], multi)) 134 | multi = np.stack([multi] * 3, axis=1) 135 | resized_bones_vec = estimated_bone_vec * multi / 1000 136 | 137 | joints_rescaled = points_3d 138 | for i in range(joints_rescaled.shape[0]): 139 | joints_rescaled[i, :] = joints_rescaled[self.kinematic_parents[i], :] + resized_bones_vec[i, :] 140 | return joints_rescaled 141 | 142 | def render(self): 143 | mesh_frame = open3d.geometry.TriangleMesh.create_coordinate_frame(size=1) 144 | open3d.visualization.draw_geometries([self.skeleton_mesh, mesh_frame]) 145 | 146 | def skeleton_to_mesh(self, joint_color=(0.1, 0.1, 0.7), bone_color=(0.1, 0.9, 0.1)): 147 | final_mesh = open3d.geometry.TriangleMesh() 148 | for i in range(len(self.skeleton)): 149 | keypoint_mesh = get_sphere(position=self.skeleton[i], radius=0.03, color=joint_color) 150 | final_mesh = final_mesh + keypoint_mesh 151 | 152 | for line in self.lines: 153 | line_start_i = line[0] 154 | line_end_i = line[1] 155 | 156 | start_point = self.skeleton[line_start_i] 157 | end_point = self.skeleton[line_end_i] 158 | 159 | line_mesh = get_cylinder(start_point, end_point, radius=0.0075, color=bone_color) 160 | final_mesh += line_mesh 161 | self.skeleton_mesh = final_mesh 162 | return final_mesh 163 | 164 | def smooth(self, pose_sequence, sigma): 165 | """ 166 | gaussian smooth pose 167 | :param pose_sequence_2d: pose sequence, input is a list with every element is 15 * 2 body pose 168 | :param kernel_size: kernel size of guassian smooth 169 | :return: smoothed 2d pose 170 | """ 171 | pose_sequence = np.asarray(pose_sequence) 172 | pose_sequence_result = np.zeros_like(pose_sequence) 173 | keypoint_num = pose_sequence.shape[1] 174 | for i in range(keypoint_num): 175 | pose_sequence_i = pose_sequence[:, i, :] 176 | pose_sequence_filtered = gaussian_filter1d(pose_sequence_i, sigma, axis=0) 177 | pose_sequence_result[:, i, :] = pose_sequence_filtered 178 | return pose_sequence_result 179 | 180 | def get_max_preds(self, batch_heatmaps): 181 | ''' 182 | get predictions from score maps 183 | heatmaps: numpy.ndarray([batch_size, num_joints, height, width]) 184 | ''' 185 | assert isinstance(batch_heatmaps, np.ndarray), \ 186 | 'batch_heatmaps should be numpy.ndarray' 187 | assert batch_heatmaps.ndim == 4, 'batch_images should be 4-ndim' 188 | 189 | batch_size = batch_heatmaps.shape[0] 190 | num_joints = batch_heatmaps.shape[1] 191 | width = batch_heatmaps.shape[3] 192 | heatmaps_reshaped = batch_heatmaps.reshape((batch_size, num_joints, -1)) 193 | idx = np.argmax(heatmaps_reshaped, 2) 194 | maxvals = np.amax(heatmaps_reshaped, 2) 195 | 196 | maxvals = maxvals.reshape((batch_size, num_joints, 1)) 197 | idx = idx.reshape((batch_size, num_joints, 1)) 198 | 199 | preds = np.tile(idx, (1, 1, 2)).astype(np.float32) 200 | 201 | preds[:, :, 0] = (preds[:, :, 0]) % width 202 | preds[:, :, 1] = np.floor((preds[:, :, 1]) / width) 203 | 204 | pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2)) 205 | pred_mask = pred_mask.astype(np.float32) 206 | 207 | preds *= pred_mask 208 | return preds, maxvals 209 | 210 | 211 | if __name__ == '__main__': 212 | skeleton = Skeleton( 213 | calibration_path='/home/wangjian/Develop/egocentricvisualization/pose/fisheye/fisheye.calibration.json') 214 | data_path = r'/home/wangjian/Develop/egocentricvisualization/data_2' 215 | 
heatmap_dir = os.path.join(data_path, 'heatmaps') 216 | depth_dir = os.path.join(data_path, 'depths') 217 | out_dir = os.path.join(data_path, 'smooth_skeleton_mesh') 218 | if not os.path.isdir(out_dir): 219 | os.mkdir(out_dir) 220 | skeleon_list = [] 221 | out_path_list = [] 222 | for heatmap_name in tqdm(sorted(os.listdir(heatmap_dir))): 223 | heatmap_path = os.path.join(heatmap_dir, heatmap_name) 224 | mat_id = heatmap_name 225 | depth_path = os.path.join(depth_dir, mat_id) 226 | 227 | skeleton_array = skeleton.set_skeleton_from_file(heatmap_path, 228 | depth_path, 229 | # bone_length_file=r'/home/wangjian/Develop/egocentricvisualization/pose/fisheye/mean3D.mat', 230 | to_mesh=False) 231 | out_path = os.path.join(out_dir, mat_id + ".ply") 232 | skeleon_list.append(skeleton_array) 233 | out_path_list.append(out_path) 234 | 235 | smoothed_skeleton = skeleton.smooth(skeleon_list, sigma=1) 236 | print("saving to ply") 237 | for i in tqdm(range(len(smoothed_skeleton))): 238 | skeleton.skeleton = smoothed_skeleton[i] 239 | skeleton.skeleton_to_mesh() 240 | skeleton.save_skeleton_mesh(out_path_list[i]) 241 | 242 | # skeleton.set_skeleton_from_file(r'X:\Mo2Cap2Plus\static00\Datasets\Mo2Cap2\ego_system_test\sitting\heatmaps\img-04052020001910-937.mat', 243 | # r'X:\Mo2Cap2Plus\static00\Datasets\Mo2Cap2\ego_system_test\sitting\depths\img-04052020001910-937.mat', 244 | # # bone_length_file=r'F:\Develop\EgocentricSystemVisualization\pose\fisheye\mean3D.mat') 245 | # ) 246 | # 247 | # skeleton.render() 248 | # print(skeleton.skeleton) 249 | -------------------------------------------------------------------------------- /utils/volumetric.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import cv2 3 | import torch 4 | 5 | from utils import multiview 6 | from utils.pose_visualization_utils import get_cylinder, get_sphere 7 | import open3d 8 | 9 | 10 | class Point3D: 11 | def __init__(self, point, size=3, color=(0, 0, 255)): 12 | self.point = point 13 | self.size = size 14 | self.color = color 15 | 16 | def render(self, proj_matrix, canvas): 17 | point_2d = multiview.project_3d_points_to_image_plane_without_distortion( 18 | proj_matrix, np.array([self.point]) 19 | )[0] 20 | 21 | point_2d = tuple(map(int, point_2d)) 22 | cv2.circle(canvas, point_2d, self.size, self.color, self.size) 23 | 24 | return canvas 25 | 26 | def render_open3d(self): 27 | point_mesh = get_sphere(self.point, radius=0.02, color=self.color) 28 | return point_mesh 29 | 30 | 31 | 32 | class Line3D: 33 | def __init__(self, start_point, end_point, size=2, color=(0, 0, 255)): 34 | self.start_point, self.end_point = start_point, end_point 35 | self.size = size 36 | self.color = color 37 | 38 | def render(self, proj_matrix, canvas): 39 | start_point_2d, end_point_2d = multiview.project_3d_points_to_image_plane_without_distortion( 40 | proj_matrix, np.array([self.start_point, self.end_point]) 41 | ) 42 | 43 | start_point_2d = tuple(map(int, start_point_2d)) 44 | end_point_2d = tuple(map(int, end_point_2d)) 45 | 46 | cv2.line(canvas, start_point_2d, end_point_2d, self.color, self.size) 47 | 48 | return canvas 49 | 50 | def render_open3d(self): 51 | line_mesh = get_cylinder(self.start_point, self.end_point, radius=0.005) 52 | return line_mesh 53 | 54 | 55 | class Cuboid3D: 56 | def __init__(self, position, sides): 57 | self.position = position 58 | self.sides = sides 59 | 60 | def build(self): 61 | primitives = [] 62 | 63 | line_color = (255, 255, 0) 64 | 65 | start = self.position 
+ np.array([0, 0, 0]) 66 | primitives.append(Line3D(start, start + np.array([self.sides[0], 0, 0]), color=(255, 0, 0))) 67 | primitives.append(Line3D(start, start + np.array([0, self.sides[1], 0]), color=(0, 255, 0))) 68 | primitives.append(Line3D(start, start + np.array([0, 0, self.sides[2]]), color=(0, 0, 255))) 69 | 70 | start = self.position + np.array([self.sides[0], 0, self.sides[2]]) 71 | primitives.append(Line3D(start, start + np.array([-self.sides[0], 0, 0]), color=line_color)) 72 | primitives.append(Line3D(start, start + np.array([0, self.sides[1], 0]), color=line_color)) 73 | primitives.append(Line3D(start, start + np.array([0, 0, -self.sides[2]]), color=line_color)) 74 | 75 | start = self.position + np.array([self.sides[0], self.sides[1], 0]) 76 | primitives.append(Line3D(start, start + np.array([-self.sides[0], 0, 0]), color=line_color)) 77 | primitives.append(Line3D(start, start + np.array([0, -self.sides[1], 0]), color=line_color)) 78 | primitives.append(Line3D(start, start + np.array([0, 0, self.sides[2]]), color=line_color)) 79 | 80 | start = self.position + np.array([0, self.sides[1], self.sides[2]]) 81 | primitives.append(Line3D(start, start + np.array([self.sides[0], 0, 0]), color=line_color)) 82 | primitives.append(Line3D(start, start + np.array([0, -self.sides[1], 0]), color=line_color)) 83 | primitives.append(Line3D(start, start + np.array([0, 0, -self.sides[2]]), color=line_color)) 84 | 85 | return primitives 86 | 87 | def render(self, proj_matrix, canvas): 88 | # TODO: support rotation 89 | 90 | primitives = self.build() 91 | 92 | for primitive in primitives: 93 | canvas = primitive.render(proj_matrix, canvas) 94 | 95 | return canvas 96 | 97 | def render_open3d(self): 98 | primitives = self.build() 99 | 100 | mesh_canvas = [] 101 | 102 | for primitive in primitives: 103 | mesh = primitive.render_open3d() 104 | mesh_canvas.append(mesh) 105 | 106 | return mesh_canvas 107 | 108 | 109 | def get_rotation_matrix(axis, theta): 110 | """Returns the rotation matrix associated with counterclockwise rotation about 111 | the given axis by theta radians. 
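
`get_rotation_matrix` being defined here builds an axis-angle rotation, and `rotate_coord_volume` below applies it to every voxel centre of a coordinate volume. A small sanity check, assuming the package is importable from the repository root; the volume contents are toy values:

```python
import numpy as np
import torch

from utils.volumetric import get_rotation_matrix, rotate_coord_volume

# A quarter turn about the z axis should map the x axis onto the y axis.
R = get_rotation_matrix(axis=[0, 0, 1], theta=np.pi / 2)
print(np.round(R @ np.array([1.0, 0.0, 0.0]), 6))   # -> [0. 1. 0.]

# rotate_coord_volume applies the same rotation to every voxel centre of a
# coordinate volume shaped (x_len, y_len, z_len, 3).
coord_volume = torch.zeros(2, 2, 2, 3)
coord_volume[..., 0] = 1.0                           # all centres at (1, 0, 0)
rotated = rotate_coord_volume(coord_volume, theta=np.pi / 2, axis=[0, 0, 1])
print(rotated[0, 0, 0])                              # ~ tensor([0., 1., 0.]) up to numerical noise
```
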
112 | """ 113 | axis = np.asarray(axis) 114 | axis = axis / np.sqrt(np.dot(axis, axis)) 115 | a = np.cos(theta / 2.0) 116 | b, c, d = -axis * np.sin(theta / 2.0) 117 | aa, bb, cc, dd = a * a, b * b, c * c, d * d 118 | bc, ad, ac, ab, bd, cd = b * c, a * d, a * c, a * b, b * d, c * d 119 | return np.array([[aa + bb - cc - dd, 2 * (bc + ad), 2 * (bd - ac)], 120 | [2 * (bc - ad), aa + cc - bb - dd, 2 * (cd + ab)], 121 | [2 * (bd + ac), 2 * (cd - ab), aa + dd - bb - cc]]) 122 | 123 | 124 | def rotate_coord_volume(coord_volume, theta, axis): 125 | shape = coord_volume.shape 126 | device = coord_volume.device 127 | 128 | rot = get_rotation_matrix(axis, theta) 129 | rot = torch.from_numpy(rot).type(torch.float).to(device) 130 | 131 | coord_volume = coord_volume.view(-1, 3) 132 | coord_volume = rot.mm(coord_volume.t()).t() 133 | 134 | coord_volume = coord_volume.view(*shape) 135 | 136 | return coord_volume 137 | 138 | if __name__ == '__main__': 139 | cuboid3D = Cuboid3D(position=(-1, -1, 0), sides=(2, 2, 2)) 140 | 141 | mesh_list = cuboid3D.render_open3d() 142 | mesh_list.append(open3d.geometry.TriangleMesh.create_coordinate_frame()) 143 | 144 | open3d.visualization.draw_geometries(mesh_list) 145 | -------------------------------------------------------------------------------- /visualize.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1" 4 | from utils.skeleton import Skeleton 5 | import pickle 6 | import open3d 7 | 8 | from utils.depth2pointcloud import Depth2PointCloud 9 | 10 | 11 | def visualize(img_path, depth_path, pred_pose_path): 12 | skeleton = Skeleton(calibration_path='utils/fisheye/fisheye.calibration_05_08.json') 13 | 14 | with open(pred_pose_path, 'rb') as f: 15 | predicted_pose = pickle.load(f) 16 | 17 | predicted_pose_mesh = skeleton.joints_2_mesh(predicted_pose) 18 | 19 | get_point_cloud = Depth2PointCloud(visualization=False, 20 | camera_model='utils/fisheye/fisheye.calibration_05_08.json') 21 | 22 | scene = get_point_cloud.get_point_cloud_single_image(depth_path, img_path, output_path=None) 23 | 24 | open3d.visualization.draw_geometries([scene, predicted_pose_mesh]) 25 | 26 | def main(): 27 | parser = argparse.ArgumentParser() 28 | parser.add_argument("--img_path", type=str, required=True) 29 | parser.add_argument("--depth_path", type=str, required=True) 30 | parser.add_argument("--pose_path", type=str, required=True) 31 | 32 | args = parser.parse_args() 33 | img_path = args.img_path 34 | depth_path = args.depth_path 35 | pose_path = args.pose_path 36 | 37 | visualize(img_path, depth_path, pose_path) 38 | 39 | 40 | if __name__ == '__main__': 41 | main() --------------------------------------------------------------------------------
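
Finally, for reference, the same visualization can be driven over every prediction written by `demo.py`. The loop below is an illustrative sketch rather than a script shipped with the repository; it reuses the calls from `visualize.py` above and assumes the demo directory layout shown in the file tree (`img_*.jpg`, `img_*.jpg.exr`, `img_*.jpg.pkl`).

```python
import os
os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1"   # set before OpenCV is imported, as in visualize.py

import pickle

import open3d

from utils.skeleton import Skeleton
from utils.depth2pointcloud import Depth2PointCloud

calib = 'utils/fisheye/fisheye.calibration_05_08.json'
skeleton = Skeleton(calibration_path=calib)
point_cloud_builder = Depth2PointCloud(visualization=False, camera_model=calib)

img_dir, depth_dir, out_dir = 'data/demo/imgs', 'data/demo/depths', 'data/demo/out'
for pkl_name in sorted(os.listdir(out_dir)):
    if not pkl_name.endswith('.pkl'):
        continue
    img_name = pkl_name[:-len('.pkl')]                  # e.g. img_001000.jpg
    with open(os.path.join(out_dir, pkl_name), 'rb') as f:
        predicted_pose = pickle.load(f)                 # predicted joints in the egocentric camera frame
    pose_mesh = skeleton.joints_2_mesh(predicted_pose)
    scene = point_cloud_builder.get_point_cloud_single_image(
        os.path.join(depth_dir, img_name + '.exr'),     # e.g. img_001000.jpg.exr
        os.path.join(img_dir, img_name),
        output_path=None)
    open3d.visualization.draw_geometries([scene, pose_mesh])
```
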