├── .gitignore ├── LICENSE ├── README.md ├── dataset └── README.txt ├── models └── README.txt ├── render_human.py ├── render_pose.py ├── render_scene.py └── requirements.txt /.gitignore: -------------------------------------------------------------------------------- 1 | dataset/* 2 | !dataset/README.txt 3 | 4 | models/* 5 | !models/README.txt 6 | 7 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2024 Junggy 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # SCRREAM 2 | 3 | This is the official repository for SCRREAM (**SC**an, **R**egister, **RE**nder **A**nd **M**ap) benchmark dataset (accepted at **NeurIPS 2024**). We provide code example and intruction for visualizing our dataset as well as dataset download link. 4 | For the further info, please check our arxiv (https://arxiv.org/pdf/2410.22715) and project page (https://sites.google.com/view/scrream/about) 5 | 6 | ## Link to Download Dataset 7 | **Indoor Reconstruction and SLAM dataset & Object Removal and Scene Editing dataset :** 8 | 9 | https://drive.google.com/file/d/1Lu5VdlLn5NNoPSh2Tp90M6NF2Y1us677/view?usp=sharing (scene01-05, 59 Gb) 10 | https://drive.google.com/file/d/1V8HRX4-12jv-ZNW6AM_fyHoNiDstBV8q/view?usp=sharing (scene06-11, 101 Gb) 11 | 12 | In case of slow internet, please download with smaller zip files 13 | 14 | https://drive.google.com/file/d/11QIO2ZT2DnFo8V9OMiBzMjRAwjcUJ5qC/view?usp=sharing (scene01)\ 15 | https://drive.google.com/file/d/1jvIucH0PI9hkuYy3cRwYR7_eFpt7Nhpi/view?usp=sharing (scene02)\ 16 | https://drive.google.com/file/d/1mL2NajhaUxXvhvjGd7WPe7KKe037TshJ/view?usp=sharing (scene03)\ 17 | https://drive.google.com/file/d/1Tj6LckAubUd-OI8SwCEwULgwjRatdZbD/view?usp=sharing (scene04)\ 18 | https://drive.google.com/file/d/1E7Ahm4ERde9gXsUb3pXVJntOFWUqWSil/view?usp=sharing (scene05)\ 19 | https://drive.google.com/file/d/16kB_pjzb5V-VS8Ma-EVwtJYH0jxt1ptb/view?usp=sharing (scene06)\ 20 | https://drive.google.com/file/d/1-f-RkDdjVKwJdZeGLHGPwpz--cJNd11M/view?usp=sharing (scene07)\ 21 | https://drive.google.com/file/d/1VPlD4zvALDeDfPtXBigubgAJxJlueZBA/view?usp=sharing (scene08)\ 22 | https://drive.google.com/file/d/1DUBOMxurmjWSU2R5MUs6iXJMFyKTBhIA/view?usp=sharing (scene09)\ 23 | https://drive.google.com/file/d/1h2v4hrY3IxsR49MV65ZOxXI8xMMHGL1Y/view?usp=sharing (scene10)\ 24 | https://drive.google.com/file/d/1fwjpmXQ29sO_wk1SOmfdhlGq7Kyv8crN/view?usp=sharing (scene11) 25 | 26 | 27 | **Human Reconstruction Dataset :** 28 | 29 | https://drive.google.com/file/d/1BHHb5ibNsYsm00FXUrwLRkvC630sYq3U/view?usp=sharing (scene01-02, 7.7 Gb) 30 | 31 | **Pose Estimation Dataset :** 32 | 33 | https://drive.google.com/file/d/1kcy8DCu6L2GtU2vK22w9FhhlLP6ljq9b/view?usp=sharing (scene01-02, 7.4 Gb) 34 | 35 | Once the dataset is downloaded, unzip in the ```dataset``` folder. 36 | It should look like this 37 | 38 | ``` 39 | dataset\ 40 | human_scene01\.. 41 | human_scene02\.. 42 | pose_meshes_canonical\.. 43 | pose_scene01\.. 44 | pose_scene02\.. 45 | scene01\.. 46 | ... 47 | scene11\.. 48 | README.txt 49 | ``` 50 | 51 | 52 | ## Instruction for Visualization 53 | ### Requirements 54 | Install requirements with pip with this command 55 | ``` 56 | pip install -r requirements.txt 57 | ``` 58 | We tested with python version 3.9 on windows 10 machine. 59 | 60 | ### Visualizing Indoor Reconstruction and SLAM Dataset & Object Removal and Scene Editing Dataset 61 | 62 | To visualize the Indoor Reconstruction and SLAM Dataset & Object Removal and Scene Editing Dataset, 63 | run ```render_scene.py``` script with argument ```{dataset_dir} {scene} {traj} {frame}```, such as 64 | ``` 65 | python render_scene.py {dataset_dir} {scene} {traj} {frame} 66 | ``` 67 | If ```{frame}``` is set to -1, the script with go thorugh the entire frame as a video sequence. 68 | If ```{frame}``` is set to positive integer, the script will display the image with the given frame number 69 | 70 | ``` 71 | python render_scene.py dataset scene01 full_00 -1 \\ for visualizing entire sequnce in scene01_full_00 72 | python render_scene.py dataset scene01 full_00 100 \\ for visualizing 100th frame in scene01_full_00 73 | ``` 74 | 75 | To visualize the reduced scene for object removal or scene editing experiments, use ```{traj}``` that contains reduced. scene01, scene02, scene04, scene05, scene06, scene07, scene08, scene09 contain reduced scene. 76 | For example, 77 | ``` 78 | render_scene.py dataset scene01 reduced_00 -1 \\ for visualizing enture sequnce in scene01_full_00 79 | ``` 80 | will play the sequence from scene01 with reduced objects. 81 | 82 | The visualization is formatted as 2x3 image layout with the given format: 83 | ``` 84 | ㅣ (Ground Truth Depth) | (D435 Depth) | (ToF Depth) | 85 | ㅏㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅓ 86 | ㅣ (RGB with Semantic Mask) | (D435 Error) | (ToF Error) | 87 | ``` 88 | 89 | ### Visualizing Human Reconstruction Dataset 90 | (Pre-requisite) Download SMPL model into . As our dataset uses smpl model format, only downloading smpl format will be enough (download .pkl file if possible. Our dataset assumes .pkl in the script). 91 | The folder should look like this : 92 | ``` 93 | models\ 94 | smpl\.. 95 | README.txt 96 | ``` 97 | 98 | To visualize the human reconstruction dataset, run ```render_human.py``` script with argument ```{dataset_dir} {scene} {frame} {view}```, such as 99 | ``` 100 | python render_human.py {dataset_dir} {scene} {frame} {view} 101 | ``` 102 | Note that our human dataset contains 4 multiview images per each frame (or human posture). If ```view``` is set to -1, the script will plot 4 consecutive views on given frame. 103 | ``` 104 | python human_scene.py dataset human_scene01 0 0 \\ for visualizing first view of frame 0 in the human_scene01 105 | python human_scene.py dataset human_scene01 0 -1 \\ for visualizing all 4 views of frame 0 in the human_scene01 106 | ``` 107 | The visualization is formatted as 1x3 image layout with the given format: 108 | ``` 109 | | (RGB with Semantic Mask) | (RGB with Scanned Human Mesh) | (RGB with SMPL Human Mesh) | 110 | ``` 111 | 112 | 113 | 114 | 115 | ### Visualizing 6D Pose Estimation Dataset 116 | To visualize the human reconstruction dataset, run ```render_pose.py``` script with argument ```{dataset_dir} {scene} {frame}```, such as 117 | ``` 118 | python render_pose.py {dataset_dir} {scene} {frame} 119 | ``` 120 | If ```{frame}``` is set to -1, the script with go thorugh the entire frame as a video sequence. 121 | If ```{frame}``` is set to positive integer, the script will display the image with the given frame number 122 | ``` 123 | python pose_scene.py dataset pose_scene01 -1 \\ for visualizing entire sequnce in pose_scene01 124 | python pose_scene.py dataset pose_scene01 100 \\ for visualizing 100th frame in pose_scene01 125 | ``` 126 | The visualization is formatted as 1x1 image layout with the given format: 127 | ``` 128 | | (RGB with Pose 3D Bounding Box and Mask) | 129 | ``` 130 | ## Citation 131 | ``` 132 | @misc{jung2024scrreamscanregister, 133 | title={SCRREAM : SCan, Register, REnder And Map:A Framework for Annotating Accurate and Dense 3D Indoor Scenes with a Benchmark}, 134 | author={HyunJun Jung and Weihang Li and Shun-Cheng Wu and William Bittner and Nikolas Brasch and Jifei Song and Eduardo Pérez-Pellitero and Zhensong Zhang and Arthur Moreau and Nassir Navab and Benjamin Busam}, 135 | year={2024}, 136 | eprint={2410.22715}, 137 | archivePrefix={arXiv}, 138 | primaryClass={cs.CV}, 139 | url={https://arxiv.org/abs/2410.22715}, 140 | } 141 | ``` 142 | 143 | 144 | -------------------------------------------------------------------------------- /dataset/README.txt: -------------------------------------------------------------------------------- 1 | download and upzip dataset here 2 | 3 | after successfully unzipping, this folder should contain 4 | 5 | human_scene01 6 | human_scene02 7 | pose_meshes_canonical 8 | pose_scene01 9 | pose_scene02 10 | scene01 11 | ... 12 | scene11 13 | README.txt -------------------------------------------------------------------------------- /models/README.txt: -------------------------------------------------------------------------------- 1 | download SMPL in pkl format. 2 | 3 | after successfully downloading, this folder should contain 4 | 5 | smpl 6 | README.txt -------------------------------------------------------------------------------- /render_human.py: -------------------------------------------------------------------------------- 1 | import pyrender, trimesh, os, glob, cv2, argparse, smplx, pickle 2 | import numpy as np 3 | import matplotlib.pyplot as plt 4 | 5 | cmap = plt.get_cmap("gist_rainbow") 6 | 7 | if os.name == 'nt': 8 | separator = "\\" 9 | else: 10 | separator = '/' 11 | 12 | scene = pyrender.Scene(bg_color=[0, 0, 0],ambient_light=[0.5,0.5,0.5]) 13 | 14 | parser = argparse.ArgumentParser(description="Undistort images") 15 | parser.add_argument("dataset_dir") 16 | parser.add_argument("scene_name") 17 | parser.add_argument("idx") 18 | parser.add_argument("view_idx") 19 | 20 | args = parser.parse_args() 21 | 22 | dataset_dir = args.dataset_dir 23 | scene_name = args.scene_name 24 | frame_idx = int(args.idx) 25 | view_idx = int(args.view_idx) 26 | 27 | assert frame_idx >= 0, "frame_idx has to be greater than 0" 28 | assert view_idx >= -1 and view_idx < 4, "view_idx has to be a value between -1 and 3" 29 | 30 | base = os.path.join(dataset_dir,scene_name) 31 | mesh_base = os.path.join(dataset_dir,scene_name,"meshes") 32 | mesh_names = glob.glob(os.path.join(mesh_base,"*.obj")) 33 | 34 | meshes = {} 35 | 36 | # load the human as a scanned mesh 37 | human_mesh_name_clean = "human-{0:02d}.obj".format(frame_idx) 38 | human_mesh_name = os.path.join(dataset_dir,scene_name,"meshes",human_mesh_name_clean) 39 | print("loading scanned human mesh") 40 | 41 | trimesh_obj = trimesh.load(human_mesh_name) 42 | human_mesh = pyrender.Mesh.from_trimesh(trimesh_obj) 43 | meshes["scanned_mesh"] = human_mesh 44 | 45 | # load the human as a SMPL mesh 46 | import smplx, torch 47 | 48 | print("loading smpl mesh") 49 | annotation_folder = os.path.join(dataset_dir,scene_name,"human_annotation") 50 | annotation_file = os.path.join(annotation_folder,"human-{0:02d}_smpl.pkl".format(frame_idx)) 51 | 52 | with open(annotation_file, 'rb') as f: 53 | x = pickle.load(f) 54 | 55 | smpl_model = smplx.create("models",model_type="smpl", gender="male",num_beta=300,ext="npz") 56 | faces = smpl_model.faces 57 | 58 | output = smpl_model(return_verts=True, 59 | body_pose = torch.tensor(x["pose"][3:]).float().unsqueeze(0), 60 | betas = torch.tensor(x["betas"])[:10].float().unsqueeze(0), 61 | global_orient = torch.tensor(x["pose"][:3]).float().unsqueeze(0), 62 | transl = torch.tensor(x["trans"]).float().unsqueeze(0), 63 | ) 64 | vertices = output.vertices 65 | smpl_mesh = pyrender.Mesh.from_trimesh(trimesh.Trimesh(vertices=vertices.detach().cpu().numpy().squeeze(), 66 | faces=faces)) 67 | meshes["smpl_mesh"] = smpl_mesh 68 | 69 | intrinsic = np.loadtxt(os.path.join(base, "intrinsics.txt")) 70 | fx,fy,px,py = intrinsic[0,0],intrinsic[1,1],intrinsic[0,2],intrinsic[1,2] 71 | camera = pyrender.IntrinsicsCamera(0,0,0,0) 72 | camera.fx = fx 73 | camera.fy = fy 74 | camera.cx = px 75 | camera.cy = py 76 | camera_node = scene.add(camera) 77 | 78 | light = pyrender.SpotLight(color=np.ones(3), intensity=10.0, 79 | innerConeAngle=np.pi/16.0, 80 | outerConeAngle=np.pi/3.0) 81 | light_node = scene.add(light) 82 | 83 | 84 | images = glob.glob(os.path.join(base, "rgb", "*.png")) 85 | poses = glob.glob(os.path.join(base, "camera_pose", "*.txt")) 86 | instances = glob.glob(os.path.join(base, "instance", "*.png")) 87 | 88 | assert len(set([len(images),len(poses),len(instances)])) == 1 89 | 90 | h,w,_ = cv2.imread(images[0]).shape 91 | 92 | r = pyrender.OffscreenRenderer(w, h) 93 | 94 | error_cmap = plt.get_cmap("seismic") 95 | depth_cmap = plt.get_cmap("inferno") 96 | instance_cmap = plt.get_cmap("gist_rainbow") 97 | 98 | images.sort() 99 | poses.sort() 100 | instances.sort() 101 | 102 | cv2pyrender = np.array([[1,1,1,1], 103 | [-1,-1,-1,-1], 104 | [-1,-1,-1,-1], 105 | [1,1,1,1]]) 106 | 107 | if view_idx == -1: 108 | start = 0 109 | end = 4 110 | else: 111 | start = view_idx 112 | end = view_idx+1 113 | 114 | for idx_pre in range(start,end): 115 | 116 | idx = frame_idx * 4 + idx_pre 117 | 118 | each_rgb = cv2.imread(images[idx],-1) 119 | each_instance = cv2.imread(instances[idx],-1) 120 | each_pose = np.loadtxt(poses[idx]) 121 | 122 | scene.set_pose(camera_node, cv2pyrender.T * each_pose) 123 | scene.set_pose(light_node, cv2pyrender.T * each_pose) 124 | 125 | added_node = scene.add(meshes["scanned_mesh"]) 126 | color_scanned, _ = r.render(scene) 127 | scene.remove_node(added_node) 128 | 129 | added_node = scene.add(meshes["smpl_mesh"]) 130 | color_smpl, _ = r.render(scene) 131 | scene.remove_node(added_node) 132 | 133 | each_instance_cmap = (instance_cmap(each_instance)[:, :, [2, 1, 0]] * 255).astype(np.uint8) 134 | rgb_augmented = cv2.addWeighted(each_instance_cmap, 0.3, each_rgb, 0.5, 1).astype(np.float32) / 255 135 | rgb_augmented_scanned = cv2.addWeighted(color_scanned[:,:,[2,1,0]], 0.7, each_rgb, 0.3, 1).astype(np.float32) / 255 136 | rgb_augmented_smpl = cv2.addWeighted(color_smpl[:,:,[2,1,0]], 0.7, each_rgb, 0.3, 1).astype(np.float32) / 255 137 | 138 | plot = np.hstack([rgb_augmented,rgb_augmented_scanned,rgb_augmented_smpl]) 139 | 140 | if view_idx == -1: 141 | 142 | h_,w_,_ = plot.shape 143 | plot_reshape = cv2.resize(plot,(w_//2,h_//2)) 144 | 145 | cv2.imshow("human_visualization",plot_reshape) 146 | cv2.waitKey(500) 147 | 148 | else: 149 | plt.figure() 150 | plt.imshow(plot[:,:,[2,1,0]]) 151 | plt.show() -------------------------------------------------------------------------------- /render_pose.py: -------------------------------------------------------------------------------- 1 | import pyrender, trimesh, argparse 2 | import numpy as np 3 | import matplotlib.pyplot as plt 4 | import cv2, os, glob 5 | import _pickle as cPickle 6 | 7 | cls_id_to_name = {1: "box", 8 | 2: "bottle", 9 | 3: "can", 10 | 4: "cup", 11 | 5: "remote", 12 | 6: "teapot", 13 | 7: "cutlery", 14 | 8: "glass", 15 | 9: "shoe", 16 | 10: "tube"} 17 | 18 | name_to_cls_id = {v:k for k,v in cls_id_to_name.items()} 19 | 20 | housecat2pyrender_conversion = np.array([[1, 1, 1, 1], 21 | [-1, -1, -1, -1], 22 | [-1, -1, -1, -1], 23 | [1, 1, 1, 1]]) 24 | 25 | def main(): 26 | 27 | parser = argparse.ArgumentParser(description="render_obj_with_pose") 28 | 29 | parser.add_argument("dataset_dir") 30 | parser.add_argument("scene_name") 31 | parser.add_argument("idx") 32 | 33 | args = parser.parse_args() 34 | 35 | dataset_dir = args.dataset_dir 36 | folder_name = os.path.join(dataset_dir,args.scene_name) 37 | 38 | render_idx = int(args.idx) 39 | 40 | object_list = glob.glob(os.path.join(folder_name,"obj_pose")+"/*.txt") 41 | 42 | 43 | with open(os.path.join(folder_name,"meta.txt")) as f: 44 | object_list= [each_line.split(" ")[1] for each_line in f.readlines() if each_line.split(" ")[0] in cls_id_to_name.values()] 45 | 46 | objects = {} 47 | for each_object in object_list: 48 | 49 | if os.name == 'nt': 50 | object_name_full = each_object.split("\\")[-1] 51 | else: 52 | object_name_full = each_object.split("/")[-1] 53 | each_class, each_name = object_name_full.split(".")[0].split("-") 54 | objects[(each_class,each_name)] = {} 55 | 56 | # setup pyrender scene with camera 57 | scene = pyrender.Scene(bg_color=[0, 0, 0], ambient_light=[0.7,0.7,0.7]) 58 | k = np.loadtxt(os.path.join(folder_name, "intrinsics.txt")) 59 | 60 | 61 | 62 | camera = pyrender.IntrinsicsCamera(0, 0, 0, 0) 63 | camera.fx = k[0, 0] 64 | camera.fy = k[1, 1] 65 | camera.cx = k[0, 2] 66 | camera.cy = k[1, 2] 67 | scene.add(camera) 68 | 69 | # mesh 70 | obj_folder = os.path.join(dataset_dir,"pose_meshes_canonical") 71 | bbox_scales = [] 72 | for each_key in objects.keys(): 73 | print("loading mesh (category, instance) :",each_key) 74 | 75 | obj_class, obj_name = each_key 76 | 77 | if obj_class in name_to_cls_id.keys(): 78 | 79 | 80 | obj_fname = os.path.join(obj_folder,obj_class + "-" + obj_name + '.obj') 81 | 82 | trimesh_obj = trimesh.load(obj_fname) 83 | bbox_scales.append(2*trimesh_obj.vertices.max(0)) 84 | mesh = pyrender.Mesh.from_trimesh(trimesh_obj) 85 | objects[each_key]["mesh"] = scene.add(mesh) 86 | 87 | bbox_scales = np.stack(bbox_scales,0) 88 | 89 | n_images = len(glob.glob("{0}/{1}/*.png".format(folder_name,"rgb"))) 90 | print("n_images :", n_images) 91 | 92 | with open(os.path.join(folder_name, "meta.txt"), 'r') as f: 93 | instance_labels = [each_line.strip().split(" ") for each_line in f.readlines()] 94 | 95 | if render_idx==-1: 96 | _start,_end = 0,n_images 97 | else: 98 | _start,_end = render_idx, render_idx+1 99 | 100 | for idx in range(_start,_end): 101 | 102 | img = plt.imread("{0}/rgb/{1:06d}.png".format(folder_name, idx))[:,:,[2,1,0]] 103 | 104 | if idx == _start: 105 | h,w,_ = img.shape 106 | r = pyrender.OffscreenRenderer(w, h) 107 | 108 | pkl_name = "{0}/labels/{1:06d}_label.pkl".format(folder_name, idx) 109 | with open(pkl_name, 'rb') as f: 110 | label = cPickle.load(f) 111 | 112 | RTs = [] 113 | 114 | instance_count = 0 115 | for label_idx,each_line in enumerate(instance_labels): 116 | 117 | cls,name,instance = each_line 118 | 119 | if cls in name_to_cls_id.keys(): 120 | cls = name_to_cls_id[cls] 121 | 122 | _,name = name.split("-") 123 | 124 | obj_to_cam = np.identity(4) 125 | 126 | rotation = label['rotations'][instance_count] 127 | translation = label['translations'][instance_count] 128 | obj_to_cam[:3,:3] = rotation 129 | obj_to_cam[:3,3] = translation 130 | RTs.append(np.array(obj_to_cam)) 131 | obj_to_cam *= housecat2pyrender_conversion 132 | scene.set_pose(objects[(cls_id_to_name[int(cls)],name)]["mesh"], obj_to_cam) 133 | 134 | instance_count += 1 135 | 136 | RTs = np.stack(RTs,0) 137 | color_i, depth_i = r.render(scene) 138 | 139 | img = (img).astype(np.float32) 140 | mask = np.stack([np.zeros_like(depth_i),(depth_i != 0),np.zeros_like(depth_i)],-1).astype(np.float32) 141 | 142 | _,_,c = img.shape 143 | 144 | overlay = cv2.addWeighted(mask, 0.2, img, 0.8, 0) 145 | overlay = (overlay * 255).astype(np.uint8) 146 | overlay = draw_detections(overlay, k, RTs, bbox_scales) 147 | 148 | 149 | if render_idx == -1: 150 | cv2.imshow("vis",overlay) 151 | cv2.waitKey(1) 152 | else: 153 | plt.figure() 154 | plt.imshow(overlay[:,:,[2,1,0]]) 155 | plt.show() 156 | 157 | 158 | 159 | """ 160 | Functions for drawing 3d BBox (draw_detections, transform_coordinates_3d, get_3d_bbox, calculate_2d_projections, draw) 161 | are adapted from Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation Detection and 162 | evaluation code (https://github.com/hughw19/NOCS_CVPR2019) 163 | """ 164 | def draw_detections(image, intrinsics, gt_RTs, gt_scales): 165 | 166 | draw_image_bbox = image.copy() 167 | 168 | if gt_RTs is not None: 169 | for ind, RT in enumerate(gt_RTs): 170 | 171 | xyz_axis = 0.3 * np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]).transpose() 172 | transformed_axes = transform_coordinates_3d(xyz_axis, RT) 173 | projected_axes = calculate_2d_projections(transformed_axes, intrinsics) 174 | 175 | bbox_3d = get_3d_bbox(gt_scales[ind], 0) 176 | transformed_bbox_3d = transform_coordinates_3d(bbox_3d, RT) 177 | 178 | projected_bbox = calculate_2d_projections(transformed_bbox_3d, intrinsics) 179 | draw_image_bbox = draw(draw_image_bbox, projected_bbox, projected_axes, (0, 255, 0)) 180 | 181 | return draw_image_bbox 182 | 183 | def transform_coordinates_3d(coordinates, RT): 184 | """ 185 | Input: 186 | coordinates: [3, N] 187 | RT: [4, 4] 188 | Return 189 | new_coordinates: [3, N] 190 | 191 | """ 192 | assert coordinates.shape[0] == 3 193 | coordinates = np.vstack([coordinates, np.ones((1, coordinates.shape[1]), dtype=np.float32)]) 194 | new_coordinates = RT @ coordinates 195 | new_coordinates = new_coordinates[:3, :] / new_coordinates[3, :] 196 | return new_coordinates 197 | 198 | def get_3d_bbox(scale, shift=0): 199 | """ 200 | Input: 201 | scale: [3] or scalar 202 | shift: [3] or scalar 203 | Return 204 | bbox_3d: [3, N] 205 | """ 206 | if hasattr(scale, "__iter__"): 207 | bbox_3d = np.array([[scale[0] / 2, +scale[1] / 2, scale[2] / 2], 208 | [scale[0] / 2, +scale[1] / 2, -scale[2] / 2], 209 | [-scale[0] / 2, +scale[1] / 2, scale[2] / 2], 210 | [-scale[0] / 2, +scale[1] / 2, -scale[2] / 2], 211 | [+scale[0] / 2, -scale[1] / 2, scale[2] / 2], 212 | [+scale[0] / 2, -scale[1] / 2, -scale[2] / 2], 213 | [-scale[0] / 2, -scale[1] / 2, scale[2] / 2], 214 | [-scale[0] / 2, -scale[1] / 2, -scale[2] / 2]]) + shift 215 | else: 216 | bbox_3d = np.array([[scale / 2, +scale / 2, scale / 2], 217 | [scale / 2, +scale / 2, -scale / 2], 218 | [-scale / 2, +scale / 2, scale / 2], 219 | [-scale / 2, +scale / 2, -scale / 2], 220 | [+scale / 2, -scale / 2, scale / 2], 221 | [+scale / 2, -scale / 2, -scale / 2], 222 | [-scale / 2, -scale / 2, scale / 2], 223 | [-scale / 2, -scale / 2, -scale / 2]]) + shift 224 | 225 | bbox_3d = bbox_3d.transpose() 226 | return bbox_3d 227 | 228 | def calculate_2d_projections(coordinates_3d, intrinsics): 229 | """ 230 | Input: 231 | coordinates: [3, N] 232 | intrinsics: [3, 3] 233 | Return 234 | projected_coordinates: [N, 2] 235 | """ 236 | projected_coordinates = intrinsics @ coordinates_3d 237 | projected_coordinates = projected_coordinates[:2, :] / projected_coordinates[2, :] 238 | projected_coordinates = projected_coordinates.transpose() 239 | projected_coordinates = np.array(projected_coordinates, dtype=np.int32) 240 | 241 | return projected_coordinates 242 | 243 | def draw(img, imgpts, axes, color): 244 | imgpts = np.int32(imgpts).reshape(-1, 2) 245 | 246 | # draw ground layer in darker color 247 | color_ground = (int(color[0] * 0.3), int(color[1] * 0.3), int(color[2] * 0.3)) 248 | for i, j in zip([4, 5, 6, 7], [5, 7, 4, 6]): 249 | img = cv2.line(img, tuple(imgpts[i]), tuple(imgpts[j]), color_ground, 3) 250 | 251 | # draw pillars in blue color 252 | color_pillar = (int(color[0] * 0.6), int(color[1] * 0.6), int(color[2] * 0.6)) 253 | for i, j in zip(range(4), range(4, 8)): 254 | img = cv2.line(img, tuple(imgpts[i]), tuple(imgpts[j]), color_pillar, 3) 255 | 256 | # finally, draw top layer in color 257 | for i, j in zip([0, 1, 2, 3], [1, 3, 0, 2]): 258 | img = cv2.line(img, tuple(imgpts[i]), tuple(imgpts[j]), color, 3) 259 | 260 | return img 261 | 262 | 263 | 264 | if __name__ == "__main__": 265 | main() 266 | # ^ 267 | # cv pose : / z 268 | # ---> x 269 | # | 270 | # v y 271 | 272 | # ^ y 273 | # pyrender pose : | 274 | # ---> x 275 | # / 276 | # v z 277 | -------------------------------------------------------------------------------- /render_scene.py: -------------------------------------------------------------------------------- 1 | import pyrender, trimesh, os, glob, cv2, argparse 2 | import numpy as np 3 | import matplotlib.pyplot as plt 4 | 5 | cmap = plt.get_cmap("gist_rainbow") 6 | 7 | if os.name == 'nt': 8 | separator = "\\" 9 | else: 10 | separator = '/' 11 | 12 | scene = pyrender.Scene(bg_color=[0, 0, 0], ambient_light=[1.0, 1.0, 1.0]) 13 | 14 | parser = argparse.ArgumentParser(description="Undistort images") 15 | parser.add_argument("dataset_dir") 16 | parser.add_argument("scene_name") 17 | parser.add_argument("traj_name") 18 | parser.add_argument("idx") 19 | 20 | args = parser.parse_args() 21 | 22 | dataset_dir = args.dataset_dir 23 | scene_name = args.scene_name 24 | traj_name = args.traj_name 25 | frame_idx = int(args.idx) 26 | 27 | base = os.path.join(dataset_dir,scene_name,scene_name+"_"+traj_name) 28 | mesh_base = os.path.join(dataset_dir,scene_name,"meshes") 29 | mesh_names = glob.glob(os.path.join(mesh_base,"*.obj")) 30 | 31 | with open(os.path.join(base,"meta.txt"),"r") as f: 32 | meshes_in_the_scene = [each_line.strip().split(" ")[1] for each_line in f.readlines()] 33 | 34 | meshes = {} 35 | 36 | for each_mesh_name in mesh_names: 37 | 38 | each_mesh_name_clean = each_mesh_name.split(separator)[-1].split(".")[0] 39 | 40 | if not (each_mesh_name_clean in meshes_in_the_scene): 41 | print("skipping",each_mesh_name_clean) 42 | continue 43 | else: 44 | print("loading",each_mesh_name_clean) 45 | 46 | trimesh_obj = trimesh.load(each_mesh_name) 47 | trimesh_obj.visual = trimesh.visual.ColorVisuals() 48 | mesh = pyrender.Mesh.from_trimesh(trimesh_obj) 49 | meshes[each_mesh_name] = mesh 50 | scene.add(mesh) 51 | 52 | intrinsic = np.loadtxt(os.path.join(base, "intrinsics.txt")) 53 | fx,fy,px,py = intrinsic[0,0],intrinsic[1,1],intrinsic[0,2],intrinsic[1,2] 54 | camera = pyrender.IntrinsicsCamera(0,0,0,0) 55 | camera.fx = fx 56 | camera.fy = fy 57 | camera.cx = px 58 | camera.cy = py 59 | camera_node = scene.add(camera) 60 | 61 | 62 | images = glob.glob(os.path.join(base, "rgb", "*.png")) 63 | poses = glob.glob(os.path.join(base, "camera_pose", "*.txt")) 64 | instances = glob.glob(os.path.join(base, "instance", "*.png")) 65 | depth_d435 = glob.glob(os.path.join(base, "depth_d435", "*.png")) 66 | depth_tof = glob.glob(os.path.join(base, "depth_tof", "*.png")) 67 | 68 | assert len(set([len(images),len(poses),len(instances),len(depth_d435),len(depth_tof)])) == 1 69 | 70 | h,w,_ = cv2.imread(images[0]).shape 71 | 72 | r = pyrender.OffscreenRenderer(w, h) 73 | 74 | error_cmap = plt.get_cmap("seismic") 75 | depth_cmap = plt.get_cmap("inferno") 76 | instance_cmap = plt.get_cmap("gist_rainbow") 77 | 78 | images.sort() 79 | poses.sort() 80 | instances.sort() 81 | depth_d435.sort() 82 | depth_tof.sort() 83 | 84 | cv2pyrender = np.array([[1,1,1,1], 85 | [-1,-1,-1,-1], 86 | [-1,-1,-1,-1], 87 | [1,1,1,1]]) 88 | 89 | if frame_idx == -1: 90 | start = 0 91 | end = len(images) 92 | else: 93 | start = frame_idx 94 | end = frame_idx+1 95 | 96 | for idx in range(start,end): 97 | 98 | each_rgb = cv2.imread(images[idx],-1) 99 | each_instance = cv2.imread(instances[idx],-1) 100 | each_depth_d435 = cv2.imread(depth_d435[idx],-1) / 1000 101 | each_depth_tof = cv2.imread(depth_tof[idx],-1) / 1000 102 | each_pose = np.loadtxt(poses[idx]) 103 | 104 | scene.set_pose(camera_node, cv2pyrender.T * each_pose) 105 | 106 | color, depth = r.render(scene) 107 | 108 | d435_error = (depth - each_depth_d435) * (each_depth_d435 != 0) 109 | tof_error = (depth - each_depth_tof) * (each_depth_tof != 0) 110 | 111 | dmax = 4 112 | dmin = 0 113 | error_max = 1 114 | error_min = -error_max 115 | 116 | each_instance_cmap = (instance_cmap(each_instance)[:, :, [2, 1, 0]] * 255).astype(np.uint8) 117 | rgb_augmented = cv2.addWeighted(each_instance_cmap, 0.3, each_rgb, 0.5, 1).astype(np.float32) / 255 118 | 119 | if frame_idx == -1: 120 | 121 | depth_clipped = ((depth.clip(dmin,dmax) - dmin) / (dmax-dmin) * 255).astype(np.uint8) 122 | d435_clipped = ((each_depth_d435.clip(dmin,dmax) - dmin) / (dmax-dmin) * 255).astype(np.uint8) 123 | tof_clipped = ((each_depth_tof.clip(dmin, dmax) - dmin) / (dmax - dmin) * 255).astype(np.uint8) 124 | 125 | depth_8bit = depth_cmap(depth_clipped)[:,:,[2,1,0]] 126 | d435_8bit = depth_cmap(d435_clipped)[:,:,[2,1,0]] 127 | tof_8bit = depth_cmap(tof_clipped)[:, :, [2,1,0]] 128 | 129 | error_d435_clipped = error_cmap((d435_error.clip(error_min,error_max) - error_min) / 2*error_max)[:,:,[2,1,0]] 130 | error_tof_clipped = error_cmap((tof_error.clip(error_min,error_max) - error_min) / 2*error_max)[:,:,[2,1,0]] 131 | 132 | plot_row1 = np.hstack([depth_8bit,d435_8bit,tof_8bit]) 133 | plot_row2 = np.hstack([rgb_augmented,error_d435_clipped,error_tof_clipped]) 134 | 135 | plot = np.vstack([plot_row1,plot_row2]) 136 | h_,w_,_ = plot.shape 137 | plot_reshape = cv2.resize(plot,(w_//2,h_//2)) 138 | 139 | cv2.imshow("error",plot_reshape) 140 | cv2.waitKey(5) 141 | 142 | else: 143 | plt.figure() 144 | plt.subplot(2,3,1) 145 | plt.title("Depth GT") 146 | plt.imshow(depth,cmap="inferno",vmax=dmax,vmin=dmin) 147 | plt.subplot(2,3,2) 148 | plt.title("Depth D435") 149 | plt.imshow(each_depth_d435, cmap="inferno", vmax=dmax, vmin=dmin) 150 | plt.subplot(2,3,3) 151 | plt.title("Depth ToF") 152 | plt.imshow(each_depth_tof, cmap="inferno", vmax=dmax, vmin=dmin) 153 | plt.subplot(2,3,4) 154 | plt.title("RGB with Semantic Mask") 155 | plt.imshow(rgb_augmented[:,:,[2,1,0]]) 156 | plt.subplot(2,3,5) 157 | plt.title("D435 Depth Error") 158 | plt.imshow(d435_error, cmap="seismic", vmax=error_max, vmin=error_min) 159 | plt.subplot(2,3,6) 160 | plt.title("ToF Depth Error") 161 | plt.imshow(tof_error, cmap="seismic", vmax=error_max, vmin=error_min) 162 | plt.show() 163 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | pyrender==0.1.45 2 | opencv-python==4.10.0.82 3 | smplx==0.1.28 4 | chumpy==0.70 5 | numpy==1.23.0 6 | matplotlib==3.9.0 7 | --------------------------------------------------------------------------------