├── .gitignore
├── LICENSE
├── README.md
├── dataset
    └── README.txt
├── models
    └── README.txt
├── render_human.py
├── render_pose.py
├── render_scene.py
└── requirements.txt


/.gitignore:
--------------------------------------------------------------------------------
1 | dataset/*
2 | !dataset/README.txt
3 | 
4 | models/*
5 | !models/README.txt
6 | 
7 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2024 Junggy
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # SCRREAM
  2 | 
  3 | This is the official repository for SCRREAM (**SC**an, **R**egister, **RE**nder **A**nd **M**ap) benchmark dataset (accepted at **NeurIPS 2024**). We provide code example and intruction for visualizing our dataset as well as dataset download link.
  4 | For the further info, please check our arxiv (https://arxiv.org/pdf/2410.22715) and project page (https://sites.google.com/view/scrream/about)
  5 | 
  6 | ## Link to Download Dataset
  7 | **Indoor Reconstruction and SLAM dataset & Object Removal and Scene Editing dataset :**
  8 | 
  9 | https://drive.google.com/file/d/1Lu5VdlLn5NNoPSh2Tp90M6NF2Y1us677/view?usp=sharing (scene01-05, 59 Gb) 
 10 | https://drive.google.com/file/d/1V8HRX4-12jv-ZNW6AM_fyHoNiDstBV8q/view?usp=sharing (scene06-11, 101 Gb) 
 11 | 
 12 | In case of slow internet, please download with smaller zip files
 13 | 
 14 | https://drive.google.com/file/d/11QIO2ZT2DnFo8V9OMiBzMjRAwjcUJ5qC/view?usp=sharing (scene01)\
 15 | https://drive.google.com/file/d/1jvIucH0PI9hkuYy3cRwYR7_eFpt7Nhpi/view?usp=sharing (scene02)\
 16 | https://drive.google.com/file/d/1mL2NajhaUxXvhvjGd7WPe7KKe037TshJ/view?usp=sharing (scene03)\
 17 | https://drive.google.com/file/d/1Tj6LckAubUd-OI8SwCEwULgwjRatdZbD/view?usp=sharing (scene04)\
 18 | https://drive.google.com/file/d/1E7Ahm4ERde9gXsUb3pXVJntOFWUqWSil/view?usp=sharing (scene05)\
 19 | https://drive.google.com/file/d/16kB_pjzb5V-VS8Ma-EVwtJYH0jxt1ptb/view?usp=sharing (scene06)\
 20 | https://drive.google.com/file/d/1-f-RkDdjVKwJdZeGLHGPwpz--cJNd11M/view?usp=sharing (scene07)\
 21 | https://drive.google.com/file/d/1VPlD4zvALDeDfPtXBigubgAJxJlueZBA/view?usp=sharing (scene08)\
 22 | https://drive.google.com/file/d/1DUBOMxurmjWSU2R5MUs6iXJMFyKTBhIA/view?usp=sharing (scene09)\
 23 | https://drive.google.com/file/d/1h2v4hrY3IxsR49MV65ZOxXI8xMMHGL1Y/view?usp=sharing (scene10)\
 24 | https://drive.google.com/file/d/1fwjpmXQ29sO_wk1SOmfdhlGq7Kyv8crN/view?usp=sharing (scene11)
 25 | 
 26 | 
 27 | **Human Reconstruction Dataset :**
 28 | 
 29 | https://drive.google.com/file/d/1BHHb5ibNsYsm00FXUrwLRkvC630sYq3U/view?usp=sharing (scene01-02, 7.7 Gb)
 30 | 
 31 | **Pose Estimation Dataset :**
 32 | 
 33 | https://drive.google.com/file/d/1kcy8DCu6L2GtU2vK22w9FhhlLP6ljq9b/view?usp=sharing (scene01-02, 7.4 Gb)
 34 | 
 35 | Once the dataset is downloaded, unzip in the ```dataset``` folder.
 36 | It should look like this
 37 | 
 38 | ```
 39 | dataset\
 40 |    human_scene01\..
 41 |    human_scene02\..
 42 |    pose_meshes_canonical\..
 43 |    pose_scene01\..
 44 |    pose_scene02\..
 45 |    scene01\..
 46 |       ...
 47 |    scene11\..
 48 |    README.txt
 49 | ```
 50 | 
 51 | 
 52 | ## Instruction for Visualization
 53 | ### Requirements
 54 | Install requirements with pip with this command
 55 | ```
 56 | pip install -r requirements.txt
 57 | ```
 58 | We tested with python version 3.9 on windows 10 machine.
 59 | 
 60 | ### Visualizing Indoor Reconstruction and SLAM Dataset & Object Removal and Scene Editing Dataset
 61 | 
 62 | To visualize the Indoor Reconstruction and SLAM Dataset & Object Removal and Scene Editing Dataset,
 63 | run ```render_scene.py``` script with argument ```{dataset_dir} {scene} {traj} {frame}```, such as
 64 | ```
 65 | python render_scene.py {dataset_dir} {scene} {traj} {frame}
 66 | ```
 67 | If ```{frame}``` is set to -1, the script with go thorugh the entire frame as a video sequence.
 68 | If ```{frame}``` is set to positive integer, the script will display the image with the given frame number 
 69 | 
 70 | ```
 71 | python render_scene.py dataset scene01 full_00  -1 \\ for visualizing entire sequnce in scene01_full_00
 72 | python render_scene.py dataset scene01 full_00 100 \\ for visualizing 100th frame in scene01_full_00
 73 | ```
 74 | 
 75 | To visualize the reduced scene for object removal or scene editing experiments, use ```{traj}``` that contains reduced. scene01, scene02, scene04, scene05, scene06, scene07, scene08, scene09 contain reduced scene. 
 76 | For example, 
 77 | ```
 78 | render_scene.py dataset scene01 reduced_00 -1 \\ for visualizing enture sequnce in scene01_full_00
 79 | ```
 80 | will play the sequence from scene01 with reduced objects.
 81 | 
 82 | The visualization is formatted as 2x3 image layout with the given format:
 83 | ```
 84 | ㅣ   (Ground Truth Depth)   |    (D435 Depth)    |     (ToF Depth)    |
 85 | ㅏㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅡㅓ
 86 | ㅣ (RGB with Semantic Mask) |    (D435 Error)    |     (ToF Error)    |
 87 | ```
 88 | 
 89 | ### Visualizing Human Reconstruction Dataset
 90 | (Pre-requisite) Download SMPL model into . As our dataset uses smpl model format, only downloading smpl format will be enough (download .pkl file if possible. Our dataset assumes .pkl in the script).
 91 | The folder should look like this :
 92 | ```
 93 | models\
 94 |    smpl\..
 95 |    README.txt
 96 | ```
 97 | 
 98 | To visualize the human reconstruction dataset, run ```render_human.py``` script with argument ```{dataset_dir} {scene} {frame} {view}```, such as 
 99 | ```
100 | python render_human.py {dataset_dir} {scene} {frame} {view}
101 | ```
102 | Note that our human dataset contains 4 multiview images per each frame (or human posture). If ```view``` is set to -1, the script will plot 4 consecutive views on given frame.
103 | ```
104 | python human_scene.py dataset human_scene01 0 0 \\ for visualizing first view of frame 0 in the human_scene01
105 | python human_scene.py dataset human_scene01 0 -1 \\ for visualizing all 4 views of frame 0 in the human_scene01
106 | ```
107 | The visualization is formatted as 1x3 image layout with the given format:
108 | ```
109 | | (RGB with Semantic Mask) | (RGB with Scanned Human Mesh) | (RGB with SMPL Human Mesh) |
110 | ```
111 | 
112 | 
113 | 
114 | 
115 | ### Visualizing 6D Pose Estimation Dataset
116 | To visualize the human reconstruction dataset, run ```render_pose.py``` script with argument ```{dataset_dir} {scene} {frame}```, such as 
117 | ```
118 | python render_pose.py {dataset_dir} {scene} {frame}
119 | ```
120 | If ```{frame}``` is set to -1, the script with go thorugh the entire frame as a video sequence.
121 | If ```{frame}``` is set to positive integer, the script will display the image with the given frame number
122 | ```
123 | python pose_scene.py dataset pose_scene01  -1 \\ for visualizing entire sequnce in pose_scene01
124 | python pose_scene.py dataset pose_scene01 100 \\ for visualizing 100th frame in pose_scene01
125 | ```
126 | The visualization is formatted as 1x1 image layout with the given format:
127 | ```
128 | | (RGB with Pose 3D Bounding Box and Mask) |
129 | ```
130 | ## Citation
131 | ```
132 | @misc{jung2024scrreamscanregister,
133 |       title={SCRREAM : SCan, Register, REnder And Map:A Framework for Annotating Accurate and Dense 3D Indoor Scenes with a Benchmark}, 
134 |       author={HyunJun Jung and Weihang Li and Shun-Cheng Wu and William Bittner and Nikolas Brasch and Jifei Song and Eduardo Pérez-Pellitero and Zhensong Zhang and Arthur Moreau and Nassir Navab and Benjamin Busam},
135 |       year={2024},
136 |       eprint={2410.22715},
137 |       archivePrefix={arXiv},
138 |       primaryClass={cs.CV},
139 |       url={https://arxiv.org/abs/2410.22715}, 
140 | }
141 | ```
142 | 
143 | 
144 | 


--------------------------------------------------------------------------------
/dataset/README.txt:
--------------------------------------------------------------------------------
 1 | download and upzip dataset here
 2 | 
 3 | after successfully unzipping, this folder should contain
 4 | 
 5 | human_scene01
 6 | human_scene02
 7 | pose_meshes_canonical
 8 | pose_scene01
 9 | pose_scene02
10 | scene01
11 | ...
12 | scene11
13 | README.txt


--------------------------------------------------------------------------------
/models/README.txt:
--------------------------------------------------------------------------------
1 | download SMPL in pkl format.
2 | 
3 | after successfully downloading, this folder should contain
4 | 
5 | smpl
6 | README.txt


--------------------------------------------------------------------------------
/render_human.py:
--------------------------------------------------------------------------------
  1 | import pyrender, trimesh, os, glob, cv2, argparse, smplx, pickle
  2 | import numpy as np
  3 | import matplotlib.pyplot as plt
  4 | 
  5 | cmap = plt.get_cmap("gist_rainbow")
  6 | 
  7 | if os.name == 'nt':
  8 |     separator = "\\"
  9 | else:
 10 |     separator = '/'
 11 | 
 12 | scene = pyrender.Scene(bg_color=[0, 0, 0],ambient_light=[0.5,0.5,0.5])
 13 | 
 14 | parser = argparse.ArgumentParser(description="Undistort images")
 15 | parser.add_argument("dataset_dir")
 16 | parser.add_argument("scene_name")
 17 | parser.add_argument("idx")
 18 | parser.add_argument("view_idx")
 19 | 
 20 | args = parser.parse_args()
 21 | 
 22 | dataset_dir = args.dataset_dir
 23 | scene_name  = args.scene_name
 24 | frame_idx = int(args.idx)
 25 | view_idx = int(args.view_idx)
 26 | 
 27 | assert frame_idx >= 0, "frame_idx has to be greater than 0"
 28 | assert view_idx >= -1 and view_idx < 4, "view_idx has to be a value between -1 and 3"
 29 | 
 30 | base = os.path.join(dataset_dir,scene_name)
 31 | mesh_base = os.path.join(dataset_dir,scene_name,"meshes")
 32 | mesh_names = glob.glob(os.path.join(mesh_base,"*.obj"))
 33 | 
 34 | meshes = {}
 35 | 
 36 | # load the human as a scanned mesh
 37 | human_mesh_name_clean = "human-{0:02d}.obj".format(frame_idx)
 38 | human_mesh_name = os.path.join(dataset_dir,scene_name,"meshes",human_mesh_name_clean)
 39 | print("loading scanned human mesh")
 40 | 
 41 | trimesh_obj = trimesh.load(human_mesh_name)
 42 | human_mesh = pyrender.Mesh.from_trimesh(trimesh_obj)
 43 | meshes["scanned_mesh"] = human_mesh
 44 | 
 45 | # load the human as a SMPL mesh
 46 | import smplx, torch
 47 | 
 48 | print("loading smpl mesh")
 49 | annotation_folder = os.path.join(dataset_dir,scene_name,"human_annotation")
 50 | annotation_file = os.path.join(annotation_folder,"human-{0:02d}_smpl.pkl".format(frame_idx))
 51 | 
 52 | with open(annotation_file, 'rb') as f:
 53 |     x = pickle.load(f)
 54 | 
 55 | smpl_model = smplx.create("models",model_type="smpl", gender="male",num_beta=300,ext="npz")
 56 | faces = smpl_model.faces
 57 | 
 58 | output = smpl_model(return_verts=True,
 59 |           body_pose = torch.tensor(x["pose"][3:]).float().unsqueeze(0),
 60 |           betas = torch.tensor(x["betas"])[:10].float().unsqueeze(0),
 61 |           global_orient = torch.tensor(x["pose"][:3]).float().unsqueeze(0),
 62 |           transl = torch.tensor(x["trans"]).float().unsqueeze(0),
 63 |           )
 64 | vertices = output.vertices
 65 | smpl_mesh = pyrender.Mesh.from_trimesh(trimesh.Trimesh(vertices=vertices.detach().cpu().numpy().squeeze(),
 66 |                        faces=faces))
 67 | meshes["smpl_mesh"] = smpl_mesh
 68 | 
 69 | intrinsic = np.loadtxt(os.path.join(base, "intrinsics.txt"))
 70 | fx,fy,px,py = intrinsic[0,0],intrinsic[1,1],intrinsic[0,2],intrinsic[1,2]
 71 | camera = pyrender.IntrinsicsCamera(0,0,0,0)
 72 | camera.fx = fx
 73 | camera.fy = fy
 74 | camera.cx = px
 75 | camera.cy = py
 76 | camera_node = scene.add(camera)
 77 | 
 78 | light = pyrender.SpotLight(color=np.ones(3), intensity=10.0,
 79 |                             innerConeAngle=np.pi/16.0,
 80 |                            outerConeAngle=np.pi/3.0)
 81 | light_node = scene.add(light)
 82 | 
 83 | 
 84 | images = glob.glob(os.path.join(base, "rgb", "*.png"))
 85 | poses = glob.glob(os.path.join(base, "camera_pose", "*.txt"))
 86 | instances = glob.glob(os.path.join(base, "instance", "*.png"))
 87 | 
 88 | assert len(set([len(images),len(poses),len(instances)])) == 1
 89 | 
 90 | h,w,_ = cv2.imread(images[0]).shape
 91 | 
 92 | r = pyrender.OffscreenRenderer(w, h)
 93 | 
 94 | error_cmap = plt.get_cmap("seismic")
 95 | depth_cmap = plt.get_cmap("inferno")
 96 | instance_cmap = plt.get_cmap("gist_rainbow")
 97 | 
 98 | images.sort()
 99 | poses.sort()
100 | instances.sort()
101 | 
102 | cv2pyrender = np.array([[1,1,1,1],
103 |                        [-1,-1,-1,-1],
104 |                        [-1,-1,-1,-1],
105 |                        [1,1,1,1]])
106 | 
107 | if view_idx == -1:
108 |     start = 0
109 |     end   = 4
110 | else:
111 |     start = view_idx
112 |     end   = view_idx+1
113 | 
114 | for idx_pre in range(start,end):
115 | 
116 |     idx = frame_idx * 4 + idx_pre
117 | 
118 |     each_rgb = cv2.imread(images[idx],-1)
119 |     each_instance = cv2.imread(instances[idx],-1)
120 |     each_pose = np.loadtxt(poses[idx])
121 | 
122 |     scene.set_pose(camera_node, cv2pyrender.T * each_pose)
123 |     scene.set_pose(light_node, cv2pyrender.T * each_pose)
124 | 
125 |     added_node = scene.add(meshes["scanned_mesh"])
126 |     color_scanned, _ = r.render(scene)
127 |     scene.remove_node(added_node)
128 | 
129 |     added_node = scene.add(meshes["smpl_mesh"])
130 |     color_smpl, _ = r.render(scene)
131 |     scene.remove_node(added_node)
132 | 
133 |     each_instance_cmap = (instance_cmap(each_instance)[:, :, [2, 1, 0]] * 255).astype(np.uint8)
134 |     rgb_augmented = cv2.addWeighted(each_instance_cmap, 0.3, each_rgb, 0.5, 1).astype(np.float32) / 255
135 |     rgb_augmented_scanned = cv2.addWeighted(color_scanned[:,:,[2,1,0]], 0.7, each_rgb, 0.3, 1).astype(np.float32) / 255
136 |     rgb_augmented_smpl    = cv2.addWeighted(color_smpl[:,:,[2,1,0]], 0.7, each_rgb, 0.3, 1).astype(np.float32) / 255
137 | 
138 |     plot = np.hstack([rgb_augmented,rgb_augmented_scanned,rgb_augmented_smpl])
139 | 
140 |     if view_idx == -1:
141 | 
142 |         h_,w_,_ = plot.shape
143 |         plot_reshape = cv2.resize(plot,(w_//2,h_//2))
144 | 
145 |         cv2.imshow("human_visualization",plot_reshape)
146 |         cv2.waitKey(500)
147 | 
148 |     else:
149 |         plt.figure()
150 |         plt.imshow(plot[:,:,[2,1,0]])
151 |         plt.show()


--------------------------------------------------------------------------------
/render_pose.py:
--------------------------------------------------------------------------------
  1 | import pyrender, trimesh, argparse
  2 | import numpy as np
  3 | import matplotlib.pyplot as plt
  4 | import cv2, os, glob
  5 | import _pickle as cPickle
  6 | 
  7 | cls_id_to_name = {1: "box",
  8 |                   2: "bottle",
  9 |                   3: "can",
 10 |                   4: "cup",
 11 |                   5: "remote",
 12 |                   6: "teapot",
 13 |                   7: "cutlery",
 14 |                   8: "glass",
 15 |                   9: "shoe",
 16 |                   10: "tube"}
 17 | 
 18 | name_to_cls_id = {v:k for k,v in cls_id_to_name.items()}
 19 | 
 20 | housecat2pyrender_conversion = np.array([[1, 1, 1, 1],
 21 |                                          [-1, -1, -1, -1],
 22 |                                          [-1, -1, -1, -1],
 23 |                                          [1, 1, 1, 1]])
 24 |     
 25 | def main():
 26 | 
 27 |     parser = argparse.ArgumentParser(description="render_obj_with_pose")
 28 | 
 29 |     parser.add_argument("dataset_dir")
 30 |     parser.add_argument("scene_name")
 31 |     parser.add_argument("idx")
 32 | 
 33 |     args = parser.parse_args()
 34 | 
 35 |     dataset_dir = args.dataset_dir
 36 |     folder_name = os.path.join(dataset_dir,args.scene_name)
 37 | 
 38 |     render_idx = int(args.idx)
 39 | 
 40 |     object_list = glob.glob(os.path.join(folder_name,"obj_pose")+"/*.txt")
 41 |     
 42 | 
 43 |     with open(os.path.join(folder_name,"meta.txt")) as f:
 44 |         object_list= [each_line.split(" ")[1] for each_line in f.readlines() if  each_line.split(" ")[0] in cls_id_to_name.values()]
 45 | 
 46 |     objects = {}
 47 |     for each_object in object_list:
 48 | 
 49 |         if os.name == 'nt':
 50 |             object_name_full = each_object.split("\\")[-1]
 51 |         else:
 52 |             object_name_full = each_object.split("/")[-1]
 53 |         each_class, each_name = object_name_full.split(".")[0].split("-")
 54 |         objects[(each_class,each_name)] = {}
 55 | 
 56 |     # setup pyrender scene with camera
 57 |     scene = pyrender.Scene(bg_color=[0, 0, 0], ambient_light=[0.7,0.7,0.7])
 58 |     k = np.loadtxt(os.path.join(folder_name, "intrinsics.txt"))
 59 | 
 60 | 
 61 | 
 62 |     camera = pyrender.IntrinsicsCamera(0, 0, 0, 0)
 63 |     camera.fx = k[0, 0]
 64 |     camera.fy = k[1, 1]
 65 |     camera.cx = k[0, 2]
 66 |     camera.cy = k[1, 2]
 67 |     scene.add(camera)
 68 | 
 69 |     # mesh
 70 |     obj_folder = os.path.join(dataset_dir,"pose_meshes_canonical")
 71 |     bbox_scales = []
 72 |     for each_key in objects.keys():
 73 |         print("loading mesh (category, instance) :",each_key)
 74 | 
 75 |         obj_class, obj_name = each_key
 76 | 
 77 |         if obj_class in name_to_cls_id.keys():
 78 | 
 79 |             
 80 |             obj_fname = os.path.join(obj_folder,obj_class + "-" + obj_name + '.obj')
 81 | 
 82 |             trimesh_obj = trimesh.load(obj_fname)
 83 |             bbox_scales.append(2*trimesh_obj.vertices.max(0))
 84 |             mesh = pyrender.Mesh.from_trimesh(trimesh_obj)
 85 |             objects[each_key]["mesh"] = scene.add(mesh)
 86 | 
 87 |     bbox_scales = np.stack(bbox_scales,0)
 88 | 
 89 |     n_images = len(glob.glob("{0}/{1}/*.png".format(folder_name,"rgb")))
 90 |     print("n_images :", n_images)
 91 | 
 92 |     with open(os.path.join(folder_name, "meta.txt"), 'r') as f:
 93 |         instance_labels = [each_line.strip().split(" ") for each_line in f.readlines()]
 94 | 
 95 |     if render_idx==-1:
 96 |         _start,_end = 0,n_images
 97 |     else:
 98 |         _start,_end = render_idx, render_idx+1
 99 | 
100 |     for idx in range(_start,_end):
101 | 
102 |         img = plt.imread("{0}/rgb/{1:06d}.png".format(folder_name, idx))[:,:,[2,1,0]]
103 | 
104 |         if idx == _start:
105 |             h,w,_ = img.shape
106 |             r = pyrender.OffscreenRenderer(w, h)
107 | 
108 |         pkl_name = "{0}/labels/{1:06d}_label.pkl".format(folder_name, idx)
109 |         with open(pkl_name, 'rb') as f:
110 |             label = cPickle.load(f)
111 | 
112 |         RTs = []
113 |         
114 |         instance_count = 0
115 |         for label_idx,each_line in enumerate(instance_labels):
116 |             
117 |             cls,name,instance = each_line
118 | 
119 |             if cls in name_to_cls_id.keys():
120 |                 cls = name_to_cls_id[cls]
121 | 
122 |                 _,name = name.split("-")
123 | 
124 |                 obj_to_cam = np.identity(4)
125 | 
126 |                 rotation = label['rotations'][instance_count]
127 |                 translation = label['translations'][instance_count]
128 |                 obj_to_cam[:3,:3] = rotation
129 |                 obj_to_cam[:3,3] = translation
130 |                 RTs.append(np.array(obj_to_cam))
131 |                 obj_to_cam *= housecat2pyrender_conversion
132 |                 scene.set_pose(objects[(cls_id_to_name[int(cls)],name)]["mesh"], obj_to_cam)
133 | 
134 |                 instance_count += 1
135 | 
136 |         RTs = np.stack(RTs,0)
137 |         color_i, depth_i = r.render(scene)
138 | 
139 |         img = (img).astype(np.float32)
140 |         mask = np.stack([np.zeros_like(depth_i),(depth_i != 0),np.zeros_like(depth_i)],-1).astype(np.float32)
141 | 
142 |         _,_,c = img.shape
143 | 
144 |         overlay =  cv2.addWeighted(mask, 0.2, img, 0.8, 0)
145 |         overlay = (overlay * 255).astype(np.uint8)
146 |         overlay = draw_detections(overlay, k, RTs, bbox_scales)
147 | 
148 | 
149 |         if render_idx == -1:
150 |             cv2.imshow("vis",overlay)
151 |             cv2.waitKey(1)
152 |         else:
153 |             plt.figure()
154 |             plt.imshow(overlay[:,:,[2,1,0]])
155 |             plt.show()
156 | 
157 | 
158 | 
159 | """
160 | Functions for drawing 3d BBox (draw_detections, transform_coordinates_3d, get_3d_bbox, calculate_2d_projections, draw)
161 | are adapted from Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation Detection and 
162 | evaluation code (https://github.com/hughw19/NOCS_CVPR2019)
163 | """
164 | def draw_detections(image, intrinsics, gt_RTs, gt_scales):
165 | 
166 |     draw_image_bbox = image.copy()
167 | 
168 |     if gt_RTs is not None:
169 |         for ind, RT in enumerate(gt_RTs):
170 | 
171 |             xyz_axis = 0.3 * np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]).transpose()
172 |             transformed_axes = transform_coordinates_3d(xyz_axis, RT)
173 |             projected_axes = calculate_2d_projections(transformed_axes, intrinsics)
174 | 
175 |             bbox_3d = get_3d_bbox(gt_scales[ind], 0)
176 |             transformed_bbox_3d = transform_coordinates_3d(bbox_3d, RT)
177 | 
178 |             projected_bbox = calculate_2d_projections(transformed_bbox_3d, intrinsics)
179 |             draw_image_bbox = draw(draw_image_bbox, projected_bbox, projected_axes, (0, 255, 0))
180 | 
181 |     return draw_image_bbox
182 | 
183 | def transform_coordinates_3d(coordinates, RT):
184 |     """
185 |     Input:
186 |         coordinates: [3, N]
187 |         RT: [4, 4]
188 |     Return
189 |         new_coordinates: [3, N]
190 | 
191 |     """
192 |     assert coordinates.shape[0] == 3
193 |     coordinates = np.vstack([coordinates, np.ones((1, coordinates.shape[1]), dtype=np.float32)])
194 |     new_coordinates = RT @ coordinates
195 |     new_coordinates = new_coordinates[:3, :] / new_coordinates[3, :]
196 |     return new_coordinates
197 | 
198 | def get_3d_bbox(scale, shift=0):
199 |     """
200 |     Input:
201 |         scale: [3] or scalar
202 |         shift: [3] or scalar
203 |     Return
204 |         bbox_3d: [3, N]
205 |     """
206 |     if hasattr(scale, "__iter__"):
207 |         bbox_3d = np.array([[scale[0] / 2, +scale[1] / 2, scale[2] / 2],
208 |                             [scale[0] / 2, +scale[1] / 2, -scale[2] / 2],
209 |                             [-scale[0] / 2, +scale[1] / 2, scale[2] / 2],
210 |                             [-scale[0] / 2, +scale[1] / 2, -scale[2] / 2],
211 |                             [+scale[0] / 2, -scale[1] / 2, scale[2] / 2],
212 |                             [+scale[0] / 2, -scale[1] / 2, -scale[2] / 2],
213 |                             [-scale[0] / 2, -scale[1] / 2, scale[2] / 2],
214 |                             [-scale[0] / 2, -scale[1] / 2, -scale[2] / 2]]) + shift
215 |     else:
216 |         bbox_3d = np.array([[scale / 2, +scale / 2, scale / 2],
217 |                             [scale / 2, +scale / 2, -scale / 2],
218 |                             [-scale / 2, +scale / 2, scale / 2],
219 |                             [-scale / 2, +scale / 2, -scale / 2],
220 |                             [+scale / 2, -scale / 2, scale / 2],
221 |                             [+scale / 2, -scale / 2, -scale / 2],
222 |                             [-scale / 2, -scale / 2, scale / 2],
223 |                             [-scale / 2, -scale / 2, -scale / 2]]) + shift
224 | 
225 |     bbox_3d = bbox_3d.transpose()
226 |     return bbox_3d
227 | 
228 | def calculate_2d_projections(coordinates_3d, intrinsics):
229 |     """
230 |     Input:
231 |         coordinates: [3, N]
232 |         intrinsics: [3, 3]
233 |     Return
234 |         projected_coordinates: [N, 2]
235 |     """
236 |     projected_coordinates = intrinsics @ coordinates_3d
237 |     projected_coordinates = projected_coordinates[:2, :] / projected_coordinates[2, :]
238 |     projected_coordinates = projected_coordinates.transpose()
239 |     projected_coordinates = np.array(projected_coordinates, dtype=np.int32)
240 | 
241 |     return projected_coordinates
242 | 
243 | def draw(img, imgpts, axes, color):
244 |     imgpts = np.int32(imgpts).reshape(-1, 2)
245 | 
246 |     # draw ground layer in darker color
247 |     color_ground = (int(color[0] * 0.3), int(color[1] * 0.3), int(color[2] * 0.3))
248 |     for i, j in zip([4, 5, 6, 7], [5, 7, 4, 6]):
249 |         img = cv2.line(img, tuple(imgpts[i]), tuple(imgpts[j]), color_ground, 3)
250 | 
251 |     # draw pillars in blue color
252 |     color_pillar = (int(color[0] * 0.6), int(color[1] * 0.6), int(color[2] * 0.6))
253 |     for i, j in zip(range(4), range(4, 8)):
254 |         img = cv2.line(img, tuple(imgpts[i]), tuple(imgpts[j]), color_pillar, 3)
255 | 
256 |     # finally, draw top layer in color
257 |     for i, j in zip([0, 1, 2, 3], [1, 3, 0, 2]):
258 |         img = cv2.line(img, tuple(imgpts[i]), tuple(imgpts[j]), color, 3)
259 | 
260 |     return img
261 | 
262 | 
263 | 
264 | if __name__ == "__main__":
265 |     main()
266 | #              ^
267 | # cv pose :   /  z
268 | #             ---> x
269 | #            |
270 | #            v y
271 | 
272 | #                      ^ y
273 | # pyrender pose :      |
274 | #                      ---> x
275 | #                    /
276 | #                   v z
277 | 


--------------------------------------------------------------------------------
/render_scene.py:
--------------------------------------------------------------------------------
  1 | import pyrender, trimesh, os, glob, cv2, argparse
  2 | import numpy as np
  3 | import matplotlib.pyplot as plt
  4 | 
  5 | cmap = plt.get_cmap("gist_rainbow")
  6 | 
  7 | if os.name == 'nt':
  8 |     separator = "\\"
  9 | else:
 10 |     separator = '/'
 11 | 
 12 | scene = pyrender.Scene(bg_color=[0, 0, 0], ambient_light=[1.0, 1.0, 1.0])
 13 | 
 14 | parser = argparse.ArgumentParser(description="Undistort images")
 15 | parser.add_argument("dataset_dir")
 16 | parser.add_argument("scene_name")
 17 | parser.add_argument("traj_name")
 18 | parser.add_argument("idx")
 19 | 
 20 | args = parser.parse_args()
 21 | 
 22 | dataset_dir = args.dataset_dir
 23 | scene_name  = args.scene_name
 24 | traj_name = args.traj_name
 25 | frame_idx = int(args.idx)
 26 | 
 27 | base = os.path.join(dataset_dir,scene_name,scene_name+"_"+traj_name)
 28 | mesh_base = os.path.join(dataset_dir,scene_name,"meshes")
 29 | mesh_names = glob.glob(os.path.join(mesh_base,"*.obj"))
 30 | 
 31 | with open(os.path.join(base,"meta.txt"),"r") as f:
 32 |     meshes_in_the_scene = [each_line.strip().split(" ")[1] for each_line in f.readlines()]
 33 | 
 34 | meshes = {}
 35 | 
 36 | for each_mesh_name in mesh_names:
 37 | 
 38 |     each_mesh_name_clean = each_mesh_name.split(separator)[-1].split(".")[0]
 39 | 
 40 |     if not (each_mesh_name_clean in meshes_in_the_scene): 
 41 |         print("skipping",each_mesh_name_clean)
 42 |         continue
 43 |     else:
 44 |         print("loading",each_mesh_name_clean)
 45 | 
 46 |     trimesh_obj = trimesh.load(each_mesh_name)
 47 |     trimesh_obj.visual = trimesh.visual.ColorVisuals()
 48 |     mesh = pyrender.Mesh.from_trimesh(trimesh_obj)
 49 |     meshes[each_mesh_name] = mesh
 50 |     scene.add(mesh)
 51 | 
 52 | intrinsic = np.loadtxt(os.path.join(base, "intrinsics.txt"))
 53 | fx,fy,px,py = intrinsic[0,0],intrinsic[1,1],intrinsic[0,2],intrinsic[1,2]
 54 | camera = pyrender.IntrinsicsCamera(0,0,0,0)
 55 | camera.fx = fx
 56 | camera.fy = fy
 57 | camera.cx = px
 58 | camera.cy = py
 59 | camera_node = scene.add(camera)
 60 | 
 61 | 
 62 | images = glob.glob(os.path.join(base, "rgb", "*.png"))
 63 | poses = glob.glob(os.path.join(base, "camera_pose", "*.txt"))
 64 | instances = glob.glob(os.path.join(base, "instance", "*.png"))
 65 | depth_d435 = glob.glob(os.path.join(base, "depth_d435", "*.png"))
 66 | depth_tof  = glob.glob(os.path.join(base, "depth_tof", "*.png"))
 67 | 
 68 | assert len(set([len(images),len(poses),len(instances),len(depth_d435),len(depth_tof)])) == 1
 69 | 
 70 | h,w,_ = cv2.imread(images[0]).shape
 71 | 
 72 | r = pyrender.OffscreenRenderer(w, h)
 73 | 
 74 | error_cmap = plt.get_cmap("seismic")
 75 | depth_cmap = plt.get_cmap("inferno")
 76 | instance_cmap = plt.get_cmap("gist_rainbow")
 77 | 
 78 | images.sort()
 79 | poses.sort()
 80 | instances.sort()
 81 | depth_d435.sort()
 82 | depth_tof.sort()
 83 | 
 84 | cv2pyrender = np.array([[1,1,1,1],
 85 |                        [-1,-1,-1,-1],
 86 |                        [-1,-1,-1,-1],
 87 |                        [1,1,1,1]])
 88 | 
 89 | if frame_idx == -1:
 90 |     start = 0
 91 |     end   = len(images)
 92 | else:
 93 |     start = frame_idx
 94 |     end   = frame_idx+1
 95 | 
 96 | for idx in range(start,end):
 97 | 
 98 |     each_rgb = cv2.imread(images[idx],-1)
 99 |     each_instance = cv2.imread(instances[idx],-1)
100 |     each_depth_d435 = cv2.imread(depth_d435[idx],-1) / 1000
101 |     each_depth_tof = cv2.imread(depth_tof[idx],-1) / 1000
102 |     each_pose = np.loadtxt(poses[idx])
103 | 
104 |     scene.set_pose(camera_node, cv2pyrender.T * each_pose)
105 | 
106 |     color, depth = r.render(scene)
107 | 
108 |     d435_error = (depth - each_depth_d435) * (each_depth_d435 != 0)
109 |     tof_error = (depth - each_depth_tof) * (each_depth_tof != 0)
110 | 
111 |     dmax = 4
112 |     dmin = 0
113 |     error_max = 1
114 |     error_min = -error_max
115 | 
116 |     each_instance_cmap = (instance_cmap(each_instance)[:, :, [2, 1, 0]] * 255).astype(np.uint8)
117 |     rgb_augmented = cv2.addWeighted(each_instance_cmap, 0.3, each_rgb, 0.5, 1).astype(np.float32) / 255
118 | 
119 |     if frame_idx == -1:
120 | 
121 |         depth_clipped = ((depth.clip(dmin,dmax) - dmin) / (dmax-dmin) * 255).astype(np.uint8)
122 |         d435_clipped = ((each_depth_d435.clip(dmin,dmax) - dmin) / (dmax-dmin) * 255).astype(np.uint8)
123 |         tof_clipped = ((each_depth_tof.clip(dmin, dmax) - dmin) / (dmax - dmin) * 255).astype(np.uint8)
124 | 
125 |         depth_8bit = depth_cmap(depth_clipped)[:,:,[2,1,0]]
126 |         d435_8bit = depth_cmap(d435_clipped)[:,:,[2,1,0]]
127 |         tof_8bit = depth_cmap(tof_clipped)[:, :, [2,1,0]]
128 | 
129 |         error_d435_clipped = error_cmap((d435_error.clip(error_min,error_max) - error_min) / 2*error_max)[:,:,[2,1,0]]
130 |         error_tof_clipped = error_cmap((tof_error.clip(error_min,error_max) - error_min) / 2*error_max)[:,:,[2,1,0]]
131 | 
132 |         plot_row1 = np.hstack([depth_8bit,d435_8bit,tof_8bit])
133 |         plot_row2 = np.hstack([rgb_augmented,error_d435_clipped,error_tof_clipped])
134 | 
135 |         plot = np.vstack([plot_row1,plot_row2])
136 |         h_,w_,_ = plot.shape
137 |         plot_reshape = cv2.resize(plot,(w_//2,h_//2))
138 | 
139 |         cv2.imshow("error",plot_reshape)
140 |         cv2.waitKey(5)
141 | 
142 |     else:
143 |         plt.figure()
144 |         plt.subplot(2,3,1)
145 |         plt.title("Depth GT")
146 |         plt.imshow(depth,cmap="inferno",vmax=dmax,vmin=dmin)
147 |         plt.subplot(2,3,2)
148 |         plt.title("Depth D435")
149 |         plt.imshow(each_depth_d435, cmap="inferno", vmax=dmax, vmin=dmin)
150 |         plt.subplot(2,3,3)
151 |         plt.title("Depth ToF")
152 |         plt.imshow(each_depth_tof, cmap="inferno", vmax=dmax, vmin=dmin)
153 |         plt.subplot(2,3,4)
154 |         plt.title("RGB with Semantic Mask")
155 |         plt.imshow(rgb_augmented[:,:,[2,1,0]])
156 |         plt.subplot(2,3,5)
157 |         plt.title("D435 Depth Error")
158 |         plt.imshow(d435_error, cmap="seismic", vmax=error_max, vmin=error_min)
159 |         plt.subplot(2,3,6)
160 |         plt.title("ToF Depth Error")
161 |         plt.imshow(tof_error, cmap="seismic", vmax=error_max, vmin=error_min)
162 |         plt.show()
163 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | pyrender==0.1.45
2 | opencv-python==4.10.0.82
3 | smplx==0.1.28
4 | chumpy==0.70
5 | numpy==1.23.0
6 | matplotlib==3.9.0
7 | 


--------------------------------------------------------------------------------