├── .gitignore ├── README.md ├── assets └── epic_fields.png ├── demo ├── demo.py ├── demo_ego4d.md ├── dense_point_cloud.sh ├── reconstruct_sparse.sh └── register_dense.sh ├── example_data ├── P04_01_line.json ├── P04_01_line.png ├── P06_09_line.json ├── P06_09_line.png ├── P12_101_line.json ├── P12_101_line.png ├── P28_101.json ├── P28_101 │ ├── frame_0000000080.jpg │ ├── frame_0000000085.jpg │ ├── frame_0000000090.jpg │ ├── frame_0000000095.jpg │ ├── frame_0000000100.jpg │ ├── frame_0000000105.jpg │ ├── frame_0000000110.jpg │ └── frame_0000000115.jpg ├── P28_101_line.json ├── P28_101_line.png ├── example_output_gui.jpg └── example_output_line.jpg ├── homography_filter ├── __init__.py ├── argparser.py ├── filter.py └── lib.py ├── input_videos.txt ├── licence.txt ├── reconstruct_sparse.py ├── register_dense.py ├── scripts ├── reconstruct_sparse.sh └── register_dense.sh ├── select_sparse_frames.py ├── tools ├── __init__.py ├── common_functions.py ├── project_3d_line.py ├── visualise_data_open3d.py └── visualize_colmap_open3d.py └── utils ├── __init__.py ├── base_type.py ├── colmap_utils.py ├── hovering ├── __init__.py ├── helper.py ├── hover_open3d.py └── o3d_line_mesh.py └── lib.py /.gitignore: -------------------------------------------------------------------------------- 1 | **/__pycache__/ 2 | 3 | outputs/ -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # EPIC Fields: Marrying 3D Geometry and Video Understanding 2 | ![EPIC Fields Overview](assets/epic_fields.png?raw=true) 3 | 4 | This repository provides tools and scripts for visualizing and reconstructing the [EPIC FIELDS](https://epic-kitchens.github.io/epic-fields) dataset. 5 | 6 | ## Table of Contents 7 | 8 | 1. [Visualization Code](#visualization-code) 9 | - [Introduction](#introduction) 10 | - [Format](#format) 11 | - [Visualization](#visualisation) 12 | 2. [Reconstruction Pipeline](#reconstruction-pipeline) 13 | - [Steps for EPIC-KITCHENS Reconstruction](#steps-for-epic-kitchens-reconstruction) 14 | - [Understanding the Output File Structure](#understanding-the-output-file-structure) 15 | 3. [Reconstruction Pipeline: Quick Demo](#reconstruction-pipeline-quick-demo) 16 | 4. [Additional info](#additional-info) 17 | - [Credit](#credit) 18 | - [Citation](#citation) 19 | - [License](#license) 20 | - [Contact](#contact) 21 | 22 | 23 | 24 | # Visualization Code 25 | ## Introduction 26 | 27 | This visualisation code is associated with the released EPIC FIELDS dataset. Further details on the dataset and associated preprint are available at: 28 | [https://epic-kitchens.github.io/epic-fields](https://epic-kitchens.github.io/epic-fields) 29 | 30 | 31 | ## Format 32 | 33 | - The `camera` parameters use the COLMAP format, which is the same as the OpenCV format. 34 | - The `images` stores the world-to-camera transformation, represented by quaternion and translation. 35 | - Note: for NeRF usage this needs to be converted to camera-to-world transformation and possibly changing (+x, +y, +z) to (+x, -y, -z) 36 | - The `points` is part of COLMAP output. It's kept here for visualisation purpose and potentially for computing the `near`/`far` bounds in NeRF input. 37 | ``` 38 | { 39 | "camera": { 40 | "id": 1, "model": "OPENCV", "width": 456, "height": 256, 41 | "params": [fx, fy, cx, cy, k1, k2, p1, p2] 42 | }, 43 | "images": { 44 | frame_name: [qw, qx, qy, qz, tx, ty, tz], 45 | ... 
46 | }, 47 | "points": [ 48 | [x, y, z, r, g, b], 49 | ... 50 | ] 51 | } 52 | 53 | example data can be found in `example_data/P28_101.json` 54 | ``` 55 | 56 | ## Visualisation 57 | 58 | ### Visualise camera poses and point cloud 59 | 60 | This script requires Open3D and has been tested with Open3D==0.16.1. 61 | ```bash 62 | python tools/visualise_data_open3d.py --json-data example_data/P28_101.json 63 | ``` 64 | PS: Press 'h' to see the Open3D help message. 65 | 66 |
67 | Click to see the example output 68 | ![gui](example_data/example_output_gui.jpg) 69 |
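If you want to consume the JSON directly (e.g. to prepare NeRF-style input as mentioned in the Format notes above), the following is a minimal sketch (not part of the released tools, assuming only `numpy`) of reading the fields and converting one world-to-camera pose into a camera-to-world matrix:

```python
# Sketch only: parse the EPIC Fields JSON described in the Format section and
# convert a world-to-camera pose [qw, qx, qy, qz, tx, ty, tz] to camera-to-world.
import json
import numpy as np

def qvec_to_rotmat(qw, qx, qy, qz):
    # Rotation matrix from a unit quaternion in COLMAP's [qw, qx, qy, qz] order.
    return np.array([
        [1 - 2*qy**2 - 2*qz**2, 2*qx*qy - 2*qz*qw,     2*qx*qz + 2*qy*qw],
        [2*qx*qy + 2*qz*qw,     1 - 2*qx**2 - 2*qz**2, 2*qy*qz - 2*qx*qw],
        [2*qx*qz - 2*qy*qw,     2*qy*qz + 2*qx*qw,     1 - 2*qx**2 - 2*qy**2]])

with open('example_data/P28_101.json') as fp:
    model = json.load(fp)

fx, fy, cx, cy = model['camera']['params'][:4]     # OPENCV intrinsics (distortion follows)
frame, pose = next(iter(model['images'].items()))  # one (frame_name, [qw .. tz]) entry
w2c = np.eye(4)
w2c[:3, :3] = qvec_to_rotmat(*pose[:4])
w2c[:3, 3] = pose[4:]
c2w = np.linalg.inv(w2c)                           # camera-to-world, e.g. for NeRF input
print(frame, c2w[:3, 3])                           # camera centre in world coordinates
```

The axis-convention flip mentioned above ((+x, +y, +z) to (+x, -y, -z)) is a separate, framework-specific step and is not applied here.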
70 | 71 | ### Example: Project a 3D line onto EPIC-KITCHENS images using camera poses 72 | 73 | ```bash 74 | python tools/project_3d_line.py \ 75 | --json-data example_data/P28_101.json \ 76 | --line-data example_data/P28_101_line.json \ 77 | --frames-root example_data/P28_101/ 78 | ``` 79 |
80 | Click to see the example output 81 | ![line](example_data/example_output_line.jpg) 82 |
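Conceptually, the script maps each 3D endpoint through the stored world-to-camera pose and the pinhole intrinsics. The sketch below illustrates that mapping; it is a simplification (lens distortion is ignored, the frame key is assumed to match the example frames, and it reuses the `qvec_to_rotmat` helper from the previous sketch). See `tools/project_3d_line.py` for the actual implementation:

```python
# Sketch only: project the two 3D line endpoints into one frame of P28_101.
import json
import numpy as np

with open('example_data/P28_101.json') as fp:
    model = json.load(fp)
with open('example_data/P28_101_line.json') as fp:
    endpoints = np.array(json.load(fp), dtype=float).reshape(2, 3)  # [x1 y1 z1], [x2 y2 z2]

fx, fy, cx, cy = model['camera']['params'][:4]
frame = 'frame_0000000080.jpg'              # assumed key; adjust to a frame present in the JSON
qw, qx, qy, qz, tx, ty, tz = model['images'][frame]
R = qvec_to_rotmat(qw, qx, qy, qz)          # helper defined in the previous sketch
t = np.array([tx, ty, tz])

for X in endpoints:
    x_cam = R @ X + t                       # world -> camera coordinates
    u = fx * x_cam[0] / x_cam[2] + cx       # pinhole projection (distortion ignored)
    v = fy * x_cam[1] / x_cam[2] + cy
    print(frame, round(u, 1), round(v, 1))  # pixel coordinates in the 456x256 image
```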
83 | 84 | To draw a 3D line, one option is to download the COLMAP format data and use COLMAP GUI to click on points. 85 | 86 | 87 | --- 88 | 89 | 90 | 91 | # Reconstruction Pipeline 92 | 93 | This section contains the pipeline for the dataset introduced in our paper, "EPIC Fields: Marrying 3D Geometry and Video Understanding." We aim to bridge the domains of 3D geometry and video understanding, leading to innovative advancements in both areas. 94 | 95 | ## Steps for EPIC-KITCHENS Reconstruction 96 | 97 | This section outlines the procedure to achieve the [EPIC-KITCHENS](https://epic-kitchens.github.io) reconstructions using our methodology. 98 | 99 | ### Step 0: Prerequisites and Initial Configuration 100 | 101 | #### 1. Installing COLMAP (preferably with CUDA support) 102 | 103 | To efficiently process and reconstruct the frames, it's recommended to install COLMAP with CUDA support, which accelerates the reconstruction process using NVIDIA GPUs. 104 | 105 | You can download and install COLMAP from their official website. For detailed installation instructions, especially on how to enable CUDA support, refer to the [COLMAP installation guide](https://colmap.github.io/install.html). 106 | 107 | #### 2. Cloning the Repository 108 | 109 | To proceed with the subsequent steps, you'll need to clone the current repository. Run the following commands: 110 | 111 | ```bash 112 | git clone https://github.com/epic-kitchens/epic-fields-code.git 113 | cd epic-fields-code 114 | ``` 115 | 116 | #### 3. Downloading Vocabulary Trees 117 | COLMAP utilizes vocabulary trees for efficient image matching. Create a directory called vocab_bins and download the required Vocabulary Trees into this directory: 118 | ```bash 119 | mkdir vocab_bins 120 | cd vocab_bins 121 | wget https://demuc.de/colmap/vocab_tree_flickr100K_words32K.bin 122 | cd .. 123 | ``` 124 | #### 4. Installing `pycolmap` package 125 | 126 | The `pycolmap` package will be used to gather statistics from the model later on. Install it using `pip` (assuming that you've created an environment): 127 | 128 | ```bash 129 | pip install pycolmap 130 | ``` 131 | ### Step 1: Downloading Video Frames 132 | 133 | To utilize the EPIC Fields pipeline, the first step is to acquire the necessary video frames. We're particularly interested in the RGB frames from EPIC-KITCHENS. You can download the entire collection from [EPIC-KITCHENS](https://epic-kitchens.github.io). 134 | 135 | For demonstration purposes, we'll guide you through downloading the `P15_12` video RGB frames. 136 | 137 | ##### Demo: Downloading and Extracting `P15_12` Video Frames 138 | 139 | Execute the following shell commands to download and extract the RGB frames: 140 | 141 | ```bash 142 | # Download the tarball 143 | wget https://data.bris.ac.uk/datasets/3h91syskeag572hl6tvuovwv4d/frames_rgb_flow/rgb/train/P15/P15_12.tar 144 | 145 | # Create the desired directory structure 146 | mkdir -p P15/P15_12 147 | 148 | # Extract the frames into the specified directory 149 | tar -xf P15_12.tar -C P15/P15_12 150 | ``` 151 | This will place all the .jpg frames inside the P15/P15_12 directory. 152 | 153 | ##### Directory Structure Confirmation 154 | 155 | After downloading and extracting, your directory structure should look like this (which is [EPIC-KITCHENS](https://epic-kitchens.github.io) format : 156 | ``` 157 | /root-directory/ 158 | │ 159 | └───PXX/ 160 | │ 161 | └───PXX_YY(Y)/ 162 | │ frame_000001.jpg 163 | │ frame_000002.jpg 164 | │ ... 
165 | ``` 166 | For our P15_12 example, this would be: 167 | ``` 168 | /root-directory/ 169 | │ 170 | └───P15/ 171 | │ 172 | └───P15_12/ 173 | │ frame_000001.jpg 174 | │ frame_000002.jpg 175 | │ ... 176 | ``` 177 | 178 | This structure ensures a consistent format for the pipeline to process the frames effectively. 179 | 180 | ### Step 2: Specifying Videos for Reconstruction 181 | 182 | Update the `input_videos.txt` file in the repository to list the video identifiers you wish to process. In our demo example, we put P15_12 in the file. If you have multiple files, please ensure each video identifier is on a separate line. 183 | 184 | 185 | ### Step 3: Running the Homography-Based Frame Sampling 186 | 187 | Execute the `select_sparse_frames.py` script to perform homography-based sampling of the frames. 188 | 189 | ##### Script Parameters: 190 | 191 | - `--input_videos`: Path to the file containing a list of videos to be processed. Default: `input_videos.txt` 192 | - `--epic_kithens_root`: Directory path to the EPIC-KITCHENS images. Default: `.` 193 | - `--sampled_images_path`: Directory where the sampled image files will be stored. Default: `sampled_frames` 194 | - `--homography_overlap`: Threshold for the homography to sample new frames. A higher value will sample more images. Default: `0.9` 195 | - `--max_concurrent`: Maximum number of concurrent processes. Default: `8` 196 | 197 | ##### Example Usage: 198 | 199 | ```bash 200 | python3 select_sparse_frames.py --input_videos input_videos.txt --epic_kithens_root path_to_epic_images --sampled_images_path path_for_sampled_frames 201 | ``` 202 | 203 | ##### Demo: Homography-Based Frame Sampling for `P15_12` Video 204 | 205 | For the demo, using the `P15_12` video you've downloaded into the current directory, run: 206 | 207 | ```bash 208 | python3 select_sparse_frames.py --input_videos input_videos.txt --epic_kithens_root . --sampled_images_path sampled_frames --homography_overlap 0.9 --max_concurrent 8 209 | ``` 210 | 211 | 212 | ### Step 4: Running the COLMAP Sparse Reconstruction 213 | 214 | Execute the `reconstruct_sparse.py` script to perform sparse reconstruction using COLMAP. 215 | 216 | ##### Script Parameters: 217 | 218 | - `--input_videos`: Path to the file containing a list of videos to be processed. Default: `input_videos.txt` 219 | - `--sparse_reconstuctions_root`: Path to store the sparsely reconstructed models. Default: `colmap_models/sparse` 220 | - `--epic_kithens_root`: Directory path to the EPIC-KITCHENS images. Default: `.` 221 | - `--logs_path`: Path where the log files will be stored. Default: `logs/sparse/out_logs_terminal` 222 | - `--summary_path`: Path where the summary files will be stored. Default: `logs/sparse/out_summary` 223 | - `--sampled_images_path`: Directory where the sampled image files are located. Default: `sampled_frames` 224 | - `--gpu_index`: Index of the GPU to be used. 
Default: `0` 225 | 226 | ##### Example Usage: 227 | ```bash 228 | python3 reconstruct_sparse.py --input_videos input_videos.txt --sparse_reconstuctions_root colmap_models/sparse --epic_kithens_root path_to_epic_images --logs_path logs/sparse/out_logs_terminal --summary_path logs/sparse/out_summary --sampled_images_path path_for_sampled_frames --gpu_index 0 229 | ``` 230 | 231 | #### Demo: Sparse Reconstruction for P15_12 Video 232 | For the demo, using the P15_12 video and the sampled frames in the current directory, run: 233 | 234 | ```bash 235 | python3 reconstruct_sparse.py --input_videos input_videos.txt --sparse_reconstuctions_root colmap_models/sparse --epic_kithens_root . --logs_path logs/sparse/out_logs_terminal --summary_path logs/sparse/out_summary --sampled_images_path sampled_frames --gpu_index 0 236 | ``` 237 | 238 | ### Understanding the Output File Structure 239 | 240 | After running the sparse reconstruction demo, you'll notice the following directory hierarchy: 241 | ``` 242 | logs/ 243 | │ 244 | └───sparse/ 245 | │ 246 | ├───out_logs_terminal/ 247 | │ │ P15_12__reconstruct_sparse.out 248 | │ │ ... 249 | │ 250 | └───out_summary/ 251 | │ P15_12.out 252 | │ ... 253 | 254 | ``` 255 | #### Sparse Model Directory: 256 | The sparsely reconstructed model for our demo video P15_12 will be found in: ```colmap_models/sparse/P15_12``` 257 | 258 | #### Logs Directory: 259 | The "logs" directory provides insights into the sparse reconstruction process: 260 | 261 | - COLMAP Execution Logs (out_logs_terminal): These logs capture details from the COLMAP execution and can be helpful for debugging. For our demo video P15_12, the respective log file would be named something like: ```logs/sparse/out_logs_terminal/P15_12__reconstruct_sparse.out``` 262 | 263 | - Sparse Model Summary (out_summary): This directory contains a summary of the sparse model's statistics. For our demo video P15_12, the summary file is ```logs/sparse/out_summary/P15_12.out``` 264 | By examining the P15_12.out file, you can gain insights into how well the reconstruction process performed for that specific video and the excution time. 265 | 266 | 267 | ### Step 5: Registering All Frames into the Sparse Model 268 | 269 | For this step, you'll use the `register_dense.py` script. This script registers all the frames with the sparse model, preparing them for a dense reconstruction. 270 | 271 | ##### Script Parameters: 272 | 273 | - `--input_videos`: Path to the file containing a list of videos to be processed. Default: `input_videos.txt` 274 | - `--sparse_reconstuctions_root`: Directory path to the sparsely reconstructed models. Default: `colmap_models/sparse` 275 | - `--dense_reconstuctions_root`: Directory path to the densely registered models. Default: `colmap_models/dense` 276 | - `--epic_kithens_root`: Directory path to the EPIC-KITCHENS images. Default: `.` 277 | - `--logs_path`: Directory where the log files of the dense registration will be stored. Default: `logs/dense/out_logs_terminal` 278 | - `--summary_path`: Directory where the summary files of the dense registration will be stored. Default: `logs/dense/out_summary` 279 | - `--gpu_index`: Index of the GPU to use. Default: `0` 280 | 281 | #### Demo: Registering Frames into Sparse Model for Video `P15_12` 282 | 283 | To demonstrate the registration process using the `register_dense.py` script, let's use the sample video `P15_12` as an example. 
284 | 285 | ```bash 286 | python3 register_dense.py --input_videos input_videos.txt --sparse_reconstuctions_root colmap_models/sparse --dense_reconstuctions_root colmap_models/dense --epic_kithens_root . --logs_path logs/dense/out_logs_terminal --summary_path logs/dense/out_summary --gpu_index 0 287 | ``` 288 | 289 | Assuming input_videos.txt contains the entry for P15_12, the above command will register all frames from the P15_12 video with the sparse model stored under colmap_models/sparse, the new registered model will be saved under colmap_models/dense. The logs and summary for this registration process will be saved under the logs/dense/out_logs_terminal and logs/dense/out_summary directories, respectively. 290 | 291 | After executing the command, you can check the log files and summary for insights and statistics on the registration process for the P15_12 video. 292 | 293 | # Reconstruction Pipeline: Quick Demo 294 | 295 | Here we provide another demo script `demo/demo.py` 296 | that works on a video directly, summarising all above steps into one file. 297 | 298 | ``` 299 | python demo/demo.py video.mp4 300 | ``` 301 | 302 | Please refer to [demo_ego4d.md](demo/demo_ego4d.md) for details. 303 | 304 | 305 | 306 | # Additional info 307 | 308 | ## Credit 309 | 310 | Code prepared by Zhifan Zhu, Ahmad Darkhalil and Vadim Tschernezki. 311 | 312 | ## Citation 313 | If you find this work useful please cite our paper: 314 | 315 | ``` 316 | @article{EPICFIELDS2023, 317 | title={{EPIC-FIELDS}: {M}arrying {3D} {G}eometry and {V}ideo {U}nderstanding}, 318 | author={Tschernezki, Vadim and Darkhalil, Ahmad and Zhu, Zhifan and Fouhey, David and Larina, Iro and Larlus, Diane and Damen, Dima and Vedaldi, Andrea}, 319 | booktitle = {ArXiv}, 320 | year = {2023} 321 | } 322 | ``` 323 | 324 | Also cite the [EPIC-KITCHENS-100](https://epic-kitchens.github.io) paper where the videos originate: 325 | 326 | ``` 327 | @ARTICLE{Damen2022RESCALING, 328 | title={Rescaling Egocentric Vision: Collection, Pipeline and Challenges for EPIC-KITCHENS-100}, 329 | author={Damen, Dima and Doughty, Hazel and Farinella, Giovanni Maria and and Furnari, Antonino 330 | and Ma, Jian and Kazakos, Evangelos and Moltisanti, Davide and Munro, Jonathan 331 | and Perrett, Toby and Price, Will and Wray, Michael}, 332 | journal = {International Journal of Computer Vision (IJCV)}, 333 | year = {2022}, 334 | volume = {130}, 335 | pages = {33–55}, 336 | Url = {https://doi.org/10.1007/s11263-021-01531-2} 337 | } 338 | ``` 339 | For more information on the project and related research, please visit the [EPIC-Kitchens' EPIC Fields page](https://epic-kitchens.github.io/epic-fields/). 340 | 341 | 342 | ## License 343 | All files in this dataset are copyright by us and published under the 344 | Creative Commons Attribution-NonCommerial 4.0 International License, found 345 | [here](https://creativecommons.org/licenses/by-nc/4.0/). 346 | This means that you must give appropriate credit, provide a link to the license, 347 | and indicate if changes were made. You may do so in any reasonable manner, 348 | but not in any way that suggests the licensor endorses you or your use. You 349 | may not use the material for commercial purposes. 350 | 351 | ## Contact 352 | 353 | For general enquiries regarding this work or related projects, feel free to email us at [uob-epic-kitchens@bristol.ac.uk](mailto:uob-epic-kitchens@bristol.ac.uk). 
354 | 355 | -------------------------------------------------------------------------------- /assets/epic_fields.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/epic-kitchens/epic-fields-code/e5cd4fdfbbf130a9f6266dcb536f17bb52136f64/assets/epic_fields.png -------------------------------------------------------------------------------- /demo/demo.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import os 3 | import os.path as osp 4 | from pathlib import Path 5 | import subprocess 6 | import logging 7 | import pycolmap 8 | 9 | 10 | def parse_args(): 11 | import argparse 12 | parser = argparse.ArgumentParser() 13 | parser.add_argument('video_path', type=str) 14 | return parser.parse_args() 15 | 16 | 17 | def setup_logger(name, log_file, level=logging.DEBUG): 18 | """To setup as many loggers as you want""" 19 | 20 | handler = logging.FileHandler(log_file, mode='a') 21 | formatter = logging.Formatter('%(asctime)s,%(msecs)d %(name)s %(levelname)s %(message)s', 22 | datefmt='%Y-%m-%d %H:%M:%S') 23 | handler.setFormatter(formatter) 24 | logger = logging.getLogger(name) 25 | logger.setLevel(level) 26 | logger.addHandler(handler) 27 | 28 | return logger 29 | 30 | 31 | class PipelineExecutor: 32 | """ 33 | Output structure we need are: 34 | 35 | / 36 | pipeline.log 37 | sparse.log 38 | register.log 39 | dense_pcd.log 40 | colmap/ 41 | sparse/{max_model_id}/{cameras.bin,points.bin,images.bin} 42 | registered/{cameras.bin,points.bin,images.bin} 43 | dense/dense.ply 44 | """ 45 | 46 | def __init__(self, 47 | video_path: str, 48 | out_dir: str, 49 | longside: int = 512, 50 | camera_model: str = 'OPENCV', 51 | make_log_and_dirs=True, 52 | ): 53 | """ 54 | Args: 55 | video_path: path to the video 56 | camera_model: See Colmap doc 57 | longside: this controls the frame resolution for the extracted frames 58 | """ 59 | self.worker_dir = Path(out_dir) 60 | self.video_file = video_path 61 | self.camera_model = camera_model 62 | self.longside = longside 63 | 64 | self.frames_dir = self.worker_dir / 'frames' 65 | self.homo_path = self.worker_dir / 'homo90.txt' 66 | self.colmap_dir = self.worker_dir / 'colmap' 67 | self.pipeline_log = self.worker_dir / 'pipeline.log' 68 | self.sparse_log = self.worker_dir / 'sparse.log' 69 | self.register_log = self.worker_dir / 'register.log' 70 | self.dense_pcd_log = self.worker_dir / 'dense_pcd.log' 71 | 72 | self.sparse_dir = self.colmap_dir / 'sparse' # generated by colmap 73 | self.register_dir = self.colmap_dir / 'registered' 74 | self.dense_pcd_dir = self.colmap_dir / 'dense' 75 | 76 | if not make_log_and_dirs: 77 | return 78 | os.makedirs(self.worker_dir, exist_ok=True) 79 | self.logger = setup_logger('demo-logger', self.pipeline_log) 80 | self.logger.info("Run start") 81 | assert os.path.exists(self.pipeline_log) 82 | 83 | def extract_frames(self, with_skip=True): 84 | # num_expected_frames = -1 85 | if with_skip and os.path.exists(self.frames_dir) and len(os.listdir(self.frames_dir)) > 0: 86 | print(f'{self.frames_dir} exist and is non-empty, skip') 87 | return 88 | 89 | cmd1 = [ 90 | 'ffprobe', '-v', 'error', '-select_streams', 'v:0', '-show_entries', 'stream=width,height', '-of', 'csv=s=x:p=0', self.video_file 91 | ] 92 | # extract output resolution 93 | p = subprocess.Popen(cmd1, stdout=subprocess.PIPE, stderr=subprocess.PIPE) 94 | out, err = p.communicate() 95 | print('Original resolution: ', out) 96 | w, h = 
out.decode('utf-8').strip().split('x') 97 | h, w = int(h), int(w) 98 | assert w > h 99 | h = h * self.longside // w 100 | w = self.longside 101 | 102 | s = f'{w}x{h}' 103 | os.makedirs(self.frames_dir, exist_ok=True) 104 | 105 | print("Extracting frames... ") 106 | cmd2 = [ 107 | 'ffmpeg', '-i', self.video_file, '-q:v', '1', '-vf', 'fps=30', '-s', s, f'{self.frames_dir}/frame_%010d.jpg'] 108 | cmd2 = ' '.join(cmd2) 109 | p = subprocess.call(cmd2, shell=True) 110 | self.logger.info(f'Extract frames done') 111 | 112 | def run_homography(self): 113 | self.logger.info(f'Run homography') 114 | cmd = [ 115 | 'python', 'homography_filter/filter.py', '--src', 116 | str(self.frames_dir), '--dst_file', str(self.homo_path), '--overlap', '0.9' 117 | ] 118 | print(' '.join(cmd)) 119 | if os.path.exists(self.homo_path): 120 | with open(self.homo_path, 'r') as fp: 121 | lines = fp.readlines() 122 | n_lines = len(lines) 123 | print(f'{self.homo_path} with {n_lines}, skip') 124 | self.logger.info(f'{self.homo_path} with {n_lines}, skip') 125 | return 126 | cmd = ' '.join(cmd) 127 | self.logger.info(cmd) 128 | p = subprocess.call(cmd, shell=True) 129 | self.logger.info(f'Homography Done') 130 | 131 | def run_sparse_reconstruct(self, script_path='demo/reconstruct_sparse.sh'): 132 | status = self.get_summary() 133 | if status['num_sparse_models'] > 0: 134 | self.logger.info(f'Found {status["num_sparse_models"]} sparse models, skip sparse reconstruction()') 135 | print(f'Found {status["num_sparse_models"]} sparse models, skip sparse reconstruction()') 136 | return 137 | self.logger.info(f'Run sparse') 138 | cmd = [ 139 | 'bash', script_path, 140 | str(self.worker_dir), str(self.camera_model) 141 | ] 142 | print(' '.join(cmd)) 143 | print('Check sparse log at ', self.sparse_log) 144 | self.logger.info(' '.join(cmd)) 145 | with open(self.sparse_log, 'w') as sparse_fp: 146 | p = subprocess.run(cmd, stdout=sparse_fp, stderr=sparse_fp) 147 | # out, err = p.communicate() 148 | if p.returncode != 0: 149 | print(f'Error in sparse reconstruction. 
See {self.sparse_log}') 150 | sys.exit(1) 151 | self.logger.info(f'Done sparse') 152 | 153 | def run_register(self, script_path='demo/register_dense.sh'): 154 | summary = self.get_summary() 155 | max_sparse_ind = summary['max_sparse_ind'] 156 | if summary['num_register'] > 0: 157 | print(f'Found {summary["num_register"]} already registered, skiping') 158 | return 159 | self.logger.info(f'Run Register') 160 | cmd = [ 161 | 'bash', script_path, 162 | str(self.worker_dir), str(self.camera_model), str(max_sparse_ind) 163 | ] 164 | print(' '.join(cmd)) 165 | self.logger.info(' '.join(cmd)) 166 | with open(self.register_log, 'w') as register_fp: 167 | p = subprocess.run(cmd, stdout=register_fp, stderr=register_fp) 168 | self.logger.info(f'Done Register') 169 | 170 | def run_dense_pcd(self, script_path='demo/dense_point_cloud.sh'): 171 | summary = self.get_summary() 172 | max_sparse_ind = summary['max_sparse_ind'] 173 | if os.path.exists(self.dense_pcd_dir / 'fused.ply'): 174 | print(f'fused.ply already exist in {self.dense_pcd_dir}, skiping') 175 | return 176 | self.logger.info(f'Run Dense PCD (patch stereo)') 177 | cmd = [ 178 | 'bash', script_path, 179 | str(self.worker_dir), str(max_sparse_ind) 180 | ] 181 | print(' '.join(cmd)) 182 | self.logger.info(' '.join(cmd)) 183 | with open(self.dense_pcd_log, 'w') as dense_pcd_fp: 184 | p = subprocess.run(cmd, stdout=dense_pcd_fp, stderr=dense_pcd_fp) 185 | self.logger.info(f'Done Dense PCD') 186 | 187 | def execute(self): 188 | self.extract_frames() 189 | self.run_homography() 190 | if not osp.exists(self.homo_path): 191 | print(f'{self.homo_path} not exist after homography, abort') 192 | return 193 | self.run_sparse_reconstruct() 194 | if not self.get_summary()['num_sparse_models'] > 0: 195 | print(f"num_sparse_models <= 0 after sparse reconstruction, abort") 196 | return 197 | self.run_register() 198 | self.run_dense_pcd() 199 | 200 | def get_summary(self) -> dict: 201 | """ 202 | N-frames, N-homo, N-sparse-models, max_sparse_ind, N-sparse-images, N-register 203 | """ 204 | info = dict( 205 | video=self.video_file, 206 | num_frames=-1, num_homo=-1, num_sparse_models=-1, 207 | max_sparse_ind=-1, num_sparse_images=-1, num_register=-1 208 | ) 209 | info['num_frames'] = len(os.listdir(self.frames_dir)) 210 | if not os.path.exists(self.homo_path): 211 | return info 212 | 213 | with open(self.homo_path) as fp: 214 | info['num_homo'] = len(fp.readlines()) 215 | 216 | if not osp.exists(self.sparse_dir): 217 | return info 218 | 219 | info['num_sparse_models'] = len(os.listdir(self.sparse_dir)) 220 | for mod in os.listdir(self.sparse_dir): 221 | mod_path = osp.join(self.sparse_dir, mod) 222 | recon = pycolmap.Reconstruction(mod_path) 223 | num_images = recon.num_images() 224 | if num_images > info['num_sparse_images']: 225 | info['num_sparse_images'] = num_images 226 | info['max_sparse_ind'] = mod # str 227 | 228 | reg_path = osp.join(self.register_dir) 229 | if not osp.exists(osp.join(reg_path, 'images.bin')): 230 | return info 231 | recon = pycolmap.Reconstruction(reg_path) 232 | num_images = recon.num_images() 233 | info['num_register'] = num_images 234 | 235 | return info 236 | 237 | if __name__ == '__main__': 238 | args = parse_args() 239 | executor = PipelineExecutor( 240 | args.video_path, out_dir='outputs/demo/', 241 | longside=512) 242 | executor.execute() -------------------------------------------------------------------------------- /demo/demo_ego4d.md: -------------------------------------------------------------------------------- 1 | # 
Reconstruction Pipeline: Demo on Ego4D 2 | 3 | This `demo/demo.py` works on a video directly. 4 | 5 | Assume the environment is set up as described in [Step 0](/README.md#step-0-prerequisites-and-initial-configuration), 6 | and the video file is named `video.mp4`. 7 | Run the demo with: 8 | 9 | ``` 10 | python demo/demo.py video.mp4 11 | ``` 12 | 13 | You will find the results in `outputs/demo/colmap/`: 14 | the file `outputs/demo/colmap/registered/images.bin` stores (nearly) all camera poses; 15 | the file `outputs/demo/colmap/dense/fused.ply` stores the dense point cloud of the scene. 16 | There are also log files `outputs/demo/*.log` to monitor the progress. 17 | 18 | You should now inspect (visualise) the results using: 19 | ``` 20 | # Tested with open3d==0.16.0 21 | python3 tools/visualize_colmap_open3d.py \ 22 | --model outputs/demo/colmap/registered \ 23 | --pcd-path outputs/demo/colmap/dense/fused.ply 24 | ``` 25 | Note that `outputs/demo/colmap/registered/images.bin` might be slow to load. In practice, we visualise the key-frames: 26 | ``` 27 | python3 tools/visualize_colmap_open3d.py \ 28 | --model outputs/demo/colmap/sparse/0 \ 29 | --pcd-path outputs/demo/colmap/dense/fused.ply 30 | # Note: See the COLMAP documentation for what `sparse/0` means exactly. 31 | ``` 32 | 33 | ### What does this `demo/demo.py` do? 34 | 35 | Specifically, the `demo/demo.py` file does the following sequentially: 36 | - Extract frames using `ffmpeg` with the long side resized to 512 px. This is analogous to Steps 1 & 2 in [Reconstruction Pipeline](/README.md#reconstruction-pipeline). 37 | - Compute important frames via homography. This corresponds to Step 3 above. 38 | - Perform the _sparse reconstruction_. This corresponds to Step 4 above. 39 | - at the end of this step, you should inspect the sparse result to make sure it makes sense. 40 | - Perform the _dense frame registration_. This corresponds to Step 5 above. 41 | - at the end of this, you will have all the camera poses. 42 | - Compute the dense point cloud using COLMAP's patch_match_stereo. This gives you the pretty dense point cloud you see in the teaser image. 43 | 44 | ### Example: Ego4D videos 45 | 46 | We demo this script on the following two Ego4D videos: 47 | - Task: Cooking — 10 minutes. Ego4D uid = `id18f5c2be-cb79-46fa-8ff1-e03b7e26c986`. Demo output on YouTube: https://youtu.be/GfBsLnZoFGs 48 | - The pipeline running time for this video is 4 hours. 49 | - As a sanity check, the file `homo90.txt` after the homography step contains *1522* frames. 50 | - Task: Construction — 35 minutes of decorating and refurbishment. Ego4D uid = `a2dd8a8f-835f-4068-be78-99d38ad99625`. Demo output on YouTube: https://youtu.be/EZlayZIwNgQ 51 | - The pipeline running time for this video breaks down as follows: 52 | - Extract frames: 5 mins 53 | - Homography filter: 1 hour 54 | - Sparse reconstruction: **20 hours** 55 | - Dense register: 1.5 hours 56 | - Dense point-cloud generation: 2 hours 57 | 58 | ### Tips for running the demo script 59 | 60 | We rely on COLMAP, but no tool is perfect. In case of failure, check: 61 | - If the resulting point cloud is not geometrically correct, e.g. the ground is clearly not flat, try to re-run from the sparse reconstruction step. 62 | COLMAP has some stochastic behaviour when choosing the initial views. 63 | - If the above fails again, try increasing the `--overlap` in the homography filter to e.g. 0.95. This will increase the number of important frames, at the cost of increasing the running time during sparse reconstruction.
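When debugging a run, it also helps to check the intermediate outputs programmatically. The snippet below is a small sketch (using the same `pycolmap` calls that `demo/demo.py` uses internally, and assuming the default `outputs/demo/` output directory) that counts the selected key-frames and the registered images:

```python
# Sketch only: sanity-check the demo outputs with pycolmap, as demo/demo.py does internally.
import pycolmap

with open('outputs/demo/homo90.txt') as fp:
    print('key-frames after homography:', len(fp.readlines()))

recon = pycolmap.Reconstruction('outputs/demo/colmap/registered')
print('registered images:', recon.num_images())  # should approach the total frame count
```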
64 | 65 | 66 | ### Visualise a video of camera poses 67 | 68 | To produce a video of the camera poses and trajectory over time (see e.g. the YouTube videos above), follow the steps below: 69 |
70 | Click to see steps 71 |
1. Visualise the result again with the Open3D GUI:
   `python3 tools/visualize_colmap_open3d.py --model outputs/demo/colmap/sparse/0 --pcd-path outputs/demo/colmap/dense/fused.ply`
2. In the Open3D GUI, press Ctrl-C (Linux) / Cmd-C (Mac) to copy the view to the system clipboard. Go to any editor, press Ctrl-V/Cmd-V to paste the view status, and save the file to `outputs/demo/view.json`.
3. Run the following script to produce the video:
   `python utils/hovering/hover_open3d.py --model outputs/demo/colmap/registered --pcd-path outputs/demo/colmap/dense/fused.ply --view-path outputs/demo/view.json`
   The produced video is at `outputs/hovering/out.mp4`.
80 |
81 | -------------------------------------------------------------------------------- /demo/dense_point_cloud.sh: -------------------------------------------------------------------------------- 1 | 2 | WORK_DIR=$1 3 | SPARSE_INDEX=$2 4 | 5 | IMG_PATH=$WORK_DIR/frames 6 | INPUT_PATH=$WORK_DIR/colmap/sparse/$SPARSE_INDEX 7 | OUTPUT_PATH=$WORK_DIR/colmap/dense 8 | 9 | OLD_DIR=$(pwd) 10 | 11 | mkdir -p $OUTPUT_PATH 12 | 13 | colmap image_undistorter \ 14 | --image_path $IMG_PATH \ 15 | --input_path $INPUT_PATH \ 16 | --output_path $OUTPUT_PATH \ 17 | --output_type COLMAP \ 18 | --max_image_size 1000 \ 19 | 20 | cd $OUTPUT_PATH 21 | 22 | colmap patch_match_stereo \ 23 | --workspace_path . \ 24 | --workspace_format COLMAP \ 25 | --PatchMatchStereo.max_image_size=1000 \ 26 | --PatchMatchStereo.gpu_index=0,1 \ 27 | --PatchMatchStereo.cache_size=32 \ 28 | --PatchMatchStereo.geom_consistency false \ 29 | 30 | colmap stereo_fusion \ 31 | --workspace_path . \ 32 | --workspace_format COLMAP \ 33 | --input_type photometric \ 34 | --output_type PLY \ 35 | --output_path ./fused.ply \ 36 | 37 | # For geometric consistency, do the following lines instead 38 | # colmap patch_match_stereo \ 39 | # --workspace_path . \ 40 | # --workspace_format COLMAP \ 41 | # --PatchMatchStereo.max_image_size=1000 \ 42 | # --PatchMatchStereo.gpu_index=0,1 \ 43 | # --PatchMatchStereo.cache_size=32 \ 44 | # --PatchMatchStereo.geom_consistency false \ 45 | 46 | # colmap stereo_fusion \ 47 | # --workspace_path . \ 48 | # --workspace_format COLMAP \ 49 | # --input_type photometric \ 50 | # --output_type PLY \ 51 | # --output_path ./fused.ply \ 52 | 53 | cd $OLD_DIR -------------------------------------------------------------------------------- /demo/reconstruct_sparse.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | start=`date +%s` 3 | 4 | WORK_DIR=$1 5 | CAMERA_MODEL=$2 # OPENCV or OPENCV_FISHEYE 6 | GPU_IDX=0 7 | 8 | IMGS_DIR=$WORK_DIR/frames 9 | OUT_DIR=${WORK_DIR}/colmap 10 | 11 | DB_PATH=${OUT_DIR}/database.db 12 | SPARSE_DIR=${OUT_DIR}/sparse 13 | 14 | mkdir -p ${OUT_DIR} 15 | mkdir -p ${SPARSE_DIR} 16 | 17 | #SIMPLE_PINHOLE 18 | colmap feature_extractor \ 19 | --database_path ${DB_PATH} \ 20 | --ImageReader.camera_model $CAMERA_MODEL \ 21 | --image_list_path $WORK_DIR/homo90.txt \ 22 | --ImageReader.single_camera 1 \ 23 | --SiftExtraction.use_gpu 1 \ 24 | --SiftExtraction.gpu_index $GPU_IDX \ 25 | --image_path $IMGS_DIR \ 26 | 27 | colmap sequential_matcher \ 28 | --database_path ${DB_PATH} \ 29 | --SiftMatching.use_gpu 1 \ 30 | --SequentialMatching.loop_detection 1 \ 31 | --SiftMatching.gpu_index $GPU_IDX \ 32 | --SequentialMatching.vocab_tree_path vocab_bins/vocab_tree_flickr100K_words32K.bin \ 33 | 34 | colmap mapper \ 35 | --database_path ${DB_PATH} \ 36 | --image_path $IMGS_DIR \ 37 | --output_path ${SPARSE_DIR} \ 38 | --image_list_path $WORK_DIR/homo90.txt \ 39 | #--Mapper.ba_global_use_pba 1 \ 40 | #--Mapper.ba_global_pba_gpu_index 0 1 \ 41 | 42 | 43 | end=`date +%s` 44 | 45 | runtime=$(((end-start)/60)) 46 | echo "$runtime minutes" 47 | -------------------------------------------------------------------------------- /demo/register_dense.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | start=`date +%s` 3 | 4 | GPU_IDX=0 5 | 6 | WORK_DIR=$1 7 | CAMERA_MODEL=$2 8 | MAX_SPARSE_IND=$3 9 | IMGS_DIR=$WORK_DIR/frames 10 | OUT_DIR=${WORK_DIR}/colmap 11 | 12 | DB_PATH=${OUT_DIR}/database.db 13 | 
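# Registration re-uses the key-frame database built during the sparse stage:
# the database is copied, SIFT features are extracted for *all* frames into the copy,
# frames are matched sequentially (with loop detection against the vocabulary tree),
# and image_registrator then adds the new frames to the selected sparse model,
# writing the result to colmap/registered.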
SPARSE_DIR=${OUT_DIR}/sparse 14 | 15 | REG_DIR=${OUT_DIR}/registered 16 | mkdir -p $REG_DIR 17 | 18 | VIDEOUID=`basename $WORK_DIR` 19 | REG_DB_PATH=${OUT_DIR}/reg${VIDEOUID}.db 20 | echo $VIDOEUID $REG_DB_PATH 21 | rm -f $REG_DB_PATH $REG_DB_PATH-shm $REG_DB_PATH-wal 22 | cp $DB_PATH $REG_DB_PATH 23 | 24 | colmap feature_extractor \ 25 | --database_path ${REG_DB_PATH} \ 26 | --ImageReader.camera_model $CAMERA_MODEL \ 27 | --ImageReader.single_camera 1 \ 28 | --ImageReader.existing_camera_id 1 \ 29 | --SiftExtraction.use_gpu 1 \ 30 | --SiftExtraction.gpu_index $GPU_IDX \ 31 | --image_path $IMGS_DIR 32 | 33 | colmap sequential_matcher \ 34 | --database_path ${REG_DB_PATH} \ 35 | --SiftMatching.use_gpu 1 \ 36 | --SequentialMatching.loop_detection 1 \ 37 | --SiftMatching.gpu_index $GPU_IDX \ 38 | --SequentialMatching.vocab_tree_path vocab_bins/vocab_tree_flickr100K_words32K.bin \ 39 | 40 | colmap image_registrator \ 41 | --database_path $REG_DB_PATH \ 42 | --input_path $SPARSE_DIR/$MAX_SPARSE_IND \ 43 | --output_path $REG_DIR \ 44 | 45 | # Release space after successful registration 46 | if [ -e $REG_DIR/images.bin ]; then 47 | rm -f $REG_DB_PATH $REG_DB_PATH-shm $REG_DB_PATH-wal 48 | fi 49 | 50 | end_reg=`date +%s` 51 | 52 | runtime=$(((end_reg-start)/60)) 53 | echo "$runtime minutes" 54 | 55 | -------------------------------------------------------------------------------- /example_data/P04_01_line.json: -------------------------------------------------------------------------------- 1 | [ 2 | -3.028, 4.60835, 3.67792, 3 | 0.199998, 0.291596, 5.56575 4 | ] 5 | -------------------------------------------------------------------------------- /example_data/P04_01_line.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/epic-kitchens/epic-fields-code/e5cd4fdfbbf130a9f6266dcb536f17bb52136f64/example_data/P04_01_line.png -------------------------------------------------------------------------------- /example_data/P06_09_line.json: -------------------------------------------------------------------------------- 1 | [ 2 | 11.5486, 1.00723, 3.13634, 3 | -2.84154, 0.720368, 6.66926 4 | ] -------------------------------------------------------------------------------- /example_data/P06_09_line.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/epic-kitchens/epic-fields-code/e5cd4fdfbbf130a9f6266dcb536f17bb52136f64/example_data/P06_09_line.png -------------------------------------------------------------------------------- /example_data/P12_101_line.json: -------------------------------------------------------------------------------- 1 | [ 2 | 2.44827, 0.0581669, 8.20895, 3 | -7.4244, 3.82762, 7.32 4 | ] -------------------------------------------------------------------------------- /example_data/P12_101_line.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/epic-kitchens/epic-fields-code/e5cd4fdfbbf130a9f6266dcb536f17bb52136f64/example_data/P12_101_line.png -------------------------------------------------------------------------------- /example_data/P28_101/frame_0000000080.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/epic-kitchens/epic-fields-code/e5cd4fdfbbf130a9f6266dcb536f17bb52136f64/example_data/P28_101/frame_0000000080.jpg -------------------------------------------------------------------------------- 
/example_data/P28_101/frame_0000000085.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/epic-kitchens/epic-fields-code/e5cd4fdfbbf130a9f6266dcb536f17bb52136f64/example_data/P28_101/frame_0000000085.jpg -------------------------------------------------------------------------------- /example_data/P28_101/frame_0000000090.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/epic-kitchens/epic-fields-code/e5cd4fdfbbf130a9f6266dcb536f17bb52136f64/example_data/P28_101/frame_0000000090.jpg -------------------------------------------------------------------------------- /example_data/P28_101/frame_0000000095.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/epic-kitchens/epic-fields-code/e5cd4fdfbbf130a9f6266dcb536f17bb52136f64/example_data/P28_101/frame_0000000095.jpg -------------------------------------------------------------------------------- /example_data/P28_101/frame_0000000100.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/epic-kitchens/epic-fields-code/e5cd4fdfbbf130a9f6266dcb536f17bb52136f64/example_data/P28_101/frame_0000000100.jpg -------------------------------------------------------------------------------- /example_data/P28_101/frame_0000000105.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/epic-kitchens/epic-fields-code/e5cd4fdfbbf130a9f6266dcb536f17bb52136f64/example_data/P28_101/frame_0000000105.jpg -------------------------------------------------------------------------------- /example_data/P28_101/frame_0000000110.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/epic-kitchens/epic-fields-code/e5cd4fdfbbf130a9f6266dcb536f17bb52136f64/example_data/P28_101/frame_0000000110.jpg -------------------------------------------------------------------------------- /example_data/P28_101/frame_0000000115.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/epic-kitchens/epic-fields-code/e5cd4fdfbbf130a9f6266dcb536f17bb52136f64/example_data/P28_101/frame_0000000115.jpg -------------------------------------------------------------------------------- /example_data/P28_101_line.json: -------------------------------------------------------------------------------- 1 | [ 2 | -2.49927, -0.543869, 2.57086, 3 | 3.32875, -2.17165, 2.4229 4 | ] -------------------------------------------------------------------------------- /example_data/P28_101_line.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/epic-kitchens/epic-fields-code/e5cd4fdfbbf130a9f6266dcb536f17bb52136f64/example_data/P28_101_line.png -------------------------------------------------------------------------------- /example_data/example_output_gui.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/epic-kitchens/epic-fields-code/e5cd4fdfbbf130a9f6266dcb536f17bb52136f64/example_data/example_output_gui.jpg -------------------------------------------------------------------------------- /example_data/example_output_line.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/epic-kitchens/epic-fields-code/e5cd4fdfbbf130a9f6266dcb536f17bb52136f64/example_data/example_output_line.jpg -------------------------------------------------------------------------------- /homography_filter/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/epic-kitchens/epic-fields-code/e5cd4fdfbbf130a9f6266dcb536f17bb52136f64/homography_filter/__init__.py -------------------------------------------------------------------------------- /homography_filter/argparser.py: -------------------------------------------------------------------------------- 1 | 2 | import argparse 3 | 4 | 5 | def parse_args(): 6 | parser = argparse.ArgumentParser() 7 | parser.add_argument( 8 | "--src", 9 | type=str, 10 | ) 11 | parser.add_argument( 12 | "--dst_file", 13 | type=str, 14 | ) 15 | parser.add_argument( 16 | "--overlap", 17 | default=0.9, 18 | type=float, 19 | ) 20 | parser.add_argument( 21 | "--frame_range_min", 22 | default=0, 23 | type=int, 24 | ) 25 | parser.add_argument( 26 | "--frame_range_max", 27 | default=None, 28 | type=int, 29 | ) 30 | parser.add_argument( 31 | "--filtering_scale", 32 | default=1, 33 | type=int, 34 | ) 35 | parser.add_argument( 36 | '-f', 37 | type=str, 38 | default=None 39 | ) 40 | args = parser.parse_args() 41 | return args 42 | -------------------------------------------------------------------------------- /homography_filter/filter.py: -------------------------------------------------------------------------------- 1 | 2 | import os 3 | from glob import glob 4 | import numpy as np 5 | from matplotlib import pyplot as plt 6 | from collections import defaultdict 7 | import time 8 | 9 | from lib import * 10 | from argparser import parse_args 11 | import cv2 12 | 13 | 14 | def make_homography_loader(args): 15 | 16 | images = Images(args.src, scale=args.filtering_scale) 17 | print(f'Found {len(images.imreader.fpaths)} images.') 18 | features = Features(images) 19 | matches = Matches(features) 20 | homographies = Homographies(images, features, matches) 21 | 22 | return homographies 23 | 24 | 25 | def save(fpaths_filtered, args): 26 | imreader = ImageReader(src=args.src) 27 | dir_dst = args.dir_dst 28 | dir_images = os.path.join(dir_dst, 'images') 29 | extract_frames(dir_images, fpaths_filtered, imreader) 30 | save_as_video(os.path.join(dir_dst, 'video'), fpaths_filtered, imreader) 31 | 32 | 33 | if __name__ == '__main__': 34 | 35 | # set filtering to deterministic mode 36 | cv2.setRNGSeed(0) 37 | args = parse_args() 38 | homographies = make_homography_loader(args) 39 | graph = calc_graph(homographies, **vars(args)) 40 | fpaths_filtered = graph2fpaths(graph) 41 | lines = [os.path.basename(v)+'\n' for v in fpaths_filtered] 42 | dir_name = os.path.dirname(args.dst_file) 43 | if not os.path.exists(dir_name): 44 | os.makedirs(dir_name) 45 | with open(args.dst_file, 'w') as fp: 46 | fp.writelines(lines) -------------------------------------------------------------------------------- /homography_filter/lib.py: -------------------------------------------------------------------------------- 1 | 2 | import cv2 as cv 3 | import numpy as np 4 | from matplotlib import pyplot as plt 5 | from collections import defaultdict 6 | import sys 7 | import os 8 | import shutil 9 | from glob import glob 10 | 11 | 12 | if '-f' in sys.argv: 13 | from tqdm.notebook import tqdm 14 | else: 15 | from tqdm import tqdm 16 | 17 | 18 | class Images: 19 | def __init__(self, src, load_grey=True, 
scale=1): 20 | self.images = {} 21 | self.im_size = None 22 | self.src = src 23 | self.scale = scale 24 | if load_grey: 25 | self.imreader = ImageReader(src, scale=scale, cv_flag=cv.IMREAD_GRAYSCALE) 26 | else: 27 | self.imreader = ImageReader(src, scale=scale) 28 | 29 | def __getitem__(self, k): 30 | if k not in self.images: 31 | im = self.imreader[k] 32 | self.images[k] = im 33 | self.im_size = self.images[k].shape[:2] 34 | return self.images[k] 35 | 36 | 37 | class Features: 38 | def __init__(self, images): 39 | self.features = {} 40 | self.images = images 41 | self.sift = cv.SIFT_create() 42 | 43 | def __getitem__(self, k): 44 | if k not in self.features: 45 | im = self.images[k] 46 | kp, des = self.sift.detectAndCompute(im, None) 47 | self.features[k] = (kp, des) 48 | return self.features[k] 49 | 50 | 51 | class Matches: 52 | def __init__(self, features): 53 | 54 | FLANN_INDEX_KDTREE = 1 55 | index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5) 56 | search_params = dict(checks=50) 57 | self.features = features 58 | self.matcher = cv.FlannBasedMatcher(index_params, search_params) 59 | self.matches = {} 60 | self.for_panorama_stitching = False 61 | 62 | def __getitem__(self, k): 63 | if k not in self.matches: 64 | (kp1, des1) = self.features[k[0]] 65 | (kp2, des2) = self.features[k[1]] 66 | if len(kp1) > 8: 67 | try: 68 | matches = self.matcher.knnMatch(des1, des2, k=2) 69 | except cv.error as e: 70 | print('NOTE: Too few keypoints for matching, skip.') 71 | matches = zip([], []) 72 | else: 73 | matches = zip([], []) 74 | # store all the good matches as per Lowe's ratio test. 75 | good = [] 76 | for m, n in matches: 77 | if m.distance < 0.7 * n.distance: 78 | good.append(m) 79 | self.matches[k] = good 80 | 81 | return self.matches[k] 82 | 83 | 84 | class Homographies: 85 | def __init__(self, images, features, matches): 86 | self.matches = matches 87 | self.homographies = {} 88 | self.images = images 89 | self.features = features 90 | self.warps = {} 91 | self.min_match_count = 10 92 | self.images_rgb = ImageReader(src=self.images.src, scale=self.images.scale) 93 | 94 | def __getitem__(self, k): 95 | good = self.matches[k] 96 | kp1, _ = self.features[k[0]] 97 | kp2, _ = self.features[k[1]] 98 | img2 = self.images[k[1]] 99 | if k not in self.homographies: 100 | if len(good) > self.min_match_count: 101 | src_pts = np.float32([kp1[m.queryIdx].pt for m in good]).reshape( 102 | -1, 1, 2 103 | ) 104 | dst_pts = np.float32([kp2[m.trainIdx].pt for m in good]).reshape( 105 | -1, 1, 2 106 | ) 107 | M, mask = cv.findHomography(src_pts, dst_pts, cv.RANSAC, 5.0) 108 | self.homographies[k] = (M, mask) 109 | else: 110 | # print( "Not enough matches are found - {}/{}".format(len(good), self.min_match_count) ) 111 | matchesMask = None 112 | self.homographies[k] = (None, None) 113 | return self.homographies[k] 114 | 115 | def calc_overlap(self, *k, vis=False, is_debug=False, with_warp=False, draw_matches=True): 116 | img1 = self.images_rgb[k[0]].copy() 117 | img2 = self.images_rgb[k[1]].copy() 118 | kp1, _ = self.features[k[0]] 119 | kp2, _ = self.features[k[1]] 120 | good = self.matches[k] 121 | h, w, c = img1.shape 122 | M, mask = self[k] 123 | 124 | if M is None: 125 | return 0, [], np.zeros([h, w * 2]) 126 | 127 | matchesMask = mask.ravel().tolist() 128 | 129 | pts = np.float32([[0, 0], [0, h - 1], [w - 1, h - 1], [w - 1, 0]]).reshape( 130 | -1, 1, 2 131 | ) 132 | dst = cv.perspectiveTransform(pts, M) 133 | 134 | img2 = cv.polylines(img2, [np.int32(dst)], True, 255, 3, cv.LINE_AA) 135 | 136 | if 
with_warp: 137 | self.warps[k] = img2 138 | draw_params = dict( 139 | matchColor=(0, 255, 0), # draw matches in green color 140 | singlePointColor=None, 141 | matchesMask=matchesMask, # draw only inliers 142 | flags=2, 143 | ) 144 | 145 | if is_debug: 146 | if draw_matches: 147 | im_matches = cv.drawMatches(img1, kp1, img2, kp2, good, None, **draw_params) 148 | else: 149 | im_matches = img2 150 | if vis: 151 | plt.imshow(im_matches, "gray"), plt.show() 152 | # plt.imshow(img3, "gray"), plt.show() 153 | else: 154 | im_matches = img2 155 | 156 | image_area = self.images.im_size[0] * self.images.im_size[1] 157 | polygon = dst.copy()[:, 0] 158 | polygon = bound_polygon(polygon, im_size=self.images.im_size) 159 | overlap = polygon_area(polygon[:, 1], polygon[:, 0]) / image_area 160 | 161 | return overlap, good, im_matches 162 | 163 | def calc_graph( 164 | homographies, 165 | return_im_matches=False, 166 | overlap=0.9, 167 | frame_range_min=0, 168 | frame_range_max=None, 169 | is_debug=False, 170 | clear_cache=True, 171 | **kwargs, 172 | ): 173 | 174 | fpaths = homographies.images.imreader.fpaths 175 | print(overlap) 176 | graph = {'im_matches': {}, 'fpaths': {}} 177 | if frame_range_max is None: 178 | frame_range_max = len(fpaths) 179 | i = frame_range_min 180 | j = i + 1 181 | pbar = tqdm(total=frame_range_max - frame_range_min - 1) 182 | while i < frame_range_max - 1 and j < frame_range_max: 183 | j = i + 1 184 | while j < frame_range_max: 185 | pbar.update(1) 186 | overlap_ij, matches, im_matches = homographies.calc_overlap( 187 | fpaths[i], 188 | fpaths[j], 189 | vis=False, 190 | is_debug=is_debug, 191 | ) 192 | if overlap_ij < overlap: 193 | if is_debug: 194 | graph['im_matches'][i, j] = im_matches 195 | graph['fpaths'][i, j] = [fpaths[i], fpaths[j]] 196 | if clear_cache: 197 | i_ = i 198 | pi = fpaths[i_] 199 | del homographies.images.images[pi] 200 | del homographies.features.features[pi] 201 | for j_ in range(i_+1, j+1): 202 | pj = fpaths[j_] 203 | del homographies.homographies[(pi, pj)] 204 | del homographies.matches.matches[(pi, pj)] 205 | del homographies.images.images[pj] 206 | del homographies.features.features[pj] 207 | i = j 208 | break 209 | j += 1 210 | pbar.close() 211 | return graph 212 | 213 | 214 | def graph2fpaths(graph): 215 | fpaths = list(graph['fpaths'].values()) 216 | first_fpath = fpaths[0][0] 217 | graph = graph['fpaths'] 218 | paths = [first_fpath] + [fpath_pair[1] for fpath_pair in graph.values()] 219 | return paths 220 | 221 | 222 | def bound_polygon(polygon, im_size): 223 | # approximate for now instead of line clipping 224 | polygon[:, 0] = np.clip(polygon[:, 0], 0, im_size[1]) 225 | polygon[:, 1] = np.clip(polygon[:, 1], 0, im_size[0]) 226 | return polygon 227 | 228 | 229 | def polygon_area(x,y): 230 | return 0.5*np.abs(np.dot(x,np.roll(y,1))-np.dot(y,np.roll(x,1))) 231 | 232 | 233 | def write_mp4(name, frames, fps=10): 234 | import imageio 235 | imageio.mimwrite(name + ".mp4", frames, "mp4", fps=fps) 236 | 237 | 238 | def save_as_video(dst, fpaths, imreader): 239 | frames = [] 240 | for fp in tqdm(fpaths): 241 | frames += [imreader[fp]] 242 | write_mp4(dst, frames) 243 | 244 | 245 | def extract_frames(dir_dst, fpaths, imreader): 246 | for k in fpaths: 247 | imreader.save(k, dir_dst) 248 | 249 | 250 | # imreader 251 | 252 | import io 253 | def tar2bytearr(tar_member): 254 | return np.asarray( 255 | bytearray( 256 | tar_member.read() 257 | ), 258 | dtype=np.uint8 259 | ) 260 | 261 | import shutil 262 | 263 | import tarfile 264 | class ImageReader: 265 | def 
__init__(self, src, scale=1, cv_flag=cv.IMREAD_UNCHANGED): 266 | # src can be directory or tar file 267 | 268 | self.scale = 1 269 | self.cv_flag = cv_flag 270 | 271 | if os.path.isdir(src): 272 | self.src_type = 'dir' 273 | self.fpaths = sorted(glob(os.path.join(src, '*.jpg'))) 274 | elif os.path.isfile(src) and os.path.splitext(src)[1] == '.tar': 275 | self.tar = tarfile.open(src) 276 | self.src_type = 'tar' 277 | self.fpaths = sorted([x for x in self.tar.getnames() if 'frame_' in x and '.jpg' in x]) 278 | else: 279 | print('Source has unknown format.') 280 | exit() 281 | 282 | def __getitem__(self, k): 283 | if self.src_type == 'dir': 284 | 285 | im = cv.imread(k, self.cv_flag) 286 | elif self.src_type == 'tar': 287 | member = self.tar.getmember(k) 288 | tarfile = self.tar.extractfile(member) 289 | byte_array = tar2bytearr(tarfile) 290 | im = cv.imdecode(byte_array, self.cv_flag) 291 | if self.scale != 1: 292 | im = cv.resize( 293 | im, dsize=[im.shape[0] // self.scale, im.shape[1] // self.scale] 294 | ) 295 | if self.cv_flag != cv.IMREAD_GRAYSCALE: 296 | im = im[..., [2, 1, 0]] 297 | return im 298 | 299 | def save(self, k, dst): 300 | fn = os.path.split(k)[-1] 301 | if self.src_type == 'dir': 302 | shutil.copy(k, os.path.join(dst, fn)) 303 | elif self.src_type == 'tar': 304 | self.tar.extract(self.tar.getmember(k), dst) 305 | 306 | 307 | # test 308 | def test(): 309 | reader_args = {'scale': 2, 'cv_flag': cv.IMREAD_GRAYSCALE} 310 | reader_args = {'scale': 2} 311 | 312 | src = '/work/vadim/datasets/visor/2v6cgv1x04ol22qp9rm9x2j6a7/' + \ 313 | 'EPIC-KITCHENS-frames/tar/P28_05.tar' 314 | imreader1 = ImageReader(src=src, **reader_args) 315 | fpaths1 = imreader1.fpaths 316 | 317 | reader_args = {'scale': 2} 318 | 319 | video_id = 'P28_05' 320 | src = f'/work/vadim/datasets/visor/2v6cgv1x04ol22qp9rm9x2j6a7/EPIC-KITCHENS-frames/rgb_frames/{video_id}' 321 | imreader2 = ImageReader(src=src, **reader_args) 322 | fpaths2 = imreader2.fpaths 323 | 324 | for i in range(0, len(fpaths1), 1000): 325 | print((imreader1[fpaths1[i]] == imreader2[fpaths2[i]]).all()) -------------------------------------------------------------------------------- /input_videos.txt: -------------------------------------------------------------------------------- 1 | P15_12 -------------------------------------------------------------------------------- /licence.txt: -------------------------------------------------------------------------------- 1 | All files in this dataset are copyright by us and published under the 2 | Creative Commons Attribution-NonCommerial 4.0 International License, found 3 | at https://creativecommons.org/licenses/by-nc/4.0/. 4 | This means that you must give appropriate credit, provide a link to the license, 5 | and indicate if changes were made. You may do so in any reasonable manner, 6 | but not in any way that suggests the licensor endorses you or your use. You 7 | may not use the material for commercial purposes. 
8 | -------------------------------------------------------------------------------- /reconstruct_sparse.py: -------------------------------------------------------------------------------- 1 | import subprocess 2 | import shutil 3 | import os 4 | import time 5 | import glob 6 | import argparse 7 | import pycolmap 8 | from utils.lib import * 9 | # Function to parse command-line arguments 10 | def parse_args(): 11 | parser = argparse.ArgumentParser(description='COLMAP Reconstruction Script') 12 | parser.add_argument('--input_videos', type=str, default='input_videos.txt', 13 | help='A file with list of vidoes to be processed in all stages') 14 | parser.add_argument('--sparse_reconstuctions_root', type=str, default='colmap_models/sparse', 15 | help='Path to the sparsely reconstructed models.') 16 | parser.add_argument('--epic_kithens_root', type=str, default='.', 17 | help='Path to epic kitchens images.') 18 | parser.add_argument('--logs_path', type=str, default='logs/sparse/out_logs_terminal', 19 | help='Path to store the log files.') 20 | parser.add_argument('--summary_path', type=str, default='logs/sparse/out_summary', 21 | help='Path to store the summary files.') 22 | parser.add_argument('--sampled_images_path', type=str, default='sampled_frames', 23 | help='Path to the directory containing sampled image files.') 24 | parser.add_argument('--gpu_index', type=int, default=0, 25 | help='Index of the GPU to use.') 26 | 27 | return parser.parse_args() 28 | 29 | 30 | args = parse_args() 31 | 32 | gpu_index = args.gpu_index 33 | 34 | videos_list = read_lines_from_file(args.input_videos) 35 | videos_list = sorted(videos_list) 36 | print('GPU: %d' % (gpu_index)) 37 | os.makedirs(args.logs_path, exist_ok=True) 38 | os.makedirs(args.summary_path, exist_ok=True) 39 | os.makedirs(args.sparse_reconstuctions_root, exist_ok=True) 40 | 41 | i = 0 42 | for video in videos_list: 43 | pre = video.split('_')[0] 44 | if (not os.path.exists(os.path.join(args.sparse_reconstuctions_root, '%s' % video))): 45 | # check the number of images in this video 46 | with open(os.path.join(args.sampled_images_path, '%s_selected_frames.txt' % (video)), 'r') as f: 47 | lines = f.readlines() 48 | num_lines = len(lines) 49 | #print(f'The file {video} contains {num_lines} lines.') 50 | if num_lines < 100000: #it's too large, so it would take days! 
51 | print('Processing: ', video, '(',num_lines, 'images )') 52 | start_time = time.time() 53 | 54 | # Define the path to the shell script 55 | script_path = 'scripts/reconstruct_sparse.sh' 56 | 57 | # Create a unique copy of the script 58 | script_copy_path = video + '_' + str(os.getpid()) + '_' + os.path.basename(script_path) 59 | shutil.copy(script_path, script_copy_path) 60 | 61 | # Output file 62 | output_file_path = os.path.join(args.logs_path, script_copy_path.replace('.sh', '.out')) 63 | 64 | 65 | # Define the command to execute the script 66 | command = ["bash", script_copy_path, video,args.sparse_reconstuctions_root,args.epic_kithens_root,args.sampled_images_path,args.summary_path,str(gpu_index)] 67 | # Open the output file in write mode 68 | with open(output_file_path, 'w') as output_file: 69 | # Run the command and capture its output in real time 70 | process = subprocess.Popen(command, stdout=output_file, stderr=subprocess.PIPE, text=True) 71 | while True: 72 | output = process.stderr.readline() 73 | if output == '' and process.poll() is not None: 74 | break 75 | if output: 76 | output_file.write(output) 77 | output_file.flush() 78 | 79 | # Once the script has finished running, you can delete the copy of the script 80 | os.remove(script_copy_path) 81 | 82 | #In case of having multiple models, will keep the one with largest number of images and rename it as 0 83 | reg_images = keep_model_with_largest_images(os.path.join(args.sparse_reconstuctions_root,video,'sparse')) 84 | if reg_images > 0: 85 | print(f"Registered_images/total_images: {reg_images}/{num_lines} = {round(reg_images/num_lines*100)}%") 86 | else: 87 | print('The video reconstruction fails!! no reconstruction file is found!') 88 | 89 | 90 | 91 | 92 | print("Execution time: %s minutes" % round((time.time() - start_time)/60, 0)) 93 | print('-----------------------------------------------------------') 94 | 95 | i += 1 96 | 97 | -------------------------------------------------------------------------------- /register_dense.py: -------------------------------------------------------------------------------- 1 | import subprocess 2 | import shutil 3 | import os 4 | import time 5 | import glob 6 | import argparse 7 | import pycolmap 8 | from utils.lib import * 9 | # Function to parse command-line arguments 10 | def parse_args(): 11 | parser = argparse.ArgumentParser(description='COLMAP Reconstruction Script') 12 | parser.add_argument('--input_videos', type=str, default='input_videos.txt', 13 | help='A file with list of vidoes to be processed in all stages') 14 | parser.add_argument('--sparse_reconstuctions_root', type=str, default='colmap_models/sparse', 15 | help='Path to the sparsely reconstructed models.') 16 | parser.add_argument('--dense_reconstuctions_root', type=str, default='colmap_models/dense', 17 | help='Path to the densely registered models.') 18 | parser.add_argument('--epic_kithens_root', type=str, default='.', 19 | help='Path to epic kitchens images.') 20 | parser.add_argument('--logs_path', type=str, default='logs/dense/out_logs_terminal', 21 | help='Path to store the log files.') 22 | parser.add_argument('--summary_path', type=str, default='logs/dense/out_summary', 23 | help='Path to store the summary files.') 24 | parser.add_argument('--gpu_index', type=int, default=0, 25 | help='Index of the GPU to use.') 26 | 27 | return parser.parse_args() 28 | 29 | 30 | args = parse_args() 31 | 32 | gpu_index = args.gpu_index 33 | 34 | videos_list = read_lines_from_file(args.input_videos) 35 | videos_list = 
sorted(videos_list) 36 | print('GPU: %d' % (gpu_index)) 37 | os.makedirs(args.logs_path, exist_ok=True) 38 | os.makedirs(args.summary_path, exist_ok=True) 39 | os.makedirs(args.sparse_reconstuctions_root, exist_ok=True) 40 | os.makedirs(args.dense_reconstuctions_root, exist_ok=True) 41 | 42 | 43 | i = 0 44 | for video in videos_list: 45 | pre = video.split('_')[0] 46 | if (not os.path.exists(os.path.join(args.dense_reconstuctions_root, '%s' % video))): 47 | # check the number of images in this video 48 | num_lines = len(glob.glob(os.path.join(args.epic_kithens_root,pre,video,'*.jpg'))) 49 | 50 | print('Processing: ', video, '(',num_lines, 'images )') 51 | start_time = time.time() 52 | 53 | # Define the path to the shell script 54 | script_path = 'scripts/register_dense.sh' 55 | 56 | # Create a unique copy of the script 57 | script_copy_path = video + '_' + str(os.getpid()) + '_' + os.path.basename(script_path) 58 | shutil.copy(script_path, script_copy_path) 59 | 60 | # Output file 61 | output_file_path = os.path.join(args.logs_path, script_copy_path.replace('.sh', '.out')) 62 | 63 | 64 | # Define the command to execute the script 65 | command = ["bash", script_copy_path, video,args.sparse_reconstuctions_root,args.dense_reconstuctions_root,args.epic_kithens_root,args.summary_path,str(gpu_index)] 66 | # Open the output file in write mode 67 | with open(output_file_path, 'w') as output_file: 68 | # Run the command and capture its output in real time 69 | process = subprocess.Popen(command, stdout=output_file, stderr=subprocess.PIPE, text=True) 70 | while True: 71 | output = process.stderr.readline() 72 | if output == '' and process.poll() is not None: 73 | break 74 | if output: 75 | output_file.write(output) 76 | output_file.flush() 77 | 78 | # Once the script has finished running, you can delete the copy of the script 79 | os.remove(script_copy_path) 80 | 81 | 82 | reg_images = get_num_images(os.path.join(args.dense_reconstuctions_root,video)) 83 | if reg_images > 0: 84 | print(f"Registered_images/total_images: {reg_images}/{num_lines} = {round(reg_images/num_lines*100)}%") 85 | else: 86 | print('The video reconstruction fails!! no colmap files are found!') 87 | 88 | 89 | 90 | 91 | print("Execution time: %s minutes" % round((time.time() - start_time)/60, 0)) 92 | print('-----------------------------------------------------------') 93 | 94 | i += 1 95 | 96 | -------------------------------------------------------------------------------- /scripts/reconstruct_sparse.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | start=`date +%s` 3 | 4 | VIDEO=$1 #i.e. P02_14 5 | SPARSE_PATH=$2 # path to save the sparse models 6 | IMAGES_ROOT=$3 # root of epic kitchens images 7 | SAMPLED_IMAGES=$4 # path of the sampeld images to be used for reconstruction 8 | LOGS=$5 # to save the output logs 9 | GPU_IDX=$6 # i.e. 
0 10 | 11 | PRE=$(echo "$VIDEO" | cut -d'_' -f1) 12 | #cat $0 > "${LOGS}/$VIDEO.out" 13 | mkdir ${SPARSE_PATH}/${VIDEO} 14 | mkdir ${SPARSE_PATH}/${VIDEO}/sparse 15 | 16 | colmap feature_extractor \ 17 | --database_path ${VIDEO}_database.db \ 18 | --ImageReader.camera_model OPENCV \ 19 | --image_list_path ${SAMPLED_IMAGES}/${VIDEO}_selected_frames.txt \ 20 | --ImageReader.single_camera 1 \ 21 | --SiftExtraction.use_gpu 1 \ 22 | --SiftExtraction.gpu_index $GPU_IDX \ 23 | --image_path ${IMAGES_ROOT}/${PRE}/${VIDEO} \ 24 | 25 | colmap sequential_matcher \ 26 | --database_path ${VIDEO}_database.db \ 27 | --SiftMatching.use_gpu 1 \ 28 | --SequentialMatching.loop_detection 1 \ 29 | --SiftMatching.gpu_index $GPU_IDX \ 30 | --SequentialMatching.vocab_tree_path vocab_bins/vocab_tree_flickr100K_words32K.bin \ 31 | 32 | colmap mapper \ 33 | --database_path ${VIDEO}_database.db \ 34 | --image_path ${PRE}/${VIDEO} \ 35 | --output_path ${SPARSE_PATH}/${VIDEO}/sparse \ 36 | --image_list_path ${SAMPLED_IMAGES}/${VIDEO}_selected_frames.txt \ 37 | 38 | 39 | #echo "----------------------------------------------------------------------SUMMARY----------------------------------------------------------------------">> "${LOGS}/$VIDEO.out" 40 | colmap model_analyzer --path ${SPARSE_PATH}/${VIDEO}/sparse/0/ > "${LOGS}/$VIDEO.out" 41 | 42 | end=`date +%s` 43 | runtime=$(((end-start)/60)) 44 | echo "$runtime minutes">> "${LOGS}/$VIDEO.out" 45 | mv ${VIDEO}_database.db ${SPARSE_PATH}/${VIDEO}/database.db #move the database 46 | -------------------------------------------------------------------------------- /scripts/register_dense.sh: -------------------------------------------------------------------------------- 1 | start=`date +%s` 2 | 3 | VIDEO=$1 #i.e. P02_14 4 | SPARSE_PATH=$2 # path to save the sparse models 5 | DENSE_PATH=$3 # path to save the sparse models 6 | IMAGES_ROOT=$4 # root of epic kitchens images 7 | LOGS=$5 # to save the output logs 8 | GPU_IDX=$6 # i.e. 
0 9 | 10 | PRE=$(echo "$VIDEO" | cut -d'_' -f1) 11 | 12 | cp ${SPARSE_PATH}/${VIDEO}/database.db ${VIDEO}_database.db #move the database from the sparse model 13 | mkdir ${DENSE_PATH}/${VIDEO} 14 | 15 | colmap feature_extractor \ 16 | --database_path ${VIDEO}_database.db \ 17 | --ImageReader.camera_model OPENCV \ 18 | --ImageReader.single_camera 1 \ 19 | --ImageReader.existing_camera_id 1 \ 20 | --SiftExtraction.use_gpu 1 \ 21 | --SiftExtraction.gpu_index $GPU_IDX \ 22 | --image_path ${IMAGES_ROOT}/${PRE}/${VIDEO} \ 23 | 24 | 25 | 26 | colmap sequential_matcher \ 27 | --database_path ${VIDEO}_database.db \ 28 | --SiftMatching.use_gpu 1 \ 29 | --SequentialMatching.loop_detection 1 \ 30 | --SiftMatching.gpu_index $GPU_IDX \ 31 | --SequentialMatching.vocab_tree_path vocab_bins/vocab_tree_flickr100K_words32K.bin \ 32 | 33 | 34 | colmap image_registrator \ 35 | --database_path ${VIDEO}_database.db \ 36 | --input_path ${SPARSE_PATH}/${VIDEO}/sparse/0 \ 37 | --output_path ${DENSE_PATH}/${VIDEO} \ 38 | 39 | 40 | colmap model_analyzer --path ${DENSE_PATH}/${VIDEO} > "${LOGS}/$VIDEO.out" 41 | 42 | end_reg=`date +%s` 43 | 44 | runtime=$(((end_reg-start)/60)) 45 | echo "$runtime minutes (registration time)">> "${LOGS}/$VIDEO.out" 46 | 47 | rm ${VIDEO}_database.db #remove the database since it's too large, you can keep it upon your usecase 48 | -------------------------------------------------------------------------------- /select_sparse_frames.py: -------------------------------------------------------------------------------- 1 | import subprocess 2 | import concurrent.futures 3 | import glob 4 | import os 5 | import argparse 6 | from utils.lib import * 7 | # Function to parse command-line arguments 8 | def parse_args(): 9 | parser = argparse.ArgumentParser(description='COLMAP Reconstruction Script') 10 | parser.add_argument('--input_videos', type=str, default='input_videos.txt', 11 | help='A file with list of vidoes to be processed in all stages') 12 | parser.add_argument('--epic_kithens_root', type=str, default='.', 13 | help='Path to epic kitchens images.') 14 | parser.add_argument('--sampled_images_path', type=str, default='sampled_frames', 15 | help='Path to the directory containing sampled image files.') 16 | parser.add_argument('--homography_overlap', type=float, default=0.9, 17 | help='Threshold of the homography to sample new frames, higher value samples more images') 18 | parser.add_argument('--max_concurrent', type=int, default=8, 19 | help='Max number of concurrent processes') 20 | return parser.parse_args() 21 | 22 | 23 | 24 | 25 | def main(): 26 | args = parse_args() 27 | 28 | videos = read_lines_from_file(args.input_videos) 29 | epic_root = args.epic_kithens_root 30 | params_list = [] 31 | for video in videos: 32 | video_pre = video.split('_')[0] 33 | for folder in sorted(glob.glob(os.path.join(epic_root,video_pre+'/*'))): 34 | video = folder.split('/')[-1] 35 | if video in videos: 36 | print(video) 37 | added_run = ['--src', folder, '--dst_file', '%s/%s_selected_frames.txt'%(args.sampled_images_path,video), '--overlap', str(args.homography_overlap)] 38 | if not added_run in params_list: 39 | params_list.append(added_run) 40 | 41 | if params_list: 42 | max_concurrent = args.max_concurrent 43 | # Create a process pool executor with a maximum of K processes 44 | executor = concurrent.futures.ProcessPoolExecutor(max_workers=max_concurrent) 45 | 46 | # Submit the tasks to the executor 47 | results = [] 48 | for i in range(len(params_list)): 49 | future = executor.submit(run_script, 
'homography_filter/filter.py', params_list[i % len(params_list)]) 50 | results.append(future) 51 | 52 | # Wait for all tasks to complete 53 | for r in concurrent.futures.as_completed(results): 54 | try: 55 | r.result() 56 | except Exception as e: 57 | print(f"Error occurred: {e}") 58 | 59 | # Shut down the executor 60 | executor.shutdown() 61 | 62 | 63 | if __name__ == '__main__': 64 | main() -------------------------------------------------------------------------------- /tools/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/epic-kitchens/epic-fields-code/e5cd4fdfbbf130a9f6266dcb536f17bb52136f64/tools/__init__.py -------------------------------------------------------------------------------- /tools/common_functions.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | """ Source: see COLMAP """ 5 | def qvec2rotmat(qvec): 6 | return np.array([ 7 | [1 - 2 * qvec[2]**2 - 2 * qvec[3]**2, 8 | 2 * qvec[1] * qvec[2] - 2 * qvec[0] * qvec[3], 9 | 2 * qvec[3] * qvec[1] + 2 * qvec[0] * qvec[2]], 10 | [2 * qvec[1] * qvec[2] + 2 * qvec[0] * qvec[3], 11 | 1 - 2 * qvec[1]**2 - 2 * qvec[3]**2, 12 | 2 * qvec[2] * qvec[3] - 2 * qvec[0] * qvec[1]], 13 | [2 * qvec[3] * qvec[1] - 2 * qvec[0] * qvec[2], 14 | 2 * qvec[2] * qvec[3] + 2 * qvec[0] * qvec[1], 15 | 1 - 2 * qvec[1]**2 - 2 * qvec[2]**2]]) 16 | 17 | 18 | 19 | def get_c2w(img_data: list) -> np.ndarray: 20 | """ 21 | Args: 22 | img_data: list, [qvec, tvec] of w2c 23 | 24 | Returns: 25 | c2w: np.ndarray, 4x4 camera-to-world matrix 26 | """ 27 | w2c = np.eye(4) 28 | w2c[:3, :3] = qvec2rotmat(img_data[:4]) 29 | w2c[:3, -1] = img_data[4:7] 30 | c2w = np.linalg.inv(w2c) 31 | return c2w 32 | -------------------------------------------------------------------------------- /tools/project_3d_line.py: -------------------------------------------------------------------------------- 1 | from typing import List, Dict 2 | import argparse 3 | import json 4 | import os 5 | import re 6 | import os.path as osp 7 | import tqdm 8 | import numpy as np 9 | 10 | import cv2 11 | from PIL import Image 12 | 13 | from tools.common_functions import qvec2rotmat 14 | 15 | 16 | class Line: 17 | """ An infinite 3D line to denote Annotated Line """ 18 | 19 | def __init__(self, line_ends: np.ndarray): 20 | """ 21 | Args: 22 | line_ends: (2, 3) 23 | points annotated using some GUI, denoting points along the desired line 24 | """ 25 | st, ed = line_ends 26 | self.vc = (st + ed) / 2 27 | self.dir = ed - st 28 | self.v0 = st 29 | self.v1 = ed 30 | 31 | def __repr__(self) -> str: 32 | return f'vc: {str(self.vc)} \ndir: {str(self.dir)}' 33 | 34 | def check_single_point(self, 35 | point: np.ndarray, 36 | radius: float) -> bool: 37 | """ 38 | point-to-line = (|(p-v_0)x(p-v_1)|)/(|v_1 - v_0|) 39 | 40 | Args: 41 | point: (3,) array of point 42 | radius: threshold for checking inside 43 | """ 44 | area2 = np.linalg.norm(np.cross(point - self.v0, point - self.v1)) 45 | base_len = np.linalg.norm(self.v1 - self.v0) 46 | d = area2 / base_len 47 | return True if d < radius else False 48 | 49 | def check_points(self, 50 | points: np.ndarray, 51 | diameter: float) -> np.ndarray: 52 | """ 53 | Args: 54 | points: (N, 3) array of points 55 | diameter: threshold for checking inside 56 | 57 | Returns: 58 | (N,) bool array 59 | """ 60 | area2 = np.linalg.norm(np.cross(points - self.v0, points - self.v1), axis=1) 61 | base_len = np.linalg.norm(self.v1 - self.v0) 62 | 
d = area2 / base_len 63 | return d < diameter 64 | 65 | 66 | def line_rectangle_check(cen, dir, rect, 67 | eps=1e-6): 68 | """ 69 | Args: 70 | cen, dir: (2,) float 71 | rect: Tuple (xmin, ymin, xmax, ymax) 72 | 73 | Returns: 74 | num_intersect: int 75 | inters: (num_intersect, 2) float 76 | """ 77 | x1, y1 = cen 78 | u1, v1 = dir 79 | xmin, ymin, xmax, ymax = rect 80 | rect_loop = np.asarray([ 81 | [xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax], 82 | [xmin, ymin] 83 | ], dtype=np.float32) 84 | x2, y2 = rect_loop[:4, 0], rect_loop[:4, 1] 85 | u2 = rect_loop[1:, 0] - rect_loop[:-1, 0] 86 | v2 = rect_loop[1:, 1] - rect_loop[:-1, 1] 87 | 88 | t2 = (v1*x1 - u1*y1) - (v1*x2 - u1*y2) 89 | divisor = (v1*u2 - v2*u1) 90 | cond = np.abs(divisor) > eps 91 | 92 | t2[~cond] = -1 93 | t2[cond] = t2[cond] / divisor[cond] 94 | 95 | keep = (t2 >= 0) & (t2 <= 1) 96 | num_intersect = np.sum(keep) 97 | uv = np.stack([u2, v2], 1) 98 | inters = rect_loop[:4, :] + t2[:, None] * uv 99 | inters = inters[keep, :] 100 | return num_intersect, inters 101 | 102 | 103 | def project_line_image(line: Line, 104 | pose_data: list, 105 | camera: dict): 106 | """ Project a 3D line using camera pose and intrinsics 107 | 108 | This implementation ignores distortion. 109 | 110 | Args: 111 | line: 112 | -vc: (3,) float 113 | -dir: (3,) float 114 | pose_data: stores camera pose 115 | [qw, qx, qy, qz, tx, ty, tz, frame_name] 116 | camera: dict, stores intrinsics 117 | -width, 118 | -height 119 | -params (8,) fx, fy, cx, cy, k1, k2, p1, p2 120 | 121 | Returns: 122 | (st, ed): (2,) float 123 | """ 124 | cen, dir = line.vc, line.dir 125 | rot_w2c = qvec2rotmat(pose_data[:4]) 126 | tvec = np.asarray(pose_data[4:7]) 127 | # Represent as column vector 128 | cen = rot_w2c @ cen + tvec 129 | dir = rot_w2c @ dir 130 | width, height = camera['width'], camera['height'] 131 | fx, fy, cx, cy, k1, k2, p1, p2 = camera['params'] 132 | 133 | cen_uv = cen[:2] / cen[2] 134 | cen_uv = cen_uv * np.array([fx, fy]) + np.array([cx, cy]) 135 | dir_uv = ((dir + cen)[:2] / (dir + cen)[2]) - (cen[:2] / cen[2]) 136 | dir_uv = dir_uv * np.array([fx, fy]) 137 | dir_uv = dir_uv / np.linalg.norm(dir_uv) 138 | 139 | line2d = None 140 | num_inters, inters = line_rectangle_check( 141 | cen_uv, dir_uv, (0, 0, width, height)) 142 | if num_inters == 2: 143 | line2d = (inters[0], inters[1]) 144 | return line2d 145 | 146 | 147 | class LineProjector: 148 | 149 | COLORS = dict(yellow=(255, 255, 0),) 150 | 151 | def __init__(self, 152 | camera: Dict, 153 | images: Dict[str, List], 154 | line: Line): 155 | """ 156 | Args: 157 | camera: dict, camera info 158 | images: dict of 159 | frame_name: [qw, qx, qy, qz, tx, ty, tz] in **w2c** 160 | """ 161 | self.camera = camera 162 | self.images = images 163 | self.line = line 164 | self.line_color = self.COLORS['yellow'] 165 | 166 | def project_frame(self, frame_name: str, frames_root: str) -> np.ndarray: 167 | """ Project a line onto a frame 168 | 169 | Args: 170 | frame_idx: int. epic frame index 171 | frames_root: str. 
172 | f'{frame_root}/frame_{frame_idx:010d}.jpg' is the path to the epic-kitchens frame 173 | 174 | Returns: 175 | img: (H, W, 3) np.uint8 176 | """ 177 | pose_data = self.images[frame_name] 178 | img_path = osp.join(frames_root, frame_name) 179 | img = np.asarray(Image.open(img_path)) 180 | line_2d = project_line_image(self.line, pose_data, self.camera) 181 | if line_2d is None: 182 | return img 183 | img = cv2.line( 184 | img, np.int32(line_2d[0]), np.int32(line_2d[1]), 185 | color=self.line_color, thickness=2, lineType=cv2.LINE_AA) 186 | 187 | return img 188 | 189 | def write_mp4(self, 190 | frames_root: str, 191 | fps=5, 192 | out_name='line_output'): 193 | """ Write mp4 file that has line projected on the image frames 194 | 195 | Args: 196 | frames_root: str. 197 | f'{frame_root}/frame_{frame_idx:010d}.jpg' is the path to the epic-kitchens frame 198 | """ 199 | out_dir = os.path.join('./outputs/', out_name) 200 | os.makedirs(out_dir, exist_ok=True) 201 | fmt = os.path.join(out_dir, '{}') 202 | 203 | frames_on_disk = set(os.listdir(frames_root)) 204 | frame_names = set(self.images.keys()) 205 | if len(frames_on_disk) < len(frame_names): 206 | print(f"Showing {len(frames_on_disk)} / {len(frame_names)} frames") 207 | frame_names = frame_names.intersection(frames_on_disk) 208 | frame_names = sorted(list(frame_names)) 209 | for frame_name in tqdm.tqdm(frame_names): 210 | img = self.project_frame(frame_name, frames_root) 211 | frame_number = re.search('\d{10,}', frame_name)[0] 212 | cv2.putText(img, frame_number, 213 | (self.camera['width']//4, self.camera['height'] * 31 // 32), 214 | cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2, cv2.LINE_AA) 215 | Image.fromarray(img).save(fmt.format(frame_name)) 216 | 217 | from moviepy import editor 218 | clip = editor.ImageSequenceClip(sequence=out_dir, fps=fps) 219 | clip.write_videofile(f'./outputs/{out_name}-fps{fps}.mp4') 220 | 221 | 222 | if __name__ == '__main__': 223 | parser = argparse.ArgumentParser() 224 | parser.add_argument('--json-data', type=str, required=True) 225 | parser.add_argument('--line-data', type=str, required=True) 226 | parser.add_argument('--frames-root', type=str, required=True) 227 | parser.add_argument('--out-name', type=str, default="line_output") 228 | parser.add_argument('--fps', type=int, default=5) 229 | args = parser.parse_args() 230 | 231 | with open(args.json_data) as f: 232 | model = json.load(f) 233 | camera = model['camera'] 234 | images = model['images'] 235 | 236 | with open(args.line_data) as f: 237 | line = json.load(f) 238 | line = np.asarray(line).reshape(2, 3) 239 | line = Line(line) 240 | 241 | runner = LineProjector(camera, images, line) 242 | runner.write_mp4( 243 | frames_root=args.frames_root, fps=args.fps, out_name=args.out_name) 244 | -------------------------------------------------------------------------------- /tools/visualise_data_open3d.py: -------------------------------------------------------------------------------- 1 | import open3d as o3d 2 | import numpy as np 3 | from argparse import ArgumentParser 4 | import json 5 | 6 | from tools.common_functions import get_c2w 7 | 8 | """ Visualize poses and point-cloud stored in json file.""" 9 | 10 | def parse_args(): 11 | parser = ArgumentParser() 12 | parser.add_argument('--json-data', help='path to json data', required=True) 13 | parser.add_argument('--line-data', help='path to line data', default=None) 14 | parser.add_argument( 15 | '--num-display-poses', type=int, default=500, 16 | help='randomly display num-display-poses to avoid 
creating too many poses') 17 | parser.add_argument('--frustum-size', type=float, default=0.1) 18 | return parser.parse_args() 19 | 20 | 21 | def get_frustum(c2w: np.ndarray, 22 | sz=0.2, 23 | camera_height=None, 24 | camera_width=None, 25 | frustum_color=[1, 0, 0]) -> o3d.geometry.LineSet: 26 | """ 27 | Args: 28 | c2w: np.ndarray, 4x4 camera-to-world matrix 29 | sz: float, size (width) of the frustum 30 | Returns: 31 | frustum: o3d.geometry.TriangleMesh 32 | """ 33 | cen = [0, 0, 0] 34 | wid = sz 35 | if camera_height is not None and camera_width is not None: 36 | hei = wid * camera_height / camera_width 37 | else: 38 | hei = wid 39 | tl = [wid, hei, sz] 40 | tr = [-wid, hei, sz] 41 | br = [-wid, -hei, sz] 42 | bl = [wid, -hei, sz] 43 | points = np.float32([cen, tl, tr, br, bl]) 44 | lines = [ 45 | [0, 1], [0, 2], [0, 3], [0, 4], 46 | [1, 2], [2, 3], [3, 4], [4, 1],] 47 | frustum = o3d.geometry.LineSet() 48 | frustum.points = o3d.utility.Vector3dVector(points) 49 | frustum.lines = o3d.utility.Vector2iVector(lines) 50 | frustum.colors = o3d.utility.Vector3dVector([np.asarray([1, 0, 0])]) 51 | frustum.paint_uniform_color(frustum_color) 52 | 53 | frustum = frustum.transform(c2w) 54 | return frustum 55 | 56 | 57 | if __name__ == "__main__": 58 | args = parse_args() 59 | frustum_size = args.frustum_size 60 | 61 | vis = o3d.visualization.Visualizer() 62 | vis.create_window() 63 | with open(args.json_data, 'r') as f: 64 | model = json.load(f) 65 | 66 | """ Points """ 67 | points = model['points'] 68 | pcd_np = [v[:3] for v in points] 69 | pcd_rgb = [np.asarray(v[3:6]) / 255 for v in points] 70 | pcd = o3d.geometry.PointCloud() 71 | pcd.points = o3d.utility.Vector3dVector(pcd_np) 72 | pcd.colors = o3d.utility.Vector3dVector(pcd_rgb) 73 | vis.add_geometry(pcd, reset_bounding_box=True) 74 | 75 | """ Camear Poses """ 76 | camera = model['camera'] 77 | cam_h, cam_w = camera['height'], camera['width'] 78 | c2w_list = [get_c2w(img) for img in model['images'].values()] 79 | c2w_sel_inds = np.linspace(0, len(c2w_list)-1, args.num_display_poses).astype(int) 80 | c2w_sel = [c2w_list[i] for i in c2w_sel_inds] 81 | frustums = [ 82 | get_frustum(c2w, sz=frustum_size, camera_height=cam_h, camera_width=cam_w) 83 | for c2w in c2w_sel 84 | ] 85 | for frustum in frustums: 86 | vis.add_geometry(frustum, reset_bounding_box=True) 87 | 88 | """ Optional: Line """ 89 | if args.line_data is not None: 90 | line_set = o3d.geometry.LineSet() 91 | with open(args.line_data, 'r') as f: 92 | line_points = np.asarray(json.load(f)).reshape(2, 3) 93 | vc = line_points.mean(axis=0) 94 | dir = line_points[1] - line_points[0] 95 | lst = vc + 2 * dir 96 | led = vc - 2 * dir 97 | lines = [lst, led] 98 | line_set.points = o3d.utility.Vector3dVector(lines) 99 | line_set.lines = o3d.utility.Vector2iVector([[0, 1]]) 100 | vis.add_geometry(line_set, reset_bounding_box=True) 101 | 102 | control = vis.get_view_control() 103 | control.set_front([1, 1, 1]) 104 | control.set_lookat([0, 0, 0]) 105 | control.set_up([0, 0, 1]) 106 | control.set_zoom(1) 107 | 108 | vis.run() 109 | vis.destroy_window() 110 | -------------------------------------------------------------------------------- /tools/visualize_colmap_open3d.py: -------------------------------------------------------------------------------- 1 | import open3d as o3d 2 | import numpy as np 3 | from argparse import ArgumentParser 4 | from utils.base_type import ColmapModel 5 | from tools.visualise_data_open3d import get_c2w, get_frustum 6 | 7 | """TODO 8 | 1. Frustum, on/off 9 | 2. 
Line (saved in json) 10 | """ 11 | 12 | def parse_args(): 13 | parser = ArgumentParser() 14 | parser.add_argument('--model', help="path to direcctory containing images.bin", required=True) 15 | parser.add_argument('--pcd-path', help="path to fused.ply", default=None) 16 | parser.add_argument('--show-mesh-frame', default=False) 17 | parser.add_argument('--specify-frame-name', default=None) 18 | parser.add_argument( 19 | '--num-display-poses', type=int, default=500, 20 | help='randomly display num-display-poses to avoid creating too many poses') 21 | return parser.parse_args() 22 | 23 | if __name__ == "__main__": 24 | args = parse_args() 25 | 26 | model_path = args.model 27 | mod = ColmapModel(args.model) 28 | if args.pcd_path is not None: 29 | pcd = o3d.io.read_point_cloud(args.pcd_path) 30 | else: 31 | pcd_np = np.asarray([v.xyz for v in mod.points.values()]) 32 | pcd_rgb = np.asarray([v.rgb / 255 for v in mod.points.values()]) 33 | # Remove too far points from GUI -- usually noise 34 | pcd_np_center = np.mean(pcd_np, axis=0) 35 | pcd_ind = np.linalg.norm(pcd_np - pcd_np_center, axis=1) < 500 36 | pcd_np, pcd_rgb = pcd_np[pcd_ind], pcd_rgb[pcd_ind] 37 | 38 | pcd = o3d.geometry.PointCloud() 39 | pcd.points = o3d.utility.Vector3dVector(pcd_np) 40 | pcd.colors = o3d.utility.Vector3dVector(pcd_rgb) 41 | 42 | mesh_frame = o3d.geometry.TriangleMesh.create_coordinate_frame( 43 | size=1.0, origin=[0, 0, 0]) 44 | 45 | vis = o3d.visualization.Visualizer() 46 | vis.create_window() 47 | vis.add_geometry(pcd, reset_bounding_box=True) 48 | if args.show_mesh_frame: 49 | vis.add_geometry(mesh_frame, reset_bounding_box=True) 50 | 51 | frustum_size = 0.1 52 | camera = mod.camera 53 | cam_h, cam_w = camera.height, camera.width 54 | """ Camear Poses """ 55 | if args.specify_frame_name is not None: 56 | qvec, tvec = [ 57 | (v.qvec, v.tvec) for k, v in mod.images.items() if v.name == args.specify_frame_name][0] 58 | img_data = [qvec[0], qvec[1], qvec[2], qvec[3], tvec[0], tvec[1], tvec[2]] 59 | c2w = get_c2w(img_data) 60 | frustum = get_frustum(c2w, sz=frustum_size, camera_height=cam_h, camera_width=cam_w) 61 | vis.add_geometry(frustum, reset_bounding_box=True) 62 | else: 63 | qtvecs = [list(v.qvec) + list(v.tvec) for v in mod.images.values()] 64 | qtvecs = [qtvecs[i] 65 | for i in np.linspace(0, len(qtvecs)-1, args.num_display_poses).astype(int)] 66 | c2w_list = [get_c2w(img) for img in qtvecs] 67 | for c2w in c2w_list: 68 | frustum = get_frustum(c2w, sz=frustum_size, camera_height=cam_h, camera_width=cam_w) 69 | vis.add_geometry(frustum, reset_bounding_box=True) 70 | 71 | control = vis.get_view_control() 72 | control.set_front([1, 1, 1]) 73 | control.set_lookat([0, 0, 0]) 74 | control.set_up([0, 0, 1]) 75 | control.set_zoom(1.0) 76 | 77 | vis.run() 78 | vis.destroy_window() 79 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/epic-kitchens/epic-fields-code/e5cd4fdfbbf130a9f6266dcb536f17bb52136f64/utils/__init__.py -------------------------------------------------------------------------------- /utils/base_type.py: -------------------------------------------------------------------------------- 1 | from typing import List 2 | import json 3 | from functools import cached_property 4 | from utils.colmap_utils import ( 5 | read_cameras_binary, read_points3d_binary, 6 | read_images_binary, BaseImage) 7 | from utils.colmap_utils import Image as 
ColmapImage 8 | 9 | 10 | 11 | class ColmapModel: 12 | 13 | """ 14 | NOTE: this class shares commons codes with line_check.LineChecker, 15 | reuse these codes? 16 | """ 17 | def __init__(self, model_dir: str): 18 | 19 | def _as_list(path, func): 20 | return func(path) 21 | 22 | cameras = _as_list( 23 | f'{model_dir}/cameras.bin', read_cameras_binary) 24 | if len(cameras) != 1: 25 | print("Found more than one camera!") 26 | self.camera = cameras[1] 27 | self.points = _as_list( 28 | f'{model_dir}/points3D.bin', read_points3d_binary) 29 | self.images = _as_list( 30 | f'{model_dir}/images.bin', read_images_binary) 31 | 32 | def __repr__(self) -> str: 33 | return f'{self.num_images} images - {self.num_points} points' 34 | 35 | @property 36 | def example_data(self): 37 | ki = list(self.images.keys())[0] 38 | img = self.images[ki] 39 | kp = list(self.points.keys())[0] 40 | point = self.points[kp] 41 | return img, point 42 | 43 | @cached_property 44 | def ordered_image_ids(self): 45 | return sorted(self.images.keys(), key=lambda x: self.images[x].name) 46 | 47 | @property 48 | def num_points(self): 49 | return len(self.points) 50 | 51 | @property 52 | def num_images(self): 53 | return len(self.images) 54 | 55 | @property 56 | def ordered_images(self) -> List[BaseImage]: 57 | return [self.images[i] for i in self.ordered_image_ids] 58 | 59 | def get_image_by_id(self, image_id: int): 60 | return self.images[image_id] 61 | 62 | 63 | class JsonColmapModel: 64 | def __init__(self, json_path_or_dict): 65 | if isinstance(json_path_or_dict, str): 66 | with open(json_path_or_dict) as f: 67 | model = json.load(f) 68 | elif isinstance(json_path_or_dict, dict): 69 | model = json_path_or_dict 70 | self.camera = model['camera'] 71 | self.points = model['points'] 72 | self.images = [ 73 | model['images'][k] + [k] for k in sorted(model['images'].keys()) 74 | ] # qw, qx, qy, qz, tx, ty, tz, frame_name 75 | 76 | @property 77 | def ordered_image_ids(self): 78 | return list(range(len(self.images))) 79 | 80 | @property 81 | def ordered_images(self) -> List[ColmapImage]: 82 | return [self.get_image_by_id(i) for i in self.ordered_image_ids] 83 | 84 | def get_image_by_id(self, image_id: int) -> ColmapImage: 85 | img_info = self.images[image_id] 86 | cimg = ColmapImage( 87 | id=image_id, qvec=img_info[:4], tvec=img_info[4:7], camera_id=0, 88 | name=img_info[7], xys=[], point3D_ids=[]) 89 | return cimg 90 | -------------------------------------------------------------------------------- /utils/colmap_utils.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2018, ETH Zurich and UNC Chapel Hill. 2 | # All rights reserved. 3 | # 4 | # Redistribution and use in source and binary forms, with or without 5 | # modification, are permitted provided that the following conditions are met: 6 | # 7 | # * Redistributions of source code must retain the above copyright 8 | # notice, this list of conditions and the following disclaimer. 9 | # 10 | # * Redistributions in binary form must reproduce the above copyright 11 | # notice, this list of conditions and the following disclaimer in the 12 | # documentation and/or other materials provided with the distribution. 13 | # 14 | # * Neither the name of ETH Zurich and UNC Chapel Hill nor the names of 15 | # its contributors may be used to endorse or promote products derived 16 | # from this software without specific prior written permission. 
17 | # 18 | # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 19 | # AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 20 | # IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 21 | # ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE 22 | # LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 23 | # CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF 24 | # SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS 25 | # INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN 26 | # CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) 27 | # ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE 28 | # POSSIBILITY OF SUCH DAMAGE. 29 | # 30 | # Author: Johannes L. Schoenberger (jsch at inf.ethz.ch) 31 | 32 | import os 33 | import sys 34 | import collections 35 | import numpy as np 36 | import struct 37 | 38 | 39 | CameraModel = collections.namedtuple( 40 | "CameraModel", ["model_id", "model_name", "num_params"]) 41 | Camera = collections.namedtuple( 42 | "Camera", ["id", "model", "width", "height", "params"]) 43 | BaseImage = collections.namedtuple( 44 | "Image", ["id", "qvec", "tvec", "camera_id", "name", "xys", "point3D_ids"]) 45 | Point3D = collections.namedtuple( 46 | "Point3D", ["id", "xyz", "rgb", "error", "image_ids", "point2D_idxs"]) 47 | 48 | class Image(BaseImage): 49 | def qvec2rotmat(self): 50 | return qvec2rotmat(self.qvec) 51 | 52 | 53 | CAMERA_MODELS = { 54 | CameraModel(model_id=0, model_name="SIMPLE_PINHOLE", num_params=3), 55 | CameraModel(model_id=1, model_name="PINHOLE", num_params=4), 56 | CameraModel(model_id=2, model_name="SIMPLE_RADIAL", num_params=4), 57 | CameraModel(model_id=3, model_name="RADIAL", num_params=5), 58 | CameraModel(model_id=4, model_name="OPENCV", num_params=8), 59 | CameraModel(model_id=5, model_name="OPENCV_FISHEYE", num_params=8), 60 | CameraModel(model_id=6, model_name="FULL_OPENCV", num_params=12), 61 | CameraModel(model_id=7, model_name="FOV", num_params=5), 62 | CameraModel(model_id=8, model_name="SIMPLE_RADIAL_FISHEYE", num_params=4), 63 | CameraModel(model_id=9, model_name="RADIAL_FISHEYE", num_params=5), 64 | CameraModel(model_id=10, model_name="THIN_PRISM_FISHEYE", num_params=12) 65 | } 66 | CAMERA_MODEL_IDS = dict([(camera_model.model_id, camera_model) \ 67 | for camera_model in CAMERA_MODELS]) 68 | 69 | 70 | def read_next_bytes(fid, num_bytes, format_char_sequence, endian_character="<"): 71 | """Read and unpack the next bytes from a binary file. 72 | :param fid: 73 | :param num_bytes: Sum of combination of {2, 4, 8}, e.g. 2, 6, 16, 30, etc. 74 | :param format_char_sequence: List of {c, e, f, d, h, H, i, I, l, L, q, Q}. 75 | :param endian_character: Any of {@, =, <, >, !} 76 | :return: Tuple of read and unpacked values. 
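    Example (formats as used later in this file): read_next_bytes(fid, 8, "Q")
    reads one little-endian uint64, e.g. the table length that prefixes the
    cameras/images/points sections; read_next_bytes(fid, 24, "iiQQ") reads a
    camera header (camera_id, model_id, width, height).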
77 | """ 78 | data = fid.read(num_bytes) 79 | return struct.unpack(endian_character + format_char_sequence, data) 80 | 81 | 82 | def read_cameras_text(path): 83 | """ 84 | see: src/base/reconstruction.cc 85 | void Reconstruction::WriteCamerasText(const std::string& path) 86 | void Reconstruction::ReadCamerasText(const std::string& path) 87 | """ 88 | cameras = {} 89 | with open(path, "r") as fid: 90 | while True: 91 | line = fid.readline() 92 | if not line: 93 | break 94 | line = line.strip() 95 | if len(line) > 0 and line[0] != "#": 96 | elems = line.split() 97 | camera_id = int(elems[0]) 98 | model = elems[1] 99 | width = int(elems[2]) 100 | height = int(elems[3]) 101 | params = np.array(tuple(map(float, elems[4:]))) 102 | cameras[camera_id] = Camera(id=camera_id, model=model, 103 | width=width, height=height, 104 | params=params) 105 | return cameras 106 | 107 | 108 | def read_cameras_binary(path_to_model_file): 109 | """ 110 | see: src/base/reconstruction.cc 111 | void Reconstruction::WriteCamerasBinary(const std::string& path) 112 | void Reconstruction::ReadCamerasBinary(const std::string& path) 113 | """ 114 | cameras = {} 115 | with open(path_to_model_file, "rb") as fid: 116 | num_cameras = read_next_bytes(fid, 8, "Q")[0] 117 | for camera_line_index in range(num_cameras): 118 | camera_properties = read_next_bytes( 119 | fid, num_bytes=24, format_char_sequence="iiQQ") 120 | camera_id = camera_properties[0] 121 | model_id = camera_properties[1] 122 | model_name = CAMERA_MODEL_IDS[camera_properties[1]].model_name 123 | width = camera_properties[2] 124 | height = camera_properties[3] 125 | num_params = CAMERA_MODEL_IDS[model_id].num_params 126 | params = read_next_bytes(fid, num_bytes=8*num_params, 127 | format_char_sequence="d"*num_params) 128 | cameras[camera_id] = Camera(id=camera_id, 129 | model=model_name, 130 | width=width, 131 | height=height, 132 | params=np.array(params)) 133 | assert len(cameras) == num_cameras 134 | return cameras 135 | 136 | 137 | def read_images_text(path): 138 | """ 139 | see: src/base/reconstruction.cc 140 | void Reconstruction::ReadImagesText(const std::string& path) 141 | void Reconstruction::WriteImagesText(const std::string& path) 142 | """ 143 | images = {} 144 | with open(path, "r") as fid: 145 | while True: 146 | line = fid.readline() 147 | if not line: 148 | break 149 | line = line.strip() 150 | if len(line) > 0 and line[0] != "#": 151 | elems = line.split() 152 | image_id = int(elems[0]) 153 | qvec = np.array(tuple(map(float, elems[1:5]))) 154 | tvec = np.array(tuple(map(float, elems[5:8]))) 155 | camera_id = int(elems[8]) 156 | image_name = elems[9] 157 | elems = fid.readline().split() 158 | xys = np.column_stack([tuple(map(float, elems[0::3])), 159 | tuple(map(float, elems[1::3]))]) 160 | point3D_ids = np.array(tuple(map(int, elems[2::3]))) 161 | images[image_id] = Image( 162 | id=image_id, qvec=qvec, tvec=tvec, 163 | camera_id=camera_id, name=image_name, 164 | xys=xys, point3D_ids=point3D_ids) 165 | return images 166 | 167 | 168 | def read_images_binary(path_to_model_file): 169 | """ 170 | see: src/base/reconstruction.cc 171 | void Reconstruction::ReadImagesBinary(const std::string& path) 172 | void Reconstruction::WriteImagesBinary(const std::string& path) 173 | """ 174 | images = {} 175 | with open(path_to_model_file, "rb") as fid: 176 | num_reg_images = read_next_bytes(fid, 8, "Q")[0] 177 | for image_index in range(num_reg_images): 178 | binary_image_properties = read_next_bytes( 179 | fid, num_bytes=64, format_char_sequence="idddddddi") 
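            # 64-byte image header, unpacked by index below:
            # int32 image_id, 7 x float64 (qw, qx, qy, qz, tx, ty, tz), int32 camera_id.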
180 | image_id = binary_image_properties[0] 181 | qvec = np.array(binary_image_properties[1:5]) 182 | tvec = np.array(binary_image_properties[5:8]) 183 | camera_id = binary_image_properties[8] 184 | image_name = "" 185 | current_char = read_next_bytes(fid, 1, "c")[0] 186 | while current_char != b"\x00": # look for the ASCII 0 entry 187 | image_name += current_char.decode("utf-8") 188 | current_char = read_next_bytes(fid, 1, "c")[0] 189 | num_points2D = read_next_bytes(fid, num_bytes=8, 190 | format_char_sequence="Q")[0] 191 | x_y_id_s = read_next_bytes(fid, num_bytes=24*num_points2D, 192 | format_char_sequence="ddq"*num_points2D) 193 | xys = np.column_stack([tuple(map(float, x_y_id_s[0::3])), 194 | tuple(map(float, x_y_id_s[1::3]))]) 195 | point3D_ids = np.array(tuple(map(int, x_y_id_s[2::3]))) 196 | images[image_id] = Image( 197 | id=image_id, qvec=qvec, tvec=tvec, 198 | camera_id=camera_id, name=image_name, 199 | xys=xys, point3D_ids=point3D_ids) 200 | return images 201 | 202 | 203 | def read_points3D_text(path): 204 | """ 205 | see: src/base/reconstruction.cc 206 | void Reconstruction::ReadPoints3DText(const std::string& path) 207 | void Reconstruction::WritePoints3DText(const std::string& path) 208 | """ 209 | points3D = {} 210 | with open(path, "r") as fid: 211 | while True: 212 | line = fid.readline() 213 | if not line: 214 | break 215 | line = line.strip() 216 | if len(line) > 0 and line[0] != "#": 217 | elems = line.split() 218 | point3D_id = int(elems[0]) 219 | xyz = np.array(tuple(map(float, elems[1:4]))) 220 | rgb = np.array(tuple(map(int, elems[4:7]))) 221 | error = float(elems[7]) 222 | image_ids = np.array(tuple(map(int, elems[8::2]))) 223 | point2D_idxs = np.array(tuple(map(int, elems[9::2]))) 224 | points3D[point3D_id] = Point3D(id=point3D_id, xyz=xyz, rgb=rgb, 225 | error=error, image_ids=image_ids, 226 | point2D_idxs=point2D_idxs) 227 | return points3D 228 | 229 | 230 | def read_points3d_binary(path_to_model_file): 231 | """ 232 | see: src/base/reconstruction.cc 233 | void Reconstruction::ReadPoints3DBinary(const std::string& path) 234 | void Reconstruction::WritePoints3DBinary(const std::string& path) 235 | """ 236 | points3D = {} 237 | with open(path_to_model_file, "rb") as fid: 238 | num_points = read_next_bytes(fid, 8, "Q")[0] 239 | for point_line_index in range(num_points): 240 | binary_point_line_properties = read_next_bytes( 241 | fid, num_bytes=43, format_char_sequence="QdddBBBd") 242 | point3D_id = binary_point_line_properties[0] 243 | xyz = np.array(binary_point_line_properties[1:4]) 244 | rgb = np.array(binary_point_line_properties[4:7]) 245 | error = np.array(binary_point_line_properties[7]) 246 | track_length = read_next_bytes( 247 | fid, num_bytes=8, format_char_sequence="Q")[0] 248 | track_elems = read_next_bytes( 249 | fid, num_bytes=8*track_length, 250 | format_char_sequence="ii"*track_length) 251 | image_ids = np.array(tuple(map(int, track_elems[0::2]))) 252 | point2D_idxs = np.array(tuple(map(int, track_elems[1::2]))) 253 | points3D[point3D_id] = Point3D( 254 | id=point3D_id, xyz=xyz, rgb=rgb, 255 | error=error, image_ids=image_ids, 256 | point2D_idxs=point2D_idxs) 257 | return points3D 258 | 259 | 260 | def read_model(path, ext): 261 | if ext == ".txt": 262 | cameras = read_cameras_text(os.path.join(path, "cameras" + ext)) 263 | images = read_images_text(os.path.join(path, "images" + ext)) 264 | points3D = read_points3D_text(os.path.join(path, "points3D") + ext) 265 | else: 266 | cameras = read_cameras_binary(os.path.join(path, "cameras" + ext)) 267 | 
images = read_images_binary(os.path.join(path, "images" + ext)) 268 | points3D = read_points3d_binary(os.path.join(path, "points3D") + ext) 269 | return cameras, images, points3D 270 | 271 | 272 | def qvec2rotmat(qvec): 273 | return np.array([ 274 | [1 - 2 * qvec[2]**2 - 2 * qvec[3]**2, 275 | 2 * qvec[1] * qvec[2] - 2 * qvec[0] * qvec[3], 276 | 2 * qvec[3] * qvec[1] + 2 * qvec[0] * qvec[2]], 277 | [2 * qvec[1] * qvec[2] + 2 * qvec[0] * qvec[3], 278 | 1 - 2 * qvec[1]**2 - 2 * qvec[3]**2, 279 | 2 * qvec[2] * qvec[3] - 2 * qvec[0] * qvec[1]], 280 | [2 * qvec[3] * qvec[1] - 2 * qvec[0] * qvec[2], 281 | 2 * qvec[2] * qvec[3] + 2 * qvec[0] * qvec[1], 282 | 1 - 2 * qvec[1]**2 - 2 * qvec[2]**2]]) 283 | 284 | 285 | def rotmat2qvec(R): 286 | Rxx, Ryx, Rzx, Rxy, Ryy, Rzy, Rxz, Ryz, Rzz = R.flat 287 | K = np.array([ 288 | [Rxx - Ryy - Rzz, 0, 0, 0], 289 | [Ryx + Rxy, Ryy - Rxx - Rzz, 0, 0], 290 | [Rzx + Rxz, Rzy + Ryz, Rzz - Rxx - Ryy, 0], 291 | [Ryz - Rzy, Rzx - Rxz, Rxy - Ryx, Rxx + Ryy + Rzz]]) / 3.0 292 | eigvals, eigvecs = np.linalg.eigh(K) 293 | qvec = eigvecs[[3, 0, 1, 2], np.argmax(eigvals)] 294 | if qvec[0] < 0: 295 | qvec *= -1 296 | return qvec -------------------------------------------------------------------------------- /utils/hovering/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/epic-kitchens/epic-fields-code/e5cd4fdfbbf130a9f6266dcb536f17bb52136f64/utils/hovering/__init__.py -------------------------------------------------------------------------------- /utils/hovering/helper.py: -------------------------------------------------------------------------------- 1 | from typing import List 2 | import os 3 | import numpy as np 4 | from PIL import Image 5 | import open3d as o3d 6 | import matplotlib.pyplot as plt 7 | from open3d.visualization import rendering 8 | 9 | 10 | from utils.hovering.o3d_line_mesh import LineMesh 11 | 12 | 13 | class Helper: 14 | base_colors = { 15 | 'white': [1, 1, 1, 0.8], 16 | 'red': [1, 0, 0, 1], 17 | 'blue': [0, 0, 1,1], 18 | 'green': [0, 1, 0,1], 19 | 'yellow': [1, 1, 0,1], 20 | 'purple': [0.2, 0.2, 0.8, 1] 21 | } 22 | 23 | def __init__(self, point_size): 24 | self.point_size = point_size 25 | 26 | def material(self, color: str, shader="defaultUnlit") -> rendering.MaterialRecord: 27 | """ 28 | Args: 29 | shader: e.g.'defaultUnlit', 'defaultLit', 'depth', 'normal' 30 | see Open3D: cpp/open3d/visualization/rendering/filament/FilamentScene.cpp#L1109 31 | """ 32 | material = rendering.MaterialRecord() 33 | material.shader = shader 34 | material.base_color = self.base_colors[color] 35 | material.point_size = self.point_size 36 | return material 37 | 38 | def get_cam_pos(c2w: np.ndarray) -> np.ndarray: 39 | """ Get camera position in world coordinate system 40 | """ 41 | cen = np.float32([0, 0, 0, 1]) 42 | pos = c2w @ cen 43 | return pos[:3] 44 | 45 | 46 | # def get_frustum(c2w: np.ndarray, 47 | # sz=0.2, 48 | # camera_height=None, 49 | # camera_width=None, 50 | # frustum_color=[1, 0, 0]) -> o3d.geometry.LineSet: 51 | # """ 52 | # Args: 53 | # c2w: np.ndarray, 4x4 camera-to-world matrix 54 | # sz: float, size (width) of the frustum 55 | # Returns: 56 | # frustum: o3d.geometry.TriangleMesh 57 | # """ 58 | # cen = [0, 0, 0] 59 | # wid = sz 60 | # if camera_height is not None and camera_width is not None: 61 | # hei = wid * camera_height / camera_width 62 | # else: 63 | # hei = wid 64 | # tl = [wid, hei, sz] 65 | # tr = [-wid, hei, sz] 66 | # br = [-wid, -hei, sz] 67 | # bl = [wid, 
-hei, sz] 68 | # points = np.float32([cen, tl, tr, br, bl]) 69 | # lines = [ 70 | # [0, 1], [0, 2], [0, 3], [0, 4], 71 | # [1, 2], [2, 3], [3, 4], [4, 1],] 72 | # frustum = o3d.geometry.LineSet() 73 | # frustum.points = o3d.utility.Vector3dVector(points) 74 | # frustum.lines = o3d.utility.Vector2iVector(lines) 75 | # frustum.colors = o3d.utility.Vector3dVector([np.asarray([1, 0, 0])]) 76 | # frustum.paint_uniform_color(frustum_color) 77 | 78 | # frustum = frustum.transform(c2w) 79 | # return frustum 80 | 81 | 82 | def get_trajectory(pos_history, 83 | num_line=6, 84 | line_radius=0.15 85 | ) -> o3d.geometry.TriangleMesh: 86 | """ pos_history: absolute position history 87 | """ 88 | pos_history = np.asarray(pos_history)[-num_line:] 89 | colors = [0, 0, 0.6] 90 | line_mesh = LineMesh( 91 | points=pos_history, 92 | colors=colors, radius=line_radius) 93 | line_mesh.merge_cylinder_segments() 94 | path = line_mesh.cylinder_segments[0] 95 | return path 96 | 97 | 98 | def get_pretty_trajectory(pos_history, 99 | num_line=6, 100 | line_radius=0.15, 101 | darkness=1.0, 102 | ) -> List[o3d.geometry.TriangleMesh]: 103 | """ pos_history: absolute position history 104 | """ 105 | def generate_jet_colors(n, darkness=0.6): 106 | cmap = plt.get_cmap('jet') 107 | norm = plt.Normalize(vmin=0, vmax=n-1) 108 | colors = cmap(norm(np.arange(n))) 109 | # Convert RGBA to RGB 110 | colors_rgb = [] 111 | for color in colors: 112 | colors_rgb.append(color[:3] * darkness) 113 | 114 | return colors_rgb 115 | 116 | pos_history = np.asarray(pos_history)[-num_line:] 117 | colors = generate_jet_colors(len(pos_history), darkness) 118 | line_mesh = LineMesh( 119 | points=pos_history, 120 | colors=colors, radius=line_radius) 121 | return line_mesh.cylinder_segments 122 | 123 | 124 | """ Obtain Viewpoint from Open3D GUI """ 125 | def parse_o3d_gui_view_status(status: dict, render: rendering.OffscreenRenderer): 126 | """ Parse open3d GUI's view status and convert to OffscreenRenderer format. 
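    Only status['trajectory'][0] is read, and only the keys used below, e.g.
    (illustrative values): {"trajectory": [{"field_of_view": 60.0,
    "lookat": [0, 0, 0], "front": [1, 1, 1], "up": [0, 0, 1], "zoom": 0.7}]};
    any other keys in the copied JSON are ignored.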
127 | This will do the normalisation of front and compute eye vector (updated version of front) 128 | 129 | 130 | Args: 131 | status: Ctrl-C output from Open3D GUI 132 | render: OffscreenRenderer 133 | Output: 134 | params for render.setup_camera(fov, lookat, eye, up) 135 | """ 136 | cam_info = status['trajectory'][0] 137 | fov = cam_info['field_of_view'] 138 | lookat = np.asarray(cam_info['lookat']) 139 | front = np.asarray(cam_info['front']) 140 | front = front / np.linalg.norm(front) 141 | up = np.asarray(cam_info['up']) 142 | zoom = cam_info['zoom'] 143 | """ 144 | See Open3D/cpp/open3d/visualization/visualizer/ViewControl.cpp#L243: 145 | void ViewControl::SetProjectionParameters() 146 | """ 147 | right = np.cross(up, front) / np.linalg.norm(np.cross(up, front)) 148 | view_ratio = zoom * render.scene.bounding_box.get_max_extent() 149 | distance = view_ratio / np.tan(fov * 0.5 / 180.0 * np.pi) 150 | eye = lookat + front * distance 151 | return fov, lookat, eye, up 152 | 153 | 154 | def set_offscreen_as_gui(render: rendering.OffscreenRenderer, status: dict): 155 | """ Set offscreen renderer as GUI's view status 156 | """ 157 | fov, lookat, eye, up = parse_o3d_gui_view_status(status, render) 158 | render.setup_camera(fov, lookat, eye, up) -------------------------------------------------------------------------------- /utils/hovering/hover_open3d.py: -------------------------------------------------------------------------------- 1 | from argparse import ArgumentParser 2 | import os 3 | import glob 4 | import numpy as np 5 | from PIL import Image 6 | from tqdm import tqdm 7 | import json 8 | import cv2 9 | import open3d as o3d 10 | from open3d.visualization import rendering 11 | 12 | from utils.base_type import ColmapModel 13 | from utils.hovering.helper import ( 14 | Helper, 15 | get_cam_pos, 16 | get_trajectory, get_pretty_trajectory, set_offscreen_as_gui 17 | ) 18 | from tools.visualise_data_open3d import get_c2w, get_frustum 19 | 20 | from moviepy import editor 21 | from PIL import ImageDraw, ImageFont 22 | 23 | 24 | TRAJECTORY_LINE_RADIUS = 0.01 25 | 26 | 27 | def parse_args(): 28 | parser = ArgumentParser() 29 | parser.add_argument('--model', help="path to direcctory containing images.bin", required=True) 30 | parser.add_argument('--pcd-path', help="path to fused.ply", default=None) 31 | parser.add_argument('--view-path', type=str, required=True, 32 | help='path to the view file, copy-paste from open3d gui.') 33 | parser.add_argument('--out_dir', type=str, default='outputs/hovering/') 34 | args = parser.parse_args() 35 | return args 36 | 37 | 38 | class HoverRunner: 39 | 40 | fov = None 41 | lookat = None 42 | front = None 43 | up = None 44 | 45 | background_color = [1, 1, 1, 1.0] 46 | 47 | def __init__(self, out_size: str = 'big'): 48 | if out_size == 'big': 49 | out_size = (1920, 1080) 50 | else: 51 | out_size = (640, 480) 52 | self.render = rendering.OffscreenRenderer(*out_size) 53 | 54 | def setup(self, 55 | model: ColmapModel, 56 | pcd_path: str, 57 | viewstatus_path: str, 58 | out_dir: str, 59 | img_x0: int = 0, 60 | img_y0: int = 0, 61 | frustum_size: float = 0.2, 62 | frustum_line_width: float = 5): 63 | """ 64 | Args: 65 | model: 66 | viewstatus_path: 67 | path to viewstatus.json, CTRL-c output from Open3D gui 68 | out_dir: 69 | e.g. 
'P34_104_out' 70 | """ 71 | self.model = model 72 | if pcd_path is not None: 73 | pcd = o3d.io.read_point_cloud(args.pcd_path) 74 | else: 75 | pcd_np = np.asarray([v.xyz for v in model.points.values()]) 76 | pcd_rgb = np.asarray([v.rgb / 255 for v in model.points.values()]) 77 | pcd = o3d.geometry.PointCloud() 78 | pcd.points = o3d.utility.Vector3dVector(pcd_np) 79 | pcd.colors = o3d.utility.Vector3dVector(pcd_rgb) 80 | self.transformed_pcd = pcd 81 | 82 | self.viewstatus_path = viewstatus_path 83 | self.out_dir = out_dir 84 | 85 | # Render Layout params 86 | # img_x0/img_y0: int. The top-left corner of the display image 87 | self.img_x0 = img_x0 88 | self.img_y0 = img_y0 89 | self.rgb_monitor_height = 456 90 | self.rgb_monitor_width = 456 91 | self.frustum_size = frustum_size 92 | self.frustum_line_width = frustum_line_width 93 | self.text_loc = (450, 1000) 94 | 95 | def test_single_frame(self, 96 | psize, 97 | img_index:int =None, 98 | clear_geometry: bool =True, 99 | lay_rgb_img: bool =True, 100 | sun_light: bool =False, 101 | show_first_frustum: bool =True, 102 | ): 103 | """ 104 | Args: 105 | psize: point size, 106 | probing a good point size is a bit tricky but very important! 107 | img_index: int. I.e. Frame number 108 | """ 109 | pcd = self.transformed_pcd 110 | 111 | if clear_geometry: 112 | self.render.scene.clear_geometry() 113 | 114 | # Get materials 115 | helper = Helper(point_size=psize) 116 | white = helper.material('white') 117 | red = helper.material('red', shader='unlitLine') 118 | red.line_width = self.frustum_line_width 119 | self.helper = helper 120 | 121 | # put on pcd 122 | self.render.scene.add_geometry('pcd', pcd, white) 123 | with open(self.viewstatus_path) as f: 124 | viewstatus = json.load(f) 125 | set_offscreen_as_gui(self.render, viewstatus) 126 | 127 | # now put frustum on canvas 128 | if img_index is None: 129 | img_index = 0 130 | c_image = self.model.ordered_images[img_index] 131 | c2w = get_c2w(list(c_image.qvec) + list(c_image.tvec)) 132 | frustum = get_frustum( 133 | c2w=c2w, sz=self.frustum_size, 134 | camera_height=self.rgb_monitor_height, 135 | camera_width=self.rgb_monitor_width) 136 | if show_first_frustum: 137 | self.render.scene.add_geometry('first_frustum', frustum, red) 138 | self.render.scene.set_background(self.background_color) 139 | 140 | if sun_light: 141 | self.render.scene.scene.set_sun_light( 142 | [0.707, 0.0, -.707], [1.0, 1.0, 1.0], 75000) 143 | self.render.scene.scene.enable_sun_light(True) 144 | else: 145 | self.render.scene.set_lighting( 146 | rendering.Open3DScene.NO_SHADOWS, (0, 0, 0)) 147 | self.render.scene.show_axes(False) 148 | 149 | img_buf = self.render.render_to_image() 150 | img = np.asarray(img_buf) 151 | test_img = self.model.read_rgb_from_name(c_image.name) 152 | test_img = cv2.resize( 153 | test_img, (self.rgb_monitor_width, self.rgb_monitor_height)) 154 | if lay_rgb_img: 155 | img[-self.rgb_monitor_height:, 156 | -self.rgb_monitor_width:] = test_img 157 | 158 | img_pil = Image.fromarray(img) 159 | I1 = ImageDraw.Draw(img_pil) 160 | myFont = ImageFont.truetype('FreeMono.ttf', 65) 161 | bbox = ( 162 | img.shape[1] - self.rgb_monitor_width, 163 | img.shape[0] - self.rgb_monitor_height, 164 | img.shape[1], 165 | img.shape[0]) 166 | # print(bbox) 167 | text = "Frame %d" % img_index 168 | I1.text(self.text_loc, text, font=myFont, fill =(0, 0, 0)) 169 | I1.rectangle(bbox, outline='red', width=5) 170 | img = np.asarray(img_pil) 171 | return img 172 | 173 | def run_all(self, step, traj_len=10): 174 | """ 175 | Args: 176 | 
step: int. Render every `step` frames 177 | traj_len: int. Number of trajectory lines to show 178 | """ 179 | render = self.render 180 | os.makedirs(self.out_dir, exist_ok=True) 181 | out_fmt = os.path.join(self.out_dir, '%010d.jpg') 182 | red_m = self.helper.material('red', shader='unlitLine') 183 | red_m.line_width = self.frustum_line_width 184 | white_m = self.helper.material('white') 185 | 186 | render.scene.remove_geometry('first_frustum') 187 | 188 | myFont = ImageFont.truetype('FreeMono.ttf', 65) 189 | bbox = (1464, 624, 1920, 1080) 190 | 191 | pos_history = [] 192 | num_images = self.model.num_images 193 | for frame_idx in tqdm(range(0, num_images, step), total=num_images//step): 194 | c_image = self.model.ordered_images[frame_idx] 195 | frame_rgb = self.model.read_rgb_from_name(c_image.name) 196 | frame_rgb = cv2.resize( 197 | frame_rgb, (self.rgb_monitor_width, self.rgb_monitor_height)) 198 | c2w = get_c2w(list(c_image.qvec) + list(c_image.tvec)) 199 | frustum = get_frustum( 200 | c2w=c2w, sz=self.frustum_size, 201 | camera_height=self.rgb_monitor_height, 202 | camera_width=self.rgb_monitor_width) 203 | pos_history.append(get_cam_pos(c2w)) 204 | 205 | if len(pos_history) > 2: 206 | # lines = get_pretty_trajectory( 207 | traj = get_trajectory( 208 | pos_history, num_line=traj_len, 209 | line_radius=TRAJECTORY_LINE_RADIUS) 210 | if render.scene.has_geometry('traj'): 211 | render.scene.remove_geometry('traj') 212 | render.scene.add_geometry('traj', traj, white_m) 213 | render.scene.add_geometry('frustum', frustum, red_m) 214 | 215 | img = render.render_to_image() 216 | img = np.asarray(img) 217 | img[-self.rgb_monitor_height:, 218 | -self.rgb_monitor_width:] = frame_rgb 219 | img_pil = Image.fromarray(img) 220 | 221 | I1 = ImageDraw.Draw(img_pil) 222 | text = "Frame %d" % frame_idx 223 | I1.text(self.text_loc, text, font=myFont, fill =(0, 0, 0)) 224 | I1.rectangle(bbox, outline='red', width=5) 225 | img_pil.save(out_fmt % frame_idx) 226 | 227 | render.scene.remove_geometry('frustum') 228 | 229 | # Gen output 230 | video_fps = 20 231 | print("Generating video...") 232 | seq = sorted(glob.glob(os.path.join(self.out_dir, '*.jpg'))) 233 | clip = editor.ImageSequenceClip(seq, fps=video_fps) 234 | clip.write_videofile(os.path.join(self.out_dir, 'out.mp4')) 235 | 236 | 237 | if __name__ == '__main__': 238 | args = parse_args() 239 | model = ColmapModel(args.model) 240 | model.read_rgb_from_name = \ 241 | lambda name: np.asarray(Image.open(f"outputs/demo/frames/{name}")) 242 | runner = HoverRunner() 243 | runner.setup( 244 | model, 245 | pcd_path=args.pcd_path, 246 | viewstatus_path=args.view_path, 247 | out_dir=args.out_dir, 248 | frustum_size=1, 249 | frustum_line_width=1) 250 | runner.test_single_frame(0.1) 251 | runner.run_all(step=3, traj_len=10) 252 | -------------------------------------------------------------------------------- /utils/hovering/o3d_line_mesh.py: -------------------------------------------------------------------------------- 1 | """Module which creates mesh lines from a line set 2 | Open3D relies upon using glLineWidth to set line width on a LineSet 3 | However, this method is now deprecated and not fully supporeted in newer OpenGL versions 4 | See: 5 | Open3D Github Pull Request - https://github.com/intel-isl/Open3D/pull/738 6 | Other Framework Issues - https://github.com/openframeworks/openFrameworks/issues/3460 7 | 8 | This module aims to solve this by converting a line into a triangular mesh (which has thickness) 9 | The basic idea is to create a cylinder for 
/utils/hovering/o3d_line_mesh.py:
--------------------------------------------------------------------------------
1 | """Module which creates mesh lines from a line set
2 | Open3D relies upon using glLineWidth to set line width on a LineSet
3 | However, this method is now deprecated and not fully supported in newer OpenGL versions
4 | See:
5 |     Open3D Github Pull Request - https://github.com/intel-isl/Open3D/pull/738
6 |     Other Framework Issues - https://github.com/openframeworks/openFrameworks/issues/3460
7 | 
8 | This module aims to solve this by converting a line into a triangular mesh (which has thickness)
9 | The basic idea is to create a cylinder for each line segment, translate it, and then rotate it.
10 | 
11 | License: MIT
12 | 
13 | """
14 | import numpy as np
15 | import open3d as o3d
16 | 
17 | 
18 | def align_vector_to_another(a=np.array([0, 0, 1]), b=np.array([1, 0, 0])):
19 |     """
20 |     Aligns vector a to vector b with axis angle rotation
21 |     """
22 |     if np.array_equal(a, b):
23 |         return None, None
24 |     axis_ = np.cross(a, b)
25 |     axis_ = axis_ / np.linalg.norm(axis_)
26 |     angle = np.arccos(np.dot(a, b))
27 | 
28 |     return axis_, angle
29 | 
30 | 
31 | def normalized(a, axis=-1, order=2):
32 |     """Normalizes a numpy array of points"""
33 |     l2 = np.atleast_1d(np.linalg.norm(a, order, axis))
34 |     l2[l2 == 0] = 1
35 |     return a / np.expand_dims(l2, axis), l2
36 | 
37 | 
38 | class LineMesh(object):
39 |     def __init__(self, points, lines=None, colors=[0, 1, 0], radius=0.15):
40 |         """Creates a line represented as a sequence of cylinder triangular meshes
41 | 
42 |         Arguments:
43 |             points {ndarray} -- Numpy array of points Nx3.
44 | 
45 |         Keyword Arguments:
46 |             lines {list[list] or None} -- List of point index pairs denoting line segments. If None, implicit lines from ordered pairwise points. (default: {None})
47 |             colors {list} -- list of colors, or single color of the line (default: {[0, 1, 0]})
48 |             radius {float} -- radius of cylinder (default: {0.15})
49 |         """
50 |         self.points = np.array(points)
51 |         self.lines = np.array(
52 |             lines) if lines is not None else self.lines_from_ordered_points(self.points)
53 |         self.colors = np.array(colors)
54 |         self.radius = radius
55 |         self.cylinder_segments = []
56 | 
57 |         self.create_line_mesh()
58 | 
59 |     @staticmethod
60 |     def lines_from_ordered_points(points):
61 |         lines = [[i, i + 1] for i in range(0, points.shape[0] - 1, 1)]
62 |         return np.array(lines)
63 | 
64 |     def create_line_mesh(self):
65 |         first_points = self.points[self.lines[:, 0], :]
66 |         second_points = self.points[self.lines[:, 1], :]
67 |         line_segments = second_points - first_points
68 |         line_segments_unit, line_lengths = normalized(line_segments)
69 | 
70 |         z_axis = np.array([0, 0, 1])
71 |         # Create triangular mesh cylinder segments of line
72 |         for i in range(line_segments_unit.shape[0]):
73 |             line_segment = line_segments_unit[i, :]
74 |             line_length = line_lengths[i]
75 |             # get axis angle rotation to align cylinder with line segment
76 |             axis, angle = align_vector_to_another(z_axis, line_segment)
77 |             # Get translation vector
78 |             translation = first_points[i, :] + line_segment * line_length * 0.5
79 |             # create cylinder and apply transformations
80 |             cylinder_segment = o3d.geometry.TriangleMesh.create_cylinder(
81 |                 self.radius, line_length)
82 |             cylinder_segment = cylinder_segment.translate(
83 |                 translation, relative=False)
84 |             if axis is not None:
85 |                 axis_a = axis * angle
86 |                 rot = o3d.geometry.get_rotation_matrix_from_axis_angle(axis_a)
87 |                 cylinder_segment = cylinder_segment.rotate(
88 |                     R=rot, center=cylinder_segment.get_center())
89 |                 # cylinder_segment = cylinder_segment.rotate(
90 |                 #   axis_a, center=True, type=o3d.geometry.RotationType.AxisAngle)
91 |             # color cylinder
92 |             color = self.colors if self.colors.ndim == 1 else self.colors[i, :]
93 |             cylinder_segment.paint_uniform_color(color)
94 | 
95 |             self.cylinder_segments.append(cylinder_segment)
96 | 
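    # The per-segment cylinders built above can be added to Open3D one by one,
    # but collapsing them into a single TriangleMesh (merge_cylinder_segments
    # below) keeps each polyline down to one geometry, which is much cheaper to
    # add, update and remove than hundreds of small meshes.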
97 |     def merge_cylinder_segments(self):
98 | 
99 |         vertices_list = [np.asarray(mesh.vertices) for mesh in self.cylinder_segments]
100 |         triangles_list = [np.asarray(mesh.triangles) for mesh in self.cylinder_segments]
101 |         triangles_offset = np.cumsum([v.shape[0] for v in vertices_list])
102 |         triangles_offset = np.insert(triangles_offset, 0, 0)[:-1]
103 | 
104 |         vertices = np.vstack(vertices_list)
105 |         triangles = np.vstack([triangle + offset for triangle, offset in zip(triangles_list, triangles_offset)])
106 | 
107 |         merged_mesh = o3d.geometry.TriangleMesh(o3d.utility.Vector3dVector(vertices),
108 |                                                 o3d.utility.Vector3iVector(triangles))
109 |         color = self.colors if self.colors.ndim == 1 else self.colors[0]
110 |         merged_mesh.paint_uniform_color(color)
111 |         self.cylinder_segments = [merged_mesh]
112 | 
113 |     def add_line(self, vis):
114 |         """Adds this line to the visualizer"""
115 |         for cylinder in self.cylinder_segments:
116 |             vis.add_geometry(cylinder)
117 | 
118 |     def remove_line(self, vis):
119 |         """Removes this line from the visualizer"""
120 |         for cylinder in self.cylinder_segments:
121 |             vis.remove_geometry(cylinder)
122 | 
123 | 
124 | def main():
125 |     print("Demonstrating LineMesh vs LineSet")
126 |     # Create Line Set
127 |     points = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1],
128 |               [0, 1, 1], [1, 1, 1]]
129 |     lines = [[0, 1], [0, 2], [1, 3], [2, 3], [4, 5], [4, 6], [5, 7], [6, 7],
130 |              [0, 4], [1, 5], [2, 6], [3, 7]]
131 |     colors = [[1, 0, 0] for i in range(len(lines))]
132 | 
133 |     line_set = o3d.geometry.LineSet()
134 |     line_set.points = o3d.utility.Vector3dVector(points)
135 |     line_set.lines = o3d.utility.Vector2iVector(lines)
136 |     line_set.colors = o3d.utility.Vector3dVector(colors)
137 | 
138 |     # Create Line Mesh 1
139 |     points = np.array(points) + [0, 0, 2]
140 |     line_mesh1 = LineMesh(points, lines, colors, radius=0.02)
141 |     line_mesh1_geoms = line_mesh1.cylinder_segments
142 | 
143 |     # Create Line Mesh 2
144 |     points = np.array(points) + [0, 2, 0]
145 |     line_mesh2 = LineMesh(points, radius=0.03)
146 |     line_mesh2_geoms = line_mesh2.cylinder_segments
147 | 
148 |     o3d.visualization.draw_geometries(
149 |         [line_set, *line_mesh1_geoms, *line_mesh2_geoms])
150 | 
151 | 
152 | if __name__ == "__main__":
153 |     main()
154 | 
155 | 
--------------------------------------------------------------------------------
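
`LineMesh` is what gives rendered trajectories real thickness: every segment of a polyline becomes a cylinder with an actual radius instead of a one-pixel GL line. A minimal usage sketch with made-up positions (the hovering renderer builds its own trajectory geometry through its helpers, which are not shown in this dump):

```python
import numpy as np
import open3d as o3d

from utils.hovering.o3d_line_mesh import LineMesh

# Hypothetical camera centres along a short path; replace with real positions.
positions = np.array([
    [0.0, 0.0, 0.0],
    [0.2, 0.0, 0.1],
    [0.4, 0.1, 0.2],
    [0.6, 0.1, 0.2],
])

# Consecutive points become cylinder segments with a visible thickness.
traj = LineMesh(positions, colors=[1.0, 1.0, 1.0], radius=0.01)
traj.merge_cylinder_segments()  # collapse the segments into one TriangleMesh
o3d.visualization.draw_geometries(traj.cylinder_segments)
```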
/utils/lib.py:
--------------------------------------------------------------------------------
1 | import pycolmap
2 | import shutil
3 | import os
4 | import glob
5 | import subprocess
6 | 
7 | def get_num_images(model_path):
8 |     reconstruction = pycolmap.Reconstruction(model_path)
9 |     num_images = reconstruction.num_images()
10 |     return num_images
11 | 
12 | def read_lines_from_file(filename):
13 |     """
14 |     Read lines from a txt file and return them as a list.
15 | 
16 |     :param filename: Name of the file to read from.
17 |     :return: List of lines from the file.
18 |     """
19 |     with open(filename, 'r') as file:
20 |         lines = file.readlines()
21 | 
22 |     # Strip any trailing newline characters
23 |     return [line.strip() for line in lines]
24 | 
25 | def keep_model_with_largest_images(reconstruction_path):
26 |     all_models = sorted(glob.glob(os.path.join(reconstruction_path, '*')))
27 |     try:
28 |         max_images = get_num_images(all_models[0])
29 |     except Exception:
30 |         return 0
31 |     selected_model = all_models[0]
32 |     if len(all_models) > 1:
33 |         for model in all_models:
34 |             num_images = get_num_images(model)
35 |             if num_images > max_images:
36 |                 max_images = num_images
37 |                 selected_model = model
38 | 
39 |     for model in all_models:
40 |         if model != selected_model:
41 |             shutil.rmtree(model)
42 |     os.rename(selected_model, os.path.join(reconstruction_path, '0'))
43 |     return max_images
44 | 
45 | # Define the function to execute in each process
46 | def run_script(script_path, arg):
47 |     cmd = ['python3', script_path] + arg
48 |     print(cmd)
49 |     subprocess.call(cmd)
--------------------------------------------------------------------------------
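
`run_script` above is meant to be executed once per worker process: it simply shells out to `python3`, so a small driver can fan the reconstruction scripts out over many videos with a process pool. A minimal sketch of such a driver, assuming each line of `input_videos.txt` is passed to `reconstruct_sparse.py` as a single command-line argument (check that script's argparser for the flags it actually expects):

```python
from multiprocessing import Pool

from utils.lib import read_lines_from_file, run_script

if __name__ == '__main__':
    videos = read_lines_from_file('input_videos.txt')
    # One (script, argv) pair per video; run_script expects argv as a list.
    jobs = [('reconstruct_sparse.py', [video]) for video in videos]
    with Pool(processes=4) as pool:
        pool.starmap(run_script, jobs)
```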