├── Network.png ├── README.md ├── core ├── BPnP.py ├── camera_model.py ├── corr.py ├── correlation_package │ ├── LICENSE │ ├── correlation.py │ ├── correlation_cuda.cc │ ├── correlation_cuda_kernel.cu │ ├── correlation_cuda_kernel.cuh │ ├── pyproject.toml │ └── setup.py ├── data_preprocess.py ├── datasets_kitti.py ├── depth_completion.py ├── extractor.py ├── flow2pose.py ├── flow_viz.py ├── losses.py ├── quaternion_distances.py ├── raft.py ├── update.py ├── utils.py ├── utils_point.py └── visibility_package │ ├── setup.py │ ├── visibility.cpp │ ├── visibility_kernel.cu │ ├── visibility_kernel_new.cu │ └── visibility_new.cpp ├── demo.py ├── doc └── I2D-Loc--Camera-localization-via-image_2022_ISPRS-Journal-of-Photogrammetry-.pdf ├── main.py ├── main_bpnp.py ├── preprocess └── kitti_maps.py ├── requirements.txt └── sample ├── image ├── 000500.png ├── 001000.png ├── 001500.png └── 002000.png ├── overlay ├── 000500.png ├── 001000.png ├── 001500.png └── 002000.png └── pc ├── 000500.h5 ├── 001000.h5 ├── 001500.h5 └── 002000.h5 /Network.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/Network.png -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # I2D-Loc 2 | This repository contains the source code for our paper: 3 | 4 | [I2D-Loc: Camera localization via image to LiDAR depth flow](https://www.sciencedirect.com/science/article/pii/S0924271622002775?dgcid=coauthor)
5 | ISPRS Journal of Photogrammetry and Remote Sensing, 2022
6 | Kuangyi Chen, Huai Yu, Wen Yang, Lei Yu, Sebastian Scherer and Gui-Song Xia
7 | 8 | 9 | 10 | ## Requirements 11 | The code has been trained and tested with PyTorch 1.12 and Cuda 11.6. 12 | ```Shell 13 | conda create -n i2d python=3.7 -y 14 | conda activate i2d 15 | pip install -r requirements.txt 16 | pip install torch==1.12.0+cu116 torchvision==0.13.0+cu116 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu116 17 | cd core/correlation_package 18 | python setup.py install 19 | cd .. 20 | cd visibility_package 21 | python setup.py install 22 | cd ../.. 23 | ``` 24 | 25 | ## Demos 26 | Pretrained models can be downloaded from [google drive](https://drive.google.com/drive/folders/19VWNCPR1me7SnON1NYJRFrdgd1sKj052?usp=sharing) 27 | 28 | You can demo a trained model on a sequence of frames 29 | ```Shell 30 | python demo.py --load_checkpoints checkpoints/2_10/checkpoints.pth --render 31 | ``` 32 | 33 | ## Required Data 34 | To evaluate/train I2D-Loc, you will need to download the required datasets. 35 | * [KITTI](https://www.cvlibs.net/datasets/kitti/eval_odometry.php) 36 | 37 | We trained and tested I2D-Loc on the KITTI odometry sequences 00, 03, 05, 07, 08, and 09. 38 | To obtain the whole LiDAR maps, we aggregate all scans at their ground truth positions. 39 | Then, we down-sample the LiDAR maps at a resolution of 0.1m. The downsampled point clouds are saved as h5 files. 40 | 41 | Use the script preprocess/kitti_maps.py with the ground truth files in data/ to generate the h5 files. 42 | 43 | ```Shell 44 | python preprocess/kitti_maps.py --sequence 00 --kitti_folder ./KITTI_ODOMETRY/ 45 | python preprocess/kitti_maps.py --sequence 03 --kitti_folder ./KITTI_ODOMETRY/ 46 | python preprocess/kitti_maps.py --sequence 05 --kitti_folder ./KITTI_ODOMETRY/ 47 | python preprocess/kitti_maps.py --sequence 06 --kitti_folder ./KITTI_ODOMETRY/ 48 | python preprocess/kitti_maps.py --sequence 07 --kitti_folder ./KITTI_ODOMETRY/ 49 | python preprocess/kitti_maps.py --sequence 08 --kitti_folder ./KITTI_ODOMETRY/ --end 3000 50 | python preprocess/kitti_maps.py --sequence 08 --kitti_folder ./KITTI_ODOMETRY/ --start 3000 51 | python preprocess/kitti_maps.py --sequence 09 --kitti_folder ./KITTI_ODOMETRY/ 52 | ``` 53 | 54 | The final directory structure should look like: 55 | 56 | ```Shell 57 | ├── datasets 58 | ├── KITTI 59 | ├── sequences 60 | ├── 00 61 | ├── image_2 62 | ├── *.png 63 | ├── local_maps_0.1 64 | ├── *.h5 65 | ├── calib.txt 66 | ├── map-00_0.1_0-4541.pcd 67 | ├── poses.csv 68 | ├── 03 69 | ├── 05 70 | ├── 06 71 | ├── 07 72 | ├── 08 73 | ├── 09 74 | ``` 75 | 76 | ## Evaluation 77 | You can evaluate a trained model using `main.py` 78 | ```Shell 79 | python main.py --data_path /data/KITTI/sequences --load_checkpoints checkpoints/2_10/checkpoints.pth -e 80 | ``` 81 | 82 | 83 | ## Training 84 | You can train a model using `main.py`. Training logs will be written to the `runs` which can be visualized using tensorboard. 85 | ```Shell 86 | python main.py --data_path /data/KITTI/sequences --test_sequence 00 --epochs 100 --batch_size 2 --lr 4e-5 --gpus 0 --max_r 10. --max_t 2. --evaluate_interval 1 87 | ``` 88 | If you want to train a model using BPnP as back-end, you can use `main_bpnp.py`. 89 | ```Shell 90 | python main_bpnp.py --data_path /data/KITTI/sequences --test_sequence 00 --epochs 100 --batch_size 2 --lr 4e-5 --gpus 0 --max_r 10. --max_t 2. 
--evaluate_interval 1 91 | ``` 92 | 93 | 94 | ## Citation 95 | ``` 96 | @inproceedings{CHEN2022209, 97 | title={{I2D-Loc: Camera Localization via Image to LiDAR Depth Flow}}, 98 | author={Kuangyi Chen, Huai Yu, Wen Yang, Lei Yu, Sebastian Scherer and Gui-Song Xia}, 99 | booktitle={ISPRS Journal of Photogrammetry and Remote Sensing}, 100 | volume = {194}, 101 | pages = {209-221}, 102 | year={2022}, 103 | issn = {0924-2716} 104 | } 105 | ``` 106 | 107 | ## Acknowledgments 108 | The code is based on [CMRNet](https://github.com/cattaneod/CMRNet), [RAFT](https://github.com/princeton-vl/RAFT), and [BPnP](https://github.com/BoChenYS/BPnP). -------------------------------------------------------------------------------- /core/BPnP.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import cv2 as cv 3 | import numpy as np 4 | import kornia as kn 5 | 6 | PTS_D_DEVICE = 'cuda' 7 | 8 | class BPnP(torch.autograd.Function): 9 | """ 10 | Back-propagatable PnP 11 | INPUTS: 12 | pts2d - the 2D keypoints coordinates of size [batch_size, num_keypoints, 2] 13 | pts3d - the 3D keypoints coordinates of size [num_keypoints, 3] 14 | K - the camera intrinsic matrix of size [3, 3] 15 | OUTPUT: 16 | P_6d - the 6 DOF poses of size [batch_size, 6], where the first 3 elements of each row are the angle-axis rotation 17 | vector (Euler vector) and the last 3 elements are the translation vector. 18 | NOTE: 19 | This BPnP function assumes that all sets of 2D points in the mini-batch correspond to one common set of 3D points. 20 | For situations where pts3d is also a mini-batch, use the BPnP_m3d class. 21 | """ 22 | @staticmethod 23 | def forward(ctx, pts2d, pts3d, K, ini_pose=None): 24 | bs = pts2d.size(0) 25 | n = pts2d.size(1) 26 | device = pts2d.device 27 | pts3d_np = np.array(pts3d.detach().cpu()) 28 | K_np = np.array(K.detach().cpu()) 29 | P_6d = torch.zeros(bs,6,device=device) 30 | 31 | for i in range(bs): 32 | pts2d_i_np = np.ascontiguousarray(pts2d[i].detach().cpu()).reshape((n,1,2)) 33 | if ini_pose is None: 34 | _, rvec0, T0, _ = cv.solvePnPRansac(objectPoints=pts3d_np, imagePoints=pts2d_i_np, cameraMatrix=K_np, distCoeffs=None, flags=cv.SOLVEPNP_ITERATIVE, confidence=0.9999 ,reprojectionError=3) 35 | else: 36 | rvec0 = np.array(ini_pose[i, 0:3].cpu().view(3, 1)) 37 | T0 = np.array(ini_pose[i, 3:6].cpu().view(3, 1)) 38 | _, rvec, T = cv.solvePnP(objectPoints=pts3d_np, imagePoints=pts2d_i_np, cameraMatrix=K_np, distCoeffs=None, flags=cv.SOLVEPNP_ITERATIVE, useExtrinsicGuess=True, rvec=rvec0, tvec=T0) 39 | angle_axis = torch.tensor(rvec,device=device,dtype=torch.float).view(1, 3) 40 | T = torch.tensor(T,device=device,dtype=torch.float).view(1, 3) 41 | P_6d[i,:] = torch.cat((angle_axis,T),dim=-1) 42 | 43 | ctx.save_for_backward(pts2d,P_6d,pts3d,K) 44 | return P_6d 45 | 46 | @staticmethod 47 | def backward(ctx, grad_output): 48 | 49 | pts2d, P_6d, pts3d, K = ctx.saved_tensors 50 | device = pts2d.device 51 | bs = pts2d.size(0) 52 | n = pts2d.size(1) 53 | m = 6 54 | 55 | grad_x = torch.zeros_like(pts2d) 56 | grad_z = torch.zeros_like(pts3d) 57 | grad_K = torch.zeros_like(K) 58 | 59 | for i in range(bs): 60 | J_fy = torch.zeros(m,m, device=device) 61 | J_fx = torch.zeros(m,2*n, device=device) 62 | J_fz = torch.zeros(m,3*n, device=device) 63 | J_fK = torch.zeros(m, 9, device=device) 64 | 65 | torch.set_grad_enabled(True) 66 | pts2d_flat = pts2d[i].clone().view(-1).detach().requires_grad_() 67 | P_6d_flat = P_6d[i].clone().view(-1).detach().requires_grad_() 68 | 
pts3d_flat = pts3d.clone().view(-1).detach().requires_grad_() 69 | K_flat = K.clone().view(-1).detach().requires_grad_() 70 | 71 | for j in range(m): 72 | torch.set_grad_enabled(True) 73 | if j > 0: 74 | pts2d_flat.grad.zero_() 75 | P_6d_flat.grad.zero_() 76 | pts3d_flat.grad.zero_() 77 | K_flat.grad.zero_() 78 | 79 | R = kn.angle_axis_to_rotation_matrix(P_6d_flat[0:m-3].view(1,3)) 80 | 81 | P = torch.cat((R[0,0:3,0:3].view(3,3), P_6d_flat[m-3:m].view(3,1)),dim=-1) 82 | KP = torch.mm(K_flat.view(3,3), P) 83 | pts2d_i = pts2d_flat.view(n,2).transpose(0,1) 84 | pts3d_i = torch.cat((pts3d_flat.view(n,3),torch.ones(n,1,device=device)),dim=-1).t() 85 | proj_i = KP.mm(pts3d_i) 86 | Si = proj_i[2,:].view(1,n) 87 | 88 | r = pts2d_i*Si-proj_i[0:2,:] 89 | coefs = get_coefs(P_6d_flat.view(1,6), pts3d_flat.view(n,3), K_flat.view(3,3)) 90 | coef = coefs[:,:,j].transpose(0,1) # size: [2,n] 91 | fj = (coef*r).sum() 92 | fj.backward() 93 | J_fy[j,:] = P_6d_flat.grad.clone() 94 | J_fx[j,:] = pts2d_flat.grad.clone() 95 | J_fz[j,:] = pts3d_flat.grad.clone() 96 | J_fK[j,:] = K_flat.grad.clone() 97 | 98 | inv_J_fy = torch.inverse(J_fy) 99 | 100 | J_yx = (-1) * torch.mm(inv_J_fy, J_fx) 101 | J_yz = (-1) * torch.mm(inv_J_fy, J_fz) 102 | J_yK = (-1) * torch.mm(inv_J_fy, J_fK) 103 | 104 | grad_x[i] = grad_output[i].view(1,m).mm(J_yx).view(n,2) 105 | grad_z += grad_output[i].view(1,m).mm(J_yz).view(n,3) 106 | grad_K += grad_output[i].view(1,m).mm(J_yK).view(3,3) 107 | 108 | return grad_x, grad_z, grad_K, None 109 | 110 | 111 | class BPnP_m3d(torch.autograd.Function): 112 | """ 113 | BPnP_m3d supports mini-batch intputs of 3D keypoints, where the i-th set of 2D keypoints correspond to the i-th set of 3D keypoints. 114 | INPUTS: 115 | pts2d - the 2D keypoints coordinates of size [batch_size, num_keypoints, 2] 116 | pts3d - the 3D keypoints coordinates of size [batch_size, num_keypoints, 3] 117 | K - the camera intrinsic matrix of size [3, 3] 118 | OUTPUT: 119 | P_6d - the 6 DOF poses of size [batch_size, 6], where the first 3 elements of each row are the angle-axis rotation 120 | vector (Euler vector) and the last 3 elements are the translation vector. 121 | NOTE: 122 | For situations where all sets of 2D points in the mini-batch correspond to one common set of 3D points, use the BPnP class. 
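    EXAMPLE (an illustrative usage sketch added here, not part of the original file; pts2d, pts3d
    and K are assumed tensors of the shapes listed above):
        P_6d = BPnP_m3d.apply(pts2d, pts3d, K)
        # P_6d[:, :3] holds the angle-axis rotation and P_6d[:, 3:] the translation; gradients
        # flow back to pts2d, pts3d and K through the custom backward defined in this class.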
123 | """ 124 | @staticmethod 125 | def forward(ctx, pts2d, pts3d, K, ini_pose=None): 126 | bs = pts2d.size(0) 127 | n = pts2d.size(1) 128 | device = pts2d.device 129 | K_np = np.array(K.detach().cpu()) 130 | P_6d = torch.zeros(bs,6,device=device) 131 | 132 | for i in range(bs): 133 | pts2d_i_np = np.ascontiguousarray(pts2d[i].detach().cpu()).reshape((n,1,2)) 134 | pts3d_i_np = np.ascontiguousarray(pts3d[i].detach().cpu()).reshape((n,3)) 135 | if ini_pose is None: 136 | _, rvec0, T0, _ = cv.solvePnPRansac(objectPoints=pts3d_i_np, imagePoints=pts2d_i_np, cameraMatrix=K_np, distCoeffs=None, flags=cv.SOLVEPNP_ITERATIVE, confidence=0.9999 ,reprojectionError=1) 137 | else: 138 | rvec0 = np.array(ini_pose[i, 0:3].cpu().view(3, 1)) 139 | T0 = np.array(ini_pose[i, 3:6].cpu().view(3, 1)) 140 | _, rvec, T = cv.solvePnP(objectPoints=pts3d_i_np, imagePoints=pts2d_i_np, cameraMatrix=K_np, distCoeffs=None, flags=cv.SOLVEPNP_ITERATIVE, useExtrinsicGuess=True, rvec=rvec0, tvec=T0) 141 | angle_axis = torch.tensor(rvec,device=device,dtype=torch.float).view(1, 3) 142 | T = torch.tensor(T,device=device,dtype=torch.float).view(1, 3) 143 | P_6d[i,:] = torch.cat((angle_axis,T),dim=-1) 144 | 145 | ctx.save_for_backward(pts2d,P_6d,pts3d,K) 146 | return P_6d 147 | 148 | @staticmethod 149 | def backward(ctx, grad_output): 150 | 151 | pts2d, P_6d, pts3d, K = ctx.saved_tensors 152 | device = PTS_D_DEVICE 153 | bs = pts2d.size(0) 154 | n = pts2d.size(1) 155 | m = 6 156 | 157 | grad_x = torch.zeros_like(pts2d) 158 | grad_z = torch.zeros_like(pts3d) 159 | grad_K = torch.zeros_like(K) 160 | 161 | for i in range(bs): 162 | J_fy = torch.zeros(m,m, device=device) 163 | J_fx = torch.zeros(m,2*n, device=device) 164 | J_fz = torch.zeros(m,3*n, device=device) 165 | J_fK = torch.zeros(m, 9, device=device) 166 | 167 | torch.set_grad_enabled(True) 168 | pts2d_flat = pts2d[i].clone().view(-1).detach().requires_grad_() 169 | P_6d_flat = P_6d[i].clone().view(-1).detach().requires_grad_() 170 | pts3d_flat = pts3d[i].clone().view(-1).detach().requires_grad_() 171 | K_flat = K.clone().view(-1).detach().requires_grad_() 172 | 173 | for j in range(m): 174 | torch.set_grad_enabled(True) 175 | if j > 0: 176 | pts2d_flat.grad.zero_() 177 | P_6d_flat.grad.zero_() 178 | pts3d_flat.grad.zero_() 179 | K_flat.grad.zero_() 180 | 181 | R = kn.angle_axis_to_rotation_matrix(P_6d_flat[0:m-3].view(1,3)) 182 | 183 | P = torch.cat((R[0,0:3,0:3].view(3,3), P_6d_flat[m-3:m].view(3,1)),dim=-1) 184 | KP = torch.mm(K_flat.view(3,3), P) 185 | pts2d_i = pts2d_flat.view(n,2).transpose(0,1) 186 | pts3d_i = torch.cat((pts3d_flat.view(n,3),torch.ones(n,1,device=device)),dim=-1).t() 187 | proj_i = KP.mm(pts3d_i) 188 | Si = proj_i[2,:].view(1,n) 189 | 190 | r = pts2d_i*Si-proj_i[0:2,:] 191 | coefs = get_coefs(P_6d_flat.view(1,6), pts3d_flat.view(n,3), K_flat.view(3,3)) 192 | coef = coefs[:,:,j].transpose(0,1) # size: [2,n] 193 | fj = (coef*r).sum() 194 | fj.backward() 195 | J_fy[j,:] = P_6d_flat.grad.clone() 196 | J_fx[j,:] = pts2d_flat.grad.clone() 197 | J_fz[j,:] = pts3d_flat.grad.clone() 198 | J_fK[j,:] = K_flat.grad.clone() 199 | 200 | inv_J_fy = torch.inverse(J_fy) 201 | 202 | J_yx = (-1) * torch.mm(inv_J_fy, J_fx) 203 | J_yz = (-1) * torch.mm(inv_J_fy, J_fz) 204 | J_yK = (-1) * torch.mm(inv_J_fy, J_fK) 205 | 206 | grad_x[i] = grad_output[i].view(1,m).mm(J_yx).view(n,2) 207 | grad_z[i] = grad_output[i].view(1,m).mm(J_yz).view(n,3) 208 | grad_K += grad_output[i].view(1,m).mm(J_yK).view(3,3) 209 | 210 | return grad_x, grad_z, grad_K, None 211 | 212 | 213 | class 
BPnP_fast(torch.autograd.Function): 214 | """ 215 | BPnP_fast is the efficient version of the BPnP class which ignores the higher order dirivatives through the coefs' graph. This sacrifices 216 | negligible gradient accuracy yet saves significant runtime. 217 | INPUTS: 218 | pts2d - the 2D keypoints coordinates of size [batch_size, num_keypoints, 2] 219 | pts3d - the 3D keypoints coordinates of size [num_keypoints, 3] 220 | K - the camera intrinsic matrix of size [3, 3] 221 | OUTPUT: 222 | P_6d - the 6 DOF poses of size [batch_size, 6], where the first 3 elements of each row are the angle-axis rotation 223 | vector (Euler vector) and the last 3 elements are the translation vector. 224 | NOTE: 225 | This BPnP function assumes that all sets of 2D points in the mini-batch correspond to one common set of 3D points. 226 | For situations where pts3d is also a mini-batch, use the BPnP_m3d class. 227 | """ 228 | @staticmethod 229 | def forward(ctx, pts2d, pts3d, K, ini_pose=None): 230 | bs = pts2d.size(0) 231 | n = pts2d.size(1) 232 | device = pts2d.device 233 | pts3d_np = np.array(pts3d.detach().cpu()) 234 | K_np = np.array(K.detach().cpu()) 235 | P_6d = torch.zeros(bs,6,device=device) 236 | 237 | for i in range(bs): 238 | pts2d_i_np = np.ascontiguousarray(pts2d[i].detach().cpu()).reshape((n,1,2)) 239 | if ini_pose is None: 240 | _, rvec0, T0, _ = cv.solvePnPRansac(objectPoints=pts3d_np, imagePoints=pts2d_i_np, cameraMatrix=K_np, distCoeffs=None, flags=cv.SOLVEPNP_ITERATIVE, confidence=0.9999 ,reprojectionError=3) 241 | else: 242 | rvec0 = np.array(ini_pose[i, 0:3].cpu().view(3, 1)) 243 | T0 = np.array(ini_pose[i, 3:6].cpu().view(3, 1)) 244 | _, rvec, T = cv.solvePnP(objectPoints=pts3d_np, imagePoints=pts2d_i_np, cameraMatrix=K_np, distCoeffs=None, flags=cv.SOLVEPNP_ITERATIVE, useExtrinsicGuess=True, rvec=rvec0, tvec=T0) 245 | angle_axis = torch.tensor(rvec,device=device,dtype=torch.float).view(1, 3) 246 | T = torch.tensor(T,device=device,dtype=torch.float).view(1, 3) 247 | P_6d[i,:] = torch.cat((angle_axis,T),dim=-1) 248 | 249 | ctx.save_for_backward(pts2d,P_6d,pts3d,K) 250 | return P_6d 251 | 252 | @staticmethod 253 | def backward(ctx, grad_output): 254 | 255 | pts2d, P_6d, pts3d, K = ctx.saved_tensors 256 | device = pts2d.device 257 | bs = pts2d.size(0) 258 | n = pts2d.size(1) 259 | m = 6 260 | 261 | grad_x = torch.zeros_like(pts2d) 262 | grad_z = torch.zeros_like(pts3d) 263 | grad_K = torch.zeros_like(K) 264 | 265 | for i in range(bs): 266 | J_fy = torch.zeros(m,m, device=device) 267 | J_fx = torch.zeros(m,2*n, device=device) 268 | J_fz = torch.zeros(m,3*n, device=device) 269 | J_fK = torch.zeros(m, 9, device=device) 270 | 271 | coefs = get_coefs(P_6d[i].view(1,6), pts3d, K, create_graph=False).detach() 272 | 273 | pts2d_flat = pts2d[i].clone().view(-1).detach().requires_grad_() 274 | P_6d_flat = P_6d[i].clone().view(-1).detach().requires_grad_() 275 | pts3d_flat = pts3d.clone().view(-1).detach().requires_grad_() 276 | K_flat = K.clone().view(-1).detach().requires_grad_() 277 | 278 | for j in range(m): 279 | torch.set_grad_enabled(True) 280 | if j > 0: 281 | pts2d_flat.grad.zero_() 282 | P_6d_flat.grad.zero_() 283 | pts3d_flat.grad.zero_() 284 | K_flat.grad.zero_() 285 | 286 | R = kn.angle_axis_to_rotation_matrix(P_6d_flat[0:m-3].view(1,3)) 287 | 288 | P = torch.cat((R[0,0:3,0:3].view(3,3), P_6d_flat[m-3:m].view(3,1)),dim=-1) 289 | KP = torch.mm(K_flat.view(3,3), P) 290 | pts2d_i = pts2d_flat.view(n,2).transpose(0,1) 291 | pts3d_i = 
torch.cat((pts3d_flat.view(n,3),torch.ones(n,1,device=device)),dim=-1).t() 292 | proj_i = KP.mm(pts3d_i) 293 | Si = proj_i[2,:].view(1,n) 294 | 295 | r = pts2d_i*Si-proj_i[0:2,:] 296 | coef = coefs[:,:,j].transpose(0,1) # size: [2,n] 297 | fj = (coef*r).sum() 298 | fj.backward() 299 | J_fy[j,:] = P_6d_flat.grad.clone() 300 | J_fx[j,:] = pts2d_flat.grad.clone() 301 | J_fz[j,:] = pts3d_flat.grad.clone() 302 | J_fK[j,:] = K_flat.grad.clone() 303 | 304 | inv_J_fy = torch.inverse(J_fy) 305 | 306 | J_yx = (-1) * torch.mm(inv_J_fy, J_fx) 307 | J_yz = (-1) * torch.mm(inv_J_fy, J_fz) 308 | J_yK = (-1) * torch.mm(inv_J_fy, J_fK) 309 | 310 | grad_x[i] = grad_output[i].view(1,m).mm(J_yx).view(n,2) 311 | grad_z += grad_output[i].view(1,m).mm(J_yz).view(n,3) 312 | grad_K += grad_output[i].view(1,m).mm(J_yK).view(3,3) 313 | 314 | return grad_x, grad_z, grad_K, None 315 | 316 | 317 | def get_coefs(P_6d, pts3d, K, create_graph=True): 318 | device = P_6d.device 319 | n = pts3d.size(0) 320 | m = P_6d.size(-1) 321 | coefs = torch.zeros(n,2,m,device=device) 322 | torch.set_grad_enabled(True) 323 | y = P_6d.repeat(n,1) 324 | proj = batch_project(y, pts3d, K).squeeze() 325 | vec = torch.diag(torch.ones(n,device=device).float()) 326 | for k in range(2): 327 | torch.set_grad_enabled(True) 328 | y_grad = torch.autograd.grad(proj[:,:,k],y,vec, retain_graph=True, create_graph=create_graph) 329 | coefs[:,k,:] = -2*y_grad[0].clone() 330 | return coefs 331 | 332 | def batch_project(P, pts3d, K, angle_axis=True): 333 | n = pts3d.size(0) 334 | bs = P.size(0) 335 | device = P.device 336 | pts3d_h = torch.cat((pts3d, torch.ones(n, 1, device=device)), dim=-1) 337 | if angle_axis: 338 | R_out = kn.angle_axis_to_rotation_matrix(P[:, 0:3].view(bs, 3)) 339 | PM = torch.cat((R_out[:,0:3,0:3], P[:, 3:6].view(bs, 3, 1)), dim=-1) 340 | else: 341 | PM = P 342 | pts3d_cam = pts3d_h.matmul(PM.transpose(-2,-1)) 343 | pts2d_proj = pts3d_cam.matmul(K.t()) 344 | S = pts2d_proj[:,:, 2].view(bs, n, 1) 345 | pts2d_pro = pts2d_proj[:,:,0:2].div(S) 346 | 347 | return pts2d_pro 348 | 349 | 350 | 351 | 352 | 353 | -------------------------------------------------------------------------------- /core/camera_model.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------- 2 | # Copyright (C) 2020 Università degli studi di Milano-Bicocca, iralab 3 | # Author: Daniele Cattaneo (d.cattaneo10@campus.unimib.it) 4 | # Released under Creative Commons 5 | # Attribution-NonCommercial-ShareAlike 4.0 International License. 6 | # http://creativecommons.org/licenses/by-nc-sa/4.0/ 7 | # ------------------------------------------------------------------- 8 | import numpy as np 9 | import torch 10 | import sys 11 | sys.path.append("core") 12 | from utils_point import rotate_forward, rotate_back 13 | 14 | class CameraModel: 15 | def __init__(self, focal_length=None, principal_point=None): 16 | self.focal_length = focal_length 17 | self.principal_point = principal_point 18 | 19 | def project_pytorch(self, xyz: torch.Tensor, image_size, reflectance=None): 20 | if xyz.shape[0] == 3: 21 | xyz = torch.cat([xyz, torch.ones(1, xyz.shape[1], device=xyz.device)]) 22 | else: 23 | if not torch.all(xyz[3, :] == 1.): 24 | xyz[3, :] = 1. 
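                # Descriptive note (added comment): a 4-row input whose fourth row is not all ones
                # is treated as invalid homogeneous coordinates and rejected by the exception below.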
25 | raise TypeError("Wrong Coordinates") 26 | order = [1, 2, 0, 3] 27 | xyzw = xyz[order, :] 28 | indexes = xyzw[2, :] >= 0 29 | if reflectance is not None: 30 | reflectance = reflectance[:, indexes] 31 | xyzw = xyzw[:, indexes] 32 | 33 | uv = torch.zeros((2, xyzw.shape[1]), device=xyzw.device) 34 | uv[0, :] = self.focal_length[0] * xyzw[0, :] / xyzw[2, :] + self.principal_point[0] 35 | uv[1, :] = self.focal_length[1] * xyzw[1, :] / xyzw[2, :] + self.principal_point[1] 36 | indexes = uv[0, :] >= 0.1 37 | indexes = indexes & (uv[1, :] >= 0.1) 38 | indexes = indexes & (uv[0,:] < image_size[1]) 39 | indexes = indexes & (uv[1,:] < image_size[0]) 40 | if reflectance is None: 41 | uv = uv[:, indexes], xyzw[2, indexes], xyzw[:3, indexes], None 42 | else: 43 | uv = uv[:, indexes], xyzw[2, indexes], xyzw[:3, indexes], reflectance[:, indexes] 44 | 45 | return uv 46 | 47 | # for pc_RT 48 | def project_withindex_pytorch(self, xyz: torch.Tensor, image_size, reflectance=None): 49 | if xyz.shape[0] == 3: 50 | xyz = torch.cat([xyz, torch.ones(1, xyz.shape[1], device=xyz.device)]) 51 | else: 52 | if not torch.all(xyz[3, :] == 1.): 53 | xyz[3, :] = 1. 54 | raise TypeError("Wrong Coordinates") 55 | order = [1, 2, 0, 3] 56 | xyzw = xyz[order, :] 57 | indexes = xyzw[2, :] >= 0 58 | if reflectance is not None: 59 | reflectance = reflectance[:, indexes] 60 | xyzw = xyzw[:, indexes] 61 | 62 | VI_indexes = indexes 63 | 64 | 65 | uv = torch.zeros((2, xyzw.shape[1]), device=xyzw.device) 66 | uv[0, :] = self.focal_length[0] * xyzw[0, :] / xyzw[2, :] + self.principal_point[0] 67 | uv[1, :] = self.focal_length[1] * xyzw[1, :] / xyzw[2, :] + self.principal_point[1] 68 | indexes = uv[0, :] >= 0.1 69 | indexes = indexes & (uv[1, :] >= 0.1) 70 | indexes = indexes & (uv[0, :] < image_size[1]) 71 | indexes = indexes & (uv[1, :] < image_size[0]) 72 | 73 | # generate complete indexes 74 | ind = torch.where(VI_indexes == True)[0] 75 | VI_indexes[ind] = VI_indexes[ind] & indexes 76 | 77 | 78 | if reflectance is None: 79 | uv = uv[:, indexes], xyzw[2, indexes], xyzw[:3, indexes], None, VI_indexes 80 | else: 81 | uv = uv[:, indexes], xyzw[2, indexes], xyzw[:3, indexes], reflectance[:, indexes], VI_indexes 82 | 83 | return uv 84 | 85 | 86 | def get_matrix(self): 87 | matrix = np.zeros([3, 3]) 88 | matrix[0, 0] = self.focal_length[0] 89 | matrix[1, 1] = self.focal_length[1] 90 | matrix[0, 2] = self.principal_point[0] 91 | matrix[1, 2] = self.principal_point[1] 92 | matrix[2, 2] = 1.0 93 | return matrix 94 | 95 | def deproject_pytorch(self, uv, pc_project_uv): 96 | index = np.argwhere(uv > 0) 97 | mask = uv > 0 98 | z = uv[mask] 99 | # x = (index[:, 1] - self.principal_point[0].cpu().numpy()) * z / self.focal_length[0].cpu().numpy() 100 | # y = (index[:, 0] - self.principal_point[1].cpu().numpy()) * z / self.focal_length[1].cpu().numpy() 101 | x = (index[:, 1] - self.principal_point[0]) * z / self.focal_length[0] 102 | y = (index[:, 0] - self.principal_point[1]) * z / self.focal_length[1] 103 | zxy = np.array([z, x, y]) 104 | zxy = torch.tensor(zxy, dtype=torch.float32) 105 | zxyw = torch.cat([zxy, torch.ones(1, zxy.shape[1], device=zxy.device)]) 106 | zxy = zxyw[:3, :] 107 | zxy = zxy.cpu().numpy() 108 | xyz = zxy[[1, 2, 0], :] 109 | 110 | 111 | pc_project_u = pc_project_uv[:, :, 0][mask] 112 | pc_project_v = pc_project_uv[:, :, 1][mask] 113 | pc_project = np.array([pc_project_v, pc_project_u]) 114 | 115 | 116 | match_index = np.array([index[:, 0], index[:, 1]]) 117 | 118 | return xyz.transpose(), pc_project.transpose(), 
match_index.transpose() 119 | 120 | def depth2pc(self, depth_img): 121 | # pc = torch.zeros([depth_img.shape[0], depth_img.shape[1], 3]) 122 | # pc[:, :, 2] = depth_img 123 | # 124 | # for i in range(depth_img.shape[0]): 125 | # for j in range(depth_img.shape[1]): 126 | # pc[i, j, 0] = (j - self.principal_point[0]) * depth_img[i, j] / self.focal_length[0] 127 | # pc[i, j, 1] = (i - self.principal_point[1]) * depth_img[i, j] / self.focal_length[1] 128 | 129 | depth_img = depth_img.cpu().numpy() 130 | index = np.argwhere(depth_img > 0) 131 | mask = depth_img > 0 132 | z = depth_img[mask] 133 | x = (index[:, 1] - self.principal_point[0].cpu().numpy()) * z / self.focal_length[0].cpu().numpy() 134 | y = (index[:, 0] - self.principal_point[1].cpu().numpy()) * z / self.focal_length[1].cpu().numpy() 135 | 136 | # pc[index[:, 0], index[:, 1], 0] = x 137 | # pc[index[:, 0], index[:, 1], 1] = y 138 | # pc[index[:, 0], index[:, 1], 2] = z 139 | zxy = np.array([z, x, y], dtype=np.float32) 140 | return zxy -------------------------------------------------------------------------------- /core/corr.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn.functional as F 3 | import sys 4 | sys.path.append("core") 5 | from utils import bilinear_sampler, coords_grid 6 | 7 | try: 8 | import alt_cuda_corr 9 | except: 10 | # alt_cuda_corr is not compiled 11 | pass 12 | 13 | 14 | class CorrBlock: 15 | def __init__(self, fmap1, fmap2, num_levels=4, radius=4): 16 | self.num_levels = num_levels 17 | self.radius = radius 18 | self.corr_pyramid = [] 19 | 20 | # all pairs correlation 21 | corr = CorrBlock.corr(fmap1, fmap2) 22 | 23 | batch, h1, w1, dim, h2, w2 = corr.shape 24 | corr = corr.reshape(batch * h1 * w1, dim, h2, w2) 25 | 26 | self.corr_pyramid.append(corr) 27 | for i in range(self.num_levels - 1): 28 | corr = F.avg_pool2d(corr, 2, stride=2) 29 | self.corr_pyramid.append(corr) 30 | 31 | def __call__(self, coords): 32 | r = self.radius 33 | coords = coords.permute(0, 2, 3, 1) 34 | batch, h1, w1, _ = coords.shape 35 | 36 | out_pyramid = [] 37 | for i in range(self.num_levels): 38 | corr = self.corr_pyramid[i] 39 | dx = torch.linspace(-r, r, 2 * r + 1) 40 | dy = torch.linspace(-r, r, 2 * r + 1) 41 | delta = torch.stack(torch.meshgrid(dy, dx), axis=-1).to(coords.device) 42 | 43 | centroid_lvl = coords.reshape(batch * h1 * w1, 1, 1, 2) / 2 ** i 44 | delta_lvl = delta.view(1, 2 * r + 1, 2 * r + 1, 2) 45 | coords_lvl = centroid_lvl + delta_lvl 46 | 47 | corr = bilinear_sampler(corr, coords_lvl) 48 | corr = corr.view(batch, h1, w1, -1) 49 | out_pyramid.append(corr) 50 | 51 | out = torch.cat(out_pyramid, dim=-1) 52 | return out.permute(0, 3, 1, 2).contiguous().float() 53 | 54 | @staticmethod 55 | def corr(fmap1, fmap2): 56 | batch, dim, ht, wd = fmap1.shape 57 | fmap1 = fmap1.view(batch, dim, ht * wd) 58 | fmap2 = fmap2.view(batch, dim, ht * wd) 59 | 60 | corr = torch.matmul(fmap1.transpose(1, 2), fmap2) 61 | corr = corr.view(batch, ht, wd, 1, ht, wd) 62 | return corr / torch.sqrt(torch.tensor(dim).float()) 63 | 64 | 65 | class AlternateCorrBlock: 66 | def __init__(self, fmap1, fmap2, num_levels=4, radius=4): 67 | self.num_levels = num_levels 68 | self.radius = radius 69 | 70 | self.pyramid = [(fmap1, fmap2)] 71 | for i in range(self.num_levels): 72 | fmap1 = F.avg_pool2d(fmap1, 2, stride=2) 73 | fmap2 = F.avg_pool2d(fmap2, 2, stride=2) 74 | self.pyramid.append((fmap1, fmap2)) 75 | 76 | def __call__(self, coords): 77 | coords = coords.permute(0, 2, 3, 
1) 78 | B, H, W, _ = coords.shape 79 | dim = self.pyramid[0][0].shape[1] 80 | 81 | corr_list = [] 82 | for i in range(self.num_levels): 83 | r = self.radius 84 | fmap1_i = self.pyramid[0][0].permute(0, 2, 3, 1).contiguous() 85 | fmap2_i = self.pyramid[i][1].permute(0, 2, 3, 1).contiguous() 86 | 87 | coords_i = (coords / 2 ** i).reshape(B, 1, H, W, 2).contiguous() 88 | corr, = alt_cuda_corr.forward(fmap1_i, fmap2_i, coords_i, r) 89 | corr_list.append(corr.squeeze(1)) 90 | 91 | corr = torch.stack(corr_list, dim=1) 92 | corr = corr.reshape(B, -1, H, W) 93 | return corr / torch.sqrt(torch.tensor(dim).float()) 94 | -------------------------------------------------------------------------------- /core/correlation_package/LICENSE: -------------------------------------------------------------------------------- 1 | Copyright 2017 NVIDIA CORPORATION 2 | 3 | Licensed under the Apache License, Version 2.0 (the "License"); 4 | you may not use this file except in compliance with the License. 5 | You may obtain a copy of the License at 6 | 7 | http://www.apache.org/licenses/LICENSE-2.0 8 | 9 | Unless required by applicable law or agreed to in writing, software 10 | distributed under the License is distributed on an "AS IS" BASIS, 11 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | See the License for the specific language governing permissions and 13 | limitations under the License. -------------------------------------------------------------------------------- /core/correlation_package/correlation.py: -------------------------------------------------------------------------------- 1 | import correlation_cuda 2 | import torch 3 | from torch.autograd import Function 4 | from torch.nn.modules.module import Module 5 | 6 | 7 | class CorrelationFunction(Function): 8 | 9 | def __init__(self, pad_size=3, kernel_size=3, max_displacement=20, stride1=1, stride2=2, corr_multiply=1): 10 | super(CorrelationFunction, self).__init__() 11 | self.pad_size = pad_size 12 | self.kernel_size = kernel_size 13 | self.max_displacement = max_displacement 14 | self.stride1 = stride1 15 | self.stride2 = stride2 16 | self.corr_multiply = corr_multiply 17 | # self.out_channel = ((max_displacement/stride2)*2 + 1) * ((max_displacement/stride2)*2 + 1) 18 | 19 | def forward(self, input1, input2): 20 | self.save_for_backward(input1, input2) 21 | 22 | with torch.cuda.device_of(input1): 23 | rbot1 = input1.new() 24 | rbot2 = input2.new() 25 | output = input1.new() 26 | 27 | correlation_cuda.forward(input1, input2, rbot1, rbot2, output, 28 | self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply) 29 | 30 | return output 31 | 32 | def backward(self, grad_output): 33 | input1, input2 = self.saved_tensors 34 | 35 | with torch.cuda.device_of(input1): 36 | rbot1 = input1.new() 37 | rbot2 = input2.new() 38 | 39 | grad_input1 = input1.new() 40 | grad_input2 = input2.new() 41 | 42 | correlation_cuda.backward(input1, input2, rbot1, rbot2, grad_output, grad_input1, grad_input2, 43 | self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply) 44 | 45 | return grad_input1, grad_input2 46 | 47 | 48 | class Correlation(Module): 49 | def __init__(self, pad_size=0, kernel_size=0, max_displacement=0, stride1=1, stride2=2, corr_multiply=1): 50 | super(Correlation, self).__init__() 51 | self.pad_size = pad_size 52 | self.kernel_size = kernel_size 53 | self.max_displacement = max_displacement 54 | self.stride1 = stride1 55 | self.stride2 = 
stride2 56 | self.corr_multiply = corr_multiply 57 | 58 | def forward(self, input1, input2): 59 | 60 | result = CorrelationFunction(self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply)(input1, input2) 61 | 62 | return result 63 | -------------------------------------------------------------------------------- /core/correlation_package/correlation_cuda.cc: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include 6 | 7 | #include "correlation_cuda_kernel.cuh" 8 | 9 | int correlation_forward_cuda(at::Tensor& input1, at::Tensor& input2, at::Tensor& rInput1, at::Tensor& rInput2, at::Tensor& output, 10 | int pad_size, 11 | int kernel_size, 12 | int max_displacement, 13 | int stride1, 14 | int stride2, 15 | int corr_type_multiply) 16 | { 17 | 18 | int batchSize = input1.size(0); 19 | 20 | int nInputChannels = input1.size(1); 21 | int inputHeight = input1.size(2); 22 | int inputWidth = input1.size(3); 23 | 24 | int kernel_radius = (kernel_size - 1) / 2; 25 | int border_radius = kernel_radius + max_displacement; 26 | 27 | int paddedInputHeight = inputHeight + 2 * pad_size; 28 | int paddedInputWidth = inputWidth + 2 * pad_size; 29 | 30 | int nOutputChannels = ((max_displacement/stride2)*2 + 1) * ((max_displacement/stride2)*2 + 1); 31 | 32 | int outputHeight = ceil(static_cast(paddedInputHeight - 2 * border_radius) / static_cast(stride1)); 33 | int outputwidth = ceil(static_cast(paddedInputWidth - 2 * border_radius) / static_cast(stride1)); 34 | 35 | rInput1.resize_({batchSize, paddedInputHeight, paddedInputWidth, nInputChannels}); 36 | rInput2.resize_({batchSize, paddedInputHeight, paddedInputWidth, nInputChannels}); 37 | output.resize_({batchSize, nOutputChannels, outputHeight, outputwidth}); 38 | 39 | rInput1.fill_(0); 40 | rInput2.fill_(0); 41 | output.fill_(0); 42 | 43 | int success = correlation_forward_cuda_kernel( 44 | output, 45 | output.size(0), 46 | output.size(1), 47 | output.size(2), 48 | output.size(3), 49 | output.stride(0), 50 | output.stride(1), 51 | output.stride(2), 52 | output.stride(3), 53 | input1, 54 | input1.size(1), 55 | input1.size(2), 56 | input1.size(3), 57 | input1.stride(0), 58 | input1.stride(1), 59 | input1.stride(2), 60 | input1.stride(3), 61 | input2, 62 | input2.size(1), 63 | input2.stride(0), 64 | input2.stride(1), 65 | input2.stride(2), 66 | input2.stride(3), 67 | rInput1, 68 | rInput2, 69 | pad_size, 70 | kernel_size, 71 | max_displacement, 72 | stride1, 73 | stride2, 74 | corr_type_multiply, 75 | at::cuda::getCurrentCUDAStream() 76 | ); 77 | 78 | //check for errors 79 | if (!success) { 80 | AT_ERROR("CUDA call failed"); 81 | } 82 | 83 | return 1; 84 | 85 | } 86 | 87 | int correlation_backward_cuda(at::Tensor& input1, at::Tensor& input2, at::Tensor& rInput1, at::Tensor& rInput2, at::Tensor& gradOutput, 88 | at::Tensor& gradInput1, at::Tensor& gradInput2, 89 | int pad_size, 90 | int kernel_size, 91 | int max_displacement, 92 | int stride1, 93 | int stride2, 94 | int corr_type_multiply) 95 | { 96 | 97 | int batchSize = input1.size(0); 98 | int nInputChannels = input1.size(1); 99 | int paddedInputHeight = input1.size(2)+ 2 * pad_size; 100 | int paddedInputWidth = input1.size(3)+ 2 * pad_size; 101 | 102 | int height = input1.size(2); 103 | int width = input1.size(3); 104 | 105 | rInput1.resize_({batchSize, paddedInputHeight, paddedInputWidth, nInputChannels}); 106 | rInput2.resize_({batchSize, paddedInputHeight, paddedInputWidth, 
nInputChannels}); 107 | gradInput1.resize_({batchSize, nInputChannels, height, width}); 108 | gradInput2.resize_({batchSize, nInputChannels, height, width}); 109 | 110 | rInput1.fill_(0); 111 | rInput2.fill_(0); 112 | gradInput1.fill_(0); 113 | gradInput2.fill_(0); 114 | 115 | int success = correlation_backward_cuda_kernel(gradOutput, 116 | gradOutput.size(0), 117 | gradOutput.size(1), 118 | gradOutput.size(2), 119 | gradOutput.size(3), 120 | gradOutput.stride(0), 121 | gradOutput.stride(1), 122 | gradOutput.stride(2), 123 | gradOutput.stride(3), 124 | input1, 125 | input1.size(1), 126 | input1.size(2), 127 | input1.size(3), 128 | input1.stride(0), 129 | input1.stride(1), 130 | input1.stride(2), 131 | input1.stride(3), 132 | input2, 133 | input2.stride(0), 134 | input2.stride(1), 135 | input2.stride(2), 136 | input2.stride(3), 137 | gradInput1, 138 | gradInput1.stride(0), 139 | gradInput1.stride(1), 140 | gradInput1.stride(2), 141 | gradInput1.stride(3), 142 | gradInput2, 143 | gradInput2.size(1), 144 | gradInput2.stride(0), 145 | gradInput2.stride(1), 146 | gradInput2.stride(2), 147 | gradInput2.stride(3), 148 | rInput1, 149 | rInput2, 150 | pad_size, 151 | kernel_size, 152 | max_displacement, 153 | stride1, 154 | stride2, 155 | corr_type_multiply, 156 | at::cuda::getCurrentCUDAStream() 157 | ); 158 | 159 | if (!success) { 160 | AT_ERROR("CUDA call failed"); 161 | } 162 | 163 | return 1; 164 | } 165 | 166 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { 167 | m.def("forward", &correlation_forward_cuda, "Correlation forward (CUDA)"); 168 | m.def("backward", &correlation_backward_cuda, "Correlation backward (CUDA)"); 169 | } 170 | 171 | -------------------------------------------------------------------------------- /core/correlation_package/correlation_cuda_kernel.cu: -------------------------------------------------------------------------------- 1 | #include 2 | 3 | #include "correlation_cuda_kernel.cuh" 4 | 5 | #define CUDA_NUM_THREADS 1024 6 | #define THREADS_PER_BLOCK 32 7 | #define FULL_MASK 0xffffffff 8 | 9 | #include 10 | #include 11 | #include 12 | #include 13 | 14 | using at::Half; 15 | 16 | template 17 | __forceinline__ __device__ scalar_t warpReduceSum(scalar_t val) { 18 | for (int offset = 16; offset > 0; offset /= 2) 19 | val += __shfl_down_sync(FULL_MASK, val, offset); 20 | return val; 21 | } 22 | 23 | template 24 | __forceinline__ __device__ scalar_t blockReduceSum(scalar_t val) { 25 | 26 | static __shared__ scalar_t shared[32]; 27 | int lane = threadIdx.x % warpSize; 28 | int wid = threadIdx.x / warpSize; 29 | 30 | val = warpReduceSum(val); 31 | 32 | if (lane == 0) 33 | shared[wid] = val; 34 | 35 | __syncthreads(); 36 | 37 | val = (threadIdx.x < blockDim.x / warpSize) ? 
shared[lane] : 0; 38 | 39 | if (wid == 0) 40 | val = warpReduceSum(val); 41 | 42 | return val; 43 | } 44 | 45 | 46 | template 47 | __global__ void channels_first(const scalar_t* __restrict__ input, scalar_t* rinput, int channels, int height, int width, int pad_size) 48 | { 49 | 50 | // n (batch size), c (num of channels), y (height), x (width) 51 | int n = blockIdx.x; 52 | int y = blockIdx.y; 53 | int x = blockIdx.z; 54 | 55 | int ch_off = threadIdx.x; 56 | scalar_t value; 57 | 58 | int dimcyx = channels * height * width; 59 | int dimyx = height * width; 60 | 61 | int p_dimx = (width + 2 * pad_size); 62 | int p_dimy = (height + 2 * pad_size); 63 | int p_dimyxc = channels * p_dimy * p_dimx; 64 | int p_dimxc = p_dimx * channels; 65 | 66 | for (int c = ch_off; c < channels; c += THREADS_PER_BLOCK) { 67 | value = input[n * dimcyx + c * dimyx + y * width + x]; 68 | rinput[n * p_dimyxc + (y + pad_size) * p_dimxc + (x + pad_size) * channels + c] = value; 69 | } 70 | } 71 | 72 | 73 | template 74 | __global__ void correlation_forward(scalar_t* __restrict__ output, const int nOutputChannels, 75 | const int outputHeight, const int outputWidth, const scalar_t* __restrict__ rInput1, 76 | const int nInputChannels, const int inputHeight, const int inputWidth, 77 | const scalar_t* __restrict__ rInput2, const int pad_size, const int kernel_size, 78 | const int max_displacement, const int stride1, const int stride2) { 79 | 80 | int32_t pInputWidth = inputWidth + 2 * pad_size; 81 | int32_t pInputHeight = inputHeight + 2 * pad_size; 82 | 83 | int32_t kernel_rad = (kernel_size - 1) / 2; 84 | 85 | int32_t displacement_rad = max_displacement / stride2; 86 | 87 | int32_t displacement_size = 2 * displacement_rad + 1; 88 | 89 | int32_t n = blockIdx.x; 90 | int32_t y1 = blockIdx.y * stride1 + max_displacement; 91 | int32_t x1 = blockIdx.z * stride1 + max_displacement; 92 | int32_t c = threadIdx.x; 93 | 94 | int32_t pdimyxc = pInputHeight * pInputWidth * nInputChannels; 95 | 96 | int32_t pdimxc = pInputWidth * nInputChannels; 97 | 98 | int32_t pdimc = nInputChannels; 99 | 100 | int32_t tdimcyx = nOutputChannels * outputHeight * outputWidth; 101 | int32_t tdimyx = outputHeight * outputWidth; 102 | int32_t tdimx = outputWidth; 103 | 104 | int32_t nelems = kernel_size * kernel_size * pdimc; 105 | 106 | // element-wise product along channel axis 107 | for (int tj = -displacement_rad; tj <= displacement_rad; ++tj) { 108 | for (int ti = -displacement_rad; ti <= displacement_rad; ++ti) { 109 | int x2 = x1 + ti * stride2; 110 | int y2 = y1 + tj * stride2; 111 | 112 | float acc0 = 0.0f; 113 | 114 | for (int j = -kernel_rad; j <= kernel_rad; ++j) { 115 | for (int i = -kernel_rad; i <= kernel_rad; ++i) { 116 | // THREADS_PER_BLOCK 117 | #pragma unroll 118 | for (int ch = c; ch < pdimc; ch += blockDim.x) { 119 | 120 | int indx1 = n * pdimyxc + (y1 + j) * pdimxc 121 | + (x1 + i) * pdimc + ch; 122 | int indx2 = n * pdimyxc + (y2 + j) * pdimxc 123 | + (x2 + i) * pdimc + ch; 124 | acc0 += static_cast(rInput1[indx1] * rInput2[indx2]); 125 | } 126 | } 127 | } 128 | 129 | if (blockDim.x == warpSize) { 130 | __syncwarp(); 131 | acc0 = warpReduceSum(acc0); 132 | } else { 133 | __syncthreads(); 134 | acc0 = blockReduceSum(acc0); 135 | } 136 | 137 | if (threadIdx.x == 0) { 138 | 139 | int tc = (tj + displacement_rad) * displacement_size 140 | + (ti + displacement_rad); 141 | const int tindx = n * tdimcyx + tc * tdimyx + blockIdx.y * tdimx 142 | + blockIdx.z; 143 | output[tindx] = static_cast(acc0 / nelems); 144 | } 145 | } 146 | } 147 | } 
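// Reader's note on correlation_forward above (a descriptive summary of existing behaviour, not new code):
// output channel tc = (tj + displacement_rad) * displacement_size + (ti + displacement_rad) holds the
// correlation for displacement (ti * stride2, tj * stride2); each value is the sum of rInput1 * rInput2
// over a kernel_size x kernel_size window and all input channels, reduced across the thread block and
// divided by nelems.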
148 | 149 | 150 | template 151 | __global__ void correlation_backward_input1(int item, scalar_t* gradInput1, int nInputChannels, int inputHeight, int inputWidth, 152 | const scalar_t* __restrict__ gradOutput, int nOutputChannels, int outputHeight, int outputWidth, 153 | const scalar_t* __restrict__ rInput2, 154 | int pad_size, 155 | int kernel_size, 156 | int max_displacement, 157 | int stride1, 158 | int stride2) 159 | { 160 | // n (batch size), c (num of channels), y (height), x (width) 161 | 162 | int n = item; 163 | int y = blockIdx.x * stride1 + pad_size; 164 | int x = blockIdx.y * stride1 + pad_size; 165 | int c = blockIdx.z; 166 | int tch_off = threadIdx.x; 167 | 168 | int kernel_rad = (kernel_size - 1) / 2; 169 | int displacement_rad = max_displacement / stride2; 170 | int displacement_size = 2 * displacement_rad + 1; 171 | 172 | int xmin = (x - kernel_rad - max_displacement) / stride1; 173 | int ymin = (y - kernel_rad - max_displacement) / stride1; 174 | 175 | int xmax = (x + kernel_rad - max_displacement) / stride1; 176 | int ymax = (y + kernel_rad - max_displacement) / stride1; 177 | 178 | if (xmax < 0 || ymax < 0 || xmin >= outputWidth || ymin >= outputHeight) { 179 | // assumes gradInput1 is pre-allocated and zero filled 180 | return; 181 | } 182 | 183 | if (xmin > xmax || ymin > ymax) { 184 | // assumes gradInput1 is pre-allocated and zero filled 185 | return; 186 | } 187 | 188 | xmin = max(0,xmin); 189 | xmax = min(outputWidth-1,xmax); 190 | 191 | ymin = max(0,ymin); 192 | ymax = min(outputHeight-1,ymax); 193 | 194 | int pInputWidth = inputWidth + 2 * pad_size; 195 | int pInputHeight = inputHeight + 2 * pad_size; 196 | 197 | int pdimyxc = pInputHeight * pInputWidth * nInputChannels; 198 | int pdimxc = pInputWidth * nInputChannels; 199 | int pdimc = nInputChannels; 200 | 201 | int tdimcyx = nOutputChannels * outputHeight * outputWidth; 202 | int tdimyx = outputHeight * outputWidth; 203 | int tdimx = outputWidth; 204 | 205 | int odimcyx = nInputChannels * inputHeight* inputWidth; 206 | int odimyx = inputHeight * inputWidth; 207 | int odimx = inputWidth; 208 | 209 | scalar_t nelems = kernel_size * kernel_size * nInputChannels; 210 | 211 | __shared__ scalar_t prod_sum[THREADS_PER_BLOCK]; 212 | prod_sum[tch_off] = 0; 213 | 214 | for (int tc = tch_off; tc < nOutputChannels; tc += THREADS_PER_BLOCK) { 215 | 216 | int i2 = (tc % displacement_size - displacement_rad) * stride2; 217 | int j2 = (tc / displacement_size - displacement_rad) * stride2; 218 | 219 | int indx2 = n * pdimyxc + (y + j2)* pdimxc + (x + i2) * pdimc + c; 220 | 221 | scalar_t val2 = rInput2[indx2]; 222 | 223 | for (int j = ymin; j <= ymax; ++j) { 224 | for (int i = xmin; i <= xmax; ++i) { 225 | int tindx = n * tdimcyx + tc * tdimyx + j * tdimx + i; 226 | prod_sum[tch_off] += gradOutput[tindx] * val2; 227 | } 228 | } 229 | } 230 | __syncthreads(); 231 | 232 | if(tch_off == 0) { 233 | scalar_t reduce_sum = 0; 234 | for(int idx = 0; idx < THREADS_PER_BLOCK; idx++) { 235 | reduce_sum += prod_sum[idx]; 236 | } 237 | const int indx1 = n * odimcyx + c * odimyx + (y - pad_size) * odimx + (x - pad_size); 238 | gradInput1[indx1] = reduce_sum / nelems; 239 | } 240 | 241 | } 242 | 243 | template 244 | __global__ void correlation_backward_input2(int item, scalar_t* gradInput2, int nInputChannels, int inputHeight, int inputWidth, 245 | const scalar_t* __restrict__ gradOutput, int nOutputChannels, int outputHeight, int outputWidth, 246 | const scalar_t* __restrict__ rInput1, 247 | int pad_size, 248 | int kernel_size, 249 | int 
max_displacement, 250 | int stride1, 251 | int stride2) 252 | { 253 | // n (batch size), c (num of channels), y (height), x (width) 254 | 255 | int n = item; 256 | int y = blockIdx.x * stride1 + pad_size; 257 | int x = blockIdx.y * stride1 + pad_size; 258 | int c = blockIdx.z; 259 | 260 | int tch_off = threadIdx.x; 261 | 262 | int kernel_rad = (kernel_size - 1) / 2; 263 | int displacement_rad = max_displacement / stride2; 264 | int displacement_size = 2 * displacement_rad + 1; 265 | 266 | int pInputWidth = inputWidth + 2 * pad_size; 267 | int pInputHeight = inputHeight + 2 * pad_size; 268 | 269 | int pdimyxc = pInputHeight * pInputWidth * nInputChannels; 270 | int pdimxc = pInputWidth * nInputChannels; 271 | int pdimc = nInputChannels; 272 | 273 | int tdimcyx = nOutputChannels * outputHeight * outputWidth; 274 | int tdimyx = outputHeight * outputWidth; 275 | int tdimx = outputWidth; 276 | 277 | int odimcyx = nInputChannels * inputHeight* inputWidth; 278 | int odimyx = inputHeight * inputWidth; 279 | int odimx = inputWidth; 280 | 281 | scalar_t nelems = kernel_size * kernel_size * nInputChannels; 282 | 283 | __shared__ scalar_t prod_sum[THREADS_PER_BLOCK]; 284 | prod_sum[tch_off] = 0; 285 | 286 | for (int tc = tch_off; tc < nOutputChannels; tc += THREADS_PER_BLOCK) { 287 | int i2 = (tc % displacement_size - displacement_rad) * stride2; 288 | int j2 = (tc / displacement_size - displacement_rad) * stride2; 289 | 290 | int xmin = (x - kernel_rad - max_displacement - i2) / stride1; 291 | int ymin = (y - kernel_rad - max_displacement - j2) / stride1; 292 | 293 | int xmax = (x + kernel_rad - max_displacement - i2) / stride1; 294 | int ymax = (y + kernel_rad - max_displacement - j2) / stride1; 295 | 296 | if (xmax < 0 || ymax < 0 || xmin >= outputWidth || ymin >= outputHeight) { 297 | // assumes gradInput2 is pre-allocated and zero filled 298 | continue; 299 | } 300 | 301 | if (xmin > xmax || ymin > ymax) { 302 | // assumes gradInput2 is pre-allocated and zero filled 303 | continue; 304 | } 305 | 306 | xmin = max(0,xmin); 307 | xmax = min(outputWidth-1,xmax); 308 | 309 | ymin = max(0,ymin); 310 | ymax = min(outputHeight-1,ymax); 311 | 312 | int indx1 = n * pdimyxc + (y - j2)* pdimxc + (x - i2) * pdimc + c; 313 | scalar_t val1 = rInput1[indx1]; 314 | 315 | for (int j = ymin; j <= ymax; ++j) { 316 | for (int i = xmin; i <= xmax; ++i) { 317 | int tindx = n * tdimcyx + tc * tdimyx + j * tdimx + i; 318 | prod_sum[tch_off] += gradOutput[tindx] * val1; 319 | } 320 | } 321 | } 322 | 323 | __syncthreads(); 324 | 325 | if(tch_off == 0) { 326 | scalar_t reduce_sum = 0; 327 | for(int idx = 0; idx < THREADS_PER_BLOCK; idx++) { 328 | reduce_sum += prod_sum[idx]; 329 | } 330 | const int indx2 = n * odimcyx + c * odimyx + (y - pad_size) * odimx + (x - pad_size); 331 | gradInput2[indx2] = reduce_sum / nelems; 332 | } 333 | 334 | } 335 | 336 | int correlation_forward_cuda_kernel(at::Tensor& output, 337 | int ob, 338 | int oc, 339 | int oh, 340 | int ow, 341 | int osb, 342 | int osc, 343 | int osh, 344 | int osw, 345 | 346 | at::Tensor& input1, 347 | int ic, 348 | int ih, 349 | int iw, 350 | int isb, 351 | int isc, 352 | int ish, 353 | int isw, 354 | 355 | at::Tensor& input2, 356 | int gc, 357 | int gsb, 358 | int gsc, 359 | int gsh, 360 | int gsw, 361 | 362 | at::Tensor& rInput1, 363 | at::Tensor& rInput2, 364 | int pad_size, 365 | int kernel_size, 366 | int max_displacement, 367 | int stride1, 368 | int stride2, 369 | int corr_type_multiply, 370 | cudaStream_t stream) 371 | { 372 | 373 | int batchSize = ob; 374 | 
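    // Overview (descriptive comment): channels_first below repacks input1/input2 into the padded
    // NHWC buffers rInput1/rInput2 with one block per input pixel, then correlation_forward runs
    // with one block per output location (batchSize, outputHeight, outputWidth) and
    // THREADS_PER_BLOCK threads cooperating on the channel reduction.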
375 | int nInputChannels = ic; 376 | int inputWidth = iw; 377 | int inputHeight = ih; 378 | 379 | int nOutputChannels = oc; 380 | int outputWidth = ow; 381 | int outputHeight = oh; 382 | 383 | dim3 blocks_grid(batchSize, inputHeight, inputWidth); 384 | dim3 threads_block(THREADS_PER_BLOCK); 385 | 386 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(input1.type(), "channels_first_fwd_1", ([&] { 387 | 388 | channels_first<<>>( 389 | input1.data(), rInput1.data(), nInputChannels, inputHeight, inputWidth, pad_size); 390 | 391 | })); 392 | 393 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(input2.type(), "channels_first_fwd_2", ([&] { 394 | 395 | channels_first<<>> ( 396 | input2.data(), rInput2.data(), nInputChannels, inputHeight, inputWidth, pad_size); 397 | 398 | })); 399 | 400 | dim3 threadsPerBlock(THREADS_PER_BLOCK); 401 | dim3 totalBlocksCorr(batchSize, outputHeight, outputWidth); 402 | 403 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(input1.type(), "correlation_forward", ([&] { 404 | 405 | correlation_forward<<>> 406 | (output.data(), nOutputChannels, outputHeight, outputWidth, 407 | rInput1.data(), nInputChannels, inputHeight, inputWidth, 408 | rInput2.data(), 409 | pad_size, 410 | kernel_size, 411 | max_displacement, 412 | stride1, 413 | stride2); 414 | 415 | })); 416 | 417 | cudaError_t err = cudaGetLastError(); 418 | 419 | 420 | // check for errors 421 | if (err != cudaSuccess) { 422 | printf("error in correlation_forward_cuda_kernel: %s\n", cudaGetErrorString(err)); 423 | return 0; 424 | } 425 | 426 | return 1; 427 | } 428 | 429 | 430 | int correlation_backward_cuda_kernel( 431 | at::Tensor& gradOutput, 432 | int gob, 433 | int goc, 434 | int goh, 435 | int gow, 436 | int gosb, 437 | int gosc, 438 | int gosh, 439 | int gosw, 440 | 441 | at::Tensor& input1, 442 | int ic, 443 | int ih, 444 | int iw, 445 | int isb, 446 | int isc, 447 | int ish, 448 | int isw, 449 | 450 | at::Tensor& input2, 451 | int gsb, 452 | int gsc, 453 | int gsh, 454 | int gsw, 455 | 456 | at::Tensor& gradInput1, 457 | int gisb, 458 | int gisc, 459 | int gish, 460 | int gisw, 461 | 462 | at::Tensor& gradInput2, 463 | int ggc, 464 | int ggsb, 465 | int ggsc, 466 | int ggsh, 467 | int ggsw, 468 | 469 | at::Tensor& rInput1, 470 | at::Tensor& rInput2, 471 | int pad_size, 472 | int kernel_size, 473 | int max_displacement, 474 | int stride1, 475 | int stride2, 476 | int corr_type_multiply, 477 | cudaStream_t stream) 478 | { 479 | 480 | int batchSize = gob; 481 | int num = batchSize; 482 | 483 | int nInputChannels = ic; 484 | int inputWidth = iw; 485 | int inputHeight = ih; 486 | 487 | int nOutputChannels = goc; 488 | int outputWidth = gow; 489 | int outputHeight = goh; 490 | 491 | dim3 blocks_grid(batchSize, inputHeight, inputWidth); 492 | dim3 threads_block(THREADS_PER_BLOCK); 493 | 494 | 495 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(input1.type(), "lltm_forward_cuda", ([&] { 496 | 497 | channels_first<<>>( 498 | input1.data(), 499 | rInput1.data(), 500 | nInputChannels, 501 | inputHeight, 502 | inputWidth, 503 | pad_size 504 | ); 505 | })); 506 | 507 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(input2.type(), "lltm_forward_cuda", ([&] { 508 | 509 | channels_first<<>>( 510 | input2.data(), 511 | rInput2.data(), 512 | nInputChannels, 513 | inputHeight, 514 | inputWidth, 515 | pad_size 516 | ); 517 | })); 518 | 519 | dim3 threadsPerBlock(THREADS_PER_BLOCK); 520 | dim3 totalBlocksCorr(inputHeight, inputWidth, nInputChannels); 521 | 522 | for (int n = 0; n < num; ++n) { 523 | 524 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(input2.type(), "lltm_forward_cuda", ([&] { 
525 | 526 | 527 | correlation_backward_input1<<>> ( 528 | n, gradInput1.data(), nInputChannels, inputHeight, inputWidth, 529 | gradOutput.data(), nOutputChannels, outputHeight, outputWidth, 530 | rInput2.data(), 531 | pad_size, 532 | kernel_size, 533 | max_displacement, 534 | stride1, 535 | stride2); 536 | })); 537 | } 538 | 539 | for(int n = 0; n < batchSize; n++) { 540 | 541 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(rInput1.type(), "lltm_forward_cuda", ([&] { 542 | 543 | correlation_backward_input2<<>>( 544 | n, gradInput2.data(), nInputChannels, inputHeight, inputWidth, 545 | gradOutput.data(), nOutputChannels, outputHeight, outputWidth, 546 | rInput1.data(), 547 | pad_size, 548 | kernel_size, 549 | max_displacement, 550 | stride1, 551 | stride2); 552 | 553 | })); 554 | } 555 | 556 | // check for errors 557 | cudaError_t err = cudaGetLastError(); 558 | if (err != cudaSuccess) { 559 | printf("error in correlation_backward_cuda_kernel: %s\n", cudaGetErrorString(err)); 560 | return 0; 561 | } 562 | 563 | return 1; 564 | } 565 | -------------------------------------------------------------------------------- /core/correlation_package/correlation_cuda_kernel.cuh: -------------------------------------------------------------------------------- 1 | #pragma once 2 | 3 | #include 4 | #include 5 | #include 6 | 7 | int correlation_forward_cuda_kernel(at::Tensor& output, 8 | int ob, 9 | int oc, 10 | int oh, 11 | int ow, 12 | int osb, 13 | int osc, 14 | int osh, 15 | int osw, 16 | 17 | at::Tensor& input1, 18 | int ic, 19 | int ih, 20 | int iw, 21 | int isb, 22 | int isc, 23 | int ish, 24 | int isw, 25 | 26 | at::Tensor& input2, 27 | int gc, 28 | int gsb, 29 | int gsc, 30 | int gsh, 31 | int gsw, 32 | 33 | at::Tensor& rInput1, 34 | at::Tensor& rInput2, 35 | int pad_size, 36 | int kernel_size, 37 | int max_displacement, 38 | int stride1, 39 | int stride2, 40 | int corr_type_multiply, 41 | cudaStream_t stream); 42 | 43 | 44 | int correlation_backward_cuda_kernel( 45 | at::Tensor& gradOutput, 46 | int gob, 47 | int goc, 48 | int goh, 49 | int gow, 50 | int gosb, 51 | int gosc, 52 | int gosh, 53 | int gosw, 54 | 55 | at::Tensor& input1, 56 | int ic, 57 | int ih, 58 | int iw, 59 | int isb, 60 | int isc, 61 | int ish, 62 | int isw, 63 | 64 | at::Tensor& input2, 65 | int gsb, 66 | int gsc, 67 | int gsh, 68 | int gsw, 69 | 70 | at::Tensor& gradInput1, 71 | int gisb, 72 | int gisc, 73 | int gish, 74 | int gisw, 75 | 76 | at::Tensor& gradInput2, 77 | int ggc, 78 | int ggsb, 79 | int ggsc, 80 | int ggsh, 81 | int ggsw, 82 | 83 | at::Tensor& rInput1, 84 | at::Tensor& rInput2, 85 | int pad_size, 86 | int kernel_size, 87 | int max_displacement, 88 | int stride1, 89 | int stride2, 90 | int corr_type_multiply, 91 | cudaStream_t stream); 92 | -------------------------------------------------------------------------------- /core/correlation_package/pyproject.toml: -------------------------------------------------------------------------------- 1 | [build-system] 2 | # Minimum requirements for the build system to execute. 3 | requires = ["setuptools", "wheel", "numpy", "torch==1.0.1.post2"] # PEP 508 specifications. 
4 | -------------------------------------------------------------------------------- /core/correlation_package/setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | import os 3 | 4 | import torch 5 | from setuptools import find_packages, setup 6 | from torch.utils.cpp_extension import BuildExtension, CUDAExtension 7 | 8 | cxx_args = ['-std=c++14'] 9 | 10 | nvcc_args = [ 11 | '-gencode', 'arch=compute_50,code=sm_50', 12 | '-gencode', 'arch=compute_52,code=sm_52', 13 | '-gencode', 'arch=compute_60,code=sm_60', 14 | '-gencode', 'arch=compute_61,code=sm_61', 15 | '-gencode', 'arch=compute_70,code=sm_70', 16 | '-gencode', 'arch=compute_70,code=compute_70' 17 | ] 18 | 19 | setup( 20 | name='correlation_cuda', 21 | ext_modules=[ 22 | CUDAExtension('correlation_cuda', [ 23 | 'correlation_cuda.cc', 24 | 'correlation_cuda_kernel.cu' 25 | ], extra_compile_args={'cxx': cxx_args, 'nvcc': nvcc_args}) 26 | ], 27 | cmdclass={ 28 | 'build_ext': BuildExtension 29 | }) 30 | -------------------------------------------------------------------------------- /core/data_preprocess.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import visibility 3 | import mathutils 4 | import numpy as np 5 | 6 | import sys 7 | sys.path.append('core') 8 | from utils_point import overlay_imgs, rotate_back 9 | from camera_model import CameraModel 10 | 11 | class Data_preprocess: 12 | def __init__(self, calibs, occlusion_threshold, occlusion_kernel): 13 | self.real_shape = None 14 | self.calibs = calibs 15 | self.occlusion_threshold = occlusion_threshold 16 | self.occlusion_kernel = occlusion_kernel 17 | 18 | def delta_1(self, uv_RT, uv, VI_indexes_RT, VI_indexes): 19 | indexes = VI_indexes_RT & VI_indexes 20 | 21 | indexes_1 = indexes[VI_indexes_RT] 22 | indexes_2 = indexes[VI_indexes] 23 | 24 | delta_P = uv[indexes_2, :] - uv_RT[indexes_1, :] 25 | 26 | return delta_P, indexes 27 | 28 | def gen_depth_img(self, uv_RT_af_index, depth_RT_af_index, indexes_uvRT, cam_params): 29 | device = uv_RT_af_index.device 30 | 31 | depth_img_RT = torch.zeros(self.real_shape[:2], device=device, dtype=torch.float) 32 | depth_img_RT += 1000. 33 | 34 | idx_img = (-1) * torch.ones(self.real_shape[:2], device=device, dtype=torch.float) 35 | indexes_uvRT = indexes_uvRT.float() 36 | 37 | depth_img_RT, idx_img = visibility.depth_image(uv_RT_af_index, depth_RT_af_index, indexes_uvRT, 38 | depth_img_RT, idx_img, uv_RT_af_index.shape[0], 39 | self.real_shape[1], self.real_shape[0]) 40 | depth_img_RT[depth_img_RT == 1000.] = 0. 
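        # Note (added comment): the 1000. initialisation above is a sentinel for empty pixels; after
        # visibility.depth_image writes projected depths, pixels that received no point still hold
        # 1000. and are zeroed here before the de-occlusion pass (visibility.visibility2) below.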
41 | 42 | deoccl_index_img = (-1) * torch.ones(self.real_shape[:2], device=device, dtype=torch.float) 43 | 44 | depth_img_no_occlusion_RT = torch.zeros_like(depth_img_RT, device=device) 45 | depth_img_no_occlusion_RT, deoccl_index_img = visibility.visibility2(depth_img_RT, cam_params, 46 | idx_img, 47 | depth_img_no_occlusion_RT, 48 | deoccl_index_img, 49 | depth_img_RT.shape[1], 50 | depth_img_RT.shape[0], 51 | self.occlusion_threshold, 52 | self.occlusion_kernel) 53 | 54 | return depth_img_no_occlusion_RT, deoccl_index_img.int() 55 | 56 | def fresh_indexes(self, indexes_uvRT_deoccl, indexes_uvRT): 57 | indexes_uvRT_deoccl_list_indexes = torch.where(indexes_uvRT_deoccl > 0) 58 | 59 | indexes_uvRT_deoccl_list = indexes_uvRT_deoccl[indexes_uvRT_deoccl_list_indexes[0][:], indexes_uvRT_deoccl_list_indexes[1][:]] 60 | 61 | indexes_temp = torch.zeros(indexes_uvRT.shape[0], device=indexes_uvRT_deoccl_list.device, dtype=torch.int32) 62 | indexes_temp[indexes_uvRT_deoccl_list.cpu().numpy() - 1] = indexes_uvRT_deoccl_list 63 | 64 | return indexes_temp 65 | 66 | def delta_2(self, delta_P, uv_RT_af_index, mask): 67 | device = delta_P.device 68 | 69 | delta_P_com = delta_P[mask, :] 70 | 71 | delta_P_0 = delta_P_com[:, 0] 72 | delta_P_1 = delta_P_com[:, 1] 73 | 74 | ## keep common points after deocclusion 75 | uv_RT_af_index_com = uv_RT_af_index[mask, :] 76 | 77 | ## generate displacement map 78 | project_delta_P_1 = torch.zeros(self.real_shape[:2], device=device, dtype=torch.int32) 79 | project_delta_P_2 = torch.zeros(self.real_shape[:2], device=device, dtype=torch.int32) 80 | project_delta_P_1[uv_RT_af_index_com[:, 1].cpu().numpy(), uv_RT_af_index_com[:, 0].cpu().numpy()] = delta_P_0 81 | project_delta_P_2[uv_RT_af_index_com[:, 1].cpu().numpy(), uv_RT_af_index_com[:, 0].cpu().numpy()] = delta_P_1 82 | 83 | project_delta_P_shape = list(self.real_shape[:2]) 84 | project_delta_P_shape.insert(0, 2) 85 | project_delta_P = torch.zeros(project_delta_P_shape, device=device, dtype=torch.float) 86 | 87 | project_delta_P[0, :, :] = project_delta_P_1 88 | project_delta_P[1, :, :] = project_delta_P_2 89 | 90 | return project_delta_P 91 | 92 | def DownsampleCrop_KITTI_delta(self, img, depth, displacement, split): 93 | if split == 'train': 94 | x = np.random.randint(0, img.shape[1] - 320) 95 | y = np.random.randint(0, img.shape[2] - 960) 96 | else: 97 | x = (img.shape[1] - 320) // 2 98 | y = (img.shape[2] - 960) // 2 99 | img = img[:, x:x + 320, y:y + 960] 100 | depth = depth[:, x:x + 320, y:y + 960] 101 | displacement = displacement[:, x:x + 320, y:y + 960] 102 | return img, depth, displacement 103 | 104 | 105 | def push(self, rgbs, pcs, T_errs, R_errs, device, split='train'): 106 | lidar_input = [] 107 | rgb_input = [] 108 | flow_gt = [] 109 | 110 | for idx in range(len(rgbs)): 111 | rgb = rgbs[idx].to(device) 112 | pc = pcs[idx].clone().to(device) 113 | reflectance = None 114 | 115 | self.real_shape = [rgb.shape[1], rgb.shape[2], rgb.shape[0]] 116 | 117 | R = mathutils.Quaternion(R_errs[idx].to(device)).to_matrix() 118 | R.resize_4x4() 119 | T = mathutils.Matrix.Translation(T_errs[idx].to(device)) 120 | RT = T * R 121 | 122 | pc_rotated = rotate_back(pc, RT) 123 | 124 | cam_params = self.calibs[idx] 125 | cam_model = CameraModel() 126 | cam_model.focal_length = cam_params[:2] 127 | cam_model.principal_point = cam_params[2:] 128 | cam_params = cam_params.to(device) 129 | 130 | uv, depth, _, refl, VI_indexes = cam_model.project_withindex_pytorch(pc, self.real_shape, reflectance) 131 | uv = 
uv.t().int().contiguous() 132 | 133 | uv_RT, depth_RT, _, refl_RT, VI_indexes_RT = cam_model.project_withindex_pytorch(pc_rotated, self.real_shape, 134 | reflectance) 135 | uv_RT = uv_RT.t().int().contiguous() 136 | 137 | delta_P, indexes = self.delta_1(uv_RT, uv, VI_indexes_RT, VI_indexes) 138 | 139 | indexes_uvRT = VI_indexes_RT[indexes] 140 | indexes_uvRT = torch.arange(indexes_uvRT.shape[0]).to(device) + 1 141 | 142 | ## keep common points 143 | uv_RT_af_index = uv_RT[indexes[VI_indexes_RT], :] 144 | depth_RT_af_index = depth_RT[indexes[VI_indexes_RT]] 145 | 146 | indexes_uv = VI_indexes[indexes] 147 | indexes_uv = torch.arange(indexes_uv.shape[0]).to(device) + 1 148 | 149 | ## keep common points 150 | uv_af_index = uv[indexes[VI_indexes], :] 151 | depth_af_index = depth[indexes[VI_indexes]] 152 | 153 | depth_img_no_occlusion_RT, indexes_uvRT_deoccl = self.gen_depth_img(uv_RT_af_index, depth_RT_af_index, 154 | indexes_uvRT, cam_params) 155 | indexes_uvRT_fresh = self.fresh_indexes(indexes_uvRT_deoccl, indexes_uvRT) 156 | 157 | depth_img_no_occlusion, indexes_uv_deoccl = self.gen_depth_img(uv_af_index, depth_af_index, indexes_uv, cam_params) 158 | indexes_uv_fresh = self.fresh_indexes(indexes_uv_deoccl, indexes_uv) 159 | 160 | ## make depth_image for training 161 | depth_img_no_occlusion_RT_training, indexes_uvRT_deoccl_training = \ 162 | self.gen_depth_img(uv_RT, depth_RT, VI_indexes_RT[VI_indexes_RT], cam_params) 163 | 164 | ## 这里归一化的时候是不是重新计算一下最大深度比较好 165 | depth_img_no_occlusion_RT_training /= 100. 166 | 167 | depth_img_no_occlusion_RT_training = depth_img_no_occlusion_RT_training.unsqueeze(0) 168 | 169 | mask1 = indexes_uv_fresh > 0 170 | mask2 = indexes_uvRT_fresh > 0 171 | mask = mask1 & mask2 172 | project_delta_P = self.delta_2(delta_P, uv_RT_af_index, mask) 173 | 174 | ## downsample and crop 175 | rgb, depth_img_no_occlusion_RT_training, project_delta_P \ 176 | = self.DownsampleCrop_KITTI_delta(rgb, depth_img_no_occlusion_RT_training, project_delta_P, split) 177 | 178 | rgb_input.append(rgb) 179 | lidar_input.append(depth_img_no_occlusion_RT_training) 180 | flow_gt.append(project_delta_P) 181 | 182 | lidar_input = torch.stack(lidar_input) 183 | rgb_input = torch.stack(rgb_input) 184 | flow_gt = torch.stack(flow_gt) 185 | 186 | return rgb_input, lidar_input, flow_gt -------------------------------------------------------------------------------- /core/datasets_kitti.py: -------------------------------------------------------------------------------- 1 | import csv 2 | import os 3 | from math import radians 4 | 5 | import h5py 6 | import mathutils 7 | import numpy as np 8 | import pandas as pd 9 | import torch 10 | import torchvision.transforms.functional as TTF 11 | from PIL import Image 12 | from torch.utils.data import Dataset 13 | from torchvision import transforms 14 | 15 | import sys 16 | sys.path.append("core") 17 | from camera_model import CameraModel 18 | from utils_point import rotate_forward, rotate_back, invert_pose 19 | 20 | 21 | 22 | def get_calib_kitti(sequence): 23 | if sequence == 0: 24 | return torch.tensor([718.856, 718.856, 607.1928, 185.2157]) 25 | elif sequence == 3: 26 | return torch.tensor([721.5377, 721.5377, 609.5593, 172.854]) 27 | elif sequence in [5, 6, 7, 8, 9]: 28 | return torch.tensor([707.0912, 707.0912, 601.8873, 183.1104]) 29 | else: 30 | raise TypeError("Sequence Not Available") 31 | 32 | 33 | class DatasetVisibilityKittiSingle(Dataset): 34 | 35 | def __init__(self, dataset_dir, transform=None, augmentation=False, maps_folder='local_maps_0.1', 
36 | use_reflectance=False, max_t=2., max_r=10., split='test', device='cpu', test_sequence='00'): 37 | super(DatasetVisibilityKittiSingle, self).__init__() 38 | self.use_reflectance = use_reflectance 39 | self.maps_folder = maps_folder 40 | self.device = device 41 | self.max_r = max_r 42 | self.max_t = max_t 43 | self.augmentation = augmentation 44 | self.root_dir = dataset_dir 45 | self.transform = transform 46 | self.split = split 47 | self.GTs_R = {} 48 | self.GTs_T = {} 49 | 50 | self.all_files = [] 51 | self.model = CameraModel() 52 | self.model.focal_length = [7.18856e+02, 7.18856e+02] 53 | self.model.principal_point = [6.071928e+02, 1.852157e+02] 54 | for dir in ['00', '03', '05', '06', '07', '08', '09']: 55 | self.GTs_R[dir] = [] 56 | self.GTs_T[dir] = [] 57 | df_locations = pd.read_csv(os.path.join(dataset_dir, dir, 'poses.csv'), sep=',', dtype={'timestamp': str}) 58 | for index, row in df_locations.iterrows(): 59 | if not os.path.exists(os.path.join(dataset_dir, dir, maps_folder, f"{int(row['timestamp']):06d}"+'.h5')): 60 | continue 61 | if not os.path.exists(os.path.join(dataset_dir, dir, 'image_2', f"{int(row['timestamp']):06d}"+'.png')): 62 | continue 63 | if dir == test_sequence and split.startswith('test'): 64 | self.all_files.append(os.path.join(dir, f"{int(row['timestamp']):06d}")) 65 | elif (not dir == test_sequence) and split == 'train': 66 | self.all_files.append(os.path.join(dir, f"{int(row['timestamp']):06d}")) 67 | GT_R = np.array([row['qw'], row['qx'], row['qy'], row['qz']]) 68 | GT_T = np.array([row['x'], row['y'], row['z']]) 69 | self.GTs_R[dir].append(GT_R) 70 | self.GTs_T[dir].append(GT_T) 71 | 72 | self.test_RT = [] 73 | if split == 'test': 74 | test_RT_file = os.path.join(dataset_dir, f'test_RT_seq{test_sequence}_{max_r:.2f}_{max_t:.2f}.csv') 75 | if os.path.exists(test_RT_file): 76 | print(f'TEST SET: Using this file: {test_RT_file}') 77 | df_test_RT = pd.read_csv(test_RT_file, sep=',') 78 | for index, row in df_test_RT.iterrows(): 79 | self.test_RT.append(list(row)) 80 | else: 81 | print(f'TEST SET - Not found: {test_RT_file}') 82 | print("Generating a new one") 83 | test_RT_file = open(test_RT_file, 'w') 84 | test_RT_file = csv.writer(test_RT_file, delimiter=',') 85 | test_RT_file.writerow(['id', 'tx', 'ty', 'tz', 'rx', 'ry', 'rz']) 86 | for i in range(len(self.all_files)): 87 | rotz = np.random.uniform(-max_r, max_r) * (3.141592 / 180.0) 88 | roty = np.random.uniform(-max_r, max_r) * (3.141592 / 180.0) 89 | rotx = np.random.uniform(-max_r, max_r) * (3.141592 / 180.0) 90 | transl_x = np.random.uniform(-max_t, max_t) 91 | transl_y = np.random.uniform(-max_t, max_t) 92 | transl_z = np.random.uniform(-max_t, min(max_t, 1.)) 93 | test_RT_file.writerow([i, transl_x, transl_y, transl_z, 94 | rotx, roty, rotz]) 95 | self.test_RT.append([i, transl_x, transl_y, transl_z, 96 | rotx, roty, rotz]) 97 | 98 | assert len(self.test_RT) == len(self.all_files), "Something wrong with test RTs" 99 | 100 | def get_ground_truth_poses(self, sequence, frame): 101 | return self.GTs_T[sequence][frame], self.GTs_R[sequence][frame] 102 | 103 | def custom_transform(self, rgb, img_rotation=0., flip=False): 104 | to_tensor = transforms.ToTensor() 105 | normalization = transforms.Normalize(mean=[0.485, 0.456, 0.406], 106 | std=[0.229, 0.224, 0.225]) 107 | 108 | #rgb = crop(rgb) 109 | if self.split == 'train': 110 | color_transform = transforms.ColorJitter(0.1, 0.1, 0.1) 111 | rgb = color_transform(rgb) 112 | if flip: 113 | rgb = TTF.hflip(rgb) 114 | rgb = TTF.rotate(rgb, img_rotation) 115 
| 116 | rgb = to_tensor(rgb) 117 | rgb = normalization(rgb) 118 | 119 | return rgb 120 | 121 | def __len__(self): 122 | return len(self.all_files) 123 | 124 | def __getitem__(self, idx): 125 | item = self.all_files[idx] 126 | run = str(item.split('/')[0]) 127 | timestamp = str(item.split('/')[1]) 128 | img_path = os.path.join(self.root_dir, run, 'image_2', timestamp+'.png') 129 | pc_path = os.path.join(self.root_dir, run, self.maps_folder, timestamp+'.h5') 130 | 131 | try: 132 | with h5py.File(pc_path, 'r') as hf: 133 | pc = hf['PC'][:] 134 | if self.use_reflectance: 135 | reflectance = hf['intensity'][:] 136 | reflectance = torch.from_numpy(reflectance).float() 137 | except Exception as e: 138 | print(f'File Broken: {pc_path}') 139 | raise e 140 | 141 | pc_in = torch.from_numpy(pc.astype(np.float32)) 142 | if pc_in.shape[1] == 4 or pc_in.shape[1] == 3: 143 | pc_in = pc_in.t() 144 | if pc_in.shape[0] == 3: 145 | homogeneous = torch.ones(pc_in.shape[1]).unsqueeze(0) 146 | pc_in = torch.cat((pc_in, homogeneous), 0) 147 | elif pc_in.shape[0] == 4: 148 | if not torch.all(pc_in[3,:] == 1.): 149 | pc_in[3,:] = 1. 150 | else: 151 | raise TypeError("Wrong PointCloud shape") 152 | 153 | h_mirror = False 154 | if np.random.rand() > 0.5 and self.split == 'train': 155 | h_mirror = True 156 | pc_in[1, :] *= -1 157 | 158 | img = Image.open(img_path) 159 | img_rotation = 0. 160 | if self.split == 'train': 161 | img_rotation = np.random.uniform(-5, 5) 162 | try: 163 | img = self.custom_transform(img, img_rotation, h_mirror) 164 | except OSError: 165 | new_idx = np.random.randint(0, self.__len__()) 166 | return self.__getitem__(new_idx) 167 | 168 | # Rotate PointCloud for img_rotation 169 | if self.split == 'train': 170 | R = mathutils.Euler((radians(img_rotation), 0, 0), 'XYZ') 171 | T = mathutils.Vector((0., 0., 0.)) 172 | pc_in = rotate_forward(pc_in, R, T) 173 | 174 | if self.split != 'test': 175 | max_angle = self.max_r 176 | rotz = np.random.uniform(-max_angle, max_angle) * (3.141592 / 180.0) 177 | roty = np.random.uniform(-max_angle, max_angle) * (3.141592 / 180.0) 178 | rotx = np.random.uniform(-max_angle, max_angle) * (3.141592 / 180.0) 179 | transl_x = np.random.uniform(-self.max_t, self.max_t) 180 | transl_y = np.random.uniform(-self.max_t, self.max_t) 181 | transl_z = np.random.uniform(-self.max_t, min(self.max_t, 1.)) 182 | else: 183 | initial_RT = self.test_RT[idx] 184 | rotz = initial_RT[6] 185 | roty = initial_RT[5] 186 | rotx = initial_RT[4] 187 | transl_x = initial_RT[1] 188 | transl_y = initial_RT[2] 189 | transl_z = initial_RT[3] 190 | 191 | 192 | R = mathutils.Euler((rotx, roty, rotz), 'XYZ') 193 | T = mathutils.Vector((transl_x, transl_y, transl_z)) 194 | 195 | R, T = invert_pose(R, T) 196 | R, T = torch.tensor(R), torch.tensor(T) 197 | 198 | calib = get_calib_kitti(int(run)) 199 | if h_mirror: 200 | calib[2] = (img.shape[2] / 2)*2 - calib[2] 201 | 202 | if not self.use_reflectance: 203 | sample = {'rgb': img, 'point_cloud': pc_in, 'calib': calib, 204 | 'tr_error': T, 'rot_error': R, 'idx': int(run), 'rgb_name': timestamp} 205 | else: 206 | sample = {'rgb': img, 'point_cloud': pc_in, 'reflectance': reflectance, 'calib': calib, 207 | 'tr_error': T, 'rot_error': R, 'idx': int(run), 'rgb_name': timestamp} 208 | 209 | return sample 210 | -------------------------------------------------------------------------------- /core/depth_completion.py: -------------------------------------------------------------------------------- 1 | import os 2 | import numpy as np 3 | import cv2 4 | 5 | def 
sparse_to_dense(sparse, max_depth=100.): 6 | ## invert 7 | valid = sparse > 0.1 8 | sparse[valid] = max_depth - sparse[valid] 9 | 10 | ## dilate 11 | custom_kernel = np.array( 12 | [ 13 | [0, 0, 1, 0, 0], 14 | [0, 1, 1, 1, 0], 15 | [1, 1, 1, 1, 1], 16 | [0, 1, 1, 1, 0], 17 | [0, 0, 1, 0, 0], 18 | ], dtype=np.uint8) 19 | sparse = cv2.dilate(sparse, custom_kernel) 20 | 21 | ## close 22 | custom_kernel = np.ones((5, 5), np.uint8) 23 | sparse = cv2.morphologyEx(sparse, cv2.MORPH_CLOSE, custom_kernel) 24 | 25 | ## fill 26 | invalid = sparse < 0.1 27 | custom_kernel = np.ones((7, 7), np.uint8) 28 | dilated = cv2.dilate(sparse, custom_kernel) 29 | sparse[invalid] = dilated[invalid] 30 | 31 | ## invert 32 | valid = sparse > 0.1 33 | sparse[valid] = max_depth - sparse[valid] 34 | 35 | return sparse -------------------------------------------------------------------------------- /core/extractor.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | 5 | 6 | class ResidualBlock(nn.Module): 7 | def __init__(self, in_planes, planes, norm_fn='group', stride=1): 8 | super(ResidualBlock, self).__init__() 9 | 10 | self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, padding=1, stride=stride) 11 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, padding=1) 12 | self.relu = nn.ReLU(inplace=True) 13 | 14 | num_groups = planes // 8 15 | 16 | if norm_fn == 'group': 17 | self.norm1 = nn.GroupNorm(num_groups=num_groups, num_channels=planes) 18 | self.norm2 = nn.GroupNorm(num_groups=num_groups, num_channels=planes) 19 | if not stride == 1: 20 | self.norm3 = nn.GroupNorm(num_groups=num_groups, num_channels=planes) 21 | 22 | elif norm_fn == 'batch': 23 | self.norm1 = nn.BatchNorm2d(planes) 24 | self.norm2 = nn.BatchNorm2d(planes) 25 | if not stride == 1: 26 | self.norm3 = nn.BatchNorm2d(planes) 27 | 28 | elif norm_fn == 'instance': 29 | self.norm1 = nn.InstanceNorm2d(planes) 30 | self.norm2 = nn.InstanceNorm2d(planes) 31 | if not stride == 1: 32 | self.norm3 = nn.InstanceNorm2d(planes) 33 | 34 | elif norm_fn == 'none': 35 | self.norm1 = nn.Sequential() 36 | self.norm2 = nn.Sequential() 37 | if not stride == 1: 38 | self.norm3 = nn.Sequential() 39 | 40 | if stride == 1: 41 | self.downsample = None 42 | 43 | else: 44 | self.downsample = nn.Sequential( 45 | nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride), self.norm3) 46 | 47 | def forward(self, x): 48 | y = x 49 | y = self.relu(self.norm1(self.conv1(y))) 50 | y = self.relu(self.norm2(self.conv2(y))) 51 | 52 | if self.downsample is not None: 53 | x = self.downsample(x) 54 | 55 | return self.relu(x + y) 56 | 57 | class ResidualBlock_Lidar(nn.Module): 58 | def __init__(self, in_planes, planes, norm_fn='group', stride=1): 59 | super(ResidualBlock_Lidar, self).__init__() 60 | 61 | self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, padding=1, stride=stride) 62 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, padding=1) 63 | self.relu = nn.ReLU(inplace=True) 64 | 65 | num_groups = planes // 8 66 | 67 | if norm_fn == 'group': 68 | self.norm1 = nn.GroupNorm(num_groups=num_groups, num_channels=planes) 69 | self.norm2 = nn.GroupNorm(num_groups=num_groups, num_channels=planes) 70 | if not stride == 1: 71 | self.norm3 = nn.GroupNorm(num_groups=num_groups, num_channels=planes) 72 | 73 | elif norm_fn == 'batch': 74 | self.norm1 = nn.BatchNorm2d(planes) 75 | self.norm2 = nn.BatchNorm2d(planes) 76 | if not stride == 1: 77 | self.norm3 = 
nn.BatchNorm2d(planes) 78 | 79 | elif norm_fn == 'instance': 80 | self.norm1 = nn.InstanceNorm2d(planes) 81 | self.norm2 = nn.InstanceNorm2d(planes) 82 | if not stride == 1: 83 | self.norm3 = nn.InstanceNorm2d(planes) 84 | 85 | elif norm_fn == 'none': 86 | self.norm1 = nn.Sequential() 87 | self.norm2 = nn.Sequential() 88 | if not stride == 1: 89 | self.norm3 = nn.Sequential() 90 | 91 | if stride == 1: 92 | self.downsample = None 93 | 94 | else: 95 | self.downsample = nn.Sequential( 96 | nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride), self.norm3) 97 | 98 | def forward(self, x): 99 | if isinstance(x, tuple): 100 | mask = x[1] 101 | x = x[0] 102 | else: 103 | Dim = x.shape[1] 104 | mask = x[:, Dim-1, :, :].unsqueeze(1) 105 | x = x[:, :Dim-1, :, :] 106 | y = x 107 | y = self.relu(self.norm1(self.conv1(y))) 108 | if y.shape[2] != x.shape[2]: 109 | mask = mask[:, :, 0::2, 0::2] 110 | y = torch.mul(y, mask) 111 | y = self.relu(self.norm2(self.conv2(y))) 112 | y = torch.mul(y, mask) 113 | if self.downsample is not None: 114 | x = self.downsample(x) 115 | x = torch.mul(x, mask) 116 | 117 | return self.relu(x + y), mask 118 | 119 | 120 | class BottleneckBlock(nn.Module): 121 | def __init__(self, in_planes, planes, norm_fn='group', stride=1): 122 | super(BottleneckBlock, self).__init__() 123 | 124 | self.conv1 = nn.Conv2d(in_planes, planes // 4, kernel_size=1, padding=0) 125 | self.conv2 = nn.Conv2d(planes // 4, planes // 4, kernel_size=3, padding=1, stride=stride) 126 | self.conv3 = nn.Conv2d(planes // 4, planes, kernel_size=1, padding=0) 127 | self.relu = nn.ReLU(inplace=True) 128 | 129 | num_groups = planes // 8 130 | 131 | if norm_fn == 'group': 132 | self.norm1 = nn.GroupNorm(num_groups=num_groups, num_channels=planes // 4) 133 | self.norm2 = nn.GroupNorm(num_groups=num_groups, num_channels=planes // 4) 134 | self.norm3 = nn.GroupNorm(num_groups=num_groups, num_channels=planes) 135 | if not stride == 1: 136 | self.norm4 = nn.GroupNorm(num_groups=num_groups, num_channels=planes) 137 | 138 | elif norm_fn == 'batch': 139 | self.norm1 = nn.BatchNorm2d(planes // 4) 140 | self.norm2 = nn.BatchNorm2d(planes // 4) 141 | self.norm3 = nn.BatchNorm2d(planes) 142 | if not stride == 1: 143 | self.norm4 = nn.BatchNorm2d(planes) 144 | 145 | elif norm_fn == 'instance': 146 | self.norm1 = nn.InstanceNorm2d(planes // 4) 147 | self.norm2 = nn.InstanceNorm2d(planes // 4) 148 | self.norm3 = nn.InstanceNorm2d(planes) 149 | if not stride == 1: 150 | self.norm4 = nn.InstanceNorm2d(planes) 151 | 152 | elif norm_fn == 'none': 153 | self.norm1 = nn.Sequential() 154 | self.norm2 = nn.Sequential() 155 | self.norm3 = nn.Sequential() 156 | if not stride == 1: 157 | self.norm4 = nn.Sequential() 158 | 159 | if stride == 1: 160 | self.downsample = None 161 | 162 | else: 163 | self.downsample = nn.Sequential( 164 | nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride), self.norm4) 165 | 166 | def forward(self, x): 167 | y = x 168 | y = self.relu(self.norm1(self.conv1(y))) 169 | y = self.relu(self.norm2(self.conv2(y))) 170 | y = self.relu(self.norm3(self.conv3(y))) 171 | 172 | if self.downsample is not None: 173 | x = self.downsample(x) 174 | 175 | return self.relu(x + y) 176 | 177 | 178 | class BasicEncoder(nn.Module): 179 | def __init__(self, output_dim=128, norm_fn='batch', dropout=0.0): 180 | super(BasicEncoder, self).__init__() 181 | self.norm_fn = norm_fn 182 | 183 | if self.norm_fn == 'group': 184 | self.norm1 = nn.GroupNorm(num_groups=8, num_channels=64) 185 | 186 | elif self.norm_fn == 'batch': 187 | 
self.norm1 = nn.BatchNorm2d(64) 188 | 189 | elif self.norm_fn == 'instance': 190 | self.norm1 = nn.InstanceNorm2d(64) 191 | 192 | elif self.norm_fn == 'none': 193 | self.norm1 = nn.Sequential() 194 | 195 | self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3) 196 | self.relu1 = nn.ReLU(inplace=True) 197 | 198 | self.in_planes = 64 199 | self.layer1 = self._make_layer(64, stride=1) 200 | self.layer2 = self._make_layer(96, stride=2) 201 | self.layer3 = self._make_layer(128, stride=2) 202 | 203 | # output convolution 204 | self.conv2 = nn.Conv2d(128, output_dim, kernel_size=1) 205 | 206 | self.dropout = None 207 | if dropout > 0: 208 | self.dropout = nn.Dropout2d(p=dropout) 209 | 210 | for m in self.modules(): 211 | if isinstance(m, nn.Conv2d): 212 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 213 | elif isinstance(m, (nn.BatchNorm2d, nn.InstanceNorm2d, nn.GroupNorm)): 214 | if m.weight is not None: 215 | nn.init.constant_(m.weight, 1) 216 | if m.bias is not None: 217 | nn.init.constant_(m.bias, 0) 218 | 219 | def _make_layer(self, dim, stride=1): 220 | layer1 = ResidualBlock(self.in_planes, dim, self.norm_fn, stride=stride) 221 | layer2 = ResidualBlock(dim, dim, self.norm_fn, stride=1) 222 | layers = (layer1, layer2) 223 | 224 | self.in_planes = dim 225 | return nn.Sequential(*layers) 226 | 227 | def forward(self, x): 228 | 229 | # if input is list, combine batch dimension 230 | is_list = isinstance(x, tuple) or isinstance(x, list) 231 | if is_list: 232 | batch_dim = x[0].shape[0] 233 | x = torch.cat(x, dim=0) 234 | 235 | x = self.conv1(x) 236 | # import cv2 237 | # cv2.imwrite(f"./images/output/image.png", torch.sum(x, dim=1)[0].cpu().detach().numpy()*255) 238 | x = self.norm1(x) 239 | x = self.relu1(x) 240 | 241 | x = self.layer1(x) 242 | x = self.layer2(x) 243 | x = self.layer3(x) 244 | 245 | x = self.conv2(x) 246 | 247 | if self.training and self.dropout is not None: 248 | x = self.dropout(x) 249 | 250 | if is_list: 251 | x = torch.split(x, [batch_dim, batch_dim], dim=0) 252 | 253 | return x 254 | 255 | 256 | class BasicEncoder_LIDAR(nn.Module): 257 | def __init__(self, output_dim=128, norm_fn='batch', dropout=0.0): 258 | super(BasicEncoder_LIDAR, self).__init__() 259 | self.norm_fn = norm_fn 260 | 261 | if self.norm_fn == 'group': 262 | self.norm1 = nn.GroupNorm(num_groups=8, num_channels=64) 263 | 264 | elif self.norm_fn == 'batch': 265 | self.norm1 = nn.BatchNorm2d(64) 266 | 267 | elif self.norm_fn == 'instance': 268 | self.norm1 = nn.InstanceNorm2d(64) 269 | 270 | elif self.norm_fn == 'none': 271 | self.norm1 = nn.Sequential() 272 | 273 | self.convm = nn.Conv2d(1, 1, kernel_size=7, stride=2, padding=3) 274 | 275 | self.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3) 276 | self.relu1 = nn.ReLU(inplace=True) 277 | 278 | self.in_planes = 64 279 | self.layer1 = self._make_layer(64, stride=1) 280 | self.layer2 = self._make_layer(96, stride=2) 281 | self.layer3 = self._make_layer(128, stride=2) 282 | 283 | # output convolution 284 | self.conv2 = nn.Conv2d(128, output_dim, kernel_size=1) 285 | 286 | self.dropout = None 287 | if dropout > 0: 288 | self.dropout = nn.Dropout2d(p=dropout) 289 | 290 | for m in self.modules(): 291 | if isinstance(m, nn.Conv2d): 292 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu') 293 | elif isinstance(m, (nn.BatchNorm2d, nn.InstanceNorm2d, nn.GroupNorm)): 294 | if m.weight is not None: 295 | nn.init.constant_(m.weight, 1) 296 | if m.bias is not None: 297 | nn.init.constant_(m.bias, 0) 298 
| 299 | def _make_layer(self, dim, stride=1): 300 | layer1 = ResidualBlock(self.in_planes, dim, self.norm_fn, stride=stride) 301 | layer2 = ResidualBlock(dim, dim, self.norm_fn, stride=1) 302 | layers = (layer1, layer2) 303 | 304 | self.in_planes = dim 305 | return nn.Sequential(*layers) 306 | 307 | def forward(self, x, mask): 308 | 309 | # if input is list, combine batch dimension 310 | is_list = isinstance(x, tuple) or isinstance(x, list) 311 | if is_list: 312 | batch_dim = x[0].shape[0] 313 | x = torch.cat(x, dim=0) 314 | 315 | mask = self.convm(mask) 316 | 317 | # if i_batch is not None: 318 | # import cv2 319 | # import seaborn 320 | # import matplotlib.pyplot as plt 321 | # import numpy as np 322 | # heatmap = seaborn.heatmap(mask[0, 0, :, :].cpu().detach().numpy(), xticklabels=False, yticklabels=False, cbar=False, square=True, robust=True, cmap='gist_rainbow') 323 | # figure = heatmap.get_figure() 324 | # figure.savefig(f"./images/output/{i_batch:06d}_heatmap.png") 325 | # plt.close() 326 | 327 | x = self.conv1(x) # 160 480 64 328 | x = torch.mul(x, mask) 329 | 330 | x = self.norm1(x) 331 | x = self.relu1(x) 332 | x = self.layer1(x) # 160 480 64 333 | x = self.layer2(x) # 80 240 96 334 | x = self.layer3(x) # 40 120 128 335 | 336 | x = self.conv2(x) # 40 120 128 337 | 338 | if self.training and self.dropout is not None: 339 | x = self.dropout(x) 340 | 341 | if is_list: 342 | x = torch.split(x, [batch_dim, batch_dim], dim=0) 343 | 344 | return x -------------------------------------------------------------------------------- /core/flow2pose.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import visibility 3 | import cv2 4 | import numpy as np 5 | import mathutils 6 | import math 7 | 8 | import sys 9 | sys.path.append('core') 10 | from camera_model import CameraModel 11 | from utils_point import invert_pose, quat2mat, tvector2mat, quaternion_from_matrix, rotation_vector_to_euler 12 | from quaternion_distances import quaternion_distance 13 | 14 | def Flow2Pose(flow_up, depth, calib): 15 | device = flow_up.device 16 | 17 | output = torch.zeros(flow_up.shape).to(device) 18 | pred_depth_img = torch.zeros(depth.shape).to(device) 19 | pred_depth_img += 1000. 20 | output = visibility.image_warp_index(depth.to(device), flow_up.int(), pred_depth_img, output, 21 | depth.shape[3], depth.shape[2]) 22 | pred_depth_img[pred_depth_img == 1000.] = 0. 23 | pc_project_uv = output.cpu().permute(0, 2, 3, 1).numpy() 24 | 25 | depth_img_ori = depth.cpu().numpy() * 100. 26 | 27 | mask_depth_1 = pc_project_uv[0, :, :, 0] != 0 28 | mask_depth_2 = pc_project_uv[0, :, :, 1] != 0 29 | mask_depth = mask_depth_1 + mask_depth_2 30 | depth_img = depth_img_ori[0, 0, :, :] * mask_depth 31 | 32 | cam_model = CameraModel() 33 | cam_params = calib[0].cpu().numpy() 34 | x, y = 28, 140 35 | cam_params[2] = cam_params[2] + 480 - (y + y + 960) / 2. 36 | cam_params[3] = cam_params[3] + 160 - (x + x + 320) / 2. 
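# The offsets (x, y) = (28, 140) match the centre crop applied at evaluation time in
# Data_preprocess.DownsampleCrop_KITTI_delta for a 376x1241 KITTI frame cropped to 320x960:
# (376 - 320) // 2 = 28 and (1241 - 960) // 2 = 140. The two adjustments above therefore move
# the principal point into the cropped image's coordinate frame before cam_mat is assembled
# for PnP.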
37 | cam_model.focal_length = cam_params[:2] 38 | cam_model.principal_point = cam_params[2:] 39 | cam_mat = np.array([[cam_params[0], 0, cam_params[2]], [0, cam_params[1], cam_params[3]], [0, 0, 1.]]) 40 | 41 | pts3d, pts2d, match_index = cam_model.deproject_pytorch(depth_img, pc_project_uv[0, :, :, :]) 42 | ret, rvecs, tvecs, inliers = cv2.solvePnPRansac(pts3d, pts2d, cam_mat, None) 43 | 44 | reuler = rotation_vector_to_euler(rvecs) 45 | R = mathutils.Euler((reuler[0], reuler[1], reuler[2]), 'XYZ') 46 | T = mathutils.Vector((tvecs[0], tvecs[1], tvecs[2])) 47 | R, T = invert_pose(R, T) 48 | R, T = torch.tensor(R), torch.tensor(T) 49 | R_predicted = R[[0, 3, 1, 2]] 50 | T_predicted = T[[2, 0, 1]] 51 | 52 | return R_predicted, T_predicted 53 | 54 | def Flow2PoseBPnP(flow_up, depth, calib, bpnp): 55 | device =flow_up.device 56 | output = torch.zeros(flow_up.shape).to(device) 57 | pred_depth_img = torch.zeros(depth.shape).to(device) 58 | pred_depth_img += 1000. 59 | output = visibility.image_warp_index(depth.to(device), flow_up.int().to(device), pred_depth_img, output, 60 | depth.shape[3], depth.shape[2]) 61 | pred_depth_img[pred_depth_img == 1000.] = 0. 62 | pc_project_uv = output.cpu().permute(0, 2, 3, 1).numpy() 63 | 64 | depth_img_ori = depth.cpu().numpy() * 100. 65 | 66 | mask_depth_1 = pc_project_uv[0, :, :, 0] != 0 67 | mask_depth_2 = pc_project_uv[0, :, :, 1] != 0 68 | mask_depth = mask_depth_1 + mask_depth_2 69 | depth_img = depth_img_ori[0, 0, :, :] * mask_depth 70 | 71 | cam_model = CameraModel() 72 | cam_params = calib[0].cpu().numpy() 73 | x, y = 28, 140 74 | cam_params[2] = cam_params[2] + 480 - (y + y + 960) / 2. 75 | cam_params[3] = cam_params[3] + 160 - (x + x + 320) / 2. 76 | cam_model.focal_length = cam_params[:2] 77 | cam_model.principal_point = cam_params[2:] 78 | cam_mat = np.array([[cam_params[0], 0, cam_params[2]], [0, cam_params[1], cam_params[3]], [0, 0, 1.]]) 79 | 80 | pts3d, pts2d, match_index = cam_model.deproject_pytorch(depth_img, pc_project_uv[0, :, :, :]) 81 | 82 | pts3d = torch.tensor(pts3d, dtype=torch.float32).to(device) 83 | pts2d = torch.tensor(pts2d, dtype=torch.float32).to(device) 84 | pts2d = pts2d.unsqueeze(0) 85 | K = torch.tensor(cam_mat, dtype=torch.float32).to(device) 86 | P_out = bpnp(pts2d, pts3d, K) 87 | rvecs = P_out[0, 0:3] 88 | tvecs = P_out[0, 3:] 89 | 90 | R = mathutils.Euler((rvecs[0], rvecs[1], rvecs[2]), 'XYZ') 91 | T = mathutils.Vector((tvecs[0], tvecs[1], tvecs[2])) 92 | R, T = invert_pose(R, T) 93 | R, T = torch.tensor(R), torch.tensor(T) 94 | R_predicted = R[[0, 3, 1, 2]] 95 | T_predicted = T[[2, 0, 1]] 96 | 97 | return R_predicted, T_predicted 98 | 99 | def err_Pose(R_pred, T_pred, R_gt, T_gt): 100 | device = R_pred.device 101 | 102 | R = quat2mat(R_gt) 103 | T = tvector2mat(T_gt) 104 | RT_inv = torch.mm(T, R).to(device) 105 | RT = RT_inv.clone().inverse() 106 | 107 | R_pred = quat2mat(R_pred) 108 | T_pred = tvector2mat(T_pred) 109 | RT_pred = torch.mm(T_pred, R_pred) 110 | RT_pred = RT_pred.to(device) 111 | RT_new = torch.mm(RT, RT_pred) 112 | 113 | T_composed = RT_new[:3, 3] 114 | R_composed = quaternion_from_matrix(RT_new) 115 | R_composed = R_composed.unsqueeze(0) 116 | total_trasl_error = torch.tensor(0.0).to(device) 117 | total_rot_error = quaternion_distance(R_composed.to(device), torch.tensor([[1., 0., 0., 0.]]).to(device), 118 | device=R_composed.device) 119 | total_rot_error = total_rot_error * 180. / math.pi 120 | total_trasl_error += torch.norm(T_composed.to(device)) * 100. 
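# RT_new is the residual transform between ground truth and prediction: the rotation error is
# its geodesic angle (quaternion distance to the identity quaternion) converted to degrees,
# and the translation error is the norm of its translation component scaled by 100
# (metres -> centimetres).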
121 | 122 | total_trasl_fail = torch.norm(T_composed - T_gt[0].to(device)) * 100 123 | if total_trasl_fail > 400: 124 | is_fail = True 125 | else: 126 | is_fail = False 127 | return total_rot_error, total_trasl_error, is_fail 128 | 129 | 130 | -------------------------------------------------------------------------------- /core/flow_viz.py: -------------------------------------------------------------------------------- 1 | # Flow visualization code used from https://github.com/tomrunia/OpticalFlow_Visualization 2 | 3 | 4 | # MIT License 5 | # 6 | # Copyright (c) 2018 Tom Runia 7 | # 8 | # Permission is hereby granted, free of charge, to any person obtaining a copy 9 | # of this software and associated documentation files (the "Software"), to deal 10 | # in the Software without restriction, including without limitation the rights 11 | # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 12 | # copies of the Software, and to permit persons to whom the Software is 13 | # furnished to do so, subject to conditions. 14 | # 15 | # Author: Tom Runia 16 | # Date Created: 2018-08-03 17 | 18 | import numpy as np 19 | 20 | def make_colorwheel(): 21 | """ 22 | Generates a color wheel for optical flow visualization as presented in: 23 | Baker et al. "A Database and Evaluation Methodology for Optical Flow" (ICCV, 2007) 24 | URL: http://vision.middlebury.edu/flow/flowEval-iccv07.pdf 25 | 26 | Code follows the original C++ source code of Daniel Scharstein. 27 | Code follows the the Matlab source code of Deqing Sun. 28 | 29 | Returns: 30 | np.ndarray: Color wheel 31 | """ 32 | 33 | RY = 15 34 | YG = 6 35 | GC = 4 36 | CB = 11 37 | BM = 13 38 | MR = 6 39 | 40 | ncols = RY + YG + GC + CB + BM + MR 41 | colorwheel = np.zeros((ncols, 3)) 42 | col = 0 43 | 44 | # RY 45 | colorwheel[0:RY, 0] = 255 46 | colorwheel[0:RY, 1] = np.floor(255*np.arange(0,RY)/RY) 47 | col = col+RY 48 | # YG 49 | colorwheel[col:col+YG, 0] = 255 - np.floor(255*np.arange(0,YG)/YG) 50 | colorwheel[col:col+YG, 1] = 255 51 | col = col+YG 52 | # GC 53 | colorwheel[col:col+GC, 1] = 255 54 | colorwheel[col:col+GC, 2] = np.floor(255*np.arange(0,GC)/GC) 55 | col = col+GC 56 | # CB 57 | colorwheel[col:col+CB, 1] = 255 - np.floor(255*np.arange(CB)/CB) 58 | colorwheel[col:col+CB, 2] = 255 59 | col = col+CB 60 | # BM 61 | colorwheel[col:col+BM, 2] = 255 62 | colorwheel[col:col+BM, 0] = np.floor(255*np.arange(0,BM)/BM) 63 | col = col+BM 64 | # MR 65 | colorwheel[col:col+MR, 2] = 255 - np.floor(255*np.arange(MR)/MR) 66 | colorwheel[col:col+MR, 0] = 255 67 | return colorwheel 68 | 69 | 70 | def flow_uv_to_colors(u, v, convert_to_bgr=False): 71 | """ 72 | Applies the flow color wheel to (possibly clipped) flow components u and v. 73 | 74 | According to the C++ source code of Daniel Scharstein 75 | According to the Matlab source code of Deqing Sun 76 | 77 | Args: 78 | u (np.ndarray): Input horizontal flow of shape [H,W] 79 | v (np.ndarray): Input vertical flow of shape [H,W] 80 | convert_to_bgr (bool, optional): Convert output image to BGR. Defaults to False. 
81 | 82 | Returns: 83 | np.ndarray: Flow visualization image of shape [H,W,3] 84 | """ 85 | flow_image = np.zeros((u.shape[0], u.shape[1], 3), np.uint8) 86 | colorwheel = make_colorwheel() # shape [55x3] 87 | ncols = colorwheel.shape[0] 88 | rad = np.sqrt(np.square(u) + np.square(v)) 89 | a = np.arctan2(-v, -u)/np.pi 90 | fk = (a+1) / 2*(ncols-1) 91 | k0 = np.floor(fk).astype(np.int32) 92 | k1 = k0 + 1 93 | k1[k1 == ncols] = 0 94 | f = fk - k0 95 | for i in range(colorwheel.shape[1]): 96 | tmp = colorwheel[:,i] 97 | col0 = tmp[k0] / 255.0 98 | col1 = tmp[k1] / 255.0 99 | col = (1-f)*col0 + f*col1 100 | idx = (rad <= 1) 101 | col[idx] = 1 - rad[idx] * (1-col[idx]) 102 | col[~idx] = col[~idx] * 0.75 # out of range 103 | # Note the 2-i => BGR instead of RGB 104 | ch_idx = 2-i if convert_to_bgr else i 105 | flow_image[:,:,ch_idx] = np.floor(255 * col) 106 | return flow_image 107 | 108 | 109 | def flow_to_image(flow_uv, clip_flow=None, convert_to_bgr=False): 110 | """ 111 | Expects a two dimensional flow image of shape. 112 | 113 | Args: 114 | flow_uv (np.ndarray): Flow UV image of shape [H,W,2] 115 | clip_flow (float, optional): Clip maximum of flow values. Defaults to None. 116 | convert_to_bgr (bool, optional): Convert output image to BGR. Defaults to False. 117 | 118 | Returns: 119 | np.ndarray: Flow visualization image of shape [H,W,3] 120 | """ 121 | assert flow_uv.ndim == 3, 'input flow must have three dimensions' 122 | assert flow_uv.shape[2] == 2, 'input flow must have shape [H,W,2]' 123 | if clip_flow is not None: 124 | flow_uv = np.clip(flow_uv, 0, clip_flow) 125 | u = flow_uv[:,:,0] 126 | v = flow_uv[:,:,1] 127 | rad = np.sqrt(np.square(u) + np.square(v)) 128 | rad_max = np.max(rad) 129 | epsilon = 1e-5 130 | u = u / (rad_max + epsilon) 131 | v = v / (rad_max + epsilon) 132 | return flow_uv_to_colors(u, v, convert_to_bgr) -------------------------------------------------------------------------------- /core/losses.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import numpy as np 4 | import visibility 5 | from depth_completion import sparse_to_dense 6 | 7 | def sequence_loss(flow_preds, flow_gt, gamma=0.8, MAX_FLOW=400): 8 | """ Loss function defined over sequence of flow predictions """ 9 | 10 | mag = torch.sum(flow_gt ** 2, dim=1).sqrt() 11 | Mask = torch.zeros([flow_gt.shape[0], flow_gt.shape[1], flow_gt.shape[2], 12 | flow_gt.shape[3]]).to(flow_gt.device) 13 | mask = (flow_gt[:, 0, :, :] != 0) + (flow_gt[:, 1, :, :] != 0) 14 | valid = mask & (mag < MAX_FLOW) 15 | Mask[:, 0, :, :] = valid 16 | Mask[:, 1, :, :] = valid 17 | Mask = Mask != 0 18 | mask_sum = torch.sum(mask, dim=[1, 2]) 19 | 20 | n_predictions = len(flow_preds) 21 | flow_loss = 0.0 22 | 23 | for i in range(n_predictions): 24 | i_weight = gamma ** (n_predictions - i - 1) 25 | Loss_reg = (flow_preds[i] - flow_gt) * Mask 26 | Loss_reg = torch.norm(Loss_reg, dim=1) 27 | Loss_reg = torch.sum(Loss_reg, dim=[1, 2]) 28 | Loss_reg = Loss_reg / mask_sum 29 | flow_loss += i_weight * Loss_reg.mean() 30 | 31 | epe = torch.sum((flow_preds[-1] - flow_gt) ** 2, dim=1).sqrt() 32 | epe = epe.view(-1)[valid.view(-1)] 33 | 34 | metrics = { 35 | 'epe': epe.mean().item(), 36 | '1px': (epe < 1).float().mean().item(), 37 | '3px': (epe < 3).float().mean().item(), 38 | '5px': (epe < 5).float().mean().item(), 39 | } 40 | 41 | return flow_loss, metrics 42 | 43 | 44 | def normal_loss(pred_flows, gt_flows, cam_mats, lidar_input): 45 | device = gt_flows.device 46 | 
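# Overview of the loop below: the sparse LiDAR depth of each sample is warped with both the
# predicted flow and the ground-truth flow (visibility.image_warp_index), densified with
# sparse_to_dense, and back-projected to 3D using the focal lengths (with the image centre as
# principal point). Random triplets of matched pixels are then sampled, and the loss penalises
# the difference between the unit surface normals of corresponding triangles in the predicted
# and ground-truth reconstructions.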
47 | loss = 0. 48 | 49 | N = lidar_input.shape[0] 50 | flow_up = pred_flows[-1] 51 | for i in range(N): 52 | depth_img = lidar_input[i].unsqueeze(0) 53 | flow = flow_up[i].unsqueeze(0) 54 | gt_flow = gt_flows[i].unsqueeze(0) 55 | cam_params = cam_mats[i, :] 56 | fx, fy = cam_params[0], cam_params[1] 57 | 58 | output = torch.zeros(flow.shape).to(device) 59 | pred_depth_img = torch.zeros(depth_img.shape).to(device) 60 | pred_depth_img += 1000. 61 | output = visibility.image_warp_index(depth_img.to(device), flow.int().to(device), pred_depth_img, 62 | output, 63 | depth_img.shape[3], depth_img.shape[2]) 64 | pred_depth_img[pred_depth_img == 1000.] = 0. 65 | 66 | output2 = torch.zeros(flow.shape).to(device) 67 | gt_depth_img = torch.zeros(depth_img.shape).to(device) 68 | gt_depth_img += 1000. 69 | output2 = visibility.image_warp_index(depth_img.to(device), gt_flow.int().to(device), gt_depth_img, 70 | output2, 71 | depth_img.shape[3], depth_img.shape[2]) 72 | gt_depth_img[gt_depth_img == 1000.] = 0. 73 | 74 | gt_depth_img_dilate = sparse_to_dense(gt_depth_img[0, 0, :, :].cpu().numpy().astype(np.float32)) 75 | gt_depth_img_dilate = torch.from_numpy(gt_depth_img_dilate).to(device).unsqueeze(0) 76 | pred_depth_img_dilate = sparse_to_dense(pred_depth_img[0, 0, :, :].cpu().numpy().astype(np.float32)) 77 | pred_depth_img_dilate = torch.from_numpy(pred_depth_img_dilate).to(device).unsqueeze(0) 78 | 79 | ## choose common points 80 | mask1_1 = output2[0, 0, :, :] > 0 81 | mask1_2 = output2[0, 1, :, :] > 0 82 | mask1 = mask1_1 + mask1_2 83 | mask2_1 = output[0, 0, :, :] > 0 84 | mask2_2 = output[0, 1, :, :] > 0 85 | mask2 = mask2_1 + mask2_2 86 | mask = mask1 * mask2 87 | Mask = torch.cat((mask.unsqueeze(0), mask.unsqueeze(0)), dim=0).unsqueeze(0) 88 | pred_index = torch.cat((output[0, 0, :, :][mask].unsqueeze(0), output[0, 1, :, :][mask].unsqueeze(0)), dim=0) 89 | gt_index = torch.cat((output2[0, 0, :, :][mask].unsqueeze(0), output2[0, 1, :, :][mask].unsqueeze(0)), dim=0) 90 | 91 | ## calculate normal loss 92 | H, W = depth_img.shape[2], depth_img.shape[3] 93 | xx = torch.arange(0, W).view(1, -1).repeat(H, 1) 94 | yy = torch.arange(0, H).view(-1, 1).repeat(1, W) 95 | xx = xx.view(1, H, W) 96 | yy = yy.view(1, H, W) 97 | u0 = xx.to(device) - torch.tensor(W // 2, dtype=torch.float32).to(device) 98 | v0 = yy.to(device) - torch.tensor(H // 2, dtype=torch.float32).to(device) 99 | 100 | x = u0 * torch.abs(gt_depth_img_dilate) / fx 101 | y = v0 * torch.abs(gt_depth_img_dilate) / fy 102 | z = gt_depth_img_dilate 103 | pc_gt = torch.cat([x, y, z], 0).permute(1, 2, 0) 104 | x = u0 * torch.abs(pred_depth_img_dilate) / fx 105 | y = v0 * torch.abs(pred_depth_img_dilate) / fy 106 | z = pred_depth_img_dilate 107 | pc_pred = torch.cat([x, y, z], 0).permute(1, 2, 0) 108 | 109 | num = gt_index.shape[1] 110 | sample_num = int(0.1 * num) 111 | p1_index = np.random.choice(num, sample_num, replace=True) 112 | p2_index = np.random.choice(num, sample_num, replace=True) 113 | p3_index = np.random.choice(num, sample_num, replace=True) 114 | gt_p1_index = gt_index[:, p1_index].int() 115 | gt_p2_index = gt_index[:, p2_index].int() 116 | gt_p3_index = gt_index[:, p3_index].int() 117 | pred_p1_index = pred_index[:, p1_index].int() 118 | pred_p2_index = pred_index[:, p2_index].int() 119 | pred_p3_index = pred_index[:, p3_index].int() 120 | 121 | pc_gt_1 = pc_gt[gt_p1_index[0, :].cpu().numpy(), gt_p1_index[1, :].cpu().numpy(), :] 122 | pc_gt_2 = pc_gt[gt_p2_index[0, :].cpu().numpy(), gt_p2_index[1, :].cpu().numpy(), :] 123 | pc_gt_3 = 
pc_gt[gt_p3_index[0, :].cpu().numpy(), gt_p3_index[1, :].cpu().numpy(), :] 124 | pc_pred_1 = pc_pred[pred_p1_index[0, :].cpu().numpy(), pred_p1_index[1, :].cpu().numpy()] 125 | pc_pred_2 = pc_pred[pred_p2_index[0, :].cpu().numpy(), pred_p2_index[1, :].cpu().numpy()] 126 | pc_pred_3 = pc_pred[pred_p3_index[0, :].cpu().numpy(), pred_p3_index[1, :].cpu().numpy()] 127 | pc_gt_group = torch.cat([pc_gt_1[:, :, np.newaxis], 128 | pc_gt_2[:, :, np.newaxis], 129 | pc_gt_3[:, :, np.newaxis]], 2) # Nx3x3 130 | pc_pred_group = torch.cat([pc_pred_1[:, :, np.newaxis], 131 | pc_pred_2[:, :, np.newaxis], 132 | pc_pred_3[:, :, np.newaxis]], 2) # Nx3x3 133 | pc_gt_group.requires_grad = True 134 | pc_pred_group.requires_grad = True 135 | 136 | gt_p12 = pc_gt_group[:, :, 1] - pc_gt_group[:, :, 0] # Nx3 137 | gt_p13 = pc_gt_group[:, :, 2] - pc_gt_group[:, :, 0] # Nx3 138 | pred_p12 = pc_pred_group[:, :, 1] - pc_pred_group[:, :, 0] # Nx3 139 | pred_p13 = pc_pred_group[:, :, 2] - pc_pred_group[:, :, 0] # Nx3 140 | 141 | gt_normal = torch.cross(gt_p12, gt_p13, dim=1) 142 | gt_norm = torch.norm(gt_normal, 2, dim=1, keepdim=True) 143 | gt_normal = gt_normal / gt_norm 144 | pred_noraml = torch.cross(pred_p12, pred_p13, dim=1) 145 | pred_norm = torch.norm(pred_noraml, 2, dim=1, keepdim=True) 146 | pred_noraml = pred_noraml / pred_norm 147 | loss_p = torch.sum(torch.abs(gt_normal - pred_noraml), dim=1) 148 | loss_p = torch.mean(loss_p) 149 | 150 | loss = loss + loss_p 151 | 152 | return loss -------------------------------------------------------------------------------- /core/quaternion_distances.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------- 2 | # Copyright (C) 2020 Università degli studi di Milano-Bicocca, iralab 3 | # Author: Daniele Cattaneo (d.cattaneo10@campus.unimib.it) 4 | # Released under Creative Commons 5 | # Attribution-NonCommercial-ShareAlike 4.0 International License. 
6 | # http://creativecommons.org/licenses/by-nc-sa/4.0/ 7 | # ------------------------------------------------------------------- 8 | import numpy as np 9 | import torch 10 | 11 | 12 | def quatmultiply(q, r, device='cpu'): 13 | """ 14 | Batch quaternion multiplication 15 | Args: 16 | q (torch.Tensor/np.ndarray): shape=[Nx4] 17 | r (torch.Tensor/np.ndarray): shape=[Nx4] 18 | device (str): 'cuda' or 'cpu' 19 | 20 | Returns: 21 | torch.Tensor: shape=[Nx4] 22 | """ 23 | if isinstance(q, torch.Tensor): 24 | t = torch.zeros(q.shape[0], 4, device=device) 25 | elif isinstance(q, np.ndarray): 26 | t = np.zeros(q.shape[0], 4) 27 | else: 28 | raise TypeError("Type not supported") 29 | t[:, 0] = r[:, 0] * q[:, 0] - r[:, 1] * q[:, 1] - r[:, 2] * q[:, 2] - r[:, 3] * q[:, 3] 30 | t[:, 1] = r[:, 0] * q[:, 1] + r[:, 1] * q[:, 0] - r[:, 2] * q[:, 3] + r[:, 3] * q[:, 2] 31 | t[:, 2] = r[:, 0] * q[:, 2] + r[:, 1] * q[:, 3] + r[:, 2] * q[:, 0] - r[:, 3] * q[:, 1] 32 | t[:, 3] = r[:, 0] * q[:, 3] - r[:, 1] * q[:, 2] + r[:, 2] * q[:, 1] + r[:, 3] * q[:, 0] 33 | return t 34 | 35 | 36 | def quatinv(q): 37 | """ 38 | Batch quaternion inversion 39 | Args: 40 | q (torch.Tensor/np.ndarray): shape=[Nx4] 41 | 42 | Returns: 43 | torch.Tensor/np.ndarray: shape=[Nx4] 44 | """ 45 | if isinstance(q, torch.Tensor): 46 | t = q.clone() 47 | elif isinstance(q, np.ndarray): 48 | t = q.copy() 49 | else: 50 | raise TypeError("Type not supported") 51 | t *= -1 52 | t[:, 0] *= -1 53 | return t 54 | 55 | 56 | def quaternion_distance(q, r, device): 57 | """ 58 | Batch quaternion distances, used as loss 59 | Args: 60 | q (torch.Tensor): shape=[Nx4] 61 | r (torch.Tensor): shape=[Nx4] 62 | device (str): 'cuda' or 'cpu' 63 | 64 | Returns: 65 | torch.Tensor: shape=[N] 66 | """ 67 | t = quatmultiply(q, quatinv(r), device) 68 | return 2 * torch.atan2(torch.norm(t[:, 1:], dim=1), torch.abs(t[:, 0])) 69 | -------------------------------------------------------------------------------- /core/raft.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | import torch.nn as nn 4 | import torch.nn.functional as F 5 | 6 | import sys 7 | sys.path.append("core") 8 | from update import BasicUpdateBlock 9 | from extractor import BasicEncoder, BasicEncoder_LIDAR 10 | from corr import CorrBlock, AlternateCorrBlock 11 | from utils import bilinear_sampler, coords_grid, upflow8 12 | 13 | try: 14 | autocast = torch.cuda.amp.autocast 15 | except: 16 | # dummy autocast for PyTorch < 1.6 17 | class autocast: 18 | def __init__(self, enabled): 19 | pass 20 | 21 | def __enter__(self): 22 | pass 23 | 24 | def __exit__(self, *args): 25 | pass 26 | 27 | 28 | class RAFT(nn.Module): 29 | def __init__(self, args): 30 | super(RAFT, self).__init__() 31 | self.args = args 32 | 33 | self.hidden_dim = hdim = 128 34 | self.context_dim = cdim = 128 35 | args.corr_levels = 4 36 | args.corr_radius = 4 37 | 38 | if 'dropout' not in self.args: 39 | self.args.dropout = 0 40 | 41 | if 'alternate_corr' not in self.args: 42 | self.args.alternate_corr = False 43 | 44 | # feature network, context network, and update block 45 | self.fnet = BasicEncoder(output_dim=256, norm_fn='instance', dropout=args.dropout) 46 | self.fnet_lidar = BasicEncoder_LIDAR(output_dim=256, norm_fn='instance', dropout=args.dropout) 47 | self.cnet = BasicEncoder_LIDAR(output_dim=hdim + cdim, norm_fn='batch', dropout=args.dropout) 48 | self.update_block = BasicUpdateBlock(self.args, hidden_dim=hdim) 49 | 50 | def freeze_bn(self): 51 | for m in 
self.modules(): 52 | if isinstance(m, nn.BatchNorm2d): 53 | m.eval() 54 | 55 | def initialize_flow(self, img): 56 | """ Flow is represented as difference between two coordinate grids flow = coords1 - coords0""" 57 | N, C, H, W = img.shape 58 | coords0 = coords_grid(N, H // 8, W // 8).to(img.device) 59 | coords1 = coords_grid(N, H // 8, W // 8).to(img.device) 60 | 61 | # optical flow computed as difference: flow = coords1 - coords0 62 | return coords0, coords1 63 | 64 | def upsample_flow(self, flow, mask): 65 | """ Upsample flow field [H/8, W/8, 2] -> [H, W, 2] using convex combination """ 66 | N, _, H, W = flow.shape 67 | mask = mask.view(N, 1, 9, 8, 8, H, W) 68 | mask = torch.softmax(mask, dim=2) 69 | 70 | up_flow = F.unfold(8 * flow, [3, 3], padding=1) 71 | up_flow = up_flow.view(N, 2, 9, 1, 1, H, W) 72 | 73 | up_flow = torch.sum(mask * up_flow, dim=2) 74 | up_flow = up_flow.permute(0, 1, 4, 2, 5, 3) 75 | return up_flow.reshape(N, 2, 8 * H, 8 * W) 76 | 77 | def forward(self, image1, image2, iters=12, lidar_mask=None, flow_init=None, upsample=True, test_mode=False): 78 | """ Estimate optical flow between pair of frames """ 79 | 80 | image1 = 2 * image1 - 1.0 81 | 82 | image1 = image1.contiguous() 83 | image2 = image2.contiguous() 84 | 85 | hdim = self.hidden_dim 86 | cdim = self.context_dim 87 | 88 | # run the feature network 89 | with autocast(enabled=self.args.mixed_precision): 90 | fmap2 = self.fnet(image2) 91 | fmap1 = self.fnet_lidar(image1, lidar_mask) 92 | 93 | fmap1 = fmap1.float() 94 | fmap2 = fmap2.float() 95 | 96 | if self.args.alternate_corr: 97 | corr_fn = AlternateCorrBlock(fmap1, fmap2, radius=self.args.corr_radius) 98 | else: 99 | corr_fn = CorrBlock(fmap1, fmap2, radius=self.args.corr_radius) 100 | 101 | # run the context network 102 | with autocast(enabled=self.args.mixed_precision): 103 | cnet = self.cnet(image1, lidar_mask) 104 | net, inp = torch.split(cnet, [hdim, cdim], dim=1) 105 | net = torch.tanh(net) 106 | inp = torch.relu(inp) 107 | 108 | coords0, coords1 = self.initialize_flow(image1) 109 | 110 | if flow_init is not None: 111 | coords1 = coords1 + flow_init 112 | 113 | flow_predictions = [] 114 | for itr in range(iters): 115 | coords1 = coords1.detach() 116 | 117 | corr = corr_fn(coords1) # index correlation volume 118 | 119 | flow = coords1 - coords0 120 | with autocast(enabled=self.args.mixed_precision): 121 | net, up_mask, delta_flow = self.update_block(net, inp, corr, flow) 122 | 123 | # F(t+1) = F(t) + \Delta(t) 124 | coords1 = coords1 + delta_flow 125 | 126 | # upsample predictions 127 | if up_mask is None: 128 | flow_up = upflow8(coords1 - coords0) 129 | else: 130 | flow_up = self.upsample_flow(coords1 - coords0, up_mask) 131 | 132 | flow_predictions.append(flow_up) 133 | 134 | if test_mode: 135 | return coords1 - coords0, flow_up 136 | 137 | return flow_predictions 138 | -------------------------------------------------------------------------------- /core/update.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | 5 | 6 | class FlowHead(nn.Module): 7 | def __init__(self, input_dim=128, hidden_dim=256): 8 | super(FlowHead, self).__init__() 9 | self.conv1 = nn.Conv2d(input_dim, hidden_dim, 3, padding=1) 10 | self.conv2 = nn.Conv2d(hidden_dim, 2, 3, padding=1) 11 | self.relu = nn.ReLU(inplace=True) 12 | 13 | def forward(self, x): 14 | return self.conv2(self.relu(self.conv1(x))) 15 | 16 | class ConvGRU(nn.Module): 17 | def __init__(self, 
hidden_dim=128, input_dim=192+128): 18 | super(ConvGRU, self).__init__() 19 | self.convz = nn.Conv2d(hidden_dim+input_dim, hidden_dim, 3, padding=1) 20 | self.convr = nn.Conv2d(hidden_dim+input_dim, hidden_dim, 3, padding=1) 21 | self.convq = nn.Conv2d(hidden_dim+input_dim, hidden_dim, 3, padding=1) 22 | 23 | def forward(self, h, x): 24 | hx = torch.cat([h, x], dim=1) 25 | 26 | z = torch.sigmoid(self.convz(hx)) 27 | r = torch.sigmoid(self.convr(hx)) 28 | q = torch.tanh(self.convq(torch.cat([r*h, x], dim=1))) 29 | 30 | h = (1-z) * h + z * q 31 | return h 32 | 33 | class SepConvGRU(nn.Module): 34 | def __init__(self, hidden_dim=128, input_dim=192+128): 35 | super(SepConvGRU, self).__init__() 36 | self.convz1 = nn.Conv2d(hidden_dim+input_dim, hidden_dim, (1,5), padding=(0,2)) 37 | self.convr1 = nn.Conv2d(hidden_dim+input_dim, hidden_dim, (1,5), padding=(0,2)) 38 | self.convq1 = nn.Conv2d(hidden_dim+input_dim, hidden_dim, (1,5), padding=(0,2)) 39 | 40 | self.convz2 = nn.Conv2d(hidden_dim+input_dim, hidden_dim, (5,1), padding=(2,0)) 41 | self.convr2 = nn.Conv2d(hidden_dim+input_dim, hidden_dim, (5,1), padding=(2,0)) 42 | self.convq2 = nn.Conv2d(hidden_dim+input_dim, hidden_dim, (5,1), padding=(2,0)) 43 | 44 | 45 | def forward(self, h, x): 46 | # horizontal 47 | hx = torch.cat([h, x], dim=1) 48 | z = torch.sigmoid(self.convz1(hx)) 49 | r = torch.sigmoid(self.convr1(hx)) 50 | q = torch.tanh(self.convq1(torch.cat([r*h, x], dim=1))) 51 | h = (1-z) * h + z * q 52 | 53 | # vertical 54 | hx = torch.cat([h, x], dim=1) 55 | z = torch.sigmoid(self.convz2(hx)) 56 | r = torch.sigmoid(self.convr2(hx)) 57 | q = torch.tanh(self.convq2(torch.cat([r*h, x], dim=1))) 58 | h = (1-z) * h + z * q 59 | 60 | return h 61 | 62 | 63 | class BasicMotionEncoder(nn.Module): 64 | def __init__(self, args): 65 | super(BasicMotionEncoder, self).__init__() 66 | cor_planes = args.corr_levels * (2*args.corr_radius + 1)**2 67 | self.convc1 = nn.Conv2d(cor_planes, 256, 1, padding=0) 68 | self.convc2 = nn.Conv2d(256, 192, 3, padding=1) 69 | self.convf1 = nn.Conv2d(2, 128, 7, padding=3) 70 | self.convf2 = nn.Conv2d(128, 64, 3, padding=1) 71 | self.conv = nn.Conv2d(64+192, 128-2, 3, padding=1) 72 | 73 | def forward(self, flow, corr): 74 | cor = F.relu(self.convc1(corr)) 75 | cor = F.relu(self.convc2(cor)) 76 | flo = F.relu(self.convf1(flow)) 77 | flo = F.relu(self.convf2(flo)) 78 | 79 | cor_flo = torch.cat([cor, flo], dim=1) 80 | out = F.relu(self.conv(cor_flo)) 81 | return torch.cat([out, flow], dim=1) 82 | 83 | class BasicUpdateBlock(nn.Module): 84 | def __init__(self, args, hidden_dim=128, input_dim=128): 85 | super(BasicUpdateBlock, self).__init__() 86 | self.args = args 87 | self.encoder = BasicMotionEncoder(args) 88 | self.gru = SepConvGRU(hidden_dim=hidden_dim, input_dim=128+hidden_dim) 89 | self.flow_head = FlowHead(hidden_dim, hidden_dim=256) 90 | 91 | self.mask = nn.Sequential( 92 | nn.Conv2d(128, 256, 3, padding=1), 93 | nn.ReLU(inplace=True), 94 | nn.Conv2d(256, 64*9, 1, padding=0)) 95 | 96 | def forward(self, net, inp, corr, flow, upsample=True): 97 | motion_features = self.encoder(flow, corr) 98 | inp = torch.cat([inp, motion_features], dim=1) 99 | 100 | net = self.gru(net, inp) 101 | 102 | delta_flow = self.flow_head(net) 103 | 104 | # scale mask to balence gradients 105 | mask = .25 * self.mask(net) 106 | return net, mask, delta_flow 107 | 108 | 109 | 110 | -------------------------------------------------------------------------------- /core/utils.py: 
-------------------------------------------------------------------------------- 1 | import torch.optim as optim 2 | from tensorboardX import SummaryWriter 3 | import torch 4 | import torch.nn.functional as F 5 | import numpy as np 6 | from scipy import interpolate 7 | 8 | 9 | class Logger: 10 | def __init__(self, model, scheduler, SUM_FREQ=100): 11 | self.model = model 12 | self.scheduler = scheduler 13 | self.total_steps = 0 14 | self.running_loss = {} 15 | self.writer = None 16 | self.SUM_FREQ = SUM_FREQ 17 | 18 | def _print_training_status(self): 19 | metrics_data = [self.running_loss[k] / self.SUM_FREQ for k in sorted(self.running_loss.keys())] 20 | training_str = "[{:6d}, {:10.7f}] ".format(self.total_steps + 1, self.scheduler.get_last_lr()[0]) 21 | metrics_str = ("{:10.4f}, " * len(metrics_data)).format(*metrics_data) 22 | 23 | # print the training status 24 | print(training_str + metrics_str) 25 | 26 | if self.writer is None: 27 | self.writer = SummaryWriter() 28 | 29 | for k in self.running_loss: 30 | self.writer.add_scalar(k, self.running_loss[k] / self.SUM_FREQ, self.total_steps) 31 | self.running_loss[k] = 0.0 32 | 33 | def push(self, metrics): 34 | self.total_steps += 1 35 | 36 | for key in metrics: 37 | if key not in self.running_loss: 38 | self.running_loss[key] = 0.0 39 | 40 | self.running_loss[key] += metrics[key] 41 | 42 | if self.total_steps % self.SUM_FREQ == self.SUM_FREQ - 1: 43 | self._print_training_status() 44 | self.running_loss = {} 45 | 46 | def write_dict(self, results): 47 | if self.writer is None: 48 | self.writer = SummaryWriter() 49 | 50 | for key in results: 51 | self.writer.add_scalar(key, results[key], self.total_steps) 52 | 53 | def close(self): 54 | self.writer.close() 55 | 56 | def count_parameters(model): 57 | return sum(p.numel() for p in model.parameters() if p.requires_grad) 58 | 59 | def fetch_optimizer(args, nums, model): 60 | """ Create the optimizer and learning rate scheduler """ 61 | optimizer = optim.AdamW(model.parameters(), lr=args.lr, weight_decay=args.wdecay, eps=args.epsilon) 62 | 63 | # scheduler = optim.lr_scheduler.OneCycleLR(optimizer, args.lr, args.num_steps+100, 64 | # pct_start=0.05, cycle_momentum=False, anneal_strategy='linear') 65 | scheduler = optim.lr_scheduler.OneCycleLR(optimizer, args.lr, args.epochs * nums + 100, 66 | pct_start=0.05, cycle_momentum=False, anneal_strategy='linear') 67 | return optimizer, scheduler 68 | 69 | 70 | def merge_inputs(queries): 71 | point_clouds = [] 72 | imgs = [] 73 | reflectances = [] 74 | returns = {key: default_collate([d[key] for d in queries]) for key in queries[0] 75 | if key != 'point_cloud' and key != 'rgb' and key != 'reflectance'} 76 | for input in queries: 77 | point_clouds.append(input['point_cloud']) 78 | imgs.append(input['rgb']) 79 | if 'reflectance' in input: 80 | reflectances.append(input['reflectance']) 81 | returns['point_cloud'] = point_clouds 82 | returns['rgb'] = imgs 83 | if len(reflectances) > 0: 84 | returns['reflectance'] = reflectances 85 | return returns 86 | 87 | 88 | class InputPadder: 89 | """ Pads images such that dimensions are divisible by 8 """ 90 | 91 | def __init__(self, dims, mode='sintel'): 92 | self.ht, self.wd = dims[-2:] 93 | pad_ht = (((self.ht // 8) + 1) * 8 - self.ht) % 8 94 | pad_wd = (((self.wd // 8) + 1) * 8 - self.wd) % 8 95 | if mode == 'sintel': 96 | self._pad = [pad_wd // 2, pad_wd - pad_wd // 2, pad_ht // 2, pad_ht - pad_ht // 2] 97 | else: 98 | self._pad = [pad_wd // 2, pad_wd - pad_wd // 2, 0, pad_ht] 99 | 100 | def pad(self, *inputs): 
101 | return [F.pad(x, self._pad, mode='replicate') for x in inputs] 102 | 103 | def unpad(self, x): 104 | ht, wd = x.shape[-2:] 105 | c = [self._pad[2], ht - self._pad[3], self._pad[0], wd - self._pad[1]] 106 | return x[..., c[0]:c[1], c[2]:c[3]] 107 | 108 | 109 | def forward_interpolate(flow): 110 | flow = flow.detach().cpu().numpy() 111 | dx, dy = flow[0], flow[1] 112 | 113 | ht, wd = dx.shape 114 | x0, y0 = np.meshgrid(np.arange(wd), np.arange(ht)) 115 | 116 | x1 = x0 + dx 117 | y1 = y0 + dy 118 | 119 | x1 = x1.reshape(-1) 120 | y1 = y1.reshape(-1) 121 | dx = dx.reshape(-1) 122 | dy = dy.reshape(-1) 123 | 124 | valid = (x1 > 0) & (x1 < wd) & (y1 > 0) & (y1 < ht) 125 | x1 = x1[valid] 126 | y1 = y1[valid] 127 | dx = dx[valid] 128 | dy = dy[valid] 129 | 130 | flow_x = interpolate.griddata( 131 | (x1, y1), dx, (x0, y0), method='nearest', fill_value=0) 132 | 133 | flow_y = interpolate.griddata( 134 | (x1, y1), dy, (x0, y0), method='nearest', fill_value=0) 135 | 136 | flow = np.stack([flow_x, flow_y], axis=0) 137 | return torch.from_numpy(flow).float() 138 | 139 | 140 | def bilinear_sampler(img, coords, mode='bilinear', mask=False): 141 | """ Wrapper for grid_sample, uses pixel coordinates """ 142 | H, W = img.shape[-2:] 143 | xgrid, ygrid = coords.split([1, 1], dim=-1) 144 | xgrid = 2 * xgrid / (W - 1) - 1 145 | ygrid = 2 * ygrid / (H - 1) - 1 146 | 147 | grid = torch.cat([xgrid, ygrid], dim=-1) 148 | img = F.grid_sample(img, grid, align_corners=True) 149 | 150 | if mask: 151 | mask = (xgrid > -1) & (ygrid > -1) & (xgrid < 1) & (ygrid < 1) 152 | return img, mask.float() 153 | 154 | return img 155 | 156 | 157 | def coords_grid(batch, ht, wd): 158 | coords = torch.meshgrid(torch.arange(ht), torch.arange(wd)) 159 | coords = torch.stack(coords[::-1], dim=0).float() 160 | return coords[None].repeat(batch, 1, 1, 1) 161 | 162 | 163 | def upflow8(flow, mode='bilinear'): 164 | new_size = (8 * flow.shape[2], 8 * flow.shape[3]) 165 | return 8 * F.interpolate(flow, size=new_size, mode=mode, align_corners=True) -------------------------------------------------------------------------------- /core/utils_point.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------- 2 | # Copyright (C) 2020 Università degli studi di Milano-Bicocca, iralab 3 | # Author: Daniele Cattaneo (d.cattaneo10@campus.unimib.it) 4 | # Released under Creative Commons 5 | # Attribution-NonCommercial-ShareAlike 4.0 International License. 
6 | # http://creativecommons.org/licenses/by-nc-sa/4.0/ 7 | # ------------------------------------------------------------------- 8 | import math 9 | import cv2 10 | from PIL import Image 11 | import mathutils 12 | import numpy as np 13 | import torch 14 | from torchvision import transforms 15 | import torch.nn.functional as F 16 | from matplotlib import cm 17 | from torch.utils.data.dataloader import default_collate 18 | 19 | 20 | def rotate_points(PC, R, T=None, inverse=True): 21 | if T is not None: 22 | R = R.to_matrix() 23 | R.resize_4x4() 24 | T = mathutils.Matrix.Translation(T) 25 | RT = T*R 26 | else: 27 | RT=R.copy() 28 | if inverse: 29 | RT.invert_safe() 30 | RT = torch.tensor(RT, device=PC.device, dtype=torch.float) 31 | 32 | if PC.shape[0] == 4: 33 | PC = torch.mm(RT, PC) 34 | elif PC.shape[1] == 4: 35 | PC = torch.mm(RT, PC.t()) 36 | PC = PC.t() 37 | else: 38 | raise TypeError("Point cloud must have shape [Nx4] or [4xN] (homogeneous coordinates)") 39 | return PC 40 | 41 | 42 | def rotate_points_torch(PC, R, T=None, inverse=True): 43 | if T is not None: 44 | R = quat2mat(R) 45 | T = tvector2mat(T) 46 | RT = torch.mm(T, R) 47 | else: 48 | RT = R.clone() 49 | if inverse: 50 | RT = RT.inverse() 51 | if PC.shape[0] == 4: 52 | PC = torch.mm(RT, PC) 53 | elif PC.shape[1] == 4: 54 | PC = torch.mm(RT, PC.t()) 55 | PC = PC.t() 56 | else: 57 | raise TypeError("Point cloud must have shape [Nx4] or [4xN] (homogeneous coordinates)") 58 | return PC 59 | 60 | 61 | def rotate_forward(PC, R, T=None): 62 | """ 63 | Transform the point cloud PC, so to have the points 'as seen from' the new 64 | pose T*R 65 | Args: 66 | PC (torch.Tensor): Point Cloud to be transformed, shape [4xN] or [Nx4] 67 | R (torch.Tensor/mathutils.Euler): can be either: 68 | * (mathutils.Euler) euler angles of the rotation part, in this case T cannot be None 69 | * (torch.Tensor shape [4]) quaternion representation of the rotation part, in this case T cannot be None 70 | * (mathutils.Matrix shape [4x4]) Rotation matrix, 71 | in this case it should contains the translation part, and T should be None 72 | * (torch.Tensor shape [4x4]) Rotation matrix, 73 | in this case it should contains the translation part, and T should be None 74 | T (torch.Tensor/mathutils.Vector): Translation of the new pose, shape [3], or None (depending on R) 75 | 76 | Returns: 77 | torch.Tensor: Transformed Point Cloud 'as seen from' pose T*R 78 | """ 79 | if isinstance(R, torch.Tensor): 80 | return rotate_points_torch(PC, R, T, inverse=True) 81 | else: 82 | return rotate_points(PC, R, T, inverse=True) 83 | 84 | 85 | def rotate_back(PC_ROTATED, R, T=None): 86 | """ 87 | Inverse of :func:`~utils.rotate_forward`. 88 | """ 89 | if isinstance(R, torch.Tensor): 90 | return rotate_points_torch(PC_ROTATED, R, T, inverse=False) 91 | else: 92 | return rotate_points(PC_ROTATED, R, T, inverse=False) 93 | 94 | 95 | def invert_pose(R, T): 96 | """ 97 | Given the 'sampled pose' (aka H_init), we want CMRNet to predict inv(H_init). 98 | inv(T*R) will be used as ground truth for the network. 
99 | Args: 100 | R (mathutils.Euler): Rotation of 'sampled pose' 101 | T (mathutils.Vector): Translation of 'sampled pose' 102 | 103 | Returns: 104 | (R_GT, T_GT) = (mathutils.Quaternion, mathutils.Vector) 105 | """ 106 | R = R.to_matrix() 107 | R.resize_4x4() 108 | T = mathutils.Matrix.Translation(T) 109 | RT = T * R 110 | RT.invert_safe() 111 | T_GT, R_GT, _ = RT.decompose() 112 | return R_GT.normalized(), T_GT 113 | 114 | 115 | def merge_inputs(queries): 116 | point_clouds = [] 117 | imgs = [] 118 | reflectances = [] 119 | returns = {key: default_collate([d[key] for d in queries]) for key in queries[0] 120 | if key != 'point_cloud' and key != 'rgb' and key != 'reflectance'} 121 | for input in queries: 122 | point_clouds.append(input['point_cloud']) 123 | imgs.append(input['rgb']) 124 | if 'reflectance' in input: 125 | reflectances.append(input['reflectance']) 126 | returns['point_cloud'] = point_clouds 127 | returns['rgb'] = imgs 128 | if len(reflectances) > 0: 129 | returns['reflectance'] = reflectances 130 | return returns 131 | 132 | 133 | def quaternion_from_matrix(matrix): 134 | """ 135 | Convert a rotation matrix to quaternion. 136 | Args: 137 | matrix (torch.Tensor): [4x4] transformation matrix or [3,3] rotation matrix. 138 | 139 | Returns: 140 | torch.Tensor: shape [4], normalized quaternion 141 | """ 142 | if matrix.shape == (4, 4): 143 | R = matrix[:-1, :-1] 144 | elif matrix.shape == (3, 3): 145 | R = matrix 146 | else: 147 | raise TypeError("Not a valid rotation matrix") 148 | tr = R[0, 0] + R[1, 1] + R[2, 2] 149 | q = torch.zeros(4, device=matrix.device) 150 | if tr > 0.: 151 | S = (tr+1.0).sqrt() * 2 152 | q[0] = 0.25 * S 153 | q[1] = (R[2, 1] - R[1, 2]) / S 154 | q[2] = (R[0, 2] - R[2, 0]) / S 155 | q[3] = (R[1, 0] - R[0, 1]) / S 156 | elif R[0, 0] > R[1, 1] and R[0, 0] > R[2, 2]: 157 | S = (1.0 + R[0, 0] - R[1, 1] - R[2, 2]).sqrt() * 2 158 | q[0] = (R[2, 1] - R[1, 2]) / S 159 | q[1] = 0.25 * S 160 | q[2] = (R[0, 1] + R[1, 0]) / S 161 | q[3] = (R[0, 2] + R[2, 0]) / S 162 | elif R[1, 1] > R[2, 2]: 163 | S = (1.0 + R[1, 1] - R[0, 0] - R[2, 2]).sqrt() * 2 164 | q[0] = (R[0, 2] - R[2, 0]) / S 165 | q[1] = (R[0, 1] + R[1, 0]) / S 166 | q[2] = 0.25 * S 167 | q[3] = (R[1, 2] + R[2, 1]) / S 168 | else: 169 | S = (1.0 + R[2, 2] - R[0, 0] - R[1, 1]).sqrt() * 2 170 | q[0] = (R[1, 0] - R[0, 1]) / S 171 | q[1] = (R[0, 2] + R[2, 0]) / S 172 | q[2] = (R[1, 2] + R[2, 1]) / S 173 | q[3] = 0.25 * S 174 | return q / q.norm() 175 | 176 | 177 | def quatmultiply(q, r): 178 | """ 179 | Multiply two quaternions 180 | Args: 181 | q (torch.Tensor/nd.ndarray): shape=[4], first quaternion 182 | r (torch.Tensor/nd.ndarray): shape=[4], second quaternion 183 | 184 | Returns: 185 | torch.Tensor: shape=[4], normalized quaternion q*r 186 | """ 187 | t = torch.zeros(4, device=q.device) 188 | t[0] = r[0] * q[0] - r[1] * q[1] - r[2] * q[2] - r[3] * q[3] 189 | t[1] = r[0] * q[1] + r[1] * q[0] - r[2] * q[3] + r[3] * q[2] 190 | t[2] = r[0] * q[2] + r[1] * q[3] + r[2] * q[0] - r[3] * q[1] 191 | t[3] = r[0] * q[3] - r[1] * q[2] + r[2] * q[1] + r[3] * q[0] 192 | return t / t.norm() 193 | 194 | 195 | def quat2mat(q): 196 | """ 197 | Convert a quaternion to a rotation matrix 198 | Args: 199 | q (torch.Tensor): shape [4], input quaternion 200 | 201 | Returns: 202 | torch.Tensor: [4x4] homogeneous rotation matrix 203 | """ 204 | assert q.shape == torch.Size([4]), "Not a valid quaternion" 205 | if q.norm() != 1.: 206 | q = q / q.norm() 207 | mat = torch.zeros((4, 4), device=q.device) 208 | mat[0, 0] = 1 - 2*q[2]**2 - 
2*q[3]**2 209 | mat[0, 1] = 2*q[1]*q[2] - 2*q[3]*q[0] 210 | mat[0, 2] = 2*q[1]*q[3] + 2*q[2]*q[0] 211 | mat[1, 0] = 2*q[1]*q[2] + 2*q[3]*q[0] 212 | mat[1, 1] = 1 - 2*q[1]**2 - 2*q[3]**2 213 | mat[1, 2] = 2*q[2]*q[3] - 2*q[1]*q[0] 214 | mat[2, 0] = 2*q[1]*q[3] - 2*q[2]*q[0] 215 | mat[2, 1] = 2*q[2]*q[3] + 2*q[1]*q[0] 216 | mat[2, 2] = 1 - 2*q[1]**2 - 2*q[2]**2 217 | mat[3, 3] = 1. 218 | return mat 219 | 220 | 221 | def tvector2mat(t): 222 | """ 223 | Translation vector to homogeneous transformation matrix with identity rotation 224 | Args: 225 | t (torch.Tensor): shape=[3], translation vector 226 | 227 | Returns: 228 | torch.Tensor: [4x4] homogeneous transformation matrix 229 | 230 | """ 231 | assert t.shape == torch.Size([3]), "Not a valid translation" 232 | mat = torch.eye(4, device=t.device) 233 | mat[0, 3] = t[0] 234 | mat[1, 3] = t[1] 235 | mat[2, 3] = t[2] 236 | return mat 237 | 238 | 239 | def mat2xyzrpy(rotmatrix): 240 | """ 241 | Decompose transformation matrix into components 242 | Args: 243 | rotmatrix (torch.Tensor/np.ndarray): [4x4] transformation matrix 244 | 245 | Returns: 246 | torch.Tensor: shape=[6], contains xyzrpy 247 | """ 248 | roll = math.atan2(-rotmatrix[1, 2], rotmatrix[2, 2]) 249 | pitch = math.asin ( rotmatrix[0, 2]) 250 | yaw = math.atan2(-rotmatrix[0, 1], rotmatrix[0, 0]) 251 | x = rotmatrix[:3, 3][0] 252 | y = rotmatrix[:3, 3][1] 253 | z = rotmatrix[:3, 3][2] 254 | 255 | return torch.tensor([x, y, z, roll, pitch, yaw], device=rotmatrix.device, dtype=rotmatrix.dtype) 256 | 257 | 258 | def to_rotation_matrix(R, T): 259 | R = quat2mat(R) 260 | T = tvector2mat(T) 261 | RT = torch.mm(T, R) 262 | return RT 263 | 264 | 265 | def rotation_vector_to_euler(rvecs): 266 | R, _ = cv2.Rodrigues(rvecs) 267 | 268 | sy = np.sqrt(R[0, 0] * R[0, 0] + R[1, 0] * R[1, 0]) 269 | 270 | singular = sy < 1e-6 271 | 272 | if not singular: 273 | x = np.arctan2(R[2, 1], R[2, 2]) 274 | y = np.arctan2(-R[2, 0], sy) 275 | z = np.arctan2(R[1, 0], R[0, 0]) 276 | else: 277 | x = np.arctan2(-R[1, 2], R[1, 1]) 278 | y = np.arctan2(-R[2, 0], sy) 279 | z = 0 280 | 281 | return np.array([x, y, z]) 282 | 283 | 284 | def overlay_imgs(rgb, lidar): 285 | std = [0.229, 0.224, 0.225] 286 | mean = [0.485, 0.456, 0.406] 287 | 288 | rgb = rgb.clone().cpu().permute(1,2,0).numpy() 289 | rgb = rgb*std+mean 290 | 291 | # rgb = cv2.resize(rgb, (120, 40)) 292 | # lidar = lidar.cpu().numpy() 293 | # lidar = cv2.resize(lidar, (120, 40)) 294 | 295 | lidar[lidar == 0] = 1000. 296 | lidar = -lidar 297 | 298 | lidar = lidar.clone() 299 | lidar = lidar.unsqueeze(0) 300 | lidar = lidar.unsqueeze(0) 301 | 302 | lidar = F.max_pool2d(lidar, 3, 1, 1) 303 | lidar = -lidar 304 | lidar[lidar == 1000.] = 0. 305 | 306 | lidar = lidar[0][0] 307 | 308 | 309 | lidar = lidar.cpu().numpy() 310 | min_d = 0 311 | max_d = np.max(lidar) 312 | lidar = ((lidar - min_d) / (max_d - min_d)) * 255 313 | lidar = lidar.astype(np.uint8) 314 | lidar_color = cm.jet(lidar) 315 | lidar_color[:, :, 3] = 0.5 316 | lidar_color[lidar == 0] = [0, 0, 0, 0] 317 | blended_img = lidar_color[:, :, :3] * (np.expand_dims(lidar_color[:, :, 3], 2)) \ 318 | + rgb * (1. - np.expand_dims(lidar_color[:, :, 3], 2)) 319 | blended_img = blended_img.clip(min=0., max=1.) 
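The pose helpers above (`invert_pose`, `quat2mat`, `tvector2mat`, `to_rotation_matrix`, `mat2xyzrpy`) are the conversions the demo and training scripts lean on. A minimal round-trip sketch with arbitrary perturbation values, following the same `sys.path.append('core')` import convention used by main.py and demo.py:
```python
import sys
sys.path.append('core')                         # same convention as main.py / demo.py
import torch
import mathutils
from utils_point import invert_pose, to_rotation_matrix, mat2xyzrpy

# sample a small perturbation, exactly as demo.py does (values here are arbitrary)
R_err = mathutils.Euler((0.05, -0.02, 0.10), 'XYZ')
T_err = mathutils.Vector((0.5, -0.3, 0.2))

R_q, T_v = invert_pose(R_err, T_err)            # quaternion + translation of inv(T*R)
R_q, T_v = torch.tensor(R_q), torch.tensor(T_v)

RT = to_rotation_matrix(R_q, T_v)               # [4x4] homogeneous matrix built from T and R
print(mat2xyzrpy(RT))                           # back to [x, y, z, roll, pitch, yaw]
```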
320 | 321 | blended_img = cv2.cvtColor((blended_img*255).astype(np.uint8), cv2.COLOR_BGR2RGB) 322 | # cv2.imwrite(f'./images/output/{idx:06d}_{name}.png', blended_img) 323 | return blended_img 324 | 325 | -------------------------------------------------------------------------------- /core/visibility_package/setup.py: -------------------------------------------------------------------------------- 1 | # ------------------------------------------------------------------- 2 | # Copyright (C) 2020 Università degli studi di Milano-Bicocca, iralab 3 | # Author: Daniele Cattaneo (d.cattaneo10@campus.unimib.it) 4 | # Released under Creative Commons 5 | # Attribution-NonCommercial-ShareAlike 4.0 International License. 6 | # http://creativecommons.org/licenses/by-nc-sa/4.0/ 7 | # ------------------------------------------------------------------- 8 | from setuptools import setup 9 | from torch.utils.cpp_extension import BuildExtension, CUDAExtension 10 | 11 | setup( 12 | name='visibility', 13 | version='0.1', 14 | author="Daniele Cattaneo", 15 | author_email="cattaneo@informatik.uni-freiburg.de", 16 | url="https://github.com/catta202000/CMRNet", 17 | ext_modules=[ 18 | CUDAExtension('visibility', [ 19 | './visibility_new.cpp', 20 | './visibility_kernel_new.cu', 21 | ]) 22 | ], 23 | cmdclass={ 24 | 'build_ext': BuildExtension 25 | }) -------------------------------------------------------------------------------- /core/visibility_package/visibility.cpp: -------------------------------------------------------------------------------- 1 | // ------------------------------------------------------------------- 2 | // Copyright (C) 2020 Università degli studi di Milano-Bicocca, iralab 3 | // Author: Daniele Cattaneo (d.cattaneo10@campus.unimib.it) 4 | // Released under Creative Commons 5 | // Attribution-NonCommercial-ShareAlike 4.0 International License. 
6 | // http://creativecommons.org/licenses/by-nc-sa/4.0/ 7 | // ------------------------------------------------------------------- 8 | #include 9 | #include 10 | 11 | #include 12 | #include 13 | 14 | #include 15 | 16 | at::Tensor depth_image_cuda(at::Tensor input_uv, at::Tensor input_depth, at::Tensor output, unsigned int size, unsigned int width,unsigned int height); 17 | at::Tensor visibility_filter_cuda(at::Tensor input, at::Tensor output, unsigned int width, unsigned int height, unsigned int threshold); 18 | at::Tensor visibility_filter_cuda2(at::Tensor input_depth, at::Tensor intrinsic, at::Tensor output, unsigned int width,unsigned int height, float threshold, unsigned int radius); 19 | at::Tensor downsample_flow_cuda(at::Tensor input_uv, at::Tensor output, unsigned int width_out ,unsigned int height_out, unsigned int kernel); 20 | at::Tensor downsample_mask_cuda(at::Tensor input_mask, at::Tensor output, unsigned int width_out ,unsigned int height_out, unsigned int kernel); 21 | //at::Tensor visibility_filter_cuda_shared(at::Tensor input_depth, at::Tensor input_points, at::Tensor output, unsigned int width,unsigned int height, float threshold); 22 | 23 | // C++ interface 24 | 25 | #define CHECK_CUDA(x) AT_ASSERTM(x.type().is_cuda(), #x " must be a CUDA tensor") 26 | #define CHECK_CONTIGUOUS(x) AT_ASSERTM(x.is_contiguous(), #x " must be contiguous") 27 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x) 28 | 29 | at::Tensor depth_image(at::Tensor input_uv, at::Tensor input_depth, at::Tensor output, unsigned int size, unsigned int width,unsigned int height) { 30 | CHECK_INPUT(input_uv); 31 | CHECK_INPUT(input_depth); 32 | CHECK_INPUT(output); 33 | return depth_image_cuda(input_uv, input_depth, output, size, width, height); 34 | } 35 | 36 | at::Tensor visibility_filter(at::Tensor input, at::Tensor output, unsigned int width, unsigned int height, unsigned int threshold){ 37 | CHECK_INPUT(input); 38 | CHECK_INPUT(output); 39 | return visibility_filter_cuda(input, output, width, height, threshold); 40 | } 41 | 42 | at::Tensor visibility_filter2(at::Tensor input_depth, at::Tensor intrinsic, at::Tensor output, unsigned int width, unsigned int height, float threshold, unsigned int radius){ 43 | CHECK_INPUT(input_depth); 44 | CHECK_INPUT(intrinsic); 45 | CHECK_INPUT(output); 46 | return visibility_filter_cuda2(input_depth, intrinsic, output, width, height, threshold, radius); 47 | } 48 | 49 | at::Tensor downsample_flow(at::Tensor input_uv, at::Tensor output, unsigned int width_out ,unsigned int height_out, unsigned int kernel) { 50 | CHECK_INPUT(input_uv); 51 | CHECK_INPUT(output); 52 | return downsample_flow_cuda(input_uv, output, width_out, height_out, kernel); 53 | } 54 | 55 | at::Tensor downsample_mask(at::Tensor input_mask, at::Tensor output, unsigned int width_out ,unsigned int height_out, unsigned int kernel) { 56 | CHECK_INPUT(input_mask); 57 | CHECK_INPUT(output); 58 | return downsample_mask_cuda(input_mask, output, width_out, height_out, kernel); 59 | } 60 | 61 | /*at::Tensor visibility_filter_shared(at::Tensor input_depth, at::Tensor input_points, at::Tensor output, unsigned int width, unsigned int height, float threshold){ 62 | CHECK_INPUT(input_depth); 63 | CHECK_INPUT(input_points); 64 | CHECK_INPUT(output); 65 | return visibility_filter_cuda_shared(input_depth, input_points, output, width, height, threshold); 66 | }*/ 67 | 68 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { 69 | m.def("depth_image", &depth_image, "Generate Depth Image(CUDA)"); 70 | m.def("visibility", 
&visibility_filter, "visibility_filter (CUDA)"); 71 | m.def("visibility2", &visibility_filter2, "visibility_filter2 (CUDA)"); 72 | m.def("downsample_flow", &downsample_flow, "downsample_flow (CUDA)"); 73 | m.def("downsample_mask", &downsample_mask, "downsample_mask (CUDA)"); 74 | //m.def("visibility_shared", &visibility_filter2, "visibility_filter_shared (CUDA)"); 75 | } 76 | -------------------------------------------------------------------------------- /core/visibility_package/visibility_kernel.cu: -------------------------------------------------------------------------------- 1 | // ------------------------------------------------------------------- 2 | // Copyright (C) 2020 Università degli studi di Milano-Bicocca, iralab 3 | // Author: Daniele Cattaneo (d.cattaneo10@campus.unimib.it) 4 | // Released under Creative Commons 5 | // Attribution-NonCommercial-ShareAlike 4.0 International License. 6 | // http://creativecommons.org/licenses/by-nc-sa/4.0/ 7 | // ------------------------------------------------------------------- 8 | #include 9 | #include 10 | #include 11 | 12 | #include 13 | #include 14 | 15 | #define IMAGE_SIZE 10000 16 | #define TILE_W 25 //Tile Width 17 | #define TILE_H 25 //Tile Height 18 | #define R 3 //Filter Radius 19 | #define D (2*R+1) //Filter Diameter 20 | #define BLOCK_W (TILE_W+(2*R)) 21 | #define BLOCK_H (TILE_H+(2*R)) 22 | 23 | 24 | __device__ __forceinline__ int floatToOrderedInt( float floatVal ) { 25 | int intVal = __float_as_int( floatVal ); 26 | return (intVal >= 0 ) ? intVal : intVal ^ 0x7FFFFFFF; 27 | } 28 | 29 | __device__ __forceinline__ float orderedIntToFloat( int intVal ) { 30 | return __int_as_float( (intVal >= 0) ? intVal : intVal ^ 0x7FFFFFFF); 31 | } 32 | 33 | __global__ void depth_image_kernel( 34 | const int* __restrict__ in_uv, 35 | const float* __restrict__ in_depth, 36 | float* __restrict__ out, 37 | unsigned int size, unsigned int width, unsigned int height) { 38 | 39 | int index = threadIdx.x + blockIdx.x * blockDim.x; 40 | if(index >= 0 && index < size) { 41 | int uv_index = in_uv[2*index+1]*width + in_uv[2*index]; 42 | atomicMin((int *)&out[uv_index], floatToOrderedInt(in_depth[index])); 43 | } 44 | 45 | } 46 | 47 | template 48 | __global__ void visibility_kernel( 49 | const scalar_t* __restrict__ in, 50 | scalar_t* __restrict__ out, 51 | unsigned int width, unsigned int height, unsigned int threshold) { 52 | //__shared__ scalar_t sharedImg[BLOCK_W*BLOCK_H]; 53 | //int x = blockIdx.x*TILE_W + threadIdx.x - R; 54 | //int y = blockIdx.y*TILE_H + threadIdx.y - R; 55 | 56 | int x = threadIdx.x + blockIdx.x * blockDim.x; 57 | int y = threadIdx.y + blockIdx.y * blockDim.y; 58 | 59 | // clamp to edge of image 60 | x = max(0, x); 61 | x = min(x, height-1); 62 | y = max(y, 0); 63 | y = min(y, width-1); 64 | //unsigned int index = y*width + x; 65 | unsigned int index = x*width + y; 66 | //unsigned int bindex = threadIdx.y*blockDim.y+threadIdx.x; 67 | 68 | // each thread copy its pixel to the shared memory 69 | //sharedImg[bindex] = in[index]; 70 | //__syncthreads(); 71 | 72 | //index = 73 | 74 | //out[index] = 7.0; 75 | //return; 76 | 77 | // only threads inside the apron write the results 78 | //if ((threadIdx.x >= R) && (threadIdx.x < (BLOCK_W-R)) && 79 | // (threadIdx.y >= R) && (threadIdx.y < (BLOCK_H-R))) { 80 | out[index] = in[index]; 81 | if(x >= R && y >= R && x < (height-R) && y < (width-R)) { 82 | scalar_t pixel = in[index]; 83 | if (pixel != 0.) 
{ 84 | int sum = 0; 85 | int count = 0; 86 | for(int i=-R; i<=R; i++) { 87 | for(int j=-R; j<=R; j++) { 88 | if(i==0 && j==0) 89 | continue; 90 | int temp_index = index + i*width + j; 91 | scalar_t temp_pixel = in[temp_index]; 92 | if(temp_pixel != 0 ) { 93 | count += 1; 94 | if(temp_pixel < pixel - 3.) 95 | sum += 1; 96 | } 97 | } 98 | } 99 | if(sum >= 1+threshold * count / (R*R*2*2)) 100 | out[index] = 0.; 101 | } 102 | 103 | } 104 | } 105 | 106 | 107 | template 108 | __device__ __forceinline__ scalar_t norm(scalar_t x, scalar_t y, scalar_t z) { 109 | scalar_t sum = x*x + y*y + z*z; 110 | return sum; 111 | } 112 | 113 | 114 | template 115 | __global__ void visibility_kernel2( 116 | const scalar_t* __restrict__ in_depth, 117 | const scalar_t* __restrict__ intrinsic, 118 | scalar_t* __restrict__ out, 119 | unsigned int width, unsigned int height, float threshold, unsigned int radius) { 120 | 121 | int x = threadIdx.x + blockIdx.x * blockDim.x; 122 | int y = threadIdx.y + blockIdx.y * blockDim.y; 123 | 124 | // clamp to edge of image 125 | x = max(0, x); 126 | x = min(x, height-1); 127 | y = max(y, 0); 128 | y = min(y, width-1); 129 | 130 | //unsigned int index = y*width + x; 131 | unsigned int index = x*width + y; 132 | out[index] = in_depth[index]; 133 | 134 | bool debug=false; 135 | 136 | if(x >= 0 && y >= 0 && x < (height) && y < (width)) { 137 | 138 | scalar_t pixel = in_depth[index]; 139 | if (pixel != 0.) { 140 | scalar_t fx, fy, cx, cy; 141 | fx = intrinsic[0]; 142 | fy = intrinsic[1]; 143 | cx = intrinsic[2]; 144 | cy = intrinsic[3]; 145 | 146 | scalar_t v_x, v_y, v_z, v2_x, v2_y, v2_z; 147 | v_x = (y - cx) * pixel / fx; 148 | v_y = (x - cy) * pixel / fy; 149 | v_z = pixel; 150 | 151 | if(debug) 152 | printf("V: %f %f %f \n", v_x, v_y, v_z); 153 | 154 | scalar_t v_norm = norm(-v_x, -v_y, -v_z); 155 | if(debug) 156 | printf("V_norm: %f \n", v_norm); 157 | 158 | if(v_norm <= 0.) 159 | return; 160 | v_norm = sqrt(v_norm); 161 | v2_x = -v_x / v_norm; 162 | v2_y = -v_y / v_norm; 163 | v2_z = -v_z / v_norm; 164 | if(debug) { 165 | printf("-V_normalized %f, %f, %f \n", v2_x, v2_y, v2_z); 166 | printf("SubMatrix1: \n"); 167 | } 168 | 169 | scalar_t max_dot1, max_dot2, max_dot3, max_dot4; 170 | max_dot1 = -1.; 171 | max_dot2 = -1.; 172 | max_dot3 = -1.; 173 | max_dot4 = -1.; 174 | 175 | for(int i=-radius; i<=0; i++) { 176 | for(int j=-radius; j<=0; j++) { 177 | if(x+i <0 || x+i >= height || y+j < 0 || y+j >= width) 178 | break; 179 | int temp_index = index + i*width + j; 180 | scalar_t temp_pixel = in_depth[temp_index]; 181 | if (temp_pixel==0.) 182 | continue; 183 | 184 | scalar_t c_x, c_y, c_z; 185 | c_x = (y+j - cx) * temp_pixel / fx; 186 | c_y = (x+i - cy) * temp_pixel / fy; 187 | c_z = temp_pixel; 188 | 189 | c_x = c_x - v_x; 190 | c_y = c_y - v_y; 191 | c_z = c_z - v_z; 192 | scalar_t c_norm = norm(c_x, c_y, c_z); 193 | if(c_norm <= 0.) 
194 | continue; 195 | c_norm = sqrt(c_norm); 196 | c_x = c_x / c_norm; 197 | c_y = c_y / c_norm; 198 | c_z = c_z / c_norm; 199 | scalar_t dot_prod = c_x*v2_x + c_y*v2_y + c_z*v2_z; 200 | if(debug) { 201 | printf("%d, %d : %f %f %f \n", i, j, c_x, c_y, c_z); 202 | printf("DotProd: %f \n", dot_prod); 203 | } 204 | if(dot_prod > max_dot1) { 205 | max_dot1 = dot_prod; 206 | } 207 | 208 | } 209 | } 210 | 211 | if(debug) 212 | printf("SubMatrix2: \n"); 213 | for(int i=0; i<=radius; i++) { 214 | for(int j=-radius; j<=0; j++) { 215 | if(x+i <0 || x+i >= height || y+j < 0 || y+j >= width) 216 | break; 217 | int temp_index = index + i*width + j; 218 | scalar_t temp_pixel = in_depth[temp_index]; 219 | if (temp_pixel==0.) 220 | continue; 221 | 222 | scalar_t c_x, c_y, c_z; 223 | c_x = (y+j - cx) * temp_pixel / fx; 224 | c_y = (x+i - cy) * temp_pixel / fy; 225 | c_z = temp_pixel; 226 | c_x = c_x - v_x; 227 | c_y = c_y - v_y; 228 | c_z = c_z - v_z; 229 | scalar_t c_norm = norm(c_x, c_y, c_z); 230 | if(c_norm <= 0.) 231 | continue; 232 | c_norm = sqrt(c_norm); 233 | c_x = c_x / c_norm; 234 | c_y = c_y / c_norm; 235 | c_z = c_z / c_norm; 236 | scalar_t dot_prod = c_x*v2_x + c_y*v2_y + c_z*v2_z; 237 | if(debug) { 238 | printf("%d, %d : %f %f %f \n", i, j, c_x, c_y, c_z); 239 | printf("DotProd: %f \n", dot_prod); 240 | } 241 | if(dot_prod > max_dot2) { 242 | max_dot2 = dot_prod; 243 | } 244 | 245 | } 246 | } 247 | 248 | if(debug) 249 | printf("SubMatrix3: \n"); 250 | for(int i=-radius; i<=0; i++) { 251 | for(int j=0; j<=radius; j++) { 252 | if(x+i <0 || x+i >= height || y+j < 0 || y+j >= width) 253 | break; 254 | int temp_index = index + i*width + j; 255 | scalar_t temp_pixel = in_depth[temp_index]; 256 | if (temp_pixel==0.) 257 | continue; 258 | 259 | scalar_t c_x, c_y, c_z; 260 | c_x = (y+j - cx) * temp_pixel / fx; 261 | c_y = (x+i - cy) * temp_pixel / fy; 262 | c_z = temp_pixel; 263 | c_x = c_x - v_x; 264 | c_y = c_y - v_y; 265 | c_z = c_z - v_z; 266 | scalar_t c_norm = norm(c_x, c_y, c_z); 267 | if(c_norm <= 0.) 268 | continue; 269 | c_norm = sqrt(c_norm); 270 | c_x = c_x / c_norm; 271 | c_y = c_y / c_norm; 272 | c_z = c_z / c_norm; 273 | scalar_t dot_prod = c_x*v2_x + c_y*v2_y + c_z*v2_z; 274 | if(debug) { 275 | printf("%d, %d : %f %f %f \n", i, j, c_x, c_y, c_z); 276 | printf("DotProd: %f \n", dot_prod); 277 | } 278 | if(dot_prod > max_dot3) { 279 | max_dot3 = dot_prod; 280 | } 281 | 282 | } 283 | } 284 | 285 | if(debug) 286 | printf("SubMatrix4: \n"); 287 | for(int i=0; i<=radius; i++) { 288 | for(int j=0; j<=radius; j++) { 289 | if(x+i <0 || x+i >= height || y+j < 0 || y+j >= width) 290 | break; 291 | int temp_index = index + i*width + j; 292 | scalar_t temp_pixel = in_depth[temp_index]; 293 | if(temp_pixel==0.) 294 | continue; 295 | 296 | scalar_t c_x, c_y, c_z; 297 | c_x = (y+j - cx) * temp_pixel / fx; 298 | c_y = (x+i - cy) * temp_pixel / fy; 299 | c_z = temp_pixel; 300 | c_x = c_x - v_x; 301 | c_y = c_y - v_y; 302 | c_z = c_z - v_z; 303 | scalar_t c_norm = norm(c_x, c_y, c_z); 304 | if(c_norm <= 0.) 
305 | continue; 306 | c_norm = sqrt(c_norm); 307 | c_x = c_x / c_norm; 308 | c_y = c_y / c_norm; 309 | c_z = c_z / c_norm; 310 | scalar_t dot_prod = c_x*v2_x + c_y*v2_y + c_z*v2_z; 311 | if(debug) { 312 | printf("%d, %d : %f %f %f \n", i, j, c_x, c_y, c_z); 313 | printf("DotProd: %f \n", dot_prod); 314 | } 315 | if(dot_prod > max_dot4) { 316 | max_dot4 = dot_prod; 317 | } 318 | 319 | } 320 | } 321 | 322 | if(max_dot1 + max_dot2 + max_dot3 + max_dot4 >= threshold) { 323 | out[index] = 0.; 324 | } 325 | } 326 | 327 | } 328 | } 329 | 330 | 331 | template 332 | __global__ void downsample_flow_kernel( 333 | const scalar_t* __restrict__ in, 334 | scalar_t* __restrict__ out, 335 | unsigned int width_out, unsigned int height_out, unsigned int kernel) { 336 | 337 | int x = threadIdx.x + blockIdx.x * blockDim.x; 338 | int y = threadIdx.y + blockIdx.y * blockDim.y; 339 | 340 | // clamp to edge of image 341 | // x = max(0, x); 342 | //x = min(x, height_out-1); 343 | //y = max(y, 0); 344 | //y = min(y, width_out-1); 345 | 346 | scalar_t center = ((scalar_t)kernel-1.0)/2.0; 347 | 348 | unsigned int in_index = (kernel*x) * (kernel*width_out) + (kernel*y); 349 | unsigned int out_index = x*width_out + y; 350 | 351 | if (x >= 0 && x < height_out && y >= 0 && y < width_out) { 352 | //printf("out_index: %d, %d - in_pixel: %d, %d\n", x, y, x*kernel, y*kernel); 353 | scalar_t mean_u=0; 354 | scalar_t mean_v=0; 355 | scalar_t weights=0.0; 356 | unsigned int count = 0; 357 | for(int i=0; i= 0 && x < height_out && y >= 0 && y < width_out) { 403 | //printf("out_index: %d, %d - in_pixel: %d, %d\n", x, y, x*kernel, y*kernel); 404 | bool active = false; 405 | for(int i=0; i>>(input_uv.data(), input_depth.data(), output.data(), size, width, height); 432 | //visibility_kernel<<>>(input.data(), output.data(), width, height, threshold); 433 | return output; 434 | } 435 | 436 | 437 | at::Tensor visibility_filter_cuda(at::Tensor input, at::Tensor output, unsigned int width,unsigned int height, unsigned int threshold) { 438 | //dim3 threads(BLOCK_W, BLOCK_H); 439 | //dim3 blocks((height)/TILE_W, (width)/TILE_H); 440 | dim3 threads(32, 32); 441 | dim3 blocks(height/32, width/32); 442 | AT_DISPATCH_FLOATING_TYPES(input.type(), "visibility_filter", ([&] { 443 | visibility_kernel<<>>(input.data(), output.data(), width, height, threshold);})); 444 | //visibility_kernel<<>>(input.data(), output.data(), width, height, threshold); 445 | return output; 446 | } 447 | 448 | at::Tensor visibility_filter_cuda2(at::Tensor input_depth, at::Tensor intrinsic, at::Tensor output, unsigned int width,unsigned int height, float threshold, unsigned int radius) { 449 | //dim3 threads(BLOCK_W, BLOCK_H); 450 | //dim3 blocks((height)/TILE_W, (width)/TILE_H); 451 | dim3 threads(32, 32); 452 | dim3 blocks(height/32+1, width/32+1); 453 | AT_DISPATCH_FLOATING_TYPES(input_depth.type(), "visibility_filter2", ([&] { 454 | visibility_kernel2<<>>(input_depth.data(), intrinsic.data(), 455 | output.data(), width, height, threshold, radius);})); 456 | //visibility_kernel<<>>(input.data(), output.data(), width, height, threshold); 457 | return output; 458 | } 459 | 460 | at::Tensor downsample_flow_cuda(at::Tensor input_uv, at::Tensor output, unsigned int width_out ,unsigned int height_out, unsigned int kernel) { 461 | //dim3 threads(BLOCK_W, BLOCK_H); 462 | //dim3 blocks((height)/TILE_W, (width)/TILE_H); 463 | dim3 threads(32, 32); 464 | dim3 blocks(height_out/32+1, width_out/32+1); 465 | AT_DISPATCH_FLOATING_TYPES(input_uv.type(), "downsample_flow", ([&] { 466 | 
downsample_flow_kernel<<>>(input_uv.data(), output.data(), 467 | width_out, height_out, kernel);})); 468 | //visibility_kernel<<>>(input.data(), output.data(), width, height, threshold); 469 | return output; 470 | } 471 | 472 | at::Tensor downsample_mask_cuda(at::Tensor input_mask, at::Tensor output, unsigned int width_out ,unsigned int height_out, unsigned int kernel) { 473 | //dim3 threads(BLOCK_W, BLOCK_H); 474 | //dim3 blocks((height)/TILE_W, (width)/TILE_H); 475 | dim3 threads(32, 32); 476 | dim3 blocks(height_out/32+1, width_out/32+1); 477 | downsample_mask_kernel<<>>(input_mask.data(), output.data(), 478 | width_out, height_out, kernel); 479 | //visibility_kernel<<>>(input.data(), output.data(), width, height, threshold); 480 | return output; 481 | } 482 | 483 | /*at::Tensor visibility_filter_cuda_shared(at::Tensor input_depth, at::Tensor input_points, at::Tensor output, unsigned int width,unsigned int height, float threshold) { 484 | dim3 threads(BLOCK_W, BLOCK_H); 485 | dim3 blocks((height)/TILE_W, (width)/TILE_H); 486 | //dim3 threads(32, 32); 487 | //dim3 blocks(height/32, width/32); 488 | AT_DISPATCH_FLOATING_TYPES(input_depth.type(), "visibility_filter2", ([&] { 489 | visibility_kernel_shared<<>>(input_depth.data(), input_points.data(), output.data(), width, height, threshold);})); 490 | //visibility_kernel<<>>(input.data(), output.data(), width, height, threshold); 491 | return output; 492 | }*/ 493 | -------------------------------------------------------------------------------- /core/visibility_package/visibility_new.cpp: -------------------------------------------------------------------------------- 1 | // ------------------------------------------------------------------- 2 | // Copyright (C) 2020 Università degli studi di Milano-Bicocca, iralab 3 | // Author: Daniele Cattaneo (d.cattaneo10@campus.unimib.it) 4 | // Released under Creative Commons 5 | // Attribution-NonCommercial-ShareAlike 4.0 International License. 
6 | // http://creativecommons.org/licenses/by-nc-sa/4.0/ 7 | // ------------------------------------------------------------------- 8 | #include 9 | #include 10 | 11 | #include 12 | #include 13 | 14 | #include 15 | 16 | std::vector depth_image_cuda(at::Tensor input_uv, at::Tensor input_depth, at::Tensor input_index, at::Tensor output, at::Tensor output_index, unsigned int size, unsigned int width,unsigned int height); 17 | at::Tensor visibility_filter_cuda(at::Tensor input, at::Tensor output, unsigned int width, unsigned int height, unsigned int threshold); 18 | std::vector visibility_filter_cuda2(at::Tensor input_depth, at::Tensor intrinsic, at::Tensor in_index, at::Tensor output, at::Tensor out_index, unsigned int width,unsigned int height, float threshold, unsigned int radius); 19 | at::Tensor downsample_flow_cuda(at::Tensor input_uv, at::Tensor output, unsigned int width_out ,unsigned int height_out, unsigned int kernel); 20 | at::Tensor downsample_mask_cuda(at::Tensor input_mask, at::Tensor output, unsigned int width_out ,unsigned int height_out, unsigned int kernel); 21 | //at::Tensor visibility_filter_cuda_shared(at::Tensor input_depth, at::Tensor input_points, at::Tensor output, unsigned int width,unsigned int height, float threshold); 22 | at::Tensor image_warp_index_cuda(at::Tensor input, at::Tensor flow, at::Tensor warp_depth, at::Tensor output, unsigned int width, unsigned int height); 23 | at::Tensor image_warp_cuda(at::Tensor input, at::Tensor flow, at::Tensor output, unsigned int width, unsigned int height); 24 | // C++ interface 25 | 26 | #define CHECK_CUDA(x) AT_ASSERTM(x.type().is_cuda(), #x " must be a CUDA tensor") 27 | #define CHECK_CONTIGUOUS(x) AT_ASSERTM(x.is_contiguous(), #x " must be contiguous") 28 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x) 29 | 30 | std::vector depth_image(at::Tensor input_uv, at::Tensor input_depth,at::Tensor input_index, at::Tensor output, at::Tensor output_index, unsigned int size, unsigned int width,unsigned int height) { 31 | CHECK_INPUT(input_uv); 32 | CHECK_INPUT(input_depth); 33 | CHECK_INPUT(input_index); 34 | CHECK_INPUT(output); 35 | CHECK_INPUT(output_index); 36 | return depth_image_cuda(input_uv, input_depth, input_index, output,output_index, size, width, height); 37 | } 38 | 39 | at::Tensor visibility_filter(at::Tensor input, at::Tensor output, unsigned int width, unsigned int height, unsigned int threshold){ 40 | CHECK_INPUT(input); 41 | CHECK_INPUT(output); 42 | return visibility_filter_cuda(input, output, width, height, threshold); 43 | } 44 | 45 | std::vector visibility_filter2(at::Tensor input_depth, at::Tensor intrinsic, at::Tensor in_index, at::Tensor output, at::Tensor out_index, unsigned int width, unsigned int height, float threshold, unsigned int radius){ 46 | CHECK_INPUT(input_depth); 47 | CHECK_INPUT(intrinsic); 48 | CHECK_INPUT(output); 49 | return visibility_filter_cuda2(input_depth, intrinsic, in_index, output, out_index, width, height, threshold, radius); 50 | } 51 | 52 | at::Tensor downsample_flow(at::Tensor input_uv, at::Tensor output, unsigned int width_out ,unsigned int height_out, unsigned int kernel) { 53 | CHECK_INPUT(input_uv); 54 | CHECK_INPUT(output); 55 | return downsample_flow_cuda(input_uv, output, width_out, height_out, kernel); 56 | } 57 | 58 | at::Tensor downsample_mask(at::Tensor input_mask, at::Tensor output, unsigned int width_out ,unsigned int height_out, unsigned int kernel) { 59 | CHECK_INPUT(input_mask); 60 | CHECK_INPUT(output); 61 | return downsample_mask_cuda(input_mask, 
output, width_out, height_out, kernel); 62 | } 63 | 64 | at::Tensor image_warp_index(at::Tensor input, at::Tensor flow, at::Tensor warp_depth, at::Tensor output, unsigned int width, unsigned int height) 65 | { 66 | CHECK_INPUT(input); 67 | CHECK_INPUT(flow); 68 | CHECK_INPUT(warp_depth); 69 | CHECK_INPUT(output); 70 | return image_warp_index_cuda(input, flow, warp_depth, output, width, height); 71 | } 72 | 73 | at::Tensor image_warp(at::Tensor input, at::Tensor flow, at::Tensor output, unsigned int width, unsigned int height) 74 | { 75 | CHECK_INPUT(input); 76 | CHECK_INPUT(flow); 77 | CHECK_INPUT(output); 78 | return image_warp_cuda(input, flow, output, width, height); 79 | } 80 | 81 | /*at::Tensor visibility_filter_shared(at::Tensor input_depth, at::Tensor input_points, at::Tensor output, unsigned int width, unsigned int height, float threshold){ 82 | CHECK_INPUT(input_depth); 83 | CHECK_INPUT(input_points); 84 | CHECK_INPUT(output); 85 | return visibility_filter_cuda_shared(input_depth, input_points, output, width, height, threshold); 86 | }*/ 87 | 88 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) { 89 | m.def("depth_image", &depth_image, "Generate Depth Image(CUDA)"); 90 | m.def("visibility", &visibility_filter, "visibility_filter (CUDA)"); 91 | m.def("visibility2", &visibility_filter2, "visibility_filter2 (CUDA)"); 92 | m.def("downsample_flow", &downsample_flow, "downsample_flow (CUDA)"); 93 | m.def("downsample_mask", &downsample_mask, "downsample_mask (CUDA)"); 94 | m.def("image_warp_index", &image_warp_index, "image_warp_index (CUDA)"); 95 | m.def("image_warp", &image_warp, "image_warp (CUDA)"); 96 | //m.def("visibility_shared", &visibility_filter2, "visibility_filter_shared (CUDA)"); 97 | } 98 | -------------------------------------------------------------------------------- /demo.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import os 3 | import numpy as np 4 | import h5py 5 | import argparse 6 | import torch 7 | from torchvision import transforms 8 | import mathutils 9 | from PIL import Image 10 | import cv2 11 | import visibility 12 | from core.raft import RAFT 13 | from core.utils_point import invert_pose, overlay_imgs 14 | from core.data_preprocess import Data_preprocess 15 | from core.depth_completion import sparse_to_dense 16 | from core.flow_viz import flow_to_image 17 | from core.flow2pose import Flow2Pose, err_Pose 18 | 19 | def custom_transform(rgb): 20 | to_tensor = transforms.ToTensor() 21 | normalization = transforms.Normalize(mean=[0.485, 0.456, 0.406], 22 | std=[0.229, 0.224, 0.225]) 23 | 24 | rgb = to_tensor(rgb) 25 | rgb = normalization(rgb) 26 | 27 | return rgb 28 | 29 | def load_data(root, id): 30 | img_path = os.path.join(root, "image", id + '.png') 31 | pc_path = os.path.join(root, "pc", id + '.h5') 32 | 33 | try: 34 | with h5py.File(pc_path, 'r') as hf: 35 | pc = hf['PC'][:] 36 | except Exception as e: 37 | print(f'File Broken: {pc_path}') 38 | raise e 39 | 40 | pc_in = torch.from_numpy(pc.astype(np.float32)) 41 | if pc_in.shape[1] == 4 or pc_in.shape[1] == 3: 42 | pc_in = pc_in.t() 43 | if pc_in.shape[0] == 3: 44 | homogeneous = torch.ones(pc_in.shape[1]).unsqueeze(0) 45 | pc_in = torch.cat((pc_in, homogeneous), 0) 46 | elif pc_in.shape[0] == 4: 47 | if not torch.all(pc_in[3, :] == 1.): 48 | pc_in[3, :] = 1. 49 | else: 50 | raise TypeError("Wrong PointCloud shape") 51 | 52 | img = Image.open(img_path) 53 | img = custom_transform(img) 54 | 55 | max_r = 10. 56 | max_t = 2. 
57 | max_angle = max_r 58 | rotz = np.random.uniform(-max_angle, max_angle) * (3.141592 / 180.0) 59 | roty = np.random.uniform(-max_angle, max_angle) * (3.141592 / 180.0) 60 | rotx = np.random.uniform(-max_angle, max_angle) * (3.141592 / 180.0) 61 | transl_x = np.random.uniform(-max_t, max_t) 62 | transl_y = np.random.uniform(-max_t, max_t) 63 | transl_z = np.random.uniform(-max_t, min(max_t, 1.)) 64 | 65 | R = mathutils.Euler((rotx, roty, rotz), 'XYZ') 66 | T = mathutils.Vector((transl_x, transl_y, transl_z)) 67 | 68 | R, T = invert_pose(R, T) 69 | R, T = torch.tensor(R), torch.tensor(T) 70 | 71 | return pc_in, img, R, T 72 | 73 | 74 | def demo(args): 75 | device = torch.device(f"cuda:{args.gpus[0]}" if torch.cuda.is_available() else "cpu") 76 | os.environ["CUDA_LAUNCH_BLOCKING"] = "1" 77 | torch.cuda.set_device(args.gpus[0]) 78 | 79 | root = args.data_path 80 | calib = torch.tensor([718.856, 718.856, 607.1928, 185.2157]) 81 | calib = calib.unsqueeze(0) 82 | occlusion_kernel = 5 83 | occlusion_threshold = 3 84 | id_list = sorted(os.listdir(os.path.join(root, "image"))) 85 | id_list = [id[:6] for id in id_list] 86 | 87 | model = torch.nn.DataParallel(RAFT(args), device_ids=args.gpus) 88 | model.load_state_dict(torch.load(args.load_checkpoints)) 89 | model.to(device) 90 | 91 | for k in range(len(id_list)): 92 | id = id_list[k] 93 | 94 | pc, img, R_err, T_err = load_data(root, id) 95 | pc = pc.unsqueeze(0) 96 | img = img.unsqueeze(0) 97 | R_err = R_err.unsqueeze(0) 98 | T_err = T_err.unsqueeze(0) 99 | calib_k = calib.clone() 100 | 101 | data_generate = Data_preprocess(calib_k, occlusion_threshold, occlusion_kernel) 102 | rgb_input, spare_depth, flow_gt = data_generate.push(img, pc, T_err, R_err, device, split='test') 103 | 104 | # dilation 105 | dense_depth = [] 106 | for i in range(spare_depth.shape[0]): 107 | depth_img = spare_depth[i, 0, :, :].cpu().numpy() * 100. 108 | depth_img_dilate = sparse_to_dense(depth_img.astype(np.float32)) 109 | dense_depth.append(depth_img_dilate / 100.) 110 | dense_depth = torch.tensor(np.array(dense_depth)).float().to(device) 111 | dense_depth = dense_depth.unsqueeze(1) 112 | 113 | _, flow_up = model(dense_depth, rgb_input, lidar_mask=spare_depth, iters=24, test_mode=True) 114 | 115 | if args.render: 116 | if not os.path.exists(f"{root}/visualization"): 117 | os.mkdir(f"{root}/visualization") 118 | os.mkdir(f"{root}/visualization/flow") 119 | os.mkdir(f"{root}/visualization/original_overlay") 120 | os.mkdir(f"{root}/visualization/warp_overlay") 121 | 122 | flow_image = flow_to_image(flow_up.permute(0, 2, 3, 1).cpu().detach().numpy()[0]) 123 | cv2.imwrite(f'{root}/visualization/flow/{id}.png', flow_image) 124 | 125 | output = torch.zeros(flow_up.shape).to(device) 126 | pred_depth_img = torch.zeros(spare_depth.shape).to(device) 127 | pred_depth_img += 1000. 128 | output = visibility.image_warp_index(spare_depth.to(device), 129 | flow_up.int().to(device), pred_depth_img, 130 | output, spare_depth.shape[3], spare_depth.shape[2]) 131 | pred_depth_img[pred_depth_img == 1000.] = 0. 
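The `visibility.image_warp_index` call above is a CUDA forward warp: judging from how its outputs are used here and in main.py, each valid pixel of the sparse depth map is pushed along the (integer) predicted flow, the smallest depth wins when several pixels collide (hence the 1000-initialised buffer), and `output` records the source coordinates. A slow, pure-PyTorch approximation of that behaviour, for intuition only; the authoritative semantics are in visibility_kernel_new.cu:
```python
import torch

def forward_warp_depth(depth, flow):
    """depth: [H, W] sparse depth map; flow: [2, H, W] predicted pixel displacements."""
    H, W = depth.shape
    warped = torch.full((H, W), 1000.)
    ys, xs = torch.nonzero(depth, as_tuple=True)
    for y, x in zip(ys.tolist(), xs.tolist()):
        u = x + int(flow[0, y, x])              # target column
        v = y + int(flow[1, y, x])              # target row
        if 0 <= u < W and 0 <= v < H:
            warped[v, u] = min(float(warped[v, u]), float(depth[y, x]))  # keep the nearest depth
    warped[warped == 1000.] = 0.                # empty cells back to zero, as above
    return warped
```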
132 | 133 | original_overlay = overlay_imgs(rgb_input[0, :, :, :], spare_depth[0, 0, :, :]) 134 | cv2.imwrite(f'{root}/visualization/original_overlay/{id}.png', original_overlay) 135 | warp_overlay = overlay_imgs(rgb_input[0, :, :, :], pred_depth_img[0, 0, :, :]) 136 | cv2.imwrite(f'{root}/visualization/warp_overlay/{id}.png', warp_overlay) 137 | 138 | R_pred, T_pred = Flow2Pose(flow_up, spare_depth, calib_k) 139 | R_gt = torch.tensor([1., 0., 0., 0.]) 140 | T_gt = torch.tensor([0., 0., 0.]) 141 | init_err_r, init_err_t, _ = err_Pose(R_err[0], T_err[0], R_gt, T_gt) 142 | pred_err_r, pred_err_t, _ = err_Pose(R_pred, T_pred, R_err[0], T_err[0]) 143 | print(f"sample {id}:") 144 | print(f"initial rotation error {init_err_r.item():.5f} initial translation error {init_err_t.item():.5f} cm") 145 | print(f"prediction rotation error {pred_err_r.item():.5f} prediction translation error {pred_err_t.item():.5f} cm") 146 | 147 | if __name__ == "__main__": 148 | parser = argparse.ArgumentParser() 149 | parser.add_argument('--data_path', type=str, metavar='DIR', default="./sample", help='path to dataset') 150 | parser.add_argument('-cps', '--load_checkpoints', help="restore checkpoint") 151 | parser.add_argument('--gpus', type=int, nargs='+', default=[0]) 152 | parser.add_argument('--mixed_precision', action='store_true', help='use mixed precision') 153 | parser.add_argument('--render', action='store_true') 154 | args = parser.parse_args() 155 | 156 | demo(args) -------------------------------------------------------------------------------- /doc/I2D-Loc--Camera-localization-via-image_2022_ISPRS-Journal-of-Photogrammetry-.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/doc/I2D-Loc--Camera-localization-via-image_2022_ISPRS-Journal-of-Photogrammetry-.pdf -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import os 3 | import cv2 4 | import numpy as np 5 | import torch 6 | from torch.utils.data import DataLoader, RandomSampler 7 | import argparse 8 | import visibility 9 | import time 10 | 11 | sys.path.append('core') 12 | from raft import RAFT 13 | from datasets_kitti import DatasetVisibilityKittiSingle 14 | from camera_model import CameraModel 15 | from utils import fetch_optimizer, Logger, count_parameters 16 | from utils_point import merge_inputs, overlay_imgs 17 | from data_preprocess import Data_preprocess 18 | from losses import sequence_loss 19 | from depth_completion import sparse_to_dense 20 | from flow_viz import flow_to_image 21 | from flow2pose import Flow2Pose, err_Pose 22 | 23 | 24 | occlusion_kernel = 5 25 | occlusion_threshold = 3 26 | seed = 1234 27 | 28 | try: 29 | from torch.cuda.amp import GradScaler 30 | except: 31 | class GradScaler: 32 | def __init__(self): 33 | pass 34 | 35 | def scale(self, loss): 36 | return loss 37 | 38 | def unscale_(self, optimizer): 39 | pass 40 | 41 | def step(self, optimizer): 42 | optimizer.step() 43 | 44 | def update(self): 45 | pass 46 | 47 | def _init_fn(worker_id, seed): 48 | seed = seed 49 | print(f"Init worker {worker_id} with seed {seed}") 50 | torch.manual_seed(seed) 51 | np.random.seed(seed) 52 | np.random.seed(seed) 53 | 54 | def train(args, TrainImgLoader, model, optimizer, scheduler, scaler, logger, device): 55 | global occlusion_threshold, occlusion_kernel 56 | model.train() 57 
| for i_batch, sample in enumerate(TrainImgLoader): 58 | rgb = sample['rgb'] 59 | pc = sample['point_cloud'] 60 | calib = sample['calib'] 61 | T_err = sample['tr_error'] 62 | R_err = sample['rot_error'] 63 | 64 | data_generate = Data_preprocess(calib, occlusion_threshold, occlusion_kernel) 65 | rgb_input, lidar_input, flow_gt = data_generate.push(rgb, pc, T_err, R_err, device) 66 | 67 | # dilation 68 | depth_img_input = [] 69 | for i in range(lidar_input.shape[0]): 70 | depth_img = lidar_input[i, 0, :, :].cpu().numpy() * 100. 71 | depth_img_dilate = sparse_to_dense(depth_img.astype(np.float32)) 72 | depth_img_input.append(depth_img_dilate / 100.) 73 | depth_img_input = torch.tensor(depth_img_input).float().to(device) 74 | depth_img_input = depth_img_input.unsqueeze(1) 75 | 76 | optimizer.zero_grad() 77 | flow_preds = model(depth_img_input, rgb_input, lidar_mask=lidar_input, iters=args.iters) 78 | 79 | loss, metrics = sequence_loss(flow_preds, flow_gt, args.gamma, MAX_FLOW=400) 80 | 81 | scaler.scale(loss).backward() 82 | scaler.unscale_(optimizer) 83 | torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip) 84 | 85 | scaler.step(optimizer) 86 | scheduler.step() 87 | scaler.update() 88 | 89 | logger.push(metrics) 90 | 91 | 92 | def test(args, TestImgLoader, model, device, cal_pose=False): 93 | global occlusion_threshold, occlusion_kernel 94 | model.eval() 95 | out_list, epe_list = [], [] 96 | Time = 0. 97 | outliers, err_r_list, err_t_list = [], [], [] 98 | for i_batch, sample in enumerate(TestImgLoader): 99 | rgb = sample['rgb'] 100 | pc = sample['point_cloud'] 101 | calib = sample['calib'] 102 | T_err = sample['tr_error'] 103 | R_err = sample['rot_error'] 104 | 105 | data_generate = Data_preprocess(calib, occlusion_threshold, occlusion_kernel) 106 | rgb_input, lidar_input, flow_gt = data_generate.push(rgb, pc, T_err, R_err, device, split='test') 107 | 108 | # dilation 109 | depth_img_input = [] 110 | for i in range(lidar_input.shape[0]): 111 | depth_img = lidar_input[i, 0, :, :].cpu().numpy() * 100. 112 | depth_img_dilate = sparse_to_dense(depth_img.astype(np.float32)) 113 | depth_img_input.append(depth_img_dilate / 100.) 114 | depth_img_input = torch.tensor(depth_img_input).float().to(device) 115 | depth_img_input = depth_img_input.unsqueeze(1) 116 | 117 | end = time.time() 118 | _, flow_up = model(depth_img_input, rgb_input, lidar_mask=lidar_input, iters=24, test_mode=True) 119 | 120 | if args.render: 121 | if not os.path.exists(f"./visualization"): 122 | os.mkdir(f"./visualization") 123 | os.mkdir(f"./visualization/flow") 124 | os.mkdir(f"./visualization/original_overlay") 125 | os.mkdir(f"./visualization/warp_overlay") 126 | 127 | flow_image = flow_to_image(flow_up.permute(0, 2, 3, 1).cpu().detach().numpy()[0]) 128 | cv2.imwrite(f'./visualization/flow/{i_batch:06d}.png', flow_image) 129 | 130 | output = torch.zeros(flow_up.shape).to(device) 131 | pred_depth_img = torch.zeros(lidar_input.shape).to(device) 132 | pred_depth_img += 1000. 133 | output = visibility.image_warp_index(lidar_input.to(device), 134 | flow_up.int().to(device), pred_depth_img, 135 | output, lidar_input.shape[3], lidar_input.shape[2]) 136 | pred_depth_img[pred_depth_img == 1000.] = 0. 
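`train()` above supervises every intermediate flow estimate through `sequence_loss(flow_preds, flow_gt, args.gamma, MAX_FLOW=400)`. The exact implementation (including the metrics dictionary it returns) lives in core/losses.py; conceptually it is the RAFT-style exponentially weighted loss sketched below, where later iterations receive larger weights and pixels with implausibly large ground-truth flow are masked out:
```python
import torch

def sequence_loss_sketch(flow_preds, flow_gt, gamma=0.8, max_flow=400.0):
    """Exponentially weighted L1 over the iterative predictions (illustrative only)."""
    n = len(flow_preds)
    mag = torch.sum(flow_gt ** 2, dim=1).sqrt()          # [N, H, W] GT flow magnitude
    valid = (mag < max_flow).float().unsqueeze(1)        # ignore implausibly large GT flow
    loss = 0.0
    for i, pred in enumerate(flow_preds):
        weight = gamma ** (n - i - 1)                    # later iterations weigh more
        loss = loss + weight * (valid * (pred - flow_gt).abs()).mean()
    return loss
```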
137 | 138 | original_overlay = overlay_imgs(rgb_input[0, :, :, :], lidar_input[0, 0, :, :]) 139 | cv2.imwrite(f'./visualization/original_overlay/{i_batch:06d}.png', original_overlay) 140 | warp_overlay = overlay_imgs(rgb_input[0, :, :, :], pred_depth_img[0, 0, :, :]) 141 | cv2.imwrite(f'./visualization/warp_overlay/{i_batch:06d}.png', warp_overlay) 142 | 143 | if not cal_pose: 144 | epe = torch.sum((flow_up - flow_gt) ** 2, dim=1).sqrt() 145 | mag = torch.sum(flow_gt ** 2, dim=1).sqrt() 146 | epe = epe.view(-1) 147 | mag = mag.view(-1) 148 | valid_gt = (flow_gt[:, 0, :, :] != 0) + (flow_gt[:, 1, :, :] != 0) 149 | val = valid_gt.view(-1) >= 0.5 150 | 151 | out = ((epe > 3.0) & ((epe / mag) > 0.05)).float() 152 | epe_list.append(epe[val].mean().item()) 153 | out_list.append(out[val].cpu().numpy()) 154 | else: 155 | R_pred, T_pred = Flow2Pose(flow_up, lidar_input, calib) 156 | Time += time.time() - end 157 | err_r, err_t, is_fail = err_Pose(R_pred, T_pred, R_err[0], T_err[0]) 158 | if is_fail: 159 | outliers.append(i_batch) 160 | else: 161 | err_r_list.append(err_r.item()) 162 | err_t_list.append(err_t.item()) 163 | print(f"{i_batch:05d}: {np.mean(err_t_list):.5f} {np.mean(err_r_list):.5f} {np.median(err_t_list):.5f} " 164 | f"{np.median(err_r_list):.5f} {len(outliers)} {Time / (i_batch+1):.5f}") 165 | 166 | if not cal_pose: 167 | epe_list = np.array(epe_list) 168 | out_list = np.concatenate(out_list) 169 | 170 | epe = np.mean(epe_list) 171 | f1 = 100 * np.mean(out_list) 172 | 173 | return epe, f1 174 | else: 175 | return err_t_list, err_r_list, outliers, Time 176 | 177 | 178 | if __name__ == '__main__': 179 | parser = argparse.ArgumentParser() 180 | parser.add_argument('--data_path', type=str, metavar='DIR', 181 | default='/data/cky/KITTI/sequences', 182 | help='path to dataset') 183 | parser.add_argument('--test_sequence', type=str, default='00') 184 | parser.add_argument('-cps', '--load_checkpoints', help="restore checkpoint") 185 | parser.add_argument('--epochs', default=100, type=int, metavar='N', 186 | help='number of total epochs to run') 187 | parser.add_argument('--starting_epoch', default=0, type=int, metavar='N', 188 | help='manual epoch number (useful on restarts)') 189 | parser.add_argument('-b', '--batch_size', default=2, type=int, 190 | metavar='N', help='mini-batch size') 191 | parser.add_argument('--lr', '--learning_rate', default=4e-5, type=float, 192 | metavar='LR', help='initial learning rate') 193 | parser.add_argument('--wdecay', type=float, default=.00005) 194 | parser.add_argument('--epsilon', type=float, default=1e-8) 195 | parser.add_argument('--clip', type=float, default=1.0) 196 | parser.add_argument('--gamma', type=float, default=0.8, help='exponential weighting') 197 | parser.add_argument('--iters', type=int, default=12) 198 | parser.add_argument('--gpus', type=int, nargs='+', default=[0]) 199 | parser.add_argument('--max_r', type=float, default=10.) 200 | parser.add_argument('--max_t', type=float, default=2.) 
201 | parser.add_argument('--use_reflectance', default=False) 202 | parser.add_argument('--num_workers', type=int, default=3) 203 | parser.add_argument('--mixed_precision', action='store_true', help='use mixed precision') 204 | parser.add_argument('--evaluate_interval', default=1, type=int, metavar='N', 205 | help='Evaluate every \'evaluate interval\' epochs ') 206 | parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true', 207 | help='evaluate model on validation set') 208 | parser.add_argument('--render', action='store_true') 209 | args = parser.parse_args() 210 | 211 | device = torch.device(f"cuda:{args.gpus[0]}" if torch.cuda.is_available() else "cpu") 212 | os.environ["CUDA_LAUNCH_BLOCKING"] = "1" 213 | torch.cuda.set_device(args.gpus[0]) 214 | 215 | batch_size = args.batch_size 216 | 217 | model = torch.nn.DataParallel(RAFT(args), device_ids=args.gpus) 218 | print("Parameter Count: %d" % count_parameters(model)) 219 | if args.load_checkpoints is not None: 220 | model.load_state_dict(torch.load(args.load_checkpoints)) 221 | model.to(device) 222 | 223 | def init_fn(x): 224 | return _init_fn(x, seed) 225 | 226 | dataset_test = DatasetVisibilityKittiSingle(args.data_path, max_r=args.max_r, max_t=args.max_t, 227 | split='test', use_reflectance=args.use_reflectance, 228 | test_sequence=args.test_sequence) 229 | TestImgLoader = torch.utils.data.DataLoader(dataset=dataset_test, 230 | shuffle=False, 231 | batch_size=1, 232 | num_workers=args.num_workers, 233 | worker_init_fn=init_fn, 234 | collate_fn=merge_inputs, 235 | drop_last=False, 236 | pin_memory=True) 237 | if args.evaluate: 238 | with torch.no_grad(): 239 | err_t_list, err_r_list, outliers, Time = test(args, TestImgLoader, model, device, cal_pose=True) 240 | print(f"Mean trans error {np.mean(err_t_list):.5f} Mean rotation error {np.mean(err_r_list):.5f}") 241 | print(f"Median trans error {np.median(err_t_list):.5f} Median rotation error {np.median(err_r_list):.5f}") 242 | print(f"Outliers number {len(outliers)}/{len(TestImgLoader)} Mean {Time / len(TestImgLoader):.5f} per frame") 243 | sys.exit() 244 | 245 | dataset_train = DatasetVisibilityKittiSingle(args.data_path, max_r=args.max_r, max_t=args.max_t, 246 | split='train', use_reflectance=args.use_reflectance, 247 | test_sequence=args.test_sequence) 248 | TrainImgLoader = torch.utils.data.DataLoader(dataset=dataset_train, 249 | shuffle=True, 250 | batch_size=batch_size, 251 | num_workers=args.num_workers, 252 | worker_init_fn=init_fn, 253 | collate_fn=merge_inputs, 254 | drop_last=False, 255 | pin_memory=True) 256 | print("Train length: ", len(TrainImgLoader)) 257 | print("Test length: ", len(TestImgLoader)) 258 | 259 | optimizer, scheduler = fetch_optimizer(args, len(TrainImgLoader), model) 260 | scaler = GradScaler(enabled=args.mixed_precision) 261 | logger = Logger(model, scheduler, SUM_FREQ=100) 262 | 263 | starting_epoch = args.starting_epoch 264 | min_val_err = 9999. 
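`fetch_optimizer` (defined in core/utils.py above) sizes the OneCycleLR schedule to the whole run: `scheduler.step()` is taken once per training batch inside `train()`, so the schedule spans `args.epochs * len(TrainImgLoader) + 100` steps. A minimal sketch of the same wiring; the `SimpleNamespace`, the tiny stand-in module, and `nums=500` are placeholders, not values from the dataset:
```python
import sys
sys.path.append('core')                                  # as at the top of main.py
import torch
from types import SimpleNamespace
from utils import fetch_optimizer

args_sketch = SimpleNamespace(lr=4e-5, wdecay=5e-5, epsilon=1e-8, epochs=100)
tiny_model = torch.nn.Conv2d(1, 1, 3)                    # stand-in for RAFT
optimizer, scheduler = fetch_optimizer(args_sketch, nums=500, model=tiny_model)  # nums ~ len(TrainImgLoader)
print(scheduler.get_last_lr())
```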
265 | for epoch in range(starting_epoch, args.epochs): 266 | # train 267 | train(args, TrainImgLoader, model, optimizer, scheduler, scaler, logger, device) 268 | 269 | if epoch % args.evaluate_interval == 0: 270 | epe, f1 = test(args, TestImgLoader, model, device) 271 | print("Validation KITTI: %f, %f" % (epe, f1)) 272 | 273 | results = {'kitti-epe': epe, 'kitti-f1': f1} 274 | logger.write_dict(results) 275 | 276 | torch.save(model.state_dict(), "./checkpoints/checkpoint.pth") 277 | 278 | if epe < min_val_err: 279 | min_val_err = epe 280 | torch.save(model.state_dict(), './checkpoints/best_model.pth') 281 | 282 | 283 | 284 | 285 | 286 | -------------------------------------------------------------------------------- /main_bpnp.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import os 3 | import cv2 4 | import numpy as np 5 | import torch 6 | from torch.utils.data import DataLoader, RandomSampler 7 | import argparse 8 | import visibility 9 | import time 10 | 11 | sys.path.append('core') 12 | from raft import RAFT 13 | from datasets_kitti import DatasetVisibilityKittiSingle 14 | from camera_model import CameraModel 15 | from utils import fetch_optimizer, Logger, count_parameters 16 | from utils_point import merge_inputs, overlay_imgs, quat2mat, tvector2mat, mat2xyzrpy 17 | from data_preprocess import Data_preprocess 18 | from losses import sequence_loss, normal_loss 19 | from depth_completion import sparse_to_dense 20 | from flow_viz import flow_to_image 21 | from flow2pose import Flow2PoseBPnP, err_Pose 22 | from BPnP import BPnP, batch_project 23 | 24 | occlusion_kernel = 5 25 | occlusion_threshold = 3 26 | seed = 1234 27 | BPnP_EPOCH = 30 28 | 29 | try: 30 | from torch.cuda.amp import GradScaler 31 | except: 32 | class GradScaler: 33 | def __init__(self): 34 | pass 35 | 36 | def scale(self, loss): 37 | return loss 38 | 39 | def unscale_(self, optimizer): 40 | pass 41 | 42 | def step(self, optimizer): 43 | optimizer.step() 44 | 45 | def update(self): 46 | pass 47 | 48 | def _init_fn(worker_id, seed): 49 | seed = seed 50 | print(f"Init worker {worker_id} with seed {seed}") 51 | torch.manual_seed(seed) 52 | np.random.seed(seed) 53 | np.random.seed(seed) 54 | 55 | def train(args, epoch, TrainImgLoader, model, optimizer, scheduler, scaler, logger, device): 56 | global occlusion_threshold, occlusion_kernel 57 | model.train() 58 | bpnp = BPnP.apply 59 | cam_model = CameraModel() 60 | for i_batch, sample in enumerate(TrainImgLoader): 61 | rgb = sample['rgb'] 62 | pc = sample['point_cloud'] 63 | calib = sample['calib'] 64 | T_err = sample['tr_error'] 65 | R_err = sample['rot_error'] 66 | 67 | data_generate = Data_preprocess(calib, occlusion_threshold, occlusion_kernel) 68 | rgb_input, lidar_input, flow_gt = data_generate.push(rgb, pc, T_err, R_err, device) 69 | 70 | # dilation 71 | depth_img_input = [] 72 | for i in range(lidar_input.shape[0]): 73 | depth_img = lidar_input[i, 0, :, :].cpu().numpy() * 100. 74 | depth_img_dilate = sparse_to_dense(depth_img.astype(np.float32)) 75 | depth_img_input.append(depth_img_dilate / 100.) 
76 | depth_img_input = torch.tensor(depth_img_input).float().to(device) 77 | depth_img_input = depth_img_input.unsqueeze(1) 78 | 79 | optimizer.zero_grad() 80 | flow_preds = model(depth_img_input, rgb_input, lidar_mask=lidar_input, iters=args.iters) 81 | 82 | loss, metrics = sequence_loss(flow_preds, flow_gt, args.gamma, MAX_FLOW=400) 83 | norm_loss = normal_loss(flow_preds, flow_gt, calib, lidar_input) 84 | loss += norm_loss * 100 85 | 86 | ## BPnP loss 87 | if epoch > BPnP_EPOCH: 88 | loss_poses = 0.0 89 | 90 | flow_up = flow_preds[-1] 91 | 92 | depth_img_ori = lidar_input.cpu().numpy() * 100. 93 | pc_project_uv = np.zeros([lidar_input.shape[0], lidar_input.shape[2], lidar_input.shape[3], 2]) 94 | 95 | for i in range(flow_up.shape[0]): 96 | output = torch.zeros([1, flow_up.shape[1], flow_up.shape[2], flow_up.shape[3]]).to(device) 97 | warp_depth_img = torch.zeros([1, 1, flow_up.shape[2], flow_up.shape[3]]).to(device) 98 | warp_depth_img += 1000. 99 | output = visibility.image_warp_index(lidar_input[i, :, :, :].unsqueeze(0).to(device), 100 | flow_up[i, :, :, :].unsqueeze(0).int().to(device), warp_depth_img, 101 | output, lidar_input.shape[3], lidar_input.shape[2]) 102 | warp_depth_img[warp_depth_img == 1000.] = 0 103 | pc_project_uv[i, :, :, :] = output.cpu().permute(0, 2, 3, 1).numpy() 104 | 105 | for n in range(lidar_input.shape[0]): 106 | mask_depth_1 = pc_project_uv[n, :, :, 0] != 0 107 | mask_depth_2 = pc_project_uv[n, :, :, 1] != 0 108 | mask_depth = mask_depth_1 + mask_depth_2 109 | depth_img = depth_img_ori[n, 0, :, :] * mask_depth 110 | cam_params_clone = calib[n].cpu().numpy() 111 | cam_model.focal_length = cam_params_clone[:2] 112 | cam_model.principal_point = cam_params_clone[2:] 113 | pts3d, pts2d, _ = cam_model.deproject_pytorch(depth_img, pc_project_uv[n, :, :, :]) 114 | pts3d = torch.tensor(pts3d, dtype=torch.float32).to(device) 115 | pts2d = torch.tensor(pts2d, dtype=torch.float32).to(device) 116 | pts2d = pts2d.unsqueeze(0) 117 | cam_mat = np.array( 118 | [[cam_params_clone[0], 0, cam_params_clone[2]], [0, cam_params_clone[1], cam_params_clone[3]], 119 | [0, 0, 1.]]) 120 | K = torch.tensor(cam_mat, dtype=torch.float32).to(device) 121 | 122 | P_out = bpnp(pts2d, pts3d, K) 123 | pts2d_pro = batch_project(P_out, pts3d, K) 124 | 125 | R = quat2mat(R_err[n]) 126 | T = tvector2mat(T_err[n]) 127 | RT_inv = torch.mm(T, R) 128 | RT = RT_inv.clone().inverse() 129 | P_gt = mat2xyzrpy(RT) 130 | P_gt = P_gt[[4, 5, 3, 1, 2, 0]] 131 | P_gt = P_gt.unsqueeze(0) 132 | pts2d_gt = batch_project(P_gt.to(device), pts3d, K) 133 | 134 | pts2d_gt.requires_grad = True 135 | pts2d_pro.requires_grad = True 136 | loss_poses += ((pts2d_pro - pts2d_gt) ** 2).mean() 137 | 138 | loss_pose = loss_poses / lidar_input.shape[0] 139 | else: 140 | loss_pose = 0. 141 | 142 | loss += loss_pose * 10 143 | 144 | scaler.scale(loss).backward() 145 | scaler.unscale_(optimizer) 146 | torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip) 147 | 148 | scaler.step(optimizer) 149 | scheduler.step() 150 | scaler.update() 151 | 152 | logger.push(metrics) 153 | 154 | def test(args, TestImgLoader, model, bpnp, device, cal_pose=False): 155 | global occlusion_threshold, occlusion_kernel 156 | model.eval() 157 | out_list, epe_list = [], [] 158 | Time = 0. 
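The BPnP branch of `train()` above (active once `epoch > BPnP_EPOCH`) supervises the recovered pose through 2D reprojections rather than comparing pose parameters directly: the same 3D LiDAR points are projected with the BPnP-estimated pose and with the ground-truth pose, and the mean squared 2D discrepancy is penalized. A minimal sketch of that term, assuming `batch_project(P, pts3d, K)` behaves as it is used above; `reprojection_loss` is a hypothetical helper, not a function from the repository:

```Python
import torch

def reprojection_loss(P_pred, P_gt, pts3d, K, batch_project):
    # Project the same 3D points with the estimated and the ground-truth
    # 6-DoF pose vectors, then penalise the mean squared 2D discrepancy.
    pts2d_pred = batch_project(P_pred, pts3d, K)
    pts2d_gt = batch_project(P_gt, pts3d, K)
    return ((pts2d_pred - pts2d_gt) ** 2).mean()
```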
159 | outliers, err_r_list, err_t_list = [], [], [] 160 | for i_batch, sample in enumerate(TestImgLoader): 161 | rgb = sample['rgb'] 162 | pc = sample['point_cloud'] 163 | calib = sample['calib'] 164 | T_err = sample['tr_error'] 165 | R_err = sample['rot_error'] 166 | 167 | data_generate = Data_preprocess(calib, occlusion_threshold, occlusion_kernel) 168 | rgb_input, lidar_input, flow_gt = data_generate.push(rgb, pc, T_err, R_err, device, split='test') 169 | 170 | # dilation 171 | depth_img_input = [] 172 | for i in range(lidar_input.shape[0]): 173 | depth_img = lidar_input[i, 0, :, :].cpu().numpy() * 100. 174 | depth_img_dilate = sparse_to_dense(depth_img.astype(np.float32)) 175 | depth_img_input.append(depth_img_dilate / 100.) 176 | depth_img_input = torch.tensor(depth_img_input).float().to(device) 177 | depth_img_input = depth_img_input.unsqueeze(1) 178 | 179 | end = time.time() 180 | _, flow_up = model(depth_img_input, rgb_input, lidar_mask=lidar_input, iters=24, test_mode=True) 181 | 182 | if args.render: 183 | if not os.path.exists(f"./visulization"): 184 | os.mkdir(f"./visulization") 185 | os.mkdir(f"./visulization/flow") 186 | os.mkdir(f"./visulization/original_overlay") 187 | os.mkdir(f"./visulization/warp_overlay") 188 | 189 | flow_image = flow_to_image(flow_up.permute(0, 2, 3, 1).cpu().detach().numpy()[0]) 190 | cv2.imwrite(f'./visulization/flow/{i_batch:06d}.png', flow_image) 191 | 192 | output = torch.zeros(flow_up.shape).to(device) 193 | pred_depth_img = torch.zeros(lidar_input.shape).to(device) 194 | pred_depth_img += 1000. 195 | output = visibility.image_warp_index(lidar_input.to(device), 196 | flow_up.int().to(device), pred_depth_img, 197 | output, lidar_input.shape[3], lidar_input.shape[2]) 198 | pred_depth_img[pred_depth_img == 1000.] = 0. 
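The custom `visibility.image_warp_index` CUDA op used above forward-warps the projected LiDAR depth by the (integer-rounded) predicted flow so it can be overlaid on the RGB image. The kernel itself is not reproduced here; the following is only a rough pure-PyTorch sketch of the idea, with a naive nearest-depth tie-break that may differ from the actual op:

```Python
import torch

def forward_warp_depth(depth, flow):
    """depth: (H, W) sparse depth map, flow: (2, H, W) integer flow.
    Returns the depth map with each valid pixel moved by its flow vector."""
    H, W = depth.shape
    warped = torch.zeros_like(depth)
    ys, xs = torch.nonzero(depth, as_tuple=True)
    for y, x in zip(ys.tolist(), xs.tolist()):
        nx = x + int(flow[0, y, x])
        ny = y + int(flow[1, y, x])
        if 0 <= nx < W and 0 <= ny < H:
            # Keep the nearer point when two sources land on the same pixel.
            if warped[ny, nx] == 0 or depth[y, x] < warped[ny, nx]:
                warped[ny, nx] = depth[y, x]
    return warped
```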
199 | 200 | original_overlay = overlay_imgs(rgb_input[0, :, :, :], lidar_input[0, 0, :, :]) 201 | cv2.imwrite(f'./visulization/original_overlay/{i_batch:06d}.png', original_overlay) 202 | warp_overlay = overlay_imgs(rgb_input[0, :, :, :], pred_depth_img[0, 0, :, :]) 203 | cv2.imwrite(f'./visulization/warp_overlay/{i_batch:06d}.png', warp_overlay) 204 | 205 | if not cal_pose: 206 | epe = torch.sum((flow_up - flow_gt) ** 2, dim=1).sqrt() 207 | mag = torch.sum(flow_gt ** 2, dim=1).sqrt() 208 | epe = epe.view(-1) 209 | mag = mag.view(-1) 210 | valid_gt = (flow_gt[:, 0, :, :] != 0) + (flow_gt[:, 1, :, :] != 0) 211 | val = valid_gt.view(-1) >= 0.5 212 | 213 | out = ((epe > 3.0) & ((epe / mag) > 0.05)).float() 214 | epe_list.append(epe[val].mean().item()) 215 | out_list.append(out[val].cpu().numpy()) 216 | else: 217 | R_pred, T_pred = Flow2PoseBPnP(flow_up, lidar_input, calib, bpnp) 218 | Time += time.time() - end 219 | err_r, err_t, is_fail = err_Pose(R_pred, T_pred, R_err[0], T_err[0]) 220 | if is_fail: 221 | outliers.append(i_batch) 222 | else: 223 | err_r_list.append(err_r.item()) 224 | err_t_list.append(err_t.item()) 225 | print(f"{i_batch:05d}: {np.mean(err_t_list):.5f} {np.mean(err_r_list):.5f} {np.median(err_t_list):.5f} " 226 | f"{np.median(err_r_list):.5f} {len(outliers)} {Time / (i_batch+1):.5f}") 227 | 228 | if not cal_pose: 229 | epe_list = np.array(epe_list) 230 | out_list = np.concatenate(out_list) 231 | 232 | epe = np.mean(epe_list) 233 | f1 = 100 * np.mean(out_list) 234 | 235 | return epe, f1 236 | else: 237 | return err_t_list, err_r_list, outliers, Time 238 | 239 | 240 | if __name__ == '__main__': 241 | parser = argparse.ArgumentParser() 242 | parser.add_argument('--data_path', type=str, metavar='DIR', 243 | default='/data/cky/KITTI/sequences', 244 | help='path to dataset') 245 | parser.add_argument('--test_sequence', type=str, default='00') 246 | parser.add_argument('-cps', '--load_checkpoints', help="restore checkpoint") 247 | parser.add_argument('--epochs', default=100, type=int, metavar='N', 248 | help='number of total epochs to run') 249 | parser.add_argument('--starting_epoch', default=0, type=int, metavar='N', 250 | help='manual epoch number (useful on restarts)') 251 | parser.add_argument('-b', '--batch_size', default=2, type=int, 252 | metavar='N', help='mini-batch size') 253 | parser.add_argument('--lr', '--learning_rate', default=4e-5, type=float, 254 | metavar='LR', help='initial learning rate') 255 | parser.add_argument('--wdecay', type=float, default=.00005) 256 | parser.add_argument('--epsilon', type=float, default=1e-8) 257 | parser.add_argument('--clip', type=float, default=1.0) 258 | parser.add_argument('--gamma', type=float, default=0.8, help='exponential weighting') 259 | parser.add_argument('--iters', type=int, default=12) 260 | parser.add_argument('--gpus', type=int, nargs='+', default=[0]) 261 | parser.add_argument('--max_r', type=float, default=10.) 262 | parser.add_argument('--max_t', type=float, default=2.) 
263 | parser.add_argument('--use_reflectance', default=False) 264 | parser.add_argument('--num_workers', type=int, default=3) 265 | parser.add_argument('--mixed_precision', action='store_true', help='use mixed precision') 266 | parser.add_argument('--evaluate_interval', default=1, type=int, metavar='N', 267 | help='Evaluate every \'evaluate interval\' epochs ') 268 | parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true', 269 | help='evaluate model on validation set') 270 | parser.add_argument('--render', action='store_true') 271 | args = parser.parse_args() 272 | 273 | device = torch.device(f"cuda:{args.gpus[0]}" if torch.cuda.is_available() else "cpu") 274 | os.environ["CUDA_LAUNCH_BLOCKING"] = "1" 275 | torch.cuda.set_device(args.gpus[0]) 276 | 277 | batch_size = args.batch_size 278 | 279 | model = torch.nn.DataParallel(RAFT(args), device_ids=args.gpus) 280 | print("Parameter Count: %d" % count_parameters(model)) 281 | if args.load_checkpoints is not None: 282 | model.load_state_dict(torch.load(args.load_checkpoints)) 283 | model.to(device) 284 | 285 | bpnp = BPnP.apply 286 | 287 | def init_fn(x): 288 | return _init_fn(x, seed) 289 | 290 | dataset_test = DatasetVisibilityKittiSingle(args.data_path, max_r=args.max_r, max_t=args.max_t, 291 | split='test', use_reflectance=args.use_reflectance, 292 | test_sequence=args.test_sequence) 293 | TestImgLoader = torch.utils.data.DataLoader(dataset=dataset_test, 294 | shuffle=False, 295 | batch_size=1, 296 | num_workers=args.num_workers, 297 | worker_init_fn=init_fn, 298 | collate_fn=merge_inputs, 299 | drop_last=False, 300 | pin_memory=True) 301 | if args.evaluate: 302 | with torch.no_grad(): 303 | err_t_list, err_r_list, outliers, Time = test(args, TestImgLoader, model, bpnp, device, cal_pose=True) 304 | print(f"Mean trans error {np.mean(err_t_list):.5f} Mean rotation error {np.mean(err_r_list):.5f}") 305 | print(f"Median trans error {np.median(err_t_list):.5f} Median rotation error {np.median(err_r_list):.5f}") 306 | print(f"Outliers number {len(outliers)}/{len(TestImgLoader)} Mean {Time / len(TestImgLoader):.5f} per frame") 307 | sys.exit() 308 | 309 | dataset_train = DatasetVisibilityKittiSingle(args.data_path, max_r=args.max_r, max_t=args.max_t, 310 | split='train', use_reflectance=args.use_reflectance, 311 | test_sequence=args.test_sequence) 312 | TrainImgLoader = torch.utils.data.DataLoader(dataset=dataset_train, 313 | shuffle=True, 314 | batch_size=batch_size, 315 | num_workers=args.num_workers, 316 | worker_init_fn=init_fn, 317 | collate_fn=merge_inputs, 318 | drop_last=False, 319 | pin_memory=True) 320 | print("Train length: ", len(TrainImgLoader)) 321 | print("Test length: ", len(TestImgLoader)) 322 | 323 | optimizer, scheduler = fetch_optimizer(args, len(TrainImgLoader), model) 324 | scaler = GradScaler(enabled=args.mixed_precision) 325 | logger = Logger(model, scheduler, SUM_FREQ=100) 326 | 327 | starting_epoch = args.starting_epoch 328 | min_val_err = 9999. 
329 | for epoch in range(starting_epoch, args.epochs): 330 | # train 331 | train(args, epoch, TrainImgLoader, model, optimizer, scheduler, scaler, logger, device) 332 | 333 | if epoch % args.evaluate_interval == 0: 334 | epe, f1 = test(args, TestImgLoader, model, bpnp, device) 335 | print("Validation KITTI: %f, %f" % (epe, f1)) 336 | 337 | results = {'kitti-epe': epe, 'kitti-f1': f1} 338 | logger.write_dict(results) 339 | 340 | torch.save(model.state_dict(), "./checkpoints/checkpoint.pth") 341 | 342 | if epe < min_val_err: 343 | min_val_err = epe 344 | torch.save(model.state_dict(), './checkpoints/best_model.pth') 345 | 346 | 347 | 348 | 349 | 350 | -------------------------------------------------------------------------------- /preprocess/kitti_maps.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | import sys 4 | sys.path.append("..") 5 | sys.path.append(".") 6 | 7 | import h5py 8 | import numpy as np 9 | import open3d as o3 10 | import pykitti 11 | import torch 12 | from tqdm import tqdm 13 | 14 | from utils import to_rotation_matrix 15 | 16 | 17 | parser = argparse.ArgumentParser() 18 | parser.add_argument('--sequence', default='00', 19 | help='sequence') 20 | parser.add_argument('--device', default='cuda', 21 | help='device') 22 | parser.add_argument('--voxel_size', default=0.1, type=float, help='Voxel Size') 23 | parser.add_argument('--start', default=0, help='Starting Frame') 24 | parser.add_argument('--end', default=100000, help='End Frame') 25 | parser.add_argument('--map', default=None, help='Use map file') 26 | parser.add_argument('--kitti_folder', default='./KITTI/ODOMETRY', help='Folder of the KITTI dataset') 27 | 28 | args = parser.parse_args() 29 | sequence = args.sequence 30 | print("Sequnce: ", sequence) 31 | velodyne_folder = os.path.join(args.kitti_folder, 'sequences', sequence, 'velodyne') 32 | pose_file = os.path.join('./data', f'kitti-{sequence}.csv') 33 | 34 | poses = [] 35 | with open(pose_file, 'r') as f: 36 | for x in f: 37 | if x.startswith('timestamp'): 38 | continue 39 | x = x.split(',') 40 | T = torch.tensor([float(x[1]), float(x[2]), float(x[3])]) 41 | R = torch.tensor([float(x[7]), float(x[4]), float(x[5]), float(x[6])]) 42 | poses.append(to_rotation_matrix(R, T)) 43 | 44 | map_file = args.map 45 | first_frame = int(args.start) 46 | last_frame = min(len(poses), int(args.end)) 47 | kitti = pykitti.odometry(args.kitti_folder, sequence) 48 | 49 | if map_file is None: 50 | 51 | pc_map = [] 52 | pcl = o3.PointCloud() 53 | for i in tqdm(range(first_frame, last_frame)): 54 | pc = kitti.get_velo(i) 55 | valid_indices = pc[:, 0] < -3. 56 | valid_indices = valid_indices | (pc[:, 0] > 3.) 57 | valid_indices = valid_indices | (pc[:, 1] < -3.) 58 | valid_indices = valid_indices | (pc[:, 1] > 3.) 59 | pc = pc[valid_indices].copy() 60 | intensity = pc[:, 3].copy() 61 | pc[:, 3] = 1. 
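Setting the fourth coordinate to 1 above turns each scan into homogeneous coordinates, so the 4x4 ground-truth pose can move it into the map frame with a single matrix product, which is exactly what the next lines do. A minimal NumPy sketch of that step; `transform_points` is a hypothetical helper, not part of the script:

```Python
import numpy as np

def transform_points(pc_xyz, RT):
    # pc_xyz: (N, 3) LiDAR points, RT: (4, 4) ground-truth pose.
    # Returns the same points expressed in the map frame.
    pc_h = np.hstack([pc_xyz, np.ones((pc_xyz.shape[0], 1))])  # (N, 4) homogeneous
    return (RT @ pc_h.T).T[:, :3]
```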
62 | RT = poses[i].numpy() 63 | pc_rot = np.matmul(RT, pc.T) 64 | pc_rot = pc_rot.astype(np.float).T.copy() 65 | 66 | pcl_local = o3.PointCloud() 67 | pcl_local.points = o3.Vector3dVector(pc_rot[:, :3]) 68 | pcl_local.colors = o3.Vector3dVector(np.vstack((intensity, intensity, intensity)).T) 69 | 70 | downpcd = o3.voxel_down_sample(pcl_local, voxel_size=args.voxel_size) 71 | 72 | pcl.points.extend(downpcd.points) 73 | pcl.colors.extend(downpcd.colors) 74 | 75 | 76 | downpcd_full = o3.voxel_down_sample(pcl, voxel_size=args.voxel_size) 77 | downpcd, ind = o3.statistical_outlier_removal(downpcd_full, nb_neighbors=40, std_ratio=0.3) 78 | #o3.draw_geometries(downpcd) 79 | o3.write_point_cloud(os.path.join(args.kitti_folder, 'sequences', sequence, f'map-{sequence}_{args.voxel_size}_{first_frame}-{last_frame}.pcd'), downpcd) 80 | else: 81 | downpcd = o3.read_point_cloud(map_file) 82 | 83 | 84 | voxelized = torch.tensor(downpcd.points, dtype=torch.float) 85 | voxelized = torch.cat((voxelized, torch.ones([voxelized.shape[0], 1], dtype=torch.float)), 1) 86 | voxelized = voxelized.t() 87 | voxelized = voxelized.to(args.device) 88 | vox_intensity = torch.tensor(downpcd.colors, dtype=torch.float)[:, 0:1].t() 89 | 90 | velo2cam2 = torch.from_numpy(kitti.calib.T_cam2_velo).float().to(args.device) 91 | 92 | # SAVE SINGLE PCs 93 | if not os.path.exists(os.path.join(args.kitti_folder, 'sequences', sequence, 94 | f'local_maps_{args.voxel_size}')): 95 | os.mkdir(os.path.join(args.kitti_folder, 'sequences', sequence, f'local_maps_{args.voxel_size}')) 96 | for i in tqdm(range(first_frame, last_frame)): 97 | pose = poses[i] 98 | pose = pose.to(args.device) 99 | pose = pose.inverse() 100 | 101 | local_map = voxelized.clone() 102 | local_intensity = vox_intensity.clone() 103 | local_map = torch.mm(pose, local_map).t() 104 | indexes = local_map[:, 1] > -25. 105 | indexes = indexes & (local_map[:, 1] < 25.) 106 | indexes = indexes & (local_map[:, 0] > -10.) 107 | indexes = indexes & (local_map[:, 0] < 100.) 
108 | local_map = local_map[indexes] 109 | local_intensity = local_intensity[:, indexes] 110 | 111 | local_map = torch.mm(velo2cam2, local_map.t()) 112 | local_map = local_map[[2, 0, 1, 3], :] 113 | 114 | #pcd = o3.PointCloud() 115 | #pcd.points = o3.Vector3dVector(local_map[:,:3].numpy()) 116 | #o3.write_point_cloud(f'{i:06d}.pcd', pcd) 117 | 118 | file = os.path.join(args.kitti_folder, 'sequences', sequence, 119 | f'local_maps_{args.voxel_size}', f'{i:06d}.h5') 120 | with h5py.File(file, 'w') as hf: 121 | hf.create_dataset('PC', data=local_map.cpu().half(), compression='lzf', shuffle=True) 122 | hf.create_dataset('intensity', data=local_intensity.cpu().half(), compression='lzf', shuffle=True) 123 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | scikit_image 2 | git+https://gitlab.com/m1lhaus/blender-mathutils.git 3 | tqdm 4 | pandas 5 | h5py 6 | matplotlib 7 | scipy 8 | pyquaternion 9 | opencv-python 10 | pykitti 11 | numpy 12 | open3d-python==0.7.0.0 13 | Pillow 14 | scikit-image 15 | sacred==0.7.4 16 | tqdm 17 | tensorboardX -------------------------------------------------------------------------------- /sample/image/000500.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/sample/image/000500.png -------------------------------------------------------------------------------- /sample/image/001000.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/sample/image/001000.png -------------------------------------------------------------------------------- /sample/image/001500.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/sample/image/001500.png -------------------------------------------------------------------------------- /sample/image/002000.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/sample/image/002000.png -------------------------------------------------------------------------------- /sample/overlay/000500.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/sample/overlay/000500.png -------------------------------------------------------------------------------- /sample/overlay/001000.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/sample/overlay/001000.png -------------------------------------------------------------------------------- /sample/overlay/001500.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/sample/overlay/001500.png -------------------------------------------------------------------------------- /sample/overlay/002000.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/sample/overlay/002000.png -------------------------------------------------------------------------------- /sample/pc/000500.h5: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/sample/pc/000500.h5 -------------------------------------------------------------------------------- /sample/pc/001000.h5: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/sample/pc/001000.h5 -------------------------------------------------------------------------------- /sample/pc/001500.h5: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/sample/pc/001500.h5 -------------------------------------------------------------------------------- /sample/pc/002000.h5: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/sample/pc/002000.h5 --------------------------------------------------------------------------------
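The `sample/pc/*.h5` files listed above appear to follow the same layout as the `local_maps_0.1/*.h5` files written by `preprocess/kitti_maps.py`: a 4xN half-precision `PC` dataset of homogeneous points plus a 1xN `intensity` dataset. A minimal sketch of reading one back under that assumption:

```Python
import h5py
import torch

# Load a pre-cropped local map shipped with the repository (layout assumed to
# match the h5 files produced by preprocess/kitti_maps.py).
with h5py.File("sample/pc/000500.h5", "r") as hf:
    pc = torch.tensor(hf["PC"][:], dtype=torch.float)                 # (4, N) homogeneous points
    intensity = torch.tensor(hf["intensity"][:], dtype=torch.float)   # (1, N) reflectance
print(pc.shape, intensity.shape)
```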