├── Network.png
├── README.md
├── core
├── BPnP.py
├── camera_model.py
├── corr.py
├── correlation_package
│ ├── LICENSE
│ ├── correlation.py
│ ├── correlation_cuda.cc
│ ├── correlation_cuda_kernel.cu
│ ├── correlation_cuda_kernel.cuh
│ ├── pyproject.toml
│ └── setup.py
├── data_preprocess.py
├── datasets_kitti.py
├── depth_completion.py
├── extractor.py
├── flow2pose.py
├── flow_viz.py
├── losses.py
├── quaternion_distances.py
├── raft.py
├── update.py
├── utils.py
├── utils_point.py
└── visibility_package
│ ├── setup.py
│ ├── visibility.cpp
│ ├── visibility_kernel.cu
│ ├── visibility_kernel_new.cu
│ └── visibility_new.cpp
├── demo.py
├── doc
└── I2D-Loc--Camera-localization-via-image_2022_ISPRS-Journal-of-Photogrammetry-.pdf
├── main.py
├── main_bpnp.py
├── preprocess
└── kitti_maps.py
├── requirements.txt
└── sample
├── image
├── 000500.png
├── 001000.png
├── 001500.png
└── 002000.png
├── overlay
├── 000500.png
├── 001000.png
├── 001500.png
└── 002000.png
└── pc
├── 000500.h5
├── 001000.h5
├── 001500.h5
└── 002000.h5
/Network.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/Network.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # I2D-Loc
2 | This repository contains the source code for our paper:
3 |
4 | [I2D-Loc: Camera localization via image to LiDAR depth flow](https://www.sciencedirect.com/science/article/pii/S0924271622002775?dgcid=coauthor)
5 | ISPRS 2022
6 | Kuangyi Chen, Huai Yu, Wen Yang, Lei Yu, Sebastian Scherer and Gui-Song Xia
7 |
8 |
9 |
10 | ## Requirements
11 | The code has been trained and tested with PyTorch 1.12 and Cuda 11.6.
12 | ```Shell
13 | conda create -n i2d python=3.7 -y
14 | conda activate i2d
15 | pip install -r requirements.txt
16 | pip install torch==1.12.0+cu116 torchvision==0.13.0+cu116 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu116
17 | cd core/correlation_package
18 | python setup.py install
19 | cd ..
20 | cd visibility_package
21 | python setup.py install
22 | cd ../..
23 | ```
24 |
25 | ## Demos
26 | Pretrained models can be downloaded from [google drive](https://drive.google.com/drive/folders/19VWNCPR1me7SnON1NYJRFrdgd1sKj052?usp=sharing)
27 |
28 | You can demo a trained model on a sequence of frames
29 | ```Shell
30 | python demo.py --load_checkpoints checkpoints/2_10/checkpoints.pth --render
31 | ```
32 |
33 | ## Required Data
34 | To evaluate/train I2D-Loc, you will need to download the required datasets.
35 | * [KITTI](https://www.cvlibs.net/datasets/kitti/eval_odometry.php)
36 |
37 | We trained and tested I2D-Loc on the KITTI odometry sequences 00, 03, 05, 07, 08, and 09.
38 | To obtain the whole LiDAR maps, we aggregate all scans at their ground truth positions.
39 | Then, we down-sample the LiDAR maps at a resolution of 0.1m. The downsampled point clouds are saved as h5 files.
40 |
41 | Use the script preprocess/kitti_maps.py with the ground truth files in data/ to generate the h5 files.
42 |
43 | ```Shell
44 | python preprocess/kitti_maps.py --sequence 00 --kitti_folder ./KITTI_ODOMETRY/
45 | python preprocess/kitti_maps.py --sequence 03 --kitti_folder ./KITTI_ODOMETRY/
46 | python preprocess/kitti_maps.py --sequence 05 --kitti_folder ./KITTI_ODOMETRY/
47 | python preprocess/kitti_maps.py --sequence 06 --kitti_folder ./KITTI_ODOMETRY/
48 | python preprocess/kitti_maps.py --sequence 07 --kitti_folder ./KITTI_ODOMETRY/
49 | python preprocess/kitti_maps.py --sequence 08 --kitti_folder ./KITTI_ODOMETRY/ --end 3000
50 | python preprocess/kitti_maps.py --sequence 08 --kitti_folder ./KITTI_ODOMETRY/ --start 3000
51 | python preprocess/kitti_maps.py --sequence 09 --kitti_folder ./KITTI_ODOMETRY/
52 | ```
53 |
54 | The final directory structure should look like:
55 |
56 | ```Shell
57 | ├── datasets
58 | ├── KITTI
59 | ├── sequences
60 | ├── 00
61 | ├── image_2
62 | ├── *.png
63 | ├── local_maps_0.1
64 | ├── *.h5
65 | ├── calib.txt
66 | ├── map-00_0.1_0-4541.pcd
67 | ├── poses.csv
68 | ├── 03
69 | ├── 05
70 | ├── 06
71 | ├── 07
72 | ├── 08
73 | ├── 09
74 | ```
75 |
76 | ## Evaluation
77 | You can evaluate a trained model using `main.py`
78 | ```Shell
79 | python main.py --data_path /data/KITTI/sequences --load_checkpoints checkpoints/2_10/checkpoints.pth -e
80 | ```
81 |
82 |
83 | ## Training
84 | You can train a model using `main.py`. Training logs will be written to the `runs` which can be visualized using tensorboard.
85 | ```Shell
86 | python main.py --data_path /data/KITTI/sequences --test_sequence 00 --epochs 100 --batch_size 2 --lr 4e-5 --gpus 0 --max_r 10. --max_t 2. --evaluate_interval 1
87 | ```
88 | If you want to train a model using BPnP as back-end, you can use `main_bpnp.py`.
89 | ```Shell
90 | python main_bpnp.py --data_path /data/KITTI/sequences --test_sequence 00 --epochs 100 --batch_size 2 --lr 4e-5 --gpus 0 --max_r 10. --max_t 2. --evaluate_interval 1
91 | ```
92 |
93 |
94 | ## Citation
95 | ```
96 | @inproceedings{CHEN2022209,
97 | title={{I2D-Loc: Camera Localization via Image to LiDAR Depth Flow}},
98 | author={Kuangyi Chen, Huai Yu, Wen Yang, Lei Yu, Sebastian Scherer and Gui-Song Xia},
99 | booktitle={ISPRS Journal of Photogrammetry and Remote Sensing},
100 | volume = {194},
101 | pages = {209-221},
102 | year={2022},
103 | issn = {0924-2716}
104 | }
105 | ```
106 |
107 | ## Acknowledgments
108 | The code is based on [CMRNet](https://github.com/cattaneod/CMRNet), [RAFT](https://github.com/princeton-vl/RAFT), and [BPnP](https://github.com/BoChenYS/BPnP).
--------------------------------------------------------------------------------
/core/BPnP.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import cv2 as cv
3 | import numpy as np
4 | import kornia as kn
5 |
6 | PTS_D_DEVICE = 'cuda'
7 |
8 | class BPnP(torch.autograd.Function):
9 | """
10 | Back-propagatable PnP
11 | INPUTS:
12 | pts2d - the 2D keypoints coordinates of size [batch_size, num_keypoints, 2]
13 | pts3d - the 3D keypoints coordinates of size [num_keypoints, 3]
14 | K - the camera intrinsic matrix of size [3, 3]
15 | OUTPUT:
16 | P_6d - the 6 DOF poses of size [batch_size, 6], where the first 3 elements of each row are the angle-axis rotation
17 | vector (Euler vector) and the last 3 elements are the translation vector.
18 | NOTE:
19 | This BPnP function assumes that all sets of 2D points in the mini-batch correspond to one common set of 3D points.
20 | For situations where pts3d is also a mini-batch, use the BPnP_m3d class.
21 | """
22 | @staticmethod
23 | def forward(ctx, pts2d, pts3d, K, ini_pose=None):
24 | bs = pts2d.size(0)
25 | n = pts2d.size(1)
26 | device = pts2d.device
27 | pts3d_np = np.array(pts3d.detach().cpu())
28 | K_np = np.array(K.detach().cpu())
29 | P_6d = torch.zeros(bs,6,device=device)
30 |
31 | for i in range(bs):
32 | pts2d_i_np = np.ascontiguousarray(pts2d[i].detach().cpu()).reshape((n,1,2))
33 | if ini_pose is None:
34 | _, rvec0, T0, _ = cv.solvePnPRansac(objectPoints=pts3d_np, imagePoints=pts2d_i_np, cameraMatrix=K_np, distCoeffs=None, flags=cv.SOLVEPNP_ITERATIVE, confidence=0.9999 ,reprojectionError=3)
35 | else:
36 | rvec0 = np.array(ini_pose[i, 0:3].cpu().view(3, 1))
37 | T0 = np.array(ini_pose[i, 3:6].cpu().view(3, 1))
38 | _, rvec, T = cv.solvePnP(objectPoints=pts3d_np, imagePoints=pts2d_i_np, cameraMatrix=K_np, distCoeffs=None, flags=cv.SOLVEPNP_ITERATIVE, useExtrinsicGuess=True, rvec=rvec0, tvec=T0)
39 | angle_axis = torch.tensor(rvec,device=device,dtype=torch.float).view(1, 3)
40 | T = torch.tensor(T,device=device,dtype=torch.float).view(1, 3)
41 | P_6d[i,:] = torch.cat((angle_axis,T),dim=-1)
42 |
43 | ctx.save_for_backward(pts2d,P_6d,pts3d,K)
44 | return P_6d
45 |
46 | @staticmethod
47 | def backward(ctx, grad_output):
48 |
49 | pts2d, P_6d, pts3d, K = ctx.saved_tensors
50 | device = pts2d.device
51 | bs = pts2d.size(0)
52 | n = pts2d.size(1)
53 | m = 6
54 |
55 | grad_x = torch.zeros_like(pts2d)
56 | grad_z = torch.zeros_like(pts3d)
57 | grad_K = torch.zeros_like(K)
58 |
59 | for i in range(bs):
60 | J_fy = torch.zeros(m,m, device=device)
61 | J_fx = torch.zeros(m,2*n, device=device)
62 | J_fz = torch.zeros(m,3*n, device=device)
63 | J_fK = torch.zeros(m, 9, device=device)
64 |
65 | torch.set_grad_enabled(True)
66 | pts2d_flat = pts2d[i].clone().view(-1).detach().requires_grad_()
67 | P_6d_flat = P_6d[i].clone().view(-1).detach().requires_grad_()
68 | pts3d_flat = pts3d.clone().view(-1).detach().requires_grad_()
69 | K_flat = K.clone().view(-1).detach().requires_grad_()
70 |
71 | for j in range(m):
72 | torch.set_grad_enabled(True)
73 | if j > 0:
74 | pts2d_flat.grad.zero_()
75 | P_6d_flat.grad.zero_()
76 | pts3d_flat.grad.zero_()
77 | K_flat.grad.zero_()
78 |
79 | R = kn.angle_axis_to_rotation_matrix(P_6d_flat[0:m-3].view(1,3))
80 |
81 | P = torch.cat((R[0,0:3,0:3].view(3,3), P_6d_flat[m-3:m].view(3,1)),dim=-1)
82 | KP = torch.mm(K_flat.view(3,3), P)
83 | pts2d_i = pts2d_flat.view(n,2).transpose(0,1)
84 | pts3d_i = torch.cat((pts3d_flat.view(n,3),torch.ones(n,1,device=device)),dim=-1).t()
85 | proj_i = KP.mm(pts3d_i)
86 | Si = proj_i[2,:].view(1,n)
87 |
88 | r = pts2d_i*Si-proj_i[0:2,:]
89 | coefs = get_coefs(P_6d_flat.view(1,6), pts3d_flat.view(n,3), K_flat.view(3,3))
90 | coef = coefs[:,:,j].transpose(0,1) # size: [2,n]
91 | fj = (coef*r).sum()
92 | fj.backward()
93 | J_fy[j,:] = P_6d_flat.grad.clone()
94 | J_fx[j,:] = pts2d_flat.grad.clone()
95 | J_fz[j,:] = pts3d_flat.grad.clone()
96 | J_fK[j,:] = K_flat.grad.clone()
97 |
98 | inv_J_fy = torch.inverse(J_fy)
99 |
100 | J_yx = (-1) * torch.mm(inv_J_fy, J_fx)
101 | J_yz = (-1) * torch.mm(inv_J_fy, J_fz)
102 | J_yK = (-1) * torch.mm(inv_J_fy, J_fK)
103 |
104 | grad_x[i] = grad_output[i].view(1,m).mm(J_yx).view(n,2)
105 | grad_z += grad_output[i].view(1,m).mm(J_yz).view(n,3)
106 | grad_K += grad_output[i].view(1,m).mm(J_yK).view(3,3)
107 |
108 | return grad_x, grad_z, grad_K, None
109 |
110 |
111 | class BPnP_m3d(torch.autograd.Function):
112 | """
113 | BPnP_m3d supports mini-batch intputs of 3D keypoints, where the i-th set of 2D keypoints correspond to the i-th set of 3D keypoints.
114 | INPUTS:
115 | pts2d - the 2D keypoints coordinates of size [batch_size, num_keypoints, 2]
116 | pts3d - the 3D keypoints coordinates of size [batch_size, num_keypoints, 3]
117 | K - the camera intrinsic matrix of size [3, 3]
118 | OUTPUT:
119 | P_6d - the 6 DOF poses of size [batch_size, 6], where the first 3 elements of each row are the angle-axis rotation
120 | vector (Euler vector) and the last 3 elements are the translation vector.
121 | NOTE:
122 | For situations where all sets of 2D points in the mini-batch correspond to one common set of 3D points, use the BPnP class.
123 | """
124 | @staticmethod
125 | def forward(ctx, pts2d, pts3d, K, ini_pose=None):
126 | bs = pts2d.size(0)
127 | n = pts2d.size(1)
128 | device = pts2d.device
129 | K_np = np.array(K.detach().cpu())
130 | P_6d = torch.zeros(bs,6,device=device)
131 |
132 | for i in range(bs):
133 | pts2d_i_np = np.ascontiguousarray(pts2d[i].detach().cpu()).reshape((n,1,2))
134 | pts3d_i_np = np.ascontiguousarray(pts3d[i].detach().cpu()).reshape((n,3))
135 | if ini_pose is None:
136 | _, rvec0, T0, _ = cv.solvePnPRansac(objectPoints=pts3d_i_np, imagePoints=pts2d_i_np, cameraMatrix=K_np, distCoeffs=None, flags=cv.SOLVEPNP_ITERATIVE, confidence=0.9999 ,reprojectionError=1)
137 | else:
138 | rvec0 = np.array(ini_pose[i, 0:3].cpu().view(3, 1))
139 | T0 = np.array(ini_pose[i, 3:6].cpu().view(3, 1))
140 | _, rvec, T = cv.solvePnP(objectPoints=pts3d_i_np, imagePoints=pts2d_i_np, cameraMatrix=K_np, distCoeffs=None, flags=cv.SOLVEPNP_ITERATIVE, useExtrinsicGuess=True, rvec=rvec0, tvec=T0)
141 | angle_axis = torch.tensor(rvec,device=device,dtype=torch.float).view(1, 3)
142 | T = torch.tensor(T,device=device,dtype=torch.float).view(1, 3)
143 | P_6d[i,:] = torch.cat((angle_axis,T),dim=-1)
144 |
145 | ctx.save_for_backward(pts2d,P_6d,pts3d,K)
146 | return P_6d
147 |
148 | @staticmethod
149 | def backward(ctx, grad_output):
150 |
151 | pts2d, P_6d, pts3d, K = ctx.saved_tensors
152 | device = PTS_D_DEVICE
153 | bs = pts2d.size(0)
154 | n = pts2d.size(1)
155 | m = 6
156 |
157 | grad_x = torch.zeros_like(pts2d)
158 | grad_z = torch.zeros_like(pts3d)
159 | grad_K = torch.zeros_like(K)
160 |
161 | for i in range(bs):
162 | J_fy = torch.zeros(m,m, device=device)
163 | J_fx = torch.zeros(m,2*n, device=device)
164 | J_fz = torch.zeros(m,3*n, device=device)
165 | J_fK = torch.zeros(m, 9, device=device)
166 |
167 | torch.set_grad_enabled(True)
168 | pts2d_flat = pts2d[i].clone().view(-1).detach().requires_grad_()
169 | P_6d_flat = P_6d[i].clone().view(-1).detach().requires_grad_()
170 | pts3d_flat = pts3d[i].clone().view(-1).detach().requires_grad_()
171 | K_flat = K.clone().view(-1).detach().requires_grad_()
172 |
173 | for j in range(m):
174 | torch.set_grad_enabled(True)
175 | if j > 0:
176 | pts2d_flat.grad.zero_()
177 | P_6d_flat.grad.zero_()
178 | pts3d_flat.grad.zero_()
179 | K_flat.grad.zero_()
180 |
181 | R = kn.angle_axis_to_rotation_matrix(P_6d_flat[0:m-3].view(1,3))
182 |
183 | P = torch.cat((R[0,0:3,0:3].view(3,3), P_6d_flat[m-3:m].view(3,1)),dim=-1)
184 | KP = torch.mm(K_flat.view(3,3), P)
185 | pts2d_i = pts2d_flat.view(n,2).transpose(0,1)
186 | pts3d_i = torch.cat((pts3d_flat.view(n,3),torch.ones(n,1,device=device)),dim=-1).t()
187 | proj_i = KP.mm(pts3d_i)
188 | Si = proj_i[2,:].view(1,n)
189 |
190 | r = pts2d_i*Si-proj_i[0:2,:]
191 | coefs = get_coefs(P_6d_flat.view(1,6), pts3d_flat.view(n,3), K_flat.view(3,3))
192 | coef = coefs[:,:,j].transpose(0,1) # size: [2,n]
193 | fj = (coef*r).sum()
194 | fj.backward()
195 | J_fy[j,:] = P_6d_flat.grad.clone()
196 | J_fx[j,:] = pts2d_flat.grad.clone()
197 | J_fz[j,:] = pts3d_flat.grad.clone()
198 | J_fK[j,:] = K_flat.grad.clone()
199 |
200 | inv_J_fy = torch.inverse(J_fy)
201 |
202 | J_yx = (-1) * torch.mm(inv_J_fy, J_fx)
203 | J_yz = (-1) * torch.mm(inv_J_fy, J_fz)
204 | J_yK = (-1) * torch.mm(inv_J_fy, J_fK)
205 |
206 | grad_x[i] = grad_output[i].view(1,m).mm(J_yx).view(n,2)
207 | grad_z[i] = grad_output[i].view(1,m).mm(J_yz).view(n,3)
208 | grad_K += grad_output[i].view(1,m).mm(J_yK).view(3,3)
209 |
210 | return grad_x, grad_z, grad_K, None
211 |
212 |
213 | class BPnP_fast(torch.autograd.Function):
214 | """
215 | BPnP_fast is the efficient version of the BPnP class which ignores the higher order dirivatives through the coefs' graph. This sacrifices
216 | negligible gradient accuracy yet saves significant runtime.
217 | INPUTS:
218 | pts2d - the 2D keypoints coordinates of size [batch_size, num_keypoints, 2]
219 | pts3d - the 3D keypoints coordinates of size [num_keypoints, 3]
220 | K - the camera intrinsic matrix of size [3, 3]
221 | OUTPUT:
222 | P_6d - the 6 DOF poses of size [batch_size, 6], where the first 3 elements of each row are the angle-axis rotation
223 | vector (Euler vector) and the last 3 elements are the translation vector.
224 | NOTE:
225 | This BPnP function assumes that all sets of 2D points in the mini-batch correspond to one common set of 3D points.
226 | For situations where pts3d is also a mini-batch, use the BPnP_m3d class.
227 | """
228 | @staticmethod
229 | def forward(ctx, pts2d, pts3d, K, ini_pose=None):
230 | bs = pts2d.size(0)
231 | n = pts2d.size(1)
232 | device = pts2d.device
233 | pts3d_np = np.array(pts3d.detach().cpu())
234 | K_np = np.array(K.detach().cpu())
235 | P_6d = torch.zeros(bs,6,device=device)
236 |
237 | for i in range(bs):
238 | pts2d_i_np = np.ascontiguousarray(pts2d[i].detach().cpu()).reshape((n,1,2))
239 | if ini_pose is None:
240 | _, rvec0, T0, _ = cv.solvePnPRansac(objectPoints=pts3d_np, imagePoints=pts2d_i_np, cameraMatrix=K_np, distCoeffs=None, flags=cv.SOLVEPNP_ITERATIVE, confidence=0.9999 ,reprojectionError=3)
241 | else:
242 | rvec0 = np.array(ini_pose[i, 0:3].cpu().view(3, 1))
243 | T0 = np.array(ini_pose[i, 3:6].cpu().view(3, 1))
244 | _, rvec, T = cv.solvePnP(objectPoints=pts3d_np, imagePoints=pts2d_i_np, cameraMatrix=K_np, distCoeffs=None, flags=cv.SOLVEPNP_ITERATIVE, useExtrinsicGuess=True, rvec=rvec0, tvec=T0)
245 | angle_axis = torch.tensor(rvec,device=device,dtype=torch.float).view(1, 3)
246 | T = torch.tensor(T,device=device,dtype=torch.float).view(1, 3)
247 | P_6d[i,:] = torch.cat((angle_axis,T),dim=-1)
248 |
249 | ctx.save_for_backward(pts2d,P_6d,pts3d,K)
250 | return P_6d
251 |
252 | @staticmethod
253 | def backward(ctx, grad_output):
254 |
255 | pts2d, P_6d, pts3d, K = ctx.saved_tensors
256 | device = pts2d.device
257 | bs = pts2d.size(0)
258 | n = pts2d.size(1)
259 | m = 6
260 |
261 | grad_x = torch.zeros_like(pts2d)
262 | grad_z = torch.zeros_like(pts3d)
263 | grad_K = torch.zeros_like(K)
264 |
265 | for i in range(bs):
266 | J_fy = torch.zeros(m,m, device=device)
267 | J_fx = torch.zeros(m,2*n, device=device)
268 | J_fz = torch.zeros(m,3*n, device=device)
269 | J_fK = torch.zeros(m, 9, device=device)
270 |
271 | coefs = get_coefs(P_6d[i].view(1,6), pts3d, K, create_graph=False).detach()
272 |
273 | pts2d_flat = pts2d[i].clone().view(-1).detach().requires_grad_()
274 | P_6d_flat = P_6d[i].clone().view(-1).detach().requires_grad_()
275 | pts3d_flat = pts3d.clone().view(-1).detach().requires_grad_()
276 | K_flat = K.clone().view(-1).detach().requires_grad_()
277 |
278 | for j in range(m):
279 | torch.set_grad_enabled(True)
280 | if j > 0:
281 | pts2d_flat.grad.zero_()
282 | P_6d_flat.grad.zero_()
283 | pts3d_flat.grad.zero_()
284 | K_flat.grad.zero_()
285 |
286 | R = kn.angle_axis_to_rotation_matrix(P_6d_flat[0:m-3].view(1,3))
287 |
288 | P = torch.cat((R[0,0:3,0:3].view(3,3), P_6d_flat[m-3:m].view(3,1)),dim=-1)
289 | KP = torch.mm(K_flat.view(3,3), P)
290 | pts2d_i = pts2d_flat.view(n,2).transpose(0,1)
291 | pts3d_i = torch.cat((pts3d_flat.view(n,3),torch.ones(n,1,device=device)),dim=-1).t()
292 | proj_i = KP.mm(pts3d_i)
293 | Si = proj_i[2,:].view(1,n)
294 |
295 | r = pts2d_i*Si-proj_i[0:2,:]
296 | coef = coefs[:,:,j].transpose(0,1) # size: [2,n]
297 | fj = (coef*r).sum()
298 | fj.backward()
299 | J_fy[j,:] = P_6d_flat.grad.clone()
300 | J_fx[j,:] = pts2d_flat.grad.clone()
301 | J_fz[j,:] = pts3d_flat.grad.clone()
302 | J_fK[j,:] = K_flat.grad.clone()
303 |
304 | inv_J_fy = torch.inverse(J_fy)
305 |
306 | J_yx = (-1) * torch.mm(inv_J_fy, J_fx)
307 | J_yz = (-1) * torch.mm(inv_J_fy, J_fz)
308 | J_yK = (-1) * torch.mm(inv_J_fy, J_fK)
309 |
310 | grad_x[i] = grad_output[i].view(1,m).mm(J_yx).view(n,2)
311 | grad_z += grad_output[i].view(1,m).mm(J_yz).view(n,3)
312 | grad_K += grad_output[i].view(1,m).mm(J_yK).view(3,3)
313 |
314 | return grad_x, grad_z, grad_K, None
315 |
316 |
317 | def get_coefs(P_6d, pts3d, K, create_graph=True):
318 | device = P_6d.device
319 | n = pts3d.size(0)
320 | m = P_6d.size(-1)
321 | coefs = torch.zeros(n,2,m,device=device)
322 | torch.set_grad_enabled(True)
323 | y = P_6d.repeat(n,1)
324 | proj = batch_project(y, pts3d, K).squeeze()
325 | vec = torch.diag(torch.ones(n,device=device).float())
326 | for k in range(2):
327 | torch.set_grad_enabled(True)
328 | y_grad = torch.autograd.grad(proj[:,:,k],y,vec, retain_graph=True, create_graph=create_graph)
329 | coefs[:,k,:] = -2*y_grad[0].clone()
330 | return coefs
331 |
332 | def batch_project(P, pts3d, K, angle_axis=True):
333 | n = pts3d.size(0)
334 | bs = P.size(0)
335 | device = P.device
336 | pts3d_h = torch.cat((pts3d, torch.ones(n, 1, device=device)), dim=-1)
337 | if angle_axis:
338 | R_out = kn.angle_axis_to_rotation_matrix(P[:, 0:3].view(bs, 3))
339 | PM = torch.cat((R_out[:,0:3,0:3], P[:, 3:6].view(bs, 3, 1)), dim=-1)
340 | else:
341 | PM = P
342 | pts3d_cam = pts3d_h.matmul(PM.transpose(-2,-1))
343 | pts2d_proj = pts3d_cam.matmul(K.t())
344 | S = pts2d_proj[:,:, 2].view(bs, n, 1)
345 | pts2d_pro = pts2d_proj[:,:,0:2].div(S)
346 |
347 | return pts2d_pro
348 |
349 |
350 |
351 |
352 |
353 |
--------------------------------------------------------------------------------
/core/camera_model.py:
--------------------------------------------------------------------------------
1 | # -------------------------------------------------------------------
2 | # Copyright (C) 2020 Università degli studi di Milano-Bicocca, iralab
3 | # Author: Daniele Cattaneo (d.cattaneo10@campus.unimib.it)
4 | # Released under Creative Commons
5 | # Attribution-NonCommercial-ShareAlike 4.0 International License.
6 | # http://creativecommons.org/licenses/by-nc-sa/4.0/
7 | # -------------------------------------------------------------------
8 | import numpy as np
9 | import torch
10 | import sys
11 | sys.path.append("core")
12 | from utils_point import rotate_forward, rotate_back
13 |
14 | class CameraModel:
15 | def __init__(self, focal_length=None, principal_point=None):
16 | self.focal_length = focal_length
17 | self.principal_point = principal_point
18 |
19 | def project_pytorch(self, xyz: torch.Tensor, image_size, reflectance=None):
20 | if xyz.shape[0] == 3:
21 | xyz = torch.cat([xyz, torch.ones(1, xyz.shape[1], device=xyz.device)])
22 | else:
23 | if not torch.all(xyz[3, :] == 1.):
24 | xyz[3, :] = 1.
25 | raise TypeError("Wrong Coordinates")
26 | order = [1, 2, 0, 3]
27 | xyzw = xyz[order, :]
28 | indexes = xyzw[2, :] >= 0
29 | if reflectance is not None:
30 | reflectance = reflectance[:, indexes]
31 | xyzw = xyzw[:, indexes]
32 |
33 | uv = torch.zeros((2, xyzw.shape[1]), device=xyzw.device)
34 | uv[0, :] = self.focal_length[0] * xyzw[0, :] / xyzw[2, :] + self.principal_point[0]
35 | uv[1, :] = self.focal_length[1] * xyzw[1, :] / xyzw[2, :] + self.principal_point[1]
36 | indexes = uv[0, :] >= 0.1
37 | indexes = indexes & (uv[1, :] >= 0.1)
38 | indexes = indexes & (uv[0,:] < image_size[1])
39 | indexes = indexes & (uv[1,:] < image_size[0])
40 | if reflectance is None:
41 | uv = uv[:, indexes], xyzw[2, indexes], xyzw[:3, indexes], None
42 | else:
43 | uv = uv[:, indexes], xyzw[2, indexes], xyzw[:3, indexes], reflectance[:, indexes]
44 |
45 | return uv
46 |
47 | # for pc_RT
48 | def project_withindex_pytorch(self, xyz: torch.Tensor, image_size, reflectance=None):
49 | if xyz.shape[0] == 3:
50 | xyz = torch.cat([xyz, torch.ones(1, xyz.shape[1], device=xyz.device)])
51 | else:
52 | if not torch.all(xyz[3, :] == 1.):
53 | xyz[3, :] = 1.
54 | raise TypeError("Wrong Coordinates")
55 | order = [1, 2, 0, 3]
56 | xyzw = xyz[order, :]
57 | indexes = xyzw[2, :] >= 0
58 | if reflectance is not None:
59 | reflectance = reflectance[:, indexes]
60 | xyzw = xyzw[:, indexes]
61 |
62 | VI_indexes = indexes
63 |
64 |
65 | uv = torch.zeros((2, xyzw.shape[1]), device=xyzw.device)
66 | uv[0, :] = self.focal_length[0] * xyzw[0, :] / xyzw[2, :] + self.principal_point[0]
67 | uv[1, :] = self.focal_length[1] * xyzw[1, :] / xyzw[2, :] + self.principal_point[1]
68 | indexes = uv[0, :] >= 0.1
69 | indexes = indexes & (uv[1, :] >= 0.1)
70 | indexes = indexes & (uv[0, :] < image_size[1])
71 | indexes = indexes & (uv[1, :] < image_size[0])
72 |
73 | # generate complete indexes
74 | ind = torch.where(VI_indexes == True)[0]
75 | VI_indexes[ind] = VI_indexes[ind] & indexes
76 |
77 |
78 | if reflectance is None:
79 | uv = uv[:, indexes], xyzw[2, indexes], xyzw[:3, indexes], None, VI_indexes
80 | else:
81 | uv = uv[:, indexes], xyzw[2, indexes], xyzw[:3, indexes], reflectance[:, indexes], VI_indexes
82 |
83 | return uv
84 |
85 |
86 | def get_matrix(self):
87 | matrix = np.zeros([3, 3])
88 | matrix[0, 0] = self.focal_length[0]
89 | matrix[1, 1] = self.focal_length[1]
90 | matrix[0, 2] = self.principal_point[0]
91 | matrix[1, 2] = self.principal_point[1]
92 | matrix[2, 2] = 1.0
93 | return matrix
94 |
95 | def deproject_pytorch(self, uv, pc_project_uv):
96 | index = np.argwhere(uv > 0)
97 | mask = uv > 0
98 | z = uv[mask]
99 | # x = (index[:, 1] - self.principal_point[0].cpu().numpy()) * z / self.focal_length[0].cpu().numpy()
100 | # y = (index[:, 0] - self.principal_point[1].cpu().numpy()) * z / self.focal_length[1].cpu().numpy()
101 | x = (index[:, 1] - self.principal_point[0]) * z / self.focal_length[0]
102 | y = (index[:, 0] - self.principal_point[1]) * z / self.focal_length[1]
103 | zxy = np.array([z, x, y])
104 | zxy = torch.tensor(zxy, dtype=torch.float32)
105 | zxyw = torch.cat([zxy, torch.ones(1, zxy.shape[1], device=zxy.device)])
106 | zxy = zxyw[:3, :]
107 | zxy = zxy.cpu().numpy()
108 | xyz = zxy[[1, 2, 0], :]
109 |
110 |
111 | pc_project_u = pc_project_uv[:, :, 0][mask]
112 | pc_project_v = pc_project_uv[:, :, 1][mask]
113 | pc_project = np.array([pc_project_v, pc_project_u])
114 |
115 |
116 | match_index = np.array([index[:, 0], index[:, 1]])
117 |
118 | return xyz.transpose(), pc_project.transpose(), match_index.transpose()
119 |
120 | def depth2pc(self, depth_img):
121 | # pc = torch.zeros([depth_img.shape[0], depth_img.shape[1], 3])
122 | # pc[:, :, 2] = depth_img
123 | #
124 | # for i in range(depth_img.shape[0]):
125 | # for j in range(depth_img.shape[1]):
126 | # pc[i, j, 0] = (j - self.principal_point[0]) * depth_img[i, j] / self.focal_length[0]
127 | # pc[i, j, 1] = (i - self.principal_point[1]) * depth_img[i, j] / self.focal_length[1]
128 |
129 | depth_img = depth_img.cpu().numpy()
130 | index = np.argwhere(depth_img > 0)
131 | mask = depth_img > 0
132 | z = depth_img[mask]
133 | x = (index[:, 1] - self.principal_point[0].cpu().numpy()) * z / self.focal_length[0].cpu().numpy()
134 | y = (index[:, 0] - self.principal_point[1].cpu().numpy()) * z / self.focal_length[1].cpu().numpy()
135 |
136 | # pc[index[:, 0], index[:, 1], 0] = x
137 | # pc[index[:, 0], index[:, 1], 1] = y
138 | # pc[index[:, 0], index[:, 1], 2] = z
139 | zxy = np.array([z, x, y], dtype=np.float32)
140 | return zxy
--------------------------------------------------------------------------------
/core/corr.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn.functional as F
3 | import sys
4 | sys.path.append("core")
5 | from utils import bilinear_sampler, coords_grid
6 |
7 | try:
8 | import alt_cuda_corr
9 | except:
10 | # alt_cuda_corr is not compiled
11 | pass
12 |
13 |
14 | class CorrBlock:
15 | def __init__(self, fmap1, fmap2, num_levels=4, radius=4):
16 | self.num_levels = num_levels
17 | self.radius = radius
18 | self.corr_pyramid = []
19 |
20 | # all pairs correlation
21 | corr = CorrBlock.corr(fmap1, fmap2)
22 |
23 | batch, h1, w1, dim, h2, w2 = corr.shape
24 | corr = corr.reshape(batch * h1 * w1, dim, h2, w2)
25 |
26 | self.corr_pyramid.append(corr)
27 | for i in range(self.num_levels - 1):
28 | corr = F.avg_pool2d(corr, 2, stride=2)
29 | self.corr_pyramid.append(corr)
30 |
31 | def __call__(self, coords):
32 | r = self.radius
33 | coords = coords.permute(0, 2, 3, 1)
34 | batch, h1, w1, _ = coords.shape
35 |
36 | out_pyramid = []
37 | for i in range(self.num_levels):
38 | corr = self.corr_pyramid[i]
39 | dx = torch.linspace(-r, r, 2 * r + 1)
40 | dy = torch.linspace(-r, r, 2 * r + 1)
41 | delta = torch.stack(torch.meshgrid(dy, dx), axis=-1).to(coords.device)
42 |
43 | centroid_lvl = coords.reshape(batch * h1 * w1, 1, 1, 2) / 2 ** i
44 | delta_lvl = delta.view(1, 2 * r + 1, 2 * r + 1, 2)
45 | coords_lvl = centroid_lvl + delta_lvl
46 |
47 | corr = bilinear_sampler(corr, coords_lvl)
48 | corr = corr.view(batch, h1, w1, -1)
49 | out_pyramid.append(corr)
50 |
51 | out = torch.cat(out_pyramid, dim=-1)
52 | return out.permute(0, 3, 1, 2).contiguous().float()
53 |
54 | @staticmethod
55 | def corr(fmap1, fmap2):
56 | batch, dim, ht, wd = fmap1.shape
57 | fmap1 = fmap1.view(batch, dim, ht * wd)
58 | fmap2 = fmap2.view(batch, dim, ht * wd)
59 |
60 | corr = torch.matmul(fmap1.transpose(1, 2), fmap2)
61 | corr = corr.view(batch, ht, wd, 1, ht, wd)
62 | return corr / torch.sqrt(torch.tensor(dim).float())
63 |
64 |
65 | class AlternateCorrBlock:
66 | def __init__(self, fmap1, fmap2, num_levels=4, radius=4):
67 | self.num_levels = num_levels
68 | self.radius = radius
69 |
70 | self.pyramid = [(fmap1, fmap2)]
71 | for i in range(self.num_levels):
72 | fmap1 = F.avg_pool2d(fmap1, 2, stride=2)
73 | fmap2 = F.avg_pool2d(fmap2, 2, stride=2)
74 | self.pyramid.append((fmap1, fmap2))
75 |
76 | def __call__(self, coords):
77 | coords = coords.permute(0, 2, 3, 1)
78 | B, H, W, _ = coords.shape
79 | dim = self.pyramid[0][0].shape[1]
80 |
81 | corr_list = []
82 | for i in range(self.num_levels):
83 | r = self.radius
84 | fmap1_i = self.pyramid[0][0].permute(0, 2, 3, 1).contiguous()
85 | fmap2_i = self.pyramid[i][1].permute(0, 2, 3, 1).contiguous()
86 |
87 | coords_i = (coords / 2 ** i).reshape(B, 1, H, W, 2).contiguous()
88 | corr, = alt_cuda_corr.forward(fmap1_i, fmap2_i, coords_i, r)
89 | corr_list.append(corr.squeeze(1))
90 |
91 | corr = torch.stack(corr_list, dim=1)
92 | corr = corr.reshape(B, -1, H, W)
93 | return corr / torch.sqrt(torch.tensor(dim).float())
94 |
--------------------------------------------------------------------------------
/core/correlation_package/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright 2017 NVIDIA CORPORATION
2 |
3 | Licensed under the Apache License, Version 2.0 (the "License");
4 | you may not use this file except in compliance with the License.
5 | You may obtain a copy of the License at
6 |
7 | http://www.apache.org/licenses/LICENSE-2.0
8 |
9 | Unless required by applicable law or agreed to in writing, software
10 | distributed under the License is distributed on an "AS IS" BASIS,
11 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | See the License for the specific language governing permissions and
13 | limitations under the License.
--------------------------------------------------------------------------------
/core/correlation_package/correlation.py:
--------------------------------------------------------------------------------
1 | import correlation_cuda
2 | import torch
3 | from torch.autograd import Function
4 | from torch.nn.modules.module import Module
5 |
6 |
7 | class CorrelationFunction(Function):
8 |
9 | def __init__(self, pad_size=3, kernel_size=3, max_displacement=20, stride1=1, stride2=2, corr_multiply=1):
10 | super(CorrelationFunction, self).__init__()
11 | self.pad_size = pad_size
12 | self.kernel_size = kernel_size
13 | self.max_displacement = max_displacement
14 | self.stride1 = stride1
15 | self.stride2 = stride2
16 | self.corr_multiply = corr_multiply
17 | # self.out_channel = ((max_displacement/stride2)*2 + 1) * ((max_displacement/stride2)*2 + 1)
18 |
19 | def forward(self, input1, input2):
20 | self.save_for_backward(input1, input2)
21 |
22 | with torch.cuda.device_of(input1):
23 | rbot1 = input1.new()
24 | rbot2 = input2.new()
25 | output = input1.new()
26 |
27 | correlation_cuda.forward(input1, input2, rbot1, rbot2, output,
28 | self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply)
29 |
30 | return output
31 |
32 | def backward(self, grad_output):
33 | input1, input2 = self.saved_tensors
34 |
35 | with torch.cuda.device_of(input1):
36 | rbot1 = input1.new()
37 | rbot2 = input2.new()
38 |
39 | grad_input1 = input1.new()
40 | grad_input2 = input2.new()
41 |
42 | correlation_cuda.backward(input1, input2, rbot1, rbot2, grad_output, grad_input1, grad_input2,
43 | self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply)
44 |
45 | return grad_input1, grad_input2
46 |
47 |
48 | class Correlation(Module):
49 | def __init__(self, pad_size=0, kernel_size=0, max_displacement=0, stride1=1, stride2=2, corr_multiply=1):
50 | super(Correlation, self).__init__()
51 | self.pad_size = pad_size
52 | self.kernel_size = kernel_size
53 | self.max_displacement = max_displacement
54 | self.stride1 = stride1
55 | self.stride2 = stride2
56 | self.corr_multiply = corr_multiply
57 |
58 | def forward(self, input1, input2):
59 |
60 | result = CorrelationFunction(self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply)(input1, input2)
61 |
62 | return result
63 |
--------------------------------------------------------------------------------
/core/correlation_package/correlation_cuda.cc:
--------------------------------------------------------------------------------
1 | #include
2 | #include
3 | #include
4 | #include
5 | #include
6 |
7 | #include "correlation_cuda_kernel.cuh"
8 |
9 | int correlation_forward_cuda(at::Tensor& input1, at::Tensor& input2, at::Tensor& rInput1, at::Tensor& rInput2, at::Tensor& output,
10 | int pad_size,
11 | int kernel_size,
12 | int max_displacement,
13 | int stride1,
14 | int stride2,
15 | int corr_type_multiply)
16 | {
17 |
18 | int batchSize = input1.size(0);
19 |
20 | int nInputChannels = input1.size(1);
21 | int inputHeight = input1.size(2);
22 | int inputWidth = input1.size(3);
23 |
24 | int kernel_radius = (kernel_size - 1) / 2;
25 | int border_radius = kernel_radius + max_displacement;
26 |
27 | int paddedInputHeight = inputHeight + 2 * pad_size;
28 | int paddedInputWidth = inputWidth + 2 * pad_size;
29 |
30 | int nOutputChannels = ((max_displacement/stride2)*2 + 1) * ((max_displacement/stride2)*2 + 1);
31 |
32 | int outputHeight = ceil(static_cast(paddedInputHeight - 2 * border_radius) / static_cast(stride1));
33 | int outputwidth = ceil(static_cast(paddedInputWidth - 2 * border_radius) / static_cast(stride1));
34 |
35 | rInput1.resize_({batchSize, paddedInputHeight, paddedInputWidth, nInputChannels});
36 | rInput2.resize_({batchSize, paddedInputHeight, paddedInputWidth, nInputChannels});
37 | output.resize_({batchSize, nOutputChannels, outputHeight, outputwidth});
38 |
39 | rInput1.fill_(0);
40 | rInput2.fill_(0);
41 | output.fill_(0);
42 |
43 | int success = correlation_forward_cuda_kernel(
44 | output,
45 | output.size(0),
46 | output.size(1),
47 | output.size(2),
48 | output.size(3),
49 | output.stride(0),
50 | output.stride(1),
51 | output.stride(2),
52 | output.stride(3),
53 | input1,
54 | input1.size(1),
55 | input1.size(2),
56 | input1.size(3),
57 | input1.stride(0),
58 | input1.stride(1),
59 | input1.stride(2),
60 | input1.stride(3),
61 | input2,
62 | input2.size(1),
63 | input2.stride(0),
64 | input2.stride(1),
65 | input2.stride(2),
66 | input2.stride(3),
67 | rInput1,
68 | rInput2,
69 | pad_size,
70 | kernel_size,
71 | max_displacement,
72 | stride1,
73 | stride2,
74 | corr_type_multiply,
75 | at::cuda::getCurrentCUDAStream()
76 | );
77 |
78 | //check for errors
79 | if (!success) {
80 | AT_ERROR("CUDA call failed");
81 | }
82 |
83 | return 1;
84 |
85 | }
86 |
87 | int correlation_backward_cuda(at::Tensor& input1, at::Tensor& input2, at::Tensor& rInput1, at::Tensor& rInput2, at::Tensor& gradOutput,
88 | at::Tensor& gradInput1, at::Tensor& gradInput2,
89 | int pad_size,
90 | int kernel_size,
91 | int max_displacement,
92 | int stride1,
93 | int stride2,
94 | int corr_type_multiply)
95 | {
96 |
97 | int batchSize = input1.size(0);
98 | int nInputChannels = input1.size(1);
99 | int paddedInputHeight = input1.size(2)+ 2 * pad_size;
100 | int paddedInputWidth = input1.size(3)+ 2 * pad_size;
101 |
102 | int height = input1.size(2);
103 | int width = input1.size(3);
104 |
105 | rInput1.resize_({batchSize, paddedInputHeight, paddedInputWidth, nInputChannels});
106 | rInput2.resize_({batchSize, paddedInputHeight, paddedInputWidth, nInputChannels});
107 | gradInput1.resize_({batchSize, nInputChannels, height, width});
108 | gradInput2.resize_({batchSize, nInputChannels, height, width});
109 |
110 | rInput1.fill_(0);
111 | rInput2.fill_(0);
112 | gradInput1.fill_(0);
113 | gradInput2.fill_(0);
114 |
115 | int success = correlation_backward_cuda_kernel(gradOutput,
116 | gradOutput.size(0),
117 | gradOutput.size(1),
118 | gradOutput.size(2),
119 | gradOutput.size(3),
120 | gradOutput.stride(0),
121 | gradOutput.stride(1),
122 | gradOutput.stride(2),
123 | gradOutput.stride(3),
124 | input1,
125 | input1.size(1),
126 | input1.size(2),
127 | input1.size(3),
128 | input1.stride(0),
129 | input1.stride(1),
130 | input1.stride(2),
131 | input1.stride(3),
132 | input2,
133 | input2.stride(0),
134 | input2.stride(1),
135 | input2.stride(2),
136 | input2.stride(3),
137 | gradInput1,
138 | gradInput1.stride(0),
139 | gradInput1.stride(1),
140 | gradInput1.stride(2),
141 | gradInput1.stride(3),
142 | gradInput2,
143 | gradInput2.size(1),
144 | gradInput2.stride(0),
145 | gradInput2.stride(1),
146 | gradInput2.stride(2),
147 | gradInput2.stride(3),
148 | rInput1,
149 | rInput2,
150 | pad_size,
151 | kernel_size,
152 | max_displacement,
153 | stride1,
154 | stride2,
155 | corr_type_multiply,
156 | at::cuda::getCurrentCUDAStream()
157 | );
158 |
159 | if (!success) {
160 | AT_ERROR("CUDA call failed");
161 | }
162 |
163 | return 1;
164 | }
165 |
166 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
167 | m.def("forward", &correlation_forward_cuda, "Correlation forward (CUDA)");
168 | m.def("backward", &correlation_backward_cuda, "Correlation backward (CUDA)");
169 | }
170 |
171 |
--------------------------------------------------------------------------------
/core/correlation_package/correlation_cuda_kernel.cu:
--------------------------------------------------------------------------------
1 | #include
2 |
3 | #include "correlation_cuda_kernel.cuh"
4 |
5 | #define CUDA_NUM_THREADS 1024
6 | #define THREADS_PER_BLOCK 32
7 | #define FULL_MASK 0xffffffff
8 |
9 | #include
10 | #include
11 | #include
12 | #include
13 |
14 | using at::Half;
15 |
16 | template
17 | __forceinline__ __device__ scalar_t warpReduceSum(scalar_t val) {
18 | for (int offset = 16; offset > 0; offset /= 2)
19 | val += __shfl_down_sync(FULL_MASK, val, offset);
20 | return val;
21 | }
22 |
23 | template
24 | __forceinline__ __device__ scalar_t blockReduceSum(scalar_t val) {
25 |
26 | static __shared__ scalar_t shared[32];
27 | int lane = threadIdx.x % warpSize;
28 | int wid = threadIdx.x / warpSize;
29 |
30 | val = warpReduceSum(val);
31 |
32 | if (lane == 0)
33 | shared[wid] = val;
34 |
35 | __syncthreads();
36 |
37 | val = (threadIdx.x < blockDim.x / warpSize) ? shared[lane] : 0;
38 |
39 | if (wid == 0)
40 | val = warpReduceSum(val);
41 |
42 | return val;
43 | }
44 |
45 |
46 | template
47 | __global__ void channels_first(const scalar_t* __restrict__ input, scalar_t* rinput, int channels, int height, int width, int pad_size)
48 | {
49 |
50 | // n (batch size), c (num of channels), y (height), x (width)
51 | int n = blockIdx.x;
52 | int y = blockIdx.y;
53 | int x = blockIdx.z;
54 |
55 | int ch_off = threadIdx.x;
56 | scalar_t value;
57 |
58 | int dimcyx = channels * height * width;
59 | int dimyx = height * width;
60 |
61 | int p_dimx = (width + 2 * pad_size);
62 | int p_dimy = (height + 2 * pad_size);
63 | int p_dimyxc = channels * p_dimy * p_dimx;
64 | int p_dimxc = p_dimx * channels;
65 |
66 | for (int c = ch_off; c < channels; c += THREADS_PER_BLOCK) {
67 | value = input[n * dimcyx + c * dimyx + y * width + x];
68 | rinput[n * p_dimyxc + (y + pad_size) * p_dimxc + (x + pad_size) * channels + c] = value;
69 | }
70 | }
71 |
72 |
73 | template
74 | __global__ void correlation_forward(scalar_t* __restrict__ output, const int nOutputChannels,
75 | const int outputHeight, const int outputWidth, const scalar_t* __restrict__ rInput1,
76 | const int nInputChannels, const int inputHeight, const int inputWidth,
77 | const scalar_t* __restrict__ rInput2, const int pad_size, const int kernel_size,
78 | const int max_displacement, const int stride1, const int stride2) {
79 |
80 | int32_t pInputWidth = inputWidth + 2 * pad_size;
81 | int32_t pInputHeight = inputHeight + 2 * pad_size;
82 |
83 | int32_t kernel_rad = (kernel_size - 1) / 2;
84 |
85 | int32_t displacement_rad = max_displacement / stride2;
86 |
87 | int32_t displacement_size = 2 * displacement_rad + 1;
88 |
89 | int32_t n = blockIdx.x;
90 | int32_t y1 = blockIdx.y * stride1 + max_displacement;
91 | int32_t x1 = blockIdx.z * stride1 + max_displacement;
92 | int32_t c = threadIdx.x;
93 |
94 | int32_t pdimyxc = pInputHeight * pInputWidth * nInputChannels;
95 |
96 | int32_t pdimxc = pInputWidth * nInputChannels;
97 |
98 | int32_t pdimc = nInputChannels;
99 |
100 | int32_t tdimcyx = nOutputChannels * outputHeight * outputWidth;
101 | int32_t tdimyx = outputHeight * outputWidth;
102 | int32_t tdimx = outputWidth;
103 |
104 | int32_t nelems = kernel_size * kernel_size * pdimc;
105 |
106 | // element-wise product along channel axis
107 | for (int tj = -displacement_rad; tj <= displacement_rad; ++tj) {
108 | for (int ti = -displacement_rad; ti <= displacement_rad; ++ti) {
109 | int x2 = x1 + ti * stride2;
110 | int y2 = y1 + tj * stride2;
111 |
112 | float acc0 = 0.0f;
113 |
114 | for (int j = -kernel_rad; j <= kernel_rad; ++j) {
115 | for (int i = -kernel_rad; i <= kernel_rad; ++i) {
116 | // THREADS_PER_BLOCK
117 | #pragma unroll
118 | for (int ch = c; ch < pdimc; ch += blockDim.x) {
119 |
120 | int indx1 = n * pdimyxc + (y1 + j) * pdimxc
121 | + (x1 + i) * pdimc + ch;
122 | int indx2 = n * pdimyxc + (y2 + j) * pdimxc
123 | + (x2 + i) * pdimc + ch;
124 | acc0 += static_cast(rInput1[indx1] * rInput2[indx2]);
125 | }
126 | }
127 | }
128 |
129 | if (blockDim.x == warpSize) {
130 | __syncwarp();
131 | acc0 = warpReduceSum(acc0);
132 | } else {
133 | __syncthreads();
134 | acc0 = blockReduceSum(acc0);
135 | }
136 |
137 | if (threadIdx.x == 0) {
138 |
139 | int tc = (tj + displacement_rad) * displacement_size
140 | + (ti + displacement_rad);
141 | const int tindx = n * tdimcyx + tc * tdimyx + blockIdx.y * tdimx
142 | + blockIdx.z;
143 | output[tindx] = static_cast(acc0 / nelems);
144 | }
145 | }
146 | }
147 | }
148 |
149 |
150 | template
151 | __global__ void correlation_backward_input1(int item, scalar_t* gradInput1, int nInputChannels, int inputHeight, int inputWidth,
152 | const scalar_t* __restrict__ gradOutput, int nOutputChannels, int outputHeight, int outputWidth,
153 | const scalar_t* __restrict__ rInput2,
154 | int pad_size,
155 | int kernel_size,
156 | int max_displacement,
157 | int stride1,
158 | int stride2)
159 | {
160 | // n (batch size), c (num of channels), y (height), x (width)
161 |
162 | int n = item;
163 | int y = blockIdx.x * stride1 + pad_size;
164 | int x = blockIdx.y * stride1 + pad_size;
165 | int c = blockIdx.z;
166 | int tch_off = threadIdx.x;
167 |
168 | int kernel_rad = (kernel_size - 1) / 2;
169 | int displacement_rad = max_displacement / stride2;
170 | int displacement_size = 2 * displacement_rad + 1;
171 |
172 | int xmin = (x - kernel_rad - max_displacement) / stride1;
173 | int ymin = (y - kernel_rad - max_displacement) / stride1;
174 |
175 | int xmax = (x + kernel_rad - max_displacement) / stride1;
176 | int ymax = (y + kernel_rad - max_displacement) / stride1;
177 |
178 | if (xmax < 0 || ymax < 0 || xmin >= outputWidth || ymin >= outputHeight) {
179 | // assumes gradInput1 is pre-allocated and zero filled
180 | return;
181 | }
182 |
183 | if (xmin > xmax || ymin > ymax) {
184 | // assumes gradInput1 is pre-allocated and zero filled
185 | return;
186 | }
187 |
188 | xmin = max(0,xmin);
189 | xmax = min(outputWidth-1,xmax);
190 |
191 | ymin = max(0,ymin);
192 | ymax = min(outputHeight-1,ymax);
193 |
194 | int pInputWidth = inputWidth + 2 * pad_size;
195 | int pInputHeight = inputHeight + 2 * pad_size;
196 |
197 | int pdimyxc = pInputHeight * pInputWidth * nInputChannels;
198 | int pdimxc = pInputWidth * nInputChannels;
199 | int pdimc = nInputChannels;
200 |
201 | int tdimcyx = nOutputChannels * outputHeight * outputWidth;
202 | int tdimyx = outputHeight * outputWidth;
203 | int tdimx = outputWidth;
204 |
205 | int odimcyx = nInputChannels * inputHeight* inputWidth;
206 | int odimyx = inputHeight * inputWidth;
207 | int odimx = inputWidth;
208 |
209 | scalar_t nelems = kernel_size * kernel_size * nInputChannels;
210 |
211 | __shared__ scalar_t prod_sum[THREADS_PER_BLOCK];
212 | prod_sum[tch_off] = 0;
213 |
214 | for (int tc = tch_off; tc < nOutputChannels; tc += THREADS_PER_BLOCK) {
215 |
216 | int i2 = (tc % displacement_size - displacement_rad) * stride2;
217 | int j2 = (tc / displacement_size - displacement_rad) * stride2;
218 |
219 | int indx2 = n * pdimyxc + (y + j2)* pdimxc + (x + i2) * pdimc + c;
220 |
221 | scalar_t val2 = rInput2[indx2];
222 |
223 | for (int j = ymin; j <= ymax; ++j) {
224 | for (int i = xmin; i <= xmax; ++i) {
225 | int tindx = n * tdimcyx + tc * tdimyx + j * tdimx + i;
226 | prod_sum[tch_off] += gradOutput[tindx] * val2;
227 | }
228 | }
229 | }
230 | __syncthreads();
231 |
232 | if(tch_off == 0) {
233 | scalar_t reduce_sum = 0;
234 | for(int idx = 0; idx < THREADS_PER_BLOCK; idx++) {
235 | reduce_sum += prod_sum[idx];
236 | }
237 | const int indx1 = n * odimcyx + c * odimyx + (y - pad_size) * odimx + (x - pad_size);
238 | gradInput1[indx1] = reduce_sum / nelems;
239 | }
240 |
241 | }
242 |
243 | template
244 | __global__ void correlation_backward_input2(int item, scalar_t* gradInput2, int nInputChannels, int inputHeight, int inputWidth,
245 | const scalar_t* __restrict__ gradOutput, int nOutputChannels, int outputHeight, int outputWidth,
246 | const scalar_t* __restrict__ rInput1,
247 | int pad_size,
248 | int kernel_size,
249 | int max_displacement,
250 | int stride1,
251 | int stride2)
252 | {
253 | // n (batch size), c (num of channels), y (height), x (width)
254 |
255 | int n = item;
256 | int y = blockIdx.x * stride1 + pad_size;
257 | int x = blockIdx.y * stride1 + pad_size;
258 | int c = blockIdx.z;
259 |
260 | int tch_off = threadIdx.x;
261 |
262 | int kernel_rad = (kernel_size - 1) / 2;
263 | int displacement_rad = max_displacement / stride2;
264 | int displacement_size = 2 * displacement_rad + 1;
265 |
266 | int pInputWidth = inputWidth + 2 * pad_size;
267 | int pInputHeight = inputHeight + 2 * pad_size;
268 |
269 | int pdimyxc = pInputHeight * pInputWidth * nInputChannels;
270 | int pdimxc = pInputWidth * nInputChannels;
271 | int pdimc = nInputChannels;
272 |
273 | int tdimcyx = nOutputChannels * outputHeight * outputWidth;
274 | int tdimyx = outputHeight * outputWidth;
275 | int tdimx = outputWidth;
276 |
277 | int odimcyx = nInputChannels * inputHeight* inputWidth;
278 | int odimyx = inputHeight * inputWidth;
279 | int odimx = inputWidth;
280 |
281 | scalar_t nelems = kernel_size * kernel_size * nInputChannels;
282 |
283 | __shared__ scalar_t prod_sum[THREADS_PER_BLOCK];
284 | prod_sum[tch_off] = 0;
285 |
286 | for (int tc = tch_off; tc < nOutputChannels; tc += THREADS_PER_BLOCK) {
287 | int i2 = (tc % displacement_size - displacement_rad) * stride2;
288 | int j2 = (tc / displacement_size - displacement_rad) * stride2;
289 |
290 | int xmin = (x - kernel_rad - max_displacement - i2) / stride1;
291 | int ymin = (y - kernel_rad - max_displacement - j2) / stride1;
292 |
293 | int xmax = (x + kernel_rad - max_displacement - i2) / stride1;
294 | int ymax = (y + kernel_rad - max_displacement - j2) / stride1;
295 |
296 | if (xmax < 0 || ymax < 0 || xmin >= outputWidth || ymin >= outputHeight) {
297 | // assumes gradInput2 is pre-allocated and zero filled
298 | continue;
299 | }
300 |
301 | if (xmin > xmax || ymin > ymax) {
302 | // assumes gradInput2 is pre-allocated and zero filled
303 | continue;
304 | }
305 |
306 | xmin = max(0,xmin);
307 | xmax = min(outputWidth-1,xmax);
308 |
309 | ymin = max(0,ymin);
310 | ymax = min(outputHeight-1,ymax);
311 |
312 | int indx1 = n * pdimyxc + (y - j2)* pdimxc + (x - i2) * pdimc + c;
313 | scalar_t val1 = rInput1[indx1];
314 |
315 | for (int j = ymin; j <= ymax; ++j) {
316 | for (int i = xmin; i <= xmax; ++i) {
317 | int tindx = n * tdimcyx + tc * tdimyx + j * tdimx + i;
318 | prod_sum[tch_off] += gradOutput[tindx] * val1;
319 | }
320 | }
321 | }
322 |
323 | __syncthreads();
324 |
325 | if(tch_off == 0) {
326 | scalar_t reduce_sum = 0;
327 | for(int idx = 0; idx < THREADS_PER_BLOCK; idx++) {
328 | reduce_sum += prod_sum[idx];
329 | }
330 | const int indx2 = n * odimcyx + c * odimyx + (y - pad_size) * odimx + (x - pad_size);
331 | gradInput2[indx2] = reduce_sum / nelems;
332 | }
333 |
334 | }
335 |
336 | int correlation_forward_cuda_kernel(at::Tensor& output,
337 | int ob,
338 | int oc,
339 | int oh,
340 | int ow,
341 | int osb,
342 | int osc,
343 | int osh,
344 | int osw,
345 |
346 | at::Tensor& input1,
347 | int ic,
348 | int ih,
349 | int iw,
350 | int isb,
351 | int isc,
352 | int ish,
353 | int isw,
354 |
355 | at::Tensor& input2,
356 | int gc,
357 | int gsb,
358 | int gsc,
359 | int gsh,
360 | int gsw,
361 |
362 | at::Tensor& rInput1,
363 | at::Tensor& rInput2,
364 | int pad_size,
365 | int kernel_size,
366 | int max_displacement,
367 | int stride1,
368 | int stride2,
369 | int corr_type_multiply,
370 | cudaStream_t stream)
371 | {
372 |
373 | int batchSize = ob;
374 |
375 | int nInputChannels = ic;
376 | int inputWidth = iw;
377 | int inputHeight = ih;
378 |
379 | int nOutputChannels = oc;
380 | int outputWidth = ow;
381 | int outputHeight = oh;
382 |
383 | dim3 blocks_grid(batchSize, inputHeight, inputWidth);
384 | dim3 threads_block(THREADS_PER_BLOCK);
385 |
386 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(input1.type(), "channels_first_fwd_1", ([&] {
387 |
388 | channels_first<<>>(
389 | input1.data(), rInput1.data(), nInputChannels, inputHeight, inputWidth, pad_size);
390 |
391 | }));
392 |
393 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(input2.type(), "channels_first_fwd_2", ([&] {
394 |
395 | channels_first<<>> (
396 | input2.data(), rInput2.data(), nInputChannels, inputHeight, inputWidth, pad_size);
397 |
398 | }));
399 |
400 | dim3 threadsPerBlock(THREADS_PER_BLOCK);
401 | dim3 totalBlocksCorr(batchSize, outputHeight, outputWidth);
402 |
403 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(input1.type(), "correlation_forward", ([&] {
404 |
405 | correlation_forward<<>>
406 | (output.data(), nOutputChannels, outputHeight, outputWidth,
407 | rInput1.data(), nInputChannels, inputHeight, inputWidth,
408 | rInput2.data(),
409 | pad_size,
410 | kernel_size,
411 | max_displacement,
412 | stride1,
413 | stride2);
414 |
415 | }));
416 |
417 | cudaError_t err = cudaGetLastError();
418 |
419 |
420 | // check for errors
421 | if (err != cudaSuccess) {
422 | printf("error in correlation_forward_cuda_kernel: %s\n", cudaGetErrorString(err));
423 | return 0;
424 | }
425 |
426 | return 1;
427 | }
428 |
429 |
430 | int correlation_backward_cuda_kernel(
431 | at::Tensor& gradOutput,
432 | int gob,
433 | int goc,
434 | int goh,
435 | int gow,
436 | int gosb,
437 | int gosc,
438 | int gosh,
439 | int gosw,
440 |
441 | at::Tensor& input1,
442 | int ic,
443 | int ih,
444 | int iw,
445 | int isb,
446 | int isc,
447 | int ish,
448 | int isw,
449 |
450 | at::Tensor& input2,
451 | int gsb,
452 | int gsc,
453 | int gsh,
454 | int gsw,
455 |
456 | at::Tensor& gradInput1,
457 | int gisb,
458 | int gisc,
459 | int gish,
460 | int gisw,
461 |
462 | at::Tensor& gradInput2,
463 | int ggc,
464 | int ggsb,
465 | int ggsc,
466 | int ggsh,
467 | int ggsw,
468 |
469 | at::Tensor& rInput1,
470 | at::Tensor& rInput2,
471 | int pad_size,
472 | int kernel_size,
473 | int max_displacement,
474 | int stride1,
475 | int stride2,
476 | int corr_type_multiply,
477 | cudaStream_t stream)
478 | {
479 |
480 | int batchSize = gob;
481 | int num = batchSize;
482 |
483 | int nInputChannels = ic;
484 | int inputWidth = iw;
485 | int inputHeight = ih;
486 |
487 | int nOutputChannels = goc;
488 | int outputWidth = gow;
489 | int outputHeight = goh;
490 |
491 | dim3 blocks_grid(batchSize, inputHeight, inputWidth);
492 | dim3 threads_block(THREADS_PER_BLOCK);
493 |
494 |
495 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(input1.type(), "lltm_forward_cuda", ([&] {
496 |
497 | channels_first<<>>(
498 | input1.data(),
499 | rInput1.data(),
500 | nInputChannels,
501 | inputHeight,
502 | inputWidth,
503 | pad_size
504 | );
505 | }));
506 |
507 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(input2.type(), "lltm_forward_cuda", ([&] {
508 |
509 | channels_first<<>>(
510 | input2.data(),
511 | rInput2.data(),
512 | nInputChannels,
513 | inputHeight,
514 | inputWidth,
515 | pad_size
516 | );
517 | }));
518 |
519 | dim3 threadsPerBlock(THREADS_PER_BLOCK);
520 | dim3 totalBlocksCorr(inputHeight, inputWidth, nInputChannels);
521 |
522 | for (int n = 0; n < num; ++n) {
523 |
524 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(input2.type(), "lltm_forward_cuda", ([&] {
525 |
526 |
527 | correlation_backward_input1<<>> (
528 | n, gradInput1.data(), nInputChannels, inputHeight, inputWidth,
529 | gradOutput.data(), nOutputChannels, outputHeight, outputWidth,
530 | rInput2.data(),
531 | pad_size,
532 | kernel_size,
533 | max_displacement,
534 | stride1,
535 | stride2);
536 | }));
537 | }
538 |
539 | for(int n = 0; n < batchSize; n++) {
540 |
541 | AT_DISPATCH_FLOATING_TYPES_AND_HALF(rInput1.type(), "lltm_forward_cuda", ([&] {
542 |
543 | correlation_backward_input2<<>>(
544 | n, gradInput2.data(), nInputChannels, inputHeight, inputWidth,
545 | gradOutput.data(), nOutputChannels, outputHeight, outputWidth,
546 | rInput1.data(),
547 | pad_size,
548 | kernel_size,
549 | max_displacement,
550 | stride1,
551 | stride2);
552 |
553 | }));
554 | }
555 |
556 | // check for errors
557 | cudaError_t err = cudaGetLastError();
558 | if (err != cudaSuccess) {
559 | printf("error in correlation_backward_cuda_kernel: %s\n", cudaGetErrorString(err));
560 | return 0;
561 | }
562 |
563 | return 1;
564 | }
565 |
--------------------------------------------------------------------------------
/core/correlation_package/correlation_cuda_kernel.cuh:
--------------------------------------------------------------------------------
1 | #pragma once
2 |
3 | #include
4 | #include
5 | #include
6 |
7 | int correlation_forward_cuda_kernel(at::Tensor& output,
8 | int ob,
9 | int oc,
10 | int oh,
11 | int ow,
12 | int osb,
13 | int osc,
14 | int osh,
15 | int osw,
16 |
17 | at::Tensor& input1,
18 | int ic,
19 | int ih,
20 | int iw,
21 | int isb,
22 | int isc,
23 | int ish,
24 | int isw,
25 |
26 | at::Tensor& input2,
27 | int gc,
28 | int gsb,
29 | int gsc,
30 | int gsh,
31 | int gsw,
32 |
33 | at::Tensor& rInput1,
34 | at::Tensor& rInput2,
35 | int pad_size,
36 | int kernel_size,
37 | int max_displacement,
38 | int stride1,
39 | int stride2,
40 | int corr_type_multiply,
41 | cudaStream_t stream);
42 |
43 |
44 | int correlation_backward_cuda_kernel(
45 | at::Tensor& gradOutput,
46 | int gob,
47 | int goc,
48 | int goh,
49 | int gow,
50 | int gosb,
51 | int gosc,
52 | int gosh,
53 | int gosw,
54 |
55 | at::Tensor& input1,
56 | int ic,
57 | int ih,
58 | int iw,
59 | int isb,
60 | int isc,
61 | int ish,
62 | int isw,
63 |
64 | at::Tensor& input2,
65 | int gsb,
66 | int gsc,
67 | int gsh,
68 | int gsw,
69 |
70 | at::Tensor& gradInput1,
71 | int gisb,
72 | int gisc,
73 | int gish,
74 | int gisw,
75 |
76 | at::Tensor& gradInput2,
77 | int ggc,
78 | int ggsb,
79 | int ggsc,
80 | int ggsh,
81 | int ggsw,
82 |
83 | at::Tensor& rInput1,
84 | at::Tensor& rInput2,
85 | int pad_size,
86 | int kernel_size,
87 | int max_displacement,
88 | int stride1,
89 | int stride2,
90 | int corr_type_multiply,
91 | cudaStream_t stream);
92 |
--------------------------------------------------------------------------------
/core/correlation_package/pyproject.toml:
--------------------------------------------------------------------------------
1 | [build-system]
2 | # Minimum requirements for the build system to execute.
3 | requires = ["setuptools", "wheel", "numpy", "torch==1.0.1.post2"] # PEP 508 specifications.
4 |
--------------------------------------------------------------------------------
/core/correlation_package/setup.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | import os
3 |
4 | import torch
5 | from setuptools import find_packages, setup
6 | from torch.utils.cpp_extension import BuildExtension, CUDAExtension
7 |
8 | cxx_args = ['-std=c++14']
9 |
10 | nvcc_args = [
11 | '-gencode', 'arch=compute_50,code=sm_50',
12 | '-gencode', 'arch=compute_52,code=sm_52',
13 | '-gencode', 'arch=compute_60,code=sm_60',
14 | '-gencode', 'arch=compute_61,code=sm_61',
15 | '-gencode', 'arch=compute_70,code=sm_70',
16 | '-gencode', 'arch=compute_70,code=compute_70'
17 | ]
18 |
19 | setup(
20 | name='correlation_cuda',
21 | ext_modules=[
22 | CUDAExtension('correlation_cuda', [
23 | 'correlation_cuda.cc',
24 | 'correlation_cuda_kernel.cu'
25 | ], extra_compile_args={'cxx': cxx_args, 'nvcc': nvcc_args})
26 | ],
27 | cmdclass={
28 | 'build_ext': BuildExtension
29 | })
30 |
--------------------------------------------------------------------------------
/core/data_preprocess.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import visibility
3 | import mathutils
4 | import numpy as np
5 |
6 | import sys
7 | sys.path.append('core')
8 | from utils_point import overlay_imgs, rotate_back
9 | from camera_model import CameraModel
10 |
11 | class Data_preprocess:
12 | def __init__(self, calibs, occlusion_threshold, occlusion_kernel):
13 | self.real_shape = None
14 | self.calibs = calibs
15 | self.occlusion_threshold = occlusion_threshold
16 | self.occlusion_kernel = occlusion_kernel
17 |
18 | def delta_1(self, uv_RT, uv, VI_indexes_RT, VI_indexes):
19 | indexes = VI_indexes_RT & VI_indexes
20 |
21 | indexes_1 = indexes[VI_indexes_RT]
22 | indexes_2 = indexes[VI_indexes]
23 |
24 | delta_P = uv[indexes_2, :] - uv_RT[indexes_1, :]
25 |
26 | return delta_P, indexes
27 |
28 | def gen_depth_img(self, uv_RT_af_index, depth_RT_af_index, indexes_uvRT, cam_params):
29 | device = uv_RT_af_index.device
30 |
31 | depth_img_RT = torch.zeros(self.real_shape[:2], device=device, dtype=torch.float)
32 | depth_img_RT += 1000.
33 |
34 | idx_img = (-1) * torch.ones(self.real_shape[:2], device=device, dtype=torch.float)
35 | indexes_uvRT = indexes_uvRT.float()
36 |
37 | depth_img_RT, idx_img = visibility.depth_image(uv_RT_af_index, depth_RT_af_index, indexes_uvRT,
38 | depth_img_RT, idx_img, uv_RT_af_index.shape[0],
39 | self.real_shape[1], self.real_shape[0])
40 | depth_img_RT[depth_img_RT == 1000.] = 0.
41 |
42 | deoccl_index_img = (-1) * torch.ones(self.real_shape[:2], device=device, dtype=torch.float)
43 |
44 | depth_img_no_occlusion_RT = torch.zeros_like(depth_img_RT, device=device)
45 | depth_img_no_occlusion_RT, deoccl_index_img = visibility.visibility2(depth_img_RT, cam_params,
46 | idx_img,
47 | depth_img_no_occlusion_RT,
48 | deoccl_index_img,
49 | depth_img_RT.shape[1],
50 | depth_img_RT.shape[0],
51 | self.occlusion_threshold,
52 | self.occlusion_kernel)
53 |
54 | return depth_img_no_occlusion_RT, deoccl_index_img.int()
55 |
56 | def fresh_indexes(self, indexes_uvRT_deoccl, indexes_uvRT):
57 | indexes_uvRT_deoccl_list_indexes = torch.where(indexes_uvRT_deoccl > 0)
58 |
59 | indexes_uvRT_deoccl_list = indexes_uvRT_deoccl[indexes_uvRT_deoccl_list_indexes[0][:], indexes_uvRT_deoccl_list_indexes[1][:]]
60 |
61 | indexes_temp = torch.zeros(indexes_uvRT.shape[0], device=indexes_uvRT_deoccl_list.device, dtype=torch.int32)
62 | indexes_temp[indexes_uvRT_deoccl_list.cpu().numpy() - 1] = indexes_uvRT_deoccl_list
63 |
64 | return indexes_temp
65 |
66 | def delta_2(self, delta_P, uv_RT_af_index, mask):
67 | device = delta_P.device
68 |
69 | delta_P_com = delta_P[mask, :]
70 |
71 | delta_P_0 = delta_P_com[:, 0]
72 | delta_P_1 = delta_P_com[:, 1]
73 |
74 | ## keep common points after deocclusion
75 | uv_RT_af_index_com = uv_RT_af_index[mask, :]
76 |
77 | ## generate displacement map
78 | project_delta_P_1 = torch.zeros(self.real_shape[:2], device=device, dtype=torch.int32)
79 | project_delta_P_2 = torch.zeros(self.real_shape[:2], device=device, dtype=torch.int32)
80 | project_delta_P_1[uv_RT_af_index_com[:, 1].cpu().numpy(), uv_RT_af_index_com[:, 0].cpu().numpy()] = delta_P_0
81 | project_delta_P_2[uv_RT_af_index_com[:, 1].cpu().numpy(), uv_RT_af_index_com[:, 0].cpu().numpy()] = delta_P_1
82 |
83 | project_delta_P_shape = list(self.real_shape[:2])
84 | project_delta_P_shape.insert(0, 2)
85 | project_delta_P = torch.zeros(project_delta_P_shape, device=device, dtype=torch.float)
86 |
87 | project_delta_P[0, :, :] = project_delta_P_1
88 | project_delta_P[1, :, :] = project_delta_P_2
89 |
90 | return project_delta_P
91 |
92 | def DownsampleCrop_KITTI_delta(self, img, depth, displacement, split):
93 | if split == 'train':
94 | x = np.random.randint(0, img.shape[1] - 320)
95 | y = np.random.randint(0, img.shape[2] - 960)
96 | else:
97 | x = (img.shape[1] - 320) // 2
98 | y = (img.shape[2] - 960) // 2
99 | img = img[:, x:x + 320, y:y + 960]
100 | depth = depth[:, x:x + 320, y:y + 960]
101 | displacement = displacement[:, x:x + 320, y:y + 960]
102 | return img, depth, displacement
103 |
104 |
105 | def push(self, rgbs, pcs, T_errs, R_errs, device, split='train'):
106 | lidar_input = []
107 | rgb_input = []
108 | flow_gt = []
109 |
110 | for idx in range(len(rgbs)):
111 | rgb = rgbs[idx].to(device)
112 | pc = pcs[idx].clone().to(device)
113 | reflectance = None
114 |
115 | self.real_shape = [rgb.shape[1], rgb.shape[2], rgb.shape[0]]
116 |
117 | R = mathutils.Quaternion(R_errs[idx].to(device)).to_matrix()
118 | R.resize_4x4()
119 | T = mathutils.Matrix.Translation(T_errs[idx].to(device))
120 | RT = T * R
121 |
122 | pc_rotated = rotate_back(pc, RT)
123 |
124 | cam_params = self.calibs[idx]
125 | cam_model = CameraModel()
126 | cam_model.focal_length = cam_params[:2]
127 | cam_model.principal_point = cam_params[2:]
128 | cam_params = cam_params.to(device)
129 |
130 | uv, depth, _, refl, VI_indexes = cam_model.project_withindex_pytorch(pc, self.real_shape, reflectance)
131 | uv = uv.t().int().contiguous()
132 |
133 | uv_RT, depth_RT, _, refl_RT, VI_indexes_RT = cam_model.project_withindex_pytorch(pc_rotated, self.real_shape,
134 | reflectance)
135 | uv_RT = uv_RT.t().int().contiguous()
136 |
137 | delta_P, indexes = self.delta_1(uv_RT, uv, VI_indexes_RT, VI_indexes)
138 |
139 | indexes_uvRT = VI_indexes_RT[indexes]
140 | indexes_uvRT = torch.arange(indexes_uvRT.shape[0]).to(device) + 1
141 |
142 | ## keep common points
143 | uv_RT_af_index = uv_RT[indexes[VI_indexes_RT], :]
144 | depth_RT_af_index = depth_RT[indexes[VI_indexes_RT]]
145 |
146 | indexes_uv = VI_indexes[indexes]
147 | indexes_uv = torch.arange(indexes_uv.shape[0]).to(device) + 1
148 |
149 | ## keep common points
150 | uv_af_index = uv[indexes[VI_indexes], :]
151 | depth_af_index = depth[indexes[VI_indexes]]
152 |
153 | depth_img_no_occlusion_RT, indexes_uvRT_deoccl = self.gen_depth_img(uv_RT_af_index, depth_RT_af_index,
154 | indexes_uvRT, cam_params)
155 | indexes_uvRT_fresh = self.fresh_indexes(indexes_uvRT_deoccl, indexes_uvRT)
156 |
157 | depth_img_no_occlusion, indexes_uv_deoccl = self.gen_depth_img(uv_af_index, depth_af_index, indexes_uv, cam_params)
158 | indexes_uv_fresh = self.fresh_indexes(indexes_uv_deoccl, indexes_uv)
159 |
160 | ## make depth_image for training
161 | depth_img_no_occlusion_RT_training, indexes_uvRT_deoccl_training = \
162 | self.gen_depth_img(uv_RT, depth_RT, VI_indexes_RT[VI_indexes_RT], cam_params)
163 |
164 | ## 这里归一化的时候是不是重新计算一下最大深度比较好
165 | depth_img_no_occlusion_RT_training /= 100.
166 |
167 | depth_img_no_occlusion_RT_training = depth_img_no_occlusion_RT_training.unsqueeze(0)
168 |
169 | mask1 = indexes_uv_fresh > 0
170 | mask2 = indexes_uvRT_fresh > 0
171 | mask = mask1 & mask2
172 | project_delta_P = self.delta_2(delta_P, uv_RT_af_index, mask)
173 |
174 | ## downsample and crop
175 | rgb, depth_img_no_occlusion_RT_training, project_delta_P \
176 | = self.DownsampleCrop_KITTI_delta(rgb, depth_img_no_occlusion_RT_training, project_delta_P, split)
177 |
178 | rgb_input.append(rgb)
179 | lidar_input.append(depth_img_no_occlusion_RT_training)
180 | flow_gt.append(project_delta_P)
181 |
182 | lidar_input = torch.stack(lidar_input)
183 | rgb_input = torch.stack(rgb_input)
184 | flow_gt = torch.stack(flow_gt)
185 |
186 | return rgb_input, lidar_input, flow_gt
--------------------------------------------------------------------------------
/core/datasets_kitti.py:
--------------------------------------------------------------------------------
1 | import csv
2 | import os
3 | from math import radians
4 |
5 | import h5py
6 | import mathutils
7 | import numpy as np
8 | import pandas as pd
9 | import torch
10 | import torchvision.transforms.functional as TTF
11 | from PIL import Image
12 | from torch.utils.data import Dataset
13 | from torchvision import transforms
14 |
15 | import sys
16 | sys.path.append("core")
17 | from camera_model import CameraModel
18 | from utils_point import rotate_forward, rotate_back, invert_pose
19 |
20 |
21 |
22 | def get_calib_kitti(sequence):
23 | if sequence == 0:
24 | return torch.tensor([718.856, 718.856, 607.1928, 185.2157])
25 | elif sequence == 3:
26 | return torch.tensor([721.5377, 721.5377, 609.5593, 172.854])
27 | elif sequence in [5, 6, 7, 8, 9]:
28 | return torch.tensor([707.0912, 707.0912, 601.8873, 183.1104])
29 | else:
30 | raise TypeError("Sequence Not Available")
31 |
32 |
33 | class DatasetVisibilityKittiSingle(Dataset):
34 |
35 | def __init__(self, dataset_dir, transform=None, augmentation=False, maps_folder='local_maps_0.1',
36 | use_reflectance=False, max_t=2., max_r=10., split='test', device='cpu', test_sequence='00'):
37 | super(DatasetVisibilityKittiSingle, self).__init__()
38 | self.use_reflectance = use_reflectance
39 | self.maps_folder = maps_folder
40 | self.device = device
41 | self.max_r = max_r
42 | self.max_t = max_t
43 | self.augmentation = augmentation
44 | self.root_dir = dataset_dir
45 | self.transform = transform
46 | self.split = split
47 | self.GTs_R = {}
48 | self.GTs_T = {}
49 |
50 | self.all_files = []
51 | self.model = CameraModel()
52 | self.model.focal_length = [7.18856e+02, 7.18856e+02]
53 | self.model.principal_point = [6.071928e+02, 1.852157e+02]
54 | for dir in ['00', '03', '05', '06', '07', '08', '09']:
55 | self.GTs_R[dir] = []
56 | self.GTs_T[dir] = []
57 | df_locations = pd.read_csv(os.path.join(dataset_dir, dir, 'poses.csv'), sep=',', dtype={'timestamp': str})
58 | for index, row in df_locations.iterrows():
59 | if not os.path.exists(os.path.join(dataset_dir, dir, maps_folder, f"{int(row['timestamp']):06d}"+'.h5')):
60 | continue
61 | if not os.path.exists(os.path.join(dataset_dir, dir, 'image_2', f"{int(row['timestamp']):06d}"+'.png')):
62 | continue
63 | if dir == test_sequence and split.startswith('test'):
64 | self.all_files.append(os.path.join(dir, f"{int(row['timestamp']):06d}"))
65 | elif (not dir == test_sequence) and split == 'train':
66 | self.all_files.append(os.path.join(dir, f"{int(row['timestamp']):06d}"))
67 | GT_R = np.array([row['qw'], row['qx'], row['qy'], row['qz']])
68 | GT_T = np.array([row['x'], row['y'], row['z']])
69 | self.GTs_R[dir].append(GT_R)
70 | self.GTs_T[dir].append(GT_T)
71 |
72 | self.test_RT = []
73 | if split == 'test':
74 | test_RT_file = os.path.join(dataset_dir, f'test_RT_seq{test_sequence}_{max_r:.2f}_{max_t:.2f}.csv')
75 | if os.path.exists(test_RT_file):
76 | print(f'TEST SET: Using this file: {test_RT_file}')
77 | df_test_RT = pd.read_csv(test_RT_file, sep=',')
78 | for index, row in df_test_RT.iterrows():
79 | self.test_RT.append(list(row))
80 | else:
81 | print(f'TEST SET - Not found: {test_RT_file}')
82 | print("Generating a new one")
83 | test_RT_file = open(test_RT_file, 'w')
84 | test_RT_file = csv.writer(test_RT_file, delimiter=',')
85 | test_RT_file.writerow(['id', 'tx', 'ty', 'tz', 'rx', 'ry', 'rz'])
86 | for i in range(len(self.all_files)):
87 | rotz = np.random.uniform(-max_r, max_r) * (3.141592 / 180.0)
88 | roty = np.random.uniform(-max_r, max_r) * (3.141592 / 180.0)
89 | rotx = np.random.uniform(-max_r, max_r) * (3.141592 / 180.0)
90 | transl_x = np.random.uniform(-max_t, max_t)
91 | transl_y = np.random.uniform(-max_t, max_t)
92 | transl_z = np.random.uniform(-max_t, min(max_t, 1.))
93 | test_RT_file.writerow([i, transl_x, transl_y, transl_z,
94 | rotx, roty, rotz])
95 | self.test_RT.append([i, transl_x, transl_y, transl_z,
96 | rotx, roty, rotz])
97 |
98 | assert len(self.test_RT) == len(self.all_files), "Something wrong with test RTs"
99 |
100 | def get_ground_truth_poses(self, sequence, frame):
101 | return self.GTs_T[sequence][frame], self.GTs_R[sequence][frame]
102 |
103 | def custom_transform(self, rgb, img_rotation=0., flip=False):
104 | to_tensor = transforms.ToTensor()
105 | normalization = transforms.Normalize(mean=[0.485, 0.456, 0.406],
106 | std=[0.229, 0.224, 0.225])
107 |
108 | #rgb = crop(rgb)
109 | if self.split == 'train':
110 | color_transform = transforms.ColorJitter(0.1, 0.1, 0.1)
111 | rgb = color_transform(rgb)
112 | if flip:
113 | rgb = TTF.hflip(rgb)
114 | rgb = TTF.rotate(rgb, img_rotation)
115 |
116 | rgb = to_tensor(rgb)
117 | rgb = normalization(rgb)
118 |
119 | return rgb
120 |
121 | def __len__(self):
122 | return len(self.all_files)
123 |
124 | def __getitem__(self, idx):
125 | item = self.all_files[idx]
126 | run = str(item.split('/')[0])
127 | timestamp = str(item.split('/')[1])
128 | img_path = os.path.join(self.root_dir, run, 'image_2', timestamp+'.png')
129 | pc_path = os.path.join(self.root_dir, run, self.maps_folder, timestamp+'.h5')
130 |
131 | try:
132 | with h5py.File(pc_path, 'r') as hf:
133 | pc = hf['PC'][:]
134 | if self.use_reflectance:
135 | reflectance = hf['intensity'][:]
136 | reflectance = torch.from_numpy(reflectance).float()
137 | except Exception as e:
138 | print(f'File Broken: {pc_path}')
139 | raise e
140 |
141 | pc_in = torch.from_numpy(pc.astype(np.float32))
142 | if pc_in.shape[1] == 4 or pc_in.shape[1] == 3:
143 | pc_in = pc_in.t()
144 | if pc_in.shape[0] == 3:
145 | homogeneous = torch.ones(pc_in.shape[1]).unsqueeze(0)
146 | pc_in = torch.cat((pc_in, homogeneous), 0)
147 | elif pc_in.shape[0] == 4:
148 | if not torch.all(pc_in[3,:] == 1.):
149 | pc_in[3,:] = 1.
150 | else:
151 | raise TypeError("Wrong PointCloud shape")
152 |
153 | h_mirror = False
154 | if np.random.rand() > 0.5 and self.split == 'train':
155 | h_mirror = True
156 | pc_in[1, :] *= -1
157 |
158 | img = Image.open(img_path)
159 | img_rotation = 0.
160 | if self.split == 'train':
161 | img_rotation = np.random.uniform(-5, 5)
162 | try:
163 | img = self.custom_transform(img, img_rotation, h_mirror)
164 | except OSError:
165 | new_idx = np.random.randint(0, self.__len__())
166 | return self.__getitem__(new_idx)
167 |
168 | # Rotate PointCloud for img_rotation
169 | if self.split == 'train':
170 | R = mathutils.Euler((radians(img_rotation), 0, 0), 'XYZ')
171 | T = mathutils.Vector((0., 0., 0.))
172 | pc_in = rotate_forward(pc_in, R, T)
173 |
174 | if self.split != 'test':
175 | max_angle = self.max_r
176 | rotz = np.random.uniform(-max_angle, max_angle) * (3.141592 / 180.0)
177 | roty = np.random.uniform(-max_angle, max_angle) * (3.141592 / 180.0)
178 | rotx = np.random.uniform(-max_angle, max_angle) * (3.141592 / 180.0)
179 | transl_x = np.random.uniform(-self.max_t, self.max_t)
180 | transl_y = np.random.uniform(-self.max_t, self.max_t)
181 | transl_z = np.random.uniform(-self.max_t, min(self.max_t, 1.))
182 | else:
183 | initial_RT = self.test_RT[idx]
184 | rotz = initial_RT[6]
185 | roty = initial_RT[5]
186 | rotx = initial_RT[4]
187 | transl_x = initial_RT[1]
188 | transl_y = initial_RT[2]
189 | transl_z = initial_RT[3]
190 |
191 |
192 | R = mathutils.Euler((rotx, roty, rotz), 'XYZ')
193 | T = mathutils.Vector((transl_x, transl_y, transl_z))
194 |
195 | R, T = invert_pose(R, T)
196 | R, T = torch.tensor(R), torch.tensor(T)
197 |
198 | calib = get_calib_kitti(int(run))
199 | if h_mirror:
200 | calib[2] = (img.shape[2] / 2)*2 - calib[2]
201 |
202 | if not self.use_reflectance:
203 | sample = {'rgb': img, 'point_cloud': pc_in, 'calib': calib,
204 | 'tr_error': T, 'rot_error': R, 'idx': int(run), 'rgb_name': timestamp}
205 | else:
206 | sample = {'rgb': img, 'point_cloud': pc_in, 'reflectance': reflectance, 'calib': calib,
207 | 'tr_error': T, 'rot_error': R, 'idx': int(run), 'rgb_name': timestamp}
208 |
209 | return sample
210 |
--------------------------------------------------------------------------------
/core/depth_completion.py:
--------------------------------------------------------------------------------
1 | import os
2 | import numpy as np
3 | import cv2
4 |
5 | def sparse_to_dense(sparse, max_depth=100.):
6 | ## invert
7 | valid = sparse > 0.1
8 | sparse[valid] = max_depth - sparse[valid]
9 |
10 | ## dilate
11 | custom_kernel = np.array(
12 | [
13 | [0, 0, 1, 0, 0],
14 | [0, 1, 1, 1, 0],
15 | [1, 1, 1, 1, 1],
16 | [0, 1, 1, 1, 0],
17 | [0, 0, 1, 0, 0],
18 | ], dtype=np.uint8)
19 | sparse = cv2.dilate(sparse, custom_kernel)
20 |
21 | ## close
22 | custom_kernel = np.ones((5, 5), np.uint8)
23 | sparse = cv2.morphologyEx(sparse, cv2.MORPH_CLOSE, custom_kernel)
24 |
25 | ## fill
26 | invalid = sparse < 0.1
27 | custom_kernel = np.ones((7, 7), np.uint8)
28 | dilated = cv2.dilate(sparse, custom_kernel)
29 | sparse[invalid] = dilated[invalid]
30 |
31 | ## invert
32 | valid = sparse > 0.1
33 | sparse[valid] = max_depth - sparse[valid]
34 |
35 | return sparse
--------------------------------------------------------------------------------
/core/extractor.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import torch.nn.functional as F
4 |
5 |
6 | class ResidualBlock(nn.Module):
7 | def __init__(self, in_planes, planes, norm_fn='group', stride=1):
8 | super(ResidualBlock, self).__init__()
9 |
10 | self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, padding=1, stride=stride)
11 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, padding=1)
12 | self.relu = nn.ReLU(inplace=True)
13 |
14 | num_groups = planes // 8
15 |
16 | if norm_fn == 'group':
17 | self.norm1 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)
18 | self.norm2 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)
19 | if not stride == 1:
20 | self.norm3 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)
21 |
22 | elif norm_fn == 'batch':
23 | self.norm1 = nn.BatchNorm2d(planes)
24 | self.norm2 = nn.BatchNorm2d(planes)
25 | if not stride == 1:
26 | self.norm3 = nn.BatchNorm2d(planes)
27 |
28 | elif norm_fn == 'instance':
29 | self.norm1 = nn.InstanceNorm2d(planes)
30 | self.norm2 = nn.InstanceNorm2d(planes)
31 | if not stride == 1:
32 | self.norm3 = nn.InstanceNorm2d(planes)
33 |
34 | elif norm_fn == 'none':
35 | self.norm1 = nn.Sequential()
36 | self.norm2 = nn.Sequential()
37 | if not stride == 1:
38 | self.norm3 = nn.Sequential()
39 |
40 | if stride == 1:
41 | self.downsample = None
42 |
43 | else:
44 | self.downsample = nn.Sequential(
45 | nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride), self.norm3)
46 |
47 | def forward(self, x):
48 | y = x
49 | y = self.relu(self.norm1(self.conv1(y)))
50 | y = self.relu(self.norm2(self.conv2(y)))
51 |
52 | if self.downsample is not None:
53 | x = self.downsample(x)
54 |
55 | return self.relu(x + y)
56 |
57 | class ResidualBlock_Lidar(nn.Module):
58 | def __init__(self, in_planes, planes, norm_fn='group', stride=1):
59 | super(ResidualBlock_Lidar, self).__init__()
60 |
61 | self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, padding=1, stride=stride)
62 | self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, padding=1)
63 | self.relu = nn.ReLU(inplace=True)
64 |
65 | num_groups = planes // 8
66 |
67 | if norm_fn == 'group':
68 | self.norm1 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)
69 | self.norm2 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)
70 | if not stride == 1:
71 | self.norm3 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)
72 |
73 | elif norm_fn == 'batch':
74 | self.norm1 = nn.BatchNorm2d(planes)
75 | self.norm2 = nn.BatchNorm2d(planes)
76 | if not stride == 1:
77 | self.norm3 = nn.BatchNorm2d(planes)
78 |
79 | elif norm_fn == 'instance':
80 | self.norm1 = nn.InstanceNorm2d(planes)
81 | self.norm2 = nn.InstanceNorm2d(planes)
82 | if not stride == 1:
83 | self.norm3 = nn.InstanceNorm2d(planes)
84 |
85 | elif norm_fn == 'none':
86 | self.norm1 = nn.Sequential()
87 | self.norm2 = nn.Sequential()
88 | if not stride == 1:
89 | self.norm3 = nn.Sequential()
90 |
91 | if stride == 1:
92 | self.downsample = None
93 |
94 | else:
95 | self.downsample = nn.Sequential(
96 | nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride), self.norm3)
97 |
98 | def forward(self, x):
99 | if isinstance(x, tuple):
100 | mask = x[1]
101 | x = x[0]
102 | else:
103 | Dim = x.shape[1]
104 | mask = x[:, Dim-1, :, :].unsqueeze(1)
105 | x = x[:, :Dim-1, :, :]
106 | y = x
107 | y = self.relu(self.norm1(self.conv1(y)))
108 | if y.shape[2] != x.shape[2]:
109 | mask = mask[:, :, 0::2, 0::2]
110 | y = torch.mul(y, mask)
111 | y = self.relu(self.norm2(self.conv2(y)))
112 | y = torch.mul(y, mask)
113 | if self.downsample is not None:
114 | x = self.downsample(x)
115 | x = torch.mul(x, mask)
116 |
117 | return self.relu(x + y), mask
118 |
119 |
120 | class BottleneckBlock(nn.Module):
121 | def __init__(self, in_planes, planes, norm_fn='group', stride=1):
122 | super(BottleneckBlock, self).__init__()
123 |
124 | self.conv1 = nn.Conv2d(in_planes, planes // 4, kernel_size=1, padding=0)
125 | self.conv2 = nn.Conv2d(planes // 4, planes // 4, kernel_size=3, padding=1, stride=stride)
126 | self.conv3 = nn.Conv2d(planes // 4, planes, kernel_size=1, padding=0)
127 | self.relu = nn.ReLU(inplace=True)
128 |
129 | num_groups = planes // 8
130 |
131 | if norm_fn == 'group':
132 | self.norm1 = nn.GroupNorm(num_groups=num_groups, num_channels=planes // 4)
133 | self.norm2 = nn.GroupNorm(num_groups=num_groups, num_channels=planes // 4)
134 | self.norm3 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)
135 | if not stride == 1:
136 | self.norm4 = nn.GroupNorm(num_groups=num_groups, num_channels=planes)
137 |
138 | elif norm_fn == 'batch':
139 | self.norm1 = nn.BatchNorm2d(planes // 4)
140 | self.norm2 = nn.BatchNorm2d(planes // 4)
141 | self.norm3 = nn.BatchNorm2d(planes)
142 | if not stride == 1:
143 | self.norm4 = nn.BatchNorm2d(planes)
144 |
145 | elif norm_fn == 'instance':
146 | self.norm1 = nn.InstanceNorm2d(planes // 4)
147 | self.norm2 = nn.InstanceNorm2d(planes // 4)
148 | self.norm3 = nn.InstanceNorm2d(planes)
149 | if not stride == 1:
150 | self.norm4 = nn.InstanceNorm2d(planes)
151 |
152 | elif norm_fn == 'none':
153 | self.norm1 = nn.Sequential()
154 | self.norm2 = nn.Sequential()
155 | self.norm3 = nn.Sequential()
156 | if not stride == 1:
157 | self.norm4 = nn.Sequential()
158 |
159 | if stride == 1:
160 | self.downsample = None
161 |
162 | else:
163 | self.downsample = nn.Sequential(
164 | nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride), self.norm4)
165 |
166 | def forward(self, x):
167 | y = x
168 | y = self.relu(self.norm1(self.conv1(y)))
169 | y = self.relu(self.norm2(self.conv2(y)))
170 | y = self.relu(self.norm3(self.conv3(y)))
171 |
172 | if self.downsample is not None:
173 | x = self.downsample(x)
174 |
175 | return self.relu(x + y)
176 |
177 |
178 | class BasicEncoder(nn.Module):
179 | def __init__(self, output_dim=128, norm_fn='batch', dropout=0.0):
180 | super(BasicEncoder, self).__init__()
181 | self.norm_fn = norm_fn
182 |
183 | if self.norm_fn == 'group':
184 | self.norm1 = nn.GroupNorm(num_groups=8, num_channels=64)
185 |
186 | elif self.norm_fn == 'batch':
187 | self.norm1 = nn.BatchNorm2d(64)
188 |
189 | elif self.norm_fn == 'instance':
190 | self.norm1 = nn.InstanceNorm2d(64)
191 |
192 | elif self.norm_fn == 'none':
193 | self.norm1 = nn.Sequential()
194 |
195 | self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
196 | self.relu1 = nn.ReLU(inplace=True)
197 |
198 | self.in_planes = 64
199 | self.layer1 = self._make_layer(64, stride=1)
200 | self.layer2 = self._make_layer(96, stride=2)
201 | self.layer3 = self._make_layer(128, stride=2)
202 |
203 | # output convolution
204 | self.conv2 = nn.Conv2d(128, output_dim, kernel_size=1)
205 |
206 | self.dropout = None
207 | if dropout > 0:
208 | self.dropout = nn.Dropout2d(p=dropout)
209 |
210 | for m in self.modules():
211 | if isinstance(m, nn.Conv2d):
212 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
213 | elif isinstance(m, (nn.BatchNorm2d, nn.InstanceNorm2d, nn.GroupNorm)):
214 | if m.weight is not None:
215 | nn.init.constant_(m.weight, 1)
216 | if m.bias is not None:
217 | nn.init.constant_(m.bias, 0)
218 |
219 | def _make_layer(self, dim, stride=1):
220 | layer1 = ResidualBlock(self.in_planes, dim, self.norm_fn, stride=stride)
221 | layer2 = ResidualBlock(dim, dim, self.norm_fn, stride=1)
222 | layers = (layer1, layer2)
223 |
224 | self.in_planes = dim
225 | return nn.Sequential(*layers)
226 |
227 | def forward(self, x):
228 |
229 | # if input is list, combine batch dimension
230 | is_list = isinstance(x, tuple) or isinstance(x, list)
231 | if is_list:
232 | batch_dim = x[0].shape[0]
233 | x = torch.cat(x, dim=0)
234 |
235 | x = self.conv1(x)
236 | # import cv2
237 | # cv2.imwrite(f"./images/output/image.png", torch.sum(x, dim=1)[0].cpu().detach().numpy()*255)
238 | x = self.norm1(x)
239 | x = self.relu1(x)
240 |
241 | x = self.layer1(x)
242 | x = self.layer2(x)
243 | x = self.layer3(x)
244 |
245 | x = self.conv2(x)
246 |
247 | if self.training and self.dropout is not None:
248 | x = self.dropout(x)
249 |
250 | if is_list:
251 | x = torch.split(x, [batch_dim, batch_dim], dim=0)
252 |
253 | return x
254 |
255 |
256 | class BasicEncoder_LIDAR(nn.Module):
257 | def __init__(self, output_dim=128, norm_fn='batch', dropout=0.0):
258 | super(BasicEncoder_LIDAR, self).__init__()
259 | self.norm_fn = norm_fn
260 |
261 | if self.norm_fn == 'group':
262 | self.norm1 = nn.GroupNorm(num_groups=8, num_channels=64)
263 |
264 | elif self.norm_fn == 'batch':
265 | self.norm1 = nn.BatchNorm2d(64)
266 |
267 | elif self.norm_fn == 'instance':
268 | self.norm1 = nn.InstanceNorm2d(64)
269 |
270 | elif self.norm_fn == 'none':
271 | self.norm1 = nn.Sequential()
272 |
273 | self.convm = nn.Conv2d(1, 1, kernel_size=7, stride=2, padding=3)
274 |
275 | self.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3)
276 | self.relu1 = nn.ReLU(inplace=True)
277 |
278 | self.in_planes = 64
279 | self.layer1 = self._make_layer(64, stride=1)
280 | self.layer2 = self._make_layer(96, stride=2)
281 | self.layer3 = self._make_layer(128, stride=2)
282 |
283 | # output convolution
284 | self.conv2 = nn.Conv2d(128, output_dim, kernel_size=1)
285 |
286 | self.dropout = None
287 | if dropout > 0:
288 | self.dropout = nn.Dropout2d(p=dropout)
289 |
290 | for m in self.modules():
291 | if isinstance(m, nn.Conv2d):
292 | nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
293 | elif isinstance(m, (nn.BatchNorm2d, nn.InstanceNorm2d, nn.GroupNorm)):
294 | if m.weight is not None:
295 | nn.init.constant_(m.weight, 1)
296 | if m.bias is not None:
297 | nn.init.constant_(m.bias, 0)
298 |
299 | def _make_layer(self, dim, stride=1):
300 | layer1 = ResidualBlock(self.in_planes, dim, self.norm_fn, stride=stride)
301 | layer2 = ResidualBlock(dim, dim, self.norm_fn, stride=1)
302 | layers = (layer1, layer2)
303 |
304 | self.in_planes = dim
305 | return nn.Sequential(*layers)
306 |
307 | def forward(self, x, mask):
308 |
309 | # if input is list, combine batch dimension
310 | is_list = isinstance(x, tuple) or isinstance(x, list)
311 | if is_list:
312 | batch_dim = x[0].shape[0]
313 | x = torch.cat(x, dim=0)
314 |
315 | mask = self.convm(mask)
316 |
317 | # if i_batch is not None:
318 | # import cv2
319 | # import seaborn
320 | # import matplotlib.pyplot as plt
321 | # import numpy as np
322 | # heatmap = seaborn.heatmap(mask[0, 0, :, :].cpu().detach().numpy(), xticklabels=False, yticklabels=False, cbar=False, square=True, robust=True, cmap='gist_rainbow')
323 | # figure = heatmap.get_figure()
324 | # figure.savefig(f"./images/output/{i_batch:06d}_heatmap.png")
325 | # plt.close()
326 |
327 | x = self.conv1(x) # 160 480 64
328 | x = torch.mul(x, mask)
329 |
330 | x = self.norm1(x)
331 | x = self.relu1(x)
332 | x = self.layer1(x) # 160 480 64
333 | x = self.layer2(x) # 80 240 96
334 | x = self.layer3(x) # 40 120 128
335 |
336 | x = self.conv2(x) # 40 120 128
337 |
338 | if self.training and self.dropout is not None:
339 | x = self.dropout(x)
340 |
341 | if is_list:
342 | x = torch.split(x, [batch_dim, batch_dim], dim=0)
343 |
344 | return x
--------------------------------------------------------------------------------
/core/flow2pose.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import visibility
3 | import cv2
4 | import numpy as np
5 | import mathutils
6 | import math
7 |
8 | import sys
9 | sys.path.append('core')
10 | from camera_model import CameraModel
11 | from utils_point import invert_pose, quat2mat, tvector2mat, quaternion_from_matrix, rotation_vector_to_euler
12 | from quaternion_distances import quaternion_distance
13 |
14 | def Flow2Pose(flow_up, depth, calib):
15 | device = flow_up.device
16 |
17 | output = torch.zeros(flow_up.shape).to(device)
18 | pred_depth_img = torch.zeros(depth.shape).to(device)
19 | pred_depth_img += 1000.
20 | output = visibility.image_warp_index(depth.to(device), flow_up.int(), pred_depth_img, output,
21 | depth.shape[3], depth.shape[2])
22 | pred_depth_img[pred_depth_img == 1000.] = 0.
23 | pc_project_uv = output.cpu().permute(0, 2, 3, 1).numpy()
24 |
25 | depth_img_ori = depth.cpu().numpy() * 100.
26 |
27 | mask_depth_1 = pc_project_uv[0, :, :, 0] != 0
28 | mask_depth_2 = pc_project_uv[0, :, :, 1] != 0
29 | mask_depth = mask_depth_1 + mask_depth_2
30 | depth_img = depth_img_ori[0, 0, :, :] * mask_depth
31 |
32 | cam_model = CameraModel()
33 | cam_params = calib[0].cpu().numpy()
34 | x, y = 28, 140
35 | cam_params[2] = cam_params[2] + 480 - (y + y + 960) / 2.
36 | cam_params[3] = cam_params[3] + 160 - (x + x + 320) / 2.
37 | cam_model.focal_length = cam_params[:2]
38 | cam_model.principal_point = cam_params[2:]
39 | cam_mat = np.array([[cam_params[0], 0, cam_params[2]], [0, cam_params[1], cam_params[3]], [0, 0, 1.]])
40 |
41 | pts3d, pts2d, match_index = cam_model.deproject_pytorch(depth_img, pc_project_uv[0, :, :, :])
42 | ret, rvecs, tvecs, inliers = cv2.solvePnPRansac(pts3d, pts2d, cam_mat, None)
43 |
44 | reuler = rotation_vector_to_euler(rvecs)
45 | R = mathutils.Euler((reuler[0], reuler[1], reuler[2]), 'XYZ')
46 | T = mathutils.Vector((tvecs[0], tvecs[1], tvecs[2]))
47 | R, T = invert_pose(R, T)
48 | R, T = torch.tensor(R), torch.tensor(T)
49 | R_predicted = R[[0, 3, 1, 2]]
50 | T_predicted = T[[2, 0, 1]]
51 |
52 | return R_predicted, T_predicted
53 |
54 | def Flow2PoseBPnP(flow_up, depth, calib, bpnp):
55 | device =flow_up.device
56 | output = torch.zeros(flow_up.shape).to(device)
57 | pred_depth_img = torch.zeros(depth.shape).to(device)
58 | pred_depth_img += 1000.
59 | output = visibility.image_warp_index(depth.to(device), flow_up.int().to(device), pred_depth_img, output,
60 | depth.shape[3], depth.shape[2])
61 | pred_depth_img[pred_depth_img == 1000.] = 0.
62 | pc_project_uv = output.cpu().permute(0, 2, 3, 1).numpy()
63 |
64 | depth_img_ori = depth.cpu().numpy() * 100.
65 |
66 | mask_depth_1 = pc_project_uv[0, :, :, 0] != 0
67 | mask_depth_2 = pc_project_uv[0, :, :, 1] != 0
68 | mask_depth = mask_depth_1 + mask_depth_2
69 | depth_img = depth_img_ori[0, 0, :, :] * mask_depth
70 |
71 | cam_model = CameraModel()
72 | cam_params = calib[0].cpu().numpy()
73 | x, y = 28, 140
74 | cam_params[2] = cam_params[2] + 480 - (y + y + 960) / 2.
75 | cam_params[3] = cam_params[3] + 160 - (x + x + 320) / 2.
76 | cam_model.focal_length = cam_params[:2]
77 | cam_model.principal_point = cam_params[2:]
78 | cam_mat = np.array([[cam_params[0], 0, cam_params[2]], [0, cam_params[1], cam_params[3]], [0, 0, 1.]])
79 |
80 | pts3d, pts2d, match_index = cam_model.deproject_pytorch(depth_img, pc_project_uv[0, :, :, :])
81 |
82 | pts3d = torch.tensor(pts3d, dtype=torch.float32).to(device)
83 | pts2d = torch.tensor(pts2d, dtype=torch.float32).to(device)
84 | pts2d = pts2d.unsqueeze(0)
85 | K = torch.tensor(cam_mat, dtype=torch.float32).to(device)
86 | P_out = bpnp(pts2d, pts3d, K)
87 | rvecs = P_out[0, 0:3]
88 | tvecs = P_out[0, 3:]
89 |
90 | R = mathutils.Euler((rvecs[0], rvecs[1], rvecs[2]), 'XYZ')
91 | T = mathutils.Vector((tvecs[0], tvecs[1], tvecs[2]))
92 | R, T = invert_pose(R, T)
93 | R, T = torch.tensor(R), torch.tensor(T)
94 | R_predicted = R[[0, 3, 1, 2]]
95 | T_predicted = T[[2, 0, 1]]
96 |
97 | return R_predicted, T_predicted
98 |
99 | def err_Pose(R_pred, T_pred, R_gt, T_gt):
100 | device = R_pred.device
101 |
102 | R = quat2mat(R_gt)
103 | T = tvector2mat(T_gt)
104 | RT_inv = torch.mm(T, R).to(device)
105 | RT = RT_inv.clone().inverse()
106 |
107 | R_pred = quat2mat(R_pred)
108 | T_pred = tvector2mat(T_pred)
109 | RT_pred = torch.mm(T_pred, R_pred)
110 | RT_pred = RT_pred.to(device)
111 | RT_new = torch.mm(RT, RT_pred)
112 |
113 | T_composed = RT_new[:3, 3]
114 | R_composed = quaternion_from_matrix(RT_new)
115 | R_composed = R_composed.unsqueeze(0)
116 | total_trasl_error = torch.tensor(0.0).to(device)
117 | total_rot_error = quaternion_distance(R_composed.to(device), torch.tensor([[1., 0., 0., 0.]]).to(device),
118 | device=R_composed.device)
119 | total_rot_error = total_rot_error * 180. / math.pi
120 | total_trasl_error += torch.norm(T_composed.to(device)) * 100.
121 |
122 | total_trasl_fail = torch.norm(T_composed - T_gt[0].to(device)) * 100
123 | if total_trasl_fail > 400:
124 | is_fail = True
125 | else:
126 | is_fail = False
127 | return total_rot_error, total_trasl_error, is_fail
128 |
129 |
130 |
--------------------------------------------------------------------------------
/core/flow_viz.py:
--------------------------------------------------------------------------------
1 | # Flow visualization code used from https://github.com/tomrunia/OpticalFlow_Visualization
2 |
3 |
4 | # MIT License
5 | #
6 | # Copyright (c) 2018 Tom Runia
7 | #
8 | # Permission is hereby granted, free of charge, to any person obtaining a copy
9 | # of this software and associated documentation files (the "Software"), to deal
10 | # in the Software without restriction, including without limitation the rights
11 | # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
12 | # copies of the Software, and to permit persons to whom the Software is
13 | # furnished to do so, subject to conditions.
14 | #
15 | # Author: Tom Runia
16 | # Date Created: 2018-08-03
17 |
18 | import numpy as np
19 |
20 | def make_colorwheel():
21 | """
22 | Generates a color wheel for optical flow visualization as presented in:
23 | Baker et al. "A Database and Evaluation Methodology for Optical Flow" (ICCV, 2007)
24 | URL: http://vision.middlebury.edu/flow/flowEval-iccv07.pdf
25 |
26 | Code follows the original C++ source code of Daniel Scharstein.
27 | Code follows the the Matlab source code of Deqing Sun.
28 |
29 | Returns:
30 | np.ndarray: Color wheel
31 | """
32 |
33 | RY = 15
34 | YG = 6
35 | GC = 4
36 | CB = 11
37 | BM = 13
38 | MR = 6
39 |
40 | ncols = RY + YG + GC + CB + BM + MR
41 | colorwheel = np.zeros((ncols, 3))
42 | col = 0
43 |
44 | # RY
45 | colorwheel[0:RY, 0] = 255
46 | colorwheel[0:RY, 1] = np.floor(255*np.arange(0,RY)/RY)
47 | col = col+RY
48 | # YG
49 | colorwheel[col:col+YG, 0] = 255 - np.floor(255*np.arange(0,YG)/YG)
50 | colorwheel[col:col+YG, 1] = 255
51 | col = col+YG
52 | # GC
53 | colorwheel[col:col+GC, 1] = 255
54 | colorwheel[col:col+GC, 2] = np.floor(255*np.arange(0,GC)/GC)
55 | col = col+GC
56 | # CB
57 | colorwheel[col:col+CB, 1] = 255 - np.floor(255*np.arange(CB)/CB)
58 | colorwheel[col:col+CB, 2] = 255
59 | col = col+CB
60 | # BM
61 | colorwheel[col:col+BM, 2] = 255
62 | colorwheel[col:col+BM, 0] = np.floor(255*np.arange(0,BM)/BM)
63 | col = col+BM
64 | # MR
65 | colorwheel[col:col+MR, 2] = 255 - np.floor(255*np.arange(MR)/MR)
66 | colorwheel[col:col+MR, 0] = 255
67 | return colorwheel
68 |
69 |
70 | def flow_uv_to_colors(u, v, convert_to_bgr=False):
71 | """
72 | Applies the flow color wheel to (possibly clipped) flow components u and v.
73 |
74 | According to the C++ source code of Daniel Scharstein
75 | According to the Matlab source code of Deqing Sun
76 |
77 | Args:
78 | u (np.ndarray): Input horizontal flow of shape [H,W]
79 | v (np.ndarray): Input vertical flow of shape [H,W]
80 | convert_to_bgr (bool, optional): Convert output image to BGR. Defaults to False.
81 |
82 | Returns:
83 | np.ndarray: Flow visualization image of shape [H,W,3]
84 | """
85 | flow_image = np.zeros((u.shape[0], u.shape[1], 3), np.uint8)
86 | colorwheel = make_colorwheel() # shape [55x3]
87 | ncols = colorwheel.shape[0]
88 | rad = np.sqrt(np.square(u) + np.square(v))
89 | a = np.arctan2(-v, -u)/np.pi
90 | fk = (a+1) / 2*(ncols-1)
91 | k0 = np.floor(fk).astype(np.int32)
92 | k1 = k0 + 1
93 | k1[k1 == ncols] = 0
94 | f = fk - k0
95 | for i in range(colorwheel.shape[1]):
96 | tmp = colorwheel[:,i]
97 | col0 = tmp[k0] / 255.0
98 | col1 = tmp[k1] / 255.0
99 | col = (1-f)*col0 + f*col1
100 | idx = (rad <= 1)
101 | col[idx] = 1 - rad[idx] * (1-col[idx])
102 | col[~idx] = col[~idx] * 0.75 # out of range
103 | # Note the 2-i => BGR instead of RGB
104 | ch_idx = 2-i if convert_to_bgr else i
105 | flow_image[:,:,ch_idx] = np.floor(255 * col)
106 | return flow_image
107 |
108 |
109 | def flow_to_image(flow_uv, clip_flow=None, convert_to_bgr=False):
110 | """
111 | Expects a two dimensional flow image of shape.
112 |
113 | Args:
114 | flow_uv (np.ndarray): Flow UV image of shape [H,W,2]
115 | clip_flow (float, optional): Clip maximum of flow values. Defaults to None.
116 | convert_to_bgr (bool, optional): Convert output image to BGR. Defaults to False.
117 |
118 | Returns:
119 | np.ndarray: Flow visualization image of shape [H,W,3]
120 | """
121 | assert flow_uv.ndim == 3, 'input flow must have three dimensions'
122 | assert flow_uv.shape[2] == 2, 'input flow must have shape [H,W,2]'
123 | if clip_flow is not None:
124 | flow_uv = np.clip(flow_uv, 0, clip_flow)
125 | u = flow_uv[:,:,0]
126 | v = flow_uv[:,:,1]
127 | rad = np.sqrt(np.square(u) + np.square(v))
128 | rad_max = np.max(rad)
129 | epsilon = 1e-5
130 | u = u / (rad_max + epsilon)
131 | v = v / (rad_max + epsilon)
132 | return flow_uv_to_colors(u, v, convert_to_bgr)
--------------------------------------------------------------------------------
/core/losses.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import numpy as np
4 | import visibility
5 | from depth_completion import sparse_to_dense
6 |
7 | def sequence_loss(flow_preds, flow_gt, gamma=0.8, MAX_FLOW=400):
8 | """ Loss function defined over sequence of flow predictions """
9 |
10 | mag = torch.sum(flow_gt ** 2, dim=1).sqrt()
11 | Mask = torch.zeros([flow_gt.shape[0], flow_gt.shape[1], flow_gt.shape[2],
12 | flow_gt.shape[3]]).to(flow_gt.device)
13 | mask = (flow_gt[:, 0, :, :] != 0) + (flow_gt[:, 1, :, :] != 0)
14 | valid = mask & (mag < MAX_FLOW)
15 | Mask[:, 0, :, :] = valid
16 | Mask[:, 1, :, :] = valid
17 | Mask = Mask != 0
18 | mask_sum = torch.sum(mask, dim=[1, 2])
19 |
20 | n_predictions = len(flow_preds)
21 | flow_loss = 0.0
22 |
23 | for i in range(n_predictions):
24 | i_weight = gamma ** (n_predictions - i - 1)
25 | Loss_reg = (flow_preds[i] - flow_gt) * Mask
26 | Loss_reg = torch.norm(Loss_reg, dim=1)
27 | Loss_reg = torch.sum(Loss_reg, dim=[1, 2])
28 | Loss_reg = Loss_reg / mask_sum
29 | flow_loss += i_weight * Loss_reg.mean()
30 |
31 | epe = torch.sum((flow_preds[-1] - flow_gt) ** 2, dim=1).sqrt()
32 | epe = epe.view(-1)[valid.view(-1)]
33 |
34 | metrics = {
35 | 'epe': epe.mean().item(),
36 | '1px': (epe < 1).float().mean().item(),
37 | '3px': (epe < 3).float().mean().item(),
38 | '5px': (epe < 5).float().mean().item(),
39 | }
40 |
41 | return flow_loss, metrics
42 |
43 |
44 | def normal_loss(pred_flows, gt_flows, cam_mats, lidar_input):
45 | device = gt_flows.device
46 |
47 | loss = 0.
48 |
49 | N = lidar_input.shape[0]
50 | flow_up = pred_flows[-1]
51 | for i in range(N):
52 | depth_img = lidar_input[i].unsqueeze(0)
53 | flow = flow_up[i].unsqueeze(0)
54 | gt_flow = gt_flows[i].unsqueeze(0)
55 | cam_params = cam_mats[i, :]
56 | fx, fy = cam_params[0], cam_params[1]
57 |
58 | output = torch.zeros(flow.shape).to(device)
59 | pred_depth_img = torch.zeros(depth_img.shape).to(device)
60 | pred_depth_img += 1000.
61 | output = visibility.image_warp_index(depth_img.to(device), flow.int().to(device), pred_depth_img,
62 | output,
63 | depth_img.shape[3], depth_img.shape[2])
64 | pred_depth_img[pred_depth_img == 1000.] = 0.
65 |
66 | output2 = torch.zeros(flow.shape).to(device)
67 | gt_depth_img = torch.zeros(depth_img.shape).to(device)
68 | gt_depth_img += 1000.
69 | output2 = visibility.image_warp_index(depth_img.to(device), gt_flow.int().to(device), gt_depth_img,
70 | output2,
71 | depth_img.shape[3], depth_img.shape[2])
72 | gt_depth_img[gt_depth_img == 1000.] = 0.
73 |
74 | gt_depth_img_dilate = sparse_to_dense(gt_depth_img[0, 0, :, :].cpu().numpy().astype(np.float32))
75 | gt_depth_img_dilate = torch.from_numpy(gt_depth_img_dilate).to(device).unsqueeze(0)
76 | pred_depth_img_dilate = sparse_to_dense(pred_depth_img[0, 0, :, :].cpu().numpy().astype(np.float32))
77 | pred_depth_img_dilate = torch.from_numpy(pred_depth_img_dilate).to(device).unsqueeze(0)
78 |
79 | ## choose common points
80 | mask1_1 = output2[0, 0, :, :] > 0
81 | mask1_2 = output2[0, 1, :, :] > 0
82 | mask1 = mask1_1 + mask1_2
83 | mask2_1 = output[0, 0, :, :] > 0
84 | mask2_2 = output[0, 1, :, :] > 0
85 | mask2 = mask2_1 + mask2_2
86 | mask = mask1 * mask2
87 | Mask = torch.cat((mask.unsqueeze(0), mask.unsqueeze(0)), dim=0).unsqueeze(0)
88 | pred_index = torch.cat((output[0, 0, :, :][mask].unsqueeze(0), output[0, 1, :, :][mask].unsqueeze(0)), dim=0)
89 | gt_index = torch.cat((output2[0, 0, :, :][mask].unsqueeze(0), output2[0, 1, :, :][mask].unsqueeze(0)), dim=0)
90 |
91 | ## calculate normal loss
92 | H, W = depth_img.shape[2], depth_img.shape[3]
93 | xx = torch.arange(0, W).view(1, -1).repeat(H, 1)
94 | yy = torch.arange(0, H).view(-1, 1).repeat(1, W)
95 | xx = xx.view(1, H, W)
96 | yy = yy.view(1, H, W)
97 | u0 = xx.to(device) - torch.tensor(W // 2, dtype=torch.float32).to(device)
98 | v0 = yy.to(device) - torch.tensor(H // 2, dtype=torch.float32).to(device)
99 |
100 | x = u0 * torch.abs(gt_depth_img_dilate) / fx
101 | y = v0 * torch.abs(gt_depth_img_dilate) / fy
102 | z = gt_depth_img_dilate
103 | pc_gt = torch.cat([x, y, z], 0).permute(1, 2, 0)
104 | x = u0 * torch.abs(pred_depth_img_dilate) / fx
105 | y = v0 * torch.abs(pred_depth_img_dilate) / fy
106 | z = pred_depth_img_dilate
107 | pc_pred = torch.cat([x, y, z], 0).permute(1, 2, 0)
108 |
109 | num = gt_index.shape[1]
110 | sample_num = int(0.1 * num)
111 | p1_index = np.random.choice(num, sample_num, replace=True)
112 | p2_index = np.random.choice(num, sample_num, replace=True)
113 | p3_index = np.random.choice(num, sample_num, replace=True)
114 | gt_p1_index = gt_index[:, p1_index].int()
115 | gt_p2_index = gt_index[:, p2_index].int()
116 | gt_p3_index = gt_index[:, p3_index].int()
117 | pred_p1_index = pred_index[:, p1_index].int()
118 | pred_p2_index = pred_index[:, p2_index].int()
119 | pred_p3_index = pred_index[:, p3_index].int()
120 |
121 | pc_gt_1 = pc_gt[gt_p1_index[0, :].cpu().numpy(), gt_p1_index[1, :].cpu().numpy(), :]
122 | pc_gt_2 = pc_gt[gt_p2_index[0, :].cpu().numpy(), gt_p2_index[1, :].cpu().numpy(), :]
123 | pc_gt_3 = pc_gt[gt_p3_index[0, :].cpu().numpy(), gt_p3_index[1, :].cpu().numpy(), :]
124 | pc_pred_1 = pc_pred[pred_p1_index[0, :].cpu().numpy(), pred_p1_index[1, :].cpu().numpy()]
125 | pc_pred_2 = pc_pred[pred_p2_index[0, :].cpu().numpy(), pred_p2_index[1, :].cpu().numpy()]
126 | pc_pred_3 = pc_pred[pred_p3_index[0, :].cpu().numpy(), pred_p3_index[1, :].cpu().numpy()]
127 | pc_gt_group = torch.cat([pc_gt_1[:, :, np.newaxis],
128 | pc_gt_2[:, :, np.newaxis],
129 | pc_gt_3[:, :, np.newaxis]], 2) # Nx3x3
130 | pc_pred_group = torch.cat([pc_pred_1[:, :, np.newaxis],
131 | pc_pred_2[:, :, np.newaxis],
132 | pc_pred_3[:, :, np.newaxis]], 2) # Nx3x3
133 | pc_gt_group.requires_grad = True
134 | pc_pred_group.requires_grad = True
135 |
136 | gt_p12 = pc_gt_group[:, :, 1] - pc_gt_group[:, :, 0] # Nx3
137 | gt_p13 = pc_gt_group[:, :, 2] - pc_gt_group[:, :, 0] # Nx3
138 | pred_p12 = pc_pred_group[:, :, 1] - pc_pred_group[:, :, 0] # Nx3
139 | pred_p13 = pc_pred_group[:, :, 2] - pc_pred_group[:, :, 0] # Nx3
140 |
141 | gt_normal = torch.cross(gt_p12, gt_p13, dim=1)
142 | gt_norm = torch.norm(gt_normal, 2, dim=1, keepdim=True)
143 | gt_normal = gt_normal / gt_norm
144 | pred_noraml = torch.cross(pred_p12, pred_p13, dim=1)
145 | pred_norm = torch.norm(pred_noraml, 2, dim=1, keepdim=True)
146 | pred_noraml = pred_noraml / pred_norm
147 | loss_p = torch.sum(torch.abs(gt_normal - pred_noraml), dim=1)
148 | loss_p = torch.mean(loss_p)
149 |
150 | loss = loss + loss_p
151 |
152 | return loss
--------------------------------------------------------------------------------
/core/quaternion_distances.py:
--------------------------------------------------------------------------------
1 | # -------------------------------------------------------------------
2 | # Copyright (C) 2020 Università degli studi di Milano-Bicocca, iralab
3 | # Author: Daniele Cattaneo (d.cattaneo10@campus.unimib.it)
4 | # Released under Creative Commons
5 | # Attribution-NonCommercial-ShareAlike 4.0 International License.
6 | # http://creativecommons.org/licenses/by-nc-sa/4.0/
7 | # -------------------------------------------------------------------
8 | import numpy as np
9 | import torch
10 |
11 |
12 | def quatmultiply(q, r, device='cpu'):
13 | """
14 | Batch quaternion multiplication
15 | Args:
16 | q (torch.Tensor/np.ndarray): shape=[Nx4]
17 | r (torch.Tensor/np.ndarray): shape=[Nx4]
18 | device (str): 'cuda' or 'cpu'
19 |
20 | Returns:
21 | torch.Tensor: shape=[Nx4]
22 | """
23 | if isinstance(q, torch.Tensor):
24 | t = torch.zeros(q.shape[0], 4, device=device)
25 | elif isinstance(q, np.ndarray):
26 | t = np.zeros(q.shape[0], 4)
27 | else:
28 | raise TypeError("Type not supported")
29 | t[:, 0] = r[:, 0] * q[:, 0] - r[:, 1] * q[:, 1] - r[:, 2] * q[:, 2] - r[:, 3] * q[:, 3]
30 | t[:, 1] = r[:, 0] * q[:, 1] + r[:, 1] * q[:, 0] - r[:, 2] * q[:, 3] + r[:, 3] * q[:, 2]
31 | t[:, 2] = r[:, 0] * q[:, 2] + r[:, 1] * q[:, 3] + r[:, 2] * q[:, 0] - r[:, 3] * q[:, 1]
32 | t[:, 3] = r[:, 0] * q[:, 3] - r[:, 1] * q[:, 2] + r[:, 2] * q[:, 1] + r[:, 3] * q[:, 0]
33 | return t
34 |
35 |
36 | def quatinv(q):
37 | """
38 | Batch quaternion inversion
39 | Args:
40 | q (torch.Tensor/np.ndarray): shape=[Nx4]
41 |
42 | Returns:
43 | torch.Tensor/np.ndarray: shape=[Nx4]
44 | """
45 | if isinstance(q, torch.Tensor):
46 | t = q.clone()
47 | elif isinstance(q, np.ndarray):
48 | t = q.copy()
49 | else:
50 | raise TypeError("Type not supported")
51 | t *= -1
52 | t[:, 0] *= -1
53 | return t
54 |
55 |
56 | def quaternion_distance(q, r, device):
57 | """
58 | Batch quaternion distances, used as loss
59 | Args:
60 | q (torch.Tensor): shape=[Nx4]
61 | r (torch.Tensor): shape=[Nx4]
62 | device (str): 'cuda' or 'cpu'
63 |
64 | Returns:
65 | torch.Tensor: shape=[N]
66 | """
67 | t = quatmultiply(q, quatinv(r), device)
68 | return 2 * torch.atan2(torch.norm(t[:, 1:], dim=1), torch.abs(t[:, 0]))
69 |
--------------------------------------------------------------------------------
/core/raft.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import torch
3 | import torch.nn as nn
4 | import torch.nn.functional as F
5 |
6 | import sys
7 | sys.path.append("core")
8 | from update import BasicUpdateBlock
9 | from extractor import BasicEncoder, BasicEncoder_LIDAR
10 | from corr import CorrBlock, AlternateCorrBlock
11 | from utils import bilinear_sampler, coords_grid, upflow8
12 |
13 | try:
14 | autocast = torch.cuda.amp.autocast
15 | except:
16 | # dummy autocast for PyTorch < 1.6
17 | class autocast:
18 | def __init__(self, enabled):
19 | pass
20 |
21 | def __enter__(self):
22 | pass
23 |
24 | def __exit__(self, *args):
25 | pass
26 |
27 |
28 | class RAFT(nn.Module):
29 | def __init__(self, args):
30 | super(RAFT, self).__init__()
31 | self.args = args
32 |
33 | self.hidden_dim = hdim = 128
34 | self.context_dim = cdim = 128
35 | args.corr_levels = 4
36 | args.corr_radius = 4
37 |
38 | if 'dropout' not in self.args:
39 | self.args.dropout = 0
40 |
41 | if 'alternate_corr' not in self.args:
42 | self.args.alternate_corr = False
43 |
44 | # feature network, context network, and update block
45 | self.fnet = BasicEncoder(output_dim=256, norm_fn='instance', dropout=args.dropout)
46 | self.fnet_lidar = BasicEncoder_LIDAR(output_dim=256, norm_fn='instance', dropout=args.dropout)
47 | self.cnet = BasicEncoder_LIDAR(output_dim=hdim + cdim, norm_fn='batch', dropout=args.dropout)
48 | self.update_block = BasicUpdateBlock(self.args, hidden_dim=hdim)
49 |
50 | def freeze_bn(self):
51 | for m in self.modules():
52 | if isinstance(m, nn.BatchNorm2d):
53 | m.eval()
54 |
55 | def initialize_flow(self, img):
56 | """ Flow is represented as difference between two coordinate grids flow = coords1 - coords0"""
57 | N, C, H, W = img.shape
58 | coords0 = coords_grid(N, H // 8, W // 8).to(img.device)
59 | coords1 = coords_grid(N, H // 8, W // 8).to(img.device)
60 |
61 | # optical flow computed as difference: flow = coords1 - coords0
62 | return coords0, coords1
63 |
64 | def upsample_flow(self, flow, mask):
65 | """ Upsample flow field [H/8, W/8, 2] -> [H, W, 2] using convex combination """
66 | N, _, H, W = flow.shape
67 | mask = mask.view(N, 1, 9, 8, 8, H, W)
68 | mask = torch.softmax(mask, dim=2)
69 |
70 | up_flow = F.unfold(8 * flow, [3, 3], padding=1)
71 | up_flow = up_flow.view(N, 2, 9, 1, 1, H, W)
72 |
73 | up_flow = torch.sum(mask * up_flow, dim=2)
74 | up_flow = up_flow.permute(0, 1, 4, 2, 5, 3)
75 | return up_flow.reshape(N, 2, 8 * H, 8 * W)
76 |
77 | def forward(self, image1, image2, iters=12, lidar_mask=None, flow_init=None, upsample=True, test_mode=False):
78 | """ Estimate optical flow between pair of frames """
79 |
80 | image1 = 2 * image1 - 1.0
81 |
82 | image1 = image1.contiguous()
83 | image2 = image2.contiguous()
84 |
85 | hdim = self.hidden_dim
86 | cdim = self.context_dim
87 |
88 | # run the feature network
89 | with autocast(enabled=self.args.mixed_precision):
90 | fmap2 = self.fnet(image2)
91 | fmap1 = self.fnet_lidar(image1, lidar_mask)
92 |
93 | fmap1 = fmap1.float()
94 | fmap2 = fmap2.float()
95 |
96 | if self.args.alternate_corr:
97 | corr_fn = AlternateCorrBlock(fmap1, fmap2, radius=self.args.corr_radius)
98 | else:
99 | corr_fn = CorrBlock(fmap1, fmap2, radius=self.args.corr_radius)
100 |
101 | # run the context network
102 | with autocast(enabled=self.args.mixed_precision):
103 | cnet = self.cnet(image1, lidar_mask)
104 | net, inp = torch.split(cnet, [hdim, cdim], dim=1)
105 | net = torch.tanh(net)
106 | inp = torch.relu(inp)
107 |
108 | coords0, coords1 = self.initialize_flow(image1)
109 |
110 | if flow_init is not None:
111 | coords1 = coords1 + flow_init
112 |
113 | flow_predictions = []
114 | for itr in range(iters):
115 | coords1 = coords1.detach()
116 |
117 | corr = corr_fn(coords1) # index correlation volume
118 |
119 | flow = coords1 - coords0
120 | with autocast(enabled=self.args.mixed_precision):
121 | net, up_mask, delta_flow = self.update_block(net, inp, corr, flow)
122 |
123 | # F(t+1) = F(t) + \Delta(t)
124 | coords1 = coords1 + delta_flow
125 |
126 | # upsample predictions
127 | if up_mask is None:
128 | flow_up = upflow8(coords1 - coords0)
129 | else:
130 | flow_up = self.upsample_flow(coords1 - coords0, up_mask)
131 |
132 | flow_predictions.append(flow_up)
133 |
134 | if test_mode:
135 | return coords1 - coords0, flow_up
136 |
137 | return flow_predictions
138 |
--------------------------------------------------------------------------------
/core/update.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import torch.nn.functional as F
4 |
5 |
6 | class FlowHead(nn.Module):
7 | def __init__(self, input_dim=128, hidden_dim=256):
8 | super(FlowHead, self).__init__()
9 | self.conv1 = nn.Conv2d(input_dim, hidden_dim, 3, padding=1)
10 | self.conv2 = nn.Conv2d(hidden_dim, 2, 3, padding=1)
11 | self.relu = nn.ReLU(inplace=True)
12 |
13 | def forward(self, x):
14 | return self.conv2(self.relu(self.conv1(x)))
15 |
16 | class ConvGRU(nn.Module):
17 | def __init__(self, hidden_dim=128, input_dim=192+128):
18 | super(ConvGRU, self).__init__()
19 | self.convz = nn.Conv2d(hidden_dim+input_dim, hidden_dim, 3, padding=1)
20 | self.convr = nn.Conv2d(hidden_dim+input_dim, hidden_dim, 3, padding=1)
21 | self.convq = nn.Conv2d(hidden_dim+input_dim, hidden_dim, 3, padding=1)
22 |
23 | def forward(self, h, x):
24 | hx = torch.cat([h, x], dim=1)
25 |
26 | z = torch.sigmoid(self.convz(hx))
27 | r = torch.sigmoid(self.convr(hx))
28 | q = torch.tanh(self.convq(torch.cat([r*h, x], dim=1)))
29 |
30 | h = (1-z) * h + z * q
31 | return h
32 |
33 | class SepConvGRU(nn.Module):
34 | def __init__(self, hidden_dim=128, input_dim=192+128):
35 | super(SepConvGRU, self).__init__()
36 | self.convz1 = nn.Conv2d(hidden_dim+input_dim, hidden_dim, (1,5), padding=(0,2))
37 | self.convr1 = nn.Conv2d(hidden_dim+input_dim, hidden_dim, (1,5), padding=(0,2))
38 | self.convq1 = nn.Conv2d(hidden_dim+input_dim, hidden_dim, (1,5), padding=(0,2))
39 |
40 | self.convz2 = nn.Conv2d(hidden_dim+input_dim, hidden_dim, (5,1), padding=(2,0))
41 | self.convr2 = nn.Conv2d(hidden_dim+input_dim, hidden_dim, (5,1), padding=(2,0))
42 | self.convq2 = nn.Conv2d(hidden_dim+input_dim, hidden_dim, (5,1), padding=(2,0))
43 |
44 |
45 | def forward(self, h, x):
46 | # horizontal
47 | hx = torch.cat([h, x], dim=1)
48 | z = torch.sigmoid(self.convz1(hx))
49 | r = torch.sigmoid(self.convr1(hx))
50 | q = torch.tanh(self.convq1(torch.cat([r*h, x], dim=1)))
51 | h = (1-z) * h + z * q
52 |
53 | # vertical
54 | hx = torch.cat([h, x], dim=1)
55 | z = torch.sigmoid(self.convz2(hx))
56 | r = torch.sigmoid(self.convr2(hx))
57 | q = torch.tanh(self.convq2(torch.cat([r*h, x], dim=1)))
58 | h = (1-z) * h + z * q
59 |
60 | return h
61 |
62 |
63 | class BasicMotionEncoder(nn.Module):
64 | def __init__(self, args):
65 | super(BasicMotionEncoder, self).__init__()
66 | cor_planes = args.corr_levels * (2*args.corr_radius + 1)**2
67 | self.convc1 = nn.Conv2d(cor_planes, 256, 1, padding=0)
68 | self.convc2 = nn.Conv2d(256, 192, 3, padding=1)
69 | self.convf1 = nn.Conv2d(2, 128, 7, padding=3)
70 | self.convf2 = nn.Conv2d(128, 64, 3, padding=1)
71 | self.conv = nn.Conv2d(64+192, 128-2, 3, padding=1)
72 |
73 | def forward(self, flow, corr):
74 | cor = F.relu(self.convc1(corr))
75 | cor = F.relu(self.convc2(cor))
76 | flo = F.relu(self.convf1(flow))
77 | flo = F.relu(self.convf2(flo))
78 |
79 | cor_flo = torch.cat([cor, flo], dim=1)
80 | out = F.relu(self.conv(cor_flo))
81 | return torch.cat([out, flow], dim=1)
82 |
83 | class BasicUpdateBlock(nn.Module):
84 | def __init__(self, args, hidden_dim=128, input_dim=128):
85 | super(BasicUpdateBlock, self).__init__()
86 | self.args = args
87 | self.encoder = BasicMotionEncoder(args)
88 | self.gru = SepConvGRU(hidden_dim=hidden_dim, input_dim=128+hidden_dim)
89 | self.flow_head = FlowHead(hidden_dim, hidden_dim=256)
90 |
91 | self.mask = nn.Sequential(
92 | nn.Conv2d(128, 256, 3, padding=1),
93 | nn.ReLU(inplace=True),
94 | nn.Conv2d(256, 64*9, 1, padding=0))
95 |
96 | def forward(self, net, inp, corr, flow, upsample=True):
97 | motion_features = self.encoder(flow, corr)
98 | inp = torch.cat([inp, motion_features], dim=1)
99 |
100 | net = self.gru(net, inp)
101 |
102 | delta_flow = self.flow_head(net)
103 |
104 | # scale mask to balence gradients
105 | mask = .25 * self.mask(net)
106 | return net, mask, delta_flow
107 |
108 |
109 |
110 |
--------------------------------------------------------------------------------
/core/utils.py:
--------------------------------------------------------------------------------
1 | import torch.optim as optim
2 | from tensorboardX import SummaryWriter
3 | import torch
4 | import torch.nn.functional as F
5 | import numpy as np
6 | from scipy import interpolate
7 |
8 |
9 | class Logger:
10 | def __init__(self, model, scheduler, SUM_FREQ=100):
11 | self.model = model
12 | self.scheduler = scheduler
13 | self.total_steps = 0
14 | self.running_loss = {}
15 | self.writer = None
16 | self.SUM_FREQ = SUM_FREQ
17 |
18 | def _print_training_status(self):
19 | metrics_data = [self.running_loss[k] / self.SUM_FREQ for k in sorted(self.running_loss.keys())]
20 | training_str = "[{:6d}, {:10.7f}] ".format(self.total_steps + 1, self.scheduler.get_last_lr()[0])
21 | metrics_str = ("{:10.4f}, " * len(metrics_data)).format(*metrics_data)
22 |
23 | # print the training status
24 | print(training_str + metrics_str)
25 |
26 | if self.writer is None:
27 | self.writer = SummaryWriter()
28 |
29 | for k in self.running_loss:
30 | self.writer.add_scalar(k, self.running_loss[k] / self.SUM_FREQ, self.total_steps)
31 | self.running_loss[k] = 0.0
32 |
33 | def push(self, metrics):
34 | self.total_steps += 1
35 |
36 | for key in metrics:
37 | if key not in self.running_loss:
38 | self.running_loss[key] = 0.0
39 |
40 | self.running_loss[key] += metrics[key]
41 |
42 | if self.total_steps % self.SUM_FREQ == self.SUM_FREQ - 1:
43 | self._print_training_status()
44 | self.running_loss = {}
45 |
46 | def write_dict(self, results):
47 | if self.writer is None:
48 | self.writer = SummaryWriter()
49 |
50 | for key in results:
51 | self.writer.add_scalar(key, results[key], self.total_steps)
52 |
53 | def close(self):
54 | self.writer.close()
55 |
56 | def count_parameters(model):
57 | return sum(p.numel() for p in model.parameters() if p.requires_grad)
58 |
59 | def fetch_optimizer(args, nums, model):
60 | """ Create the optimizer and learning rate scheduler """
61 | optimizer = optim.AdamW(model.parameters(), lr=args.lr, weight_decay=args.wdecay, eps=args.epsilon)
62 |
63 | # scheduler = optim.lr_scheduler.OneCycleLR(optimizer, args.lr, args.num_steps+100,
64 | # pct_start=0.05, cycle_momentum=False, anneal_strategy='linear')
65 | scheduler = optim.lr_scheduler.OneCycleLR(optimizer, args.lr, args.epochs * nums + 100,
66 | pct_start=0.05, cycle_momentum=False, anneal_strategy='linear')
67 | return optimizer, scheduler
68 |
69 |
70 | def merge_inputs(queries):
71 | point_clouds = []
72 | imgs = []
73 | reflectances = []
74 | returns = {key: default_collate([d[key] for d in queries]) for key in queries[0]
75 | if key != 'point_cloud' and key != 'rgb' and key != 'reflectance'}
76 | for input in queries:
77 | point_clouds.append(input['point_cloud'])
78 | imgs.append(input['rgb'])
79 | if 'reflectance' in input:
80 | reflectances.append(input['reflectance'])
81 | returns['point_cloud'] = point_clouds
82 | returns['rgb'] = imgs
83 | if len(reflectances) > 0:
84 | returns['reflectance'] = reflectances
85 | return returns
86 |
87 |
88 | class InputPadder:
89 | """ Pads images such that dimensions are divisible by 8 """
90 |
91 | def __init__(self, dims, mode='sintel'):
92 | self.ht, self.wd = dims[-2:]
93 | pad_ht = (((self.ht // 8) + 1) * 8 - self.ht) % 8
94 | pad_wd = (((self.wd // 8) + 1) * 8 - self.wd) % 8
95 | if mode == 'sintel':
96 | self._pad = [pad_wd // 2, pad_wd - pad_wd // 2, pad_ht // 2, pad_ht - pad_ht // 2]
97 | else:
98 | self._pad = [pad_wd // 2, pad_wd - pad_wd // 2, 0, pad_ht]
99 |
100 | def pad(self, *inputs):
101 | return [F.pad(x, self._pad, mode='replicate') for x in inputs]
102 |
103 | def unpad(self, x):
104 | ht, wd = x.shape[-2:]
105 | c = [self._pad[2], ht - self._pad[3], self._pad[0], wd - self._pad[1]]
106 | return x[..., c[0]:c[1], c[2]:c[3]]
107 |
108 |
109 | def forward_interpolate(flow):
110 | flow = flow.detach().cpu().numpy()
111 | dx, dy = flow[0], flow[1]
112 |
113 | ht, wd = dx.shape
114 | x0, y0 = np.meshgrid(np.arange(wd), np.arange(ht))
115 |
116 | x1 = x0 + dx
117 | y1 = y0 + dy
118 |
119 | x1 = x1.reshape(-1)
120 | y1 = y1.reshape(-1)
121 | dx = dx.reshape(-1)
122 | dy = dy.reshape(-1)
123 |
124 | valid = (x1 > 0) & (x1 < wd) & (y1 > 0) & (y1 < ht)
125 | x1 = x1[valid]
126 | y1 = y1[valid]
127 | dx = dx[valid]
128 | dy = dy[valid]
129 |
130 | flow_x = interpolate.griddata(
131 | (x1, y1), dx, (x0, y0), method='nearest', fill_value=0)
132 |
133 | flow_y = interpolate.griddata(
134 | (x1, y1), dy, (x0, y0), method='nearest', fill_value=0)
135 |
136 | flow = np.stack([flow_x, flow_y], axis=0)
137 | return torch.from_numpy(flow).float()
138 |
139 |
140 | def bilinear_sampler(img, coords, mode='bilinear', mask=False):
141 | """ Wrapper for grid_sample, uses pixel coordinates """
142 | H, W = img.shape[-2:]
143 | xgrid, ygrid = coords.split([1, 1], dim=-1)
144 | xgrid = 2 * xgrid / (W - 1) - 1
145 | ygrid = 2 * ygrid / (H - 1) - 1
146 |
147 | grid = torch.cat([xgrid, ygrid], dim=-1)
148 | img = F.grid_sample(img, grid, align_corners=True)
149 |
150 | if mask:
151 | mask = (xgrid > -1) & (ygrid > -1) & (xgrid < 1) & (ygrid < 1)
152 | return img, mask.float()
153 |
154 | return img
155 |
156 |
157 | def coords_grid(batch, ht, wd):
158 | coords = torch.meshgrid(torch.arange(ht), torch.arange(wd))
159 | coords = torch.stack(coords[::-1], dim=0).float()
160 | return coords[None].repeat(batch, 1, 1, 1)
161 |
162 |
163 | def upflow8(flow, mode='bilinear'):
164 | new_size = (8 * flow.shape[2], 8 * flow.shape[3])
165 | return 8 * F.interpolate(flow, size=new_size, mode=mode, align_corners=True)
--------------------------------------------------------------------------------
/core/utils_point.py:
--------------------------------------------------------------------------------
1 | # -------------------------------------------------------------------
2 | # Copyright (C) 2020 Università degli studi di Milano-Bicocca, iralab
3 | # Author: Daniele Cattaneo (d.cattaneo10@campus.unimib.it)
4 | # Released under Creative Commons
5 | # Attribution-NonCommercial-ShareAlike 4.0 International License.
6 | # http://creativecommons.org/licenses/by-nc-sa/4.0/
7 | # -------------------------------------------------------------------
8 | import math
9 | import cv2
10 | from PIL import Image
11 | import mathutils
12 | import numpy as np
13 | import torch
14 | from torchvision import transforms
15 | import torch.nn.functional as F
16 | from matplotlib import cm
17 | from torch.utils.data.dataloader import default_collate
18 |
19 |
20 | def rotate_points(PC, R, T=None, inverse=True):
21 | if T is not None:
22 | R = R.to_matrix()
23 | R.resize_4x4()
24 | T = mathutils.Matrix.Translation(T)
25 | RT = T*R
26 | else:
27 | RT=R.copy()
28 | if inverse:
29 | RT.invert_safe()
30 | RT = torch.tensor(RT, device=PC.device, dtype=torch.float)
31 |
32 | if PC.shape[0] == 4:
33 | PC = torch.mm(RT, PC)
34 | elif PC.shape[1] == 4:
35 | PC = torch.mm(RT, PC.t())
36 | PC = PC.t()
37 | else:
38 | raise TypeError("Point cloud must have shape [Nx4] or [4xN] (homogeneous coordinates)")
39 | return PC
40 |
41 |
42 | def rotate_points_torch(PC, R, T=None, inverse=True):
43 | if T is not None:
44 | R = quat2mat(R)
45 | T = tvector2mat(T)
46 | RT = torch.mm(T, R)
47 | else:
48 | RT = R.clone()
49 | if inverse:
50 | RT = RT.inverse()
51 | if PC.shape[0] == 4:
52 | PC = torch.mm(RT, PC)
53 | elif PC.shape[1] == 4:
54 | PC = torch.mm(RT, PC.t())
55 | PC = PC.t()
56 | else:
57 | raise TypeError("Point cloud must have shape [Nx4] or [4xN] (homogeneous coordinates)")
58 | return PC
59 |
60 |
61 | def rotate_forward(PC, R, T=None):
62 | """
63 | Transform the point cloud PC, so to have the points 'as seen from' the new
64 | pose T*R
65 | Args:
66 | PC (torch.Tensor): Point Cloud to be transformed, shape [4xN] or [Nx4]
67 | R (torch.Tensor/mathutils.Euler): can be either:
68 | * (mathutils.Euler) euler angles of the rotation part, in this case T cannot be None
69 | * (torch.Tensor shape [4]) quaternion representation of the rotation part, in this case T cannot be None
70 | * (mathutils.Matrix shape [4x4]) Rotation matrix,
71 | in this case it should contains the translation part, and T should be None
72 | * (torch.Tensor shape [4x4]) Rotation matrix,
73 | in this case it should contains the translation part, and T should be None
74 | T (torch.Tensor/mathutils.Vector): Translation of the new pose, shape [3], or None (depending on R)
75 |
76 | Returns:
77 | torch.Tensor: Transformed Point Cloud 'as seen from' pose T*R
78 | """
79 | if isinstance(R, torch.Tensor):
80 | return rotate_points_torch(PC, R, T, inverse=True)
81 | else:
82 | return rotate_points(PC, R, T, inverse=True)
83 |
84 |
85 | def rotate_back(PC_ROTATED, R, T=None):
86 | """
87 | Inverse of :func:`~utils.rotate_forward`.
88 | """
89 | if isinstance(R, torch.Tensor):
90 | return rotate_points_torch(PC_ROTATED, R, T, inverse=False)
91 | else:
92 | return rotate_points(PC_ROTATED, R, T, inverse=False)
93 |
94 |
95 | def invert_pose(R, T):
96 | """
97 | Given the 'sampled pose' (aka H_init), we want CMRNet to predict inv(H_init).
98 | inv(T*R) will be used as ground truth for the network.
99 | Args:
100 | R (mathutils.Euler): Rotation of 'sampled pose'
101 | T (mathutils.Vector): Translation of 'sampled pose'
102 |
103 | Returns:
104 | (R_GT, T_GT) = (mathutils.Quaternion, mathutils.Vector)
105 | """
106 | R = R.to_matrix()
107 | R.resize_4x4()
108 | T = mathutils.Matrix.Translation(T)
109 | RT = T * R
110 | RT.invert_safe()
111 | T_GT, R_GT, _ = RT.decompose()
112 | return R_GT.normalized(), T_GT
113 |
114 |
115 | def merge_inputs(queries):
116 | point_clouds = []
117 | imgs = []
118 | reflectances = []
119 | returns = {key: default_collate([d[key] for d in queries]) for key in queries[0]
120 | if key != 'point_cloud' and key != 'rgb' and key != 'reflectance'}
121 | for input in queries:
122 | point_clouds.append(input['point_cloud'])
123 | imgs.append(input['rgb'])
124 | if 'reflectance' in input:
125 | reflectances.append(input['reflectance'])
126 | returns['point_cloud'] = point_clouds
127 | returns['rgb'] = imgs
128 | if len(reflectances) > 0:
129 | returns['reflectance'] = reflectances
130 | return returns
131 |
132 |
133 | def quaternion_from_matrix(matrix):
134 | """
135 | Convert a rotation matrix to quaternion.
136 | Args:
137 | matrix (torch.Tensor): [4x4] transformation matrix or [3,3] rotation matrix.
138 |
139 | Returns:
140 | torch.Tensor: shape [4], normalized quaternion
141 | """
142 | if matrix.shape == (4, 4):
143 | R = matrix[:-1, :-1]
144 | elif matrix.shape == (3, 3):
145 | R = matrix
146 | else:
147 | raise TypeError("Not a valid rotation matrix")
148 | tr = R[0, 0] + R[1, 1] + R[2, 2]
149 | q = torch.zeros(4, device=matrix.device)
150 | if tr > 0.:
151 | S = (tr+1.0).sqrt() * 2
152 | q[0] = 0.25 * S
153 | q[1] = (R[2, 1] - R[1, 2]) / S
154 | q[2] = (R[0, 2] - R[2, 0]) / S
155 | q[3] = (R[1, 0] - R[0, 1]) / S
156 | elif R[0, 0] > R[1, 1] and R[0, 0] > R[2, 2]:
157 | S = (1.0 + R[0, 0] - R[1, 1] - R[2, 2]).sqrt() * 2
158 | q[0] = (R[2, 1] - R[1, 2]) / S
159 | q[1] = 0.25 * S
160 | q[2] = (R[0, 1] + R[1, 0]) / S
161 | q[3] = (R[0, 2] + R[2, 0]) / S
162 | elif R[1, 1] > R[2, 2]:
163 | S = (1.0 + R[1, 1] - R[0, 0] - R[2, 2]).sqrt() * 2
164 | q[0] = (R[0, 2] - R[2, 0]) / S
165 | q[1] = (R[0, 1] + R[1, 0]) / S
166 | q[2] = 0.25 * S
167 | q[3] = (R[1, 2] + R[2, 1]) / S
168 | else:
169 | S = (1.0 + R[2, 2] - R[0, 0] - R[1, 1]).sqrt() * 2
170 | q[0] = (R[1, 0] - R[0, 1]) / S
171 | q[1] = (R[0, 2] + R[2, 0]) / S
172 | q[2] = (R[1, 2] + R[2, 1]) / S
173 | q[3] = 0.25 * S
174 | return q / q.norm()
175 |
176 |
177 | def quatmultiply(q, r):
178 | """
179 | Multiply two quaternions
180 | Args:
181 | q (torch.Tensor/nd.ndarray): shape=[4], first quaternion
182 | r (torch.Tensor/nd.ndarray): shape=[4], second quaternion
183 |
184 | Returns:
185 | torch.Tensor: shape=[4], normalized quaternion q*r
186 | """
187 | t = torch.zeros(4, device=q.device)
188 | t[0] = r[0] * q[0] - r[1] * q[1] - r[2] * q[2] - r[3] * q[3]
189 | t[1] = r[0] * q[1] + r[1] * q[0] - r[2] * q[3] + r[3] * q[2]
190 | t[2] = r[0] * q[2] + r[1] * q[3] + r[2] * q[0] - r[3] * q[1]
191 | t[3] = r[0] * q[3] - r[1] * q[2] + r[2] * q[1] + r[3] * q[0]
192 | return t / t.norm()
193 |
194 |
195 | def quat2mat(q):
196 | """
197 | Convert a quaternion to a rotation matrix
198 | Args:
199 | q (torch.Tensor): shape [4], input quaternion
200 |
201 | Returns:
202 | torch.Tensor: [4x4] homogeneous rotation matrix
203 | """
204 | assert q.shape == torch.Size([4]), "Not a valid quaternion"
205 | if q.norm() != 1.:
206 | q = q / q.norm()
207 | mat = torch.zeros((4, 4), device=q.device)
208 | mat[0, 0] = 1 - 2*q[2]**2 - 2*q[3]**2
209 | mat[0, 1] = 2*q[1]*q[2] - 2*q[3]*q[0]
210 | mat[0, 2] = 2*q[1]*q[3] + 2*q[2]*q[0]
211 | mat[1, 0] = 2*q[1]*q[2] + 2*q[3]*q[0]
212 | mat[1, 1] = 1 - 2*q[1]**2 - 2*q[3]**2
213 | mat[1, 2] = 2*q[2]*q[3] - 2*q[1]*q[0]
214 | mat[2, 0] = 2*q[1]*q[3] - 2*q[2]*q[0]
215 | mat[2, 1] = 2*q[2]*q[3] + 2*q[1]*q[0]
216 | mat[2, 2] = 1 - 2*q[1]**2 - 2*q[2]**2
217 | mat[3, 3] = 1.
218 | return mat
219 |
220 |
221 | def tvector2mat(t):
222 | """
223 | Translation vector to homogeneous transformation matrix with identity rotation
224 | Args:
225 | t (torch.Tensor): shape=[3], translation vector
226 |
227 | Returns:
228 | torch.Tensor: [4x4] homogeneous transformation matrix
229 |
230 | """
231 | assert t.shape == torch.Size([3]), "Not a valid translation"
232 | mat = torch.eye(4, device=t.device)
233 | mat[0, 3] = t[0]
234 | mat[1, 3] = t[1]
235 | mat[2, 3] = t[2]
236 | return mat
237 |
238 |
239 | def mat2xyzrpy(rotmatrix):
240 | """
241 | Decompose transformation matrix into components
242 | Args:
243 | rotmatrix (torch.Tensor/np.ndarray): [4x4] transformation matrix
244 |
245 | Returns:
246 | torch.Tensor: shape=[6], contains xyzrpy
247 | """
248 | roll = math.atan2(-rotmatrix[1, 2], rotmatrix[2, 2])
249 | pitch = math.asin ( rotmatrix[0, 2])
250 | yaw = math.atan2(-rotmatrix[0, 1], rotmatrix[0, 0])
251 | x = rotmatrix[:3, 3][0]
252 | y = rotmatrix[:3, 3][1]
253 | z = rotmatrix[:3, 3][2]
254 |
255 | return torch.tensor([x, y, z, roll, pitch, yaw], device=rotmatrix.device, dtype=rotmatrix.dtype)
256 |
257 |
258 | def to_rotation_matrix(R, T):
259 | R = quat2mat(R)
260 | T = tvector2mat(T)
261 | RT = torch.mm(T, R)
262 | return RT
263 |
264 |
265 | def rotation_vector_to_euler(rvecs):
266 | R, _ = cv2.Rodrigues(rvecs)
267 |
268 | sy = np.sqrt(R[0, 0] * R[0, 0] + R[1, 0] * R[1, 0])
269 |
270 | singular = sy < 1e-6
271 |
272 | if not singular:
273 | x = np.arctan2(R[2, 1], R[2, 2])
274 | y = np.arctan2(-R[2, 0], sy)
275 | z = np.arctan2(R[1, 0], R[0, 0])
276 | else:
277 | x = np.arctan2(-R[1, 2], R[1, 1])
278 | y = np.arctan2(-R[2, 0], sy)
279 | z = 0
280 |
281 | return np.array([x, y, z])
282 |
283 |
284 | def overlay_imgs(rgb, lidar):
285 | std = [0.229, 0.224, 0.225]
286 | mean = [0.485, 0.456, 0.406]
287 |
288 | rgb = rgb.clone().cpu().permute(1,2,0).numpy()
289 | rgb = rgb*std+mean
290 |
291 | # rgb = cv2.resize(rgb, (120, 40))
292 | # lidar = lidar.cpu().numpy()
293 | # lidar = cv2.resize(lidar, (120, 40))
294 |
295 | lidar[lidar == 0] = 1000.
296 | lidar = -lidar
297 |
298 | lidar = lidar.clone()
299 | lidar = lidar.unsqueeze(0)
300 | lidar = lidar.unsqueeze(0)
301 |
302 | lidar = F.max_pool2d(lidar, 3, 1, 1)
303 | lidar = -lidar
304 | lidar[lidar == 1000.] = 0.
305 |
306 | lidar = lidar[0][0]
307 |
308 |
309 | lidar = lidar.cpu().numpy()
310 | min_d = 0
311 | max_d = np.max(lidar)
312 | lidar = ((lidar - min_d) / (max_d - min_d)) * 255
313 | lidar = lidar.astype(np.uint8)
314 | lidar_color = cm.jet(lidar)
315 | lidar_color[:, :, 3] = 0.5
316 | lidar_color[lidar == 0] = [0, 0, 0, 0]
317 | blended_img = lidar_color[:, :, :3] * (np.expand_dims(lidar_color[:, :, 3], 2)) \
318 | + rgb * (1. - np.expand_dims(lidar_color[:, :, 3], 2))
319 | blended_img = blended_img.clip(min=0., max=1.)
320 |
321 | blended_img = cv2.cvtColor((blended_img*255).astype(np.uint8), cv2.COLOR_BGR2RGB)
322 | # cv2.imwrite(f'./images/output/{idx:06d}_{name}.png', blended_img)
323 | return blended_img
324 |
325 |
--------------------------------------------------------------------------------
/core/visibility_package/setup.py:
--------------------------------------------------------------------------------
1 | # -------------------------------------------------------------------
2 | # Copyright (C) 2020 Università degli studi di Milano-Bicocca, iralab
3 | # Author: Daniele Cattaneo (d.cattaneo10@campus.unimib.it)
4 | # Released under Creative Commons
5 | # Attribution-NonCommercial-ShareAlike 4.0 International License.
6 | # http://creativecommons.org/licenses/by-nc-sa/4.0/
7 | # -------------------------------------------------------------------
8 | from setuptools import setup
9 | from torch.utils.cpp_extension import BuildExtension, CUDAExtension
10 |
11 | setup(
12 | name='visibility',
13 | version='0.1',
14 | author="Daniele Cattaneo",
15 | author_email="cattaneo@informatik.uni-freiburg.de",
16 | url="https://github.com/catta202000/CMRNet",
17 | ext_modules=[
18 | CUDAExtension('visibility', [
19 | './visibility_new.cpp',
20 | './visibility_kernel_new.cu',
21 | ])
22 | ],
23 | cmdclass={
24 | 'build_ext': BuildExtension
25 | })
--------------------------------------------------------------------------------
/core/visibility_package/visibility.cpp:
--------------------------------------------------------------------------------
1 | // -------------------------------------------------------------------
2 | // Copyright (C) 2020 Università degli studi di Milano-Bicocca, iralab
3 | // Author: Daniele Cattaneo (d.cattaneo10@campus.unimib.it)
4 | // Released under Creative Commons
5 | // Attribution-NonCommercial-ShareAlike 4.0 International License.
6 | // http://creativecommons.org/licenses/by-nc-sa/4.0/
7 | // -------------------------------------------------------------------
8 | #include
9 | #include
10 |
11 | #include
12 | #include
13 |
14 | #include
15 |
16 | at::Tensor depth_image_cuda(at::Tensor input_uv, at::Tensor input_depth, at::Tensor output, unsigned int size, unsigned int width,unsigned int height);
17 | at::Tensor visibility_filter_cuda(at::Tensor input, at::Tensor output, unsigned int width, unsigned int height, unsigned int threshold);
18 | at::Tensor visibility_filter_cuda2(at::Tensor input_depth, at::Tensor intrinsic, at::Tensor output, unsigned int width,unsigned int height, float threshold, unsigned int radius);
19 | at::Tensor downsample_flow_cuda(at::Tensor input_uv, at::Tensor output, unsigned int width_out ,unsigned int height_out, unsigned int kernel);
20 | at::Tensor downsample_mask_cuda(at::Tensor input_mask, at::Tensor output, unsigned int width_out ,unsigned int height_out, unsigned int kernel);
21 | //at::Tensor visibility_filter_cuda_shared(at::Tensor input_depth, at::Tensor input_points, at::Tensor output, unsigned int width,unsigned int height, float threshold);
22 |
23 | // C++ interface
24 |
25 | #define CHECK_CUDA(x) AT_ASSERTM(x.type().is_cuda(), #x " must be a CUDA tensor")
26 | #define CHECK_CONTIGUOUS(x) AT_ASSERTM(x.is_contiguous(), #x " must be contiguous")
27 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
28 |
29 | at::Tensor depth_image(at::Tensor input_uv, at::Tensor input_depth, at::Tensor output, unsigned int size, unsigned int width,unsigned int height) {
30 | CHECK_INPUT(input_uv);
31 | CHECK_INPUT(input_depth);
32 | CHECK_INPUT(output);
33 | return depth_image_cuda(input_uv, input_depth, output, size, width, height);
34 | }
35 |
36 | at::Tensor visibility_filter(at::Tensor input, at::Tensor output, unsigned int width, unsigned int height, unsigned int threshold){
37 | CHECK_INPUT(input);
38 | CHECK_INPUT(output);
39 | return visibility_filter_cuda(input, output, width, height, threshold);
40 | }
41 |
42 | at::Tensor visibility_filter2(at::Tensor input_depth, at::Tensor intrinsic, at::Tensor output, unsigned int width, unsigned int height, float threshold, unsigned int radius){
43 | CHECK_INPUT(input_depth);
44 | CHECK_INPUT(intrinsic);
45 | CHECK_INPUT(output);
46 | return visibility_filter_cuda2(input_depth, intrinsic, output, width, height, threshold, radius);
47 | }
48 |
49 | at::Tensor downsample_flow(at::Tensor input_uv, at::Tensor output, unsigned int width_out ,unsigned int height_out, unsigned int kernel) {
50 | CHECK_INPUT(input_uv);
51 | CHECK_INPUT(output);
52 | return downsample_flow_cuda(input_uv, output, width_out, height_out, kernel);
53 | }
54 |
55 | at::Tensor downsample_mask(at::Tensor input_mask, at::Tensor output, unsigned int width_out ,unsigned int height_out, unsigned int kernel) {
56 | CHECK_INPUT(input_mask);
57 | CHECK_INPUT(output);
58 | return downsample_mask_cuda(input_mask, output, width_out, height_out, kernel);
59 | }
60 |
61 | /*at::Tensor visibility_filter_shared(at::Tensor input_depth, at::Tensor input_points, at::Tensor output, unsigned int width, unsigned int height, float threshold){
62 | CHECK_INPUT(input_depth);
63 | CHECK_INPUT(input_points);
64 | CHECK_INPUT(output);
65 | return visibility_filter_cuda_shared(input_depth, input_points, output, width, height, threshold);
66 | }*/
67 |
68 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
69 | m.def("depth_image", &depth_image, "Generate Depth Image(CUDA)");
70 | m.def("visibility", &visibility_filter, "visibility_filter (CUDA)");
71 | m.def("visibility2", &visibility_filter2, "visibility_filter2 (CUDA)");
72 | m.def("downsample_flow", &downsample_flow, "downsample_flow (CUDA)");
73 | m.def("downsample_mask", &downsample_mask, "downsample_mask (CUDA)");
74 | //m.def("visibility_shared", &visibility_filter2, "visibility_filter_shared (CUDA)");
75 | }
76 |
--------------------------------------------------------------------------------
/core/visibility_package/visibility_kernel.cu:
--------------------------------------------------------------------------------
1 | // -------------------------------------------------------------------
2 | // Copyright (C) 2020 Università degli studi di Milano-Bicocca, iralab
3 | // Author: Daniele Cattaneo (d.cattaneo10@campus.unimib.it)
4 | // Released under Creative Commons
5 | // Attribution-NonCommercial-ShareAlike 4.0 International License.
6 | // http://creativecommons.org/licenses/by-nc-sa/4.0/
7 | // -------------------------------------------------------------------
8 | #include
9 | #include
10 | #include
11 |
12 | #include
13 | #include
14 |
15 | #define IMAGE_SIZE 10000
16 | #define TILE_W 25 //Tile Width
17 | #define TILE_H 25 //Tile Height
18 | #define R 3 //Filter Radius
19 | #define D (2*R+1) //Filter Diameter
20 | #define BLOCK_W (TILE_W+(2*R))
21 | #define BLOCK_H (TILE_H+(2*R))
22 |
23 |
24 | __device__ __forceinline__ int floatToOrderedInt( float floatVal ) {
25 | int intVal = __float_as_int( floatVal );
26 | return (intVal >= 0 ) ? intVal : intVal ^ 0x7FFFFFFF;
27 | }
28 |
29 | __device__ __forceinline__ float orderedIntToFloat( int intVal ) {
30 | return __int_as_float( (intVal >= 0) ? intVal : intVal ^ 0x7FFFFFFF);
31 | }
32 |
33 | __global__ void depth_image_kernel(
34 | const int* __restrict__ in_uv,
35 | const float* __restrict__ in_depth,
36 | float* __restrict__ out,
37 | unsigned int size, unsigned int width, unsigned int height) {
38 |
39 | int index = threadIdx.x + blockIdx.x * blockDim.x;
40 | if(index >= 0 && index < size) {
41 | int uv_index = in_uv[2*index+1]*width + in_uv[2*index];
42 | atomicMin((int *)&out[uv_index], floatToOrderedInt(in_depth[index]));
43 | }
44 |
45 | }
46 |
47 | template
48 | __global__ void visibility_kernel(
49 | const scalar_t* __restrict__ in,
50 | scalar_t* __restrict__ out,
51 | unsigned int width, unsigned int height, unsigned int threshold) {
52 | //__shared__ scalar_t sharedImg[BLOCK_W*BLOCK_H];
53 | //int x = blockIdx.x*TILE_W + threadIdx.x - R;
54 | //int y = blockIdx.y*TILE_H + threadIdx.y - R;
55 |
56 | int x = threadIdx.x + blockIdx.x * blockDim.x;
57 | int y = threadIdx.y + blockIdx.y * blockDim.y;
58 |
59 | // clamp to edge of image
60 | x = max(0, x);
61 | x = min(x, height-1);
62 | y = max(y, 0);
63 | y = min(y, width-1);
64 | //unsigned int index = y*width + x;
65 | unsigned int index = x*width + y;
66 | //unsigned int bindex = threadIdx.y*blockDim.y+threadIdx.x;
67 |
68 | // each thread copy its pixel to the shared memory
69 | //sharedImg[bindex] = in[index];
70 | //__syncthreads();
71 |
72 | //index =
73 |
74 | //out[index] = 7.0;
75 | //return;
76 |
77 | // only threads inside the apron write the results
78 | //if ((threadIdx.x >= R) && (threadIdx.x < (BLOCK_W-R)) &&
79 | // (threadIdx.y >= R) && (threadIdx.y < (BLOCK_H-R))) {
80 | out[index] = in[index];
81 | if(x >= R && y >= R && x < (height-R) && y < (width-R)) {
82 | scalar_t pixel = in[index];
83 | if (pixel != 0.) {
84 | int sum = 0;
85 | int count = 0;
86 | for(int i=-R; i<=R; i++) {
87 | for(int j=-R; j<=R; j++) {
88 | if(i==0 && j==0)
89 | continue;
90 | int temp_index = index + i*width + j;
91 | scalar_t temp_pixel = in[temp_index];
92 | if(temp_pixel != 0 ) {
93 | count += 1;
94 | if(temp_pixel < pixel - 3.)
95 | sum += 1;
96 | }
97 | }
98 | }
99 | if(sum >= 1+threshold * count / (R*R*2*2))
100 | out[index] = 0.;
101 | }
102 |
103 | }
104 | }
105 |
106 |
107 | template
108 | __device__ __forceinline__ scalar_t norm(scalar_t x, scalar_t y, scalar_t z) {
109 | scalar_t sum = x*x + y*y + z*z;
110 | return sum;
111 | }
112 |
113 |
114 | template
115 | __global__ void visibility_kernel2(
116 | const scalar_t* __restrict__ in_depth,
117 | const scalar_t* __restrict__ intrinsic,
118 | scalar_t* __restrict__ out,
119 | unsigned int width, unsigned int height, float threshold, unsigned int radius) {
120 |
121 | int x = threadIdx.x + blockIdx.x * blockDim.x;
122 | int y = threadIdx.y + blockIdx.y * blockDim.y;
123 |
124 | // clamp to edge of image
125 | x = max(0, x);
126 | x = min(x, height-1);
127 | y = max(y, 0);
128 | y = min(y, width-1);
129 |
130 | //unsigned int index = y*width + x;
131 | unsigned int index = x*width + y;
132 | out[index] = in_depth[index];
133 |
134 | bool debug=false;
135 |
136 | if(x >= 0 && y >= 0 && x < (height) && y < (width)) {
137 |
138 | scalar_t pixel = in_depth[index];
139 | if (pixel != 0.) {
140 | scalar_t fx, fy, cx, cy;
141 | fx = intrinsic[0];
142 | fy = intrinsic[1];
143 | cx = intrinsic[2];
144 | cy = intrinsic[3];
145 |
146 | scalar_t v_x, v_y, v_z, v2_x, v2_y, v2_z;
147 | v_x = (y - cx) * pixel / fx;
148 | v_y = (x - cy) * pixel / fy;
149 | v_z = pixel;
150 |
151 | if(debug)
152 | printf("V: %f %f %f \n", v_x, v_y, v_z);
153 |
154 | scalar_t v_norm = norm(-v_x, -v_y, -v_z);
155 | if(debug)
156 | printf("V_norm: %f \n", v_norm);
157 |
158 | if(v_norm <= 0.)
159 | return;
160 | v_norm = sqrt(v_norm);
161 | v2_x = -v_x / v_norm;
162 | v2_y = -v_y / v_norm;
163 | v2_z = -v_z / v_norm;
164 | if(debug) {
165 | printf("-V_normalized %f, %f, %f \n", v2_x, v2_y, v2_z);
166 | printf("SubMatrix1: \n");
167 | }
168 |
169 | scalar_t max_dot1, max_dot2, max_dot3, max_dot4;
170 | max_dot1 = -1.;
171 | max_dot2 = -1.;
172 | max_dot3 = -1.;
173 | max_dot4 = -1.;
174 |
175 | for(int i=-radius; i<=0; i++) {
176 | for(int j=-radius; j<=0; j++) {
177 | if(x+i <0 || x+i >= height || y+j < 0 || y+j >= width)
178 | break;
179 | int temp_index = index + i*width + j;
180 | scalar_t temp_pixel = in_depth[temp_index];
181 | if (temp_pixel==0.)
182 | continue;
183 |
184 | scalar_t c_x, c_y, c_z;
185 | c_x = (y+j - cx) * temp_pixel / fx;
186 | c_y = (x+i - cy) * temp_pixel / fy;
187 | c_z = temp_pixel;
188 |
189 | c_x = c_x - v_x;
190 | c_y = c_y - v_y;
191 | c_z = c_z - v_z;
192 | scalar_t c_norm = norm(c_x, c_y, c_z);
193 | if(c_norm <= 0.)
194 | continue;
195 | c_norm = sqrt(c_norm);
196 | c_x = c_x / c_norm;
197 | c_y = c_y / c_norm;
198 | c_z = c_z / c_norm;
199 | scalar_t dot_prod = c_x*v2_x + c_y*v2_y + c_z*v2_z;
200 | if(debug) {
201 | printf("%d, %d : %f %f %f \n", i, j, c_x, c_y, c_z);
202 | printf("DotProd: %f \n", dot_prod);
203 | }
204 | if(dot_prod > max_dot1) {
205 | max_dot1 = dot_prod;
206 | }
207 |
208 | }
209 | }
210 |
211 | if(debug)
212 | printf("SubMatrix2: \n");
213 | for(int i=0; i<=radius; i++) {
214 | for(int j=-radius; j<=0; j++) {
215 | if(x+i <0 || x+i >= height || y+j < 0 || y+j >= width)
216 | break;
217 | int temp_index = index + i*width + j;
218 | scalar_t temp_pixel = in_depth[temp_index];
219 | if (temp_pixel==0.)
220 | continue;
221 |
222 | scalar_t c_x, c_y, c_z;
223 | c_x = (y+j - cx) * temp_pixel / fx;
224 | c_y = (x+i - cy) * temp_pixel / fy;
225 | c_z = temp_pixel;
226 | c_x = c_x - v_x;
227 | c_y = c_y - v_y;
228 | c_z = c_z - v_z;
229 | scalar_t c_norm = norm(c_x, c_y, c_z);
230 | if(c_norm <= 0.)
231 | continue;
232 | c_norm = sqrt(c_norm);
233 | c_x = c_x / c_norm;
234 | c_y = c_y / c_norm;
235 | c_z = c_z / c_norm;
236 | scalar_t dot_prod = c_x*v2_x + c_y*v2_y + c_z*v2_z;
237 | if(debug) {
238 | printf("%d, %d : %f %f %f \n", i, j, c_x, c_y, c_z);
239 | printf("DotProd: %f \n", dot_prod);
240 | }
241 | if(dot_prod > max_dot2) {
242 | max_dot2 = dot_prod;
243 | }
244 |
245 | }
246 | }
247 |
248 | if(debug)
249 | printf("SubMatrix3: \n");
250 | for(int i=-radius; i<=0; i++) {
251 | for(int j=0; j<=radius; j++) {
252 | if(x+i <0 || x+i >= height || y+j < 0 || y+j >= width)
253 | break;
254 | int temp_index = index + i*width + j;
255 | scalar_t temp_pixel = in_depth[temp_index];
256 | if (temp_pixel==0.)
257 | continue;
258 |
259 | scalar_t c_x, c_y, c_z;
260 | c_x = (y+j - cx) * temp_pixel / fx;
261 | c_y = (x+i - cy) * temp_pixel / fy;
262 | c_z = temp_pixel;
263 | c_x = c_x - v_x;
264 | c_y = c_y - v_y;
265 | c_z = c_z - v_z;
266 | scalar_t c_norm = norm(c_x, c_y, c_z);
267 | if(c_norm <= 0.)
268 | continue;
269 | c_norm = sqrt(c_norm);
270 | c_x = c_x / c_norm;
271 | c_y = c_y / c_norm;
272 | c_z = c_z / c_norm;
273 | scalar_t dot_prod = c_x*v2_x + c_y*v2_y + c_z*v2_z;
274 | if(debug) {
275 | printf("%d, %d : %f %f %f \n", i, j, c_x, c_y, c_z);
276 | printf("DotProd: %f \n", dot_prod);
277 | }
278 | if(dot_prod > max_dot3) {
279 | max_dot3 = dot_prod;
280 | }
281 |
282 | }
283 | }
284 |
285 | if(debug)
286 | printf("SubMatrix4: \n");
287 | for(int i=0; i<=radius; i++) {
288 | for(int j=0; j<=radius; j++) {
289 | if(x+i <0 || x+i >= height || y+j < 0 || y+j >= width)
290 | break;
291 | int temp_index = index + i*width + j;
292 | scalar_t temp_pixel = in_depth[temp_index];
293 | if(temp_pixel==0.)
294 | continue;
295 |
296 | scalar_t c_x, c_y, c_z;
297 | c_x = (y+j - cx) * temp_pixel / fx;
298 | c_y = (x+i - cy) * temp_pixel / fy;
299 | c_z = temp_pixel;
300 | c_x = c_x - v_x;
301 | c_y = c_y - v_y;
302 | c_z = c_z - v_z;
303 | scalar_t c_norm = norm(c_x, c_y, c_z);
304 | if(c_norm <= 0.)
305 | continue;
306 | c_norm = sqrt(c_norm);
307 | c_x = c_x / c_norm;
308 | c_y = c_y / c_norm;
309 | c_z = c_z / c_norm;
310 | scalar_t dot_prod = c_x*v2_x + c_y*v2_y + c_z*v2_z;
311 | if(debug) {
312 | printf("%d, %d : %f %f %f \n", i, j, c_x, c_y, c_z);
313 | printf("DotProd: %f \n", dot_prod);
314 | }
315 | if(dot_prod > max_dot4) {
316 | max_dot4 = dot_prod;
317 | }
318 |
319 | }
320 | }
321 |
322 | if(max_dot1 + max_dot2 + max_dot3 + max_dot4 >= threshold) {
323 | out[index] = 0.;
324 | }
325 | }
326 |
327 | }
328 | }
329 |
330 |
331 | template
332 | __global__ void downsample_flow_kernel(
333 | const scalar_t* __restrict__ in,
334 | scalar_t* __restrict__ out,
335 | unsigned int width_out, unsigned int height_out, unsigned int kernel) {
336 |
337 | int x = threadIdx.x + blockIdx.x * blockDim.x;
338 | int y = threadIdx.y + blockIdx.y * blockDim.y;
339 |
340 | // clamp to edge of image
341 | // x = max(0, x);
342 | //x = min(x, height_out-1);
343 | //y = max(y, 0);
344 | //y = min(y, width_out-1);
345 |
346 | scalar_t center = ((scalar_t)kernel-1.0)/2.0;
347 |
348 | unsigned int in_index = (kernel*x) * (kernel*width_out) + (kernel*y);
349 | unsigned int out_index = x*width_out + y;
350 |
351 | if (x >= 0 && x < height_out && y >= 0 && y < width_out) {
352 | //printf("out_index: %d, %d - in_pixel: %d, %d\n", x, y, x*kernel, y*kernel);
353 | scalar_t mean_u=0;
354 | scalar_t mean_v=0;
355 | scalar_t weights=0.0;
356 | unsigned int count = 0;
357 | for(int i=0; i= 0 && x < height_out && y >= 0 && y < width_out) {
403 | //printf("out_index: %d, %d - in_pixel: %d, %d\n", x, y, x*kernel, y*kernel);
404 | bool active = false;
405 | for(int i=0; i>>(input_uv.data(), input_depth.data(), output.data(), size, width, height);
432 | //visibility_kernel<<>>(input.data(), output.data(), width, height, threshold);
433 | return output;
434 | }
435 |
436 |
437 | at::Tensor visibility_filter_cuda(at::Tensor input, at::Tensor output, unsigned int width,unsigned int height, unsigned int threshold) {
438 | //dim3 threads(BLOCK_W, BLOCK_H);
439 | //dim3 blocks((height)/TILE_W, (width)/TILE_H);
440 | dim3 threads(32, 32);
441 | dim3 blocks(height/32, width/32);
442 | AT_DISPATCH_FLOATING_TYPES(input.type(), "visibility_filter", ([&] {
443 | visibility_kernel<<>>(input.data(), output.data(), width, height, threshold);}));
444 | //visibility_kernel<<>>(input.data(), output.data(), width, height, threshold);
445 | return output;
446 | }
447 |
448 | at::Tensor visibility_filter_cuda2(at::Tensor input_depth, at::Tensor intrinsic, at::Tensor output, unsigned int width,unsigned int height, float threshold, unsigned int radius) {
449 | //dim3 threads(BLOCK_W, BLOCK_H);
450 | //dim3 blocks((height)/TILE_W, (width)/TILE_H);
451 | dim3 threads(32, 32);
452 | dim3 blocks(height/32+1, width/32+1);
453 | AT_DISPATCH_FLOATING_TYPES(input_depth.type(), "visibility_filter2", ([&] {
454 | visibility_kernel2<<>>(input_depth.data(), intrinsic.data(),
455 | output.data(), width, height, threshold, radius);}));
456 | //visibility_kernel<<>>(input.data(), output.data(), width, height, threshold);
457 | return output;
458 | }
459 |
460 | at::Tensor downsample_flow_cuda(at::Tensor input_uv, at::Tensor output, unsigned int width_out ,unsigned int height_out, unsigned int kernel) {
461 | //dim3 threads(BLOCK_W, BLOCK_H);
462 | //dim3 blocks((height)/TILE_W, (width)/TILE_H);
463 | dim3 threads(32, 32);
464 | dim3 blocks(height_out/32+1, width_out/32+1);
465 | AT_DISPATCH_FLOATING_TYPES(input_uv.type(), "downsample_flow", ([&] {
466 | downsample_flow_kernel<<>>(input_uv.data(), output.data(),
467 | width_out, height_out, kernel);}));
468 | //visibility_kernel<<>>(input.data(), output.data(), width, height, threshold);
469 | return output;
470 | }
471 |
472 | at::Tensor downsample_mask_cuda(at::Tensor input_mask, at::Tensor output, unsigned int width_out ,unsigned int height_out, unsigned int kernel) {
473 | //dim3 threads(BLOCK_W, BLOCK_H);
474 | //dim3 blocks((height)/TILE_W, (width)/TILE_H);
475 | dim3 threads(32, 32);
476 | dim3 blocks(height_out/32+1, width_out/32+1);
477 | downsample_mask_kernel<<>>(input_mask.data(), output.data(),
478 | width_out, height_out, kernel);
479 | //visibility_kernel<<>>(input.data(), output.data(), width, height, threshold);
480 | return output;
481 | }
482 |
483 | /*at::Tensor visibility_filter_cuda_shared(at::Tensor input_depth, at::Tensor input_points, at::Tensor output, unsigned int width,unsigned int height, float threshold) {
484 | dim3 threads(BLOCK_W, BLOCK_H);
485 | dim3 blocks((height)/TILE_W, (width)/TILE_H);
486 | //dim3 threads(32, 32);
487 | //dim3 blocks(height/32, width/32);
488 | AT_DISPATCH_FLOATING_TYPES(input_depth.type(), "visibility_filter2", ([&] {
489 | visibility_kernel_shared<<>>(input_depth.data(), input_points.data(), output.data(), width, height, threshold);}));
490 | //visibility_kernel<<>>(input.data(), output.data(), width, height, threshold);
491 | return output;
492 | }*/
493 |
--------------------------------------------------------------------------------
/core/visibility_package/visibility_new.cpp:
--------------------------------------------------------------------------------
1 | // -------------------------------------------------------------------
2 | // Copyright (C) 2020 Università degli studi di Milano-Bicocca, iralab
3 | // Author: Daniele Cattaneo (d.cattaneo10@campus.unimib.it)
4 | // Released under Creative Commons
5 | // Attribution-NonCommercial-ShareAlike 4.0 International License.
6 | // http://creativecommons.org/licenses/by-nc-sa/4.0/
7 | // -------------------------------------------------------------------
8 | #include
9 | #include
10 |
11 | #include
12 | #include
13 |
14 | #include
15 |
16 | std::vector depth_image_cuda(at::Tensor input_uv, at::Tensor input_depth, at::Tensor input_index, at::Tensor output, at::Tensor output_index, unsigned int size, unsigned int width,unsigned int height);
17 | at::Tensor visibility_filter_cuda(at::Tensor input, at::Tensor output, unsigned int width, unsigned int height, unsigned int threshold);
18 | std::vector visibility_filter_cuda2(at::Tensor input_depth, at::Tensor intrinsic, at::Tensor in_index, at::Tensor output, at::Tensor out_index, unsigned int width,unsigned int height, float threshold, unsigned int radius);
19 | at::Tensor downsample_flow_cuda(at::Tensor input_uv, at::Tensor output, unsigned int width_out ,unsigned int height_out, unsigned int kernel);
20 | at::Tensor downsample_mask_cuda(at::Tensor input_mask, at::Tensor output, unsigned int width_out ,unsigned int height_out, unsigned int kernel);
21 | //at::Tensor visibility_filter_cuda_shared(at::Tensor input_depth, at::Tensor input_points, at::Tensor output, unsigned int width,unsigned int height, float threshold);
22 | at::Tensor image_warp_index_cuda(at::Tensor input, at::Tensor flow, at::Tensor warp_depth, at::Tensor output, unsigned int width, unsigned int height);
23 | at::Tensor image_warp_cuda(at::Tensor input, at::Tensor flow, at::Tensor output, unsigned int width, unsigned int height);
24 | // C++ interface
25 |
26 | #define CHECK_CUDA(x) AT_ASSERTM(x.type().is_cuda(), #x " must be a CUDA tensor")
27 | #define CHECK_CONTIGUOUS(x) AT_ASSERTM(x.is_contiguous(), #x " must be contiguous")
28 | #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
29 |
30 | std::vector depth_image(at::Tensor input_uv, at::Tensor input_depth,at::Tensor input_index, at::Tensor output, at::Tensor output_index, unsigned int size, unsigned int width,unsigned int height) {
31 | CHECK_INPUT(input_uv);
32 | CHECK_INPUT(input_depth);
33 | CHECK_INPUT(input_index);
34 | CHECK_INPUT(output);
35 | CHECK_INPUT(output_index);
36 | return depth_image_cuda(input_uv, input_depth, input_index, output,output_index, size, width, height);
37 | }
38 |
39 | at::Tensor visibility_filter(at::Tensor input, at::Tensor output, unsigned int width, unsigned int height, unsigned int threshold){
40 | CHECK_INPUT(input);
41 | CHECK_INPUT(output);
42 | return visibility_filter_cuda(input, output, width, height, threshold);
43 | }
44 |
45 | std::vector visibility_filter2(at::Tensor input_depth, at::Tensor intrinsic, at::Tensor in_index, at::Tensor output, at::Tensor out_index, unsigned int width, unsigned int height, float threshold, unsigned int radius){
46 | CHECK_INPUT(input_depth);
47 | CHECK_INPUT(intrinsic);
48 | CHECK_INPUT(output);
49 | return visibility_filter_cuda2(input_depth, intrinsic, in_index, output, out_index, width, height, threshold, radius);
50 | }
51 |
52 | at::Tensor downsample_flow(at::Tensor input_uv, at::Tensor output, unsigned int width_out ,unsigned int height_out, unsigned int kernel) {
53 | CHECK_INPUT(input_uv);
54 | CHECK_INPUT(output);
55 | return downsample_flow_cuda(input_uv, output, width_out, height_out, kernel);
56 | }
57 |
58 | at::Tensor downsample_mask(at::Tensor input_mask, at::Tensor output, unsigned int width_out ,unsigned int height_out, unsigned int kernel) {
59 | CHECK_INPUT(input_mask);
60 | CHECK_INPUT(output);
61 | return downsample_mask_cuda(input_mask, output, width_out, height_out, kernel);
62 | }
63 |
64 | at::Tensor image_warp_index(at::Tensor input, at::Tensor flow, at::Tensor warp_depth, at::Tensor output, unsigned int width, unsigned int height)
65 | {
66 | CHECK_INPUT(input);
67 | CHECK_INPUT(flow);
68 | CHECK_INPUT(warp_depth);
69 | CHECK_INPUT(output);
70 | return image_warp_index_cuda(input, flow, warp_depth, output, width, height);
71 | }
72 |
73 | at::Tensor image_warp(at::Tensor input, at::Tensor flow, at::Tensor output, unsigned int width, unsigned int height)
74 | {
75 | CHECK_INPUT(input);
76 | CHECK_INPUT(flow);
77 | CHECK_INPUT(output);
78 | return image_warp_cuda(input, flow, output, width, height);
79 | }
80 |
81 | /*at::Tensor visibility_filter_shared(at::Tensor input_depth, at::Tensor input_points, at::Tensor output, unsigned int width, unsigned int height, float threshold){
82 | CHECK_INPUT(input_depth);
83 | CHECK_INPUT(input_points);
84 | CHECK_INPUT(output);
85 | return visibility_filter_cuda_shared(input_depth, input_points, output, width, height, threshold);
86 | }*/
87 |
88 | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
89 | m.def("depth_image", &depth_image, "Generate Depth Image(CUDA)");
90 | m.def("visibility", &visibility_filter, "visibility_filter (CUDA)");
91 | m.def("visibility2", &visibility_filter2, "visibility_filter2 (CUDA)");
92 | m.def("downsample_flow", &downsample_flow, "downsample_flow (CUDA)");
93 | m.def("downsample_mask", &downsample_mask, "downsample_mask (CUDA)");
94 | m.def("image_warp_index", &image_warp_index, "image_warp_index (CUDA)");
95 | m.def("image_warp", &image_warp, "image_warp (CUDA)");
96 | //m.def("visibility_shared", &visibility_filter2, "visibility_filter_shared (CUDA)");
97 | }
98 |
--------------------------------------------------------------------------------
/demo.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import os
3 | import numpy as np
4 | import h5py
5 | import argparse
6 | import torch
7 | from torchvision import transforms
8 | import mathutils
9 | from PIL import Image
10 | import cv2
11 | import visibility
12 | from core.raft import RAFT
13 | from core.utils_point import invert_pose, overlay_imgs
14 | from core.data_preprocess import Data_preprocess
15 | from core.depth_completion import sparse_to_dense
16 | from core.flow_viz import flow_to_image
17 | from core.flow2pose import Flow2Pose, err_Pose
18 |
19 | def custom_transform(rgb):
20 | to_tensor = transforms.ToTensor()
21 | normalization = transforms.Normalize(mean=[0.485, 0.456, 0.406],
22 | std=[0.229, 0.224, 0.225])
23 |
24 | rgb = to_tensor(rgb)
25 | rgb = normalization(rgb)
26 |
27 | return rgb
28 |
29 | def load_data(root, id):
30 | img_path = os.path.join(root, "image", id + '.png')
31 | pc_path = os.path.join(root, "pc", id + '.h5')
32 |
33 | try:
34 | with h5py.File(pc_path, 'r') as hf:
35 | pc = hf['PC'][:]
36 | except Exception as e:
37 | print(f'File Broken: {pc_path}')
38 | raise e
39 |
40 | pc_in = torch.from_numpy(pc.astype(np.float32))
41 | if pc_in.shape[1] == 4 or pc_in.shape[1] == 3:
42 | pc_in = pc_in.t()
43 | if pc_in.shape[0] == 3:
44 | homogeneous = torch.ones(pc_in.shape[1]).unsqueeze(0)
45 | pc_in = torch.cat((pc_in, homogeneous), 0)
46 | elif pc_in.shape[0] == 4:
47 | if not torch.all(pc_in[3, :] == 1.):
48 | pc_in[3, :] = 1.
49 | else:
50 | raise TypeError("Wrong PointCloud shape")
51 |
52 | img = Image.open(img_path)
53 | img = custom_transform(img)
54 |
55 | max_r = 10.
56 | max_t = 2.
57 | max_angle = max_r
58 | rotz = np.random.uniform(-max_angle, max_angle) * (3.141592 / 180.0)
59 | roty = np.random.uniform(-max_angle, max_angle) * (3.141592 / 180.0)
60 | rotx = np.random.uniform(-max_angle, max_angle) * (3.141592 / 180.0)
61 | transl_x = np.random.uniform(-max_t, max_t)
62 | transl_y = np.random.uniform(-max_t, max_t)
63 | transl_z = np.random.uniform(-max_t, min(max_t, 1.))
64 |
65 | R = mathutils.Euler((rotx, roty, rotz), 'XYZ')
66 | T = mathutils.Vector((transl_x, transl_y, transl_z))
67 |
68 | R, T = invert_pose(R, T)
69 | R, T = torch.tensor(R), torch.tensor(T)
70 |
71 | return pc_in, img, R, T
72 |
73 |
74 | def demo(args):
75 | device = torch.device(f"cuda:{args.gpus[0]}" if torch.cuda.is_available() else "cpu")
76 | os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
77 | torch.cuda.set_device(args.gpus[0])
78 |
79 | root = args.data_path
80 | calib = torch.tensor([718.856, 718.856, 607.1928, 185.2157])
81 | calib = calib.unsqueeze(0)
82 | occlusion_kernel = 5
83 | occlusion_threshold = 3
84 | id_list = sorted(os.listdir(os.path.join(root, "image")))
85 | id_list = [id[:6] for id in id_list]
86 |
87 | model = torch.nn.DataParallel(RAFT(args), device_ids=args.gpus)
88 | model.load_state_dict(torch.load(args.load_checkpoints))
89 | model.to(device)
90 |
91 | for k in range(len(id_list)):
92 | id = id_list[k]
93 |
94 | pc, img, R_err, T_err = load_data(root, id)
95 | pc = pc.unsqueeze(0)
96 | img = img.unsqueeze(0)
97 | R_err = R_err.unsqueeze(0)
98 | T_err = T_err.unsqueeze(0)
99 | calib_k = calib.clone()
100 |
101 | data_generate = Data_preprocess(calib_k, occlusion_threshold, occlusion_kernel)
102 | rgb_input, spare_depth, flow_gt = data_generate.push(img, pc, T_err, R_err, device, split='test')
103 |
104 | # dilation
105 | dense_depth = []
106 | for i in range(spare_depth.shape[0]):
107 | depth_img = spare_depth[i, 0, :, :].cpu().numpy() * 100.
108 | depth_img_dilate = sparse_to_dense(depth_img.astype(np.float32))
109 | dense_depth.append(depth_img_dilate / 100.)
110 | dense_depth = torch.tensor(np.array(dense_depth)).float().to(device)
111 | dense_depth = dense_depth.unsqueeze(1)
112 |
113 | _, flow_up = model(dense_depth, rgb_input, lidar_mask=spare_depth, iters=24, test_mode=True)
114 |
115 | if args.render:
116 | if not os.path.exists(f"{root}/visualization"):
117 | os.mkdir(f"{root}/visualization")
118 | os.mkdir(f"{root}/visualization/flow")
119 | os.mkdir(f"{root}/visualization/original_overlay")
120 | os.mkdir(f"{root}/visualization/warp_overlay")
121 |
122 | flow_image = flow_to_image(flow_up.permute(0, 2, 3, 1).cpu().detach().numpy()[0])
123 | cv2.imwrite(f'{root}/visualization/flow/{id}.png', flow_image)
124 |
125 | output = torch.zeros(flow_up.shape).to(device)
126 | pred_depth_img = torch.zeros(spare_depth.shape).to(device)
127 | pred_depth_img += 1000.
128 | output = visibility.image_warp_index(spare_depth.to(device),
129 | flow_up.int().to(device), pred_depth_img,
130 | output, spare_depth.shape[3], spare_depth.shape[2])
131 | pred_depth_img[pred_depth_img == 1000.] = 0.
132 |
133 | original_overlay = overlay_imgs(rgb_input[0, :, :, :], spare_depth[0, 0, :, :])
134 | cv2.imwrite(f'{root}/visualization/original_overlay/{id}.png', original_overlay)
135 | warp_overlay = overlay_imgs(rgb_input[0, :, :, :], pred_depth_img[0, 0, :, :])
136 | cv2.imwrite(f'{root}/visualization/warp_overlay/{id}.png', warp_overlay)
137 |
138 | R_pred, T_pred = Flow2Pose(flow_up, spare_depth, calib_k)
139 | R_gt = torch.tensor([1., 0., 0., 0.])
140 | T_gt = torch.tensor([0., 0., 0.])
141 | init_err_r, init_err_t, _ = err_Pose(R_err[0], T_err[0], R_gt, T_gt)
142 | pred_err_r, pred_err_t, _ = err_Pose(R_pred, T_pred, R_err[0], T_err[0])
143 | print(f"sample {id}:")
144 | print(f"initial rotation error {init_err_r.item():.5f} initial translation error {init_err_t.item():.5f} cm")
145 | print(f"prediction rotation error {pred_err_r.item():.5f} prediction translation error {pred_err_t.item():.5f} cm")
146 |
147 | if __name__ == "__main__":
148 | parser = argparse.ArgumentParser()
149 | parser.add_argument('--data_path', type=str, metavar='DIR', default="./sample", help='path to dataset')
150 | parser.add_argument('-cps', '--load_checkpoints', help="restore checkpoint")
151 | parser.add_argument('--gpus', type=int, nargs='+', default=[0])
152 | parser.add_argument('--mixed_precision', action='store_true', help='use mixed precision')
153 | parser.add_argument('--render', action='store_true')
154 | args = parser.parse_args()
155 |
156 | demo(args)
--------------------------------------------------------------------------------
/doc/I2D-Loc--Camera-localization-via-image_2022_ISPRS-Journal-of-Photogrammetry-.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/doc/I2D-Loc--Camera-localization-via-image_2022_ISPRS-Journal-of-Photogrammetry-.pdf
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import os
3 | import cv2
4 | import numpy as np
5 | import torch
6 | from torch.utils.data import DataLoader, RandomSampler
7 | import argparse
8 | import visibility
9 | import time
10 |
11 | sys.path.append('core')
12 | from raft import RAFT
13 | from datasets_kitti import DatasetVisibilityKittiSingle
14 | from camera_model import CameraModel
15 | from utils import fetch_optimizer, Logger, count_parameters
16 | from utils_point import merge_inputs, overlay_imgs
17 | from data_preprocess import Data_preprocess
18 | from losses import sequence_loss
19 | from depth_completion import sparse_to_dense
20 | from flow_viz import flow_to_image
21 | from flow2pose import Flow2Pose, err_Pose
22 |
23 |
24 | occlusion_kernel = 5
25 | occlusion_threshold = 3
26 | seed = 1234
27 |
28 | try:
29 | from torch.cuda.amp import GradScaler
30 | except:
31 | class GradScaler:
32 | def __init__(self):
33 | pass
34 |
35 | def scale(self, loss):
36 | return loss
37 |
38 | def unscale_(self, optimizer):
39 | pass
40 |
41 | def step(self, optimizer):
42 | optimizer.step()
43 |
44 | def update(self):
45 | pass
46 |
47 | def _init_fn(worker_id, seed):
48 | seed = seed
49 | print(f"Init worker {worker_id} with seed {seed}")
50 | torch.manual_seed(seed)
51 | np.random.seed(seed)
52 | np.random.seed(seed)
53 |
54 | def train(args, TrainImgLoader, model, optimizer, scheduler, scaler, logger, device):
55 | global occlusion_threshold, occlusion_kernel
56 | model.train()
57 | for i_batch, sample in enumerate(TrainImgLoader):
58 | rgb = sample['rgb']
59 | pc = sample['point_cloud']
60 | calib = sample['calib']
61 | T_err = sample['tr_error']
62 | R_err = sample['rot_error']
63 |
64 | data_generate = Data_preprocess(calib, occlusion_threshold, occlusion_kernel)
65 | rgb_input, lidar_input, flow_gt = data_generate.push(rgb, pc, T_err, R_err, device)
66 |
67 | # dilation
68 | depth_img_input = []
69 | for i in range(lidar_input.shape[0]):
70 | depth_img = lidar_input[i, 0, :, :].cpu().numpy() * 100.
71 | depth_img_dilate = sparse_to_dense(depth_img.astype(np.float32))
72 | depth_img_input.append(depth_img_dilate / 100.)
73 | depth_img_input = torch.tensor(depth_img_input).float().to(device)
74 | depth_img_input = depth_img_input.unsqueeze(1)
75 |
76 | optimizer.zero_grad()
77 | flow_preds = model(depth_img_input, rgb_input, lidar_mask=lidar_input, iters=args.iters)
78 |
79 | loss, metrics = sequence_loss(flow_preds, flow_gt, args.gamma, MAX_FLOW=400)
80 |
81 | scaler.scale(loss).backward()
82 | scaler.unscale_(optimizer)
83 | torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip)
84 |
85 | scaler.step(optimizer)
86 | scheduler.step()
87 | scaler.update()
88 |
89 | logger.push(metrics)
90 |
91 |
92 | def test(args, TestImgLoader, model, device, cal_pose=False):
93 | global occlusion_threshold, occlusion_kernel
94 | model.eval()
95 | out_list, epe_list = [], []
96 | Time = 0.
97 | outliers, err_r_list, err_t_list = [], [], []
98 | for i_batch, sample in enumerate(TestImgLoader):
99 | rgb = sample['rgb']
100 | pc = sample['point_cloud']
101 | calib = sample['calib']
102 | T_err = sample['tr_error']
103 | R_err = sample['rot_error']
104 |
105 | data_generate = Data_preprocess(calib, occlusion_threshold, occlusion_kernel)
106 | rgb_input, lidar_input, flow_gt = data_generate.push(rgb, pc, T_err, R_err, device, split='test')
107 |
108 | # dilation
109 | depth_img_input = []
110 | for i in range(lidar_input.shape[0]):
111 | depth_img = lidar_input[i, 0, :, :].cpu().numpy() * 100.
112 | depth_img_dilate = sparse_to_dense(depth_img.astype(np.float32))
113 | depth_img_input.append(depth_img_dilate / 100.)
114 | depth_img_input = torch.tensor(depth_img_input).float().to(device)
115 | depth_img_input = depth_img_input.unsqueeze(1)
116 |
117 | end = time.time()
118 | _, flow_up = model(depth_img_input, rgb_input, lidar_mask=lidar_input, iters=24, test_mode=True)
119 |
120 | if args.render:
121 | if not os.path.exists(f"./visualization"):
122 | os.mkdir(f"./visualization")
123 | os.mkdir(f"./visualization/flow")
124 | os.mkdir(f"./visualization/original_overlay")
125 | os.mkdir(f"./visualization/warp_overlay")
126 |
127 | flow_image = flow_to_image(flow_up.permute(0, 2, 3, 1).cpu().detach().numpy()[0])
128 | cv2.imwrite(f'./visualization/flow/{i_batch:06d}.png', flow_image)
129 |
130 | output = torch.zeros(flow_up.shape).to(device)
131 | pred_depth_img = torch.zeros(lidar_input.shape).to(device)
132 | pred_depth_img += 1000.
133 | output = visibility.image_warp_index(lidar_input.to(device),
134 | flow_up.int().to(device), pred_depth_img,
135 | output, lidar_input.shape[3], lidar_input.shape[2])
136 | pred_depth_img[pred_depth_img == 1000.] = 0.
137 |
138 | original_overlay = overlay_imgs(rgb_input[0, :, :, :], lidar_input[0, 0, :, :])
139 | cv2.imwrite(f'./visualization/original_overlay/{i_batch:06d}.png', original_overlay)
140 | warp_overlay = overlay_imgs(rgb_input[0, :, :, :], pred_depth_img[0, 0, :, :])
141 | cv2.imwrite(f'./visualization/warp_overlay/{i_batch:06d}.png', warp_overlay)
142 |
143 | if not cal_pose:
144 | epe = torch.sum((flow_up - flow_gt) ** 2, dim=1).sqrt()
145 | mag = torch.sum(flow_gt ** 2, dim=1).sqrt()
146 | epe = epe.view(-1)
147 | mag = mag.view(-1)
148 | valid_gt = (flow_gt[:, 0, :, :] != 0) + (flow_gt[:, 1, :, :] != 0)
149 | val = valid_gt.view(-1) >= 0.5
150 |
151 | out = ((epe > 3.0) & ((epe / mag) > 0.05)).float()
152 | epe_list.append(epe[val].mean().item())
153 | out_list.append(out[val].cpu().numpy())
154 | else:
155 | R_pred, T_pred = Flow2Pose(flow_up, lidar_input, calib)
156 | Time += time.time() - end
157 | err_r, err_t, is_fail = err_Pose(R_pred, T_pred, R_err[0], T_err[0])
158 | if is_fail:
159 | outliers.append(i_batch)
160 | else:
161 | err_r_list.append(err_r.item())
162 | err_t_list.append(err_t.item())
163 | print(f"{i_batch:05d}: {np.mean(err_t_list):.5f} {np.mean(err_r_list):.5f} {np.median(err_t_list):.5f} "
164 | f"{np.median(err_r_list):.5f} {len(outliers)} {Time / (i_batch+1):.5f}")
165 |
166 | if not cal_pose:
167 | epe_list = np.array(epe_list)
168 | out_list = np.concatenate(out_list)
169 |
170 | epe = np.mean(epe_list)
171 | f1 = 100 * np.mean(out_list)
172 |
173 | return epe, f1
174 | else:
175 | return err_t_list, err_r_list, outliers, Time
176 |
177 |
178 | if __name__ == '__main__':
179 | parser = argparse.ArgumentParser()
180 | parser.add_argument('--data_path', type=str, metavar='DIR',
181 | default='/data/cky/KITTI/sequences',
182 | help='path to dataset')
183 | parser.add_argument('--test_sequence', type=str, default='00')
184 | parser.add_argument('-cps', '--load_checkpoints', help="restore checkpoint")
185 | parser.add_argument('--epochs', default=100, type=int, metavar='N',
186 | help='number of total epochs to run')
187 | parser.add_argument('--starting_epoch', default=0, type=int, metavar='N',
188 | help='manual epoch number (useful on restarts)')
189 | parser.add_argument('-b', '--batch_size', default=2, type=int,
190 | metavar='N', help='mini-batch size')
191 | parser.add_argument('--lr', '--learning_rate', default=4e-5, type=float,
192 | metavar='LR', help='initial learning rate')
193 | parser.add_argument('--wdecay', type=float, default=.00005)
194 | parser.add_argument('--epsilon', type=float, default=1e-8)
195 | parser.add_argument('--clip', type=float, default=1.0)
196 | parser.add_argument('--gamma', type=float, default=0.8, help='exponential weighting')
197 | parser.add_argument('--iters', type=int, default=12)
198 | parser.add_argument('--gpus', type=int, nargs='+', default=[0])
199 | parser.add_argument('--max_r', type=float, default=10.)
200 | parser.add_argument('--max_t', type=float, default=2.)
201 | parser.add_argument('--use_reflectance', default=False)
202 | parser.add_argument('--num_workers', type=int, default=3)
203 | parser.add_argument('--mixed_precision', action='store_true', help='use mixed precision')
204 | parser.add_argument('--evaluate_interval', default=1, type=int, metavar='N',
205 | help='Evaluate every \'evaluate interval\' epochs ')
206 | parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true',
207 | help='evaluate model on validation set')
208 | parser.add_argument('--render', action='store_true')
209 | args = parser.parse_args()
210 |
211 | device = torch.device(f"cuda:{args.gpus[0]}" if torch.cuda.is_available() else "cpu")
212 | os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
213 | torch.cuda.set_device(args.gpus[0])
214 |
215 | batch_size = args.batch_size
216 |
217 | model = torch.nn.DataParallel(RAFT(args), device_ids=args.gpus)
218 | print("Parameter Count: %d" % count_parameters(model))
219 | if args.load_checkpoints is not None:
220 | model.load_state_dict(torch.load(args.load_checkpoints))
221 | model.to(device)
222 |
223 | def init_fn(x):
224 | return _init_fn(x, seed)
225 |
226 | dataset_test = DatasetVisibilityKittiSingle(args.data_path, max_r=args.max_r, max_t=args.max_t,
227 | split='test', use_reflectance=args.use_reflectance,
228 | test_sequence=args.test_sequence)
229 | TestImgLoader = torch.utils.data.DataLoader(dataset=dataset_test,
230 | shuffle=False,
231 | batch_size=1,
232 | num_workers=args.num_workers,
233 | worker_init_fn=init_fn,
234 | collate_fn=merge_inputs,
235 | drop_last=False,
236 | pin_memory=True)
237 | if args.evaluate:
238 | with torch.no_grad():
239 | err_t_list, err_r_list, outliers, Time = test(args, TestImgLoader, model, device, cal_pose=True)
240 | print(f"Mean trans error {np.mean(err_t_list):.5f} Mean rotation error {np.mean(err_r_list):.5f}")
241 | print(f"Median trans error {np.median(err_t_list):.5f} Median rotation error {np.median(err_r_list):.5f}")
242 | print(f"Outliers number {len(outliers)}/{len(TestImgLoader)} Mean {Time / len(TestImgLoader):.5f} per frame")
243 | sys.exit()
244 |
245 | dataset_train = DatasetVisibilityKittiSingle(args.data_path, max_r=args.max_r, max_t=args.max_t,
246 | split='train', use_reflectance=args.use_reflectance,
247 | test_sequence=args.test_sequence)
248 | TrainImgLoader = torch.utils.data.DataLoader(dataset=dataset_train,
249 | shuffle=True,
250 | batch_size=batch_size,
251 | num_workers=args.num_workers,
252 | worker_init_fn=init_fn,
253 | collate_fn=merge_inputs,
254 | drop_last=False,
255 | pin_memory=True)
256 | print("Train length: ", len(TrainImgLoader))
257 | print("Test length: ", len(TestImgLoader))
258 |
259 | optimizer, scheduler = fetch_optimizer(args, len(TrainImgLoader), model)
260 | scaler = GradScaler(enabled=args.mixed_precision)
261 | logger = Logger(model, scheduler, SUM_FREQ=100)
262 |
263 | starting_epoch = args.starting_epoch
264 | min_val_err = 9999.
265 | for epoch in range(starting_epoch, args.epochs):
266 | # train
267 | train(args, TrainImgLoader, model, optimizer, scheduler, scaler, logger, device)
268 |
269 | if epoch % args.evaluate_interval == 0:
270 | epe, f1 = test(args, TestImgLoader, model, device)
271 | print("Validation KITTI: %f, %f" % (epe, f1))
272 |
273 | results = {'kitti-epe': epe, 'kitti-f1': f1}
274 | logger.write_dict(results)
275 |
276 | torch.save(model.state_dict(), "./checkpoints/checkpoint.pth")
277 |
278 | if epe < min_val_err:
279 | min_val_err = epe
280 | torch.save(model.state_dict(), './checkpoints/best_model.pth')
281 |
282 |
283 |
284 |
285 |
286 |
--------------------------------------------------------------------------------
/main_bpnp.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import os
3 | import cv2
4 | import numpy as np
5 | import torch
6 | from torch.utils.data import DataLoader, RandomSampler
7 | import argparse
8 | import visibility
9 | import time
10 |
11 | sys.path.append('core')
12 | from raft import RAFT
13 | from datasets_kitti import DatasetVisibilityKittiSingle
14 | from camera_model import CameraModel
15 | from utils import fetch_optimizer, Logger, count_parameters
16 | from utils_point import merge_inputs, overlay_imgs, quat2mat, tvector2mat, mat2xyzrpy
17 | from data_preprocess import Data_preprocess
18 | from losses import sequence_loss, normal_loss
19 | from depth_completion import sparse_to_dense
20 | from flow_viz import flow_to_image
21 | from flow2pose import Flow2PoseBPnP, err_Pose
22 | from BPnP import BPnP, batch_project
23 |
24 | occlusion_kernel = 5
25 | occlusion_threshold = 3
26 | seed = 1234
27 | BPnP_EPOCH = 30
28 |
29 | try:
30 | from torch.cuda.amp import GradScaler
31 | except:
32 | class GradScaler:
33 | def __init__(self):
34 | pass
35 |
36 | def scale(self, loss):
37 | return loss
38 |
39 | def unscale_(self, optimizer):
40 | pass
41 |
42 | def step(self, optimizer):
43 | optimizer.step()
44 |
45 | def update(self):
46 | pass
47 |
48 | def _init_fn(worker_id, seed):
49 | seed = seed
50 | print(f"Init worker {worker_id} with seed {seed}")
51 | torch.manual_seed(seed)
52 | np.random.seed(seed)
53 | np.random.seed(seed)
54 |
55 | def train(args, epoch, TrainImgLoader, model, optimizer, scheduler, scaler, logger, device):
56 | global occlusion_threshold, occlusion_kernel
57 | model.train()
58 | bpnp = BPnP.apply
59 | cam_model = CameraModel()
60 | for i_batch, sample in enumerate(TrainImgLoader):
61 | rgb = sample['rgb']
62 | pc = sample['point_cloud']
63 | calib = sample['calib']
64 | T_err = sample['tr_error']
65 | R_err = sample['rot_error']
66 |
67 | data_generate = Data_preprocess(calib, occlusion_threshold, occlusion_kernel)
68 | rgb_input, lidar_input, flow_gt = data_generate.push(rgb, pc, T_err, R_err, device)
69 |
70 | # dilation
71 | depth_img_input = []
72 | for i in range(lidar_input.shape[0]):
73 | depth_img = lidar_input[i, 0, :, :].cpu().numpy() * 100.
74 | depth_img_dilate = sparse_to_dense(depth_img.astype(np.float32))
75 | depth_img_input.append(depth_img_dilate / 100.)
76 | depth_img_input = torch.tensor(depth_img_input).float().to(device)
77 | depth_img_input = depth_img_input.unsqueeze(1)
78 |
79 | optimizer.zero_grad()
80 | flow_preds = model(depth_img_input, rgb_input, lidar_mask=lidar_input, iters=args.iters)
81 |
82 | loss, metrics = sequence_loss(flow_preds, flow_gt, args.gamma, MAX_FLOW=400)
83 | norm_loss = normal_loss(flow_preds, flow_gt, calib, lidar_input)
84 | loss += norm_loss * 100
85 |
86 | ## BPnP loss
87 | if epoch > BPnP_EPOCH:
88 | loss_poses = 0.0
89 |
90 | flow_up = flow_preds[-1]
91 |
92 | depth_img_ori = lidar_input.cpu().numpy() * 100.
93 | pc_project_uv = np.zeros([lidar_input.shape[0], lidar_input.shape[2], lidar_input.shape[3], 2])
94 |
95 | for i in range(flow_up.shape[0]):
96 | output = torch.zeros([1, flow_up.shape[1], flow_up.shape[2], flow_up.shape[3]]).to(device)
97 | warp_depth_img = torch.zeros([1, 1, flow_up.shape[2], flow_up.shape[3]]).to(device)
98 | warp_depth_img += 1000.
99 | output = visibility.image_warp_index(lidar_input[i, :, :, :].unsqueeze(0).to(device),
100 | flow_up[i, :, :, :].unsqueeze(0).int().to(device), warp_depth_img,
101 | output, lidar_input.shape[3], lidar_input.shape[2])
102 | warp_depth_img[warp_depth_img == 1000.] = 0
103 | pc_project_uv[i, :, :, :] = output.cpu().permute(0, 2, 3, 1).numpy()
104 |
105 | for n in range(lidar_input.shape[0]):
106 | mask_depth_1 = pc_project_uv[n, :, :, 0] != 0
107 | mask_depth_2 = pc_project_uv[n, :, :, 1] != 0
108 | mask_depth = mask_depth_1 + mask_depth_2
109 | depth_img = depth_img_ori[n, 0, :, :] * mask_depth
110 | cam_params_clone = calib[n].cpu().numpy()
111 | cam_model.focal_length = cam_params_clone[:2]
112 | cam_model.principal_point = cam_params_clone[2:]
113 | pts3d, pts2d, _ = cam_model.deproject_pytorch(depth_img, pc_project_uv[n, :, :, :])
114 | pts3d = torch.tensor(pts3d, dtype=torch.float32).to(device)
115 | pts2d = torch.tensor(pts2d, dtype=torch.float32).to(device)
116 | pts2d = pts2d.unsqueeze(0)
117 | cam_mat = np.array(
118 | [[cam_params_clone[0], 0, cam_params_clone[2]], [0, cam_params_clone[1], cam_params_clone[3]],
119 | [0, 0, 1.]])
120 | K = torch.tensor(cam_mat, dtype=torch.float32).to(device)
121 |
122 | P_out = bpnp(pts2d, pts3d, K)
123 | pts2d_pro = batch_project(P_out, pts3d, K)
124 |
125 | R = quat2mat(R_err[n])
126 | T = tvector2mat(T_err[n])
127 | RT_inv = torch.mm(T, R)
128 | RT = RT_inv.clone().inverse()
129 | P_gt = mat2xyzrpy(RT)
130 | P_gt = P_gt[[4, 5, 3, 1, 2, 0]]
131 | P_gt = P_gt.unsqueeze(0)
132 | pts2d_gt = batch_project(P_gt.to(device), pts3d, K)
133 |
134 | pts2d_gt.requires_grad = True
135 | pts2d_pro.requires_grad = True
136 | loss_poses += ((pts2d_pro - pts2d_gt) ** 2).mean()
137 |
138 | loss_pose = loss_poses / lidar_input.shape[0]
139 | else:
140 | loss_pose = 0.
141 |
142 | loss += loss_pose * 10
143 |
144 | scaler.scale(loss).backward()
145 | scaler.unscale_(optimizer)
146 | torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip)
147 |
148 | scaler.step(optimizer)
149 | scheduler.step()
150 | scaler.update()
151 |
152 | logger.push(metrics)
153 |
154 | def test(args, TestImgLoader, model, bpnp, device, cal_pose=False):
155 | global occlusion_threshold, occlusion_kernel
156 | model.eval()
157 | out_list, epe_list = [], []
158 | Time = 0.
159 | outliers, err_r_list, err_t_list = [], [], []
160 | for i_batch, sample in enumerate(TestImgLoader):
161 | rgb = sample['rgb']
162 | pc = sample['point_cloud']
163 | calib = sample['calib']
164 | T_err = sample['tr_error']
165 | R_err = sample['rot_error']
166 |
167 | data_generate = Data_preprocess(calib, occlusion_threshold, occlusion_kernel)
168 | rgb_input, lidar_input, flow_gt = data_generate.push(rgb, pc, T_err, R_err, device, split='test')
169 |
170 | # dilation
171 | depth_img_input = []
172 | for i in range(lidar_input.shape[0]):
173 | depth_img = lidar_input[i, 0, :, :].cpu().numpy() * 100.
174 | depth_img_dilate = sparse_to_dense(depth_img.astype(np.float32))
175 | depth_img_input.append(depth_img_dilate / 100.)
176 | depth_img_input = torch.tensor(depth_img_input).float().to(device)
177 | depth_img_input = depth_img_input.unsqueeze(1)
178 |
179 | end = time.time()
180 | _, flow_up = model(depth_img_input, rgb_input, lidar_mask=lidar_input, iters=24, test_mode=True)
181 |
182 | if args.render:
183 | if not os.path.exists(f"./visulization"):
184 | os.mkdir(f"./visulization")
185 | os.mkdir(f"./visulization/flow")
186 | os.mkdir(f"./visulization/original_overlay")
187 | os.mkdir(f"./visulization/warp_overlay")
188 |
189 | flow_image = flow_to_image(flow_up.permute(0, 2, 3, 1).cpu().detach().numpy()[0])
190 | cv2.imwrite(f'./visulization/flow/{i_batch:06d}.png', flow_image)
191 |
192 | output = torch.zeros(flow_up.shape).to(device)
193 | pred_depth_img = torch.zeros(lidar_input.shape).to(device)
194 | pred_depth_img += 1000.
195 | output = visibility.image_warp_index(lidar_input.to(device),
196 | flow_up.int().to(device), pred_depth_img,
197 | output, lidar_input.shape[3], lidar_input.shape[2])
198 | pred_depth_img[pred_depth_img == 1000.] = 0.
199 |
200 | original_overlay = overlay_imgs(rgb_input[0, :, :, :], lidar_input[0, 0, :, :])
201 | cv2.imwrite(f'./visulization/original_overlay/{i_batch:06d}.png', original_overlay)
202 | warp_overlay = overlay_imgs(rgb_input[0, :, :, :], pred_depth_img[0, 0, :, :])
203 | cv2.imwrite(f'./visulization/warp_overlay/{i_batch:06d}.png', warp_overlay)
204 |
205 | if not cal_pose:
206 | epe = torch.sum((flow_up - flow_gt) ** 2, dim=1).sqrt()
207 | mag = torch.sum(flow_gt ** 2, dim=1).sqrt()
208 | epe = epe.view(-1)
209 | mag = mag.view(-1)
210 | valid_gt = (flow_gt[:, 0, :, :] != 0) + (flow_gt[:, 1, :, :] != 0)
211 | val = valid_gt.view(-1) >= 0.5
212 |
213 | out = ((epe > 3.0) & ((epe / mag) > 0.05)).float()
214 | epe_list.append(epe[val].mean().item())
215 | out_list.append(out[val].cpu().numpy())
216 | else:
217 | R_pred, T_pred = Flow2PoseBPnP(flow_up, lidar_input, calib, bpnp)
218 | Time += time.time() - end
219 | err_r, err_t, is_fail = err_Pose(R_pred, T_pred, R_err[0], T_err[0])
220 | if is_fail:
221 | outliers.append(i_batch)
222 | else:
223 | err_r_list.append(err_r.item())
224 | err_t_list.append(err_t.item())
225 | print(f"{i_batch:05d}: {np.mean(err_t_list):.5f} {np.mean(err_r_list):.5f} {np.median(err_t_list):.5f} "
226 | f"{np.median(err_r_list):.5f} {len(outliers)} {Time / (i_batch+1):.5f}")
227 |
228 | if not cal_pose:
229 | epe_list = np.array(epe_list)
230 | out_list = np.concatenate(out_list)
231 |
232 | epe = np.mean(epe_list)
233 | f1 = 100 * np.mean(out_list)
234 |
235 | return epe, f1
236 | else:
237 | return err_t_list, err_r_list, outliers, Time
238 |
239 |
240 | if __name__ == '__main__':
241 | parser = argparse.ArgumentParser()
242 | parser.add_argument('--data_path', type=str, metavar='DIR',
243 | default='/data/cky/KITTI/sequences',
244 | help='path to dataset')
245 | parser.add_argument('--test_sequence', type=str, default='00')
246 | parser.add_argument('-cps', '--load_checkpoints', help="restore checkpoint")
247 | parser.add_argument('--epochs', default=100, type=int, metavar='N',
248 | help='number of total epochs to run')
249 | parser.add_argument('--starting_epoch', default=0, type=int, metavar='N',
250 | help='manual epoch number (useful on restarts)')
251 | parser.add_argument('-b', '--batch_size', default=2, type=int,
252 | metavar='N', help='mini-batch size')
253 | parser.add_argument('--lr', '--learning_rate', default=4e-5, type=float,
254 | metavar='LR', help='initial learning rate')
255 | parser.add_argument('--wdecay', type=float, default=.00005)
256 | parser.add_argument('--epsilon', type=float, default=1e-8)
257 | parser.add_argument('--clip', type=float, default=1.0)
258 | parser.add_argument('--gamma', type=float, default=0.8, help='exponential weighting')
259 | parser.add_argument('--iters', type=int, default=12)
260 | parser.add_argument('--gpus', type=int, nargs='+', default=[0])
261 | parser.add_argument('--max_r', type=float, default=10.)
262 | parser.add_argument('--max_t', type=float, default=2.)
263 | parser.add_argument('--use_reflectance', default=False)
264 | parser.add_argument('--num_workers', type=int, default=3)
265 | parser.add_argument('--mixed_precision', action='store_true', help='use mixed precision')
266 | parser.add_argument('--evaluate_interval', default=1, type=int, metavar='N',
267 | help='Evaluate every \'evaluate interval\' epochs ')
268 | parser.add_argument('-e', '--evaluate', dest='evaluate', action='store_true',
269 | help='evaluate model on validation set')
270 | parser.add_argument('--render', action='store_true')
271 | args = parser.parse_args()
272 |
273 | device = torch.device(f"cuda:{args.gpus[0]}" if torch.cuda.is_available() else "cpu")
274 | os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
275 | torch.cuda.set_device(args.gpus[0])
276 |
277 | batch_size = args.batch_size
278 |
279 | model = torch.nn.DataParallel(RAFT(args), device_ids=args.gpus)
280 | print("Parameter Count: %d" % count_parameters(model))
281 | if args.load_checkpoints is not None:
282 | model.load_state_dict(torch.load(args.load_checkpoints))
283 | model.to(device)
284 |
285 | bpnp = BPnP.apply
286 |
287 | def init_fn(x):
288 | return _init_fn(x, seed)
289 |
290 | dataset_test = DatasetVisibilityKittiSingle(args.data_path, max_r=args.max_r, max_t=args.max_t,
291 | split='test', use_reflectance=args.use_reflectance,
292 | test_sequence=args.test_sequence)
293 | TestImgLoader = torch.utils.data.DataLoader(dataset=dataset_test,
294 | shuffle=False,
295 | batch_size=1,
296 | num_workers=args.num_workers,
297 | worker_init_fn=init_fn,
298 | collate_fn=merge_inputs,
299 | drop_last=False,
300 | pin_memory=True)
301 | if args.evaluate:
302 | with torch.no_grad():
303 | err_t_list, err_r_list, outliers, Time = test(args, TestImgLoader, model, bpnp, device, cal_pose=True)
304 | print(f"Mean trans error {np.mean(err_t_list):.5f} Mean rotation error {np.mean(err_r_list):.5f}")
305 | print(f"Median trans error {np.median(err_t_list):.5f} Median rotation error {np.median(err_r_list):.5f}")
306 | print(f"Outliers number {len(outliers)}/{len(TestImgLoader)} Mean {Time / len(TestImgLoader):.5f} per frame")
307 | sys.exit()
308 |
309 | dataset_train = DatasetVisibilityKittiSingle(args.data_path, max_r=args.max_r, max_t=args.max_t,
310 | split='train', use_reflectance=args.use_reflectance,
311 | test_sequence=args.test_sequence)
312 | TrainImgLoader = torch.utils.data.DataLoader(dataset=dataset_train,
313 | shuffle=True,
314 | batch_size=batch_size,
315 | num_workers=args.num_workers,
316 | worker_init_fn=init_fn,
317 | collate_fn=merge_inputs,
318 | drop_last=False,
319 | pin_memory=True)
320 | print("Train length: ", len(TrainImgLoader))
321 | print("Test length: ", len(TestImgLoader))
322 |
323 | optimizer, scheduler = fetch_optimizer(args, len(TrainImgLoader), model)
324 | scaler = GradScaler(enabled=args.mixed_precision)
325 | logger = Logger(model, scheduler, SUM_FREQ=100)
326 |
327 | starting_epoch = args.starting_epoch
328 | min_val_err = 9999.
329 | for epoch in range(starting_epoch, args.epochs):
330 | # train
331 | train(args, epoch, TrainImgLoader, model, optimizer, scheduler, scaler, logger, device)
332 |
333 | if epoch % args.evaluate_interval == 0:
334 | epe, f1 = test(args, TestImgLoader, model, bpnp, device)
335 | print("Validation KITTI: %f, %f" % (epe, f1))
336 |
337 | results = {'kitti-epe': epe, 'kitti-f1': f1}
338 | logger.write_dict(results)
339 |
340 | torch.save(model.state_dict(), "./checkpoints/checkpoint.pth")
341 |
342 | if epe < min_val_err:
343 | min_val_err = epe
344 | torch.save(model.state_dict(), './checkpoints/best_model.pth')
345 |
346 |
347 |
348 |
349 |
350 |
--------------------------------------------------------------------------------
/preprocess/kitti_maps.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import os
3 | import sys
4 | sys.path.append("..")
5 | sys.path.append(".")
6 |
7 | import h5py
8 | import numpy as np
9 | import open3d as o3
10 | import pykitti
11 | import torch
12 | from tqdm import tqdm
13 |
14 | from utils import to_rotation_matrix
15 |
16 |
17 | parser = argparse.ArgumentParser()
18 | parser.add_argument('--sequence', default='00',
19 | help='sequence')
20 | parser.add_argument('--device', default='cuda',
21 | help='device')
22 | parser.add_argument('--voxel_size', default=0.1, type=float, help='Voxel Size')
23 | parser.add_argument('--start', default=0, help='Starting Frame')
24 | parser.add_argument('--end', default=100000, help='End Frame')
25 | parser.add_argument('--map', default=None, help='Use map file')
26 | parser.add_argument('--kitti_folder', default='./KITTI/ODOMETRY', help='Folder of the KITTI dataset')
27 |
28 | args = parser.parse_args()
29 | sequence = args.sequence
30 | print("Sequnce: ", sequence)
31 | velodyne_folder = os.path.join(args.kitti_folder, 'sequences', sequence, 'velodyne')
32 | pose_file = os.path.join('./data', f'kitti-{sequence}.csv')
33 |
34 | poses = []
35 | with open(pose_file, 'r') as f:
36 | for x in f:
37 | if x.startswith('timestamp'):
38 | continue
39 | x = x.split(',')
40 | T = torch.tensor([float(x[1]), float(x[2]), float(x[3])])
41 | R = torch.tensor([float(x[7]), float(x[4]), float(x[5]), float(x[6])])
42 | poses.append(to_rotation_matrix(R, T))
43 |
44 | map_file = args.map
45 | first_frame = int(args.start)
46 | last_frame = min(len(poses), int(args.end))
47 | kitti = pykitti.odometry(args.kitti_folder, sequence)
48 |
49 | if map_file is None:
50 |
51 | pc_map = []
52 | pcl = o3.PointCloud()
53 | for i in tqdm(range(first_frame, last_frame)):
54 | pc = kitti.get_velo(i)
55 | valid_indices = pc[:, 0] < -3.
56 | valid_indices = valid_indices | (pc[:, 0] > 3.)
57 | valid_indices = valid_indices | (pc[:, 1] < -3.)
58 | valid_indices = valid_indices | (pc[:, 1] > 3.)
59 | pc = pc[valid_indices].copy()
60 | intensity = pc[:, 3].copy()
61 | pc[:, 3] = 1.
62 | RT = poses[i].numpy()
63 | pc_rot = np.matmul(RT, pc.T)
64 | pc_rot = pc_rot.astype(np.float).T.copy()
65 |
66 | pcl_local = o3.PointCloud()
67 | pcl_local.points = o3.Vector3dVector(pc_rot[:, :3])
68 | pcl_local.colors = o3.Vector3dVector(np.vstack((intensity, intensity, intensity)).T)
69 |
70 | downpcd = o3.voxel_down_sample(pcl_local, voxel_size=args.voxel_size)
71 |
72 | pcl.points.extend(downpcd.points)
73 | pcl.colors.extend(downpcd.colors)
74 |
75 |
76 | downpcd_full = o3.voxel_down_sample(pcl, voxel_size=args.voxel_size)
77 | downpcd, ind = o3.statistical_outlier_removal(downpcd_full, nb_neighbors=40, std_ratio=0.3)
78 | #o3.draw_geometries(downpcd)
79 | o3.write_point_cloud(os.path.join(args.kitti_folder, 'sequences', sequence, f'map-{sequence}_{args.voxel_size}_{first_frame}-{last_frame}.pcd'), downpcd)
80 | else:
81 | downpcd = o3.read_point_cloud(map_file)
82 |
83 |
84 | voxelized = torch.tensor(downpcd.points, dtype=torch.float)
85 | voxelized = torch.cat((voxelized, torch.ones([voxelized.shape[0], 1], dtype=torch.float)), 1)
86 | voxelized = voxelized.t()
87 | voxelized = voxelized.to(args.device)
88 | vox_intensity = torch.tensor(downpcd.colors, dtype=torch.float)[:, 0:1].t()
89 |
90 | velo2cam2 = torch.from_numpy(kitti.calib.T_cam2_velo).float().to(args.device)
91 |
92 | # SAVE SINGLE PCs
93 | if not os.path.exists(os.path.join(args.kitti_folder, 'sequences', sequence,
94 | f'local_maps_{args.voxel_size}')):
95 | os.mkdir(os.path.join(args.kitti_folder, 'sequences', sequence, f'local_maps_{args.voxel_size}'))
96 | for i in tqdm(range(first_frame, last_frame)):
97 | pose = poses[i]
98 | pose = pose.to(args.device)
99 | pose = pose.inverse()
100 |
101 | local_map = voxelized.clone()
102 | local_intensity = vox_intensity.clone()
103 | local_map = torch.mm(pose, local_map).t()
104 | indexes = local_map[:, 1] > -25.
105 | indexes = indexes & (local_map[:, 1] < 25.)
106 | indexes = indexes & (local_map[:, 0] > -10.)
107 | indexes = indexes & (local_map[:, 0] < 100.)
108 | local_map = local_map[indexes]
109 | local_intensity = local_intensity[:, indexes]
110 |
111 | local_map = torch.mm(velo2cam2, local_map.t())
112 | local_map = local_map[[2, 0, 1, 3], :]
113 |
114 | #pcd = o3.PointCloud()
115 | #pcd.points = o3.Vector3dVector(local_map[:,:3].numpy())
116 | #o3.write_point_cloud(f'{i:06d}.pcd', pcd)
117 |
118 | file = os.path.join(args.kitti_folder, 'sequences', sequence,
119 | f'local_maps_{args.voxel_size}', f'{i:06d}.h5')
120 | with h5py.File(file, 'w') as hf:
121 | hf.create_dataset('PC', data=local_map.cpu().half(), compression='lzf', shuffle=True)
122 | hf.create_dataset('intensity', data=local_intensity.cpu().half(), compression='lzf', shuffle=True)
123 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | scikit_image
2 | git+https://gitlab.com/m1lhaus/blender-mathutils.git
3 | tqdm
4 | pandas
5 | h5py
6 | matplotlib
7 | scipy
8 | pyquaternion
9 | opencv-python
10 | pykitti
11 | numpy
12 | open3d-python==0.7.0.0
13 | Pillow
14 | scikit-image
15 | sacred==0.7.4
16 | tqdm
17 | tensorboardX
--------------------------------------------------------------------------------
/sample/image/000500.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/sample/image/000500.png
--------------------------------------------------------------------------------
/sample/image/001000.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/sample/image/001000.png
--------------------------------------------------------------------------------
/sample/image/001500.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/sample/image/001500.png
--------------------------------------------------------------------------------
/sample/image/002000.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/sample/image/002000.png
--------------------------------------------------------------------------------
/sample/overlay/000500.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/sample/overlay/000500.png
--------------------------------------------------------------------------------
/sample/overlay/001000.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/sample/overlay/001000.png
--------------------------------------------------------------------------------
/sample/overlay/001500.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/sample/overlay/001500.png
--------------------------------------------------------------------------------
/sample/overlay/002000.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/sample/overlay/002000.png
--------------------------------------------------------------------------------
/sample/pc/000500.h5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/sample/pc/000500.h5
--------------------------------------------------------------------------------
/sample/pc/001000.h5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/sample/pc/001000.h5
--------------------------------------------------------------------------------
/sample/pc/001500.h5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/sample/pc/001500.h5
--------------------------------------------------------------------------------
/sample/pc/002000.h5:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/EasonChen99/I2D-Loc/7742241ac915da4b1e503ecc1642c71819f6d1e8/sample/pc/002000.h5
--------------------------------------------------------------------------------