├── .github └── FUNDING.yml ├── .gitignore ├── LICENSE ├── README.md ├── data.py ├── demo.py ├── imgs ├── arXiv_fig1.png ├── dataset_cmp.png ├── demo01.png └── demo_exp.png ├── main.py ├── model.py ├── requirements.txt ├── tools ├── extract_comma2k19.py └── extract_nuscenes.py ├── utils.py ├── utils_comma2k19 ├── LICENSE ├── __init__.py ├── benchmarks.py ├── camera.py ├── coordinates.py ├── dataset.py ├── orientation.py └── unzip_msft_fs.py └── view_transform.py /.github/FUNDING.yml: -------------------------------------------------------------------------------- 1 | # These are supported funding model platforms 2 | 3 | github: [OpenDriveLab] # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2] 4 | patreon: # Replace with a single Patreon username 5 | open_collective: # Replace with a single Open Collective username 6 | ko_fi: # Replace with a single Ko-fi username 7 | tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel 8 | community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry 9 | liberapay: # Replace with a single Liberapay username 10 | issuehunt: # Replace with a single IssueHunt username 11 | otechie: # Replace with a single Otechie username 12 | lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry 13 | custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2'] 14 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | data/* 2 | lightning_logs/ 3 | runs/ 4 | __pycache__ 5 | vis*/ 6 | *.ckpt 7 | .vscode -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright 2021 T.T. Tang tttang@sjtu.edu.cn 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: 4 | 5 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 6 | 7 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | #

Openpilot-Deepdive
2 | **Level 2 Autonomous Driving on a Single Device: Diving into the Devils of Openpilot**
3 | ![image](./imgs/arXiv_fig1.png)
4 | [**Webpage**](https://sites.google.com/view/openpilot-deepdive/home) | [**Paper**](https://arxiv.org/abs/2206.08176) | [**Zhihu**](https://www.zhihu.com/people/PerceptionX)
5 | ***
6 | # Introduction
7 |
8 | This repository is the PyTorch implementation of our Openpilot-Deepdive.
9 | In contrast to most traditional autonomous driving solutions, where the perception, prediction, and planning modules are separate, [Openpilot](https://github.com/commaai/openpilot) uses an end-to-end neural network, called Supercombo, to predict the trajectory directly from camera images. We try to reimplement the training details and test the pipeline on public benchmarks. Experimental results of OP-Deepdive on nuScenes, Comma2k19, CARLA, and in-house realistic scenarios (collected in Shanghai) verify that a low-cost device can indeed achieve most L2 functionalities and be on par with the original Supercombo model. We also test it on a Comma Two device with a dual-model deployment framework, which is available in this repo: [Openpilot-Deployment](https://github.com/OpenPerceptionX/Openpilot-Deepdive/tree/deploy/Openpilot-Deployment).
10 |
11 | * [Directory Structure](#directory-structure)
12 | * [Changelog](#changelog)
13 | * [Quick Start](#quick-start-examples)
14 | * [Installation](#installation)
15 | * [Dataset](#dataset)
16 | * [Training and Testing](#training-and-testing)
17 | * [Demo](#demo)
18 | * [Baselines](#baselines)
19 | * [Citation](#citation)
20 | * [License](#license)
21 |
22 | ***
23 | # Directory Structure
24 |
25 | ```
26 | Openpilot-Deepdive
27 | ├── tools - Tools to generate splits on the Comma2k19 and nuScenes datasets.
28 | ├── utils_comma2k19 - The utils provided by comma, copied from `commaai/comma2k19.git/utils`
29 | ├── data
30 |     ├── nuscenes -> soft link to the nuScenes-all dataset
31 |     ├── comma2k19 -> soft link to the Comma2k19 dataset
32 | ```
33 | ***
34 | # Changelog
35 |
36 | 2022-6-17: We released the v1.0 code for Openpilot-Deepdive.
37 |
38 | 2022-6-26: We fixed some problems and updated the README for using the code on bare-metal machines. Thanks @EliomEssaim and @MicroHest!
39 |
40 | 2022-7-13: We released the v1.0 code of [Openpilot-Deployment](https://github.com/OpenPerceptionX/Openpilot-Deepdive/tree/deploy/Openpilot-Deployment) for dual-model deployment in the Openpilot framework.
41 |
42 | ***
43 | # Quick Start Examples
44 | Before starting, we recommend reading the [arXiv](https://arxiv.org/abs/2206.08176) paper to understand the details of our work.
45 | ## Installation
46 | Clone the repo and install requirements.txt in a [Python>=3.7.0](https://www.python.org/) environment, including [PyTorch>=1.7](https://pytorch.org/get-started/locally/).
47 |
48 | ```
49 | git clone https://github.com/OpenPerceptionX/Openpilot-Deepdive.git # clone
50 | cd Openpilot-Deepdive
51 | pip install -r requirements.txt # install
52 | ```
53 | ## Dataset
54 | We train and evaluate our model on two datasets, [nuScenes](https://www.nuscenes.org/nuscenes) and [Comma2k19](https://github.com/commaai/comma2k19).
55 | The table below shows some of their key features.
56 |
57 | | Dataset | Raw<br>FPS (Hz) | Aligned<br>FPS (Hz) | Length Per<br>Sequence<br>(Frames/Second) | Altogether<br>Length<br>(Minutes) | Scenario | Locations |
58 | | :----: |:----:|:----:|:----:|:----:|:----:|:----:|
59 | | nuScenes | 12 | 2 | 40 / 20 | 330 | Street | America<br>Singapore |
60 | | Comma2k19 | 20 | 20 | 1000 / 60 | 2000 | Highway | America |
61 |
62 | Please create a `data` folder and create soft links to the datasets.
63 |
64 | For dataset splits, you may create your own by running the scripts in the `tools` folder, or download them from https://github.com/OpenPerceptionX/Openpilot-Deepdive/issues/4.
65 |
66 | ## Training and Testing
67 | By default, the batch size is set to 6 per GPU, which consumes 27 GB of GPU memory. When using 8 V100 GPUs, it takes approximately 120 hours to train 100 epochs on the Comma2k19 dataset.
68 |
69 | **Note**: Our lab uses `slurm` to run and manage tasks, so the PyTorch distributed training processes are initialized manually by `slurm`, since the automatic `mp.spawn` may cause unknown problems on slurm clusters. For most people who do not use a cluster, it is fine to launch the training process on bare-metal machines, but you will have to open multiple terminals and set some environment variables manually if you want to use multiple GPUs. We explain this below.
70 |
71 | **Warning**: Since we have to extract all the frames from the video before sending them into the network, the program is hungry for memory. The actual memory usage depends on `batch_size` and `n_workers`. By default, each process with `n_workers=4` and `batch_size=6` consumes around 40 to 50 GB of memory. You'd better keep `htop` open to monitor the memory usage, so you can react before the machine hangs.
72 |
73 | ```bash
74 | # Training on a slurm cluster
75 | export DIST_PORT=23333  # You may use whatever port you want
76 | export NUM_GPUS=8
77 | PORT=$DIST_PORT srun -p $PARTITION --job-name=openpilot -n $NUM_GPUS --gres=gpu:$NUM_GPUS --ntasks-per-node=$NUM_GPUS python main.py
78 | ```
79 |
80 | ```bash
81 | # Training on a bare-metal machine with a single GPU
82 | PORT=23333 SLURM_PROCID=0 SLURM_NTASKS=1 python main.py
83 | ```
84 |
85 | ```bash
86 | # Training on a bare-metal machine with multiple GPUs
87 | # You need to open multiple terminals
88 |
89 | # Let's use 4 GPUs for example
90 | # Terminal 1
91 | PORT=23333 SLURM_PROCID=0 SLURM_NTASKS=4 python main.py
92 | # Terminal 2
93 | PORT=23333 SLURM_PROCID=1 SLURM_NTASKS=4 python main.py
94 | # Terminal 3
95 | PORT=23333 SLURM_PROCID=2 SLURM_NTASKS=4 python main.py
96 | # Terminal 4
97 | PORT=23333 SLURM_PROCID=3 SLURM_NTASKS=4 python main.py
98 | # Then, the training process will start after all 4 processes are launched.
99 | ```
100 |
101 | By default, the program will not output anything once the training process starts, because the widely used `tqdm` might be buggy on slurm clusters. So you may only see some debugging info like the output below, and the program may seem to be stuck.
102 |
103 | ```
104 | [1656218909.68] starting job... 0 of 1
105 | [1656218911.53] DDP Initialized at localhost:23333 0 of 1
106 | Comma2k19SequenceDataset: DEMO mode is on.
107 | Loaded pretrained weights for efficientnet-b2
108 | ```
109 |
110 | Don't worry: you can open TensorBoard to see the loss and validation curves.
111 | ```
112 | tensorboard --logdir runs --bind_all
113 | ```
114 |
115 | Otherwise, you may want to pass `--tqdm=True` to show the progress bar in `Terminal 1`.
116 |
117 | By default, the test process is executed once every epoch, so we did not implement an independent test script.
118 |
119 | ## Demo
120 | See more demos and test cases on our [webpage](https://sites.google.com/view/openpilot-deepdive/home)!
121 |
122 | You can generate your own demo video using `demo.py`. It will generate frames in the `./vis` folder (you may have to create it first), which you can then stitch into a video with `ffmpeg`, for example with the command sketched below.
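A minimal sketch, assuming the first demo folder written by `demo.py` is `vis/M5_DEMO_0000/` and a 20 FPS playback rate (both are assumptions; adjust the folder, frame rate, and output name to your own run):

```bash
# Hypothetical example: stitch the frames written by demo.py into an H.264 video.
# The pad filter only rounds the image size up to even dimensions, as required by libx264.
ffmpeg -framerate 20 -i vis/M5_DEMO_0000/%08d.jpg -vf "pad=ceil(iw/2)*2:ceil(ih/2)*2" -c:v libx264 -pix_fmt yuv420p demo.mp4
```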
123 |
124 | https://user-images.githubusercontent.com/20351686/174319920-35b3ad34-a15e-43d7-be23-b135c24712e2.mp4
125 |
126 |
127 | ***
128 | # Baselines
129 | Here we list several baselines for the trajectory prediction task on different datasets. You are welcome to open a pull request and add your work here!
130 |
131 | **nuScenes**
132 | | Method | AP@0.5(0-10) | AP@1(10-20) | AP@1(20-30) | AP@1(30-50) |
133 | | :----: |:----: |:----: |:----: |:----: |
134 | | Supercombo | 0.237 | 0.064 | 0.038 | 0.053 |
135 | | Supercombo-finetuned | 0.305 | 0.162 | 0.088 | 0.050 |
136 | | OP-Deepdive (Ours) | 0.28 | 0.14 | 0.067 | 0.038 |
137 |
138 | **Comma2k19**
139 | | Method | AP@0.5(0-10) | AP@1(10-20) | AP@1(20-30) | AP@1(30-50) | AP@2(50+) | Average Jerk* |
140 | | :----: |:----: |:----: |:----: |:----: |:----: |:----: |
141 | | Supercombo | 0.7966 | 0.6170 | 0.2661 | 0.0889 | 0.0062 | 2.2243 |
142 | | OP-Deepdive (Ours) | 0.909 | 0.808 | 0.651 | 0.465 | 0.239 | 4.7959 |
143 |
144 | \*: The lower, the better. For comparison, the average jerk of human drivers' trajectories is 0.3232 m/s^2.
145 |
146 | ***
147 | # Citation
148 | Please use the following citation when referencing our repo or the [arXiv](https://arxiv.org/abs/2206.08176) paper.
149 | ```
150 | @article{chen2022op,
151 |   title={Level 2 Autonomous Driving on a Single Device: Diving into the Devils of Openpilot},
152 |   author={Li Chen and Tutian Tang and Zhitian Cai and Yang Li and Penghao Wu and Hongyang Li and Jianping Shi and Junchi Yan and Yu Qiao},
153 |   journal={arXiv preprint arXiv:2206.08176},
154 |   year={2022}
155 | }
156 | ```
157 | ***
158 | # License
159 | All code within this repository is under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
160 | *** 161 | 162 | -------------------------------------------------------------------------------- /data.py: -------------------------------------------------------------------------------- 1 | import os 2 | import io 3 | import json 4 | import torch 5 | from math import pi 6 | import numpy as np 7 | from scipy.interpolate import interp1d 8 | import cv2 9 | cv2.setNumThreads(0) 10 | cv2.ocl.setUseOpenCL(False) 11 | 12 | from PIL import Image 13 | from torch.utils.data import Dataset 14 | from torchvision import transforms 15 | from utils import warp, generate_random_params_for_warp 16 | from view_transform import calibration 17 | 18 | import utils_comma2k19.orientation as orient 19 | import utils_comma2k19.coordinates as coord 20 | 21 | 22 | class PlanningDataset(Dataset): 23 | def __init__(self, root='data', json_path_pattern='p3_%s.json', split='train'): 24 | self.samples = json.load(open(os.path.join(root, json_path_pattern % split))) 25 | print('PlanningDataset: %d samples loaded from %s' % 26 | (len(self.samples), os.path.join(root, json_path_pattern % split))) 27 | self.split = split 28 | 29 | self.img_root = os.path.join(root, 'nuscenes') 30 | self.transforms = transforms.Compose( 31 | [ 32 | # transforms.Resize((900 // 2, 1600 // 2)), 33 | # transforms.Resize((9 * 32, 16 * 32)), 34 | transforms.Resize((128, 256)), 35 | transforms.ToTensor(), 36 | transforms.Normalize([0.3890, 0.3937, 0.3851], 37 | [0.2172, 0.2141, 0.2209]), 38 | ] 39 | ) 40 | 41 | self.enable_aug = False 42 | self.view_transform = False 43 | 44 | self.use_memcache = False 45 | if self.use_memcache: 46 | self._init_mc_() 47 | 48 | def _init_mc_(self): 49 | from petrel_client.client import Client 50 | self.client = Client('~/petreloss.conf') 51 | print('======== Initializing Memcache: Success =======') 52 | 53 | def _get_cv2_image(self, path): 54 | if self.use_memcache: 55 | img_bytes = self.client.get(str(path)) 56 | assert(img_bytes is not None) 57 | img_mem_view = memoryview(img_bytes) 58 | img_array = np.frombuffer(img_mem_view, np.uint8) 59 | return cv2.imdecode(img_array, cv2.IMREAD_COLOR) 60 | 61 | else: 62 | return cv2.imread(path) 63 | 64 | def __len__(self): 65 | return len(self.samples) 66 | 67 | def __getitem__(self, idx): 68 | sample = self.samples[idx] 69 | imgs, future_poses = sample['imgs'], sample['future_poses'] 70 | 71 | # process future_poses 72 | future_poses = torch.tensor(future_poses) 73 | future_poses[:, 0] = future_poses[:, 0].clamp(1e-2, ) # the car will never go backward 74 | 75 | imgs = list(self._get_cv2_image(os.path.join(self.img_root, p)) for p in imgs) 76 | imgs = list(cv2.cvtColor(img, cv2.COLOR_BGR2RGB) for img in imgs) # RGB 77 | 78 | # process images 79 | if self.enable_aug and self.split == 'train': 80 | # data augumentation when training 81 | # random distort (warp) 82 | w_offsets, h_offsets = generate_random_params_for_warp(imgs[0], random_rate=0.1) 83 | imgs = list(warp(img, w_offsets, h_offsets) for img in imgs) 84 | 85 | # random flip 86 | if np.random.rand() > 0.5: 87 | imgs = list(img[:, ::-1, :] for img in imgs) 88 | future_poses[:, 1] *= -1 89 | 90 | 91 | if self.view_transform: 92 | camera_rotation_matrix = np.linalg.inv(np.array(sample["camera_rotation_matrix_inv"])) 93 | camera_translation = -np.array(sample["camera_translation_inv"]) 94 | camera_extrinsic = np.vstack((np.hstack((camera_rotation_matrix, camera_translation.reshape((3, 1)))), np.array([0, 0, 0, 1]))) 95 | camera_extrinsic = np.linalg.inv(camera_extrinsic) 96 | warp_matrix = 
calibration(camera_extrinsic, np.array(sample["camera_intrinsic"])) 97 | imgs = list(cv2.warpPerspective(src = img, M = warp_matrix, dsize= (256,128), flags= cv2.WARP_INVERSE_MAP) for img in imgs) 98 | 99 | # cvt back to PIL images 100 | # cv2.imshow('0', imgs[0]) 101 | # cv2.imshow('1', imgs[1]) 102 | # cv2.waitKey(0) 103 | imgs = list(Image.fromarray(img) for img in imgs) 104 | imgs = list(self.transforms(img) for img in imgs) 105 | input_img = torch.cat(imgs, dim=0) 106 | 107 | return dict( 108 | input_img=input_img, 109 | future_poses=future_poses, 110 | camera_intrinsic=torch.tensor(sample['camera_intrinsic']), 111 | camera_extrinsic=torch.tensor(sample['camera_extrinsic']), 112 | camera_translation_inv=torch.tensor(sample['camera_translation_inv']), 113 | camera_rotation_matrix_inv=torch.tensor(sample['camera_rotation_matrix_inv']), 114 | ) 115 | 116 | 117 | class SequencePlanningDataset(PlanningDataset): 118 | def __init__(self, root='data', json_path_pattern='p3_%s.json', split='train'): 119 | print('Sequence', end='') 120 | self.fix_seq_length = 18 121 | super().__init__(root=root, json_path_pattern=json_path_pattern, split=split) 122 | 123 | def __getitem__(self, idx): 124 | seq_samples = self.samples[idx] 125 | seq_length = len(seq_samples) 126 | if seq_length < self.fix_seq_length: 127 | # Only 1 sample < 28 (==21) 128 | return self.__getitem__(np.random.randint(0, len(self.samples))) 129 | if seq_length > self.fix_seq_length: 130 | seq_length_delta = seq_length - self.fix_seq_length 131 | seq_length_delta = np.random.randint(0, seq_length_delta+1) 132 | seq_samples = seq_samples[seq_length_delta:self.fix_seq_length+seq_length_delta] 133 | 134 | seq_future_poses = list(smp['future_poses'] for smp in seq_samples) 135 | seq_imgs = list(smp['imgs'] for smp in seq_samples) 136 | 137 | seq_input_img = [] 138 | for imgs in seq_imgs: 139 | imgs = list(self._get_cv2_image(os.path.join(self.img_root, p)) for p in imgs) 140 | imgs = list(cv2.cvtColor(img, cv2.COLOR_BGR2RGB) for img in imgs) # RGB 141 | imgs = list(Image.fromarray(img) for img in imgs) 142 | imgs = list(self.transforms(img) for img in imgs) 143 | input_img = torch.cat(imgs, dim=0) 144 | seq_input_img.append(input_img[None]) 145 | seq_input_img = torch.cat(seq_input_img) 146 | 147 | return dict( 148 | seq_input_img=seq_input_img, # torch.Size([28, 10, 3]) 149 | seq_future_poses=torch.tensor(seq_future_poses), # torch.Size([28, 6, 128, 256]) 150 | camera_intrinsic=torch.tensor(seq_samples[0]['camera_intrinsic']), 151 | camera_extrinsic=torch.tensor(seq_samples[0]['camera_extrinsic']), 152 | camera_translation_inv=torch.tensor(seq_samples[0]['camera_translation_inv']), 153 | camera_rotation_matrix_inv=torch.tensor(seq_samples[0]['camera_rotation_matrix_inv']), 154 | ) 155 | 156 | 157 | class Comma2k19SequenceDataset(PlanningDataset): 158 | def __init__(self, split_txt_path, prefix, mode, use_memcache=True, return_origin=False): 159 | self.split_txt_path = split_txt_path 160 | self.prefix = prefix 161 | 162 | self.samples = open(split_txt_path).readlines() 163 | self.samples = [i.strip() for i in self.samples] 164 | 165 | assert mode in ('train', 'val', 'demo') 166 | self.mode = mode 167 | if self.mode == 'demo': 168 | print('Comma2k19SequenceDataset: DEMO mode is on.') 169 | 170 | self.fix_seq_length = 800 if mode == 'train' else 800 171 | 172 | self.transforms = transforms.Compose( 173 | [ 174 | # transforms.Resize((900 // 2, 1600 // 2)), 175 | # transforms.Resize((9 * 32, 16 * 32)), 176 | transforms.Resize((128, 256)), 177 
| transforms.ToTensor(), 178 | transforms.Normalize([0.3890, 0.3937, 0.3851], 179 | [0.2172, 0.2141, 0.2209]), 180 | ] 181 | ) 182 | 183 | self.warp_matrix = calibration(extrinsic_matrix=np.array([[ 0, -1, 0, 0], 184 | [ 0, 0, -1, 1.22], 185 | [ 1, 0, 0, 0], 186 | [ 0, 0, 0, 1]]), 187 | cam_intrinsics=np.array([[910, 0, 582], 188 | [0, 910, 437], 189 | [0, 0, 1]]), 190 | device_frame_from_road_frame=np.hstack((np.diag([1, -1, -1]), [[0], [0], [1.22]]))) 191 | 192 | self.use_memcache = use_memcache 193 | if self.use_memcache: 194 | self._init_mc_() 195 | 196 | self.return_origin = return_origin 197 | 198 | # from OpenPilot 199 | self.num_pts = 10 * 20 # 10 s * 20 Hz = 200 frames 200 | self.t_anchors = np.array( 201 | (0. , 0.00976562, 0.0390625 , 0.08789062, 0.15625 , 202 | 0.24414062, 0.3515625 , 0.47851562, 0.625 , 0.79101562, 203 | 0.9765625 , 1.18164062, 1.40625 , 1.65039062, 1.9140625 , 204 | 2.19726562, 2.5 , 2.82226562, 3.1640625 , 3.52539062, 205 | 3.90625 , 4.30664062, 4.7265625 , 5.16601562, 5.625 , 206 | 6.10351562, 6.6015625 , 7.11914062, 7.65625 , 8.21289062, 207 | 8.7890625 , 9.38476562, 10.) 208 | ) 209 | self.t_idx = np.linspace(0, 10, num=self.num_pts) 210 | 211 | 212 | def _get_cv2_vid(self, path): 213 | if self.use_memcache: 214 | path = self.client.generate_presigned_url(str(path), client_method='get_object', expires_in=3600) 215 | return cv2.VideoCapture(path) 216 | 217 | def _get_numpy(self, path): 218 | if self.use_memcache: 219 | bytes = io.BytesIO(memoryview(self.client.get(str(path)))) 220 | return np.lib.format.read_array(bytes) 221 | else: 222 | return np.load(path) 223 | 224 | def __getitem__(self, idx): 225 | seq_sample_path = self.prefix + self.samples[idx] 226 | cap = self._get_cv2_vid(seq_sample_path + '/video.hevc') 227 | if (cap.isOpened() == False): 228 | raise RuntimeError 229 | imgs = [] # <--- all frames here 230 | origin_imgs = [] 231 | while (cap.isOpened()): 232 | ret, frame = cap.read() 233 | if ret == True: 234 | imgs.append(frame) 235 | # cv2.imshow('frame', frame) 236 | # cv2.waitKey(0) 237 | if self.return_origin: 238 | origin_imgs.append(frame) 239 | else: 240 | break 241 | cap.release() 242 | 243 | seq_length = len(imgs) 244 | 245 | if self.mode == 'demo': 246 | self.fix_seq_length = seq_length - self.num_pts - 1 247 | 248 | if seq_length < self.fix_seq_length + self.num_pts: 249 | print('The length of sequence', seq_sample_path, 'is too short', 250 | '(%d < %d)' % (seq_length, self.fix_seq_length + self.num_pts)) 251 | return self.__getitem__(idx+1) 252 | 253 | seq_length_delta = seq_length - (self.fix_seq_length + self.num_pts) 254 | seq_length_delta = np.random.randint(1, seq_length_delta+1) 255 | 256 | seq_start_idx = seq_length_delta 257 | seq_end_idx = seq_length_delta + self.fix_seq_length 258 | 259 | # seq_input_img 260 | imgs = imgs[seq_start_idx-1: seq_end_idx] # contains one more img 261 | imgs = [cv2.warpPerspective(src=img, M=self.warp_matrix, dsize=(512,256), flags=cv2.WARP_INVERSE_MAP) for img in imgs] 262 | imgs = [cv2.cvtColor(img, cv2.COLOR_BGR2RGB) for img in imgs] 263 | imgs = list(Image.fromarray(img) for img in imgs) 264 | imgs = list(self.transforms(img)[None] for img in imgs) 265 | input_img = torch.cat(imgs, dim=0) # [N+1, 3, H, W] 266 | del imgs 267 | input_img = torch.cat((input_img[:-1, ...], input_img[1:, ...]), dim=1) 268 | 269 | # poses 270 | frame_positions = self._get_numpy(self.prefix + self.samples[idx] + '/global_pose/frame_positions')[seq_start_idx: seq_end_idx+self.num_pts] 271 | frame_orientations = 
self._get_numpy(self.prefix + self.samples[idx] + '/global_pose/frame_orientations')[seq_start_idx: seq_end_idx+self.num_pts] 272 | 273 | future_poses = [] 274 | for i in range(self.fix_seq_length): 275 | ecef_from_local = orient.rot_from_quat(frame_orientations[i]) 276 | local_from_ecef = ecef_from_local.T 277 | frame_positions_local = np.einsum('ij,kj->ki', local_from_ecef, frame_positions - frame_positions[i]).astype(np.float32) 278 | 279 | # Time-Anchor like OpenPilot 280 | fs = [interp1d(self.t_idx, frame_positions_local[i: i+self.num_pts, j]) for j in range(3)] 281 | interp_positions = [fs[j](self.t_anchors)[:, None] for j in range(3)] 282 | interp_positions = np.concatenate(interp_positions, axis=1) 283 | 284 | future_poses.append(interp_positions) 285 | future_poses = torch.tensor(np.array(future_poses), dtype=torch.float32) 286 | 287 | rtn_dict = dict( 288 | seq_input_img=input_img, # torch.Size([N, 6, 128, 256]) 289 | seq_future_poses=future_poses, # torch.Size([N, num_pts, 3]) 290 | # camera_intrinsic=torch.tensor(seq_samples[0]['camera_intrinsic']), 291 | # camera_extrinsic=torch.tensor(seq_samples[0]['camera_extrinsic']), 292 | # camera_translation_inv=torch.tensor(seq_samples[0]['camera_translation_inv']), 293 | # camera_rotation_matrix_inv=torch.tensor(seq_samples[0]['camera_rotation_matrix_inv']), 294 | ) 295 | 296 | # For DEMO 297 | if self.return_origin: 298 | origin_imgs = origin_imgs[seq_start_idx: seq_end_idx] 299 | origin_imgs = [torch.tensor(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))[None] for img in origin_imgs] 300 | origin_imgs = torch.cat(origin_imgs, dim=0) # N, H_ori, W_ori, 3 301 | rtn_dict['origin_imgs'] = origin_imgs 302 | 303 | return rtn_dict 304 | -------------------------------------------------------------------------------- /demo.py: -------------------------------------------------------------------------------- 1 | from utils import draw_path, draw_trajectory_on_ax 2 | import torch 3 | import os 4 | import cv2 5 | import numpy as np 6 | from tqdm import tqdm 7 | from torch.nn.functional import softmax 8 | 9 | from data import PlanningDataset, Comma2k19SequenceDataset 10 | from main import SequenceBaselineV1 11 | from torch.utils.data import DataLoader 12 | 13 | import matplotlib.pyplot as plt 14 | 15 | 16 | CKPT_PATH = 'vis/M5_epoch_94.pth' # Path to your checkpoint 17 | 18 | # You can generate your own comma2k19_demo.txt to make some fancy demos 19 | # val = Comma2k19SequenceDataset('data/comma2k19_demo.txt', 'data/comma2k19/','demo', use_memcache=False, return_origin=True) 20 | val = Comma2k19SequenceDataset('data/comma2k19_val_non_overlap.txt', 'data/comma2k19/','demo', use_memcache=False, return_origin=True) 21 | val_loader = DataLoader(val, 1, num_workers=0, shuffle=False) 22 | 23 | planning_v0 = SequenceBaselineV1(5, 33, 1.0, 0.0, 'adamw') 24 | planning_v0.load_state_dict(torch.load(CKPT_PATH)) 25 | planning_v0.eval().cuda() 26 | 27 | seq_idx = 0 28 | for b_idx, batch in enumerate(val_loader): 29 | os.mkdir('vis/M5_DEMO_%04d' % seq_idx) 30 | seq_inputs, seq_labels = batch['seq_input_img'].cuda(), batch['seq_future_poses'].cuda() 31 | origin_imgs = batch['origin_imgs'] 32 | # camera_rotation_matrix_inv=batch['camera_rotation_matrix_inv'].numpy()[0] 33 | # camera_translation_inv=batch['camera_translation_inv'].numpy()[0] 34 | # camera_intrinsic=batch['camera_intrinsic'].numpy()[0] 35 | bs = seq_labels.size(0) 36 | seq_length = seq_labels.size(1) 37 | 38 | hidden = torch.zeros((2, bs, 512), device=seq_inputs.device) 39 | 40 | img_idx = 0 41 | for t in 
tqdm(range(seq_length)): 42 | 43 | with torch.no_grad(): 44 | inputs, labels = seq_inputs[:, t, :, :, :], seq_labels[:, t, :, :] 45 | pred_cls, pred_trajectory, hidden = planning_v0(inputs, hidden) 46 | 47 | pred_conf = softmax(pred_cls, dim=-1).cpu().numpy()[0] 48 | pred_trajectory = pred_trajectory.reshape(planning_v0.M, planning_v0.num_pts, 3).cpu().numpy() 49 | 50 | inputs, labels = inputs.cpu(), labels.cpu() 51 | vis_img = (inputs.permute(0, 2, 3, 1)[0] * torch.tensor((0.2172, 0.2141, 0.2209, 0.2172, 0.2141, 0.2209)) + torch.tensor((0.3890, 0.3937, 0.3851, 0.3890, 0.3937, 0.3851)) ) * 255 52 | # print(vis_img.max(), vis_img.min(), vis_img.mean()) 53 | vis_img = vis_img.clamp(0, 255) 54 | img_0, img_1 = vis_img[..., :3].numpy().astype(np.uint8), vis_img[..., 3:].numpy().astype(np.uint8) 55 | 56 | # fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2, figsize=(16, 9)) 57 | # fig = plt.figure(figsize=(16, 9), constrained_layout=True) 58 | # fig = plt.figure(figsize=(12, 9.9)) # W, H 59 | fig = plt.figure(figsize=(12, 9)) # W, H 60 | spec = fig.add_gridspec(3, 3) # H, W 61 | ax1 = fig.add_subplot(spec[ 2, 0]) # H, W 62 | ax2 = fig.add_subplot(spec[ 2, 1]) 63 | ax3 = fig.add_subplot(spec[ :, 2]) 64 | ax4 = fig.add_subplot(spec[0:2, 0:2]) 65 | 66 | ax1.imshow(img_0) 67 | ax1.set_title('network input [previous]') 68 | ax1.axis('off') 69 | 70 | ax2.imshow(img_1) 71 | ax2.set_title('network input [current]') 72 | ax2.axis('off') 73 | 74 | current_metric = (((pred_trajectory[pred_conf.argmax()] - labels.numpy()) ** 2).sum(-1) ** 0.5).mean().item() 75 | 76 | trajectories = list(pred_trajectory) + list(labels) 77 | confs = list(pred_conf) + [1, ] 78 | ax3 = draw_trajectory_on_ax(ax3, trajectories, confs, ylim=(0, 200)) 79 | ax3.set_title('Mean L2: %.2f' % current_metric) 80 | ax3.grid() 81 | 82 | origin_img = origin_imgs[0, t, :, :, :].numpy() 83 | overlay = origin_img.copy() 84 | draw_path(pred_trajectory[pred_conf.argmax(), :], overlay, width=1, height=1.2, fill_color=(255,255,255), line_color=(0,255,0)) 85 | origin_img = 0.5 * origin_img + 0.5 * overlay 86 | draw_path(pred_trajectory[pred_conf.argmax(), :], origin_img, width=1, height=1.2, fill_color=None, line_color=(0,255,0)) 87 | 88 | ax4.imshow(origin_img.astype(np.uint8)) 89 | ax4.set_title('project on current frame') 90 | ax4.axis('off') 91 | 92 | # ax4.imshow(img_1) 93 | # pred_mask = np.argmax(pred_conf) 94 | # pred_trajectory = [pred_trajectory[pred_mask, ...], ] + [batch['future_poses'].numpy()[0], ] 95 | # pred_conf = [pred_conf[pred_mask], ] + [1, ] 96 | # for pred_trajectory_single, pred_conf_single in zip(pred_trajectory, pred_conf): 97 | # location = list((p + camera_translation_inv for p in pred_trajectory_single)) 98 | # proj_trajectory = np.array(list((camera_intrinsic @ (camera_rotation_matrix_inv @ l) for l in location))) 99 | # proj_trajectory /= proj_trajectory[..., 2:3].repeat(3, -1) 100 | # proj_trajectory /= 2 101 | # proj_trajectory = proj_trajectory[(proj_trajectory[..., 0] > 0) & (proj_trajectory[..., 0] < 800)] 102 | # proj_trajectory = proj_trajectory[(proj_trajectory[..., 1] > 0) & (proj_trajectory[..., 1] < 450)] 103 | # ax4.plot(proj_trajectory[:, 0], proj_trajectory[:, 1], 'o-', label='gt' if pred_conf_single == 1.0 else 'pred - conf %.3f' % pred_conf_single, alpha=np.clip(pred_conf_single, 0.1, np.Inf)) 104 | 105 | # ax4.legend() 106 | plt.tight_layout() 107 | plt.savefig('vis/M5_DEMO_%04d/%08d.jpg' % (seq_idx, img_idx), pad_inches=0.2, bbox_inches='tight') 108 | img_idx += 1 109 | # plt.show() 110 
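# Close the figure after saving each frame so matplotlib does not accumulate open figures over the long sequence.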
| plt.close(fig) 111 | 112 | seq_idx += 1 113 | -------------------------------------------------------------------------------- /imgs/arXiv_fig1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OpenDriveLab/Openpilot-Deepdive/fae05055c071fe8b6ed0dd578bb047f29f2b4dd4/imgs/arXiv_fig1.png -------------------------------------------------------------------------------- /imgs/dataset_cmp.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OpenDriveLab/Openpilot-Deepdive/fae05055c071fe8b6ed0dd578bb047f29f2b4dd4/imgs/dataset_cmp.png -------------------------------------------------------------------------------- /imgs/demo01.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OpenDriveLab/Openpilot-Deepdive/fae05055c071fe8b6ed0dd578bb047f29f2b4dd4/imgs/demo01.png -------------------------------------------------------------------------------- /imgs/demo_exp.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OpenDriveLab/Openpilot-Deepdive/fae05055c071fe8b6ed0dd578bb047f29f2b4dd4/imgs/demo_exp.png -------------------------------------------------------------------------------- /main.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import time 4 | import random 5 | from tqdm import tqdm 6 | from argparse import ArgumentParser 7 | 8 | import numpy as np 9 | import torch 10 | from torch import nn 11 | import torch.nn.functional as F 12 | from torch import optim 13 | from torch.utils.data import DataLoader 14 | 15 | import torch.distributed as dist 16 | from torch.utils.data.distributed import DistributedSampler 17 | 18 | if torch.__version__ == 'parrots': 19 | from pavi import SummaryWriter 20 | else: 21 | from torch.utils.tensorboard import SummaryWriter 22 | 23 | from data import PlanningDataset, SequencePlanningDataset, Comma2k19SequenceDataset 24 | from model import PlaningNetwork, MultipleTrajectoryPredictionLoss, SequencePlanningNetwork 25 | from utils import draw_trajectory_on_ax, get_val_metric, get_val_metric_keys 26 | 27 | 28 | def get_hyperparameters(parser: ArgumentParser): 29 | parser.add_argument('--batch_size', type=int, default=6) 30 | parser.add_argument('--lr', type=float, default=1e-4) 31 | parser.add_argument('--n_workers', type=int, default=4) 32 | parser.add_argument('--epochs', type=int, default=100) 33 | parser.add_argument('--log_per_n_step', type=int, default=20) 34 | parser.add_argument('--val_per_n_epoch', type=int, default=1) 35 | 36 | parser.add_argument('--resume', type=str, default='') 37 | 38 | parser.add_argument('--M', type=int, default=5) 39 | parser.add_argument('--num_pts', type=int, default=33) 40 | parser.add_argument('--mtp_alpha', type=float, default=1.0) 41 | parser.add_argument('--optimizer', type=str, default='sgd') 42 | parser.add_argument('--sync_bn', type=bool, default=True) 43 | parser.add_argument('--tqdm', type=bool, default=False) 44 | parser.add_argument('--optimize_per_n_step', type=int, default=40) 45 | 46 | try: 47 | exp_name = os.environ["SLURM_JOB_ID"] 48 | except KeyError: 49 | exp_name = str(time.time()) 50 | parser.add_argument('--exp_name', type=str, default=exp_name) 51 | 52 | return parser 53 | 54 | 55 | def setup(rank, world_size): 56 | torch.cuda.set_device(rank) 57 | 
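# One process per GPU: each rank pins its CUDA device and then joins the NCCL process group
# through a TCP endpoint on localhost, using the PORT environment variable supplied by the
# launch commands shown in the README (PORT=... SLURM_PROCID=... SLURM_NTASKS=... python main.py).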
dist.init_process_group('nccl', init_method='tcp://localhost:%s' % os.environ['PORT'], rank=rank, world_size=world_size) 58 | print('[%.2f]' % time.time(), 'DDP Initialized at %s:%s' % ('localhost', os.environ['PORT']), rank, 'of', world_size, flush=True) 59 | 60 | 61 | def get_dataloader(rank, world_size, batch_size, pin_memory=False, num_workers=0): 62 | train = Comma2k19SequenceDataset('data/comma2k19_train_non_overlap.txt', 'data/comma2k19/','train', use_memcache=False) 63 | val = Comma2k19SequenceDataset('data/comma2k19_val_non_overlap.txt', 'data/comma2k19/','demo', use_memcache=False) 64 | 65 | if torch.__version__ == 'parrots': 66 | dist_sampler_params = dict(num_replicas=world_size, rank=rank, shuffle=True) 67 | else: 68 | dist_sampler_params = dict(num_replicas=world_size, rank=rank, shuffle=True, drop_last=True) 69 | train_sampler = DistributedSampler(train, **dist_sampler_params) 70 | val_sampler = DistributedSampler(val, **dist_sampler_params) 71 | 72 | loader_args = dict(num_workers=num_workers, persistent_workers=True if num_workers > 0 else False, prefetch_factor=2, pin_memory=pin_memory) 73 | train_loader = DataLoader(train, batch_size, sampler=train_sampler, **loader_args) 74 | val_loader = DataLoader(val, batch_size=1, sampler=val_sampler, **loader_args) 75 | 76 | return train_loader, val_loader 77 | 78 | 79 | def cleanup(): 80 | dist.destroy_process_group() 81 | 82 | class SequenceBaselineV1(nn.Module): 83 | def __init__(self, M, num_pts, mtp_alpha, lr, optimizer, optimize_per_n_step=40) -> None: 84 | super().__init__() 85 | self.M = M 86 | self.num_pts = num_pts 87 | self.mtp_alpha = mtp_alpha 88 | self.lr = lr 89 | self.optimizer = optimizer 90 | 91 | self.net = SequencePlanningNetwork(M, num_pts) 92 | 93 | self.optimize_per_n_step = optimize_per_n_step # for the gru module 94 | 95 | @staticmethod 96 | def configure_optimizers(args, model): 97 | if args.optimizer == 'sgd': 98 | optimizer = optim.SGD(model.parameters(), lr=args.lr, momentum=0.9, weight_decay=0.01) 99 | elif args.optimizer == 'adam': 100 | optimizer = optim.Adam(model.parameters(), lr=args.lr, weight_decay=0.01) 101 | elif args.optimizer == 'adamw': 102 | optimizer = optim.AdamW(model.parameters(), lr=args.lr, ) 103 | else: 104 | raise NotImplementedError 105 | lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 20, 0.9) 106 | 107 | return optimizer, lr_scheduler 108 | 109 | def forward(self, x, hidden=None): 110 | if hidden is None: 111 | hidden = torch.zeros((2, x.size(0), 512)).to(self.device) 112 | return self.net(x, hidden) 113 | 114 | 115 | def main(rank, world_size, args): 116 | if rank == 0: 117 | writer = SummaryWriter() 118 | 119 | train_dataloader, val_dataloader = get_dataloader(rank, world_size, args.batch_size, False, args.n_workers) 120 | model = SequenceBaselineV1(args.M, args.num_pts, args.mtp_alpha, args.lr, args.optimizer, args.optimize_per_n_step) 121 | use_sync_bn = args.sync_bn 122 | if use_sync_bn: 123 | model = nn.SyncBatchNorm.convert_sync_batchnorm(model) 124 | model = model.cuda() 125 | optimizer, lr_scheduler = model.configure_optimizers(args, model) 126 | model: SequenceBaselineV1 127 | if args.resume and rank == 0: 128 | print('Loading weights from', args.resume) 129 | model.load_state_dict(torch.load(args.resume), strict=True) 130 | dist.barrier() 131 | model = nn.parallel.DistributedDataParallel(model, device_ids=[rank], find_unused_parameters=True, broadcast_buffers=False) 132 | loss = MultipleTrajectoryPredictionLoss(args.mtp_alpha, args.M, args.num_pts, 
distance_type='angle') 133 | 134 | num_steps = 0 135 | disable_tqdm = (not args.tqdm) or (rank != 0) 136 | 137 | for epoch in tqdm(range(args.epochs), disable=disable_tqdm, position=0): 138 | train_dataloader.sampler.set_epoch(epoch) 139 | 140 | for batch_idx, data in enumerate(tqdm(train_dataloader, leave=False, disable=disable_tqdm, position=1)): 141 | seq_inputs, seq_labels = data['seq_input_img'].cuda(), data['seq_future_poses'].cuda() 142 | bs = seq_labels.size(0) 143 | seq_length = seq_labels.size(1) 144 | 145 | hidden = torch.zeros((2, bs, 512)).cuda() 146 | total_loss = 0 147 | for t in tqdm(range(seq_length), leave=False, disable=disable_tqdm, position=2): 148 | num_steps += 1 149 | inputs, labels = seq_inputs[:, t, :, :, :], seq_labels[:, t, :, :] 150 | pred_cls, pred_trajectory, hidden = model(inputs, hidden) 151 | 152 | cls_loss, reg_loss = loss(pred_cls, pred_trajectory, labels) 153 | total_loss += (cls_loss + args.mtp_alpha * reg_loss.mean()) / model.module.optimize_per_n_step 154 | 155 | if rank == 0 and (num_steps + 1) % args.log_per_n_step == 0: 156 | # TODO: add a customized log function 157 | writer.add_scalar('train/epoch', epoch, num_steps) 158 | writer.add_scalar('loss/cls', cls_loss, num_steps) 159 | writer.add_scalar('loss/reg', reg_loss.mean(), num_steps) 160 | writer.add_scalar('loss/reg_x', reg_loss[0], num_steps) 161 | writer.add_scalar('loss/reg_y', reg_loss[1], num_steps) 162 | writer.add_scalar('loss/reg_z', reg_loss[2], num_steps) 163 | writer.add_scalar('param/lr', optimizer.param_groups[0]['lr'], num_steps) 164 | 165 | if (t + 1) % model.module.optimize_per_n_step == 0: 166 | hidden = hidden.clone().detach() 167 | optimizer.zero_grad() 168 | total_loss.backward() 169 | torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0) # TODO: move to args 170 | optimizer.step() 171 | if rank == 0: 172 | writer.add_scalar('loss/total', total_loss, num_steps) 173 | total_loss = 0 174 | 175 | if not isinstance(total_loss, int): 176 | optimizer.zero_grad() 177 | total_loss.backward() 178 | torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0) # TODO: move to args 179 | optimizer.step() 180 | if rank == 0: 181 | writer.add_scalar('loss/total', total_loss, num_steps) 182 | 183 | lr_scheduler.step() 184 | if (epoch + 1) % args.val_per_n_epoch == 0: 185 | if rank == 0: 186 | # save model 187 | ckpt_path = os.path.join(writer.log_dir, 'epoch_%d.pth' % epoch) 188 | torch.save(model.module.state_dict(), ckpt_path) 189 | print('[Epoch %d] checkpoint saved at %s' % (epoch, ckpt_path)) 190 | 191 | model.eval() 192 | with torch.no_grad(): 193 | saved_metric_epoch = get_val_metric_keys() 194 | for batch_idx, data in enumerate(tqdm(val_dataloader, leave=False, disable=disable_tqdm, position=1)): 195 | seq_inputs, seq_labels = data['seq_input_img'].cuda(), data['seq_future_poses'].cuda() 196 | 197 | bs = seq_labels.size(0) 198 | seq_length = seq_labels.size(1) 199 | 200 | hidden = torch.zeros((2, bs, 512), device=seq_inputs.device) 201 | for t in tqdm(range(seq_length), leave=False, disable=True, position=2): 202 | inputs, labels = seq_inputs[:, t, :, :, :], seq_labels[:, t, :, :] 203 | pred_cls, pred_trajectory, hidden = model(inputs, hidden) 204 | 205 | metrics = get_val_metric(pred_cls, pred_trajectory.view(-1, args.M, args.num_pts, 3), labels) 206 | 207 | for k, v in metrics.items(): 208 | saved_metric_epoch[k].append(v.float().mean().item()) 209 | 210 | dist.barrier() # Wait for all processes 211 | # sync 212 | metric_single = torch.zeros((len(saved_metric_epoch), ), 
dtype=torch.float32, device='cuda') 213 | counter_single = torch.zeros((len(saved_metric_epoch), ), dtype=torch.int32, device='cuda') 214 | # From Python 3.6 onwards, the standard dict type maintains insertion order by default. 215 | # But, programmers should not rely on it. 216 | for i, k in enumerate(sorted(saved_metric_epoch.keys())): 217 | metric_single[i] = np.mean(saved_metric_epoch[k]) 218 | counter_single[i] = len(saved_metric_epoch[k]) 219 | 220 | metric_gather = [torch.zeros((len(saved_metric_epoch), ), dtype=torch.float32, device='cuda')[None] for _ in range(world_size)] 221 | counter_gather = [torch.zeros((len(saved_metric_epoch), ), dtype=torch.int32, device='cuda')[None] for _ in range(world_size)] 222 | dist.all_gather(metric_gather, metric_single[None]) 223 | dist.all_gather(counter_gather, counter_single[None]) 224 | 225 | if rank == 0: 226 | metric_gather = torch.cat(metric_gather, dim=0) # [world_size, num_metric_keys] 227 | counter_gather = torch.cat(counter_gather, dim=0) # [world_size, num_metric_keys] 228 | metric_gather_weighted_mean = (metric_gather * counter_gather).sum(0) / counter_gather.sum(0) 229 | for i, k in enumerate(sorted(saved_metric_epoch.keys())): 230 | writer.add_scalar(k, metric_gather_weighted_mean[i], num_steps) 231 | dist.barrier() 232 | 233 | model.train() 234 | 235 | cleanup() 236 | 237 | 238 | if __name__ == "__main__": 239 | print('[%.2f]' % time.time(), 'starting job...', os.environ['SLURM_PROCID'], 'of', os.environ['SLURM_NTASKS'], flush=True) 240 | 241 | parser = ArgumentParser() 242 | parser = get_hyperparameters(parser) 243 | args = parser.parse_args() 244 | 245 | setup(rank=int(os.environ['SLURM_PROCID']), world_size=int(os.environ['SLURM_NTASKS'])) 246 | main(rank=int(os.environ['SLURM_PROCID']), world_size=int(os.environ['SLURM_NTASKS']), args=args) 247 | -------------------------------------------------------------------------------- /model.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | import torch.nn.functional as F 4 | 5 | from efficientnet_pytorch import EfficientNet 6 | 7 | 8 | class PlaningNetwork(nn.Module): 9 | def __init__(self, M, num_pts): 10 | super().__init__() 11 | self.M = M 12 | self.num_pts = num_pts 13 | self.backbone = EfficientNet.from_pretrained('efficientnet-b2', in_channels=6) 14 | 15 | use_avg_pooling = False # TODO 16 | if use_avg_pooling: 17 | self.plan_head = nn.Sequential( 18 | nn.AdaptiveAvgPool2d(1), 19 | nn.Flatten(), 20 | nn.BatchNorm1d(1408), 21 | nn.ReLU(), 22 | nn.Linear(1408, 4096), 23 | nn.BatchNorm1d(4096), 24 | nn.ReLU(), 25 | # nn.Dropout(0.3), 26 | nn.Linear(4096, M * (num_pts * 3 + 1)) # +1 for cls 27 | ) 28 | else: # more like the structure of OpenPilot 29 | self.plan_head = nn.Sequential( 30 | # 6, 450, 800 -> 1408, 14, 25 31 | # nn.AdaptiveMaxPool2d((4, 8)), # 1408, 4, 8 32 | nn.BatchNorm2d(1408), 33 | nn.Conv2d(1408, 32, 1), # 32, 4, 8 34 | nn.BatchNorm2d(32), 35 | nn.Flatten(), 36 | nn.ELU(), 37 | nn.Linear(1024, 4096), 38 | nn.BatchNorm1d(4096), 39 | nn.ReLU(), 40 | # nn.Dropout(0.3), 41 | nn.Linear(4096, M * (num_pts * 3 + 1)) # +1 for cls 42 | ) 43 | 44 | 45 | def forward(self, x): 46 | features = self.backbone.extract_features(x) 47 | raw_preds = self.plan_head(features) 48 | pred_cls = raw_preds[:, :self.M] 49 | pred_trajectory = raw_preds[:, self.M:].reshape(-1, self.M, self.num_pts, 3) 50 | 51 | pred_xs = pred_trajectory[:, :, :, 0:1].exp() 52 | pred_ys = pred_trajectory[:, :, :, 1:2].sinh() 53 | 
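# The raw head outputs are squashed per axis: x goes through exp() above so the predicted
# longitudinal distance is always positive, y through sinh() for signed lateral offsets,
# while z below is kept as the raw output.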
pred_zs = pred_trajectory[:, :, :, 2:3] 54 | return pred_cls, torch.cat((pred_xs, pred_ys, pred_zs), dim=3) 55 | 56 | 57 | class SequencePlanningNetwork(nn.Module): 58 | def __init__(self, M, num_pts): 59 | super().__init__() 60 | self.M = M 61 | self.num_pts = num_pts 62 | self.backbone = EfficientNet.from_pretrained('efficientnet-b2', in_channels=6) 63 | 64 | self.plan_head = nn.Sequential( 65 | # 6, 450, 800 -> 1408, 14, 25 66 | # nn.AdaptiveMaxPool2d((4, 8)), # 1408, 4, 8 67 | nn.BatchNorm2d(1408), 68 | nn.Conv2d(1408, 32, 1), # 32, 4, 8 69 | nn.BatchNorm2d(32), 70 | nn.Flatten(), 71 | nn.ELU(), 72 | ) 73 | self.gru = nn.GRU(input_size=1024, hidden_size=512, bidirectional=True, batch_first=True) # 1024 out 74 | self.plan_head_tip = nn.Sequential( 75 | nn.Flatten(), 76 | # nn.BatchNorm1d(1024), 77 | nn.ELU(), 78 | nn.Linear(1024, 4096), 79 | # nn.BatchNorm1d(4096), 80 | nn.ReLU(), 81 | # nn.Dropout(0.3), 82 | nn.Linear(4096, M * (num_pts * 3 + 1)) # +1 for cls 83 | ) 84 | 85 | def forward(self, x, hidden): 86 | features = self.backbone.extract_features(x) 87 | 88 | raw_preds = self.plan_head(features) 89 | raw_preds, hidden = self.gru(raw_preds[:, None, :], hidden) # N, L, H_in for batch_first=True 90 | raw_preds = self.plan_head_tip(raw_preds) 91 | 92 | pred_cls = raw_preds[:, :self.M] 93 | pred_trajectory = raw_preds[:, self.M:].reshape(-1, self.M, self.num_pts, 3) 94 | 95 | pred_xs = pred_trajectory[:, :, :, 0:1].exp() 96 | pred_ys = pred_trajectory[:, :, :, 1:2].sinh() 97 | pred_zs = pred_trajectory[:, :, :, 2:3] 98 | return pred_cls, torch.cat((pred_xs, pred_ys, pred_zs), dim=3), hidden 99 | 100 | 101 | class AbsoluteRelativeErrorLoss(nn.Module): 102 | def __init__(self, epsilon=1e-4): 103 | super().__init__() 104 | self.epsilon = epsilon 105 | 106 | def forward(self, pred, target): 107 | error = (pred - target) / (target + self.epsilon) 108 | return torch.abs(error) 109 | 110 | 111 | class SigmoidAbsoluteRelativeErrorLoss(nn.Module): 112 | def __init__(self, epsilon=1e-4): 113 | super().__init__() 114 | self.epsilon = epsilon 115 | 116 | def forward(self, pred, target): 117 | error = (pred - target) / (target + self.epsilon) 118 | return torch.sigmoid(torch.abs(error)) 119 | 120 | 121 | class MultipleTrajectoryPredictionLoss(nn.Module): 122 | def __init__(self, alpha, M, num_pts, distance_type='angle'): 123 | super().__init__() 124 | self.alpha = alpha # TODO: currently no use 125 | self.M = M 126 | self.num_pts = num_pts 127 | 128 | self.distance_type = distance_type 129 | if self.distance_type == 'angle': 130 | self.distance_func = nn.CosineSimilarity(dim=2) 131 | else: 132 | raise NotImplementedError 133 | self.cls_loss = nn.CrossEntropyLoss() 134 | self.reg_loss = nn.SmoothL1Loss(reduction='none') 135 | # self.reg_loss = SigmoidAbsoluteRelativeErrorLoss() 136 | # self.reg_loss = AbsoluteRelativeErrorLoss() 137 | 138 | def forward(self, pred_cls, pred_trajectory, gt): 139 | """ 140 | pred_cls: [B, M] 141 | pred_trajectory: [B, M * num_pts * 3] 142 | gt: [B, num_pts, 3] 143 | """ 144 | assert len(pred_cls) == len(pred_trajectory) == len(gt) 145 | pred_trajectory = pred_trajectory.reshape(-1, self.M, self.num_pts, 3) 146 | with torch.no_grad(): 147 | # step 1: calculate distance between gt and each prediction 148 | pred_end_positions = pred_trajectory[:, :, self.num_pts-1, :] # B, M, 3 149 | gt_end_positions = gt[:, self.num_pts-1:, :].expand(-1, self.M, -1) # B, 1, 3 -> B, M, 3 150 | 151 | distances = 1 - self.distance_func(pred_end_positions, gt_end_positions) # B, M 152 | 
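# Winner-takes-all assignment: the candidate whose end point is closest in direction to the
# ground-truth end point (smallest 1 - cosine similarity) is treated as the positive mode;
# it supplies the classification target and is the only mode that receives the regression loss.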
index = distances.argmin(dim=1) # B 153 | 154 | gt_cls = index 155 | pred_trajectory = pred_trajectory[torch.tensor(range(len(gt_cls)), device=gt_cls.device), index, ...] # B, num_pts, 3 156 | 157 | cls_loss = self.cls_loss(pred_cls, gt_cls) 158 | 159 | reg_loss = self.reg_loss(pred_trajectory, gt).mean(dim=(0, 1)) 160 | 161 | return cls_loss, reg_loss 162 | 163 | 164 | if __name__ == '__main__': 165 | # model = EfficientNet.from_pretrained('efficientnet-b2', in_channels=6) 166 | model = PlaningNetwork(M=3, num_pts=20) 167 | 168 | dummy_input = torch.zeros((1, 6, 256, 512)) 169 | 170 | # features = model.extract_features(dummy_input) 171 | features = model(dummy_input) 172 | 173 | pred_cls = torch.rand(16, 5) 174 | pred_trajectory = torch.rand(16, 5*20*3) 175 | gt = torch.rand(16, 20, 3) 176 | 177 | loss = MultipleTrajectoryPredictionLoss(1.0, 5, 20) 178 | 179 | loss(pred_cls, pred_trajectory, gt) 180 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | nuscenes-devkit 2 | efficientnet_pytorch 3 | numpyencoder 4 | tensorboard 5 | # pytorch-lightning -------------------------------------------------------------------------------- /tools/extract_comma2k19.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | 3 | from tqdm import tqdm 4 | import os 5 | import cv2 6 | import glob 7 | cv2.setNumThreads(0) 8 | cv2.ocl.setUseOpenCL(False) 9 | import numpy as np 10 | import random 11 | random.seed(0) 12 | 13 | 14 | def main(): 15 | sequences = glob.glob('data/comma2k19/*/*/*/video.hevc') 16 | random.shuffle(sequences) 17 | 18 | num_seqs = len(sequences) 19 | print(num_seqs, 'sequences') 20 | 21 | num_train = int(0.8 * num_seqs) 22 | 23 | with open('data/comma2k19_train.txt', 'w') as f: 24 | f.writelines(seq.replace('data/comma2k19/', '').replace('/video.hevc', '\n') for seq in sequences[:num_train]) 25 | with open('data/comma2k19_val.txt', 'w') as f: 26 | f.writelines(seq.replace('data/comma2k19/', '').replace('/video.hevc', '\n') for seq in sequences[num_train:]) 27 | example_segment = 'data/comma2k19/Chunk_1/b0c9d2329ad1606b|2018-07-27--06-03-57/3/' 28 | frame_times = np.load(example_segment + 'global_pose/frame_times') 29 | print(frame_times.shape) 30 | 31 | # === Generating non-overlaping seqs === 32 | sequences = glob.glob('data/comma2k19/*/*/*/video.hevc') 33 | sequences = [seq.replace('data/comma2k19/', '').replace('/video.hevc', '') for seq in sequences] 34 | seq_names = list(set([seq.split('/')[1] for seq in sequences])) 35 | num_seqs = len(seq_names) 36 | num_train = int(0.8 * num_seqs) 37 | train_seq_names = seq_names[:num_train] 38 | with open('data/comma2k19_train_non_overlap.txt', 'w') as f: 39 | f.writelines(seq + '\n' for seq in sequences if seq.split('/')[1] in train_seq_names) 40 | with open('data/comma2k19_val_non_overlap.txt', 'w') as f: 41 | f.writelines(seq + '\n' for seq in sequences if seq.split('/')[1] not in train_seq_names) 42 | 43 | 44 | if __name__ == '__main__': 45 | main() 46 | -------------------------------------------------------------------------------- /tools/extract_nuscenes.py: -------------------------------------------------------------------------------- 1 | import math 2 | import json 3 | import random 4 | random.seed(0) 5 | import numpy as np 6 | from tqdm import tqdm 7 | from numpyencoder import NumpyEncoder 8 | 9 | from nuscenes.nuscenes import 
NuScenes 10 | from nuscenes.can_bus.can_bus_api import NuScenesCanBus 11 | from scipy.spatial.transform import Rotation 12 | 13 | 14 | # Hyper-Params 15 | DATA_ROOT = 'data/nuscenes' 16 | SPLIT = 'v1.0-trainval' 17 | NUM_RGB_IMGS = 2 18 | NUM_FUTURE_TRAJECTORY_PTS = 10 19 | OUTPUT_JSON_NAME = 'data/p3_10pts_can_bus_%s_temporal.json' 20 | GET_CAN_BUS = True 21 | 22 | TEMPORAL = True 23 | 24 | 25 | sensors_tree = { 26 | 'ms_imu': 27 | [ 28 | 'linear_accel', 29 | 'q', 30 | 'rotation_rate', 31 | ], 32 | 33 | 'pose': 34 | [ 35 | 'accel', 36 | 'orientation', 37 | 'pos', 38 | 'rotation_rate', 39 | 'vel', 40 | ], 41 | 42 | 'steeranglefeedback': 43 | [ 44 | 'value', 45 | ], 46 | 47 | 'vehicle_monitor': 48 | [ 49 | 'available_distance', 50 | 'battery_level', 51 | 'brake', 52 | 'brake_switch', 53 | 'gear_position', 54 | 'left_signal', 55 | 'rear_left_rpm', 56 | 'rear_right_rpm', 57 | 'right_signal', 58 | 'steering', 59 | 'steering_speed', 60 | 'throttle', 61 | 'vehicle_speed', 62 | 'yaw_rate', 63 | ], 64 | 65 | 'zoe_veh_info': 66 | [ 67 | 'FL_wheel_speed', 68 | 'FR_wheel_speed', 69 | 'RL_wheel_speed', 70 | 'RR_wheel_speed', 71 | 'left_solar', 72 | 'longitudinal_accel', 73 | 'meanEffTorque', 74 | 'odom', 75 | 'odom_speed', 76 | 'pedal_cc', 77 | 'regen', 78 | 'requestedTorqueAfterProc', 79 | 'right_solar', 80 | 'steer_corrected', 81 | 'steer_offset_can', 82 | 'steer_raw', 83 | 'transversal_accel', 84 | ], 85 | 86 | 'zoesensors': 87 | [ 88 | 'brake_sensor', 89 | 'steering_sensor', 90 | 'throttle_sensor', 91 | ], 92 | } 93 | 94 | 95 | def find_nearest_index(array, value): 96 | idx = np.searchsorted(array, value, side="left") 97 | if idx > 0 and (idx == len(array) or math.fabs(value - array[idx-1]) < math.fabs(value - array[idx])): 98 | return idx-1 99 | else: 100 | return idx 101 | 102 | 103 | def get_samples(nusc, scenes, nusc_can=None): 104 | samples = [] 105 | # list of dicts, where 106 | # 'img': LIST of filenames 0, 1, ..., NUM_RGB_IMGS - 1. 107 | # NUM_RGB_IMGS - 1 is the frame of 'current' timestamp 108 | # 'pt_%d': LIST of future points offset by current img 0, 1, ..., NUM_FUTURE_TRAJECTORY_PTS - 1 109 | # 0 is the point of the 'very next' timestamp 110 | for scene in tqdm(scenes, ncols=0): 111 | assert len(scene) >= NUM_RGB_IMGS + NUM_FUTURE_TRAJECTORY_PTS 112 | valid_start_tokens = scene[NUM_RGB_IMGS-1 : -NUM_FUTURE_TRAJECTORY_PTS] 113 | if TEMPORAL: 114 | cur_scene_samples = [] 115 | # CAN BUS 116 | if nusc_can is not None: 117 | can_bus_cache = dict() 118 | scene_token = nusc.get('sample', valid_start_tokens[0])['scene_token'] 119 | scene_name = nusc.get('scene', scene_token)['name'] 120 | 121 | has_can_bus_data = True 122 | for message_name, keys in sensors_tree.items(): 123 | try: 124 | can_data = nusc_can.get_messages(scene_name, message_name) 125 | except Exception: 126 | has_can_bus_data = False 127 | continue 128 | can_bus_cache['%s.utime' % (message_name)] = np.array([m['utime'] for m in can_data]) 129 | if len(can_bus_cache['%s.utime' % message_name]) == 0: 130 | has_can_bus_data = False 131 | continue 132 | for key_name in keys: 133 | can_bus_cache['%s.%s' % (message_name, key_name)] = np.array([m[key_name] for m in can_data]) 134 | 135 | if not has_can_bus_data: 136 | print('Error: %s does not have any CAN bus data!' 
% scene_name) 137 | continue 138 | 139 | for idx, cur_token in enumerate(valid_start_tokens): 140 | img_tokens = scene[idx:idx+NUM_RGB_IMGS] 141 | point_tokens = scene[idx+NUM_RGB_IMGS:idx+NUM_RGB_IMGS+NUM_FUTURE_TRAJECTORY_PTS] 142 | 143 | cam_front_data = nusc.get('sample_data', nusc.get('sample', cur_token)['data']['CAM_FRONT']) 144 | # Images 145 | imgs = list(nusc.get('sample_data', nusc.get('sample', token)['data']['CAM_FRONT'])['filename'] for token in img_tokens) 146 | 147 | # Ego poses 148 | cur_ego_pose = nusc.get('ego_pose', cam_front_data['ego_pose_token']) 149 | ego_rotation_matrix = Rotation.from_quat(np.array(cur_ego_pose['rotation'])[[1,2,3,0]]).as_matrix() 150 | ego_tranlation = np.array(cur_ego_pose['translation']) 151 | ego_rotation_matrix_inv = np.linalg.inv(ego_rotation_matrix) 152 | ego_tranlation_inv = -ego_tranlation 153 | 154 | future_poses = list(nusc.get('ego_pose', nusc.get('sample_data', nusc.get('sample', token)['data']['CAM_FRONT'])['ego_pose_token'])['translation'] for token in point_tokens) 155 | future_poses = list(ego_rotation_matrix_inv @ (np.array(future_pose)+ego_tranlation_inv) for future_pose in future_poses) 156 | future_poses = list(list(p) for p in future_poses) # for json 157 | 158 | # Camera Matrices 159 | calibration_para = nusc.get('calibrated_sensor', cam_front_data['calibrated_sensor_token']) 160 | camera_intrinsic = np.array(calibration_para['camera_intrinsic']) 161 | camera_rotation_matrix = Rotation.from_quat(np.array(calibration_para['rotation'])[[1,2,3,0]]).as_matrix() 162 | camera_translation = np.array(calibration_para['translation']) 163 | camera_rotation_matrix_inv = np.linalg.inv(camera_rotation_matrix) 164 | camera_translation_inv = -camera_translation 165 | camera_extrinsic = np.vstack((np.hstack((camera_rotation_matrix_inv, camera_translation_inv.reshape((3, 1)))), np.array([0, 0, 0, 1]))) 166 | 167 | cur_sample_to_append = dict( 168 | imgs=imgs, 169 | future_poses=future_poses, 170 | camera_intrinsic=camera_intrinsic.tolist(), 171 | camera_extrinsic=camera_extrinsic.tolist(), 172 | camera_translation_inv=camera_translation_inv.tolist(), 173 | camera_rotation_matrix_inv=camera_rotation_matrix_inv.tolist(), 174 | ) 175 | 176 | # CAN BUS 177 | if nusc_can is not None: 178 | img_timestamp = nusc.get('sample_data', nusc.get('sample', img_tokens[-1])['data']['CAM_FRONT'])['timestamp'] 179 | cur_sample_to_append['img_utime'] = img_timestamp 180 | for message_name, keys in sensors_tree.items(): 181 | message_utimes = can_bus_cache['%s.utime' % message_name] 182 | nearest_index = find_nearest_index(message_utimes, img_timestamp) 183 | can_bus_time_delta = abs(message_utimes[nearest_index] - img_timestamp) # ideally should be less than half the sample rate (2Hz * 2 = 4Hz) 184 | if can_bus_time_delta >= 0.25 * 1e6: 185 | print('Warning', scene_name, message_utimes[nearest_index], img_timestamp, can_bus_time_delta) 186 | cur_sample_to_append['can_bus.%s.utime' % message_name] = message_utimes[nearest_index] 187 | cur_sample_to_append['can_bus.%s.can_bus_delta' % message_name] = can_bus_time_delta 188 | for key_name in keys: 189 | can_bus_value = can_bus_cache['%s.%s' % (message_name, key_name)][nearest_index] 190 | if isinstance(can_bus_value, np.ndarray): 191 | can_bus_value = can_bus_value.tolist() 192 | cur_sample_to_append['can_bus.%s.%s' % (message_name, key_name)] = can_bus_value 193 | if TEMPORAL: 194 | cur_scene_samples.append(cur_sample_to_append) 195 | else: 196 | samples.append(cur_sample_to_append) 197 | 198 | if TEMPORAL: 199 
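# In TEMPORAL mode, each scene's samples are appended as one list, so the output JSON is a
# list of scenes from which SequencePlanningDataset can later cut fixed-length clips.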
| samples.append(cur_scene_samples) 200 | 201 | return samples 202 | 203 | 204 | # Load NuScenes dataset 205 | nusc = NuScenes(version=SPLIT, dataroot=DATA_ROOT, verbose=True) 206 | nusc_can = NuScenesCanBus(dataroot=DATA_ROOT) if GET_CAN_BUS else None 207 | 208 | # get all scenes into time structure 209 | all_scenes = [] 210 | for scene in nusc.scene: 211 | cur_token = scene['first_sample_token'] 212 | cur_scene_tokens = [] # saves tokens of samples in this scene 213 | while cur_token != '': 214 | cur_scene_tokens.append(cur_token) 215 | cur_sample = nusc.get('sample', cur_token) 216 | cur_token = cur_sample['next'] 217 | 218 | all_scenes.append(cur_scene_tokens) 219 | 220 | random.shuffle(all_scenes) 221 | 222 | length_all_scenes = len(all_scenes) 223 | print('Altogether', length_all_scenes, 'scenes') 224 | 225 | train_samples = get_samples(nusc, all_scenes[:int(length_all_scenes * 0.8)], nusc_can) 226 | val_samples = get_samples(nusc, all_scenes[int(length_all_scenes * 0.8):], nusc_can) 227 | 228 | json.dump(train_samples, open(OUTPUT_JSON_NAME % 'train', 'w'), indent='\t', cls=NumpyEncoder) 229 | json.dump(val_samples, open(OUTPUT_JSON_NAME % 'val', 'w'), indent='\t', cls=NumpyEncoder) 230 | -------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import numpy as np 3 | import matplotlib 4 | import matplotlib.pyplot as plt 5 | from matplotlib.axes import Axes 6 | import torch 7 | from torch import nn 8 | import torch.nn.functional as F 9 | 10 | from utils_comma2k19.camera import img_from_device, denormalize, view_frame_from_device_frame 11 | from cycler import cycler 12 | matplotlib.rcParams['axes.prop_cycle'] = cycler('color', 13 | ['#1f77b4', '#ff7f0e', '#2ca02c', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']) 14 | 15 | 16 | def draw_trajectory_on_ax(ax: Axes, trajectories, confs, line_type='o-', transparent=True, xlim=(-30, 30), ylim=(0, 100)): 17 | ''' 18 | ax: matplotlib.axes.Axes, the axis to draw trajectories on 19 | trajectories: List of numpy arrays of shape (num_points, 2 or 3) 20 | confs: List of numbers, 1 means gt 21 | ''' 22 | 23 | # get the max conf 24 | max_conf = max([conf for conf in confs if conf != 1]) 25 | 26 | for idx, (trajectory, conf) in enumerate(zip(trajectories, confs)): 27 | label = 'gt' if conf == 1 else 'pred%d (%.3f)' % (idx, conf) 28 | alpha = 1.0 29 | if transparent: 30 | alpha = 1.0 if conf == max_conf else np.clip(conf, 0.1, None) 31 | plot_args = dict(label=label, alpha=alpha, linewidth=2 if alpha == 1.0 else 1) 32 | if label == 'gt': 33 | plot_args['color'] = '#d62728' 34 | ax.plot(trajectory[:, 1], # - for nuscenes and + for comma 2k19 35 | trajectory[:, 0], 36 | line_type, **plot_args) 37 | if xlim is not None: 38 | ax.set_xlim(*xlim) 39 | if ylim is not None: 40 | ax.set_ylim(*ylim) 41 | ax.legend() 42 | 43 | return ax 44 | 45 | 46 | def get_val_metric(pred_cls, pred_trajectory, labels, namespace='val'): 47 | rtn_dict = dict() 48 | bs, M, num_pts, _ = pred_trajectory.shape 49 | 50 | # Lagecy metric: Prediction L2 loss 51 | pred_label = torch.argmax(pred_cls, -1) # B, 52 | pred_trajectory_single = pred_trajectory[torch.tensor(range(bs), device=pred_cls.device), pred_label, ...] 
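    # Note: the advanced indexing above selects, for each batch element b, the trajectory of its
    # highest-scoring mode, i.e. pred_trajectory_single[b] = pred_trajectory[b, pred_label[b]].
    # An equivalent sketch, assuming pred_trajectory is (B, M, num_pts, D) and pred_label is (B,):
    #     pred_trajectory_single = pred_trajectory[torch.arange(bs, device=pred_cls.device), pred_label]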
53 | l2_dists = F.mse_loss(pred_trajectory_single, labels, reduction='none') # B, num_pts, 2 or 3 54 | 55 | # Lagecy metric: cls Acc 56 | gt_trajectory_M = labels[:, None, ...].expand(-1, M, -1, -1) 57 | l2_distances = F.mse_loss(pred_trajectory, gt_trajectory_M, reduction='none').sum(dim=(2, 3)) # B, M 58 | best_match = torch.argmin(l2_distances, -1) # B, 59 | rtn_dict.update({'l2_dist': l2_dists.mean(dim=(1, 2)), 'cls_acc': best_match == pred_label}) 60 | 61 | # New Metric 62 | distance_splits = ((0, 10), (10, 20), (20, 30), (30, 50), (50, 1000)) 63 | AP_thresholds = (0.5, 1, 2) 64 | euclidean_distances = l2_dists.sum(-1).sqrt() # euclidean distances over the points: [B, num_pts] 65 | x_distances = labels[..., 0] # B, num_pts 66 | 67 | for min_dst, max_dst in distance_splits: 68 | points_mask = (x_distances >= min_dst) & (x_distances < max_dst) # B, num_pts, 69 | if points_mask.sum() == 0: 70 | continue # No gt points in this range 71 | rtn_dict.update({'eucliden_%d_%d' % (min_dst, max_dst): euclidean_distances[points_mask]}) # [sum(mask), ] 72 | rtn_dict.update({'eucliden_x_%d_%d' % (min_dst, max_dst): l2_dists[..., 0][points_mask].sqrt()}) # [sum(mask), ] 73 | rtn_dict.update({'eucliden_y_%d_%d' % (min_dst, max_dst): l2_dists[..., 1][points_mask].sqrt()}) # [sum(mask), ] 74 | 75 | for AP_threshold in AP_thresholds: 76 | hit_mask = (euclidean_distances < AP_threshold) & points_mask 77 | rtn_dict.update({'AP_%d_%d_%s' % (min_dst, max_dst, AP_threshold): hit_mask[points_mask]}) 78 | 79 | # add namespace 80 | if namespace is not None: 81 | for k in list(rtn_dict.keys()): 82 | rtn_dict['%s/%s' % (namespace, k)] = rtn_dict.pop(k) 83 | return rtn_dict 84 | 85 | 86 | def get_val_metric_keys(namespace='val'): 87 | rtn_dict = dict() 88 | rtn_dict.update({'l2_dist': [], 'cls_acc': []}) 89 | 90 | # New Metric 91 | distance_splits = ((0, 10), (10, 20), (20, 30), (30, 50), (50, 1000)) 92 | AP_thresholds = (0.5, 1, 2) 93 | 94 | for min_dst, max_dst in distance_splits: 95 | rtn_dict.update({'eucliden_%d_%d' % (min_dst, max_dst): []}) # [sum(mask), ] 96 | rtn_dict.update({'eucliden_x_%d_%d' % (min_dst, max_dst): []}) # [sum(mask), ] 97 | rtn_dict.update({'eucliden_y_%d_%d' % (min_dst, max_dst): []}) # [sum(mask), ] 98 | for AP_threshold in AP_thresholds: 99 | rtn_dict.update({'AP_%d_%d_%s' % (min_dst, max_dst, AP_threshold): []}) 100 | 101 | # add namespace 102 | if namespace is not None: 103 | for k in list(rtn_dict.keys()): 104 | rtn_dict['%s/%s' % (namespace, k)] = rtn_dict.pop(k) 105 | return rtn_dict 106 | 107 | 108 | def generate_random_params_for_warp(img, random_rate=0.1): 109 | h, w = img.shape[:2] 110 | 111 | width_max = random_rate * w 112 | height_max = random_rate * h 113 | 114 | # 8 offsets 115 | w_offsets = list(np.random.uniform(0, width_max) for _ in range(4)) 116 | h_offsets = list(np.random.uniform(0, height_max) for _ in range(4)) 117 | 118 | return w_offsets, h_offsets 119 | 120 | 121 | def warp(img, w_offsets, h_offsets): 122 | h, w = img.shape[:2] 123 | 124 | original_corner_pts = np.array( 125 | ( 126 | (w_offsets[0], h_offsets[0]), 127 | (w - w_offsets[1], h_offsets[1]), 128 | (w_offsets[2], h - h_offsets[2]), 129 | (w - w_offsets[3], h - h_offsets[3]), 130 | ), dtype=np.float32 131 | ) 132 | 133 | target_corner_pts = np.array( 134 | ( 135 | (0, 0), # Top-left 136 | (w, 0), # Top-right 137 | (0, h), # Bottom-left 138 | (w, h), # Bottom-right 139 | ), dtype=np.float32 140 | ) 141 | 142 | transform_matrix = cv2.getPerspectiveTransform(original_corner_pts, target_corner_pts) 143 | 
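    # Note: transform_matrix is the 3x3 homography mapping the four jittered source corners back onto
    # the full (w, h) frame, so the warp below acts as a random perspective augmentation; the offsets
    # are expected to come from generate_random_params_for_warp above.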
144 | transformed_image = cv2.warpPerspective(img, transform_matrix, (w, h)) 145 | 146 | return transformed_image 147 | 148 | 149 | def draw_path(device_path, img, width=1, height=1.2, fill_color=(128,0,255), line_color=(0,255,0)): 150 | # device_path: N, 3 151 | device_path_l = device_path + np.array([0, 0, height]) 152 | device_path_r = device_path + np.array([0, 0, height]) 153 | device_path_l[:,1] -= width 154 | device_path_r[:,1] += width 155 | 156 | img_points_norm_l = img_from_device(device_path_l) 157 | img_points_norm_r = img_from_device(device_path_r) 158 | 159 | img_pts_l = denormalize(img_points_norm_l) 160 | img_pts_r = denormalize(img_points_norm_r) 161 | # filter out things rejected along the way 162 | valid = np.logical_and(np.isfinite(img_pts_l).all(axis=1), np.isfinite(img_pts_r).all(axis=1)) 163 | img_pts_l = img_pts_l[valid].astype(int) 164 | img_pts_r = img_pts_r[valid].astype(int) 165 | 166 | for i in range(1, len(img_pts_l)): 167 | u1,v1,u2,v2 = np.append(img_pts_l[i-1], img_pts_r[i-1]) 168 | u3,v3,u4,v4 = np.append(img_pts_l[i], img_pts_r[i]) 169 | pts = np.array([[u1,v1],[u2,v2],[u4,v4],[u3,v3]], np.int32).reshape((-1,1,2)) 170 | if fill_color: 171 | cv2.fillPoly(img,[pts],fill_color) 172 | if line_color: 173 | cv2.polylines(img,[pts],True,line_color) 174 | -------------------------------------------------------------------------------- /utils_comma2k19/LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 comma.ai 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. -------------------------------------------------------------------------------- /utils_comma2k19/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OpenDriveLab/Openpilot-Deepdive/fae05055c071fe8b6ed0dd578bb047f29f2b4dd4/utils_comma2k19/__init__.py -------------------------------------------------------------------------------- /utils_comma2k19/benchmarks.py: -------------------------------------------------------------------------------- 1 | import coordinates as coord 2 | import numpy as np 3 | from tqdm import tqdm 4 | 5 | 6 | def get_altitude_errors(frame_poss): 7 | ''' 8 | Takes in a list of 2D arrays. Where every element 9 | of the list represents all the frame positions 10 | of a segment in ECEF. 
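    Returns, for each segment, the per-frame deviation of the NED down-coordinate from the
    mean of the 5 m x 5 m road bin that frame falls into (sparsely observed bins are skipped),
    which is used as the altitude-error measure.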
11 | ''' 12 | 13 | # set size of squares (bins) of road which we 14 | # assume to the road to have constant height 15 | # and area within which to check 16 | local = coord.LocalCoord.from_ecef([-2712470.27794758, -4262442.18438959, 3879912.32221487]) 17 | north_bounds = [-16000, 2500] 18 | east_bounds = [-1000, 7000] 19 | binsize = 5 20 | north_bins, east_bins = [], [] 21 | for i in range(north_bounds[0], north_bounds[1], binsize): 22 | north_bins.append([i,i+binsize]) 23 | for i in range(east_bounds[0], east_bounds[1], binsize): 24 | east_bins.append([i,i+binsize]) 25 | 26 | # convert positions to NED 27 | frame_poss_ned = [] 28 | for pos in frame_poss: 29 | if pos is None: 30 | continue 31 | frame_poss_ned.append(local.ecef2ned(pos)) 32 | 33 | # find bin idxs for all frame positions 34 | bins = [[[] for j in range(len(east_bins))] for i in range(len(north_bins))] 35 | for pos_ned in frame_poss_ned: 36 | north_idxs = np.clip(((pos_ned[:,0] - north_bounds[0])/binsize).astype(int), 37 | 0, 38 | len(bins)-1) 39 | east_idxs = np.clip(((pos_ned[:,1] - east_bounds[0])/binsize).astype(int), 40 | 0, 41 | len(bins[0])-1) 42 | idxs = np.column_stack((north_idxs, east_idxs)) 43 | _, uniq = np.unique(idxs, return_index=True, axis=0) 44 | for p, idx in zip(pos_ned[uniq], idxs[uniq]): 45 | bins[idx[0]][idx[1]].append(p) 46 | 47 | 48 | # Now find the errors by looking at the deviation 49 | # from the mean in each bin 50 | alt_diffs = [] 51 | k = 0 52 | for pos_ned in tqdm(frame_poss_ned): 53 | k +=1 54 | north_idxs = np.clip(((pos_ned[:,0] - north_bounds[0])/binsize).astype(int), 55 | 0, 56 | len(bins)-1) 57 | east_idxs = np.clip(((pos_ned[:,1] - east_bounds[0])/binsize).astype(int), 58 | 0, 59 | len(bins[0])-1) 60 | idxs = np.column_stack((north_idxs, east_idxs)) 61 | alt_diffs.append([]) 62 | for p, idx in zip(pos_ned, idxs): 63 | # we want at least 5 observations per bin 64 | if len(bins[idx[0]][idx[1]]) > 5: 65 | alt_diffs[-1].append(p[2] - np.mean(np.array(bins[idx[0]][idx[1]])[:,2])) 66 | return alt_diffs 67 | -------------------------------------------------------------------------------- /utils_comma2k19/camera.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import utils_comma2k19.orientation as orient 3 | 4 | FULL_FRAME_SIZE = (1164, 874) 5 | W, H = FULL_FRAME_SIZE[0], FULL_FRAME_SIZE[1] 6 | eon_focal_length = FOCAL = 910.0 7 | 8 | # aka 'K' aka camera_frame_from_view_frame 9 | eon_intrinsics = np.array([ 10 | [FOCAL, 0., W/2.], 11 | [ 0., FOCAL, H/2.], 12 | [ 0., 0., 1.]]) 13 | 14 | # aka 'K_inv' aka view_frame_from_camera_frame 15 | eon_intrinsics_inv = np.linalg.inv(eon_intrinsics) 16 | 17 | # device/mesh : x->forward, y-> right, z->down 18 | # view : x->right, y->down, z->forward 19 | device_frame_from_view_frame = np.array([ 20 | [ 0., 0., 1.], 21 | [ 1., 0., 0.], 22 | [ 0., 1., 0.] 
23 | ]) 24 | view_frame_from_device_frame = device_frame_from_view_frame.T 25 | 26 | 27 | def get_calib_from_vp(vp): 28 | vp_norm = normalize(vp) 29 | yaw_calib = np.arctan(vp_norm[0]) 30 | pitch_calib = np.arctan(vp_norm[1]*np.cos(yaw_calib)) 31 | # TODO should be, this but written 32 | # to be compatible with meshcalib and 33 | # get_view_frame_from_road_fram 34 | #pitch_calib = -np.arctan(vp_norm[1]*np.cos(yaw_calib)) 35 | roll_calib = 0 36 | return roll_calib, pitch_calib, yaw_calib 37 | 38 | # aka 'extrinsic_matrix' 39 | # road : x->forward, y -> left, z->up 40 | def get_view_frame_from_road_frame(roll, pitch, yaw, height): 41 | # TODO 42 | # calibration pitch is currently defined 43 | # opposite to pitch in device frame 44 | pitch = -pitch 45 | device_from_road = orient.rot_from_euler([roll, pitch, yaw]).dot(np.diag([1, -1, -1])) 46 | view_from_road = view_frame_from_device_frame.dot(device_from_road) 47 | return np.hstack((view_from_road, [[0], [height], [0]])) 48 | 49 | 50 | def vp_from_ke(m): 51 | """ 52 | Computes the vanishing point from the product of the intrinsic and extrinsic 53 | matrices C = KE. 54 | 55 | The vanishing point is defined as lim x->infinity C (x, 0, 0, 1).T 56 | """ 57 | return (m[0, 0]/m[2,0], m[1,0]/m[2,0]) 58 | 59 | def roll_from_ke(m): 60 | # note: different from calibration.h/RollAnglefromKE: i think that one's just wrong 61 | return np.arctan2(-(m[1, 0] - m[1, 1] * m[2, 0] / m[2, 1]), 62 | -(m[0, 0] - m[0, 1] * m[2, 0] / m[2, 1])) 63 | 64 | def normalize(img_pts): 65 | # normalizes image coordinates 66 | # accepts single pt or array of pts 67 | img_pts = np.array(img_pts) 68 | input_shape = img_pts.shape 69 | img_pts = np.atleast_2d(img_pts) 70 | img_pts = np.hstack((img_pts, np.ones((img_pts.shape[0],1)))) 71 | img_pts_normalized = eon_intrinsics_inv.dot(img_pts.T).T 72 | img_pts_normalized[(img_pts < 0).any(axis=1)] = np.nan 73 | return img_pts_normalized[:,:2].reshape(input_shape) 74 | 75 | def denormalize(img_pts): 76 | # denormalizes image coordinates 77 | # accepts single pt or array of pts 78 | img_pts = np.array(img_pts) 79 | input_shape = img_pts.shape 80 | img_pts = np.atleast_2d(img_pts) 81 | img_pts = np.hstack((img_pts, np.ones((img_pts.shape[0],1)))) 82 | img_pts_denormalized = eon_intrinsics.dot(img_pts.T).T 83 | img_pts_denormalized[img_pts_denormalized[:,0] > W] = np.nan 84 | img_pts_denormalized[img_pts_denormalized[:,0] < 0] = np.nan 85 | img_pts_denormalized[img_pts_denormalized[:,1] > H] = np.nan 86 | img_pts_denormalized[img_pts_denormalized[:,1] < 0] = np.nan 87 | return img_pts_denormalized[:,:2].reshape(input_shape) 88 | 89 | def device_from_ecef(pos_ecef, orientation_ecef, pt_ecef): 90 | # device from ecef frame 91 | # device frame is x -> forward, y-> right, z -> down 92 | # accepts single pt or array of pts 93 | input_shape = pt_ecef.shape 94 | pt_ecef = np.atleast_2d(pt_ecef) 95 | ecef_from_device_rot = orient.rotations_from_quats(orientation_ecef) 96 | device_from_ecef_rot = ecef_from_device_rot.T 97 | pt_ecef_rel = pt_ecef - pos_ecef 98 | pt_device = np.einsum('jk,ik->ij', device_from_ecef_rot, pt_ecef_rel) 99 | return pt_device.reshape(input_shape) 100 | 101 | def img_from_device(pt_device): 102 | # img coordinates from pts in device frame 103 | # first transforms to view frame, then to img coords 104 | # accepts single pt or array of pts 105 | input_shape = pt_device.shape 106 | pt_device = np.atleast_2d(pt_device) 107 | pt_view = np.einsum('jk,ik->ij', view_frame_from_device_frame, pt_device) 108 | 109 | # This function 
should never return negative depths 110 | pt_view[pt_view[:,2] < 0] = np.nan 111 | 112 | pt_img = pt_view/pt_view[:,2:3] 113 | return pt_img.reshape(input_shape)[:,:2] 114 | 115 | -------------------------------------------------------------------------------- /utils_comma2k19/coordinates.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Coordinate transformation module. All methods accept arrays as input 3 | with each row as a position. 4 | ''' 5 | 6 | import numpy as np 7 | 8 | a = 6378137 9 | b = 6356752.3142 10 | esq = 6.69437999014 * 0.001 11 | e1sq = 6.73949674228 * 0.001 12 | 13 | 14 | def geodetic2ecef(geodetic, radians=False): 15 | geodetic = np.array(geodetic) 16 | input_shape = geodetic.shape 17 | geodetic = np.atleast_2d(geodetic) 18 | 19 | ratio = 1.0 if radians else (np.pi / 180.0) 20 | lat = ratio*geodetic[:,0] 21 | lon = ratio*geodetic[:,1] 22 | alt = geodetic[:,2] 23 | 24 | xi = np.sqrt(1 - esq * np.sin(lat)**2) 25 | x = (a / xi + alt) * np.cos(lat) * np.cos(lon) 26 | y = (a / xi + alt) * np.cos(lat) * np.sin(lon) 27 | z = (a / xi * (1 - esq) + alt) * np.sin(lat) 28 | ecef = np.array([x, y, z]).T 29 | return ecef.reshape(input_shape) 30 | 31 | 32 | def ecef2geodetic(ecef, radians=False): 33 | """ 34 | Convert ECEF coordinates to geodetic using ferrari's method 35 | """ 36 | # Save shape and export column 37 | ecef = np.atleast_1d(ecef) 38 | input_shape = ecef.shape 39 | ecef = np.atleast_2d(ecef) 40 | x, y, z = ecef[:, 0], ecef[:, 1], ecef[:, 2] 41 | 42 | ratio = 1.0 if radians else (180.0 / np.pi) 43 | 44 | # Conver from ECEF to geodetic using Ferrari's methods 45 | # https://en.wikipedia.org/wiki/Geographic_coordinate_conversion#Ferrari.27s_solution 46 | r = np.sqrt(x * x + y * y) 47 | Esq = a * a - b * b 48 | F = 54 * b * b * z * z 49 | G = r * r + (1 - esq) * z * z - esq * Esq 50 | C = (esq * esq * F * r * r) / (pow(G, 3)) 51 | S = np.cbrt(1 + C + np.sqrt(C * C + 2 * C)) 52 | P = F / (3 * pow((S + 1 / S + 1), 2) * G * G) 53 | Q = np.sqrt(1 + 2 * esq * esq * P) 54 | r_0 = -(P * esq * r) / (1 + Q) + np.sqrt(0.5 * a * a*(1 + 1.0 / Q) - \ 55 | P * (1 - esq) * z * z / (Q * (1 + Q)) - 0.5 * P * r * r) 56 | U = np.sqrt(pow((r - esq * r_0), 2) + z * z) 57 | V = np.sqrt(pow((r - esq * r_0), 2) + (1 - esq) * z * z) 58 | Z_0 = b * b * z / (a * V) 59 | h = U * (1 - b * b / (a * V)) 60 | lat = ratio*np.arctan((z + e1sq * Z_0) / r) 61 | lon = ratio*np.arctan2(y, x) 62 | 63 | # stack the new columns and return to the original shape 64 | geodetic = np.column_stack((lat, lon, h)) 65 | return geodetic.reshape(input_shape) 66 | 67 | class LocalCoord(object): 68 | """ 69 | Allows conversions to local frames. In this case NED. 70 | That is: North East Down from the start position in 71 | meters. 
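  A usage sketch with illustrative coordinates (geodetic input defaults to
  [lat, lon, alt] in degrees and meters):
      converter = LocalCoord.from_geodetic([32.7, -117.1, 0.0])
      ned = converter.geodetic2ned([32.701, -117.101, 0.0])  # [north, east, down] offsets in meters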
72 | """ 73 | def __init__(self, init_geodetic, init_ecef): 74 | self.init_ecef = init_ecef 75 | lat, lon, _ = (np.pi/180)*np.array(init_geodetic) 76 | self.ned2ecef_matrix = np.array([[-np.sin(lat)*np.cos(lon), -np.sin(lon), -np.cos(lat)*np.cos(lon)], 77 | [-np.sin(lat)*np.sin(lon), np.cos(lon), -np.cos(lat)*np.sin(lon)], 78 | [np.cos(lat), 0, -np.sin(lat)]]) 79 | self.ecef2ned_matrix = self.ned2ecef_matrix.T 80 | 81 | @classmethod 82 | def from_geodetic(cls, init_geodetic): 83 | init_ecef = geodetic2ecef(init_geodetic) 84 | return LocalCoord(init_geodetic, init_ecef) 85 | 86 | @classmethod 87 | def from_ecef(cls, init_ecef): 88 | init_geodetic = ecef2geodetic(init_ecef) 89 | return LocalCoord(init_geodetic, init_ecef) 90 | 91 | 92 | def ecef2ned(self, ecef): 93 | ecef = np.array(ecef) 94 | return np.dot(self.ecef2ned_matrix, (ecef - self.init_ecef).T).T 95 | 96 | def ned2ecef(self, ned): 97 | ned = np.array(ned) 98 | # Transpose so that init_ecef will broadcast correctly for 1d or 2d ned. 99 | return (np.dot(self.ned2ecef_matrix, ned.T).T + self.init_ecef) 100 | 101 | def geodetic2ned(self, geodetic): 102 | ecef = geodetic2ecef(geodetic) 103 | return self.ecef2ned(ecef) 104 | 105 | def ned2geodetic(self, ned): 106 | ecef = self.ned2ecef(ned) 107 | return ecef2geodetic(ecef) 108 | -------------------------------------------------------------------------------- /utils_comma2k19/dataset.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | import numpy as np 3 | import torch 4 | from torch.utils.data import Dataset, DataLoader 5 | from torchvision import transforms 6 | from tools.lib.framereader import FrameReader, BaseFrameReader 7 | 8 | 9 | class ToTensor(object): 10 | 11 | def __call__(self, sample): 12 | return { 13 | key: torch.from_numpy(value) for key, value in sample.items() 14 | } 15 | 16 | 17 | class CommaDataset(Dataset): 18 | 19 | def __init__(self, main_dir, transform=None): 20 | self.main_dir = main_dir 21 | self.frame_reader = FrameReader(main_dir + 'video.hevc') 22 | 23 | self.gps_times = np.load(main_dir + 'global_pose/frame_gps_times') 24 | self.orientations = np.load(main_dir + 'global_pose/frame_orientations') 25 | self.positions = np.load(main_dir + 'global_pose/frame_positions') 26 | self.times = np.load(main_dir + 'global_pose/frame_times') 27 | self.velocities = np.load(main_dir + 'global_pose/frame_velocities') 28 | 29 | self.transform = transform 30 | 31 | def __len__(self): 32 | return len(self.velocities) 33 | 34 | def __getitem__(self, idx): 35 | image = np.array(self.frame_reader.get(idx, pix_fmt='rgb24')[0], dtype=np.float64) 36 | 37 | sample = { 38 | 'image': image, 39 | 'gps_times': self.gps_times, 40 | 'orientations': self.orientations, 41 | 'positions': self.positions, 42 | 'times': self.times, 43 | 'velocities': self.velocities[idx] 44 | } 45 | 46 | if self.transform: 47 | sample = self.transform(sample) 48 | 49 | return sample 50 | 51 | 52 | if __name__ == "__main__": 53 | 54 | example_segment = 'Example_1/b0c9d2329ad1606b|2018-08-02--08-34-47/40/' 55 | frame_idx = 200 56 | 57 | comma_dataset = CommaDataset(main_dir=example_segment, transform=transforms.Compose([ 58 | ToTensor() 59 | ])) 60 | 61 | comma_dataloader = DataLoader(comma_dataset, batch_size=4, shuffle=True, num_workers=0) 62 | 63 | # sample = comma_dataset[frame_idx] 64 | sample = next(iter(comma_dataloader)) 65 | image = sample['image'][0].numpy() 66 | velocity = sample['velocities'][0].numpy() 67 | 
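# Note: FrameReader is imported from a `tools.lib.framereader` module that is not part of this
# repository (its `tools/` directory only holds the dataset extraction scripts); it ships with
# comma's openpilot tooling, so this example Dataset needs that module on the path to run.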
-------------------------------------------------------------------------------- /utils_comma2k19/orientation.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Vectorized functions that transform between 3 | rotation matrices, euler angles and quaternions. 4 | All support lists, array or array of arrays as inputs. 5 | Supports both x2y and y_from_x format (y_from_x preferred!). 6 | ''' 7 | 8 | import numpy as np 9 | from numpy import dot, inner, array, linalg 10 | from utils_comma2k19.coordinates import LocalCoord 11 | 12 | 13 | def euler2quat(eulers): 14 | eulers = array(eulers) 15 | if len(eulers.shape) > 1: 16 | output_shape = (-1,4) 17 | else: 18 | output_shape = (4,) 19 | eulers = np.atleast_2d(eulers) 20 | gamma, theta, psi = eulers[:,0], eulers[:,1], eulers[:,2] 21 | 22 | q0 = np.cos(gamma / 2) * np.cos(theta / 2) * np.cos(psi / 2) + \ 23 | np.sin(gamma / 2) * np.sin(theta / 2) * np.sin(psi / 2) 24 | q1 = np.sin(gamma / 2) * np.cos(theta / 2) * np.cos(psi / 2) - \ 25 | np.cos(gamma / 2) * np.sin(theta / 2) * np.sin(psi / 2) 26 | q2 = np.cos(gamma / 2) * np.sin(theta / 2) * np.cos(psi / 2) + \ 27 | np.sin(gamma / 2) * np.cos(theta / 2) * np.sin(psi / 2) 28 | q3 = np.cos(gamma / 2) * np.cos(theta / 2) * np.sin(psi / 2) - \ 29 | np.sin(gamma / 2) * np.sin(theta / 2) * np.cos(psi / 2) 30 | 31 | quats = array([q0, q1, q2, q3]).T 32 | for i in xrange(len(quats)): 33 | if quats[i,0] < 0: 34 | quats[i] = -quats[i] 35 | return quats.reshape(output_shape) 36 | 37 | 38 | def quat2euler(quats): 39 | quats = array(quats) 40 | if len(quats.shape) > 1: 41 | output_shape = (-1,3) 42 | else: 43 | output_shape = (3,) 44 | quats = np.atleast_2d(quats) 45 | q0, q1, q2, q3 = quats[:,0], quats[:,1], quats[:,2], quats[:,3] 46 | 47 | gamma = np.arctan2(2 * (q0 * q1 + q2 * q3), 1 - 2 * (q1**2 + q2**2)) 48 | theta = np.arcsin(2 * (q0 * q2 - q3 * q1)) 49 | psi = np.arctan2(2 * (q0 * q3 + q1 * q2), 1 - 2 * (q2**2 + q3**2)) 50 | 51 | eulers = array([gamma, theta, psi]).T 52 | return eulers.reshape(output_shape) 53 | 54 | 55 | def quat2rot(quats): 56 | quats = array(quats) 57 | input_shape = quats.shape 58 | quats = np.atleast_2d(quats) 59 | Rs = np.zeros((quats.shape[0], 3, 3)) 60 | q0 = quats[:, 0] 61 | q1 = quats[:, 1] 62 | q2 = quats[:, 2] 63 | q3 = quats[:, 3] 64 | Rs[:, 0, 0] = q0 * q0 + q1 * q1 - q2 * q2 - q3 * q3 65 | Rs[:, 0, 1] = 2 * (q1 * q2 - q0 * q3) 66 | Rs[:, 0, 2] = 2 * (q0 * q2 + q1 * q3) 67 | Rs[:, 1, 0] = 2 * (q1 * q2 + q0 * q3) 68 | Rs[:, 1, 1] = q0 * q0 - q1 * q1 + q2 * q2 - q3 * q3 69 | Rs[:, 1, 2] = 2 * (q2 * q3 - q0 * q1) 70 | Rs[:, 2, 0] = 2 * (q1 * q3 - q0 * q2) 71 | Rs[:, 2, 1] = 2 * (q0 * q1 + q2 * q3) 72 | Rs[:, 2, 2] = q0 * q0 - q1 * q1 - q2 * q2 + q3 * q3 73 | 74 | if len(input_shape) < 2: 75 | return Rs[0] 76 | else: 77 | return Rs 78 | 79 | 80 | def rot2quat(rots): 81 | input_shape = rots.shape 82 | if len(input_shape) < 3: 83 | rots = array([rots]) 84 | K3 = np.empty((len(rots), 4, 4)) 85 | K3[:, 0, 0] = (rots[:, 0, 0] - rots[:, 1, 1] - rots[:, 2, 2]) / 3.0 86 | K3[:, 0, 1] = (rots[:, 1, 0] + rots[:, 0, 1]) / 3.0 87 | K3[:, 0, 2] = (rots[:, 2, 0] + rots[:, 0, 2]) / 3.0 88 | K3[:, 0, 3] = (rots[:, 1, 2] - rots[:, 2, 1]) / 3.0 89 | K3[:, 1, 0] = K3[:, 0, 1] 90 | K3[:, 1, 1] = (rots[:, 1, 1] - rots[:, 0, 0] - rots[:, 2, 2]) / 3.0 91 | K3[:, 1, 2] = (rots[:, 2, 1] + rots[:, 1, 2]) / 3.0 92 | K3[:, 1, 3] = (rots[:, 2, 0] - rots[:, 0, 2]) / 3.0 93 | K3[:, 2, 0] = K3[:, 0, 2] 94 | K3[:, 2, 1] = K3[:, 1, 2] 95 | K3[:, 2, 2] = (rots[:, 2, 2] 
- rots[:, 0, 0] - rots[:, 1, 1]) / 3.0 96 | K3[:, 2, 3] = (rots[:, 0, 1] - rots[:, 1, 0]) / 3.0 97 | K3[:, 3, 0] = K3[:, 0, 3] 98 | K3[:, 3, 1] = K3[:, 1, 3] 99 | K3[:, 3, 2] = K3[:, 2, 3] 100 | K3[:, 3, 3] = (rots[:, 0, 0] + rots[:, 1, 1] + rots[:, 2, 2]) / 3.0 101 | q = np.empty((len(rots), 4)) 102 | for i in xrange(len(rots)): 103 | _, eigvecs = linalg.eigh(K3[i].T) 104 | eigvecs = eigvecs[:,3:] 105 | q[i, 0] = eigvecs[-1] 106 | q[i, 1:] = -eigvecs[:-1].flatten() 107 | if q[i, 0] < 0: 108 | q[i] = -q[i] 109 | 110 | if len(input_shape) < 3: 111 | return q[0] 112 | else: 113 | return q 114 | 115 | 116 | def euler2rot(eulers): 117 | return rotations_from_quats(euler2quat(eulers)) 118 | 119 | 120 | def rot2euler(rots): 121 | return quat2euler(quats_from_rotations(rots)) 122 | 123 | 124 | quats_from_rotations = rot2quat 125 | quat_from_rot = rot2quat 126 | rotations_from_quats = quat2rot 127 | rot_from_quat= quat2rot 128 | rot_from_quat= quat2rot 129 | euler_from_rot = rot2euler 130 | euler_from_quat = quat2euler 131 | rot_from_euler = euler2rot 132 | quat_from_euler = euler2quat 133 | 134 | 135 | 136 | 137 | 138 | 139 | ''' 140 | Random helpers below 141 | ''' 142 | 143 | 144 | def quat_product(q, r): 145 | t = np.zeros(4) 146 | t[0] = r[0] * q[0] - r[1] * q[1] - r[2] * q[2] - r[3] * q[3] 147 | t[1] = r[0] * q[1] + r[1] * q[0] - r[2] * q[3] + r[3] * q[2] 148 | t[2] = r[0] * q[2] + r[1] * q[3] + r[2] * q[0] - r[3] * q[1] 149 | t[3] = r[0] * q[3] - r[1] * q[2] + r[2] * q[1] + r[3] * q[0] 150 | return t 151 | 152 | 153 | def rot_matrix(roll, pitch, yaw): 154 | cr, sr = np.cos(roll), np.sin(roll) 155 | cp, sp = np.cos(pitch), np.sin(pitch) 156 | cy, sy = np.cos(yaw), np.sin(yaw) 157 | rr = array([[1,0,0],[0, cr,-sr],[0, sr, cr]]) 158 | rp = array([[cp,0,sp],[0, 1,0],[-sp, 0, cp]]) 159 | ry = array([[cy,-sy,0],[sy, cy,0],[0, 0, 1]]) 160 | return ry.dot(rp.dot(rr)) 161 | 162 | 163 | def rot(axis, angle): 164 | # Rotates around an arbitrary axis 165 | ret_1 = (1 - np.cos(angle)) * array([[axis[0]**2, axis[0] * axis[1], axis[0] * axis[2]], [ 166 | axis[1] * axis[0], axis[1]**2, axis[1] * axis[2] 167 | ], [axis[2] * axis[0], axis[2] * axis[1], axis[2]**2]]) 168 | ret_2 = np.cos(angle) * np.eye(3) 169 | ret_3 = np.sin(angle) * array([[0, -axis[2], axis[1]], [axis[2], 0, -axis[0]], 170 | [-axis[1], axis[0], 0]]) 171 | return ret_1 + ret_2 + ret_3 172 | 173 | 174 | def ecef_euler_from_ned(ned_ecef_init, ned_pose): 175 | ''' 176 | Got it from here: 177 | Using Rotations to Build Aerospace Coordinate Systems 178 | -Don Koks 179 | ''' 180 | converter = LocalCoord.from_ecef(ned_ecef_init) 181 | x0 = converter.ned2ecef([1, 0, 0]) - converter.ned2ecef([0, 0, 0]) 182 | y0 = converter.ned2ecef([0, 1, 0]) - converter.ned2ecef([0, 0, 0]) 183 | z0 = converter.ned2ecef([0, 0, 1]) - converter.ned2ecef([0, 0, 0]) 184 | 185 | x1 = rot(z0, ned_pose[2]).dot(x0) 186 | y1 = rot(z0, ned_pose[2]).dot(y0) 187 | z1 = rot(z0, ned_pose[2]).dot(z0) 188 | 189 | x2 = rot(y1, ned_pose[1]).dot(x1) 190 | y2 = rot(y1, ned_pose[1]).dot(y1) 191 | z2 = rot(y1, ned_pose[1]).dot(z1) 192 | 193 | x3 = rot(x2, ned_pose[0]).dot(x2) 194 | y3 = rot(x2, ned_pose[0]).dot(y2) 195 | #z3 = rot(x2, ned_pose[0]).dot(z2) 196 | 197 | x0 = array([1, 0, 0]) 198 | y0 = array([0, 1, 0]) 199 | z0 = array([0, 0, 1]) 200 | 201 | psi = np.arctan2(inner(x3, y0), inner(x3, x0)) 202 | theta = np.arctan2(-inner(x3, z0), np.sqrt(inner(x3, x0)**2 + inner(x3, y0)**2)) 203 | y2 = rot(z0, psi).dot(y0) 204 | z2 = rot(y2, theta).dot(z0) 205 | phi = np.arctan2(inner(y3, 
z2), inner(y3, y2)) 206 | 207 | ret = array([phi, theta, psi]) 208 | return ret 209 | 210 | 211 | def ned_euler_from_ecef(ned_ecef_init, ecef_poses): 212 | ''' 213 | Got the math from here: 214 | Using Rotations to Build Aerospace Coordinate Systems 215 | -Don Koks 216 | 217 | Also accepts array of ecef_poses and array of ned_ecef_inits. 218 | Where each row is a pose and an ecef_init. 219 | ''' 220 | ned_ecef_init = array(ned_ecef_init) 221 | ecef_poses = array(ecef_poses) 222 | output_shape = ecef_poses.shape 223 | ned_ecef_init = np.atleast_2d(ned_ecef_init) 224 | if ned_ecef_init.shape[0] == 1: 225 | ned_ecef_init = np.tile(ned_ecef_init[0], (output_shape[0], 1)) 226 | ecef_poses = np.atleast_2d(ecef_poses) 227 | 228 | ned_poses = np.zeros(ecef_poses.shape) 229 | for i, ecef_pose in enumerate(ecef_poses): 230 | converter = LocalCoord.from_ecef(ned_ecef_init[i]) 231 | x0 = array([1, 0, 0]) 232 | y0 = array([0, 1, 0]) 233 | z0 = array([0, 0, 1]) 234 | 235 | x1 = rot(z0, ecef_pose[2]).dot(x0) 236 | y1 = rot(z0, ecef_pose[2]).dot(y0) 237 | z1 = rot(z0, ecef_pose[2]).dot(z0) 238 | 239 | x2 = rot(y1, ecef_pose[1]).dot(x1) 240 | y2 = rot(y1, ecef_pose[1]).dot(y1) 241 | z2 = rot(y1, ecef_pose[1]).dot(z1) 242 | 243 | x3 = rot(x2, ecef_pose[0]).dot(x2) 244 | y3 = rot(x2, ecef_pose[0]).dot(y2) 245 | #z3 = rot(x2, ecef_pose[0]).dot(z2) 246 | 247 | x0 = converter.ned2ecef([1, 0, 0]) - converter.ned2ecef([0, 0, 0]) 248 | y0 = converter.ned2ecef([0, 1, 0]) - converter.ned2ecef([0, 0, 0]) 249 | z0 = converter.ned2ecef([0, 0, 1]) - converter.ned2ecef([0, 0, 0]) 250 | 251 | psi = np.arctan2(inner(x3, y0), inner(x3, x0)) 252 | theta = np.arctan2(-inner(x3, z0), np.sqrt(inner(x3, x0)**2 + inner(x3, y0)**2)) 253 | y2 = rot(z0, psi).dot(y0) 254 | z2 = rot(y2, theta).dot(z0) 255 | phi = np.arctan2(inner(y3, z2), inner(y3, y2)) 256 | ned_poses[i] = array([phi, theta, psi]) 257 | 258 | return ned_poses.reshape(output_shape) 259 | 260 | 261 | def ecef2car(car_ecef, psi, theta, points_ecef, ned_converter): 262 | """ 263 | TODO: add roll rotation 264 | Converts an array of points in ecef coordinates into 265 | x-forward, y-left, z-up coordinates 266 | Parameters 267 | ---------- 268 | psi: yaw, radian 269 | theta: pitch, radian 270 | Returns 271 | ------- 272 | [x, y, z] coordinates in car frame 273 | """ 274 | 275 | # input is an array of points in ecef cocrdinates 276 | # output is an array of points in car's coordinate (x-front, y-left, z-up) 277 | 278 | # convert points to NED 279 | points_ned = [] 280 | for p in points_ecef: 281 | points_ned.append(ned_converter.ecef2ned_matrix.dot(array(p) - car_ecef)) 282 | 283 | points_ned = np.vstack(points_ned).T 284 | 285 | # n, e, d -> x, y, z 286 | # Calculate relative postions and rotate wrt to heading and pitch of car 287 | invert_R = array([[1., 0., 0.], [0., -1., 0.], [0., 0., -1.]]) 288 | 289 | c, s = np.cos(psi), np.sin(psi) 290 | yaw_R = array([[c, s, 0.], [-s, c, 0.], [0., 0., 1.]]) 291 | 292 | c, s = np.cos(theta), np.sin(theta) 293 | pitch_R = array([[c, 0., -s], [0., 1., 0.], [s, 0., c]]) 294 | 295 | return dot(pitch_R, dot(yaw_R, dot(invert_R, points_ned))) 296 | -------------------------------------------------------------------------------- /utils_comma2k19/unzip_msft_fs.py: -------------------------------------------------------------------------------- 1 | ''' 2 | This script is a workaround for Microsoft-based filesystems (exFat, NTFS etc). 3 | These filesystems don't allow the vertical pipe ('|') in paths. 
4 | So, instead of unzipping files manually, run this script which will replace 5 | all pipes in path names in the zip files by underscores. 6 | 7 | Usage: 8 | python3 unzip_msft_fs.py <dataset_dir> <goal_dir> 9 | ''' 10 | 11 | import multiprocessing 12 | import os 13 | import shutil 14 | import sys 15 | import zipfile 16 | 17 | NUMBER_OF_CHUNKS = 10 18 | 19 | 20 | def unzip_replace(zip_dir, zip_name, extract_dir, 21 | filter_predicate, replace_me, replace_by): 22 | zip_path = os.path.join(zip_dir, zip_name) 23 | z = zipfile.ZipFile(zip_path) 24 | for f in z.infolist(): 25 | if filter_predicate(f): 26 | old = f.filename 27 | f.filename = f.filename.replace(replace_me, replace_by) 28 | z.extract(f, extract_dir) 29 | 30 | 31 | def fix_pipe(base): 32 | """ 33 | Given unzipped directory "base", 34 | creates new directories with | replaced with _, 35 | moves all contents into their respective new directories 36 | and then deletes the old (now empty) directories that contain |. 37 | """ 38 | for d in filter(lambda s: '|' in s, os.listdir(base)): 39 | old = os.path.join(base, d) 40 | new = os.path.join(base, d.replace('|', '_')) 41 | try: 42 | os.makedirs(new, exist_ok=False) 43 | contents = map(lambda s: os.path.join(old, s), os.listdir(old)) 44 | for f in contents: 45 | shutil.move(f, os.path.join(new)) 46 | os.rmdir(old) 47 | except Exception as e: 48 | print('New directory already exists \ 49 | -- did you execute this script already?') 50 | raise e 51 | 52 | 53 | def map_fn(args): 54 | unzip_replace(args['dir'], args['.zip'], args['extract'], 55 | lambda f: '|' in f.filename, '|', '_') 56 | print('Finished unzipping {}'.format( 57 | os.path.join(args['dir'], args['.zip']))) 58 | 59 | 60 | if __name__ == "__main__": 61 | if len(sys.argv) != 3: 62 | print('python3 unzip_msft_fs.py <dataset_dir> <goal_dir>') 63 | sys.exit(1) 64 | 65 | dataset_dir = sys.argv[1] 66 | goal_dir = sys.argv[2] 67 | 68 | if not os.path.isdir(goal_dir): 69 | print('Creating directories for you...') 70 | os.makedirs(goal_dir) 71 | 72 | bases = ['Chunk_%d.zip' % i for i in range(1, NUMBER_OF_CHUNKS + 1)] 73 | """ 74 | Assuming that your hard drive is slow, 75 | so it is ok to have more processes than cores 76 | i.e.
overhead of managing more processes << time spent unzipping 77 | """ 78 | p = multiprocessing.Pool(len(bases)) 79 | 80 | # bbases: bases wih base :) 81 | bbases = map(lambda c: os.path.join(dataset_dir, c), bases) 82 | bbases = list(filter(os.path.isfile, bbases)) 83 | assert len(bbases) == NUMBER_OF_CHUNKS, \ 84 | "Could only find {} out of {} chunks in directory {}".format( 85 | len(bbases), NUMBER_OF_CHUNKS, dataset_dir) 86 | 87 | for i, b in enumerate(bases): 88 | bases[i] = { 89 | 'dir': dataset_dir, 90 | 'extract': goal_dir, 91 | '.zip': b 92 | } 93 | p.map(map_fn, bases) 94 | 95 | bbases = list(map(lambda b: b.replace('.zip', ''), bbases)) 96 | bbases = list(filter(os.path.isdir, bbases)) 97 | assert len(bbases) == NUMBER_OF_CHUNKS, \ 98 | "Could only find {} out of {} chunks in directory {}".format( 99 | len(bbases), NUMBER_OF_CHUNKS, dataset_dir) 100 | -------------------------------------------------------------------------------- /view_transform.py: -------------------------------------------------------------------------------- 1 | import os 2 | import json 3 | import torch 4 | import numpy as np 5 | import cv2 6 | from PIL import Image 7 | from torch.utils.data import Dataset 8 | # from torchvision import transforms 9 | import matplotlib.pyplot as plt 10 | from tqdm import tqdm 11 | 12 | 13 | device_frame_from_view_frame = np.array([ 14 | [ 0., 0., 1.], 15 | [ 1., 0., 0.], 16 | [ 0., 1., 0.] 17 | ]) 18 | view_frame_from_device_frame = device_frame_from_view_frame.T 19 | 20 | # MED model 21 | MEDMODEL_INPUT_SIZE = (512, 256) 22 | MEDMODEL_YUV_SIZE = (MEDMODEL_INPUT_SIZE[0], MEDMODEL_INPUT_SIZE[1] * 3 // 2) 23 | MEDMODEL_CY = 47.6 24 | 25 | medmodel_fl = 910.0 26 | medmodel_intrinsics = np.array([ 27 | [medmodel_fl, 0.0, 0.5 * MEDMODEL_INPUT_SIZE[0]], 28 | [0.0, medmodel_fl, MEDMODEL_CY], 29 | [0.0, 0.0, 1.0]]) 30 | 31 | 32 | def calibration(extrinsic_matrix, cam_intrinsics, device_frame_from_road_frame=None): 33 | if device_frame_from_road_frame is None: 34 | device_frame_from_road_frame = np.hstack((np.diag([1, -1, -1]), [[0], [0], [1.51]])) 35 | med_frame_from_ground = medmodel_intrinsics@view_frame_from_device_frame@device_frame_from_road_frame[:,(0,1,3)] 36 | ground_from_med_frame = np.linalg.inv(med_frame_from_ground) 37 | 38 | 39 | extrinsic_matrix_eigen = extrinsic_matrix[:3] 40 | camera_frame_from_road_frame = np.dot(cam_intrinsics, extrinsic_matrix_eigen) 41 | camera_frame_from_ground = np.zeros((3,3)) 42 | camera_frame_from_ground[:,0] = camera_frame_from_road_frame[:,0] 43 | camera_frame_from_ground[:,1] = camera_frame_from_road_frame[:,1] 44 | camera_frame_from_ground[:,2] = camera_frame_from_road_frame[:,3] 45 | warp_matrix = np.dot(camera_frame_from_ground, ground_from_med_frame) 46 | 47 | return warp_matrix 48 | 49 | 50 | if __name__ == '__main__': 51 | from data import PlanningDataset 52 | dataset = PlanningDataset(split='val') 53 | for idx, data in tqdm(enumerate(dataset)): 54 | imgs = data["input_img"] 55 | img0 = imgs[0] 56 | camera_rotation_matrix = np.linalg.inv(data["camera_rotation_matrix_inv"].numpy()) 57 | camera_translation = -data["camera_translation_inv"].numpy() 58 | camera_extrinsic = np.vstack((np.hstack((camera_rotation_matrix, camera_translation.reshape((3, 1)))), np.array([0, 0, 0, 1]))) 59 | camera_extrinsic = np.linalg.inv(camera_extrinsic) 60 | cv2.imshow("origin_img",img0) 61 | cv2.waitKey(0) 62 | warp_matrix = calibration(camera_extrinsic, data["camera_intrinsic"].numpy()) 63 | transformed_img = cv2.warpPerspective(src = img0, M = 
warp_matrix, dsize= (512,256), flags= cv2.WARP_INVERSE_MAP) 64 | cv2.imshow("warped_img",transformed_img) 65 | cv2.waitKey(0) 66 | --------------------------------------------------------------------------------
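For completeness, a minimal usage sketch for `utils.draw_path`, which is defined in `utils.py` above but not demonstrated there. The frame path and the straight-ahead trajectory below are hypothetical placeholders; the sketch only assumes a 1164x874 comma2k19-style image and a path given in the device frame (x forward, y right, z down, in meters), since the real model outputs come from `model.py`/`data.py`, which are not shown here.

```
import cv2
import numpy as np

from utils import draw_path

# Hypothetical inputs: a 1164x874 comma2k19-style frame and a straight 50 m path.
img = cv2.imread('example_frame.png')              # placeholder path, not shipped with the repo
trajectory = np.stack([np.linspace(1, 50, 33),     # x: forward distance (m)
                       np.zeros(33),               # y: lateral offset (m), 0 = straight ahead
                       np.zeros(33)], axis=1)      # z: 0 = camera height; draw_path adds `height` to reach the ground

draw_path(trajectory, img, width=1, height=1.2)    # fills a ribbon onto img in place
cv2.imwrite('vis_path.png', img)
```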