├── README.md
├── configs
│   ├── _base_
│   │   ├── datasets
│   │   │   ├── semantickitti_lpmix.py
│   │   │   └── semantickitti_panoptic_lpmix.py
│   │   ├── default_runtime.py
│   │   └── models
│   │       ├── cylinder3d.py
│   │       └── p3former.py
│   ├── cylinder3d
│   │   └── cylinder3d_8xb2_3x_semantickitti.py
│   └── p3former
│       ├── p3former_8xb2_3x_semantickitti.py
│       ├── p3former_8xb2_3x_semantickitti_submit.py
│       └── p3former_8xb2_3x_semantickitti_trainval.py
├── datasets
│   ├── semantickitti_dataset.py
│   └── transforms
│       ├── __pycache__
│       │   ├── loading.cpython-39.pyc
│       │   └── transforms_3d.cpython-39.pyc
│       ├── formating.py
│       ├── loading.py
│       └── transforms_3d.py
├── dist_test.sh
├── dist_train.sh
├── evaluation
│   ├── functional
│   │   └── panoptic_seg_eval.py
│   └── metrics
│       └── panoptic_seg_metric.py
├── p3former
│   ├── backbones
│   │   └── cylinder3d.py
│   ├── data_preprocessors
│   │   └── data_preprocessor.py
│   ├── decode_heads
│   │   └── p3former_head.py
│   ├── segmentors
│   │   └── p3former.py
│   └── task_modules
│       └── samplers
│           └── mask_pseduo_sampler.py
├── test.py
├── tools
│   ├── create_data.py
│   └── dataset_converters
│       ├── nuscenes_converter.py
│       ├── semantickitti_converter.py
│       └── update_infos_to_v2.py
└── train.py
/README.md:
--------------------------------------------------------------------------------
1 | # P3Former: Position-Guided Point Cloud Panoptic Segmentation Transformer
2 | 
3 | ![main figure](https://user-images.githubusercontent.com/45515569/227226959-35f887e0-453b-4ac8-81c0-cb4b2f79333c.png)
4 | 
5 | ## Introduction
6 | 
7 | This is an official release of [Position-Guided Point Cloud Panoptic Segmentation Transformer](https://arxiv.org/abs/2303.13509).
8 | 
9 | 
10 | ## Abstract
11 | 
12 | DEtection TRansformer (DETR) started a trend of using a group of learnable queries for unified visual perception.
13 | This work begins by applying this appealing paradigm to LiDAR-based point cloud segmentation and obtains a simple yet effective baseline.
14 | Although the naive adaptation obtains fair results, the instance segmentation performance is noticeably inferior to previous works.
15 | By diving into the details, we observe that instances in sparse point clouds are relatively small compared to the whole scene and often have similar geometry but lack distinctive appearance for segmentation, properties that are rare in the image domain.
16 | Considering that instances in 3D are characterized more by their positional information, we emphasize its role during modeling and design a robust Mixed-parameterized Positional Embedding (MPE) to guide the segmentation process.
17 | It is embedded into backbone features and later guides the mask prediction and query update processes iteratively, leading to Position-Aware Segmentation (PA-Seg) and Masked Focal Attention (MFA).
18 | All these designs impel the queries to attend to specific regions and identify various instances.
19 | The method, named Position-guided Point cloud Panoptic segmentation transFormer (P3Former), outperforms previous state-of-the-art methods by 3.4% and 1.2% PQ on the SemanticKITTI and nuScenes benchmarks, respectively.
20 | The source code and models are available at https://github.com/SmartBot-PJLab/P3Former.
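## MPE at a glance

To make the core idea concrete, the toy sketch below shows one way a mixed-parameterized positional embedding can be formed: encode every point in both Cartesian `(x, y, z)` and cylindrical `(rho, phi, z)` coordinates, project each parameterization to the feature dimension, and combine them. This is an illustration only, not the released implementation; the specific parameterizations, the summation, and the layer sizes are assumptions made for exposition.

```python
import torch
from torch import nn


class MixedPositionalEmbedding(nn.Module):
    """Toy embedding mixing Cartesian and cylindrical parameterizations."""

    def __init__(self, embed_dims: int = 128):
        super().__init__()
        self.cart_proj = nn.Linear(3, embed_dims)  # projects (x, y, z)
        self.cyl_proj = nn.Linear(3, embed_dims)   # projects (rho, phi, z)

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        # xyz: (N, 3) point coordinates in the LiDAR frame.
        rho = torch.linalg.norm(xyz[:, :2], dim=1, keepdim=True)
        phi = torch.atan2(xyz[:, 1:2], xyz[:, 0:1])
        cyl = torch.cat([rho, phi, xyz[:, 2:3]], dim=1)
        # Combine the two parameterizations into one positional embedding.
        return self.cart_proj(xyz) + self.cyl_proj(cyl)


pe = MixedPositionalEmbedding(embed_dims=128)
points = torch.randn(1024, 3)
print(pe(points).shape)  # torch.Size([1024, 128])
```

In P3Former, the resulting embedding is added to backbone features and reused when predicting masks and updating queries (PA-Seg and MFA), as described in the abstract above.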
21 | 
22 | 
23 | 
24 | ## Results
25 | 
26 | ### SemanticKITTI test
27 | 
28 | | $\mathrm{PQ}$ | $\mathrm{PQ^{\dagger}}$ | $\mathrm{RQ}$ | $\mathrm{SQ}$ | $\mathrm{PQ}^{\mathrm{Th}}$ | $\mathrm{PQ}^{\mathrm{St}}$ | Download | Config |
29 | | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: |
30 | | 65.3 | 67.8 | 74.9 | 86.6 | 67.4 | 63.7 | [model](https://drive.google.com/drive/folders/1RBDWV-oWOQsDAhNE8Z7SMLWsriLrpWCx?usp=sharing) | [config](https://github.com/SmartBot-PJLab/P3Former/blob/semantickitti/configs/p3former/p3former_8xb2_3x_semantickitti_trainval.py) |
31 | 
32 | ### SemanticKITTI validation
33 | 
34 | | $\mathrm{PQ}$ | $\mathrm{PQ^{\dagger}}$ | $\mathrm{RQ}$ | $\mathrm{SQ}$ | $\mathrm{PQ}^{\mathrm{Th}}$ | $\mathrm{PQ}^{\mathrm{St}}$ | Download | Config |
35 | | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: |
36 | | 62.6 | 66.2 | 72.4 | 76.2 | 69.4 | 57.7 | [model](https://drive.google.com/drive/folders/1RBDWV-oWOQsDAhNE8Z7SMLWsriLrpWCx?usp=sharing) | [config](https://github.com/SmartBot-PJLab/P3Former/blob/semantickitti/configs/p3former/p3former_8xb2_3x_semantickitti.py) |
37 | 
38 | * Pretraining the backbone helps stabilize the training process and gives slightly better results. You can pretrain a model with this [config](https://github.com/SmartBot-PJLab/P3Former/blob/semantickitti/configs/cylinder3d/cylinder3d_8xb2_3x_semantickitti.py).
39 | 
40 | ## Installation
41 | 
42 | ```
43 | conda create -n p3former python==3.8 -y
44 | conda activate p3former
45 | pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html
46 | 
47 | pip install openmim
48 | mim install mmengine==0.7.4
49 | mim install mmcv==2.0.0rc4
50 | mim install mmdet==3.0.0
51 | mim install mmdet3d==1.1.0
52 | wget https://data.pyg.org/whl/torch-1.10.0%2Bcu113/torch_scatter-2.0.9-cp38-cp38-linux_x86_64.whl
53 | pip install torch_scatter-2.0.9-cp38-cp38-linux_x86_64.whl
54 | ```
55 | 
56 | ## Usage
57 | 
58 | ### Data preparation
59 | 
60 | #### SemanticKITTI
61 | 
62 | ```text
63 | data/
64 | ├── semantickitti
65 | │   ├── sequences
66 | │   │   ├── 00
67 | │   │   │   ├── labels
68 | │   │   │   ├── velodyne
69 | │   │   ├── 01
70 | │   │   ├── ...
71 | │   ├── semantickitti_infos_train.pkl
72 | │   ├── semantickitti_infos_val.pkl
73 | │   ├── semantickitti_infos_test.pkl
74 | 
75 | ```
76 | 
77 | You can generate the `*.pkl` info files by executing the command below. (The SemanticKITTI panoptic label encoding behind these files is illustrated in a short note at the end of this README.)
78 | 
79 | ```
80 | python tools/create_data.py semantickitti --root-path data/semantickitti --out-dir data/semantickitti --extra-tag semantickitti
81 | ```
82 | 
83 | ## Training and testing
84 | 
85 | ```bash
86 | # train
87 | sh dist_train.sh $CONFIG $GPUS
88 | 
89 | # val
90 | sh dist_test.sh $CONFIG $CHECKPOINT $GPUS
91 | 
92 | # test
93 | sh dist_test.sh $CONFIG $CHECKPOINT $GPUS
94 | 
95 | ```
96 | 
97 | ## Citation
98 | 
99 | ```bibtex
100 | @article{xiao2023p3former,
101 |   title={Position-Guided Point Cloud Panoptic Segmentation Transformer},
102 |   author={Xiao, Zeqi and Zhang, Wenwei and Wang, Tai and Loy, Chen Change and Lin, Dahua and Pang, Jiangmiao},
103 |   journal={arXiv preprint arXiv:2303.13509},
104 |   year={2023}
105 | }
106 | ```
107 | 
108 | 
109 | ## Acknowledgements
110 | 
111 | We thank the contributors of [MMDetection3D](https://github.com/open-mmlab/mmdetection3d) and the authors of [Cylinder3D](https://github.com/xinge008/Cylinder3D) and [K-Net](https://github.com/ZwwWayne/K-Net) for their great work.
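## Note: SemanticKITTI panoptic label encoding

The dataset configs and loading transforms in this repository use `seg_offset=2**16` because SemanticKITTI stores each point's panoptic annotation as a single 32-bit label: the lower 16 bits hold the semantic class id and the upper 16 bits hold the instance id. The snippet below is a minimal decoding sketch for orientation only (the file path is a placeholder, not a file shipped with this repository):

```python
import numpy as np

# Placeholder path to one SemanticKITTI label file.
label_path = 'data/semantickitti/sequences/00/labels/000000.label'

panoptic = np.fromfile(label_path, dtype=np.uint32)
semantic = panoptic & 0xFFFF  # lower 16 bits, i.e. panoptic % 2**16
instance = panoptic >> 16     # upper 16 bits, i.e. panoptic // 2**16
print(semantic[:5], instance[:5])
```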
112 | -------------------------------------------------------------------------------- /configs/_base_/datasets/semantickitti_lpmix.py: -------------------------------------------------------------------------------- 1 | # For SemanticKitti we usually do 19-class segmentation. 2 | # For labels_map we follow the uniform format of MMDetection & MMSegmentation 3 | # i.e. we consider the unlabeled class as the last one, which is different 4 | # from the original implementation of some methods e.g. Cylinder3D. 5 | dataset_type = '_SemanticKittiDataset' 6 | data_root = 'data/semantickitti/' 7 | class_names = [ 8 | 'car', 'bicycle', 'motorcycle', 'truck', 'bus', 'person', 'bicyclist', 9 | 'motorcyclist', 'road', 'parking', 'sidewalk', 'other-ground', 'building', 10 | 'fence', 'vegetation', 'trunck', 'terrian', 'pole', 'traffic-sign' 11 | ] 12 | labels_map = { 13 | 0: 19, # "unlabeled" 14 | 1: 19, # "outlier" mapped to "unlabeled" --------------mapped 15 | 10: 0, # "car" 16 | 11: 1, # "bicycle" 17 | 13: 4, # "bus" mapped to "other-vehicle" --------------mapped 18 | 15: 2, # "motorcycle" 19 | 16: 4, # "on-rails" mapped to "other-vehicle" ---------mapped 20 | 18: 3, # "truck" 21 | 20: 4, # "other-vehicle" 22 | 30: 5, # "person" 23 | 31: 6, # "bicyclist" 24 | 32: 7, # "motorcyclist" 25 | 40: 8, # "road" 26 | 44: 9, # "parking" 27 | 48: 10, # "sidewalk" 28 | 49: 11, # "other-ground" 29 | 50: 12, # "building" 30 | 51: 13, # "fence" 31 | 52: 19, # "other-structure" mapped to "unlabeled" ------mapped 32 | 60: 8, # "lane-marking" to "road" ---------------------mapped 33 | 70: 14, # "vegetation" 34 | 71: 15, # "trunk" 35 | 72: 16, # "terrain" 36 | 80: 17, # "pole" 37 | 81: 18, # "traffic-sign" 38 | 99: 19, # "other-object" to "unlabeled" ----------------mapped 39 | 252: 0, # "moving-car" to "car" ------------------------mapped 40 | 253: 6, # "moving-bicyclist" to "bicyclist" ------------mapped 41 | 254: 5, # "moving-person" to "person" ------------------mapped 42 | 255: 7, # "moving-motorcyclist" to "motorcyclist" ------mapped 43 | 256: 4, # "moving-on-rails" mapped to "other-vehic------mapped 44 | 257: 4, # "moving-bus" mapped to "other-vehicle" -------mapped 45 | 258: 3, # "moving-truck" to "truck" --------------------mapped 46 | 259: 4 # "moving-other"-vehicle to "other-vehicle"-----mapped 47 | } 48 | 49 | metainfo = dict( 50 | classes=class_names, seg_label_mapping=labels_map, max_label=259) 51 | 52 | input_modality = dict(use_lidar=True, use_camera=False) 53 | 54 | # Example to use different file client 55 | # Method 1: simply set the data root and let the file I/O module 56 | # automatically infer from prefix (not support LMDB and Memcache yet) 57 | 58 | # data_root = 's3://openmmlab/datasets/detection3d/semantickitti/' 59 | 60 | # Method 2: Use backend_args, file_client_args in versions before 1.1.0rc4 61 | # backend_args = dict( 62 | # backend='petrel', 63 | # path_mapping=dict({ 64 | # './data/': 's3://openmmlab/datasets/detection3d/', 65 | # 'data/': 's3://openmmlab/datasets/detection3d/' 66 | # })) 67 | backend_args = None 68 | 69 | 70 | train_pipeline = [ 71 | dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4), 72 | dict( 73 | type='LoadAnnotations3D', 74 | with_bbox_3d=False, 75 | with_label_3d=False, 76 | with_seg_3d=True, 77 | seg_3d_dtype='np.int32', 78 | seg_offset=2**16, 79 | dataset_type='semantickitti'), 80 | dict(type='PointSegClassMapping'), 81 | dict( 82 | type='RandomChoice', 83 | transforms=[ 84 | [ 85 | dict( 86 | type='LaserMix', 87 | num_areas=[3, 4, 5, 
6], 88 | pitch_angles=[-25, 3], 89 | pre_transform=[ 90 | dict( 91 | type='LoadPointsFromFile', 92 | coord_type='LIDAR', 93 | load_dim=4, 94 | use_dim=4), 95 | dict( 96 | type='LoadAnnotations3D', 97 | with_bbox_3d=False, 98 | with_label_3d=False, 99 | with_seg_3d=True, 100 | seg_3d_dtype='np.int32', 101 | seg_offset=2**16, 102 | dataset_type='semantickitti'), 103 | dict(type='PointSegClassMapping') 104 | ], 105 | prob=1) 106 | ], 107 | [ 108 | dict( 109 | type='PolarMix', 110 | instance_classes=[0, 1, 2, 3, 4, 5, 6, 7], 111 | swap_ratio=0.5, 112 | rotate_paste_ratio=1.0, 113 | pre_transform=[ 114 | dict( 115 | type='LoadPointsFromFile', 116 | coord_type='LIDAR', 117 | load_dim=4, 118 | use_dim=4), 119 | dict( 120 | type='LoadAnnotations3D', 121 | with_bbox_3d=False, 122 | with_label_3d=False, 123 | with_seg_3d=True, 124 | seg_3d_dtype='np.int32', 125 | seg_offset=2**16, 126 | dataset_type='semantickitti'), 127 | dict(type='PointSegClassMapping') 128 | ], 129 | prob=1) 130 | ], 131 | ], 132 | prob=[0.5, 0.5]), 133 | dict( 134 | type='RandomFlip3D', 135 | sync_2d=False, 136 | flip_ratio_bev_horizontal=0.5, 137 | flip_ratio_bev_vertical=0.5), 138 | dict( 139 | type='GlobalRotScaleTrans', 140 | rot_range=[-0.78539816, 0.78539816], 141 | scale_ratio_range=[0.95, 1.05], 142 | translation_std=[0.1, 0.1, 0.1], 143 | ), 144 | dict(type='Pack3DDetInputs', keys=['points', 'pts_semantic_mask']) 145 | ] 146 | 147 | 148 | test_pipeline = [ 149 | dict( 150 | type='LoadPointsFromFile', 151 | coord_type='LIDAR', 152 | load_dim=4, 153 | use_dim=4, 154 | backend_args=backend_args), 155 | dict( 156 | type='LoadAnnotations3D', 157 | with_bbox_3d=False, 158 | with_label_3d=False, 159 | with_seg_3d=True, 160 | seg_3d_dtype='np.int32', 161 | seg_offset=2**16, 162 | dataset_type='semantickitti', 163 | backend_args=backend_args), 164 | dict(type='PointSegClassMapping', ), 165 | dict(type='Pack3DDetInputs', keys=['points', 'pts_semantic_mask']) 166 | ] 167 | # construct a pipeline for data and gt loading in show function 168 | # please keep its loading function consistent with test_pipeline (e.g. 
client) 169 | eval_pipeline = [ 170 | dict( 171 | type='LoadPointsFromFile', 172 | coord_type='LIDAR', 173 | load_dim=4, 174 | use_dim=4, 175 | backend_args=backend_args), 176 | dict( 177 | type='LoadAnnotations3D', 178 | with_bbox_3d=False, 179 | with_label_3d=False, 180 | with_seg_3d=True, 181 | seg_3d_dtype='np.int32', 182 | seg_offset=2**16, 183 | dataset_type='semantickitti', 184 | backend_args=backend_args), 185 | dict(type='PointSegClassMapping', ), 186 | dict(type='Pack3DDetInputs', keys=['points', 'pts_semantic_mask']) 187 | ] 188 | 189 | train_dataloader = dict( 190 | batch_size=2, 191 | num_workers=4, 192 | sampler=dict(type='DefaultSampler', shuffle=True), 193 | dataset=dict( 194 | type='RepeatDataset', 195 | times=1, 196 | dataset=dict( 197 | type=dataset_type, 198 | data_root=data_root, 199 | ann_file='semantickitti_infos_train.pkl', 200 | pipeline=train_pipeline, 201 | metainfo=metainfo, 202 | modality=input_modality, 203 | ignore_index=19, 204 | backend_args=backend_args)), 205 | ) 206 | 207 | test_dataloader = dict( 208 | batch_size=1, 209 | num_workers=1, 210 | sampler=dict(type='DefaultSampler', shuffle=False), 211 | dataset=dict( 212 | type='RepeatDataset', 213 | times=1, 214 | dataset=dict( 215 | type=dataset_type, 216 | data_root=data_root, 217 | ann_file='semantickitti_infos_val.pkl', 218 | pipeline=test_pipeline, 219 | metainfo=metainfo, 220 | modality=input_modality, 221 | ignore_index=19, 222 | test_mode=True, 223 | backend_args=backend_args)), 224 | ) 225 | 226 | val_dataloader = test_dataloader 227 | 228 | val_evaluator = dict(type='SegMetric') 229 | test_evaluator = val_evaluator 230 | 231 | vis_backends = [dict(type='LocalVisBackend')] 232 | visualizer = dict( 233 | type='Det3DLocalVisualizer', vis_backends=vis_backends, name='visualizer') -------------------------------------------------------------------------------- /configs/_base_/datasets/semantickitti_panoptic_lpmix.py: -------------------------------------------------------------------------------- 1 | # For SemanticKitti we usually do 19-class segmentation. 2 | # For labels_map we follow the uniform format of MMDetection & MMSegmentation 3 | # i.e. we consider the unlabeled class as the last one, which is different 4 | # from the original implementation of some methods e.g. Cylinder3D. 
5 | dataset_type = '_SemanticKittiDataset' 6 | data_root = 'data/semantickitti/' 7 | class_names = [ 8 | 'car', 'bicycle', 'motorcycle', 'truck', 'bus', 'person', 'bicyclist', 9 | 'motorcyclist', 'road', 'parking', 'sidewalk', 'other-ground', 'building', 10 | 'fence', 'vegetation', 'trunck', 'terrian', 'pole', 'traffic-sign' 11 | ] 12 | labels_map = { 13 | 0: 19, # "unlabeled" 14 | 1: 19, # "outlier" mapped to "unlabeled" --------------mapped 15 | 10: 0, # "car" 16 | 11: 1, # "bicycle" 17 | 13: 4, # "bus" mapped to "other-vehicle" --------------mapped 18 | 15: 2, # "motorcycle" 19 | 16: 4, # "on-rails" mapped to "other-vehicle" ---------mapped 20 | 18: 3, # "truck" 21 | 20: 4, # "other-vehicle" 22 | 30: 5, # "person" 23 | 31: 6, # "bicyclist" 24 | 32: 7, # "motorcyclist" 25 | 40: 8, # "road" 26 | 44: 9, # "parking" 27 | 48: 10, # "sidewalk" 28 | 49: 11, # "other-ground" 29 | 50: 12, # "building" 30 | 51: 13, # "fence" 31 | 52: 19, # "other-structure" mapped to "unlabeled" ------mapped 32 | 60: 8, # "lane-marking" to "road" ---------------------mapped 33 | 70: 14, # "vegetation" 34 | 71: 15, # "trunk" 35 | 72: 16, # "terrain" 36 | 80: 17, # "pole" 37 | 81: 18, # "traffic-sign" 38 | 99: 19, # "other-object" to "unlabeled" ----------------mapped 39 | 252: 0, # "moving-car" to "car" ------------------------mapped 40 | 253: 6, # "moving-bicyclist" to "bicyclist" ------------mapped 41 | 254: 5, # "moving-person" to "person" ------------------mapped 42 | 255: 7, # "moving-motorcyclist" to "motorcyclist" ------mapped 43 | 256: 4, # "moving-on-rails" mapped to "other-vehic------mapped 44 | 257: 4, # "moving-bus" mapped to "other-vehicle" -------mapped 45 | 258: 3, # "moving-truck" to "truck" --------------------mapped 46 | 259: 4 # "moving-other"-vehicle to "other-vehicle"-----mapped 47 | } 48 | 49 | learning_map_inv = { # inverse of previous map 50 | 0: 10, # "unlabeled", and others ignored 51 | 1: 11, # "car" 52 | 2: 15, # "bicycle" 53 | 3: 18, # "motorcycle" 54 | 4: 20, # "truck" 55 | 5: 30, # "other-vehicle" 56 | 6: 31, # "person" 57 | 7: 32, # "bicyclist" 58 | 8: 40, # "motorcyclist" 59 | 9: 44, # "road" 60 | 10: 48, # "parking" 61 | 11: 49, # "sidewalk" 62 | 12: 50, # "other-ground" 63 | 13: 51, # "building" 64 | 14: 70, # "fence" 65 | 15: 71, # "vegetation" 66 | 16: 72, # "trunk" 67 | 17: 80, # "terrain" 68 | 18: 81, # "pole" 69 | 19: 0 # "traffic-sign" 70 | } 71 | 72 | metainfo = dict( 73 | classes=class_names, seg_label_mapping=labels_map, max_label=259) 74 | 75 | input_modality = dict(use_lidar=True, use_camera=False) 76 | 77 | # Example to use different file client 78 | # Method 1: simply set the data root and let the file I/O module 79 | # automatically infer from prefix (not support LMDB and Memcache yet) 80 | 81 | # data_root = 's3://openmmlab/datasets/detection3d/semantickitti/' 82 | 83 | # Method 2: Use backend_args, file_client_args in versions before 1.1.0rc4 84 | # backend_args = dict( 85 | # backend='petrel', 86 | # path_mapping=dict({ 87 | # './data/': 's3://openmmlab/datasets/detection3d/', 88 | # 'data/': 's3://openmmlab/datasets/detection3d/' 89 | # })) 90 | backend_args = None 91 | 92 | pre_transform = [ 93 | dict( 94 | type='LoadPointsFromFile', 95 | coord_type='LIDAR', 96 | load_dim=4, 97 | use_dim=4, 98 | backend_args=backend_args), 99 | dict( 100 | type='_LoadAnnotations3D', 101 | with_bbox_3d=False, 102 | with_label_3d=False, 103 | with_panoptic_3d=True, 104 | seg_3d_dtype='np.int32', 105 | seg_offset=2**16, 106 | dataset_type='semantickitti', 107 | 
backend_args=backend_args), 108 | dict(type='PointSegClassMapping', )] 109 | 110 | train_pipeline = [ 111 | dict( 112 | type='LoadPointsFromFile', 113 | coord_type='LIDAR', 114 | load_dim=4, 115 | use_dim=4, 116 | backend_args=backend_args), 117 | dict( 118 | type='_LoadAnnotations3D', 119 | with_bbox_3d=False, 120 | with_label_3d=False, 121 | with_panoptic_3d=True, 122 | seg_3d_dtype='np.int32', 123 | seg_offset=2**16, 124 | dataset_type='semantickitti', 125 | backend_args=backend_args), 126 | dict(type='PointSegClassMapping', ), 127 | dict( 128 | type='RandomChoice', 129 | transforms=[ 130 | [ 131 | dict( 132 | type='_LaserMix', 133 | num_areas=[3, 4, 5, 6], 134 | pitch_angles=[-25, 3], 135 | pre_transform=[ 136 | dict( 137 | type='LoadPointsFromFile', 138 | coord_type='LIDAR', 139 | load_dim=4, 140 | use_dim=4), 141 | dict( 142 | type='_LoadAnnotations3D', 143 | with_bbox_3d=False, 144 | with_label_3d=False, 145 | with_panoptic_3d=True, 146 | seg_3d_dtype='np.int32', 147 | seg_offset=2**16, 148 | dataset_type='semantickitti'), 149 | dict(type='PointSegClassMapping') 150 | ], 151 | prob=0.5) 152 | ], 153 | [ 154 | dict( 155 | type='_PolarMix', 156 | instance_classes=[0, 1, 2, 3, 4, 5, 6, 7], 157 | swap_ratio=0.5, 158 | rotate_paste_ratio=1.0, 159 | pre_transform=[ 160 | dict( 161 | type='LoadPointsFromFile', 162 | coord_type='LIDAR', 163 | load_dim=4, 164 | use_dim=4), 165 | dict( 166 | type='_LoadAnnotations3D', 167 | with_bbox_3d=False, 168 | with_label_3d=False, 169 | with_panoptic_3d=True, 170 | seg_3d_dtype='np.int32', 171 | seg_offset=2**16, 172 | dataset_type='semantickitti'), 173 | dict(type='PointSegClassMapping') 174 | ], 175 | prob=0.5) 176 | ], 177 | ], 178 | prob=[0.2, 0.8]), 179 | dict( 180 | type='RandomFlip3D', 181 | sync_2d=False, 182 | flip_ratio_bev_horizontal=0.5, 183 | flip_ratio_bev_vertical=0.5), 184 | dict( 185 | type='GlobalRotScaleTrans', 186 | rot_range=[-0.78539816, 0.78539816], 187 | scale_ratio_range=[0.95, 1.05], 188 | translation_std=[0.1, 0.1, 0.1], 189 | ), 190 | dict(type='Pack3DDetInputs', keys=['points', 'pts_semantic_mask', 'pts_instance_mask']) 191 | ] 192 | 193 | test_pipeline = [ 194 | dict( 195 | type='LoadPointsFromFile', 196 | coord_type='LIDAR', 197 | load_dim=4, 198 | use_dim=4, 199 | backend_args=backend_args), 200 | dict( 201 | type='_LoadAnnotations3D', 202 | with_bbox_3d=False, 203 | with_label_3d=False, 204 | with_panoptic_3d=True, 205 | seg_3d_dtype='np.int32', 206 | seg_offset=2**16, 207 | dataset_type='semantickitti', 208 | backend_args=backend_args), 209 | dict(type='PointSegClassMapping', ), 210 | dict(type='Pack3DDetInputs', keys=['points', 'pts_semantic_mask', 'pts_instance_mask']) 211 | ] 212 | 213 | 214 | train_dataloader = dict( 215 | batch_size=4, 216 | num_workers=4, 217 | sampler=dict(type='DefaultSampler', shuffle=True), 218 | dataset=dict( 219 | type='RepeatDataset', 220 | times=1, 221 | dataset=dict( 222 | type=dataset_type, 223 | data_root=data_root, 224 | data_prefix = dict( 225 | pts='', 226 | img='', 227 | pts_instance_mask='', 228 | pts_semantic_mask='', 229 | pts_panoptic_mask='',), 230 | ann_file='semantickitti_infos_train.pkl', 231 | pipeline=train_pipeline, 232 | metainfo=metainfo, 233 | modality=input_modality, 234 | ignore_index=19, 235 | backend_args=backend_args)), 236 | ) 237 | 238 | test_dataloader = dict( 239 | batch_size=1, 240 | num_workers=1, 241 | sampler=dict(type='DefaultSampler', shuffle=False), 242 | dataset=dict( 243 | type='RepeatDataset', 244 | times=1, 245 | dataset=dict( 246 | 
type=dataset_type, 247 | data_root=data_root, 248 | data_prefix = dict( 249 | pts='', 250 | img='', 251 | pts_instance_mask='', 252 | pts_semantic_mask='', 253 | pts_panoptic_mask='',), 254 | ann_file='semantickitti_infos_val.pkl', 255 | pipeline=test_pipeline, 256 | metainfo=metainfo, 257 | modality=input_modality, 258 | ignore_index=19, 259 | test_mode=True, 260 | backend_args=backend_args)), 261 | ) 262 | 263 | val_dataloader = test_dataloader 264 | 265 | val_evaluator = dict(type='_PanopticSegMetric', 266 | thing_class_inds=[0,1,2,3,4,5,6,7], 267 | stuff_class_inds=[8,9,10,11,12,13,14,15,16,17,18], 268 | min_num_points=50, 269 | id_offset = 2**16, 270 | dataset_type='semantickitti', 271 | learning_map_inv=learning_map_inv) 272 | test_evaluator = val_evaluator 273 | 274 | vis_backends = [dict(type='LocalVisBackend')] 275 | visualizer = dict( 276 | type='Det3DLocalVisualizer', vis_backends=vis_backends, name='visualizer') 277 | -------------------------------------------------------------------------------- /configs/_base_/default_runtime.py: -------------------------------------------------------------------------------- 1 | default_scope = 'mmdet3d' 2 | 3 | default_hooks = dict( 4 | timer=dict(type='IterTimerHook'), 5 | logger=dict(type='LoggerHook', interval=50), 6 | param_scheduler=dict(type='ParamSchedulerHook'), 7 | checkpoint=dict(type='CheckpointHook', interval=-1), 8 | sampler_seed=dict(type='DistSamplerSeedHook'), 9 | visualization=dict(type='Det3DVisualizationHook')) 10 | 11 | env_cfg = dict( 12 | cudnn_benchmark=False, 13 | mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0), 14 | dist_cfg=dict(backend='nccl'), 15 | ) 16 | 17 | log_processor = dict(type='LogProcessor', window_size=50, by_epoch=True) 18 | 19 | log_level = 'INFO' 20 | load_from = None 21 | resume = False 22 | 23 | # TODO: support auto scaling lr 24 | -------------------------------------------------------------------------------- /configs/_base_/models/cylinder3d.py: -------------------------------------------------------------------------------- 1 | grid_shape = [480, 360, 32] 2 | model = dict( 3 | type='Cylinder3D', 4 | data_preprocessor=dict( 5 | type='Det3DDataPreprocessor', 6 | voxel=True, 7 | voxel_type='cylindrical', 8 | voxel_layer=dict( 9 | grid_shape=grid_shape, 10 | point_cloud_range=[0, -3.14159265359, -4, 50, 3.14159265359, 2], 11 | max_num_points=-1, 12 | max_voxels=-1, 13 | ), 14 | ), 15 | voxel_encoder=dict( 16 | type='SegVFE', 17 | feat_channels=[64, 128, 256, 256], 18 | in_channels=6, 19 | with_voxel_center=True, 20 | feat_compression=16, 21 | return_point_feats=False), 22 | backbone=dict( 23 | type='Asymm3DSpconv', 24 | grid_size=grid_shape, 25 | input_channels=16, 26 | base_channels=32, 27 | norm_cfg=dict(type='BN1d', eps=1e-5, momentum=0.1)), 28 | decode_head=dict( 29 | type='Cylinder3DHead', 30 | channels=128, 31 | num_classes=20, 32 | loss_ce=dict( 33 | type='mmdet.CrossEntropyLoss', 34 | use_sigmoid=False, 35 | class_weight=None, 36 | loss_weight=1.0), 37 | loss_lovasz=dict(type='LovaszLoss', loss_weight=1.0, reduction='none'), 38 | ), 39 | train_cfg=None, 40 | test_cfg=dict(mode='whole'), 41 | ) 42 | -------------------------------------------------------------------------------- /configs/_base_/models/p3former.py: -------------------------------------------------------------------------------- 1 | grid_shape = [480, 360, 32] 2 | model = dict( 3 | type='_P3Former', 4 | data_preprocessor=dict( 5 | type='_Det3DDataPreprocessor', 6 | voxel=True, 7 | voxel_type='cylindrical', 8 | 
voxel_layer=dict( 9 | grid_shape=grid_shape, 10 | point_cloud_range=[0, -3.14159265359, -4, 50, 3.14159265359, 2], 11 | max_num_points=-1, 12 | max_voxels=-1, 13 | ), 14 | ), 15 | voxel_encoder=dict( 16 | type='SegVFE', 17 | feat_channels=[64, 128, 256, 256], 18 | in_channels=6, 19 | with_voxel_center=True, 20 | feat_compression=16, 21 | return_point_feats=False), 22 | backbone=dict( 23 | type='_Asymm3DSpconv', 24 | grid_size=grid_shape, 25 | input_channels=16, 26 | base_channels=32, 27 | norm_cfg=dict(type='BN1d', eps=1e-5, momentum=0.1)), 28 | decode_head=dict( 29 | type='_P3FormerHead', 30 | num_classes=20, 31 | num_queries=128, 32 | embed_dims=128, 33 | point_cloud_range=[0, -3.14159265359, -4, 50, 3.14159265359, 2], 34 | assigner_zero_layer_cfg=dict( 35 | type='mmdet.HungarianAssigner', 36 | match_costs=[ 37 | dict(type='mmdet.FocalLossCost', weight=1.0, binary_input=True, gamma=2.0, alpha=0.25), 38 | dict(type='mmdet.DiceCost', weight=2.0, pred_act=True), 39 | ]), 40 | assigner_cfg=dict( 41 | type='mmdet.HungarianAssigner', 42 | match_costs=[ 43 | dict(type='mmdet.FocalLossCost', gamma=4.0,alpha=0.25,weight=1.0), 44 | dict(type='mmdet.FocalLossCost', weight=1.0, binary_input=True, gamma=2.0, alpha=0.25), 45 | dict(type='mmdet.DiceCost', weight=2.0, pred_act=True), 46 | ]), 47 | sampler_cfg=dict(type='_MaskPseudoSampler'), 48 | loss_mask=dict( 49 | type='mmdet.FocalLoss', 50 | use_sigmoid=True, 51 | gamma=2.0, 52 | alpha=0.25, 53 | reduction='mean', 54 | loss_weight=1.0), 55 | loss_dice=dict(type='mmdet.DiceLoss', loss_weight=2.0), 56 | loss_cls=dict( 57 | type='mmdet.FocalLoss', 58 | use_sigmoid=True, 59 | gamma=4.0, 60 | alpha=0.25, 61 | loss_weight=1.0), 62 | ), 63 | train_cfg=None, 64 | test_cfg=dict(mode='whole'), 65 | ) 66 | -------------------------------------------------------------------------------- /configs/cylinder3d/cylinder3d_8xb2_3x_semantickitti.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '../_base_/datasets/semantickitti_lpmix.py', '../_base_/models/cylinder3d.py', 3 | '../_base_/default_runtime.py' 4 | ] 5 | 6 | # optimizer 7 | # This schedule is mainly used by models on nuScenes dataset 8 | lr = 0.008 9 | optim_wrapper = dict( 10 | type='OptimWrapper', 11 | optimizer=dict(type='AdamW', lr=lr, weight_decay=0.01), 12 | # max_norm=10 is better for SECOND 13 | clip_grad=dict(max_norm=10, norm_type=2)) 14 | 15 | # training schedule for 2x 16 | train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=36, val_interval=1) 17 | val_cfg = dict(type='ValLoop') 18 | test_cfg = dict(type='TestLoop') 19 | 20 | # learning rate 21 | param_scheduler = [ 22 | dict( 23 | type='MultiStepLR', 24 | begin=0, 25 | end=36, 26 | by_epoch=True, 27 | milestones=[24, 32], 28 | gamma=0.1) 29 | ] -------------------------------------------------------------------------------- /configs/p3former/p3former_8xb2_3x_semantickitti.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '../_base_/datasets/semantickitti_panoptic_lpmix.py', '../_base_/models/p3former.py', 3 | '../_base_/default_runtime.py' 4 | ] 5 | 6 | # optimizer 7 | # This schedule is mainly used by models on nuScenes dataset 8 | 9 | train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=40, val_interval=1) 10 | val_cfg = dict(type='ValLoop') 11 | test_cfg = dict(type='TestLoop') 12 | 13 | model = dict( 14 | voxel_encoder=dict( 15 | feat_channels=[64, 128, 256, 256], 16 | in_channels=6, 17 | with_voxel_center=True, 18 | 
feat_compression=16, 19 | return_point_feats=False), 20 | backbone=dict( 21 | input_channels=16, 22 | base_channels=32, 23 | more_conv=True, 24 | out_channels=256), 25 | decode_head=dict( 26 | num_decoder_layers=6, 27 | num_queries=128, 28 | embed_dims=256, 29 | cls_channels=(256, 256, 20), 30 | mask_channels=(256, 256, 256, 256, 256), 31 | thing_class=[0,1,2,3,4,5,6,7], 32 | stuff_class=[8,9,10,11,12,13,14,15,16,17,18], 33 | ignore_index=19 34 | )) 35 | 36 | 37 | lr = 0.0008 38 | optim_wrapper = dict( 39 | type='OptimWrapper', 40 | optimizer=dict(type='AdamW', lr=lr, weight_decay=0.01)) 41 | 42 | 43 | train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=36, val_interval=1) 44 | val_cfg = dict(type='ValLoop') 45 | test_cfg = dict(type='TestLoop') 46 | 47 | param_scheduler = [ 48 | dict( 49 | type='MultiStepLR', 50 | begin=0, 51 | end=36, 52 | by_epoch=True, 53 | milestones=[24, 32], 54 | gamma=0.2) 55 | ] 56 | 57 | train_dataloader = dict(batch_size=2, ) 58 | 59 | default_hooks = dict(checkpoint=dict(type='CheckpointHook', interval=5)) 60 | 61 | custom_imports = dict( 62 | imports=[ 63 | 'p3former.backbones.cylinder3d', 64 | 'p3former.data_preprocessors.data_preprocessor', 65 | 'p3former.decode_heads.p3former_head', 66 | 'p3former.segmentors.p3former', 67 | 'p3former.task_modules.samplers.mask_pseduo_sampler', 68 | 'evaluation.metrics.panoptic_seg_metric', 69 | 'datasets.semantickitti_dataset', 70 | 'datasets.transforms.loading', 71 | 'datasets.transforms.transforms_3d', 72 | ], 73 | allow_failed_imports=False) 74 | -------------------------------------------------------------------------------- /configs/p3former/p3former_8xb2_3x_semantickitti_submit.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '../_base_/datasets/semantickitti_panoptic_lpmix.py', '../_base_/models/p3former.py', 3 | '../_base_/default_runtime.py' 4 | ] 5 | 6 | # optimizer 7 | # This schedule is mainly used by models on nuScenes dataset 8 | 9 | train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=40, val_interval=1) 10 | val_cfg = dict(type='ValLoop') 11 | test_cfg = dict(type='TestLoop') 12 | 13 | model = dict( 14 | voxel_encoder=dict( 15 | feat_channels=[64, 128, 256, 256], 16 | in_channels=6, 17 | with_voxel_center=True, 18 | feat_compression=16, 19 | return_point_feats=False), 20 | backbone=dict( 21 | input_channels=16, 22 | base_channels=32, 23 | more_conv=True, 24 | out_channels=256), 25 | decode_head=dict( 26 | num_decoder_layers=6, 27 | num_queries=128, 28 | embed_dims=256, 29 | cls_channels=(256, 256, 20), 30 | mask_channels=(256, 256, 256, 256, 256), 31 | thing_class=[0,1,2,3,4,5,6,7], 32 | stuff_class=[8,9,10,11,12,13,14,15,16,17,18], 33 | ignore_index=19 34 | )) 35 | 36 | 37 | lr = 0.0008 38 | optim_wrapper = dict( 39 | type='OptimWrapper', 40 | optimizer=dict(type='AdamW', lr=lr, weight_decay=0.01)) 41 | 42 | 43 | train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=36, val_interval=1) 44 | val_cfg = dict(type='ValLoop') 45 | test_cfg = dict(type='TestLoop') 46 | 47 | param_scheduler = [ 48 | dict( 49 | type='MultiStepLR', 50 | begin=0, 51 | end=36, 52 | by_epoch=True, 53 | milestones=[24, 32], 54 | gamma=0.2) 55 | ] 56 | 57 | train_dataloader = dict(batch_size=2, ) 58 | 59 | test_pipeline = [ 60 | dict( 61 | type='LoadPointsFromFile', 62 | coord_type='LIDAR', 63 | load_dim=4, 64 | use_dim=4, 65 | backend_args=None), 66 | dict(type='_Pack3DDetInputs', keys=['points', 'lidar_path']) 67 | ] 68 | 69 | test_dataloader = dict( 70 | batch_size=1, 71 
| num_workers=1, 72 | sampler=dict(type='DefaultSampler', shuffle=False), 73 | dataset=dict( 74 | type='RepeatDataset', 75 | times=1, 76 | dataset=dict( 77 | pipeline=test_pipeline, 78 | ann_file='semantickitti_infos_mini.pkl')) 79 | ) 80 | 81 | test_evaluator = dict(submission_prefix='semantickitti_submission') 82 | 83 | default_hooks = dict(checkpoint=dict(type='CheckpointHook', interval=5)) 84 | 85 | custom_imports = dict( 86 | imports=[ 87 | 'p3former.backbones.cylinder3d', 88 | 'p3former.data_preprocessors.data_preprocessor', 89 | 'p3former.decode_heads.p3former_head', 90 | 'p3former.segmentors.p3former', 91 | 'p3former.task_modules.samplers.mask_pseduo_sampler', 92 | 'evaluation.metrics.panoptic_seg_metric', 93 | 'datasets.semantickitti_dataset', 94 | 'datasets.transforms.loading', 95 | 'datasets.transforms.transforms_3d', 96 | 'datasets.transforms.formating', 97 | ], 98 | allow_failed_imports=False) 99 | -------------------------------------------------------------------------------- /configs/p3former/p3former_8xb2_3x_semantickitti_trainval.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '../_base_/datasets/semantickitti_panoptic_lpmix.py', '../_base_/models/p3former.py', 3 | '../_base_/default_runtime.py' 4 | ] 5 | 6 | # optimizer 7 | # This schedule is mainly used by models on nuScenes dataset 8 | 9 | train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=40, val_interval=1) 10 | val_cfg = dict(type='ValLoop') 11 | test_cfg = dict(type='TestLoop') 12 | 13 | model = dict( 14 | voxel_encoder=dict( 15 | feat_channels=[64, 128, 256, 256], 16 | in_channels=6, 17 | with_voxel_center=True, 18 | feat_compression=16, 19 | return_point_feats=False), 20 | backbone=dict( 21 | input_channels=16, 22 | base_channels=32, 23 | more_conv=True, 24 | out_channels=256), 25 | decode_head=dict( 26 | num_decoder_layers=6, 27 | num_queries=128, 28 | embed_dims=256, 29 | cls_channels=(256, 256, 20), 30 | mask_channels=(256, 256, 256, 256, 256), 31 | thing_class=[0,1,2,3,4,5,6,7], 32 | stuff_class=[8,9,10,11,12,13,14,15,16,17,18], 33 | ignore_index=19 34 | )) 35 | 36 | 37 | lr = 0.0008 38 | optim_wrapper = dict( 39 | type='OptimWrapper', 40 | optimizer=dict(type='AdamW', lr=lr, weight_decay=0.01)) 41 | 42 | 43 | train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=36, val_interval=1) 44 | val_cfg = dict(type='ValLoop') 45 | test_cfg = dict(type='TestLoop') 46 | 47 | param_scheduler = [ 48 | dict( 49 | type='MultiStepLR', 50 | begin=0, 51 | end=36, 52 | by_epoch=True, 53 | milestones=[24, 32], 54 | gamma=0.2) 55 | ] 56 | 57 | train_dataloader = dict(batch_size=2, dataset=dict(dataset=dict(ann_file='semantickitti_infos_trainval.pkl'))) 58 | 59 | default_hooks = dict(checkpoint=dict(type='CheckpointHook', interval=5)) 60 | 61 | custom_imports = dict( 62 | imports=[ 63 | 'p3former.backbones.cylinder3d', 64 | 'p3former.data_preprocessors.data_preprocessor', 65 | 'p3former.decode_heads.p3former_head', 66 | 'p3former.segmentors.p3former', 67 | 'p3former.task_modules.samplers.mask_pseduo_sampler', 68 | 'evaluation.metrics.panoptic_seg_metric', 69 | 'datasets.semantickitti_dataset', 70 | 'datasets.transforms.loading', 71 | 'datasets.transforms.transforms_3d', 72 | ], 73 | allow_failed_imports=False) 74 | -------------------------------------------------------------------------------- /datasets/semantickitti_dataset.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 
2 | # Modified from mmdetection3d. 3 | from typing import Callable, List, Optional, Union 4 | 5 | import numpy as np 6 | 7 | from mmdet3d.registry import DATASETS 8 | from mmdet3d.datasets.seg3d_dataset import Seg3DDataset 9 | 10 | from os import path as osp 11 | 12 | @DATASETS.register_module(force=True) 13 | class _SemanticKittiDataset(Seg3DDataset): 14 | r"""SemanticKitti Dataset. 15 | 16 | This class serves as the API for experiments on the SemanticKITTI Dataset 17 | Please refer to `_ 18 | for data downloading 19 | 20 | Args: 21 | data_root (str, optional): Path of dataset root. Defaults to None. 22 | ann_file (str): Path of annotation file. Defaults to ''. 23 | metainfo (dict, optional): Meta information for dataset, such as class 24 | information. Defaults to None. 25 | data_prefix (dict): Prefix for training data. Defaults to 26 | dict(pts='', 27 | img='', 28 | pts_instance_mask='', 29 | pts_semantic_mask=''). 30 | pipeline (List[dict]): Pipeline used for data processing. 31 | Defaults to []. 32 | modality (dict): Modality to specify the sensor data used as input, 33 | it usually has following keys: 34 | 35 | - use_camera: bool 36 | - use_lidar: bool 37 | Defaults to dict(use_lidar=True, use_camera=False). 38 | ignore_index (int, optional): The label index to be ignored, e.g. 39 | unannotated points. If None is given, set to len(self.classes) to 40 | be consistent with PointSegClassMapping function in pipeline. 41 | Defaults to None. 42 | scene_idxs (np.ndarray or str, optional): Precomputed index to load 43 | data. For scenes with many points, we may sample it several times. 44 | Defaults to None. 45 | test_mode (bool): Whether the dataset is in test mode. 46 | Defaults to False. 47 | """ 48 | METAINFO = { 49 | 'classes': ('car', 'bicycle', 'motorcycle', 'truck', 'bus', 'person', 50 | 'bicyclist', 'motorcyclist', 'road', 'parking', 'sidewalk', 51 | 'other-ground', 'building', 'fence', 'vegetation', 52 | 'trunck', 'terrian', 'pole', 'traffic-sign'), 53 | 'palette': [[100, 150, 245], [100, 230, 245], [30, 60, 150], 54 | [80, 30, 180], [100, 80, 250], [155, 30, 30], 55 | [255, 40, 200], [150, 30, 90], [255, 0, 255], 56 | [255, 150, 255], [75, 0, 75], [175, 0, 75], [255, 200, 0], 57 | [255, 120, 50], [0, 175, 0], [135, 60, 0], [150, 240, 80], 58 | [255, 240, 150], [255, 0, 0]], 59 | 'seg_valid_class_ids': 60 | tuple(range(19)), 61 | 'seg_all_class_ids': 62 | tuple(range(19)), 63 | } 64 | 65 | def __init__(self, 66 | data_root: Optional[str] = None, 67 | ann_file: str = '', 68 | metainfo: Optional[dict] = None, 69 | data_prefix: dict = dict( 70 | pts='', 71 | img='', 72 | pts_instance_mask='', 73 | pts_semantic_mask=''), 74 | pipeline: List[Union[dict, Callable]] = [], 75 | modality: dict = dict(use_lidar=True, use_camera=False), 76 | ignore_index: Optional[int] = None, 77 | scene_idxs: Optional[Union[str, np.ndarray]] = None, 78 | test_mode: bool = False, 79 | **kwargs) -> None: 80 | 81 | super().__init__( 82 | data_root=data_root, 83 | ann_file=ann_file, 84 | metainfo=metainfo, 85 | data_prefix=data_prefix, 86 | pipeline=pipeline, 87 | modality=modality, 88 | ignore_index=ignore_index, 89 | scene_idxs=scene_idxs, 90 | test_mode=test_mode, 91 | **kwargs) 92 | 93 | def get_seg_label_mapping(self, metainfo): 94 | seg_label_mapping = np.zeros(metainfo['max_label'] + 1, dtype=np.int64) 95 | for idx in metainfo['seg_label_mapping']: 96 | seg_label_mapping[idx] = metainfo['seg_label_mapping'][idx] 97 | return seg_label_mapping 98 | 99 | def parse_data_info(self, info: dict) -> dict: 100 | 
"""Process the raw data info. 101 | 102 | Convert all relative path of needed modality data file to 103 | the absolute path. And process 104 | the `instances` field to `ann_info` in training stage. 105 | 106 | Args: 107 | info (dict): Raw info dict. 108 | 109 | Returns: 110 | dict: Has `ann_info` in training stage. And 111 | all path has been converted to absolute path. 112 | """ 113 | info = super().parse_data_info(info) 114 | if 'pts_panoptic_mask_path' in info: 115 | info['pts_panoptic_mask_path'] = \ 116 | osp.join(self.data_prefix.get('pts_panoptic_mask', ''), 117 | info['pts_panoptic_mask_path']) 118 | 119 | return info 120 | -------------------------------------------------------------------------------- /datasets/transforms/__pycache__/loading.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OpenRobotLab/P3Former/2b8faa2b0b168d78713092b223a4a6e8ba63e44e/datasets/transforms/__pycache__/loading.cpython-39.pyc -------------------------------------------------------------------------------- /datasets/transforms/__pycache__/transforms_3d.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/OpenRobotLab/P3Former/2b8faa2b0b168d78713092b223a4a6e8ba63e44e/datasets/transforms/__pycache__/transforms_3d.cpython-39.pyc -------------------------------------------------------------------------------- /datasets/transforms/formating.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 2 | from typing import List, Sequence, Union 3 | 4 | import mmengine 5 | import numpy as np 6 | import torch 7 | from mmcv import BaseTransform 8 | from mmengine.structures import InstanceData 9 | from numpy import dtype 10 | 11 | from mmdet3d.registry import TRANSFORMS 12 | from mmdet3d.structures import BaseInstance3DBoxes, Det3DDataSample, PointData 13 | from mmdet3d.structures.points import BasePoints 14 | 15 | 16 | def to_tensor( 17 | data: Union[torch.Tensor, np.ndarray, Sequence, int, 18 | float]) -> torch.Tensor: 19 | """Convert objects of various python types to :obj:`torch.Tensor`. 20 | 21 | Supported types are: :class:`numpy.ndarray`, :class:`torch.Tensor`, 22 | :class:`Sequence`, :class:`int` and :class:`float`. 23 | 24 | Args: 25 | data (torch.Tensor | numpy.ndarray | Sequence | int | float): Data to 26 | be converted. 27 | 28 | Returns: 29 | torch.Tensor: the converted data. 
30 | """ 31 | 32 | if isinstance(data, torch.Tensor): 33 | return data 34 | elif isinstance(data, np.ndarray): 35 | if data.dtype is dtype('float64'): 36 | data = data.astype(np.float32) 37 | return torch.from_numpy(data) 38 | elif isinstance(data, Sequence) and not mmengine.is_str(data): 39 | return torch.tensor(data) 40 | elif isinstance(data, int): 41 | return torch.LongTensor([data]) 42 | elif isinstance(data, float): 43 | return torch.FloatTensor([data]) 44 | else: 45 | raise TypeError(f'type {type(data)} cannot be converted to tensor.') 46 | 47 | 48 | @TRANSFORMS.register_module() 49 | class _Pack3DDetInputs(BaseTransform): 50 | INPUTS_KEYS = ['points', 'img'] 51 | INSTANCEDATA_3D_KEYS = [ 52 | 'gt_bboxes_3d', 'gt_labels_3d', 'attr_labels', 'depths', 'centers_2d' 53 | ] 54 | INSTANCEDATA_2D_KEYS = [ 55 | 'gt_bboxes', 56 | 'gt_bboxes_labels', 57 | ] 58 | 59 | SEG_KEYS = [ 60 | 'gt_seg_map', 'pts_instance_mask', 'pts_semantic_mask', 61 | 'gt_semantic_seg' 62 | ] 63 | 64 | def __init__( 65 | self, 66 | keys: tuple, 67 | meta_keys: tuple = ('img_path', 'ori_shape', 'img_shape', 'lidar2img', 68 | 'depth2img', 'cam2img', 'pad_shape', 69 | 'scale_factor', 'flip', 'pcd_horizontal_flip', 70 | 'pcd_vertical_flip', 'box_mode_3d', 'box_type_3d', 71 | 'img_norm_cfg', 'num_pts_feats', 'pcd_trans', 72 | 'sample_idx', 'pcd_scale_factor', 'pcd_rotation', 73 | 'pcd_rotation_angle', 'lidar_path', 74 | 'transformation_3d_flow', 'trans_mat', 75 | 'affine_aug', 'sweep_img_metas', 'ori_cam2img', 76 | 'cam2global', 'crop_offset', 'img_crop_offset', 77 | 'resize_img_shape', 'lidar2cam', 'ori_lidar2img', 78 | 'num_ref_frames', 'num_views', 'ego2global') 79 | ) -> None: 80 | self.keys = keys 81 | self.meta_keys = meta_keys 82 | 83 | def _remove_prefix(self, key: str) -> str: 84 | if key.startswith('gt_'): 85 | key = key[3:] 86 | return key 87 | 88 | def transform(self, results: Union[dict, 89 | List[dict]]) -> Union[dict, List[dict]]: 90 | """Method to pack the input data. when the value in this dict is a 91 | list, it usually is in Augmentations Testing. 92 | 93 | Args: 94 | results (dict | list[dict]): Result dict from the data pipeline. 95 | 96 | Returns: 97 | dict | List[dict]: 98 | 99 | - 'inputs' (dict): The forward data of models. It usually contains 100 | following keys: 101 | 102 | - points 103 | - img 104 | 105 | - 'data_samples' (:obj:`Det3DDataSample`): The annotation info of 106 | the sample. 107 | """ 108 | # augtest 109 | if isinstance(results, list): 110 | if len(results) == 1: 111 | # simple test 112 | return self.pack_single_results(results[0]) 113 | pack_results = [] 114 | for single_result in results: 115 | pack_results.append(self.pack_single_results(single_result)) 116 | return pack_results 117 | # norm training and simple testing 118 | elif isinstance(results, dict): 119 | return self.pack_single_results(results) 120 | else: 121 | raise NotImplementedError 122 | 123 | def pack_single_results(self, results: dict) -> dict: 124 | """Method to pack the single input data. when the value in this dict is 125 | a list, it usually is in Augmentations Testing. 126 | 127 | Args: 128 | results (dict): Result dict from the data pipeline. 129 | 130 | Returns: 131 | dict: A dict contains 132 | 133 | - 'inputs' (dict): The forward data of models. It usually contains 134 | following keys: 135 | 136 | - points 137 | - img 138 | 139 | - 'data_samples' (:obj:`Det3DDataSample`): The annotation info 140 | of the sample. 
141 | """ 142 | # Format 3D data 143 | if 'points' in results: 144 | if isinstance(results['points'], BasePoints): 145 | results['points'] = results['points'].tensor 146 | 147 | if 'img' in results: 148 | if isinstance(results['img'], list): 149 | # process multiple imgs in single frame 150 | imgs = np.stack(results['img'], axis=0) 151 | if imgs.flags.c_contiguous: 152 | imgs = to_tensor(imgs).permute(0, 3, 1, 2).contiguous() 153 | else: 154 | imgs = to_tensor( 155 | np.ascontiguousarray(imgs.transpose(0, 3, 1, 2))) 156 | results['img'] = imgs 157 | else: 158 | img = results['img'] 159 | if len(img.shape) < 3: 160 | img = np.expand_dims(img, -1) 161 | # To improve the computational speed by by 3-5 times, apply: 162 | # `torch.permute()` rather than `np.transpose()`. 163 | # Refer to https://github.com/open-mmlab/mmdetection/pull/9533 164 | # for more details 165 | if img.flags.c_contiguous: 166 | img = to_tensor(img).permute(2, 0, 1).contiguous() 167 | else: 168 | img = to_tensor( 169 | np.ascontiguousarray(img.transpose(2, 0, 1))) 170 | results['img'] = img 171 | 172 | for key in [ 173 | 'proposals', 'gt_bboxes', 'gt_bboxes_ignore', 'gt_labels', 174 | 'gt_bboxes_labels', 'attr_labels', 'pts_instance_mask', 175 | 'pts_semantic_mask', 'centers_2d', 'depths', 'gt_labels_3d' 176 | ]: 177 | if key not in results: 178 | continue 179 | if isinstance(results[key], list): 180 | results[key] = [to_tensor(res) for res in results[key]] 181 | else: 182 | results[key] = to_tensor(results[key]) 183 | if 'gt_bboxes_3d' in results: 184 | if not isinstance(results['gt_bboxes_3d'], BaseInstance3DBoxes): 185 | results['gt_bboxes_3d'] = to_tensor(results['gt_bboxes_3d']) 186 | 187 | if 'gt_semantic_seg' in results: 188 | results['gt_semantic_seg'] = to_tensor( 189 | results['gt_semantic_seg'][None]) 190 | if 'gt_seg_map' in results: 191 | results['gt_seg_map'] = results['gt_seg_map'][None, ...] 
192 | 193 | data_sample = Det3DDataSample() 194 | gt_instances_3d = InstanceData() 195 | gt_instances = InstanceData() 196 | gt_pts_seg = PointData() 197 | 198 | img_metas = {} 199 | for key in self.meta_keys: 200 | if key in results: 201 | img_metas[key] = results[key] 202 | data_sample.set_metainfo(img_metas) 203 | 204 | inputs = {} 205 | for key in self.keys: 206 | if key in results: 207 | if key in self.INPUTS_KEYS: 208 | inputs[key] = results[key] 209 | elif key in self.INSTANCEDATA_3D_KEYS: 210 | gt_instances_3d[self._remove_prefix(key)] = results[key] 211 | elif key in self.INSTANCEDATA_2D_KEYS: 212 | if key == 'gt_bboxes_labels': 213 | gt_instances['labels'] = results[key] 214 | else: 215 | gt_instances[self._remove_prefix(key)] = results[key] 216 | elif key in self.SEG_KEYS: 217 | gt_pts_seg[self._remove_prefix(key)] = results[key] 218 | elif key in ['lidar_sweeps']: 219 | data_sample.lidar_sweeps = results[key] 220 | elif key in ['lidar_token']: 221 | pass 222 | elif key in ['lidar_path']: 223 | pass 224 | else: 225 | raise NotImplementedError(f'Please modified ' 226 | f'`Pack3DDetInputs` ' 227 | f'to put {key} to ' 228 | f'corresponding field') 229 | 230 | data_sample.gt_instances_3d = gt_instances_3d 231 | data_sample.gt_instances = gt_instances 232 | data_sample.gt_pts_seg = gt_pts_seg 233 | if 'eval_ann_info' in results: 234 | data_sample.eval_ann_info = results['eval_ann_info'] 235 | if 'lidar_token' in self.keys: 236 | data_sample.eval_ann_info['token'] = results['lidar_token'] # for nuscenes submission 237 | if 'lidar_path' in self.keys: 238 | data_sample.eval_ann_info['lidar_path'] = results['lidar_path'] # for semantickitti submission 239 | else: 240 | data_sample.eval_ann_info = None 241 | 242 | packed_results = dict() 243 | packed_results['data_samples'] = data_sample 244 | packed_results['inputs'] = inputs 245 | 246 | return packed_results 247 | 248 | def __repr__(self) -> str: 249 | """str: Return a string that describes the module.""" 250 | repr_str = self.__class__.__name__ 251 | repr_str += f'(keys={self.keys})' 252 | repr_str += f'(meta_keys={self.meta_keys})' 253 | return repr_str -------------------------------------------------------------------------------- /datasets/transforms/loading.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 2 | import copy 3 | from typing import List, Optional, Union 4 | 5 | import mmcv 6 | import mmengine 7 | import numpy as np 8 | from mmcv.transforms import LoadImageFromFile 9 | from mmcv.transforms.base import BaseTransform 10 | from mmdet.datasets.transforms import LoadAnnotations 11 | from mmengine.fileio import get 12 | 13 | from mmdet3d.registry import TRANSFORMS 14 | from mmdet3d.structures.bbox_3d import get_box_type 15 | from mmdet3d.structures.points import BasePoints, get_points_type 16 | 17 | 18 | @TRANSFORMS.register_module() 19 | class _LoadAnnotations3D(LoadAnnotations): 20 | """Load Annotations3D. 21 | 22 | Load instance mask and semantic mask of points and 23 | encapsulate the items into related fields. 24 | 25 | Required Keys: 26 | 27 | - ann_info (dict) 28 | 29 | - gt_bboxes_3d (:obj:`LiDARInstance3DBoxes` | 30 | :obj:`DepthInstance3DBoxes` | :obj:`CameraInstance3DBoxes`): 31 | 3D ground truth bboxes. Only when `with_bbox_3d` is True 32 | - gt_labels_3d (np.int64): Labels of ground truths. 33 | Only when `with_label_3d` is True. 34 | - gt_bboxes (np.float32): 2D ground truth bboxes. 35 | Only when `with_bbox` is True. 
36 | - gt_labels (np.ndarray): Labels of ground truths. 37 | Only when `with_label` is True. 38 | - depths (np.ndarray): Only when 39 | `with_bbox_depth` is True. 40 | - centers_2d (np.ndarray): Only when 41 | `with_bbox_depth` is True. 42 | - attr_labels (np.ndarray): Attribute labels of instances. 43 | Only when `with_attr_label` is True. 44 | 45 | - pts_instance_mask_path (str): Path of instance mask file. 46 | Only when `with_mask_3d` is True. 47 | - pts_semantic_mask_path (str): Path of semantic mask file. 48 | Only when `with_seg_3d` is True. 49 | - pts_panoptic_mask_path (str): Path of panoptic mask file. 50 | Only when both `with_panoptic_3d` is True. 51 | 52 | Added Keys: 53 | 54 | - gt_bboxes_3d (:obj:`LiDARInstance3DBoxes` | 55 | :obj:`DepthInstance3DBoxes` | :obj:`CameraInstance3DBoxes`): 56 | 3D ground truth bboxes. Only when `with_bbox_3d` is True 57 | - gt_labels_3d (np.int64): Labels of ground truths. 58 | Only when `with_label_3d` is True. 59 | - gt_bboxes (np.float32): 2D ground truth bboxes. 60 | Only when `with_bbox` is True. 61 | - gt_labels (np.int64): Labels of ground truths. 62 | Only when `with_label` is True. 63 | - depths (np.float32): Only when 64 | `with_bbox_depth` is True. 65 | - centers_2d (np.ndarray): Only when 66 | `with_bbox_depth` is True. 67 | - attr_labels (np.int64): Attribute labels of instances. 68 | Only when `with_attr_label` is True. 69 | - pts_instance_mask (np.int64): Instance mask of each point. 70 | Only when `with_mask_3d` is True. 71 | - pts_semantic_mask (np.int64): Semantic mask of each point. 72 | Only when `with_seg_3d` is True. 73 | 74 | Args: 75 | with_bbox_3d (bool): Whether to load 3D boxes. Defaults to True. 76 | with_label_3d (bool): Whether to load 3D labels. Defaults to True. 77 | with_attr_label (bool): Whether to load attribute label. 78 | Defaults to False. 79 | with_mask_3d (bool): Whether to load 3D instance masks for points. 80 | Defaults to False. 81 | with_seg_3d (bool): Whether to load 3D semantic masks for points. 82 | Defaults to False. 83 | with_bbox (bool): Whether to load 2D boxes. Defaults to False. 84 | with_label (bool): Whether to load 2D labels. Defaults to False. 85 | with_mask (bool): Whether to load 2D instance masks. Defaults to False. 86 | with_seg (bool): Whether to load 2D semantic masks. Defaults to False. 87 | with_bbox_depth (bool): Whether to load 2.5D boxes. Defaults to False. 88 | with_panoptic_3d (bool): Whether to load 3D panoptic masks for points. 89 | Defaults to False. 90 | poly2mask (bool): Whether to convert polygon annotations to bitmasks. 91 | Defaults to True. 92 | seg_3d_dtype (str): String of dtype of 3D semantic masks. 93 | Defaults to 'np.int64'. 94 | seg_offset (int): The offset to split semantic and instance labels from 95 | panoptic labels. Defaults to None. 96 | dataset_type (str): Type of dataset used for splitting semantic and 97 | instance labels. Defaults to None. 98 | backend_args (dict, optional): Arguments to instantiate the 99 | corresponding backend. Defaults to None. 
100 | """ 101 | 102 | def __init__(self, 103 | with_bbox_3d: bool = True, 104 | with_label_3d: bool = True, 105 | with_attr_label: bool = False, 106 | with_mask_3d: bool = False, 107 | with_seg_3d: bool = False, 108 | with_bbox: bool = False, 109 | with_label: bool = False, 110 | with_mask: bool = False, 111 | with_seg: bool = False, 112 | with_bbox_depth: bool = False, 113 | with_panoptic_3d: bool = False, 114 | poly2mask: bool = True, 115 | seg_3d_dtype: str = 'np.int64', 116 | seg_offset: int = None, 117 | dataset_type: str = None, 118 | backend_args: Optional[dict] = None) -> None: 119 | super().__init__( 120 | with_bbox=with_bbox, 121 | with_label=with_label, 122 | with_mask=with_mask, 123 | with_seg=with_seg, 124 | poly2mask=poly2mask, 125 | backend_args=backend_args) 126 | self.with_bbox_3d = with_bbox_3d 127 | self.with_bbox_depth = with_bbox_depth 128 | self.with_label_3d = with_label_3d 129 | self.with_attr_label = with_attr_label 130 | self.with_mask_3d = with_mask_3d 131 | self.with_seg_3d = with_seg_3d 132 | self.with_panoptic_3d = with_panoptic_3d 133 | self.seg_3d_dtype = eval(seg_3d_dtype) 134 | self.seg_offset = seg_offset 135 | self.dataset_type = dataset_type 136 | 137 | def _load_bboxes_3d(self, results: dict) -> dict: 138 | """Private function to move the 3D bounding box annotation from 139 | `ann_info` field to the root of `results`. 140 | 141 | Args: 142 | results (dict): Result dict from :obj:`mmdet3d.CustomDataset`. 143 | 144 | Returns: 145 | dict: The dict containing loaded 3D bounding box annotations. 146 | """ 147 | 148 | results['gt_bboxes_3d'] = results['ann_info']['gt_bboxes_3d'] 149 | return results 150 | 151 | def _load_bboxes_depth(self, results: dict) -> dict: 152 | """Private function to load 2.5D bounding box annotations. 153 | 154 | Args: 155 | results (dict): Result dict from :obj:`mmdet3d.CustomDataset`. 156 | 157 | Returns: 158 | dict: The dict containing loaded 2.5D bounding box annotations. 159 | """ 160 | 161 | results['depths'] = results['ann_info']['depths'] 162 | results['centers_2d'] = results['ann_info']['centers_2d'] 163 | return results 164 | 165 | def _load_labels_3d(self, results: dict) -> dict: 166 | """Private function to load label annotations. 167 | 168 | Args: 169 | results (dict): Result dict from :obj:`mmdet3d.CustomDataset`. 170 | 171 | Returns: 172 | dict: The dict containing loaded label annotations. 173 | """ 174 | 175 | results['gt_labels_3d'] = results['ann_info']['gt_labels_3d'] 176 | return results 177 | 178 | def _load_attr_labels(self, results: dict) -> dict: 179 | """Private function to load label annotations. 180 | 181 | Args: 182 | results (dict): Result dict from :obj:`mmdet3d.CustomDataset`. 183 | 184 | Returns: 185 | dict: The dict containing loaded label annotations. 186 | """ 187 | results['attr_labels'] = results['ann_info']['attr_labels'] 188 | return results 189 | 190 | def _load_masks_3d(self, results: dict) -> dict: 191 | """Private function to load 3D mask annotations. 192 | 193 | Args: 194 | results (dict): Result dict from :obj:`mmdet3d.CustomDataset`. 195 | 196 | Returns: 197 | dict: The dict containing loaded 3D mask annotations. 
198 | """ 199 | pts_instance_mask_path = results['pts_instance_mask_path'] 200 | 201 | try: 202 | mask_bytes = get( 203 | pts_instance_mask_path, backend_args=self.backend_args) 204 | pts_instance_mask = np.frombuffer(mask_bytes, dtype=np.int64) 205 | except ConnectionError: 206 | mmengine.check_file_exist(pts_instance_mask_path) 207 | pts_instance_mask = np.fromfile( 208 | pts_instance_mask_path, dtype=np.int64) 209 | 210 | results['pts_instance_mask'] = pts_instance_mask 211 | # 'eval_ann_info' will be passed to evaluator 212 | if 'eval_ann_info' in results: 213 | results['eval_ann_info']['pts_instance_mask'] = pts_instance_mask 214 | return results 215 | 216 | def _load_semantic_seg_3d(self, results: dict) -> dict: 217 | """Private function to load 3D semantic segmentation annotations. 218 | 219 | Args: 220 | results (dict): Result dict from :obj:`mmdet3d.CustomDataset`. 221 | 222 | Returns: 223 | dict: The dict containing the semantic segmentation annotations. 224 | """ 225 | pts_semantic_mask_path = results['pts_semantic_mask_path'] 226 | 227 | try: 228 | mask_bytes = get( 229 | pts_semantic_mask_path, backend_args=self.backend_args) 230 | # add .copy() to fix read-only bug 231 | pts_semantic_mask = np.frombuffer( 232 | mask_bytes, dtype=self.seg_3d_dtype).copy() 233 | except ConnectionError: 234 | mmengine.check_file_exist(pts_semantic_mask_path) 235 | pts_semantic_mask = np.fromfile( 236 | pts_semantic_mask_path, dtype=np.int64) 237 | 238 | pts_semantic_mask = pts_semantic_mask.astype(np.int64) 239 | if self.dataset_type == 'semantickitti': 240 | pts_semantic_mask = pts_semantic_mask % self.seg_offset 241 | # nuScenes loads semantic and panoptic labels from different files. 242 | 243 | results['pts_semantic_mask'] = pts_semantic_mask 244 | 245 | # 'eval_ann_info' will be passed to evaluator 246 | if 'eval_ann_info' in results: 247 | results['eval_ann_info']['pts_semantic_mask'] = pts_semantic_mask 248 | return results 249 | 250 | def _load_panoptic_3d(self, results: dict) -> dict: 251 | """Private function to load 3D panoptic segmentation annotations. 252 | 253 | Args: 254 | results (dict): Result dict from :obj:`mmdet3d.CustomDataset`. 255 | 256 | Returns: 257 | dict: The dict containing the panoptic segmentation annotations. 258 | """ 259 | pts_panoptic_mask_path = results['pts_panoptic_mask_path'] 260 | 261 | try: 262 | mask_bytes = get( 263 | pts_panoptic_mask_path, backend_args=self.backend_args) 264 | # add .copy() to fix read-only bug 265 | pts_panoptic_mask = np.frombuffer( 266 | mask_bytes, dtype=self.seg_3d_dtype).copy() 267 | except ConnectionError: 268 | mmengine.check_file_exist(pts_panoptic_mask_path) 269 | pts_panoptic_mask = np.load(pts_panoptic_mask_path) 270 | 271 | if self.dataset_type == 'semantickitti': 272 | pts_semantic_mask = pts_panoptic_mask.astype(np.int64) 273 | pts_semantic_mask = pts_semantic_mask % self.seg_offset 274 | elif self.dataset_type == 'nuscenes': 275 | pts_panoptic_mask = pts_panoptic_mask['data'] 276 | pts_semantic_mask = pts_panoptic_mask // self.seg_offset 277 | 278 | results['pts_semantic_mask'] = pts_semantic_mask 279 | 280 | # We can directly take panoptic labels as instance ids. 
281 |         pts_instance_mask = pts_panoptic_mask.astype(np.int64)
282 |         results['pts_instance_mask'] = pts_instance_mask
283 | 
284 |         # 'eval_ann_info' will be passed to evaluator
285 |         if 'eval_ann_info' in results:
286 |             results['eval_ann_info']['pts_semantic_mask'] = pts_semantic_mask
287 |             results['eval_ann_info']['pts_instance_mask'] = pts_instance_mask
288 |         return results
289 | 
290 |     def _load_bboxes(self, results: dict) -> None:
291 |         """Private function to load bounding box annotations.
292 | 
293 |         The only difference from the base implementation is that it
294 |         removes the processing of `ignore_flag`.
295 | 
296 |         Args:
297 |             results (dict): Result dict from :obj:`mmcv.BaseDataset`.
298 | 
299 |         Returns:
300 |             dict: The dict containing loaded bounding box annotations.
301 |         """
302 | 
303 |         results['gt_bboxes'] = results['ann_info']['gt_bboxes']
304 | 
305 |     def _load_labels(self, results: dict) -> None:
306 |         """Private function to load label annotations.
307 | 
308 |         Args:
309 |             results (dict): Result dict from :obj:`mmcv.BaseDataset`.
310 | 
311 |         Returns:
312 |             dict: The dict containing loaded label annotations.
313 |         """
314 |         results['gt_bboxes_labels'] = results['ann_info']['gt_bboxes_labels']
315 | 
316 |     def transform(self, results: dict) -> dict:
317 |         """Function to load multiple types of annotations.
318 | 
319 |         Args:
320 |             results (dict): Result dict from :obj:`mmdet3d.CustomDataset`.
321 | 
322 |         Returns:
323 |             dict: The dict containing loaded 3D bounding box, label, mask and
324 |                 semantic segmentation annotations.
325 |         """
326 |         results = super().transform(results)
327 |         if self.with_bbox_3d:
328 |             results = self._load_bboxes_3d(results)
329 |         if self.with_bbox_depth:
330 |             results = self._load_bboxes_depth(results)
331 |         if self.with_label_3d:
332 |             results = self._load_labels_3d(results)
333 |         if self.with_attr_label:
334 |             results = self._load_attr_labels(results)
335 |         if self.with_panoptic_3d:
336 |             results = self._load_panoptic_3d(results)
337 |         if self.with_mask_3d:
338 |             results = self._load_masks_3d(results)
339 |         if self.with_seg_3d:
340 |             results = self._load_semantic_seg_3d(results)
341 |         return results
342 | 
343 |     def __repr__(self) -> str:
344 |         """str: Return a string that describes the module."""
345 |         indent_str = '    '
346 |         repr_str = self.__class__.__name__ + '(\n'
347 |         repr_str += f'{indent_str}with_bbox_3d={self.with_bbox_3d}, '
348 |         repr_str += f'{indent_str}with_label_3d={self.with_label_3d}, '
349 |         repr_str += f'{indent_str}with_attr_label={self.with_attr_label}, '
350 |         repr_str += f'{indent_str}with_mask_3d={self.with_mask_3d}, '
351 |         repr_str += f'{indent_str}with_seg_3d={self.with_seg_3d}, '
352 |         repr_str += f'{indent_str}with_panoptic_3d={self.with_panoptic_3d}, '
353 |         repr_str += f'{indent_str}with_bbox={self.with_bbox}, '
354 |         repr_str += f'{indent_str}with_label={self.with_label}, '
355 |         repr_str += f'{indent_str}with_mask={self.with_mask}, '
356 |         repr_str += f'{indent_str}with_seg={self.with_seg}, '
357 |         repr_str += f'{indent_str}with_bbox_depth={self.with_bbox_depth}, '
358 |         repr_str += f'{indent_str}poly2mask={self.poly2mask}, '
359 |         repr_str += f'{indent_str}seg_offset={self.seg_offset})'
360 | 
361 |         return repr_str
362 | 
363 | 
364 | 
-------------------------------------------------------------------------------- /datasets/transforms/transforms_3d.py: --------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
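The transforms defined below extend mmdet3d's PolarMix and LaserMix so that `pts_instance_mask` is mixed alongside the points and semantic labels. As a quick, self-contained illustration of the azimuth-sector swap that `_PolarMix` performs, here is a NumPy sketch with toy points; it is not code from this file and uses the same angle convention (yaw = -atan2(y, x)).

```python
import numpy as np

# Keep points of cloud A whose azimuth falls outside a random half-scan
# sector, and fill that sector with points from cloud B.
rng = np.random.default_rng(0)
cloud_a = rng.uniform(-10, 10, size=(1000, 3))
cloud_b = rng.uniform(-10, 10, size=(1000, 3))

start = (rng.random() - 1) * np.pi           # sector start in [-pi, 0)
end = start + np.pi                          # sector spans half the scan

yaw_a = -np.arctan2(cloud_a[:, 1], cloud_a[:, 0])
yaw_b = -np.arctan2(cloud_b[:, 1], cloud_b[:, 0])

keep_a = (yaw_a <= start) | (yaw_a >= end)   # outside the sector
take_b = (yaw_b > start) & (yaw_b < end)     # inside the sector

mixed = np.concatenate([cloud_a[keep_a], cloud_b[take_b]], axis=0)
print(mixed.shape)
```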
2 | import random 3 | import warnings 4 | from typing import List, Optional, Sequence, Tuple, Union 5 | 6 | import cv2 7 | import mmcv 8 | import numpy as np 9 | import torch 10 | from mmcv.transforms import BaseTransform, Compose, RandomResize, Resize 11 | from mmdet.datasets.transforms import (PhotoMetricDistortion, RandomCrop, 12 | RandomFlip) 13 | from mmengine import is_list_of, is_tuple_of 14 | 15 | from mmdet3d.models.task_modules import VoxelGenerator 16 | from mmdet3d.registry import TRANSFORMS 17 | from mmdet3d.structures import (CameraInstance3DBoxes, DepthInstance3DBoxes, 18 | LiDARInstance3DBoxes) 19 | from mmdet3d.structures.ops import box_np_ops 20 | from mmdet3d.structures.points import BasePoints 21 | from mmdet3d.datasets.transforms.data_augment_utils import noise_per_object_v3_ 22 | 23 | 24 | @TRANSFORMS.register_module(force=True) 25 | class _PolarMix(BaseTransform): 26 | """PolarMix data augmentation. 27 | 28 | The polarmix transform steps are as follows: 29 | 30 | 1. Another random point cloud is picked by dataset. 31 | 2. Exchange sectors of two point clouds that are cut with certain 32 | azimuth angles. 33 | 3. Cut point instances from picked point cloud, rotate them by multiple 34 | azimuth angles, and paste the cut and rotated instances. 35 | 36 | Required Keys: 37 | 38 | - points (:obj:`BasePoints`) 39 | - pts_semantic_mask (np.int64) 40 | - dataset (:obj:`BaseDataset`) 41 | 42 | Modified Keys: 43 | 44 | - points (:obj:`BasePoints`) 45 | - pts_semantic_mask (np.int64) 46 | 47 | Args: 48 | instance_classes (List[int]): Semantic masks which represent the 49 | instance. 50 | swap_ratio (float): Swap ratio of two point cloud. Defaults to 0.5. 51 | rotate_paste_ratio (float): Rotate paste ratio. Defaults to 1.0. 52 | pre_transform (Sequence[dict], optional): Sequence of transform object 53 | or config dict to be composed. Defaults to None. 54 | prob (float): The transformation probability. Defaults to 1.0. 55 | """ 56 | 57 | def __init__(self, 58 | instance_classes: List[int], 59 | swap_ratio: float = 0.5, 60 | rotate_paste_ratio: float = 1.0, 61 | pre_transform: Optional[Sequence[dict]] = None, 62 | prob: float = 1.0) -> None: 63 | assert is_list_of(instance_classes, int), \ 64 | 'instance_classes should be a list of int' 65 | self.instance_classes = instance_classes 66 | self.swap_ratio = swap_ratio 67 | self.rotate_paste_ratio = rotate_paste_ratio 68 | 69 | self.prob = prob 70 | if pre_transform is None: 71 | self.pre_transform = None 72 | else: 73 | self.pre_transform = Compose(pre_transform) 74 | 75 | def polar_mix_transform(self, input_dict: dict, mix_results: dict) -> dict: 76 | """PolarMix transform function. 77 | 78 | Args: 79 | input_dict (dict): Result dict from loading pipeline. 80 | mix_results (dict): Mixed dict picked from dataset. 81 | 82 | Returns: 83 | dict: output dict after transformation. 84 | """ 85 | mix_points = mix_results['points'] 86 | mix_pts_semantic_mask = mix_results['pts_semantic_mask'] 87 | 88 | points = input_dict['points'] 89 | pts_semantic_mask = input_dict['pts_semantic_mask'] 90 | 91 | mix_panoptic = False 92 | if 'pts_instance_mask' in mix_results: 93 | mix_instance_mask = mix_results['pts_instance_mask'] 94 | mix_instance_mask += (1000<<16) # not overlap id 95 | pts_instance_mask = input_dict['pts_instance_mask'] 96 | mix_panoptic = True 97 | 98 | # 1. 
swap point cloud 99 | if np.random.random() < self.swap_ratio: 100 | start_angle = (np.random.random() - 1) * np.pi # -pi~0 101 | end_angle = start_angle + np.pi 102 | # calculate horizontal angle for each point 103 | yaw = -torch.atan2(points.coord[:, 1], points.coord[:, 0]) 104 | mix_yaw = -torch.atan2(mix_points.coord[:, 1], mix_points.coord[:, 105 | 0]) 106 | 107 | # select points in sector 108 | idx = (yaw <= start_angle) | (yaw >= end_angle) 109 | mix_idx = (mix_yaw > start_angle) & (mix_yaw < end_angle) 110 | 111 | # swap 112 | points = points.cat([points[idx], mix_points[mix_idx]]) 113 | pts_semantic_mask = np.concatenate( 114 | (pts_semantic_mask[idx.numpy()], 115 | mix_pts_semantic_mask[mix_idx.numpy()]), 116 | axis=0) 117 | 118 | if mix_panoptic: 119 | pts_instance_mask = np.concatenate( 120 | (pts_instance_mask[idx.numpy()], 121 | mix_instance_mask[mix_idx.numpy()]), 122 | axis=0) 123 | 124 | # 2. rotate-pasting 125 | if np.random.random() < self.rotate_paste_ratio: 126 | # extract instance points 127 | instance_points, instance_pts_semantic_mask = [], [] 128 | if mix_panoptic: 129 | instance_pts_instance_mask = [] 130 | for instance_class in self.instance_classes: 131 | mix_idx = mix_pts_semantic_mask == instance_class 132 | instance_points.append(mix_points[mix_idx]) 133 | instance_pts_semantic_mask.append( 134 | mix_pts_semantic_mask[mix_idx]) 135 | if mix_panoptic: 136 | instance_pts_instance_mask.append(mix_instance_mask[mix_idx]) 137 | instance_points = mix_points.cat(instance_points) 138 | instance_pts_semantic_mask = np.concatenate( 139 | instance_pts_semantic_mask, axis=0) 140 | if mix_panoptic: 141 | instance_pts_instance_mask = np.concatenate( 142 | instance_pts_instance_mask, axis=0) 143 | 144 | # rotate-copy 145 | copy_points = [instance_points] 146 | copy_pts_semantic_mask = [instance_pts_semantic_mask] 147 | if mix_panoptic: 148 | copy_pts_instance_mask = [instance_pts_instance_mask] 149 | angle_list = [ 150 | np.random.random() * np.pi * 2 / 3, 151 | (np.random.random() + 1) * np.pi * 2 / 3 152 | ] 153 | for angle in angle_list: 154 | new_points = instance_points.clone() 155 | new_points.rotate(angle) 156 | copy_points.append(new_points) 157 | copy_pts_semantic_mask.append(instance_pts_semantic_mask) 158 | if mix_panoptic: 159 | copy_pts_instance_mask.append(instance_pts_instance_mask) 160 | copy_points = instance_points.cat(copy_points) 161 | copy_pts_semantic_mask = np.concatenate( 162 | copy_pts_semantic_mask, axis=0) 163 | if mix_panoptic: 164 | copy_pts_instance_mask = np.concatenate( 165 | copy_pts_instance_mask, axis=0) 166 | 167 | points = points.cat([points, copy_points]) 168 | pts_semantic_mask = np.concatenate( 169 | (pts_semantic_mask, copy_pts_semantic_mask), axis=0) 170 | if mix_panoptic: 171 | pts_instance_mask = np.concatenate( 172 | (pts_instance_mask, copy_pts_instance_mask), axis=0) 173 | 174 | input_dict['points'] = points 175 | input_dict['pts_semantic_mask'] = pts_semantic_mask 176 | if mix_panoptic: 177 | input_dict['pts_instance_mask'] = pts_instance_mask 178 | return input_dict 179 | 180 | def transform(self, input_dict: dict) -> dict: 181 | """PolarMix transform function. 182 | 183 | Args: 184 | input_dict (dict): Result dict from loading pipeline. 185 | 186 | Returns: 187 | dict: output dict after transformation. 188 | """ 189 | if np.random.rand() > self.prob: 190 | return input_dict 191 | 192 | assert 'dataset' in input_dict, \ 193 | '`dataset` is needed to pass through PolarMix, while not found.' 
194 | dataset = input_dict['dataset'] 195 | 196 | # get index of other point cloud 197 | index = np.random.randint(0, len(dataset)) 198 | 199 | mix_results = dataset.get_data_info(index) 200 | 201 | if self.pre_transform is not None: 202 | # pre_transform may also require dataset 203 | mix_results.update({'dataset': dataset}) 204 | # before polarmix need to go through 205 | # the necessary pre_transform 206 | mix_results = self.pre_transform(mix_results) 207 | mix_results.pop('dataset') 208 | 209 | input_dict = self.polar_mix_transform(input_dict, mix_results) 210 | 211 | return input_dict 212 | 213 | def __repr__(self) -> str: 214 | """str: Return a string that describes the module.""" 215 | repr_str = self.__class__.__name__ 216 | repr_str += f'(instance_classes={self.instance_classes}, ' 217 | repr_str += f'swap_ratio={self.swap_ratio}, ' 218 | repr_str += f'rotate_paste_ratio={self.rotate_paste_ratio}, ' 219 | repr_str += f'pre_transform={self.pre_transform}, ' 220 | repr_str += f'prob={self.prob})' 221 | return repr_str 222 | 223 | 224 | @TRANSFORMS.register_module(force=True) 225 | class _LaserMix(BaseTransform): 226 | """LaserMix data augmentation. 227 | 228 | The lasermix transform steps are as follows: 229 | 230 | 1. Another random point cloud is picked by dataset. 231 | 2. Divide the point cloud into several regions according to pitch 232 | angles and combine the areas crossly. 233 | 234 | Required Keys: 235 | 236 | - points (:obj:`BasePoints`) 237 | - pts_semantic_mask (np.int64) 238 | - dataset (:obj:`BaseDataset`) 239 | 240 | Modified Keys: 241 | 242 | - points (:obj:`BasePoints`) 243 | - pts_semantic_mask (np.int64) 244 | 245 | Args: 246 | num_areas (List[int]): A list of area numbers will be divided into. 247 | pitch_angles (Sequence[float]): Pitch angles used to divide areas. 248 | pre_transform (Sequence[dict], optional): Sequence of transform object 249 | or config dict to be composed. Defaults to None. 250 | prob (float): The transformation probability. Defaults to 1.0. 251 | """ 252 | 253 | def __init__(self, 254 | num_areas: List[int], 255 | pitch_angles: Sequence[float], 256 | pre_transform: Optional[Sequence[dict]] = None, 257 | prob: float = 1.0) -> None: 258 | assert is_list_of(num_areas, int), \ 259 | 'num_areas should be a list of int.' 260 | self.num_areas = num_areas 261 | 262 | assert len(pitch_angles) == 2, \ 263 | 'The length of pitch_angles should be 2, ' \ 264 | f'but got {len(pitch_angles)}.' 265 | assert pitch_angles[1] > pitch_angles[0], \ 266 | 'pitch_angles[1] should be larger than pitch_angles[0].' 267 | self.pitch_angles = pitch_angles 268 | 269 | self.prob = prob 270 | if pre_transform is None: 271 | self.pre_transform = None 272 | else: 273 | self.pre_transform = Compose(pre_transform) 274 | 275 | def laser_mix_transform(self, input_dict: dict, mix_results: dict) -> dict: 276 | """LaserMix transform function. 277 | 278 | Args: 279 | input_dict (dict): Result dict from loading pipeline. 280 | mix_results (dict): Mixed dict picked from dataset. 281 | 282 | Returns: 283 | dict: output dict after transformation. 
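Before the implementation of `laser_mix_transform`, a standalone NumPy sketch of the pitch-band interleaving it performs. The angle range and area count below are made-up values; the real transform operates on `BasePoints`, draws `num_areas` randomly from the config, and also interleaves instance masks when they are present.

```python
import numpy as np

rng = np.random.default_rng(0)
points_a = rng.uniform(-30, 30, size=(2000, 3))
points_b = rng.uniform(-30, 30, size=(2000, 3))

pitch_range = (-25.0, 3.0)     # degrees, an assumed LiDAR vertical FOV
num_areas = 4                  # one value that would be drawn from `num_areas`

def pitch_deg(pts):
    rho = np.sqrt(pts[:, 0] ** 2 + pts[:, 1] ** 2)
    return np.degrees(np.arctan2(pts[:, 2], rho))

pa, pb = pitch_deg(points_a), pitch_deg(points_b)

# Split the pitch range into `num_areas` bands from high to low and take
# alternating bands from the two scans (even index -> scan A, odd -> scan B).
edges = np.linspace(pitch_range[1], pitch_range[0], num_areas + 1)
mixed = []
for i in range(num_areas):
    lo, hi = edges[i + 1], edges[i]
    src, p = (points_a, pa) if i % 2 == 0 else (points_b, pb)
    mixed.append(src[(p > lo) & (p <= hi)])
mixed = np.concatenate(mixed, axis=0)
print(mixed.shape)
```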
284 | """ 285 | mix_points = mix_results['points'] 286 | mix_pts_semantic_mask = mix_results['pts_semantic_mask'] 287 | 288 | points = input_dict['points'] 289 | pts_semantic_mask = input_dict['pts_semantic_mask'] 290 | 291 | rho = torch.sqrt(points.coord[:, 0]**2 + points.coord[:, 1]**2) 292 | pitch = torch.atan2(points.coord[:, 2], rho) 293 | pitch = torch.clamp(pitch, self.pitch_angles[0] + 1e-5, 294 | self.pitch_angles[1] - 1e-5) 295 | 296 | mix_rho = torch.sqrt(mix_points.coord[:, 0]**2 + 297 | mix_points.coord[:, 1]**2) 298 | mix_pitch = torch.atan2(mix_points.coord[:, 2], mix_rho) 299 | mix_pitch = torch.clamp(mix_pitch, self.pitch_angles[0] + 1e-5, 300 | self.pitch_angles[1] - 1e-5) 301 | 302 | num_areas = np.random.choice(self.num_areas, size=1)[0] 303 | angle_list = np.linspace(self.pitch_angles[1], self.pitch_angles[0], 304 | num_areas + 1) 305 | out_points = [] 306 | out_pts_semantic_mask = [] 307 | 308 | mix_panoptic = False 309 | if 'pts_instance_mask' in mix_results: 310 | mix_instance_mask = mix_results['pts_instance_mask'] 311 | mix_instance_mask += (1000<<16) # not overlap id 312 | pts_instance_mask = input_dict['pts_instance_mask'] 313 | out_pts_instance_mask = [] 314 | mix_panoptic = True 315 | 316 | for i in range(num_areas): 317 | # convert angle to radian 318 | start_angle = angle_list[i + 1] / 180 * np.pi 319 | end_angle = angle_list[i] / 180 * np.pi 320 | if i % 2 == 0: # pick from original point cloud 321 | idx = (pitch > start_angle) & (pitch <= end_angle) 322 | out_points.append(points[idx]) 323 | out_pts_semantic_mask.append(pts_semantic_mask[idx.numpy()]) 324 | if mix_panoptic: 325 | out_pts_instance_mask.append(pts_instance_mask[idx.numpy()]) 326 | else: # pickle from mixed point cloud 327 | idx = (mix_pitch > start_angle) & (mix_pitch <= end_angle) 328 | out_points.append(mix_points[idx]) 329 | out_pts_semantic_mask.append( 330 | mix_pts_semantic_mask[idx.numpy()]) 331 | if mix_panoptic: 332 | out_pts_instance_mask.append(mix_instance_mask[idx.numpy()]) 333 | out_points = points.cat(out_points) 334 | out_pts_semantic_mask = np.concatenate(out_pts_semantic_mask, axis=0) 335 | input_dict['points'] = out_points 336 | input_dict['pts_semantic_mask'] = out_pts_semantic_mask 337 | 338 | if mix_panoptic: 339 | out_pts_instance_mask = np.concatenate(out_pts_instance_mask, axis=0) 340 | input_dict['pts_instance_mask'] = out_pts_instance_mask 341 | return input_dict 342 | 343 | def transform(self, input_dict: dict) -> dict: 344 | """LaserMix transform function. 345 | 346 | Args: 347 | input_dict (dict): Result dict from loading pipeline. 348 | 349 | Returns: 350 | dict: output dict after transformation. 351 | """ 352 | if np.random.rand() > self.prob: 353 | return input_dict 354 | 355 | assert 'dataset' in input_dict, \ 356 | '`dataset` is needed to pass through LaserMix, while not found.' 
357 | dataset = input_dict['dataset'] 358 | 359 | # get index of other point cloud 360 | index = np.random.randint(0, len(dataset)) 361 | 362 | mix_results = dataset.get_data_info(index) 363 | 364 | if self.pre_transform is not None: 365 | # pre_transform may also require dataset 366 | mix_results.update({'dataset': dataset}) 367 | # before lasermix need to go through 368 | # the necessary pre_transform 369 | mix_results = self.pre_transform(mix_results) 370 | mix_results.pop('dataset') 371 | 372 | input_dict = self.laser_mix_transform(input_dict, mix_results) 373 | 374 | return input_dict 375 | 376 | def __repr__(self) -> str: 377 | """str: Return a string that describes the module.""" 378 | repr_str = self.__class__.__name__ 379 | repr_str += f'(num_areas={self.num_areas}, ' 380 | repr_str += f'pitch_angles={self.pitch_angles}, ' 381 | repr_str += f'pre_transform={self.pre_transform}, ' 382 | repr_str += f'prob={self.prob})' 383 | return repr_str 384 | -------------------------------------------------------------------------------- /dist_test.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | CONFIG=$1 4 | CHECKPOINT=$2 5 | GPUS=$3 6 | NNODES=${NNODES:-1} 7 | NODE_RANK=${NODE_RANK:-0} 8 | PORT=${PORT:-29500} 9 | MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"} 10 | 11 | PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \ 12 | python -m torch.distributed.launch \ 13 | --nnodes=$NNODES \ 14 | --node_rank=$NODE_RANK \ 15 | --master_addr=$MASTER_ADDR \ 16 | --nproc_per_node=$GPUS \ 17 | --master_port=$PORT \ 18 | $(dirname "$0")/test.py \ 19 | $CONFIG \ 20 | $CHECKPOINT \ 21 | --launcher pytorch \ 22 | ${@:4} 23 | -------------------------------------------------------------------------------- /dist_train.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | CONFIG=$1 4 | GPUS=$2 5 | NNODES=${NNODES:-1} 6 | NODE_RANK=${NODE_RANK:-0} 7 | PORT=${PORT:-29503} 8 | MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"} 9 | 10 | PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \ 11 | python -m torch.distributed.launch \ 12 | --nnodes=$NNODES \ 13 | --node_rank=$NODE_RANK \ 14 | --master_addr=$MASTER_ADDR \ 15 | --nproc_per_node=$GPUS \ 16 | --master_port=$PORT \ 17 | $(dirname "$0")/train.py \ 18 | $CONFIG \ 19 | --launcher pytorch ${@:3} \ 20 | -------------------------------------------------------------------------------- /evaluation/functional/panoptic_seg_eval.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 2 | # Modified from mmdetection3d. 3 | from typing import Dict, List, Tuple 4 | 5 | import numpy as np 6 | from mmengine.logging import MMLogger, print_log 7 | 8 | PQReturnsType = Tuple[np.double, np.double, np.ndarray, np.ndarray, np.ndarray] 9 | 10 | 11 | class EvalPanoptic: 12 | r"""Evaluate panoptic results for Semantickitti and NuScenes. 13 | Please refer to the `semantic kitti api 14 | `_ for more details 15 | 16 | Args: 17 | classes (list): Classes used in the dataset. 18 | thing_classes (list): Thing classes used in the dataset. 19 | stuff_classes (list): Stuff classes used in the dataset. 20 | include (list): Include classes in the dataset. 21 | dataset_type (str): Type of dataset. 22 | min_num_points (int): Minimum number of points of an object to be 23 | counted as ground truth in evaluation. 24 | id_offset (int): Offset for instance ids to concat with 25 | semantic labels. 
26 | label2cat (dict[str]): Mapping from label to category. 27 | ignore_index (int): Indices of ignored classes in evaluation. 28 | logger (logging.Logger | str, optional): Logger used for printing. 29 | Defaults to None. 30 | """ 31 | 32 | def __init__(self, 33 | classes: List[str], 34 | thing_classes: List[str], 35 | stuff_classes: List[str], 36 | include: List[int], 37 | dataset_type: str, 38 | min_num_points: int, 39 | id_offset: int, 40 | label2cat: Dict[str, str], 41 | ignore_index: int, 42 | logger: MMLogger = None): 43 | self.classes = classes 44 | self.thing_classes = thing_classes 45 | self.stuff_classes = stuff_classes 46 | self.include = include 47 | self.dataset_type = dataset_type 48 | self.ignore_index = ignore_index 49 | self.num_classes = len(classes) 50 | self.print_ignore_label=True 51 | if len(include) == self.num_classes: 52 | self.num_classes += 1 53 | self.print_ignore_label=False 54 | self.label2cat = label2cat 55 | self.logger = logger 56 | self.id_offset = id_offset 57 | self.eps = 1e-15 58 | self.min_num_points = min_num_points 59 | self.reset() 60 | 61 | def reset(self): 62 | """Reset class variables.""" 63 | # general things 64 | # iou stuff 65 | self.confusion_matrix = np.zeros((self.num_classes, self.num_classes), 66 | dtype=int) 67 | # panoptic stuff 68 | self.pan_tp = np.zeros(self.num_classes, dtype=int) 69 | self.pan_iou = np.zeros(self.num_classes, dtype=np.double) 70 | self.pan_fp = np.zeros(self.num_classes, dtype=int) 71 | self.pan_fn = np.zeros(self.num_classes, dtype=int) 72 | 73 | self.evaluated_fnames = [] 74 | 75 | def evaluate(self, gt_labels: List[Dict[str, np.ndarray]], 76 | seg_preds: List[Dict[str, np.ndarray]]) -> Dict[str, float]: 77 | """Evaluate the predictions. 78 | 79 | Args: 80 | gt_labels (list[dict[np.ndarray]]): Ground Truth. 81 | seg_preds (list[dict[np.ndarray]]): Predictions. 82 | 83 | Returns: 84 | dict[float]: The computed metrics. The keys are the names of 85 | the metrics, and the values are corresponding results. 86 | """ 87 | assert len(seg_preds) == len(gt_labels) 88 | for f in range(len(seg_preds)): 89 | gt_semantic_seg = gt_labels[f]['pts_semantic_mask'].astype(int) 90 | gt_instance_seg = gt_labels[f]['pts_instance_mask'].astype(int) 91 | pred_semantic_seg = seg_preds[f]['pts_semantic_mask'].astype(int) 92 | pred_instance_seg = seg_preds[f]['pts_instance_mask'].astype(int) 93 | 94 | valid_mask = gt_semantic_seg != self.ignore_index 95 | gt_semantic_seg = gt_semantic_seg[valid_mask] 96 | gt_instance_seg = gt_instance_seg[valid_mask] 97 | pred_semantic_seg = pred_semantic_seg[valid_mask] 98 | pred_instance_seg = pred_instance_seg[valid_mask] 99 | 100 | self.add_semantic_sample(pred_semantic_seg, gt_semantic_seg) 101 | self.add_panoptic_sample(pred_semantic_seg, gt_semantic_seg, 102 | pred_instance_seg, gt_instance_seg) 103 | 104 | result_dicts = self.print_results() 105 | 106 | return result_dicts 107 | 108 | def print_results(self) -> Dict[str, float]: 109 | """Print results. 110 | 111 | Returns: 112 | dict[float]: The computed metrics. The keys are the names of 113 | the metrics, and the values are corresponding results. 
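A toy call showing the input format `evaluate` above expects: one dict per frame with point-wise `pts_semantic_mask` and `pts_instance_mask`. Every constructor value below is made up for the example and is not a SemanticKITTI setting; the import path is an assumption based on this repository's layout.

```python
import numpy as np

# Assumed import path for this repository; adjust to how the package is used.
from evaluation.functional.panoptic_seg_eval import EvalPanoptic

classes = ['car', 'road', 'building']        # made-up 3-class setup
evaluator = EvalPanoptic(
    classes=classes,
    thing_classes=['car'],
    stuff_classes=['road', 'building'],
    include=[0, 1, 2],
    dataset_type='semantickitti',
    min_num_points=10,
    id_offset=2 ** 16,
    label2cat={i: name for i, name in enumerate(classes)},
    ignore_index=3)

# One dict per frame; here ground truth and prediction are identical.
sem = np.repeat([0, 1, 2], 40)               # 40 points per class
inst = np.where(sem == 0, 1, 0)              # one car instance, stuff gets id 0
gt = [dict(pts_semantic_mask=sem, pts_instance_mask=inst)]
pred = [dict(pts_semantic_mask=sem.copy(), pts_instance_mask=inst.copy())]

metrics = evaluator.evaluate(gt, pred)
print(metrics['pq'], metrics['miou'])        # perfect toy prediction -> 1.0 1.0
```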
114 | """ 115 | pq, sq, rq, all_pq, all_sq, all_rq = self.get_pq() 116 | miou, iou = self.get_iou() 117 | 118 | # now make a nice dictionary 119 | output_dict = {} 120 | 121 | # make python variables 122 | pq = pq.item() 123 | sq = sq.item() 124 | rq = rq.item() 125 | all_pq = all_pq.flatten().tolist() 126 | all_sq = all_sq.flatten().tolist() 127 | all_rq = all_rq.flatten().tolist() 128 | miou = miou.item() 129 | iou = iou.flatten().tolist() 130 | 131 | output_dict['all'] = {} 132 | output_dict['all']['pq'] = pq 133 | output_dict['all']['sq'] = sq 134 | output_dict['all']['rq'] = rq 135 | output_dict['all']['miou'] = miou 136 | for idx, (_pq, _sq, _rq, 137 | _iou) in enumerate(zip(all_pq, all_sq, all_rq, iou)): 138 | class_str = self.classes[idx] 139 | output_dict[class_str] = {} 140 | output_dict[class_str]['pq'] = _pq 141 | output_dict[class_str]['sq'] = _sq 142 | output_dict[class_str]['rq'] = _rq 143 | output_dict[class_str]['miou'] = _iou 144 | 145 | pq_dagger = np.mean( 146 | [float(output_dict[c]['pq']) for c in self.thing_classes] + 147 | [float(output_dict[c]['miou']) for c in self.stuff_classes]) 148 | 149 | pq_things = np.mean( 150 | [float(output_dict[c]['pq']) for c in self.thing_classes]) 151 | rq_things = np.mean( 152 | [float(output_dict[c]['rq']) for c in self.thing_classes]) 153 | sq_things = np.mean( 154 | [float(output_dict[c]['sq']) for c in self.thing_classes]) 155 | 156 | pq_stuff = np.mean( 157 | [float(output_dict[c]['pq']) for c in self.stuff_classes]) 158 | rq_stuff = np.mean( 159 | [float(output_dict[c]['rq']) for c in self.stuff_classes]) 160 | sq_stuff = np.mean( 161 | [float(output_dict[c]['sq']) for c in self.stuff_classes]) 162 | 163 | result_dicts = {} 164 | result_dicts['pq'] = float(pq) 165 | result_dicts['pq_dagger'] = float(pq_dagger) 166 | result_dicts['sq_mean'] = float(sq) 167 | result_dicts['rq_mean'] = float(rq) 168 | result_dicts['miou'] = float(miou) 169 | result_dicts['pq_stuff'] = float(pq_stuff) 170 | result_dicts['rq_stuff'] = float(rq_stuff) 171 | result_dicts['sq_stuff'] = float(sq_stuff) 172 | result_dicts['pq_things'] = float(pq_things) 173 | result_dicts['rq_things'] = float(rq_things) 174 | result_dicts['sq_things'] = float(sq_things) 175 | 176 | if self.logger is not None: 177 | print_log('| | IoU | PQ | RQ | SQ |', 178 | self.logger) 179 | for k, v in output_dict.items(): 180 | print_log( 181 | '|{}| {:.4f} | {:.4f} | {:.4f} | {:.4f} |'.format( 182 | k.ljust(8)[-8:], v['miou'], v['pq'], v['rq'], v['sq']), 183 | self.logger) 184 | print_log('True Positive: ', self.logger) 185 | print_log('\t|\t'.join([str(x) for x in self.pan_tp]), self.logger) 186 | print_log('False Positive: ') 187 | print_log('\t|\t'.join([str(x) for x in self.pan_fp]), self.logger) 188 | print_log('False Negative: ') 189 | print_log('\t|\t'.join([str(x) for x in self.pan_fn]), self.logger) 190 | 191 | else: 192 | print('| | IoU | PQ | RQ | SQ |') 193 | for k, v in output_dict.items(): 194 | print('|{}| {:.4f} | {:.4f} | {:.4f} | {:.4f} |'.format( 195 | k.ljust(8)[-8:], v['miou'], v['pq'], v['rq'], v['sq'])) 196 | print('True Positive: ') 197 | print('\t|\t'.join([str(x) for x in self.pan_tp])) 198 | print('False Positive: ') 199 | print('\t|\t'.join([str(x) for x in self.pan_fp])) 200 | print('False Negative: ') 201 | print('\t|\t'.join([str(x) for x in self.pan_fn])) 202 | 203 | return result_dicts 204 | 205 | def get_pq(self) -> PQReturnsType: 206 | """Get results of PQ metric. 207 | 208 | Returns: 209 | tuple(np.ndarray): PQ, SQ, RQ of each class and all class. 
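`get_pq` below evaluates the standard panoptic metrics per class from the accumulated statistics, and `pq_dagger` in `print_results` additionally averages thing-class PQ with stuff-class IoU. A compact restatement of the arithmetic, as a sketch rather than the method itself: SQ is the mean IoU over matched segments, RQ is the F1-style recognition quality, and PQ is their product.

```python
import numpy as np

# Per-class panoptic metrics from accumulated TP/FP/FN counts and the sum of
# IoUs over true-positive matches.
def pq_from_stats(iou_sum, tp, fp, fn, eps=1e-15):
    sq = iou_sum / np.maximum(tp, eps)                      # segmentation quality
    rq = tp / np.maximum(tp + 0.5 * fp + 0.5 * fn, eps)     # recognition quality
    return sq * rq, sq, rq                                  # PQ = SQ * RQ

pq, sq, rq = pq_from_stats(np.array([8.1]), np.array([9.0]),
                           np.array([2.0]), np.array([1.0]))
print(pq, sq, rq)   # pq = (8.1 / 9) * (9 / 10.5) ~= 0.771
```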
210 | """ 211 | # get PQ and first calculate for all classes 212 | sq_all = self.pan_iou.astype(np.double) / np.maximum( 213 | self.pan_tp.astype(np.double), self.eps) 214 | rq_all = self.pan_tp.astype(np.double) / np.maximum( 215 | self.pan_tp.astype(np.double) + 0.5 * self.pan_fp.astype(np.double) 216 | + 0.5 * self.pan_fn.astype(np.double), self.eps) 217 | pq_all = sq_all * rq_all 218 | 219 | # then do the REAL mean (no ignored classes) 220 | sq = sq_all[self.include].mean() 221 | rq = rq_all[self.include].mean() 222 | pq = pq_all[self.include].mean() 223 | 224 | if not self.print_ignore_label: 225 | sq_all = sq_all[self.include] 226 | rq_all = rq_all[self.include] 227 | pq_all = pq_all[self.include] 228 | 229 | return (pq, sq, rq, pq_all, sq_all, rq_all) 230 | 231 | def get_iou(self) -> Tuple[np.double, np.ndarray]: 232 | """Get results of IOU metric. 233 | 234 | Returns: 235 | tuple(np.ndarray): iou of all class and each class. 236 | """ 237 | tp, fp, fn = self.get_iou_stats() 238 | intersection = tp 239 | union = tp + fp + fn 240 | union = np.maximum(union, self.eps) 241 | iou = intersection.astype(np.double) / union.astype(np.double) 242 | iou_mean = (intersection[self.include].astype(np.double) / 243 | union[self.include].astype(np.double)).mean() 244 | 245 | if not self.print_ignore_label: 246 | iou = iou[self.include] 247 | return iou_mean, iou 248 | 249 | def get_iou_stats(self) -> Tuple[np.ndarray, np.ndarray, np.ndarray]: 250 | """Get IOU statistics of TP, FP and FN. 251 | 252 | Returns: 253 | tuple(np.ndarray): TP, FP, FN of all class. 254 | """ 255 | # copy to avoid modifying the real deal 256 | conf = self.confusion_matrix.copy().astype(np.double) 257 | # remove fp from confusion on the ignore classes predictions 258 | # points that were predicted of another class, but were ignore 259 | # (corresponds to zeroing the cols of those classes, 260 | # since the predictions go on the rows) 261 | 262 | # get the clean stats 263 | tp = conf.diagonal() 264 | fp = conf.sum(axis=1) - tp 265 | fn = conf.sum(axis=0) - tp 266 | return tp, fp, fn 267 | 268 | def add_semantic_sample(self, semantic_preds: np.ndarray, 269 | gt_semantics: np.ndarray): 270 | """Add one batch of semantic predictions and ground truths. 271 | 272 | Args: 273 | semantic_preds (np.ndarray): Semantic predictions. 274 | gt_semantics (np.ndarray): Semantic ground truths. 275 | """ 276 | idxs = np.stack([semantic_preds, gt_semantics], axis=0) 277 | # make confusion matrix (cols = gt, rows = pred) 278 | np.add.at(self.confusion_matrix, tuple(idxs), 1) 279 | 280 | def add_panoptic_sample(self, semantic_preds: np.ndarray, 281 | gt_semantics: np.ndarray, 282 | instance_preds: np.ndarray, 283 | gt_instances: np.ndarray): 284 | """Add one sample of panoptic predictions and ground truths for 285 | evaluation. 286 | 287 | Args: 288 | semantic_preds (np.ndarray): Semantic predictions. 289 | gt_semantics (np.ndarray): Semantic ground truths. 290 | instance_preds (np.ndarray): Instance predictions. 291 | gt_instances (np.ndarray): Instance ground truths. 
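The matching below relies on encoding each overlapping (prediction, ground-truth) instance pair as a single integer `pred + id_offset * gt`, so that `np.unique` yields the intersection area of every pair in one pass; pairs with IoU above 0.5 become true positives. A small sketch of that trick with toy ids (the offset value here is arbitrary):

```python
import numpy as np

offset = 2 ** 16
pred = np.array([1, 1, 1, 2, 2, 0])   # 0 = no instance
gt   = np.array([7, 7, 8, 8, 8, 0])

valid = (pred > 0) & (gt > 0)
combo = pred[valid] + offset * gt[valid]
pairs, inter = np.unique(combo, return_counts=True)

pred_area = {i: c for i, c in zip(*np.unique(pred[pred > 0], return_counts=True))}
gt_area = {i: c for i, c in zip(*np.unique(gt[gt > 0], return_counts=True))}

for p, n in zip(pairs, inter):
    pid, gid = p % offset, p // offset
    iou = n / (pred_area[pid] + gt_area[gid] - n)
    print(pid, gid, iou, 'TP' if iou > 0.5 else 'no match')
```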
292 | """ 293 | # avoid zero (ignored label) 294 | instance_preds = instance_preds + 1 295 | 296 | if self.dataset_type == 'nuscenes': 297 | gt_instances = gt_instances % self.id_offset + gt_semantics * self.id_offset 298 | elif self.dataset_type == 'semantickitti': 299 | gt_instances = gt_instances // self.id_offset * self.id_offset + gt_semantics 300 | gt_instances = gt_instances + 1 301 | 302 | # first step is to count intersections > 0.5 IoU 303 | # for each class (except the ignored ones) 304 | for cl in range(self.num_classes): 305 | # get a class mask 306 | pred_inst_in_cl_mask = semantic_preds == cl 307 | gt_inst_in_cl_mask = gt_semantics == cl 308 | 309 | # get instance points in class (makes outside stuff 0) 310 | pred_inst_in_cl = instance_preds * pred_inst_in_cl_mask.astype(int) 311 | gt_inst_in_cl = gt_instances * gt_inst_in_cl_mask.astype(int) 312 | 313 | # generate the areas for each unique instance prediction 314 | unique_pred, counts_pred = np.unique( 315 | pred_inst_in_cl[pred_inst_in_cl > 0], return_counts=True) 316 | id2idx_pred = {id: idx for idx, id in enumerate(unique_pred)} 317 | matched_pred = np.array([False] * unique_pred.shape[0]) 318 | 319 | # generate the areas for each unique instance gt_np 320 | unique_gt, counts_gt = np.unique( 321 | gt_inst_in_cl[gt_inst_in_cl > 0], return_counts=True) 322 | id2idx_gt = {id: idx for idx, id in enumerate(unique_gt)} 323 | matched_gt = np.array([False] * unique_gt.shape[0]) 324 | 325 | # generate intersection using offset 326 | valid_combos = np.logical_and(pred_inst_in_cl > 0, 327 | gt_inst_in_cl > 0) 328 | id_offset_combo = pred_inst_in_cl[ 329 | valid_combos] + self.id_offset * gt_inst_in_cl[valid_combos] 330 | unique_combo, counts_combo = np.unique( 331 | id_offset_combo, return_counts=True) 332 | 333 | # generate an intersection map 334 | # count the intersections with over 0.5 IoU as TP 335 | gt_labels = unique_combo // self.id_offset 336 | pred_labels = unique_combo % self.id_offset 337 | gt_areas = np.array([counts_gt[id2idx_gt[id]] for id in gt_labels]) 338 | pred_areas = np.array( 339 | [counts_pred[id2idx_pred[id]] for id in pred_labels]) 340 | intersections = counts_combo 341 | unions = gt_areas + pred_areas - intersections 342 | ious = intersections.astype(float) / unions.astype(float) 343 | 344 | tp_indexes = ious > 0.5 345 | self.pan_tp[cl] += np.sum(tp_indexes) 346 | self.pan_iou[cl] += np.sum(ious[tp_indexes]) 347 | 348 | matched_gt[[id2idx_gt[id] for id in gt_labels[tp_indexes]]] = True 349 | matched_pred[[id2idx_pred[id] 350 | for id in pred_labels[tp_indexes]]] = True 351 | 352 | # count the FN 353 | if len(counts_gt) > 0: 354 | self.pan_fn[cl] += np.sum( 355 | np.logical_and(counts_gt >= self.min_num_points, 356 | ~matched_gt)) 357 | 358 | # count the FP 359 | if len(matched_pred) > 0: 360 | self.pan_fp[cl] += np.sum( 361 | np.logical_and(counts_pred >= self.min_num_points, 362 | ~matched_pred)) 363 | 364 | 365 | def panoptic_seg_eval(gt_labels: List[np.ndarray], 366 | seg_preds: List[np.ndarray], 367 | classes: List[str], 368 | thing_classes: List[str], 369 | stuff_classes: List[str], 370 | include: List[int], 371 | dataset_type: str, 372 | min_num_points: int, 373 | id_offset: int, 374 | label2cat: Dict[str, str], 375 | ignore_index: int, 376 | logger: MMLogger = None) -> Dict[str, float]: 377 | """Panoptic Segmentation Evaluation. 378 | 379 | Evaluate the result of the panoptic segmentation. 380 | 381 | Args: 382 | gt_labels (list[dict[np.ndarray]]): Ground Truth. 
383 | seg_preds (list[dict[np.ndarray]]): Predictions. 384 | classes (list[str]): Classes used in the dataset. 385 | thing_classes (list[str]): Thing classes used in the dataset. 386 | stuff_classes (list[str]): Stuff classes used in the dataset. 387 | include (list): Include classes in the dataset. 388 | dataset_type (str): Type of dataset. 389 | min_num_points (int): Minimum point number of object to be 390 | counted as ground truth in evaluation. 391 | id_offset (int): Offset for instance ids to concat with 392 | semantic labels. 393 | label2cat (dict[str]): Mapping from label to category. 394 | ignore_index (int): Indices of ignored classes in evaluation. 395 | logger (logging.Logger | str, optional): Logger used for printing. 396 | Defaults to None. 397 | 398 | Returns: 399 | dict[float]: Dict of results. 400 | """ 401 | panoptic_seg_eval = EvalPanoptic(classes, thing_classes, stuff_classes, include, dataset_type, 402 | min_num_points, id_offset, label2cat, 403 | ignore_index, logger) 404 | ret_dict = panoptic_seg_eval.evaluate(gt_labels, seg_preds) 405 | return ret_dict 406 | -------------------------------------------------------------------------------- /evaluation/metrics/panoptic_seg_metric.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 2 | # Modified from mmdetection3d. 3 | from typing import Dict, List, Optional 4 | 5 | from mmengine.logging import MMLogger 6 | import mmengine 7 | from ..functional.panoptic_seg_eval import panoptic_seg_eval 8 | from mmdet3d.registry import METRICS 9 | from mmdet3d.evaluation import SegMetric 10 | import os 11 | import json 12 | import numpy as np 13 | 14 | 15 | @METRICS.register_module() 16 | class _PanopticSegMetric(SegMetric): 17 | def __init__(self, 18 | thing_class_inds: List[int], 19 | stuff_class_inds: List[int], 20 | min_num_points: int, 21 | id_offset: int, 22 | dataset_type: str, 23 | collect_device: str = 'cpu', 24 | prefix: Optional[str] = None, 25 | pklfile_prefix: str = None, 26 | submission_prefix: str = None, 27 | taskset: str = None, 28 | learning_map_inv = None, 29 | **kwargs): 30 | self.thing_class_inds = thing_class_inds 31 | self.stuff_class_inds = stuff_class_inds 32 | self.min_num_points = min_num_points 33 | self.id_offset = id_offset 34 | self.dataset_type=dataset_type 35 | self.taskset = taskset 36 | self.learning_map_inv = learning_map_inv 37 | 38 | super(_PanopticSegMetric, self).__init__( 39 | pklfile_prefix=pklfile_prefix, 40 | submission_prefix=submission_prefix, 41 | prefix=prefix, 42 | collect_device=collect_device, 43 | **kwargs) 44 | 45 | # TODO modify format_result for panoptic segmentation evaluation, \ 46 | # different datasets have different needs. 47 | 48 | def compute_metrics(self, results: list) -> Dict[str, float]: 49 | """Compute the metrics from processed results. 50 | 51 | Args: 52 | results (list): The processed results of each batch. 53 | 54 | Returns: 55 | Dict[str, float]: The computed metrics. The keys are the names of 56 | the metrics, and the values are corresponding results. 57 | """ 58 | logger: MMLogger = MMLogger.get_current_instance() 59 | 60 | if self.submission_prefix: # will report a error when finishing. 
61 | self.format_results(results) 62 | return None 63 | 64 | label2cat = self.dataset_meta['label2cat'] 65 | ignore_index = self.dataset_meta['ignore_index'] 66 | classes = self.dataset_meta['classes'] 67 | thing_classes = [classes[i] for i in self.thing_class_inds] 68 | stuff_classes = [classes[i] for i in self.stuff_class_inds] 69 | include = self.thing_class_inds + self.stuff_class_inds 70 | 71 | gt_labels = [] 72 | seg_preds = [] 73 | for eval_ann, sinlge_pred_results in results: 74 | gt_labels.append(eval_ann) 75 | seg_preds.append(sinlge_pred_results) 76 | 77 | ret_dict = panoptic_seg_eval(gt_labels, seg_preds, classes, 78 | thing_classes, stuff_classes, include, self.dataset_type, 79 | self.min_num_points, self.id_offset, 80 | label2cat, ignore_index, logger) 81 | 82 | return ret_dict 83 | 84 | def format_results(self, results): 85 | r"""Format the results to txt file. Refer to `ScanNet documentation 86 | `_. 87 | 88 | Args: 89 | outputs (list[dict]): Testing results of the dataset. 90 | 91 | Returns: 92 | tuple: (outputs, tmp_dir), outputs is the detection results, 93 | tmp_dir is the temporal directory created for saving submission 94 | files when ``submission_prefix`` is not specified. 95 | """ 96 | 97 | submission_prefix = self.submission_prefix 98 | mmengine.mkdir_or_exist(submission_prefix) 99 | ignore_index = self.dataset_meta['ignore_index'] 100 | 101 | 102 | if self.dataset_type == 'nuscenes': 103 | meta_dir = os.path.join(submission_prefix, self.taskset) 104 | mmengine.mkdir_or_exist(meta_dir) 105 | meta = {"meta": { 106 | "task": "segmentation", 107 | "use_camera": False, 108 | "use_lidar": True, 109 | "use_radar": False, 110 | "use_map": False, 111 | "use_external": False}} 112 | output = open(os.path.join(submission_prefix, self.taskset, 'submission.json'), 'w') 113 | json_meta = json.dumps(meta) 114 | output.write(json_meta) 115 | output.close() 116 | 117 | for i, (eval_ann, result) in enumerate(results): 118 | sample_token = eval_ann['token'] 119 | results_dir = os.path.join(submission_prefix, 'panoptic', self.taskset) 120 | mmengine.mkdir_or_exist(results_dir) 121 | results_dir = os.path.join(results_dir, sample_token) 122 | pred_semantic_mask = result['pts_semantic_mask'] 123 | pred_instance_mask = result['pts_instance_mask'] 124 | pred_panoptic_mask = (pred_instance_mask + pred_semantic_mask*self.id_offset).astype(np.uint16) 125 | curr_file = results_dir + "_panoptic.npz" 126 | np.savez_compressed(curr_file, data=pred_panoptic_mask) 127 | elif self.dataset_type == "semantickitti": 128 | results_dir = os.path.join(submission_prefix, 'sequences') 129 | mmengine.mkdir_or_exist(results_dir) 130 | for i in range(0,22): 131 | sub_dir = os.path.join(results_dir, str(i).zfill(2), 'predictions') 132 | mmengine.mkdir_or_exist(sub_dir) 133 | 134 | learning_map_inv_array = np.zeros(len(self.learning_map_inv)) 135 | for i, v in enumerate(self.learning_map_inv.values()): 136 | learning_map_inv_array[i] = v 137 | for i, (eval_ann, result) in enumerate(results): 138 | semantic_preds = result['pts_semantic_mask'] 139 | instance_preds = result['pts_instance_mask'] 140 | semantic_preds_inv = learning_map_inv_array[semantic_preds] 141 | panoptic_preds = semantic_preds_inv.reshape(-1, 1) + (instance_preds * self.id_offset).reshape(-1, 1) 142 | 143 | lidar_path = eval_ann['lidar_path'] 144 | seq = lidar_path.split('/')[-3] 145 | pts_fname = lidar_path.split('/')[-1].split('.')[-2]+'.label' 146 | fname = os.path.join(results_dir, seq, 'predictions', pts_fname) 147 | 
panoptic_preds.reshape(-1).astype(np.uint32).tofile(fname) 148 | else: 149 | raise NotImplementedError() -------------------------------------------------------------------------------- /p3former/backbones/cylinder3d.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 2 | r"""Modified from Cylinder3D. 3 | 4 | Please refer to `Cylinder3D github page 5 | `_ for details 6 | """ 7 | 8 | from typing import List, Optional 9 | 10 | import numpy as np 11 | import torch 12 | from mmcv.cnn import build_activation_layer, build_norm_layer 13 | from mmcv.ops import (SparseConv3d, SparseConvTensor, SparseInverseConv3d, 14 | SubMConv3d) 15 | from mmengine.model import BaseModule 16 | 17 | from mmdet3d.registry import MODELS 18 | from mmdet3d.utils import ConfigType 19 | 20 | 21 | class AsymmResBlock(BaseModule): 22 | """Asymmetrical Residual Block. 23 | 24 | Args: 25 | in_channels (int): Input channels of the block. 26 | out_channels (int): Output channels of the block. 27 | norm_cfg (:obj:`ConfigDict` or dict): Config dict for 28 | normalization layer. 29 | act_cfg (:obj:`ConfigDict` or dict): Config dict of activation layers. 30 | Defaults to dict(type='LeakyReLU'). 31 | indice_key (str, optional): Name of indice tables. Defaults to None. 32 | """ 33 | 34 | def __init__(self, 35 | in_channels: int, 36 | out_channels: int, 37 | norm_cfg: ConfigType, 38 | act_cfg: ConfigType = dict(type='LeakyReLU'), 39 | indice_key: Optional[str] = None): 40 | super().__init__() 41 | 42 | self.conv0_0 = SubMConv3d( 43 | in_channels, 44 | out_channels, 45 | kernel_size=(1, 3, 3), 46 | padding=1, 47 | bias=False, 48 | indice_key=indice_key + 'bef') 49 | self.act0_0 = build_activation_layer(act_cfg) 50 | self.bn0_0 = build_norm_layer(norm_cfg, out_channels)[1] 51 | 52 | self.conv0_1 = SubMConv3d( 53 | out_channels, 54 | out_channels, 55 | kernel_size=(3, 1, 3), 56 | padding=1, 57 | bias=False, 58 | indice_key=indice_key + 'bef') 59 | self.act0_1 = build_activation_layer(act_cfg) 60 | self.bn0_1 = build_norm_layer(norm_cfg, out_channels)[1] 61 | 62 | self.conv1_0 = SubMConv3d( 63 | in_channels, 64 | out_channels, 65 | kernel_size=(3, 1, 3), 66 | padding=1, 67 | bias=False, 68 | indice_key=indice_key + 'bef') 69 | self.act1_0 = build_activation_layer(act_cfg) 70 | self.bn1_0 = build_norm_layer(norm_cfg, out_channels)[1] 71 | 72 | self.conv1_1 = SubMConv3d( 73 | out_channels, 74 | out_channels, 75 | kernel_size=(1, 3, 3), 76 | padding=1, 77 | bias=False, 78 | indice_key=indice_key + 'bef') 79 | self.act1_1 = build_activation_layer(act_cfg) 80 | self.bn1_1 = build_norm_layer(norm_cfg, out_channels)[1] 81 | 82 | def forward(self, x: SparseConvTensor) -> SparseConvTensor: 83 | """Forward pass.""" 84 | shortcut = self.conv0_0(x) 85 | 86 | shortcut.features = self.act0_0(shortcut.features) 87 | shortcut.features = self.bn0_0(shortcut.features) 88 | 89 | shortcut = self.conv0_1(shortcut) 90 | shortcut.features = self.act0_1(shortcut.features) 91 | shortcut.features = self.bn0_1(shortcut.features) 92 | 93 | res = self.conv1_0(x) 94 | res.features = self.act1_0(res.features) 95 | res.features = self.bn1_0(res.features) 96 | 97 | res = self.conv1_1(res) 98 | res.features = self.act1_1(res.features) 99 | res.features = self.bn1_1(res.features) 100 | 101 | res.features = res.features + shortcut.features 102 | 103 | return res 104 | 105 | 106 | class AsymmeDownBlock(BaseModule): 107 | """Asymmetrical DownSample Block. 
108 | 109 | Args: 110 | in_channels (int): Input channels of the block. 111 | out_channels (int): Output channels of the block. 112 | norm_cfg (:obj:`ConfigDict` or dict): Config dict for 113 | normalization layer. 114 | act_cfg (:obj:`ConfigDict` or dict): Config dict of activation layers. 115 | Defaults to dict(type='LeakyReLU'). 116 | pooling (bool): Whether pooling features at the end of 117 | block. Defaults: True. 118 | height_pooling (bool): Whether pooling features at 119 | the height dimension. Defaults: False. 120 | indice_key (str, optional): Name of indice tables. Defaults to None. 121 | """ 122 | 123 | def __init__(self, 124 | in_channels: int, 125 | out_channels: int, 126 | norm_cfg: ConfigType, 127 | act_cfg: ConfigType = dict(type='LeakyReLU'), 128 | pooling: bool = True, 129 | height_pooling: bool = False, 130 | indice_key: Optional[str] = None): 131 | super().__init__() 132 | self.pooling = pooling 133 | 134 | self.conv0_0 = SubMConv3d( 135 | in_channels, 136 | out_channels, 137 | kernel_size=(3, 1, 3), 138 | padding=1, 139 | bias=False, 140 | indice_key=indice_key + 'bef') 141 | self.act0_0 = build_activation_layer(act_cfg) 142 | self.bn0_0 = build_norm_layer(norm_cfg, out_channels)[1] 143 | 144 | self.conv0_1 = SubMConv3d( 145 | out_channels, 146 | out_channels, 147 | kernel_size=(1, 3, 3), 148 | padding=1, 149 | bias=False, 150 | indice_key=indice_key + 'bef') 151 | self.act0_1 = build_activation_layer(act_cfg) 152 | self.bn0_1 = build_norm_layer(norm_cfg, out_channels)[1] 153 | 154 | self.conv1_0 = SubMConv3d( 155 | in_channels, 156 | out_channels, 157 | kernel_size=(1, 3, 3), 158 | padding=1, 159 | bias=False, 160 | indice_key=indice_key + 'bef') 161 | self.act1_0 = build_activation_layer(act_cfg) 162 | self.bn1_0 = build_norm_layer(norm_cfg, out_channels)[1] 163 | 164 | self.conv1_1 = SubMConv3d( 165 | out_channels, 166 | out_channels, 167 | kernel_size=(3, 1, 3), 168 | padding=1, 169 | bias=False, 170 | indice_key=indice_key + 'bef') 171 | self.act1_1 = build_activation_layer(act_cfg) 172 | self.bn1_1 = build_norm_layer(norm_cfg, out_channels)[1] 173 | 174 | if pooling: 175 | if height_pooling: 176 | self.pool = SparseConv3d( 177 | out_channels, 178 | out_channels, 179 | kernel_size=3, 180 | stride=2, 181 | padding=1, 182 | indice_key=indice_key, 183 | bias=False) 184 | else: 185 | self.pool = SparseConv3d( 186 | out_channels, 187 | out_channels, 188 | kernel_size=3, 189 | stride=(2, 2, 1), 190 | padding=1, 191 | indice_key=indice_key, 192 | bias=False) 193 | 194 | def forward(self, x: SparseConvTensor) -> SparseConvTensor: 195 | """Forward pass.""" 196 | shortcut = self.conv0_0(x) 197 | shortcut.features = self.act0_0(shortcut.features) 198 | shortcut.features = self.bn0_0(shortcut.features) 199 | 200 | shortcut = self.conv0_1(shortcut) 201 | shortcut.features = self.act0_1(shortcut.features) 202 | shortcut.features = self.bn0_1(shortcut.features) 203 | 204 | res = self.conv1_0(x) 205 | res.features = self.act1_0(res.features) 206 | res.features = self.bn1_0(res.features) 207 | 208 | res = self.conv1_1(res) 209 | res.features = self.act1_1(res.features) 210 | res.features = self.bn1_1(res.features) 211 | 212 | res.features = res.features + shortcut.features 213 | 214 | if self.pooling: 215 | pooled_res = self.pool(res) 216 | return pooled_res, res 217 | else: 218 | return res 219 | 220 | 221 | class AsymmeUpBlock(BaseModule): 222 | """Asymmetrical UpSample Block. 223 | 224 | Args: 225 | in_channels (int): Input channels of the block. 
226 | out_channels (int): Output channels of the block. 227 | norm_cfg (:obj:`ConfigDict` or dict): Config dict for 228 | normalization layer. 229 | act_cfg (:obj:`ConfigDict` or dict): Config dict of activation layers. 230 | Defaults to dict(type='LeakyReLU'). 231 | indice_key (str, optional): Name of indice tables. Defaults to None. 232 | up_key (str, optional): Name of indice tables used in 233 | SparseInverseConv3d. Defaults to None. 234 | """ 235 | 236 | def __init__(self, 237 | in_channels: int, 238 | out_channels: int, 239 | norm_cfg: ConfigType, 240 | act_cfg: ConfigType = dict(type='LeakyReLU'), 241 | indice_key: Optional[str] = None, 242 | up_key: Optional[str] = None): 243 | super().__init__() 244 | 245 | self.trans_conv = SubMConv3d( 246 | in_channels, 247 | out_channels, 248 | kernel_size=(3, 3, 3), 249 | padding=1, 250 | bias=False, 251 | indice_key=indice_key + 'new_up') 252 | self.trans_act = build_activation_layer(act_cfg) 253 | self.trans_bn = build_norm_layer(norm_cfg, out_channels)[1] 254 | 255 | self.conv1 = SubMConv3d( 256 | out_channels, 257 | out_channels, 258 | kernel_size=(1, 3, 3), 259 | padding=1, 260 | bias=False, 261 | indice_key=indice_key) 262 | self.act1 = build_activation_layer(act_cfg) 263 | self.bn1 = build_norm_layer(norm_cfg, out_channels)[1] 264 | 265 | self.conv2 = SubMConv3d( 266 | out_channels, 267 | out_channels, 268 | kernel_size=(3, 1, 3), 269 | padding=1, 270 | bias=False, 271 | indice_key=indice_key) 272 | self.act2 = build_activation_layer(act_cfg) 273 | self.bn2 = build_norm_layer(norm_cfg, out_channels)[1] 274 | 275 | self.conv3 = SubMConv3d( 276 | out_channels, 277 | out_channels, 278 | kernel_size=(3, 3, 3), 279 | padding=1, 280 | bias=False, 281 | indice_key=indice_key) 282 | self.act3 = build_activation_layer(act_cfg) 283 | self.bn3 = build_norm_layer(norm_cfg, out_channels)[1] 284 | 285 | self.up_subm = SparseInverseConv3d( 286 | out_channels, 287 | out_channels, 288 | kernel_size=3, 289 | indice_key=up_key, 290 | bias=False) 291 | 292 | def forward(self, x: SparseConvTensor, 293 | skip: SparseConvTensor) -> SparseConvTensor: 294 | """Forward pass.""" 295 | x_trans = self.trans_conv(x) 296 | x_trans.features = self.trans_act(x_trans.features) 297 | x_trans.features = self.trans_bn(x_trans.features) 298 | 299 | # upsample 300 | up = self.up_subm(x_trans) 301 | 302 | up.features = up.features + skip.features 303 | 304 | up = self.conv1(up) 305 | up.features = self.act1(up.features) 306 | up.features = self.bn1(up.features) 307 | 308 | up = self.conv2(up) 309 | up.features = self.act2(up.features) 310 | up.features = self.bn2(up.features) 311 | 312 | up = self.conv3(up) 313 | up.features = self.act3(up.features) 314 | up.features = self.bn3(up.features) 315 | 316 | return up 317 | 318 | 319 | class DDCMBlock(BaseModule): 320 | """Dimension-Decomposition based Context Modeling. 321 | 322 | Args: 323 | in_channels (int): Input channels of the block. 324 | out_channels (int): Output channels of the block. 325 | norm_cfg (:obj:`ConfigDict` or dict): Config dict for 326 | normalization layer. 327 | act_cfg (:obj:`ConfigDict` or dict): Config dict of activation layers. 328 | Defaults to dict(type='Sigmoid'). 329 | indice_key (str, optional): Name of indice tables. Defaults to None. 
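The block below gates its input with the sum of three axis-wise sigmoid responses. A dense-tensor sketch of the same idea, using ordinary `nn.Conv3d` instead of the submanifold sparse convolutions the real block uses; channel count, shapes and padding are chosen for the toy example only.

```python
import torch
import torch.nn as nn

# Three 1-D context convolutions (one per axis), each followed by BN and a
# sigmoid, summed into an attention map that rescales the input features.
class DenseDDCM(nn.Module):
    def __init__(self, channels):
        super().__init__()
        def branch(kernel, pad):
            return nn.Sequential(
                nn.Conv3d(channels, channels, kernel, padding=pad, bias=False),
                nn.BatchNorm3d(channels), nn.Sigmoid())
        self.bx = branch((3, 1, 1), (1, 0, 0))
        self.by = branch((1, 3, 1), (0, 1, 0))
        self.bz = branch((1, 1, 3), (0, 0, 1))

    def forward(self, x):
        gate = self.bx(x) + self.by(x) + self.bz(x)
        return gate * x

feat = torch.randn(1, 32, 8, 8, 4)
print(DenseDDCM(32)(feat).shape)   # torch.Size([1, 32, 8, 8, 4])
```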
330 | """ 331 | 332 | def __init__(self, 333 | in_channels: int, 334 | out_channels: int, 335 | norm_cfg: ConfigType, 336 | act_cfg: ConfigType = dict(type='Sigmoid'), 337 | indice_key: Optional[str] = None): 338 | super().__init__() 339 | 340 | self.conv1 = SubMConv3d( 341 | in_channels, 342 | out_channels, 343 | kernel_size=(3, 1, 1), 344 | padding=1, 345 | bias=False, 346 | indice_key=indice_key) 347 | self.bn1 = build_norm_layer(norm_cfg, out_channels)[1] 348 | self.act1 = build_activation_layer(act_cfg) 349 | 350 | self.conv2 = SubMConv3d( 351 | in_channels, 352 | out_channels, 353 | kernel_size=(1, 3, 1), 354 | padding=1, 355 | bias=False, 356 | indice_key=indice_key) 357 | self.bn2 = build_norm_layer(norm_cfg, out_channels)[1] 358 | self.act2 = build_activation_layer(act_cfg) 359 | 360 | self.conv3 = SubMConv3d( 361 | in_channels, 362 | out_channels, 363 | kernel_size=(1, 1, 3), 364 | padding=1, 365 | bias=False, 366 | indice_key=indice_key) 367 | self.bn3 = build_norm_layer(norm_cfg, out_channels)[1] 368 | self.act3 = build_activation_layer(act_cfg) 369 | 370 | def forward(self, x: SparseConvTensor) -> SparseConvTensor: 371 | """Forward pass.""" 372 | shortcut = self.conv1(x) 373 | shortcut.features = self.bn1(shortcut.features) 374 | shortcut.features = self.act1(shortcut.features) 375 | 376 | shortcut2 = self.conv2(x) 377 | shortcut2.features = self.bn2(shortcut2.features) 378 | shortcut2.features = self.act2(shortcut2.features) 379 | 380 | shortcut3 = self.conv3(x) 381 | shortcut3.features = self.bn3(shortcut3.features) 382 | shortcut3.features = self.act3(shortcut3.features) 383 | shortcut.features = shortcut.features + \ 384 | shortcut2.features + shortcut3.features 385 | 386 | shortcut.features = shortcut.features * x.features 387 | 388 | return shortcut 389 | 390 | 391 | @MODELS.register_module(force=True) 392 | class _Asymm3DSpconv(BaseModule): 393 | """Asymmetrical 3D convolution networks. 394 | 395 | Args: 396 | grid_size (int): Size of voxel grids. 397 | input_channels (int): Input channels of the block. 398 | base_channels (int): Initial size of feature channels before 399 | feeding into Encoder-Decoder structure. Defaults to 16. 400 | backbone_depth (int): The depth of backbone. The backbone contains 401 | downblocks and upblocks with the number of backbone_depth. 402 | height_pooing (List[bool]): List indicating which downblocks perform 403 | height pooling. 404 | norm_cfg (:obj:`ConfigDict` or dict): Config dict for normalization 405 | layer. Defaults to dict(type='BN1d', eps=1e-3, momentum=0.01)). 406 | init_cfg (dict, optional): Initialization config. 407 | Defaults to None. 
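With the defaults documented above, the encoder doubles the channel width at every down block, the decoder mirrors it, and the final output concatenates the DDCM features with the last decoder features, giving 4 * base_channels channels. A quick sketch of the widths implied by the constructor below, derived by reading it, so verify against the code:

```python
# Channel widths through the asymmetric U-Net, assuming the documented
# defaults base_channels=16 and backbone_depth=4.
base, depth = 16, 4
down = [(2 ** i * base, 2 ** (i + 1) * base) for i in range(depth)]
up = []
for i in range(depth):
    in_ch = 2 ** (i + 1) * base if i == depth - 1 else 2 ** (i + 2) * base
    up.append((in_ch, 2 ** (i + 1) * base))
final = 2 * base + 2 * base        # concat(DDCM output, last up block)

print(down)    # [(16, 32), (32, 64), (64, 128), (128, 256)]
print(up)      # [(64, 32), (128, 64), (256, 128), (256, 256)]
print(final)   # 64 output channels = 4 * base_channels
```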
408 | """ 409 | 410 | def __init__(self, 411 | grid_size: int, 412 | input_channels: int, 413 | base_channels: int = 16, 414 | backbone_depth: int = 4, 415 | height_pooing: List[bool] = [True, True, False, False], 416 | norm_cfg: ConfigType = dict( 417 | type='BN1d', eps=1e-3, momentum=0.01), 418 | init_cfg=None, 419 | more_conv=False, 420 | out_channels=128): 421 | super().__init__(init_cfg=init_cfg) 422 | 423 | self.grid_size = grid_size 424 | self.backbone_depth = backbone_depth 425 | self.down_context = AsymmResBlock( 426 | input_channels, base_channels, indice_key='pre', norm_cfg=norm_cfg) 427 | 428 | self.down_block_list = torch.nn.ModuleList() 429 | self.up_block_list = torch.nn.ModuleList() 430 | for i in range(self.backbone_depth): 431 | self.down_block_list.append( 432 | AsymmeDownBlock( 433 | 2**i * base_channels, 434 | 2**(i + 1) * base_channels, 435 | height_pooling=height_pooing[i], 436 | indice_key='down' + str(i), 437 | norm_cfg=norm_cfg)) 438 | if i == self.backbone_depth - 1: 439 | self.up_block_list.append( 440 | AsymmeUpBlock( 441 | 2**(i + 1) * base_channels, 442 | 2**(i + 1) * base_channels, 443 | up_key='down' + str(i), 444 | indice_key='up' + str(self.backbone_depth - 1 - i), 445 | norm_cfg=norm_cfg)) 446 | else: 447 | self.up_block_list.append( 448 | AsymmeUpBlock( 449 | 2**(i + 2) * base_channels, 450 | 2**(i + 1) * base_channels, 451 | up_key='down' + str(i), 452 | indice_key='up' + str(self.backbone_depth - 1 - i), 453 | norm_cfg=norm_cfg)) 454 | 455 | self.ddcm = DDCMBlock( 456 | 2 * base_channels, 457 | 2 * base_channels, 458 | indice_key='ddcm', 459 | norm_cfg=norm_cfg) 460 | 461 | self.more_conv = more_conv 462 | if self.more_conv: 463 | self.addConv = SubMConv3d( 464 | base_channels*4, 465 | out_channels, 466 | kernel_size=3, 467 | padding=1, 468 | bias=False, 469 | indice_key="mc") 470 | self.addBn = build_norm_layer(norm_cfg, out_channels)[1] 471 | self.addAct = build_activation_layer(dict(type='LeakyReLU')) 472 | 473 | def forward(self, voxel_features: torch.Tensor, coors: torch.Tensor, 474 | batch_size: int) -> SparseConvTensor: 475 | """Forward pass.""" 476 | coors = coors.int() 477 | ret = SparseConvTensor(voxel_features, coors, np.array(self.grid_size), 478 | batch_size) 479 | ret = self.down_context(ret) 480 | 481 | down_skip_list = [] 482 | down_pool = ret 483 | for i in range(self.backbone_depth): 484 | down_pool, down_skip = self.down_block_list[i](down_pool) 485 | down_skip_list.append(down_skip) 486 | 487 | up = down_pool 488 | for i in range(self.backbone_depth - 1, -1, -1): 489 | up = self.up_block_list[i](up, down_skip_list[i]) 490 | 491 | ddcm = self.ddcm(up) 492 | ddcm.features = torch.cat((ddcm.features, up.features), 1) 493 | 494 | if self.more_conv: 495 | ddcm = self.addConv(ddcm) 496 | ddcm.features = self.addBn(ddcm.features) 497 | ddcm.features = self.addAct(ddcm.features) 498 | 499 | return ddcm 500 | -------------------------------------------------------------------------------- /p3former/data_preprocessors/data_preprocessor.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 2 | # Modified from mmdetection3d. 
3 | import math 4 | from numbers import Number 5 | from typing import Dict, List, Optional, Sequence, Union 6 | 7 | import numpy as np 8 | import torch 9 | from mmdet.models import DetDataPreprocessor 10 | from mmengine.model import stack_batch 11 | from mmengine.utils import is_list_of 12 | from torch.nn import functional as F 13 | 14 | from mmdet3d.registry import MODELS 15 | from mmdet3d.structures.det3d_data_sample import SampleList 16 | from mmdet3d.utils import OptConfigType 17 | from mmdet3d.models.data_preprocessors.utils import multiview_img_stack_batch 18 | from mmdet3d.models.data_preprocessors.voxelize import VoxelizationByGridShape, dynamic_scatter_3d 19 | import torch_scatter 20 | 21 | 22 | @MODELS.register_module(force=True) 23 | class _Det3DDataPreprocessor(DetDataPreprocessor): 24 | """Points / Image pre-processor for point clouds / vision-only / multi- 25 | modality 3D detection tasks. 26 | 27 | It provides the data pre-processing as follows 28 | 29 | - Collate and move image and point cloud data to the target device. 30 | 31 | - 1) For image data: 32 | - Pad images in inputs to the maximum size of current batch with defined 33 | ``pad_value``. The padding size can be divisible by a defined 34 | ``pad_size_divisor``. 35 | - Stack images in inputs to batch_imgs. 36 | - Convert images in inputs from bgr to rgb if the shape of input is 37 | (3, H, W). 38 | - Normalize images in inputs with defined std and mean. 39 | - Do batch augmentations during training. 40 | 41 | - 2) For point cloud data: 42 | - If no voxelization, directly return list of point cloud data. 43 | - If voxelization is applied, voxelize point cloud according to 44 | ``voxel_type`` and obtain ``voxels``. 45 | 46 | Args: 47 | voxel (bool): Whether to apply voxelization to point cloud. 48 | Defaults to False. 49 | voxel_type (str): Voxelization type. Two voxelization types are 50 | provided: 'hard' and 'dynamic', respectively for hard 51 | voxelization and dynamic voxelization. Defaults to 'hard'. 52 | voxel_layer (dict or :obj:`ConfigDict`, optional): Voxelization layer 53 | config. Defaults to None. 54 | mean (Sequence[Number], optional): The pixel mean of R, G, B channels. 55 | Defaults to None. 56 | std (Sequence[Number], optional): The pixel standard deviation of 57 | R, G, B channels. Defaults to None. 58 | pad_size_divisor (int): The size of padded image should be 59 | divisible by ``pad_size_divisor``. Defaults to 1. 60 | pad_value (Number): The padded pixel value. Defaults to 0. 61 | pad_mask (bool): Whether to pad instance masks. Defaults to False. 62 | mask_pad_value (int): The padded pixel value for instance masks. 63 | Defaults to 0. 64 | pad_seg (bool): Whether to pad semantic segmentation maps. 65 | Defaults to False. 66 | seg_pad_value (int): The padded pixel value for semantic 67 | segmentation maps. Defaults to 255. 68 | bgr_to_rgb (bool): Whether to convert image from BGR to RGB. 69 | Defaults to False. 70 | rgb_to_bgr (bool): Whether to convert image from RGB to BGR. 71 | Defaults to False. 72 | boxtype2tensor (bool): Whether to keep the ``BaseBoxes`` type of 73 | bboxes data or not. Defaults to True. 74 | batch_augments (List[dict], optional): Batch-level augmentations. 75 | Defaults to None. 
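    Example:
        >>> # A config sketch for the cylindrical voxelization used by
        >>> # P3Former. The ``voxel_layer`` keys and values here are
        >>> # assumptions for illustration; see
        >>> # configs/_base_/models/p3former.py for the actual settings.
        >>> data_preprocessor = dict(
        ...     type='_Det3DDataPreprocessor',
        ...     voxel=True,
        ...     voxel_type='cylindrical',
        ...     voxel_layer=dict(
        ...         grid_shape=[480, 360, 32],
        ...         point_cloud_range=[0, -3.14159265359, -4,
        ...                            50, 3.14159265359, 2],
        ...         max_num_points=-1,
        ...         max_voxels=-1))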
76 | """ 77 | 78 | def __init__(self, 79 | voxel: bool = False, 80 | voxel_type: str = 'hard', 81 | voxel_layer: OptConfigType = None, 82 | mean: Sequence[Number] = None, 83 | std: Sequence[Number] = None, 84 | pad_size_divisor: int = 1, 85 | pad_value: Union[float, int] = 0, 86 | pad_mask: bool = False, 87 | mask_pad_value: int = 0, 88 | pad_seg: bool = False, 89 | seg_pad_value: int = 255, 90 | bgr_to_rgb: bool = False, 91 | rgb_to_bgr: bool = False, 92 | boxtype2tensor: bool = True, 93 | batch_augments: Optional[List[dict]] = None) -> None: 94 | super(_Det3DDataPreprocessor, self).__init__( 95 | mean=mean, 96 | std=std, 97 | pad_size_divisor=pad_size_divisor, 98 | pad_value=pad_value, 99 | pad_mask=pad_mask, 100 | mask_pad_value=mask_pad_value, 101 | pad_seg=pad_seg, 102 | seg_pad_value=seg_pad_value, 103 | bgr_to_rgb=bgr_to_rgb, 104 | rgb_to_bgr=rgb_to_bgr, 105 | batch_augments=batch_augments) 106 | self.voxel = voxel 107 | self.voxel_type = voxel_type 108 | if voxel: 109 | self.voxel_layer = VoxelizationByGridShape(**voxel_layer) 110 | 111 | def forward(self, 112 | data: Union[dict, List[dict]], 113 | training: bool = False) -> Union[dict, List[dict]]: 114 | """Perform normalization, padding and bgr2rgb conversion based on 115 | ``BaseDataPreprocessor``. 116 | 117 | Args: 118 | data (dict or List[dict]): Data from dataloader. 119 | The dict contains the whole batch data, when it is 120 | a list[dict], the list indicate test time augmentation. 121 | training (bool): Whether to enable training time augmentation. 122 | Defaults to False. 123 | 124 | Returns: 125 | dict or List[dict]: Data in the same format as the model input. 126 | """ 127 | if isinstance(data, list): 128 | num_augs = len(data) 129 | aug_batch_data = [] 130 | for aug_id in range(num_augs): 131 | single_aug_batch_data = self.simple_process( 132 | data[aug_id], training) 133 | aug_batch_data.append(single_aug_batch_data) 134 | return aug_batch_data 135 | 136 | else: 137 | return self.simple_process(data, training) 138 | 139 | def simple_process(self, data: dict, training: bool = False) -> dict: 140 | """Perform normalization, padding and bgr2rgb conversion for img data 141 | based on ``BaseDataPreprocessor``, and voxelize point cloud if `voxel` 142 | is set to be True. 143 | 144 | Args: 145 | data (dict): Data sampled from dataloader. 146 | training (bool): Whether to enable training time augmentation. 147 | Defaults to False. 148 | 149 | Returns: 150 | dict: Data in the same format as the model input. 151 | """ 152 | if 'img' in data['inputs']: 153 | batch_pad_shape = self._get_pad_shape(data) 154 | 155 | data = self.collate_data(data) 156 | inputs, data_samples = data['inputs'], data['data_samples'] 157 | batch_inputs = dict() 158 | 159 | if 'points' in inputs: 160 | batch_inputs['points'] = inputs['points'] 161 | 162 | if self.voxel: 163 | voxel_dict = self.voxelize(inputs['points'], data_samples) 164 | batch_inputs['voxels'] = voxel_dict 165 | 166 | if 'imgs' in inputs: 167 | imgs = inputs['imgs'] 168 | 169 | if data_samples is not None: 170 | # NOTE the batched image size information may be useful, e.g. 171 | # in DETR, this is needed for the construction of masks, which 172 | # is then used for the transformer_head. 
173 | batch_input_shape = tuple(imgs[0].size()[-2:]) 174 | for data_sample, pad_shape in zip(data_samples, 175 | batch_pad_shape): 176 | data_sample.set_metainfo({ 177 | 'batch_input_shape': batch_input_shape, 178 | 'pad_shape': pad_shape 179 | }) 180 | 181 | if hasattr(self, 'boxtype2tensor') and self.boxtype2tensor: 182 | from mmdet.models.utils.misc import \ 183 | samplelist_boxtype2tensor 184 | samplelist_boxtype2tensor(data_samples) 185 | elif hasattr(self, 'boxlist2tensor') and self.boxlist2tensor: 186 | from mmdet.models.utils.misc import \ 187 | samplelist_boxlist2tensor 188 | samplelist_boxlist2tensor(data_samples) 189 | if self.pad_mask: 190 | self.pad_gt_masks(data_samples) 191 | 192 | if self.pad_seg: 193 | self.pad_gt_sem_seg(data_samples) 194 | 195 | if training and self.batch_augments is not None: 196 | for batch_aug in self.batch_augments: 197 | imgs, data_samples = batch_aug(imgs, data_samples) 198 | batch_inputs['imgs'] = imgs 199 | 200 | return {'inputs': batch_inputs, 'data_samples': data_samples} 201 | 202 | def preprocess_img(self, _batch_img: torch.Tensor) -> torch.Tensor: 203 | # channel transform 204 | if self._channel_conversion: 205 | _batch_img = _batch_img[[2, 1, 0], ...] 206 | # Convert to float after channel conversion to ensure 207 | # efficiency 208 | _batch_img = _batch_img.float() 209 | # Normalization. 210 | if self._enable_normalize: 211 | if self.mean.shape[0] == 3: 212 | assert _batch_img.dim() == 3 and _batch_img.shape[0] == 3, ( 213 | 'If the mean has 3 values, the input tensor ' 214 | 'should in shape of (3, H, W), but got the ' 215 | f'tensor with shape {_batch_img.shape}') 216 | _batch_img = (_batch_img - self.mean) / self.std 217 | return _batch_img 218 | 219 | def collate_data(self, data: dict) -> dict: 220 | """Copying data to the target device and Performs normalization, 221 | padding and bgr2rgb conversion and stack based on 222 | ``BaseDataPreprocessor``. 223 | 224 | Collates the data sampled from dataloader into a list of dict and 225 | list of labels, and then copies tensor to the target device. 226 | 227 | Args: 228 | data (dict): Data sampled from dataloader. 229 | 230 | Returns: 231 | dict: Data in the same format as the model input. 232 | """ 233 | data = self.cast_data(data) # type: ignore 234 | 235 | if 'img' in data['inputs']: 236 | _batch_imgs = data['inputs']['img'] 237 | # Process data with `pseudo_collate`. 238 | if is_list_of(_batch_imgs, torch.Tensor): 239 | batch_imgs = [] 240 | img_dim = _batch_imgs[0].dim() 241 | for _batch_img in _batch_imgs: 242 | if img_dim == 3: # standard img 243 | _batch_img = self.preprocess_img(_batch_img) 244 | elif img_dim == 4: 245 | _batch_img = [ 246 | self.preprocess_img(_img) for _img in _batch_img 247 | ] 248 | 249 | _batch_img = torch.stack(_batch_img, dim=0) 250 | 251 | batch_imgs.append(_batch_img) 252 | 253 | # Pad and stack Tensor. 254 | if img_dim == 3: 255 | batch_imgs = stack_batch(batch_imgs, self.pad_size_divisor, 256 | self.pad_value) 257 | elif img_dim == 4: 258 | batch_imgs = multiview_img_stack_batch( 259 | batch_imgs, self.pad_size_divisor, self.pad_value) 260 | 261 | # Process data with `default_collate`. 262 | elif isinstance(_batch_imgs, torch.Tensor): 263 | assert _batch_imgs.dim() == 4, ( 264 | 'The input of `ImgDataPreprocessor` should be a NCHW ' 265 | 'tensor or a list of tensor, but got a tensor with ' 266 | f'shape: {_batch_imgs.shape}') 267 | if self._channel_conversion: 268 | _batch_imgs = _batch_imgs[:, [2, 1, 0], ...] 
269 | # Convert to float after channel conversion to ensure 270 | # efficiency 271 | _batch_imgs = _batch_imgs.float() 272 | if self._enable_normalize: 273 | _batch_imgs = (_batch_imgs - self.mean) / self.std 274 | h, w = _batch_imgs.shape[2:] 275 | target_h = math.ceil( 276 | h / self.pad_size_divisor) * self.pad_size_divisor 277 | target_w = math.ceil( 278 | w / self.pad_size_divisor) * self.pad_size_divisor 279 | pad_h = target_h - h 280 | pad_w = target_w - w 281 | batch_imgs = F.pad(_batch_imgs, (0, pad_w, 0, pad_h), 282 | 'constant', self.pad_value) 283 | else: 284 | raise TypeError( 285 | 'Output of `cast_data` should be a list of dict ' 286 | 'or a tuple with inputs and data_samples, but got' 287 | f'{type(data)}: {data}') 288 | 289 | data['inputs']['imgs'] = batch_imgs 290 | 291 | data.setdefault('data_samples', None) 292 | 293 | return data 294 | 295 | def _get_pad_shape(self, data: dict) -> List[tuple]: 296 | """Get the pad_shape of each image based on data and 297 | pad_size_divisor.""" 298 | # rewrite `_get_pad_shape` for obtaining image inputs. 299 | _batch_inputs = data['inputs']['img'] 300 | # Process data with `pseudo_collate`. 301 | if is_list_of(_batch_inputs, torch.Tensor): 302 | batch_pad_shape = [] 303 | for ori_input in _batch_inputs: 304 | if ori_input.dim() == 4: 305 | # mean multiview input, select one of the 306 | # image to calculate the pad shape 307 | ori_input = ori_input[0] 308 | pad_h = int( 309 | np.ceil(ori_input.shape[1] / 310 | self.pad_size_divisor)) * self.pad_size_divisor 311 | pad_w = int( 312 | np.ceil(ori_input.shape[2] / 313 | self.pad_size_divisor)) * self.pad_size_divisor 314 | batch_pad_shape.append((pad_h, pad_w)) 315 | # Process data with `default_collate`. 316 | elif isinstance(_batch_inputs, torch.Tensor): 317 | assert _batch_inputs.dim() == 4, ( 318 | 'The input of `ImgDataPreprocessor` should be a NCHW tensor ' 319 | 'or a list of tensor, but got a tensor with shape: ' 320 | f'{_batch_inputs.shape}') 321 | pad_h = int( 322 | np.ceil(_batch_inputs.shape[1] / 323 | self.pad_size_divisor)) * self.pad_size_divisor 324 | pad_w = int( 325 | np.ceil(_batch_inputs.shape[2] / 326 | self.pad_size_divisor)) * self.pad_size_divisor 327 | batch_pad_shape = [(pad_h, pad_w)] * _batch_inputs.shape[0] 328 | else: 329 | raise TypeError('Output of `cast_data` should be a list of dict ' 330 | 'or a tuple with inputs and data_samples, but got ' 331 | f'{type(data)}: {data}') 332 | return batch_pad_shape 333 | 334 | @torch.no_grad() 335 | def voxelize(self, points: List[torch.Tensor], 336 | data_samples: SampleList) -> Dict[str, torch.Tensor]: 337 | """Apply voxelization to point cloud. 338 | 339 | Args: 340 | points (List[Tensor]): Point cloud in one data batch. 341 | data_samples: (list[:obj:`Det3DDataSample`]): The annotation data 342 | of every samples. Add voxel-wise annotation for segmentation. 343 | 344 | Returns: 345 | Dict[str, Tensor]: Voxelization information. 346 | 347 | - voxels (Tensor): Features of voxels, shape is MxNxC for hard 348 | voxelization, NxC for dynamic voxelization. 349 | - coors (Tensor): Coordinates of voxels, shape is Nx(1+NDim), 350 | where 1 represents the batch index. 351 | - num_points (Tensor, optional): Number of points in each voxel. 352 | - voxel_centers (Tensor, optional): Centers of voxels. 
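        Note:
            For ``voxel_type='cylindrical'`` (the mode used by P3Former), each
            point (x, y, z) is first converted to cylinder coordinates
            ``rho = sqrt(x**2 + y**2)`` and ``phi = atan2(y, x)``, and its
            voxel index is ``floor((clamp([rho, phi, z]) - min_bound) /
            voxel_size)``, where ``min_bound`` / ``max_bound`` come from
            ``voxel_layer.point_cloud_range``.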
353 | """ 354 | 355 | voxel_dict = dict() 356 | 357 | if self.voxel_type == 'hard': 358 | voxels, coors, num_points, voxel_centers = [], [], [], [] 359 | for i, res in enumerate(points): 360 | res_voxels, res_coors, res_num_points = self.voxel_layer(res) 361 | res_voxel_centers = ( 362 | res_coors[:, [2, 1, 0]] + 0.5) * res_voxels.new_tensor( 363 | self.voxel_layer.voxel_size) + res_voxels.new_tensor( 364 | self.voxel_layer.point_cloud_range[0:3]) 365 | res_coors = F.pad(res_coors, (1, 0), mode='constant', value=i) 366 | voxels.append(res_voxels) 367 | coors.append(res_coors) 368 | num_points.append(res_num_points) 369 | voxel_centers.append(res_voxel_centers) 370 | 371 | voxels = torch.cat(voxels, dim=0) 372 | coors = torch.cat(coors, dim=0) 373 | num_points = torch.cat(num_points, dim=0) 374 | voxel_centers = torch.cat(voxel_centers, dim=0) 375 | 376 | voxel_dict['num_points'] = num_points 377 | voxel_dict['voxel_centers'] = voxel_centers 378 | elif self.voxel_type == 'dynamic': 379 | coors = [] 380 | # dynamic voxelization only provide a coors mapping 381 | for i, res in enumerate(points): 382 | res_coors = self.voxel_layer(res) 383 | res_coors = F.pad(res_coors, (1, 0), mode='constant', value=i) 384 | coors.append(res_coors) 385 | voxels = torch.cat(points, dim=0) 386 | coors = torch.cat(coors, dim=0) 387 | elif self.voxel_type == 'cylindrical': 388 | voxels, coors = [], [] 389 | for i, (res, data_sample) in enumerate(zip(points, data_samples)): 390 | rho = torch.sqrt(res[:, 0]**2 + res[:, 1]**2) 391 | phi = torch.atan2(res[:, 1], res[:, 0]) 392 | polar_res = torch.stack((rho, phi, res[:, 2]), dim=-1) 393 | min_bound = polar_res.new_tensor( 394 | self.voxel_layer.point_cloud_range[:3]) 395 | max_bound = polar_res.new_tensor( 396 | self.voxel_layer.point_cloud_range[3:]) 397 | try: # only support PyTorch >= 1.9.0 398 | polar_res_clamp = torch.clamp(polar_res, min_bound, 399 | max_bound) 400 | except TypeError: 401 | polar_res_clamp = polar_res.clone() 402 | for coor_idx in range(3): 403 | polar_res_clamp[:, coor_idx][ 404 | polar_res[:, coor_idx] > 405 | max_bound[coor_idx]] = max_bound[coor_idx] 406 | polar_res_clamp[:, coor_idx][ 407 | polar_res[:, coor_idx] < 408 | min_bound[coor_idx]] = min_bound[coor_idx] 409 | res_coors = torch.floor( 410 | (polar_res_clamp - min_bound) / polar_res_clamp.new_tensor( 411 | self.voxel_layer.voxel_size)).int() 412 | self.get_voxel_seg(res_coors, data_sample) 413 | res_coors = F.pad(res_coors, (1, 0), mode='constant', value=i) 414 | res_voxels = torch.cat((polar_res, res[:, :2], res[:, 3:]), 415 | dim=-1) 416 | voxels.append(res_voxels) 417 | coors.append(res_coors) 418 | voxels = torch.cat(voxels, dim=0) 419 | coors = torch.cat(coors, dim=0) 420 | elif self.voxel_type == 'minkunet': 421 | voxels, coors = [], [] 422 | voxel_size = points[0].new_tensor(self.voxel_layer.voxel_size) 423 | for i, (res, data_sample) in enumerate(zip(points, data_samples)): 424 | res_coors = torch.round(res[:, :3] / voxel_size).int() 425 | res_coors -= res_coors.min(0)[0] 426 | 427 | res_coors_numpy = res_coors.cpu().numpy() 428 | inds, voxel2point_map = self.sparse_quantize( 429 | res_coors_numpy, return_index=True, return_inverse=True) 430 | voxel2point_map = torch.from_numpy(voxel2point_map).cuda() 431 | if self.training: 432 | if len(inds) > 80000: 433 | inds = np.random.choice(inds, 80000, replace=False) 434 | inds = torch.from_numpy(inds).cuda() 435 | data_sample.gt_pts_seg.voxel_semantic_mask \ 436 | = data_sample.gt_pts_seg.pts_semantic_mask[inds] 437 | 
res_voxel_coors = res_coors[inds] 438 | res_voxels = res[inds] 439 | res_voxel_coors = F.pad( 440 | res_voxel_coors, (0, 1), mode='constant', value=i) 441 | data_sample.voxel2point_map = voxel2point_map.long() 442 | voxels.append(res_voxels) 443 | coors.append(res_voxel_coors) 444 | voxels = torch.cat(voxels, dim=0) 445 | coors = torch.cat(coors, dim=0) 446 | 447 | else: 448 | raise ValueError(f'Invalid voxelization type {self.voxel_type}') 449 | 450 | voxel_dict['voxels'] = voxels 451 | voxel_dict['coors'] = coors 452 | 453 | return voxel_dict 454 | 455 | def get_voxel_seg(self, res_coors: torch.Tensor, data_sample: SampleList): 456 | """Get voxel-wise segmentation label and point2voxel map. 457 | 458 | Args: 459 | res_coors (Tensor): The voxel coordinates of points, Nx3. 460 | data_sample: (:obj:`Det3DDataSample`): The annotation data of 461 | every samples. Add voxel-wise annotation forsegmentation. 462 | """ 463 | 464 | if self.training: 465 | if hasattr(data_sample.gt_pts_seg, 'pts_instance_mask'): 466 | pts_instance_mask = data_sample.gt_pts_seg.pts_instance_mask 467 | pts_semantic_mask = data_sample.gt_pts_seg.pts_semantic_mask 468 | num_points = res_coors.shape[0] 469 | 470 | _, unique_indices_inverse = torch.unique( 471 | res_coors, return_inverse=True, dim=0) 472 | 473 | unq_instance_labels = torch.unique(pts_instance_mask, dim=0) 474 | sort_instance_labels = pts_instance_mask.new_zeros(pts_instance_mask.shape) 475 | 476 | # map instance labels to [1 - the number of instances] 477 | i = 1 478 | ins2sem = pts_instance_mask.new_zeros(size=[len(unq_instance_labels)]) 479 | for u in unq_instance_labels: 480 | valid = pts_instance_mask == u 481 | sort_instance_labels[valid] = i 482 | ins2sem[i - 1] = pts_semantic_mask[valid][0] 483 | i = i + 1 484 | 485 | flatten_inst_labels = res_coors.new_zeros( 486 | (num_points, i)) # i: instance number+1(background) 487 | flatten_inst_labels[list(range(0, num_points)), 488 | sort_instance_labels[list(range(0, num_points))]. 489 | cpu().numpy().tolist()] += 1 490 | # scatter point-wise labels into voxel-wise 491 | flatten_inst_label_sum = torch_scatter.scatter_sum( 492 | flatten_inst_labels, unique_indices_inverse, dim=0) 493 | voxel_instance_labels = torch.argmax(flatten_inst_label_sum, dim=1) 494 | voxel_semantic_labels = ins2sem[voxel_instance_labels - 1] 495 | data_sample.gt_pts_seg.voxel_semantic_mask = voxel_semantic_labels 496 | data_sample.gt_pts_seg.voxel_instance_mask = voxel_instance_labels 497 | else: 498 | pts_semantic_mask = data_sample.gt_pts_seg.pts_semantic_mask 499 | voxel_semantic_mask, _, point2voxel_map = dynamic_scatter_3d( 500 | F.one_hot(pts_semantic_mask.long()).float(), res_coors, 'mean', 501 | True) 502 | voxel_semantic_mask = torch.argmax(voxel_semantic_mask, dim=-1) 503 | data_sample.gt_pts_seg.voxel_semantic_mask = voxel_semantic_mask 504 | data_sample.gt_pts_seg.point2voxel_map = point2voxel_map 505 | 506 | else: 507 | pseudo_tensor = res_coors.new_ones([res_coors.shape[0], 1]).float() 508 | _, _, point2voxel_map = dynamic_scatter_3d(pseudo_tensor, 509 | res_coors, 'mean', True) 510 | data_sample.gt_pts_seg.point2voxel_map = point2voxel_map 511 | 512 | def ravel_hash(self, x: np.ndarray) -> np.ndarray: 513 | """Get voxel coordinates hash for np.unique(). 514 | 515 | Args: 516 | x (np.ndarray): The voxel coordinates of points, Nx3. 517 | 518 | Returns: 519 | np.ndarray: Voxels coordinates hash. 
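        Example:
            >>> # Worked example following the algorithm below: with
            >>> # coords = [[0, 0, 0], [1, 2, 3]], the per-column maxima plus
            >>> # one are xmax = [2, 3, 4], so the hashes are
            >>> # (0 * 3 + 0) * 4 + 0 = 0 and (1 * 3 + 2) * 4 + 3 = 23.
            >>> # Identical coordinates always map to the same hash, which is
            >>> # what ``np.unique()`` relies on in ``sparse_quantize``.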
520 | """ 521 | assert x.ndim == 2, x.shape 522 | 523 | x = x - np.min(x, axis=0) 524 | x = x.astype(np.uint64, copy=False) 525 | xmax = np.max(x, axis=0).astype(np.uint64) + 1 526 | 527 | h = np.zeros(x.shape[0], dtype=np.uint64) 528 | for k in range(x.shape[1] - 1): 529 | h += x[:, k] 530 | h *= xmax[k + 1] 531 | h += x[:, -1] 532 | return h 533 | 534 | def sparse_quantize(self, 535 | coords: np.ndarray, 536 | return_index: bool = False, 537 | return_inverse: bool = False) -> List[np.ndarray]: 538 | """Sparse Quantization for voxel coordinates used in Minkunet. 539 | 540 | Args: 541 | coords (np.ndarray): The voxel coordinates of points, Nx3. 542 | return_index (bool): Whether to return the indices of the 543 | unique coords, shape (M,). 544 | return_inverse (bool): Whether to return the indices of the 545 | original coords shape (N,). 546 | 547 | Returns: 548 | List[np.ndarray] or None: Return index and inverse map if 549 | return_index and return_inverse is True. 550 | """ 551 | _, indices, inverse_indices = np.unique( 552 | self.ravel_hash(coords), return_index=True, return_inverse=True) 553 | coords = coords[indices] 554 | 555 | outputs = [] 556 | if return_index: 557 | outputs += [indices] 558 | if return_inverse: 559 | outputs += [inverse_indices] 560 | return outputs 561 | -------------------------------------------------------------------------------- /p3former/segmentors/p3former.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from mmdet3d.registry import MODELS 3 | from mmdet3d.models.segmentors.cylinder3d import Cylinder3D 4 | from mmdet3d.structures import PointData 5 | from mmdet3d.utils import ConfigType, OptConfigType, OptMultiConfig 6 | 7 | @MODELS.register_module() 8 | class _P3Former(Cylinder3D): 9 | """P3Former.""" 10 | 11 | def __init__(self, 12 | voxel_encoder: ConfigType, 13 | backbone: ConfigType, 14 | decode_head: ConfigType, 15 | neck: OptConfigType = None, 16 | auxiliary_head: OptConfigType = None, 17 | loss_regularization: OptConfigType = None, 18 | train_cfg: OptConfigType = None, 19 | test_cfg: OptConfigType = None, 20 | data_preprocessor: OptConfigType = None, 21 | init_cfg: OptMultiConfig = None) -> None: 22 | super().__init__(voxel_encoder=voxel_encoder, 23 | backbone=backbone, 24 | decode_head=decode_head, 25 | neck=neck, 26 | auxiliary_head=auxiliary_head, 27 | loss_regularization=loss_regularization, 28 | train_cfg=train_cfg, 29 | test_cfg=test_cfg, 30 | data_preprocessor=data_preprocessor, 31 | init_cfg=init_cfg) 32 | 33 | def loss(self, batch_inputs_dict,batch_data_samples): 34 | """Calculate losses from a batch of inputs and data samples. 35 | 36 | Args: 37 | batch_inputs_dict (dict): Input sample dict which 38 | includes 'points' and 'imgs' keys. 39 | 40 | - points (List[Tensor]): Point cloud of each sample. 41 | - imgs (Tensor, optional): Image tensor has shape (B, C, H, W). 42 | batch_data_samples (List[:obj:`Det3DDataSample`]): The det3d data 43 | samples. It usually includes information such as `metainfo` and 44 | `gt_pts_seg`. 45 | 46 | Returns: 47 | Dict[str, Tensor]: A dictionary of loss components. 
48 | """ 49 | 50 | # extract features using backbone 51 | x = self.extract_feat(batch_inputs_dict) 52 | batch_inputs_dict['features'] = x.features 53 | losses = dict() 54 | loss_decode = self._decode_head_forward_train(batch_inputs_dict, batch_data_samples) 55 | losses.update(loss_decode) 56 | 57 | return losses 58 | 59 | def predict(self, batch_inputs_dict, batch_data_samples, **kwargs): 60 | x = self.extract_feat(batch_inputs_dict) 61 | batch_inputs_dict['features'] = x.features 62 | pts_semantic_preds, pts_instance_preds = self.decode_head.predict(batch_inputs_dict, batch_data_samples) 63 | return self.postprocess_result(pts_semantic_preds, pts_instance_preds, batch_data_samples) 64 | 65 | def postprocess_result(self, pts_semantic_preds, pts_instance_preds, batch_data_samples): 66 | for i in range(len(pts_semantic_preds)): 67 | batch_data_samples[i].set_data( 68 | {'pred_pts_seg': PointData(**{'pts_semantic_mask': pts_semantic_preds[i], 69 | 'pts_instance_mask': pts_instance_preds[i]})}) 70 | return batch_data_samples 71 | -------------------------------------------------------------------------------- /p3former/task_modules/samplers/mask_pseduo_sampler.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 2 | """copy from 3 | https://github.com/ZwwWayne/K-Net/blob/main/knet/det/mask_pseudo_sampler.py.""" 4 | 5 | import torch 6 | from mmengine.structures import InstanceData 7 | 8 | from mmdet3d.registry import TASK_UTILS 9 | from mmdet.models.task_modules.assigners import AssignResult 10 | from torch import Tensor 11 | from mmdet.models.task_modules.samplers.base_sampler import BaseSampler 12 | from mmdet.models.task_modules.samplers.mask_sampling_result import MaskSamplingResult 13 | 14 | 15 | class _MaskSamplingResult(MaskSamplingResult): 16 | """Mask sampling result.""" 17 | 18 | def __init__(self, 19 | pos_inds: Tensor, 20 | neg_inds: Tensor, 21 | masks: Tensor, 22 | gt_masks: Tensor, 23 | assign_result: AssignResult, 24 | gt_flags: Tensor, 25 | avg_factor_with_neg: bool = True) -> None: 26 | super().__init__(pos_inds=pos_inds, 27 | neg_inds=neg_inds, 28 | masks=masks, 29 | gt_masks=gt_masks, 30 | assign_result=assign_result, 31 | gt_flags=gt_flags, 32 | avg_factor_with_neg=avg_factor_with_neg) 33 | if assign_result.labels is not None: 34 | self.pos_gt_labels = assign_result.labels[pos_inds] 35 | else: 36 | self.pos_gt_labels = None 37 | 38 | 39 | @TASK_UTILS.register_module() 40 | class _MaskPseudoSampler(BaseSampler): 41 | """A pseudo sampler that does not do sampling actually.""" 42 | 43 | def __init__(self, **kwargs): 44 | pass 45 | 46 | def _sample_pos(self, **kwargs): 47 | """Sample positive samples.""" 48 | raise NotImplementedError 49 | 50 | def _sample_neg(self, **kwargs): 51 | """Sample negative samples.""" 52 | raise NotImplementedError 53 | 54 | def sample(self, assign_result: AssignResult, pred_instances: InstanceData, 55 | gt_instances: InstanceData, *args, **kwargs): 56 | """Directly returns the positive and negative indices of samples. 57 | 58 | Args: 59 | assign_result (:obj:`AssignResult`): Mask assigning results. 60 | pred_instances (:obj:`InstanceData`): Instances of model 61 | predictions. It includes ``scores`` and ``masks`` predicted 62 | by the model. 63 | gt_instances (:obj:`InstanceData`): Ground truth of instance 64 | annotations. It usually includes ``labels`` and ``masks`` 65 | attributes. 
66 | 67 | Returns: 68 | :obj:`SamplingResult`: sampler results 69 | """ 70 | pred_masks = pred_instances.masks 71 | gt_masks = gt_instances.masks 72 | pos_inds = torch.nonzero( 73 | assign_result.gt_inds > 0, as_tuple=False).squeeze(-1).unique() 74 | neg_inds = torch.nonzero( 75 | assign_result.gt_inds == 0, as_tuple=False).squeeze(-1).unique() 76 | gt_flags = pred_masks.new_zeros(pred_masks.shape[0], dtype=torch.uint8) 77 | sampling_result = _MaskSamplingResult( 78 | pos_inds=pos_inds, 79 | neg_inds=neg_inds, 80 | masks=pred_masks, 81 | gt_masks=gt_masks, 82 | assign_result=assign_result, 83 | gt_flags=gt_flags, 84 | avg_factor_with_neg=False) 85 | return sampling_result 86 | -------------------------------------------------------------------------------- /test.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 2 | import argparse 3 | import os 4 | import os.path as osp 5 | 6 | from mmengine.config import Config, DictAction 7 | from mmengine.registry import RUNNERS 8 | from mmengine.runner import Runner 9 | 10 | from mmdet3d.utils import replace_ceph_backend 11 | 12 | 13 | # TODO: support fuse_conv_bn and format_only 14 | def parse_args(): 15 | parser = argparse.ArgumentParser( 16 | description='MMDet3D test (and eval) a model') 17 | parser.add_argument('config', help='test config file path') 18 | parser.add_argument('checkpoint', help='checkpoint file') 19 | parser.add_argument( 20 | '--work-dir', 21 | help='the directory to save the file containing evaluation metrics') 22 | parser.add_argument( 23 | '--ceph', action='store_true', help='Use ceph as data storage backend') 24 | parser.add_argument( 25 | '--show', action='store_true', help='show prediction results') 26 | parser.add_argument( 27 | '--show-dir', 28 | help='directory where painted images will be saved. ' 29 | 'If specified, it will be automatically saved ' 30 | 'to the work_dir/timestamp/show_dir') 31 | parser.add_argument( 32 | '--task', 33 | type=str, 34 | choices=[ 35 | 'mono_det', 'multi-view_det', 'lidar_det', 'lidar_seg', 36 | 'multi-modality_det' 37 | ], 38 | help='Determine the visualization method depending on the task.') 39 | parser.add_argument( 40 | '--wait-time', type=float, default=2, help='the interval of show (s)') 41 | parser.add_argument( 42 | '--cfg-options', 43 | nargs='+', 44 | action=DictAction, 45 | help='override some settings in the used config, the key-value pair ' 46 | 'in xxx=yyy format will be merged into config file. If the value to ' 47 | 'be overwritten is a list, it should be like key="[a,b]" or key=a,b ' 48 | 'It also allows nested list/tuple values, e.g. 
key="[(a,b),(c,d)]" ' 49 | 'Note that the quotation marks are necessary and that no white space ' 50 | 'is allowed.') 51 | parser.add_argument( 52 | '--launcher', 53 | choices=['none', 'pytorch', 'slurm', 'mpi'], 54 | default='none', 55 | help='job launcher') 56 | parser.add_argument('--local_rank', type=int, default=0) 57 | args = parser.parse_args() 58 | if 'LOCAL_RANK' not in os.environ: 59 | os.environ['LOCAL_RANK'] = str(args.local_rank) 60 | return args 61 | 62 | 63 | def trigger_visualization_hook(cfg, args): 64 | default_hooks = cfg.default_hooks 65 | if 'visualization' in default_hooks: 66 | visualization_hook = default_hooks['visualization'] 67 | # Turn on visualization 68 | visualization_hook['draw'] = True 69 | if args.show: 70 | visualization_hook['show'] = True 71 | visualization_hook['wait_time'] = args.wait_time 72 | if args.show_dir: 73 | visualization_hook['test_out_dir'] = args.show_dir 74 | visualization_hook['vis_task'] = args.task 75 | else: 76 | raise RuntimeError( 77 | 'VisualizationHook must be included in default_hooks.' 78 | 'refer to usage ' 79 | '"visualization=dict(type=\'VisualizationHook\')"') 80 | 81 | return cfg 82 | 83 | 84 | def main(): 85 | args = parse_args() 86 | 87 | # load config 88 | cfg = Config.fromfile(args.config) 89 | 90 | # TODO: We will unify the ceph support approach with other OpenMMLab repos 91 | if args.ceph: 92 | cfg = replace_ceph_backend(cfg) 93 | 94 | cfg.launcher = args.launcher 95 | if args.cfg_options is not None: 96 | cfg.merge_from_dict(args.cfg_options) 97 | 98 | # work_dir is determined in this priority: CLI > segment in file > filename 99 | if args.work_dir is not None: 100 | # update configs according to CLI args if args.work_dir is not None 101 | cfg.work_dir = args.work_dir 102 | elif cfg.get('work_dir', None) is None: 103 | # use config filename as default work_dir if cfg.work_dir is None 104 | cfg.work_dir = osp.join('./work_dirs', 105 | osp.splitext(osp.basename(args.config))[0]) 106 | 107 | cfg.load_from = args.checkpoint 108 | 109 | if args.show or args.show_dir: 110 | cfg = trigger_visualization_hook(cfg, args) 111 | 112 | # build the runner from config 113 | if 'runner_type' not in cfg: 114 | # build the default runner 115 | runner = Runner.from_cfg(cfg) 116 | else: 117 | # build customized runner from the registry 118 | # if 'runner_type' is set in the cfg 119 | runner = RUNNERS.build(cfg) 120 | 121 | # start testing 122 | runner.test() 123 | 124 | 125 | if __name__ == '__main__': 126 | main() 127 | -------------------------------------------------------------------------------- /tools/create_data.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 2 | import argparse 3 | from os import path as osp 4 | 5 | from dataset_converters import nuscenes_converter as nuscenes_converter 6 | from dataset_converters import semantickitti_converter 7 | from dataset_converters.update_infos_to_v2 import update_pkl_infos 8 | 9 | def nuscenes_data_prep(root_path, 10 | info_prefix, 11 | version, 12 | dataset_name, 13 | out_dir, 14 | max_sweeps=10): 15 | """Prepare data related to nuScenes dataset. 16 | 17 | Related data consists of '.pkl' files recording basic infos, 18 | 2D annotations and groundtruth database. 19 | 20 | Args: 21 | root_path (str): Path of dataset root. 22 | info_prefix (str): The prefix of info filenames. 23 | version (str): Dataset version. 24 | dataset_name (str): The dataset class name. 
25 | out_dir (str): Output directory of the groundtruth database info. 26 | max_sweeps (int, optional): Number of input consecutive frames. 27 | Default: 10 28 | """ 29 | nuscenes_converter.create_nuscenes_infos( 30 | root_path, info_prefix, version=version, max_sweeps=max_sweeps) 31 | 32 | if version == 'v1.0-test': 33 | info_test_path = osp.join(out_dir, f'{info_prefix}_infos_test.pkl') 34 | update_pkl_infos('nuscenes', out_dir=out_dir, pkl_path=info_test_path) 35 | return 36 | 37 | info_train_path = osp.join(out_dir, f'{info_prefix}_infos_train.pkl') 38 | info_val_path = osp.join(out_dir, f'{info_prefix}_infos_val.pkl') 39 | update_pkl_infos('nuscenes', out_dir=out_dir, pkl_path=info_train_path) 40 | update_pkl_infos('nuscenes', out_dir=out_dir, pkl_path=info_val_path) 41 | 42 | def semantickitti_data_prep(info_prefix, out_dir): 43 | """Prepare the info file for SemanticKITTI dataset. 44 | 45 | Args: 46 | info_prefix (str): The prefix of info filenames. 47 | out_dir (str): Output directory of the generated info file. 48 | """ 49 | semantickitti_converter.create_semantickitti_info_file( 50 | info_prefix, out_dir) 51 | 52 | 53 | parser = argparse.ArgumentParser(description='Data converter arg parser') 54 | parser.add_argument('dataset', metavar='kitti', help='name of the dataset') 55 | parser.add_argument( 56 | '--root-path', 57 | type=str, 58 | default='./data/kitti', 59 | help='specify the root path of dataset') 60 | parser.add_argument( 61 | '--version', 62 | type=str, 63 | default='v1.0', 64 | required=False, 65 | help='specify the dataset version, no need for kitti') 66 | parser.add_argument( 67 | '--max-sweeps', 68 | type=int, 69 | default=10, 70 | required=False, 71 | help='specify sweeps of lidar per example') 72 | parser.add_argument( 73 | '--with-plane', 74 | action='store_true', 75 | help='Whether to use plane information for kitti.') 76 | parser.add_argument( 77 | '--out-dir', 78 | type=str, 79 | default='./data/kitti', 80 | required=False, 81 | help='name of info pkl') 82 | parser.add_argument('--extra-tag', type=str, default='kitti') 83 | parser.add_argument( 84 | '--workers', type=int, default=4, help='number of threads to be used') 85 | args = parser.parse_args() 86 | 87 | if __name__ == '__main__': 88 | # from mmdet3d.utils import register_all_modules 89 | # register_all_modules() 90 | 91 | if args.dataset == 'nuscenes' and args.version != 'v1.0-mini': 92 | train_version = f'{args.version}-trainval' 93 | nuscenes_data_prep( 94 | root_path=args.root_path, 95 | info_prefix=args.extra_tag, 96 | version=train_version, 97 | dataset_name='NuScenesDataset', 98 | out_dir=args.out_dir, 99 | max_sweeps=args.max_sweeps) 100 | test_version = f'{args.version}-test' 101 | nuscenes_data_prep( 102 | root_path=args.root_path, 103 | info_prefix=args.extra_tag, 104 | version=test_version, 105 | dataset_name='NuScenesDataset', 106 | out_dir=args.out_dir, 107 | max_sweeps=args.max_sweeps) 108 | elif args.dataset == 'nuscenes' and args.version == 'v1.0-mini': 109 | train_version = f'{args.version}' 110 | nuscenes_data_prep( 111 | root_path=args.root_path, 112 | info_prefix=args.extra_tag, 113 | version=train_version, 114 | dataset_name='NuScenesDataset', 115 | out_dir=args.out_dir, 116 | max_sweeps=args.max_sweeps) 117 | elif args.dataset == 'semantickitti': 118 | semantickitti_data_prep( 119 | info_prefix=args.extra_tag, out_dir=args.out_dir) 120 | else: 121 | raise NotImplementedError(f'Don\'t support {args.dataset} dataset.') 122 | 
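# Example invocation for the nuScenes branch above (paths are placeholders;
# the SemanticKITTI command is given in the README):
#
#   python tools/create_data.py nuscenes --root-path data/nuscenes \
#       --out-dir data/nuscenes --extra-tag nuscenes --version v1.0 --max-sweeps 10
#
# With --version v1.0 both the trainval and test splits are prepared; pass
# --version v1.0-mini to prepare only the mini split.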
-------------------------------------------------------------------------------- /tools/dataset_converters/nuscenes_converter.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 2 | import os 3 | from collections import OrderedDict 4 | from os import path as osp 5 | from typing import List, Tuple, Union 6 | 7 | import mmcv 8 | import mmengine 9 | import numpy as np 10 | from nuscenes.nuscenes import NuScenes 11 | from nuscenes.utils.geometry_utils import view_points 12 | from pyquaternion import Quaternion 13 | from shapely.geometry import MultiPoint, box 14 | 15 | # from mmdet3d.structures import points_cam2img 16 | 17 | NuScenesNameMapping = { 18 | 'movable_object.barrier': 'barrier', 19 | 'vehicle.bicycle': 'bicycle', 20 | 'vehicle.bus.bendy': 'bus', 21 | 'vehicle.bus.rigid': 'bus', 22 | 'vehicle.car': 'car', 23 | 'vehicle.construction': 'construction_vehicle', 24 | 'vehicle.motorcycle': 'motorcycle', 25 | 'human.pedestrian.adult': 'pedestrian', 26 | 'human.pedestrian.child': 'pedestrian', 27 | 'human.pedestrian.construction_worker': 'pedestrian', 28 | 'human.pedestrian.police_officer': 'pedestrian', 29 | 'movable_object.trafficcone': 'traffic_cone', 30 | 'vehicle.trailer': 'trailer', 31 | 'vehicle.truck': 'truck' 32 | } 33 | 34 | nus_categories = ('car', 'truck', 'trailer', 'bus', 'construction_vehicle', 35 | 'bicycle', 'motorcycle', 'pedestrian', 'traffic_cone', 36 | 'barrier') 37 | 38 | nus_attributes = ('cycle.with_rider', 'cycle.without_rider', 39 | 'pedestrian.moving', 'pedestrian.standing', 40 | 'pedestrian.sitting_lying_down', 'vehicle.moving', 41 | 'vehicle.parked', 'vehicle.stopped', 'None') 42 | 43 | 44 | def create_nuscenes_infos(root_path, 45 | info_prefix, 46 | version='v1.0-trainval', 47 | max_sweeps=10): 48 | """Create info file of nuscene dataset. 49 | 50 | Given the raw data, generate its related info file in pkl format. 51 | 52 | Args: 53 | root_path (str): Path of the data root. 54 | info_prefix (str): Prefix of the info file to be generated. 55 | version (str, optional): Version of the data. 56 | Default: 'v1.0-trainval'. 57 | max_sweeps (int, optional): Max number of sweeps. 58 | Default: 10. 59 | """ 60 | from nuscenes.nuscenes import NuScenes 61 | nusc = NuScenes(version=version, dataroot=root_path, verbose=True) 62 | from nuscenes.utils import splits 63 | available_vers = ['v1.0-trainval', 'v1.0-test', 'v1.0-mini'] 64 | assert version in available_vers 65 | if version == 'v1.0-trainval': 66 | train_scenes = splits.train 67 | val_scenes = splits.val 68 | elif version == 'v1.0-test': 69 | train_scenes = splits.test 70 | val_scenes = [] 71 | elif version == 'v1.0-mini': 72 | train_scenes = splits.mini_train 73 | val_scenes = splits.mini_val 74 | else: 75 | raise ValueError('unknown') 76 | 77 | # filter existing scenes. 
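    # ``get_available_scenes`` keeps only the scenes whose LiDAR files exist
    # on disk; the split lists are first restricted to those scene names and
    # then converted to scene tokens for lookup during info generation.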
78 | available_scenes = get_available_scenes(nusc) 79 | available_scene_names = [s['name'] for s in available_scenes] 80 | train_scenes = list( 81 | filter(lambda x: x in available_scene_names, train_scenes)) 82 | val_scenes = list(filter(lambda x: x in available_scene_names, val_scenes)) 83 | train_scenes = set([ 84 | available_scenes[available_scene_names.index(s)]['token'] 85 | for s in train_scenes 86 | ]) 87 | val_scenes = set([ 88 | available_scenes[available_scene_names.index(s)]['token'] 89 | for s in val_scenes 90 | ]) 91 | 92 | test = 'test' in version 93 | if test: 94 | print('test scene: {}'.format(len(train_scenes))) 95 | else: 96 | print('train scene: {}, val scene: {}'.format( 97 | len(train_scenes), len(val_scenes))) 98 | train_nusc_infos, val_nusc_infos = _fill_trainval_infos( 99 | nusc, train_scenes, val_scenes, test, max_sweeps=max_sweeps) 100 | 101 | metadata = dict(version=version) 102 | if test: 103 | print('test sample: {}'.format(len(train_nusc_infos))) 104 | data = dict(infos=train_nusc_infos, metadata=metadata) 105 | info_path = osp.join(root_path, 106 | '{}_infos_test.pkl'.format(info_prefix)) 107 | mmengine.dump(data, info_path) 108 | else: 109 | print('train sample: {}, val sample: {}'.format( 110 | len(train_nusc_infos), len(val_nusc_infos))) 111 | data = dict(infos=train_nusc_infos, metadata=metadata) 112 | info_path = osp.join(root_path, 113 | '{}_infos_train.pkl'.format(info_prefix)) 114 | mmengine.dump(data, info_path) 115 | data['infos'] = val_nusc_infos 116 | info_val_path = osp.join(root_path, 117 | '{}_infos_val.pkl'.format(info_prefix)) 118 | mmengine.dump(data, info_val_path) 119 | 120 | 121 | def get_available_scenes(nusc): 122 | """Get available scenes from the input nuscenes class. 123 | 124 | Given the raw data, get the information of available scenes for 125 | further info generation. 126 | 127 | Args: 128 | nusc (class): Dataset class in the nuScenes dataset. 129 | 130 | Returns: 131 | available_scenes (list[dict]): List of basic information for the 132 | available scenes. 133 | """ 134 | available_scenes = [] 135 | print('total scene num: {}'.format(len(nusc.scene))) 136 | for scene in nusc.scene: 137 | scene_token = scene['token'] 138 | scene_rec = nusc.get('scene', scene_token) 139 | sample_rec = nusc.get('sample', scene_rec['first_sample_token']) 140 | sd_rec = nusc.get('sample_data', sample_rec['data']['LIDAR_TOP']) 141 | has_more_frames = True 142 | scene_not_exist = False 143 | while has_more_frames: 144 | lidar_path, boxes, _ = nusc.get_sample_data(sd_rec['token']) 145 | lidar_path = str(lidar_path) 146 | if os.getcwd() in lidar_path: 147 | # path from lyftdataset is absolute path 148 | lidar_path = lidar_path.split(f'{os.getcwd()}/')[-1] 149 | # relative path 150 | if not mmengine.is_filepath(lidar_path): 151 | scene_not_exist = True 152 | break 153 | else: 154 | break 155 | if scene_not_exist: 156 | continue 157 | available_scenes.append(scene) 158 | print('exist scene num: {}'.format(len(available_scenes))) 159 | return available_scenes 160 | 161 | 162 | def _fill_trainval_infos(nusc, 163 | train_scenes, 164 | val_scenes, 165 | test=False, 166 | max_sweeps=10): 167 | """Generate the train/val infos from the raw data. 168 | 169 | Args: 170 | nusc (:obj:`NuScenes`): Dataset class in the nuScenes dataset. 171 | train_scenes (list[str]): Basic information of training scenes. 172 | val_scenes (list[str]): Basic information of validation scenes. 173 | test (bool, optional): Whether use the test mode. 
In test mode, no 174 | annotations can be accessed. Default: False. 175 | max_sweeps (int, optional): Max number of sweeps. Default: 10. 176 | 177 | Returns: 178 | tuple[list[dict]]: Information of training set and validation set 179 | that will be saved to the info file. 180 | """ 181 | train_nusc_infos = [] 182 | val_nusc_infos = [] 183 | 184 | for sample in mmengine.track_iter_progress(nusc.sample): 185 | lidar_token = sample['data']['LIDAR_TOP'] 186 | sd_rec = nusc.get('sample_data', sample['data']['LIDAR_TOP']) 187 | cs_record = nusc.get('calibrated_sensor', 188 | sd_rec['calibrated_sensor_token']) 189 | pose_record = nusc.get('ego_pose', sd_rec['ego_pose_token']) 190 | pts_semantic_mask_path = os.path.join(nusc.get('lidarseg', sd_rec['ego_pose_token'])['filename']) 191 | pts_panoptic_mask_path = os.path.join(nusc.get('panoptic', sd_rec['ego_pose_token'])['filename']) 192 | lidar_path, boxes, _ = nusc.get_sample_data(lidar_token) 193 | mmengine.check_file_exist(lidar_path) 194 | 195 | info = { 196 | 'lidar_path': lidar_path, 197 | 'num_features': 5, 198 | 'token': sample['token'], 199 | 'sweeps': [], 200 | 'cams': dict(), 201 | 'lidar2ego_translation': cs_record['translation'], 202 | 'lidar2ego_rotation': cs_record['rotation'], 203 | 'ego2global_translation': pose_record['translation'], 204 | 'ego2global_rotation': pose_record['rotation'], 205 | 'timestamp': sample['timestamp'], 206 | 'pts_semantic_mask_path': pts_semantic_mask_path, 207 | 'pts_panoptic_mask_path': pts_panoptic_mask_path 208 | } 209 | 210 | l2e_r = info['lidar2ego_rotation'] 211 | l2e_t = info['lidar2ego_translation'] 212 | e2g_r = info['ego2global_rotation'] 213 | e2g_t = info['ego2global_translation'] 214 | l2e_r_mat = Quaternion(l2e_r).rotation_matrix 215 | e2g_r_mat = Quaternion(e2g_r).rotation_matrix 216 | 217 | # obtain 6 image's information per frame 218 | camera_types = [ 219 | 'CAM_FRONT', 220 | 'CAM_FRONT_RIGHT', 221 | 'CAM_FRONT_LEFT', 222 | 'CAM_BACK', 223 | 'CAM_BACK_LEFT', 224 | 'CAM_BACK_RIGHT', 225 | ] 226 | for cam in camera_types: 227 | cam_token = sample['data'][cam] 228 | cam_path, _, cam_intrinsic = nusc.get_sample_data(cam_token) 229 | cam_info = obtain_sensor2top(nusc, cam_token, l2e_t, l2e_r_mat, 230 | e2g_t, e2g_r_mat, cam) 231 | cam_info.update(cam_intrinsic=cam_intrinsic) 232 | info['cams'].update({cam: cam_info}) 233 | 234 | # obtain sweeps for a single key-frame 235 | sd_rec = nusc.get('sample_data', sample['data']['LIDAR_TOP']) 236 | sweeps = [] 237 | while len(sweeps) < max_sweeps: 238 | if not sd_rec['prev'] == '': 239 | sweep = obtain_sensor2top(nusc, sd_rec['prev'], l2e_t, 240 | l2e_r_mat, e2g_t, e2g_r_mat, 'lidar') 241 | sweeps.append(sweep) 242 | sd_rec = nusc.get('sample_data', sd_rec['prev']) 243 | else: 244 | break 245 | info['sweeps'] = sweeps 246 | # obtain annotation 247 | if not test: 248 | annotations = [ 249 | nusc.get('sample_annotation', token) 250 | for token in sample['anns'] 251 | ] 252 | locs = np.array([b.center for b in boxes]).reshape(-1, 3) 253 | dims = np.array([b.wlh for b in boxes]).reshape(-1, 3) 254 | rots = np.array([b.orientation.yaw_pitch_roll[0] 255 | for b in boxes]).reshape(-1, 1) 256 | velocity = np.array( 257 | [nusc.box_velocity(token)[:2] for token in sample['anns']]) 258 | valid_flag = np.array( 259 | [(anno['num_lidar_pts'] + anno['num_radar_pts']) > 0 260 | for anno in annotations], 261 | dtype=bool).reshape(-1) 262 | # convert velo from global to lidar 263 | for i in range(len(boxes)): 264 | velo = np.array([*velocity[i], 0.0]) 265 | velo = velo 
@ np.linalg.inv(e2g_r_mat).T @ np.linalg.inv( 266 | l2e_r_mat).T 267 | velocity[i] = velo[:2] 268 | 269 | names = [b.name for b in boxes] 270 | for i in range(len(names)): 271 | if names[i] in NuScenesNameMapping: 272 | names[i] = NuScenesNameMapping[names[i]] 273 | names = np.array(names) 274 | # we need to convert box size to 275 | # the format of our lidar coordinate system 276 | # which is x_size, y_size, z_size (corresponding to l, w, h) 277 | gt_boxes = np.concatenate([locs, dims[:, [1, 0, 2]], rots], axis=1) 278 | assert len(gt_boxes) == len( 279 | annotations), f'{len(gt_boxes)}, {len(annotations)}' 280 | info['gt_boxes'] = gt_boxes 281 | info['gt_names'] = names 282 | info['gt_velocity'] = velocity.reshape(-1, 2) 283 | info['num_lidar_pts'] = np.array( 284 | [a['num_lidar_pts'] for a in annotations]) 285 | info['num_radar_pts'] = np.array( 286 | [a['num_radar_pts'] for a in annotations]) 287 | info['valid_flag'] = valid_flag 288 | 289 | if sample['scene_token'] in train_scenes: 290 | train_nusc_infos.append(info) 291 | else: 292 | val_nusc_infos.append(info) 293 | 294 | return train_nusc_infos, val_nusc_infos 295 | 296 | 297 | def obtain_sensor2top(nusc, 298 | sensor_token, 299 | l2e_t, 300 | l2e_r_mat, 301 | e2g_t, 302 | e2g_r_mat, 303 | sensor_type='lidar'): 304 | """Obtain the info with RT matric from general sensor to Top LiDAR. 305 | 306 | Args: 307 | nusc (class): Dataset class in the nuScenes dataset. 308 | sensor_token (str): Sample data token corresponding to the 309 | specific sensor type. 310 | l2e_t (np.ndarray): Translation from lidar to ego in shape (1, 3). 311 | l2e_r_mat (np.ndarray): Rotation matrix from lidar to ego 312 | in shape (3, 3). 313 | e2g_t (np.ndarray): Translation from ego to global in shape (1, 3). 314 | e2g_r_mat (np.ndarray): Rotation matrix from ego to global 315 | in shape (3, 3). 316 | sensor_type (str, optional): Sensor to calibrate. Default: 'lidar'. 317 | 318 | Returns: 319 | sweep (dict): Sweep information after transformation. 
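    Note:
        The rotation and translation composed below follow the chain
        sweep sensor -> sweep ego -> global -> key-frame ego -> key-frame
        LiDAR, so (per the inline comment in the code) a sweep point ``p``
        maps into the key frame's LiDAR coordinates as
        ``p @ sweep['sensor2lidar_rotation'] + sweep['sensor2lidar_translation']``.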
320 | """ 321 | sd_rec = nusc.get('sample_data', sensor_token) 322 | cs_record = nusc.get('calibrated_sensor', 323 | sd_rec['calibrated_sensor_token']) 324 | pose_record = nusc.get('ego_pose', sd_rec['ego_pose_token']) 325 | data_path = str(nusc.get_sample_data_path(sd_rec['token'])) 326 | if os.getcwd() in data_path: # path from lyftdataset is absolute path 327 | data_path = data_path.split(f'{os.getcwd()}/')[-1] # relative path 328 | sweep = { 329 | 'data_path': data_path, 330 | 'type': sensor_type, 331 | 'sample_data_token': sd_rec['token'], 332 | 'sensor2ego_translation': cs_record['translation'], 333 | 'sensor2ego_rotation': cs_record['rotation'], 334 | 'ego2global_translation': pose_record['translation'], 335 | 'ego2global_rotation': pose_record['rotation'], 336 | 'timestamp': sd_rec['timestamp'] 337 | } 338 | l2e_r_s = sweep['sensor2ego_rotation'] 339 | l2e_t_s = sweep['sensor2ego_translation'] 340 | e2g_r_s = sweep['ego2global_rotation'] 341 | e2g_t_s = sweep['ego2global_translation'] 342 | 343 | # obtain the RT from sensor to Top LiDAR 344 | # sweep->ego->global->ego'->lidar 345 | l2e_r_s_mat = Quaternion(l2e_r_s).rotation_matrix 346 | e2g_r_s_mat = Quaternion(e2g_r_s).rotation_matrix 347 | R = (l2e_r_s_mat.T @ e2g_r_s_mat.T) @ ( 348 | np.linalg.inv(e2g_r_mat).T @ np.linalg.inv(l2e_r_mat).T) 349 | T = (l2e_t_s @ e2g_r_s_mat.T + e2g_t_s) @ ( 350 | np.linalg.inv(e2g_r_mat).T @ np.linalg.inv(l2e_r_mat).T) 351 | T -= e2g_t @ (np.linalg.inv(e2g_r_mat).T @ np.linalg.inv(l2e_r_mat).T 352 | ) + l2e_t @ np.linalg.inv(l2e_r_mat).T 353 | sweep['sensor2lidar_rotation'] = R.T # points @ R.T + T 354 | sweep['sensor2lidar_translation'] = T 355 | return sweep 356 | 357 | 358 | def export_2d_annotation(root_path, info_path, version, mono3d=True): 359 | """Export 2d annotation from the info file and raw data. 360 | 361 | Args: 362 | root_path (str): Root path of the raw data. 363 | info_path (str): Path of the info file. 364 | version (str): Dataset version. 365 | mono3d (bool, optional): Whether to export mono3d annotation. 366 | Default: True. 
367 | """ 368 | # get bbox annotations for camera 369 | camera_types = [ 370 | 'CAM_FRONT', 371 | 'CAM_FRONT_RIGHT', 372 | 'CAM_FRONT_LEFT', 373 | 'CAM_BACK', 374 | 'CAM_BACK_LEFT', 375 | 'CAM_BACK_RIGHT', 376 | ] 377 | nusc_infos = mmengine.load(info_path)['infos'] 378 | nusc = NuScenes(version=version, dataroot=root_path, verbose=True) 379 | # info_2d_list = [] 380 | cat2Ids = [ 381 | dict(id=nus_categories.index(cat_name), name=cat_name) 382 | for cat_name in nus_categories 383 | ] 384 | coco_ann_id = 0 385 | coco_2d_dict = dict(annotations=[], images=[], categories=cat2Ids) 386 | for info in mmengine.track_iter_progress(nusc_infos): 387 | for cam in camera_types: 388 | cam_info = info['cams'][cam] 389 | coco_infos = get_2d_boxes( 390 | nusc, 391 | cam_info['sample_data_token'], 392 | visibilities=['', '1', '2', '3', '4'], 393 | mono3d=mono3d) 394 | (height, width, _) = mmcv.imread(cam_info['data_path']).shape 395 | coco_2d_dict['images'].append( 396 | dict( 397 | file_name=cam_info['data_path'].split('data/nuscenes/') 398 | [-1], 399 | id=cam_info['sample_data_token'], 400 | token=info['token'], 401 | cam2ego_rotation=cam_info['sensor2ego_rotation'], 402 | cam2ego_translation=cam_info['sensor2ego_translation'], 403 | ego2global_rotation=info['ego2global_rotation'], 404 | ego2global_translation=info['ego2global_translation'], 405 | cam_intrinsic=cam_info['cam_intrinsic'], 406 | width=width, 407 | height=height)) 408 | for coco_info in coco_infos: 409 | if coco_info is None: 410 | continue 411 | # add an empty key for coco format 412 | coco_info['segmentation'] = [] 413 | coco_info['id'] = coco_ann_id 414 | coco_2d_dict['annotations'].append(coco_info) 415 | coco_ann_id += 1 416 | if mono3d: 417 | json_prefix = f'{info_path[:-4]}_mono3d' 418 | else: 419 | json_prefix = f'{info_path[:-4]}' 420 | mmengine.dump(coco_2d_dict, f'{json_prefix}.coco.json') 421 | 422 | 423 | def get_2d_boxes(nusc, 424 | sample_data_token: str, 425 | visibilities: List[str], 426 | mono3d=True): 427 | """Get the 2D annotation records for a given `sample_data_token`. 428 | 429 | Args: 430 | sample_data_token (str): Sample data token belonging to a camera 431 | keyframe. 432 | visibilities (list[str]): Visibility filter. 433 | mono3d (bool): Whether to get boxes with mono3d annotation. 434 | 435 | Return: 436 | list[dict]: List of 2D annotation record that belongs to the input 437 | `sample_data_token`. 438 | """ 439 | 440 | # Get the sample data and the sample corresponding to that sample data. 441 | sd_rec = nusc.get('sample_data', sample_data_token) 442 | 443 | assert sd_rec[ 444 | 'sensor_modality'] == 'camera', 'Error: get_2d_boxes only works' \ 445 | ' for camera sample_data!' 446 | if not sd_rec['is_key_frame']: 447 | raise ValueError( 448 | 'The 2D re-projections are available only for keyframes.') 449 | 450 | s_rec = nusc.get('sample', sd_rec['sample_token']) 451 | 452 | # Get the calibrated sensor and ego pose 453 | # record to get the transformation matrices. 454 | cs_rec = nusc.get('calibrated_sensor', sd_rec['calibrated_sensor_token']) 455 | pose_rec = nusc.get('ego_pose', sd_rec['ego_pose_token']) 456 | camera_intrinsic = np.array(cs_rec['camera_intrinsic']) 457 | 458 | # Get all the annotation with the specified visibilties. 
459 |     ann_recs = [
460 |         nusc.get('sample_annotation', token) for token in s_rec['anns']
461 |     ]
462 |     ann_recs = [
463 |         ann_rec for ann_rec in ann_recs
464 |         if (ann_rec['visibility_token'] in visibilities)
465 |     ]
466 | 
467 |     repro_recs = []
468 | 
469 |     for ann_rec in ann_recs:
470 |         # Augment sample_annotation with token information.
471 |         ann_rec['sample_annotation_token'] = ann_rec['token']
472 |         ann_rec['sample_data_token'] = sample_data_token
473 | 
474 |         # Get the box in global coordinates.
475 |         box = nusc.get_box(ann_rec['token'])
476 | 
477 |         # Move them to the ego-pose frame.
478 |         box.translate(-np.array(pose_rec['translation']))
479 |         box.rotate(Quaternion(pose_rec['rotation']).inverse)
480 | 
481 |         # Move them to the calibrated sensor frame.
482 |         box.translate(-np.array(cs_rec['translation']))
483 |         box.rotate(Quaternion(cs_rec['rotation']).inverse)
484 | 
485 |         # Filter out the corners that are not in front of the calibrated
486 |         # sensor.
487 |         corners_3d = box.corners()
488 |         in_front = np.argwhere(corners_3d[2, :] > 0).flatten()
489 |         corners_3d = corners_3d[:, in_front]
490 | 
491 |         # Project 3d box to 2d.
492 |         corner_coords = view_points(corners_3d, camera_intrinsic,
493 |                                     True).T[:, :2].tolist()
494 | 
495 |         # Keep only corners that fall within the image.
496 |         final_coords = post_process_coords(corner_coords)
497 | 
498 |         # Skip if the convex hull of the re-projected corners
499 |         # does not intersect the image canvas.
500 |         if final_coords is None:
501 |             continue
502 |         else:
503 |             min_x, min_y, max_x, max_y = final_coords
504 | 
505 |         # Generate dictionary record to be included in the .json file.
506 |         repro_rec = generate_record(ann_rec, min_x, min_y, max_x, max_y,
507 |                                     sample_data_token, sd_rec['filename'])
508 | 
509 |         # If mono3d=True, add 3D annotations in camera coordinates
510 |         if mono3d and (repro_rec is not None):
511 |             loc = box.center.tolist()
512 | 
513 |             dim = box.wlh
514 |             dim[[0, 1, 2]] = dim[[1, 2, 0]]  # convert wlh to our lhw
515 |             dim = dim.tolist()
516 | 
517 |             rot = box.orientation.yaw_pitch_roll[0]
518 |             rot = [-rot]  # convert the rot to our cam coordinate
519 | 
520 |             global_velo2d = nusc.box_velocity(box.token)[:2]
521 |             global_velo3d = np.array([*global_velo2d, 0.0])
522 |             e2g_r_mat = Quaternion(pose_rec['rotation']).rotation_matrix
523 |             c2e_r_mat = Quaternion(cs_rec['rotation']).rotation_matrix
524 |             cam_velo3d = global_velo3d @ np.linalg.inv(
525 |                 e2g_r_mat).T @ np.linalg.inv(c2e_r_mat).T
526 |             velo = cam_velo3d[0::2].tolist()
527 | 
528 |             repro_rec['bbox_cam3d'] = loc + dim + rot
529 |             repro_rec['velo_cam3d'] = velo
530 | 
531 |             center3d = np.array(loc).reshape([1, 3])
532 |             # center2d = points_cam2img(
533 |             #     center3d, camera_intrinsic, with_depth=True)
534 |             # repro_rec['center2d'] = center2d.squeeze().tolist()
535 |             # normalized center2D + depth
536 |             # if samples with depth < 0 will be removed
537 |             # if repro_rec['center2d'][2] <= 0:
538 |             #     continue
539 | 
540 |             ann_token = nusc.get('sample_annotation',
541 |                                  box.token)['attribute_tokens']
542 |             if len(ann_token) == 0:
543 |                 attr_name = 'None'
544 |             else:
545 |                 attr_name = nusc.get('attribute', ann_token[0])['name']
546 |             attr_id = nus_attributes.index(attr_name)
547 |             repro_rec['attribute_name'] = attr_name
548 |             repro_rec['attribute_id'] = attr_id
549 | 
550 |         repro_recs.append(repro_rec)
551 | 
552 |     return repro_recs
553 | 
554 | 
555 | def post_process_coords(
556 |     corner_coords: List, imsize: Tuple[int, int] = (1600, 900)
557 | ) -> Union[Tuple[float, float, float, float], None]:
558 |     """Get the intersection of the convex hull of the reprojected bbox corners
559 |     and the image canvas, return None if no intersection.
560 | 
561 |     Args:
562 |         corner_coords (list[float]): Corner coordinates of the reprojected
563 |             bounding box.
564 |         imsize (tuple[int]): Size of the image canvas.
565 | 
566 |     Returns:
567 |         tuple[float]: Intersection of the convex hull of the 2D box
568 |             corners and the image canvas.
569 |     """
570 |     polygon_from_2d_box = MultiPoint(corner_coords).convex_hull
571 |     img_canvas = box(0, 0, imsize[0], imsize[1])
572 | 
573 |     if polygon_from_2d_box.intersects(img_canvas):
574 |         img_intersection = polygon_from_2d_box.intersection(img_canvas)
575 |         intersection_coords = np.array(
576 |             [coord for coord in img_intersection.exterior.coords])
577 | 
578 |         min_x = min(intersection_coords[:, 0])
579 |         min_y = min(intersection_coords[:, 1])
580 |         max_x = max(intersection_coords[:, 0])
581 |         max_y = max(intersection_coords[:, 1])
582 | 
583 |         return min_x, min_y, max_x, max_y
584 |     else:
585 |         return None
586 | 
587 | 
588 | def generate_record(ann_rec: dict, x1: float, y1: float, x2: float, y2: float,
589 |                     sample_data_token: str, filename: str) -> OrderedDict:
590 |     """Generate one 2D annotation record given various information on top of
591 |     the 2D bounding box coordinates.
592 | 
593 |     Args:
594 |         ann_rec (dict): Original 3d annotation record.
595 |         x1 (float): Minimum value of the x coordinate.
596 |         y1 (float): Minimum value of the y coordinate.
597 |         x2 (float): Maximum value of the x coordinate.
598 |         y2 (float): Maximum value of the y coordinate.
599 |         sample_data_token (str): Sample data token.
600 |         filename (str): The corresponding image file where the annotation
601 |             is present.
602 | 
603 |     Returns:
604 |         dict: A sample 2D annotation record.
605 |             - file_name (str): file name
606 |             - image_id (str): sample data token
607 |             - area (float): 2d box area
608 |             - category_name (str): category name
609 |             - category_id (int): category id
610 |             - bbox (list[float]): left x, top y, dx, dy of 2d box
611 |             - iscrowd (int): whether the area is crowd
612 |     """
613 |     repro_rec = OrderedDict()
614 |     repro_rec['sample_data_token'] = sample_data_token
615 |     coco_rec = dict()
616 | 
617 |     relevant_keys = [
618 |         'attribute_tokens',
619 |         'category_name',
620 |         'instance_token',
621 |         'next',
622 |         'num_lidar_pts',
623 |         'num_radar_pts',
624 |         'prev',
625 |         'sample_annotation_token',
626 |         'sample_data_token',
627 |         'visibility_token',
628 |     ]
629 | 
630 |     for key, value in ann_rec.items():
631 |         if key in relevant_keys:
632 |             repro_rec[key] = value
633 | 
634 |     repro_rec['bbox_corners'] = [x1, y1, x2, y2]
635 |     repro_rec['filename'] = filename
636 | 
637 |     coco_rec['file_name'] = filename
638 |     coco_rec['image_id'] = sample_data_token
639 |     coco_rec['area'] = (y2 - y1) * (x2 - x1)
640 | 
641 |     if repro_rec['category_name'] not in NuScenesNameMapping:
642 |         return None
643 |     cat_name = NuScenesNameMapping[repro_rec['category_name']]
644 |     coco_rec['category_name'] = cat_name
645 |     coco_rec['category_id'] = nus_categories.index(cat_name)
646 |     coco_rec['bbox'] = [x1, y1, x2 - x1, y2 - y1]
647 |     coco_rec['iscrowd'] = 0
648 | 
649 |     return coco_rec
650 | 
--------------------------------------------------------------------------------
/tools/dataset_converters/semantickitti_converter.py:
--------------------------------------------------------------------------------
1 | from os import path as osp
2 | from pathlib import Path
3 | 
4 | import mmengine
5 | 
6 | total_num = {
7 |     0: 4541,
8 |     1: 1101,
9 |     2: 4661,
10 |     3: 801,
11 |     4: 271,
12 |     5: 2761,
13 |     6: 1101,
14 |     7: 1101,
15 |     8: 4071,
16 |     9: 1591,
17 |     10: 1201,
18 |     11: 921,
19 |     12: 1061,
20 |     13: 3281,
21 |     14: 631,
22 |     15: 1901,
23 |     16: 1731,
24 |     17: 491,
25 |     18: 1801,
26 |     19: 4981,
27 |     20: 831,
28 |     21: 2721,
29 | }
30 | fold_split = {
31 |     'train': [0, 1, 2, 3, 4, 5, 6, 7, 9, 10],
32 |     'val': [8],
33 |     'test': [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
34 |     'trainval': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
35 | }
36 | 
37 | 
38 | def get_semantickitti_info(split):
39 |     """Create info file in the form of
40 |     data_infos={
41 |         'metainfo': {'DATASET': 'SemanticKITTI'},
42 |         'data_list': {
43 |             00000: {
44 |                 'lidar_points': {
45 |                     'lidar_path': 'sequences/00/velodyne/000000.bin'
46 |                 },
47 |                 'pts_semantic_mask_path':
48 |                     'sequences/00/labels/000000.label',
49 |                 'pts_panoptic_mask_path':
50 |                     'sequences/00/labels/000000.label',
51 |                 'sample_id': '00'
52 |             },
53 |             ...
54 |         }
55 |     }
56 |     """
57 |     data_infos = dict()
58 |     data_infos['metainfo'] = dict(DATASET='SemanticKITTI')
59 |     data_list = []
60 |     for i_folder in fold_split[split]:
61 |         for j in range(0, total_num[i_folder]):
62 |             data_list.append({
63 |                 'lidar_points': {
64 |                     'lidar_path':
65 |                     osp.join('sequences',
66 |                              str(i_folder).zfill(2), 'velodyne',
67 |                              str(j).zfill(6) + '.bin'),
68 |                     'num_pts_feats':
69 |                     4
70 |                 },
71 |                 'pts_semantic_mask_path':
72 |                 osp.join('sequences',
73 |                          str(i_folder).zfill(2), 'labels',
74 |                          str(j).zfill(6) + '.label'),
75 |                 'pts_panoptic_mask_path':
76 |                 osp.join('sequences',
77 |                          str(i_folder).zfill(2), 'labels',
78 |                          str(j).zfill(6) + '.label'),
79 |                 'sample_id':
80 |                 str(i_folder) + str(j)
81 |             })
82 |     data_infos.update(dict(data_list=data_list))
83 |     return data_infos
84 | 
85 | 
86 | def create_semantickitti_info_file(pkl_prefix, save_path):
87 |     """Create info file of SemanticKITTI dataset.
88 | 
89 |     Directly generate info file without raw data.
90 | 
91 |     Args:
92 |         pkl_prefix (str): Prefix of the info file to be generated.
93 |         save_path (str): Path to save the info file.
94 |     """
95 |     print('Generate info.')
96 |     save_path = Path(save_path)
97 | 
98 |     semantickitti_infos_train = get_semantickitti_info(split='train')
99 |     filename = save_path / f'{pkl_prefix}_infos_train.pkl'
100 |     print(f'SemanticKITTI info train file is saved to {filename}')
101 |     mmengine.dump(semantickitti_infos_train, filename)
102 |     semantickitti_infos_val = get_semantickitti_info(split='val')
103 |     filename = save_path / f'{pkl_prefix}_infos_val.pkl'
104 |     print(f'SemanticKITTI info val file is saved to {filename}')
105 |     mmengine.dump(semantickitti_infos_val, filename)
106 |     semantickitti_infos_test = get_semantickitti_info(split='test')
107 |     filename = save_path / f'{pkl_prefix}_infos_test.pkl'
108 |     print(f'SemanticKITTI info test file is saved to {filename}')
109 |     mmengine.dump(semantickitti_infos_test, filename)
110 |     semantickitti_infos_trainval = get_semantickitti_info(split='trainval')
111 |     filename = save_path / f'{pkl_prefix}_infos_trainval.pkl'
112 |     print(f'SemanticKITTI info trainval file is saved to {filename}')
113 |     mmengine.dump(semantickitti_infos_trainval, filename)
--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
2 | import argparse
3 | import logging
4 | import os
5 | import os.path as osp
6 | 
7 | from mmengine.config import Config, DictAction
8 | from mmengine.logging import print_log
9 | from mmengine.registry import RUNNERS
10 | from mmengine.runner import Runner
11 | 
12 | from mmdet3d.utils import replace_ceph_backend
13 | 
14 | def parse_args():
15 |     parser = argparse.ArgumentParser(description='Train a 3D detector')
16 |     parser.add_argument('config', help='train config file path')
17 |     parser.add_argument('--work-dir', help='the dir to save logs and models')
18 |     parser.add_argument('--checkpoint', help='the checkpoint file to load from')
19 |     parser.add_argument(
20 |         '--amp',
21 |         action='store_true',
22 |         default=False,
23 |         help='enable automatic-mixed-precision training')
24 |     parser.add_argument(
25 |         '--auto-scale-lr',
26 |         action='store_true',
27 |         help='enable automatically scaling LR.')
28 |     parser.add_argument(
29 |         '--resume',
30 |         nargs='?',
31 |         type=str,
32 |         const='auto',
33 |         help='If a checkpoint path is specified, resume from it; if not, '
34 |         'try to auto-resume from the latest checkpoint '
35 |         'in the work directory.')
36 |     parser.add_argument(
37 |         '--ceph', action='store_true', help='Use ceph as data storage backend')
38 |     parser.add_argument(
39 |         '--cfg-options',
40 |         nargs='+',
41 |         action=DictAction,
42 |         help='override some settings in the used config, the key-value pair '
43 |         'in xxx=yyy format will be merged into config file. If the value to '
44 |         'be overwritten is a list, it should be like key="[a,b]" or key=a,b. '
45 |         'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
46 |         'Note that the quotation marks are necessary and that no white space '
47 |         'is allowed.')
48 |     parser.add_argument(
49 |         '--launcher',
50 |         choices=['none', 'pytorch', 'slurm', 'mpi'],
51 |         default='none',
52 |         help='job launcher')
53 |     # When using PyTorch version >= 2.0.0, the `torch.distributed.launch`
54 |     # will pass the `--local-rank` parameter to `tools/train.py` instead
55 |     # of `--local_rank`.
56 |     parser.add_argument('--local_rank', '--local-rank', type=int, default=0)
57 |     args = parser.parse_args()
58 |     if 'LOCAL_RANK' not in os.environ:
59 |         os.environ['LOCAL_RANK'] = str(args.local_rank)
60 |     return args
61 | 
62 | 
63 | def main():
64 |     args = parse_args()
65 | 
66 |     # load config
67 |     cfg = Config.fromfile(args.config)
68 | 
69 |     # TODO: We will unify the ceph support approach with other OpenMMLab repos
70 |     if args.ceph:
71 |         cfg = replace_ceph_backend(cfg)
72 | 
73 |     cfg.launcher = args.launcher
74 |     if args.cfg_options is not None:
75 |         cfg.merge_from_dict(args.cfg_options)
76 | 
77 |     # work_dir is determined in this priority: CLI > segment in file > filename
78 |     if args.work_dir is not None:
79 |         # update configs according to CLI args if args.work_dir is not None
80 |         cfg.work_dir = args.work_dir
81 |     elif cfg.get('work_dir', None) is None:
82 |         # use config filename as default work_dir if cfg.work_dir is None
83 |         cfg.work_dir = osp.join('./work_dirs',
84 |                                 osp.splitext(osp.basename(args.config))[0])
85 | 
86 |     # enable automatic-mixed-precision training
87 |     if args.amp is True:
88 |         optim_wrapper = cfg.optim_wrapper.type
89 |         if optim_wrapper == 'AmpOptimWrapper':
90 |             print_log(
91 |                 'AMP training is already enabled in your config.',
92 |                 logger='current',
93 |                 level=logging.WARNING)
94 |         else:
95 |             assert optim_wrapper == 'OptimWrapper', (
96 |                 '`--amp` is only supported when the optimizer wrapper type is '
97 |                 f'`OptimWrapper` but got {optim_wrapper}.')
98 |             cfg.optim_wrapper.type = 'AmpOptimWrapper'
99 |             cfg.optim_wrapper.loss_scale = 'dynamic'
100 | 
101 |     # enable automatically scaling LR
102 |     if args.auto_scale_lr:
103 |         if 'auto_scale_lr' in cfg and \
104 |                 'enable' in cfg.auto_scale_lr and \
105 |                 'base_batch_size' in cfg.auto_scale_lr:
106 |             cfg.auto_scale_lr.enable = True
107 |         else:
108 |             raise RuntimeError('Cannot find "auto_scale_lr" or '
109 |                                '"auto_scale_lr.enable" or '
110 |                                '"auto_scale_lr.base_batch_size" in your'
111 |                                ' configuration file.')
112 | 
113 |     # resume is determined in this priority: resume from > auto_resume
114 |     if args.resume == 'auto':
115 |         cfg.resume = True
116 |         cfg.load_from = None
117 |     elif args.checkpoint is not None:
118 |         cfg.load_from = args.checkpoint
119 |     elif args.resume is not None:
120 |         cfg.resume = True
121 |         cfg.load_from = args.resume
122 | 
123 |     # build the runner from config
124 |     if 'runner_type' not in cfg:
125 |         # build the default runner
126 |         runner = Runner.from_cfg(cfg)
127 |     else:
128 |         # build customized runner from the registry
129 |         # if 'runner_type' is set in the cfg
130 |         runner = RUNNERS.build(cfg)
131 | 
132 |     # start training
133 |     runner.train()
134 | 
135 | 
136 | if __name__ == '__main__':
137 |     main()
138 | 
--------------------------------------------------------------------------------