├── README.md
├── configs
    ├── _base_
    │   ├── datasets
    │   │   ├── semantickitti_lpmix.py
    │   │   └── semantickitti_panoptic_lpmix.py
    │   ├── default_runtime.py
    │   └── models
    │   │   ├── cylinder3d.py
    │   │   └── p3former.py
    ├── cylinder3d
    │   └── cylinder3d_8xb2_3x_semantickitti.py
    └── p3former
    │   ├── p3former_8xb2_3x_semantickitti.py
    │   ├── p3former_8xb2_3x_semantickitti_submit.py
    │   └── p3former_8xb2_3x_semantickitti_trainval.py
├── datasets
    ├── semantickitti_dataset.py
    └── transforms
    │   ├── __pycache__
    │       ├── loading.cpython-39.pyc
    │       └── transforms_3d.cpython-39.pyc
    │   ├── formating.py
    │   ├── loading.py
    │   └── transforms_3d.py
├── dist_test.sh
├── dist_train.sh
├── evaluation
    ├── functional
    │   └── panoptic_seg_eval.py
    └── metrics
    │   └── panoptic_seg_metric.py
├── p3former
    ├── backbones
    │   └── cylinder3d.py
    ├── data_preprocessors
    │   └── data_preprocessor.py
    ├── decode_heads
    │   └── p3former_head.py
    ├── segmentors
    │   └── p3former.py
    └── task_modules
    │   └── samplers
    │       └── mask_pseduo_sampler.py
├── test.py
├── tools
    ├── create_data.py
    └── dataset_converters
    │   ├── nuscenes_converter.py
    │   ├── semantickitti_converter.py
    │   └── update_infos_to_v2.py
└── train.py


/README.md:
--------------------------------------------------------------------------------
  1 | # P3Former: Position-Guided Point Cloud Panoptic Segmentation Transformer
  2 | 
  3 | ![main figure](https://user-images.githubusercontent.com/45515569/227226959-35f887e0-453b-4ac8-81c0-cb4b2f79333c.png)
  4 | 
  5 | ## Introduction
  6 | 
  7 | This is an official release of [Position-Guided Point Cloud Panoptic Segmentation Transformer](https://arxiv.org/abs/2303.13509).
  8 | 
  9 | 
 10 | ## Abstract
 11 | 
 12 | DEtection TRansformer (DETR) started a trend that uses a group of learnable queries for unified visual perception.
 13 | This work begins by applying this appealing paradigm to LiDAR-based point cloud segmentation and obtains a simple yet effective baseline.
 14 | Although the naive adaptation obtains fair results, the instance segmentation performance is noticeably inferior to previous works. 
 15 | By diving into the details, we observe that instances in the sparse point clouds are relatively small to the whole scene and often have similar geometry but lack distinctive appearance for segmentation, which are rare in the image domain. 
 16 | Considering instances in 3D are more featured by their positional information, we emphasize their roles during the modeling and design a robust Mixed-parameterized Positional Embedding (MPE) to guide the segmentation process. 
 17 | It is embedded into backbone features and later guides the mask prediction and query update processes iteratively, leading to Position-Aware Segmentation (PA-Seg) and Masked Focal Attention (MFA).
 18 | All these designs impel the queries to attend to specific regions and identify various instances. 
 19 | The method, named Position-guided Point cloud Panoptic segmentation transFormer (P3Former), outperforms previous state-of-the-art methods by 3.4% and 1.2% PQ on SemanticKITTI and nuScenes benchmark, respectively. 
 20 | The source code and models are available at https://github.com/SmartBot-PJLab/P3Former.
 21 | 
 22 | 
 23 | 
 24 | ## Results
 25 | 
 26 | ### SemanticKITTI test
 27 | 
 28 | | $\mathrm{PQ}$ | $\mathrm{PQ^{\dagger}}$ | $\mathrm{RQ}$ | $\mathrm{SQ}$ | $\mathrm{PQ}^{\mathrm{Th}}$ | $\mathrm{PQ}^{\mathrm{St}}$ | Download | Config |
 29 | | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: |
 30 | | 65.3 | 67.8 | 74.9 | 86.6 | 67.4 | 63.7 | [model](https://drive.google.com/drive/folders/1RBDWV-oWOQsDAhNE8Z7SMLWsriLrpWCx?usp=sharing) | [config](https://github.com/SmartBot-PJLab/P3Former/blob/semantickitti/configs/p3former/p3former_8xb2_3x_semantickitti_trainval.py) |
 31 | 
 32 | ### SemanticKITTI validation
 33 | 
 34 | | $\mathrm{PQ}$ | $\mathrm{PQ^{\dagger}}$ | $\mathrm{RQ}$ | $\mathrm{SQ}$ | $\mathrm{PQ}^{\mathrm{Th}}$ | $\mathrm{PQ}^{\mathrm{St}}$ | Download | Config |
 35 | | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: |
 36 | | 62.6 | 66.2 | 72.4 | 76.2 | 69.4 | 57.7 | [model](https://drive.google.com/drive/folders/1RBDWV-oWOQsDAhNE8Z7SMLWsriLrpWCx?usp=sharing) | [config](https://github.com/SmartBot-PJLab/P3Former/blob/semantickitti/configs/p3former/p3former_8xb2_3x_semantickitti.py) |
 37 | 
 38 | * Pretraining a backbone is helpful to stablize the the training process and get slightly better results. You can pretrain a model with [config](https://github.com/SmartBot-PJLab/P3Former/blob/semantickitti/configs/cylinder3d/cylinder3d_8xb2_3x_semantickitti.py).
 39 | 
 40 | ## Installation
 41 | 
 42 | ```
 43 | conda create -n p3former python==3.8 -y
 44 | conda activate p3former
 45 | pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/
 46 | whl/cu111/torch_stable.html
 47 | pip install openmim
 48 | mim install mmengine==0.7.4
 49 | mim install mmcv==2.0.0rc4
 50 | mim install mmdet==3.0.0
 51 | mim install mmdet3d==1.1.0
 52 | wget https://data.pyg.org/whl/torch-1.10.0%2Bcu113/torch_scatter-2.0.9-cp38-cp38-linux_x86_64.whl
 53 | pip install torch_scatter-2.0.9-cp38-cp38-linux_x86_64.whl
 54 | ```
 55 | 
 56 | ## Usage
 57 | 
 58 | ### Data preparation
 59 | 
 60 | #### Semantickitti
 61 | 
 62 | ```text
 63 | data/
 64 | ├── semantickitti
 65 | │   ├── sequences
 66 | │   │   ├── 00
 67 | │   │   |   ├── labels
 68 | │   │   |   ├── velodyne
 69 | │   │   ├── 01
 70 | │   │   ├── ...
 71 | │   ├── semantickitti_infos_train.pkl
 72 | │   ├── semantickitti_infos_val.pkl
 73 | │   ├── semantickitti_infos_test.pkl
 74 | 
 75 | ```
 76 | 
 77 | You can generate *.pkl by excuting
 78 | 
 79 | ```
 80 | python tools/create_data.py semantickitti --root-path data/semantickitti --out-dir data/semantickitti --extra-tag semantickitti
 81 | ```
 82 | 
 83 | ## Training and testing
 84 | 
 85 | ```bash
 86 | # train
 87 | sh dist_train.sh $CONFIG $GPUS
 88 | 
 89 | # val
 90 | sh dist_test.sh $CONFIG $CHECKPOINT $GPUS
 91 | 
 92 | # test
 93 | sh dist_test.sh $CONFIG $CHECKPOINT $GPUS
 94 | 
 95 | ```
 96 | 
 97 | ## Citation
 98 | 
 99 | ```bibtex
100 | @article{xiao2023p3former,
101 |     title={Position-Guided Point Cloud Panoptic Segmentation Transformer},
102 |     author={Xiao, Zeqi and Zhang, Wenwei and Wang, Tai and Chen Change Loy and Lin, Dahua and Pang, Jiangmiao},
103 |     journal={arXiv preprint},
104 |     year={2023}
105 | }
106 | ```
107 | 
108 | 
109 | ## Acknowledgements
110 | 
111 | We thank the contributors of [MMDetection3D](https://github.com/open-mmlab/mmdetection3d) and the authors of [Cylinder3D](https://github.com/xinge008/Cylinder3D) and [K-Net](https://github.com/ZwwWayne/K-Net) for their great work.
112 | 


--------------------------------------------------------------------------------
/configs/_base_/datasets/semantickitti_lpmix.py:
--------------------------------------------------------------------------------
  1 | # For SemanticKitti we usually do 19-class segmentation.
  2 | # For labels_map we follow the uniform format of MMDetection & MMSegmentation
  3 | # i.e. we consider the unlabeled class as the last one, which is different
  4 | # from the original implementation of some methods e.g. Cylinder3D.
  5 | dataset_type = '_SemanticKittiDataset'
  6 | data_root = 'data/semantickitti/'
  7 | class_names = [
  8 |     'car', 'bicycle', 'motorcycle', 'truck', 'bus', 'person', 'bicyclist',
  9 |     'motorcyclist', 'road', 'parking', 'sidewalk', 'other-ground', 'building',
 10 |     'fence', 'vegetation', 'trunck', 'terrian', 'pole', 'traffic-sign'
 11 | ]
 12 | labels_map = {
 13 |     0: 19,  # "unlabeled"
 14 |     1: 19,  # "outlier" mapped to "unlabeled" --------------mapped
 15 |     10: 0,  # "car"
 16 |     11: 1,  # "bicycle"
 17 |     13: 4,  # "bus" mapped to "other-vehicle" --------------mapped
 18 |     15: 2,  # "motorcycle"
 19 |     16: 4,  # "on-rails" mapped to "other-vehicle" ---------mapped
 20 |     18: 3,  # "truck"
 21 |     20: 4,  # "other-vehicle"
 22 |     30: 5,  # "person"
 23 |     31: 6,  # "bicyclist"
 24 |     32: 7,  # "motorcyclist"
 25 |     40: 8,  # "road"
 26 |     44: 9,  # "parking"
 27 |     48: 10,  # "sidewalk"
 28 |     49: 11,  # "other-ground"
 29 |     50: 12,  # "building"
 30 |     51: 13,  # "fence"
 31 |     52: 19,  # "other-structure" mapped to "unlabeled" ------mapped
 32 |     60: 8,  # "lane-marking" to "road" ---------------------mapped
 33 |     70: 14,  # "vegetation"
 34 |     71: 15,  # "trunk"
 35 |     72: 16,  # "terrain"
 36 |     80: 17,  # "pole"
 37 |     81: 18,  # "traffic-sign"
 38 |     99: 19,  # "other-object" to "unlabeled" ----------------mapped
 39 |     252: 0,  # "moving-car" to "car" ------------------------mapped
 40 |     253: 6,  # "moving-bicyclist" to "bicyclist" ------------mapped
 41 |     254: 5,  # "moving-person" to "person" ------------------mapped
 42 |     255: 7,  # "moving-motorcyclist" to "motorcyclist" ------mapped
 43 |     256: 4,  # "moving-on-rails" mapped to "other-vehic------mapped
 44 |     257: 4,  # "moving-bus" mapped to "other-vehicle" -------mapped
 45 |     258: 3,  # "moving-truck" to "truck" --------------------mapped
 46 |     259: 4  # "moving-other"-vehicle to "other-vehicle"-----mapped
 47 | }
 48 | 
 49 | metainfo = dict(
 50 |     classes=class_names, seg_label_mapping=labels_map, max_label=259)
 51 | 
 52 | input_modality = dict(use_lidar=True, use_camera=False)
 53 | 
 54 | # Example to use different file client
 55 | # Method 1: simply set the data root and let the file I/O module
 56 | # automatically infer from prefix (not support LMDB and Memcache yet)
 57 | 
 58 | # data_root = 's3://openmmlab/datasets/detection3d/semantickitti/'
 59 | 
 60 | # Method 2: Use backend_args, file_client_args in versions before 1.1.0rc4
 61 | # backend_args = dict(
 62 | #     backend='petrel',
 63 | #     path_mapping=dict({
 64 | #         './data/': 's3://openmmlab/datasets/detection3d/',
 65 | #          'data/': 's3://openmmlab/datasets/detection3d/'
 66 | #      }))
 67 | backend_args = None
 68 | 
 69 | 
 70 | train_pipeline = [
 71 |     dict(type='LoadPointsFromFile', coord_type='LIDAR', load_dim=4, use_dim=4),
 72 |     dict(
 73 |         type='LoadAnnotations3D',
 74 |         with_bbox_3d=False,
 75 |         with_label_3d=False,
 76 |         with_seg_3d=True,
 77 |         seg_3d_dtype='np.int32',
 78 |         seg_offset=2**16,
 79 |         dataset_type='semantickitti'),
 80 |     dict(type='PointSegClassMapping'),
 81 |     dict(
 82 |         type='RandomChoice',
 83 |         transforms=[
 84 |             [
 85 |                 dict(
 86 |                     type='LaserMix',
 87 |                     num_areas=[3, 4, 5, 6],
 88 |                     pitch_angles=[-25, 3],
 89 |                     pre_transform=[
 90 |                         dict(
 91 |                             type='LoadPointsFromFile',
 92 |                             coord_type='LIDAR',
 93 |                             load_dim=4,
 94 |                             use_dim=4),
 95 |                         dict(
 96 |                             type='LoadAnnotations3D',
 97 |                             with_bbox_3d=False,
 98 |                             with_label_3d=False,
 99 |                             with_seg_3d=True,
100 |                             seg_3d_dtype='np.int32',
101 |                             seg_offset=2**16,
102 |                             dataset_type='semantickitti'),
103 |                         dict(type='PointSegClassMapping')
104 |                     ],
105 |                     prob=1)
106 |             ],
107 |             [
108 |                 dict(
109 |                     type='PolarMix',
110 |                     instance_classes=[0, 1, 2, 3, 4, 5, 6, 7],
111 |                     swap_ratio=0.5,
112 |                     rotate_paste_ratio=1.0,
113 |                     pre_transform=[
114 |                         dict(
115 |                             type='LoadPointsFromFile',
116 |                             coord_type='LIDAR',
117 |                             load_dim=4,
118 |                             use_dim=4),
119 |                         dict(
120 |                             type='LoadAnnotations3D',
121 |                             with_bbox_3d=False,
122 |                             with_label_3d=False,
123 |                             with_seg_3d=True,
124 |                             seg_3d_dtype='np.int32',
125 |                             seg_offset=2**16,
126 |                             dataset_type='semantickitti'),
127 |                         dict(type='PointSegClassMapping')
128 |                     ],
129 |                     prob=1)
130 |             ],
131 |         ],
132 |         prob=[0.5, 0.5]),
133 |     dict(
134 |         type='RandomFlip3D',
135 |         sync_2d=False,
136 |         flip_ratio_bev_horizontal=0.5,
137 |         flip_ratio_bev_vertical=0.5),
138 |     dict(
139 |         type='GlobalRotScaleTrans',
140 |         rot_range=[-0.78539816, 0.78539816],
141 |         scale_ratio_range=[0.95, 1.05],
142 |         translation_std=[0.1, 0.1, 0.1],
143 |     ),
144 |     dict(type='Pack3DDetInputs', keys=['points', 'pts_semantic_mask'])
145 | ]
146 | 
147 | 
148 | test_pipeline = [
149 |     dict(
150 |         type='LoadPointsFromFile',
151 |         coord_type='LIDAR',
152 |         load_dim=4,
153 |         use_dim=4,
154 |         backend_args=backend_args),
155 |     dict(
156 |         type='LoadAnnotations3D',
157 |         with_bbox_3d=False,
158 |         with_label_3d=False,
159 |         with_seg_3d=True,
160 |         seg_3d_dtype='np.int32',
161 |         seg_offset=2**16,
162 |         dataset_type='semantickitti',
163 |         backend_args=backend_args),
164 |     dict(type='PointSegClassMapping', ),
165 |     dict(type='Pack3DDetInputs', keys=['points', 'pts_semantic_mask'])
166 | ]
167 | # construct a pipeline for data and gt loading in show function
168 | # please keep its loading function consistent with test_pipeline (e.g. client)
169 | eval_pipeline = [
170 |     dict(
171 |         type='LoadPointsFromFile',
172 |         coord_type='LIDAR',
173 |         load_dim=4,
174 |         use_dim=4,
175 |         backend_args=backend_args),
176 |     dict(
177 |         type='LoadAnnotations3D',
178 |         with_bbox_3d=False,
179 |         with_label_3d=False,
180 |         with_seg_3d=True,
181 |         seg_3d_dtype='np.int32',
182 |         seg_offset=2**16,
183 |         dataset_type='semantickitti',
184 |         backend_args=backend_args),
185 |     dict(type='PointSegClassMapping', ),
186 |     dict(type='Pack3DDetInputs', keys=['points', 'pts_semantic_mask'])
187 | ]
188 | 
189 | train_dataloader = dict(
190 |     batch_size=2,
191 |     num_workers=4,
192 |     sampler=dict(type='DefaultSampler', shuffle=True),
193 |     dataset=dict(
194 |         type='RepeatDataset',
195 |         times=1,
196 |         dataset=dict(
197 |             type=dataset_type,
198 |             data_root=data_root,
199 |             ann_file='semantickitti_infos_train.pkl',
200 |             pipeline=train_pipeline,
201 |             metainfo=metainfo,
202 |             modality=input_modality,
203 |             ignore_index=19,
204 |             backend_args=backend_args)),
205 | )
206 | 
207 | test_dataloader = dict(
208 |     batch_size=1,
209 |     num_workers=1,
210 |     sampler=dict(type='DefaultSampler', shuffle=False),
211 |     dataset=dict(
212 |         type='RepeatDataset',
213 |         times=1,
214 |         dataset=dict(
215 |             type=dataset_type,
216 |             data_root=data_root,
217 |             ann_file='semantickitti_infos_val.pkl',
218 |             pipeline=test_pipeline,
219 |             metainfo=metainfo,
220 |             modality=input_modality,
221 |             ignore_index=19,
222 |             test_mode=True,
223 |             backend_args=backend_args)),
224 | )
225 | 
226 | val_dataloader = test_dataloader
227 | 
228 | val_evaluator = dict(type='SegMetric')
229 | test_evaluator = val_evaluator
230 | 
231 | vis_backends = [dict(type='LocalVisBackend')]
232 | visualizer = dict(
233 |     type='Det3DLocalVisualizer', vis_backends=vis_backends, name='visualizer')


--------------------------------------------------------------------------------
/configs/_base_/datasets/semantickitti_panoptic_lpmix.py:
--------------------------------------------------------------------------------
  1 | # For SemanticKitti we usually do 19-class segmentation.
  2 | # For labels_map we follow the uniform format of MMDetection & MMSegmentation
  3 | # i.e. we consider the unlabeled class as the last one, which is different
  4 | # from the original implementation of some methods e.g. Cylinder3D.
  5 | dataset_type = '_SemanticKittiDataset'
  6 | data_root = 'data/semantickitti/'
  7 | class_names = [
  8 |     'car', 'bicycle', 'motorcycle', 'truck', 'bus', 'person', 'bicyclist',
  9 |     'motorcyclist', 'road', 'parking', 'sidewalk', 'other-ground', 'building',
 10 |     'fence', 'vegetation', 'trunck', 'terrian', 'pole', 'traffic-sign'
 11 | ]
 12 | labels_map = {
 13 |     0: 19,  # "unlabeled"
 14 |     1: 19,  # "outlier" mapped to "unlabeled" --------------mapped
 15 |     10: 0,  # "car"
 16 |     11: 1,  # "bicycle"
 17 |     13: 4,  # "bus" mapped to "other-vehicle" --------------mapped
 18 |     15: 2,  # "motorcycle"
 19 |     16: 4,  # "on-rails" mapped to "other-vehicle" ---------mapped
 20 |     18: 3,  # "truck"
 21 |     20: 4,  # "other-vehicle"
 22 |     30: 5,  # "person"
 23 |     31: 6,  # "bicyclist"
 24 |     32: 7,  # "motorcyclist"
 25 |     40: 8,  # "road"
 26 |     44: 9,  # "parking"
 27 |     48: 10,  # "sidewalk"
 28 |     49: 11,  # "other-ground"
 29 |     50: 12,  # "building"
 30 |     51: 13,  # "fence"
 31 |     52: 19,  # "other-structure" mapped to "unlabeled" ------mapped
 32 |     60: 8,  # "lane-marking" to "road" ---------------------mapped
 33 |     70: 14,  # "vegetation"
 34 |     71: 15,  # "trunk"
 35 |     72: 16,  # "terrain"
 36 |     80: 17,  # "pole"
 37 |     81: 18,  # "traffic-sign"
 38 |     99: 19,  # "other-object" to "unlabeled" ----------------mapped
 39 |     252: 0,  # "moving-car" to "car" ------------------------mapped
 40 |     253: 6,  # "moving-bicyclist" to "bicyclist" ------------mapped
 41 |     254: 5,  # "moving-person" to "person" ------------------mapped
 42 |     255: 7,  # "moving-motorcyclist" to "motorcyclist" ------mapped
 43 |     256: 4,  # "moving-on-rails" mapped to "other-vehic------mapped
 44 |     257: 4,  # "moving-bus" mapped to "other-vehicle" -------mapped
 45 |     258: 3,  # "moving-truck" to "truck" --------------------mapped
 46 |     259: 4  # "moving-other"-vehicle to "other-vehicle"-----mapped
 47 | }
 48 | 
 49 | learning_map_inv = { # inverse of previous map
 50 |   0: 10,      # "unlabeled", and others ignored
 51 |   1: 11,     # "car"
 52 |   2: 15,     # "bicycle"
 53 |   3: 18,     # "motorcycle"
 54 |   4: 20,     # "truck"
 55 |   5: 30,     # "other-vehicle"
 56 |   6: 31,     # "person"
 57 |   7: 32,     # "bicyclist"
 58 |   8: 40,     # "motorcyclist"
 59 |   9: 44,     # "road"
 60 |   10: 48,    # "parking"
 61 |   11: 49,    # "sidewalk"
 62 |   12: 50,    # "other-ground"
 63 |   13: 51,    # "building"
 64 |   14: 70,    # "fence"
 65 |   15: 71,    # "vegetation"
 66 |   16: 72,    # "trunk"
 67 |   17: 80,    # "terrain"
 68 |   18: 81,    # "pole"
 69 |   19: 0    # "traffic-sign"
 70 | }
 71 | 
 72 | metainfo = dict(
 73 |     classes=class_names, seg_label_mapping=labels_map, max_label=259)
 74 | 
 75 | input_modality = dict(use_lidar=True, use_camera=False)
 76 | 
 77 | # Example to use different file client
 78 | # Method 1: simply set the data root and let the file I/O module
 79 | # automatically infer from prefix (not support LMDB and Memcache yet)
 80 | 
 81 | # data_root = 's3://openmmlab/datasets/detection3d/semantickitti/'
 82 | 
 83 | # Method 2: Use backend_args, file_client_args in versions before 1.1.0rc4
 84 | # backend_args = dict(
 85 | #     backend='petrel',
 86 | #     path_mapping=dict({
 87 | #         './data/': 's3://openmmlab/datasets/detection3d/',
 88 | #          'data/': 's3://openmmlab/datasets/detection3d/'
 89 | #      }))
 90 | backend_args = None
 91 | 
 92 | pre_transform = [
 93 |     dict(
 94 |         type='LoadPointsFromFile',
 95 |         coord_type='LIDAR',
 96 |         load_dim=4,
 97 |         use_dim=4,
 98 |         backend_args=backend_args),
 99 |     dict(
100 |         type='_LoadAnnotations3D',
101 |         with_bbox_3d=False,
102 |         with_label_3d=False,
103 |         with_panoptic_3d=True,
104 |         seg_3d_dtype='np.int32',
105 |         seg_offset=2**16,
106 |         dataset_type='semantickitti',
107 |         backend_args=backend_args),
108 |     dict(type='PointSegClassMapping', )]
109 | 
110 | train_pipeline = [
111 |     dict(
112 |         type='LoadPointsFromFile',
113 |         coord_type='LIDAR',
114 |         load_dim=4,
115 |         use_dim=4,
116 |         backend_args=backend_args),
117 |     dict(
118 |         type='_LoadAnnotations3D',
119 |         with_bbox_3d=False,
120 |         with_label_3d=False,
121 |         with_panoptic_3d=True,
122 |         seg_3d_dtype='np.int32',
123 |         seg_offset=2**16,
124 |         dataset_type='semantickitti',
125 |         backend_args=backend_args),
126 |     dict(type='PointSegClassMapping', ),
127 |     dict(
128 |         type='RandomChoice',
129 |         transforms=[
130 |             [
131 |                 dict(
132 |                     type='_LaserMix',
133 |                     num_areas=[3, 4, 5, 6],
134 |                     pitch_angles=[-25, 3],
135 |                     pre_transform=[
136 |                         dict(
137 |                             type='LoadPointsFromFile',
138 |                             coord_type='LIDAR',
139 |                             load_dim=4,
140 |                             use_dim=4),
141 |                         dict(
142 |                             type='_LoadAnnotations3D',
143 |                             with_bbox_3d=False,
144 |                             with_label_3d=False,
145 |                             with_panoptic_3d=True,
146 |                             seg_3d_dtype='np.int32',
147 |                             seg_offset=2**16,
148 |                             dataset_type='semantickitti'),
149 |                         dict(type='PointSegClassMapping')
150 |                     ],
151 |                     prob=0.5)
152 |             ],
153 |             [
154 |                 dict(
155 |                     type='_PolarMix',
156 |                     instance_classes=[0, 1, 2, 3, 4, 5, 6, 7],
157 |                     swap_ratio=0.5,
158 |                     rotate_paste_ratio=1.0,
159 |                     pre_transform=[
160 |                         dict(
161 |                             type='LoadPointsFromFile',
162 |                             coord_type='LIDAR',
163 |                             load_dim=4,
164 |                             use_dim=4),
165 |                         dict(
166 |                             type='_LoadAnnotations3D',
167 |                             with_bbox_3d=False,
168 |                             with_label_3d=False,
169 |                             with_panoptic_3d=True,  
170 |                             seg_3d_dtype='np.int32',
171 |                             seg_offset=2**16,
172 |                             dataset_type='semantickitti'),
173 |                         dict(type='PointSegClassMapping')
174 |                     ],
175 |                     prob=0.5)
176 |             ],
177 |         ],
178 |         prob=[0.2, 0.8]),
179 |     dict(
180 |         type='RandomFlip3D',
181 |         sync_2d=False,
182 |         flip_ratio_bev_horizontal=0.5,
183 |         flip_ratio_bev_vertical=0.5),
184 |     dict(
185 |         type='GlobalRotScaleTrans',
186 |         rot_range=[-0.78539816, 0.78539816],
187 |         scale_ratio_range=[0.95, 1.05],
188 |         translation_std=[0.1, 0.1, 0.1],
189 |     ),
190 |     dict(type='Pack3DDetInputs', keys=['points', 'pts_semantic_mask', 'pts_instance_mask'])
191 | ]
192 | 
193 | test_pipeline = [
194 |     dict(
195 |         type='LoadPointsFromFile',
196 |         coord_type='LIDAR',
197 |         load_dim=4,
198 |         use_dim=4,
199 |         backend_args=backend_args),
200 |     dict(
201 |         type='_LoadAnnotations3D',
202 |         with_bbox_3d=False,
203 |         with_label_3d=False,
204 |         with_panoptic_3d=True,
205 |         seg_3d_dtype='np.int32',
206 |         seg_offset=2**16,
207 |         dataset_type='semantickitti',
208 |         backend_args=backend_args),
209 |     dict(type='PointSegClassMapping', ),
210 |     dict(type='Pack3DDetInputs', keys=['points', 'pts_semantic_mask', 'pts_instance_mask'])
211 | ]
212 | 
213 | 
214 | train_dataloader = dict(
215 |     batch_size=4,
216 |     num_workers=4,
217 |     sampler=dict(type='DefaultSampler', shuffle=True),
218 |     dataset=dict(
219 |         type='RepeatDataset',
220 |         times=1,
221 |         dataset=dict(
222 |             type=dataset_type,
223 |             data_root=data_root,
224 |             data_prefix = dict(
225 |                      pts='',
226 |                      img='',
227 |                      pts_instance_mask='',
228 |                      pts_semantic_mask='',
229 |                      pts_panoptic_mask='',),
230 |             ann_file='semantickitti_infos_train.pkl',
231 |             pipeline=train_pipeline,
232 |             metainfo=metainfo,
233 |             modality=input_modality,
234 |             ignore_index=19,
235 |             backend_args=backend_args)),
236 | )
237 | 
238 | test_dataloader = dict(
239 |     batch_size=1,
240 |     num_workers=1,
241 |     sampler=dict(type='DefaultSampler', shuffle=False),
242 |     dataset=dict(
243 |         type='RepeatDataset',
244 |         times=1,
245 |         dataset=dict(
246 |             type=dataset_type,
247 |             data_root=data_root,
248 |             data_prefix = dict(
249 |                      pts='',
250 |                      img='',
251 |                      pts_instance_mask='',
252 |                      pts_semantic_mask='',
253 |                      pts_panoptic_mask='',),
254 |             ann_file='semantickitti_infos_val.pkl',
255 |             pipeline=test_pipeline,
256 |             metainfo=metainfo,
257 |             modality=input_modality,
258 |             ignore_index=19,
259 |             test_mode=True,
260 |             backend_args=backend_args)),
261 | )
262 | 
263 | val_dataloader = test_dataloader
264 | 
265 | val_evaluator = dict(type='_PanopticSegMetric',
266 |                     thing_class_inds=[0,1,2,3,4,5,6,7],
267 |                     stuff_class_inds=[8,9,10,11,12,13,14,15,16,17,18],
268 |                     min_num_points=50,
269 |                     id_offset = 2**16,
270 |                     dataset_type='semantickitti',
271 |                     learning_map_inv=learning_map_inv)
272 | test_evaluator = val_evaluator
273 | 
274 | vis_backends = [dict(type='LocalVisBackend')]
275 | visualizer = dict(
276 |     type='Det3DLocalVisualizer', vis_backends=vis_backends, name='visualizer')
277 | 


--------------------------------------------------------------------------------
/configs/_base_/default_runtime.py:
--------------------------------------------------------------------------------
 1 | default_scope = 'mmdet3d'
 2 | 
 3 | default_hooks = dict(
 4 |     timer=dict(type='IterTimerHook'),
 5 |     logger=dict(type='LoggerHook', interval=50),
 6 |     param_scheduler=dict(type='ParamSchedulerHook'),
 7 |     checkpoint=dict(type='CheckpointHook', interval=-1),
 8 |     sampler_seed=dict(type='DistSamplerSeedHook'),
 9 |     visualization=dict(type='Det3DVisualizationHook'))
10 | 
11 | env_cfg = dict(
12 |     cudnn_benchmark=False,
13 |     mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
14 |     dist_cfg=dict(backend='nccl'),
15 | )
16 | 
17 | log_processor = dict(type='LogProcessor', window_size=50, by_epoch=True)
18 | 
19 | log_level = 'INFO'
20 | load_from = None
21 | resume = False
22 | 
23 | # TODO: support auto scaling lr
24 | 


--------------------------------------------------------------------------------
/configs/_base_/models/cylinder3d.py:
--------------------------------------------------------------------------------
 1 | grid_shape = [480, 360, 32]
 2 | model = dict(
 3 |     type='Cylinder3D',
 4 |     data_preprocessor=dict(
 5 |         type='Det3DDataPreprocessor',
 6 |         voxel=True,
 7 |         voxel_type='cylindrical',
 8 |         voxel_layer=dict(
 9 |             grid_shape=grid_shape,
10 |             point_cloud_range=[0, -3.14159265359, -4, 50, 3.14159265359, 2],
11 |             max_num_points=-1,
12 |             max_voxels=-1,
13 |         ),
14 |     ),
15 |     voxel_encoder=dict(
16 |         type='SegVFE',
17 |         feat_channels=[64, 128, 256, 256],
18 |         in_channels=6,
19 |         with_voxel_center=True,
20 |         feat_compression=16,
21 |         return_point_feats=False),
22 |     backbone=dict(
23 |         type='Asymm3DSpconv',
24 |         grid_size=grid_shape,
25 |         input_channels=16,
26 |         base_channels=32,
27 |         norm_cfg=dict(type='BN1d', eps=1e-5, momentum=0.1)),
28 |     decode_head=dict(
29 |         type='Cylinder3DHead',
30 |         channels=128,
31 |         num_classes=20,
32 |         loss_ce=dict(
33 |             type='mmdet.CrossEntropyLoss',
34 |             use_sigmoid=False,
35 |             class_weight=None,
36 |             loss_weight=1.0),
37 |         loss_lovasz=dict(type='LovaszLoss', loss_weight=1.0, reduction='none'),
38 |     ),
39 |     train_cfg=None,
40 |     test_cfg=dict(mode='whole'),
41 | )
42 | 


--------------------------------------------------------------------------------
/configs/_base_/models/p3former.py:
--------------------------------------------------------------------------------
 1 | grid_shape = [480, 360, 32]
 2 | model = dict(
 3 |     type='_P3Former',
 4 |     data_preprocessor=dict(
 5 |         type='_Det3DDataPreprocessor',
 6 |         voxel=True,
 7 |         voxel_type='cylindrical',
 8 |         voxel_layer=dict(
 9 |             grid_shape=grid_shape,
10 |             point_cloud_range=[0, -3.14159265359, -4, 50, 3.14159265359, 2],
11 |             max_num_points=-1,
12 |             max_voxels=-1,
13 |         ),
14 |     ),
15 |     voxel_encoder=dict(
16 |         type='SegVFE',
17 |         feat_channels=[64, 128, 256, 256],
18 |         in_channels=6,
19 |         with_voxel_center=True,
20 |         feat_compression=16,
21 |         return_point_feats=False),
22 |     backbone=dict(
23 |         type='_Asymm3DSpconv',
24 |         grid_size=grid_shape,
25 |         input_channels=16,
26 |         base_channels=32,
27 |         norm_cfg=dict(type='BN1d', eps=1e-5, momentum=0.1)),
28 |     decode_head=dict(
29 |         type='_P3FormerHead',
30 |         num_classes=20,
31 |         num_queries=128,
32 |         embed_dims=128,
33 |         point_cloud_range=[0, -3.14159265359, -4, 50, 3.14159265359, 2],
34 |         assigner_zero_layer_cfg=dict(
35 |                 type='mmdet.HungarianAssigner',
36 |                 match_costs=[
37 |                         dict(type='mmdet.FocalLossCost', weight=1.0, binary_input=True, gamma=2.0, alpha=0.25),
38 |                         dict(type='mmdet.DiceCost', weight=2.0, pred_act=True),
39 |                     ]),
40 |         assigner_cfg=dict(
41 |                 type='mmdet.HungarianAssigner',
42 |                 match_costs=[
43 |                         dict(type='mmdet.FocalLossCost', gamma=4.0,alpha=0.25,weight=1.0),
44 |                         dict(type='mmdet.FocalLossCost', weight=1.0, binary_input=True, gamma=2.0, alpha=0.25),
45 |                         dict(type='mmdet.DiceCost', weight=2.0, pred_act=True),
46 |                     ]),
47 |         sampler_cfg=dict(type='_MaskPseudoSampler'),
48 |         loss_mask=dict(
49 |             type='mmdet.FocalLoss',
50 |             use_sigmoid=True,
51 |             gamma=2.0,
52 |             alpha=0.25,
53 |             reduction='mean',
54 |             loss_weight=1.0),
55 |         loss_dice=dict(type='mmdet.DiceLoss', loss_weight=2.0),
56 |         loss_cls=dict(
57 |             type='mmdet.FocalLoss',
58 |             use_sigmoid=True,
59 |             gamma=4.0,
60 |             alpha=0.25,
61 |             loss_weight=1.0),
62 |     ),
63 |     train_cfg=None,
64 |     test_cfg=dict(mode='whole'),
65 | )
66 | 


--------------------------------------------------------------------------------
/configs/cylinder3d/cylinder3d_8xb2_3x_semantickitti.py:
--------------------------------------------------------------------------------
 1 | _base_ = [
 2 |     '../_base_/datasets/semantickitti_lpmix.py', '../_base_/models/cylinder3d.py',
 3 |     '../_base_/default_runtime.py'
 4 | ]
 5 | 
 6 | # optimizer
 7 | # This schedule is mainly used by models on nuScenes dataset
 8 | lr = 0.008
 9 | optim_wrapper = dict(
10 |     type='OptimWrapper',
11 |     optimizer=dict(type='AdamW', lr=lr, weight_decay=0.01),
12 |     # max_norm=10 is better for SECOND
13 |     clip_grad=dict(max_norm=10, norm_type=2))
14 | 
15 | # training schedule for 2x
16 | train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=36, val_interval=1)
17 | val_cfg = dict(type='ValLoop')
18 | test_cfg = dict(type='TestLoop')
19 | 
20 | # learning rate
21 | param_scheduler = [
22 |     dict(
23 |         type='MultiStepLR',
24 |         begin=0,
25 |         end=36,
26 |         by_epoch=True,
27 |         milestones=[24, 32],
28 |         gamma=0.1)
29 | ]


--------------------------------------------------------------------------------
/configs/p3former/p3former_8xb2_3x_semantickitti.py:
--------------------------------------------------------------------------------
 1 | _base_ = [
 2 |     '../_base_/datasets/semantickitti_panoptic_lpmix.py', '../_base_/models/p3former.py',
 3 |     '../_base_/default_runtime.py'
 4 | ]
 5 | 
 6 | # optimizer
 7 | # This schedule is mainly used by models on nuScenes dataset
 8 | 
 9 | train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=40, val_interval=1)
10 | val_cfg = dict(type='ValLoop')
11 | test_cfg = dict(type='TestLoop')
12 | 
13 | model = dict(
14 |     voxel_encoder=dict(
15 |         feat_channels=[64, 128, 256, 256],
16 |         in_channels=6,
17 |         with_voxel_center=True,
18 |         feat_compression=16,
19 |         return_point_feats=False),
20 |     backbone=dict(
21 |         input_channels=16,
22 |         base_channels=32,
23 |         more_conv=True,
24 |         out_channels=256),
25 |     decode_head=dict(
26 |         num_decoder_layers=6,
27 |         num_queries=128,
28 |         embed_dims=256,
29 |         cls_channels=(256, 256, 20),
30 |         mask_channels=(256, 256, 256, 256, 256),
31 |         thing_class=[0,1,2,3,4,5,6,7],
32 |         stuff_class=[8,9,10,11,12,13,14,15,16,17,18],
33 |         ignore_index=19
34 |     ))
35 | 
36 | 
37 | lr = 0.0008
38 | optim_wrapper = dict(
39 |     type='OptimWrapper',
40 |     optimizer=dict(type='AdamW', lr=lr, weight_decay=0.01))
41 | 
42 | 
43 | train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=36, val_interval=1)
44 | val_cfg = dict(type='ValLoop')
45 | test_cfg = dict(type='TestLoop')
46 | 
47 | param_scheduler = [
48 |     dict(
49 |         type='MultiStepLR',
50 |         begin=0,
51 |         end=36,
52 |         by_epoch=True,
53 |         milestones=[24, 32],
54 |         gamma=0.2)
55 | ]
56 | 
57 | train_dataloader = dict(batch_size=2, )
58 | 
59 | default_hooks = dict(checkpoint=dict(type='CheckpointHook', interval=5))
60 | 
61 | custom_imports = dict(
62 |     imports=[
63 |         'p3former.backbones.cylinder3d',
64 |         'p3former.data_preprocessors.data_preprocessor',
65 |         'p3former.decode_heads.p3former_head',
66 |         'p3former.segmentors.p3former',
67 |         'p3former.task_modules.samplers.mask_pseduo_sampler',
68 |         'evaluation.metrics.panoptic_seg_metric',
69 |         'datasets.semantickitti_dataset',
70 |         'datasets.transforms.loading',
71 |         'datasets.transforms.transforms_3d',
72 |     ],
73 |     allow_failed_imports=False)
74 | 


--------------------------------------------------------------------------------
/configs/p3former/p3former_8xb2_3x_semantickitti_submit.py:
--------------------------------------------------------------------------------
 1 | _base_ = [
 2 |     '../_base_/datasets/semantickitti_panoptic_lpmix.py', '../_base_/models/p3former.py',
 3 |     '../_base_/default_runtime.py'
 4 | ]
 5 | 
 6 | # optimizer
 7 | # This schedule is mainly used by models on nuScenes dataset
 8 | 
 9 | train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=40, val_interval=1)
10 | val_cfg = dict(type='ValLoop')
11 | test_cfg = dict(type='TestLoop')
12 | 
13 | model = dict(
14 |     voxel_encoder=dict(
15 |         feat_channels=[64, 128, 256, 256],
16 |         in_channels=6,
17 |         with_voxel_center=True,
18 |         feat_compression=16,
19 |         return_point_feats=False),
20 |     backbone=dict(
21 |         input_channels=16,
22 |         base_channels=32,
23 |         more_conv=True,
24 |         out_channels=256),
25 |     decode_head=dict(
26 |         num_decoder_layers=6,
27 |         num_queries=128,
28 |         embed_dims=256,
29 |         cls_channels=(256, 256, 20),
30 |         mask_channels=(256, 256, 256, 256, 256),
31 |         thing_class=[0,1,2,3,4,5,6,7],
32 |         stuff_class=[8,9,10,11,12,13,14,15,16,17,18],
33 |         ignore_index=19
34 |     ))
35 | 
36 | 
37 | lr = 0.0008
38 | optim_wrapper = dict(
39 |     type='OptimWrapper',
40 |     optimizer=dict(type='AdamW', lr=lr, weight_decay=0.01))
41 | 
42 | 
43 | train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=36, val_interval=1)
44 | val_cfg = dict(type='ValLoop')
45 | test_cfg = dict(type='TestLoop')
46 | 
47 | param_scheduler = [
48 |     dict(
49 |         type='MultiStepLR',
50 |         begin=0,
51 |         end=36,
52 |         by_epoch=True,
53 |         milestones=[24, 32],
54 |         gamma=0.2)
55 | ]
56 | 
57 | train_dataloader = dict(batch_size=2, )
58 | 
59 | test_pipeline = [
60 |     dict(
61 |         type='LoadPointsFromFile',
62 |         coord_type='LIDAR',
63 |         load_dim=4,
64 |         use_dim=4,
65 |         backend_args=None),
66 |     dict(type='_Pack3DDetInputs', keys=['points', 'lidar_path'])
67 | ]
68 | 
69 | test_dataloader = dict(
70 |     batch_size=1,
71 |     num_workers=1,
72 |     sampler=dict(type='DefaultSampler', shuffle=False),
73 |     dataset=dict(
74 |         type='RepeatDataset',
75 |         times=1,
76 |         dataset=dict(
77 |             pipeline=test_pipeline,
78 |             ann_file='semantickitti_infos_mini.pkl'))
79 | )
80 | 
81 | test_evaluator = dict(submission_prefix='semantickitti_submission')
82 | 
83 | default_hooks = dict(checkpoint=dict(type='CheckpointHook', interval=5))
84 | 
85 | custom_imports = dict(
86 |     imports=[
87 |         'p3former.backbones.cylinder3d',
88 |         'p3former.data_preprocessors.data_preprocessor',
89 |         'p3former.decode_heads.p3former_head',
90 |         'p3former.segmentors.p3former',
91 |         'p3former.task_modules.samplers.mask_pseduo_sampler',
92 |         'evaluation.metrics.panoptic_seg_metric',
93 |         'datasets.semantickitti_dataset',
94 |         'datasets.transforms.loading',
95 |         'datasets.transforms.transforms_3d',
96 |         'datasets.transforms.formating',
97 |     ],
98 |     allow_failed_imports=False)
99 | 


--------------------------------------------------------------------------------
/configs/p3former/p3former_8xb2_3x_semantickitti_trainval.py:
--------------------------------------------------------------------------------
 1 | _base_ = [
 2 |     '../_base_/datasets/semantickitti_panoptic_lpmix.py', '../_base_/models/p3former.py',
 3 |     '../_base_/default_runtime.py'
 4 | ]
 5 | 
 6 | # optimizer
 7 | # This schedule is mainly used by models on nuScenes dataset
 8 | 
 9 | train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=40, val_interval=1)
10 | val_cfg = dict(type='ValLoop')
11 | test_cfg = dict(type='TestLoop')
12 | 
13 | model = dict(
14 |     voxel_encoder=dict(
15 |         feat_channels=[64, 128, 256, 256],
16 |         in_channels=6,
17 |         with_voxel_center=True,
18 |         feat_compression=16,
19 |         return_point_feats=False),
20 |     backbone=dict(
21 |         input_channels=16,
22 |         base_channels=32,
23 |         more_conv=True,
24 |         out_channels=256),
25 |     decode_head=dict(
26 |         num_decoder_layers=6,
27 |         num_queries=128,
28 |         embed_dims=256,
29 |         cls_channels=(256, 256, 20),
30 |         mask_channels=(256, 256, 256, 256, 256),
31 |         thing_class=[0,1,2,3,4,5,6,7],
32 |         stuff_class=[8,9,10,11,12,13,14,15,16,17,18],
33 |         ignore_index=19
34 |     ))
35 | 
36 | 
37 | lr = 0.0008
38 | optim_wrapper = dict(
39 |     type='OptimWrapper',
40 |     optimizer=dict(type='AdamW', lr=lr, weight_decay=0.01))
41 | 
42 | 
43 | train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=36, val_interval=1)
44 | val_cfg = dict(type='ValLoop')
45 | test_cfg = dict(type='TestLoop')
46 | 
47 | param_scheduler = [
48 |     dict(
49 |         type='MultiStepLR',
50 |         begin=0,
51 |         end=36,
52 |         by_epoch=True,
53 |         milestones=[24, 32],
54 |         gamma=0.2)
55 | ]
56 | 
57 | train_dataloader = dict(batch_size=2, dataset=dict(dataset=dict(ann_file='semantickitti_infos_trainval.pkl')))
58 | 
59 | default_hooks = dict(checkpoint=dict(type='CheckpointHook', interval=5))
60 | 
61 | custom_imports = dict(
62 |     imports=[
63 |         'p3former.backbones.cylinder3d',
64 |         'p3former.data_preprocessors.data_preprocessor',
65 |         'p3former.decode_heads.p3former_head',
66 |         'p3former.segmentors.p3former',
67 |         'p3former.task_modules.samplers.mask_pseduo_sampler',
68 |         'evaluation.metrics.panoptic_seg_metric',
69 |         'datasets.semantickitti_dataset',
70 |         'datasets.transforms.loading',
71 |         'datasets.transforms.transforms_3d',
72 |     ],
73 |     allow_failed_imports=False)
74 | 


--------------------------------------------------------------------------------
/datasets/semantickitti_dataset.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) OpenMMLab. All rights reserved.
  2 | # Modified from mmdetection3d.
  3 | from typing import Callable, List, Optional, Union
  4 | 
  5 | import numpy as np
  6 | 
  7 | from mmdet3d.registry import DATASETS
  8 | from mmdet3d.datasets.seg3d_dataset import Seg3DDataset
  9 | 
 10 | from os import path as osp
 11 | 
 12 | @DATASETS.register_module(force=True)
 13 | class _SemanticKittiDataset(Seg3DDataset):
 14 |     r"""SemanticKitti Dataset.
 15 | 
 16 |     This class serves as the API for experiments on the SemanticKITTI Dataset
 17 |     Please refer to <http://www.semantic-kitti.org/dataset.html>`_
 18 |     for data downloading
 19 | 
 20 |     Args:
 21 |         data_root (str, optional): Path of dataset root. Defaults to None.
 22 |         ann_file (str): Path of annotation file. Defaults to ''.
 23 |         metainfo (dict, optional): Meta information for dataset, such as class
 24 |             information. Defaults to None.
 25 |         data_prefix (dict): Prefix for training data. Defaults to
 26 |             dict(pts='',
 27 |                  img='',
 28 |                  pts_instance_mask='',
 29 |                  pts_semantic_mask='').
 30 |         pipeline (List[dict]): Pipeline used for data processing.
 31 |             Defaults to [].
 32 |         modality (dict): Modality to specify the sensor data used as input,
 33 |             it usually has following keys:
 34 | 
 35 |                 - use_camera: bool
 36 |                 - use_lidar: bool
 37 |             Defaults to dict(use_lidar=True, use_camera=False).
 38 |         ignore_index (int, optional): The label index to be ignored, e.g.
 39 |             unannotated points. If None is given, set to len(self.classes) to
 40 |             be consistent with PointSegClassMapping function in pipeline.
 41 |             Defaults to None.
 42 |         scene_idxs (np.ndarray or str, optional): Precomputed index to load
 43 |             data. For scenes with many points, we may sample it several times.
 44 |             Defaults to None.
 45 |         test_mode (bool): Whether the dataset is in test mode.
 46 |             Defaults to False.
 47 |     """
 48 |     METAINFO = {
 49 |         'classes': ('car', 'bicycle', 'motorcycle', 'truck', 'bus', 'person',
 50 |                     'bicyclist', 'motorcyclist', 'road', 'parking', 'sidewalk',
 51 |                     'other-ground', 'building', 'fence', 'vegetation',
 52 |                     'trunck', 'terrian', 'pole', 'traffic-sign'),
 53 |         'palette': [[100, 150, 245], [100, 230, 245], [30, 60, 150],
 54 |                     [80, 30, 180], [100, 80, 250], [155, 30, 30],
 55 |                     [255, 40, 200], [150, 30, 90], [255, 0, 255],
 56 |                     [255, 150, 255], [75, 0, 75], [175, 0, 75], [255, 200, 0],
 57 |                     [255, 120, 50], [0, 175, 0], [135, 60, 0], [150, 240, 80],
 58 |                     [255, 240, 150], [255, 0, 0]],
 59 |         'seg_valid_class_ids':
 60 |         tuple(range(19)),
 61 |         'seg_all_class_ids':
 62 |         tuple(range(19)),
 63 |     }
 64 | 
 65 |     def __init__(self,
 66 |                  data_root: Optional[str] = None,
 67 |                  ann_file: str = '',
 68 |                  metainfo: Optional[dict] = None,
 69 |                  data_prefix: dict = dict(
 70 |                      pts='',
 71 |                      img='',
 72 |                      pts_instance_mask='',
 73 |                      pts_semantic_mask=''),
 74 |                  pipeline: List[Union[dict, Callable]] = [],
 75 |                  modality: dict = dict(use_lidar=True, use_camera=False),
 76 |                  ignore_index: Optional[int] = None,
 77 |                  scene_idxs: Optional[Union[str, np.ndarray]] = None,
 78 |                  test_mode: bool = False,
 79 |                  **kwargs) -> None:
 80 | 
 81 |         super().__init__(
 82 |             data_root=data_root,
 83 |             ann_file=ann_file,
 84 |             metainfo=metainfo,
 85 |             data_prefix=data_prefix,
 86 |             pipeline=pipeline,
 87 |             modality=modality,
 88 |             ignore_index=ignore_index,
 89 |             scene_idxs=scene_idxs,
 90 |             test_mode=test_mode,
 91 |             **kwargs)
 92 | 
 93 |     def get_seg_label_mapping(self, metainfo):
 94 |         seg_label_mapping = np.zeros(metainfo['max_label'] + 1, dtype=np.int64)
 95 |         for idx in metainfo['seg_label_mapping']:
 96 |             seg_label_mapping[idx] = metainfo['seg_label_mapping'][idx]
 97 |         return seg_label_mapping
 98 | 
 99 |     def parse_data_info(self, info: dict) -> dict:
100 |         """Process the raw data info.
101 | 
102 |         Convert all relative path of needed modality data file to
103 |         the absolute path. And process
104 |         the `instances` field to `ann_info` in training stage.
105 | 
106 |         Args:
107 |             info (dict): Raw info dict.
108 | 
109 |         Returns:
110 |             dict: Has `ann_info` in training stage. And
111 |             all path has been converted to absolute path.
112 |         """
113 |         info = super().parse_data_info(info)
114 |         if 'pts_panoptic_mask_path' in info:
115 |             info['pts_panoptic_mask_path'] = \
116 |                 osp.join(self.data_prefix.get('pts_panoptic_mask', ''),
117 |                          info['pts_panoptic_mask_path'])
118 | 
119 |         return info
120 | 


--------------------------------------------------------------------------------
/datasets/transforms/__pycache__/loading.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenRobotLab/P3Former/2b8faa2b0b168d78713092b223a4a6e8ba63e44e/datasets/transforms/__pycache__/loading.cpython-39.pyc


--------------------------------------------------------------------------------
/datasets/transforms/__pycache__/transforms_3d.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/OpenRobotLab/P3Former/2b8faa2b0b168d78713092b223a4a6e8ba63e44e/datasets/transforms/__pycache__/transforms_3d.cpython-39.pyc


--------------------------------------------------------------------------------
/datasets/transforms/formating.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) OpenMMLab. All rights reserved.
  2 | from typing import List, Sequence, Union
  3 | 
  4 | import mmengine
  5 | import numpy as np
  6 | import torch
  7 | from mmcv import BaseTransform
  8 | from mmengine.structures import InstanceData
  9 | from numpy import dtype
 10 | 
 11 | from mmdet3d.registry import TRANSFORMS
 12 | from mmdet3d.structures import BaseInstance3DBoxes, Det3DDataSample, PointData
 13 | from mmdet3d.structures.points import BasePoints
 14 | 
 15 | 
 16 | def to_tensor(
 17 |     data: Union[torch.Tensor, np.ndarray, Sequence, int,
 18 |                 float]) -> torch.Tensor:
 19 |     """Convert objects of various python types to :obj:`torch.Tensor`.
 20 | 
 21 |     Supported types are: :class:`numpy.ndarray`, :class:`torch.Tensor`,
 22 |     :class:`Sequence`, :class:`int` and :class:`float`.
 23 | 
 24 |     Args:
 25 |         data (torch.Tensor | numpy.ndarray | Sequence | int | float): Data to
 26 |             be converted.
 27 | 
 28 |     Returns:
 29 |         torch.Tensor: the converted data.
 30 |     """
 31 | 
 32 |     if isinstance(data, torch.Tensor):
 33 |         return data
 34 |     elif isinstance(data, np.ndarray):
 35 |         if data.dtype is dtype('float64'):
 36 |             data = data.astype(np.float32)
 37 |         return torch.from_numpy(data)
 38 |     elif isinstance(data, Sequence) and not mmengine.is_str(data):
 39 |         return torch.tensor(data)
 40 |     elif isinstance(data, int):
 41 |         return torch.LongTensor([data])
 42 |     elif isinstance(data, float):
 43 |         return torch.FloatTensor([data])
 44 |     else:
 45 |         raise TypeError(f'type {type(data)} cannot be converted to tensor.')
 46 | 
 47 | 
 48 | @TRANSFORMS.register_module()
 49 | class _Pack3DDetInputs(BaseTransform):
 50 |     INPUTS_KEYS = ['points', 'img']
 51 |     INSTANCEDATA_3D_KEYS = [
 52 |         'gt_bboxes_3d', 'gt_labels_3d', 'attr_labels', 'depths', 'centers_2d'
 53 |     ]
 54 |     INSTANCEDATA_2D_KEYS = [
 55 |         'gt_bboxes',
 56 |         'gt_bboxes_labels',
 57 |     ]
 58 | 
 59 |     SEG_KEYS = [
 60 |         'gt_seg_map', 'pts_instance_mask', 'pts_semantic_mask',
 61 |         'gt_semantic_seg'
 62 |     ]
 63 | 
 64 |     def __init__(
 65 |         self,
 66 |         keys: tuple,
 67 |         meta_keys: tuple = ('img_path', 'ori_shape', 'img_shape', 'lidar2img',
 68 |                             'depth2img', 'cam2img', 'pad_shape',
 69 |                             'scale_factor', 'flip', 'pcd_horizontal_flip',
 70 |                             'pcd_vertical_flip', 'box_mode_3d', 'box_type_3d',
 71 |                             'img_norm_cfg', 'num_pts_feats', 'pcd_trans',
 72 |                             'sample_idx', 'pcd_scale_factor', 'pcd_rotation',
 73 |                             'pcd_rotation_angle', 'lidar_path',
 74 |                             'transformation_3d_flow', 'trans_mat',
 75 |                             'affine_aug', 'sweep_img_metas', 'ori_cam2img',
 76 |                             'cam2global', 'crop_offset', 'img_crop_offset',
 77 |                             'resize_img_shape', 'lidar2cam', 'ori_lidar2img',
 78 |                             'num_ref_frames', 'num_views', 'ego2global')
 79 |     ) -> None:
 80 |         self.keys = keys
 81 |         self.meta_keys = meta_keys
 82 | 
 83 |     def _remove_prefix(self, key: str) -> str:
 84 |         if key.startswith('gt_'):
 85 |             key = key[3:]
 86 |         return key
 87 | 
 88 |     def transform(self, results: Union[dict,
 89 |                                        List[dict]]) -> Union[dict, List[dict]]:
 90 |         """Method to pack the input data. when the value in this dict is a
 91 |         list, it usually is in Augmentations Testing.
 92 | 
 93 |         Args:
 94 |             results (dict | list[dict]): Result dict from the data pipeline.
 95 | 
 96 |         Returns:
 97 |             dict | List[dict]:
 98 | 
 99 |             - 'inputs' (dict): The forward data of models. It usually contains
100 |               following keys:
101 | 
102 |                 - points
103 |                 - img
104 | 
105 |             - 'data_samples' (:obj:`Det3DDataSample`): The annotation info of
106 |               the sample.
107 |         """
108 |         # augtest
109 |         if isinstance(results, list):
110 |             if len(results) == 1:
111 |                 # simple test
112 |                 return self.pack_single_results(results[0])
113 |             pack_results = []
114 |             for single_result in results:
115 |                 pack_results.append(self.pack_single_results(single_result))
116 |             return pack_results
117 |         # norm training and simple testing
118 |         elif isinstance(results, dict):
119 |             return self.pack_single_results(results)
120 |         else:
121 |             raise NotImplementedError
122 | 
123 |     def pack_single_results(self, results: dict) -> dict:
124 |         """Method to pack the single input data. when the value in this dict is
125 |         a list, it usually is in Augmentations Testing.
126 | 
127 |         Args:
128 |             results (dict): Result dict from the data pipeline.
129 | 
130 |         Returns:
131 |             dict: A dict contains
132 | 
133 |             - 'inputs' (dict): The forward data of models. It usually contains
134 |               following keys:
135 | 
136 |                 - points
137 |                 - img
138 | 
139 |             - 'data_samples' (:obj:`Det3DDataSample`): The annotation info
140 |               of the sample.
141 |         """
142 |         # Format 3D data
143 |         if 'points' in results:
144 |             if isinstance(results['points'], BasePoints):
145 |                 results['points'] = results['points'].tensor
146 | 
147 |         if 'img' in results:
148 |             if isinstance(results['img'], list):
149 |                 # process multiple imgs in single frame
150 |                 imgs = np.stack(results['img'], axis=0)
151 |                 if imgs.flags.c_contiguous:
152 |                     imgs = to_tensor(imgs).permute(0, 3, 1, 2).contiguous()
153 |                 else:
154 |                     imgs = to_tensor(
155 |                         np.ascontiguousarray(imgs.transpose(0, 3, 1, 2)))
156 |                 results['img'] = imgs
157 |             else:
158 |                 img = results['img']
159 |                 if len(img.shape) < 3:
160 |                     img = np.expand_dims(img, -1)
161 |                 # To improve the computational speed by by 3-5 times, apply:
162 |                 # `torch.permute()` rather than `np.transpose()`.
163 |                 # Refer to https://github.com/open-mmlab/mmdetection/pull/9533
164 |                 # for more details
165 |                 if img.flags.c_contiguous:
166 |                     img = to_tensor(img).permute(2, 0, 1).contiguous()
167 |                 else:
168 |                     img = to_tensor(
169 |                         np.ascontiguousarray(img.transpose(2, 0, 1)))
170 |                 results['img'] = img
171 | 
172 |         for key in [
173 |                 'proposals', 'gt_bboxes', 'gt_bboxes_ignore', 'gt_labels',
174 |                 'gt_bboxes_labels', 'attr_labels', 'pts_instance_mask',
175 |                 'pts_semantic_mask', 'centers_2d', 'depths', 'gt_labels_3d'
176 |         ]:
177 |             if key not in results:
178 |                 continue
179 |             if isinstance(results[key], list):
180 |                 results[key] = [to_tensor(res) for res in results[key]]
181 |             else:
182 |                 results[key] = to_tensor(results[key])
183 |         if 'gt_bboxes_3d' in results:
184 |             if not isinstance(results['gt_bboxes_3d'], BaseInstance3DBoxes):
185 |                 results['gt_bboxes_3d'] = to_tensor(results['gt_bboxes_3d'])
186 | 
187 |         if 'gt_semantic_seg' in results:
188 |             results['gt_semantic_seg'] = to_tensor(
189 |                 results['gt_semantic_seg'][None])
190 |         if 'gt_seg_map' in results:
191 |             results['gt_seg_map'] = results['gt_seg_map'][None, ...]
192 | 
193 |         data_sample = Det3DDataSample()
194 |         gt_instances_3d = InstanceData()
195 |         gt_instances = InstanceData()
196 |         gt_pts_seg = PointData()
197 | 
198 |         img_metas = {}
199 |         for key in self.meta_keys:
200 |             if key in results:
201 |                 img_metas[key] = results[key]
202 |         data_sample.set_metainfo(img_metas)
203 | 
204 |         inputs = {}
205 |         for key in self.keys:
206 |             if key in results:
207 |                 if key in self.INPUTS_KEYS:
208 |                     inputs[key] = results[key]
209 |                 elif key in self.INSTANCEDATA_3D_KEYS:
210 |                     gt_instances_3d[self._remove_prefix(key)] = results[key]
211 |                 elif key in self.INSTANCEDATA_2D_KEYS:
212 |                     if key == 'gt_bboxes_labels':
213 |                         gt_instances['labels'] = results[key]
214 |                     else:
215 |                         gt_instances[self._remove_prefix(key)] = results[key]
216 |                 elif key in self.SEG_KEYS:
217 |                     gt_pts_seg[self._remove_prefix(key)] = results[key]
218 |                 elif key in ['lidar_sweeps']:
219 |                     data_sample.lidar_sweeps = results[key]
220 |                 elif key in ['lidar_token']:
221 |                     pass
222 |                 elif key in ['lidar_path']:
223 |                     pass
224 |                 else:
225 |                     raise NotImplementedError(f'Please modified '
226 |                                               f'`Pack3DDetInputs` '
227 |                                               f'to put {key} to '
228 |                                               f'corresponding field')
229 | 
230 |         data_sample.gt_instances_3d = gt_instances_3d
231 |         data_sample.gt_instances = gt_instances
232 |         data_sample.gt_pts_seg = gt_pts_seg
233 |         if 'eval_ann_info' in results:
234 |             data_sample.eval_ann_info = results['eval_ann_info']
235 |             if 'lidar_token' in self.keys:
236 |                 data_sample.eval_ann_info['token'] = results['lidar_token'] # for nuscenes submission
237 |             if 'lidar_path' in self.keys:
238 |                 data_sample.eval_ann_info['lidar_path'] = results['lidar_path'] # for semantickitti submission
239 |         else:
240 |             data_sample.eval_ann_info = None
241 | 
242 |         packed_results = dict()
243 |         packed_results['data_samples'] = data_sample
244 |         packed_results['inputs'] = inputs
245 | 
246 |         return packed_results
247 | 
248 |     def __repr__(self) -> str:
249 |         """str: Return a string that describes the module."""
250 |         repr_str = self.__class__.__name__
251 |         repr_str += f'(keys={self.keys})'
252 |         repr_str += f'(meta_keys={self.meta_keys})'
253 |         return repr_str


--------------------------------------------------------------------------------
/datasets/transforms/loading.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) OpenMMLab. All rights reserved.
  2 | import copy
  3 | from typing import List, Optional, Union
  4 | 
  5 | import mmcv
  6 | import mmengine
  7 | import numpy as np
  8 | from mmcv.transforms import LoadImageFromFile
  9 | from mmcv.transforms.base import BaseTransform
 10 | from mmdet.datasets.transforms import LoadAnnotations
 11 | from mmengine.fileio import get
 12 | 
 13 | from mmdet3d.registry import TRANSFORMS
 14 | from mmdet3d.structures.bbox_3d import get_box_type
 15 | from mmdet3d.structures.points import BasePoints, get_points_type
 16 | 
 17 | 
 18 | @TRANSFORMS.register_module()
 19 | class _LoadAnnotations3D(LoadAnnotations):
 20 |     """Load Annotations3D.
 21 | 
 22 |     Load instance mask and semantic mask of points and
 23 |     encapsulate the items into related fields.
 24 | 
 25 |     Required Keys:
 26 | 
 27 |     - ann_info (dict)
 28 | 
 29 |         - gt_bboxes_3d (:obj:`LiDARInstance3DBoxes` |
 30 |           :obj:`DepthInstance3DBoxes` | :obj:`CameraInstance3DBoxes`):
 31 |           3D ground truth bboxes. Only when `with_bbox_3d` is True
 32 |         - gt_labels_3d (np.int64): Labels of ground truths.
 33 |           Only when `with_label_3d` is True.
 34 |         - gt_bboxes (np.float32): 2D ground truth bboxes.
 35 |           Only when `with_bbox` is True.
 36 |         - gt_labels (np.ndarray): Labels of ground truths.
 37 |           Only when `with_label` is True.
 38 |         - depths (np.ndarray): Only when
 39 |           `with_bbox_depth` is True.
 40 |         - centers_2d (np.ndarray): Only when
 41 |           `with_bbox_depth` is True.
 42 |         - attr_labels (np.ndarray): Attribute labels of instances.
 43 |           Only when `with_attr_label` is True.
 44 | 
 45 |     - pts_instance_mask_path (str): Path of instance mask file.
 46 |       Only when `with_mask_3d` is True.
 47 |     - pts_semantic_mask_path (str): Path of semantic mask file.
 48 |       Only when `with_seg_3d` is True.
 49 |     - pts_panoptic_mask_path (str): Path of panoptic mask file.
 50 |       Only when both `with_panoptic_3d` is True.
 51 | 
 52 |     Added Keys:
 53 | 
 54 |     - gt_bboxes_3d (:obj:`LiDARInstance3DBoxes` |
 55 |       :obj:`DepthInstance3DBoxes` | :obj:`CameraInstance3DBoxes`):
 56 |       3D ground truth bboxes. Only when `with_bbox_3d` is True
 57 |     - gt_labels_3d (np.int64): Labels of ground truths.
 58 |       Only when `with_label_3d` is True.
 59 |     - gt_bboxes (np.float32): 2D ground truth bboxes.
 60 |       Only when `with_bbox` is True.
 61 |     - gt_labels (np.int64): Labels of ground truths.
 62 |       Only when `with_label` is True.
 63 |     - depths (np.float32): Only when
 64 |       `with_bbox_depth` is True.
 65 |     - centers_2d (np.ndarray): Only when
 66 |       `with_bbox_depth` is True.
 67 |     - attr_labels (np.int64): Attribute labels of instances.
 68 |       Only when `with_attr_label` is True.
 69 |     - pts_instance_mask (np.int64): Instance mask of each point.
 70 |       Only when `with_mask_3d` is True.
 71 |     - pts_semantic_mask (np.int64): Semantic mask of each point.
 72 |       Only when `with_seg_3d` is True.
 73 | 
 74 |     Args:
 75 |         with_bbox_3d (bool): Whether to load 3D boxes. Defaults to True.
 76 |         with_label_3d (bool): Whether to load 3D labels. Defaults to True.
 77 |         with_attr_label (bool): Whether to load attribute label.
 78 |             Defaults to False.
 79 |         with_mask_3d (bool): Whether to load 3D instance masks for points.
 80 |             Defaults to False.
 81 |         with_seg_3d (bool): Whether to load 3D semantic masks for points.
 82 |             Defaults to False.
 83 |         with_bbox (bool): Whether to load 2D boxes. Defaults to False.
 84 |         with_label (bool): Whether to load 2D labels. Defaults to False.
 85 |         with_mask (bool): Whether to load 2D instance masks. Defaults to False.
 86 |         with_seg (bool): Whether to load 2D semantic masks. Defaults to False.
 87 |         with_bbox_depth (bool): Whether to load 2.5D boxes. Defaults to False.
 88 |         with_panoptic_3d (bool): Whether to load 3D panoptic masks for points.
 89 |             Defaults to False.
 90 |         poly2mask (bool): Whether to convert polygon annotations to bitmasks.
 91 |             Defaults to True.
 92 |         seg_3d_dtype (str): String of dtype of 3D semantic masks.
 93 |             Defaults to 'np.int64'.
 94 |         seg_offset (int): The offset to split semantic and instance labels from
 95 |             panoptic labels. Defaults to None.
 96 |         dataset_type (str): Type of dataset used for splitting semantic and
 97 |             instance labels. Defaults to None.
 98 |         backend_args (dict, optional): Arguments to instantiate the
 99 |             corresponding backend. Defaults to None.
100 |     """
101 | 
102 |     def __init__(self,
103 |                  with_bbox_3d: bool = True,
104 |                  with_label_3d: bool = True,
105 |                  with_attr_label: bool = False,
106 |                  with_mask_3d: bool = False,
107 |                  with_seg_3d: bool = False,
108 |                  with_bbox: bool = False,
109 |                  with_label: bool = False,
110 |                  with_mask: bool = False,
111 |                  with_seg: bool = False,
112 |                  with_bbox_depth: bool = False,
113 |                  with_panoptic_3d: bool = False,
114 |                  poly2mask: bool = True,
115 |                  seg_3d_dtype: str = 'np.int64',
116 |                  seg_offset: int = None,
117 |                  dataset_type: str = None,
118 |                  backend_args: Optional[dict] = None) -> None:
119 |         super().__init__(
120 |             with_bbox=with_bbox,
121 |             with_label=with_label,
122 |             with_mask=with_mask,
123 |             with_seg=with_seg,
124 |             poly2mask=poly2mask,
125 |             backend_args=backend_args)
126 |         self.with_bbox_3d = with_bbox_3d
127 |         self.with_bbox_depth = with_bbox_depth
128 |         self.with_label_3d = with_label_3d
129 |         self.with_attr_label = with_attr_label
130 |         self.with_mask_3d = with_mask_3d
131 |         self.with_seg_3d = with_seg_3d
132 |         self.with_panoptic_3d = with_panoptic_3d
133 |         self.seg_3d_dtype = eval(seg_3d_dtype)
134 |         self.seg_offset = seg_offset
135 |         self.dataset_type = dataset_type
136 | 
137 |     def _load_bboxes_3d(self, results: dict) -> dict:
138 |         """Private function to move the 3D bounding box annotation from
139 |         `ann_info` field to the root of `results`.
140 | 
141 |         Args:
142 |             results (dict): Result dict from :obj:`mmdet3d.CustomDataset`.
143 | 
144 |         Returns:
145 |             dict: The dict containing loaded 3D bounding box annotations.
146 |         """
147 | 
148 |         results['gt_bboxes_3d'] = results['ann_info']['gt_bboxes_3d']
149 |         return results
150 | 
151 |     def _load_bboxes_depth(self, results: dict) -> dict:
152 |         """Private function to load 2.5D bounding box annotations.
153 | 
154 |         Args:
155 |             results (dict): Result dict from :obj:`mmdet3d.CustomDataset`.
156 | 
157 |         Returns:
158 |             dict: The dict containing loaded 2.5D bounding box annotations.
159 |         """
160 | 
161 |         results['depths'] = results['ann_info']['depths']
162 |         results['centers_2d'] = results['ann_info']['centers_2d']
163 |         return results
164 | 
165 |     def _load_labels_3d(self, results: dict) -> dict:
166 |         """Private function to load label annotations.
167 | 
168 |         Args:
169 |             results (dict): Result dict from :obj:`mmdet3d.CustomDataset`.
170 | 
171 |         Returns:
172 |             dict: The dict containing loaded label annotations.
173 |         """
174 | 
175 |         results['gt_labels_3d'] = results['ann_info']['gt_labels_3d']
176 |         return results
177 | 
178 |     def _load_attr_labels(self, results: dict) -> dict:
179 |         """Private function to load label annotations.
180 | 
181 |         Args:
182 |             results (dict): Result dict from :obj:`mmdet3d.CustomDataset`.
183 | 
184 |         Returns:
185 |             dict: The dict containing loaded label annotations.
186 |         """
187 |         results['attr_labels'] = results['ann_info']['attr_labels']
188 |         return results
189 | 
190 |     def _load_masks_3d(self, results: dict) -> dict:
191 |         """Private function to load 3D mask annotations.
192 | 
193 |         Args:
194 |             results (dict): Result dict from :obj:`mmdet3d.CustomDataset`.
195 | 
196 |         Returns:
197 |             dict: The dict containing loaded 3D mask annotations.
198 |         """
199 |         pts_instance_mask_path = results['pts_instance_mask_path']
200 | 
201 |         try:
202 |             mask_bytes = get(
203 |                 pts_instance_mask_path, backend_args=self.backend_args)
204 |             pts_instance_mask = np.frombuffer(mask_bytes, dtype=np.int64)
205 |         except ConnectionError:
206 |             mmengine.check_file_exist(pts_instance_mask_path)
207 |             pts_instance_mask = np.fromfile(
208 |                 pts_instance_mask_path, dtype=np.int64)
209 | 
210 |         results['pts_instance_mask'] = pts_instance_mask
211 |         # 'eval_ann_info' will be passed to evaluator
212 |         if 'eval_ann_info' in results:
213 |             results['eval_ann_info']['pts_instance_mask'] = pts_instance_mask
214 |         return results
215 | 
216 |     def _load_semantic_seg_3d(self, results: dict) -> dict:
217 |         """Private function to load 3D semantic segmentation annotations.
218 | 
219 |         Args:
220 |             results (dict): Result dict from :obj:`mmdet3d.CustomDataset`.
221 | 
222 |         Returns:
223 |             dict: The dict containing the semantic segmentation annotations.
224 |         """
225 |         pts_semantic_mask_path = results['pts_semantic_mask_path']
226 | 
227 |         try:
228 |             mask_bytes = get(
229 |                 pts_semantic_mask_path, backend_args=self.backend_args)
230 |             # add .copy() to fix read-only bug
231 |             pts_semantic_mask = np.frombuffer(
232 |                 mask_bytes, dtype=self.seg_3d_dtype).copy()
233 |         except ConnectionError:
234 |             mmengine.check_file_exist(pts_semantic_mask_path)
235 |             pts_semantic_mask = np.fromfile(
236 |                 pts_semantic_mask_path, dtype=np.int64)
237 | 
238 |         pts_semantic_mask = pts_semantic_mask.astype(np.int64)
239 |         if self.dataset_type == 'semantickitti':
240 |             pts_semantic_mask = pts_semantic_mask % self.seg_offset
241 |         # nuScenes loads semantic and panoptic labels from different files.
242 | 
243 |         results['pts_semantic_mask'] = pts_semantic_mask
244 | 
245 |         # 'eval_ann_info' will be passed to evaluator
246 |         if 'eval_ann_info' in results:
247 |             results['eval_ann_info']['pts_semantic_mask'] = pts_semantic_mask
248 |         return results
249 | 
250 |     def _load_panoptic_3d(self, results: dict) -> dict:
251 |         """Private function to load 3D panoptic segmentation annotations.
252 | 
253 |         Args:
254 |             results (dict): Result dict from :obj:`mmdet3d.CustomDataset`.
255 | 
256 |         Returns:
257 |             dict: The dict containing the panoptic segmentation annotations.
258 |         """
259 |         pts_panoptic_mask_path = results['pts_panoptic_mask_path']
260 | 
261 |         try:
262 |             mask_bytes = get(
263 |                 pts_panoptic_mask_path, backend_args=self.backend_args)
264 |             # add .copy() to fix read-only bug
265 |             pts_panoptic_mask = np.frombuffer(
266 |                 mask_bytes, dtype=self.seg_3d_dtype).copy()
267 |         except ConnectionError:
268 |             mmengine.check_file_exist(pts_panoptic_mask_path)
269 |             pts_panoptic_mask = np.load(pts_panoptic_mask_path)
270 | 
271 |         if self.dataset_type == 'semantickitti':
272 |             pts_semantic_mask = pts_panoptic_mask.astype(np.int64)
273 |             pts_semantic_mask = pts_semantic_mask % self.seg_offset
274 |         elif self.dataset_type == 'nuscenes':
275 |             pts_panoptic_mask = pts_panoptic_mask['data']
276 |             pts_semantic_mask = pts_panoptic_mask // self.seg_offset
277 | 
278 |         results['pts_semantic_mask'] = pts_semantic_mask
279 | 
280 |         # We can directly take panoptic labels as instance ids.
281 |         pts_instance_mask = pts_panoptic_mask.astype(np.int64)
282 |         results['pts_instance_mask'] = pts_instance_mask
283 | 
284 |         # 'eval_ann_info' will be passed to evaluator
285 |         if 'eval_ann_info' in results:
286 |             results['eval_ann_info']['pts_semantic_mask'] = pts_semantic_mask
287 |             results['eval_ann_info']['pts_instance_mask'] = pts_instance_mask
288 |         return results
289 | 
290 |     def _load_bboxes(self, results: dict) -> None:
291 |         """Private function to load bounding box annotations.
292 | 
293 |         The only difference is it remove the proceess for
294 |         `ignore_flag`
295 | 
296 |         Args:
297 |             results (dict): Result dict from :obj:`mmcv.BaseDataset`.
298 | 
299 |         Returns:
300 |             dict: The dict contains loaded bounding box annotations.
301 |         """
302 | 
303 |         results['gt_bboxes'] = results['ann_info']['gt_bboxes']
304 | 
305 |     def _load_labels(self, results: dict) -> None:
306 |         """Private function to load label annotations.
307 | 
308 |         Args:
309 |             results (dict): Result dict from :obj :obj:`mmcv.BaseDataset`.
310 | 
311 |         Returns:
312 |             dict: The dict contains loaded label annotations.
313 |         """
314 |         results['gt_bboxes_labels'] = results['ann_info']['gt_bboxes_labels']
315 | 
316 |     def transform(self, results: dict) -> dict:
317 |         """Function to load multiple types annotations.
318 | 
319 |         Args:
320 |             results (dict): Result dict from :obj:`mmdet3d.CustomDataset`.
321 | 
322 |         Returns:
323 |             dict: The dict containing loaded 3D bounding box, label, mask and
324 |             semantic segmentation annotations.
325 |         """
326 |         results = super().transform(results)
327 |         if self.with_bbox_3d:
328 |             results = self._load_bboxes_3d(results)
329 |         if self.with_bbox_depth:
330 |             results = self._load_bboxes_depth(results)
331 |         if self.with_label_3d:
332 |             results = self._load_labels_3d(results)
333 |         if self.with_attr_label:
334 |             results = self._load_attr_labels(results)
335 |         if self.with_panoptic_3d:
336 |             results = self._load_panoptic_3d(results)
337 |         if self.with_mask_3d:
338 |             results = self._load_masks_3d(results)
339 |         if self.with_seg_3d:
340 |             results = self._load_semantic_seg_3d(results)
341 |         return results
342 | 
343 |     def __repr__(self) -> str:
344 |         """str: Return a string that describes the module."""
345 |         indent_str = '    '
346 |         repr_str = self.__class__.__name__ + '(\n'
347 |         repr_str += f'{indent_str}with_bbox_3d={self.with_bbox_3d}, '
348 |         repr_str += f'{indent_str}with_label_3d={self.with_label_3d}, '
349 |         repr_str += f'{indent_str}with_attr_label={self.with_attr_label}, '
350 |         repr_str += f'{indent_str}with_mask_3d={self.with_mask_3d}, '
351 |         repr_str += f'{indent_str}with_seg_3d={self.with_seg_3d}, '
352 |         repr_str += f'{indent_str}with_panoptic_3d={self.with_panoptic_3d}, '
353 |         repr_str += f'{indent_str}with_bbox={self.with_bbox}, '
354 |         repr_str += f'{indent_str}with_label={self.with_label}, '
355 |         repr_str += f'{indent_str}with_mask={self.with_mask}, '
356 |         repr_str += f'{indent_str}with_seg={self.with_seg}, '
357 |         repr_str += f'{indent_str}with_bbox_depth={self.with_bbox_depth}, '
358 |         repr_str += f'{indent_str}poly2mask={self.poly2mask})'
359 |         repr_str += f'{indent_str}seg_offset={self.seg_offset})'
360 | 
361 |         return repr_str
362 | 
363 | 
364 | 


--------------------------------------------------------------------------------
/datasets/transforms/transforms_3d.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) OpenMMLab. All rights reserved.
  2 | import random
  3 | import warnings
  4 | from typing import List, Optional, Sequence, Tuple, Union
  5 | 
  6 | import cv2
  7 | import mmcv
  8 | import numpy as np
  9 | import torch
 10 | from mmcv.transforms import BaseTransform, Compose, RandomResize, Resize
 11 | from mmdet.datasets.transforms import (PhotoMetricDistortion, RandomCrop,
 12 |                                        RandomFlip)
 13 | from mmengine import is_list_of, is_tuple_of
 14 | 
 15 | from mmdet3d.models.task_modules import VoxelGenerator
 16 | from mmdet3d.registry import TRANSFORMS
 17 | from mmdet3d.structures import (CameraInstance3DBoxes, DepthInstance3DBoxes,
 18 |                                 LiDARInstance3DBoxes)
 19 | from mmdet3d.structures.ops import box_np_ops
 20 | from mmdet3d.structures.points import BasePoints
 21 | from mmdet3d.datasets.transforms.data_augment_utils import noise_per_object_v3_
 22 | 
 23 | 
 24 | @TRANSFORMS.register_module(force=True)
 25 | class _PolarMix(BaseTransform):
 26 |     """PolarMix data augmentation.
 27 | 
 28 |     The polarmix transform steps are as follows:
 29 | 
 30 |         1. Another random point cloud is picked by dataset.
 31 |         2. Exchange sectors of two point clouds that are cut with certain
 32 |            azimuth angles.
 33 |         3. Cut point instances from picked point cloud, rotate them by multiple
 34 |            azimuth angles, and paste the cut and rotated instances.
 35 | 
 36 |     Required Keys:
 37 | 
 38 |     - points (:obj:`BasePoints`)
 39 |     - pts_semantic_mask (np.int64)
 40 |     - dataset (:obj:`BaseDataset`)
 41 | 
 42 |     Modified Keys:
 43 | 
 44 |     - points (:obj:`BasePoints`)
 45 |     - pts_semantic_mask (np.int64)
 46 | 
 47 |     Args:
 48 |         instance_classes (List[int]): Semantic masks which represent the
 49 |             instance.
 50 |         swap_ratio (float): Swap ratio of two point cloud. Defaults to 0.5.
 51 |         rotate_paste_ratio (float): Rotate paste ratio. Defaults to 1.0.
 52 |         pre_transform (Sequence[dict], optional): Sequence of transform object
 53 |             or config dict to be composed. Defaults to None.
 54 |         prob (float): The transformation probability. Defaults to 1.0.
 55 |     """
 56 | 
 57 |     def __init__(self,
 58 |                  instance_classes: List[int],
 59 |                  swap_ratio: float = 0.5,
 60 |                  rotate_paste_ratio: float = 1.0,
 61 |                  pre_transform: Optional[Sequence[dict]] = None,
 62 |                  prob: float = 1.0) -> None:
 63 |         assert is_list_of(instance_classes, int), \
 64 |             'instance_classes should be a list of int'
 65 |         self.instance_classes = instance_classes
 66 |         self.swap_ratio = swap_ratio
 67 |         self.rotate_paste_ratio = rotate_paste_ratio
 68 | 
 69 |         self.prob = prob
 70 |         if pre_transform is None:
 71 |             self.pre_transform = None
 72 |         else:
 73 |             self.pre_transform = Compose(pre_transform)
 74 | 
 75 |     def polar_mix_transform(self, input_dict: dict, mix_results: dict) -> dict:
 76 |         """PolarMix transform function.
 77 | 
 78 |         Args:
 79 |             input_dict (dict): Result dict from loading pipeline.
 80 |             mix_results (dict): Mixed dict picked from dataset.
 81 | 
 82 |         Returns:
 83 |             dict: output dict after transformation.
 84 |         """
 85 |         mix_points = mix_results['points']
 86 |         mix_pts_semantic_mask = mix_results['pts_semantic_mask']
 87 | 
 88 |         points = input_dict['points']
 89 |         pts_semantic_mask = input_dict['pts_semantic_mask']
 90 | 
 91 |         mix_panoptic = False
 92 |         if 'pts_instance_mask' in mix_results:
 93 |             mix_instance_mask = mix_results['pts_instance_mask']
 94 |             mix_instance_mask += (1000<<16) # not overlap id
 95 |             pts_instance_mask = input_dict['pts_instance_mask']
 96 |             mix_panoptic = True
 97 | 
 98 |         # 1. swap point cloud
 99 |         if np.random.random() < self.swap_ratio:
100 |             start_angle = (np.random.random() - 1) * np.pi  # -pi~0
101 |             end_angle = start_angle + np.pi
102 |             # calculate horizontal angle for each point
103 |             yaw = -torch.atan2(points.coord[:, 1], points.coord[:, 0])
104 |             mix_yaw = -torch.atan2(mix_points.coord[:, 1], mix_points.coord[:,
105 |                                                                             0])
106 | 
107 |             # select points in sector
108 |             idx = (yaw <= start_angle) | (yaw >= end_angle)
109 |             mix_idx = (mix_yaw > start_angle) & (mix_yaw < end_angle)
110 | 
111 |             # swap
112 |             points = points.cat([points[idx], mix_points[mix_idx]])
113 |             pts_semantic_mask = np.concatenate(
114 |                 (pts_semantic_mask[idx.numpy()],
115 |                  mix_pts_semantic_mask[mix_idx.numpy()]),
116 |                 axis=0)
117 |             
118 |             if mix_panoptic:
119 |                 pts_instance_mask = np.concatenate(
120 |                     (pts_instance_mask[idx.numpy()],
121 |                     mix_instance_mask[mix_idx.numpy()]),
122 |                     axis=0)                
123 | 
124 |         # 2. rotate-pasting
125 |         if np.random.random() < self.rotate_paste_ratio:
126 |             # extract instance points
127 |             instance_points, instance_pts_semantic_mask = [], []
128 |             if mix_panoptic:
129 |                 instance_pts_instance_mask = []
130 |             for instance_class in self.instance_classes:
131 |                 mix_idx = mix_pts_semantic_mask == instance_class
132 |                 instance_points.append(mix_points[mix_idx])
133 |                 instance_pts_semantic_mask.append(
134 |                     mix_pts_semantic_mask[mix_idx])
135 |                 if mix_panoptic:
136 |                     instance_pts_instance_mask.append(mix_instance_mask[mix_idx])
137 |             instance_points = mix_points.cat(instance_points)
138 |             instance_pts_semantic_mask = np.concatenate(
139 |                 instance_pts_semantic_mask, axis=0)
140 |             if mix_panoptic:
141 |                instance_pts_instance_mask = np.concatenate(
142 |                 instance_pts_instance_mask, axis=0) 
143 | 
144 |             # rotate-copy
145 |             copy_points = [instance_points]
146 |             copy_pts_semantic_mask = [instance_pts_semantic_mask]
147 |             if mix_panoptic:
148 |                 copy_pts_instance_mask = [instance_pts_instance_mask]
149 |             angle_list = [
150 |                 np.random.random() * np.pi * 2 / 3,
151 |                 (np.random.random() + 1) * np.pi * 2 / 3
152 |             ]
153 |             for angle in angle_list:
154 |                 new_points = instance_points.clone()
155 |                 new_points.rotate(angle)
156 |                 copy_points.append(new_points)
157 |                 copy_pts_semantic_mask.append(instance_pts_semantic_mask)
158 |                 if mix_panoptic:
159 |                     copy_pts_instance_mask.append(instance_pts_instance_mask)
160 |             copy_points = instance_points.cat(copy_points)
161 |             copy_pts_semantic_mask = np.concatenate(
162 |                 copy_pts_semantic_mask, axis=0)
163 |             if mix_panoptic:
164 |                 copy_pts_instance_mask = np.concatenate(
165 |                 copy_pts_instance_mask, axis=0)
166 | 
167 |             points = points.cat([points, copy_points])
168 |             pts_semantic_mask = np.concatenate(
169 |                 (pts_semantic_mask, copy_pts_semantic_mask), axis=0)
170 |             if mix_panoptic:
171 |                 pts_instance_mask = np.concatenate(
172 |                 (pts_instance_mask, copy_pts_instance_mask), axis=0)
173 | 
174 |         input_dict['points'] = points
175 |         input_dict['pts_semantic_mask'] = pts_semantic_mask
176 |         if mix_panoptic:
177 |             input_dict['pts_instance_mask'] = pts_instance_mask
178 |         return input_dict
179 | 
180 |     def transform(self, input_dict: dict) -> dict:
181 |         """PolarMix transform function.
182 | 
183 |         Args:
184 |             input_dict (dict): Result dict from loading pipeline.
185 | 
186 |         Returns:
187 |             dict: output dict after transformation.
188 |         """
189 |         if np.random.rand() > self.prob:
190 |             return input_dict
191 | 
192 |         assert 'dataset' in input_dict, \
193 |             '`dataset` is needed to pass through PolarMix, while not found.'
194 |         dataset = input_dict['dataset']
195 | 
196 |         # get index of other point cloud
197 |         index = np.random.randint(0, len(dataset))
198 | 
199 |         mix_results = dataset.get_data_info(index)
200 | 
201 |         if self.pre_transform is not None:
202 |             # pre_transform may also require dataset
203 |             mix_results.update({'dataset': dataset})
204 |             # before polarmix need to go through
205 |             # the necessary pre_transform
206 |             mix_results = self.pre_transform(mix_results)
207 |             mix_results.pop('dataset')
208 | 
209 |         input_dict = self.polar_mix_transform(input_dict, mix_results)
210 | 
211 |         return input_dict
212 | 
213 |     def __repr__(self) -> str:
214 |         """str: Return a string that describes the module."""
215 |         repr_str = self.__class__.__name__
216 |         repr_str += f'(instance_classes={self.instance_classes}, '
217 |         repr_str += f'swap_ratio={self.swap_ratio}, '
218 |         repr_str += f'rotate_paste_ratio={self.rotate_paste_ratio}, '
219 |         repr_str += f'pre_transform={self.pre_transform}, '
220 |         repr_str += f'prob={self.prob})'
221 |         return repr_str
222 | 
223 | 
224 | @TRANSFORMS.register_module(force=True)
225 | class _LaserMix(BaseTransform):
226 |     """LaserMix data augmentation.
227 | 
228 |     The lasermix transform steps are as follows:
229 | 
230 |         1. Another random point cloud is picked by dataset.
231 |         2. Divide the point cloud into several regions according to pitch
232 |            angles and combine the areas crossly.
233 | 
234 |     Required Keys:
235 | 
236 |     - points (:obj:`BasePoints`)
237 |     - pts_semantic_mask (np.int64)
238 |     - dataset (:obj:`BaseDataset`)
239 | 
240 |     Modified Keys:
241 | 
242 |     - points (:obj:`BasePoints`)
243 |     - pts_semantic_mask (np.int64)
244 | 
245 |     Args:
246 |         num_areas (List[int]): A list of area numbers will be divided into.
247 |         pitch_angles (Sequence[float]): Pitch angles used to divide areas.
248 |         pre_transform (Sequence[dict], optional): Sequence of transform object
249 |             or config dict to be composed. Defaults to None.
250 |         prob (float): The transformation probability. Defaults to 1.0.
251 |     """
252 | 
253 |     def __init__(self,
254 |                  num_areas: List[int],
255 |                  pitch_angles: Sequence[float],
256 |                  pre_transform: Optional[Sequence[dict]] = None,
257 |                  prob: float = 1.0) -> None:
258 |         assert is_list_of(num_areas, int), \
259 |             'num_areas should be a list of int.'
260 |         self.num_areas = num_areas
261 | 
262 |         assert len(pitch_angles) == 2, \
263 |             'The length of pitch_angles should be 2, ' \
264 |             f'but got {len(pitch_angles)}.'
265 |         assert pitch_angles[1] > pitch_angles[0], \
266 |             'pitch_angles[1] should be larger than pitch_angles[0].'
267 |         self.pitch_angles = pitch_angles
268 | 
269 |         self.prob = prob
270 |         if pre_transform is None:
271 |             self.pre_transform = None
272 |         else:
273 |             self.pre_transform = Compose(pre_transform)
274 | 
275 |     def laser_mix_transform(self, input_dict: dict, mix_results: dict) -> dict:
276 |         """LaserMix transform function.
277 | 
278 |         Args:
279 |             input_dict (dict): Result dict from loading pipeline.
280 |             mix_results (dict): Mixed dict picked from dataset.
281 | 
282 |         Returns:
283 |             dict: output dict after transformation.
284 |         """
285 |         mix_points = mix_results['points']
286 |         mix_pts_semantic_mask = mix_results['pts_semantic_mask']
287 | 
288 |         points = input_dict['points']
289 |         pts_semantic_mask = input_dict['pts_semantic_mask']
290 | 
291 |         rho = torch.sqrt(points.coord[:, 0]**2 + points.coord[:, 1]**2)
292 |         pitch = torch.atan2(points.coord[:, 2], rho)
293 |         pitch = torch.clamp(pitch, self.pitch_angles[0] + 1e-5,
294 |                             self.pitch_angles[1] - 1e-5)
295 | 
296 |         mix_rho = torch.sqrt(mix_points.coord[:, 0]**2 +
297 |                              mix_points.coord[:, 1]**2)
298 |         mix_pitch = torch.atan2(mix_points.coord[:, 2], mix_rho)
299 |         mix_pitch = torch.clamp(mix_pitch, self.pitch_angles[0] + 1e-5,
300 |                                 self.pitch_angles[1] - 1e-5)
301 | 
302 |         num_areas = np.random.choice(self.num_areas, size=1)[0]
303 |         angle_list = np.linspace(self.pitch_angles[1], self.pitch_angles[0],
304 |                                  num_areas + 1)
305 |         out_points = []
306 |         out_pts_semantic_mask = []
307 | 
308 |         mix_panoptic = False
309 |         if 'pts_instance_mask' in mix_results:
310 |             mix_instance_mask = mix_results['pts_instance_mask']
311 |             mix_instance_mask += (1000<<16) # not overlap id
312 |             pts_instance_mask = input_dict['pts_instance_mask']
313 |             out_pts_instance_mask = []
314 |             mix_panoptic = True
315 | 
316 |         for i in range(num_areas):
317 |             # convert angle to radian
318 |             start_angle = angle_list[i + 1] / 180 * np.pi
319 |             end_angle = angle_list[i] / 180 * np.pi
320 |             if i % 2 == 0:  # pick from original point cloud
321 |                 idx = (pitch > start_angle) & (pitch <= end_angle)
322 |                 out_points.append(points[idx])
323 |                 out_pts_semantic_mask.append(pts_semantic_mask[idx.numpy()])
324 |                 if mix_panoptic:
325 |                     out_pts_instance_mask.append(pts_instance_mask[idx.numpy()])
326 |             else:  # pickle from mixed point cloud
327 |                 idx = (mix_pitch > start_angle) & (mix_pitch <= end_angle)
328 |                 out_points.append(mix_points[idx])
329 |                 out_pts_semantic_mask.append(
330 |                     mix_pts_semantic_mask[idx.numpy()])
331 |                 if mix_panoptic:
332 |                     out_pts_instance_mask.append(mix_instance_mask[idx.numpy()])
333 |         out_points = points.cat(out_points)
334 |         out_pts_semantic_mask = np.concatenate(out_pts_semantic_mask, axis=0)
335 |         input_dict['points'] = out_points
336 |         input_dict['pts_semantic_mask'] = out_pts_semantic_mask
337 | 
338 |         if mix_panoptic:
339 |             out_pts_instance_mask = np.concatenate(out_pts_instance_mask, axis=0)
340 |             input_dict['pts_instance_mask'] = out_pts_instance_mask
341 |         return input_dict
342 | 
343 |     def transform(self, input_dict: dict) -> dict:
344 |         """LaserMix transform function.
345 | 
346 |         Args:
347 |             input_dict (dict): Result dict from loading pipeline.
348 | 
349 |         Returns:
350 |             dict: output dict after transformation.
351 |         """
352 |         if np.random.rand() > self.prob:
353 |             return input_dict
354 | 
355 |         assert 'dataset' in input_dict, \
356 |             '`dataset` is needed to pass through LaserMix, while not found.'
357 |         dataset = input_dict['dataset']
358 | 
359 |         # get index of other point cloud
360 |         index = np.random.randint(0, len(dataset))
361 | 
362 |         mix_results = dataset.get_data_info(index)
363 | 
364 |         if self.pre_transform is not None:
365 |             # pre_transform may also require dataset
366 |             mix_results.update({'dataset': dataset})
367 |             # before lasermix need to go through
368 |             # the necessary pre_transform
369 |             mix_results = self.pre_transform(mix_results)
370 |             mix_results.pop('dataset')
371 | 
372 |         input_dict = self.laser_mix_transform(input_dict, mix_results)
373 | 
374 |         return input_dict
375 | 
376 |     def __repr__(self) -> str:
377 |         """str: Return a string that describes the module."""
378 |         repr_str = self.__class__.__name__
379 |         repr_str += f'(num_areas={self.num_areas}, '
380 |         repr_str += f'pitch_angles={self.pitch_angles}, '
381 |         repr_str += f'pre_transform={self.pre_transform}, '
382 |         repr_str += f'prob={self.prob})'
383 |         return repr_str
384 | 


--------------------------------------------------------------------------------
/dist_test.sh:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env bash
 2 | 
 3 | CONFIG=$1
 4 | CHECKPOINT=$2
 5 | GPUS=$3
 6 | NNODES=${NNODES:-1}
 7 | NODE_RANK=${NODE_RANK:-0}
 8 | PORT=${PORT:-29500}
 9 | MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"}
10 | 
11 | PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
12 | python -m torch.distributed.launch \
13 |     --nnodes=$NNODES \
14 |     --node_rank=$NODE_RANK \
15 |     --master_addr=$MASTER_ADDR \
16 |     --nproc_per_node=$GPUS \
17 |     --master_port=$PORT \
18 |     $(dirname "$0")/test.py \
19 |     $CONFIG \
20 |     $CHECKPOINT \
21 |     --launcher pytorch \
22 |     ${@:4}
23 | 


--------------------------------------------------------------------------------
/dist_train.sh:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env bash
 2 | 
 3 | CONFIG=$1
 4 | GPUS=$2
 5 | NNODES=${NNODES:-1}
 6 | NODE_RANK=${NODE_RANK:-0}
 7 | PORT=${PORT:-29503}
 8 | MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"}
 9 | 
10 | PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
11 | python -m torch.distributed.launch \
12 |     --nnodes=$NNODES \
13 |     --node_rank=$NODE_RANK \
14 |     --master_addr=$MASTER_ADDR \
15 |     --nproc_per_node=$GPUS \
16 |     --master_port=$PORT \
17 |     $(dirname "$0")/train.py \
18 |     $CONFIG \
19 |     --launcher pytorch ${@:3} \
20 | 


--------------------------------------------------------------------------------
/evaluation/functional/panoptic_seg_eval.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) OpenMMLab. All rights reserved.
  2 | # Modified from mmdetection3d.
  3 | from typing import Dict, List, Tuple
  4 | 
  5 | import numpy as np
  6 | from mmengine.logging import MMLogger, print_log
  7 | 
  8 | PQReturnsType = Tuple[np.double, np.double, np.ndarray, np.ndarray, np.ndarray]
  9 | 
 10 | 
 11 | class EvalPanoptic:
 12 |     r"""Evaluate panoptic results for Semantickitti and NuScenes.
 13 |     Please refer to the `semantic kitti api
 14 |     <https://github.com/PRBonn/semantic-kitti-api/>`_ for more details
 15 | 
 16 |     Args:
 17 |         classes (list): Classes used in the dataset.
 18 |         thing_classes (list): Thing classes used in the dataset.
 19 |         stuff_classes (list): Stuff classes used in the dataset.
 20 |         include (list): Include classes in the dataset.
 21 |         dataset_type (str): Type of dataset.
 22 |         min_num_points (int): Minimum number of points of an object to be
 23 |             counted as ground truth in evaluation.
 24 |         id_offset (int): Offset for instance ids to concat with
 25 |             semantic labels.
 26 |         label2cat (dict[str]): Mapping from label to category.
 27 |         ignore_index (int): Indices of ignored classes in evaluation.
 28 |         logger (logging.Logger | str, optional): Logger used for printing.
 29 |             Defaults to None.
 30 |     """
 31 | 
 32 |     def __init__(self,
 33 |                  classes: List[str],
 34 |                  thing_classes: List[str],
 35 |                  stuff_classes: List[str],
 36 |                  include: List[int],
 37 |                  dataset_type: str,
 38 |                  min_num_points: int,
 39 |                  id_offset: int,
 40 |                  label2cat: Dict[str, str],
 41 |                  ignore_index: int,
 42 |                  logger: MMLogger = None):
 43 |         self.classes = classes
 44 |         self.thing_classes = thing_classes
 45 |         self.stuff_classes = stuff_classes
 46 |         self.include = include
 47 |         self.dataset_type = dataset_type
 48 |         self.ignore_index = ignore_index
 49 |         self.num_classes = len(classes)
 50 |         self.print_ignore_label=True
 51 |         if len(include) == self.num_classes:
 52 |             self.num_classes += 1
 53 |             self.print_ignore_label=False
 54 |         self.label2cat = label2cat
 55 |         self.logger = logger
 56 |         self.id_offset = id_offset
 57 |         self.eps = 1e-15
 58 |         self.min_num_points = min_num_points
 59 |         self.reset()
 60 | 
 61 |     def reset(self):
 62 |         """Reset class variables."""
 63 |         # general things
 64 |         # iou stuff
 65 |         self.confusion_matrix = np.zeros((self.num_classes, self.num_classes),
 66 |                                          dtype=int)
 67 |         # panoptic stuff
 68 |         self.pan_tp = np.zeros(self.num_classes, dtype=int)
 69 |         self.pan_iou = np.zeros(self.num_classes, dtype=np.double)
 70 |         self.pan_fp = np.zeros(self.num_classes, dtype=int)
 71 |         self.pan_fn = np.zeros(self.num_classes, dtype=int)
 72 | 
 73 |         self.evaluated_fnames = []
 74 | 
 75 |     def evaluate(self, gt_labels: List[Dict[str, np.ndarray]],
 76 |                  seg_preds: List[Dict[str, np.ndarray]]) -> Dict[str, float]:
 77 |         """Evaluate the predictions.
 78 | 
 79 |         Args:
 80 |             gt_labels (list[dict[np.ndarray]]): Ground Truth.
 81 |             seg_preds (list[dict[np.ndarray]]): Predictions.
 82 | 
 83 |         Returns:
 84 |             dict[float]: The computed metrics. The keys are the names of
 85 |             the metrics, and the values are corresponding results.
 86 |         """
 87 |         assert len(seg_preds) == len(gt_labels)
 88 |         for f in range(len(seg_preds)):
 89 |             gt_semantic_seg = gt_labels[f]['pts_semantic_mask'].astype(int)
 90 |             gt_instance_seg = gt_labels[f]['pts_instance_mask'].astype(int)
 91 |             pred_semantic_seg = seg_preds[f]['pts_semantic_mask'].astype(int)
 92 |             pred_instance_seg = seg_preds[f]['pts_instance_mask'].astype(int)
 93 | 
 94 |             valid_mask = gt_semantic_seg != self.ignore_index
 95 |             gt_semantic_seg = gt_semantic_seg[valid_mask]
 96 |             gt_instance_seg = gt_instance_seg[valid_mask]
 97 |             pred_semantic_seg = pred_semantic_seg[valid_mask]
 98 |             pred_instance_seg = pred_instance_seg[valid_mask]
 99 | 
100 |             self.add_semantic_sample(pred_semantic_seg, gt_semantic_seg)
101 |             self.add_panoptic_sample(pred_semantic_seg, gt_semantic_seg,
102 |                                      pred_instance_seg, gt_instance_seg)
103 | 
104 |         result_dicts = self.print_results()
105 | 
106 |         return result_dicts
107 | 
108 |     def print_results(self) -> Dict[str, float]:
109 |         """Print results.
110 | 
111 |         Returns:
112 |             dict[float]: The computed metrics. The keys are the names of
113 |             the metrics, and the values are corresponding results.
114 |         """
115 |         pq, sq, rq, all_pq, all_sq, all_rq = self.get_pq()
116 |         miou, iou = self.get_iou()
117 | 
118 |         # now make a nice dictionary
119 |         output_dict = {}
120 | 
121 |         # make python variables
122 |         pq = pq.item()
123 |         sq = sq.item()
124 |         rq = rq.item()
125 |         all_pq = all_pq.flatten().tolist()
126 |         all_sq = all_sq.flatten().tolist()
127 |         all_rq = all_rq.flatten().tolist()
128 |         miou = miou.item()
129 |         iou = iou.flatten().tolist()
130 | 
131 |         output_dict['all'] = {}
132 |         output_dict['all']['pq'] = pq
133 |         output_dict['all']['sq'] = sq
134 |         output_dict['all']['rq'] = rq
135 |         output_dict['all']['miou'] = miou
136 |         for idx, (_pq, _sq, _rq,
137 |                   _iou) in enumerate(zip(all_pq, all_sq, all_rq, iou)):
138 |             class_str = self.classes[idx]
139 |             output_dict[class_str] = {}
140 |             output_dict[class_str]['pq'] = _pq
141 |             output_dict[class_str]['sq'] = _sq
142 |             output_dict[class_str]['rq'] = _rq
143 |             output_dict[class_str]['miou'] = _iou
144 | 
145 |         pq_dagger = np.mean(
146 |             [float(output_dict[c]['pq']) for c in self.thing_classes] +
147 |             [float(output_dict[c]['miou']) for c in self.stuff_classes])
148 | 
149 |         pq_things = np.mean(
150 |             [float(output_dict[c]['pq']) for c in self.thing_classes])
151 |         rq_things = np.mean(
152 |             [float(output_dict[c]['rq']) for c in self.thing_classes])
153 |         sq_things = np.mean(
154 |             [float(output_dict[c]['sq']) for c in self.thing_classes])
155 | 
156 |         pq_stuff = np.mean(
157 |             [float(output_dict[c]['pq']) for c in self.stuff_classes])
158 |         rq_stuff = np.mean(
159 |             [float(output_dict[c]['rq']) for c in self.stuff_classes])
160 |         sq_stuff = np.mean(
161 |             [float(output_dict[c]['sq']) for c in self.stuff_classes])
162 | 
163 |         result_dicts = {}
164 |         result_dicts['pq'] = float(pq)
165 |         result_dicts['pq_dagger'] = float(pq_dagger)
166 |         result_dicts['sq_mean'] = float(sq)
167 |         result_dicts['rq_mean'] = float(rq)
168 |         result_dicts['miou'] = float(miou)
169 |         result_dicts['pq_stuff'] = float(pq_stuff)
170 |         result_dicts['rq_stuff'] = float(rq_stuff)
171 |         result_dicts['sq_stuff'] = float(sq_stuff)
172 |         result_dicts['pq_things'] = float(pq_things)
173 |         result_dicts['rq_things'] = float(rq_things)
174 |         result_dicts['sq_things'] = float(sq_things)
175 | 
176 |         if self.logger is not None:
177 |             print_log('|        |   IoU   |   PQ   |   RQ   |  SQ   |',
178 |                       self.logger)
179 |             for k, v in output_dict.items():
180 |                 print_log(
181 |                     '|{}| {:.4f} | {:.4f} | {:.4f} | {:.4f} |'.format(
182 |                         k.ljust(8)[-8:], v['miou'], v['pq'], v['rq'], v['sq']),
183 |                     self.logger)
184 |             print_log('True Positive: ', self.logger)
185 |             print_log('\t|\t'.join([str(x) for x in self.pan_tp]), self.logger)
186 |             print_log('False Positive: ')
187 |             print_log('\t|\t'.join([str(x) for x in self.pan_fp]), self.logger)
188 |             print_log('False Negative: ')
189 |             print_log('\t|\t'.join([str(x) for x in self.pan_fn]), self.logger)
190 | 
191 |         else:
192 |             print('|        |   IoU   |   PQ   |   RQ   |  SQ   |')
193 |             for k, v in output_dict.items():
194 |                 print('|{}| {:.4f} | {:.4f} | {:.4f} | {:.4f} |'.format(
195 |                     k.ljust(8)[-8:], v['miou'], v['pq'], v['rq'], v['sq']))
196 |             print('True Positive: ')
197 |             print('\t|\t'.join([str(x) for x in self.pan_tp]))
198 |             print('False Positive: ')
199 |             print('\t|\t'.join([str(x) for x in self.pan_fp]))
200 |             print('False Negative: ')
201 |             print('\t|\t'.join([str(x) for x in self.pan_fn]))
202 | 
203 |         return result_dicts
204 | 
205 |     def get_pq(self) -> PQReturnsType:
206 |         """Get results of PQ metric.
207 | 
208 |         Returns:
209 |             tuple(np.ndarray): PQ, SQ, RQ of each class and all class.
210 |         """
211 |         # get PQ and first calculate for all classes
212 |         sq_all = self.pan_iou.astype(np.double) / np.maximum(
213 |             self.pan_tp.astype(np.double), self.eps)
214 |         rq_all = self.pan_tp.astype(np.double) / np.maximum(
215 |             self.pan_tp.astype(np.double) + 0.5 * self.pan_fp.astype(np.double)
216 |             + 0.5 * self.pan_fn.astype(np.double), self.eps)
217 |         pq_all = sq_all * rq_all
218 | 
219 |         # then do the REAL mean (no ignored classes)
220 |         sq = sq_all[self.include].mean()
221 |         rq = rq_all[self.include].mean()
222 |         pq = pq_all[self.include].mean()
223 | 
224 |         if not self.print_ignore_label:
225 |             sq_all = sq_all[self.include]
226 |             rq_all = rq_all[self.include]
227 |             pq_all = pq_all[self.include]
228 | 
229 |         return (pq, sq, rq, pq_all, sq_all, rq_all)
230 | 
231 |     def get_iou(self) -> Tuple[np.double, np.ndarray]:
232 |         """Get results of IOU metric.
233 | 
234 |         Returns:
235 |             tuple(np.ndarray): iou of all class and each class.
236 |         """
237 |         tp, fp, fn = self.get_iou_stats()
238 |         intersection = tp
239 |         union = tp + fp + fn
240 |         union = np.maximum(union, self.eps)
241 |         iou = intersection.astype(np.double) / union.astype(np.double)
242 |         iou_mean = (intersection[self.include].astype(np.double) /
243 |                     union[self.include].astype(np.double)).mean()
244 | 
245 |         if not self.print_ignore_label:
246 |             iou = iou[self.include]
247 |         return iou_mean, iou
248 | 
249 |     def get_iou_stats(self) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
250 |         """Get IOU statistics of TP, FP and FN.
251 | 
252 |         Returns:
253 |             tuple(np.ndarray): TP, FP, FN of all class.
254 |         """
255 |         # copy to avoid modifying the real deal
256 |         conf = self.confusion_matrix.copy().astype(np.double)
257 |         # remove fp from confusion on the ignore classes predictions
258 |         # points that were predicted of another class, but were ignore
259 |         # (corresponds to zeroing the cols of those classes,
260 |         # since the predictions go on the rows)
261 | 
262 |         # get the clean stats
263 |         tp = conf.diagonal()
264 |         fp = conf.sum(axis=1) - tp
265 |         fn = conf.sum(axis=0) - tp
266 |         return tp, fp, fn
267 | 
268 |     def add_semantic_sample(self, semantic_preds: np.ndarray,
269 |                             gt_semantics: np.ndarray):
270 |         """Add one batch of semantic predictions and ground truths.
271 | 
272 |         Args:
273 |             semantic_preds (np.ndarray): Semantic predictions.
274 |             gt_semantics (np.ndarray): Semantic ground truths.
275 |         """
276 |         idxs = np.stack([semantic_preds, gt_semantics], axis=0)
277 |         # make confusion matrix (cols = gt, rows = pred)
278 |         np.add.at(self.confusion_matrix, tuple(idxs), 1)
279 | 
280 |     def add_panoptic_sample(self, semantic_preds: np.ndarray,
281 |                             gt_semantics: np.ndarray,
282 |                             instance_preds: np.ndarray,
283 |                             gt_instances: np.ndarray):
284 |         """Add one sample of panoptic predictions and ground truths for
285 |         evaluation.
286 | 
287 |         Args:
288 |             semantic_preds (np.ndarray): Semantic predictions.
289 |             gt_semantics (np.ndarray): Semantic ground truths.
290 |             instance_preds (np.ndarray): Instance predictions.
291 |             gt_instances (np.ndarray): Instance ground truths.
292 |         """
293 |         # avoid zero (ignored label)
294 |         instance_preds = instance_preds + 1
295 | 
296 |         if self.dataset_type == 'nuscenes':
297 |             gt_instances = gt_instances % self.id_offset + gt_semantics * self.id_offset
298 |         elif self.dataset_type == 'semantickitti':
299 |             gt_instances = gt_instances // self.id_offset * self.id_offset + gt_semantics
300 |         gt_instances = gt_instances + 1
301 | 
302 |         # first step is to count intersections > 0.5 IoU
303 |         # for each class (except the ignored ones)
304 |         for cl in range(self.num_classes):
305 |             # get a class mask
306 |             pred_inst_in_cl_mask = semantic_preds == cl
307 |             gt_inst_in_cl_mask = gt_semantics == cl
308 | 
309 |             # get instance points in class (makes outside stuff 0)
310 |             pred_inst_in_cl = instance_preds * pred_inst_in_cl_mask.astype(int)
311 |             gt_inst_in_cl = gt_instances * gt_inst_in_cl_mask.astype(int)
312 | 
313 |             # generate the areas for each unique instance prediction
314 |             unique_pred, counts_pred = np.unique(
315 |                 pred_inst_in_cl[pred_inst_in_cl > 0], return_counts=True)
316 |             id2idx_pred = {id: idx for idx, id in enumerate(unique_pred)}
317 |             matched_pred = np.array([False] * unique_pred.shape[0])
318 | 
319 |             # generate the areas for each unique instance gt_np
320 |             unique_gt, counts_gt = np.unique(
321 |                 gt_inst_in_cl[gt_inst_in_cl > 0], return_counts=True)
322 |             id2idx_gt = {id: idx for idx, id in enumerate(unique_gt)}
323 |             matched_gt = np.array([False] * unique_gt.shape[0])
324 | 
325 |             # generate intersection using offset
326 |             valid_combos = np.logical_and(pred_inst_in_cl > 0,
327 |                                           gt_inst_in_cl > 0)
328 |             id_offset_combo = pred_inst_in_cl[
329 |                 valid_combos] + self.id_offset * gt_inst_in_cl[valid_combos]
330 |             unique_combo, counts_combo = np.unique(
331 |                 id_offset_combo, return_counts=True)
332 | 
333 |             # generate an intersection map
334 |             # count the intersections with over 0.5 IoU as TP
335 |             gt_labels = unique_combo // self.id_offset
336 |             pred_labels = unique_combo % self.id_offset
337 |             gt_areas = np.array([counts_gt[id2idx_gt[id]] for id in gt_labels])
338 |             pred_areas = np.array(
339 |                 [counts_pred[id2idx_pred[id]] for id in pred_labels])
340 |             intersections = counts_combo
341 |             unions = gt_areas + pred_areas - intersections
342 |             ious = intersections.astype(float) / unions.astype(float)
343 | 
344 |             tp_indexes = ious > 0.5
345 |             self.pan_tp[cl] += np.sum(tp_indexes)
346 |             self.pan_iou[cl] += np.sum(ious[tp_indexes])
347 | 
348 |             matched_gt[[id2idx_gt[id] for id in gt_labels[tp_indexes]]] = True
349 |             matched_pred[[id2idx_pred[id]
350 |                           for id in pred_labels[tp_indexes]]] = True
351 | 
352 |             # count the FN
353 |             if len(counts_gt) > 0:
354 |                 self.pan_fn[cl] += np.sum(
355 |                     np.logical_and(counts_gt >= self.min_num_points,
356 |                                    ~matched_gt))
357 | 
358 |             # count the FP
359 |             if len(matched_pred) > 0:
360 |                 self.pan_fp[cl] += np.sum(
361 |                     np.logical_and(counts_pred >= self.min_num_points,
362 |                                    ~matched_pred))
363 | 
364 | 
365 | def panoptic_seg_eval(gt_labels: List[np.ndarray],
366 |                       seg_preds: List[np.ndarray],
367 |                       classes: List[str],
368 |                       thing_classes: List[str],
369 |                       stuff_classes: List[str],
370 |                       include: List[int],
371 |                       dataset_type: str,
372 |                       min_num_points: int,
373 |                       id_offset: int,
374 |                       label2cat: Dict[str, str],
375 |                       ignore_index: int,
376 |                       logger: MMLogger = None) -> Dict[str, float]:
377 |     """Panoptic Segmentation Evaluation.
378 | 
379 |     Evaluate the result of the panoptic segmentation.
380 | 
381 |     Args:
382 |         gt_labels (list[dict[np.ndarray]]): Ground Truth.
383 |         seg_preds (list[dict[np.ndarray]]): Predictions.
384 |         classes (list[str]): Classes used in the dataset.
385 |         thing_classes (list[str]): Thing classes used in the dataset.
386 |         stuff_classes (list[str]): Stuff classes used in the dataset.
387 |         include (list): Include classes in the dataset.
388 |         dataset_type (str): Type of dataset.
389 |         min_num_points (int): Minimum point number of object to be
390 |             counted as ground truth in evaluation.
391 |         id_offset (int): Offset for instance ids to concat with
392 |             semantic labels.
393 |         label2cat (dict[str]): Mapping from label to category.
394 |         ignore_index (int): Indices of ignored classes in evaluation.
395 |         logger (logging.Logger | str, optional): Logger used for printing.
396 |             Defaults to None.
397 | 
398 |     Returns:
399 |         dict[float]: Dict of results.
400 |     """
401 |     panoptic_seg_eval = EvalPanoptic(classes, thing_classes, stuff_classes, include, dataset_type,
402 |                                      min_num_points, id_offset, label2cat,
403 |                                      ignore_index, logger)
404 |     ret_dict = panoptic_seg_eval.evaluate(gt_labels, seg_preds)
405 |     return ret_dict
406 | 


--------------------------------------------------------------------------------
/evaluation/metrics/panoptic_seg_metric.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) OpenMMLab. All rights reserved.
  2 | # Modified from mmdetection3d.
  3 | from typing import Dict, List, Optional
  4 | 
  5 | from mmengine.logging import MMLogger
  6 | import mmengine
  7 | from ..functional.panoptic_seg_eval import panoptic_seg_eval
  8 | from mmdet3d.registry import METRICS
  9 | from mmdet3d.evaluation import SegMetric
 10 | import os
 11 | import json
 12 | import numpy as np
 13 | 
 14 | 
 15 | @METRICS.register_module()
 16 | class _PanopticSegMetric(SegMetric):
 17 |     def __init__(self,
 18 |                  thing_class_inds: List[int],
 19 |                  stuff_class_inds: List[int],
 20 |                  min_num_points: int,
 21 |                  id_offset: int,
 22 |                  dataset_type: str,
 23 |                  collect_device: str = 'cpu',
 24 |                  prefix: Optional[str] = None,
 25 |                  pklfile_prefix: str = None,
 26 |                  submission_prefix: str = None,
 27 |                  taskset: str = None,
 28 |                  learning_map_inv = None, 
 29 |                  **kwargs):
 30 |         self.thing_class_inds = thing_class_inds
 31 |         self.stuff_class_inds = stuff_class_inds
 32 |         self.min_num_points = min_num_points
 33 |         self.id_offset = id_offset
 34 |         self.dataset_type=dataset_type
 35 |         self.taskset = taskset
 36 |         self.learning_map_inv = learning_map_inv
 37 | 
 38 |         super(_PanopticSegMetric, self).__init__(
 39 |             pklfile_prefix=pklfile_prefix,
 40 |             submission_prefix=submission_prefix,
 41 |             prefix=prefix,
 42 |             collect_device=collect_device,
 43 |             **kwargs)
 44 | 
 45 |     # TODO modify format_result for panoptic segmentation evaluation, \
 46 |     # different datasets have different needs.
 47 | 
 48 |     def compute_metrics(self, results: list) -> Dict[str, float]:
 49 |         """Compute the metrics from processed results.
 50 | 
 51 |         Args:
 52 |             results (list): The processed results of each batch.
 53 | 
 54 |         Returns:
 55 |             Dict[str, float]: The computed metrics. The keys are the names of
 56 |             the metrics, and the values are corresponding results.
 57 |         """
 58 |         logger: MMLogger = MMLogger.get_current_instance()
 59 | 
 60 |         if self.submission_prefix: # will report a error when finishing.
 61 |             self.format_results(results)
 62 |             return None
 63 | 
 64 |         label2cat = self.dataset_meta['label2cat']
 65 |         ignore_index = self.dataset_meta['ignore_index']
 66 |         classes = self.dataset_meta['classes']
 67 |         thing_classes = [classes[i] for i in self.thing_class_inds]
 68 |         stuff_classes = [classes[i] for i in self.stuff_class_inds]
 69 |         include = self.thing_class_inds + self.stuff_class_inds
 70 | 
 71 |         gt_labels = []
 72 |         seg_preds = []
 73 |         for eval_ann, sinlge_pred_results in results:
 74 |             gt_labels.append(eval_ann)
 75 |             seg_preds.append(sinlge_pred_results)
 76 | 
 77 |         ret_dict = panoptic_seg_eval(gt_labels, seg_preds, classes,
 78 |                                      thing_classes, stuff_classes, include, self.dataset_type,
 79 |                                      self.min_num_points, self.id_offset,
 80 |                                      label2cat, ignore_index, logger)
 81 | 
 82 |         return ret_dict
 83 | 
 84 |     def format_results(self, results):
 85 |         r"""Format the results to txt file. Refer to `ScanNet documentation
 86 |         <http://kaldir.vc.in.tum.de/scannet_benchmark/documentation>`_.
 87 | 
 88 |         Args:
 89 |             outputs (list[dict]): Testing results of the dataset.
 90 | 
 91 |         Returns:
 92 |             tuple: (outputs, tmp_dir), outputs is the detection results,
 93 |                 tmp_dir is the temporal directory created for saving submission
 94 |                 files when ``submission_prefix`` is not specified.
 95 |         """
 96 | 
 97 |         submission_prefix = self.submission_prefix
 98 |         mmengine.mkdir_or_exist(submission_prefix)
 99 |         ignore_index = self.dataset_meta['ignore_index']
100 | 
101 | 
102 |         if self.dataset_type == 'nuscenes':
103 |             meta_dir = os.path.join(submission_prefix, self.taskset)
104 |             mmengine.mkdir_or_exist(meta_dir)
105 |             meta =  {"meta": {
106 |                 "task": "segmentation",
107 |                 "use_camera": False,
108 |                 "use_lidar": True,
109 |                 "use_radar": False,
110 |                 "use_map": False,
111 |                 "use_external": False}}
112 |             output = open(os.path.join(submission_prefix, self.taskset, 'submission.json'), 'w')
113 |             json_meta = json.dumps(meta)
114 |             output.write(json_meta)
115 |             output.close()
116 | 
117 |             for i, (eval_ann, result) in enumerate(results):
118 |                 sample_token = eval_ann['token']
119 |                 results_dir = os.path.join(submission_prefix, 'panoptic', self.taskset)
120 |                 mmengine.mkdir_or_exist(results_dir)
121 |                 results_dir = os.path.join(results_dir, sample_token) 
122 |                 pred_semantic_mask = result['pts_semantic_mask']
123 |                 pred_instance_mask = result['pts_instance_mask']
124 |                 pred_panoptic_mask = (pred_instance_mask + pred_semantic_mask*self.id_offset).astype(np.uint16)
125 |                 curr_file = results_dir +  "_panoptic.npz"
126 |                 np.savez_compressed(curr_file, data=pred_panoptic_mask)
127 |         elif self.dataset_type == "semantickitti":
128 |             results_dir = os.path.join(submission_prefix, 'sequences')
129 |             mmengine.mkdir_or_exist(results_dir)
130 |             for i in range(0,22):
131 |                 sub_dir = os.path.join(results_dir, str(i).zfill(2), 'predictions')
132 |                 mmengine.mkdir_or_exist(sub_dir)
133 | 
134 |             learning_map_inv_array = np.zeros(len(self.learning_map_inv))
135 |             for i, v in enumerate(self.learning_map_inv.values()):
136 |                 learning_map_inv_array[i] = v
137 |             for i, (eval_ann, result) in enumerate(results):
138 |                 semantic_preds = result['pts_semantic_mask']
139 |                 instance_preds = result['pts_instance_mask']
140 |                 semantic_preds_inv = learning_map_inv_array[semantic_preds]
141 |                 panoptic_preds = semantic_preds_inv.reshape(-1, 1) + (instance_preds * self.id_offset).reshape(-1, 1)
142 | 
143 |                 lidar_path = eval_ann['lidar_path']
144 |                 seq = lidar_path.split('/')[-3]
145 |                 pts_fname = lidar_path.split('/')[-1].split('.')[-2]+'.label'
146 |                 fname = os.path.join(results_dir, seq, 'predictions', pts_fname)
147 |                 panoptic_preds.reshape(-1).astype(np.uint32).tofile(fname)
148 |         else:
149 |             raise NotImplementedError()


--------------------------------------------------------------------------------
/p3former/backbones/cylinder3d.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) OpenMMLab. All rights reserved.
  2 | r"""Modified from Cylinder3D.
  3 | 
  4 | Please refer to `Cylinder3D github page
  5 | <https://github.com/xinge008/Cylinder3D>`_ for details
  6 | """
  7 | 
  8 | from typing import List, Optional
  9 | 
 10 | import numpy as np
 11 | import torch
 12 | from mmcv.cnn import build_activation_layer, build_norm_layer
 13 | from mmcv.ops import (SparseConv3d, SparseConvTensor, SparseInverseConv3d,
 14 |                       SubMConv3d)
 15 | from mmengine.model import BaseModule
 16 | 
 17 | from mmdet3d.registry import MODELS
 18 | from mmdet3d.utils import ConfigType
 19 | 
 20 | 
 21 | class AsymmResBlock(BaseModule):
 22 |     """Asymmetrical Residual Block.
 23 | 
 24 |     Args:
 25 |         in_channels (int): Input channels of the block.
 26 |         out_channels (int): Output channels of the block.
 27 |         norm_cfg (:obj:`ConfigDict` or dict): Config dict for
 28 |             normalization layer.
 29 |         act_cfg (:obj:`ConfigDict` or dict): Config dict of activation layers.
 30 |             Defaults to dict(type='LeakyReLU').
 31 |         indice_key (str, optional): Name of indice tables. Defaults to None.
 32 |     """
 33 | 
 34 |     def __init__(self,
 35 |                  in_channels: int,
 36 |                  out_channels: int,
 37 |                  norm_cfg: ConfigType,
 38 |                  act_cfg: ConfigType = dict(type='LeakyReLU'),
 39 |                  indice_key: Optional[str] = None):
 40 |         super().__init__()
 41 | 
 42 |         self.conv0_0 = SubMConv3d(
 43 |             in_channels,
 44 |             out_channels,
 45 |             kernel_size=(1, 3, 3),
 46 |             padding=1,
 47 |             bias=False,
 48 |             indice_key=indice_key + 'bef')
 49 |         self.act0_0 = build_activation_layer(act_cfg)
 50 |         self.bn0_0 = build_norm_layer(norm_cfg, out_channels)[1]
 51 | 
 52 |         self.conv0_1 = SubMConv3d(
 53 |             out_channels,
 54 |             out_channels,
 55 |             kernel_size=(3, 1, 3),
 56 |             padding=1,
 57 |             bias=False,
 58 |             indice_key=indice_key + 'bef')
 59 |         self.act0_1 = build_activation_layer(act_cfg)
 60 |         self.bn0_1 = build_norm_layer(norm_cfg, out_channels)[1]
 61 | 
 62 |         self.conv1_0 = SubMConv3d(
 63 |             in_channels,
 64 |             out_channels,
 65 |             kernel_size=(3, 1, 3),
 66 |             padding=1,
 67 |             bias=False,
 68 |             indice_key=indice_key + 'bef')
 69 |         self.act1_0 = build_activation_layer(act_cfg)
 70 |         self.bn1_0 = build_norm_layer(norm_cfg, out_channels)[1]
 71 | 
 72 |         self.conv1_1 = SubMConv3d(
 73 |             out_channels,
 74 |             out_channels,
 75 |             kernel_size=(1, 3, 3),
 76 |             padding=1,
 77 |             bias=False,
 78 |             indice_key=indice_key + 'bef')
 79 |         self.act1_1 = build_activation_layer(act_cfg)
 80 |         self.bn1_1 = build_norm_layer(norm_cfg, out_channels)[1]
 81 | 
 82 |     def forward(self, x: SparseConvTensor) -> SparseConvTensor:
 83 |         """Forward pass."""
 84 |         shortcut = self.conv0_0(x)
 85 | 
 86 |         shortcut.features = self.act0_0(shortcut.features)
 87 |         shortcut.features = self.bn0_0(shortcut.features)
 88 | 
 89 |         shortcut = self.conv0_1(shortcut)
 90 |         shortcut.features = self.act0_1(shortcut.features)
 91 |         shortcut.features = self.bn0_1(shortcut.features)
 92 | 
 93 |         res = self.conv1_0(x)
 94 |         res.features = self.act1_0(res.features)
 95 |         res.features = self.bn1_0(res.features)
 96 | 
 97 |         res = self.conv1_1(res)
 98 |         res.features = self.act1_1(res.features)
 99 |         res.features = self.bn1_1(res.features)
100 | 
101 |         res.features = res.features + shortcut.features
102 | 
103 |         return res
104 | 
105 | 
106 | class AsymmeDownBlock(BaseModule):
107 |     """Asymmetrical DownSample Block.
108 | 
109 |     Args:
110 |        in_channels (int): Input channels of the block.
111 |        out_channels (int): Output channels of the block.
112 |        norm_cfg (:obj:`ConfigDict` or dict): Config dict for
113 |             normalization layer.
114 |        act_cfg (:obj:`ConfigDict` or dict): Config dict of activation layers.
115 |             Defaults to dict(type='LeakyReLU').
116 |        pooling (bool): Whether pooling features at the end of
117 |            block. Defaults: True.
118 |        height_pooling (bool): Whether pooling features at
119 |            the height dimension. Defaults: False.
120 |        indice_key (str, optional): Name of indice tables. Defaults to None.
121 |     """
122 | 
123 |     def __init__(self,
124 |                  in_channels: int,
125 |                  out_channels: int,
126 |                  norm_cfg: ConfigType,
127 |                  act_cfg: ConfigType = dict(type='LeakyReLU'),
128 |                  pooling: bool = True,
129 |                  height_pooling: bool = False,
130 |                  indice_key: Optional[str] = None):
131 |         super().__init__()
132 |         self.pooling = pooling
133 | 
134 |         self.conv0_0 = SubMConv3d(
135 |             in_channels,
136 |             out_channels,
137 |             kernel_size=(3, 1, 3),
138 |             padding=1,
139 |             bias=False,
140 |             indice_key=indice_key + 'bef')
141 |         self.act0_0 = build_activation_layer(act_cfg)
142 |         self.bn0_0 = build_norm_layer(norm_cfg, out_channels)[1]
143 | 
144 |         self.conv0_1 = SubMConv3d(
145 |             out_channels,
146 |             out_channels,
147 |             kernel_size=(1, 3, 3),
148 |             padding=1,
149 |             bias=False,
150 |             indice_key=indice_key + 'bef')
151 |         self.act0_1 = build_activation_layer(act_cfg)
152 |         self.bn0_1 = build_norm_layer(norm_cfg, out_channels)[1]
153 | 
154 |         self.conv1_0 = SubMConv3d(
155 |             in_channels,
156 |             out_channels,
157 |             kernel_size=(1, 3, 3),
158 |             padding=1,
159 |             bias=False,
160 |             indice_key=indice_key + 'bef')
161 |         self.act1_0 = build_activation_layer(act_cfg)
162 |         self.bn1_0 = build_norm_layer(norm_cfg, out_channels)[1]
163 | 
164 |         self.conv1_1 = SubMConv3d(
165 |             out_channels,
166 |             out_channels,
167 |             kernel_size=(3, 1, 3),
168 |             padding=1,
169 |             bias=False,
170 |             indice_key=indice_key + 'bef')
171 |         self.act1_1 = build_activation_layer(act_cfg)
172 |         self.bn1_1 = build_norm_layer(norm_cfg, out_channels)[1]
173 | 
174 |         if pooling:
175 |             if height_pooling:
176 |                 self.pool = SparseConv3d(
177 |                     out_channels,
178 |                     out_channels,
179 |                     kernel_size=3,
180 |                     stride=2,
181 |                     padding=1,
182 |                     indice_key=indice_key,
183 |                     bias=False)
184 |             else:
185 |                 self.pool = SparseConv3d(
186 |                     out_channels,
187 |                     out_channels,
188 |                     kernel_size=3,
189 |                     stride=(2, 2, 1),
190 |                     padding=1,
191 |                     indice_key=indice_key,
192 |                     bias=False)
193 | 
194 |     def forward(self, x: SparseConvTensor) -> SparseConvTensor:
195 |         """Forward pass."""
196 |         shortcut = self.conv0_0(x)
197 |         shortcut.features = self.act0_0(shortcut.features)
198 |         shortcut.features = self.bn0_0(shortcut.features)
199 | 
200 |         shortcut = self.conv0_1(shortcut)
201 |         shortcut.features = self.act0_1(shortcut.features)
202 |         shortcut.features = self.bn0_1(shortcut.features)
203 | 
204 |         res = self.conv1_0(x)
205 |         res.features = self.act1_0(res.features)
206 |         res.features = self.bn1_0(res.features)
207 | 
208 |         res = self.conv1_1(res)
209 |         res.features = self.act1_1(res.features)
210 |         res.features = self.bn1_1(res.features)
211 | 
212 |         res.features = res.features + shortcut.features
213 | 
214 |         if self.pooling:
215 |             pooled_res = self.pool(res)
216 |             return pooled_res, res
217 |         else:
218 |             return res
219 | 
220 | 
221 | class AsymmeUpBlock(BaseModule):
222 |     """Asymmetrical UpSample Block.
223 | 
224 |     Args:
225 |         in_channels (int): Input channels of the block.
226 |         out_channels (int): Output channels of the block.
227 |         norm_cfg (:obj:`ConfigDict` or dict): Config dict for
228 |                 normalization layer.
229 |         act_cfg (:obj:`ConfigDict` or dict): Config dict of activation layers.
230 |                 Defaults to dict(type='LeakyReLU').
231 |         indice_key (str, optional): Name of indice tables. Defaults to None.
232 |         up_key (str, optional): Name of indice tables used in
233 |             SparseInverseConv3d. Defaults to None.
234 |     """
235 | 
236 |     def __init__(self,
237 |                  in_channels: int,
238 |                  out_channels: int,
239 |                  norm_cfg: ConfigType,
240 |                  act_cfg: ConfigType = dict(type='LeakyReLU'),
241 |                  indice_key: Optional[str] = None,
242 |                  up_key: Optional[str] = None):
243 |         super().__init__()
244 | 
245 |         self.trans_conv = SubMConv3d(
246 |             in_channels,
247 |             out_channels,
248 |             kernel_size=(3, 3, 3),
249 |             padding=1,
250 |             bias=False,
251 |             indice_key=indice_key + 'new_up')
252 |         self.trans_act = build_activation_layer(act_cfg)
253 |         self.trans_bn = build_norm_layer(norm_cfg, out_channels)[1]
254 | 
255 |         self.conv1 = SubMConv3d(
256 |             out_channels,
257 |             out_channels,
258 |             kernel_size=(1, 3, 3),
259 |             padding=1,
260 |             bias=False,
261 |             indice_key=indice_key)
262 |         self.act1 = build_activation_layer(act_cfg)
263 |         self.bn1 = build_norm_layer(norm_cfg, out_channels)[1]
264 | 
265 |         self.conv2 = SubMConv3d(
266 |             out_channels,
267 |             out_channels,
268 |             kernel_size=(3, 1, 3),
269 |             padding=1,
270 |             bias=False,
271 |             indice_key=indice_key)
272 |         self.act2 = build_activation_layer(act_cfg)
273 |         self.bn2 = build_norm_layer(norm_cfg, out_channels)[1]
274 | 
275 |         self.conv3 = SubMConv3d(
276 |             out_channels,
277 |             out_channels,
278 |             kernel_size=(3, 3, 3),
279 |             padding=1,
280 |             bias=False,
281 |             indice_key=indice_key)
282 |         self.act3 = build_activation_layer(act_cfg)
283 |         self.bn3 = build_norm_layer(norm_cfg, out_channels)[1]
284 | 
285 |         self.up_subm = SparseInverseConv3d(
286 |             out_channels,
287 |             out_channels,
288 |             kernel_size=3,
289 |             indice_key=up_key,
290 |             bias=False)
291 | 
292 |     def forward(self, x: SparseConvTensor,
293 |                 skip: SparseConvTensor) -> SparseConvTensor:
294 |         """Forward pass."""
295 |         x_trans = self.trans_conv(x)
296 |         x_trans.features = self.trans_act(x_trans.features)
297 |         x_trans.features = self.trans_bn(x_trans.features)
298 | 
299 |         # upsample
300 |         up = self.up_subm(x_trans)
301 | 
302 |         up.features = up.features + skip.features
303 | 
304 |         up = self.conv1(up)
305 |         up.features = self.act1(up.features)
306 |         up.features = self.bn1(up.features)
307 | 
308 |         up = self.conv2(up)
309 |         up.features = self.act2(up.features)
310 |         up.features = self.bn2(up.features)
311 | 
312 |         up = self.conv3(up)
313 |         up.features = self.act3(up.features)
314 |         up.features = self.bn3(up.features)
315 | 
316 |         return up
317 | 
318 | 
319 | class DDCMBlock(BaseModule):
320 |     """Dimension-Decomposition based Context Modeling.
321 | 
322 |     Args:
323 |         in_channels (int): Input channels of the block.
324 |         out_channels (int): Output channels of the block.
325 |         norm_cfg (:obj:`ConfigDict` or dict): Config dict for
326 |             normalization layer.
327 |         act_cfg (:obj:`ConfigDict` or dict): Config dict of activation layers.
328 |             Defaults to dict(type='Sigmoid').
329 |         indice_key (str, optional): Name of indice tables. Defaults to None.
330 |     """
331 | 
332 |     def __init__(self,
333 |                  in_channels: int,
334 |                  out_channels: int,
335 |                  norm_cfg: ConfigType,
336 |                  act_cfg: ConfigType = dict(type='Sigmoid'),
337 |                  indice_key: Optional[str] = None):
338 |         super().__init__()
339 | 
340 |         self.conv1 = SubMConv3d(
341 |             in_channels,
342 |             out_channels,
343 |             kernel_size=(3, 1, 1),
344 |             padding=1,
345 |             bias=False,
346 |             indice_key=indice_key)
347 |         self.bn1 = build_norm_layer(norm_cfg, out_channels)[1]
348 |         self.act1 = build_activation_layer(act_cfg)
349 | 
350 |         self.conv2 = SubMConv3d(
351 |             in_channels,
352 |             out_channels,
353 |             kernel_size=(1, 3, 1),
354 |             padding=1,
355 |             bias=False,
356 |             indice_key=indice_key)
357 |         self.bn2 = build_norm_layer(norm_cfg, out_channels)[1]
358 |         self.act2 = build_activation_layer(act_cfg)
359 | 
360 |         self.conv3 = SubMConv3d(
361 |             in_channels,
362 |             out_channels,
363 |             kernel_size=(1, 1, 3),
364 |             padding=1,
365 |             bias=False,
366 |             indice_key=indice_key)
367 |         self.bn3 = build_norm_layer(norm_cfg, out_channels)[1]
368 |         self.act3 = build_activation_layer(act_cfg)
369 | 
370 |     def forward(self, x: SparseConvTensor) -> SparseConvTensor:
371 |         """Forward pass."""
372 |         shortcut = self.conv1(x)
373 |         shortcut.features = self.bn1(shortcut.features)
374 |         shortcut.features = self.act1(shortcut.features)
375 | 
376 |         shortcut2 = self.conv2(x)
377 |         shortcut2.features = self.bn2(shortcut2.features)
378 |         shortcut2.features = self.act2(shortcut2.features)
379 | 
380 |         shortcut3 = self.conv3(x)
381 |         shortcut3.features = self.bn3(shortcut3.features)
382 |         shortcut3.features = self.act3(shortcut3.features)
383 |         shortcut.features = shortcut.features + \
384 |             shortcut2.features + shortcut3.features
385 | 
386 |         shortcut.features = shortcut.features * x.features
387 | 
388 |         return shortcut
389 | 
390 | 
391 | @MODELS.register_module(force=True)
392 | class _Asymm3DSpconv(BaseModule):
393 |     """Asymmetrical 3D convolution networks.
394 | 
395 |     Args:
396 |         grid_size (int): Size of voxel grids.
397 |         input_channels (int): Input channels of the block.
398 |         base_channels (int): Initial size of feature channels before
399 |             feeding into Encoder-Decoder structure. Defaults to 16.
400 |         backbone_depth (int): The depth of backbone. The backbone contains
401 |             downblocks and upblocks with the number of backbone_depth.
402 |         height_pooing (List[bool]): List indicating which downblocks perform
403 |             height pooling.
404 |         norm_cfg (:obj:`ConfigDict` or dict): Config dict for normalization
405 |             layer. Defaults to dict(type='BN1d', eps=1e-3, momentum=0.01)).
406 |         init_cfg (dict, optional): Initialization config.
407 |             Defaults to None.
408 |     """
409 | 
410 |     def __init__(self,
411 |                  grid_size: int,
412 |                  input_channels: int,
413 |                  base_channels: int = 16,
414 |                  backbone_depth: int = 4,
415 |                  height_pooing: List[bool] = [True, True, False, False],
416 |                  norm_cfg: ConfigType = dict(
417 |                      type='BN1d', eps=1e-3, momentum=0.01),
418 |                  init_cfg=None,
419 |                  more_conv=False,
420 |                  out_channels=128):
421 |         super().__init__(init_cfg=init_cfg)
422 | 
423 |         self.grid_size = grid_size
424 |         self.backbone_depth = backbone_depth
425 |         self.down_context = AsymmResBlock(
426 |             input_channels, base_channels, indice_key='pre', norm_cfg=norm_cfg)
427 | 
428 |         self.down_block_list = torch.nn.ModuleList()
429 |         self.up_block_list = torch.nn.ModuleList()
430 |         for i in range(self.backbone_depth):
431 |             self.down_block_list.append(
432 |                 AsymmeDownBlock(
433 |                     2**i * base_channels,
434 |                     2**(i + 1) * base_channels,
435 |                     height_pooling=height_pooing[i],
436 |                     indice_key='down' + str(i),
437 |                     norm_cfg=norm_cfg))
438 |             if i == self.backbone_depth - 1:
439 |                 self.up_block_list.append(
440 |                     AsymmeUpBlock(
441 |                         2**(i + 1) * base_channels,
442 |                         2**(i + 1) * base_channels,
443 |                         up_key='down' + str(i),
444 |                         indice_key='up' + str(self.backbone_depth - 1 - i),
445 |                         norm_cfg=norm_cfg))
446 |             else:
447 |                 self.up_block_list.append(
448 |                     AsymmeUpBlock(
449 |                         2**(i + 2) * base_channels,
450 |                         2**(i + 1) * base_channels,
451 |                         up_key='down' + str(i),
452 |                         indice_key='up' + str(self.backbone_depth - 1 - i),
453 |                         norm_cfg=norm_cfg))
454 | 
455 |         self.ddcm = DDCMBlock(
456 |             2 * base_channels,
457 |             2 * base_channels,
458 |             indice_key='ddcm',
459 |             norm_cfg=norm_cfg)
460 | 
461 |         self.more_conv = more_conv
462 |         if self.more_conv:
463 |             self.addConv = SubMConv3d(
464 |                 base_channels*4,
465 |                 out_channels,
466 |                 kernel_size=3,
467 |                 padding=1,
468 |                 bias=False,
469 |                 indice_key="mc")
470 |             self.addBn = build_norm_layer(norm_cfg, out_channels)[1]
471 |             self.addAct = build_activation_layer(dict(type='LeakyReLU'))
472 | 
473 |     def forward(self, voxel_features: torch.Tensor, coors: torch.Tensor,
474 |                 batch_size: int) -> SparseConvTensor:
475 |         """Forward pass."""
476 |         coors = coors.int()
477 |         ret = SparseConvTensor(voxel_features, coors, np.array(self.grid_size),
478 |                                batch_size)
479 |         ret = self.down_context(ret)
480 | 
481 |         down_skip_list = []
482 |         down_pool = ret
483 |         for i in range(self.backbone_depth):
484 |             down_pool, down_skip = self.down_block_list[i](down_pool)
485 |             down_skip_list.append(down_skip)
486 | 
487 |         up = down_pool
488 |         for i in range(self.backbone_depth - 1, -1, -1):
489 |             up = self.up_block_list[i](up, down_skip_list[i])
490 | 
491 |         ddcm = self.ddcm(up)
492 |         ddcm.features = torch.cat((ddcm.features, up.features), 1)
493 | 
494 |         if self.more_conv:
495 |             ddcm = self.addConv(ddcm)
496 |             ddcm.features = self.addBn(ddcm.features)
497 |             ddcm.features = self.addAct(ddcm.features)
498 |         
499 |         return ddcm
500 | 


--------------------------------------------------------------------------------
/p3former/data_preprocessors/data_preprocessor.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) OpenMMLab. All rights reserved.
  2 | # Modified from mmdetection3d.
  3 | import math
  4 | from numbers import Number
  5 | from typing import Dict, List, Optional, Sequence, Union
  6 | 
  7 | import numpy as np
  8 | import torch
  9 | from mmdet.models import DetDataPreprocessor
 10 | from mmengine.model import stack_batch
 11 | from mmengine.utils import is_list_of
 12 | from torch.nn import functional as F
 13 | 
 14 | from mmdet3d.registry import MODELS
 15 | from mmdet3d.structures.det3d_data_sample import SampleList
 16 | from mmdet3d.utils import OptConfigType
 17 | from mmdet3d.models.data_preprocessors.utils import multiview_img_stack_batch
 18 | from mmdet3d.models.data_preprocessors.voxelize import VoxelizationByGridShape, dynamic_scatter_3d
 19 | import torch_scatter
 20 | 
 21 | 
 22 | @MODELS.register_module(force=True)
 23 | class _Det3DDataPreprocessor(DetDataPreprocessor):
 24 |     """Points / Image pre-processor for point clouds / vision-only / multi-
 25 |     modality 3D detection tasks.
 26 | 
 27 |     It provides the data pre-processing as follows
 28 | 
 29 |     - Collate and move image and point cloud data to the target device.
 30 | 
 31 |     - 1) For image data:
 32 |     - Pad images in inputs to the maximum size of current batch with defined
 33 |       ``pad_value``. The padding size can be divisible by a defined
 34 |       ``pad_size_divisor``.
 35 |     - Stack images in inputs to batch_imgs.
 36 |     - Convert images in inputs from bgr to rgb if the shape of input is
 37 |       (3, H, W).
 38 |     - Normalize images in inputs with defined std and mean.
 39 |     - Do batch augmentations during training.
 40 | 
 41 |     - 2) For point cloud data:
 42 |     - If no voxelization, directly return list of point cloud data.
 43 |     - If voxelization is applied, voxelize point cloud according to
 44 |       ``voxel_type`` and obtain ``voxels``.
 45 | 
 46 |     Args:
 47 |         voxel (bool): Whether to apply voxelization to point cloud.
 48 |             Defaults to False.
 49 |         voxel_type (str): Voxelization type. Two voxelization types are
 50 |             provided: 'hard' and 'dynamic', respectively for hard
 51 |             voxelization and dynamic voxelization. Defaults to 'hard'.
 52 |         voxel_layer (dict or :obj:`ConfigDict`, optional): Voxelization layer
 53 |             config. Defaults to None.
 54 |         mean (Sequence[Number], optional): The pixel mean of R, G, B channels.
 55 |             Defaults to None.
 56 |         std (Sequence[Number], optional): The pixel standard deviation of
 57 |             R, G, B channels. Defaults to None.
 58 |         pad_size_divisor (int): The size of padded image should be
 59 |             divisible by ``pad_size_divisor``. Defaults to 1.
 60 |         pad_value (Number): The padded pixel value. Defaults to 0.
 61 |         pad_mask (bool): Whether to pad instance masks. Defaults to False.
 62 |         mask_pad_value (int): The padded pixel value for instance masks.
 63 |             Defaults to 0.
 64 |         pad_seg (bool): Whether to pad semantic segmentation maps.
 65 |             Defaults to False.
 66 |         seg_pad_value (int): The padded pixel value for semantic
 67 |             segmentation maps. Defaults to 255.
 68 |         bgr_to_rgb (bool): Whether to convert image from BGR to RGB.
 69 |             Defaults to False.
 70 |         rgb_to_bgr (bool): Whether to convert image from RGB to BGR.
 71 |             Defaults to False.
 72 |         boxtype2tensor (bool): Whether to keep the ``BaseBoxes`` type of
 73 |             bboxes data or not. Defaults to True.
 74 |         batch_augments (List[dict], optional): Batch-level augmentations.
 75 |             Defaults to None.
 76 |     """
 77 | 
 78 |     def __init__(self,
 79 |                  voxel: bool = False,
 80 |                  voxel_type: str = 'hard',
 81 |                  voxel_layer: OptConfigType = None,
 82 |                  mean: Sequence[Number] = None,
 83 |                  std: Sequence[Number] = None,
 84 |                  pad_size_divisor: int = 1,
 85 |                  pad_value: Union[float, int] = 0,
 86 |                  pad_mask: bool = False,
 87 |                  mask_pad_value: int = 0,
 88 |                  pad_seg: bool = False,
 89 |                  seg_pad_value: int = 255,
 90 |                  bgr_to_rgb: bool = False,
 91 |                  rgb_to_bgr: bool = False,
 92 |                  boxtype2tensor: bool = True,
 93 |                  batch_augments: Optional[List[dict]] = None) -> None:
 94 |         super(_Det3DDataPreprocessor, self).__init__(
 95 |             mean=mean,
 96 |             std=std,
 97 |             pad_size_divisor=pad_size_divisor,
 98 |             pad_value=pad_value,
 99 |             pad_mask=pad_mask,
100 |             mask_pad_value=mask_pad_value,
101 |             pad_seg=pad_seg,
102 |             seg_pad_value=seg_pad_value,
103 |             bgr_to_rgb=bgr_to_rgb,
104 |             rgb_to_bgr=rgb_to_bgr,
105 |             batch_augments=batch_augments)
106 |         self.voxel = voxel
107 |         self.voxel_type = voxel_type
108 |         if voxel:
109 |             self.voxel_layer = VoxelizationByGridShape(**voxel_layer)
110 | 
111 |     def forward(self,
112 |                 data: Union[dict, List[dict]],
113 |                 training: bool = False) -> Union[dict, List[dict]]:
114 |         """Perform normalization, padding and bgr2rgb conversion based on
115 |         ``BaseDataPreprocessor``.
116 | 
117 |         Args:
118 |             data (dict or List[dict]): Data from dataloader.
119 |                 The dict contains the whole batch data, when it is
120 |                 a list[dict], the list indicate test time augmentation.
121 |             training (bool): Whether to enable training time augmentation.
122 |                 Defaults to False.
123 | 
124 |         Returns:
125 |             dict or List[dict]: Data in the same format as the model input.
126 |         """
127 |         if isinstance(data, list):
128 |             num_augs = len(data)
129 |             aug_batch_data = []
130 |             for aug_id in range(num_augs):
131 |                 single_aug_batch_data = self.simple_process(
132 |                     data[aug_id], training)
133 |                 aug_batch_data.append(single_aug_batch_data)
134 |             return aug_batch_data
135 | 
136 |         else:
137 |             return self.simple_process(data, training)
138 | 
139 |     def simple_process(self, data: dict, training: bool = False) -> dict:
140 |         """Perform normalization, padding and bgr2rgb conversion for img data
141 |         based on ``BaseDataPreprocessor``, and voxelize point cloud if `voxel`
142 |         is set to be True.
143 | 
144 |         Args:
145 |             data (dict): Data sampled from dataloader.
146 |             training (bool): Whether to enable training time augmentation.
147 |                 Defaults to False.
148 | 
149 |         Returns:
150 |             dict: Data in the same format as the model input.
151 |         """
152 |         if 'img' in data['inputs']:
153 |             batch_pad_shape = self._get_pad_shape(data)
154 | 
155 |         data = self.collate_data(data)
156 |         inputs, data_samples = data['inputs'], data['data_samples']
157 |         batch_inputs = dict()
158 | 
159 |         if 'points' in inputs:
160 |             batch_inputs['points'] = inputs['points']
161 | 
162 |             if self.voxel:
163 |                 voxel_dict = self.voxelize(inputs['points'], data_samples)
164 |                 batch_inputs['voxels'] = voxel_dict
165 | 
166 |         if 'imgs' in inputs:
167 |             imgs = inputs['imgs']
168 | 
169 |             if data_samples is not None:
170 |                 # NOTE the batched image size information may be useful, e.g.
171 |                 # in DETR, this is needed for the construction of masks, which
172 |                 # is then used for the transformer_head.
173 |                 batch_input_shape = tuple(imgs[0].size()[-2:])
174 |                 for data_sample, pad_shape in zip(data_samples,
175 |                                                   batch_pad_shape):
176 |                     data_sample.set_metainfo({
177 |                         'batch_input_shape': batch_input_shape,
178 |                         'pad_shape': pad_shape
179 |                     })
180 | 
181 |                 if hasattr(self, 'boxtype2tensor') and self.boxtype2tensor:
182 |                     from mmdet.models.utils.misc import \
183 |                         samplelist_boxtype2tensor
184 |                     samplelist_boxtype2tensor(data_samples)
185 |                 elif hasattr(self, 'boxlist2tensor') and self.boxlist2tensor:
186 |                     from mmdet.models.utils.misc import \
187 |                         samplelist_boxlist2tensor
188 |                     samplelist_boxlist2tensor(data_samples)
189 |                 if self.pad_mask:
190 |                     self.pad_gt_masks(data_samples)
191 | 
192 |                 if self.pad_seg:
193 |                     self.pad_gt_sem_seg(data_samples)
194 | 
195 |             if training and self.batch_augments is not None:
196 |                 for batch_aug in self.batch_augments:
197 |                     imgs, data_samples = batch_aug(imgs, data_samples)
198 |             batch_inputs['imgs'] = imgs
199 | 
200 |         return {'inputs': batch_inputs, 'data_samples': data_samples}
201 | 
202 |     def preprocess_img(self, _batch_img: torch.Tensor) -> torch.Tensor:
203 |         # channel transform
204 |         if self._channel_conversion:
205 |             _batch_img = _batch_img[[2, 1, 0], ...]
206 |         # Convert to float after channel conversion to ensure
207 |         # efficiency
208 |         _batch_img = _batch_img.float()
209 |         # Normalization.
210 |         if self._enable_normalize:
211 |             if self.mean.shape[0] == 3:
212 |                 assert _batch_img.dim() == 3 and _batch_img.shape[0] == 3, (
213 |                     'If the mean has 3 values, the input tensor '
214 |                     'should in shape of (3, H, W), but got the '
215 |                     f'tensor with shape {_batch_img.shape}')
216 |             _batch_img = (_batch_img - self.mean) / self.std
217 |         return _batch_img
218 | 
219 |     def collate_data(self, data: dict) -> dict:
220 |         """Copying data to the target device and Performs normalization,
221 |         padding and bgr2rgb conversion and stack based on
222 |         ``BaseDataPreprocessor``.
223 | 
224 |         Collates the data sampled from dataloader into a list of dict and
225 |         list of labels, and then copies tensor to the target device.
226 | 
227 |         Args:
228 |             data (dict): Data sampled from dataloader.
229 | 
230 |         Returns:
231 |             dict: Data in the same format as the model input.
232 |         """
233 |         data = self.cast_data(data)  # type: ignore
234 | 
235 |         if 'img' in data['inputs']:
236 |             _batch_imgs = data['inputs']['img']
237 |             # Process data with `pseudo_collate`.
238 |             if is_list_of(_batch_imgs, torch.Tensor):
239 |                 batch_imgs = []
240 |                 img_dim = _batch_imgs[0].dim()
241 |                 for _batch_img in _batch_imgs:
242 |                     if img_dim == 3:  # standard img
243 |                         _batch_img = self.preprocess_img(_batch_img)
244 |                     elif img_dim == 4:
245 |                         _batch_img = [
246 |                             self.preprocess_img(_img) for _img in _batch_img
247 |                         ]
248 | 
249 |                         _batch_img = torch.stack(_batch_img, dim=0)
250 | 
251 |                     batch_imgs.append(_batch_img)
252 | 
253 |                 # Pad and stack Tensor.
254 |                 if img_dim == 3:
255 |                     batch_imgs = stack_batch(batch_imgs, self.pad_size_divisor,
256 |                                              self.pad_value)
257 |                 elif img_dim == 4:
258 |                     batch_imgs = multiview_img_stack_batch(
259 |                         batch_imgs, self.pad_size_divisor, self.pad_value)
260 | 
261 |             # Process data with `default_collate`.
262 |             elif isinstance(_batch_imgs, torch.Tensor):
263 |                 assert _batch_imgs.dim() == 4, (
264 |                     'The input of `ImgDataPreprocessor` should be a NCHW '
265 |                     'tensor or a list of tensor, but got a tensor with '
266 |                     f'shape: {_batch_imgs.shape}')
267 |                 if self._channel_conversion:
268 |                     _batch_imgs = _batch_imgs[:, [2, 1, 0], ...]
269 |                 # Convert to float after channel conversion to ensure
270 |                 # efficiency
271 |                 _batch_imgs = _batch_imgs.float()
272 |                 if self._enable_normalize:
273 |                     _batch_imgs = (_batch_imgs - self.mean) / self.std
274 |                 h, w = _batch_imgs.shape[2:]
275 |                 target_h = math.ceil(
276 |                     h / self.pad_size_divisor) * self.pad_size_divisor
277 |                 target_w = math.ceil(
278 |                     w / self.pad_size_divisor) * self.pad_size_divisor
279 |                 pad_h = target_h - h
280 |                 pad_w = target_w - w
281 |                 batch_imgs = F.pad(_batch_imgs, (0, pad_w, 0, pad_h),
282 |                                    'constant', self.pad_value)
283 |             else:
284 |                 raise TypeError(
285 |                     'Output of `cast_data` should be a list of dict '
286 |                     'or a tuple with inputs and data_samples, but got'
287 |                     f'{type(data)}: {data}')
288 | 
289 |             data['inputs']['imgs'] = batch_imgs
290 | 
291 |         data.setdefault('data_samples', None)
292 | 
293 |         return data
294 | 
295 |     def _get_pad_shape(self, data: dict) -> List[tuple]:
296 |         """Get the pad_shape of each image based on data and
297 |         pad_size_divisor."""
298 |         # rewrite `_get_pad_shape` for obtaining image inputs.
299 |         _batch_inputs = data['inputs']['img']
300 |         # Process data with `pseudo_collate`.
301 |         if is_list_of(_batch_inputs, torch.Tensor):
302 |             batch_pad_shape = []
303 |             for ori_input in _batch_inputs:
304 |                 if ori_input.dim() == 4:
305 |                     # mean multiview input, select one of the
306 |                     # image to calculate the pad shape
307 |                     ori_input = ori_input[0]
308 |                 pad_h = int(
309 |                     np.ceil(ori_input.shape[1] /
310 |                             self.pad_size_divisor)) * self.pad_size_divisor
311 |                 pad_w = int(
312 |                     np.ceil(ori_input.shape[2] /
313 |                             self.pad_size_divisor)) * self.pad_size_divisor
314 |                 batch_pad_shape.append((pad_h, pad_w))
315 |         # Process data with `default_collate`.
316 |         elif isinstance(_batch_inputs, torch.Tensor):
317 |             assert _batch_inputs.dim() == 4, (
318 |                 'The input of `ImgDataPreprocessor` should be a NCHW tensor '
319 |                 'or a list of tensor, but got a tensor with shape: '
320 |                 f'{_batch_inputs.shape}')
321 |             pad_h = int(
322 |                 np.ceil(_batch_inputs.shape[1] /
323 |                         self.pad_size_divisor)) * self.pad_size_divisor
324 |             pad_w = int(
325 |                 np.ceil(_batch_inputs.shape[2] /
326 |                         self.pad_size_divisor)) * self.pad_size_divisor
327 |             batch_pad_shape = [(pad_h, pad_w)] * _batch_inputs.shape[0]
328 |         else:
329 |             raise TypeError('Output of `cast_data` should be a list of dict '
330 |                             'or a tuple with inputs and data_samples, but got '
331 |                             f'{type(data)}: {data}')
332 |         return batch_pad_shape
333 | 
334 |     @torch.no_grad()
335 |     def voxelize(self, points: List[torch.Tensor],
336 |                  data_samples: SampleList) -> Dict[str, torch.Tensor]:
337 |         """Apply voxelization to point cloud.
338 | 
339 |         Args:
340 |             points (List[Tensor]): Point cloud in one data batch.
341 |             data_samples: (list[:obj:`Det3DDataSample`]): The annotation data
342 |                 of every samples. Add voxel-wise annotation for segmentation.
343 | 
344 |         Returns:
345 |             Dict[str, Tensor]: Voxelization information.
346 | 
347 |             - voxels (Tensor): Features of voxels, shape is MxNxC for hard
348 |               voxelization, NxC for dynamic voxelization.
349 |             - coors (Tensor): Coordinates of voxels, shape is Nx(1+NDim),
350 |               where 1 represents the batch index.
351 |             - num_points (Tensor, optional): Number of points in each voxel.
352 |             - voxel_centers (Tensor, optional): Centers of voxels.
353 |         """
354 | 
355 |         voxel_dict = dict()
356 | 
357 |         if self.voxel_type == 'hard':
358 |             voxels, coors, num_points, voxel_centers = [], [], [], []
359 |             for i, res in enumerate(points):
360 |                 res_voxels, res_coors, res_num_points = self.voxel_layer(res)
361 |                 res_voxel_centers = (
362 |                     res_coors[:, [2, 1, 0]] + 0.5) * res_voxels.new_tensor(
363 |                         self.voxel_layer.voxel_size) + res_voxels.new_tensor(
364 |                             self.voxel_layer.point_cloud_range[0:3])
365 |                 res_coors = F.pad(res_coors, (1, 0), mode='constant', value=i)
366 |                 voxels.append(res_voxels)
367 |                 coors.append(res_coors)
368 |                 num_points.append(res_num_points)
369 |                 voxel_centers.append(res_voxel_centers)
370 | 
371 |             voxels = torch.cat(voxels, dim=0)
372 |             coors = torch.cat(coors, dim=0)
373 |             num_points = torch.cat(num_points, dim=0)
374 |             voxel_centers = torch.cat(voxel_centers, dim=0)
375 | 
376 |             voxel_dict['num_points'] = num_points
377 |             voxel_dict['voxel_centers'] = voxel_centers
378 |         elif self.voxel_type == 'dynamic':
379 |             coors = []
380 |             # dynamic voxelization only provide a coors mapping
381 |             for i, res in enumerate(points):
382 |                 res_coors = self.voxel_layer(res)
383 |                 res_coors = F.pad(res_coors, (1, 0), mode='constant', value=i)
384 |                 coors.append(res_coors)
385 |             voxels = torch.cat(points, dim=0)
386 |             coors = torch.cat(coors, dim=0)
387 |         elif self.voxel_type == 'cylindrical':
388 |             voxels, coors = [], []
389 |             for i, (res, data_sample) in enumerate(zip(points, data_samples)):
390 |                 rho = torch.sqrt(res[:, 0]**2 + res[:, 1]**2)
391 |                 phi = torch.atan2(res[:, 1], res[:, 0])
392 |                 polar_res = torch.stack((rho, phi, res[:, 2]), dim=-1)
393 |                 min_bound = polar_res.new_tensor(
394 |                     self.voxel_layer.point_cloud_range[:3])
395 |                 max_bound = polar_res.new_tensor(
396 |                     self.voxel_layer.point_cloud_range[3:])
397 |                 try:  # only support PyTorch >= 1.9.0
398 |                     polar_res_clamp = torch.clamp(polar_res, min_bound,
399 |                                                   max_bound)
400 |                 except TypeError:
401 |                     polar_res_clamp = polar_res.clone()
402 |                     for coor_idx in range(3):
403 |                         polar_res_clamp[:, coor_idx][
404 |                             polar_res[:, coor_idx] >
405 |                             max_bound[coor_idx]] = max_bound[coor_idx]
406 |                         polar_res_clamp[:, coor_idx][
407 |                             polar_res[:, coor_idx] <
408 |                             min_bound[coor_idx]] = min_bound[coor_idx]
409 |                 res_coors = torch.floor(
410 |                     (polar_res_clamp - min_bound) / polar_res_clamp.new_tensor(
411 |                         self.voxel_layer.voxel_size)).int()
412 |                 self.get_voxel_seg(res_coors, data_sample)
413 |                 res_coors = F.pad(res_coors, (1, 0), mode='constant', value=i)
414 |                 res_voxels = torch.cat((polar_res, res[:, :2], res[:, 3:]),
415 |                                        dim=-1)
416 |                 voxels.append(res_voxels)
417 |                 coors.append(res_coors)
418 |             voxels = torch.cat(voxels, dim=0)
419 |             coors = torch.cat(coors, dim=0)
420 |         elif self.voxel_type == 'minkunet':
421 |             voxels, coors = [], []
422 |             voxel_size = points[0].new_tensor(self.voxel_layer.voxel_size)
423 |             for i, (res, data_sample) in enumerate(zip(points, data_samples)):
424 |                 res_coors = torch.round(res[:, :3] / voxel_size).int()
425 |                 res_coors -= res_coors.min(0)[0]
426 | 
427 |                 res_coors_numpy = res_coors.cpu().numpy()
428 |                 inds, voxel2point_map = self.sparse_quantize(
429 |                     res_coors_numpy, return_index=True, return_inverse=True)
430 |                 voxel2point_map = torch.from_numpy(voxel2point_map).cuda()
431 |                 if self.training:
432 |                     if len(inds) > 80000:
433 |                         inds = np.random.choice(inds, 80000, replace=False)
434 |                 inds = torch.from_numpy(inds).cuda()
435 |                 data_sample.gt_pts_seg.voxel_semantic_mask \
436 |                     = data_sample.gt_pts_seg.pts_semantic_mask[inds]
437 |                 res_voxel_coors = res_coors[inds]
438 |                 res_voxels = res[inds]
439 |                 res_voxel_coors = F.pad(
440 |                     res_voxel_coors, (0, 1), mode='constant', value=i)
441 |                 data_sample.voxel2point_map = voxel2point_map.long()
442 |                 voxels.append(res_voxels)
443 |                 coors.append(res_voxel_coors)
444 |             voxels = torch.cat(voxels, dim=0)
445 |             coors = torch.cat(coors, dim=0)
446 | 
447 |         else:
448 |             raise ValueError(f'Invalid voxelization type {self.voxel_type}')
449 | 
450 |         voxel_dict['voxels'] = voxels
451 |         voxel_dict['coors'] = coors
452 | 
453 |         return voxel_dict
454 | 
455 |     def get_voxel_seg(self, res_coors: torch.Tensor, data_sample: SampleList):
456 |         """Get voxel-wise segmentation label and point2voxel map.
457 | 
458 |         Args:
459 |             res_coors (Tensor): The voxel coordinates of points, Nx3.
460 |             data_sample: (:obj:`Det3DDataSample`): The annotation data of
461 |                 every samples. Add voxel-wise annotation forsegmentation.
462 |         """
463 | 
464 |         if self.training:
465 |             if hasattr(data_sample.gt_pts_seg, 'pts_instance_mask'):
466 |                 pts_instance_mask = data_sample.gt_pts_seg.pts_instance_mask
467 |                 pts_semantic_mask = data_sample.gt_pts_seg.pts_semantic_mask
468 |                 num_points = res_coors.shape[0]
469 | 
470 |                 _, unique_indices_inverse = torch.unique(
471 |                     res_coors, return_inverse=True, dim=0)
472 | 
473 |                 unq_instance_labels = torch.unique(pts_instance_mask, dim=0)
474 |                 sort_instance_labels = pts_instance_mask.new_zeros(pts_instance_mask.shape)
475 | 
476 |                 # map instance labels to [1 - the number of instances]
477 |                 i = 1
478 |                 ins2sem = pts_instance_mask.new_zeros(size=[len(unq_instance_labels)])
479 |                 for u in unq_instance_labels:
480 |                     valid = pts_instance_mask == u
481 |                     sort_instance_labels[valid] = i
482 |                     ins2sem[i - 1] = pts_semantic_mask[valid][0]
483 |                     i = i + 1
484 | 
485 |                 flatten_inst_labels = res_coors.new_zeros(
486 |                     (num_points, i))  # i: instance number+1(background)
487 |                 flatten_inst_labels[list(range(0, num_points)),
488 |                                     sort_instance_labels[list(range(0, num_points))].
489 |                                     cpu().numpy().tolist()] += 1
490 |                 # scatter point-wise labels into voxel-wise
491 |                 flatten_inst_label_sum = torch_scatter.scatter_sum(
492 |                     flatten_inst_labels, unique_indices_inverse, dim=0)
493 |                 voxel_instance_labels = torch.argmax(flatten_inst_label_sum, dim=1)
494 |                 voxel_semantic_labels = ins2sem[voxel_instance_labels - 1]
495 |                 data_sample.gt_pts_seg.voxel_semantic_mask = voxel_semantic_labels
496 |                 data_sample.gt_pts_seg.voxel_instance_mask = voxel_instance_labels
497 |             else:
498 |                 pts_semantic_mask = data_sample.gt_pts_seg.pts_semantic_mask
499 |                 voxel_semantic_mask, _, point2voxel_map = dynamic_scatter_3d(
500 |                     F.one_hot(pts_semantic_mask.long()).float(), res_coors, 'mean',
501 |                     True)
502 |                 voxel_semantic_mask = torch.argmax(voxel_semantic_mask, dim=-1)
503 |                 data_sample.gt_pts_seg.voxel_semantic_mask = voxel_semantic_mask
504 |                 data_sample.gt_pts_seg.point2voxel_map = point2voxel_map
505 | 
506 |         else:
507 |             pseudo_tensor = res_coors.new_ones([res_coors.shape[0], 1]).float()
508 |             _, _, point2voxel_map = dynamic_scatter_3d(pseudo_tensor,
509 |                                                        res_coors, 'mean', True)
510 |             data_sample.gt_pts_seg.point2voxel_map = point2voxel_map
511 | 
512 |     def ravel_hash(self, x: np.ndarray) -> np.ndarray:
513 |         """Get voxel coordinates hash for np.unique().
514 | 
515 |         Args:
516 |             x (np.ndarray): The voxel coordinates of points, Nx3.
517 | 
518 |         Returns:
519 |             np.ndarray: Voxels coordinates hash.
520 |         """
521 |         assert x.ndim == 2, x.shape
522 | 
523 |         x = x - np.min(x, axis=0)
524 |         x = x.astype(np.uint64, copy=False)
525 |         xmax = np.max(x, axis=0).astype(np.uint64) + 1
526 | 
527 |         h = np.zeros(x.shape[0], dtype=np.uint64)
528 |         for k in range(x.shape[1] - 1):
529 |             h += x[:, k]
530 |             h *= xmax[k + 1]
531 |         h += x[:, -1]
532 |         return h
533 | 
534 |     def sparse_quantize(self,
535 |                         coords: np.ndarray,
536 |                         return_index: bool = False,
537 |                         return_inverse: bool = False) -> List[np.ndarray]:
538 |         """Sparse Quantization for voxel coordinates used in Minkunet.
539 | 
540 |         Args:
541 |             coords (np.ndarray): The voxel coordinates of points, Nx3.
542 |             return_index (bool): Whether to return the indices of the
543 |                 unique coords, shape (M,).
544 |             return_inverse (bool): Whether to return the indices of the
545 |                 original coords shape (N,).
546 | 
547 |         Returns:
548 |             List[np.ndarray] or None: Return index and inverse map if
549 |             return_index and return_inverse is True.
550 |         """
551 |         _, indices, inverse_indices = np.unique(
552 |             self.ravel_hash(coords), return_index=True, return_inverse=True)
553 |         coords = coords[indices]
554 | 
555 |         outputs = []
556 |         if return_index:
557 |             outputs += [indices]
558 |         if return_inverse:
559 |             outputs += [inverse_indices]
560 |         return outputs
561 | 


--------------------------------------------------------------------------------
/p3former/segmentors/p3former.py:
--------------------------------------------------------------------------------
 1 | import torch
 2 | from mmdet3d.registry import MODELS
 3 | from mmdet3d.models.segmentors.cylinder3d import Cylinder3D
 4 | from mmdet3d.structures import PointData
 5 | from mmdet3d.utils import ConfigType, OptConfigType, OptMultiConfig
 6 | 
 7 | @MODELS.register_module()
 8 | class _P3Former(Cylinder3D):
 9 |     """P3Former."""
10 | 
11 |     def __init__(self,
12 |                  voxel_encoder: ConfigType,
13 |                  backbone: ConfigType,
14 |                  decode_head: ConfigType,
15 |                  neck: OptConfigType = None,
16 |                  auxiliary_head: OptConfigType = None,
17 |                  loss_regularization: OptConfigType = None,
18 |                  train_cfg: OptConfigType = None,
19 |                  test_cfg: OptConfigType = None,
20 |                  data_preprocessor: OptConfigType = None,
21 |                  init_cfg: OptMultiConfig = None) -> None:
22 |         super().__init__(voxel_encoder=voxel_encoder,
23 |                         backbone=backbone,
24 |                         decode_head=decode_head,
25 |                         neck=neck,
26 |                         auxiliary_head=auxiliary_head,
27 |                         loss_regularization=loss_regularization,
28 |                         train_cfg=train_cfg,
29 |                         test_cfg=test_cfg,
30 |                         data_preprocessor=data_preprocessor,
31 |                         init_cfg=init_cfg)
32 | 
33 |     def loss(self, batch_inputs_dict,batch_data_samples):
34 |         """Calculate losses from a batch of inputs and data samples.
35 | 
36 |         Args:
37 |             batch_inputs_dict (dict): Input sample dict which
38 |                 includes 'points' and 'imgs' keys.
39 | 
40 |                 - points (List[Tensor]): Point cloud of each sample.
41 |                 - imgs (Tensor, optional): Image tensor has shape (B, C, H, W).
42 |             batch_data_samples (List[:obj:`Det3DDataSample`]): The det3d data
43 |                 samples. It usually includes information such as `metainfo` and
44 |                 `gt_pts_seg`.
45 | 
46 |         Returns:
47 |             Dict[str, Tensor]: A dictionary of loss components.
48 |         """
49 | 
50 |         # extract features using backbone
51 |         x = self.extract_feat(batch_inputs_dict)
52 |         batch_inputs_dict['features'] = x.features
53 |         losses = dict()
54 |         loss_decode = self._decode_head_forward_train(batch_inputs_dict, batch_data_samples)
55 |         losses.update(loss_decode)
56 | 
57 |         return losses
58 | 
59 |     def predict(self, batch_inputs_dict, batch_data_samples, **kwargs):
60 |         x = self.extract_feat(batch_inputs_dict)
61 |         batch_inputs_dict['features'] = x.features
62 |         pts_semantic_preds, pts_instance_preds = self.decode_head.predict(batch_inputs_dict, batch_data_samples)
63 |         return self.postprocess_result(pts_semantic_preds, pts_instance_preds, batch_data_samples)
64 | 
65 |     def postprocess_result(self, pts_semantic_preds, pts_instance_preds, batch_data_samples):
66 |         for i in range(len(pts_semantic_preds)):
67 |             batch_data_samples[i].set_data(
68 |                 {'pred_pts_seg': PointData(**{'pts_semantic_mask': pts_semantic_preds[i],
69 |                                                 'pts_instance_mask': pts_instance_preds[i]})})
70 |         return batch_data_samples
71 | 


--------------------------------------------------------------------------------
/p3former/task_modules/samplers/mask_pseduo_sampler.py:
--------------------------------------------------------------------------------
 1 | # Copyright (c) OpenMMLab. All rights reserved.
 2 | """copy from
 3 | https://github.com/ZwwWayne/K-Net/blob/main/knet/det/mask_pseudo_sampler.py."""
 4 | 
 5 | import torch
 6 | from mmengine.structures import InstanceData
 7 | 
 8 | from mmdet3d.registry import TASK_UTILS
 9 | from mmdet.models.task_modules.assigners import AssignResult
10 | from torch import Tensor
11 | from mmdet.models.task_modules.samplers.base_sampler import BaseSampler
12 | from mmdet.models.task_modules.samplers.mask_sampling_result import MaskSamplingResult
13 | 
14 | 
15 | class _MaskSamplingResult(MaskSamplingResult):
16 |     """Mask sampling result."""
17 | 
18 |     def __init__(self,
19 |                  pos_inds: Tensor,
20 |                  neg_inds: Tensor,
21 |                  masks: Tensor,
22 |                  gt_masks: Tensor,
23 |                  assign_result: AssignResult,
24 |                  gt_flags: Tensor,
25 |                  avg_factor_with_neg: bool = True) -> None:
26 |         super().__init__(pos_inds=pos_inds,
27 |                          neg_inds=neg_inds,
28 |                          masks=masks,
29 |                          gt_masks=gt_masks,
30 |                          assign_result=assign_result,
31 |                          gt_flags=gt_flags,
32 |                          avg_factor_with_neg=avg_factor_with_neg)
33 |         if assign_result.labels is not None:
34 |             self.pos_gt_labels = assign_result.labels[pos_inds]
35 |         else:
36 |             self.pos_gt_labels = None
37 | 
38 | 
39 | @TASK_UTILS.register_module()
40 | class _MaskPseudoSampler(BaseSampler):
41 |     """A pseudo sampler that does not do sampling actually."""
42 | 
43 |     def __init__(self, **kwargs):
44 |         pass
45 | 
46 |     def _sample_pos(self, **kwargs):
47 |         """Sample positive samples."""
48 |         raise NotImplementedError
49 | 
50 |     def _sample_neg(self, **kwargs):
51 |         """Sample negative samples."""
52 |         raise NotImplementedError
53 | 
54 |     def sample(self, assign_result: AssignResult, pred_instances: InstanceData,
55 |                gt_instances: InstanceData, *args, **kwargs):
56 |         """Directly returns the positive and negative indices  of samples.
57 | 
58 |         Args:
59 |             assign_result (:obj:`AssignResult`): Mask assigning results.
60 |             pred_instances (:obj:`InstanceData`): Instances of model
61 |                 predictions. It includes ``scores`` and ``masks`` predicted
62 |                 by the model.
63 |             gt_instances (:obj:`InstanceData`): Ground truth of instance
64 |                 annotations. It usually includes ``labels`` and ``masks``
65 |                 attributes.
66 | 
67 |         Returns:
68 |             :obj:`SamplingResult`: sampler results
69 |         """
70 |         pred_masks = pred_instances.masks
71 |         gt_masks = gt_instances.masks
72 |         pos_inds = torch.nonzero(
73 |             assign_result.gt_inds > 0, as_tuple=False).squeeze(-1).unique()
74 |         neg_inds = torch.nonzero(
75 |             assign_result.gt_inds == 0, as_tuple=False).squeeze(-1).unique()
76 |         gt_flags = pred_masks.new_zeros(pred_masks.shape[0], dtype=torch.uint8)
77 |         sampling_result = _MaskSamplingResult(
78 |             pos_inds=pos_inds,
79 |             neg_inds=neg_inds,
80 |             masks=pred_masks,
81 |             gt_masks=gt_masks,
82 |             assign_result=assign_result,
83 |             gt_flags=gt_flags,
84 |             avg_factor_with_neg=False)
85 |         return sampling_result
86 | 


--------------------------------------------------------------------------------
/test.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) OpenMMLab. All rights reserved.
  2 | import argparse
  3 | import os
  4 | import os.path as osp
  5 | 
  6 | from mmengine.config import Config, DictAction
  7 | from mmengine.registry import RUNNERS
  8 | from mmengine.runner import Runner
  9 | 
 10 | from mmdet3d.utils import replace_ceph_backend
 11 | 
 12 | 
 13 | # TODO: support fuse_conv_bn and format_only
 14 | def parse_args():
 15 |     parser = argparse.ArgumentParser(
 16 |         description='MMDet3D test (and eval) a model')
 17 |     parser.add_argument('config', help='test config file path')
 18 |     parser.add_argument('checkpoint', help='checkpoint file')
 19 |     parser.add_argument(
 20 |         '--work-dir',
 21 |         help='the directory to save the file containing evaluation metrics')
 22 |     parser.add_argument(
 23 |         '--ceph', action='store_true', help='Use ceph as data storage backend')
 24 |     parser.add_argument(
 25 |         '--show', action='store_true', help='show prediction results')
 26 |     parser.add_argument(
 27 |         '--show-dir',
 28 |         help='directory where painted images will be saved. '
 29 |         'If specified, it will be automatically saved '
 30 |         'to the work_dir/timestamp/show_dir')
 31 |     parser.add_argument(
 32 |         '--task',
 33 |         type=str,
 34 |         choices=[
 35 |             'mono_det', 'multi-view_det', 'lidar_det', 'lidar_seg',
 36 |             'multi-modality_det'
 37 |         ],
 38 |         help='Determine the visualization method depending on the task.')
 39 |     parser.add_argument(
 40 |         '--wait-time', type=float, default=2, help='the interval of show (s)')
 41 |     parser.add_argument(
 42 |         '--cfg-options',
 43 |         nargs='+',
 44 |         action=DictAction,
 45 |         help='override some settings in the used config, the key-value pair '
 46 |         'in xxx=yyy format will be merged into config file. If the value to '
 47 |         'be overwritten is a list, it should be like key="[a,b]" or key=a,b '
 48 |         'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
 49 |         'Note that the quotation marks are necessary and that no white space '
 50 |         'is allowed.')
 51 |     parser.add_argument(
 52 |         '--launcher',
 53 |         choices=['none', 'pytorch', 'slurm', 'mpi'],
 54 |         default='none',
 55 |         help='job launcher')
 56 |     parser.add_argument('--local_rank', type=int, default=0)
 57 |     args = parser.parse_args()
 58 |     if 'LOCAL_RANK' not in os.environ:
 59 |         os.environ['LOCAL_RANK'] = str(args.local_rank)
 60 |     return args
 61 | 
 62 | 
 63 | def trigger_visualization_hook(cfg, args):
 64 |     default_hooks = cfg.default_hooks
 65 |     if 'visualization' in default_hooks:
 66 |         visualization_hook = default_hooks['visualization']
 67 |         # Turn on visualization
 68 |         visualization_hook['draw'] = True
 69 |         if args.show:
 70 |             visualization_hook['show'] = True
 71 |             visualization_hook['wait_time'] = args.wait_time
 72 |         if args.show_dir:
 73 |             visualization_hook['test_out_dir'] = args.show_dir
 74 |         visualization_hook['vis_task'] = args.task
 75 |     else:
 76 |         raise RuntimeError(
 77 |             'VisualizationHook must be included in default_hooks.'
 78 |             'refer to usage '
 79 |             '"visualization=dict(type=\'VisualizationHook\')"')
 80 | 
 81 |     return cfg
 82 | 
 83 | 
 84 | def main():
 85 |     args = parse_args()
 86 | 
 87 |     # load config
 88 |     cfg = Config.fromfile(args.config)
 89 | 
 90 |     # TODO: We will unify the ceph support approach with other OpenMMLab repos
 91 |     if args.ceph:
 92 |         cfg = replace_ceph_backend(cfg)
 93 | 
 94 |     cfg.launcher = args.launcher
 95 |     if args.cfg_options is not None:
 96 |         cfg.merge_from_dict(args.cfg_options)
 97 | 
 98 |     # work_dir is determined in this priority: CLI > segment in file > filename
 99 |     if args.work_dir is not None:
100 |         # update configs according to CLI args if args.work_dir is not None
101 |         cfg.work_dir = args.work_dir
102 |     elif cfg.get('work_dir', None) is None:
103 |         # use config filename as default work_dir if cfg.work_dir is None
104 |         cfg.work_dir = osp.join('./work_dirs',
105 |                                 osp.splitext(osp.basename(args.config))[0])
106 | 
107 |     cfg.load_from = args.checkpoint
108 | 
109 |     if args.show or args.show_dir:
110 |         cfg = trigger_visualization_hook(cfg, args)
111 | 
112 |     # build the runner from config
113 |     if 'runner_type' not in cfg:
114 |         # build the default runner
115 |         runner = Runner.from_cfg(cfg)
116 |     else:
117 |         # build customized runner from the registry
118 |         # if 'runner_type' is set in the cfg
119 |         runner = RUNNERS.build(cfg)
120 | 
121 |     # start testing
122 |     runner.test()
123 | 
124 | 
125 | if __name__ == '__main__':
126 |     main()
127 | 


--------------------------------------------------------------------------------
/tools/create_data.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) OpenMMLab. All rights reserved.
  2 | import argparse
  3 | from os import path as osp
  4 | 
  5 | from dataset_converters import nuscenes_converter as nuscenes_converter
  6 | from dataset_converters import semantickitti_converter
  7 | from dataset_converters.update_infos_to_v2 import update_pkl_infos
  8 | 
  9 | def nuscenes_data_prep(root_path,
 10 |                        info_prefix,
 11 |                        version,
 12 |                        dataset_name,
 13 |                        out_dir,
 14 |                        max_sweeps=10):
 15 |     """Prepare data related to nuScenes dataset.
 16 | 
 17 |     Related data consists of '.pkl' files recording basic infos,
 18 |     2D annotations and groundtruth database.
 19 | 
 20 |     Args:
 21 |         root_path (str): Path of dataset root.
 22 |         info_prefix (str): The prefix of info filenames.
 23 |         version (str): Dataset version.
 24 |         dataset_name (str): The dataset class name.
 25 |         out_dir (str): Output directory of the groundtruth database info.
 26 |         max_sweeps (int, optional): Number of input consecutive frames.
 27 |             Default: 10
 28 |     """
 29 |     nuscenes_converter.create_nuscenes_infos(
 30 |         root_path, info_prefix, version=version, max_sweeps=max_sweeps)
 31 | 
 32 |     if version == 'v1.0-test':
 33 |         info_test_path = osp.join(out_dir, f'{info_prefix}_infos_test.pkl')
 34 |         update_pkl_infos('nuscenes', out_dir=out_dir, pkl_path=info_test_path)
 35 |         return
 36 | 
 37 |     info_train_path = osp.join(out_dir, f'{info_prefix}_infos_train.pkl')
 38 |     info_val_path = osp.join(out_dir, f'{info_prefix}_infos_val.pkl')
 39 |     update_pkl_infos('nuscenes', out_dir=out_dir, pkl_path=info_train_path)
 40 |     update_pkl_infos('nuscenes', out_dir=out_dir, pkl_path=info_val_path)
 41 | 
 42 | def semantickitti_data_prep(info_prefix, out_dir):
 43 |     """Prepare the info file for SemanticKITTI dataset.
 44 | 
 45 |     Args:
 46 |         info_prefix (str): The prefix of info filenames.
 47 |         out_dir (str): Output directory of the generated info file.
 48 |     """
 49 |     semantickitti_converter.create_semantickitti_info_file(
 50 |         info_prefix, out_dir)
 51 | 
 52 | 
 53 | parser = argparse.ArgumentParser(description='Data converter arg parser')
 54 | parser.add_argument('dataset', metavar='kitti', help='name of the dataset')
 55 | parser.add_argument(
 56 |     '--root-path',
 57 |     type=str,
 58 |     default='./data/kitti',
 59 |     help='specify the root path of dataset')
 60 | parser.add_argument(
 61 |     '--version',
 62 |     type=str,
 63 |     default='v1.0',
 64 |     required=False,
 65 |     help='specify the dataset version, no need for kitti')
 66 | parser.add_argument(
 67 |     '--max-sweeps',
 68 |     type=int,
 69 |     default=10,
 70 |     required=False,
 71 |     help='specify sweeps of lidar per example')
 72 | parser.add_argument(
 73 |     '--with-plane',
 74 |     action='store_true',
 75 |     help='Whether to use plane information for kitti.')
 76 | parser.add_argument(
 77 |     '--out-dir',
 78 |     type=str,
 79 |     default='./data/kitti',
 80 |     required=False,
 81 |     help='name of info pkl')
 82 | parser.add_argument('--extra-tag', type=str, default='kitti')
 83 | parser.add_argument(
 84 |     '--workers', type=int, default=4, help='number of threads to be used')
 85 | args = parser.parse_args()
 86 | 
 87 | if __name__ == '__main__':
 88 |     # from mmdet3d.utils import register_all_modules
 89 |     # register_all_modules()
 90 | 
 91 |     if args.dataset == 'nuscenes' and args.version != 'v1.0-mini':
 92 |         train_version = f'{args.version}-trainval'
 93 |         nuscenes_data_prep(
 94 |             root_path=args.root_path,
 95 |             info_prefix=args.extra_tag,
 96 |             version=train_version,
 97 |             dataset_name='NuScenesDataset',
 98 |             out_dir=args.out_dir,
 99 |             max_sweeps=args.max_sweeps)
100 |         test_version = f'{args.version}-test'
101 |         nuscenes_data_prep(
102 |             root_path=args.root_path,
103 |             info_prefix=args.extra_tag,
104 |             version=test_version,
105 |             dataset_name='NuScenesDataset',
106 |             out_dir=args.out_dir,
107 |             max_sweeps=args.max_sweeps)
108 |     elif args.dataset == 'nuscenes' and args.version == 'v1.0-mini':
109 |         train_version = f'{args.version}'
110 |         nuscenes_data_prep(
111 |             root_path=args.root_path,
112 |             info_prefix=args.extra_tag,
113 |             version=train_version,
114 |             dataset_name='NuScenesDataset',
115 |             out_dir=args.out_dir,
116 |             max_sweeps=args.max_sweeps)
117 |     elif args.dataset == 'semantickitti':
118 |         semantickitti_data_prep(
119 |             info_prefix=args.extra_tag, out_dir=args.out_dir)
120 |     else:
121 |         raise NotImplementedError(f'Don\'t support {args.dataset} dataset.')
122 | 


--------------------------------------------------------------------------------
/tools/dataset_converters/nuscenes_converter.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) OpenMMLab. All rights reserved.
  2 | import os
  3 | from collections import OrderedDict
  4 | from os import path as osp
  5 | from typing import List, Tuple, Union
  6 | 
  7 | import mmcv
  8 | import mmengine
  9 | import numpy as np
 10 | from nuscenes.nuscenes import NuScenes
 11 | from nuscenes.utils.geometry_utils import view_points
 12 | from pyquaternion import Quaternion
 13 | from shapely.geometry import MultiPoint, box
 14 | 
 15 | # from mmdet3d.structures import points_cam2img
 16 | 
 17 | NuScenesNameMapping = {
 18 |     'movable_object.barrier': 'barrier',
 19 |     'vehicle.bicycle': 'bicycle',
 20 |     'vehicle.bus.bendy': 'bus',
 21 |     'vehicle.bus.rigid': 'bus',
 22 |     'vehicle.car': 'car',
 23 |     'vehicle.construction': 'construction_vehicle',
 24 |     'vehicle.motorcycle': 'motorcycle',
 25 |     'human.pedestrian.adult': 'pedestrian',
 26 |     'human.pedestrian.child': 'pedestrian',
 27 |     'human.pedestrian.construction_worker': 'pedestrian',
 28 |     'human.pedestrian.police_officer': 'pedestrian',
 29 |     'movable_object.trafficcone': 'traffic_cone',
 30 |     'vehicle.trailer': 'trailer',
 31 |     'vehicle.truck': 'truck'
 32 | }
 33 | 
 34 | nus_categories = ('car', 'truck', 'trailer', 'bus', 'construction_vehicle',
 35 |                   'bicycle', 'motorcycle', 'pedestrian', 'traffic_cone',
 36 |                   'barrier')
 37 | 
 38 | nus_attributes = ('cycle.with_rider', 'cycle.without_rider',
 39 |                   'pedestrian.moving', 'pedestrian.standing',
 40 |                   'pedestrian.sitting_lying_down', 'vehicle.moving',
 41 |                   'vehicle.parked', 'vehicle.stopped', 'None')
 42 | 
 43 | 
 44 | def create_nuscenes_infos(root_path,
 45 |                           info_prefix,
 46 |                           version='v1.0-trainval',
 47 |                           max_sweeps=10):
 48 |     """Create info file of nuscene dataset.
 49 | 
 50 |     Given the raw data, generate its related info file in pkl format.
 51 | 
 52 |     Args:
 53 |         root_path (str): Path of the data root.
 54 |         info_prefix (str): Prefix of the info file to be generated.
 55 |         version (str, optional): Version of the data.
 56 |             Default: 'v1.0-trainval'.
 57 |         max_sweeps (int, optional): Max number of sweeps.
 58 |             Default: 10.
 59 |     """
 60 |     from nuscenes.nuscenes import NuScenes
 61 |     nusc = NuScenes(version=version, dataroot=root_path, verbose=True)
 62 |     from nuscenes.utils import splits
 63 |     available_vers = ['v1.0-trainval', 'v1.0-test', 'v1.0-mini']
 64 |     assert version in available_vers
 65 |     if version == 'v1.0-trainval':
 66 |         train_scenes = splits.train
 67 |         val_scenes = splits.val
 68 |     elif version == 'v1.0-test':
 69 |         train_scenes = splits.test
 70 |         val_scenes = []
 71 |     elif version == 'v1.0-mini':
 72 |         train_scenes = splits.mini_train
 73 |         val_scenes = splits.mini_val
 74 |     else:
 75 |         raise ValueError('unknown')
 76 | 
 77 |     # filter existing scenes.
 78 |     available_scenes = get_available_scenes(nusc)
 79 |     available_scene_names = [s['name'] for s in available_scenes]
 80 |     train_scenes = list(
 81 |         filter(lambda x: x in available_scene_names, train_scenes))
 82 |     val_scenes = list(filter(lambda x: x in available_scene_names, val_scenes))
 83 |     train_scenes = set([
 84 |         available_scenes[available_scene_names.index(s)]['token']
 85 |         for s in train_scenes
 86 |     ])
 87 |     val_scenes = set([
 88 |         available_scenes[available_scene_names.index(s)]['token']
 89 |         for s in val_scenes
 90 |     ])
 91 | 
 92 |     test = 'test' in version
 93 |     if test:
 94 |         print('test scene: {}'.format(len(train_scenes)))
 95 |     else:
 96 |         print('train scene: {}, val scene: {}'.format(
 97 |             len(train_scenes), len(val_scenes)))
 98 |     train_nusc_infos, val_nusc_infos = _fill_trainval_infos(
 99 |         nusc, train_scenes, val_scenes, test, max_sweeps=max_sweeps)
100 | 
101 |     metadata = dict(version=version)
102 |     if test:
103 |         print('test sample: {}'.format(len(train_nusc_infos)))
104 |         data = dict(infos=train_nusc_infos, metadata=metadata)
105 |         info_path = osp.join(root_path,
106 |                              '{}_infos_test.pkl'.format(info_prefix))
107 |         mmengine.dump(data, info_path)
108 |     else:
109 |         print('train sample: {}, val sample: {}'.format(
110 |             len(train_nusc_infos), len(val_nusc_infos)))
111 |         data = dict(infos=train_nusc_infos, metadata=metadata)
112 |         info_path = osp.join(root_path,
113 |                              '{}_infos_train.pkl'.format(info_prefix))
114 |         mmengine.dump(data, info_path)
115 |         data['infos'] = val_nusc_infos
116 |         info_val_path = osp.join(root_path,
117 |                                  '{}_infos_val.pkl'.format(info_prefix))
118 |         mmengine.dump(data, info_val_path)
119 | 
120 | 
121 | def get_available_scenes(nusc):
122 |     """Get available scenes from the input nuscenes class.
123 | 
124 |     Given the raw data, get the information of available scenes for
125 |     further info generation.
126 | 
127 |     Args:
128 |         nusc (class): Dataset class in the nuScenes dataset.
129 | 
130 |     Returns:
131 |         available_scenes (list[dict]): List of basic information for the
132 |             available scenes.
133 |     """
134 |     available_scenes = []
135 |     print('total scene num: {}'.format(len(nusc.scene)))
136 |     for scene in nusc.scene:
137 |         scene_token = scene['token']
138 |         scene_rec = nusc.get('scene', scene_token)
139 |         sample_rec = nusc.get('sample', scene_rec['first_sample_token'])
140 |         sd_rec = nusc.get('sample_data', sample_rec['data']['LIDAR_TOP'])
141 |         has_more_frames = True
142 |         scene_not_exist = False
143 |         while has_more_frames:
144 |             lidar_path, boxes, _ = nusc.get_sample_data(sd_rec['token'])
145 |             lidar_path = str(lidar_path)
146 |             if os.getcwd() in lidar_path:
147 |                 # path from lyftdataset is absolute path
148 |                 lidar_path = lidar_path.split(f'{os.getcwd()}/')[-1]
149 |                 # relative path
150 |             if not mmengine.is_filepath(lidar_path):
151 |                 scene_not_exist = True
152 |                 break
153 |             else:
154 |                 break
155 |         if scene_not_exist:
156 |             continue
157 |         available_scenes.append(scene)
158 |     print('exist scene num: {}'.format(len(available_scenes)))
159 |     return available_scenes
160 | 
161 | 
162 | def _fill_trainval_infos(nusc,
163 |                          train_scenes,
164 |                          val_scenes,
165 |                          test=False,
166 |                          max_sweeps=10):
167 |     """Generate the train/val infos from the raw data.
168 | 
169 |     Args:
170 |         nusc (:obj:`NuScenes`): Dataset class in the nuScenes dataset.
171 |         train_scenes (list[str]): Basic information of training scenes.
172 |         val_scenes (list[str]): Basic information of validation scenes.
173 |         test (bool, optional): Whether use the test mode. In test mode, no
174 |             annotations can be accessed. Default: False.
175 |         max_sweeps (int, optional): Max number of sweeps. Default: 10.
176 | 
177 |     Returns:
178 |         tuple[list[dict]]: Information of training set and validation set
179 |             that will be saved to the info file.
180 |     """
181 |     train_nusc_infos = []
182 |     val_nusc_infos = []
183 | 
184 |     for sample in mmengine.track_iter_progress(nusc.sample):
185 |         lidar_token = sample['data']['LIDAR_TOP']
186 |         sd_rec = nusc.get('sample_data', sample['data']['LIDAR_TOP'])
187 |         cs_record = nusc.get('calibrated_sensor',
188 |                              sd_rec['calibrated_sensor_token'])
189 |         pose_record = nusc.get('ego_pose', sd_rec['ego_pose_token'])
190 |         pts_semantic_mask_path = os.path.join(nusc.get('lidarseg', sd_rec['ego_pose_token'])['filename'])
191 |         pts_panoptic_mask_path = os.path.join(nusc.get('panoptic', sd_rec['ego_pose_token'])['filename'])
192 |         lidar_path, boxes, _ = nusc.get_sample_data(lidar_token)  
193 |         mmengine.check_file_exist(lidar_path)
194 | 
195 |         info = {
196 |             'lidar_path': lidar_path,
197 |             'num_features': 5,
198 |             'token': sample['token'],
199 |             'sweeps': [],
200 |             'cams': dict(),
201 |             'lidar2ego_translation': cs_record['translation'],
202 |             'lidar2ego_rotation': cs_record['rotation'],
203 |             'ego2global_translation': pose_record['translation'],
204 |             'ego2global_rotation': pose_record['rotation'],
205 |             'timestamp': sample['timestamp'],
206 |             'pts_semantic_mask_path': pts_semantic_mask_path,
207 |             'pts_panoptic_mask_path': pts_panoptic_mask_path
208 |         }
209 | 
210 |         l2e_r = info['lidar2ego_rotation']
211 |         l2e_t = info['lidar2ego_translation']
212 |         e2g_r = info['ego2global_rotation']
213 |         e2g_t = info['ego2global_translation']
214 |         l2e_r_mat = Quaternion(l2e_r).rotation_matrix
215 |         e2g_r_mat = Quaternion(e2g_r).rotation_matrix
216 | 
217 |         # obtain 6 image's information per frame
218 |         camera_types = [
219 |             'CAM_FRONT',
220 |             'CAM_FRONT_RIGHT',
221 |             'CAM_FRONT_LEFT',
222 |             'CAM_BACK',
223 |             'CAM_BACK_LEFT',
224 |             'CAM_BACK_RIGHT',
225 |         ]
226 |         for cam in camera_types:
227 |             cam_token = sample['data'][cam]
228 |             cam_path, _, cam_intrinsic = nusc.get_sample_data(cam_token)
229 |             cam_info = obtain_sensor2top(nusc, cam_token, l2e_t, l2e_r_mat,
230 |                                          e2g_t, e2g_r_mat, cam)
231 |             cam_info.update(cam_intrinsic=cam_intrinsic)
232 |             info['cams'].update({cam: cam_info})
233 | 
234 |         # obtain sweeps for a single key-frame
235 |         sd_rec = nusc.get('sample_data', sample['data']['LIDAR_TOP'])
236 |         sweeps = []
237 |         while len(sweeps) < max_sweeps:
238 |             if not sd_rec['prev'] == '':
239 |                 sweep = obtain_sensor2top(nusc, sd_rec['prev'], l2e_t,
240 |                                           l2e_r_mat, e2g_t, e2g_r_mat, 'lidar')
241 |                 sweeps.append(sweep)
242 |                 sd_rec = nusc.get('sample_data', sd_rec['prev'])
243 |             else:
244 |                 break
245 |         info['sweeps'] = sweeps
246 |         # obtain annotation
247 |         if not test:
248 |             annotations = [
249 |                 nusc.get('sample_annotation', token)
250 |                 for token in sample['anns']
251 |             ]
252 |             locs = np.array([b.center for b in boxes]).reshape(-1, 3)
253 |             dims = np.array([b.wlh for b in boxes]).reshape(-1, 3)
254 |             rots = np.array([b.orientation.yaw_pitch_roll[0]
255 |                              for b in boxes]).reshape(-1, 1)
256 |             velocity = np.array(
257 |                 [nusc.box_velocity(token)[:2] for token in sample['anns']])
258 |             valid_flag = np.array(
259 |                 [(anno['num_lidar_pts'] + anno['num_radar_pts']) > 0
260 |                  for anno in annotations],
261 |                 dtype=bool).reshape(-1)
262 |             # convert velo from global to lidar
263 |             for i in range(len(boxes)):
264 |                 velo = np.array([*velocity[i], 0.0])
265 |                 velo = velo @ np.linalg.inv(e2g_r_mat).T @ np.linalg.inv(
266 |                     l2e_r_mat).T
267 |                 velocity[i] = velo[:2]
268 | 
269 |             names = [b.name for b in boxes]
270 |             for i in range(len(names)):
271 |                 if names[i] in NuScenesNameMapping:
272 |                     names[i] = NuScenesNameMapping[names[i]]
273 |             names = np.array(names)
274 |             # we need to convert box size to
275 |             # the format of our lidar coordinate system
276 |             # which is x_size, y_size, z_size (corresponding to l, w, h)
277 |             gt_boxes = np.concatenate([locs, dims[:, [1, 0, 2]], rots], axis=1)
278 |             assert len(gt_boxes) == len(
279 |                 annotations), f'{len(gt_boxes)}, {len(annotations)}'
280 |             info['gt_boxes'] = gt_boxes
281 |             info['gt_names'] = names
282 |             info['gt_velocity'] = velocity.reshape(-1, 2)
283 |             info['num_lidar_pts'] = np.array(
284 |                 [a['num_lidar_pts'] for a in annotations])
285 |             info['num_radar_pts'] = np.array(
286 |                 [a['num_radar_pts'] for a in annotations])
287 |             info['valid_flag'] = valid_flag
288 | 
289 |         if sample['scene_token'] in train_scenes:
290 |             train_nusc_infos.append(info)
291 |         else:
292 |             val_nusc_infos.append(info)
293 | 
294 |     return train_nusc_infos, val_nusc_infos
295 | 
296 | 
297 | def obtain_sensor2top(nusc,
298 |                       sensor_token,
299 |                       l2e_t,
300 |                       l2e_r_mat,
301 |                       e2g_t,
302 |                       e2g_r_mat,
303 |                       sensor_type='lidar'):
304 |     """Obtain the info with RT matric from general sensor to Top LiDAR.
305 | 
306 |     Args:
307 |         nusc (class): Dataset class in the nuScenes dataset.
308 |         sensor_token (str): Sample data token corresponding to the
309 |             specific sensor type.
310 |         l2e_t (np.ndarray): Translation from lidar to ego in shape (1, 3).
311 |         l2e_r_mat (np.ndarray): Rotation matrix from lidar to ego
312 |             in shape (3, 3).
313 |         e2g_t (np.ndarray): Translation from ego to global in shape (1, 3).
314 |         e2g_r_mat (np.ndarray): Rotation matrix from ego to global
315 |             in shape (3, 3).
316 |         sensor_type (str, optional): Sensor to calibrate. Default: 'lidar'.
317 | 
318 |     Returns:
319 |         sweep (dict): Sweep information after transformation.
320 |     """
321 |     sd_rec = nusc.get('sample_data', sensor_token)
322 |     cs_record = nusc.get('calibrated_sensor',
323 |                          sd_rec['calibrated_sensor_token'])
324 |     pose_record = nusc.get('ego_pose', sd_rec['ego_pose_token'])
325 |     data_path = str(nusc.get_sample_data_path(sd_rec['token']))
326 |     if os.getcwd() in data_path:  # path from lyftdataset is absolute path
327 |         data_path = data_path.split(f'{os.getcwd()}/')[-1]  # relative path
328 |     sweep = {
329 |         'data_path': data_path,
330 |         'type': sensor_type,
331 |         'sample_data_token': sd_rec['token'],
332 |         'sensor2ego_translation': cs_record['translation'],
333 |         'sensor2ego_rotation': cs_record['rotation'],
334 |         'ego2global_translation': pose_record['translation'],
335 |         'ego2global_rotation': pose_record['rotation'],
336 |         'timestamp': sd_rec['timestamp']
337 |     }
338 |     l2e_r_s = sweep['sensor2ego_rotation']
339 |     l2e_t_s = sweep['sensor2ego_translation']
340 |     e2g_r_s = sweep['ego2global_rotation']
341 |     e2g_t_s = sweep['ego2global_translation']
342 | 
343 |     # obtain the RT from sensor to Top LiDAR
344 |     # sweep->ego->global->ego'->lidar
345 |     l2e_r_s_mat = Quaternion(l2e_r_s).rotation_matrix
346 |     e2g_r_s_mat = Quaternion(e2g_r_s).rotation_matrix
347 |     R = (l2e_r_s_mat.T @ e2g_r_s_mat.T) @ (
348 |         np.linalg.inv(e2g_r_mat).T @ np.linalg.inv(l2e_r_mat).T)
349 |     T = (l2e_t_s @ e2g_r_s_mat.T + e2g_t_s) @ (
350 |         np.linalg.inv(e2g_r_mat).T @ np.linalg.inv(l2e_r_mat).T)
351 |     T -= e2g_t @ (np.linalg.inv(e2g_r_mat).T @ np.linalg.inv(l2e_r_mat).T
352 |                   ) + l2e_t @ np.linalg.inv(l2e_r_mat).T
353 |     sweep['sensor2lidar_rotation'] = R.T  # points @ R.T + T
354 |     sweep['sensor2lidar_translation'] = T
355 |     return sweep
356 | 
357 | 
358 | def export_2d_annotation(root_path, info_path, version, mono3d=True):
359 |     """Export 2d annotation from the info file and raw data.
360 | 
361 |     Args:
362 |         root_path (str): Root path of the raw data.
363 |         info_path (str): Path of the info file.
364 |         version (str): Dataset version.
365 |         mono3d (bool, optional): Whether to export mono3d annotation.
366 |             Default: True.
367 |     """
368 |     # get bbox annotations for camera
369 |     camera_types = [
370 |         'CAM_FRONT',
371 |         'CAM_FRONT_RIGHT',
372 |         'CAM_FRONT_LEFT',
373 |         'CAM_BACK',
374 |         'CAM_BACK_LEFT',
375 |         'CAM_BACK_RIGHT',
376 |     ]
377 |     nusc_infos = mmengine.load(info_path)['infos']
378 |     nusc = NuScenes(version=version, dataroot=root_path, verbose=True)
379 |     # info_2d_list = []
380 |     cat2Ids = [
381 |         dict(id=nus_categories.index(cat_name), name=cat_name)
382 |         for cat_name in nus_categories
383 |     ]
384 |     coco_ann_id = 0
385 |     coco_2d_dict = dict(annotations=[], images=[], categories=cat2Ids)
386 |     for info in mmengine.track_iter_progress(nusc_infos):
387 |         for cam in camera_types:
388 |             cam_info = info['cams'][cam]
389 |             coco_infos = get_2d_boxes(
390 |                 nusc,
391 |                 cam_info['sample_data_token'],
392 |                 visibilities=['', '1', '2', '3', '4'],
393 |                 mono3d=mono3d)
394 |             (height, width, _) = mmcv.imread(cam_info['data_path']).shape
395 |             coco_2d_dict['images'].append(
396 |                 dict(
397 |                     file_name=cam_info['data_path'].split('data/nuscenes/')
398 |                     [-1],
399 |                     id=cam_info['sample_data_token'],
400 |                     token=info['token'],
401 |                     cam2ego_rotation=cam_info['sensor2ego_rotation'],
402 |                     cam2ego_translation=cam_info['sensor2ego_translation'],
403 |                     ego2global_rotation=info['ego2global_rotation'],
404 |                     ego2global_translation=info['ego2global_translation'],
405 |                     cam_intrinsic=cam_info['cam_intrinsic'],
406 |                     width=width,
407 |                     height=height))
408 |             for coco_info in coco_infos:
409 |                 if coco_info is None:
410 |                     continue
411 |                 # add an empty key for coco format
412 |                 coco_info['segmentation'] = []
413 |                 coco_info['id'] = coco_ann_id
414 |                 coco_2d_dict['annotations'].append(coco_info)
415 |                 coco_ann_id += 1
416 |     if mono3d:
417 |         json_prefix = f'{info_path[:-4]}_mono3d'
418 |     else:
419 |         json_prefix = f'{info_path[:-4]}'
420 |     mmengine.dump(coco_2d_dict, f'{json_prefix}.coco.json')
421 | 
422 | 
423 | def get_2d_boxes(nusc,
424 |                  sample_data_token: str,
425 |                  visibilities: List[str],
426 |                  mono3d=True):
427 |     """Get the 2D annotation records for a given `sample_data_token`.
428 | 
429 |     Args:
430 |         sample_data_token (str): Sample data token belonging to a camera
431 |             keyframe.
432 |         visibilities (list[str]): Visibility filter.
433 |         mono3d (bool): Whether to get boxes with mono3d annotation.
434 | 
435 |     Return:
436 |         list[dict]: List of 2D annotation record that belongs to the input
437 |             `sample_data_token`.
438 |     """
439 | 
440 |     # Get the sample data and the sample corresponding to that sample data.
441 |     sd_rec = nusc.get('sample_data', sample_data_token)
442 | 
443 |     assert sd_rec[
444 |         'sensor_modality'] == 'camera', 'Error: get_2d_boxes only works' \
445 |         ' for camera sample_data!'
446 |     if not sd_rec['is_key_frame']:
447 |         raise ValueError(
448 |             'The 2D re-projections are available only for keyframes.')
449 | 
450 |     s_rec = nusc.get('sample', sd_rec['sample_token'])
451 | 
452 |     # Get the calibrated sensor and ego pose
453 |     # record to get the transformation matrices.
454 |     cs_rec = nusc.get('calibrated_sensor', sd_rec['calibrated_sensor_token'])
455 |     pose_rec = nusc.get('ego_pose', sd_rec['ego_pose_token'])
456 |     camera_intrinsic = np.array(cs_rec['camera_intrinsic'])
457 | 
458 |     # Get all the annotation with the specified visibilties.
459 |     ann_recs = [
460 |         nusc.get('sample_annotation', token) for token in s_rec['anns']
461 |     ]
462 |     ann_recs = [
463 |         ann_rec for ann_rec in ann_recs
464 |         if (ann_rec['visibility_token'] in visibilities)
465 |     ]
466 | 
467 |     repro_recs = []
468 | 
469 |     for ann_rec in ann_recs:
470 |         # Augment sample_annotation with token information.
471 |         ann_rec['sample_annotation_token'] = ann_rec['token']
472 |         ann_rec['sample_data_token'] = sample_data_token
473 | 
474 |         # Get the box in global coordinates.
475 |         box = nusc.get_box(ann_rec['token'])
476 | 
477 |         # Move them to the ego-pose frame.
478 |         box.translate(-np.array(pose_rec['translation']))
479 |         box.rotate(Quaternion(pose_rec['rotation']).inverse)
480 | 
481 |         # Move them to the calibrated sensor frame.
482 |         box.translate(-np.array(cs_rec['translation']))
483 |         box.rotate(Quaternion(cs_rec['rotation']).inverse)
484 | 
485 |         # Filter out the corners that are not in front of the calibrated
486 |         # sensor.
487 |         corners_3d = box.corners()
488 |         in_front = np.argwhere(corners_3d[2, :] > 0).flatten()
489 |         corners_3d = corners_3d[:, in_front]
490 | 
491 |         # Project 3d box to 2d.
492 |         corner_coords = view_points(corners_3d, camera_intrinsic,
493 |                                     True).T[:, :2].tolist()
494 | 
495 |         # Keep only corners that fall within the image.
496 |         final_coords = post_process_coords(corner_coords)
497 | 
498 |         # Skip if the convex hull of the re-projected corners
499 |         # does not intersect the image canvas.
500 |         if final_coords is None:
501 |             continue
502 |         else:
503 |             min_x, min_y, max_x, max_y = final_coords
504 | 
505 |         # Generate dictionary record to be included in the .json file.
506 |         repro_rec = generate_record(ann_rec, min_x, min_y, max_x, max_y,
507 |                                     sample_data_token, sd_rec['filename'])
508 | 
509 |         # If mono3d=True, add 3D annotations in camera coordinates
510 |         if mono3d and (repro_rec is not None):
511 |             loc = box.center.tolist()
512 | 
513 |             dim = box.wlh
514 |             dim[[0, 1, 2]] = dim[[1, 2, 0]]  # convert wlh to our lhw
515 |             dim = dim.tolist()
516 | 
517 |             rot = box.orientation.yaw_pitch_roll[0]
518 |             rot = [-rot]  # convert the rot to our cam coordinate
519 | 
520 |             global_velo2d = nusc.box_velocity(box.token)[:2]
521 |             global_velo3d = np.array([*global_velo2d, 0.0])
522 |             e2g_r_mat = Quaternion(pose_rec['rotation']).rotation_matrix
523 |             c2e_r_mat = Quaternion(cs_rec['rotation']).rotation_matrix
524 |             cam_velo3d = global_velo3d @ np.linalg.inv(
525 |                 e2g_r_mat).T @ np.linalg.inv(c2e_r_mat).T
526 |             velo = cam_velo3d[0::2].tolist()
527 | 
528 |             repro_rec['bbox_cam3d'] = loc + dim + rot
529 |             repro_rec['velo_cam3d'] = velo
530 | 
531 |             center3d = np.array(loc).reshape([1, 3])
532 |             # center2d = points_cam2img(
533 |             #     center3d, camera_intrinsic, with_depth=True)
534 |             # repro_rec['center2d'] = center2d.squeeze().tolist()
535 |             # normalized center2D + depth
536 |             # if samples with depth < 0 will be removed
537 |             # if repro_rec['center2d'][2] <= 0:
538 |             #     continue
539 | 
540 |             ann_token = nusc.get('sample_annotation',
541 |                                  box.token)['attribute_tokens']
542 |             if len(ann_token) == 0:
543 |                 attr_name = 'None'
544 |             else:
545 |                 attr_name = nusc.get('attribute', ann_token[0])['name']
546 |             attr_id = nus_attributes.index(attr_name)
547 |             repro_rec['attribute_name'] = attr_name
548 |             repro_rec['attribute_id'] = attr_id
549 | 
550 |         repro_recs.append(repro_rec)
551 | 
552 |     return repro_recs
553 | 
554 | 
555 | def post_process_coords(
556 |     corner_coords: List, imsize: Tuple[int, int] = (1600, 900)
557 | ) -> Union[Tuple[float, float, float, float], None]:
558 |     """Get the intersection of the convex hull of the reprojected bbox corners
559 |     and the image canvas, return None if no intersection.
560 | 
561 |     Args:
562 |         corner_coords (list[int]): Corner coordinates of reprojected
563 |             bounding box.
564 |         imsize (tuple[int]): Size of the image canvas.
565 | 
566 |     Return:
567 |         tuple [float]: Intersection of the convex hull of the 2D box
568 |             corners and the image canvas.
569 |     """
570 |     polygon_from_2d_box = MultiPoint(corner_coords).convex_hull
571 |     img_canvas = box(0, 0, imsize[0], imsize[1])
572 | 
573 |     if polygon_from_2d_box.intersects(img_canvas):
574 |         img_intersection = polygon_from_2d_box.intersection(img_canvas)
575 |         intersection_coords = np.array(
576 |             [coord for coord in img_intersection.exterior.coords])
577 | 
578 |         min_x = min(intersection_coords[:, 0])
579 |         min_y = min(intersection_coords[:, 1])
580 |         max_x = max(intersection_coords[:, 0])
581 |         max_y = max(intersection_coords[:, 1])
582 | 
583 |         return min_x, min_y, max_x, max_y
584 |     else:
585 |         return None
586 | 
587 | 
588 | def generate_record(ann_rec: dict, x1: float, y1: float, x2: float, y2: float,
589 |                     sample_data_token: str, filename: str) -> OrderedDict:
590 |     """Generate one 2D annotation record given various information on top of
591 |     the 2D bounding box coordinates.
592 | 
593 |     Args:
594 |         ann_rec (dict): Original 3d annotation record.
595 |         x1 (float): Minimum value of the x coordinate.
596 |         y1 (float): Minimum value of the y coordinate.
597 |         x2 (float): Maximum value of the x coordinate.
598 |         y2 (float): Maximum value of the y coordinate.
599 |         sample_data_token (str): Sample data token.
600 |         filename (str):The corresponding image file where the annotation
601 |             is present.
602 | 
603 |     Returns:
604 |         dict: A sample 2D annotation record.
605 |             - file_name (str): file name
606 |             - image_id (str): sample data token
607 |             - area (float): 2d box area
608 |             - category_name (str): category name
609 |             - category_id (int): category id
610 |             - bbox (list[float]): left x, top y, dx, dy of 2d box
611 |             - iscrowd (int): whether the area is crowd
612 |     """
613 |     repro_rec = OrderedDict()
614 |     repro_rec['sample_data_token'] = sample_data_token
615 |     coco_rec = dict()
616 | 
617 |     relevant_keys = [
618 |         'attribute_tokens',
619 |         'category_name',
620 |         'instance_token',
621 |         'next',
622 |         'num_lidar_pts',
623 |         'num_radar_pts',
624 |         'prev',
625 |         'sample_annotation_token',
626 |         'sample_data_token',
627 |         'visibility_token',
628 |     ]
629 | 
630 |     for key, value in ann_rec.items():
631 |         if key in relevant_keys:
632 |             repro_rec[key] = value
633 | 
634 |     repro_rec['bbox_corners'] = [x1, y1, x2, y2]
635 |     repro_rec['filename'] = filename
636 | 
637 |     coco_rec['file_name'] = filename
638 |     coco_rec['image_id'] = sample_data_token
639 |     coco_rec['area'] = (y2 - y1) * (x2 - x1)
640 | 
641 |     if repro_rec['category_name'] not in NuScenesNameMapping:
642 |         return None
643 |     cat_name = NuScenesNameMapping[repro_rec['category_name']]
644 |     coco_rec['category_name'] = cat_name
645 |     coco_rec['category_id'] = nus_categories.index(cat_name)
646 |     coco_rec['bbox'] = [x1, y1, x2 - x1, y2 - y1]
647 |     coco_rec['iscrowd'] = 0
648 | 
649 |     return coco_rec
650 | 


--------------------------------------------------------------------------------
/tools/dataset_converters/semantickitti_converter.py:
--------------------------------------------------------------------------------
  1 | from os import path as osp
  2 | from pathlib import Path
  3 | 
  4 | import mmengine
  5 | 
  6 | total_num = {
  7 |     0: 4541,
  8 |     1: 1101,
  9 |     2: 4661,
 10 |     3: 801,
 11 |     4: 271,
 12 |     5: 2761,
 13 |     6: 1101,
 14 |     7: 1101,
 15 |     8: 4071,
 16 |     9: 1591,
 17 |     10: 1201,
 18 |     11: 921,
 19 |     12: 1061,
 20 |     13: 3281,
 21 |     14: 631,
 22 |     15: 1901,
 23 |     16: 1731,
 24 |     17: 491,
 25 |     18: 1801,
 26 |     19: 4981,
 27 |     20: 831,
 28 |     21: 2721,
 29 | }
 30 | fold_split = {
 31 |     'train': [0, 1, 2, 3, 4, 5, 6, 7, 9, 10],
 32 |     'val': [8],
 33 |     'test': [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
 34 |     'trainval': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
 35 | }
 36 | 
 37 | 
 38 | def get_semantickitti_info(split):
 39 |     """Create info file in the form of
 40 |         data_infos={
 41 |             'metainfo': {'DATASET': 'SemanticKITTI'},
 42 |             'data_list': {
 43 |                 00000: {
 44 |                     'lidar_points':{
 45 |                         'lidat_path':'sequences/00/velodyne/000000.bin'
 46 |                     },
 47 |                     'pts_semantic_mask_path':
 48 |                         'sequences/000/labels/000000.labbel',
 49 |                     'pts_panoptic_mask_path':
 50 |                         'sequences/000/labels/000000.labbel',
 51 |                     'sample_id': '00'
 52 |                 },
 53 |                 ...
 54 |             }
 55 |         }
 56 |     """
 57 |     data_infos = dict()
 58 |     data_infos['metainfo'] = dict(DATASET='SemanticKITTI')
 59 |     data_list = []
 60 |     for i_folder in fold_split[split]:
 61 |         for j in range(0, total_num[i_folder]):
 62 |             data_list.append({
 63 |                 'lidar_points': {
 64 |                     'lidar_path':
 65 |                     osp.join('sequences',
 66 |                              str(i_folder).zfill(2), 'velodyne',
 67 |                              str(j).zfill(6) + '.bin'),
 68 |                     'num_pts_feats':
 69 |                     4
 70 |                 },
 71 |                 'pts_semantic_mask_path':
 72 |                 osp.join('sequences',
 73 |                          str(i_folder).zfill(2), 'labels',
 74 |                          str(j).zfill(6) + '.label'),
 75 |                 'pts_panoptic_mask_path':
 76 |                 osp.join('sequences',
 77 |                          str(i_folder).zfill(2), 'labels',
 78 |                          str(j).zfill(6) + '.label'),
 79 |                 'sample_id':
 80 |                 str(i_folder) + str(j)
 81 |             })
 82 |     data_infos.update(dict(data_list=data_list))
 83 |     return data_infos
 84 | 
 85 | 
 86 | def create_semantickitti_info_file(pkl_prefix, save_path):
 87 |     """Create info file of SemanticKITTI dataset.
 88 | 
 89 |     Directly generate info file without raw data.
 90 | 
 91 |     Args:
 92 |         pkl_prefix (str): Prefix of the info file to be generated.
 93 |         save_path (str): Path to save the info file.
 94 |     """
 95 |     print('Generate info.')
 96 |     save_path = Path(save_path)
 97 | 
 98 |     semantickitti_infos_train = get_semantickitti_info(split='train')
 99 |     filename = save_path / f'{pkl_prefix}_infos_train.pkl'
100 |     print(f'SemanticKITTI info train file is saved to {filename}')
101 |     mmengine.dump(semantickitti_infos_train, filename)
102 |     semantickitti_infos_val = get_semantickitti_info(split='val')
103 |     filename = save_path / f'{pkl_prefix}_infos_val.pkl'
104 |     print(f'SemanticKITTI info val file is saved to {filename}')
105 |     mmengine.dump(semantickitti_infos_val, filename)
106 |     semantickitti_infos_test = get_semantickitti_info(split='test')
107 |     filename = save_path / f'{pkl_prefix}_infos_test.pkl'
108 |     print(f'SemanticKITTI info test file is saved to {filename}')
109 |     mmengine.dump(semantickitti_infos_test, filename)
110 |     semantickitti_infos_test = get_semantickitti_info(split='trainval')
111 |     filename = save_path / f'{pkl_prefix}_infos_trainval.pkl'
112 |     print(f'SemanticKITTI info test file is saved to {filename}')
113 |     mmengine.dump(semantickitti_infos_test, filename)


--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
  1 | # Copyright (c) OpenMMLab. All rights reserved.
  2 | import argparse
  3 | import logging
  4 | import os
  5 | import os.path as osp
  6 | 
  7 | from mmengine.config import Config, DictAction
  8 | from mmengine.logging import print_log
  9 | from mmengine.registry import RUNNERS
 10 | from mmengine.runner import Runner
 11 | 
 12 | from mmdet3d.utils import replace_ceph_backend
 13 | 
 14 | def parse_args():
 15 |     parser = argparse.ArgumentParser(description='Train a 3D detector')
 16 |     parser.add_argument('config', help='train config file path')
 17 |     parser.add_argument('--work-dir', help='the dir to save logs and models')
 18 |     parser.add_argument('--checkpoint', help='the dir to save logs and models')
 19 |     parser.add_argument(
 20 |         '--amp',
 21 |         action='store_true',
 22 |         default=False,
 23 |         help='enable automatic-mixed-precision training')
 24 |     parser.add_argument(
 25 |         '--auto-scale-lr',
 26 |         action='store_true',
 27 |         help='enable automatically scaling LR.')
 28 |     parser.add_argument(
 29 |         '--resume',
 30 |         nargs='?',
 31 |         type=str,
 32 |         const='auto',
 33 |         help='If specify checkpoint path, resume from it, while if not '
 34 |         'specify, try to auto resume from the latest checkpoint '
 35 |         'in the work directory.')
 36 |     parser.add_argument(
 37 |         '--ceph', action='store_true', help='Use ceph as data storage backend')
 38 |     parser.add_argument(
 39 |         '--cfg-options',
 40 |         nargs='+',
 41 |         action=DictAction,
 42 |         help='override some settings in the used config, the key-value pair '
 43 |         'in xxx=yyy format will be merged into config file. If the value to '
 44 |         'be overwritten is a list, it should be like key="[a,b]" or key=a,b '
 45 |         'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
 46 |         'Note that the quotation marks are necessary and that no white space '
 47 |         'is allowed.')
 48 |     parser.add_argument(
 49 |         '--launcher',
 50 |         choices=['none', 'pytorch', 'slurm', 'mpi'],
 51 |         default='none',
 52 |         help='job launcher')
 53 |     # When using PyTorch version >= 2.0.0, the `torch.distributed.launch`
 54 |     # will pass the `--local-rank` parameter to `tools/train.py` instead
 55 |     # of `--local_rank`.
 56 |     parser.add_argument('--local_rank', '--local-rank', type=int, default=0)
 57 |     args = parser.parse_args()
 58 |     if 'LOCAL_RANK' not in os.environ:
 59 |         os.environ['LOCAL_RANK'] = str(args.local_rank)
 60 |     return args
 61 | 
 62 | 
 63 | def main():
 64 |     args = parse_args()
 65 | 
 66 |     # load config
 67 |     cfg = Config.fromfile(args.config)
 68 | 
 69 |     # TODO: We will unify the ceph support approach with other OpenMMLab repos
 70 |     if args.ceph:
 71 |         cfg = replace_ceph_backend(cfg)
 72 | 
 73 |     cfg.launcher = args.launcher
 74 |     if args.cfg_options is not None:
 75 |         cfg.merge_from_dict(args.cfg_options)
 76 | 
 77 |     # work_dir is determined in this priority: CLI > segment in file > filename
 78 |     if args.work_dir is not None:
 79 |         # update configs according to CLI args if args.work_dir is not None
 80 |         cfg.work_dir = args.work_dir
 81 |     elif cfg.get('work_dir', None) is None:
 82 |         # use config filename as default work_dir if cfg.work_dir is None
 83 |         cfg.work_dir = osp.join('./work_dirs',
 84 |                                 osp.splitext(osp.basename(args.config))[0])
 85 | 
 86 |     # enable automatic-mixed-precision training
 87 |     if args.amp is True:
 88 |         optim_wrapper = cfg.optim_wrapper.type
 89 |         if optim_wrapper == 'AmpOptimWrapper':
 90 |             print_log(
 91 |                 'AMP training is already enabled in your config.',
 92 |                 logger='current',
 93 |                 level=logging.WARNING)
 94 |         else:
 95 |             assert optim_wrapper == 'OptimWrapper', (
 96 |                 '`--amp` is only supported when the optimizer wrapper type is '
 97 |                 f'`OptimWrapper` but got {optim_wrapper}.')
 98 |             cfg.optim_wrapper.type = 'AmpOptimWrapper'
 99 |             cfg.optim_wrapper.loss_scale = 'dynamic'
100 | 
101 |     # enable automatically scaling LR
102 |     if args.auto_scale_lr:
103 |         if 'auto_scale_lr' in cfg and \
104 |                 'enable' in cfg.auto_scale_lr and \
105 |                 'base_batch_size' in cfg.auto_scale_lr:
106 |             cfg.auto_scale_lr.enable = True
107 |         else:
108 |             raise RuntimeError('Can not find "auto_scale_lr" or '
109 |                                '"auto_scale_lr.enable" or '
110 |                                '"auto_scale_lr.base_batch_size" in your'
111 |                                ' configuration file.')
112 | 
113 |     # resume is determined in this priority: resume from > auto_resume
114 |     if args.resume == 'auto':
115 |         cfg.resume = True
116 |         cfg.load_from = None
117 |     elif args.checkpoint is not None:
118 |         cfg.load_from = args.checkpoint
119 |     elif args.resume is not None:
120 |         cfg.resume = True
121 |         cfg.load_from = args.resume
122 | 
123 |     # build the runner from config
124 |     if 'runner_type' not in cfg:
125 |         # build the default runner
126 |         runner = Runner.from_cfg(cfg)
127 |     else:
128 |         # build customized runner from the registry
129 |         # if 'runner_type' is set in the cfg
130 |         runner = RUNNERS.build(cfg)
131 | 
132 |     # start training
133 |     runner.train()
134 | 
135 | 
136 | if __name__ == '__main__':
137 |     main()
138 | 


--------------------------------------------------------------------------------