├── .gitignore ├── .gitmodules ├── LICENSE ├── README.md ├── accuracy-latency.png ├── ckpt └── README.md ├── configs ├── fcos3d_0.25.py ├── fcos3d_0.50.py ├── fcos3d_0.75.py ├── fcos3d_1.00.py ├── lzu_fcos3d_0.25.py ├── lzu_fcos3d_0.50.py ├── lzu_fcos3d_0.75.py └── lzu_fcos3d_1.00.py ├── data └── README.md ├── demo.gif ├── environment.yml ├── lzu ├── __init__.py ├── fcos_mono3d_head_norescale.py ├── fixed_grid.py ├── invert_grid.py ├── lzu_fcos_mono3d.py └── transforms_3d.py ├── run.sh ├── saliency.pkl ├── setup.py ├── teaser.gif └── tools ├── test.py └── train.py /.gitignore: -------------------------------------------------------------------------------- 1 | .vscode/ 2 | data/ 3 | ckpt/ 4 | *.egg-info/ 5 | __pycache__/ 6 | -------------------------------------------------------------------------------- /.gitmodules: -------------------------------------------------------------------------------- 1 | [submodule "mmdetection3d_v1.0.0rc6"] 2 | path = mmdetection3d_v1.0.0rc6 3 | url = https://github.com/open-mmlab/mmdetection3d.git 4 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2023 Chittesh Thavamani 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Learning to Zoom and Unzoom 2 | 3 | Official repository for the CVPR 2023 paper _Learning to Zoom and Unzoom_ [[paper]](https://arxiv.org/abs/2303.15390) [[website]](https://tchittesh.github.io/lzu/) [[talk]](https://youtu.be/wALSrBZiUgc). 4 | 5 |

6 | <img src="teaser.gif" alt="How LZU works"> 7 | <img src="demo.gif" alt="Video Demo of LZU"> 8 |

9 | 10 | In a nutshell, LZU is a highly flexible method for applying spatial attention to neural networks. 11 | The extremely simple source code ([zoom](./lzu/fixed_grid.py) and [unzoom](./lzu/invert_grid.py)) can be applied to any model that uses spatial processing (e.g. convolutions); see the minimal sketch below. 12 | 13 | ## Setup (Code + Data + Models) 14 | 15 |
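For reference, here is a minimal sketch of the zoom → process → unzoom pattern mentioned above. It is not the exact `LZUFCOSMono3D` forward pass; the `lzu_forward` helper and `model` argument are illustrative, and the two grids are assumed to come from a generator like `FixedGrid` in `grid_sample` format.

```python
import torch.nn.functional as F

from lzu.invert_grid import invert_grid


def lzu_forward(img, upsampled_grid, grid, model, separable=True):
    """Zoom the input, run any spatial model, then unzoom its output.

    img: B x C x H x W image batch.
    upsampled_grid, grid: warps from a grid generator (e.g. FixedGrid),
        batched to B, in grid_sample format (B x h x w x 2, values in [-1, 1]).
    model: any fully spatial network returning a B x C' x H' x W' map.
    """
    # 1) "Zoom": spatially warp the image with the saliency-derived grid.
    warped = F.grid_sample(img, upsampled_grid, align_corners=True)

    # 2) Run the unmodified model on the warped ("zoomed") image.
    feats = model(warped)

    # 3) "Unzoom": invert the coarse warp at the output resolution and
    #    resample, so downstream consumers see undistorted coordinates.
    inverse_grid = invert_grid(grid, list(feats.shape), separable=separable)
    return F.grid_sample(feats, inverse_grid, align_corners=True)
```

`LZUFCOSMono3D` in this repo follows the same recipe, caching both grids since the saliency map is fixed.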
16 | 1) Set up the coding environment 17 | 18 |

19 | First, clone the repository (including the mmdet3d submodule): 20 | ```bash 21 | git clone https://github.com/tchittesh/lzu.git --recursive && cd lzu 22 | ``` 23 | 24 | Then, you'll need to install the MMDetection3D (v1.0.0rc6) submodule and the lzu package. 25 | To do this, you can either: 26 | - replicate our exact setup by installing [miniconda](https://docs.conda.io/en/latest/miniconda.html) and running 27 | ``` 28 | conda env create -f environment.yml 29 | ``` 30 | - OR install it from scratch according to [getting_started.md](https://github.com/open-mmlab/mmdetection3d/blob/47285b3f1e9dba358e98fcd12e523cfd0769c876/docs/en/getting_started.md) and then install our lzu package with 31 | ```bash 32 | pip install -e . 33 | ``` 34 | 35 | The first option should be more reliable, but not as flexible if you want to run specific versions of Python/PyTorch/MMCV. 36 |
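A quick optional sanity check, assuming the conda route above (the pinned versions come from `environment.yml`; adjust if you installed manually):

```bash
conda activate lzu
python -c "import torch, mmcv, mmdet, mmdet3d, lzu; print(torch.__version__, mmcv.__version__, mmdet3d.__version__)"
```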
37 | 38 |
39 | 2) Download the dataset 40 | 41 |

42 | You'll need to set up the [nuScenes](https://www.nuscenes.org/nuscenes#download) dataset according to [data_preparation.md](https://github.com/open-mmlab/mmdetection3d/blob/47285b3f1e9dba358e98fcd12e523cfd0769c876/docs/en/data_preparation.md). Your final `data` folder should look like this: 43 | ``` 44 | data/nuscenes/ 45 | ├── maps/ 46 | ├── samples/ 47 | ├── sweeps/ 48 | ├── v1.0-trainval/ 49 | ├── nuscenes_infos_train_mono3d.coco.json 50 | ├── nuscenes_infos_train.pkl 51 | ├── nuscenes_infos_val_mono3d.coco.json 52 | └── nuscenes_infos_val.pkl 53 | ``` 54 |
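The info files above are produced by MMDetection3D's data converter. As a rough guide (the authoritative commands and flags are in the data_preparation.md linked above), the nuScenes infos can be generated from the submodule while pointing at this repo's `data/nuscenes` folder:

```bash
cd mmdetection3d_v1.0.0rc6
python tools/create_data.py nuscenes \
    --root-path ../data/nuscenes \
    --out-dir ../data/nuscenes \
    --extra-tag nuscenes
cd ..
```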
55 | 56 |
57 | 3) [Optional] Download our pretrained checkpoints 58 | 59 |

60 | Download our pretrained checkpoints from [Google Drive](https://drive.google.com/file/d/1nofuqZ7YSKblIDAltbxp1pFOiQtUzp8B/view?usp=sharing) and place them in the `ckpt/` directory (symbolic links are fine). 61 |
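For example (the download path below is a placeholder, and the checkpoints are assumed to be the usual mmdetection-style `.pth` files):

```bash
ln -s /path/to/extracted/lzu_checkpoints/*.pth ckpt/
```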
62 | 63 | ## Scripts 64 | 65 | This should be super easy! Simply run 66 | 67 | ``` 68 | sh run.sh [experiment_name] 69 | ``` 70 | 71 | for any valid experiment name in the `configs/` directory. 72 | Examples include `fcos3d_0.50`, which is the uniform downsampling baseline at 0.50x scale, and `lzu_fcos3d_0.75`, which is LZU at 0.75x scale. 73 | 74 | This script will first run inference using the pretrained checkpoint, then train the model from scratch, and finally run inference using the trained model. 75 | 76 | ## Results 77 | 78 | Our pretrained models (from the paper) achieve the following NDS scores. 79 | 80 | | Scale | Baseline Experiment | NDS | LZU Experiment | NDS | 81 | | ----- | ------------------- | ----------- | ------------------ | ----------- | 82 | | 0.25x | fcos3d_0.25 | 0.2177 | lzu_fcos3d_0.25 | 0.2341 | 83 | | 0.50x | fcos3d_0.50 | 0.2752 | lzu_fcos3d_0.50 | 0.2926 | 84 | | 0.75x | fcos3d_0.75 | 0.3053 | lzu_fcos3d_0.75 | 0.3175 | 85 | | 1.00x | fcos3d_1.00 | 0.3122 | lzu_fcos3d_1.00 | 0.3258 | 86 | 87 | As can be seen, LZU achieves a superior accuracy-latency tradeoff compared to uniform downsampling. For more details, please refer to our [paper](https://arxiv.org/abs/2303.15390). 88 | 89 | ![Accuracy Latency Curve](./accuracy-latency.png) 90 | 91 | ## Citation 92 | 93 | If you find our code useful, please consider citing us! 94 | ``` 95 | @misc{thavamani2023learning, 96 | title={Learning to Zoom and Unzoom}, 97 | author={Chittesh Thavamani and Mengtian Li and Francesco Ferroni and Deva Ramanan}, 98 | year={2023}, 99 | eprint={2303.15390}, 100 | archivePrefix={arXiv}, 101 | primaryClass={cs.CV} 102 | } 103 | ``` 104 | -------------------------------------------------------------------------------- /accuracy-latency.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tchittesh/lzu/361afb2360011a3b540fdbdd53d8be9eda70ac78/accuracy-latency.png -------------------------------------------------------------------------------- /ckpt/README.md: -------------------------------------------------------------------------------- 1 | Download our pretrained checkpoints from [Google Drive](https://drive.google.com/file/d/1nofuqZ7YSKblIDAltbxp1pFOiQtUzp8B/view?usp=sharing) and place them in this directory, using symbolic links if you wish. 
2 | -------------------------------------------------------------------------------- /configs/fcos3d_0.25.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '../mmdetection3d_v1.0.0rc6/configs/_base_/datasets/nus-mono3d.py', 3 | '../mmdetection3d_v1.0.0rc6/configs/_base_/models/fcos3d.py', 4 | '../mmdetection3d_v1.0.0rc6/configs/_base_/schedules/mmdet_schedule_1x.py', 5 | '../mmdetection3d_v1.0.0rc6/configs/_base_/default_runtime.py' 6 | ] 7 | scale_factor = 0.25 8 | img_scale = (int(scale_factor * 1600), int(scale_factor * 900)) 9 | 10 | # model settings 11 | model = dict( 12 | backbone=dict( 13 | depth=50, 14 | dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), 15 | stage_with_dcn=(False, False, True, True), 16 | init_cfg=dict(checkpoint='torchvision://resnet50')), 17 | bbox_head=dict( 18 | type='FCOSMono3DHeadNoRescale')) 19 | 20 | class_names = [ 21 | 'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle', 22 | 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier' 23 | ] 24 | img_norm_cfg = dict( 25 | mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False) 26 | train_pipeline = [ 27 | dict(type='LoadImageFromFileMono3D'), 28 | dict( 29 | type='LoadAnnotations3D', 30 | with_bbox=True, 31 | with_label=True, 32 | with_attr_label=True, 33 | with_bbox_3d=True, 34 | with_label_3d=True, 35 | with_bbox_depth=True), 36 | dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5), 37 | dict(type='Resize3D', img_scale=img_scale, keep_ratio=True), 38 | dict(type='Normalize', **img_norm_cfg), 39 | dict(type='Pad', size_divisor=32), 40 | dict(type='DefaultFormatBundle3D', class_names=class_names), 41 | dict( 42 | type='Collect3D', 43 | keys=[ 44 | 'img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d', 45 | 'gt_labels_3d', 'centers2d', 'depths' 46 | ]), 47 | ] 48 | test_pipeline = [ 49 | dict(type='LoadImageFromFileMono3D'), 50 | dict( 51 | type='MultiScaleFlipAug', 52 | scale_factor=scale_factor, 53 | flip=False, 54 | transforms=[ 55 | dict(type='RandomFlip3D'), 56 | dict(type='Resize3D', keep_ratio=True), 57 | dict(type='Normalize', **img_norm_cfg), 58 | dict(type='Pad', size_divisor=32), 59 | dict( 60 | type='DefaultFormatBundle3D', 61 | class_names=class_names, 62 | with_label=False), 63 | dict(type='Collect3D', keys=['img']), 64 | ]) 65 | ] 66 | data = dict( 67 | samples_per_gpu=8, 68 | workers_per_gpu=4, 69 | train=dict(pipeline=train_pipeline), 70 | val=dict(pipeline=test_pipeline), 71 | test=dict(pipeline=test_pipeline)) 72 | # optimizer 73 | optimizer = dict( 74 | lr=0.001, paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.)) 75 | optimizer_config = dict( 76 | _delete_=True, grad_clip=dict(max_norm=35, norm_type=2)) 77 | # learning policy 78 | lr_config = dict( 79 | policy='step', 80 | warmup='linear', 81 | warmup_iters=500, 82 | warmup_ratio=1.0 / 3, 83 | step=[8, 11]) 84 | total_epochs = 12 85 | evaluation = dict(interval=1) 86 | -------------------------------------------------------------------------------- /configs/fcos3d_0.50.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '../mmdetection3d_v1.0.0rc6/configs/_base_/datasets/nus-mono3d.py', 3 | '../mmdetection3d_v1.0.0rc6/configs/_base_/models/fcos3d.py', 4 | '../mmdetection3d_v1.0.0rc6/configs/_base_/schedules/mmdet_schedule_1x.py', 5 | '../mmdetection3d_v1.0.0rc6/configs/_base_/default_runtime.py' 6 | ] 7 | scale_factor = 0.50 8 | img_scale = (int(scale_factor * 1600), 
int(scale_factor * 900)) 9 | 10 | # model settings 11 | model = dict( 12 | backbone=dict( 13 | depth=50, 14 | dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), 15 | stage_with_dcn=(False, False, True, True), 16 | init_cfg=dict(checkpoint='torchvision://resnet50')), 17 | bbox_head=dict( 18 | type='FCOSMono3DHeadNoRescale')) 19 | 20 | class_names = [ 21 | 'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle', 22 | 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier' 23 | ] 24 | img_norm_cfg = dict( 25 | mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False) 26 | train_pipeline = [ 27 | dict(type='LoadImageFromFileMono3D'), 28 | dict( 29 | type='LoadAnnotations3D', 30 | with_bbox=True, 31 | with_label=True, 32 | with_attr_label=True, 33 | with_bbox_3d=True, 34 | with_label_3d=True, 35 | with_bbox_depth=True), 36 | dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5), 37 | dict(type='Resize3D', img_scale=img_scale, keep_ratio=True), 38 | dict(type='Normalize', **img_norm_cfg), 39 | dict(type='Pad', size_divisor=32), 40 | dict(type='DefaultFormatBundle3D', class_names=class_names), 41 | dict( 42 | type='Collect3D', 43 | keys=[ 44 | 'img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d', 45 | 'gt_labels_3d', 'centers2d', 'depths' 46 | ]), 47 | ] 48 | test_pipeline = [ 49 | dict(type='LoadImageFromFileMono3D'), 50 | dict( 51 | type='MultiScaleFlipAug', 52 | scale_factor=scale_factor, 53 | flip=False, 54 | transforms=[ 55 | dict(type='RandomFlip3D'), 56 | dict(type='Resize3D', keep_ratio=True), 57 | dict(type='Normalize', **img_norm_cfg), 58 | dict(type='Pad', size_divisor=32), 59 | dict( 60 | type='DefaultFormatBundle3D', 61 | class_names=class_names, 62 | with_label=False), 63 | dict(type='Collect3D', keys=['img']), 64 | ]) 65 | ] 66 | data = dict( 67 | samples_per_gpu=8, 68 | workers_per_gpu=4, 69 | train=dict(pipeline=train_pipeline), 70 | val=dict(pipeline=test_pipeline), 71 | test=dict(pipeline=test_pipeline)) 72 | # optimizer 73 | optimizer = dict( 74 | lr=0.001, paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.)) 75 | optimizer_config = dict( 76 | _delete_=True, grad_clip=dict(max_norm=35, norm_type=2)) 77 | # learning policy 78 | lr_config = dict( 79 | policy='step', 80 | warmup='linear', 81 | warmup_iters=500, 82 | warmup_ratio=1.0 / 3, 83 | step=[8, 11]) 84 | total_epochs = 12 85 | evaluation = dict(interval=1) 86 | -------------------------------------------------------------------------------- /configs/fcos3d_0.75.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '../mmdetection3d_v1.0.0rc6/configs/_base_/datasets/nus-mono3d.py', 3 | '../mmdetection3d_v1.0.0rc6/configs/_base_/models/fcos3d.py', 4 | '../mmdetection3d_v1.0.0rc6/configs/_base_/schedules/mmdet_schedule_1x.py', 5 | '../mmdetection3d_v1.0.0rc6/configs/_base_/default_runtime.py' 6 | ] 7 | scale_factor = 0.75 8 | img_scale = (int(scale_factor * 1600), int(scale_factor * 900)) 9 | 10 | # model settings 11 | model = dict( 12 | backbone=dict( 13 | depth=50, 14 | dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), 15 | stage_with_dcn=(False, False, True, True), 16 | init_cfg=dict(checkpoint='torchvision://resnet50')), 17 | bbox_head=dict( 18 | type='FCOSMono3DHeadNoRescale')) 19 | 20 | class_names = [ 21 | 'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle', 22 | 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier' 23 | ] 24 | img_norm_cfg = dict( 25 | mean=[103.530, 116.280, 
123.675], std=[1.0, 1.0, 1.0], to_rgb=False) 26 | train_pipeline = [ 27 | dict(type='LoadImageFromFileMono3D'), 28 | dict( 29 | type='LoadAnnotations3D', 30 | with_bbox=True, 31 | with_label=True, 32 | with_attr_label=True, 33 | with_bbox_3d=True, 34 | with_label_3d=True, 35 | with_bbox_depth=True), 36 | dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5), 37 | dict(type='Resize3D', img_scale=img_scale, keep_ratio=True), 38 | dict(type='Normalize', **img_norm_cfg), 39 | dict(type='Pad', size_divisor=32), 40 | dict(type='DefaultFormatBundle3D', class_names=class_names), 41 | dict( 42 | type='Collect3D', 43 | keys=[ 44 | 'img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d', 45 | 'gt_labels_3d', 'centers2d', 'depths' 46 | ]), 47 | ] 48 | test_pipeline = [ 49 | dict(type='LoadImageFromFileMono3D'), 50 | dict( 51 | type='MultiScaleFlipAug', 52 | scale_factor=scale_factor, 53 | flip=False, 54 | transforms=[ 55 | dict(type='RandomFlip3D'), 56 | dict(type='Resize3D', keep_ratio=True), 57 | dict(type='Normalize', **img_norm_cfg), 58 | dict(type='Pad', size_divisor=32), 59 | dict( 60 | type='DefaultFormatBundle3D', 61 | class_names=class_names, 62 | with_label=False), 63 | dict(type='Collect3D', keys=['img']), 64 | ]) 65 | ] 66 | data = dict( 67 | samples_per_gpu=8, 68 | workers_per_gpu=4, 69 | train=dict(pipeline=train_pipeline), 70 | val=dict(pipeline=test_pipeline), 71 | test=dict(pipeline=test_pipeline)) 72 | # optimizer 73 | optimizer = dict( 74 | lr=0.001, paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.)) 75 | optimizer_config = dict( 76 | _delete_=True, grad_clip=dict(max_norm=35, norm_type=2)) 77 | # learning policy 78 | lr_config = dict( 79 | policy='step', 80 | warmup='linear', 81 | warmup_iters=500, 82 | warmup_ratio=1.0 / 3, 83 | step=[8, 11]) 84 | total_epochs = 12 85 | evaluation = dict(interval=1) 86 | -------------------------------------------------------------------------------- /configs/fcos3d_1.00.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '../mmdetection3d_v1.0.0rc6/configs/_base_/datasets/nus-mono3d.py', 3 | '../mmdetection3d_v1.0.0rc6/configs/_base_/models/fcos3d.py', 4 | '../mmdetection3d_v1.0.0rc6/configs/_base_/schedules/mmdet_schedule_1x.py', 5 | '../mmdetection3d_v1.0.0rc6/configs/_base_/default_runtime.py' 6 | ] 7 | scale_factor = 1.00 8 | img_scale = (int(scale_factor * 1600), int(scale_factor * 900)) 9 | 10 | # model settings 11 | model = dict( 12 | backbone=dict( 13 | depth=50, 14 | dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), 15 | stage_with_dcn=(False, False, True, True), 16 | init_cfg=dict(checkpoint='torchvision://resnet50')), 17 | bbox_head=dict( 18 | type='FCOSMono3DHeadNoRescale')) 19 | 20 | class_names = [ 21 | 'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle', 22 | 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier' 23 | ] 24 | img_norm_cfg = dict( 25 | mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False) 26 | train_pipeline = [ 27 | dict(type='LoadImageFromFileMono3D'), 28 | dict( 29 | type='LoadAnnotations3D', 30 | with_bbox=True, 31 | with_label=True, 32 | with_attr_label=True, 33 | with_bbox_3d=True, 34 | with_label_3d=True, 35 | with_bbox_depth=True), 36 | dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5), 37 | dict(type='Resize3D', img_scale=img_scale, keep_ratio=True), 38 | dict(type='Normalize', **img_norm_cfg), 39 | dict(type='Pad', size_divisor=32), 40 | dict(type='DefaultFormatBundle3D', 
class_names=class_names), 41 | dict( 42 | type='Collect3D', 43 | keys=[ 44 | 'img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d', 45 | 'gt_labels_3d', 'centers2d', 'depths' 46 | ]), 47 | ] 48 | test_pipeline = [ 49 | dict(type='LoadImageFromFileMono3D'), 50 | dict( 51 | type='MultiScaleFlipAug', 52 | scale_factor=scale_factor, 53 | flip=False, 54 | transforms=[ 55 | dict(type='RandomFlip3D'), 56 | dict(type='Resize3D', keep_ratio=True), 57 | dict(type='Normalize', **img_norm_cfg), 58 | dict(type='Pad', size_divisor=32), 59 | dict( 60 | type='DefaultFormatBundle3D', 61 | class_names=class_names, 62 | with_label=False), 63 | dict(type='Collect3D', keys=['img']), 64 | ]) 65 | ] 66 | data = dict( 67 | samples_per_gpu=8, 68 | workers_per_gpu=4, 69 | train=dict(pipeline=train_pipeline), 70 | val=dict(pipeline=test_pipeline), 71 | test=dict(pipeline=test_pipeline)) 72 | # optimizer 73 | optimizer = dict( 74 | lr=0.001, paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.)) 75 | optimizer_config = dict( 76 | _delete_=True, grad_clip=dict(max_norm=35, norm_type=2)) 77 | # learning policy 78 | lr_config = dict( 79 | policy='step', 80 | warmup='linear', 81 | warmup_iters=500, 82 | warmup_ratio=1.0 / 3, 83 | step=[8, 11]) 84 | total_epochs = 12 85 | evaluation = dict(interval=1) 86 | -------------------------------------------------------------------------------- /configs/lzu_fcos3d_0.25.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '../mmdetection3d_v1.0.0rc6/configs/_base_/datasets/nus-mono3d.py', 3 | '../mmdetection3d_v1.0.0rc6/configs/_base_/models/fcos3d.py', 4 | '../mmdetection3d_v1.0.0rc6/configs/_base_/schedules/mmdet_schedule_1x.py', 5 | '../mmdetection3d_v1.0.0rc6/configs/_base_/default_runtime.py' 6 | ] 7 | scale_factor = 0.25 8 | img_scale = (int(scale_factor * 1600), int(scale_factor * 900)) 9 | 10 | # model settings 11 | model = dict( 12 | type="LZUFCOSMono3D", 13 | backbone=dict( 14 | depth=50, 15 | dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), 16 | stage_with_dcn=(False, False, True, True), 17 | init_cfg=dict(checkpoint='torchvision://resnet50')), 18 | grid_generator=dict( 19 | type='FixedGrid', 20 | saliency_file='saliency.pkl', 21 | output_shape=(img_scale[1], img_scale[0]), 22 | grid_shape=(27, 48), 23 | separable=True, 24 | attraction_fwhm=10, 25 | anti_crop=True), 26 | bbox_head=dict( 27 | type='FCOSMono3DHeadNoRescale')) 28 | 29 | class_names = [ 30 | 'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle', 31 | 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier' 32 | ] 33 | img_norm_cfg = dict( 34 | mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False) 35 | train_pipeline = [ 36 | dict(type='LoadImageFromFileMono3D'), 37 | dict( 38 | type='LoadAnnotations3D', 39 | with_bbox=True, 40 | with_label=True, 41 | with_attr_label=True, 42 | with_bbox_3d=True, 43 | with_label_3d=True, 44 | with_bbox_depth=True), 45 | dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5), 46 | dict(type='Resize3D', img_scale=(1600, 900), keep_ratio=True), 47 | dict(type='Normalize', **img_norm_cfg), 48 | dict(type='Pad', size_divisor=32), 49 | dict(type='DefaultFormatBundle3D', class_names=class_names), 50 | dict( 51 | type='Collect3D', 52 | keys=[ 53 | 'img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d', 54 | 'gt_labels_3d', 'centers2d', 'depths' 55 | ]), 56 | ] 57 | test_pipeline = [ 58 | dict(type='LoadImageFromFileMono3D'), 59 | dict( 60 | type='MultiScaleFlipAug', 61 
| scale_factor=1.0, 62 | flip=False, 63 | transforms=[ 64 | dict(type='RandomFlip3D'), 65 | dict(type='Resize3D', keep_ratio=True), 66 | dict(type='Normalize', **img_norm_cfg), 67 | dict(type='Pad', size_divisor=32), 68 | dict( 69 | type='DefaultFormatBundle3D', 70 | class_names=class_names, 71 | with_label=False), 72 | dict(type='Collect3D', keys=['img']), 73 | ]) 74 | ] 75 | data = dict( 76 | samples_per_gpu=8, 77 | workers_per_gpu=4, 78 | train=dict(pipeline=train_pipeline), 79 | val=dict(pipeline=test_pipeline), 80 | test=dict(pipeline=test_pipeline)) 81 | # optimizer 82 | optimizer = dict( 83 | lr=0.001, paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.)) 84 | optimizer_config = dict( 85 | _delete_=True, grad_clip=dict(max_norm=35, norm_type=2)) 86 | # learning policy 87 | lr_config = dict( 88 | policy='step', 89 | warmup='linear', 90 | warmup_iters=500, 91 | warmup_ratio=1.0 / 3, 92 | step=[8, 11]) 93 | total_epochs = 12 94 | evaluation = dict(interval=1) 95 | -------------------------------------------------------------------------------- /configs/lzu_fcos3d_0.50.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '../mmdetection3d_v1.0.0rc6/configs/_base_/datasets/nus-mono3d.py', 3 | '../mmdetection3d_v1.0.0rc6/configs/_base_/models/fcos3d.py', 4 | '../mmdetection3d_v1.0.0rc6/configs/_base_/schedules/mmdet_schedule_1x.py', 5 | '../mmdetection3d_v1.0.0rc6/configs/_base_/default_runtime.py' 6 | ] 7 | scale_factor = 0.50 8 | img_scale = (int(scale_factor * 1600), int(scale_factor * 900)) 9 | 10 | # model settings 11 | model = dict( 12 | type="LZUFCOSMono3D", 13 | backbone=dict( 14 | depth=50, 15 | dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), 16 | stage_with_dcn=(False, False, True, True), 17 | init_cfg=dict(checkpoint='torchvision://resnet50')), 18 | grid_generator=dict( 19 | type='FixedGrid', 20 | saliency_file='saliency.pkl', 21 | output_shape=(img_scale[1], img_scale[0]), 22 | grid_shape=(27, 48), 23 | separable=True, 24 | attraction_fwhm=10, 25 | anti_crop=True), 26 | bbox_head=dict( 27 | type='FCOSMono3DHeadNoRescale')) 28 | 29 | class_names = [ 30 | 'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle', 31 | 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier' 32 | ] 33 | img_norm_cfg = dict( 34 | mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False) 35 | train_pipeline = [ 36 | dict(type='LoadImageFromFileMono3D'), 37 | dict( 38 | type='LoadAnnotations3D', 39 | with_bbox=True, 40 | with_label=True, 41 | with_attr_label=True, 42 | with_bbox_3d=True, 43 | with_label_3d=True, 44 | with_bbox_depth=True), 45 | dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5), 46 | dict(type='Resize3D', img_scale=(1600, 900), keep_ratio=True), 47 | dict(type='Normalize', **img_norm_cfg), 48 | dict(type='Pad', size_divisor=32), 49 | dict(type='DefaultFormatBundle3D', class_names=class_names), 50 | dict( 51 | type='Collect3D', 52 | keys=[ 53 | 'img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d', 54 | 'gt_labels_3d', 'centers2d', 'depths' 55 | ]), 56 | ] 57 | test_pipeline = [ 58 | dict(type='LoadImageFromFileMono3D'), 59 | dict( 60 | type='MultiScaleFlipAug', 61 | scale_factor=1.0, 62 | flip=False, 63 | transforms=[ 64 | dict(type='RandomFlip3D'), 65 | dict(type='Resize3D', keep_ratio=True), 66 | dict(type='Normalize', **img_norm_cfg), 67 | dict(type='Pad', size_divisor=32), 68 | dict( 69 | type='DefaultFormatBundle3D', 70 | class_names=class_names, 71 | with_label=False), 72 
| dict(type='Collect3D', keys=['img']), 73 | ]) 74 | ] 75 | data = dict( 76 | samples_per_gpu=8, 77 | workers_per_gpu=4, 78 | train=dict(pipeline=train_pipeline), 79 | val=dict(pipeline=test_pipeline), 80 | test=dict(pipeline=test_pipeline)) 81 | # optimizer 82 | optimizer = dict( 83 | lr=0.001, paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.)) 84 | optimizer_config = dict( 85 | _delete_=True, grad_clip=dict(max_norm=35, norm_type=2)) 86 | # learning policy 87 | lr_config = dict( 88 | policy='step', 89 | warmup='linear', 90 | warmup_iters=500, 91 | warmup_ratio=1.0 / 3, 92 | step=[8, 11]) 93 | total_epochs = 12 94 | evaluation = dict(interval=1) 95 | -------------------------------------------------------------------------------- /configs/lzu_fcos3d_0.75.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '../mmdetection3d_v1.0.0rc6/configs/_base_/datasets/nus-mono3d.py', 3 | '../mmdetection3d_v1.0.0rc6/configs/_base_/models/fcos3d.py', 4 | '../mmdetection3d_v1.0.0rc6/configs/_base_/schedules/mmdet_schedule_1x.py', 5 | '../mmdetection3d_v1.0.0rc6/configs/_base_/default_runtime.py' 6 | ] 7 | scale_factor = 0.75 8 | img_scale = (int(scale_factor * 1600), int(scale_factor * 900)) 9 | 10 | # model settings 11 | model = dict( 12 | type="LZUFCOSMono3D", 13 | backbone=dict( 14 | depth=50, 15 | dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), 16 | stage_with_dcn=(False, False, True, True), 17 | init_cfg=dict(checkpoint='torchvision://resnet50')), 18 | grid_generator=dict( 19 | type='FixedGrid', 20 | saliency_file='saliency.pkl', 21 | output_shape=(img_scale[1], img_scale[0]), 22 | grid_shape=(27, 48), 23 | separable=True, 24 | attraction_fwhm=10, 25 | anti_crop=True), 26 | bbox_head=dict( 27 | type='FCOSMono3DHeadNoRescale')) 28 | 29 | class_names = [ 30 | 'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle', 31 | 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier' 32 | ] 33 | img_norm_cfg = dict( 34 | mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False) 35 | train_pipeline = [ 36 | dict(type='LoadImageFromFileMono3D'), 37 | dict( 38 | type='LoadAnnotations3D', 39 | with_bbox=True, 40 | with_label=True, 41 | with_attr_label=True, 42 | with_bbox_3d=True, 43 | with_label_3d=True, 44 | with_bbox_depth=True), 45 | dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5), 46 | dict(type='Resize3D', img_scale=(1600, 900), keep_ratio=True), 47 | dict(type='Normalize', **img_norm_cfg), 48 | dict(type='Pad', size_divisor=32), 49 | dict(type='DefaultFormatBundle3D', class_names=class_names), 50 | dict( 51 | type='Collect3D', 52 | keys=[ 53 | 'img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d', 54 | 'gt_labels_3d', 'centers2d', 'depths' 55 | ]), 56 | ] 57 | test_pipeline = [ 58 | dict(type='LoadImageFromFileMono3D'), 59 | dict( 60 | type='MultiScaleFlipAug', 61 | scale_factor=1.0, 62 | flip=False, 63 | transforms=[ 64 | dict(type='RandomFlip3D'), 65 | dict(type='Resize3D', keep_ratio=True), 66 | dict(type='Normalize', **img_norm_cfg), 67 | dict(type='Pad', size_divisor=32), 68 | dict( 69 | type='DefaultFormatBundle3D', 70 | class_names=class_names, 71 | with_label=False), 72 | dict(type='Collect3D', keys=['img']), 73 | ]) 74 | ] 75 | data = dict( 76 | samples_per_gpu=8, 77 | workers_per_gpu=4, 78 | train=dict(pipeline=train_pipeline), 79 | val=dict(pipeline=test_pipeline), 80 | test=dict(pipeline=test_pipeline)) 81 | # optimizer 82 | optimizer = dict( 83 | lr=0.001, 
paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.)) 84 | optimizer_config = dict( 85 | _delete_=True, grad_clip=dict(max_norm=35, norm_type=2)) 86 | # learning policy 87 | lr_config = dict( 88 | policy='step', 89 | warmup='linear', 90 | warmup_iters=500, 91 | warmup_ratio=1.0 / 3, 92 | step=[8, 11]) 93 | total_epochs = 12 94 | evaluation = dict(interval=1) 95 | -------------------------------------------------------------------------------- /configs/lzu_fcos3d_1.00.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '../mmdetection3d_v1.0.0rc6/configs/_base_/datasets/nus-mono3d.py', 3 | '../mmdetection3d_v1.0.0rc6/configs/_base_/models/fcos3d.py', 4 | '../mmdetection3d_v1.0.0rc6/configs/_base_/schedules/mmdet_schedule_1x.py', 5 | '../mmdetection3d_v1.0.0rc6/configs/_base_/default_runtime.py' 6 | ] 7 | scale_factor = 1.00 8 | img_scale = (int(scale_factor * 1600), int(scale_factor * 900)) 9 | 10 | # model settings 11 | model = dict( 12 | type="LZUFCOSMono3D", 13 | backbone=dict( 14 | depth=50, 15 | dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), 16 | stage_with_dcn=(False, False, True, True), 17 | init_cfg=dict(checkpoint='torchvision://resnet50')), 18 | grid_generator=dict( 19 | type='FixedGrid', 20 | saliency_file='saliency.pkl', 21 | output_shape=(img_scale[1], img_scale[0]), 22 | grid_shape=(27, 48), 23 | separable=True, 24 | attraction_fwhm=10, 25 | anti_crop=True), 26 | bbox_head=dict( 27 | type='FCOSMono3DHeadNoRescale')) 28 | 29 | class_names = [ 30 | 'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle', 31 | 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier' 32 | ] 33 | img_norm_cfg = dict( 34 | mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False) 35 | train_pipeline = [ 36 | dict(type='LoadImageFromFileMono3D'), 37 | dict( 38 | type='LoadAnnotations3D', 39 | with_bbox=True, 40 | with_label=True, 41 | with_attr_label=True, 42 | with_bbox_3d=True, 43 | with_label_3d=True, 44 | with_bbox_depth=True), 45 | dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5), 46 | dict(type='Resize3D', img_scale=(1600, 900), keep_ratio=True), 47 | dict(type='Normalize', **img_norm_cfg), 48 | dict(type='Pad', size_divisor=32), 49 | dict(type='DefaultFormatBundle3D', class_names=class_names), 50 | dict( 51 | type='Collect3D', 52 | keys=[ 53 | 'img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d', 54 | 'gt_labels_3d', 'centers2d', 'depths' 55 | ]), 56 | ] 57 | test_pipeline = [ 58 | dict(type='LoadImageFromFileMono3D'), 59 | dict( 60 | type='MultiScaleFlipAug', 61 | scale_factor=1.0, 62 | flip=False, 63 | transforms=[ 64 | dict(type='RandomFlip3D'), 65 | dict(type='Resize3D', keep_ratio=True), 66 | dict(type='Normalize', **img_norm_cfg), 67 | dict(type='Pad', size_divisor=32), 68 | dict( 69 | type='DefaultFormatBundle3D', 70 | class_names=class_names, 71 | with_label=False), 72 | dict(type='Collect3D', keys=['img']), 73 | ]) 74 | ] 75 | data = dict( 76 | samples_per_gpu=8, 77 | workers_per_gpu=4, 78 | train=dict(pipeline=train_pipeline), 79 | val=dict(pipeline=test_pipeline), 80 | test=dict(pipeline=test_pipeline)) 81 | # optimizer 82 | optimizer = dict( 83 | lr=0.001, paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.)) 84 | optimizer_config = dict( 85 | _delete_=True, grad_clip=dict(max_norm=35, norm_type=2)) 86 | # learning policy 87 | lr_config = dict( 88 | policy='step', 89 | warmup='linear', 90 | warmup_iters=500, 91 | warmup_ratio=1.0 / 3, 92 | step=[8, 11]) 93 | 
total_epochs = 12 94 | evaluation = dict(interval=1) 95 | -------------------------------------------------------------------------------- /data/README.md: -------------------------------------------------------------------------------- 1 | Download and set up the [nuScenes](https://www.nuscenes.org/nuscenes#download) dataset according to [data_preparation.md](https://github.com/open-mmlab/mmdetection3d/blob/47285b3f1e9dba358e98fcd12e523cfd0769c876/docs/en/data_preparation.md). Use symbolic links if necessary. Make sure to place the dataset in a `nuscenes/` subdirectory. 2 | -------------------------------------------------------------------------------- /demo.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tchittesh/lzu/361afb2360011a3b540fdbdd53d8be9eda70ac78/demo.gif -------------------------------------------------------------------------------- /environment.yml: -------------------------------------------------------------------------------- 1 | name: lzu 2 | channels: 3 | - pytorch 4 | - nvidia 5 | - defaults 6 | dependencies: 7 | - _libgcc_mutex=0.1=main 8 | - _openmp_mutex=5.1=1_gnu 9 | - blas=1.0=mkl 10 | - brotlipy=0.7.0=py38h27cfd23_1003 11 | - bzip2=1.0.8=h7b6447c_0 12 | - ca-certificates=2023.01.10=h06a4308_0 13 | - certifi=2022.12.7=py38h06a4308_0 14 | - cffi=1.15.1=py38h5eee18b_3 15 | - charset-normalizer=2.0.4=pyhd3eb1b0_0 16 | - cryptography=39.0.1=py38h9ce1e76_0 17 | - cuda=11.6.1=0 18 | - cuda-cccl=11.6.55=hf6102b2_0 19 | - cuda-command-line-tools=11.6.2=0 20 | - cuda-compiler=11.6.2=0 21 | - cuda-cudart=11.6.55=he381448_0 22 | - cuda-cudart-dev=11.6.55=h42ad0f4_0 23 | - cuda-cuobjdump=11.6.124=h2eeebcb_0 24 | - cuda-cupti=11.6.124=h86345e5_0 25 | - cuda-cuxxfilt=11.6.124=hecbf4f6_0 26 | - cuda-driver-dev=11.6.55=0 27 | - cuda-gdb=12.1.55=0 28 | - cuda-libraries=11.6.1=0 29 | - cuda-libraries-dev=11.6.1=0 30 | - cuda-memcheck=11.8.86=0 31 | - cuda-nsight=12.1.55=0 32 | - cuda-nsight-compute=12.1.0=0 33 | - cuda-nvcc=11.6.124=hbba6d2d_0 34 | - cuda-nvdisasm=12.1.55=0 35 | - cuda-nvml-dev=11.6.55=haa9ef22_0 36 | - cuda-nvprof=12.1.55=0 37 | - cuda-nvprune=11.6.124=he22ec0a_0 38 | - cuda-nvrtc=11.6.124=h020bade_0 39 | - cuda-nvrtc-dev=11.6.124=h249d397_0 40 | - cuda-nvtx=11.6.124=h0630a44_0 41 | - cuda-nvvp=12.1.55=0 42 | - cuda-runtime=11.6.1=0 43 | - cuda-samples=11.6.101=h8efea70_0 44 | - cuda-sanitizer-api=12.1.55=0 45 | - cuda-toolkit=11.6.1=0 46 | - cuda-tools=11.6.1=0 47 | - cuda-visual-tools=11.6.1=0 48 | - ffmpeg=4.3=hf484d3e_0 49 | - flit-core=3.6.0=pyhd3eb1b0_0 50 | - freetype=2.12.1=h4a9f257_0 51 | - gds-tools=1.6.0.25=0 52 | - giflib=5.2.1=h5eee18b_3 53 | - gmp=6.2.1=h295c915_3 54 | - gnutls=3.6.15=he1e5248_0 55 | - idna=3.4=py38h06a4308_0 56 | - intel-openmp=2021.4.0=h06a4308_3561 57 | - jpeg=9e=h5eee18b_1 58 | - lame=3.100=h7b6447c_0 59 | - lcms2=2.12=h3be6417_0 60 | - ld_impl_linux-64=2.38=h1181459_1 61 | - lerc=3.0=h295c915_0 62 | - libcublas=11.9.2.110=h5e84587_0 63 | - libcublas-dev=11.9.2.110=h5c901ab_0 64 | - libcufft=10.7.1.112=hf425ae0_0 65 | - libcufft-dev=10.7.1.112=ha5ce4c0_0 66 | - libcufile=1.6.0.25=0 67 | - libcufile-dev=1.6.0.25=0 68 | - libcurand=10.3.2.56=0 69 | - libcurand-dev=10.3.2.56=0 70 | - libcusolver=11.3.4.124=h33c3c4e_0 71 | - libcusparse=11.7.2.124=h7538f96_0 72 | - libcusparse-dev=11.7.2.124=hbbe9722_0 73 | - libdeflate=1.17=h5eee18b_0 74 | - libffi=3.4.2=h6a678d5_6 75 | - libgcc-ng=11.2.0=h1234567_1 76 | - libgomp=11.2.0=h1234567_1 77 | - 
libiconv=1.16=h7f8727e_2 78 | - libidn2=2.3.2=h7f8727e_0 79 | - libnpp=11.6.3.124=hd2722f0_0 80 | - libnpp-dev=11.6.3.124=h3c42840_0 81 | - libnvjpeg=11.6.2.124=hd473ad6_0 82 | - libnvjpeg-dev=11.6.2.124=hb5906b9_0 83 | - libpng=1.6.39=h5eee18b_0 84 | - libstdcxx-ng=11.2.0=h1234567_1 85 | - libtasn1=4.16.0=h27cfd23_0 86 | - libtiff=4.5.0=h6a678d5_2 87 | - libunistring=0.9.10=h27cfd23_0 88 | - libwebp=1.2.4=h11a3e52_1 89 | - libwebp-base=1.2.4=h5eee18b_1 90 | - lz4-c=1.9.4=h6a678d5_0 91 | - mkl=2021.4.0=h06a4308_640 92 | - mkl-service=2.4.0=py38h7f8727e_0 93 | - mkl_fft=1.3.1=py38hd3c417c_0 94 | - mkl_random=1.2.2=py38h51133e4_0 95 | - ncurses=6.4=h6a678d5_0 96 | - nettle=3.7.3=hbbd107a_1 97 | - nsight-compute=2023.1.0.15=0 98 | - numpy=1.23.5=py38h14f4228_0 99 | - numpy-base=1.23.5=py38h31eccc5_0 100 | - openh264=2.1.1=h4ff587b_0 101 | - openssl=1.1.1t=h7f8727e_0 102 | - pillow=9.4.0=py38h6a678d5_0 103 | - pip=23.0.1=py38h06a4308_0 104 | - pycparser=2.21=pyhd3eb1b0_0 105 | - pyopenssl=23.0.0=py38h06a4308_0 106 | - pysocks=1.7.1=py38h06a4308_0 107 | - python=3.8.16=h7a1cb2a_3 108 | - pytorch=1.13.1=py3.8_cuda11.6_cudnn8.3.2_0 109 | - pytorch-cuda=11.6=h867d48c_1 110 | - pytorch-mutex=1.0=cuda 111 | - readline=8.2=h5eee18b_0 112 | - requests=2.28.1=py38h06a4308_1 113 | - setuptools=65.6.3=py38h06a4308_0 114 | - six=1.16.0=pyhd3eb1b0_1 115 | - sqlite=3.41.1=h5eee18b_0 116 | - tk=8.6.12=h1ccaba5_0 117 | - torchvision=0.14.1=py38_cu116 118 | - typing_extensions=4.4.0=py38h06a4308_0 119 | - urllib3=1.26.14=py38h06a4308_0 120 | - wheel=0.38.4=py38h06a4308_0 121 | - xz=5.2.10=h5eee18b_1 122 | - zlib=1.2.13=h5eee18b_0 123 | - zstd=1.5.2=ha4553b6_0 124 | - pip: 125 | - absl-py==1.4.0 126 | - addict==2.4.0 127 | - anyio==3.6.2 128 | - argon2-cffi==21.3.0 129 | - argon2-cffi-bindings==21.2.0 130 | - arrow==1.2.3 131 | - asttokens==2.2.1 132 | - attrs==22.2.0 133 | - backcall==0.2.0 134 | - beautifulsoup4==4.12.0 135 | - black==23.1.0 136 | - bleach==6.0.0 137 | - cachetools==5.3.0 138 | - click==8.1.3 139 | - comm==0.1.2 140 | - contourpy==1.0.7 141 | - cycler==0.11.0 142 | - debugpy==1.6.6 143 | - decorator==5.1.1 144 | - defusedxml==0.7.1 145 | - descartes==1.1.0 146 | - exceptiongroup==1.1.1 147 | - executing==1.2.0 148 | - fastjsonschema==2.16.3 149 | - fire==0.5.0 150 | - flake8==6.0.0 151 | - fonttools==4.39.2 152 | - fqdn==1.5.1 153 | - google-auth==2.16.2 154 | - google-auth-oauthlib==0.4.6 155 | - grpcio==1.51.3 156 | - imageio==2.26.1 157 | - importlib-metadata==6.1.0 158 | - importlib-resources==5.12.0 159 | - iniconfig==2.0.0 160 | - ipykernel==6.22.0 161 | - ipython==8.11.0 162 | - ipython-genutils==0.2.0 163 | - ipywidgets==8.0.4 164 | - isoduration==20.11.0 165 | - jedi==0.18.2 166 | - jinja2==3.1.2 167 | - joblib==1.2.0 168 | - jsonpointer==2.3 169 | - jsonschema==4.17.3 170 | - jupyter==1.0.0 171 | - jupyter-client==8.1.0 172 | - jupyter-console==6.6.3 173 | - jupyter-core==5.3.0 174 | - jupyter-events==0.6.3 175 | - jupyter-server==2.5.0 176 | - jupyter-server-terminals==0.4.4 177 | - jupyterlab-pygments==0.2.2 178 | - jupyterlab-widgets==3.0.5 179 | - kiwisolver==1.4.4 180 | - llvmlite==0.36.0 181 | - lyft-dataset-sdk==0.0.8 182 | - markdown==3.4.1 183 | - markupsafe==2.1.2 184 | - matplotlib==3.5.2 185 | - matplotlib-inline==0.1.6 186 | - mccabe==0.7.0 187 | - mistune==2.0.5 188 | - mmcls==0.25.0 189 | - mmcv-full==1.7.0 190 | - mmdet==2.28.2 191 | - mmsegmentation==0.30.0 192 | - mypy-extensions==1.0.0 193 | - nbclassic==0.5.3 194 | - nbclient==0.7.2 195 | - nbconvert==7.2.10 196 
| - nbformat==5.8.0 197 | - nest-asyncio==1.5.6 198 | - networkx==2.2 199 | - notebook==6.5.3 200 | - notebook-shim==0.2.2 201 | - numba==0.53.0 202 | - nuscenes-devkit==1.1.10 203 | - oauthlib==3.2.2 204 | - opencv-python==4.7.0.72 205 | - packaging==23.0 206 | - pandas==1.5.3 207 | - pandocfilters==1.5.0 208 | - parso==0.8.3 209 | - pathspec==0.11.1 210 | - pexpect==4.8.0 211 | - pickleshare==0.7.5 212 | - pkgutil-resolve-name==1.3.10 213 | - platformdirs==3.1.1 214 | - plotly==5.13.1 215 | - pluggy==1.0.0 216 | - plyfile==0.8.1 217 | - prettytable==3.6.0 218 | - prometheus-client==0.16.0 219 | - prompt-toolkit==3.0.38 220 | - protobuf==4.22.1 221 | - psutil==5.9.4 222 | - ptyprocess==0.7.0 223 | - pure-eval==0.2.2 224 | - pyasn1==0.4.8 225 | - pyasn1-modules==0.2.8 226 | - pycocotools==2.0.6 227 | - pycodestyle==2.10.0 228 | - pyflakes==3.0.1 229 | - pygments==2.14.0 230 | - pyparsing==3.0.9 231 | - pyquaternion==0.9.9 232 | - pyrsistent==0.19.3 233 | - pytest==7.2.2 234 | - python-dateutil==2.8.2 235 | - python-json-logger==2.0.7 236 | - pytz==2022.7.1 237 | - pywavelets==1.4.1 238 | - pyyaml==6.0 239 | - pyzmq==25.0.2 240 | - qtconsole==5.4.1 241 | - qtpy==2.3.0 242 | - requests-oauthlib==1.3.1 243 | - rfc3339-validator==0.1.4 244 | - rfc3986-validator==0.1.1 245 | - rsa==4.9 246 | - scikit-image==0.19.3 247 | - scikit-learn==1.2.2 248 | - scipy==1.10.1 249 | - send2trash==1.8.0 250 | - shapely==1.8.5 251 | - sniffio==1.3.0 252 | - soupsieve==2.4 253 | - stack-data==0.6.2 254 | - tenacity==8.2.2 255 | - tensorboard==2.12.0 256 | - tensorboard-data-server==0.7.0 257 | - tensorboard-plugin-wit==1.8.1 258 | - termcolor==2.2.0 259 | - terminado==0.17.1 260 | - terminaltables==3.1.10 261 | - threadpoolctl==3.1.0 262 | - tifffile==2023.3.15 263 | - tinycss2==1.2.1 264 | - tomli==2.0.1 265 | - tornado==6.2 266 | - tqdm==4.65.0 267 | - traitlets==5.9.0 268 | - trimesh==2.35.39 269 | - uri-template==1.2.0 270 | - wcwidth==0.2.6 271 | - webcolors==1.12 272 | - webencodings==0.5.1 273 | - websocket-client==1.5.1 274 | - werkzeug==2.2.3 275 | - widgetsnbextension==4.0.5 276 | - yapf==0.32.0 277 | - zipp==3.15.0 278 | prefix: /home/cthavama/miniconda3/envs/lzu 279 | -------------------------------------------------------------------------------- /lzu/__init__.py: -------------------------------------------------------------------------------- 1 | from .fcos_mono3d_head_norescale import * # noqa: F401,F403 2 | from .lzu_fcos_mono3d import * # noqa: F401,F403 3 | from .transforms_3d import * # noqa: F401,F403 4 | -------------------------------------------------------------------------------- /lzu/fcos_mono3d_head_norescale.py: -------------------------------------------------------------------------------- 1 | from mmdet.models.builder import HEADS 2 | from mmdet3d.models.dense_heads import FCOSMono3DHead 3 | 4 | 5 | @HEADS.register_module() 6 | class FCOSMono3DHeadNoRescale(FCOSMono3DHead): 7 | """Same as original head, except ignores rescaling at end.""" 8 | 9 | def _get_bboxes_single(self, 10 | cls_scores, 11 | bbox_preds, 12 | dir_cls_preds, 13 | attr_preds, 14 | centernesses, 15 | mlvl_points, 16 | input_meta, 17 | cfg, 18 | rescale=False): 19 | return super()._get_bboxes_single(cls_scores, bbox_preds, 20 | dir_cls_preds, attr_preds, 21 | centernesses, mlvl_points, 22 | input_meta, cfg, rescale=False) 23 | -------------------------------------------------------------------------------- /lzu/fixed_grid.py: -------------------------------------------------------------------------------- 1 | 
import pickle 2 | 3 | import numpy as np 4 | import torch 5 | import torch.nn as nn 6 | import torch.nn.functional as F 7 | from mmdet3d.models.builder import MODELS 8 | 9 | 10 | GRID_GENERATORS = MODELS 11 | 12 | 13 | def build_grid_generator(cfg): 14 | """Build view transformer.""" 15 | return GRID_GENERATORS.build(cfg) 16 | 17 | 18 | def make1DGaussian(size, fwhm=3, center=None): 19 | """ Make a 1D gaussian kernel. 20 | 21 | size is the length of the kernel, 22 | fwhm is full-width-half-maximum, which 23 | can be thought of as an effective radius. 24 | """ 25 | x = np.arange(0, size, 1, dtype=np.float) 26 | 27 | if center is None: 28 | center = size // 2 29 | 30 | return np.exp(-4*np.log(2) * (x-center)**2 / fwhm**2) 31 | 32 | 33 | def make2DGaussian(size, fwhm=3, center=None): 34 | """ Make a square gaussian kernel. 35 | 36 | size is the length of a side of the square 37 | fwhm is full-width-half-maximum, which 38 | can be thought of as an effective radius. 39 | """ 40 | 41 | x = np.arange(0, size, 1, float) 42 | y = x[:, np.newaxis] 43 | 44 | if center is None: 45 | x0 = y0 = size // 2 46 | else: 47 | x0 = center[0] 48 | y0 = center[1] 49 | 50 | return np.exp(-4*np.log(2) * ((x-x0)**2 + (y-y0)**2) / fwhm**2) 51 | 52 | 53 | class RecasensSaliencyToGridMixin(object): 54 | """Grid generator based on 'Learning to Zoom: a Saliency-Based Sampling \ 55 | Layer for Neural Networks' [https://arxiv.org/pdf/1809.03355.pdf].""" 56 | 57 | def __init__(self, output_shape, grid_shape=(31, 51), separable=True, 58 | attraction_fwhm=13, anti_crop=True, **kwargs): 59 | super(RecasensSaliencyToGridMixin, self).__init__() 60 | self.output_shape = output_shape 61 | self.output_height, self.output_width = output_shape 62 | self.grid_shape = grid_shape 63 | self.padding_size = min(self.grid_shape)-1 64 | self.total_shape = tuple( 65 | dim+2*self.padding_size 66 | for dim in self.grid_shape 67 | ) 68 | self.padding_mode = 'reflect' if anti_crop else 'replicate' 69 | self.separable = separable 70 | 71 | if self.separable: 72 | self.filter = make1DGaussian( 73 | 2*self.padding_size+1, fwhm=attraction_fwhm) 74 | self.filter = torch.FloatTensor(self.filter).unsqueeze(0) \ 75 | .unsqueeze(0).cuda() 76 | 77 | self.P_basis_x = torch.zeros(self.total_shape[1]) 78 | for i in range(self.total_shape[1]): 79 | self.P_basis_x[i] = \ 80 | (i-self.padding_size)/(self.grid_shape[1]-1.0) 81 | self.P_basis_y = torch.zeros(self.total_shape[0]) 82 | for i in range(self.total_shape[0]): 83 | self.P_basis_y[i] = \ 84 | (i-self.padding_size)/(self.grid_shape[0]-1.0) 85 | else: 86 | self.filter = make2DGaussian( 87 | 2*self.padding_size+1, fwhm=attraction_fwhm) 88 | self.filter = torch.FloatTensor(self.filter) \ 89 | .unsqueeze(0).unsqueeze(0).cuda() 90 | 91 | self.P_basis = torch.zeros(2, *self.total_shape) 92 | for k in range(2): 93 | for i in range(self.total_shape[0]): 94 | for j in range(self.total_shape[1]): 95 | self.P_basis[k, i, j] = k*(i-self.padding_size)/(self.grid_shape[0]-1.0)+(1.0-k)*(j-self.padding_size)/(self.grid_shape[1]-1.0) # noqa: E501 96 | 97 | def separable_saliency_to_grid(self, imgs, x_saliency, 98 | y_saliency, device): 99 | assert self.separable 100 | x_saliency = F.pad(x_saliency, (self.padding_size, self.padding_size), 101 | mode=self.padding_mode) 102 | y_saliency = F.pad(y_saliency, (self.padding_size, self.padding_size), 103 | mode=self.padding_mode) 104 | 105 | N = imgs.shape[0] 106 | P_x = torch.zeros(1, 1, self.total_shape[1], device=device) 107 | P_x[0, 0, :] = self.P_basis_x 108 | P_x = 
P_x.expand(N, 1, self.total_shape[1]) 109 | P_y = torch.zeros(1, 1, self.total_shape[0], device=device) 110 | P_y[0, 0, :] = self.P_basis_y 111 | P_y = P_y.expand(N, 1, self.total_shape[0]) 112 | 113 | weights = F.conv1d(x_saliency, self.filter) 114 | weighted_offsets = torch.mul(P_x, x_saliency) 115 | weighted_offsets = F.conv1d(weighted_offsets, self.filter) 116 | xgrid = weighted_offsets/weights 117 | xgrid = torch.clamp(xgrid*2-1, min=-1, max=1) 118 | xgrid = xgrid.view(-1, 1, 1, self.grid_shape[1]) 119 | xgrid = xgrid.expand(-1, 1, *self.grid_shape) 120 | 121 | weights = F.conv1d(y_saliency, self.filter) 122 | weighted_offsets = F.conv1d(torch.mul(P_y, y_saliency), self.filter) 123 | ygrid = weighted_offsets/weights 124 | ygrid = torch.clamp(ygrid*2-1, min=-1, max=1) 125 | ygrid = ygrid.view(-1, 1, self.grid_shape[0], 1) 126 | ygrid = ygrid.expand(-1, 1, *self.grid_shape) 127 | 128 | grid = torch.cat((xgrid, ygrid), 1) 129 | upsampled_grid = F.interpolate(grid, size=self.output_shape, 130 | mode='bilinear', align_corners=True) 131 | return upsampled_grid.permute(0, 2, 3, 1), grid.permute(0, 2, 3, 1) 132 | 133 | def nonseparable_saliency_to_grid(self, imgs, saliency, device): 134 | assert not self.separable 135 | p = self.padding_size 136 | saliency = F.pad(saliency, (p, p, p, p), mode=self.padding_mode) 137 | 138 | N = imgs.shape[0] 139 | P = torch.zeros(1, 2, *self.total_shape, device=device) 140 | P[0, :, :, :] = self.P_basis 141 | P = P.expand(N, 2, *self.total_shape) 142 | 143 | saliency_cat = torch.cat((saliency, saliency), 1) 144 | weights = F.conv2d(saliency, self.filter) 145 | weighted_offsets = torch.mul(P, saliency_cat) \ 146 | .view(-1, 1, *self.total_shape) 147 | weighted_offsets = F.conv2d(weighted_offsets, self.filter) \ 148 | .view(-1, 2, *self.grid_shape) 149 | 150 | weighted_offsets_x = weighted_offsets[:, 0, :, :] \ 151 | .contiguous().view(-1, 1, *self.grid_shape) 152 | xgrid = weighted_offsets_x/weights 153 | xgrid = torch.clamp(xgrid*2-1, min=-1, max=1) 154 | xgrid = xgrid.view(-1, 1, *self.grid_shape) 155 | 156 | weighted_offsets_y = weighted_offsets[:, 1, :, :] \ 157 | .contiguous().view(-1, 1, *self.grid_shape) 158 | ygrid = weighted_offsets_y/weights 159 | ygrid = torch.clamp(ygrid*2-1, min=-1, max=1) 160 | ygrid = ygrid.view(-1, 1, *self.grid_shape) 161 | 162 | grid = torch.cat((xgrid, ygrid), 1) 163 | upsampled_grid = F.interpolate(grid, size=self.output_shape, 164 | mode='bilinear', align_corners=True) 165 | return upsampled_grid.permute(0, 2, 3, 1), grid.permute(0, 2, 3, 1) 166 | 167 | 168 | @GRID_GENERATORS.register_module() 169 | class FixedGrid(nn.Module, RecasensSaliencyToGridMixin): 170 | """Grid generator that uses a fixed saliency map -- KDE SD""" 171 | 172 | def __init__(self, saliency_file, **kwargs): 173 | super(FixedGrid, self).__init__() 174 | RecasensSaliencyToGridMixin.__init__(self, **kwargs) 175 | self.saliency = pickle.load(open(saliency_file, 'rb')).cuda() 176 | 177 | if self.separable: 178 | x_saliency = self.saliency.sum(dim=2) 179 | y_saliency = self.saliency.sum(dim=3) 180 | self.upsampled_grid, self.grid = self.separable_saliency_to_grid( 181 | torch.zeros(1), x_saliency, y_saliency, torch.device('cuda')) 182 | else: 183 | self.upsampled_grid, self.grid = ( 184 | self.nonseparable_saliency_to_grid( 185 | torch.zeros(1), self.saliency, torch.device('cuda')) 186 | ) 187 | 188 | def forward(self, imgs, img_metas, **kwargs): 189 | B = imgs.shape[0] 190 | upsampled_grid = self.upsampled_grid.expand(B, -1, -1, -1) 191 | grid = 
self.grid.expand(B, -1, -1, -1) 192 | 193 | # Uncomment to visualize saliency map 194 | # h, w, _ = img_metas[0]['pad_shape'] 195 | # show_saliency = F.interpolate(self.saliency, size=(h, w), 196 | # mode='bilinear', align_corners=True) 197 | # show_saliency = 255*(show_saliency/show_saliency.max()) 198 | # show_saliency = show_saliency.expand( 199 | # show_saliency.size(0), 3, h, w) 200 | # vis_batched_imgs(vis_options['saliency'], show_saliency, 201 | # img_metas, denorm=False) 202 | # vis_batched_imgs(vis_options['saliency']+'_no_box', show_saliency, 203 | # img_metas, bboxes=None, denorm=False) 204 | 205 | return upsampled_grid, grid 206 | -------------------------------------------------------------------------------- /lzu/invert_grid.py: -------------------------------------------------------------------------------- 1 | from math import floor, ceil 2 | from typing import List 3 | 4 | import torch 5 | 6 | 7 | def invert_grid(grid, input_shape, separable=False): 8 | f = invert_separable_grid if separable else invert_nonseparable_grid 9 | return f(grid, list(input_shape)) 10 | 11 | 12 | @torch.jit.script 13 | def invert_separable_grid(grid, input_shape: List[int]): 14 | grid = grid.clone() 15 | device = grid.device 16 | H: int = input_shape[2] 17 | W: int = input_shape[3] 18 | B, grid_H, grid_W, _ = grid.shape 19 | assert B == input_shape[0] 20 | 21 | eps = 1e-8 22 | grid[:, :, :, 0] = (grid[:, :, :, 0] + 1) / 2 * (W - 1) 23 | grid[:, :, :, 1] = (grid[:, :, :, 1] + 1) / 2 * (H - 1) 24 | # grid now ranges from 0 to ([H or W] - 1) 25 | # TODO: implement batch operations 26 | inverse_grid = 2 * max(H, W) * torch.ones( 27 | [B, H, W, 2], dtype=torch.float32, device=device) 28 | for b in range(B): 29 | # each of these is ((grid_H - 1)*(grid_W - 1)) x 2 30 | p00 = grid[b, :-1, :-1, :].contiguous().view(-1, 2) # noqa: 203 31 | p10 = grid[b, 1: , :-1, :].contiguous().view(-1, 2) # noqa: 203 32 | p01 = grid[b, :-1, 1: , :].contiguous().view(-1, 2) # noqa: 203 33 | 34 | ref = torch.floor(p00).to(torch.int) 35 | v00 = p00 - ref 36 | v10 = p10 - ref 37 | v01 = p01 - ref 38 | vx = p01[:, 0] - p00[:, 0] 39 | vy = p10[:, 1] - p00[:, 1] 40 | 41 | min_x = int(floor(v00[:, 0].min() - eps)) 42 | max_x = int(ceil(v01[:, 0].max() + eps)) 43 | min_y = int(floor(v00[:, 1].min() - eps)) 44 | max_y = int(ceil(v10[:, 1].max() + eps)) 45 | 46 | pts = torch.cartesian_prod( 47 | torch.arange(min_x, max_x + 1, device=device), 48 | torch.arange(min_y, max_y + 1, device=device), 49 | ).T # 2 x (x_range*y_range) 50 | 51 | unwarped_x = (pts[0].unsqueeze(0) - v00[:, 0].unsqueeze(1)) / vx.unsqueeze(1) # noqa: E501 52 | unwarped_y = (pts[1].unsqueeze(0) - v00[:, 1].unsqueeze(1)) / vy.unsqueeze(1) # noqa: E501 53 | unwarped_pts = torch.stack((unwarped_y, unwarped_x), dim=0) # noqa: E501, has shape2 x ((grid_H - 1)*(grid_W - 1)) x (x_range*y_range) 54 | 55 | good_indices = torch.logical_and( 56 | torch.logical_and(-eps <= unwarped_pts[0], 57 | unwarped_pts[0] <= 1+eps), 58 | torch.logical_and(-eps <= unwarped_pts[1], 59 | unwarped_pts[1] <= 1+eps), 60 | ) # ((grid_H - 1)*(grid_W - 1)) x (x_range*y_range) 61 | nonzero_good_indices = good_indices.nonzero() 62 | inverse_j = pts[0, nonzero_good_indices[:, 1]] + ref[nonzero_good_indices[:, 0], 0] # noqa: E501 63 | inverse_i = pts[1, nonzero_good_indices[:, 1]] + ref[nonzero_good_indices[:, 0], 1] # noqa: E501 64 | # TODO: is replacing this with reshape operations on good_indices faster? 
# noqa: E501 65 | j = nonzero_good_indices[:, 0] % (grid_W - 1) 66 | i = nonzero_good_indices[:, 0] // (grid_W - 1) 67 | grid_mappings = torch.stack( 68 | (j + unwarped_pts[1, good_indices], i + unwarped_pts[0, good_indices]), # noqa: E501 69 | dim=1 70 | ) 71 | in_bounds = torch.logical_and( 72 | torch.logical_and(0 <= inverse_i, inverse_i < H), 73 | torch.logical_and(0 <= inverse_j, inverse_j < W), 74 | ) 75 | inverse_grid[b, inverse_i[in_bounds], inverse_j[in_bounds], :] = grid_mappings[in_bounds, :] # noqa: E501 76 | 77 | inverse_grid[..., 0] = (inverse_grid[..., 0]) / (grid_W - 1) * 2.0 - 1.0 # noqa: E501 78 | inverse_grid[..., 1] = (inverse_grid[..., 1]) / (grid_H - 1) * 2.0 - 1.0 # noqa: E501 79 | return inverse_grid 80 | 81 | 82 | def invert_nonseparable_grid(grid, input_shape): 83 | grid = grid.clone() 84 | device = grid.device 85 | _, _, H, W = input_shape 86 | B, grid_H, grid_W, _ = grid.shape 87 | assert B == input_shape[0] 88 | 89 | eps = 1e-8 90 | grid[:, :, :, 0] = (grid[:, :, :, 0] + 1) / 2 * (W - 1) 91 | grid[:, :, :, 1] = (grid[:, :, :, 1] + 1) / 2 * (H - 1) 92 | # grid now ranges from 0 to ([H or W] - 1) 93 | # TODO: implement batch operations 94 | inverse_grid = 2 * max(H, W) * torch.ones( 95 | (B, H, W, 2), dtype=torch.float32, device=device) 96 | for b in range(B): 97 | # each of these is ((grid_H - 1)*(grid_W - 1)) x 2 98 | p00 = grid[b, :-1, :-1, :].contiguous().view(-1, 2) # noqa: 203 99 | p10 = grid[b, 1: , :-1, :].contiguous().view(-1, 2) # noqa: 203 100 | p01 = grid[b, :-1, 1: , :].contiguous().view(-1, 2) # noqa: 203 101 | p11 = grid[b, 1: , 1: , :].contiguous().view(-1, 2) # noqa: 203 102 | 103 | ref = torch.floor(p00).type(torch.int) 104 | v00 = p00 - ref 105 | v10 = p10 - ref 106 | v01 = p01 - ref 107 | v11 = p11 - ref 108 | 109 | min_x = int(floor(min(v00[:, 0].min(), v10[:, 0].min()) - eps)) 110 | max_x = int(ceil(max(v01[:, 0].max(), v11[:, 0].max()) + eps)) 111 | min_y = int(floor(min(v00[:, 1].min(), v01[:, 1].min()) - eps)) 112 | max_y = int(ceil(max(v10[:, 1].max(), v11[:, 1].max()) + eps)) 113 | 114 | pts = torch.cartesian_prod( 115 | torch.arange(min_x, max_x + 1, device=device), 116 | torch.arange(min_y, max_y + 1, device=device), 117 | ).T 118 | 119 | # each of these is ((grid_H - 1)*(grid_W - 1)) x 2 120 | vb = v10 - v00 121 | vc = v01 - v00 122 | vd = v00 - v10 - v01 + v11 123 | 124 | vx = pts.permute(1, 0).unsqueeze(0) # 1 x (x_range*y_range) x 2 125 | Ma = v00.unsqueeze(1) - vx # noqa: E501, ((grid_H - 1)*(grid_W - 1)) x (x_range*y_range) x 2 126 | 127 | vc_cross_vd = (vc[:, 0] * vd[:, 1] - vc[:, 1] * vd[:, 0]).unsqueeze(1) # noqa: E501, ((grid_H - 1)*(grid_W - 1)) x 1 128 | vc_cross_vb = (vc[:, 0] * vb[:, 1] - vc[:, 1] * vb[:, 0]).unsqueeze(1) # noqa: E501, ((grid_H - 1)*(grid_W - 1)) x 1 129 | Ma_cross_vd = (Ma[:, :, 0] * vd[:, 1].unsqueeze(1) - Ma[:, :, 1] * vd[:, 0].unsqueeze(1)) # noqa: E501, ((grid_H - 1)*(grid_W - 1)) x (x_range*y_range) 130 | Ma_cross_vb = (Ma[:, :, 0] * vb[:, 1].unsqueeze(1) - Ma[:, :, 1] * vb[:, 0].unsqueeze(1)) # noqa: E501, ((grid_H - 1)*(grid_W - 1)) x (x_range*y_range) 131 | 132 | qf_a = vc_cross_vd.expand(*Ma_cross_vd.shape) 133 | qf_b = vc_cross_vb + Ma_cross_vd 134 | qf_c = Ma_cross_vb 135 | 136 | mu_neg = -1 * torch.ones_like(Ma_cross_vd) 137 | mu_pos = -1 * torch.ones_like(Ma_cross_vd) 138 | mu_linear = -1 * torch.ones_like(Ma_cross_vd) 139 | 140 | nzie = (qf_a.abs() > 1e-10).expand(*Ma_cross_vd.shape) 141 | 142 | disc = (qf_b[nzie]**2 - 4 * qf_a[nzie] * qf_c[nzie]) ** 0.5 143 | mu_pos[nzie] = (-qf_b[nzie] + 
disc) / (2 * qf_a[nzie]) 144 | mu_neg[nzie] = (-qf_b[nzie] - disc) / (2 * qf_a[nzie]) 145 | mu_linear[~nzie] = qf_c[~nzie] / qf_b[~nzie] 146 | 147 | mu_pos_valid = torch.logical_and(mu_pos >= 0, mu_pos <= 1) 148 | mu_neg_valid = torch.logical_and(mu_neg >= 0, mu_neg <= 1) 149 | mu_linear_valid = torch.logical_and(mu_linear >= 0, mu_linear <= 1) 150 | 151 | mu = -1 * torch.ones_like(Ma_cross_vd) 152 | mu[mu_pos_valid] = mu_pos[mu_pos_valid] 153 | mu[mu_neg_valid] = mu_neg[mu_neg_valid] 154 | mu[mu_linear_valid] = mu_linear[mu_linear_valid] 155 | 156 | lmbda = -1 * (Ma[:, :, 1] + mu * vc[:, 1:2]) / (vb[:, 1:2] + vd[:, 1:2] * mu) # noqa: E501 157 | 158 | unwarped_pts = torch.stack((lmbda, mu), dim=0) 159 | 160 | good_indices = torch.logical_and( 161 | torch.logical_and(-eps <= unwarped_pts[0], 162 | unwarped_pts[0] <= 1+eps), 163 | torch.logical_and(-eps <= unwarped_pts[1], 164 | unwarped_pts[1] <= 1+eps), 165 | ) # ((grid_H - 1)*(grid_W - 1)) x (x_range*y_range) 166 | nonzero_good_indices = good_indices.nonzero() 167 | inverse_j = pts[0, nonzero_good_indices[:, 1]] + ref[nonzero_good_indices[:, 0], 0] # noqa: E501 168 | inverse_i = pts[1, nonzero_good_indices[:, 1]] + ref[nonzero_good_indices[:, 0], 1] # noqa: E501 169 | # TODO: is replacing this with reshape operations on good_indices faster? # noqa: E501 170 | j = nonzero_good_indices[:, 0] % (grid_W - 1) 171 | i = nonzero_good_indices[:, 0] // (grid_W - 1) 172 | grid_mappings = torch.stack( 173 | (j + unwarped_pts[1, good_indices], i + unwarped_pts[0, good_indices]), # noqa: E501 174 | dim=1 175 | ) 176 | in_bounds = torch.logical_and( 177 | torch.logical_and(0 <= inverse_i, inverse_i < H), 178 | torch.logical_and(0 <= inverse_j, inverse_j < W), 179 | ) 180 | inverse_grid[b, inverse_i[in_bounds], inverse_j[in_bounds], :] = grid_mappings[in_bounds, :] # noqa: E501 181 | 182 | inverse_grid[..., 0] = (inverse_grid[..., 0]) / (grid_W - 1) * 2.0 - 1.0 # noqa: E501 183 | inverse_grid[..., 1] = (inverse_grid[..., 1]) / (grid_H - 1) * 2.0 - 1.0 # noqa: E501 184 | return inverse_grid 185 | -------------------------------------------------------------------------------- /lzu/lzu_fcos_mono3d.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 
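#
# Module overview (summary comment): LZUFCOSMono3D below wraps
# SingleStageMono3DDetector and applies LZU inside extract_feat():
#   1. "Zoom": warp the input image with F.grid_sample using the grid from
#      the configured grid generator (the grids are fixed, so they are
#      computed once and cached).
#   2. Encode: pad the warped image to a multiple of 32, rescale the 2D
#      ground truth, camera intrinsics, and image metas to match, then run
#      the backbone and neck on the warped image.
#   3. "Unzoom": invert the warping grid once per feature level with
#      invert_grid() (also cached) and grid_sample the warped features back
#      into unwarped coordinates before passing them to the bbox head.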
2 | import numpy as np 3 | import torch.nn.functional as F 4 | from mmdet.models.builder import DETECTORS 5 | 6 | from mmdet3d.core import bbox3d2result 7 | from mmdet3d.models.detectors.single_stage_mono3d import ( 8 | SingleStageMono3DDetector 9 | ) 10 | 11 | from .invert_grid import invert_grid 12 | from .fixed_grid import build_grid_generator 13 | 14 | 15 | @DETECTORS.register_module() 16 | class LZUFCOSMono3D(SingleStageMono3DDetector): 17 | """Fixed LZU + FCOSMono3D""" 18 | 19 | def __init__(self, 20 | grid_generator, 21 | backbone, 22 | neck, 23 | bbox_head, 24 | train_cfg=None, 25 | test_cfg=None, 26 | pretrained=None): 27 | super(LZUFCOSMono3D, self).__init__(backbone, neck, bbox_head, 28 | train_cfg, test_cfg, pretrained) 29 | self.grid_generator = build_grid_generator(grid_generator) 30 | self.forward_grids = None # cache forward warp "zoom" grids 31 | self.inverse_grids = None # cache inverse warp "zoom" grids 32 | self.times = [] 33 | 34 | def _get_scale_factor(self, ori_shape, new_shape): 35 | ori_height, ori_width, _ = ori_shape 36 | img_height, img_width, _ = new_shape 37 | w_scale = img_width / ori_width 38 | h_scale = img_height / ori_height 39 | assert w_scale == h_scale 40 | return w_scale 41 | 42 | def extract_feat(self, img, img_metas, **kwargs): 43 | """Directly extract features from the backbone+neck.""" 44 | 45 | # "zoom" or forward warp input image 46 | if self.forward_grids is None: 47 | upsampled_grid, grid = self.grid_generator( 48 | img, img_metas, **kwargs) 49 | self.forward_grids = upsampled_grid[0:1], grid[0:1] 50 | else: 51 | upsampled_grid, grid = self.forward_grids 52 | B = img.shape[0] 53 | upsampled_grid = upsampled_grid.expand(B, -1, -1, -1) 54 | 55 | # Uncomment and change scale factor to run upsampling experiments 56 | # warped_imgs = F.interpolate(img, scale_factor=0.75) 57 | # warped_imgs = F.grid_sample( 58 | # warped_imgs, upsampled_grid, align_corners=True) 59 | warped_imgs = F.grid_sample(img, upsampled_grid, align_corners=True) 60 | 61 | # Uncomment to visualize "zoomed" images 62 | # from mmcv import imdenormalize 63 | # from PIL import Image 64 | # import os 65 | # show_img = warped_imgs[0].permute(1, 2, 0).cpu().detach().numpy() 66 | # show_img = imdenormalize( 67 | # show_img, 68 | # mean=np.array([103.53, 116.28, 123.675]), 69 | # std=np.array([1.0, 1.0, 1.0]), 70 | # to_bgr=True) 71 | # img_name = os.path.basename(img_metas[0]['filename'])[:-4] 72 | # Image.fromarray(show_img.astype(np.uint8)).save(f'/project_data/ramanan/cthavama/FOVEAv2_exp/3D/lzu_fcos3d_sd/test_FT/vis_warped/{img_name}.png') # noqa: E501 73 | # breakpoint() 74 | 75 | img_height, img_width = upsampled_grid.shape[1:3] 76 | img_shape = (img_height, img_width, 3) 77 | ori_shape = img_metas[0]['ori_shape'] 78 | scale_factor = self._get_scale_factor(ori_shape, img_shape) 79 | 80 | # pad warped images; TODO: undo hardcoding size divisor of 32 81 | pad_h = int(np.ceil(img_shape[0] / 32)) * 32 - img_shape[0] 82 | pad_w = int(np.ceil(img_shape[1] / 32)) * 32 - img_shape[1] 83 | warped_imgs = F.pad( 84 | warped_imgs, (0, pad_w, 0, pad_h), mode='constant', value=0) 85 | pad_shape = (warped_imgs.shape[2], warped_imgs.shape[3], 3) 86 | 87 | # update img metas, assuming that all imgs have the same original shape 88 | for i in range(len(img_metas)): 89 | img_metas[i]['img_shape'] = img_shape 90 | img_metas[i]['scale_factor'] = scale_factor 91 | img_metas[i]['pad_shape'] = pad_shape 92 | img_metas[i]['pad_fixed_size'] = None 93 | img_metas[i]['pad_size_divisor'] = 32 94 | # resize 
ground truth boxes and centers 95 | if 'centers2d' in kwargs: 96 | kwargs['centers2d'][i] *= scale_factor 97 | if 'gt_bboxes' in kwargs: 98 | kwargs['gt_bboxes'][i] *= scale_factor 99 | for j in range(len(img_metas[i]['cam2img'][0])): 100 | img_metas[i]['cam2img'][0][j] *= scale_factor 101 | img_metas[i]['cam2img'][1][j] *= scale_factor 102 | 103 | # Encode 104 | warped_x = self.backbone(warped_imgs) 105 | if self.with_neck: 106 | warped_x = self.neck(warped_x) 107 | 108 | # Unzoom 109 | x = [] 110 | # precompute and cache inverses 111 | separable = self.grid_generator.separable 112 | if self.inverse_grids is None: 113 | self.inverse_grids = [] 114 | for i in range(len(warped_x)): 115 | input_shape = warped_x[i].shape 116 | inverse_grid = invert_grid(grid, input_shape, 117 | separable=separable)[0:1] 118 | self.inverse_grids.append(inverse_grid) 119 | # perform unzoom 120 | for i in range(len(warped_x)): 121 | B = len(warped_x[i]) 122 | inverse_grid = self.inverse_grids[i].expand(B, -1, -1, -1) 123 | unwarped_x = F.grid_sample( 124 | warped_x[i], inverse_grid, mode='bilinear', 125 | align_corners=True, padding_mode='zeros' 126 | ) 127 | x.append(unwarped_x) 128 | 129 | return tuple(x) 130 | 131 | def forward_train(self, 132 | img, 133 | img_metas, 134 | gt_bboxes, 135 | gt_labels, 136 | gt_bboxes_3d, 137 | gt_labels_3d, 138 | centers2d, 139 | depths, 140 | attr_labels=None, 141 | gt_bboxes_ignore=None): 142 | x = self.extract_feat(img, img_metas, 143 | gt_bboxes=gt_bboxes, centers2d=centers2d) 144 | losses = self.bbox_head.forward_train(x, img_metas, gt_bboxes, 145 | gt_labels, gt_bboxes_3d, 146 | gt_labels_3d, centers2d, depths, 147 | attr_labels, gt_bboxes_ignore) 148 | return losses 149 | 150 | def simple_test(self, img, img_metas, rescale=False, **kwargs): 151 | """Test function without test time augmentation. 152 | 153 | Args: 154 | imgs (list[torch.Tensor]): List of multiple images 155 | img_metas (list[dict]): List of image information. 156 | rescale (bool, optional): Whether to rescale the results. 157 | Defaults to False. 158 | 159 | Returns: 160 | list[list[np.ndarray]]: BBox results of each image and classes. 161 | The outer list corresponds to each image. The inner list 162 | corresponds to each class. 
163 | """ 164 | x = self.extract_feat(img, img_metas, **kwargs) 165 | outs = self.bbox_head(x) 166 | bbox_outputs = self.bbox_head.get_bboxes( 167 | *outs, img_metas, rescale=rescale) 168 | 169 | if self.bbox_head.pred_bbox2d: 170 | from mmdet.core import bbox2result 171 | bbox2d_img = [ 172 | bbox2result(bboxes2d, labels, self.bbox_head.num_classes) 173 | for bboxes, scores, labels, attrs, bboxes2d in bbox_outputs 174 | ] 175 | bbox_outputs = [bbox_outputs[0][:-1]] 176 | 177 | bbox_img = [ 178 | bbox3d2result(bboxes, scores, labels, attrs) 179 | for bboxes, scores, labels, attrs in bbox_outputs 180 | ] 181 | 182 | bbox_list = [dict() for i in range(len(img_metas))] 183 | for result_dict, img_bbox in zip(bbox_list, bbox_img): 184 | result_dict['img_bbox'] = img_bbox 185 | if self.bbox_head.pred_bbox2d: 186 | for result_dict, img_bbox2d in zip(bbox_list, bbox2d_img): 187 | result_dict['img_bbox2d'] = img_bbox2d 188 | return bbox_list 189 | 190 | def aug_test(self, imgs, img_metas, rescale=False): 191 | raise NotImplementedError 192 | -------------------------------------------------------------------------------- /lzu/transforms_3d.py: -------------------------------------------------------------------------------- 1 | from mmdet.datasets.builder import PIPELINES 2 | from mmdet.datasets.pipelines import Resize 3 | 4 | 5 | @PIPELINES.register_module() 6 | class Resize3D(Resize): 7 | 8 | def __call__(self, input_dict): 9 | """Call function to resize images, bounding boxes, masks, semantic 10 | segmentation map, *and additionally adjust the camera intrinsics*. 11 | 12 | Args: 13 | input_dict (dict): Result dict from loading pipeline. 14 | 15 | Returns: 16 | dict: Resized results, 'img_shape', 'pad_shape', 'scale_factor', \ 17 | 'keep_ratio' keys are added into result dict. 18 | """ 19 | input_dict = super().__call__(input_dict) 20 | w_scale, h_scale, _, _ = input_dict['scale_factor'] 21 | assert w_scale == h_scale 22 | 23 | input_dict['scale_factor'] = w_scale 24 | if 'centers2d' in input_dict: 25 | input_dict['centers2d'] *= w_scale 26 | for i in range(len(input_dict['cam2img'][0])): 27 | input_dict['cam2img'][0][i] *= w_scale 28 | input_dict['cam2img'][1][i] *= h_scale 29 | return input_dict 30 | -------------------------------------------------------------------------------- /run.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | outDir="/project_data/ramanan/cthavama/LZU_release_tests" # CHANGE THIS 4 | nGPU=2 5 | 6 | case $1 in 7 | "fcos3d_0.25"|"fcos3d_0.50"|"fcos3d_0.75"|"fcos3d_1.00"|"lzu_fcos3d_0.25"|"lzu_fcos3d_0.50"|"lzu_fcos3d_0.75"|"lzu_fcos3d_1.00") expName=$1 ;; 8 | *) echo "Invalid experiment name." 
&& exit;; 9 | esac 10 | 11 | # Test checkpointed model 12 | # set data.test.samples_per_gpu=1 if running timing tests 13 | python tools/test.py \ 14 | configs/$expName.py \ 15 | ckpt/$expName.pth \ 16 | --out $outDir/$expName/test_checkpoint/results.pkl \ 17 | --eval bbox \ 18 | --cfg-options \ 19 | data.test.samples_per_gpu=8 \ 20 | 21 | # Train 22 | torchrun \ 23 | --nnodes 1 \ 24 | --nproc_per_node $nGPU \ 25 | --rdzv_backend c10d \ 26 | --rdzv_endpoint localhost:0 \ 27 | tools/train.py \ 28 | configs/$expName.py \ 29 | --work-dir $outDir/$expName/work \ 30 | --gpus $nGPU \ 31 | --launcher pytorch \ 32 | 33 | # Test trained model 34 | # set data.test.samples_per_gpu=1 if running timing tests 35 | python tools/test.py \ 36 | configs/$expName.py \ 37 | $outDir/$expName/work/latest.pth \ 38 | --out $outDir/$expName/test/results.pkl \ 39 | --eval bbox \ 40 | --cfg-options \ 41 | data.test.samples_per_gpu=8 \ 42 | -------------------------------------------------------------------------------- /saliency.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tchittesh/lzu/361afb2360011a3b540fdbdd53d8be9eda70ac78/saliency.pkl -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import find_packages, setup 2 | 3 | print(find_packages(exclude=('configs', 'experiments', 'tools'))) 4 | 5 | if __name__ == '__main__': 6 | setup( 7 | name='lzu', 8 | packages=find_packages(include=('lzu',)), 9 | ) 10 | -------------------------------------------------------------------------------- /teaser.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tchittesh/lzu/361afb2360011a3b540fdbdd53d8be9eda70ac78/teaser.gif -------------------------------------------------------------------------------- /tools/test.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved.
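#
# Test/eval entry point (summary comment): builds the test dataset and model
# from the given MMCV config, loads the checkpoint, runs single-GPU inference
# (rescale=False), and then writes results to --out and/or evaluates them
# with --eval. The `import lzu` below registers the custom LZU modules with
# the MMCV registries so that the configs can reference them.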
2 | import argparse 3 | import mmcv 4 | import os 5 | import torch 6 | from mmcv import Config, DictAction 7 | from mmcv.parallel import MMDataParallel 8 | from mmcv.runner import (get_dist_info, load_checkpoint, 9 | wrap_fp16_model) 10 | 11 | from mmdet3d.datasets import build_dataloader, build_dataset 12 | from mmdet3d.models import build_model 13 | from mmdet.apis import set_random_seed 14 | from mmdet.datasets import replace_ImageToTensor 15 | 16 | import lzu # noqa: F401, add custom modules to MMCV registry 17 | 18 | 19 | def parse_args(): 20 | parser = argparse.ArgumentParser( 21 | description='MMDet test (and eval) a model') 22 | parser.add_argument('config', help='test config file path') 23 | parser.add_argument('checkpoint', help='checkpoint file') 24 | parser.add_argument('--out', help='output result file in pickle format') 25 | parser.add_argument( 26 | '--eval', 27 | type=str, 28 | nargs='+', 29 | help='evaluation metrics, which depends on the dataset, e.g., "bbox",' 30 | ' "segm", "proposal" for COCO, and "mAP", "recall" for PASCAL VOC') 31 | parser.add_argument('--seed', type=int, default=0, help='random seed') 32 | parser.add_argument( 33 | '--deterministic', 34 | action='store_true', 35 | help='whether to set deterministic options for CUDNN backend.') 36 | parser.add_argument( 37 | '--cfg-options', 38 | nargs='+', 39 | action=DictAction, 40 | help='override some settings in the used config, the key-value pair ' 41 | 'in xxx=yyy format will be merged into config file. If the value to ' 42 | 'be overwritten is a list, it should be like key="[a,b]" or key=a,b ' 43 | 'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" ' 44 | 'Note that the quotation marks are necessary and that no white space ' 45 | 'is allowed.') 46 | args = parser.parse_args() 47 | if 'LOCAL_RANK' not in os.environ: 48 | os.environ['LOCAL_RANK'] = "0" 49 | 50 | return args 51 | 52 | 53 | def single_gpu_test(model, data_loader): 54 | """Test model with single gpu. 55 | 56 | Args: 57 | model (nn.Module): Model to be tested. 58 | data_loader (nn.Dataloader): Pytorch data loader. 59 | 60 | Returns: 61 | list[dict]: The prediction results. 
62 | """ 63 | model.eval() 64 | results = [] 65 | dataset = data_loader.dataset 66 | prog_bar = mmcv.ProgressBar(len(dataset)) 67 | for data in data_loader: 68 | with torch.no_grad(): 69 | result = model(return_loss=False, rescale=False, **data) 70 | results.extend(result) 71 | 72 | batch_size = len(result) 73 | for _ in range(batch_size): 74 | prog_bar.update() 75 | return results 76 | 77 | 78 | def main(): 79 | args = parse_args() 80 | 81 | assert args.out or args.eval, \ 82 | ('Please specify at least one operation (save/eval the ' 83 | 'results) with the argument "--out" or "--eval"') 84 | 85 | if args.out is not None and not args.out.endswith(('.pkl', '.pickle')): 86 | raise ValueError('The output file must be a pkl file.') 87 | 88 | cfg = Config.fromfile(args.config) 89 | if args.cfg_options is not None: 90 | cfg.merge_from_dict(args.cfg_options) 91 | print(cfg.pretty_text) 92 | 93 | # set cudnn_benchmark 94 | if cfg.get('cudnn_benchmark', False): 95 | torch.backends.cudnn.benchmark = True 96 | 97 | cfg.model.pretrained = None 98 | cfg.data.test.test_mode = True 99 | samples_per_gpu = cfg.data.test.pop('samples_per_gpu', 1) 100 | if samples_per_gpu > 1: 101 | # Replace 'ImageToTensor' to 'DefaultFormatBundle' 102 | cfg.data.test.pipeline = replace_ImageToTensor( 103 | cfg.data.test.pipeline) 104 | 105 | # set random seeds 106 | if args.seed is not None: 107 | set_random_seed(args.seed, deterministic=args.deterministic) 108 | 109 | # build the dataloader 110 | dataset = build_dataset(cfg.data.test) 111 | data_loader = build_dataloader( 112 | dataset, 113 | samples_per_gpu=samples_per_gpu, 114 | workers_per_gpu=cfg.data.workers_per_gpu, 115 | dist=False, 116 | shuffle=False) 117 | 118 | # build the model and load checkpoint 119 | cfg.model.train_cfg = None 120 | model = build_model(cfg.model, test_cfg=cfg.get('test_cfg')) 121 | fp16_cfg = cfg.get('fp16', None) 122 | if fp16_cfg is not None: 123 | wrap_fp16_model(model) 124 | checkpoint = load_checkpoint(model, args.checkpoint, map_location='cpu') 125 | model.CLASSES = checkpoint['meta']['CLASSES'] 126 | # palette for visualization in segmentation tasks 127 | if 'PALETTE' in checkpoint.get('meta', {}): 128 | model.PALETTE = checkpoint['meta']['PALETTE'] 129 | elif hasattr(dataset, 'PALETTE'): 130 | # segmentation dataset has `PALETTE` attribute 131 | model.PALETTE = dataset.PALETTE 132 | 133 | model = MMDataParallel(model, device_ids=[0]) 134 | outputs = single_gpu_test(model, data_loader) 135 | 136 | rank, _ = get_dist_info() 137 | if rank == 0: 138 | if args.out: 139 | print(f'\nwriting results to {args.out}') 140 | mmcv.dump(outputs, args.out) 141 | if args.eval: 142 | eval_kwargs = cfg.get('evaluation', {}).copy() 143 | # hard-code way to remove EvalHook args 144 | for key in [ 145 | 'interval', 'tmpdir', 'start', 'gpu_collect', 'save_best', 146 | 'rule' 147 | ]: 148 | eval_kwargs.pop(key, None) 149 | eval_kwargs.update(dict(metric=args.eval)) 150 | print(dataset.evaluate(outputs, **eval_kwargs)) 151 | 152 | 153 | if __name__ == '__main__': 154 | main() 155 | -------------------------------------------------------------------------------- /tools/train.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 
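#
# Training entry point (summary comment): adapted from the standard mmdet3d
# train script, extended to `import lzu` so the custom LZU modules are
# registered with MMCV. It builds the datasets and model from the config and
# hands off to mmdet3d.apis.train_model. Distributed training is driven by
# --launcher (run.sh launches this script via torchrun with
# --launcher pytorch), and --autoscale-lr applies the linear scaling rule,
# multiplying the base learning rate by num_gpus / 8.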
2 | from __future__ import division 3 | 4 | import argparse 5 | import copy 6 | from datetime import timedelta 7 | import mmcv 8 | import os 9 | import time 10 | import torch 11 | import warnings 12 | from mmcv import Config, DictAction 13 | from mmcv.runner import get_dist_info, init_dist 14 | from os import path as osp 15 | 16 | from mmdet import __version__ as mmdet_version 17 | from mmdet3d import __version__ as mmdet3d_version 18 | from mmdet3d.apis import train_model 19 | from mmdet3d.datasets import build_dataset 20 | from mmdet3d.models import build_model 21 | from mmdet3d.utils import collect_env, get_root_logger 22 | from mmdet.apis import set_random_seed 23 | from mmseg import __version__ as mmseg_version 24 | 25 | import lzu # noqa: F401, add custom modules to MMCV registry 26 | 27 | 28 | def parse_args(): 29 | parser = argparse.ArgumentParser(description='Train a detector') 30 | parser.add_argument('config', help='train config file path') 31 | parser.add_argument('--work-dir', help='the dir to save logs and models') 32 | parser.add_argument( 33 | '--resume-from', help='the checkpoint file to resume from') 34 | parser.add_argument( 35 | '--no-validate', 36 | action='store_true', 37 | help='whether not to evaluate the checkpoint during training') 38 | group_gpus = parser.add_mutually_exclusive_group() 39 | group_gpus.add_argument( 40 | '--gpus', 41 | type=int, 42 | help='number of gpus to use ' 43 | '(only applicable to non-distributed training)') 44 | group_gpus.add_argument( 45 | '--gpu-ids', 46 | type=int, 47 | nargs='+', 48 | help='ids of gpus to use ' 49 | '(only applicable to non-distributed training)') 50 | parser.add_argument('--seed', type=int, default=0, help='random seed') 51 | parser.add_argument( 52 | '--deterministic', 53 | action='store_true', 54 | help='whether to set deterministic options for CUDNN backend.') 55 | parser.add_argument( 56 | '--options', 57 | nargs='+', 58 | action=DictAction, 59 | help='override some settings in the used config, the key-value pair ' 60 | 'in xxx=yyy format will be merged into config file (deprecate), ' 61 | 'change to --cfg-options instead.') 62 | parser.add_argument( 63 | '--cfg-options', 64 | nargs='+', 65 | action=DictAction, 66 | help='override some settings in the used config, the key-value pair ' 67 | 'in xxx=yyy format will be merged into config file. If the value to ' 68 | 'be overwritten is a list, it should be like key="[a,b]" or key=a,b ' 69 | 'It also allows nested list/tuple values, e.g. 
key="[(a,b),(c,d)]" ' 70 | 'Note that the quotation marks are necessary and that no white space ' 71 | 'is allowed.') 72 | parser.add_argument( 73 | '--launcher', 74 | choices=['none', 'pytorch', 'slurm', 'mpi'], 75 | default='none', 76 | help='job launcher') 77 | parser.add_argument('--local_rank', type=int, default=0) 78 | parser.add_argument( 79 | '--autoscale-lr', 80 | action='store_true', 81 | help='automatically scale lr with the number of gpus') 82 | args = parser.parse_args() 83 | if 'LOCAL_RANK' not in os.environ: 84 | os.environ['LOCAL_RANK'] = str(args.local_rank) 85 | 86 | if args.options and args.cfg_options: 87 | raise ValueError( 88 | '--options and --cfg-options cannot be both specified, ' 89 | '--options is deprecated in favor of --cfg-options') 90 | if args.options: 91 | warnings.warn('--options is deprecated in favor of --cfg-options') 92 | args.cfg_options = args.options 93 | 94 | return args 95 | 96 | 97 | def main(): 98 | args = parse_args() 99 | 100 | cfg = Config.fromfile(args.config) 101 | if args.cfg_options is not None: 102 | cfg.merge_from_dict(args.cfg_options) 103 | 104 | # set cudnn_benchmark 105 | if cfg.get('cudnn_benchmark', False): 106 | torch.backends.cudnn.benchmark = True 107 | 108 | # work_dir is determined in this priority: CLI > segment in file > filename 109 | if args.work_dir is not None: 110 | # update configs according to CLI args if args.work_dir is not None 111 | cfg.work_dir = args.work_dir 112 | elif cfg.get('work_dir', None) is None: 113 | # use config filename as default work_dir if cfg.work_dir is None 114 | cfg.work_dir = osp.join('./work_dirs', 115 | osp.splitext(osp.basename(args.config))[0]) 116 | if args.resume_from is not None: 117 | cfg.resume_from = args.resume_from 118 | if args.gpu_ids is not None: 119 | cfg.gpu_ids = args.gpu_ids 120 | else: 121 | cfg.gpu_ids = range(1) if args.gpus is None else range(args.gpus) 122 | 123 | if args.autoscale_lr: 124 | # apply the linear scaling rule (https://arxiv.org/abs/1706.02677) 125 | cfg.optimizer['lr'] = cfg.optimizer['lr'] * len(cfg.gpu_ids) / 8 126 | 127 | # init distributed env first, since logger depends on the dist info. 
128 | if args.launcher == 'none': 129 | distributed = False 130 | else: 131 | distributed = True 132 | if args.launcher == 'pytorch': 133 | torch.multiprocessing.set_start_method('fork') 134 | init_dist(args.launcher, timeout=timedelta(hours=4), **cfg.dist_params) 135 | # re-set gpu_ids with distributed training mode 136 | _, world_size = get_dist_info() 137 | cfg.gpu_ids = range(world_size) 138 | 139 | # create work_dir 140 | mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir)) 141 | # dump config 142 | cfg.dump(osp.join(cfg.work_dir, osp.basename(args.config))) 143 | # init the logger before other steps 144 | timestamp = time.strftime('%Y%m%d_%H%M%S', time.localtime()) 145 | log_file = osp.join(cfg.work_dir, f'{timestamp}.log') 146 | # specify logger name, if we still use 'mmdet', the output info will be 147 | # filtered and won't be saved in the log_file 148 | # TODO: ugly workaround to judge whether we are training det or seg model 149 | if cfg.model.type in ['EncoderDecoder3D']: 150 | logger_name = 'mmseg' 151 | else: 152 | logger_name = 'mmdet' 153 | logger = get_root_logger( 154 | log_file=log_file, log_level=cfg.log_level, name=logger_name) 155 | 156 | # init the meta dict to record some important information such as 157 | # environment info and seed, which will be logged 158 | meta = dict() 159 | # log env info 160 | env_info_dict = collect_env() 161 | env_info = '\n'.join([(f'{k}: {v}') for k, v in env_info_dict.items()]) 162 | dash_line = '-' * 60 + '\n' 163 | logger.info('Environment info:\n' + dash_line + env_info + '\n' + 164 | dash_line) 165 | meta['env_info'] = env_info 166 | meta['config'] = cfg.pretty_text 167 | 168 | # log some basic info 169 | logger.info(f'Distributed training: {distributed}') 170 | logger.info(f'Config:\n{cfg.pretty_text}') 171 | 172 | # set random seeds 173 | if args.seed is not None: 174 | logger.info(f'Set random seed to {args.seed}, ' 175 | f'deterministic: {args.deterministic}') 176 | set_random_seed(args.seed, deterministic=args.deterministic) 177 | cfg.seed = args.seed 178 | meta['seed'] = args.seed 179 | meta['exp_name'] = osp.basename(args.config) 180 | 181 | model = build_model( 182 | cfg.model, 183 | train_cfg=cfg.get('train_cfg'), 184 | test_cfg=cfg.get('test_cfg')) 185 | model.init_weights() 186 | 187 | logger.info(f'Model:\n{model}') 188 | datasets = [build_dataset(cfg.data.train)] 189 | if len(cfg.workflow) == 2: 190 | val_dataset = copy.deepcopy(cfg.data.val) 191 | # in case we use a dataset wrapper 192 | if 'dataset' in cfg.data.train: 193 | val_dataset.pipeline = cfg.data.train.dataset.pipeline 194 | else: 195 | val_dataset.pipeline = cfg.data.train.pipeline 196 | # set test_mode=False here in deep copied config 197 | # which do not affect AP/AR calculation later 198 | # refer to https://mmdetection3d.readthedocs.io/en/latest/tutorials/customize_runtime.html#customize-workflow # noqa 199 | val_dataset.test_mode = False 200 | datasets.append(build_dataset(val_dataset)) 201 | if cfg.checkpoint_config is not None: 202 | # save mmdet version, config file content and class names in 203 | # checkpoints as meta data 204 | cfg.checkpoint_config.meta = dict( 205 | mmdet_version=mmdet_version, 206 | mmseg_version=mmseg_version, 207 | mmdet3d_version=mmdet3d_version, 208 | config=cfg.pretty_text, 209 | CLASSES=datasets[0].CLASSES, 210 | PALETTE=datasets[0].PALETTE # for segmentors 211 | if hasattr(datasets[0], 'PALETTE') else None) 212 | # add an attribute for visualization convenience 213 | model.CLASSES = datasets[0].CLASSES 214 | train_model( 215 
| model, 216 | datasets, 217 | cfg, 218 | distributed=distributed, 219 | validate=(not args.no_validate), 220 | timestamp=timestamp, 221 | meta=meta) 222 | 223 | 224 | if __name__ == '__main__': 225 | main() 226 | --------------------------------------------------------------------------------
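For readers skimming the dump, here is a minimal, self-contained sketch of the zoom/unzoom round trip that `lzu/invert_grid.py` and `lzu/lzu_fcos_mono3d.py` implement. It is illustrative only: the identity lattice stands in for the saliency-driven grid produced by the repo's grid generator, the bilinear upsampling of the coarse grid stands in for the generator's `upsampled_grid`, and it assumes the `separable` flag of `invert_grid(grid, input_shape, separable=...)` simply selects the nonseparable inverter shown above when set to `False`.

```python
import torch
import torch.nn.functional as F

from lzu.invert_grid import invert_grid

B, C, H, W = 1, 3, 64, 96
img = torch.rand(B, C, H, W)

# Coarse warping grid on a 9x13 lattice, in normalized [-1, 1] (x, y)
# coordinates as expected by F.grid_sample. An identity warp is used here
# purely for illustration; LZU's grid generator would produce a
# saliency-driven grid instead.
ys = torch.linspace(-1, 1, 9).view(9, 1).expand(9, 13)
xs = torch.linspace(-1, 1, 13).view(1, 13).expand(9, 13)
grid = torch.stack((xs, ys), dim=-1).unsqueeze(0)  # (1, 9, 13, 2)

# "Zoom": upsample the coarse grid to the output resolution and warp the
# image, mirroring F.grid_sample(img, upsampled_grid) in extract_feat().
upsampled_grid = F.interpolate(
    grid.permute(0, 3, 1, 2), size=(H, W),
    mode='bilinear', align_corners=True).permute(0, 2, 3, 1)
warped = F.grid_sample(img, upsampled_grid, align_corners=True)

# "Unzoom": invert the coarse grid at the warped tensor's resolution and
# resample the warped tensor back into unwarped coordinates, as done per
# feature level in LZUFCOSMono3D.extract_feat().
inverse_grid = invert_grid(grid, warped.shape, separable=False)
unwarped = F.grid_sample(
    warped, inverse_grid, mode='bilinear',
    align_corners=True, padding_mode='zeros')

print(unwarped.shape)  # torch.Size([1, 3, 64, 96])
# For the identity warp, unwarped should closely match img
# (up to interpolation error near the borders).
```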