├── .gitignore
├── .gitmodules
├── LICENSE
├── README.md
├── accuracy-latency.png
├── ckpt
│   └── README.md
├── configs
│   ├── fcos3d_0.25.py
│   ├── fcos3d_0.50.py
│   ├── fcos3d_0.75.py
│   ├── fcos3d_1.00.py
│   ├── lzu_fcos3d_0.25.py
│   ├── lzu_fcos3d_0.50.py
│   ├── lzu_fcos3d_0.75.py
│   └── lzu_fcos3d_1.00.py
├── data
│   └── README.md
├── demo.gif
├── environment.yml
├── lzu
│   ├── __init__.py
│   ├── fcos_mono3d_head_norescale.py
│   ├── fixed_grid.py
│   ├── invert_grid.py
│   ├── lzu_fcos_mono3d.py
│   └── transforms_3d.py
├── run.sh
├── saliency.pkl
├── setup.py
├── teaser.gif
└── tools
    ├── test.py
    └── train.py
/.gitignore:
--------------------------------------------------------------------------------
1 | .vscode/
2 | data/
3 | ckpt/
4 | *.egg-info/
5 | __pycache__/
6 |
--------------------------------------------------------------------------------
/.gitmodules:
--------------------------------------------------------------------------------
1 | [submodule "mmdetection3d_v1.0.0rc6"]
2 | path = mmdetection3d_v1.0.0rc6
3 | url = https://github.com/open-mmlab/mmdetection3d.git
4 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2023 Chittesh Thavamani
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Learning to Zoom and Unzoom
2 |
3 | Official repository for the CVPR 2023 paper _Learning to Zoom and Unzoom_ [[paper]](https://arxiv.org/abs/2303.15390) [[website]](https://tchittesh.github.io/lzu/) [[talk]](https://youtu.be/wALSrBZiUgc).
4 |
5 |
6 |
7 |
8 |
9 |
10 | In a nutshell, LZU is a highly flexible method to apply spatial attention to neural nets.
11 | The extremely simple source code ([zoom](./lzu/fixed_grid.py) and [unzoom](./lzu/invert_grid.py)) can be applied to any model that uses spatial processing (e.g. convolutions).
12 |
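To make the zoom/unzoom idea concrete, here is a minimal, hedged sketch (not the repository's actual training or inference code) of the three steps: warp the image with a fixed saliency-derived grid, run any spatial model on the zoomed image, and unwarp the resulting features with the inverted grid. The toy `backbone`, image sizes, and `img_metas=None` are placeholder assumptions; the grid settings are copied from `configs/lzu_fcos3d_0.50.py`, and a GPU plus the repo's `saliency.pkl` are required.

```python
import torch
import torch.nn.functional as F

from lzu.fixed_grid import FixedGrid
from lzu.invert_grid import invert_grid

# Toy stand-ins (assumptions): a single conv as the "model" and a random image
# at full nuScenes resolution. The real detector is LZUFCOSMono3D.
backbone = torch.nn.Conv2d(3, 64, 3, padding=1).cuda()
imgs = torch.rand(1, 3, 900, 1600).cuda()

# Zoom: build a fixed saliency-based warp grid and resample the image with it.
grid_gen = FixedGrid(
    saliency_file='saliency.pkl',
    output_shape=(450, 800),   # (height, width) of the zoomed image
    grid_shape=(27, 48),
    separable=True,
    attraction_fwhm=10,
    anti_crop=True)
upsampled_grid, grid = grid_gen(imgs, img_metas=None)
zoomed = F.grid_sample(imgs, upsampled_grid, align_corners=True)  # 1 x 3 x 450 x 800

# Run any spatial model on the zoomed image.
feats = backbone(zoomed)

# Unzoom: invert the coarse grid and resample features back onto a uniform
# layout (here simply the original image resolution).
inverse_grid = invert_grid(grid, (1, 64, 900, 1600), separable=True)
unzoomed = F.grid_sample(feats, inverse_grid, align_corners=True)  # 1 x 64 x 900 x 1600
```

The actual detector wiring lives in [`lzu/lzu_fcos_mono3d.py`](./lzu/lzu_fcos_mono3d.py).
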
13 | ## Setup (Code + Data + Models)
14 |
15 |
16 | ### 1) Set up the coding environment
17 |
18 |
19 | First, clone the repository (including the mmdet3d submodule):
20 | ```bash
21 | git clone https://github.com/tchittesh/lzu.git --recursive && cd lzu
22 | ```
23 |
24 | Then, you'll need to install the MMDetection3D (v1.0.0rc6) submodule and the lzu package.
25 | To do this, you can either:
26 | - replicate our exact setup by installing [miniconda](https://docs.conda.io/en/latest/miniconda.html) and running
27 | ```
28 | conda env create -f environment.yml
29 | ```
30 | - OR install it from scratch according to [getting_started.md](https://github.com/open-mmlab/mmdetection3d/blob/47285b3f1e9dba358e98fcd12e523cfd0769c876/docs/en/getting_started.md) and then install our lzu package with
31 | ```bash
32 | pip install -e .
33 | ```
34 |
35 | The first option should be more reliable, but it is less flexible if you want to run specific versions of Python/PyTorch/MMCV.
36 |
37 |
38 |
39 | ### 2) Download the dataset
40 |
41 |
42 | You'll need to set up the [nuScenes](https://www.nuscenes.org/nuscenes#download) dataset according to [data_preparation.md](https://github.com/open-mmlab/mmdetection3d/blob/47285b3f1e9dba358e98fcd12e523cfd0769c876/docs/en/data_preparation.md). Your final `data` folder should look like this:
43 | ```
44 | data/nuscenes/
45 | ├── maps/
46 | ├── samples/
47 | ├── sweeps/
48 | ├── v1.0-trainval/
49 | ├── nuscenes_infos_train_mono3d.coco.json
50 | ├── nuscenes_infos_train.pkl
51 | ├── nuscenes_infos_val_mono3d.coco.json
52 | └── nuscenes_infos_val.pkl
53 | ```
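If your nuScenes download lives outside the repo, a symlink plus the MMDetection3D info-file generation step is typically all that is needed. The commands below are a hedged sketch: the symlink path is a placeholder, and the `create_data.py` invocation follows the linked data_preparation.md (run from the `mmdetection3d_v1.0.0rc6` submodule), which remains the authoritative reference.

```bash
# Placeholder path: point data/nuscenes at wherever v1.0-trainval was downloaded.
mkdir -p data
ln -s /path/to/nuscenes data/nuscenes

# Generate nuscenes_infos_*.pkl and *_mono3d.coco.json (see data_preparation.md).
cd mmdetection3d_v1.0.0rc6
python tools/create_data.py nuscenes \
    --root-path ../data/nuscenes \
    --out-dir ../data/nuscenes \
    --extra-tag nuscenes
```
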
54 |
55 |
56 |
57 | ### 3) [Optional] Download our pretrained checkpoints
58 |
59 |
60 | Download our pretrained checkpoints from [Google Drive](https://drive.google.com/file/d/1nofuqZ7YSKblIDAltbxp1pFOiQtUzp8B/view?usp=sharing) and place them in the `ckpt/` directory, using symbolic links if necessary.
61 |
62 |
63 | ## Scripts
64 |
65 | This should be super easy! Simply run
66 |
67 | ```
68 | sh run.sh [experiment_name]
69 | ```
70 |
71 | for any valid experiment name in the `configs/` directory.
72 | Examples include `fcos3d_0.50`, which is the uniform downsampling baseline at 0.50x scale, and `lzu_fcos3d_0.75`, which is LZU at 0.75x scale.
73 |
74 | This script will first run inference using the pretrained checkpoint, then train the model from scratch, and finally run inference using the trained model.
75 |
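`run.sh` itself is not reproduced in this dump; based on the description above and the `tools/test.py` / `tools/train.py` entry points, its flow is roughly the following. This is a hedged sketch only: checkpoint filenames, work-dir layout, and flags are assumptions in the usual MMDetection style, not the script's actual contents.

```bash
EXP=$1  # e.g. lzu_fcos3d_0.50

# 1) Evaluate the released checkpoint (assumed to be named ckpt/${EXP}.pth).
python tools/test.py configs/${EXP}.py ckpt/${EXP}.pth --eval bbox

# 2) Train the same config from scratch.
python tools/train.py configs/${EXP}.py --work-dir work_dirs/${EXP}

# 3) Evaluate the freshly trained checkpoint.
python tools/test.py configs/${EXP}.py work_dirs/${EXP}/latest.pth --eval bbox
```
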
76 | ## Results
77 |
78 | Our pretrained models (from the paper) achieve the following NDS (nuScenes Detection Score) results.
79 |
80 | | Scale | Baseline Experiment | NDS | LZU Experiment | NDS |
81 | | ----- | ------------------- | ----------- | ------------------ | ----------- |
82 | | 0.25x | fcos3d_0.25 | 0.2177 | lzu_fcos3d_0.25 | 0.2341 |
83 | | 0.50x | fcos3d_0.50 | 0.2752 | lzu_fcos3d_0.50 | 0.2926 |
84 | | 0.75x | fcos3d_0.75 | 0.3053 | lzu_fcos3d_0.75 | 0.3175 |
85 | | 1.00x | fcos3d_1.00 | 0.3122 | lzu_fcos3d_1.00 | 0.3258 |
86 |
87 | As can be seen, LZU achieves a superior accuracy-latency tradeoff compared to uniform downsampling. For more details, please refer to our [paper](https://arxiv.org/abs/2303.15390).
88 |
89 | 
90 |
91 | ## Citation
92 |
93 | If you find our code useful, please consider citing us!
94 | ```
95 | @misc{thavamani2023learning,
96 | title={Learning to Zoom and Unzoom},
97 | author={Chittesh Thavamani and Mengtian Li and Francesco Ferroni and Deva Ramanan},
98 | year={2023},
99 | eprint={2303.15390},
100 | archivePrefix={arXiv},
101 | primaryClass={cs.CV}
102 | }
103 | ```
104 |
--------------------------------------------------------------------------------
/accuracy-latency.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tchittesh/lzu/361afb2360011a3b540fdbdd53d8be9eda70ac78/accuracy-latency.png
--------------------------------------------------------------------------------
/ckpt/README.md:
--------------------------------------------------------------------------------
1 | Download our pretrained checkpoints from [Google Drive](https://drive.google.com/file/d/1nofuqZ7YSKblIDAltbxp1pFOiQtUzp8B/view?usp=sharing) and place them in this directory, using symbolic links if you wish.
2 |
--------------------------------------------------------------------------------
/configs/fcos3d_0.25.py:
--------------------------------------------------------------------------------
1 | _base_ = [
2 | '../mmdetection3d_v1.0.0rc6/configs/_base_/datasets/nus-mono3d.py',
3 | '../mmdetection3d_v1.0.0rc6/configs/_base_/models/fcos3d.py',
4 | '../mmdetection3d_v1.0.0rc6/configs/_base_/schedules/mmdet_schedule_1x.py',
5 | '../mmdetection3d_v1.0.0rc6/configs/_base_/default_runtime.py'
6 | ]
7 | scale_factor = 0.25
8 | img_scale = (int(scale_factor * 1600), int(scale_factor * 900))
9 |
10 | # model settings
11 | model = dict(
12 | backbone=dict(
13 | depth=50,
14 | dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
15 | stage_with_dcn=(False, False, True, True),
16 | init_cfg=dict(checkpoint='torchvision://resnet50')),
17 | bbox_head=dict(
18 | type='FCOSMono3DHeadNoRescale'))
19 |
20 | class_names = [
21 | 'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle',
22 | 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier'
23 | ]
24 | img_norm_cfg = dict(
25 | mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
26 | train_pipeline = [
27 | dict(type='LoadImageFromFileMono3D'),
28 | dict(
29 | type='LoadAnnotations3D',
30 | with_bbox=True,
31 | with_label=True,
32 | with_attr_label=True,
33 | with_bbox_3d=True,
34 | with_label_3d=True,
35 | with_bbox_depth=True),
36 | dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
37 | dict(type='Resize3D', img_scale=img_scale, keep_ratio=True),
38 | dict(type='Normalize', **img_norm_cfg),
39 | dict(type='Pad', size_divisor=32),
40 | dict(type='DefaultFormatBundle3D', class_names=class_names),
41 | dict(
42 | type='Collect3D',
43 | keys=[
44 | 'img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d',
45 | 'gt_labels_3d', 'centers2d', 'depths'
46 | ]),
47 | ]
48 | test_pipeline = [
49 | dict(type='LoadImageFromFileMono3D'),
50 | dict(
51 | type='MultiScaleFlipAug',
52 | scale_factor=scale_factor,
53 | flip=False,
54 | transforms=[
55 | dict(type='RandomFlip3D'),
56 | dict(type='Resize3D', keep_ratio=True),
57 | dict(type='Normalize', **img_norm_cfg),
58 | dict(type='Pad', size_divisor=32),
59 | dict(
60 | type='DefaultFormatBundle3D',
61 | class_names=class_names,
62 | with_label=False),
63 | dict(type='Collect3D', keys=['img']),
64 | ])
65 | ]
66 | data = dict(
67 | samples_per_gpu=8,
68 | workers_per_gpu=4,
69 | train=dict(pipeline=train_pipeline),
70 | val=dict(pipeline=test_pipeline),
71 | test=dict(pipeline=test_pipeline))
72 | # optimizer
73 | optimizer = dict(
74 | lr=0.001, paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.))
75 | optimizer_config = dict(
76 | _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
77 | # learning policy
78 | lr_config = dict(
79 | policy='step',
80 | warmup='linear',
81 | warmup_iters=500,
82 | warmup_ratio=1.0 / 3,
83 | step=[8, 11])
84 | total_epochs = 12
85 | evaluation = dict(interval=1)
86 |
--------------------------------------------------------------------------------
/configs/fcos3d_0.50.py:
--------------------------------------------------------------------------------
1 | _base_ = [
2 | '../mmdetection3d_v1.0.0rc6/configs/_base_/datasets/nus-mono3d.py',
3 | '../mmdetection3d_v1.0.0rc6/configs/_base_/models/fcos3d.py',
4 | '../mmdetection3d_v1.0.0rc6/configs/_base_/schedules/mmdet_schedule_1x.py',
5 | '../mmdetection3d_v1.0.0rc6/configs/_base_/default_runtime.py'
6 | ]
7 | scale_factor = 0.50
8 | img_scale = (int(scale_factor * 1600), int(scale_factor * 900))
9 |
10 | # model settings
11 | model = dict(
12 | backbone=dict(
13 | depth=50,
14 | dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
15 | stage_with_dcn=(False, False, True, True),
16 | init_cfg=dict(checkpoint='torchvision://resnet50')),
17 | bbox_head=dict(
18 | type='FCOSMono3DHeadNoRescale'))
19 |
20 | class_names = [
21 | 'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle',
22 | 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier'
23 | ]
24 | img_norm_cfg = dict(
25 | mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
26 | train_pipeline = [
27 | dict(type='LoadImageFromFileMono3D'),
28 | dict(
29 | type='LoadAnnotations3D',
30 | with_bbox=True,
31 | with_label=True,
32 | with_attr_label=True,
33 | with_bbox_3d=True,
34 | with_label_3d=True,
35 | with_bbox_depth=True),
36 | dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
37 | dict(type='Resize3D', img_scale=img_scale, keep_ratio=True),
38 | dict(type='Normalize', **img_norm_cfg),
39 | dict(type='Pad', size_divisor=32),
40 | dict(type='DefaultFormatBundle3D', class_names=class_names),
41 | dict(
42 | type='Collect3D',
43 | keys=[
44 | 'img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d',
45 | 'gt_labels_3d', 'centers2d', 'depths'
46 | ]),
47 | ]
48 | test_pipeline = [
49 | dict(type='LoadImageFromFileMono3D'),
50 | dict(
51 | type='MultiScaleFlipAug',
52 | scale_factor=scale_factor,
53 | flip=False,
54 | transforms=[
55 | dict(type='RandomFlip3D'),
56 | dict(type='Resize3D', keep_ratio=True),
57 | dict(type='Normalize', **img_norm_cfg),
58 | dict(type='Pad', size_divisor=32),
59 | dict(
60 | type='DefaultFormatBundle3D',
61 | class_names=class_names,
62 | with_label=False),
63 | dict(type='Collect3D', keys=['img']),
64 | ])
65 | ]
66 | data = dict(
67 | samples_per_gpu=8,
68 | workers_per_gpu=4,
69 | train=dict(pipeline=train_pipeline),
70 | val=dict(pipeline=test_pipeline),
71 | test=dict(pipeline=test_pipeline))
72 | # optimizer
73 | optimizer = dict(
74 | lr=0.001, paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.))
75 | optimizer_config = dict(
76 | _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
77 | # learning policy
78 | lr_config = dict(
79 | policy='step',
80 | warmup='linear',
81 | warmup_iters=500,
82 | warmup_ratio=1.0 / 3,
83 | step=[8, 11])
84 | total_epochs = 12
85 | evaluation = dict(interval=1)
86 |
--------------------------------------------------------------------------------
/configs/fcos3d_0.75.py:
--------------------------------------------------------------------------------
1 | _base_ = [
2 | '../mmdetection3d_v1.0.0rc6/configs/_base_/datasets/nus-mono3d.py',
3 | '../mmdetection3d_v1.0.0rc6/configs/_base_/models/fcos3d.py',
4 | '../mmdetection3d_v1.0.0rc6/configs/_base_/schedules/mmdet_schedule_1x.py',
5 | '../mmdetection3d_v1.0.0rc6/configs/_base_/default_runtime.py'
6 | ]
7 | scale_factor = 0.75
8 | img_scale = (int(scale_factor * 1600), int(scale_factor * 900))
9 |
10 | # model settings
11 | model = dict(
12 | backbone=dict(
13 | depth=50,
14 | dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
15 | stage_with_dcn=(False, False, True, True),
16 | init_cfg=dict(checkpoint='torchvision://resnet50')),
17 | bbox_head=dict(
18 | type='FCOSMono3DHeadNoRescale'))
19 |
20 | class_names = [
21 | 'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle',
22 | 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier'
23 | ]
24 | img_norm_cfg = dict(
25 | mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
26 | train_pipeline = [
27 | dict(type='LoadImageFromFileMono3D'),
28 | dict(
29 | type='LoadAnnotations3D',
30 | with_bbox=True,
31 | with_label=True,
32 | with_attr_label=True,
33 | with_bbox_3d=True,
34 | with_label_3d=True,
35 | with_bbox_depth=True),
36 | dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
37 | dict(type='Resize3D', img_scale=img_scale, keep_ratio=True),
38 | dict(type='Normalize', **img_norm_cfg),
39 | dict(type='Pad', size_divisor=32),
40 | dict(type='DefaultFormatBundle3D', class_names=class_names),
41 | dict(
42 | type='Collect3D',
43 | keys=[
44 | 'img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d',
45 | 'gt_labels_3d', 'centers2d', 'depths'
46 | ]),
47 | ]
48 | test_pipeline = [
49 | dict(type='LoadImageFromFileMono3D'),
50 | dict(
51 | type='MultiScaleFlipAug',
52 | scale_factor=scale_factor,
53 | flip=False,
54 | transforms=[
55 | dict(type='RandomFlip3D'),
56 | dict(type='Resize3D', keep_ratio=True),
57 | dict(type='Normalize', **img_norm_cfg),
58 | dict(type='Pad', size_divisor=32),
59 | dict(
60 | type='DefaultFormatBundle3D',
61 | class_names=class_names,
62 | with_label=False),
63 | dict(type='Collect3D', keys=['img']),
64 | ])
65 | ]
66 | data = dict(
67 | samples_per_gpu=8,
68 | workers_per_gpu=4,
69 | train=dict(pipeline=train_pipeline),
70 | val=dict(pipeline=test_pipeline),
71 | test=dict(pipeline=test_pipeline))
72 | # optimizer
73 | optimizer = dict(
74 | lr=0.001, paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.))
75 | optimizer_config = dict(
76 | _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
77 | # learning policy
78 | lr_config = dict(
79 | policy='step',
80 | warmup='linear',
81 | warmup_iters=500,
82 | warmup_ratio=1.0 / 3,
83 | step=[8, 11])
84 | total_epochs = 12
85 | evaluation = dict(interval=1)
86 |
--------------------------------------------------------------------------------
/configs/fcos3d_1.00.py:
--------------------------------------------------------------------------------
1 | _base_ = [
2 | '../mmdetection3d_v1.0.0rc6/configs/_base_/datasets/nus-mono3d.py',
3 | '../mmdetection3d_v1.0.0rc6/configs/_base_/models/fcos3d.py',
4 | '../mmdetection3d_v1.0.0rc6/configs/_base_/schedules/mmdet_schedule_1x.py',
5 | '../mmdetection3d_v1.0.0rc6/configs/_base_/default_runtime.py'
6 | ]
7 | scale_factor = 1.00
8 | img_scale = (int(scale_factor * 1600), int(scale_factor * 900))
9 |
10 | # model settings
11 | model = dict(
12 | backbone=dict(
13 | depth=50,
14 | dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
15 | stage_with_dcn=(False, False, True, True),
16 | init_cfg=dict(checkpoint='torchvision://resnet50')),
17 | bbox_head=dict(
18 | type='FCOSMono3DHeadNoRescale'))
19 |
20 | class_names = [
21 | 'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle',
22 | 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier'
23 | ]
24 | img_norm_cfg = dict(
25 | mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
26 | train_pipeline = [
27 | dict(type='LoadImageFromFileMono3D'),
28 | dict(
29 | type='LoadAnnotations3D',
30 | with_bbox=True,
31 | with_label=True,
32 | with_attr_label=True,
33 | with_bbox_3d=True,
34 | with_label_3d=True,
35 | with_bbox_depth=True),
36 | dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
37 | dict(type='Resize3D', img_scale=img_scale, keep_ratio=True),
38 | dict(type='Normalize', **img_norm_cfg),
39 | dict(type='Pad', size_divisor=32),
40 | dict(type='DefaultFormatBundle3D', class_names=class_names),
41 | dict(
42 | type='Collect3D',
43 | keys=[
44 | 'img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d',
45 | 'gt_labels_3d', 'centers2d', 'depths'
46 | ]),
47 | ]
48 | test_pipeline = [
49 | dict(type='LoadImageFromFileMono3D'),
50 | dict(
51 | type='MultiScaleFlipAug',
52 | scale_factor=scale_factor,
53 | flip=False,
54 | transforms=[
55 | dict(type='RandomFlip3D'),
56 | dict(type='Resize3D', keep_ratio=True),
57 | dict(type='Normalize', **img_norm_cfg),
58 | dict(type='Pad', size_divisor=32),
59 | dict(
60 | type='DefaultFormatBundle3D',
61 | class_names=class_names,
62 | with_label=False),
63 | dict(type='Collect3D', keys=['img']),
64 | ])
65 | ]
66 | data = dict(
67 | samples_per_gpu=8,
68 | workers_per_gpu=4,
69 | train=dict(pipeline=train_pipeline),
70 | val=dict(pipeline=test_pipeline),
71 | test=dict(pipeline=test_pipeline))
72 | # optimizer
73 | optimizer = dict(
74 | lr=0.001, paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.))
75 | optimizer_config = dict(
76 | _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
77 | # learning policy
78 | lr_config = dict(
79 | policy='step',
80 | warmup='linear',
81 | warmup_iters=500,
82 | warmup_ratio=1.0 / 3,
83 | step=[8, 11])
84 | total_epochs = 12
85 | evaluation = dict(interval=1)
86 |
--------------------------------------------------------------------------------
/configs/lzu_fcos3d_0.25.py:
--------------------------------------------------------------------------------
1 | _base_ = [
2 | '../mmdetection3d_v1.0.0rc6/configs/_base_/datasets/nus-mono3d.py',
3 | '../mmdetection3d_v1.0.0rc6/configs/_base_/models/fcos3d.py',
4 | '../mmdetection3d_v1.0.0rc6/configs/_base_/schedules/mmdet_schedule_1x.py',
5 | '../mmdetection3d_v1.0.0rc6/configs/_base_/default_runtime.py'
6 | ]
7 | scale_factor = 0.25
8 | img_scale = (int(scale_factor * 1600), int(scale_factor * 900))
9 |
10 | # model settings
11 | model = dict(
12 | type="LZUFCOSMono3D",
13 | backbone=dict(
14 | depth=50,
15 | dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
16 | stage_with_dcn=(False, False, True, True),
17 | init_cfg=dict(checkpoint='torchvision://resnet50')),
18 | grid_generator=dict(
19 | type='FixedGrid',
20 | saliency_file='saliency.pkl',
21 | output_shape=(img_scale[1], img_scale[0]),
22 | grid_shape=(27, 48),
23 | separable=True,
24 | attraction_fwhm=10,
25 | anti_crop=True),
26 | bbox_head=dict(
27 | type='FCOSMono3DHeadNoRescale'))
28 |
29 | class_names = [
30 | 'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle',
31 | 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier'
32 | ]
33 | img_norm_cfg = dict(
34 | mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
35 | train_pipeline = [
36 | dict(type='LoadImageFromFileMono3D'),
37 | dict(
38 | type='LoadAnnotations3D',
39 | with_bbox=True,
40 | with_label=True,
41 | with_attr_label=True,
42 | with_bbox_3d=True,
43 | with_label_3d=True,
44 | with_bbox_depth=True),
45 | dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
46 | dict(type='Resize3D', img_scale=(1600, 900), keep_ratio=True),
47 | dict(type='Normalize', **img_norm_cfg),
48 | dict(type='Pad', size_divisor=32),
49 | dict(type='DefaultFormatBundle3D', class_names=class_names),
50 | dict(
51 | type='Collect3D',
52 | keys=[
53 | 'img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d',
54 | 'gt_labels_3d', 'centers2d', 'depths'
55 | ]),
56 | ]
57 | test_pipeline = [
58 | dict(type='LoadImageFromFileMono3D'),
59 | dict(
60 | type='MultiScaleFlipAug',
61 | scale_factor=1.0,
62 | flip=False,
63 | transforms=[
64 | dict(type='RandomFlip3D'),
65 | dict(type='Resize3D', keep_ratio=True),
66 | dict(type='Normalize', **img_norm_cfg),
67 | dict(type='Pad', size_divisor=32),
68 | dict(
69 | type='DefaultFormatBundle3D',
70 | class_names=class_names,
71 | with_label=False),
72 | dict(type='Collect3D', keys=['img']),
73 | ])
74 | ]
75 | data = dict(
76 | samples_per_gpu=8,
77 | workers_per_gpu=4,
78 | train=dict(pipeline=train_pipeline),
79 | val=dict(pipeline=test_pipeline),
80 | test=dict(pipeline=test_pipeline))
81 | # optimizer
82 | optimizer = dict(
83 | lr=0.001, paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.))
84 | optimizer_config = dict(
85 | _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
86 | # learning policy
87 | lr_config = dict(
88 | policy='step',
89 | warmup='linear',
90 | warmup_iters=500,
91 | warmup_ratio=1.0 / 3,
92 | step=[8, 11])
93 | total_epochs = 12
94 | evaluation = dict(interval=1)
95 |
--------------------------------------------------------------------------------
/configs/lzu_fcos3d_0.50.py:
--------------------------------------------------------------------------------
1 | _base_ = [
2 | '../mmdetection3d_v1.0.0rc6/configs/_base_/datasets/nus-mono3d.py',
3 | '../mmdetection3d_v1.0.0rc6/configs/_base_/models/fcos3d.py',
4 | '../mmdetection3d_v1.0.0rc6/configs/_base_/schedules/mmdet_schedule_1x.py',
5 | '../mmdetection3d_v1.0.0rc6/configs/_base_/default_runtime.py'
6 | ]
7 | scale_factor = 0.50
8 | img_scale = (int(scale_factor * 1600), int(scale_factor * 900))
9 |
10 | # model settings
11 | model = dict(
12 | type="LZUFCOSMono3D",
13 | backbone=dict(
14 | depth=50,
15 | dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
16 | stage_with_dcn=(False, False, True, True),
17 | init_cfg=dict(checkpoint='torchvision://resnet50')),
18 | grid_generator=dict(
19 | type='FixedGrid',
20 | saliency_file='saliency.pkl',
21 | output_shape=(img_scale[1], img_scale[0]),
22 | grid_shape=(27, 48),
23 | separable=True,
24 | attraction_fwhm=10,
25 | anti_crop=True),
26 | bbox_head=dict(
27 | type='FCOSMono3DHeadNoRescale'))
28 |
29 | class_names = [
30 | 'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle',
31 | 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier'
32 | ]
33 | img_norm_cfg = dict(
34 | mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
35 | train_pipeline = [
36 | dict(type='LoadImageFromFileMono3D'),
37 | dict(
38 | type='LoadAnnotations3D',
39 | with_bbox=True,
40 | with_label=True,
41 | with_attr_label=True,
42 | with_bbox_3d=True,
43 | with_label_3d=True,
44 | with_bbox_depth=True),
45 | dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
46 | dict(type='Resize3D', img_scale=(1600, 900), keep_ratio=True),
47 | dict(type='Normalize', **img_norm_cfg),
48 | dict(type='Pad', size_divisor=32),
49 | dict(type='DefaultFormatBundle3D', class_names=class_names),
50 | dict(
51 | type='Collect3D',
52 | keys=[
53 | 'img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d',
54 | 'gt_labels_3d', 'centers2d', 'depths'
55 | ]),
56 | ]
57 | test_pipeline = [
58 | dict(type='LoadImageFromFileMono3D'),
59 | dict(
60 | type='MultiScaleFlipAug',
61 | scale_factor=1.0,
62 | flip=False,
63 | transforms=[
64 | dict(type='RandomFlip3D'),
65 | dict(type='Resize3D', keep_ratio=True),
66 | dict(type='Normalize', **img_norm_cfg),
67 | dict(type='Pad', size_divisor=32),
68 | dict(
69 | type='DefaultFormatBundle3D',
70 | class_names=class_names,
71 | with_label=False),
72 | dict(type='Collect3D', keys=['img']),
73 | ])
74 | ]
75 | data = dict(
76 | samples_per_gpu=8,
77 | workers_per_gpu=4,
78 | train=dict(pipeline=train_pipeline),
79 | val=dict(pipeline=test_pipeline),
80 | test=dict(pipeline=test_pipeline))
81 | # optimizer
82 | optimizer = dict(
83 | lr=0.001, paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.))
84 | optimizer_config = dict(
85 | _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
86 | # learning policy
87 | lr_config = dict(
88 | policy='step',
89 | warmup='linear',
90 | warmup_iters=500,
91 | warmup_ratio=1.0 / 3,
92 | step=[8, 11])
93 | total_epochs = 12
94 | evaluation = dict(interval=1)
95 |
--------------------------------------------------------------------------------
/configs/lzu_fcos3d_0.75.py:
--------------------------------------------------------------------------------
1 | _base_ = [
2 | '../mmdetection3d_v1.0.0rc6/configs/_base_/datasets/nus-mono3d.py',
3 | '../mmdetection3d_v1.0.0rc6/configs/_base_/models/fcos3d.py',
4 | '../mmdetection3d_v1.0.0rc6/configs/_base_/schedules/mmdet_schedule_1x.py',
5 | '../mmdetection3d_v1.0.0rc6/configs/_base_/default_runtime.py'
6 | ]
7 | scale_factor = 0.75
8 | img_scale = (int(scale_factor * 1600), int(scale_factor * 900))
9 |
10 | # model settings
11 | model = dict(
12 | type="LZUFCOSMono3D",
13 | backbone=dict(
14 | depth=50,
15 | dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
16 | stage_with_dcn=(False, False, True, True),
17 | init_cfg=dict(checkpoint='torchvision://resnet50')),
18 | grid_generator=dict(
19 | type='FixedGrid',
20 | saliency_file='saliency.pkl',
21 | output_shape=(img_scale[1], img_scale[0]),
22 | grid_shape=(27, 48),
23 | separable=True,
24 | attraction_fwhm=10,
25 | anti_crop=True),
26 | bbox_head=dict(
27 | type='FCOSMono3DHeadNoRescale'))
28 |
29 | class_names = [
30 | 'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle',
31 | 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier'
32 | ]
33 | img_norm_cfg = dict(
34 | mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
35 | train_pipeline = [
36 | dict(type='LoadImageFromFileMono3D'),
37 | dict(
38 | type='LoadAnnotations3D',
39 | with_bbox=True,
40 | with_label=True,
41 | with_attr_label=True,
42 | with_bbox_3d=True,
43 | with_label_3d=True,
44 | with_bbox_depth=True),
45 | dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
46 | dict(type='Resize3D', img_scale=(1600, 900), keep_ratio=True),
47 | dict(type='Normalize', **img_norm_cfg),
48 | dict(type='Pad', size_divisor=32),
49 | dict(type='DefaultFormatBundle3D', class_names=class_names),
50 | dict(
51 | type='Collect3D',
52 | keys=[
53 | 'img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d',
54 | 'gt_labels_3d', 'centers2d', 'depths'
55 | ]),
56 | ]
57 | test_pipeline = [
58 | dict(type='LoadImageFromFileMono3D'),
59 | dict(
60 | type='MultiScaleFlipAug',
61 | scale_factor=1.0,
62 | flip=False,
63 | transforms=[
64 | dict(type='RandomFlip3D'),
65 | dict(type='Resize3D', keep_ratio=True),
66 | dict(type='Normalize', **img_norm_cfg),
67 | dict(type='Pad', size_divisor=32),
68 | dict(
69 | type='DefaultFormatBundle3D',
70 | class_names=class_names,
71 | with_label=False),
72 | dict(type='Collect3D', keys=['img']),
73 | ])
74 | ]
75 | data = dict(
76 | samples_per_gpu=8,
77 | workers_per_gpu=4,
78 | train=dict(pipeline=train_pipeline),
79 | val=dict(pipeline=test_pipeline),
80 | test=dict(pipeline=test_pipeline))
81 | # optimizer
82 | optimizer = dict(
83 | lr=0.001, paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.))
84 | optimizer_config = dict(
85 | _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
86 | # learning policy
87 | lr_config = dict(
88 | policy='step',
89 | warmup='linear',
90 | warmup_iters=500,
91 | warmup_ratio=1.0 / 3,
92 | step=[8, 11])
93 | total_epochs = 12
94 | evaluation = dict(interval=1)
95 |
--------------------------------------------------------------------------------
/configs/lzu_fcos3d_1.00.py:
--------------------------------------------------------------------------------
1 | _base_ = [
2 | '../mmdetection3d_v1.0.0rc6/configs/_base_/datasets/nus-mono3d.py',
3 | '../mmdetection3d_v1.0.0rc6/configs/_base_/models/fcos3d.py',
4 | '../mmdetection3d_v1.0.0rc6/configs/_base_/schedules/mmdet_schedule_1x.py',
5 | '../mmdetection3d_v1.0.0rc6/configs/_base_/default_runtime.py'
6 | ]
7 | scale_factor = 1.00
8 | img_scale = (int(scale_factor * 1600), int(scale_factor * 900))
9 |
10 | # model settings
11 | model = dict(
12 | type="LZUFCOSMono3D",
13 | backbone=dict(
14 | depth=50,
15 | dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
16 | stage_with_dcn=(False, False, True, True),
17 | init_cfg=dict(checkpoint='torchvision://resnet50')),
18 | grid_generator=dict(
19 | type='FixedGrid',
20 | saliency_file='saliency.pkl',
21 | output_shape=(img_scale[1], img_scale[0]),
22 | grid_shape=(27, 48),
23 | separable=True,
24 | attraction_fwhm=10,
25 | anti_crop=True),
26 | bbox_head=dict(
27 | type='FCOSMono3DHeadNoRescale'))
28 |
29 | class_names = [
30 | 'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle',
31 | 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier'
32 | ]
33 | img_norm_cfg = dict(
34 | mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
35 | train_pipeline = [
36 | dict(type='LoadImageFromFileMono3D'),
37 | dict(
38 | type='LoadAnnotations3D',
39 | with_bbox=True,
40 | with_label=True,
41 | with_attr_label=True,
42 | with_bbox_3d=True,
43 | with_label_3d=True,
44 | with_bbox_depth=True),
45 | dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
46 | dict(type='Resize3D', img_scale=(1600, 900), keep_ratio=True),
47 | dict(type='Normalize', **img_norm_cfg),
48 | dict(type='Pad', size_divisor=32),
49 | dict(type='DefaultFormatBundle3D', class_names=class_names),
50 | dict(
51 | type='Collect3D',
52 | keys=[
53 | 'img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d',
54 | 'gt_labels_3d', 'centers2d', 'depths'
55 | ]),
56 | ]
57 | test_pipeline = [
58 | dict(type='LoadImageFromFileMono3D'),
59 | dict(
60 | type='MultiScaleFlipAug',
61 | scale_factor=1.0,
62 | flip=False,
63 | transforms=[
64 | dict(type='RandomFlip3D'),
65 | dict(type='Resize3D', keep_ratio=True),
66 | dict(type='Normalize', **img_norm_cfg),
67 | dict(type='Pad', size_divisor=32),
68 | dict(
69 | type='DefaultFormatBundle3D',
70 | class_names=class_names,
71 | with_label=False),
72 | dict(type='Collect3D', keys=['img']),
73 | ])
74 | ]
75 | data = dict(
76 | samples_per_gpu=8,
77 | workers_per_gpu=4,
78 | train=dict(pipeline=train_pipeline),
79 | val=dict(pipeline=test_pipeline),
80 | test=dict(pipeline=test_pipeline))
81 | # optimizer
82 | optimizer = dict(
83 | lr=0.001, paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.))
84 | optimizer_config = dict(
85 | _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
86 | # learning policy
87 | lr_config = dict(
88 | policy='step',
89 | warmup='linear',
90 | warmup_iters=500,
91 | warmup_ratio=1.0 / 3,
92 | step=[8, 11])
93 | total_epochs = 12
94 | evaluation = dict(interval=1)
95 |
--------------------------------------------------------------------------------
/data/README.md:
--------------------------------------------------------------------------------
1 | Download and set up the [nuScenes](https://www.nuscenes.org/nuscenes#download) dataset according to [data_preparation.md](https://github.com/open-mmlab/mmdetection3d/blob/47285b3f1e9dba358e98fcd12e523cfd0769c876/docs/en/data_preparation.md). Use symbolic links if necessary. Make sure to place the dataset in a `nuscenes/` subdirectory.
2 |
--------------------------------------------------------------------------------
/demo.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tchittesh/lzu/361afb2360011a3b540fdbdd53d8be9eda70ac78/demo.gif
--------------------------------------------------------------------------------
/environment.yml:
--------------------------------------------------------------------------------
1 | name: lzu
2 | channels:
3 | - pytorch
4 | - nvidia
5 | - defaults
6 | dependencies:
7 | - _libgcc_mutex=0.1=main
8 | - _openmp_mutex=5.1=1_gnu
9 | - blas=1.0=mkl
10 | - brotlipy=0.7.0=py38h27cfd23_1003
11 | - bzip2=1.0.8=h7b6447c_0
12 | - ca-certificates=2023.01.10=h06a4308_0
13 | - certifi=2022.12.7=py38h06a4308_0
14 | - cffi=1.15.1=py38h5eee18b_3
15 | - charset-normalizer=2.0.4=pyhd3eb1b0_0
16 | - cryptography=39.0.1=py38h9ce1e76_0
17 | - cuda=11.6.1=0
18 | - cuda-cccl=11.6.55=hf6102b2_0
19 | - cuda-command-line-tools=11.6.2=0
20 | - cuda-compiler=11.6.2=0
21 | - cuda-cudart=11.6.55=he381448_0
22 | - cuda-cudart-dev=11.6.55=h42ad0f4_0
23 | - cuda-cuobjdump=11.6.124=h2eeebcb_0
24 | - cuda-cupti=11.6.124=h86345e5_0
25 | - cuda-cuxxfilt=11.6.124=hecbf4f6_0
26 | - cuda-driver-dev=11.6.55=0
27 | - cuda-gdb=12.1.55=0
28 | - cuda-libraries=11.6.1=0
29 | - cuda-libraries-dev=11.6.1=0
30 | - cuda-memcheck=11.8.86=0
31 | - cuda-nsight=12.1.55=0
32 | - cuda-nsight-compute=12.1.0=0
33 | - cuda-nvcc=11.6.124=hbba6d2d_0
34 | - cuda-nvdisasm=12.1.55=0
35 | - cuda-nvml-dev=11.6.55=haa9ef22_0
36 | - cuda-nvprof=12.1.55=0
37 | - cuda-nvprune=11.6.124=he22ec0a_0
38 | - cuda-nvrtc=11.6.124=h020bade_0
39 | - cuda-nvrtc-dev=11.6.124=h249d397_0
40 | - cuda-nvtx=11.6.124=h0630a44_0
41 | - cuda-nvvp=12.1.55=0
42 | - cuda-runtime=11.6.1=0
43 | - cuda-samples=11.6.101=h8efea70_0
44 | - cuda-sanitizer-api=12.1.55=0
45 | - cuda-toolkit=11.6.1=0
46 | - cuda-tools=11.6.1=0
47 | - cuda-visual-tools=11.6.1=0
48 | - ffmpeg=4.3=hf484d3e_0
49 | - flit-core=3.6.0=pyhd3eb1b0_0
50 | - freetype=2.12.1=h4a9f257_0
51 | - gds-tools=1.6.0.25=0
52 | - giflib=5.2.1=h5eee18b_3
53 | - gmp=6.2.1=h295c915_3
54 | - gnutls=3.6.15=he1e5248_0
55 | - idna=3.4=py38h06a4308_0
56 | - intel-openmp=2021.4.0=h06a4308_3561
57 | - jpeg=9e=h5eee18b_1
58 | - lame=3.100=h7b6447c_0
59 | - lcms2=2.12=h3be6417_0
60 | - ld_impl_linux-64=2.38=h1181459_1
61 | - lerc=3.0=h295c915_0
62 | - libcublas=11.9.2.110=h5e84587_0
63 | - libcublas-dev=11.9.2.110=h5c901ab_0
64 | - libcufft=10.7.1.112=hf425ae0_0
65 | - libcufft-dev=10.7.1.112=ha5ce4c0_0
66 | - libcufile=1.6.0.25=0
67 | - libcufile-dev=1.6.0.25=0
68 | - libcurand=10.3.2.56=0
69 | - libcurand-dev=10.3.2.56=0
70 | - libcusolver=11.3.4.124=h33c3c4e_0
71 | - libcusparse=11.7.2.124=h7538f96_0
72 | - libcusparse-dev=11.7.2.124=hbbe9722_0
73 | - libdeflate=1.17=h5eee18b_0
74 | - libffi=3.4.2=h6a678d5_6
75 | - libgcc-ng=11.2.0=h1234567_1
76 | - libgomp=11.2.0=h1234567_1
77 | - libiconv=1.16=h7f8727e_2
78 | - libidn2=2.3.2=h7f8727e_0
79 | - libnpp=11.6.3.124=hd2722f0_0
80 | - libnpp-dev=11.6.3.124=h3c42840_0
81 | - libnvjpeg=11.6.2.124=hd473ad6_0
82 | - libnvjpeg-dev=11.6.2.124=hb5906b9_0
83 | - libpng=1.6.39=h5eee18b_0
84 | - libstdcxx-ng=11.2.0=h1234567_1
85 | - libtasn1=4.16.0=h27cfd23_0
86 | - libtiff=4.5.0=h6a678d5_2
87 | - libunistring=0.9.10=h27cfd23_0
88 | - libwebp=1.2.4=h11a3e52_1
89 | - libwebp-base=1.2.4=h5eee18b_1
90 | - lz4-c=1.9.4=h6a678d5_0
91 | - mkl=2021.4.0=h06a4308_640
92 | - mkl-service=2.4.0=py38h7f8727e_0
93 | - mkl_fft=1.3.1=py38hd3c417c_0
94 | - mkl_random=1.2.2=py38h51133e4_0
95 | - ncurses=6.4=h6a678d5_0
96 | - nettle=3.7.3=hbbd107a_1
97 | - nsight-compute=2023.1.0.15=0
98 | - numpy=1.23.5=py38h14f4228_0
99 | - numpy-base=1.23.5=py38h31eccc5_0
100 | - openh264=2.1.1=h4ff587b_0
101 | - openssl=1.1.1t=h7f8727e_0
102 | - pillow=9.4.0=py38h6a678d5_0
103 | - pip=23.0.1=py38h06a4308_0
104 | - pycparser=2.21=pyhd3eb1b0_0
105 | - pyopenssl=23.0.0=py38h06a4308_0
106 | - pysocks=1.7.1=py38h06a4308_0
107 | - python=3.8.16=h7a1cb2a_3
108 | - pytorch=1.13.1=py3.8_cuda11.6_cudnn8.3.2_0
109 | - pytorch-cuda=11.6=h867d48c_1
110 | - pytorch-mutex=1.0=cuda
111 | - readline=8.2=h5eee18b_0
112 | - requests=2.28.1=py38h06a4308_1
113 | - setuptools=65.6.3=py38h06a4308_0
114 | - six=1.16.0=pyhd3eb1b0_1
115 | - sqlite=3.41.1=h5eee18b_0
116 | - tk=8.6.12=h1ccaba5_0
117 | - torchvision=0.14.1=py38_cu116
118 | - typing_extensions=4.4.0=py38h06a4308_0
119 | - urllib3=1.26.14=py38h06a4308_0
120 | - wheel=0.38.4=py38h06a4308_0
121 | - xz=5.2.10=h5eee18b_1
122 | - zlib=1.2.13=h5eee18b_0
123 | - zstd=1.5.2=ha4553b6_0
124 | - pip:
125 | - absl-py==1.4.0
126 | - addict==2.4.0
127 | - anyio==3.6.2
128 | - argon2-cffi==21.3.0
129 | - argon2-cffi-bindings==21.2.0
130 | - arrow==1.2.3
131 | - asttokens==2.2.1
132 | - attrs==22.2.0
133 | - backcall==0.2.0
134 | - beautifulsoup4==4.12.0
135 | - black==23.1.0
136 | - bleach==6.0.0
137 | - cachetools==5.3.0
138 | - click==8.1.3
139 | - comm==0.1.2
140 | - contourpy==1.0.7
141 | - cycler==0.11.0
142 | - debugpy==1.6.6
143 | - decorator==5.1.1
144 | - defusedxml==0.7.1
145 | - descartes==1.1.0
146 | - exceptiongroup==1.1.1
147 | - executing==1.2.0
148 | - fastjsonschema==2.16.3
149 | - fire==0.5.0
150 | - flake8==6.0.0
151 | - fonttools==4.39.2
152 | - fqdn==1.5.1
153 | - google-auth==2.16.2
154 | - google-auth-oauthlib==0.4.6
155 | - grpcio==1.51.3
156 | - imageio==2.26.1
157 | - importlib-metadata==6.1.0
158 | - importlib-resources==5.12.0
159 | - iniconfig==2.0.0
160 | - ipykernel==6.22.0
161 | - ipython==8.11.0
162 | - ipython-genutils==0.2.0
163 | - ipywidgets==8.0.4
164 | - isoduration==20.11.0
165 | - jedi==0.18.2
166 | - jinja2==3.1.2
167 | - joblib==1.2.0
168 | - jsonpointer==2.3
169 | - jsonschema==4.17.3
170 | - jupyter==1.0.0
171 | - jupyter-client==8.1.0
172 | - jupyter-console==6.6.3
173 | - jupyter-core==5.3.0
174 | - jupyter-events==0.6.3
175 | - jupyter-server==2.5.0
176 | - jupyter-server-terminals==0.4.4
177 | - jupyterlab-pygments==0.2.2
178 | - jupyterlab-widgets==3.0.5
179 | - kiwisolver==1.4.4
180 | - llvmlite==0.36.0
181 | - lyft-dataset-sdk==0.0.8
182 | - markdown==3.4.1
183 | - markupsafe==2.1.2
184 | - matplotlib==3.5.2
185 | - matplotlib-inline==0.1.6
186 | - mccabe==0.7.0
187 | - mistune==2.0.5
188 | - mmcls==0.25.0
189 | - mmcv-full==1.7.0
190 | - mmdet==2.28.2
191 | - mmsegmentation==0.30.0
192 | - mypy-extensions==1.0.0
193 | - nbclassic==0.5.3
194 | - nbclient==0.7.2
195 | - nbconvert==7.2.10
196 | - nbformat==5.8.0
197 | - nest-asyncio==1.5.6
198 | - networkx==2.2
199 | - notebook==6.5.3
200 | - notebook-shim==0.2.2
201 | - numba==0.53.0
202 | - nuscenes-devkit==1.1.10
203 | - oauthlib==3.2.2
204 | - opencv-python==4.7.0.72
205 | - packaging==23.0
206 | - pandas==1.5.3
207 | - pandocfilters==1.5.0
208 | - parso==0.8.3
209 | - pathspec==0.11.1
210 | - pexpect==4.8.0
211 | - pickleshare==0.7.5
212 | - pkgutil-resolve-name==1.3.10
213 | - platformdirs==3.1.1
214 | - plotly==5.13.1
215 | - pluggy==1.0.0
216 | - plyfile==0.8.1
217 | - prettytable==3.6.0
218 | - prometheus-client==0.16.0
219 | - prompt-toolkit==3.0.38
220 | - protobuf==4.22.1
221 | - psutil==5.9.4
222 | - ptyprocess==0.7.0
223 | - pure-eval==0.2.2
224 | - pyasn1==0.4.8
225 | - pyasn1-modules==0.2.8
226 | - pycocotools==2.0.6
227 | - pycodestyle==2.10.0
228 | - pyflakes==3.0.1
229 | - pygments==2.14.0
230 | - pyparsing==3.0.9
231 | - pyquaternion==0.9.9
232 | - pyrsistent==0.19.3
233 | - pytest==7.2.2
234 | - python-dateutil==2.8.2
235 | - python-json-logger==2.0.7
236 | - pytz==2022.7.1
237 | - pywavelets==1.4.1
238 | - pyyaml==6.0
239 | - pyzmq==25.0.2
240 | - qtconsole==5.4.1
241 | - qtpy==2.3.0
242 | - requests-oauthlib==1.3.1
243 | - rfc3339-validator==0.1.4
244 | - rfc3986-validator==0.1.1
245 | - rsa==4.9
246 | - scikit-image==0.19.3
247 | - scikit-learn==1.2.2
248 | - scipy==1.10.1
249 | - send2trash==1.8.0
250 | - shapely==1.8.5
251 | - sniffio==1.3.0
252 | - soupsieve==2.4
253 | - stack-data==0.6.2
254 | - tenacity==8.2.2
255 | - tensorboard==2.12.0
256 | - tensorboard-data-server==0.7.0
257 | - tensorboard-plugin-wit==1.8.1
258 | - termcolor==2.2.0
259 | - terminado==0.17.1
260 | - terminaltables==3.1.10
261 | - threadpoolctl==3.1.0
262 | - tifffile==2023.3.15
263 | - tinycss2==1.2.1
264 | - tomli==2.0.1
265 | - tornado==6.2
266 | - tqdm==4.65.0
267 | - traitlets==5.9.0
268 | - trimesh==2.35.39
269 | - uri-template==1.2.0
270 | - wcwidth==0.2.6
271 | - webcolors==1.12
272 | - webencodings==0.5.1
273 | - websocket-client==1.5.1
274 | - werkzeug==2.2.3
275 | - widgetsnbextension==4.0.5
276 | - yapf==0.32.0
277 | - zipp==3.15.0
278 | prefix: /home/cthavama/miniconda3/envs/lzu
279 |
--------------------------------------------------------------------------------
/lzu/__init__.py:
--------------------------------------------------------------------------------
1 | from .fcos_mono3d_head_norescale import * # noqa: F401,F403
2 | from .lzu_fcos_mono3d import * # noqa: F401,F403
3 | from .transforms_3d import * # noqa: F401,F403
4 |
--------------------------------------------------------------------------------
/lzu/fcos_mono3d_head_norescale.py:
--------------------------------------------------------------------------------
1 | from mmdet.models.builder import HEADS
2 | from mmdet3d.models.dense_heads import FCOSMono3DHead
3 |
4 |
5 | @HEADS.register_module()
6 | class FCOSMono3DHeadNoRescale(FCOSMono3DHead):
7 | """Same as original head, except ignores rescaling at end."""
8 |
9 | def _get_bboxes_single(self,
10 | cls_scores,
11 | bbox_preds,
12 | dir_cls_preds,
13 | attr_preds,
14 | centernesses,
15 | mlvl_points,
16 | input_meta,
17 | cfg,
18 | rescale=False):
19 | return super()._get_bboxes_single(cls_scores, bbox_preds,
20 | dir_cls_preds, attr_preds,
21 | centernesses, mlvl_points,
22 | input_meta, cfg, rescale=False)
23 |
--------------------------------------------------------------------------------
/lzu/fixed_grid.py:
--------------------------------------------------------------------------------
1 | import pickle
2 |
3 | import numpy as np
4 | import torch
5 | import torch.nn as nn
6 | import torch.nn.functional as F
7 | from mmdet3d.models.builder import MODELS
8 |
9 |
10 | GRID_GENERATORS = MODELS
11 |
12 |
13 | def build_grid_generator(cfg):
14 |     """Build grid generator."""
15 | return GRID_GENERATORS.build(cfg)
16 |
17 |
18 | def make1DGaussian(size, fwhm=3, center=None):
19 | """ Make a 1D gaussian kernel.
20 |
21 | size is the length of the kernel,
22 | fwhm is full-width-half-maximum, which
23 | can be thought of as an effective radius.
24 | """
25 |     x = np.arange(0, size, 1, dtype=float)
26 |
27 | if center is None:
28 | center = size // 2
29 |
30 | return np.exp(-4*np.log(2) * (x-center)**2 / fwhm**2)
31 |
32 |
33 | def make2DGaussian(size, fwhm=3, center=None):
34 | """ Make a square gaussian kernel.
35 |
36 | size is the length of a side of the square
37 | fwhm is full-width-half-maximum, which
38 | can be thought of as an effective radius.
39 | """
40 |
41 | x = np.arange(0, size, 1, float)
42 | y = x[:, np.newaxis]
43 |
44 | if center is None:
45 | x0 = y0 = size // 2
46 | else:
47 | x0 = center[0]
48 | y0 = center[1]
49 |
50 | return np.exp(-4*np.log(2) * ((x-x0)**2 + (y-y0)**2) / fwhm**2)
51 |
52 |
53 | class RecasensSaliencyToGridMixin(object):
54 | """Grid generator based on 'Learning to Zoom: a Saliency-Based Sampling \
55 | Layer for Neural Networks' [https://arxiv.org/pdf/1809.03355.pdf]."""
56 |
57 | def __init__(self, output_shape, grid_shape=(31, 51), separable=True,
58 | attraction_fwhm=13, anti_crop=True, **kwargs):
59 | super(RecasensSaliencyToGridMixin, self).__init__()
60 | self.output_shape = output_shape
61 | self.output_height, self.output_width = output_shape
62 | self.grid_shape = grid_shape
63 | self.padding_size = min(self.grid_shape)-1
64 | self.total_shape = tuple(
65 | dim+2*self.padding_size
66 | for dim in self.grid_shape
67 | )
68 | self.padding_mode = 'reflect' if anti_crop else 'replicate'
69 | self.separable = separable
70 |
71 | if self.separable:
72 | self.filter = make1DGaussian(
73 | 2*self.padding_size+1, fwhm=attraction_fwhm)
74 | self.filter = torch.FloatTensor(self.filter).unsqueeze(0) \
75 | .unsqueeze(0).cuda()
76 |
77 | self.P_basis_x = torch.zeros(self.total_shape[1])
78 | for i in range(self.total_shape[1]):
79 | self.P_basis_x[i] = \
80 | (i-self.padding_size)/(self.grid_shape[1]-1.0)
81 | self.P_basis_y = torch.zeros(self.total_shape[0])
82 | for i in range(self.total_shape[0]):
83 | self.P_basis_y[i] = \
84 | (i-self.padding_size)/(self.grid_shape[0]-1.0)
85 | else:
86 | self.filter = make2DGaussian(
87 | 2*self.padding_size+1, fwhm=attraction_fwhm)
88 | self.filter = torch.FloatTensor(self.filter) \
89 | .unsqueeze(0).unsqueeze(0).cuda()
90 |
91 | self.P_basis = torch.zeros(2, *self.total_shape)
92 | for k in range(2):
93 | for i in range(self.total_shape[0]):
94 | for j in range(self.total_shape[1]):
95 | self.P_basis[k, i, j] = k*(i-self.padding_size)/(self.grid_shape[0]-1.0)+(1.0-k)*(j-self.padding_size)/(self.grid_shape[1]-1.0) # noqa: E501
96 |
97 | def separable_saliency_to_grid(self, imgs, x_saliency,
98 | y_saliency, device):
99 | assert self.separable
100 | x_saliency = F.pad(x_saliency, (self.padding_size, self.padding_size),
101 | mode=self.padding_mode)
102 | y_saliency = F.pad(y_saliency, (self.padding_size, self.padding_size),
103 | mode=self.padding_mode)
104 |
105 | N = imgs.shape[0]
106 | P_x = torch.zeros(1, 1, self.total_shape[1], device=device)
107 | P_x[0, 0, :] = self.P_basis_x
108 | P_x = P_x.expand(N, 1, self.total_shape[1])
109 | P_y = torch.zeros(1, 1, self.total_shape[0], device=device)
110 | P_y[0, 0, :] = self.P_basis_y
111 | P_y = P_y.expand(N, 1, self.total_shape[0])
112 |
113 | weights = F.conv1d(x_saliency, self.filter)
114 | weighted_offsets = torch.mul(P_x, x_saliency)
115 | weighted_offsets = F.conv1d(weighted_offsets, self.filter)
116 | xgrid = weighted_offsets/weights
117 | xgrid = torch.clamp(xgrid*2-1, min=-1, max=1)
118 | xgrid = xgrid.view(-1, 1, 1, self.grid_shape[1])
119 | xgrid = xgrid.expand(-1, 1, *self.grid_shape)
120 |
121 | weights = F.conv1d(y_saliency, self.filter)
122 | weighted_offsets = F.conv1d(torch.mul(P_y, y_saliency), self.filter)
123 | ygrid = weighted_offsets/weights
124 | ygrid = torch.clamp(ygrid*2-1, min=-1, max=1)
125 | ygrid = ygrid.view(-1, 1, self.grid_shape[0], 1)
126 | ygrid = ygrid.expand(-1, 1, *self.grid_shape)
127 |
128 | grid = torch.cat((xgrid, ygrid), 1)
129 | upsampled_grid = F.interpolate(grid, size=self.output_shape,
130 | mode='bilinear', align_corners=True)
131 | return upsampled_grid.permute(0, 2, 3, 1), grid.permute(0, 2, 3, 1)
132 |
133 | def nonseparable_saliency_to_grid(self, imgs, saliency, device):
134 | assert not self.separable
135 | p = self.padding_size
136 | saliency = F.pad(saliency, (p, p, p, p), mode=self.padding_mode)
137 |
138 | N = imgs.shape[0]
139 | P = torch.zeros(1, 2, *self.total_shape, device=device)
140 | P[0, :, :, :] = self.P_basis
141 | P = P.expand(N, 2, *self.total_shape)
142 |
143 | saliency_cat = torch.cat((saliency, saliency), 1)
144 | weights = F.conv2d(saliency, self.filter)
145 | weighted_offsets = torch.mul(P, saliency_cat) \
146 | .view(-1, 1, *self.total_shape)
147 | weighted_offsets = F.conv2d(weighted_offsets, self.filter) \
148 | .view(-1, 2, *self.grid_shape)
149 |
150 | weighted_offsets_x = weighted_offsets[:, 0, :, :] \
151 | .contiguous().view(-1, 1, *self.grid_shape)
152 | xgrid = weighted_offsets_x/weights
153 | xgrid = torch.clamp(xgrid*2-1, min=-1, max=1)
154 | xgrid = xgrid.view(-1, 1, *self.grid_shape)
155 |
156 | weighted_offsets_y = weighted_offsets[:, 1, :, :] \
157 | .contiguous().view(-1, 1, *self.grid_shape)
158 | ygrid = weighted_offsets_y/weights
159 | ygrid = torch.clamp(ygrid*2-1, min=-1, max=1)
160 | ygrid = ygrid.view(-1, 1, *self.grid_shape)
161 |
162 | grid = torch.cat((xgrid, ygrid), 1)
163 | upsampled_grid = F.interpolate(grid, size=self.output_shape,
164 | mode='bilinear', align_corners=True)
165 | return upsampled_grid.permute(0, 2, 3, 1), grid.permute(0, 2, 3, 1)
166 |
167 |
168 | @GRID_GENERATORS.register_module()
169 | class FixedGrid(nn.Module, RecasensSaliencyToGridMixin):
170 | """Grid generator that uses a fixed saliency map -- KDE SD"""
171 |
172 | def __init__(self, saliency_file, **kwargs):
173 | super(FixedGrid, self).__init__()
174 | RecasensSaliencyToGridMixin.__init__(self, **kwargs)
175 | self.saliency = pickle.load(open(saliency_file, 'rb')).cuda()
176 |
177 | if self.separable:
178 | x_saliency = self.saliency.sum(dim=2)
179 | y_saliency = self.saliency.sum(dim=3)
180 | self.upsampled_grid, self.grid = self.separable_saliency_to_grid(
181 | torch.zeros(1), x_saliency, y_saliency, torch.device('cuda'))
182 | else:
183 | self.upsampled_grid, self.grid = (
184 | self.nonseparable_saliency_to_grid(
185 | torch.zeros(1), self.saliency, torch.device('cuda'))
186 | )
187 |
188 | def forward(self, imgs, img_metas, **kwargs):
189 | B = imgs.shape[0]
190 | upsampled_grid = self.upsampled_grid.expand(B, -1, -1, -1)
191 | grid = self.grid.expand(B, -1, -1, -1)
192 |
193 | # Uncomment to visualize saliency map
194 | # h, w, _ = img_metas[0]['pad_shape']
195 | # show_saliency = F.interpolate(self.saliency, size=(h, w),
196 | # mode='bilinear', align_corners=True)
197 | # show_saliency = 255*(show_saliency/show_saliency.max())
198 | # show_saliency = show_saliency.expand(
199 | # show_saliency.size(0), 3, h, w)
200 | # vis_batched_imgs(vis_options['saliency'], show_saliency,
201 | # img_metas, denorm=False)
202 | # vis_batched_imgs(vis_options['saliency']+'_no_box', show_saliency,
203 | # img_metas, bboxes=None, denorm=False)
204 |
205 | return upsampled_grid, grid
206 |
--------------------------------------------------------------------------------
/lzu/invert_grid.py:
--------------------------------------------------------------------------------
1 | from math import floor, ceil
2 | from typing import List
3 |
4 | import torch
5 |
6 |
7 | def invert_grid(grid, input_shape, separable=False):
8 | f = invert_separable_grid if separable else invert_nonseparable_grid
9 | return f(grid, list(input_shape))
10 |
11 |
12 | @torch.jit.script
13 | def invert_separable_grid(grid, input_shape: List[int]):
14 | grid = grid.clone()
15 | device = grid.device
16 | H: int = input_shape[2]
17 | W: int = input_shape[3]
18 | B, grid_H, grid_W, _ = grid.shape
19 | assert B == input_shape[0]
20 |
21 | eps = 1e-8
22 | grid[:, :, :, 0] = (grid[:, :, :, 0] + 1) / 2 * (W - 1)
23 | grid[:, :, :, 1] = (grid[:, :, :, 1] + 1) / 2 * (H - 1)
24 | # grid now ranges from 0 to ([H or W] - 1)
25 | # TODO: implement batch operations
26 | inverse_grid = 2 * max(H, W) * torch.ones(
27 | [B, H, W, 2], dtype=torch.float32, device=device)
28 | for b in range(B):
29 | # each of these is ((grid_H - 1)*(grid_W - 1)) x 2
30 | p00 = grid[b, :-1, :-1, :].contiguous().view(-1, 2) # noqa: 203
31 | p10 = grid[b, 1: , :-1, :].contiguous().view(-1, 2) # noqa: 203
32 | p01 = grid[b, :-1, 1: , :].contiguous().view(-1, 2) # noqa: 203
33 |
34 | ref = torch.floor(p00).to(torch.int)
35 | v00 = p00 - ref
36 | v10 = p10 - ref
37 | v01 = p01 - ref
38 | vx = p01[:, 0] - p00[:, 0]
39 | vy = p10[:, 1] - p00[:, 1]
40 |
41 | min_x = int(floor(v00[:, 0].min() - eps))
42 | max_x = int(ceil(v01[:, 0].max() + eps))
43 | min_y = int(floor(v00[:, 1].min() - eps))
44 | max_y = int(ceil(v10[:, 1].max() + eps))
45 |
46 | pts = torch.cartesian_prod(
47 | torch.arange(min_x, max_x + 1, device=device),
48 | torch.arange(min_y, max_y + 1, device=device),
49 | ).T # 2 x (x_range*y_range)
50 |
51 | unwarped_x = (pts[0].unsqueeze(0) - v00[:, 0].unsqueeze(1)) / vx.unsqueeze(1) # noqa: E501
52 | unwarped_y = (pts[1].unsqueeze(0) - v00[:, 1].unsqueeze(1)) / vy.unsqueeze(1) # noqa: E501
53 | unwarped_pts = torch.stack((unwarped_y, unwarped_x), dim=0) # noqa: E501, has shape2 x ((grid_H - 1)*(grid_W - 1)) x (x_range*y_range)
54 |
55 | good_indices = torch.logical_and(
56 | torch.logical_and(-eps <= unwarped_pts[0],
57 | unwarped_pts[0] <= 1+eps),
58 | torch.logical_and(-eps <= unwarped_pts[1],
59 | unwarped_pts[1] <= 1+eps),
60 | ) # ((grid_H - 1)*(grid_W - 1)) x (x_range*y_range)
61 | nonzero_good_indices = good_indices.nonzero()
62 | inverse_j = pts[0, nonzero_good_indices[:, 1]] + ref[nonzero_good_indices[:, 0], 0] # noqa: E501
63 | inverse_i = pts[1, nonzero_good_indices[:, 1]] + ref[nonzero_good_indices[:, 0], 1] # noqa: E501
64 | # TODO: is replacing this with reshape operations on good_indices faster? # noqa: E501
65 | j = nonzero_good_indices[:, 0] % (grid_W - 1)
66 | i = nonzero_good_indices[:, 0] // (grid_W - 1)
67 | grid_mappings = torch.stack(
68 | (j + unwarped_pts[1, good_indices], i + unwarped_pts[0, good_indices]), # noqa: E501
69 | dim=1
70 | )
71 | in_bounds = torch.logical_and(
72 | torch.logical_and(0 <= inverse_i, inverse_i < H),
73 | torch.logical_and(0 <= inverse_j, inverse_j < W),
74 | )
75 | inverse_grid[b, inverse_i[in_bounds], inverse_j[in_bounds], :] = grid_mappings[in_bounds, :] # noqa: E501
76 |
77 | inverse_grid[..., 0] = (inverse_grid[..., 0]) / (grid_W - 1) * 2.0 - 1.0 # noqa: E501
78 | inverse_grid[..., 1] = (inverse_grid[..., 1]) / (grid_H - 1) * 2.0 - 1.0 # noqa: E501
79 | return inverse_grid
80 |
81 |
82 | def invert_nonseparable_grid(grid, input_shape):
83 | grid = grid.clone()
84 | device = grid.device
85 | _, _, H, W = input_shape
86 | B, grid_H, grid_W, _ = grid.shape
87 | assert B == input_shape[0]
88 |
89 | eps = 1e-8
90 | grid[:, :, :, 0] = (grid[:, :, :, 0] + 1) / 2 * (W - 1)
91 | grid[:, :, :, 1] = (grid[:, :, :, 1] + 1) / 2 * (H - 1)
92 | # grid now ranges from 0 to ([H or W] - 1)
93 | # TODO: implement batch operations
94 | inverse_grid = 2 * max(H, W) * torch.ones(
95 | (B, H, W, 2), dtype=torch.float32, device=device)
96 | for b in range(B):
97 | # each of these is ((grid_H - 1)*(grid_W - 1)) x 2
98 | p00 = grid[b, :-1, :-1, :].contiguous().view(-1, 2) # noqa: 203
99 | p10 = grid[b, 1: , :-1, :].contiguous().view(-1, 2) # noqa: 203
100 | p01 = grid[b, :-1, 1: , :].contiguous().view(-1, 2) # noqa: 203
101 | p11 = grid[b, 1: , 1: , :].contiguous().view(-1, 2) # noqa: 203
102 |
103 | ref = torch.floor(p00).type(torch.int)
104 | v00 = p00 - ref
105 | v10 = p10 - ref
106 | v01 = p01 - ref
107 | v11 = p11 - ref
108 |
109 | min_x = int(floor(min(v00[:, 0].min(), v10[:, 0].min()) - eps))
110 | max_x = int(ceil(max(v01[:, 0].max(), v11[:, 0].max()) + eps))
111 | min_y = int(floor(min(v00[:, 1].min(), v01[:, 1].min()) - eps))
112 | max_y = int(ceil(max(v10[:, 1].max(), v11[:, 1].max()) + eps))
113 |
114 | pts = torch.cartesian_prod(
115 | torch.arange(min_x, max_x + 1, device=device),
116 | torch.arange(min_y, max_y + 1, device=device),
117 | ).T
118 |
119 | # each of these is ((grid_H - 1)*(grid_W - 1)) x 2
120 | vb = v10 - v00
121 | vc = v01 - v00
122 | vd = v00 - v10 - v01 + v11
123 |
124 | vx = pts.permute(1, 0).unsqueeze(0) # 1 x (x_range*y_range) x 2
125 | Ma = v00.unsqueeze(1) - vx # noqa: E501, ((grid_H - 1)*(grid_W - 1)) x (x_range*y_range) x 2
126 |
127 | vc_cross_vd = (vc[:, 0] * vd[:, 1] - vc[:, 1] * vd[:, 0]).unsqueeze(1) # noqa: E501, ((grid_H - 1)*(grid_W - 1)) x 1
128 | vc_cross_vb = (vc[:, 0] * vb[:, 1] - vc[:, 1] * vb[:, 0]).unsqueeze(1) # noqa: E501, ((grid_H - 1)*(grid_W - 1)) x 1
129 | Ma_cross_vd = (Ma[:, :, 0] * vd[:, 1].unsqueeze(1) - Ma[:, :, 1] * vd[:, 0].unsqueeze(1)) # noqa: E501, ((grid_H - 1)*(grid_W - 1)) x (x_range*y_range)
130 | Ma_cross_vb = (Ma[:, :, 0] * vb[:, 1].unsqueeze(1) - Ma[:, :, 1] * vb[:, 0].unsqueeze(1)) # noqa: E501, ((grid_H - 1)*(grid_W - 1)) x (x_range*y_range)
131 |
132 | qf_a = vc_cross_vd.expand(*Ma_cross_vd.shape)
133 | qf_b = vc_cross_vb + Ma_cross_vd
134 | qf_c = Ma_cross_vb
135 |
136 | mu_neg = -1 * torch.ones_like(Ma_cross_vd)
137 | mu_pos = -1 * torch.ones_like(Ma_cross_vd)
138 | mu_linear = -1 * torch.ones_like(Ma_cross_vd)
139 |
140 | nzie = (qf_a.abs() > 1e-10).expand(*Ma_cross_vd.shape)
141 |
142 | disc = (qf_b[nzie]**2 - 4 * qf_a[nzie] * qf_c[nzie]) ** 0.5
143 | mu_pos[nzie] = (-qf_b[nzie] + disc) / (2 * qf_a[nzie])
144 | mu_neg[nzie] = (-qf_b[nzie] - disc) / (2 * qf_a[nzie])
145 | mu_linear[~nzie] = -qf_c[~nzie] / qf_b[~nzie]  # root of qf_b * mu + qf_c = 0  # noqa: E501
146 |
147 | mu_pos_valid = torch.logical_and(mu_pos >= 0, mu_pos <= 1)
148 | mu_neg_valid = torch.logical_and(mu_neg >= 0, mu_neg <= 1)
149 | mu_linear_valid = torch.logical_and(mu_linear >= 0, mu_linear <= 1)
150 |
151 | mu = -1 * torch.ones_like(Ma_cross_vd)
152 | mu[mu_pos_valid] = mu_pos[mu_pos_valid]
153 | mu[mu_neg_valid] = mu_neg[mu_neg_valid]
154 | mu[mu_linear_valid] = mu_linear[mu_linear_valid]
155 |
156 | lmbda = -1 * (Ma[:, :, 1] + mu * vc[:, 1:2]) / (vb[:, 1:2] + vd[:, 1:2] * mu) # noqa: E501
157 |
158 | unwarped_pts = torch.stack((lmbda, mu), dim=0)
159 |
160 | good_indices = torch.logical_and(
161 | torch.logical_and(-eps <= unwarped_pts[0],
162 | unwarped_pts[0] <= 1+eps),
163 | torch.logical_and(-eps <= unwarped_pts[1],
164 | unwarped_pts[1] <= 1+eps),
165 | ) # ((grid_H - 1)*(grid_W - 1)) x (x_range*y_range)
166 | nonzero_good_indices = good_indices.nonzero()
167 | inverse_j = pts[0, nonzero_good_indices[:, 1]] + ref[nonzero_good_indices[:, 0], 0] # noqa: E501
168 | inverse_i = pts[1, nonzero_good_indices[:, 1]] + ref[nonzero_good_indices[:, 0], 1] # noqa: E501
169 | # TODO: is replacing this with reshape operations on good_indices faster? # noqa: E501
170 | j = nonzero_good_indices[:, 0] % (grid_W - 1)
171 | i = nonzero_good_indices[:, 0] // (grid_W - 1)
172 | grid_mappings = torch.stack(
173 | (j + unwarped_pts[1, good_indices], i + unwarped_pts[0, good_indices]), # noqa: E501
174 | dim=1
175 | )
176 | in_bounds = torch.logical_and(
177 | torch.logical_and(0 <= inverse_i, inverse_i < H),
178 | torch.logical_and(0 <= inverse_j, inverse_j < W),
179 | )
180 | inverse_grid[b, inverse_i[in_bounds], inverse_j[in_bounds], :] = grid_mappings[in_bounds, :] # noqa: E501
181 |
182 | inverse_grid[..., 0] = (inverse_grid[..., 0]) / (grid_W - 1) * 2.0 - 1.0 # noqa: E501
183 | inverse_grid[..., 1] = (inverse_grid[..., 1]) / (grid_H - 1) * 2.0 - 1.0 # noqa: E501
184 | return inverse_grid
185 |
--------------------------------------------------------------------------------
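`invert_nonseparable_grid` above inverts the warp cell by cell: writing `Ma = v00 - p`, `vb = v10 - v00`, `vc = v01 - v00`, and `vd = v00 - v10 - v01 + v11`, a pixel `p` falls inside a warped cell exactly when the quadratic `(vc x vd) mu^2 + (vc x vb + Ma x vd) mu + (Ma x vb) = 0` (2D cross products) has a root `mu` in `[0, 1]` whose companion `lambda = -(Ma_y + mu * vc_y) / (vb_y + mu * vd_y)` is also in `[0, 1]`; the pair `(lambda, mu)` then becomes the fractional sampling position written into `inverse_grid`. A minimal sanity-check sketch of this inversion (not part of the repo), using an identity grid from `F.affine_grid` in place of a learned saliency grid, so the inverse should itself act as an identity resampling:

```python
# Hypothetical sanity check: inverting an identity warp grid.
import torch
import torch.nn.functional as F

from lzu.invert_grid import invert_grid

B, C, H, W = 1, 3, 32, 48
theta = torch.eye(2, 3).unsqueeze(0)                    # identity affine transform
grid = F.affine_grid(theta, (B, C, 9, 13), align_corners=True)  # coarse warp grid in [-1, 1]

# Invert at the feature-map resolution; values are again normalized to [-1, 1].
inverse_grid = invert_grid(grid, (B, C, H, W), separable=False)

# "Unzooming" features that were never zoomed should leave them (nearly) unchanged.
feats = torch.randn(B, C, H, W)
unwarped = F.grid_sample(feats, inverse_grid, align_corners=True)
print((unwarped - feats).abs().max())                   # should be close to zero
```

With a real LZU grid, `extract_feat` in `lzu/lzu_fcos_mono3d.py` makes the same call once per feature level and caches the result, since the grid is fixed.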
/lzu/lzu_fcos_mono3d.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
2 | import numpy as np
3 | import torch.nn.functional as F
4 | from mmdet.models.builder import DETECTORS
5 |
6 | from mmdet3d.core import bbox3d2result
7 | from mmdet3d.models.detectors.single_stage_mono3d import (
8 | SingleStageMono3DDetector
9 | )
10 |
11 | from .invert_grid import invert_grid
12 | from .fixed_grid import build_grid_generator
13 |
14 |
15 | @DETECTORS.register_module()
16 | class LZUFCOSMono3D(SingleStageMono3DDetector):
17 | """Fixed LZU + FCOSMono3D"""
18 |
19 | def __init__(self,
20 | grid_generator,
21 | backbone,
22 | neck,
23 | bbox_head,
24 | train_cfg=None,
25 | test_cfg=None,
26 | pretrained=None):
27 | super(LZUFCOSMono3D, self).__init__(backbone, neck, bbox_head,
28 | train_cfg, test_cfg, pretrained)
29 | self.grid_generator = build_grid_generator(grid_generator)
30 | self.forward_grids = None # cache forward warp "zoom" grids
31 | self.inverse_grids = None # cache inverse warp "zoom" grids
32 | self.times = []
33 |
34 | def _get_scale_factor(self, ori_shape, new_shape):
35 | ori_height, ori_width, _ = ori_shape
36 | img_height, img_width, _ = new_shape
37 | w_scale = img_width / ori_width
38 | h_scale = img_height / ori_height
39 | assert w_scale == h_scale
40 | return w_scale
41 |
42 | def extract_feat(self, img, img_metas, **kwargs):
43 | """Directly extract features from the backbone+neck."""
44 |
45 | # "zoom" or forward warp input image
46 | if self.forward_grids is None:
47 | upsampled_grid, grid = self.grid_generator(
48 | img, img_metas, **kwargs)
49 | self.forward_grids = upsampled_grid[0:1], grid[0:1]
50 | else:
51 | upsampled_grid, grid = self.forward_grids
52 | B = img.shape[0]
53 | upsampled_grid = upsampled_grid.expand(B, -1, -1, -1)
54 |
55 | # Uncomment and change scale factor to run upsampling experiments
56 | # warped_imgs = F.interpolate(img, scale_factor=0.75)
57 | # warped_imgs = F.grid_sample(
58 | # warped_imgs, upsampled_grid, align_corners=True)
59 | warped_imgs = F.grid_sample(img, upsampled_grid, align_corners=True)
60 |
61 | # Uncomment to visualize "zoomed" images
62 | # from mmcv import imdenormalize
63 | # from PIL import Image
64 | # import os
65 | # show_img = warped_imgs[0].permute(1, 2, 0).cpu().detach().numpy()
66 | # show_img = imdenormalize(
67 | # show_img,
68 | # mean=np.array([103.53, 116.28, 123.675]),
69 | # std=np.array([1.0, 1.0, 1.0]),
70 | # to_bgr=True)
71 | # img_name = os.path.basename(img_metas[0]['filename'])[:-4]
72 | # Image.fromarray(show_img.astype(np.uint8)).save(f'/project_data/ramanan/cthavama/FOVEAv2_exp/3D/lzu_fcos3d_sd/test_FT/vis_warped/{img_name}.png') # noqa: E501
73 | # breakpoint()
74 |
75 | img_height, img_width = upsampled_grid.shape[1:3]
76 | img_shape = (img_height, img_width, 3)
77 | ori_shape = img_metas[0]['ori_shape']
78 | scale_factor = self._get_scale_factor(ori_shape, img_shape)
79 |
80 | # pad warped images; TODO: undo hardcoding size divisor of 32
81 | pad_h = int(np.ceil(img_shape[0] / 32)) * 32 - img_shape[0]
82 | pad_w = int(np.ceil(img_shape[1] / 32)) * 32 - img_shape[1]
83 | warped_imgs = F.pad(
84 | warped_imgs, (0, pad_w, 0, pad_h), mode='constant', value=0)
85 | pad_shape = (warped_imgs.shape[2], warped_imgs.shape[3], 3)
86 |
87 | # update img metas, assuming that all imgs have the same original shape
88 | for i in range(len(img_metas)):
89 | img_metas[i]['img_shape'] = img_shape
90 | img_metas[i]['scale_factor'] = scale_factor
91 | img_metas[i]['pad_shape'] = pad_shape
92 | img_metas[i]['pad_fixed_size'] = None
93 | img_metas[i]['pad_size_divisor'] = 32
94 | # resize ground truth boxes and centers
95 | if 'centers2d' in kwargs:
96 | kwargs['centers2d'][i] *= scale_factor
97 | if 'gt_bboxes' in kwargs:
98 | kwargs['gt_bboxes'][i] *= scale_factor
99 | for j in range(len(img_metas[i]['cam2img'][0])):
100 | img_metas[i]['cam2img'][0][j] *= scale_factor
101 | img_metas[i]['cam2img'][1][j] *= scale_factor
102 |
103 | # Encode
104 | warped_x = self.backbone(warped_imgs)
105 | if self.with_neck:
106 | warped_x = self.neck(warped_x)
107 |
108 | # Unzoom
109 | x = []
110 | # precompute and cache inverses
111 | separable = self.grid_generator.separable
112 | if self.inverse_grids is None:
113 | self.inverse_grids = []
114 | for i in range(len(warped_x)):
115 | input_shape = warped_x[i].shape
116 | inverse_grid = invert_grid(grid, input_shape,
117 | separable=separable)[0:1]
118 | self.inverse_grids.append(inverse_grid)
119 | # perform unzoom
120 | for i in range(len(warped_x)):
121 | B = len(warped_x[i])
122 | inverse_grid = self.inverse_grids[i].expand(B, -1, -1, -1)
123 | unwarped_x = F.grid_sample(
124 | warped_x[i], inverse_grid, mode='bilinear',
125 | align_corners=True, padding_mode='zeros'
126 | )
127 | x.append(unwarped_x)
128 |
129 | return tuple(x)
130 |
131 | def forward_train(self,
132 | img,
133 | img_metas,
134 | gt_bboxes,
135 | gt_labels,
136 | gt_bboxes_3d,
137 | gt_labels_3d,
138 | centers2d,
139 | depths,
140 | attr_labels=None,
141 | gt_bboxes_ignore=None):
142 | x = self.extract_feat(img, img_metas,
143 | gt_bboxes=gt_bboxes, centers2d=centers2d)
144 | losses = self.bbox_head.forward_train(x, img_metas, gt_bboxes,
145 | gt_labels, gt_bboxes_3d,
146 | gt_labels_3d, centers2d, depths,
147 | attr_labels, gt_bboxes_ignore)
148 | return losses
149 |
150 | def simple_test(self, img, img_metas, rescale=False, **kwargs):
151 | """Test function without test time augmentation.
152 |
153 | Args:
154 | img (torch.Tensor): Input images of shape (N, C, H, W).
155 | img_metas (list[dict]): List of image information.
156 | rescale (bool, optional): Whether to rescale the results.
157 | Defaults to False.
158 |
159 | Returns:
160 | list[list[np.ndarray]]: BBox results of each image and classes.
161 | The outer list corresponds to each image. The inner list
162 | corresponds to each class.
163 | """
164 | x = self.extract_feat(img, img_metas, **kwargs)
165 | outs = self.bbox_head(x)
166 | bbox_outputs = self.bbox_head.get_bboxes(
167 | *outs, img_metas, rescale=rescale)
168 |
169 | if self.bbox_head.pred_bbox2d:
170 | from mmdet.core import bbox2result
171 | bbox2d_img = [
172 | bbox2result(bboxes2d, labels, self.bbox_head.num_classes)
173 | for bboxes, scores, labels, attrs, bboxes2d in bbox_outputs
174 | ]
175 | bbox_outputs = [bbox_outputs[0][:-1]]
176 |
177 | bbox_img = [
178 | bbox3d2result(bboxes, scores, labels, attrs)
179 | for bboxes, scores, labels, attrs in bbox_outputs
180 | ]
181 |
182 | bbox_list = [dict() for i in range(len(img_metas))]
183 | for result_dict, img_bbox in zip(bbox_list, bbox_img):
184 | result_dict['img_bbox'] = img_bbox
185 | if self.bbox_head.pred_bbox2d:
186 | for result_dict, img_bbox2d in zip(bbox_list, bbox2d_img):
187 | result_dict['img_bbox2d'] = img_bbox2d
188 | return bbox_list
189 |
190 | def aug_test(self, imgs, img_metas, rescale=False):
191 | raise NotImplementedError
192 |
--------------------------------------------------------------------------------
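The zoom -> encode -> unzoom pattern inside `extract_feat` is not specific to FCOS3D; any model with spatial processing can be wrapped the same way. Below is a stripped-down sketch of that pattern; the helper name `lzu_forward`, the toy backbone, and all sizes are placeholders introduced here (the real detector additionally caches the grids, pads to a size divisor of 32, and rescales 2D boxes and camera intrinsics as shown above).

```python
# Minimal zoom -> encode -> unzoom sketch (hypothetical; mirrors extract_feat above).
import torch
import torch.nn as nn
import torch.nn.functional as F

from lzu.invert_grid import invert_grid


def lzu_forward(img, upsampled_grid, grid, backbone, separable=False):
    """img: (B, 3, H, W); upsampled_grid: (B, H, W, 2); grid: (B, gH, gW, 2)."""
    # Zoom: forward-warp the input image with the saliency-derived grid.
    warped = F.grid_sample(img, upsampled_grid, align_corners=True)
    # Encode: run any spatial model on the warped image.
    warped_feats = backbone(warped)
    # Unzoom: invert the coarse grid at feature resolution and resample.
    inverse_grid = invert_grid(grid, warped_feats.shape, separable=separable)
    return F.grid_sample(warped_feats, inverse_grid, mode='bilinear',
                         align_corners=True, padding_mode='zeros')


# Toy example with identity grids and a stride-1 backbone, so shapes are preserved.
B, H, W = 2, 64, 96
img = torch.randn(B, 3, H, W)
theta = torch.eye(2, 3).unsqueeze(0).repeat(B, 1, 1)
upsampled_grid = F.affine_grid(theta, (B, 3, H, W), align_corners=True)
grid = F.affine_grid(theta, (B, 3, 9, 13), align_corners=True)
backbone = nn.Conv2d(3, 8, 3, padding=1)
print(lzu_forward(img, upsampled_grid, grid, backbone).shape)  # (B, 8, H, W)
```

Because the grids are fixed, the real model performs the inversion only once per feature level and reuses it across batches via `self.forward_grids` / `self.inverse_grids`.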
/lzu/transforms_3d.py:
--------------------------------------------------------------------------------
1 | from mmdet.datasets.builder import PIPELINES
2 | from mmdet.datasets.pipelines import Resize
3 |
4 |
5 | @PIPELINES.register_module()
6 | class Resize3D(Resize):
7 |
8 | def __call__(self, input_dict):
9 | """Call function to resize images, bounding boxes, masks, semantic
10 | segmentation map, *and additionally adjust the camera intrinsics*.
11 |
12 | Args:
13 | input_dict (dict): Result dict from loading pipeline.
14 |
15 | Returns:
16 | dict: Resized results, 'img_shape', 'pad_shape', 'scale_factor', \
17 | 'keep_ratio' keys are added into result dict.
18 | """
19 | input_dict = super().__call__(input_dict)
20 | w_scale, h_scale, _, _ = input_dict['scale_factor']
21 | assert w_scale == h_scale
22 |
23 | input_dict['scale_factor'] = w_scale
24 | if 'centers2d' in input_dict:
25 | input_dict['centers2d'] *= w_scale
26 | for i in range(len(input_dict['cam2img'][0])):
27 | input_dict['cam2img'][0][i] *= w_scale
28 | input_dict['cam2img'][1][i] *= h_scale
29 | return input_dict
30 |
--------------------------------------------------------------------------------
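`Resize3D` adds a single step on top of mmdet's `Resize`: the first two rows of the `cam2img` intrinsic matrix (focal lengths and principal point) are multiplied by the same factor as the image, mirroring what `extract_feat` does when the model runs at reduced resolution. A small worked example of that intrinsics update, with roughly nuScenes-like numbers chosen purely for illustration:

```python
# Illustrative intrinsics rescaling, matching the loop in Resize3D.__call__ above.
import numpy as np

scale = 0.5                                   # image resized to half resolution
cam2img = np.array([[1266.4,    0.0, 816.3, 0.0],
                    [   0.0, 1266.4, 491.5, 0.0],
                    [   0.0,    0.0,   1.0, 0.0]])

cam2img_scaled = cam2img.copy()
cam2img_scaled[0] *= scale                    # fx and cx scale with image width
cam2img_scaled[1] *= scale                    # fy and cy scale with image height

# A 3D point that projected to pixel (u, v) now projects to (scale * u, scale * v),
# keeping projected centers and 2D boxes consistent with the resized image.
print(cam2img_scaled[:2])
```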
/run.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | outDir="/project_data/ramanan/cthavama/LZU_release_tests" # CHANGE THIS
4 | nGPU=2
5 |
6 | case $1 in
7 | "fcos3d_0.25"|"fcos3d_0.50"|"fcos3d_0.75"|"fcos3d_1.00"|"lzu_fcos3d_0.25"|"lzu_fcos3d_0.50"|"lzu_fcos3d_0.75"|"lzu_fcos3d_1.00") expName=$1 ;;
8 | *) echo "Invalid experiment name." && exit;;
9 | esac
10 |
11 | # Test checkpointed model
12 | # set data.test.samples_per_gpu=1 if running timing tests
13 | python tools/test.py \
14 | configs/$expName.py \
15 | ckpt/$expName.pth \
16 | --out $outDir/$expName/test_checkpoint/results.pkl \
17 | --eval bbox \
18 | --cfg-options \
19 | data.test.samples_per_gpu=8 \
20 |
21 | # Train
22 | torchrun \
23 | --nnodes 1 \
24 | --nproc_per_node $nGPU \
25 | --rdzv_backend c10d \
26 | --rdzv_endpoint localhost:0 \
27 | tools/train.py \
28 | configs/$expName.py \
29 | --work-dir $outDir/$expName/work \
30 | --gpus $nGPU \
31 | --launcher pytorch \
32 |
33 | # Test trained model
34 | # set data.test.samples_per_gpu=1 if running timing tests
35 | python tools/test.py \
36 | configs/$expName.py \
37 | $outDir/$expName/work/latest.pth \
38 | --out $outDir/$expName/test/results.pkl \
39 | --eval bbox \
40 | --cfg-options \
41 | data.test.samples_per_gpu=8 \
42 |
--------------------------------------------------------------------------------
/saliency.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tchittesh/lzu/361afb2360011a3b540fdbdd53d8be9eda70ac78/saliency.pkl
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | from setuptools import find_packages, setup
2 |
3 | print(find_packages(exclude=('configs', 'experiments', 'tools')))
4 |
5 | if __name__ == '__main__':
6 | setup(
7 | name='lzu',
8 | packages=find_packages(include=('lzu',)),
9 | )
10 |
--------------------------------------------------------------------------------
/teaser.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/tchittesh/lzu/361afb2360011a3b540fdbdd53d8be9eda70ac78/teaser.gif
--------------------------------------------------------------------------------
/tools/test.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
2 | import argparse
3 | import mmcv
4 | import os
5 | import torch
6 | from mmcv import Config, DictAction
7 | from mmcv.parallel import MMDataParallel
8 | from mmcv.runner import (get_dist_info, load_checkpoint,
9 | wrap_fp16_model)
10 |
11 | from mmdet3d.datasets import build_dataloader, build_dataset
12 | from mmdet3d.models import build_model
13 | from mmdet.apis import set_random_seed
14 | from mmdet.datasets import replace_ImageToTensor
15 |
16 | import lzu # noqa: F401, add custom modules to MMCV registry
17 |
18 |
19 | def parse_args():
20 | parser = argparse.ArgumentParser(
21 | description='MMDet test (and eval) a model')
22 | parser.add_argument('config', help='test config file path')
23 | parser.add_argument('checkpoint', help='checkpoint file')
24 | parser.add_argument('--out', help='output result file in pickle format')
25 | parser.add_argument(
26 | '--eval',
27 | type=str,
28 | nargs='+',
29 | help='evaluation metrics, which depends on the dataset, e.g., "bbox",'
30 | ' "segm", "proposal" for COCO, and "mAP", "recall" for PASCAL VOC')
31 | parser.add_argument('--seed', type=int, default=0, help='random seed')
32 | parser.add_argument(
33 | '--deterministic',
34 | action='store_true',
35 | help='whether to set deterministic options for CUDNN backend.')
36 | parser.add_argument(
37 | '--cfg-options',
38 | nargs='+',
39 | action=DictAction,
40 | help='override some settings in the used config, the key-value pair '
41 | 'in xxx=yyy format will be merged into config file. If the value to '
42 | 'be overwritten is a list, it should be like key="[a,b]" or key=a,b '
43 | 'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
44 | 'Note that the quotation marks are necessary and that no white space '
45 | 'is allowed.')
46 | args = parser.parse_args()
47 | if 'LOCAL_RANK' not in os.environ:
48 | os.environ['LOCAL_RANK'] = "0"
49 |
50 | return args
51 |
52 |
53 | def single_gpu_test(model, data_loader):
54 | """Test model with single gpu.
55 |
56 | Args:
57 | model (nn.Module): Model to be tested.
58 | data_loader (nn.Dataloader): Pytorch data loader.
59 |
60 | Returns:
61 | list[dict]: The prediction results.
62 | """
63 | model.eval()
64 | results = []
65 | dataset = data_loader.dataset
66 | prog_bar = mmcv.ProgressBar(len(dataset))
67 | for data in data_loader:
68 | with torch.no_grad():
69 | result = model(return_loss=False, rescale=False, **data)
70 | results.extend(result)
71 |
72 | batch_size = len(result)
73 | for _ in range(batch_size):
74 | prog_bar.update()
75 | return results
76 |
77 |
78 | def main():
79 | args = parse_args()
80 |
81 | assert args.out or args.eval, \
82 | ('Please specify at least one operation (save/eval the '
83 | 'results) with the argument "--out" or "--eval"')
84 |
85 | if args.out is not None and not args.out.endswith(('.pkl', '.pickle')):
86 | raise ValueError('The output file must be a pkl file.')
87 |
88 | cfg = Config.fromfile(args.config)
89 | if args.cfg_options is not None:
90 | cfg.merge_from_dict(args.cfg_options)
91 | print(cfg.pretty_text)
92 |
93 | # set cudnn_benchmark
94 | if cfg.get('cudnn_benchmark', False):
95 | torch.backends.cudnn.benchmark = True
96 |
97 | cfg.model.pretrained = None
98 | cfg.data.test.test_mode = True
99 | samples_per_gpu = cfg.data.test.pop('samples_per_gpu', 1)
100 | if samples_per_gpu > 1:
101 | # Replace 'ImageToTensor' with 'DefaultFormatBundle'
102 | cfg.data.test.pipeline = replace_ImageToTensor(
103 | cfg.data.test.pipeline)
104 |
105 | # set random seeds
106 | if args.seed is not None:
107 | set_random_seed(args.seed, deterministic=args.deterministic)
108 |
109 | # build the dataloader
110 | dataset = build_dataset(cfg.data.test)
111 | data_loader = build_dataloader(
112 | dataset,
113 | samples_per_gpu=samples_per_gpu,
114 | workers_per_gpu=cfg.data.workers_per_gpu,
115 | dist=False,
116 | shuffle=False)
117 |
118 | # build the model and load checkpoint
119 | cfg.model.train_cfg = None
120 | model = build_model(cfg.model, test_cfg=cfg.get('test_cfg'))
121 | fp16_cfg = cfg.get('fp16', None)
122 | if fp16_cfg is not None:
123 | wrap_fp16_model(model)
124 | checkpoint = load_checkpoint(model, args.checkpoint, map_location='cpu')
125 | model.CLASSES = checkpoint['meta']['CLASSES']
126 | # palette for visualization in segmentation tasks
127 | if 'PALETTE' in checkpoint.get('meta', {}):
128 | model.PALETTE = checkpoint['meta']['PALETTE']
129 | elif hasattr(dataset, 'PALETTE'):
130 | # segmentation dataset has `PALETTE` attribute
131 | model.PALETTE = dataset.PALETTE
132 |
133 | model = MMDataParallel(model, device_ids=[0])
134 | outputs = single_gpu_test(model, data_loader)
135 |
136 | rank, _ = get_dist_info()
137 | if rank == 0:
138 | if args.out:
139 | print(f'\nwriting results to {args.out}')
140 | mmcv.dump(outputs, args.out)
141 | if args.eval:
142 | eval_kwargs = cfg.get('evaluation', {}).copy()
143 | # hard-code way to remove EvalHook args
144 | for key in [
145 | 'interval', 'tmpdir', 'start', 'gpu_collect', 'save_best',
146 | 'rule'
147 | ]:
148 | eval_kwargs.pop(key, None)
149 | eval_kwargs.update(dict(metric=args.eval))
150 | print(dataset.evaluate(outputs, **eval_kwargs))
151 |
152 |
153 | if __name__ == '__main__':
154 | main()
155 |
--------------------------------------------------------------------------------
/tools/train.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
2 | from __future__ import division
3 |
4 | import argparse
5 | import copy
6 | from datetime import timedelta
7 | import mmcv
8 | import os
9 | import time
10 | import torch
11 | import warnings
12 | from mmcv import Config, DictAction
13 | from mmcv.runner import get_dist_info, init_dist
14 | from os import path as osp
15 |
16 | from mmdet import __version__ as mmdet_version
17 | from mmdet3d import __version__ as mmdet3d_version
18 | from mmdet3d.apis import train_model
19 | from mmdet3d.datasets import build_dataset
20 | from mmdet3d.models import build_model
21 | from mmdet3d.utils import collect_env, get_root_logger
22 | from mmdet.apis import set_random_seed
23 | from mmseg import __version__ as mmseg_version
24 |
25 | import lzu # noqa: F401, add custom modules to MMCV registry
26 |
27 |
28 | def parse_args():
29 | parser = argparse.ArgumentParser(description='Train a detector')
30 | parser.add_argument('config', help='train config file path')
31 | parser.add_argument('--work-dir', help='the dir to save logs and models')
32 | parser.add_argument(
33 | '--resume-from', help='the checkpoint file to resume from')
34 | parser.add_argument(
35 | '--no-validate',
36 | action='store_true',
37 | help='whether not to evaluate the checkpoint during training')
38 | group_gpus = parser.add_mutually_exclusive_group()
39 | group_gpus.add_argument(
40 | '--gpus',
41 | type=int,
42 | help='number of gpus to use '
43 | '(only applicable to non-distributed training)')
44 | group_gpus.add_argument(
45 | '--gpu-ids',
46 | type=int,
47 | nargs='+',
48 | help='ids of gpus to use '
49 | '(only applicable to non-distributed training)')
50 | parser.add_argument('--seed', type=int, default=0, help='random seed')
51 | parser.add_argument(
52 | '--deterministic',
53 | action='store_true',
54 | help='whether to set deterministic options for CUDNN backend.')
55 | parser.add_argument(
56 | '--options',
57 | nargs='+',
58 | action=DictAction,
59 | help='override some settings in the used config, the key-value pair '
60 | 'in xxx=yyy format will be merged into config file (deprecate), '
61 | 'change to --cfg-options instead.')
62 | parser.add_argument(
63 | '--cfg-options',
64 | nargs='+',
65 | action=DictAction,
66 | help='override some settings in the used config, the key-value pair '
67 | 'in xxx=yyy format will be merged into config file. If the value to '
68 | 'be overwritten is a list, it should be like key="[a,b]" or key=a,b '
69 | 'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
70 | 'Note that the quotation marks are necessary and that no white space '
71 | 'is allowed.')
72 | parser.add_argument(
73 | '--launcher',
74 | choices=['none', 'pytorch', 'slurm', 'mpi'],
75 | default='none',
76 | help='job launcher')
77 | parser.add_argument('--local_rank', type=int, default=0)
78 | parser.add_argument(
79 | '--autoscale-lr',
80 | action='store_true',
81 | help='automatically scale lr with the number of gpus')
82 | args = parser.parse_args()
83 | if 'LOCAL_RANK' not in os.environ:
84 | os.environ['LOCAL_RANK'] = str(args.local_rank)
85 |
86 | if args.options and args.cfg_options:
87 | raise ValueError(
88 | '--options and --cfg-options cannot be both specified, '
89 | '--options is deprecated in favor of --cfg-options')
90 | if args.options:
91 | warnings.warn('--options is deprecated in favor of --cfg-options')
92 | args.cfg_options = args.options
93 |
94 | return args
95 |
96 |
97 | def main():
98 | args = parse_args()
99 |
100 | cfg = Config.fromfile(args.config)
101 | if args.cfg_options is not None:
102 | cfg.merge_from_dict(args.cfg_options)
103 |
104 | # set cudnn_benchmark
105 | if cfg.get('cudnn_benchmark', False):
106 | torch.backends.cudnn.benchmark = True
107 |
108 | # work_dir is determined in this priority: CLI > segment in file > filename
109 | if args.work_dir is not None:
110 | # update configs according to CLI args if args.work_dir is not None
111 | cfg.work_dir = args.work_dir
112 | elif cfg.get('work_dir', None) is None:
113 | # use config filename as default work_dir if cfg.work_dir is None
114 | cfg.work_dir = osp.join('./work_dirs',
115 | osp.splitext(osp.basename(args.config))[0])
116 | if args.resume_from is not None:
117 | cfg.resume_from = args.resume_from
118 | if args.gpu_ids is not None:
119 | cfg.gpu_ids = args.gpu_ids
120 | else:
121 | cfg.gpu_ids = range(1) if args.gpus is None else range(args.gpus)
122 |
123 | if args.autoscale_lr:
124 | # apply the linear scaling rule (https://arxiv.org/abs/1706.02677)
125 | cfg.optimizer['lr'] = cfg.optimizer['lr'] * len(cfg.gpu_ids) / 8
126 |
127 | # init distributed env first, since logger depends on the dist info.
128 | if args.launcher == 'none':
129 | distributed = False
130 | else:
131 | distributed = True
132 | if args.launcher == 'pytorch':
133 | torch.multiprocessing.set_start_method('fork')
134 | init_dist(args.launcher, timeout=timedelta(hours=4), **cfg.dist_params)
135 | # re-set gpu_ids with distributed training mode
136 | _, world_size = get_dist_info()
137 | cfg.gpu_ids = range(world_size)
138 |
139 | # create work_dir
140 | mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))
141 | # dump config
142 | cfg.dump(osp.join(cfg.work_dir, osp.basename(args.config)))
143 | # init the logger before other steps
144 | timestamp = time.strftime('%Y%m%d_%H%M%S', time.localtime())
145 | log_file = osp.join(cfg.work_dir, f'{timestamp}.log')
146 | # specify logger name; if we still use 'mmdet', the output info will be
147 | # filtered and won't be saved in the log_file
148 | # TODO: ugly workaround to judge whether we are training det or seg model
149 | if cfg.model.type in ['EncoderDecoder3D']:
150 | logger_name = 'mmseg'
151 | else:
152 | logger_name = 'mmdet'
153 | logger = get_root_logger(
154 | log_file=log_file, log_level=cfg.log_level, name=logger_name)
155 |
156 | # init the meta dict to record some important information such as
157 | # environment info and seed, which will be logged
158 | meta = dict()
159 | # log env info
160 | env_info_dict = collect_env()
161 | env_info = '\n'.join([(f'{k}: {v}') for k, v in env_info_dict.items()])
162 | dash_line = '-' * 60 + '\n'
163 | logger.info('Environment info:\n' + dash_line + env_info + '\n' +
164 | dash_line)
165 | meta['env_info'] = env_info
166 | meta['config'] = cfg.pretty_text
167 |
168 | # log some basic info
169 | logger.info(f'Distributed training: {distributed}')
170 | logger.info(f'Config:\n{cfg.pretty_text}')
171 |
172 | # set random seeds
173 | if args.seed is not None:
174 | logger.info(f'Set random seed to {args.seed}, '
175 | f'deterministic: {args.deterministic}')
176 | set_random_seed(args.seed, deterministic=args.deterministic)
177 | cfg.seed = args.seed
178 | meta['seed'] = args.seed
179 | meta['exp_name'] = osp.basename(args.config)
180 |
181 | model = build_model(
182 | cfg.model,
183 | train_cfg=cfg.get('train_cfg'),
184 | test_cfg=cfg.get('test_cfg'))
185 | model.init_weights()
186 |
187 | logger.info(f'Model:\n{model}')
188 | datasets = [build_dataset(cfg.data.train)]
189 | if len(cfg.workflow) == 2:
190 | val_dataset = copy.deepcopy(cfg.data.val)
191 | # in case we use a dataset wrapper
192 | if 'dataset' in cfg.data.train:
193 | val_dataset.pipeline = cfg.data.train.dataset.pipeline
194 | else:
195 | val_dataset.pipeline = cfg.data.train.pipeline
196 | # set test_mode=False here in deep copied config
197 | # which does not affect AP/AR calculation later
198 | # refer to https://mmdetection3d.readthedocs.io/en/latest/tutorials/customize_runtime.html#customize-workflow # noqa
199 | val_dataset.test_mode = False
200 | datasets.append(build_dataset(val_dataset))
201 | if cfg.checkpoint_config is not None:
202 | # save mmdet version, config file content and class names in
203 | # checkpoints as meta data
204 | cfg.checkpoint_config.meta = dict(
205 | mmdet_version=mmdet_version,
206 | mmseg_version=mmseg_version,
207 | mmdet3d_version=mmdet3d_version,
208 | config=cfg.pretty_text,
209 | CLASSES=datasets[0].CLASSES,
210 | PALETTE=datasets[0].PALETTE # for segmentors
211 | if hasattr(datasets[0], 'PALETTE') else None)
212 | # add an attribute for visualization convenience
213 | model.CLASSES = datasets[0].CLASSES
214 | train_model(
215 | model,
216 | datasets,
217 | cfg,
218 | distributed=distributed,
219 | validate=(not args.no_validate),
220 | timestamp=timestamp,
221 | meta=meta)
222 |
223 |
224 | if __name__ == '__main__':
225 | main()
226 |
--------------------------------------------------------------------------------