├── .DS_Store
├── .gitignore
├── README.md
├── assets
│   └── OpenInst.png
├── configs
│   ├── _base_
│   │   ├── datasets
│   │   │   ├── cityscapes_detection.py
│   │   │   ├── cityscapes_instance.py
│   │   │   ├── coco_detection.py
│   │   │   ├── coco_instance.py
│   │   │   ├── coco_instance_semantic.py
│   │   │   ├── coco_panoptic.py
│   │   │   ├── deepfashion.py
│   │   │   ├── lvis_v0.5_instance.py
│   │   │   ├── lvis_v1_instance.py
│   │   │   ├── openimages_detection.py
│   │   │   ├── voc0712.py
│   │   │   └── wider_face.py
│   │   ├── default_runtime.py
│   │   ├── models
│   │   │   ├── cascade_mask_rcnn_r50_fpn.py
│   │   │   ├── cascade_rcnn_r50_fpn.py
│   │   │   ├── fast_rcnn_r50_fpn.py
│   │   │   ├── faster_rcnn_r50_caffe_c4.py
│   │   │   ├── faster_rcnn_r50_caffe_dc5.py
│   │   │   ├── faster_rcnn_r50_fpn.py
│   │   │   ├── mask_rcnn_r50_caffe_c4.py
│   │   │   ├── mask_rcnn_r50_fpn.py
│   │   │   ├── retinanet_r50_fpn.py
│   │   │   ├── rpn_r50_caffe_c4.py
│   │   │   ├── rpn_r50_fpn.py
│   │   │   └── ssd300.py
│   │   └── schedules
│   │       ├── schedule_1x.py
│   │       ├── schedule_20e.py
│   │       └── schedule_2x.py
│   └── openinst
│       ├── coco_to_uvo_ins.py
│       ├── queryinst_r50_1x_coco.py
│       └── queryinst_r50_3x_lsj_coco.py
├── core
│   ├── __init__.py
│   ├── bbox
│   │   ├── __init__.py
│   │   ├── assigners
│   │   │   ├── __init__.py
│   │   │   └── hungarian_oln_assigner.py
│   │   └── match_costs
│   │       ├── __init__.py
│   │       └── objectness_l1_cost.py
│   └── hook
│       ├── __init__.py
│       └── ema.py
├── datasets
│   ├── __init__.py
│   ├── coco.py
│   ├── coco_split_dataset.py
│   ├── cocoeval_wrappers.py
│   ├── objects365_split_dataset.py
│   ├── pipelines
│   │   ├── __init__.py
│   │   └── copypaste.py
│   └── uvo_dataset.py
├── models
│   ├── __init__.py
│   ├── necks
│   │   ├── __init__.py
│   │   └── bifpn.py
│   └── roi_heads
│       ├── __init__.py
│       ├── bbox_heads
│       │   ├── __init__.py
│       │   └── dii_score_head.py
│       ├── mask_heads
│       │   ├── __init__.py
│       │   ├── dynamic_mask_head.py
│       │   ├── maskiou_head.py
│       │   └── mha_maskiou_head.py
│       └── sparse_score_roi_head.py
└── tools
    ├── analysis_tools
    │   ├── analyze_logs.py
    │   ├── analyze_results.py
    │   ├── benchmark.py
    │   ├── coco_error_analysis.py
    │   ├── confusion_matrix.py
    │   ├── eval_metric.py
    │   ├── get_flops.py
    │   ├── optimize_anchors.py
    │   ├── robustness_eval.py
    │   └── test_robustness.py
    ├── dataset_converters
    │   ├── cityscapes.py
    │   ├── images2coco.py
    │   └── pascal_voc.py
    ├── deployment
    │   ├── mmdet2torchserve.py
    │   ├── mmdet_handler.py
    │   ├── onnx2tensorrt.py
    │   ├── pytorch2onnx.py
    │   ├── test.py
    │   └── test_torchserver.py
    ├── dist_test.sh
    ├── dist_train.sh
    ├── misc
    │   ├── browse_dataset.py
    │   ├── download_dataset.py
    │   ├── gen_coco_panoptic_test_info.py
    │   ├── get_image_metas.py
    │   └── print_config.py
    ├── model_converters
    │   ├── detectron2pytorch.py
    │   ├── publish_model.py
    │   ├── regnet2mmdet.py
    │   ├── selfsup2mmdet.py
    │   ├── upgrade_model_version.py
    │   └── upgrade_ssd_version.py
    ├── slurm_test.sh
    ├── slurm_train.sh
    ├── test.py
    └── train.py
/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hustvl/OpenInst/ae5a72fc3ab2f686d6760fff0d7846174d55cad6/.DS_Store
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | #
2 | **/*.pyc
3 | **/__pycache__
4 | work_dirs
5 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # OpenInst
2 | > [**OpenInst: A Simple Query-Based Method for Open-World Instance Segmentation**](https://arxiv.org/abs/2303.15859)
3 | >
4 | > Cheng Wang, Guoli Wang, Qian Zhang, Peng Guo, Wenyu Liu, Xinggang Wang
5 | >
6 | > *[arXiv 2303.15859](https://arxiv.org/abs/2303.15859)*
7 |
8 | ## Abstract
9 | Open-world instance segmentation has recently gained significant popularity due to its importance in many real-world applications, such as autonomous driving, robot perception, and remote sensing. However, previous methods have either produced unsatisfactory results or relied on complex systems and paradigms. We wonder if there is a simple way to obtain state-of-the-art results. Fortunately, we have identified two observations that help us achieve the best of both worlds: 1) query-based methods demonstrate superiority over dense proposal-based methods in open-world instance segmentation, and 2) learning localization cues is sufficient for open-world instance segmentation. Based on these observations, we propose a simple query-based method named OpenInst for open-world instance segmentation. OpenInst leverages advanced query-based methods like QueryInst and focuses on learning localization cues. Notably, OpenInst is an extremely simple and straightforward framework without any auxiliary modules or post-processing, yet achieves state-of-the-art results on multiple benchmarks. Specifically, in the COCO->UVO scenario, OpenInst achieves a mask Average Recall (AR) of 53.3, outperforming the previous best methods by 2.0 AR with a simpler structure. We hope that OpenInst can serve as a solid baseline for future research in this area.
10 |
11 | ![OpenInst](assets/OpenInst.png)
12 |
13 | ## Cross-dataset instance segmentation performance
14 | ### Results on COCO->UVO
15 | | Method | Epoch | ARbox | ARmask | AR0.5 | AR0.75 | ARs | ARm | ARl |
16 | |--------|-------|-------|--------|-------|--------|-----|-----|-----|
17 | | OpenInst | 12 | 59.1 | 48.7 | 72.6 | 51.4 | 26.4 | 44.3 | 60.4 |
18 | | OpenInst | 36 | 63.0 | 53.3 | 76.6 | 56.8 | 31.8 | 49.4 | 64.3 |
19 |
20 |
21 | | Method | Epoch | ARbox | ARmask | AR0.5 | AR0.75 | ARs | ARm | ARl |
22 | |--------|-------|-------|--------|-------|--------|-----|-----|-----|
23 | | OpenInst-void | 12 | 58.4 | 48.5 | 71.3 | 51.3 | 25.1 | 43.7 | 60.8 |
24 | | OpenInst-cls | 12 | 55.7 | 44.9 | 72.1 | 46.7 | 24.0 | 41.8 | 55.2 |
25 | | OpenInst-box | 12 | 59.1 | 48.7 | 72.6 | 51.4 | 26.4 | 44.3 | 60.4 |
26 | | OpenInst-mask | 12 | 58.1 | 47.8 | 71.2 | 50.6 | 24.4 | 43.3 | 60.1 |
27 | | OpenInst-fusion | 12 | 58.6 | 48.1 | 72.0 | 50.8 | 25.6 | 43.9 | 59.8 |
28 |
29 |
30 | ## Training
31 | You need to change the **ann_file** and **img_prefix** in [coco_to_uvo_ins.py](https://github.com/hustvl/OpenInst/blob/main/configs/openinst/coco_to_uvo_ins.py) to your local dataset paths, and then run:
32 | ```
33 | # train on the COCO train set, evaluate on the UVO val set.
34 | sh tools/dist_train.sh configs/openinst/coco_to_uvo_ins.py 8
35 | ```
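The entries to edit follow the standard MMDetection dataset layout used throughout `configs/_base_/datasets/` (e.g. `coco_instance.py`). A minimal sketch of what the overridden paths might look like, using placeholder paths and an assumed UVO annotation filename:
```python
# Sketch only: placeholder paths for configs/openinst/coco_to_uvo_ins.py.
# Adjust to wherever your COCO and UVO data actually live.
data = dict(
    train=dict(
        ann_file='data/coco/annotations/instances_train2017.json',
        img_prefix='data/coco/train2017/'),
    val=dict(
        ann_file='data/uvo/annotations/UVO_frame_val.json',  # assumed filename
        img_prefix='data/uvo/images/'),
    test=dict(
        ann_file='data/uvo/annotations/UVO_frame_val.json',  # assumed filename
        img_prefix='data/uvo/images/'))
```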
36 | ## Testing
37 | ```
38 | sh tools/dist_test.sh configs/openinst/coco_to_uvo_ins.py /path/to/model 8
39 | ```
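For single-GPU evaluation, the standard MMDetection test entry point under `tools/` should also work; a sketch assuming the usual MMDetection 2.x arguments, where `--eval` selects the metrics to report:
```
python tools/test.py configs/openinst/coco_to_uvo_ins.py /path/to/model --eval bbox segm
```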
40 |
41 | ## Citation
42 | If you find OpenInst useful in your research or applications, please consider giving us a star 🌟 and citing it using the following BibTeX entry.
43 | ```bibtex
44 | @article{wang2023openinst,
45 | title={OpenInst: A Simple Query-Based Method for Open-World Instance Segmentation},
46 | author={Cheng Wang and Guoli Wang and Qian Zhang and Peng Guo and Wenyu Liu and Xinggang Wang},
47 | year={2023},
48 | eprint={2303.15859},
49 | archivePrefix={arXiv},
50 | primaryClass={cs.CV}
51 | }
52 | ```
53 |
54 | ## Acknowledgements
55 | A large part of the code is borrowed from [OLN](https://github.com/mcahny/object_localization_network), [QueryInst](https://github.com/hustvl/QueryInst), and [MMDetection](https://github.com/open-mmlab/mmdetection).
56 | Thanks for their great work.
57 |
--------------------------------------------------------------------------------
/assets/OpenInst.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hustvl/OpenInst/ae5a72fc3ab2f686d6760fff0d7846174d55cad6/assets/OpenInst.png
--------------------------------------------------------------------------------
/configs/_base_/datasets/cityscapes_detection.py:
--------------------------------------------------------------------------------
1 | # dataset settings
2 | dataset_type = 'CityscapesDataset'
3 | data_root = 'data/cityscapes/'
4 | img_norm_cfg = dict(
5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
6 | train_pipeline = [
7 | dict(type='LoadImageFromFile'),
8 | dict(type='LoadAnnotations', with_bbox=True),
9 | dict(
10 | type='Resize', img_scale=[(2048, 800), (2048, 1024)], keep_ratio=True),
11 | dict(type='RandomFlip', flip_ratio=0.5),
12 | dict(type='Normalize', **img_norm_cfg),
13 | dict(type='Pad', size_divisor=32),
14 | dict(type='DefaultFormatBundle'),
15 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
16 | ]
17 | test_pipeline = [
18 | dict(type='LoadImageFromFile'),
19 | dict(
20 | type='MultiScaleFlipAug',
21 | img_scale=(2048, 1024),
22 | flip=False,
23 | transforms=[
24 | dict(type='Resize', keep_ratio=True),
25 | dict(type='RandomFlip'),
26 | dict(type='Normalize', **img_norm_cfg),
27 | dict(type='Pad', size_divisor=32),
28 | dict(type='ImageToTensor', keys=['img']),
29 | dict(type='Collect', keys=['img']),
30 | ])
31 | ]
32 | data = dict(
33 | samples_per_gpu=1,
34 | workers_per_gpu=2,
35 | train=dict(
36 | type='RepeatDataset',
37 | times=8,
38 | dataset=dict(
39 | type=dataset_type,
40 | ann_file=data_root +
41 | 'annotations/instancesonly_filtered_gtFine_train.json',
42 | img_prefix=data_root + 'leftImg8bit/train/',
43 | pipeline=train_pipeline)),
44 | val=dict(
45 | type=dataset_type,
46 | ann_file=data_root +
47 | 'annotations/instancesonly_filtered_gtFine_val.json',
48 | img_prefix=data_root + 'leftImg8bit/val/',
49 | pipeline=test_pipeline),
50 | test=dict(
51 | type=dataset_type,
52 | ann_file=data_root +
53 | 'annotations/instancesonly_filtered_gtFine_test.json',
54 | img_prefix=data_root + 'leftImg8bit/test/',
55 | pipeline=test_pipeline))
56 | evaluation = dict(interval=1, metric='bbox')
57 |
--------------------------------------------------------------------------------
/configs/_base_/datasets/cityscapes_instance.py:
--------------------------------------------------------------------------------
1 | # dataset settings
2 | dataset_type = 'CityscapesDataset'
3 | data_root = 'data/cityscapes/'
4 | img_norm_cfg = dict(
5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
6 | train_pipeline = [
7 | dict(type='LoadImageFromFile'),
8 | dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
9 | dict(
10 | type='Resize', img_scale=[(2048, 800), (2048, 1024)], keep_ratio=True),
11 | dict(type='RandomFlip', flip_ratio=0.5),
12 | dict(type='Normalize', **img_norm_cfg),
13 | dict(type='Pad', size_divisor=32),
14 | dict(type='DefaultFormatBundle'),
15 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
16 | ]
17 | test_pipeline = [
18 | dict(type='LoadImageFromFile'),
19 | dict(
20 | type='MultiScaleFlipAug',
21 | img_scale=(2048, 1024),
22 | flip=False,
23 | transforms=[
24 | dict(type='Resize', keep_ratio=True),
25 | dict(type='RandomFlip'),
26 | dict(type='Normalize', **img_norm_cfg),
27 | dict(type='Pad', size_divisor=32),
28 | dict(type='ImageToTensor', keys=['img']),
29 | dict(type='Collect', keys=['img']),
30 | ])
31 | ]
32 | data = dict(
33 | samples_per_gpu=1,
34 | workers_per_gpu=2,
35 | train=dict(
36 | type='RepeatDataset',
37 | times=8,
38 | dataset=dict(
39 | type=dataset_type,
40 | ann_file=data_root +
41 | 'annotations/instancesonly_filtered_gtFine_train.json',
42 | img_prefix=data_root + 'leftImg8bit/train/',
43 | pipeline=train_pipeline)),
44 | val=dict(
45 | type=dataset_type,
46 | ann_file=data_root +
47 | 'annotations/instancesonly_filtered_gtFine_val.json',
48 | img_prefix=data_root + 'leftImg8bit/val/',
49 | pipeline=test_pipeline),
50 | test=dict(
51 | type=dataset_type,
52 | ann_file=data_root +
53 | 'annotations/instancesonly_filtered_gtFine_test.json',
54 | img_prefix=data_root + 'leftImg8bit/test/',
55 | pipeline=test_pipeline))
56 | evaluation = dict(metric=['bbox', 'segm'])
57 |
--------------------------------------------------------------------------------
/configs/_base_/datasets/coco_detection.py:
--------------------------------------------------------------------------------
1 | # dataset settings
2 | dataset_type = 'CocoDataset'
3 | data_root = 'data/coco/'
4 | img_norm_cfg = dict(
5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
6 | train_pipeline = [
7 | dict(type='LoadImageFromFile'),
8 | dict(type='LoadAnnotations', with_bbox=True),
9 | dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
10 | dict(type='RandomFlip', flip_ratio=0.5),
11 | dict(type='Normalize', **img_norm_cfg),
12 | dict(type='Pad', size_divisor=32),
13 | dict(type='DefaultFormatBundle'),
14 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
15 | ]
16 | test_pipeline = [
17 | dict(type='LoadImageFromFile'),
18 | dict(
19 | type='MultiScaleFlipAug',
20 | img_scale=(1333, 800),
21 | flip=False,
22 | transforms=[
23 | dict(type='Resize', keep_ratio=True),
24 | dict(type='RandomFlip'),
25 | dict(type='Normalize', **img_norm_cfg),
26 | dict(type='Pad', size_divisor=32),
27 | dict(type='ImageToTensor', keys=['img']),
28 | dict(type='Collect', keys=['img']),
29 | ])
30 | ]
31 | data = dict(
32 | samples_per_gpu=2,
33 | workers_per_gpu=2,
34 | train=dict(
35 | type=dataset_type,
36 | ann_file=data_root + 'annotations/instances_train2017.json',
37 | img_prefix=data_root + 'train2017/',
38 | pipeline=train_pipeline),
39 | val=dict(
40 | type=dataset_type,
41 | ann_file=data_root + 'annotations/instances_val2017.json',
42 | img_prefix=data_root + 'val2017/',
43 | pipeline=test_pipeline),
44 | test=dict(
45 | type=dataset_type,
46 | ann_file=data_root + 'annotations/instances_val2017.json',
47 | img_prefix=data_root + 'val2017/',
48 | pipeline=test_pipeline))
49 | evaluation = dict(interval=1, metric='bbox')
50 |
--------------------------------------------------------------------------------
/configs/_base_/datasets/coco_instance.py:
--------------------------------------------------------------------------------
1 | # dataset settings
2 | dataset_type = 'CocoDataset'
3 | data_root = 'data/coco/'
4 | img_norm_cfg = dict(
5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
6 | train_pipeline = [
7 | dict(type='LoadImageFromFile'),
8 | dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
9 | dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
10 | dict(type='RandomFlip', flip_ratio=0.5),
11 | dict(type='Normalize', **img_norm_cfg),
12 | dict(type='Pad', size_divisor=32),
13 | dict(type='DefaultFormatBundle'),
14 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
15 | ]
16 | test_pipeline = [
17 | dict(type='LoadImageFromFile'),
18 | dict(
19 | type='MultiScaleFlipAug',
20 | img_scale=(1333, 800),
21 | flip=False,
22 | transforms=[
23 | dict(type='Resize', keep_ratio=True),
24 | dict(type='RandomFlip'),
25 | dict(type='Normalize', **img_norm_cfg),
26 | dict(type='Pad', size_divisor=32),
27 | dict(type='ImageToTensor', keys=['img']),
28 | dict(type='Collect', keys=['img']),
29 | ])
30 | ]
31 | data = dict(
32 | samples_per_gpu=2,
33 | workers_per_gpu=2,
34 | train=dict(
35 | type=dataset_type,
36 | ann_file=data_root + 'annotations/instances_train2017.json',
37 | img_prefix=data_root + 'train2017/',
38 | pipeline=train_pipeline),
39 | val=dict(
40 | type=dataset_type,
41 | ann_file=data_root + 'annotations/instances_val2017.json',
42 | img_prefix=data_root + 'val2017/',
43 | pipeline=test_pipeline),
44 | test=dict(
45 | type=dataset_type,
46 | ann_file=data_root + 'annotations/instances_val2017.json',
47 | img_prefix=data_root + 'val2017/',
48 | pipeline=test_pipeline))
49 | evaluation = dict(metric=['bbox', 'segm'])
50 |
--------------------------------------------------------------------------------
/configs/_base_/datasets/coco_instance_semantic.py:
--------------------------------------------------------------------------------
1 | # dataset settings
2 | dataset_type = 'CocoDataset'
3 | data_root = 'data/coco/'
4 | img_norm_cfg = dict(
5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
6 | train_pipeline = [
7 | dict(type='LoadImageFromFile'),
8 | dict(
9 | type='LoadAnnotations', with_bbox=True, with_mask=True, with_seg=True),
10 | dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
11 | dict(type='RandomFlip', flip_ratio=0.5),
12 | dict(type='Normalize', **img_norm_cfg),
13 | dict(type='Pad', size_divisor=32),
14 | dict(type='SegRescale', scale_factor=1 / 8),
15 | dict(type='DefaultFormatBundle'),
16 | dict(
17 | type='Collect',
18 | keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks', 'gt_semantic_seg']),
19 | ]
20 | test_pipeline = [
21 | dict(type='LoadImageFromFile'),
22 | dict(
23 | type='MultiScaleFlipAug',
24 | img_scale=(1333, 800),
25 | flip=False,
26 | transforms=[
27 | dict(type='Resize', keep_ratio=True),
28 | dict(type='RandomFlip', flip_ratio=0.5),
29 | dict(type='Normalize', **img_norm_cfg),
30 | dict(type='Pad', size_divisor=32),
31 | dict(type='ImageToTensor', keys=['img']),
32 | dict(type='Collect', keys=['img']),
33 | ])
34 | ]
35 | data = dict(
36 | samples_per_gpu=2,
37 | workers_per_gpu=2,
38 | train=dict(
39 | type=dataset_type,
40 | ann_file=data_root + 'annotations/instances_train2017.json',
41 | img_prefix=data_root + 'train2017/',
42 | seg_prefix=data_root + 'stuffthingmaps/train2017/',
43 | pipeline=train_pipeline),
44 | val=dict(
45 | type=dataset_type,
46 | ann_file=data_root + 'annotations/instances_val2017.json',
47 | img_prefix=data_root + 'val2017/',
48 | pipeline=test_pipeline),
49 | test=dict(
50 | type=dataset_type,
51 | ann_file=data_root + 'annotations/instances_val2017.json',
52 | img_prefix=data_root + 'val2017/',
53 | pipeline=test_pipeline))
54 | evaluation = dict(metric=['bbox', 'segm'])
55 |
--------------------------------------------------------------------------------
/configs/_base_/datasets/coco_panoptic.py:
--------------------------------------------------------------------------------
1 | # dataset settings
2 | dataset_type = 'CocoPanopticDataset'
3 | data_root = 'data/coco/'
4 | img_norm_cfg = dict(
5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
6 | train_pipeline = [
7 | dict(type='LoadImageFromFile'),
8 | dict(
9 | type='LoadPanopticAnnotations',
10 | with_bbox=True,
11 | with_mask=True,
12 | with_seg=True),
13 | dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
14 | dict(type='RandomFlip', flip_ratio=0.5),
15 | dict(type='Normalize', **img_norm_cfg),
16 | dict(type='Pad', size_divisor=32),
17 | dict(type='SegRescale', scale_factor=1 / 4),
18 | dict(type='DefaultFormatBundle'),
19 | dict(
20 | type='Collect',
21 | keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks', 'gt_semantic_seg']),
22 | ]
23 | test_pipeline = [
24 | dict(type='LoadImageFromFile'),
25 | dict(
26 | type='MultiScaleFlipAug',
27 | img_scale=(1333, 800),
28 | flip=False,
29 | transforms=[
30 | dict(type='Resize', keep_ratio=True),
31 | dict(type='RandomFlip'),
32 | dict(type='Normalize', **img_norm_cfg),
33 | dict(type='Pad', size_divisor=32),
34 | dict(type='ImageToTensor', keys=['img']),
35 | dict(type='Collect', keys=['img']),
36 | ])
37 | ]
38 | data = dict(
39 | samples_per_gpu=2,
40 | workers_per_gpu=2,
41 | train=dict(
42 | type=dataset_type,
43 | ann_file=data_root + 'annotations/panoptic_train2017.json',
44 | img_prefix=data_root + 'train2017/',
45 | seg_prefix=data_root + 'annotations/panoptic_train2017/',
46 | pipeline=train_pipeline),
47 | val=dict(
48 | type=dataset_type,
49 | ann_file=data_root + 'annotations/panoptic_val2017.json',
50 | img_prefix=data_root + 'val2017/',
51 | seg_prefix=data_root + 'annotations/panoptic_val2017/',
52 | pipeline=test_pipeline),
53 | test=dict(
54 | type=dataset_type,
55 | ann_file=data_root + 'annotations/panoptic_val2017.json',
56 | img_prefix=data_root + 'val2017/',
57 | seg_prefix=data_root + 'annotations/panoptic_val2017/',
58 | pipeline=test_pipeline))
59 | evaluation = dict(interval=1, metric=['PQ'])
60 |
--------------------------------------------------------------------------------
/configs/_base_/datasets/deepfashion.py:
--------------------------------------------------------------------------------
1 | # dataset settings
2 | dataset_type = 'DeepFashionDataset'
3 | data_root = 'data/DeepFashion/In-shop/'
4 | img_norm_cfg = dict(
5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
6 | train_pipeline = [
7 | dict(type='LoadImageFromFile'),
8 | dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
9 | dict(type='Resize', img_scale=(750, 1101), keep_ratio=True),
10 | dict(type='RandomFlip', flip_ratio=0.5),
11 | dict(type='Normalize', **img_norm_cfg),
12 | dict(type='Pad', size_divisor=32),
13 | dict(type='DefaultFormatBundle'),
14 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
15 | ]
16 | test_pipeline = [
17 | dict(type='LoadImageFromFile'),
18 | dict(
19 | type='MultiScaleFlipAug',
20 | img_scale=(750, 1101),
21 | flip=False,
22 | transforms=[
23 | dict(type='Resize', keep_ratio=True),
24 | dict(type='RandomFlip'),
25 | dict(type='Normalize', **img_norm_cfg),
26 | dict(type='Pad', size_divisor=32),
27 | dict(type='ImageToTensor', keys=['img']),
28 | dict(type='Collect', keys=['img']),
29 | ])
30 | ]
31 | data = dict(
32 | imgs_per_gpu=2,
33 | workers_per_gpu=1,
34 | train=dict(
35 | type=dataset_type,
36 | ann_file=data_root + 'annotations/DeepFashion_segmentation_query.json',
37 | img_prefix=data_root + 'Img/',
38 | pipeline=train_pipeline,
39 | data_root=data_root),
40 | val=dict(
41 | type=dataset_type,
42 | ann_file=data_root + 'annotations/DeepFashion_segmentation_query.json',
43 | img_prefix=data_root + 'Img/',
44 | pipeline=test_pipeline,
45 | data_root=data_root),
46 | test=dict(
47 | type=dataset_type,
48 | ann_file=data_root +
49 | 'annotations/DeepFashion_segmentation_gallery.json',
50 | img_prefix=data_root + 'Img/',
51 | pipeline=test_pipeline,
52 | data_root=data_root))
53 | evaluation = dict(interval=5, metric=['bbox', 'segm'])
54 |
--------------------------------------------------------------------------------
/configs/_base_/datasets/lvis_v0.5_instance.py:
--------------------------------------------------------------------------------
1 | # dataset settings
2 | _base_ = 'coco_instance.py'
3 | dataset_type = 'LVISV05Dataset'
4 | data_root = 'data/lvis_v0.5/'
5 | data = dict(
6 | samples_per_gpu=2,
7 | workers_per_gpu=2,
8 | train=dict(
9 | _delete_=True,
10 | type='ClassBalancedDataset',
11 | oversample_thr=1e-3,
12 | dataset=dict(
13 | type=dataset_type,
14 | ann_file=data_root + 'annotations/lvis_v0.5_train.json',
15 | img_prefix=data_root + 'train2017/')),
16 | val=dict(
17 | type=dataset_type,
18 | ann_file=data_root + 'annotations/lvis_v0.5_val.json',
19 | img_prefix=data_root + 'val2017/'),
20 | test=dict(
21 | type=dataset_type,
22 | ann_file=data_root + 'annotations/lvis_v0.5_val.json',
23 | img_prefix=data_root + 'val2017/'))
24 | evaluation = dict(metric=['bbox', 'segm'])
25 |
--------------------------------------------------------------------------------
/configs/_base_/datasets/lvis_v1_instance.py:
--------------------------------------------------------------------------------
1 | # dataset settings
2 | _base_ = 'coco_instance.py'
3 | dataset_type = 'LVISV1Dataset'
4 | data_root = 'data/lvis_v1/'
5 | data = dict(
6 | samples_per_gpu=2,
7 | workers_per_gpu=2,
8 | train=dict(
9 | _delete_=True,
10 | type='ClassBalancedDataset',
11 | oversample_thr=1e-3,
12 | dataset=dict(
13 | type=dataset_type,
14 | ann_file=data_root + 'annotations/lvis_v1_train.json',
15 | img_prefix=data_root)),
16 | val=dict(
17 | type=dataset_type,
18 | ann_file=data_root + 'annotations/lvis_v1_val.json',
19 | img_prefix=data_root),
20 | test=dict(
21 | type=dataset_type,
22 | ann_file=data_root + 'annotations/lvis_v1_val.json',
23 | img_prefix=data_root))
24 | evaluation = dict(metric=['bbox', 'segm'])
25 |
--------------------------------------------------------------------------------
/configs/_base_/datasets/openimages_detection.py:
--------------------------------------------------------------------------------
1 | # dataset settings
2 | dataset_type = 'OpenImagesDataset'
3 | data_root = 'data/OpenImages/'
4 | img_norm_cfg = dict(
5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
6 | train_pipeline = [
7 | dict(type='LoadImageFromFile'),
8 | dict(type='LoadAnnotations', with_bbox=True, denorm_bbox=True),
9 | dict(type='Resize', img_scale=(1024, 800), keep_ratio=True),
10 | dict(type='RandomFlip', flip_ratio=0.5),
11 | dict(type='Normalize', **img_norm_cfg),
12 | dict(type='Pad', size_divisor=32),
13 | dict(type='DefaultFormatBundle'),
14 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
15 | ]
16 | test_pipeline = [
17 | dict(type='LoadImageFromFile'),
18 | dict(
19 | type='MultiScaleFlipAug',
20 | img_scale=(1024, 800),
21 | flip=False,
22 | transforms=[
23 | dict(type='Resize', keep_ratio=True),
24 | dict(type='RandomFlip'),
25 | dict(type='Normalize', **img_norm_cfg),
26 | dict(type='Pad', size_divisor=32),
27 | dict(type='ImageToTensor', keys=['img']),
28 | dict(type='Collect', keys=['img']),
29 | ],
30 | ),
31 | ]
32 | data = dict(
33 | samples_per_gpu=2,
34 | workers_per_gpu=0, # workers_per_gpu > 0 may cause out-of-memory errors
35 | train=dict(
36 | type=dataset_type,
37 | ann_file=data_root + 'annotations/oidv6-train-annotations-bbox.csv',
38 | img_prefix=data_root + 'OpenImages/train/',
39 | label_file=data_root + 'annotations/class-descriptions-boxable.csv',
40 | hierarchy_file=data_root +
41 | 'annotations/bbox_labels_600_hierarchy.json',
42 | pipeline=train_pipeline),
43 | val=dict(
44 | type=dataset_type,
45 | ann_file=data_root + 'annotations/validation-annotations-bbox.csv',
46 | img_prefix=data_root + 'OpenImages/validation/',
47 | label_file=data_root + 'annotations/class-descriptions-boxable.csv',
48 | hierarchy_file=data_root +
49 | 'annotations/bbox_labels_600_hierarchy.json',
50 | meta_file=data_root + 'annotations/validation-image-metas.pkl',
51 | image_level_ann_file=data_root +
52 | 'annotations/validation-annotations-human-imagelabels-boxable.csv',
53 | pipeline=test_pipeline),
54 | test=dict(
55 | type=dataset_type,
56 | ann_file=data_root + 'annotations/validation-annotations-bbox.csv',
57 | img_prefix=data_root + 'OpenImages/validation/',
58 | label_file=data_root + 'annotations/class-descriptions-boxable.csv',
59 | hierarchy_file=data_root +
60 | 'annotations/bbox_labels_600_hierarchy.json',
61 | meta_file=data_root + 'annotations/validation-image-metas.pkl',
62 | image_level_ann_file=data_root +
63 | 'annotations/validation-annotations-human-imagelabels-boxable.csv',
64 | pipeline=test_pipeline))
65 | evaluation = dict(interval=1, metric='mAP')
66 |
--------------------------------------------------------------------------------
/configs/_base_/datasets/voc0712.py:
--------------------------------------------------------------------------------
1 | # dataset settings
2 | dataset_type = 'VOCDataset'
3 | data_root = 'data/VOCdevkit/'
4 | img_norm_cfg = dict(
5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
6 | train_pipeline = [
7 | dict(type='LoadImageFromFile'),
8 | dict(type='LoadAnnotations', with_bbox=True),
9 | dict(type='Resize', img_scale=(1000, 600), keep_ratio=True),
10 | dict(type='RandomFlip', flip_ratio=0.5),
11 | dict(type='Normalize', **img_norm_cfg),
12 | dict(type='Pad', size_divisor=32),
13 | dict(type='DefaultFormatBundle'),
14 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
15 | ]
16 | test_pipeline = [
17 | dict(type='LoadImageFromFile'),
18 | dict(
19 | type='MultiScaleFlipAug',
20 | img_scale=(1000, 600),
21 | flip=False,
22 | transforms=[
23 | dict(type='Resize', keep_ratio=True),
24 | dict(type='RandomFlip'),
25 | dict(type='Normalize', **img_norm_cfg),
26 | dict(type='Pad', size_divisor=32),
27 | dict(type='ImageToTensor', keys=['img']),
28 | dict(type='Collect', keys=['img']),
29 | ])
30 | ]
31 | data = dict(
32 | samples_per_gpu=2,
33 | workers_per_gpu=2,
34 | train=dict(
35 | type='RepeatDataset',
36 | times=3,
37 | dataset=dict(
38 | type=dataset_type,
39 | ann_file=[
40 | data_root + 'VOC2007/ImageSets/Main/trainval.txt',
41 | data_root + 'VOC2012/ImageSets/Main/trainval.txt'
42 | ],
43 | img_prefix=[data_root + 'VOC2007/', data_root + 'VOC2012/'],
44 | pipeline=train_pipeline)),
45 | val=dict(
46 | type=dataset_type,
47 | ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt',
48 | img_prefix=data_root + 'VOC2007/',
49 | pipeline=test_pipeline),
50 | test=dict(
51 | type=dataset_type,
52 | ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt',
53 | img_prefix=data_root + 'VOC2007/',
54 | pipeline=test_pipeline))
55 | evaluation = dict(interval=1, metric='mAP')
56 |
--------------------------------------------------------------------------------
/configs/_base_/datasets/wider_face.py:
--------------------------------------------------------------------------------
1 | # dataset settings
2 | dataset_type = 'WIDERFaceDataset'
3 | data_root = 'data/WIDERFace/'
4 | img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[1, 1, 1], to_rgb=True)
5 | train_pipeline = [
6 | dict(type='LoadImageFromFile', to_float32=True),
7 | dict(type='LoadAnnotations', with_bbox=True),
8 | dict(
9 | type='PhotoMetricDistortion',
10 | brightness_delta=32,
11 | contrast_range=(0.5, 1.5),
12 | saturation_range=(0.5, 1.5),
13 | hue_delta=18),
14 | dict(
15 | type='Expand',
16 | mean=img_norm_cfg['mean'],
17 | to_rgb=img_norm_cfg['to_rgb'],
18 | ratio_range=(1, 4)),
19 | dict(
20 | type='MinIoURandomCrop',
21 | min_ious=(0.1, 0.3, 0.5, 0.7, 0.9),
22 | min_crop_size=0.3),
23 | dict(type='Resize', img_scale=(300, 300), keep_ratio=False),
24 | dict(type='Normalize', **img_norm_cfg),
25 | dict(type='RandomFlip', flip_ratio=0.5),
26 | dict(type='DefaultFormatBundle'),
27 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
28 | ]
29 | test_pipeline = [
30 | dict(type='LoadImageFromFile'),
31 | dict(
32 | type='MultiScaleFlipAug',
33 | img_scale=(300, 300),
34 | flip=False,
35 | transforms=[
36 | dict(type='Resize', keep_ratio=False),
37 | dict(type='Normalize', **img_norm_cfg),
38 | dict(type='ImageToTensor', keys=['img']),
39 | dict(type='Collect', keys=['img']),
40 | ])
41 | ]
42 | data = dict(
43 | samples_per_gpu=60,
44 | workers_per_gpu=2,
45 | train=dict(
46 | type='RepeatDataset',
47 | times=2,
48 | dataset=dict(
49 | type=dataset_type,
50 | ann_file=data_root + 'train.txt',
51 | img_prefix=data_root + 'WIDER_train/',
52 | min_size=17,
53 | pipeline=train_pipeline)),
54 | val=dict(
55 | type=dataset_type,
56 | ann_file=data_root + 'val.txt',
57 | img_prefix=data_root + 'WIDER_val/',
58 | pipeline=test_pipeline),
59 | test=dict(
60 | type=dataset_type,
61 | ann_file=data_root + 'val.txt',
62 | img_prefix=data_root + 'WIDER_val/',
63 | pipeline=test_pipeline))
64 |
--------------------------------------------------------------------------------
/configs/_base_/default_runtime.py:
--------------------------------------------------------------------------------
1 | checkpoint_config = dict(interval=1)
2 | # yapf:disable
3 | log_config = dict(
4 | interval=50,
5 | hooks=[
6 | dict(type='TextLoggerHook'),
7 | # dict(type='TensorboardLoggerHook')
8 | ])
9 | # yapf:enable
10 | custom_hooks = [dict(type='NumClassCheckHook')]
11 |
12 | dist_params = dict(backend='nccl')
13 | log_level = 'INFO'
14 | load_from = None
15 | resume_from = None
16 | workflow = [('train', 1)]
17 |
18 | # disable opencv multithreading to avoid system being overloaded
19 | opencv_num_threads = 0
20 | # set multi-process start method as `fork` to speed up the training
21 | mp_start_method = 'fork'
22 |
--------------------------------------------------------------------------------
/configs/_base_/models/cascade_mask_rcnn_r50_fpn.py:
--------------------------------------------------------------------------------
1 | # model settings
2 | model = dict(
3 | type='CascadeRCNN',
4 | backbone=dict(
5 | type='ResNet',
6 | depth=50,
7 | num_stages=4,
8 | out_indices=(0, 1, 2, 3),
9 | frozen_stages=1,
10 | norm_cfg=dict(type='BN', requires_grad=True),
11 | norm_eval=True,
12 | style='pytorch',
13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
14 | neck=dict(
15 | type='FPN',
16 | in_channels=[256, 512, 1024, 2048],
17 | out_channels=256,
18 | num_outs=5),
19 | rpn_head=dict(
20 | type='RPNHead',
21 | in_channels=256,
22 | feat_channels=256,
23 | anchor_generator=dict(
24 | type='AnchorGenerator',
25 | scales=[8],
26 | ratios=[0.5, 1.0, 2.0],
27 | strides=[4, 8, 16, 32, 64]),
28 | bbox_coder=dict(
29 | type='DeltaXYWHBBoxCoder',
30 | target_means=[.0, .0, .0, .0],
31 | target_stds=[1.0, 1.0, 1.0, 1.0]),
32 | loss_cls=dict(
33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
34 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
35 | roi_head=dict(
36 | type='CascadeRoIHead',
37 | num_stages=3,
38 | stage_loss_weights=[1, 0.5, 0.25],
39 | bbox_roi_extractor=dict(
40 | type='SingleRoIExtractor',
41 | roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
42 | out_channels=256,
43 | featmap_strides=[4, 8, 16, 32]),
44 | bbox_head=[
45 | dict(
46 | type='Shared2FCBBoxHead',
47 | in_channels=256,
48 | fc_out_channels=1024,
49 | roi_feat_size=7,
50 | num_classes=80,
51 | bbox_coder=dict(
52 | type='DeltaXYWHBBoxCoder',
53 | target_means=[0., 0., 0., 0.],
54 | target_stds=[0.1, 0.1, 0.2, 0.2]),
55 | reg_class_agnostic=True,
56 | loss_cls=dict(
57 | type='CrossEntropyLoss',
58 | use_sigmoid=False,
59 | loss_weight=1.0),
60 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
61 | loss_weight=1.0)),
62 | dict(
63 | type='Shared2FCBBoxHead',
64 | in_channels=256,
65 | fc_out_channels=1024,
66 | roi_feat_size=7,
67 | num_classes=80,
68 | bbox_coder=dict(
69 | type='DeltaXYWHBBoxCoder',
70 | target_means=[0., 0., 0., 0.],
71 | target_stds=[0.05, 0.05, 0.1, 0.1]),
72 | reg_class_agnostic=True,
73 | loss_cls=dict(
74 | type='CrossEntropyLoss',
75 | use_sigmoid=False,
76 | loss_weight=1.0),
77 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
78 | loss_weight=1.0)),
79 | dict(
80 | type='Shared2FCBBoxHead',
81 | in_channels=256,
82 | fc_out_channels=1024,
83 | roi_feat_size=7,
84 | num_classes=80,
85 | bbox_coder=dict(
86 | type='DeltaXYWHBBoxCoder',
87 | target_means=[0., 0., 0., 0.],
88 | target_stds=[0.033, 0.033, 0.067, 0.067]),
89 | reg_class_agnostic=True,
90 | loss_cls=dict(
91 | type='CrossEntropyLoss',
92 | use_sigmoid=False,
93 | loss_weight=1.0),
94 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))
95 | ],
96 | mask_roi_extractor=dict(
97 | type='SingleRoIExtractor',
98 | roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
99 | out_channels=256,
100 | featmap_strides=[4, 8, 16, 32]),
101 | mask_head=dict(
102 | type='FCNMaskHead',
103 | num_convs=4,
104 | in_channels=256,
105 | conv_out_channels=256,
106 | num_classes=80,
107 | loss_mask=dict(
108 | type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),
109 | # model training and testing settings
110 | train_cfg=dict(
111 | rpn=dict(
112 | assigner=dict(
113 | type='MaxIoUAssigner',
114 | pos_iou_thr=0.7,
115 | neg_iou_thr=0.3,
116 | min_pos_iou=0.3,
117 | match_low_quality=True,
118 | ignore_iof_thr=-1),
119 | sampler=dict(
120 | type='RandomSampler',
121 | num=256,
122 | pos_fraction=0.5,
123 | neg_pos_ub=-1,
124 | add_gt_as_proposals=False),
125 | allowed_border=0,
126 | pos_weight=-1,
127 | debug=False),
128 | rpn_proposal=dict(
129 | nms_pre=2000,
130 | max_per_img=2000,
131 | nms=dict(type='nms', iou_threshold=0.7),
132 | min_bbox_size=0),
133 | rcnn=[
134 | dict(
135 | assigner=dict(
136 | type='MaxIoUAssigner',
137 | pos_iou_thr=0.5,
138 | neg_iou_thr=0.5,
139 | min_pos_iou=0.5,
140 | match_low_quality=False,
141 | ignore_iof_thr=-1),
142 | sampler=dict(
143 | type='RandomSampler',
144 | num=512,
145 | pos_fraction=0.25,
146 | neg_pos_ub=-1,
147 | add_gt_as_proposals=True),
148 | mask_size=28,
149 | pos_weight=-1,
150 | debug=False),
151 | dict(
152 | assigner=dict(
153 | type='MaxIoUAssigner',
154 | pos_iou_thr=0.6,
155 | neg_iou_thr=0.6,
156 | min_pos_iou=0.6,
157 | match_low_quality=False,
158 | ignore_iof_thr=-1),
159 | sampler=dict(
160 | type='RandomSampler',
161 | num=512,
162 | pos_fraction=0.25,
163 | neg_pos_ub=-1,
164 | add_gt_as_proposals=True),
165 | mask_size=28,
166 | pos_weight=-1,
167 | debug=False),
168 | dict(
169 | assigner=dict(
170 | type='MaxIoUAssigner',
171 | pos_iou_thr=0.7,
172 | neg_iou_thr=0.7,
173 | min_pos_iou=0.7,
174 | match_low_quality=False,
175 | ignore_iof_thr=-1),
176 | sampler=dict(
177 | type='RandomSampler',
178 | num=512,
179 | pos_fraction=0.25,
180 | neg_pos_ub=-1,
181 | add_gt_as_proposals=True),
182 | mask_size=28,
183 | pos_weight=-1,
184 | debug=False)
185 | ]),
186 | test_cfg=dict(
187 | rpn=dict(
188 | nms_pre=1000,
189 | max_per_img=1000,
190 | nms=dict(type='nms', iou_threshold=0.7),
191 | min_bbox_size=0),
192 | rcnn=dict(
193 | score_thr=0.05,
194 | nms=dict(type='nms', iou_threshold=0.5),
195 | max_per_img=100,
196 | mask_thr_binary=0.5)))
197 |
--------------------------------------------------------------------------------
/configs/_base_/models/cascade_rcnn_r50_fpn.py:
--------------------------------------------------------------------------------
1 | # model settings
2 | model = dict(
3 | type='CascadeRCNN',
4 | backbone=dict(
5 | type='ResNet',
6 | depth=50,
7 | num_stages=4,
8 | out_indices=(0, 1, 2, 3),
9 | frozen_stages=1,
10 | norm_cfg=dict(type='BN', requires_grad=True),
11 | norm_eval=True,
12 | style='pytorch',
13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
14 | neck=dict(
15 | type='FPN',
16 | in_channels=[256, 512, 1024, 2048],
17 | out_channels=256,
18 | num_outs=5),
19 | rpn_head=dict(
20 | type='RPNHead',
21 | in_channels=256,
22 | feat_channels=256,
23 | anchor_generator=dict(
24 | type='AnchorGenerator',
25 | scales=[8],
26 | ratios=[0.5, 1.0, 2.0],
27 | strides=[4, 8, 16, 32, 64]),
28 | bbox_coder=dict(
29 | type='DeltaXYWHBBoxCoder',
30 | target_means=[.0, .0, .0, .0],
31 | target_stds=[1.0, 1.0, 1.0, 1.0]),
32 | loss_cls=dict(
33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
34 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
35 | roi_head=dict(
36 | type='CascadeRoIHead',
37 | num_stages=3,
38 | stage_loss_weights=[1, 0.5, 0.25],
39 | bbox_roi_extractor=dict(
40 | type='SingleRoIExtractor',
41 | roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
42 | out_channels=256,
43 | featmap_strides=[4, 8, 16, 32]),
44 | bbox_head=[
45 | dict(
46 | type='Shared2FCBBoxHead',
47 | in_channels=256,
48 | fc_out_channels=1024,
49 | roi_feat_size=7,
50 | num_classes=80,
51 | bbox_coder=dict(
52 | type='DeltaXYWHBBoxCoder',
53 | target_means=[0., 0., 0., 0.],
54 | target_stds=[0.1, 0.1, 0.2, 0.2]),
55 | reg_class_agnostic=True,
56 | loss_cls=dict(
57 | type='CrossEntropyLoss',
58 | use_sigmoid=False,
59 | loss_weight=1.0),
60 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
61 | loss_weight=1.0)),
62 | dict(
63 | type='Shared2FCBBoxHead',
64 | in_channels=256,
65 | fc_out_channels=1024,
66 | roi_feat_size=7,
67 | num_classes=80,
68 | bbox_coder=dict(
69 | type='DeltaXYWHBBoxCoder',
70 | target_means=[0., 0., 0., 0.],
71 | target_stds=[0.05, 0.05, 0.1, 0.1]),
72 | reg_class_agnostic=True,
73 | loss_cls=dict(
74 | type='CrossEntropyLoss',
75 | use_sigmoid=False,
76 | loss_weight=1.0),
77 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
78 | loss_weight=1.0)),
79 | dict(
80 | type='Shared2FCBBoxHead',
81 | in_channels=256,
82 | fc_out_channels=1024,
83 | roi_feat_size=7,
84 | num_classes=80,
85 | bbox_coder=dict(
86 | type='DeltaXYWHBBoxCoder',
87 | target_means=[0., 0., 0., 0.],
88 | target_stds=[0.033, 0.033, 0.067, 0.067]),
89 | reg_class_agnostic=True,
90 | loss_cls=dict(
91 | type='CrossEntropyLoss',
92 | use_sigmoid=False,
93 | loss_weight=1.0),
94 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))
95 | ]),
96 | # model training and testing settings
97 | train_cfg=dict(
98 | rpn=dict(
99 | assigner=dict(
100 | type='MaxIoUAssigner',
101 | pos_iou_thr=0.7,
102 | neg_iou_thr=0.3,
103 | min_pos_iou=0.3,
104 | match_low_quality=True,
105 | ignore_iof_thr=-1),
106 | sampler=dict(
107 | type='RandomSampler',
108 | num=256,
109 | pos_fraction=0.5,
110 | neg_pos_ub=-1,
111 | add_gt_as_proposals=False),
112 | allowed_border=0,
113 | pos_weight=-1,
114 | debug=False),
115 | rpn_proposal=dict(
116 | nms_pre=2000,
117 | max_per_img=2000,
118 | nms=dict(type='nms', iou_threshold=0.7),
119 | min_bbox_size=0),
120 | rcnn=[
121 | dict(
122 | assigner=dict(
123 | type='MaxIoUAssigner',
124 | pos_iou_thr=0.5,
125 | neg_iou_thr=0.5,
126 | min_pos_iou=0.5,
127 | match_low_quality=False,
128 | ignore_iof_thr=-1),
129 | sampler=dict(
130 | type='RandomSampler',
131 | num=512,
132 | pos_fraction=0.25,
133 | neg_pos_ub=-1,
134 | add_gt_as_proposals=True),
135 | pos_weight=-1,
136 | debug=False),
137 | dict(
138 | assigner=dict(
139 | type='MaxIoUAssigner',
140 | pos_iou_thr=0.6,
141 | neg_iou_thr=0.6,
142 | min_pos_iou=0.6,
143 | match_low_quality=False,
144 | ignore_iof_thr=-1),
145 | sampler=dict(
146 | type='RandomSampler',
147 | num=512,
148 | pos_fraction=0.25,
149 | neg_pos_ub=-1,
150 | add_gt_as_proposals=True),
151 | pos_weight=-1,
152 | debug=False),
153 | dict(
154 | assigner=dict(
155 | type='MaxIoUAssigner',
156 | pos_iou_thr=0.7,
157 | neg_iou_thr=0.7,
158 | min_pos_iou=0.7,
159 | match_low_quality=False,
160 | ignore_iof_thr=-1),
161 | sampler=dict(
162 | type='RandomSampler',
163 | num=512,
164 | pos_fraction=0.25,
165 | neg_pos_ub=-1,
166 | add_gt_as_proposals=True),
167 | pos_weight=-1,
168 | debug=False)
169 | ]),
170 | test_cfg=dict(
171 | rpn=dict(
172 | nms_pre=1000,
173 | max_per_img=1000,
174 | nms=dict(type='nms', iou_threshold=0.7),
175 | min_bbox_size=0),
176 | rcnn=dict(
177 | score_thr=0.05,
178 | nms=dict(type='nms', iou_threshold=0.5),
179 | max_per_img=100)))
180 |
--------------------------------------------------------------------------------
/configs/_base_/models/fast_rcnn_r50_fpn.py:
--------------------------------------------------------------------------------
1 | # model settings
2 | model = dict(
3 | type='FastRCNN',
4 | backbone=dict(
5 | type='ResNet',
6 | depth=50,
7 | num_stages=4,
8 | out_indices=(0, 1, 2, 3),
9 | frozen_stages=1,
10 | norm_cfg=dict(type='BN', requires_grad=True),
11 | norm_eval=True,
12 | style='pytorch',
13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
14 | neck=dict(
15 | type='FPN',
16 | in_channels=[256, 512, 1024, 2048],
17 | out_channels=256,
18 | num_outs=5),
19 | roi_head=dict(
20 | type='StandardRoIHead',
21 | bbox_roi_extractor=dict(
22 | type='SingleRoIExtractor',
23 | roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
24 | out_channels=256,
25 | featmap_strides=[4, 8, 16, 32]),
26 | bbox_head=dict(
27 | type='Shared2FCBBoxHead',
28 | in_channels=256,
29 | fc_out_channels=1024,
30 | roi_feat_size=7,
31 | num_classes=80,
32 | bbox_coder=dict(
33 | type='DeltaXYWHBBoxCoder',
34 | target_means=[0., 0., 0., 0.],
35 | target_stds=[0.1, 0.1, 0.2, 0.2]),
36 | reg_class_agnostic=False,
37 | loss_cls=dict(
38 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
39 | loss_bbox=dict(type='L1Loss', loss_weight=1.0))),
40 | # model training and testing settings
41 | train_cfg=dict(
42 | rcnn=dict(
43 | assigner=dict(
44 | type='MaxIoUAssigner',
45 | pos_iou_thr=0.5,
46 | neg_iou_thr=0.5,
47 | min_pos_iou=0.5,
48 | match_low_quality=False,
49 | ignore_iof_thr=-1),
50 | sampler=dict(
51 | type='RandomSampler',
52 | num=512,
53 | pos_fraction=0.25,
54 | neg_pos_ub=-1,
55 | add_gt_as_proposals=True),
56 | pos_weight=-1,
57 | debug=False)),
58 | test_cfg=dict(
59 | rcnn=dict(
60 | score_thr=0.05,
61 | nms=dict(type='nms', iou_threshold=0.5),
62 | max_per_img=100)))
63 |
--------------------------------------------------------------------------------
/configs/_base_/models/faster_rcnn_r50_caffe_c4.py:
--------------------------------------------------------------------------------
1 | # model settings
2 | norm_cfg = dict(type='BN', requires_grad=False)
3 | model = dict(
4 | type='FasterRCNN',
5 | backbone=dict(
6 | type='ResNet',
7 | depth=50,
8 | num_stages=3,
9 | strides=(1, 2, 2),
10 | dilations=(1, 1, 1),
11 | out_indices=(2, ),
12 | frozen_stages=1,
13 | norm_cfg=norm_cfg,
14 | norm_eval=True,
15 | style='caffe',
16 | init_cfg=dict(
17 | type='Pretrained',
18 | checkpoint='open-mmlab://detectron2/resnet50_caffe')),
19 | rpn_head=dict(
20 | type='RPNHead',
21 | in_channels=1024,
22 | feat_channels=1024,
23 | anchor_generator=dict(
24 | type='AnchorGenerator',
25 | scales=[2, 4, 8, 16, 32],
26 | ratios=[0.5, 1.0, 2.0],
27 | strides=[16]),
28 | bbox_coder=dict(
29 | type='DeltaXYWHBBoxCoder',
30 | target_means=[.0, .0, .0, .0],
31 | target_stds=[1.0, 1.0, 1.0, 1.0]),
32 | loss_cls=dict(
33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
35 | roi_head=dict(
36 | type='StandardRoIHead',
37 | shared_head=dict(
38 | type='ResLayer',
39 | depth=50,
40 | stage=3,
41 | stride=2,
42 | dilation=1,
43 | style='caffe',
44 | norm_cfg=norm_cfg,
45 | norm_eval=True),
46 | bbox_roi_extractor=dict(
47 | type='SingleRoIExtractor',
48 | roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
49 | out_channels=1024,
50 | featmap_strides=[16]),
51 | bbox_head=dict(
52 | type='BBoxHead',
53 | with_avg_pool=True,
54 | roi_feat_size=7,
55 | in_channels=2048,
56 | num_classes=80,
57 | bbox_coder=dict(
58 | type='DeltaXYWHBBoxCoder',
59 | target_means=[0., 0., 0., 0.],
60 | target_stds=[0.1, 0.1, 0.2, 0.2]),
61 | reg_class_agnostic=False,
62 | loss_cls=dict(
63 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
64 | loss_bbox=dict(type='L1Loss', loss_weight=1.0))),
65 | # model training and testing settings
66 | train_cfg=dict(
67 | rpn=dict(
68 | assigner=dict(
69 | type='MaxIoUAssigner',
70 | pos_iou_thr=0.7,
71 | neg_iou_thr=0.3,
72 | min_pos_iou=0.3,
73 | match_low_quality=True,
74 | ignore_iof_thr=-1),
75 | sampler=dict(
76 | type='RandomSampler',
77 | num=256,
78 | pos_fraction=0.5,
79 | neg_pos_ub=-1,
80 | add_gt_as_proposals=False),
81 | allowed_border=0,
82 | pos_weight=-1,
83 | debug=False),
84 | rpn_proposal=dict(
85 | nms_pre=12000,
86 | max_per_img=2000,
87 | nms=dict(type='nms', iou_threshold=0.7),
88 | min_bbox_size=0),
89 | rcnn=dict(
90 | assigner=dict(
91 | type='MaxIoUAssigner',
92 | pos_iou_thr=0.5,
93 | neg_iou_thr=0.5,
94 | min_pos_iou=0.5,
95 | match_low_quality=False,
96 | ignore_iof_thr=-1),
97 | sampler=dict(
98 | type='RandomSampler',
99 | num=512,
100 | pos_fraction=0.25,
101 | neg_pos_ub=-1,
102 | add_gt_as_proposals=True),
103 | pos_weight=-1,
104 | debug=False)),
105 | test_cfg=dict(
106 | rpn=dict(
107 | nms_pre=6000,
108 | max_per_img=1000,
109 | nms=dict(type='nms', iou_threshold=0.7),
110 | min_bbox_size=0),
111 | rcnn=dict(
112 | score_thr=0.05,
113 | nms=dict(type='nms', iou_threshold=0.5),
114 | max_per_img=100)))
115 |
--------------------------------------------------------------------------------
/configs/_base_/models/faster_rcnn_r50_caffe_dc5.py:
--------------------------------------------------------------------------------
1 | # model settings
2 | norm_cfg = dict(type='BN', requires_grad=False)
3 | model = dict(
4 | type='FasterRCNN',
5 | backbone=dict(
6 | type='ResNet',
7 | depth=50,
8 | num_stages=4,
9 | strides=(1, 2, 2, 1),
10 | dilations=(1, 1, 1, 2),
11 | out_indices=(3, ),
12 | frozen_stages=1,
13 | norm_cfg=norm_cfg,
14 | norm_eval=True,
15 | style='caffe',
16 | init_cfg=dict(
17 | type='Pretrained',
18 | checkpoint='open-mmlab://detectron2/resnet50_caffe')),
19 | rpn_head=dict(
20 | type='RPNHead',
21 | in_channels=2048,
22 | feat_channels=2048,
23 | anchor_generator=dict(
24 | type='AnchorGenerator',
25 | scales=[2, 4, 8, 16, 32],
26 | ratios=[0.5, 1.0, 2.0],
27 | strides=[16]),
28 | bbox_coder=dict(
29 | type='DeltaXYWHBBoxCoder',
30 | target_means=[.0, .0, .0, .0],
31 | target_stds=[1.0, 1.0, 1.0, 1.0]),
32 | loss_cls=dict(
33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
35 | roi_head=dict(
36 | type='StandardRoIHead',
37 | bbox_roi_extractor=dict(
38 | type='SingleRoIExtractor',
39 | roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
40 | out_channels=2048,
41 | featmap_strides=[16]),
42 | bbox_head=dict(
43 | type='Shared2FCBBoxHead',
44 | in_channels=2048,
45 | fc_out_channels=1024,
46 | roi_feat_size=7,
47 | num_classes=80,
48 | bbox_coder=dict(
49 | type='DeltaXYWHBBoxCoder',
50 | target_means=[0., 0., 0., 0.],
51 | target_stds=[0.1, 0.1, 0.2, 0.2]),
52 | reg_class_agnostic=False,
53 | loss_cls=dict(
54 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
55 | loss_bbox=dict(type='L1Loss', loss_weight=1.0))),
56 | # model training and testing settings
57 | train_cfg=dict(
58 | rpn=dict(
59 | assigner=dict(
60 | type='MaxIoUAssigner',
61 | pos_iou_thr=0.7,
62 | neg_iou_thr=0.3,
63 | min_pos_iou=0.3,
64 | match_low_quality=True,
65 | ignore_iof_thr=-1),
66 | sampler=dict(
67 | type='RandomSampler',
68 | num=256,
69 | pos_fraction=0.5,
70 | neg_pos_ub=-1,
71 | add_gt_as_proposals=False),
72 | allowed_border=0,
73 | pos_weight=-1,
74 | debug=False),
75 | rpn_proposal=dict(
76 | nms_pre=12000,
77 | max_per_img=2000,
78 | nms=dict(type='nms', iou_threshold=0.7),
79 | min_bbox_size=0),
80 | rcnn=dict(
81 | assigner=dict(
82 | type='MaxIoUAssigner',
83 | pos_iou_thr=0.5,
84 | neg_iou_thr=0.5,
85 | min_pos_iou=0.5,
86 | match_low_quality=False,
87 | ignore_iof_thr=-1),
88 | sampler=dict(
89 | type='RandomSampler',
90 | num=512,
91 | pos_fraction=0.25,
92 | neg_pos_ub=-1,
93 | add_gt_as_proposals=True),
94 | pos_weight=-1,
95 | debug=False)),
96 | test_cfg=dict(
97 | rpn=dict(
98 | nms=dict(type='nms', iou_threshold=0.7),
99 | nms_pre=6000,
100 | max_per_img=1000,
101 | min_bbox_size=0),
102 | rcnn=dict(
103 | score_thr=0.05,
104 | nms=dict(type='nms', iou_threshold=0.5),
105 | max_per_img=100)))
106 |
--------------------------------------------------------------------------------
/configs/_base_/models/faster_rcnn_r50_fpn.py:
--------------------------------------------------------------------------------
1 | # model settings
2 | model = dict(
3 | type='FasterRCNN',
4 | backbone=dict(
5 | type='ResNet',
6 | depth=50,
7 | num_stages=4,
8 | out_indices=(0, 1, 2, 3),
9 | frozen_stages=1,
10 | norm_cfg=dict(type='BN', requires_grad=True),
11 | norm_eval=True,
12 | style='pytorch',
13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
14 | neck=dict(
15 | type='FPN',
16 | in_channels=[256, 512, 1024, 2048],
17 | out_channels=256,
18 | num_outs=5),
19 | rpn_head=dict(
20 | type='RPNHead',
21 | in_channels=256,
22 | feat_channels=256,
23 | anchor_generator=dict(
24 | type='AnchorGenerator',
25 | scales=[8],
26 | ratios=[0.5, 1.0, 2.0],
27 | strides=[4, 8, 16, 32, 64]),
28 | bbox_coder=dict(
29 | type='DeltaXYWHBBoxCoder',
30 | target_means=[.0, .0, .0, .0],
31 | target_stds=[1.0, 1.0, 1.0, 1.0]),
32 | loss_cls=dict(
33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
35 | roi_head=dict(
36 | type='StandardRoIHead',
37 | bbox_roi_extractor=dict(
38 | type='SingleRoIExtractor',
39 | roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
40 | out_channels=256,
41 | featmap_strides=[4, 8, 16, 32]),
42 | bbox_head=dict(
43 | type='Shared2FCBBoxHead',
44 | in_channels=256,
45 | fc_out_channels=1024,
46 | roi_feat_size=7,
47 | num_classes=80,
48 | bbox_coder=dict(
49 | type='DeltaXYWHBBoxCoder',
50 | target_means=[0., 0., 0., 0.],
51 | target_stds=[0.1, 0.1, 0.2, 0.2]),
52 | reg_class_agnostic=False,
53 | loss_cls=dict(
54 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
55 | loss_bbox=dict(type='L1Loss', loss_weight=1.0))),
56 | # model training and testing settings
57 | train_cfg=dict(
58 | rpn=dict(
59 | assigner=dict(
60 | type='MaxIoUAssigner',
61 | pos_iou_thr=0.7,
62 | neg_iou_thr=0.3,
63 | min_pos_iou=0.3,
64 | match_low_quality=True,
65 | ignore_iof_thr=-1),
66 | sampler=dict(
67 | type='RandomSampler',
68 | num=256,
69 | pos_fraction=0.5,
70 | neg_pos_ub=-1,
71 | add_gt_as_proposals=False),
72 | allowed_border=-1,
73 | pos_weight=-1,
74 | debug=False),
75 | rpn_proposal=dict(
76 | nms_pre=2000,
77 | max_per_img=1000,
78 | nms=dict(type='nms', iou_threshold=0.7),
79 | min_bbox_size=0),
80 | rcnn=dict(
81 | assigner=dict(
82 | type='MaxIoUAssigner',
83 | pos_iou_thr=0.5,
84 | neg_iou_thr=0.5,
85 | min_pos_iou=0.5,
86 | match_low_quality=False,
87 | ignore_iof_thr=-1),
88 | sampler=dict(
89 | type='RandomSampler',
90 | num=512,
91 | pos_fraction=0.25,
92 | neg_pos_ub=-1,
93 | add_gt_as_proposals=True),
94 | pos_weight=-1,
95 | debug=False)),
96 | test_cfg=dict(
97 | rpn=dict(
98 | nms_pre=1000,
99 | max_per_img=1000,
100 | nms=dict(type='nms', iou_threshold=0.7),
101 | min_bbox_size=0),
102 | rcnn=dict(
103 | score_thr=0.05,
104 | nms=dict(type='nms', iou_threshold=0.5),
105 | max_per_img=100)
106 | # soft-nms is also supported for rcnn testing
107 | # e.g., nms=dict(type='soft_nms', iou_threshold=0.5, min_score=0.05)
108 | ))
109 |
--------------------------------------------------------------------------------
/configs/_base_/models/mask_rcnn_r50_caffe_c4.py:
--------------------------------------------------------------------------------
1 | # model settings
2 | norm_cfg = dict(type='BN', requires_grad=False)
3 | model = dict(
4 | type='MaskRCNN',
5 | backbone=dict(
6 | type='ResNet',
7 | depth=50,
8 | num_stages=3,
9 | strides=(1, 2, 2),
10 | dilations=(1, 1, 1),
11 | out_indices=(2, ),
12 | frozen_stages=1,
13 | norm_cfg=norm_cfg,
14 | norm_eval=True,
15 | style='caffe',
16 | init_cfg=dict(
17 | type='Pretrained',
18 | checkpoint='open-mmlab://detectron2/resnet50_caffe')),
19 | rpn_head=dict(
20 | type='RPNHead',
21 | in_channels=1024,
22 | feat_channels=1024,
23 | anchor_generator=dict(
24 | type='AnchorGenerator',
25 | scales=[2, 4, 8, 16, 32],
26 | ratios=[0.5, 1.0, 2.0],
27 | strides=[16]),
28 | bbox_coder=dict(
29 | type='DeltaXYWHBBoxCoder',
30 | target_means=[.0, .0, .0, .0],
31 | target_stds=[1.0, 1.0, 1.0, 1.0]),
32 | loss_cls=dict(
33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
35 | roi_head=dict(
36 | type='StandardRoIHead',
37 | shared_head=dict(
38 | type='ResLayer',
39 | depth=50,
40 | stage=3,
41 | stride=2,
42 | dilation=1,
43 | style='caffe',
44 | norm_cfg=norm_cfg,
45 | norm_eval=True),
46 | bbox_roi_extractor=dict(
47 | type='SingleRoIExtractor',
48 | roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
49 | out_channels=1024,
50 | featmap_strides=[16]),
51 | bbox_head=dict(
52 | type='BBoxHead',
53 | with_avg_pool=True,
54 | roi_feat_size=7,
55 | in_channels=2048,
56 | num_classes=80,
57 | bbox_coder=dict(
58 | type='DeltaXYWHBBoxCoder',
59 | target_means=[0., 0., 0., 0.],
60 | target_stds=[0.1, 0.1, 0.2, 0.2]),
61 | reg_class_agnostic=False,
62 | loss_cls=dict(
63 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
64 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
65 | mask_roi_extractor=None,
66 | mask_head=dict(
67 | type='FCNMaskHead',
68 | num_convs=0,
69 | in_channels=2048,
70 | conv_out_channels=256,
71 | num_classes=80,
72 | loss_mask=dict(
73 | type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),
74 | # model training and testing settings
75 | train_cfg=dict(
76 | rpn=dict(
77 | assigner=dict(
78 | type='MaxIoUAssigner',
79 | pos_iou_thr=0.7,
80 | neg_iou_thr=0.3,
81 | min_pos_iou=0.3,
82 | match_low_quality=True,
83 | ignore_iof_thr=-1),
84 | sampler=dict(
85 | type='RandomSampler',
86 | num=256,
87 | pos_fraction=0.5,
88 | neg_pos_ub=-1,
89 | add_gt_as_proposals=False),
90 | allowed_border=0,
91 | pos_weight=-1,
92 | debug=False),
93 | rpn_proposal=dict(
94 | nms_pre=12000,
95 | max_per_img=2000,
96 | nms=dict(type='nms', iou_threshold=0.7),
97 | min_bbox_size=0),
98 | rcnn=dict(
99 | assigner=dict(
100 | type='MaxIoUAssigner',
101 | pos_iou_thr=0.5,
102 | neg_iou_thr=0.5,
103 | min_pos_iou=0.5,
104 | match_low_quality=False,
105 | ignore_iof_thr=-1),
106 | sampler=dict(
107 | type='RandomSampler',
108 | num=512,
109 | pos_fraction=0.25,
110 | neg_pos_ub=-1,
111 | add_gt_as_proposals=True),
112 | mask_size=14,
113 | pos_weight=-1,
114 | debug=False)),
115 | test_cfg=dict(
116 | rpn=dict(
117 | nms_pre=6000,
118 | nms=dict(type='nms', iou_threshold=0.7),
119 | max_per_img=1000,
120 | min_bbox_size=0),
121 | rcnn=dict(
122 | score_thr=0.05,
123 | nms=dict(type='nms', iou_threshold=0.5),
124 | max_per_img=100,
125 | mask_thr_binary=0.5)))
126 |
--------------------------------------------------------------------------------
/configs/_base_/models/mask_rcnn_r50_fpn.py:
--------------------------------------------------------------------------------
1 | # model settings
2 | model = dict(
3 | type='MaskRCNN',
4 | backbone=dict(
5 | type='ResNet',
6 | depth=50,
7 | num_stages=4,
8 | out_indices=(0, 1, 2, 3),
9 | frozen_stages=1,
10 | norm_cfg=dict(type='BN', requires_grad=True),
11 | norm_eval=True,
12 | style='pytorch',
13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
14 | neck=dict(
15 | type='FPN',
16 | in_channels=[256, 512, 1024, 2048],
17 | out_channels=256,
18 | num_outs=5),
19 | rpn_head=dict(
20 | type='RPNHead',
21 | in_channels=256,
22 | feat_channels=256,
23 | anchor_generator=dict(
24 | type='AnchorGenerator',
25 | scales=[8],
26 | ratios=[0.5, 1.0, 2.0],
27 | strides=[4, 8, 16, 32, 64]),
28 | bbox_coder=dict(
29 | type='DeltaXYWHBBoxCoder',
30 | target_means=[.0, .0, .0, .0],
31 | target_stds=[1.0, 1.0, 1.0, 1.0]),
32 | loss_cls=dict(
33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
35 | roi_head=dict(
36 | type='StandardRoIHead',
37 | bbox_roi_extractor=dict(
38 | type='SingleRoIExtractor',
39 | roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
40 | out_channels=256,
41 | featmap_strides=[4, 8, 16, 32]),
42 | bbox_head=dict(
43 | type='Shared2FCBBoxHead',
44 | in_channels=256,
45 | fc_out_channels=1024,
46 | roi_feat_size=7,
47 | num_classes=80,
48 | bbox_coder=dict(
49 | type='DeltaXYWHBBoxCoder',
50 | target_means=[0., 0., 0., 0.],
51 | target_stds=[0.1, 0.1, 0.2, 0.2]),
52 | reg_class_agnostic=False,
53 | loss_cls=dict(
54 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
55 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
56 | mask_roi_extractor=dict(
57 | type='SingleRoIExtractor',
58 | roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
59 | out_channels=256,
60 | featmap_strides=[4, 8, 16, 32]),
61 | mask_head=dict(
62 | type='FCNMaskHead',
63 | num_convs=4,
64 | in_channels=256,
65 | conv_out_channels=256,
66 | num_classes=80,
67 | loss_mask=dict(
68 | type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),
69 | # model training and testing settings
70 | train_cfg=dict(
71 | rpn=dict(
72 | assigner=dict(
73 | type='MaxIoUAssigner',
74 | pos_iou_thr=0.7,
75 | neg_iou_thr=0.3,
76 | min_pos_iou=0.3,
77 | match_low_quality=True,
78 | ignore_iof_thr=-1),
79 | sampler=dict(
80 | type='RandomSampler',
81 | num=256,
82 | pos_fraction=0.5,
83 | neg_pos_ub=-1,
84 | add_gt_as_proposals=False),
85 | allowed_border=-1,
86 | pos_weight=-1,
87 | debug=False),
88 | rpn_proposal=dict(
89 | nms_pre=2000,
90 | max_per_img=1000,
91 | nms=dict(type='nms', iou_threshold=0.7),
92 | min_bbox_size=0),
93 | rcnn=dict(
94 | assigner=dict(
95 | type='MaxIoUAssigner',
96 | pos_iou_thr=0.5,
97 | neg_iou_thr=0.5,
98 | min_pos_iou=0.5,
99 | match_low_quality=True,
100 | ignore_iof_thr=-1),
101 | sampler=dict(
102 | type='RandomSampler',
103 | num=512,
104 | pos_fraction=0.25,
105 | neg_pos_ub=-1,
106 | add_gt_as_proposals=True),
107 | mask_size=28,
108 | pos_weight=-1,
109 | debug=False)),
110 | test_cfg=dict(
111 | rpn=dict(
112 | nms_pre=1000,
113 | max_per_img=1000,
114 | nms=dict(type='nms', iou_threshold=0.7),
115 | min_bbox_size=0),
116 | rcnn=dict(
117 | score_thr=0.05,
118 | nms=dict(type='nms', iou_threshold=0.5),
119 | max_per_img=100,
120 | mask_thr_binary=0.5)))
121 |
--------------------------------------------------------------------------------
/configs/_base_/models/retinanet_r50_fpn.py:
--------------------------------------------------------------------------------
1 | # model settings
2 | model = dict(
3 | type='RetinaNet',
4 | backbone=dict(
5 | type='ResNet',
6 | depth=50,
7 | num_stages=4,
8 | out_indices=(0, 1, 2, 3),
9 | frozen_stages=1,
10 | norm_cfg=dict(type='BN', requires_grad=True),
11 | norm_eval=True,
12 | style='pytorch',
13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
14 | neck=dict(
15 | type='FPN',
16 | in_channels=[256, 512, 1024, 2048],
17 | out_channels=256,
18 | start_level=1,
19 | add_extra_convs='on_input',
20 | num_outs=5),
21 | bbox_head=dict(
22 | type='RetinaHead',
23 | num_classes=80,
24 | in_channels=256,
25 | stacked_convs=4,
26 | feat_channels=256,
27 | anchor_generator=dict(
28 | type='AnchorGenerator',
29 | octave_base_scale=4,
30 | scales_per_octave=3,
31 | ratios=[0.5, 1.0, 2.0],
32 | strides=[8, 16, 32, 64, 128]),
33 | bbox_coder=dict(
34 | type='DeltaXYWHBBoxCoder',
35 | target_means=[.0, .0, .0, .0],
36 | target_stds=[1.0, 1.0, 1.0, 1.0]),
37 | loss_cls=dict(
38 | type='FocalLoss',
39 | use_sigmoid=True,
40 | gamma=2.0,
41 | alpha=0.25,
42 | loss_weight=1.0),
43 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
44 | # model training and testing settings
45 | train_cfg=dict(
46 | assigner=dict(
47 | type='MaxIoUAssigner',
48 | pos_iou_thr=0.5,
49 | neg_iou_thr=0.4,
50 | min_pos_iou=0,
51 | ignore_iof_thr=-1),
52 | allowed_border=-1,
53 | pos_weight=-1,
54 | debug=False),
55 | test_cfg=dict(
56 | nms_pre=1000,
57 | min_bbox_size=0,
58 | score_thr=0.05,
59 | nms=dict(type='nms', iou_threshold=0.5),
60 | max_per_img=100))
61 |
--------------------------------------------------------------------------------
/configs/_base_/models/rpn_r50_caffe_c4.py:
--------------------------------------------------------------------------------
1 | # model settings
2 | model = dict(
3 | type='RPN',
4 | backbone=dict(
5 | type='ResNet',
6 | depth=50,
7 | num_stages=3,
8 | strides=(1, 2, 2),
9 | dilations=(1, 1, 1),
10 | out_indices=(2, ),
11 | frozen_stages=1,
12 | norm_cfg=dict(type='BN', requires_grad=False),
13 | norm_eval=True,
14 | style='caffe',
15 | init_cfg=dict(
16 | type='Pretrained',
17 | checkpoint='open-mmlab://detectron2/resnet50_caffe')),
18 | neck=None,
19 | rpn_head=dict(
20 | type='RPNHead',
21 | in_channels=1024,
22 | feat_channels=1024,
23 | anchor_generator=dict(
24 | type='AnchorGenerator',
25 | scales=[2, 4, 8, 16, 32],
26 | ratios=[0.5, 1.0, 2.0],
27 | strides=[16]),
28 | bbox_coder=dict(
29 | type='DeltaXYWHBBoxCoder',
30 | target_means=[.0, .0, .0, .0],
31 | target_stds=[1.0, 1.0, 1.0, 1.0]),
32 | loss_cls=dict(
33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
35 | # model training and testing settings
36 | train_cfg=dict(
37 | rpn=dict(
38 | assigner=dict(
39 | type='MaxIoUAssigner',
40 | pos_iou_thr=0.7,
41 | neg_iou_thr=0.3,
42 | min_pos_iou=0.3,
43 | ignore_iof_thr=-1),
44 | sampler=dict(
45 | type='RandomSampler',
46 | num=256,
47 | pos_fraction=0.5,
48 | neg_pos_ub=-1,
49 | add_gt_as_proposals=False),
50 | allowed_border=0,
51 | pos_weight=-1,
52 | debug=False)),
53 | test_cfg=dict(
54 | rpn=dict(
55 | nms_pre=12000,
56 | max_per_img=2000,
57 | nms=dict(type='nms', iou_threshold=0.7),
58 | min_bbox_size=0)))
59 |
--------------------------------------------------------------------------------
/configs/_base_/models/rpn_r50_fpn.py:
--------------------------------------------------------------------------------
1 | # model settings
2 | model = dict(
3 | type='RPN',
4 | backbone=dict(
5 | type='ResNet',
6 | depth=50,
7 | num_stages=4,
8 | out_indices=(0, 1, 2, 3),
9 | frozen_stages=1,
10 | norm_cfg=dict(type='BN', requires_grad=True),
11 | norm_eval=True,
12 | style='pytorch',
13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
14 | neck=dict(
15 | type='FPN',
16 | in_channels=[256, 512, 1024, 2048],
17 | out_channels=256,
18 | num_outs=5),
19 | rpn_head=dict(
20 | type='RPNHead',
21 | in_channels=256,
22 | feat_channels=256,
23 | anchor_generator=dict(
24 | type='AnchorGenerator',
25 | scales=[8],
26 | ratios=[0.5, 1.0, 2.0],
27 | strides=[4, 8, 16, 32, 64]),
28 | bbox_coder=dict(
29 | type='DeltaXYWHBBoxCoder',
30 | target_means=[.0, .0, .0, .0],
31 | target_stds=[1.0, 1.0, 1.0, 1.0]),
32 | loss_cls=dict(
33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
35 | # model training and testing settings
36 | train_cfg=dict(
37 | rpn=dict(
38 | assigner=dict(
39 | type='MaxIoUAssigner',
40 | pos_iou_thr=0.7,
41 | neg_iou_thr=0.3,
42 | min_pos_iou=0.3,
43 | ignore_iof_thr=-1),
44 | sampler=dict(
45 | type='RandomSampler',
46 | num=256,
47 | pos_fraction=0.5,
48 | neg_pos_ub=-1,
49 | add_gt_as_proposals=False),
50 | allowed_border=0,
51 | pos_weight=-1,
52 | debug=False)),
53 | test_cfg=dict(
54 | rpn=dict(
55 | nms_pre=2000,
56 | max_per_img=1000,
57 | nms=dict(type='nms', iou_threshold=0.7),
58 | min_bbox_size=0)))
59 |
--------------------------------------------------------------------------------
/configs/_base_/models/ssd300.py:
--------------------------------------------------------------------------------
1 | # model settings
2 | input_size = 300
3 | model = dict(
4 | type='SingleStageDetector',
5 | backbone=dict(
6 | type='SSDVGG',
7 | depth=16,
8 | with_last_pool=False,
9 | ceil_mode=True,
10 | out_indices=(3, 4),
11 | out_feature_indices=(22, 34),
12 | init_cfg=dict(
13 | type='Pretrained', checkpoint='open-mmlab://vgg16_caffe')),
14 | neck=dict(
15 | type='SSDNeck',
16 | in_channels=(512, 1024),
17 | out_channels=(512, 1024, 512, 256, 256, 256),
18 | level_strides=(2, 2, 1, 1),
19 | level_paddings=(1, 1, 0, 0),
20 | l2_norm_scale=20),
21 | bbox_head=dict(
22 | type='SSDHead',
23 | in_channels=(512, 1024, 512, 256, 256, 256),
24 | num_classes=80,
25 | anchor_generator=dict(
26 | type='SSDAnchorGenerator',
27 | scale_major=False,
28 | input_size=input_size,
29 | basesize_ratio_range=(0.15, 0.9),
30 | strides=[8, 16, 32, 64, 100, 300],
31 | ratios=[[2], [2, 3], [2, 3], [2, 3], [2], [2]]),
32 | bbox_coder=dict(
33 | type='DeltaXYWHBBoxCoder',
34 | target_means=[.0, .0, .0, .0],
35 | target_stds=[0.1, 0.1, 0.2, 0.2])),
36 | # model training and testing settings
37 | train_cfg=dict(
38 | assigner=dict(
39 | type='MaxIoUAssigner',
40 | pos_iou_thr=0.5,
41 | neg_iou_thr=0.5,
42 | min_pos_iou=0.,
43 | ignore_iof_thr=-1,
44 | gt_max_assign_all=False),
45 | smoothl1_beta=1.,
46 | allowed_border=-1,
47 | pos_weight=-1,
48 | neg_pos_ratio=3,
49 | debug=False),
50 | test_cfg=dict(
51 | nms_pre=1000,
52 | nms=dict(type='nms', iou_threshold=0.45),
53 | min_bbox_size=0,
54 | score_thr=0.02,
55 | max_per_img=200))
56 | cudnn_benchmark = True
57 |
--------------------------------------------------------------------------------
/configs/_base_/schedules/schedule_1x.py:
--------------------------------------------------------------------------------
1 | # optimizer
2 | optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
3 | optimizer_config = dict(grad_clip=None)
4 | # learning policy
5 | lr_config = dict(
6 | policy='step',
7 | warmup='linear',
8 | warmup_iters=500,
9 | warmup_ratio=0.001,
10 | step=[8, 11])
11 | runner = dict(type='EpochBasedRunner', max_epochs=12)
12 |
--------------------------------------------------------------------------------
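For reference, the step policy above keeps the base learning rate for epochs 0-7, multiplies it by 0.1 at epochs 8 and 11, and applies a linear warmup over the first 500 iterations. A minimal, self-contained sketch of the resulting multiplier (illustrative only; the actual logic lives in mmcv's LrUpdaterHook, and the 0.1 decay factor is mmcv's default gamma):

def lr_multiplier(cur_iter, cur_epoch, warmup_iters=500, warmup_ratio=0.001,
                  steps=(8, 11), gamma=0.1):
    # linear warmup from warmup_ratio * lr up to lr over the first warmup_iters iterations
    if cur_iter < warmup_iters:
        k = cur_iter / warmup_iters
        return warmup_ratio + k * (1 - warmup_ratio)
    # afterwards, decay by gamma at every epoch listed in `steps`
    return gamma ** sum(cur_epoch >= s for s in steps)

# schedule_1x: epochs 0-7 -> 1.0x, epochs 8-10 -> 0.1x, epoch 11 -> 0.01x
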
/configs/_base_/schedules/schedule_20e.py:
--------------------------------------------------------------------------------
1 | # optimizer
2 | optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
3 | optimizer_config = dict(grad_clip=None)
4 | # learning policy
5 | lr_config = dict(
6 | policy='step',
7 | warmup='linear',
8 | warmup_iters=500,
9 | warmup_ratio=0.001,
10 | step=[16, 19])
11 | runner = dict(type='EpochBasedRunner', max_epochs=20)
12 |
--------------------------------------------------------------------------------
/configs/_base_/schedules/schedule_2x.py:
--------------------------------------------------------------------------------
1 | # optimizer
2 | optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
3 | optimizer_config = dict(grad_clip=None)
4 | # learning policy
5 | lr_config = dict(
6 | policy='step',
7 | warmup='linear',
8 | warmup_iters=500,
9 | warmup_ratio=0.001,
10 | step=[16, 22])
11 | runner = dict(type='EpochBasedRunner', max_epochs=24)
12 |
--------------------------------------------------------------------------------
/configs/openinst/coco_to_uvo_ins.py:
--------------------------------------------------------------------------------
1 | # dataset settings
2 | img_norm_cfg = dict(
3 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
4 | train_pipeline = [
5 | dict(type='LoadImageFromFile'),
6 | dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
7 | dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
8 | dict(type='RandomFlip', flip_ratio=0.5),
9 | dict(type='Normalize', **img_norm_cfg),
10 | dict(type='Pad', size_divisor=32),
11 | dict(type='DefaultFormatBundle'),
12 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
13 | ]
14 | test_pipeline = [
15 | dict(type='LoadImageFromFile'),
16 | dict(
17 | type='MultiScaleFlipAug',
18 | img_scale=(1333, 800),
19 | flip=False,
20 | transforms=[
21 | dict(type='Resize', keep_ratio=True),
22 | dict(type='RandomFlip'),
23 | dict(type='Normalize', **img_norm_cfg),
24 | dict(type='Pad', size_divisor=32),
25 | dict(type='ImageToTensor', keys=['img']),
26 | dict(type='Collect', keys=['img']),
27 | ])
28 | ]
29 | data = dict(
30 | samples_per_gpu=4,
31 | workers_per_gpu=4,
32 | train=dict(
33 | type='CocoSplitDataset',
34 | is_class_agnostic=True,
35 | train_class='all',
36 | eval_class='all',
37 | ann_file='data/coco/annotations/instances_train2017.json',
38 | img_prefix='data/coco/train2017/',
39 | pipeline=train_pipeline),
40 | val=dict(
41 | type='UVODataset',
42 | is_class_agnostic=True,
43 | train_class='all',
44 | eval_class='all',
45 | ann_file='data/UVO/ann/UVO_frame_val.json',
46 | img_prefix='data/UVO/images/',
47 | pipeline=test_pipeline),
48 | test=dict(
49 | type='UVODataset',
50 | is_class_agnostic=True,
51 | train_class='all',
52 | eval_class='all',
53 | ann_file='data/UVO/ann/UVO_frame_val.json',
54 | img_prefix='data/UVO/images/',
55 | pipeline=test_pipeline))
56 | evaluation = dict(interval=1, metric=['bbox', 'segm'])
57 |
--------------------------------------------------------------------------------
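This file only defines the class-agnostic COCO->UVO data pipeline; the detector configs below pull it in via `_base_`. A minimal sketch of loading it directly (assuming mmcv/mmdet are installed and the repo's top-level `datasets` package is importable so that `CocoSplitDataset` and `UVODataset` are registered):

from mmcv import Config
from mmdet.datasets import build_dataset

import datasets  # noqa: F401 -- registers CocoSplitDataset / UVODataset (assumption)

cfg = Config.fromfile('configs/openinst/coco_to_uvo_ins.py')
train_set = build_dataset(cfg.data.train)  # class-agnostic COCO train2017
test_set = build_dataset(cfg.data.test)    # class-agnostic UVO frame val
print(len(train_set), len(test_set))
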
/configs/openinst/queryinst_r50_1x_coco.py:
--------------------------------------------------------------------------------
1 | dataset_file = './coco_to_uvo_ins.py'
2 | init_cfg = dict(type='Pretrained', checkpoint='resnet50-0676ba61.pth')
3 | init_cfg = None  # overrides the pretrained checkpoint above; the backbone is then randomly initialized
4 | tb_hook = dict(type='TensorboardLoggerHook')
5 | _base_ = [
6 | dataset_file,
7 | '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py'
8 | ]
9 | num_stages = 6
10 | num_proposals = 100
11 | model = dict(
12 | type='QueryInst',
13 | backbone=dict(
14 | type='ResNet',
15 | depth=50,
16 | num_stages=4,
17 | out_indices=(0, 1, 2, 3),
18 | frozen_stages=-1,
19 | norm_cfg=dict(type='SyncBN', requires_grad=True),
20 | norm_eval=False,
21 | style='pytorch',
22 | init_cfg=init_cfg,
23 | dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
24 | stage_with_dcn=(False, True, True, True),
25 | with_cp=True),
26 | neck=dict(
27 | type='BiFPN',
28 | in_channels=[256, 512, 1024, 2048],
29 | out_channels=256,
30 | num_outs=6,
31 | num_repeats=6,
32 | norm='SyncBN'),
33 | rpn_head=dict(
34 | type='EmbeddingRPNHead',
35 | num_proposals=num_proposals,
36 | proposal_feature_channel=256),
37 | roi_head=dict(
38 | type='SparseScoreRoIHead',
39 | num_stages=num_stages,
40 | stage_loss_weights=[1] * num_stages,
41 | proposal_feature_channel=256,
42 | bbox_roi_extractor=dict(
43 | type='SingleRoIExtractor',
44 | roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=2),
45 | out_channels=256,
46 | featmap_strides=[4, 8, 16, 32, 64, 128]),
47 | mask_roi_extractor=dict(
48 | type='SingleRoIExtractor',
49 | roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=2),
50 | out_channels=256,
51 | featmap_strides=[4, 8, 16, 32, 64, 128]),
52 | bbox_head=[
53 | dict(
54 | type='DIIScoreHead',
55 | num_classes=1,
56 | num_ffn_fcs=2,
57 | num_heads=8,
58 | num_cls_fcs=1,
59 | num_reg_fcs=3,
60 | feedforward_channels=2048,
61 | in_channels=256,
62 | dropout=0.0,
63 | ffn_act_cfg=dict(type='ReLU', inplace=True),
64 | dynamic_conv_cfg=dict(
65 | type='DynamicConv',
66 | in_channels=256,
67 | feat_channels=64,
68 | out_channels=256,
69 | input_feat_shape=7,
70 | act_cfg=dict(type='ReLU', inplace=True),
71 | norm_cfg=dict(type='LN')),
72 | loss_bbox=dict(type='L1Loss', loss_weight=5.0),
73 | loss_iou=dict(type='GIoULoss', loss_weight=2.0),
74 | loss_cls=dict(type='L1Loss', loss_weight=2.0),
75 | bbox_coder=dict(
76 | type='DeltaXYWHBBoxCoder',
77 | clip_border=False,
78 | target_means=[0., 0., 0., 0.],
79 | target_stds=[0.5, 0.5, 1., 1.])) for _ in range(num_stages)
80 | ],
81 | mask_head=[
82 | dict(
83 | type='DynamicMaskHead',
84 | dynamic_conv_cfg=dict(
85 | type='DynamicConv',
86 | in_channels=256,
87 | feat_channels=64,
88 | out_channels=256,
89 | input_feat_shape=14,
90 | with_proj=False,
91 | act_cfg=dict(type='ReLU', inplace=True),
92 | norm_cfg=dict(type='LN')),
93 | num_convs=4,
94 | num_classes=1,
95 | roi_feat_size=14,
96 | in_channels=256,
97 | conv_kernel_size=3,
98 | conv_out_channels=256,
99 | class_agnostic=False,
100 | norm_cfg=dict(type='BN'),
101 | upsample_cfg=dict(type='deconv', scale_factor=2),
102 | loss_mask=dict(
103 | type='DiceLoss',
104 | loss_weight=1.0,
105 | use_sigmoid=True,
106 | activate=False,
107 | eps=1e-5)) for _ in range(num_stages)
108 | ]),
109 | # training and testing settings
110 | train_cfg=dict(
111 | rpn=None,
112 | rcnn=[
113 | dict(
114 | assigner=dict(
115 | type='HungarianAssigner',
116 | cls_cost=dict(type='FocalLossCost', weight=2.0),
117 | reg_cost=dict(type='BBoxL1Cost', weight=5.0),
118 | iou_cost=dict(type='IoUCost', iou_mode='giou',
119 | weight=2.0)),
120 | sampler=dict(type='PseudoSampler'),
121 | pos_weight=1,
122 | mask_size=28,
123 | ) for _ in range(num_stages)
124 | ]),
125 | test_cfg=dict(
126 | rpn=None, rcnn=dict(max_per_img=num_proposals, mask_thr_binary=0.5)))
127 |
128 | # optimizer
129 | optimizer = dict(
130 | _delete_=True,
131 | type='AdamW',
132 | lr=0.0001,
133 | weight_decay=0.0001,
134 | paramwise_cfg=dict(
135 | custom_keys={'backbone': dict(lr_mult=0.1, decay_mult=1.0),
136 | 'init_proposal_features':dict(lr_mult=1.0, decay_mult=0.0),
137 | 'init_proposal_bboxes': dict(lr_mult=1.0, decay_mult=0.0),
138 | },
139 | norm_decay_mult=0.0)
140 | )
141 |
142 | optimizer_config = dict(
143 | _delete_=True, grad_clip=dict(max_norm=0.1, norm_type=2))
144 | # learning policy
145 | lr_config = dict(policy='step', step=[8, 11], warmup_iters=1000)
146 | runner = dict(type='EpochBasedRunner', max_epochs=12)
147 |
148 | # log setting
149 | log_config = dict(
150 | interval=20,
151 | hooks=[
152 | dict(type='TextLoggerHook'),
153 | tb_hook,
154 | ])
155 |
156 | checkpoint_config = dict(interval=3)
157 | resume_from = None
158 | # EMA of the model weights during training
159 | custom_hooks = [dict(
160 | type='ExpMomentumEMAHook',
161 | resume_from=resume_from,
162 | momentum=0.0001,
163 | priority=49)]
--------------------------------------------------------------------------------
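A minimal sketch of instantiating this config outside the training scripts (assumptions: mmdet/mmcv are installed and the repo's `core`, `datasets` and `models` packages are importable so that BiFPN, SparseScoreRoIHead, DIIScoreHead and the custom hooks are registered; tools/train.py and tools/dist_train.sh remain the intended entry points for actual training):

from mmcv import Config
from mmdet.models import build_detector

import core, datasets, models  # noqa: E401,F401 -- register the OpenInst modules (assumption)

cfg = Config.fromfile('configs/openinst/queryinst_r50_1x_coco.py')
model = build_detector(cfg.model)  # train_cfg/test_cfg are nested inside cfg.model here
model.init_weights()
print(type(model).__name__, cfg.model.roi_head.num_stages)
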
/configs/openinst/queryinst_r50_3x_lsj_coco.py:
--------------------------------------------------------------------------------
1 | _base_ = './queryinst_r50_1x_coco.py'
2 | num_proposals = 100
3 | model = dict(
4 | rpn_head=dict(num_proposals=num_proposals),
5 | test_cfg=dict(
6 | _delete_=True,
7 | rpn=None,
8 | rcnn=dict(max_per_img=num_proposals, mask_thr_binary=0.5)))
9 | img_norm_cfg = dict(
10 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
11 |
12 | # augmentation strategy originates from DETR.
13 | image_size = (1024, 1024)
14 | train_pipeline = [
15 | dict(type='LoadImageFromFile'),
16 | dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
17 | dict(
18 | type='Resize',
19 | img_scale=image_size,
20 | ratio_range=(0.1, 2.0),
21 | multiscale_mode='range',
22 | keep_ratio=True),
23 | dict(
24 | type='RandomCrop',
25 | crop_type='absolute_range',
26 | crop_size=image_size,
27 | recompute_bbox=True,
28 | allow_negative_crop=True),
29 | dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)),
30 | dict(type='RandomFlip', flip_ratio=0.5),
31 | dict(type='Pad', size=image_size),
32 | dict(type='Normalize', **img_norm_cfg),
33 | dict(type='DefaultFormatBundle'),
34 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
35 | ]
36 | test_pipeline = [
37 | dict(type='LoadImageFromFile'),
38 | dict(
39 | type='MultiScaleFlipAug',
40 | img_scale=(1333, 800),
41 | flip=False,
42 | transforms=[
43 | dict(type='Resize', keep_ratio=True),
44 | dict(type='RandomFlip'),
45 | dict(type='Normalize', **img_norm_cfg),
46 | dict(type='Pad', size_divisor=32),
47 | dict(type='ImageToTensor', keys=['img']),
48 | dict(type='Collect', keys=['img']),
49 | ])
50 | ]
51 |
52 | data = dict(train=dict(pipeline=train_pipeline))
53 |
54 | lr_config = dict(policy='step', step=[50,])
55 | runner = dict(type='EpochBasedRunner', max_epochs=50)
56 |
57 | resume_from = None
58 | custom_hooks = [dict(
59 | type='ExpMomentumEMAHook',
60 | resume_from=resume_from,
61 | momentum=0.0001,
62 | priority=49)]
--------------------------------------------------------------------------------
/core/__init__.py:
--------------------------------------------------------------------------------
1 | from .bbox import *
2 | from .hook import *
--------------------------------------------------------------------------------
/core/bbox/__init__.py:
--------------------------------------------------------------------------------
1 | from .assigners import *
2 | from .match_costs import *
--------------------------------------------------------------------------------
/core/bbox/assigners/__init__.py:
--------------------------------------------------------------------------------
1 | from .hungarian_oln_assigner import HungarianOlnAssigner
--------------------------------------------------------------------------------
/core/bbox/assigners/hungarian_oln_assigner.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
2 | import torch
3 |
4 | from mmdet.core.bbox.builder import BBOX_ASSIGNERS
5 | from mmdet.core.bbox.match_costs import build_match_cost
6 | from mmdet.core.bbox.transforms import bbox_cxcywh_to_xyxy
7 | from mmdet.core.bbox.assigners.assign_result import AssignResult
8 | from mmdet.core.bbox.assigners.base_assigner import BaseAssigner
9 |
10 | try:
11 | from scipy.optimize import linear_sum_assignment
12 | except ImportError:
13 | linear_sum_assignment = None
14 |
15 |
16 | @BBOX_ASSIGNERS.register_module()
17 | class HungarianOlnAssigner(BaseAssigner):
18 | """Computes one-to-one matching between predictions and ground truth.
19 |
20 | This class computes an assignment between the targets and the predictions
21 | based on the costs. The costs are weighted sum of three components:
22 | classification cost, regression L1 cost and regression iou cost. The
23 | targets don't include the no_object, so generally there are more
24 | predictions than targets. After the one-to-one matching, the un-matched
25 | are treated as backgrounds. Thus each query prediction will be assigned
26 | with `0` or a positive integer indicating the ground truth index:
27 |
28 | - 0: negative sample, no assigned gt
29 | - positive integer: positive sample, index (1-based) of assigned gt
30 |
31 |     Args:
32 |         cls_cost (dict, optional): Config of the classification
33 |             (objectness) cost. Unlike the standard ``HungarianAssigner``,
34 |             this cost is called with ``(cls_pred, bboxes, gt_bboxes)``, so
35 |             an objectness-style cost such as ``ObjectnessL1Cost`` is
36 |             expected here.
37 |             Default ``dict(type='ClassificationCost', weight=1.)``.
38 |         reg_cost (dict, optional): Config of the regression L1 cost.
39 |             Default ``dict(type='BBoxL1Cost', weight=1.0)``.
40 |         iou_cost (dict, optional): Config of the regression iou cost.
41 |             Default
42 |             ``dict(type='IoUCost', iou_mode='giou', weight=1.0)``.
43 |     """
44 |
45 | def __init__(self,
46 | cls_cost=dict(type='ClassificationCost', weight=1.),
47 | reg_cost=dict(type='BBoxL1Cost', weight=1.0),
48 | iou_cost=dict(type='IoUCost', iou_mode='giou', weight=1.0)):
49 | self.cls_cost = build_match_cost(cls_cost)
50 | self.reg_cost = build_match_cost(reg_cost)
51 | self.iou_cost = build_match_cost(iou_cost)
52 |
53 | def assign(self,
54 | bbox_pred,
55 | cls_pred,
56 | gt_bboxes,
57 | gt_labels,
58 | img_meta,
59 | gt_bboxes_ignore=None,
60 | eps=1e-7):
61 | """Computes one-to-one matching based on the weighted costs.
62 |
63 | This method assign each query prediction to a ground truth or
64 | background. The `assigned_gt_inds` with -1 means don't care,
65 | 0 means negative sample, and positive number is the index (1-based)
66 | of assigned gt.
67 | The assignment is done in the following steps, the order matters.
68 |
69 | 1. assign every prediction to -1
70 | 2. compute the weighted costs
71 | 3. do Hungarian matching on CPU based on the costs
72 | 4. assign all to 0 (background) first, then for each matched pair
73 | between predictions and gts, treat this prediction as foreground
74 | and assign the corresponding gt index (plus 1) to it.
75 |
76 | Args:
77 | bbox_pred (Tensor): Predicted boxes with normalized coordinates
78 | (cx, cy, w, h), which are all in range [0, 1]. Shape
79 | [num_query, 4].
80 | cls_pred (Tensor): Predicted classification logits, shape
81 | [num_query, num_class].
82 | gt_bboxes (Tensor): Ground truth boxes with unnormalized
83 | coordinates (x1, y1, x2, y2). Shape [num_gt, 4].
84 | gt_labels (Tensor): Label of `gt_bboxes`, shape (num_gt,).
85 | img_meta (dict): Meta information for current image.
86 | gt_bboxes_ignore (Tensor, optional): Ground truth bboxes that are
87 | labelled as `ignored`. Default None.
88 | eps (int | float, optional): A value added to the denominator for
89 | numerical stability. Default 1e-7.
90 |
91 | Returns:
92 | :obj:`AssignResult`: The assigned result.
93 | """
94 | assert gt_bboxes_ignore is None, \
95 | 'Only case when gt_bboxes_ignore is None is supported.'
96 | num_gts, num_bboxes = gt_bboxes.size(0), bbox_pred.size(0)
97 |
98 | # 1. assign -1 by default
99 | assigned_gt_inds = bbox_pred.new_full((num_bboxes, ),
100 | -1,
101 | dtype=torch.long)
102 | assigned_labels = bbox_pred.new_full((num_bboxes, ),
103 | -1,
104 | dtype=torch.long)
105 | if num_gts == 0 or num_bboxes == 0:
106 | # No ground truth or boxes, return empty assignment
107 | if num_gts == 0:
108 | # No ground truth, assign all to background
109 | assigned_gt_inds[:] = 0
110 | return AssignResult(
111 | num_gts, assigned_gt_inds, None, labels=assigned_labels)
112 | img_h, img_w, _ = img_meta['img_shape']
113 | factor = gt_bboxes.new_tensor([img_w, img_h, img_w,
114 | img_h]).unsqueeze(0)
115 |
116 | # 2. compute the weighted costs
117 | # regression L1 cost
118 | normalize_gt_bboxes = gt_bboxes / factor
119 | reg_cost = self.reg_cost(bbox_pred, normalize_gt_bboxes)
120 | # regression iou cost, defaultly giou is used in official DETR.
121 | bboxes = bbox_cxcywh_to_xyxy(bbox_pred) * factor
122 | iou_cost = self.iou_cost(bboxes, gt_bboxes)
123 | # classification and bboxcost.
124 | cls_cost = self.cls_cost(cls_pred, bboxes, gt_bboxes)
125 | # weighted sum of above three costs
126 | cost = cls_cost + reg_cost + iou_cost
127 |
128 | # 3. do Hungarian matching on CPU using linear_sum_assignment
129 | cost = cost.detach().cpu()
130 | if linear_sum_assignment is None:
131 | raise ImportError('Please run "pip install scipy" '
132 | 'to install scipy first.')
133 | matched_row_inds, matched_col_inds = linear_sum_assignment(cost)
134 | matched_row_inds = torch.from_numpy(matched_row_inds).to(
135 | bbox_pred.device)
136 | matched_col_inds = torch.from_numpy(matched_col_inds).to(
137 | bbox_pred.device)
138 |
139 | # 4. assign backgrounds and foregrounds
140 | # assign all indices to backgrounds first
141 | assigned_gt_inds[:] = 0
142 | # assign foregrounds based on matching results
143 | assigned_gt_inds[matched_row_inds] = matched_col_inds + 1
144 | assigned_labels[matched_row_inds] = gt_labels[matched_col_inds]
145 | return AssignResult(
146 | num_gts, assigned_gt_inds, None, labels=assigned_labels)
147 |
--------------------------------------------------------------------------------
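A toy illustration (independent of mmdet) of steps 3-4 above: Hungarian matching on a small cost matrix, then converting the matched columns into the 1-based gt indices used by `AssignResult`:

import numpy as np
from scipy.optimize import linear_sum_assignment

# 3 query predictions (rows) x 2 ground-truth boxes (columns)
cost = np.array([[0.9, 0.2],
                 [0.1, 0.8],
                 [0.5, 0.4]])
rows, cols = linear_sum_assignment(cost)                 # one-to-one matching, minimum total cost
assigned_gt_inds = np.zeros(cost.shape[0], dtype=int)    # 0 = background
assigned_gt_inds[rows] = cols + 1                        # 1-based index of the matched gt
print(assigned_gt_inds)  # [2 1 0]: query 0 -> gt 1, query 1 -> gt 0, query 2 unmatched
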
/core/bbox/match_costs/__init__.py:
--------------------------------------------------------------------------------
1 | from .objectness_l1_cost import ObjectnessL1Cost
--------------------------------------------------------------------------------
/core/bbox/match_costs/objectness_l1_cost.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn.functional as F
3 |
4 | from mmdet.core.bbox.iou_calculators import bbox_overlaps
5 | from mmdet.core.bbox.transforms import bbox_cxcywh_to_xyxy, bbox_xyxy_to_cxcywh
6 | from mmdet.core.bbox.match_costs.builder import MATCH_COST
7 |
8 | @MATCH_COST.register_module()
9 | class ObjectnessL1Cost:
10 | """BBoxL1Cost.
11 |
12 | Args:
13 | weight (int | float, optional): loss_weight
14 | iou_mode (str, optional): iou mode such as 'iou' | 'giou'
15 |
16 | Examples:
17 | >>> from mmdet.core.bbox.match_costs.match_cost import BBoxL1Cost
18 | >>> import torch
19 | >>> self = L1Cost()
20 | >>> bbox_pred = torch.rand(10, 1)
21 | >>> gt_bboxes= torch.FloatTensor([0.8, 0.9])
22 | >>> self(bbox_pred, gt_bboxes)
23 | tensor([[1.6172, 1.6422]])
24 | """
25 |
26 | def __init__(self, weight=1., iou_mode='iou'):
27 | self.weight = weight
28 | self.iou_mode = iou_mode
29 |
30 | def __call__(self, cls_pred, bboxes, gt_bboxes):
31 |         """
32 |         Args:
33 |             cls_pred (Tensor): Predicted objectness scores, shape (num_query, 1).
34 |             bboxes (Tensor): Predicted boxes with unnormalized coordinates
35 |                 (x1, y1, x2, y2). Shape (num_query, 4).
36 |             gt_bboxes (Tensor): Ground truth boxes with unnormalized
37 |                 coordinates (x1, y1, x2, y2). Shape (num_gt, 4).
38 | 
39 |         Returns:
40 |             torch.Tensor: objectness cost matrix with weight, shape (num_query, num_gt)
41 |         """
42 | # overlaps: [num_gt, num_bboxes]
43 | overlaps = bbox_overlaps(
44 | gt_bboxes, bboxes, mode=self.iou_mode, is_aligned=False)
45 | gt_targets, _ = torch.max(overlaps, 1, keepdim=True) # [num_gt, 1]
46 |
47 | objectness_cost = torch.cdist(cls_pred, gt_targets, p=1)
48 | return objectness_cost * self.weight
--------------------------------------------------------------------------------
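A toy illustration of the cost above (assuming mmdet is installed for `bbox_overlaps`): the target of each ground-truth box is its best IoU over the predicted boxes, and the cost is the L1 distance between that target and each query's predicted objectness:

import torch
from mmdet.core.bbox.iou_calculators import bbox_overlaps

cls_pred = torch.tensor([[0.9], [0.3], [0.1]])           # predicted objectness per query
bboxes = torch.tensor([[0., 0., 10., 10.],
                       [5., 5., 15., 15.],
                       [20., 20., 30., 30.]])            # predicted boxes (x1, y1, x2, y2)
gt_bboxes = torch.tensor([[0., 0., 10., 10.]])

overlaps = bbox_overlaps(gt_bboxes, bboxes)              # [num_gt, num_pred] IoUs
gt_targets, _ = overlaps.max(dim=1, keepdim=True)        # best achievable IoU per gt
cost = torch.cdist(cls_pred, gt_targets, p=1)            # [num_pred, num_gt]
print(cost)  # lowest for the query whose objectness is closest to the target IoU (1.0)
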
/core/hook/__init__.py:
--------------------------------------------------------------------------------
1 | from .ema import *
--------------------------------------------------------------------------------
/core/hook/ema.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
2 | import math
3 |
4 | from mmcv.parallel import is_module_wrapper
5 | from mmcv.runner.hooks import HOOKS, Hook
6 |
7 |
8 | class BaseEMAHook(Hook):
9 | """Exponential Moving Average Hook.
10 |
11 |     Use Exponential Moving Average on all parameters of the model during
12 |     training. Every parameter has an EMA backup, which is updated by the
13 |     formula below. EMAHook takes priority over EvalHook and CheckpointHook.
14 |     Note that the original model parameters are saved in the ema fields after training.
15 |
16 | Args:
17 | momentum (float): The momentum used for updating ema parameter.
18 | Ema's parameter are updated with the formula:
19 | `ema_param = (1-momentum) * ema_param + momentum * cur_param`.
20 | Defaults to 0.0002.
21 | skip_buffers (bool): Whether to skip the model buffers, such as
22 | batchnorm running stats (running_mean, running_var), it does not
23 | perform the ema operation. Default to False.
24 | interval (int): Update ema parameter every interval iteration.
25 | Defaults to 1.
26 | resume_from (str, optional): The checkpoint path. Defaults to None.
27 | momentum_fun (func, optional): The function to change momentum
28 | during early iteration (also warmup) to help early training.
29 | It uses `momentum` as a constant. Defaults to None.
30 | """
31 |
32 | def __init__(self,
33 | momentum=0.0002,
34 | interval=1,
35 | skip_buffers=False,
36 | resume_from=None,
37 | momentum_fun=None):
38 | assert 0 < momentum < 1
39 | self.momentum = momentum
40 | self.skip_buffers = skip_buffers
41 | self.interval = interval
42 | self.checkpoint = resume_from
43 | self.momentum_fun = momentum_fun
44 |
45 | def before_run(self, runner):
46 | """To resume model with it's ema parameters more friendly.
47 |
48 | Register ema parameter as ``named_buffer`` to model.
49 | """
50 | model = runner.model
51 | if is_module_wrapper(model):
52 | model = model.module
53 | self.param_ema_buffer = {}
54 | if self.skip_buffers:
55 | self.model_parameters = dict(model.named_parameters())
56 | else:
57 | self.model_parameters = model.state_dict()
58 | for name, value in self.model_parameters.items():
59 | # "." is not allowed in module's buffer name
60 | buffer_name = f"ema_{name.replace('.', '_')}"
61 | self.param_ema_buffer[name] = buffer_name
62 | model.register_buffer(buffer_name, value.data.clone())
63 | self.model_buffers = dict(model.named_buffers())
64 | if self.checkpoint is not None:
65 | runner.resume(self.checkpoint, resume_optimizer=False) # !!!ban loading optimizer state_dict
66 |
67 | def get_momentum(self, runner):
68 | return self.momentum_fun(runner.iter) if self.momentum_fun else \
69 | self.momentum
70 |
71 | def after_train_iter(self, runner):
72 | """Update ema parameter every self.interval iterations."""
73 | if (runner.iter + 1) % self.interval != 0:
74 | return
75 | momentum = self.get_momentum(runner)
76 | for name, parameter in self.model_parameters.items():
77 | # exclude num_tracking
78 | if parameter.dtype.is_floating_point:
79 | buffer_name = self.param_ema_buffer[name]
80 | buffer_parameter = self.model_buffers[buffer_name]
81 | buffer_parameter.mul_(1 - momentum).add_(
82 | parameter.data, alpha=momentum)
83 |
84 | def after_train_epoch(self, runner):
85 | """We load parameter values from ema backup to model before the
86 | EvalHook."""
87 | self._swap_ema_parameters()
88 |
89 | def before_train_epoch(self, runner):
90 | """We recover model's parameter from ema backup after last epoch's
91 | EvalHook."""
92 | self._swap_ema_parameters()
93 |
94 | def _swap_ema_parameters(self):
95 | """Swap the parameter of model with parameter in ema_buffer."""
96 | for name, value in self.model_parameters.items():
97 | temp = value.data.clone()
98 | ema_buffer = self.model_buffers[self.param_ema_buffer[name]]
99 | value.data.copy_(ema_buffer.data)
100 | ema_buffer.data.copy_(temp)
101 |
102 |
103 | @HOOKS.register_module()
104 | class LExpMomentumEMAHook(BaseEMAHook):
105 | """EMAHook using exponential momentum strategy.
106 |
107 | Args:
108 | total_iter (int): The total number of iterations of EMA momentum.
109 | Defaults to 2000.
110 | """
111 |
112 | def __init__(self, total_iter=2000, **kwargs):
113 | super(LExpMomentumEMAHook, self).__init__(**kwargs)
114 | self.momentum_fun = lambda x: (1 - self.momentum) * math.exp(-(
115 | 1 + x) / total_iter) + self.momentum
116 |
--------------------------------------------------------------------------------
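A numeric illustration of the exponential momentum strategy above: the effective momentum starts near 1 (the EMA closely tracks the raw weights early on) and decays exponentially towards the configured constant with time constant `total_iter`, after which `ema = (1 - m) * ema + m * param` behaves like a long moving average:

import math

momentum, total_iter = 0.0002, 2000
momentum_fun = lambda x: (1 - momentum) * math.exp(-(1 + x) / total_iter) + momentum

for it in (0, 500, 2000, 10000):
    print(f'iter {it:>5}: effective momentum {momentum_fun(it):.4f}')
# iter 0: ~1.00, iter 500: ~0.78, iter 2000: ~0.37, iter 10000: ~0.007
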
/datasets/__init__.py:
--------------------------------------------------------------------------------
1 | from .coco_split_dataset import CocoSplitDataset
2 | from .objects365_split_dataset import Objects365SplitDataset
3 | from .uvo_dataset import UVODataset
4 |
5 | from .pipelines import CopyPaste
--------------------------------------------------------------------------------
/datasets/coco.py:
--------------------------------------------------------------------------------
1 | import pickle
2 | import numpy as np
3 | from pycocotools.coco import COCO
4 |
5 |
6 | from mmdet.datasets.coco import CocoDataset
7 |
8 | class CocoAnnDataset(CocoDataset):
9 |
10 | def __init__(self,
11 | **kwargs):
12 | # We convert all category IDs into 1 for the class-agnostic training and
13 | # evaluation. We train on train_class and evaluate on eval_class split.
14 | super(CocoAnnDataset, self).__init__(**kwargs)
15 | self.dataset_stat()
16 |
17 | CLASSES = ('person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
18 | 'train', 'truck', 'boat', 'traffic light', 'fire hydrant',
19 | 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog',
20 | 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe',
21 | 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
22 | 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat',
23 | 'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
24 | 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
25 | 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
26 | 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
27 | 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
28 | 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave',
29 | 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock',
30 | 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush')
31 |
32 | def dataset_stat(self):
33 | num_images = len(self)
34 | num_instances = 0
35 | for i in range(num_images):
36 | ann = self.get_ann_info(i)
37 | num_bbox = ann['bboxes'].shape[0]
38 | num_instances += num_bbox
39 | print(f'Dataset images number: {num_images}')
40 | print(f'Dataset instances number: {num_instances}')
41 |
42 | def load_annotations(self, ann_file):
43 | """Load annotation from COCO style annotation file.
44 |
45 | Args:
46 | ann_file (str): Path of annotation file.
47 |
48 | Returns:
49 | list[dict]: Annotation info from COCO api.
50 | """
51 |
52 | self.coco = COCO(ann_file)
53 | # The order of returned `cat_ids` will not
54 | # change with the order of the CLASSES
55 | self.cat_ids = self.coco.get_cat_ids(cat_names=self.CLASSES)
56 |
57 | self.cat2label = {cat_id: i for i, cat_id in enumerate(self.cat_ids)}
58 | self.img_ids = self.coco.get_img_ids()
59 | data_infos = []
60 | total_ann_ids = []
61 | for i in self.img_ids:
62 | info = self.coco.load_imgs([i])[0]
63 | info['filename'] = info['file_name']
64 | data_infos.append(info)
65 | ann_ids = self.coco.get_ann_ids(img_ids=[i])
66 | total_ann_ids.extend(ann_ids)
67 | assert len(set(total_ann_ids)) == len(
68 | total_ann_ids), f"Annotation ids in '{ann_file}' are not unique!"
69 | return data_infos
70 |
71 | def _filter_imgs(self, min_size=0):
72 | """Filter images too small or without ground truths."""
73 | valid_inds = []
74 | # obtain images that contain annotation
75 | ids_with_ann = set(_['image_id'] for _ in self.coco.anns.values())
76 | # obtain images that contain annotations of the required categories
77 | ids_in_cat = set()
78 | for i, class_id in enumerate(self.cat_ids):
79 | ids_in_cat |= set(self.coco.cat_img_map[class_id])
80 | # merge the image id sets of the two conditions and use the merged set
81 | # to filter out images if self.filter_empty_gt=True
82 | ids_in_cat &= ids_with_ann
83 |
84 | valid_img_ids = []
85 | for i, img_info in enumerate(self.data_infos):
86 | img_id = self.img_ids[i]
87 | if self.filter_empty_gt and img_id not in ids_in_cat:
88 | continue
89 | if min(img_info['width'], img_info['height']) >= min_size:
90 | valid_inds.append(i)
91 | valid_img_ids.append(img_id)
92 | self.img_ids = valid_img_ids
93 | return valid_inds
94 |
95 | def get_ann_info(self, idx):
96 | """Get COCO annotation by index.
97 |
98 | Args:
99 | idx (int): Index of data.
100 |
101 | Returns:
102 | dict: Annotation info of specified index.
103 | """
104 |
105 | img_id = self.data_infos[idx]['id']
106 | ann_ids = self.coco.get_ann_ids(img_ids=[img_id])
107 | ann_info = self.coco.load_anns(ann_ids)
108 | return self._parse_ann_info(self.data_infos[idx], ann_info)
109 |
110 | def _parse_ann_info(self, img_info, ann_info):
111 | """Parse bbox and mask annotation.
112 |
113 | Args:
114 | ann_info (list[dict]): Annotation info of an image.
115 | with_mask (bool): Whether to parse mask annotations.
116 |
117 | Returns:
118 | dict: A dict containing the following keys: bboxes, bboxes_ignore,\
119 | labels, masks, seg_map. "masks" are raw annotations and not \
120 | decoded into binary masks.
121 | """
122 | gt_bboxes = []
123 | gt_labels = []
124 | gt_bboxes_ignore = []
125 | gt_masks_ann = []
126 | for i, ann in enumerate(ann_info):
127 | if ann.get('ignore', False):
128 | continue
129 | x1, y1, w, h = ann['bbox']
130 | inter_w = max(0, min(x1 + w, img_info['width']) - max(x1, 0))
131 | inter_h = max(0, min(y1 + h, img_info['height']) - max(y1, 0))
132 | if inter_w * inter_h == 0:
133 | continue
134 | if ann['area'] <= 0 or w < 1 or h < 1:
135 | continue
136 | if ann['category_id'] not in self.cat_ids:
137 | continue
138 | bbox = [x1, y1, x1 + w, y1 + h]
139 | if ann.get('iscrowd', False):
140 | gt_bboxes_ignore.append(bbox)
141 | else:
142 | gt_bboxes.append(bbox)
143 | gt_labels.append(self.cat2label[ann['category_id']])
144 | gt_masks_ann.append(ann.get('segmentation', None))
145 |
146 | if gt_bboxes:
147 | gt_bboxes = np.array(gt_bboxes, dtype=np.float32)
148 | gt_labels = np.array(gt_labels, dtype=np.int64)
149 | else:
150 | gt_bboxes = np.zeros((0, 4), dtype=np.float32)
151 | gt_labels = np.array([], dtype=np.int64)
152 |
153 | if gt_bboxes_ignore:
154 | gt_bboxes_ignore = np.array(gt_bboxes_ignore, dtype=np.float32)
155 | else:
156 | gt_bboxes_ignore = np.zeros((0, 4), dtype=np.float32)
157 |
158 | seg_map = img_info['filename'].replace('jpg', 'png')
159 |
160 | ann = dict(
161 | bboxes=gt_bboxes,
162 | labels=gt_labels,
163 | bboxes_ignore=gt_bboxes_ignore,
164 | masks=gt_masks_ann,
165 | seg_map=seg_map)
166 |
167 | return ann
168 |
169 | if __name__=='__main__':
170 | from mmdet.core.utils import mask2ndarray
171 | train_pipeline = [
172 | dict(type='LoadImageFromFile'),
173 | dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
174 | dict(type='RandomFlip', flip_ratio=0.0),
175 | dict(type='DefaultFormatBundle'),
176 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
177 | ]
178 | ann_file = "/horizon-bucket/aidi_public_data/coco/origin/annotations/instances_val2017.json"
179 | img_prefix = "/horizon-bucket/aidi_public_data/coco/origin/val2017/"
180 | coco = CocoAnnDataset(pipeline=train_pipeline, ann_file=ann_file, img_prefix=img_prefix, test_mode=False)
181 | res = {}
182 | for item in coco:
183 | file_name = item['img_metas'].data['filename']
184 |         file_name = '/'.join(file_name.split('/')[-2:])  # keep '<dir>/<image name>'
185 | masks = mask2ndarray(item['gt_masks'])
186 | res[file_name] = masks
187 | with open('save_ann.pkl', 'wb') as f:
188 | pickle.dump(res, f)
189 |
190 |
--------------------------------------------------------------------------------
/datasets/pipelines/__init__.py:
--------------------------------------------------------------------------------
1 | from .copypaste import CopyPaste
--------------------------------------------------------------------------------
/datasets/pipelines/copypaste.py:
--------------------------------------------------------------------------------
1 | import copy
2 | import inspect
3 | import math
4 | import warnings
5 |
6 | import cv2
7 | import mmcv
8 | import numpy as np
9 | from numpy import random
10 |
11 | from mmdet.core import BitmapMasks, PolygonMasks, find_inside_bboxes
12 | from mmdet.core.evaluation.bbox_overlaps import bbox_overlaps
13 | from mmdet.utils import log_img_scale
14 | from mmdet.datasets.builder import PIPELINES
15 |
16 | try:
17 | from imagecorruptions import corrupt
18 | except ImportError:
19 | corrupt = None
20 |
21 | try:
22 | import albumentations
23 | from albumentations import Compose
24 | except ImportError:
25 | albumentations = None
26 | Compose = None
27 |
28 | @PIPELINES.register_module()
29 | class CopyPaste:
30 | """Simple Copy-Paste is a Strong Data Augmentation Method for Instance
31 | Segmentation The simple copy-paste transform steps are as follows:
32 | 1. The destination image is already resized with aspect ratio kept,
33 | cropped and padded.
34 | 2. Randomly select a source image, which is also already resized
35 | with aspect ratio kept, cropped and padded in a similar way
36 | as the destination image.
37 | 3. Randomly select some objects from the source image.
38 | 4. Paste these source objects to the destination image directly,
39 |        since the source and destination images have the same size.
40 |     5. Update object masks of the destination image, since some original
41 |        objects may be occluded.
42 | 6. Generate bboxes from the updated destination masks and
43 | filter some objects which are totally occluded, and adjust bboxes
44 | which are partly occluded.
45 | 7. Append selected source bboxes, masks, and labels.
46 | Args:
47 | max_num_pasted (int): The maximum number of pasted objects.
48 | Default: 100.
49 | bbox_occluded_thr (int): The threshold of occluded bbox.
50 | Default: 10.
51 | mask_occluded_thr (int): The threshold of occluded mask.
52 | Default: 300.
53 | selected (bool): Whether select objects or not. If select is False,
54 | all objects of the source image will be pasted to the
55 | destination image.
56 | Default: True.
57 | """
58 |
59 | def __init__(
60 | self,
61 | max_num_pasted=100,
62 | bbox_occluded_thr=10,
63 | mask_occluded_thr=300,
64 | selected=True,
65 | ):
66 | self.max_num_pasted = max_num_pasted
67 | self.bbox_occluded_thr = bbox_occluded_thr
68 | self.mask_occluded_thr = mask_occluded_thr
69 | self.selected = selected
70 |
71 | def get_indexes(self, dataset):
72 | """Call function to collect indexes.s.
73 | Args:
74 | dataset (:obj:`MultiImageMixDataset`): The dataset.
75 | Returns:
76 | list: Indexes.
77 | """
78 | return random.randint(0, len(dataset))
79 |
80 | def __call__(self, results):
81 | """Call function to make a copy-paste of image.
82 | Args:
83 | results (dict): Result dict.
84 | Returns:
85 | dict: Result dict with copy-paste transformed.
86 | """
87 |
88 | assert 'mix_results' in results
89 | num_images = len(results['mix_results'])
90 | assert num_images == 1, \
91 |             f'CopyPaste expects exactly one source image in mix_results, got {num_images}'
92 | if self.selected:
93 | selected_results = self._select_object(results['mix_results'][0])
94 | else:
95 | selected_results = results['mix_results'][0]
96 | return self._copy_paste(results, selected_results)
97 |
98 | def _select_object(self, results):
99 | """Select some objects from the source results."""
100 | bboxes = results['gt_bboxes']
101 | labels = results['gt_labels']
102 | masks = results['gt_masks']
103 | max_num_pasted = min(bboxes.shape[0] + 1, self.max_num_pasted)
104 | num_pasted = np.random.randint(0, max_num_pasted)
105 | selected_inds = np.random.choice(
106 | bboxes.shape[0], size=num_pasted, replace=False)
107 |
108 | selected_bboxes = bboxes[selected_inds]
109 | selected_labels = labels[selected_inds]
110 | selected_masks = masks[selected_inds]
111 |
112 | results['gt_bboxes'] = selected_bboxes
113 | results['gt_labels'] = selected_labels
114 | results['gt_masks'] = selected_masks
115 | return results
116 |
117 | def _copy_paste(self, dst_results, src_results):
118 | """CopyPaste transform function.
119 | Args:
120 | dst_results (dict): Result dict of the destination image.
121 | src_results (dict): Result dict of the source image.
122 | Returns:
123 | dict: Updated result dict.
124 | """
125 | dst_img = dst_results['img']
126 | dst_bboxes = dst_results['gt_bboxes']
127 | dst_labels = dst_results['gt_labels']
128 | dst_masks = dst_results['gt_masks']
129 |
130 | src_img = src_results['img']
131 | src_bboxes = src_results['gt_bboxes']
132 | src_labels = src_results['gt_labels']
133 | src_masks = src_results['gt_masks']
134 |
135 | if len(src_bboxes) == 0:
136 | return dst_results
137 |
138 | # update masks and generate bboxes from updated masks
139 | composed_mask = np.where(np.any(src_masks.masks, axis=0), 1, 0)
140 | updated_dst_masks = self.get_updated_masks(dst_masks, composed_mask)
141 | updated_dst_bboxes = updated_dst_masks.get_bboxes()
142 | assert len(updated_dst_bboxes) == len(updated_dst_masks)
143 |
144 | # filter totally occluded objects
145 | bboxes_inds = np.all(
146 | np.abs(
147 | (updated_dst_bboxes - dst_bboxes)) <= self.bbox_occluded_thr,
148 | axis=-1)
149 | masks_inds = updated_dst_masks.masks.sum(
150 | axis=(1, 2)) > self.mask_occluded_thr
151 | valid_inds = bboxes_inds | masks_inds
152 |
153 | # Paste source objects to destination image directly
154 | img = dst_img * (1 - composed_mask[..., np.newaxis]
155 | ) + src_img * composed_mask[..., np.newaxis]
156 | bboxes = np.concatenate([updated_dst_bboxes[valid_inds], src_bboxes])
157 | labels = np.concatenate([dst_labels[valid_inds], src_labels])
158 | masks = np.concatenate(
159 | [updated_dst_masks.masks[valid_inds], src_masks.masks])
160 |
161 | dst_results['img'] = img
162 | dst_results['gt_bboxes'] = bboxes
163 | dst_results['gt_labels'] = labels
164 | dst_results['gt_masks'] = BitmapMasks(masks, masks.shape[1],
165 | masks.shape[2])
166 |
167 | return dst_results
168 |
169 | def get_updated_masks(self, masks, composed_mask):
170 | assert masks.masks.shape[-2:] == composed_mask.shape[-2:], \
171 | 'Cannot compare two arrays of different size'
172 | masks.masks = np.where(composed_mask, 0, masks.masks)
173 | return masks
174 |
175 | def __repr__(self):
176 | repr_str = self.__class__.__name__
177 | repr_str += f'max_num_pasted={self.max_num_pasted}, '
178 | repr_str += f'bbox_occluded_thr={self.bbox_occluded_thr}, '
179 | repr_str += f'mask_occluded_thr={self.mask_occluded_thr}, '
180 | repr_str += f'selected={self.selected}, '
181 | return repr_str
--------------------------------------------------------------------------------
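A toy, numpy-only illustration of the core of `_copy_paste` above: the union of the source instance masks decides which destination pixels are overwritten, and destination masks are zeroed wherever an object was pasted on top of them:

import numpy as np

dst_img = np.zeros((4, 4, 3), dtype=np.uint8)
src_img = np.full((4, 4, 3), 255, dtype=np.uint8)

src_masks = np.zeros((2, 4, 4), dtype=np.uint8)   # two source instances
src_masks[0, :2, :2] = 1
src_masks[1, 2:, 2:] = 1

composed_mask = np.where(np.any(src_masks, axis=0), 1, 0)             # union of pasted objects
img = dst_img * (1 - composed_mask[..., None]) + src_img * composed_mask[..., None]

dst_masks = np.ones((1, 4, 4), dtype=np.uint8)                        # one destination instance
updated_dst_masks = np.where(composed_mask, 0, dst_masks)             # drop occluded pixels
print(img[..., 0], updated_dst_masks[0], sep='\n\n')
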
/models/__init__.py:
--------------------------------------------------------------------------------
1 | from .roi_heads import *
2 | from .necks import *
--------------------------------------------------------------------------------
/models/necks/__init__.py:
--------------------------------------------------------------------------------
1 | from .bifpn import BiFPN
--------------------------------------------------------------------------------
/models/roi_heads/__init__.py:
--------------------------------------------------------------------------------
1 | from .sparse_score_roi_head import *
2 | from .bbox_heads import *
3 | from .mask_heads import *
--------------------------------------------------------------------------------
/models/roi_heads/bbox_heads/__init__.py:
--------------------------------------------------------------------------------
1 | from .dii_score_head import DIIScoreHead
--------------------------------------------------------------------------------
/models/roi_heads/mask_heads/__init__.py:
--------------------------------------------------------------------------------
1 | from .dynamic_mask_head import DynamicMaskIoUHead
2 | from .maskiou_head import SparseMaskIoUHead
--------------------------------------------------------------------------------
/models/roi_heads/mask_heads/dynamic_mask_head.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
2 | import torch
3 | import torch.nn as nn
4 | from mmcv.runner import auto_fp16, force_fp32
5 |
6 | from mmdet.core import mask_target
7 | from mmdet.models.builder import HEADS
8 | from mmdet.models.dense_heads.atss_head import reduce_mean
9 | from mmdet.models.utils import build_transformer
10 | from mmdet.models.roi_heads.mask_heads.fcn_mask_head import FCNMaskHead
11 |
12 |
13 | @HEADS.register_module()
14 | class DynamicMaskIoUHead(FCNMaskHead):
15 | r"""Dynamic Mask Head for
16 |     `Instances as Queries <https://arxiv.org/abs/2105.01928>`_
17 | Args:
18 | num_convs (int): Number of convolution layer.
19 | Defaults to 4.
20 | roi_feat_size (int): The output size of RoI extractor,
21 | Defaults to 14.
22 | in_channels (int): Input feature channels.
23 | Defaults to 256.
24 | conv_kernel_size (int): Kernel size of convolution layers.
25 | Defaults to 3.
26 | conv_out_channels (int): Output channels of convolution layers.
27 | Defaults to 256.
28 | num_classes (int): Number of classes.
29 | Defaults to 80
30 |         class_agnostic (bool): Whether to generate class-agnostic
31 |             predictions. Defaults to False.
32 |         dropout (float): Probability of dropping the channel.
33 |             Defaults to 0.0
34 | upsample_cfg (dict): The config for upsample layer.
35 | conv_cfg (dict): The convolution layer config.
36 | norm_cfg (dict): The norm layer config.
37 | dynamic_conv_cfg (dict): The dynamic convolution layer config.
38 | loss_mask (dict): The config for mask loss.
39 | """
40 |
41 | def __init__(self,
42 | num_convs=4,
43 | roi_feat_size=14,
44 | in_channels=256,
45 | conv_kernel_size=3,
46 | conv_out_channels=256,
47 | num_classes=80,
48 | class_agnostic=False,
49 | upsample_cfg=dict(type='deconv', scale_factor=2),
50 | conv_cfg=None,
51 | norm_cfg=None,
52 | dynamic_conv_cfg=dict(
53 | type='DynamicConv',
54 | in_channels=256,
55 | feat_channels=64,
56 | out_channels=256,
57 | input_feat_shape=14,
58 | with_proj=False,
59 | act_cfg=dict(type='ReLU', inplace=True),
60 | norm_cfg=dict(type='LN')),
61 | loss_mask=dict(type='DiceLoss', loss_weight=8.0),
62 | **kwargs):
63 | super(DynamicMaskIoUHead, self).__init__(
64 | num_convs=num_convs,
65 | roi_feat_size=roi_feat_size,
66 | in_channels=in_channels,
67 | conv_kernel_size=conv_kernel_size,
68 | conv_out_channels=conv_out_channels,
69 | num_classes=num_classes,
70 | class_agnostic=class_agnostic,
71 | upsample_cfg=upsample_cfg,
72 | conv_cfg=conv_cfg,
73 | norm_cfg=norm_cfg,
74 | loss_mask=loss_mask,
75 | **kwargs)
76 | assert class_agnostic is False, \
77 |             'DynamicMaskIoUHead only supports class_agnostic=False'
78 | self.fp16_enabled = False
79 |
80 | self.instance_interactive_conv = build_transformer(dynamic_conv_cfg)
81 |
82 | def init_weights(self):
83 | """Use xavier initialization for all weight parameter and set
84 | classification head bias as a specific value when use focal loss."""
85 | for p in self.parameters():
86 | if p.dim() > 1:
87 | nn.init.xavier_uniform_(p)
88 | nn.init.constant_(self.conv_logits.bias, 0.)
89 |
90 | @auto_fp16()
91 | def forward(self, roi_feat, proposal_feat):
92 | """Forward function of DynamicMaskHead.
93 | Args:
94 | roi_feat (Tensor): Roi-pooling features with shape
95 | (batch_size*num_proposals, feature_dimensions,
96 | pooling_h , pooling_w).
97 | proposal_feat (Tensor): Intermediate feature get from
98 | diihead in last stage, has shape
99 | (batch_size*num_proposals, feature_dimensions)
100 | Returns:
101 | mask_pred (Tensor): Predicted foreground masks with shape
102 | (batch_size*num_proposals, num_classes,
103 | pooling_h*2, pooling_w*2).
104 | """
105 |
106 | proposal_feat = proposal_feat.reshape(-1, self.in_channels)
107 | proposal_feat_iic = self.instance_interactive_conv(
108 | proposal_feat, roi_feat)
109 |
110 | x = proposal_feat_iic.permute(0, 2, 1).reshape(roi_feat.size())
111 | x_feat = x
112 |
113 | for conv in self.convs:
114 | x = conv(x)
115 | if self.upsample is not None:
116 | x = self.upsample(x)
117 | if self.upsample_method == 'deconv':
118 | x = self.relu(x)
119 | mask_pred = self.conv_logits(x)
120 | return mask_pred, x_feat
121 |
122 | @force_fp32(apply_to=('mask_pred', ))
123 | def loss(self, mask_pred, mask_targets, labels):
124 | num_pos = labels.new_ones(labels.size()).float().sum()
125 | avg_factor = torch.clamp(reduce_mean(num_pos), min=1.).item()
126 | loss = dict()
127 | if mask_pred.size(0) == 0:
128 | loss_mask = mask_pred.sum()
129 | else:
130 | loss_mask = self.loss_mask(
131 | mask_pred[torch.arange(num_pos).long(), labels, ...].sigmoid(),
132 | mask_targets,
133 | avg_factor=avg_factor)
134 | loss['loss_mask'] = loss_mask
135 | return loss
136 |
137 | def get_targets(self, sampling_results, gt_masks, rcnn_train_cfg):
138 |
139 | pos_proposals = [res.pos_bboxes for res in sampling_results]
140 | pos_assigned_gt_inds = [
141 | res.pos_assigned_gt_inds for res in sampling_results
142 | ]
143 | mask_targets = mask_target(pos_proposals, pos_assigned_gt_inds,
144 | gt_masks, rcnn_train_cfg)
145 | return mask_targets
--------------------------------------------------------------------------------
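A conceptual sketch (not mmdet's `DynamicConv`; all names below are illustrative) of the "instance interactive convolution" used in `forward` above: each proposal feature generates its own pair of 1x1 kernels, which are applied to that proposal's RoI features before the usual FCN mask branch:

import torch
import torch.nn as nn

N, C, D, S = 8, 256, 64, 14                      # proposals, channels, hidden dim, RoI size
proposal_feat = torch.randn(N, C)
roi_feat = torch.randn(N, C, S, S)

param_gen = nn.Linear(C, 2 * C * D)              # per-proposal dynamic parameters
params = param_gen(proposal_feat)
w1 = params[:, :C * D].view(N, C, D)             # first dynamic 1x1 "conv" as a matmul
w2 = params[:, C * D:].view(N, D, C)             # second dynamic 1x1 "conv" back to C channels

x = roi_feat.flatten(2).permute(0, 2, 1)         # (N, S*S, C)
x = torch.relu(x @ w1)                           # (N, S*S, D)
x = torch.relu(x @ w2)                           # (N, S*S, C)
x = x.permute(0, 2, 1).reshape(N, C, S, S)       # reshaped back for the conv/upsample stack
print(x.shape)                                   # torch.Size([8, 256, 14, 14])
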
/tools/analysis_tools/analyze_logs.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
2 | import argparse
3 | import json
4 | from collections import defaultdict
5 |
6 | import matplotlib.pyplot as plt
7 | import numpy as np
8 | import seaborn as sns
9 |
10 |
11 | def cal_train_time(log_dicts, args):
12 | for i, log_dict in enumerate(log_dicts):
13 | print(f'{"-" * 5}Analyze train time of {args.json_logs[i]}{"-" * 5}')
14 | all_times = []
15 | for epoch in log_dict.keys():
16 | if args.include_outliers:
17 | all_times.append(log_dict[epoch]['time'])
18 | else:
19 | all_times.append(log_dict[epoch]['time'][1:])
20 | if not all_times:
21 | raise KeyError(
22 |                     'Please reduce the log interval in the config so that '
23 |                     'the interval is less than the number of iterations of one epoch.')
24 | all_times = np.array(all_times)
25 | epoch_ave_time = all_times.mean(-1)
26 | slowest_epoch = epoch_ave_time.argmax()
27 | fastest_epoch = epoch_ave_time.argmin()
28 | std_over_epoch = epoch_ave_time.std()
29 | print(f'slowest epoch {slowest_epoch + 1}, '
30 | f'average time is {epoch_ave_time[slowest_epoch]:.4f}')
31 | print(f'fastest epoch {fastest_epoch + 1}, '
32 | f'average time is {epoch_ave_time[fastest_epoch]:.4f}')
33 | print(f'time std over epochs is {std_over_epoch:.4f}')
34 | print(f'average iter time: {np.mean(all_times):.4f} s/iter')
35 | print()
36 |
37 |
38 | def plot_curve(log_dicts, args):
39 | if args.backend is not None:
40 | plt.switch_backend(args.backend)
41 | sns.set_style(args.style)
42 | # if legend is None, use {filename}_{key} as legend
43 | legend = args.legend
44 | if legend is None:
45 | legend = []
46 | for json_log in args.json_logs:
47 | for metric in args.keys:
48 | legend.append(f'{json_log}_{metric}')
49 | assert len(legend) == (len(args.json_logs) * len(args.keys))
50 | metrics = args.keys
51 |
52 | num_metrics = len(metrics)
53 | for i, log_dict in enumerate(log_dicts):
54 | epochs = list(log_dict.keys())
55 | for j, metric in enumerate(metrics):
56 | print(f'plot curve of {args.json_logs[i]}, metric is {metric}')
57 | if metric not in log_dict[epochs[int(args.start_epoch) - 1]]:
58 | if 'mAP' in metric:
59 | raise KeyError(
60 | f'{args.json_logs[i]} does not contain metric '
61 | f'{metric}. Please check if "--no-validate" is '
62 | 'specified when you trained the model.')
63 | raise KeyError(
64 | f'{args.json_logs[i]} does not contain metric {metric}. '
65 | 'Please reduce the log interval in the config so that '
66 |                     'the interval is less than the number of iterations of one epoch.')
67 |
68 | if 'mAP' in metric:
69 | xs = np.arange(
70 | int(args.start_epoch),
71 | max(epochs) + 1, int(args.eval_interval))
72 | ys = []
73 | for epoch in epochs:
74 | ys += log_dict[epoch][metric]
75 | ax = plt.gca()
76 | ax.set_xticks(xs)
77 | plt.xlabel('epoch')
78 | plt.plot(xs, ys, label=legend[i * num_metrics + j], marker='o')
79 | else:
80 | xs = []
81 | ys = []
82 | num_iters_per_epoch = log_dict[epochs[0]]['iter'][-2]
83 | for epoch in epochs:
84 | iters = log_dict[epoch]['iter']
85 | if log_dict[epoch]['mode'][-1] == 'val':
86 | iters = iters[:-1]
87 | xs.append(
88 | np.array(iters) + (epoch - 1) * num_iters_per_epoch)
89 | ys.append(np.array(log_dict[epoch][metric][:len(iters)]))
90 | xs = np.concatenate(xs)
91 | ys = np.concatenate(ys)
92 | plt.xlabel('iter')
93 | plt.plot(
94 | xs, ys, label=legend[i * num_metrics + j], linewidth=0.5)
95 | plt.legend()
96 | if args.title is not None:
97 | plt.title(args.title)
98 | if args.out is None:
99 | plt.show()
100 | else:
101 | print(f'save curve to: {args.out}')
102 | plt.savefig(args.out)
103 | plt.cla()
104 |
105 |
106 | def add_plot_parser(subparsers):
107 | parser_plt = subparsers.add_parser(
108 | 'plot_curve', help='parser for plotting curves')
109 | parser_plt.add_argument(
110 | 'json_logs',
111 | type=str,
112 | nargs='+',
113 | help='path of train log in json format')
114 | parser_plt.add_argument(
115 | '--keys',
116 | type=str,
117 | nargs='+',
118 | default=['bbox_mAP'],
119 | help='the metric that you want to plot')
120 | parser_plt.add_argument(
121 | '--start-epoch',
122 | type=str,
123 | default='1',
124 | help='the epoch that you want to start')
125 | parser_plt.add_argument(
126 | '--eval-interval',
127 | type=str,
128 | default='1',
129 | help='the eval interval when training')
130 | parser_plt.add_argument('--title', type=str, help='title of figure')
131 | parser_plt.add_argument(
132 | '--legend',
133 | type=str,
134 | nargs='+',
135 | default=None,
136 | help='legend of each plot')
137 | parser_plt.add_argument(
138 | '--backend', type=str, default=None, help='backend of plt')
139 | parser_plt.add_argument(
140 | '--style', type=str, default='dark', help='style of plt')
141 | parser_plt.add_argument('--out', type=str, default=None)
142 |
143 |
144 | def add_time_parser(subparsers):
145 | parser_time = subparsers.add_parser(
146 | 'cal_train_time',
147 | help='parser for computing the average time per training iteration')
148 | parser_time.add_argument(
149 | 'json_logs',
150 | type=str,
151 | nargs='+',
152 | help='path of train log in json format')
153 | parser_time.add_argument(
154 | '--include-outliers',
155 | action='store_true',
156 | help='include the first value of every epoch when computing '
157 | 'the average time')
158 |
159 |
160 | def parse_args():
161 | parser = argparse.ArgumentParser(description='Analyze Json Log')
162 | # currently only support plot curve and calculate average train time
163 | subparsers = parser.add_subparsers(dest='task', help='task parser')
164 | add_plot_parser(subparsers)
165 | add_time_parser(subparsers)
166 | args = parser.parse_args()
167 | return args
168 |
169 |
170 | def load_json_logs(json_logs):
171 | # load and convert json_logs to log_dict, key is epoch, value is a sub dict
172 | # keys of sub dict is different metrics, e.g. memory, bbox_mAP
173 | # value of sub dict is a list of corresponding values of all iterations
174 | log_dicts = [dict() for _ in json_logs]
175 | for json_log, log_dict in zip(json_logs, log_dicts):
176 | with open(json_log, 'r') as log_file:
177 | for line in log_file:
178 | log = json.loads(line.strip())
179 | # skip lines without `epoch` field
180 | if 'epoch' not in log:
181 | continue
182 | epoch = log.pop('epoch')
183 | if epoch not in log_dict:
184 | log_dict[epoch] = defaultdict(list)
185 | for k, v in log.items():
186 | log_dict[epoch][k].append(v)
187 | return log_dicts
188 |
189 |
190 | def main():
191 | args = parse_args()
192 |
193 | json_logs = args.json_logs
194 | for json_log in json_logs:
195 | assert json_log.endswith('.json')
196 |
197 | log_dicts = load_json_logs(json_logs)
198 |
199 | eval(args.task)(log_dicts, args)
200 |
201 |
202 | if __name__ == '__main__':
203 | main()
204 |
--------------------------------------------------------------------------------
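
For reference, `load_json_logs` groups the per-iteration records of an mmdet-style json training log by epoch. A small self-contained sketch with two made-up log lines shows the structure it builds:

```python
import json
from collections import defaultdict

# Two hypothetical lines of an mmdet-style json training log.
lines = [
    '{"mode": "train", "epoch": 1, "iter": 50, "loss": 0.9, "time": 0.31}',
    '{"mode": "train", "epoch": 1, "iter": 100, "loss": 0.8, "time": 0.30}',
]

log_dict = {}
for line in lines:
    log = json.loads(line)
    epoch = log.pop('epoch')
    log_dict.setdefault(epoch, defaultdict(list))
    for k, v in log.items():
        log_dict[epoch][k].append(v)

print(log_dict[1]['loss'])  # [0.9, 0.8] -> one list of values per metric and epoch
```
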
/tools/analysis_tools/analyze_results.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
2 | import argparse
3 | import os.path as osp
4 |
5 | import mmcv
6 | import numpy as np
7 | from mmcv import Config, DictAction
8 |
9 | from mmdet.core.evaluation import eval_map
10 | from mmdet.core.visualization import imshow_gt_det_bboxes
11 | from mmdet.datasets import build_dataset, get_loading_pipeline
12 | from mmdet.utils import update_data_root
13 |
14 |
15 | def bbox_map_eval(det_result, annotation):
16 | """Evaluate mAP of single image det result.
17 |
18 | Args:
19 | det_result (list[list]): [[cls1_det, cls2_det, ...], ...].
20 | The outer list indicates images, and the inner list indicates
21 | per-class detected bboxes.
22 | annotation (dict): Ground truth annotations where keys of
23 | annotations are:
24 |
25 | - bboxes: numpy array of shape (n, 4)
26 | - labels: numpy array of shape (n, )
27 | - bboxes_ignore (optional): numpy array of shape (k, 4)
28 | - labels_ignore (optional): numpy array of shape (k, )
29 |
30 | Returns:
31 | float: mAP
32 | """
33 |
34 | # use only bbox det result
35 | if isinstance(det_result, tuple):
36 | bbox_det_result = [det_result[0]]
37 | else:
38 | bbox_det_result = [det_result]
39 | # mAP
40 | iou_thrs = np.linspace(
41 | .5, 0.95, int(np.round((0.95 - .5) / .05)) + 1, endpoint=True)
42 | mean_aps = []
43 | for thr in iou_thrs:
44 | mean_ap, _ = eval_map(
45 | bbox_det_result, [annotation], iou_thr=thr, logger='silent')
46 | mean_aps.append(mean_ap)
47 | return sum(mean_aps) / len(mean_aps)
48 |
49 |
50 | class ResultVisualizer:
51 | """Display and save evaluation results.
52 |
53 | Args:
54 |         show (bool): Whether to show the image. Default: False
55 | wait_time (float): Value of waitKey param. Default: 0.
56 | score_thr (float): Minimum score of bboxes to be shown.
57 | Default: 0
58 | """
59 |
60 | def __init__(self, show=False, wait_time=0, score_thr=0):
61 | self.show = show
62 | self.wait_time = wait_time
63 | self.score_thr = score_thr
64 |
65 | def _save_image_gts_results(self, dataset, results, mAPs, out_dir=None):
66 | mmcv.mkdir_or_exist(out_dir)
67 |
68 | for mAP_info in mAPs:
69 | index, mAP = mAP_info
70 | data_info = dataset.prepare_train_img(index)
71 |
72 | # calc save file path
73 | filename = data_info['filename']
74 | if data_info['img_prefix'] is not None:
75 | filename = osp.join(data_info['img_prefix'], filename)
76 | else:
77 | filename = data_info['filename']
78 | fname, name = osp.splitext(osp.basename(filename))
79 | save_filename = fname + '_' + str(round(mAP, 3)) + name
80 | out_file = osp.join(out_dir, save_filename)
81 | imshow_gt_det_bboxes(
82 | data_info['img'],
83 | data_info,
84 | results[index],
85 | dataset.CLASSES,
86 | gt_bbox_color=dataset.PALETTE,
87 | gt_text_color=(200, 200, 200),
88 | gt_mask_color=dataset.PALETTE,
89 | det_bbox_color=dataset.PALETTE,
90 | det_text_color=(200, 200, 200),
91 | det_mask_color=dataset.PALETTE,
92 | show=self.show,
93 | score_thr=self.score_thr,
94 | wait_time=self.wait_time,
95 | out_file=out_file)
96 |
97 | def evaluate_and_show(self,
98 | dataset,
99 | results,
100 | topk=20,
101 | show_dir='work_dir',
102 | eval_fn=None):
103 | """Evaluate and show results.
104 |
105 | Args:
106 | dataset (Dataset): A PyTorch dataset.
107 | results (list): Det results from test results pkl file
108 |             topk (int): Number of images with the highest and
109 |                 lowest mAP to save after sorting. Default: 20
110 |             show_dir (str, optional): The directory where painted images
111 |                 are saved. Default: 'work_dir'
112 | eval_fn (callable, optional): Eval function, Default: None
113 | """
114 |
115 | assert topk > 0
116 | if (topk * 2) > len(dataset):
117 | topk = len(dataset) // 2
118 |
119 | if eval_fn is None:
120 | eval_fn = bbox_map_eval
121 | else:
122 | assert callable(eval_fn)
123 |
124 | prog_bar = mmcv.ProgressBar(len(results))
125 | _mAPs = {}
126 | for i, (result, ) in enumerate(zip(results)):
127 | # self.dataset[i] should not call directly
128 | # because there is a risk of mismatch
129 | data_info = dataset.prepare_train_img(i)
130 | mAP = eval_fn(result, data_info['ann_info'])
131 | _mAPs[i] = mAP
132 | prog_bar.update()
133 |
134 |         # sort images by mAP in ascending order, then take the best/worst topk
135 | _mAPs = list(sorted(_mAPs.items(), key=lambda kv: kv[1]))
136 | good_mAPs = _mAPs[-topk:]
137 | bad_mAPs = _mAPs[:topk]
138 |
139 | good_dir = osp.abspath(osp.join(show_dir, 'good'))
140 | bad_dir = osp.abspath(osp.join(show_dir, 'bad'))
141 | self._save_image_gts_results(dataset, results, good_mAPs, good_dir)
142 | self._save_image_gts_results(dataset, results, bad_mAPs, bad_dir)
143 |
144 |
145 | def parse_args():
146 | parser = argparse.ArgumentParser(
147 | description='MMDet eval image prediction result for each')
148 | parser.add_argument('config', help='test config file path')
149 | parser.add_argument(
150 | 'prediction_path', help='prediction path where test pkl result')
151 | parser.add_argument(
152 | 'show_dir', help='directory where painted images will be saved')
153 | parser.add_argument('--show', action='store_true', help='show results')
154 | parser.add_argument(
155 | '--wait-time',
156 | type=float,
157 | default=0,
158 | help='the interval of show (s), 0 is block')
159 | parser.add_argument(
160 | '--topk',
161 | default=20,
162 | type=int,
163 |         help='number of saved images with the highest '
164 |         'and lowest mAP after sorting')
165 | parser.add_argument(
166 | '--show-score-thr',
167 | type=float,
168 | default=0,
169 | help='score threshold (default: 0.)')
170 | parser.add_argument(
171 | '--cfg-options',
172 | nargs='+',
173 | action=DictAction,
174 | help='override some settings in the used config, the key-value pair '
175 | 'in xxx=yyy format will be merged into config file. If the value to '
176 | 'be overwritten is a list, it should be like key="[a,b]" or key=a,b '
177 | 'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
178 | 'Note that the quotation marks are necessary and that no white space '
179 | 'is allowed.')
180 | args = parser.parse_args()
181 | return args
182 |
183 |
184 | def main():
185 | args = parse_args()
186 |
187 | mmcv.check_file_exist(args.prediction_path)
188 |
189 | cfg = Config.fromfile(args.config)
190 |
191 | # update data root according to MMDET_DATASETS
192 | update_data_root(cfg)
193 |
194 | if args.cfg_options is not None:
195 | cfg.merge_from_dict(args.cfg_options)
196 | cfg.data.test.test_mode = True
197 |
198 | cfg.data.test.pop('samples_per_gpu', 0)
199 | cfg.data.test.pipeline = get_loading_pipeline(cfg.data.train.pipeline)
200 | dataset = build_dataset(cfg.data.test)
201 | outputs = mmcv.load(args.prediction_path)
202 |
203 | result_visualizer = ResultVisualizer(args.show, args.wait_time,
204 | args.show_score_thr)
205 | result_visualizer.evaluate_and_show(
206 | dataset, outputs, topk=args.topk, show_dir=args.show_dir)
207 |
208 |
209 | if __name__ == '__main__':
210 | main()
211 |
--------------------------------------------------------------------------------
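
`bbox_map_eval` scores each image by averaging mAP over the standard COCO IoU thresholds 0.5:0.05:0.95. A short numpy sketch of that threshold grid and the averaging (the per-threshold AP values below are made up):

```python
import numpy as np

# Same construction as bbox_map_eval: ten IoU thresholds from 0.5 to 0.95.
iou_thrs = np.linspace(
    0.5, 0.95, int(np.round((0.95 - 0.5) / 0.05)) + 1, endpoint=True)
print(iou_thrs)  # [0.5  0.55 0.6  0.65 0.7  0.75 0.8  0.85 0.9  0.95]

# The per-image score reported by the tool is the mean AP over these
# thresholds, e.g. with hypothetical per-threshold AP values:
aps = np.array([0.9, 0.88, 0.85, 0.8, 0.75, 0.7, 0.6, 0.5, 0.35, 0.2])
print(aps.mean())  # 0.653
```
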
/tools/analysis_tools/benchmark.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
2 | import argparse
3 | import copy
4 | import os
5 | import time
6 |
7 | import torch
8 | from mmcv import Config, DictAction
9 | from mmcv.cnn import fuse_conv_bn
10 | from mmcv.parallel import MMDistributedDataParallel
11 | from mmcv.runner import init_dist, load_checkpoint, wrap_fp16_model
12 |
13 | from mmdet.datasets import (build_dataloader, build_dataset,
14 | replace_ImageToTensor)
15 | from mmdet.models import build_detector
16 | from mmdet.utils import update_data_root
17 |
18 |
19 | def parse_args():
20 | parser = argparse.ArgumentParser(description='MMDet benchmark a model')
21 | parser.add_argument('config', help='test config file path')
22 | parser.add_argument('checkpoint', help='checkpoint file')
23 | parser.add_argument(
24 | '--repeat-num',
25 | type=int,
26 | default=1,
27 |         help='number of times to repeat the measurement when averaging the results')
28 | parser.add_argument(
29 | '--max-iter', type=int, default=2000, help='num of max iter')
30 | parser.add_argument(
31 | '--log-interval', type=int, default=50, help='interval of logging')
32 | parser.add_argument(
33 | '--fuse-conv-bn',
34 | action='store_true',
35 |         help='Whether to fuse conv and bn; this will slightly increase '
36 |         'the inference speed')
37 | parser.add_argument(
38 | '--cfg-options',
39 | nargs='+',
40 | action=DictAction,
41 | help='override some settings in the used config, the key-value pair '
42 | 'in xxx=yyy format will be merged into config file. If the value to '
43 | 'be overwritten is a list, it should be like key="[a,b]" or key=a,b '
44 | 'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
45 | 'Note that the quotation marks are necessary and that no white space '
46 | 'is allowed.')
47 | parser.add_argument(
48 | '--launcher',
49 | choices=['none', 'pytorch', 'slurm', 'mpi'],
50 | default='none',
51 | help='job launcher')
52 | parser.add_argument('--local_rank', type=int, default=0)
53 | args = parser.parse_args()
54 | if 'LOCAL_RANK' not in os.environ:
55 | os.environ['LOCAL_RANK'] = str(args.local_rank)
56 | return args
57 |
58 |
59 | def measure_inference_speed(cfg, checkpoint, max_iter, log_interval,
60 | is_fuse_conv_bn):
61 | # set cudnn_benchmark
62 | if cfg.get('cudnn_benchmark', False):
63 | torch.backends.cudnn.benchmark = True
64 | cfg.model.pretrained = None
65 | cfg.data.test.test_mode = True
66 |
67 | # build the dataloader
68 | samples_per_gpu = cfg.data.test.pop('samples_per_gpu', 1)
69 | if samples_per_gpu > 1:
70 | # Replace 'ImageToTensor' to 'DefaultFormatBundle'
71 | cfg.data.test.pipeline = replace_ImageToTensor(cfg.data.test.pipeline)
72 | dataset = build_dataset(cfg.data.test)
73 | data_loader = build_dataloader(
74 | dataset,
75 | samples_per_gpu=1,
76 | # Because multiple processes will occupy additional CPU resources,
77 | # FPS statistics will be more unstable when workers_per_gpu is not 0.
78 | # It is reasonable to set workers_per_gpu to 0.
79 | workers_per_gpu=0,
80 | dist=True,
81 | shuffle=False)
82 |
83 | # build the model and load checkpoint
84 | cfg.model.train_cfg = None
85 | model = build_detector(cfg.model, test_cfg=cfg.get('test_cfg'))
86 | fp16_cfg = cfg.get('fp16', None)
87 | if fp16_cfg is not None:
88 | wrap_fp16_model(model)
89 | load_checkpoint(model, checkpoint, map_location='cpu')
90 | if is_fuse_conv_bn:
91 | model = fuse_conv_bn(model)
92 |
93 | model = MMDistributedDataParallel(
94 | model.cuda(),
95 | device_ids=[torch.cuda.current_device()],
96 | broadcast_buffers=False)
97 | model.eval()
98 |
99 | # the first several iterations may be very slow so skip them
100 | num_warmup = 5
101 | pure_inf_time = 0
102 | fps = 0
103 |
104 | # benchmark with 2000 image and take the average
105 | for i, data in enumerate(data_loader):
106 |
107 | torch.cuda.synchronize()
108 | start_time = time.perf_counter()
109 |
110 | with torch.no_grad():
111 | model(return_loss=False, rescale=True, **data)
112 |
113 | torch.cuda.synchronize()
114 | elapsed = time.perf_counter() - start_time
115 |
116 | if i >= num_warmup:
117 | pure_inf_time += elapsed
118 | if (i + 1) % log_interval == 0:
119 | fps = (i + 1 - num_warmup) / pure_inf_time
120 | print(
121 | f'Done image [{i + 1:<3}/ {max_iter}], '
122 | f'fps: {fps:.1f} img / s, '
123 | f'times per image: {1000 / fps:.1f} ms / img',
124 | flush=True)
125 |
126 | if (i + 1) == max_iter:
127 | fps = (i + 1 - num_warmup) / pure_inf_time
128 | print(
129 | f'Overall fps: {fps:.1f} img / s, '
130 | f'times per image: {1000 / fps:.1f} ms / img',
131 | flush=True)
132 | break
133 | return fps
134 |
135 |
136 | def repeat_measure_inference_speed(cfg,
137 | checkpoint,
138 | max_iter,
139 | log_interval,
140 | is_fuse_conv_bn,
141 | repeat_num=1):
142 | assert repeat_num >= 1
143 |
144 | fps_list = []
145 |
146 | for _ in range(repeat_num):
147 | #
148 | cp_cfg = copy.deepcopy(cfg)
149 |
150 | fps_list.append(
151 | measure_inference_speed(cp_cfg, checkpoint, max_iter, log_interval,
152 | is_fuse_conv_bn))
153 |
154 | if repeat_num > 1:
155 | fps_list_ = [round(fps, 1) for fps in fps_list]
156 |         times_per_image_list_ = [round(1000 / fps, 1) for fps in fps_list]
157 |         mean_fps_ = sum(fps_list_) / len(fps_list_)
158 |         mean_times_per_image_ = sum(times_per_image_list_) / len(
159 |             times_per_image_list_)
160 |         print(
161 |             f'Overall fps: {fps_list_}[{mean_fps_:.1f}] img / s, '
162 |             f'times per image: '
163 |             f'{times_per_image_list_}[{mean_times_per_image_:.1f}] ms / img',
164 |             flush=True)
165 | return fps_list
166 |
167 | return fps_list[0]
168 |
169 |
170 | def main():
171 | args = parse_args()
172 |
173 | cfg = Config.fromfile(args.config)
174 |
175 | # update data root according to MMDET_DATASETS
176 | update_data_root(cfg)
177 |
178 | if args.cfg_options is not None:
179 | cfg.merge_from_dict(args.cfg_options)
180 |
181 | if args.launcher == 'none':
182 | raise NotImplementedError('Only supports distributed mode')
183 | else:
184 | init_dist(args.launcher, **cfg.dist_params)
185 |
186 | repeat_measure_inference_speed(cfg, args.checkpoint, args.max_iter,
187 | args.log_interval, args.fuse_conv_bn,
188 | args.repeat_num)
189 |
190 |
191 | if __name__ == '__main__':
192 | main()
193 |
--------------------------------------------------------------------------------
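
The benchmark reports FPS from pure inference time and skips the first `num_warmup` iterations, which are typically slow. A toy, self-contained sketch of the same bookkeeping, with `time.sleep` standing in for a forward pass:

```python
import time

num_warmup, max_iter = 5, 20
pure_inf_time = 0.0

for i in range(max_iter):
    start = time.perf_counter()
    time.sleep(0.01)           # stand-in for one forward pass
    elapsed = time.perf_counter() - start
    if i >= num_warmup:        # warm-up iterations are excluded from the average
        pure_inf_time += elapsed

fps = (max_iter - num_warmup) / pure_inf_time
print(f'{fps:.1f} img / s, {1000 / fps:.1f} ms / img')
```
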
/tools/analysis_tools/eval_metric.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
2 | import argparse
3 |
4 | import mmcv
5 | from mmcv import Config, DictAction
6 |
7 | from mmdet.datasets import build_dataset
8 | from mmdet.utils import update_data_root
9 |
10 |
11 | def parse_args():
12 | parser = argparse.ArgumentParser(description='Evaluate metric of the '
13 | 'results saved in pkl format')
14 | parser.add_argument('config', help='Config of the model')
15 | parser.add_argument('pkl_results', help='Results in pickle format')
16 | parser.add_argument(
17 | '--format-only',
18 | action='store_true',
19 |         help='Format the output results without performing evaluation. It is '
20 |         'useful when you want to format the results into a specific format and '
21 | 'submit it to the test server')
22 | parser.add_argument(
23 | '--eval',
24 | type=str,
25 | nargs='+',
26 | help='Evaluation metrics, which depends on the dataset, e.g., "bbox",'
27 | ' "segm", "proposal" for COCO, and "mAP", "recall" for PASCAL VOC')
28 | parser.add_argument(
29 | '--cfg-options',
30 | nargs='+',
31 | action=DictAction,
32 | help='override some settings in the used config, the key-value pair '
33 | 'in xxx=yyy format will be merged into config file. If the value to '
34 | 'be overwritten is a list, it should be like key="[a,b]" or key=a,b '
35 | 'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
36 | 'Note that the quotation marks are necessary and that no white space '
37 | 'is allowed.')
38 | parser.add_argument(
39 | '--eval-options',
40 | nargs='+',
41 | action=DictAction,
42 | help='custom options for evaluation, the key-value pair in xxx=yyy '
43 | 'format will be kwargs for dataset.evaluate() function')
44 | args = parser.parse_args()
45 | return args
46 |
47 |
48 | def main():
49 | args = parse_args()
50 |
51 | cfg = Config.fromfile(args.config)
52 |
53 | # update data root according to MMDET_DATASETS
54 | update_data_root(cfg)
55 |
56 | assert args.eval or args.format_only, (
57 | 'Please specify at least one operation (eval/format the results) with '
58 | 'the argument "--eval", "--format-only"')
59 | if args.eval and args.format_only:
60 | raise ValueError('--eval and --format_only cannot be both specified')
61 |
62 | if args.cfg_options is not None:
63 | cfg.merge_from_dict(args.cfg_options)
64 | cfg.data.test.test_mode = True
65 |
66 | dataset = build_dataset(cfg.data.test)
67 | outputs = mmcv.load(args.pkl_results)
68 |
69 | kwargs = {} if args.eval_options is None else args.eval_options
70 | if args.format_only:
71 | dataset.format_results(outputs, **kwargs)
72 | if args.eval:
73 | eval_kwargs = cfg.get('evaluation', {}).copy()
74 | # hard-code way to remove EvalHook args
75 | for key in [
76 | 'interval', 'tmpdir', 'start', 'gpu_collect', 'save_best',
77 | 'rule'
78 | ]:
79 | eval_kwargs.pop(key, None)
80 | eval_kwargs.update(dict(metric=args.eval, **kwargs))
81 | print(dataset.evaluate(outputs, **eval_kwargs))
82 |
83 |
84 | if __name__ == '__main__':
85 | main()
86 |
--------------------------------------------------------------------------------
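
Before calling `dataset.evaluate()`, the script strips EvalHook-only keys from the `evaluation` config and merges in the requested metrics. A tiny sketch of that dict surgery with a hypothetical `evaluation` config:

```python
# Hypothetical `evaluation` dict from a config; only evaluate() kwargs
# should survive, so EvalHook-only keys are dropped first.
eval_kwargs = dict(interval=1, save_best='auto', metric='bbox', classwise=True)
for key in ['interval', 'tmpdir', 'start', 'gpu_collect', 'save_best', 'rule']:
    eval_kwargs.pop(key, None)
eval_kwargs.update(dict(metric=['bbox', 'segm']))
print(eval_kwargs)  # {'metric': ['bbox', 'segm'], 'classwise': True}
```
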
/tools/analysis_tools/get_flops.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
2 | import argparse
3 |
4 | import numpy as np
5 | import torch
6 | from mmcv import Config, DictAction
7 |
8 | from mmdet.models import build_detector
9 |
10 | try:
11 | from mmcv.cnn import get_model_complexity_info
12 | except ImportError:
13 | raise ImportError('Please upgrade mmcv to >0.6.2')
14 |
15 |
16 | def parse_args():
17 | parser = argparse.ArgumentParser(description='Train a detector')
18 | parser.add_argument('config', help='train config file path')
19 | parser.add_argument(
20 | '--shape',
21 | type=int,
22 | nargs='+',
23 | default=[1280, 800],
24 | help='input image size')
25 | parser.add_argument(
26 | '--cfg-options',
27 | nargs='+',
28 | action=DictAction,
29 | help='override some settings in the used config, the key-value pair '
30 | 'in xxx=yyy format will be merged into config file. If the value to '
31 | 'be overwritten is a list, it should be like key="[a,b]" or key=a,b '
32 | 'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
33 | 'Note that the quotation marks are necessary and that no white space '
34 | 'is allowed.')
35 | parser.add_argument(
36 | '--size-divisor',
37 | type=int,
38 | default=32,
39 |         help='Pad the input image to the smallest size that is divisible '
40 |         'by size_divisor; -1 means do not pad the image.')
41 | args = parser.parse_args()
42 | return args
43 |
44 |
45 | def main():
46 |
47 | args = parse_args()
48 |
49 | if len(args.shape) == 1:
50 | h = w = args.shape[0]
51 | elif len(args.shape) == 2:
52 | h, w = args.shape
53 | else:
54 | raise ValueError('invalid input shape')
55 | ori_shape = (3, h, w)
56 | divisor = args.size_divisor
57 | if divisor > 0:
58 | h = int(np.ceil(h / divisor)) * divisor
59 | w = int(np.ceil(w / divisor)) * divisor
60 |
61 | input_shape = (3, h, w)
62 |
63 | cfg = Config.fromfile(args.config)
64 | if args.cfg_options is not None:
65 | cfg.merge_from_dict(args.cfg_options)
66 |
67 | model = build_detector(
68 | cfg.model,
69 | train_cfg=cfg.get('train_cfg'),
70 | test_cfg=cfg.get('test_cfg'))
71 | if torch.cuda.is_available():
72 | model.cuda()
73 | model.eval()
74 |
75 | if hasattr(model, 'forward_dummy'):
76 | model.forward = model.forward_dummy
77 | else:
78 | raise NotImplementedError(
79 |             'FLOPs counter is currently not supported with {}'.
80 | format(model.__class__.__name__))
81 |
82 | flops, params = get_model_complexity_info(model, input_shape)
83 | split_line = '=' * 30
84 |
85 | if divisor > 0 and \
86 | input_shape != ori_shape:
87 |         print(f'{split_line}\nUsing the size divisor, the input shape is padded '
88 | f'from {ori_shape} to {input_shape}\n')
89 | print(f'{split_line}\nInput shape: {input_shape}\n'
90 | f'Flops: {flops}\nParams: {params}\n{split_line}')
91 | print('!!!Please be cautious if you use the results in papers. '
92 | 'You may need to check if all ops are supported and verify that the '
93 | 'flops computation is correct.')
94 |
95 |
96 | if __name__ == '__main__':
97 | main()
98 |
--------------------------------------------------------------------------------
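
The `--size-divisor` option pads height and width up to the next multiple of the divisor before counting FLOPs. The arithmetic, using the common 800x1333 test resolution as an example:

```python
import numpy as np

h, w, divisor = 800, 1333, 32
h_pad = int(np.ceil(h / divisor)) * divisor
w_pad = int(np.ceil(w / divisor)) * divisor
print((h_pad, w_pad))  # (800, 1344) -> FLOPs are counted on the padded shape
```
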
/tools/dataset_converters/cityscapes.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
2 | import argparse
3 | import glob
4 | import os.path as osp
5 |
6 | import cityscapesscripts.helpers.labels as CSLabels
7 | import mmcv
8 | import numpy as np
9 | import pycocotools.mask as maskUtils
10 |
11 |
12 | def collect_files(img_dir, gt_dir):
13 | suffix = 'leftImg8bit.png'
14 | files = []
15 | for img_file in glob.glob(osp.join(img_dir, '**/*.png')):
16 | assert img_file.endswith(suffix), img_file
17 | inst_file = gt_dir + img_file[
18 | len(img_dir):-len(suffix)] + 'gtFine_instanceIds.png'
19 | # Note that labelIds are not converted to trainId for seg map
20 | segm_file = gt_dir + img_file[
21 | len(img_dir):-len(suffix)] + 'gtFine_labelIds.png'
22 | files.append((img_file, inst_file, segm_file))
23 | assert len(files), f'No images found in {img_dir}'
24 | print(f'Loaded {len(files)} images from {img_dir}')
25 |
26 | return files
27 |
28 |
29 | def collect_annotations(files, nproc=1):
30 | print('Loading annotation images')
31 | if nproc > 1:
32 | images = mmcv.track_parallel_progress(
33 | load_img_info, files, nproc=nproc)
34 | else:
35 | images = mmcv.track_progress(load_img_info, files)
36 |
37 | return images
38 |
39 |
40 | def load_img_info(files):
41 | img_file, inst_file, segm_file = files
42 | inst_img = mmcv.imread(inst_file, 'unchanged')
43 | # ids < 24 are stuff labels (filtering them first is about 5% faster)
44 | unique_inst_ids = np.unique(inst_img[inst_img >= 24])
45 | anno_info = []
46 | for inst_id in unique_inst_ids:
47 | # For non-crowd annotations, inst_id // 1000 is the label_id
48 | # Crowd annotations have <1000 instance ids
49 | label_id = inst_id // 1000 if inst_id >= 1000 else inst_id
50 | label = CSLabels.id2label[label_id]
51 | if not label.hasInstances or label.ignoreInEval:
52 | continue
53 |
54 | category_id = label.id
55 | iscrowd = int(inst_id < 1000)
56 | mask = np.asarray(inst_img == inst_id, dtype=np.uint8, order='F')
57 | mask_rle = maskUtils.encode(mask[:, :, None])[0]
58 |
59 | area = maskUtils.area(mask_rle)
60 | # convert to COCO style XYWH format
61 | bbox = maskUtils.toBbox(mask_rle)
62 |
63 | # for json encoding
64 | mask_rle['counts'] = mask_rle['counts'].decode()
65 |
66 | anno = dict(
67 | iscrowd=iscrowd,
68 | category_id=category_id,
69 | bbox=bbox.tolist(),
70 | area=area.tolist(),
71 | segmentation=mask_rle)
72 | anno_info.append(anno)
73 | video_name = osp.basename(osp.dirname(img_file))
74 | img_info = dict(
75 | # remove img_prefix for filename
76 | file_name=osp.join(video_name, osp.basename(img_file)),
77 | height=inst_img.shape[0],
78 | width=inst_img.shape[1],
79 | anno_info=anno_info,
80 | segm_file=osp.join(video_name, osp.basename(segm_file)))
81 |
82 | return img_info
83 |
84 |
85 | def cvt_annotations(image_infos, out_json_name):
86 | out_json = dict()
87 | img_id = 0
88 | ann_id = 0
89 | out_json['images'] = []
90 | out_json['categories'] = []
91 | out_json['annotations'] = []
92 | for image_info in image_infos:
93 | image_info['id'] = img_id
94 | anno_infos = image_info.pop('anno_info')
95 | out_json['images'].append(image_info)
96 | for anno_info in anno_infos:
97 | anno_info['image_id'] = img_id
98 | anno_info['id'] = ann_id
99 | out_json['annotations'].append(anno_info)
100 | ann_id += 1
101 | img_id += 1
102 | for label in CSLabels.labels:
103 | if label.hasInstances and not label.ignoreInEval:
104 | cat = dict(id=label.id, name=label.name)
105 | out_json['categories'].append(cat)
106 |
107 | if len(out_json['annotations']) == 0:
108 | out_json.pop('annotations')
109 |
110 | mmcv.dump(out_json, out_json_name)
111 | return out_json
112 |
113 |
114 | def parse_args():
115 | parser = argparse.ArgumentParser(
116 | description='Convert Cityscapes annotations to COCO format')
117 | parser.add_argument('cityscapes_path', help='cityscapes data path')
118 | parser.add_argument('--img-dir', default='leftImg8bit', type=str)
119 | parser.add_argument('--gt-dir', default='gtFine', type=str)
120 | parser.add_argument('-o', '--out-dir', help='output path')
121 | parser.add_argument(
122 | '--nproc', default=1, type=int, help='number of process')
123 | args = parser.parse_args()
124 | return args
125 |
126 |
127 | def main():
128 | args = parse_args()
129 | cityscapes_path = args.cityscapes_path
130 | out_dir = args.out_dir if args.out_dir else cityscapes_path
131 | mmcv.mkdir_or_exist(out_dir)
132 |
133 | img_dir = osp.join(cityscapes_path, args.img_dir)
134 | gt_dir = osp.join(cityscapes_path, args.gt_dir)
135 |
136 | set_name = dict(
137 | train='instancesonly_filtered_gtFine_train.json',
138 | val='instancesonly_filtered_gtFine_val.json',
139 | test='instancesonly_filtered_gtFine_test.json')
140 |
141 | for split, json_name in set_name.items():
142 | print(f'Converting {split} into {json_name}')
143 | with mmcv.Timer(
144 | print_tmpl='It took {}s to convert Cityscapes annotation'):
145 | files = collect_files(
146 | osp.join(img_dir, split), osp.join(gt_dir, split))
147 | image_infos = collect_annotations(files, nproc=args.nproc)
148 | cvt_annotations(image_infos, osp.join(out_dir, json_name))
149 |
150 |
151 | if __name__ == '__main__':
152 | main()
153 |
--------------------------------------------------------------------------------
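
`load_img_info` relies on the Cityscapes instance-id convention: non-crowd instances are stored as `label_id * 1000 + instance_index`, while crowd regions keep the plain label id (< 1000). A plain-Python sketch of that arithmetic with two hypothetical ids:

```python
# Hypothetical ids: an instance (label 26) and a crowd region (label 24).
for inst_id in (26002, 24):
    label_id = inst_id // 1000 if inst_id >= 1000 else inst_id
    iscrowd = int(inst_id < 1000)
    print(inst_id, '->', label_id, 'iscrowd =', iscrowd)
# 26002 -> 26 iscrowd = 0
# 24 -> 24 iscrowd = 1
```
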
/tools/dataset_converters/images2coco.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
2 | import argparse
3 | import os
4 |
5 | import mmcv
6 | from PIL import Image
7 |
8 |
9 | def parse_args():
10 | parser = argparse.ArgumentParser(
11 | description='Convert images to coco format without annotations')
12 | parser.add_argument('img_path', help='The root path of images')
13 | parser.add_argument(
14 | 'classes', type=str, help='The text file name of storage class list')
15 | parser.add_argument(
16 | 'out',
17 | type=str,
18 |         help='The output annotation json file name; the file is saved to an '
19 |         '"annotations" directory next to img_path')
20 | parser.add_argument(
21 | '-e',
22 | '--exclude-extensions',
23 | type=str,
24 | nargs='+',
25 | help='The suffix of images to be excluded, such as "png" and "bmp"')
26 | args = parser.parse_args()
27 | return args
28 |
29 |
30 | def collect_image_infos(path, exclude_extensions=None):
31 | img_infos = []
32 |
33 | images_generator = mmcv.scandir(path, recursive=True)
34 | for image_path in mmcv.track_iter_progress(list(images_generator)):
35 |         if exclude_extensions is None or (
36 |                 # str.endswith needs a tuple, not the list given by argparse
37 |                 not image_path.lower().endswith(tuple(exclude_extensions))):
38 | image_path = os.path.join(path, image_path)
39 | img_pillow = Image.open(image_path)
40 | img_info = {
41 | 'filename': image_path,
42 | 'width': img_pillow.width,
43 | 'height': img_pillow.height,
44 | }
45 | img_infos.append(img_info)
46 | return img_infos
47 |
48 |
49 | def cvt_to_coco_json(img_infos, classes):
50 | image_id = 0
51 | coco = dict()
52 | coco['images'] = []
53 | coco['type'] = 'instance'
54 | coco['categories'] = []
55 | coco['annotations'] = []
56 | image_set = set()
57 |
58 | for category_id, name in enumerate(classes):
59 | category_item = dict()
60 | category_item['supercategory'] = str('none')
61 | category_item['id'] = int(category_id)
62 | category_item['name'] = str(name)
63 | coco['categories'].append(category_item)
64 |
65 | for img_dict in img_infos:
66 | file_name = img_dict['filename']
67 | assert file_name not in image_set
68 | image_item = dict()
69 | image_item['id'] = int(image_id)
70 | image_item['file_name'] = str(file_name)
71 | image_item['height'] = int(img_dict['height'])
72 | image_item['width'] = int(img_dict['width'])
73 | coco['images'].append(image_item)
74 | image_set.add(file_name)
75 |
76 | image_id += 1
77 | return coco
78 |
79 |
80 | def main():
81 | args = parse_args()
82 | assert args.out.endswith(
83 | 'json'), 'The output file name must be json suffix'
84 |
85 | # 1 load image list info
86 | img_infos = collect_image_infos(args.img_path, args.exclude_extensions)
87 |
88 | # 2 convert to coco format data
89 | classes = mmcv.list_from_file(args.classes)
90 | coco_info = cvt_to_coco_json(img_infos, classes)
91 |
92 | # 3 dump
93 | save_dir = os.path.join(args.img_path, '..', 'annotations')
94 | mmcv.mkdir_or_exist(save_dir)
95 | save_path = os.path.join(save_dir, args.out)
96 | mmcv.dump(coco_info, save_path)
97 | print(f'save json file: {save_path}')
98 |
99 |
100 | if __name__ == '__main__':
101 | main()
102 |
--------------------------------------------------------------------------------
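
`cvt_to_coco_json` emits a COCO-style dict that carries only image metadata and category entries; the annotation list stays empty. A sketch of the resulting structure with hypothetical class names and a single image:

```python
# Shape of the annotation-free COCO dict, with made-up entries.
coco = dict(
    images=[dict(id=0, file_name='imgs/0001.jpg', height=480, width=640)],
    type='instance',
    categories=[dict(id=0, name='person', supercategory='none'),
                dict(id=1, name='car', supercategory='none')],
    annotations=[],  # intentionally empty: only image metadata is exported
)
print(len(coco['images']), len(coco['categories']))  # 1 2
```
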
/tools/deployment/mmdet2torchserve.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
2 | from argparse import ArgumentParser, Namespace
3 | from pathlib import Path
4 | from tempfile import TemporaryDirectory
5 |
6 | import mmcv
7 |
8 | try:
9 | from model_archiver.model_packaging import package_model
10 | from model_archiver.model_packaging_utils import ModelExportUtils
11 | except ImportError:
12 | package_model = None
13 |
14 |
15 | def mmdet2torchserve(
16 | config_file: str,
17 | checkpoint_file: str,
18 | output_folder: str,
19 | model_name: str,
20 | model_version: str = '1.0',
21 | force: bool = False,
22 | ):
23 | """Converts MMDetection model (config + checkpoint) to TorchServe `.mar`.
24 |
25 | Args:
26 | config_file:
27 | In MMDetection config format.
28 | The contents vary for each task repository.
29 | checkpoint_file:
30 | In MMDetection checkpoint format.
31 | The contents vary for each task repository.
32 | output_folder:
33 | Folder where `{model_name}.mar` will be created.
34 | The file created will be in TorchServe archive format.
35 | model_name:
36 | If not None, used for naming the `{model_name}.mar` file
37 | that will be created under `output_folder`.
38 | If None, `{Path(checkpoint_file).stem}` will be used.
39 | model_version:
40 | Model's version.
41 | force:
42 | If True, if there is an existing `{model_name}.mar`
43 | file under `output_folder` it will be overwritten.
44 | """
45 | mmcv.mkdir_or_exist(output_folder)
46 |
47 | config = mmcv.Config.fromfile(config_file)
48 |
49 | with TemporaryDirectory() as tmpdir:
50 | config.dump(f'{tmpdir}/config.py')
51 |
52 | args = Namespace(
53 | **{
54 | 'model_file': f'{tmpdir}/config.py',
55 | 'serialized_file': checkpoint_file,
56 | 'handler': f'{Path(__file__).parent}/mmdet_handler.py',
57 | 'model_name': model_name or Path(checkpoint_file).stem,
58 | 'version': model_version,
59 | 'export_path': output_folder,
60 | 'force': force,
61 | 'requirements_file': None,
62 | 'extra_files': None,
63 | 'runtime': 'python',
64 | 'archive_format': 'default'
65 | })
66 | manifest = ModelExportUtils.generate_manifest_json(args)
67 | package_model(args, manifest)
68 |
69 |
70 | def parse_args():
71 | parser = ArgumentParser(
72 | description='Convert MMDetection models to TorchServe `.mar` format.')
73 | parser.add_argument('config', type=str, help='config file path')
74 | parser.add_argument('checkpoint', type=str, help='checkpoint file path')
75 | parser.add_argument(
76 | '--output-folder',
77 | type=str,
78 | required=True,
79 | help='Folder where `{model_name}.mar` will be created.')
80 | parser.add_argument(
81 | '--model-name',
82 | type=str,
83 | default=None,
84 |         help='If not None, used for naming the `{model_name}.mar` '
85 |         'file that will be created under `output_folder`. '
86 | 'If None, `{Path(checkpoint_file).stem}` will be used.')
87 | parser.add_argument(
88 | '--model-version',
89 | type=str,
90 | default='1.0',
91 | help='Number used for versioning.')
92 | parser.add_argument(
93 | '-f',
94 | '--force',
95 | action='store_true',
96 | help='overwrite the existing `{model_name}.mar`')
97 | args = parser.parse_args()
98 |
99 | return args
100 |
101 |
102 | if __name__ == '__main__':
103 | args = parse_args()
104 |
105 | if package_model is None:
106 |         raise ImportError('`torch-model-archiver` is required. '
107 | 'Try: pip install torch-model-archiver')
108 |
109 | mmdet2torchserve(args.config, args.checkpoint, args.output_folder,
110 | args.model_name, args.model_version, args.force)
111 |
--------------------------------------------------------------------------------
/tools/deployment/mmdet_handler.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
2 | import base64
3 | import os
4 |
5 | import mmcv
6 | import torch
7 | from ts.torch_handler.base_handler import BaseHandler
8 |
9 | from mmdet.apis import inference_detector, init_detector
10 |
11 |
12 | class MMdetHandler(BaseHandler):
13 | threshold = 0.5
14 |
15 | def initialize(self, context):
16 | properties = context.system_properties
17 | self.map_location = 'cuda' if torch.cuda.is_available() else 'cpu'
18 | self.device = torch.device(self.map_location + ':' +
19 | str(properties.get('gpu_id')) if torch.cuda.
20 | is_available() else self.map_location)
21 | self.manifest = context.manifest
22 |
23 | model_dir = properties.get('model_dir')
24 | serialized_file = self.manifest['model']['serializedFile']
25 | checkpoint = os.path.join(model_dir, serialized_file)
26 | self.config_file = os.path.join(model_dir, 'config.py')
27 |
28 | self.model = init_detector(self.config_file, checkpoint, self.device)
29 | self.initialized = True
30 |
31 | def preprocess(self, data):
32 | images = []
33 |
34 | for row in data:
35 | image = row.get('data') or row.get('body')
36 | if isinstance(image, str):
37 | image = base64.b64decode(image)
38 | image = mmcv.imfrombytes(image)
39 | images.append(image)
40 |
41 | return images
42 |
43 | def inference(self, data, *args, **kwargs):
44 | results = inference_detector(self.model, data)
45 | return results
46 |
47 | def postprocess(self, data):
48 | # Format output following the example ObjectDetectionHandler format
49 | output = []
50 | for image_index, image_result in enumerate(data):
51 | output.append([])
52 | if isinstance(image_result, tuple):
53 | bbox_result, segm_result = image_result
54 | if isinstance(segm_result, tuple):
55 | segm_result = segm_result[0] # ms rcnn
56 | else:
57 | bbox_result, segm_result = image_result, None
58 |
59 | for class_index, class_result in enumerate(bbox_result):
60 | class_name = self.model.CLASSES[class_index]
61 | for bbox in class_result:
62 | bbox_coords = bbox[:-1].tolist()
63 | score = float(bbox[-1])
64 | if score >= self.threshold:
65 | output[image_index].append({
66 | 'class_name': class_name,
67 | 'bbox': bbox_coords,
68 | 'score': score
69 | })
70 |
71 | return output
72 |
--------------------------------------------------------------------------------
/tools/deployment/test.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
2 | import argparse
3 | import warnings
4 |
5 | import mmcv
6 | from mmcv import Config, DictAction
7 | from mmcv.parallel import MMDataParallel
8 |
9 | from mmdet.apis import single_gpu_test
10 | from mmdet.datasets import (build_dataloader, build_dataset,
11 | replace_ImageToTensor)
12 |
13 |
14 | def parse_args():
15 | parser = argparse.ArgumentParser(
16 | description='MMDet test (and eval) an ONNX model using ONNXRuntime')
17 | parser.add_argument('config', help='test config file path')
18 | parser.add_argument('model', help='Input model file')
19 | parser.add_argument('--out', help='output result file in pickle format')
20 | parser.add_argument(
21 | '--format-only',
22 | action='store_true',
23 |         help='Format the output results without performing evaluation. It is '
24 |         'useful when you want to format the results into a specific format and '
25 | 'submit it to the test server')
26 | parser.add_argument(
27 | '--backend',
28 | required=True,
29 | choices=['onnxruntime', 'tensorrt'],
30 | help='Backend for input model to run. ')
31 | parser.add_argument(
32 | '--eval',
33 | type=str,
34 | nargs='+',
35 | help='evaluation metrics, which depends on the dataset, e.g., "bbox",'
36 | ' "segm", "proposal" for COCO, and "mAP", "recall" for PASCAL VOC')
37 | parser.add_argument('--show', action='store_true', help='show results')
38 | parser.add_argument(
39 | '--show-dir', help='directory where painted images will be saved')
40 | parser.add_argument(
41 | '--show-score-thr',
42 | type=float,
43 | default=0.3,
44 | help='score threshold (default: 0.3)')
45 | parser.add_argument(
46 | '--cfg-options',
47 | nargs='+',
48 | action=DictAction,
49 | help='override some settings in the used config, the key-value pair '
50 | 'in xxx=yyy format will be merged into config file. If the value to '
51 | 'be overwritten is a list, it should be like key="[a,b]" or key=a,b '
52 | 'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
53 | 'Note that the quotation marks are necessary and that no white space '
54 | 'is allowed.')
55 | parser.add_argument(
56 | '--eval-options',
57 | nargs='+',
58 | action=DictAction,
59 | help='custom options for evaluation, the key-value pair in xxx=yyy '
60 | 'format will be kwargs for dataset.evaluate() function')
61 |
62 | args = parser.parse_args()
63 | return args
64 |
65 |
66 | def main():
67 | args = parse_args()
68 |
69 | assert args.out or args.eval or args.format_only or args.show \
70 | or args.show_dir, \
71 | ('Please specify at least one operation (save/eval/format/show the '
72 | 'results / save the results) with the argument "--out", "--eval"'
73 | ', "--format-only", "--show" or "--show-dir"')
74 |
75 | if args.eval and args.format_only:
76 | raise ValueError('--eval and --format_only cannot be both specified')
77 |
78 | if args.out is not None and not args.out.endswith(('.pkl', '.pickle')):
79 | raise ValueError('The output file must be a pkl file.')
80 |
81 | cfg = Config.fromfile(args.config)
82 | if args.cfg_options is not None:
83 | cfg.merge_from_dict(args.cfg_options)
84 |
85 | # in case the test dataset is concatenated
86 | samples_per_gpu = 1
87 | if isinstance(cfg.data.test, dict):
88 | cfg.data.test.test_mode = True
89 | samples_per_gpu = cfg.data.test.pop('samples_per_gpu', 1)
90 | if samples_per_gpu > 1:
91 | # Replace 'ImageToTensor' to 'DefaultFormatBundle'
92 | cfg.data.test.pipeline = replace_ImageToTensor(
93 | cfg.data.test.pipeline)
94 | elif isinstance(cfg.data.test, list):
95 | for ds_cfg in cfg.data.test:
96 | ds_cfg.test_mode = True
97 | samples_per_gpu = max(
98 | [ds_cfg.pop('samples_per_gpu', 1) for ds_cfg in cfg.data.test])
99 | if samples_per_gpu > 1:
100 | for ds_cfg in cfg.data.test:
101 | ds_cfg.pipeline = replace_ImageToTensor(ds_cfg.pipeline)
102 |
103 | # build the dataloader
104 | dataset = build_dataset(cfg.data.test)
105 | data_loader = build_dataloader(
106 | dataset,
107 | samples_per_gpu=samples_per_gpu,
108 | workers_per_gpu=cfg.data.workers_per_gpu,
109 | dist=False,
110 | shuffle=False)
111 |
112 | if args.backend == 'onnxruntime':
113 | from mmdet.core.export.model_wrappers import ONNXRuntimeDetector
114 | model = ONNXRuntimeDetector(
115 | args.model, class_names=dataset.CLASSES, device_id=0)
116 | elif args.backend == 'tensorrt':
117 | from mmdet.core.export.model_wrappers import TensorRTDetector
118 | model = TensorRTDetector(
119 | args.model, class_names=dataset.CLASSES, device_id=0)
120 |
121 | model = MMDataParallel(model, device_ids=[0])
122 | outputs = single_gpu_test(model, data_loader, args.show, args.show_dir,
123 | args.show_score_thr)
124 |
125 | if args.out:
126 | print(f'\nwriting results to {args.out}')
127 | mmcv.dump(outputs, args.out)
128 | kwargs = {} if args.eval_options is None else args.eval_options
129 | if args.format_only:
130 | dataset.format_results(outputs, **kwargs)
131 | if args.eval:
132 | eval_kwargs = cfg.get('evaluation', {}).copy()
133 | # hard-code way to remove EvalHook args
134 | for key in [
135 | 'interval', 'tmpdir', 'start', 'gpu_collect', 'save_best',
136 | 'rule'
137 | ]:
138 | eval_kwargs.pop(key, None)
139 | eval_kwargs.update(dict(metric=args.eval, **kwargs))
140 | print(dataset.evaluate(outputs, **eval_kwargs))
141 |
142 |
143 | if __name__ == '__main__':
144 | main()
145 |
146 | # Following strings of text style are from colorama package
147 | bright_style, reset_style = '\x1b[1m', '\x1b[0m'
148 | red_text, blue_text = '\x1b[31m', '\x1b[34m'
149 | white_background = '\x1b[107m'
150 |
151 | msg = white_background + bright_style + red_text
152 | msg += 'DeprecationWarning: This tool will be deprecated in the future. '
153 | msg += blue_text + 'Welcome to use the unified model deployment toolbox '
154 | msg += 'MMDeploy: https://github.com/open-mmlab/mmdeploy'
155 | msg += reset_style
156 | warnings.warn(msg)
157 |
--------------------------------------------------------------------------------
/tools/deployment/test_torchserver.py:
--------------------------------------------------------------------------------
1 | from argparse import ArgumentParser
2 |
3 | import numpy as np
4 | import requests
5 |
6 | from mmdet.apis import inference_detector, init_detector, show_result_pyplot
7 | from mmdet.core import bbox2result
8 |
9 |
10 | def parse_args():
11 | parser = ArgumentParser()
12 | parser.add_argument('img', help='Image file')
13 | parser.add_argument('config', help='Config file')
14 | parser.add_argument('checkpoint', help='Checkpoint file')
15 | parser.add_argument('model_name', help='The model name in the server')
16 | parser.add_argument(
17 | '--inference-addr',
18 | default='127.0.0.1:8080',
19 | help='Address and port of the inference server')
20 | parser.add_argument(
21 | '--device', default='cuda:0', help='Device used for inference')
22 | parser.add_argument(
23 | '--score-thr', type=float, default=0.5, help='bbox score threshold')
24 | args = parser.parse_args()
25 | return args
26 |
27 |
28 | def parse_result(input, model_class):
29 | bbox = []
30 | label = []
31 | score = []
32 | for anchor in input:
33 | bbox.append(anchor['bbox'])
34 | label.append(model_class.index(anchor['class_name']))
35 | score.append([anchor['score']])
36 | bboxes = np.append(bbox, score, axis=1)
37 | labels = np.array(label)
38 | result = bbox2result(bboxes, labels, len(model_class))
39 | return result
40 |
41 |
42 | def main(args):
43 | # build the model from a config file and a checkpoint file
44 | model = init_detector(args.config, args.checkpoint, device=args.device)
45 | # test a single image
46 | model_result = inference_detector(model, args.img)
47 | for i, anchor_set in enumerate(model_result):
48 | anchor_set = anchor_set[anchor_set[:, 4] >= 0.5]
49 | model_result[i] = anchor_set
50 | # show the results
51 | show_result_pyplot(
52 | model,
53 | args.img,
54 | model_result,
55 | score_thr=args.score_thr,
56 | title='pytorch_result')
57 | url = 'http://' + args.inference_addr + '/predictions/' + args.model_name
58 | with open(args.img, 'rb') as image:
59 | response = requests.post(url, image)
60 | server_result = parse_result(response.json(), model.CLASSES)
61 | show_result_pyplot(
62 | model,
63 | args.img,
64 | server_result,
65 | score_thr=args.score_thr,
66 | title='server_result')
67 |
68 | for i in range(len(model.CLASSES)):
69 | assert np.allclose(model_result[i], server_result[i])
70 |
71 |
72 | if __name__ == '__main__':
73 | args = parse_args()
74 | main(args)
75 |
--------------------------------------------------------------------------------
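
`parse_result` turns the TorchServe response back into `(x1, y1, x2, y2, score)` rows before regrouping them per class with `bbox2result`. A numpy-only sketch of the stitching step (the class grouping is omitted and the response entries are made up):

```python
import numpy as np

# Hypothetical TorchServe response entries.
detections = [
    {'class_name': 'person', 'bbox': [10, 20, 50, 80], 'score': 0.9},
    {'class_name': 'car', 'bbox': [5, 5, 40, 30], 'score': 0.7},
]
bboxes = np.array([d['bbox'] for d in detections], dtype=np.float32)   # (N, 4)
scores = np.array([[d['score']] for d in detections], dtype=np.float32)  # (N, 1)
print(np.hstack([bboxes, scores]))  # (N, 5) rows of x1, y1, x2, y2, score
```
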
/tools/dist_test.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | CONFIG=$1
4 | CHECKPOINT=$2
5 | GPUS=$3
6 | NNODES=${NNODES:-1}
7 | NODE_RANK=${NODE_RANK:-0}
8 | PORT=${PORT:-29501}
9 | MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"}
10 |
11 | PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
12 | python -m torch.distributed.launch \
13 | --nnodes=$NNODES \
14 | --node_rank=$NODE_RANK \
15 | --master_addr=$MASTER_ADDR \
16 | --nproc_per_node=$GPUS \
17 | --master_port=$PORT \
18 | $(dirname "$0")/test.py \
19 | $CONFIG \
20 | $CHECKPOINT \
21 | --eval "bbox" "segm" \
22 | --launcher pytorch \
23 | ${@:4}
24 |
--------------------------------------------------------------------------------
/tools/dist_train.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | CONFIG=$1
4 | GPUS=$2
5 | NNODES=${NNODES:-1}
6 | NODE_RANK=${NODE_RANK:-0}
7 | PORT=${PORT:-29544}
8 | MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"}
9 |
10 | PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
11 | python -m torch.distributed.launch \
12 | --nnodes=$NNODES \
13 | --node_rank=$NODE_RANK \
14 | --master_addr=$MASTER_ADDR \
15 | --nproc_per_node=$GPUS \
16 | --master_port=$PORT \
17 | $(dirname "$0")/train.py \
18 | $CONFIG \
19 | --seed 0 \
20 | --launcher pytorch ${@:3}
21 |
--------------------------------------------------------------------------------
/tools/misc/browse_dataset.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
2 | import argparse
3 | import os
4 | from collections.abc import Sequence
5 | from pathlib import Path
6 |
7 | import mmcv
8 | import numpy as np
9 | from mmcv import Config, DictAction
10 |
11 | from mmdet.core.utils import mask2ndarray
12 | from mmdet.core.visualization import imshow_det_bboxes
13 | from mmdet.datasets.builder import build_dataset
14 | from mmdet.utils import update_data_root
15 |
16 |
17 | def parse_args():
18 | parser = argparse.ArgumentParser(description='Browse a dataset')
19 | parser.add_argument('config', help='train config file path')
20 | parser.add_argument(
21 | '--skip-type',
22 | type=str,
23 | nargs='+',
24 | default=['DefaultFormatBundle', 'Normalize', 'Collect'],
25 |         help='skip pipeline steps that are not needed for visualization')
26 | parser.add_argument(
27 | '--output-dir',
28 | default=None,
29 | type=str,
30 | help='If there is no display interface, you can save it')
31 | parser.add_argument('--not-show', default=False, action='store_true')
32 | parser.add_argument(
33 | '--show-interval',
34 | type=float,
35 | default=2,
36 | help='the interval of show (s)')
37 | parser.add_argument(
38 | '--cfg-options',
39 | nargs='+',
40 | action=DictAction,
41 | help='override some settings in the used config, the key-value pair '
42 | 'in xxx=yyy format will be merged into config file. If the value to '
43 | 'be overwritten is a list, it should be like key="[a,b]" or key=a,b '
44 | 'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
45 | 'Note that the quotation marks are necessary and that no white space '
46 | 'is allowed.')
47 | args = parser.parse_args()
48 | return args
49 |
50 |
51 | def retrieve_data_cfg(config_path, skip_type, cfg_options):
52 |
53 | def skip_pipeline_steps(config):
54 | config['pipeline'] = [
55 | x for x in config.pipeline if x['type'] not in skip_type
56 | ]
57 |
58 | cfg = Config.fromfile(config_path)
59 |
60 | # update data root according to MMDET_DATASETS
61 | update_data_root(cfg)
62 |
63 | if cfg_options is not None:
64 | cfg.merge_from_dict(cfg_options)
65 | train_data_cfg = cfg.data.train
66 | while 'dataset' in train_data_cfg and train_data_cfg[
67 | 'type'] != 'MultiImageMixDataset':
68 | train_data_cfg = train_data_cfg['dataset']
69 |
70 | if isinstance(train_data_cfg, Sequence):
71 | [skip_pipeline_steps(c) for c in train_data_cfg]
72 | else:
73 | skip_pipeline_steps(train_data_cfg)
74 |
75 | return cfg
76 |
77 |
78 | def main():
79 | args = parse_args()
80 | cfg = retrieve_data_cfg(args.config, args.skip_type, args.cfg_options)
81 |
82 | if 'gt_semantic_seg' in cfg.train_pipeline[-1]['keys']:
83 | cfg.data.train.pipeline = [
84 | p for p in cfg.data.train.pipeline if p['type'] != 'SegRescale'
85 | ]
86 | dataset = build_dataset(cfg.data.train)
87 |
88 | progress_bar = mmcv.ProgressBar(len(dataset))
89 |
90 | for item in dataset:
91 | filename = os.path.join(args.output_dir,
92 | Path(item['filename']).name
93 | ) if args.output_dir is not None else None
94 |
95 | gt_bboxes = item['gt_bboxes']
96 | gt_labels = item['gt_labels']
97 | gt_masks = item.get('gt_masks', None)
98 | if gt_masks is not None:
99 | gt_masks = mask2ndarray(gt_masks)
100 |
101 | gt_seg = item.get('gt_semantic_seg', None)
102 | if gt_seg is not None:
103 | pad_value = 255 # the padding value of gt_seg
104 | sem_labels = np.unique(gt_seg)
105 | all_labels = np.concatenate((gt_labels, sem_labels), axis=0)
106 | all_labels, counts = np.unique(all_labels, return_counts=True)
107 | stuff_labels = all_labels[np.logical_and(counts < 2,
108 | all_labels != pad_value)]
109 | stuff_masks = gt_seg[None] == stuff_labels[:, None, None]
110 | gt_labels = np.concatenate((gt_labels, stuff_labels), axis=0)
111 | gt_masks = np.concatenate((gt_masks, stuff_masks.astype(np.uint8)),
112 | axis=0)
113 | # If you need to show the bounding boxes,
114 | # please comment the following line
115 | gt_bboxes = None
116 |
117 | imshow_det_bboxes(
118 | item['img'],
119 | gt_bboxes,
120 | gt_labels,
121 | gt_masks,
122 | class_names=dataset.CLASSES,
123 | show=not args.not_show,
124 | wait_time=args.show_interval,
125 | out_file=filename,
126 | bbox_color=dataset.PALETTE,
127 | text_color=(200, 200, 200),
128 | mask_color=dataset.PALETTE)
129 |
130 | progress_bar.update()
131 |
132 |
133 | if __name__ == '__main__':
134 | main()
135 |
--------------------------------------------------------------------------------
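
To visualize stuff regions, the script derives stuff labels as those present in `gt_semantic_seg` but absent from the thing labels, using a concatenate-and-count trick. A small numpy sketch of that step with hypothetical label ids:

```python
import numpy as np

gt_labels = np.array([0, 2])               # hypothetical thing labels in the image
sem_labels = np.array([0, 2, 7, 11, 255])  # labels present in gt_semantic_seg
pad_value = 255                            # padding value of the semantic map

# Labels that occur only once after concatenation appear in the semantic map
# but not among the thing labels, so they are treated as stuff.
all_labels, counts = np.unique(
    np.concatenate([gt_labels, sem_labels]), return_counts=True)
stuff_labels = all_labels[np.logical_and(counts < 2, all_labels != pad_value)]
print(stuff_labels)  # [ 7 11]
```
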
/tools/misc/download_dataset.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | from itertools import repeat
3 | from multiprocessing.pool import ThreadPool
4 | from pathlib import Path
5 | from tarfile import TarFile
6 | from zipfile import ZipFile
7 |
8 | import torch
9 |
10 |
11 | def parse_args():
12 | parser = argparse.ArgumentParser(
13 | description='Download datasets for training')
14 | parser.add_argument(
15 | '--dataset-name', type=str, help='dataset name', default='coco2017')
16 | parser.add_argument(
17 | '--save-dir',
18 | type=str,
19 | help='the dir to save dataset',
20 | default='data/coco')
21 | parser.add_argument(
22 | '--unzip',
23 | action='store_true',
24 |         help='whether to unzip the dataset; the downloaded archives are kept')
25 | parser.add_argument(
26 | '--delete',
27 | action='store_true',
28 | help='delete the download zipped files')
29 | parser.add_argument(
30 |         '--threads', type=int, help='number of download threads', default=4)
31 | args = parser.parse_args()
32 | return args
33 |
34 |
35 | def download(url, dir, unzip=True, delete=False, threads=1):
36 |
37 | def download_one(url, dir):
38 | f = dir / Path(url).name
39 | if Path(url).is_file():
40 | Path(url).rename(f)
41 | elif not f.exists():
42 | print('Downloading {} to {}'.format(url, f))
43 | torch.hub.download_url_to_file(url, f, progress=True)
44 | if unzip and f.suffix in ('.zip', '.tar'):
45 | print('Unzipping {}'.format(f.name))
46 | if f.suffix == '.zip':
47 | ZipFile(f).extractall(path=dir)
48 | elif f.suffix == '.tar':
49 | TarFile(f).extractall(path=dir)
50 | if delete:
51 | f.unlink()
52 |                 print('Deleted {}'.format(f))
53 |
54 | dir = Path(dir)
55 | if threads > 1:
56 | pool = ThreadPool(threads)
57 | pool.imap(lambda x: download_one(*x), zip(url, repeat(dir)))
58 | pool.close()
59 | pool.join()
60 | else:
61 | for u in [url] if isinstance(url, (str, Path)) else url:
62 | download_one(u, dir)
63 |
64 |
65 | def main():
66 | args = parse_args()
67 | path = Path(args.save_dir)
68 | if not path.exists():
69 | path.mkdir(parents=True, exist_ok=True)
70 | data2url = dict(
71 | # TODO: Support for downloading Panoptic Segmentation of COCO
72 | coco2017=[
73 | 'http://images.cocodataset.org/zips/train2017.zip',
74 | 'http://images.cocodataset.org/zips/val2017.zip',
75 | 'http://images.cocodataset.org/zips/test2017.zip',
76 | 'http://images.cocodataset.org/annotations/' +
77 | 'annotations_trainval2017.zip'
78 | ],
79 | lvis=[
80 | 'https://s3-us-west-2.amazonaws.com/dl.fbaipublicfiles.com/LVIS/lvis_v1_train.json.zip', # noqa
81 |             'https://s3-us-west-2.amazonaws.com/dl.fbaipublicfiles.com/LVIS/lvis_v1_val.json.zip', # noqa
82 | ],
83 | voc2007=[
84 | 'http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar', # noqa
85 | 'http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar', # noqa
86 | 'http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCdevkit_08-Jun-2007.tar', # noqa
87 | ],
88 | )
89 | url = data2url.get(args.dataset_name, None)
90 | if url is None:
91 |         print('Only COCO, VOC, and LVIS are supported now!')
92 | return
93 | download(
94 | url,
95 | dir=path,
96 | unzip=args.unzip,
97 | delete=args.delete,
98 | threads=args.threads)
99 |
100 |
101 | if __name__ == '__main__':
102 | main()
103 |
--------------------------------------------------------------------------------
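A typical invocation of `tools/misc/download_dataset.py` above, assembled from its own argparse flags; the dataset name and save directory shown are simply the script's defaults and purely illustrative:

```bash
# Download COCO 2017 into data/coco and unpack the archives with 4 threads
# (values shown are the script's defaults, used here only as an example)
python tools/misc/download_dataset.py \
    --dataset-name coco2017 \
    --save-dir data/coco \
    --unzip \
    --threads 4
```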
/tools/misc/gen_coco_panoptic_test_info.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | import os.path as osp
3 |
4 | import mmcv
5 |
6 |
7 | def parse_args():
8 | parser = argparse.ArgumentParser(
9 | description='Generate COCO test image information '
10 | 'for COCO panoptic segmentation.')
11 | parser.add_argument('data_root', help='Path to COCO annotation directory.')
12 | args = parser.parse_args()
13 |
14 | return args
15 |
16 |
17 | def main():
18 | args = parse_args()
19 | data_root = args.data_root
20 | val_info = mmcv.load(osp.join(data_root, 'panoptic_val2017.json'))
21 | test_old_info = mmcv.load(
22 | osp.join(data_root, 'image_info_test-dev2017.json'))
23 |
24 | # replace categories from image_info_test-dev2017.json
25 | # with categories from panoptic_val2017.json which
26 | # has attribute `isthing`.
27 | test_info = test_old_info
28 | test_info.update({'categories': val_info['categories']})
29 | mmcv.dump(test_info,
30 | osp.join(data_root, 'panoptic_image_info_test-dev2017.json'))
31 |
32 |
33 | if __name__ == '__main__':
34 | main()
35 |
--------------------------------------------------------------------------------
/tools/misc/get_image_metas.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
2 | """Get test image metas on a specific dataset.
3 |
4 | Here is an example to run this script.
5 |
6 | Example:
7 | python tools/misc/get_image_metas.py ${CONFIG} \
8 | --out ${OUTPUT FILE NAME}
9 | """
10 | import argparse
11 | import csv
12 | import os.path as osp
13 | from multiprocessing import Pool
14 |
15 | import mmcv
16 | from mmcv import Config
17 |
18 |
19 | def parse_args():
20 | parser = argparse.ArgumentParser(description='Collect image metas')
21 | parser.add_argument('config', help='Config file path')
22 | parser.add_argument(
23 | '--out',
24 | default='validation-image-metas.pkl',
25 |         help='The output image metas file name. It is saved in the '
26 |         'same directory as the `dataset.ann_file` path')
27 | parser.add_argument(
28 | '--nproc',
29 | default=4,
30 | type=int,
31 |         help='number of processes used to get image metas')
32 | args = parser.parse_args()
33 | return args
34 |
35 |
36 | def get_metas_from_csv_style_ann_file(ann_file):
37 | data_infos = []
38 | cp_filename = None
39 | with open(ann_file, 'r') as f:
40 | reader = csv.reader(f)
41 | for i, line in enumerate(reader):
42 | if i == 0:
43 | continue
44 | img_id = line[0]
45 | filename = f'{img_id}.jpg'
46 | if filename != cp_filename:
47 | data_infos.append(dict(filename=filename))
48 | cp_filename = filename
49 | return data_infos
50 |
51 |
52 | def get_metas_from_txt_style_ann_file(ann_file):
53 | with open(ann_file) as f:
54 | lines = f.readlines()
55 | i = 0
56 | data_infos = []
57 | while i < len(lines):
58 | filename = lines[i].rstrip()
59 | data_infos.append(dict(filename=filename))
60 | skip_lines = int(lines[i + 2]) + 3
61 | i += skip_lines
62 | return data_infos
63 |
64 |
65 | def get_image_metas(data_info, img_prefix):
66 | file_client = mmcv.FileClient(backend='disk')
67 | filename = data_info.get('filename', None)
68 | if filename is not None:
69 | if img_prefix is not None:
70 | filename = osp.join(img_prefix, filename)
71 | img_bytes = file_client.get(filename)
72 | img = mmcv.imfrombytes(img_bytes, flag='color')
73 | meta = dict(filename=filename, ori_shape=img.shape)
74 | else:
75 | raise NotImplementedError('Missing `filename` in data_info')
76 | return meta
77 |
78 |
79 | def main():
80 | args = parse_args()
81 |     assert args.out.endswith('pkl'), 'The output file name must have a .pkl suffix'
82 |
83 | # load config files
84 | cfg = Config.fromfile(args.config)
85 | ann_file = cfg.data.test.ann_file
86 | img_prefix = cfg.data.test.img_prefix
87 |
88 | print(f'{"-" * 5} Start Processing {"-" * 5}')
89 | if ann_file.endswith('csv'):
90 | data_infos = get_metas_from_csv_style_ann_file(ann_file)
91 | elif ann_file.endswith('txt'):
92 | data_infos = get_metas_from_txt_style_ann_file(ann_file)
93 | else:
94 |         suffix = ann_file.split('.')[-1]
95 |         raise NotImplementedError('The annotation file must have a csv or '
96 |                                   f'txt suffix, but got {suffix}')
97 |
98 | print(f'Successfully load annotation file from {ann_file}')
99 | print(f'Processing {len(data_infos)} images...')
100 | pool = Pool(args.nproc)
101 | # get image metas with multiple processes
102 | image_metas = pool.starmap(
103 | get_image_metas,
104 | zip(data_infos, [img_prefix for _ in range(len(data_infos))]),
105 | )
106 | pool.close()
107 |
108 | # save image metas
109 | root_path = cfg.data.test.ann_file.rsplit('/', 1)[0]
110 | save_path = osp.join(root_path, args.out)
111 | mmcv.dump(image_metas, save_path)
112 |     print(f'Image meta file saved to: {save_path}')
113 |
114 |
115 | if __name__ == '__main__':
116 | main()
117 |
--------------------------------------------------------------------------------
/tools/misc/print_config.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
2 | import argparse
3 | import warnings
4 |
5 | from mmcv import Config, DictAction
6 |
7 | from mmdet.utils import update_data_root
8 |
9 |
10 | def parse_args():
11 | parser = argparse.ArgumentParser(description='Print the whole config')
12 | parser.add_argument('config', help='config file path')
13 | parser.add_argument(
14 | '--options',
15 | nargs='+',
16 | action=DictAction,
17 |         help='override some settings in the used config, the key-value pair '
18 |         'in xxx=yyy format will be merged into the config file (deprecated), '
19 |         'use --cfg-options instead.')
20 | parser.add_argument(
21 | '--cfg-options',
22 | nargs='+',
23 | action=DictAction,
24 | help='override some settings in the used config, the key-value pair '
25 | 'in xxx=yyy format will be merged into config file. If the value to '
26 | 'be overwritten is a list, it should be like key="[a,b]" or key=a,b '
27 | 'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
28 | 'Note that the quotation marks are necessary and that no white space '
29 | 'is allowed.')
30 | args = parser.parse_args()
31 |
32 | if args.options and args.cfg_options:
33 | raise ValueError(
34 | '--options and --cfg-options cannot be both '
35 | 'specified, --options is deprecated in favor of --cfg-options')
36 | if args.options:
37 | warnings.warn('--options is deprecated in favor of --cfg-options')
38 | args.cfg_options = args.options
39 |
40 | return args
41 |
42 |
43 | def main():
44 | args = parse_args()
45 |
46 | cfg = Config.fromfile(args.config)
47 |
48 | # update data root according to MMDET_DATASETS
49 | update_data_root(cfg)
50 |
51 | if args.cfg_options is not None:
52 | cfg.merge_from_dict(args.cfg_options)
53 | print(f'Config:\n{cfg.pretty_text}')
54 |
55 |
56 | if __name__ == '__main__':
57 | main()
58 |
--------------------------------------------------------------------------------
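For reference, `tools/misc/print_config.py` above can be run as follows; the config path comes from this repository, while the `--cfg-options` override key is only an illustrative example, not a required setting:

```bash
# Print the fully merged config, overriding a single field on the command line
# (the data.samples_per_gpu override is an illustrative example)
python tools/misc/print_config.py \
    configs/openinst/queryinst_r50_1x_coco.py \
    --cfg-options data.samples_per_gpu=2
```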
/tools/model_converters/detectron2pytorch.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
2 | import argparse
3 | from collections import OrderedDict
4 |
5 | import mmcv
6 | import torch
7 |
8 | arch_settings = {50: (3, 4, 6, 3), 101: (3, 4, 23, 3)}
9 |
10 |
11 | def convert_bn(blobs, state_dict, caffe_name, torch_name, converted_names):
12 |     # detectron replaces bn with an affine channel layer
13 | state_dict[torch_name + '.bias'] = torch.from_numpy(blobs[caffe_name +
14 | '_b'])
15 | state_dict[torch_name + '.weight'] = torch.from_numpy(blobs[caffe_name +
16 | '_s'])
17 | bn_size = state_dict[torch_name + '.weight'].size()
18 | state_dict[torch_name + '.running_mean'] = torch.zeros(bn_size)
19 | state_dict[torch_name + '.running_var'] = torch.ones(bn_size)
20 | converted_names.add(caffe_name + '_b')
21 | converted_names.add(caffe_name + '_s')
22 |
23 |
24 | def convert_conv_fc(blobs, state_dict, caffe_name, torch_name,
25 | converted_names):
26 | state_dict[torch_name + '.weight'] = torch.from_numpy(blobs[caffe_name +
27 | '_w'])
28 | converted_names.add(caffe_name + '_w')
29 | if caffe_name + '_b' in blobs:
30 | state_dict[torch_name + '.bias'] = torch.from_numpy(blobs[caffe_name +
31 | '_b'])
32 | converted_names.add(caffe_name + '_b')
33 |
34 |
35 | def convert(src, dst, depth):
36 | """Convert keys in detectron pretrained ResNet models to pytorch style."""
37 | # load arch_settings
38 | if depth not in arch_settings:
39 |         raise ValueError('Only ResNet-50 and ResNet-101 are supported currently')
40 | block_nums = arch_settings[depth]
41 | # load caffe model
42 | caffe_model = mmcv.load(src, encoding='latin1')
43 | blobs = caffe_model['blobs'] if 'blobs' in caffe_model else caffe_model
44 | # convert to pytorch style
45 | state_dict = OrderedDict()
46 | converted_names = set()
47 | convert_conv_fc(blobs, state_dict, 'conv1', 'conv1', converted_names)
48 | convert_bn(blobs, state_dict, 'res_conv1_bn', 'bn1', converted_names)
49 | for i in range(1, len(block_nums) + 1):
50 | for j in range(block_nums[i - 1]):
51 | if j == 0:
52 | convert_conv_fc(blobs, state_dict, f'res{i + 1}_{j}_branch1',
53 | f'layer{i}.{j}.downsample.0', converted_names)
54 | convert_bn(blobs, state_dict, f'res{i + 1}_{j}_branch1_bn',
55 | f'layer{i}.{j}.downsample.1', converted_names)
56 | for k, letter in enumerate(['a', 'b', 'c']):
57 | convert_conv_fc(blobs, state_dict,
58 | f'res{i + 1}_{j}_branch2{letter}',
59 | f'layer{i}.{j}.conv{k+1}', converted_names)
60 | convert_bn(blobs, state_dict,
61 | f'res{i + 1}_{j}_branch2{letter}_bn',
62 | f'layer{i}.{j}.bn{k + 1}', converted_names)
63 | # check if all layers are converted
64 | for key in blobs:
65 | if key not in converted_names:
66 | print(f'Not Convert: {key}')
67 | # save checkpoint
68 | checkpoint = dict()
69 | checkpoint['state_dict'] = state_dict
70 | torch.save(checkpoint, dst)
71 |
72 |
73 | def main():
74 | parser = argparse.ArgumentParser(description='Convert model keys')
75 | parser.add_argument('src', help='src detectron model path')
76 | parser.add_argument('dst', help='save path')
77 | parser.add_argument('depth', type=int, help='ResNet model depth')
78 | args = parser.parse_args()
79 | convert(args.src, args.dst, args.depth)
80 |
81 |
82 | if __name__ == '__main__':
83 | main()
84 |
--------------------------------------------------------------------------------
/tools/model_converters/publish_model.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
2 | import argparse
3 | import subprocess
4 |
5 | import torch
6 |
7 |
8 | def parse_args():
9 | parser = argparse.ArgumentParser(
10 | description='Process a checkpoint to be published')
11 | parser.add_argument('in_file', help='input checkpoint filename')
12 | parser.add_argument('out_file', help='output checkpoint filename')
13 | args = parser.parse_args()
14 | return args
15 |
16 |
17 | def process_checkpoint(in_file, out_file):
18 | checkpoint = torch.load(in_file, map_location='cpu')
19 | # remove optimizer for smaller file size
20 | if 'optimizer' in checkpoint:
21 | del checkpoint['optimizer']
22 | # if it is necessary to remove some sensitive data in checkpoint['meta'],
23 | # add the code here.
24 | if torch.__version__ >= '1.6':
25 | torch.save(checkpoint, out_file, _use_new_zipfile_serialization=False)
26 | else:
27 | torch.save(checkpoint, out_file)
28 | sha = subprocess.check_output(['sha256sum', out_file]).decode()
29 | if out_file.endswith('.pth'):
30 | out_file_name = out_file[:-4]
31 | else:
32 | out_file_name = out_file
33 | final_file = out_file_name + f'-{sha[:8]}.pth'
34 | subprocess.Popen(['mv', out_file, final_file])
35 |
36 |
37 | def main():
38 | args = parse_args()
39 | process_checkpoint(args.in_file, args.out_file)
40 |
41 |
42 | if __name__ == '__main__':
43 | main()
44 |
--------------------------------------------------------------------------------
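A sketch of how `tools/model_converters/publish_model.py` above is typically used; the checkpoint names are illustrative. The script strips the optimizer state and renames the output by appending the first eight characters of its sha256 hash:

```bash
# Drop the optimizer state and stamp the file with a short sha256 hash
# (checkpoint paths are illustrative)
python tools/model_converters/publish_model.py \
    work_dirs/queryinst_r50_1x_coco/latest.pth \
    openinst_r50_1x_coco.pth
# the output is renamed to openinst_r50_1x_coco-<first 8 hash chars>.pth
```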
/tools/model_converters/regnet2mmdet.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
2 | import argparse
3 | from collections import OrderedDict
4 |
5 | import torch
6 |
7 |
8 | def convert_stem(model_key, model_weight, state_dict, converted_names):
9 | new_key = model_key.replace('stem.conv', 'conv1')
10 | new_key = new_key.replace('stem.bn', 'bn1')
11 | state_dict[new_key] = model_weight
12 | converted_names.add(model_key)
13 | print(f'Convert {model_key} to {new_key}')
14 |
15 |
16 | def convert_head(model_key, model_weight, state_dict, converted_names):
17 | new_key = model_key.replace('head.fc', 'fc')
18 | state_dict[new_key] = model_weight
19 | converted_names.add(model_key)
20 | print(f'Convert {model_key} to {new_key}')
21 |
22 |
23 | def convert_reslayer(model_key, model_weight, state_dict, converted_names):
24 | split_keys = model_key.split('.')
25 | layer, block, module = split_keys[:3]
26 | block_id = int(block[1:])
27 | layer_name = f'layer{int(layer[1:])}'
28 | block_name = f'{block_id - 1}'
29 |
30 | if block_id == 1 and module == 'bn':
31 | new_key = f'{layer_name}.{block_name}.downsample.1.{split_keys[-1]}'
32 | elif block_id == 1 and module == 'proj':
33 | new_key = f'{layer_name}.{block_name}.downsample.0.{split_keys[-1]}'
34 | elif module == 'f':
35 | if split_keys[3] == 'a_bn':
36 | module_name = 'bn1'
37 | elif split_keys[3] == 'b_bn':
38 | module_name = 'bn2'
39 | elif split_keys[3] == 'c_bn':
40 | module_name = 'bn3'
41 | elif split_keys[3] == 'a':
42 | module_name = 'conv1'
43 | elif split_keys[3] == 'b':
44 | module_name = 'conv2'
45 | elif split_keys[3] == 'c':
46 | module_name = 'conv3'
47 | new_key = f'{layer_name}.{block_name}.{module_name}.{split_keys[-1]}'
48 | else:
49 | raise ValueError(f'Unsupported conversion of key {model_key}')
50 | print(f'Convert {model_key} to {new_key}')
51 | state_dict[new_key] = model_weight
52 | converted_names.add(model_key)
53 |
54 |
55 | def convert(src, dst):
56 | """Convert keys in pycls pretrained RegNet models to mmdet style."""
57 |     # load pycls model
58 | regnet_model = torch.load(src)
59 | blobs = regnet_model['model_state']
60 | # convert to pytorch style
61 | state_dict = OrderedDict()
62 | converted_names = set()
63 | for key, weight in blobs.items():
64 | if 'stem' in key:
65 | convert_stem(key, weight, state_dict, converted_names)
66 | elif 'head' in key:
67 | convert_head(key, weight, state_dict, converted_names)
68 | elif key.startswith('s'):
69 | convert_reslayer(key, weight, state_dict, converted_names)
70 |
71 | # check if all layers are converted
72 | for key in blobs:
73 | if key not in converted_names:
74 | print(f'not converted: {key}')
75 | # save checkpoint
76 | checkpoint = dict()
77 | checkpoint['state_dict'] = state_dict
78 | torch.save(checkpoint, dst)
79 |
80 |
81 | def main():
82 | parser = argparse.ArgumentParser(description='Convert model keys')
83 |     parser.add_argument('src', help='src pycls RegNet model path')
84 | parser.add_argument('dst', help='save path')
85 | args = parser.parse_args()
86 | convert(args.src, args.dst)
87 |
88 |
89 | if __name__ == '__main__':
90 | main()
91 |
--------------------------------------------------------------------------------
/tools/model_converters/selfsup2mmdet.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
2 | import argparse
3 | from collections import OrderedDict
4 |
5 | import torch
6 |
7 |
8 | def moco_convert(src, dst):
9 |     """Convert keys in MoCo pretrained models to mmdet style."""
10 |     # load MoCo checkpoint
11 | moco_model = torch.load(src)
12 | blobs = moco_model['state_dict']
13 | # convert to pytorch style
14 | state_dict = OrderedDict()
15 | for k, v in blobs.items():
16 | if not k.startswith('module.encoder_q.'):
17 | continue
18 | old_k = k
19 | k = k.replace('module.encoder_q.', '')
20 | state_dict[k] = v
21 | print(old_k, '->', k)
22 | # save checkpoint
23 | checkpoint = dict()
24 | checkpoint['state_dict'] = state_dict
25 | torch.save(checkpoint, dst)
26 |
27 |
28 | def main():
29 | parser = argparse.ArgumentParser(description='Convert model keys')
30 |     parser.add_argument('src', help='src self-supervised checkpoint path')
31 | parser.add_argument('dst', help='save path')
32 | parser.add_argument(
33 |         '--selfsup', type=str, choices=['moco', 'swav'], help='self-supervised method')
34 | args = parser.parse_args()
35 | if args.selfsup == 'moco':
36 | moco_convert(args.src, args.dst)
37 | elif args.selfsup == 'swav':
38 | print('SWAV does not need to convert the keys')
39 |
40 |
41 | if __name__ == '__main__':
42 | main()
43 |
--------------------------------------------------------------------------------
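An illustrative call to `tools/model_converters/selfsup2mmdet.py` above; the checkpoint file names are assumptions, and only MoCo checkpoints actually need the key conversion (SwAV weights load as-is):

```bash
# Strip the 'module.encoder_q.' prefix from a MoCo checkpoint so mmdet can load it
# (file names are illustrative)
python tools/model_converters/selfsup2mmdet.py \
    moco_v2_r50.pth moco_v2_r50_mmdet.pth --selfsup moco
```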
/tools/model_converters/upgrade_model_version.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
2 | import argparse
3 | import re
4 | import tempfile
5 | from collections import OrderedDict
6 |
7 | import torch
8 | from mmcv import Config
9 |
10 |
11 | def is_head(key):
12 | valid_head_list = [
13 | 'bbox_head', 'mask_head', 'semantic_head', 'grid_head', 'mask_iou_head'
14 | ]
15 |
16 | return any(key.startswith(h) for h in valid_head_list)
17 |
18 |
19 | def parse_config(config_strings):
20 | temp_file = tempfile.NamedTemporaryFile()
21 | config_path = f'{temp_file.name}.py'
22 | with open(config_path, 'w') as f:
23 | f.write(config_strings)
24 |
25 | config = Config.fromfile(config_path)
26 | is_two_stage = True
27 | is_ssd = False
28 | is_retina = False
29 | reg_cls_agnostic = False
30 | if 'rpn_head' not in config.model:
31 | is_two_stage = False
32 | # check whether it is SSD
33 | if config.model.bbox_head.type == 'SSDHead':
34 | is_ssd = True
35 | elif config.model.bbox_head.type == 'RetinaHead':
36 | is_retina = True
37 | elif isinstance(config.model['bbox_head'], list):
38 | reg_cls_agnostic = True
39 | elif 'reg_class_agnostic' in config.model.bbox_head:
40 | reg_cls_agnostic = config.model.bbox_head \
41 | .reg_class_agnostic
42 | temp_file.close()
43 | return is_two_stage, is_ssd, is_retina, reg_cls_agnostic
44 |
45 |
46 | def reorder_cls_channel(val, num_classes=81):
47 | # bias
48 | if val.dim() == 1:
49 | new_val = torch.cat((val[1:], val[:1]), dim=0)
50 | # weight
51 | else:
52 | out_channels, in_channels = val.shape[:2]
53 | # conv_cls for softmax output
54 | if out_channels != num_classes and out_channels % num_classes == 0:
55 | new_val = val.reshape(-1, num_classes, in_channels, *val.shape[2:])
56 | new_val = torch.cat((new_val[:, 1:], new_val[:, :1]), dim=1)
57 | new_val = new_val.reshape(val.size())
58 | # fc_cls
59 | elif out_channels == num_classes:
60 | new_val = torch.cat((val[1:], val[:1]), dim=0)
61 | # agnostic | retina_cls | rpn_cls
62 | else:
63 | new_val = val
64 |
65 | return new_val
66 |
67 |
68 | def truncate_cls_channel(val, num_classes=81):
69 |
70 | # bias
71 | if val.dim() == 1:
72 | if val.size(0) % num_classes == 0:
73 | new_val = val[:num_classes - 1]
74 | else:
75 | new_val = val
76 | # weight
77 | else:
78 | out_channels, in_channels = val.shape[:2]
79 | # conv_logits
80 | if out_channels % num_classes == 0:
81 | new_val = val.reshape(num_classes, in_channels, *val.shape[2:])[1:]
82 | new_val = new_val.reshape(-1, *val.shape[1:])
83 | # agnostic
84 | else:
85 | new_val = val
86 |
87 | return new_val
88 |
89 |
90 | def truncate_reg_channel(val, num_classes=81):
91 | # bias
92 | if val.dim() == 1:
93 | # fc_reg | rpn_reg
94 | if val.size(0) % num_classes == 0:
95 | new_val = val.reshape(num_classes, -1)[:num_classes - 1]
96 | new_val = new_val.reshape(-1)
97 | # agnostic
98 | else:
99 | new_val = val
100 | # weight
101 | else:
102 | out_channels, in_channels = val.shape[:2]
103 | # fc_reg | rpn_reg
104 | if out_channels % num_classes == 0:
105 | new_val = val.reshape(num_classes, -1, in_channels,
106 | *val.shape[2:])[1:]
107 | new_val = new_val.reshape(-1, *val.shape[1:])
108 | # agnostic
109 | else:
110 | new_val = val
111 |
112 | return new_val
113 |
114 |
115 | def convert(in_file, out_file, num_classes):
116 | """Convert keys in checkpoints.
117 |
118 | There can be some breaking changes during the development of mmdetection,
119 | and this tool is used for upgrading checkpoints trained with old versions
120 | to the latest one.
121 | """
122 | checkpoint = torch.load(in_file)
123 | in_state_dict = checkpoint.pop('state_dict')
124 | out_state_dict = OrderedDict()
125 | meta_info = checkpoint['meta']
126 | is_two_stage, is_ssd, is_retina, reg_cls_agnostic = parse_config(
127 | '#' + meta_info['config'])
128 | if meta_info['mmdet_version'] <= '0.5.3' and is_retina:
129 | upgrade_retina = True
130 | else:
131 | upgrade_retina = False
132 |
133 | # MMDetection v2.5.0 unifies the class order in RPN
134 |     # if the model is trained with a version < 2.5.0,
135 |     # its RPN class order should be upgraded to be used in version >= 2.5.0
136 |     if meta_info['mmdet_version'] < '2.5.0':
137 | upgrade_rpn = True
138 | else:
139 | upgrade_rpn = False
140 |
141 | for key, val in in_state_dict.items():
142 | new_key = key
143 | new_val = val
144 | if is_two_stage and is_head(key):
145 | new_key = 'roi_head.{}'.format(key)
146 |
147 | # classification
148 | if upgrade_rpn:
149 | m = re.search(
150 | r'(conv_cls|retina_cls|rpn_cls|fc_cls|fcos_cls|'
151 | r'fovea_cls).(weight|bias)', new_key)
152 | else:
153 | m = re.search(
154 | r'(conv_cls|retina_cls|fc_cls|fcos_cls|'
155 | r'fovea_cls).(weight|bias)', new_key)
156 | if m is not None:
157 | print(f'reorder cls channels of {new_key}')
158 | new_val = reorder_cls_channel(val, num_classes)
159 |
160 | # regression
161 | if upgrade_rpn:
162 | m = re.search(r'(fc_reg).(weight|bias)', new_key)
163 | else:
164 | m = re.search(r'(fc_reg|rpn_reg).(weight|bias)', new_key)
165 | if m is not None and not reg_cls_agnostic:
166 | print(f'truncate regression channels of {new_key}')
167 | new_val = truncate_reg_channel(val, num_classes)
168 |
169 | # mask head
170 | m = re.search(r'(conv_logits).(weight|bias)', new_key)
171 | if m is not None:
172 | print(f'truncate mask prediction channels of {new_key}')
173 | new_val = truncate_cls_channel(val, num_classes)
174 |
175 | m = re.search(r'(cls_convs|reg_convs).\d.(weight|bias)', key)
176 | # Legacy issues in RetinaNet since V1.x
177 | # Use ConvModule instead of nn.Conv2d in RetinaNet
178 | # cls_convs.0.weight -> cls_convs.0.conv.weight
179 | if m is not None and upgrade_retina:
180 | param = m.groups()[1]
181 | new_key = key.replace(param, f'conv.{param}')
182 | out_state_dict[new_key] = val
183 | print(f'rename the name of {key} to {new_key}')
184 | continue
185 |
186 | m = re.search(r'(cls_convs).\d.(weight|bias)', key)
187 | if m is not None and is_ssd:
188 | print(f'reorder cls channels of {new_key}')
189 | new_val = reorder_cls_channel(val, num_classes)
190 |
191 | out_state_dict[new_key] = new_val
192 | checkpoint['state_dict'] = out_state_dict
193 | torch.save(checkpoint, out_file)
194 |
195 |
196 | def main():
197 | parser = argparse.ArgumentParser(description='Upgrade model version')
198 | parser.add_argument('in_file', help='input checkpoint file')
199 | parser.add_argument('out_file', help='output checkpoint file')
200 | parser.add_argument(
201 | '--num-classes',
202 | type=int,
203 | default=81,
204 | help='number of classes of the original model')
205 | args = parser.parse_args()
206 | convert(args.in_file, args.out_file, args.num_classes)
207 |
208 |
209 | if __name__ == '__main__':
210 | main()
211 |
--------------------------------------------------------------------------------
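A minimal usage sketch for `tools/model_converters/upgrade_model_version.py` above; the file names are illustrative, and `--num-classes 81` matches the pre-2.0 COCO convention of 80 classes plus background that the script defaults to:

```bash
# Upgrade a checkpoint trained with an old mmdetection version
# (checkpoint names are illustrative)
python tools/model_converters/upgrade_model_version.py \
    old_model.pth upgraded_model.pth --num-classes 81
```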
/tools/model_converters/upgrade_ssd_version.py:
--------------------------------------------------------------------------------
1 | # Copyright (c) OpenMMLab. All rights reserved.
2 | import argparse
3 | import tempfile
4 | from collections import OrderedDict
5 |
6 | import torch
7 | from mmcv import Config
8 |
9 |
10 | def parse_config(config_strings):
11 | temp_file = tempfile.NamedTemporaryFile()
12 | config_path = f'{temp_file.name}.py'
13 | with open(config_path, 'w') as f:
14 | f.write(config_strings)
15 |
16 | config = Config.fromfile(config_path)
17 | # check whether it is SSD
18 | if config.model.bbox_head.type != 'SSDHead':
19 |         raise AssertionError('This is not an SSD model.')
20 |
21 |
22 | def convert(in_file, out_file):
23 | checkpoint = torch.load(in_file)
24 | in_state_dict = checkpoint.pop('state_dict')
25 | out_state_dict = OrderedDict()
26 | meta_info = checkpoint['meta']
27 | parse_config('#' + meta_info['config'])
28 | for key, value in in_state_dict.items():
29 | if 'extra' in key:
30 | layer_idx = int(key.split('.')[2])
31 | new_key = 'neck.extra_layers.{}.{}.conv.'.format(
32 | layer_idx // 2, layer_idx % 2) + key.split('.')[-1]
33 | elif 'l2_norm' in key:
34 | new_key = 'neck.l2_norm.weight'
35 | elif 'bbox_head' in key:
36 | new_key = key[:21] + '.0' + key[21:]
37 | else:
38 | new_key = key
39 | out_state_dict[new_key] = value
40 | checkpoint['state_dict'] = out_state_dict
41 |
42 | if torch.__version__ >= '1.6':
43 | torch.save(checkpoint, out_file, _use_new_zipfile_serialization=False)
44 | else:
45 | torch.save(checkpoint, out_file)
46 |
47 |
48 | def main():
49 | parser = argparse.ArgumentParser(description='Upgrade SSD version')
50 | parser.add_argument('in_file', help='input checkpoint file')
51 | parser.add_argument('out_file', help='output checkpoint file')
52 |
53 | args = parser.parse_args()
54 | convert(args.in_file, args.out_file)
55 |
56 |
57 | if __name__ == '__main__':
58 | main()
59 |
--------------------------------------------------------------------------------
/tools/slurm_test.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | set -x
4 |
5 | PARTITION=$1
6 | JOB_NAME=$2
7 | CONFIG=$3
8 | CHECKPOINT=$4
9 | GPUS=${GPUS:-8}
10 | GPUS_PER_NODE=${GPUS_PER_NODE:-8}
11 | CPUS_PER_TASK=${CPUS_PER_TASK:-5}
12 | PY_ARGS=${@:5}
13 | SRUN_ARGS=${SRUN_ARGS:-""}
14 |
15 | PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
16 | srun -p ${PARTITION} \
17 | --job-name=${JOB_NAME} \
18 | --gres=gpu:${GPUS_PER_NODE} \
19 | --ntasks=${GPUS} \
20 | --ntasks-per-node=${GPUS_PER_NODE} \
21 | --cpus-per-task=${CPUS_PER_TASK} \
22 | --kill-on-bad-exit=1 \
23 | ${SRUN_ARGS} \
24 | python -u tools/test.py ${CONFIG} ${CHECKPOINT} --launcher="slurm" ${PY_ARGS}
25 |
--------------------------------------------------------------------------------
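An example launch of `tools/slurm_test.sh` above; the partition name, job name, and file paths are illustrative, and the trailing arguments are forwarded verbatim to `tools/test.py`:

```bash
# Evaluate a checkpoint on 8 GPUs of a Slurm partition
# (partition name and file paths are illustrative)
GPUS=8 GPUS_PER_NODE=8 bash tools/slurm_test.sh \
    my_partition openinst_eval \
    configs/openinst/queryinst_r50_1x_coco.py \
    work_dirs/queryinst_r50_1x_coco/latest.pth \
    --eval bbox segm
```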
/tools/slurm_train.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | set -x
4 |
5 | PARTITION=$1
6 | JOB_NAME=$2
7 | CONFIG=$3
8 | WORK_DIR=$4
9 | GPUS=${GPUS:-8}
10 | GPUS_PER_NODE=${GPUS_PER_NODE:-8}
11 | CPUS_PER_TASK=${CPUS_PER_TASK:-5}
12 | SRUN_ARGS=${SRUN_ARGS:-""}
13 | PY_ARGS=${@:5}
14 |
15 | PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
16 | srun -p ${PARTITION} \
17 | --job-name=${JOB_NAME} \
18 | --gres=gpu:${GPUS_PER_NODE} \
19 | --ntasks=${GPUS} \
20 | --ntasks-per-node=${GPUS_PER_NODE} \
21 | --cpus-per-task=${CPUS_PER_TASK} \
22 | --kill-on-bad-exit=1 \
23 | ${SRUN_ARGS} \
24 | python -u tools/train.py ${CONFIG} --work-dir=${WORK_DIR} --launcher="slurm" ${PY_ARGS}
25 |
--------------------------------------------------------------------------------