├── .DS_Store ├── .gitignore ├── README.md ├── assets └── OpenInst.png ├── configs ├── _base_ │ ├── datasets │ │ ├── cityscapes_detection.py │ │ ├── cityscapes_instance.py │ │ ├── coco_detection.py │ │ ├── coco_instance.py │ │ ├── coco_instance_semantic.py │ │ ├── coco_panoptic.py │ │ ├── deepfashion.py │ │ ├── lvis_v0.5_instance.py │ │ ├── lvis_v1_instance.py │ │ ├── openimages_detection.py │ │ ├── voc0712.py │ │ └── wider_face.py │ ├── default_runtime.py │ ├── models │ │ ├── cascade_mask_rcnn_r50_fpn.py │ │ ├── cascade_rcnn_r50_fpn.py │ │ ├── fast_rcnn_r50_fpn.py │ │ ├── faster_rcnn_r50_caffe_c4.py │ │ ├── faster_rcnn_r50_caffe_dc5.py │ │ ├── faster_rcnn_r50_fpn.py │ │ ├── mask_rcnn_r50_caffe_c4.py │ │ ├── mask_rcnn_r50_fpn.py │ │ ├── retinanet_r50_fpn.py │ │ ├── rpn_r50_caffe_c4.py │ │ ├── rpn_r50_fpn.py │ │ └── ssd300.py │ └── schedules │ │ ├── schedule_1x.py │ │ ├── schedule_20e.py │ │ └── schedule_2x.py └── openinst │ ├── coco_to_uvo_ins.py │ ├── queryinst_r50_1x_coco.py │ └── queryinst_r50_3x_lsj_coco.py ├── core ├── __init__.py ├── bbox │ ├── __init__.py │ ├── assigners │ │ ├── __init__.py │ │ └── hungarian_oln_assigner.py │ └── match_costs │ │ ├── __init__.py │ │ └── objectness_l1_cost.py └── hook │ ├── __init__.py │ └── ema.py ├── datasets ├── __init__.py ├── coco.py ├── coco_split_dataset.py ├── cocoeval_wrappers.py ├── objects365_split_dataset.py ├── pipelines │ ├── __init__.py │ └── copypaste.py └── uvo_dataset.py ├── models ├── __init__.py ├── necks │ ├── __init__.py │ └── bifpn.py └── roi_heads │ ├── __init__.py │ ├── bbox_heads │ ├── __init__.py │ └── dii_score_head.py │ ├── mask_heads │ ├── __init__.py │ ├── dynamic_mask_head.py │ ├── maskiou_head.py │ └── mha_maskiou_head.py │ └── sparse_score_roi_head.py └── tools ├── analysis_tools ├── analyze_logs.py ├── analyze_results.py ├── benchmark.py ├── coco_error_analysis.py ├── confusion_matrix.py ├── eval_metric.py ├── get_flops.py ├── optimize_anchors.py ├── robustness_eval.py └── test_robustness.py ├── dataset_converters ├── cityscapes.py ├── images2coco.py └── pascal_voc.py ├── deployment ├── mmdet2torchserve.py ├── mmdet_handler.py ├── onnx2tensorrt.py ├── pytorch2onnx.py ├── test.py └── test_torchserver.py ├── dist_test.sh ├── dist_train.sh ├── misc ├── browse_dataset.py ├── download_dataset.py ├── gen_coco_panoptic_test_info.py ├── get_image_metas.py └── print_config.py ├── model_converters ├── detectron2pytorch.py ├── publish_model.py ├── regnet2mmdet.py ├── selfsup2mmdet.py ├── upgrade_model_version.py └── upgrade_ssd_version.py ├── slurm_test.sh ├── slurm_train.sh ├── test.py └── train.py /.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hustvl/OpenInst/ae5a72fc3ab2f686d6760fff0d7846174d55cad6/.DS_Store -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # 2 | **/*.pyc 3 | **/__pycache__ 4 | work_dirs 5 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # OpenInst 2 | > [**OpenInst: A Simple Query-Based Method for Open-World Instance Segmentation**](https://arxiv.org/abs/2303.15859) 3 | > 4 | > Cheng Wang, Guoli Wang, Qian Zhang, Peng Guo, Wenyu Liu, Xinggang Wang 5 | > 6 | > *[arXiv 2303.15859](https://arxiv.org/abs/2303.15859)* 7 | 8 | ## Abstract 9 | Open-world instance segmentation has 
recently gained significant popularity due to its importance in many real-world applications, such as autonomous driving, robot perception, and remote sensing. However, previous methods have either produced unsatisfactory results or relied on complex systems and paradigms. We wonder if there is a simple way to obtain state-of-the-art results. Fortunately, we have identified two observations that help us achieve the best of both worlds: 1) query-based methods demonstrate superiority over dense proposal-based methods in open-world instance segmentation, and 2) learning localization cues is sufficient for open-world instance segmentation. Based on these observations, we propose a simple query-based method named OpenInst for open-world instance segmentation. OpenInst leverages advanced query-based methods like QueryInst and focuses on learning localization cues. Notably, OpenInst is an extremely simple and straightforward framework without any auxiliary modules or post-processing, yet achieves state-of-the-art results on multiple benchmarks. Specifically, in the COCO->UVO scenario, OpenInst achieves a mask Average Recall (AR) of 53.3, outperforming the previous best methods by 2.0 AR with a simpler structure. We hope that OpenInst can serve as a solid baseline for future research in this area. 10 | 11 |
![OpenInst](assets/OpenInst.png)
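The abstract's second observation, that learning localization cues is sufficient, is the crux of OpenInst, so a schematic sketch may help make it concrete. The snippet below is only an illustration in the spirit of OLN-style box-IoU objectness and is not the repository's actual code (that lives under `core/bbox/`, e.g. `hungarian_oln_assigner.py` and `objectness_l1_cost.py`, and under `models/roi_heads/`): each proposal's objectness score is regressed toward the IoU between its predicted box and its matched ground-truth box, instead of being supervised by a class label.

```python
import torch
import torch.nn.functional as F
from torchvision.ops import box_iou


def localization_objectness_loss(pred_logits, pred_boxes, gt_boxes, matched_gt_idx):
    """Schematic OLN-style objectness target (an illustration, not OpenInst's code).

    A positive proposal's objectness score is supervised by the IoU between its
    predicted box and the ground-truth box it is matched to, rather than by a
    classification label.
    """
    with torch.no_grad():
        ious = box_iou(pred_boxes, gt_boxes)  # (num_proposals, num_gt)
        targets = ious[torch.arange(pred_boxes.size(0)), matched_gt_idx]
    # L1 regression of the (sigmoided) objectness score toward the IoU target
    return F.l1_loss(pred_logits.sigmoid(), targets)


# Toy usage: two proposals, one ground-truth box, both matched to GT index 0.
logits = torch.tensor([0.2, -0.5])
boxes = torch.tensor([[0., 0., 10., 10.], [2., 2., 8., 8.]])
gt = torch.tensor([[0., 0., 10., 10.]])
loss = localization_objectness_loss(logits, boxes, gt, torch.tensor([0, 0]))
```

At inference, a class-agnostic score of this kind is what ranks the proposals; the ablation table below compares classification-, box-, mask-, and fusion-based scoring variants (OpenInst-cls/box/mask/fusion).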
12 | 
13 | ## Cross-dataset instance segmentation performance
14 | ### Results on COCO->UVO
15 | | Method | Epoch | AR<sup>box</sup> | AR<sup>mask</sup> | AR<sub>0.5</sub> | AR<sub>0.75</sub> | AR<sub>s</sub> | AR<sub>m</sub> | AR<sub>l</sub> |
16 | |------------|-------|------|------|------|------|------|------|------|
17 | | OpenInst | 12 | 59.1 | 48.7 | 72.6 | 51.4 | 26.4 | 44.3 | 60.4 |
18 | | OpenInst | 36 | 63.0 | 53.3 | 76.6 | 56.8 | 31.8 | 49.4 | 64.3 |
19 | 
20 | 
21 | | Method | Epoch | AR<sup>box</sup> | AR<sup>mask</sup> | AR<sub>0.5</sub> | AR<sub>0.75</sub> | AR<sub>s</sub> | AR<sub>m</sub> | AR<sub>l</sub> |
22 | |------------|-------|------|------|------|------|------|------|------|
23 | | OpenInst-void | 12 | 58.4 | 48.5 | 71.3 | 51.3 | 25.1 | 43.7 | 60.8 |
24 | | OpenInst-cls | 12 | 55.7 | 44.9 | 72.1 | 46.7 | 24.0 | 41.8 | 55.2 |
25 | | OpenInst-box | 12 | 59.1 | 48.7 | 72.6 | 51.4 | 26.4 | 44.3 | 60.4 |
26 | | OpenInst-mask | 12 | 58.1 | 47.8 | 71.2 | 50.6 | 24.4 | 43.3 | 60.1 |
27 | | OpenInst-fusion | 12 | 58.6 | 48.1 | 72.0 | 50.8 | 25.6 | 43.9 | 59.8 |
28 | 
29 | 
30 | ## Training
31 | Change the **ann_file** and **img_prefix** paths in [coco_to_uvo_ins.py](https://github.com/hustvl/OpenInst/blob/main/configs/openinst/coco_to_uvo_ins.py) so that they point to your local COCO and UVO data (an illustrative sketch is given at the end of this README), then run:
32 | ```
33 | # train on the COCO train set, evaluate on the UVO val set.
34 | sh tools/dist_train.sh configs/openinst/coco_to_uvo_ins.py 8
35 | ```
36 | ## Testing
37 | ```
38 | sh tools/dist_test.sh configs/openinst/coco_to_uvo_ins.py /path/to/model 8
39 | ```
40 | 
41 | ## Citation
42 | If you find OpenInst useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.
43 | ```bibtex
44 | @article{wang2023openinst,
45 |   title={OpenInst: A Simple Query-Based Method for Open-World Instance Segmentation},
46 |   author={Cheng Wang and Guoli Wang and Qian Zhang and Peng Guo and Wenyu Liu and Xinggang Wang},
47 |   year={2023},
48 |   eprint={2303.15859},
49 |   archivePrefix={arXiv},
50 |   primaryClass={cs.CV}
51 | }
52 | ```
53 | 
54 | ## Acknowledgements
55 | A large part of the code is borrowed from [OLN](https://github.com/mcahny/object_localization_network), [QueryInst](https://github.com/hustvl/QueryInst), and [MMDetection](https://github.com/open-mmlab/mmdetection).
56 | Thanks for their great work.
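For reference, here is a minimal, illustrative sketch of the data-path override described in the Training section. It is not the verbatim contents of `coco_to_uvo_ins.py`: the real config defines its own dataset types and pipelines, and the paths below are placeholders to be replaced with your local COCO and UVO locations.

```python
# Illustrative sketch only; the actual coco_to_uvo_ins.py defines the dataset
# types and pipelines. Usually only ann_file / img_prefix need to be adapted.
data = dict(
    # train on the COCO train set
    train=dict(
        ann_file='/path/to/coco/annotations/instances_train2017.json',
        img_prefix='/path/to/coco/train2017/'),
    # evaluate on the UVO val set (placeholder names, use your local layout)
    val=dict(
        ann_file='/path/to/uvo/annotations/val.json',
        img_prefix='/path/to/uvo/frames/'),
    test=dict(
        ann_file='/path/to/uvo/annotations/val.json',
        img_prefix='/path/to/uvo/frames/'))
```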
57 | -------------------------------------------------------------------------------- /assets/OpenInst.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hustvl/OpenInst/ae5a72fc3ab2f686d6760fff0d7846174d55cad6/assets/OpenInst.png -------------------------------------------------------------------------------- /configs/_base_/datasets/cityscapes_detection.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'CityscapesDataset' 3 | data_root = 'data/cityscapes/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | train_pipeline = [ 7 | dict(type='LoadImageFromFile'), 8 | dict(type='LoadAnnotations', with_bbox=True), 9 | dict( 10 | type='Resize', img_scale=[(2048, 800), (2048, 1024)], keep_ratio=True), 11 | dict(type='RandomFlip', flip_ratio=0.5), 12 | dict(type='Normalize', **img_norm_cfg), 13 | dict(type='Pad', size_divisor=32), 14 | dict(type='DefaultFormatBundle'), 15 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), 16 | ] 17 | test_pipeline = [ 18 | dict(type='LoadImageFromFile'), 19 | dict( 20 | type='MultiScaleFlipAug', 21 | img_scale=(2048, 1024), 22 | flip=False, 23 | transforms=[ 24 | dict(type='Resize', keep_ratio=True), 25 | dict(type='RandomFlip'), 26 | dict(type='Normalize', **img_norm_cfg), 27 | dict(type='Pad', size_divisor=32), 28 | dict(type='ImageToTensor', keys=['img']), 29 | dict(type='Collect', keys=['img']), 30 | ]) 31 | ] 32 | data = dict( 33 | samples_per_gpu=1, 34 | workers_per_gpu=2, 35 | train=dict( 36 | type='RepeatDataset', 37 | times=8, 38 | dataset=dict( 39 | type=dataset_type, 40 | ann_file=data_root + 41 | 'annotations/instancesonly_filtered_gtFine_train.json', 42 | img_prefix=data_root + 'leftImg8bit/train/', 43 | pipeline=train_pipeline)), 44 | val=dict( 45 | type=dataset_type, 46 | ann_file=data_root + 47 | 'annotations/instancesonly_filtered_gtFine_val.json', 48 | img_prefix=data_root + 'leftImg8bit/val/', 49 | pipeline=test_pipeline), 50 | test=dict( 51 | type=dataset_type, 52 | ann_file=data_root + 53 | 'annotations/instancesonly_filtered_gtFine_test.json', 54 | img_prefix=data_root + 'leftImg8bit/test/', 55 | pipeline=test_pipeline)) 56 | evaluation = dict(interval=1, metric='bbox') 57 | -------------------------------------------------------------------------------- /configs/_base_/datasets/cityscapes_instance.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'CityscapesDataset' 3 | data_root = 'data/cityscapes/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | train_pipeline = [ 7 | dict(type='LoadImageFromFile'), 8 | dict(type='LoadAnnotations', with_bbox=True, with_mask=True), 9 | dict( 10 | type='Resize', img_scale=[(2048, 800), (2048, 1024)], keep_ratio=True), 11 | dict(type='RandomFlip', flip_ratio=0.5), 12 | dict(type='Normalize', **img_norm_cfg), 13 | dict(type='Pad', size_divisor=32), 14 | dict(type='DefaultFormatBundle'), 15 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']), 16 | ] 17 | test_pipeline = [ 18 | dict(type='LoadImageFromFile'), 19 | dict( 20 | type='MultiScaleFlipAug', 21 | img_scale=(2048, 1024), 22 | flip=False, 23 | transforms=[ 24 | dict(type='Resize', keep_ratio=True), 25 | dict(type='RandomFlip'), 26 | dict(type='Normalize', **img_norm_cfg), 27 | 
dict(type='Pad', size_divisor=32), 28 | dict(type='ImageToTensor', keys=['img']), 29 | dict(type='Collect', keys=['img']), 30 | ]) 31 | ] 32 | data = dict( 33 | samples_per_gpu=1, 34 | workers_per_gpu=2, 35 | train=dict( 36 | type='RepeatDataset', 37 | times=8, 38 | dataset=dict( 39 | type=dataset_type, 40 | ann_file=data_root + 41 | 'annotations/instancesonly_filtered_gtFine_train.json', 42 | img_prefix=data_root + 'leftImg8bit/train/', 43 | pipeline=train_pipeline)), 44 | val=dict( 45 | type=dataset_type, 46 | ann_file=data_root + 47 | 'annotations/instancesonly_filtered_gtFine_val.json', 48 | img_prefix=data_root + 'leftImg8bit/val/', 49 | pipeline=test_pipeline), 50 | test=dict( 51 | type=dataset_type, 52 | ann_file=data_root + 53 | 'annotations/instancesonly_filtered_gtFine_test.json', 54 | img_prefix=data_root + 'leftImg8bit/test/', 55 | pipeline=test_pipeline)) 56 | evaluation = dict(metric=['bbox', 'segm']) 57 | -------------------------------------------------------------------------------- /configs/_base_/datasets/coco_detection.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'CocoDataset' 3 | data_root = 'data/coco/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | train_pipeline = [ 7 | dict(type='LoadImageFromFile'), 8 | dict(type='LoadAnnotations', with_bbox=True), 9 | dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), 10 | dict(type='RandomFlip', flip_ratio=0.5), 11 | dict(type='Normalize', **img_norm_cfg), 12 | dict(type='Pad', size_divisor=32), 13 | dict(type='DefaultFormatBundle'), 14 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), 15 | ] 16 | test_pipeline = [ 17 | dict(type='LoadImageFromFile'), 18 | dict( 19 | type='MultiScaleFlipAug', 20 | img_scale=(1333, 800), 21 | flip=False, 22 | transforms=[ 23 | dict(type='Resize', keep_ratio=True), 24 | dict(type='RandomFlip'), 25 | dict(type='Normalize', **img_norm_cfg), 26 | dict(type='Pad', size_divisor=32), 27 | dict(type='ImageToTensor', keys=['img']), 28 | dict(type='Collect', keys=['img']), 29 | ]) 30 | ] 31 | data = dict( 32 | samples_per_gpu=2, 33 | workers_per_gpu=2, 34 | train=dict( 35 | type=dataset_type, 36 | ann_file=data_root + 'annotations/instances_train2017.json', 37 | img_prefix=data_root + 'train2017/', 38 | pipeline=train_pipeline), 39 | val=dict( 40 | type=dataset_type, 41 | ann_file=data_root + 'annotations/instances_val2017.json', 42 | img_prefix=data_root + 'val2017/', 43 | pipeline=test_pipeline), 44 | test=dict( 45 | type=dataset_type, 46 | ann_file=data_root + 'annotations/instances_val2017.json', 47 | img_prefix=data_root + 'val2017/', 48 | pipeline=test_pipeline)) 49 | evaluation = dict(interval=1, metric='bbox') 50 | -------------------------------------------------------------------------------- /configs/_base_/datasets/coco_instance.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'CocoDataset' 3 | data_root = 'data/coco/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | train_pipeline = [ 7 | dict(type='LoadImageFromFile'), 8 | dict(type='LoadAnnotations', with_bbox=True, with_mask=True), 9 | dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), 10 | dict(type='RandomFlip', flip_ratio=0.5), 11 | dict(type='Normalize', **img_norm_cfg), 12 | dict(type='Pad', size_divisor=32), 13 | 
dict(type='DefaultFormatBundle'), 14 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']), 15 | ] 16 | test_pipeline = [ 17 | dict(type='LoadImageFromFile'), 18 | dict( 19 | type='MultiScaleFlipAug', 20 | img_scale=(1333, 800), 21 | flip=False, 22 | transforms=[ 23 | dict(type='Resize', keep_ratio=True), 24 | dict(type='RandomFlip'), 25 | dict(type='Normalize', **img_norm_cfg), 26 | dict(type='Pad', size_divisor=32), 27 | dict(type='ImageToTensor', keys=['img']), 28 | dict(type='Collect', keys=['img']), 29 | ]) 30 | ] 31 | data = dict( 32 | samples_per_gpu=2, 33 | workers_per_gpu=2, 34 | train=dict( 35 | type=dataset_type, 36 | ann_file=data_root + 'annotations/instances_train2017.json', 37 | img_prefix=data_root + 'train2017/', 38 | pipeline=train_pipeline), 39 | val=dict( 40 | type=dataset_type, 41 | ann_file=data_root + 'annotations/instances_val2017.json', 42 | img_prefix=data_root + 'val2017/', 43 | pipeline=test_pipeline), 44 | test=dict( 45 | type=dataset_type, 46 | ann_file=data_root + 'annotations/instances_val2017.json', 47 | img_prefix=data_root + 'val2017/', 48 | pipeline=test_pipeline)) 49 | evaluation = dict(metric=['bbox', 'segm']) 50 | -------------------------------------------------------------------------------- /configs/_base_/datasets/coco_instance_semantic.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'CocoDataset' 3 | data_root = 'data/coco/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | train_pipeline = [ 7 | dict(type='LoadImageFromFile'), 8 | dict( 9 | type='LoadAnnotations', with_bbox=True, with_mask=True, with_seg=True), 10 | dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), 11 | dict(type='RandomFlip', flip_ratio=0.5), 12 | dict(type='Normalize', **img_norm_cfg), 13 | dict(type='Pad', size_divisor=32), 14 | dict(type='SegRescale', scale_factor=1 / 8), 15 | dict(type='DefaultFormatBundle'), 16 | dict( 17 | type='Collect', 18 | keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks', 'gt_semantic_seg']), 19 | ] 20 | test_pipeline = [ 21 | dict(type='LoadImageFromFile'), 22 | dict( 23 | type='MultiScaleFlipAug', 24 | img_scale=(1333, 800), 25 | flip=False, 26 | transforms=[ 27 | dict(type='Resize', keep_ratio=True), 28 | dict(type='RandomFlip', flip_ratio=0.5), 29 | dict(type='Normalize', **img_norm_cfg), 30 | dict(type='Pad', size_divisor=32), 31 | dict(type='ImageToTensor', keys=['img']), 32 | dict(type='Collect', keys=['img']), 33 | ]) 34 | ] 35 | data = dict( 36 | samples_per_gpu=2, 37 | workers_per_gpu=2, 38 | train=dict( 39 | type=dataset_type, 40 | ann_file=data_root + 'annotations/instances_train2017.json', 41 | img_prefix=data_root + 'train2017/', 42 | seg_prefix=data_root + 'stuffthingmaps/train2017/', 43 | pipeline=train_pipeline), 44 | val=dict( 45 | type=dataset_type, 46 | ann_file=data_root + 'annotations/instances_val2017.json', 47 | img_prefix=data_root + 'val2017/', 48 | pipeline=test_pipeline), 49 | test=dict( 50 | type=dataset_type, 51 | ann_file=data_root + 'annotations/instances_val2017.json', 52 | img_prefix=data_root + 'val2017/', 53 | pipeline=test_pipeline)) 54 | evaluation = dict(metric=['bbox', 'segm']) 55 | -------------------------------------------------------------------------------- /configs/_base_/datasets/coco_panoptic.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'CocoPanopticDataset' 
3 | data_root = 'data/coco/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | train_pipeline = [ 7 | dict(type='LoadImageFromFile'), 8 | dict( 9 | type='LoadPanopticAnnotations', 10 | with_bbox=True, 11 | with_mask=True, 12 | with_seg=True), 13 | dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), 14 | dict(type='RandomFlip', flip_ratio=0.5), 15 | dict(type='Normalize', **img_norm_cfg), 16 | dict(type='Pad', size_divisor=32), 17 | dict(type='SegRescale', scale_factor=1 / 4), 18 | dict(type='DefaultFormatBundle'), 19 | dict( 20 | type='Collect', 21 | keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks', 'gt_semantic_seg']), 22 | ] 23 | test_pipeline = [ 24 | dict(type='LoadImageFromFile'), 25 | dict( 26 | type='MultiScaleFlipAug', 27 | img_scale=(1333, 800), 28 | flip=False, 29 | transforms=[ 30 | dict(type='Resize', keep_ratio=True), 31 | dict(type='RandomFlip'), 32 | dict(type='Normalize', **img_norm_cfg), 33 | dict(type='Pad', size_divisor=32), 34 | dict(type='ImageToTensor', keys=['img']), 35 | dict(type='Collect', keys=['img']), 36 | ]) 37 | ] 38 | data = dict( 39 | samples_per_gpu=2, 40 | workers_per_gpu=2, 41 | train=dict( 42 | type=dataset_type, 43 | ann_file=data_root + 'annotations/panoptic_train2017.json', 44 | img_prefix=data_root + 'train2017/', 45 | seg_prefix=data_root + 'annotations/panoptic_train2017/', 46 | pipeline=train_pipeline), 47 | val=dict( 48 | type=dataset_type, 49 | ann_file=data_root + 'annotations/panoptic_val2017.json', 50 | img_prefix=data_root + 'val2017/', 51 | seg_prefix=data_root + 'annotations/panoptic_val2017/', 52 | pipeline=test_pipeline), 53 | test=dict( 54 | type=dataset_type, 55 | ann_file=data_root + 'annotations/panoptic_val2017.json', 56 | img_prefix=data_root + 'val2017/', 57 | seg_prefix=data_root + 'annotations/panoptic_val2017/', 58 | pipeline=test_pipeline)) 59 | evaluation = dict(interval=1, metric=['PQ']) 60 | -------------------------------------------------------------------------------- /configs/_base_/datasets/deepfashion.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'DeepFashionDataset' 3 | data_root = 'data/DeepFashion/In-shop/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | train_pipeline = [ 7 | dict(type='LoadImageFromFile'), 8 | dict(type='LoadAnnotations', with_bbox=True, with_mask=True), 9 | dict(type='Resize', img_scale=(750, 1101), keep_ratio=True), 10 | dict(type='RandomFlip', flip_ratio=0.5), 11 | dict(type='Normalize', **img_norm_cfg), 12 | dict(type='Pad', size_divisor=32), 13 | dict(type='DefaultFormatBundle'), 14 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']), 15 | ] 16 | test_pipeline = [ 17 | dict(type='LoadImageFromFile'), 18 | dict( 19 | type='MultiScaleFlipAug', 20 | img_scale=(750, 1101), 21 | flip=False, 22 | transforms=[ 23 | dict(type='Resize', keep_ratio=True), 24 | dict(type='RandomFlip'), 25 | dict(type='Normalize', **img_norm_cfg), 26 | dict(type='Pad', size_divisor=32), 27 | dict(type='ImageToTensor', keys=['img']), 28 | dict(type='Collect', keys=['img']), 29 | ]) 30 | ] 31 | data = dict( 32 | imgs_per_gpu=2, 33 | workers_per_gpu=1, 34 | train=dict( 35 | type=dataset_type, 36 | ann_file=data_root + 'annotations/DeepFashion_segmentation_query.json', 37 | img_prefix=data_root + 'Img/', 38 | pipeline=train_pipeline, 39 | data_root=data_root), 40 | val=dict( 41 | 
type=dataset_type, 42 | ann_file=data_root + 'annotations/DeepFashion_segmentation_query.json', 43 | img_prefix=data_root + 'Img/', 44 | pipeline=test_pipeline, 45 | data_root=data_root), 46 | test=dict( 47 | type=dataset_type, 48 | ann_file=data_root + 49 | 'annotations/DeepFashion_segmentation_gallery.json', 50 | img_prefix=data_root + 'Img/', 51 | pipeline=test_pipeline, 52 | data_root=data_root)) 53 | evaluation = dict(interval=5, metric=['bbox', 'segm']) 54 | -------------------------------------------------------------------------------- /configs/_base_/datasets/lvis_v0.5_instance.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | _base_ = 'coco_instance.py' 3 | dataset_type = 'LVISV05Dataset' 4 | data_root = 'data/lvis_v0.5/' 5 | data = dict( 6 | samples_per_gpu=2, 7 | workers_per_gpu=2, 8 | train=dict( 9 | _delete_=True, 10 | type='ClassBalancedDataset', 11 | oversample_thr=1e-3, 12 | dataset=dict( 13 | type=dataset_type, 14 | ann_file=data_root + 'annotations/lvis_v0.5_train.json', 15 | img_prefix=data_root + 'train2017/')), 16 | val=dict( 17 | type=dataset_type, 18 | ann_file=data_root + 'annotations/lvis_v0.5_val.json', 19 | img_prefix=data_root + 'val2017/'), 20 | test=dict( 21 | type=dataset_type, 22 | ann_file=data_root + 'annotations/lvis_v0.5_val.json', 23 | img_prefix=data_root + 'val2017/')) 24 | evaluation = dict(metric=['bbox', 'segm']) 25 | -------------------------------------------------------------------------------- /configs/_base_/datasets/lvis_v1_instance.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | _base_ = 'coco_instance.py' 3 | dataset_type = 'LVISV1Dataset' 4 | data_root = 'data/lvis_v1/' 5 | data = dict( 6 | samples_per_gpu=2, 7 | workers_per_gpu=2, 8 | train=dict( 9 | _delete_=True, 10 | type='ClassBalancedDataset', 11 | oversample_thr=1e-3, 12 | dataset=dict( 13 | type=dataset_type, 14 | ann_file=data_root + 'annotations/lvis_v1_train.json', 15 | img_prefix=data_root)), 16 | val=dict( 17 | type=dataset_type, 18 | ann_file=data_root + 'annotations/lvis_v1_val.json', 19 | img_prefix=data_root), 20 | test=dict( 21 | type=dataset_type, 22 | ann_file=data_root + 'annotations/lvis_v1_val.json', 23 | img_prefix=data_root)) 24 | evaluation = dict(metric=['bbox', 'segm']) 25 | -------------------------------------------------------------------------------- /configs/_base_/datasets/openimages_detection.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'OpenImagesDataset' 3 | data_root = 'data/OpenImages/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | train_pipeline = [ 7 | dict(type='LoadImageFromFile'), 8 | dict(type='LoadAnnotations', with_bbox=True, denorm_bbox=True), 9 | dict(type='Resize', img_scale=(1024, 800), keep_ratio=True), 10 | dict(type='RandomFlip', flip_ratio=0.5), 11 | dict(type='Normalize', **img_norm_cfg), 12 | dict(type='Pad', size_divisor=32), 13 | dict(type='DefaultFormatBundle'), 14 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), 15 | ] 16 | test_pipeline = [ 17 | dict(type='LoadImageFromFile'), 18 | dict( 19 | type='MultiScaleFlipAug', 20 | img_scale=(1024, 800), 21 | flip=False, 22 | transforms=[ 23 | dict(type='Resize', keep_ratio=True), 24 | dict(type='RandomFlip'), 25 | dict(type='Normalize', **img_norm_cfg), 26 | dict(type='Pad', size_divisor=32), 27 | 
dict(type='ImageToTensor', keys=['img']), 28 | dict(type='Collect', keys=['img']), 29 | ], 30 | ), 31 | ] 32 | data = dict( 33 | samples_per_gpu=2, 34 | workers_per_gpu=0, # workers_per_gpu > 0 may occur out of memory 35 | train=dict( 36 | type=dataset_type, 37 | ann_file=data_root + 'annotations/oidv6-train-annotations-bbox.csv', 38 | img_prefix=data_root + 'OpenImages/train/', 39 | label_file=data_root + 'annotations/class-descriptions-boxable.csv', 40 | hierarchy_file=data_root + 41 | 'annotations/bbox_labels_600_hierarchy.json', 42 | pipeline=train_pipeline), 43 | val=dict( 44 | type=dataset_type, 45 | ann_file=data_root + 'annotations/validation-annotations-bbox.csv', 46 | img_prefix=data_root + 'OpenImages/validation/', 47 | label_file=data_root + 'annotations/class-descriptions-boxable.csv', 48 | hierarchy_file=data_root + 49 | 'annotations/bbox_labels_600_hierarchy.json', 50 | meta_file=data_root + 'annotations/validation-image-metas.pkl', 51 | image_level_ann_file=data_root + 52 | 'annotations/validation-annotations-human-imagelabels-boxable.csv', 53 | pipeline=test_pipeline), 54 | test=dict( 55 | type=dataset_type, 56 | ann_file=data_root + 'annotations/validation-annotations-bbox.csv', 57 | img_prefix=data_root + 'OpenImages/validation/', 58 | label_file=data_root + 'annotations/class-descriptions-boxable.csv', 59 | hierarchy_file=data_root + 60 | 'annotations/bbox_labels_600_hierarchy.json', 61 | meta_file=data_root + 'annotations/validation-image-metas.pkl', 62 | image_level_ann_file=data_root + 63 | 'annotations/validation-annotations-human-imagelabels-boxable.csv', 64 | pipeline=test_pipeline)) 65 | evaluation = dict(interval=1, metric='mAP') 66 | -------------------------------------------------------------------------------- /configs/_base_/datasets/voc0712.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'VOCDataset' 3 | data_root = 'data/VOCdevkit/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | train_pipeline = [ 7 | dict(type='LoadImageFromFile'), 8 | dict(type='LoadAnnotations', with_bbox=True), 9 | dict(type='Resize', img_scale=(1000, 600), keep_ratio=True), 10 | dict(type='RandomFlip', flip_ratio=0.5), 11 | dict(type='Normalize', **img_norm_cfg), 12 | dict(type='Pad', size_divisor=32), 13 | dict(type='DefaultFormatBundle'), 14 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), 15 | ] 16 | test_pipeline = [ 17 | dict(type='LoadImageFromFile'), 18 | dict( 19 | type='MultiScaleFlipAug', 20 | img_scale=(1000, 600), 21 | flip=False, 22 | transforms=[ 23 | dict(type='Resize', keep_ratio=True), 24 | dict(type='RandomFlip'), 25 | dict(type='Normalize', **img_norm_cfg), 26 | dict(type='Pad', size_divisor=32), 27 | dict(type='ImageToTensor', keys=['img']), 28 | dict(type='Collect', keys=['img']), 29 | ]) 30 | ] 31 | data = dict( 32 | samples_per_gpu=2, 33 | workers_per_gpu=2, 34 | train=dict( 35 | type='RepeatDataset', 36 | times=3, 37 | dataset=dict( 38 | type=dataset_type, 39 | ann_file=[ 40 | data_root + 'VOC2007/ImageSets/Main/trainval.txt', 41 | data_root + 'VOC2012/ImageSets/Main/trainval.txt' 42 | ], 43 | img_prefix=[data_root + 'VOC2007/', data_root + 'VOC2012/'], 44 | pipeline=train_pipeline)), 45 | val=dict( 46 | type=dataset_type, 47 | ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt', 48 | img_prefix=data_root + 'VOC2007/', 49 | pipeline=test_pipeline), 50 | test=dict( 51 | type=dataset_type, 52 | 
ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt', 53 | img_prefix=data_root + 'VOC2007/', 54 | pipeline=test_pipeline)) 55 | evaluation = dict(interval=1, metric='mAP') 56 | -------------------------------------------------------------------------------- /configs/_base_/datasets/wider_face.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'WIDERFaceDataset' 3 | data_root = 'data/WIDERFace/' 4 | img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[1, 1, 1], to_rgb=True) 5 | train_pipeline = [ 6 | dict(type='LoadImageFromFile', to_float32=True), 7 | dict(type='LoadAnnotations', with_bbox=True), 8 | dict( 9 | type='PhotoMetricDistortion', 10 | brightness_delta=32, 11 | contrast_range=(0.5, 1.5), 12 | saturation_range=(0.5, 1.5), 13 | hue_delta=18), 14 | dict( 15 | type='Expand', 16 | mean=img_norm_cfg['mean'], 17 | to_rgb=img_norm_cfg['to_rgb'], 18 | ratio_range=(1, 4)), 19 | dict( 20 | type='MinIoURandomCrop', 21 | min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), 22 | min_crop_size=0.3), 23 | dict(type='Resize', img_scale=(300, 300), keep_ratio=False), 24 | dict(type='Normalize', **img_norm_cfg), 25 | dict(type='RandomFlip', flip_ratio=0.5), 26 | dict(type='DefaultFormatBundle'), 27 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), 28 | ] 29 | test_pipeline = [ 30 | dict(type='LoadImageFromFile'), 31 | dict( 32 | type='MultiScaleFlipAug', 33 | img_scale=(300, 300), 34 | flip=False, 35 | transforms=[ 36 | dict(type='Resize', keep_ratio=False), 37 | dict(type='Normalize', **img_norm_cfg), 38 | dict(type='ImageToTensor', keys=['img']), 39 | dict(type='Collect', keys=['img']), 40 | ]) 41 | ] 42 | data = dict( 43 | samples_per_gpu=60, 44 | workers_per_gpu=2, 45 | train=dict( 46 | type='RepeatDataset', 47 | times=2, 48 | dataset=dict( 49 | type=dataset_type, 50 | ann_file=data_root + 'train.txt', 51 | img_prefix=data_root + 'WIDER_train/', 52 | min_size=17, 53 | pipeline=train_pipeline)), 54 | val=dict( 55 | type=dataset_type, 56 | ann_file=data_root + 'val.txt', 57 | img_prefix=data_root + 'WIDER_val/', 58 | pipeline=test_pipeline), 59 | test=dict( 60 | type=dataset_type, 61 | ann_file=data_root + 'val.txt', 62 | img_prefix=data_root + 'WIDER_val/', 63 | pipeline=test_pipeline)) 64 | -------------------------------------------------------------------------------- /configs/_base_/default_runtime.py: -------------------------------------------------------------------------------- 1 | checkpoint_config = dict(interval=1) 2 | # yapf:disable 3 | log_config = dict( 4 | interval=50, 5 | hooks=[ 6 | dict(type='TextLoggerHook'), 7 | # dict(type='TensorboardLoggerHook') 8 | ]) 9 | # yapf:enable 10 | custom_hooks = [dict(type='NumClassCheckHook')] 11 | 12 | dist_params = dict(backend='nccl') 13 | log_level = 'INFO' 14 | load_from = None 15 | resume_from = None 16 | workflow = [('train', 1)] 17 | 18 | # disable opencv multithreading to avoid system being overloaded 19 | opencv_num_threads = 0 20 | # set multi-process start method as `fork` to speed up the training 21 | mp_start_method = 'fork' 22 | -------------------------------------------------------------------------------- /configs/_base_/models/cascade_mask_rcnn_r50_fpn.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | model = dict( 3 | type='CascadeRCNN', 4 | backbone=dict( 5 | type='ResNet', 6 | depth=50, 7 | num_stages=4, 8 | out_indices=(0, 1, 2, 3), 9 | frozen_stages=1, 10 | norm_cfg=dict(type='BN', 
requires_grad=True), 11 | norm_eval=True, 12 | style='pytorch', 13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), 14 | neck=dict( 15 | type='FPN', 16 | in_channels=[256, 512, 1024, 2048], 17 | out_channels=256, 18 | num_outs=5), 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=256, 22 | feat_channels=256, 23 | anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[8], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[4, 8, 16, 32, 64]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)), 35 | roi_head=dict( 36 | type='CascadeRoIHead', 37 | num_stages=3, 38 | stage_loss_weights=[1, 0.5, 0.25], 39 | bbox_roi_extractor=dict( 40 | type='SingleRoIExtractor', 41 | roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), 42 | out_channels=256, 43 | featmap_strides=[4, 8, 16, 32]), 44 | bbox_head=[ 45 | dict( 46 | type='Shared2FCBBoxHead', 47 | in_channels=256, 48 | fc_out_channels=1024, 49 | roi_feat_size=7, 50 | num_classes=80, 51 | bbox_coder=dict( 52 | type='DeltaXYWHBBoxCoder', 53 | target_means=[0., 0., 0., 0.], 54 | target_stds=[0.1, 0.1, 0.2, 0.2]), 55 | reg_class_agnostic=True, 56 | loss_cls=dict( 57 | type='CrossEntropyLoss', 58 | use_sigmoid=False, 59 | loss_weight=1.0), 60 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0, 61 | loss_weight=1.0)), 62 | dict( 63 | type='Shared2FCBBoxHead', 64 | in_channels=256, 65 | fc_out_channels=1024, 66 | roi_feat_size=7, 67 | num_classes=80, 68 | bbox_coder=dict( 69 | type='DeltaXYWHBBoxCoder', 70 | target_means=[0., 0., 0., 0.], 71 | target_stds=[0.05, 0.05, 0.1, 0.1]), 72 | reg_class_agnostic=True, 73 | loss_cls=dict( 74 | type='CrossEntropyLoss', 75 | use_sigmoid=False, 76 | loss_weight=1.0), 77 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0, 78 | loss_weight=1.0)), 79 | dict( 80 | type='Shared2FCBBoxHead', 81 | in_channels=256, 82 | fc_out_channels=1024, 83 | roi_feat_size=7, 84 | num_classes=80, 85 | bbox_coder=dict( 86 | type='DeltaXYWHBBoxCoder', 87 | target_means=[0., 0., 0., 0.], 88 | target_stds=[0.033, 0.033, 0.067, 0.067]), 89 | reg_class_agnostic=True, 90 | loss_cls=dict( 91 | type='CrossEntropyLoss', 92 | use_sigmoid=False, 93 | loss_weight=1.0), 94 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)) 95 | ], 96 | mask_roi_extractor=dict( 97 | type='SingleRoIExtractor', 98 | roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0), 99 | out_channels=256, 100 | featmap_strides=[4, 8, 16, 32]), 101 | mask_head=dict( 102 | type='FCNMaskHead', 103 | num_convs=4, 104 | in_channels=256, 105 | conv_out_channels=256, 106 | num_classes=80, 107 | loss_mask=dict( 108 | type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))), 109 | # model training and testing settings 110 | train_cfg=dict( 111 | rpn=dict( 112 | assigner=dict( 113 | type='MaxIoUAssigner', 114 | pos_iou_thr=0.7, 115 | neg_iou_thr=0.3, 116 | min_pos_iou=0.3, 117 | match_low_quality=True, 118 | ignore_iof_thr=-1), 119 | sampler=dict( 120 | type='RandomSampler', 121 | num=256, 122 | pos_fraction=0.5, 123 | neg_pos_ub=-1, 124 | add_gt_as_proposals=False), 125 | allowed_border=0, 126 | pos_weight=-1, 127 | debug=False), 128 | rpn_proposal=dict( 129 | nms_pre=2000, 130 | max_per_img=2000, 131 | nms=dict(type='nms', iou_threshold=0.7), 132 | min_bbox_size=0), 133 | rcnn=[ 134 | dict( 
135 | assigner=dict( 136 | type='MaxIoUAssigner', 137 | pos_iou_thr=0.5, 138 | neg_iou_thr=0.5, 139 | min_pos_iou=0.5, 140 | match_low_quality=False, 141 | ignore_iof_thr=-1), 142 | sampler=dict( 143 | type='RandomSampler', 144 | num=512, 145 | pos_fraction=0.25, 146 | neg_pos_ub=-1, 147 | add_gt_as_proposals=True), 148 | mask_size=28, 149 | pos_weight=-1, 150 | debug=False), 151 | dict( 152 | assigner=dict( 153 | type='MaxIoUAssigner', 154 | pos_iou_thr=0.6, 155 | neg_iou_thr=0.6, 156 | min_pos_iou=0.6, 157 | match_low_quality=False, 158 | ignore_iof_thr=-1), 159 | sampler=dict( 160 | type='RandomSampler', 161 | num=512, 162 | pos_fraction=0.25, 163 | neg_pos_ub=-1, 164 | add_gt_as_proposals=True), 165 | mask_size=28, 166 | pos_weight=-1, 167 | debug=False), 168 | dict( 169 | assigner=dict( 170 | type='MaxIoUAssigner', 171 | pos_iou_thr=0.7, 172 | neg_iou_thr=0.7, 173 | min_pos_iou=0.7, 174 | match_low_quality=False, 175 | ignore_iof_thr=-1), 176 | sampler=dict( 177 | type='RandomSampler', 178 | num=512, 179 | pos_fraction=0.25, 180 | neg_pos_ub=-1, 181 | add_gt_as_proposals=True), 182 | mask_size=28, 183 | pos_weight=-1, 184 | debug=False) 185 | ]), 186 | test_cfg=dict( 187 | rpn=dict( 188 | nms_pre=1000, 189 | max_per_img=1000, 190 | nms=dict(type='nms', iou_threshold=0.7), 191 | min_bbox_size=0), 192 | rcnn=dict( 193 | score_thr=0.05, 194 | nms=dict(type='nms', iou_threshold=0.5), 195 | max_per_img=100, 196 | mask_thr_binary=0.5))) 197 | -------------------------------------------------------------------------------- /configs/_base_/models/cascade_rcnn_r50_fpn.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | model = dict( 3 | type='CascadeRCNN', 4 | backbone=dict( 5 | type='ResNet', 6 | depth=50, 7 | num_stages=4, 8 | out_indices=(0, 1, 2, 3), 9 | frozen_stages=1, 10 | norm_cfg=dict(type='BN', requires_grad=True), 11 | norm_eval=True, 12 | style='pytorch', 13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), 14 | neck=dict( 15 | type='FPN', 16 | in_channels=[256, 512, 1024, 2048], 17 | out_channels=256, 18 | num_outs=5), 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=256, 22 | feat_channels=256, 23 | anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[8], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[4, 8, 16, 32, 64]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)), 35 | roi_head=dict( 36 | type='CascadeRoIHead', 37 | num_stages=3, 38 | stage_loss_weights=[1, 0.5, 0.25], 39 | bbox_roi_extractor=dict( 40 | type='SingleRoIExtractor', 41 | roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), 42 | out_channels=256, 43 | featmap_strides=[4, 8, 16, 32]), 44 | bbox_head=[ 45 | dict( 46 | type='Shared2FCBBoxHead', 47 | in_channels=256, 48 | fc_out_channels=1024, 49 | roi_feat_size=7, 50 | num_classes=80, 51 | bbox_coder=dict( 52 | type='DeltaXYWHBBoxCoder', 53 | target_means=[0., 0., 0., 0.], 54 | target_stds=[0.1, 0.1, 0.2, 0.2]), 55 | reg_class_agnostic=True, 56 | loss_cls=dict( 57 | type='CrossEntropyLoss', 58 | use_sigmoid=False, 59 | loss_weight=1.0), 60 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0, 61 | loss_weight=1.0)), 62 | dict( 63 | type='Shared2FCBBoxHead', 64 | in_channels=256, 65 | fc_out_channels=1024, 66 | 
roi_feat_size=7, 67 | num_classes=80, 68 | bbox_coder=dict( 69 | type='DeltaXYWHBBoxCoder', 70 | target_means=[0., 0., 0., 0.], 71 | target_stds=[0.05, 0.05, 0.1, 0.1]), 72 | reg_class_agnostic=True, 73 | loss_cls=dict( 74 | type='CrossEntropyLoss', 75 | use_sigmoid=False, 76 | loss_weight=1.0), 77 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0, 78 | loss_weight=1.0)), 79 | dict( 80 | type='Shared2FCBBoxHead', 81 | in_channels=256, 82 | fc_out_channels=1024, 83 | roi_feat_size=7, 84 | num_classes=80, 85 | bbox_coder=dict( 86 | type='DeltaXYWHBBoxCoder', 87 | target_means=[0., 0., 0., 0.], 88 | target_stds=[0.033, 0.033, 0.067, 0.067]), 89 | reg_class_agnostic=True, 90 | loss_cls=dict( 91 | type='CrossEntropyLoss', 92 | use_sigmoid=False, 93 | loss_weight=1.0), 94 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)) 95 | ]), 96 | # model training and testing settings 97 | train_cfg=dict( 98 | rpn=dict( 99 | assigner=dict( 100 | type='MaxIoUAssigner', 101 | pos_iou_thr=0.7, 102 | neg_iou_thr=0.3, 103 | min_pos_iou=0.3, 104 | match_low_quality=True, 105 | ignore_iof_thr=-1), 106 | sampler=dict( 107 | type='RandomSampler', 108 | num=256, 109 | pos_fraction=0.5, 110 | neg_pos_ub=-1, 111 | add_gt_as_proposals=False), 112 | allowed_border=0, 113 | pos_weight=-1, 114 | debug=False), 115 | rpn_proposal=dict( 116 | nms_pre=2000, 117 | max_per_img=2000, 118 | nms=dict(type='nms', iou_threshold=0.7), 119 | min_bbox_size=0), 120 | rcnn=[ 121 | dict( 122 | assigner=dict( 123 | type='MaxIoUAssigner', 124 | pos_iou_thr=0.5, 125 | neg_iou_thr=0.5, 126 | min_pos_iou=0.5, 127 | match_low_quality=False, 128 | ignore_iof_thr=-1), 129 | sampler=dict( 130 | type='RandomSampler', 131 | num=512, 132 | pos_fraction=0.25, 133 | neg_pos_ub=-1, 134 | add_gt_as_proposals=True), 135 | pos_weight=-1, 136 | debug=False), 137 | dict( 138 | assigner=dict( 139 | type='MaxIoUAssigner', 140 | pos_iou_thr=0.6, 141 | neg_iou_thr=0.6, 142 | min_pos_iou=0.6, 143 | match_low_quality=False, 144 | ignore_iof_thr=-1), 145 | sampler=dict( 146 | type='RandomSampler', 147 | num=512, 148 | pos_fraction=0.25, 149 | neg_pos_ub=-1, 150 | add_gt_as_proposals=True), 151 | pos_weight=-1, 152 | debug=False), 153 | dict( 154 | assigner=dict( 155 | type='MaxIoUAssigner', 156 | pos_iou_thr=0.7, 157 | neg_iou_thr=0.7, 158 | min_pos_iou=0.7, 159 | match_low_quality=False, 160 | ignore_iof_thr=-1), 161 | sampler=dict( 162 | type='RandomSampler', 163 | num=512, 164 | pos_fraction=0.25, 165 | neg_pos_ub=-1, 166 | add_gt_as_proposals=True), 167 | pos_weight=-1, 168 | debug=False) 169 | ]), 170 | test_cfg=dict( 171 | rpn=dict( 172 | nms_pre=1000, 173 | max_per_img=1000, 174 | nms=dict(type='nms', iou_threshold=0.7), 175 | min_bbox_size=0), 176 | rcnn=dict( 177 | score_thr=0.05, 178 | nms=dict(type='nms', iou_threshold=0.5), 179 | max_per_img=100))) 180 | -------------------------------------------------------------------------------- /configs/_base_/models/fast_rcnn_r50_fpn.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | model = dict( 3 | type='FastRCNN', 4 | backbone=dict( 5 | type='ResNet', 6 | depth=50, 7 | num_stages=4, 8 | out_indices=(0, 1, 2, 3), 9 | frozen_stages=1, 10 | norm_cfg=dict(type='BN', requires_grad=True), 11 | norm_eval=True, 12 | style='pytorch', 13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), 14 | neck=dict( 15 | type='FPN', 16 | in_channels=[256, 512, 1024, 2048], 17 | out_channels=256, 18 | num_outs=5), 19 | roi_head=dict( 20 | 
type='StandardRoIHead', 21 | bbox_roi_extractor=dict( 22 | type='SingleRoIExtractor', 23 | roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), 24 | out_channels=256, 25 | featmap_strides=[4, 8, 16, 32]), 26 | bbox_head=dict( 27 | type='Shared2FCBBoxHead', 28 | in_channels=256, 29 | fc_out_channels=1024, 30 | roi_feat_size=7, 31 | num_classes=80, 32 | bbox_coder=dict( 33 | type='DeltaXYWHBBoxCoder', 34 | target_means=[0., 0., 0., 0.], 35 | target_stds=[0.1, 0.1, 0.2, 0.2]), 36 | reg_class_agnostic=False, 37 | loss_cls=dict( 38 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), 39 | loss_bbox=dict(type='L1Loss', loss_weight=1.0))), 40 | # model training and testing settings 41 | train_cfg=dict( 42 | rcnn=dict( 43 | assigner=dict( 44 | type='MaxIoUAssigner', 45 | pos_iou_thr=0.5, 46 | neg_iou_thr=0.5, 47 | min_pos_iou=0.5, 48 | match_low_quality=False, 49 | ignore_iof_thr=-1), 50 | sampler=dict( 51 | type='RandomSampler', 52 | num=512, 53 | pos_fraction=0.25, 54 | neg_pos_ub=-1, 55 | add_gt_as_proposals=True), 56 | pos_weight=-1, 57 | debug=False)), 58 | test_cfg=dict( 59 | rcnn=dict( 60 | score_thr=0.05, 61 | nms=dict(type='nms', iou_threshold=0.5), 62 | max_per_img=100))) 63 | -------------------------------------------------------------------------------- /configs/_base_/models/faster_rcnn_r50_caffe_c4.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | norm_cfg = dict(type='BN', requires_grad=False) 3 | model = dict( 4 | type='FasterRCNN', 5 | backbone=dict( 6 | type='ResNet', 7 | depth=50, 8 | num_stages=3, 9 | strides=(1, 2, 2), 10 | dilations=(1, 1, 1), 11 | out_indices=(2, ), 12 | frozen_stages=1, 13 | norm_cfg=norm_cfg, 14 | norm_eval=True, 15 | style='caffe', 16 | init_cfg=dict( 17 | type='Pretrained', 18 | checkpoint='open-mmlab://detectron2/resnet50_caffe')), 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=1024, 22 | feat_channels=1024, 23 | anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[2, 4, 8, 16, 32], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[16]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 35 | roi_head=dict( 36 | type='StandardRoIHead', 37 | shared_head=dict( 38 | type='ResLayer', 39 | depth=50, 40 | stage=3, 41 | stride=2, 42 | dilation=1, 43 | style='caffe', 44 | norm_cfg=norm_cfg, 45 | norm_eval=True), 46 | bbox_roi_extractor=dict( 47 | type='SingleRoIExtractor', 48 | roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0), 49 | out_channels=1024, 50 | featmap_strides=[16]), 51 | bbox_head=dict( 52 | type='BBoxHead', 53 | with_avg_pool=True, 54 | roi_feat_size=7, 55 | in_channels=2048, 56 | num_classes=80, 57 | bbox_coder=dict( 58 | type='DeltaXYWHBBoxCoder', 59 | target_means=[0., 0., 0., 0.], 60 | target_stds=[0.1, 0.1, 0.2, 0.2]), 61 | reg_class_agnostic=False, 62 | loss_cls=dict( 63 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), 64 | loss_bbox=dict(type='L1Loss', loss_weight=1.0))), 65 | # model training and testing settings 66 | train_cfg=dict( 67 | rpn=dict( 68 | assigner=dict( 69 | type='MaxIoUAssigner', 70 | pos_iou_thr=0.7, 71 | neg_iou_thr=0.3, 72 | min_pos_iou=0.3, 73 | match_low_quality=True, 74 | ignore_iof_thr=-1), 75 | sampler=dict( 76 | type='RandomSampler', 77 | num=256, 78 | pos_fraction=0.5, 
79 | neg_pos_ub=-1, 80 | add_gt_as_proposals=False), 81 | allowed_border=0, 82 | pos_weight=-1, 83 | debug=False), 84 | rpn_proposal=dict( 85 | nms_pre=12000, 86 | max_per_img=2000, 87 | nms=dict(type='nms', iou_threshold=0.7), 88 | min_bbox_size=0), 89 | rcnn=dict( 90 | assigner=dict( 91 | type='MaxIoUAssigner', 92 | pos_iou_thr=0.5, 93 | neg_iou_thr=0.5, 94 | min_pos_iou=0.5, 95 | match_low_quality=False, 96 | ignore_iof_thr=-1), 97 | sampler=dict( 98 | type='RandomSampler', 99 | num=512, 100 | pos_fraction=0.25, 101 | neg_pos_ub=-1, 102 | add_gt_as_proposals=True), 103 | pos_weight=-1, 104 | debug=False)), 105 | test_cfg=dict( 106 | rpn=dict( 107 | nms_pre=6000, 108 | max_per_img=1000, 109 | nms=dict(type='nms', iou_threshold=0.7), 110 | min_bbox_size=0), 111 | rcnn=dict( 112 | score_thr=0.05, 113 | nms=dict(type='nms', iou_threshold=0.5), 114 | max_per_img=100))) 115 | -------------------------------------------------------------------------------- /configs/_base_/models/faster_rcnn_r50_caffe_dc5.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | norm_cfg = dict(type='BN', requires_grad=False) 3 | model = dict( 4 | type='FasterRCNN', 5 | backbone=dict( 6 | type='ResNet', 7 | depth=50, 8 | num_stages=4, 9 | strides=(1, 2, 2, 1), 10 | dilations=(1, 1, 1, 2), 11 | out_indices=(3, ), 12 | frozen_stages=1, 13 | norm_cfg=norm_cfg, 14 | norm_eval=True, 15 | style='caffe', 16 | init_cfg=dict( 17 | type='Pretrained', 18 | checkpoint='open-mmlab://detectron2/resnet50_caffe')), 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=2048, 22 | feat_channels=2048, 23 | anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[2, 4, 8, 16, 32], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[16]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 35 | roi_head=dict( 36 | type='StandardRoIHead', 37 | bbox_roi_extractor=dict( 38 | type='SingleRoIExtractor', 39 | roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), 40 | out_channels=2048, 41 | featmap_strides=[16]), 42 | bbox_head=dict( 43 | type='Shared2FCBBoxHead', 44 | in_channels=2048, 45 | fc_out_channels=1024, 46 | roi_feat_size=7, 47 | num_classes=80, 48 | bbox_coder=dict( 49 | type='DeltaXYWHBBoxCoder', 50 | target_means=[0., 0., 0., 0.], 51 | target_stds=[0.1, 0.1, 0.2, 0.2]), 52 | reg_class_agnostic=False, 53 | loss_cls=dict( 54 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), 55 | loss_bbox=dict(type='L1Loss', loss_weight=1.0))), 56 | # model training and testing settings 57 | train_cfg=dict( 58 | rpn=dict( 59 | assigner=dict( 60 | type='MaxIoUAssigner', 61 | pos_iou_thr=0.7, 62 | neg_iou_thr=0.3, 63 | min_pos_iou=0.3, 64 | match_low_quality=True, 65 | ignore_iof_thr=-1), 66 | sampler=dict( 67 | type='RandomSampler', 68 | num=256, 69 | pos_fraction=0.5, 70 | neg_pos_ub=-1, 71 | add_gt_as_proposals=False), 72 | allowed_border=0, 73 | pos_weight=-1, 74 | debug=False), 75 | rpn_proposal=dict( 76 | nms_pre=12000, 77 | max_per_img=2000, 78 | nms=dict(type='nms', iou_threshold=0.7), 79 | min_bbox_size=0), 80 | rcnn=dict( 81 | assigner=dict( 82 | type='MaxIoUAssigner', 83 | pos_iou_thr=0.5, 84 | neg_iou_thr=0.5, 85 | min_pos_iou=0.5, 86 | match_low_quality=False, 87 | ignore_iof_thr=-1), 88 | sampler=dict( 89 | type='RandomSampler', 90 | 
num=512, 91 | pos_fraction=0.25, 92 | neg_pos_ub=-1, 93 | add_gt_as_proposals=True), 94 | pos_weight=-1, 95 | debug=False)), 96 | test_cfg=dict( 97 | rpn=dict( 98 | nms=dict(type='nms', iou_threshold=0.7), 99 | nms_pre=6000, 100 | max_per_img=1000, 101 | min_bbox_size=0), 102 | rcnn=dict( 103 | score_thr=0.05, 104 | nms=dict(type='nms', iou_threshold=0.5), 105 | max_per_img=100))) 106 | -------------------------------------------------------------------------------- /configs/_base_/models/faster_rcnn_r50_fpn.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | model = dict( 3 | type='FasterRCNN', 4 | backbone=dict( 5 | type='ResNet', 6 | depth=50, 7 | num_stages=4, 8 | out_indices=(0, 1, 2, 3), 9 | frozen_stages=1, 10 | norm_cfg=dict(type='BN', requires_grad=True), 11 | norm_eval=True, 12 | style='pytorch', 13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), 14 | neck=dict( 15 | type='FPN', 16 | in_channels=[256, 512, 1024, 2048], 17 | out_channels=256, 18 | num_outs=5), 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=256, 22 | feat_channels=256, 23 | anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[8], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[4, 8, 16, 32, 64]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 35 | roi_head=dict( 36 | type='StandardRoIHead', 37 | bbox_roi_extractor=dict( 38 | type='SingleRoIExtractor', 39 | roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), 40 | out_channels=256, 41 | featmap_strides=[4, 8, 16, 32]), 42 | bbox_head=dict( 43 | type='Shared2FCBBoxHead', 44 | in_channels=256, 45 | fc_out_channels=1024, 46 | roi_feat_size=7, 47 | num_classes=80, 48 | bbox_coder=dict( 49 | type='DeltaXYWHBBoxCoder', 50 | target_means=[0., 0., 0., 0.], 51 | target_stds=[0.1, 0.1, 0.2, 0.2]), 52 | reg_class_agnostic=False, 53 | loss_cls=dict( 54 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), 55 | loss_bbox=dict(type='L1Loss', loss_weight=1.0))), 56 | # model training and testing settings 57 | train_cfg=dict( 58 | rpn=dict( 59 | assigner=dict( 60 | type='MaxIoUAssigner', 61 | pos_iou_thr=0.7, 62 | neg_iou_thr=0.3, 63 | min_pos_iou=0.3, 64 | match_low_quality=True, 65 | ignore_iof_thr=-1), 66 | sampler=dict( 67 | type='RandomSampler', 68 | num=256, 69 | pos_fraction=0.5, 70 | neg_pos_ub=-1, 71 | add_gt_as_proposals=False), 72 | allowed_border=-1, 73 | pos_weight=-1, 74 | debug=False), 75 | rpn_proposal=dict( 76 | nms_pre=2000, 77 | max_per_img=1000, 78 | nms=dict(type='nms', iou_threshold=0.7), 79 | min_bbox_size=0), 80 | rcnn=dict( 81 | assigner=dict( 82 | type='MaxIoUAssigner', 83 | pos_iou_thr=0.5, 84 | neg_iou_thr=0.5, 85 | min_pos_iou=0.5, 86 | match_low_quality=False, 87 | ignore_iof_thr=-1), 88 | sampler=dict( 89 | type='RandomSampler', 90 | num=512, 91 | pos_fraction=0.25, 92 | neg_pos_ub=-1, 93 | add_gt_as_proposals=True), 94 | pos_weight=-1, 95 | debug=False)), 96 | test_cfg=dict( 97 | rpn=dict( 98 | nms_pre=1000, 99 | max_per_img=1000, 100 | nms=dict(type='nms', iou_threshold=0.7), 101 | min_bbox_size=0), 102 | rcnn=dict( 103 | score_thr=0.05, 104 | nms=dict(type='nms', iou_threshold=0.5), 105 | max_per_img=100) 106 | # soft-nms is also supported for rcnn testing 107 | # e.g., nms=dict(type='soft_nms', 
iou_threshold=0.5, min_score=0.05) 108 | )) 109 | -------------------------------------------------------------------------------- /configs/_base_/models/mask_rcnn_r50_caffe_c4.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | norm_cfg = dict(type='BN', requires_grad=False) 3 | model = dict( 4 | type='MaskRCNN', 5 | backbone=dict( 6 | type='ResNet', 7 | depth=50, 8 | num_stages=3, 9 | strides=(1, 2, 2), 10 | dilations=(1, 1, 1), 11 | out_indices=(2, ), 12 | frozen_stages=1, 13 | norm_cfg=norm_cfg, 14 | norm_eval=True, 15 | style='caffe', 16 | init_cfg=dict( 17 | type='Pretrained', 18 | checkpoint='open-mmlab://detectron2/resnet50_caffe')), 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=1024, 22 | feat_channels=1024, 23 | anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[2, 4, 8, 16, 32], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[16]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 35 | roi_head=dict( 36 | type='StandardRoIHead', 37 | shared_head=dict( 38 | type='ResLayer', 39 | depth=50, 40 | stage=3, 41 | stride=2, 42 | dilation=1, 43 | style='caffe', 44 | norm_cfg=norm_cfg, 45 | norm_eval=True), 46 | bbox_roi_extractor=dict( 47 | type='SingleRoIExtractor', 48 | roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0), 49 | out_channels=1024, 50 | featmap_strides=[16]), 51 | bbox_head=dict( 52 | type='BBoxHead', 53 | with_avg_pool=True, 54 | roi_feat_size=7, 55 | in_channels=2048, 56 | num_classes=80, 57 | bbox_coder=dict( 58 | type='DeltaXYWHBBoxCoder', 59 | target_means=[0., 0., 0., 0.], 60 | target_stds=[0.1, 0.1, 0.2, 0.2]), 61 | reg_class_agnostic=False, 62 | loss_cls=dict( 63 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), 64 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 65 | mask_roi_extractor=None, 66 | mask_head=dict( 67 | type='FCNMaskHead', 68 | num_convs=0, 69 | in_channels=2048, 70 | conv_out_channels=256, 71 | num_classes=80, 72 | loss_mask=dict( 73 | type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))), 74 | # model training and testing settings 75 | train_cfg=dict( 76 | rpn=dict( 77 | assigner=dict( 78 | type='MaxIoUAssigner', 79 | pos_iou_thr=0.7, 80 | neg_iou_thr=0.3, 81 | min_pos_iou=0.3, 82 | match_low_quality=True, 83 | ignore_iof_thr=-1), 84 | sampler=dict( 85 | type='RandomSampler', 86 | num=256, 87 | pos_fraction=0.5, 88 | neg_pos_ub=-1, 89 | add_gt_as_proposals=False), 90 | allowed_border=0, 91 | pos_weight=-1, 92 | debug=False), 93 | rpn_proposal=dict( 94 | nms_pre=12000, 95 | max_per_img=2000, 96 | nms=dict(type='nms', iou_threshold=0.7), 97 | min_bbox_size=0), 98 | rcnn=dict( 99 | assigner=dict( 100 | type='MaxIoUAssigner', 101 | pos_iou_thr=0.5, 102 | neg_iou_thr=0.5, 103 | min_pos_iou=0.5, 104 | match_low_quality=False, 105 | ignore_iof_thr=-1), 106 | sampler=dict( 107 | type='RandomSampler', 108 | num=512, 109 | pos_fraction=0.25, 110 | neg_pos_ub=-1, 111 | add_gt_as_proposals=True), 112 | mask_size=14, 113 | pos_weight=-1, 114 | debug=False)), 115 | test_cfg=dict( 116 | rpn=dict( 117 | nms_pre=6000, 118 | nms=dict(type='nms', iou_threshold=0.7), 119 | max_per_img=1000, 120 | min_bbox_size=0), 121 | rcnn=dict( 122 | score_thr=0.05, 123 | nms=dict(type='nms', iou_threshold=0.5), 124 | max_per_img=100, 125 | 
mask_thr_binary=0.5))) 126 | -------------------------------------------------------------------------------- /configs/_base_/models/mask_rcnn_r50_fpn.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | model = dict( 3 | type='MaskRCNN', 4 | backbone=dict( 5 | type='ResNet', 6 | depth=50, 7 | num_stages=4, 8 | out_indices=(0, 1, 2, 3), 9 | frozen_stages=1, 10 | norm_cfg=dict(type='BN', requires_grad=True), 11 | norm_eval=True, 12 | style='pytorch', 13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), 14 | neck=dict( 15 | type='FPN', 16 | in_channels=[256, 512, 1024, 2048], 17 | out_channels=256, 18 | num_outs=5), 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=256, 22 | feat_channels=256, 23 | anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[8], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[4, 8, 16, 32, 64]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 35 | roi_head=dict( 36 | type='StandardRoIHead', 37 | bbox_roi_extractor=dict( 38 | type='SingleRoIExtractor', 39 | roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), 40 | out_channels=256, 41 | featmap_strides=[4, 8, 16, 32]), 42 | bbox_head=dict( 43 | type='Shared2FCBBoxHead', 44 | in_channels=256, 45 | fc_out_channels=1024, 46 | roi_feat_size=7, 47 | num_classes=80, 48 | bbox_coder=dict( 49 | type='DeltaXYWHBBoxCoder', 50 | target_means=[0., 0., 0., 0.], 51 | target_stds=[0.1, 0.1, 0.2, 0.2]), 52 | reg_class_agnostic=False, 53 | loss_cls=dict( 54 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), 55 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 56 | mask_roi_extractor=dict( 57 | type='SingleRoIExtractor', 58 | roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0), 59 | out_channels=256, 60 | featmap_strides=[4, 8, 16, 32]), 61 | mask_head=dict( 62 | type='FCNMaskHead', 63 | num_convs=4, 64 | in_channels=256, 65 | conv_out_channels=256, 66 | num_classes=80, 67 | loss_mask=dict( 68 | type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))), 69 | # model training and testing settings 70 | train_cfg=dict( 71 | rpn=dict( 72 | assigner=dict( 73 | type='MaxIoUAssigner', 74 | pos_iou_thr=0.7, 75 | neg_iou_thr=0.3, 76 | min_pos_iou=0.3, 77 | match_low_quality=True, 78 | ignore_iof_thr=-1), 79 | sampler=dict( 80 | type='RandomSampler', 81 | num=256, 82 | pos_fraction=0.5, 83 | neg_pos_ub=-1, 84 | add_gt_as_proposals=False), 85 | allowed_border=-1, 86 | pos_weight=-1, 87 | debug=False), 88 | rpn_proposal=dict( 89 | nms_pre=2000, 90 | max_per_img=1000, 91 | nms=dict(type='nms', iou_threshold=0.7), 92 | min_bbox_size=0), 93 | rcnn=dict( 94 | assigner=dict( 95 | type='MaxIoUAssigner', 96 | pos_iou_thr=0.5, 97 | neg_iou_thr=0.5, 98 | min_pos_iou=0.5, 99 | match_low_quality=True, 100 | ignore_iof_thr=-1), 101 | sampler=dict( 102 | type='RandomSampler', 103 | num=512, 104 | pos_fraction=0.25, 105 | neg_pos_ub=-1, 106 | add_gt_as_proposals=True), 107 | mask_size=28, 108 | pos_weight=-1, 109 | debug=False)), 110 | test_cfg=dict( 111 | rpn=dict( 112 | nms_pre=1000, 113 | max_per_img=1000, 114 | nms=dict(type='nms', iou_threshold=0.7), 115 | min_bbox_size=0), 116 | rcnn=dict( 117 | score_thr=0.05, 118 | nms=dict(type='nms', iou_threshold=0.5), 119 | max_per_img=100, 120 | 
mask_thr_binary=0.5))) 121 | -------------------------------------------------------------------------------- /configs/_base_/models/retinanet_r50_fpn.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | model = dict( 3 | type='RetinaNet', 4 | backbone=dict( 5 | type='ResNet', 6 | depth=50, 7 | num_stages=4, 8 | out_indices=(0, 1, 2, 3), 9 | frozen_stages=1, 10 | norm_cfg=dict(type='BN', requires_grad=True), 11 | norm_eval=True, 12 | style='pytorch', 13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), 14 | neck=dict( 15 | type='FPN', 16 | in_channels=[256, 512, 1024, 2048], 17 | out_channels=256, 18 | start_level=1, 19 | add_extra_convs='on_input', 20 | num_outs=5), 21 | bbox_head=dict( 22 | type='RetinaHead', 23 | num_classes=80, 24 | in_channels=256, 25 | stacked_convs=4, 26 | feat_channels=256, 27 | anchor_generator=dict( 28 | type='AnchorGenerator', 29 | octave_base_scale=4, 30 | scales_per_octave=3, 31 | ratios=[0.5, 1.0, 2.0], 32 | strides=[8, 16, 32, 64, 128]), 33 | bbox_coder=dict( 34 | type='DeltaXYWHBBoxCoder', 35 | target_means=[.0, .0, .0, .0], 36 | target_stds=[1.0, 1.0, 1.0, 1.0]), 37 | loss_cls=dict( 38 | type='FocalLoss', 39 | use_sigmoid=True, 40 | gamma=2.0, 41 | alpha=0.25, 42 | loss_weight=1.0), 43 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 44 | # model training and testing settings 45 | train_cfg=dict( 46 | assigner=dict( 47 | type='MaxIoUAssigner', 48 | pos_iou_thr=0.5, 49 | neg_iou_thr=0.4, 50 | min_pos_iou=0, 51 | ignore_iof_thr=-1), 52 | allowed_border=-1, 53 | pos_weight=-1, 54 | debug=False), 55 | test_cfg=dict( 56 | nms_pre=1000, 57 | min_bbox_size=0, 58 | score_thr=0.05, 59 | nms=dict(type='nms', iou_threshold=0.5), 60 | max_per_img=100)) 61 | -------------------------------------------------------------------------------- /configs/_base_/models/rpn_r50_caffe_c4.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | model = dict( 3 | type='RPN', 4 | backbone=dict( 5 | type='ResNet', 6 | depth=50, 7 | num_stages=3, 8 | strides=(1, 2, 2), 9 | dilations=(1, 1, 1), 10 | out_indices=(2, ), 11 | frozen_stages=1, 12 | norm_cfg=dict(type='BN', requires_grad=False), 13 | norm_eval=True, 14 | style='caffe', 15 | init_cfg=dict( 16 | type='Pretrained', 17 | checkpoint='open-mmlab://detectron2/resnet50_caffe')), 18 | neck=None, 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=1024, 22 | feat_channels=1024, 23 | anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[2, 4, 8, 16, 32], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[16]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 35 | # model training and testing settings 36 | train_cfg=dict( 37 | rpn=dict( 38 | assigner=dict( 39 | type='MaxIoUAssigner', 40 | pos_iou_thr=0.7, 41 | neg_iou_thr=0.3, 42 | min_pos_iou=0.3, 43 | ignore_iof_thr=-1), 44 | sampler=dict( 45 | type='RandomSampler', 46 | num=256, 47 | pos_fraction=0.5, 48 | neg_pos_ub=-1, 49 | add_gt_as_proposals=False), 50 | allowed_border=0, 51 | pos_weight=-1, 52 | debug=False)), 53 | test_cfg=dict( 54 | rpn=dict( 55 | nms_pre=12000, 56 | max_per_img=2000, 57 | nms=dict(type='nms', iou_threshold=0.7), 58 | min_bbox_size=0))) 59 | 
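These `_base_/models` files are building blocks rather than runnable configs: a child config lists them under `_base_` and overrides only the fields that differ, with nested dicts merged key by key by mmcv's config loader. A minimal sketch of that pattern (the file name and the class-agnostic override below are illustrative, not files from this repo):

# my_mask_rcnn_class_agnostic.py -- hypothetical child config
_base_ = [
    '../_base_/models/mask_rcnn_r50_fpn.py',
    '../_base_/datasets/coco_instance.py',
    '../_base_/schedules/schedule_1x.py',
    '../_base_/default_runtime.py',
]
# Only the overridden keys need repeating; everything else is inherited from the base files.
model = dict(
    roi_head=dict(
        bbox_head=dict(num_classes=1),
        mask_head=dict(num_classes=1)))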
-------------------------------------------------------------------------------- /configs/_base_/models/rpn_r50_fpn.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | model = dict( 3 | type='RPN', 4 | backbone=dict( 5 | type='ResNet', 6 | depth=50, 7 | num_stages=4, 8 | out_indices=(0, 1, 2, 3), 9 | frozen_stages=1, 10 | norm_cfg=dict(type='BN', requires_grad=True), 11 | norm_eval=True, 12 | style='pytorch', 13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), 14 | neck=dict( 15 | type='FPN', 16 | in_channels=[256, 512, 1024, 2048], 17 | out_channels=256, 18 | num_outs=5), 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=256, 22 | feat_channels=256, 23 | anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[8], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[4, 8, 16, 32, 64]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 35 | # model training and testing settings 36 | train_cfg=dict( 37 | rpn=dict( 38 | assigner=dict( 39 | type='MaxIoUAssigner', 40 | pos_iou_thr=0.7, 41 | neg_iou_thr=0.3, 42 | min_pos_iou=0.3, 43 | ignore_iof_thr=-1), 44 | sampler=dict( 45 | type='RandomSampler', 46 | num=256, 47 | pos_fraction=0.5, 48 | neg_pos_ub=-1, 49 | add_gt_as_proposals=False), 50 | allowed_border=0, 51 | pos_weight=-1, 52 | debug=False)), 53 | test_cfg=dict( 54 | rpn=dict( 55 | nms_pre=2000, 56 | max_per_img=1000, 57 | nms=dict(type='nms', iou_threshold=0.7), 58 | min_bbox_size=0))) 59 | -------------------------------------------------------------------------------- /configs/_base_/models/ssd300.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | input_size = 300 3 | model = dict( 4 | type='SingleStageDetector', 5 | backbone=dict( 6 | type='SSDVGG', 7 | depth=16, 8 | with_last_pool=False, 9 | ceil_mode=True, 10 | out_indices=(3, 4), 11 | out_feature_indices=(22, 34), 12 | init_cfg=dict( 13 | type='Pretrained', checkpoint='open-mmlab://vgg16_caffe')), 14 | neck=dict( 15 | type='SSDNeck', 16 | in_channels=(512, 1024), 17 | out_channels=(512, 1024, 512, 256, 256, 256), 18 | level_strides=(2, 2, 1, 1), 19 | level_paddings=(1, 1, 0, 0), 20 | l2_norm_scale=20), 21 | bbox_head=dict( 22 | type='SSDHead', 23 | in_channels=(512, 1024, 512, 256, 256, 256), 24 | num_classes=80, 25 | anchor_generator=dict( 26 | type='SSDAnchorGenerator', 27 | scale_major=False, 28 | input_size=input_size, 29 | basesize_ratio_range=(0.15, 0.9), 30 | strides=[8, 16, 32, 64, 100, 300], 31 | ratios=[[2], [2, 3], [2, 3], [2, 3], [2], [2]]), 32 | bbox_coder=dict( 33 | type='DeltaXYWHBBoxCoder', 34 | target_means=[.0, .0, .0, .0], 35 | target_stds=[0.1, 0.1, 0.2, 0.2])), 36 | # model training and testing settings 37 | train_cfg=dict( 38 | assigner=dict( 39 | type='MaxIoUAssigner', 40 | pos_iou_thr=0.5, 41 | neg_iou_thr=0.5, 42 | min_pos_iou=0., 43 | ignore_iof_thr=-1, 44 | gt_max_assign_all=False), 45 | smoothl1_beta=1., 46 | allowed_border=-1, 47 | pos_weight=-1, 48 | neg_pos_ratio=3, 49 | debug=False), 50 | test_cfg=dict( 51 | nms_pre=1000, 52 | nms=dict(type='nms', iou_threshold=0.45), 53 | min_bbox_size=0, 54 | score_thr=0.02, 55 | max_per_img=200)) 56 | cudnn_benchmark = True 57 | 
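All of these config fragments are consumed through mmcv's config loader, which resolves `_base_` inheritance before training starts. A quick way to sanity-check what a config actually expands to (a minimal sketch, assuming mmcv is installed and commands are run from the repo root):

from mmcv import Config

# Load an OpenInst config; its _base_ files (dataset, schedule, runtime) are merged in.
cfg = Config.fromfile('configs/openinst/queryinst_r50_1x_coco.py')
print(cfg.model.type)            # detector class, e.g. 'QueryInst'
print(cfg.data.samples_per_gpu)  # per-GPU batch size from the dataset config
print(cfg.pretty_text)           # the fully merged config as text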
-------------------------------------------------------------------------------- /configs/_base_/schedules/schedule_1x.py: -------------------------------------------------------------------------------- 1 | # optimizer 2 | optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) 3 | optimizer_config = dict(grad_clip=None) 4 | # learning policy 5 | lr_config = dict( 6 | policy='step', 7 | warmup='linear', 8 | warmup_iters=500, 9 | warmup_ratio=0.001, 10 | step=[8, 11]) 11 | runner = dict(type='EpochBasedRunner', max_epochs=12) 12 | -------------------------------------------------------------------------------- /configs/_base_/schedules/schedule_20e.py: -------------------------------------------------------------------------------- 1 | # optimizer 2 | optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) 3 | optimizer_config = dict(grad_clip=None) 4 | # learning policy 5 | lr_config = dict( 6 | policy='step', 7 | warmup='linear', 8 | warmup_iters=500, 9 | warmup_ratio=0.001, 10 | step=[16, 19]) 11 | runner = dict(type='EpochBasedRunner', max_epochs=20) 12 | -------------------------------------------------------------------------------- /configs/_base_/schedules/schedule_2x.py: -------------------------------------------------------------------------------- 1 | # optimizer 2 | optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) 3 | optimizer_config = dict(grad_clip=None) 4 | # learning policy 5 | lr_config = dict( 6 | policy='step', 7 | warmup='linear', 8 | warmup_iters=500, 9 | warmup_ratio=0.001, 10 | step=[16, 22]) 11 | runner = dict(type='EpochBasedRunner', max_epochs=24) 12 | -------------------------------------------------------------------------------- /configs/openinst/coco_to_uvo_ins.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | img_norm_cfg = dict( 3 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 4 | train_pipeline = [ 5 | dict(type='LoadImageFromFile'), 6 | dict(type='LoadAnnotations', with_bbox=True, with_mask=True), 7 | dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), 8 | dict(type='RandomFlip', flip_ratio=0.5), 9 | dict(type='Normalize', **img_norm_cfg), 10 | dict(type='Pad', size_divisor=32), 11 | dict(type='DefaultFormatBundle'), 12 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']), 13 | ] 14 | test_pipeline = [ 15 | dict(type='LoadImageFromFile'), 16 | dict( 17 | type='MultiScaleFlipAug', 18 | img_scale=(1333, 800), 19 | flip=False, 20 | transforms=[ 21 | dict(type='Resize', keep_ratio=True), 22 | dict(type='RandomFlip'), 23 | dict(type='Normalize', **img_norm_cfg), 24 | dict(type='Pad', size_divisor=32), 25 | dict(type='ImageToTensor', keys=['img']), 26 | dict(type='Collect', keys=['img']), 27 | ]) 28 | ] 29 | data = dict( 30 | samples_per_gpu=4, 31 | workers_per_gpu=4, 32 | train=dict( 33 | type='CocoSplitDataset', 34 | is_class_agnostic=True, 35 | train_class='all', 36 | eval_class='all', 37 | ann_file='data/coco/annotations/instances_train2017.json', 38 | img_prefix='data/coco/train2017/', 39 | pipeline=train_pipeline), 40 | val=dict( 41 | type='UVODataset', 42 | is_class_agnostic=True, 43 | train_class='all', 44 | eval_class='all', 45 | ann_file='data/UVO/ann/UVO_frame_val.json', 46 | img_prefix='data/UVO/images/', 47 | pipeline=test_pipeline), 48 | test=dict( 49 | type='UVODataset', 50 | is_class_agnostic=True, 51 | train_class='all', 52 | eval_class='all', 53 | 
ann_file='data/UVO/ann/UVO_frame_val.json', 54 | img_prefix='data/UVO/images/', 55 | pipeline=test_pipeline)) 56 | evaluation = dict(interval=1, metric=['bbox', 'segm']) 57 | -------------------------------------------------------------------------------- /configs/openinst/queryinst_r50_1x_coco.py: -------------------------------------------------------------------------------- 1 | dataset_file = './coco_to_uvo_ins.py' 2 | init_cfg = dict(type='Pretrained', checkpoint='resnet50-0676ba61.pth') 3 | init_cfg=None 4 | tb_hook = dict(type='TensorboardLoggerHook') 5 | _base_ = [ 6 | dataset_file, 7 | '../_base_/schedules/schedule_1x.py', '../_base_/default_runtime.py' 8 | ] 9 | num_stages = 6 10 | num_proposals = 100 11 | model = dict( 12 | type='QueryInst', 13 | backbone=dict( 14 | type='ResNet', 15 | depth=50, 16 | num_stages=4, 17 | out_indices=(0, 1, 2, 3), 18 | frozen_stages=-1, 19 | norm_cfg=dict(type='SyncBN', requires_grad=True), 20 | norm_eval=False, 21 | style='pytorch', 22 | init_cfg=init_cfg, 23 | dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), 24 | stage_with_dcn=(False, True, True, True), 25 | with_cp=True), 26 | neck=dict( 27 | type='BiFPN', 28 | in_channels=[256, 512, 1024, 2048], 29 | out_channels=256, 30 | num_outs=6, 31 | num_repeats=6, 32 | norm='SyncBN'), 33 | rpn_head=dict( 34 | type='EmbeddingRPNHead', 35 | num_proposals=num_proposals, 36 | proposal_feature_channel=256), 37 | roi_head=dict( 38 | type='SparseScoreRoIHead', 39 | num_stages=num_stages, 40 | stage_loss_weights=[1] * num_stages, 41 | proposal_feature_channel=256, 42 | bbox_roi_extractor=dict( 43 | type='SingleRoIExtractor', 44 | roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=2), 45 | out_channels=256, 46 | featmap_strides=[4, 8, 16, 32, 64, 128]), 47 | mask_roi_extractor=dict( 48 | type='SingleRoIExtractor', 49 | roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=2), 50 | out_channels=256, 51 | featmap_strides=[4, 8, 16, 32, 64, 128]), 52 | bbox_head=[ 53 | dict( 54 | type='DIIScoreHead', 55 | num_classes=1, 56 | num_ffn_fcs=2, 57 | num_heads=8, 58 | num_cls_fcs=1, 59 | num_reg_fcs=3, 60 | feedforward_channels=2048, 61 | in_channels=256, 62 | dropout=0.0, 63 | ffn_act_cfg=dict(type='ReLU', inplace=True), 64 | dynamic_conv_cfg=dict( 65 | type='DynamicConv', 66 | in_channels=256, 67 | feat_channels=64, 68 | out_channels=256, 69 | input_feat_shape=7, 70 | act_cfg=dict(type='ReLU', inplace=True), 71 | norm_cfg=dict(type='LN')), 72 | loss_bbox=dict(type='L1Loss', loss_weight=5.0), 73 | loss_iou=dict(type='GIoULoss', loss_weight=2.0), 74 | loss_cls=dict(type='L1Loss', loss_weight=2.0), 75 | bbox_coder=dict( 76 | type='DeltaXYWHBBoxCoder', 77 | clip_border=False, 78 | target_means=[0., 0., 0., 0.], 79 | target_stds=[0.5, 0.5, 1., 1.])) for _ in range(num_stages) 80 | ], 81 | mask_head=[ 82 | dict( 83 | type='DynamicMaskHead', 84 | dynamic_conv_cfg=dict( 85 | type='DynamicConv', 86 | in_channels=256, 87 | feat_channels=64, 88 | out_channels=256, 89 | input_feat_shape=14, 90 | with_proj=False, 91 | act_cfg=dict(type='ReLU', inplace=True), 92 | norm_cfg=dict(type='LN')), 93 | num_convs=4, 94 | num_classes=1, 95 | roi_feat_size=14, 96 | in_channels=256, 97 | conv_kernel_size=3, 98 | conv_out_channels=256, 99 | class_agnostic=False, 100 | norm_cfg=dict(type='BN'), 101 | upsample_cfg=dict(type='deconv', scale_factor=2), 102 | loss_mask=dict( 103 | type='DiceLoss', 104 | loss_weight=1.0, 105 | use_sigmoid=True, 106 | activate=False, 107 | eps=1e-5)) for _ in range(num_stages) 108 
| ]), 109 | # training and testing settings 110 | train_cfg=dict( 111 | rpn=None, 112 | rcnn=[ 113 | dict( 114 | assigner=dict( 115 | type='HungarianAssigner', 116 | cls_cost=dict(type='FocalLossCost', weight=2.0), 117 | reg_cost=dict(type='BBoxL1Cost', weight=5.0), 118 | iou_cost=dict(type='IoUCost', iou_mode='giou', 119 | weight=2.0)), 120 | sampler=dict(type='PseudoSampler'), 121 | pos_weight=1, 122 | mask_size=28, 123 | ) for _ in range(num_stages) 124 | ]), 125 | test_cfg=dict( 126 | rpn=None, rcnn=dict(max_per_img=num_proposals, mask_thr_binary=0.5))) 127 | 128 | # optimizer 129 | optimizer = dict( 130 | _delete_=True, 131 | type='AdamW', 132 | lr=0.0001, 133 | weight_decay=0.0001, 134 | paramwise_cfg=dict( 135 | custom_keys={'backbone': dict(lr_mult=0.1, decay_mult=1.0), 136 | 'init_proposal_features':dict(lr_mult=1.0, decay_mult=0.0), 137 | 'init_proposal_bboxes': dict(lr_mult=1.0, decay_mult=0.0), 138 | }, 139 | norm_decay_mult=0.0) 140 | ) 141 | 142 | optimizer_config = dict( 143 | _delete_=True, grad_clip=dict(max_norm=0.1, norm_type=2)) 144 | # learning policy 145 | lr_config = dict(policy='step', step=[8, 11], warmup_iters=1000) 146 | runner = dict(type='EpochBasedRunner', max_epochs=12) 147 | 148 | # log setting 149 | log_config = dict( 150 | interval=20, 151 | hooks=[ 152 | dict(type='TextLoggerHook'), 153 | tb_hook, 154 | ]) 155 | 156 | checkpoint_config = dict(interval=3) 157 | resume_from = None 158 | custom_hooks = [] 159 | custom_hooks = [dict( 160 | type='ExpMomentumEMAHook', 161 | resume_from=resume_from, 162 | momentum=0.0001, 163 | priority=49)] -------------------------------------------------------------------------------- /configs/openinst/queryinst_r50_3x_lsj_coco.py: -------------------------------------------------------------------------------- 1 | _base_ = './queryinst_r50_1x_coco.py' 2 | num_proposals = 100 3 | model = dict( 4 | rpn_head=dict(num_proposals=num_proposals), 5 | test_cfg=dict( 6 | _delete_=True, 7 | rpn=None, 8 | rcnn=dict(max_per_img=num_proposals, mask_thr_binary=0.5))) 9 | img_norm_cfg = dict( 10 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 11 | 12 | # augmentation strategy originates from DETR. 
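# Concretely, the pipeline below applies large-scale jittering: images are
# rescaled by a random factor in [0.1, 2.0] relative to a 1024x1024 canvas,
# randomly cropped and later padded back to 1024x1024, and near-empty boxes
# are filtered out before flipping and normalization.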
13 | image_size = (1024, 1024) 14 | train_pipeline = [ 15 | dict(type='LoadImageFromFile'), 16 | dict(type='LoadAnnotations', with_bbox=True, with_mask=True), 17 | dict( 18 | type='Resize', 19 | img_scale=image_size, 20 | ratio_range=(0.1, 2.0), 21 | multiscale_mode='range', 22 | keep_ratio=True), 23 | dict( 24 | type='RandomCrop', 25 | crop_type='absolute_range', 26 | crop_size=image_size, 27 | recompute_bbox=True, 28 | allow_negative_crop=True), 29 | dict(type='FilterAnnotations', min_gt_bbox_wh=(1e-2, 1e-2)), 30 | dict(type='RandomFlip', flip_ratio=0.5), 31 | dict(type='Pad', size=image_size), 32 | dict(type='Normalize', **img_norm_cfg), 33 | dict(type='DefaultFormatBundle'), 34 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']), 35 | ] 36 | test_pipeline = [ 37 | dict(type='LoadImageFromFile'), 38 | dict( 39 | type='MultiScaleFlipAug', 40 | img_scale=(1333, 800), 41 | flip=False, 42 | transforms=[ 43 | dict(type='Resize', keep_ratio=True), 44 | dict(type='RandomFlip'), 45 | dict(type='Normalize', **img_norm_cfg), 46 | dict(type='Pad', size_divisor=32), 47 | dict(type='ImageToTensor', keys=['img']), 48 | dict(type='Collect', keys=['img']), 49 | ]) 50 | ] 51 | 52 | data = dict(train=dict(pipeline=train_pipeline)) 53 | 54 | lr_config = dict(policy='step', step=[50,]) 55 | runner = dict(type='EpochBasedRunner', max_epochs=50) 56 | 57 | resume_from = None 58 | custom_hooks = [dict( 59 | type='ExpMomentumEMAHook', 60 | resume_from=resume_from, 61 | momentum=0.0001, 62 | priority=49)] -------------------------------------------------------------------------------- /core/__init__.py: -------------------------------------------------------------------------------- 1 | from .bbox import * 2 | from .hook import * -------------------------------------------------------------------------------- /core/bbox/__init__.py: -------------------------------------------------------------------------------- 1 | from .assigners import * 2 | from .match_costs import * -------------------------------------------------------------------------------- /core/bbox/assigners/__init__.py: -------------------------------------------------------------------------------- 1 | from .hungarian_oln_assigner import HungarianOlnAssigner -------------------------------------------------------------------------------- /core/bbox/assigners/hungarian_oln_assigner.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 2 | import torch 3 | 4 | from mmdet.core.bbox.builder import BBOX_ASSIGNERS 5 | from mmdet.core.bbox.match_costs import build_match_cost 6 | from mmdet.core.bbox.transforms import bbox_cxcywh_to_xyxy 7 | from mmdet.core.bbox.assigners.assign_result import AssignResult 8 | from mmdet.core.bbox.assigners.base_assigner import BaseAssigner 9 | 10 | try: 11 | from scipy.optimize import linear_sum_assignment 12 | except ImportError: 13 | linear_sum_assignment = None 14 | 15 | 16 | @BBOX_ASSIGNERS.register_module() 17 | class HungarianOlnAssigner(BaseAssigner): 18 | """Computes one-to-one matching between predictions and ground truth. 19 | 20 | This class computes an assignment between the targets and the predictions 21 | based on the costs. The costs are weighted sum of three components: 22 | classification cost, regression L1 cost and regression iou cost. The 23 | targets don't include the no_object, so generally there are more 24 | predictions than targets. 
After the one-to-one matching, the un-matched 25 | are treated as backgrounds. Thus each query prediction will be assigned 26 | with `0` or a positive integer indicating the ground truth index: 27 | 28 | - 0: negative sample, no assigned gt 29 | - positive integer: positive sample, index (1-based) of assigned gt 30 | 31 | Args: 32 | cls_weight (int | float, optional): The scale factor for classification 33 | cost. Default 1.0. 34 | bbox_weight (int | float, optional): The scale factor for regression 35 | L1 cost. Default 1.0. 36 | iou_weight (int | float, optional): The scale factor for regression 37 | iou cost. Default 1.0. 38 | iou_calculator (dict | optional): The config for the iou calculation. 39 | Default type `BboxOverlaps2D`. 40 | iou_mode (str | optional): "iou" (intersection over union), "iof" 41 | (intersection over foreground), or "giou" (generalized 42 | intersection over union). Default "giou". 43 | """ 44 | 45 | def __init__(self, 46 | cls_cost=dict(type='ClassificationCost', weight=1.), 47 | reg_cost=dict(type='BBoxL1Cost', weight=1.0), 48 | iou_cost=dict(type='IoUCost', iou_mode='giou', weight=1.0)): 49 | self.cls_cost = build_match_cost(cls_cost) 50 | self.reg_cost = build_match_cost(reg_cost) 51 | self.iou_cost = build_match_cost(iou_cost) 52 | 53 | def assign(self, 54 | bbox_pred, 55 | cls_pred, 56 | gt_bboxes, 57 | gt_labels, 58 | img_meta, 59 | gt_bboxes_ignore=None, 60 | eps=1e-7): 61 | """Computes one-to-one matching based on the weighted costs. 62 | 63 | This method assign each query prediction to a ground truth or 64 | background. The `assigned_gt_inds` with -1 means don't care, 65 | 0 means negative sample, and positive number is the index (1-based) 66 | of assigned gt. 67 | The assignment is done in the following steps, the order matters. 68 | 69 | 1. assign every prediction to -1 70 | 2. compute the weighted costs 71 | 3. do Hungarian matching on CPU based on the costs 72 | 4. assign all to 0 (background) first, then for each matched pair 73 | between predictions and gts, treat this prediction as foreground 74 | and assign the corresponding gt index (plus 1) to it. 75 | 76 | Args: 77 | bbox_pred (Tensor): Predicted boxes with normalized coordinates 78 | (cx, cy, w, h), which are all in range [0, 1]. Shape 79 | [num_query, 4]. 80 | cls_pred (Tensor): Predicted classification logits, shape 81 | [num_query, num_class]. 82 | gt_bboxes (Tensor): Ground truth boxes with unnormalized 83 | coordinates (x1, y1, x2, y2). Shape [num_gt, 4]. 84 | gt_labels (Tensor): Label of `gt_bboxes`, shape (num_gt,). 85 | img_meta (dict): Meta information for current image. 86 | gt_bboxes_ignore (Tensor, optional): Ground truth bboxes that are 87 | labelled as `ignored`. Default None. 88 | eps (int | float, optional): A value added to the denominator for 89 | numerical stability. Default 1e-7. 90 | 91 | Returns: 92 | :obj:`AssignResult`: The assigned result. 93 | """ 94 | assert gt_bboxes_ignore is None, \ 95 | 'Only case when gt_bboxes_ignore is None is supported.' 96 | num_gts, num_bboxes = gt_bboxes.size(0), bbox_pred.size(0) 97 | 98 | # 1. 
assign -1 by default 99 | assigned_gt_inds = bbox_pred.new_full((num_bboxes, ), 100 | -1, 101 | dtype=torch.long) 102 | assigned_labels = bbox_pred.new_full((num_bboxes, ), 103 | -1, 104 | dtype=torch.long) 105 | if num_gts == 0 or num_bboxes == 0: 106 | # No ground truth or boxes, return empty assignment 107 | if num_gts == 0: 108 | # No ground truth, assign all to background 109 | assigned_gt_inds[:] = 0 110 | return AssignResult( 111 | num_gts, assigned_gt_inds, None, labels=assigned_labels) 112 | img_h, img_w, _ = img_meta['img_shape'] 113 | factor = gt_bboxes.new_tensor([img_w, img_h, img_w, 114 | img_h]).unsqueeze(0) 115 | 116 | # 2. compute the weighted costs 117 | # regression L1 cost 118 | normalize_gt_bboxes = gt_bboxes / factor 119 | reg_cost = self.reg_cost(bbox_pred, normalize_gt_bboxes) 120 | # regression iou cost, defaultly giou is used in official DETR. 121 | bboxes = bbox_cxcywh_to_xyxy(bbox_pred) * factor 122 | iou_cost = self.iou_cost(bboxes, gt_bboxes) 123 | # classification and bboxcost. 124 | cls_cost = self.cls_cost(cls_pred, bboxes, gt_bboxes) 125 | # weighted sum of above three costs 126 | cost = cls_cost + reg_cost + iou_cost 127 | 128 | # 3. do Hungarian matching on CPU using linear_sum_assignment 129 | cost = cost.detach().cpu() 130 | if linear_sum_assignment is None: 131 | raise ImportError('Please run "pip install scipy" ' 132 | 'to install scipy first.') 133 | matched_row_inds, matched_col_inds = linear_sum_assignment(cost) 134 | matched_row_inds = torch.from_numpy(matched_row_inds).to( 135 | bbox_pred.device) 136 | matched_col_inds = torch.from_numpy(matched_col_inds).to( 137 | bbox_pred.device) 138 | 139 | # 4. assign backgrounds and foregrounds 140 | # assign all indices to backgrounds first 141 | assigned_gt_inds[:] = 0 142 | # assign foregrounds based on matching results 143 | assigned_gt_inds[matched_row_inds] = matched_col_inds + 1 144 | assigned_labels[matched_row_inds] = gt_labels[matched_col_inds] 145 | return AssignResult( 146 | num_gts, assigned_gt_inds, None, labels=assigned_labels) 147 | -------------------------------------------------------------------------------- /core/bbox/match_costs/__init__.py: -------------------------------------------------------------------------------- 1 | from .objectness_l1_cost import ObjectnessL1Cost -------------------------------------------------------------------------------- /core/bbox/match_costs/objectness_l1_cost.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn.functional as F 3 | 4 | from mmdet.core.bbox.iou_calculators import bbox_overlaps 5 | from mmdet.core.bbox.transforms import bbox_cxcywh_to_xyxy, bbox_xyxy_to_cxcywh 6 | from mmdet.core.bbox.match_costs.builder import MATCH_COST 7 | 8 | @MATCH_COST.register_module() 9 | class ObjectnessL1Cost: 10 | """BBoxL1Cost. 
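In practice this cost matches objectness scores rather than boxes: for each ground-truth box the target is the highest IoU that any predicted box attains with it, and the cost entry for a query is the weighted L1 distance between the query's predicted objectness score and that target.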
11 | 12 | Args: 13 | weight (int | float, optional): loss_weight 14 | iou_mode (str, optional): iou mode such as 'iou' | 'giou' 15 | 16 | Examples: 17 | >>> from mmdet.core.bbox.match_costs.match_cost import BBoxL1Cost 18 | >>> import torch 19 | >>> self = L1Cost() 20 | >>> bbox_pred = torch.rand(10, 1) 21 | >>> gt_bboxes= torch.FloatTensor([0.8, 0.9]) 22 | >>> self(bbox_pred, gt_bboxes) 23 | tensor([[1.6172, 1.6422]]) 24 | """ 25 | 26 | def __init__(self, weight=1., iou_mode='iou'): 27 | self.weight = weight 28 | self.iou_mode = iou_mode 29 | 30 | def __call__(self, cls_pred, bboxes, gt_bboxes): 31 | """ 32 | Args: 33 | cls_pred (Tensor): Predicted boxes with normalized coordinates 34 | (cx, cy, w, h), which are all in range [0, 1]. Shape 35 | (num_query, 4). 36 | gt_targets (Tensor): Ground truth boxes with normalized 37 | coordinates (x1, y1, x2, y2). Shape (num_gt, 4). 38 | 39 | Returns: 40 | torch.Tensor: bbox_cost value with weight 41 | """ 42 | # overlaps: [num_gt, num_bboxes] 43 | overlaps = bbox_overlaps( 44 | gt_bboxes, bboxes, mode=self.iou_mode, is_aligned=False) 45 | gt_targets, _ = torch.max(overlaps, 1, keepdim=True) # [num_gt, 1] 46 | 47 | objectness_cost = torch.cdist(cls_pred, gt_targets, p=1) 48 | return objectness_cost * self.weight -------------------------------------------------------------------------------- /core/hook/__init__.py: -------------------------------------------------------------------------------- 1 | from .ema import * -------------------------------------------------------------------------------- /core/hook/ema.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 2 | import math 3 | 4 | from mmcv.parallel import is_module_wrapper 5 | from mmcv.runner.hooks import HOOKS, Hook 6 | 7 | 8 | class BaseEMAHook(Hook): 9 | """Exponential Moving Average Hook. 10 | 11 | Use Exponential Moving Average on all parameters of model in training 12 | process. All parameters have a ema backup, which update by the formula 13 | as below. EMAHook takes priority over EvalHook and CheckpointHook. Note, 14 | the original model parameters are actually saved in ema field after train. 15 | 16 | Args: 17 | momentum (float): The momentum used for updating ema parameter. 18 | Ema's parameter are updated with the formula: 19 | `ema_param = (1-momentum) * ema_param + momentum * cur_param`. 20 | Defaults to 0.0002. 21 | skip_buffers (bool): Whether to skip the model buffers, such as 22 | batchnorm running stats (running_mean, running_var), it does not 23 | perform the ema operation. Default to False. 24 | interval (int): Update ema parameter every interval iteration. 25 | Defaults to 1. 26 | resume_from (str, optional): The checkpoint path. Defaults to None. 27 | momentum_fun (func, optional): The function to change momentum 28 | during early iteration (also warmup) to help early training. 29 | It uses `momentum` as a constant. Defaults to None. 30 | """ 31 | 32 | def __init__(self, 33 | momentum=0.0002, 34 | interval=1, 35 | skip_buffers=False, 36 | resume_from=None, 37 | momentum_fun=None): 38 | assert 0 < momentum < 1 39 | self.momentum = momentum 40 | self.skip_buffers = skip_buffers 41 | self.interval = interval 42 | self.checkpoint = resume_from 43 | self.momentum_fun = momentum_fun 44 | 45 | def before_run(self, runner): 46 | """To resume model with it's ema parameters more friendly. 47 | 48 | Register ema parameter as ``named_buffer`` to model. 
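If ``resume_from`` was given, the runner also resumes from that checkpoint here with ``resume_optimizer=False``, i.e. only the model/EMA weights are restored, not the optimizer state.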
49 | """ 50 | model = runner.model 51 | if is_module_wrapper(model): 52 | model = model.module 53 | self.param_ema_buffer = {} 54 | if self.skip_buffers: 55 | self.model_parameters = dict(model.named_parameters()) 56 | else: 57 | self.model_parameters = model.state_dict() 58 | for name, value in self.model_parameters.items(): 59 | # "." is not allowed in module's buffer name 60 | buffer_name = f"ema_{name.replace('.', '_')}" 61 | self.param_ema_buffer[name] = buffer_name 62 | model.register_buffer(buffer_name, value.data.clone()) 63 | self.model_buffers = dict(model.named_buffers()) 64 | if self.checkpoint is not None: 65 | runner.resume(self.checkpoint, resume_optimizer=False) # !!!ban loading optimizer state_dict 66 | 67 | def get_momentum(self, runner): 68 | return self.momentum_fun(runner.iter) if self.momentum_fun else \ 69 | self.momentum 70 | 71 | def after_train_iter(self, runner): 72 | """Update ema parameter every self.interval iterations.""" 73 | if (runner.iter + 1) % self.interval != 0: 74 | return 75 | momentum = self.get_momentum(runner) 76 | for name, parameter in self.model_parameters.items(): 77 | # exclude num_tracking 78 | if parameter.dtype.is_floating_point: 79 | buffer_name = self.param_ema_buffer[name] 80 | buffer_parameter = self.model_buffers[buffer_name] 81 | buffer_parameter.mul_(1 - momentum).add_( 82 | parameter.data, alpha=momentum) 83 | 84 | def after_train_epoch(self, runner): 85 | """We load parameter values from ema backup to model before the 86 | EvalHook.""" 87 | self._swap_ema_parameters() 88 | 89 | def before_train_epoch(self, runner): 90 | """We recover model's parameter from ema backup after last epoch's 91 | EvalHook.""" 92 | self._swap_ema_parameters() 93 | 94 | def _swap_ema_parameters(self): 95 | """Swap the parameter of model with parameter in ema_buffer.""" 96 | for name, value in self.model_parameters.items(): 97 | temp = value.data.clone() 98 | ema_buffer = self.model_buffers[self.param_ema_buffer[name]] 99 | value.data.copy_(ema_buffer.data) 100 | ema_buffer.data.copy_(temp) 101 | 102 | 103 | @HOOKS.register_module() 104 | class LExpMomentumEMAHook(BaseEMAHook): 105 | """EMAHook using exponential momentum strategy. 106 | 107 | Args: 108 | total_iter (int): The total number of iterations of EMA momentum. 109 | Defaults to 2000. 110 | """ 111 | 112 | def __init__(self, total_iter=2000, **kwargs): 113 | super(LExpMomentumEMAHook, self).__init__(**kwargs) 114 | self.momentum_fun = lambda x: (1 - self.momentum) * math.exp(-( 115 | 1 + x) / total_iter) + self.momentum 116 | -------------------------------------------------------------------------------- /datasets/__init__.py: -------------------------------------------------------------------------------- 1 | from .coco_split_dataset import CocoSplitDataset 2 | from .objects365_split_dataset import Objects365SplitDataset 3 | from .uvo_dataset import UVODataset 4 | 5 | from .pipelines import CopyPaste -------------------------------------------------------------------------------- /datasets/coco.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from mmcv.utils import print_log 3 | from pycocotools.coco import COCO 4 | 5 | 6 | from mmdet.datasets.coco import CocoDataset 7 | 8 | class CocoAnnDataset(CocoDataset): 9 | 10 | def __init__(self, 11 | **kwargs): 12 | # We convert all category IDs into 1 for the class-agnostic training and 13 | # evaluation. We train on train_class and evaluate on eval_class split. 
14 | super(CocoAnnDataset, self).__init__(**kwargs) 15 | self.dataset_stat() 16 | 17 | CLASSES = ('person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 18 | 'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 19 | 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 20 | 'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 21 | 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee', 22 | 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 23 | 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 24 | 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 25 | 'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot', 26 | 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 27 | 'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 28 | 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 29 | 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 30 | 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush') 31 | 32 | def dataset_stat(self): 33 | num_images = len(self) 34 | num_instances = 0 35 | for i in range(num_images): 36 | ann = self.get_ann_info(i) 37 | num_bbox = ann['bboxes'].shape[0] 38 | num_instances += num_bbox 39 | print(f'Dataset images number: {num_images}') 40 | print(f'Dataset instances number: {num_instances}') 41 | 42 | def load_annotations(self, ann_file): 43 | """Load annotation from COCO style annotation file. 44 | 45 | Args: 46 | ann_file (str): Path of annotation file. 47 | 48 | Returns: 49 | list[dict]: Annotation info from COCO api. 50 | """ 51 | 52 | self.coco = COCO(ann_file) 53 | # The order of returned `cat_ids` will not 54 | # change with the order of the CLASSES 55 | self.cat_ids = self.coco.get_cat_ids(cat_names=self.CLASSES) 56 | 57 | self.cat2label = {cat_id: i for i, cat_id in enumerate(self.cat_ids)} 58 | self.img_ids = self.coco.get_img_ids() 59 | data_infos = [] 60 | total_ann_ids = [] 61 | for i in self.img_ids: 62 | info = self.coco.load_imgs([i])[0] 63 | info['filename'] = info['file_name'] 64 | data_infos.append(info) 65 | ann_ids = self.coco.get_ann_ids(img_ids=[i]) 66 | total_ann_ids.extend(ann_ids) 67 | assert len(set(total_ann_ids)) == len( 68 | total_ann_ids), f"Annotation ids in '{ann_file}' are not unique!" 69 | return data_infos 70 | 71 | def _filter_imgs(self, min_size=0): 72 | """Filter images too small or without ground truths.""" 73 | valid_inds = [] 74 | # obtain images that contain annotation 75 | ids_with_ann = set(_['image_id'] for _ in self.coco.anns.values()) 76 | # obtain images that contain annotations of the required categories 77 | ids_in_cat = set() 78 | for i, class_id in enumerate(self.cat_ids): 79 | ids_in_cat |= set(self.coco.cat_img_map[class_id]) 80 | # merge the image id sets of the two conditions and use the merged set 81 | # to filter out images if self.filter_empty_gt=True 82 | ids_in_cat &= ids_with_ann 83 | 84 | valid_img_ids = [] 85 | for i, img_info in enumerate(self.data_infos): 86 | img_id = self.img_ids[i] 87 | if self.filter_empty_gt and img_id not in ids_in_cat: 88 | continue 89 | if min(img_info['width'], img_info['height']) >= min_size: 90 | valid_inds.append(i) 91 | valid_img_ids.append(img_id) 92 | self.img_ids = valid_img_ids 93 | return valid_inds 94 | 95 | def get_ann_info(self, idx): 96 | """Get COCO annotation by index. 97 | 98 | Args: 99 | idx (int): Index of data. 100 | 101 | Returns: 102 | dict: Annotation info of specified index. 
103 | """ 104 | 105 | img_id = self.data_infos[idx]['id'] 106 | ann_ids = self.coco.get_ann_ids(img_ids=[img_id]) 107 | ann_info = self.coco.load_anns(ann_ids) 108 | return self._parse_ann_info(self.data_infos[idx], ann_info) 109 | 110 | def _parse_ann_info(self, img_info, ann_info): 111 | """Parse bbox and mask annotation. 112 | 113 | Args: 114 | ann_info (list[dict]): Annotation info of an image. 115 | with_mask (bool): Whether to parse mask annotations. 116 | 117 | Returns: 118 | dict: A dict containing the following keys: bboxes, bboxes_ignore,\ 119 | labels, masks, seg_map. "masks" are raw annotations and not \ 120 | decoded into binary masks. 121 | """ 122 | gt_bboxes = [] 123 | gt_labels = [] 124 | gt_bboxes_ignore = [] 125 | gt_masks_ann = [] 126 | for i, ann in enumerate(ann_info): 127 | if ann.get('ignore', False): 128 | continue 129 | x1, y1, w, h = ann['bbox'] 130 | inter_w = max(0, min(x1 + w, img_info['width']) - max(x1, 0)) 131 | inter_h = max(0, min(y1 + h, img_info['height']) - max(y1, 0)) 132 | if inter_w * inter_h == 0: 133 | continue 134 | if ann['area'] <= 0 or w < 1 or h < 1: 135 | continue 136 | if ann['category_id'] not in self.cat_ids: 137 | continue 138 | bbox = [x1, y1, x1 + w, y1 + h] 139 | if ann.get('iscrowd', False): 140 | gt_bboxes_ignore.append(bbox) 141 | else: 142 | gt_bboxes.append(bbox) 143 | gt_labels.append(self.cat2label[ann['category_id']]) 144 | gt_masks_ann.append(ann.get('segmentation', None)) 145 | 146 | if gt_bboxes: 147 | gt_bboxes = np.array(gt_bboxes, dtype=np.float32) 148 | gt_labels = np.array(gt_labels, dtype=np.int64) 149 | else: 150 | gt_bboxes = np.zeros((0, 4), dtype=np.float32) 151 | gt_labels = np.array([], dtype=np.int64) 152 | 153 | if gt_bboxes_ignore: 154 | gt_bboxes_ignore = np.array(gt_bboxes_ignore, dtype=np.float32) 155 | else: 156 | gt_bboxes_ignore = np.zeros((0, 4), dtype=np.float32) 157 | 158 | seg_map = img_info['filename'].replace('jpg', 'png') 159 | 160 | ann = dict( 161 | bboxes=gt_bboxes, 162 | labels=gt_labels, 163 | bboxes_ignore=gt_bboxes_ignore, 164 | masks=gt_masks_ann, 165 | seg_map=seg_map) 166 | 167 | return ann 168 | 169 | if __name__=='__main__': 170 | from mmdet.core.utils import mask2ndarray 171 | train_pipeline = [ 172 | dict(type='LoadImageFromFile'), 173 | dict(type='LoadAnnotations', with_bbox=True, with_mask=True), 174 | dict(type='RandomFlip', flip_ratio=0.0), 175 | dict(type='DefaultFormatBundle'), 176 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']), 177 | ] 178 | ann_file = "/horizon-bucket/aidi_public_data/coco/origin/annotations/instances_val2017.json" 179 | img_prefix = "/horizon-bucket/aidi_public_data/coco/origin/val2017/" 180 | coco = CocoAnnDataset(pipeline=train_pipeline, ann_file=ann_file, img_prefix=img_prefix, test_mode=False) 181 | res = {} 182 | for item in coco: 183 | file_name = item['img_metas'].data['filename'] 184 | file_name = '/'.join(file_name.split('/')[-2]) 185 | masks = mask2ndarray(item['gt_masks']) 186 | res[file_name] = masks 187 | with open('save_ann.pkl', 'wb') as f: 188 | pickle.dump(res, f) 189 | 190 | -------------------------------------------------------------------------------- /datasets/pipelines/__init__.py: -------------------------------------------------------------------------------- 1 | from .copypaste import CopyPaste -------------------------------------------------------------------------------- /datasets/pipelines/copypaste.py: -------------------------------------------------------------------------------- 1 | import copy 2 
| import inspect 3 | import math 4 | import warnings 5 | 6 | import cv2 7 | import mmcv 8 | import numpy as np 9 | from numpy import random 10 | 11 | from mmdet.core import BitmapMasks, PolygonMasks, find_inside_bboxes 12 | from mmdet.core.evaluation.bbox_overlaps import bbox_overlaps 13 | from mmdet.utils import log_img_scale 14 | from mmdet.datasets.builder import PIPELINES 15 | 16 | try: 17 | from imagecorruptions import corrupt 18 | except ImportError: 19 | corrupt = None 20 | 21 | try: 22 | import albumentations 23 | from albumentations import Compose 24 | except ImportError: 25 | albumentations = None 26 | Compose = None 27 | 28 | @PIPELINES.register_module() 29 | class CopyPaste: 30 | """Simple Copy-Paste is a Strong Data Augmentation Method for Instance 31 | Segmentation The simple copy-paste transform steps are as follows: 32 | 1. The destination image is already resized with aspect ratio kept, 33 | cropped and padded. 34 | 2. Randomly select a source image, which is also already resized 35 | with aspect ratio kept, cropped and padded in a similar way 36 | as the destination image. 37 | 3. Randomly select some objects from the source image. 38 | 4. Paste these source objects to the destination image directly, 39 | due to the source and destination image have the same size. 40 | 5. Update object masks of the destination image, for some origin objects 41 | may be occluded. 42 | 6. Generate bboxes from the updated destination masks and 43 | filter some objects which are totally occluded, and adjust bboxes 44 | which are partly occluded. 45 | 7. Append selected source bboxes, masks, and labels. 46 | Args: 47 | max_num_pasted (int): The maximum number of pasted objects. 48 | Default: 100. 49 | bbox_occluded_thr (int): The threshold of occluded bbox. 50 | Default: 10. 51 | mask_occluded_thr (int): The threshold of occluded mask. 52 | Default: 300. 53 | selected (bool): Whether select objects or not. If select is False, 54 | all objects of the source image will be pasted to the 55 | destination image. 56 | Default: True. 57 | """ 58 | 59 | def __init__( 60 | self, 61 | max_num_pasted=100, 62 | bbox_occluded_thr=10, 63 | mask_occluded_thr=300, 64 | selected=True, 65 | ): 66 | self.max_num_pasted = max_num_pasted 67 | self.bbox_occluded_thr = bbox_occluded_thr 68 | self.mask_occluded_thr = mask_occluded_thr 69 | self.selected = selected 70 | 71 | def get_indexes(self, dataset): 72 | """Call function to collect indexes.s. 73 | Args: 74 | dataset (:obj:`MultiImageMixDataset`): The dataset. 75 | Returns: 76 | list: Indexes. 77 | """ 78 | return random.randint(0, len(dataset)) 79 | 80 | def __call__(self, results): 81 | """Call function to make a copy-paste of image. 82 | Args: 83 | results (dict): Result dict. 84 | Returns: 85 | dict: Result dict with copy-paste transformed. 
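Expects exactly one source image in ``results['mix_results']`` (typically supplied by ``MultiImageMixDataset``); its selected objects are pasted onto the destination image carried in ``results``.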
86 | """ 87 | 88 | assert 'mix_results' in results 89 | num_images = len(results['mix_results']) 90 | assert num_images == 1, \ 91 | f'CopyPaste only supports processing 2 images, got {num_images}' 92 | if self.selected: 93 | selected_results = self._select_object(results['mix_results'][0]) 94 | else: 95 | selected_results = results['mix_results'][0] 96 | return self._copy_paste(results, selected_results) 97 | 98 | def _select_object(self, results): 99 | """Select some objects from the source results.""" 100 | bboxes = results['gt_bboxes'] 101 | labels = results['gt_labels'] 102 | masks = results['gt_masks'] 103 | max_num_pasted = min(bboxes.shape[0] + 1, self.max_num_pasted) 104 | num_pasted = np.random.randint(0, max_num_pasted) 105 | selected_inds = np.random.choice( 106 | bboxes.shape[0], size=num_pasted, replace=False) 107 | 108 | selected_bboxes = bboxes[selected_inds] 109 | selected_labels = labels[selected_inds] 110 | selected_masks = masks[selected_inds] 111 | 112 | results['gt_bboxes'] = selected_bboxes 113 | results['gt_labels'] = selected_labels 114 | results['gt_masks'] = selected_masks 115 | return results 116 | 117 | def _copy_paste(self, dst_results, src_results): 118 | """CopyPaste transform function. 119 | Args: 120 | dst_results (dict): Result dict of the destination image. 121 | src_results (dict): Result dict of the source image. 122 | Returns: 123 | dict: Updated result dict. 124 | """ 125 | dst_img = dst_results['img'] 126 | dst_bboxes = dst_results['gt_bboxes'] 127 | dst_labels = dst_results['gt_labels'] 128 | dst_masks = dst_results['gt_masks'] 129 | 130 | src_img = src_results['img'] 131 | src_bboxes = src_results['gt_bboxes'] 132 | src_labels = src_results['gt_labels'] 133 | src_masks = src_results['gt_masks'] 134 | 135 | if len(src_bboxes) == 0: 136 | return dst_results 137 | 138 | # update masks and generate bboxes from updated masks 139 | composed_mask = np.where(np.any(src_masks.masks, axis=0), 1, 0) 140 | updated_dst_masks = self.get_updated_masks(dst_masks, composed_mask) 141 | updated_dst_bboxes = updated_dst_masks.get_bboxes() 142 | assert len(updated_dst_bboxes) == len(updated_dst_masks) 143 | 144 | # filter totally occluded objects 145 | bboxes_inds = np.all( 146 | np.abs( 147 | (updated_dst_bboxes - dst_bboxes)) <= self.bbox_occluded_thr, 148 | axis=-1) 149 | masks_inds = updated_dst_masks.masks.sum( 150 | axis=(1, 2)) > self.mask_occluded_thr 151 | valid_inds = bboxes_inds | masks_inds 152 | 153 | # Paste source objects to destination image directly 154 | img = dst_img * (1 - composed_mask[..., np.newaxis] 155 | ) + src_img * composed_mask[..., np.newaxis] 156 | bboxes = np.concatenate([updated_dst_bboxes[valid_inds], src_bboxes]) 157 | labels = np.concatenate([dst_labels[valid_inds], src_labels]) 158 | masks = np.concatenate( 159 | [updated_dst_masks.masks[valid_inds], src_masks.masks]) 160 | 161 | dst_results['img'] = img 162 | dst_results['gt_bboxes'] = bboxes 163 | dst_results['gt_labels'] = labels 164 | dst_results['gt_masks'] = BitmapMasks(masks, masks.shape[1], 165 | masks.shape[2]) 166 | 167 | return dst_results 168 | 169 | def get_updated_masks(self, masks, composed_mask): 170 | assert masks.masks.shape[-2:] == composed_mask.shape[-2:], \ 171 | 'Cannot compare two arrays of different size' 172 | masks.masks = np.where(composed_mask, 0, masks.masks) 173 | return masks 174 | 175 | def __repr__(self): 176 | repr_str = self.__class__.__name__ 177 | repr_str += f'max_num_pasted={self.max_num_pasted}, ' 178 | repr_str += 
f'bbox_occluded_thr={self.bbox_occluded_thr}, ' 179 | repr_str += f'mask_occluded_thr={self.mask_occluded_thr}, ' 180 | repr_str += f'selected={self.selected}, ' 181 | return repr_str -------------------------------------------------------------------------------- /models/__init__.py: -------------------------------------------------------------------------------- 1 | from .roi_heads import * 2 | from .necks import * -------------------------------------------------------------------------------- /models/necks/__init__.py: -------------------------------------------------------------------------------- 1 | from .bifpn import BiFPN -------------------------------------------------------------------------------- /models/roi_heads/__init__.py: -------------------------------------------------------------------------------- 1 | from .sparse_score_roi_head import * 2 | from .bbox_heads import * 3 | from .mask_heads import * -------------------------------------------------------------------------------- /models/roi_heads/bbox_heads/__init__.py: -------------------------------------------------------------------------------- 1 | from .dii_score_head import DIIScoreHead -------------------------------------------------------------------------------- /models/roi_heads/mask_heads/__init__.py: -------------------------------------------------------------------------------- 1 | from .dynamic_mask_head import DynamicMaskIoUHead 2 | from .maskiou_head import SparseMaskIoUHead -------------------------------------------------------------------------------- /models/roi_heads/mask_heads/dynamic_mask_head.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 2 | import torch 3 | import torch.nn as nn 4 | from mmcv.runner import auto_fp16, force_fp32 5 | 6 | from mmdet.core import mask_target 7 | from mmdet.models.builder import HEADS 8 | from mmdet.models.dense_heads.atss_head import reduce_mean 9 | from mmdet.models.utils import build_transformer 10 | from mmdet.models.roi_heads.mask_heads.fcn_mask_head import FCNMaskHead 11 | 12 | 13 | @HEADS.register_module() 14 | class DynamicMaskIoUHead(FCNMaskHead): 15 | r"""Dynamic Mask Head for 16 | `Instances as Queries `_ 17 | Args: 18 | num_convs (int): Number of convolution layer. 19 | Defaults to 4. 20 | roi_feat_size (int): The output size of RoI extractor, 21 | Defaults to 14. 22 | in_channels (int): Input feature channels. 23 | Defaults to 256. 24 | conv_kernel_size (int): Kernel size of convolution layers. 25 | Defaults to 3. 26 | conv_out_channels (int): Output channels of convolution layers. 27 | Defaults to 256. 28 | num_classes (int): Number of classes. 29 | Defaults to 80 30 | class_agnostic (int): Whether generate class agnostic prediction. 31 | Defaults to False. 32 | dropout (float): Probability of drop the channel. 33 | Defaults to 0.0 34 | upsample_cfg (dict): The config for upsample layer. 35 | conv_cfg (dict): The convolution layer config. 36 | norm_cfg (dict): The norm layer config. 37 | dynamic_conv_cfg (dict): The dynamic convolution layer config. 38 | loss_mask (dict): The config for mask loss. 
39 | """ 40 | 41 | def __init__(self, 42 | num_convs=4, 43 | roi_feat_size=14, 44 | in_channels=256, 45 | conv_kernel_size=3, 46 | conv_out_channels=256, 47 | num_classes=80, 48 | class_agnostic=False, 49 | upsample_cfg=dict(type='deconv', scale_factor=2), 50 | conv_cfg=None, 51 | norm_cfg=None, 52 | dynamic_conv_cfg=dict( 53 | type='DynamicConv', 54 | in_channels=256, 55 | feat_channels=64, 56 | out_channels=256, 57 | input_feat_shape=14, 58 | with_proj=False, 59 | act_cfg=dict(type='ReLU', inplace=True), 60 | norm_cfg=dict(type='LN')), 61 | loss_mask=dict(type='DiceLoss', loss_weight=8.0), 62 | **kwargs): 63 | super(DynamicMaskIoUHead, self).__init__( 64 | num_convs=num_convs, 65 | roi_feat_size=roi_feat_size, 66 | in_channels=in_channels, 67 | conv_kernel_size=conv_kernel_size, 68 | conv_out_channels=conv_out_channels, 69 | num_classes=num_classes, 70 | class_agnostic=class_agnostic, 71 | upsample_cfg=upsample_cfg, 72 | conv_cfg=conv_cfg, 73 | norm_cfg=norm_cfg, 74 | loss_mask=loss_mask, 75 | **kwargs) 76 | assert class_agnostic is False, \ 77 | 'DynamicMaskHead only support class_agnostic=False' 78 | self.fp16_enabled = False 79 | 80 | self.instance_interactive_conv = build_transformer(dynamic_conv_cfg) 81 | 82 | def init_weights(self): 83 | """Use xavier initialization for all weight parameter and set 84 | classification head bias as a specific value when use focal loss.""" 85 | for p in self.parameters(): 86 | if p.dim() > 1: 87 | nn.init.xavier_uniform_(p) 88 | nn.init.constant_(self.conv_logits.bias, 0.) 89 | 90 | @auto_fp16() 91 | def forward(self, roi_feat, proposal_feat): 92 | """Forward function of DynamicMaskHead. 93 | Args: 94 | roi_feat (Tensor): Roi-pooling features with shape 95 | (batch_size*num_proposals, feature_dimensions, 96 | pooling_h , pooling_w). 97 | proposal_feat (Tensor): Intermediate feature get from 98 | diihead in last stage, has shape 99 | (batch_size*num_proposals, feature_dimensions) 100 | Returns: 101 | mask_pred (Tensor): Predicted foreground masks with shape 102 | (batch_size*num_proposals, num_classes, 103 | pooling_h*2, pooling_w*2). 
104 | """ 105 | 106 | proposal_feat = proposal_feat.reshape(-1, self.in_channels) 107 | proposal_feat_iic = self.instance_interactive_conv( 108 | proposal_feat, roi_feat) 109 | 110 | x = proposal_feat_iic.permute(0, 2, 1).reshape(roi_feat.size()) 111 | x_feat = x 112 | 113 | for conv in self.convs: 114 | x = conv(x) 115 | if self.upsample is not None: 116 | x = self.upsample(x) 117 | if self.upsample_method == 'deconv': 118 | x = self.relu(x) 119 | mask_pred = self.conv_logits(x) 120 | return mask_pred, x_feat 121 | 122 | @force_fp32(apply_to=('mask_pred', )) 123 | def loss(self, mask_pred, mask_targets, labels): 124 | num_pos = labels.new_ones(labels.size()).float().sum() 125 | avg_factor = torch.clamp(reduce_mean(num_pos), min=1.).item() 126 | loss = dict() 127 | if mask_pred.size(0) == 0: 128 | loss_mask = mask_pred.sum() 129 | else: 130 | loss_mask = self.loss_mask( 131 | mask_pred[torch.arange(num_pos).long(), labels, ...].sigmoid(), 132 | mask_targets, 133 | avg_factor=avg_factor) 134 | loss['loss_mask'] = loss_mask 135 | return loss 136 | 137 | def get_targets(self, sampling_results, gt_masks, rcnn_train_cfg): 138 | 139 | pos_proposals = [res.pos_bboxes for res in sampling_results] 140 | pos_assigned_gt_inds = [ 141 | res.pos_assigned_gt_inds for res in sampling_results 142 | ] 143 | mask_targets = mask_target(pos_proposals, pos_assigned_gt_inds, 144 | gt_masks, rcnn_train_cfg) 145 | return mask_targets -------------------------------------------------------------------------------- /tools/analysis_tools/analyze_logs.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 2 | import argparse 3 | import json 4 | from collections import defaultdict 5 | 6 | import matplotlib.pyplot as plt 7 | import numpy as np 8 | import seaborn as sns 9 | 10 | 11 | def cal_train_time(log_dicts, args): 12 | for i, log_dict in enumerate(log_dicts): 13 | print(f'{"-" * 5}Analyze train time of {args.json_logs[i]}{"-" * 5}') 14 | all_times = [] 15 | for epoch in log_dict.keys(): 16 | if args.include_outliers: 17 | all_times.append(log_dict[epoch]['time']) 18 | else: 19 | all_times.append(log_dict[epoch]['time'][1:]) 20 | if not all_times: 21 | raise KeyError( 22 | 'Please reduce the log interval in the config so that' 23 | 'interval is less than iterations of one epoch.') 24 | all_times = np.array(all_times) 25 | epoch_ave_time = all_times.mean(-1) 26 | slowest_epoch = epoch_ave_time.argmax() 27 | fastest_epoch = epoch_ave_time.argmin() 28 | std_over_epoch = epoch_ave_time.std() 29 | print(f'slowest epoch {slowest_epoch + 1}, ' 30 | f'average time is {epoch_ave_time[slowest_epoch]:.4f}') 31 | print(f'fastest epoch {fastest_epoch + 1}, ' 32 | f'average time is {epoch_ave_time[fastest_epoch]:.4f}') 33 | print(f'time std over epochs is {std_over_epoch:.4f}') 34 | print(f'average iter time: {np.mean(all_times):.4f} s/iter') 35 | print() 36 | 37 | 38 | def plot_curve(log_dicts, args): 39 | if args.backend is not None: 40 | plt.switch_backend(args.backend) 41 | sns.set_style(args.style) 42 | # if legend is None, use {filename}_{key} as legend 43 | legend = args.legend 44 | if legend is None: 45 | legend = [] 46 | for json_log in args.json_logs: 47 | for metric in args.keys: 48 | legend.append(f'{json_log}_{metric}') 49 | assert len(legend) == (len(args.json_logs) * len(args.keys)) 50 | metrics = args.keys 51 | 52 | num_metrics = len(metrics) 53 | for i, log_dict in enumerate(log_dicts): 54 | epochs = list(log_dict.keys()) 55 | for 
j, metric in enumerate(metrics): 56 | print(f'plot curve of {args.json_logs[i]}, metric is {metric}') 57 | if metric not in log_dict[epochs[int(args.start_epoch) - 1]]: 58 | if 'mAP' in metric: 59 | raise KeyError( 60 | f'{args.json_logs[i]} does not contain metric ' 61 | f'{metric}. Please check if "--no-validate" is ' 62 | 'specified when you trained the model.') 63 | raise KeyError( 64 | f'{args.json_logs[i]} does not contain metric {metric}. ' 65 | 'Please reduce the log interval in the config so that ' 66 | 'interval is less than iterations of one epoch.') 67 | 68 | if 'mAP' in metric: 69 | xs = np.arange( 70 | int(args.start_epoch), 71 | max(epochs) + 1, int(args.eval_interval)) 72 | ys = [] 73 | for epoch in epochs: 74 | ys += log_dict[epoch][metric] 75 | ax = plt.gca() 76 | ax.set_xticks(xs) 77 | plt.xlabel('epoch') 78 | plt.plot(xs, ys, label=legend[i * num_metrics + j], marker='o') 79 | else: 80 | xs = [] 81 | ys = [] 82 | num_iters_per_epoch = log_dict[epochs[0]]['iter'][-2] 83 | for epoch in epochs: 84 | iters = log_dict[epoch]['iter'] 85 | if log_dict[epoch]['mode'][-1] == 'val': 86 | iters = iters[:-1] 87 | xs.append( 88 | np.array(iters) + (epoch - 1) * num_iters_per_epoch) 89 | ys.append(np.array(log_dict[epoch][metric][:len(iters)])) 90 | xs = np.concatenate(xs) 91 | ys = np.concatenate(ys) 92 | plt.xlabel('iter') 93 | plt.plot( 94 | xs, ys, label=legend[i * num_metrics + j], linewidth=0.5) 95 | plt.legend() 96 | if args.title is not None: 97 | plt.title(args.title) 98 | if args.out is None: 99 | plt.show() 100 | else: 101 | print(f'save curve to: {args.out}') 102 | plt.savefig(args.out) 103 | plt.cla() 104 | 105 | 106 | def add_plot_parser(subparsers): 107 | parser_plt = subparsers.add_parser( 108 | 'plot_curve', help='parser for plotting curves') 109 | parser_plt.add_argument( 110 | 'json_logs', 111 | type=str, 112 | nargs='+', 113 | help='path of train log in json format') 114 | parser_plt.add_argument( 115 | '--keys', 116 | type=str, 117 | nargs='+', 118 | default=['bbox_mAP'], 119 | help='the metric that you want to plot') 120 | parser_plt.add_argument( 121 | '--start-epoch', 122 | type=str, 123 | default='1', 124 | help='the epoch that you want to start') 125 | parser_plt.add_argument( 126 | '--eval-interval', 127 | type=str, 128 | default='1', 129 | help='the eval interval when training') 130 | parser_plt.add_argument('--title', type=str, help='title of figure') 131 | parser_plt.add_argument( 132 | '--legend', 133 | type=str, 134 | nargs='+', 135 | default=None, 136 | help='legend of each plot') 137 | parser_plt.add_argument( 138 | '--backend', type=str, default=None, help='backend of plt') 139 | parser_plt.add_argument( 140 | '--style', type=str, default='dark', help='style of plt') 141 | parser_plt.add_argument('--out', type=str, default=None) 142 | 143 | 144 | def add_time_parser(subparsers): 145 | parser_time = subparsers.add_parser( 146 | 'cal_train_time', 147 | help='parser for computing the average time per training iteration') 148 | parser_time.add_argument( 149 | 'json_logs', 150 | type=str, 151 | nargs='+', 152 | help='path of train log in json format') 153 | parser_time.add_argument( 154 | '--include-outliers', 155 | action='store_true', 156 | help='include the first value of every epoch when computing ' 157 | 'the average time') 158 | 159 | 160 | def parse_args(): 161 | parser = argparse.ArgumentParser(description='Analyze Json Log') 162 | # currently only support plot curve and calculate average train time 163 | subparsers = 
parser.add_subparsers(dest='task', help='task parser') 164 | add_plot_parser(subparsers) 165 | add_time_parser(subparsers) 166 | args = parser.parse_args() 167 | return args 168 | 169 | 170 | def load_json_logs(json_logs): 171 | # load and convert json_logs to log_dict, key is epoch, value is a sub dict 172 | # keys of sub dict is different metrics, e.g. memory, bbox_mAP 173 | # value of sub dict is a list of corresponding values of all iterations 174 | log_dicts = [dict() for _ in json_logs] 175 | for json_log, log_dict in zip(json_logs, log_dicts): 176 | with open(json_log, 'r') as log_file: 177 | for line in log_file: 178 | log = json.loads(line.strip()) 179 | # skip lines without `epoch` field 180 | if 'epoch' not in log: 181 | continue 182 | epoch = log.pop('epoch') 183 | if epoch not in log_dict: 184 | log_dict[epoch] = defaultdict(list) 185 | for k, v in log.items(): 186 | log_dict[epoch][k].append(v) 187 | return log_dicts 188 | 189 | 190 | def main(): 191 | args = parse_args() 192 | 193 | json_logs = args.json_logs 194 | for json_log in json_logs: 195 | assert json_log.endswith('.json') 196 | 197 | log_dicts = load_json_logs(json_logs) 198 | 199 | eval(args.task)(log_dicts, args) 200 | 201 | 202 | if __name__ == '__main__': 203 | main() 204 | -------------------------------------------------------------------------------- /tools/analysis_tools/analyze_results.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 2 | import argparse 3 | import os.path as osp 4 | 5 | import mmcv 6 | import numpy as np 7 | from mmcv import Config, DictAction 8 | 9 | from mmdet.core.evaluation import eval_map 10 | from mmdet.core.visualization import imshow_gt_det_bboxes 11 | from mmdet.datasets import build_dataset, get_loading_pipeline 12 | from mmdet.utils import update_data_root 13 | 14 | 15 | def bbox_map_eval(det_result, annotation): 16 | """Evaluate mAP of single image det result. 17 | 18 | Args: 19 | det_result (list[list]): [[cls1_det, cls2_det, ...], ...]. 20 | The outer list indicates images, and the inner list indicates 21 | per-class detected bboxes. 22 | annotation (dict): Ground truth annotations where keys of 23 | annotations are: 24 | 25 | - bboxes: numpy array of shape (n, 4) 26 | - labels: numpy array of shape (n, ) 27 | - bboxes_ignore (optional): numpy array of shape (k, 4) 28 | - labels_ignore (optional): numpy array of shape (k, ) 29 | 30 | Returns: 31 | float: mAP 32 | """ 33 | 34 | # use only bbox det result 35 | if isinstance(det_result, tuple): 36 | bbox_det_result = [det_result[0]] 37 | else: 38 | bbox_det_result = [det_result] 39 | # mAP 40 | iou_thrs = np.linspace( 41 | .5, 0.95, int(np.round((0.95 - .5) / .05)) + 1, endpoint=True) 42 | mean_aps = [] 43 | for thr in iou_thrs: 44 | mean_ap, _ = eval_map( 45 | bbox_det_result, [annotation], iou_thr=thr, logger='silent') 46 | mean_aps.append(mean_ap) 47 | return sum(mean_aps) / len(mean_aps) 48 | 49 | 50 | class ResultVisualizer: 51 | """Display and save evaluation results. 52 | 53 | Args: 54 | show (bool): Whether to show the image. Default: True 55 | wait_time (float): Value of waitKey param. Default: 0. 56 | score_thr (float): Minimum score of bboxes to be shown. 
57 | Default: 0 58 | """ 59 | 60 | def __init__(self, show=False, wait_time=0, score_thr=0): 61 | self.show = show 62 | self.wait_time = wait_time 63 | self.score_thr = score_thr 64 | 65 | def _save_image_gts_results(self, dataset, results, mAPs, out_dir=None): 66 | mmcv.mkdir_or_exist(out_dir) 67 | 68 | for mAP_info in mAPs: 69 | index, mAP = mAP_info 70 | data_info = dataset.prepare_train_img(index) 71 | 72 | # calc save file path 73 | filename = data_info['filename'] 74 | if data_info['img_prefix'] is not None: 75 | filename = osp.join(data_info['img_prefix'], filename) 76 | else: 77 | filename = data_info['filename'] 78 | fname, name = osp.splitext(osp.basename(filename)) 79 | save_filename = fname + '_' + str(round(mAP, 3)) + name 80 | out_file = osp.join(out_dir, save_filename) 81 | imshow_gt_det_bboxes( 82 | data_info['img'], 83 | data_info, 84 | results[index], 85 | dataset.CLASSES, 86 | gt_bbox_color=dataset.PALETTE, 87 | gt_text_color=(200, 200, 200), 88 | gt_mask_color=dataset.PALETTE, 89 | det_bbox_color=dataset.PALETTE, 90 | det_text_color=(200, 200, 200), 91 | det_mask_color=dataset.PALETTE, 92 | show=self.show, 93 | score_thr=self.score_thr, 94 | wait_time=self.wait_time, 95 | out_file=out_file) 96 | 97 | def evaluate_and_show(self, 98 | dataset, 99 | results, 100 | topk=20, 101 | show_dir='work_dir', 102 | eval_fn=None): 103 | """Evaluate and show results. 104 | 105 | Args: 106 | dataset (Dataset): A PyTorch dataset. 107 | results (list): Det results from test results pkl file 108 | topk (int): Number of the highest topk and 109 | lowest topk after evaluation index sorting. Default: 20 110 | show_dir (str, optional): The filename to write the image. 111 | Default: 'work_dir' 112 | eval_fn (callable, optional): Eval function, Default: None 113 | """ 114 | 115 | assert topk > 0 116 | if (topk * 2) > len(dataset): 117 | topk = len(dataset) // 2 118 | 119 | if eval_fn is None: 120 | eval_fn = bbox_map_eval 121 | else: 122 | assert callable(eval_fn) 123 | 124 | prog_bar = mmcv.ProgressBar(len(results)) 125 | _mAPs = {} 126 | for i, (result, ) in enumerate(zip(results)): 127 | # self.dataset[i] should not call directly 128 | # because there is a risk of mismatch 129 | data_info = dataset.prepare_train_img(i) 130 | mAP = eval_fn(result, data_info['ann_info']) 131 | _mAPs[i] = mAP 132 | prog_bar.update() 133 | 134 | # descending select topk image 135 | _mAPs = list(sorted(_mAPs.items(), key=lambda kv: kv[1])) 136 | good_mAPs = _mAPs[-topk:] 137 | bad_mAPs = _mAPs[:topk] 138 | 139 | good_dir = osp.abspath(osp.join(show_dir, 'good')) 140 | bad_dir = osp.abspath(osp.join(show_dir, 'bad')) 141 | self._save_image_gts_results(dataset, results, good_mAPs, good_dir) 142 | self._save_image_gts_results(dataset, results, bad_mAPs, bad_dir) 143 | 144 | 145 | def parse_args(): 146 | parser = argparse.ArgumentParser( 147 | description='MMDet eval image prediction result for each') 148 | parser.add_argument('config', help='test config file path') 149 | parser.add_argument( 150 | 'prediction_path', help='prediction path where test pkl result') 151 | parser.add_argument( 152 | 'show_dir', help='directory where painted images will be saved') 153 | parser.add_argument('--show', action='store_true', help='show results') 154 | parser.add_argument( 155 | '--wait-time', 156 | type=float, 157 | default=0, 158 | help='the interval of show (s), 0 is block') 159 | parser.add_argument( 160 | '--topk', 161 | default=20, 162 | type=int, 163 | help='saved Number of the highest topk ' 164 | 'and lowest topk after 
index sorting') 165 | parser.add_argument( 166 | '--show-score-thr', 167 | type=float, 168 | default=0, 169 | help='score threshold (default: 0.)') 170 | parser.add_argument( 171 | '--cfg-options', 172 | nargs='+', 173 | action=DictAction, 174 | help='override some settings in the used config, the key-value pair ' 175 | 'in xxx=yyy format will be merged into config file. If the value to ' 176 | 'be overwritten is a list, it should be like key="[a,b]" or key=a,b ' 177 | 'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" ' 178 | 'Note that the quotation marks are necessary and that no white space ' 179 | 'is allowed.') 180 | args = parser.parse_args() 181 | return args 182 | 183 | 184 | def main(): 185 | args = parse_args() 186 | 187 | mmcv.check_file_exist(args.prediction_path) 188 | 189 | cfg = Config.fromfile(args.config) 190 | 191 | # update data root according to MMDET_DATASETS 192 | update_data_root(cfg) 193 | 194 | if args.cfg_options is not None: 195 | cfg.merge_from_dict(args.cfg_options) 196 | cfg.data.test.test_mode = True 197 | 198 | cfg.data.test.pop('samples_per_gpu', 0) 199 | cfg.data.test.pipeline = get_loading_pipeline(cfg.data.train.pipeline) 200 | dataset = build_dataset(cfg.data.test) 201 | outputs = mmcv.load(args.prediction_path) 202 | 203 | result_visualizer = ResultVisualizer(args.show, args.wait_time, 204 | args.show_score_thr) 205 | result_visualizer.evaluate_and_show( 206 | dataset, outputs, topk=args.topk, show_dir=args.show_dir) 207 | 208 | 209 | if __name__ == '__main__': 210 | main() 211 | -------------------------------------------------------------------------------- /tools/analysis_tools/benchmark.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 2 | import argparse 3 | import copy 4 | import os 5 | import time 6 | 7 | import torch 8 | from mmcv import Config, DictAction 9 | from mmcv.cnn import fuse_conv_bn 10 | from mmcv.parallel import MMDistributedDataParallel 11 | from mmcv.runner import init_dist, load_checkpoint, wrap_fp16_model 12 | 13 | from mmdet.datasets import (build_dataloader, build_dataset, 14 | replace_ImageToTensor) 15 | from mmdet.models import build_detector 16 | from mmdet.utils import update_data_root 17 | 18 | 19 | def parse_args(): 20 | parser = argparse.ArgumentParser(description='MMDet benchmark a model') 21 | parser.add_argument('config', help='test config file path') 22 | parser.add_argument('checkpoint', help='checkpoint file') 23 | parser.add_argument( 24 | '--repeat-num', 25 | type=int, 26 | default=1, 27 | help='number of repeat times of measurement for averaging the results') 28 | parser.add_argument( 29 | '--max-iter', type=int, default=2000, help='num of max iter') 30 | parser.add_argument( 31 | '--log-interval', type=int, default=50, help='interval of logging') 32 | parser.add_argument( 33 | '--fuse-conv-bn', 34 | action='store_true', 35 | help='Whether to fuse conv and bn, this will slightly increase' 36 | 'the inference speed') 37 | parser.add_argument( 38 | '--cfg-options', 39 | nargs='+', 40 | action=DictAction, 41 | help='override some settings in the used config, the key-value pair ' 42 | 'in xxx=yyy format will be merged into config file. If the value to ' 43 | 'be overwritten is a list, it should be like key="[a,b]" or key=a,b ' 44 | 'It also allows nested list/tuple values, e.g. 
key="[(a,b),(c,d)]" ' 45 | 'Note that the quotation marks are necessary and that no white space ' 46 | 'is allowed.') 47 | parser.add_argument( 48 | '--launcher', 49 | choices=['none', 'pytorch', 'slurm', 'mpi'], 50 | default='none', 51 | help='job launcher') 52 | parser.add_argument('--local_rank', type=int, default=0) 53 | args = parser.parse_args() 54 | if 'LOCAL_RANK' not in os.environ: 55 | os.environ['LOCAL_RANK'] = str(args.local_rank) 56 | return args 57 | 58 | 59 | def measure_inference_speed(cfg, checkpoint, max_iter, log_interval, 60 | is_fuse_conv_bn): 61 | # set cudnn_benchmark 62 | if cfg.get('cudnn_benchmark', False): 63 | torch.backends.cudnn.benchmark = True 64 | cfg.model.pretrained = None 65 | cfg.data.test.test_mode = True 66 | 67 | # build the dataloader 68 | samples_per_gpu = cfg.data.test.pop('samples_per_gpu', 1) 69 | if samples_per_gpu > 1: 70 | # Replace 'ImageToTensor' to 'DefaultFormatBundle' 71 | cfg.data.test.pipeline = replace_ImageToTensor(cfg.data.test.pipeline) 72 | dataset = build_dataset(cfg.data.test) 73 | data_loader = build_dataloader( 74 | dataset, 75 | samples_per_gpu=1, 76 | # Because multiple processes will occupy additional CPU resources, 77 | # FPS statistics will be more unstable when workers_per_gpu is not 0. 78 | # It is reasonable to set workers_per_gpu to 0. 79 | workers_per_gpu=0, 80 | dist=True, 81 | shuffle=False) 82 | 83 | # build the model and load checkpoint 84 | cfg.model.train_cfg = None 85 | model = build_detector(cfg.model, test_cfg=cfg.get('test_cfg')) 86 | fp16_cfg = cfg.get('fp16', None) 87 | if fp16_cfg is not None: 88 | wrap_fp16_model(model) 89 | load_checkpoint(model, checkpoint, map_location='cpu') 90 | if is_fuse_conv_bn: 91 | model = fuse_conv_bn(model) 92 | 93 | model = MMDistributedDataParallel( 94 | model.cuda(), 95 | device_ids=[torch.cuda.current_device()], 96 | broadcast_buffers=False) 97 | model.eval() 98 | 99 | # the first several iterations may be very slow so skip them 100 | num_warmup = 5 101 | pure_inf_time = 0 102 | fps = 0 103 | 104 | # benchmark with 2000 image and take the average 105 | for i, data in enumerate(data_loader): 106 | 107 | torch.cuda.synchronize() 108 | start_time = time.perf_counter() 109 | 110 | with torch.no_grad(): 111 | model(return_loss=False, rescale=True, **data) 112 | 113 | torch.cuda.synchronize() 114 | elapsed = time.perf_counter() - start_time 115 | 116 | if i >= num_warmup: 117 | pure_inf_time += elapsed 118 | if (i + 1) % log_interval == 0: 119 | fps = (i + 1 - num_warmup) / pure_inf_time 120 | print( 121 | f'Done image [{i + 1:<3}/ {max_iter}], ' 122 | f'fps: {fps:.1f} img / s, ' 123 | f'times per image: {1000 / fps:.1f} ms / img', 124 | flush=True) 125 | 126 | if (i + 1) == max_iter: 127 | fps = (i + 1 - num_warmup) / pure_inf_time 128 | print( 129 | f'Overall fps: {fps:.1f} img / s, ' 130 | f'times per image: {1000 / fps:.1f} ms / img', 131 | flush=True) 132 | break 133 | return fps 134 | 135 | 136 | def repeat_measure_inference_speed(cfg, 137 | checkpoint, 138 | max_iter, 139 | log_interval, 140 | is_fuse_conv_bn, 141 | repeat_num=1): 142 | assert repeat_num >= 1 143 | 144 | fps_list = [] 145 | 146 | for _ in range(repeat_num): 147 | # 148 | cp_cfg = copy.deepcopy(cfg) 149 | 150 | fps_list.append( 151 | measure_inference_speed(cp_cfg, checkpoint, max_iter, log_interval, 152 | is_fuse_conv_bn)) 153 | 154 | if repeat_num > 1: 155 | fps_list_ = [round(fps, 1) for fps in fps_list] 156 | times_pre_image_list_ = [round(1000 / fps, 1) for fps in fps_list] 157 | mean_fps_ = 
sum(fps_list_) / len(fps_list_) 158 | mean_times_pre_image_ = sum(times_pre_image_list_) / len( 159 | times_pre_image_list_) 160 | print( 161 | f'Overall fps: {fps_list_}[{mean_fps_:.1f}] img / s, ' 162 | f'times per image: ' 163 | f'{times_pre_image_list_}[{mean_times_pre_image_:.1f}] ms / img', 164 | flush=True) 165 | return fps_list 166 | 167 | return fps_list[0] 168 | 169 | 170 | def main(): 171 | args = parse_args() 172 | 173 | cfg = Config.fromfile(args.config) 174 | 175 | # update data root according to MMDET_DATASETS 176 | update_data_root(cfg) 177 | 178 | if args.cfg_options is not None: 179 | cfg.merge_from_dict(args.cfg_options) 180 | 181 | if args.launcher == 'none': 182 | raise NotImplementedError('Only supports distributed mode') 183 | else: 184 | init_dist(args.launcher, **cfg.dist_params) 185 | 186 | repeat_measure_inference_speed(cfg, args.checkpoint, args.max_iter, 187 | args.log_interval, args.fuse_conv_bn, 188 | args.repeat_num) 189 | 190 | 191 | if __name__ == '__main__': 192 | main() 193 | -------------------------------------------------------------------------------- /tools/analysis_tools/eval_metric.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 2 | import argparse 3 | 4 | import mmcv 5 | from mmcv import Config, DictAction 6 | 7 | from mmdet.datasets import build_dataset 8 | from mmdet.utils import update_data_root 9 | 10 | 11 | def parse_args(): 12 | parser = argparse.ArgumentParser(description='Evaluate metric of the ' 13 | 'results saved in pkl format') 14 | parser.add_argument('config', help='Config of the model') 15 | parser.add_argument('pkl_results', help='Results in pickle format') 16 | parser.add_argument( 17 | '--format-only', 18 | action='store_true', 19 | help='Format the output results without perform evaluation. It is' 20 | 'useful when you want to format the result to a specific format and ' 21 | 'submit it to the test server') 22 | parser.add_argument( 23 | '--eval', 24 | type=str, 25 | nargs='+', 26 | help='Evaluation metrics, which depends on the dataset, e.g., "bbox",' 27 | ' "segm", "proposal" for COCO, and "mAP", "recall" for PASCAL VOC') 28 | parser.add_argument( 29 | '--cfg-options', 30 | nargs='+', 31 | action=DictAction, 32 | help='override some settings in the used config, the key-value pair ' 33 | 'in xxx=yyy format will be merged into config file. If the value to ' 34 | 'be overwritten is a list, it should be like key="[a,b]" or key=a,b ' 35 | 'It also allows nested list/tuple values, e.g. 
key="[(a,b),(c,d)]" ' 36 | 'Note that the quotation marks are necessary and that no white space ' 37 | 'is allowed.') 38 | parser.add_argument( 39 | '--eval-options', 40 | nargs='+', 41 | action=DictAction, 42 | help='custom options for evaluation, the key-value pair in xxx=yyy ' 43 | 'format will be kwargs for dataset.evaluate() function') 44 | args = parser.parse_args() 45 | return args 46 | 47 | 48 | def main(): 49 | args = parse_args() 50 | 51 | cfg = Config.fromfile(args.config) 52 | 53 | # update data root according to MMDET_DATASETS 54 | update_data_root(cfg) 55 | 56 | assert args.eval or args.format_only, ( 57 | 'Please specify at least one operation (eval/format the results) with ' 58 | 'the argument "--eval", "--format-only"') 59 | if args.eval and args.format_only: 60 | raise ValueError('--eval and --format_only cannot be both specified') 61 | 62 | if args.cfg_options is not None: 63 | cfg.merge_from_dict(args.cfg_options) 64 | cfg.data.test.test_mode = True 65 | 66 | dataset = build_dataset(cfg.data.test) 67 | outputs = mmcv.load(args.pkl_results) 68 | 69 | kwargs = {} if args.eval_options is None else args.eval_options 70 | if args.format_only: 71 | dataset.format_results(outputs, **kwargs) 72 | if args.eval: 73 | eval_kwargs = cfg.get('evaluation', {}).copy() 74 | # hard-code way to remove EvalHook args 75 | for key in [ 76 | 'interval', 'tmpdir', 'start', 'gpu_collect', 'save_best', 77 | 'rule' 78 | ]: 79 | eval_kwargs.pop(key, None) 80 | eval_kwargs.update(dict(metric=args.eval, **kwargs)) 81 | print(dataset.evaluate(outputs, **eval_kwargs)) 82 | 83 | 84 | if __name__ == '__main__': 85 | main() 86 | -------------------------------------------------------------------------------- /tools/analysis_tools/get_flops.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 2 | import argparse 3 | 4 | import numpy as np 5 | import torch 6 | from mmcv import Config, DictAction 7 | 8 | from mmdet.models import build_detector 9 | 10 | try: 11 | from mmcv.cnn import get_model_complexity_info 12 | except ImportError: 13 | raise ImportError('Please upgrade mmcv to >0.6.2') 14 | 15 | 16 | def parse_args(): 17 | parser = argparse.ArgumentParser(description='Train a detector') 18 | parser.add_argument('config', help='train config file path') 19 | parser.add_argument( 20 | '--shape', 21 | type=int, 22 | nargs='+', 23 | default=[1280, 800], 24 | help='input image size') 25 | parser.add_argument( 26 | '--cfg-options', 27 | nargs='+', 28 | action=DictAction, 29 | help='override some settings in the used config, the key-value pair ' 30 | 'in xxx=yyy format will be merged into config file. If the value to ' 31 | 'be overwritten is a list, it should be like key="[a,b]" or key=a,b ' 32 | 'It also allows nested list/tuple values, e.g. 
key="[(a,b),(c,d)]" ' 33 | 'Note that the quotation marks are necessary and that no white space ' 34 | 'is allowed.') 35 | parser.add_argument( 36 | '--size-divisor', 37 | type=int, 38 | default=32, 39 | help='Pad the input image, the minimum size that is divisible ' 40 | 'by size_divisor, -1 means do not pad the image.') 41 | args = parser.parse_args() 42 | return args 43 | 44 | 45 | def main(): 46 | 47 | args = parse_args() 48 | 49 | if len(args.shape) == 1: 50 | h = w = args.shape[0] 51 | elif len(args.shape) == 2: 52 | h, w = args.shape 53 | else: 54 | raise ValueError('invalid input shape') 55 | ori_shape = (3, h, w) 56 | divisor = args.size_divisor 57 | if divisor > 0: 58 | h = int(np.ceil(h / divisor)) * divisor 59 | w = int(np.ceil(w / divisor)) * divisor 60 | 61 | input_shape = (3, h, w) 62 | 63 | cfg = Config.fromfile(args.config) 64 | if args.cfg_options is not None: 65 | cfg.merge_from_dict(args.cfg_options) 66 | 67 | model = build_detector( 68 | cfg.model, 69 | train_cfg=cfg.get('train_cfg'), 70 | test_cfg=cfg.get('test_cfg')) 71 | if torch.cuda.is_available(): 72 | model.cuda() 73 | model.eval() 74 | 75 | if hasattr(model, 'forward_dummy'): 76 | model.forward = model.forward_dummy 77 | else: 78 | raise NotImplementedError( 79 | 'FLOPs counter is currently not currently supported with {}'. 80 | format(model.__class__.__name__)) 81 | 82 | flops, params = get_model_complexity_info(model, input_shape) 83 | split_line = '=' * 30 84 | 85 | if divisor > 0 and \ 86 | input_shape != ori_shape: 87 | print(f'{split_line}\nUse size divisor set input shape ' 88 | f'from {ori_shape} to {input_shape}\n') 89 | print(f'{split_line}\nInput shape: {input_shape}\n' 90 | f'Flops: {flops}\nParams: {params}\n{split_line}') 91 | print('!!!Please be cautious if you use the results in papers. ' 92 | 'You may need to check if all ops are supported and verify that the ' 93 | 'flops computation is correct.') 94 | 95 | 96 | if __name__ == '__main__': 97 | main() 98 | -------------------------------------------------------------------------------- /tools/dataset_converters/cityscapes.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 
2 | import argparse 3 | import glob 4 | import os.path as osp 5 | 6 | import cityscapesscripts.helpers.labels as CSLabels 7 | import mmcv 8 | import numpy as np 9 | import pycocotools.mask as maskUtils 10 | 11 | 12 | def collect_files(img_dir, gt_dir): 13 | suffix = 'leftImg8bit.png' 14 | files = [] 15 | for img_file in glob.glob(osp.join(img_dir, '**/*.png')): 16 | assert img_file.endswith(suffix), img_file 17 | inst_file = gt_dir + img_file[ 18 | len(img_dir):-len(suffix)] + 'gtFine_instanceIds.png' 19 | # Note that labelIds are not converted to trainId for seg map 20 | segm_file = gt_dir + img_file[ 21 | len(img_dir):-len(suffix)] + 'gtFine_labelIds.png' 22 | files.append((img_file, inst_file, segm_file)) 23 | assert len(files), f'No images found in {img_dir}' 24 | print(f'Loaded {len(files)} images from {img_dir}') 25 | 26 | return files 27 | 28 | 29 | def collect_annotations(files, nproc=1): 30 | print('Loading annotation images') 31 | if nproc > 1: 32 | images = mmcv.track_parallel_progress( 33 | load_img_info, files, nproc=nproc) 34 | else: 35 | images = mmcv.track_progress(load_img_info, files) 36 | 37 | return images 38 | 39 | 40 | def load_img_info(files): 41 | img_file, inst_file, segm_file = files 42 | inst_img = mmcv.imread(inst_file, 'unchanged') 43 | # ids < 24 are stuff labels (filtering them first is about 5% faster) 44 | unique_inst_ids = np.unique(inst_img[inst_img >= 24]) 45 | anno_info = [] 46 | for inst_id in unique_inst_ids: 47 | # For non-crowd annotations, inst_id // 1000 is the label_id 48 | # Crowd annotations have <1000 instance ids 49 | label_id = inst_id // 1000 if inst_id >= 1000 else inst_id 50 | label = CSLabels.id2label[label_id] 51 | if not label.hasInstances or label.ignoreInEval: 52 | continue 53 | 54 | category_id = label.id 55 | iscrowd = int(inst_id < 1000) 56 | mask = np.asarray(inst_img == inst_id, dtype=np.uint8, order='F') 57 | mask_rle = maskUtils.encode(mask[:, :, None])[0] 58 | 59 | area = maskUtils.area(mask_rle) 60 | # convert to COCO style XYWH format 61 | bbox = maskUtils.toBbox(mask_rle) 62 | 63 | # for json encoding 64 | mask_rle['counts'] = mask_rle['counts'].decode() 65 | 66 | anno = dict( 67 | iscrowd=iscrowd, 68 | category_id=category_id, 69 | bbox=bbox.tolist(), 70 | area=area.tolist(), 71 | segmentation=mask_rle) 72 | anno_info.append(anno) 73 | video_name = osp.basename(osp.dirname(img_file)) 74 | img_info = dict( 75 | # remove img_prefix for filename 76 | file_name=osp.join(video_name, osp.basename(img_file)), 77 | height=inst_img.shape[0], 78 | width=inst_img.shape[1], 79 | anno_info=anno_info, 80 | segm_file=osp.join(video_name, osp.basename(segm_file))) 81 | 82 | return img_info 83 | 84 | 85 | def cvt_annotations(image_infos, out_json_name): 86 | out_json = dict() 87 | img_id = 0 88 | ann_id = 0 89 | out_json['images'] = [] 90 | out_json['categories'] = [] 91 | out_json['annotations'] = [] 92 | for image_info in image_infos: 93 | image_info['id'] = img_id 94 | anno_infos = image_info.pop('anno_info') 95 | out_json['images'].append(image_info) 96 | for anno_info in anno_infos: 97 | anno_info['image_id'] = img_id 98 | anno_info['id'] = ann_id 99 | out_json['annotations'].append(anno_info) 100 | ann_id += 1 101 | img_id += 1 102 | for label in CSLabels.labels: 103 | if label.hasInstances and not label.ignoreInEval: 104 | cat = dict(id=label.id, name=label.name) 105 | out_json['categories'].append(cat) 106 | 107 | if len(out_json['annotations']) == 0: 108 | out_json.pop('annotations') 109 | 110 | mmcv.dump(out_json, out_json_name) 
111 | return out_json 112 | 113 | 114 | def parse_args(): 115 | parser = argparse.ArgumentParser( 116 | description='Convert Cityscapes annotations to COCO format') 117 | parser.add_argument('cityscapes_path', help='cityscapes data path') 118 | parser.add_argument('--img-dir', default='leftImg8bit', type=str) 119 | parser.add_argument('--gt-dir', default='gtFine', type=str) 120 | parser.add_argument('-o', '--out-dir', help='output path') 121 | parser.add_argument( 122 | '--nproc', default=1, type=int, help='number of processes') 123 | args = parser.parse_args() 124 | return args 125 | 126 | 127 | def main(): 128 | args = parse_args() 129 | cityscapes_path = args.cityscapes_path 130 | out_dir = args.out_dir if args.out_dir else cityscapes_path 131 | mmcv.mkdir_or_exist(out_dir) 132 | 133 | img_dir = osp.join(cityscapes_path, args.img_dir) 134 | gt_dir = osp.join(cityscapes_path, args.gt_dir) 135 | 136 | set_name = dict( 137 | train='instancesonly_filtered_gtFine_train.json', 138 | val='instancesonly_filtered_gtFine_val.json', 139 | test='instancesonly_filtered_gtFine_test.json') 140 | 141 | for split, json_name in set_name.items(): 142 | print(f'Converting {split} into {json_name}') 143 | with mmcv.Timer( 144 | print_tmpl='It took {}s to convert Cityscapes annotation'): 145 | files = collect_files( 146 | osp.join(img_dir, split), osp.join(gt_dir, split)) 147 | image_infos = collect_annotations(files, nproc=args.nproc) 148 | cvt_annotations(image_infos, osp.join(out_dir, json_name)) 149 | 150 | 151 | if __name__ == '__main__': 152 | main() 153 | -------------------------------------------------------------------------------- /tools/dataset_converters/images2coco.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 

2 | import argparse 3 | import os 4 | 5 | import mmcv 6 | from PIL import Image 7 | 8 | 9 | def parse_args(): 10 | parser = argparse.ArgumentParser( 11 | description='Convert images to coco format without annotations') 12 | parser.add_argument('img_path', help='The root path of images') 13 | parser.add_argument( 14 | 'classes', type=str, help='The text file name of storage class list') 15 | parser.add_argument( 16 | 'out', 17 | type=str, 18 | help='The output annotation json file name, The save dir is in the ' 19 | 'same directory as img_path') 20 | parser.add_argument( 21 | '-e', 22 | '--exclude-extensions', 23 | type=str, 24 | nargs='+', 25 | help='The suffix of images to be excluded, such as "png" and "bmp"') 26 | args = parser.parse_args() 27 | return args 28 | 29 | 30 | def collect_image_infos(path, exclude_extensions=None): 31 | img_infos = [] 32 | 33 | images_generator = mmcv.scandir(path, recursive=True) 34 | for image_path in mmcv.track_iter_progress(list(images_generator)): 35 | if exclude_extensions is None or ( 36 | exclude_extensions is not None 37 | and not image_path.lower().endswith(exclude_extensions)): 38 | image_path = os.path.join(path, image_path) 39 | img_pillow = Image.open(image_path) 40 | img_info = { 41 | 'filename': image_path, 42 | 'width': img_pillow.width, 43 | 'height': img_pillow.height, 44 | } 45 | img_infos.append(img_info) 46 | return img_infos 47 | 48 | 49 | def cvt_to_coco_json(img_infos, classes): 50 | image_id = 0 51 | coco = dict() 52 | coco['images'] = [] 53 | coco['type'] = 'instance' 54 | coco['categories'] = [] 55 | coco['annotations'] = [] 56 | image_set = set() 57 | 58 | for category_id, name in enumerate(classes): 59 | category_item = dict() 60 | category_item['supercategory'] = str('none') 61 | category_item['id'] = int(category_id) 62 | category_item['name'] = str(name) 63 | coco['categories'].append(category_item) 64 | 65 | for img_dict in img_infos: 66 | file_name = img_dict['filename'] 67 | assert file_name not in image_set 68 | image_item = dict() 69 | image_item['id'] = int(image_id) 70 | image_item['file_name'] = str(file_name) 71 | image_item['height'] = int(img_dict['height']) 72 | image_item['width'] = int(img_dict['width']) 73 | coco['images'].append(image_item) 74 | image_set.add(file_name) 75 | 76 | image_id += 1 77 | return coco 78 | 79 | 80 | def main(): 81 | args = parse_args() 82 | assert args.out.endswith( 83 | 'json'), 'The output file name must be json suffix' 84 | 85 | # 1 load image list info 86 | img_infos = collect_image_infos(args.img_path, args.exclude_extensions) 87 | 88 | # 2 convert to coco format data 89 | classes = mmcv.list_from_file(args.classes) 90 | coco_info = cvt_to_coco_json(img_infos, classes) 91 | 92 | # 3 dump 93 | save_dir = os.path.join(args.img_path, '..', 'annotations') 94 | mmcv.mkdir_or_exist(save_dir) 95 | save_path = os.path.join(save_dir, args.out) 96 | mmcv.dump(coco_info, save_path) 97 | print(f'save json file: {save_path}') 98 | 99 | 100 | if __name__ == '__main__': 101 | main() 102 | -------------------------------------------------------------------------------- /tools/deployment/mmdet2torchserve.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 
2 | from argparse import ArgumentParser, Namespace 3 | from pathlib import Path 4 | from tempfile import TemporaryDirectory 5 | 6 | import mmcv 7 | 8 | try: 9 | from model_archiver.model_packaging import package_model 10 | from model_archiver.model_packaging_utils import ModelExportUtils 11 | except ImportError: 12 | package_model = None 13 | 14 | 15 | def mmdet2torchserve( 16 | config_file: str, 17 | checkpoint_file: str, 18 | output_folder: str, 19 | model_name: str, 20 | model_version: str = '1.0', 21 | force: bool = False, 22 | ): 23 | """Converts MMDetection model (config + checkpoint) to TorchServe `.mar`. 24 | 25 | Args: 26 | config_file: 27 | In MMDetection config format. 28 | The contents vary for each task repository. 29 | checkpoint_file: 30 | In MMDetection checkpoint format. 31 | The contents vary for each task repository. 32 | output_folder: 33 | Folder where `{model_name}.mar` will be created. 34 | The file created will be in TorchServe archive format. 35 | model_name: 36 | If not None, used for naming the `{model_name}.mar` file 37 | that will be created under `output_folder`. 38 | If None, `{Path(checkpoint_file).stem}` will be used. 39 | model_version: 40 | Model's version. 41 | force: 42 | If True, if there is an existing `{model_name}.mar` 43 | file under `output_folder` it will be overwritten. 44 | """ 45 | mmcv.mkdir_or_exist(output_folder) 46 | 47 | config = mmcv.Config.fromfile(config_file) 48 | 49 | with TemporaryDirectory() as tmpdir: 50 | config.dump(f'{tmpdir}/config.py') 51 | 52 | args = Namespace( 53 | **{ 54 | 'model_file': f'{tmpdir}/config.py', 55 | 'serialized_file': checkpoint_file, 56 | 'handler': f'{Path(__file__).parent}/mmdet_handler.py', 57 | 'model_name': model_name or Path(checkpoint_file).stem, 58 | 'version': model_version, 59 | 'export_path': output_folder, 60 | 'force': force, 61 | 'requirements_file': None, 62 | 'extra_files': None, 63 | 'runtime': 'python', 64 | 'archive_format': 'default' 65 | }) 66 | manifest = ModelExportUtils.generate_manifest_json(args) 67 | package_model(args, manifest) 68 | 69 | 70 | def parse_args(): 71 | parser = ArgumentParser( 72 | description='Convert MMDetection models to TorchServe `.mar` format.') 73 | parser.add_argument('config', type=str, help='config file path') 74 | parser.add_argument('checkpoint', type=str, help='checkpoint file path') 75 | parser.add_argument( 76 | '--output-folder', 77 | type=str, 78 | required=True, 79 | help='Folder where `{model_name}.mar` will be created.') 80 | parser.add_argument( 81 | '--model-name', 82 | type=str, 83 | default=None, 84 | help='If not None, used for naming the `{model_name}.mar`' 85 | 'file that will be created under `output_folder`.' 86 | 'If None, `{Path(checkpoint_file).stem}` will be used.') 87 | parser.add_argument( 88 | '--model-version', 89 | type=str, 90 | default='1.0', 91 | help='Number used for versioning.') 92 | parser.add_argument( 93 | '-f', 94 | '--force', 95 | action='store_true', 96 | help='overwrite the existing `{model_name}.mar`') 97 | args = parser.parse_args() 98 | 99 | return args 100 | 101 | 102 | if __name__ == '__main__': 103 | args = parse_args() 104 | 105 | if package_model is None: 106 | raise ImportError('`torch-model-archiver` is required.' 
107 | 'Try: pip install torch-model-archiver') 108 | 109 | mmdet2torchserve(args.config, args.checkpoint, args.output_folder, 110 | args.model_name, args.model_version, args.force) 111 | -------------------------------------------------------------------------------- /tools/deployment/mmdet_handler.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 2 | import base64 3 | import os 4 | 5 | import mmcv 6 | import torch 7 | from ts.torch_handler.base_handler import BaseHandler 8 | 9 | from mmdet.apis import inference_detector, init_detector 10 | 11 | 12 | class MMdetHandler(BaseHandler): 13 | threshold = 0.5 14 | 15 | def initialize(self, context): 16 | properties = context.system_properties 17 | self.map_location = 'cuda' if torch.cuda.is_available() else 'cpu' 18 | self.device = torch.device(self.map_location + ':' + 19 | str(properties.get('gpu_id')) if torch.cuda. 20 | is_available() else self.map_location) 21 | self.manifest = context.manifest 22 | 23 | model_dir = properties.get('model_dir') 24 | serialized_file = self.manifest['model']['serializedFile'] 25 | checkpoint = os.path.join(model_dir, serialized_file) 26 | self.config_file = os.path.join(model_dir, 'config.py') 27 | 28 | self.model = init_detector(self.config_file, checkpoint, self.device) 29 | self.initialized = True 30 | 31 | def preprocess(self, data): 32 | images = [] 33 | 34 | for row in data: 35 | image = row.get('data') or row.get('body') 36 | if isinstance(image, str): 37 | image = base64.b64decode(image) 38 | image = mmcv.imfrombytes(image) 39 | images.append(image) 40 | 41 | return images 42 | 43 | def inference(self, data, *args, **kwargs): 44 | results = inference_detector(self.model, data) 45 | return results 46 | 47 | def postprocess(self, data): 48 | # Format output following the example ObjectDetectionHandler format 49 | output = [] 50 | for image_index, image_result in enumerate(data): 51 | output.append([]) 52 | if isinstance(image_result, tuple): 53 | bbox_result, segm_result = image_result 54 | if isinstance(segm_result, tuple): 55 | segm_result = segm_result[0] # ms rcnn 56 | else: 57 | bbox_result, segm_result = image_result, None 58 | 59 | for class_index, class_result in enumerate(bbox_result): 60 | class_name = self.model.CLASSES[class_index] 61 | for bbox in class_result: 62 | bbox_coords = bbox[:-1].tolist() 63 | score = float(bbox[-1]) 64 | if score >= self.threshold: 65 | output[image_index].append({ 66 | 'class_name': class_name, 67 | 'bbox': bbox_coords, 68 | 'score': score 69 | }) 70 | 71 | return output 72 | -------------------------------------------------------------------------------- /tools/deployment/test.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 
2 | import argparse 3 | import warnings 4 | 5 | import mmcv 6 | from mmcv import Config, DictAction 7 | from mmcv.parallel import MMDataParallel 8 | 9 | from mmdet.apis import single_gpu_test 10 | from mmdet.datasets import (build_dataloader, build_dataset, 11 | replace_ImageToTensor) 12 | 13 | 14 | def parse_args(): 15 | parser = argparse.ArgumentParser( 16 | description='MMDet test (and eval) an ONNX model using ONNXRuntime') 17 | parser.add_argument('config', help='test config file path') 18 | parser.add_argument('model', help='Input model file') 19 | parser.add_argument('--out', help='output result file in pickle format') 20 | parser.add_argument( 21 | '--format-only', 22 | action='store_true', 23 | help='Format the output results without perform evaluation. It is' 24 | 'useful when you want to format the result to a specific format and ' 25 | 'submit it to the test server') 26 | parser.add_argument( 27 | '--backend', 28 | required=True, 29 | choices=['onnxruntime', 'tensorrt'], 30 | help='Backend for input model to run. ') 31 | parser.add_argument( 32 | '--eval', 33 | type=str, 34 | nargs='+', 35 | help='evaluation metrics, which depends on the dataset, e.g., "bbox",' 36 | ' "segm", "proposal" for COCO, and "mAP", "recall" for PASCAL VOC') 37 | parser.add_argument('--show', action='store_true', help='show results') 38 | parser.add_argument( 39 | '--show-dir', help='directory where painted images will be saved') 40 | parser.add_argument( 41 | '--show-score-thr', 42 | type=float, 43 | default=0.3, 44 | help='score threshold (default: 0.3)') 45 | parser.add_argument( 46 | '--cfg-options', 47 | nargs='+', 48 | action=DictAction, 49 | help='override some settings in the used config, the key-value pair ' 50 | 'in xxx=yyy format will be merged into config file. If the value to ' 51 | 'be overwritten is a list, it should be like key="[a,b]" or key=a,b ' 52 | 'It also allows nested list/tuple values, e.g. 
key="[(a,b),(c,d)]" ' 53 | 'Note that the quotation marks are necessary and that no white space ' 54 | 'is allowed.') 55 | parser.add_argument( 56 | '--eval-options', 57 | nargs='+', 58 | action=DictAction, 59 | help='custom options for evaluation, the key-value pair in xxx=yyy ' 60 | 'format will be kwargs for dataset.evaluate() function') 61 | 62 | args = parser.parse_args() 63 | return args 64 | 65 | 66 | def main(): 67 | args = parse_args() 68 | 69 | assert args.out or args.eval or args.format_only or args.show \ 70 | or args.show_dir, \ 71 | ('Please specify at least one operation (save/eval/format/show the ' 72 | 'results / save the results) with the argument "--out", "--eval"' 73 | ', "--format-only", "--show" or "--show-dir"') 74 | 75 | if args.eval and args.format_only: 76 | raise ValueError('--eval and --format_only cannot be both specified') 77 | 78 | if args.out is not None and not args.out.endswith(('.pkl', '.pickle')): 79 | raise ValueError('The output file must be a pkl file.') 80 | 81 | cfg = Config.fromfile(args.config) 82 | if args.cfg_options is not None: 83 | cfg.merge_from_dict(args.cfg_options) 84 | 85 | # in case the test dataset is concatenated 86 | samples_per_gpu = 1 87 | if isinstance(cfg.data.test, dict): 88 | cfg.data.test.test_mode = True 89 | samples_per_gpu = cfg.data.test.pop('samples_per_gpu', 1) 90 | if samples_per_gpu > 1: 91 | # Replace 'ImageToTensor' to 'DefaultFormatBundle' 92 | cfg.data.test.pipeline = replace_ImageToTensor( 93 | cfg.data.test.pipeline) 94 | elif isinstance(cfg.data.test, list): 95 | for ds_cfg in cfg.data.test: 96 | ds_cfg.test_mode = True 97 | samples_per_gpu = max( 98 | [ds_cfg.pop('samples_per_gpu', 1) for ds_cfg in cfg.data.test]) 99 | if samples_per_gpu > 1: 100 | for ds_cfg in cfg.data.test: 101 | ds_cfg.pipeline = replace_ImageToTensor(ds_cfg.pipeline) 102 | 103 | # build the dataloader 104 | dataset = build_dataset(cfg.data.test) 105 | data_loader = build_dataloader( 106 | dataset, 107 | samples_per_gpu=samples_per_gpu, 108 | workers_per_gpu=cfg.data.workers_per_gpu, 109 | dist=False, 110 | shuffle=False) 111 | 112 | if args.backend == 'onnxruntime': 113 | from mmdet.core.export.model_wrappers import ONNXRuntimeDetector 114 | model = ONNXRuntimeDetector( 115 | args.model, class_names=dataset.CLASSES, device_id=0) 116 | elif args.backend == 'tensorrt': 117 | from mmdet.core.export.model_wrappers import TensorRTDetector 118 | model = TensorRTDetector( 119 | args.model, class_names=dataset.CLASSES, device_id=0) 120 | 121 | model = MMDataParallel(model, device_ids=[0]) 122 | outputs = single_gpu_test(model, data_loader, args.show, args.show_dir, 123 | args.show_score_thr) 124 | 125 | if args.out: 126 | print(f'\nwriting results to {args.out}') 127 | mmcv.dump(outputs, args.out) 128 | kwargs = {} if args.eval_options is None else args.eval_options 129 | if args.format_only: 130 | dataset.format_results(outputs, **kwargs) 131 | if args.eval: 132 | eval_kwargs = cfg.get('evaluation', {}).copy() 133 | # hard-code way to remove EvalHook args 134 | for key in [ 135 | 'interval', 'tmpdir', 'start', 'gpu_collect', 'save_best', 136 | 'rule' 137 | ]: 138 | eval_kwargs.pop(key, None) 139 | eval_kwargs.update(dict(metric=args.eval, **kwargs)) 140 | print(dataset.evaluate(outputs, **eval_kwargs)) 141 | 142 | 143 | if __name__ == '__main__': 144 | main() 145 | 146 | # Following strings of text style are from colorama package 147 | bright_style, reset_style = '\x1b[1m', '\x1b[0m' 148 | red_text, blue_text = '\x1b[31m', '\x1b[34m' 149 | 
white_background = '\x1b[107m' 150 | 151 | msg = white_background + bright_style + red_text 152 | msg += 'DeprecationWarning: This tool will be deprecated in future. ' 153 | msg += blue_text + 'Welcome to use the unified model deployment toolbox ' 154 | msg += 'MMDeploy: https://github.com/open-mmlab/mmdeploy' 155 | msg += reset_style 156 | warnings.warn(msg) 157 | -------------------------------------------------------------------------------- /tools/deployment/test_torchserver.py: -------------------------------------------------------------------------------- 1 | from argparse import ArgumentParser 2 | 3 | import numpy as np 4 | import requests 5 | 6 | from mmdet.apis import inference_detector, init_detector, show_result_pyplot 7 | from mmdet.core import bbox2result 8 | 9 | 10 | def parse_args(): 11 | parser = ArgumentParser() 12 | parser.add_argument('img', help='Image file') 13 | parser.add_argument('config', help='Config file') 14 | parser.add_argument('checkpoint', help='Checkpoint file') 15 | parser.add_argument('model_name', help='The model name in the server') 16 | parser.add_argument( 17 | '--inference-addr', 18 | default='127.0.0.1:8080', 19 | help='Address and port of the inference server') 20 | parser.add_argument( 21 | '--device', default='cuda:0', help='Device used for inference') 22 | parser.add_argument( 23 | '--score-thr', type=float, default=0.5, help='bbox score threshold') 24 | args = parser.parse_args() 25 | return args 26 | 27 | 28 | def parse_result(input, model_class): 29 | bbox = [] 30 | label = [] 31 | score = [] 32 | for anchor in input: 33 | bbox.append(anchor['bbox']) 34 | label.append(model_class.index(anchor['class_name'])) 35 | score.append([anchor['score']]) 36 | bboxes = np.append(bbox, score, axis=1) 37 | labels = np.array(label) 38 | result = bbox2result(bboxes, labels, len(model_class)) 39 | return result 40 | 41 | 42 | def main(args): 43 | # build the model from a config file and a checkpoint file 44 | model = init_detector(args.config, args.checkpoint, device=args.device) 45 | # test a single image 46 | model_result = inference_detector(model, args.img) 47 | for i, anchor_set in enumerate(model_result): 48 | anchor_set = anchor_set[anchor_set[:, 4] >= 0.5] 49 | model_result[i] = anchor_set 50 | # show the results 51 | show_result_pyplot( 52 | model, 53 | args.img, 54 | model_result, 55 | score_thr=args.score_thr, 56 | title='pytorch_result') 57 | url = 'http://' + args.inference_addr + '/predictions/' + args.model_name 58 | with open(args.img, 'rb') as image: 59 | response = requests.post(url, image) 60 | server_result = parse_result(response.json(), model.CLASSES) 61 | show_result_pyplot( 62 | model, 63 | args.img, 64 | server_result, 65 | score_thr=args.score_thr, 66 | title='server_result') 67 | 68 | for i in range(len(model.CLASSES)): 69 | assert np.allclose(model_result[i], server_result[i]) 70 | 71 | 72 | if __name__ == '__main__': 73 | args = parse_args() 74 | main(args) 75 | -------------------------------------------------------------------------------- /tools/dist_test.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | CONFIG=$1 4 | CHECKPOINT=$2 5 | GPUS=$3 6 | NNODES=${NNODES:-1} 7 | NODE_RANK=${NODE_RANK:-0} 8 | PORT=${PORT:-29501} 9 | MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"} 10 | 11 | PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \ 12 | python -m torch.distributed.launch \ 13 | --nnodes=$NNODES \ 14 | --node_rank=$NODE_RANK \ 15 | --master_addr=$MASTER_ADDR \ 16 | 
--nproc_per_node=$GPUS \ 17 | --master_port=$PORT \ 18 | $(dirname "$0")/test.py \ 19 | $CONFIG \ 20 | $CHECKPOINT \ 21 | --eval "bbox" "segm" \ 22 | --launcher pytorch \ 23 | ${@:4} 24 | -------------------------------------------------------------------------------- /tools/dist_train.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | CONFIG=$1 4 | GPUS=$2 5 | NNODES=${NNODES:-1} 6 | NODE_RANK=${NODE_RANK:-0} 7 | PORT=${PORT:-29544} 8 | MASTER_ADDR=${MASTER_ADDR:-"127.0.0.1"} 9 | 10 | PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \ 11 | python -m torch.distributed.launch \ 12 | --nnodes=$NNODES \ 13 | --node_rank=$NODE_RANK \ 14 | --master_addr=$MASTER_ADDR \ 15 | --nproc_per_node=$GPUS \ 16 | --master_port=$PORT \ 17 | $(dirname "$0")/train.py \ 18 | $CONFIG \ 19 | --seed 0 \ 20 | --launcher pytorch ${@:3} 21 | -------------------------------------------------------------------------------- /tools/misc/browse_dataset.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 2 | import argparse 3 | import os 4 | from collections.abc import Sequence 5 | from pathlib import Path 6 | 7 | import mmcv 8 | import numpy as np 9 | from mmcv import Config, DictAction 10 | 11 | from mmdet.core.utils import mask2ndarray 12 | from mmdet.core.visualization import imshow_det_bboxes 13 | from mmdet.datasets.builder import build_dataset 14 | from mmdet.utils import update_data_root 15 | 16 | 17 | def parse_args(): 18 | parser = argparse.ArgumentParser(description='Browse a dataset') 19 | parser.add_argument('config', help='train config file path') 20 | parser.add_argument( 21 | '--skip-type', 22 | type=str, 23 | nargs='+', 24 | default=['DefaultFormatBundle', 'Normalize', 'Collect'], 25 | help='skip some useless pipeline steps') 26 | parser.add_argument( 27 | '--output-dir', 28 | default=None, 29 | type=str, 30 | help='If there is no display interface, you can save it') 31 | parser.add_argument('--not-show', default=False, action='store_true') 32 | parser.add_argument( 33 | '--show-interval', 34 | type=float, 35 | default=2, 36 | help='the interval of show (s)') 37 | parser.add_argument( 38 | '--cfg-options', 39 | nargs='+', 40 | action=DictAction, 41 | help='override some settings in the used config, the key-value pair ' 42 | 'in xxx=yyy format will be merged into config file. If the value to ' 43 | 'be overwritten is a list, it should be like key="[a,b]" or key=a,b ' 44 | 'It also allows nested list/tuple values, e.g. 

key="[(a,b),(c,d)]" ' 45 | 'Note that the quotation marks are necessary and that no white space ' 46 | 'is allowed.') 47 | args = parser.parse_args() 48 | return args 49 | 50 | 51 | def retrieve_data_cfg(config_path, skip_type, cfg_options): 52 | 53 | def skip_pipeline_steps(config): 54 | config['pipeline'] = [ 55 | x for x in config.pipeline if x['type'] not in skip_type 56 | ] 57 | 58 | cfg = Config.fromfile(config_path) 59 | 60 | # update data root according to MMDET_DATASETS 61 | update_data_root(cfg) 62 | 63 | if cfg_options is not None: 64 | cfg.merge_from_dict(cfg_options) 65 | train_data_cfg = cfg.data.train 66 | while 'dataset' in train_data_cfg and train_data_cfg[ 67 | 'type'] != 'MultiImageMixDataset': 68 | train_data_cfg = train_data_cfg['dataset'] 69 | 70 | if isinstance(train_data_cfg, Sequence): 71 | [skip_pipeline_steps(c) for c in train_data_cfg] 72 | else: 73 | skip_pipeline_steps(train_data_cfg) 74 | 75 | return cfg 76 | 77 | 78 | def main(): 79 | args = parse_args() 80 | cfg = retrieve_data_cfg(args.config, args.skip_type, args.cfg_options) 81 | 82 | if 'gt_semantic_seg' in cfg.train_pipeline[-1]['keys']: 83 | cfg.data.train.pipeline = [ 84 | p for p in cfg.data.train.pipeline if p['type'] != 'SegRescale' 85 | ] 86 | dataset = build_dataset(cfg.data.train) 87 | 88 | progress_bar = mmcv.ProgressBar(len(dataset)) 89 | 90 | for item in dataset: 91 | filename = os.path.join(args.output_dir, 92 | Path(item['filename']).name 93 | ) if args.output_dir is not None else None 94 | 95 | gt_bboxes = item['gt_bboxes'] 96 | gt_labels = item['gt_labels'] 97 | gt_masks = item.get('gt_masks', None) 98 | if gt_masks is not None: 99 | gt_masks = mask2ndarray(gt_masks) 100 | 101 | gt_seg = item.get('gt_semantic_seg', None) 102 | if gt_seg is not None: 103 | pad_value = 255 # the padding value of gt_seg 104 | sem_labels = np.unique(gt_seg) 105 | all_labels = np.concatenate((gt_labels, sem_labels), axis=0) 106 | all_labels, counts = np.unique(all_labels, return_counts=True) 107 | stuff_labels = all_labels[np.logical_and(counts < 2, 108 | all_labels != pad_value)] 109 | stuff_masks = gt_seg[None] == stuff_labels[:, None, None] 110 | gt_labels = np.concatenate((gt_labels, stuff_labels), axis=0) 111 | gt_masks = np.concatenate((gt_masks, stuff_masks.astype(np.uint8)), 112 | axis=0) 113 | # If you need to show the bounding boxes, 114 | # please comment the following line 115 | gt_bboxes = None 116 | 117 | imshow_det_bboxes( 118 | item['img'], 119 | gt_bboxes, 120 | gt_labels, 121 | gt_masks, 122 | class_names=dataset.CLASSES, 123 | show=not args.not_show, 124 | wait_time=args.show_interval, 125 | out_file=filename, 126 | bbox_color=dataset.PALETTE, 127 | text_color=(200, 200, 200), 128 | mask_color=dataset.PALETTE) 129 | 130 | progress_bar.update() 131 | 132 | 133 | if __name__ == '__main__': 134 | main() 135 | -------------------------------------------------------------------------------- /tools/misc/download_dataset.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | from itertools import repeat 3 | from multiprocessing.pool import ThreadPool 4 | from pathlib import Path 5 | from tarfile import TarFile 6 | from zipfile import ZipFile 7 | 8 | import torch 9 | 10 | 11 | def parse_args(): 12 | parser = argparse.ArgumentParser( 13 | description='Download datasets for training') 14 | parser.add_argument( 15 | '--dataset-name', type=str, help='dataset name', default='coco2017') 16 | parser.add_argument( 17 | '--save-dir', 18 | type=str, 19 | 
help='the dir to save dataset', 20 | default='data/coco') 21 | parser.add_argument( 22 | '--unzip', 23 | action='store_true', 24 | help='whether to unzip the dataset or not, zipped files will be saved') 25 | parser.add_argument( 26 | '--delete', 27 | action='store_true', 28 | help='delete the downloaded zipped files') 29 | parser.add_argument( 30 | '--threads', type=int, help='number of threads', default=4) 31 | args = parser.parse_args() 32 | return args 33 | 34 | 35 | def download(url, dir, unzip=True, delete=False, threads=1): 36 | 37 | def download_one(url, dir): 38 | f = dir / Path(url).name 39 | if Path(url).is_file(): 40 | Path(url).rename(f) 41 | elif not f.exists(): 42 | print('Downloading {} to {}'.format(url, f)) 43 | torch.hub.download_url_to_file(url, f, progress=True) 44 | if unzip and f.suffix in ('.zip', '.tar'): 45 | print('Unzipping {}'.format(f.name)) 46 | if f.suffix == '.zip': 47 | ZipFile(f).extractall(path=dir) 48 | elif f.suffix == '.tar': 49 | TarFile(f).extractall(path=dir) 50 | if delete: 51 | f.unlink() 52 | print('Delete {}'.format(f)) 53 | 54 | dir = Path(dir) 55 | if threads > 1: 56 | pool = ThreadPool(threads) 57 | pool.imap(lambda x: download_one(*x), zip(url, repeat(dir))) 58 | pool.close() 59 | pool.join() 60 | else: 61 | for u in [url] if isinstance(url, (str, Path)) else url: 62 | download_one(u, dir) 63 | 64 | 65 | def main(): 66 | args = parse_args() 67 | path = Path(args.save_dir) 68 | if not path.exists(): 69 | path.mkdir(parents=True, exist_ok=True) 70 | data2url = dict( 71 | # TODO: Support for downloading Panoptic Segmentation of COCO 72 | coco2017=[ 73 | 'http://images.cocodataset.org/zips/train2017.zip', 74 | 'http://images.cocodataset.org/zips/val2017.zip', 75 | 'http://images.cocodataset.org/zips/test2017.zip', 76 | 'http://images.cocodataset.org/annotations/' + 77 | 'annotations_trainval2017.zip' 78 | ], 79 | lvis=[ 80 | 'https://s3-us-west-2.amazonaws.com/dl.fbaipublicfiles.com/LVIS/lvis_v1_train.json.zip', # noqa 81 | 'https://s3-us-west-2.amazonaws.com/dl.fbaipublicfiles.com/LVIS/lvis_v1_val.json.zip', # noqa 82 | ], 83 | voc2007=[ 84 | 'http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar', # noqa 85 | 'http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar', # noqa 86 | 'http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCdevkit_08-Jun-2007.tar', # noqa 87 | ], 88 | ) 89 | url = data2url.get(args.dataset_name, None) 90 | if url is None: 91 | print('Only support COCO, VOC, and LVIS now!') 92 | return 93 | download( 94 | url, 95 | dir=path, 96 | unzip=args.unzip, 97 | delete=args.delete, 98 | threads=args.threads) 99 | 100 | 101 | if __name__ == '__main__': 102 | main() 103 | -------------------------------------------------------------------------------- /tools/misc/gen_coco_panoptic_test_info.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os.path as osp 3 | 4 | import mmcv 5 | 6 | 7 | def parse_args(): 8 | parser = argparse.ArgumentParser( 9 | description='Generate COCO test image information ' 10 | 'for COCO panoptic segmentation.') 11 | parser.add_argument('data_root', help='Path to COCO annotation directory.') 12 | args = parser.parse_args() 13 | 14 | return args 15 | 16 | 17 | def main(): 18 | args = parse_args() 19 | data_root = args.data_root 20 | val_info = mmcv.load(osp.join(data_root, 'panoptic_val2017.json')) 21 | test_old_info = mmcv.load( 22 | osp.join(data_root, 'image_info_test-dev2017.json')) 23 | 24 | # replace categories 

from image_info_test-dev2017.json 25 | # with categories from panoptic_val2017.json which 26 | # has attribute `isthing`. 27 | test_info = test_old_info 28 | test_info.update({'categories': val_info['categories']}) 29 | mmcv.dump(test_info, 30 | osp.join(data_root, 'panoptic_image_info_test-dev2017.json')) 31 | 32 | 33 | if __name__ == '__main__': 34 | main() 35 | -------------------------------------------------------------------------------- /tools/misc/get_image_metas.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 2 | """Get test image metas on a specific dataset. 3 | 4 | Here is an example to run this script. 5 | 6 | Example: 7 | python tools/misc/get_image_metas.py ${CONFIG} \ 8 | --out ${OUTPUT FILE NAME} 9 | """ 10 | import argparse 11 | import csv 12 | import os.path as osp 13 | from multiprocessing import Pool 14 | 15 | import mmcv 16 | from mmcv import Config 17 | 18 | 19 | def parse_args(): 20 | parser = argparse.ArgumentParser(description='Collect image metas') 21 | parser.add_argument('config', help='Config file path') 22 | parser.add_argument( 23 | '--out', 24 | default='validation-image-metas.pkl', 25 | help='The output image metas file name. The save dir is in the ' 26 | 'same directory as `dataset.ann_file` path') 27 | parser.add_argument( 28 | '--nproc', 29 | default=4, 30 | type=int, 31 | help='Processes used for get image metas') 32 | args = parser.parse_args() 33 | return args 34 | 35 | 36 | def get_metas_from_csv_style_ann_file(ann_file): 37 | data_infos = [] 38 | cp_filename = None 39 | with open(ann_file, 'r') as f: 40 | reader = csv.reader(f) 41 | for i, line in enumerate(reader): 42 | if i == 0: 43 | continue 44 | img_id = line[0] 45 | filename = f'{img_id}.jpg' 46 | if filename != cp_filename: 47 | data_infos.append(dict(filename=filename)) 48 | cp_filename = filename 49 | return data_infos 50 | 51 | 52 | def get_metas_from_txt_style_ann_file(ann_file): 53 | with open(ann_file) as f: 54 | lines = f.readlines() 55 | i = 0 56 | data_infos = [] 57 | while i < len(lines): 58 | filename = lines[i].rstrip() 59 | data_infos.append(dict(filename=filename)) 60 | skip_lines = int(lines[i + 2]) + 3 61 | i += skip_lines 62 | return data_infos 63 | 64 | 65 | def get_image_metas(data_info, img_prefix): 66 | file_client = mmcv.FileClient(backend='disk') 67 | filename = data_info.get('filename', None) 68 | if filename is not None: 69 | if img_prefix is not None: 70 | filename = osp.join(img_prefix, filename) 71 | img_bytes = file_client.get(filename) 72 | img = mmcv.imfrombytes(img_bytes, flag='color') 73 | meta = dict(filename=filename, ori_shape=img.shape) 74 | else: 75 | raise NotImplementedError('Missing `filename` in data_info') 76 | return meta 77 | 78 | 79 | def main(): 80 | args = parse_args() 81 | assert args.out.endswith('pkl'), 'The output file name must be pkl suffix' 82 | 83 | # load config files 84 | cfg = Config.fromfile(args.config) 85 | ann_file = cfg.data.test.ann_file 86 | img_prefix = cfg.data.test.img_prefix 87 | 88 | print(f'{"-" * 5} Start Processing {"-" * 5}') 89 | if ann_file.endswith('csv'): 90 | data_infos = get_metas_from_csv_style_ann_file(ann_file) 91 | elif ann_file.endswith('txt'): 92 | data_infos = get_metas_from_txt_style_ann_file(ann_file) 93 | else: 94 | shuffix = ann_file.split('.')[-1] 95 | raise NotImplementedError('File name must be csv or txt suffix but ' 96 | f'get {shuffix}') 97 | 98 | print(f'Successfully load annotation file from {ann_file}') 99 
| print(f'Processing {len(data_infos)} images...') 100 | pool = Pool(args.nproc) 101 | # get image metas with multiple processes 102 | image_metas = pool.starmap( 103 | get_image_metas, 104 | zip(data_infos, [img_prefix for _ in range(len(data_infos))]), 105 | ) 106 | pool.close() 107 | 108 | # save image metas 109 | root_path = cfg.data.test.ann_file.rsplit('/', 1)[0] 110 | save_path = osp.join(root_path, args.out) 111 | mmcv.dump(image_metas, save_path) 112 | print(f'Image meta file save to: {save_path}') 113 | 114 | 115 | if __name__ == '__main__': 116 | main() 117 | -------------------------------------------------------------------------------- /tools/misc/print_config.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 2 | import argparse 3 | import warnings 4 | 5 | from mmcv import Config, DictAction 6 | 7 | from mmdet.utils import update_data_root 8 | 9 | 10 | def parse_args(): 11 | parser = argparse.ArgumentParser(description='Print the whole config') 12 | parser.add_argument('config', help='config file path') 13 | parser.add_argument( 14 | '--options', 15 | nargs='+', 16 | action=DictAction, 17 | help='override some settings in the used config, the key-value pair ' 18 | 'in xxx=yyy format will be merged into config file (deprecate), ' 19 | 'change to --cfg-options instead.') 20 | parser.add_argument( 21 | '--cfg-options', 22 | nargs='+', 23 | action=DictAction, 24 | help='override some settings in the used config, the key-value pair ' 25 | 'in xxx=yyy format will be merged into config file. If the value to ' 26 | 'be overwritten is a list, it should be like key="[a,b]" or key=a,b ' 27 | 'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" ' 28 | 'Note that the quotation marks are necessary and that no white space ' 29 | 'is allowed.') 30 | args = parser.parse_args() 31 | 32 | if args.options and args.cfg_options: 33 | raise ValueError( 34 | '--options and --cfg-options cannot be both ' 35 | 'specified, --options is deprecated in favor of --cfg-options') 36 | if args.options: 37 | warnings.warn('--options is deprecated in favor of --cfg-options') 38 | args.cfg_options = args.options 39 | 40 | return args 41 | 42 | 43 | def main(): 44 | args = parse_args() 45 | 46 | cfg = Config.fromfile(args.config) 47 | 48 | # update data root according to MMDET_DATASETS 49 | update_data_root(cfg) 50 | 51 | if args.cfg_options is not None: 52 | cfg.merge_from_dict(args.cfg_options) 53 | print(f'Config:\n{cfg.pretty_text}') 54 | 55 | 56 | if __name__ == '__main__': 57 | main() 58 | -------------------------------------------------------------------------------- /tools/model_converters/detectron2pytorch.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 
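# Note on this converter: it remaps a Detectron (caffe2-style) ResNet checkpoint
# onto the PyTorch-style keys used by mmdet's ResNet: `conv1`/`bn1` for the stem,
# `layer{i}.{j}.conv{k}`/`bn{k}` for residual blocks, and `downsample.0`/`.1` for
# the projection branch; Detectron's affine-channel layers become BN weights with
# zero running mean and unit running variance. Only ResNet-50/101 are supported.
# A usage sketch, with placeholder file names:
#
#   python tools/model_converters/detectron2pytorch.py \
#       R-50-detectron.pkl resnet50_converted.pth 50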
2 | import argparse 3 | from collections import OrderedDict 4 | 5 | import mmcv 6 | import torch 7 | 8 | arch_settings = {50: (3, 4, 6, 3), 101: (3, 4, 23, 3)} 9 | 10 | 11 | def convert_bn(blobs, state_dict, caffe_name, torch_name, converted_names): 12 | # detectron replace bn with affine channel layer 13 | state_dict[torch_name + '.bias'] = torch.from_numpy(blobs[caffe_name + 14 | '_b']) 15 | state_dict[torch_name + '.weight'] = torch.from_numpy(blobs[caffe_name + 16 | '_s']) 17 | bn_size = state_dict[torch_name + '.weight'].size() 18 | state_dict[torch_name + '.running_mean'] = torch.zeros(bn_size) 19 | state_dict[torch_name + '.running_var'] = torch.ones(bn_size) 20 | converted_names.add(caffe_name + '_b') 21 | converted_names.add(caffe_name + '_s') 22 | 23 | 24 | def convert_conv_fc(blobs, state_dict, caffe_name, torch_name, 25 | converted_names): 26 | state_dict[torch_name + '.weight'] = torch.from_numpy(blobs[caffe_name + 27 | '_w']) 28 | converted_names.add(caffe_name + '_w') 29 | if caffe_name + '_b' in blobs: 30 | state_dict[torch_name + '.bias'] = torch.from_numpy(blobs[caffe_name + 31 | '_b']) 32 | converted_names.add(caffe_name + '_b') 33 | 34 | 35 | def convert(src, dst, depth): 36 | """Convert keys in detectron pretrained ResNet models to pytorch style.""" 37 | # load arch_settings 38 | if depth not in arch_settings: 39 | raise ValueError('Only support ResNet-50 and ResNet-101 currently') 40 | block_nums = arch_settings[depth] 41 | # load caffe model 42 | caffe_model = mmcv.load(src, encoding='latin1') 43 | blobs = caffe_model['blobs'] if 'blobs' in caffe_model else caffe_model 44 | # convert to pytorch style 45 | state_dict = OrderedDict() 46 | converted_names = set() 47 | convert_conv_fc(blobs, state_dict, 'conv1', 'conv1', converted_names) 48 | convert_bn(blobs, state_dict, 'res_conv1_bn', 'bn1', converted_names) 49 | for i in range(1, len(block_nums) + 1): 50 | for j in range(block_nums[i - 1]): 51 | if j == 0: 52 | convert_conv_fc(blobs, state_dict, f'res{i + 1}_{j}_branch1', 53 | f'layer{i}.{j}.downsample.0', converted_names) 54 | convert_bn(blobs, state_dict, f'res{i + 1}_{j}_branch1_bn', 55 | f'layer{i}.{j}.downsample.1', converted_names) 56 | for k, letter in enumerate(['a', 'b', 'c']): 57 | convert_conv_fc(blobs, state_dict, 58 | f'res{i + 1}_{j}_branch2{letter}', 59 | f'layer{i}.{j}.conv{k+1}', converted_names) 60 | convert_bn(blobs, state_dict, 61 | f'res{i + 1}_{j}_branch2{letter}_bn', 62 | f'layer{i}.{j}.bn{k + 1}', converted_names) 63 | # check if all layers are converted 64 | for key in blobs: 65 | if key not in converted_names: 66 | print(f'Not Convert: {key}') 67 | # save checkpoint 68 | checkpoint = dict() 69 | checkpoint['state_dict'] = state_dict 70 | torch.save(checkpoint, dst) 71 | 72 | 73 | def main(): 74 | parser = argparse.ArgumentParser(description='Convert model keys') 75 | parser.add_argument('src', help='src detectron model path') 76 | parser.add_argument('dst', help='save path') 77 | parser.add_argument('depth', type=int, help='ResNet model depth') 78 | args = parser.parse_args() 79 | convert(args.src, args.dst, args.depth) 80 | 81 | 82 | if __name__ == '__main__': 83 | main() 84 | -------------------------------------------------------------------------------- /tools/model_converters/publish_model.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 
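# Note on this script: it drops the optimizer state from a training checkpoint,
# re-saves it (using the legacy non-zipfile serialization on PyTorch >= 1.6 for
# backward compatibility), and renames the result so the first 8 characters of
# its SHA-256 hash are appended, e.g. `model.pth` -> `model-abcd1234.pth`. It
# shells out to `sha256sum` and `mv`, so a Unix-like environment is assumed.
# A usage sketch, with placeholder paths:
#
#   python tools/model_converters/publish_model.py \
#       work_dirs/queryinst_r50_1x_coco/latest.pth queryinst_r50_1x_coco.pth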
2 | import argparse 3 | import subprocess 4 | 5 | import torch 6 | 7 | 8 | def parse_args(): 9 | parser = argparse.ArgumentParser( 10 | description='Process a checkpoint to be published') 11 | parser.add_argument('in_file', help='input checkpoint filename') 12 | parser.add_argument('out_file', help='output checkpoint filename') 13 | args = parser.parse_args() 14 | return args 15 | 16 | 17 | def process_checkpoint(in_file, out_file): 18 | checkpoint = torch.load(in_file, map_location='cpu') 19 | # remove optimizer for smaller file size 20 | if 'optimizer' in checkpoint: 21 | del checkpoint['optimizer'] 22 | # if it is necessary to remove some sensitive data in checkpoint['meta'], 23 | # add the code here. 24 | if torch.__version__ >= '1.6': 25 | torch.save(checkpoint, out_file, _use_new_zipfile_serialization=False) 26 | else: 27 | torch.save(checkpoint, out_file) 28 | sha = subprocess.check_output(['sha256sum', out_file]).decode() 29 | if out_file.endswith('.pth'): 30 | out_file_name = out_file[:-4] 31 | else: 32 | out_file_name = out_file 33 | final_file = out_file_name + f'-{sha[:8]}.pth' 34 | subprocess.Popen(['mv', out_file, final_file]) 35 | 36 | 37 | def main(): 38 | args = parse_args() 39 | process_checkpoint(args.in_file, args.out_file) 40 | 41 | 42 | if __name__ == '__main__': 43 | main() 44 | -------------------------------------------------------------------------------- /tools/model_converters/regnet2mmdet.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 2 | import argparse 3 | from collections import OrderedDict 4 | 5 | import torch 6 | 7 | 8 | def convert_stem(model_key, model_weight, state_dict, converted_names): 9 | new_key = model_key.replace('stem.conv', 'conv1') 10 | new_key = new_key.replace('stem.bn', 'bn1') 11 | state_dict[new_key] = model_weight 12 | converted_names.add(model_key) 13 | print(f'Convert {model_key} to {new_key}') 14 | 15 | 16 | def convert_head(model_key, model_weight, state_dict, converted_names): 17 | new_key = model_key.replace('head.fc', 'fc') 18 | state_dict[new_key] = model_weight 19 | converted_names.add(model_key) 20 | print(f'Convert {model_key} to {new_key}') 21 | 22 | 23 | def convert_reslayer(model_key, model_weight, state_dict, converted_names): 24 | split_keys = model_key.split('.') 25 | layer, block, module = split_keys[:3] 26 | block_id = int(block[1:]) 27 | layer_name = f'layer{int(layer[1:])}' 28 | block_name = f'{block_id - 1}' 29 | 30 | if block_id == 1 and module == 'bn': 31 | new_key = f'{layer_name}.{block_name}.downsample.1.{split_keys[-1]}' 32 | elif block_id == 1 and module == 'proj': 33 | new_key = f'{layer_name}.{block_name}.downsample.0.{split_keys[-1]}' 34 | elif module == 'f': 35 | if split_keys[3] == 'a_bn': 36 | module_name = 'bn1' 37 | elif split_keys[3] == 'b_bn': 38 | module_name = 'bn2' 39 | elif split_keys[3] == 'c_bn': 40 | module_name = 'bn3' 41 | elif split_keys[3] == 'a': 42 | module_name = 'conv1' 43 | elif split_keys[3] == 'b': 44 | module_name = 'conv2' 45 | elif split_keys[3] == 'c': 46 | module_name = 'conv3' 47 | new_key = f'{layer_name}.{block_name}.{module_name}.{split_keys[-1]}' 48 | else: 49 | raise ValueError(f'Unsupported conversion of key {model_key}') 50 | print(f'Convert {model_key} to {new_key}') 51 | state_dict[new_key] = model_weight 52 | converted_names.add(model_key) 53 | 54 | 55 | def convert(src, dst): 56 | """Convert keys in pycls pretrained RegNet models to mmdet style.""" 57 | # load caffe model 58 | 
regnet_model = torch.load(src) 59 | blobs = regnet_model['model_state'] 60 | # convert to pytorch style 61 | state_dict = OrderedDict() 62 | converted_names = set() 63 | for key, weight in blobs.items(): 64 | if 'stem' in key: 65 | convert_stem(key, weight, state_dict, converted_names) 66 | elif 'head' in key: 67 | convert_head(key, weight, state_dict, converted_names) 68 | elif key.startswith('s'): 69 | convert_reslayer(key, weight, state_dict, converted_names) 70 | 71 | # check if all layers are converted 72 | for key in blobs: 73 | if key not in converted_names: 74 | print(f'not converted: {key}') 75 | # save checkpoint 76 | checkpoint = dict() 77 | checkpoint['state_dict'] = state_dict 78 | torch.save(checkpoint, dst) 79 | 80 | 81 | def main(): 82 | parser = argparse.ArgumentParser(description='Convert model keys') 83 | parser.add_argument('src', help='src detectron model path') 84 | parser.add_argument('dst', help='save path') 85 | args = parser.parse_args() 86 | convert(args.src, args.dst) 87 | 88 | 89 | if __name__ == '__main__': 90 | main() 91 | -------------------------------------------------------------------------------- /tools/model_converters/selfsup2mmdet.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 2 | import argparse 3 | from collections import OrderedDict 4 | 5 | import torch 6 | 7 | 8 | def moco_convert(src, dst): 9 | """Convert keys in pycls pretrained moco models to mmdet style.""" 10 | # load caffe model 11 | moco_model = torch.load(src) 12 | blobs = moco_model['state_dict'] 13 | # convert to pytorch style 14 | state_dict = OrderedDict() 15 | for k, v in blobs.items(): 16 | if not k.startswith('module.encoder_q.'): 17 | continue 18 | old_k = k 19 | k = k.replace('module.encoder_q.', '') 20 | state_dict[k] = v 21 | print(old_k, '->', k) 22 | # save checkpoint 23 | checkpoint = dict() 24 | checkpoint['state_dict'] = state_dict 25 | torch.save(checkpoint, dst) 26 | 27 | 28 | def main(): 29 | parser = argparse.ArgumentParser(description='Convert model keys') 30 | parser.add_argument('src', help='src detectron model path') 31 | parser.add_argument('dst', help='save path') 32 | parser.add_argument( 33 | '--selfsup', type=str, choices=['moco', 'swav'], help='save path') 34 | args = parser.parse_args() 35 | if args.selfsup == 'moco': 36 | moco_convert(args.src, args.dst) 37 | elif args.selfsup == 'swav': 38 | print('SWAV does not need to convert the keys') 39 | 40 | 41 | if __name__ == '__main__': 42 | main() 43 | -------------------------------------------------------------------------------- /tools/model_converters/upgrade_model_version.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved. 
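# Note on this tool: it rewrites checkpoints trained with old mmdetection
# releases to match the current class-channel convention. Classification
# branches are reordered so the background class moves from the first to the
# last channel, class-aware regression and mask logits drop their background
# channel, RPN heads are additionally reordered for checkpoints older than
# v2.5.0, and two-stage head keys gain a `roi_head.` prefix. A usage sketch,
# with placeholder paths (81 = 80 COCO classes plus background, the default):
#
#   python tools/model_converters/upgrade_model_version.py \
#       mask_rcnn_r50_v1.pth mask_rcnn_r50_upgraded.pth --num-classes 81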
2 | import argparse 3 | import re 4 | import tempfile 5 | from collections import OrderedDict 6 | 7 | import torch 8 | from mmcv import Config 9 | 10 | 11 | def is_head(key): 12 | valid_head_list = [ 13 | 'bbox_head', 'mask_head', 'semantic_head', 'grid_head', 'mask_iou_head' 14 | ] 15 | 16 | return any(key.startswith(h) for h in valid_head_list) 17 | 18 | 19 | def parse_config(config_strings): 20 | temp_file = tempfile.NamedTemporaryFile() 21 | config_path = f'{temp_file.name}.py' 22 | with open(config_path, 'w') as f: 23 | f.write(config_strings) 24 | 25 | config = Config.fromfile(config_path) 26 | is_two_stage = True 27 | is_ssd = False 28 | is_retina = False 29 | reg_cls_agnostic = False 30 | if 'rpn_head' not in config.model: 31 | is_two_stage = False 32 | # check whether it is SSD 33 | if config.model.bbox_head.type == 'SSDHead': 34 | is_ssd = True 35 | elif config.model.bbox_head.type == 'RetinaHead': 36 | is_retina = True 37 | elif isinstance(config.model['bbox_head'], list): 38 | reg_cls_agnostic = True 39 | elif 'reg_class_agnostic' in config.model.bbox_head: 40 | reg_cls_agnostic = config.model.bbox_head \ 41 | .reg_class_agnostic 42 | temp_file.close() 43 | return is_two_stage, is_ssd, is_retina, reg_cls_agnostic 44 | 45 | 46 | def reorder_cls_channel(val, num_classes=81): 47 | # bias 48 | if val.dim() == 1: 49 | new_val = torch.cat((val[1:], val[:1]), dim=0) 50 | # weight 51 | else: 52 | out_channels, in_channels = val.shape[:2] 53 | # conv_cls for softmax output 54 | if out_channels != num_classes and out_channels % num_classes == 0: 55 | new_val = val.reshape(-1, num_classes, in_channels, *val.shape[2:]) 56 | new_val = torch.cat((new_val[:, 1:], new_val[:, :1]), dim=1) 57 | new_val = new_val.reshape(val.size()) 58 | # fc_cls 59 | elif out_channels == num_classes: 60 | new_val = torch.cat((val[1:], val[:1]), dim=0) 61 | # agnostic | retina_cls | rpn_cls 62 | else: 63 | new_val = val 64 | 65 | return new_val 66 | 67 | 68 | def truncate_cls_channel(val, num_classes=81): 69 | 70 | # bias 71 | if val.dim() == 1: 72 | if val.size(0) % num_classes == 0: 73 | new_val = val[:num_classes - 1] 74 | else: 75 | new_val = val 76 | # weight 77 | else: 78 | out_channels, in_channels = val.shape[:2] 79 | # conv_logits 80 | if out_channels % num_classes == 0: 81 | new_val = val.reshape(num_classes, in_channels, *val.shape[2:])[1:] 82 | new_val = new_val.reshape(-1, *val.shape[1:]) 83 | # agnostic 84 | else: 85 | new_val = val 86 | 87 | return new_val 88 | 89 | 90 | def truncate_reg_channel(val, num_classes=81): 91 | # bias 92 | if val.dim() == 1: 93 | # fc_reg | rpn_reg 94 | if val.size(0) % num_classes == 0: 95 | new_val = val.reshape(num_classes, -1)[:num_classes - 1] 96 | new_val = new_val.reshape(-1) 97 | # agnostic 98 | else: 99 | new_val = val 100 | # weight 101 | else: 102 | out_channels, in_channels = val.shape[:2] 103 | # fc_reg | rpn_reg 104 | if out_channels % num_classes == 0: 105 | new_val = val.reshape(num_classes, -1, in_channels, 106 | *val.shape[2:])[1:] 107 | new_val = new_val.reshape(-1, *val.shape[1:]) 108 | # agnostic 109 | else: 110 | new_val = val 111 | 112 | return new_val 113 | 114 | 115 | def convert(in_file, out_file, num_classes): 116 | """Convert keys in checkpoints. 117 | 118 | There can be some breaking changes during the development of mmdetection, 119 | and this tool is used for upgrading checkpoints trained with old versions 120 | to the latest one. 
121 | """ 122 | checkpoint = torch.load(in_file) 123 | in_state_dict = checkpoint.pop('state_dict') 124 | out_state_dict = OrderedDict() 125 | meta_info = checkpoint['meta'] 126 | is_two_stage, is_ssd, is_retina, reg_cls_agnostic = parse_config( 127 | '#' + meta_info['config']) 128 | if meta_info['mmdet_version'] <= '0.5.3' and is_retina: 129 | upgrade_retina = True 130 | else: 131 | upgrade_retina = False 132 | 133 | # MMDetection v2.5.0 unifies the class order in RPN 134 | # if the model is trained in version<2.5.0 135 | # The RPN model should be upgraded to be used in version>=2.5.0 136 | if meta_info['mmdet_version'] < '2.5.0': 137 | upgrade_rpn = True 138 | else: 139 | upgrade_rpn = False 140 | 141 | for key, val in in_state_dict.items(): 142 | new_key = key 143 | new_val = val 144 | if is_two_stage and is_head(key): 145 | new_key = 'roi_head.{}'.format(key) 146 | 147 | # classification 148 | if upgrade_rpn: 149 | m = re.search( 150 | r'(conv_cls|retina_cls|rpn_cls|fc_cls|fcos_cls|' 151 | r'fovea_cls).(weight|bias)', new_key) 152 | else: 153 | m = re.search( 154 | r'(conv_cls|retina_cls|fc_cls|fcos_cls|' 155 | r'fovea_cls).(weight|bias)', new_key) 156 | if m is not None: 157 | print(f'reorder cls channels of {new_key}') 158 | new_val = reorder_cls_channel(val, num_classes) 159 | 160 | # regression 161 | if upgrade_rpn: 162 | m = re.search(r'(fc_reg).(weight|bias)', new_key) 163 | else: 164 | m = re.search(r'(fc_reg|rpn_reg).(weight|bias)', new_key) 165 | if m is not None and not reg_cls_agnostic: 166 | print(f'truncate regression channels of {new_key}') 167 | new_val = truncate_reg_channel(val, num_classes) 168 | 169 | # mask head 170 | m = re.search(r'(conv_logits).(weight|bias)', new_key) 171 | if m is not None: 172 | print(f'truncate mask prediction channels of {new_key}') 173 | new_val = truncate_cls_channel(val, num_classes) 174 | 175 | m = re.search(r'(cls_convs|reg_convs).\d.(weight|bias)', key) 176 | # Legacy issues in RetinaNet since V1.x 177 | # Use ConvModule instead of nn.Conv2d in RetinaNet 178 | # cls_convs.0.weight -> cls_convs.0.conv.weight 179 | if m is not None and upgrade_retina: 180 | param = m.groups()[1] 181 | new_key = key.replace(param, f'conv.{param}') 182 | out_state_dict[new_key] = val 183 | print(f'rename the name of {key} to {new_key}') 184 | continue 185 | 186 | m = re.search(r'(cls_convs).\d.(weight|bias)', key) 187 | if m is not None and is_ssd: 188 | print(f'reorder cls channels of {new_key}') 189 | new_val = reorder_cls_channel(val, num_classes) 190 | 191 | out_state_dict[new_key] = new_val 192 | checkpoint['state_dict'] = out_state_dict 193 | torch.save(checkpoint, out_file) 194 | 195 | 196 | def main(): 197 | parser = argparse.ArgumentParser(description='Upgrade model version') 198 | parser.add_argument('in_file', help='input checkpoint file') 199 | parser.add_argument('out_file', help='output checkpoint file') 200 | parser.add_argument( 201 | '--num-classes', 202 | type=int, 203 | default=81, 204 | help='number of classes of the original model') 205 | args = parser.parse_args() 206 | convert(args.in_file, args.out_file, args.num_classes) 207 | 208 | 209 | if __name__ == '__main__': 210 | main() 211 | -------------------------------------------------------------------------------- /tools/model_converters/upgrade_ssd_version.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) OpenMMLab. All rights reserved.
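# Note on this script: it upgrades SSD checkpoints from the mmdet v1.x layout,
# moving the extra feature convs under `neck.extra_layers.{i}.{j}.conv`, the L2
# normalization weight to `neck.l2_norm.weight`, and inserting a `.0` level into
# `bbox_head` keys; non-SSD configs are rejected. A usage sketch, with
# placeholder file names:
#
#   python tools/model_converters/upgrade_ssd_version.py \
#       ssd300_coco_v1.pth ssd300_coco_upgraded.pth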
2 | import argparse 3 | import tempfile 4 | from collections import OrderedDict 5 | 6 | import torch 7 | from mmcv import Config 8 | 9 | 10 | def parse_config(config_strings): 11 | temp_file = tempfile.NamedTemporaryFile() 12 | config_path = f'{temp_file.name}.py' 13 | with open(config_path, 'w') as f: 14 | f.write(config_strings) 15 | 16 | config = Config.fromfile(config_path) 17 | # check whether it is SSD 18 | if config.model.bbox_head.type != 'SSDHead': 19 | raise AssertionError('This is not a SSD model.') 20 | 21 | 22 | def convert(in_file, out_file): 23 | checkpoint = torch.load(in_file) 24 | in_state_dict = checkpoint.pop('state_dict') 25 | out_state_dict = OrderedDict() 26 | meta_info = checkpoint['meta'] 27 | parse_config('#' + meta_info['config']) 28 | for key, value in in_state_dict.items(): 29 | if 'extra' in key: 30 | layer_idx = int(key.split('.')[2]) 31 | new_key = 'neck.extra_layers.{}.{}.conv.'.format( 32 | layer_idx // 2, layer_idx % 2) + key.split('.')[-1] 33 | elif 'l2_norm' in key: 34 | new_key = 'neck.l2_norm.weight' 35 | elif 'bbox_head' in key: 36 | new_key = key[:21] + '.0' + key[21:] 37 | else: 38 | new_key = key 39 | out_state_dict[new_key] = value 40 | checkpoint['state_dict'] = out_state_dict 41 | 42 | if torch.__version__ >= '1.6': 43 | torch.save(checkpoint, out_file, _use_new_zipfile_serialization=False) 44 | else: 45 | torch.save(checkpoint, out_file) 46 | 47 | 48 | def main(): 49 | parser = argparse.ArgumentParser(description='Upgrade SSD version') 50 | parser.add_argument('in_file', help='input checkpoint file') 51 | parser.add_argument('out_file', help='output checkpoint file') 52 | 53 | args = parser.parse_args() 54 | convert(args.in_file, args.out_file) 55 | 56 | 57 | if __name__ == '__main__': 58 | main() 59 | -------------------------------------------------------------------------------- /tools/slurm_test.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | set -x 4 | 5 | PARTITION=$1 6 | JOB_NAME=$2 7 | CONFIG=$3 8 | CHECKPOINT=$4 9 | GPUS=${GPUS:-8} 10 | GPUS_PER_NODE=${GPUS_PER_NODE:-8} 11 | CPUS_PER_TASK=${CPUS_PER_TASK:-5} 12 | PY_ARGS=${@:5} 13 | SRUN_ARGS=${SRUN_ARGS:-""} 14 | 15 | PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \ 16 | srun -p ${PARTITION} \ 17 | --job-name=${JOB_NAME} \ 18 | --gres=gpu:${GPUS_PER_NODE} \ 19 | --ntasks=${GPUS} \ 20 | --ntasks-per-node=${GPUS_PER_NODE} \ 21 | --cpus-per-task=${CPUS_PER_TASK} \ 22 | --kill-on-bad-exit=1 \ 23 | ${SRUN_ARGS} \ 24 | python -u tools/test.py ${CONFIG} ${CHECKPOINT} --launcher="slurm" ${PY_ARGS} 25 | -------------------------------------------------------------------------------- /tools/slurm_train.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | set -x 4 | 5 | PARTITION=$1 6 | JOB_NAME=$2 7 | CONFIG=$3 8 | WORK_DIR=$4 9 | GPUS=${GPUS:-8} 10 | GPUS_PER_NODE=${GPUS_PER_NODE:-8} 11 | CPUS_PER_TASK=${CPUS_PER_TASK:-5} 12 | SRUN_ARGS=${SRUN_ARGS:-""} 13 | PY_ARGS=${@:5} 14 | 15 | PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \ 16 | srun -p ${PARTITION} \ 17 | --job-name=${JOB_NAME} \ 18 | --gres=gpu:${GPUS_PER_NODE} \ 19 | --ntasks=${GPUS} \ 20 | --ntasks-per-node=${GPUS_PER_NODE} \ 21 | --cpus-per-task=${CPUS_PER_TASK} \ 22 | --kill-on-bad-exit=1 \ 23 | ${SRUN_ARGS} \ 24 | python -u tools/train.py ${CONFIG} --work-dir=${WORK_DIR} --launcher="slurm" ${PY_ARGS} 25 | --------------------------------------------------------------------------------
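As a usage sketch for the Slurm launchers above: GPUS, GPUS_PER_NODE and CPUS_PER_TASK default to 8, 8 and 5, trailing arguments are forwarded to tools/train.py / tools/test.py, and the partition name, job names and work-dir/checkpoint paths below are placeholders (the --eval flag assumes the standard mmdet test.py interface):

    GPUS=8 bash tools/slurm_train.sh my_partition openinst_train \
        configs/openinst/queryinst_r50_3x_lsj_coco.py work_dirs/openinst_r50_3x

    GPUS=4 bash tools/slurm_test.sh my_partition openinst_test \
        configs/openinst/queryinst_r50_1x_coco.py \
        work_dirs/openinst_r50_3x/latest.pth --eval segm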