├── README.md
└── configs
    ├── ATSS
    │   ├── atss_r50-caffe_coco_ms_1x.py
    │   ├── atss_r50-caffe_coco_ms_2x.py
    │   └── atss_r50-caffe_coco_ms_3x.py
    └── FCOS
        ├── fcos_r50-caffe_coco_ms_1x.py
        ├── fcos_r50-caffe_coco_ms_2x.py
        └── fcos_r50-caffe_coco_ms_3x.py

/README.md:
--------------------------------------------------------------------------------
# Object Detection Made Simpler by Eliminating Heuristic NMS
PyTorch implementation of our paper "Object Detection Made Simpler by Eliminating Heuristic NMS".


## Requirements
* Python 3.7
* PyTorch 1.5.1
* [mmdetection](https://github.com/open-mmlab/mmdetection)

## Usage

The code is currently going through the company's open-source review.

## PSS for NMS-free Object Detection

#### End-to-End Training

| Model | Backbone | lr sched | mAP (COCO2017 val) | link |
| ------------ | ------------ | ------------ | ------------ | ------------ |
| FCOS | R50 | 3x | 42.0 | [model](http://ai4prz5kwonline.oss-cn-zhangjiakou.aliyuncs.com/jianchong_new%2Ffcos_r50-caffe_coco-480-800-3x.pth?Expires=1931844166&OSSAccessKeyId=FbmV29ZaCO4CLhjO&Signature=7pHiB3dhpl5lgzxJaR5i5J87dBA%3D) |
| ATSS | R50 | 3x | 42.8 | [model](http://ai4prz5kwonline.oss-cn-zhangjiakou.aliyuncs.com/jianchong_new%2Fatss_r50-caffe_coco-480-800-3x.pth?Expires=2416484320&OSSAccessKeyId=FbmV29ZaCO4CLhjO&Signature=JnsqZMLOWd6grL%2F6thoa4kITlWM%3D) |
| FCOSpss | R50 | 3x | 42.3 | |
| ATSSpss | R50 | 3x | 42.6 | |
| VFNETpss | R50 | 3x | 44.0 | |
| FCOSpss | R101 | 3x | 44.1 | |
| ATSSpss | R101 | 3x | 44.2 | |
| VFNETpss | R101 | 3x | 45.7 | |
| FCOSpss | X-101-DCN | 2x | 47.0 | |
| ATSSpss | X-101-DCN | 2x | 47.3 | |
| VFNETpss | X-101-DCN | 2x | 49.3 | |
| FCOSpss | R2N-101-DCN | 2x | 48.2 | |
| ATSSpss | R2N-101-DCN | 2x | 48.6 | |
| VFNETpss | R2N-101-DCN | 2x | 50.0 | |

**NOTE:** All models are trained with a multi-scale
training schedule of ‘[480, 800] $\times$ 1333’.

#### Two-Step Training
If a pretrained model is available, finetuning only the PSS head saves training time.

| Model | Backbone | MS Training | lr sched | mAP, pretrained model (w/ NMS) | mAP, finetuned PSS (w/o NMS) |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| GFocalV2pss | R50 | Yes | 12 | [43.9](https://github.com/implus/GFocalV2 "43.9") | 43.3 |
| GFocalV2pss | X-101-32x4d-DCN | Yes | 12 | [48.8](https://github.com/implus/GFocalV2 "48.8") | 48.2 |
| GFocalV2pss | R2N-101-DCN | Yes | 12 | [49.9](https://github.com/implus/GFocalV2 "49.9") | 49.2 |



## PSS for NMS-free Instance Segmentation

#### End-to-End Training

| Model | Backbone | lr sched | bbox mAP | segm mAP | link |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| [CondInst](https://github.com/aim-uofa/AdelaiDet/tree/master/configs/CondInst) | R50 | 3x | 41.9 | 37.5 | |
| CondInstpss | R50 | 3x | 41.2 | 36.7 | |
| [CondInst + sem](https://github.com/aim-uofa/AdelaiDet/tree/master/configs/CondInst) | R50 | 3x | 42.6 | 38.2 | |
| CondInstpss + sem | R50 | 3x | 42.3 | 37.7 | |
| [CondInst](https://github.com/aim-uofa/AdelaiDet/tree/master/configs/CondInst) | R101 | 3x | 43.3 | 38.6 | |
| CondInstpss | R101 | 3x | 43.1 | 38.2 | |
| [CondInst + sem](https://github.com/aim-uofa/AdelaiDet/tree/master/configs/CondInst) | R101 | 3x | 44.6 | 39.8 | |
| CondInstpss + sem | R101 | 3x | 44.1 | 39.3 | |

**NOTE:** All models are trained with a multi-scale training schedule of ‘[640, 800] $\times$ 1333’.

## Citation
If you use this package in your research, please cite our paper:
```
@misc{zhou2021object,
      title={Object Detection Made Simpler by Eliminating Heuristic NMS},
      author={Qiang Zhou and Chaohui Yu and Chunhua Shen and Zhibin Wang and Hao Li},
      year={2021},
      eprint={2101.11782},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```

## PS
Our team is hiring on an ongoing basis: experienced hires, interns, and new graduates are all welcome!
Résumés are welcome; email: `zhouqiang@zju.edu.cn`

--------------------------------------------------------------------------------
/configs/ATSS/atss_r50-caffe_coco_ms_1x.py:
--------------------------------------------------------------------------------
samples_per_gpu = 2
num_gpus = 8
num_classes = 80

dataset_type = 'CocoDataset'
data_root = 'datasets/coco/coco_2017/'

img_norm_cfg = dict(
    mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], to_rgb=False)  # for r50-caffe
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize',
         img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
                    (608, 1333), (640, 1333), (672, 1333), (704, 1333),
                    (736, 1333), (768, 1333), (800, 1333)],
         multiscale_mode='value',
         keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]

data = dict(
    samples_per_gpu=samples_per_gpu,
    workers_per_gpu=samples_per_gpu,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline)
)
evaluation = dict(interval=1, metric='bbox')


model = dict(
    type='ATSS',
    pretrained='open-mmlab://detectron/resnet50_caffe',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='caffe'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs='on_output',
        num_outs=5),
    bbox_head=dict(
        type='ATSSHead',
        num_classes=num_classes,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            ratios=[1.0],
            octave_base_scale=8,
            scales_per_octave=1,
            strides=[8, 16, 32, 64, 128]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[.0, .0, .0, .0],
            target_stds=[0.1, 0.1, 0.2, 0.2]),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
        loss_centerness=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)))
# training and testing settings
train_cfg = dict(
    assigner=dict(type='ATSSAssigner', topk=9),
    allowed_border=-1,
    pos_weight=-1,
    debug=False)
test_cfg = dict(
    nms_pre=1000,
    min_bbox_size=0,
    score_thr=0.05,
    nms=dict(type='nms', iou_thr=0.6),
    max_per_img=100)

# optimizer
lr = samples_per_gpu * num_gpus / 16 * 0.01
optimizer = dict(type='SGD', lr=lr, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))


# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
total_epochs = 12

checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]

find_unused_parameters = True

--------------------------------------------------------------------------------
/configs/ATSS/atss_r50-caffe_coco_ms_2x.py:
--------------------------------------------------------------------------------
samples_per_gpu = 2
num_gpus = 8
num_classes = 80

dataset_type = 'CocoDataset'
data_root = 'datasets/coco/coco_2017/'

img_norm_cfg = dict(
    mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], to_rgb=False)  # for r50-caffe
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize',
         img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
                    (608, 1333), (640, 1333), (672, 1333), (704, 1333),
                    (736, 1333), (768, 1333), (800, 1333)],
         multiscale_mode='value',
         keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad',
                 size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]

data = dict(
    samples_per_gpu=samples_per_gpu,
    workers_per_gpu=samples_per_gpu,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline)
)
evaluation = dict(interval=1, metric='bbox')


model = dict(
    type='ATSS',
    pretrained='open-mmlab://detectron/resnet50_caffe',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='caffe'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs='on_output',
        num_outs=5),
    bbox_head=dict(
        type='ATSSHead',
        num_classes=num_classes,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            ratios=[1.0],
            octave_base_scale=8,
            scales_per_octave=1,
            strides=[8, 16, 32, 64, 128]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[.0, .0, .0, .0],
            target_stds=[0.1, 0.1, 0.2, 0.2]),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
        loss_centerness=dict(
            type='CrossEntropyLoss', use_sigmoid=True,
            loss_weight=1.0)))
# training and testing settings
train_cfg = dict(
    assigner=dict(type='ATSSAssigner', topk=9),
    allowed_border=-1,
    pos_weight=-1,
    debug=False)
test_cfg = dict(
    nms_pre=1000,
    min_bbox_size=0,
    score_thr=0.05,
    nms=dict(type='nms', iou_thr=0.6),
    max_per_img=100)

# optimizer
lr = samples_per_gpu * num_gpus / 16 * 0.01
optimizer = dict(type='SGD', lr=lr, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))


# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[18, 22])
total_epochs = 24

checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]

find_unused_parameters = True

--------------------------------------------------------------------------------
/configs/ATSS/atss_r50-caffe_coco_ms_3x.py:
--------------------------------------------------------------------------------
samples_per_gpu = 2
num_gpus = 8
num_classes = 80

dataset_type = 'CocoDataset'
data_root = 'datasets/coco/coco_2017/'

img_norm_cfg = dict(
    mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], to_rgb=False)  # for r50-caffe
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize',
         img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
                    (608, 1333), (640, 1333), (672, 1333), (704, 1333),
                    (736, 1333), (768, 1333), (800, 1333)],
         multiscale_mode='value',
         keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]

data = dict(
    samples_per_gpu=samples_per_gpu,
    workers_per_gpu=samples_per_gpu,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline)
)
evaluation = dict(interval=1, metric='bbox')


model = dict(
    type='ATSS',
    pretrained='open-mmlab://detectron/resnet50_caffe',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='caffe'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs='on_output',
        num_outs=5),
    bbox_head=dict(
        type='ATSSHead',
        num_classes=num_classes,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            ratios=[1.0],
            octave_base_scale=8,
            scales_per_octave=1,
            strides=[8, 16, 32, 64, 128]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[.0, .0, .0, .0],
            target_stds=[0.1, 0.1, 0.2, 0.2]),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
        loss_centerness=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)))
# training and testing settings
train_cfg = dict(
    assigner=dict(type='ATSSAssigner', topk=9),
    allowed_border=-1,
    pos_weight=-1,
    debug=False)
test_cfg = dict(
    nms_pre=1000,
    min_bbox_size=0,
    score_thr=0.05,
    nms=dict(type='nms', iou_thr=0.6),
    max_per_img=100)

# optimizer
lr = samples_per_gpu * num_gpus / 16 * 0.01
optimizer = dict(type='SGD', lr=lr, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))


# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[27, 33])
total_epochs = 36

checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]

find_unused_parameters = True

--------------------------------------------------------------------------------
/configs/FCOS/fcos_r50-caffe_coco_ms_1x.py:
--------------------------------------------------------------------------------
samples_per_gpu = 2
num_gpus = 8
num_classes = 80

dataset_type = 'CocoDataset'
data_root = 'datasets/coco/coco_2017/'

img_norm_cfg = dict(
    mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], to_rgb=False)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize',
         img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
                    (608, 1333), (640, 1333), (672, 1333), (704, 1333),
                    (736, 1333), (768, 1333), (800, 1333)],
         multiscale_mode='value',
         keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]

data = dict(
    samples_per_gpu=samples_per_gpu,
    workers_per_gpu=samples_per_gpu,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline)
)
evaluation = dict(interval=1, metric='bbox')

model = dict(
    type='FCOS',
    pretrained='open-mmlab://detectron/resnet50_caffe',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='caffe'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs=True,
        extra_convs_on_inputs=False,
        num_outs=5,
        relu_before_extra_convs=True),

    bbox_head=dict(
        type='FCOSHead',
        num_classes=num_classes,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        strides=[8, 16, 32, 64, 128],
        regress_ranges=((-1, 64), (64, 128), (128, 256), (256, 512),
                        (512, 10000)),
        center_sampling=True,
        center_sample_radius=1.5,
        norm_on_bbox=True,
        centerness_on_reg=True,
        norm_cfg=dict(type='GN', num_groups=32, requires_grad=True),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
        loss_centerness=dict(
            type='CrossEntropyLoss',
            use_sigmoid=True,
            loss_weight=1.0),
    ))

# training and testing settings
train_cfg = dict()
test_cfg = dict(
    nms_pre=1000,
    min_bbox_size=0,
    score_thr=0.05,
    nms=dict(type='nms', iou_thr=0.6),
    max_per_img=100)


# optimizer
lr = samples_per_gpu * num_gpus / 16 * 0.01
optimizer = dict(type='SGD', lr=lr, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))

# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
total_epochs = 12

checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]

find_unused_parameters = True

--------------------------------------------------------------------------------
/configs/FCOS/fcos_r50-caffe_coco_ms_2x.py:
--------------------------------------------------------------------------------
samples_per_gpu = 2
num_gpus = 8
num_classes = 80

dataset_type = 'CocoDataset'
data_root = 'datasets/coco/coco_2017/'

img_norm_cfg = dict(
    mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], to_rgb=False)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize',
         img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
                    (608, 1333), (640, 1333), (672, 1333), (704, 1333),
                    (736, 1333), (768, 1333), (800, 1333)],
         multiscale_mode='value',
         keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]

data = dict(
    samples_per_gpu=samples_per_gpu,
    workers_per_gpu=samples_per_gpu,
    train=dict(
        type=dataset_type,
        ann_file=data_root +
        'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline)
)
evaluation = dict(interval=1, metric='bbox')

model = dict(
    type='FCOS',
    pretrained='open-mmlab://detectron/resnet50_caffe',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='caffe'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs=True,
        extra_convs_on_inputs=False,
        num_outs=5,
        relu_before_extra_convs=True),

    bbox_head=dict(
        type='FCOSHead',
        num_classes=num_classes,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        strides=[8, 16, 32, 64, 128],
        regress_ranges=((-1, 64), (64, 128), (128, 256), (256, 512),
                        (512, 10000)),
        center_sampling=True,
        center_sample_radius=1.5,
        norm_on_bbox=True,
        centerness_on_reg=True,
        norm_cfg=dict(type='GN', num_groups=32, requires_grad=True),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
        loss_centerness=dict(
            type='CrossEntropyLoss',
            use_sigmoid=True,
            loss_weight=1.0),
    ))

# training and testing settings
train_cfg = dict()
test_cfg = dict(
    nms_pre=1000,
    min_bbox_size=0,
    score_thr=0.05,
    nms=dict(type='nms',
             iou_thr=0.6),
    max_per_img=100)


# optimizer
lr = samples_per_gpu * num_gpus / 16 * 0.01
optimizer = dict(type='SGD', lr=lr, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))

# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[18, 22])
total_epochs = 24

checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]

find_unused_parameters = True

--------------------------------------------------------------------------------
/configs/FCOS/fcos_r50-caffe_coco_ms_3x.py:
--------------------------------------------------------------------------------
samples_per_gpu = 2
num_gpus = 8
num_classes = 80

dataset_type = 'CocoDataset'
data_root = 'datasets/coco/coco_2017/'

img_norm_cfg = dict(
    mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], to_rgb=False)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize',
         img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
                    (608, 1333), (640, 1333), (672, 1333), (704, 1333),
                    (736, 1333), (768, 1333), (800, 1333)],
         multiscale_mode='value',
         keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]

data = dict(
    samples_per_gpu=samples_per_gpu,
    workers_per_gpu=samples_per_gpu,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline)
)
evaluation = dict(interval=1, metric='bbox')

model = dict(
    type='FCOS',
    pretrained='open-mmlab://detectron/resnet50_caffe',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True,
        style='caffe'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs=True,
        extra_convs_on_inputs=False,
        num_outs=5,
        relu_before_extra_convs=True),

    bbox_head=dict(
        type='FCOSHead',
        num_classes=num_classes,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        strides=[8, 16, 32, 64, 128],
        regress_ranges=((-1, 64), (64, 128), (128, 256), (256, 512),
                        (512, 10000)),
        center_sampling=True,
        center_sample_radius=1.5,
        norm_on_bbox=True,
        centerness_on_reg=True,
        norm_cfg=dict(type='GN', num_groups=32, requires_grad=True),
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
        loss_centerness=dict(
            type='CrossEntropyLoss',
            use_sigmoid=True,
            loss_weight=1.0),
    ))

# training and testing settings
train_cfg = dict()
test_cfg = dict(
    nms_pre=1000,
    min_bbox_size=0,
    score_thr=0.05,
    nms=dict(type='nms', iou_thr=0.6),
    max_per_img=100)


# optimizer
lr = samples_per_gpu * num_gpus / 16 * 0.01
optimizer = dict(type='SGD', lr=lr, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))

# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[27, 33])
total_epochs = 36

checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]

find_unused_parameters = True

--------------------------------------------------------------------------------
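
Note on the schedules: within each family, the 1x, 2x and 3x configs above differ in `lr_config.step` and `total_epochs`, while the base learning rate comes from `samples_per_gpu * num_gpus / 16 * 0.01`. The sketch below is not part of the repository; it only reproduces that arithmetic in plain Python so the schedules can be compared side by side. The names `SCHEDULES`, `base_lr` and `lr_at` are hypothetical. The decay factor `gamma=0.1` is not set in the configs and is assumed here, as are 0-indexed epochs and the exact linear warmup formula.

```python
# Illustrative sketch only; these helpers are hypothetical and not part of the repo.

# (decay epochs, total epochs) copied from the lr_config / total_epochs settings above.
SCHEDULES = {
    "1x": ([8, 11], 12),
    "2x": ([18, 22], 24),
    "3x": ([27, 33], 36),
}


def base_lr(samples_per_gpu=2, num_gpus=8):
    """Linear scaling used in the configs: 0.01 for a total batch of 16."""
    return samples_per_gpu * num_gpus / 16 * 0.01


def lr_at(epoch, cur_iter, steps, lr, warmup_iters=500,
          warmup_ratio=1.0 / 3, gamma=0.1):
    """Learning rate at a 0-indexed epoch and a global iteration count.

    gamma=0.1 is an assumption; the configs do not set it explicitly.
    """
    # Step policy: scale by gamma once for every decay epoch already reached.
    decayed = lr * gamma ** sum(epoch >= s for s in steps)
    if cur_iter < warmup_iters:
        # Linear warmup: ramp from warmup_ratio * lr up to lr.
        k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
        return decayed * (1 - k)
    return decayed


if __name__ == "__main__":
    for name, (steps, total_epochs) in SCHEDULES.items():
        rates = [lr_at(e, 10_000, steps, base_lr()) for e in range(total_epochs)]
        print(name, f"first={rates[0]:.4g}", f"last={rates[-1]:.4g}")
```

With the default 8 GPUs and 2 images per GPU the base rate is 0.01; halving the GPU count halves it, which is the point of keeping `num_gpus` as a variable in the configs.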