├── README.md
└── configs
    ├── ATSS
    │   ├── atss_r50-caffe_coco_ms_1x.py
    │   ├── atss_r50-caffe_coco_ms_2x.py
    │   └── atss_r50-caffe_coco_ms_3x.py
    └── FCOS
        ├── fcos_r50-caffe_coco_ms_1x.py
        ├── fcos_r50-caffe_coco_ms_2x.py
        └── fcos_r50-caffe_coco_ms_3x.py
/README.md:
--------------------------------------------------------------------------------
1 | # Object Detection Made Simpler by Eliminating Heuristic NMS
2 | PyTorch Implementation for Our Paper: "Object Detection Made Simpler by Eliminating Heuristic NMS"
3 |
4 |
5 | ## Requirements
6 | * Python 3.7
7 | * PyTorch 1.5.1
8 | * [mmdetection](https://github.com/open-mmlab/mmdetection)
9 |
10 | ## Usage:
11 |
12 | The code is currently undergoing the company's open-source review.
13 |
14 | ## PSS for NMS-free Object Detection
15 |
16 | #### End-to-End Training
17 |
18 | | Model | Backbone | lr sched | mAP (COCO2017 val) | link |
19 | | ------------ | ------------ |------------ |------------ |------------ |
20 | | FCOS | R50 | 3x | 42.0 | [model](http://ai4prz5kwonline.oss-cn-zhangjiakou.aliyuncs.com/jianchong_new%2Ffcos_r50-caffe_coco-480-800-3x.pth?Expires=1931844166&OSSAccessKeyId=FbmV29ZaCO4CLhjO&Signature=7pHiB3dhpl5lgzxJaR5i5J87dBA%3D) |
21 | | ATSS | R50 | 3x | 42.8 | [model](http://ai4prz5kwonline.oss-cn-zhangjiakou.aliyuncs.com/jianchong_new%2Fatss_r50-caffe_coco-480-800-3x.pth?Expires=2416484320&OSSAccessKeyId=FbmV29ZaCO4CLhjO&Signature=JnsqZMLOWd6grL%2F6thoa4kITlWM%3D) |
22 | | FCOSpss | R50 | 3x | 42.3 | |
23 | | ATSSpss | R50 | 3x | 42.6 | |
24 | | VFNETpss | R50 | 3x | 44.0 | |
25 | | FCOSpss | R101 | 3x | 44.1 | |
26 | | ATSSpss | R101 | 3x | 44.2 | |
27 | | VFNETpss | R101 | 3x | 45.7 | |
28 | | FCOSpss | X-101-DCN | 2x | 47.0 | |
29 | | ATSSpss | X-101-DCN | 2x | 47.3 | |
30 | | VFNETpss | X-101-DCN | 2x | 49.3 | |
31 | | FCOSpss | R2N-101-DCN | 2x | 48.2 | |
32 | | ATSSpss | R2N-101-DCN | 2x | 48.6 | |
33 | | VFNETpss | R2N-101-DCN | 2x | 50.0 | |
34 |
35 | **NOTE:** All models are trained with multi-scale training: the shorter image side is sampled from [480, 800] and the longer side is capped at 1333 (‘[480, 800] $\times$ 1333’).
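
The multi-scale schedule in the note above can be sketched in a few lines of plain Python. This is only an illustration of keep-ratio resizing under stated assumptions, not the library's implementation: the helper names `rescaled_size` and `sample_train_size` are hypothetical, and the short-side candidates are taken from the `img_scale` lists in the configs of this repository.

```python
import random

# Short-side candidates listed in the configs' `img_scale`: 480, 512, ..., 800.
SHORT_SIDES = list(range(480, 801, 32))
MAX_LONG_SIDE = 1333


def rescaled_size(width, height, short_side, max_long_side=MAX_LONG_SIDE):
    """Return (new_width, new_height) while keeping the aspect ratio.

    The image is scaled so its short side reaches `short_side`, unless that
    would push the long side past `max_long_side`; then the cap wins.
    (Hypothetical helper, written for illustration only.)
    """
    scale = min(short_side / min(width, height),
                max_long_side / max(width, height))
    return int(width * scale + 0.5), int(height * scale + 0.5)


def sample_train_size(width, height, rng=random):
    """Pick one short-side candidate at random for a training image."""
    return rescaled_size(width, height, rng.choice(SHORT_SIDES))


print(rescaled_size(640, 480, 800))   # short side reaches 800
print(rescaled_size(1000, 300, 800))  # long side is capped at 1333
```

At test time the configs use the single scale `(1333, 800)`, which corresponds to calling `rescaled_size(width, height, 800)` in this sketch.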
36 |
37 | #### Two-Step Training
38 | Given a pretrained model, fine-tuning only the PSS head saves training time.
39 |
40 | | Model | Backbone | MS Training | lr sched | mAP pretrained model (w/ NMS) | mAP finetuned PSS (w/o NMS) |
41 | | ------------ | ------------ | ------------ | ------------ | ------------ |------------ |
42 | | GFocalV2pss | R50 | Yes | 12 | [43.9](https://github.com/implus/GFocalV2 "43.9") |43.3 |
43 | | GFocalV2pss | X-101-32x4d-DCN | Yes | 12 | [48.8](https://github.com/implus/GFocalV2 "48.8") | 48.2 |
44 | | GFocalV2pss | R2N-101-DCN | Yes | 12 | [49.9](https://github.com/implus/GFocalV2 "49.9") | 49.2 |
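
The two-step recipe amounts to freezing every pretrained parameter and leaving only the PSS head trainable. The sketch below shows one way this could be done by parameter name. It is an assumption-laden illustration: the PSS code has not been released, so the keyword `"pss"` and the parameter names in the demo are hypothetical, and the stand-in objects only mimic the `requires_grad` attribute of PyTorch parameters.

```python
from types import SimpleNamespace


def freeze_all_but(named_params, trainable_keyword):
    """Leave only parameters whose name contains `trainable_keyword` trainable.

    `named_params` is an iterable of (name, param) pairs; with PyTorch one
    would pass `model.named_parameters()`. Returns the trainable names.
    """
    trainable = []
    for name, param in named_params:
        param.requires_grad = trainable_keyword in name
        if param.requires_grad:
            trainable.append(name)
    return trainable


# Stand-in parameters so the example runs without a model checkpoint.
# All names below are hypothetical.
params = [
    ("backbone.conv1.weight", SimpleNamespace(requires_grad=True)),
    ("bbox_head.cls_convs.0.weight", SimpleNamespace(requires_grad=True)),
    ("bbox_head.pss_conv.weight", SimpleNamespace(requires_grad=True)),
]
print(freeze_all_but(params, "pss"))
```

With a real model, the optimizer would then be built only from the parameters that still have `requires_grad` set, so the pretrained detector stays unchanged during fine-tuning.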
45 |
46 |
47 |
48 | ## PSS for NMS-free Instance Segmentation
49 |
50 | #### End-to-End Training
51 |
52 | | Model | Backbone | lr sched | bbox mAP | segm mAP | link |
53 | | ------------ | ------------ | ------------ |------------ |------------ |------------ |
54 | | [CondInst](https://github.com/aim-uofa/AdelaiDet/tree/master/configs/CondInst) | R50 | 3x | 41.9 | 37.5 | |
55 | | CondInstpss | R50 | 3x | 41.2 | 36.7 | |
56 | | [CondInst + sem](https://github.com/aim-uofa/AdelaiDet/tree/master/configs/CondInst) | R50 | 3x | 42.6 | 38.2 | |
57 | | CondInstpss + sem | R50 | 3x | 42.3 | 37.7 | |
58 | | [CondInst](https://github.com/aim-uofa/AdelaiDet/tree/master/configs/CondInst) | R101 | 3x | 43.3 | 38.6 | |
59 | | CondInstpss | R101 | 3x | 43.1 | 38.2 | |
60 | | [CondInst + sem](https://github.com/aim-uofa/AdelaiDet/tree/master/configs/CondInst) | R101 | 3x | 44.6 | 39.8 | |
61 | | CondInstpss + sem | R101 | 3x | 44.1 | 39.3 | |
62 |
63 | **NOTE:** All models are trained with multi-scale training: the shorter image side is sampled from [640, 800] and the longer side is capped at 1333 (‘[640, 800] $\times$ 1333’).
64 |
65 | ## Citation
66 | If you use the package in your research, please cite our paper:
67 | ```
68 | @misc{zhou2021object,
69 | title={Object Detection Made Simpler by Eliminating Heuristic NMS},
70 | author={Qiang Zhou and Chaohui Yu and Chunhua Shen and Zhibin Wang and Hao Li},
71 | year={2021},
72 | eprint={2101.11782},
73 | archivePrefix={arXiv},
74 | primaryClass={cs.CV}
75 | }
76 | ```
77 |
78 | ## PS
79 | Our team is hiring on an ongoing basis: experienced hires, interns, and new graduates are all welcome!
80 | Please send your résumé to: `zhouqiang@zju.edu.cn`
81 |
--------------------------------------------------------------------------------
/configs/ATSS/atss_r50-caffe_coco_ms_1x.py:
--------------------------------------------------------------------------------
1 | samples_per_gpu = 2
2 | num_gpus = 8
3 | num_classes = 80
4 |
5 | dataset_type = 'CocoDataset'
6 | data_root = 'datasets/coco/coco_2017/'
7 |
8 | img_norm_cfg = dict(
9 | mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], to_rgb=False) # for r50-caffe
10 | train_pipeline = [
11 | dict(type='LoadImageFromFile'),
12 | dict(type='LoadAnnotations', with_bbox=True),
13 | dict(type='Resize',
14 | img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
15 | (608, 1333), (640, 1333), (672, 1333), (704, 1333),
16 | (736, 1333), (768, 1333), (800, 1333)],
17 | multiscale_mode='value',
18 | keep_ratio=True),
19 | dict(type='RandomFlip', flip_ratio=0.5),
20 | dict(type='Normalize', **img_norm_cfg),
21 | dict(type='Pad', size_divisor=32),
22 | dict(type='DefaultFormatBundle'),
23 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
24 | ]
25 | test_pipeline = [
26 | dict(type='LoadImageFromFile'),
27 | dict(
28 | type='MultiScaleFlipAug',
29 | img_scale=(1333, 800),
30 | flip=False,
31 | transforms=[
32 | dict(type='Resize', keep_ratio=True),
33 | dict(type='RandomFlip'),
34 | dict(type='Normalize', **img_norm_cfg),
35 | dict(type='Pad', size_divisor=32),
36 | dict(type='ImageToTensor', keys=['img']),
37 | dict(type='Collect', keys=['img']),
38 | ])
39 | ]
40 |
41 | data = dict(
42 | samples_per_gpu=samples_per_gpu,
43 | workers_per_gpu=samples_per_gpu,
44 | train=dict(
45 | type=dataset_type,
46 | ann_file=data_root + 'annotations/instances_train2017.json',
47 | img_prefix=data_root + 'train2017/',
48 | pipeline=train_pipeline),
49 | val=dict(
50 | type=dataset_type,
51 | ann_file=data_root + 'annotations/instances_val2017.json',
52 | img_prefix=data_root + 'val2017/',
53 | pipeline=test_pipeline),
54 | test=dict(
55 | type=dataset_type,
56 | ann_file=data_root + 'annotations/instances_val2017.json',
57 | img_prefix=data_root + 'val2017/',
58 | pipeline=test_pipeline)
59 | )
60 | evaluation = dict(interval=1, metric='bbox')
61 |
62 |
63 | model = dict(
64 | type='ATSS',
65 | pretrained='open-mmlab://detectron/resnet50_caffe',
66 | backbone=dict(
67 | type='ResNet',
68 | depth=50,
69 | num_stages=4,
70 | out_indices=(0, 1, 2, 3),
71 | frozen_stages=1,
72 | norm_cfg=dict(type='BN', requires_grad=False),
73 | norm_eval=True,
74 | style='caffe'),
75 | neck=dict(
76 | type='FPN',
77 | in_channels=[256, 512, 1024, 2048],
78 | out_channels=256,
79 | start_level=1,
80 | add_extra_convs='on_output',
81 | num_outs=5),
82 | bbox_head=dict(
83 | type='ATSSHead',
84 | num_classes=num_classes,
85 | in_channels=256,
86 | stacked_convs=4,
87 | feat_channels=256,
88 | anchor_generator=dict(
89 | type='AnchorGenerator',
90 | ratios=[1.0],
91 | octave_base_scale=8,
92 | scales_per_octave=1,
93 | strides=[8, 16, 32, 64, 128]),
94 | bbox_coder=dict(
95 | type='DeltaXYWHBBoxCoder',
96 | target_means=[.0, .0, .0, .0],
97 | target_stds=[0.1, 0.1, 0.2, 0.2]),
98 | loss_cls=dict(
99 | type='FocalLoss',
100 | use_sigmoid=True,
101 | gamma=2.0,
102 | alpha=0.25,
103 | loss_weight=1.0),
104 | loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
105 | loss_centerness=dict(
106 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)))
107 | # training and testing settings
108 | train_cfg = dict(
109 | assigner=dict(type='ATSSAssigner', topk=9),
110 | allowed_border=-1,
111 | pos_weight=-1,
112 | debug=False)
113 | test_cfg = dict(
114 | nms_pre=1000,
115 | min_bbox_size=0,
116 | score_thr=0.05,
117 | nms=dict(type='nms', iou_thr=0.6),
118 | max_per_img=100)
119 |
120 | # optimizer
121 | lr = samples_per_gpu * num_gpus / 16 * 0.01
122 | optimizer = dict(type='SGD', lr=lr, momentum=0.9, weight_decay=0.0001)
123 | optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
124 |
125 |
126 | # learning policy
127 | lr_config = dict(
128 | policy='step',
129 | warmup='linear',
130 | warmup_iters=500,
131 | warmup_ratio=1.0 / 3,
132 | step=[8, 11])
133 | total_epochs = 12
134 |
135 | checkpoint_config = dict(interval=1)
136 | # yapf:disable
137 | log_config = dict(
138 | interval=50,
139 | hooks=[
140 | dict(type='TextLoggerHook'),
141 | # dict(type='TensorboardLoggerHook')
142 | ])
143 | # yapf:enable
144 | dist_params = dict(backend='nccl')
145 | log_level = 'INFO'
146 | load_from = None
147 | resume_from = None
148 | workflow = [('train', 1)]
149 |
150 | find_unused_parameters=True
151 |
--------------------------------------------------------------------------------
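
The optimizer and learning-policy settings in the config above can be traced by hand with a small sketch. It reproduces the `lr` expression from the config and a step schedule with linear warmup. Two things are assumptions rather than values from the config: the decay factor `gamma=0.1`, which the config does not set explicitly, and the exact warmup formula, which follows a common linear-warmup definition. The function names are hypothetical.

```python
def base_lr(samples_per_gpu=2, num_gpus=8):
    """Linear scaling as written in the config: 0.01 per total batch of 16."""
    return samples_per_gpu * num_gpus / 16 * 0.01


def lr_at(iteration, epoch, lr, steps=(8, 11), warmup_iters=500,
          warmup_ratio=1.0 / 3, gamma=0.1):
    """Learning rate at a global iteration and zero-based epoch.

    `gamma` and the warmup formula are assumptions (see the text above).
    """
    # Multiply by gamma once for every step boundary already reached.
    decayed = lr * gamma ** sum(epoch >= s for s in steps)
    if iteration < warmup_iters:
        # Ramp linearly from lr * warmup_ratio up to the full lr.
        k = (1 - iteration / warmup_iters) * (1 - warmup_ratio)
        return decayed * (1 - k)
    return decayed


lr = base_lr()
for it, ep in [(0, 0), (250, 0), (500, 0), (60000, 8), (90000, 11)]:
    print(it, ep, lr_at(it, ep, lr))
```

The 2x and 3x variants change only `step` and `total_epochs`, so passing `steps=(18, 22)` or `steps=(27, 33)` covers them in this sketch.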
/configs/ATSS/atss_r50-caffe_coco_ms_2x.py:
--------------------------------------------------------------------------------
1 | samples_per_gpu = 2
2 | num_gpus = 8
3 | num_classes = 80
4 |
5 | dataset_type = 'CocoDataset'
6 | data_root = 'datasets/coco/coco_2017/'
7 |
8 | img_norm_cfg = dict(
9 | mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], to_rgb=False) # for r50-caffe
10 | train_pipeline = [
11 | dict(type='LoadImageFromFile'),
12 | dict(type='LoadAnnotations', with_bbox=True),
13 | dict(type='Resize',
14 | img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
15 | (608, 1333), (640, 1333), (672, 1333), (704, 1333),
16 | (736, 1333), (768, 1333), (800, 1333)],
17 | multiscale_mode='value',
18 | keep_ratio=True),
19 | dict(type='RandomFlip', flip_ratio=0.5),
20 | dict(type='Normalize', **img_norm_cfg),
21 | dict(type='Pad', size_divisor=32),
22 | dict(type='DefaultFormatBundle'),
23 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
24 | ]
25 | test_pipeline = [
26 | dict(type='LoadImageFromFile'),
27 | dict(
28 | type='MultiScaleFlipAug',
29 | img_scale=(1333, 800),
30 | flip=False,
31 | transforms=[
32 | dict(type='Resize', keep_ratio=True),
33 | dict(type='RandomFlip'),
34 | dict(type='Normalize', **img_norm_cfg),
35 | dict(type='Pad', size_divisor=32),
36 | dict(type='ImageToTensor', keys=['img']),
37 | dict(type='Collect', keys=['img']),
38 | ])
39 | ]
40 |
41 | data = dict(
42 | samples_per_gpu=samples_per_gpu,
43 | workers_per_gpu=samples_per_gpu,
44 | train=dict(
45 | type=dataset_type,
46 | ann_file=data_root + 'annotations/instances_train2017.json',
47 | img_prefix=data_root + 'train2017/',
48 | pipeline=train_pipeline),
49 | val=dict(
50 | type=dataset_type,
51 | ann_file=data_root + 'annotations/instances_val2017.json',
52 | img_prefix=data_root + 'val2017/',
53 | pipeline=test_pipeline),
54 | test=dict(
55 | type=dataset_type,
56 | ann_file=data_root + 'annotations/instances_val2017.json',
57 | img_prefix=data_root + 'val2017/',
58 | pipeline=test_pipeline)
59 | )
60 | evaluation = dict(interval=1, metric='bbox')
61 |
62 |
63 | model = dict(
64 | type='ATSS',
65 | pretrained='open-mmlab://detectron/resnet50_caffe',
66 | backbone=dict(
67 | type='ResNet',
68 | depth=50,
69 | num_stages=4,
70 | out_indices=(0, 1, 2, 3),
71 | frozen_stages=1,
72 | norm_cfg=dict(type='BN', requires_grad=False),
73 | norm_eval=True,
74 | style='caffe'),
75 | neck=dict(
76 | type='FPN',
77 | in_channels=[256, 512, 1024, 2048],
78 | out_channels=256,
79 | start_level=1,
80 | add_extra_convs='on_output',
81 | num_outs=5),
82 | bbox_head=dict(
83 | type='ATSSHead',
84 | num_classes=num_classes,
85 | in_channels=256,
86 | stacked_convs=4,
87 | feat_channels=256,
88 | anchor_generator=dict(
89 | type='AnchorGenerator',
90 | ratios=[1.0],
91 | octave_base_scale=8,
92 | scales_per_octave=1,
93 | strides=[8, 16, 32, 64, 128]),
94 | bbox_coder=dict(
95 | type='DeltaXYWHBBoxCoder',
96 | target_means=[.0, .0, .0, .0],
97 | target_stds=[0.1, 0.1, 0.2, 0.2]),
98 | loss_cls=dict(
99 | type='FocalLoss',
100 | use_sigmoid=True,
101 | gamma=2.0,
102 | alpha=0.25,
103 | loss_weight=1.0),
104 | loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
105 | loss_centerness=dict(
106 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)))
107 | # training and testing settings
108 | train_cfg = dict(
109 | assigner=dict(type='ATSSAssigner', topk=9),
110 | allowed_border=-1,
111 | pos_weight=-1,
112 | debug=False)
113 | test_cfg = dict(
114 | nms_pre=1000,
115 | min_bbox_size=0,
116 | score_thr=0.05,
117 | nms=dict(type='nms', iou_thr=0.6),
118 | max_per_img=100)
119 |
120 | # optimizer
121 | lr = samples_per_gpu * num_gpus / 16 * 0.01
122 | optimizer = dict(type='SGD', lr=lr, momentum=0.9, weight_decay=0.0001)
123 | optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
124 |
125 |
126 | # learning policy
127 | lr_config = dict(
128 | policy='step',
129 | warmup='linear',
130 | warmup_iters=500,
131 | warmup_ratio=1.0 / 3,
132 | step=[18, 22])
133 | total_epochs = 24
134 |
135 | checkpoint_config = dict(interval=1)
136 | # yapf:disable
137 | log_config = dict(
138 | interval=50,
139 | hooks=[
140 | dict(type='TextLoggerHook'),
141 | # dict(type='TensorboardLoggerHook')
142 | ])
143 | # yapf:enable
144 | dist_params = dict(backend='nccl')
145 | log_level = 'INFO'
146 | load_from = None
147 | resume_from = None
148 | workflow = [('train', 1)]
149 |
150 | find_unused_parameters=True
151 |
--------------------------------------------------------------------------------
/configs/ATSS/atss_r50-caffe_coco_ms_3x.py:
--------------------------------------------------------------------------------
1 | samples_per_gpu = 2
2 | num_gpus = 8
3 | num_classes = 80
4 |
5 | dataset_type = 'CocoDataset'
6 | data_root = 'datasets/coco/coco_2017/'
7 |
8 | img_norm_cfg = dict(
9 | mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], to_rgb=False) # for r50-caffe
10 | train_pipeline = [
11 | dict(type='LoadImageFromFile'),
12 | dict(type='LoadAnnotations', with_bbox=True),
13 | dict(type='Resize',
14 | img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
15 | (608, 1333), (640, 1333), (672, 1333), (704, 1333),
16 | (736, 1333), (768, 1333), (800, 1333)],
17 | multiscale_mode='value',
18 | keep_ratio=True),
19 | dict(type='RandomFlip', flip_ratio=0.5),
20 | dict(type='Normalize', **img_norm_cfg),
21 | dict(type='Pad', size_divisor=32),
22 | dict(type='DefaultFormatBundle'),
23 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
24 | ]
25 | test_pipeline = [
26 | dict(type='LoadImageFromFile'),
27 | dict(
28 | type='MultiScaleFlipAug',
29 | img_scale=(1333, 800),
30 | flip=False,
31 | transforms=[
32 | dict(type='Resize', keep_ratio=True),
33 | dict(type='RandomFlip'),
34 | dict(type='Normalize', **img_norm_cfg),
35 | dict(type='Pad', size_divisor=32),
36 | dict(type='ImageToTensor', keys=['img']),
37 | dict(type='Collect', keys=['img']),
38 | ])
39 | ]
40 |
41 | data = dict(
42 | samples_per_gpu=samples_per_gpu,
43 | workers_per_gpu=samples_per_gpu,
44 | train=dict(
45 | type=dataset_type,
46 | ann_file=data_root + 'annotations/instances_train2017.json',
47 | img_prefix=data_root + 'train2017/',
48 | pipeline=train_pipeline),
49 | val=dict(
50 | type=dataset_type,
51 | ann_file=data_root + 'annotations/instances_val2017.json',
52 | img_prefix=data_root + 'val2017/',
53 | pipeline=test_pipeline),
54 | test=dict(
55 | type=dataset_type,
56 | ann_file=data_root + 'annotations/instances_val2017.json',
57 | img_prefix=data_root + 'val2017/',
58 | pipeline=test_pipeline)
59 | )
60 | evaluation = dict(interval=1, metric='bbox')
61 |
62 |
63 | model = dict(
64 | type='ATSS',
65 | pretrained='open-mmlab://detectron/resnet50_caffe',
66 | backbone=dict(
67 | type='ResNet',
68 | depth=50,
69 | num_stages=4,
70 | out_indices=(0, 1, 2, 3),
71 | frozen_stages=1,
72 | norm_cfg=dict(type='BN', requires_grad=False),
73 | norm_eval=True,
74 | style='caffe'),
75 | neck=dict(
76 | type='FPN',
77 | in_channels=[256, 512, 1024, 2048],
78 | out_channels=256,
79 | start_level=1,
80 | add_extra_convs='on_output',
81 | num_outs=5),
82 | bbox_head=dict(
83 | type='ATSSHead',
84 | num_classes=num_classes,
85 | in_channels=256,
86 | stacked_convs=4,
87 | feat_channels=256,
88 | anchor_generator=dict(
89 | type='AnchorGenerator',
90 | ratios=[1.0],
91 | octave_base_scale=8,
92 | scales_per_octave=1,
93 | strides=[8, 16, 32, 64, 128]),
94 | bbox_coder=dict(
95 | type='DeltaXYWHBBoxCoder',
96 | target_means=[.0, .0, .0, .0],
97 | target_stds=[0.1, 0.1, 0.2, 0.2]),
98 | loss_cls=dict(
99 | type='FocalLoss',
100 | use_sigmoid=True,
101 | gamma=2.0,
102 | alpha=0.25,
103 | loss_weight=1.0),
104 | loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
105 | loss_centerness=dict(
106 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)))
107 | # training and testing settings
108 | train_cfg = dict(
109 | assigner=dict(type='ATSSAssigner', topk=9),
110 | allowed_border=-1,
111 | pos_weight=-1,
112 | debug=False)
113 | test_cfg = dict(
114 | nms_pre=1000,
115 | min_bbox_size=0,
116 | score_thr=0.05,
117 | nms=dict(type='nms', iou_thr=0.6),
118 | max_per_img=100)
119 |
120 | # optimizer
121 | lr = samples_per_gpu * num_gpus / 16 * 0.01
122 | optimizer = dict(type='SGD', lr=lr, momentum=0.9, weight_decay=0.0001)
123 | optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
124 |
125 |
126 | # learning policy
127 | lr_config = dict(
128 | policy='step',
129 | warmup='linear',
130 | warmup_iters=500,
131 | warmup_ratio=1.0 / 3,
132 | step=[27, 33])
133 | total_epochs = 36
134 |
135 | checkpoint_config = dict(interval=1)
136 | # yapf:disable
137 | log_config = dict(
138 | interval=50,
139 | hooks=[
140 | dict(type='TextLoggerHook'),
141 | # dict(type='TensorboardLoggerHook')
142 | ])
143 | # yapf:enable
144 | dist_params = dict(backend='nccl')
145 | log_level = 'INFO'
146 | load_from = None
147 | resume_from = None
148 | workflow = [('train', 1)]
149 |
150 | find_unused_parameters=True
151 |
--------------------------------------------------------------------------------
/configs/FCOS/fcos_r50-caffe_coco_ms_1x.py:
--------------------------------------------------------------------------------
1 | samples_per_gpu = 2
2 | num_gpus = 8
3 | num_classes = 80
4 |
5 | dataset_type = 'CocoDataset'
6 | data_root = 'datasets/coco/coco_2017/'
7 |
8 | img_norm_cfg = dict(
9 | mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], to_rgb=False)
10 | train_pipeline = [
11 | dict(type='LoadImageFromFile'),
12 | dict(type='LoadAnnotations', with_bbox=True),
13 | dict(type='Resize',
14 | img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
15 | (608, 1333), (640, 1333), (672, 1333), (704, 1333),
16 | (736, 1333), (768, 1333), (800, 1333)],
17 | multiscale_mode='value',
18 | keep_ratio=True),
19 | dict(type='RandomFlip', flip_ratio=0.5),
20 | dict(type='Normalize', **img_norm_cfg),
21 | dict(type='Pad', size_divisor=32),
22 | dict(type='DefaultFormatBundle'),
23 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
24 | ]
25 | test_pipeline = [
26 | dict(type='LoadImageFromFile'),
27 | dict(
28 | type='MultiScaleFlipAug',
29 | img_scale=(1333, 800),
30 | flip=False,
31 | transforms=[
32 | dict(type='Resize', keep_ratio=True),
33 | dict(type='RandomFlip'),
34 | dict(type='Normalize', **img_norm_cfg),
35 | dict(type='Pad', size_divisor=32),
36 | dict(type='ImageToTensor', keys=['img']),
37 | dict(type='Collect', keys=['img']),
38 | ])
39 | ]
40 |
41 | data = dict(
42 | samples_per_gpu=samples_per_gpu,
43 | workers_per_gpu=samples_per_gpu,
44 | train=dict(
45 | type=dataset_type,
46 | ann_file=data_root + 'annotations/instances_train2017.json',
47 | img_prefix=data_root + 'train2017/',
48 | pipeline=train_pipeline),
49 | val=dict(
50 | type=dataset_type,
51 | ann_file=data_root + 'annotations/instances_val2017.json',
52 | img_prefix=data_root + 'val2017/',
53 | pipeline=test_pipeline),
54 | test=dict(
55 | type=dataset_type,
56 | ann_file=data_root + 'annotations/instances_val2017.json',
57 | img_prefix=data_root + 'val2017/',
58 | pipeline=test_pipeline)
59 | )
60 | evaluation = dict(interval=1, metric='bbox')
61 |
62 | model = dict(
63 | type='FCOS',
64 | pretrained='open-mmlab://detectron/resnet50_caffe',
65 | backbone=dict(
66 | type='ResNet',
67 | depth=50,
68 | num_stages=4,
69 | out_indices=(0, 1, 2, 3),
70 | frozen_stages=1,
71 | norm_cfg=dict(type='BN', requires_grad=False),
72 | norm_eval=True,
73 | style='caffe'),
74 | neck=dict(
75 | type='FPN',
76 | in_channels=[256, 512, 1024, 2048],
77 | out_channels=256,
78 | start_level=1,
79 | add_extra_convs=True,
80 | extra_convs_on_inputs=False,
81 | num_outs=5,
82 | relu_before_extra_convs=True),
83 |
84 | bbox_head=dict(
85 | type='FCOSHead',
86 | num_classes=num_classes,
87 | in_channels=256,
88 | stacked_convs=4,
89 | feat_channels=256,
90 | strides=[8, 16, 32, 64, 128],
91 | regress_ranges=((-1, 64), (64, 128), (128, 256), (256, 512),
92 | (512, 10000)),
93 | center_sampling=True,
94 | center_sample_radius=1.5,
95 | norm_on_bbox=True,
96 | centerness_on_reg=True,
97 | norm_cfg=dict(type='GN', num_groups=32, requires_grad=True),
98 | loss_cls=dict(
99 | type='FocalLoss',
100 | use_sigmoid=True,
101 | gamma=2.0,
102 | alpha=0.25,
103 | loss_weight=1.0),
104 | loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
105 | loss_centerness=dict(
106 | type='CrossEntropyLoss',
107 | use_sigmoid=True,
108 | loss_weight=1.0),
109 | ))
110 |
111 | # training and testing settings
112 | train_cfg = dict()
113 | test_cfg = dict(
114 | nms_pre=1000,
115 | min_bbox_size=0,
116 | score_thr=0.05,
117 | nms=dict(type='nms', iou_thr=0.6),
118 | max_per_img=100)
119 |
120 |
121 | # optimizer
122 | lr = samples_per_gpu * num_gpus / 16 * 0.01
123 | optimizer = dict(type='SGD', lr=lr, momentum=0.9, weight_decay=0.0001)
124 | optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
125 |
126 | # learning policy
127 | lr_config = dict(
128 | policy='step',
129 | warmup='linear',
130 | warmup_iters=500,
131 | warmup_ratio=1.0 / 3,
132 | step=[8, 11])
133 | total_epochs = 12
134 |
135 | checkpoint_config = dict(interval=1)
136 | # yapf:disable
137 | log_config = dict(
138 | interval=50,
139 | hooks=[
140 | dict(type='TextLoggerHook'),
141 | # dict(type='TensorboardLoggerHook')
142 | ])
143 | # yapf:enable
144 | dist_params = dict(backend='nccl')
145 | log_level = 'INFO'
146 | load_from = None
147 | resume_from = None
148 | workflow = [('train', 1)]
149 |
150 | find_unused_parameters = True
151 |
--------------------------------------------------------------------------------
/configs/FCOS/fcos_r50-caffe_coco_ms_2x.py:
--------------------------------------------------------------------------------
1 | samples_per_gpu = 2
2 | num_gpus = 8
3 | num_classes = 80
4 |
5 | dataset_type = 'CocoDataset'
6 | data_root = 'datasets/coco/coco_2017/'
7 |
8 | img_norm_cfg = dict(
9 | mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], to_rgb=False)
10 | train_pipeline = [
11 | dict(type='LoadImageFromFile'),
12 | dict(type='LoadAnnotations', with_bbox=True),
13 | dict(type='Resize',
14 | img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
15 | (608, 1333), (640, 1333), (672, 1333), (704, 1333),
16 | (736, 1333), (768, 1333), (800, 1333)],
17 | multiscale_mode='value',
18 | keep_ratio=True),
19 | dict(type='RandomFlip', flip_ratio=0.5),
20 | dict(type='Normalize', **img_norm_cfg),
21 | dict(type='Pad', size_divisor=32),
22 | dict(type='DefaultFormatBundle'),
23 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
24 | ]
25 | test_pipeline = [
26 | dict(type='LoadImageFromFile'),
27 | dict(
28 | type='MultiScaleFlipAug',
29 | img_scale=(1333, 800),
30 | flip=False,
31 | transforms=[
32 | dict(type='Resize', keep_ratio=True),
33 | dict(type='RandomFlip'),
34 | dict(type='Normalize', **img_norm_cfg),
35 | dict(type='Pad', size_divisor=32),
36 | dict(type='ImageToTensor', keys=['img']),
37 | dict(type='Collect', keys=['img']),
38 | ])
39 | ]
40 |
41 | data = dict(
42 | samples_per_gpu=samples_per_gpu,
43 | workers_per_gpu=samples_per_gpu,
44 | train=dict(
45 | type=dataset_type,
46 | ann_file=data_root + 'annotations/instances_train2017.json',
47 | img_prefix=data_root + 'train2017/',
48 | pipeline=train_pipeline),
49 | val=dict(
50 | type=dataset_type,
51 | ann_file=data_root + 'annotations/instances_val2017.json',
52 | img_prefix=data_root + 'val2017/',
53 | pipeline=test_pipeline),
54 | test=dict(
55 | type=dataset_type,
56 | ann_file=data_root + 'annotations/instances_val2017.json',
57 | img_prefix=data_root + 'val2017/',
58 | pipeline=test_pipeline)
59 | )
60 | evaluation = dict(interval=1, metric='bbox')
61 |
62 | model = dict(
63 | type='FCOS',
64 | pretrained='open-mmlab://detectron/resnet50_caffe',
65 | backbone=dict(
66 | type='ResNet',
67 | depth=50,
68 | num_stages=4,
69 | out_indices=(0, 1, 2, 3),
70 | frozen_stages=1,
71 | norm_cfg=dict(type='BN', requires_grad=False),
72 | norm_eval=True,
73 | style='caffe'),
74 | neck=dict(
75 | type='FPN',
76 | in_channels=[256, 512, 1024, 2048],
77 | out_channels=256,
78 | start_level=1,
79 | add_extra_convs=True,
80 | extra_convs_on_inputs=False,
81 | num_outs=5,
82 | relu_before_extra_convs=True),
83 |
84 | bbox_head=dict(
85 | type='FCOSHead',
86 | num_classes=num_classes,
87 | in_channels=256,
88 | stacked_convs=4,
89 | feat_channels=256,
90 | strides=[8, 16, 32, 64, 128],
91 | regress_ranges=((-1, 64), (64, 128), (128, 256), (256, 512),
92 | (512, 10000)),
93 | center_sampling=True,
94 | center_sample_radius=1.5,
95 | norm_on_bbox=True,
96 | centerness_on_reg=True,
97 | norm_cfg=dict(type='GN', num_groups=32, requires_grad=True),
98 | loss_cls=dict(
99 | type='FocalLoss',
100 | use_sigmoid=True,
101 | gamma=2.0,
102 | alpha=0.25,
103 | loss_weight=1.0),
104 | loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
105 | loss_centerness=dict(
106 | type='CrossEntropyLoss',
107 | use_sigmoid=True,
108 | loss_weight=1.0),
109 | ))
110 |
111 | # training and testing settings
112 | train_cfg = dict()
113 | test_cfg = dict(
114 | nms_pre=1000,
115 | min_bbox_size=0,
116 | score_thr=0.05,
117 | nms=dict(type='nms', iou_thr=0.6),
118 | max_per_img=100)
119 |
120 |
121 | # optimizer
122 | lr = samples_per_gpu * num_gpus / 16 * 0.01
123 | optimizer = dict(type='SGD', lr=lr, momentum=0.9, weight_decay=0.0001)
124 | optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
125 |
126 | # learning policy
127 | lr_config = dict(
128 | policy='step',
129 | warmup='linear',
130 | warmup_iters=500,
131 | warmup_ratio=1.0 / 3,
132 | step=[18, 22])
133 | total_epochs = 24
134 |
135 | checkpoint_config = dict(interval=1)
136 | # yapf:disable
137 | log_config = dict(
138 | interval=50,
139 | hooks=[
140 | dict(type='TextLoggerHook'),
141 | # dict(type='TensorboardLoggerHook')
142 | ])
143 | # yapf:enable
144 | dist_params = dict(backend='nccl')
145 | log_level = 'INFO'
146 | load_from = None
147 | resume_from = None
148 | workflow = [('train', 1)]
149 |
150 | find_unused_parameters = True
151 |
--------------------------------------------------------------------------------
/configs/FCOS/fcos_r50-caffe_coco_ms_3x.py:
--------------------------------------------------------------------------------
1 | samples_per_gpu = 2
2 | num_gpus = 8
3 | num_classes = 80
4 |
5 | dataset_type = 'CocoDataset'
6 | data_root = 'datasets/coco/coco_2017/'
7 |
8 | img_norm_cfg = dict(
9 | mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], to_rgb=False)
10 | train_pipeline = [
11 | dict(type='LoadImageFromFile'),
12 | dict(type='LoadAnnotations', with_bbox=True),
13 | dict(type='Resize',
14 | img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
15 | (608, 1333), (640, 1333), (672, 1333), (704, 1333),
16 | (736, 1333), (768, 1333), (800, 1333)],
17 | multiscale_mode='value',
18 | keep_ratio=True),
19 | dict(type='RandomFlip', flip_ratio=0.5),
20 | dict(type='Normalize', **img_norm_cfg),
21 | dict(type='Pad', size_divisor=32),
22 | dict(type='DefaultFormatBundle'),
23 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
24 | ]
25 | test_pipeline = [
26 | dict(type='LoadImageFromFile'),
27 | dict(
28 | type='MultiScaleFlipAug',
29 | img_scale=(1333, 800),
30 | flip=False,
31 | transforms=[
32 | dict(type='Resize', keep_ratio=True),
33 | dict(type='RandomFlip'),
34 | dict(type='Normalize', **img_norm_cfg),
35 | dict(type='Pad', size_divisor=32),
36 | dict(type='ImageToTensor', keys=['img']),
37 | dict(type='Collect', keys=['img']),
38 | ])
39 | ]
40 |
41 | data = dict(
42 | samples_per_gpu=samples_per_gpu,
43 | workers_per_gpu=samples_per_gpu,
44 | train=dict(
45 | type=dataset_type,
46 | ann_file=data_root + 'annotations/instances_train2017.json',
47 | img_prefix=data_root + 'train2017/',
48 | pipeline=train_pipeline),
49 | val=dict(
50 | type=dataset_type,
51 | ann_file=data_root + 'annotations/instances_val2017.json',
52 | img_prefix=data_root + 'val2017/',
53 | pipeline=test_pipeline),
54 | test=dict(
55 | type=dataset_type,
56 | ann_file=data_root + 'annotations/instances_val2017.json',
57 | img_prefix=data_root + 'val2017/',
58 | pipeline=test_pipeline)
59 | )
60 | evaluation = dict(interval=1, metric='bbox')
61 |
62 | model = dict(
63 | type='FCOS',
64 | pretrained='open-mmlab://detectron/resnet50_caffe',
65 | backbone=dict(
66 | type='ResNet',
67 | depth=50,
68 | num_stages=4,
69 | out_indices=(0, 1, 2, 3),
70 | frozen_stages=1,
71 | norm_cfg=dict(type='BN', requires_grad=False),
72 | norm_eval=True,
73 | style='caffe'),
74 | neck=dict(
75 | type='FPN',
76 | in_channels=[256, 512, 1024, 2048],
77 | out_channels=256,
78 | start_level=1,
79 | add_extra_convs=True,
80 | extra_convs_on_inputs=False,
81 | num_outs=5,
82 | relu_before_extra_convs=True),
83 |
84 | bbox_head=dict(
85 | type='FCOSHead',
86 | num_classes=num_classes,
87 | in_channels=256,
88 | stacked_convs=4,
89 | feat_channels=256,
90 | strides=[8, 16, 32, 64, 128],
91 | regress_ranges=((-1, 64), (64, 128), (128, 256), (256, 512),
92 | (512, 10000)),
93 | center_sampling=True,
94 | center_sample_radius=1.5,
95 | norm_on_bbox=True,
96 | centerness_on_reg=True,
97 | norm_cfg=dict(type='GN', num_groups=32, requires_grad=True),
98 | loss_cls=dict(
99 | type='FocalLoss',
100 | use_sigmoid=True,
101 | gamma=2.0,
102 | alpha=0.25,
103 | loss_weight=1.0),
104 | loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
105 | loss_centerness=dict(
106 | type='CrossEntropyLoss',
107 | use_sigmoid=True,
108 | loss_weight=1.0),
109 | ))
110 |
111 | # training and testing settings
112 | train_cfg = dict()
113 | test_cfg = dict(
114 | nms_pre=1000,
115 | min_bbox_size=0,
116 | score_thr=0.05,
117 | nms=dict(type='nms', iou_thr=0.6),
118 | max_per_img=100)
119 |
120 |
121 | # optimizer
122 | lr = samples_per_gpu * num_gpus / 16 * 0.01
123 | optimizer = dict(type='SGD', lr=lr, momentum=0.9, weight_decay=0.0001)
124 | optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
125 |
126 | # learning policy
127 | lr_config = dict(
128 | policy='step',
129 | warmup='linear',
130 | warmup_iters=500,
131 | warmup_ratio=1.0 / 3,
132 | step=[27, 33])
133 | total_epochs = 36
134 |
135 | checkpoint_config = dict(interval=1)
136 | # yapf:disable
137 | log_config = dict(
138 | interval=50,
139 | hooks=[
140 | dict(type='TextLoggerHook'),
141 | # dict(type='TensorboardLoggerHook')
142 | ])
143 | # yapf:enable
144 | dist_params = dict(backend='nccl')
145 | log_level = 'INFO'
146 | load_from = None
147 | resume_from = None
148 | workflow = [('train', 1)]
149 |
150 | find_unused_parameters = True
151 |
--------------------------------------------------------------------------------