├── README.md
└── configs
    ├── ATSS
    │   ├── atss_r50-caffe_coco_ms_1x.py
    │   ├── atss_r50-caffe_coco_ms_2x.py
    │   └── atss_r50-caffe_coco_ms_3x.py
    └── FCOS
        ├── fcos_r50-caffe_coco_ms_1x.py
        ├── fcos_r50-caffe_coco_ms_2x.py
        └── fcos_r50-caffe_coco_ms_3x.py


/README.md:
--------------------------------------------------------------------------------
 1 | # Object Detection Made Simpler by Eliminating Heuristic NMS
 2 | PyTorch implementation of our paper, "Object Detection Made Simpler by Eliminating Heuristic NMS".
 3 | 
 4 | 
 5 | ## Requirements
 6 | * Python 3.7
 7 | * PyTorch 1.5.1
 8 | * [mmdetection](https://github.com/open-mmlab/mmdetection)
 9 | 
10 | ## Usage:
11 | 
12 | The code is currently under the company's open-source review.
13 | 
14 | ## PSS for NMS-free Object Detection:
15 | 
16 | #### End-to-End Training
17 | 
18 | |  Model | Backbone | lr sched | mAP (COCO2017 val) | link |
19 | | ------------ | ------------ |------------ |------------ |------------ |
20 | | FCOS | R50  | 3x | 42.0 | [model](http://ai4prz5kwonline.oss-cn-zhangjiakou.aliyuncs.com/jianchong_new%2Ffcos_r50-caffe_coco-480-800-3x.pth?Expires=1931844166&OSSAccessKeyId=FbmV29ZaCO4CLhjO&Signature=7pHiB3dhpl5lgzxJaR5i5J87dBA%3D) |
21 | | ATSS | R50 | 3x | 42.8 | [model](http://ai4prz5kwonline.oss-cn-zhangjiakou.aliyuncs.com/jianchong_new%2Fatss_r50-caffe_coco-480-800-3x.pth?Expires=2416484320&OSSAccessKeyId=FbmV29ZaCO4CLhjO&Signature=JnsqZMLOWd6grL%2F6thoa4kITlWM%3D) |
22 | | FCOS<sub>pss</sub> | R50  | 3x | 42.3 |  |
23 | | ATSS<sub>pss</sub> | R50  | 3x | 42.6 |  |
24 | | VFNET<sub>pss</sub> | R50 | 3x | 44.0 |  |
25 | | FCOS<sub>pss</sub> | R101  | 3x | 44.1 |  |
26 | | ATSS<sub>pss</sub> | R101 | 3x | 44.2 |  |
27 | | VFNET<sub>pss</sub> | R101 | 3x | 45.7 |  |
28 | | FCOS<sub>pss</sub> | X-101-DCN  | 2x | 47.0 |  |
29 | | ATSS<sub>pss</sub> | X-101-DCN  | 2x | 47.3 |  |
30 | | VFNET<sub>pss</sub> | X-101-DCN | 2x | 49.3 |  |
31 | | FCOS<sub>pss</sub> | R2N-101-DCN | 2x | 48.2 |  |
32 | | ATSS<sub>pss</sub> | R2N-101-DCN  | 2x | 48.6 |  |
33 | | VFNET<sub>pss</sub> | R2N-101-DCN | 2x | 50.0 |  |
34 | 
35 | **NOTE:** All models use multi-scale training: the shorter image side is sampled from [480, 800] and the longer side is capped at 1333.
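
As a rough illustration of what this multi-scale setting means in practice, the sketch below mimics the `Resize` step with `multiscale_mode='value'` and `keep_ratio=True` used in the configs: one short-side value is picked per image, and the image is rescaled so that neither the short-side limit nor the 1333 long-side limit is exceeded. The names `SHORT_SIDES`, `rescaled_size` and `sample_train_size` are illustrative and not part of mmdetection, and the rounding here may differ from the library in edge cases.

```python
import random

# Short-side candidates from the training configs: 480, 512, ..., 800.
SHORT_SIDES = list(range(480, 801, 32))
MAX_LONG_SIDE = 1333


def rescaled_size(width, height, short_side, max_long_side=MAX_LONG_SIDE):
    """Return (new_width, new_height), preserving the aspect ratio."""
    # Use the largest scale that respects both the short- and long-side limits.
    scale = min(max_long_side / max(width, height),
                short_side / min(width, height))
    return int(width * scale + 0.5), int(height * scale + 0.5)


def sample_train_size(width, height, rng=random):
    """Pick one short-side value at random, then rescale."""
    short_side = rng.choice(SHORT_SIDES)
    return rescaled_size(width, height, short_side)


# A 640x480 image with the 800 short side: the short-side limit is binding.
print(rescaled_size(640, 480, 800))
# A wide 1000x400 image: the 1333 long-side cap is binding instead.
print(rescaled_size(1000, 400, 800))
```

For wide images the long-side cap wins, so the short side can end up below the sampled value.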
36 | 
37 | #### Two-Step Training
38 | When a pretrained detector is available, fine-tuning only the PSS head saves training time.
39 | 
40 | | Model  | Backbone  | MS Training  | lr sched  | mAP <br> pretrained model (w/ NMS)  | mAP <br> fine-tuned PSS (w/o NMS) |
41 | | ------------ | ------------ | ------------ | ------------ | ------------ |------------ |
42 | | GFocalV2<sub>pss</sub>  | R50  | Yes  | 12  | [43.9](https://github.com/implus/GFocalV2 "43.9")  |43.3 |
43 | | GFocalV2<sub>pss</sub> | X-101-32x4d-DCN  | Yes | 12  | [48.8](https://github.com/implus/GFocalV2 "48.8")  | 48.2 |
44 | | GFocalV2<sub>pss</sub> | R2N-101-DCN | Yes | 12 | [49.9](https://github.com/implus/GFocalV2 "49.9") | 49.2 |
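
One way to picture the two-step recipe: load the pretrained detector, freeze everything, and leave only the PSS head trainable. The sketch below is a framework-free illustration of that selection step. The parameter names and the `pss` keyword are hypothetical, since the PSS code has not been released, and `split_trainable` is a made-up helper.

```python
def split_trainable(param_names, head_keyword="pss"):
    """Split parameter names into (trainable, frozen) lists.

    Only names containing `head_keyword` stay trainable; the keyword is an
    assumption about how the PSS head's parameters might be named.
    """
    trainable = [n for n in param_names if head_keyword in n]
    frozen = [n for n in param_names if head_keyword not in n]
    return trainable, frozen


# Hypothetical parameter names for a detector with an extra PSS branch.
names = [
    "backbone.layer1.0.conv1.weight",
    "neck.lateral_convs.0.conv.weight",
    "bbox_head.cls_convs.0.conv.weight",
    "bbox_head.pss_conv.weight",
    "bbox_head.pss_conv.bias",
]
trainable, frozen = split_trainable(names)
print(trainable)
```

With a real PyTorch model, the same idea would be applied by iterating over `model.named_parameters()` and setting `requires_grad` to `False` on every parameter that falls in the frozen group.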
45 | 
46 | 
47 | 
48 | ## PSS for NMS-free Instance Segmentation
49 | 
50 | #### End-to-End Training
51 | 
52 | |  Model | Backbone  | lr sched | bbox mAP  | segm mAP | link |
53 | | ------------ | ------------ | ------------ |------------ |------------ |------------ |
54 | | [CondInst](https://github.com/aim-uofa/AdelaiDet/tree/master/configs/CondInst)  | R50  | 3x | 41.9 | 37.5 | |
55 | | CondInst<sub>pss</sub> | R50  | 3x | 41.2 | 36.7 | |
56 | | [CondInst + sem](https://github.com/aim-uofa/AdelaiDet/tree/master/configs/CondInst)  | R50  | 3x | 42.6 | 38.2 | |
57 | | CondInst<sub>pss</sub> + sem | R50  | 3x | 42.3 | 37.7 | |
58 | | [CondInst](https://github.com/aim-uofa/AdelaiDet/tree/master/configs/CondInst)  | R101  | 3x | 43.3 | 38.6 | |
59 | | CondInst<sub>pss</sub> | R101  | 3x | 43.1 | 38.2 | |
60 | | [CondInst + sem](https://github.com/aim-uofa/AdelaiDet/tree/master/configs/CondInst)  | R101  | 3x | 44.6 | 39.8 | |
61 | | CondInst<sub>pss</sub> + sem | R101  | 3x | 44.1 | 39.3 | |
62 | 
63 | **NOTE:** All models use multi-scale training: the shorter image side is sampled from [640, 800] and the longer side is capped at 1333.
64 | 
65 | ## Citation
66 | If you use the package in your research, please cite our paper:
67 | ```
68 | @misc{zhou2021object,
69 |       title={Object Detection Made Simpler by Eliminating Heuristic NMS}, 
70 |       author={Qiang Zhou and Chaohui Yu and Chunhua Shen and Zhibin Wang and Hao Li},
71 |       year={2021},
72 |       eprint={2101.11782},
73 |       archivePrefix={arXiv},
74 |       primaryClass={cs.CV}
75 | }
76 | ```
77 | 
78 | ## PS
79 | Our team is hiring on an ongoing basis: experienced hires, interns, and new graduates are all welcome!
80 | Please send your résumé to `zhouqiang@zju.edu.cn`.
81 | 


--------------------------------------------------------------------------------
/configs/ATSS/atss_r50-caffe_coco_ms_1x.py:
--------------------------------------------------------------------------------
  1 | samples_per_gpu = 2
  2 | num_gpus = 8
  3 | num_classes = 80
  4 | 
  5 | dataset_type = 'CocoDataset'
  6 | data_root = 'datasets/coco/coco_2017/'
  7 | 
  8 | img_norm_cfg = dict(
  9 |     mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], to_rgb=False)         # for r50-caffe
 10 | train_pipeline = [
 11 |     dict(type='LoadImageFromFile'),
 12 |     dict(type='LoadAnnotations', with_bbox=True),
 13 |     dict(type='Resize', 
 14 |                 img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
 15 |                            (608, 1333), (640, 1333), (672, 1333), (704, 1333),
 16 |                            (736, 1333), (768, 1333), (800, 1333)],
 17 |                 multiscale_mode='value',
 18 |                 keep_ratio=True),
 19 |     dict(type='RandomFlip', flip_ratio=0.5),
 20 |     dict(type='Normalize', **img_norm_cfg),
 21 |     dict(type='Pad', size_divisor=32),
 22 |     dict(type='DefaultFormatBundle'),
 23 |     dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
 24 | ]
 25 | test_pipeline = [
 26 |     dict(type='LoadImageFromFile'),
 27 |     dict(
 28 |         type='MultiScaleFlipAug',
 29 |         img_scale=(1333, 800),
 30 |         flip=False,
 31 |         transforms=[
 32 |             dict(type='Resize', keep_ratio=True),
 33 |             dict(type='RandomFlip'),
 34 |             dict(type='Normalize', **img_norm_cfg),
 35 |             dict(type='Pad', size_divisor=32),
 36 |             dict(type='ImageToTensor', keys=['img']),
 37 |             dict(type='Collect', keys=['img']),
 38 |         ])
 39 | ]
 40 | 
 41 | data = dict(
 42 |     samples_per_gpu=samples_per_gpu,
 43 |     workers_per_gpu=samples_per_gpu,
 44 |     train=dict(
 45 |         type=dataset_type,
 46 |         ann_file=data_root + 'annotations/instances_train2017.json',
 47 |         img_prefix=data_root + 'train2017/',
 48 |         pipeline=train_pipeline),
 49 |     val=dict(
 50 |         type=dataset_type,
 51 |         ann_file=data_root + 'annotations/instances_val2017.json',
 52 |         img_prefix=data_root + 'val2017/',
 53 |         pipeline=test_pipeline),
 54 |     test=dict(
 55 |         type=dataset_type,
 56 |         ann_file=data_root + 'annotations/instances_val2017.json',
 57 |         img_prefix=data_root + 'val2017/',
 58 |         pipeline=test_pipeline)
 59 |     )
 60 | evaluation = dict(interval=1, metric='bbox')
 61 | 
 62 | 
 63 | model = dict(
 64 |     type='ATSS',
 65 |     pretrained='open-mmlab://detectron/resnet50_caffe',
 66 |     backbone=dict(
 67 |         type='ResNet',
 68 |         depth=50,
 69 |         num_stages=4,
 70 |         out_indices=(0, 1, 2, 3),
 71 |         frozen_stages=1,
 72 |         norm_cfg=dict(type='BN', requires_grad=False),
 73 |         norm_eval=True,
 74 |         style='caffe'),
 75 |     neck=dict(
 76 |         type='FPN',
 77 |         in_channels=[256, 512, 1024, 2048],
 78 |         out_channels=256,
 79 |         start_level=1,
 80 |         add_extra_convs='on_output',
 81 |         num_outs=5),
 82 |     bbox_head=dict(
 83 |         type='ATSSHead',
 84 |         num_classes=num_classes,
 85 |         in_channels=256,
 86 |         stacked_convs=4,
 87 |         feat_channels=256,
 88 |         anchor_generator=dict(
 89 |             type='AnchorGenerator',
 90 |             ratios=[1.0],
 91 |             octave_base_scale=8,
 92 |             scales_per_octave=1,
 93 |             strides=[8, 16, 32, 64, 128]),
 94 |         bbox_coder=dict(
 95 |             type='DeltaXYWHBBoxCoder',
 96 |             target_means=[.0, .0, .0, .0],
 97 |             target_stds=[0.1, 0.1, 0.2, 0.2]),
 98 |         loss_cls=dict(
 99 |             type='FocalLoss',
100 |             use_sigmoid=True,
101 |             gamma=2.0,
102 |             alpha=0.25,
103 |             loss_weight=1.0),
104 |         loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
105 |         loss_centerness=dict(
106 |             type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)))
107 | # training and testing settings
108 | train_cfg = dict(
109 |     assigner=dict(type='ATSSAssigner', topk=9),
110 |     allowed_border=-1,
111 |     pos_weight=-1,
112 |     debug=False)
113 | test_cfg = dict(
114 |     nms_pre=1000,
115 |     min_bbox_size=0,
116 |     score_thr=0.05,
117 |     nms=dict(type='nms', iou_thr=0.6),
118 |     max_per_img=100)
119 | 
120 | # optimizer
121 | lr = samples_per_gpu * num_gpus / 16 * 0.01  # linear scaling: 0.01 at a total batch size of 16
122 | optimizer = dict(type='SGD', lr=lr, momentum=0.9, weight_decay=0.0001)
123 | optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
124 | 
125 | 
126 | # learning policy
127 | lr_config = dict(
128 |     policy='step',
129 |     warmup='linear',
130 |     warmup_iters=500,
131 |     warmup_ratio=1.0 / 3,
132 |     step=[8, 11])
133 | total_epochs = 12
134 | 
135 | checkpoint_config = dict(interval=1)
136 | # yapf:disable
137 | log_config = dict(
138 |     interval=50,
139 |     hooks=[
140 |         dict(type='TextLoggerHook'),
141 |         # dict(type='TensorboardLoggerHook')
142 |     ])
143 | # yapf:enable
144 | dist_params = dict(backend='nccl')
145 | log_level = 'INFO'
146 | load_from = None
147 | resume_from = None
148 | workflow = [('train', 1)]
149 | 
150 | find_unused_parameters = True
151 | 
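
The `optimizer` and `lr_config` settings above can be read as a linear-scaling rule plus a step schedule with linear warmup. The sketch below reproduces that arithmetic in plain Python. It assumes a decay factor of 0.1 at each milestone (the config does not set one, so this is taken to be the library default) and uses the linear warmup formula as commonly implemented in mmcv; `base_lr` and `lr_at` are illustrative names.

```python
def base_lr(samples_per_gpu=2, num_gpus=8, ref_batch=16, ref_lr=0.01):
    """Linear scaling rule: lr grows in proportion to the total batch size."""
    return samples_per_gpu * num_gpus / ref_batch * ref_lr


def lr_at(epoch, cur_iter, steps=(8, 11), gamma=0.1,
          warmup_iters=500, warmup_ratio=1.0 / 3):
    """Learning rate at a 0-based epoch and global iteration index."""
    # Step policy: multiply by gamma once for every milestone already reached.
    passed = sum(1 for s in steps if epoch >= s)
    lr = base_lr() * gamma ** passed
    # Linear warmup: ramp from warmup_ratio * lr up to lr over warmup_iters.
    if cur_iter < warmup_iters:
        k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
        lr *= 1 - k
    return lr


print(lr_at(epoch=0, cur_iter=0))       # warmup start, about 0.01 / 3
print(lr_at(epoch=0, cur_iter=5000))    # full rate after warmup
print(lr_at(epoch=8, cur_iter=60000))   # after the first milestone
```

Passing `steps=(18, 22)` or `steps=(27, 33)` gives the corresponding curves for the 2x and 3x schedules.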


--------------------------------------------------------------------------------
/configs/ATSS/atss_r50-caffe_coco_ms_2x.py:
--------------------------------------------------------------------------------
  1 | samples_per_gpu = 2
  2 | num_gpus = 8
  3 | num_classes = 80
  4 | 
  5 | dataset_type = 'CocoDataset'
  6 | data_root = 'datasets/coco/coco_2017/'
  7 | 
  8 | img_norm_cfg = dict(
  9 |     mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], to_rgb=False)         # for r50-caffe
 10 | train_pipeline = [
 11 |     dict(type='LoadImageFromFile'),
 12 |     dict(type='LoadAnnotations', with_bbox=True),
 13 |     dict(type='Resize', 
 14 |                 img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
 15 |                            (608, 1333), (640, 1333), (672, 1333), (704, 1333),
 16 |                            (736, 1333), (768, 1333), (800, 1333)],
 17 |                 multiscale_mode='value',
 18 |                 keep_ratio=True),
 19 |     dict(type='RandomFlip', flip_ratio=0.5),
 20 |     dict(type='Normalize', **img_norm_cfg),
 21 |     dict(type='Pad', size_divisor=32),
 22 |     dict(type='DefaultFormatBundle'),
 23 |     dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
 24 | ]
 25 | test_pipeline = [
 26 |     dict(type='LoadImageFromFile'),
 27 |     dict(
 28 |         type='MultiScaleFlipAug',
 29 |         img_scale=(1333, 800),
 30 |         flip=False,
 31 |         transforms=[
 32 |             dict(type='Resize', keep_ratio=True),
 33 |             dict(type='RandomFlip'),
 34 |             dict(type='Normalize', **img_norm_cfg),
 35 |             dict(type='Pad', size_divisor=32),
 36 |             dict(type='ImageToTensor', keys=['img']),
 37 |             dict(type='Collect', keys=['img']),
 38 |         ])
 39 | ]
 40 | 
 41 | data = dict(
 42 |     samples_per_gpu=samples_per_gpu,
 43 |     workers_per_gpu=samples_per_gpu,
 44 |     train=dict(
 45 |         type=dataset_type,
 46 |         ann_file=data_root + 'annotations/instances_train2017.json',
 47 |         img_prefix=data_root + 'train2017/',
 48 |         pipeline=train_pipeline),
 49 |     val=dict(
 50 |         type=dataset_type,
 51 |         ann_file=data_root + 'annotations/instances_val2017.json',
 52 |         img_prefix=data_root + 'val2017/',
 53 |         pipeline=test_pipeline),
 54 |     test=dict(
 55 |         type=dataset_type,
 56 |         ann_file=data_root + 'annotations/instances_val2017.json',
 57 |         img_prefix=data_root + 'val2017/',
 58 |         pipeline=test_pipeline)
 59 |     )
 60 | evaluation = dict(interval=1, metric='bbox')
 61 | 
 62 | 
 63 | model = dict(
 64 |     type='ATSS',
 65 |     pretrained='open-mmlab://detectron/resnet50_caffe',
 66 |     backbone=dict(
 67 |         type='ResNet',
 68 |         depth=50,
 69 |         num_stages=4,
 70 |         out_indices=(0, 1, 2, 3),
 71 |         frozen_stages=1,
 72 |         norm_cfg=dict(type='BN', requires_grad=False),
 73 |         norm_eval=True,
 74 |         style='caffe'),
 75 |     neck=dict(
 76 |         type='FPN',
 77 |         in_channels=[256, 512, 1024, 2048],
 78 |         out_channels=256,
 79 |         start_level=1,
 80 |         add_extra_convs='on_output',
 81 |         num_outs=5),
 82 |     bbox_head=dict(
 83 |         type='ATSSHead',
 84 |         num_classes=num_classes,
 85 |         in_channels=256,
 86 |         stacked_convs=4,
 87 |         feat_channels=256,
 88 |         anchor_generator=dict(
 89 |             type='AnchorGenerator',
 90 |             ratios=[1.0],
 91 |             octave_base_scale=8,
 92 |             scales_per_octave=1,
 93 |             strides=[8, 16, 32, 64, 128]),
 94 |         bbox_coder=dict(
 95 |             type='DeltaXYWHBBoxCoder',
 96 |             target_means=[.0, .0, .0, .0],
 97 |             target_stds=[0.1, 0.1, 0.2, 0.2]),
 98 |         loss_cls=dict(
 99 |             type='FocalLoss',
100 |             use_sigmoid=True,
101 |             gamma=2.0,
102 |             alpha=0.25,
103 |             loss_weight=1.0),
104 |         loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
105 |         loss_centerness=dict(
106 |             type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)))
107 | # training and testing settings
108 | train_cfg = dict(
109 |     assigner=dict(type='ATSSAssigner', topk=9),
110 |     allowed_border=-1,
111 |     pos_weight=-1,
112 |     debug=False)
113 | test_cfg = dict(
114 |     nms_pre=1000,
115 |     min_bbox_size=0,
116 |     score_thr=0.05,
117 |     nms=dict(type='nms', iou_thr=0.6),
118 |     max_per_img=100)
119 | 
120 | # optimizer
121 | lr = samples_per_gpu * num_gpus / 16 * 0.01  # linear scaling: 0.01 at a total batch size of 16
122 | optimizer = dict(type='SGD', lr=lr, momentum=0.9, weight_decay=0.0001)
123 | optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
124 | 
125 | 
126 | # learning policy
127 | lr_config = dict(
128 |     policy='step',
129 |     warmup='linear',
130 |     warmup_iters=500,
131 |     warmup_ratio=1.0 / 3,
132 |     step=[18, 22])
133 | total_epochs = 24
134 | 
135 | checkpoint_config = dict(interval=1)
136 | # yapf:disable
137 | log_config = dict(
138 |     interval=50,
139 |     hooks=[
140 |         dict(type='TextLoggerHook'),
141 |         # dict(type='TensorboardLoggerHook')
142 |     ])
143 | # yapf:enable
144 | dist_params = dict(backend='nccl')
145 | log_level = 'INFO'
146 | load_from = None
147 | resume_from = None
148 | workflow = [('train', 1)]
149 | 
150 | find_unused_parameters = True
151 | 
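
For readers unfamiliar with the `anchor_generator` block above, the following sketch works out the anchor side length it implies on each FPN level: a single square anchor (`ratios=[1.0]`, one scale per octave) whose side is `octave_base_scale` times the level stride. This is a back-of-the-envelope calculation rather than the mmdetection implementation, and `anchor_sides` is an illustrative name.

```python
def anchor_sides(strides=(8, 16, 32, 64, 128), octave_base_scale=8,
                 scales_per_octave=1):
    """Map each stride to the list of square-anchor side lengths in pixels."""
    # Scales within one octave: base * 2**(i / scales_per_octave).
    scales = [octave_base_scale * 2 ** (i / scales_per_octave)
              for i in range(scales_per_octave)]
    return {s: [scale * s for scale in scales] for s in strides}


for stride, sides in anchor_sides().items():
    print(stride, sides)
```

With these settings there is exactly one anchor per location, which ATSS then uses only for selecting positive samples.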


--------------------------------------------------------------------------------
/configs/ATSS/atss_r50-caffe_coco_ms_3x.py:
--------------------------------------------------------------------------------
  1 | samples_per_gpu = 2
  2 | num_gpus = 8
  3 | num_classes = 80
  4 | 
  5 | dataset_type = 'CocoDataset'
  6 | data_root = 'datasets/coco/coco_2017/'
  7 | 
  8 | img_norm_cfg = dict(
  9 |     mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], to_rgb=False)         # for r50-caffe
 10 | train_pipeline = [
 11 |     dict(type='LoadImageFromFile'),
 12 |     dict(type='LoadAnnotations', with_bbox=True),
 13 |     dict(type='Resize', 
 14 |                 img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
 15 |                            (608, 1333), (640, 1333), (672, 1333), (704, 1333),
 16 |                            (736, 1333), (768, 1333), (800, 1333)],
 17 |                 multiscale_mode='value',
 18 |                 keep_ratio=True),
 19 |     dict(type='RandomFlip', flip_ratio=0.5),
 20 |     dict(type='Normalize', **img_norm_cfg),
 21 |     dict(type='Pad', size_divisor=32),
 22 |     dict(type='DefaultFormatBundle'),
 23 |     dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
 24 | ]
 25 | test_pipeline = [
 26 |     dict(type='LoadImageFromFile'),
 27 |     dict(
 28 |         type='MultiScaleFlipAug',
 29 |         img_scale=(1333, 800),
 30 |         flip=False,
 31 |         transforms=[
 32 |             dict(type='Resize', keep_ratio=True),
 33 |             dict(type='RandomFlip'),
 34 |             dict(type='Normalize', **img_norm_cfg),
 35 |             dict(type='Pad', size_divisor=32),
 36 |             dict(type='ImageToTensor', keys=['img']),
 37 |             dict(type='Collect', keys=['img']),
 38 |         ])
 39 | ]
 40 | 
 41 | data = dict(
 42 |     samples_per_gpu=samples_per_gpu,
 43 |     workers_per_gpu=samples_per_gpu,
 44 |     train=dict(
 45 |         type=dataset_type,
 46 |         ann_file=data_root + 'annotations/instances_train2017.json',
 47 |         img_prefix=data_root + 'train2017/',
 48 |         pipeline=train_pipeline),
 49 |     val=dict(
 50 |         type=dataset_type,
 51 |         ann_file=data_root + 'annotations/instances_val2017.json',
 52 |         img_prefix=data_root + 'val2017/',
 53 |         pipeline=test_pipeline),
 54 |     test=dict(
 55 |         type=dataset_type,
 56 |         ann_file=data_root + 'annotations/instances_val2017.json',
 57 |         img_prefix=data_root + 'val2017/',
 58 |         pipeline=test_pipeline)
 59 |     )
 60 | evaluation = dict(interval=1, metric='bbox')
 61 | 
 62 | 
 63 | model = dict(
 64 |     type='ATSS',
 65 |     pretrained='open-mmlab://detectron/resnet50_caffe',
 66 |     backbone=dict(
 67 |         type='ResNet',
 68 |         depth=50,
 69 |         num_stages=4,
 70 |         out_indices=(0, 1, 2, 3),
 71 |         frozen_stages=1,
 72 |         norm_cfg=dict(type='BN', requires_grad=False),
 73 |         norm_eval=True,
 74 |         style='caffe'),
 75 |     neck=dict(
 76 |         type='FPN',
 77 |         in_channels=[256, 512, 1024, 2048],
 78 |         out_channels=256,
 79 |         start_level=1,
 80 |         add_extra_convs='on_output',
 81 |         num_outs=5),
 82 |     bbox_head=dict(
 83 |         type='ATSSHead',
 84 |         num_classes=num_classes,
 85 |         in_channels=256,
 86 |         stacked_convs=4,
 87 |         feat_channels=256,
 88 |         anchor_generator=dict(
 89 |             type='AnchorGenerator',
 90 |             ratios=[1.0],
 91 |             octave_base_scale=8,
 92 |             scales_per_octave=1,
 93 |             strides=[8, 16, 32, 64, 128]),
 94 |         bbox_coder=dict(
 95 |             type='DeltaXYWHBBoxCoder',
 96 |             target_means=[.0, .0, .0, .0],
 97 |             target_stds=[0.1, 0.1, 0.2, 0.2]),
 98 |         loss_cls=dict(
 99 |             type='FocalLoss',
100 |             use_sigmoid=True,
101 |             gamma=2.0,
102 |             alpha=0.25,
103 |             loss_weight=1.0),
104 |         loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
105 |         loss_centerness=dict(
106 |             type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)))
107 | # training and testing settings
108 | train_cfg = dict(
109 |     assigner=dict(type='ATSSAssigner', topk=9),
110 |     allowed_border=-1,
111 |     pos_weight=-1,
112 |     debug=False)
113 | test_cfg = dict(
114 |     nms_pre=1000,
115 |     min_bbox_size=0,
116 |     score_thr=0.05,
117 |     nms=dict(type='nms', iou_thr=0.6),
118 |     max_per_img=100)
119 | 
120 | # optimizer
121 | lr = samples_per_gpu * num_gpus / 16 * 0.01  # linear scaling: 0.01 at a total batch size of 16
122 | optimizer = dict(type='SGD', lr=lr, momentum=0.9, weight_decay=0.0001)
123 | optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
124 | 
125 | 
126 | # learning policy
127 | lr_config = dict(
128 |     policy='step',
129 |     warmup='linear',
130 |     warmup_iters=500,
131 |     warmup_ratio=1.0 / 3,
132 |     step=[27, 33])
133 | total_epochs = 36
134 | 
135 | checkpoint_config = dict(interval=1)
136 | # yapf:disable
137 | log_config = dict(
138 |     interval=50,
139 |     hooks=[
140 |         dict(type='TextLoggerHook'),
141 |         # dict(type='TensorboardLoggerHook')
142 |     ])
143 | # yapf:enable
144 | dist_params = dict(backend='nccl')
145 | log_level = 'INFO'
146 | load_from = None
147 | resume_from = None
148 | workflow = [('train', 1)]
149 | 
150 | find_unused_parameters = True
151 | 


--------------------------------------------------------------------------------
/configs/FCOS/fcos_r50-caffe_coco_ms_1x.py:
--------------------------------------------------------------------------------
  1 | samples_per_gpu = 2
  2 | num_gpus = 8
  3 | num_classes = 80
  4 | 
  5 | dataset_type = 'CocoDataset'
  6 | data_root = 'datasets/coco/coco_2017/'
  7 | 
  8 | img_norm_cfg = dict(
  9 |     mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], to_rgb=False)         
 10 | train_pipeline = [
 11 |     dict(type='LoadImageFromFile'),
 12 |     dict(type='LoadAnnotations', with_bbox=True),
 13 |     dict(type='Resize', 
 14 |                 img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
 15 |                            (608, 1333), (640, 1333), (672, 1333), (704, 1333),
 16 |                            (736, 1333), (768, 1333), (800, 1333)],
 17 |                 multiscale_mode='value',
 18 |                 keep_ratio=True),
 19 |     dict(type='RandomFlip', flip_ratio=0.5),
 20 |     dict(type='Normalize', **img_norm_cfg),
 21 |     dict(type='Pad', size_divisor=32),
 22 |     dict(type='DefaultFormatBundle'),
 23 |     dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
 24 | ]
 25 | test_pipeline = [
 26 |     dict(type='LoadImageFromFile'),
 27 |     dict(
 28 |         type='MultiScaleFlipAug',
 29 |         img_scale=(1333, 800),
 30 |         flip=False,
 31 |         transforms=[
 32 |             dict(type='Resize', keep_ratio=True),
 33 |             dict(type='RandomFlip'),
 34 |             dict(type='Normalize', **img_norm_cfg),
 35 |             dict(type='Pad', size_divisor=32),
 36 |             dict(type='ImageToTensor', keys=['img']),
 37 |             dict(type='Collect', keys=['img']),
 38 |         ])
 39 | ]
 40 | 
 41 | data = dict(
 42 |     samples_per_gpu=samples_per_gpu,
 43 |     workers_per_gpu=samples_per_gpu,
 44 |     train=dict(
 45 |         type=dataset_type,
 46 |         ann_file=data_root + 'annotations/instances_train2017.json',
 47 |         img_prefix=data_root + 'train2017/',
 48 |         pipeline=train_pipeline),
 49 |     val=dict(
 50 |         type=dataset_type,
 51 |         ann_file=data_root + 'annotations/instances_val2017.json',
 52 |         img_prefix=data_root + 'val2017/',
 53 |         pipeline=test_pipeline),
 54 |     test=dict(
 55 |         type=dataset_type,
 56 |         ann_file=data_root + 'annotations/instances_val2017.json',
 57 |         img_prefix=data_root + 'val2017/',
 58 |         pipeline=test_pipeline)
 59 |     )
 60 | evaluation = dict(interval=1, metric='bbox')
 61 | 
 62 | model = dict(
 63 |     type='FCOS',
 64 |     pretrained='open-mmlab://detectron/resnet50_caffe',
 65 |     backbone=dict(
 66 |         type='ResNet',
 67 |         depth=50,
 68 |         num_stages=4,
 69 |         out_indices=(0, 1, 2, 3),
 70 |         frozen_stages=1,
 71 |         norm_cfg=dict(type='BN', requires_grad=False),   
 72 |         norm_eval=True,
 73 |         style='caffe'),
 74 |     neck=dict(
 75 |         type='FPN',
 76 |         in_channels=[256, 512, 1024, 2048],
 77 |         out_channels=256,
 78 |         start_level=1,
 79 |         add_extra_convs=True,
 80 |         extra_convs_on_inputs=False,
 81 |         num_outs=5,
 82 |         relu_before_extra_convs=True),
 83 | 
 84 |     bbox_head=dict(
 85 |         type='FCOSHead',
 86 |         num_classes=num_classes,
 87 |         in_channels=256,
 88 |         stacked_convs=4,
 89 |         feat_channels=256,
 90 |         strides=[8, 16, 32, 64, 128],
 91 |         regress_ranges=((-1, 64), (64, 128), (128, 256), (256, 512),
 92 |                                  (512, 10000)),
 93 |         center_sampling=True,
 94 |         center_sample_radius=1.5,
 95 |         norm_on_bbox=True,
 96 |         centerness_on_reg=True,        
 97 |         norm_cfg=dict(type='GN', num_groups=32, requires_grad=True),
 98 |         loss_cls=dict(
 99 |                 type='FocalLoss',
100 |                 use_sigmoid=True,
101 |                 gamma=2.0,
102 |                 alpha=0.25,
103 |                 loss_weight=1.0),
104 |         loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
105 |         loss_centerness=dict(
106 |                      type='CrossEntropyLoss',
107 |                      use_sigmoid=True,
108 |                      loss_weight=1.0),
109 |         ))
110 | 
111 | # training and testing settings
112 | train_cfg = dict()
113 | test_cfg = dict(
114 |     nms_pre=1000,
115 |     min_bbox_size=0,
116 |     score_thr=0.05,
117 |     nms=dict(type='nms', iou_thr=0.6),
118 |     max_per_img=100)
119 | 
120 | 
121 | # optimizer
122 | lr = samples_per_gpu * num_gpus / 16 * 0.01  # linear scaling: 0.01 at a total batch size of 16
123 | optimizer = dict(type='SGD', lr=lr, momentum=0.9, weight_decay=0.0001)
124 | optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
125 | 
126 | # learning policy
127 | lr_config = dict(
128 |     policy='step',
129 |     warmup='linear',
130 |     warmup_iters=500,
131 |     warmup_ratio=1.0 / 3,
132 |     step=[8, 11])
133 | total_epochs = 12
134 | 
135 | checkpoint_config = dict(interval=1)
136 | # yapf:disable
137 | log_config = dict(
138 |     interval=50,
139 |     hooks=[
140 |         dict(type='TextLoggerHook'),
141 |         # dict(type='TensorboardLoggerHook')
142 |     ])
143 | # yapf:enable
144 | dist_params = dict(backend='nccl')
145 | log_level = 'INFO'
146 | load_from = None
147 | resume_from = None
148 | workflow = [('train', 1)]
149 | 
150 | find_unused_parameters = True
151 | 
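
The `nms=dict(type='nms', iou_thr=0.6)` entry in `test_cfg` is the heuristic post-processing step that the paper sets out to remove. For reference, here is a minimal pure-Python sketch of greedy NMS with the same IoU threshold. It is a textbook version for illustration, not the compiled operator that mmdetection calls, and the function names are made up for this example.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_thr=0.6):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))
```

In this toy input the second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed at the 0.6 threshold while the distant third box survives.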


--------------------------------------------------------------------------------
/configs/FCOS/fcos_r50-caffe_coco_ms_2x.py:
--------------------------------------------------------------------------------
  1 | samples_per_gpu = 2
  2 | num_gpus = 8
  3 | num_classes = 80
  4 | 
  5 | dataset_type = 'CocoDataset'
  6 | data_root = 'datasets/coco/coco_2017/'
  7 | 
  8 | img_norm_cfg = dict(
  9 |     mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], to_rgb=False)         
 10 | train_pipeline = [
 11 |     dict(type='LoadImageFromFile'),
 12 |     dict(type='LoadAnnotations', with_bbox=True),
 13 |     dict(type='Resize', 
 14 |                 img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
 15 |                            (608, 1333), (640, 1333), (672, 1333), (704, 1333),
 16 |                            (736, 1333), (768, 1333), (800, 1333)],
 17 |                 multiscale_mode='value',
 18 |                 keep_ratio=True),
 19 |     dict(type='RandomFlip', flip_ratio=0.5),
 20 |     dict(type='Normalize', **img_norm_cfg),
 21 |     dict(type='Pad', size_divisor=32),
 22 |     dict(type='DefaultFormatBundle'),
 23 |     dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
 24 | ]
 25 | test_pipeline = [
 26 |     dict(type='LoadImageFromFile'),
 27 |     dict(
 28 |         type='MultiScaleFlipAug',
 29 |         img_scale=(1333, 800),
 30 |         flip=False,
 31 |         transforms=[
 32 |             dict(type='Resize', keep_ratio=True),
 33 |             dict(type='RandomFlip'),
 34 |             dict(type='Normalize', **img_norm_cfg),
 35 |             dict(type='Pad', size_divisor=32),
 36 |             dict(type='ImageToTensor', keys=['img']),
 37 |             dict(type='Collect', keys=['img']),
 38 |         ])
 39 | ]
 40 | 
 41 | data = dict(
 42 |     samples_per_gpu=samples_per_gpu,
 43 |     workers_per_gpu=samples_per_gpu,
 44 |     train=dict(
 45 |         type=dataset_type,
 46 |         ann_file=data_root + 'annotations/instances_train2017.json',
 47 |         img_prefix=data_root + 'train2017/',
 48 |         pipeline=train_pipeline),
 49 |     val=dict(
 50 |         type=dataset_type,
 51 |         ann_file=data_root + 'annotations/instances_val2017.json',
 52 |         img_prefix=data_root + 'val2017/',
 53 |         pipeline=test_pipeline),
 54 |     test=dict(
 55 |         type=dataset_type,
 56 |         ann_file=data_root + 'annotations/instances_val2017.json',
 57 |         img_prefix=data_root + 'val2017/',
 58 |         pipeline=test_pipeline)
 59 |     )
 60 | evaluation = dict(interval=1, metric='bbox')
 61 | 
 62 | model = dict(
 63 |     type='FCOS',
 64 |     pretrained='open-mmlab://detectron/resnet50_caffe',
 65 |     backbone=dict(
 66 |         type='ResNet',
 67 |         depth=50,
 68 |         num_stages=4,
 69 |         out_indices=(0, 1, 2, 3),
 70 |         frozen_stages=1,
 71 |         norm_cfg=dict(type='BN', requires_grad=False),   
 72 |         norm_eval=True,
 73 |         style='caffe'),
 74 |     neck=dict(
 75 |         type='FPN',
 76 |         in_channels=[256, 512, 1024, 2048],
 77 |         out_channels=256,
 78 |         start_level=1,
 79 |         add_extra_convs=True,
 80 |         extra_convs_on_inputs=False,
 81 |         num_outs=5,
 82 |         relu_before_extra_convs=True),
 83 | 
 84 |     bbox_head=dict(
 85 |         type='FCOSHead',
 86 |         num_classes=num_classes,
 87 |         in_channels=256,
 88 |         stacked_convs=4,
 89 |         feat_channels=256,
 90 |         strides=[8, 16, 32, 64, 128],
 91 |         regress_ranges=((-1, 64), (64, 128), (128, 256), (256, 512),
 92 |                                  (512, 10000)),
 93 |         center_sampling=True,
 94 |         center_sample_radius=1.5,
 95 |         norm_on_bbox=True,
 96 |         centerness_on_reg=True,        
 97 |         norm_cfg=dict(type='GN', num_groups=32, requires_grad=True),
 98 |         loss_cls=dict(
 99 |                 type='FocalLoss',
100 |                 use_sigmoid=True,
101 |                 gamma=2.0,
102 |                 alpha=0.25,
103 |                 loss_weight=1.0),
104 |         loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
105 |         loss_centerness=dict(
106 |                      type='CrossEntropyLoss',
107 |                      use_sigmoid=True,
108 |                      loss_weight=1.0),
109 |         ))
110 | 
111 | # training and testing settings
112 | train_cfg = dict()
113 | test_cfg = dict(
114 |     nms_pre=1000,
115 |     min_bbox_size=0,
116 |     score_thr=0.05,
117 |     nms=dict(type='nms', iou_thr=0.6),
118 |     max_per_img=100)
119 | 
120 | 
121 | # optimizer
122 | lr = samples_per_gpu * num_gpus / 16 * 0.01  # linear scaling: 0.01 at a total batch size of 16
123 | optimizer = dict(type='SGD', lr=lr, momentum=0.9, weight_decay=0.0001)
124 | optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
125 | 
126 | # learning policy
127 | lr_config = dict(
128 |     policy='step',
129 |     warmup='linear',
130 |     warmup_iters=500,
131 |     warmup_ratio=1.0 / 3,
132 |     step=[18, 22])
133 | total_epochs = 24
134 | 
135 | checkpoint_config = dict(interval=1)
136 | # yapf:disable
137 | log_config = dict(
138 |     interval=50,
139 |     hooks=[
140 |         dict(type='TextLoggerHook'),
141 |         # dict(type='TensorboardLoggerHook')
142 |     ])
143 | # yapf:enable
144 | dist_params = dict(backend='nccl')
145 | log_level = 'INFO'
146 | load_from = None
147 | resume_from = None
148 | workflow = [('train', 1)]
149 | 
150 | find_unused_parameters = True
151 | 


--------------------------------------------------------------------------------
/configs/FCOS/fcos_r50-caffe_coco_ms_3x.py:
--------------------------------------------------------------------------------
  1 | samples_per_gpu = 2
  2 | num_gpus = 8
  3 | num_classes = 80
  4 | 
  5 | dataset_type = 'CocoDataset'
  6 | data_root = 'datasets/coco/coco_2017/'
  7 | 
  8 | img_norm_cfg = dict(
  9 |     mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], to_rgb=False)
 10 | train_pipeline = [
 11 |     dict(type='LoadImageFromFile'),
 12 |     dict(type='LoadAnnotations', with_bbox=True),
 13 |     dict(type='Resize',
 14 |          img_scale=[(480, 1333), (512, 1333), (544, 1333), (576, 1333),
 15 |                     (608, 1333), (640, 1333), (672, 1333), (704, 1333),
 16 |                     (736, 1333), (768, 1333), (800, 1333)],
 17 |          multiscale_mode='value',
 18 |          keep_ratio=True),
 19 |     dict(type='RandomFlip', flip_ratio=0.5),
 20 |     dict(type='Normalize', **img_norm_cfg),
 21 |     dict(type='Pad', size_divisor=32),
 22 |     dict(type='DefaultFormatBundle'),
 23 |     dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
 24 | ]
 25 | test_pipeline = [
 26 |     dict(type='LoadImageFromFile'),
 27 |     dict(
 28 |         type='MultiScaleFlipAug',
 29 |         img_scale=(1333, 800),
 30 |         flip=False,
 31 |         transforms=[
 32 |             dict(type='Resize', keep_ratio=True),
 33 |             dict(type='RandomFlip'),
 34 |             dict(type='Normalize', **img_norm_cfg),
 35 |             dict(type='Pad', size_divisor=32),
 36 |             dict(type='ImageToTensor', keys=['img']),
 37 |             dict(type='Collect', keys=['img']),
 38 |         ])
 39 | ]
 40 | 
 41 | data = dict(
 42 |     samples_per_gpu=samples_per_gpu,
 43 |     workers_per_gpu=samples_per_gpu,
 44 |     train=dict(
 45 |         type=dataset_type,
 46 |         ann_file=data_root + 'annotations/instances_train2017.json',
 47 |         img_prefix=data_root + 'train2017/',
 48 |         pipeline=train_pipeline),
 49 |     val=dict(
 50 |         type=dataset_type,
 51 |         ann_file=data_root + 'annotations/instances_val2017.json',
 52 |         img_prefix=data_root + 'val2017/',
 53 |         pipeline=test_pipeline),
 54 |     test=dict(
 55 |         type=dataset_type,
 56 |         ann_file=data_root + 'annotations/instances_val2017.json',
 57 |         img_prefix=data_root + 'val2017/',
 58 |         pipeline=test_pipeline)
 59 |     )
 60 | evaluation = dict(interval=1, metric='bbox')
 61 | 
 62 | model = dict(
 63 |     type='FCOS',
 64 |     pretrained='open-mmlab://detectron/resnet50_caffe',
 65 |     backbone=dict(
 66 |         type='ResNet',
 67 |         depth=50,
 68 |         num_stages=4,
 69 |         out_indices=(0, 1, 2, 3),
 70 |         frozen_stages=1,
 71 |         norm_cfg=dict(type='BN', requires_grad=False),
 72 |         norm_eval=True,
 73 |         style='caffe'),
 74 |     neck=dict(
 75 |         type='FPN',
 76 |         in_channels=[256, 512, 1024, 2048],
 77 |         out_channels=256,
 78 |         start_level=1,
 79 |         add_extra_convs=True,
 80 |         extra_convs_on_inputs=False,
 81 |         num_outs=5,
 82 |         relu_before_extra_convs=True),
 83 | 
 84 |     bbox_head=dict(
 85 |         type='FCOSHead',
 86 |         num_classes=num_classes,
 87 |         in_channels=256,
 88 |         stacked_convs=4,
 89 |         feat_channels=256,
 90 |         strides=[8, 16, 32, 64, 128],
 91 |         regress_ranges=((-1, 64), (64, 128), (128, 256), (256, 512),
 92 |                         (512, 10000)),
 93 |         center_sampling=True,
 94 |         center_sample_radius=1.5,
 95 |         norm_on_bbox=True,
 96 |         centerness_on_reg=True,
 97 |         norm_cfg=dict(type='GN', num_groups=32, requires_grad=True),
 98 |         loss_cls=dict(
 99 |             type='FocalLoss',
100 |             use_sigmoid=True,
101 |             gamma=2.0,
102 |             alpha=0.25,
103 |             loss_weight=1.0),
104 |         loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
105 |         loss_centerness=dict(
106 |             type='CrossEntropyLoss',
107 |             use_sigmoid=True,
108 |             loss_weight=1.0),
109 |         ))
110 | 
111 | # training and testing settings
112 | train_cfg = dict()
113 | test_cfg = dict(
114 |     nms_pre=1000,
115 |     min_bbox_size=0,
116 |     score_thr=0.05,
117 |     nms=dict(type='nms', iou_thr=0.6),
118 |     max_per_img=100)
119 | 
120 | 
121 | # optimizer (linear scaling rule: lr = 0.01 at a total batch size of 16)
122 | lr = samples_per_gpu * num_gpus / 16 * 0.01
123 | optimizer = dict(type='SGD', lr=lr, momentum=0.9, weight_decay=0.0001)
124 | optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
125 | 
126 | # learning policy: 3x schedule, step decay at epochs 27 and 33 of 36
127 | lr_config = dict(
128 |     policy='step',
129 |     warmup='linear',
130 |     warmup_iters=500,
131 |     warmup_ratio=1.0 / 3,
132 |     step=[27, 33])
133 | total_epochs = 36
134 | 
135 | checkpoint_config = dict(interval=1)
136 | # yapf:disable
137 | log_config = dict(
138 |     interval=50,
139 |     hooks=[
140 |         dict(type='TextLoggerHook'),
141 |         # dict(type='TensorboardLoggerHook')
142 |     ])
143 | # yapf:enable
144 | dist_params = dict(backend='nccl')
145 | log_level = 'INFO'
146 | load_from = None
147 | resume_from = None
148 | workflow = [('train', 1)]
149 | 
150 | find_unused_parameters = True
151 | 


--------------------------------------------------------------------------------