├── README.md ├── datasets.py ├── detection ├── README.md ├── configs │ ├── _base_ │ │ ├── datasets │ │ │ ├── cityscapes_detection.py │ │ │ ├── cityscapes_instance.py │ │ │ ├── coco_detection.py │ │ │ ├── coco_instance.py │ │ │ ├── coco_instance_semantic.py │ │ │ ├── deepfashion.py │ │ │ ├── lvis_v0.5_instance.py │ │ │ ├── lvis_v1_instance.py │ │ │ ├── voc0712.py │ │ │ └── wider_face.py │ │ ├── default_runtime.py │ │ ├── models │ │ │ ├── cascade_mask_rcnn_r50_fpn.py │ │ │ ├── cascade_rcnn_r50_fpn.py │ │ │ ├── fast_rcnn_r50_fpn.py │ │ │ ├── faster_rcnn_r50_caffe_c4.py │ │ │ ├── faster_rcnn_r50_caffe_dc5.py │ │ │ ├── faster_rcnn_r50_fpn.py │ │ │ ├── mask_rcnn_r50_caffe_c4.py │ │ │ ├── mask_rcnn_r50_fpn.py │ │ │ ├── retinanet_r50_fpn.py │ │ │ ├── rpn_r50_caffe_c4.py │ │ │ ├── rpn_r50_fpn.py │ │ │ └── ssd300.py │ │ └── schedules │ │ │ ├── schedule_1x.py │ │ │ ├── schedule_20e.py │ │ │ └── schedule_2x.py │ ├── mask_rcnn_p2t_b_fpn_1x_coco.py │ ├── mask_rcnn_p2t_l_fpn_1x_coco.py │ ├── mask_rcnn_p2t_s_fpn_1x_coco.py │ ├── mask_rcnn_p2t_t_fpn_1x_coco.py │ ├── retinanet_p2t_b_fpn_1x_coco.py │ ├── retinanet_p2t_l_fpn_1x_coco.py │ ├── retinanet_p2t_s_fpn_1x_coco.py │ └── retinanet_p2t_t_fpn_1x_coco.py ├── dist_test.sh ├── dist_train.sh ├── p2t.py ├── test.py └── train.py ├── engine.py ├── figures └── p2t-arch.jpg ├── hubconf.py ├── main.py ├── mcloader ├── __init__.py ├── classification.py ├── data_prefetcher.py ├── image_list.py ├── imagenet.py └── mcloader.py ├── p2t.py ├── samplers.py ├── segmentation ├── README.md ├── align_resize.py ├── configs │ ├── _base_ │ │ ├── datasets │ │ │ ├── ade20k.py │ │ │ ├── chase_db1.py │ │ │ ├── cityscapes.py │ │ │ ├── drive.py │ │ │ ├── hrf.py │ │ │ ├── pascal_context.py │ │ │ ├── pascal_voc12.py │ │ │ ├── pascal_voc12_aug.py │ │ │ └── stare.py │ │ ├── default_runtime.py │ │ ├── models │ │ │ ├── fpn_r50.py │ │ │ └── upernet_r50.py │ │ └── schedules │ │ │ ├── schedule_160k.py │ │ │ ├── schedule_20k.py │ │ │ ├── schedule_40k.py │ │ │ └── schedule_80k.py │ ├── sem_fpn_p2t_b_ade20k_80k.py │ ├── sem_fpn_p2t_l_ade20k_80k.py │ ├── sem_fpn_p2t_s_ade20k_80k.py │ └── sem_fpn_p2t_t_ade20k_80k.py ├── dist_test.sh ├── dist_train.sh ├── p2t.py ├── test.py └── train.py └── utils.py /README.md: -------------------------------------------------------------------------------- 1 | ## [TPAMI22] Pyramid Pooling Transformer for Scene Understanding 2 | 3 | This is the official repository for Pyramid Pooling Transformer (P2T). This repository contains: 4 | 5 | * [x] Full code for training/testing 6 | * [x] Pretrained models for image classification, object detection, and semantic segmentation. 7 | 8 | Related links: 9 | [[Official PDF Download]](https://mmcheng.net/wp-content/uploads/2022/09/22TPAMI-P2T.pdf) 10 | [[Full Chinese translation]](https://mmcheng.net/wp-content/uploads/2022/08/22PAMI_P2T_CN.pdf) 11 | [[5-minute explanation in Chinese]](https://mp.weixin.qq.com/s/7qXtyFaIiYny0eUqBbPraQ) 12 | 13 | ### Requirements 14 | 15 | * torch>=1.7 16 | * torchvision>=0.7.0 17 | * timm>=0.3.2 18 | 19 | Validated with PyTorch 1.6/1.7/1.8 and timm 0.3.2/0.4.12 20 | 21 | ### Introduction 22 | 23 | Pyramid Pooling Transformer (P2T) is a new-generation backbone network, benefiting many fundamental downstream vision tasks such as object detection, semantic segmentation, and instance segmentation.
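In each P2T attention layer, the flattened feature map is pyramid-pooled into a short multi-scale token sequence that serves as the keys and values for multi-head self-attention. Below is a minimal, illustrative PyTorch sketch of this pooling-based attention; the module name, pooling ratios, and dimensions are assumptions for illustration only, and the actual implementation (which includes further details omitted here) lives in `p2t.py`:

````python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidPoolingAttention(nn.Module):
    """Minimal sketch: self-attention whose keys/values come from
    pyramid-pooled (multi-scale) versions of the input feature map."""

    def __init__(self, dim, num_heads=2, pool_ratios=(12, 16, 20, 24)):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.pool_ratios = pool_ratios
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, dim * 2)
        self.proj = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, H, W):
        # x: (B, N, C) tokens of a feature map with spatial size H x W (N = H * W)
        B, N, C = x.shape
        q = self.q(x).reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)

        # Pyramid pooling: average-pool the feature map at several output sizes,
        # flatten each pooled map, and concatenate into a short multi-scale
        # sequence (length M << N) that provides the keys and values.
        feat = x.transpose(1, 2).reshape(B, C, H, W)
        pooled = []
        for ratio in self.pool_ratios:
            out_size = (max(1, round(H / ratio)), max(1, round(W / ratio)))
            pooled.append(F.adaptive_avg_pool2d(feat, out_size).flatten(2))
        pooled = self.norm(torch.cat(pooled, dim=2).transpose(1, 2))  # (B, M, C)

        kv = self.kv(pooled).reshape(B, -1, 2, self.num_heads, self.head_dim)
        k, v = kv.permute(2, 0, 3, 1, 4).unbind(0)   # each: (B, heads, M, head_dim)

        attn = (q @ k.transpose(-2, -1)) * self.scale  # (B, heads, N, M)
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
````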
24 | 25 | Although pyramid pooling has demonstrated its power on many downstream tasks such as object detection (SPP) and semantic segmentation (PSPNet), it had not been explored in backbone networks, which serve as the cornerstone of many downstream vision tasks. 26 | P2T is the first to bridge this gap between pyramid pooling and backbone networks. 27 | The core idea of P2T is to adapt pyramid pooling to downsample the flattened token sequence when computing self-attention, 28 | simultaneously reducing the sequence length and capturing powerful multi-scale contextual features. 29 | Pyramid pooling is also very efficient, introducing only negligible computational cost. 30 | 31 | In the experiments, P2T outperforms CNN and transformer competitors such as ResNet, ResNeXt, Res2Net, PVT, Swin, Twins, and PVTv2 on image classification, semantic segmentation, object detection, and instance segmentation. 32 | 33 | 34 | 35 | ### Image Classification 36 | 37 | | Variants | Input Size | Top-1 Acc | Top-5 Acc | #Params (M) | GFLOPs | Google Drive | 38 | |:---------------:|:---------:|:-----:|:-----:|:-----------:|:-----------------:|-----------------| 39 | | P2T-Tiny | 224 x 224 | 79.8 | 94.9 | 11.6 | 1.8 | [[weights]](https://drive.google.com/file/d/1x9EweWx77pXrHOCc7RJF3sYK2rht0_4m/view?usp=sharing)\|[[log]](https://drive.google.com/file/d/1CDofCg9pi0Cyiha_dIimggF228M5mOeH/view?usp=sharing) | 40 | | P2T-Small | 224 x 224 | 82.4 | 96.0 | 24.1 | 3.7 | [[weights]](https://drive.google.com/file/d/1FlwhyVKw0zqj2mux248gIQFQ8DGPi8rS/view?usp=sharing)\|[[log]](https://drive.google.com/file/d/1bCZz7y0I0EEw74KaVg5iAr3hBYtSIEii/view?usp=sharing) | 41 | | P2T-Base | 224 x 224 | 83.5 | 96.6 | 36.2 | 6.5 | [[weights]](https://drive.google.com/file/d/1iZoWexUTPUDSIZiJHNRt2zZl2kFj68F4/view?usp=sharing)\|[[log]](https://drive.google.com/file/d/13_XaX0XtYSzPatVl54ihFbEwflHLVvsl/view?usp=sharing) | 42 | | P2T-Large | 224 x 224 | 83.9 | 96.7 | 54.5 | 9.8 | [[weights]](https://drive.google.com/file/d/13jBJ7ShDJd1juViC-zPtfLXYPRwkNfya/view?usp=sharing)\|[[log]](https://drive.google.com/file/d/1-RLjGzez-_O2_8obbXvUYGhWacPnqK1U/view?usp=sharing) | 43 | 44 | All models are trained on the ImageNet-1K dataset.
You can find all weights and logs at this link: [[Google Drive]](https://drive.google.com/drive/folders/1Osweqc1OphwtWONXIgD20q9_I2arT9yz?usp=sharing) 45 | [BaiduPan, extraction code: yhwu](https://pan.baidu.com/s/1JkE62CS9EoSTLW1M1Ajmxw?pwd=yhwu) 46 | 47 | 48 | 49 | ### Semantic Segmentation 50 | 51 | #### ADE20K (val set) 52 | 53 | | Base Model | Variants | mIoU | aAcc | mAcc | #Params (M) | GFLOPs | Google Drive | 54 | | :--: | :-------: | :--: | :--: | :---------: | :------: | :----------------------------------------------------------: | :----------------------------------------------------------: | 55 | | Semantic FPN | P2T-Tiny | 43.4 | 80.8 | 54.5 | 15.4 | 31.6 | [[weights & logs]](https://drive.google.com/drive/folders/1SH9zmdGKvnpFBVU3dXS6-TZT04CZgkX9?usp=sharing) | 56 | | Semantic FPN | P2T-Small | 46.7 | 82.0 | 58.4 | 27.8 | 42.7 | [[weights & logs]](https://drive.google.com/drive/folders/1SH9zmdGKvnpFBVU3dXS6-TZT04CZgkX9?usp=sharing) | 57 | | Semantic FPN | P2T-Base | 48.7 | 82.9 | 60.7 | 39.8 | 58.5 | [[weights & logs]](https://drive.google.com/drive/folders/1SH9zmdGKvnpFBVU3dXS6-TZT04CZgkX9?usp=sharing) | 58 | | Semantic FPN | P2T-Large | 49.4 | 83.3 | 61.9 | 58.1 | 77.7 | [[weights & logs]](https://drive.google.com/drive/folders/1SH9zmdGKvnpFBVU3dXS6-TZT04CZgkX9?usp=sharing) | 59 | 60 | For the training and validation scripts, please refer to the `segmentation` folder. 61 | 62 | BaiduPan download link: [BaiduPan, extraction code: yhwu](https://pan.baidu.com/s/1JkE62CS9EoSTLW1M1Ajmxw?pwd=yhwu) 63 | 64 | ### Object Detection 65 | 66 | Evaluated on the COCO validation set. 67 | 68 | 69 | | Base Model | Variants | AP | AP@0.5 | AP@0.75 | #Params (M) | GFLOPs | 70 | | :--------: | :-------: | :--: | :----: | :-----: | :---------: | :------: | 71 | | RetinaNet | P2T-Tiny | 41.3 | 62.0 | 44.1 | 21.1 | 206 | 72 | | RetinaNet | P2T-Small | 44.4 | 65.3 | 47.6 | 33.8 | 260 | 73 | | RetinaNet | P2T-Base | 46.1 | 67.5 | 49.6 | 45.8 | 344 | 74 | | RetinaNet | P2T-Large | 47.2 | 68.4 | 50.9 | 64.4 | 449 | 75 | 76 | Use this address to access all pretrained weights and logs: [[Google Drive]](https://drive.google.com/drive/folders/1fcg7n3Ga8cYoT-3Ar0PeQXjAC3AnQYyY?usp=sharing) 77 | 78 | BaiduPan download link: [BaiduPan, extraction code: yhwu](https://pan.baidu.com/s/1JkE62CS9EoSTLW1M1Ajmxw?pwd=yhwu) 79 | 80 | 81 | ### Instance Segmentation 82 | 83 | Evaluated on the COCO validation set. 84 | 85 | 86 | | Base Model | Variants | APb | APb@0.5 | APm | APm@0.5 | #Params (M) | GFLOPs | 87 | | :--------: | :-------: | :--: | :-----: | :--: | :-----: | :---------: | :------: | 88 | | Mask R-CNN | P2T-Tiny | 43.3 | 65.7 | 39.6 | 62.5 | 31.3 | 225 | 89 | | Mask R-CNN | P2T-Small | 45.5 | 67.7 | 41.4 | 64.6 | 43.7 | 279 | 90 | | Mask R-CNN | P2T-Base | 47.2 | 69.3 | 42.7 | 66.1 | 55.7 | 363 | 91 | | Mask R-CNN | P2T-Large | 48.3 | 70.2 | 43.5 | 67.3 | 74.0 | 467 | 92 | 93 | `APb` denotes the box AP metric, and `APm` denotes the mask AP metric.
94 | 95 | Use this address to access all pretrained weights and logs: [[Google Drive]](https://drive.google.com/drive/folders/1fcg7n3Ga8cYoT-3Ar0PeQXjAC3AnQYyY?usp=sharing) 96 | 97 | ### Train 98 | 99 | Use the following command to train `P2T-Small` with distributed training on 8 GPUs: 100 | 101 | ````bash 102 | python -m torch.distributed.launch --nproc_per_node=8 \ 103 | --master_port=$((RANDOM+10000)) --use_env main.py --data-path ${YOUR_DATA_PATH} --batch-size 128 --model p2t_small --drop-path 0.1 104 | # model names: --model p2t_tiny/p2t_small/p2t_base/p2t_large 105 | # with --drop-path 0.1/0.1/0.3/0.3, respectively 106 | # replace ${YOUR_DATA_PATH} with the path to your dataset, which should contain the train/ and val/ directories 107 | ```` 108 | 109 | ### Validate the performance 110 | 111 | Download the pretrained weights to the `pretrained` directory first. Then use the following command to validate the performance: 112 | 113 | ````bash 114 | python main.py --eval --resume pretrained/p2t_small.pth --model p2t_small 115 | ```` 116 | 117 | ### Citation 118 | 119 | If you are using the code/model/data provided here in a publication, please consider citing our work: 120 | 121 | ```` 122 | @ARTICLE{wu2022p2t, 123 | author={Wu, Yu-Huan and Liu, Yun and Zhan, Xin and Cheng, Ming-Ming}, 124 | journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 125 | title={{P2T}: Pyramid Pooling Transformer for Scene Understanding}, 126 | year={2022}, 127 | doi = {10.1109/tpami.2022.3202765}, 128 | } 129 | ```` 130 | 131 | ### Other Notes 132 | 133 | If you encounter any problems, please do not hesitate to contact us. 134 | Issues and discussions are welcome in the repository! 135 | You can also reach us by email: wuyuhuan@mail.nankai.edu.cn 136 | 137 | 138 | ### License 139 | 140 | This code is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License for non-commercial use only. Any commercial use requires formal permission first. 141 | 142 | -------------------------------------------------------------------------------- /datasets.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2015-present, Facebook, Inc. 2 | # All rights reserved.
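# This module builds the classification datasets and data transforms used by
# main.py: build_dataset() supports CIFAR-100, ImageNet-1K (optionally through
# the memcached-backed loader in mcloader/), and the iNaturalist 2018/2019
# annotation format, while build_transform() wraps timm's create_transform()
# for training and a standard resize + center-crop pipeline for evaluation.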
3 | import os 4 | import json 5 | 6 | from torchvision import datasets, transforms 7 | from torchvision.datasets.folder import ImageFolder, default_loader 8 | 9 | from timm.data.constants import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD 10 | from timm.data import create_transform 11 | from mcloader import ClassificationDataset 12 | 13 | 14 | class INatDataset(ImageFolder): 15 | def __init__(self, root, train=True, year=2018, transform=None, target_transform=None, 16 | category='name', loader=default_loader): 17 | self.transform = transform 18 | self.loader = loader 19 | self.target_transform = target_transform 20 | self.year = year 21 | # assert category in ['kingdom','phylum','class','order','supercategory','family','genus','name'] 22 | path_json = os.path.join(root, f'{"train" if train else "val"}{year}.json') 23 | with open(path_json) as json_file: 24 | data = json.load(json_file) 25 | 26 | with open(os.path.join(root, 'categories.json')) as json_file: 27 | data_catg = json.load(json_file) 28 | 29 | path_json_for_targeter = os.path.join(root, f"train{year}.json") 30 | 31 | with open(path_json_for_targeter) as json_file: 32 | data_for_targeter = json.load(json_file) 33 | 34 | targeter = {} 35 | indexer = 0 36 | for elem in data_for_targeter['annotations']: 37 | king = [] 38 | king.append(data_catg[int(elem['category_id'])][category]) 39 | if king[0] not in targeter.keys(): 40 | targeter[king[0]] = indexer 41 | indexer += 1 42 | self.nb_classes = len(targeter) 43 | 44 | self.samples = [] 45 | for elem in data['images']: 46 | cut = elem['file_name'].split('/') 47 | target_current = int(cut[2]) 48 | path_current = os.path.join(root, cut[0], cut[2], cut[3]) 49 | 50 | categors = data_catg[target_current] 51 | target_current_true = targeter[categors[category]] 52 | self.samples.append((path_current, target_current_true)) 53 | 54 | # __getitem__ and __len__ inherited from ImageFolder 55 | 56 | 57 | def build_dataset(is_train, args): 58 | transform = build_transform(is_train, args) 59 | 60 | if args.data_set == 'CIFAR': 61 | dataset = datasets.CIFAR100(args.data_path, train=is_train, transform=transform) 62 | nb_classes = 100 63 | elif args.data_set == 'IMNET': 64 | if not args.use_mcloader: 65 | root = os.path.join(args.data_path, 'train' if is_train else 'val') 66 | dataset = datasets.ImageFolder(root, transform=transform) 67 | else: 68 | dataset = ClassificationDataset( 69 | 'train' if is_train else 'val', 70 | pipeline=transform 71 | ) 72 | nb_classes = 1000 73 | elif args.data_set == 'INAT': 74 | dataset = INatDataset(args.data_path, train=is_train, year=2018, 75 | category=args.inat_category, transform=transform) 76 | nb_classes = dataset.nb_classes 77 | elif args.data_set == 'INAT19': 78 | dataset = INatDataset(args.data_path, train=is_train, year=2019, 79 | category=args.inat_category, transform=transform) 80 | nb_classes = dataset.nb_classes 81 | 82 | return dataset, nb_classes 83 | 84 | 85 | def build_transform(is_train, args): 86 | resize_im = args.input_size > 32 87 | if is_train: 88 | # this should always dispatch to transforms_imagenet_train 89 | transform = create_transform( 90 | input_size=args.input_size, 91 | is_training=True, 92 | color_jitter=args.color_jitter, 93 | auto_augment=args.aa, 94 | interpolation=args.train_interpolation, 95 | re_prob=args.reprob, 96 | re_mode=args.remode, 97 | re_count=args.recount, 98 | ) 99 | if not resize_im: 100 | # replace RandomResizedCropAndInterpolation with 101 | # RandomCrop 102 | transform.transforms[0] = transforms.RandomCrop( 103 | 
args.input_size, padding=4) 104 | return transform 105 | 106 | t = [] 107 | if resize_im: 108 | size = int((256 / 224) * args.input_size) 109 | t.append( 110 | transforms.Resize(size, interpolation=3), # to maintain same ratio w.r.t. 224 images 111 | ) 112 | t.append(transforms.CenterCrop(args.input_size)) 113 | 114 | t.append(transforms.ToTensor()) 115 | t.append(transforms.Normalize(IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD)) 116 | return transforms.Compose(t) 117 | -------------------------------------------------------------------------------- /detection/README.md: -------------------------------------------------------------------------------- 1 | ## [TPAMI22] Pyramid Pooling Transformer for Scene Understanding 2 | 3 | This folder contains the full training and test code for object detection and instance segmentation. 4 | 5 | ### Requirements 6 | 7 | * mmdetection == 2.14 8 | 9 | We originally trained each model with `mmdetection==2.8.0`. 10 | Since newer GPUs (e.g., the RTX 3000 series) would require compiling mmcv from source to support that early version, 11 | we have reorganized the configs to support a newer mmdetection version. 12 | Therefore, you can easily reproduce the results on newer GPUs. 13 | 14 | ### Data Preparation 15 | 16 | Put the MS COCO dataset files into `data/coco/`. 17 | 18 | ### Object Detection 19 | 20 | Evaluated on the COCO validation set. 21 | 22 | 23 | | Base Model | Variants | AP | AP@0.5 | AP@0.75 | #Params (M) | GFLOPs | 24 | | :--: | :-------: | :--: | :--: | :---------: | :------: | :----------------------------------------------------------: | 25 | | RetinaNet | P2T-Tiny | 41.3 | 62.0 | 44.1 | 21.1 | 206 | 26 | | RetinaNet | P2T-Small | 44.4 | 65.3 | 47.6 | 33.8 | 260 | 27 | | RetinaNet | P2T-Base | 46.1 | 67.5 | 49.6 | 45.8 | 344 | 28 | | RetinaNet | P2T-Large | 47.2 | 68.4 | 50.9 | 64.4 | 449 | 29 | 30 | Use this address to access all pretrained weights and logs: [[Google Drive]](https://drive.google.com/drive/folders/1fcg7n3Ga8cYoT-3Ar0PeQXjAC3AnQYyY?usp=sharing) 31 | 32 | ### Instance Segmentation 33 | 34 | Evaluated on the COCO validation set. 35 | 36 | 37 | | Base Model | Variants | APb | APb@0.5 | APm | APm@0.5 | #Params (M) | GFLOPs | 38 | | :--: | :-------: | :--: | :--: | :---------: | :------: | :----------------------------------------------------------: | :----------------------------------------------------------: | 39 | | Mask R-CNN | P2T-Tiny | 43.3 | 65.7 | 39.6 | 62.5 | 31.3 | 225 | 40 | | Mask R-CNN | P2T-Small | 45.5 | 67.7 | 41.4 | 64.6 | 43.7 | 279 | 41 | | Mask R-CNN | P2T-Base | 47.2 | 69.3 | 42.7 | 66.1 | 55.7 | 363 | 42 | | Mask R-CNN | P2T-Large | 48.3 | 70.2 | 43.5 | 67.3 | 74.0 | 467 | 43 | 44 | `APb` denotes the box AP metric, and `APm` denotes the mask AP metric. 45 | 46 | Use this address to access all pretrained weights and logs: [[Google Drive]](https://drive.google.com/drive/folders/1fcg7n3Ga8cYoT-3Ar0PeQXjAC3AnQYyY?usp=sharing) 47 | 48 | 49 | ### Train 50 | 51 | Before training, please make sure you have `mmdetection==2.14` installed and have downloaded the ImageNet-pretrained P2T weights from [[Google Drive]](https://drive.google.com/drive/folders/1Osweqc1OphwtWONXIgD20q9_I2arT9yz?usp=sharing) or 52 | [[BaiduPan, extraction code: yhwu]](https://pan.baidu.com/s/1JkE62CS9EoSTLW1M1Ajmxw?pwd=yhwu). 53 | Put them into the `pretrained/` folder. 54 | 55 | Use the following command to train `Mask R-CNN` with the `P2T-Tiny` backbone using distributed training on 8 GPUs: 56 | 57 | ```` 58 | bash dist_train.sh configs/mask_rcnn_p2t_t_fpn_1x_coco.py 8 59 | ```` 60 | 61 | Other configs are in the `configs` directory.
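The default schedules assume the total batch size implied above (8 GPUs with `samples_per_gpu=2`, as set in the `_base_` dataset configs). If you train with a different number of GPUs, a common mmdetection practice is to scale `optimizer.lr` linearly with the total batch size. Assuming this repository's `dist_train.sh` forwards extra arguments to `train.py` as the standard mmdetection launch script does, the override can be passed on the command line; the value below is purely illustrative and should be replaced with the appropriately scaled learning rate for your chosen config:

````
# e.g., 4 GPUs -> half the default total batch size -> halve the config's learning rate
bash dist_train.sh configs/mask_rcnn_p2t_t_fpn_1x_coco.py 4 --cfg-options optimizer.lr=1e-4
````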
62 | 63 | ### Validate 64 | 65 | Please download the pretrained models from [[Google Drive]](https://drive.google.com/drive/folders/1fcg7n3Ga8cYoT-3Ar0PeQXjAC3AnQYyY?usp=sharing) or [[BaiduPan, extraction code: yhwu]](https://pan.baidu.com/s/1JkE62CS9EoSTLW1M1Ajmxw?pwd=yhwu). Put them into the `pretrained` folder. 66 | Then, use the following command to validate `Mask R-CNN` with the `P2T-Tiny` backbone on a single GPU: 67 | 68 | ```` 69 | bash dist_test.sh configs/mask_rcnn_p2t_t_fpn_1x_coco.py pretrained/mask_rcnn_p2t_t_fpn_1x_coco-d875fa68.pth 1 70 | ```` 71 | 72 | 73 | ### Other Notes 74 | 75 | If you encounter any problems, please do not hesitate to contact us. 76 | Issues and discussions are welcome in the repository! 77 | You can also reach us by email: wuyuhuan@mail.nankai.edu.cn 78 | 79 | 80 | 81 | ### Citation 82 | 83 | If you are using the code/model/data provided here in a publication, please consider citing our work: 84 | 85 | ```` 86 | @ARTICLE{wu2022p2t, 87 | author={Wu, Yu-Huan and Liu, Yun and Zhan, Xin and Cheng, Ming-Ming}, 88 | journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 89 | title={{P2T}: Pyramid Pooling Transformer for Scene Understanding}, 90 | year={2022}, 91 | doi = {10.1109/tpami.2022.3202765}, 92 | } 93 | ```` 94 | 95 | ### License 96 | 97 | This code is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License for non-commercial use only. Any commercial use requires formal permission first. 98 | 99 | -------------------------------------------------------------------------------- /detection/configs/_base_/datasets/cityscapes_detection.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'CityscapesDataset' 3 | data_root = 'data/cityscapes/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | train_pipeline = [ 7 | dict(type='LoadImageFromFile'), 8 | dict(type='LoadAnnotations', with_bbox=True), 9 | dict( 10 | type='Resize', img_scale=[(2048, 800), (2048, 1024)], keep_ratio=True), 11 | dict(type='RandomFlip', flip_ratio=0.5), 12 | dict(type='Normalize', **img_norm_cfg), 13 | dict(type='Pad', size_divisor=32), 14 | dict(type='DefaultFormatBundle'), 15 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), 16 | ] 17 | test_pipeline = [ 18 | dict(type='LoadImageFromFile'), 19 | dict( 20 | type='MultiScaleFlipAug', 21 | img_scale=(2048, 1024), 22 | flip=False, 23 | transforms=[ 24 | dict(type='Resize', keep_ratio=True), 25 | dict(type='RandomFlip'), 26 | dict(type='Normalize', **img_norm_cfg), 27 | dict(type='Pad', size_divisor=32), 28 | dict(type='ImageToTensor', keys=['img']), 29 | dict(type='Collect', keys=['img']), 30 | ]) 31 | ] 32 | data = dict( 33 | samples_per_gpu=1, 34 | workers_per_gpu=2, 35 | train=dict( 36 | type='RepeatDataset', 37 | times=8, 38 | dataset=dict( 39 | type=dataset_type, 40 | ann_file=data_root + 41 | 'annotations/instancesonly_filtered_gtFine_train.json', 42 | img_prefix=data_root + 'leftImg8bit/train/', 43 | pipeline=train_pipeline)), 44 | val=dict( 45 | type=dataset_type, 46 | ann_file=data_root + 47 | 'annotations/instancesonly_filtered_gtFine_val.json', 48 | img_prefix=data_root + 'leftImg8bit/val/', 49 | pipeline=test_pipeline), 50 | test=dict( 51 | type=dataset_type, 52 | ann_file=data_root + 53 | 'annotations/instancesonly_filtered_gtFine_test.json', 54 | img_prefix=data_root + 'leftImg8bit/test/', 55 |
pipeline=test_pipeline)) 56 | evaluation = dict(interval=1, metric='bbox') 57 | -------------------------------------------------------------------------------- /detection/configs/_base_/datasets/cityscapes_instance.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'CityscapesDataset' 3 | data_root = 'data/cityscapes/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | train_pipeline = [ 7 | dict(type='LoadImageFromFile'), 8 | dict(type='LoadAnnotations', with_bbox=True, with_mask=True), 9 | dict( 10 | type='Resize', img_scale=[(2048, 800), (2048, 1024)], keep_ratio=True), 11 | dict(type='RandomFlip', flip_ratio=0.5), 12 | dict(type='Normalize', **img_norm_cfg), 13 | dict(type='Pad', size_divisor=32), 14 | dict(type='DefaultFormatBundle'), 15 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']), 16 | ] 17 | test_pipeline = [ 18 | dict(type='LoadImageFromFile'), 19 | dict( 20 | type='MultiScaleFlipAug', 21 | img_scale=(2048, 1024), 22 | flip=False, 23 | transforms=[ 24 | dict(type='Resize', keep_ratio=True), 25 | dict(type='RandomFlip'), 26 | dict(type='Normalize', **img_norm_cfg), 27 | dict(type='Pad', size_divisor=32), 28 | dict(type='ImageToTensor', keys=['img']), 29 | dict(type='Collect', keys=['img']), 30 | ]) 31 | ] 32 | data = dict( 33 | samples_per_gpu=1, 34 | workers_per_gpu=2, 35 | train=dict( 36 | type='RepeatDataset', 37 | times=8, 38 | dataset=dict( 39 | type=dataset_type, 40 | ann_file=data_root + 41 | 'annotations/instancesonly_filtered_gtFine_train.json', 42 | img_prefix=data_root + 'leftImg8bit/train/', 43 | pipeline=train_pipeline)), 44 | val=dict( 45 | type=dataset_type, 46 | ann_file=data_root + 47 | 'annotations/instancesonly_filtered_gtFine_val.json', 48 | img_prefix=data_root + 'leftImg8bit/val/', 49 | pipeline=test_pipeline), 50 | test=dict( 51 | type=dataset_type, 52 | ann_file=data_root + 53 | 'annotations/instancesonly_filtered_gtFine_test.json', 54 | img_prefix=data_root + 'leftImg8bit/test/', 55 | pipeline=test_pipeline)) 56 | evaluation = dict(metric=['bbox', 'segm']) 57 | -------------------------------------------------------------------------------- /detection/configs/_base_/datasets/coco_detection.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'CocoDataset' 3 | data_root = 'data/coco/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | train_pipeline = [ 7 | dict(type='LoadImageFromFile'), 8 | dict(type='LoadAnnotations', with_bbox=True), 9 | dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), 10 | dict(type='RandomFlip', flip_ratio=0.5), 11 | dict(type='Normalize', **img_norm_cfg), 12 | dict(type='Pad', size_divisor=32), 13 | dict(type='DefaultFormatBundle'), 14 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), 15 | ] 16 | test_pipeline = [ 17 | dict(type='LoadImageFromFile'), 18 | dict( 19 | type='MultiScaleFlipAug', 20 | img_scale=(1333, 800), 21 | flip=False, 22 | transforms=[ 23 | dict(type='Resize', keep_ratio=True), 24 | dict(type='RandomFlip'), 25 | dict(type='Normalize', **img_norm_cfg), 26 | dict(type='Pad', size_divisor=32), 27 | dict(type='ImageToTensor', keys=['img']), 28 | dict(type='Collect', keys=['img']), 29 | ]) 30 | ] 31 | data = dict( 32 | samples_per_gpu=2, 33 | workers_per_gpu=2, 34 | train=dict( 35 | type=dataset_type, 36 | 
ann_file=data_root + 'annotations/instances_train2017.json', 37 | img_prefix=data_root + 'train2017/', 38 | pipeline=train_pipeline), 39 | val=dict( 40 | type=dataset_type, 41 | ann_file=data_root + 'annotations/instances_val2017.json', 42 | img_prefix=data_root + 'val2017/', 43 | pipeline=test_pipeline), 44 | test=dict( 45 | type=dataset_type, 46 | ann_file=data_root + 'annotations/instances_val2017.json', 47 | img_prefix=data_root + 'val2017/', 48 | pipeline=test_pipeline)) 49 | evaluation = dict(interval=1, metric='bbox') 50 | -------------------------------------------------------------------------------- /detection/configs/_base_/datasets/coco_instance.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'CocoDataset' 3 | data_root = 'data/coco/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | train_pipeline = [ 7 | dict(type='LoadImageFromFile'), 8 | dict(type='LoadAnnotations', with_bbox=True, with_mask=True), 9 | dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), 10 | dict(type='RandomFlip', flip_ratio=0.5), 11 | dict(type='Normalize', **img_norm_cfg), 12 | dict(type='Pad', size_divisor=32), 13 | dict(type='DefaultFormatBundle'), 14 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']), 15 | ] 16 | test_pipeline = [ 17 | dict(type='LoadImageFromFile'), 18 | dict( 19 | type='MultiScaleFlipAug', 20 | img_scale=(1333, 800), 21 | flip=False, 22 | transforms=[ 23 | dict(type='Resize', keep_ratio=True), 24 | dict(type='RandomFlip'), 25 | dict(type='Normalize', **img_norm_cfg), 26 | dict(type='Pad', size_divisor=32), 27 | dict(type='ImageToTensor', keys=['img']), 28 | dict(type='Collect', keys=['img']), 29 | ]) 30 | ] 31 | data = dict( 32 | samples_per_gpu=2, 33 | workers_per_gpu=2, 34 | train=dict( 35 | type=dataset_type, 36 | ann_file=data_root + 'annotations/instances_train2017.json', 37 | img_prefix=data_root + 'train2017/', 38 | pipeline=train_pipeline), 39 | val=dict( 40 | type=dataset_type, 41 | ann_file=data_root + 'annotations/instances_val2017.json', 42 | img_prefix=data_root + 'val2017/', 43 | pipeline=test_pipeline), 44 | test=dict( 45 | type=dataset_type, 46 | ann_file=data_root + 'annotations/instances_val2017.json', 47 | img_prefix=data_root + 'val2017/', 48 | pipeline=test_pipeline)) 49 | evaluation = dict(metric=['bbox', 'segm']) 50 | -------------------------------------------------------------------------------- /detection/configs/_base_/datasets/coco_instance_semantic.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'CocoDataset' 3 | data_root = 'data/coco/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | train_pipeline = [ 7 | dict(type='LoadImageFromFile'), 8 | dict( 9 | type='LoadAnnotations', with_bbox=True, with_mask=True, with_seg=True), 10 | dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), 11 | dict(type='RandomFlip', flip_ratio=0.5), 12 | dict(type='Normalize', **img_norm_cfg), 13 | dict(type='Pad', size_divisor=32), 14 | dict(type='SegRescale', scale_factor=1 / 8), 15 | dict(type='DefaultFormatBundle'), 16 | dict( 17 | type='Collect', 18 | keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks', 'gt_semantic_seg']), 19 | ] 20 | test_pipeline = [ 21 | dict(type='LoadImageFromFile'), 22 | dict( 23 | type='MultiScaleFlipAug', 24 | img_scale=(1333, 800), 25 | 
flip=False, 26 | transforms=[ 27 | dict(type='Resize', keep_ratio=True), 28 | dict(type='RandomFlip', flip_ratio=0.5), 29 | dict(type='Normalize', **img_norm_cfg), 30 | dict(type='Pad', size_divisor=32), 31 | dict(type='ImageToTensor', keys=['img']), 32 | dict(type='Collect', keys=['img']), 33 | ]) 34 | ] 35 | data = dict( 36 | samples_per_gpu=2, 37 | workers_per_gpu=2, 38 | train=dict( 39 | type=dataset_type, 40 | ann_file=data_root + 'annotations/instances_train2017.json', 41 | img_prefix=data_root + 'train2017/', 42 | seg_prefix=data_root + 'stuffthingmaps/train2017/', 43 | pipeline=train_pipeline), 44 | val=dict( 45 | type=dataset_type, 46 | ann_file=data_root + 'annotations/instances_val2017.json', 47 | img_prefix=data_root + 'val2017/', 48 | pipeline=test_pipeline), 49 | test=dict( 50 | type=dataset_type, 51 | ann_file=data_root + 'annotations/instances_val2017.json', 52 | img_prefix=data_root + 'val2017/', 53 | pipeline=test_pipeline)) 54 | evaluation = dict(metric=['bbox', 'segm']) 55 | -------------------------------------------------------------------------------- /detection/configs/_base_/datasets/deepfashion.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'DeepFashionDataset' 3 | data_root = 'data/DeepFashion/In-shop/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | train_pipeline = [ 7 | dict(type='LoadImageFromFile'), 8 | dict(type='LoadAnnotations', with_bbox=True, with_mask=True), 9 | dict(type='Resize', img_scale=(750, 1101), keep_ratio=True), 10 | dict(type='RandomFlip', flip_ratio=0.5), 11 | dict(type='Normalize', **img_norm_cfg), 12 | dict(type='Pad', size_divisor=32), 13 | dict(type='DefaultFormatBundle'), 14 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']), 15 | ] 16 | test_pipeline = [ 17 | dict(type='LoadImageFromFile'), 18 | dict( 19 | type='MultiScaleFlipAug', 20 | img_scale=(750, 1101), 21 | flip=False, 22 | transforms=[ 23 | dict(type='Resize', keep_ratio=True), 24 | dict(type='RandomFlip'), 25 | dict(type='Normalize', **img_norm_cfg), 26 | dict(type='Pad', size_divisor=32), 27 | dict(type='ImageToTensor', keys=['img']), 28 | dict(type='Collect', keys=['img']), 29 | ]) 30 | ] 31 | data = dict( 32 | imgs_per_gpu=2, 33 | workers_per_gpu=1, 34 | train=dict( 35 | type=dataset_type, 36 | ann_file=data_root + 'annotations/DeepFashion_segmentation_query.json', 37 | img_prefix=data_root + 'Img/', 38 | pipeline=train_pipeline, 39 | data_root=data_root), 40 | val=dict( 41 | type=dataset_type, 42 | ann_file=data_root + 'annotations/DeepFashion_segmentation_query.json', 43 | img_prefix=data_root + 'Img/', 44 | pipeline=test_pipeline, 45 | data_root=data_root), 46 | test=dict( 47 | type=dataset_type, 48 | ann_file=data_root + 49 | 'annotations/DeepFashion_segmentation_gallery.json', 50 | img_prefix=data_root + 'Img/', 51 | pipeline=test_pipeline, 52 | data_root=data_root)) 53 | evaluation = dict(interval=5, metric=['bbox', 'segm']) 54 | -------------------------------------------------------------------------------- /detection/configs/_base_/datasets/lvis_v0.5_instance.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | _base_ = 'coco_instance.py' 3 | dataset_type = 'LVISV05Dataset' 4 | data_root = 'data/lvis_v0.5/' 5 | data = dict( 6 | samples_per_gpu=2, 7 | workers_per_gpu=2, 8 | train=dict( 9 | _delete_=True, 10 | type='ClassBalancedDataset', 11 | 
oversample_thr=1e-3, 12 | dataset=dict( 13 | type=dataset_type, 14 | ann_file=data_root + 'annotations/lvis_v0.5_train.json', 15 | img_prefix=data_root + 'train2017/')), 16 | val=dict( 17 | type=dataset_type, 18 | ann_file=data_root + 'annotations/lvis_v0.5_val.json', 19 | img_prefix=data_root + 'val2017/'), 20 | test=dict( 21 | type=dataset_type, 22 | ann_file=data_root + 'annotations/lvis_v0.5_val.json', 23 | img_prefix=data_root + 'val2017/')) 24 | evaluation = dict(metric=['bbox', 'segm']) 25 | -------------------------------------------------------------------------------- /detection/configs/_base_/datasets/lvis_v1_instance.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | _base_ = 'coco_instance.py' 3 | dataset_type = 'LVISV1Dataset' 4 | data_root = 'data/lvis_v1/' 5 | data = dict( 6 | samples_per_gpu=2, 7 | workers_per_gpu=2, 8 | train=dict( 9 | _delete_=True, 10 | type='ClassBalancedDataset', 11 | oversample_thr=1e-3, 12 | dataset=dict( 13 | type=dataset_type, 14 | ann_file=data_root + 'annotations/lvis_v1_train.json', 15 | img_prefix=data_root)), 16 | val=dict( 17 | type=dataset_type, 18 | ann_file=data_root + 'annotations/lvis_v1_val.json', 19 | img_prefix=data_root), 20 | test=dict( 21 | type=dataset_type, 22 | ann_file=data_root + 'annotations/lvis_v1_val.json', 23 | img_prefix=data_root)) 24 | evaluation = dict(metric=['bbox', 'segm']) 25 | -------------------------------------------------------------------------------- /detection/configs/_base_/datasets/voc0712.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'VOCDataset' 3 | data_root = 'data/VOCdevkit/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | train_pipeline = [ 7 | dict(type='LoadImageFromFile'), 8 | dict(type='LoadAnnotations', with_bbox=True), 9 | dict(type='Resize', img_scale=(1000, 600), keep_ratio=True), 10 | dict(type='RandomFlip', flip_ratio=0.5), 11 | dict(type='Normalize', **img_norm_cfg), 12 | dict(type='Pad', size_divisor=32), 13 | dict(type='DefaultFormatBundle'), 14 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), 15 | ] 16 | test_pipeline = [ 17 | dict(type='LoadImageFromFile'), 18 | dict( 19 | type='MultiScaleFlipAug', 20 | img_scale=(1000, 600), 21 | flip=False, 22 | transforms=[ 23 | dict(type='Resize', keep_ratio=True), 24 | dict(type='RandomFlip'), 25 | dict(type='Normalize', **img_norm_cfg), 26 | dict(type='Pad', size_divisor=32), 27 | dict(type='ImageToTensor', keys=['img']), 28 | dict(type='Collect', keys=['img']), 29 | ]) 30 | ] 31 | data = dict( 32 | samples_per_gpu=2, 33 | workers_per_gpu=2, 34 | train=dict( 35 | type='RepeatDataset', 36 | times=3, 37 | dataset=dict( 38 | type=dataset_type, 39 | ann_file=[ 40 | data_root + 'VOC2007/ImageSets/Main/trainval.txt', 41 | data_root + 'VOC2012/ImageSets/Main/trainval.txt' 42 | ], 43 | img_prefix=[data_root + 'VOC2007/', data_root + 'VOC2012/'], 44 | pipeline=train_pipeline)), 45 | val=dict( 46 | type=dataset_type, 47 | ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt', 48 | img_prefix=data_root + 'VOC2007/', 49 | pipeline=test_pipeline), 50 | test=dict( 51 | type=dataset_type, 52 | ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt', 53 | img_prefix=data_root + 'VOC2007/', 54 | pipeline=test_pipeline)) 55 | evaluation = dict(interval=1, metric='mAP') 56 | 
-------------------------------------------------------------------------------- /detection/configs/_base_/datasets/wider_face.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'WIDERFaceDataset' 3 | data_root = 'data/WIDERFace/' 4 | img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[1, 1, 1], to_rgb=True) 5 | train_pipeline = [ 6 | dict(type='LoadImageFromFile', to_float32=True), 7 | dict(type='LoadAnnotations', with_bbox=True), 8 | dict( 9 | type='PhotoMetricDistortion', 10 | brightness_delta=32, 11 | contrast_range=(0.5, 1.5), 12 | saturation_range=(0.5, 1.5), 13 | hue_delta=18), 14 | dict( 15 | type='Expand', 16 | mean=img_norm_cfg['mean'], 17 | to_rgb=img_norm_cfg['to_rgb'], 18 | ratio_range=(1, 4)), 19 | dict( 20 | type='MinIoURandomCrop', 21 | min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), 22 | min_crop_size=0.3), 23 | dict(type='Resize', img_scale=(300, 300), keep_ratio=False), 24 | dict(type='Normalize', **img_norm_cfg), 25 | dict(type='RandomFlip', flip_ratio=0.5), 26 | dict(type='DefaultFormatBundle'), 27 | dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), 28 | ] 29 | test_pipeline = [ 30 | dict(type='LoadImageFromFile'), 31 | dict( 32 | type='MultiScaleFlipAug', 33 | img_scale=(300, 300), 34 | flip=False, 35 | transforms=[ 36 | dict(type='Resize', keep_ratio=False), 37 | dict(type='Normalize', **img_norm_cfg), 38 | dict(type='ImageToTensor', keys=['img']), 39 | dict(type='Collect', keys=['img']), 40 | ]) 41 | ] 42 | data = dict( 43 | samples_per_gpu=60, 44 | workers_per_gpu=2, 45 | train=dict( 46 | type='RepeatDataset', 47 | times=2, 48 | dataset=dict( 49 | type=dataset_type, 50 | ann_file=data_root + 'train.txt', 51 | img_prefix=data_root + 'WIDER_train/', 52 | min_size=17, 53 | pipeline=train_pipeline)), 54 | val=dict( 55 | type=dataset_type, 56 | ann_file=data_root + 'val.txt', 57 | img_prefix=data_root + 'WIDER_val/', 58 | pipeline=test_pipeline), 59 | test=dict( 60 | type=dataset_type, 61 | ann_file=data_root + 'val.txt', 62 | img_prefix=data_root + 'WIDER_val/', 63 | pipeline=test_pipeline)) 64 | -------------------------------------------------------------------------------- /detection/configs/_base_/default_runtime.py: -------------------------------------------------------------------------------- 1 | checkpoint_config = dict(interval=1) 2 | # yapf:disable 3 | log_config = dict( 4 | interval=50, 5 | hooks=[ 6 | dict(type='TextLoggerHook'), 7 | # dict(type='TensorboardLoggerHook') 8 | ]) 9 | # yapf:enable 10 | custom_hooks = [dict(type='NumClassCheckHook')] 11 | 12 | dist_params = dict(backend='nccl') 13 | log_level = 'INFO' 14 | load_from = None 15 | resume_from = None 16 | workflow = [('train', 1)] 17 | -------------------------------------------------------------------------------- /detection/configs/_base_/models/cascade_mask_rcnn_r50_fpn.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | model = dict( 3 | type='CascadeRCNN', 4 | backbone=dict( 5 | type='ResNet', 6 | depth=50, 7 | num_stages=4, 8 | out_indices=(0, 1, 2, 3), 9 | frozen_stages=1, 10 | norm_cfg=dict(type='BN', requires_grad=True), 11 | norm_eval=True, 12 | style='pytorch', 13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), 14 | neck=dict( 15 | type='FPN', 16 | in_channels=[256, 512, 1024, 2048], 17 | out_channels=256, 18 | num_outs=5), 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=256, 22 | feat_channels=256, 23 | 
anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[8], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[4, 8, 16, 32, 64]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)), 35 | roi_head=dict( 36 | type='CascadeRoIHead', 37 | num_stages=3, 38 | stage_loss_weights=[1, 0.5, 0.25], 39 | bbox_roi_extractor=dict( 40 | type='SingleRoIExtractor', 41 | roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), 42 | out_channels=256, 43 | featmap_strides=[4, 8, 16, 32]), 44 | bbox_head=[ 45 | dict( 46 | type='Shared2FCBBoxHead', 47 | in_channels=256, 48 | fc_out_channels=1024, 49 | roi_feat_size=7, 50 | num_classes=80, 51 | bbox_coder=dict( 52 | type='DeltaXYWHBBoxCoder', 53 | target_means=[0., 0., 0., 0.], 54 | target_stds=[0.1, 0.1, 0.2, 0.2]), 55 | reg_class_agnostic=True, 56 | loss_cls=dict( 57 | type='CrossEntropyLoss', 58 | use_sigmoid=False, 59 | loss_weight=1.0), 60 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0, 61 | loss_weight=1.0)), 62 | dict( 63 | type='Shared2FCBBoxHead', 64 | in_channels=256, 65 | fc_out_channels=1024, 66 | roi_feat_size=7, 67 | num_classes=80, 68 | bbox_coder=dict( 69 | type='DeltaXYWHBBoxCoder', 70 | target_means=[0., 0., 0., 0.], 71 | target_stds=[0.05, 0.05, 0.1, 0.1]), 72 | reg_class_agnostic=True, 73 | loss_cls=dict( 74 | type='CrossEntropyLoss', 75 | use_sigmoid=False, 76 | loss_weight=1.0), 77 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0, 78 | loss_weight=1.0)), 79 | dict( 80 | type='Shared2FCBBoxHead', 81 | in_channels=256, 82 | fc_out_channels=1024, 83 | roi_feat_size=7, 84 | num_classes=80, 85 | bbox_coder=dict( 86 | type='DeltaXYWHBBoxCoder', 87 | target_means=[0., 0., 0., 0.], 88 | target_stds=[0.033, 0.033, 0.067, 0.067]), 89 | reg_class_agnostic=True, 90 | loss_cls=dict( 91 | type='CrossEntropyLoss', 92 | use_sigmoid=False, 93 | loss_weight=1.0), 94 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)) 95 | ], 96 | mask_roi_extractor=dict( 97 | type='SingleRoIExtractor', 98 | roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0), 99 | out_channels=256, 100 | featmap_strides=[4, 8, 16, 32]), 101 | mask_head=dict( 102 | type='FCNMaskHead', 103 | num_convs=4, 104 | in_channels=256, 105 | conv_out_channels=256, 106 | num_classes=80, 107 | loss_mask=dict( 108 | type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))), 109 | # model training and testing settings 110 | train_cfg=dict( 111 | rpn=dict( 112 | assigner=dict( 113 | type='MaxIoUAssigner', 114 | pos_iou_thr=0.7, 115 | neg_iou_thr=0.3, 116 | min_pos_iou=0.3, 117 | match_low_quality=True, 118 | ignore_iof_thr=-1), 119 | sampler=dict( 120 | type='RandomSampler', 121 | num=256, 122 | pos_fraction=0.5, 123 | neg_pos_ub=-1, 124 | add_gt_as_proposals=False), 125 | allowed_border=0, 126 | pos_weight=-1, 127 | debug=False), 128 | rpn_proposal=dict( 129 | nms_pre=2000, 130 | max_per_img=2000, 131 | nms=dict(type='nms', iou_threshold=0.7), 132 | min_bbox_size=0), 133 | rcnn=[ 134 | dict( 135 | assigner=dict( 136 | type='MaxIoUAssigner', 137 | pos_iou_thr=0.5, 138 | neg_iou_thr=0.5, 139 | min_pos_iou=0.5, 140 | match_low_quality=False, 141 | ignore_iof_thr=-1), 142 | sampler=dict( 143 | type='RandomSampler', 144 | num=512, 145 | pos_fraction=0.25, 146 | neg_pos_ub=-1, 147 | add_gt_as_proposals=True), 148 | mask_size=28, 149 | 
pos_weight=-1, 150 | debug=False), 151 | dict( 152 | assigner=dict( 153 | type='MaxIoUAssigner', 154 | pos_iou_thr=0.6, 155 | neg_iou_thr=0.6, 156 | min_pos_iou=0.6, 157 | match_low_quality=False, 158 | ignore_iof_thr=-1), 159 | sampler=dict( 160 | type='RandomSampler', 161 | num=512, 162 | pos_fraction=0.25, 163 | neg_pos_ub=-1, 164 | add_gt_as_proposals=True), 165 | mask_size=28, 166 | pos_weight=-1, 167 | debug=False), 168 | dict( 169 | assigner=dict( 170 | type='MaxIoUAssigner', 171 | pos_iou_thr=0.7, 172 | neg_iou_thr=0.7, 173 | min_pos_iou=0.7, 174 | match_low_quality=False, 175 | ignore_iof_thr=-1), 176 | sampler=dict( 177 | type='RandomSampler', 178 | num=512, 179 | pos_fraction=0.25, 180 | neg_pos_ub=-1, 181 | add_gt_as_proposals=True), 182 | mask_size=28, 183 | pos_weight=-1, 184 | debug=False) 185 | ]), 186 | test_cfg=dict( 187 | rpn=dict( 188 | nms_pre=1000, 189 | max_per_img=1000, 190 | nms=dict(type='nms', iou_threshold=0.7), 191 | min_bbox_size=0), 192 | rcnn=dict( 193 | score_thr=0.05, 194 | nms=dict(type='nms', iou_threshold=0.5), 195 | max_per_img=100, 196 | mask_thr_binary=0.5))) 197 | -------------------------------------------------------------------------------- /detection/configs/_base_/models/cascade_rcnn_r50_fpn.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | model = dict( 3 | type='CascadeRCNN', 4 | backbone=dict( 5 | type='ResNet', 6 | depth=50, 7 | num_stages=4, 8 | out_indices=(0, 1, 2, 3), 9 | frozen_stages=1, 10 | norm_cfg=dict(type='BN', requires_grad=True), 11 | norm_eval=True, 12 | style='pytorch', 13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), 14 | neck=dict( 15 | type='FPN', 16 | in_channels=[256, 512, 1024, 2048], 17 | out_channels=256, 18 | num_outs=5), 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=256, 22 | feat_channels=256, 23 | anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[8], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[4, 8, 16, 32, 64]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)), 35 | roi_head=dict( 36 | type='CascadeRoIHead', 37 | num_stages=3, 38 | stage_loss_weights=[1, 0.5, 0.25], 39 | bbox_roi_extractor=dict( 40 | type='SingleRoIExtractor', 41 | roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), 42 | out_channels=256, 43 | featmap_strides=[4, 8, 16, 32]), 44 | bbox_head=[ 45 | dict( 46 | type='Shared2FCBBoxHead', 47 | in_channels=256, 48 | fc_out_channels=1024, 49 | roi_feat_size=7, 50 | num_classes=80, 51 | bbox_coder=dict( 52 | type='DeltaXYWHBBoxCoder', 53 | target_means=[0., 0., 0., 0.], 54 | target_stds=[0.1, 0.1, 0.2, 0.2]), 55 | reg_class_agnostic=True, 56 | loss_cls=dict( 57 | type='CrossEntropyLoss', 58 | use_sigmoid=False, 59 | loss_weight=1.0), 60 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0, 61 | loss_weight=1.0)), 62 | dict( 63 | type='Shared2FCBBoxHead', 64 | in_channels=256, 65 | fc_out_channels=1024, 66 | roi_feat_size=7, 67 | num_classes=80, 68 | bbox_coder=dict( 69 | type='DeltaXYWHBBoxCoder', 70 | target_means=[0., 0., 0., 0.], 71 | target_stds=[0.05, 0.05, 0.1, 0.1]), 72 | reg_class_agnostic=True, 73 | loss_cls=dict( 74 | type='CrossEntropyLoss', 75 | use_sigmoid=False, 76 | loss_weight=1.0), 77 | loss_bbox=dict(type='SmoothL1Loss', 
beta=1.0, 78 | loss_weight=1.0)), 79 | dict( 80 | type='Shared2FCBBoxHead', 81 | in_channels=256, 82 | fc_out_channels=1024, 83 | roi_feat_size=7, 84 | num_classes=80, 85 | bbox_coder=dict( 86 | type='DeltaXYWHBBoxCoder', 87 | target_means=[0., 0., 0., 0.], 88 | target_stds=[0.033, 0.033, 0.067, 0.067]), 89 | reg_class_agnostic=True, 90 | loss_cls=dict( 91 | type='CrossEntropyLoss', 92 | use_sigmoid=False, 93 | loss_weight=1.0), 94 | loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)) 95 | ]), 96 | # model training and testing settings 97 | train_cfg=dict( 98 | rpn=dict( 99 | assigner=dict( 100 | type='MaxIoUAssigner', 101 | pos_iou_thr=0.7, 102 | neg_iou_thr=0.3, 103 | min_pos_iou=0.3, 104 | match_low_quality=True, 105 | ignore_iof_thr=-1), 106 | sampler=dict( 107 | type='RandomSampler', 108 | num=256, 109 | pos_fraction=0.5, 110 | neg_pos_ub=-1, 111 | add_gt_as_proposals=False), 112 | allowed_border=0, 113 | pos_weight=-1, 114 | debug=False), 115 | rpn_proposal=dict( 116 | nms_pre=2000, 117 | max_per_img=2000, 118 | nms=dict(type='nms', iou_threshold=0.7), 119 | min_bbox_size=0), 120 | rcnn=[ 121 | dict( 122 | assigner=dict( 123 | type='MaxIoUAssigner', 124 | pos_iou_thr=0.5, 125 | neg_iou_thr=0.5, 126 | min_pos_iou=0.5, 127 | match_low_quality=False, 128 | ignore_iof_thr=-1), 129 | sampler=dict( 130 | type='RandomSampler', 131 | num=512, 132 | pos_fraction=0.25, 133 | neg_pos_ub=-1, 134 | add_gt_as_proposals=True), 135 | pos_weight=-1, 136 | debug=False), 137 | dict( 138 | assigner=dict( 139 | type='MaxIoUAssigner', 140 | pos_iou_thr=0.6, 141 | neg_iou_thr=0.6, 142 | min_pos_iou=0.6, 143 | match_low_quality=False, 144 | ignore_iof_thr=-1), 145 | sampler=dict( 146 | type='RandomSampler', 147 | num=512, 148 | pos_fraction=0.25, 149 | neg_pos_ub=-1, 150 | add_gt_as_proposals=True), 151 | pos_weight=-1, 152 | debug=False), 153 | dict( 154 | assigner=dict( 155 | type='MaxIoUAssigner', 156 | pos_iou_thr=0.7, 157 | neg_iou_thr=0.7, 158 | min_pos_iou=0.7, 159 | match_low_quality=False, 160 | ignore_iof_thr=-1), 161 | sampler=dict( 162 | type='RandomSampler', 163 | num=512, 164 | pos_fraction=0.25, 165 | neg_pos_ub=-1, 166 | add_gt_as_proposals=True), 167 | pos_weight=-1, 168 | debug=False) 169 | ]), 170 | test_cfg=dict( 171 | rpn=dict( 172 | nms_pre=1000, 173 | max_per_img=1000, 174 | nms=dict(type='nms', iou_threshold=0.7), 175 | min_bbox_size=0), 176 | rcnn=dict( 177 | score_thr=0.05, 178 | nms=dict(type='nms', iou_threshold=0.5), 179 | max_per_img=100))) 180 | -------------------------------------------------------------------------------- /detection/configs/_base_/models/fast_rcnn_r50_fpn.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | model = dict( 3 | type='FastRCNN', 4 | backbone=dict( 5 | type='ResNet', 6 | depth=50, 7 | num_stages=4, 8 | out_indices=(0, 1, 2, 3), 9 | frozen_stages=1, 10 | norm_cfg=dict(type='BN', requires_grad=True), 11 | norm_eval=True, 12 | style='pytorch', 13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), 14 | neck=dict( 15 | type='FPN', 16 | in_channels=[256, 512, 1024, 2048], 17 | out_channels=256, 18 | num_outs=5), 19 | roi_head=dict( 20 | type='StandardRoIHead', 21 | bbox_roi_extractor=dict( 22 | type='SingleRoIExtractor', 23 | roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), 24 | out_channels=256, 25 | featmap_strides=[4, 8, 16, 32]), 26 | bbox_head=dict( 27 | type='Shared2FCBBoxHead', 28 | in_channels=256, 29 | fc_out_channels=1024, 30 | 
roi_feat_size=7, 31 | num_classes=80, 32 | bbox_coder=dict( 33 | type='DeltaXYWHBBoxCoder', 34 | target_means=[0., 0., 0., 0.], 35 | target_stds=[0.1, 0.1, 0.2, 0.2]), 36 | reg_class_agnostic=False, 37 | loss_cls=dict( 38 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), 39 | loss_bbox=dict(type='L1Loss', loss_weight=1.0))), 40 | # model training and testing settings 41 | train_cfg=dict( 42 | rcnn=dict( 43 | assigner=dict( 44 | type='MaxIoUAssigner', 45 | pos_iou_thr=0.5, 46 | neg_iou_thr=0.5, 47 | min_pos_iou=0.5, 48 | match_low_quality=False, 49 | ignore_iof_thr=-1), 50 | sampler=dict( 51 | type='RandomSampler', 52 | num=512, 53 | pos_fraction=0.25, 54 | neg_pos_ub=-1, 55 | add_gt_as_proposals=True), 56 | pos_weight=-1, 57 | debug=False)), 58 | test_cfg=dict( 59 | rcnn=dict( 60 | score_thr=0.05, 61 | nms=dict(type='nms', iou_threshold=0.5), 62 | max_per_img=100))) 63 | -------------------------------------------------------------------------------- /detection/configs/_base_/models/faster_rcnn_r50_caffe_c4.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | norm_cfg = dict(type='BN', requires_grad=False) 3 | model = dict( 4 | type='FasterRCNN', 5 | backbone=dict( 6 | type='ResNet', 7 | depth=50, 8 | num_stages=3, 9 | strides=(1, 2, 2), 10 | dilations=(1, 1, 1), 11 | out_indices=(2, ), 12 | frozen_stages=1, 13 | norm_cfg=norm_cfg, 14 | norm_eval=True, 15 | style='caffe', 16 | init_cfg=dict( 17 | type='Pretrained', 18 | checkpoint='open-mmlab://detectron2/resnet50_caffe')), 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=1024, 22 | feat_channels=1024, 23 | anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[2, 4, 8, 16, 32], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[16]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 35 | roi_head=dict( 36 | type='StandardRoIHead', 37 | shared_head=dict( 38 | type='ResLayer', 39 | depth=50, 40 | stage=3, 41 | stride=2, 42 | dilation=1, 43 | style='caffe', 44 | norm_cfg=norm_cfg, 45 | norm_eval=True), 46 | bbox_roi_extractor=dict( 47 | type='SingleRoIExtractor', 48 | roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0), 49 | out_channels=1024, 50 | featmap_strides=[16]), 51 | bbox_head=dict( 52 | type='BBoxHead', 53 | with_avg_pool=True, 54 | roi_feat_size=7, 55 | in_channels=2048, 56 | num_classes=80, 57 | bbox_coder=dict( 58 | type='DeltaXYWHBBoxCoder', 59 | target_means=[0., 0., 0., 0.], 60 | target_stds=[0.1, 0.1, 0.2, 0.2]), 61 | reg_class_agnostic=False, 62 | loss_cls=dict( 63 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), 64 | loss_bbox=dict(type='L1Loss', loss_weight=1.0))), 65 | # model training and testing settings 66 | train_cfg=dict( 67 | rpn=dict( 68 | assigner=dict( 69 | type='MaxIoUAssigner', 70 | pos_iou_thr=0.7, 71 | neg_iou_thr=0.3, 72 | min_pos_iou=0.3, 73 | match_low_quality=True, 74 | ignore_iof_thr=-1), 75 | sampler=dict( 76 | type='RandomSampler', 77 | num=256, 78 | pos_fraction=0.5, 79 | neg_pos_ub=-1, 80 | add_gt_as_proposals=False), 81 | allowed_border=0, 82 | pos_weight=-1, 83 | debug=False), 84 | rpn_proposal=dict( 85 | nms_pre=12000, 86 | max_per_img=2000, 87 | nms=dict(type='nms', iou_threshold=0.7), 88 | min_bbox_size=0), 89 | rcnn=dict( 90 | assigner=dict( 91 | type='MaxIoUAssigner', 
92 | pos_iou_thr=0.5, 93 | neg_iou_thr=0.5, 94 | min_pos_iou=0.5, 95 | match_low_quality=False, 96 | ignore_iof_thr=-1), 97 | sampler=dict( 98 | type='RandomSampler', 99 | num=512, 100 | pos_fraction=0.25, 101 | neg_pos_ub=-1, 102 | add_gt_as_proposals=True), 103 | pos_weight=-1, 104 | debug=False)), 105 | test_cfg=dict( 106 | rpn=dict( 107 | nms_pre=6000, 108 | max_per_img=1000, 109 | nms=dict(type='nms', iou_threshold=0.7), 110 | min_bbox_size=0), 111 | rcnn=dict( 112 | score_thr=0.05, 113 | nms=dict(type='nms', iou_threshold=0.5), 114 | max_per_img=100))) 115 | -------------------------------------------------------------------------------- /detection/configs/_base_/models/faster_rcnn_r50_caffe_dc5.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | norm_cfg = dict(type='BN', requires_grad=False) 3 | model = dict( 4 | type='FasterRCNN', 5 | backbone=dict( 6 | type='ResNet', 7 | depth=50, 8 | num_stages=4, 9 | strides=(1, 2, 2, 1), 10 | dilations=(1, 1, 1, 2), 11 | out_indices=(3, ), 12 | frozen_stages=1, 13 | norm_cfg=norm_cfg, 14 | norm_eval=True, 15 | style='caffe', 16 | init_cfg=dict( 17 | type='Pretrained', 18 | checkpoint='open-mmlab://detectron2/resnet50_caffe')), 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=2048, 22 | feat_channels=2048, 23 | anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[2, 4, 8, 16, 32], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[16]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 35 | roi_head=dict( 36 | type='StandardRoIHead', 37 | bbox_roi_extractor=dict( 38 | type='SingleRoIExtractor', 39 | roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), 40 | out_channels=2048, 41 | featmap_strides=[16]), 42 | bbox_head=dict( 43 | type='Shared2FCBBoxHead', 44 | in_channels=2048, 45 | fc_out_channels=1024, 46 | roi_feat_size=7, 47 | num_classes=80, 48 | bbox_coder=dict( 49 | type='DeltaXYWHBBoxCoder', 50 | target_means=[0., 0., 0., 0.], 51 | target_stds=[0.1, 0.1, 0.2, 0.2]), 52 | reg_class_agnostic=False, 53 | loss_cls=dict( 54 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), 55 | loss_bbox=dict(type='L1Loss', loss_weight=1.0))), 56 | # model training and testing settings 57 | train_cfg=dict( 58 | rpn=dict( 59 | assigner=dict( 60 | type='MaxIoUAssigner', 61 | pos_iou_thr=0.7, 62 | neg_iou_thr=0.3, 63 | min_pos_iou=0.3, 64 | match_low_quality=True, 65 | ignore_iof_thr=-1), 66 | sampler=dict( 67 | type='RandomSampler', 68 | num=256, 69 | pos_fraction=0.5, 70 | neg_pos_ub=-1, 71 | add_gt_as_proposals=False), 72 | allowed_border=0, 73 | pos_weight=-1, 74 | debug=False), 75 | rpn_proposal=dict( 76 | nms_pre=12000, 77 | max_per_img=2000, 78 | nms=dict(type='nms', iou_threshold=0.7), 79 | min_bbox_size=0), 80 | rcnn=dict( 81 | assigner=dict( 82 | type='MaxIoUAssigner', 83 | pos_iou_thr=0.5, 84 | neg_iou_thr=0.5, 85 | min_pos_iou=0.5, 86 | match_low_quality=False, 87 | ignore_iof_thr=-1), 88 | sampler=dict( 89 | type='RandomSampler', 90 | num=512, 91 | pos_fraction=0.25, 92 | neg_pos_ub=-1, 93 | add_gt_as_proposals=True), 94 | pos_weight=-1, 95 | debug=False)), 96 | test_cfg=dict( 97 | rpn=dict( 98 | nms=dict(type='nms', iou_threshold=0.7), 99 | nms_pre=6000, 100 | max_per_img=1000, 101 | min_bbox_size=0), 102 | rcnn=dict( 103 | 
score_thr=0.05, 104 | nms=dict(type='nms', iou_threshold=0.5), 105 | max_per_img=100))) 106 | -------------------------------------------------------------------------------- /detection/configs/_base_/models/faster_rcnn_r50_fpn.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | model = dict( 3 | type='FasterRCNN', 4 | backbone=dict( 5 | type='ResNet', 6 | depth=50, 7 | num_stages=4, 8 | out_indices=(0, 1, 2, 3), 9 | frozen_stages=1, 10 | norm_cfg=dict(type='BN', requires_grad=True), 11 | norm_eval=True, 12 | style='pytorch', 13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), 14 | neck=dict( 15 | type='FPN', 16 | in_channels=[256, 512, 1024, 2048], 17 | out_channels=256, 18 | num_outs=5), 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=256, 22 | feat_channels=256, 23 | anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[8], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[4, 8, 16, 32, 64]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 35 | roi_head=dict( 36 | type='StandardRoIHead', 37 | bbox_roi_extractor=dict( 38 | type='SingleRoIExtractor', 39 | roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), 40 | out_channels=256, 41 | featmap_strides=[4, 8, 16, 32]), 42 | bbox_head=dict( 43 | type='Shared2FCBBoxHead', 44 | in_channels=256, 45 | fc_out_channels=1024, 46 | roi_feat_size=7, 47 | num_classes=80, 48 | bbox_coder=dict( 49 | type='DeltaXYWHBBoxCoder', 50 | target_means=[0., 0., 0., 0.], 51 | target_stds=[0.1, 0.1, 0.2, 0.2]), 52 | reg_class_agnostic=False, 53 | loss_cls=dict( 54 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), 55 | loss_bbox=dict(type='L1Loss', loss_weight=1.0))), 56 | # model training and testing settings 57 | train_cfg=dict( 58 | rpn=dict( 59 | assigner=dict( 60 | type='MaxIoUAssigner', 61 | pos_iou_thr=0.7, 62 | neg_iou_thr=0.3, 63 | min_pos_iou=0.3, 64 | match_low_quality=True, 65 | ignore_iof_thr=-1), 66 | sampler=dict( 67 | type='RandomSampler', 68 | num=256, 69 | pos_fraction=0.5, 70 | neg_pos_ub=-1, 71 | add_gt_as_proposals=False), 72 | allowed_border=-1, 73 | pos_weight=-1, 74 | debug=False), 75 | rpn_proposal=dict( 76 | nms_pre=2000, 77 | max_per_img=1000, 78 | nms=dict(type='nms', iou_threshold=0.7), 79 | min_bbox_size=0), 80 | rcnn=dict( 81 | assigner=dict( 82 | type='MaxIoUAssigner', 83 | pos_iou_thr=0.5, 84 | neg_iou_thr=0.5, 85 | min_pos_iou=0.5, 86 | match_low_quality=False, 87 | ignore_iof_thr=-1), 88 | sampler=dict( 89 | type='RandomSampler', 90 | num=512, 91 | pos_fraction=0.25, 92 | neg_pos_ub=-1, 93 | add_gt_as_proposals=True), 94 | pos_weight=-1, 95 | debug=False)), 96 | test_cfg=dict( 97 | rpn=dict( 98 | nms_pre=1000, 99 | max_per_img=1000, 100 | nms=dict(type='nms', iou_threshold=0.7), 101 | min_bbox_size=0), 102 | rcnn=dict( 103 | score_thr=0.05, 104 | nms=dict(type='nms', iou_threshold=0.5), 105 | max_per_img=100) 106 | # soft-nms is also supported for rcnn testing 107 | # e.g., nms=dict(type='soft_nms', iou_threshold=0.5, min_score=0.05) 108 | )) 109 | -------------------------------------------------------------------------------- /detection/configs/_base_/models/mask_rcnn_r50_caffe_c4.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | 
norm_cfg = dict(type='BN', requires_grad=False) 3 | model = dict( 4 | type='MaskRCNN', 5 | backbone=dict( 6 | type='ResNet', 7 | depth=50, 8 | num_stages=3, 9 | strides=(1, 2, 2), 10 | dilations=(1, 1, 1), 11 | out_indices=(2, ), 12 | frozen_stages=1, 13 | norm_cfg=norm_cfg, 14 | norm_eval=True, 15 | style='caffe', 16 | init_cfg=dict( 17 | type='Pretrained', 18 | checkpoint='open-mmlab://detectron2/resnet50_caffe')), 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=1024, 22 | feat_channels=1024, 23 | anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[2, 4, 8, 16, 32], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[16]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 35 | roi_head=dict( 36 | type='StandardRoIHead', 37 | shared_head=dict( 38 | type='ResLayer', 39 | depth=50, 40 | stage=3, 41 | stride=2, 42 | dilation=1, 43 | style='caffe', 44 | norm_cfg=norm_cfg, 45 | norm_eval=True), 46 | bbox_roi_extractor=dict( 47 | type='SingleRoIExtractor', 48 | roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0), 49 | out_channels=1024, 50 | featmap_strides=[16]), 51 | bbox_head=dict( 52 | type='BBoxHead', 53 | with_avg_pool=True, 54 | roi_feat_size=7, 55 | in_channels=2048, 56 | num_classes=80, 57 | bbox_coder=dict( 58 | type='DeltaXYWHBBoxCoder', 59 | target_means=[0., 0., 0., 0.], 60 | target_stds=[0.1, 0.1, 0.2, 0.2]), 61 | reg_class_agnostic=False, 62 | loss_cls=dict( 63 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), 64 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 65 | mask_roi_extractor=None, 66 | mask_head=dict( 67 | type='FCNMaskHead', 68 | num_convs=0, 69 | in_channels=2048, 70 | conv_out_channels=256, 71 | num_classes=80, 72 | loss_mask=dict( 73 | type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))), 74 | # model training and testing settings 75 | train_cfg=dict( 76 | rpn=dict( 77 | assigner=dict( 78 | type='MaxIoUAssigner', 79 | pos_iou_thr=0.7, 80 | neg_iou_thr=0.3, 81 | min_pos_iou=0.3, 82 | match_low_quality=True, 83 | ignore_iof_thr=-1), 84 | sampler=dict( 85 | type='RandomSampler', 86 | num=256, 87 | pos_fraction=0.5, 88 | neg_pos_ub=-1, 89 | add_gt_as_proposals=False), 90 | allowed_border=0, 91 | pos_weight=-1, 92 | debug=False), 93 | rpn_proposal=dict( 94 | nms_pre=12000, 95 | max_per_img=2000, 96 | nms=dict(type='nms', iou_threshold=0.7), 97 | min_bbox_size=0), 98 | rcnn=dict( 99 | assigner=dict( 100 | type='MaxIoUAssigner', 101 | pos_iou_thr=0.5, 102 | neg_iou_thr=0.5, 103 | min_pos_iou=0.5, 104 | match_low_quality=False, 105 | ignore_iof_thr=-1), 106 | sampler=dict( 107 | type='RandomSampler', 108 | num=512, 109 | pos_fraction=0.25, 110 | neg_pos_ub=-1, 111 | add_gt_as_proposals=True), 112 | mask_size=14, 113 | pos_weight=-1, 114 | debug=False)), 115 | test_cfg=dict( 116 | rpn=dict( 117 | nms_pre=6000, 118 | nms=dict(type='nms', iou_threshold=0.7), 119 | max_per_img=1000, 120 | min_bbox_size=0), 121 | rcnn=dict( 122 | score_thr=0.05, 123 | nms=dict(type='nms', iou_threshold=0.5), 124 | max_per_img=100, 125 | mask_thr_binary=0.5))) 126 | -------------------------------------------------------------------------------- /detection/configs/_base_/models/mask_rcnn_r50_fpn.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | model = dict( 3 | 
type='MaskRCNN', 4 | backbone=dict( 5 | type='ResNet', 6 | depth=50, 7 | num_stages=4, 8 | out_indices=(0, 1, 2, 3), 9 | frozen_stages=1, 10 | norm_cfg=dict(type='BN', requires_grad=True), 11 | norm_eval=True, 12 | style='pytorch', 13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), 14 | neck=dict( 15 | type='FPN', 16 | in_channels=[256, 512, 1024, 2048], 17 | out_channels=256, 18 | num_outs=5), 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=256, 22 | feat_channels=256, 23 | anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[8], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[4, 8, 16, 32, 64]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 35 | roi_head=dict( 36 | type='StandardRoIHead', 37 | bbox_roi_extractor=dict( 38 | type='SingleRoIExtractor', 39 | roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0), 40 | out_channels=256, 41 | featmap_strides=[4, 8, 16, 32]), 42 | bbox_head=dict( 43 | type='Shared2FCBBoxHead', 44 | in_channels=256, 45 | fc_out_channels=1024, 46 | roi_feat_size=7, 47 | num_classes=80, 48 | bbox_coder=dict( 49 | type='DeltaXYWHBBoxCoder', 50 | target_means=[0., 0., 0., 0.], 51 | target_stds=[0.1, 0.1, 0.2, 0.2]), 52 | reg_class_agnostic=False, 53 | loss_cls=dict( 54 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), 55 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 56 | mask_roi_extractor=dict( 57 | type='SingleRoIExtractor', 58 | roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0), 59 | out_channels=256, 60 | featmap_strides=[4, 8, 16, 32]), 61 | mask_head=dict( 62 | type='FCNMaskHead', 63 | num_convs=4, 64 | in_channels=256, 65 | conv_out_channels=256, 66 | num_classes=80, 67 | loss_mask=dict( 68 | type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))), 69 | # model training and testing settings 70 | train_cfg=dict( 71 | rpn=dict( 72 | assigner=dict( 73 | type='MaxIoUAssigner', 74 | pos_iou_thr=0.7, 75 | neg_iou_thr=0.3, 76 | min_pos_iou=0.3, 77 | match_low_quality=True, 78 | ignore_iof_thr=-1), 79 | sampler=dict( 80 | type='RandomSampler', 81 | num=256, 82 | pos_fraction=0.5, 83 | neg_pos_ub=-1, 84 | add_gt_as_proposals=False), 85 | allowed_border=-1, 86 | pos_weight=-1, 87 | debug=False), 88 | rpn_proposal=dict( 89 | nms_pre=2000, 90 | max_per_img=1000, 91 | nms=dict(type='nms', iou_threshold=0.7), 92 | min_bbox_size=0), 93 | rcnn=dict( 94 | assigner=dict( 95 | type='MaxIoUAssigner', 96 | pos_iou_thr=0.5, 97 | neg_iou_thr=0.5, 98 | min_pos_iou=0.5, 99 | match_low_quality=True, 100 | ignore_iof_thr=-1), 101 | sampler=dict( 102 | type='RandomSampler', 103 | num=512, 104 | pos_fraction=0.25, 105 | neg_pos_ub=-1, 106 | add_gt_as_proposals=True), 107 | mask_size=28, 108 | pos_weight=-1, 109 | debug=False)), 110 | test_cfg=dict( 111 | rpn=dict( 112 | nms_pre=1000, 113 | max_per_img=1000, 114 | nms=dict(type='nms', iou_threshold=0.7), 115 | min_bbox_size=0), 116 | rcnn=dict( 117 | score_thr=0.05, 118 | nms=dict(type='nms', iou_threshold=0.5), 119 | max_per_img=100, 120 | mask_thr_binary=0.5))) 121 | -------------------------------------------------------------------------------- /detection/configs/_base_/models/retinanet_r50_fpn.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | model = dict( 3 | 
type='RetinaNet', 4 | backbone=dict( 5 | type='ResNet', 6 | depth=50, 7 | num_stages=4, 8 | out_indices=(0, 1, 2, 3), 9 | frozen_stages=1, 10 | norm_cfg=dict(type='BN', requires_grad=True), 11 | norm_eval=True, 12 | style='pytorch', 13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), 14 | neck=dict( 15 | type='FPN', 16 | in_channels=[256, 512, 1024, 2048], 17 | out_channels=256, 18 | start_level=1, 19 | add_extra_convs='on_input', 20 | num_outs=5), 21 | bbox_head=dict( 22 | type='RetinaHead', 23 | num_classes=80, 24 | in_channels=256, 25 | stacked_convs=4, 26 | feat_channels=256, 27 | anchor_generator=dict( 28 | type='AnchorGenerator', 29 | octave_base_scale=4, 30 | scales_per_octave=3, 31 | ratios=[0.5, 1.0, 2.0], 32 | strides=[8, 16, 32, 64, 128]), 33 | bbox_coder=dict( 34 | type='DeltaXYWHBBoxCoder', 35 | target_means=[.0, .0, .0, .0], 36 | target_stds=[1.0, 1.0, 1.0, 1.0]), 37 | loss_cls=dict( 38 | type='FocalLoss', 39 | use_sigmoid=True, 40 | gamma=2.0, 41 | alpha=0.25, 42 | loss_weight=1.0), 43 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 44 | # model training and testing settings 45 | train_cfg=dict( 46 | assigner=dict( 47 | type='MaxIoUAssigner', 48 | pos_iou_thr=0.5, 49 | neg_iou_thr=0.4, 50 | min_pos_iou=0, 51 | ignore_iof_thr=-1), 52 | allowed_border=-1, 53 | pos_weight=-1, 54 | debug=False), 55 | test_cfg=dict( 56 | nms_pre=1000, 57 | min_bbox_size=0, 58 | score_thr=0.05, 59 | nms=dict(type='nms', iou_threshold=0.5), 60 | max_per_img=100)) 61 | -------------------------------------------------------------------------------- /detection/configs/_base_/models/rpn_r50_caffe_c4.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | model = dict( 3 | type='RPN', 4 | backbone=dict( 5 | type='ResNet', 6 | depth=50, 7 | num_stages=3, 8 | strides=(1, 2, 2), 9 | dilations=(1, 1, 1), 10 | out_indices=(2, ), 11 | frozen_stages=1, 12 | norm_cfg=dict(type='BN', requires_grad=False), 13 | norm_eval=True, 14 | style='caffe', 15 | init_cfg=dict( 16 | type='Pretrained', 17 | checkpoint='open-mmlab://detectron2/resnet50_caffe')), 18 | neck=None, 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=1024, 22 | feat_channels=1024, 23 | anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[2, 4, 8, 16, 32], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[16]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 35 | # model training and testing settings 36 | train_cfg=dict( 37 | rpn=dict( 38 | assigner=dict( 39 | type='MaxIoUAssigner', 40 | pos_iou_thr=0.7, 41 | neg_iou_thr=0.3, 42 | min_pos_iou=0.3, 43 | ignore_iof_thr=-1), 44 | sampler=dict( 45 | type='RandomSampler', 46 | num=256, 47 | pos_fraction=0.5, 48 | neg_pos_ub=-1, 49 | add_gt_as_proposals=False), 50 | allowed_border=0, 51 | pos_weight=-1, 52 | debug=False)), 53 | test_cfg=dict( 54 | rpn=dict( 55 | nms_pre=12000, 56 | max_per_img=2000, 57 | nms=dict(type='nms', iou_threshold=0.7), 58 | min_bbox_size=0))) 59 | -------------------------------------------------------------------------------- /detection/configs/_base_/models/rpn_r50_fpn.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | model = dict( 3 | type='RPN', 4 | backbone=dict( 5 | type='ResNet', 6 | depth=50, 7 | 
num_stages=4, 8 | out_indices=(0, 1, 2, 3), 9 | frozen_stages=1, 10 | norm_cfg=dict(type='BN', requires_grad=True), 11 | norm_eval=True, 12 | style='pytorch', 13 | init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')), 14 | neck=dict( 15 | type='FPN', 16 | in_channels=[256, 512, 1024, 2048], 17 | out_channels=256, 18 | num_outs=5), 19 | rpn_head=dict( 20 | type='RPNHead', 21 | in_channels=256, 22 | feat_channels=256, 23 | anchor_generator=dict( 24 | type='AnchorGenerator', 25 | scales=[8], 26 | ratios=[0.5, 1.0, 2.0], 27 | strides=[4, 8, 16, 32, 64]), 28 | bbox_coder=dict( 29 | type='DeltaXYWHBBoxCoder', 30 | target_means=[.0, .0, .0, .0], 31 | target_stds=[1.0, 1.0, 1.0, 1.0]), 32 | loss_cls=dict( 33 | type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), 34 | loss_bbox=dict(type='L1Loss', loss_weight=1.0)), 35 | # model training and testing settings 36 | train_cfg=dict( 37 | rpn=dict( 38 | assigner=dict( 39 | type='MaxIoUAssigner', 40 | pos_iou_thr=0.7, 41 | neg_iou_thr=0.3, 42 | min_pos_iou=0.3, 43 | ignore_iof_thr=-1), 44 | sampler=dict( 45 | type='RandomSampler', 46 | num=256, 47 | pos_fraction=0.5, 48 | neg_pos_ub=-1, 49 | add_gt_as_proposals=False), 50 | allowed_border=0, 51 | pos_weight=-1, 52 | debug=False)), 53 | test_cfg=dict( 54 | rpn=dict( 55 | nms_pre=2000, 56 | max_per_img=1000, 57 | nms=dict(type='nms', iou_threshold=0.7), 58 | min_bbox_size=0))) 59 | -------------------------------------------------------------------------------- /detection/configs/_base_/models/ssd300.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | input_size = 300 3 | model = dict( 4 | type='SingleStageDetector', 5 | backbone=dict( 6 | type='SSDVGG', 7 | depth=16, 8 | with_last_pool=False, 9 | ceil_mode=True, 10 | out_indices=(3, 4), 11 | out_feature_indices=(22, 34), 12 | init_cfg=dict( 13 | type='Pretrained', checkpoint='open-mmlab://vgg16_caffe')), 14 | neck=dict( 15 | type='SSDNeck', 16 | in_channels=(512, 1024), 17 | out_channels=(512, 1024, 512, 256, 256, 256), 18 | level_strides=(2, 2, 1, 1), 19 | level_paddings=(1, 1, 0, 0), 20 | l2_norm_scale=20), 21 | bbox_head=dict( 22 | type='SSDHead', 23 | in_channels=(512, 1024, 512, 256, 256, 256), 24 | num_classes=80, 25 | anchor_generator=dict( 26 | type='SSDAnchorGenerator', 27 | scale_major=False, 28 | input_size=input_size, 29 | basesize_ratio_range=(0.15, 0.9), 30 | strides=[8, 16, 32, 64, 100, 300], 31 | ratios=[[2], [2, 3], [2, 3], [2, 3], [2], [2]]), 32 | bbox_coder=dict( 33 | type='DeltaXYWHBBoxCoder', 34 | target_means=[.0, .0, .0, .0], 35 | target_stds=[0.1, 0.1, 0.2, 0.2])), 36 | # model training and testing settings 37 | train_cfg=dict( 38 | assigner=dict( 39 | type='MaxIoUAssigner', 40 | pos_iou_thr=0.5, 41 | neg_iou_thr=0.5, 42 | min_pos_iou=0., 43 | ignore_iof_thr=-1, 44 | gt_max_assign_all=False), 45 | smoothl1_beta=1., 46 | allowed_border=-1, 47 | pos_weight=-1, 48 | neg_pos_ratio=3, 49 | debug=False), 50 | test_cfg=dict( 51 | nms_pre=1000, 52 | nms=dict(type='nms', iou_threshold=0.45), 53 | min_bbox_size=0, 54 | score_thr=0.02, 55 | max_per_img=200)) 56 | cudnn_benchmark = True 57 | -------------------------------------------------------------------------------- /detection/configs/_base_/schedules/schedule_1x.py: -------------------------------------------------------------------------------- 1 | # optimizer 2 | optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) 3 | optimizer_config = dict(grad_clip=None) 4 | # learning policy 5 | 
lr_config = dict( 6 | policy='step', 7 | warmup='linear', 8 | warmup_iters=500, 9 | warmup_ratio=0.001, 10 | step=[8, 11]) 11 | runner = dict(type='EpochBasedRunner', max_epochs=12) 12 | -------------------------------------------------------------------------------- /detection/configs/_base_/schedules/schedule_20e.py: -------------------------------------------------------------------------------- 1 | # optimizer 2 | optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) 3 | optimizer_config = dict(grad_clip=None) 4 | # learning policy 5 | lr_config = dict( 6 | policy='step', 7 | warmup='linear', 8 | warmup_iters=500, 9 | warmup_ratio=0.001, 10 | step=[16, 19]) 11 | runner = dict(type='EpochBasedRunner', max_epochs=20) 12 | -------------------------------------------------------------------------------- /detection/configs/_base_/schedules/schedule_2x.py: -------------------------------------------------------------------------------- 1 | # optimizer 2 | optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) 3 | optimizer_config = dict(grad_clip=None) 4 | # learning policy 5 | lr_config = dict( 6 | policy='step', 7 | warmup='linear', 8 | warmup_iters=500, 9 | warmup_ratio=0.001, 10 | step=[16, 22]) 11 | runner = dict(type='EpochBasedRunner', max_epochs=24) 12 | -------------------------------------------------------------------------------- /detection/configs/mask_rcnn_p2t_b_fpn_1x_coco.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '_base_/models/mask_rcnn_r50_fpn.py', 3 | '_base_/datasets/coco_instance.py', 4 | # '../configs/_base_/schedules/schedule_1x.py', 5 | '_base_/default_runtime.py' 6 | ] 7 | model = dict( 8 | backbone=dict( 9 | type='p2t_base', 10 | style='pytorch', 11 | init_cfg=dict( 12 | type='Pretrained', 13 | checkpoint='pretrained/p2t_base.pth'), 14 | ), 15 | neck=dict( 16 | type='FPN', 17 | in_channels=[64, 128, 320, 512], 18 | out_channels=256, 19 | num_outs=5) 20 | ) 21 | optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.0001) 22 | optimizer_config = dict(grad_clip=dict(max_norm=10, norm_type=2)) 23 | 24 | # learning policy 25 | lr_config = dict( 26 | policy='step', 27 | warmup='linear', 28 | warmup_iters=500, 29 | warmup_ratio=0.001, 30 | step=[8, 11]) 31 | 32 | total_epochs = 12 33 | fp16 = None 34 | find_unused_parameters = True -------------------------------------------------------------------------------- /detection/configs/mask_rcnn_p2t_l_fpn_1x_coco.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '_base_/models/mask_rcnn_r50_fpn.py', 3 | '_base_/datasets/coco_instance.py', 4 | # '../configs/_base_/schedules/schedule_1x.py', 5 | '_base_/default_runtime.py' 6 | ] 7 | model = dict( 8 | backbone=dict( 9 | type='p2t_large', 10 | style='pytorch', 11 | init_cfg=dict( 12 | type='Pretrained', 13 | checkpoint='pretrained/p2t_large.pth'), 14 | ), 15 | neck=dict( 16 | type='FPN', 17 | in_channels=[64, 128, 320, 640], 18 | out_channels=256, 19 | num_outs=5) 20 | ) 21 | optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.0001) 22 | optimizer_config = dict(grad_clip=dict(max_norm=10, norm_type=2)) 23 | 24 | # learning policy 25 | lr_config = dict( 26 | policy='step', 27 | warmup='linear', 28 | warmup_iters=500, 29 | warmup_ratio=0.001, 30 | step=[8, 11]) 31 | 32 | total_epochs = 12 33 | fp16 = None 34 | find_unused_parameters = True -------------------------------------------------------------------------------- 
/detection/configs/mask_rcnn_p2t_s_fpn_1x_coco.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '_base_/models/mask_rcnn_r50_fpn.py', 3 | '_base_/datasets/coco_instance.py', 4 | # '../configs/_base_/schedules/schedule_1x.py', 5 | '_base_/default_runtime.py' 6 | ] 7 | model = dict( 8 | backbone=dict( 9 | type='p2t_small', 10 | style='pytorch', 11 | init_cfg=dict( 12 | type='Pretrained', 13 | checkpoint='pretrained/p2t_small.pth'), 14 | ), 15 | neck=dict( 16 | type='FPN', 17 | in_channels=[64, 128, 320, 512], 18 | out_channels=256, 19 | num_outs=5) 20 | ) 21 | optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.0001) 22 | optimizer_config = dict(grad_clip=None) 23 | 24 | # learning policy 25 | lr_config = dict( 26 | policy='step', 27 | warmup='linear', 28 | warmup_iters=500, 29 | warmup_ratio=0.001, 30 | step=[8, 11]) 31 | 32 | total_epochs = 12 33 | fp16 = None 34 | find_unused_parameters = True -------------------------------------------------------------------------------- /detection/configs/mask_rcnn_p2t_t_fpn_1x_coco.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '_base_/models/mask_rcnn_r50_fpn.py', 3 | '_base_/datasets/coco_instance.py', 4 | # '../configs/_base_/schedules/schedule_1x.py', 5 | '_base_/default_runtime.py' 6 | ] 7 | model = dict( 8 | backbone=dict( 9 | type='p2t_tiny', 10 | style='pytorch', 11 | init_cfg=dict( 12 | type='Pretrained', 13 | checkpoint='pretrained/p2t_tiny.pth'), 14 | ), 15 | neck=dict( 16 | type='FPN', 17 | in_channels=[48, 96, 240, 384], 18 | out_channels=256, 19 | num_outs=5) 20 | ) 21 | optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.0001) 22 | optimizer_config = dict(grad_clip=None) 23 | 24 | # learning policy 25 | lr_config = dict( 26 | policy='step', 27 | warmup='linear', 28 | warmup_iters=500, 29 | warmup_ratio=0.001, 30 | step=[8, 11]) 31 | 32 | total_epochs = 12 33 | fp16 = None 34 | find_unused_parameters = True -------------------------------------------------------------------------------- /detection/configs/retinanet_p2t_b_fpn_1x_coco.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '_base_/models/retinanet_r50_fpn.py', 3 | '_base_/datasets/coco_detection.py', 4 | '_base_/schedules/schedule_1x.py', '_base_/default_runtime.py' 5 | ] 6 | 7 | model = dict( 8 | backbone=dict( 9 | type='p2t_base', 10 | style='pytorch', 11 | init_cfg=dict( 12 | type='Pretrained', 13 | checkpoint='pretrained/p2t_base.pth'), 14 | ), 15 | neck=dict( 16 | type='FPN', 17 | in_channels=[64, 128, 320, 512], 18 | out_channels=256, 19 | num_outs=5) 20 | ) 21 | optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.0001) 22 | optimizer_config = dict(grad_clip=None) 23 | 24 | # learning policy 25 | lr_config = dict( 26 | policy='step', 27 | warmup='linear', 28 | warmup_iters=500, 29 | warmup_ratio=0.001, 30 | step=[8, 11]) 31 | 32 | total_epochs = 12 33 | fp16 = None 34 | find_unused_parameters = True -------------------------------------------------------------------------------- /detection/configs/retinanet_p2t_l_fpn_1x_coco.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '_base_/models/retinanet_r50_fpn.py', 3 | '_base_/datasets/coco_detection.py', 4 | '_base_/schedules/schedule_1x.py', '_base_/default_runtime.py' 5 | ] 6 | 7 | model = dict( 8 | backbone=dict( 9 | type='p2t_large', 10 | style='pytorch', 11 | init_cfg=dict( 12 | 
type='Pretrained', 13 | checkpoint='pretrained/p2t_large.pth'), 14 | ), 15 | neck=dict( 16 | type='FPN', 17 | in_channels=[64, 128, 320, 640], 18 | out_channels=256, 19 | num_outs=5) 20 | ) 21 | optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.0001) 22 | optimizer_config = dict(grad_clip=None) 23 | 24 | # learning policy 25 | lr_config = dict( 26 | policy='step', 27 | warmup='linear', 28 | warmup_iters=500, 29 | warmup_ratio=0.001, 30 | step=[8, 11]) 31 | 32 | total_epochs = 12 33 | fp16 = None 34 | find_unused_parameters = True -------------------------------------------------------------------------------- /detection/configs/retinanet_p2t_s_fpn_1x_coco.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '_base_/models/retinanet_r50_fpn.py', 3 | '_base_/datasets/coco_detection.py', 4 | '_base_/schedules/schedule_1x.py', '_base_/default_runtime.py' 5 | ] 6 | 7 | model = dict( 8 | backbone=dict( 9 | type='p2t_small', 10 | style='pytorch', 11 | init_cfg=dict( 12 | type='Pretrained', 13 | checkpoint='pretrained/p2t_small.pth'), 14 | ), 15 | neck=dict( 16 | type='FPN', 17 | in_channels=[64, 128, 320, 512], 18 | out_channels=256, 19 | num_outs=5) 20 | ) 21 | optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.0001) 22 | optimizer_config = dict(grad_clip=None) 23 | 24 | # learning policy 25 | lr_config = dict( 26 | policy='step', 27 | warmup='linear', 28 | warmup_iters=500, 29 | warmup_ratio=0.001, 30 | step=[8, 11]) 31 | 32 | total_epochs = 12 33 | fp16 = None 34 | find_unused_parameters = True -------------------------------------------------------------------------------- /detection/configs/retinanet_p2t_t_fpn_1x_coco.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '_base_/models/retinanet_r50_fpn.py', 3 | '_base_/datasets/coco_detection.py', 4 | '_base_/schedules/schedule_1x.py', '_base_/default_runtime.py' 5 | ] 6 | 7 | 8 | model = dict( 9 | backbone=dict( 10 | type='p2t_tiny', 11 | style='pytorch', 12 | init_cfg=dict( 13 | type='Pretrained', 14 | checkpoint='pretrained/p2t_tiny.pth'), 15 | ), 16 | neck=dict( 17 | type='FPN', 18 | in_channels=[48, 96, 240, 384], 19 | out_channels=256, 20 | num_outs=5) 21 | ) 22 | optimizer = dict(_delete_=True, type='AdamW', lr=0.0001, weight_decay=0.0001) 23 | optimizer_config = dict(grad_clip=None) 24 | 25 | # learning policy 26 | lr_config = dict( 27 | policy='step', 28 | warmup='linear', 29 | warmup_iters=500, 30 | warmup_ratio=0.001, 31 | step=[8, 11]) 32 | 33 | total_epochs = 12 34 | fp16 = None 35 | find_unused_parameters = True -------------------------------------------------------------------------------- /detection/dist_test.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | CONFIG=$1 4 | CHECKPOINT=$2 5 | GPUS=$3 6 | PORT=${PORT:-29400} 7 | PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \ 8 | python $(dirname "$0")/test.py $CONFIG $CHECKPOINT ${@:4} 9 | 10 | ## example command: 11 | ## bash dist_test.sh configs/mask_rcnn_p2t_t_fpn_1x_coco.py pretrained/mask_rcnn_p2t_t_fpn_1x_coco-d875fa68.pth 1 12 | -------------------------------------------------------------------------------- /detection/dist_train.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | export OMP_NUM_THREADS=1 4 | 5 | CONFIG=$1 6 | N_GPUS=$2 7 | PORT=${PORT:-29500} 8 | 9 | 10 | PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \ 11 | python -m 
torch.distributed.launch --nproc_per_node=${N_GPUS} \ 12 | --master_port=${PORT} \ 13 | --use_env $(dirname "$0")/train.py ${CONFIG} --launcher pytorch ${@:3} 14 | 15 | ## bash dist_train.sh configs/mask_rcnn_p2t_t_fpn_1x_coco.py 8 16 | 17 | -------------------------------------------------------------------------------- /detection/p2t.py: -------------------------------------------------------------------------------- 1 | from os import sep 2 | from pickle import TRUE 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | import torch.jit as jit 7 | from functools import partial 8 | 9 | from timm.models.layers import DropPath, to_2tuple, trunc_normal_ 10 | from timm.models.registry import register_model 11 | from timm.models.vision_transformer import _cfg 12 | 13 | from mmdet.models.builder import BACKBONES 14 | from mmcv.runner import load_checkpoint 15 | from mmdet.utils import get_root_logger 16 | 17 | 18 | import numpy as np 19 | from time import time 20 | 21 | __all__ = [ 22 | 'p2t_tiny', 'p2t_small', 'p2t_base', 'p2t_large' 23 | ] 24 | 25 | 26 | 27 | class IRB(nn.Module): 28 | def __init__(self, in_features, hidden_features=None, out_features=None, ksize=3, act_layer=nn.Hardswish, drop=0.): 29 | super().__init__() 30 | out_features = out_features or in_features 31 | hidden_features = hidden_features or in_features 32 | self.fc1 = nn.Conv2d(in_features, hidden_features, 1, 1, 0) 33 | self.act = act_layer() 34 | self.conv = nn.Conv2d(hidden_features, hidden_features, kernel_size=ksize, padding=ksize//2, stride=1, groups=hidden_features) 35 | self.fc2 = nn.Conv2d(hidden_features, out_features, 1, 1, 0) 36 | self.drop = nn.Dropout(drop) 37 | 38 | def forward(self, x, H, W): 39 | B, N, C = x.shape 40 | x = x.permute(0,2,1).reshape(B, C, H, W) 41 | x = self.fc1(x) 42 | x = self.act(x) 43 | x = self.conv(x) 44 | x = self.act(x) 45 | x = self.fc2(x) 46 | return x.reshape(B, C, -1).permute(0,2,1) 47 | 48 | 49 | class PoolingAttention(nn.Module): 50 | def __init__(self, dim, num_heads=2, qkv_bias=False, qk_scale=None, attn_drop=0., proj_drop=0., 51 | pool_ratios=[1,2,3,6]): 52 | 53 | super().__init__() 54 | assert dim % num_heads == 0, f"dim {dim} should be divided by num_heads {num_heads}." 
55 | 56 | self.dim = dim 57 | self.num_heads = num_heads 58 | self.num_elements = np.array([t*t for t in pool_ratios]).sum() 59 | head_dim = dim // num_heads 60 | self.scale = qk_scale or head_dim ** -0.5 61 | 62 | self.q = nn.Sequential(nn.Linear(dim, dim, bias=qkv_bias)) 63 | self.kv = nn.Sequential(nn.Linear(dim, dim * 2, bias=qkv_bias)) 64 | 65 | self.attn_drop = nn.Dropout(attn_drop) 66 | self.proj = nn.Linear(dim, dim) 67 | self.proj_drop = nn.Dropout(proj_drop) 68 | 69 | self.pool_ratios = pool_ratios 70 | self.pools = nn.ModuleList() 71 | 72 | self.norm = nn.LayerNorm(dim) 73 | 74 | def forward(self, x, H, W, d_convs=None): 75 | B, N, C = x.shape 76 | 77 | q = self.q(x).reshape(B, N, self.num_heads, C // self.num_heads).permute(0, 2, 1, 3) 78 | pools = [] 79 | x_ = x.permute(0, 2, 1).reshape(B, C, H, W) 80 | for (pool_ratio, l) in zip(self.pool_ratios, d_convs): 81 | pool = F.adaptive_avg_pool2d(x_, (round(H/pool_ratio), round(W/pool_ratio))) 82 | pool = pool + l(pool) 83 | pools.append(pool.view(B, C, -1)) 84 | 85 | pools = torch.cat(pools, dim=2) 86 | pools = self.norm(pools.permute(0,2,1)) 87 | 88 | kv = self.kv(pools).reshape(B, -1, 2, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4) 89 | k, v = kv[0], kv[1] 90 | 91 | attn = (q @ k.transpose(-2, -1)) * self.scale 92 | attn = attn.softmax(dim=-1) 93 | x = (attn @ v) 94 | x = x.transpose(1,2).contiguous().reshape(B, N, C) 95 | 96 | x = self.proj(x) 97 | 98 | return x 99 | 100 | 101 | class Block(nn.Module): 102 | 103 | def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=False, qk_scale=None, drop=0., attn_drop=0., 104 | drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm, pool_ratios=[12,16,20,24]): 105 | super().__init__() 106 | self.norm1 = norm_layer(dim) 107 | self.attn = PoolingAttention( 108 | dim, num_heads=num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale, 109 | attn_drop=attn_drop, proj_drop=drop, pool_ratios=pool_ratios) 110 | 111 | self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity() 112 | 113 | self.norm2 = norm_layer(dim) 114 | self.mlp = IRB(in_features=dim, hidden_features=int(dim * mlp_ratio), act_layer=nn.Hardswish, drop=drop, ksize=3) 115 | 116 | def forward(self, x, H, W, d_convs=None): 117 | x = x + self.drop_path(self.attn(self.norm1(x), H, W, d_convs=d_convs)) 118 | x = x + self.drop_path(self.mlp(self.norm2(x), H, W)) 119 | 120 | return x 121 | 122 | class PatchEmbed(nn.Module): 123 | """ Image to Patch Embedding 124 | """ 125 | 126 | def __init__(self, img_size=224, patch_size=16, kernel_size=3, in_chans=3, embed_dim=768, overlap=True): 127 | super().__init__() 128 | img_size = to_2tuple(img_size) 129 | patch_size = to_2tuple(patch_size) 130 | 131 | self.img_size = img_size 132 | self.patch_size = patch_size 133 | assert img_size[0] % patch_size[0] == 0 and img_size[1] % patch_size[1] == 0, \ 134 | f"img_size {img_size} should be divided by patch_size {patch_size}." 
135 | self.H, self.W = img_size[0] // patch_size[0], img_size[1] // patch_size[1] 136 | self.num_patches = self.H * self.W 137 | if not overlap: 138 | self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size) 139 | else: 140 | self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=kernel_size, stride=patch_size, padding=kernel_size//2) 141 | 142 | self.norm = nn.LayerNorm(embed_dim) 143 | 144 | def forward(self, x): 145 | B, C, H, W = x.shape 146 | x = self.proj(x).flatten(2).transpose(1, 2) 147 | x = self.norm(x) 148 | H, W = H // self.patch_size[0], W // self.patch_size[1] 149 | 150 | return x, (H, W) 151 | 152 | 153 | 154 | class PyramidPoolingTransformer(nn.Module): 155 | def __init__(self, img_size=224, patch_size=4, in_chans=3, embed_dims=[64, 128, 320, 512], 156 | num_heads=[1, 2, 5, 8], mlp_ratios=[8, 8, 4, 4], qkv_bias=True, qk_scale=None, drop_rate=0., 157 | attn_drop_rate=0., drop_path_rate=0.1, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[2, 2, 9, 3], **kwargs): # 158 | super().__init__() 159 | print("loading p2t") 160 | self.depths = depths 161 | 162 | self.embed_dims = embed_dims 163 | 164 | # pyramid pooling ratios for each stage 165 | pool_ratios = [[12,16,20,24], [6,8,10,12], [3,4,5,6], [1,2,3,4]] 166 | 167 | self.patch_embed1 = PatchEmbed(img_size=img_size, patch_size=4, kernel_size=7, in_chans=in_chans, 168 | embed_dim=embed_dims[0], overlap=True) 169 | 170 | self.patch_embed2 = PatchEmbed(img_size=img_size // 4, patch_size=2, in_chans=embed_dims[0], 171 | embed_dim=embed_dims[1], overlap=True) 172 | self.patch_embed3 = PatchEmbed(img_size=img_size // 8, patch_size=2, in_chans=embed_dims[1], 173 | embed_dim=embed_dims[2], overlap=True) 174 | self.patch_embed4 = PatchEmbed(img_size=img_size // 16, patch_size=2, in_chans=embed_dims[2], 175 | embed_dim=embed_dims[3], overlap=True) 176 | 177 | self.d_convs1 = nn.ModuleList([nn.Conv2d(embed_dims[0], embed_dims[0], kernel_size=3, stride=1, padding=1, groups=embed_dims[0]) for temp in pool_ratios[0]]) 178 | self.d_convs2 = nn.ModuleList([nn.Conv2d(embed_dims[1], embed_dims[1], kernel_size=3, stride=1, padding=1, groups=embed_dims[1]) for temp in pool_ratios[1]]) 179 | self.d_convs3 = nn.ModuleList([nn.Conv2d(embed_dims[2], embed_dims[2], kernel_size=3, stride=1, padding=1, groups=embed_dims[2]) for temp in pool_ratios[2]]) 180 | self.d_convs4 = nn.ModuleList([nn.Conv2d(embed_dims[3], embed_dims[3], kernel_size=3, stride=1, padding=1, groups=embed_dims[3]) for temp in pool_ratios[3]]) 181 | 182 | # transformer encoder 183 | dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))] # stochastic depth decay rule 184 | cur = 0 185 | 186 | 187 | ksize = 3 188 | 189 | self.block1 = nn.ModuleList([Block( 190 | dim=embed_dims[0], num_heads=num_heads[0], mlp_ratio=mlp_ratios[0], qkv_bias=qkv_bias, qk_scale=qk_scale, 191 | drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[cur + i], norm_layer=norm_layer, pool_ratios=pool_ratios[0]) 192 | for i in range(depths[0])]) 193 | 194 | 195 | cur += depths[0] 196 | self.block2 = nn.ModuleList([Block( 197 | dim=embed_dims[1], num_heads=num_heads[1], mlp_ratio=mlp_ratios[1], qkv_bias=qkv_bias, qk_scale=qk_scale, 198 | drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[cur + i], norm_layer=norm_layer, pool_ratios=pool_ratios[1]) 199 | for i in range(depths[1])]) 200 | 201 | cur += depths[1] 202 | 203 | 204 | self.block3 = nn.ModuleList([Block( 205 | dim=embed_dims[2], num_heads=num_heads[2], mlp_ratio=mlp_ratios[2], qkv_bias=qkv_bias, 
qk_scale=qk_scale, 206 | drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[cur + i], norm_layer=norm_layer, pool_ratios=pool_ratios[2]) 207 | for i in range(depths[2])]) 208 | 209 | cur += depths[2] 210 | 211 | self.block4 = nn.ModuleList([Block( 212 | dim=embed_dims[3], num_heads=num_heads[3], mlp_ratio=mlp_ratios[3], qkv_bias=qkv_bias, qk_scale=qk_scale, 213 | drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[cur + i], norm_layer=norm_layer, pool_ratios=pool_ratios[3]) 214 | for i in range(depths[3])]) 215 | 216 | # classification head, usually not used in dense prediction tasks 217 | 218 | self.apply(self._init_weights) 219 | 220 | 221 | def init_weights(self, pretrained=None): 222 | if isinstance(pretrained, str): 223 | logger = get_root_logger() 224 | load_checkpoint(self, pretrained, map_location='cpu', strict=False, logger=logger) 225 | 226 | 227 | def reset_drop_path(self, drop_path_rate): 228 | dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(self.depths))] 229 | cur = 0 230 | for i in range(self.depths[0]): 231 | self.block1[i].drop_path.drop_prob = dpr[cur + i] 232 | 233 | cur += self.depths[0] 234 | for i in range(self.depths[1]): 235 | self.block2[i].drop_path.drop_prob = dpr[cur + i] 236 | 237 | cur += self.depths[1] 238 | for i in range(self.depths[2]): 239 | self.block3[i].drop_path.drop_prob = dpr[cur + i] 240 | 241 | cur += self.depths[2] 242 | for i in range(self.depths[3]): 243 | self.block4[i].drop_path.drop_prob = dpr[cur + i] 244 | 245 | def _init_weights(self, m): 246 | if isinstance(m, nn.Linear): 247 | trunc_normal_(m.weight, std=.02) 248 | if isinstance(m, nn.Linear) and m.bias is not None: 249 | nn.init.constant_(m.bias, 0) 250 | elif isinstance(m, nn.LayerNorm): 251 | nn.init.constant_(m.bias, 0) 252 | nn.init.constant_(m.weight, 1.0) 253 | 254 | 255 | @torch.jit.ignore 256 | def no_weight_decay(self): 257 | # return {'pos_embed', 'cls_token'} # has pos_embed may be better 258 | return {'cls_token'} 259 | 260 | def get_classifier(self): 261 | return self.head 262 | 263 | def forward_features(self, x): 264 | outs = [] 265 | 266 | B = x.shape[0] 267 | 268 | # stage 1 269 | x, (H, W) = self.patch_embed1(x) 270 | 271 | for idx, blk in enumerate(self.block1): 272 | x = blk(x, H, W, self.d_convs1) 273 | x = x.reshape(B, H, W, -1).permute(0, 3, 1, 2).contiguous() 274 | outs.append(x) 275 | 276 | # stage 2 277 | x, (H, W) = self.patch_embed2(x) 278 | 279 | for idx, blk in enumerate(self.block2): 280 | x = blk(x, H, W, self.d_convs2) 281 | x = x.reshape(B, H, W, -1).permute(0, 3, 1, 2).contiguous() 282 | outs.append(x) 283 | 284 | x, (H, W) = self.patch_embed3(x) 285 | 286 | for idx, blk in enumerate(self.block3): 287 | x = blk(x, H, W, self.d_convs3) 288 | x = x.reshape(B, H, W, -1).permute(0, 3, 1, 2).contiguous() 289 | outs.append(x) 290 | 291 | # stage 4 292 | x, (H, W) = self.patch_embed4(x) 293 | 294 | for idx, blk in enumerate(self.block4): 295 | x = blk(x, H, W, self.d_convs4) 296 | x = x.reshape(B, H, W, -1).permute(0, 3, 1, 2).contiguous() 297 | outs.append(x) 298 | 299 | return outs 300 | 301 | def forward(self, x): 302 | x = self.forward_features(x) 303 | 304 | return x 305 | 306 | 307 | def _conv_filter(state_dict, patch_size=16): 308 | """ convert patch embedding weight from manual patchify + linear proj to conv""" 309 | out_dict = {} 310 | for k, v in state_dict.items(): 311 | if 'patch_embed.proj.weight' in k: 312 | v = v.reshape((v.shape[0], 3, patch_size, patch_size)) 313 | out_dict[k] = v 314 | 315 | return out_dict 316 | 317 
| 318 | @BACKBONES.register_module() 319 | class p2t_tiny(PyramidPoolingTransformer): 320 | def __init__(self, **kwargs): 321 | super().__init__( 322 | patch_size=4, embed_dims=[48, 96, 240, 384], num_heads=[1, 2, 5, 8], mlp_ratios=[8, 8, 4, 4], 323 | qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[2, 2, 6, 3], 324 | drop_rate=0.0, drop_path_rate=0.1, **kwargs) 325 | 326 | 327 | @BACKBONES.register_module() 328 | class p2t_small(PyramidPoolingTransformer): 329 | def __init__(self, **kwargs): 330 | super().__init__( 331 | patch_size=4, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8], 332 | qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[2,2,9,3], mlp_ratios=[8,8,4,4], 333 | drop_rate=0.0, drop_path_rate=0.1, **kwargs) 334 | 335 | 336 | @BACKBONES.register_module() 337 | class p2t_base(PyramidPoolingTransformer): 338 | def __init__(self, **kwargs): 339 | super().__init__( 340 | patch_size=4, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8], 341 | qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[3,4,18,3], mlp_ratios=[8,8,4,4], 342 | drop_rate=0.0, drop_path_rate=0.3, **kwargs) 343 | 344 | 345 | @BACKBONES.register_module() 346 | class p2t_large(PyramidPoolingTransformer): 347 | def __init__(self, **kwargs): 348 | super().__init__( 349 | patch_size=4, embed_dims=[64, 128, 320, 640], num_heads=[1, 2, 5, 8], 350 | qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[3,8,27,3], mlp_ratios=[8,8,4,4], 351 | drop_rate=0.0, drop_path_rate=0.3, **kwargs) 352 | -------------------------------------------------------------------------------- /detection/test.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | import os.path as osp 4 | import time 5 | import warnings 6 | 7 | import mmcv 8 | import torch 9 | from mmcv import Config, DictAction 10 | from mmcv.cnn import fuse_conv_bn 11 | from mmcv.parallel import MMDataParallel, MMDistributedDataParallel 12 | from mmcv.runner import (get_dist_info, init_dist, load_checkpoint, 13 | wrap_fp16_model) 14 | 15 | from mmdet.apis import multi_gpu_test, single_gpu_test 16 | from mmdet.datasets import (build_dataloader, build_dataset, 17 | replace_ImageToTensor) 18 | from mmdet.models import build_detector 19 | 20 | import p2t 21 | 22 | def parse_args(): 23 | parser = argparse.ArgumentParser( 24 | description='MMDet test (and eval) a model') 25 | parser.add_argument('config', help='test config file path') 26 | parser.add_argument('checkpoint', help='checkpoint file') 27 | parser.add_argument( 28 | '--work-dir', 29 | help='the directory to save the file containing evaluation metrics') 30 | parser.add_argument('--out', help='output result file in pickle format') 31 | parser.add_argument( 32 | '--fuse-conv-bn', 33 | action='store_true', 34 | help='Whether to fuse conv and bn, this will slightly increase' 35 | 'the inference speed') 36 | parser.add_argument( 37 | '--format-only', 38 | action='store_true', 39 | help='Format the output results without perform evaluation. 
It is' 40 | 'useful when you want to format the result to a specific format and ' 41 | 'submit it to the test server') 42 | parser.add_argument( 43 | '--eval', 44 | type=str, 45 | nargs='+', 46 | help='evaluation metrics, which depends on the dataset, e.g., "bbox",' 47 | ' "segm", "proposal" for COCO, and "mAP", "recall" for PASCAL VOC') 48 | parser.add_argument('--show', action='store_true', help='show results') 49 | parser.add_argument( 50 | '--show-dir', help='directory where painted images will be saved') 51 | parser.add_argument( 52 | '--show-score-thr', 53 | type=float, 54 | default=0.3, 55 | help='score threshold (default: 0.3)') 56 | parser.add_argument( 57 | '--gpu-collect', 58 | action='store_true', 59 | help='whether to use gpu to collect results.') 60 | parser.add_argument( 61 | '--tmpdir', 62 | help='tmp directory used for collecting results from multiple ' 63 | 'workers, available when gpu-collect is not specified') 64 | parser.add_argument( 65 | '--cfg-options', 66 | nargs='+', 67 | action=DictAction, 68 | help='override some settings in the used config, the key-value pair ' 69 | 'in xxx=yyy format will be merged into config file. If the value to ' 70 | 'be overwritten is a list, it should be like key="[a,b]" or key=a,b ' 71 | 'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" ' 72 | 'Note that the quotation marks are necessary and that no white space ' 73 | 'is allowed.') 74 | parser.add_argument( 75 | '--options', 76 | nargs='+', 77 | action=DictAction, 78 | help='custom options for evaluation, the key-value pair in xxx=yyy ' 79 | 'format will be kwargs for dataset.evaluate() function (deprecate), ' 80 | 'change to --eval-options instead.') 81 | parser.add_argument( 82 | '--eval-options', 83 | nargs='+', 84 | action=DictAction, 85 | help='custom options for evaluation, the key-value pair in xxx=yyy ' 86 | 'format will be kwargs for dataset.evaluate() function') 87 | parser.add_argument( 88 | '--launcher', 89 | choices=['none', 'pytorch', 'slurm', 'mpi'], 90 | default='none', 91 | help='job launcher') 92 | parser.add_argument('--local_rank', type=int, default=0) 93 | args = parser.parse_args() 94 | if 'LOCAL_RANK' not in os.environ: 95 | os.environ['LOCAL_RANK'] = str(args.local_rank) 96 | 97 | if args.options and args.eval_options: 98 | raise ValueError( 99 | '--options and --eval-options cannot be both ' 100 | 'specified, --options is deprecated in favor of --eval-options') 101 | if args.options: 102 | warnings.warn('--options is deprecated in favor of --eval-options') 103 | args.eval_options = args.options 104 | return args 105 | 106 | 107 | def main(): 108 | args = parse_args() 109 | 110 | assert args.out or args.eval or args.format_only or args.show \ 111 | or args.show_dir, \ 112 | ('Please specify at least one operation (save/eval/format/show the ' 113 | 'results / save the results) with the argument "--out", "--eval"' 114 | ', "--format-only", "--show" or "--show-dir"') 115 | 116 | if args.eval and args.format_only: 117 | raise ValueError('--eval and --format_only cannot be both specified') 118 | 119 | if args.out is not None and not args.out.endswith(('.pkl', '.pickle')): 120 | raise ValueError('The output file must be a pkl file.') 121 | 122 | cfg = Config.fromfile(args.config) 123 | if args.cfg_options is not None: 124 | cfg.merge_from_dict(args.cfg_options) 125 | # import modules from string list. 
126 | if cfg.get('custom_imports', None): 127 | from mmcv.utils import import_modules_from_strings 128 | import_modules_from_strings(**cfg['custom_imports']) 129 | # set cudnn_benchmark 130 | if cfg.get('cudnn_benchmark', False): 131 | torch.backends.cudnn.benchmark = True 132 | 133 | cfg.model.pretrained = None 134 | if cfg.model.get('neck'): 135 | if isinstance(cfg.model.neck, list): 136 | for neck_cfg in cfg.model.neck: 137 | if neck_cfg.get('rfp_backbone'): 138 | if neck_cfg.rfp_backbone.get('pretrained'): 139 | neck_cfg.rfp_backbone.pretrained = None 140 | elif cfg.model.neck.get('rfp_backbone'): 141 | if cfg.model.neck.rfp_backbone.get('pretrained'): 142 | cfg.model.neck.rfp_backbone.pretrained = None 143 | 144 | # in case the test dataset is concatenated 145 | samples_per_gpu = 1 146 | if isinstance(cfg.data.test, dict): 147 | cfg.data.test.test_mode = True 148 | samples_per_gpu = cfg.data.test.pop('samples_per_gpu', 1) 149 | if samples_per_gpu > 1: 150 | # Replace 'ImageToTensor' with 'DefaultFormatBundle' 151 | cfg.data.test.pipeline = replace_ImageToTensor( 152 | cfg.data.test.pipeline) 153 | elif isinstance(cfg.data.test, list): 154 | for ds_cfg in cfg.data.test: 155 | ds_cfg.test_mode = True 156 | samples_per_gpu = max( 157 | [ds_cfg.pop('samples_per_gpu', 1) for ds_cfg in cfg.data.test]) 158 | if samples_per_gpu > 1: 159 | for ds_cfg in cfg.data.test: 160 | ds_cfg.pipeline = replace_ImageToTensor(ds_cfg.pipeline) 161 | 162 | # init distributed env first, since logger depends on the dist info. 163 | if args.launcher == 'none': 164 | distributed = False 165 | else: 166 | distributed = True 167 | init_dist(args.launcher, **cfg.dist_params) 168 | 169 | rank, _ = get_dist_info() 170 | # create the work_dir only when it is specified, and only on rank 0 171 | if args.work_dir is not None and rank == 0: 172 | mmcv.mkdir_or_exist(osp.abspath(args.work_dir)) 173 | timestamp = time.strftime('%Y%m%d_%H%M%S', time.localtime()) 174 | json_file = osp.join(args.work_dir, f'eval_{timestamp}.json') 175 | 176 | # build the dataloader 177 | dataset = build_dataset(cfg.data.test) 178 | data_loader = build_dataloader( 179 | dataset, 180 | samples_per_gpu=samples_per_gpu, 181 | workers_per_gpu=cfg.data.workers_per_gpu, 182 | dist=distributed, 183 | shuffle=False) 184 | 185 | # build the model and load checkpoint 186 | cfg.model.train_cfg = None 187 | model = build_detector(cfg.model, test_cfg=cfg.get('test_cfg')) 188 | fp16_cfg = cfg.get('fp16', None) 189 | if fp16_cfg is not None: 190 | wrap_fp16_model(model) 191 | checkpoint = load_checkpoint(model, args.checkpoint, map_location='cpu') 192 | if args.fuse_conv_bn: 193 | model = fuse_conv_bn(model) 194 | # old versions did not save class info in checkpoints, this workaround is 195 | # for backward compatibility 196 | if 'CLASSES' in checkpoint.get('meta', {}): 197 | model.CLASSES = checkpoint['meta']['CLASSES'] 198 | else: 199 | model.CLASSES = dataset.CLASSES 200 | 201 | if not distributed: 202 | model = MMDataParallel(model, device_ids=[0]) 203 | outputs = single_gpu_test(model, data_loader, args.show, args.show_dir, 204 | args.show_score_thr) 205 | else: 206 | model = MMDistributedDataParallel( 207 | model.cuda(), 208 | device_ids=[torch.cuda.current_device()], 209 | broadcast_buffers=False) 210 | outputs = multi_gpu_test(model, data_loader, args.tmpdir, 211 | args.gpu_collect) 212 | 213 | rank, _ = get_dist_info() 214 | if rank == 0: 215 | if args.out: 216 | print(f'\nwriting results to {args.out}') 217 | mmcv.dump(outputs, args.out) 218 | kwargs = {} if args.eval_options is None else 
args.eval_options 219 | if args.format_only: 220 | dataset.format_results(outputs, **kwargs) 221 | if args.eval: 222 | eval_kwargs = cfg.get('evaluation', {}).copy() 223 | # hard-code way to remove EvalHook args 224 | for key in [ 225 | 'interval', 'tmpdir', 'start', 'gpu_collect', 'save_best', 226 | 'rule' 227 | ]: 228 | eval_kwargs.pop(key, None) 229 | eval_kwargs.update(dict(metric=args.eval, **kwargs)) 230 | metric = dataset.evaluate(outputs, **eval_kwargs) 231 | print(metric) 232 | metric_dict = dict(config=args.config, metric=metric) 233 | if args.work_dir is not None and rank == 0: 234 | mmcv.dump(metric_dict, json_file) 235 | 236 | 237 | if __name__ == '__main__': 238 | main() -------------------------------------------------------------------------------- /detection/train.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import copy 3 | import os 4 | import os.path as osp 5 | import time 6 | import warnings 7 | 8 | import mmcv 9 | import torch 10 | from mmcv import Config, DictAction 11 | from mmcv.runner import get_dist_info, init_dist 12 | from mmcv.utils import get_git_hash 13 | 14 | from mmdet import __version__ 15 | from mmdet.apis import set_random_seed, train_detector 16 | from mmdet.datasets import build_dataset 17 | from mmdet.models import build_detector 18 | from mmdet.utils import collect_env, get_root_logger 19 | import p2t 20 | 21 | def parse_args(): 22 | parser = argparse.ArgumentParser(description='Train a detector') 23 | parser.add_argument('config', help='train config file path') 24 | parser.add_argument('--work-dir', help='the dir to save logs and models') 25 | parser.add_argument( 26 | '--resume-from', help='the checkpoint file to resume from') 27 | parser.add_argument( 28 | '--no-validate', 29 | action='store_true', 30 | help='whether not to evaluate the checkpoint during training') 31 | group_gpus = parser.add_mutually_exclusive_group() 32 | group_gpus.add_argument( 33 | '--gpus', 34 | type=int, 35 | help='number of gpus to use ' 36 | '(only applicable to non-distributed training)') 37 | group_gpus.add_argument( 38 | '--gpu-ids', 39 | type=int, 40 | nargs='+', 41 | help='ids of gpus to use ' 42 | '(only applicable to non-distributed training)') 43 | parser.add_argument('--seed', type=int, default=None, help='random seed') 44 | parser.add_argument( 45 | '--deterministic', 46 | action='store_true', 47 | help='whether to set deterministic options for CUDNN backend.') 48 | parser.add_argument( 49 | '--options', 50 | nargs='+', 51 | action=DictAction, 52 | help='override some settings in the used config, the key-value pair ' 53 | 'in xxx=yyy format will be merged into config file (deprecate), ' 54 | 'change to --cfg-options instead.') 55 | parser.add_argument( 56 | '--cfg-options', 57 | nargs='+', 58 | action=DictAction, 59 | help='override some settings in the used config, the key-value pair ' 60 | 'in xxx=yyy format will be merged into config file. If the value to ' 61 | 'be overwritten is a list, it should be like key="[a,b]" or key=a,b ' 62 | 'It also allows nested list/tuple values, e.g. 
key="[(a,b),(c,d)]" ' 63 | 'Note that the quotation marks are necessary and that no white space ' 64 | 'is allowed.') 65 | parser.add_argument( 66 | '--launcher', 67 | choices=['none', 'pytorch', 'slurm', 'mpi'], 68 | default='none', 69 | help='job launcher') 70 | parser.add_argument('--local_rank', type=int, default=0) 71 | args = parser.parse_args() 72 | if 'LOCAL_RANK' not in os.environ: 73 | os.environ['LOCAL_RANK'] = str(args.local_rank) 74 | 75 | if args.options and args.cfg_options: 76 | raise ValueError( 77 | '--options and --cfg-options cannot be both ' 78 | 'specified, --options is deprecated in favor of --cfg-options') 79 | if args.options: 80 | warnings.warn('--options is deprecated in favor of --cfg-options') 81 | args.cfg_options = args.options 82 | 83 | return args 84 | 85 | 86 | def main(): 87 | args = parse_args() 88 | 89 | cfg = Config.fromfile(args.config) 90 | if args.cfg_options is not None: 91 | cfg.merge_from_dict(args.cfg_options) 92 | # import modules from string list. 93 | if cfg.get('custom_imports', None): 94 | from mmcv.utils import import_modules_from_strings 95 | import_modules_from_strings(**cfg['custom_imports']) 96 | # set cudnn_benchmark 97 | if cfg.get('cudnn_benchmark', False): 98 | torch.backends.cudnn.benchmark = True 99 | 100 | # work_dir is determined in this priority: CLI > segment in file > filename 101 | if args.work_dir is not None: 102 | # update configs according to CLI args if args.work_dir is not None 103 | cfg.work_dir = args.work_dir 104 | elif cfg.get('work_dir', None) is None: 105 | # use config filename as default work_dir if cfg.work_dir is None 106 | cfg.work_dir = osp.join('./work_dirs', 107 | osp.splitext(osp.basename(args.config))[0]) 108 | if args.resume_from is not None: 109 | cfg.resume_from = args.resume_from 110 | if args.gpu_ids is not None: 111 | cfg.gpu_ids = args.gpu_ids 112 | else: 113 | cfg.gpu_ids = range(1) if args.gpus is None else range(args.gpus) 114 | 115 | # init distributed env first, since logger depends on the dist info. 
116 | if args.launcher == 'none': 117 | distributed = False 118 | else: 119 | distributed = True 120 | init_dist(args.launcher, **cfg.dist_params) 121 | # re-set gpu_ids with distributed training mode 122 | _, world_size = get_dist_info() 123 | cfg.gpu_ids = range(world_size) 124 | 125 | # create work_dir 126 | mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir)) 127 | # dump config 128 | cfg.dump(osp.join(cfg.work_dir, osp.basename(args.config))) 129 | # init the logger before other steps 130 | timestamp = time.strftime('%Y%m%d_%H%M%S', time.localtime()) 131 | log_file = osp.join(cfg.work_dir, f'{timestamp}.log') 132 | logger = get_root_logger(log_file=log_file, log_level=cfg.log_level) 133 | 134 | # init the meta dict to record some important information such as 135 | # environment info and seed, which will be logged 136 | meta = dict() 137 | # log env info 138 | env_info_dict = collect_env() 139 | env_info = '\n'.join([(f'{k}: {v}') for k, v in env_info_dict.items()]) 140 | dash_line = '-' * 60 + '\n' 141 | logger.info('Environment info:\n' + dash_line + env_info + '\n' + 142 | dash_line) 143 | meta['env_info'] = env_info 144 | meta['config'] = cfg.pretty_text 145 | # log some basic info 146 | logger.info(f'Distributed training: {distributed}') 147 | logger.info(f'Config:\n{cfg.pretty_text}') 148 | 149 | # set random seeds 150 | if args.seed is not None: 151 | logger.info(f'Set random seed to {args.seed}, ' 152 | f'deterministic: {args.deterministic}') 153 | set_random_seed(args.seed, deterministic=args.deterministic) 154 | cfg.seed = args.seed 155 | meta['seed'] = args.seed 156 | meta['exp_name'] = osp.basename(args.config) 157 | 158 | model = build_detector( cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg')) 159 | 160 | datasets = [build_dataset(cfg.data.train)] 161 | if len(cfg.workflow) == 2: 162 | val_dataset = copy.deepcopy(cfg.data.val) 163 | val_dataset.pipeline = cfg.data.train.pipeline 164 | datasets.append(build_dataset(val_dataset)) 165 | if cfg.checkpoint_config is not None: 166 | # save mmdet version, config file content and class names in 167 | # checkpoints as meta data 168 | cfg.checkpoint_config.meta = dict( 169 | mmdet_version=__version__ + get_git_hash()[:7], 170 | CLASSES=datasets[0].CLASSES) 171 | # add an attribute for visualization convenience 172 | model.CLASSES = datasets[0].CLASSES 173 | train_detector( 174 | model, 175 | datasets, 176 | cfg, 177 | distributed=distributed, 178 | validate=(not args.no_validate), 179 | timestamp=timestamp, 180 | meta=meta) 181 | 182 | 183 | if __name__ == '__main__': 184 | main() -------------------------------------------------------------------------------- /engine.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2015-present, Facebook, Inc. 2 | # All rights reserved. 
3 | """ 4 | Train and eval functions used in main.py 5 | """ 6 | import math 7 | import sys 8 | from typing import Iterable, Optional 9 | 10 | import torch 11 | 12 | from timm.data import Mixup 13 | from timm.utils import accuracy, ModelEma 14 | 15 | import utils 16 | 17 | #from apex import amp, optimizers, parallel 18 | 19 | def train_one_epoch(model: torch.nn.Module, criterion: None, 20 | data_loader: Iterable, optimizer: torch.optim.Optimizer, 21 | device: torch.device, epoch: int, loss_scaler, max_norm: float = 0, 22 | model_ema: Optional[ModelEma] = None, mixup_fn: Optional[Mixup] = None, 23 | set_training_mode=True, 24 | fp32=False): 25 | model.train(set_training_mode) 26 | metric_logger = utils.MetricLogger(delimiter=" ") 27 | metric_logger.add_meter('lr', utils.SmoothedValue(window_size=1, fmt='{value:.6f}')) 28 | header = 'Epoch: [{}]'.format(epoch) 29 | print_freq = 10 30 | 31 | for samples, targets in metric_logger.log_every(data_loader, print_freq, header): 32 | samples = samples.to(device, non_blocking=True) 33 | targets = targets.to(device, non_blocking=True) 34 | 35 | if mixup_fn is not None: 36 | samples, targets = mixup_fn(samples, targets) 37 | 38 | # with torch.cuda.amp.autocast(): 39 | # outputs = model(samples) 40 | # loss = criterion(samples, outputs, targets) 41 | with torch.cuda.amp.autocast(enabled=not fp32): 42 | outputs = model(samples) 43 | loss = criterion(outputs, targets) 44 | 45 | 46 | loss_value = loss.item() 47 | 48 | if not math.isfinite(loss_value): 49 | print("Loss is {}, stopping training".format(loss_value)) 50 | sys.exit(1) 51 | 52 | optimizer.zero_grad() 53 | 54 | # this attribute is added by timm on one optimizer (adahessian) 55 | 56 | if loss_scaler is not None: 57 | is_second_order = hasattr(optimizer, 'is_second_order') and optimizer.is_second_order 58 | loss_scaler(loss, optimizer, clip_grad=max_norm, 59 | parameters=model.parameters(), create_graph=is_second_order) 60 | 61 | torch.cuda.synchronize() 62 | if model_ema is not None: 63 | model_ema.update(model) 64 | 65 | metric_logger.update(loss=loss_value) 66 | metric_logger.update(lr=optimizer.param_groups[0]["lr"]) 67 | # gather the stats from all processes 68 | metric_logger.synchronize_between_processes() 69 | print("Averaged stats:", metric_logger) 70 | return {k: meter.global_avg for k, meter in metric_logger.meters.items()} 71 | 72 | 73 | @torch.no_grad() 74 | def evaluate(data_loader, model, device): 75 | criterion = torch.nn.CrossEntropyLoss() 76 | 77 | metric_logger = utils.MetricLogger(delimiter=" ") 78 | header = 'Test:' 79 | 80 | # switch to evaluation mode 81 | model.eval() 82 | 83 | for images, target in metric_logger.log_every(data_loader, 10, header): 84 | images = images.to(device, non_blocking=True) 85 | target = target.to(device, non_blocking=True) 86 | 87 | # compute output 88 | with torch.cuda.amp.autocast(): 89 | output = model(images) 90 | loss = criterion(output, target) 91 | 92 | acc1, acc5 = accuracy(output, target, topk=(1, 5)) 93 | 94 | batch_size = images.shape[0] 95 | metric_logger.update(loss=loss.item()) 96 | metric_logger.meters['acc1'].update(acc1.item(), n=batch_size) 97 | metric_logger.meters['acc5'].update(acc5.item(), n=batch_size) 98 | # gather the stats from all processes 99 | metric_logger.synchronize_between_processes() 100 | print('* Acc@1 {top1.global_avg:.3f} Acc@5 {top5.global_avg:.3f} loss {losses.global_avg:.3f}' 101 | .format(top1=metric_logger.acc1, top5=metric_logger.acc5, losses=metric_logger.loss)) 102 | 103 | return {k: meter.global_avg for 
k, meter in metric_logger.meters.items()} 104 | -------------------------------------------------------------------------------- /figures/p2t-arch.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/yuhuan-wu/P2T/8811157e77bcca6aecf0206998de29373eaa872d/figures/p2t-arch.jpg -------------------------------------------------------------------------------- /hubconf.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2015-present, Facebook, Inc. 2 | # All rights reserved. 3 | from models import * 4 | 5 | dependencies = ["torch", "torchvision", "timm"] 6 | -------------------------------------------------------------------------------- /mcloader/__init__.py: -------------------------------------------------------------------------------- 1 | from .classification import ClassificationDataset 2 | from .data_prefetcher import DataPrefetcher -------------------------------------------------------------------------------- /mcloader/classification.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torch.utils.data import Dataset 3 | from .imagenet import ImageNet 4 | 5 | 6 | class ClassificationDataset(Dataset): 7 | """Dataset for classification. 8 | """ 9 | 10 | def __init__(self, split='train', pipeline=None): 11 | if split == 'train': 12 | self.data_source = ImageNet(root='data/imagenet/train', 13 | list_file='data/imagenet/meta/train.txt', 14 | memcached=True, 15 | mclient_path='/mnt/lustre/share/memcached_client') 16 | else: 17 | self.data_source = ImageNet(root='data/imagenet/val', 18 | list_file='data/imagenet/meta/val.txt', 19 | memcached=True, 20 | mclient_path='/mnt/lustre/share/memcached_client') 21 | self.pipeline = pipeline 22 | 23 | def __len__(self): 24 | return self.data_source.get_length() 25 | 26 | def __getitem__(self, idx): 27 | img, target = self.data_source.get_sample(idx) 28 | if self.pipeline is not None: 29 | img = self.pipeline(img) 30 | 31 | return img, target 32 | -------------------------------------------------------------------------------- /mcloader/data_prefetcher.py: -------------------------------------------------------------------------------- 1 | import torch 2 | 3 | 4 | class DataPrefetcher: 5 | def __init__(self, loader): 6 | self.loader = iter(loader) 7 | self.stream = torch.cuda.Stream() 8 | self.preload() 9 | 10 | def preload(self): 11 | try: 12 | self.next_input, self.next_target = next(self.loader) 13 | except StopIteration: 14 | self.next_input = None 15 | self.next_target = None 16 | return 17 | 18 | with torch.cuda.stream(self.stream): 19 | self.next_input = self.next_input.cuda(non_blocking=True) 20 | self.next_target = self.next_target.cuda(non_blocking=True) 21 | 22 | def next(self): 23 | torch.cuda.current_stream().wait_stream(self.stream) 24 | input = self.next_input 25 | target = self.next_target 26 | if input is not None: 27 | self.preload() 28 | return input, target 29 | -------------------------------------------------------------------------------- /mcloader/image_list.py: -------------------------------------------------------------------------------- 1 | import os 2 | from PIL import Image 3 | 4 | from .mcloader import McLoader 5 | 6 | 7 | class ImageList(object): 8 | 9 | def __init__(self, root, list_file, memcached=False, mclient_path=None): 10 | with open(list_file, 'r') as f: 11 | lines = f.readlines() 12 | self.has_labels = len(lines[0].split()) == 2 13 | if 
self.has_labels: 14 | self.fns, self.labels = zip(*[l.strip().split() for l in lines]) 15 | self.labels = [int(l) for l in self.labels] 16 | else: 17 | self.fns = [l.strip() for l in lines] 18 | self.fns = [os.path.join(root, fn) for fn in self.fns] 19 | self.memcached = memcached 20 | self.mclient_path = mclient_path 21 | self.initialized = False 22 | 23 | def _init_memcached(self): 24 | if not self.initialized: 25 | assert self.mclient_path is not None 26 | self.mc_loader = McLoader(self.mclient_path) 27 | self.initialized = True 28 | 29 | def get_length(self): 30 | return len(self.fns) 31 | 32 | def get_sample(self, idx): 33 | if self.memcached: 34 | self._init_memcached() 35 | if self.memcached: 36 | img = self.mc_loader(self.fns[idx]) 37 | else: 38 | img = Image.open(self.fns[idx]) 39 | img = img.convert('RGB') 40 | if self.has_labels: 41 | target = self.labels[idx] 42 | return img, target 43 | else: 44 | return img 45 | -------------------------------------------------------------------------------- /mcloader/imagenet.py: -------------------------------------------------------------------------------- 1 | from .image_list import ImageList 2 | 3 | 4 | class ImageNet(ImageList): 5 | 6 | def __init__(self, root, list_file, memcached, mclient_path): 7 | super(ImageNet, self).__init__( 8 | root, list_file, memcached, mclient_path) 9 | -------------------------------------------------------------------------------- /mcloader/mcloader.py: -------------------------------------------------------------------------------- 1 | import io 2 | from PIL import Image 3 | try: 4 | import mc 5 | except ImportError as E: 6 | pass 7 | 8 | 9 | def pil_loader(img_str): 10 | buff = io.BytesIO(img_str) 11 | return Image.open(buff) 12 | 13 | 14 | class McLoader(object): 15 | 16 | def __init__(self, mclient_path): 17 | assert mclient_path is not None, \ 18 | "Please specify 'data_mclient_path' in the config." 19 | self.mclient_path = mclient_path 20 | server_list_config_file = "{}/server_list.conf".format( 21 | self.mclient_path) 22 | client_config_file = "{}/client.conf".format(self.mclient_path) 23 | self.mclient = mc.MemcachedClient.GetInstance(server_list_config_file, 24 | client_config_file) 25 | 26 | def __call__(self, fn): 27 | try: 28 | img_value = mc.pyvector() 29 | self.mclient.Get(fn, img_value) 30 | img_value_str = mc.ConvertBuffer(img_value) 31 | img = pil_loader(img_value_str) 32 | except: 33 | print('Read image failed ({})'.format(fn)) 34 | return None 35 | else: 36 | return img -------------------------------------------------------------------------------- /samplers.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2015-present, Facebook, Inc. 2 | # All rights reserved. 3 | import torch 4 | import torch.distributed as dist 5 | import math 6 | 7 | 8 | class RASampler(torch.utils.data.Sampler): 9 | """Sampler that restricts data loading to a subset of the dataset for distributed, 10 | with repeated augmentation. 
11 | It ensures that different each augmented version of a sample will be visible to a 12 | different process (GPU) 13 | Heavily based on torch.utils.data.DistributedSampler 14 | """ 15 | 16 | def __init__(self, dataset, num_replicas=None, rank=None, shuffle=True): 17 | if num_replicas is None: 18 | if not dist.is_available(): 19 | raise RuntimeError("Requires distributed package to be available") 20 | num_replicas = dist.get_world_size() 21 | if rank is None: 22 | if not dist.is_available(): 23 | raise RuntimeError("Requires distributed package to be available") 24 | rank = dist.get_rank() 25 | self.dataset = dataset 26 | self.num_replicas = num_replicas 27 | self.rank = rank 28 | self.epoch = 0 29 | self.num_samples = int(math.ceil(len(self.dataset) * 3.0 / self.num_replicas)) 30 | self.total_size = self.num_samples * self.num_replicas 31 | # self.num_selected_samples = int(math.ceil(len(self.dataset) / self.num_replicas)) 32 | self.num_selected_samples = int(math.floor(len(self.dataset) // 256 * 256 / self.num_replicas)) 33 | self.shuffle = shuffle 34 | 35 | def __iter__(self): 36 | # deterministically shuffle based on epoch 37 | g = torch.Generator() 38 | g.manual_seed(self.epoch) 39 | if self.shuffle: 40 | indices = torch.randperm(len(self.dataset), generator=g).tolist() 41 | else: 42 | indices = list(range(len(self.dataset))) 43 | 44 | # add extra samples to make it evenly divisible 45 | indices = [ele for ele in indices for i in range(3)] 46 | indices += indices[:(self.total_size - len(indices))] 47 | assert len(indices) == self.total_size 48 | 49 | # subsample 50 | indices = indices[self.rank:self.total_size:self.num_replicas] 51 | assert len(indices) == self.num_samples 52 | 53 | return iter(indices[:self.num_selected_samples]) 54 | 55 | def __len__(self): 56 | return self.num_selected_samples 57 | 58 | def set_epoch(self, epoch): 59 | self.epoch = epoch 60 | -------------------------------------------------------------------------------- /segmentation/README.md: -------------------------------------------------------------------------------- 1 | ## [TPAMI22] Pyramid Pooling Transformer for Scene Understanding 2 | 3 | This folder contains full training and test code for semantic segmentation. 4 | 5 | ### Requirements 6 | 7 | * mmsegmentation >= 0.12+ 8 | 9 | ### Results (val set) & Pretrained Models) 10 | 11 | 12 | | Base Model | Variants | mIoU | aAcc | mAcc | #Params (M) | # GFLOPS | Google Drive | 13 | | :--: | :-------: | :--: | :--: | :---------: | :------: | :----------------------------------------------------------: | :----------------------------------------------------------: | 14 | | Semantic FPN | P2T-Tiny | 43.4 | 80.8 | 54.5 | 15.4 | 31.6 | [[weights & logs]](https://drive.google.com/drive/folders/1SH9zmdGKvnpFBVU3dXS6-TZT04CZgkX9?usp=sharing) | 15 | | Semantic FPN | P2T-Small | 46.7 | 82.0 | 58.4 | 27.8 | 42.7 | [[weights & logs]](https://drive.google.com/drive/folders/1SH9zmdGKvnpFBVU3dXS6-TZT04CZgkX9?usp=sharing) | 16 | | Semantic FPN | P2T-Base | 48.7 | 82.9 | 60.7 | 39.8 | 58.5 | [[weights & logs]](https://drive.google.com/drive/folders/1SH9zmdGKvnpFBVU3dXS6-TZT04CZgkX9?usp=sharing) | 17 | | Semantic FPN | P2T-Large | 49.4 | 83.3 | 61.9 | 58.1 | 77.7 | [[weights & logs]](https://drive.google.com/drive/folders/1SH9zmdGKvnpFBVU3dXS6-TZT04CZgkX9?usp=sharing) | 18 | 19 | ### Data Preparation 20 | 21 | Put data folder of ADE20K dataset to `data/ade/ADEChallengeData2016`. 
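For reference, the `ade20k.py` dataset config under `configs/_base_/datasets/` reads images from `images/training` / `images/validation` and labels from `annotations/training` / `annotations/validation`, so the extracted dataset should look roughly like:

````
data/ade/ADEChallengeData2016
├── images
│   ├── training
│   └── validation
└── annotations
    ├── training
    └── validation
````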
22 | 23 | ### Train 24 | 25 | Use the following commands to train `Semantic FPN` with `P2T-Small` backbone for distributed learning with 8 GPUs: 26 | 27 | ```` 28 | bash dist_train.sh configs/sem_fpn_p2t_s_ade20k_80k.py 8 29 | ```` 30 | 31 | ### Validate 32 | 33 | Please download the pretrained model from the above table. Put them to `pretrained` folder. 34 | Then, use the following commands to validate `Semantic FPN` with `P2T-Small` backbone in a single GPU: 35 | 36 | ```` 37 | bash dist_test.sh configs/sem_fpn_p2t_s_ade20k_80k.py pretrained/sem_fpn_p2t_s_ade20k_80k.pth 1 38 | ```` 39 | 40 | 41 | ### Other Notes 42 | 43 | If you meet any problems, please do not hesitate to contact us. 44 | Issues and discussions are welcome in the repository! 45 | You can also contact us via sending messages to this email: wuyuhuan@mail.nankai.edu.cn 46 | 47 | 48 | 49 | ### Citation 50 | 51 | If you are using the code/model/data provided here in a publication, please consider citing our works: 52 | 53 | ```` 54 | @ARTICLE{wu2022p2t, 55 | author={Wu, Yu-Huan and Liu, Yun and Zhan, Xin and Cheng, Ming-Ming}, 56 | journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 57 | title={{P2T}: Pyramid Pooling Transformer for Scene Understanding}, 58 | year={2022}, 59 | doi = {10.1109/tpami.2022.3202765}, 60 | } 61 | ```` 62 | 63 | ### License 64 | 65 | This code is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License for Non-Commercial use only. Any commercial use should get formal permission first. 66 | 67 | -------------------------------------------------------------------------------- /segmentation/align_resize.py: -------------------------------------------------------------------------------- 1 | import mmcv 2 | import numpy as np 3 | from mmcv.utils import deprecated_api_warning, is_tuple_of 4 | from numpy import random 5 | 6 | from mmseg.datasets.builder import PIPELINES 7 | #from IPython import embed 8 | 9 | @PIPELINES.register_module() 10 | class AlignResize(object): 11 | """Resize images & seg. Align 12 | """ 13 | 14 | def __init__(self, 15 | img_scale=None, 16 | multiscale_mode='range', 17 | ratio_range=None, 18 | keep_ratio=True, 19 | size_divisor=32): 20 | if img_scale is None: 21 | self.img_scale = None 22 | else: 23 | if isinstance(img_scale, list): 24 | self.img_scale = img_scale 25 | else: 26 | self.img_scale = [img_scale] 27 | assert mmcv.is_list_of(self.img_scale, tuple) 28 | 29 | if ratio_range is not None: 30 | # mode 1: given img_scale=None and a range of image ratio 31 | # mode 2: given a scale and a range of image ratio 32 | assert self.img_scale is None or len(self.img_scale) == 1 33 | else: 34 | # mode 3 and 4: given multiple scales or a range of scales 35 | assert multiscale_mode in ['value', 'range'] 36 | 37 | self.multiscale_mode = multiscale_mode 38 | self.ratio_range = ratio_range 39 | self.keep_ratio = keep_ratio 40 | self.size_divisor = size_divisor 41 | 42 | @staticmethod 43 | def random_select(img_scales): 44 | """Randomly select an img_scale from given candidates. 45 | 46 | Args: 47 | img_scales (list[tuple]): Images scales for selection. 48 | 49 | Returns: 50 | (tuple, int): Returns a tuple ``(img_scale, scale_dix)``, 51 | where ``img_scale`` is the selected image scale and 52 | ``scale_idx`` is the selected index in the given candidates. 
53 | """ 54 | 55 | assert mmcv.is_list_of(img_scales, tuple) 56 | scale_idx = np.random.randint(len(img_scales)) 57 | img_scale = img_scales[scale_idx] 58 | return img_scale, scale_idx 59 | 60 | @staticmethod 61 | def random_sample(img_scales): 62 | """Randomly sample an img_scale when ``multiscale_mode=='range'``. 63 | 64 | Args: 65 | img_scales (list[tuple]): Images scale range for sampling. 66 | There must be two tuples in img_scales, which specify the lower 67 | and uper bound of image scales. 68 | 69 | Returns: 70 | (tuple, None): Returns a tuple ``(img_scale, None)``, where 71 | ``img_scale`` is sampled scale and None is just a placeholder 72 | to be consistent with :func:`random_select`. 73 | """ 74 | 75 | assert mmcv.is_list_of(img_scales, tuple) and len(img_scales) == 2 76 | img_scale_long = [max(s) for s in img_scales] 77 | img_scale_short = [min(s) for s in img_scales] 78 | long_edge = np.random.randint( 79 | min(img_scale_long), 80 | max(img_scale_long) + 1) 81 | short_edge = np.random.randint( 82 | min(img_scale_short), 83 | max(img_scale_short) + 1) 84 | img_scale = (long_edge, short_edge) 85 | return img_scale, None 86 | 87 | @staticmethod 88 | def random_sample_ratio(img_scale, ratio_range): 89 | """Randomly sample an img_scale when ``ratio_range`` is specified. 90 | 91 | A ratio will be randomly sampled from the range specified by 92 | ``ratio_range``. Then it would be multiplied with ``img_scale`` to 93 | generate sampled scale. 94 | 95 | Args: 96 | img_scale (tuple): Images scale base to multiply with ratio. 97 | ratio_range (tuple[float]): The minimum and maximum ratio to scale 98 | the ``img_scale``. 99 | 100 | Returns: 101 | (tuple, None): Returns a tuple ``(scale, None)``, where 102 | ``scale`` is sampled ratio multiplied with ``img_scale`` and 103 | None is just a placeholder to be consistent with 104 | :func:`random_select`. 105 | """ 106 | 107 | assert isinstance(img_scale, tuple) and len(img_scale) == 2 108 | min_ratio, max_ratio = ratio_range 109 | assert min_ratio <= max_ratio 110 | ratio = np.random.random_sample() * (max_ratio - min_ratio) + min_ratio 111 | scale = int(img_scale[0] * ratio), int(img_scale[1] * ratio) 112 | return scale, None 113 | 114 | def _random_scale(self, results): 115 | """Randomly sample an img_scale according to ``ratio_range`` and 116 | ``multiscale_mode``. 117 | 118 | If ``ratio_range`` is specified, a ratio will be sampled and be 119 | multiplied with ``img_scale``. 120 | If multiple scales are specified by ``img_scale``, a scale will be 121 | sampled according to ``multiscale_mode``. 122 | Otherwise, single scale will be used. 123 | 124 | Args: 125 | results (dict): Result dict from :obj:`dataset`. 126 | 127 | Returns: 128 | dict: Two new keys 'scale` and 'scale_idx` are added into 129 | ``results``, which would be used by subsequent pipelines. 
130 | """ 131 | 132 | if self.ratio_range is not None: 133 | if self.img_scale is None: 134 | h, w = results['img'].shape[:2] 135 | scale, scale_idx = self.random_sample_ratio((w, h), 136 | self.ratio_range) 137 | else: 138 | scale, scale_idx = self.random_sample_ratio( 139 | self.img_scale[0], self.ratio_range) 140 | elif len(self.img_scale) == 1: 141 | scale, scale_idx = self.img_scale[0], 0 142 | elif self.multiscale_mode == 'range': 143 | scale, scale_idx = self.random_sample(self.img_scale) 144 | elif self.multiscale_mode == 'value': 145 | scale, scale_idx = self.random_select(self.img_scale) 146 | else: 147 | raise NotImplementedError 148 | 149 | results['scale'] = scale 150 | results['scale_idx'] = scale_idx 151 | 152 | def _align(self, img, size_divisor, interpolation=None): 153 | align_h = int(np.ceil(img.shape[0] / size_divisor)) * size_divisor 154 | align_w = int(np.ceil(img.shape[1] / size_divisor)) * size_divisor 155 | if interpolation == None: 156 | img = mmcv.imresize(img, (align_w, align_h)) 157 | else: 158 | img = mmcv.imresize(img, (align_w, align_h), interpolation=interpolation) 159 | return img 160 | 161 | def _resize_img(self, results): 162 | """Resize images with ``results['scale']``.""" 163 | if self.keep_ratio: 164 | img, scale_factor = mmcv.imrescale( 165 | results['img'], results['scale'], return_scale=True) 166 | #### align #### 167 | img = self._align(img, self.size_divisor) 168 | # the w_scale and h_scale has minor difference 169 | # a real fix should be done in the mmcv.imrescale in the future 170 | new_h, new_w = img.shape[:2] 171 | h, w = results['img'].shape[:2] 172 | w_scale = new_w / w 173 | h_scale = new_h / h 174 | else: 175 | img, w_scale, h_scale = mmcv.imresize( 176 | results['img'], results['scale'], return_scale=True) 177 | 178 | h, w = img.shape[:2] 179 | assert int(np.ceil(h / self.size_divisor)) * self.size_divisor == h and \ 180 | int(np.ceil(w / self.size_divisor)) * self.size_divisor == w, \ 181 | "img size not align. h:{} w:{}".format(h,w) 182 | scale_factor = np.array([w_scale, h_scale, w_scale, h_scale], 183 | dtype=np.float32) 184 | results['img'] = img 185 | results['img_shape'] = img.shape 186 | results['pad_shape'] = img.shape # in case that there is no padding 187 | results['scale_factor'] = scale_factor 188 | results['keep_ratio'] = self.keep_ratio 189 | 190 | def _resize_seg(self, results): 191 | """Resize semantic segmentation map with ``results['scale']``.""" 192 | for key in results.get('seg_fields', []): 193 | if self.keep_ratio: 194 | gt_seg = mmcv.imrescale( 195 | results[key], results['scale'], interpolation='nearest') 196 | gt_seg = self._align(gt_seg, self.size_divisor, interpolation='nearest') 197 | else: 198 | gt_seg = mmcv.imresize( 199 | results[key], results['scale'], interpolation='nearest') 200 | h, w = gt_seg.shape[:2] 201 | assert int(np.ceil(h / self.size_divisor)) * self.size_divisor == h and \ 202 | int(np.ceil(w / self.size_divisor)) * self.size_divisor == w, \ 203 | "gt_seg size not align. h:{} w:{}".format(h, w) 204 | results[key] = gt_seg 205 | 206 | def __call__(self, results): 207 | """Call function to resize images, bounding boxes, masks, semantic 208 | segmentation map. 209 | 210 | Args: 211 | results (dict): Result dict from loading pipeline. 212 | 213 | Returns: 214 | dict: Resized results, 'img_shape', 'pad_shape', 'scale_factor', 215 | 'keep_ratio' keys are added into result dict. 
216 | """ 217 | 218 | if 'scale' not in results: 219 | self._random_scale(results) 220 | self._resize_img(results) 221 | self._resize_seg(results) 222 | return results 223 | 224 | def __repr__(self): 225 | repr_str = self.__class__.__name__ 226 | repr_str += (f'(img_scale={self.img_scale}, ' 227 | f'multiscale_mode={self.multiscale_mode}, ' 228 | f'ratio_range={self.ratio_range}, ' 229 | f'keep_ratio={self.keep_ratio})') 230 | return repr_str -------------------------------------------------------------------------------- /segmentation/configs/_base_/datasets/ade20k.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'ADE20KDataset' 3 | data_root = 'data/ade/ADEChallengeData2016' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | crop_size = (512, 512) 7 | train_pipeline = [ 8 | dict(type='LoadImageFromFile'), 9 | dict(type='LoadAnnotations', reduce_zero_label=True), 10 | dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)), 11 | dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), 12 | dict(type='RandomFlip', prob=0.5), 13 | dict(type='PhotoMetricDistortion'), 14 | dict(type='Normalize', **img_norm_cfg), 15 | dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), 16 | dict(type='DefaultFormatBundle'), 17 | dict(type='Collect', keys=['img', 'gt_semantic_seg']), 18 | ] 19 | test_pipeline = [ 20 | dict(type='LoadImageFromFile'), 21 | dict( 22 | type='MultiScaleFlipAug', 23 | img_scale=(2048, 512), 24 | #img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], 25 | flip=False, 26 | transforms=[ 27 | dict(type='AlignResize', keep_ratio=True, size_divisor=32), 28 | dict(type='RandomFlip'), 29 | dict(type='Normalize', **img_norm_cfg), 30 | dict(type='ImageToTensor', keys=['img']), 31 | dict(type='Collect', keys=['img']), 32 | ]) 33 | ] 34 | print(test_pipeline) 35 | 36 | data = dict( 37 | samples_per_gpu=4, 38 | workers_per_gpu=4, 39 | train=dict( 40 | type=dataset_type, 41 | data_root=data_root, 42 | img_dir='images/training', 43 | ann_dir='annotations/training', 44 | pipeline=train_pipeline), 45 | val=dict( 46 | type=dataset_type, 47 | data_root=data_root, 48 | img_dir='images/validation', 49 | ann_dir='annotations/validation', 50 | pipeline=test_pipeline), 51 | test=dict( 52 | type=dataset_type, 53 | data_root=data_root, 54 | img_dir='images/validation', 55 | ann_dir='annotations/validation', 56 | pipeline=test_pipeline)) 57 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/datasets/chase_db1.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'ChaseDB1Dataset' 3 | data_root = 'data/CHASE_DB1' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | img_scale = (960, 999) 7 | crop_size = (128, 128) 8 | train_pipeline = [ 9 | dict(type='LoadImageFromFile'), 10 | dict(type='LoadAnnotations'), 11 | dict(type='Resize', img_scale=img_scale, ratio_range=(0.5, 2.0)), 12 | dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), 13 | dict(type='RandomFlip', prob=0.5), 14 | dict(type='PhotoMetricDistortion'), 15 | dict(type='Normalize', **img_norm_cfg), 16 | dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), 17 | dict(type='DefaultFormatBundle'), 18 | dict(type='Collect', keys=['img', 'gt_semantic_seg']) 19 | ] 20 | test_pipeline = [ 21 | 
dict(type='LoadImageFromFile'), 22 | dict( 23 | type='MultiScaleFlipAug', 24 | img_scale=img_scale, 25 | # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0], 26 | flip=False, 27 | transforms=[ 28 | dict(type='Resize', keep_ratio=True), 29 | dict(type='RandomFlip'), 30 | dict(type='Normalize', **img_norm_cfg), 31 | dict(type='ImageToTensor', keys=['img']), 32 | dict(type='Collect', keys=['img']) 33 | ]) 34 | ] 35 | 36 | data = dict( 37 | samples_per_gpu=4, 38 | workers_per_gpu=4, 39 | train=dict( 40 | type='RepeatDataset', 41 | times=40000, 42 | dataset=dict( 43 | type=dataset_type, 44 | data_root=data_root, 45 | img_dir='images/training', 46 | ann_dir='annotations/training', 47 | pipeline=train_pipeline)), 48 | val=dict( 49 | type=dataset_type, 50 | data_root=data_root, 51 | img_dir='images/validation', 52 | ann_dir='annotations/validation', 53 | pipeline=test_pipeline), 54 | test=dict( 55 | type=dataset_type, 56 | data_root=data_root, 57 | img_dir='images/validation', 58 | ann_dir='annotations/validation', 59 | pipeline=test_pipeline)) 60 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/datasets/cityscapes.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'CityscapesDataset' 3 | data_root = 'data/cityscapes/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | crop_size = (512, 1024) 7 | train_pipeline = [ 8 | dict(type='LoadImageFromFile'), 9 | dict(type='LoadAnnotations'), 10 | dict(type='Resize', img_scale=(2048, 1024), ratio_range=(0.5, 2.0)), 11 | dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), 12 | dict(type='RandomFlip', prob=0.5), 13 | dict(type='PhotoMetricDistortion'), 14 | dict(type='Normalize', **img_norm_cfg), 15 | dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), 16 | dict(type='DefaultFormatBundle'), 17 | dict(type='Collect', keys=['img', 'gt_semantic_seg']), 18 | ] 19 | test_pipeline = [ 20 | dict(type='LoadImageFromFile'), 21 | dict( 22 | type='MultiScaleFlipAug', 23 | img_scale=(2048, 1024), 24 | # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], 25 | flip=False, 26 | transforms=[ 27 | dict(type='AlignResize', keep_ratio=True, size_divisor=32), 28 | dict(type='RandomFlip'), 29 | dict(type='Normalize', **img_norm_cfg), 30 | dict(type='ImageToTensor', keys=['img']), 31 | dict(type='Collect', keys=['img']), 32 | ]) 33 | ] 34 | data = dict( 35 | samples_per_gpu=2, 36 | workers_per_gpu=2, 37 | train=dict( 38 | type=dataset_type, 39 | data_root=data_root, 40 | img_dir='leftImg8bit/train', 41 | ann_dir='gtFine/train', 42 | pipeline=train_pipeline), 43 | val=dict( 44 | type=dataset_type, 45 | data_root=data_root, 46 | img_dir='leftImg8bit/val', 47 | ann_dir='gtFine/val', 48 | pipeline=test_pipeline), 49 | test=dict( 50 | type=dataset_type, 51 | data_root=data_root, 52 | img_dir='leftImg8bit/val', 53 | ann_dir='gtFine/val', 54 | pipeline=test_pipeline)) 55 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/datasets/drive.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'DRIVEDataset' 3 | data_root = 'data/DRIVE' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | img_scale = (584, 565) 7 | crop_size = (64, 64) 8 | train_pipeline = [ 9 | dict(type='LoadImageFromFile'), 10 
| dict(type='LoadAnnotations'), 11 | dict(type='Resize', img_scale=img_scale, ratio_range=(0.5, 2.0)), 12 | dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), 13 | dict(type='RandomFlip', prob=0.5), 14 | dict(type='PhotoMetricDistortion'), 15 | dict(type='Normalize', **img_norm_cfg), 16 | dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), 17 | dict(type='DefaultFormatBundle'), 18 | dict(type='Collect', keys=['img', 'gt_semantic_seg']) 19 | ] 20 | test_pipeline = [ 21 | dict(type='LoadImageFromFile'), 22 | dict( 23 | type='MultiScaleFlipAug', 24 | img_scale=img_scale, 25 | # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0], 26 | flip=False, 27 | transforms=[ 28 | dict(type='Resize', keep_ratio=True), 29 | dict(type='RandomFlip'), 30 | dict(type='Normalize', **img_norm_cfg), 31 | dict(type='ImageToTensor', keys=['img']), 32 | dict(type='Collect', keys=['img']) 33 | ]) 34 | ] 35 | 36 | data = dict( 37 | samples_per_gpu=4, 38 | workers_per_gpu=4, 39 | train=dict( 40 | type='RepeatDataset', 41 | times=40000, 42 | dataset=dict( 43 | type=dataset_type, 44 | data_root=data_root, 45 | img_dir='images/training', 46 | ann_dir='annotations/training', 47 | pipeline=train_pipeline)), 48 | val=dict( 49 | type=dataset_type, 50 | data_root=data_root, 51 | img_dir='images/validation', 52 | ann_dir='annotations/validation', 53 | pipeline=test_pipeline), 54 | test=dict( 55 | type=dataset_type, 56 | data_root=data_root, 57 | img_dir='images/validation', 58 | ann_dir='annotations/validation', 59 | pipeline=test_pipeline)) 60 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/datasets/hrf.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'HRFDataset' 3 | data_root = 'data/HRF' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | img_scale = (2336, 3504) 7 | crop_size = (256, 256) 8 | train_pipeline = [ 9 | dict(type='LoadImageFromFile'), 10 | dict(type='LoadAnnotations'), 11 | dict(type='Resize', img_scale=img_scale, ratio_range=(0.5, 2.0)), 12 | dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), 13 | dict(type='RandomFlip', prob=0.5), 14 | dict(type='PhotoMetricDistortion'), 15 | dict(type='Normalize', **img_norm_cfg), 16 | dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), 17 | dict(type='DefaultFormatBundle'), 18 | dict(type='Collect', keys=['img', 'gt_semantic_seg']) 19 | ] 20 | test_pipeline = [ 21 | dict(type='LoadImageFromFile'), 22 | dict( 23 | type='MultiScaleFlipAug', 24 | img_scale=img_scale, 25 | # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0], 26 | flip=False, 27 | transforms=[ 28 | dict(type='Resize', keep_ratio=True), 29 | dict(type='RandomFlip'), 30 | dict(type='Normalize', **img_norm_cfg), 31 | dict(type='ImageToTensor', keys=['img']), 32 | dict(type='Collect', keys=['img']) 33 | ]) 34 | ] 35 | 36 | data = dict( 37 | samples_per_gpu=4, 38 | workers_per_gpu=4, 39 | train=dict( 40 | type='RepeatDataset', 41 | times=40000, 42 | dataset=dict( 43 | type=dataset_type, 44 | data_root=data_root, 45 | img_dir='images/training', 46 | ann_dir='annotations/training', 47 | pipeline=train_pipeline)), 48 | val=dict( 49 | type=dataset_type, 50 | data_root=data_root, 51 | img_dir='images/validation', 52 | ann_dir='annotations/validation', 53 | pipeline=test_pipeline), 54 | test=dict( 55 | type=dataset_type, 56 | data_root=data_root, 57 | img_dir='images/validation', 
58 | ann_dir='annotations/validation', 59 | pipeline=test_pipeline)) 60 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/datasets/pascal_context.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'PascalContextDataset' 3 | data_root = 'data/VOCdevkit/VOC2010/' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | 7 | img_scale = (520, 520) 8 | crop_size = (480, 480) 9 | 10 | train_pipeline = [ 11 | dict(type='LoadImageFromFile'), 12 | dict(type='LoadAnnotations'), 13 | dict(type='Resize', img_scale=img_scale, ratio_range=(0.5, 2.0)), 14 | dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), 15 | dict(type='RandomFlip', prob=0.5), 16 | dict(type='PhotoMetricDistortion'), 17 | dict(type='Normalize', **img_norm_cfg), 18 | dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), 19 | dict(type='DefaultFormatBundle'), 20 | dict(type='Collect', keys=['img', 'gt_semantic_seg']), 21 | ] 22 | test_pipeline = [ 23 | dict(type='LoadImageFromFile'), 24 | dict( 25 | type='MultiScaleFlipAug', 26 | img_scale=img_scale, 27 | # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], 28 | flip=False, 29 | transforms=[ 30 | dict(type='Resize', keep_ratio=True), 31 | dict(type='RandomFlip'), 32 | dict(type='Normalize', **img_norm_cfg), 33 | dict(type='ImageToTensor', keys=['img']), 34 | dict(type='Collect', keys=['img']), 35 | ]) 36 | ] 37 | data = dict( 38 | samples_per_gpu=4, 39 | workers_per_gpu=4, 40 | train=dict( 41 | type=dataset_type, 42 | data_root=data_root, 43 | img_dir='JPEGImages', 44 | ann_dir='SegmentationClassContext', 45 | split='ImageSets/SegmentationContext/train.txt', 46 | pipeline=train_pipeline), 47 | val=dict( 48 | type=dataset_type, 49 | data_root=data_root, 50 | img_dir='JPEGImages', 51 | ann_dir='SegmentationClassContext', 52 | split='ImageSets/SegmentationContext/val.txt', 53 | pipeline=test_pipeline), 54 | test=dict( 55 | type=dataset_type, 56 | data_root=data_root, 57 | img_dir='JPEGImages', 58 | ann_dir='SegmentationClassContext', 59 | split='ImageSets/SegmentationContext/val.txt', 60 | pipeline=test_pipeline)) 61 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/datasets/pascal_voc12.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'PascalVOCDataset' 3 | data_root = 'data/VOCdevkit/VOC2012' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | crop_size = (512, 512) 7 | train_pipeline = [ 8 | dict(type='LoadImageFromFile'), 9 | dict(type='LoadAnnotations'), 10 | dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)), 11 | dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), 12 | dict(type='RandomFlip', prob=0.5), 13 | dict(type='PhotoMetricDistortion'), 14 | dict(type='Normalize', **img_norm_cfg), 15 | dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), 16 | dict(type='DefaultFormatBundle'), 17 | dict(type='Collect', keys=['img', 'gt_semantic_seg']), 18 | ] 19 | test_pipeline = [ 20 | dict(type='LoadImageFromFile'), 21 | dict( 22 | type='MultiScaleFlipAug', 23 | img_scale=(2048, 512), 24 | # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], 25 | flip=False, 26 | transforms=[ 27 | dict(type='Resize', keep_ratio=True), 28 | dict(type='RandomFlip'), 29 | 
dict(type='Normalize', **img_norm_cfg), 30 | dict(type='ImageToTensor', keys=['img']), 31 | dict(type='Collect', keys=['img']), 32 | ]) 33 | ] 34 | data = dict( 35 | samples_per_gpu=4, 36 | workers_per_gpu=4, 37 | train=dict( 38 | type=dataset_type, 39 | data_root=data_root, 40 | img_dir='JPEGImages', 41 | ann_dir='SegmentationClass', 42 | split='ImageSets/Segmentation/train.txt', 43 | pipeline=train_pipeline), 44 | val=dict( 45 | type=dataset_type, 46 | data_root=data_root, 47 | img_dir='JPEGImages', 48 | ann_dir='SegmentationClass', 49 | split='ImageSets/Segmentation/val.txt', 50 | pipeline=test_pipeline), 51 | test=dict( 52 | type=dataset_type, 53 | data_root=data_root, 54 | img_dir='JPEGImages', 55 | ann_dir='SegmentationClass', 56 | split='ImageSets/Segmentation/val.txt', 57 | pipeline=test_pipeline)) 58 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/datasets/pascal_voc12_aug.py: -------------------------------------------------------------------------------- 1 | _base_ = './pascal_voc12.py' 2 | # dataset settings 3 | data = dict( 4 | train=dict( 5 | ann_dir=['SegmentationClass', 'SegmentationClassAug'], 6 | split=[ 7 | 'ImageSets/Segmentation/train.txt', 8 | 'ImageSets/Segmentation/aug.txt' 9 | ])) 10 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/datasets/stare.py: -------------------------------------------------------------------------------- 1 | # dataset settings 2 | dataset_type = 'STAREDataset' 3 | data_root = 'data/STARE' 4 | img_norm_cfg = dict( 5 | mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) 6 | img_scale = (605, 700) 7 | crop_size = (128, 128) 8 | train_pipeline = [ 9 | dict(type='LoadImageFromFile'), 10 | dict(type='LoadAnnotations'), 11 | dict(type='Resize', img_scale=img_scale, ratio_range=(0.5, 2.0)), 12 | dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), 13 | dict(type='RandomFlip', prob=0.5), 14 | dict(type='PhotoMetricDistortion'), 15 | dict(type='Normalize', **img_norm_cfg), 16 | dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), 17 | dict(type='DefaultFormatBundle'), 18 | dict(type='Collect', keys=['img', 'gt_semantic_seg']) 19 | ] 20 | test_pipeline = [ 21 | dict(type='LoadImageFromFile'), 22 | dict( 23 | type='MultiScaleFlipAug', 24 | img_scale=img_scale, 25 | # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0], 26 | flip=False, 27 | transforms=[ 28 | dict(type='Resize', keep_ratio=True), 29 | dict(type='RandomFlip'), 30 | dict(type='Normalize', **img_norm_cfg), 31 | dict(type='ImageToTensor', keys=['img']), 32 | dict(type='Collect', keys=['img']) 33 | ]) 34 | ] 35 | 36 | data = dict( 37 | samples_per_gpu=4, 38 | workers_per_gpu=4, 39 | train=dict( 40 | type='RepeatDataset', 41 | times=40000, 42 | dataset=dict( 43 | type=dataset_type, 44 | data_root=data_root, 45 | img_dir='images/training', 46 | ann_dir='annotations/training', 47 | pipeline=train_pipeline)), 48 | val=dict( 49 | type=dataset_type, 50 | data_root=data_root, 51 | img_dir='images/validation', 52 | ann_dir='annotations/validation', 53 | pipeline=test_pipeline), 54 | test=dict( 55 | type=dataset_type, 56 | data_root=data_root, 57 | img_dir='images/validation', 58 | ann_dir='annotations/validation', 59 | pipeline=test_pipeline)) 60 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/default_runtime.py: 
-------------------------------------------------------------------------------- 1 | # yapf:disable 2 | log_config = dict( 3 | interval=50, 4 | hooks=[ 5 | dict(type='TextLoggerHook', by_epoch=False), 6 | # dict(type='TensorboardLoggerHook') 7 | ]) 8 | # yapf:enable 9 | dist_params = dict(backend='nccl') 10 | log_level = 'INFO' 11 | load_from = None 12 | resume_from = None 13 | workflow = [('train', 1)] 14 | cudnn_benchmark = False 15 | 16 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/models/fpn_r50.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | norm_cfg = dict(type='SyncBN', requires_grad=True) 3 | model = dict( 4 | type='EncoderDecoder', 5 | pretrained='open-mmlab://resnet50_v1c', 6 | backbone=dict( 7 | type='ResNetV1c', 8 | depth=50, 9 | num_stages=4, 10 | out_indices=(0, 1, 2, 3), 11 | dilations=(1, 1, 1, 1), 12 | strides=(1, 2, 2, 2), 13 | norm_cfg=norm_cfg, 14 | norm_eval=False, 15 | style='pytorch', 16 | contract_dilation=True), 17 | neck=dict( 18 | type='FPN', 19 | in_channels=[256, 512, 1024, 2048], 20 | out_channels=256, 21 | num_outs=4), 22 | decode_head=dict( 23 | type='FPNHead', 24 | in_channels=[256, 256, 256, 256], 25 | in_index=[0, 1, 2, 3], 26 | feature_strides=[4, 8, 16, 32], 27 | channels=128, 28 | dropout_ratio=0.1, 29 | num_classes=19, 30 | norm_cfg=norm_cfg, 31 | align_corners=False, 32 | loss_decode=dict( 33 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), 34 | # model training and testing settings 35 | train_cfg=dict(), 36 | test_cfg=dict(mode='whole')) 37 | find_unused_parameters = True -------------------------------------------------------------------------------- /segmentation/configs/_base_/models/upernet_r50.py: -------------------------------------------------------------------------------- 1 | # model settings 2 | norm_cfg = dict(type='SyncBN', requires_grad=True) 3 | model = dict( 4 | type='EncoderDecoder', 5 | pretrained='open-mmlab://resnet50_v1c', 6 | backbone=dict( 7 | type='ResNetV1c', 8 | depth=50, 9 | num_stages=4, 10 | out_indices=(0, 1, 2, 3), 11 | dilations=(1, 1, 1, 1), 12 | strides=(1, 2, 2, 2), 13 | norm_cfg=norm_cfg, 14 | norm_eval=False, 15 | style='pytorch', 16 | contract_dilation=True), 17 | decode_head=dict( 18 | type='UPerHead', 19 | in_channels=[256, 512, 1024, 2048], 20 | in_index=[0, 1, 2, 3], 21 | pool_scales=(1, 2, 3, 6), 22 | channels=512, 23 | dropout_ratio=0.1, 24 | num_classes=19, 25 | norm_cfg=norm_cfg, 26 | align_corners=False, 27 | loss_decode=dict( 28 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), 29 | auxiliary_head=dict( 30 | type='FCNHead', 31 | in_channels=1024, 32 | in_index=2, 33 | channels=256, 34 | num_convs=1, 35 | concat_input=False, 36 | dropout_ratio=0.1, 37 | num_classes=19, 38 | norm_cfg=norm_cfg, 39 | align_corners=False, 40 | loss_decode=dict( 41 | type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), 42 | # model training and testing settings 43 | train_cfg=dict(), 44 | test_cfg=dict(mode='whole') 45 | # test_cfg=dict(mode='slide', crop_size=(1024, 1024), stride=(768, 768)) 46 | ) 47 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/schedules/schedule_160k.py: -------------------------------------------------------------------------------- 1 | # optimizer 2 | optimizer = dict(type='SGD', lr=0.01, weight_decay=0.0005) 3 | optimizer_config = dict() 4 | # learning policy 5 | 
lr_config = dict(policy='poly', power=0.9, min_lr=1e-5, by_epoch=False) 6 | # runtime settings 7 | runner = dict(type='IterBasedRunner', max_iters=160000) 8 | checkpoint_config = dict(by_epoch=False, interval=16000) 9 | evaluation = dict(interval=16000, metric='mIoU') 10 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/schedules/schedule_20k.py: -------------------------------------------------------------------------------- 1 | # optimizer 2 | optimizer = dict(type='SGD', lr=0.01, weight_decay=0.0005) 3 | optimizer_config = dict() 4 | # learning policy 5 | lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False) 6 | # runtime settings 7 | runner = dict(type='IterBasedRunner', max_iters=20000) 8 | checkpoint_config = dict(by_epoch=False, interval=2000) 9 | evaluation = dict(interval=2000, metric='mIoU') 10 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/schedules/schedule_40k.py: -------------------------------------------------------------------------------- 1 | # optimizer 2 | optimizer = dict(type='SGD', lr=0.01, weight_decay=0.0005) 3 | optimizer_config = dict() 4 | # learning policy 5 | lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False) 6 | # runtime settings 7 | runner = dict(type='IterBasedRunner', max_iters=40000) 8 | checkpoint_config = dict(by_epoch=False, interval=4000) 9 | evaluation = dict(interval=4000, metric='mIoU') 10 | -------------------------------------------------------------------------------- /segmentation/configs/_base_/schedules/schedule_80k.py: -------------------------------------------------------------------------------- 1 | # optimizer 2 | optimizer = dict(type='SGD', lr=0.01, weight_decay=0.0005) 3 | optimizer_config = dict() 4 | # learning policy 5 | lr_config = dict(policy='poly', power=0.9, min_lr=1e-6, by_epoch=False) 6 | # runtime settings 7 | runner = dict(type='IterBasedRunner', max_iters=80000) 8 | checkpoint_config = dict(by_epoch=False, interval=8000) 9 | evaluation = dict(interval=8000, metric='mIoU') 10 | -------------------------------------------------------------------------------- /segmentation/configs/sem_fpn_p2t_b_ade20k_80k.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '_base_/models/fpn_r50.py', '_base_/datasets/ade20k.py', 3 | '_base_/default_runtime.py', '_base_/schedules/schedule_80k.py' 4 | ] 5 | 6 | model = dict( 7 | type='EncoderDecoder', 8 | pretrained='pretrained/p2t_base.pth', 9 | backbone=dict( 10 | type='p2t_base', 11 | style='pytorch'), 12 | neck=dict( 13 | type='FPN', 14 | in_channels=[64, 128, 320, 512], 15 | out_channels=256, 16 | num_outs=4), 17 | decode_head=dict(num_classes=150), 18 | ) 19 | cudnn_benchmark = False 20 | optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.0001) 21 | optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) 22 | data = dict(samples_per_gpu=2) 23 | find_unused_parameters = True 24 | -------------------------------------------------------------------------------- /segmentation/configs/sem_fpn_p2t_l_ade20k_80k.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '_base_/models/fpn_r50.py', '_base_/datasets/ade20k.py', 3 | '_base_/default_runtime.py', '_base_/schedules/schedule_80k.py' 4 | ] 5 | 6 | model = dict( 7 | type='EncoderDecoder', 8 | pretrained='pretrained/p2t_large.pth', 9 | backbone=dict( 10 | 
type='p2t_large', 11 | style='pytorch'), 12 | neck=dict( 13 | type='FPN', 14 | in_channels=[64, 128, 320, 640], 15 | out_channels=256, 16 | num_outs=4), 17 | decode_head=dict(num_classes=150), 18 | ) 19 | cudnn_benchmark = False 20 | optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.0001) 21 | optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) 22 | data = dict(samples_per_gpu=2) 23 | find_unused_parameters = True 24 | -------------------------------------------------------------------------------- /segmentation/configs/sem_fpn_p2t_s_ade20k_80k.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '_base_/models/fpn_r50.py', '_base_/datasets/ade20k.py', 3 | '_base_/default_runtime.py', '_base_/schedules/schedule_80k.py' 4 | ] 5 | 6 | model = dict( 7 | type='EncoderDecoder', 8 | pretrained='pretrained/p2t_small.pth', 9 | backbone=dict( 10 | type='p2t_small', 11 | style='pytorch'), 12 | neck=dict( 13 | type='FPN', 14 | in_channels=[64, 128, 320, 512], 15 | out_channels=256, 16 | num_outs=4), 17 | decode_head=dict(num_classes=150), 18 | ) 19 | cudnn_benchmark = False 20 | optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.0001) 21 | optimizer_config = dict() 22 | data = dict(samples_per_gpu=2) 23 | find_unused_parameters = True 24 | -------------------------------------------------------------------------------- /segmentation/configs/sem_fpn_p2t_t_ade20k_80k.py: -------------------------------------------------------------------------------- 1 | _base_ = [ 2 | '_base_/models/fpn_r50.py', '_base_/datasets/ade20k.py', 3 | '_base_/default_runtime.py', '_base_/schedules/schedule_80k.py' 4 | ] 5 | 6 | model = dict( 7 | type='EncoderDecoder', 8 | pretrained='pretrained/p2t_tiny.pth', 9 | backbone=dict( 10 | type='p2t_tiny', 11 | style='pytorch'), 12 | neck=dict( 13 | type='FPN', 14 | in_channels=[48, 96, 240, 384], 15 | out_channels=256, 16 | num_outs=4), 17 | decode_head=dict(num_classes=150), 18 | ) 19 | cudnn_benchmark = False 20 | optimizer = dict(type='AdamW', lr=0.0001, weight_decay=0.0001) 21 | optimizer_config = dict() 22 | data = dict(samples_per_gpu=2) 23 | find_unused_parameters = True 24 | -------------------------------------------------------------------------------- /segmentation/dist_test.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | CONFIG=$1 4 | CHECKPOINT=$2 5 | GPUS=$3 6 | PORT=${PORT:-29400} 7 | PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \ 8 | python $(dirname "$0")/test.py $CONFIG $CHECKPOINT ${@:4} 9 | 10 | ## example command: 11 | ## bash dist_test.sh configs/sem_fpn_p2t_s_ade20k_80k.py pretrained/sem_fpn_p2t_s_ade20k_80k.pth 1 --eval mIoU 12 | -------------------------------------------------------------------------------- /segmentation/dist_train.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | export OMP_NUM_THREADS=1 4 | 5 | CONFIG=$1 6 | N_GPUS=$2 7 | PORT=${PORT:-29500} 8 | 9 | 10 | PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \ 11 | python -m torch.distributed.launch --nproc_per_node=${N_GPUS} \ 12 | --master_port=${PORT} \ 13 | --use_env $(dirname "$0")/train.py ${CONFIG} --launcher pytorch ${@:3} 14 | 15 | ## bash dist_train.sh configs/sem_fpn_p2t_s_ade20k_80k.py 8 16 | ## training [p2t_small + semantic fpn] costs ~4GB GPU memory for each GPU (2 images/gpu). 
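## The master port defaults to 29500 (PORT=${PORT:-29500} above); if that port is
## occupied, any free port can be exported instead, e.g. (port number is only an example):
## PORT=29501 bash dist_train.sh configs/sem_fpn_p2t_s_ade20k_80k.py 8
## Any arguments after the GPU count are forwarded verbatim to train.py via ${@:3}.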
17 | -------------------------------------------------------------------------------- /segmentation/p2t.py: -------------------------------------------------------------------------------- 1 | from os import sep 2 | from pickle import TRUE 3 | import torch 4 | import torch.nn as nn 5 | import torch.nn.functional as F 6 | import torch.jit as jit 7 | from functools import partial 8 | 9 | from timm.models.layers import DropPath, to_2tuple, trunc_normal_ 10 | from timm.models.registry import register_model 11 | from timm.models.vision_transformer import _cfg 12 | 13 | from mmseg.models.builder import BACKBONES 14 | from mmcv.runner import load_checkpoint 15 | from mmseg.utils import get_root_logger 16 | 17 | 18 | import numpy as np 19 | from time import time 20 | 21 | __all__ = [ 22 | 'p2t_tiny', 'p2t_small', 'p2t_base', 'p2t_large' 23 | ] 24 | 25 | 26 | 27 | class IRB(nn.Module): 28 | def __init__(self, in_features, hidden_features=None, out_features=None, ksize=3, act_layer=nn.Hardswish, drop=0.): 29 | super().__init__() 30 | out_features = out_features or in_features 31 | hidden_features = hidden_features or in_features 32 | self.fc1 = nn.Conv2d(in_features, hidden_features, 1, 1, 0) 33 | self.act = act_layer() 34 | self.conv = nn.Conv2d(hidden_features, hidden_features, kernel_size=ksize, padding=ksize//2, stride=1, groups=hidden_features) 35 | self.fc2 = nn.Conv2d(hidden_features, out_features, 1, 1, 0) 36 | self.drop = nn.Dropout(drop) 37 | 38 | def forward(self, x, H, W): 39 | B, N, C = x.shape 40 | x = x.permute(0,2,1).reshape(B, C, H, W) 41 | x = self.fc1(x) 42 | x = self.act(x) 43 | x = self.conv(x) 44 | x = self.act(x) 45 | x = self.fc2(x) 46 | return x.reshape(B, C, -1).permute(0,2,1) 47 | 48 | 49 | class PoolingAttention(nn.Module): 50 | def __init__(self, dim, num_heads=2, qkv_bias=False, qk_scale=None, attn_drop=0., proj_drop=0., 51 | pool_ratios=[1,2,3,6]): 52 | 53 | super().__init__() 54 | assert dim % num_heads == 0, f"dim {dim} should be divided by num_heads {num_heads}." 
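        # Summary of forward() below: queries are taken from the full token
        # sequence, while keys/values are built from pyramid-pooled tokens. The
        # input is reshaped to (B, C, H, W), adaptive-average-pooled at every
        # ratio in pool_ratios, refined by the depthwise 3x3 convs passed in as
        # d_convs (pool + conv(pool)), then concatenated and layer-normalized.
        # The key/value length therefore shrinks to roughly sum((H/r) * (W/r))
        # while still carrying multi-scale context.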
55 | 56 | self.dim = dim 57 | self.num_heads = num_heads 58 | self.num_elements = np.array([t*t for t in pool_ratios]).sum() 59 | head_dim = dim // num_heads 60 | self.scale = qk_scale or head_dim ** -0.5 61 | 62 | self.q = nn.Sequential(nn.Linear(dim, dim, bias=qkv_bias)) 63 | self.kv = nn.Sequential(nn.Linear(dim, dim * 2, bias=qkv_bias)) 64 | 65 | self.attn_drop = nn.Dropout(attn_drop) 66 | self.proj = nn.Linear(dim, dim) 67 | self.proj_drop = nn.Dropout(proj_drop) 68 | 69 | self.pool_ratios = pool_ratios 70 | self.pools = nn.ModuleList() 71 | 72 | self.norm = nn.LayerNorm(dim) 73 | 74 | def forward(self, x, H, W, d_convs=None): 75 | B, N, C = x.shape 76 | 77 | q = self.q(x).reshape(B, N, self.num_heads, C // self.num_heads).permute(0, 2, 1, 3) 78 | pools = [] 79 | x_ = x.permute(0, 2, 1).reshape(B, C, H, W) 80 | for (pool_ratio, l) in zip(self.pool_ratios, d_convs): 81 | pool = F.adaptive_avg_pool2d(x_, (round(H/pool_ratio), round(W/pool_ratio))) 82 | pool = pool + l(pool) 83 | pools.append(pool.view(B, C, -1)) 84 | 85 | pools = torch.cat(pools, dim=2) 86 | pools = self.norm(pools.permute(0,2,1)) 87 | 88 | kv = self.kv(pools).reshape(B, -1, 2, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4) 89 | k, v = kv[0], kv[1] 90 | 91 | attn = (q @ k.transpose(-2, -1)) * self.scale 92 | attn = attn.softmax(dim=-1) 93 | x = (attn @ v) 94 | x = x.transpose(1,2).contiguous().reshape(B, N, C) 95 | 96 | x = self.proj(x) 97 | 98 | return x 99 | 100 | 101 | class Block(nn.Module): 102 | 103 | def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=False, qk_scale=None, drop=0., attn_drop=0., 104 | drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm, pool_ratios=[12,16,20,24]): 105 | super().__init__() 106 | self.norm1 = norm_layer(dim) 107 | self.attn = PoolingAttention( 108 | dim, num_heads=num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale, 109 | attn_drop=attn_drop, proj_drop=drop, pool_ratios=pool_ratios) 110 | 111 | self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity() 112 | 113 | self.norm2 = norm_layer(dim) 114 | self.mlp = IRB(in_features=dim, hidden_features=int(dim * mlp_ratio), act_layer=nn.Hardswish, drop=drop, ksize=3) 115 | 116 | def forward(self, x, H, W, d_convs=None): 117 | x = x + self.drop_path(self.attn(self.norm1(x), H, W, d_convs=d_convs)) 118 | x = x + self.drop_path(self.mlp(self.norm2(x), H, W)) 119 | 120 | return x 121 | 122 | class PatchEmbed(nn.Module): 123 | """ Image to Patch Embedding 124 | """ 125 | 126 | def __init__(self, img_size=224, patch_size=16, kernel_size=3, in_chans=3, embed_dim=768, overlap=True): 127 | super().__init__() 128 | img_size = to_2tuple(img_size) 129 | patch_size = to_2tuple(patch_size) 130 | 131 | self.img_size = img_size 132 | self.patch_size = patch_size 133 | assert img_size[0] % patch_size[0] == 0 and img_size[1] % patch_size[1] == 0, \ 134 | f"img_size {img_size} should be divided by patch_size {patch_size}." 
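        # Note: forward() recomputes H and W from the actual input, so this
        # assert only constrains the nominal img_size given at construction.
        # For segmentation, the AlignResize transform (size_divisor=32) keeps
        # inputs divisible by the backbone's overall stride of 32 (4 x 2 x 2 x 2),
        # so each stage's stride-2/stride-4 patch embedding divides evenly.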
135 | self.H, self.W = img_size[0] // patch_size[0], img_size[1] // patch_size[1] 136 | self.num_patches = self.H * self.W 137 | if not overlap: 138 | self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size) 139 | else: 140 | self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=kernel_size, stride=patch_size, padding=kernel_size//2) 141 | 142 | self.norm = nn.LayerNorm(embed_dim) 143 | 144 | def forward(self, x): 145 | B, C, H, W = x.shape 146 | x = self.proj(x).flatten(2).transpose(1, 2) 147 | x = self.norm(x) 148 | H, W = H // self.patch_size[0], W // self.patch_size[1] 149 | 150 | return x, (H, W) 151 | 152 | 153 | 154 | class PyramidPoolingTransformer(nn.Module): 155 | def __init__(self, img_size=224, patch_size=4, in_chans=3, num_classes=1000, embed_dims=[64, 128, 320, 512], 156 | num_heads=[1, 2, 5, 8], mlp_ratios=[8, 8, 4, 4], qkv_bias=True, qk_scale=None, drop_rate=0., 157 | attn_drop_rate=0., drop_path_rate=0.1, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[2, 2, 9, 3], **kwargs): # 158 | super().__init__() 159 | print("loading p2t") 160 | self.num_classes = num_classes 161 | self.depths = depths 162 | 163 | self.embed_dims = embed_dims 164 | 165 | # pyramid pooling ratios for each stage 166 | pool_ratios = [[12,16,20,24], [6,8,10,12], [3,4,5,6], [1,2,3,4]] 167 | 168 | self.patch_embed1 = PatchEmbed(img_size=img_size, patch_size=4, kernel_size=7, in_chans=in_chans, 169 | embed_dim=embed_dims[0], overlap=True) 170 | 171 | self.patch_embed2 = PatchEmbed(img_size=img_size // 4, patch_size=2, in_chans=embed_dims[0], 172 | embed_dim=embed_dims[1], overlap=True) 173 | self.patch_embed3 = PatchEmbed(img_size=img_size // 8, patch_size=2, in_chans=embed_dims[1], 174 | embed_dim=embed_dims[2], overlap=True) 175 | self.patch_embed4 = PatchEmbed(img_size=img_size // 16, patch_size=2, in_chans=embed_dims[2], 176 | embed_dim=embed_dims[3], overlap=True) 177 | 178 | self.d_convs1 = nn.ModuleList([nn.Conv2d(embed_dims[0], embed_dims[0], kernel_size=3, stride=1, padding=1, groups=embed_dims[0]) for temp in pool_ratios[0]]) 179 | self.d_convs2 = nn.ModuleList([nn.Conv2d(embed_dims[1], embed_dims[1], kernel_size=3, stride=1, padding=1, groups=embed_dims[1]) for temp in pool_ratios[1]]) 180 | self.d_convs3 = nn.ModuleList([nn.Conv2d(embed_dims[2], embed_dims[2], kernel_size=3, stride=1, padding=1, groups=embed_dims[2]) for temp in pool_ratios[2]]) 181 | self.d_convs4 = nn.ModuleList([nn.Conv2d(embed_dims[3], embed_dims[3], kernel_size=3, stride=1, padding=1, groups=embed_dims[3]) for temp in pool_ratios[3]]) 182 | 183 | # transformer encoder 184 | dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))] # stochastic depth decay rule 185 | cur = 0 186 | 187 | 188 | ksize = 3 189 | 190 | self.block1 = nn.ModuleList([Block( 191 | dim=embed_dims[0], num_heads=num_heads[0], mlp_ratio=mlp_ratios[0], qkv_bias=qkv_bias, qk_scale=qk_scale, 192 | drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[cur + i], norm_layer=norm_layer, pool_ratios=pool_ratios[0]) 193 | for i in range(depths[0])]) 194 | 195 | 196 | cur += depths[0] 197 | self.block2 = nn.ModuleList([Block( 198 | dim=embed_dims[1], num_heads=num_heads[1], mlp_ratio=mlp_ratios[1], qkv_bias=qkv_bias, qk_scale=qk_scale, 199 | drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[cur + i], norm_layer=norm_layer, pool_ratios=pool_ratios[1]) 200 | for i in range(depths[1])]) 201 | 202 | cur += depths[1] 203 | 204 | 205 | self.block3 = nn.ModuleList([Block( 206 | dim=embed_dims[2], 
num_heads=num_heads[2], mlp_ratio=mlp_ratios[2], qkv_bias=qkv_bias, qk_scale=qk_scale, 207 | drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[cur + i], norm_layer=norm_layer, pool_ratios=pool_ratios[2]) 208 | for i in range(depths[2])]) 209 | 210 | cur += depths[2] 211 | 212 | self.block4 = nn.ModuleList([Block( 213 | dim=embed_dims[3], num_heads=num_heads[3], mlp_ratio=mlp_ratios[3], qkv_bias=qkv_bias, qk_scale=qk_scale, 214 | drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[cur + i], norm_layer=norm_layer, pool_ratios=pool_ratios[3]) 215 | for i in range(depths[3])]) 216 | 217 | # classification head, usually not used in dense prediction tasks 218 | self.head = nn.Linear(embed_dims[3], num_classes) if num_classes > 0 else nn.Identity() 219 | self.gap = nn.AdaptiveAvgPool1d(1) 220 | 221 | self.apply(self._init_weights) 222 | 223 | 224 | def init_weights(self, pretrained=None): 225 | if isinstance(pretrained, str): 226 | logger = get_root_logger() 227 | load_checkpoint(self, pretrained, map_location='cpu', strict=False, logger=logger) 228 | 229 | 230 | def reset_drop_path(self, drop_path_rate): 231 | dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(self.depths))] 232 | cur = 0 233 | for i in range(self.depths[0]): 234 | self.block1[i].drop_path.drop_prob = dpr[cur + i] 235 | 236 | cur += self.depths[0] 237 | for i in range(self.depths[1]): 238 | self.block2[i].drop_path.drop_prob = dpr[cur + i] 239 | 240 | cur += self.depths[1] 241 | for i in range(self.depths[2]): 242 | self.block3[i].drop_path.drop_prob = dpr[cur + i] 243 | 244 | cur += self.depths[2] 245 | for i in range(self.depths[3]): 246 | self.block4[i].drop_path.drop_prob = dpr[cur + i] 247 | 248 | def _init_weights(self, m): 249 | if isinstance(m, nn.Linear): 250 | trunc_normal_(m.weight, std=.02) 251 | if isinstance(m, nn.Linear) and m.bias is not None: 252 | nn.init.constant_(m.bias, 0) 253 | elif isinstance(m, nn.LayerNorm): 254 | nn.init.constant_(m.bias, 0) 255 | nn.init.constant_(m.weight, 1.0) 256 | 257 | 258 | @torch.jit.ignore 259 | def no_weight_decay(self): 260 | # return {'pos_embed', 'cls_token'} # has pos_embed may be better 261 | return {'cls_token'} 262 | 263 | def get_classifier(self): 264 | return self.head 265 | 266 | def reset_classifier(self, num_classes, global_pool=''): 267 | self.num_classes = num_classes 268 | self.head = nn.Linear(self.embed_dim, num_classes) if num_classes > 0 else nn.Identity() 269 | 270 | def forward_features(self, x): 271 | outs = [] 272 | 273 | B = x.shape[0] 274 | 275 | # stage 1 276 | x, (H, W) = self.patch_embed1(x) 277 | 278 | for idx, blk in enumerate(self.block1): 279 | x = blk(x, H, W, self.d_convs1) 280 | x = x.reshape(B, H, W, -1).permute(0, 3, 1, 2) 281 | outs.append(x) 282 | 283 | # stage 2 284 | x, (H, W) = self.patch_embed2(x) 285 | 286 | for idx, blk in enumerate(self.block2): 287 | x = blk(x, H, W, self.d_convs2) 288 | x = x.reshape(B, H, W, -1).permute(0, 3, 1, 2) 289 | outs.append(x) 290 | 291 | x, (H, W) = self.patch_embed3(x) 292 | 293 | for idx, blk in enumerate(self.block3): 294 | x = blk(x, H, W, self.d_convs3) 295 | x = x.reshape(B, H, W, -1).permute(0, 3, 1, 2) 296 | outs.append(x) 297 | 298 | # stage 4 299 | x, (H, W) = self.patch_embed4(x) 300 | 301 | for idx, blk in enumerate(self.block4): 302 | x = blk(x, H, W, self.d_convs4) 303 | x = x.reshape(B, H, W, -1).permute(0, 3, 1, 2) 304 | outs.append(x) 305 | 306 | return outs 307 | 308 | def forward(self, x): 309 | x = self.forward_features(x) 310 | 311 | return x 312 | 313 | 314 | 
def _conv_filter(state_dict, patch_size=16): 315 | """ convert patch embedding weight from manual patchify + linear proj to conv""" 316 | out_dict = {} 317 | for k, v in state_dict.items(): 318 | if 'patch_embed.proj.weight' in k: 319 | v = v.reshape((v.shape[0], 3, patch_size, patch_size)) 320 | out_dict[k] = v 321 | 322 | return out_dict 323 | 324 | 325 | @BACKBONES.register_module() 326 | class p2t_tiny(PyramidPoolingTransformer): 327 | def __init__(self, **kwargs): 328 | super().__init__( 329 | patch_size=4, embed_dims=[48, 96, 240, 384], num_heads=[1, 2, 5, 8], mlp_ratios=[8, 8, 4, 4], 330 | qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[2, 2, 6, 3], 331 | drop_rate=0.0, drop_path_rate=0.1, **kwargs) 332 | 333 | 334 | @BACKBONES.register_module() 335 | class p2t_small(PyramidPoolingTransformer): 336 | def __init__(self, **kwargs): 337 | super().__init__( 338 | patch_size=4, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8], 339 | qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[2,2,9,3], mlp_ratios=[8,8,4,4], 340 | drop_rate=0.0, drop_path_rate=0.1, **kwargs) 341 | 342 | 343 | @BACKBONES.register_module() 344 | class p2t_base(PyramidPoolingTransformer): 345 | def __init__(self, **kwargs): 346 | super().__init__( 347 | patch_size=4, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8], 348 | qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[3,4,18,3], mlp_ratios=[8,8,4,4], 349 | drop_rate=0.0, drop_path_rate=0.3, **kwargs) 350 | 351 | 352 | @BACKBONES.register_module() 353 | class p2t_large(PyramidPoolingTransformer): 354 | def __init__(self, **kwargs): 355 | super().__init__( 356 | patch_size=4, embed_dims=[64, 128, 320, 640], num_heads=[1, 2, 5, 8], 357 | qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[3,8,27,3], mlp_ratios=[8,8,4,4], 358 | drop_rate=0.0, drop_path_rate=0.3, **kwargs) 359 | 360 | 361 | -------------------------------------------------------------------------------- /segmentation/test.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | 4 | import mmcv 5 | import torch 6 | from mmcv.parallel import MMDataParallel, MMDistributedDataParallel 7 | from mmcv.runner import get_dist_info, init_dist, load_checkpoint 8 | from mmcv.utils import DictAction 9 | from mmseg.apis import multi_gpu_test, single_gpu_test 10 | from mmseg.datasets import build_dataloader, build_dataset 11 | from mmseg.models import build_segmentor 12 | 13 | import p2t 14 | from align_resize import AlignResize 15 | 16 | def parse_args(): 17 | parser = argparse.ArgumentParser( 18 | description='mmseg test (and eval) a model') 19 | parser.add_argument('config', help='test config file path') 20 | parser.add_argument('checkpoint', help='checkpoint file') 21 | parser.add_argument( 22 | '--aug-test', action='store_true', help='Use Flip and Multi scale aug') 23 | parser.add_argument('--out', help='output result file in pickle format') 24 | parser.add_argument( 25 | '--format-only', 26 | action='store_true', 27 | help='Format the output results without perform evaluation. 
It is'
28 |         'useful when you want to format the result to a specific format and '
29 |         'submit it to the test server')
30 |     parser.add_argument(
31 |         '--eval',
32 |         type=str,
33 |         nargs='+',
34 |         help='evaluation metrics, which depend on the dataset, e.g., "mIoU"'
35 |         ' for generic datasets, and "cityscapes" for Cityscapes')
36 |     parser.add_argument('--show', action='store_true', help='show results')
37 |     parser.add_argument(
38 |         '--show-dir', help='directory where painted images will be saved')
39 |     parser.add_argument(
40 |         '--gpu-collect',
41 |         action='store_true',
42 |         help='whether to use gpu to collect results.')
43 |     parser.add_argument(
44 |         '--tmpdir',
45 |         help='tmp directory used for collecting results from multiple '
46 |         'workers, available when gpu_collect is not specified')
47 |     parser.add_argument(
48 |         '--options', nargs='+', action=DictAction, help='custom options')
49 |     parser.add_argument(
50 |         '--eval-options',
51 |         nargs='+',
52 |         action=DictAction,
53 |         help='custom options for evaluation')
54 |     parser.add_argument(
55 |         '--launcher',
56 |         choices=['none', 'pytorch', 'slurm', 'mpi'],
57 |         default='none',
58 |         help='job launcher')
59 |     parser.add_argument(
60 |         '--opacity',
61 |         type=float,
62 |         default=1,
63 |         help='Opacity of painted segmentation map. In (0, 1] range.')
64 |     parser.add_argument('--local_rank', type=int, default=0)
65 |     args = parser.parse_args()
66 |     if 'LOCAL_RANK' not in os.environ:
67 |         os.environ['LOCAL_RANK'] = str(args.local_rank)
68 |     return args
69 | 
70 | 
71 | def main():
72 |     args = parse_args()
73 | 
74 |     assert args.out or args.eval or args.format_only or args.show \
75 |         or args.show_dir, \
76 |         ('Please specify at least one operation (save/eval/format/show the '
77 |          'results / save the results) with the argument "--out", "--eval"'
78 |          ', "--format-only", "--show" or "--show-dir"')
79 | 
80 |     if args.eval and args.format_only:
81 |         raise ValueError('--eval and --format_only cannot be both specified')
82 | 
83 |     if args.out is not None and not args.out.endswith(('.pkl', '.pickle')):
84 |         raise ValueError('The output file must be a pkl file.')
85 | 
86 |     cfg = mmcv.Config.fromfile(args.config)
87 |     if args.options is not None:
88 |         cfg.merge_from_dict(args.options)
89 |     # set cudnn_benchmark
90 |     if cfg.get('cudnn_benchmark', False):
91 |         torch.backends.cudnn.benchmark = True
92 |     if args.aug_test:
93 |         # hard code index
94 |         cfg.data.test.pipeline[1].img_ratios = [
95 |             0.5, 0.75, 1.0, 1.25, 1.5, 1.75
96 |         ]
97 |         cfg.data.test.pipeline[1].flip = True
98 |     cfg.model.pretrained = None
99 |     cfg.data.test.test_mode = True
100 | 
101 |     # init distributed env first, since logger depends on the dist info.
102 | if args.launcher == 'none': 103 | distributed = False 104 | else: 105 | distributed = True 106 | init_dist(args.launcher, **cfg.dist_params) 107 | 108 | # build the dataloader 109 | # TODO: support multiple images per gpu (only minor changes are needed) 110 | dataset = build_dataset(cfg.data.test) 111 | data_loader = build_dataloader( 112 | dataset, 113 | samples_per_gpu=1, 114 | workers_per_gpu=cfg.data.workers_per_gpu, 115 | dist=distributed, 116 | shuffle=False) 117 | 118 | # build the model and load checkpoint 119 | cfg.model.train_cfg = None 120 | model = build_segmentor(cfg.model, test_cfg=cfg.get('test_cfg')) 121 | if os.path.exists(args.checkpoint): 122 | checkpoint = load_checkpoint(model, args.checkpoint, map_location='cpu') 123 | if 'CLASSES' in checkpoint.get('meta', {}): 124 | model.CLASSES = checkpoint['meta']['CLASSES'] 125 | else: 126 | print('"CLASSES" not found in meta, use dataset.CLASSES instead') 127 | model.CLASSES = dataset.CLASSES 128 | if 'PALETTE' in checkpoint.get('meta', {}): 129 | model.PALETTE = checkpoint['meta']['PALETTE'] 130 | else: 131 | print('"PALETTE" not found in meta, use dataset.PALETTE instead') 132 | model.PALETTE = dataset.PALETTE 133 | 134 | 135 | efficient_test = True 136 | if args.eval_options is not None: 137 | efficient_test = args.eval_options.get('efficient_test', False) 138 | print(model) 139 | 140 | if not distributed: 141 | model = MMDataParallel(model, device_ids=[0]) 142 | outputs = single_gpu_test(model, data_loader, args.show, args.show_dir, 143 | efficient_test, args.opacity) 144 | else: 145 | model = MMDistributedDataParallel( 146 | model.cuda(), 147 | device_ids=[torch.cuda.current_device()], 148 | broadcast_buffers=False) 149 | outputs = multi_gpu_test(model, data_loader, args.tmpdir, 150 | args.gpu_collect, efficient_test) 151 | 152 | rank, _ = get_dist_info() 153 | if rank == 0: 154 | if args.out: 155 | print(f'\nwriting results to {args.out}') 156 | mmcv.dump(outputs, args.out) 157 | kwargs = {} if args.eval_options is None else args.eval_options 158 | 159 | if args.format_only: 160 | dataset.format_results(outputs, **kwargs) 161 | if args.eval: 162 | dataset.evaluate(outputs, args.eval, **kwargs) 163 | 164 | 165 | if __name__ == '__main__': 166 | main() 167 | -------------------------------------------------------------------------------- /segmentation/train.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import copy 3 | import os 4 | import os.path as osp 5 | import time 6 | import mmcv 7 | import torch 8 | from mmcv.runner import init_dist 9 | from mmcv.utils import Config, DictAction, get_git_hash 10 | from mmseg import __version__ 11 | from mmseg.apis import set_random_seed, train_segmentor 12 | from mmseg.datasets import build_dataset 13 | from mmseg.models import build_segmentor 14 | from mmseg.utils import collect_env, get_root_logger 15 | 16 | import p2t 17 | from align_resize import AlignResize 18 | 19 | def parse_args(): 20 | parser = argparse.ArgumentParser(description='Train a segmentor') 21 | parser.add_argument('config', help='train config file path') 22 | parser.add_argument('--work-dir', help='the dir to save logs and models') 23 | parser.add_argument( 24 | '--load-from', help='the checkpoint file to load weights from') 25 | parser.add_argument( 26 | '--resume-from', help='the checkpoint file to resume from') 27 | parser.add_argument( 28 | '--no-validate', 29 | action='store_true', 30 | help='whether not to evaluate the checkpoint during 
training') 31 | group_gpus = parser.add_mutually_exclusive_group() 32 | group_gpus.add_argument( 33 | '--gpus', 34 | type=int, 35 | help='number of gpus to use ' 36 | '(only applicable to non-distributed training)') 37 | group_gpus.add_argument( 38 | '--gpu-ids', 39 | type=int, 40 | nargs='+', 41 | help='ids of gpus to use ' 42 | '(only applicable to non-distributed training)') 43 | parser.add_argument('--seed', type=int, default=None, help='random seed') 44 | parser.add_argument( 45 | '--deterministic', 46 | action='store_true', 47 | help='whether to set deterministic options for CUDNN backend.') 48 | parser.add_argument( 49 | '--options', nargs='+', action=DictAction, help='custom options') 50 | parser.add_argument( 51 | '--launcher', 52 | choices=['none', 'pytorch', 'slurm', 'mpi'], 53 | default='none', 54 | help='job launcher') 55 | parser.add_argument('--local_rank', type=int, default=0) 56 | args = parser.parse_args() 57 | if 'LOCAL_RANK' not in os.environ: 58 | os.environ['LOCAL_RANK'] = str(args.local_rank) 59 | 60 | return args 61 | 62 | 63 | def main(): 64 | args = parse_args() 65 | 66 | cfg = Config.fromfile(args.config) 67 | if args.options is not None: 68 | cfg.merge_from_dict(args.options) 69 | # set cudnn_benchmark 70 | if cfg.get('cudnn_benchmark', False): 71 | torch.backends.cudnn.benchmark = True 72 | 73 | # work_dir is determined in this priority: CLI > segment in file > filename 74 | if args.work_dir is not None: 75 | # update configs according to CLI args if args.work_dir is not None 76 | cfg.work_dir = args.work_dir 77 | elif cfg.get('work_dir', None) is None: 78 | # use config filename as default work_dir if cfg.work_dir is None 79 | cfg.work_dir = osp.join('./work_dirs', 80 | osp.splitext(osp.basename(args.config))[0]) 81 | if args.load_from is not None: 82 | cfg.load_from = args.load_from 83 | if args.resume_from is not None: 84 | cfg.resume_from = args.resume_from 85 | if args.gpu_ids is not None: 86 | cfg.gpu_ids = args.gpu_ids 87 | else: 88 | cfg.gpu_ids = range(1) if args.gpus is None else range(args.gpus) 89 | 90 | # init distributed env first, since logger depends on the dist info. 
91 | if args.launcher == 'none': 92 | distributed = False 93 | else: 94 | distributed = True 95 | init_dist(args.launcher, **cfg.dist_params) 96 | 97 | # create work_dir 98 | mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir)) 99 | # dump config 100 | cfg.dump(osp.join(cfg.work_dir, osp.basename(args.config))) 101 | # init the logger before other steps 102 | timestamp = time.strftime('%Y%m%d_%H%M%S', time.localtime()) 103 | log_file = osp.join(cfg.work_dir, f'{timestamp}.log') 104 | logger = get_root_logger(log_file=log_file, log_level=cfg.log_level) 105 | 106 | # init the meta dict to record some important information such as 107 | # environment info and seed, which will be logged 108 | meta = dict() 109 | # log env info 110 | env_info_dict = collect_env() 111 | env_info = '\n'.join([f'{k}: {v}' for k, v in env_info_dict.items()]) 112 | dash_line = '-' * 60 + '\n' 113 | logger.info('Environment info:\n' + dash_line + env_info + '\n' + 114 | dash_line) 115 | meta['env_info'] = env_info 116 | 117 | # log some basic info 118 | logger.info(f'Distributed training: {distributed}') 119 | logger.info(f'Config:\n{cfg.pretty_text}') 120 | 121 | # set random seeds 122 | if args.seed is not None: 123 | logger.info(f'Set random seed to {args.seed}, deterministic: ' 124 | f'{args.deterministic}') 125 | set_random_seed(args.seed, deterministic=args.deterministic) 126 | cfg.seed = args.seed 127 | meta['seed'] = args.seed 128 | meta['exp_name'] = osp.basename(args.config) 129 | 130 | model = build_segmentor( 131 | cfg.model, 132 | train_cfg=cfg.get('train_cfg'), 133 | test_cfg=cfg.get('test_cfg')) 134 | 135 | logger.info(model) 136 | 137 | datasets = [build_dataset(cfg.data.train)] 138 | if len(cfg.workflow) == 2: 139 | val_dataset = copy.deepcopy(cfg.data.val) 140 | val_dataset.pipeline = cfg.data.train.pipeline 141 | datasets.append(build_dataset(val_dataset)) 142 | if cfg.checkpoint_config is not None: 143 | # save mmseg version, config file content and class names in 144 | # checkpoints as meta data 145 | cfg.checkpoint_config.meta = dict( 146 | mmseg_version=f'{__version__}+{get_git_hash()[:7]}', 147 | config=cfg.pretty_text, 148 | CLASSES=datasets[0].CLASSES, 149 | PALETTE=datasets[0].PALETTE) 150 | # add an attribute for visualization convenience 151 | model.CLASSES = datasets[0].CLASSES 152 | train_segmentor( 153 | model, 154 | datasets, 155 | cfg, 156 | distributed=distributed, 157 | validate=(not args.no_validate), 158 | timestamp=timestamp, 159 | meta=meta) 160 | 161 | 162 | if __name__ == '__main__': 163 | main() 164 | -------------------------------------------------------------------------------- /utils.py: -------------------------------------------------------------------------------- 1 | # Copyright (c) 2015-present, Facebook, Inc. 2 | # All rights reserved. 3 | """ 4 | Misc functions, including distributed helpers. 5 | 6 | Mostly copy-paste from torchvision references. 7 | """ 8 | import io 9 | import os 10 | import time 11 | from collections import defaultdict, deque 12 | import datetime 13 | 14 | import torch 15 | import torch.distributed as dist 16 | 17 | 18 | class SmoothedValue(object): 19 | """Track a series of values and provide access to smoothed values over a 20 | window or the global series average. 
21 | """ 22 | 23 | def __init__(self, window_size=20, fmt=None): 24 | if fmt is None: 25 | fmt = "{median:.4f} ({global_avg:.4f})" 26 | self.deque = deque(maxlen=window_size) 27 | self.total = 0.0 28 | self.count = 0 29 | self.fmt = fmt 30 | 31 | def update(self, value, n=1): 32 | self.deque.append(value) 33 | self.count += n 34 | self.total += value * n 35 | 36 | def synchronize_between_processes(self): 37 | """ 38 | Warning: does not synchronize the deque! 39 | """ 40 | if not is_dist_avail_and_initialized(): 41 | return 42 | t = torch.tensor([self.count, self.total], dtype=torch.float64, device='cuda') 43 | dist.barrier() 44 | dist.all_reduce(t) 45 | t = t.tolist() 46 | self.count = int(t[0]) 47 | self.total = t[1] 48 | 49 | @property 50 | def median(self): 51 | d = torch.tensor(list(self.deque)) 52 | return d.median().item() 53 | 54 | @property 55 | def avg(self): 56 | d = torch.tensor(list(self.deque), dtype=torch.float32) 57 | return d.mean().item() 58 | 59 | @property 60 | def global_avg(self): 61 | return self.total / self.count 62 | 63 | @property 64 | def max(self): 65 | return max(self.deque) 66 | 67 | @property 68 | def value(self): 69 | return self.deque[-1] 70 | 71 | def __str__(self): 72 | return self.fmt.format( 73 | median=self.median, 74 | avg=self.avg, 75 | global_avg=self.global_avg, 76 | max=self.max, 77 | value=self.value) 78 | 79 | 80 | class MetricLogger(object): 81 | def __init__(self, delimiter="\t"): 82 | self.meters = defaultdict(SmoothedValue) 83 | self.delimiter = delimiter 84 | 85 | def update(self, **kwargs): 86 | for k, v in kwargs.items(): 87 | if isinstance(v, torch.Tensor): 88 | v = v.item() 89 | assert isinstance(v, (float, int)) 90 | self.meters[k].update(v) 91 | 92 | def __getattr__(self, attr): 93 | if attr in self.meters: 94 | return self.meters[attr] 95 | if attr in self.__dict__: 96 | return self.__dict__[attr] 97 | raise AttributeError("'{}' object has no attribute '{}'".format( 98 | type(self).__name__, attr)) 99 | 100 | def __str__(self): 101 | loss_str = [] 102 | for name, meter in self.meters.items(): 103 | loss_str.append( 104 | "{}: {}".format(name, str(meter)) 105 | ) 106 | return self.delimiter.join(loss_str) 107 | 108 | def synchronize_between_processes(self): 109 | for meter in self.meters.values(): 110 | meter.synchronize_between_processes() 111 | 112 | def add_meter(self, name, meter): 113 | self.meters[name] = meter 114 | 115 | def log_every(self, iterable, print_freq, header=None): 116 | i = 0 117 | if not header: 118 | header = '' 119 | start_time = time.time() 120 | end = time.time() 121 | iter_time = SmoothedValue(fmt='{avg:.4f}') 122 | data_time = SmoothedValue(fmt='{avg:.4f}') 123 | space_fmt = ':' + str(len(str(len(iterable)))) + 'd' 124 | log_msg = [ 125 | header, 126 | '[{0' + space_fmt + '}/{1}]', 127 | 'eta: {eta}', 128 | '{meters}', 129 | 'time: {time}', 130 | 'data: {data}' 131 | ] 132 | if torch.cuda.is_available(): 133 | log_msg.append('max mem: {memory:.0f}') 134 | log_msg = self.delimiter.join(log_msg) 135 | MB = 1024.0 * 1024.0 136 | for obj in iterable: 137 | data_time.update(time.time() - end) 138 | yield obj 139 | iter_time.update(time.time() - end) 140 | if i % print_freq == 0 or i == len(iterable) - 1: 141 | eta_seconds = iter_time.global_avg * (len(iterable) - i) 142 | eta_string = str(datetime.timedelta(seconds=int(eta_seconds))) 143 | if torch.cuda.is_available(): 144 | print(log_msg.format( 145 | i, len(iterable), eta=eta_string, 146 | meters=str(self), 147 | time=str(iter_time), data=str(data_time), 148 | 
memory=torch.cuda.max_memory_allocated() / MB)) 149 | else: 150 | print(log_msg.format( 151 | i, len(iterable), eta=eta_string, 152 | meters=str(self), 153 | time=str(iter_time), data=str(data_time))) 154 | i += 1 155 | end = time.time() 156 | total_time = time.time() - start_time 157 | total_time_str = str(datetime.timedelta(seconds=int(total_time))) 158 | print('{} Total time: {} ({:.4f} s / it)'.format( 159 | header, total_time_str, total_time / len(iterable))) 160 | 161 | 162 | def _load_checkpoint_for_ema(model_ema, checkpoint): 163 | """ 164 | Workaround for ModelEma._load_checkpoint to accept an already-loaded object 165 | """ 166 | mem_file = io.BytesIO() 167 | torch.save(checkpoint, mem_file) 168 | mem_file.seek(0) 169 | model_ema._load_checkpoint(mem_file) 170 | 171 | 172 | def setup_for_distributed(is_master): 173 | """ 174 | This function disables printing when not in master process 175 | """ 176 | import builtins as __builtin__ 177 | builtin_print = __builtin__.print 178 | 179 | def print(*args, **kwargs): 180 | force = kwargs.pop('force', False) 181 | if is_master or force: 182 | builtin_print(*args, **kwargs) 183 | 184 | __builtin__.print = print 185 | 186 | 187 | def is_dist_avail_and_initialized(): 188 | if not dist.is_available(): 189 | return False 190 | if not dist.is_initialized(): 191 | return False 192 | return True 193 | 194 | 195 | def get_world_size(): 196 | if not is_dist_avail_and_initialized(): 197 | return 1 198 | return dist.get_world_size() 199 | 200 | 201 | def get_rank(): 202 | if not is_dist_avail_and_initialized(): 203 | return 0 204 | return dist.get_rank() 205 | 206 | 207 | def is_main_process(): 208 | return get_rank() == 0 209 | 210 | 211 | def save_on_master(*args, **kwargs): 212 | if is_main_process(): 213 | torch.save(*args, **kwargs) 214 | 215 | 216 | def init_distributed_mode(args): 217 | #args.world_size = 8 * args.world_size 218 | if 'RANK' in os.environ and 'WORLD_SIZE' in os.environ: 219 | args.rank = int(os.environ["RANK"]) 220 | args.world_size = int(os.environ['WORLD_SIZE']) 221 | args.gpu = int(os.environ['LOCAL_RANK']) 222 | elif 'SLURM_PROCID' in os.environ: 223 | args.rank = int(os.environ['SLURM_PROCID']) 224 | args.gpu = args.rank % torch.cuda.device_count() 225 | else: 226 | print('Not using distributed mode') 227 | args.distributed = False 228 | return 229 | 230 | args.distributed = True 231 | 232 | torch.cuda.set_device(args.gpu) 233 | args.dist_backend = 'nccl' 234 | print('| distributed init (rank {}): {}'.format( 235 | args.rank, args.dist_url), flush=True) 236 | torch.distributed.init_process_group(backend=args.dist_backend, init_method=args.dist_url, 237 | world_size=args.world_size, rank=args.rank) 238 | torch.distributed.barrier() 239 | setup_for_distributed(args.rank == 0) 240 | --------------------------------------------------------------------------------
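For reference, a minimal usage sketch of the distributed helpers in `utils.py` above. It is illustrative only and not part of the repository: the `--dist-url` flag name is an assumption, while `RANK`, `WORLD_SIZE`, and `LOCAL_RANK` are the environment variables that `init_distributed_mode` actually reads (normally exported by `torch.distributed.launch` or `torchrun`).

```python
# Hypothetical driver script (assumption, not shipped with the repo): exercises the
# helpers defined in utils.py. Run it directly for single-process mode, or via
# `torchrun --nproc_per_node=<N> this_script.py` so RANK/WORLD_SIZE/LOCAL_RANK are set.
import argparse

import torch

import utils


def run():
    parser = argparse.ArgumentParser()
    # init_distributed_mode() reads args.dist_url when a launcher provides the env vars
    parser.add_argument('--dist-url', default='env://', help='url used to set up distributed training')
    args = parser.parse_args()

    utils.init_distributed_mode(args)  # falls back to non-distributed mode if no launcher env is found
    print(f'world size: {utils.get_world_size()}, rank: {utils.get_rank()}')

    # MetricLogger/SmoothedValue aggregate scalar stats and can sync counts across processes
    logger = utils.MetricLogger(delimiter='  ')
    for step in range(3):
        logger.update(loss=1.0 / (step + 1))
    logger.synchronize_between_processes()
    print(logger)

    if utils.is_main_process():
        # save_on_master() guards torch.save so only rank 0 writes checkpoints
        utils.save_on_master({'step': torch.tensor(3)}, 'checkpoint_example.pth')


if __name__ == '__main__':
    run()
```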